* [RFC PATCH bpf-next 0/3] bpf, riscv: Add compressed instructions to rv64 JIT
@ 2020-07-13 18:37 Luke Nelson
  2020-07-13 18:37 ` [RFC PATCH bpf-next 1/3] bpf, riscv: Modify JIT ctx to support compressed instructions Luke Nelson
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Luke Nelson @ 2020-07-13 18:37 UTC (permalink / raw)
  To: bpf
  Cc: Song Liu, Albert Ou, Daniel Borkmann, Luke Nelson,
	Björn Töpel, John Fastabend, Alexei Starovoitov,
	linux-kernel, netdev, Palmer Dabbelt, Paul Walmsley, KP Singh,
	Yonghong Song, linux-riscv, Andrii Nakryiko, Martin KaFai Lau,
	Xi Wang

This patch series enables using compressed riscv (RVC) instructions
in the rv64 BPF JIT.

RVC is a standard riscv extension that adds a set of compressed,
2-byte instructions that can replace some regular 4-byte instructions
for improved code density.
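For example, the 4-byte store "sd s0,40(sp)" in the dumps below can be
replaced by the 2-byte "c.sdsp s0,40(sp)" when RVC is enabled.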

This series first modifies the JIT to support using 2-byte instructions
(e.g., in jump offset computations), then adds RVC encoding and
helper functions, and finally uses the helper functions to optimize
the rv64 JIT.

I used our formal verification framework, Serval, to verify the
correctness of the RVC encodings and their uses in the rv64 JIT.

The JIT continues to pass all tests in lib/test_bpf.c and introduces
no new failures in test_verifier, both with and without RVC enabled.

The following are examples of the JITed code for the verifier selftest
"direct packet read test#3 for CGROUP_SKB OK", without and with RVC
enabled, respectively. The former uses 178 bytes, and the latter uses 112,
for a ~37% reduction in code size for this example.

Without RVC:

   0: 02000813    addi  a6,zero,32
   4: fd010113    addi  sp,sp,-48
   8: 02813423    sd    s0,40(sp)
   c: 02913023    sd    s1,32(sp)
  10: 01213c23    sd    s2,24(sp)
  14: 01313823    sd    s3,16(sp)
  18: 01413423    sd    s4,8(sp)
  1c: 03010413    addi  s0,sp,48
  20: 03056683    lwu   a3,48(a0)
  24: 02069693    slli  a3,a3,0x20
  28: 0206d693    srli  a3,a3,0x20
  2c: 03456703    lwu   a4,52(a0)
  30: 02071713    slli  a4,a4,0x20
  34: 02075713    srli  a4,a4,0x20
  38: 03856483    lwu   s1,56(a0)
  3c: 02049493    slli  s1,s1,0x20
  40: 0204d493    srli  s1,s1,0x20
  44: 03c56903    lwu   s2,60(a0)
  48: 02091913    slli  s2,s2,0x20
  4c: 02095913    srli  s2,s2,0x20
  50: 04056983    lwu   s3,64(a0)
  54: 02099993    slli  s3,s3,0x20
  58: 0209d993    srli  s3,s3,0x20
  5c: 09056a03    lwu   s4,144(a0)
  60: 020a1a13    slli  s4,s4,0x20
  64: 020a5a13    srli  s4,s4,0x20
  68: 00900313    addi  t1,zero,9
  6c: 006a7463    bgeu  s4,t1,0x74
  70: 00000a13    addi  s4,zero,0
  74: 02d52823    sw    a3,48(a0)
  78: 02e52a23    sw    a4,52(a0)
  7c: 02952c23    sw    s1,56(a0)
  80: 03252e23    sw    s2,60(a0)
  84: 05352023    sw    s3,64(a0)
  88: 00000793    addi  a5,zero,0
  8c: 02813403    ld    s0,40(sp)
  90: 02013483    ld    s1,32(sp)
  94: 01813903    ld    s2,24(sp)
  98: 01013983    ld    s3,16(sp)
  9c: 00813a03    ld    s4,8(sp)
  a0: 03010113    addi  sp,sp,48
  a4: 00078513    addi  a0,a5,0
  a8: 00008067    jalr  zero,0(ra)

With RVC:

   0:   02000813    addi    a6,zero,32
   4:   7179        c.addi16sp  sp,-48
   6:   f422        c.sdsp  s0,40(sp)
   8:   f026        c.sdsp  s1,32(sp)
   a:   ec4a        c.sdsp  s2,24(sp)
   c:   e84e        c.sdsp  s3,16(sp)
   e:   e452        c.sdsp  s4,8(sp)
  10:   1800        c.addi4spn  s0,sp,48
  12:   03056683    lwu     a3,48(a0)
  16:   1682        c.slli  a3,0x20
  18:   9281        c.srli  a3,0x20
  1a:   03456703    lwu     a4,52(a0)
  1e:   1702        c.slli  a4,0x20
  20:   9301        c.srli  a4,0x20
  22:   03856483    lwu     s1,56(a0)
  26:   1482        c.slli  s1,0x20
  28:   9081        c.srli  s1,0x20
  2a:   03c56903    lwu     s2,60(a0)
  2e:   1902        c.slli  s2,0x20
  30:   02095913    srli    s2,s2,0x20
  34:   04056983    lwu     s3,64(a0)
  38:   1982        c.slli  s3,0x20
  3a:   0209d993    srli    s3,s3,0x20
  3e:   09056a03    lwu     s4,144(a0)
  42:   1a02        c.slli  s4,0x20
  44:   020a5a13    srli    s4,s4,0x20
  48:   4325        c.li    t1,9
  4a:   006a7363    bgeu    s4,t1,0x50
  4e:   4a01        c.li    s4,0
  50:   d914        c.sw    a3,48(a0)
  52:   d958        c.sw    a4,52(a0)
  54:   dd04        c.sw    s1,56(a0)
  56:   03252e23    sw      s2,60(a0)
  5a:   05352023    sw      s3,64(a0)
  5e:   4781        c.li    a5,0
  60:   7422        c.ldsp  s0,40(sp)
  62:   7482        c.ldsp  s1,32(sp)
  64:   6962        c.ldsp  s2,24(sp)
  66:   69c2        c.ldsp  s3,16(sp)
  68:   6a22        c.ldsp  s4,8(sp)
  6a:   6145        c.addi16sp  sp,48
  6c:   853e        c.mv    a0,a5
  6e:   8082        c.jr    ra

Luke Nelson (3):
  bpf, riscv: Modify JIT ctx to support compressed instructions
  bpf, riscv: Add encodings for compressed instructions
  bpf, riscv: Use compressed instructions in the rv64 JIT

 arch/riscv/net/bpf_jit.h        | 495 +++++++++++++++++++++++++++++++-
 arch/riscv/net/bpf_jit_comp32.c |  14 +-
 arch/riscv/net/bpf_jit_comp64.c | 287 +++++++++---------
 arch/riscv/net/bpf_jit_core.c   |   6 +-
 4 files changed, 650 insertions(+), 152 deletions(-)

-- 
2.25.1


* [RFC PATCH bpf-next 1/3] bpf, riscv: Modify JIT ctx to support compressed instructions
  2020-07-13 18:37 [RFC PATCH bpf-next 0/3] bpf, riscv: Add compressed instructions to rv64 JIT Luke Nelson
@ 2020-07-13 18:37 ` Luke Nelson
  2020-07-15 18:02   ` Björn Töpel
  2020-07-13 18:37 ` [RFC PATCH bpf-next 2/3] bpf, riscv: Add encodings for " Luke Nelson
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: Luke Nelson @ 2020-07-13 18:37 UTC (permalink / raw)
  To: bpf
  Cc: Song Liu, Albert Ou, Daniel Borkmann, Luke Nelson,
	Björn Töpel, John Fastabend, Alexei Starovoitov,
	linux-kernel, netdev, Palmer Dabbelt, Paul Walmsley, KP Singh,
	Yonghong Song, linux-riscv, Andrii Nakryiko, Martin KaFai Lau,
	Xi Wang

This patch makes the necessary changes to struct rv_jit_context and to
bpf_int_jit_compile to support compressed riscv (RVC) instructions in
the BPF JIT.

It changes the JIT image to an array of u16 instead of u32, since RVC
instructions are 2 bytes wide, as opposed to 4.

It also changes ctx->offset and ctx->ninsns to count 2-byte units
rather than 4-byte instructions. The riscv PC is always 16-bit aligned,
so this granularity is sufficient to express any valid riscv offset.

The code for computing jump offsets in bytes is updated accordingly
and factored into the new RVOFF macro to simplify the code.
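
As a rough sketch, the pattern used throughout the JIT for computing a
byte offset between two emit positions now looks like this (the
variable names here are illustrative, not from the patch):

    int s, e, bytes;

    s = ctx->ninsns;        /* position, counted in 2-byte units */
    /* ... emit one or more instructions ... */
    e = ctx->ninsns;
    bytes = RVOFF(e - s);   /* byte offset: (e - s) << 1 */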

Signed-off-by: Luke Nelson <luke.r.nels@gmail.com>
---
 arch/riscv/net/bpf_jit.h        | 23 ++++++++++++++++++++---
 arch/riscv/net/bpf_jit_comp32.c | 14 +++++++-------
 arch/riscv/net/bpf_jit_comp64.c | 12 ++++++------
 arch/riscv/net/bpf_jit_core.c   |  6 +++---
 4 files changed, 36 insertions(+), 19 deletions(-)

diff --git a/arch/riscv/net/bpf_jit.h b/arch/riscv/net/bpf_jit.h
index 20e235d06f66..5c89ea904c1a 100644
--- a/arch/riscv/net/bpf_jit.h
+++ b/arch/riscv/net/bpf_jit.h
@@ -50,7 +50,7 @@ enum {
 
 struct rv_jit_context {
 	struct bpf_prog *prog;
-	u32 *insns;		/* RV insns */
+	u16 *insns;		/* RV insns */
 	int ninsns;
 	int epilogue_offset;
 	int *offset;		/* BPF to RV */
@@ -58,6 +58,9 @@ struct rv_jit_context {
 	int stack_size;
 };
 
+/* Convert from ninsns to bytes. */
+#define RVOFF(ninsns)	((ninsns) << 1)
+
 struct rv_jit_data {
 	struct bpf_binary_header *header;
 	u8 *image;
@@ -74,8 +77,22 @@ static inline void bpf_flush_icache(void *start, void *end)
 	flush_icache_range((unsigned long)start, (unsigned long)end);
 }
 
+/* Emit a 4-byte riscv instruction. */
 static inline void emit(const u32 insn, struct rv_jit_context *ctx)
 {
+	if (ctx->insns) {
+		ctx->insns[ctx->ninsns] = insn;
+		ctx->insns[ctx->ninsns + 1] = (insn >> 16);
+	}
+
+	ctx->ninsns += 2;
+}
+
+/* Emit a 2-byte riscv compressed instruction. */
+static inline void emitc(const u16 insn, struct rv_jit_context *ctx)
+{
+	BUILD_BUG_ON(!rvc_enabled());
+
 	if (ctx->insns)
 		ctx->insns[ctx->ninsns] = insn;
 
@@ -86,7 +103,7 @@ static inline int epilogue_offset(struct rv_jit_context *ctx)
 {
 	int to = ctx->epilogue_offset, from = ctx->ninsns;
 
-	return (to - from) << 2;
+	return RVOFF(to - from);
 }
 
 /* Return -1 or inverted cond. */
@@ -149,7 +166,7 @@ static inline int rv_offset(int insn, int off, struct rv_jit_context *ctx)
 	off++; /* BPF branch is from PC+1, RV is from PC */
 	from = (insn > 0) ? ctx->offset[insn - 1] : 0;
 	to = (insn + off > 0) ? ctx->offset[insn + off - 1] : 0;
-	return (to - from) << 2;
+	return RVOFF(to - from);
 }
 
 /* Instruction formats. */
diff --git a/arch/riscv/net/bpf_jit_comp32.c b/arch/riscv/net/bpf_jit_comp32.c
index b198eaa74456..d22001aa0057 100644
--- a/arch/riscv/net/bpf_jit_comp32.c
+++ b/arch/riscv/net/bpf_jit_comp32.c
@@ -644,7 +644,7 @@ static int emit_branch_r64(const s8 *src1, const s8 *src2, s32 rvoff,
 
 	e = ctx->ninsns;
 	/* Adjust for extra insns. */
-	rvoff -= (e - s) << 2;
+	rvoff -= RVOFF(e - s);
 	emit_jump_and_link(RV_REG_ZERO, rvoff, true, ctx);
 	return 0;
 }
@@ -713,7 +713,7 @@ static int emit_bcc(u8 op, u8 rd, u8 rs, int rvoff, struct rv_jit_context *ctx)
 	if (far) {
 		e = ctx->ninsns;
 		/* Adjust for extra insns. */
-		rvoff -= (e - s) << 2;
+		rvoff -= RVOFF(e - s);
 		emit_jump_and_link(RV_REG_ZERO, rvoff, true, ctx);
 	}
 	return 0;
@@ -731,7 +731,7 @@ static int emit_branch_r32(const s8 *src1, const s8 *src2, s32 rvoff,
 
 	e = ctx->ninsns;
 	/* Adjust for extra insns. */
-	rvoff -= (e - s) << 2;
+	rvoff -= RVOFF(e - s);
 
 	if (emit_bcc(op, lo(rs1), lo(rs2), rvoff, ctx))
 		return -1;
@@ -795,7 +795,7 @@ static int emit_bpf_tail_call(int insn, struct rv_jit_context *ctx)
 	 * if (index >= max_entries)
 	 *   goto out;
 	 */
-	off = (tc_ninsn - (ctx->ninsns - start_insn)) << 2;
+	off = RVOFF(tc_ninsn - (ctx->ninsns - start_insn));
 	emit_bcc(BPF_JGE, lo(idx_reg), RV_REG_T1, off, ctx);
 
 	/*
@@ -804,7 +804,7 @@ static int emit_bpf_tail_call(int insn, struct rv_jit_context *ctx)
 	 *   goto out;
 	 */
 	emit(rv_addi(RV_REG_T1, RV_REG_TCC, -1), ctx);
-	off = (tc_ninsn - (ctx->ninsns - start_insn)) << 2;
+	off = RVOFF(tc_ninsn - (ctx->ninsns - start_insn));
 	emit_bcc(BPF_JSLT, RV_REG_TCC, RV_REG_ZERO, off, ctx);
 
 	/*
@@ -818,7 +818,7 @@ static int emit_bpf_tail_call(int insn, struct rv_jit_context *ctx)
 	if (is_12b_check(off, insn))
 		return -1;
 	emit(rv_lw(RV_REG_T0, off, RV_REG_T0), ctx);
-	off = (tc_ninsn - (ctx->ninsns - start_insn)) << 2;
+	off = RVOFF(tc_ninsn - (ctx->ninsns - start_insn));
 	emit_bcc(BPF_JEQ, RV_REG_T0, RV_REG_ZERO, off, ctx);
 
 	/*
@@ -1214,7 +1214,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 			emit_imm32(tmp2, imm, ctx);
 			src = tmp2;
 			e = ctx->ninsns;
-			rvoff -= (e - s) << 2;
+			rvoff -= RVOFF(e - s);
 		}
 
 		if (is64)
diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c
index 6cfd164cbe88..26feed92f1bc 100644
--- a/arch/riscv/net/bpf_jit_comp64.c
+++ b/arch/riscv/net/bpf_jit_comp64.c
@@ -304,14 +304,14 @@ static int emit_bpf_tail_call(int insn, struct rv_jit_context *ctx)
 	if (is_12b_check(off, insn))
 		return -1;
 	emit(rv_lwu(RV_REG_T1, off, RV_REG_A1), ctx);
-	off = (tc_ninsn - (ctx->ninsns - start_insn)) << 2;
+	off = RVOFF(tc_ninsn - (ctx->ninsns - start_insn));
 	emit_branch(BPF_JGE, RV_REG_A2, RV_REG_T1, off, ctx);
 
 	/* if (TCC-- < 0)
 	 *     goto out;
 	 */
 	emit(rv_addi(RV_REG_T1, tcc, -1), ctx);
-	off = (tc_ninsn - (ctx->ninsns - start_insn)) << 2;
+	off = RVOFF(tc_ninsn - (ctx->ninsns - start_insn));
 	emit_branch(BPF_JSLT, tcc, RV_REG_ZERO, off, ctx);
 
 	/* prog = array->ptrs[index];
@@ -324,7 +324,7 @@ static int emit_bpf_tail_call(int insn, struct rv_jit_context *ctx)
 	if (is_12b_check(off, insn))
 		return -1;
 	emit(rv_ld(RV_REG_T2, off, RV_REG_T2), ctx);
-	off = (tc_ninsn - (ctx->ninsns - start_insn)) << 2;
+	off = RVOFF(tc_ninsn - (ctx->ninsns - start_insn));
 	emit_branch(BPF_JEQ, RV_REG_T2, RV_REG_ZERO, off, ctx);
 
 	/* goto *(prog->bpf_func + 4); */
@@ -757,7 +757,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 			e = ctx->ninsns;
 
 			/* Adjust for extra insns */
-			rvoff -= (e - s) << 2;
+			rvoff -= RVOFF(e - s);
 		}
 
 		if (BPF_OP(code) == BPF_JSET) {
@@ -810,7 +810,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 		e = ctx->ninsns;
 
 		/* Adjust for extra insns */
-		rvoff -= (e - s) << 2;
+		rvoff -= RVOFF(e - s);
 		emit_branch(BPF_OP(code), rd, rs, rvoff, ctx);
 		break;
 
@@ -831,7 +831,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 		if (!is64 && imm < 0)
 			emit(rv_addiw(RV_REG_T1, RV_REG_T1, 0), ctx);
 		e = ctx->ninsns;
-		rvoff -= (e - s) << 2;
+		rvoff -= RVOFF(e - s);
 		emit_branch(BPF_JNE, RV_REG_T1, RV_REG_ZERO, rvoff, ctx);
 		break;
 
diff --git a/arch/riscv/net/bpf_jit_core.c b/arch/riscv/net/bpf_jit_core.c
index 709b94ece3ed..cd156efe4944 100644
--- a/arch/riscv/net/bpf_jit_core.c
+++ b/arch/riscv/net/bpf_jit_core.c
@@ -73,7 +73,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 
 	if (ctx->offset) {
 		extra_pass = true;
-		image_size = sizeof(u32) * ctx->ninsns;
+		image_size = sizeof(u16) * ctx->ninsns;
 		goto skip_init_ctx;
 	}
 
@@ -103,7 +103,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 			if (jit_data->header)
 				break;
 
-			image_size = sizeof(u32) * ctx->ninsns;
+			image_size = sizeof(u16) * ctx->ninsns;
 			jit_data->header =
 				bpf_jit_binary_alloc(image_size,
 						     &jit_data->image,
@@ -114,7 +114,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 				goto out_offset;
 			}
 
-			ctx->insns = (u32 *)jit_data->image;
+			ctx->insns = (u16 *)jit_data->image;
 			/*
 			 * Now, when the image is allocated, the image can
 			 * potentially shrink more (auipc/jalr -> jal).
-- 
2.25.1


* [RFC PATCH bpf-next 2/3] bpf, riscv: Add encodings for compressed instructions
  2020-07-13 18:37 [RFC PATCH bpf-next 0/3] bpf, riscv: Add compressed instructions to rv64 JIT Luke Nelson
  2020-07-13 18:37 ` [RFC PATCH bpf-next 1/3] bpf, riscv: Modify JIT ctx to support compressed instructions Luke Nelson
@ 2020-07-13 18:37 ` Luke Nelson
  2020-07-15 18:10   ` Björn Töpel
  2020-07-13 18:37 ` [RFC PATCH bpf-next 3/3] bpf, riscv: Use compressed instructions in the rv64 JIT Luke Nelson
  2020-07-15 17:56 ` [RFC PATCH bpf-next 0/3] bpf, riscv: Add compressed instructions to " Björn Töpel
  3 siblings, 1 reply; 9+ messages in thread
From: Luke Nelson @ 2020-07-13 18:37 UTC (permalink / raw)
  To: bpf
  Cc: Song Liu, Albert Ou, Daniel Borkmann, Luke Nelson,
	Björn Töpel, John Fastabend, Alexei Starovoitov,
	linux-kernel, netdev, Palmer Dabbelt, Paul Walmsley, KP Singh,
	Yonghong Song, linux-riscv, Andrii Nakryiko, Martin KaFai Lau,
	Xi Wang

This patch adds functions for encoding and emitting compressed riscv
(RVC) instructions to the BPF JIT.

Some regular riscv instructions can be compressed into an RVC instruction
if the instruction fields meet some requirements. For example, "add rd,
rs1, rs2" can be compressed into "c.add rd, rs2" when rd == rs1.

To make using RVC encodings simpler, this patch also adds helper
functions that selectively emit either a regular instruction or a
compressed instruction if possible.

For example, emit_add will produce a "c.add" when possible and a
regular "add" otherwise.

Signed-off-by: Luke Nelson <luke.r.nels@gmail.com>
---
 arch/riscv/net/bpf_jit.h | 474 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 474 insertions(+)

diff --git a/arch/riscv/net/bpf_jit.h b/arch/riscv/net/bpf_jit.h
index 5c89ea904c1a..f3ac2d4a50f7 100644
--- a/arch/riscv/net/bpf_jit.h
+++ b/arch/riscv/net/bpf_jit.h
@@ -13,6 +13,15 @@
 #include <linux/filter.h>
 #include <asm/cacheflush.h>
 
+static inline bool rvc_enabled(void)
+{
+#ifdef CONFIG_RISCV_ISA_C
+	return true;
+#else
+	return false;
+#endif
+}
+
 enum {
 	RV_REG_ZERO =	0,	/* The constant value 0 */
 	RV_REG_RA =	1,	/* Return address */
@@ -48,6 +57,18 @@ enum {
 	RV_REG_T6 =	31,
 };
 
+static inline bool is_creg(u8 reg)
+{
+	return (1 << reg) & (BIT(RV_REG_FP) |
+			     BIT(RV_REG_S1) |
+			     BIT(RV_REG_A0) |
+			     BIT(RV_REG_A1) |
+			     BIT(RV_REG_A2) |
+			     BIT(RV_REG_A3) |
+			     BIT(RV_REG_A4) |
+			     BIT(RV_REG_A5));
+}
+
 struct rv_jit_context {
 	struct bpf_prog *prog;
 	u16 *insns;		/* RV insns */
@@ -134,6 +155,16 @@ static inline int invert_bpf_cond(u8 cond)
 	return -1;
 }
 
+static inline bool is_6b_int(long val)
+{
+	return -(1L << 5) <= val && val < (1L << 5);
+}
+
+static inline bool is_10b_int(long val)
+{
+	return -(1L << 9) <= val && val < (1L << 9);
+}
+
 static inline bool is_12b_int(long val)
 {
 	return -(1L << 11) <= val && val < (1L << 11);
@@ -224,6 +255,59 @@ static inline u32 rv_amo_insn(u8 funct5, u8 aq, u8 rl, u8 rs2, u8 rs1,
 	return rv_r_insn(funct7, rs2, rs1, funct3, rd, opcode);
 }
 
+/* RISC-V compressed instruction formats. */
+
+static inline u32 rv_cr_insn(u8 funct4, u8 rd, u8 rs2, u8 op)
+{
+	return (funct4 << 12) | (rd << 7) | (rs2 << 2) | op;
+}
+
+static inline u32 rv_ci_insn(u8 funct3, u32 imm6, u8 rd, u8 op)
+{
+	u16 imm;
+
+	imm = ((imm6 & 0x20) << 7) | ((imm6 & 0x1f) << 2);
+	return (funct3 << 13) | (rd << 7) | op | imm;
+}
+
+static inline u32 rv_css_insn(u8 funct3, u32 uimm, u8 rs2, u8 op)
+{
+	return (funct3 << 13) | (uimm << 7) | (rs2 << 2) | op;
+}
+
+static inline u32 rv_ciw_insn(u8 funct3, u32 uimm, u8 rd, u8 op)
+{
+	return (funct3 << 13) | (uimm << 5) | ((rd & 0x7) << 2) | op;
+}
+
+static inline u32 rv_cl_insn(u8 funct3, u32 imm_hi, u8 rs1, u32 imm_lo, u8 rd,
+			     u8 op)
+{
+	return (funct3 << 13) | (imm_hi << 10) | ((rs1 & 0x7) << 7) |
+		(imm_lo << 5) | ((rd & 0x7) << 2) | op;
+}
+
+static inline u32 rv_cs_insn(u8 funct3, u32 imm_hi, u8 rs1, u32 imm_lo, u8 rs2,
+			     u8 op)
+{
+	return (funct3 << 13) | (imm_hi << 10) | ((rs1 & 0x7) << 7) |
+		(imm_lo << 5) | ((rs2 & 0x7) << 2) | op;
+}
+
+static inline u32 rv_ca_insn(u8 funct6, u8 rd, u8 funct2, u8 rs2, u8 op)
+{
+	return (funct6 << 10) | ((rd & 0x7) << 7) | (funct2 << 5) |
+		((rs2 & 0x7) << 2) | op;
+}
+
+static inline u32 rv_cb_insn(u8 funct3, u32 imm6, u8 funct2, u8 rd, u8 op)
+{
+	u16 imm;
+
+	imm = ((imm6 & 0x20) << 7) | ((imm6 & 0x1f) << 2);
+	return (funct3 << 13) | (funct2 << 10) | ((rd & 0x7) << 7) | op | imm;
+}
+
 /* Instructions shared by both RV32 and RV64. */
 
 static inline u32 rv_addi(u8 rd, u8 rs1, u16 imm11_0)
@@ -431,6 +515,135 @@ static inline u32 rv_amoadd_w(u8 rd, u8 rs2, u8 rs1, u8 aq, u8 rl)
 	return rv_amo_insn(0, aq, rl, rs2, rs1, 2, rd, 0x2f);
 }
 
+/* RVC instructions. */
+
+static inline u32 rvc_addi4spn(u8 rd, u32 imm10)
+{
+	u32 imm;
+
+	imm = ((imm10 & 0x30) << 2) | ((imm10 & 0x3c0) >> 4) |
+		((imm10 & 0x4) >> 1) | ((imm10 & 0x8) >> 3);
+	return rv_ciw_insn(0x0, imm, rd, 0x0);
+}
+
+static inline u32 rvc_lw(u8 rd, u32 imm7, u8 rs1)
+{
+	u32 imm_hi, imm_lo;
+
+	imm_hi = (imm7 & 0x38) >> 3;
+	imm_lo = ((imm7 & 0x4) >> 1) | ((imm7 & 0x40) >> 6);
+	return rv_cl_insn(0x2, imm_hi, rs1, imm_lo, rd, 0x0);
+}
+
+static inline u32 rvc_sw(u8 rs1, u32 imm7, u8 rs2)
+{
+	u32 imm_hi, imm_lo;
+
+	imm_hi = (imm7 & 0x38) >> 3;
+	imm_lo = ((imm7 & 0x4) >> 1) | ((imm7 & 0x40) >> 6);
+	return rv_cs_insn(0x6, imm_hi, rs1, imm_lo, rs2, 0x0);
+}
+
+static inline u32 rvc_addi(u8 rd, u8 imm6)
+{
+	return rv_ci_insn(0, imm6, rd, 0x1);
+}
+
+static inline u32 rvc_li(u8 rd, u8 imm6)
+{
+	return rv_ci_insn(0x2, imm6, rd, 0x1);
+}
+
+static inline u32 rvc_addi16sp(u32 imm10)
+{
+	u32 imm;
+
+	imm = ((imm10 & 0x200) >> 4) | (imm10 & 0x10) | ((imm10 & 0x40) >> 3) |
+		((imm10 & 0x180) >> 6) | ((imm10 & 0x20) >> 5);
+	return rv_ci_insn(0x3, imm, RV_REG_SP, 0x1);
+}
+
+static inline u32 rvc_lui(u8 rd, u8 imm6)
+{
+	return rv_ci_insn(0x3, imm6, rd, 0x1);
+}
+
+static inline u32 rvc_srli(u8 rd, u8 imm6)
+{
+	return rv_cb_insn(0x4, imm6, 0, rd, 0x1);
+}
+
+static inline u32 rvc_srai(u8 rd, u8 imm6)
+{
+	return rv_cb_insn(0x4, imm6, 0x1, rd, 0x1);
+}
+
+static inline u32 rvc_andi(u8 rd, u8 imm6)
+{
+	return rv_cb_insn(0x4, imm6, 0x2, rd, 0x1);
+}
+
+static inline u32 rvc_sub(u8 rd, u8 rs)
+{
+	return rv_ca_insn(0x23, rd, 0, rs, 0x1);
+}
+
+static inline u32 rvc_xor(u8 rd, u8 rs)
+{
+	return rv_ca_insn(0x23, rd, 0x1, rs, 0x1);
+}
+
+static inline u32 rvc_or(u8 rd, u8 rs)
+{
+	return rv_ca_insn(0x23, rd, 0x2, rs, 0x1);
+}
+
+static inline u32 rvc_and(u8 rd, u8 rs)
+{
+	return rv_ca_insn(0x23, rd, 0x3, rs, 0x1);
+}
+
+static inline u32 rvc_slli(u8 rd, u8 imm6)
+{
+	return rv_ci_insn(0, imm6, rd, 0x2);
+}
+
+static inline u32 rvc_lwsp(u8 rd, u32 imm8)
+{
+	u32 imm;
+
+	imm = ((imm8 & 0xc0) >> 6) | (imm8 & 0x3c);
+	return rv_ci_insn(0x2, imm, rd, 0x2);
+}
+
+static inline u32 rvc_jr(u8 rs1)
+{
+	return rv_cr_insn(0x8, rs1, RV_REG_ZERO, 0x2);
+}
+
+static inline u32 rvc_mv(u8 rd, u8 rs)
+{
+	return rv_cr_insn(0x8, rd, rs, 0x2);
+}
+
+static inline u32 rvc_jalr(u8 rs1)
+{
+	return rv_cr_insn(0x9, rs1, RV_REG_ZERO, 0x2);
+}
+
+static inline u32 rvc_add(u8 rd, u8 rs)
+{
+	return rv_cr_insn(0x9, rd, rs, 0x2);
+}
+
+static inline u32 rvc_swsp(u32 imm8, u8 rs2)
+{
+	u32 imm;
+
+	imm = (imm8 & 0x3c) | ((imm8 & 0xc0) >> 6);
+	return rv_css_insn(0x6, imm, rs2, 0x2);
+}
+
 /*
  * RV64-only instructions.
  *
@@ -520,6 +733,267 @@ static inline u32 rv_amoadd_d(u8 rd, u8 rs2, u8 rs1, u8 aq, u8 rl)
 	return rv_amo_insn(0, aq, rl, rs2, rs1, 3, rd, 0x2f);
 }
 
+/* RV64-only RVC instructions. */
+
+static inline u32 rvc_ld(u8 rd, u32 imm8, u8 rs1)
+{
+	u32 imm_hi, imm_lo;
+
+	imm_hi = (imm8 & 0x38) >> 3;
+	imm_lo = (imm8 & 0xc0) >> 6;
+	return rv_cl_insn(0x3, imm_hi, rs1, imm_lo, rd, 0x0);
+}
+
+static inline u32 rvc_sd(u8 rs1, u32 imm8, u8 rs2)
+{
+	u32 imm_hi, imm_lo;
+
+	imm_hi = (imm8 & 0x38) >> 3;
+	imm_lo = (imm8 & 0xc0) >> 6;
+	return rv_cs_insn(0x7, imm_hi, rs1, imm_lo, rs2, 0x0);
+}
+
+static inline u32 rvc_subw(u8 rd, u8 rs)
+{
+	return rv_ca_insn(0x27, rd, 0, rs, 0x1);
+}
+
+static inline u32 rvc_addiw(u8 rd, u8 imm6)
+{
+	return rv_ci_insn(0x1, imm6, rd, 0x1);
+}
+
+static inline u32 rvc_ldsp(u8 rd, u32 imm9)
+{
+	u32 imm;
+
+	imm = ((imm9 & 0x1c0) >> 6) | (imm9 & 0x38);
+	return rv_ci_insn(0x3, imm, rd, 0x2);
+}
+
+static inline u32 rvc_sdsp(u32 imm9, u8 rs2)
+{
+	u32 imm;
+
+	imm = (imm9 & 0x38) | ((imm9 & 0x1c0) >> 6);
+	return rv_css_insn(0x7, imm, rs2, 0x2);
+}
+
+#endif /* __riscv_xlen == 64 */
+
+/* Helper functions that emit RVC instructions when possible. */
+
+static inline void emit_jalr(u8 rd, u8 rs, s32 imm, struct rv_jit_context *ctx)
+{
+	if (rvc_enabled() && rd == RV_REG_RA && rs && !imm)
+		emitc(rvc_jalr(rs), ctx);
+	else if (rvc_enabled() && !rd && rs && !imm)
+		emitc(rvc_jr(rs), ctx);
+	else
+		emit(rv_jalr(rd, rs, imm), ctx);
+}
+
+static inline void emit_mv(u8 rd, u8 rs, struct rv_jit_context *ctx)
+{
+	if (rvc_enabled() && rd && rs)
+		emitc(rvc_mv(rd, rs), ctx);
+	else
+		emit(rv_addi(rd, rs, 0), ctx);
+}
+
+static inline void emit_add(u8 rd, u8 rs1, u8 rs2, struct rv_jit_context *ctx)
+{
+	if (rvc_enabled() && rd && rd == rs1 && rs2)
+		emitc(rvc_add(rd, rs2), ctx);
+	else
+		emit(rv_add(rd, rs1, rs2), ctx);
+}
+
+static inline void emit_addi(u8 rd, u8 rs, s32 imm, struct rv_jit_context *ctx)
+{
+	if (rd == rs && !imm)
+		/*
+		 * RVC cannot handle imm == 0. Handle it here by emitting
+		 * no instructions since it should behave as a no-op.
+		 */
+		return;
+	else if (rvc_enabled() && rd == RV_REG_SP && rd == rs &&
+		 is_10b_int(imm) && !(imm & 0xf))
+		emitc(rvc_addi16sp(imm), ctx);
+	else if (rvc_enabled() && is_creg(rd) && rs == RV_REG_SP &&
+		 (u32)imm < 0x400 && !(imm & 0x3) && imm)
+		emitc(rvc_addi4spn(rd, imm), ctx);
+	else if (rvc_enabled() && rd && rd == rs && is_6b_int(imm))
+		emitc(rvc_addi(rd, imm), ctx);
+	else
+		emit(rv_addi(rd, rs, imm), ctx);
+}
+
+static inline void emit_li(u8 rd, s32 imm, struct rv_jit_context *ctx)
+{
+	if (rvc_enabled() && rd && is_6b_int(imm))
+		emitc(rvc_li(rd, imm), ctx);
+	else
+		emit(rv_addi(rd, RV_REG_ZERO, imm), ctx);
+}
+
+static inline void emit_lui(u8 rd, s32 imm, struct rv_jit_context *ctx)
+{
+	if (rvc_enabled() && rd && rd != RV_REG_SP && is_6b_int(imm) && imm)
+		emitc(rvc_lui(rd, imm), ctx);
+	else
+		emit(rv_lui(rd, imm), ctx);
+}
+
+static inline void emit_slli(u8 rd, u8 rs, s32 imm, struct rv_jit_context *ctx)
+{
+	if (rd == rs && !imm)
+		/*
+		 * RVC cannot handle imm == 0. Handle it here by emitting
+		 * no instructions since it should behave as a no-op.
+		 */
+		return;
+	else if (rvc_enabled() && rd && rd == rs)
+		emitc(rvc_slli(rd, imm), ctx);
+	else
+		emit(rv_slli(rd, rs, imm), ctx);
+}
+
+static inline void emit_andi(u8 rd, u8 rs, s32 imm, struct rv_jit_context *ctx)
+{
+	if (rvc_enabled() && is_creg(rd) && rd == rs && is_6b_int(imm))
+		emitc(rvc_andi(rd, imm), ctx);
+	else
+		emit(rv_andi(rd, rs, imm), ctx);
+}
+
+static inline void emit_srli(u8 rd, u8 rs, s32 imm, struct rv_jit_context *ctx)
+{
+	if (rd == rs && !imm)
+		/*
+		 * RVC cannot handle imm == 0. Handle it here by emitting
+		 * no instructions since it should behave as a no-op.
+		 */
+		return;
+	else if (rvc_enabled() && is_creg(rd) && rd == rs)
+		emitc(rvc_srli(rd, imm), ctx);
+	else
+		emit(rv_srli(rd, rs, imm), ctx);
+}
+
+static inline void emit_srai(u8 rd, u8 rs, s32 imm, struct rv_jit_context *ctx)
+{
+	if (rd == rs && !imm)
+		/*
+		 * RVC cannot handle imm == 0. Handle it here by emitting
+		 * no instructions since it should behave as a no-op.
+		 */
+		return;
+	else if (rvc_enabled() && is_creg(rd) && rd == rs)
+		emitc(rvc_srai(rd, imm), ctx);
+	else
+		emit(rv_srai(rd, rs, imm), ctx);
+}
+
+static inline void emit_sub(u8 rd, u8 rs1, u8 rs2, struct rv_jit_context *ctx)
+{
+	if (rvc_enabled() && is_creg(rd) && rd == rs1 && is_creg(rs2))
+		emitc(rvc_sub(rd, rs2), ctx);
+	else
+		emit(rv_sub(rd, rs1, rs2), ctx);
+}
+
+static inline void emit_or(u8 rd, u8 rs1, u8 rs2, struct rv_jit_context *ctx)
+{
+	if (rvc_enabled() && is_creg(rd) && rd == rs1 && is_creg(rs2))
+		emitc(rvc_or(rd, rs2), ctx);
+	else
+		emit(rv_or(rd, rs1, rs2), ctx);
+}
+
+static inline void emit_and(u8 rd, u8 rs1, u8 rs2, struct rv_jit_context *ctx)
+{
+	if (rvc_enabled() && is_creg(rd) && rd == rs1 && is_creg(rs2))
+		emitc(rvc_and(rd, rs2), ctx);
+	else
+		emit(rv_and(rd, rs1, rs2), ctx);
+}
+
+static inline void emit_xor(u8 rd, u8 rs1, u8 rs2, struct rv_jit_context *ctx)
+{
+	if (rvc_enabled() && is_creg(rd) && rd == rs1 && is_creg(rs2))
+		emitc(rvc_xor(rd, rs2), ctx);
+	else
+		emit(rv_xor(rd, rs1, rs2), ctx);
+}
+
+static inline void emit_lw(u8 rd, s32 off, u8 rs1, struct rv_jit_context *ctx)
+{
+	if (rvc_enabled() && rs1 == RV_REG_SP && rd && (u32)off < 0x100 &&
+	    !(off & 0x3))
+		emitc(rvc_lwsp(rd, off), ctx);
+	else if (rvc_enabled() && is_creg(rd) && is_creg(rs1) &&
+		 (u32)off < 0x80 && !(off & 0x3))
+		emitc(rvc_lw(rd, off, rs1), ctx);
+	else
+		emit(rv_lw(rd, off, rs1), ctx);
+}
+
+static inline void emit_sw(u8 rs1, s32 off, u8 rs2, struct rv_jit_context *ctx)
+{
+	if (rvc_enabled() && rs1 == RV_REG_SP && (u32)off < 0x100 &&
+	    !(off & 0x3))
+		emitc(rvc_swsp(off, rs2), ctx);
+	else if (rvc_enabled() && is_creg(rs1) && is_creg(rs2) &&
+		 (u32)off < 0x80 && !(off & 0x3))
+		emitc(rvc_sw(rs1, off, rs2), ctx);
+	else
+		emit(rv_sw(rs1, off, rs2), ctx);
+}
+
+/* RV64-only helper functions. */
+#if __riscv_xlen == 64
+
+static inline void emit_addiw(u8 rd, u8 rs, s32 imm, struct rv_jit_context *ctx)
+{
+	if (rvc_enabled() && rd && rd == rs && is_6b_int(imm))
+		emitc(rvc_addiw(rd, imm), ctx);
+	else
+		emit(rv_addiw(rd, rs, imm), ctx);
+}
+
+static inline void emit_ld(u8 rd, s32 off, u8 rs1, struct rv_jit_context *ctx)
+{
+	if (rvc_enabled() && rs1 == RV_REG_SP && rd && (u32)off < 0x200 &&
+	    !(off & 0x7))
+		emitc(rvc_ldsp(rd, off), ctx);
+	else if (rvc_enabled() && is_creg(rd) && is_creg(rs1) &&
+		 (u32)off < 0x100 && !(off & 0x7))
+		emitc(rvc_ld(rd, off, rs1), ctx);
+	else
+		emit(rv_ld(rd, off, rs1), ctx);
+}
+
+static inline void emit_sd(u8 rs1, s32 off, u8 rs2, struct rv_jit_context *ctx)
+{
+	if (rvc_enabled() && rs1 == RV_REG_SP && (u32)off < 0x200 &&
+	    !(off & 0x7))
+		emitc(rvc_sdsp(off, rs2), ctx);
+	else if (rvc_enabled() && is_creg(rs1) && is_creg(rs2) &&
+		 (u32)off < 0x100 && !(off & 0x7))
+		emitc(rvc_sd(rs1, off, rs2), ctx);
+	else
+		emit(rv_sd(rs1, off, rs2), ctx);
+}
+
+static inline void emit_subw(u8 rd, u8 rs1, u8 rs2, struct rv_jit_context *ctx)
+{
+	if (rvc_enabled() && is_creg(rd) && rd == rs1 && is_creg(rs2))
+		emitc(rvc_subw(rd, rs2), ctx);
+	else
+		emit(rv_subw(rd, rs1, rs2), ctx);
+}
+
 #endif /* __riscv_xlen == 64 */
 
 void bpf_jit_build_prologue(struct rv_jit_context *ctx);
-- 
2.25.1


* [RFC PATCH bpf-next 3/3] bpf, riscv: Use compressed instructions in the rv64 JIT
  2020-07-13 18:37 [RFC PATCH bpf-next 0/3] bpf, riscv: Add compressed instructions to rv64 JIT Luke Nelson
  2020-07-13 18:37 ` [RFC PATCH bpf-next 1/3] bpf, riscv: Modify JIT ctx to support compressed instructions Luke Nelson
  2020-07-13 18:37 ` [RFC PATCH bpf-next 2/3] bpf, riscv: Add encodings for " Luke Nelson
@ 2020-07-13 18:37 ` Luke Nelson
  2020-07-15 18:12   ` Björn Töpel
  2020-07-15 17:56 ` [RFC PATCH bpf-next 0/3] bpf, riscv: Add compressed instructions to " Björn Töpel
  3 siblings, 1 reply; 9+ messages in thread
From: Luke Nelson @ 2020-07-13 18:37 UTC (permalink / raw)
  To: bpf
  Cc: Song Liu, Albert Ou, Daniel Borkmann, Luke Nelson,
	Björn Töpel, John Fastabend, Alexei Starovoitov,
	linux-kernel, netdev, Palmer Dabbelt, Paul Walmsley, KP Singh,
	Yonghong Song, linux-riscv, Andrii Nakryiko, Martin KaFai Lau,
	Xi Wang

This patch uses the RVC support and encodings from bpf_jit.h to optimize
the rv64 JIT.

The optimizations work by replacing emit(rv_X(...)) with a call to a
helper function emit_X, which emits the compressed version of the
instruction when RVC is enabled and the operands permit it.
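
For instance, a register move that was previously emitted as

    emit(rv_addi(rd, rs, 0), ctx);

is now emitted as

    emit_mv(rd, rs, ctx);

which produces the 2-byte "c.mv rd,rs" when both registers are non-zero
and RVC is enabled, and falls back to the 4-byte "addi rd,rs,0"
otherwise.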

The JIT continues to pass all tests in lib/test_bpf.c and introduces
no new failures in test_verifier, both with and without RVC enabled.

The following are examples of the JITed code for the verifier selftest
"direct packet read test#3 for CGROUP_SKB OK", without and with RVC
enabled, respectively. The former uses 178 bytes, and the latter uses 112,
for a ~37% reduction in code size for this example.

Without RVC:

   0: 02000813    addi  a6,zero,32
   4: fd010113    addi  sp,sp,-48
   8: 02813423    sd    s0,40(sp)
   c: 02913023    sd    s1,32(sp)
  10: 01213c23    sd    s2,24(sp)
  14: 01313823    sd    s3,16(sp)
  18: 01413423    sd    s4,8(sp)
  1c: 03010413    addi  s0,sp,48
  20: 03056683    lwu   a3,48(a0)
  24: 02069693    slli  a3,a3,0x20
  28: 0206d693    srli  a3,a3,0x20
  2c: 03456703    lwu   a4,52(a0)
  30: 02071713    slli  a4,a4,0x20
  34: 02075713    srli  a4,a4,0x20
  38: 03856483    lwu   s1,56(a0)
  3c: 02049493    slli  s1,s1,0x20
  40: 0204d493    srli  s1,s1,0x20
  44: 03c56903    lwu   s2,60(a0)
  48: 02091913    slli  s2,s2,0x20
  4c: 02095913    srli  s2,s2,0x20
  50: 04056983    lwu   s3,64(a0)
  54: 02099993    slli  s3,s3,0x20
  58: 0209d993    srli  s3,s3,0x20
  5c: 09056a03    lwu   s4,144(a0)
  60: 020a1a13    slli  s4,s4,0x20
  64: 020a5a13    srli  s4,s4,0x20
  68: 00900313    addi  t1,zero,9
  6c: 006a7463    bgeu  s4,t1,0x74
  70: 00000a13    addi  s4,zero,0
  74: 02d52823    sw    a3,48(a0)
  78: 02e52a23    sw    a4,52(a0)
  7c: 02952c23    sw    s1,56(a0)
  80: 03252e23    sw    s2,60(a0)
  84: 05352023    sw    s3,64(a0)
  88: 00000793    addi  a5,zero,0
  8c: 02813403    ld    s0,40(sp)
  90: 02013483    ld    s1,32(sp)
  94: 01813903    ld    s2,24(sp)
  98: 01013983    ld    s3,16(sp)
  9c: 00813a03    ld    s4,8(sp)
  a0: 03010113    addi  sp,sp,48
  a4: 00078513    addi  a0,a5,0
  a8: 00008067    jalr  zero,0(ra)

With RVC:

   0:   02000813    addi    a6,zero,32
   4:   7179        c.addi16sp  sp,-48
   6:   f422        c.sdsp  s0,40(sp)
   8:   f026        c.sdsp  s1,32(sp)
   a:   ec4a        c.sdsp  s2,24(sp)
   c:   e84e        c.sdsp  s3,16(sp)
   e:   e452        c.sdsp  s4,8(sp)
  10:   1800        c.addi4spn  s0,sp,48
  12:   03056683    lwu     a3,48(a0)
  16:   1682        c.slli  a3,0x20
  18:   9281        c.srli  a3,0x20
  1a:   03456703    lwu     a4,52(a0)
  1e:   1702        c.slli  a4,0x20
  20:   9301        c.srli  a4,0x20
  22:   03856483    lwu     s1,56(a0)
  26:   1482        c.slli  s1,0x20
  28:   9081        c.srli  s1,0x20
  2a:   03c56903    lwu     s2,60(a0)
  2e:   1902        c.slli  s2,0x20
  30:   02095913    srli    s2,s2,0x20
  34:   04056983    lwu     s3,64(a0)
  38:   1982        c.slli  s3,0x20
  3a:   0209d993    srli    s3,s3,0x20
  3e:   09056a03    lwu     s4,144(a0)
  42:   1a02        c.slli  s4,0x20
  44:   020a5a13    srli    s4,s4,0x20
  48:   4325        c.li    t1,9
  4a:   006a7363    bgeu    s4,t1,0x50
  4e:   4a01        c.li    s4,0
  50:   d914        c.sw    a3,48(a0)
  52:   d958        c.sw    a4,52(a0)
  54:   dd04        c.sw    s1,56(a0)
  56:   03252e23    sw      s2,60(a0)
  5a:   05352023    sw      s3,64(a0)
  5e:   4781        c.li    a5,0
  60:   7422        c.ldsp  s0,40(sp)
  62:   7482        c.ldsp  s1,32(sp)
  64:   6962        c.ldsp  s2,24(sp)
  66:   69c2        c.ldsp  s3,16(sp)
  68:   6a22        c.ldsp  s4,8(sp)
  6a:   6145        c.addi16sp  sp,48
  6c:   853e        c.mv    a0,a5
  6e:   8082        c.jr    ra

Signed-off-by: Luke Nelson <luke.r.nels@gmail.com>
---
 arch/riscv/net/bpf_jit_comp64.c | 275 +++++++++++++++++---------------
 1 file changed, 142 insertions(+), 133 deletions(-)

diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c
index 26feed92f1bc..823fe800f406 100644
--- a/arch/riscv/net/bpf_jit_comp64.c
+++ b/arch/riscv/net/bpf_jit_comp64.c
@@ -137,14 +137,14 @@ static void emit_imm(u8 rd, s64 val, struct rv_jit_context *ctx)
 
 	if (is_32b_int(val)) {
 		if (upper)
-			emit(rv_lui(rd, upper), ctx);
+			emit_lui(rd, upper, ctx);
 
 		if (!upper) {
-			emit(rv_addi(rd, RV_REG_ZERO, lower), ctx);
+			emit_li(rd, lower, ctx);
 			return;
 		}
 
-		emit(rv_addiw(rd, rd, lower), ctx);
+		emit_addiw(rd, rd, lower, ctx);
 		return;
 	}
 
@@ -154,9 +154,9 @@ static void emit_imm(u8 rd, s64 val, struct rv_jit_context *ctx)
 
 	emit_imm(rd, upper, ctx);
 
-	emit(rv_slli(rd, rd, shift), ctx);
+	emit_slli(rd, rd, shift, ctx);
 	if (lower)
-		emit(rv_addi(rd, rd, lower), ctx);
+		emit_addi(rd, rd, lower, ctx);
 }
 
 static void __build_epilogue(bool is_tail_call, struct rv_jit_context *ctx)
@@ -164,43 +164,43 @@ static void __build_epilogue(bool is_tail_call, struct rv_jit_context *ctx)
 	int stack_adjust = ctx->stack_size, store_offset = stack_adjust - 8;
 
 	if (seen_reg(RV_REG_RA, ctx)) {
-		emit(rv_ld(RV_REG_RA, store_offset, RV_REG_SP), ctx);
+		emit_ld(RV_REG_RA, store_offset, RV_REG_SP, ctx);
 		store_offset -= 8;
 	}
-	emit(rv_ld(RV_REG_FP, store_offset, RV_REG_SP), ctx);
+	emit_ld(RV_REG_FP, store_offset, RV_REG_SP, ctx);
 	store_offset -= 8;
 	if (seen_reg(RV_REG_S1, ctx)) {
-		emit(rv_ld(RV_REG_S1, store_offset, RV_REG_SP), ctx);
+		emit_ld(RV_REG_S1, store_offset, RV_REG_SP, ctx);
 		store_offset -= 8;
 	}
 	if (seen_reg(RV_REG_S2, ctx)) {
-		emit(rv_ld(RV_REG_S2, store_offset, RV_REG_SP), ctx);
+		emit_ld(RV_REG_S2, store_offset, RV_REG_SP, ctx);
 		store_offset -= 8;
 	}
 	if (seen_reg(RV_REG_S3, ctx)) {
-		emit(rv_ld(RV_REG_S3, store_offset, RV_REG_SP), ctx);
+		emit_ld(RV_REG_S3, store_offset, RV_REG_SP, ctx);
 		store_offset -= 8;
 	}
 	if (seen_reg(RV_REG_S4, ctx)) {
-		emit(rv_ld(RV_REG_S4, store_offset, RV_REG_SP), ctx);
+		emit_ld(RV_REG_S4, store_offset, RV_REG_SP, ctx);
 		store_offset -= 8;
 	}
 	if (seen_reg(RV_REG_S5, ctx)) {
-		emit(rv_ld(RV_REG_S5, store_offset, RV_REG_SP), ctx);
+		emit_ld(RV_REG_S5, store_offset, RV_REG_SP, ctx);
 		store_offset -= 8;
 	}
 	if (seen_reg(RV_REG_S6, ctx)) {
-		emit(rv_ld(RV_REG_S6, store_offset, RV_REG_SP), ctx);
+		emit_ld(RV_REG_S6, store_offset, RV_REG_SP, ctx);
 		store_offset -= 8;
 	}
 
-	emit(rv_addi(RV_REG_SP, RV_REG_SP, stack_adjust), ctx);
+	emit_addi(RV_REG_SP, RV_REG_SP, stack_adjust, ctx);
 	/* Set return value. */
 	if (!is_tail_call)
-		emit(rv_addi(RV_REG_A0, RV_REG_A5, 0), ctx);
-	emit(rv_jalr(RV_REG_ZERO, is_tail_call ? RV_REG_T3 : RV_REG_RA,
-		     is_tail_call ? 4 : 0), /* skip TCC init */
-	     ctx);
+		emit_mv(RV_REG_A0, RV_REG_A5, ctx);
+	emit_jalr(RV_REG_ZERO, is_tail_call ? RV_REG_T3 : RV_REG_RA,
+		  is_tail_call ? 4 : 0, /* skip TCC init */
+		  ctx);
 }
 
 static void emit_bcc(u8 cond, u8 rd, u8 rs, int rvoff,
@@ -280,8 +280,8 @@ static void emit_branch(u8 cond, u8 rd, u8 rs, int rvoff,
 
 static void emit_zext_32(u8 reg, struct rv_jit_context *ctx)
 {
-	emit(rv_slli(reg, reg, 32), ctx);
-	emit(rv_srli(reg, reg, 32), ctx);
+	emit_slli(reg, reg, 32, ctx);
+	emit_srli(reg, reg, 32, ctx);
 }
 
 static int emit_bpf_tail_call(int insn, struct rv_jit_context *ctx)
@@ -310,7 +310,7 @@ static int emit_bpf_tail_call(int insn, struct rv_jit_context *ctx)
 	/* if (TCC-- < 0)
 	 *     goto out;
 	 */
-	emit(rv_addi(RV_REG_T1, tcc, -1), ctx);
+	emit_addi(RV_REG_T1, tcc, -1, ctx);
 	off = RVOFF(tc_ninsn - (ctx->ninsns - start_insn));
 	emit_branch(BPF_JSLT, tcc, RV_REG_ZERO, off, ctx);
 
@@ -318,12 +318,12 @@ static int emit_bpf_tail_call(int insn, struct rv_jit_context *ctx)
 	 * if (!prog)
 	 *     goto out;
 	 */
-	emit(rv_slli(RV_REG_T2, RV_REG_A2, 3), ctx);
-	emit(rv_add(RV_REG_T2, RV_REG_T2, RV_REG_A1), ctx);
+	emit_slli(RV_REG_T2, RV_REG_A2, 3, ctx);
+	emit_add(RV_REG_T2, RV_REG_T2, RV_REG_A1, ctx);
 	off = offsetof(struct bpf_array, ptrs);
 	if (is_12b_check(off, insn))
 		return -1;
-	emit(rv_ld(RV_REG_T2, off, RV_REG_T2), ctx);
+	emit_ld(RV_REG_T2, off, RV_REG_T2, ctx);
 	off = RVOFF(tc_ninsn - (ctx->ninsns - start_insn));
 	emit_branch(BPF_JEQ, RV_REG_T2, RV_REG_ZERO, off, ctx);
 
@@ -331,8 +331,8 @@ static int emit_bpf_tail_call(int insn, struct rv_jit_context *ctx)
 	off = offsetof(struct bpf_prog, bpf_func);
 	if (is_12b_check(off, insn))
 		return -1;
-	emit(rv_ld(RV_REG_T3, off, RV_REG_T2), ctx);
-	emit(rv_addi(RV_REG_TCC, RV_REG_T1, 0), ctx);
+	emit_ld(RV_REG_T3, off, RV_REG_T2, ctx);
+	emit_mv(RV_REG_TCC, RV_REG_T1, ctx);
 	__build_epilogue(true, ctx);
 	return 0;
 }
@@ -360,9 +360,9 @@ static void init_regs(u8 *rd, u8 *rs, const struct bpf_insn *insn,
 
 static void emit_zext_32_rd_rs(u8 *rd, u8 *rs, struct rv_jit_context *ctx)
 {
-	emit(rv_addi(RV_REG_T2, *rd, 0), ctx);
+	emit_mv(RV_REG_T2, *rd, ctx);
 	emit_zext_32(RV_REG_T2, ctx);
-	emit(rv_addi(RV_REG_T1, *rs, 0), ctx);
+	emit_mv(RV_REG_T1, *rs, ctx);
 	emit_zext_32(RV_REG_T1, ctx);
 	*rd = RV_REG_T2;
 	*rs = RV_REG_T1;
@@ -370,15 +370,15 @@ static void emit_zext_32_rd_rs(u8 *rd, u8 *rs, struct rv_jit_context *ctx)
 
 static void emit_sext_32_rd_rs(u8 *rd, u8 *rs, struct rv_jit_context *ctx)
 {
-	emit(rv_addiw(RV_REG_T2, *rd, 0), ctx);
-	emit(rv_addiw(RV_REG_T1, *rs, 0), ctx);
+	emit_addiw(RV_REG_T2, *rd, 0, ctx);
+	emit_addiw(RV_REG_T1, *rs, 0, ctx);
 	*rd = RV_REG_T2;
 	*rs = RV_REG_T1;
 }
 
 static void emit_zext_32_rd_t1(u8 *rd, struct rv_jit_context *ctx)
 {
-	emit(rv_addi(RV_REG_T2, *rd, 0), ctx);
+	emit_mv(RV_REG_T2, *rd, ctx);
 	emit_zext_32(RV_REG_T2, ctx);
 	emit_zext_32(RV_REG_T1, ctx);
 	*rd = RV_REG_T2;
@@ -386,7 +386,7 @@ static void emit_zext_32_rd_t1(u8 *rd, struct rv_jit_context *ctx)
 
 static void emit_sext_32_rd(u8 *rd, struct rv_jit_context *ctx)
 {
-	emit(rv_addiw(RV_REG_T2, *rd, 0), ctx);
+	emit_addiw(RV_REG_T2, *rd, 0, ctx);
 	*rd = RV_REG_T2;
 }
 
@@ -432,7 +432,7 @@ static int emit_call(bool fixed, u64 addr, struct rv_jit_context *ctx)
 	if (ret)
 		return ret;
 	rd = bpf_to_rv_reg(BPF_REG_0, ctx);
-	emit(rv_addi(rd, RV_REG_A0, 0), ctx);
+	emit_mv(rd, RV_REG_A0, ctx);
 	return 0;
 }
 
@@ -458,7 +458,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 			emit_zext_32(rd, ctx);
 			break;
 		}
-		emit(is64 ? rv_addi(rd, rs, 0) : rv_addiw(rd, rs, 0), ctx);
+		emit_mv(rd, rs, ctx);
 		if (!is64 && !aux->verifier_zext)
 			emit_zext_32(rd, ctx);
 		break;
@@ -466,31 +466,35 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 	/* dst = dst OP src */
 	case BPF_ALU | BPF_ADD | BPF_X:
 	case BPF_ALU64 | BPF_ADD | BPF_X:
-		emit(is64 ? rv_add(rd, rd, rs) : rv_addw(rd, rd, rs), ctx);
+		emit_add(rd, rd, rs, ctx);
 		if (!is64 && !aux->verifier_zext)
 			emit_zext_32(rd, ctx);
 		break;
 	case BPF_ALU | BPF_SUB | BPF_X:
 	case BPF_ALU64 | BPF_SUB | BPF_X:
-		emit(is64 ? rv_sub(rd, rd, rs) : rv_subw(rd, rd, rs), ctx);
+		if (is64)
+			emit_sub(rd, rd, rs, ctx);
+		else
+			emit_subw(rd, rd, rs, ctx);
+
 		if (!is64 && !aux->verifier_zext)
 			emit_zext_32(rd, ctx);
 		break;
 	case BPF_ALU | BPF_AND | BPF_X:
 	case BPF_ALU64 | BPF_AND | BPF_X:
-		emit(rv_and(rd, rd, rs), ctx);
+		emit_and(rd, rd, rs, ctx);
 		if (!is64 && !aux->verifier_zext)
 			emit_zext_32(rd, ctx);
 		break;
 	case BPF_ALU | BPF_OR | BPF_X:
 	case BPF_ALU64 | BPF_OR | BPF_X:
-		emit(rv_or(rd, rd, rs), ctx);
+		emit_or(rd, rd, rs, ctx);
 		if (!is64 && !aux->verifier_zext)
 			emit_zext_32(rd, ctx);
 		break;
 	case BPF_ALU | BPF_XOR | BPF_X:
 	case BPF_ALU64 | BPF_XOR | BPF_X:
-		emit(rv_xor(rd, rd, rs), ctx);
+		emit_xor(rd, rd, rs, ctx);
 		if (!is64 && !aux->verifier_zext)
 			emit_zext_32(rd, ctx);
 		break;
@@ -534,8 +538,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 	/* dst = -dst */
 	case BPF_ALU | BPF_NEG:
 	case BPF_ALU64 | BPF_NEG:
-		emit(is64 ? rv_sub(rd, RV_REG_ZERO, rd) :
-		     rv_subw(rd, RV_REG_ZERO, rd), ctx);
+		emit_sub(rd, RV_REG_ZERO, rd, ctx);
 		if (!is64 && !aux->verifier_zext)
 			emit_zext_32(rd, ctx);
 		break;
@@ -544,8 +547,8 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 	case BPF_ALU | BPF_END | BPF_FROM_LE:
 		switch (imm) {
 		case 16:
-			emit(rv_slli(rd, rd, 48), ctx);
-			emit(rv_srli(rd, rd, 48), ctx);
+			emit_slli(rd, rd, 48, ctx);
+			emit_srli(rd, rd, 48, ctx);
 			break;
 		case 32:
 			if (!aux->verifier_zext)
@@ -558,51 +561,51 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 		break;
 
 	case BPF_ALU | BPF_END | BPF_FROM_BE:
-		emit(rv_addi(RV_REG_T2, RV_REG_ZERO, 0), ctx);
+		emit_li(RV_REG_T2, 0, ctx);
 
-		emit(rv_andi(RV_REG_T1, rd, 0xff), ctx);
-		emit(rv_add(RV_REG_T2, RV_REG_T2, RV_REG_T1), ctx);
-		emit(rv_slli(RV_REG_T2, RV_REG_T2, 8), ctx);
-		emit(rv_srli(rd, rd, 8), ctx);
+		emit_andi(RV_REG_T1, rd, 0xff, ctx);
+		emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
+		emit_slli(RV_REG_T2, RV_REG_T2, 8, ctx);
+		emit_srli(rd, rd, 8, ctx);
 		if (imm == 16)
 			goto out_be;
 
-		emit(rv_andi(RV_REG_T1, rd, 0xff), ctx);
-		emit(rv_add(RV_REG_T2, RV_REG_T2, RV_REG_T1), ctx);
-		emit(rv_slli(RV_REG_T2, RV_REG_T2, 8), ctx);
-		emit(rv_srli(rd, rd, 8), ctx);
+		emit_andi(RV_REG_T1, rd, 0xff, ctx);
+		emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
+		emit_slli(RV_REG_T2, RV_REG_T2, 8, ctx);
+		emit_srli(rd, rd, 8, ctx);
 
-		emit(rv_andi(RV_REG_T1, rd, 0xff), ctx);
-		emit(rv_add(RV_REG_T2, RV_REG_T2, RV_REG_T1), ctx);
-		emit(rv_slli(RV_REG_T2, RV_REG_T2, 8), ctx);
-		emit(rv_srli(rd, rd, 8), ctx);
+		emit_andi(RV_REG_T1, rd, 0xff, ctx);
+		emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
+		emit_slli(RV_REG_T2, RV_REG_T2, 8, ctx);
+		emit_srli(rd, rd, 8, ctx);
 		if (imm == 32)
 			goto out_be;
 
-		emit(rv_andi(RV_REG_T1, rd, 0xff), ctx);
-		emit(rv_add(RV_REG_T2, RV_REG_T2, RV_REG_T1), ctx);
-		emit(rv_slli(RV_REG_T2, RV_REG_T2, 8), ctx);
-		emit(rv_srli(rd, rd, 8), ctx);
-
-		emit(rv_andi(RV_REG_T1, rd, 0xff), ctx);
-		emit(rv_add(RV_REG_T2, RV_REG_T2, RV_REG_T1), ctx);
-		emit(rv_slli(RV_REG_T2, RV_REG_T2, 8), ctx);
-		emit(rv_srli(rd, rd, 8), ctx);
-
-		emit(rv_andi(RV_REG_T1, rd, 0xff), ctx);
-		emit(rv_add(RV_REG_T2, RV_REG_T2, RV_REG_T1), ctx);
-		emit(rv_slli(RV_REG_T2, RV_REG_T2, 8), ctx);
-		emit(rv_srli(rd, rd, 8), ctx);
-
-		emit(rv_andi(RV_REG_T1, rd, 0xff), ctx);
-		emit(rv_add(RV_REG_T2, RV_REG_T2, RV_REG_T1), ctx);
-		emit(rv_slli(RV_REG_T2, RV_REG_T2, 8), ctx);
-		emit(rv_srli(rd, rd, 8), ctx);
+		emit_andi(RV_REG_T1, rd, 0xff, ctx);
+		emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
+		emit_slli(RV_REG_T2, RV_REG_T2, 8, ctx);
+		emit_srli(rd, rd, 8, ctx);
+
+		emit_andi(RV_REG_T1, rd, 0xff, ctx);
+		emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
+		emit_slli(RV_REG_T2, RV_REG_T2, 8, ctx);
+		emit_srli(rd, rd, 8, ctx);
+
+		emit_andi(RV_REG_T1, rd, 0xff, ctx);
+		emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
+		emit_slli(RV_REG_T2, RV_REG_T2, 8, ctx);
+		emit_srli(rd, rd, 8, ctx);
+
+		emit_andi(RV_REG_T1, rd, 0xff, ctx);
+		emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
+		emit_slli(RV_REG_T2, RV_REG_T2, 8, ctx);
+		emit_srli(rd, rd, 8, ctx);
 out_be:
-		emit(rv_andi(RV_REG_T1, rd, 0xff), ctx);
-		emit(rv_add(RV_REG_T2, RV_REG_T2, RV_REG_T1), ctx);
+		emit_andi(RV_REG_T1, rd, 0xff, ctx);
+		emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
 
-		emit(rv_addi(rd, RV_REG_T2, 0), ctx);
+		emit_mv(rd, RV_REG_T2, ctx);
 		break;
 
 	/* dst = imm */
@@ -617,12 +620,10 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 	case BPF_ALU | BPF_ADD | BPF_K:
 	case BPF_ALU64 | BPF_ADD | BPF_K:
 		if (is_12b_int(imm)) {
-			emit(is64 ? rv_addi(rd, rd, imm) :
-			     rv_addiw(rd, rd, imm), ctx);
+			emit_addi(rd, rd, imm, ctx);
 		} else {
 			emit_imm(RV_REG_T1, imm, ctx);
-			emit(is64 ? rv_add(rd, rd, RV_REG_T1) :
-			     rv_addw(rd, rd, RV_REG_T1), ctx);
+			emit_add(rd, rd, RV_REG_T1, ctx);
 		}
 		if (!is64 && !aux->verifier_zext)
 			emit_zext_32(rd, ctx);
@@ -630,12 +631,10 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 	case BPF_ALU | BPF_SUB | BPF_K:
 	case BPF_ALU64 | BPF_SUB | BPF_K:
 		if (is_12b_int(-imm)) {
-			emit(is64 ? rv_addi(rd, rd, -imm) :
-			     rv_addiw(rd, rd, -imm), ctx);
+			emit_addi(rd, rd, -imm, ctx);
 		} else {
 			emit_imm(RV_REG_T1, imm, ctx);
-			emit(is64 ? rv_sub(rd, rd, RV_REG_T1) :
-			     rv_subw(rd, rd, RV_REG_T1), ctx);
+			emit_sub(rd, rd, RV_REG_T1, ctx);
 		}
 		if (!is64 && !aux->verifier_zext)
 			emit_zext_32(rd, ctx);
@@ -643,10 +642,10 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 	case BPF_ALU | BPF_AND | BPF_K:
 	case BPF_ALU64 | BPF_AND | BPF_K:
 		if (is_12b_int(imm)) {
-			emit(rv_andi(rd, rd, imm), ctx);
+			emit_andi(rd, rd, imm, ctx);
 		} else {
 			emit_imm(RV_REG_T1, imm, ctx);
-			emit(rv_and(rd, rd, RV_REG_T1), ctx);
+			emit_and(rd, rd, RV_REG_T1, ctx);
 		}
 		if (!is64 && !aux->verifier_zext)
 			emit_zext_32(rd, ctx);
@@ -657,7 +656,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 			emit(rv_ori(rd, rd, imm), ctx);
 		} else {
 			emit_imm(RV_REG_T1, imm, ctx);
-			emit(rv_or(rd, rd, RV_REG_T1), ctx);
+			emit_or(rd, rd, RV_REG_T1, ctx);
 		}
 		if (!is64 && !aux->verifier_zext)
 			emit_zext_32(rd, ctx);
@@ -668,7 +667,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 			emit(rv_xori(rd, rd, imm), ctx);
 		} else {
 			emit_imm(RV_REG_T1, imm, ctx);
-			emit(rv_xor(rd, rd, RV_REG_T1), ctx);
+			emit_xor(rd, rd, RV_REG_T1, ctx);
 		}
 		if (!is64 && !aux->verifier_zext)
 			emit_zext_32(rd, ctx);
@@ -699,19 +698,28 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 		break;
 	case BPF_ALU | BPF_LSH | BPF_K:
 	case BPF_ALU64 | BPF_LSH | BPF_K:
-		emit(is64 ? rv_slli(rd, rd, imm) : rv_slliw(rd, rd, imm), ctx);
+		emit_slli(rd, rd, imm, ctx);
+
 		if (!is64 && !aux->verifier_zext)
 			emit_zext_32(rd, ctx);
 		break;
 	case BPF_ALU | BPF_RSH | BPF_K:
 	case BPF_ALU64 | BPF_RSH | BPF_K:
-		emit(is64 ? rv_srli(rd, rd, imm) : rv_srliw(rd, rd, imm), ctx);
+		if (is64)
+			emit_srli(rd, rd, imm, ctx);
+		else
+			emit(rv_srliw(rd, rd, imm), ctx);
+
 		if (!is64 && !aux->verifier_zext)
 			emit_zext_32(rd, ctx);
 		break;
 	case BPF_ALU | BPF_ARSH | BPF_K:
 	case BPF_ALU64 | BPF_ARSH | BPF_K:
-		emit(is64 ? rv_srai(rd, rd, imm) : rv_sraiw(rd, rd, imm), ctx);
+		if (is64)
+			emit_srai(rd, rd, imm, ctx);
+		else
+			emit(rv_sraiw(rd, rd, imm), ctx);
+
 		if (!is64 && !aux->verifier_zext)
 			emit_zext_32(rd, ctx);
 		break;
@@ -763,7 +771,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 		if (BPF_OP(code) == BPF_JSET) {
 			/* Adjust for and */
 			rvoff -= 4;
-			emit(rv_and(RV_REG_T1, rd, rs), ctx);
+			emit_and(RV_REG_T1, rd, rs, ctx);
 			emit_branch(BPF_JNE, RV_REG_T1, RV_REG_ZERO, rvoff,
 				    ctx);
 		} else {
@@ -819,17 +827,17 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 		rvoff = rv_offset(i, off, ctx);
 		s = ctx->ninsns;
 		if (is_12b_int(imm)) {
-			emit(rv_andi(RV_REG_T1, rd, imm), ctx);
+			emit_andi(RV_REG_T1, rd, imm, ctx);
 		} else {
 			emit_imm(RV_REG_T1, imm, ctx);
-			emit(rv_and(RV_REG_T1, rd, RV_REG_T1), ctx);
+			emit_and(RV_REG_T1, rd, RV_REG_T1, ctx);
 		}
 		/* For jset32, we should clear the upper 32 bits of t1, but
 		 * sign-extension is sufficient here and saves one instruction,
 		 * as t1 is used only in comparison against zero.
 		 */
 		if (!is64 && imm < 0)
-			emit(rv_addiw(RV_REG_T1, RV_REG_T1, 0), ctx);
+			emit_addiw(RV_REG_T1, RV_REG_T1, 0, ctx);
 		e = ctx->ninsns;
 		rvoff -= RVOFF(e - s);
 		emit_branch(BPF_JNE, RV_REG_T1, RV_REG_ZERO, rvoff, ctx);
@@ -887,7 +895,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 		}
 
 		emit_imm(RV_REG_T1, off, ctx);
-		emit(rv_add(RV_REG_T1, RV_REG_T1, rs), ctx);
+		emit_add(RV_REG_T1, RV_REG_T1, rs, ctx);
 		emit(rv_lbu(rd, 0, RV_REG_T1), ctx);
 		if (insn_is_zext(&insn[1]))
 			return 1;
@@ -899,7 +907,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 		}
 
 		emit_imm(RV_REG_T1, off, ctx);
-		emit(rv_add(RV_REG_T1, RV_REG_T1, rs), ctx);
+		emit_add(RV_REG_T1, RV_REG_T1, rs, ctx);
 		emit(rv_lhu(rd, 0, RV_REG_T1), ctx);
 		if (insn_is_zext(&insn[1]))
 			return 1;
@@ -911,20 +919,20 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 		}
 
 		emit_imm(RV_REG_T1, off, ctx);
-		emit(rv_add(RV_REG_T1, RV_REG_T1, rs), ctx);
+		emit_add(RV_REG_T1, RV_REG_T1, rs, ctx);
 		emit(rv_lwu(rd, 0, RV_REG_T1), ctx);
 		if (insn_is_zext(&insn[1]))
 			return 1;
 		break;
 	case BPF_LDX | BPF_MEM | BPF_DW:
 		if (is_12b_int(off)) {
-			emit(rv_ld(rd, off, rs), ctx);
+			emit_ld(rd, off, rs, ctx);
 			break;
 		}
 
 		emit_imm(RV_REG_T1, off, ctx);
-		emit(rv_add(RV_REG_T1, RV_REG_T1, rs), ctx);
-		emit(rv_ld(rd, 0, RV_REG_T1), ctx);
+		emit_add(RV_REG_T1, RV_REG_T1, rs, ctx);
+		emit_ld(rd, 0, RV_REG_T1, ctx);
 		break;
 
 	/* ST: *(size *)(dst + off) = imm */
@@ -936,7 +944,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 		}
 
 		emit_imm(RV_REG_T2, off, ctx);
-		emit(rv_add(RV_REG_T2, RV_REG_T2, rd), ctx);
+		emit_add(RV_REG_T2, RV_REG_T2, rd, ctx);
 		emit(rv_sb(RV_REG_T2, 0, RV_REG_T1), ctx);
 		break;
 
@@ -948,30 +956,30 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 		}
 
 		emit_imm(RV_REG_T2, off, ctx);
-		emit(rv_add(RV_REG_T2, RV_REG_T2, rd), ctx);
+		emit_add(RV_REG_T2, RV_REG_T2, rd, ctx);
 		emit(rv_sh(RV_REG_T2, 0, RV_REG_T1), ctx);
 		break;
 	case BPF_ST | BPF_MEM | BPF_W:
 		emit_imm(RV_REG_T1, imm, ctx);
 		if (is_12b_int(off)) {
-			emit(rv_sw(rd, off, RV_REG_T1), ctx);
+			emit_sw(rd, off, RV_REG_T1, ctx);
 			break;
 		}
 
 		emit_imm(RV_REG_T2, off, ctx);
-		emit(rv_add(RV_REG_T2, RV_REG_T2, rd), ctx);
-		emit(rv_sw(RV_REG_T2, 0, RV_REG_T1), ctx);
+		emit_add(RV_REG_T2, RV_REG_T2, rd, ctx);
+		emit_sw(RV_REG_T2, 0, RV_REG_T1, ctx);
 		break;
 	case BPF_ST | BPF_MEM | BPF_DW:
 		emit_imm(RV_REG_T1, imm, ctx);
 		if (is_12b_int(off)) {
-			emit(rv_sd(rd, off, RV_REG_T1), ctx);
+			emit_sd(rd, off, RV_REG_T1, ctx);
 			break;
 		}
 
 		emit_imm(RV_REG_T2, off, ctx);
-		emit(rv_add(RV_REG_T2, RV_REG_T2, rd), ctx);
-		emit(rv_sd(RV_REG_T2, 0, RV_REG_T1), ctx);
+		emit_add(RV_REG_T2, RV_REG_T2, rd, ctx);
+		emit_sd(RV_REG_T2, 0, RV_REG_T1, ctx);
 		break;
 
 	/* STX: *(size *)(dst + off) = src */
@@ -982,7 +990,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 		}
 
 		emit_imm(RV_REG_T1, off, ctx);
-		emit(rv_add(RV_REG_T1, RV_REG_T1, rd), ctx);
+		emit_add(RV_REG_T1, RV_REG_T1, rd, ctx);
 		emit(rv_sb(RV_REG_T1, 0, rs), ctx);
 		break;
 	case BPF_STX | BPF_MEM | BPF_H:
@@ -992,28 +1000,28 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 		}
 
 		emit_imm(RV_REG_T1, off, ctx);
-		emit(rv_add(RV_REG_T1, RV_REG_T1, rd), ctx);
+		emit_add(RV_REG_T1, RV_REG_T1, rd, ctx);
 		emit(rv_sh(RV_REG_T1, 0, rs), ctx);
 		break;
 	case BPF_STX | BPF_MEM | BPF_W:
 		if (is_12b_int(off)) {
-			emit(rv_sw(rd, off, rs), ctx);
+			emit_sw(rd, off, rs, ctx);
 			break;
 		}
 
 		emit_imm(RV_REG_T1, off, ctx);
-		emit(rv_add(RV_REG_T1, RV_REG_T1, rd), ctx);
-		emit(rv_sw(RV_REG_T1, 0, rs), ctx);
+		emit_add(RV_REG_T1, RV_REG_T1, rd, ctx);
+		emit_sw(RV_REG_T1, 0, rs, ctx);
 		break;
 	case BPF_STX | BPF_MEM | BPF_DW:
 		if (is_12b_int(off)) {
-			emit(rv_sd(rd, off, rs), ctx);
+			emit_sd(rd, off, rs, ctx);
 			break;
 		}
 
 		emit_imm(RV_REG_T1, off, ctx);
-		emit(rv_add(RV_REG_T1, RV_REG_T1, rd), ctx);
-		emit(rv_sd(RV_REG_T1, 0, rs), ctx);
+		emit_add(RV_REG_T1, RV_REG_T1, rd, ctx);
+		emit_sd(RV_REG_T1, 0, rs, ctx);
 		break;
 	/* STX XADD: lock *(u32 *)(dst + off) += src */
 	case BPF_STX | BPF_XADD | BPF_W:
@@ -1021,10 +1029,10 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 	case BPF_STX | BPF_XADD | BPF_DW:
 		if (off) {
 			if (is_12b_int(off)) {
-				emit(rv_addi(RV_REG_T1, rd, off), ctx);
+				emit_addi(RV_REG_T1, rd, off, ctx);
 			} else {
 				emit_imm(RV_REG_T1, off, ctx);
-				emit(rv_add(RV_REG_T1, RV_REG_T1, rd), ctx);
+				emit_add(RV_REG_T1, RV_REG_T1, rd, ctx);
 			}
 
 			rd = RV_REG_T1;
@@ -1073,52 +1081,53 @@ void bpf_jit_build_prologue(struct rv_jit_context *ctx)
 
 	/* First instruction is always setting the tail-call-counter
 	 * (TCC) register. This instruction is skipped for tail calls.
+	 * Force using a 4-byte (non-compressed) instruction.
 	 */
 	emit(rv_addi(RV_REG_TCC, RV_REG_ZERO, MAX_TAIL_CALL_CNT), ctx);
 
-	emit(rv_addi(RV_REG_SP, RV_REG_SP, -stack_adjust), ctx);
+	emit_addi(RV_REG_SP, RV_REG_SP, -stack_adjust, ctx);
 
 	if (seen_reg(RV_REG_RA, ctx)) {
-		emit(rv_sd(RV_REG_SP, store_offset, RV_REG_RA), ctx);
+		emit_sd(RV_REG_SP, store_offset, RV_REG_RA, ctx);
 		store_offset -= 8;
 	}
-	emit(rv_sd(RV_REG_SP, store_offset, RV_REG_FP), ctx);
+	emit_sd(RV_REG_SP, store_offset, RV_REG_FP, ctx);
 	store_offset -= 8;
 	if (seen_reg(RV_REG_S1, ctx)) {
-		emit(rv_sd(RV_REG_SP, store_offset, RV_REG_S1), ctx);
+		emit_sd(RV_REG_SP, store_offset, RV_REG_S1, ctx);
 		store_offset -= 8;
 	}
 	if (seen_reg(RV_REG_S2, ctx)) {
-		emit(rv_sd(RV_REG_SP, store_offset, RV_REG_S2), ctx);
+		emit_sd(RV_REG_SP, store_offset, RV_REG_S2, ctx);
 		store_offset -= 8;
 	}
 	if (seen_reg(RV_REG_S3, ctx)) {
-		emit(rv_sd(RV_REG_SP, store_offset, RV_REG_S3), ctx);
+		emit_sd(RV_REG_SP, store_offset, RV_REG_S3, ctx);
 		store_offset -= 8;
 	}
 	if (seen_reg(RV_REG_S4, ctx)) {
-		emit(rv_sd(RV_REG_SP, store_offset, RV_REG_S4), ctx);
+		emit_sd(RV_REG_SP, store_offset, RV_REG_S4, ctx);
 		store_offset -= 8;
 	}
 	if (seen_reg(RV_REG_S5, ctx)) {
-		emit(rv_sd(RV_REG_SP, store_offset, RV_REG_S5), ctx);
+		emit_sd(RV_REG_SP, store_offset, RV_REG_S5, ctx);
 		store_offset -= 8;
 	}
 	if (seen_reg(RV_REG_S6, ctx)) {
-		emit(rv_sd(RV_REG_SP, store_offset, RV_REG_S6), ctx);
+		emit_sd(RV_REG_SP, store_offset, RV_REG_S6, ctx);
 		store_offset -= 8;
 	}
 
-	emit(rv_addi(RV_REG_FP, RV_REG_SP, stack_adjust), ctx);
+	emit_addi(RV_REG_FP, RV_REG_SP, stack_adjust, ctx);
 
 	if (bpf_stack_adjust)
-		emit(rv_addi(RV_REG_S5, RV_REG_SP, bpf_stack_adjust), ctx);
+		emit_addi(RV_REG_S5, RV_REG_SP, bpf_stack_adjust, ctx);
 
	/* Program contains calls and tail calls, so RV_REG_TCC needs
	 * to be saved across calls.
 	 */
 	if (seen_tail_call(ctx) && seen_call(ctx))
-		emit(rv_addi(RV_REG_TCC_SAVED, RV_REG_TCC, 0), ctx);
+		emit_mv(RV_REG_TCC_SAVED, RV_REG_TCC, ctx);
 
 	ctx->stack_size = stack_adjust;
 }
-- 
2.25.1



* Re: [RFC PATCH bpf-next 0/3] bpf, riscv: Add compressed instructions to rv64 JIT
  2020-07-13 18:37 [RFC PATCH bpf-next 0/3] bpf, riscv: Add compressed instructions to rv64 JIT Luke Nelson
                   ` (2 preceding siblings ...)
  2020-07-13 18:37 ` [RFC PATCH bpf-next 3/3] bpf, riscv: Use compressed instructions in the rv64 JIT Luke Nelson
@ 2020-07-15 17:56 ` Björn Töpel
  2020-07-15 20:08   ` Luke Nelson
  3 siblings, 1 reply; 9+ messages in thread
From: Björn Töpel @ 2020-07-15 17:56 UTC (permalink / raw)
  To: Luke Nelson
  Cc: Song Liu, Albert Ou, Daniel Borkmann, Luke Nelson, Netdev,
	John Fastabend, Alexei Starovoitov, linux-riscv, LKML,
	Palmer Dabbelt, Paul Walmsley, KP Singh, Yonghong Song, bpf,
	Andrii Nakryiko, Martin KaFai Lau, Xi Wang

On Mon, 13 Jul 2020 at 20:37, Luke Nelson <lukenels@cs.washington.edu> wrote:
>
> This patch series enables using compressed riscv (RVC) instructions
> in the rv64 BPF JIT.
>

First of all: really nice work! I like this, and it makes the code
easier to read as well (e.g., emit_mv). I'm a bit curious why you only
did it for RV64 and not RV32? I have some minor comments on the
patches. I strongly encourage you to submit this as a proper (non-RFC)
set for bpf-next.


Cheers,
Björn

> [...]


* Re: [RFC PATCH bpf-next 1/3] bpf, riscv: Modify JIT ctx to support compressed instructions
  2020-07-13 18:37 ` [RFC PATCH bpf-next 1/3] bpf, riscv: Modify JIT ctx to support compressed instructions Luke Nelson
@ 2020-07-15 18:02   ` Björn Töpel
  0 siblings, 0 replies; 9+ messages in thread
From: Björn Töpel @ 2020-07-15 18:02 UTC (permalink / raw)
  To: Luke Nelson
  Cc: Song Liu, Albert Ou, Daniel Borkmann, Luke Nelson, Netdev,
	John Fastabend, Alexei Starovoitov, linux-riscv, LKML,
	Palmer Dabbelt, Paul Walmsley, KP Singh, Yonghong Song, bpf,
	Andrii Nakryiko, Martin KaFai Lau, Xi Wang

On Mon, 13 Jul 2020 at 20:37, Luke Nelson <lukenels@cs.washington.edu> wrote:
>
> This patch makes the necessary changes to struct rv_jit_context and to
> bpf_int_jit_compile to support compressed riscv (RVC) instructions in
> the BPF JIT.
>
> It changes the JIT image to be u16 instead of u32, since RVC instructions
> are 2 bytes as opposed to 4.
>
> It also changes ctx->offset and ctx->ninsns to refer to 2-byte
> instructions rather than 4-byte ones. The riscv PC is always 16-bit
> aligned, so this is sufficient to refer to any valid riscv offset.
>
> The code for computing jump offsets in bytes is updated accordingly,
> and factored into the RVOFF macro to simplify the code.
>
> Signed-off-by: Luke Nelson <luke.r.nels@gmail.com>
> ---
>  arch/riscv/net/bpf_jit.h        | 23 ++++++++++++++++++++---
>  arch/riscv/net/bpf_jit_comp32.c | 14 +++++++-------
>  arch/riscv/net/bpf_jit_comp64.c | 12 ++++++------
>  arch/riscv/net/bpf_jit_core.c   |  6 +++---
>  4 files changed, 36 insertions(+), 19 deletions(-)
>
> diff --git a/arch/riscv/net/bpf_jit.h b/arch/riscv/net/bpf_jit.h
> index 20e235d06f66..5c89ea904c1a 100644
> --- a/arch/riscv/net/bpf_jit.h
> +++ b/arch/riscv/net/bpf_jit.h
> @@ -50,7 +50,7 @@ enum {
>
>  struct rv_jit_context {
>         struct bpf_prog *prog;
> -       u32 *insns;             /* RV insns */
> +       u16 *insns;             /* RV insns */
>         int ninsns;
>         int epilogue_offset;
>         int *offset;            /* BPF to RV */
> @@ -58,6 +58,9 @@ struct rv_jit_context {
>         int stack_size;
>  };
>
> +/* Convert from ninsns to bytes. */
> +#define RVOFF(ninsns)  ((ninsns) << 1)
> +

I guess it's a matter of taste, but I'd prefer a simple static inline
function instead of the macro.
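
Something along these lines, perhaps (untested, and the name is just
an example):

static inline int rv_ninsns_to_bytes(int ninsns)
{
	/* ninsns counts 2-byte units. */
	return ninsns << 1;
}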

>  struct rv_jit_data {
>         struct bpf_binary_header *header;
>         u8 *image;
> @@ -74,8 +77,22 @@ static inline void bpf_flush_icache(void *start, void *end)
>         flush_icache_range((unsigned long)start, (unsigned long)end);
>  }
>
> +/* Emit a 4-byte riscv instruction. */
>  static inline void emit(const u32 insn, struct rv_jit_context *ctx)
>  {
> +       if (ctx->insns) {
> +               ctx->insns[ctx->ninsns] = insn;
> +               ctx->insns[ctx->ninsns + 1] = (insn >> 16);
> +       }
> +
> +       ctx->ninsns += 2;
> +}
> +
> +/* Emit a 2-byte riscv compressed instruction. */
> +static inline void emitc(const u16 insn, struct rv_jit_context *ctx)
> +{
> +       BUILD_BUG_ON(!rvc_enabled());
> +
>         if (ctx->insns)
>                 ctx->insns[ctx->ninsns] = insn;
>
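If I read the diff right, emit() now bumps ctx->ninsns by 2 and
emitc() by 1, so ninsns counts 2-byte units and RVOFF() recovers the
byte offset for both kinds of instructions. Might be worth a short
comment, since it took me a second to see.
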
> @@ -86,7 +103,7 @@ static inline int epilogue_offset(struct rv_jit_context *ctx)
>  {
>         int to = ctx->epilogue_offset, from = ctx->ninsns;
>
> -       return (to - from) << 2;
> +       return RVOFF(to - from);
>  }
>
>  /* Return -1 or inverted cond. */
> @@ -149,7 +166,7 @@ static inline int rv_offset(int insn, int off, struct rv_jit_context *ctx)
>         off++; /* BPF branch is from PC+1, RV is from PC */
>         from = (insn > 0) ? ctx->offset[insn - 1] : 0;
>         to = (insn + off > 0) ? ctx->offset[insn + off - 1] : 0;
> -       return (to - from) << 2;
> +       return RVOFF(to - from);
>  }
>
>  /* Instruction formats. */
> diff --git a/arch/riscv/net/bpf_jit_comp32.c b/arch/riscv/net/bpf_jit_comp32.c
> index b198eaa74456..d22001aa0057 100644
> --- a/arch/riscv/net/bpf_jit_comp32.c
> +++ b/arch/riscv/net/bpf_jit_comp32.c
> @@ -644,7 +644,7 @@ static int emit_branch_r64(const s8 *src1, const s8 *src2, s32 rvoff,
>
>         e = ctx->ninsns;
>         /* Adjust for extra insns. */
> -       rvoff -= (e - s) << 2;
> +       rvoff -= RVOFF(e - s);
>         emit_jump_and_link(RV_REG_ZERO, rvoff, true, ctx);
>         return 0;
>  }
> @@ -713,7 +713,7 @@ static int emit_bcc(u8 op, u8 rd, u8 rs, int rvoff, struct rv_jit_context *ctx)
>         if (far) {
>                 e = ctx->ninsns;
>                 /* Adjust for extra insns. */
> -               rvoff -= (e - s) << 2;
> +               rvoff -= RVOFF(e - s);
>                 emit_jump_and_link(RV_REG_ZERO, rvoff, true, ctx);
>         }
>         return 0;
> @@ -731,7 +731,7 @@ static int emit_branch_r32(const s8 *src1, const s8 *src2, s32 rvoff,
>
>         e = ctx->ninsns;
>         /* Adjust for extra insns. */
> -       rvoff -= (e - s) << 2;
> +       rvoff -= RVOFF(e - s);
>
>         if (emit_bcc(op, lo(rs1), lo(rs2), rvoff, ctx))
>                 return -1;
> @@ -795,7 +795,7 @@ static int emit_bpf_tail_call(int insn, struct rv_jit_context *ctx)
>          * if (index >= max_entries)
>          *   goto out;
>          */
> -       off = (tc_ninsn - (ctx->ninsns - start_insn)) << 2;
> +       off = RVOFF(tc_ninsn - (ctx->ninsns - start_insn));
>         emit_bcc(BPF_JGE, lo(idx_reg), RV_REG_T1, off, ctx);
>
>         /*
> @@ -804,7 +804,7 @@ static int emit_bpf_tail_call(int insn, struct rv_jit_context *ctx)
>          *   goto out;
>          */
>         emit(rv_addi(RV_REG_T1, RV_REG_TCC, -1), ctx);
> -       off = (tc_ninsn - (ctx->ninsns - start_insn)) << 2;
> +       off = RVOFF(tc_ninsn - (ctx->ninsns - start_insn));
>         emit_bcc(BPF_JSLT, RV_REG_TCC, RV_REG_ZERO, off, ctx);
>
>         /*
> @@ -818,7 +818,7 @@ static int emit_bpf_tail_call(int insn, struct rv_jit_context *ctx)
>         if (is_12b_check(off, insn))
>                 return -1;
>         emit(rv_lw(RV_REG_T0, off, RV_REG_T0), ctx);
> -       off = (tc_ninsn - (ctx->ninsns - start_insn)) << 2;
> +       off = RVOFF(tc_ninsn - (ctx->ninsns - start_insn));
>         emit_bcc(BPF_JEQ, RV_REG_T0, RV_REG_ZERO, off, ctx);
>
>         /*
> @@ -1214,7 +1214,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
>                         emit_imm32(tmp2, imm, ctx);
>                         src = tmp2;
>                         e = ctx->ninsns;
> -                       rvoff -= (e - s) << 2;
> +                       rvoff -= RVOFF(e - s);
>                 }
>
>                 if (is64)
> diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c
> index 6cfd164cbe88..26feed92f1bc 100644
> --- a/arch/riscv/net/bpf_jit_comp64.c
> +++ b/arch/riscv/net/bpf_jit_comp64.c
> @@ -304,14 +304,14 @@ static int emit_bpf_tail_call(int insn, struct rv_jit_context *ctx)
>         if (is_12b_check(off, insn))
>                 return -1;
>         emit(rv_lwu(RV_REG_T1, off, RV_REG_A1), ctx);
> -       off = (tc_ninsn - (ctx->ninsns - start_insn)) << 2;
> +       off = RVOFF(tc_ninsn - (ctx->ninsns - start_insn));
>         emit_branch(BPF_JGE, RV_REG_A2, RV_REG_T1, off, ctx);
>
>         /* if (TCC-- < 0)
>          *     goto out;
>          */
>         emit(rv_addi(RV_REG_T1, tcc, -1), ctx);
> -       off = (tc_ninsn - (ctx->ninsns - start_insn)) << 2;
> +       off = RVOFF(tc_ninsn - (ctx->ninsns - start_insn));
>         emit_branch(BPF_JSLT, tcc, RV_REG_ZERO, off, ctx);
>
>         /* prog = array->ptrs[index];
> @@ -324,7 +324,7 @@ static int emit_bpf_tail_call(int insn, struct rv_jit_context *ctx)
>         if (is_12b_check(off, insn))
>                 return -1;
>         emit(rv_ld(RV_REG_T2, off, RV_REG_T2), ctx);
> -       off = (tc_ninsn - (ctx->ninsns - start_insn)) << 2;
> +       off = RVOFF(tc_ninsn - (ctx->ninsns - start_insn));
>         emit_branch(BPF_JEQ, RV_REG_T2, RV_REG_ZERO, off, ctx);
>
>         /* goto *(prog->bpf_func + 4); */
> @@ -757,7 +757,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
>                         e = ctx->ninsns;
>
>                         /* Adjust for extra insns */
> -                       rvoff -= (e - s) << 2;
> +                       rvoff -= RVOFF(e - s);
>                 }
>
>                 if (BPF_OP(code) == BPF_JSET) {
> @@ -810,7 +810,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
>                 e = ctx->ninsns;
>
>                 /* Adjust for extra insns */
> -               rvoff -= (e - s) << 2;
> +               rvoff -= RVOFF(e - s);
>                 emit_branch(BPF_OP(code), rd, rs, rvoff, ctx);
>                 break;
>
> @@ -831,7 +831,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
>                 if (!is64 && imm < 0)
>                         emit(rv_addiw(RV_REG_T1, RV_REG_T1, 0), ctx);
>                 e = ctx->ninsns;
> -               rvoff -= (e - s) << 2;
> +               rvoff -= RVOFF(e - s);
>                 emit_branch(BPF_JNE, RV_REG_T1, RV_REG_ZERO, rvoff, ctx);
>                 break;
>
> diff --git a/arch/riscv/net/bpf_jit_core.c b/arch/riscv/net/bpf_jit_core.c
> index 709b94ece3ed..cd156efe4944 100644
> --- a/arch/riscv/net/bpf_jit_core.c
> +++ b/arch/riscv/net/bpf_jit_core.c
> @@ -73,7 +73,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
>
>         if (ctx->offset) {
>                 extra_pass = true;
> -               image_size = sizeof(u32) * ctx->ninsns;
> +               image_size = sizeof(u16) * ctx->ninsns;

Maybe sizeof(*ctx->insns)?
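
i.e.:

		image_size = sizeof(*ctx->insns) * ctx->ninsns;

That way it also stays correct if the type of insns changes again.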

>                 goto skip_init_ctx;
>         }
>
> @@ -103,7 +103,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
>                         if (jit_data->header)
>                                 break;
>
> -                       image_size = sizeof(u32) * ctx->ninsns;
> +                       image_size = sizeof(u16) * ctx->ninsns;

Ditto.


Björn
>                         jit_data->header =
>                                 bpf_jit_binary_alloc(image_size,
>                                                      &jit_data->image,
> @@ -114,7 +114,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
>                                 goto out_offset;
>                         }
>
> -                       ctx->insns = (u32 *)jit_data->image;
> +                       ctx->insns = (u16 *)jit_data->image;
>                         /*
>                          * Now, when the image is allocated, the image can
>                          * potentially shrink more (auipc/jalr -> jal).
> --
> 2.25.1
>


* Re: [RFC PATCH bpf-next 2/3] bpf, riscv: Add encodings for compressed instructions
  2020-07-13 18:37 ` [RFC PATCH bpf-next 2/3] bpf, riscv: Add encodings for " Luke Nelson
@ 2020-07-15 18:10   ` Björn Töpel
  0 siblings, 0 replies; 9+ messages in thread
From: Björn Töpel @ 2020-07-15 18:10 UTC (permalink / raw)
  To: Luke Nelson
  Cc: Song Liu, Albert Ou, Daniel Borkmann, Luke Nelson, Netdev,
	John Fastabend, Alexei Starovoitov, linux-riscv, LKML,
	Palmer Dabbelt, Paul Walmsley, KP Singh, Yonghong Song, bpf,
	Andrii Nakryiko, Martin KaFai Lau, Xi Wang

On Mon, 13 Jul 2020 at 20:37, Luke Nelson <lukenels@cs.washington.edu> wrote:
>
> This patch adds functions for encoding and emitting compressed riscv
> (RVC) instructions to the BPF JIT.
>
> Some regular riscv instructions can be compressed into an RVC instruction
> if the instruction fields meet some requirements. For example, "add rd,
> rs1, rs2" can be compressed into "c.add rd, rs2" when rd == rs1.
>
> To make using RVC encodings simpler, this patch also adds helper
> functions that selectively emit either a regular instruction or a
> compressed instruction if possible.
>
> For example, emit_add will produce a "c.add" if possible and a regular
> "add" otherwise.
>
> Signed-off-by: Luke Nelson <luke.r.nels@gmail.com>
> ---
>  arch/riscv/net/bpf_jit.h | 474 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 474 insertions(+)
>
> diff --git a/arch/riscv/net/bpf_jit.h b/arch/riscv/net/bpf_jit.h
> index 5c89ea904c1a..f3ac2d4a50f7 100644
> --- a/arch/riscv/net/bpf_jit.h
> +++ b/arch/riscv/net/bpf_jit.h
> @@ -13,6 +13,15 @@
>  #include <linux/filter.h>
>  #include <asm/cacheflush.h>
>
> +static inline bool rvc_enabled(void)
> +{
> +#ifdef CONFIG_RISCV_ISA_C
> +       return true;
> +#else
> +       return false;
> +#endif
> +}
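
Could this use IS_ENABLED() from kconfig.h instead of the #ifdef?
I.e., simply:

	return IS_ENABLED(CONFIG_RISCV_ISA_C);
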
> +
>  enum {
>         RV_REG_ZERO =   0,      /* The constant value 0 */
>         RV_REG_RA =     1,      /* Return address */
> @@ -48,6 +57,18 @@ enum {
>         RV_REG_T6 =     31,
>  };
>
> +static inline bool is_creg(u8 reg)
> +{
> +       return (1 << reg) & (BIT(RV_REG_FP) |
> +                            BIT(RV_REG_S1) |
> +                            BIT(RV_REG_A0) |
> +                            BIT(RV_REG_A1) |
> +                            BIT(RV_REG_A2) |
> +                            BIT(RV_REG_A3) |
> +                            BIT(RV_REG_A4) |
> +                            BIT(RV_REG_A5));
> +}
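
(For reference: these are exactly registers x8-x15, the only ones
addressable by the 3-bit register fields of the compressed instruction
formats.)
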
> +
>  struct rv_jit_context {
>         struct bpf_prog *prog;
>         u16 *insns;             /* RV insns */
> @@ -134,6 +155,16 @@ static inline int invert_bpf_cond(u8 cond)
>         return -1;
>  }
>
> +static inline bool is_6b_int(long val)
> +{
> +       return -(1L << 5) <= val && val < (1L << 5);
> +}
> +
> +static inline bool is_10b_int(long val)
> +{
> +       return -(1L << 9) <= val && val < (1L << 9);
> +}
> +
>  static inline bool is_12b_int(long val)
>  {
>         return -(1L << 11) <= val && val < (1L << 11);
> @@ -224,6 +255,59 @@ static inline u32 rv_amo_insn(u8 funct5, u8 aq, u8 rl, u8 rs2, u8 rs1,
>         return rv_r_insn(funct7, rs2, rs1, funct3, rd, opcode);
>  }
>
> +/* RISC-V compressed instruction formats. */
> +
> +static inline u32 rv_cr_insn(u8 funct4, u8 rd, u8 rs2, u8 op)

Please change the return type to u16, so that it matches emitc.
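
I.e.:

	static inline u16 rv_cr_insn(u8 funct4, u8 rd, u8 rs2, u8 op)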

> +{
> +       return (funct4 << 12) | (rd << 7) | (rs2 << 2) | op;
> +}
> +
> +static inline u32 rv_ci_insn(u8 funct3, u32 imm6, u8 rd, u8 op)
> +{
> +       u16 imm;
> +
> +       imm = ((imm6 & 0x20) << 7) | ((imm6 & 0x1f) << 2);
> +       return (funct3 << 13) | (rd << 7) | op | imm;
> +}
> +
> +static inline u32 rv_css_insn(u8 funct3, u32 uimm, u8 rs2, u8 op)
> +{
> +       return (funct3 << 13) | (uimm << 7) | (rs2 << 2) | op;
> +}
> +
> +static inline u32 rv_ciw_insn(u8 funct3, u32 uimm, u8 rd, u8 op)
> +{
> +       return (funct3 << 13) | (uimm << 5) | ((rd & 0x7) << 2) | op;
> +}
> +
> +static inline u32 rv_cl_insn(u8 funct3, u32 imm_hi, u8 rs1, u32 imm_lo, u8 rd,
> +                            u8 op)
> +{
> +       return (funct3 << 13) | (imm_hi << 10) | ((rs1 & 0x7) << 7) |
> +               (imm_lo << 5) | ((rd & 0x7) << 2) | op;
> +}
> +
> +static inline u32 rv_cs_insn(u8 funct3, u32 imm_hi, u8 rs1, u32 imm_lo, u8 rs2,
> +                            u8 op)
> +{
> +       return (funct3 << 13) | (imm_hi << 10) | ((rs1 & 0x7) << 7) |
> +               (imm_lo << 5) | ((rs2 & 0x7) << 2) | op;
> +}
> +
> +static inline u32 rv_ca_insn(u8 funct6, u8 rd, u8 funct2, u8 rs2, u8 op)
> +{
> +       return (funct6 << 10) | ((rd & 0x7) << 7) | (funct2 << 5) |
> +               ((rs2 & 0x7) << 2) | op;
> +}
> +
> +static inline u32 rv_cb_insn(u8 funct3, u32 imm6, u8 funct2, u8 rd, u8 op)
> +{
> +       u16 imm;
> +
> +       imm = ((imm6 & 0x20) << 7) | ((imm6 & 0x1f) << 2);
> +       return (funct3 << 13) | (funct2 << 10) | ((rd & 0x7) << 7) | op | imm;
> +}
> +
>  /* Instructions shared by both RV32 and RV64. */
>
>  static inline u32 rv_addi(u8 rd, u8 rs1, u16 imm11_0)
> @@ -431,6 +515,135 @@ static inline u32 rv_amoadd_w(u8 rd, u8 rs2, u8 rs1, u8 aq, u8 rl)
>         return rv_amo_insn(0, aq, rl, rs2, rs1, 2, rd, 0x2f);
>  }
>
> +/* RVC instructions. */
> +
> +static inline u32 rvc_addi4spn(u8 rd, u32 imm10)

And same here. Change to u16 for the return value.

> +{
> +       u32 imm;
> +
> +       imm = ((imm10 & 0x30) << 2) | ((imm10 & 0x3c0) >> 4) |
> +               ((imm10 & 0x4) >> 1) | ((imm10 & 0x8) >> 3);
> +       return rv_ciw_insn(0x0, imm, rd, 0x0);
> +}
> +
> +static inline u32 rvc_lw(u8 rd, u32 imm7, u8 rs1)
> +{
> +       u32 imm_hi, imm_lo;
> +
> +       imm_hi = (imm7 & 0x38) >> 3;
> +       imm_lo = ((imm7 & 0x4) >> 1) | ((imm7 & 0x40) >> 6);
> +       return rv_cl_insn(0x2, imm_hi, rs1, imm_lo, rd, 0x0);
> +}
> +
> +static inline u32 rvc_sw(u8 rs1, u32 imm7, u8 rs2)
> +{
> +       u32 imm_hi, imm_lo;
> +
> +       imm_hi = (imm7 & 0x38) >> 3;
> +       imm_lo = ((imm7 & 0x4) >> 1) | ((imm7 & 0x40) >> 6);
> +       return rv_cs_insn(0x6, imm_hi, rs1, imm_lo, rs2, 0x0);
> +}
> +
> +static inline u32 rvc_addi(u8 rd, u8 imm6)
> +{
> +       return rv_ci_insn(0, imm6, rd, 0x1);
> +}
> +
> +static inline u32 rvc_li(u8 rd, u8 imm6)
> +{
> +       return rv_ci_insn(0x2, imm6, rd, 0x1);
> +}
> +
> +static inline u32 rvc_addi16sp(u32 imm10)
> +{
> +       u32 imm;
> +
> +       imm = ((imm10 & 0x200) >> 4) | (imm10 & 0x10) | ((imm10 & 0x40) >> 3) |
> +               ((imm10 & 0x180) >> 6) | ((imm10 & 0x20) >> 5);
> +       return rv_ci_insn(0x3, imm, RV_REG_SP, 0x1);
> +}
> +
> +static inline u32 rvc_lui(u8 rd, u8 imm6)
> +{
> +       return rv_ci_insn(0x3, imm6, rd, 0x1);
> +}
> +
> +static inline u32 rvc_srli(u8 rd, u8 imm6)
> +{
> +       return rv_cb_insn(0x4, imm6, 0, rd, 0x1);
> +}
> +
> +static inline u32 rvc_srai(u8 rd, u8 imm6)
> +{
> +       return rv_cb_insn(0x4, imm6, 0x1, rd, 0x1);
> +}
> +
> +static inline u32 rvc_andi(u8 rd, u8 imm6)
> +{
> +       return rv_cb_insn(0x4, imm6, 0x2, rd, 0x1);
> +}
> +
> +static inline u32 rvc_sub(u8 rd, u8 rs)
> +{
> +       return rv_ca_insn(0x23, rd, 0, rs, 0x1);
> +}
> +
> +static inline u32 rvc_xor(u8 rd, u8 rs)
> +{
> +       return rv_ca_insn(0x23, rd, 0x1, rs, 0x1);
> +}
> +
> +static inline u32 rvc_or(u8 rd, u8 rs)
> +{
> +       return rv_ca_insn(0x23, rd, 0x2, rs, 0x1);
> +}
> +
> +static inline u32 rvc_and(u8 rd, u8 rs)
> +{
> +       return rv_ca_insn(0x23, rd, 0x3, rs, 0x1);
> +}
> +
> +static inline u32 rvc_slli(u8 rd, u8 imm6)
> +{
> +       return rv_ci_insn(0, imm6, rd, 0x2);
> +}
> +
> +static inline u32 rvc_lwsp(u8 rd, u32 imm8)
> +{
> +       u32 imm;
> +
> +       imm = ((imm8 & 0xc0) >> 6) | (imm8 & 0x3c);
> +       return rv_ci_insn(0x2, imm, rd, 0x2);
> +}
> +
> +static inline u32 rvc_jr(u8 rs1)
> +{
> +       return rv_cr_insn(0x8, rs1, RV_REG_ZERO, 0x2);
> +}
> +
> +static inline u32 rvc_mv(u8 rd, u8 rs)
> +{
> +       return rv_cr_insn(0x8, rd, rs, 0x2);
> +}
> +
> +static inline u32 rvc_jalr(u8 rs1)
> +{
> +       return rv_cr_insn(0x9, rs1, RV_REG_ZERO, 0x2);
> +}
> +
> +static inline u32 rvc_add(u8 rd, u8 rs)
> +{
> +       return rv_cr_insn(0x9, rd, rs, 0x2);
> +}
> +
> +static inline u32 rvc_swsp(u32 imm8, u8 rs2)
> +{
> +       u32 imm;
> +
> +       imm = (imm8 & 0x3c) | ((imm8 & 0xc0) >> 6);
> +       return rv_css_insn(0x6, imm, rs2, 0x2);
> +}
> +
>  /*
>   * RV64-only instructions.
>   *
> @@ -520,6 +733,267 @@ static inline u32 rv_amoadd_d(u8 rd, u8 rs2, u8 rs1, u8 aq, u8 rl)
>         return rv_amo_insn(0, aq, rl, rs2, rs1, 3, rd, 0x2f);
>  }
>
> +/* RV64-only RVC instructions. */
> +
> +static inline u32 rvc_ld(u8 rd, u32 imm8, u8 rs1)

...and again, u16 for the return type.

> +{
> +       u32 imm_hi, imm_lo;
> +
> +       imm_hi = (imm8 & 0x38) >> 3;
> +       imm_lo = (imm8 & 0xc0) >> 6;
> +       return rv_cl_insn(0x3, imm_hi, rs1, imm_lo, rd, 0x0);
> +}
> +
> +static inline u32 rvc_sd(u8 rs1, u32 imm8, u8 rs2)
> +{
> +       u32 imm_hi, imm_lo;
> +
> +       imm_hi = (imm8 & 0x38) >> 3;
> +       imm_lo = (imm8 & 0xc0) >> 6;
> +       return rv_cs_insn(0x7, imm_hi, rs1, imm_lo, rs2, 0x0);
> +}
> +
> +static inline u32 rvc_subw(u8 rd, u8 rs)
> +{
> +       return rv_ca_insn(0x27, rd, 0, rs, 0x1);
> +}
> +
> +static inline u32 rvc_addiw(u8 rd, u8 imm6)
> +{
> +       return rv_ci_insn(0x1, imm6, rd, 0x1);
> +}
> +
> +static inline u32 rvc_ldsp(u8 rd, u32 imm9)
> +{
> +       u32 imm;
> +
> +       imm = ((imm9 & 0x1c0) >> 6) | (imm9 & 0x38);
> +       return rv_ci_insn(0x3, imm, rd, 0x2);
> +}
> +
> +static inline u32 rvc_sdsp(u32 imm9, u8 rs2)
> +{
> +       u32 imm;
> +
> +       imm = (imm9 & 0x38) | ((imm9 & 0x1c0) >> 6);
> +       return rv_css_insn(0x7, imm, rs2, 0x2);
> +}
> +
> +#endif /* __riscv_xlen == 64 */
> +
> +/* Helper functions that emit RVC instructions when possible. */
> +
> +static inline void emit_jalr(u8 rd, u8 rs, s32 imm, struct rv_jit_context *ctx)
> +{
> +       if (rvc_enabled() && rd == RV_REG_RA && rs && !imm)
> +               emitc(rvc_jalr(rs), ctx);
> +       else if (rvc_enabled() && !rd && rs && !imm)
> +               emitc(rvc_jr(rs), ctx);
> +       else
> +               emit(rv_jalr(rd, rs, imm), ctx);
> +}
> +
> +static inline void emit_mv(u8 rd, u8 rs, struct rv_jit_context *ctx)
> +{
> +       if (rvc_enabled() && rd && rs)
> +               emitc(rvc_mv(rd, rs), ctx);
> +       else
> +               emit(rv_addi(rd, rs, 0), ctx);
> +}
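
FWIW, these two are what produce the c.mv a0,a5 and c.jr ra at the end
of the epilogue in the cover letter dump.
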
> +
> +static inline void emit_add(u8 rd, u8 rs1, u8 rs2, struct rv_jit_context *ctx)
> +{
> +       if (rvc_enabled() && rd && rd == rs1 && rs2)
> +               emitc(rvc_add(rd, rs2), ctx);
> +       else
> +               emit(rv_add(rd, rs1, rs2), ctx);
> +}
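
The rd == rs1 requirement conveniently matches the JIT's usual
"dst = dst op src" pattern, so e.g. emit_add(RV_REG_T2, RV_REG_T2,
RV_REG_A1, ctx) in the tail-call path compresses to c.add t2,a1.
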
> +
> +static inline void emit_addi(u8 rd, u8 rs, s32 imm, struct rv_jit_context *ctx)
> +{
> +       if (rd == rs && !imm)
> +               /*
> +                * RVC cannot handle imm == 0. Handle it here by emitting
> +                * no instructions since it should behave as a no-op.
> +                */
> +               return;
> +       else if (rvc_enabled() && rd == RV_REG_SP && rd == rs &&
> +                is_10b_int(imm) && !(imm & 0xf))
> +               emitc(rvc_addi16sp(imm), ctx);
> +       else if (rvc_enabled() && is_creg(rd) && rs == RV_REG_SP &&
> +                (u32)imm < 0x400 && !(imm & 0x3) && imm)
> +               emitc(rvc_addi4spn(rd, imm), ctx);
> +       else if (rvc_enabled() && rd && rd == rs && is_6b_int(imm))
> +               emitc(rvc_addi(rd, imm), ctx);
> +       else
> +               emit(rv_addi(rd, rs, imm), ctx);
> +}
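
If I follow the encodings correctly, this is what turns the prologue's
addi sp,sp,-48 into c.addi16sp (10-bit immediate, multiple of 16) and
addi s0,sp,48 into c.addi4spn, exactly as in the cover letter dump.
Nice.
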
> +
> +static inline void emit_li(u8 rd, s32 imm, struct rv_jit_context *ctx)
> +{
> +       if (rvc_enabled() && rd && is_6b_int(imm))
> +               emitc(rvc_li(rd, imm), ctx);
> +       else
> +               emit(rv_addi(rd, RV_REG_ZERO, imm), ctx);
> +}
> +
> +static inline void emit_lui(u8 rd, s32 imm, struct rv_jit_context *ctx)
> +{
> +       if (rvc_enabled() && rd && rd != RV_REG_SP && is_6b_int(imm) && imm)
> +               emitc(rvc_lui(rd, imm), ctx);
> +       else
> +               emit(rv_lui(rd, imm), ctx);
> +}
> +
> +static inline void emit_slli(u8 rd, u8 rs, s32 imm, struct rv_jit_context *ctx)
> +{
> +       if (rd == rs && !imm)
> +               /*
> +                * RVC cannot handle imm == 0. Handle it here by emitting
> +                * no instructions since it should behave as a no-op.
> +                */
> +               return;
> +       else if (rvc_enabled() && rd && rd == rs)
> +               emitc(rvc_slli(rd, imm), ctx);
> +       else
> +               emit(rv_slli(rd, rs, imm), ctx);
> +}
> +
> +static inline void emit_andi(u8 rd, u8 rs, s32 imm, struct rv_jit_context *ctx)
> +{
> +       if (rvc_enabled() && is_creg(rd) && rd == rs && is_6b_int(imm))
> +               emitc(rvc_andi(rd, imm), ctx);
> +       else
> +               emit(rv_andi(rd, rs, imm), ctx);
> +}
> +
> +static inline void emit_srli(u8 rd, u8 rs, s32 imm, struct rv_jit_context *ctx)
> +{
> +       if (rd == rs && !imm)
> +               /*
> +                * RVC cannot handle imm == 0. Handle it here by emitting
> +                * no instructions since it should behave as a no-op.
> +                */
> +               return;
> +       else if (rvc_enabled() && is_creg(rd) && rd == rs)
> +               emitc(rvc_srli(rd, imm), ctx);
> +       else
> +               emit(rv_srli(rd, rs, imm), ctx);
> +}
> +
> +static inline void emit_srai(u8 rd, u8 rs, s32 imm, struct rv_jit_context *ctx)
> +{
> +       if (rd == rs && !imm)
> +               /*
> +                * RVC cannot handle imm == 0. Handle it here by emitting
> +                * no instructions since it should behave as a no-op.
> +                */
> +               return;
> +       else if (rvc_enabled() && is_creg(rd) && rd == rs)
> +               emitc(rvc_srai(rd, imm), ctx);
> +       else
> +               emit(rv_srai(rd, rs, imm), ctx);
> +}
> +
> +static inline void emit_sub(u8 rd, u8 rs1, u8 rs2, struct rv_jit_context *ctx)
> +{
> +       if (rvc_enabled() && is_creg(rd) && rd == rs1 && is_creg(rs2))
> +               emitc(rvc_sub(rd, rs2), ctx);
> +       else
> +               emit(rv_sub(rd, rs1, rs2), ctx);
> +}
> +
> +static inline void emit_or(u8 rd, u8 rs1, u8 rs2, struct rv_jit_context *ctx)
> +{
> +       if (rvc_enabled() && is_creg(rd) && rd == rs1 && is_creg(rs2))
> +               emitc(rvc_or(rd, rs2), ctx);
> +       else
> +               emit(rv_or(rd, rs1, rs2), ctx);
> +}
> +
> +static inline void emit_and(u8 rd, u8 rs1, u8 rs2, struct rv_jit_context *ctx)
> +{
> +       if (rvc_enabled() && is_creg(rd) && rd == rs1 && is_creg(rs2))
> +               emitc(rvc_and(rd, rs2), ctx);
> +       else
> +               emit(rv_and(rd, rs1, rs2), ctx);
> +}
> +
> +static inline void emit_xor(u8 rd, u8 rs1, u8 rs2, struct rv_jit_context *ctx)
> +{
> +       if (rvc_enabled() && is_creg(rd) && rd == rs1 && is_creg(rs2))
> +               emitc(rvc_xor(rd, rs2), ctx);
> +       else
> +               emit(rv_xor(rd, rs1, rs2), ctx);
> +}
> +
> +static inline void emit_lw(u8 rd, s32 off, u8 rs1, struct rv_jit_context *ctx)
> +{
> +       if (rvc_enabled() && rs1 == RV_REG_SP && rd && (u32)off < 0x100 &&
> +           !(off & 0x3))
> +               emitc(rvc_lwsp(rd, off), ctx);
> +       else if (rvc_enabled() && is_creg(rd) && is_creg(rs1) &&
> +                (u32)off < 0x80 && !(off & 0x3))
> +               emitc(rvc_lw(rd, off, rs1), ctx);
> +       else
> +               emit(rv_lw(rd, off, rs1), ctx);
> +}
> +
> +static inline void emit_sw(u8 rs1, s32 off, u8 rs2, struct rv_jit_context *ctx)
> +{
> +       if (rvc_enabled() && rs1 == RV_REG_SP && (u32)off < 0x100 &&
> +           !(off & 0x3))
> +               emitc(rvc_swsp(off, rs2), ctx);
> +       else if (rvc_enabled() && is_creg(rs1) && is_creg(rs2) &&
> +                (u32)off < 0x80 && !(off & 0x3))
> +               emitc(rvc_sw(rs1, off, rs2), ctx);
> +       else
> +               emit(rv_sw(rs1, off, rs2), ctx);
> +}
> +
> +/* RV64-only helper functions. */
> +#if __riscv_xlen == 64
> +
> +static inline void emit_addiw(u8 rd, u8 rs, s32 imm, struct rv_jit_context *ctx)
> +{
> +       if (rvc_enabled() && rd && rd == rs && is_6b_int(imm))
> +               emitc(rvc_addiw(rd, imm), ctx);
> +       else
> +               emit(rv_addiw(rd, rs, imm), ctx);
> +}
> +
> +static inline void emit_ld(u8 rd, s32 off, u8 rs1, struct rv_jit_context *ctx)
> +{
> +       if (rvc_enabled() && rs1 == RV_REG_SP && rd && (u32)off < 0x200 &&
> +           !(off & 0x7))
> +               emitc(rvc_ldsp(rd, off), ctx);
> +       else if (rvc_enabled() && is_creg(rd) && is_creg(rs1) &&
> +                (u32)off < 0x100 && !(off & 0x7))
> +               emitc(rvc_ld(rd, off, rs1), ctx);
> +       else
> +               emit(rv_ld(rd, off, rs1), ctx);
> +}
> +
> +static inline void emit_sd(u8 rs1, s32 off, u8 rs2, struct rv_jit_context *ctx)
> +{
> +       if (rvc_enabled() && rs1 == RV_REG_SP && (u32)off < 0x200 &&
> +           !(off & 0x7))
> +               emitc(rvc_sdsp(off, rs2), ctx);
> +       else if (rvc_enabled() && is_creg(rs1) && is_creg(rs2) &&
> +                (u32)off < 0x100 && !(off & 0x7))
> +               emitc(rvc_sd(rs1, off, rs2), ctx);
> +       else
> +               emit(rv_sd(rs1, off, rs2), ctx);
> +}
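
And emit_ld/emit_sd give the c.ldsp/c.sdsp fill and spill sequences in
the prologue and epilogue, since the save-area offsets are all 8-byte
aligned and well within 9 bits.
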
> +
> +static inline void emit_subw(u8 rd, u8 rs1, u8 rs2, struct rv_jit_context *ctx)
> +{
> +       if (rvc_enabled() && is_creg(rd) && rd == rs1 && is_creg(rs2))
> +               emitc(rvc_subw(rd, rs2), ctx);
> +       else
> +               emit(rv_subw(rd, rs1, rs2), ctx);
> +}
> +
>  #endif /* __riscv_xlen == 64 */
>
>  void bpf_jit_build_prologue(struct rv_jit_context *ctx);
> --
> 2.25.1
>


* Re: [RFC PATCH bpf-next 3/3] bpf, riscv: Use compressed instructions in the rv64 JIT
  2020-07-13 18:37 ` [RFC PATCH bpf-next 3/3] bpf, riscv: Use compressed instructions in the rv64 JIT Luke Nelson
@ 2020-07-15 18:12   ` Björn Töpel
  0 siblings, 0 replies; 9+ messages in thread
From: Björn Töpel @ 2020-07-15 18:12 UTC (permalink / raw)
  To: Luke Nelson
  Cc: Song Liu, Albert Ou, Daniel Borkmann, Luke Nelson, Netdev,
	John Fastabend, Alexei Starovoitov, linux-riscv, LKML,
	Palmer Dabbelt, Paul Walmsley, KP Singh, Yonghong Song, bpf,
	Andrii Nakryiko, Martin KaFai Lau, Xi Wang

On Mon, 13 Jul 2020 at 20:37, Luke Nelson <lukenels@cs.washington.edu> wrote:
>
> This patch uses the RVC support and encodings from bpf_jit.h to optimize
> the rv64 jit.
>
> The optimizations work by replacing emit(rv_X(...)) with a call to a
> helper function emit_X, which will emit a compressed version of the
> instruction when possible, and when RVC is enabled.
>
> The JIT continues to pass all tests in lib/test_bpf.c, and introduces
> no new failures to test_verifier, both with and without RVC enabled.
>
> [...]
>
> Signed-off-by: Luke Nelson <luke.r.nels@gmail.com>

Very nice work! For the proper series, feel free to add:

Acked-by: Björn Töpel <bjorn.topel@gmail.com>

> ---
>  arch/riscv/net/bpf_jit_comp64.c | 275 +++++++++++++++++---------------
>  1 file changed, 142 insertions(+), 133 deletions(-)
>
> diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c
> index 26feed92f1bc..823fe800f406 100644
> --- a/arch/riscv/net/bpf_jit_comp64.c
> +++ b/arch/riscv/net/bpf_jit_comp64.c
> @@ -137,14 +137,14 @@ static void emit_imm(u8 rd, s64 val, struct rv_jit_context *ctx)
>
>         if (is_32b_int(val)) {
>                 if (upper)
> -                       emit(rv_lui(rd, upper), ctx);
> +                       emit_lui(rd, upper, ctx);
>
>                 if (!upper) {
> -                       emit(rv_addi(rd, RV_REG_ZERO, lower), ctx);
> +                       emit_li(rd, lower, ctx);
>                         return;
>                 }
>
> -               emit(rv_addiw(rd, rd, lower), ctx);
> +               emit_addiw(rd, rd, lower, ctx);
>                 return;
>         }
>
> @@ -154,9 +154,9 @@ static void emit_imm(u8 rd, s64 val, struct rv_jit_context *ctx)
>
>         emit_imm(rd, upper, ctx);
>
> -       emit(rv_slli(rd, rd, shift), ctx);
> +       emit_slli(rd, rd, shift, ctx);
>         if (lower)
> -               emit(rv_addi(rd, rd, lower), ctx);
> +               emit_addi(rd, rd, lower, ctx);
>  }
>
>  static void __build_epilogue(bool is_tail_call, struct rv_jit_context *ctx)
> @@ -164,43 +164,43 @@ static void __build_epilogue(bool is_tail_call, struct rv_jit_context *ctx)
>         int stack_adjust = ctx->stack_size, store_offset = stack_adjust - 8;
>
>         if (seen_reg(RV_REG_RA, ctx)) {
> -               emit(rv_ld(RV_REG_RA, store_offset, RV_REG_SP), ctx);
> +               emit_ld(RV_REG_RA, store_offset, RV_REG_SP, ctx);
>                 store_offset -= 8;
>         }
> -       emit(rv_ld(RV_REG_FP, store_offset, RV_REG_SP), ctx);
> +       emit_ld(RV_REG_FP, store_offset, RV_REG_SP, ctx);
>         store_offset -= 8;
>         if (seen_reg(RV_REG_S1, ctx)) {
> -               emit(rv_ld(RV_REG_S1, store_offset, RV_REG_SP), ctx);
> +               emit_ld(RV_REG_S1, store_offset, RV_REG_SP, ctx);
>                 store_offset -= 8;
>         }
>         if (seen_reg(RV_REG_S2, ctx)) {
> -               emit(rv_ld(RV_REG_S2, store_offset, RV_REG_SP), ctx);
> +               emit_ld(RV_REG_S2, store_offset, RV_REG_SP, ctx);
>                 store_offset -= 8;
>         }
>         if (seen_reg(RV_REG_S3, ctx)) {
> -               emit(rv_ld(RV_REG_S3, store_offset, RV_REG_SP), ctx);
> +               emit_ld(RV_REG_S3, store_offset, RV_REG_SP, ctx);
>                 store_offset -= 8;
>         }
>         if (seen_reg(RV_REG_S4, ctx)) {
> -               emit(rv_ld(RV_REG_S4, store_offset, RV_REG_SP), ctx);
> +               emit_ld(RV_REG_S4, store_offset, RV_REG_SP, ctx);
>                 store_offset -= 8;
>         }
>         if (seen_reg(RV_REG_S5, ctx)) {
> -               emit(rv_ld(RV_REG_S5, store_offset, RV_REG_SP), ctx);
> +               emit_ld(RV_REG_S5, store_offset, RV_REG_SP, ctx);
>                 store_offset -= 8;
>         }
>         if (seen_reg(RV_REG_S6, ctx)) {
> -               emit(rv_ld(RV_REG_S6, store_offset, RV_REG_SP), ctx);
> +               emit_ld(RV_REG_S6, store_offset, RV_REG_SP, ctx);
>                 store_offset -= 8;
>         }
>
> -       emit(rv_addi(RV_REG_SP, RV_REG_SP, stack_adjust), ctx);
> +       emit_addi(RV_REG_SP, RV_REG_SP, stack_adjust, ctx);
>         /* Set return value. */
>         if (!is_tail_call)
> -               emit(rv_addi(RV_REG_A0, RV_REG_A5, 0), ctx);
> -       emit(rv_jalr(RV_REG_ZERO, is_tail_call ? RV_REG_T3 : RV_REG_RA,
> -                    is_tail_call ? 4 : 0), /* skip TCC init */
> -            ctx);
> +               emit_mv(RV_REG_A0, RV_REG_A5, ctx);
> +       emit_jalr(RV_REG_ZERO, is_tail_call ? RV_REG_T3 : RV_REG_RA,
> +                 is_tail_call ? 4 : 0, /* skip TCC init */
> +                 ctx);
>  }
>
>  static void emit_bcc(u8 cond, u8 rd, u8 rs, int rvoff,
> @@ -280,8 +280,8 @@ static void emit_branch(u8 cond, u8 rd, u8 rs, int rvoff,
>
>  static void emit_zext_32(u8 reg, struct rv_jit_context *ctx)
>  {
> -       emit(rv_slli(reg, reg, 32), ctx);
> -       emit(rv_srli(reg, reg, 32), ctx);
> +       emit_slli(reg, reg, 32, ctx);
> +       emit_srli(reg, reg, 32, ctx);
>  }
>
>  static int emit_bpf_tail_call(int insn, struct rv_jit_context *ctx)
> @@ -310,7 +310,7 @@ static int emit_bpf_tail_call(int insn, struct rv_jit_context *ctx)
>         /* if (TCC-- < 0)
>          *     goto out;
>          */
> -       emit(rv_addi(RV_REG_T1, tcc, -1), ctx);
> +       emit_addi(RV_REG_T1, tcc, -1, ctx);
>         off = RVOFF(tc_ninsn - (ctx->ninsns - start_insn));
>         emit_branch(BPF_JSLT, tcc, RV_REG_ZERO, off, ctx);
>
> @@ -318,12 +318,12 @@ static int emit_bpf_tail_call(int insn, struct rv_jit_context *ctx)
>          * if (!prog)
>          *     goto out;
>          */
> -       emit(rv_slli(RV_REG_T2, RV_REG_A2, 3), ctx);
> -       emit(rv_add(RV_REG_T2, RV_REG_T2, RV_REG_A1), ctx);
> +       emit_slli(RV_REG_T2, RV_REG_A2, 3, ctx);
> +       emit_add(RV_REG_T2, RV_REG_T2, RV_REG_A1, ctx);
>         off = offsetof(struct bpf_array, ptrs);
>         if (is_12b_check(off, insn))
>                 return -1;
> -       emit(rv_ld(RV_REG_T2, off, RV_REG_T2), ctx);
> +       emit_ld(RV_REG_T2, off, RV_REG_T2, ctx);
>         off = RVOFF(tc_ninsn - (ctx->ninsns - start_insn));
>         emit_branch(BPF_JEQ, RV_REG_T2, RV_REG_ZERO, off, ctx);
>
> @@ -331,8 +331,8 @@ static int emit_bpf_tail_call(int insn, struct rv_jit_context *ctx)
>         off = offsetof(struct bpf_prog, bpf_func);
>         if (is_12b_check(off, insn))
>                 return -1;
> -       emit(rv_ld(RV_REG_T3, off, RV_REG_T2), ctx);
> -       emit(rv_addi(RV_REG_TCC, RV_REG_T1, 0), ctx);
> +       emit_ld(RV_REG_T3, off, RV_REG_T2, ctx);
> +       emit_mv(RV_REG_TCC, RV_REG_T1, ctx);
>         __build_epilogue(true, ctx);
>         return 0;
>  }
> @@ -360,9 +360,9 @@ static void init_regs(u8 *rd, u8 *rs, const struct bpf_insn *insn,
>
>  static void emit_zext_32_rd_rs(u8 *rd, u8 *rs, struct rv_jit_context *ctx)
>  {
> -       emit(rv_addi(RV_REG_T2, *rd, 0), ctx);
> +       emit_mv(RV_REG_T2, *rd, ctx);
>         emit_zext_32(RV_REG_T2, ctx);
> -       emit(rv_addi(RV_REG_T1, *rs, 0), ctx);
> +       emit_mv(RV_REG_T1, *rs, ctx);
>         emit_zext_32(RV_REG_T1, ctx);
>         *rd = RV_REG_T2;
>         *rs = RV_REG_T1;
> @@ -370,15 +370,15 @@ static void emit_zext_32_rd_rs(u8 *rd, u8 *rs, struct rv_jit_context *ctx)
>
>  static void emit_sext_32_rd_rs(u8 *rd, u8 *rs, struct rv_jit_context *ctx)
>  {
> -       emit(rv_addiw(RV_REG_T2, *rd, 0), ctx);
> -       emit(rv_addiw(RV_REG_T1, *rs, 0), ctx);
> +       emit_addiw(RV_REG_T2, *rd, 0, ctx);
> +       emit_addiw(RV_REG_T1, *rs, 0, ctx);
>         *rd = RV_REG_T2;
>         *rs = RV_REG_T1;
>  }
>
>  static void emit_zext_32_rd_t1(u8 *rd, struct rv_jit_context *ctx)
>  {
> -       emit(rv_addi(RV_REG_T2, *rd, 0), ctx);
> +       emit_mv(RV_REG_T2, *rd, ctx);
>         emit_zext_32(RV_REG_T2, ctx);
>         emit_zext_32(RV_REG_T1, ctx);
>         *rd = RV_REG_T2;
> @@ -386,7 +386,7 @@ static void emit_zext_32_rd_t1(u8 *rd, struct rv_jit_context *ctx)
>
>  static void emit_sext_32_rd(u8 *rd, struct rv_jit_context *ctx)
>  {
> -       emit(rv_addiw(RV_REG_T2, *rd, 0), ctx);
> +       emit_addiw(RV_REG_T2, *rd, 0, ctx);
>         *rd = RV_REG_T2;
>  }
>
> @@ -432,7 +432,7 @@ static int emit_call(bool fixed, u64 addr, struct rv_jit_context *ctx)
>         if (ret)
>                 return ret;
>         rd = bpf_to_rv_reg(BPF_REG_0, ctx);
> -       emit(rv_addi(rd, RV_REG_A0, 0), ctx);
> +       emit_mv(rd, RV_REG_A0, ctx);
>         return 0;
>  }
>
> @@ -458,7 +458,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
>                         emit_zext_32(rd, ctx);
>                         break;
>                 }
> -               emit(is64 ? rv_addi(rd, rs, 0) : rv_addiw(rd, rs, 0), ctx);
> +               emit_mv(rd, rs, ctx);
>                 if (!is64 && !aux->verifier_zext)
>                         emit_zext_32(rd, ctx);
>                 break;
> @@ -466,31 +466,35 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
>         /* dst = dst OP src */
>         case BPF_ALU | BPF_ADD | BPF_X:
>         case BPF_ALU64 | BPF_ADD | BPF_X:
> -               emit(is64 ? rv_add(rd, rd, rs) : rv_addw(rd, rd, rs), ctx);
> +               emit_add(rd, rd, rs, ctx);
>                 if (!is64 && !aux->verifier_zext)
>                         emit_zext_32(rd, ctx);
>                 break;
>         case BPF_ALU | BPF_SUB | BPF_X:
>         case BPF_ALU64 | BPF_SUB | BPF_X:
> -               emit(is64 ? rv_sub(rd, rd, rs) : rv_subw(rd, rd, rs), ctx);
> +               if (is64)
> +                       emit_sub(rd, rd, rs, ctx);
> +               else
> +                       emit_subw(rd, rd, rs, ctx);
> +
>                 if (!is64 && !aux->verifier_zext)
>                         emit_zext_32(rd, ctx);
>                 break;
>         case BPF_ALU | BPF_AND | BPF_X:
>         case BPF_ALU64 | BPF_AND | BPF_X:
> -               emit(rv_and(rd, rd, rs), ctx);
> +               emit_and(rd, rd, rs, ctx);
>                 if (!is64 && !aux->verifier_zext)
>                         emit_zext_32(rd, ctx);
>                 break;
>         case BPF_ALU | BPF_OR | BPF_X:
>         case BPF_ALU64 | BPF_OR | BPF_X:
> -               emit(rv_or(rd, rd, rs), ctx);
> +               emit_or(rd, rd, rs, ctx);
>                 if (!is64 && !aux->verifier_zext)
>                         emit_zext_32(rd, ctx);
>                 break;
>         case BPF_ALU | BPF_XOR | BPF_X:
>         case BPF_ALU64 | BPF_XOR | BPF_X:
> -               emit(rv_xor(rd, rd, rs), ctx);
> +               emit_xor(rd, rd, rs, ctx);
>                 if (!is64 && !aux->verifier_zext)
>                         emit_zext_32(rd, ctx);
>                 break;
> @@ -534,8 +538,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
>         /* dst = -dst */
>         case BPF_ALU | BPF_NEG:
>         case BPF_ALU64 | BPF_NEG:
> -               emit(is64 ? rv_sub(rd, RV_REG_ZERO, rd) :
> -                    rv_subw(rd, RV_REG_ZERO, rd), ctx);
> +               emit_sub(rd, RV_REG_ZERO, rd, ctx);
>                 if (!is64 && !aux->verifier_zext)
>                         emit_zext_32(rd, ctx);
>                 break;
> @@ -544,8 +547,8 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
>         case BPF_ALU | BPF_END | BPF_FROM_LE:
>                 switch (imm) {
>                 case 16:
> -                       emit(rv_slli(rd, rd, 48), ctx);
> -                       emit(rv_srli(rd, rd, 48), ctx);
> +                       emit_slli(rd, rd, 48, ctx);
> +                       emit_srli(rd, rd, 48, ctx);
>                         break;
>                 case 32:
>                         if (!aux->verifier_zext)
> @@ -558,51 +561,51 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
>                 break;
>
>         case BPF_ALU | BPF_END | BPF_FROM_BE:
> -               emit(rv_addi(RV_REG_T2, RV_REG_ZERO, 0), ctx);
> +               emit_li(RV_REG_T2, 0, ctx);
>
> -               emit(rv_andi(RV_REG_T1, rd, 0xff), ctx);
> -               emit(rv_add(RV_REG_T2, RV_REG_T2, RV_REG_T1), ctx);
> -               emit(rv_slli(RV_REG_T2, RV_REG_T2, 8), ctx);
> -               emit(rv_srli(rd, rd, 8), ctx);
> +               emit_andi(RV_REG_T1, rd, 0xff, ctx);
> +               emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
> +               emit_slli(RV_REG_T2, RV_REG_T2, 8, ctx);
> +               emit_srli(rd, rd, 8, ctx);
>                 if (imm == 16)
>                         goto out_be;
>
> -               emit(rv_andi(RV_REG_T1, rd, 0xff), ctx);
> -               emit(rv_add(RV_REG_T2, RV_REG_T2, RV_REG_T1), ctx);
> -               emit(rv_slli(RV_REG_T2, RV_REG_T2, 8), ctx);
> -               emit(rv_srli(rd, rd, 8), ctx);
> +               emit_andi(RV_REG_T1, rd, 0xff, ctx);
> +               emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
> +               emit_slli(RV_REG_T2, RV_REG_T2, 8, ctx);
> +               emit_srli(rd, rd, 8, ctx);
>
> -               emit(rv_andi(RV_REG_T1, rd, 0xff), ctx);
> -               emit(rv_add(RV_REG_T2, RV_REG_T2, RV_REG_T1), ctx);
> -               emit(rv_slli(RV_REG_T2, RV_REG_T2, 8), ctx);
> -               emit(rv_srli(rd, rd, 8), ctx);
> +               emit_andi(RV_REG_T1, rd, 0xff, ctx);
> +               emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
> +               emit_slli(RV_REG_T2, RV_REG_T2, 8, ctx);
> +               emit_srli(rd, rd, 8, ctx);
>                 if (imm == 32)
>                         goto out_be;
>
> -               emit(rv_andi(RV_REG_T1, rd, 0xff), ctx);
> -               emit(rv_add(RV_REG_T2, RV_REG_T2, RV_REG_T1), ctx);
> -               emit(rv_slli(RV_REG_T2, RV_REG_T2, 8), ctx);
> -               emit(rv_srli(rd, rd, 8), ctx);
> -
> -               emit(rv_andi(RV_REG_T1, rd, 0xff), ctx);
> -               emit(rv_add(RV_REG_T2, RV_REG_T2, RV_REG_T1), ctx);
> -               emit(rv_slli(RV_REG_T2, RV_REG_T2, 8), ctx);
> -               emit(rv_srli(rd, rd, 8), ctx);
> -
> -               emit(rv_andi(RV_REG_T1, rd, 0xff), ctx);
> -               emit(rv_add(RV_REG_T2, RV_REG_T2, RV_REG_T1), ctx);
> -               emit(rv_slli(RV_REG_T2, RV_REG_T2, 8), ctx);
> -               emit(rv_srli(rd, rd, 8), ctx);
> -
> -               emit(rv_andi(RV_REG_T1, rd, 0xff), ctx);
> -               emit(rv_add(RV_REG_T2, RV_REG_T2, RV_REG_T1), ctx);
> -               emit(rv_slli(RV_REG_T2, RV_REG_T2, 8), ctx);
> -               emit(rv_srli(rd, rd, 8), ctx);
> +               emit_andi(RV_REG_T1, rd, 0xff, ctx);
> +               emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
> +               emit_slli(RV_REG_T2, RV_REG_T2, 8, ctx);
> +               emit_srli(rd, rd, 8, ctx);
> +
> +               emit_andi(RV_REG_T1, rd, 0xff, ctx);
> +               emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
> +               emit_slli(RV_REG_T2, RV_REG_T2, 8, ctx);
> +               emit_srli(rd, rd, 8, ctx);
> +
> +               emit_andi(RV_REG_T1, rd, 0xff, ctx);
> +               emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
> +               emit_slli(RV_REG_T2, RV_REG_T2, 8, ctx);
> +               emit_srli(rd, rd, 8, ctx);
> +
> +               emit_andi(RV_REG_T1, rd, 0xff, ctx);
> +               emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
> +               emit_slli(RV_REG_T2, RV_REG_T2, 8, ctx);
> +               emit_srli(rd, rd, 8, ctx);
>  out_be:
> -               emit(rv_andi(RV_REG_T1, rd, 0xff), ctx);
> -               emit(rv_add(RV_REG_T2, RV_REG_T2, RV_REG_T1), ctx);
> +               emit_andi(RV_REG_T1, rd, 0xff, ctx);
> +               emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
>
> -               emit(rv_addi(rd, RV_REG_T2, 0), ctx);
> +               emit_mv(rd, RV_REG_T2, ctx);
>                 break;
>
>         /* dst = imm */
> @@ -617,12 +620,10 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
>         case BPF_ALU | BPF_ADD | BPF_K:
>         case BPF_ALU64 | BPF_ADD | BPF_K:
>                 if (is_12b_int(imm)) {
> -                       emit(is64 ? rv_addi(rd, rd, imm) :
> -                            rv_addiw(rd, rd, imm), ctx);
> +                       emit_addi(rd, rd, imm, ctx);
>                 } else {
>                         emit_imm(RV_REG_T1, imm, ctx);
> -                       emit(is64 ? rv_add(rd, rd, RV_REG_T1) :
> -                            rv_addw(rd, rd, RV_REG_T1), ctx);
> +                       emit_add(rd, rd, RV_REG_T1, ctx);
>                 }
>                 if (!is64 && !aux->verifier_zext)
>                         emit_zext_32(rd, ctx);
> @@ -630,12 +631,10 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
>         case BPF_ALU | BPF_SUB | BPF_K:
>         case BPF_ALU64 | BPF_SUB | BPF_K:
>                 if (is_12b_int(-imm)) {
> -                       emit(is64 ? rv_addi(rd, rd, -imm) :
> -                            rv_addiw(rd, rd, -imm), ctx);
> +                       emit_addi(rd, rd, -imm, ctx);
>                 } else {
>                         emit_imm(RV_REG_T1, imm, ctx);
> -                       emit(is64 ? rv_sub(rd, rd, RV_REG_T1) :
> -                            rv_subw(rd, rd, RV_REG_T1), ctx);
> +                       emit_sub(rd, rd, RV_REG_T1, ctx);
>                 }
>                 if (!is64 && !aux->verifier_zext)
>                         emit_zext_32(rd, ctx);
> @@ -643,10 +642,10 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
>         case BPF_ALU | BPF_AND | BPF_K:
>         case BPF_ALU64 | BPF_AND | BPF_K:
>                 if (is_12b_int(imm)) {
> -                       emit(rv_andi(rd, rd, imm), ctx);
> +                       emit_andi(rd, rd, imm, ctx);
>                 } else {
>                         emit_imm(RV_REG_T1, imm, ctx);
> -                       emit(rv_and(rd, rd, RV_REG_T1), ctx);
> +                       emit_and(rd, rd, RV_REG_T1, ctx);
>                 }
>                 if (!is64 && !aux->verifier_zext)
>                         emit_zext_32(rd, ctx);
> @@ -657,7 +656,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
>                         emit(rv_ori(rd, rd, imm), ctx);
>                 } else {
>                         emit_imm(RV_REG_T1, imm, ctx);
> -                       emit(rv_or(rd, rd, RV_REG_T1), ctx);
> +                       emit_or(rd, rd, RV_REG_T1, ctx);
>                 }
>                 if (!is64 && !aux->verifier_zext)
>                         emit_zext_32(rd, ctx);
> @@ -668,7 +667,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
>                         emit(rv_xori(rd, rd, imm), ctx);
>                 } else {
>                         emit_imm(RV_REG_T1, imm, ctx);
> -                       emit(rv_xor(rd, rd, RV_REG_T1), ctx);
> +                       emit_xor(rd, rd, RV_REG_T1, ctx);
>                 }
>                 if (!is64 && !aux->verifier_zext)
>                         emit_zext_32(rd, ctx);
> @@ -699,19 +698,28 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
>                 break;
>         case BPF_ALU | BPF_LSH | BPF_K:
>         case BPF_ALU64 | BPF_LSH | BPF_K:
> -               emit(is64 ? rv_slli(rd, rd, imm) : rv_slliw(rd, rd, imm), ctx);
> +               emit_slli(rd, rd, imm, ctx);
> +
>                 if (!is64 && !aux->verifier_zext)
>                         emit_zext_32(rd, ctx);
>                 break;
>         case BPF_ALU | BPF_RSH | BPF_K:
>         case BPF_ALU64 | BPF_RSH | BPF_K:
> -               emit(is64 ? rv_srli(rd, rd, imm) : rv_srliw(rd, rd, imm), ctx);
> +               if (is64)
> +                       emit_srli(rd, rd, imm, ctx);
> +               else
> +                       emit(rv_srliw(rd, rd, imm), ctx);
> +
>                 if (!is64 && !aux->verifier_zext)
>                         emit_zext_32(rd, ctx);
>                 break;
>         case BPF_ALU | BPF_ARSH | BPF_K:
>         case BPF_ALU64 | BPF_ARSH | BPF_K:
> -               emit(is64 ? rv_srai(rd, rd, imm) : rv_sraiw(rd, rd, imm), ctx);
> +               if (is64)
> +                       emit_srai(rd, rd, imm, ctx);
> +               else
> +                       emit(rv_sraiw(rd, rd, imm), ctx);
> +
>                 if (!is64 && !aux->verifier_zext)
>                         emit_zext_32(rd, ctx);
>                 break;
> @@ -763,7 +771,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
>                 if (BPF_OP(code) == BPF_JSET) {
>                         /* Adjust for and */
>                         rvoff -= 4;
> -                       emit(rv_and(RV_REG_T1, rd, rs), ctx);
> +                       emit_and(RV_REG_T1, rd, rs, ctx);
>                         emit_branch(BPF_JNE, RV_REG_T1, RV_REG_ZERO, rvoff,
>                                     ctx);
>                 } else {
> @@ -819,17 +827,17 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
>                 rvoff = rv_offset(i, off, ctx);
>                 s = ctx->ninsns;
>                 if (is_12b_int(imm)) {
> -                       emit(rv_andi(RV_REG_T1, rd, imm), ctx);
> +                       emit_andi(RV_REG_T1, rd, imm, ctx);
>                 } else {
>                         emit_imm(RV_REG_T1, imm, ctx);
> -                       emit(rv_and(RV_REG_T1, rd, RV_REG_T1), ctx);
> +                       emit_and(RV_REG_T1, rd, RV_REG_T1, ctx);
>                 }
>                 /* For jset32, we should clear the upper 32 bits of t1, but
>                  * sign-extension is sufficient here and saves one instruction,
>                  * as t1 is used only in comparison against zero.
>                  */
>                 if (!is64 && imm < 0)
> -                       emit(rv_addiw(RV_REG_T1, RV_REG_T1, 0), ctx);
> +                       emit_addiw(RV_REG_T1, RV_REG_T1, 0, ctx);
>                 e = ctx->ninsns;
>                 rvoff -= RVOFF(e - s);
>                 emit_branch(BPF_JNE, RV_REG_T1, RV_REG_ZERO, rvoff, ctx);
> @@ -887,7 +895,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
>                 }
>
>                 emit_imm(RV_REG_T1, off, ctx);
> -               emit(rv_add(RV_REG_T1, RV_REG_T1, rs), ctx);
> +               emit_add(RV_REG_T1, RV_REG_T1, rs, ctx);
>                 emit(rv_lbu(rd, 0, RV_REG_T1), ctx);
>                 if (insn_is_zext(&insn[1]))
>                         return 1;
> @@ -899,7 +907,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
>                 }
>
>                 emit_imm(RV_REG_T1, off, ctx);
> -               emit(rv_add(RV_REG_T1, RV_REG_T1, rs), ctx);
> +               emit_add(RV_REG_T1, RV_REG_T1, rs, ctx);
>                 emit(rv_lhu(rd, 0, RV_REG_T1), ctx);
>                 if (insn_is_zext(&insn[1]))
>                         return 1;
> @@ -911,20 +919,20 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
>                 }
>
>                 emit_imm(RV_REG_T1, off, ctx);
> -               emit(rv_add(RV_REG_T1, RV_REG_T1, rs), ctx);
> +               emit_add(RV_REG_T1, RV_REG_T1, rs, ctx);
>                 emit(rv_lwu(rd, 0, RV_REG_T1), ctx);
>                 if (insn_is_zext(&insn[1]))
>                         return 1;
>                 break;
>         case BPF_LDX | BPF_MEM | BPF_DW:
>                 if (is_12b_int(off)) {
> -                       emit(rv_ld(rd, off, rs), ctx);
> +                       emit_ld(rd, off, rs, ctx);
>                         break;
>                 }
>
>                 emit_imm(RV_REG_T1, off, ctx);
> -               emit(rv_add(RV_REG_T1, RV_REG_T1, rs), ctx);
> -               emit(rv_ld(rd, 0, RV_REG_T1), ctx);
> +               emit_add(RV_REG_T1, RV_REG_T1, rs, ctx);
> +               emit_ld(rd, 0, RV_REG_T1, ctx);
>                 break;
>
>         /* ST: *(size *)(dst + off) = imm */
> @@ -936,7 +944,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
>                 }
>
>                 emit_imm(RV_REG_T2, off, ctx);
> -               emit(rv_add(RV_REG_T2, RV_REG_T2, rd), ctx);
> +               emit_add(RV_REG_T2, RV_REG_T2, rd, ctx);
>                 emit(rv_sb(RV_REG_T2, 0, RV_REG_T1), ctx);
>                 break;
>
> @@ -948,30 +956,30 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
>                 }
>
>                 emit_imm(RV_REG_T2, off, ctx);
> -               emit(rv_add(RV_REG_T2, RV_REG_T2, rd), ctx);
> +               emit_add(RV_REG_T2, RV_REG_T2, rd, ctx);
>                 emit(rv_sh(RV_REG_T2, 0, RV_REG_T1), ctx);
>                 break;
>         case BPF_ST | BPF_MEM | BPF_W:
>                 emit_imm(RV_REG_T1, imm, ctx);
>                 if (is_12b_int(off)) {
> -                       emit(rv_sw(rd, off, RV_REG_T1), ctx);
> +                       emit_sw(rd, off, RV_REG_T1, ctx);
>                         break;
>                 }
>
>                 emit_imm(RV_REG_T2, off, ctx);
> -               emit(rv_add(RV_REG_T2, RV_REG_T2, rd), ctx);
> -               emit(rv_sw(RV_REG_T2, 0, RV_REG_T1), ctx);
> +               emit_add(RV_REG_T2, RV_REG_T2, rd, ctx);
> +               emit_sw(RV_REG_T2, 0, RV_REG_T1, ctx);
>                 break;
>         case BPF_ST | BPF_MEM | BPF_DW:
>                 emit_imm(RV_REG_T1, imm, ctx);
>                 if (is_12b_int(off)) {
> -                       emit(rv_sd(rd, off, RV_REG_T1), ctx);
> +                       emit_sd(rd, off, RV_REG_T1, ctx);
>                         break;
>                 }
>
>                 emit_imm(RV_REG_T2, off, ctx);
> -               emit(rv_add(RV_REG_T2, RV_REG_T2, rd), ctx);
> -               emit(rv_sd(RV_REG_T2, 0, RV_REG_T1), ctx);
> +               emit_add(RV_REG_T2, RV_REG_T2, rd, ctx);
> +               emit_sd(RV_REG_T2, 0, RV_REG_T1, ctx);
>                 break;
>
>         /* STX: *(size *)(dst + off) = src */
> @@ -982,7 +990,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
>                 }
>
>                 emit_imm(RV_REG_T1, off, ctx);
> -               emit(rv_add(RV_REG_T1, RV_REG_T1, rd), ctx);
> +               emit_add(RV_REG_T1, RV_REG_T1, rd, ctx);
>                 emit(rv_sb(RV_REG_T1, 0, rs), ctx);
>                 break;
>         case BPF_STX | BPF_MEM | BPF_H:
> @@ -992,28 +1000,28 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
>                 }
>
>                 emit_imm(RV_REG_T1, off, ctx);
> -               emit(rv_add(RV_REG_T1, RV_REG_T1, rd), ctx);
> +               emit_add(RV_REG_T1, RV_REG_T1, rd, ctx);
>                 emit(rv_sh(RV_REG_T1, 0, rs), ctx);
>                 break;
>         case BPF_STX | BPF_MEM | BPF_W:
>                 if (is_12b_int(off)) {
> -                       emit(rv_sw(rd, off, rs), ctx);
> +                       emit_sw(rd, off, rs, ctx);
>                         break;
>                 }
>
>                 emit_imm(RV_REG_T1, off, ctx);
> -               emit(rv_add(RV_REG_T1, RV_REG_T1, rd), ctx);
> -               emit(rv_sw(RV_REG_T1, 0, rs), ctx);
> +               emit_add(RV_REG_T1, RV_REG_T1, rd, ctx);
> +               emit_sw(RV_REG_T1, 0, rs, ctx);
>                 break;
>         case BPF_STX | BPF_MEM | BPF_DW:
>                 if (is_12b_int(off)) {
> -                       emit(rv_sd(rd, off, rs), ctx);
> +                       emit_sd(rd, off, rs, ctx);
>                         break;
>                 }
>
>                 emit_imm(RV_REG_T1, off, ctx);
> -               emit(rv_add(RV_REG_T1, RV_REG_T1, rd), ctx);
> -               emit(rv_sd(RV_REG_T1, 0, rs), ctx);
> +               emit_add(RV_REG_T1, RV_REG_T1, rd, ctx);
> +               emit_sd(RV_REG_T1, 0, rs, ctx);
>                 break;
>         /* STX XADD: lock *(u32 *)(dst + off) += src */
>         case BPF_STX | BPF_XADD | BPF_W:
> @@ -1021,10 +1029,10 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
>         case BPF_STX | BPF_XADD | BPF_DW:
>                 if (off) {
>                         if (is_12b_int(off)) {
> -                               emit(rv_addi(RV_REG_T1, rd, off), ctx);
> +                               emit_addi(RV_REG_T1, rd, off, ctx);
>                         } else {
>                                 emit_imm(RV_REG_T1, off, ctx);
> -                               emit(rv_add(RV_REG_T1, RV_REG_T1, rd), ctx);
> +                               emit_add(RV_REG_T1, RV_REG_T1, rd, ctx);
>                         }
>
>                         rd = RV_REG_T1;
> @@ -1073,52 +1081,53 @@ void bpf_jit_build_prologue(struct rv_jit_context *ctx)
>
>         /* First instruction is always setting the tail-call-counter
>          * (TCC) register. This instruction is skipped for tail calls.
> +        * Force using a 4-byte (non-compressed) instruction.
>          */
>         emit(rv_addi(RV_REG_TCC, RV_REG_ZERO, MAX_TAIL_CALL_CNT), ctx);
>
> -       emit(rv_addi(RV_REG_SP, RV_REG_SP, -stack_adjust), ctx);
> +       emit_addi(RV_REG_SP, RV_REG_SP, -stack_adjust, ctx);
>
>         if (seen_reg(RV_REG_RA, ctx)) {
> -               emit(rv_sd(RV_REG_SP, store_offset, RV_REG_RA), ctx);
> +               emit_sd(RV_REG_SP, store_offset, RV_REG_RA, ctx);
>                 store_offset -= 8;
>         }
> -       emit(rv_sd(RV_REG_SP, store_offset, RV_REG_FP), ctx);
> +       emit_sd(RV_REG_SP, store_offset, RV_REG_FP, ctx);
>         store_offset -= 8;
>         if (seen_reg(RV_REG_S1, ctx)) {
> -               emit(rv_sd(RV_REG_SP, store_offset, RV_REG_S1), ctx);
> +               emit_sd(RV_REG_SP, store_offset, RV_REG_S1, ctx);
>                 store_offset -= 8;
>         }
>         if (seen_reg(RV_REG_S2, ctx)) {
> -               emit(rv_sd(RV_REG_SP, store_offset, RV_REG_S2), ctx);
> +               emit_sd(RV_REG_SP, store_offset, RV_REG_S2, ctx);
>                 store_offset -= 8;
>         }
>         if (seen_reg(RV_REG_S3, ctx)) {
> -               emit(rv_sd(RV_REG_SP, store_offset, RV_REG_S3), ctx);
> +               emit_sd(RV_REG_SP, store_offset, RV_REG_S3, ctx);
>                 store_offset -= 8;
>         }
>         if (seen_reg(RV_REG_S4, ctx)) {
> -               emit(rv_sd(RV_REG_SP, store_offset, RV_REG_S4), ctx);
> +               emit_sd(RV_REG_SP, store_offset, RV_REG_S4, ctx);
>                 store_offset -= 8;
>         }
>         if (seen_reg(RV_REG_S5, ctx)) {
> -               emit(rv_sd(RV_REG_SP, store_offset, RV_REG_S5), ctx);
> +               emit_sd(RV_REG_SP, store_offset, RV_REG_S5, ctx);
>                 store_offset -= 8;
>         }
>         if (seen_reg(RV_REG_S6, ctx)) {
> -               emit(rv_sd(RV_REG_SP, store_offset, RV_REG_S6), ctx);
> +               emit_sd(RV_REG_SP, store_offset, RV_REG_S6, ctx);
>                 store_offset -= 8;
>         }
>
> -       emit(rv_addi(RV_REG_FP, RV_REG_SP, stack_adjust), ctx);
> +       emit_addi(RV_REG_FP, RV_REG_SP, stack_adjust, ctx);
>
>         if (bpf_stack_adjust)
> -               emit(rv_addi(RV_REG_S5, RV_REG_SP, bpf_stack_adjust), ctx);
> +               emit_addi(RV_REG_S5, RV_REG_SP, bpf_stack_adjust, ctx);
>
>         /* Program contains calls and tail calls, so RV_REG_TCC needs
>          * to be saved across calls.
>          */
>         if (seen_tail_call(ctx) && seen_call(ctx))
> -               emit(rv_addi(RV_REG_TCC_SAVED, RV_REG_TCC, 0), ctx);
> +               emit_mv(RV_REG_TCC_SAVED, RV_REG_TCC, ctx);
>
>         ctx->stack_size = stack_adjust;
>  }
> --
> 2.25.1
>
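
One note for anyone skimming the diff above: each emit_* helper simply
picks the compressed encoding when its operand constraints allow it and
falls back to the regular 4-byte form otherwise. As a rough sketch of
the shape (my own illustration, not the series' exact helper --
rvc_enabled(), is_6b_int(), emitc(), and rvc_addi() are assumed from
patches 1/3 and 2/3, and the real logic also covers the c.addi16sp and
c.addi4spn stack-pointer forms visible in the prologue):

  static inline void emit_addi(u8 rd, u8 rs, s32 imm,
                               struct rv_jit_context *ctx)
  {
          if (rvc_enabled() && rd && rd == rs && imm && is_6b_int(imm))
                  emitc(rvc_addi(rd, imm), ctx);   /* 2-byte c.addi */
          else
                  emit(rv_addi(rd, rs, imm), ctx); /* 4-byte addi */
  }

The c.addi constraints (rd == rs, rd != x0, nonzero 6-bit signed
immediate) are why the 4-byte fallback always has to remain.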


* Re: [RFC PATCH bpf-next 0/3] bpf, riscv: Add compressed instructions to rv64 JIT
  2020-07-15 17:56 ` [RFC PATCH bpf-next 0/3] bpf, riscv: Add compressed instructions to " Björn Töpel
@ 2020-07-15 20:08   ` Luke Nelson
  0 siblings, 0 replies; 9+ messages in thread
From: Luke Nelson @ 2020-07-15 20:08 UTC (permalink / raw)
  To: Björn Töpel
  Cc: Song Liu, Albert Ou, Daniel Borkmann, Netdev, John Fastabend,
	Alexei Starovoitov, linux-riscv, LKML, Luke Nelson,
	Palmer Dabbelt, Paul Walmsley, KP Singh, Yonghong Song, bpf,
	Andrii Nakryiko, Martin KaFai Lau, Xi Wang

>
> First of all: really nice work. I like this, and it makes the code
> easier to read as well (e.g., emit_mv). I'm a bit curious why you
> only did it for RV64 and not RV32? I have some minor comments on the
> patches. I strongly encourage you to submit this as a proper
> (non-RFC) set for bpf-next.
>

Thanks for the feedback! I'll clean up the patches and address your
comments in the next revision.

The patch adding RVC to the RV32 JIT is forthcoming; some of the
optimizations there are more difficult since the RV32 JIT makes more
use of "internal" jumps whose offsets depend on the sizes of emitted
instructions. I plan to clean up that code and add RVC support in a
future series.
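
To make that concrete, here is a rough illustration (my own sketch, not
code from either JIT; the names are placeholders, and u16/u32 are as in
<linux/types.h>): once emitted instructions can be 2 or 4 bytes, code
has to be measured in 2-byte units rather than in instruction counts,
and every jump offset derived from those units -- which is what the
RVOFF() conversion in the rv64 patches provides.

  struct jit_ctx {
          u16 *insns;   /* emitted code, as an array of 2-byte units */
          int ninsns;   /* code length so far, in 2-byte units */
  };

  /* A regular 4-byte instruction advances the counter by two units
   * (stored low half first, since riscv is little-endian).
   */
  static void emit4(u32 insn, struct jit_ctx *ctx)
  {
          if (ctx->insns) {
                  ctx->insns[ctx->ninsns] = (u16)insn;
                  ctx->insns[ctx->ninsns + 1] = (u16)(insn >> 16);
          }
          ctx->ninsns += 2;
  }

  /* A compressed 2-byte instruction advances it by one unit. */
  static void emit2(u16 insn, struct jit_ctx *ctx)
  {
          if (ctx->insns)
                  ctx->insns[ctx->ninsns] = insn;
          ctx->ninsns++;
  }

  /* Jump offset in bytes between two snapshots of ctx->ninsns. */
  static int offset_bytes(int start, int end)
  {
          return (end - start) << 1;
  }

The RV32 JIT's internal jumps effectively assume fixed-size
instructions today, so that bookkeeping is the part that needs
reworking before RVC can be enabled there.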

- Luke

