bpf.vger.kernel.org archive mirror
* [PATCH bpf-next v1 0/4] Add BPF JIT support for LoongArch
@ 2022-08-20 11:50 Tiezhu Yang
  2022-08-20 11:50 ` [PATCH bpf-next v1 1/4] LoongArch: Move {signed,unsigned}_imm_check() to inst.h Tiezhu Yang
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Tiezhu Yang @ 2022-08-20 11:50 UTC (permalink / raw)
  To: Huacai Chen, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: bpf, loongarch

Basic support for LoongArch was merged into the upstream Linux kernel in
5.19-rc1 on June 5, 2022. This patch series adds BPF JIT support for
LoongArch.

Here is the LoongArch documentation:
https://www.kernel.org/doc/html/latest/loongarch/index.html

With this patch series, the test cases in lib/test_bpf.ko have passed
on LoongArch.

  # echo 1 > /proc/sys/net/core/bpf_jit_enable
  # modprobe test_bpf
  # dmesg | grep Summary
  test_bpf: Summary: 1026 PASSED, 0 FAILED, [1014/1014 JIT'ed]
  test_bpf: test_tail_calls: Summary: 10 PASSED, 0 FAILED, [10/10 JIT'ed]
  test_bpf: test_skb_segment: Summary: 2 PASSED, 0 FAILED

Note that this patch series can not be applied cleanly to bpf-next,
which has not yet been synced to v6.0-rc1.

v1:
  -- Rebased series on v6.0-rc1
  -- Move {signed,unsigned}_imm_check() to inst.h
  -- Define the imm field as "unsigned int" in the instruction format
  -- Use DEF_EMIT_*_FORMAT to define the same kind of instructions
  -- Use "stack_adjust += sizeof(long) * 8" in build_prologue()

RFC:
  https://lore.kernel.org/bpf/1660013580-19053-1-git-send-email-yangtiezhu@loongson.cn/

Tiezhu Yang (4):
  LoongArch: Move {signed,unsigned}_imm_check() to inst.h
  LoongArch: Add some instruction opcodes and formats
  LoongArch: Add BPF JIT support
  LoongArch: Enable BPF_JIT and TEST_BPF in default config

 arch/loongarch/Kbuild                      |    1 +
 arch/loongarch/Kconfig                     |    1 +
 arch/loongarch/configs/loongson3_defconfig |    2 +
 arch/loongarch/include/asm/inst.h          |  317 +++++++-
 arch/loongarch/kernel/module.c             |   10 -
 arch/loongarch/net/Makefile                |    7 +
 arch/loongarch/net/bpf_jit.c               | 1113 ++++++++++++++++++++++++++++
 arch/loongarch/net/bpf_jit.h               |  308 ++++++++
 8 files changed, 1744 insertions(+), 15 deletions(-)
 create mode 100644 arch/loongarch/net/Makefile
 create mode 100644 arch/loongarch/net/bpf_jit.c
 create mode 100644 arch/loongarch/net/bpf_jit.h

-- 
2.1.0


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH bpf-next v1 1/4] LoongArch: Move {signed,unsigned}_imm_check() to inst.h
  2022-08-20 11:50 [PATCH bpf-next v1 0/4] Add BPF JIT support for LoongArch Tiezhu Yang
@ 2022-08-20 11:50 ` Tiezhu Yang
  2022-08-20 11:50 ` [PATCH bpf-next v1 2/4] LoongArch: Add some instruction opcodes and formats Tiezhu Yang
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 15+ messages in thread
From: Tiezhu Yang @ 2022-08-20 11:50 UTC (permalink / raw)
  To: Huacai Chen, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: bpf, loongarch

{signed,unsigned}_imm_check() will also be used in the BPF JIT, so move
them from module.c to inst.h as preparation for a later patch.

By the way, there is no need to explicitly include asm/inst.h in module.c,
because the header file is already included indirectly:

  arch/loongarch/kernel/module.c
    include/linux/moduleloader.h
      include/linux/module.h
        arch/loongarch/include/asm/module.h
          arch/loongarch/include/asm/inst.h

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
---
 arch/loongarch/include/asm/inst.h | 10 ++++++++++
 arch/loongarch/kernel/module.c    | 10 ----------
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/loongarch/include/asm/inst.h b/arch/loongarch/include/asm/inst.h
index 7b07cbb..7b37509 100644
--- a/arch/loongarch/include/asm/inst.h
+++ b/arch/loongarch/include/asm/inst.h
@@ -166,4 +166,14 @@ u32 larch_insn_gen_lu32id(enum loongarch_gpr rd, int imm);
 u32 larch_insn_gen_lu52id(enum loongarch_gpr rd, enum loongarch_gpr rj, int imm);
 u32 larch_insn_gen_jirl(enum loongarch_gpr rd, enum loongarch_gpr rj, unsigned long pc, unsigned long dest);
 
+static inline bool signed_imm_check(long val, unsigned int bit)
+{
+	return -(1L << (bit - 1)) <= val && val < (1L << (bit - 1));
+}
+
+static inline bool unsigned_imm_check(unsigned long val, unsigned int bit)
+{
+	return val < (1UL << bit);
+}
+
 #endif /* _ASM_INST_H */
diff --git a/arch/loongarch/kernel/module.c b/arch/loongarch/kernel/module.c
index 638427f..edaee67 100644
--- a/arch/loongarch/kernel/module.c
+++ b/arch/loongarch/kernel/module.c
@@ -18,16 +18,6 @@
 #include <linux/string.h>
 #include <linux/kernel.h>
 
-static inline bool signed_imm_check(long val, unsigned int bit)
-{
-	return -(1L << (bit - 1)) <= val && val < (1L << (bit - 1));
-}
-
-static inline bool unsigned_imm_check(unsigned long val, unsigned int bit)
-{
-	return val < (1UL << bit);
-}
-
 static int rela_stack_push(s64 stack_value, s64 *rela_stack, size_t *rela_stack_top)
 {
 	if (*rela_stack_top >= RELA_STACK_DEPTH)
-- 
2.1.0



* [PATCH bpf-next v1 2/4] LoongArch: Add some instruction opcodes and formats
  2022-08-20 11:50 [PATCH bpf-next v1 0/4] Add BPF JIT support for LoongArch Tiezhu Yang
  2022-08-20 11:50 ` [PATCH bpf-next v1 1/4] LoongArch: Move {signed,unsigned}_imm_check() to inst.h Tiezhu Yang
@ 2022-08-20 11:50 ` Tiezhu Yang
  2022-08-20 11:50 ` [PATCH bpf-next v1 3/4] LoongArch: Add BPF JIT support Tiezhu Yang
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 15+ messages in thread
From: Tiezhu Yang @ 2022-08-20 11:50 UTC (permalink / raw)
  To: Huacai Chen, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: bpf, loongarch

According to the "Table of Instruction Encoding" in LoongArch Reference
Manual [1], add some instruction opcodes and formats which are used in
the BPF JIT for LoongArch.

[1] https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#table-of-instruction-encoding

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
---
 arch/loongarch/include/asm/inst.h | 122 ++++++++++++++++++++++++++++++++++++--
 1 file changed, 117 insertions(+), 5 deletions(-)

diff --git a/arch/loongarch/include/asm/inst.h b/arch/loongarch/include/asm/inst.h
index 7b37509..de19a96 100644
--- a/arch/loongarch/include/asm/inst.h
+++ b/arch/loongarch/include/asm/inst.h
@@ -8,6 +8,8 @@
 #include <linux/types.h>
 #include <asm/asm.h>
 
+#define INSN_BREAK		0x002a0000
+
 #define ADDR_IMMMASK_LU52ID	0xFFF0000000000000
 #define ADDR_IMMMASK_LU32ID	0x000FFFFF00000000
 #define ADDR_IMMMASK_ADDU16ID	0x00000000FFFF0000
@@ -18,9 +20,14 @@
 
 #define ADDR_IMM(addr, INSN)	((addr & ADDR_IMMMASK_##INSN) >> ADDR_IMMSHIFT_##INSN)
 
+enum reg0i26_op {
+	b_op		= 0x14,
+};
+
 enum reg1i20_op {
 	lu12iw_op	= 0x0a,
 	lu32id_op	= 0x0b,
+	pcaddu18i_op	= 0x0f,
 };
 
 enum reg1i21_op {
@@ -28,10 +35,31 @@ enum reg1i21_op {
 	bnez_op		= 0x11,
 };
 
+enum reg2_op {
+	revb2h_op	= 0x0c,
+	revb2w_op	= 0x0e,
+	revbd_op	= 0x0f,
+};
+
+enum reg2i5_op {
+	slliw_op	= 0x81,
+	srliw_op	= 0x89,
+	sraiw_op	= 0x91,
+};
+
+enum reg2i6_op {
+	sllid_op	= 0x41,
+	srlid_op	= 0x45,
+	sraid_op	= 0x49,
+};
+
 enum reg2i12_op {
 	addiw_op	= 0x0a,
 	addid_op	= 0x0b,
 	lu52id_op	= 0x0c,
+	andi_op		= 0x0d,
+	ori_op		= 0x0e,
+	xori_op		= 0x0f,
 	ldb_op		= 0xa0,
 	ldh_op		= 0xa1,
 	ldw_op		= 0xa2,
@@ -40,6 +68,16 @@ enum reg2i12_op {
 	sth_op		= 0xa5,
 	stw_op		= 0xa6,
 	std_op		= 0xa7,
+	ldbu_op		= 0xa8,
+	ldhu_op		= 0xa9,
+	ldwu_op		= 0xaa,
+};
+
+enum reg2i14_op {
+	llw_op		= 0x20,
+	scw_op		= 0x21,
+	lld_op		= 0x22,
+	scd_op		= 0x23,
 };
 
 enum reg2i16_op {
@@ -52,6 +90,41 @@ enum reg2i16_op {
 	bgeu_op		= 0x1b,
 };
 
+enum reg3_op {
+	addd_op		= 0x21,
+	subd_op		= 0x23,
+	and_op		= 0x29,
+	or_op		= 0x2a,
+	xor_op		= 0x2b,
+	sllw_op		= 0x2e,
+	srlw_op		= 0x2f,
+	sraw_op		= 0x30,
+	slld_op		= 0x31,
+	srld_op		= 0x32,
+	srad_op		= 0x33,
+	muld_op		= 0x3b,
+	divdu_op	= 0x46,
+	moddu_op	= 0x47,
+	ldxd_op		= 0x7018,
+	stxb_op		= 0x7020,
+	stxh_op		= 0x7028,
+	stxw_op		= 0x7030,
+	stxd_op		= 0x7038,
+	ldxbu_op	= 0x7040,
+	ldxhu_op	= 0x7048,
+	ldxwu_op	= 0x7050,
+	amswapw_op	= 0x70c0,
+	amswapd_op	= 0x70c1,
+	amaddw_op	= 0x70c2,
+	amaddd_op	= 0x70c3,
+	amandw_op	= 0x70c4,
+	amandd_op	= 0x70c5,
+	amorw_op	= 0x70c6,
+	amord_op	= 0x70c7,
+	amxorw_op	= 0x70c8,
+	amxord_op	= 0x70c9,
+};
+
 struct reg0i26_format {
 	unsigned int immediate_h : 10;
 	unsigned int immediate_l : 16;
@@ -71,6 +144,26 @@ struct reg1i21_format {
 	unsigned int opcode : 6;
 };
 
+struct reg2_format {
+	unsigned int rd : 5;
+	unsigned int rj : 5;
+	unsigned int opcode : 22;
+};
+
+struct reg2i5_format {
+	unsigned int rd : 5;
+	unsigned int rj : 5;
+	unsigned int immediate : 5;
+	unsigned int opcode : 17;
+};
+
+struct reg2i6_format {
+	unsigned int rd : 5;
+	unsigned int rj : 5;
+	unsigned int immediate : 6;
+	unsigned int opcode : 16;
+};
+
 struct reg2i12_format {
 	unsigned int rd : 5;
 	unsigned int rj : 5;
@@ -78,6 +171,13 @@ struct reg2i12_format {
 	unsigned int opcode : 10;
 };
 
+struct reg2i14_format {
+	unsigned int rd : 5;
+	unsigned int rj : 5;
+	unsigned int immediate : 14;
+	unsigned int opcode : 8;
+};
+
 struct reg2i16_format {
 	unsigned int rd : 5;
 	unsigned int rj : 5;
@@ -85,13 +185,25 @@ struct reg2i16_format {
 	unsigned int opcode : 6;
 };
 
+struct reg3_format {
+	unsigned int rd : 5;
+	unsigned int rj : 5;
+	unsigned int rk : 5;
+	unsigned int opcode : 17;
+};
+
 union loongarch_instruction {
 	unsigned int word;
-	struct reg0i26_format reg0i26_format;
-	struct reg1i20_format reg1i20_format;
-	struct reg1i21_format reg1i21_format;
-	struct reg2i12_format reg2i12_format;
-	struct reg2i16_format reg2i16_format;
+	struct reg0i26_format	reg0i26_format;
+	struct reg1i20_format	reg1i20_format;
+	struct reg1i21_format	reg1i21_format;
+	struct reg2_format	reg2_format;
+	struct reg2i5_format	reg2i5_format;
+	struct reg2i6_format	reg2i6_format;
+	struct reg2i12_format	reg2i12_format;
+	struct reg2i14_format	reg2i14_format;
+	struct reg2i16_format	reg2i16_format;
+	struct reg3_format	reg3_format;
 };
 
 #define LOONGARCH_INSN_SIZE	sizeof(union loongarch_instruction)
-- 
2.1.0



* [PATCH bpf-next v1 3/4] LoongArch: Add BPF JIT support
  2022-08-20 11:50 [PATCH bpf-next v1 0/4] Add BPF JIT support for LoongArch Tiezhu Yang
  2022-08-20 11:50 ` [PATCH bpf-next v1 1/4] LoongArch: Move {signed,unsigned}_imm_check() to inst.h Tiezhu Yang
  2022-08-20 11:50 ` [PATCH bpf-next v1 2/4] LoongArch: Add some instruction opcodes and formats Tiezhu Yang
@ 2022-08-20 11:50 ` Tiezhu Yang
  2022-08-20 13:41   ` kernel test robot
                     ` (2 more replies)
  2022-08-20 11:51 ` [PATCH bpf-next v1 4/4] LoongArch: Enable BPF_JIT and TEST_BPF in default config Tiezhu Yang
  2022-08-22  1:36 ` [PATCH bpf-next v1 0/4] Add BPF JIT support for LoongArch Tiezhu Yang
  4 siblings, 3 replies; 15+ messages in thread
From: Tiezhu Yang @ 2022-08-20 11:50 UTC (permalink / raw)
  To: Huacai Chen, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: bpf, loongarch

BPF programs are normally handled by the BPF interpreter. Add BPF JIT
support for LoongArch so that the kernel can generate native code when a
program is loaded into the kernel; this significantly speeds up the
processing of BPF programs.

Co-developed-by: Youling Tang <tangyouling@loongson.cn>
Signed-off-by: Youling Tang <tangyouling@loongson.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
---
 arch/loongarch/Kbuild             |    1 +
 arch/loongarch/Kconfig            |    1 +
 arch/loongarch/include/asm/inst.h |  185 ++++++
 arch/loongarch/net/Makefile       |    7 +
 arch/loongarch/net/bpf_jit.c      | 1113 +++++++++++++++++++++++++++++++++++++
 arch/loongarch/net/bpf_jit.h      |  308 ++++++++++
 6 files changed, 1615 insertions(+)
 create mode 100644 arch/loongarch/net/Makefile
 create mode 100644 arch/loongarch/net/bpf_jit.c
 create mode 100644 arch/loongarch/net/bpf_jit.h

diff --git a/arch/loongarch/Kbuild b/arch/loongarch/Kbuild
index ab5373d..b01f5cd 100644
--- a/arch/loongarch/Kbuild
+++ b/arch/loongarch/Kbuild
@@ -1,5 +1,6 @@
 obj-y += kernel/
 obj-y += mm/
+obj-y += net/
 obj-y += vdso/
 
 # for cleaning
diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 4abc9a2..6d9d846 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -82,6 +82,7 @@ config LOONGARCH
 	select HAVE_CONTEXT_TRACKING_USER
 	select HAVE_DEBUG_STACKOVERFLOW
 	select HAVE_DMA_CONTIGUOUS
+	select HAVE_EBPF_JIT if 64BIT
 	select HAVE_EXIT_THREAD
 	select HAVE_FAST_GUP
 	select HAVE_GENERIC_VDSO
diff --git a/arch/loongarch/include/asm/inst.h b/arch/loongarch/include/asm/inst.h
index de19a96..ac06f2e 100644
--- a/arch/loongarch/include/asm/inst.h
+++ b/arch/loongarch/include/asm/inst.h
@@ -288,4 +288,189 @@ static inline bool unsigned_imm_check(unsigned long val, unsigned int bit)
 	return val < (1UL << bit);
 }
 
+#define DEF_EMIT_REG0I26_FORMAT(NAME, OP)				\
+static inline void emit_##NAME(union loongarch_instruction *insn,	\
+			       int offset)				\
+{									\
+	unsigned int immediate_l, immediate_h;				\
+									\
+	immediate_l = offset & 0xffff;					\
+	offset >>= 16;							\
+	immediate_h = offset & 0x3ff;					\
+									\
+	insn->reg0i26_format.opcode = OP;				\
+	insn->reg0i26_format.immediate_l = immediate_l;			\
+	insn->reg0i26_format.immediate_h = immediate_h;			\
+}
+
+DEF_EMIT_REG0I26_FORMAT(b, b_op)
+
+#define DEF_EMIT_REG1I20_FORMAT(NAME, OP)				\
+static inline void emit_##NAME(union loongarch_instruction *insn,	\
+			       enum loongarch_gpr rd, int imm)		\
+{									\
+	insn->reg1i20_format.opcode = OP;				\
+	insn->reg1i20_format.immediate = imm;				\
+	insn->reg1i20_format.rd = rd;					\
+}
+
+DEF_EMIT_REG1I20_FORMAT(lu12iw, lu12iw_op)
+DEF_EMIT_REG1I20_FORMAT(lu32id, lu32id_op)
+DEF_EMIT_REG1I20_FORMAT(pcaddu18i, pcaddu18i_op)
+
+#define DEF_EMIT_REG2_FORMAT(NAME, OP)					\
+static inline void emit_##NAME(union loongarch_instruction *insn,	\
+			       enum loongarch_gpr rd,			\
+			       enum loongarch_gpr rj)			\
+{									\
+	insn->reg2_format.opcode = OP;					\
+	insn->reg2_format.rd = rd;					\
+	insn->reg2_format.rj = rj;					\
+}
+
+DEF_EMIT_REG2_FORMAT(revb2h, revb2h_op)
+DEF_EMIT_REG2_FORMAT(revb2w, revb2w_op)
+DEF_EMIT_REG2_FORMAT(revbd, revbd_op)
+
+#define DEF_EMIT_REG2I5_FORMAT(NAME, OP)				\
+static inline void emit_##NAME(union loongarch_instruction *insn,	\
+			       enum loongarch_gpr rd,			\
+			       enum loongarch_gpr rj,			\
+			       int imm)					\
+{									\
+	insn->reg2i5_format.opcode = OP;				\
+	insn->reg2i5_format.immediate = imm;				\
+	insn->reg2i5_format.rd = rd;					\
+	insn->reg2i5_format.rj = rj;					\
+}
+
+DEF_EMIT_REG2I5_FORMAT(slliw, slliw_op)
+DEF_EMIT_REG2I5_FORMAT(srliw, srliw_op)
+DEF_EMIT_REG2I5_FORMAT(sraiw, sraiw_op)
+
+#define DEF_EMIT_REG2I6_FORMAT(NAME, OP)				\
+static inline void emit_##NAME(union loongarch_instruction *insn,	\
+			       enum loongarch_gpr rd,			\
+			       enum loongarch_gpr rj,			\
+			       int imm)					\
+{									\
+	insn->reg2i6_format.opcode = OP;				\
+	insn->reg2i6_format.immediate = imm;				\
+	insn->reg2i6_format.rd = rd;					\
+	insn->reg2i6_format.rj = rj;					\
+}
+
+DEF_EMIT_REG2I6_FORMAT(sllid, sllid_op)
+DEF_EMIT_REG2I6_FORMAT(srlid, srlid_op)
+DEF_EMIT_REG2I6_FORMAT(sraid, sraid_op)
+
+#define DEF_EMIT_REG2I12_FORMAT(NAME, OP)				\
+static inline void emit_##NAME(union loongarch_instruction *insn,	\
+			       enum loongarch_gpr rd,			\
+			       enum loongarch_gpr rj,			\
+			       int imm)					\
+{									\
+	insn->reg2i12_format.opcode = OP;				\
+	insn->reg2i12_format.immediate = imm;				\
+	insn->reg2i12_format.rd = rd;					\
+	insn->reg2i12_format.rj = rj;					\
+}
+
+DEF_EMIT_REG2I12_FORMAT(addiw, addiw_op)
+DEF_EMIT_REG2I12_FORMAT(addid, addid_op)
+DEF_EMIT_REG2I12_FORMAT(lu52id, lu52id_op)
+DEF_EMIT_REG2I12_FORMAT(andi, andi_op)
+DEF_EMIT_REG2I12_FORMAT(ori, ori_op)
+DEF_EMIT_REG2I12_FORMAT(xori, xori_op)
+DEF_EMIT_REG2I12_FORMAT(ldbu, ldbu_op)
+DEF_EMIT_REG2I12_FORMAT(ldhu, ldhu_op)
+DEF_EMIT_REG2I12_FORMAT(ldwu, ldwu_op)
+DEF_EMIT_REG2I12_FORMAT(ldd, ldd_op)
+DEF_EMIT_REG2I12_FORMAT(stb, stb_op)
+DEF_EMIT_REG2I12_FORMAT(sth, sth_op)
+DEF_EMIT_REG2I12_FORMAT(stw, stw_op)
+DEF_EMIT_REG2I12_FORMAT(std, std_op)
+
+#define DEF_EMIT_REG2I14_FORMAT(NAME, OP)				\
+static inline void emit_##NAME(union loongarch_instruction *insn,	\
+			       enum loongarch_gpr rd,			\
+			       enum loongarch_gpr rj,			\
+			       int imm)					\
+{									\
+	insn->reg2i14_format.opcode = OP;				\
+	insn->reg2i14_format.immediate = imm;				\
+	insn->reg2i14_format.rd = rd;					\
+	insn->reg2i14_format.rj = rj;					\
+}
+
+DEF_EMIT_REG2I14_FORMAT(llw, llw_op)
+DEF_EMIT_REG2I14_FORMAT(scw, scw_op)
+DEF_EMIT_REG2I14_FORMAT(lld, lld_op)
+DEF_EMIT_REG2I14_FORMAT(scd, scd_op)
+
+#define DEF_EMIT_REG2I16_FORMAT(NAME, OP)				\
+static inline void emit_##NAME(union loongarch_instruction *insn,	\
+			       enum loongarch_gpr rj,			\
+			       enum loongarch_gpr rd,			\
+			       int offset)				\
+{									\
+	insn->reg2i16_format.opcode = OP;				\
+	insn->reg2i16_format.immediate = offset;			\
+	insn->reg2i16_format.rj = rj;					\
+	insn->reg2i16_format.rd = rd;					\
+}
+
+DEF_EMIT_REG2I16_FORMAT(beq, beq_op)
+DEF_EMIT_REG2I16_FORMAT(bne, bne_op)
+DEF_EMIT_REG2I16_FORMAT(blt, blt_op)
+DEF_EMIT_REG2I16_FORMAT(bge, bge_op)
+DEF_EMIT_REG2I16_FORMAT(bltu, bltu_op)
+DEF_EMIT_REG2I16_FORMAT(bgeu, bgeu_op)
+DEF_EMIT_REG2I16_FORMAT(jirl, jirl_op)
+
+#define DEF_EMIT_REG3_FORMAT(NAME, OP)					\
+static inline void emit_##NAME(union loongarch_instruction *insn,	\
+			       enum loongarch_gpr rd,			\
+			       enum loongarch_gpr rj,			\
+			       enum loongarch_gpr rk)			\
+{									\
+	insn->reg3_format.opcode = OP;					\
+	insn->reg3_format.rd = rd;					\
+	insn->reg3_format.rj = rj;					\
+	insn->reg3_format.rk = rk;					\
+}
+
+DEF_EMIT_REG3_FORMAT(addd, addd_op)
+DEF_EMIT_REG3_FORMAT(subd, subd_op)
+DEF_EMIT_REG3_FORMAT(muld, muld_op)
+DEF_EMIT_REG3_FORMAT(divdu, divdu_op)
+DEF_EMIT_REG3_FORMAT(moddu, moddu_op)
+DEF_EMIT_REG3_FORMAT(and, and_op)
+DEF_EMIT_REG3_FORMAT(or, or_op)
+DEF_EMIT_REG3_FORMAT(xor, xor_op)
+DEF_EMIT_REG3_FORMAT(sllw, sllw_op)
+DEF_EMIT_REG3_FORMAT(slld, slld_op)
+DEF_EMIT_REG3_FORMAT(srlw, srlw_op)
+DEF_EMIT_REG3_FORMAT(srld, srld_op)
+DEF_EMIT_REG3_FORMAT(sraw, sraw_op)
+DEF_EMIT_REG3_FORMAT(srad, srad_op)
+DEF_EMIT_REG3_FORMAT(ldxbu, ldxbu_op)
+DEF_EMIT_REG3_FORMAT(ldxhu, ldxhu_op)
+DEF_EMIT_REG3_FORMAT(ldxwu, ldxwu_op)
+DEF_EMIT_REG3_FORMAT(ldxd, ldxd_op)
+DEF_EMIT_REG3_FORMAT(stxb, stxb_op)
+DEF_EMIT_REG3_FORMAT(stxh, stxh_op)
+DEF_EMIT_REG3_FORMAT(stxw, stxw_op)
+DEF_EMIT_REG3_FORMAT(stxd, stxd_op)
+DEF_EMIT_REG3_FORMAT(amaddw, amaddw_op)
+DEF_EMIT_REG3_FORMAT(amaddd, amaddd_op)
+DEF_EMIT_REG3_FORMAT(amandw, amandw_op)
+DEF_EMIT_REG3_FORMAT(amandd, amandd_op)
+DEF_EMIT_REG3_FORMAT(amorw, amorw_op)
+DEF_EMIT_REG3_FORMAT(amord, amord_op)
+DEF_EMIT_REG3_FORMAT(amxorw, amxorw_op)
+DEF_EMIT_REG3_FORMAT(amxord, amxord_op)
+DEF_EMIT_REG3_FORMAT(amswapw, amswapw_op)
+DEF_EMIT_REG3_FORMAT(amswapd, amswapd_op)
+
 #endif /* _ASM_INST_H */
diff --git a/arch/loongarch/net/Makefile b/arch/loongarch/net/Makefile
new file mode 100644
index 0000000..1ec12a0
--- /dev/null
+++ b/arch/loongarch/net/Makefile
@@ -0,0 +1,7 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Makefile for arch/loongarch/net
+#
+# Copyright (C) 2022 Loongson Technology Corporation Limited
+#
+obj-$(CONFIG_BPF_JIT) += bpf_jit.o
diff --git a/arch/loongarch/net/bpf_jit.c b/arch/loongarch/net/bpf_jit.c
new file mode 100644
index 0000000..2f41b9b
--- /dev/null
+++ b/arch/loongarch/net/bpf_jit.c
@@ -0,0 +1,1113 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * BPF JIT compiler for LoongArch
+ *
+ * Copyright (C) 2022 Loongson Technology Corporation Limited
+ */
+#include "bpf_jit.h"
+
+#define REG_TCC		LOONGARCH_GPR_A6
+#define TCC_SAVED	LOONGARCH_GPR_S5
+
+#define SAVE_RA		BIT(0)
+#define SAVE_TCC	BIT(1)
+
+static const int regmap[] = {
+	/* return value from in-kernel function, and exit value for eBPF program */
+	[BPF_REG_0] = LOONGARCH_GPR_A5,
+	/* arguments from eBPF program to in-kernel function */
+	[BPF_REG_1] = LOONGARCH_GPR_A0,
+	[BPF_REG_2] = LOONGARCH_GPR_A1,
+	[BPF_REG_3] = LOONGARCH_GPR_A2,
+	[BPF_REG_4] = LOONGARCH_GPR_A3,
+	[BPF_REG_5] = LOONGARCH_GPR_A4,
+	/* callee saved registers that in-kernel function will preserve */
+	[BPF_REG_6] = LOONGARCH_GPR_S0,
+	[BPF_REG_7] = LOONGARCH_GPR_S1,
+	[BPF_REG_8] = LOONGARCH_GPR_S2,
+	[BPF_REG_9] = LOONGARCH_GPR_S3,
+	/* read-only frame pointer to access stack */
+	[BPF_REG_FP] = LOONGARCH_GPR_S4,
+	/* temporary register for blinding constants */
+	[BPF_REG_AX] = LOONGARCH_GPR_T0,
+};
+
+static void mark_call(struct jit_ctx *ctx)
+{
+	ctx->flags |= SAVE_RA;
+}
+
+static void mark_tail_call(struct jit_ctx *ctx)
+{
+	ctx->flags |= SAVE_TCC;
+}
+
+static bool seen_call(struct jit_ctx *ctx)
+{
+	return (ctx->flags & SAVE_RA);
+}
+
+static bool seen_tail_call(struct jit_ctx *ctx)
+{
+	return (ctx->flags & SAVE_TCC);
+}
+
+static u8 tail_call_reg(struct jit_ctx *ctx)
+{
+	if (seen_call(ctx))
+		return TCC_SAVED;
+
+	return REG_TCC;
+}
+
+/*
+ * eBPF prog stack layout:
+ *
+ *                                        high
+ * original $sp ------------> +-------------------------+ <--LOONGARCH_GPR_FP
+ *                            |           $ra           |
+ *                            +-------------------------+
+ *                            |           $fp           |
+ *                            +-------------------------+
+ *                            |           $s0           |
+ *                            +-------------------------+
+ *                            |           $s1           |
+ *                            +-------------------------+
+ *                            |           $s2           |
+ *                            +-------------------------+
+ *                            |           $s3           |
+ *                            +-------------------------+
+ *                            |           $s4           |
+ *                            +-------------------------+
+ *                            |           $s5           |
+ *                            +-------------------------+ <--BPF_REG_FP
+ *                            |  prog->aux->stack_depth |
+ *                            |        (optional)       |
+ * current $sp -------------> +-------------------------+
+ *                                        low
+ */
+static void build_prologue(struct jit_ctx *ctx)
+{
+	int stack_adjust = 0, store_offset, bpf_stack_adjust;
+
+	bpf_stack_adjust = round_up(ctx->prog->aux->stack_depth, 16);
+
+	/* To store ra, fp, s0, s1, s2, s3, s4 and s5. */
+	stack_adjust += sizeof(long) * 8;
+
+	stack_adjust = round_up(stack_adjust, 16);
+	stack_adjust += bpf_stack_adjust;
+
+	/*
+	 * First instruction initializes the tail call count (TCC).
+	 * On tail call we skip this instruction, and the TCC is
+	 * passed in REG_TCC from the caller.
+	 */
+	emit_insn(ctx, addid, REG_TCC, LOONGARCH_GPR_ZERO, MAX_TAIL_CALL_CNT);
+
+	emit_insn(ctx, addid, LOONGARCH_GPR_SP, LOONGARCH_GPR_SP, -stack_adjust);
+
+	store_offset = stack_adjust - sizeof(long);
+	emit_insn(ctx, std, LOONGARCH_GPR_RA, LOONGARCH_GPR_SP, store_offset);
+
+	store_offset -= sizeof(long);
+	emit_insn(ctx, std, LOONGARCH_GPR_FP, LOONGARCH_GPR_SP, store_offset);
+
+	store_offset -= sizeof(long);
+	emit_insn(ctx, std, LOONGARCH_GPR_S0, LOONGARCH_GPR_SP, store_offset);
+
+	store_offset -= sizeof(long);
+	emit_insn(ctx, std, LOONGARCH_GPR_S1, LOONGARCH_GPR_SP, store_offset);
+
+	store_offset -= sizeof(long);
+	emit_insn(ctx, std, LOONGARCH_GPR_S2, LOONGARCH_GPR_SP, store_offset);
+
+	store_offset -= sizeof(long);
+	emit_insn(ctx, std, LOONGARCH_GPR_S3, LOONGARCH_GPR_SP, store_offset);
+
+	store_offset -= sizeof(long);
+	emit_insn(ctx, std, LOONGARCH_GPR_S4, LOONGARCH_GPR_SP, store_offset);
+
+	store_offset -= sizeof(long);
+	emit_insn(ctx, std, LOONGARCH_GPR_S5, LOONGARCH_GPR_SP, store_offset);
+
+	emit_insn(ctx, addid, LOONGARCH_GPR_FP, LOONGARCH_GPR_SP, stack_adjust);
+
+	if (bpf_stack_adjust)
+		emit_insn(ctx, addid, regmap[BPF_REG_FP], LOONGARCH_GPR_SP, bpf_stack_adjust);
+
+	/*
+	 * Program contains calls and tail calls, so REG_TCC need
+	 * to be saved across calls.
+	 */
+	if (seen_tail_call(ctx) && seen_call(ctx))
+		move_reg(ctx, TCC_SAVED, REG_TCC);
+
+	ctx->stack_size = stack_adjust;
+}
+
+static void __build_epilogue(struct jit_ctx *ctx, bool is_tail_call)
+{
+	int stack_adjust = ctx->stack_size;
+	int load_offset;
+
+	load_offset = stack_adjust - sizeof(long);
+	emit_insn(ctx, ldd, LOONGARCH_GPR_RA, LOONGARCH_GPR_SP, load_offset);
+
+	load_offset -= sizeof(long);
+	emit_insn(ctx, ldd, LOONGARCH_GPR_FP, LOONGARCH_GPR_SP, load_offset);
+
+	load_offset -= sizeof(long);
+	emit_insn(ctx, ldd, LOONGARCH_GPR_S0, LOONGARCH_GPR_SP, load_offset);
+
+	load_offset -= sizeof(long);
+	emit_insn(ctx, ldd, LOONGARCH_GPR_S1, LOONGARCH_GPR_SP, load_offset);
+
+	load_offset -= sizeof(long);
+	emit_insn(ctx, ldd, LOONGARCH_GPR_S2, LOONGARCH_GPR_SP, load_offset);
+
+	load_offset -= sizeof(long);
+	emit_insn(ctx, ldd, LOONGARCH_GPR_S3, LOONGARCH_GPR_SP, load_offset);
+
+	load_offset -= sizeof(long);
+	emit_insn(ctx, ldd, LOONGARCH_GPR_S4, LOONGARCH_GPR_SP, load_offset);
+
+	load_offset -= sizeof(long);
+	emit_insn(ctx, ldd, LOONGARCH_GPR_S5, LOONGARCH_GPR_SP, load_offset);
+
+	emit_insn(ctx, addid, LOONGARCH_GPR_SP, LOONGARCH_GPR_SP, stack_adjust);
+
+	if (!is_tail_call) {
+		/* Set return value */
+		move_reg(ctx, LOONGARCH_GPR_A0, regmap[BPF_REG_0]);
+		/* Return to the caller */
+		emit_insn(ctx, jirl, LOONGARCH_GPR_RA, LOONGARCH_GPR_ZERO, 0);
+	} else {
+		/*
+		 * Call the next bpf prog and skip the first instruction
+		 * of TCC initialization.
+		 */
+		emit_insn(ctx, jirl, LOONGARCH_GPR_T3, LOONGARCH_GPR_ZERO, 1);
+	}
+}
+
+void build_epilogue(struct jit_ctx *ctx)
+{
+	__build_epilogue(ctx, false);
+}
+
+bool bpf_jit_supports_kfunc_call(void)
+{
+	return true;
+}
+
+/* initialized on the first pass of build_body() */
+static int out_offset = -1;
+static int emit_bpf_tail_call(struct jit_ctx *ctx)
+{
+	int off;
+	u8 tcc = tail_call_reg(ctx);
+	u8 a1 = LOONGARCH_GPR_A1;
+	u8 a2 = LOONGARCH_GPR_A2;
+	u8 t1 = LOONGARCH_GPR_T1;
+	u8 t2 = LOONGARCH_GPR_T2;
+	u8 t3 = LOONGARCH_GPR_T3;
+	const int idx0 = ctx->idx;
+
+#define cur_offset (ctx->idx - idx0)
+#define jmp_offset (out_offset - (cur_offset))
+
+	/*
+	 * a0: &ctx
+	 * a1: &array
+	 * a2: index
+	 *
+	 * if (index >= array->map.max_entries)
+	 *	 goto out;
+	 */
+	off = offsetof(struct bpf_array, map.max_entries);
+	emit_insn(ctx, ldwu, t1, a1, off);
+	/* bgeu $a2, $t1, jmp_offset */
+	emit_tailcall_jmp(ctx, BPF_JGE, a2, t1, jmp_offset);
+
+	/*
+	 * if (--TCC < 0)
+	 *	 goto out;
+	 */
+	emit_insn(ctx, addid, REG_TCC, tcc, -1);
+	emit_tailcall_jmp(ctx, BPF_JSLT, REG_TCC, LOONGARCH_GPR_ZERO, jmp_offset);
+
+	/*
+	 * prog = array->ptrs[index];
+	 * if (!prog)
+	 *	 goto out;
+	 */
+	emit_insn(ctx, sllid, t2, a2, 3);
+	emit_insn(ctx, addd, t2, t2, a1);
+	off = offsetof(struct bpf_array, ptrs);
+	emit_insn(ctx, ldd, t2, t2, off);
+	/* beq $t2, $zero, jmp_offset */
+	emit_tailcall_jmp(ctx, BPF_JEQ, t2, LOONGARCH_GPR_ZERO, jmp_offset);
+
+	/* goto *(prog->bpf_func + 4); */
+	off = offsetof(struct bpf_prog, bpf_func);
+	emit_insn(ctx, ldd, t3, t2, off);
+	__build_epilogue(ctx, true);
+
+	/* out: */
+	if (out_offset == -1)
+		out_offset = cur_offset;
+	if (cur_offset != out_offset) {
+		pr_err_once("tail_call out_offset = %d, expected %d!\n",
+			    cur_offset, out_offset);
+		return -1;
+	}
+
+	return 0;
+#undef cur_offset
+#undef jmp_offset
+}
+
+static void emit_atomic(const struct bpf_insn *insn, struct jit_ctx *ctx)
+{
+	const u8 dst = regmap[insn->dst_reg];
+	const u8 src = regmap[insn->src_reg];
+	const u8 t1 = LOONGARCH_GPR_T1;
+	const u8 t2 = LOONGARCH_GPR_T2;
+	const u8 t3 = LOONGARCH_GPR_T3;
+	const s16 off = insn->off;
+	const s32 imm = insn->imm;
+	const bool isdw = BPF_SIZE(insn->code) == BPF_DW;
+
+	move_imm32(ctx, t1, off, false);
+	emit_insn(ctx, addd, t1, dst, t1);
+	move_reg(ctx, t3, src);
+
+	switch (imm) {
+	/* lock *(size *)(dst + off) <op>= src */
+	case BPF_ADD:
+		if (isdw)
+			emit_insn(ctx, amaddd, t2, t1, src);
+		else
+			emit_insn(ctx, amaddw, t2, t1, src);
+		break;
+	case BPF_AND:
+		if (isdw)
+			emit_insn(ctx, amandd, t2, t1, src);
+		else
+			emit_insn(ctx, amandw, t2, t1, src);
+		break;
+	case BPF_OR:
+		if (isdw)
+			emit_insn(ctx, amord, t2, t1, src);
+		else
+			emit_insn(ctx, amorw, t2, t1, src);
+		break;
+	case BPF_XOR:
+		if (isdw)
+			emit_insn(ctx, amxord, t2, t1, src);
+		else
+			emit_insn(ctx, amxorw, t2, t1, src);
+		break;
+	/* src = atomic_fetch_<op>(dst + off, src) */
+	case BPF_ADD | BPF_FETCH:
+		if (isdw) {
+			emit_insn(ctx, amaddd, src, t1, t3);
+		} else {
+			emit_insn(ctx, amaddw, src, t1, t3);
+			emit_zext_32(ctx, src, true);
+		}
+		break;
+	case BPF_AND | BPF_FETCH:
+		if (isdw) {
+			emit_insn(ctx, amandd, src, t1, t3);
+		} else {
+			emit_insn(ctx, amandw, src, t1, t3);
+			emit_zext_32(ctx, src, true);
+		}
+		break;
+	case BPF_OR | BPF_FETCH:
+		if (isdw) {
+			emit_insn(ctx, amord, src, t1, t3);
+		} else {
+			emit_insn(ctx, amorw, src, t1, t3);
+			emit_zext_32(ctx, src, true);
+		}
+		break;
+	case BPF_XOR | BPF_FETCH:
+		if (isdw) {
+			emit_insn(ctx, amxord, src, t1, t3);
+		} else {
+			emit_insn(ctx, amxorw, src, t1, t3);
+			emit_zext_32(ctx, src, true);
+		}
+		break;
+	/* src = atomic_xchg(dst + off, src); */
+	case BPF_XCHG:
+		if (isdw) {
+			emit_insn(ctx, amswapd, src, t1, t3);
+		} else {
+			emit_insn(ctx, amswapw, src, t1, t3);
+			emit_zext_32(ctx, src, true);
+		}
+		break;
+	/* r0 = atomic_cmpxchg(dst + off, r0, src); */
+	case BPF_CMPXCHG:
+		u8 r0 = regmap[BPF_REG_0];
+
+		move_reg(ctx, t2, r0);
+		if (isdw) {
+			emit_insn(ctx, lld, r0, t1, 0);
+			emit_insn(ctx, bne, t2, r0, 4);
+			move_reg(ctx, t3, src);
+			emit_insn(ctx, scd, t3, t1, 0);
+			emit_insn(ctx, beq, t3, LOONGARCH_GPR_ZERO, -4);
+		} else {
+			emit_insn(ctx, llw, r0, t1, 0);
+			emit_zext_32(ctx, t2, true);
+			emit_zext_32(ctx, r0, true);
+			emit_insn(ctx, bne, t2, r0, 4);
+			move_reg(ctx, t3, src);
+			emit_insn(ctx, scw, t3, t1, 0);
+			emit_insn(ctx, beq, t3, LOONGARCH_GPR_ZERO, -6);
+			emit_zext_32(ctx, r0, true);
+		}
+		break;
+	}
+}
+
+static bool is_signed_bpf_cond(u8 cond)
+{
+	return cond == BPF_JSGT || cond == BPF_JSLT ||
+	       cond == BPF_JSGE || cond == BPF_JSLE;
+}
+
+static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx, bool extra_pass)
+{
+	const bool is32 = BPF_CLASS(insn->code) == BPF_ALU ||
+			  BPF_CLASS(insn->code) == BPF_JMP32;
+	const u8 code = insn->code;
+	const u8 cond = BPF_OP(code);
+	const u8 dst = regmap[insn->dst_reg];
+	const u8 src = regmap[insn->src_reg];
+	const u8 t1 = LOONGARCH_GPR_T1;
+	const u8 t2 = LOONGARCH_GPR_T2;
+	const s16 off = insn->off;
+	const s32 imm = insn->imm;
+	int i = insn - ctx->prog->insnsi;
+	int jmp_offset;
+
+	switch (code) {
+	/* dst = src */
+	case BPF_ALU | BPF_MOV | BPF_X:
+	case BPF_ALU64 | BPF_MOV | BPF_X:
+		move_reg(ctx, dst, src);
+		emit_zext_32(ctx, dst, is32);
+		break;
+	/* dst = imm */
+	case BPF_ALU | BPF_MOV | BPF_K:
+	case BPF_ALU64 | BPF_MOV | BPF_K:
+		move_imm32(ctx, dst, imm, is32);
+		break;
+
+	/* dst = dst + src */
+	case BPF_ALU | BPF_ADD | BPF_X:
+	case BPF_ALU64 | BPF_ADD | BPF_X:
+		emit_insn(ctx, addd, dst, dst, src);
+		emit_zext_32(ctx, dst, is32);
+		break;
+	/* dst = dst + imm */
+	case BPF_ALU | BPF_ADD | BPF_K:
+	case BPF_ALU64 | BPF_ADD | BPF_K:
+		if (is_signed_imm12(imm)) {
+			emit_insn(ctx, addid, dst, dst, imm);
+		} else {
+			move_imm32(ctx, t1, imm, is32);
+			emit_insn(ctx, addd, dst, dst, t1);
+		}
+		emit_zext_32(ctx, dst, is32);
+		break;
+
+	/* dst = dst - src */
+	case BPF_ALU | BPF_SUB | BPF_X:
+	case BPF_ALU64 | BPF_SUB | BPF_X:
+		emit_insn(ctx, subd, dst, dst, src);
+		emit_zext_32(ctx, dst, is32);
+		break;
+	/* dst = dst - imm */
+	case BPF_ALU | BPF_SUB | BPF_K:
+	case BPF_ALU64 | BPF_SUB | BPF_K:
+		if (is_signed_imm12(-imm)) {
+			emit_insn(ctx, addid, dst, dst, -imm);
+		} else {
+			move_imm32(ctx, t1, imm, is32);
+			emit_insn(ctx, subd, dst, dst, t1);
+		}
+		emit_zext_32(ctx, dst, is32);
+		break;
+
+	/* dst = dst * src */
+	case BPF_ALU | BPF_MUL | BPF_X:
+	case BPF_ALU64 | BPF_MUL | BPF_X:
+		emit_insn(ctx, muld, dst, dst, src);
+		emit_zext_32(ctx, dst, is32);
+		break;
+	/* dst = dst * imm */
+	case BPF_ALU | BPF_MUL | BPF_K:
+	case BPF_ALU64 | BPF_MUL | BPF_K:
+		move_imm32(ctx, t1, imm, is32);
+		emit_insn(ctx, muld, dst, dst, t1);
+		emit_zext_32(ctx, dst, is32);
+		break;
+
+	/* dst = dst / src */
+	case BPF_ALU | BPF_DIV | BPF_X:
+	case BPF_ALU64 | BPF_DIV | BPF_X:
+		emit_zext_32(ctx, dst, is32);
+		move_reg(ctx, t1, src);
+		emit_zext_32(ctx, t1, is32);
+		emit_insn(ctx, divdu, dst, dst, t1);
+		emit_zext_32(ctx, dst, is32);
+		break;
+	/* dst = dst / imm */
+	case BPF_ALU | BPF_DIV | BPF_K:
+	case BPF_ALU64 | BPF_DIV | BPF_K:
+		move_imm32(ctx, t1, imm, is32);
+		emit_zext_32(ctx, dst, is32);
+		emit_insn(ctx, divdu, dst, dst, t1);
+		emit_zext_32(ctx, dst, is32);
+		break;
+
+	/* dst = dst % src */
+	case BPF_ALU | BPF_MOD | BPF_X:
+	case BPF_ALU64 | BPF_MOD | BPF_X:
+		emit_zext_32(ctx, dst, is32);
+		move_reg(ctx, t1, src);
+		emit_zext_32(ctx, t1, is32);
+		emit_insn(ctx, moddu, dst, dst, t1);
+		emit_zext_32(ctx, dst, is32);
+		break;
+	/* dst = dst % imm */
+	case BPF_ALU | BPF_MOD | BPF_K:
+	case BPF_ALU64 | BPF_MOD | BPF_K:
+		move_imm32(ctx, t1, imm, is32);
+		emit_zext_32(ctx, dst, is32);
+		emit_insn(ctx, moddu, dst, dst, t1);
+		emit_zext_32(ctx, dst, is32);
+		break;
+
+	/* dst = -dst */
+	case BPF_ALU | BPF_NEG:
+	case BPF_ALU64 | BPF_NEG:
+		emit_insn(ctx, subd, dst, LOONGARCH_GPR_ZERO, dst);
+		emit_zext_32(ctx, dst, is32);
+		break;
+
+	/* dst = dst & src */
+	case BPF_ALU | BPF_AND | BPF_X:
+	case BPF_ALU64 | BPF_AND | BPF_X:
+		emit_insn(ctx, and, dst, dst, src);
+		emit_zext_32(ctx, dst, is32);
+		break;
+	/* dst = dst & imm */
+	case BPF_ALU | BPF_AND | BPF_K:
+	case BPF_ALU64 | BPF_AND | BPF_K:
+		if (is_unsigned_imm12(imm)) {
+			emit_insn(ctx, andi, dst, dst, imm);
+		} else {
+			move_imm32(ctx, t1, imm, is32);
+			emit_insn(ctx, and, dst, dst, t1);
+		}
+		emit_zext_32(ctx, dst, is32);
+		break;
+
+	/* dst = dst | src */
+	case BPF_ALU | BPF_OR | BPF_X:
+	case BPF_ALU64 | BPF_OR | BPF_X:
+		emit_insn(ctx, or, dst, dst, src);
+		emit_zext_32(ctx, dst, is32);
+		break;
+	/* dst = dst | imm */
+	case BPF_ALU | BPF_OR | BPF_K:
+	case BPF_ALU64 | BPF_OR | BPF_K:
+		if (is_unsigned_imm12(imm)) {
+			emit_insn(ctx, ori, dst, dst, imm);
+		} else {
+			move_imm32(ctx, t1, imm, is32);
+			emit_insn(ctx, or, dst, dst, t1);
+		}
+		emit_zext_32(ctx, dst, is32);
+		break;
+
+	/* dst = dst ^ src */
+	case BPF_ALU | BPF_XOR | BPF_X:
+	case BPF_ALU64 | BPF_XOR | BPF_X:
+		emit_insn(ctx, xor, dst, dst, src);
+		emit_zext_32(ctx, dst, is32);
+		break;
+	/* dst = dst ^ imm */
+	case BPF_ALU | BPF_XOR | BPF_K:
+	case BPF_ALU64 | BPF_XOR | BPF_K:
+		if (is_unsigned_imm12(imm)) {
+			emit_insn(ctx, xori, dst, dst, imm);
+		} else {
+			move_imm32(ctx, t1, imm, is32);
+			emit_insn(ctx, xor, dst, dst, t1);
+		}
+		emit_zext_32(ctx, dst, is32);
+		break;
+
+	/* dst = dst << src (logical) */
+	case BPF_ALU | BPF_LSH | BPF_X:
+		emit_insn(ctx, sllw, dst, dst, src);
+		emit_zext_32(ctx, dst, is32);
+		break;
+	case BPF_ALU64 | BPF_LSH | BPF_X:
+		emit_insn(ctx, slld, dst, dst, src);
+		break;
+	/* dst = dst << imm (logical) */
+	case BPF_ALU | BPF_LSH | BPF_K:
+		emit_insn(ctx, slliw, dst, dst, imm);
+		emit_zext_32(ctx, dst, is32);
+		break;
+	case BPF_ALU64 | BPF_LSH | BPF_K:
+		emit_insn(ctx, sllid, dst, dst, imm);
+		break;
+
+	/* dst = dst >> src (logical) */
+	case BPF_ALU | BPF_RSH | BPF_X:
+		emit_insn(ctx, srlw, dst, dst, src);
+		emit_zext_32(ctx, dst, is32);
+		break;
+	case BPF_ALU64 | BPF_RSH | BPF_X:
+		emit_insn(ctx, srld, dst, dst, src);
+		break;
+	/* dst = dst >> imm (logical) */
+	case BPF_ALU | BPF_RSH | BPF_K:
+		emit_insn(ctx, srliw, dst, dst, imm);
+		emit_zext_32(ctx, dst, is32);
+		break;
+	case BPF_ALU64 | BPF_RSH | BPF_K:
+		emit_insn(ctx, srlid, dst, dst, imm);
+		break;
+
+	/* dst = dst >> src (arithmetic) */
+	case BPF_ALU | BPF_ARSH | BPF_X:
+		emit_insn(ctx, sraw, dst, dst, src);
+		emit_zext_32(ctx, dst, is32);
+		break;
+	case BPF_ALU64 | BPF_ARSH | BPF_X:
+		emit_insn(ctx, srad, dst, dst, src);
+		break;
+	/* dst = dst >> imm (arithmetic) */
+	case BPF_ALU | BPF_ARSH | BPF_K:
+		emit_insn(ctx, sraiw, dst, dst, imm);
+		emit_zext_32(ctx, dst, is32);
+		break;
+	case BPF_ALU64 | BPF_ARSH | BPF_K:
+		emit_insn(ctx, sraid, dst, dst, imm);
+		break;
+
+	/* dst = BSWAP##imm(dst) */
+	case BPF_ALU | BPF_END | BPF_FROM_LE:
+		switch (imm) {
+		case 16:
+			/* zero-extend 16 bits into 64 bits */
+			emit_insn(ctx, sllid, dst, dst, 48);
+			emit_insn(ctx, srlid, dst, dst, 48);
+			break;
+		case 32:
+			/* zero-extend 32 bits into 64 bits */
+			emit_zext_32(ctx, dst, is32);
+			break;
+		case 64:
+			/* do nothing */
+			break;
+		}
+		break;
+	case BPF_ALU | BPF_END | BPF_FROM_BE:
+		switch (imm) {
+		case 16:
+			emit_insn(ctx, revb2h, dst, dst);
+			/* zero-extend 16 bits into 64 bits */
+			emit_insn(ctx, sllid, dst, dst, 48);
+			emit_insn(ctx, srlid, dst, dst, 48);
+			break;
+		case 32:
+			emit_insn(ctx, revb2w, dst, dst);
+			/* zero-extend 32 bits into 64 bits */
+			emit_zext_32(ctx, dst, is32);
+			break;
+		case 64:
+			emit_insn(ctx, revbd, dst, dst);
+			break;
+		}
+		break;
+
+	/* PC += off if dst cond src */
+	case BPF_JMP | BPF_JEQ | BPF_X:
+	case BPF_JMP | BPF_JNE | BPF_X:
+	case BPF_JMP | BPF_JGT | BPF_X:
+	case BPF_JMP | BPF_JGE | BPF_X:
+	case BPF_JMP | BPF_JLT | BPF_X:
+	case BPF_JMP | BPF_JLE | BPF_X:
+	case BPF_JMP | BPF_JSGT | BPF_X:
+	case BPF_JMP | BPF_JSGE | BPF_X:
+	case BPF_JMP | BPF_JSLT | BPF_X:
+	case BPF_JMP | BPF_JSLE | BPF_X:
+	case BPF_JMP32 | BPF_JEQ | BPF_X:
+	case BPF_JMP32 | BPF_JNE | BPF_X:
+	case BPF_JMP32 | BPF_JGT | BPF_X:
+	case BPF_JMP32 | BPF_JGE | BPF_X:
+	case BPF_JMP32 | BPF_JLT | BPF_X:
+	case BPF_JMP32 | BPF_JLE | BPF_X:
+	case BPF_JMP32 | BPF_JSGT | BPF_X:
+	case BPF_JMP32 | BPF_JSGE | BPF_X:
+	case BPF_JMP32 | BPF_JSLT | BPF_X:
+	case BPF_JMP32 | BPF_JSLE | BPF_X:
+		jmp_offset = bpf2la_offset(i, off, ctx);
+		move_reg(ctx, t1, dst);
+		move_reg(ctx, t2, src);
+		if (is_signed_bpf_cond(BPF_OP(code))) {
+			emit_sext_32(ctx, t1, is32);
+			emit_sext_32(ctx, t2, is32);
+		} else {
+			emit_zext_32(ctx, t1, is32);
+			emit_zext_32(ctx, t2, is32);
+		}
+		emit_cond_jmp(ctx, cond, t1, t2, jmp_offset);
+		break;
+
+	/* PC += off if dst cond imm */
+	case BPF_JMP | BPF_JEQ | BPF_K:
+	case BPF_JMP | BPF_JNE | BPF_K:
+	case BPF_JMP | BPF_JGT | BPF_K:
+	case BPF_JMP | BPF_JGE | BPF_K:
+	case BPF_JMP | BPF_JLT | BPF_K:
+	case BPF_JMP | BPF_JLE | BPF_K:
+	case BPF_JMP | BPF_JSGT | BPF_K:
+	case BPF_JMP | BPF_JSGE | BPF_K:
+	case BPF_JMP | BPF_JSLT | BPF_K:
+	case BPF_JMP | BPF_JSLE | BPF_K:
+	case BPF_JMP32 | BPF_JEQ | BPF_K:
+	case BPF_JMP32 | BPF_JNE | BPF_K:
+	case BPF_JMP32 | BPF_JGT | BPF_K:
+	case BPF_JMP32 | BPF_JGE | BPF_K:
+	case BPF_JMP32 | BPF_JLT | BPF_K:
+	case BPF_JMP32 | BPF_JLE | BPF_K:
+	case BPF_JMP32 | BPF_JSGT | BPF_K:
+	case BPF_JMP32 | BPF_JSGE | BPF_K:
+	case BPF_JMP32 | BPF_JSLT | BPF_K:
+	case BPF_JMP32 | BPF_JSLE | BPF_K:
+		jmp_offset = bpf2la_offset(i, off, ctx);
+		move_imm32(ctx, t1, imm, false);
+		move_reg(ctx, t2, dst);
+		if (is_signed_bpf_cond(BPF_OP(code))) {
+			emit_sext_32(ctx, t1, is32);
+			emit_sext_32(ctx, t2, is32);
+		} else {
+			emit_zext_32(ctx, t1, is32);
+			emit_zext_32(ctx, t2, is32);
+		}
+		emit_cond_jmp(ctx, cond, t2, t1, jmp_offset);
+		break;
+
+	/* PC += off if dst & src */
+	case BPF_JMP | BPF_JSET | BPF_X:
+	case BPF_JMP32 | BPF_JSET | BPF_X:
+		jmp_offset = bpf2la_offset(i, off, ctx);
+		emit_insn(ctx, and, t1, dst, src);
+		emit_zext_32(ctx, t1, is32);
+		emit_cond_jmp(ctx, cond, t1, LOONGARCH_GPR_ZERO, jmp_offset);
+		break;
+	/* PC += off if dst & imm */
+	case BPF_JMP | BPF_JSET | BPF_K:
+	case BPF_JMP32 | BPF_JSET | BPF_K:
+		jmp_offset = bpf2la_offset(i, off, ctx);
+		move_imm32(ctx, t1, imm, is32);
+		emit_insn(ctx, and, t1, dst, t1);
+		emit_zext_32(ctx, t1, is32);
+		emit_cond_jmp(ctx, cond, t1, LOONGARCH_GPR_ZERO, jmp_offset);
+		break;
+
+	/* PC += off */
+	case BPF_JMP | BPF_JA:
+		jmp_offset = bpf2la_offset(i, off, ctx);
+		emit_uncond_jmp(ctx, jmp_offset, is32);
+		break;
+
+	/* function call */
+	case BPF_JMP | BPF_CALL: {
+		bool func_addr_fixed;
+		u64 func_addr;
+		int ret;
+
+		mark_call(ctx);
+		ret = bpf_jit_get_func_addr(ctx->prog, insn, extra_pass,
+					    &func_addr, &func_addr_fixed);
+		if (ret < 0)
+			return ret;
+
+		move_imm64(ctx, t1, func_addr, is32);
+		emit_insn(ctx, jirl, t1, LOONGARCH_GPR_RA, 0);
+		move_reg(ctx, regmap[BPF_REG_0], LOONGARCH_GPR_A0);
+		break;
+	}
+
+	/* tail call */
+	case BPF_JMP | BPF_TAIL_CALL:
+		mark_tail_call(ctx);
+		if (emit_bpf_tail_call(ctx))
+			return -EINVAL;
+		break;
+
+	/* function return */
+	case BPF_JMP | BPF_EXIT:
+		emit_sext_32(ctx, regmap[BPF_REG_0], true);
+
+		if (i == ctx->prog->len - 1)
+			break;
+
+		jmp_offset = epilogue_offset(ctx);
+		emit_uncond_jmp(ctx, jmp_offset, true);
+		break;
+
+	/* dst = imm64 */
+	case BPF_LD | BPF_IMM | BPF_DW: {
+		u64 imm64 = (u64)(insn + 1)->imm << 32 | (u32)insn->imm;
+
+		move_imm64(ctx, dst, imm64, is32);
+		return 1;
+	}
+
+	/* dst = *(size *)(src + off) */
+	case BPF_LDX | BPF_MEM | BPF_B:
+	case BPF_LDX | BPF_MEM | BPF_H:
+	case BPF_LDX | BPF_MEM | BPF_W:
+	case BPF_LDX | BPF_MEM | BPF_DW:
+		if (is_signed_imm12(off)) {
+			switch (BPF_SIZE(code)) {
+			case BPF_B:
+				emit_insn(ctx, ldbu, dst, src, off);
+				break;
+			case BPF_H:
+				emit_insn(ctx, ldhu, dst, src, off);
+				break;
+			case BPF_W:
+				emit_insn(ctx, ldwu, dst, src, off);
+				break;
+			case BPF_DW:
+				emit_insn(ctx, ldd, dst, src, off);
+				break;
+			}
+		} else {
+			move_imm32(ctx, t1, off, is32);
+			switch (BPF_SIZE(code)) {
+			case BPF_B:
+				emit_insn(ctx, ldxbu, dst, src, t1);
+				break;
+			case BPF_H:
+				emit_insn(ctx, ldxhu, dst, src, t1);
+				break;
+			case BPF_W:
+				emit_insn(ctx, ldxwu, dst, src, t1);
+				break;
+			case BPF_DW:
+				emit_insn(ctx, ldxd, dst, src, t1);
+				break;
+			}
+		}
+		break;
+
+	/* *(size *)(dst + off) = imm */
+	case BPF_ST | BPF_MEM | BPF_B:
+	case BPF_ST | BPF_MEM | BPF_H:
+	case BPF_ST | BPF_MEM | BPF_W:
+	case BPF_ST | BPF_MEM | BPF_DW:
+		move_imm32(ctx, t1, imm, is32);
+		if (is_signed_imm12(off)) {
+			switch (BPF_SIZE(code)) {
+			case BPF_B:
+				emit_insn(ctx, stb, t1, dst, off);
+				break;
+			case BPF_H:
+				emit_insn(ctx, sth, t1, dst, off);
+				break;
+			case BPF_W:
+				emit_insn(ctx, stw, t1, dst, off);
+				break;
+			case BPF_DW:
+				emit_insn(ctx, std, t1, dst, off);
+				break;
+			}
+		} else {
+			move_imm32(ctx, t2, off, is32);
+			switch (BPF_SIZE(code)) {
+			case BPF_B:
+				emit_insn(ctx, stxb, t1, dst, t2);
+				break;
+			case BPF_H:
+				emit_insn(ctx, stxh, t1, dst, t2);
+				break;
+			case BPF_W:
+				emit_insn(ctx, stxw, t1, dst, t2);
+				break;
+			case BPF_DW:
+				emit_insn(ctx, stxd, t1, dst, t2);
+				break;
+			}
+		}
+		break;
+
+	/* *(size *)(dst + off) = src */
+	case BPF_STX | BPF_MEM | BPF_B:
+	case BPF_STX | BPF_MEM | BPF_H:
+	case BPF_STX | BPF_MEM | BPF_W:
+	case BPF_STX | BPF_MEM | BPF_DW:
+		if (is_signed_imm12(off)) {
+			switch (BPF_SIZE(code)) {
+			case BPF_B:
+				emit_insn(ctx, stb, src, dst, off);
+				break;
+			case BPF_H:
+				emit_insn(ctx, sth, src, dst, off);
+				break;
+			case BPF_W:
+				emit_insn(ctx, stw, src, dst, off);
+				break;
+			case BPF_DW:
+				emit_insn(ctx, std, src, dst, off);
+				break;
+			}
+		} else {
+			move_imm32(ctx, t1, off, is32);
+			switch (BPF_SIZE(code)) {
+			case BPF_B:
+				emit_insn(ctx, stxb, src, dst, t1);
+				break;
+			case BPF_H:
+				emit_insn(ctx, stxh, src, dst, t1);
+				break;
+			case BPF_W:
+				emit_insn(ctx, stxw, src, dst, t1);
+				break;
+			case BPF_DW:
+				emit_insn(ctx, stxd, src, dst, t1);
+				break;
+			}
+		}
+		break;
+
+	case BPF_STX | BPF_ATOMIC | BPF_W:
+	case BPF_STX | BPF_ATOMIC | BPF_DW:
+		emit_atomic(insn, ctx);
+		break;
+
+	default:
+		pr_err("bpf_jit: unknown opcode %02x\n", code);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int build_body(struct jit_ctx *ctx, bool extra_pass)
+{
+	const struct bpf_prog *prog = ctx->prog;
+	int i;
+
+	for (i = 0; i < prog->len; i++) {
+		const struct bpf_insn *insn = &prog->insnsi[i];
+		int ret;
+
+		if (ctx->image == NULL)
+			ctx->offset[i] = ctx->idx;
+
+		ret = build_insn(insn, ctx, extra_pass);
+		if (ret > 0) {
+			i++;
+			if (ctx->image == NULL)
+				ctx->offset[i] = ctx->idx;
+			continue;
+		}
+		if (ret)
+			return ret;
+	}
+
+	if (ctx->image == NULL)
+		ctx->offset[i] = ctx->idx;
+
+	return 0;
+}
+
+static inline void bpf_flush_icache(void *start, void *end)
+{
+	flush_icache_range((unsigned long)start, (unsigned long)end);
+}
+
+/* Fill space with illegal instructions */
+static void jit_fill_hole(void *area, unsigned int size)
+{
+	u32 *ptr;
+
+	/* We are guaranteed to have aligned memory */
+	for (ptr = area; size >= sizeof(u32); size -= sizeof(u32))
+		*ptr++ = INSN_BREAK;
+}
+
+static int validate_code(struct jit_ctx *ctx)
+{
+	int i;
+	union loongarch_instruction insn;
+
+	for (i = 0; i < ctx->idx; i++) {
+		insn = ctx->image[i];
+		/* Check INSN_BREAK */
+		if (insn.word == INSN_BREAK)
+			return -1;
+	}
+
+	return 0;
+}
+
+struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
+{
+	struct bpf_prog *tmp, *orig_prog = prog;
+	struct bpf_binary_header *header;
+	struct jit_data *jit_data;
+	struct jit_ctx ctx;
+	bool tmp_blinded = false;
+	bool extra_pass = false;
+	int image_size;
+	u8 *image_ptr;
+
+	/*
+	 * If BPF JIT was not enabled then we must fall back to
+	 * the interpreter.
+	 */
+	if (!prog->jit_requested)
+		return orig_prog;
+
+	tmp = bpf_jit_blind_constants(prog);
+	/*
+	 * If blinding was requested and we failed during blinding,
+	 * we must fall back to the interpreter. Otherwise, we save
+	 * the new JITed code.
+	 */
+	if (IS_ERR(tmp))
+		return orig_prog;
+	if (tmp != prog) {
+		tmp_blinded = true;
+		prog = tmp;
+	}
+
+	jit_data = prog->aux->jit_data;
+	if (!jit_data) {
+		jit_data = kzalloc(sizeof(*jit_data), GFP_KERNEL);
+		if (!jit_data) {
+			prog = orig_prog;
+			goto out;
+		}
+		prog->aux->jit_data = jit_data;
+	}
+	if (jit_data->ctx.offset) {
+		ctx = jit_data->ctx;
+		image_ptr = jit_data->image;
+		header = jit_data->header;
+		extra_pass = true;
+		image_size = sizeof(u32) * ctx.idx;
+		goto skip_init_ctx;
+	}
+
+	memset(&ctx, 0, sizeof(ctx));
+	ctx.prog = prog;
+
+	ctx.offset = kcalloc(prog->len + 1, sizeof(u32), GFP_KERNEL);
+	if (ctx.offset == NULL) {
+		prog = orig_prog;
+		goto out_off;
+	}
+
+	/* 1. Initial fake pass to compute ctx->idx and set ctx->flags */
+	if (build_body(&ctx, extra_pass)) {
+		prog = orig_prog;
+		goto out_off;
+	}
+	build_prologue(&ctx);
+	ctx.epilogue_offset = ctx.idx;
+	build_epilogue(&ctx);
+
+	/*
+	 * Now we know the actual image size.
+	 * As each LoongArch instruction is 32 bits long, we translate
+	 * the number of JITed instructions into the size required to
+	 * store the JITed code.
+	 */
+	image_size = sizeof(u32) * ctx.idx;
+	/* Now we know the size of the structure to make */
+	header = bpf_jit_binary_alloc(image_size, &image_ptr,
+				      sizeof(u32), jit_fill_hole);
+	if (header == NULL) {
+		prog = orig_prog;
+		goto out_off;
+	}
+
+	/* 2. Now, the actual pass to generate final JIT code */
+	ctx.image = (union loongarch_instruction *)image_ptr;
+skip_init_ctx:
+	ctx.idx = 0;
+
+	build_prologue(&ctx);
+	if (build_body(&ctx, extra_pass)) {
+		bpf_jit_binary_free(header);
+		prog = orig_prog;
+		goto out_off;
+	}
+	build_epilogue(&ctx);
+
+	/* 3. Extra pass to validate JITed code */
+	if (validate_code(&ctx)) {
+		bpf_jit_binary_free(header);
+		prog = orig_prog;
+		goto out_off;
+	}
+
+	/* And we're done */
+	if (bpf_jit_enable > 1)
+		bpf_jit_dump(prog->len, image_size, 2, ctx.image);
+
+	/* Update the icache */
+	bpf_flush_icache(header, ctx.image + ctx.idx);
+
+	if (!prog->is_func || extra_pass) {
+		if (extra_pass && ctx.idx != jit_data->ctx.idx) {
+			pr_err_once("multi-func JIT bug %d != %d\n",
+				    ctx.idx, jit_data->ctx.idx);
+			bpf_jit_binary_free(header);
+			prog->bpf_func = NULL;
+			prog->jited = 0;
+			prog->jited_len = 0;
+			goto out_off;
+		}
+		bpf_jit_binary_lock_ro(header);
+	} else {
+		jit_data->ctx = ctx;
+		jit_data->image = image_ptr;
+		jit_data->header = header;
+	}
+	prog->bpf_func = (void *)ctx.image;
+	prog->jited = 1;
+	prog->jited_len = image_size;
+
+	if (!prog->is_func || extra_pass) {
+out_off:
+		kfree(ctx.offset);
+		kfree(jit_data);
+		prog->aux->jit_data = NULL;
+	}
+out:
+	if (tmp_blinded)
+		bpf_jit_prog_release_other(prog, prog == orig_prog ?
+					   tmp : orig_prog);
+
+	out_offset = -1;
+	return prog;
+}
diff --git a/arch/loongarch/net/bpf_jit.h b/arch/loongarch/net/bpf_jit.h
new file mode 100644
index 0000000..9c735f3
--- /dev/null
+++ b/arch/loongarch/net/bpf_jit.h
@@ -0,0 +1,308 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * BPF JIT compiler for LoongArch
+ *
+ * Copyright (C) 2022 Loongson Technology Corporation Limited
+ */
+#include <linux/bpf.h>
+#include <linux/filter.h>
+#include <asm/cacheflush.h>
+#include <asm/inst.h>
+
+struct jit_ctx {
+	const struct bpf_prog *prog;
+	unsigned int idx;
+	unsigned int flags;
+	unsigned int epilogue_offset;
+	u32 *offset;
+	union loongarch_instruction *image;
+	u32 stack_size;
+};
+
+struct jit_data {
+	struct bpf_binary_header *header;
+	u8 *image;
+	struct jit_ctx ctx;
+};
+
+#define emit_insn(ctx, func, ...)						\
+do {										\
+	if (ctx->image != NULL) {						\
+		union loongarch_instruction *insn = &ctx->image[ctx->idx];	\
+		emit_##func(insn, ##__VA_ARGS__);				\
+	}									\
+	ctx->idx++;								\
+} while (0)
+
+#define is_signed_imm12(val)	signed_imm_check(val, 12)
+#define is_signed_imm16(val)	signed_imm_check(val, 16)
+#define is_signed_imm26(val)	signed_imm_check(val, 26)
+#define is_signed_imm32(val)	signed_imm_check(val, 32)
+#define is_signed_imm52(val)	signed_imm_check(val, 52)
+#define is_unsigned_imm12(val)	unsigned_imm_check(val, 12)
+
+static inline int bpf2la_offset(int bpf_insn, int off, const struct jit_ctx *ctx)
+{
+	/* BPF JMP offset is relative to the next instruction */
+	bpf_insn++;
+	/*
+	 * LoongArch branch instructions, by contrast, encode the offset
+	 * from the branch instruction itself, so we must subtract 1
+	 * from the next-instruction offset.
+	 */
+	return (ctx->offset[bpf_insn + off] - (ctx->offset[bpf_insn] - 1));
+}
+
+static inline int epilogue_offset(const struct jit_ctx *ctx)
+{
+	int to = ctx->epilogue_offset;
+	int from = ctx->idx;
+
+	return (to - from);
+}
+
+/* Zero-extend 32 bits into 64 bits */
+static inline void emit_zext_32(struct jit_ctx *ctx, enum loongarch_gpr reg, bool is32)
+{
+	if (!is32)
+		return;
+
+	emit_insn(ctx, lu32id, reg, 0);
+}
+
+/* Sign-extend 32 bits into 64 bits */
+static inline void emit_sext_32(struct jit_ctx *ctx, enum loongarch_gpr reg, bool is32)
+{
+	if (!is32)
+		return;
+
+	emit_insn(ctx, addiw, reg, reg, 0);
+}
+
+static inline void move_imm32(struct jit_ctx *ctx, enum loongarch_gpr rd,
+			      int imm32, bool is32)
+{
+	int si20;
+	u32 ui12;
+
+	/* or rd, $zero, $zero */
+	if (imm32 == 0) {
+		emit_insn(ctx, or, rd, LOONGARCH_GPR_ZERO, LOONGARCH_GPR_ZERO);
+		return;
+	}
+
+	/* addiw rd, $zero, imm_11_0(signed) */
+	if (is_signed_imm12(imm32)) {
+		emit_insn(ctx, addiw, rd, LOONGARCH_GPR_ZERO, imm32);
+		goto zext;
+	}
+
+	/* ori rd, $zero, imm_11_0(unsigned) */
+	if (is_unsigned_imm12(imm32)) {
+		emit_insn(ctx, ori, rd, LOONGARCH_GPR_ZERO, imm32);
+		goto zext;
+	}
+
+	/* lu12iw rd, imm_31_12(signed) */
+	si20 = (imm32 >> 12) & 0xfffff;
+	emit_insn(ctx, lu12iw, rd, si20);
+
+	/* ori rd, rd, imm_11_0(unsigned) */
+	ui12 = imm32 & 0xfff;
+	if (ui12 != 0)
+		emit_insn(ctx, ori, rd, rd, ui12);
+
+zext:
+	emit_zext_32(ctx, rd, is32);
+}
+
+static inline void move_imm64(struct jit_ctx *ctx, enum loongarch_gpr rd,
+			      long imm64, bool is32)
+{
+	int imm32, si20, si12;
+	long imm52;
+
+	si12 = (imm64 >> 52) & 0xfff;
+	imm52 = imm64 & 0xfffffffffffff;
+	/* lu52id rd, $zero, imm_63_52(signed) */
+	if (si12 != 0 && imm52 == 0) {
+		emit_insn(ctx, lu52id, rd, LOONGARCH_GPR_ZERO, si12);
+		return;
+	}
+
+	imm32 = imm64 & 0xffffffff;
+	move_imm32(ctx, rd, imm32, is32);
+
+	if (!is_signed_imm32(imm64)) {
+		if (imm52 != 0) {
+			/* lu32id rd, imm_51_32(signed) */
+			si20 = (imm64 >> 32) & 0xfffff;
+			emit_insn(ctx, lu32id, rd, si20);
+		}
+
+		/* lu52id rd, rd, imm_63_52(signed) */
+		if (!is_signed_imm52(imm64))
+			emit_insn(ctx, lu52id, rd, rd, si12);
+	}
+}
+
+static inline void move_reg(struct jit_ctx *ctx, enum loongarch_gpr rd,
+			    enum loongarch_gpr rj)
+{
+	emit_insn(ctx, or, rd, rj, LOONGARCH_GPR_ZERO);
+}
+
+static inline int invert_jmp_cond(u8 cond)
+{
+	switch (cond) {
+	case BPF_JEQ:
+		return BPF_JNE;
+	case BPF_JNE:
+	case BPF_JSET:
+		return BPF_JEQ;
+	case BPF_JGT:
+		return BPF_JLE;
+	case BPF_JGE:
+		return BPF_JLT;
+	case BPF_JLT:
+		return BPF_JGE;
+	case BPF_JLE:
+		return BPF_JGT;
+	case BPF_JSGT:
+		return BPF_JSLE;
+	case BPF_JSGE:
+		return BPF_JSLT;
+	case BPF_JSLT:
+		return BPF_JSGE;
+	case BPF_JSLE:
+		return BPF_JSGT;
+	}
+	return -1;
+}
+
+static inline void cond_jmp_offs16(struct jit_ctx *ctx, u8 cond, enum loongarch_gpr rj,
+				   enum loongarch_gpr rd, int jmp_offset)
+{
+	switch (cond) {
+	case BPF_JEQ:
+		/* PC += jmp_offset if rj == rd */
+		emit_insn(ctx, beq, rj, rd, jmp_offset);
+		return;
+	case BPF_JNE:
+	case BPF_JSET:
+		/* PC += jmp_offset if rj != rd */
+		emit_insn(ctx, bne, rj, rd, jmp_offset);
+		return;
+	case BPF_JGT:
+		/* PC += jmp_offset if rj > rd (unsigned) */
+		emit_insn(ctx, bltu, rd, rj, jmp_offset);
+		return;
+	case BPF_JLT:
+		/* PC += jmp_offset if rj < rd (unsigned) */
+		emit_insn(ctx, bltu, rj, rd, jmp_offset);
+		return;
+	case BPF_JGE:
+		/* PC += jmp_offset if rj >= rd (unsigned) */
+		emit_insn(ctx, bgeu, rj, rd, jmp_offset);
+		return;
+	case BPF_JLE:
+		/* PC += jmp_offset if rj <= rd (unsigned) */
+		emit_insn(ctx, bgeu, rd, rj, jmp_offset);
+		return;
+	case BPF_JSGT:
+		/* PC += jmp_offset if rj > rd (signed) */
+		emit_insn(ctx, blt, rd, rj, jmp_offset);
+		return;
+	case BPF_JSLT:
+		/* PC += jmp_offset if rj < rd (signed) */
+		emit_insn(ctx, blt, rj, rd, jmp_offset);
+		return;
+	case BPF_JSGE:
+		/* PC += jmp_offset if rj >= rd (signed) */
+		emit_insn(ctx, bge, rj, rd, jmp_offset);
+		return;
+	case BPF_JSLE:
+		/* PC += jmp_offset if rj <= rd (signed) */
+		emit_insn(ctx, bge, rd, rj, jmp_offset);
+		return;
+	}
+}
+
+static inline void cond_jmp_offs26(struct jit_ctx *ctx, u8 cond, enum loongarch_gpr rj,
+				   enum loongarch_gpr rd, int jmp_offset)
+{
+	cond = invert_jmp_cond(cond);
+	cond_jmp_offs16(ctx, cond, rj, rd, 2);
+	emit_insn(ctx, b, jmp_offset);
+}
+
+static inline void cond_jmp_offs32(struct jit_ctx *ctx, u8 cond, enum loongarch_gpr rj,
+				   enum loongarch_gpr rd, int jmp_offset)
+{
+	s64 upper, lower;
+
+	upper = (jmp_offset + (1 << 15)) >> 16;
+	lower = jmp_offset & 0xffff;
+
+	cond = invert_jmp_cond(cond);
+	cond_jmp_offs16(ctx, cond, rj, rd, 3);
+
+	/*
+	 * jmp_addr = jmp_offset << 2
+	 * tmp2 = PC + jmp_addr[31, 18] + 18'b0
+	 */
+	emit_insn(ctx, pcaddu18i, LOONGARCH_GPR_T2, upper << 2);
+
+	/* jump to (tmp2 + jmp_addr[17, 2] + 2'b0) */
+	emit_insn(ctx, jirl, LOONGARCH_GPR_T2, LOONGARCH_GPR_ZERO, lower + 1);
+}
+
+static inline void uncond_jmp_offs26(struct jit_ctx *ctx, int jmp_offset)
+{
+	emit_insn(ctx, b, jmp_offset);
+}
+
+static inline void uncond_jmp_offs32(struct jit_ctx *ctx, int jmp_offset, bool is_exit)
+{
+	s64 upper, lower;
+
+	upper = (jmp_offset + (1 << 15)) >> 16;
+	lower = jmp_offset & 0xffff;
+
+	if (is_exit)
+		lower -= 1;
+
+	/*
+	 * jmp_addr = jmp_offset << 2;
+	 * tmp1 = PC + jmp_addr[31, 18] + 18'b0
+	 */
+	emit_insn(ctx, pcaddu18i, LOONGARCH_GPR_T1, upper << 2);
+
+	/* jump to (tmp1 + jmp_addr[17, 2] + 2'b0) */
+	emit_insn(ctx, jirl, LOONGARCH_GPR_T1, LOONGARCH_GPR_ZERO, lower + 1);
+}
+
+static inline void emit_cond_jmp(struct jit_ctx *ctx, u8 cond, enum loongarch_gpr rj,
+				 enum loongarch_gpr rd, int jmp_offset)
+{
+	cond_jmp_offs26(ctx, cond, rj, rd, jmp_offset);
+}
+
+static inline void emit_uncond_jmp(struct jit_ctx *ctx, int jmp_offset, bool is_exit)
+{
+	if (is_signed_imm26(jmp_offset))
+		uncond_jmp_offs26(ctx, jmp_offset);
+	else
+		uncond_jmp_offs32(ctx, jmp_offset, is_exit);
+}
+
+static inline void emit_tailcall_jmp(struct jit_ctx *ctx, u8 cond, enum loongarch_gpr rj,
+				     enum loongarch_gpr rd, int jmp_offset)
+{
+	if (is_signed_imm16(jmp_offset))
+		cond_jmp_offs16(ctx, cond, rj, rd, jmp_offset);
+	else if (is_signed_imm26(jmp_offset))
+		cond_jmp_offs26(ctx, cond, rj, rd, jmp_offset - 1);
+	else
+		cond_jmp_offs32(ctx, cond, rj, rd, jmp_offset - 2);
+}
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH bpf-next v1 4/4] LoongArch: Enable BPF_JIT and TEST_BPF in default config
  2022-08-20 11:50 [PATCH bpf-next v1 0/4] Add BPF JIT support for LoongArch Tiezhu Yang
                   ` (2 preceding siblings ...)
  2022-08-20 11:50 ` [PATCH bpf-next v1 3/4] LoongArch: Add BPF JIT support Tiezhu Yang
@ 2022-08-20 11:51 ` Tiezhu Yang
  2022-08-22  1:36 ` [PATCH bpf-next v1 0/4] Add BPF JIT support for LoongArch Tiezhu Yang
  4 siblings, 0 replies; 15+ messages in thread
From: Tiezhu Yang @ 2022-08-20 11:51 UTC (permalink / raw)
  To: Huacai Chen, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: bpf, loongarch

Now that BPF JIT for LoongArch is supported, update loongson3_defconfig to
enable BPF_JIT so that the kernel can generate native code when a program
is loaded into the kernel, and also enable TEST_BPF to test the BPF JIT.

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
---
 arch/loongarch/configs/loongson3_defconfig | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/loongarch/configs/loongson3_defconfig b/arch/loongarch/configs/loongson3_defconfig
index 3712552..93dc072 100644
--- a/arch/loongarch/configs/loongson3_defconfig
+++ b/arch/loongarch/configs/loongson3_defconfig
@@ -4,6 +4,7 @@ CONFIG_POSIX_MQUEUE=y
 CONFIG_NO_HZ=y
 CONFIG_HIGH_RES_TIMERS=y
 CONFIG_BPF_SYSCALL=y
+CONFIG_BPF_JIT=y
 CONFIG_PREEMPT=y
 CONFIG_BSD_PROCESS_ACCT=y
 CONFIG_BSD_PROCESS_ACCT_V3=y
@@ -801,3 +802,4 @@ CONFIG_MAGIC_SYSRQ=y
 CONFIG_SCHEDSTATS=y
 # CONFIG_DEBUG_PREEMPT is not set
 # CONFIG_FTRACE is not set
+CONFIG_TEST_BPF=m
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next v1 3/4] LoongArch: Add BPF JIT support
  2022-08-20 11:50 ` [PATCH bpf-next v1 3/4] LoongArch: Add BPF JIT support Tiezhu Yang
@ 2022-08-20 13:41   ` kernel test robot
  2022-08-22  1:33     ` Tiezhu Yang
  2022-08-22  1:58   ` Youling Tang
  2022-08-22  2:50   ` Jinyang He
  2 siblings, 1 reply; 15+ messages in thread
From: kernel test robot @ 2022-08-20 13:41 UTC (permalink / raw)
  To: Tiezhu Yang, Huacai Chen, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko
  Cc: kbuild-all, bpf, loongarch

Hi Tiezhu,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v6.0-rc1 next-20220819]
[cannot apply to bpf-next/master bpf/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Tiezhu-Yang/Add-BPF-JIT-support-for-LoongArch/20220820-195351
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 50cd95ac46548429e5bba7ca75cc97d11a697947
config: loongarch-allyesconfig (https://download.01.org/0day-ci/archive/20220820/202208202131.KsF0Aoos-lkp@intel.com/config)
compiler: loongarch64-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/ebe9a0ace4f1fb110c43c347808c81cb07dfeb9b
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Tiezhu-Yang/Add-BPF-JIT-support-for-LoongArch/20220820-195351
        git checkout ebe9a0ace4f1fb110c43c347808c81cb07dfeb9b
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=loongarch SHELL=/bin/bash arch/loongarch/net/

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> arch/loongarch/net/bpf_jit.c:194:6: warning: no previous prototype for 'build_epilogue' [-Wmissing-prototypes]
     194 | void build_epilogue(struct jit_ctx *ctx)
         |      ^~~~~~~~~~~~~~


vim +/build_epilogue +194 arch/loongarch/net/bpf_jit.c

   193	
 > 194	void build_epilogue(struct jit_ctx *ctx)
   195	{
   196		__build_epilogue(ctx, false);
   197	}
   198	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next v1 3/4] LoongArch: Add BPF JIT support
  2022-08-20 13:41   ` kernel test robot
@ 2022-08-22  1:33     ` Tiezhu Yang
  0 siblings, 0 replies; 15+ messages in thread
From: Tiezhu Yang @ 2022-08-22  1:33 UTC (permalink / raw)
  To: kernel test robot, Huacai Chen, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko
  Cc: kbuild-all, bpf, loongarch



On 08/20/2022 09:41 PM, kernel test robot wrote:
> Hi Tiezhu,
>
> Thank you for the patch! Perhaps something to improve:
>
> [auto build test WARNING on linus/master]
> [also build test WARNING on v6.0-rc1 next-20220819]
> [cannot apply to bpf-next/master bpf/master]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url:    https://github.com/intel-lab-lkp/linux/commits/Tiezhu-Yang/Add-BPF-JIT-support-for-LoongArch/20220820-195351
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 50cd95ac46548429e5bba7ca75cc97d11a697947
> config: loongarch-allyesconfig (https://download.01.org/0day-ci/archive/20220820/202208202131.KsF0Aoos-lkp@intel.com/config)
> compiler: loongarch64-linux-gcc (GCC) 12.1.0
> reproduce (this is a W=1 build):
>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # https://github.com/intel-lab-lkp/linux/commit/ebe9a0ace4f1fb110c43c347808c81cb07dfeb9b
>         git remote add linux-review https://github.com/intel-lab-lkp/linux
>         git fetch --no-tags linux-review Tiezhu-Yang/Add-BPF-JIT-support-for-LoongArch/20220820-195351
>         git checkout ebe9a0ace4f1fb110c43c347808c81cb07dfeb9b
>         # save the config file
>         mkdir build_dir && cp config build_dir/.config
>         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=loongarch SHELL=/bin/bash arch/loongarch/net/
>
> If you fix the issue, kindly add following tag where applicable
> Reported-by: kernel test robot <lkp@intel.com>
>
> All warnings (new ones prefixed by >>):
>
>>> arch/loongarch/net/bpf_jit.c:194:6: warning: no previous prototype for 'build_epilogue' [-Wmissing-prototypes]
>      194 | void build_epilogue(struct jit_ctx *ctx)
>          |      ^~~~~~~~~~~~~~
>
>
> vim +/build_epilogue +194 arch/loongarch/net/bpf_jit.c
>
>    193	
>  > 194	void build_epilogue(struct jit_ctx *ctx)
>    195	{
>    196		__build_epilogue(ctx, false);
>    197	}
>    198	
>

Hi robot,

Thank you. build_epilogue() should indeed be static;
I will send v2 after waiting for other feedback.

Thanks,
Tiezhu


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next v1 0/4] Add BPF JIT support for LoongArch
  2022-08-20 11:50 [PATCH bpf-next v1 0/4] Add BPF JIT support for LoongArch Tiezhu Yang
                   ` (3 preceding siblings ...)
  2022-08-20 11:51 ` [PATCH bpf-next v1 4/4] LoongArch: Enable BPF_JIT and TEST_BPF in default config Tiezhu Yang
@ 2022-08-22  1:36 ` Tiezhu Yang
  2022-08-23  0:46   ` Alexei Starovoitov
  4 siblings, 1 reply; 15+ messages in thread
From: Tiezhu Yang @ 2022-08-22  1:36 UTC (permalink / raw)
  To: Huacai Chen, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: bpf, loongarch



On 08/20/2022 07:50 PM, Tiezhu Yang wrote:
> Basic support for LoongArch was merged into the upstream Linux
> kernel in 5.19-rc1 on June 5, 2022. This patch series adds BPF JIT
> support for LoongArch.
>
> Here is the LoongArch documentation:
> https://www.kernel.org/doc/html/latest/loongarch/index.html
>
> With this patch series, the test cases in lib/test_bpf.ko have passed
> on LoongArch.
>
>   # echo 1 > /proc/sys/net/core/bpf_jit_enable
>   # modprobe test_bpf
>   # dmesg | grep Summary
>   test_bpf: Summary: 1026 PASSED, 0 FAILED, [1014/1014 JIT'ed]
>   test_bpf: test_tail_calls: Summary: 10 PASSED, 0 FAILED, [10/10 JIT'ed]
>   test_bpf: test_skb_segment: Summary: 2 PASSED, 0 FAILED
>
> Note that this patch series cannot be applied cleanly to bpf-next,
> which has not yet been synced to v6.0-rc1.


Hi Alexei, Daniel, Andrii,

Do you know which tree this patch series will go through?
bpf-next or loongarch-next?

I will wait for more review comments and then send v2
to fix the build warning in patch #3 reported by the kernel test robot.

Thanks,
Tiezhu

>
> v1:
>   -- Rebased series on v6.0-rc1
>   -- Move {signed,unsigned}_imm_check() to inst.h
>   -- Define the imm field as "unsigned int" in the instruction format
>   -- Use DEF_EMIT_*_FORMAT to define the same kind of instructions
>   -- Use "stack_adjust += sizeof(long) * 8" in build_prologue()
>
> RFC:
>   https://lore.kernel.org/bpf/1660013580-19053-1-git-send-email-yangtiezhu@loongson.cn/
>
> Tiezhu Yang (4):
>   LoongArch: Move {signed,unsigned}_imm_check() to inst.h
>   LoongArch: Add some instruction opcodes and formats
>   LoongArch: Add BPF JIT support
>   LoongArch: Enable BPF_JIT and TEST_BPF in default config
>
>  arch/loongarch/Kbuild                      |    1 +
>  arch/loongarch/Kconfig                     |    1 +
>  arch/loongarch/configs/loongson3_defconfig |    2 +
>  arch/loongarch/include/asm/inst.h          |  317 +++++++-
>  arch/loongarch/kernel/module.c             |   10 -
>  arch/loongarch/net/Makefile                |    7 +
>  arch/loongarch/net/bpf_jit.c               | 1113 ++++++++++++++++++++++++++++
>  arch/loongarch/net/bpf_jit.h               |  308 ++++++++
>  8 files changed, 1744 insertions(+), 15 deletions(-)
>  create mode 100644 arch/loongarch/net/Makefile
>  create mode 100644 arch/loongarch/net/bpf_jit.c
>  create mode 100644 arch/loongarch/net/bpf_jit.h
>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next v1 3/4] LoongArch: Add BPF JIT support
  2022-08-20 11:50 ` [PATCH bpf-next v1 3/4] LoongArch: Add BPF JIT support Tiezhu Yang
  2022-08-20 13:41   ` kernel test robot
@ 2022-08-22  1:58   ` Youling Tang
  2022-08-22  2:03     ` Youling Tang
  2022-08-22  2:49     ` Tiezhu Yang
  2022-08-22  2:50   ` Jinyang He
  2 siblings, 2 replies; 15+ messages in thread
From: Youling Tang @ 2022-08-22  1:58 UTC (permalink / raw)
  To: Tiezhu Yang, Huacai Chen, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko
  Cc: bpf, loongarch

On 08/20/2022 07:50 PM, Tiezhu Yang wrote:
> BPF programs are normally handled by the BPF interpreter. Add BPF JIT
> support for LoongArch to allow the kernel to generate native code
> when a program is loaded into the kernel; this significantly
> speeds up the processing of BPF programs.
>
> Co-developed-by: Youling Tang <tangyouling@loongson.cn>
> Signed-off-by: Youling Tang <tangyouling@loongson.cn>
> Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
> ---
>  arch/loongarch/Kbuild             |    1 +
>  arch/loongarch/Kconfig            |    1 +
>  arch/loongarch/include/asm/inst.h |  185 ++++++
>  arch/loongarch/net/Makefile       |    7 +
>  arch/loongarch/net/bpf_jit.c      | 1113 +++++++++++++++++++++++++++++++++++++
>  arch/loongarch/net/bpf_jit.h      |  308 ++++++++++
>  6 files changed, 1615 insertions(+)
>  create mode 100644 arch/loongarch/net/Makefile
>  create mode 100644 arch/loongarch/net/bpf_jit.c
>  create mode 100644 arch/loongarch/net/bpf_jit.h
>
> diff --git a/arch/loongarch/Kbuild b/arch/loongarch/Kbuild
> index ab5373d..b01f5cd 100644
> --- a/arch/loongarch/Kbuild
> +++ b/arch/loongarch/Kbuild
> @@ -1,5 +1,6 @@
>  obj-y += kernel/
>  obj-y += mm/
> +obj-y += net/
>  obj-y += vdso/
>
>  # for cleaning
> diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
> index 4abc9a2..6d9d846 100644
> --- a/arch/loongarch/Kconfig
> +++ b/arch/loongarch/Kconfig
> @@ -82,6 +82,7 @@ config LOONGARCH
>  	select HAVE_CONTEXT_TRACKING_USER
>  	select HAVE_DEBUG_STACKOVERFLOW
>  	select HAVE_DMA_CONTIGUOUS
> +	select HAVE_EBPF_JIT if 64BIT
>  	select HAVE_EXIT_THREAD
>  	select HAVE_FAST_GUP
>  	select HAVE_GENERIC_VDSO
> diff --git a/arch/loongarch/include/asm/inst.h b/arch/loongarch/include/asm/inst.h
> index de19a96..ac06f2e 100644
> --- a/arch/loongarch/include/asm/inst.h
> +++ b/arch/loongarch/include/asm/inst.h
> @@ -288,4 +288,189 @@ static inline bool unsigned_imm_check(unsigned long val, unsigned int bit)
>  	return val < (1UL << bit);
>  }
>
> +#define DEF_EMIT_REG0I26_FORMAT(NAME, OP)				\
> +static inline void emit_##NAME(union loongarch_instruction *insn,	\
> +			       int offset)				\
> +{									\
> +	unsigned int immediate_l, immediate_h;				\
> +									\
> +	immediate_l = offset & 0xffff;					\
> +	offset >>= 16;							\
> +	immediate_h = offset & 0x3ff;					\
> +									\
> +	insn->reg0i26_format.opcode = OP;				\
> +	insn->reg0i26_format.immediate_l = immediate_l;			\
> +	insn->reg0i26_format.immediate_h = immediate_h;			\
> +}
> +
> +DEF_EMIT_REG0I26_FORMAT(b, b_op)
> +
> +#define DEF_EMIT_REG1I20_FORMAT(NAME, OP)				\
> +static inline void emit_##NAME(union loongarch_instruction *insn,	\
> +			       enum loongarch_gpr rd, int imm)		\
> +{									\
> +	insn->reg1i20_format.opcode = OP;				\
> +	insn->reg1i20_format.immediate = imm;				\
> +	insn->reg1i20_format.rd = rd;					\
> +}
> +
> +DEF_EMIT_REG1I20_FORMAT(lu12iw, lu12iw_op)
> +DEF_EMIT_REG1I20_FORMAT(lu32id, lu32id_op)

We can delete the larch_insn_gen_{lu32id, lu52id, jirl} functions in
inst.c and use emit_xxx.

The implementation of emit_plt_entry() could then be modified along
these lines:

struct plt_entry {
        union loongarch_instruction lu12iw;
        union loongarch_instruction lu32id;
        union loongarch_instruction lu52id;
        union loongarch_instruction jirl;
};

static inline struct plt_entry emit_plt_entry(unsigned long val)
{
        union loongarch_instruction lu12iw, lu32id, lu52id, jirl;

        emit_lu12iw(&lu12iw, LOONGARCH_GPR_T1, ADDR_IMM(val, LU12IW));
        emit_lu32id(&lu32id, LOONGARCH_GPR_T1, ADDR_IMM(val, LU32ID));
        emit_lu52id(&lu52id, LOONGARCH_GPR_T1, LOONGARCH_GPR_T1, ADDR_IMM(val, LU52ID));
        emit_jirl(&jirl, LOONGARCH_GPR_T1, 0, (val & 0xfff) >> 2);

        return (struct plt_entry) { lu12iw, lu32id, lu52id, jirl };
}

Thanks,
Youling

> +DEF_EMIT_REG1I20_FORMAT(pcaddu18i, pcaddu18i_op)
> +
> +#define DEF_EMIT_REG2_FORMAT(NAME, OP)					\
> +static inline void emit_##NAME(union loongarch_instruction *insn,	\
> +			       enum loongarch_gpr rd,			\
> +			       enum loongarch_gpr rj)			\
> +{									\
> +	insn->reg2_format.opcode = OP;					\
> +	insn->reg2_format.rd = rd;					\
> +	insn->reg2_format.rj = rj;					\
> +}
> +
> +DEF_EMIT_REG2_FORMAT(revb2h, revb2h_op)
> +DEF_EMIT_REG2_FORMAT(revb2w, revb2w_op)
> +DEF_EMIT_REG2_FORMAT(revbd, revbd_op)
> +
> +#define DEF_EMIT_REG2I5_FORMAT(NAME, OP)				\
> +static inline void emit_##NAME(union loongarch_instruction *insn,	\
> +			       enum loongarch_gpr rd,			\
> +			       enum loongarch_gpr rj,			\
> +			       int imm)					\
> +{									\
> +	insn->reg2i5_format.opcode = OP;				\
> +	insn->reg2i5_format.immediate = imm;				\
> +	insn->reg2i5_format.rd = rd;					\
> +	insn->reg2i5_format.rj = rj;					\
> +}
> +
> +DEF_EMIT_REG2I5_FORMAT(slliw, slliw_op)
> +DEF_EMIT_REG2I5_FORMAT(srliw, srliw_op)
> +DEF_EMIT_REG2I5_FORMAT(sraiw, sraiw_op)
> +
> +#define DEF_EMIT_REG2I6_FORMAT(NAME, OP)				\
> +static inline void emit_##NAME(union loongarch_instruction *insn,	\
> +			       enum loongarch_gpr rd,			\
> +			       enum loongarch_gpr rj,			\
> +			       int imm)					\
> +{									\
> +	insn->reg2i6_format.opcode = OP;				\
> +	insn->reg2i6_format.immediate = imm;				\
> +	insn->reg2i6_format.rd = rd;					\
> +	insn->reg2i6_format.rj = rj;					\
> +}
> +
> +DEF_EMIT_REG2I6_FORMAT(sllid, sllid_op)
> +DEF_EMIT_REG2I6_FORMAT(srlid, srlid_op)
> +DEF_EMIT_REG2I6_FORMAT(sraid, sraid_op)
> +
> +#define DEF_EMIT_REG2I12_FORMAT(NAME, OP)				\
> +static inline void emit_##NAME(union loongarch_instruction *insn,	\
> +			       enum loongarch_gpr rd,			\
> +			       enum loongarch_gpr rj,			\
> +			       int imm)					\
> +{									\
> +	insn->reg2i12_format.opcode = OP;				\
> +	insn->reg2i12_format.immediate = imm;				\
> +	insn->reg2i12_format.rd = rd;					\
> +	insn->reg2i12_format.rj = rj;					\
> +}
> +
> +DEF_EMIT_REG2I12_FORMAT(addiw, addiw_op)
> +DEF_EMIT_REG2I12_FORMAT(addid, addid_op)
> +DEF_EMIT_REG2I12_FORMAT(lu52id, lu52id_op)
> +DEF_EMIT_REG2I12_FORMAT(andi, andi_op)
> +DEF_EMIT_REG2I12_FORMAT(ori, ori_op)
> +DEF_EMIT_REG2I12_FORMAT(xori, xori_op)
> +DEF_EMIT_REG2I12_FORMAT(ldbu, ldbu_op)
> +DEF_EMIT_REG2I12_FORMAT(ldhu, ldhu_op)
> +DEF_EMIT_REG2I12_FORMAT(ldwu, ldwu_op)
> +DEF_EMIT_REG2I12_FORMAT(ldd, ldd_op)
> +DEF_EMIT_REG2I12_FORMAT(stb, stb_op)
> +DEF_EMIT_REG2I12_FORMAT(sth, sth_op)
> +DEF_EMIT_REG2I12_FORMAT(stw, stw_op)
> +DEF_EMIT_REG2I12_FORMAT(std, std_op)
> +
> +#define DEF_EMIT_REG2I14_FORMAT(NAME, OP)				\
> +static inline void emit_##NAME(union loongarch_instruction *insn,	\
> +			       enum loongarch_gpr rd,			\
> +			       enum loongarch_gpr rj,			\
> +			       int imm)					\
> +{									\
> +	insn->reg2i14_format.opcode = OP;				\
> +	insn->reg2i14_format.immediate = imm;				\
> +	insn->reg2i14_format.rd = rd;					\
> +	insn->reg2i14_format.rj = rj;					\
> +}
> +
> +DEF_EMIT_REG2I14_FORMAT(llw, llw_op)
> +DEF_EMIT_REG2I14_FORMAT(scw, scw_op)
> +DEF_EMIT_REG2I14_FORMAT(lld, lld_op)
> +DEF_EMIT_REG2I14_FORMAT(scd, scd_op)
> +
> +#define DEF_EMIT_REG2I16_FORMAT(NAME, OP)				\
> +static inline void emit_##NAME(union loongarch_instruction *insn,	\
> +			       enum loongarch_gpr rj,			\
> +			       enum loongarch_gpr rd,			\
> +			       int offset)				\
> +{									\
> +	insn->reg2i16_format.opcode = OP;				\
> +	insn->reg2i16_format.immediate = offset;			\
> +	insn->reg2i16_format.rj = rj;					\
> +	insn->reg2i16_format.rd = rd;					\
> +}
> +
> +DEF_EMIT_REG2I16_FORMAT(beq, beq_op)
> +DEF_EMIT_REG2I16_FORMAT(bne, bne_op)
> +DEF_EMIT_REG2I16_FORMAT(blt, blt_op)
> +DEF_EMIT_REG2I16_FORMAT(bge, bge_op)
> +DEF_EMIT_REG2I16_FORMAT(bltu, bltu_op)
> +DEF_EMIT_REG2I16_FORMAT(bgeu, bgeu_op)
> +DEF_EMIT_REG2I16_FORMAT(jirl, jirl_op)
> +
> +#define DEF_EMIT_REG3_FORMAT(NAME, OP)					\
> +static inline void emit_##NAME(union loongarch_instruction *insn,	\
> +			       enum loongarch_gpr rd,			\
> +			       enum loongarch_gpr rj,			\
> +			       enum loongarch_gpr rk)			\
> +{									\
> +	insn->reg3_format.opcode = OP;					\
> +	insn->reg3_format.rd = rd;					\
> +	insn->reg3_format.rj = rj;					\
> +	insn->reg3_format.rk = rk;					\
> +}
> +
> +DEF_EMIT_REG3_FORMAT(addd, addd_op)
> +DEF_EMIT_REG3_FORMAT(subd, subd_op)
> +DEF_EMIT_REG3_FORMAT(muld, muld_op)
> +DEF_EMIT_REG3_FORMAT(divdu, divdu_op)
> +DEF_EMIT_REG3_FORMAT(moddu, moddu_op)
> +DEF_EMIT_REG3_FORMAT(and, and_op)
> +DEF_EMIT_REG3_FORMAT(or, or_op)
> +DEF_EMIT_REG3_FORMAT(xor, xor_op)
> +DEF_EMIT_REG3_FORMAT(sllw, sllw_op)
> +DEF_EMIT_REG3_FORMAT(slld, slld_op)
> +DEF_EMIT_REG3_FORMAT(srlw, srlw_op)
> +DEF_EMIT_REG3_FORMAT(srld, srld_op)
> +DEF_EMIT_REG3_FORMAT(sraw, sraw_op)
> +DEF_EMIT_REG3_FORMAT(srad, srad_op)
> +DEF_EMIT_REG3_FORMAT(ldxbu, ldxbu_op)
> +DEF_EMIT_REG3_FORMAT(ldxhu, ldxhu_op)
> +DEF_EMIT_REG3_FORMAT(ldxwu, ldxwu_op)
> +DEF_EMIT_REG3_FORMAT(ldxd, ldxd_op)
> +DEF_EMIT_REG3_FORMAT(stxb, stxb_op)
> +DEF_EMIT_REG3_FORMAT(stxh, stxh_op)
> +DEF_EMIT_REG3_FORMAT(stxw, stxw_op)
> +DEF_EMIT_REG3_FORMAT(stxd, stxd_op)
> +DEF_EMIT_REG3_FORMAT(amaddw, amaddw_op)
> +DEF_EMIT_REG3_FORMAT(amaddd, amaddd_op)
> +DEF_EMIT_REG3_FORMAT(amandw, amandw_op)
> +DEF_EMIT_REG3_FORMAT(amandd, amandd_op)
> +DEF_EMIT_REG3_FORMAT(amorw, amorw_op)
> +DEF_EMIT_REG3_FORMAT(amord, amord_op)
> +DEF_EMIT_REG3_FORMAT(amxorw, amxorw_op)
> +DEF_EMIT_REG3_FORMAT(amxord, amxord_op)
> +DEF_EMIT_REG3_FORMAT(amswapw, amswapw_op)
> +DEF_EMIT_REG3_FORMAT(amswapd, amswapd_op)
> +
>  #endif /* _ASM_INST_H */
> diff --git a/arch/loongarch/net/Makefile b/arch/loongarch/net/Makefile
> new file mode 100644
> index 0000000..1ec12a0
> --- /dev/null
> +++ b/arch/loongarch/net/Makefile
> @@ -0,0 +1,7 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +#
> +# Makefile for arch/loongarch/net
> +#
> +# Copyright (C) 2022 Loongson Technology Corporation Limited
> +#
> +obj-$(CONFIG_BPF_JIT) += bpf_jit.o
> diff --git a/arch/loongarch/net/bpf_jit.c b/arch/loongarch/net/bpf_jit.c
> new file mode 100644
> index 0000000..2f41b9b
> --- /dev/null
> +++ b/arch/loongarch/net/bpf_jit.c
> @@ -0,0 +1,1113 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * BPF JIT compiler for LoongArch
> + *
> + * Copyright (C) 2022 Loongson Technology Corporation Limited
> + */
> +#include "bpf_jit.h"
> +
> +#define REG_TCC		LOONGARCH_GPR_A6
> +#define TCC_SAVED	LOONGARCH_GPR_S5
> +
> +#define SAVE_RA		BIT(0)
> +#define SAVE_TCC	BIT(1)
> +
> +static const int regmap[] = {
> +	/* return value from in-kernel function, and exit value for eBPF program */
> +	[BPF_REG_0] = LOONGARCH_GPR_A5,
> +	/* arguments from eBPF program to in-kernel function */
> +	[BPF_REG_1] = LOONGARCH_GPR_A0,
> +	[BPF_REG_2] = LOONGARCH_GPR_A1,
> +	[BPF_REG_3] = LOONGARCH_GPR_A2,
> +	[BPF_REG_4] = LOONGARCH_GPR_A3,
> +	[BPF_REG_5] = LOONGARCH_GPR_A4,
> +	/* callee saved registers that in-kernel function will preserve */
> +	[BPF_REG_6] = LOONGARCH_GPR_S0,
> +	[BPF_REG_7] = LOONGARCH_GPR_S1,
> +	[BPF_REG_8] = LOONGARCH_GPR_S2,
> +	[BPF_REG_9] = LOONGARCH_GPR_S3,
> +	/* read-only frame pointer to access stack */
> +	[BPF_REG_FP] = LOONGARCH_GPR_S4,
> +	/* temporary register for blinding constants */
> +	[BPF_REG_AX] = LOONGARCH_GPR_T0,
> +};
> +
> +static void mark_call(struct jit_ctx *ctx)
> +{
> +	ctx->flags |= SAVE_RA;
> +}
> +
> +static void mark_tail_call(struct jit_ctx *ctx)
> +{
> +	ctx->flags |= SAVE_TCC;
> +}
> +
> +static bool seen_call(struct jit_ctx *ctx)
> +{
> +	return (ctx->flags & SAVE_RA);
> +}
> +
> +static bool seen_tail_call(struct jit_ctx *ctx)
> +{
> +	return (ctx->flags & SAVE_TCC);
> +}
> +
> +static u8 tail_call_reg(struct jit_ctx *ctx)
> +{
> +	if (seen_call(ctx))
> +		return TCC_SAVED;
> +
> +	return REG_TCC;
> +}
> +
> +/*
> + * eBPF prog stack layout:
> + *
> + *                                        high
> + * original $sp ------------> +-------------------------+ <--LOONGARCH_GPR_FP
> + *                            |           $ra           |
> + *                            +-------------------------+
> + *                            |           $fp           |
> + *                            +-------------------------+
> + *                            |           $s0           |
> + *                            +-------------------------+
> + *                            |           $s1           |
> + *                            +-------------------------+
> + *                            |           $s2           |
> + *                            +-------------------------+
> + *                            |           $s3           |
> + *                            +-------------------------+
> + *                            |           $s4           |
> + *                            +-------------------------+
> + *                            |           $s5           |
> + *                            +-------------------------+ <--BPF_REG_FP
> + *                            |  prog->aux->stack_depth |
> + *                            |        (optional)       |
> + * current $sp -------------> +-------------------------+
> + *                                        low
> + */
> +static void build_prologue(struct jit_ctx *ctx)
> +{
> +	int stack_adjust = 0, store_offset, bpf_stack_adjust;
> +
> +	bpf_stack_adjust = round_up(ctx->prog->aux->stack_depth, 16);
> +
> +	/* To store ra, fp, s0, s1, s2, s3, s4 and s5. */
> +	stack_adjust += sizeof(long) * 8;
> +
> +	stack_adjust = round_up(stack_adjust, 16);
> +	stack_adjust += bpf_stack_adjust;
> +
> +	/*
> +	 * First instruction initializes the tail call count (TCC).
> +	 * On tail call we skip this instruction, and the TCC is
> +	 * passed in REG_TCC from the caller.
> +	 */
> +	emit_insn(ctx, addid, REG_TCC, LOONGARCH_GPR_ZERO, MAX_TAIL_CALL_CNT);
> +
> +	emit_insn(ctx, addid, LOONGARCH_GPR_SP, LOONGARCH_GPR_SP, -stack_adjust);
> +
> +	store_offset = stack_adjust - sizeof(long);
> +	emit_insn(ctx, std, LOONGARCH_GPR_RA, LOONGARCH_GPR_SP, store_offset);
> +
> +	store_offset -= sizeof(long);
> +	emit_insn(ctx, std, LOONGARCH_GPR_FP, LOONGARCH_GPR_SP, store_offset);
> +
> +	store_offset -= sizeof(long);
> +	emit_insn(ctx, std, LOONGARCH_GPR_S0, LOONGARCH_GPR_SP, store_offset);
> +
> +	store_offset -= sizeof(long);
> +	emit_insn(ctx, std, LOONGARCH_GPR_S1, LOONGARCH_GPR_SP, store_offset);
> +
> +	store_offset -= sizeof(long);
> +	emit_insn(ctx, std, LOONGARCH_GPR_S2, LOONGARCH_GPR_SP, store_offset);
> +
> +	store_offset -= sizeof(long);
> +	emit_insn(ctx, std, LOONGARCH_GPR_S3, LOONGARCH_GPR_SP, store_offset);
> +
> +	store_offset -= sizeof(long);
> +	emit_insn(ctx, std, LOONGARCH_GPR_S4, LOONGARCH_GPR_SP, store_offset);
> +
> +	store_offset -= sizeof(long);
> +	emit_insn(ctx, std, LOONGARCH_GPR_S5, LOONGARCH_GPR_SP, store_offset);
> +
> +	emit_insn(ctx, addid, LOONGARCH_GPR_FP, LOONGARCH_GPR_SP, stack_adjust);
> +
> +	if (bpf_stack_adjust)
> +		emit_insn(ctx, addid, regmap[BPF_REG_FP], LOONGARCH_GPR_SP, bpf_stack_adjust);
> +
> +	/*
> +	 * The program contains calls and tail calls, so REG_TCC needs
> +	 * to be saved across calls.
> +	 */
> +	if (seen_tail_call(ctx) && seen_call(ctx))
> +		move_reg(ctx, TCC_SAVED, REG_TCC);
> +
> +	ctx->stack_size = stack_adjust;
> +}
> +
> +static void __build_epilogue(struct jit_ctx *ctx, bool is_tail_call)
> +{
> +	int stack_adjust = ctx->stack_size;
> +	int load_offset;
> +
> +	load_offset = stack_adjust - sizeof(long);
> +	emit_insn(ctx, ldd, LOONGARCH_GPR_RA, LOONGARCH_GPR_SP, load_offset);
> +
> +	load_offset -= sizeof(long);
> +	emit_insn(ctx, ldd, LOONGARCH_GPR_FP, LOONGARCH_GPR_SP, load_offset);
> +
> +	load_offset -= sizeof(long);
> +	emit_insn(ctx, ldd, LOONGARCH_GPR_S0, LOONGARCH_GPR_SP, load_offset);
> +
> +	load_offset -= sizeof(long);
> +	emit_insn(ctx, ldd, LOONGARCH_GPR_S1, LOONGARCH_GPR_SP, load_offset);
> +
> +	load_offset -= sizeof(long);
> +	emit_insn(ctx, ldd, LOONGARCH_GPR_S2, LOONGARCH_GPR_SP, load_offset);
> +
> +	load_offset -= sizeof(long);
> +	emit_insn(ctx, ldd, LOONGARCH_GPR_S3, LOONGARCH_GPR_SP, load_offset);
> +
> +	load_offset -= sizeof(long);
> +	emit_insn(ctx, ldd, LOONGARCH_GPR_S4, LOONGARCH_GPR_SP, load_offset);
> +
> +	load_offset -= sizeof(long);
> +	emit_insn(ctx, ldd, LOONGARCH_GPR_S5, LOONGARCH_GPR_SP, load_offset);
> +
> +	emit_insn(ctx, addid, LOONGARCH_GPR_SP, LOONGARCH_GPR_SP, stack_adjust);
> +
> +	if (!is_tail_call) {
> +		/* Set return value */
> +		move_reg(ctx, LOONGARCH_GPR_A0, regmap[BPF_REG_0]);
> +		/* Return to the caller */
> +		emit_insn(ctx, jirl, LOONGARCH_GPR_RA, LOONGARCH_GPR_ZERO, 0);
> +	} else {
> +		/*
> +		 * Call the next bpf prog and skip the first instruction
> +		 * of TCC initialization.
> +		 */
> +		emit_insn(ctx, jirl, LOONGARCH_GPR_T3, LOONGARCH_GPR_ZERO, 1);
> +	}
> +}
> +
> +void build_epilogue(struct jit_ctx *ctx)
> +{
> +	__build_epilogue(ctx, false);
> +}
> +
> +bool bpf_jit_supports_kfunc_call(void)
> +{
> +	return true;
> +}
> +
> +/* initialized on the first pass of build_body() */
> +static int out_offset = -1;
> +static int emit_bpf_tail_call(struct jit_ctx *ctx)
> +{
> +	int off;
> +	u8 tcc = tail_call_reg(ctx);
> +	u8 a1 = LOONGARCH_GPR_A1;
> +	u8 a2 = LOONGARCH_GPR_A2;
> +	u8 t1 = LOONGARCH_GPR_T1;
> +	u8 t2 = LOONGARCH_GPR_T2;
> +	u8 t3 = LOONGARCH_GPR_T3;
> +	const int idx0 = ctx->idx;
> +
> +#define cur_offset (ctx->idx - idx0)
> +#define jmp_offset (out_offset - (cur_offset))
> +
> +	/*
> +	 * a0: &ctx
> +	 * a1: &array
> +	 * a2: index
> +	 *
> +	 * if (index >= array->map.max_entries)
> +	 *	 goto out;
> +	 */
> +	off = offsetof(struct bpf_array, map.max_entries);
> +	emit_insn(ctx, ldwu, t1, a1, off);
> +	/* bgeu $a2, $t1, jmp_offset */
> +	emit_tailcall_jmp(ctx, BPF_JGE, a2, t1, jmp_offset);
> +
> +	/*
> +	 * if (--TCC < 0)
> +	 *	 goto out;
> +	 */
> +	emit_insn(ctx, addid, REG_TCC, tcc, -1);
> +	emit_tailcall_jmp(ctx, BPF_JSLT, REG_TCC, LOONGARCH_GPR_ZERO, jmp_offset);
> +
> +	/*
> +	 * prog = array->ptrs[index];
> +	 * if (!prog)
> +	 *	 goto out;
> +	 */
> +	emit_insn(ctx, sllid, t2, a2, 3);
> +	emit_insn(ctx, addd, t2, t2, a1);
> +	off = offsetof(struct bpf_array, ptrs);
> +	emit_insn(ctx, ldd, t2, t2, off);
> +	/* beq $t2, $zero, jmp_offset */
> +	emit_tailcall_jmp(ctx, BPF_JEQ, t2, LOONGARCH_GPR_ZERO, jmp_offset);
> +
> +	/* goto *(prog->bpf_func + 4); */
> +	off = offsetof(struct bpf_prog, bpf_func);
> +	emit_insn(ctx, ldd, t3, t2, off);
> +	__build_epilogue(ctx, true);
> +
> +	/* out: */
> +	if (out_offset == -1)
> +		out_offset = cur_offset;
> +	if (cur_offset != out_offset) {
> +		pr_err_once("tail_call out_offset = %d, expected %d!\n",
> +			    cur_offset, out_offset);
> +		return -1;
> +	}
> +
> +	return 0;
> +#undef cur_offset
> +#undef jmp_offset
> +}
> +
> +static void emit_atomic(const struct bpf_insn *insn, struct jit_ctx *ctx)
> +{
> +	const u8 dst = regmap[insn->dst_reg];
> +	const u8 src = regmap[insn->src_reg];
> +	const u8 t1 = LOONGARCH_GPR_T1;
> +	const u8 t2 = LOONGARCH_GPR_T2;
> +	const u8 t3 = LOONGARCH_GPR_T3;
> +	const s16 off = insn->off;
> +	const s32 imm = insn->imm;
> +	const bool isdw = BPF_SIZE(insn->code) == BPF_DW;
> +
> +	move_imm32(ctx, t1, off, false);
> +	emit_insn(ctx, addd, t1, dst, t1);
> +	move_reg(ctx, t3, src);
> +
> +	switch (imm) {
> +	/* lock *(size *)(dst + off) <op>= src */
> +	case BPF_ADD:
> +		if (isdw)
> +			emit_insn(ctx, amaddd, t2, t1, src);
> +		else
> +			emit_insn(ctx, amaddw, t2, t1, src);
> +		break;
> +	case BPF_AND:
> +		if (isdw)
> +			emit_insn(ctx, amandd, t2, t1, src);
> +		else
> +			emit_insn(ctx, amandw, t2, t1, src);
> +		break;
> +	case BPF_OR:
> +		if (isdw)
> +			emit_insn(ctx, amord, t2, t1, src);
> +		else
> +			emit_insn(ctx, amorw, t2, t1, src);
> +		break;
> +	case BPF_XOR:
> +		if (isdw)
> +			emit_insn(ctx, amxord, t2, t1, src);
> +		else
> +			emit_insn(ctx, amxorw, t2, t1, src);
> +		break;
> +	/* src = atomic_fetch_<op>(dst + off, src) */
> +	case BPF_ADD | BPF_FETCH:
> +		if (isdw) {
> +			emit_insn(ctx, amaddd, src, t1, t3);
> +		} else {
> +			emit_insn(ctx, amaddw, src, t1, t3);
> +			emit_zext_32(ctx, src, true);
> +		}
> +		break;
> +	case BPF_AND | BPF_FETCH:
> +		if (isdw) {
> +			emit_insn(ctx, amandd, src, t1, t3);
> +		} else {
> +			emit_insn(ctx, amandw, src, t1, t3);
> +			emit_zext_32(ctx, src, true);
> +		}
> +		break;
> +	case BPF_OR | BPF_FETCH:
> +		if (isdw) {
> +			emit_insn(ctx, amord, src, t1, t3);
> +		} else {
> +			emit_insn(ctx, amorw, src, t1, t3);
> +			emit_zext_32(ctx, src, true);
> +		}
> +		break;
> +	case BPF_XOR | BPF_FETCH:
> +		if (isdw) {
> +			emit_insn(ctx, amxord, src, t1, t3);
> +		} else {
> +			emit_insn(ctx, amxorw, src, t1, t3);
> +			emit_zext_32(ctx, src, true);
> +		}
> +		break;
> +	/* src = atomic_xchg(dst + off, src); */
> +	case BPF_XCHG:
> +		if (isdw) {
> +			emit_insn(ctx, amswapd, src, t1, t3);
> +		} else {
> +			emit_insn(ctx, amswapw, src, t1, t3);
> +			emit_zext_32(ctx, src, true);
> +		}
> +		break;
> +	/* r0 = atomic_cmpxchg(dst + off, r0, src); */
> +	case BPF_CMPXCHG:
> +		u8 r0 = regmap[BPF_REG_0];
> +
> +		move_reg(ctx, t2, r0);
> +		if (isdw) {
> +			emit_insn(ctx, lld, r0, t1, 0);
> +			emit_insn(ctx, bne, t2, r0, 4);
> +			move_reg(ctx, t3, src);
> +			emit_insn(ctx, scd, t3, t1, 0);
> +			emit_insn(ctx, beq, t3, LOONGARCH_GPR_ZERO, -4);
> +		} else {
> +			emit_insn(ctx, llw, r0, t1, 0);
> +			emit_zext_32(ctx, t2, true);
> +			emit_zext_32(ctx, r0, true);
> +			emit_insn(ctx, bne, t2, r0, 4);
> +			move_reg(ctx, t3, src);
> +			emit_insn(ctx, scw, t3, t1, 0);
> +			emit_insn(ctx, beq, t3, LOONGARCH_GPR_ZERO, -6);
> +			emit_zext_32(ctx, r0, true);
> +		}
> +		break;
> +	}
> +}
> +
> +static bool is_signed_bpf_cond(u8 cond)
> +{
> +	return cond == BPF_JSGT || cond == BPF_JSLT ||
> +	       cond == BPF_JSGE || cond == BPF_JSLE;
> +}
> +
> +static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx, bool extra_pass)
> +{
> +	const bool is32 = BPF_CLASS(insn->code) == BPF_ALU ||
> +			  BPF_CLASS(insn->code) == BPF_JMP32;
> +	const u8 code = insn->code;
> +	const u8 cond = BPF_OP(code);
> +	const u8 dst = regmap[insn->dst_reg];
> +	const u8 src = regmap[insn->src_reg];
> +	const u8 t1 = LOONGARCH_GPR_T1;
> +	const u8 t2 = LOONGARCH_GPR_T2;
> +	const s16 off = insn->off;
> +	const s32 imm = insn->imm;
> +	int i = insn - ctx->prog->insnsi;
> +	int jmp_offset;
> +
> +	switch (code) {
> +	/* dst = src */
> +	case BPF_ALU | BPF_MOV | BPF_X:
> +	case BPF_ALU64 | BPF_MOV | BPF_X:
> +		move_reg(ctx, dst, src);
> +		emit_zext_32(ctx, dst, is32);
> +		break;
> +	/* dst = imm */
> +	case BPF_ALU | BPF_MOV | BPF_K:
> +	case BPF_ALU64 | BPF_MOV | BPF_K:
> +		move_imm32(ctx, dst, imm, is32);
> +		break;
> +
> +	/* dst = dst + src */
> +	case BPF_ALU | BPF_ADD | BPF_X:
> +	case BPF_ALU64 | BPF_ADD | BPF_X:
> +		emit_insn(ctx, addd, dst, dst, src);
> +		emit_zext_32(ctx, dst, is32);
> +		break;
> +	/* dst = dst + imm */
> +	case BPF_ALU | BPF_ADD | BPF_K:
> +	case BPF_ALU64 | BPF_ADD | BPF_K:
> +		if (is_signed_imm12(imm)) {
> +			emit_insn(ctx, addid, dst, dst, imm);
> +		} else {
> +			move_imm32(ctx, t1, imm, is32);
> +			emit_insn(ctx, addd, dst, dst, t1);
> +		}
> +		emit_zext_32(ctx, dst, is32);
> +		break;
> +
> +	/* dst = dst - src */
> +	case BPF_ALU | BPF_SUB | BPF_X:
> +	case BPF_ALU64 | BPF_SUB | BPF_X:
> +		emit_insn(ctx, subd, dst, dst, src);
> +		emit_zext_32(ctx, dst, is32);
> +		break;
> +	/* dst = dst - imm */
> +	case BPF_ALU | BPF_SUB | BPF_K:
> +	case BPF_ALU64 | BPF_SUB | BPF_K:
> +		if (is_signed_imm12(-imm)) {
> +			emit_insn(ctx, addid, dst, dst, -imm);
> +		} else {
> +			move_imm32(ctx, t1, imm, is32);
> +			emit_insn(ctx, subd, dst, dst, t1);
> +		}
> +		emit_zext_32(ctx, dst, is32);
> +		break;
> +
> +	/* dst = dst * src */
> +	case BPF_ALU | BPF_MUL | BPF_X:
> +	case BPF_ALU64 | BPF_MUL | BPF_X:
> +		emit_insn(ctx, muld, dst, dst, src);
> +		emit_zext_32(ctx, dst, is32);
> +		break;
> +	/* dst = dst * imm */
> +	case BPF_ALU | BPF_MUL | BPF_K:
> +	case BPF_ALU64 | BPF_MUL | BPF_K:
> +		move_imm32(ctx, t1, imm, is32);
> +		emit_insn(ctx, muld, dst, dst, t1);
> +		emit_zext_32(ctx, dst, is32);
> +		break;
> +
> +	/* dst = dst / src */
> +	case BPF_ALU | BPF_DIV | BPF_X:
> +	case BPF_ALU64 | BPF_DIV | BPF_X:
> +		emit_zext_32(ctx, dst, is32);
> +		move_reg(ctx, t1, src);
> +		emit_zext_32(ctx, t1, is32);
> +		emit_insn(ctx, divdu, dst, dst, t1);
> +		emit_zext_32(ctx, dst, is32);
> +		break;
> +	/* dst = dst / imm */
> +	case BPF_ALU | BPF_DIV | BPF_K:
> +	case BPF_ALU64 | BPF_DIV | BPF_K:
> +		move_imm32(ctx, t1, imm, is32);
> +		emit_zext_32(ctx, dst, is32);
> +		emit_insn(ctx, divdu, dst, dst, t1);
> +		emit_zext_32(ctx, dst, is32);
> +		break;
> +
> +	/* dst = dst % src */
> +	case BPF_ALU | BPF_MOD | BPF_X:
> +	case BPF_ALU64 | BPF_MOD | BPF_X:
> +		emit_zext_32(ctx, dst, is32);
> +		move_reg(ctx, t1, src);
> +		emit_zext_32(ctx, t1, is32);
> +		emit_insn(ctx, moddu, dst, dst, t1);
> +		emit_zext_32(ctx, dst, is32);
> +		break;
> +	/* dst = dst % imm */
> +	case BPF_ALU | BPF_MOD | BPF_K:
> +	case BPF_ALU64 | BPF_MOD | BPF_K:
> +		move_imm32(ctx, t1, imm, is32);
> +		emit_zext_32(ctx, dst, is32);
> +		emit_insn(ctx, moddu, dst, dst, t1);
> +		emit_zext_32(ctx, dst, is32);
> +		break;
> +
> +	/* dst = -dst */
> +	case BPF_ALU | BPF_NEG:
> +	case BPF_ALU64 | BPF_NEG:
> +		emit_insn(ctx, subd, dst, LOONGARCH_GPR_ZERO, dst);
> +		emit_zext_32(ctx, dst, is32);
> +		break;
> +
> +	/* dst = dst & src */
> +	case BPF_ALU | BPF_AND | BPF_X:
> +	case BPF_ALU64 | BPF_AND | BPF_X:
> +		emit_insn(ctx, and, dst, dst, src);
> +		emit_zext_32(ctx, dst, is32);
> +		break;
> +	/* dst = dst & imm */
> +	case BPF_ALU | BPF_AND | BPF_K:
> +	case BPF_ALU64 | BPF_AND | BPF_K:
> +		if (is_unsigned_imm12(imm)) {
> +			emit_insn(ctx, andi, dst, dst, imm);
> +		} else {
> +			move_imm32(ctx, t1, imm, is32);
> +			emit_insn(ctx, and, dst, dst, t1);
> +		}
> +		emit_zext_32(ctx, dst, is32);
> +		break;
> +
> +	/* dst = dst | src */
> +	case BPF_ALU | BPF_OR | BPF_X:
> +	case BPF_ALU64 | BPF_OR | BPF_X:
> +		emit_insn(ctx, or, dst, dst, src);
> +		emit_zext_32(ctx, dst, is32);
> +		break;
> +	/* dst = dst | imm */
> +	case BPF_ALU | BPF_OR | BPF_K:
> +	case BPF_ALU64 | BPF_OR | BPF_K:
> +		if (is_unsigned_imm12(imm)) {
> +			emit_insn(ctx, ori, dst, dst, imm);
> +		} else {
> +			move_imm32(ctx, t1, imm, is32);
> +			emit_insn(ctx, or, dst, dst, t1);
> +		}
> +		emit_zext_32(ctx, dst, is32);
> +		break;
> +
> +	/* dst = dst ^ src */
> +	case BPF_ALU | BPF_XOR | BPF_X:
> +	case BPF_ALU64 | BPF_XOR | BPF_X:
> +		emit_insn(ctx, xor, dst, dst, src);
> +		emit_zext_32(ctx, dst, is32);
> +		break;
> +	/* dst = dst ^ imm */
> +	case BPF_ALU | BPF_XOR | BPF_K:
> +	case BPF_ALU64 | BPF_XOR | BPF_K:
> +		if (is_unsigned_imm12(imm)) {
> +			emit_insn(ctx, xori, dst, dst, imm);
> +		} else {
> +			move_imm32(ctx, t1, imm, is32);
> +			emit_insn(ctx, xor, dst, dst, t1);
> +		}
> +		emit_zext_32(ctx, dst, is32);
> +		break;
> +
> +	/* dst = dst << src (logical) */
> +	case BPF_ALU | BPF_LSH | BPF_X:
> +		emit_insn(ctx, sllw, dst, dst, src);
> +		emit_zext_32(ctx, dst, is32);
> +		break;
> +	case BPF_ALU64 | BPF_LSH | BPF_X:
> +		emit_insn(ctx, slld, dst, dst, src);
> +		break;
> +	/* dst = dst << imm (logical) */
> +	case BPF_ALU | BPF_LSH | BPF_K:
> +		emit_insn(ctx, slliw, dst, dst, imm);
> +		emit_zext_32(ctx, dst, is32);
> +		break;
> +	case BPF_ALU64 | BPF_LSH | BPF_K:
> +		emit_insn(ctx, sllid, dst, dst, imm);
> +		break;
> +
> +	/* dst = dst >> src (logical) */
> +	case BPF_ALU | BPF_RSH | BPF_X:
> +		emit_insn(ctx, srlw, dst, dst, src);
> +		emit_zext_32(ctx, dst, is32);
> +		break;
> +	case BPF_ALU64 | BPF_RSH | BPF_X:
> +		emit_insn(ctx, srld, dst, dst, src);
> +		break;
> +	/* dst = dst >> imm (logical) */
> +	case BPF_ALU | BPF_RSH | BPF_K:
> +		emit_insn(ctx, srliw, dst, dst, imm);
> +		emit_zext_32(ctx, dst, is32);
> +		break;
> +	case BPF_ALU64 | BPF_RSH | BPF_K:
> +		emit_insn(ctx, srlid, dst, dst, imm);
> +		break;
> +
> +	/* dst = dst >> src (arithmetic) */
> +	case BPF_ALU | BPF_ARSH | BPF_X:
> +		emit_insn(ctx, sraw, dst, dst, src);
> +		emit_zext_32(ctx, dst, is32);
> +		break;
> +	case BPF_ALU64 | BPF_ARSH | BPF_X:
> +		emit_insn(ctx, srad, dst, dst, src);
> +		break;
> +	/* dst = dst >> imm (arithmetic) */
> +	case BPF_ALU | BPF_ARSH | BPF_K:
> +		emit_insn(ctx, sraiw, dst, dst, imm);
> +		emit_zext_32(ctx, dst, is32);
> +		break;
> +	case BPF_ALU64 | BPF_ARSH | BPF_K:
> +		emit_insn(ctx, sraid, dst, dst, imm);
> +		break;
> +
> +	/* dst = BSWAP##imm(dst) */
> +	case BPF_ALU | BPF_END | BPF_FROM_LE:
> +		switch (imm) {
> +		case 16:
> +			/* zero-extend 16 bits into 64 bits */
> +			emit_insn(ctx, sllid, dst, dst, 48);
> +			emit_insn(ctx, srlid, dst, dst, 48);
> +			break;
> +		case 32:
> +			/* zero-extend 32 bits into 64 bits */
> +			emit_zext_32(ctx, dst, is32);
> +			break;
> +		case 64:
> +			/* do nothing */
> +			break;
> +		}
> +		break;
> +	case BPF_ALU | BPF_END | BPF_FROM_BE:
> +		switch (imm) {
> +		case 16:
> +			emit_insn(ctx, revb2h, dst, dst);
> +			/* zero-extend 16 bits into 64 bits */
> +			emit_insn(ctx, sllid, dst, dst, 48);
> +			emit_insn(ctx, srlid, dst, dst, 48);
> +			break;
> +		case 32:
> +			emit_insn(ctx, revb2w, dst, dst);
> +			/* zero-extend 32 bits into 64 bits */
> +			emit_zext_32(ctx, dst, is32);
> +			break;
> +		case 64:
> +			emit_insn(ctx, revbd, dst, dst);
> +			break;
> +		}
> +		break;
> +
> +	/* PC += off if dst cond src */
> +	case BPF_JMP | BPF_JEQ | BPF_X:
> +	case BPF_JMP | BPF_JNE | BPF_X:
> +	case BPF_JMP | BPF_JGT | BPF_X:
> +	case BPF_JMP | BPF_JGE | BPF_X:
> +	case BPF_JMP | BPF_JLT | BPF_X:
> +	case BPF_JMP | BPF_JLE | BPF_X:
> +	case BPF_JMP | BPF_JSGT | BPF_X:
> +	case BPF_JMP | BPF_JSGE | BPF_X:
> +	case BPF_JMP | BPF_JSLT | BPF_X:
> +	case BPF_JMP | BPF_JSLE | BPF_X:
> +	case BPF_JMP32 | BPF_JEQ | BPF_X:
> +	case BPF_JMP32 | BPF_JNE | BPF_X:
> +	case BPF_JMP32 | BPF_JGT | BPF_X:
> +	case BPF_JMP32 | BPF_JGE | BPF_X:
> +	case BPF_JMP32 | BPF_JLT | BPF_X:
> +	case BPF_JMP32 | BPF_JLE | BPF_X:
> +	case BPF_JMP32 | BPF_JSGT | BPF_X:
> +	case BPF_JMP32 | BPF_JSGE | BPF_X:
> +	case BPF_JMP32 | BPF_JSLT | BPF_X:
> +	case BPF_JMP32 | BPF_JSLE | BPF_X:
> +		jmp_offset = bpf2la_offset(i, off, ctx);
> +		move_reg(ctx, t1, dst);
> +		move_reg(ctx, t2, src);
> +		if (is_signed_bpf_cond(BPF_OP(code))) {
> +			emit_sext_32(ctx, t1, is32);
> +			emit_sext_32(ctx, t2, is32);
> +		} else {
> +			emit_zext_32(ctx, t1, is32);
> +			emit_zext_32(ctx, t2, is32);
> +		}
> +		emit_cond_jmp(ctx, cond, t1, t2, jmp_offset);
> +		break;
> +
> +	/* PC += off if dst cond imm */
> +	case BPF_JMP | BPF_JEQ | BPF_K:
> +	case BPF_JMP | BPF_JNE | BPF_K:
> +	case BPF_JMP | BPF_JGT | BPF_K:
> +	case BPF_JMP | BPF_JGE | BPF_K:
> +	case BPF_JMP | BPF_JLT | BPF_K:
> +	case BPF_JMP | BPF_JLE | BPF_K:
> +	case BPF_JMP | BPF_JSGT | BPF_K:
> +	case BPF_JMP | BPF_JSGE | BPF_K:
> +	case BPF_JMP | BPF_JSLT | BPF_K:
> +	case BPF_JMP | BPF_JSLE | BPF_K:
> +	case BPF_JMP32 | BPF_JEQ | BPF_K:
> +	case BPF_JMP32 | BPF_JNE | BPF_K:
> +	case BPF_JMP32 | BPF_JGT | BPF_K:
> +	case BPF_JMP32 | BPF_JGE | BPF_K:
> +	case BPF_JMP32 | BPF_JLT | BPF_K:
> +	case BPF_JMP32 | BPF_JLE | BPF_K:
> +	case BPF_JMP32 | BPF_JSGT | BPF_K:
> +	case BPF_JMP32 | BPF_JSGE | BPF_K:
> +	case BPF_JMP32 | BPF_JSLT | BPF_K:
> +	case BPF_JMP32 | BPF_JSLE | BPF_K:
> +		jmp_offset = bpf2la_offset(i, off, ctx);
> +		move_imm32(ctx, t1, imm, false);
> +		move_reg(ctx, t2, dst);
> +		if (is_signed_bpf_cond(BPF_OP(code))) {
> +			emit_sext_32(ctx, t1, is32);
> +			emit_sext_32(ctx, t2, is32);
> +		} else {
> +			emit_zext_32(ctx, t1, is32);
> +			emit_zext_32(ctx, t2, is32);
> +		}
> +		emit_cond_jmp(ctx, cond, t2, t1, jmp_offset);
> +		break;
> +
> +	/* PC += off if dst & src */
> +	case BPF_JMP | BPF_JSET | BPF_X:
> +	case BPF_JMP32 | BPF_JSET | BPF_X:
> +		jmp_offset = bpf2la_offset(i, off, ctx);
> +		emit_insn(ctx, and, t1, dst, src);
> +		emit_zext_32(ctx, t1, is32);
> +		emit_cond_jmp(ctx, cond, t1, LOONGARCH_GPR_ZERO, jmp_offset);
> +		break;
> +	/* PC += off if dst & imm */
> +	case BPF_JMP | BPF_JSET | BPF_K:
> +	case BPF_JMP32 | BPF_JSET | BPF_K:
> +		jmp_offset = bpf2la_offset(i, off, ctx);
> +		move_imm32(ctx, t1, imm, is32);
> +		emit_insn(ctx, and, t1, dst, t1);
> +		emit_zext_32(ctx, t1, is32);
> +		emit_cond_jmp(ctx, cond, t1, LOONGARCH_GPR_ZERO, jmp_offset);
> +		break;
> +
> +	/* PC += off */
> +	case BPF_JMP | BPF_JA:
> +		jmp_offset = bpf2la_offset(i, off, ctx);
> +		emit_uncond_jmp(ctx, jmp_offset, is32);
> +		break;
> +
> +	/* function call */
> +	case BPF_JMP | BPF_CALL:
> +		bool func_addr_fixed;
> +		u64 func_addr;
> +		int ret;
> +
> +		mark_call(ctx);
> +		ret = bpf_jit_get_func_addr(ctx->prog, insn, extra_pass,
> +					    &func_addr, &func_addr_fixed);
> +		if (ret < 0)
> +			return ret;
> +
> +		move_imm64(ctx, t1, func_addr, is32);
> +		emit_insn(ctx, jirl, t1, LOONGARCH_GPR_RA, 0);
> +		move_reg(ctx, regmap[BPF_REG_0], LOONGARCH_GPR_A0);
> +		break;
> +
> +	/* tail call */
> +	case BPF_JMP | BPF_TAIL_CALL:
> +		mark_tail_call(ctx);
> +		if (emit_bpf_tail_call(ctx))
> +			return -EINVAL;
> +		break;
> +
> +	/* function return */
> +	case BPF_JMP | BPF_EXIT:
> +		emit_sext_32(ctx, regmap[BPF_REG_0], true);
> +
> +		if (i == ctx->prog->len - 1)
> +			break;
> +
> +		jmp_offset = epilogue_offset(ctx);
> +		emit_uncond_jmp(ctx, jmp_offset, true);
> +		break;
> +
> +	/* dst = imm64 */
> +	case BPF_LD | BPF_IMM | BPF_DW:
> +		u64 imm64 = (u64)(insn + 1)->imm << 32 | (u32)insn->imm;
> +
> +		move_imm64(ctx, dst, imm64, is32);
> +		return 1;
> +
> +	/* dst = *(size *)(src + off) */
> +	case BPF_LDX | BPF_MEM | BPF_B:
> +	case BPF_LDX | BPF_MEM | BPF_H:
> +	case BPF_LDX | BPF_MEM | BPF_W:
> +	case BPF_LDX | BPF_MEM | BPF_DW:
> +		if (is_signed_imm12(off)) {
> +			switch (BPF_SIZE(code)) {
> +			case BPF_B:
> +				emit_insn(ctx, ldbu, dst, src, off);
> +				break;
> +			case BPF_H:
> +				emit_insn(ctx, ldhu, dst, src, off);
> +				break;
> +			case BPF_W:
> +				emit_insn(ctx, ldwu, dst, src, off);
> +				break;
> +			case BPF_DW:
> +				emit_insn(ctx, ldd, dst, src, off);
> +				break;
> +			}
> +		} else {
> +			move_imm32(ctx, t1, off, is32);
> +			switch (BPF_SIZE(code)) {
> +			case BPF_B:
> +				emit_insn(ctx, ldxbu, dst, src, t1);
> +				break;
> +			case BPF_H:
> +				emit_insn(ctx, ldxhu, dst, src, t1);
> +				break;
> +			case BPF_W:
> +				emit_insn(ctx, ldxwu, dst, src, t1);
> +				break;
> +			case BPF_DW:
> +				emit_insn(ctx, ldxd, dst, src, t1);
> +				break;
> +			}
> +		}
> +		break;
> +
> +	/* *(size *)(dst + off) = imm */
> +	case BPF_ST | BPF_MEM | BPF_B:
> +	case BPF_ST | BPF_MEM | BPF_H:
> +	case BPF_ST | BPF_MEM | BPF_W:
> +	case BPF_ST | BPF_MEM | BPF_DW:
> +		move_imm32(ctx, t1, imm, is32);
> +		if (is_signed_imm12(off)) {
> +			switch (BPF_SIZE(code)) {
> +			case BPF_B:
> +				emit_insn(ctx, stb, t1, dst, off);
> +				break;
> +			case BPF_H:
> +				emit_insn(ctx, sth, t1, dst, off);
> +				break;
> +			case BPF_W:
> +				emit_insn(ctx, stw, t1, dst, off);
> +				break;
> +			case BPF_DW:
> +				emit_insn(ctx, std, t1, dst, off);
> +				break;
> +			}
> +		} else {
> +			move_imm32(ctx, t2, off, is32);
> +			switch (BPF_SIZE(code)) {
> +			case BPF_B:
> +				emit_insn(ctx, stxb, t1, dst, t2);
> +				break;
> +			case BPF_H:
> +				emit_insn(ctx, stxh, t1, dst, t2);
> +				break;
> +			case BPF_W:
> +				emit_insn(ctx, stxw, t1, dst, t2);
> +				break;
> +			case BPF_DW:
> +				emit_insn(ctx, stxd, t1, dst, t2);
> +				break;
> +			}
> +		}
> +		break;
> +
> +	/* *(size *)(dst + off) = src */
> +	case BPF_STX | BPF_MEM | BPF_B:
> +	case BPF_STX | BPF_MEM | BPF_H:
> +	case BPF_STX | BPF_MEM | BPF_W:
> +	case BPF_STX | BPF_MEM | BPF_DW:
> +		if (is_signed_imm12(off)) {
> +			switch (BPF_SIZE(code)) {
> +			case BPF_B:
> +				emit_insn(ctx, stb, src, dst, off);
> +				break;
> +			case BPF_H:
> +				emit_insn(ctx, sth, src, dst, off);
> +				break;
> +			case BPF_W:
> +				emit_insn(ctx, stw, src, dst, off);
> +				break;
> +			case BPF_DW:
> +				emit_insn(ctx, std, src, dst, off);
> +				break;
> +			}
> +		} else {
> +			move_imm32(ctx, t1, off, is32);
> +			switch (BPF_SIZE(code)) {
> +			case BPF_B:
> +				emit_insn(ctx, stxb, src, dst, t1);
> +				break;
> +			case BPF_H:
> +				emit_insn(ctx, stxh, src, dst, t1);
> +				break;
> +			case BPF_W:
> +				emit_insn(ctx, stxw, src, dst, t1);
> +				break;
> +			case BPF_DW:
> +				emit_insn(ctx, stxd, src, dst, t1);
> +				break;
> +			}
> +		}
> +		break;
> +
> +	case BPF_STX | BPF_ATOMIC | BPF_W:
> +	case BPF_STX | BPF_ATOMIC | BPF_DW:
> +		emit_atomic(insn, ctx);
> +		break;
> +
> +	default:
> +		pr_err("bpf_jit: unknown opcode %02x\n", code);
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
> +static int build_body(struct jit_ctx *ctx, bool extra_pass)
> +{
> +	const struct bpf_prog *prog = ctx->prog;
> +	int i;
> +
> +	for (i = 0; i < prog->len; i++) {
> +		const struct bpf_insn *insn = &prog->insnsi[i];
> +		int ret;
> +
> +		if (ctx->image == NULL)
> +			ctx->offset[i] = ctx->idx;
> +
> +		ret = build_insn(insn, ctx, extra_pass);
> +		if (ret > 0) {
> +			i++;
> +			if (ctx->image == NULL)
> +				ctx->offset[i] = ctx->idx;
> +			continue;
> +		}
> +		if (ret)
> +			return ret;
> +	}
> +
> +	if (ctx->image == NULL)
> +		ctx->offset[i] = ctx->idx;
> +
> +	return 0;
> +}
> +
> +static inline void bpf_flush_icache(void *start, void *end)
> +{
> +	flush_icache_range((unsigned long)start, (unsigned long)end);
> +}
> +
> +/* Fill space with illegal instructions */
> +static void jit_fill_hole(void *area, unsigned int size)
> +{
> +	u32 *ptr;
> +
> +	/* We are guaranteed to have aligned memory */
> +	for (ptr = area; size >= sizeof(u32); size -= sizeof(u32))
> +		*ptr++ = INSN_BREAK;
> +}
> +
> +static int validate_code(struct jit_ctx *ctx)
> +{
> +	int i;
> +	union loongarch_instruction insn;
> +
> +	for (i = 0; i < ctx->idx; i++) {
> +		insn = ctx->image[i];
> +		/* Check INSN_BREAK */
> +		if (insn.word == INSN_BREAK)
> +			return -1;
> +	}
> +
> +	return 0;
> +}
> +
> +struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
> +{
> +	struct bpf_prog *tmp, *orig_prog = prog;
> +	struct bpf_binary_header *header;
> +	struct jit_data *jit_data;
> +	struct jit_ctx ctx;
> +	bool tmp_blinded = false;
> +	bool extra_pass = false;
> +	int image_size;
> +	u8 *image_ptr;
> +
> +	/*
> +	 * If BPF JIT was not enabled then we must fall back to
> +	 * the interpreter.
> +	 */
> +	if (!prog->jit_requested)
> +		return orig_prog;
> +
> +	tmp = bpf_jit_blind_constants(prog);
> +	/*
> +	 * If blinding was requested and we failed during blinding,
> +	 * we must fall back to the interpreter. Otherwise, we save
> +	 * the new JITed code.
> +	 */
> +	if (IS_ERR(tmp))
> +		return orig_prog;
> +	if (tmp != prog) {
> +		tmp_blinded = true;
> +		prog = tmp;
> +	}
> +
> +	jit_data = prog->aux->jit_data;
> +	if (!jit_data) {
> +		jit_data = kzalloc(sizeof(*jit_data), GFP_KERNEL);
> +		if (!jit_data) {
> +			prog = orig_prog;
> +			goto out;
> +		}
> +		prog->aux->jit_data = jit_data;
> +	}
> +	if (jit_data->ctx.offset) {
> +		ctx = jit_data->ctx;
> +		image_ptr = jit_data->image;
> +		header = jit_data->header;
> +		extra_pass = true;
> +		image_size = sizeof(u32) * ctx.idx;
> +		goto skip_init_ctx;
> +	}
> +
> +	memset(&ctx, 0, sizeof(ctx));
> +	ctx.prog = prog;
> +
> +	ctx.offset = kcalloc(prog->len + 1, sizeof(u32), GFP_KERNEL);
> +	if (ctx.offset == NULL) {
> +		prog = orig_prog;
> +		goto out_off;
> +	}
> +
> +	/* 1. Initial fake pass to compute ctx->idx and set ctx->flags */
> +	if (build_body(&ctx, extra_pass)) {
> +		prog = orig_prog;
> +		goto out_off;
> +	}
> +	build_prologue(&ctx);
> +	ctx.epilogue_offset = ctx.idx;
> +	build_epilogue(&ctx);
> +
> +	/*
> +	 * Now we know the actual image size.
> +	 * As each LoongArch instruction is 32 bits wide, we translate
> +	 * the number of JITed instructions into the size required to
> +	 * store the JITed code.
> +	 */
> +	image_size = sizeof(u32) * ctx.idx;
> +	/* Now we know the size of the structure to make */
> +	header = bpf_jit_binary_alloc(image_size, &image_ptr,
> +				      sizeof(u32), jit_fill_hole);
> +	if (header == NULL) {
> +		prog = orig_prog;
> +		goto out_off;
> +	}
> +
> +	/* 2. Now, the actual pass to generate final JIT code */
> +	ctx.image = (union loongarch_instruction *)image_ptr;
> +skip_init_ctx:
> +	ctx.idx = 0;
> +
> +	build_prologue(&ctx);
> +	if (build_body(&ctx, extra_pass)) {
> +		bpf_jit_binary_free(header);
> +		prog = orig_prog;
> +		goto out_off;
> +	}
> +	build_epilogue(&ctx);
> +
> +	/* 3. Extra pass to validate JITed code */
> +	if (validate_code(&ctx)) {
> +		bpf_jit_binary_free(header);
> +		prog = orig_prog;
> +		goto out_off;
> +	}
> +
> +	/* And we're done */
> +	if (bpf_jit_enable > 1)
> +		bpf_jit_dump(prog->len, image_size, 2, ctx.image);
> +
> +	/* Update the icache */
> +	bpf_flush_icache(header, ctx.image + ctx.idx);
> +
> +	if (!prog->is_func || extra_pass) {
> +		if (extra_pass && ctx.idx != jit_data->ctx.idx) {
> +			pr_err_once("multi-func JIT bug %d != %d\n",
> +				    ctx.idx, jit_data->ctx.idx);
> +			bpf_jit_binary_free(header);
> +			prog->bpf_func = NULL;
> +			prog->jited = 0;
> +			prog->jited_len = 0;
> +			goto out_off;
> +		}
> +		bpf_jit_binary_lock_ro(header);
> +	} else {
> +		jit_data->ctx = ctx;
> +		jit_data->image = image_ptr;
> +		jit_data->header = header;
> +	}
> +	prog->bpf_func = (void *)ctx.image;
> +	prog->jited = 1;
> +	prog->jited_len = image_size;
> +
> +	if (!prog->is_func || extra_pass) {
> +out_off:
> +		kfree(ctx.offset);
> +		kfree(jit_data);
> +		prog->aux->jit_data = NULL;
> +	}
> +out:
> +	if (tmp_blinded)
> +		bpf_jit_prog_release_other(prog, prog == orig_prog ?
> +					   tmp : orig_prog);
> +
> +	out_offset = -1;
> +	return prog;
> +}
> diff --git a/arch/loongarch/net/bpf_jit.h b/arch/loongarch/net/bpf_jit.h
> new file mode 100644
> index 0000000..9c735f3
> --- /dev/null
> +++ b/arch/loongarch/net/bpf_jit.h
> @@ -0,0 +1,308 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * BPF JIT compiler for LoongArch
> + *
> + * Copyright (C) 2022 Loongson Technology Corporation Limited
> + */
> +#include <linux/bpf.h>
> +#include <linux/filter.h>
> +#include <asm/cacheflush.h>
> +#include <asm/inst.h>
> +
> +struct jit_ctx {
> +	const struct bpf_prog *prog;
> +	unsigned int idx;
> +	unsigned int flags;
> +	unsigned int epilogue_offset;
> +	u32 *offset;
> +	union loongarch_instruction *image;
> +	u32 stack_size;
> +};
> +
> +struct jit_data {
> +	struct bpf_binary_header *header;
> +	u8 *image;
> +	struct jit_ctx ctx;
> +};
> +
> +#define emit_insn(ctx, func, ...)						\
> +do {										\
> +	if (ctx->image != NULL) {						\
> +		union loongarch_instruction *insn = &ctx->image[ctx->idx];	\
> +		emit_##func(insn, ##__VA_ARGS__);				\
> +	}									\
> +	ctx->idx++;								\
> +} while (0)
> +
> +#define is_signed_imm12(val)	signed_imm_check(val, 12)
> +#define is_signed_imm16(val)	signed_imm_check(val, 16)
> +#define is_signed_imm26(val)	signed_imm_check(val, 26)
> +#define is_signed_imm32(val)	signed_imm_check(val, 32)
> +#define is_signed_imm52(val)	signed_imm_check(val, 52)
> +#define is_unsigned_imm12(val)	unsigned_imm_check(val, 12)
> +
> +static inline int bpf2la_offset(int bpf_insn, int off, const struct jit_ctx *ctx)
> +{
> +	/* BPF JMP offset is relative to the next instruction */
> +	bpf_insn++;
> +	/*
> +	 * LoongArch branch instructions encode the offset from the
> +	 * branch instruction itself, so we must subtract 1 from the
> +	 * instruction offset.
> +	 */
> +	return (ctx->offset[bpf_insn + off] - (ctx->offset[bpf_insn] - 1));
> +}
> +
> +static inline int epilogue_offset(const struct jit_ctx *ctx)
> +{
> +	int to = ctx->epilogue_offset;
> +	int from = ctx->idx;
> +
> +	return (to - from);
> +}
> +
> +/* Zero-extend 32 bits into 64 bits */
> +static inline void emit_zext_32(struct jit_ctx *ctx, enum loongarch_gpr reg, bool is32)
> +{
> +	if (!is32)
> +		return;
> +
> +	emit_insn(ctx, lu32id, reg, 0);
> +}
> +
> +/* Sign-extend 32 bits into 64 bits */
> +static inline void emit_sext_32(struct jit_ctx *ctx, enum loongarch_gpr reg, bool is32)
> +{
> +	if (!is32)
> +		return;
> +
> +	emit_insn(ctx, addiw, reg, reg, 0);
> +}
> +
> +static inline void move_imm32(struct jit_ctx *ctx, enum loongarch_gpr rd,
> +			      int imm32, bool is32)
> +{
> +	int si20;
> +	u32 ui12;
> +
> +	/* or rd, $zero, $zero */
> +	if (imm32 == 0) {
> +		emit_insn(ctx, or, rd, LOONGARCH_GPR_ZERO, LOONGARCH_GPR_ZERO);
> +		return;
> +	}
> +
> +	/* addiw rd, $zero, imm_11_0(signed) */
> +	if (is_signed_imm12(imm32)) {
> +		emit_insn(ctx, addiw, rd, LOONGARCH_GPR_ZERO, imm32);
> +		goto zext;
> +	}
> +
> +	/* ori rd, $zero, imm_11_0(unsigned) */
> +	if (is_unsigned_imm12(imm32)) {
> +		emit_insn(ctx, ori, rd, LOONGARCH_GPR_ZERO, imm32);
> +		goto zext;
> +	}
> +
> +	/* lu12iw rd, imm_31_12(signed) */
> +	si20 = (imm32 >> 12) & 0xfffff;
> +	emit_insn(ctx, lu12iw, rd, si20);
> +
> +	/* ori rd, rd, imm_11_0(unsigned) */
> +	ui12 = imm32 & 0xfff;
> +	if (ui12 != 0)
> +		emit_insn(ctx, ori, rd, rd, ui12);
> +
> +zext:
> +	emit_zext_32(ctx, rd, is32);
> +}
> +
> +static inline void move_imm64(struct jit_ctx *ctx, enum loongarch_gpr rd,
> +			      long imm64, bool is32)
> +{
> +	int imm32, si20, si12;
> +	long imm52;
> +
> +	si12 = (imm64 >> 52) & 0xfff;
> +	imm52 = imm64 & 0xfffffffffffff;
> +	/* lu52id rd, $zero, imm_63_52(signed) */
> +	if (si12 != 0 && imm52 == 0) {
> +		emit_insn(ctx, lu52id, rd, LOONGARCH_GPR_ZERO, si12);
> +		return;
> +	}
> +
> +	imm32 = imm64 & 0xffffffff;
> +	move_imm32(ctx, rd, imm32, is32);
> +
> +	if (!is_signed_imm32(imm64)) {
> +		if (imm52 != 0) {
> +			/* lu32id rd, imm_51_32(signed) */
> +			si20 = (imm64 >> 32) & 0xfffff;
> +			emit_insn(ctx, lu32id, rd, si20);
> +		}
> +
> +		/* lu52id rd, rd, imm_63_52(signed) */
> +		if (!is_signed_imm52(imm64))
> +			emit_insn(ctx, lu52id, rd, rd, si12);
> +	}
> +}
> +
> +static inline void move_reg(struct jit_ctx *ctx, enum loongarch_gpr rd,
> +			    enum loongarch_gpr rj)
> +{
> +	emit_insn(ctx, or, rd, rj, LOONGARCH_GPR_ZERO);
> +}
> +
> +static inline int invert_jmp_cond(u8 cond)
> +{
> +	switch (cond) {
> +	case BPF_JEQ:
> +		return BPF_JNE;
> +	case BPF_JNE:
> +	case BPF_JSET:
> +		return BPF_JEQ;
> +	case BPF_JGT:
> +		return BPF_JLE;
> +	case BPF_JGE:
> +		return BPF_JLT;
> +	case BPF_JLT:
> +		return BPF_JGE;
> +	case BPF_JLE:
> +		return BPF_JGT;
> +	case BPF_JSGT:
> +		return BPF_JSLE;
> +	case BPF_JSGE:
> +		return BPF_JSLT;
> +	case BPF_JSLT:
> +		return BPF_JSGE;
> +	case BPF_JSLE:
> +		return BPF_JSGT;
> +	}
> +	return -1;
> +}
> +
> +static inline void cond_jmp_offs16(struct jit_ctx *ctx, u8 cond, enum loongarch_gpr rj,
> +				   enum loongarch_gpr rd, int jmp_offset)
> +{
> +	switch (cond) {
> +	case BPF_JEQ:
> +		/* PC += jmp_offset if rj == rd */
> +		emit_insn(ctx, beq, rj, rd, jmp_offset);
> +		return;
> +	case BPF_JNE:
> +	case BPF_JSET:
> +		/* PC += jmp_offset if rj != rd */
> +		emit_insn(ctx, bne, rj, rd, jmp_offset);
> +		return;
> +	case BPF_JGT:
> +		/* PC += jmp_offset if rj > rd (unsigned) */
> +		emit_insn(ctx, bltu, rd, rj, jmp_offset);
> +		return;
> +	case BPF_JLT:
> +		/* PC += jmp_offset if rj < rd (unsigned) */
> +		emit_insn(ctx, bltu, rj, rd, jmp_offset);
> +		return;
> +	case BPF_JGE:
> +		/* PC += jmp_offset if rj >= rd (unsigned) */
> +		emit_insn(ctx, bgeu, rj, rd, jmp_offset);
> +		return;
> +	case BPF_JLE:
> +		/* PC += jmp_offset if rj <= rd (unsigned) */
> +		emit_insn(ctx, bgeu, rd, rj, jmp_offset);
> +		return;
> +	case BPF_JSGT:
> +		/* PC += jmp_offset if rj > rd (signed) */
> +		emit_insn(ctx, blt, rd, rj, jmp_offset);
> +		return;
> +	case BPF_JSLT:
> +		/* PC += jmp_offset if rj < rd (signed) */
> +		emit_insn(ctx, blt, rj, rd, jmp_offset);
> +		return;
> +	case BPF_JSGE:
> +		/* PC += jmp_offset if rj >= rd (signed) */
> +		emit_insn(ctx, bge, rj, rd, jmp_offset);
> +		return;
> +	case BPF_JSLE:
> +		/* PC += jmp_offset if rj <= rd (signed) */
> +		emit_insn(ctx, bge, rd, rj, jmp_offset);
> +		return;
> +	}
> +}
> +
> +static inline void cond_jmp_offs26(struct jit_ctx *ctx, u8 cond, enum loongarch_gpr rj,
> +				   enum loongarch_gpr rd, int jmp_offset)
> +{
> +	cond = invert_jmp_cond(cond);
> +	cond_jmp_offs16(ctx, cond, rj, rd, 2);
> +	emit_insn(ctx, b, jmp_offset);
> +}
> +
> +static inline void cond_jmp_offs32(struct jit_ctx *ctx, u8 cond, enum loongarch_gpr rj,
> +				   enum loongarch_gpr rd, int jmp_offset)
> +{
> +	s64 upper, lower;
> +
> +	upper = (jmp_offset + (1 << 15)) >> 16;
> +	lower = jmp_offset & 0xffff;
> +
> +	cond = invert_jmp_cond(cond);
> +	cond_jmp_offs16(ctx, cond, rj, rd, 3);
> +
> +	/*
> +	 * jmp_addr = jmp_offset << 2
> +	 * tmp2 = PC + jmp_addr[31, 18] + 18'b0
> +	 */
> +	emit_insn(ctx, pcaddu18i, LOONGARCH_GPR_T2, upper << 2);
> +
> +	/* jump to (tmp2 + jmp_addr[17, 2] + 2'b0) */
> +	emit_insn(ctx, jirl, LOONGARCH_GPR_T2, LOONGARCH_GPR_ZERO, lower + 1);
> +}
> +
> +static inline void uncond_jmp_offs26(struct jit_ctx *ctx, int jmp_offset)
> +{
> +	emit_insn(ctx, b, jmp_offset);
> +}
> +
> +static inline void uncond_jmp_offs32(struct jit_ctx *ctx, int jmp_offset, bool is_exit)
> +{
> +	s64 upper, lower;
> +
> +	upper = (jmp_offset + (1 << 15)) >> 16;
> +	lower = jmp_offset & 0xffff;
> +
> +	if (is_exit)
> +		lower -= 1;
> +
> +	/*
> +	 * jmp_addr = jmp_offset << 2;
> +	 * tmp1 = PC + jmp_addr[31, 18] + 18'b0
> +	 */
> +	emit_insn(ctx, pcaddu18i, LOONGARCH_GPR_T1, upper << 2);
> +
> +	/* jump to (tmp1 + jmp_addr[17, 2] + 2'b0) */
> +	emit_insn(ctx, jirl, LOONGARCH_GPR_T1, LOONGARCH_GPR_ZERO, lower + 1);
> +}
> +
> +static inline void emit_cond_jmp(struct jit_ctx *ctx, u8 cond, enum loongarch_gpr rj,
> +				 enum loongarch_gpr rd, int jmp_offset)
> +{
> +	cond_jmp_offs26(ctx, cond, rj, rd, jmp_offset);
> +}
> +
> +static inline void emit_uncond_jmp(struct jit_ctx *ctx, int jmp_offset, bool is_exit)
> +{
> +	if (is_signed_imm26(jmp_offset))
> +		uncond_jmp_offs26(ctx, jmp_offset);
> +	else
> +		uncond_jmp_offs32(ctx, jmp_offset, is_exit);
> +}
> +
> +static inline void emit_tailcall_jmp(struct jit_ctx *ctx, u8 cond, enum loongarch_gpr rj,
> +				     enum loongarch_gpr rd, int jmp_offset)
> +{
> +	if (is_signed_imm16(jmp_offset))
> +		cond_jmp_offs16(ctx, cond, rj, rd, jmp_offset);
> +	else if (is_signed_imm26(jmp_offset))
> +		cond_jmp_offs26(ctx, cond, rj, rd, jmp_offset - 1);
> +	else
> +		cond_jmp_offs32(ctx, cond, rj, rd, jmp_offset - 2);
> +}
>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next v1 3/4] LoongArch: Add BPF JIT support
  2022-08-22  1:58   ` Youling Tang
@ 2022-08-22  2:03     ` Youling Tang
  2022-08-22  2:49     ` Tiezhu Yang
  1 sibling, 0 replies; 15+ messages in thread
From: Youling Tang @ 2022-08-22  2:03 UTC (permalink / raw)
  To: Tiezhu Yang, Huacai Chen, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko
  Cc: bpf, loongarch



On 08/22/2022 09:58 AM, Youling Tang wrote:
> On 08/20/2022 07:50 PM, Tiezhu Yang wrote:
>> BPF programs are normally handled by a BPF interpreter. Add BPF JIT
>> support for LoongArch so that the kernel can generate native code
>> when a program is loaded into the kernel; this significantly
>> speeds up processing of BPF programs.
>>
>> Co-developed-by: Youling Tang <tangyouling@loongson.cn>
>> Signed-off-by: Youling Tang <tangyouling@loongson.cn>
>> Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
>> ---
>>  arch/loongarch/Kbuild             |    1 +
>>  arch/loongarch/Kconfig            |    1 +
>>  arch/loongarch/include/asm/inst.h |  185 ++++++
>>  arch/loongarch/net/Makefile       |    7 +
>>  arch/loongarch/net/bpf_jit.c      | 1113
>> +++++++++++++++++++++++++++++++++++++
>>  arch/loongarch/net/bpf_jit.h      |  308 ++++++++++
>>  6 files changed, 1615 insertions(+)
>>  create mode 100644 arch/loongarch/net/Makefile
>>  create mode 100644 arch/loongarch/net/bpf_jit.c
>>  create mode 100644 arch/loongarch/net/bpf_jit.h
>>
>> diff --git a/arch/loongarch/Kbuild b/arch/loongarch/Kbuild
>> index ab5373d..b01f5cd 100644
>> --- a/arch/loongarch/Kbuild
>> +++ b/arch/loongarch/Kbuild
>> @@ -1,5 +1,6 @@
>>  obj-y += kernel/
>>  obj-y += mm/
>> +obj-y += net/
>>  obj-y += vdso/
>>
>>  # for cleaning
>> diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
>> index 4abc9a2..6d9d846 100644
>> --- a/arch/loongarch/Kconfig
>> +++ b/arch/loongarch/Kconfig
>> @@ -82,6 +82,7 @@ config LOONGARCH
>>      select HAVE_CONTEXT_TRACKING_USER
>>      select HAVE_DEBUG_STACKOVERFLOW
>>      select HAVE_DMA_CONTIGUOUS
>> +    select HAVE_EBPF_JIT if 64BIT
>>      select HAVE_EXIT_THREAD
>>      select HAVE_FAST_GUP
>>      select HAVE_GENERIC_VDSO
>> diff --git a/arch/loongarch/include/asm/inst.h
>> b/arch/loongarch/include/asm/inst.h
>> index de19a96..ac06f2e 100644
>> --- a/arch/loongarch/include/asm/inst.h
>> +++ b/arch/loongarch/include/asm/inst.h
>> @@ -288,4 +288,189 @@ static inline bool unsigned_imm_check(unsigned
>> long val, unsigned int bit)
>>      return val < (1UL << bit);
>>  }
>>
>> +#define DEF_EMIT_REG0I26_FORMAT(NAME, OP)                \
>> +static inline void emit_##NAME(union loongarch_instruction *insn,    \
>> +                   int offset)                \
>> +{                                    \
>> +    unsigned int immediate_l, immediate_h;                \
>> +                                    \
>> +    immediate_l = offset & 0xffff;                    \
>> +    offset >>= 16;                            \
>> +    immediate_h = offset & 0x3ff;                    \
>> +                                    \
>> +    insn->reg0i26_format.opcode = OP;                \
>> +    insn->reg0i26_format.immediate_l = immediate_l;            \
>> +    insn->reg0i26_format.immediate_h = immediate_h;            \
>> +}
>> +
>> +DEF_EMIT_REG0I26_FORMAT(b, b_op)
>> +
>> +#define DEF_EMIT_REG1I20_FORMAT(NAME, OP)                \
>> +static inline void emit_##NAME(union loongarch_instruction *insn,    \
>> +                   enum loongarch_gpr rd, int imm)        \
>> +{                                    \
>> +    insn->reg1i20_format.opcode = OP;                \
>> +    insn->reg1i20_format.immediate = imm;                \
>> +    insn->reg1i20_format.rd = rd;                    \
>> +}
>> +
>> +DEF_EMIT_REG1I20_FORMAT(lu12iw, lu12iw_op)
>> +DEF_EMIT_REG1I20_FORMAT(lu32id, lu32id_op)
>
> We can delete the larch_insn_gen_{lu32id, lu52id, jirl} functions in
> inst.c and use emit_xxx.
>
> The implementation of emit_plt_entry() could be modified similarly, as follows:
> struct plt_entry {
>         union loongarch_instruction lu12iw;
>         union loongarch_instruction lu32id;
>         union loongarch_instruction lu52id;
>         union loongarch_instruction jirl;
> };
>
> static inline struct plt_entry emit_plt_entry(unsigned long val)
> {
>         union loongarch_instruction *lu12iw, *lu32id, *lu52id, *jirl;
>
>         emit_lu32id(lu12iw, LOONGARCH_GPR_T1, ADDR_IMM(val, LU12IW));

Sorry, this is emit_lu12iw.

>         emit_lu32id(lu32id, LOONGARCH_GPR_T1, ADDR_IMM(val, LU32ID));
>         emit_lu52id(lu52id, LOONGARCH_GPR_T1, LOONGARCH_GPR_T1,
> ADDR_IMM(val, LU52ID));
>         emit_jirl(jirl, LOONGARCH_GPR_T1, 0, (val & 0xfff) >> 2);
>
>         return (struct plt_entry) { *lu12iw, *lu32id, *lu52id, *jirl };
> }
>
> Thanks,
> Youling
>
>> +DEF_EMIT_REG1I20_FORMAT(pcaddu18i, pcaddu18i_op)
>> +
>> +#define DEF_EMIT_REG2_FORMAT(NAME, OP)                    \
>> +static inline void emit_##NAME(union loongarch_instruction *insn,    \
>> +                   enum loongarch_gpr rd,            \
>> +                   enum loongarch_gpr rj)            \
>> +{                                    \
>> +    insn->reg2_format.opcode = OP;                    \
>> +    insn->reg2_format.rd = rd;                    \
>> +    insn->reg2_format.rj = rj;                    \
>> +}
>> +
>> +DEF_EMIT_REG2_FORMAT(revb2h, revb2h_op)
>> +DEF_EMIT_REG2_FORMAT(revb2w, revb2w_op)
>> +DEF_EMIT_REG2_FORMAT(revbd, revbd_op)
>> +
>> +#define DEF_EMIT_REG2I5_FORMAT(NAME, OP)                \
>> +static inline void emit_##NAME(union loongarch_instruction *insn,    \
>> +                   enum loongarch_gpr rd,            \
>> +                   enum loongarch_gpr rj,            \
>> +                   int imm)                    \
>> +{                                    \
>> +    insn->reg2i5_format.opcode = OP;                \
>> +    insn->reg2i5_format.immediate = imm;                \
>> +    insn->reg2i5_format.rd = rd;                    \
>> +    insn->reg2i5_format.rj = rj;                    \
>> +}
>> +
>> +DEF_EMIT_REG2I5_FORMAT(slliw, slliw_op)
>> +DEF_EMIT_REG2I5_FORMAT(srliw, srliw_op)
>> +DEF_EMIT_REG2I5_FORMAT(sraiw, sraiw_op)
>> +
>> +#define DEF_EMIT_REG2I6_FORMAT(NAME, OP)                \
>> +static inline void emit_##NAME(union loongarch_instruction *insn,    \
>> +                   enum loongarch_gpr rd,            \
>> +                   enum loongarch_gpr rj,            \
>> +                   int imm)                    \
>> +{                                    \
>> +    insn->reg2i6_format.opcode = OP;                \
>> +    insn->reg2i6_format.immediate = imm;                \
>> +    insn->reg2i6_format.rd = rd;                    \
>> +    insn->reg2i6_format.rj = rj;                    \
>> +}
>> +
>> +DEF_EMIT_REG2I6_FORMAT(sllid, sllid_op)
>> +DEF_EMIT_REG2I6_FORMAT(srlid, srlid_op)
>> +DEF_EMIT_REG2I6_FORMAT(sraid, sraid_op)
>> +
>> +#define DEF_EMIT_REG2I12_FORMAT(NAME, OP)                \
>> +static inline void emit_##NAME(union loongarch_instruction *insn,    \
>> +                   enum loongarch_gpr rd,            \
>> +                   enum loongarch_gpr rj,            \
>> +                   int imm)                    \
>> +{                                    \
>> +    insn->reg2i12_format.opcode = OP;                \
>> +    insn->reg2i12_format.immediate = imm;                \
>> +    insn->reg2i12_format.rd = rd;                    \
>> +    insn->reg2i12_format.rj = rj;                    \
>> +}
>> +
>> +DEF_EMIT_REG2I12_FORMAT(addiw, addiw_op)
>> +DEF_EMIT_REG2I12_FORMAT(addid, addid_op)
>> +DEF_EMIT_REG2I12_FORMAT(lu52id, lu52id_op)
>> +DEF_EMIT_REG2I12_FORMAT(andi, andi_op)
>> +DEF_EMIT_REG2I12_FORMAT(ori, ori_op)
>> +DEF_EMIT_REG2I12_FORMAT(xori, xori_op)
>> +DEF_EMIT_REG2I12_FORMAT(ldbu, ldbu_op)
>> +DEF_EMIT_REG2I12_FORMAT(ldhu, ldhu_op)
>> +DEF_EMIT_REG2I12_FORMAT(ldwu, ldwu_op)
>> +DEF_EMIT_REG2I12_FORMAT(ldd, ldd_op)
>> +DEF_EMIT_REG2I12_FORMAT(stb, stb_op)
>> +DEF_EMIT_REG2I12_FORMAT(sth, sth_op)
>> +DEF_EMIT_REG2I12_FORMAT(stw, stw_op)
>> +DEF_EMIT_REG2I12_FORMAT(std, std_op)
>> +
>> +#define DEF_EMIT_REG2I14_FORMAT(NAME, OP)                \
>> +static inline void emit_##NAME(union loongarch_instruction *insn,    \
>> +                   enum loongarch_gpr rd,            \
>> +                   enum loongarch_gpr rj,            \
>> +                   int imm)                    \
>> +{                                    \
>> +    insn->reg2i14_format.opcode = OP;                \
>> +    insn->reg2i14_format.immediate = imm;                \
>> +    insn->reg2i14_format.rd = rd;                    \
>> +    insn->reg2i14_format.rj = rj;                    \
>> +}
>> +
>> +DEF_EMIT_REG2I14_FORMAT(llw, llw_op)
>> +DEF_EMIT_REG2I14_FORMAT(scw, scw_op)
>> +DEF_EMIT_REG2I14_FORMAT(lld, lld_op)
>> +DEF_EMIT_REG2I14_FORMAT(scd, scd_op)
>> +
>> +#define DEF_EMIT_REG2I16_FORMAT(NAME, OP)                \
>> +static inline void emit_##NAME(union loongarch_instruction *insn,    \
>> +                   enum loongarch_gpr rj,            \
>> +                   enum loongarch_gpr rd,            \
>> +                   int offset)                \
>> +{                                    \
>> +    insn->reg2i16_format.opcode = OP;                \
>> +    insn->reg2i16_format.immediate = offset;            \
>> +    insn->reg2i16_format.rj = rj;                    \
>> +    insn->reg2i16_format.rd = rd;                    \
>> +}
>> +
>> +DEF_EMIT_REG2I16_FORMAT(beq, beq_op)
>> +DEF_EMIT_REG2I16_FORMAT(bne, bne_op)
>> +DEF_EMIT_REG2I16_FORMAT(blt, blt_op)
>> +DEF_EMIT_REG2I16_FORMAT(bge, bge_op)
>> +DEF_EMIT_REG2I16_FORMAT(bltu, bltu_op)
>> +DEF_EMIT_REG2I16_FORMAT(bgeu, bgeu_op)
>> +DEF_EMIT_REG2I16_FORMAT(jirl, jirl_op)
>> +
>> +#define DEF_EMIT_REG3_FORMAT(NAME, OP)                    \
>> +static inline void emit_##NAME(union loongarch_instruction *insn,    \
>> +                   enum loongarch_gpr rd,            \
>> +                   enum loongarch_gpr rj,            \
>> +                   enum loongarch_gpr rk)            \
>> +{                                    \
>> +    insn->reg3_format.opcode = OP;                    \
>> +    insn->reg3_format.rd = rd;                    \
>> +    insn->reg3_format.rj = rj;                    \
>> +    insn->reg3_format.rk = rk;                    \
>> +}
>> +
>> +DEF_EMIT_REG3_FORMAT(addd, addd_op)
>> +DEF_EMIT_REG3_FORMAT(subd, subd_op)
>> +DEF_EMIT_REG3_FORMAT(muld, muld_op)
>> +DEF_EMIT_REG3_FORMAT(divdu, divdu_op)
>> +DEF_EMIT_REG3_FORMAT(moddu, moddu_op)
>> +DEF_EMIT_REG3_FORMAT(and, and_op)
>> +DEF_EMIT_REG3_FORMAT(or, or_op)
>> +DEF_EMIT_REG3_FORMAT(xor, xor_op)
>> +DEF_EMIT_REG3_FORMAT(sllw, sllw_op)
>> +DEF_EMIT_REG3_FORMAT(slld, slld_op)
>> +DEF_EMIT_REG3_FORMAT(srlw, srlw_op)
>> +DEF_EMIT_REG3_FORMAT(srld, srld_op)
>> +DEF_EMIT_REG3_FORMAT(sraw, sraw_op)
>> +DEF_EMIT_REG3_FORMAT(srad, srad_op)
>> +DEF_EMIT_REG3_FORMAT(ldxbu, ldxbu_op)
>> +DEF_EMIT_REG3_FORMAT(ldxhu, ldxhu_op)
>> +DEF_EMIT_REG3_FORMAT(ldxwu, ldxwu_op)
>> +DEF_EMIT_REG3_FORMAT(ldxd, ldxd_op)
>> +DEF_EMIT_REG3_FORMAT(stxb, stxb_op)
>> +DEF_EMIT_REG3_FORMAT(stxh, stxh_op)
>> +DEF_EMIT_REG3_FORMAT(stxw, stxw_op)
>> +DEF_EMIT_REG3_FORMAT(stxd, stxd_op)
>> +DEF_EMIT_REG3_FORMAT(amaddw, amaddw_op)
>> +DEF_EMIT_REG3_FORMAT(amaddd, amaddd_op)
>> +DEF_EMIT_REG3_FORMAT(amandw, amandw_op)
>> +DEF_EMIT_REG3_FORMAT(amandd, amandd_op)
>> +DEF_EMIT_REG3_FORMAT(amorw, amorw_op)
>> +DEF_EMIT_REG3_FORMAT(amord, amord_op)
>> +DEF_EMIT_REG3_FORMAT(amxorw, amxorw_op)
>> +DEF_EMIT_REG3_FORMAT(amxord, amxord_op)
>> +DEF_EMIT_REG3_FORMAT(amswapw, amswapw_op)
>> +DEF_EMIT_REG3_FORMAT(amswapd, amswapd_op)
>> +
>>  #endif /* _ASM_INST_H */
>> diff --git a/arch/loongarch/net/Makefile b/arch/loongarch/net/Makefile
>> new file mode 100644
>> index 0000000..1ec12a0
>> --- /dev/null
>> +++ b/arch/loongarch/net/Makefile
>> @@ -0,0 +1,7 @@
>> +# SPDX-License-Identifier: GPL-2.0-only
>> +#
>> +# Makefile for arch/loongarch/net
>> +#
>> +# Copyright (C) 2022 Loongson Technology Corporation Limited
>> +#
>> +obj-$(CONFIG_BPF_JIT) += bpf_jit.o
>> diff --git a/arch/loongarch/net/bpf_jit.c b/arch/loongarch/net/bpf_jit.c
>> new file mode 100644
>> index 0000000..2f41b9b
>> --- /dev/null
>> +++ b/arch/loongarch/net/bpf_jit.c
>> @@ -0,0 +1,1113 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/*
>> + * BPF JIT compiler for LoongArch
>> + *
>> + * Copyright (C) 2022 Loongson Technology Corporation Limited
>> + */
>> +#include "bpf_jit.h"
>> +
>> +#define REG_TCC        LOONGARCH_GPR_A6
>> +#define TCC_SAVED    LOONGARCH_GPR_S5
>> +
>> +#define SAVE_RA        BIT(0)
>> +#define SAVE_TCC    BIT(1)
>> +
>> +static const int regmap[] = {
>> +    /* return value from in-kernel function, and exit value for eBPF program */
>> +    [BPF_REG_0] = LOONGARCH_GPR_A5,
>> +    /* arguments from eBPF program to in-kernel function */
>> +    [BPF_REG_1] = LOONGARCH_GPR_A0,
>> +    [BPF_REG_2] = LOONGARCH_GPR_A1,
>> +    [BPF_REG_3] = LOONGARCH_GPR_A2,
>> +    [BPF_REG_4] = LOONGARCH_GPR_A3,
>> +    [BPF_REG_5] = LOONGARCH_GPR_A4,
>> +    /* callee saved registers that in-kernel function will preserve */
>> +    [BPF_REG_6] = LOONGARCH_GPR_S0,
>> +    [BPF_REG_7] = LOONGARCH_GPR_S1,
>> +    [BPF_REG_8] = LOONGARCH_GPR_S2,
>> +    [BPF_REG_9] = LOONGARCH_GPR_S3,
>> +    /* read-only frame pointer to access stack */
>> +    [BPF_REG_FP] = LOONGARCH_GPR_S4,
>> +    /* temporary register for blinding constants */
>> +    [BPF_REG_AX] = LOONGARCH_GPR_T0,
>> +};
>> +
>> +static void mark_call(struct jit_ctx *ctx)
>> +{
>> +    ctx->flags |= SAVE_RA;
>> +}
>> +
>> +static void mark_tail_call(struct jit_ctx *ctx)
>> +{
>> +    ctx->flags |= SAVE_TCC;
>> +}
>> +
>> +static bool seen_call(struct jit_ctx *ctx)
>> +{
>> +    return (ctx->flags & SAVE_RA);
>> +}
>> +
>> +static bool seen_tail_call(struct jit_ctx *ctx)
>> +{
>> +    return (ctx->flags & SAVE_TCC);
>> +}
>> +
>> +static u8 tail_call_reg(struct jit_ctx *ctx)
>> +{
>> +    if (seen_call(ctx))
>> +        return TCC_SAVED;
>> +
>> +    return REG_TCC;
>> +}
>> +
>> +/*
>> + * eBPF prog stack layout:
>> + *
>> + *                                        high
>> + * original $sp ------------> +-------------------------+ <--LOONGARCH_GPR_FP
>> + *                            |           $ra           |
>> + *                            +-------------------------+
>> + *                            |           $fp           |
>> + *                            +-------------------------+
>> + *                            |           $s0           |
>> + *                            +-------------------------+
>> + *                            |           $s1           |
>> + *                            +-------------------------+
>> + *                            |           $s2           |
>> + *                            +-------------------------+
>> + *                            |           $s3           |
>> + *                            +-------------------------+
>> + *                            |           $s4           |
>> + *                            +-------------------------+
>> + *                            |           $s5           |
>> + *                            +-------------------------+ <--BPF_REG_FP
>> + *                            |  prog->aux->stack_depth |
>> + *                            |        (optional)       |
>> + * current $sp -------------> +-------------------------+
>> + *                                        low
>> + */
>> +static void build_prologue(struct jit_ctx *ctx)
>> +{
>> +    int stack_adjust = 0, store_offset, bpf_stack_adjust;
>> +
>> +    bpf_stack_adjust = round_up(ctx->prog->aux->stack_depth, 16);
>> +
>> +    /* To store ra, fp, s0, s1, s2, s3, s4 and s5. */
>> +    stack_adjust += sizeof(long) * 8;
>> +
>> +    stack_adjust = round_up(stack_adjust, 16);
>> +    stack_adjust += bpf_stack_adjust;
>> +
>> +    /*
>> +     * First instruction initializes the tail call count (TCC).
>> +     * On tail call we skip this instruction, and the TCC is
>> +     * passed in REG_TCC from the caller.
>> +     */
>> +    emit_insn(ctx, addid, REG_TCC, LOONGARCH_GPR_ZERO, MAX_TAIL_CALL_CNT);
>> +
>> +    emit_insn(ctx, addid, LOONGARCH_GPR_SP, LOONGARCH_GPR_SP, -stack_adjust);
>> +
>> +    store_offset = stack_adjust - sizeof(long);
>> +    emit_insn(ctx, std, LOONGARCH_GPR_RA, LOONGARCH_GPR_SP, store_offset);
>> +
>> +    store_offset -= sizeof(long);
>> +    emit_insn(ctx, std, LOONGARCH_GPR_FP, LOONGARCH_GPR_SP, store_offset);
>> +
>> +    store_offset -= sizeof(long);
>> +    emit_insn(ctx, std, LOONGARCH_GPR_S0, LOONGARCH_GPR_SP, store_offset);
>> +
>> +    store_offset -= sizeof(long);
>> +    emit_insn(ctx, std, LOONGARCH_GPR_S1, LOONGARCH_GPR_SP, store_offset);
>> +
>> +    store_offset -= sizeof(long);
>> +    emit_insn(ctx, std, LOONGARCH_GPR_S2, LOONGARCH_GPR_SP, store_offset);
>> +
>> +    store_offset -= sizeof(long);
>> +    emit_insn(ctx, std, LOONGARCH_GPR_S3, LOONGARCH_GPR_SP, store_offset);
>> +
>> +    store_offset -= sizeof(long);
>> +    emit_insn(ctx, std, LOONGARCH_GPR_S4, LOONGARCH_GPR_SP, store_offset);
>> +
>> +    store_offset -= sizeof(long);
>> +    emit_insn(ctx, std, LOONGARCH_GPR_S5, LOONGARCH_GPR_SP, store_offset);
>> +
>> +    emit_insn(ctx, addid, LOONGARCH_GPR_FP, LOONGARCH_GPR_SP, stack_adjust);
>> +
>> +    if (bpf_stack_adjust)
>> +        emit_insn(ctx, addid, regmap[BPF_REG_FP], LOONGARCH_GPR_SP, bpf_stack_adjust);
>> +
>> +    /*
>> +     * The program contains calls and tail calls, so REG_TCC needs
>> +     * to be saved across calls.
>> +     */
>> +    if (seen_tail_call(ctx) && seen_call(ctx))
>> +        move_reg(ctx, TCC_SAVED, REG_TCC);
>> +
>> +    ctx->stack_size = stack_adjust;
>> +}
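The frame-size arithmetic in build_prologue() above is easy to get off-by-16, so here is a minimal standalone C model of it (assuming a 64-bit host where sizeof(long) == 8; round_up16() is a stand-in for the kernel's round_up() macro, and frame_size() is a hypothetical name used only for this sketch):

```c
#include <assert.h>

/* stand-in for the kernel's round_up(x, 16) macro (power-of-2 alignment) */
static unsigned long round_up16(unsigned long x)
{
	return (x + 15UL) & ~15UL;
}

/* mirrors build_prologue(): ra, fp, s0-s5 make 8 callee-saved slots,
 * rounded to 16, plus the BPF program's own stack, also rounded to 16 */
static unsigned long frame_size(unsigned long stack_depth)
{
	unsigned long stack_adjust = 0;

	stack_adjust += sizeof(long) * 8;        /* 64 bytes of saved GPRs */
	stack_adjust = round_up16(stack_adjust); /* already 16-aligned */
	stack_adjust += round_up16(stack_depth); /* bpf_stack_adjust */
	return stack_adjust;
}
```

With stack_depth == 0 the frame is exactly the 64-byte save area; any non-zero depth adds a 16-byte-aligned region below BPF_REG_FP.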
>> +
>> +static void __build_epilogue(struct jit_ctx *ctx, bool is_tail_call)
>> +{
>> +    int stack_adjust = ctx->stack_size;
>> +    int load_offset;
>> +
>> +    load_offset = stack_adjust - sizeof(long);
>> +    emit_insn(ctx, ldd, LOONGARCH_GPR_RA, LOONGARCH_GPR_SP, load_offset);
>> +
>> +    load_offset -= sizeof(long);
>> +    emit_insn(ctx, ldd, LOONGARCH_GPR_FP, LOONGARCH_GPR_SP, load_offset);
>> +
>> +    load_offset -= sizeof(long);
>> +    emit_insn(ctx, ldd, LOONGARCH_GPR_S0, LOONGARCH_GPR_SP, load_offset);
>> +
>> +    load_offset -= sizeof(long);
>> +    emit_insn(ctx, ldd, LOONGARCH_GPR_S1, LOONGARCH_GPR_SP, load_offset);
>> +
>> +    load_offset -= sizeof(long);
>> +    emit_insn(ctx, ldd, LOONGARCH_GPR_S2, LOONGARCH_GPR_SP, load_offset);
>> +
>> +    load_offset -= sizeof(long);
>> +    emit_insn(ctx, ldd, LOONGARCH_GPR_S3, LOONGARCH_GPR_SP, load_offset);
>> +
>> +    load_offset -= sizeof(long);
>> +    emit_insn(ctx, ldd, LOONGARCH_GPR_S4, LOONGARCH_GPR_SP, load_offset);
>> +
>> +    load_offset -= sizeof(long);
>> +    emit_insn(ctx, ldd, LOONGARCH_GPR_S5, LOONGARCH_GPR_SP, load_offset);
>> +
>> +    emit_insn(ctx, addid, LOONGARCH_GPR_SP, LOONGARCH_GPR_SP, stack_adjust);
>> +
>> +    if (!is_tail_call) {
>> +        /* Set return value */
>> +        move_reg(ctx, LOONGARCH_GPR_A0, regmap[BPF_REG_0]);
>> +        /* Return to the caller */
>> +        emit_insn(ctx, jirl, LOONGARCH_GPR_RA, LOONGARCH_GPR_ZERO, 0);
>> +    } else {
>> +        /*
>> +         * Jump to the next bpf prog, skipping its first
>> +         * instruction (the TCC initialization).
>> +         */
>> +        emit_insn(ctx, jirl, LOONGARCH_GPR_T3, LOONGARCH_GPR_ZERO, 1);
>> +    }
>> +}
>> +
>> +void build_epilogue(struct jit_ctx *ctx)
>> +{
>> +    __build_epilogue(ctx, false);
>> +}
>> +
>> +bool bpf_jit_supports_kfunc_call(void)
>> +{
>> +    return true;
>> +}
>> +
>> +/* initialized on the first pass of build_body() */
>> +static int out_offset = -1;
>> +static int emit_bpf_tail_call(struct jit_ctx *ctx)
>> +{
>> +    int off;
>> +    u8 tcc = tail_call_reg(ctx);
>> +    u8 a1 = LOONGARCH_GPR_A1;
>> +    u8 a2 = LOONGARCH_GPR_A2;
>> +    u8 t1 = LOONGARCH_GPR_T1;
>> +    u8 t2 = LOONGARCH_GPR_T2;
>> +    u8 t3 = LOONGARCH_GPR_T3;
>> +    const int idx0 = ctx->idx;
>> +
>> +#define cur_offset (ctx->idx - idx0)
>> +#define jmp_offset (out_offset - (cur_offset))
>> +
>> +    /*
>> +     * a0: &ctx
>> +     * a1: &array
>> +     * a2: index
>> +     *
>> +     * if (index >= array->map.max_entries)
>> +     *     goto out;
>> +     */
>> +    off = offsetof(struct bpf_array, map.max_entries);
>> +    emit_insn(ctx, ldwu, t1, a1, off);
>> +    /* bgeu $a2, $t1, jmp_offset */
>> +    emit_tailcall_jmp(ctx, BPF_JGE, a2, t1, jmp_offset);
>> +
>> +    /*
>> +     * if (--TCC < 0)
>> +     *     goto out;
>> +     */
>> +    emit_insn(ctx, addid, REG_TCC, tcc, -1);
>> +    emit_tailcall_jmp(ctx, BPF_JSLT, REG_TCC, LOONGARCH_GPR_ZERO, jmp_offset);
>> +
>> +    /*
>> +     * prog = array->ptrs[index];
>> +     * if (!prog)
>> +     *     goto out;
>> +     */
>> +    emit_insn(ctx, sllid, t2, a2, 3);
>> +    emit_insn(ctx, addd, t2, t2, a1);
>> +    off = offsetof(struct bpf_array, ptrs);
>> +    emit_insn(ctx, ldd, t2, t2, off);
>> +    /* beq $t2, $zero, jmp_offset */
>> +    emit_tailcall_jmp(ctx, BPF_JEQ, t2, LOONGARCH_GPR_ZERO, jmp_offset);
>> +
>> +    /* goto *(prog->bpf_func + 4); */
>> +    off = offsetof(struct bpf_prog, bpf_func);
>> +    emit_insn(ctx, ldd, t3, t2, off);
>> +    __build_epilogue(ctx, true);
>> +
>> +    /* out: */
>> +    if (out_offset == -1)
>> +        out_offset = cur_offset;
>> +    if (cur_offset != out_offset) {
>> +        pr_err_once("tail_call out_offset = %d, expected %d!\n",
>> +                cur_offset, out_offset);
>> +        return -1;
>> +    }
>> +
>> +    return 0;
>> +#undef cur_offset
>> +#undef jmp_offset
>> +}
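The three guards emitted above (bounds check, tail-call-count check, NULL-slot check) can be modeled in plain C; tail_call_target() is a hypothetical name for this sketch only, and atomicity/JIT details are deliberately ignored:

```c
#include <assert.h>
#include <stddef.h>

/* Single-threaded model of the tail-call guards in emit_bpf_tail_call():
 * returns the selected prog pointer, or NULL for the "out:" path. */
static const void *tail_call_target(const void **ptrs, unsigned int max_entries,
				    unsigned int index, int *tcc)
{
	if (index >= max_entries)	/* bgeu $a2, $t1, out */
		return NULL;
	if (--(*tcc) < 0)		/* blt REG_TCC, $zero, out */
		return NULL;
	return ptrs[index];		/* beq $t2, $zero, out when NULL */
}
```

Note the count is decremented only after the bounds check passes, matching the emitted instruction order.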
>> +
>> +static void emit_atomic(const struct bpf_insn *insn, struct jit_ctx *ctx)
>> +{
>> +    const u8 dst = regmap[insn->dst_reg];
>> +    const u8 src = regmap[insn->src_reg];
>> +    const u8 t1 = LOONGARCH_GPR_T1;
>> +    const u8 t2 = LOONGARCH_GPR_T2;
>> +    const u8 t3 = LOONGARCH_GPR_T3;
>> +    const s16 off = insn->off;
>> +    const s32 imm = insn->imm;
>> +    const bool isdw = BPF_SIZE(insn->code) == BPF_DW;
>> +
>> +    move_imm32(ctx, t1, off, false);
>> +    emit_insn(ctx, addd, t1, dst, t1);
>> +    move_reg(ctx, t3, src);
>> +
>> +    switch (imm) {
>> +    /* lock *(size *)(dst + off) <op>= src */
>> +    case BPF_ADD:
>> +        if (isdw)
>> +            emit_insn(ctx, amaddd, t2, t1, src);
>> +        else
>> +            emit_insn(ctx, amaddw, t2, t1, src);
>> +        break;
>> +    case BPF_AND:
>> +        if (isdw)
>> +            emit_insn(ctx, amandd, t2, t1, src);
>> +        else
>> +            emit_insn(ctx, amandw, t2, t1, src);
>> +        break;
>> +    case BPF_OR:
>> +        if (isdw)
>> +            emit_insn(ctx, amord, t2, t1, src);
>> +        else
>> +            emit_insn(ctx, amorw, t2, t1, src);
>> +        break;
>> +    case BPF_XOR:
>> +        if (isdw)
>> +            emit_insn(ctx, amxord, t2, t1, src);
>> +        else
>> +            emit_insn(ctx, amxorw, t2, t1, src);
>> +        break;
>> +    /* src = atomic_fetch_<op>(dst + off, src) */
>> +    case BPF_ADD | BPF_FETCH:
>> +        if (isdw) {
>> +            emit_insn(ctx, amaddd, src, t1, t3);
>> +        } else {
>> +            emit_insn(ctx, amaddw, src, t1, t3);
>> +            emit_zext_32(ctx, src, true);
>> +        }
>> +        break;
>> +    case BPF_AND | BPF_FETCH:
>> +        if (isdw) {
>> +            emit_insn(ctx, amandd, src, t1, t3);
>> +        } else {
>> +            emit_insn(ctx, amandw, src, t1, t3);
>> +            emit_zext_32(ctx, src, true);
>> +        }
>> +        break;
>> +    case BPF_OR | BPF_FETCH:
>> +        if (isdw) {
>> +            emit_insn(ctx, amord, src, t1, t3);
>> +        } else {
>> +            emit_insn(ctx, amorw, src, t1, t3);
>> +            emit_zext_32(ctx, src, true);
>> +        }
>> +        break;
>> +    case BPF_XOR | BPF_FETCH:
>> +        if (isdw) {
>> +            emit_insn(ctx, amxord, src, t1, t3);
>> +        } else {
>> +            emit_insn(ctx, amxorw, src, t1, t3);
>> +            emit_zext_32(ctx, src, true);
>> +        }
>> +        break;
>> +    /* src = atomic_xchg(dst + off, src); */
>> +    case BPF_XCHG:
>> +        if (isdw) {
>> +            emit_insn(ctx, amswapd, src, t1, t3);
>> +        } else {
>> +            emit_insn(ctx, amswapw, src, t1, t3);
>> +            emit_zext_32(ctx, src, true);
>> +        }
>> +        break;
>> +    /* r0 = atomic_cmpxchg(dst + off, r0, src); */
>> +    case BPF_CMPXCHG:
>> +        u8 r0 = regmap[BPF_REG_0];
>> +
>> +        move_reg(ctx, t2, r0);
>> +        if (isdw) {
>> +            emit_insn(ctx, lld, r0, t1, 0);
>> +            emit_insn(ctx, bne, t2, r0, 4);
>> +            move_reg(ctx, t3, src);
>> +            emit_insn(ctx, scd, t3, t1, 0);
>> +            emit_insn(ctx, beq, t3, LOONGARCH_GPR_ZERO, -4);
>> +        } else {
>> +            emit_insn(ctx, llw, r0, t1, 0);
>> +            emit_zext_32(ctx, t2, true);
>> +            emit_zext_32(ctx, r0, true);
>> +            emit_insn(ctx, bne, t2, r0, 4);
>> +            move_reg(ctx, t3, src);
>> +            emit_insn(ctx, scw, t3, t1, 0);
>> +            emit_insn(ctx, beq, t3, LOONGARCH_GPR_ZERO, -6);
>> +            emit_zext_32(ctx, r0, true);
>> +        }
>> +        break;
>> +    }
>> +}
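For reference, the BPF_CMPXCHG sequence above implements the following semantics (r0 holds the expected value; the old value is always returned in r0). This single-threaded C model — model_cmpxchg() is a name invented for this sketch — omits the ll.w/sc.w retry loop that gives the JITed code its atomicity:

```c
#include <assert.h>
#include <stdint.h>

/* r0 = atomic_cmpxchg(addr, r0, src): swap only on a match,
 * always return the value that was in memory. */
static uint64_t model_cmpxchg(uint64_t *addr, uint64_t r0, uint64_t src)
{
	uint64_t old = *addr;

	if (old == r0)		/* bne t2, r0, skip-the-store */
		*addr = src;	/* sc.d/sc.w, retried on failure */
	return old;		/* new r0 */
}
```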
>> +
>> +static bool is_signed_bpf_cond(u8 cond)
>> +{
>> +    return cond == BPF_JSGT || cond == BPF_JSLT ||
>> +           cond == BPF_JSGE || cond == BPF_JSLE;
>> +}
>> +
>> +static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx, bool extra_pass)
>> +{
>> +    const bool is32 = BPF_CLASS(insn->code) == BPF_ALU ||
>> +              BPF_CLASS(insn->code) == BPF_JMP32;
>> +    const u8 code = insn->code;
>> +    const u8 cond = BPF_OP(code);
>> +    const u8 dst = regmap[insn->dst_reg];
>> +    const u8 src = regmap[insn->src_reg];
>> +    const u8 t1 = LOONGARCH_GPR_T1;
>> +    const u8 t2 = LOONGARCH_GPR_T2;
>> +    const s16 off = insn->off;
>> +    const s32 imm = insn->imm;
>> +    int i = insn - ctx->prog->insnsi;
>> +    int jmp_offset;
>> +
>> +    switch (code) {
>> +    /* dst = src */
>> +    case BPF_ALU | BPF_MOV | BPF_X:
>> +    case BPF_ALU64 | BPF_MOV | BPF_X:
>> +        move_reg(ctx, dst, src);
>> +        emit_zext_32(ctx, dst, is32);
>> +        break;
>> +    /* dst = imm */
>> +    case BPF_ALU | BPF_MOV | BPF_K:
>> +    case BPF_ALU64 | BPF_MOV | BPF_K:
>> +        move_imm32(ctx, dst, imm, is32);
>> +        break;
>> +
>> +    /* dst = dst + src */
>> +    case BPF_ALU | BPF_ADD | BPF_X:
>> +    case BPF_ALU64 | BPF_ADD | BPF_X:
>> +        emit_insn(ctx, addd, dst, dst, src);
>> +        emit_zext_32(ctx, dst, is32);
>> +        break;
>> +    /* dst = dst + imm */
>> +    case BPF_ALU | BPF_ADD | BPF_K:
>> +    case BPF_ALU64 | BPF_ADD | BPF_K:
>> +        if (is_signed_imm12(imm)) {
>> +            emit_insn(ctx, addid, dst, dst, imm);
>> +        } else {
>> +            move_imm32(ctx, t1, imm, is32);
>> +            emit_insn(ctx, addd, dst, dst, t1);
>> +        }
>> +        emit_zext_32(ctx, dst, is32);
>> +        break;
>> +
>> +    /* dst = dst - src */
>> +    case BPF_ALU | BPF_SUB | BPF_X:
>> +    case BPF_ALU64 | BPF_SUB | BPF_X:
>> +        emit_insn(ctx, subd, dst, dst, src);
>> +        emit_zext_32(ctx, dst, is32);
>> +        break;
>> +    /* dst = dst - imm */
>> +    case BPF_ALU | BPF_SUB | BPF_K:
>> +    case BPF_ALU64 | BPF_SUB | BPF_K:
>> +        if (is_signed_imm12(-imm)) {
>> +            emit_insn(ctx, addid, dst, dst, -imm);
>> +        } else {
>> +            move_imm32(ctx, t1, imm, is32);
>> +            emit_insn(ctx, subd, dst, dst, t1);
>> +        }
>> +        emit_zext_32(ctx, dst, is32);
>> +        break;
>> +
>> +    /* dst = dst * src */
>> +    case BPF_ALU | BPF_MUL | BPF_X:
>> +    case BPF_ALU64 | BPF_MUL | BPF_X:
>> +        emit_insn(ctx, muld, dst, dst, src);
>> +        emit_zext_32(ctx, dst, is32);
>> +        break;
>> +    /* dst = dst * imm */
>> +    case BPF_ALU | BPF_MUL | BPF_K:
>> +    case BPF_ALU64 | BPF_MUL | BPF_K:
>> +        move_imm32(ctx, t1, imm, is32);
>> +        emit_insn(ctx, muld, dst, dst, t1);
>> +        emit_zext_32(ctx, dst, is32);
>> +        break;
>> +
>> +    /* dst = dst / src */
>> +    case BPF_ALU | BPF_DIV | BPF_X:
>> +    case BPF_ALU64 | BPF_DIV | BPF_X:
>> +        emit_zext_32(ctx, dst, is32);
>> +        move_reg(ctx, t1, src);
>> +        emit_zext_32(ctx, t1, is32);
>> +        emit_insn(ctx, divdu, dst, dst, t1);
>> +        emit_zext_32(ctx, dst, is32);
>> +        break;
>> +    /* dst = dst / imm */
>> +    case BPF_ALU | BPF_DIV | BPF_K:
>> +    case BPF_ALU64 | BPF_DIV | BPF_K:
>> +        move_imm32(ctx, t1, imm, is32);
>> +        emit_zext_32(ctx, dst, is32);
>> +        emit_insn(ctx, divdu, dst, dst, t1);
>> +        emit_zext_32(ctx, dst, is32);
>> +        break;
>> +
>> +    /* dst = dst % src */
>> +    case BPF_ALU | BPF_MOD | BPF_X:
>> +    case BPF_ALU64 | BPF_MOD | BPF_X:
>> +        emit_zext_32(ctx, dst, is32);
>> +        move_reg(ctx, t1, src);
>> +        emit_zext_32(ctx, t1, is32);
>> +        emit_insn(ctx, moddu, dst, dst, t1);
>> +        emit_zext_32(ctx, dst, is32);
>> +        break;
>> +    /* dst = dst % imm */
>> +    case BPF_ALU | BPF_MOD | BPF_K:
>> +    case BPF_ALU64 | BPF_MOD | BPF_K:
>> +        move_imm32(ctx, t1, imm, is32);
>> +        emit_zext_32(ctx, dst, is32);
>> +        emit_insn(ctx, moddu, dst, dst, t1);
>> +        emit_zext_32(ctx, dst, is32);
>> +        break;
>> +
>> +    /* dst = -dst */
>> +    case BPF_ALU | BPF_NEG:
>> +    case BPF_ALU64 | BPF_NEG:
>> +        emit_insn(ctx, subd, dst, LOONGARCH_GPR_ZERO, dst);
>> +        emit_zext_32(ctx, dst, is32);
>> +        break;
>> +
>> +    /* dst = dst & src */
>> +    case BPF_ALU | BPF_AND | BPF_X:
>> +    case BPF_ALU64 | BPF_AND | BPF_X:
>> +        emit_insn(ctx, and, dst, dst, src);
>> +        emit_zext_32(ctx, dst, is32);
>> +        break;
>> +    /* dst = dst & imm */
>> +    case BPF_ALU | BPF_AND | BPF_K:
>> +    case BPF_ALU64 | BPF_AND | BPF_K:
>> +        if (is_unsigned_imm12(imm)) {
>> +            emit_insn(ctx, andi, dst, dst, imm);
>> +        } else {
>> +            move_imm32(ctx, t1, imm, is32);
>> +            emit_insn(ctx, and, dst, dst, t1);
>> +        }
>> +        emit_zext_32(ctx, dst, is32);
>> +        break;
>> +
>> +    /* dst = dst | src */
>> +    case BPF_ALU | BPF_OR | BPF_X:
>> +    case BPF_ALU64 | BPF_OR | BPF_X:
>> +        emit_insn(ctx, or, dst, dst, src);
>> +        emit_zext_32(ctx, dst, is32);
>> +        break;
>> +    /* dst = dst | imm */
>> +    case BPF_ALU | BPF_OR | BPF_K:
>> +    case BPF_ALU64 | BPF_OR | BPF_K:
>> +        if (is_unsigned_imm12(imm)) {
>> +            emit_insn(ctx, ori, dst, dst, imm);
>> +        } else {
>> +            move_imm32(ctx, t1, imm, is32);
>> +            emit_insn(ctx, or, dst, dst, t1);
>> +        }
>> +        emit_zext_32(ctx, dst, is32);
>> +        break;
>> +
>> +    /* dst = dst ^ src */
>> +    case BPF_ALU | BPF_XOR | BPF_X:
>> +    case BPF_ALU64 | BPF_XOR | BPF_X:
>> +        emit_insn(ctx, xor, dst, dst, src);
>> +        emit_zext_32(ctx, dst, is32);
>> +        break;
>> +    /* dst = dst ^ imm */
>> +    case BPF_ALU | BPF_XOR | BPF_K:
>> +    case BPF_ALU64 | BPF_XOR | BPF_K:
>> +        if (is_unsigned_imm12(imm)) {
>> +            emit_insn(ctx, xori, dst, dst, imm);
>> +        } else {
>> +            move_imm32(ctx, t1, imm, is32);
>> +            emit_insn(ctx, xor, dst, dst, t1);
>> +        }
>> +        emit_zext_32(ctx, dst, is32);
>> +        break;
>> +
>> +    /* dst = dst << src (logical) */
>> +    case BPF_ALU | BPF_LSH | BPF_X:
>> +        emit_insn(ctx, sllw, dst, dst, src);
>> +        emit_zext_32(ctx, dst, is32);
>> +        break;
>> +    case BPF_ALU64 | BPF_LSH | BPF_X:
>> +        emit_insn(ctx, slld, dst, dst, src);
>> +        break;
>> +    /* dst = dst << imm (logical) */
>> +    case BPF_ALU | BPF_LSH | BPF_K:
>> +        emit_insn(ctx, slliw, dst, dst, imm);
>> +        emit_zext_32(ctx, dst, is32);
>> +        break;
>> +    case BPF_ALU64 | BPF_LSH | BPF_K:
>> +        emit_insn(ctx, sllid, dst, dst, imm);
>> +        break;
>> +
>> +    /* dst = dst >> src (logical) */
>> +    case BPF_ALU | BPF_RSH | BPF_X:
>> +        emit_insn(ctx, srlw, dst, dst, src);
>> +        emit_zext_32(ctx, dst, is32);
>> +        break;
>> +    case BPF_ALU64 | BPF_RSH | BPF_X:
>> +        emit_insn(ctx, srld, dst, dst, src);
>> +        break;
>> +    /* dst = dst >> imm (logical) */
>> +    case BPF_ALU | BPF_RSH | BPF_K:
>> +        emit_insn(ctx, srliw, dst, dst, imm);
>> +        emit_zext_32(ctx, dst, is32);
>> +        break;
>> +    case BPF_ALU64 | BPF_RSH | BPF_K:
>> +        emit_insn(ctx, srlid, dst, dst, imm);
>> +        break;
>> +
>> +    /* dst = dst >> src (arithmetic) */
>> +    case BPF_ALU | BPF_ARSH | BPF_X:
>> +        emit_insn(ctx, sraw, dst, dst, src);
>> +        emit_zext_32(ctx, dst, is32);
>> +        break;
>> +    case BPF_ALU64 | BPF_ARSH | BPF_X:
>> +        emit_insn(ctx, srad, dst, dst, src);
>> +        break;
>> +    /* dst = dst >> imm (arithmetic) */
>> +    case BPF_ALU | BPF_ARSH | BPF_K:
>> +        emit_insn(ctx, sraiw, dst, dst, imm);
>> +        emit_zext_32(ctx, dst, is32);
>> +        break;
>> +    case BPF_ALU64 | BPF_ARSH | BPF_K:
>> +        emit_insn(ctx, sraid, dst, dst, imm);
>> +        break;
>> +
>> +    /* dst = BSWAP##imm(dst) */
>> +    case BPF_ALU | BPF_END | BPF_FROM_LE:
>> +        switch (imm) {
>> +        case 16:
>> +            /* zero-extend 16 bits into 64 bits */
>> +            emit_insn(ctx, sllid, dst, dst, 48);
>> +            emit_insn(ctx, srlid, dst, dst, 48);
>> +            break;
>> +        case 32:
>> +            /* zero-extend 32 bits into 64 bits */
>> +            emit_zext_32(ctx, dst, is32);
>> +            break;
>> +        case 64:
>> +            /* do nothing */
>> +            break;
>> +        }
>> +        break;
>> +    case BPF_ALU | BPF_END | BPF_FROM_BE:
>> +        switch (imm) {
>> +        case 16:
>> +            emit_insn(ctx, revb2h, dst, dst);
>> +            /* zero-extend 16 bits into 64 bits */
>> +            emit_insn(ctx, sllid, dst, dst, 48);
>> +            emit_insn(ctx, srlid, dst, dst, 48);
>> +            break;
>> +        case 32:
>> +            emit_insn(ctx, revb2w, dst, dst);
>> +            /* zero-extend 32 bits into 64 bits */
>> +            emit_zext_32(ctx, dst, is32);
>> +            break;
>> +        case 64:
>> +            emit_insn(ctx, revbd, dst, dst);
>> +            break;
>> +        }
>> +        break;
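The slli.d/srli.d pair used for the 16-bit cases above is the standard shift trick for zero-extension; in C (zext16() is a name used only for this sketch):

```c
#include <assert.h>
#include <stdint.h>

/* Zero-extend the low 16 bits into 64: shifting left by 48 discards
 * bits 16-63, and the unsigned right shift brings the halfword back
 * with zero fill (matching slli.d dst, dst, 48; srli.d dst, dst, 48). */
static uint64_t zext16(uint64_t x)
{
	return (x << 48) >> 48;
}
```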
>> +
>> +    /* PC += off if dst cond src */
>> +    case BPF_JMP | BPF_JEQ | BPF_X:
>> +    case BPF_JMP | BPF_JNE | BPF_X:
>> +    case BPF_JMP | BPF_JGT | BPF_X:
>> +    case BPF_JMP | BPF_JGE | BPF_X:
>> +    case BPF_JMP | BPF_JLT | BPF_X:
>> +    case BPF_JMP | BPF_JLE | BPF_X:
>> +    case BPF_JMP | BPF_JSGT | BPF_X:
>> +    case BPF_JMP | BPF_JSGE | BPF_X:
>> +    case BPF_JMP | BPF_JSLT | BPF_X:
>> +    case BPF_JMP | BPF_JSLE | BPF_X:
>> +    case BPF_JMP32 | BPF_JEQ | BPF_X:
>> +    case BPF_JMP32 | BPF_JNE | BPF_X:
>> +    case BPF_JMP32 | BPF_JGT | BPF_X:
>> +    case BPF_JMP32 | BPF_JGE | BPF_X:
>> +    case BPF_JMP32 | BPF_JLT | BPF_X:
>> +    case BPF_JMP32 | BPF_JLE | BPF_X:
>> +    case BPF_JMP32 | BPF_JSGT | BPF_X:
>> +    case BPF_JMP32 | BPF_JSGE | BPF_X:
>> +    case BPF_JMP32 | BPF_JSLT | BPF_X:
>> +    case BPF_JMP32 | BPF_JSLE | BPF_X:
>> +        jmp_offset = bpf2la_offset(i, off, ctx);
>> +        move_reg(ctx, t1, dst);
>> +        move_reg(ctx, t2, src);
>> +        if (is_signed_bpf_cond(BPF_OP(code))) {
>> +            emit_sext_32(ctx, t1, is32);
>> +            emit_sext_32(ctx, t2, is32);
>> +        } else {
>> +            emit_zext_32(ctx, t1, is32);
>> +            emit_zext_32(ctx, t2, is32);
>> +        }
>> +        emit_cond_jmp(ctx, cond, t1, t2, jmp_offset);
>> +        break;
>> +
>> +    /* PC += off if dst cond imm */
>> +    case BPF_JMP | BPF_JEQ | BPF_K:
>> +    case BPF_JMP | BPF_JNE | BPF_K:
>> +    case BPF_JMP | BPF_JGT | BPF_K:
>> +    case BPF_JMP | BPF_JGE | BPF_K:
>> +    case BPF_JMP | BPF_JLT | BPF_K:
>> +    case BPF_JMP | BPF_JLE | BPF_K:
>> +    case BPF_JMP | BPF_JSGT | BPF_K:
>> +    case BPF_JMP | BPF_JSGE | BPF_K:
>> +    case BPF_JMP | BPF_JSLT | BPF_K:
>> +    case BPF_JMP | BPF_JSLE | BPF_K:
>> +    case BPF_JMP32 | BPF_JEQ | BPF_K:
>> +    case BPF_JMP32 | BPF_JNE | BPF_K:
>> +    case BPF_JMP32 | BPF_JGT | BPF_K:
>> +    case BPF_JMP32 | BPF_JGE | BPF_K:
>> +    case BPF_JMP32 | BPF_JLT | BPF_K:
>> +    case BPF_JMP32 | BPF_JLE | BPF_K:
>> +    case BPF_JMP32 | BPF_JSGT | BPF_K:
>> +    case BPF_JMP32 | BPF_JSGE | BPF_K:
>> +    case BPF_JMP32 | BPF_JSLT | BPF_K:
>> +    case BPF_JMP32 | BPF_JSLE | BPF_K:
>> +        jmp_offset = bpf2la_offset(i, off, ctx);
>> +        move_imm32(ctx, t1, imm, false);
>> +        move_reg(ctx, t2, dst);
>> +        if (is_signed_bpf_cond(BPF_OP(code))) {
>> +            emit_sext_32(ctx, t1, is32);
>> +            emit_sext_32(ctx, t2, is32);
>> +        } else {
>> +            emit_zext_32(ctx, t1, is32);
>> +            emit_zext_32(ctx, t2, is32);
>> +        }
>> +        emit_cond_jmp(ctx, cond, t2, t1, jmp_offset);
>> +        break;
>> +
>> +    /* PC += off if dst & src */
>> +    case BPF_JMP | BPF_JSET | BPF_X:
>> +    case BPF_JMP32 | BPF_JSET | BPF_X:
>> +        jmp_offset = bpf2la_offset(i, off, ctx);
>> +        emit_insn(ctx, and, t1, dst, src);
>> +        emit_zext_32(ctx, t1, is32);
>> +        emit_cond_jmp(ctx, cond, t1, LOONGARCH_GPR_ZERO, jmp_offset);
>> +        break;
>> +    /* PC += off if dst & imm */
>> +    case BPF_JMP | BPF_JSET | BPF_K:
>> +    case BPF_JMP32 | BPF_JSET | BPF_K:
>> +        jmp_offset = bpf2la_offset(i, off, ctx);
>> +        move_imm32(ctx, t1, imm, is32);
>> +        emit_insn(ctx, and, t1, dst, t1);
>> +        emit_zext_32(ctx, t1, is32);
>> +        emit_cond_jmp(ctx, cond, t1, LOONGARCH_GPR_ZERO, jmp_offset);
>> +        break;
>> +
>> +    /* PC += off */
>> +    case BPF_JMP | BPF_JA:
>> +        jmp_offset = bpf2la_offset(i, off, ctx);
>> +        emit_uncond_jmp(ctx, jmp_offset, is32);
>> +        break;
>> +
>> +    /* function call */
>> +    case BPF_JMP | BPF_CALL:
>> +        bool func_addr_fixed;
>> +        u64 func_addr;
>> +        int ret;
>> +
>> +        mark_call(ctx);
>> +        ret = bpf_jit_get_func_addr(ctx->prog, insn, extra_pass,
>> +                        &func_addr, &func_addr_fixed);
>> +        if (ret < 0)
>> +            return ret;
>> +
>> +        move_imm64(ctx, t1, func_addr, is32);
>> +        emit_insn(ctx, jirl, t1, LOONGARCH_GPR_RA, 0);
>> +        move_reg(ctx, regmap[BPF_REG_0], LOONGARCH_GPR_A0);
>> +        break;
>> +
>> +    /* tail call */
>> +    case BPF_JMP | BPF_TAIL_CALL:
>> +        mark_tail_call(ctx);
>> +        if (emit_bpf_tail_call(ctx))
>> +            return -EINVAL;
>> +        break;
>> +
>> +    /* function return */
>> +    case BPF_JMP | BPF_EXIT:
>> +        emit_sext_32(ctx, regmap[BPF_REG_0], true);
>> +
>> +        if (i == ctx->prog->len - 1)
>> +            break;
>> +
>> +        jmp_offset = epilogue_offset(ctx);
>> +        emit_uncond_jmp(ctx, jmp_offset, true);
>> +        break;
>> +
>> +    /* dst = imm64 */
>> +    case BPF_LD | BPF_IMM | BPF_DW:
>> +        u64 imm64 = (u64)(insn + 1)->imm << 32 | (u32)insn->imm;
>> +
>> +        move_imm64(ctx, dst, imm64, is32);
>> +        return 1;
>> +
>> +    /* dst = *(size *)(src + off) */
>> +    case BPF_LDX | BPF_MEM | BPF_B:
>> +    case BPF_LDX | BPF_MEM | BPF_H:
>> +    case BPF_LDX | BPF_MEM | BPF_W:
>> +    case BPF_LDX | BPF_MEM | BPF_DW:
>> +        if (is_signed_imm12(off)) {
>> +            switch (BPF_SIZE(code)) {
>> +            case BPF_B:
>> +                emit_insn(ctx, ldbu, dst, src, off);
>> +                break;
>> +            case BPF_H:
>> +                emit_insn(ctx, ldhu, dst, src, off);
>> +                break;
>> +            case BPF_W:
>> +                emit_insn(ctx, ldwu, dst, src, off);
>> +                break;
>> +            case BPF_DW:
>> +                emit_insn(ctx, ldd, dst, src, off);
>> +                break;
>> +            }
>> +        } else {
>> +            move_imm32(ctx, t1, off, is32);
>> +            switch (BPF_SIZE(code)) {
>> +            case BPF_B:
>> +                emit_insn(ctx, ldxbu, dst, src, t1);
>> +                break;
>> +            case BPF_H:
>> +                emit_insn(ctx, ldxhu, dst, src, t1);
>> +                break;
>> +            case BPF_W:
>> +                emit_insn(ctx, ldxwu, dst, src, t1);
>> +                break;
>> +            case BPF_DW:
>> +                emit_insn(ctx, ldxd, dst, src, t1);
>> +                break;
>> +            }
>> +        }
>> +        break;
>> +
>> +    /* *(size *)(dst + off) = imm */
>> +    case BPF_ST | BPF_MEM | BPF_B:
>> +    case BPF_ST | BPF_MEM | BPF_H:
>> +    case BPF_ST | BPF_MEM | BPF_W:
>> +    case BPF_ST | BPF_MEM | BPF_DW:
>> +        move_imm32(ctx, t1, imm, is32);
>> +        if (is_signed_imm12(off)) {
>> +            switch (BPF_SIZE(code)) {
>> +            case BPF_B:
>> +                emit_insn(ctx, stb, t1, dst, off);
>> +                break;
>> +            case BPF_H:
>> +                emit_insn(ctx, sth, t1, dst, off);
>> +                break;
>> +            case BPF_W:
>> +                emit_insn(ctx, stw, t1, dst, off);
>> +                break;
>> +            case BPF_DW:
>> +                emit_insn(ctx, std, t1, dst, off);
>> +                break;
>> +            }
>> +        } else {
>> +            move_imm32(ctx, t2, off, is32);
>> +            switch (BPF_SIZE(code)) {
>> +            case BPF_B:
>> +                emit_insn(ctx, stxb, t1, dst, t2);
>> +                break;
>> +            case BPF_H:
>> +                emit_insn(ctx, stxh, t1, dst, t2);
>> +                break;
>> +            case BPF_W:
>> +                emit_insn(ctx, stxw, t1, dst, t2);
>> +                break;
>> +            case BPF_DW:
>> +                emit_insn(ctx, stxd, t1, dst, t2);
>> +                break;
>> +            }
>> +        }
>> +        break;
>> +
>> +    /* *(size *)(dst + off) = src */
>> +    case BPF_STX | BPF_MEM | BPF_B:
>> +    case BPF_STX | BPF_MEM | BPF_H:
>> +    case BPF_STX | BPF_MEM | BPF_W:
>> +    case BPF_STX | BPF_MEM | BPF_DW:
>> +        if (is_signed_imm12(off)) {
>> +            switch (BPF_SIZE(code)) {
>> +            case BPF_B:
>> +                emit_insn(ctx, stb, src, dst, off);
>> +                break;
>> +            case BPF_H:
>> +                emit_insn(ctx, sth, src, dst, off);
>> +                break;
>> +            case BPF_W:
>> +                emit_insn(ctx, stw, src, dst, off);
>> +                break;
>> +            case BPF_DW:
>> +                emit_insn(ctx, std, src, dst, off);
>> +                break;
>> +            }
>> +        } else {
>> +            move_imm32(ctx, t1, off, is32);
>> +            switch (BPF_SIZE(code)) {
>> +            case BPF_B:
>> +                emit_insn(ctx, stxb, src, dst, t1);
>> +                break;
>> +            case BPF_H:
>> +                emit_insn(ctx, stxh, src, dst, t1);
>> +                break;
>> +            case BPF_W:
>> +                emit_insn(ctx, stxw, src, dst, t1);
>> +                break;
>> +            case BPF_DW:
>> +                emit_insn(ctx, stxd, src, dst, t1);
>> +                break;
>> +            }
>> +        }
>> +        break;
>> +
>> +    case BPF_STX | BPF_ATOMIC | BPF_W:
>> +    case BPF_STX | BPF_ATOMIC | BPF_DW:
>> +        emit_atomic(insn, ctx);
>> +        break;
>> +
>> +    default:
>> +        pr_err("bpf_jit: unknown opcode %02x\n", code);
>> +        return -EINVAL;
>> +    }
>> +
>> +    return 0;
>> +}
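As an aside on the BPF_LD | BPF_IMM | BPF_DW case above: the folding of the two 32-bit halves into one 64-bit immediate can be checked stand-alone. The sketch below is only an illustration of the bit arithmetic, not part of the patch:

```c
#include <assert.h>
#include <stdint.h>

/*
 * How the BPF_LD | BPF_IMM | BPF_DW pair is folded: the second
 * (pseudo) instruction carries the high 32 bits of the immediate,
 * the first one the low 32 bits.
 */
static uint64_t fold_ld_imm64(int32_t lo_imm, int32_t hi_imm)
{
	return (uint64_t)(uint32_t)hi_imm << 32 | (uint32_t)lo_imm;
}
```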
>> +
>> +static int build_body(struct jit_ctx *ctx, bool extra_pass)
>> +{
>> +    const struct bpf_prog *prog = ctx->prog;
>> +    int i;
>> +
>> +    for (i = 0; i < prog->len; i++) {
>> +        const struct bpf_insn *insn = &prog->insnsi[i];
>> +        int ret;
>> +
>> +        if (ctx->image == NULL)
>> +            ctx->offset[i] = ctx->idx;
>> +
>> +        ret = build_insn(insn, ctx, extra_pass);
>> +        if (ret > 0) {
>> +            i++;
>> +            if (ctx->image == NULL)
>> +                ctx->offset[i] = ctx->idx;
>> +            continue;
>> +        }
>> +        if (ret)
>> +            return ret;
>> +    }
>> +
>> +    if (ctx->image == NULL)
>> +        ctx->offset[i] = ctx->idx;
>> +
>> +    return 0;
>> +}
>> +
>> +static inline void bpf_flush_icache(void *start, void *end)
>> +{
>> +    flush_icache_range((unsigned long)start, (unsigned long)end);
>> +}
>> +
>> +/* Fill space with illegal instructions */
>> +static void jit_fill_hole(void *area, unsigned int size)
>> +{
>> +    u32 *ptr;
>> +
>> +    /* We are guaranteed to have aligned memory */
>> +    for (ptr = area; size >= sizeof(u32); size -= sizeof(u32))
>> +        *ptr++ = INSN_BREAK;
>> +}
>> +
>> +static int validate_code(struct jit_ctx *ctx)
>> +{
>> +    int i;
>> +    union loongarch_instruction insn;
>> +
>> +    for (i = 0; i < ctx->idx; i++) {
>> +        insn = ctx->image[i];
>> +        /* Check INSN_BREAK */
>> +        if (insn.word == INSN_BREAK)
>> +            return -1;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
>> +{
>> +    struct bpf_prog *tmp, *orig_prog = prog;
>> +    struct bpf_binary_header *header;
>> +    struct jit_data *jit_data;
>> +    struct jit_ctx ctx;
>> +    bool tmp_blinded = false;
>> +    bool extra_pass = false;
>> +    int image_size;
>> +    u8 *image_ptr;
>> +
>> +    /*
>> +     * If BPF JIT was not enabled then we must fall back to
>> +     * the interpreter.
>> +     */
>> +    if (!prog->jit_requested)
>> +        return orig_prog;
>> +
>> +    tmp = bpf_jit_blind_constants(prog);
>> +    /*
>> +     * If blinding was requested and we failed during blinding,
>> +     * we must fall back to the interpreter. Otherwise, we save
>> +     * the new JITed code.
>> +     */
>> +    if (IS_ERR(tmp))
>> +        return orig_prog;
>> +    if (tmp != prog) {
>> +        tmp_blinded = true;
>> +        prog = tmp;
>> +    }
>> +
>> +    jit_data = prog->aux->jit_data;
>> +    if (!jit_data) {
>> +        jit_data = kzalloc(sizeof(*jit_data), GFP_KERNEL);
>> +        if (!jit_data) {
>> +            prog = orig_prog;
>> +            goto out;
>> +        }
>> +        prog->aux->jit_data = jit_data;
>> +    }
>> +    if (jit_data->ctx.offset) {
>> +        ctx = jit_data->ctx;
>> +        image_ptr = jit_data->image;
>> +        header = jit_data->header;
>> +        extra_pass = true;
>> +        image_size = sizeof(u32) * ctx.idx;
>> +        goto skip_init_ctx;
>> +    }
>> +
>> +    memset(&ctx, 0, sizeof(ctx));
>> +    ctx.prog = prog;
>> +
>> +    ctx.offset = kcalloc(prog->len + 1, sizeof(u32), GFP_KERNEL);
>> +    if (ctx.offset == NULL) {
>> +        prog = orig_prog;
>> +        goto out_off;
>> +    }
>> +
>> +    /* 1. Initial fake pass to compute ctx->idx and set ctx->flags */
>> +    if (build_body(&ctx, extra_pass)) {
>> +        prog = orig_prog;
>> +        goto out_off;
>> +    }
>> +    build_prologue(&ctx);
>> +    ctx.epilogue_offset = ctx.idx;
>> +    build_epilogue(&ctx);
>> +
>> +    /*
>> +     * Now we know the actual image size.
>> +     * As each LoongArch instruction is 32 bits long, we translate
>> +     * the number of JITed instructions into the size required to
>> +     * store the JITed code.
>> +     */
>> +    image_size = sizeof(u32) * ctx.idx;
>> +    /* Now we know the size of the structure to make */
>> +    header = bpf_jit_binary_alloc(image_size, &image_ptr,
>> +                      sizeof(u32), jit_fill_hole);
>> +    if (header == NULL) {
>> +        prog = orig_prog;
>> +        goto out_off;
>> +    }
>> +
>> +    /* 2. Now, the actual pass to generate final JIT code */
>> +    ctx.image = (union loongarch_instruction *)image_ptr;
>> +skip_init_ctx:
>> +    ctx.idx = 0;
>> +
>> +    build_prologue(&ctx);
>> +    if (build_body(&ctx, extra_pass)) {
>> +        bpf_jit_binary_free(header);
>> +        prog = orig_prog;
>> +        goto out_off;
>> +    }
>> +    build_epilogue(&ctx);
>> +
>> +    /* 3. Extra pass to validate JITed code */
>> +    if (validate_code(&ctx)) {
>> +        bpf_jit_binary_free(header);
>> +        prog = orig_prog;
>> +        goto out_off;
>> +    }
>> +
>> +    /* And we're done */
>> +    if (bpf_jit_enable > 1)
>> +        bpf_jit_dump(prog->len, image_size, 2, ctx.image);
>> +
>> +    /* Update the icache */
>> +    bpf_flush_icache(header, ctx.image + ctx.idx);
>> +
>> +    if (!prog->is_func || extra_pass) {
>> +        if (extra_pass && ctx.idx != jit_data->ctx.idx) {
>> +            pr_err_once("multi-func JIT bug %d != %d\n",
>> +                    ctx.idx, jit_data->ctx.idx);
>> +            bpf_jit_binary_free(header);
>> +            prog->bpf_func = NULL;
>> +            prog->jited = 0;
>> +            prog->jited_len = 0;
>> +            goto out_off;
>> +        }
>> +        bpf_jit_binary_lock_ro(header);
>> +    } else {
>> +        jit_data->ctx = ctx;
>> +        jit_data->image = image_ptr;
>> +        jit_data->header = header;
>> +    }
>> +    prog->bpf_func = (void *)ctx.image;
>> +    prog->jited = 1;
>> +    prog->jited_len = image_size;
>> +
>> +    if (!prog->is_func || extra_pass) {
>> +out_off:
>> +        kfree(ctx.offset);
>> +        kfree(jit_data);
>> +        prog->aux->jit_data = NULL;
>> +    }
>> +out:
>> +    if (tmp_blinded)
>> +        bpf_jit_prog_release_other(prog, prog == orig_prog ?
>> +                       tmp : orig_prog);
>> +
>> +    out_offset = -1;
>> +    return prog;
>> +}
>> diff --git a/arch/loongarch/net/bpf_jit.h b/arch/loongarch/net/bpf_jit.h
>> new file mode 100644
>> index 0000000..9c735f3
>> --- /dev/null
>> +++ b/arch/loongarch/net/bpf_jit.h
>> @@ -0,0 +1,308 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +/*
>> + * BPF JIT compiler for LoongArch
>> + *
>> + * Copyright (C) 2022 Loongson Technology Corporation Limited
>> + */
>> +#include <linux/bpf.h>
>> +#include <linux/filter.h>
>> +#include <asm/cacheflush.h>
>> +#include <asm/inst.h>
>> +
>> +struct jit_ctx {
>> +    const struct bpf_prog *prog;
>> +    unsigned int idx;
>> +    unsigned int flags;
>> +    unsigned int epilogue_offset;
>> +    u32 *offset;
>> +    union loongarch_instruction *image;
>> +    u32 stack_size;
>> +};
>> +
>> +struct jit_data {
>> +    struct bpf_binary_header *header;
>> +    u8 *image;
>> +    struct jit_ctx ctx;
>> +};
>> +
>> +#define emit_insn(ctx, func, ...)                        \
>> +do {                                        \
>> +    if (ctx->image != NULL) {                        \
>> +        union loongarch_instruction *insn = &ctx->image[ctx->idx];    \
>> +        emit_##func(insn, ##__VA_ARGS__);                \
>> +    }                                    \
>> +    ctx->idx++;                                \
>> +} while (0)
>> +
>> +#define is_signed_imm12(val)    signed_imm_check(val, 12)
>> +#define is_signed_imm16(val)    signed_imm_check(val, 16)
>> +#define is_signed_imm26(val)    signed_imm_check(val, 26)
>> +#define is_signed_imm32(val)    signed_imm_check(val, 32)
>> +#define is_signed_imm52(val)    signed_imm_check(val, 52)
>> +#define is_unsigned_imm12(val)    unsigned_imm_check(val, 12)
>> +
>> +static inline int bpf2la_offset(int bpf_insn, int off, const struct jit_ctx *ctx)
>> +{
>> +    /* BPF JMP offset is relative to the next instruction */
>> +    bpf_insn++;
>> +    /*
>> +     * LoongArch branch instructions, however, encode the offset
>> +     * from the branch itself, so we must subtract 1 from the
>> +     * instruction offset.
>> +     */
>> +    return (ctx->offset[bpf_insn + off] - (ctx->offset[bpf_insn] - 1));
>> +}
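The off-by-one in this helper can be checked in isolation. In the sketch below the jit_ctx wrapper is dropped and the offset[] values are made up purely for illustration:

```c
#include <assert.h>

/* Mirror of bpf2la_offset(), taking the offset[] table directly */
static int demo_bpf2la_offset(int bpf_insn, int off, const unsigned int *offset)
{
	/* BPF JMP offsets are relative to the next BPF instruction... */
	bpf_insn++;
	/* ...while the LoongArch branch sits one native instruction
	 * before offset[bpf_insn], hence the -1 */
	return (int)(offset[bpf_insn + off] - (offset[bpf_insn] - 1));
}

/* offset[i] = index of the first LoongArch insn for BPF insn i
 * (made-up values: some BPF insns expand to two native insns) */
static const unsigned int demo_offset[] = { 0, 2, 3, 5, 6 };
```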
>> +
>> +static inline int epilogue_offset(const struct jit_ctx *ctx)
>> +{
>> +    int to = ctx->epilogue_offset;
>> +    int from = ctx->idx;
>> +
>> +    return (to - from);
>> +}
>> +
>> +/* Zero-extend 32 bits into 64 bits */
>> +static inline void emit_zext_32(struct jit_ctx *ctx, enum loongarch_gpr reg, bool is32)
>> +{
>> +    if (!is32)
>> +        return;
>> +
>> +    emit_insn(ctx, lu32id, reg, 0);
>> +}
>> +
>> +/* Signed-extend 32 bits into 64 bits */
>> +static inline void emit_sext_32(struct jit_ctx *ctx, enum loongarch_gpr reg, bool is32)
>> +{
>> +    if (!is32)
>> +        return;
>> +
>> +    emit_insn(ctx, addiw, reg, reg, 0);
>> +}
>> +
>> +static inline void move_imm32(struct jit_ctx *ctx, enum loongarch_gpr rd,
>> +                  int imm32, bool is32)
>> +{
>> +    int si20;
>> +    u32 ui12;
>> +
>> +    /* or rd, $zero, $zero */
>> +    if (imm32 == 0) {
>> +        emit_insn(ctx, or, rd, LOONGARCH_GPR_ZERO, LOONGARCH_GPR_ZERO);
>> +        return;
>> +    }
>> +
>> +    /* addiw rd, $zero, imm_11_0(signed) */
>> +    if (is_signed_imm12(imm32)) {
>> +        emit_insn(ctx, addiw, rd, LOONGARCH_GPR_ZERO, imm32);
>> +        goto zext;
>> +    }
>> +
>> +    /* ori rd, $zero, imm_11_0(unsigned) */
>> +    if (is_unsigned_imm12(imm32)) {
>> +        emit_insn(ctx, ori, rd, LOONGARCH_GPR_ZERO, imm32);
>> +        goto zext;
>> +    }
>> +
>> +    /* lu12iw rd, imm_31_12(signed) */
>> +    si20 = (imm32 >> 12) & 0xfffff;
>> +    emit_insn(ctx, lu12iw, rd, si20);
>> +
>> +    /* ori rd, rd, imm_11_0(unsigned) */
>> +    ui12 = imm32 & 0xfff;
>> +    if (ui12 != 0)
>> +        emit_insn(ctx, ori, rd, rd, ui12);
>> +
>> +zext:
>> +    emit_zext_32(ctx, rd, is32);
>> +}
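The lu12iw + ori split in move_imm32() can be sanity-checked on the side. This sketch models only the bit arithmetic, not the emitted instructions:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bit arithmetic behind the lu12iw + ori pair: lu12iw places a
 * 20-bit value into bits 31..12 and ori fills in bits 11..0,
 * which together recover the low 32 bits exactly.
 */
static uint32_t demo_lu12iw_ori(int32_t imm32)
{
	int32_t si20 = (imm32 >> 12) & 0xfffff;	/* lu12iw immediate */
	uint32_t ui12 = imm32 & 0xfff;		/* ori immediate */

	return ((uint32_t)si20 << 12) | ui12;
}
```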
>> +
>> +static inline void move_imm64(struct jit_ctx *ctx, enum loongarch_gpr rd,
>> +                  long imm64, bool is32)
>> +{
>> +    int imm32, si20, si12;
>> +    long imm52;
>> +
>> +    si12 = (imm64 >> 52) & 0xfff;
>> +    imm52 = imm64 & 0xfffffffffffff;
>> +    /* lu52id rd, $zero, imm_63_52(signed) */
>> +    if (si12 != 0 && imm52 == 0) {
>> +        emit_insn(ctx, lu52id, rd, LOONGARCH_GPR_ZERO, si12);
>> +        return;
>> +    }
>> +
>> +    imm32 = imm64 & 0xffffffff;
>> +    move_imm32(ctx, rd, imm32, is32);
>> +
>> +    if (!is_signed_imm32(imm64)) {
>> +        if (imm52 != 0) {
>> +            /* lu32id rd, imm_51_32(signed) */
>> +            si20 = (imm64 >> 32) & 0xfffff;
>> +            emit_insn(ctx, lu32id, rd, si20);
>> +        }
>> +
>> +        /* lu52id rd, rd, imm_63_52(signed) */
>> +        if (!is_signed_imm52(imm64))
>> +            emit_insn(ctx, lu52id, rd, rd, si12);
>> +    }
>> +}
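Likewise, the general move_imm64() path can be modeled as pure bit arithmetic. The sketch covers the worst case where move_imm32(), lu32id and lu52id are all used (when lu52id is skipped, the lu32id sign extension supplies bits 63..52 instead):

```c
#include <assert.h>
#include <stdint.h>

/*
 * move_imm32() builds bits 31..0, lu32id writes a sign-extended
 * 20-bit value to bits 63..32, then lu52id overwrites bits 63..52.
 */
static uint64_t demo_move_imm64(int64_t imm64)
{
	int64_t si20 = (imm64 >> 32) & 0xfffff;		/* lu32id immediate */
	int64_t si12 = (imm64 >> 52) & 0xfff;		/* lu52id immediate */
	int64_t sext20 = (si20 ^ 0x80000) - 0x80000;	/* sign-extend 20 bits */
	uint64_t v;

	v = (uint64_t)imm64 & 0xffffffff;			/* after move_imm32() */
	v = (v & 0xffffffff) | ((uint64_t)sext20 << 32);	/* lu32id */
	v = (v & 0x000fffffffffffffULL) | ((uint64_t)si12 << 52); /* lu52id */
	return v;
}
```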
>> +
>> +static inline void move_reg(struct jit_ctx *ctx, enum loongarch_gpr rd,
>> +                enum loongarch_gpr rj)
>> +{
>> +    emit_insn(ctx, or, rd, rj, LOONGARCH_GPR_ZERO);
>> +}
>> +
>> +static inline int invert_jmp_cond(u8 cond)
>> +{
>> +    switch (cond) {
>> +    case BPF_JEQ:
>> +        return BPF_JNE;
>> +    case BPF_JNE:
>> +    case BPF_JSET:
>> +        return BPF_JEQ;
>> +    case BPF_JGT:
>> +        return BPF_JLE;
>> +    case BPF_JGE:
>> +        return BPF_JLT;
>> +    case BPF_JLT:
>> +        return BPF_JGE;
>> +    case BPF_JLE:
>> +        return BPF_JGT;
>> +    case BPF_JSGT:
>> +        return BPF_JSLE;
>> +    case BPF_JSGE:
>> +        return BPF_JSLT;
>> +    case BPF_JSLT:
>> +        return BPF_JSGE;
>> +    case BPF_JSLE:
>> +        return BPF_JSGT;
>> +    }
>> +    return -1;
>> +}
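A quick property check of the inversion table: every condition except BPF_JSET should round-trip through two inversions (JSET is emitted as bne, so it collapses onto the JNE/JEQ pair). The BPF_J* values below are repeated from the uapi bpf headers so the sketch is self-contained:

```c
#include <assert.h>

/* BPF condition codes, values as in the uapi bpf headers */
#define BPF_JEQ		0x10
#define BPF_JGT		0x20
#define BPF_JGE		0x30
#define BPF_JSET	0x40
#define BPF_JNE		0x50
#define BPF_JSGT	0x60
#define BPF_JSGE	0x70
#define BPF_JLT		0xa0
#define BPF_JLE		0xb0
#define BPF_JSLT	0xc0
#define BPF_JSLE	0xd0

/* Same table as invert_jmp_cond() */
static int demo_invert(unsigned char cond)
{
	switch (cond) {
	case BPF_JEQ:	return BPF_JNE;
	case BPF_JNE:
	case BPF_JSET:	return BPF_JEQ;	/* JSET is emitted as bne */
	case BPF_JGT:	return BPF_JLE;
	case BPF_JGE:	return BPF_JLT;
	case BPF_JLT:	return BPF_JGE;
	case BPF_JLE:	return BPF_JGT;
	case BPF_JSGT:	return BPF_JSLE;
	case BPF_JSGE:	return BPF_JSLT;
	case BPF_JSLT:	return BPF_JSGE;
	case BPF_JSLE:	return BPF_JSGT;
	}
	return -1;
}
```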
>> +
>> +static inline void cond_jmp_offs16(struct jit_ctx *ctx, u8 cond, enum loongarch_gpr rj,
>> +                   enum loongarch_gpr rd, int jmp_offset)
>> +{
>> +    switch (cond) {
>> +    case BPF_JEQ:
>> +        /* PC += jmp_offset if rj == rd */
>> +        emit_insn(ctx, beq, rj, rd, jmp_offset);
>> +        return;
>> +    case BPF_JNE:
>> +    case BPF_JSET:
>> +        /* PC += jmp_offset if rj != rd */
>> +        emit_insn(ctx, bne, rj, rd, jmp_offset);
>> +        return;
>> +    case BPF_JGT:
>> +        /* PC += jmp_offset if rj > rd (unsigned) */
>> +        emit_insn(ctx, bltu, rd, rj, jmp_offset);
>> +        return;
>> +    case BPF_JLT:
>> +        /* PC += jmp_offset if rj < rd (unsigned) */
>> +        emit_insn(ctx, bltu, rj, rd, jmp_offset);
>> +        return;
>> +    case BPF_JGE:
>> +        /* PC += jmp_offset if rj >= rd (unsigned) */
>> +        emit_insn(ctx, bgeu, rj, rd, jmp_offset);
>> +        return;
>> +    case BPF_JLE:
>> +        /* PC += jmp_offset if rj <= rd (unsigned) */
>> +        emit_insn(ctx, bgeu, rd, rj, jmp_offset);
>> +        return;
>> +    case BPF_JSGT:
>> +        /* PC += jmp_offset if rj > rd (signed) */
>> +        emit_insn(ctx, blt, rd, rj, jmp_offset);
>> +        return;
>> +    case BPF_JSLT:
>> +        /* PC += jmp_offset if rj < rd (signed) */
>> +        emit_insn(ctx, blt, rj, rd, jmp_offset);
>> +        return;
>> +    case BPF_JSGE:
>> +        /* PC += jmp_offset if rj >= rd (signed) */
>> +        emit_insn(ctx, bge, rj, rd, jmp_offset);
>> +        return;
>> +    case BPF_JSLE:
>> +        /* PC += jmp_offset if rj <= rd (signed) */
>> +        emit_insn(ctx, bge, rd, rj, jmp_offset);
>> +        return;
>> +    }
>> +}
>> +
>> +static inline void cond_jmp_offs26(struct jit_ctx *ctx, u8 cond, enum loongarch_gpr rj,
>> +                   enum loongarch_gpr rd, int jmp_offset)
>> +{
>> +    cond = invert_jmp_cond(cond);
>> +    cond_jmp_offs16(ctx, cond, rj, rd, 2);
>> +    emit_insn(ctx, b, jmp_offset);
>> +}
>> +
>> +static inline void cond_jmp_offs32(struct jit_ctx *ctx, u8 cond, enum loongarch_gpr rj,
>> +                   enum loongarch_gpr rd, int jmp_offset)
>> +{
>> +    s64 upper, lower;
>> +
>> +    upper = (jmp_offset + (1 << 15)) >> 16;
>> +    lower = jmp_offset & 0xffff;
>> +
>> +    cond = invert_jmp_cond(cond);
>> +    cond_jmp_offs16(ctx, cond, rj, rd, 3);
>> +
>> +    /*
>> +     * jmp_addr = jmp_offset << 2
>> +     * tmp2 = PC + jmp_addr[31, 18] + 18'b0
>> +     */
>> +    emit_insn(ctx, pcaddu18i, LOONGARCH_GPR_T2, upper << 2);
>> +
>> +    /* jump to (tmp2 + jmp_addr[17, 2] + 2'b0) */
>> +    emit_insn(ctx, jirl, LOONGARCH_GPR_T2, LOONGARCH_GPR_ZERO, lower + 1);
>> +}
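The rounding in the upper/lower split is worth verifying: jirl sign-extends its 16-bit offset, so biasing upper by 1 << 15 makes the pcaddu18i + jirl pair recombine to the original offset exactly. A stand-alone check of just that arithmetic:

```c
#include <assert.h>
#include <stdint.h>

/*
 * The split used by cond_jmp_offs32()/uncond_jmp_offs32():
 * one chunk adds upper << 16, the other adds lower as a
 * sign-extended 16-bit value.
 */
static int32_t demo_recombine(int32_t jmp_offset)
{
	int64_t upper = ((int64_t)jmp_offset + (1 << 15)) >> 16;
	int64_t lower = jmp_offset & 0xffff;

	return (int32_t)((upper << 16) + (int16_t)lower);
}
```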
>> +
>> +static inline void uncond_jmp_offs26(struct jit_ctx *ctx, int jmp_offset)
>> +{
>> +    emit_insn(ctx, b, jmp_offset);
>> +}
>> +
>> +static inline void uncond_jmp_offs32(struct jit_ctx *ctx, int jmp_offset, bool is_exit)
>> +{
>> +    s64 upper, lower;
>> +
>> +    upper = (jmp_offset + (1 << 15)) >> 16;
>> +    lower = jmp_offset & 0xffff;
>> +
>> +    if (is_exit)
>> +        lower -= 1;
>> +
>> +    /*
>> +     * jmp_addr = jmp_offset << 2;
>> +     * tmp1 = PC + jmp_addr[31, 18] + 18'b0
>> +     */
>> +    emit_insn(ctx, pcaddu18i, LOONGARCH_GPR_T1, upper << 2);
>> +
>> +    /* jump to (tmp1 + jmp_addr[17, 2] + 2'b0) */
>> +    emit_insn(ctx, jirl, LOONGARCH_GPR_T1, LOONGARCH_GPR_ZERO, lower + 1);
>> +}
>> +
>> +static inline void emit_cond_jmp(struct jit_ctx *ctx, u8 cond, enum loongarch_gpr rj,
>> +                 enum loongarch_gpr rd, int jmp_offset)
>> +{
>> +    cond_jmp_offs26(ctx, cond, rj, rd, jmp_offset);
>> +}
>> +
>> +static inline void emit_uncond_jmp(struct jit_ctx *ctx, int jmp_offset, bool is_exit)
>> +{
>> +    if (is_signed_imm26(jmp_offset))
>> +        uncond_jmp_offs26(ctx, jmp_offset);
>> +    else
>> +        uncond_jmp_offs32(ctx, jmp_offset, is_exit);
>> +}
>> +
>> +static inline void emit_tailcall_jmp(struct jit_ctx *ctx, u8 cond, enum loongarch_gpr rj,
>> +                     enum loongarch_gpr rd, int jmp_offset)
>> +{
>> +    if (is_signed_imm16(jmp_offset))
>> +        cond_jmp_offs16(ctx, cond, rj, rd, jmp_offset);
>> +    else if (is_signed_imm26(jmp_offset))
>> +        cond_jmp_offs26(ctx, cond, rj, rd, jmp_offset - 1);
>> +    else
>> +        cond_jmp_offs32(ctx, cond, rj, rd, jmp_offset - 2);
>> +}
>>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next v1 3/4] LoongArch: Add BPF JIT support
  2022-08-22  1:58   ` Youling Tang
  2022-08-22  2:03     ` Youling Tang
@ 2022-08-22  2:49     ` Tiezhu Yang
  1 sibling, 0 replies; 15+ messages in thread
From: Tiezhu Yang @ 2022-08-22  2:49 UTC (permalink / raw)
  To: Youling Tang, Huacai Chen, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko
  Cc: bpf, loongarch



On 08/22/2022 09:58 AM, Youling Tang wrote:
> On 08/20/2022 07:50 PM, Tiezhu Yang wrote:
>> BPF programs are normally handled by the BPF interpreter. Add BPF JIT
>> support for LoongArch so that the kernel can generate native code
>> when a program is loaded into the kernel; this significantly
>> speeds up the processing of BPF programs.

[...]

>> +#define DEF_EMIT_REG1I20_FORMAT(NAME, OP)                \
>> +static inline void emit_##NAME(union loongarch_instruction *insn,    \
>> +                   enum loongarch_gpr rd, int imm)        \
>> +{                                    \
>> +    insn->reg1i20_format.opcode = OP;                \
>> +    insn->reg1i20_format.immediate = imm;                \
>> +    insn->reg1i20_format.rd = rd;                    \
>> +}
>> +
>> +DEF_EMIT_REG1I20_FORMAT(lu12iw, lu12iw_op)
>> +DEF_EMIT_REG1I20_FORMAT(lu32id, lu32id_op)
>
> We can delete the larch_insn_gen_{lu32id, lu52id, jirl} functions in
> inst.c and use emit_xxx.
>
> The implementation of emit_plt_entry() is similarly modified as follows:
> struct plt_entry {
>         union loongarch_instruction lu12iw;
>         union loongarch_instruction lu32id;
>         union loongarch_instruction lu52id;
>         union loongarch_instruction jirl;
> };
>
> static inline struct plt_entry emit_plt_entry(unsigned long val)
> {
>         union loongarch_instruction *lu12iw, *lu32id, *lu52id, *jirl;
>
>         emit_lu32id(lu12iw, LOONGARCH_GPR_T1, ADDR_IMM(val, LU12IW));
>         emit_lu32id(lu32id, LOONGARCH_GPR_T1, ADDR_IMM(val, LU32ID));
>         emit_lu52id(lu52id, LOONGARCH_GPR_T1, LOONGARCH_GPR_T1,
> ADDR_IMM(val, LU52ID));
>         emit_jirl(jirl, LOONGARCH_GPR_T1, 0, (val & 0xfff) >> 2);
>
>         return (struct plt_entry) { *lu12iw, *lu32id, *lu52id, *jirl };
> }
>
> Thanks,
> Youling

Hi Youling,

Yes, this is the benefit of defining the instructions in inst.h,
but those changes are not closely related to this patch series.
I think we can do them after this series is merged.

Thanks,
Tiezhu


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next v1 3/4] LoongArch: Add BPF JIT support
  2022-08-20 11:50 ` [PATCH bpf-next v1 3/4] LoongArch: Add BPF JIT support Tiezhu Yang
  2022-08-20 13:41   ` kernel test robot
  2022-08-22  1:58   ` Youling Tang
@ 2022-08-22  2:50   ` Jinyang He
  2022-08-25  2:27     ` Tiezhu Yang
  2 siblings, 1 reply; 15+ messages in thread
From: Jinyang He @ 2022-08-22  2:50 UTC (permalink / raw)
  To: Tiezhu Yang, Huacai Chen, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko
  Cc: bpf, loongarch

Hi, Tiezhu,

On 2022/8/20 19:50, Tiezhu Yang wrote:
> BPF programs are normally handled by the BPF interpreter. Add BPF JIT
> support for LoongArch so that the kernel can generate native code
> when a program is loaded into the kernel; this significantly
> speeds up the processing of BPF programs.
>
> Co-developed-by: Youling Tang <tangyouling@loongson.cn>
> Signed-off-by: Youling Tang <tangyouling@loongson.cn>
> Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
> ---
>   arch/loongarch/Kbuild             |    1 +
>   arch/loongarch/Kconfig            |    1 +
>   arch/loongarch/include/asm/inst.h |  185 ++++++
>   arch/loongarch/net/Makefile       |    7 +
>   arch/loongarch/net/bpf_jit.c      | 1113 +++++++++++++++++++++++++++++++++++++
>   arch/loongarch/net/bpf_jit.h      |  308 ++++++++++
>   6 files changed, 1615 insertions(+)
>   create mode 100644 arch/loongarch/net/Makefile
>   create mode 100644 arch/loongarch/net/bpf_jit.c
>   create mode 100644 arch/loongarch/net/bpf_jit.h
[...]
> +
> +/*
> + * eBPF prog stack layout:
> + *
> + *                                        high
> + * original $sp ------------> +-------------------------+ <--LOONGARCH_GPR_FP
> + *                            |           $ra           |
> + *                            +-------------------------+
> + *                            |           $fp           |
> + *                            +-------------------------+
> + *                            |           $s0           |
> + *                            +-------------------------+
> + *                            |           $s1           |
> + *                            +-------------------------+
> + *                            |           $s2           |
> + *                            +-------------------------+
> + *                            |           $s3           |
> + *                            +-------------------------+
> + *                            |           $s4           |
> + *                            +-------------------------+
> + *                            |           $s5           |
> + *                            +-------------------------+ <--BPF_REG_FP
> + *                            |  prog->aux->stack_depth |
> + *                            |        (optional)       |
> + * current $sp -------------> +-------------------------+
> + *                                        low
> + */
> +static void build_prologue(struct jit_ctx *ctx)
> +{
> +	int stack_adjust = 0, store_offset, bpf_stack_adjust;
> +
> +	bpf_stack_adjust = round_up(ctx->prog->aux->stack_depth, 16);
> +
> +	/* To store ra, fp, s0, s1, s2, s3, s4 and s5. */
> +	stack_adjust += sizeof(long) * 8;
> +
> +	stack_adjust = round_up(stack_adjust, 16);
> +	stack_adjust += bpf_stack_adjust;
> +
> +	/*
> +	 * First instruction initializes the tail call count (TCC).
> +	 * On tail call we skip this instruction, and the TCC is
> +	 * passed in REG_TCC from the caller.
> +	 */
> +	emit_insn(ctx, addid, REG_TCC, LOONGARCH_GPR_ZERO, MAX_TAIL_CALL_CNT);
> +
> +	emit_insn(ctx, addid, LOONGARCH_GPR_SP, LOONGARCH_GPR_SP, -stack_adjust);

Have you checked the stack size before this, such as in the compiler
or common code? Is there any chance of it overflowing the 12-bit range?

> +
> +	store_offset = stack_adjust - sizeof(long);
> +	emit_insn(ctx, std, LOONGARCH_GPR_RA, LOONGARCH_GPR_SP, store_offset);
> +
> +	store_offset -= sizeof(long);
> +	emit_insn(ctx, std, LOONGARCH_GPR_FP, LOONGARCH_GPR_SP, store_offset);
> +
> +	store_offset -= sizeof(long);
> +	emit_insn(ctx, std, LOONGARCH_GPR_S0, LOONGARCH_GPR_SP, store_offset);
> +
> +	store_offset -= sizeof(long);
> +	emit_insn(ctx, std, LOONGARCH_GPR_S1, LOONGARCH_GPR_SP, store_offset);
> +
> +	store_offset -= sizeof(long);
> +	emit_insn(ctx, std, LOONGARCH_GPR_S2, LOONGARCH_GPR_SP, store_offset);
> +
> +	store_offset -= sizeof(long);
> +	emit_insn(ctx, std, LOONGARCH_GPR_S3, LOONGARCH_GPR_SP, store_offset);
> +
> +	store_offset -= sizeof(long);
> +	emit_insn(ctx, std, LOONGARCH_GPR_S4, LOONGARCH_GPR_SP, store_offset);
> +
> +	store_offset -= sizeof(long);
> +	emit_insn(ctx, std, LOONGARCH_GPR_S5, LOONGARCH_GPR_SP, store_offset);
> +
> +	emit_insn(ctx, addid, LOONGARCH_GPR_FP, LOONGARCH_GPR_SP, stack_adjust);
> +
> +	if (bpf_stack_adjust)
> +		emit_insn(ctx, addid, regmap[BPF_REG_FP], LOONGARCH_GPR_SP, bpf_stack_adjust);
> +
> +	/*
> +	 * Program contains calls and tail calls, so REG_TCC need
> +	 * to be saved across calls.
> +	 */
> +	if (seen_tail_call(ctx) && seen_call(ctx))
> +		move_reg(ctx, TCC_SAVED, REG_TCC);
> +
> +	ctx->stack_size = stack_adjust;
> +}
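Regarding that 12-bit question, the frame-size arithmetic from build_prologue() is easy to model stand-alone (round_up is reproduced here in its usual power-of-two form):

```c
#include <assert.h>

#define round_up(x, a)	(((x) + (a) - 1) & ~((a) - 1))

/* Frame size from build_prologue(): space for ra, fp and s0-s5
 * (8 registers), rounded to 16 bytes, plus the BPF stack depth
 * (itself rounded to 16). */
static int demo_frame_size(unsigned int stack_depth)
{
	int bpf_stack_adjust = round_up(stack_depth, 16);
	int stack_adjust = sizeof(long) * 8;	/* ra, fp, s0..s5 */

	stack_adjust = round_up(stack_adjust, 16);
	return stack_adjust + bpf_stack_adjust;
}
```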
> +
[...]
> +
> +/* initialized on the first pass of build_body() */
> +static int out_offset = -1;
> +static int emit_bpf_tail_call(struct jit_ctx *ctx)
> +{
> +	int off;
> +	u8 tcc = tail_call_reg(ctx);
> +	u8 a1 = LOONGARCH_GPR_A1;
> +	u8 a2 = LOONGARCH_GPR_A2;
> +	u8 t1 = LOONGARCH_GPR_T1;
> +	u8 t2 = LOONGARCH_GPR_T2;
> +	u8 t3 = LOONGARCH_GPR_T3;
> +	const int idx0 = ctx->idx;
> +
> +#define cur_offset (ctx->idx - idx0)
> +#define jmp_offset (out_offset - (cur_offset))
> +
> +	/*
> +	 * a0: &ctx
> +	 * a1: &array
> +	 * a2: index
> +	 *
> +	 * if (index >= array->map.max_entries)
> +	 *	 goto out;
> +	 */
> +	off = offsetof(struct bpf_array, map.max_entries);
> +	emit_insn(ctx, ldwu, t1, a1, off);
> +	/* bgeu $a2, $t1, jmp_offset */
> +	emit_tailcall_jmp(ctx, BPF_JGE, a2, t1, jmp_offset);
> +
> +	/*
> +	 * if (--TCC < 0)
> +	 *	 goto out;
> +	 */
> +	emit_insn(ctx, addid, REG_TCC, tcc, -1);
> +	emit_tailcall_jmp(ctx, BPF_JSLT, REG_TCC, LOONGARCH_GPR_ZERO, jmp_offset);
> +
> +	/*
> +	 * prog = array->ptrs[index];
> +	 * if (!prog)
> +	 *	 goto out;
> +	 */
> +	emit_insn(ctx, sllid, t2, a2, 3);
> +	emit_insn(ctx, addd, t2, t2, a1);
alsl.d could combine the shift and the add here.
> +	off = offsetof(struct bpf_array, ptrs);
> +	emit_insn(ctx, ldd, t2, t2, off);
> +	/* beq $t2, $zero, jmp_offset */
> +	emit_tailcall_jmp(ctx, BPF_JEQ, t2, LOONGARCH_GPR_ZERO, jmp_offset);
> +
> +	/* goto *(prog->bpf_func + 4); */
> +	off = offsetof(struct bpf_prog, bpf_func);
> +	emit_insn(ctx, ldd, t3, t2, off);
> +	__build_epilogue(ctx, true);
> +
> +	/* out: */
> +	if (out_offset == -1)
> +		out_offset = cur_offset;
> +	if (cur_offset != out_offset) {
> +		pr_err_once("tail_call out_offset = %d, expected %d!\n",
> +			    cur_offset, out_offset);
> +		return -1;
> +	}
> +
> +	return 0;
> +#undef cur_offset
> +#undef jmp_offset
> +}
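For readers following the control flow, the three bail-out conditions of the tail call can be modeled in plain C (a pure sketch of the logic, unrelated to the emitted instructions):

```c
#include <assert.h>

/* The checks emit_bpf_tail_call() performs before jumping: bail out
 * when the index is out of range, the tail-call budget (TCC) is
 * exhausted, or the array slot is empty. */
static int demo_tail_call(unsigned int index, unsigned int max_entries,
			  int tcc, const void *prog)
{
	if (index >= max_entries)
		return 0;	/* index out of range: fall through */
	if (tcc - 1 < 0)
		return 0;	/* tail-call budget exhausted */
	if (!prog)
		return 0;	/* empty slot */
	return 1;		/* would jump to prog->bpf_func + 4 */
}
```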
> +
[...]
> +
> +	/* dst = BSWAP##imm(dst) */
> +	case BPF_ALU | BPF_END | BPF_FROM_LE:
> +		switch (imm) {
> +		case 16:
> +			/* zero-extend 16 bits into 64 bits */
> +			emit_insn(ctx, sllid, dst, dst, 48);
> +			emit_insn(ctx, srlid, dst, dst, 48);
bstrpick.d can do this zero-extension in a single instruction.
> +			break;
> +		case 32:
> +			/* zero-extend 32 bits into 64 bits */
> +			emit_zext_32(ctx, dst, is32);
> +			break;
> +		case 64:
> +			/* do nothing */
> +			break;
> +		}
> +		break;
> +	case BPF_ALU | BPF_END | BPF_FROM_BE:
> +		switch (imm) {
> +		case 16:
> +			emit_insn(ctx, revb2h, dst, dst);
> +			/* zero-extend 16 bits into 64 bits */
> +			emit_insn(ctx, sllid, dst, dst, 48);
> +			emit_insn(ctx, srlid, dst, dst, 48);
> +			break;
> +		case 32:
> +			emit_insn(ctx, revb2w, dst, dst);
> +			/* zero-extend 32 bits into 64 bits */
> +			emit_zext_32(ctx, dst, is32);
> +			break;
> +		case 64:
> +			emit_insn(ctx, revbd, dst, dst);
> +			break;
> +		}
> +		break;
> +
> +	/* PC += off if dst cond src */
> +	case BPF_JMP | BPF_JEQ | BPF_X:
> +	case BPF_JMP | BPF_JNE | BPF_X:
> +	case BPF_JMP | BPF_JGT | BPF_X:
> +	case BPF_JMP | BPF_JGE | BPF_X:
> +	case BPF_JMP | BPF_JLT | BPF_X:
> +	case BPF_JMP | BPF_JLE | BPF_X:
> +	case BPF_JMP | BPF_JSGT | BPF_X:
> +	case BPF_JMP | BPF_JSGE | BPF_X:
> +	case BPF_JMP | BPF_JSLT | BPF_X:
> +	case BPF_JMP | BPF_JSLE | BPF_X:
> +	case BPF_JMP32 | BPF_JEQ | BPF_X:
> +	case BPF_JMP32 | BPF_JNE | BPF_X:
> +	case BPF_JMP32 | BPF_JGT | BPF_X:
> +	case BPF_JMP32 | BPF_JGE | BPF_X:
> +	case BPF_JMP32 | BPF_JLT | BPF_X:
> +	case BPF_JMP32 | BPF_JLE | BPF_X:
> +	case BPF_JMP32 | BPF_JSGT | BPF_X:
> +	case BPF_JMP32 | BPF_JSGE | BPF_X:
> +	case BPF_JMP32 | BPF_JSLT | BPF_X:
> +	case BPF_JMP32 | BPF_JSLE | BPF_X:
> +		jmp_offset = bpf2la_offset(i, off, ctx);
> +		move_reg(ctx, t1, dst);
> +		move_reg(ctx, t2, src);
> +		if (is_signed_bpf_cond(BPF_OP(code))) {
> +			emit_sext_32(ctx, t1, is32);
> +			emit_sext_32(ctx, t2, is32);
> +		} else {
> +			emit_zext_32(ctx, t1, is32);
> +			emit_zext_32(ctx, t2, is32);
> +		}
> +		emit_cond_jmp(ctx, cond, t1, t2, jmp_offset);
> +		break;
> +
> +	/* PC += off if dst cond imm */
> +	case BPF_JMP | BPF_JEQ | BPF_K:
> +	case BPF_JMP | BPF_JNE | BPF_K:
> +	case BPF_JMP | BPF_JGT | BPF_K:
> +	case BPF_JMP | BPF_JGE | BPF_K:
> +	case BPF_JMP | BPF_JLT | BPF_K:
> +	case BPF_JMP | BPF_JLE | BPF_K:
> +	case BPF_JMP | BPF_JSGT | BPF_K:
> +	case BPF_JMP | BPF_JSGE | BPF_K:
> +	case BPF_JMP | BPF_JSLT | BPF_K:
> +	case BPF_JMP | BPF_JSLE | BPF_K:
> +	case BPF_JMP32 | BPF_JEQ | BPF_K:
> +	case BPF_JMP32 | BPF_JNE | BPF_K:
> +	case BPF_JMP32 | BPF_JGT | BPF_K:
> +	case BPF_JMP32 | BPF_JGE | BPF_K:
> +	case BPF_JMP32 | BPF_JLT | BPF_K:
> +	case BPF_JMP32 | BPF_JLE | BPF_K:
> +	case BPF_JMP32 | BPF_JSGT | BPF_K:
> +	case BPF_JMP32 | BPF_JSGE | BPF_K:
> +	case BPF_JMP32 | BPF_JSLT | BPF_K:
> +	case BPF_JMP32 | BPF_JSLE | BPF_K:
> +		jmp_offset = bpf2la_offset(i, off, ctx);
> +		move_imm32(ctx, t1, imm, false);
When imm == 0, $zero could be used directly instead of moving it into t1.
> +		move_reg(ctx, t2, dst);
> +		if (is_signed_bpf_cond(BPF_OP(code))) {
> +			emit_sext_32(ctx, t1, is32);
> +			emit_sext_32(ctx, t2, is32);
> +		} else {
> +			emit_zext_32(ctx, t1, is32);
> +			emit_zext_32(ctx, t2, is32);
> +		}
> +		emit_cond_jmp(ctx, cond, t2, t1, jmp_offset);
> +		break;
> +
> +	/* PC += off if dst & src */
> +	case BPF_JMP | BPF_JSET | BPF_X:
> +	case BPF_JMP32 | BPF_JSET | BPF_X:
> +		jmp_offset = bpf2la_offset(i, off, ctx);
> +		emit_insn(ctx, and, t1, dst, src);
> +		emit_zext_32(ctx, t1, is32);
> +		emit_cond_jmp(ctx, cond, t1, LOONGARCH_GPR_ZERO, jmp_offset);
> +		break;
> +	/* PC += off if dst & imm */
> +	case BPF_JMP | BPF_JSET | BPF_K:
> +	case BPF_JMP32 | BPF_JSET | BPF_K:
> +		jmp_offset = bpf2la_offset(i, off, ctx);
> +		move_imm32(ctx, t1, imm, is32);
> +		emit_insn(ctx, and, t1, dst, t1);
> +		emit_zext_32(ctx, t1, is32);
> +		emit_cond_jmp(ctx, cond, t1, LOONGARCH_GPR_ZERO, jmp_offset);
> +		break;
> +
> +	/* PC += off */
> +	case BPF_JMP | BPF_JA:
> +		jmp_offset = bpf2la_offset(i, off, ctx);
> +		emit_uncond_jmp(ctx, jmp_offset, is32);
> +		break;
> +
> +	/* function call */
> +	case BPF_JMP | BPF_CALL:
> +		bool func_addr_fixed;
> +		u64 func_addr;
> +		int ret;
> +
> +		mark_call(ctx);
> +		ret = bpf_jit_get_func_addr(ctx->prog, insn, extra_pass,
> +					    &func_addr, &func_addr_fixed);
> +		if (ret < 0)
> +			return ret;
> +
> +		move_imm64(ctx, t1, func_addr, is32);
> +		emit_insn(ctx, jirl, t1, LOONGARCH_GPR_RA, 0);
> +		move_reg(ctx, regmap[BPF_REG_0], LOONGARCH_GPR_A0);
> +		break;
> +
> +	/* tail call */
> +	case BPF_JMP | BPF_TAIL_CALL:
> +		mark_tail_call(ctx);
> +		if (emit_bpf_tail_call(ctx))
> +			return -EINVAL;
> +		break;
> +
> +	/* function return */
> +	case BPF_JMP | BPF_EXIT:
> +		emit_sext_32(ctx, regmap[BPF_REG_0], true);
> +
> +		if (i == ctx->prog->len - 1)
> +			break;
> +
> +		jmp_offset = epilogue_offset(ctx);
> +		emit_uncond_jmp(ctx, jmp_offset, true);
> +		break;
> +
> +	/* dst = imm64 */
> +	case BPF_LD | BPF_IMM | BPF_DW:
> +		u64 imm64 = (u64)(insn + 1)->imm << 32 | (u32)insn->imm;
> +
> +		move_imm64(ctx, dst, imm64, is32);
> +		return 1;
> +
> +	/* dst = *(size *)(src + off) */
> +	case BPF_LDX | BPF_MEM | BPF_B:
> +	case BPF_LDX | BPF_MEM | BPF_H:
> +	case BPF_LDX | BPF_MEM | BPF_W:
> +	case BPF_LDX | BPF_MEM | BPF_DW:
> +		if (is_signed_imm12(off)) {
> +			switch (BPF_SIZE(code)) {
> +			case BPF_B:
> +				emit_insn(ctx, ldbu, dst, src, off);
> +				break;
> +			case BPF_H:
> +				emit_insn(ctx, ldhu, dst, src, off);
> +				break;
> +			case BPF_W:
> +				emit_insn(ctx, ldwu, dst, src, off);
> +				break;
> +			case BPF_DW:
> +				emit_insn(ctx, ldd, dst, src, off);
> +				break;
> +			}
> +		} else {
> +			move_imm32(ctx, t1, off, is32);
> +			switch (BPF_SIZE(code)) {
> +			case BPF_B:
> +				emit_insn(ctx, ldxbu, dst, src, t1);
> +				break;
> +			case BPF_H:
> +				emit_insn(ctx, ldxhu, dst, src, t1);
> +				break;
> +			case BPF_W:
> +				emit_insn(ctx, ldxwu, dst, src, t1);
> +				break;
> +			case BPF_DW:
> +				emit_insn(ctx, ldxd, dst, src, t1);
> +				break;

In the BPF_W and BPF_DW cases, if the offset is a multiple of 4 and
within the signed 16-bit range, we can use [ld/st]ptr.[w/d].
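Concretely, the constraint the [ld/st]ptr forms impose is that the si14
field is left-shifted by 2 before use. A minimal sketch of the
eligibility check (`is_signed_imm14_sh2` is a made-up helper name for
illustration, not code from this patch):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: can a byte offset be encoded in the si14 field
 * of ldptr.w/ldptr.d/stptr.w/stptr.d?  The hardware shifts si14 left
 * by 2, so the offset must be 4-byte aligned and lie in the range
 * [-32768, 32764]. */
static bool is_signed_imm14_sh2(int off)
{
	/* must be a multiple of 4 */
	if (off & 0x3)
		return false;
	/* si14 << 2 covers [-32768, 32764] */
	return off >= -32768 && off <= 32764;
}
```

Offsets that pass this check could take the one-instruction ldptr/stptr
path; everything else would keep the existing imm12 or register-indexed
sequences.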
> +			}
> +		}
> +		break;
> +
> +	/* *(size *)(dst + off) = imm */
> +	case BPF_ST | BPF_MEM | BPF_B:
> +	case BPF_ST | BPF_MEM | BPF_H:
> +	case BPF_ST | BPF_MEM | BPF_W:
> +	case BPF_ST | BPF_MEM | BPF_DW:
> +		move_imm32(ctx, t1, imm, is32);
> +		if (is_signed_imm12(off)) {
> +			switch (BPF_SIZE(code)) {
> +			case BPF_B:
> +				emit_insn(ctx, stb, t1, dst, off);
> +				break;
> +			case BPF_H:
> +				emit_insn(ctx, sth, t1, dst, off);
> +				break;
> +			case BPF_W:
> +				emit_insn(ctx, stw, t1, dst, off);
> +				break;
> +			case BPF_DW:
> +				emit_insn(ctx, std, t1, dst, off);
> +				break;
> +			}
> +		} else {
> +			move_imm32(ctx, t2, off, is32);
> +			switch (BPF_SIZE(code)) {
> +			case BPF_B:
> +				emit_insn(ctx, stxb, t1, dst, t2);
> +				break;
> +			case BPF_H:
> +				emit_insn(ctx, stxh, t1, dst, t2);
> +				break;
> +			case BPF_W:
> +				emit_insn(ctx, stxw, t1, dst, t2);
> +				break;
> +			case BPF_DW:
> +				emit_insn(ctx, stxd, t1, dst, t2);
> +				break;
> +			}
> +		}
> +		break;
> +
> +	/* *(size *)(dst + off) = src */
> +	case BPF_STX | BPF_MEM | BPF_B:
> +	case BPF_STX | BPF_MEM | BPF_H:
> +	case BPF_STX | BPF_MEM | BPF_W:
> +	case BPF_STX | BPF_MEM | BPF_DW:
> +		if (is_signed_imm12(off)) {
> +			switch (BPF_SIZE(code)) {
> +			case BPF_B:
> +				emit_insn(ctx, stb, src, dst, off);
> +				break;
> +			case BPF_H:
> +				emit_insn(ctx, sth, src, dst, off);
> +				break;
> +			case BPF_W:
> +				emit_insn(ctx, stw, src, dst, off);
> +				break;
> +			case BPF_DW:
> +				emit_insn(ctx, std, src, dst, off);
> +				break;
> +			}
> +		} else {
> +			move_imm32(ctx, t1, off, is32);
> +			switch (BPF_SIZE(code)) {
> +			case BPF_B:
> +				emit_insn(ctx, stxb, src, dst, t1);
> +				break;
> +			case BPF_H:
> +				emit_insn(ctx, stxh, src, dst, t1);
> +				break;
> +			case BPF_W:
> +				emit_insn(ctx, stxw, src, dst, t1);
> +				break;
> +			case BPF_DW:
> +				emit_insn(ctx, stxd, src, dst, t1);
> +				break;
> +			}
> +		}
> +		break;
> +
> +	case BPF_STX | BPF_ATOMIC | BPF_W:
> +	case BPF_STX | BPF_ATOMIC | BPF_DW:
> +		emit_atomic(insn, ctx);
> +		break;
> +
> +	default:
> +		pr_err("bpf_jit: unknown opcode %02x\n", code);
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
[...]
> +
> +static inline void move_imm64(struct jit_ctx *ctx, enum loongarch_gpr rd,
> +			      long imm64, bool is32)
> +{
> +	int imm32, si20, si12;
> +	long imm52;
> +
> +	si12 = (imm64 >> 52) & 0xfff;
> +	imm52 = imm64 & 0xfffffffffffff;
> +	/* lu52id rd, $zero, imm_63_52(signed) */
> +	if (si12 != 0 && imm52 == 0) {
> +		emit_insn(ctx, lu52id, rd, LOONGARCH_GPR_ZERO, si12);
> +		return;
> +	}
> +
> +	imm32 = imm64 & 0xffffffff;
> +	move_imm32(ctx, rd, imm32, is32);
> +
> +	if (!is_signed_imm32(imm64)) {
> +		if (imm52 != 0) {
> +			/* lu32id rd, imm_51_32(signed) */
> +			si20 = (imm64 >> 32) & 0xfffff;
> +			emit_insn(ctx, lu32id, rd, si20);
Imm32 is sign-extended, so the lu32i.d can be optimized out in some cases.
> +		}
> +
> +		/* lu52id rd, rd, imm_63_52(signed) */
> +		if (!is_signed_imm52(imm64))
> +			emit_insn(ctx, lu52id, rd, rd, si12);
> +	}
> +}
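The field split move_imm64() relies on can be sanity-checked in
isolation. A small sketch (`rebuild_imm64` is an illustrative name, not
part of the patch) that extracts the lu52i.d, lu32i.d and low-32-bit
pieces and confirms the decomposition is lossless:

```c
#include <assert.h>
#include <stdint.h>

/* Illustration only: split a 64-bit immediate into the fields that
 * lu52i.d (bits 63:52) and lu32i.d (bits 51:32) would load, then
 * recombine them with the low 32 bits (lu12i.w/ori territory) to show
 * no bits are lost in the decomposition. */
static uint64_t rebuild_imm64(uint64_t imm64)
{
	uint32_t si12 = (imm64 >> 52) & 0xfff;   /* lu52i.d field */
	uint32_t si20 = (imm64 >> 32) & 0xfffff; /* lu32i.d field */
	uint32_t lo32 = imm64 & 0xffffffff;      /* low 32 bits   */

	return ((uint64_t)si12 << 52) | ((uint64_t)si20 << 32) | lo32;
}
```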
> +
> +static inline void move_reg(struct jit_ctx *ctx, enum loongarch_gpr rd,
> +			    enum loongarch_gpr rj)
> +{
> +	emit_insn(ctx, or, rd, rj, LOONGARCH_GPR_ZERO);
> +}
> +
[...]
> +
> +static inline void emit_cond_jmp(struct jit_ctx *ctx, u8 cond, enum loongarch_gpr rj,
> +				 enum loongarch_gpr rd, int jmp_offset)
> +{
> +	cond_jmp_offs26(ctx, cond, rj, rd, jmp_offset);

Why not prefer cond_jmp_offs16 when the offset fits in 16 bits?


Thanks,

Jinyang

> +}
> +
> +static inline void emit_uncond_jmp(struct jit_ctx *ctx, int jmp_offset, bool is_exit)
> +{
> +	if (is_signed_imm26(jmp_offset))
> +		uncond_jmp_offs26(ctx, jmp_offset);
> +	else
> +		uncond_jmp_offs32(ctx, jmp_offset, is_exit);
> +}
> +
> +static inline void emit_tailcall_jmp(struct jit_ctx *ctx, u8 cond, enum loongarch_gpr rj,
> +				     enum loongarch_gpr rd, int jmp_offset)
> +{
> +	if (is_signed_imm16(jmp_offset))
> +		cond_jmp_offs16(ctx, cond, rj, rd, jmp_offset);
> +	else if (is_signed_imm26(jmp_offset))
> +		cond_jmp_offs26(ctx, cond, rj, rd, jmp_offset - 1);
> +	else
> +		cond_jmp_offs32(ctx, cond, rj, rd, jmp_offset - 2);
> +}



* Re: [PATCH bpf-next v1 0/4] Add BPF JIT support for LoongArch
  2022-08-22  1:36 ` [PATCH bpf-next v1 0/4] Add BPF JIT support for LoongArch Tiezhu Yang
@ 2022-08-23  0:46   ` Alexei Starovoitov
  2022-08-23 12:08     ` Huacai Chen
  0 siblings, 1 reply; 15+ messages in thread
From: Alexei Starovoitov @ 2022-08-23  0:46 UTC (permalink / raw)
  To: Tiezhu Yang
  Cc: Huacai Chen, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, bpf, loongarch

On Sun, Aug 21, 2022 at 6:36 PM Tiezhu Yang <yangtiezhu@loongson.cn> wrote:
>
>
>
> On 08/20/2022 07:50 PM, Tiezhu Yang wrote:
> > The basic support for LoongArch was merged into the upstream Linux
> > kernel in 5.19-rc1 on June 5, 2022; this patch series adds BPF JIT
> > support for LoongArch.
> >
> > Here is the LoongArch documentation:
> > https://www.kernel.org/doc/html/latest/loongarch/index.html
> >
> > With this patch series, the test cases in lib/test_bpf.ko have passed
> > on LoongArch.
> >
> >   # echo 1 > /proc/sys/net/core/bpf_jit_enable
> >   # modprobe test_bpf
> >   # dmesg | grep Summary
> >   test_bpf: Summary: 1026 PASSED, 0 FAILED, [1014/1014 JIT'ed]
> >   test_bpf: test_tail_calls: Summary: 10 PASSED, 0 FAILED, [10/10 JIT'ed]
> >   test_bpf: test_skb_segment: Summary: 2 PASSED, 0 FAILED
> >
> > It seems that this patch series cannot be applied cleanly to
> > bpf-next, which is not yet synced to v6.0-rc1.
>
>
> Hi Alexei, Daniel, Andrii,
>
> Do you know which tree this patch series will go through?
> bpf-next or loongarch-next?

Whichever way is easier.
Looks like all changes are contained within arch/loongarch,
so there should be no conflicts with generic JIT infra.
In that sense it's fine to carry it in loongarch-next.
We can take it through bpf-next too with arch maintainers acks.


* Re: [PATCH bpf-next v1 0/4] Add BPF JIT support for LoongArch
  2022-08-23  0:46   ` Alexei Starovoitov
@ 2022-08-23 12:08     ` Huacai Chen
  0 siblings, 0 replies; 15+ messages in thread
From: Huacai Chen @ 2022-08-23 12:08 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Tiezhu Yang, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, bpf, loongarch

Hi, all,

On Tue, Aug 23, 2022 at 8:46 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Sun, Aug 21, 2022 at 6:36 PM Tiezhu Yang <yangtiezhu@loongson.cn> wrote:
> >
> >
> >
> > On 08/20/2022 07:50 PM, Tiezhu Yang wrote:
> > > The basic support for LoongArch was merged into the upstream Linux
> > > kernel in 5.19-rc1 on June 5, 2022; this patch series adds BPF JIT
> > > support for LoongArch.
> > >
> > > Here is the LoongArch documentation:
> > > https://www.kernel.org/doc/html/latest/loongarch/index.html
> > >
> > > With this patch series, the test cases in lib/test_bpf.ko have passed
> > > on LoongArch.
> > >
> > >   # echo 1 > /proc/sys/net/core/bpf_jit_enable
> > >   # modprobe test_bpf
> > >   # dmesg | grep Summary
> > >   test_bpf: Summary: 1026 PASSED, 0 FAILED, [1014/1014 JIT'ed]
> > >   test_bpf: test_tail_calls: Summary: 10 PASSED, 0 FAILED, [10/10 JIT'ed]
> > >   test_bpf: test_skb_segment: Summary: 2 PASSED, 0 FAILED
> > >
> > > It seems that this patch series cannot be applied cleanly to
> > > bpf-next, which is not yet synced to v6.0-rc1.
> >
> >
> > Hi Alexei, Daniel, Andrii,
> >
> > Do you know which tree this patch series will go through?
> > bpf-next or loongarch-next?
>
> Whichever way is easier.
> Looks like all changes are contained within arch/loongarch,
> so there should be no conflicts with generic JIT infra.
> In that sense it's fine to carry it in loongarch-next.
> We can take it through bpf-next too with arch maintainers acks.
OK, both ways look good to me.

Huacai


* Re: [PATCH bpf-next v1 3/4] LoongArch: Add BPF JIT support
  2022-08-22  2:50   ` Jinyang He
@ 2022-08-25  2:27     ` Tiezhu Yang
  0 siblings, 0 replies; 15+ messages in thread
From: Tiezhu Yang @ 2022-08-25  2:27 UTC (permalink / raw)
  To: Jinyang He, Huacai Chen, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko
  Cc: bpf, loongarch

Hi Jinyang,

Thank you very much for your review.

On 08/22/2022 10:50 AM, Jinyang He wrote:
> Hi, Tiezhu,
>

[...]

>> +    emit_insn(ctx, addid, LOONGARCH_GPR_SP, LOONGARCH_GPR_SP,
>> -stack_adjust);
>
> Have you checked the stack size before this, such as in the compiler
> or common code? Is there any chance of overflowing the 12-bit range?
>

MAX_BPF_STACK is 512, so a BPF program can access at most 512 bytes of
stack space; the resulting adjustment always fits in the signed 12-bit
immediate of addi.d, so it is OK here.
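As a quick sanity check, the range involved can be modeled directly
(a sketch; the helper name is made up, and the 8 * 8-byte figure for
the callee-saved area is taken from this series' build_prologue()):

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of the range check in question: addi.d takes a signed 12-bit
 * immediate, so -stack_adjust must lie in [-2048, 2047].  With
 * MAX_BPF_STACK = 512 plus the callee-saved area (8 registers * 8
 * bytes in this series), the prologue adjustment stays well inside
 * that range. */
static bool stack_adjust_fits_si12(int stack_adjust)
{
	return -stack_adjust >= -2048 && -stack_adjust <= 2047;
}
```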

>> +    emit_insn(ctx, sllid, t2, a2, 3);
>> +    emit_insn(ctx, addd, t2, t2, a1);
> alsl.d

[...]

>> +            /* zero-extend 16 bits into 64 bits */
>> +            emit_insn(ctx, sllid, dst, dst, 48);
>> +            emit_insn(ctx, srlid, dst, dst, 48);
> bstrpick.d
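Both sequences compute the same zero-extension of the low 16 bits, so
the single bstrpick.d is a pure saving; modeled in C for illustration
(function names are made up for the example):

```c
#include <assert.h>
#include <stdint.h>

/* The two-instruction sequence (slli.d by 48 then srli.d by 48) and a
 * single "bstrpick.d dst, dst, 15, 0" both reduce to masking the low
 * 16 bits. */
static uint64_t zext16_shift(uint64_t x)
{
	return (x << 48) >> 48;   /* slli.d + srli.d */
}

static uint64_t zext16_bstrpick(uint64_t x)
{
	return x & 0xffff;        /* bstrpick.d rd, rj, 15, 0 */
}
```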

[...]

>> +        move_imm32(ctx, t1, imm, false);
> If imm is 0, t1 can simply be replaced with the zero register.

[...]

> In the BPF_W and BPF_DW cases, if the offset is a multiple of 4 and
> within the signed 16-bit range, we can use [ld/st]ptr.[w/d].

Good catch, I will modify the related code to save some instructions.

>> +static inline void emit_cond_jmp(struct jit_ctx *ctx, u8 cond, enum
>> loongarch_gpr rj,
>> +                 enum loongarch_gpr rd, int jmp_offset)
>> +{
>> +    cond_jmp_offs26(ctx, cond, rj, rd, jmp_offset);
>
> Why not prefer cond_jmp_offs16 when the offset fits in 16 bits?
>

Good question, this is intended to handle some special test cases.

A large PC-relative jump offset may overflow the immediate field of
the native conditional branch instruction, triggering a conversion
to an absolute jump instead, and that jump sequence is particularly
nasty. For now, let us use cond_jmp_offs26() directly to keep it
simple. In the future, maybe we can add support for far branching:
the branch relaxation requires more than two passes to converge,
and the code seems too complex to understand, so I am not quite sure
whether it is necessary and worth the extra pain. Anyway, just leave
it as is to enhance code readability for now.

I will add code comments to explain the above considerations.
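For reference, the overflow conversion described above is usually done
by inverting the condition so a short branch hops over an unconditional
long jump. A sketch of that idea (the enum values and invert_cond()
mapping are made up for illustration, not this patch's actual code):

```c
#include <assert.h>

/* When "bCC rj, rd, off" cannot encode off, emit instead:
 *
 *     bCC' rj, rd, +2      # inverted condition, skip the long jump
 *     pcaddu18i t1, off_hi
 *     jirl zero, t1, off_lo
 *
 * The only non-trivial part is the condition inversion: */
enum cond { COND_EQ, COND_NE, COND_GT, COND_LE, COND_GE, COND_LT };

static enum cond invert_cond(enum cond c)
{
	switch (c) {
	case COND_EQ: return COND_NE;
	case COND_NE: return COND_EQ;
	case COND_GT: return COND_LE;
	case COND_LE: return COND_GT;
	case COND_GE: return COND_LT;
	case COND_LT: return COND_GE;
	}
	return c;
}
```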

Thanks,
Tiezhu



end of thread, other threads:[~2022-08-25  2:27 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-20 11:50 [PATCH bpf-next v1 0/4] Add BPF JIT support for LoongArch Tiezhu Yang
2022-08-20 11:50 ` [PATCH bpf-next v1 1/4] LoongArch: Move {signed,unsigned}_imm_check() to inst.h Tiezhu Yang
2022-08-20 11:50 ` [PATCH bpf-next v1 2/4] LoongArch: Add some instruction opcodes and formats Tiezhu Yang
2022-08-20 11:50 ` [PATCH bpf-next v1 3/4] LoongArch: Add BPF JIT support Tiezhu Yang
2022-08-20 13:41   ` kernel test robot
2022-08-22  1:33     ` Tiezhu Yang
2022-08-22  1:58   ` Youling Tang
2022-08-22  2:03     ` Youling Tang
2022-08-22  2:49     ` Tiezhu Yang
2022-08-22  2:50   ` Jinyang He
2022-08-25  2:27     ` Tiezhu Yang
2022-08-20 11:51 ` [PATCH bpf-next v1 4/4] LoongArch: Enable BPF_JIT and TEST_BPF in default config Tiezhu Yang
2022-08-22  1:36 ` [PATCH bpf-next v1 0/4] Add BPF JIT support for LoongArch Tiezhu Yang
2022-08-23  0:46   ` Alexei Starovoitov
2022-08-23 12:08     ` Huacai Chen
