All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/14] Initial BPF-JIT support for MIPS
@ 2014-04-08 11:47 ` Markos Chandras
  0 siblings, 0 replies; 53+ messages in thread
From: Markos Chandras @ 2014-04-08 11:47 UTC (permalink / raw)
  To: linux-mips; +Cc: Markos Chandras

Hi,

This adds support for BPF-JIT for MIPS. Tested on mips32 LE/BE and mips64 BE
with a few networking tools such as tcpdump and dhcp but not all opcodes have
been tested as far as I can tell. There are a few optimizations left to be made
(fastpath for load operations instead of calling the helper functions) but
these can be added later on. If someone has complex network setups in place and
would like to give it a try, that would be much appreciated.

This patchset is for the upstream-sfr/mips-for-linux-next tree

Markos Chandras (14):
  MIPS: uasm: Add u3u2u1 instruction builders
  MIPS: uasm: Add u2u1 instruction builders
  MIPS: uasm: Add sllv uasm instruction
  MIPS: uasm: Add srlv uasm instruction
  MIPS: uasm: Add divu uasm instruction
  MIPS: uasm: Add mfhi uasm instruction
  MIPS: uasm: Add jalr uasm instruction
  MIPS: uasm: Add sltiu uasm instruction
  MIPS: uasm: Add sltu uasm instruction
  MIPS: uasm: Add wsbh uasm instruction
  MIPS: uasm: Add lh uam instruction
  MIPS: uasm: Add mul uasm instruction
  MIPS: net: Add BPF JIT
  MIPS: Enable the BPF_JIT symbol for MIPS

 arch/mips/Kbuild                  |    1 +
 arch/mips/Kconfig                 |    1 +
 arch/mips/include/asm/uasm.h      |   16 +
 arch/mips/include/uapi/asm/inst.h |   17 +
 arch/mips/mm/uasm-micromips.c     |   10 +
 arch/mips/mm/uasm-mips.c          |   10 +
 arch/mips/mm/uasm.c               |   39 +-
 arch/mips/net/Makefile            |    3 +
 arch/mips/net/bpf_jit.c           | 1327 +++++++++++++++++++++++++++++++++++++
 arch/mips/net/bpf_jit.h           |   45 ++
 10 files changed, 1462 insertions(+), 7 deletions(-)
 create mode 100644 arch/mips/net/Makefile
 create mode 100644 arch/mips/net/bpf_jit.c
 create mode 100644 arch/mips/net/bpf_jit.h

-- 
1.9.1

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH 00/14] Initial BPF-JIT support for MIPS
@ 2014-04-08 11:47 ` Markos Chandras
  0 siblings, 0 replies; 53+ messages in thread
From: Markos Chandras @ 2014-04-08 11:47 UTC (permalink / raw)
  To: linux-mips; +Cc: Markos Chandras

Hi,

This adds support for BPF-JIT for MIPS. Tested on mips32 LE/BE and mips64 BE
with a few networking tools such as tcpdump and dhcp but not all opcodes have
been tested as far as I can tell. There are a few optimizations left to be made
(fastpath for load operations instead of calling the helper functions) but
these can be added later on. If someone has complex network setups in place and
would like to give it a try, that would be much appreciated.

This patchset is for the upstream-sfr/mips-for-linux-next tree

Markos Chandras (14):
  MIPS: uasm: Add u3u2u1 instruction builders
  MIPS: uasm: Add u2u1 instruction builders
  MIPS: uasm: Add sllv uasm instruction
  MIPS: uasm: Add srlv uasm instruction
  MIPS: uasm: Add divu uasm instruction
  MIPS: uasm: Add mfhi uasm instruction
  MIPS: uasm: Add jalr uasm instruction
  MIPS: uasm: Add sltiu uasm instruction
  MIPS: uasm: Add sltu uasm instruction
  MIPS: uasm: Add wsbh uasm instruction
  MIPS: uasm: Add lh uam instruction
  MIPS: uasm: Add mul uasm instruction
  MIPS: net: Add BPF JIT
  MIPS: Enable the BPF_JIT symbol for MIPS

 arch/mips/Kbuild                  |    1 +
 arch/mips/Kconfig                 |    1 +
 arch/mips/include/asm/uasm.h      |   16 +
 arch/mips/include/uapi/asm/inst.h |   17 +
 arch/mips/mm/uasm-micromips.c     |   10 +
 arch/mips/mm/uasm-mips.c          |   10 +
 arch/mips/mm/uasm.c               |   39 +-
 arch/mips/net/Makefile            |    3 +
 arch/mips/net/bpf_jit.c           | 1327 +++++++++++++++++++++++++++++++++++++
 arch/mips/net/bpf_jit.h           |   45 ++
 10 files changed, 1462 insertions(+), 7 deletions(-)
 create mode 100644 arch/mips/net/Makefile
 create mode 100644 arch/mips/net/bpf_jit.c
 create mode 100644 arch/mips/net/bpf_jit.h

-- 
1.9.1

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH 01/14] MIPS: uasm: Add u3u2u1 instruction builders
@ 2014-04-08 11:47   ` Markos Chandras
  0 siblings, 0 replies; 53+ messages in thread
From: Markos Chandras @ 2014-04-08 11:47 UTC (permalink / raw)
  To: linux-mips; +Cc: Markos Chandras

It will be used later on by the sllv and srlv instructions.

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
---
 arch/mips/include/asm/uasm.h | 3 +++
 arch/mips/mm/uasm.c          | 7 +++++++
 2 files changed, 10 insertions(+)

diff --git a/arch/mips/include/asm/uasm.h b/arch/mips/include/asm/uasm.h
index c33a956..c2f1f0b 100644
--- a/arch/mips/include/asm/uasm.h
+++ b/arch/mips/include/asm/uasm.h
@@ -55,6 +55,9 @@ void ISAOPC(op)(u32 **buf, unsigned int a, unsigned int b, unsigned int c)
 #define Ip_u2u1u3(op)							\
 void ISAOPC(op)(u32 **buf, unsigned int a, unsigned int b, unsigned int c)
 
+#define Ip_u3u2u1(op)							\
+void ISAOPC(op)(u32 **buf, unsigned int a, unsigned int b, unsigned int c)
+
 #define Ip_u3u1u2(op)							\
 void ISAOPC(op)(u32 **buf, unsigned int a, unsigned int b, unsigned int c)
 
diff --git a/arch/mips/mm/uasm.c b/arch/mips/mm/uasm.c
index b9d14b6..6210a0f 100644
--- a/arch/mips/mm/uasm.c
+++ b/arch/mips/mm/uasm.c
@@ -144,6 +144,13 @@ Ip_u2u1u3(op)						\
 }							\
 UASM_EXPORT_SYMBOL(uasm_i##op);
 
+#define I_u3u2u1(op)					\
+Ip_u3u2u1(op)						\
+{							\
+	build_insn(buf, insn##op, c, b, a);		\
+}							\
+UASM_EXPORT_SYMBOL(uasm_i##op);
+
 #define I_u3u1u2(op)					\
 Ip_u3u1u2(op)						\
 {							\
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 01/14] MIPS: uasm: Add u3u2u1 instruction builders
@ 2014-04-08 11:47   ` Markos Chandras
  0 siblings, 0 replies; 53+ messages in thread
From: Markos Chandras @ 2014-04-08 11:47 UTC (permalink / raw)
  To: linux-mips; +Cc: Markos Chandras

It will be used later on by the sllv and srlv instructions.

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
---
 arch/mips/include/asm/uasm.h | 3 +++
 arch/mips/mm/uasm.c          | 7 +++++++
 2 files changed, 10 insertions(+)

diff --git a/arch/mips/include/asm/uasm.h b/arch/mips/include/asm/uasm.h
index c33a956..c2f1f0b 100644
--- a/arch/mips/include/asm/uasm.h
+++ b/arch/mips/include/asm/uasm.h
@@ -55,6 +55,9 @@ void ISAOPC(op)(u32 **buf, unsigned int a, unsigned int b, unsigned int c)
 #define Ip_u2u1u3(op)							\
 void ISAOPC(op)(u32 **buf, unsigned int a, unsigned int b, unsigned int c)
 
+#define Ip_u3u2u1(op)							\
+void ISAOPC(op)(u32 **buf, unsigned int a, unsigned int b, unsigned int c)
+
 #define Ip_u3u1u2(op)							\
 void ISAOPC(op)(u32 **buf, unsigned int a, unsigned int b, unsigned int c)
 
diff --git a/arch/mips/mm/uasm.c b/arch/mips/mm/uasm.c
index b9d14b6..6210a0f 100644
--- a/arch/mips/mm/uasm.c
+++ b/arch/mips/mm/uasm.c
@@ -144,6 +144,13 @@ Ip_u2u1u3(op)						\
 }							\
 UASM_EXPORT_SYMBOL(uasm_i##op);
 
+#define I_u3u2u1(op)					\
+Ip_u3u2u1(op)						\
+{							\
+	build_insn(buf, insn##op, c, b, a);		\
+}							\
+UASM_EXPORT_SYMBOL(uasm_i##op);
+
 #define I_u3u1u2(op)					\
 Ip_u3u1u2(op)						\
 {							\
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 02/14] MIPS: uasm: Add u2u1 instruction builders
@ 2014-04-08 11:47   ` Markos Chandras
  0 siblings, 0 replies; 53+ messages in thread
From: Markos Chandras @ 2014-04-08 11:47 UTC (permalink / raw)
  To: linux-mips; +Cc: Markos Chandras

It will be used later one for the jalr and wsbh instructions.

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
---
 arch/mips/include/asm/uasm.h | 3 +++
 arch/mips/mm/uasm.c          | 7 +++++++
 2 files changed, 10 insertions(+)

diff --git a/arch/mips/include/asm/uasm.h b/arch/mips/include/asm/uasm.h
index c2f1f0b..449bef4 100644
--- a/arch/mips/include/asm/uasm.h
+++ b/arch/mips/include/asm/uasm.h
@@ -77,6 +77,9 @@ void ISAOPC(op)(u32 **buf, unsigned int a, unsigned int b, unsigned int c, \
 #define Ip_u1u2(op)							\
 void ISAOPC(op)(u32 **buf, unsigned int a, unsigned int b)
 
+#define Ip_u2u1(op)							\
+void ISAOPC(op)(u32 **buf, unsigned int a, unsigned int b)
+
 #define Ip_u1s2(op)							\
 void ISAOPC(op)(u32 **buf, unsigned int a, signed int b)
 
diff --git a/arch/mips/mm/uasm.c b/arch/mips/mm/uasm.c
index 6210a0f..eb82bb1 100644
--- a/arch/mips/mm/uasm.c
+++ b/arch/mips/mm/uasm.c
@@ -207,6 +207,13 @@ Ip_u1u2(op)						\
 }							\
 UASM_EXPORT_SYMBOL(uasm_i##op);
 
+#define I_u2u1(op)					\
+Ip_u2u1(op)						\
+{							\
+	build_insn(buf, insn##op, b, a);		\
+}							\
+UASM_EXPORT_SYMBOL(uasm_i##op);
+
 #define I_u1s2(op)					\
 Ip_u1s2(op)						\
 {							\
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 02/14] MIPS: uasm: Add u2u1 instruction builders
@ 2014-04-08 11:47   ` Markos Chandras
  0 siblings, 0 replies; 53+ messages in thread
From: Markos Chandras @ 2014-04-08 11:47 UTC (permalink / raw)
  To: linux-mips; +Cc: Markos Chandras

It will be used later one for the jalr and wsbh instructions.

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
---
 arch/mips/include/asm/uasm.h | 3 +++
 arch/mips/mm/uasm.c          | 7 +++++++
 2 files changed, 10 insertions(+)

diff --git a/arch/mips/include/asm/uasm.h b/arch/mips/include/asm/uasm.h
index c2f1f0b..449bef4 100644
--- a/arch/mips/include/asm/uasm.h
+++ b/arch/mips/include/asm/uasm.h
@@ -77,6 +77,9 @@ void ISAOPC(op)(u32 **buf, unsigned int a, unsigned int b, unsigned int c, \
 #define Ip_u1u2(op)							\
 void ISAOPC(op)(u32 **buf, unsigned int a, unsigned int b)
 
+#define Ip_u2u1(op)							\
+void ISAOPC(op)(u32 **buf, unsigned int a, unsigned int b)
+
 #define Ip_u1s2(op)							\
 void ISAOPC(op)(u32 **buf, unsigned int a, signed int b)
 
diff --git a/arch/mips/mm/uasm.c b/arch/mips/mm/uasm.c
index 6210a0f..eb82bb1 100644
--- a/arch/mips/mm/uasm.c
+++ b/arch/mips/mm/uasm.c
@@ -207,6 +207,13 @@ Ip_u1u2(op)						\
 }							\
 UASM_EXPORT_SYMBOL(uasm_i##op);
 
+#define I_u2u1(op)					\
+Ip_u2u1(op)						\
+{							\
+	build_insn(buf, insn##op, b, a);		\
+}							\
+UASM_EXPORT_SYMBOL(uasm_i##op);
+
 #define I_u1s2(op)					\
 Ip_u1s2(op)						\
 {							\
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 03/14] MIPS: uasm: Add sllv uasm instruction
@ 2014-04-08 11:47   ` Markos Chandras
  0 siblings, 0 replies; 53+ messages in thread
From: Markos Chandras @ 2014-04-08 11:47 UTC (permalink / raw)
  To: linux-mips; +Cc: Markos Chandras

It will be used later on by bpf-jit

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
---
 arch/mips/include/asm/uasm.h      | 1 +
 arch/mips/include/uapi/asm/inst.h | 1 +
 arch/mips/mm/uasm-micromips.c     | 1 +
 arch/mips/mm/uasm-mips.c          | 1 +
 arch/mips/mm/uasm.c               | 3 ++-
 5 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/mips/include/asm/uasm.h b/arch/mips/include/asm/uasm.h
index 449bef4..cda42cb 100644
--- a/arch/mips/include/asm/uasm.h
+++ b/arch/mips/include/asm/uasm.h
@@ -139,6 +139,7 @@ Ip_u2s3u1(_sc);
 Ip_u2s3u1(_scd);
 Ip_u2s3u1(_sd);
 Ip_u2u1u3(_sll);
+Ip_u3u2u1(_sllv);
 Ip_u2u1u3(_sra);
 Ip_u2u1u3(_srl);
 Ip_u3u1u2(_subu);
diff --git a/arch/mips/include/uapi/asm/inst.h b/arch/mips/include/uapi/asm/inst.h
index df6e775..513ccc0 100644
--- a/arch/mips/include/uapi/asm/inst.h
+++ b/arch/mips/include/uapi/asm/inst.h
@@ -244,6 +244,7 @@ enum mm_32i_minor_op {
 enum mm_32a_minor_op {
 	mm_sll32_op = 0x000,
 	mm_ins_op = 0x00c,
+	mm_sllv32_op = 0x010,
 	mm_ext_op = 0x02c,
 	mm_pool32axf_op = 0x03c,
 	mm_srl32_op = 0x040,
diff --git a/arch/mips/mm/uasm-micromips.c b/arch/mips/mm/uasm-micromips.c
index b8d580c..489b762 100644
--- a/arch/mips/mm/uasm-micromips.c
+++ b/arch/mips/mm/uasm-micromips.c
@@ -94,6 +94,7 @@ static struct insn insn_table_MM[] = {
 	{ insn_scd, 0, 0 },
 	{ insn_sd, 0, 0 },
 	{ insn_sll, M(mm_pool32a_op, 0, 0, 0, 0, mm_sll32_op), RT | RS | RD },
+	{ insn_sllv, M(mm_pool32a_op, 0, 0, 0, 0, mm_sllv32_op), RT | RS | RD },
 	{ insn_sra, M(mm_pool32a_op, 0, 0, 0, 0, mm_sra_op), RT | RS | RD },
 	{ insn_srl, M(mm_pool32a_op, 0, 0, 0, 0, mm_srl32_op), RT | RS | RD },
 	{ insn_rotr, M(mm_pool32a_op, 0, 0, 0, 0, mm_rotr_op), RT | RS | RD },
diff --git a/arch/mips/mm/uasm-mips.c b/arch/mips/mm/uasm-mips.c
index 3abd609..40ac140 100644
--- a/arch/mips/mm/uasm-mips.c
+++ b/arch/mips/mm/uasm-mips.c
@@ -102,6 +102,7 @@ static struct insn insn_table[] = {
 	{ insn_sc,  M(sc_op, 0, 0, 0, 0, 0),  RS | RT | SIMM },
 	{ insn_sd,  M(sd_op, 0, 0, 0, 0, 0),  RS | RT | SIMM },
 	{ insn_sll,  M(spec_op, 0, 0, 0, 0, sll_op),  RT | RD | RE },
+	{ insn_sllv,  M(spec_op, 0, 0, 0, 0, sllv_op),  RS | RT | RD },
 	{ insn_sra,  M(spec_op, 0, 0, 0, 0, sra_op),  RT | RD | RE },
 	{ insn_srl,  M(spec_op, 0, 0, 0, 0, srl_op),  RT | RD | RE },
 	{ insn_subu,  M(spec_op, 0, 0, 0, 0, subu_op),	RS | RT | RD },
diff --git a/arch/mips/mm/uasm.c b/arch/mips/mm/uasm.c
index eb82bb1..bf9f370 100644
--- a/arch/mips/mm/uasm.c
+++ b/arch/mips/mm/uasm.c
@@ -52,7 +52,7 @@ enum opcode {
 	insn_ext, insn_ins, insn_j, insn_jal, insn_jr, insn_ld, insn_ldx,
 	insn_ll, insn_lld, insn_lui, insn_lw, insn_lwx, insn_mfc0, insn_mtc0,
 	insn_or, insn_ori, insn_pref, insn_rfe, insn_rotr, insn_sc, insn_scd,
-	insn_sd, insn_sll, insn_sra, insn_srl, insn_subu, insn_sw,
+	insn_sd, insn_sll, insn_sllv, insn_sra, insn_srl, insn_subu, insn_sw,
 	insn_syscall, insn_tlbp, insn_tlbr, insn_tlbwi, insn_tlbwr, insn_xor,
 	insn_xori,
 };
@@ -279,6 +279,7 @@ I_u2s3u1(_sc)
 I_u2s3u1(_scd)
 I_u2s3u1(_sd)
 I_u2u1u3(_sll)
+I_u3u2u1(_sllv)
 I_u2u1u3(_sra)
 I_u2u1u3(_srl)
 I_u2u1u3(_rotr)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 03/14] MIPS: uasm: Add sllv uasm instruction
@ 2014-04-08 11:47   ` Markos Chandras
  0 siblings, 0 replies; 53+ messages in thread
From: Markos Chandras @ 2014-04-08 11:47 UTC (permalink / raw)
  To: linux-mips; +Cc: Markos Chandras

It will be used later on by bpf-jit

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
---
 arch/mips/include/asm/uasm.h      | 1 +
 arch/mips/include/uapi/asm/inst.h | 1 +
 arch/mips/mm/uasm-micromips.c     | 1 +
 arch/mips/mm/uasm-mips.c          | 1 +
 arch/mips/mm/uasm.c               | 3 ++-
 5 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/mips/include/asm/uasm.h b/arch/mips/include/asm/uasm.h
index 449bef4..cda42cb 100644
--- a/arch/mips/include/asm/uasm.h
+++ b/arch/mips/include/asm/uasm.h
@@ -139,6 +139,7 @@ Ip_u2s3u1(_sc);
 Ip_u2s3u1(_scd);
 Ip_u2s3u1(_sd);
 Ip_u2u1u3(_sll);
+Ip_u3u2u1(_sllv);
 Ip_u2u1u3(_sra);
 Ip_u2u1u3(_srl);
 Ip_u3u1u2(_subu);
diff --git a/arch/mips/include/uapi/asm/inst.h b/arch/mips/include/uapi/asm/inst.h
index df6e775..513ccc0 100644
--- a/arch/mips/include/uapi/asm/inst.h
+++ b/arch/mips/include/uapi/asm/inst.h
@@ -244,6 +244,7 @@ enum mm_32i_minor_op {
 enum mm_32a_minor_op {
 	mm_sll32_op = 0x000,
 	mm_ins_op = 0x00c,
+	mm_sllv32_op = 0x010,
 	mm_ext_op = 0x02c,
 	mm_pool32axf_op = 0x03c,
 	mm_srl32_op = 0x040,
diff --git a/arch/mips/mm/uasm-micromips.c b/arch/mips/mm/uasm-micromips.c
index b8d580c..489b762 100644
--- a/arch/mips/mm/uasm-micromips.c
+++ b/arch/mips/mm/uasm-micromips.c
@@ -94,6 +94,7 @@ static struct insn insn_table_MM[] = {
 	{ insn_scd, 0, 0 },
 	{ insn_sd, 0, 0 },
 	{ insn_sll, M(mm_pool32a_op, 0, 0, 0, 0, mm_sll32_op), RT | RS | RD },
+	{ insn_sllv, M(mm_pool32a_op, 0, 0, 0, 0, mm_sllv32_op), RT | RS | RD },
 	{ insn_sra, M(mm_pool32a_op, 0, 0, 0, 0, mm_sra_op), RT | RS | RD },
 	{ insn_srl, M(mm_pool32a_op, 0, 0, 0, 0, mm_srl32_op), RT | RS | RD },
 	{ insn_rotr, M(mm_pool32a_op, 0, 0, 0, 0, mm_rotr_op), RT | RS | RD },
diff --git a/arch/mips/mm/uasm-mips.c b/arch/mips/mm/uasm-mips.c
index 3abd609..40ac140 100644
--- a/arch/mips/mm/uasm-mips.c
+++ b/arch/mips/mm/uasm-mips.c
@@ -102,6 +102,7 @@ static struct insn insn_table[] = {
 	{ insn_sc,  M(sc_op, 0, 0, 0, 0, 0),  RS | RT | SIMM },
 	{ insn_sd,  M(sd_op, 0, 0, 0, 0, 0),  RS | RT | SIMM },
 	{ insn_sll,  M(spec_op, 0, 0, 0, 0, sll_op),  RT | RD | RE },
+	{ insn_sllv,  M(spec_op, 0, 0, 0, 0, sllv_op),  RS | RT | RD },
 	{ insn_sra,  M(spec_op, 0, 0, 0, 0, sra_op),  RT | RD | RE },
 	{ insn_srl,  M(spec_op, 0, 0, 0, 0, srl_op),  RT | RD | RE },
 	{ insn_subu,  M(spec_op, 0, 0, 0, 0, subu_op),	RS | RT | RD },
diff --git a/arch/mips/mm/uasm.c b/arch/mips/mm/uasm.c
index eb82bb1..bf9f370 100644
--- a/arch/mips/mm/uasm.c
+++ b/arch/mips/mm/uasm.c
@@ -52,7 +52,7 @@ enum opcode {
 	insn_ext, insn_ins, insn_j, insn_jal, insn_jr, insn_ld, insn_ldx,
 	insn_ll, insn_lld, insn_lui, insn_lw, insn_lwx, insn_mfc0, insn_mtc0,
 	insn_or, insn_ori, insn_pref, insn_rfe, insn_rotr, insn_sc, insn_scd,
-	insn_sd, insn_sll, insn_sra, insn_srl, insn_subu, insn_sw,
+	insn_sd, insn_sll, insn_sllv, insn_sra, insn_srl, insn_subu, insn_sw,
 	insn_syscall, insn_tlbp, insn_tlbr, insn_tlbwi, insn_tlbwr, insn_xor,
 	insn_xori,
 };
@@ -279,6 +279,7 @@ I_u2s3u1(_sc)
 I_u2s3u1(_scd)
 I_u2s3u1(_sd)
 I_u2u1u3(_sll)
+I_u3u2u1(_sllv)
 I_u2u1u3(_sra)
 I_u2u1u3(_srl)
 I_u2u1u3(_rotr)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 04/14] MIPS: uasm: Add srlv uasm instruction
@ 2014-04-08 11:47   ` Markos Chandras
  0 siblings, 0 replies; 53+ messages in thread
From: Markos Chandras @ 2014-04-08 11:47 UTC (permalink / raw)
  To: linux-mips; +Cc: Markos Chandras

It will be used later on by bpf-jit

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
---
 arch/mips/include/asm/uasm.h      | 1 +
 arch/mips/include/uapi/asm/inst.h | 1 +
 arch/mips/mm/uasm-micromips.c     | 1 +
 arch/mips/mm/uasm-mips.c          | 1 +
 arch/mips/mm/uasm.c               | 7 ++++---
 5 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/mips/include/asm/uasm.h b/arch/mips/include/asm/uasm.h
index cda42cb..5132f00 100644
--- a/arch/mips/include/asm/uasm.h
+++ b/arch/mips/include/asm/uasm.h
@@ -142,6 +142,7 @@ Ip_u2u1u3(_sll);
 Ip_u3u2u1(_sllv);
 Ip_u2u1u3(_sra);
 Ip_u2u1u3(_srl);
+Ip_u3u2u1(_srlv);
 Ip_u3u1u2(_subu);
 Ip_u2s3u1(_sw);
 Ip_u1(_syscall);
diff --git a/arch/mips/include/uapi/asm/inst.h b/arch/mips/include/uapi/asm/inst.h
index 513ccc0..3f973a0 100644
--- a/arch/mips/include/uapi/asm/inst.h
+++ b/arch/mips/include/uapi/asm/inst.h
@@ -249,6 +249,7 @@ enum mm_32a_minor_op {
 	mm_pool32axf_op = 0x03c,
 	mm_srl32_op = 0x040,
 	mm_sra_op = 0x080,
+	mm_srlv32_op = 0x090,
 	mm_rotr_op = 0x0c0,
 	mm_lwxs_op = 0x118,
 	mm_addu32_op = 0x150,
diff --git a/arch/mips/mm/uasm-micromips.c b/arch/mips/mm/uasm-micromips.c
index 489b762..97f4bfb 100644
--- a/arch/mips/mm/uasm-micromips.c
+++ b/arch/mips/mm/uasm-micromips.c
@@ -97,6 +97,7 @@ static struct insn insn_table_MM[] = {
 	{ insn_sllv, M(mm_pool32a_op, 0, 0, 0, 0, mm_sllv32_op), RT | RS | RD },
 	{ insn_sra, M(mm_pool32a_op, 0, 0, 0, 0, mm_sra_op), RT | RS | RD },
 	{ insn_srl, M(mm_pool32a_op, 0, 0, 0, 0, mm_srl32_op), RT | RS | RD },
+	{ insn_srlv, M(mm_pool32a_op, 0, 0, 0, 0, mm_srlv32_op), RT | RS | RD },
 	{ insn_rotr, M(mm_pool32a_op, 0, 0, 0, 0, mm_rotr_op), RT | RS | RD },
 	{ insn_subu, M(mm_pool32a_op, 0, 0, 0, 0, mm_subu32_op), RT | RS | RD },
 	{ insn_sw, M(mm_sw32_op, 0, 0, 0, 0, 0), RT | RS | SIMM },
diff --git a/arch/mips/mm/uasm-mips.c b/arch/mips/mm/uasm-mips.c
index 40ac140..7bcc3b8 100644
--- a/arch/mips/mm/uasm-mips.c
+++ b/arch/mips/mm/uasm-mips.c
@@ -105,6 +105,7 @@ static struct insn insn_table[] = {
 	{ insn_sllv,  M(spec_op, 0, 0, 0, 0, sllv_op),  RS | RT | RD },
 	{ insn_sra,  M(spec_op, 0, 0, 0, 0, sra_op),  RT | RD | RE },
 	{ insn_srl,  M(spec_op, 0, 0, 0, 0, srl_op),  RT | RD | RE },
+	{ insn_srlv,  M(spec_op, 0, 0, 0, 0, srlv_op),  RS | RT | RD },
 	{ insn_subu,  M(spec_op, 0, 0, 0, 0, subu_op),	RS | RT | RD },
 	{ insn_sw,  M(sw_op, 0, 0, 0, 0, 0),  RS | RT | SIMM },
 	{ insn_syscall, M(spec_op, 0, 0, 0, 0, syscall_op), SCIMM},
diff --git a/arch/mips/mm/uasm.c b/arch/mips/mm/uasm.c
index bf9f370..9d95c06 100644
--- a/arch/mips/mm/uasm.c
+++ b/arch/mips/mm/uasm.c
@@ -52,9 +52,9 @@ enum opcode {
 	insn_ext, insn_ins, insn_j, insn_jal, insn_jr, insn_ld, insn_ldx,
 	insn_ll, insn_lld, insn_lui, insn_lw, insn_lwx, insn_mfc0, insn_mtc0,
 	insn_or, insn_ori, insn_pref, insn_rfe, insn_rotr, insn_sc, insn_scd,
-	insn_sd, insn_sll, insn_sllv, insn_sra, insn_srl, insn_subu, insn_sw,
-	insn_syscall, insn_tlbp, insn_tlbr, insn_tlbwi, insn_tlbwr, insn_xor,
-	insn_xori,
+	insn_sd, insn_sll, insn_sllv, insn_sra, insn_srl, insn_srlv, insn_subu,
+	insn_sw, insn_syscall, insn_tlbp, insn_tlbr, insn_tlbwi, insn_tlbwr,
+	insn_xor, insn_xori,
 };
 
 struct insn {
@@ -282,6 +282,7 @@ I_u2u1u3(_sll)
 I_u3u2u1(_sllv)
 I_u2u1u3(_sra)
 I_u2u1u3(_srl)
+I_u3u2u1(_srlv)
 I_u2u1u3(_rotr)
 I_u3u1u2(_subu)
 I_u2s3u1(_sw)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 04/14] MIPS: uasm: Add srlv uasm instruction
@ 2014-04-08 11:47   ` Markos Chandras
  0 siblings, 0 replies; 53+ messages in thread
From: Markos Chandras @ 2014-04-08 11:47 UTC (permalink / raw)
  To: linux-mips; +Cc: Markos Chandras

It will be used later on by bpf-jit

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
---
 arch/mips/include/asm/uasm.h      | 1 +
 arch/mips/include/uapi/asm/inst.h | 1 +
 arch/mips/mm/uasm-micromips.c     | 1 +
 arch/mips/mm/uasm-mips.c          | 1 +
 arch/mips/mm/uasm.c               | 7 ++++---
 5 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/mips/include/asm/uasm.h b/arch/mips/include/asm/uasm.h
index cda42cb..5132f00 100644
--- a/arch/mips/include/asm/uasm.h
+++ b/arch/mips/include/asm/uasm.h
@@ -142,6 +142,7 @@ Ip_u2u1u3(_sll);
 Ip_u3u2u1(_sllv);
 Ip_u2u1u3(_sra);
 Ip_u2u1u3(_srl);
+Ip_u3u2u1(_srlv);
 Ip_u3u1u2(_subu);
 Ip_u2s3u1(_sw);
 Ip_u1(_syscall);
diff --git a/arch/mips/include/uapi/asm/inst.h b/arch/mips/include/uapi/asm/inst.h
index 513ccc0..3f973a0 100644
--- a/arch/mips/include/uapi/asm/inst.h
+++ b/arch/mips/include/uapi/asm/inst.h
@@ -249,6 +249,7 @@ enum mm_32a_minor_op {
 	mm_pool32axf_op = 0x03c,
 	mm_srl32_op = 0x040,
 	mm_sra_op = 0x080,
+	mm_srlv32_op = 0x090,
 	mm_rotr_op = 0x0c0,
 	mm_lwxs_op = 0x118,
 	mm_addu32_op = 0x150,
diff --git a/arch/mips/mm/uasm-micromips.c b/arch/mips/mm/uasm-micromips.c
index 489b762..97f4bfb 100644
--- a/arch/mips/mm/uasm-micromips.c
+++ b/arch/mips/mm/uasm-micromips.c
@@ -97,6 +97,7 @@ static struct insn insn_table_MM[] = {
 	{ insn_sllv, M(mm_pool32a_op, 0, 0, 0, 0, mm_sllv32_op), RT | RS | RD },
 	{ insn_sra, M(mm_pool32a_op, 0, 0, 0, 0, mm_sra_op), RT | RS | RD },
 	{ insn_srl, M(mm_pool32a_op, 0, 0, 0, 0, mm_srl32_op), RT | RS | RD },
+	{ insn_srlv, M(mm_pool32a_op, 0, 0, 0, 0, mm_srlv32_op), RT | RS | RD },
 	{ insn_rotr, M(mm_pool32a_op, 0, 0, 0, 0, mm_rotr_op), RT | RS | RD },
 	{ insn_subu, M(mm_pool32a_op, 0, 0, 0, 0, mm_subu32_op), RT | RS | RD },
 	{ insn_sw, M(mm_sw32_op, 0, 0, 0, 0, 0), RT | RS | SIMM },
diff --git a/arch/mips/mm/uasm-mips.c b/arch/mips/mm/uasm-mips.c
index 40ac140..7bcc3b8 100644
--- a/arch/mips/mm/uasm-mips.c
+++ b/arch/mips/mm/uasm-mips.c
@@ -105,6 +105,7 @@ static struct insn insn_table[] = {
 	{ insn_sllv,  M(spec_op, 0, 0, 0, 0, sllv_op),  RS | RT | RD },
 	{ insn_sra,  M(spec_op, 0, 0, 0, 0, sra_op),  RT | RD | RE },
 	{ insn_srl,  M(spec_op, 0, 0, 0, 0, srl_op),  RT | RD | RE },
+	{ insn_srlv,  M(spec_op, 0, 0, 0, 0, srlv_op),  RS | RT | RD },
 	{ insn_subu,  M(spec_op, 0, 0, 0, 0, subu_op),	RS | RT | RD },
 	{ insn_sw,  M(sw_op, 0, 0, 0, 0, 0),  RS | RT | SIMM },
 	{ insn_syscall, M(spec_op, 0, 0, 0, 0, syscall_op), SCIMM},
diff --git a/arch/mips/mm/uasm.c b/arch/mips/mm/uasm.c
index bf9f370..9d95c06 100644
--- a/arch/mips/mm/uasm.c
+++ b/arch/mips/mm/uasm.c
@@ -52,9 +52,9 @@ enum opcode {
 	insn_ext, insn_ins, insn_j, insn_jal, insn_jr, insn_ld, insn_ldx,
 	insn_ll, insn_lld, insn_lui, insn_lw, insn_lwx, insn_mfc0, insn_mtc0,
 	insn_or, insn_ori, insn_pref, insn_rfe, insn_rotr, insn_sc, insn_scd,
-	insn_sd, insn_sll, insn_sllv, insn_sra, insn_srl, insn_subu, insn_sw,
-	insn_syscall, insn_tlbp, insn_tlbr, insn_tlbwi, insn_tlbwr, insn_xor,
-	insn_xori,
+	insn_sd, insn_sll, insn_sllv, insn_sra, insn_srl, insn_srlv, insn_subu,
+	insn_sw, insn_syscall, insn_tlbp, insn_tlbr, insn_tlbwi, insn_tlbwr,
+	insn_xor, insn_xori,
 };
 
 struct insn {
@@ -282,6 +282,7 @@ I_u2u1u3(_sll)
 I_u3u2u1(_sllv)
 I_u2u1u3(_sra)
 I_u2u1u3(_srl)
+I_u3u2u1(_srlv)
 I_u2u1u3(_rotr)
 I_u3u1u2(_subu)
 I_u2s3u1(_sw)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 05/14] MIPS: uasm: Add divu uasm instruction
@ 2014-04-08 11:47   ` Markos Chandras
  0 siblings, 0 replies; 53+ messages in thread
From: Markos Chandras @ 2014-04-08 11:47 UTC (permalink / raw)
  To: linux-mips; +Cc: Markos Chandras

It will be used later on by bpf-jit

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
---
 arch/mips/include/asm/uasm.h      | 1 +
 arch/mips/include/uapi/asm/inst.h | 1 +
 arch/mips/mm/uasm-micromips.c     | 1 +
 arch/mips/mm/uasm-mips.c          | 1 +
 arch/mips/mm/uasm.c               | 3 ++-
 5 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/mips/include/asm/uasm.h b/arch/mips/include/asm/uasm.h
index 5132f00..236b2db 100644
--- a/arch/mips/include/asm/uasm.h
+++ b/arch/mips/include/asm/uasm.h
@@ -105,6 +105,7 @@ Ip_u2u1s3(_daddiu);
 Ip_u3u1u2(_daddu);
 Ip_u2u1msbu3(_dins);
 Ip_u2u1msbu3(_dinsm);
+Ip_u1u2(_divu);
 Ip_u1u2u3(_dmfc0);
 Ip_u1u2u3(_dmtc0);
 Ip_u2u1u3(_drotr);
diff --git a/arch/mips/include/uapi/asm/inst.h b/arch/mips/include/uapi/asm/inst.h
index 3f973a0..c28cae1 100644
--- a/arch/mips/include/uapi/asm/inst.h
+++ b/arch/mips/include/uapi/asm/inst.h
@@ -305,6 +305,7 @@ enum mm_32axf_minor_op {
 	mm_jalrshb_op = 0x17c,
 	mm_syscall_op = 0x22d,
 	mm_eret_op = 0x3cd,
+	mm_divu_op = 0x5dc,
 };
 
 /*
diff --git a/arch/mips/mm/uasm-micromips.c b/arch/mips/mm/uasm-micromips.c
index 97f4bfb..dc448be 100644
--- a/arch/mips/mm/uasm-micromips.c
+++ b/arch/mips/mm/uasm-micromips.c
@@ -63,6 +63,7 @@ static struct insn insn_table_MM[] = {
 	{ insn_cache, M(mm_pool32b_op, 0, 0, mm_cache_func, 0, 0), RT | RS | SIMM },
 	{ insn_daddu, 0, 0 },
 	{ insn_daddiu, 0, 0 },
+	{ insn_divu, M(mm_pool32a_op, 0, 0, 0, mm_divu_op, mm_pool32axf_op), RT | RS },
 	{ insn_dmfc0, 0, 0 },
 	{ insn_dmtc0, 0, 0 },
 	{ insn_dsll, 0, 0 },
diff --git a/arch/mips/mm/uasm-mips.c b/arch/mips/mm/uasm-mips.c
index 7bcc3b8..fe4be68 100644
--- a/arch/mips/mm/uasm-mips.c
+++ b/arch/mips/mm/uasm-mips.c
@@ -67,6 +67,7 @@ static struct insn insn_table[] = {
 	{ insn_daddu, M(spec_op, 0, 0, 0, 0, daddu_op), RS | RT | RD },
 	{ insn_dinsm, M(spec3_op, 0, 0, 0, 0, dinsm_op), RS | RT | RD | RE },
 	{ insn_dins, M(spec3_op, 0, 0, 0, 0, dins_op), RS | RT | RD | RE },
+	{ insn_divu, M(spec_op, 0, 0, 0, 0, divu_op), RS | RT },
 	{ insn_dmfc0, M(cop0_op, dmfc_op, 0, 0, 0, 0), RT | RD | SET},
 	{ insn_dmtc0, M(cop0_op, dmtc_op, 0, 0, 0, 0), RT | RD | SET},
 	{ insn_drotr32, M(spec_op, 1, 0, 0, 0, dsrl32_op), RT | RD | RE },
diff --git a/arch/mips/mm/uasm.c b/arch/mips/mm/uasm.c
index 9d95c06..c2b7c4b 100644
--- a/arch/mips/mm/uasm.c
+++ b/arch/mips/mm/uasm.c
@@ -47,7 +47,7 @@ enum opcode {
 	insn_addiu, insn_addu, insn_and, insn_andi, insn_bbit0, insn_bbit1,
 	insn_beq, insn_beql, insn_bgez, insn_bgezl, insn_bltz, insn_bltzl,
 	insn_bne, insn_cache, insn_daddiu, insn_daddu, insn_dins, insn_dinsm,
-	insn_dmfc0, insn_dmtc0, insn_drotr, insn_drotr32, insn_dsll,
+	insn_divu, insn_dmfc0, insn_dmtc0, insn_drotr, insn_drotr32, insn_dsll,
 	insn_dsll32, insn_dsra, insn_dsrl, insn_dsrl32, insn_dsubu, insn_eret,
 	insn_ext, insn_ins, insn_j, insn_jal, insn_jr, insn_ld, insn_ldx,
 	insn_ll, insn_lld, insn_lui, insn_lw, insn_lwx, insn_mfc0, insn_mtc0,
@@ -251,6 +251,7 @@ I_u1u2u3(_dmfc0)
 I_u1u2u3(_dmtc0)
 I_u2u1s3(_daddiu)
 I_u3u1u2(_daddu)
+I_u1u2(_divu)
 I_u2u1u3(_dsll)
 I_u2u1u3(_dsll32)
 I_u2u1u3(_dsra)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 05/14] MIPS: uasm: Add divu uasm instruction
@ 2014-04-08 11:47   ` Markos Chandras
  0 siblings, 0 replies; 53+ messages in thread
From: Markos Chandras @ 2014-04-08 11:47 UTC (permalink / raw)
  To: linux-mips; +Cc: Markos Chandras

It will be used later on by bpf-jit

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
---
 arch/mips/include/asm/uasm.h      | 1 +
 arch/mips/include/uapi/asm/inst.h | 1 +
 arch/mips/mm/uasm-micromips.c     | 1 +
 arch/mips/mm/uasm-mips.c          | 1 +
 arch/mips/mm/uasm.c               | 3 ++-
 5 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/mips/include/asm/uasm.h b/arch/mips/include/asm/uasm.h
index 5132f00..236b2db 100644
--- a/arch/mips/include/asm/uasm.h
+++ b/arch/mips/include/asm/uasm.h
@@ -105,6 +105,7 @@ Ip_u2u1s3(_daddiu);
 Ip_u3u1u2(_daddu);
 Ip_u2u1msbu3(_dins);
 Ip_u2u1msbu3(_dinsm);
+Ip_u1u2(_divu);
 Ip_u1u2u3(_dmfc0);
 Ip_u1u2u3(_dmtc0);
 Ip_u2u1u3(_drotr);
diff --git a/arch/mips/include/uapi/asm/inst.h b/arch/mips/include/uapi/asm/inst.h
index 3f973a0..c28cae1 100644
--- a/arch/mips/include/uapi/asm/inst.h
+++ b/arch/mips/include/uapi/asm/inst.h
@@ -305,6 +305,7 @@ enum mm_32axf_minor_op {
 	mm_jalrshb_op = 0x17c,
 	mm_syscall_op = 0x22d,
 	mm_eret_op = 0x3cd,
+	mm_divu_op = 0x5dc,
 };
 
 /*
diff --git a/arch/mips/mm/uasm-micromips.c b/arch/mips/mm/uasm-micromips.c
index 97f4bfb..dc448be 100644
--- a/arch/mips/mm/uasm-micromips.c
+++ b/arch/mips/mm/uasm-micromips.c
@@ -63,6 +63,7 @@ static struct insn insn_table_MM[] = {
 	{ insn_cache, M(mm_pool32b_op, 0, 0, mm_cache_func, 0, 0), RT | RS | SIMM },
 	{ insn_daddu, 0, 0 },
 	{ insn_daddiu, 0, 0 },
+	{ insn_divu, M(mm_pool32a_op, 0, 0, 0, mm_divu_op, mm_pool32axf_op), RT | RS },
 	{ insn_dmfc0, 0, 0 },
 	{ insn_dmtc0, 0, 0 },
 	{ insn_dsll, 0, 0 },
diff --git a/arch/mips/mm/uasm-mips.c b/arch/mips/mm/uasm-mips.c
index 7bcc3b8..fe4be68 100644
--- a/arch/mips/mm/uasm-mips.c
+++ b/arch/mips/mm/uasm-mips.c
@@ -67,6 +67,7 @@ static struct insn insn_table[] = {
 	{ insn_daddu, M(spec_op, 0, 0, 0, 0, daddu_op), RS | RT | RD },
 	{ insn_dinsm, M(spec3_op, 0, 0, 0, 0, dinsm_op), RS | RT | RD | RE },
 	{ insn_dins, M(spec3_op, 0, 0, 0, 0, dins_op), RS | RT | RD | RE },
+	{ insn_divu, M(spec_op, 0, 0, 0, 0, divu_op), RS | RT },
 	{ insn_dmfc0, M(cop0_op, dmfc_op, 0, 0, 0, 0), RT | RD | SET},
 	{ insn_dmtc0, M(cop0_op, dmtc_op, 0, 0, 0, 0), RT | RD | SET},
 	{ insn_drotr32, M(spec_op, 1, 0, 0, 0, dsrl32_op), RT | RD | RE },
diff --git a/arch/mips/mm/uasm.c b/arch/mips/mm/uasm.c
index 9d95c06..c2b7c4b 100644
--- a/arch/mips/mm/uasm.c
+++ b/arch/mips/mm/uasm.c
@@ -47,7 +47,7 @@ enum opcode {
 	insn_addiu, insn_addu, insn_and, insn_andi, insn_bbit0, insn_bbit1,
 	insn_beq, insn_beql, insn_bgez, insn_bgezl, insn_bltz, insn_bltzl,
 	insn_bne, insn_cache, insn_daddiu, insn_daddu, insn_dins, insn_dinsm,
-	insn_dmfc0, insn_dmtc0, insn_drotr, insn_drotr32, insn_dsll,
+	insn_divu, insn_dmfc0, insn_dmtc0, insn_drotr, insn_drotr32, insn_dsll,
 	insn_dsll32, insn_dsra, insn_dsrl, insn_dsrl32, insn_dsubu, insn_eret,
 	insn_ext, insn_ins, insn_j, insn_jal, insn_jr, insn_ld, insn_ldx,
 	insn_ll, insn_lld, insn_lui, insn_lw, insn_lwx, insn_mfc0, insn_mtc0,
@@ -251,6 +251,7 @@ I_u1u2u3(_dmfc0)
 I_u1u2u3(_dmtc0)
 I_u2u1s3(_daddiu)
 I_u3u1u2(_daddu)
+I_u1u2(_divu)
 I_u2u1u3(_dsll)
 I_u2u1u3(_dsll32)
 I_u2u1u3(_dsra)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 06/14] MIPS: uasm: Add mfhi uasm instruction
@ 2014-04-08 11:47   ` Markos Chandras
  0 siblings, 0 replies; 53+ messages in thread
From: Markos Chandras @ 2014-04-08 11:47 UTC (permalink / raw)
  To: linux-mips; +Cc: Markos Chandras

It will be used later on by bpf-jit

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
---
 arch/mips/include/asm/uasm.h      |  1 +
 arch/mips/include/uapi/asm/inst.h |  1 +
 arch/mips/mm/uasm-micromips.c     |  1 +
 arch/mips/mm/uasm-mips.c          |  1 +
 arch/mips/mm/uasm.c               | 11 ++++++-----
 5 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/mips/include/asm/uasm.h b/arch/mips/include/asm/uasm.h
index 236b2db..9da0ceb 100644
--- a/arch/mips/include/asm/uasm.h
+++ b/arch/mips/include/asm/uasm.h
@@ -130,6 +130,7 @@ Ip_u1s2(_lui);
 Ip_u2s3u1(_lw);
 Ip_u3u1u2(_lwx);
 Ip_u1u2u3(_mfc0);
+Ip_u1(_mfhi);
 Ip_u1u2u3(_mtc0);
 Ip_u3u1u2(_or);
 Ip_u2u1u3(_ori);
diff --git a/arch/mips/include/uapi/asm/inst.h b/arch/mips/include/uapi/asm/inst.h
index c28cae1..f46930f 100644
--- a/arch/mips/include/uapi/asm/inst.h
+++ b/arch/mips/include/uapi/asm/inst.h
@@ -296,6 +296,7 @@ enum mm_32axf_minor_op {
 	mm_mfc0_op = 0x003,
 	mm_mtc0_op = 0x00b,
 	mm_tlbp_op = 0x00d,
+	mm_mfhi32_op = 0x035,
 	mm_jalr_op = 0x03c,
 	mm_tlbr_op = 0x04d,
 	mm_jalrhb_op = 0x07c,
diff --git a/arch/mips/mm/uasm-micromips.c b/arch/mips/mm/uasm-micromips.c
index dc448be..fd5ab06 100644
--- a/arch/mips/mm/uasm-micromips.c
+++ b/arch/mips/mm/uasm-micromips.c
@@ -86,6 +86,7 @@ static struct insn insn_table_MM[] = {
 	{ insn_lui, M(mm_pool32i_op, mm_lui_op, 0, 0, 0, 0), RS | SIMM },
 	{ insn_lw, M(mm_lw32_op, 0, 0, 0, 0, 0), RT | RS | SIMM },
 	{ insn_mfc0, M(mm_pool32a_op, 0, 0, 0, mm_mfc0_op, mm_pool32axf_op), RT | RS | RD },
+	{ insn_mfhi, M(mm_pool32a_op, 0, 0, 0, mm_mfhi32_op, mm_pool32axf_op), RS },
 	{ insn_mtc0, M(mm_pool32a_op, 0, 0, 0, mm_mtc0_op, mm_pool32axf_op), RT | RS | RD },
 	{ insn_or, M(mm_pool32a_op, 0, 0, 0, 0, mm_or32_op), RT | RS | RD },
 	{ insn_ori, M(mm_ori32_op, 0, 0, 0, 0, 0), RT | RS | UIMM },
diff --git a/arch/mips/mm/uasm-mips.c b/arch/mips/mm/uasm-mips.c
index fe4be68..15e4048 100644
--- a/arch/mips/mm/uasm-mips.c
+++ b/arch/mips/mm/uasm-mips.c
@@ -93,6 +93,7 @@ static struct insn insn_table[] = {
 	{ insn_lw,  M(lw_op, 0, 0, 0, 0, 0),  RS | RT | SIMM },
 	{ insn_lwx, M(spec3_op, 0, 0, 0, lwx_op, lx_op), RS | RT | RD },
 	{ insn_mfc0,  M(cop0_op, mfc_op, 0, 0, 0, 0),  RT | RD | SET},
+	{ insn_mfhi,  M(spec_op, 0, 0, 0, 0, mfhi_op),  RT | RD | SET},
 	{ insn_mtc0,  M(cop0_op, mtc_op, 0, 0, 0, 0),  RT | RD | SET},
 	{ insn_ori,  M(ori_op, 0, 0, 0, 0, 0),	RS | RT | UIMM },
 	{ insn_or,  M(spec_op, 0, 0, 0, 0, or_op),  RS | RT | RD },
diff --git a/arch/mips/mm/uasm.c b/arch/mips/mm/uasm.c
index c2b7c4b..7ecde42 100644
--- a/arch/mips/mm/uasm.c
+++ b/arch/mips/mm/uasm.c
@@ -50,11 +50,11 @@ enum opcode {
 	insn_divu, insn_dmfc0, insn_dmtc0, insn_drotr, insn_drotr32, insn_dsll,
 	insn_dsll32, insn_dsra, insn_dsrl, insn_dsrl32, insn_dsubu, insn_eret,
 	insn_ext, insn_ins, insn_j, insn_jal, insn_jr, insn_ld, insn_ldx,
-	insn_ll, insn_lld, insn_lui, insn_lw, insn_lwx, insn_mfc0, insn_mtc0,
-	insn_or, insn_ori, insn_pref, insn_rfe, insn_rotr, insn_sc, insn_scd,
-	insn_sd, insn_sll, insn_sllv, insn_sra, insn_srl, insn_srlv, insn_subu,
-	insn_sw, insn_syscall, insn_tlbp, insn_tlbr, insn_tlbwi, insn_tlbwr,
-	insn_xor, insn_xori,
+	insn_ll, insn_lld, insn_lui, insn_lw, insn_lwx, insn_mfc0, insn_mfhi,
+	insn_mtc0, insn_or, insn_ori, insn_pref, insn_rfe, insn_rotr, insn_sc,
+	insn_scd, insn_sd, insn_sll, insn_sllv, insn_sra, insn_srl, insn_srlv,
+	insn_subu, insn_sw, insn_syscall, insn_tlbp, insn_tlbr, insn_tlbwi,
+	insn_tlbwr, insn_xor, insn_xori,
 };
 
 struct insn {
@@ -272,6 +272,7 @@ I_u2s3u1(_lld)
 I_u1s2(_lui)
 I_u2s3u1(_lw)
 I_u1u2u3(_mfc0)
+I_u1(_mfhi)
 I_u1u2u3(_mtc0)
 I_u2u1u3(_ori)
 I_u3u1u2(_or)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 06/14] MIPS: uasm: Add mfhi uasm instruction
@ 2014-04-08 11:47   ` Markos Chandras
  0 siblings, 0 replies; 53+ messages in thread
From: Markos Chandras @ 2014-04-08 11:47 UTC (permalink / raw)
  To: linux-mips; +Cc: Markos Chandras

It will be used later on by bpf-jit

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
---
 arch/mips/include/asm/uasm.h      |  1 +
 arch/mips/include/uapi/asm/inst.h |  1 +
 arch/mips/mm/uasm-micromips.c     |  1 +
 arch/mips/mm/uasm-mips.c          |  1 +
 arch/mips/mm/uasm.c               | 11 ++++++-----
 5 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/mips/include/asm/uasm.h b/arch/mips/include/asm/uasm.h
index 236b2db..9da0ceb 100644
--- a/arch/mips/include/asm/uasm.h
+++ b/arch/mips/include/asm/uasm.h
@@ -130,6 +130,7 @@ Ip_u1s2(_lui);
 Ip_u2s3u1(_lw);
 Ip_u3u1u2(_lwx);
 Ip_u1u2u3(_mfc0);
+Ip_u1(_mfhi);
 Ip_u1u2u3(_mtc0);
 Ip_u3u1u2(_or);
 Ip_u2u1u3(_ori);
diff --git a/arch/mips/include/uapi/asm/inst.h b/arch/mips/include/uapi/asm/inst.h
index c28cae1..f46930f 100644
--- a/arch/mips/include/uapi/asm/inst.h
+++ b/arch/mips/include/uapi/asm/inst.h
@@ -296,6 +296,7 @@ enum mm_32axf_minor_op {
 	mm_mfc0_op = 0x003,
 	mm_mtc0_op = 0x00b,
 	mm_tlbp_op = 0x00d,
+	mm_mfhi32_op = 0x035,
 	mm_jalr_op = 0x03c,
 	mm_tlbr_op = 0x04d,
 	mm_jalrhb_op = 0x07c,
diff --git a/arch/mips/mm/uasm-micromips.c b/arch/mips/mm/uasm-micromips.c
index dc448be..fd5ab06 100644
--- a/arch/mips/mm/uasm-micromips.c
+++ b/arch/mips/mm/uasm-micromips.c
@@ -86,6 +86,7 @@ static struct insn insn_table_MM[] = {
 	{ insn_lui, M(mm_pool32i_op, mm_lui_op, 0, 0, 0, 0), RS | SIMM },
 	{ insn_lw, M(mm_lw32_op, 0, 0, 0, 0, 0), RT | RS | SIMM },
 	{ insn_mfc0, M(mm_pool32a_op, 0, 0, 0, mm_mfc0_op, mm_pool32axf_op), RT | RS | RD },
+	{ insn_mfhi, M(mm_pool32a_op, 0, 0, 0, mm_mfhi32_op, mm_pool32axf_op), RS },
 	{ insn_mtc0, M(mm_pool32a_op, 0, 0, 0, mm_mtc0_op, mm_pool32axf_op), RT | RS | RD },
 	{ insn_or, M(mm_pool32a_op, 0, 0, 0, 0, mm_or32_op), RT | RS | RD },
 	{ insn_ori, M(mm_ori32_op, 0, 0, 0, 0, 0), RT | RS | UIMM },
diff --git a/arch/mips/mm/uasm-mips.c b/arch/mips/mm/uasm-mips.c
index fe4be68..15e4048 100644
--- a/arch/mips/mm/uasm-mips.c
+++ b/arch/mips/mm/uasm-mips.c
@@ -93,6 +93,7 @@ static struct insn insn_table[] = {
 	{ insn_lw,  M(lw_op, 0, 0, 0, 0, 0),  RS | RT | SIMM },
 	{ insn_lwx, M(spec3_op, 0, 0, 0, lwx_op, lx_op), RS | RT | RD },
 	{ insn_mfc0,  M(cop0_op, mfc_op, 0, 0, 0, 0),  RT | RD | SET},
+	{ insn_mfhi,  M(spec_op, 0, 0, 0, 0, mfhi_op),  RT | RD | SET},
 	{ insn_mtc0,  M(cop0_op, mtc_op, 0, 0, 0, 0),  RT | RD | SET},
 	{ insn_ori,  M(ori_op, 0, 0, 0, 0, 0),	RS | RT | UIMM },
 	{ insn_or,  M(spec_op, 0, 0, 0, 0, or_op),  RS | RT | RD },
diff --git a/arch/mips/mm/uasm.c b/arch/mips/mm/uasm.c
index c2b7c4b..7ecde42 100644
--- a/arch/mips/mm/uasm.c
+++ b/arch/mips/mm/uasm.c
@@ -50,11 +50,11 @@ enum opcode {
 	insn_divu, insn_dmfc0, insn_dmtc0, insn_drotr, insn_drotr32, insn_dsll,
 	insn_dsll32, insn_dsra, insn_dsrl, insn_dsrl32, insn_dsubu, insn_eret,
 	insn_ext, insn_ins, insn_j, insn_jal, insn_jr, insn_ld, insn_ldx,
-	insn_ll, insn_lld, insn_lui, insn_lw, insn_lwx, insn_mfc0, insn_mtc0,
-	insn_or, insn_ori, insn_pref, insn_rfe, insn_rotr, insn_sc, insn_scd,
-	insn_sd, insn_sll, insn_sllv, insn_sra, insn_srl, insn_srlv, insn_subu,
-	insn_sw, insn_syscall, insn_tlbp, insn_tlbr, insn_tlbwi, insn_tlbwr,
-	insn_xor, insn_xori,
+	insn_ll, insn_lld, insn_lui, insn_lw, insn_lwx, insn_mfc0, insn_mfhi,
+	insn_mtc0, insn_or, insn_ori, insn_pref, insn_rfe, insn_rotr, insn_sc,
+	insn_scd, insn_sd, insn_sll, insn_sllv, insn_sra, insn_srl, insn_srlv,
+	insn_subu, insn_sw, insn_syscall, insn_tlbp, insn_tlbr, insn_tlbwi,
+	insn_tlbwr, insn_xor, insn_xori,
 };
 
 struct insn {
@@ -272,6 +272,7 @@ I_u2s3u1(_lld)
 I_u1s2(_lui)
 I_u2s3u1(_lw)
 I_u1u2u3(_mfc0)
+I_u1(_mfhi)
 I_u1u2u3(_mtc0)
 I_u2u1u3(_ori)
 I_u3u1u2(_or)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 07/14] MIPS: uasm: Add jalr uasm instruction
@ 2014-04-08 11:47   ` Markos Chandras
  0 siblings, 0 replies; 53+ messages in thread
From: Markos Chandras @ 2014-04-08 11:47 UTC (permalink / raw)
  To: linux-mips; +Cc: Markos Chandras

It will be used later on by bpf-jit

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
---
 arch/mips/include/asm/uasm.h  |  1 +
 arch/mips/mm/uasm-micromips.c |  1 +
 arch/mips/mm/uasm-mips.c      |  1 +
 arch/mips/mm/uasm.c           | 13 +++++++------
 4 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/mips/include/asm/uasm.h b/arch/mips/include/asm/uasm.h
index 9da0ceb..61c8cce 100644
--- a/arch/mips/include/asm/uasm.h
+++ b/arch/mips/include/asm/uasm.h
@@ -121,6 +121,7 @@ Ip_u2u1msbu3(_ext);
 Ip_u2u1msbu3(_ins);
 Ip_u1(_j);
 Ip_u1(_jal);
+Ip_u2u1(_jalr);
 Ip_u1(_jr);
 Ip_u2s3u1(_ld);
 Ip_u3u1u2(_ldx);
diff --git a/arch/mips/mm/uasm-micromips.c b/arch/mips/mm/uasm-micromips.c
index fd5ab06..a6f0686 100644
--- a/arch/mips/mm/uasm-micromips.c
+++ b/arch/mips/mm/uasm-micromips.c
@@ -79,6 +79,7 @@ static struct insn insn_table_MM[] = {
 	{ insn_ext, M(mm_pool32a_op, 0, 0, 0, 0, mm_ext_op), RT | RS | RD | RE },
 	{ insn_j, M(mm_j32_op, 0, 0, 0, 0, 0), JIMM },
 	{ insn_jal, M(mm_jal32_op, 0, 0, 0, 0, 0), JIMM },
+	{ insn_jalr, M(mm_pool32a_op, 0, 0, 0, mm_jalr_op, mm_pool32axf_op), RT | RS },
 	{ insn_jr, M(mm_pool32a_op, 0, 0, 0, mm_jalr_op, mm_pool32axf_op), RS },
 	{ insn_ld, 0, 0 },
 	{ insn_ll, M(mm_pool32c_op, 0, 0, (mm_ll_func << 1), 0, 0), RS | RT | SIMM },
diff --git a/arch/mips/mm/uasm-mips.c b/arch/mips/mm/uasm-mips.c
index 15e4048..6e295c2 100644
--- a/arch/mips/mm/uasm-mips.c
+++ b/arch/mips/mm/uasm-mips.c
@@ -83,6 +83,7 @@ static struct insn insn_table[] = {
 	{ insn_ins, M(spec3_op, 0, 0, 0, 0, ins_op), RS | RT | RD | RE },
 	{ insn_j,  M(j_op, 0, 0, 0, 0, 0),  JIMM },
 	{ insn_jal,  M(jal_op, 0, 0, 0, 0, 0),	JIMM },
+	{ insn_jalr, M(spec_op, 0, 0, 0, 0, jalr_op), RS | RD },
 	{ insn_j,  M(j_op, 0, 0, 0, 0, 0),  JIMM },
 	{ insn_jr,  M(spec_op, 0, 0, 0, 0, jr_op),  RS },
 	{ insn_ld,  M(ld_op, 0, 0, 0, 0, 0),  RS | RT | SIMM },
diff --git a/arch/mips/mm/uasm.c b/arch/mips/mm/uasm.c
index 7ecde42..87d233d 100644
--- a/arch/mips/mm/uasm.c
+++ b/arch/mips/mm/uasm.c
@@ -49,12 +49,12 @@ enum opcode {
 	insn_bne, insn_cache, insn_daddiu, insn_daddu, insn_dins, insn_dinsm,
 	insn_divu, insn_dmfc0, insn_dmtc0, insn_drotr, insn_drotr32, insn_dsll,
 	insn_dsll32, insn_dsra, insn_dsrl, insn_dsrl32, insn_dsubu, insn_eret,
-	insn_ext, insn_ins, insn_j, insn_jal, insn_jr, insn_ld, insn_ldx,
-	insn_ll, insn_lld, insn_lui, insn_lw, insn_lwx, insn_mfc0, insn_mfhi,
-	insn_mtc0, insn_or, insn_ori, insn_pref, insn_rfe, insn_rotr, insn_sc,
-	insn_scd, insn_sd, insn_sll, insn_sllv, insn_sra, insn_srl, insn_srlv,
-	insn_subu, insn_sw, insn_syscall, insn_tlbp, insn_tlbr, insn_tlbwi,
-	insn_tlbwr, insn_xor, insn_xori,
+	insn_ext, insn_ins, insn_j, insn_jal, insn_jalr, insn_jr, insn_ld,
+	insn_ldx, insn_ll, insn_lld, insn_lui, insn_lw, insn_lwx, insn_mfc0,
+	insn_mfhi, insn_mtc0, insn_or, insn_ori, insn_pref, insn_rfe, insn_rotr,
+	insn_sc, insn_scd, insn_sd, insn_sll, insn_sllv, insn_sra, insn_srl,
+	insn_srlv, insn_subu, insn_sw, insn_syscall, insn_tlbp, insn_tlbr,
+	insn_tlbwi, insn_tlbwr, insn_xor, insn_xori,
 };
 
 struct insn {
@@ -265,6 +265,7 @@ I_u2u1msbdu3(_ext)
 I_u2u1msbu3(_ins)
 I_u1(_j)
 I_u1(_jal)
+I_u2u1(_jalr)
 I_u1(_jr)
 I_u2s3u1(_ld)
 I_u2s3u1(_ll)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 07/14] MIPS: uasm: Add jalr uasm instruction
@ 2014-04-08 11:47   ` Markos Chandras
  0 siblings, 0 replies; 53+ messages in thread
From: Markos Chandras @ 2014-04-08 11:47 UTC (permalink / raw)
  To: linux-mips; +Cc: Markos Chandras

It will be used later on by bpf-jit

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
---
 arch/mips/include/asm/uasm.h  |  1 +
 arch/mips/mm/uasm-micromips.c |  1 +
 arch/mips/mm/uasm-mips.c      |  1 +
 arch/mips/mm/uasm.c           | 13 +++++++------
 4 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/mips/include/asm/uasm.h b/arch/mips/include/asm/uasm.h
index 9da0ceb..61c8cce 100644
--- a/arch/mips/include/asm/uasm.h
+++ b/arch/mips/include/asm/uasm.h
@@ -121,6 +121,7 @@ Ip_u2u1msbu3(_ext);
 Ip_u2u1msbu3(_ins);
 Ip_u1(_j);
 Ip_u1(_jal);
+Ip_u2u1(_jalr);
 Ip_u1(_jr);
 Ip_u2s3u1(_ld);
 Ip_u3u1u2(_ldx);
diff --git a/arch/mips/mm/uasm-micromips.c b/arch/mips/mm/uasm-micromips.c
index fd5ab06..a6f0686 100644
--- a/arch/mips/mm/uasm-micromips.c
+++ b/arch/mips/mm/uasm-micromips.c
@@ -79,6 +79,7 @@ static struct insn insn_table_MM[] = {
 	{ insn_ext, M(mm_pool32a_op, 0, 0, 0, 0, mm_ext_op), RT | RS | RD | RE },
 	{ insn_j, M(mm_j32_op, 0, 0, 0, 0, 0), JIMM },
 	{ insn_jal, M(mm_jal32_op, 0, 0, 0, 0, 0), JIMM },
+	{ insn_jalr, M(mm_pool32a_op, 0, 0, 0, mm_jalr_op, mm_pool32axf_op), RT | RS },
 	{ insn_jr, M(mm_pool32a_op, 0, 0, 0, mm_jalr_op, mm_pool32axf_op), RS },
 	{ insn_ld, 0, 0 },
 	{ insn_ll, M(mm_pool32c_op, 0, 0, (mm_ll_func << 1), 0, 0), RS | RT | SIMM },
diff --git a/arch/mips/mm/uasm-mips.c b/arch/mips/mm/uasm-mips.c
index 15e4048..6e295c2 100644
--- a/arch/mips/mm/uasm-mips.c
+++ b/arch/mips/mm/uasm-mips.c
@@ -83,6 +83,7 @@ static struct insn insn_table[] = {
 	{ insn_ins, M(spec3_op, 0, 0, 0, 0, ins_op), RS | RT | RD | RE },
 	{ insn_j,  M(j_op, 0, 0, 0, 0, 0),  JIMM },
 	{ insn_jal,  M(jal_op, 0, 0, 0, 0, 0),	JIMM },
+	{ insn_jalr, M(spec_op, 0, 0, 0, 0, jalr_op), RS | RD },
 	{ insn_j,  M(j_op, 0, 0, 0, 0, 0),  JIMM },
 	{ insn_jr,  M(spec_op, 0, 0, 0, 0, jr_op),  RS },
 	{ insn_ld,  M(ld_op, 0, 0, 0, 0, 0),  RS | RT | SIMM },
diff --git a/arch/mips/mm/uasm.c b/arch/mips/mm/uasm.c
index 7ecde42..87d233d 100644
--- a/arch/mips/mm/uasm.c
+++ b/arch/mips/mm/uasm.c
@@ -49,12 +49,12 @@ enum opcode {
 	insn_bne, insn_cache, insn_daddiu, insn_daddu, insn_dins, insn_dinsm,
 	insn_divu, insn_dmfc0, insn_dmtc0, insn_drotr, insn_drotr32, insn_dsll,
 	insn_dsll32, insn_dsra, insn_dsrl, insn_dsrl32, insn_dsubu, insn_eret,
-	insn_ext, insn_ins, insn_j, insn_jal, insn_jr, insn_ld, insn_ldx,
-	insn_ll, insn_lld, insn_lui, insn_lw, insn_lwx, insn_mfc0, insn_mfhi,
-	insn_mtc0, insn_or, insn_ori, insn_pref, insn_rfe, insn_rotr, insn_sc,
-	insn_scd, insn_sd, insn_sll, insn_sllv, insn_sra, insn_srl, insn_srlv,
-	insn_subu, insn_sw, insn_syscall, insn_tlbp, insn_tlbr, insn_tlbwi,
-	insn_tlbwr, insn_xor, insn_xori,
+	insn_ext, insn_ins, insn_j, insn_jal, insn_jalr, insn_jr, insn_ld,
+	insn_ldx, insn_ll, insn_lld, insn_lui, insn_lw, insn_lwx, insn_mfc0,
+	insn_mfhi, insn_mtc0, insn_or, insn_ori, insn_pref, insn_rfe, insn_rotr,
+	insn_sc, insn_scd, insn_sd, insn_sll, insn_sllv, insn_sra, insn_srl,
+	insn_srlv, insn_subu, insn_sw, insn_syscall, insn_tlbp, insn_tlbr,
+	insn_tlbwi, insn_tlbwr, insn_xor, insn_xori,
 };
 
 struct insn {
@@ -265,6 +265,7 @@ I_u2u1msbdu3(_ext)
 I_u2u1msbu3(_ins)
 I_u1(_j)
 I_u1(_jal)
+I_u2u1(_jalr)
 I_u1(_jr)
 I_u2s3u1(_ld)
 I_u2s3u1(_ll)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 08/14] MIPS: uasm: Add sltiu uasm instruction
@ 2014-04-08 11:47   ` Markos Chandras
  0 siblings, 0 replies; 53+ messages in thread
From: Markos Chandras @ 2014-04-08 11:47 UTC (permalink / raw)
  To: linux-mips; +Cc: Markos Chandras

It will be used later on by bpf-jit

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
---
 arch/mips/include/asm/uasm.h  | 1 +
 arch/mips/mm/uasm-micromips.c | 1 +
 arch/mips/mm/uasm-mips.c      | 1 +
 arch/mips/mm/uasm.c           | 7 ++++---
 4 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/mips/include/asm/uasm.h b/arch/mips/include/asm/uasm.h
index 61c8cce..340b34a 100644
--- a/arch/mips/include/asm/uasm.h
+++ b/arch/mips/include/asm/uasm.h
@@ -143,6 +143,7 @@ Ip_u2s3u1(_scd);
 Ip_u2s3u1(_sd);
 Ip_u2u1u3(_sll);
 Ip_u3u2u1(_sllv);
+Ip_u2u1s3(_sltiu);
 Ip_u2u1u3(_sra);
 Ip_u2u1u3(_srl);
 Ip_u3u2u1(_srlv);
diff --git a/arch/mips/mm/uasm-micromips.c b/arch/mips/mm/uasm-micromips.c
index a6f0686..929d887 100644
--- a/arch/mips/mm/uasm-micromips.c
+++ b/arch/mips/mm/uasm-micromips.c
@@ -98,6 +98,7 @@ static struct insn insn_table_MM[] = {
 	{ insn_sd, 0, 0 },
 	{ insn_sll, M(mm_pool32a_op, 0, 0, 0, 0, mm_sll32_op), RT | RS | RD },
 	{ insn_sllv, M(mm_pool32a_op, 0, 0, 0, 0, mm_sllv32_op), RT | RS | RD },
+	{ insn_sltiu, M(mm_sltiu32_op, 0, 0, 0, 0, 0), RT | RS | SIMM },
 	{ insn_sra, M(mm_pool32a_op, 0, 0, 0, 0, mm_sra_op), RT | RS | RD },
 	{ insn_srl, M(mm_pool32a_op, 0, 0, 0, 0, mm_srl32_op), RT | RS | RD },
 	{ insn_srlv, M(mm_pool32a_op, 0, 0, 0, 0, mm_srlv32_op), RT | RS | RD },
diff --git a/arch/mips/mm/uasm-mips.c b/arch/mips/mm/uasm-mips.c
index 6e295c2..d2a56a6 100644
--- a/arch/mips/mm/uasm-mips.c
+++ b/arch/mips/mm/uasm-mips.c
@@ -106,6 +106,7 @@ static struct insn insn_table[] = {
 	{ insn_sd,  M(sd_op, 0, 0, 0, 0, 0),  RS | RT | SIMM },
 	{ insn_sll,  M(spec_op, 0, 0, 0, 0, sll_op),  RT | RD | RE },
 	{ insn_sllv,  M(spec_op, 0, 0, 0, 0, sllv_op),  RS | RT | RD },
+	{ insn_sltiu, M(sltiu_op, 0, 0, 0, 0, 0), RS | RT | SIMM },
 	{ insn_sra,  M(spec_op, 0, 0, 0, 0, sra_op),  RT | RD | RE },
 	{ insn_srl,  M(spec_op, 0, 0, 0, 0, srl_op),  RT | RD | RE },
 	{ insn_srlv,  M(spec_op, 0, 0, 0, 0, srlv_op),  RS | RT | RD },
diff --git a/arch/mips/mm/uasm.c b/arch/mips/mm/uasm.c
index 87d233d..d451266 100644
--- a/arch/mips/mm/uasm.c
+++ b/arch/mips/mm/uasm.c
@@ -52,9 +52,9 @@ enum opcode {
 	insn_ext, insn_ins, insn_j, insn_jal, insn_jalr, insn_jr, insn_ld,
 	insn_ldx, insn_ll, insn_lld, insn_lui, insn_lw, insn_lwx, insn_mfc0,
 	insn_mfhi, insn_mtc0, insn_or, insn_ori, insn_pref, insn_rfe, insn_rotr,
-	insn_sc, insn_scd, insn_sd, insn_sll, insn_sllv, insn_sra, insn_srl,
-	insn_srlv, insn_subu, insn_sw, insn_syscall, insn_tlbp, insn_tlbr,
-	insn_tlbwi, insn_tlbwr, insn_xor, insn_xori,
+	insn_sc, insn_scd, insn_sd, insn_sll, insn_sllv, insn_sltiu, insn_sra,
+	insn_srl, insn_srlv, insn_subu, insn_sw, insn_syscall, insn_tlbp,
+	insn_tlbr, insn_tlbwi, insn_tlbwr, insn_xor, insn_xori,
 };
 
 struct insn {
@@ -283,6 +283,7 @@ I_u2s3u1(_scd)
 I_u2s3u1(_sd)
 I_u2u1u3(_sll)
 I_u3u2u1(_sllv)
+I_u2u1s3(_sltiu)
 I_u2u1u3(_sra)
 I_u2u1u3(_srl)
 I_u3u2u1(_srlv)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 08/14] MIPS: uasm: Add sltiu uasm instruction
@ 2014-04-08 11:47   ` Markos Chandras
  0 siblings, 0 replies; 53+ messages in thread
From: Markos Chandras @ 2014-04-08 11:47 UTC (permalink / raw)
  To: linux-mips; +Cc: Markos Chandras

It will be used later on by bpf-jit

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
---
 arch/mips/include/asm/uasm.h  | 1 +
 arch/mips/mm/uasm-micromips.c | 1 +
 arch/mips/mm/uasm-mips.c      | 1 +
 arch/mips/mm/uasm.c           | 7 ++++---
 4 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/mips/include/asm/uasm.h b/arch/mips/include/asm/uasm.h
index 61c8cce..340b34a 100644
--- a/arch/mips/include/asm/uasm.h
+++ b/arch/mips/include/asm/uasm.h
@@ -143,6 +143,7 @@ Ip_u2s3u1(_scd);
 Ip_u2s3u1(_sd);
 Ip_u2u1u3(_sll);
 Ip_u3u2u1(_sllv);
+Ip_u2u1s3(_sltiu);
 Ip_u2u1u3(_sra);
 Ip_u2u1u3(_srl);
 Ip_u3u2u1(_srlv);
diff --git a/arch/mips/mm/uasm-micromips.c b/arch/mips/mm/uasm-micromips.c
index a6f0686..929d887 100644
--- a/arch/mips/mm/uasm-micromips.c
+++ b/arch/mips/mm/uasm-micromips.c
@@ -98,6 +98,7 @@ static struct insn insn_table_MM[] = {
 	{ insn_sd, 0, 0 },
 	{ insn_sll, M(mm_pool32a_op, 0, 0, 0, 0, mm_sll32_op), RT | RS | RD },
 	{ insn_sllv, M(mm_pool32a_op, 0, 0, 0, 0, mm_sllv32_op), RT | RS | RD },
+	{ insn_sltiu, M(mm_sltiu32_op, 0, 0, 0, 0, 0), RT | RS | SIMM },
 	{ insn_sra, M(mm_pool32a_op, 0, 0, 0, 0, mm_sra_op), RT | RS | RD },
 	{ insn_srl, M(mm_pool32a_op, 0, 0, 0, 0, mm_srl32_op), RT | RS | RD },
 	{ insn_srlv, M(mm_pool32a_op, 0, 0, 0, 0, mm_srlv32_op), RT | RS | RD },
diff --git a/arch/mips/mm/uasm-mips.c b/arch/mips/mm/uasm-mips.c
index 6e295c2..d2a56a6 100644
--- a/arch/mips/mm/uasm-mips.c
+++ b/arch/mips/mm/uasm-mips.c
@@ -106,6 +106,7 @@ static struct insn insn_table[] = {
 	{ insn_sd,  M(sd_op, 0, 0, 0, 0, 0),  RS | RT | SIMM },
 	{ insn_sll,  M(spec_op, 0, 0, 0, 0, sll_op),  RT | RD | RE },
 	{ insn_sllv,  M(spec_op, 0, 0, 0, 0, sllv_op),  RS | RT | RD },
+	{ insn_sltiu, M(sltiu_op, 0, 0, 0, 0, 0), RS | RT | SIMM },
 	{ insn_sra,  M(spec_op, 0, 0, 0, 0, sra_op),  RT | RD | RE },
 	{ insn_srl,  M(spec_op, 0, 0, 0, 0, srl_op),  RT | RD | RE },
 	{ insn_srlv,  M(spec_op, 0, 0, 0, 0, srlv_op),  RS | RT | RD },
diff --git a/arch/mips/mm/uasm.c b/arch/mips/mm/uasm.c
index 87d233d..d451266 100644
--- a/arch/mips/mm/uasm.c
+++ b/arch/mips/mm/uasm.c
@@ -52,9 +52,9 @@ enum opcode {
 	insn_ext, insn_ins, insn_j, insn_jal, insn_jalr, insn_jr, insn_ld,
 	insn_ldx, insn_ll, insn_lld, insn_lui, insn_lw, insn_lwx, insn_mfc0,
 	insn_mfhi, insn_mtc0, insn_or, insn_ori, insn_pref, insn_rfe, insn_rotr,
-	insn_sc, insn_scd, insn_sd, insn_sll, insn_sllv, insn_sra, insn_srl,
-	insn_srlv, insn_subu, insn_sw, insn_syscall, insn_tlbp, insn_tlbr,
-	insn_tlbwi, insn_tlbwr, insn_xor, insn_xori,
+	insn_sc, insn_scd, insn_sd, insn_sll, insn_sllv, insn_sltiu, insn_sra,
+	insn_srl, insn_srlv, insn_subu, insn_sw, insn_syscall, insn_tlbp,
+	insn_tlbr, insn_tlbwi, insn_tlbwr, insn_xor, insn_xori,
 };
 
 struct insn {
@@ -283,6 +283,7 @@ I_u2s3u1(_scd)
 I_u2s3u1(_sd)
 I_u2u1u3(_sll)
 I_u3u2u1(_sllv)
+I_u2u1s3(_sltiu)
 I_u2u1u3(_sra)
 I_u2u1u3(_srl)
 I_u3u2u1(_srlv)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 09/14] MIPS: uasm: Add sltu uasm instruction
@ 2014-04-08 11:47   ` Markos Chandras
  0 siblings, 0 replies; 53+ messages in thread
From: Markos Chandras @ 2014-04-08 11:47 UTC (permalink / raw)
  To: linux-mips; +Cc: Markos Chandras

It will be used later on by bpf-jit

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
---
 arch/mips/include/asm/uasm.h      | 1 +
 arch/mips/include/uapi/asm/inst.h | 1 +
 arch/mips/mm/uasm-micromips.c     | 1 +
 arch/mips/mm/uasm-mips.c          | 1 +
 arch/mips/mm/uasm.c               | 7 ++++---
 5 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/mips/include/asm/uasm.h b/arch/mips/include/asm/uasm.h
index 340b34a..1f1d777 100644
--- a/arch/mips/include/asm/uasm.h
+++ b/arch/mips/include/asm/uasm.h
@@ -144,6 +144,7 @@ Ip_u2s3u1(_sd);
 Ip_u2u1u3(_sll);
 Ip_u3u2u1(_sllv);
 Ip_u2u1s3(_sltiu);
+Ip_u3u1u2(_sltu);
 Ip_u2u1u3(_sra);
 Ip_u2u1u3(_srl);
 Ip_u3u2u1(_srlv);
diff --git a/arch/mips/include/uapi/asm/inst.h b/arch/mips/include/uapi/asm/inst.h
index f46930f..da27bf7 100644
--- a/arch/mips/include/uapi/asm/inst.h
+++ b/arch/mips/include/uapi/asm/inst.h
@@ -257,6 +257,7 @@ enum mm_32a_minor_op {
 	mm_and_op = 0x250,
 	mm_or32_op = 0x290,
 	mm_xor32_op = 0x310,
+	mm_sltu_op = 0x390,
 };
 
 /*
diff --git a/arch/mips/mm/uasm-micromips.c b/arch/mips/mm/uasm-micromips.c
index 929d887..bc4e912 100644
--- a/arch/mips/mm/uasm-micromips.c
+++ b/arch/mips/mm/uasm-micromips.c
@@ -99,6 +99,7 @@ static struct insn insn_table_MM[] = {
 	{ insn_sll, M(mm_pool32a_op, 0, 0, 0, 0, mm_sll32_op), RT | RS | RD },
 	{ insn_sllv, M(mm_pool32a_op, 0, 0, 0, 0, mm_sllv32_op), RT | RS | RD },
 	{ insn_sltiu, M(mm_sltiu32_op, 0, 0, 0, 0, 0), RT | RS | SIMM },
+	{ insn_sltu, M(mm_pool32a_op, 0, 0, 0, 0, mm_sltu_op), RT | RS | RD },
 	{ insn_sra, M(mm_pool32a_op, 0, 0, 0, 0, mm_sra_op), RT | RS | RD },
 	{ insn_srl, M(mm_pool32a_op, 0, 0, 0, 0, mm_srl32_op), RT | RS | RD },
 	{ insn_srlv, M(mm_pool32a_op, 0, 0, 0, 0, mm_srlv32_op), RT | RS | RD },
diff --git a/arch/mips/mm/uasm-mips.c b/arch/mips/mm/uasm-mips.c
index d2a56a6..a135b3c 100644
--- a/arch/mips/mm/uasm-mips.c
+++ b/arch/mips/mm/uasm-mips.c
@@ -107,6 +107,7 @@ static struct insn insn_table[] = {
 	{ insn_sll,  M(spec_op, 0, 0, 0, 0, sll_op),  RT | RD | RE },
 	{ insn_sllv,  M(spec_op, 0, 0, 0, 0, sllv_op),  RS | RT | RD },
 	{ insn_sltiu, M(sltiu_op, 0, 0, 0, 0, 0), RS | RT | SIMM },
+	{ insn_sltu, M(spec_op, 0, 0, 0, 0, sltu_op), RS | RT | RD },
 	{ insn_sra,  M(spec_op, 0, 0, 0, 0, sra_op),  RT | RD | RE },
 	{ insn_srl,  M(spec_op, 0, 0, 0, 0, srl_op),  RT | RD | RE },
 	{ insn_srlv,  M(spec_op, 0, 0, 0, 0, srlv_op),  RS | RT | RD },
diff --git a/arch/mips/mm/uasm.c b/arch/mips/mm/uasm.c
index d451266..36658f6 100644
--- a/arch/mips/mm/uasm.c
+++ b/arch/mips/mm/uasm.c
@@ -52,9 +52,9 @@ enum opcode {
 	insn_ext, insn_ins, insn_j, insn_jal, insn_jalr, insn_jr, insn_ld,
 	insn_ldx, insn_ll, insn_lld, insn_lui, insn_lw, insn_lwx, insn_mfc0,
 	insn_mfhi, insn_mtc0, insn_or, insn_ori, insn_pref, insn_rfe, insn_rotr,
-	insn_sc, insn_scd, insn_sd, insn_sll, insn_sllv, insn_sltiu, insn_sra,
-	insn_srl, insn_srlv, insn_subu, insn_sw, insn_syscall, insn_tlbp,
-	insn_tlbr, insn_tlbwi, insn_tlbwr, insn_xor, insn_xori,
+	insn_sc, insn_scd, insn_sd, insn_sll, insn_sllv, insn_sltiu, insn_sltu,
+	insn_sra, insn_srl, insn_srlv, insn_subu, insn_sw, insn_syscall,
+	insn_tlbp, insn_tlbr, insn_tlbwi, insn_tlbwr, insn_xor, insn_xori,
 };
 
 struct insn {
@@ -284,6 +284,7 @@ I_u2s3u1(_sd)
 I_u2u1u3(_sll)
 I_u3u2u1(_sllv)
 I_u2u1s3(_sltiu)
+I_u3u1u2(_sltu)
 I_u2u1u3(_sra)
 I_u2u1u3(_srl)
 I_u3u2u1(_srlv)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 09/14] MIPS: uasm: Add sltu uasm instruction
@ 2014-04-08 11:47   ` Markos Chandras
  0 siblings, 0 replies; 53+ messages in thread
From: Markos Chandras @ 2014-04-08 11:47 UTC (permalink / raw)
  To: linux-mips; +Cc: Markos Chandras

It will be used later on by bpf-jit

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
---
 arch/mips/include/asm/uasm.h      | 1 +
 arch/mips/include/uapi/asm/inst.h | 1 +
 arch/mips/mm/uasm-micromips.c     | 1 +
 arch/mips/mm/uasm-mips.c          | 1 +
 arch/mips/mm/uasm.c               | 7 ++++---
 5 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/mips/include/asm/uasm.h b/arch/mips/include/asm/uasm.h
index 340b34a..1f1d777 100644
--- a/arch/mips/include/asm/uasm.h
+++ b/arch/mips/include/asm/uasm.h
@@ -144,6 +144,7 @@ Ip_u2s3u1(_sd);
 Ip_u2u1u3(_sll);
 Ip_u3u2u1(_sllv);
 Ip_u2u1s3(_sltiu);
+Ip_u3u1u2(_sltu);
 Ip_u2u1u3(_sra);
 Ip_u2u1u3(_srl);
 Ip_u3u2u1(_srlv);
diff --git a/arch/mips/include/uapi/asm/inst.h b/arch/mips/include/uapi/asm/inst.h
index f46930f..da27bf7 100644
--- a/arch/mips/include/uapi/asm/inst.h
+++ b/arch/mips/include/uapi/asm/inst.h
@@ -257,6 +257,7 @@ enum mm_32a_minor_op {
 	mm_and_op = 0x250,
 	mm_or32_op = 0x290,
 	mm_xor32_op = 0x310,
+	mm_sltu_op = 0x390,
 };
 
 /*
diff --git a/arch/mips/mm/uasm-micromips.c b/arch/mips/mm/uasm-micromips.c
index 929d887..bc4e912 100644
--- a/arch/mips/mm/uasm-micromips.c
+++ b/arch/mips/mm/uasm-micromips.c
@@ -99,6 +99,7 @@ static struct insn insn_table_MM[] = {
 	{ insn_sll, M(mm_pool32a_op, 0, 0, 0, 0, mm_sll32_op), RT | RS | RD },
 	{ insn_sllv, M(mm_pool32a_op, 0, 0, 0, 0, mm_sllv32_op), RT | RS | RD },
 	{ insn_sltiu, M(mm_sltiu32_op, 0, 0, 0, 0, 0), RT | RS | SIMM },
+	{ insn_sltu, M(mm_pool32a_op, 0, 0, 0, 0, mm_sltu_op), RT | RS | RD },
 	{ insn_sra, M(mm_pool32a_op, 0, 0, 0, 0, mm_sra_op), RT | RS | RD },
 	{ insn_srl, M(mm_pool32a_op, 0, 0, 0, 0, mm_srl32_op), RT | RS | RD },
 	{ insn_srlv, M(mm_pool32a_op, 0, 0, 0, 0, mm_srlv32_op), RT | RS | RD },
diff --git a/arch/mips/mm/uasm-mips.c b/arch/mips/mm/uasm-mips.c
index d2a56a6..a135b3c 100644
--- a/arch/mips/mm/uasm-mips.c
+++ b/arch/mips/mm/uasm-mips.c
@@ -107,6 +107,7 @@ static struct insn insn_table[] = {
 	{ insn_sll,  M(spec_op, 0, 0, 0, 0, sll_op),  RT | RD | RE },
 	{ insn_sllv,  M(spec_op, 0, 0, 0, 0, sllv_op),  RS | RT | RD },
 	{ insn_sltiu, M(sltiu_op, 0, 0, 0, 0, 0), RS | RT | SIMM },
+	{ insn_sltu, M(spec_op, 0, 0, 0, 0, sltu_op), RS | RT | RD },
 	{ insn_sra,  M(spec_op, 0, 0, 0, 0, sra_op),  RT | RD | RE },
 	{ insn_srl,  M(spec_op, 0, 0, 0, 0, srl_op),  RT | RD | RE },
 	{ insn_srlv,  M(spec_op, 0, 0, 0, 0, srlv_op),  RS | RT | RD },
diff --git a/arch/mips/mm/uasm.c b/arch/mips/mm/uasm.c
index d451266..36658f6 100644
--- a/arch/mips/mm/uasm.c
+++ b/arch/mips/mm/uasm.c
@@ -52,9 +52,9 @@ enum opcode {
 	insn_ext, insn_ins, insn_j, insn_jal, insn_jalr, insn_jr, insn_ld,
 	insn_ldx, insn_ll, insn_lld, insn_lui, insn_lw, insn_lwx, insn_mfc0,
 	insn_mfhi, insn_mtc0, insn_or, insn_ori, insn_pref, insn_rfe, insn_rotr,
-	insn_sc, insn_scd, insn_sd, insn_sll, insn_sllv, insn_sltiu, insn_sra,
-	insn_srl, insn_srlv, insn_subu, insn_sw, insn_syscall, insn_tlbp,
-	insn_tlbr, insn_tlbwi, insn_tlbwr, insn_xor, insn_xori,
+	insn_sc, insn_scd, insn_sd, insn_sll, insn_sllv, insn_sltiu, insn_sltu,
+	insn_sra, insn_srl, insn_srlv, insn_subu, insn_sw, insn_syscall,
+	insn_tlbp, insn_tlbr, insn_tlbwi, insn_tlbwr, insn_xor, insn_xori,
 };
 
 struct insn {
@@ -284,6 +284,7 @@ I_u2s3u1(_sd)
 I_u2u1u3(_sll)
 I_u3u2u1(_sllv)
 I_u2u1s3(_sltiu)
+I_u3u1u2(_sltu)
 I_u2u1u3(_sra)
 I_u2u1u3(_srl)
 I_u3u2u1(_srlv)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 10/14] MIPS: uasm: Add wsbh uasm instruction
@ 2014-04-08 11:47   ` Markos Chandras
  0 siblings, 0 replies; 53+ messages in thread
From: Markos Chandras @ 2014-04-08 11:47 UTC (permalink / raw)
  To: linux-mips; +Cc: Markos Chandras

It will be used later on by bpf-jit

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
---
 arch/mips/include/asm/uasm.h      |  1 +
 arch/mips/include/uapi/asm/inst.h | 11 +++++++++++
 arch/mips/mm/uasm-micromips.c     |  1 +
 arch/mips/mm/uasm-mips.c          |  1 +
 arch/mips/mm/uasm.c               |  4 +++-
 5 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/mips/include/asm/uasm.h b/arch/mips/include/asm/uasm.h
index 1f1d777..e1258f3 100644
--- a/arch/mips/include/asm/uasm.h
+++ b/arch/mips/include/asm/uasm.h
@@ -155,6 +155,7 @@ Ip_0(_tlbp);
 Ip_0(_tlbr);
 Ip_0(_tlbwi);
 Ip_0(_tlbwr);
+Ip_u2u1(_wsbh);
 Ip_u3u1u2(_xor);
 Ip_u2u1u3(_xori);
 
diff --git a/arch/mips/include/uapi/asm/inst.h b/arch/mips/include/uapi/asm/inst.h
index da27bf7..c98b061 100644
--- a/arch/mips/include/uapi/asm/inst.h
+++ b/arch/mips/include/uapi/asm/inst.h
@@ -202,6 +202,16 @@ enum lx_func {
 };
 
 /*
+ * BSHFL opcodes
+ */
+enum bshfl_func {
+	wsbh_op = 0x2,
+	dshd_op = 0x5,
+	seb_op  = 0x10,
+	seh_op  = 0x18,
+};
+
+/*
  * (microMIPS) Major opcodes.
  */
 enum mm_major_op {
@@ -254,6 +264,7 @@ enum mm_32a_minor_op {
 	mm_lwxs_op = 0x118,
 	mm_addu32_op = 0x150,
 	mm_subu32_op = 0x1d0,
+	mm_wsbh_op = 0x1ec,
 	mm_and_op = 0x250,
 	mm_or32_op = 0x290,
 	mm_xor32_op = 0x310,
diff --git a/arch/mips/mm/uasm-micromips.c b/arch/mips/mm/uasm-micromips.c
index bc4e912..528526a 100644
--- a/arch/mips/mm/uasm-micromips.c
+++ b/arch/mips/mm/uasm-micromips.c
@@ -110,6 +110,7 @@ static struct insn insn_table_MM[] = {
 	{ insn_tlbr, M(mm_pool32a_op, 0, 0, 0, mm_tlbr_op, mm_pool32axf_op), 0 },
 	{ insn_tlbwi, M(mm_pool32a_op, 0, 0, 0, mm_tlbwi_op, mm_pool32axf_op), 0 },
 	{ insn_tlbwr, M(mm_pool32a_op, 0, 0, 0, mm_tlbwr_op, mm_pool32axf_op), 0 },
+	{ insn_wsbh, M(mm_pool32a_op, 0, 0, 0, mm_wsbh_op, mm_pool32axf_op), RT | RS },
 	{ insn_xor, M(mm_pool32a_op, 0, 0, 0, 0, mm_xor32_op), RT | RS | RD },
 	{ insn_xori, M(mm_xori32_op, 0, 0, 0, 0, 0), RT | RS | UIMM },
 	{ insn_dins, 0, 0 },
diff --git a/arch/mips/mm/uasm-mips.c b/arch/mips/mm/uasm-mips.c
index a135b3c..a1845dc 100644
--- a/arch/mips/mm/uasm-mips.c
+++ b/arch/mips/mm/uasm-mips.c
@@ -118,6 +118,7 @@ static struct insn insn_table[] = {
 	{ insn_tlbr,  M(cop0_op, cop_op, 0, 0, 0, tlbr_op),  0 },
 	{ insn_tlbwi,  M(cop0_op, cop_op, 0, 0, 0, tlbwi_op),  0 },
 	{ insn_tlbwr,  M(cop0_op, cop_op, 0, 0, 0, tlbwr_op),  0 },
+	{ insn_wsbh, M(spec3_op, 0, 0, 0, wsbh_op, bshfl_op), RT | RD },
 	{ insn_xori,  M(xori_op, 0, 0, 0, 0, 0),  RS | RT | UIMM },
 	{ insn_xor,  M(spec_op, 0, 0, 0, 0, xor_op),  RS | RT | RD },
 	{ insn_invalid, 0, 0 }
diff --git a/arch/mips/mm/uasm.c b/arch/mips/mm/uasm.c
index 36658f6..d816a3a 100644
--- a/arch/mips/mm/uasm.c
+++ b/arch/mips/mm/uasm.c
@@ -54,7 +54,8 @@ enum opcode {
 	insn_mfhi, insn_mtc0, insn_or, insn_ori, insn_pref, insn_rfe, insn_rotr,
 	insn_sc, insn_scd, insn_sd, insn_sll, insn_sllv, insn_sltiu, insn_sltu,
 	insn_sra, insn_srl, insn_srlv, insn_subu, insn_sw, insn_syscall,
-	insn_tlbp, insn_tlbr, insn_tlbwi, insn_tlbwr, insn_xor, insn_xori,
+	insn_tlbp, insn_tlbr, insn_tlbwi, insn_tlbwr, insn_wsbh, insn_xor,
+	insn_xori,
 };
 
 struct insn {
@@ -295,6 +296,7 @@ I_0(_tlbp)
 I_0(_tlbr)
 I_0(_tlbwi)
 I_0(_tlbwr)
+I_u2u1(_wsbh)
 I_u3u1u2(_xor)
 I_u2u1u3(_xori)
 I_u2u1msbu3(_dins);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 10/14] MIPS: uasm: Add wsbh uasm instruction
@ 2014-04-08 11:47   ` Markos Chandras
  0 siblings, 0 replies; 53+ messages in thread
From: Markos Chandras @ 2014-04-08 11:47 UTC (permalink / raw)
  To: linux-mips; +Cc: Markos Chandras

It will be used later on by bpf-jit

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
---
 arch/mips/include/asm/uasm.h      |  1 +
 arch/mips/include/uapi/asm/inst.h | 11 +++++++++++
 arch/mips/mm/uasm-micromips.c     |  1 +
 arch/mips/mm/uasm-mips.c          |  1 +
 arch/mips/mm/uasm.c               |  4 +++-
 5 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/mips/include/asm/uasm.h b/arch/mips/include/asm/uasm.h
index 1f1d777..e1258f3 100644
--- a/arch/mips/include/asm/uasm.h
+++ b/arch/mips/include/asm/uasm.h
@@ -155,6 +155,7 @@ Ip_0(_tlbp);
 Ip_0(_tlbr);
 Ip_0(_tlbwi);
 Ip_0(_tlbwr);
+Ip_u2u1(_wsbh);
 Ip_u3u1u2(_xor);
 Ip_u2u1u3(_xori);
 
diff --git a/arch/mips/include/uapi/asm/inst.h b/arch/mips/include/uapi/asm/inst.h
index da27bf7..c98b061 100644
--- a/arch/mips/include/uapi/asm/inst.h
+++ b/arch/mips/include/uapi/asm/inst.h
@@ -202,6 +202,16 @@ enum lx_func {
 };
 
 /*
+ * BSHFL opcodes
+ */
+enum bshfl_func {
+	wsbh_op = 0x2,
+	dshd_op = 0x5,
+	seb_op  = 0x10,
+	seh_op  = 0x18,
+};
+
+/*
  * (microMIPS) Major opcodes.
  */
 enum mm_major_op {
@@ -254,6 +264,7 @@ enum mm_32a_minor_op {
 	mm_lwxs_op = 0x118,
 	mm_addu32_op = 0x150,
 	mm_subu32_op = 0x1d0,
+	mm_wsbh_op = 0x1ec,
 	mm_and_op = 0x250,
 	mm_or32_op = 0x290,
 	mm_xor32_op = 0x310,
diff --git a/arch/mips/mm/uasm-micromips.c b/arch/mips/mm/uasm-micromips.c
index bc4e912..528526a 100644
--- a/arch/mips/mm/uasm-micromips.c
+++ b/arch/mips/mm/uasm-micromips.c
@@ -110,6 +110,7 @@ static struct insn insn_table_MM[] = {
 	{ insn_tlbr, M(mm_pool32a_op, 0, 0, 0, mm_tlbr_op, mm_pool32axf_op), 0 },
 	{ insn_tlbwi, M(mm_pool32a_op, 0, 0, 0, mm_tlbwi_op, mm_pool32axf_op), 0 },
 	{ insn_tlbwr, M(mm_pool32a_op, 0, 0, 0, mm_tlbwr_op, mm_pool32axf_op), 0 },
+	{ insn_wsbh, M(mm_pool32a_op, 0, 0, 0, mm_wsbh_op, mm_pool32axf_op), RT | RS },
 	{ insn_xor, M(mm_pool32a_op, 0, 0, 0, 0, mm_xor32_op), RT | RS | RD },
 	{ insn_xori, M(mm_xori32_op, 0, 0, 0, 0, 0), RT | RS | UIMM },
 	{ insn_dins, 0, 0 },
diff --git a/arch/mips/mm/uasm-mips.c b/arch/mips/mm/uasm-mips.c
index a135b3c..a1845dc 100644
--- a/arch/mips/mm/uasm-mips.c
+++ b/arch/mips/mm/uasm-mips.c
@@ -118,6 +118,7 @@ static struct insn insn_table[] = {
 	{ insn_tlbr,  M(cop0_op, cop_op, 0, 0, 0, tlbr_op),  0 },
 	{ insn_tlbwi,  M(cop0_op, cop_op, 0, 0, 0, tlbwi_op),  0 },
 	{ insn_tlbwr,  M(cop0_op, cop_op, 0, 0, 0, tlbwr_op),  0 },
+	{ insn_wsbh, M(spec3_op, 0, 0, 0, wsbh_op, bshfl_op), RT | RD },
 	{ insn_xori,  M(xori_op, 0, 0, 0, 0, 0),  RS | RT | UIMM },
 	{ insn_xor,  M(spec_op, 0, 0, 0, 0, xor_op),  RS | RT | RD },
 	{ insn_invalid, 0, 0 }
diff --git a/arch/mips/mm/uasm.c b/arch/mips/mm/uasm.c
index 36658f6..d816a3a 100644
--- a/arch/mips/mm/uasm.c
+++ b/arch/mips/mm/uasm.c
@@ -54,7 +54,8 @@ enum opcode {
 	insn_mfhi, insn_mtc0, insn_or, insn_ori, insn_pref, insn_rfe, insn_rotr,
 	insn_sc, insn_scd, insn_sd, insn_sll, insn_sllv, insn_sltiu, insn_sltu,
 	insn_sra, insn_srl, insn_srlv, insn_subu, insn_sw, insn_syscall,
-	insn_tlbp, insn_tlbr, insn_tlbwi, insn_tlbwr, insn_xor, insn_xori,
+	insn_tlbp, insn_tlbr, insn_tlbwi, insn_tlbwr, insn_wsbh, insn_xor,
+	insn_xori,
 };
 
 struct insn {
@@ -295,6 +296,7 @@ I_0(_tlbp)
 I_0(_tlbr)
 I_0(_tlbwi)
 I_0(_tlbwr)
+I_u2u1(_wsbh)
 I_u3u1u2(_xor)
 I_u2u1u3(_xori)
 I_u2u1msbu3(_dins);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 11/14] MIPS: uasm: Add lh uam instruction
@ 2014-04-08 11:47   ` Markos Chandras
  0 siblings, 0 replies; 53+ messages in thread
From: Markos Chandras @ 2014-04-08 11:47 UTC (permalink / raw)
  To: linux-mips; +Cc: Markos Chandras

It will be used later on by bpf-jit

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
---
 arch/mips/include/asm/uasm.h  |  1 +
 arch/mips/mm/uasm-micromips.c |  1 +
 arch/mips/mm/uasm-mips.c      |  1 +
 arch/mips/mm/uasm.c           | 13 +++++++------
 4 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/mips/include/asm/uasm.h b/arch/mips/include/asm/uasm.h
index e1258f3..ec11524 100644
--- a/arch/mips/include/asm/uasm.h
+++ b/arch/mips/include/asm/uasm.h
@@ -125,6 +125,7 @@ Ip_u2u1(_jalr);
 Ip_u1(_jr);
 Ip_u2s3u1(_ld);
 Ip_u3u1u2(_ldx);
+Ip_u2s3u1(_lh);
 Ip_u2s3u1(_ll);
 Ip_u2s3u1(_lld);
 Ip_u1s2(_lui);
diff --git a/arch/mips/mm/uasm-micromips.c b/arch/mips/mm/uasm-micromips.c
index 528526a..89df379 100644
--- a/arch/mips/mm/uasm-micromips.c
+++ b/arch/mips/mm/uasm-micromips.c
@@ -82,6 +82,7 @@ static struct insn insn_table_MM[] = {
 	{ insn_jalr, M(mm_pool32a_op, 0, 0, 0, mm_jalr_op, mm_pool32axf_op), RT | RS },
 	{ insn_jr, M(mm_pool32a_op, 0, 0, 0, mm_jalr_op, mm_pool32axf_op), RS },
 	{ insn_ld, 0, 0 },
+	{ insn_lh, M(mm_lh32_op, 0, 0, 0, 0, 0), RS | RS | SIMM },
 	{ insn_ll, M(mm_pool32c_op, 0, 0, (mm_ll_func << 1), 0, 0), RS | RT | SIMM },
 	{ insn_lld, 0, 0 },
 	{ insn_lui, M(mm_pool32i_op, mm_lui_op, 0, 0, 0, 0), RS | SIMM },
diff --git a/arch/mips/mm/uasm-mips.c b/arch/mips/mm/uasm-mips.c
index a1845dc..576ae34 100644
--- a/arch/mips/mm/uasm-mips.c
+++ b/arch/mips/mm/uasm-mips.c
@@ -88,6 +88,7 @@ static struct insn insn_table[] = {
 	{ insn_jr,  M(spec_op, 0, 0, 0, 0, jr_op),  RS },
 	{ insn_ld,  M(ld_op, 0, 0, 0, 0, 0),  RS | RT | SIMM },
 	{ insn_ldx, M(spec3_op, 0, 0, 0, ldx_op, lx_op), RS | RT | RD },
+	{ insn_lh,  M(lw_op, 0, 0, 0, 0, 0),  RS | RT | SIMM },
 	{ insn_lld,  M(lld_op, 0, 0, 0, 0, 0),	RS | RT | SIMM },
 	{ insn_ll,  M(ll_op, 0, 0, 0, 0, 0),  RS | RT | SIMM },
 	{ insn_lui,  M(lui_op, 0, 0, 0, 0, 0),	RT | SIMM },
diff --git a/arch/mips/mm/uasm.c b/arch/mips/mm/uasm.c
index d816a3a..96f8b5b 100644
--- a/arch/mips/mm/uasm.c
+++ b/arch/mips/mm/uasm.c
@@ -50,12 +50,12 @@ enum opcode {
 	insn_divu, insn_dmfc0, insn_dmtc0, insn_drotr, insn_drotr32, insn_dsll,
 	insn_dsll32, insn_dsra, insn_dsrl, insn_dsrl32, insn_dsubu, insn_eret,
 	insn_ext, insn_ins, insn_j, insn_jal, insn_jalr, insn_jr, insn_ld,
-	insn_ldx, insn_ll, insn_lld, insn_lui, insn_lw, insn_lwx, insn_mfc0,
-	insn_mfhi, insn_mtc0, insn_or, insn_ori, insn_pref, insn_rfe, insn_rotr,
-	insn_sc, insn_scd, insn_sd, insn_sll, insn_sllv, insn_sltiu, insn_sltu,
-	insn_sra, insn_srl, insn_srlv, insn_subu, insn_sw, insn_syscall,
-	insn_tlbp, insn_tlbr, insn_tlbwi, insn_tlbwr, insn_wsbh, insn_xor,
-	insn_xori,
+	insn_ldx, insn_lh, insn_ll, insn_lld, insn_lui, insn_lw, insn_lwx,
+	insn_mfc0, insn_mfhi, insn_mtc0, insn_or, insn_ori, insn_pref,
+	insn_rfe, insn_rotr, insn_sc, insn_scd, insn_sd, insn_sll, insn_sllv,
+	insn_sltiu, insn_sltu, insn_sra, insn_srl, insn_srlv, insn_subu,
+	insn_sw, insn_syscall, insn_tlbp, insn_tlbr, insn_tlbwi, insn_tlbwr,
+	insn_wsbh, insn_xor, insn_xori,
 };
 
 struct insn {
@@ -269,6 +269,7 @@ I_u1(_jal)
 I_u2u1(_jalr)
 I_u1(_jr)
 I_u2s3u1(_ld)
+I_u2s3u1(_lh)
 I_u2s3u1(_ll)
 I_u2s3u1(_lld)
 I_u1s2(_lui)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 11/14] MIPS: uasm: Add lh uam instruction
@ 2014-04-08 11:47   ` Markos Chandras
  0 siblings, 0 replies; 53+ messages in thread
From: Markos Chandras @ 2014-04-08 11:47 UTC (permalink / raw)
  To: linux-mips; +Cc: Markos Chandras

It will be used later on by bpf-jit

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
---
 arch/mips/include/asm/uasm.h  |  1 +
 arch/mips/mm/uasm-micromips.c |  1 +
 arch/mips/mm/uasm-mips.c      |  1 +
 arch/mips/mm/uasm.c           | 13 +++++++------
 4 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/mips/include/asm/uasm.h b/arch/mips/include/asm/uasm.h
index e1258f3..ec11524 100644
--- a/arch/mips/include/asm/uasm.h
+++ b/arch/mips/include/asm/uasm.h
@@ -125,6 +125,7 @@ Ip_u2u1(_jalr);
 Ip_u1(_jr);
 Ip_u2s3u1(_ld);
 Ip_u3u1u2(_ldx);
+Ip_u2s3u1(_lh);
 Ip_u2s3u1(_ll);
 Ip_u2s3u1(_lld);
 Ip_u1s2(_lui);
diff --git a/arch/mips/mm/uasm-micromips.c b/arch/mips/mm/uasm-micromips.c
index 528526a..89df379 100644
--- a/arch/mips/mm/uasm-micromips.c
+++ b/arch/mips/mm/uasm-micromips.c
@@ -82,6 +82,7 @@ static struct insn insn_table_MM[] = {
 	{ insn_jalr, M(mm_pool32a_op, 0, 0, 0, mm_jalr_op, mm_pool32axf_op), RT | RS },
 	{ insn_jr, M(mm_pool32a_op, 0, 0, 0, mm_jalr_op, mm_pool32axf_op), RS },
 	{ insn_ld, 0, 0 },
+	{ insn_lh, M(mm_lh32_op, 0, 0, 0, 0, 0), RS | RS | SIMM },
 	{ insn_ll, M(mm_pool32c_op, 0, 0, (mm_ll_func << 1), 0, 0), RS | RT | SIMM },
 	{ insn_lld, 0, 0 },
 	{ insn_lui, M(mm_pool32i_op, mm_lui_op, 0, 0, 0, 0), RS | SIMM },
diff --git a/arch/mips/mm/uasm-mips.c b/arch/mips/mm/uasm-mips.c
index a1845dc..576ae34 100644
--- a/arch/mips/mm/uasm-mips.c
+++ b/arch/mips/mm/uasm-mips.c
@@ -88,6 +88,7 @@ static struct insn insn_table[] = {
 	{ insn_jr,  M(spec_op, 0, 0, 0, 0, jr_op),  RS },
 	{ insn_ld,  M(ld_op, 0, 0, 0, 0, 0),  RS | RT | SIMM },
 	{ insn_ldx, M(spec3_op, 0, 0, 0, ldx_op, lx_op), RS | RT | RD },
+	{ insn_lh,  M(lw_op, 0, 0, 0, 0, 0),  RS | RT | SIMM },
 	{ insn_lld,  M(lld_op, 0, 0, 0, 0, 0),	RS | RT | SIMM },
 	{ insn_ll,  M(ll_op, 0, 0, 0, 0, 0),  RS | RT | SIMM },
 	{ insn_lui,  M(lui_op, 0, 0, 0, 0, 0),	RT | SIMM },
diff --git a/arch/mips/mm/uasm.c b/arch/mips/mm/uasm.c
index d816a3a..96f8b5b 100644
--- a/arch/mips/mm/uasm.c
+++ b/arch/mips/mm/uasm.c
@@ -50,12 +50,12 @@ enum opcode {
 	insn_divu, insn_dmfc0, insn_dmtc0, insn_drotr, insn_drotr32, insn_dsll,
 	insn_dsll32, insn_dsra, insn_dsrl, insn_dsrl32, insn_dsubu, insn_eret,
 	insn_ext, insn_ins, insn_j, insn_jal, insn_jalr, insn_jr, insn_ld,
-	insn_ldx, insn_ll, insn_lld, insn_lui, insn_lw, insn_lwx, insn_mfc0,
-	insn_mfhi, insn_mtc0, insn_or, insn_ori, insn_pref, insn_rfe, insn_rotr,
-	insn_sc, insn_scd, insn_sd, insn_sll, insn_sllv, insn_sltiu, insn_sltu,
-	insn_sra, insn_srl, insn_srlv, insn_subu, insn_sw, insn_syscall,
-	insn_tlbp, insn_tlbr, insn_tlbwi, insn_tlbwr, insn_wsbh, insn_xor,
-	insn_xori,
+	insn_ldx, insn_lh, insn_ll, insn_lld, insn_lui, insn_lw, insn_lwx,
+	insn_mfc0, insn_mfhi, insn_mtc0, insn_or, insn_ori, insn_pref,
+	insn_rfe, insn_rotr, insn_sc, insn_scd, insn_sd, insn_sll, insn_sllv,
+	insn_sltiu, insn_sltu, insn_sra, insn_srl, insn_srlv, insn_subu,
+	insn_sw, insn_syscall, insn_tlbp, insn_tlbr, insn_tlbwi, insn_tlbwr,
+	insn_wsbh, insn_xor, insn_xori,
 };
 
 struct insn {
@@ -269,6 +269,7 @@ I_u1(_jal)
 I_u2u1(_jalr)
 I_u1(_jr)
 I_u2s3u1(_ld)
+I_u2s3u1(_lh)
 I_u2s3u1(_ll)
 I_u2s3u1(_lld)
 I_u1s2(_lui)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 12/14] MIPS: uasm: Add mul uasm instruction
@ 2014-04-08 11:47   ` Markos Chandras
  0 siblings, 0 replies; 53+ messages in thread
From: Markos Chandras @ 2014-04-08 11:47 UTC (permalink / raw)
  To: linux-mips; +Cc: Markos Chandras

It will be used later on by bpf-jit

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
---
 arch/mips/include/asm/uasm.h      |  1 +
 arch/mips/include/uapi/asm/inst.h |  1 +
 arch/mips/mm/uasm-micromips.c     |  1 +
 arch/mips/mm/uasm-mips.c          |  1 +
 arch/mips/mm/uasm.c               | 11 ++++++-----
 5 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/mips/include/asm/uasm.h b/arch/mips/include/asm/uasm.h
index ec11524..45af01b 100644
--- a/arch/mips/include/asm/uasm.h
+++ b/arch/mips/include/asm/uasm.h
@@ -134,6 +134,7 @@ Ip_u3u1u2(_lwx);
 Ip_u1u2u3(_mfc0);
 Ip_u1(_mfhi);
 Ip_u1u2u3(_mtc0);
+Ip_u3u1u2(_mul);
 Ip_u3u1u2(_or);
 Ip_u2u1u3(_ori);
 Ip_u2s3u1(_pref);
diff --git a/arch/mips/include/uapi/asm/inst.h b/arch/mips/include/uapi/asm/inst.h
index c98b061..7700284 100644
--- a/arch/mips/include/uapi/asm/inst.h
+++ b/arch/mips/include/uapi/asm/inst.h
@@ -265,6 +265,7 @@ enum mm_32a_minor_op {
 	mm_addu32_op = 0x150,
 	mm_subu32_op = 0x1d0,
 	mm_wsbh_op = 0x1ec,
+	mm_mul_op = 0x210,
 	mm_and_op = 0x250,
 	mm_or32_op = 0x290,
 	mm_xor32_op = 0x310,
diff --git a/arch/mips/mm/uasm-micromips.c b/arch/mips/mm/uasm-micromips.c
index 89df379..f47fa6e 100644
--- a/arch/mips/mm/uasm-micromips.c
+++ b/arch/mips/mm/uasm-micromips.c
@@ -90,6 +90,7 @@ static struct insn insn_table_MM[] = {
 	{ insn_mfc0, M(mm_pool32a_op, 0, 0, 0, mm_mfc0_op, mm_pool32axf_op), RT | RS | RD },
 	{ insn_mfhi, M(mm_pool32a_op, 0, 0, 0, mm_mfhi32_op, mm_pool32axf_op), RS },
 	{ insn_mtc0, M(mm_pool32a_op, 0, 0, 0, mm_mtc0_op, mm_pool32axf_op), RT | RS | RD },
+	{ insn_mul, M(mm_pool32a_op, 0, 0, 0, 0, mm_mul_op), RT | RS | RD },
 	{ insn_or, M(mm_pool32a_op, 0, 0, 0, 0, mm_or32_op), RT | RS | RD },
 	{ insn_ori, M(mm_ori32_op, 0, 0, 0, 0, 0), RT | RS | UIMM },
 	{ insn_pref, M(mm_pool32c_op, 0, 0, (mm_pref_func << 1), 0, 0), RT | RS | SIMM },
diff --git a/arch/mips/mm/uasm-mips.c b/arch/mips/mm/uasm-mips.c
index 576ae34..0f66284 100644
--- a/arch/mips/mm/uasm-mips.c
+++ b/arch/mips/mm/uasm-mips.c
@@ -97,6 +97,7 @@ static struct insn insn_table[] = {
 	{ insn_mfc0,  M(cop0_op, mfc_op, 0, 0, 0, 0),  RT | RD | SET},
 	{ insn_mfhi,  M(spec_op, 0, 0, 0, 0, mfhi_op),  RT | RD | SET},
 	{ insn_mtc0,  M(cop0_op, mtc_op, 0, 0, 0, 0),  RT | RD | SET},
+	{ insn_mul, M(spec2_op, 0, 0, 0, 0, mul_op), RS | RT | RD},
 	{ insn_ori,  M(ori_op, 0, 0, 0, 0, 0),	RS | RT | UIMM },
 	{ insn_or,  M(spec_op, 0, 0, 0, 0, or_op),  RS | RT | RD },
 	{ insn_pref,  M(pref_op, 0, 0, 0, 0, 0),  RS | RT | SIMM },
diff --git a/arch/mips/mm/uasm.c b/arch/mips/mm/uasm.c
index 96f8b5b..5b15ce7 100644
--- a/arch/mips/mm/uasm.c
+++ b/arch/mips/mm/uasm.c
@@ -51,11 +51,11 @@ enum opcode {
 	insn_dsll32, insn_dsra, insn_dsrl, insn_dsrl32, insn_dsubu, insn_eret,
 	insn_ext, insn_ins, insn_j, insn_jal, insn_jalr, insn_jr, insn_ld,
 	insn_ldx, insn_lh, insn_ll, insn_lld, insn_lui, insn_lw, insn_lwx,
-	insn_mfc0, insn_mfhi, insn_mtc0, insn_or, insn_ori, insn_pref,
-	insn_rfe, insn_rotr, insn_sc, insn_scd, insn_sd, insn_sll, insn_sllv,
-	insn_sltiu, insn_sltu, insn_sra, insn_srl, insn_srlv, insn_subu,
-	insn_sw, insn_syscall, insn_tlbp, insn_tlbr, insn_tlbwi, insn_tlbwr,
-	insn_wsbh, insn_xor, insn_xori,
+	insn_mfc0, insn_mfhi, insn_mtc0, insn_mul, insn_or, insn_ori,
+	insn_pref, insn_rfe, insn_rotr, insn_sc, insn_scd, insn_sd, insn_sll,
+	insn_sllv, insn_sltiu, insn_sltu, insn_sra, insn_srl, insn_srlv,
+	insn_subu, insn_sw, insn_syscall, insn_tlbp, insn_tlbr, insn_tlbwi,
+	insn_tlbwr, insn_wsbh, insn_xor, insn_xori,
 };
 
 struct insn {
@@ -277,6 +277,7 @@ I_u2s3u1(_lw)
 I_u1u2u3(_mfc0)
 I_u1(_mfhi)
 I_u1u2u3(_mtc0)
+I_u3u1u2(_mul)
 I_u2u1u3(_ori)
 I_u3u1u2(_or)
 I_0(_rfe)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 12/14] MIPS: uasm: Add mul uasm instruction
@ 2014-04-08 11:47   ` Markos Chandras
  0 siblings, 0 replies; 53+ messages in thread
From: Markos Chandras @ 2014-04-08 11:47 UTC (permalink / raw)
  To: linux-mips; +Cc: Markos Chandras

It will be used later on by bpf-jit

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
---
 arch/mips/include/asm/uasm.h      |  1 +
 arch/mips/include/uapi/asm/inst.h |  1 +
 arch/mips/mm/uasm-micromips.c     |  1 +
 arch/mips/mm/uasm-mips.c          |  1 +
 arch/mips/mm/uasm.c               | 11 ++++++-----
 5 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/mips/include/asm/uasm.h b/arch/mips/include/asm/uasm.h
index ec11524..45af01b 100644
--- a/arch/mips/include/asm/uasm.h
+++ b/arch/mips/include/asm/uasm.h
@@ -134,6 +134,7 @@ Ip_u3u1u2(_lwx);
 Ip_u1u2u3(_mfc0);
 Ip_u1(_mfhi);
 Ip_u1u2u3(_mtc0);
+Ip_u3u1u2(_mul);
 Ip_u3u1u2(_or);
 Ip_u2u1u3(_ori);
 Ip_u2s3u1(_pref);
diff --git a/arch/mips/include/uapi/asm/inst.h b/arch/mips/include/uapi/asm/inst.h
index c98b061..7700284 100644
--- a/arch/mips/include/uapi/asm/inst.h
+++ b/arch/mips/include/uapi/asm/inst.h
@@ -265,6 +265,7 @@ enum mm_32a_minor_op {
 	mm_addu32_op = 0x150,
 	mm_subu32_op = 0x1d0,
 	mm_wsbh_op = 0x1ec,
+	mm_mul_op = 0x210,
 	mm_and_op = 0x250,
 	mm_or32_op = 0x290,
 	mm_xor32_op = 0x310,
diff --git a/arch/mips/mm/uasm-micromips.c b/arch/mips/mm/uasm-micromips.c
index 89df379..f47fa6e 100644
--- a/arch/mips/mm/uasm-micromips.c
+++ b/arch/mips/mm/uasm-micromips.c
@@ -90,6 +90,7 @@ static struct insn insn_table_MM[] = {
 	{ insn_mfc0, M(mm_pool32a_op, 0, 0, 0, mm_mfc0_op, mm_pool32axf_op), RT | RS | RD },
 	{ insn_mfhi, M(mm_pool32a_op, 0, 0, 0, mm_mfhi32_op, mm_pool32axf_op), RS },
 	{ insn_mtc0, M(mm_pool32a_op, 0, 0, 0, mm_mtc0_op, mm_pool32axf_op), RT | RS | RD },
+	{ insn_mul, M(mm_pool32a_op, 0, 0, 0, 0, mm_mul_op), RT | RS | RD },
 	{ insn_or, M(mm_pool32a_op, 0, 0, 0, 0, mm_or32_op), RT | RS | RD },
 	{ insn_ori, M(mm_ori32_op, 0, 0, 0, 0, 0), RT | RS | UIMM },
 	{ insn_pref, M(mm_pool32c_op, 0, 0, (mm_pref_func << 1), 0, 0), RT | RS | SIMM },
diff --git a/arch/mips/mm/uasm-mips.c b/arch/mips/mm/uasm-mips.c
index 576ae34..0f66284 100644
--- a/arch/mips/mm/uasm-mips.c
+++ b/arch/mips/mm/uasm-mips.c
@@ -97,6 +97,7 @@ static struct insn insn_table[] = {
 	{ insn_mfc0,  M(cop0_op, mfc_op, 0, 0, 0, 0),  RT | RD | SET},
 	{ insn_mfhi,  M(spec_op, 0, 0, 0, 0, mfhi_op),  RT | RD | SET},
 	{ insn_mtc0,  M(cop0_op, mtc_op, 0, 0, 0, 0),  RT | RD | SET},
+	{ insn_mul, M(spec2_op, 0, 0, 0, 0, mul_op), RS | RT | RD},
 	{ insn_ori,  M(ori_op, 0, 0, 0, 0, 0),	RS | RT | UIMM },
 	{ insn_or,  M(spec_op, 0, 0, 0, 0, or_op),  RS | RT | RD },
 	{ insn_pref,  M(pref_op, 0, 0, 0, 0, 0),  RS | RT | SIMM },
diff --git a/arch/mips/mm/uasm.c b/arch/mips/mm/uasm.c
index 96f8b5b..5b15ce7 100644
--- a/arch/mips/mm/uasm.c
+++ b/arch/mips/mm/uasm.c
@@ -51,11 +51,11 @@ enum opcode {
 	insn_dsll32, insn_dsra, insn_dsrl, insn_dsrl32, insn_dsubu, insn_eret,
 	insn_ext, insn_ins, insn_j, insn_jal, insn_jalr, insn_jr, insn_ld,
 	insn_ldx, insn_lh, insn_ll, insn_lld, insn_lui, insn_lw, insn_lwx,
-	insn_mfc0, insn_mfhi, insn_mtc0, insn_or, insn_ori, insn_pref,
-	insn_rfe, insn_rotr, insn_sc, insn_scd, insn_sd, insn_sll, insn_sllv,
-	insn_sltiu, insn_sltu, insn_sra, insn_srl, insn_srlv, insn_subu,
-	insn_sw, insn_syscall, insn_tlbp, insn_tlbr, insn_tlbwi, insn_tlbwr,
-	insn_wsbh, insn_xor, insn_xori,
+	insn_mfc0, insn_mfhi, insn_mtc0, insn_mul, insn_or, insn_ori,
+	insn_pref, insn_rfe, insn_rotr, insn_sc, insn_scd, insn_sd, insn_sll,
+	insn_sllv, insn_sltiu, insn_sltu, insn_sra, insn_srl, insn_srlv,
+	insn_subu, insn_sw, insn_syscall, insn_tlbp, insn_tlbr, insn_tlbwi,
+	insn_tlbwr, insn_wsbh, insn_xor, insn_xori,
 };
 
 struct insn {
@@ -277,6 +277,7 @@ I_u2s3u1(_lw)
 I_u1u2u3(_mfc0)
 I_u1(_mfhi)
 I_u1u2u3(_mtc0)
+I_u3u1u2(_mul)
 I_u2u1u3(_ori)
 I_u3u1u2(_or)
 I_0(_rfe)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 13/14] MIPS: net: Add BPF JIT
@ 2014-04-08 11:47   ` Markos Chandras
  0 siblings, 0 replies; 53+ messages in thread
From: Markos Chandras @ 2014-04-08 11:47 UTC (permalink / raw)
  To: linux-mips; +Cc: Markos Chandras

This adds initial support for BPF-JIT on MIPS

Tested on mips32 LE/BE and mips64 BE/n64 using
dhcp, ping and various tcpdump filters.

Benchmarking:

Assuming the remote MIPS target uses 192.168.154.181
as its IP address, and the local host uses 192.168.154.136,
the following results can be obtained using the following
tcpdump filter (catches no frames) and a simple
'time ping -f -c 1000000' command.

[root@(none) ~]# tcpdump -p -n -s 0 -i eth0 net 10.0.0.0/24 -d
(000) ldh      [12]
(001) jeq      #0x800           jt 2	jf 8
(002) ld       [26]
(003) and      #0xffffff00
(004) jeq      #0xa000000       jt 16	jf 5
(005) ld       [30]
(006) and      #0xffffff00
(007) jeq      #0xa000000       jt 16	jf 17
(008) jeq      #0x806           jt 10	jf 9
(009) jeq      #0x8035          jt 10	jf 17
(010) ld       [28]
(011) and      #0xffffff00
(012) jeq      #0xa000000       jt 16	jf 13
(013) ld       [38]
(014) and      #0xffffff00
(015) jeq      #0xa000000       jt 16	jf 17
(016) ret      #65535

- BPF-JIT Disabled

real    1m38.005s
user    0m1.510s
sys     0m6.710s

- BPF-JIT Enabled

real    1m35.215s
user    0m1.200s
sys     0m4.140s

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
---
 arch/mips/Kbuild        |    1 +
 arch/mips/net/Makefile  |    3 +
 arch/mips/net/bpf_jit.c | 1327 +++++++++++++++++++++++++++++++++++++++++++++++
 arch/mips/net/bpf_jit.h |   44 ++
 4 files changed, 1375 insertions(+)
 create mode 100644 arch/mips/net/Makefile
 create mode 100644 arch/mips/net/bpf_jit.c
 create mode 100644 arch/mips/net/bpf_jit.h

diff --git a/arch/mips/Kbuild b/arch/mips/Kbuild
index d2cfe45..9179336 100644
--- a/arch/mips/Kbuild
+++ b/arch/mips/Kbuild
@@ -17,6 +17,7 @@ obj- := $(platform-)
 obj-y += kernel/
 obj-y += mm/
 obj-y += math-emu/
+obj-y += net/
 
 ifdef CONFIG_KVM
 obj-y += kvm/
diff --git a/arch/mips/net/Makefile b/arch/mips/net/Makefile
new file mode 100644
index 0000000..ae74b3a
--- /dev/null
+++ b/arch/mips/net/Makefile
@@ -0,0 +1,3 @@
+# MIPS networking code
+
+obj-$(CONFIG_BPF_JIT) += bpf_jit.o
diff --git a/arch/mips/net/bpf_jit.c b/arch/mips/net/bpf_jit.c
new file mode 100644
index 0000000..6dedd91
--- /dev/null
+++ b/arch/mips/net/bpf_jit.c
@@ -0,0 +1,1327 @@
+/*
+ * Just-In-Time compiler for BPF filters on MIPS
+ *
+ * Copyright (c) 2014 Imagination Technologies Ltd.
+ * Author: Markos Chandras <markos.chandras@imgtec.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; version 2 of the License.
+ */
+
+#include <linux/bitops.h>
+#include <linux/compiler.h>
+#include <linux/errno.h>
+#include <linux/filter.h>
+#include <linux/if_vlan.h>
+#include <linux/kconfig.h>
+#include <linux/moduleloader.h>
+#include <linux/netdevice.h>
+#include <linux/string.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+#include <asm/bitops.h>
+#include <asm/cacheflush.h>
+#include <asm/uasm.h>
+
+#include "bpf_jit.h"
+
+/* ABI
+ *
+ * s0	1st scratch register
+ * s1	2nd scratch register
+ * s2	offset register
+ * s3	BPF register A
+ * s4	BPF register X
+ * s5	*skb
+ * s6	*scratch memory
+ *
+ * On entry (*bpf_func)(*skb, *filter)
+ * a0 = MIPS_R_A0 = skb;
+ * a1 = MIPS_R_A1 = filter;
+ *
+ * Stack
+ * ...
+ * M[15]
+ * M[14]
+ * M[13]
+ * ...
+ * M[0] <-- r_M
+ * saved reg k-1
+ * saved reg k-2
+ * ...
+ * saved reg 0 <-- r_sp
+ * <no argument area>
+ *
+ *                     Packet layout
+ *
+ * <--------------------- len ------------------------>
+ * <--skb-len(r_skb_hl)-->< ----- skb->data_len ------>
+ * ----------------------------------------------------
+ * |                  skb->data                       |
+ * ----------------------------------------------------
+ */
+
+#define RSIZE	(sizeof(unsigned long))
+#define ptr typeof(unsigned long)
+
+/* ABI specific return values */
+#ifdef CONFIG_32BIT /* O32 */
+#ifdef CONFIG_CPU_LITTLE_ENDIAN
+#define r_err	MIPS_R_V1
+#define r_val	MIPS_R_V0
+#else /* CONFIG_CPU_LITTLE_ENDIAN */
+#define r_err	MIPS_R_V0
+#define r_val	MIPS_R_V1
+#endif
+#else /* N64 */
+#define r_err	MIPS_R_V0
+#define r_val	MIPS_R_V0
+#endif
+
+#define r_ret	MIPS_R_V0
+
+/*
+ * Use 2 scratch registers to avoid pipeline interlocks.
+ * There is no overhead during epilogue and prologue since
+ * any of the $s0-$s6 registers will only be preserved if
+ * they are going to actually be used.
+ */
+#define r_s0		MIPS_R_S0 /* scratch reg 1 */
+#define r_s1		MIPS_R_S1 /* scratch reg 2 */
+#define r_off		MIPS_R_S2
+#define r_A		MIPS_R_S3
+#define r_X		MIPS_R_S4
+#define r_skb		MIPS_R_S5
+#define r_M		MIPS_R_S6
+#define r_tmp_imm	MIPS_R_T6 /* No need to preserve this */
+#define r_tmp		MIPS_R_T7 /* No need to preserve this */
+#define r_zero		MIPS_R_ZERO
+#define r_sp		MIPS_R_SP
+#define r_ra		MIPS_R_RA
+
+#define SCRATCH_OFF(k)		(4 * (k))
+
+/* JIT flags */
+#define SEEN_CALL		(1 << BPF_MEMWORDS)
+#define SEEN_SREG_SFT		(BPF_MEMWORDS + 1)
+#define SEEN_SREG_BASE		(1 << SEEN_SREG_SFT)
+#define SEEN_SREG(x)		(SEEN_SREG_BASE << (x))
+#define SEEN_S0			SEEN_SREG(0)
+#define SEEN_S1			SEEN_SREG(1)
+#define SEEN_OFF		SEEN_SREG(2)
+#define SEEN_A			SEEN_SREG(3)
+#define SEEN_X			SEEN_SREG(4)
+#define SEEN_SKB		SEEN_SREG(5)
+#define SEEN_MEM		SEEN_SREG(6)
+
+/* Arguments used by JIT */
+#define ARGS_USED_BY_JIT	2 /* only applicable to 64-bit */
+
+#define FLAG_NEED_X_RESET	(1 << 0)
+
+#define SBIT(x)			(1 << (x)) /* Signed version of BIT() */
+
+/**
+ * struct jit_ctx - JIT context
+ * @skf:		The sk_filter
+ * @prologue_bytes:	Number of bytes for prologue
+ * @idx:		Instruction index
+ * @flags:		JIT flags
+ * @offsets:		Instruction offsets
+ * @target:		Memory location for the compiled filter
+ */
+struct jit_ctx {
+	const struct sk_filter *skf;
+	unsigned int prologue_bytes;
+	u32 idx;
+	u32 flags;
+	u32 *offsets;
+	u32 *target;
+};
+
+
+static inline int optimize_div(u32 *k)
+{
+	/* power of 2 divides can be implemented with right shift */
+	if (!(*k & (*k-1))) {
+		*k = ilog2(*k);
+		return 1;
+	}
+
+	return 0;
+}
+
+/* Simply emit the instruction if the JIT memory space has been allocated */
+#define emit_instr(ctx, func, ...)			\
+do {							\
+	if ((ctx)->target != NULL) {			\
+		u32 *p = &(ctx)->target[ctx->idx];	\
+		uasm_i_##func(&p, ##__VA_ARGS__);	\
+	}						\
+	(ctx)->idx++;					\
+} while (0)
+
+/* Determine if immediate is within the 16-bit signed range */
+static inline bool is_range16(s32 imm)
+{
+	if (imm >= SBIT(15) || imm < -SBIT(15))
+		return true;
+	return false;
+}
+
+static inline void emit_addu(unsigned int dst, unsigned int src1,
+			     unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, addu, dst, src1, src2);
+}
+
+static inline void emit_ins(unsigned int dst, unsigned int src,
+			    unsigned int pos, unsigned int size,
+			    struct jit_ctx *ctx)
+{
+	emit_instr(ctx, ins, dst, src, pos, size);
+}
+
+static inline void emit_nop(struct jit_ctx *ctx)
+{
+	emit_instr(ctx, nop);
+}
+
+/* Load a u32 immediate to a register */
+static inline void emit_load_imm(unsigned int dst, u32 imm, struct jit_ctx *ctx)
+{
+	if (ctx->target != NULL) {
+		/* addiu can only handle s16 */
+		if (is_range16(imm)) {
+			u32 *p = &ctx->target[ctx->idx];
+			uasm_i_lui(&p, r_tmp_imm, (s32)imm >> 16);
+			p = &ctx->target[ctx->idx + 1];
+			uasm_i_ori(&p, dst, r_tmp_imm, imm & 0xffff);
+		} else {
+			u32 *p = &ctx->target[ctx->idx];
+			uasm_i_addiu(&p, dst, r_zero, imm);
+		}
+	}
+	ctx->idx++;
+
+	if (is_range16(imm))
+		ctx->idx++;
+}
+
+static inline void emit_or(unsigned int dst, unsigned int src1,
+			   unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, or, dst, src1, src2);
+}
+
+static inline void emit_ori(unsigned int dst, unsigned src, u32 imm,
+			    struct jit_ctx *ctx)
+{
+	if (imm >= BIT(16)) {
+		emit_load_imm(r_tmp, imm, ctx);
+		emit_or(dst, src, r_tmp, ctx);
+	} else {
+		emit_instr(ctx, ori, dst, src, imm);
+	}
+
+}
+
+
+static inline void emit_daddu(unsigned int dst, unsigned int src1,
+			      unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, daddu, dst, src1, src2);
+}
+
+static inline void emit_daddiu(unsigned int dst, unsigned int src,
+			       int imm, struct jit_ctx *ctx)
+{
+	/*
+	 * Only used for stack, so the imm is relatively small
+	 * and it fits in 15-bits
+	 */
+	emit_instr(ctx, daddiu, dst, src, imm);
+}
+
+static inline void emit_addiu(unsigned int dst, unsigned int src,
+			      u32 imm, struct jit_ctx *ctx)
+{
+	if (is_range16(imm)) {
+		emit_load_imm(r_tmp, imm, ctx);
+		emit_addu(dst, r_tmp, src, ctx);
+	} else {
+		emit_instr(ctx, addiu, dst, src, imm);
+	}
+}
+
+static inline void emit_and(unsigned int dst, unsigned int src1,
+			    unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, and, dst, src1, src2);
+}
+
+static inline void emit_andi(unsigned int dst, unsigned int src,
+			     u32 imm, struct jit_ctx *ctx)
+{
+	/* If imm does not fit in u16 then load it to register */
+	if (imm >= BIT(16)) {
+		emit_load_imm(r_tmp, imm, ctx);
+		emit_and(dst, src, r_tmp, ctx);
+	} else {
+		emit_instr(ctx, andi, dst, src, imm);
+	}
+}
+
+static inline void emit_xor(unsigned int dst, unsigned int src1,
+			    unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, xor, dst, src1, src2);
+}
+
+static inline void emit_xori(ptr dst, ptr src, u32 imm, struct jit_ctx *ctx)
+{
+	/* If imm does not fit in u16 then load it to register */
+	if (imm >= BIT(16)) {
+		emit_load_imm(r_tmp, imm, ctx);
+		emit_xor(dst, src, r_tmp, ctx);
+	} else {
+		emit_instr(ctx, xori, dst, src, imm);
+	}
+}
+
+static inline void emit_stack_offset(int offset, struct jit_ctx *ctx)
+{
+	if (config_enabled(CONFIG_64BIT))
+		emit_instr(ctx, daddiu, r_sp, r_sp, offset);
+	else
+		emit_instr(ctx, addiu, r_sp, r_sp, offset);
+
+}
+
+static inline void emit_subu(unsigned int dst, unsigned int src1,
+			     unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, subu, dst, src1, src2);
+}
+
+static inline void emit_neg(unsigned int reg, struct jit_ctx *ctx)
+{
+	emit_subu(reg, r_zero, reg, ctx);
+}
+
+static inline void emit_sllv(unsigned int dst, unsigned int src,
+			     unsigned int sa, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, sllv, dst, src, sa);
+}
+
+static inline void emit_sll(unsigned int dst, unsigned int src,
+			    unsigned int sa, struct jit_ctx *ctx)
+{
+	/* sa is 5-bits long */
+	BUG_ON(sa >= BIT(5));
+	emit_instr(ctx, sll, dst, src, sa);
+}
+
+static inline void emit_srlv(unsigned int dst, unsigned int src,
+			     unsigned int sa, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, srlv, dst, src, sa);
+}
+
+static inline void emit_srl(unsigned int dst, unsigned int src,
+			    unsigned int sa, struct jit_ctx *ctx)
+{
+	/* sa is 5-bits long */
+	BUG_ON(sa >= BIT(5));
+	emit_instr(ctx, srl, dst, src, sa);
+}
+
+static inline void emit_sltu(unsigned int dst, unsigned int src1,
+			     unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, sltu, dst, src1, src2);
+}
+
+static inline void emit_sltiu(unsigned dst, unsigned int src,
+			      unsigned int imm, struct jit_ctx *ctx)
+{
+	/* 16 bit immediate */
+	if (is_range16((s32)imm)) {
+		emit_load_imm(r_tmp, imm, ctx);
+		emit_sltu(dst, src, r_tmp, ctx);
+	} else {
+		emit_instr(ctx, sltiu, dst, src, imm);
+	}
+
+}
+
+/* Store register on the stack */
+static inline void emit_store_stack_reg(ptr reg, ptr base,
+					unsigned int offset,
+					struct jit_ctx *ctx)
+{
+	if (config_enabled(CONFIG_64BIT))
+		emit_instr(ctx, sd, reg, offset, base);
+	else
+		emit_instr(ctx, sw, reg, offset, base);
+}
+
+static inline void emit_store(ptr reg, ptr base, unsigned int offset,
+			      struct jit_ctx *ctx)
+{
+	emit_instr(ctx, sw, reg, offset, base);
+}
+
+static inline void emit_load_stack_reg(ptr reg, ptr base,
+				       unsigned int offset,
+				       struct jit_ctx *ctx)
+{
+	if (config_enabled(CONFIG_64BIT))
+		emit_instr(ctx, ld, reg, offset, base);
+	else
+		emit_instr(ctx, lw, reg, offset, base);
+
+}
+
+static inline void emit_load(unsigned int reg, unsigned int base,
+			     unsigned int offset, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, lw, reg, offset, base);
+}
+
+static inline void emit_half_load(unsigned int reg, unsigned int base,
+				  unsigned int offset, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, lh, reg, offset, base);
+}
+
+static inline void emit_mul(unsigned int dst, unsigned int src1,
+			    unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, mul, dst, src1, src2);
+}
+
+static inline void emit_div(unsigned int dst, unsigned int src,
+			    struct jit_ctx *ctx)
+{
+	if (ctx->target != NULL) {
+		u32 *p = &ctx->target[ctx->idx];
+		uasm_i_divu(&p, dst, src);
+		p = &ctx->target[ctx->idx + 1];
+		uasm_i_mfhi(&p, dst);
+	}
+	ctx->idx += 2; /* 2 insts */
+}
+
+static inline void emit_dsll(unsigned int dst, unsigned int src,
+			     unsigned int sa, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, dsll, dst, src, sa);
+}
+
+static inline void emit_dsrl32(unsigned int dst, unsigned int src,
+			       unsigned int sa, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, dsrl32, dst, src, sa);
+}
+
+static inline void emit_wsbh(unsigned int dst, unsigned int src,
+			     struct jit_ctx *ctx)
+{
+	emit_instr(ctx, wsbh, dst, src);
+}
+
+/* load a function pointer to register */
+static inline void emit_load_func(unsigned int reg, ptr imm,
+				  struct jit_ctx *ctx)
+{
+	if (config_enabled(CONFIG_64BIT)) {
+		/* At this point imm is always 64-bit */
+		emit_load_imm(r_tmp, imm >> 32, ctx);
+		emit_dsll(r_tmp_imm, r_tmp, 16, ctx); /* left shift by 16 */
+		emit_ori(r_tmp, r_tmp_imm, (imm >> 16) & 0xffff, ctx);
+		emit_dsll(r_tmp_imm, r_tmp, 16, ctx); /* left shift by 16 */
+		emit_ori(reg, r_tmp_imm, imm & 0xffff, ctx);
+	} else {
+		emit_load_imm(reg, imm, ctx);
+	}
+}
+
+/* Move to real MIPS register */
+static inline void emit_reg_move(ptr dst, ptr src, struct jit_ctx *ctx)
+{
+	if (config_enabled(CONFIG_64BIT))
+		emit_daddu(dst, src, r_zero, ctx);
+	else
+		emit_addu(dst, src, r_zero, ctx);
+}
+
+/* Move to JIT (32-bit) register */
+static inline void emit_jit_reg_move(ptr dst, ptr src, struct jit_ctx *ctx)
+{
+	emit_addu(dst, src, r_zero, ctx);
+}
+
+/* Compute the immediate value for PC-relative branches. */
+static inline u32 b_imm(unsigned int tgt, struct jit_ctx *ctx)
+{
+	if (ctx->target == NULL)
+		return 0;
+
+	/*
+	 * We want a pc-relative branch. We only do forward branches
+	 * so tgt is always after pc. tgt is the instruction offset
+	 * we want to jump to.
+
+	 * Branch on MIPS:
+	 * I: target_offset <- sign_extend(offset)
+	 * I+1: PC += target_offset (delay slot)
+	 *
+	 * ctx->idx currently points to the branch instruction
+	 * but the offset is added to the delay slot so we need
+	 * to subtract 4.
+	 */
+	return ctx->offsets[tgt] -
+		(ctx->idx * 4 - ctx->prologue_bytes) - 4;
+}
+
+static inline void emit_bcond(int cond, unsigned int reg1, unsigned int reg2,
+			     unsigned int imm, struct jit_ctx *ctx)
+{
+	if (ctx->target != NULL) {
+		u32 *p = &ctx->target[ctx->idx];
+
+		switch (cond) {
+		case MIPS_COND_EQ:
+			uasm_i_beq(&p, reg1, reg2, imm);
+			break;
+		case MIPS_COND_NE:
+			uasm_i_bne(&p, reg1, reg2, imm);
+			break;
+		case MIPS_COND_ALL:
+			uasm_i_b(&p, imm);
+			break;
+		default:
+			pr_warn("%s: Unhandled branch conditional: %d\n",
+				__func__, cond);
+		}
+	}
+	ctx->idx++;
+}
+
+static inline void emit_b(unsigned int imm, struct jit_ctx *ctx)
+{
+	emit_bcond(MIPS_COND_ALL, r_zero, r_zero, imm, ctx);
+}
+
+static inline void emit_jalr(unsigned int link, unsigned int reg,
+			     struct jit_ctx *ctx)
+{
+	emit_instr(ctx, jalr, link, reg);
+}
+
+static inline void emit_jr(unsigned int reg, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, jr, reg);
+}
+
+static inline u16 align_sp(unsigned int num)
+{
+	/* Double word alignment for 32-bit, quadword for 64-bit */
+	unsigned int align = config_enabled(CONFIG_64BIT) ? 16 : 8;
+	num = (num + (align - 1)) & -align;
+	return num;
+}
+
+static inline void update_on_xread(struct jit_ctx *ctx)
+{
+	if (!(ctx->flags & SEEN_X))
+		ctx->flags |= FLAG_NEED_X_RESET;
+
+	ctx->flags |= SEEN_X;
+}
+
+static bool is_load_to_a(u16 inst)
+{
+	switch (inst) {
+	case BPF_S_LD_W_LEN:
+	case BPF_S_LD_W_ABS:
+	case BPF_S_LD_H_ABS:
+	case BPF_S_LD_B_ABS:
+	case BPF_S_ANC_CPU:
+	case BPF_S_ANC_IFINDEX:
+	case BPF_S_ANC_MARK:
+	case BPF_S_ANC_PROTOCOL:
+	case BPF_S_ANC_RXHASH:
+	case BPF_S_ANC_VLAN_TAG:
+	case BPF_S_ANC_VLAN_TAG_PRESENT:
+	case BPF_S_ANC_QUEUE:
+		return true;
+	default:
+		return false;
+	}
+}
+
+static void save_bpf_jit_regs(struct jit_ctx *ctx, unsigned offset)
+{
+	int i = 0, real_off = 0;
+	u32 sflags, tmp_flags;
+
+	/* Adjust the stack pointer */
+	emit_stack_offset(-align_sp(offset), ctx);
+
+	if (ctx->flags & SEEN_CALL) {
+		/* Argument save area */
+		if (config_enabled(CONFIG_64BIT))
+			/* Bottom of current frame */
+			real_off = align_sp(offset) - RSIZE;
+		else
+			/* Top of previous frame */
+			real_off = align_sp(offset) + RSIZE;
+		emit_store_stack_reg(MIPS_R_A0, r_sp, real_off, ctx);
+		emit_store_stack_reg(MIPS_R_A1, r_sp, real_off + RSIZE, ctx);
+
+		real_off = 0;
+	}
+
+	tmp_flags = sflags = ctx->flags >> SEEN_SREG_SFT;
+	/* sflags is essentially a bitmap */
+	pr_debug("%s: register flags: 0x%08x\n", __func__, tmp_flags);
+	while (tmp_flags) {
+		if ((sflags >> i) & 0x1) {
+			pr_debug("%s: preserving register %d\n", __func__,
+				 MIPS_R_S0 + i);
+			emit_store_stack_reg(MIPS_R_S0 + i, r_sp, real_off,
+					     ctx);
+			real_off += RSIZE;
+		}
+		i++;
+		tmp_flags >>= 1;
+	}
+
+	/* save return address */
+	if (ctx->flags & SEEN_CALL) {
+		emit_store_stack_reg(r_ra, r_sp, real_off, ctx);
+		real_off += RSIZE;
+	}
+
+	/* Setup r_M leaving the alignment gap if necessary */
+	if (ctx->flags & SEEN_MEM) {
+		if (real_off % (RSIZE * 2))
+			real_off += RSIZE;
+		emit_addiu(r_M, r_sp, real_off, ctx);
+	}
+
+}
+
+static void restore_bpf_jit_regs(struct jit_ctx *ctx,
+				 unsigned int offset)
+{
+	int i, real_off = 0;
+	u32 sflags, tmp_flags;
+
+	if (ctx->flags & SEEN_CALL) {
+		if (config_enabled(CONFIG_64BIT))
+			/* Bottom of current frame */
+			real_off = align_sp(offset) - RSIZE;
+		else
+			/* Top of previous frame */
+			real_off = align_sp(offset) + RSIZE;
+		emit_load_stack_reg(MIPS_R_A0, r_sp, real_off, ctx);
+		emit_load_stack_reg(MIPS_R_A1, r_sp, real_off + RSIZE, ctx);
+
+		real_off = 0;
+	}
+
+	tmp_flags = sflags = ctx->flags >> SEEN_SREG_SFT;
+	/* sflags is a bitmap */
+	pr_debug("%s: register flags: 0x%08x\n", __func__, tmp_flags);
+	i = 0;
+	while (tmp_flags) {
+		if ((sflags >> i) & 0x1) {
+			pr_debug("%s: restoring register %d\n", __func__,
+				 MIPS_R_S0 + i);
+			emit_load_stack_reg(MIPS_R_S0 + i, r_sp, real_off,
+					    ctx);
+			real_off += RSIZE;
+		}
+		i++;
+		tmp_flags >>= 1;
+	}
+
+	/* restore return address */
+	if (ctx->flags & SEEN_CALL)
+		emit_load_stack_reg(r_ra, r_sp, real_off, ctx);
+
+	/* Restore the sp and discard the scrach memory */
+	emit_stack_offset(align_sp(offset), ctx);
+
+}
+
+static unsigned int get_stack_depth(struct jit_ctx *ctx)
+{
+	int sp_off = 0;
+
+
+	/* How may s* regs do we need to preserved? */
+	sp_off += hweight32(ctx->flags >> SEEN_SREG_SFT) * RSIZE;
+
+	if (ctx->flags & SEEN_MEM)
+		sp_off += 4 * BPF_MEMWORDS; /* BPF_MEMWORDS are 32-bit */
+
+	if (ctx->flags & SEEN_CALL)
+		/*
+		 * The JIT code make calls to external functions using 2
+		 * arguments. Therefore, for o32 we don't need to allocate
+		 * space because we don't care if the argumetns are lost
+		 * across calls. We do need however to preserve incoming
+		 * arguments but the space is already allocated for us by
+		 * the caller. On the other hand, for n64, we need to allocate
+		 * this space ourselves. We need to preserve $ra as well.
+		 */
+		sp_off += config_enabled(CONFIG_64BIT) ?
+			(ARGS_USED_BY_JIT + 1) * RSIZE : RSIZE;
+
+	/*
+	 * Subtract the bytes for the last registers since we only care about
+	 * the location on the stack pointer.
+	 */
+	return sp_off - RSIZE;
+}
+
+static void build_prologue(struct jit_ctx *ctx)
+{
+	u16 first_inst = ctx->skf->insns[0].code;
+	int sp_off;
+
+	/* Calculate the total offset for the stack pointer */
+	sp_off = get_stack_depth(ctx);
+	save_bpf_jit_regs(ctx, sp_off);
+
+	if (ctx->flags & SEEN_SKB)
+		emit_reg_move(r_skb, MIPS_R_A0, ctx);
+
+	if (ctx->flags & FLAG_NEED_X_RESET)
+		emit_jit_reg_move(r_X, r_zero, ctx);
+
+	/* Do not leak kernel data to userspace */
+	if ((first_inst != BPF_S_RET_K) && !(is_load_to_a(first_inst)))
+		emit_jit_reg_move(r_A, r_zero, ctx);
+
+}
+
+static void build_epilogue(struct jit_ctx *ctx)
+{
+	unsigned int sp_off;
+
+	/* Calculate the total offset for the stack pointer */
+
+	sp_off = get_stack_depth(ctx);
+	restore_bpf_jit_regs(ctx, sp_off);
+
+	/* Return */
+	emit_jr(r_ra, ctx);
+	emit_nop(ctx);
+}
+
+static u64 jit_get_skb_b(struct sk_buff *skb, unsigned offset)
+{
+	u8 ret;
+	int err;
+
+	err = skb_copy_bits(skb, offset, &ret, 1);
+
+	return (u64)err << 32 | ret;
+}
+
+static u64 jit_get_skb_h(struct sk_buff *skb, unsigned offset)
+{
+	u16 ret;
+	int err;
+
+	err = skb_copy_bits(skb, offset, &ret, 2);
+
+	return (u64)err << 32 | ntohs(ret);
+}
+
+static u64 jit_get_skb_w(struct sk_buff *skb, unsigned offset)
+{
+	u32 ret;
+	int err;
+
+	err = skb_copy_bits(skb, offset, &ret, 4);
+
+	return (u64)err << 32 | ntohl(ret);
+}
+
+static int build_body(struct jit_ctx *ctx)
+{
+	void *load_func[] = {jit_get_skb_b, jit_get_skb_h, jit_get_skb_w};
+	const struct sk_filter *prog = ctx->skf;
+	const struct sock_filter *inst;
+	unsigned int i, off, load_order, condt;
+	u32 k, b_off __maybe_unused;
+
+	for (i = 0; i < prog->len; i++) {
+		inst = &(prog->insns[i]);
+		pr_debug("%s: code->0x%02x, jt->0x%x, jf->0x%x, k->0x%x\n",
+			 __func__, inst->code, inst->jt, inst->jf, inst->k);
+		k = inst->k;
+
+		if (ctx->target == NULL)
+			ctx->offsets[i] = ctx->idx * 4;
+
+		switch (inst->code) {
+		case BPF_S_LD_IMM:
+			/* A <- k ==> li r_A, k */
+			ctx->flags |= SEEN_A;
+			emit_load_imm(r_A, k, ctx);
+			break;
+		case BPF_S_LD_W_LEN:
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, len) != 4);
+			/* A <- len ==> lw r_A, offset(skb) */
+			ctx->flags |= SEEN_SKB | SEEN_A;
+			off = offsetof(struct sk_buff, len);
+			emit_load(r_A, r_skb, off, ctx);
+			break;
+		case BPF_S_LD_MEM:
+			/* A <- M[k] ==> lw r_A, offset(M) */
+			ctx->flags |= SEEN_MEM | SEEN_A;
+			emit_load(r_A, r_M, SCRATCH_OFF(k), ctx);
+			break;
+		case BPF_S_LD_W_ABS:
+			/* A <- P[k:4] */
+			load_order = 2;
+			goto load;
+		case BPF_S_LD_H_ABS:
+			/* A <- P[k:2] */
+			load_order = 1;
+			goto load;
+		case BPF_S_LD_B_ABS:
+			/* A <- P[k:1] */
+			load_order = 0;
+load:
+			emit_load_imm(r_off, k, ctx);
+load_common:
+			ctx->flags |= SEEN_CALL | SEEN_OFF | SEEN_S0 |
+				SEEN_SKB | SEEN_A;
+
+			emit_load_func(r_s0, (ptr)load_func[load_order],
+				      ctx);
+			emit_reg_move(MIPS_R_A0, r_skb, ctx);
+			emit_jalr(MIPS_R_RA, r_s0, ctx);
+			/* Load second argument to delay slot */
+			emit_reg_move(MIPS_R_A1, r_off, ctx);
+			/* Check the error value */
+			if (config_enabled(CONFIG_64BIT)) {
+				/* Get error code from the top 32-bits */
+				emit_dsrl32(r_s0, r_val, 0, ctx);
+				/* Branch to 3 instructions ahead */
+				emit_bcond(MIPS_COND_NE, r_s0, r_zero, 3 << 2,
+					   ctx);
+			} else {
+				/* Branch to 3 instructions ahead */
+				emit_bcond(MIPS_COND_NE, r_err, r_zero, 3 << 2,
+					   ctx);
+			}
+			emit_nop(ctx);
+			/* We are good */
+			emit_b(b_imm(i + 1, ctx), ctx);
+			emit_jit_reg_move(r_A, r_val, ctx);
+			/* Return with error */
+			emit_b(b_imm(prog->len, ctx), ctx);
+			emit_reg_move(r_ret, r_zero, ctx);
+			break;
+		case BPF_S_LD_W_IND:
+			/* A <- P[X + k:4] */
+			load_order = 2;
+			goto load_ind;
+		case BPF_S_LD_H_IND:
+			/* A <- P[X + k:2] */
+			load_order = 1;
+			goto load_ind;
+		case BPF_S_LD_B_IND:
+			/* A <- P[X + k:1] */
+			load_order = 0;
+load_ind:
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_OFF | SEEN_X;
+			emit_addiu(r_off, r_X, k, ctx);
+		goto load_common;
+		case BPF_S_LDX_IMM:
+			/* X <- k */
+			ctx->flags |= SEEN_X;
+			emit_load_imm(r_X, k, ctx);
+			break;
+		case BPF_S_LDX_MEM:
+			/* X <- M[k] */
+			ctx->flags |= SEEN_X | SEEN_MEM;
+			emit_load(r_X, r_M, SCRATCH_OFF(k), ctx);
+			break;
+		case BPF_S_LDX_W_LEN:
+			/* X <- len */
+			ctx->flags |= SEEN_X | SEEN_SKB;
+			off = offsetof(struct sk_buff, len);
+			emit_load(r_X, r_skb, off, ctx);
+			break;
+		case BPF_S_LDX_B_MSH:
+			/* X <- 4 * (P[k:1] & 0xf) */
+			ctx->flags |= SEEN_X | SEEN_CALL | SEEN_S0 | SEEN_SKB;
+			/* Load offset to a1 */
+			emit_load_func(r_s0, (ptr)jit_get_skb_b, ctx);
+			/*
+			 * This may emit two instructions so it may not fit
+			 * in the delay slot. So use a0 in the delay slot.
+			 */
+			emit_load_imm(MIPS_R_A1, k, ctx);
+			emit_jalr(MIPS_R_RA, r_s0, ctx);
+			emit_reg_move(MIPS_R_A0, r_skb, ctx); /* delay slot */
+			/* Check the error value */
+			if (config_enabled(CONFIG_64BIT)) {
+				/* Top 32-bits of $v0 on 64-bit */
+				emit_dsrl32(r_s0, r_val, 0, ctx);
+				emit_bcond(MIPS_COND_NE, r_s0, r_zero,
+					   3 << 2, ctx);
+			} else {
+				emit_bcond(MIPS_COND_NE, r_err, r_zero,
+					   3 << 2, ctx);
+			}
+			/* No need for delay slot */
+			/* We are good */
+			/* X <- P[1:K] & 0xf */
+			emit_andi(r_X, r_val, 0xf, ctx);
+			/* X << 2 */
+			emit_b(b_imm(i + 1, ctx), ctx);
+			emit_sll(r_X, r_X, 2, ctx); /* delay slot */
+			/* Return with error */
+			emit_b(b_imm(prog->len, ctx), ctx);
+			emit_load_imm(r_ret, 0, ctx); /* delay slot */
+			break;
+		case BPF_S_ST:
+			/* M[k] <- A */
+			ctx->flags |= SEEN_MEM | SEEN_A;
+			emit_store(r_A, r_M, SCRATCH_OFF(k), ctx);
+			break;
+		case BPF_S_STX:
+			/* M[k] <- X */
+			ctx->flags |= SEEN_MEM | SEEN_X;
+			emit_store(r_X, r_M, SCRATCH_OFF(k), ctx);
+			break;
+		case BPF_S_ALU_ADD_K:
+			/* A += K */
+			ctx->flags |= SEEN_A;
+			emit_addiu(r_A, r_A, k, ctx);
+			break;
+		case BPF_S_ALU_SUB_K:
+			/* A -= K */
+			ctx->flags |= SEEN_A;
+			emit_addiu(r_A, r_A, -k, ctx);
+			break;
+		case BPF_S_ALU_MUL_K:
+			/* A *= K */
+			/* Load K to scratch register before MUL */
+			ctx->flags |= SEEN_A | SEEN_S0;
+			emit_load_imm(r_s0, k, ctx);
+			emit_mul(r_A, r_A, r_s0, ctx);
+			break;
+		case BPF_S_ALU_MUL_X:
+			/* A *= X */
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_A | SEEN_X;
+			emit_mul(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_DIV_K:
+			/* A /= k */
+			if (k == 1)
+				break;
+			if (optimize_div(&k)) {
+				ctx->flags |= SEEN_A;
+				emit_srl(r_A, r_A, k, ctx);
+				break;
+			}
+			ctx->flags |= SEEN_A | SEEN_S0;
+			emit_load_imm(r_s0, k, ctx);
+			emit_div(r_A, r_s0, ctx);
+			break;
+		case BPF_S_ALU_DIV_X:
+			/* A /= X */
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_X | SEEN_A;
+			/* Check if r_X is zero */
+			emit_bcond(MIPS_COND_EQ, r_X, r_zero,
+				   b_imm(prog->len, ctx), ctx);
+			emit_load_imm(r_val, 0, ctx); /* delay slot */
+			emit_div(r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_OR_K:
+			/* A |= K */
+			ctx->flags |= SEEN_A;
+			emit_ori(r_A, r_A, k, ctx);
+			break;
+		case BPF_S_ALU_OR_X:
+			/* A |= X */
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_A;
+			emit_ori(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_XOR_K:
+			/* A ^= k */
+			ctx->flags |= SEEN_A;
+			emit_xori(r_A, r_A, k, ctx);
+			break;
+		case BPF_S_ANC_ALU_XOR_X:
+		case BPF_S_ALU_XOR_X:
+			/* A ^= X */
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_A;
+			emit_xor(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_AND_K:
+			/* A &= K */
+			ctx->flags |= SEEN_A;
+			emit_andi(r_A, r_A, k, ctx);
+			break;
+		case BPF_S_ALU_AND_X:
+			/* A &= X */
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_A | SEEN_X;
+			emit_and(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_LSH_K:
+			/* A <<= K */
+			ctx->flags |= SEEN_A;
+			emit_sll(r_A, r_A, k, ctx);
+			break;
+		case BPF_S_ALU_LSH_X:
+			/* A <<= X */
+			ctx->flags |= SEEN_A | SEEN_X;
+			update_on_xread(ctx);
+			emit_sllv(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_RSH_K:
+			/* A >>= K */
+			ctx->flags |= SEEN_A;
+			emit_srl(r_A, r_A, k, ctx);
+			break;
+		case BPF_S_ALU_RSH_X:
+			ctx->flags |= SEEN_A | SEEN_X;
+			update_on_xread(ctx);
+			emit_srlv(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_NEG:
+			/* A = -A */
+			ctx->flags |= SEEN_A;
+			emit_neg(r_A, ctx);
+			break;
+		case BPF_S_JMP_JA:
+			/* pc += K */
+			emit_b(b_imm(i + k + 1, ctx), ctx);
+			emit_nop(ctx);
+			break;
+		case BPF_S_JMP_JEQ_K:
+			/* pc += ( A == K ) ? pc->jt : pc->jf */
+			condt = MIPS_COND_EQ | MIPS_COND_K;
+			goto jmp_cmp;
+		case BPF_S_JMP_JEQ_X:
+			ctx->flags |= SEEN_X;
+			/* pc += ( A == X ) ? pc->jt : pc->jf */
+			condt = MIPS_COND_EQ | MIPS_COND_X;
+			goto jmp_cmp;
+		case BPF_S_JMP_JGE_K:
+			/* pc += ( A >= K ) ? pc->jt : pc->jf */
+			condt = MIPS_COND_GE | MIPS_COND_K;
+			goto jmp_cmp;
+		case BPF_S_JMP_JGE_X:
+			ctx->flags |= SEEN_X;
+			/* pc += ( A >= X ) ? pc->jt : pc->jf */
+			condt = MIPS_COND_GE | MIPS_COND_X;
+			goto jmp_cmp;
+		case BPF_S_JMP_JGT_K:
+			/* pc += ( A > K ) ? pc->jt : pc->jf */
+			condt = MIPS_COND_GT | MIPS_COND_K;
+			goto jmp_cmp;
+		case BPF_S_JMP_JGT_X:
+			ctx->flags |= SEEN_X;
+			/* pc += ( A > X ) ? pc->jt : pc->jf */
+			condt = MIPS_COND_GT | MIPS_COND_X;
+jmp_cmp:
+			/* Greater or Equal */
+			if ((condt & MIPS_COND_GE) ||
+			    (condt & MIPS_COND_GT)) {
+				if (condt & MIPS_COND_K) { /* K */
+					ctx->flags |= SEEN_S0 | SEEN_A;
+					emit_sltiu(r_s0, r_A, k, ctx);
+				} else { /* X */
+					ctx->flags |= SEEN_S0 | SEEN_A |
+						SEEN_X;
+					emit_sltu(r_s0, r_A, r_X, ctx);
+				}
+				/* A < (K|X) ? r_scrach = 1 */
+				b_off = b_imm(i + inst->jf + 1, ctx);
+				emit_bcond(MIPS_COND_GT, r_s0, r_zero, b_off,
+					   ctx);
+				emit_nop(ctx);
+				/* A > (K|X) ? scratch = 0 */
+				if (condt & MIPS_COND_GT) {
+					/* Checking for equality */
+					ctx->flags |= SEEN_S0 | SEEN_A | SEEN_X;
+					if (condt & MIPS_COND_K)
+						emit_load_imm(r_s0, k, ctx);
+					else
+						emit_jit_reg_move(r_s0, r_X,
+								  ctx);
+					b_off = b_imm(i + inst->jf + 1, ctx);
+					emit_bcond(MIPS_COND_EQ, r_A, r_s0,
+						   b_off, ctx);
+					emit_nop(ctx);
+					/* Finally, A > K|X */
+					b_off = b_imm(i + inst->jt + 1, ctx);
+					emit_b(b_off, ctx);
+					emit_nop(ctx);
+				} else {
+					/* A >= (K|X) so jump */
+					b_off = b_imm(i + inst->jt + 1, ctx);
+					emit_b(b_off, ctx);
+					emit_nop(ctx);
+				}
+			} else {
+				/* A == K|X */
+				if (condt & MIPS_COND_K) { /* K */
+					ctx->flags |= SEEN_S0 | SEEN_A;
+					emit_load_imm(r_s0, k, ctx);
+					/* jump true */
+					b_off = b_imm(i + inst->jt + 1, ctx);
+					emit_bcond(MIPS_COND_EQ, r_A, r_s0,
+						   b_off, ctx);
+					emit_nop(ctx);
+					/* jump false */
+					b_off = b_imm(i + inst->jf + 1,
+						      ctx);
+					emit_bcond(MIPS_COND_NE, r_A, r_s0,
+						   b_off, ctx);
+					emit_nop(ctx);
+				} else { /* X */
+					/* jump true */
+					ctx->flags |= SEEN_A | SEEN_X;
+					b_off = b_imm(i + inst->jt + 1,
+						      ctx);
+					emit_bcond(MIPS_COND_EQ, r_A, r_X,
+						   b_off, ctx);
+					emit_nop(ctx);
+					/* jump false */
+					b_off = b_imm(i + inst->jf + 1, ctx);
+					emit_bcond(MIPS_COND_NE, r_A, r_X,
+						   b_off, ctx);
+					emit_nop(ctx);
+				}
+			}
+			break;
+		case BPF_S_JMP_JSET_K:
+			ctx->flags |= SEEN_S0 | SEEN_S1 | SEEN_A;
+			/* pc += (A & K) ? pc -> jt : pc -> jf */
+			emit_load_imm(r_s1, k, ctx);
+			emit_and(r_s0, r_A, r_s1, ctx);
+			/* jump true */
+			b_off = b_imm(i + inst->jt + 1, ctx);
+			emit_bcond(MIPS_COND_NE, r_s0, r_zero, b_off, ctx);
+			emit_nop(ctx);
+			/* jump false */
+			b_off = b_imm(i + inst->jf + 1, ctx);
+			emit_b(b_off, ctx);
+			emit_nop(ctx);
+			break;
+		case BPF_S_JMP_JSET_X:
+			ctx->flags |= SEEN_S0 | SEEN_X | SEEN_A;
+			/* pc += (A & X) ? pc -> jt : pc -> jf */
+			emit_and(r_s0, r_A, r_X, ctx);
+			/* jump true */
+			b_off = b_imm(i + inst->jt + 1, ctx);
+			emit_bcond(MIPS_COND_NE, r_s0, r_zero, b_off, ctx);
+			emit_nop(ctx);
+			/* jump false */
+			b_off = b_imm(i + inst->jf + 1, ctx);
+			emit_b(b_off, ctx);
+			emit_nop(ctx);
+			break;
+		case BPF_S_RET_A:
+			ctx->flags |= SEEN_A;
+			if (i != prog->len - 1)
+				/*
+				 * If this is not the last instruction
+				 * then jump to the epilogue
+				 */
+				emit_b(b_imm(prog->len, ctx), ctx);
+			emit_reg_move(r_ret, r_A, ctx); /* delay slot */
+			break;
+		case BPF_S_RET_K:
+			/*
+			 * It can emit two instructions so it does not fit on
+			 * the delay slot.
+			 */
+			emit_load_imm(r_ret, k, ctx);
+			if (i != prog->len - 1) {
+				/*
+				 * If this is not the last instruction
+				 * then jump to the epilogue
+				 */
+				emit_b(b_imm(prog->len, ctx), ctx);
+				emit_nop(ctx);
+			}
+			break;
+		case BPF_S_MISC_TAX:
+			/* X = A */
+			ctx->flags |= SEEN_X | SEEN_A;
+			emit_jit_reg_move(r_X, r_A, ctx);
+			break;
+		case BPF_S_MISC_TXA:
+			/* A = X */
+			ctx->flags |= SEEN_A | SEEN_X;
+			update_on_xread(ctx);
+			emit_jit_reg_move(r_A, r_X, ctx);
+			break;
+		/* AUX */
+		case BPF_S_ANC_PROTOCOL:
+			/* A = ntohs(skb->protocol */
+			ctx->flags |= SEEN_SKB | SEEN_OFF | SEEN_A;
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff,
+						  protocol) != 2);
+			off = offsetof(struct sk_buff, protocol);
+			emit_half_load(r_A, r_skb, off, ctx);
+#ifdef CONFIG_CPU_LITTLE_ENDIAN
+			/* This needs little endian fixup */
+			emit_wsbh(r_A, r_A, ctx);
+#endif
+			break;
+		case BPF_S_ANC_CPU:
+			ctx->flags |= SEEN_A | SEEN_OFF;
+			/* A = current_thread_info()->cpu */
+			BUILD_BUG_ON(FIELD_SIZEOF(struct thread_info,
+						  cpu) != 4);
+			off = offsetof(struct thread_info, cpu);
+			/* $28/gp points to the thread_info struct */
+			emit_load(r_A, 28, off, ctx);
+			break;
+		case BPF_S_ANC_IFINDEX:
+			/* A = skb->dev->ifindex */
+			ctx->flags |= SEEN_SKB | SEEN_A | SEEN_S0;
+			off = offsetof(struct sk_buff, dev);
+			emit_load(r_s0, r_skb, off, ctx);
+			/* error (0) in the delay slot */
+			emit_bcond(MIPS_COND_EQ, r_s0, r_zero,
+				   b_imm(prog->len, ctx), ctx);
+			emit_reg_move(r_ret, r_zero, ctx);
+			BUILD_BUG_ON(FIELD_SIZEOF(struct net_device,
+						  ifindex) != 4);
+			off = offsetof(struct net_device, ifindex);
+			emit_load(r_A, r_s0, off, ctx);
+			break;
+		case BPF_S_ANC_MARK:
+			ctx->flags |= SEEN_SKB | SEEN_A;
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, mark) != 4);
+			off = offsetof(struct sk_buff, mark);
+			emit_load(r_A, r_skb, off, ctx);
+			break;
+		case BPF_S_ANC_RXHASH:
+			ctx->flags |= SEEN_SKB | SEEN_A;
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, rxhash) != 4);
+			off = offsetof(struct sk_buff, rxhash);
+			emit_load(r_A, r_skb, off, ctx);
+			break;
+		case BPF_S_ANC_VLAN_TAG_PRESENT:
+			ctx->flags |= SEEN_SKB | SEEN_S0 | SEEN_A;
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff,
+						  vlan_tci) != 2);
+			off = offsetof(struct sk_buff, vlan_tci);
+			emit_half_load(r_s0, r_skb, off, ctx);
+			if (inst->code == BPF_S_ANC_VLAN_TAG)
+				emit_and(r_A, r_s0, VLAN_VID_MASK, ctx);
+			else
+				emit_and(r_A, r_s0, VLAN_TAG_PRESENT, ctx);
+			break;
+		case BPF_S_ANC_QUEUE:
+			ctx->flags |= SEEN_SKB | SEEN_A;
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff,
+						  queue_mapping) != 2);
+			BUILD_BUG_ON(offsetof(struct sk_buff,
+					      queue_mapping) > 0xff);
+			off = offsetof(struct sk_buff, queue_mapping);
+			emit_half_load(r_A, r_skb, off, ctx);
+			break;
+		default:
+			pr_warn("%s: Unhandled opcode: 0x%02x\n", __FILE__,
+				inst->code);
+			return -1;
+		}
+	}
+
+	/* compute offsets only during the first pass */
+	if (ctx->target == NULL)
+		ctx->offsets[i] = ctx->idx * 4;
+
+	return 0;
+
+}
+
+int bpf_jit_enable __read_mostly;
+
+void bpf_jit_compile(struct sk_filter *fp)
+{
+	struct jit_ctx ctx;
+	unsigned int alloc_size, tmp_idx;
+
+	if (!bpf_jit_enable)
+		return;
+
+	memset(&ctx, 0, sizeof(ctx));
+
+	ctx.offsets = kcalloc(fp->len, sizeof(*ctx.offsets), GFP_KERNEL);
+	if (ctx.offsets == NULL)
+		return;
+
+	ctx.skf = fp;
+
+	if (unlikely(build_body(&ctx)))
+		goto out;
+
+	tmp_idx = ctx.idx;
+	build_prologue(&ctx);
+	ctx.prologue_bytes = (ctx.idx - tmp_idx) * 4;
+	/* just to complete the ctx.idx count */
+	build_epilogue(&ctx);
+
+	alloc_size = 4 * ctx.idx;
+	ctx.target = module_alloc(alloc_size);
+	if (ctx.target == NULL)
+		goto out;
+
+	/* Clean it */
+	memset(ctx.target, 0, alloc_size);
+
+	ctx.idx = 0;
+
+	/* Generate the actual JIT code */
+	build_prologue(&ctx);
+	build_body(&ctx);
+	build_epilogue(&ctx);
+
+	/* Update the icache */
+	flush_icache_range((ptr)ctx.target, (ptr)(ctx.target + ctx.idx));
+
+	if (bpf_jit_enable > 1)
+		/* Dump JIT code */
+		bpf_jit_dump(fp->len, alloc_size, 2, ctx.target);
+
+	fp->bpf_func = (void *)ctx.target;
+
+out:
+	kfree(ctx.offsets);
+}
+
+void bpf_jit_free(struct sk_filter *fp)
+{
+	if (fp->bpf_func != sk_run_filter)
+		module_free(NULL, fp->bpf_func);
+	kfree(fp);
+}
diff --git a/arch/mips/net/bpf_jit.h b/arch/mips/net/bpf_jit.h
new file mode 100644
index 0000000..3a5751b
--- /dev/null
+++ b/arch/mips/net/bpf_jit.h
@@ -0,0 +1,44 @@
+/*
+ * Just-In-Time compiler for BPF filters on MIPS
+ *
+ * Copyright (c) 2014 Imagination Technologies Ltd.
+ * Author: Markos Chandras <markos.chandras@imgtec.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; version 2 of the License.
+ */
+
+#ifndef BPF_JIT_MIPS_OP_H
+#define BPF_JIT_MIPS_OP_H
+
+/* Registers used by JIT */
+#define MIPS_R_ZERO	0
+#define MIPS_R_V0	2
+#define MIPS_R_V1	3
+#define MIPS_R_A0	4
+#define MIPS_R_A1	5
+#define MIPS_R_T6	14
+#define MIPS_R_T7	15
+#define MIPS_R_S0	16
+#define MIPS_R_S1	17
+#define MIPS_R_S2	18
+#define MIPS_R_S3	19
+#define MIPS_R_S4	20
+#define MIPS_R_S5	21
+#define MIPS_R_S6	22
+#define MIPS_R_S7	23
+#define MIPS_R_SP	29
+#define MIPS_R_RA	31
+
+/* Conditional codes */
+#define MIPS_COND_EQ	0x1
+#define MIPS_COND_GE	(0x1 << 1)
+#define MIPS_COND_GT	(0x1 << 2)
+#define MIPS_COND_NE	(0x1 << 3)
+#define MIPS_COND_ALL	(0x1 << 4)
+/* Conditionals on X register or K immediate */
+#define MIPS_COND_X	(0x1 << 5)
+#define MIPS_COND_K	(0x1 << 6)
+
+#endif /* BPF_JIT_MIPS_OP_H */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 13/14] MIPS: net: Add BPF JIT
@ 2014-04-08 11:47   ` Markos Chandras
  0 siblings, 0 replies; 53+ messages in thread
From: Markos Chandras @ 2014-04-08 11:47 UTC (permalink / raw)
  To: linux-mips; +Cc: Markos Chandras

This adds initial support for BPF-JIT on MIPS

Tested on mips32 LE/BE and mips64 BE/n64 using
dhcp, ping and various tcpdump filters.

Benchmarking:

Assuming the remote MIPS target uses 192.168.154.181
as its IP address, and the local host uses 192.168.154.136,
the following results can be obtained using the following
tcpdump filter (catches no frames) and a simple
'time ping -f -c 1000000' command.

[root@(none) ~]# tcpdump -p -n -s 0 -i eth0 net 10.0.0.0/24 -d
(000) ldh      [12]
(001) jeq      #0x800           jt 2	jf 8
(002) ld       [26]
(003) and      #0xffffff00
(004) jeq      #0xa000000       jt 16	jf 5
(005) ld       [30]
(006) and      #0xffffff00
(007) jeq      #0xa000000       jt 16	jf 17
(008) jeq      #0x806           jt 10	jf 9
(009) jeq      #0x8035          jt 10	jf 17
(010) ld       [28]
(011) and      #0xffffff00
(012) jeq      #0xa000000       jt 16	jf 13
(013) ld       [38]
(014) and      #0xffffff00
(015) jeq      #0xa000000       jt 16	jf 17
(016) ret      #65535

- BPF-JIT Disabled

real    1m38.005s
user    0m1.510s
sys     0m6.710s

- BPF-JIT Enabled

real    1m35.215s
user    0m1.200s
sys     0m4.140s

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
---
 arch/mips/Kbuild        |    1 +
 arch/mips/net/Makefile  |    3 +
 arch/mips/net/bpf_jit.c | 1327 +++++++++++++++++++++++++++++++++++++++++++++++
 arch/mips/net/bpf_jit.h |   44 ++
 4 files changed, 1375 insertions(+)
 create mode 100644 arch/mips/net/Makefile
 create mode 100644 arch/mips/net/bpf_jit.c
 create mode 100644 arch/mips/net/bpf_jit.h

diff --git a/arch/mips/Kbuild b/arch/mips/Kbuild
index d2cfe45..9179336 100644
--- a/arch/mips/Kbuild
+++ b/arch/mips/Kbuild
@@ -17,6 +17,7 @@ obj- := $(platform-)
 obj-y += kernel/
 obj-y += mm/
 obj-y += math-emu/
+obj-y += net/
 
 ifdef CONFIG_KVM
 obj-y += kvm/
diff --git a/arch/mips/net/Makefile b/arch/mips/net/Makefile
new file mode 100644
index 0000000..ae74b3a
--- /dev/null
+++ b/arch/mips/net/Makefile
@@ -0,0 +1,3 @@
+# MIPS networking code
+
+obj-$(CONFIG_BPF_JIT) += bpf_jit.o
diff --git a/arch/mips/net/bpf_jit.c b/arch/mips/net/bpf_jit.c
new file mode 100644
index 0000000..6dedd91
--- /dev/null
+++ b/arch/mips/net/bpf_jit.c
@@ -0,0 +1,1327 @@
+/*
+ * Just-In-Time compiler for BPF filters on MIPS
+ *
+ * Copyright (c) 2014 Imagination Technologies Ltd.
+ * Author: Markos Chandras <markos.chandras@imgtec.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; version 2 of the License.
+ */
+
+#include <linux/bitops.h>
+#include <linux/compiler.h>
+#include <linux/errno.h>
+#include <linux/filter.h>
+#include <linux/if_vlan.h>
+#include <linux/kconfig.h>
+#include <linux/moduleloader.h>
+#include <linux/netdevice.h>
+#include <linux/string.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+#include <asm/bitops.h>
+#include <asm/cacheflush.h>
+#include <asm/uasm.h>
+
+#include "bpf_jit.h"
+
+/* ABI
+ *
+ * s0	1st scratch register
+ * s1	2nd scratch register
+ * s2	offset register
+ * s3	BPF register A
+ * s4	BPF register X
+ * s5	*skb
+ * s6	*scratch memory
+ *
+ * On entry (*bpf_func)(*skb, *filter)
+ * a0 = MIPS_R_A0 = skb;
+ * a1 = MIPS_R_A1 = filter;
+ *
+ * Stack
+ * ...
+ * M[15]
+ * M[14]
+ * M[13]
+ * ...
+ * M[0] <-- r_M
+ * saved reg k-1
+ * saved reg k-2
+ * ...
+ * saved reg 0 <-- r_sp
+ * <no argument area>
+ *
+ *                     Packet layout
+ *
+ * <--------------------- len ------------------------>
+ * <--skb-len(r_skb_hl)-->< ----- skb->data_len ------>
+ * ----------------------------------------------------
+ * |                  skb->data                       |
+ * ----------------------------------------------------
+ */
+
+#define RSIZE	(sizeof(unsigned long))
+#define ptr typeof(unsigned long)
+
+/* ABI specific return values */
+#ifdef CONFIG_32BIT /* O32 */
+#ifdef CONFIG_CPU_LITTLE_ENDIAN
+#define r_err	MIPS_R_V1
+#define r_val	MIPS_R_V0
+#else /* CONFIG_CPU_LITTLE_ENDIAN */
+#define r_err	MIPS_R_V0
+#define r_val	MIPS_R_V1
+#endif
+#else /* N64 */
+#define r_err	MIPS_R_V0
+#define r_val	MIPS_R_V0
+#endif
+
+#define r_ret	MIPS_R_V0
+
+/*
+ * Use 2 scratch registers to avoid pipeline interlocks.
+ * There is no overhead during epilogue and prologue since
+ * any of the $s0-$s6 registers will only be preserved if
+ * they are going to actually be used.
+ */
+#define r_s0		MIPS_R_S0 /* scratch reg 1 */
+#define r_s1		MIPS_R_S1 /* scratch reg 2 */
+#define r_off		MIPS_R_S2
+#define r_A		MIPS_R_S3
+#define r_X		MIPS_R_S4
+#define r_skb		MIPS_R_S5
+#define r_M		MIPS_R_S6
+#define r_tmp_imm	MIPS_R_T6 /* No need to preserve this */
+#define r_tmp		MIPS_R_T7 /* No need to preserve this */
+#define r_zero		MIPS_R_ZERO
+#define r_sp		MIPS_R_SP
+#define r_ra		MIPS_R_RA
+
+#define SCRATCH_OFF(k)		(4 * (k))
+
+/* JIT flags */
+#define SEEN_CALL		(1 << BPF_MEMWORDS)
+#define SEEN_SREG_SFT		(BPF_MEMWORDS + 1)
+#define SEEN_SREG_BASE		(1 << SEEN_SREG_SFT)
+#define SEEN_SREG(x)		(SEEN_SREG_BASE << (x))
+#define SEEN_S0			SEEN_SREG(0)
+#define SEEN_S1			SEEN_SREG(1)
+#define SEEN_OFF		SEEN_SREG(2)
+#define SEEN_A			SEEN_SREG(3)
+#define SEEN_X			SEEN_SREG(4)
+#define SEEN_SKB		SEEN_SREG(5)
+#define SEEN_MEM		SEEN_SREG(6)
+
+/* Arguments used by JIT */
+#define ARGS_USED_BY_JIT	2 /* only applicable to 64-bit */
+
+#define FLAG_NEED_X_RESET	(1 << 0)
+
+#define SBIT(x)			(1 << (x)) /* Signed version of BIT() */
+
+/**
+ * struct jit_ctx - JIT context
+ * @skf:		The sk_filter
+ * @prologue_bytes:	Number of bytes for prologue
+ * @idx:		Instruction index
+ * @flags:		JIT flags
+ * @offsets:		Instruction offsets
+ * @target:		Memory location for the compiled filter
+ */
+struct jit_ctx {
+	const struct sk_filter *skf;
+	unsigned int prologue_bytes;
+	u32 idx;
+	u32 flags;
+	u32 *offsets;
+	u32 *target;
+};
+
+
+static inline int optimize_div(u32 *k)
+{
+	/* power of 2 divides can be implemented with right shift */
+	if (!(*k & (*k-1))) {
+		*k = ilog2(*k);
+		return 1;
+	}
+
+	return 0;
+}
+
+/* Simply emit the instruction if the JIT memory space has been allocated */
+#define emit_instr(ctx, func, ...)			\
+do {							\
+	if ((ctx)->target != NULL) {			\
+		u32 *p = &(ctx)->target[ctx->idx];	\
+		uasm_i_##func(&p, ##__VA_ARGS__);	\
+	}						\
+	(ctx)->idx++;					\
+} while (0)
+
+/* Determine if immediate is within the 16-bit signed range */
+static inline bool is_range16(s32 imm)
+{
+	if (imm >= SBIT(15) || imm < -SBIT(15))
+		return true;
+	return false;
+}
+
+static inline void emit_addu(unsigned int dst, unsigned int src1,
+			     unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, addu, dst, src1, src2);
+}
+
+static inline void emit_ins(unsigned int dst, unsigned int src,
+			    unsigned int pos, unsigned int size,
+			    struct jit_ctx *ctx)
+{
+	emit_instr(ctx, ins, dst, src, pos, size);
+}
+
+static inline void emit_nop(struct jit_ctx *ctx)
+{
+	emit_instr(ctx, nop);
+}
+
+/* Load a u32 immediate to a register */
+static inline void emit_load_imm(unsigned int dst, u32 imm, struct jit_ctx *ctx)
+{
+	if (ctx->target != NULL) {
+		/* addiu can only handle s16 */
+		if (is_range16(imm)) {
+			u32 *p = &ctx->target[ctx->idx];
+			uasm_i_lui(&p, r_tmp_imm, (s32)imm >> 16);
+			p = &ctx->target[ctx->idx + 1];
+			uasm_i_ori(&p, dst, r_tmp_imm, imm & 0xffff);
+		} else {
+			u32 *p = &ctx->target[ctx->idx];
+			uasm_i_addiu(&p, dst, r_zero, imm);
+		}
+	}
+	ctx->idx++;
+
+	if (is_range16(imm))
+		ctx->idx++;
+}
+
+static inline void emit_or(unsigned int dst, unsigned int src1,
+			   unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, or, dst, src1, src2);
+}
+
+static inline void emit_ori(unsigned int dst, unsigned src, u32 imm,
+			    struct jit_ctx *ctx)
+{
+	if (imm >= BIT(16)) {
+		emit_load_imm(r_tmp, imm, ctx);
+		emit_or(dst, src, r_tmp, ctx);
+	} else {
+		emit_instr(ctx, ori, dst, src, imm);
+	}
+
+}
+
+
+static inline void emit_daddu(unsigned int dst, unsigned int src1,
+			      unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, daddu, dst, src1, src2);
+}
+
+static inline void emit_daddiu(unsigned int dst, unsigned int src,
+			       int imm, struct jit_ctx *ctx)
+{
+	/*
+	 * Only used for stack, so the imm is relatively small
+	 * and it fits in 15-bits
+	 */
+	emit_instr(ctx, daddiu, dst, src, imm);
+}
+
+static inline void emit_addiu(unsigned int dst, unsigned int src,
+			      u32 imm, struct jit_ctx *ctx)
+{
+	if (is_range16(imm)) {
+		emit_load_imm(r_tmp, imm, ctx);
+		emit_addu(dst, r_tmp, src, ctx);
+	} else {
+		emit_instr(ctx, addiu, dst, src, imm);
+	}
+}
+
+static inline void emit_and(unsigned int dst, unsigned int src1,
+			    unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, and, dst, src1, src2);
+}
+
+static inline void emit_andi(unsigned int dst, unsigned int src,
+			     u32 imm, struct jit_ctx *ctx)
+{
+	/* If imm does not fit in u16 then load it to register */
+	if (imm >= BIT(16)) {
+		emit_load_imm(r_tmp, imm, ctx);
+		emit_and(dst, src, r_tmp, ctx);
+	} else {
+		emit_instr(ctx, andi, dst, src, imm);
+	}
+}
+
+static inline void emit_xor(unsigned int dst, unsigned int src1,
+			    unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, xor, dst, src1, src2);
+}
+
+static inline void emit_xori(ptr dst, ptr src, u32 imm, struct jit_ctx *ctx)
+{
+	/* If imm does not fit in u16 then load it to register */
+	if (imm >= BIT(16)) {
+		emit_load_imm(r_tmp, imm, ctx);
+		emit_xor(dst, src, r_tmp, ctx);
+	} else {
+		emit_instr(ctx, xori, dst, src, imm);
+	}
+}
+
+static inline void emit_stack_offset(int offset, struct jit_ctx *ctx)
+{
+	if (config_enabled(CONFIG_64BIT))
+		emit_instr(ctx, daddiu, r_sp, r_sp, offset);
+	else
+		emit_instr(ctx, addiu, r_sp, r_sp, offset);
+
+}
+
+static inline void emit_subu(unsigned int dst, unsigned int src1,
+			     unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, subu, dst, src1, src2);
+}
+
+static inline void emit_neg(unsigned int reg, struct jit_ctx *ctx)
+{
+	emit_subu(reg, r_zero, reg, ctx);
+}
+
+static inline void emit_sllv(unsigned int dst, unsigned int src,
+			     unsigned int sa, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, sllv, dst, src, sa);
+}
+
+static inline void emit_sll(unsigned int dst, unsigned int src,
+			    unsigned int sa, struct jit_ctx *ctx)
+{
+	/* sa is 5-bits long */
+	BUG_ON(sa >= BIT(5));
+	emit_instr(ctx, sll, dst, src, sa);
+}
+
+static inline void emit_srlv(unsigned int dst, unsigned int src,
+			     unsigned int sa, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, srlv, dst, src, sa);
+}
+
+static inline void emit_srl(unsigned int dst, unsigned int src,
+			    unsigned int sa, struct jit_ctx *ctx)
+{
+	/* sa is 5-bits long */
+	BUG_ON(sa >= BIT(5));
+	emit_instr(ctx, srl, dst, src, sa);
+}
+
+static inline void emit_sltu(unsigned int dst, unsigned int src1,
+			     unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, sltu, dst, src1, src2);
+}
+
+static inline void emit_sltiu(unsigned dst, unsigned int src,
+			      unsigned int imm, struct jit_ctx *ctx)
+{
+	/* 16 bit immediate */
+	if (is_range16((s32)imm)) {
+		emit_load_imm(r_tmp, imm, ctx);
+		emit_sltu(dst, src, r_tmp, ctx);
+	} else {
+		emit_instr(ctx, sltiu, dst, src, imm);
+	}
+
+}
+
+/* Store register on the stack */
+static inline void emit_store_stack_reg(ptr reg, ptr base,
+					unsigned int offset,
+					struct jit_ctx *ctx)
+{
+	if (config_enabled(CONFIG_64BIT))
+		emit_instr(ctx, sd, reg, offset, base);
+	else
+		emit_instr(ctx, sw, reg, offset, base);
+}
+
+static inline void emit_store(ptr reg, ptr base, unsigned int offset,
+			      struct jit_ctx *ctx)
+{
+	emit_instr(ctx, sw, reg, offset, base);
+}
+
+static inline void emit_load_stack_reg(ptr reg, ptr base,
+				       unsigned int offset,
+				       struct jit_ctx *ctx)
+{
+	if (config_enabled(CONFIG_64BIT))
+		emit_instr(ctx, ld, reg, offset, base);
+	else
+		emit_instr(ctx, lw, reg, offset, base);
+
+}
+
+static inline void emit_load(unsigned int reg, unsigned int base,
+			     unsigned int offset, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, lw, reg, offset, base);
+}
+
+static inline void emit_half_load(unsigned int reg, unsigned int base,
+				  unsigned int offset, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, lh, reg, offset, base);
+}
+
+static inline void emit_mul(unsigned int dst, unsigned int src1,
+			    unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, mul, dst, src1, src2);
+}
+
+static inline void emit_div(unsigned int dst, unsigned int src,
+			    struct jit_ctx *ctx)
+{
+	if (ctx->target != NULL) {
+		u32 *p = &ctx->target[ctx->idx];
+		uasm_i_divu(&p, dst, src);
+		p = &ctx->target[ctx->idx + 1];
+		uasm_i_mfhi(&p, dst);
+	}
+	ctx->idx += 2; /* 2 insts */
+}
+
+static inline void emit_dsll(unsigned int dst, unsigned int src,
+			     unsigned int sa, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, dsll, dst, src, sa);
+}
+
+static inline void emit_dsrl32(unsigned int dst, unsigned int src,
+			       unsigned int sa, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, dsrl32, dst, src, sa);
+}
+
+static inline void emit_wsbh(unsigned int dst, unsigned int src,
+			     struct jit_ctx *ctx)
+{
+	emit_instr(ctx, wsbh, dst, src);
+}
+
+/* load a function pointer to register */
+static inline void emit_load_func(unsigned int reg, ptr imm,
+				  struct jit_ctx *ctx)
+{
+	if (config_enabled(CONFIG_64BIT)) {
+		/* At this point imm is always 64-bit */
+		emit_load_imm(r_tmp, imm >> 32, ctx);
+		emit_dsll(r_tmp_imm, r_tmp, 16, ctx); /* left shift by 16 */
+		emit_ori(r_tmp, r_tmp_imm, (imm >> 16) & 0xffff, ctx);
+		emit_dsll(r_tmp_imm, r_tmp, 16, ctx); /* left shift by 16 */
+		emit_ori(reg, r_tmp_imm, imm & 0xffff, ctx);
+	} else {
+		emit_load_imm(reg, imm, ctx);
+	}
+}
+
+/* Move to real MIPS register */
+static inline void emit_reg_move(ptr dst, ptr src, struct jit_ctx *ctx)
+{
+	if (config_enabled(CONFIG_64BIT))
+		emit_daddu(dst, src, r_zero, ctx);
+	else
+		emit_addu(dst, src, r_zero, ctx);
+}
+
+/* Move to JIT (32-bit) register */
+static inline void emit_jit_reg_move(ptr dst, ptr src, struct jit_ctx *ctx)
+{
+	emit_addu(dst, src, r_zero, ctx);
+}
+
+/* Compute the immediate value for PC-relative branches. */
+static inline u32 b_imm(unsigned int tgt, struct jit_ctx *ctx)
+{
+	if (ctx->target == NULL)
+		return 0;
+
+	/*
+	 * We want a pc-relative branch. We only do forward branches
+	 * so tgt is always after pc. tgt is the instruction offset
+	 * we want to jump to.
+
+	 * Branch on MIPS:
+	 * I: target_offset <- sign_extend(offset)
+	 * I+1: PC += target_offset (delay slot)
+	 *
+	 * ctx->idx currently points to the branch instruction
+	 * but the offset is added to the delay slot so we need
+	 * to subtract 4.
+	 */
+	return ctx->offsets[tgt] -
+		(ctx->idx * 4 - ctx->prologue_bytes) - 4;
+}
+
+static inline void emit_bcond(int cond, unsigned int reg1, unsigned int reg2,
+			     unsigned int imm, struct jit_ctx *ctx)
+{
+	if (ctx->target != NULL) {
+		u32 *p = &ctx->target[ctx->idx];
+
+		switch (cond) {
+		case MIPS_COND_EQ:
+			uasm_i_beq(&p, reg1, reg2, imm);
+			break;
+		case MIPS_COND_NE:
+			uasm_i_bne(&p, reg1, reg2, imm);
+			break;
+		case MIPS_COND_ALL:
+			uasm_i_b(&p, imm);
+			break;
+		default:
+			pr_warn("%s: Unhandled branch conditional: %d\n",
+				__func__, cond);
+		}
+	}
+	ctx->idx++;
+}
+
+static inline void emit_b(unsigned int imm, struct jit_ctx *ctx)
+{
+	emit_bcond(MIPS_COND_ALL, r_zero, r_zero, imm, ctx);
+}
+
+static inline void emit_jalr(unsigned int link, unsigned int reg,
+			     struct jit_ctx *ctx)
+{
+	emit_instr(ctx, jalr, link, reg);
+}
+
+static inline void emit_jr(unsigned int reg, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, jr, reg);
+}
+
+static inline u16 align_sp(unsigned int num)
+{
+	/* Double word alignment for 32-bit, quadword for 64-bit */
+	unsigned int align = config_enabled(CONFIG_64BIT) ? 16 : 8;
+	num = (num + (align - 1)) & -align;
+	return num;
+}
+
+static inline void update_on_xread(struct jit_ctx *ctx)
+{
+	if (!(ctx->flags & SEEN_X))
+		ctx->flags |= FLAG_NEED_X_RESET;
+
+	ctx->flags |= SEEN_X;
+}
+
+static bool is_load_to_a(u16 inst)
+{
+	switch (inst) {
+	case BPF_S_LD_W_LEN:
+	case BPF_S_LD_W_ABS:
+	case BPF_S_LD_H_ABS:
+	case BPF_S_LD_B_ABS:
+	case BPF_S_ANC_CPU:
+	case BPF_S_ANC_IFINDEX:
+	case BPF_S_ANC_MARK:
+	case BPF_S_ANC_PROTOCOL:
+	case BPF_S_ANC_RXHASH:
+	case BPF_S_ANC_VLAN_TAG:
+	case BPF_S_ANC_VLAN_TAG_PRESENT:
+	case BPF_S_ANC_QUEUE:
+		return true;
+	default:
+		return false;
+	}
+}
+
+static void save_bpf_jit_regs(struct jit_ctx *ctx, unsigned offset)
+{
+	int i = 0, real_off = 0;
+	u32 sflags, tmp_flags;
+
+	/* Adjust the stack pointer */
+	emit_stack_offset(-align_sp(offset), ctx);
+
+	if (ctx->flags & SEEN_CALL) {
+		/* Argument save area */
+		if (config_enabled(CONFIG_64BIT))
+			/* Bottom of current frame */
+			real_off = align_sp(offset) - RSIZE;
+		else
+			/* Top of previous frame */
+			real_off = align_sp(offset) + RSIZE;
+		emit_store_stack_reg(MIPS_R_A0, r_sp, real_off, ctx);
+		emit_store_stack_reg(MIPS_R_A1, r_sp, real_off + RSIZE, ctx);
+
+		real_off = 0;
+	}
+
+	tmp_flags = sflags = ctx->flags >> SEEN_SREG_SFT;
+	/* sflags is essentially a bitmap */
+	pr_debug("%s: register flags: 0x%08x\n", __func__, tmp_flags);
+	while (tmp_flags) {
+		if ((sflags >> i) & 0x1) {
+			pr_debug("%s: preserving register %d\n", __func__,
+				 MIPS_R_S0 + i);
+			emit_store_stack_reg(MIPS_R_S0 + i, r_sp, real_off,
+					     ctx);
+			real_off += RSIZE;
+		}
+		i++;
+		tmp_flags >>= 1;
+	}
+
+	/* save return address */
+	if (ctx->flags & SEEN_CALL) {
+		emit_store_stack_reg(r_ra, r_sp, real_off, ctx);
+		real_off += RSIZE;
+	}
+
+	/* Setup r_M leaving the alignment gap if necessary */
+	if (ctx->flags & SEEN_MEM) {
+		if (real_off % (RSIZE * 2))
+			real_off += RSIZE;
+		emit_addiu(r_M, r_sp, real_off, ctx);
+	}
+
+}
+
+static void restore_bpf_jit_regs(struct jit_ctx *ctx,
+				 unsigned int offset)
+{
+	int i, real_off = 0;
+	u32 sflags, tmp_flags;
+
+	if (ctx->flags & SEEN_CALL) {
+		if (config_enabled(CONFIG_64BIT))
+			/* Bottom of current frame */
+			real_off = align_sp(offset) - RSIZE;
+		else
+			/* Top of previous frame */
+			real_off = align_sp(offset) + RSIZE;
+		emit_load_stack_reg(MIPS_R_A0, r_sp, real_off, ctx);
+		emit_load_stack_reg(MIPS_R_A1, r_sp, real_off + RSIZE, ctx);
+
+		real_off = 0;
+	}
+
+	tmp_flags = sflags = ctx->flags >> SEEN_SREG_SFT;
+	/* sflags is a bitmap */
+	pr_debug("%s: register flags: 0x%08x\n", __func__, tmp_flags);
+	i = 0;
+	while (tmp_flags) {
+		if ((sflags >> i) & 0x1) {
+			pr_debug("%s: restoring register %d\n", __func__,
+				 MIPS_R_S0 + i);
+			emit_load_stack_reg(MIPS_R_S0 + i, r_sp, real_off,
+					    ctx);
+			real_off += RSIZE;
+		}
+		i++;
+		tmp_flags >>= 1;
+	}
+
+	/* restore return address */
+	if (ctx->flags & SEEN_CALL)
+		emit_load_stack_reg(r_ra, r_sp, real_off, ctx);
+
+	/* Restore the sp and discard the scrach memory */
+	emit_stack_offset(align_sp(offset), ctx);
+
+}
+
+static unsigned int get_stack_depth(struct jit_ctx *ctx)
+{
+	int sp_off = 0;
+
+
+	/* How may s* regs do we need to preserved? */
+	sp_off += hweight32(ctx->flags >> SEEN_SREG_SFT) * RSIZE;
+
+	if (ctx->flags & SEEN_MEM)
+		sp_off += 4 * BPF_MEMWORDS; /* BPF_MEMWORDS are 32-bit */
+
+	if (ctx->flags & SEEN_CALL)
+		/*
+		 * The JIT code make calls to external functions using 2
+		 * arguments. Therefore, for o32 we don't need to allocate
+		 * space because we don't care if the argumetns are lost
+		 * across calls. We do need however to preserve incoming
+		 * arguments but the space is already allocated for us by
+		 * the caller. On the other hand, for n64, we need to allocate
+		 * this space ourselves. We need to preserve $ra as well.
+		 */
+		sp_off += config_enabled(CONFIG_64BIT) ?
+			(ARGS_USED_BY_JIT + 1) * RSIZE : RSIZE;
+
+	/*
+	 * Subtract the bytes for the last registers since we only care about
+	 * the location on the stack pointer.
+	 */
+	return sp_off - RSIZE;
+}
+
+static void build_prologue(struct jit_ctx *ctx)
+{
+	u16 first_inst = ctx->skf->insns[0].code;
+	int sp_off;
+
+	/* Calculate the total offset for the stack pointer */
+	sp_off = get_stack_depth(ctx);
+	save_bpf_jit_regs(ctx, sp_off);
+
+	if (ctx->flags & SEEN_SKB)
+		emit_reg_move(r_skb, MIPS_R_A0, ctx);
+
+	if (ctx->flags & FLAG_NEED_X_RESET)
+		emit_jit_reg_move(r_X, r_zero, ctx);
+
+	/* Do not leak kernel data to userspace */
+	if ((first_inst != BPF_S_RET_K) && !(is_load_to_a(first_inst)))
+		emit_jit_reg_move(r_A, r_zero, ctx);
+
+}
+
+static void build_epilogue(struct jit_ctx *ctx)
+{
+	unsigned int sp_off;
+
+	/* Calculate the total offset for the stack pointer */
+
+	sp_off = get_stack_depth(ctx);
+	restore_bpf_jit_regs(ctx, sp_off);
+
+	/* Return */
+	emit_jr(r_ra, ctx);
+	emit_nop(ctx);
+}
+
+static u64 jit_get_skb_b(struct sk_buff *skb, unsigned offset)
+{
+	u8 ret;
+	int err;
+
+	err = skb_copy_bits(skb, offset, &ret, 1);
+
+	return (u64)err << 32 | ret;
+}
+
+static u64 jit_get_skb_h(struct sk_buff *skb, unsigned offset)
+{
+	u16 ret;
+	int err;
+
+	err = skb_copy_bits(skb, offset, &ret, 2);
+
+	return (u64)err << 32 | ntohs(ret);
+}
+
+static u64 jit_get_skb_w(struct sk_buff *skb, unsigned offset)
+{
+	u32 ret;
+	int err;
+
+	err = skb_copy_bits(skb, offset, &ret, 4);
+
+	return (u64)err << 32 | ntohl(ret);
+}
+
+static int build_body(struct jit_ctx *ctx)
+{
+	void *load_func[] = {jit_get_skb_b, jit_get_skb_h, jit_get_skb_w};
+	const struct sk_filter *prog = ctx->skf;
+	const struct sock_filter *inst;
+	unsigned int i, off, load_order, condt;
+	u32 k, b_off __maybe_unused;
+
+	for (i = 0; i < prog->len; i++) {
+		inst = &(prog->insns[i]);
+		pr_debug("%s: code->0x%02x, jt->0x%x, jf->0x%x, k->0x%x\n",
+			 __func__, inst->code, inst->jt, inst->jf, inst->k);
+		k = inst->k;
+
+		if (ctx->target == NULL)
+			ctx->offsets[i] = ctx->idx * 4;
+
+		switch (inst->code) {
+		case BPF_S_LD_IMM:
+			/* A <- k ==> li r_A, k */
+			ctx->flags |= SEEN_A;
+			emit_load_imm(r_A, k, ctx);
+			break;
+		case BPF_S_LD_W_LEN:
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, len) != 4);
+			/* A <- len ==> lw r_A, offset(skb) */
+			ctx->flags |= SEEN_SKB | SEEN_A;
+			off = offsetof(struct sk_buff, len);
+			emit_load(r_A, r_skb, off, ctx);
+			break;
+		case BPF_S_LD_MEM:
+			/* A <- M[k] ==> lw r_A, offset(M) */
+			ctx->flags |= SEEN_MEM | SEEN_A;
+			emit_load(r_A, r_M, SCRATCH_OFF(k), ctx);
+			break;
+		case BPF_S_LD_W_ABS:
+			/* A <- P[k:4] */
+			load_order = 2;
+			goto load;
+		case BPF_S_LD_H_ABS:
+			/* A <- P[k:2] */
+			load_order = 1;
+			goto load;
+		case BPF_S_LD_B_ABS:
+			/* A <- P[k:1] */
+			load_order = 0;
+load:
+			emit_load_imm(r_off, k, ctx);
+load_common:
+			ctx->flags |= SEEN_CALL | SEEN_OFF | SEEN_S0 |
+				SEEN_SKB | SEEN_A;
+
+			emit_load_func(r_s0, (ptr)load_func[load_order],
+				      ctx);
+			emit_reg_move(MIPS_R_A0, r_skb, ctx);
+			emit_jalr(MIPS_R_RA, r_s0, ctx);
+			/* Load second argument to delay slot */
+			emit_reg_move(MIPS_R_A1, r_off, ctx);
+			/* Check the error value */
+			if (config_enabled(CONFIG_64BIT)) {
+				/* Get error code from the top 32-bits */
+				emit_dsrl32(r_s0, r_val, 0, ctx);
+				/* Branch to 3 instructions ahead */
+				emit_bcond(MIPS_COND_NE, r_s0, r_zero, 3 << 2,
+					   ctx);
+			} else {
+				/* Branch to 3 instructions ahead */
+				emit_bcond(MIPS_COND_NE, r_err, r_zero, 3 << 2,
+					   ctx);
+			}
+			emit_nop(ctx);
+			/* We are good */
+			emit_b(b_imm(i + 1, ctx), ctx);
+			emit_jit_reg_move(r_A, r_val, ctx);
+			/* Return with error */
+			emit_b(b_imm(prog->len, ctx), ctx);
+			emit_reg_move(r_ret, r_zero, ctx);
+			break;
+		case BPF_S_LD_W_IND:
+			/* A <- P[X + k:4] */
+			load_order = 2;
+			goto load_ind;
+		case BPF_S_LD_H_IND:
+			/* A <- P[X + k:2] */
+			load_order = 1;
+			goto load_ind;
+		case BPF_S_LD_B_IND:
+			/* A <- P[X + k:1] */
+			load_order = 0;
+load_ind:
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_OFF | SEEN_X;
+			emit_addiu(r_off, r_X, k, ctx);
+		goto load_common;
+		case BPF_S_LDX_IMM:
+			/* X <- k */
+			ctx->flags |= SEEN_X;
+			emit_load_imm(r_X, k, ctx);
+			break;
+		case BPF_S_LDX_MEM:
+			/* X <- M[k] */
+			ctx->flags |= SEEN_X | SEEN_MEM;
+			emit_load(r_X, r_M, SCRATCH_OFF(k), ctx);
+			break;
+		case BPF_S_LDX_W_LEN:
+			/* X <- len */
+			ctx->flags |= SEEN_X | SEEN_SKB;
+			off = offsetof(struct sk_buff, len);
+			emit_load(r_X, r_skb, off, ctx);
+			break;
+		case BPF_S_LDX_B_MSH:
+			/* X <- 4 * (P[k:1] & 0xf) */
+			ctx->flags |= SEEN_X | SEEN_CALL | SEEN_S0 | SEEN_SKB;
+			/* Load offset to a1 */
+			emit_load_func(r_s0, (ptr)jit_get_skb_b, ctx);
+			/*
+			 * This may emit two instructions so it may not fit
+			 * in the delay slot. So use a0 in the delay slot.
+			 */
+			emit_load_imm(MIPS_R_A1, k, ctx);
+			emit_jalr(MIPS_R_RA, r_s0, ctx);
+			emit_reg_move(MIPS_R_A0, r_skb, ctx); /* delay slot */
+			/* Check the error value */
+			if (config_enabled(CONFIG_64BIT)) {
+				/* Top 32-bits of $v0 on 64-bit */
+				emit_dsrl32(r_s0, r_val, 0, ctx);
+				emit_bcond(MIPS_COND_NE, r_s0, r_zero,
+					   3 << 2, ctx);
+			} else {
+				emit_bcond(MIPS_COND_NE, r_err, r_zero,
+					   3 << 2, ctx);
+			}
+			/* No need for delay slot */
+			/* We are good */
+			/* X <- P[1:K] & 0xf */
+			emit_andi(r_X, r_val, 0xf, ctx);
+			/* X << 2 */
+			emit_b(b_imm(i + 1, ctx), ctx);
+			emit_sll(r_X, r_X, 2, ctx); /* delay slot */
+			/* Return with error */
+			emit_b(b_imm(prog->len, ctx), ctx);
+			emit_load_imm(r_ret, 0, ctx); /* delay slot */
+			break;
+		case BPF_S_ST:
+			/* M[k] <- A */
+			ctx->flags |= SEEN_MEM | SEEN_A;
+			emit_store(r_A, r_M, SCRATCH_OFF(k), ctx);
+			break;
+		case BPF_S_STX:
+			/* M[k] <- X */
+			ctx->flags |= SEEN_MEM | SEEN_X;
+			emit_store(r_X, r_M, SCRATCH_OFF(k), ctx);
+			break;
+		case BPF_S_ALU_ADD_K:
+			/* A += K */
+			ctx->flags |= SEEN_A;
+			emit_addiu(r_A, r_A, k, ctx);
+			break;
+		case BPF_S_ALU_SUB_K:
+			/* A -= K */
+			ctx->flags |= SEEN_A;
+			emit_addiu(r_A, r_A, -k, ctx);
+			break;
+		case BPF_S_ALU_MUL_K:
+			/* A *= K */
+			/* Load K to scratch register before MUL */
+			ctx->flags |= SEEN_A | SEEN_S0;
+			emit_load_imm(r_s0, k, ctx);
+			emit_mul(r_A, r_A, r_s0, ctx);
+			break;
+		case BPF_S_ALU_MUL_X:
+			/* A *= X */
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_A | SEEN_X;
+			emit_mul(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_DIV_K:
+			/* A /= k */
+			if (k == 1)
+				break;
+			if (optimize_div(&k)) {
+				ctx->flags |= SEEN_A;
+				emit_srl(r_A, r_A, k, ctx);
+				break;
+			}
+			ctx->flags |= SEEN_A | SEEN_S0;
+			emit_load_imm(r_s0, k, ctx);
+			emit_div(r_A, r_s0, ctx);
+			break;
+		case BPF_S_ALU_DIV_X:
+			/* A /= X */
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_X | SEEN_A;
+			/* Check if r_X is zero */
+			emit_bcond(MIPS_COND_EQ, r_X, r_zero,
+				   b_imm(prog->len, ctx), ctx);
+			emit_load_imm(r_val, 0, ctx); /* delay slot */
+			emit_div(r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_OR_K:
+			/* A |= K */
+			ctx->flags |= SEEN_A;
+			emit_ori(r_A, r_A, k, ctx);
+			break;
+		case BPF_S_ALU_OR_X:
+			/* A |= X */
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_A;
+			emit_ori(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_XOR_K:
+			/* A ^= k */
+			ctx->flags |= SEEN_A;
+			emit_xori(r_A, r_A, k, ctx);
+			break;
+		case BPF_S_ANC_ALU_XOR_X:
+		case BPF_S_ALU_XOR_X:
+			/* A ^= X */
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_A;
+			emit_xor(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_AND_K:
+			/* A &= K */
+			ctx->flags |= SEEN_A;
+			emit_andi(r_A, r_A, k, ctx);
+			break;
+		case BPF_S_ALU_AND_X:
+			/* A &= X */
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_A | SEEN_X;
+			emit_and(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_LSH_K:
+			/* A <<= K */
+			ctx->flags |= SEEN_A;
+			emit_sll(r_A, r_A, k, ctx);
+			break;
+		case BPF_S_ALU_LSH_X:
+			/* A <<= X */
+			ctx->flags |= SEEN_A | SEEN_X;
+			update_on_xread(ctx);
+			emit_sllv(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_RSH_K:
+			/* A >>= K */
+			ctx->flags |= SEEN_A;
+			emit_srl(r_A, r_A, k, ctx);
+			break;
+		case BPF_S_ALU_RSH_X:
+			ctx->flags |= SEEN_A | SEEN_X;
+			update_on_xread(ctx);
+			emit_srlv(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_NEG:
+			/* A = -A */
+			ctx->flags |= SEEN_A;
+			emit_neg(r_A, ctx);
+			break;
+		case BPF_S_JMP_JA:
+			/* pc += K */
+			emit_b(b_imm(i + k + 1, ctx), ctx);
+			emit_nop(ctx);
+			break;
+		case BPF_S_JMP_JEQ_K:
+			/* pc += ( A == K ) ? pc->jt : pc->jf */
+			condt = MIPS_COND_EQ | MIPS_COND_K;
+			goto jmp_cmp;
+		case BPF_S_JMP_JEQ_X:
+			ctx->flags |= SEEN_X;
+			/* pc += ( A == X ) ? pc->jt : pc->jf */
+			condt = MIPS_COND_EQ | MIPS_COND_X;
+			goto jmp_cmp;
+		case BPF_S_JMP_JGE_K:
+			/* pc += ( A >= K ) ? pc->jt : pc->jf */
+			condt = MIPS_COND_GE | MIPS_COND_K;
+			goto jmp_cmp;
+		case BPF_S_JMP_JGE_X:
+			ctx->flags |= SEEN_X;
+			/* pc += ( A >= X ) ? pc->jt : pc->jf */
+			condt = MIPS_COND_GE | MIPS_COND_X;
+			goto jmp_cmp;
+		case BPF_S_JMP_JGT_K:
+			/* pc += ( A > K ) ? pc->jt : pc->jf */
+			condt = MIPS_COND_GT | MIPS_COND_K;
+			goto jmp_cmp;
+		case BPF_S_JMP_JGT_X:
+			ctx->flags |= SEEN_X;
+			/* pc += ( A > X ) ? pc->jt : pc->jf */
+			condt = MIPS_COND_GT | MIPS_COND_X;
+jmp_cmp:
+			/* Greater or Equal */
+			if ((condt & MIPS_COND_GE) ||
+			    (condt & MIPS_COND_GT)) {
+				if (condt & MIPS_COND_K) { /* K */
+					ctx->flags |= SEEN_S0 | SEEN_A;
+					emit_sltiu(r_s0, r_A, k, ctx);
+				} else { /* X */
+					ctx->flags |= SEEN_S0 | SEEN_A |
+						SEEN_X;
+					emit_sltu(r_s0, r_A, r_X, ctx);
+				}
+				/* A < (K|X) ? r_scrach = 1 */
+				b_off = b_imm(i + inst->jf + 1, ctx);
+				emit_bcond(MIPS_COND_GT, r_s0, r_zero, b_off,
+					   ctx);
+				emit_nop(ctx);
+				/* A > (K|X) ? scratch = 0 */
+				if (condt & MIPS_COND_GT) {
+					/* Checking for equality */
+					ctx->flags |= SEEN_S0 | SEEN_A | SEEN_X;
+					if (condt & MIPS_COND_K)
+						emit_load_imm(r_s0, k, ctx);
+					else
+						emit_jit_reg_move(r_s0, r_X,
+								  ctx);
+					b_off = b_imm(i + inst->jf + 1, ctx);
+					emit_bcond(MIPS_COND_EQ, r_A, r_s0,
+						   b_off, ctx);
+					emit_nop(ctx);
+					/* Finally, A > K|X */
+					b_off = b_imm(i + inst->jt + 1, ctx);
+					emit_b(b_off, ctx);
+					emit_nop(ctx);
+				} else {
+					/* A >= (K|X) so jump */
+					b_off = b_imm(i + inst->jt + 1, ctx);
+					emit_b(b_off, ctx);
+					emit_nop(ctx);
+				}
+			} else {
+				/* A == K|X */
+				if (condt & MIPS_COND_K) { /* K */
+					ctx->flags |= SEEN_S0 | SEEN_A;
+					emit_load_imm(r_s0, k, ctx);
+					/* jump true */
+					b_off = b_imm(i + inst->jt + 1, ctx);
+					emit_bcond(MIPS_COND_EQ, r_A, r_s0,
+						   b_off, ctx);
+					emit_nop(ctx);
+					/* jump false */
+					b_off = b_imm(i + inst->jf + 1,
+						      ctx);
+					emit_bcond(MIPS_COND_NE, r_A, r_s0,
+						   b_off, ctx);
+					emit_nop(ctx);
+				} else { /* X */
+					/* jump true */
+					ctx->flags |= SEEN_A | SEEN_X;
+					b_off = b_imm(i + inst->jt + 1,
+						      ctx);
+					emit_bcond(MIPS_COND_EQ, r_A, r_X,
+						   b_off, ctx);
+					emit_nop(ctx);
+					/* jump false */
+					b_off = b_imm(i + inst->jf + 1, ctx);
+					emit_bcond(MIPS_COND_NE, r_A, r_X,
+						   b_off, ctx);
+					emit_nop(ctx);
+				}
+			}
+			break;
+		case BPF_S_JMP_JSET_K:
+			ctx->flags |= SEEN_S0 | SEEN_S1 | SEEN_A;
+			/* pc += (A & K) ? pc -> jt : pc -> jf */
+			emit_load_imm(r_s1, k, ctx);
+			emit_and(r_s0, r_A, r_s1, ctx);
+			/* jump true */
+			b_off = b_imm(i + inst->jt + 1, ctx);
+			emit_bcond(MIPS_COND_NE, r_s0, r_zero, b_off, ctx);
+			emit_nop(ctx);
+			/* jump false */
+			b_off = b_imm(i + inst->jf + 1, ctx);
+			emit_b(b_off, ctx);
+			emit_nop(ctx);
+			break;
+		case BPF_S_JMP_JSET_X:
+			ctx->flags |= SEEN_S0 | SEEN_X | SEEN_A;
+			/* pc += (A & X) ? pc -> jt : pc -> jf */
+			emit_and(r_s0, r_A, r_X, ctx);
+			/* jump true */
+			b_off = b_imm(i + inst->jt + 1, ctx);
+			emit_bcond(MIPS_COND_NE, r_s0, r_zero, b_off, ctx);
+			emit_nop(ctx);
+			/* jump false */
+			b_off = b_imm(i + inst->jf + 1, ctx);
+			emit_b(b_off, ctx);
+			emit_nop(ctx);
+			break;
+		case BPF_S_RET_A:
+			ctx->flags |= SEEN_A;
+			if (i != prog->len - 1)
+				/*
+				 * If this is not the last instruction
+				 * then jump to the epilogue
+				 */
+				emit_b(b_imm(prog->len, ctx), ctx);
+			emit_reg_move(r_ret, r_A, ctx); /* delay slot */
+			break;
+		case BPF_S_RET_K:
+			/*
+			 * It can emit two instructions so it does not fit on
+			 * the delay slot.
+			 */
+			emit_load_imm(r_ret, k, ctx);
+			if (i != prog->len - 1) {
+				/*
+				 * If this is not the last instruction
+				 * then jump to the epilogue
+				 */
+				emit_b(b_imm(prog->len, ctx), ctx);
+				emit_nop(ctx);
+			}
+			break;
+		case BPF_S_MISC_TAX:
+			/* X = A */
+			ctx->flags |= SEEN_X | SEEN_A;
+			emit_jit_reg_move(r_X, r_A, ctx);
+			break;
+		case BPF_S_MISC_TXA:
+			/* A = X */
+			ctx->flags |= SEEN_A | SEEN_X;
+			update_on_xread(ctx);
+			emit_jit_reg_move(r_A, r_X, ctx);
+			break;
+		/* AUX */
+		case BPF_S_ANC_PROTOCOL:
+			/* A = ntohs(skb->protocol */
+			ctx->flags |= SEEN_SKB | SEEN_OFF | SEEN_A;
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff,
+						  protocol) != 2);
+			off = offsetof(struct sk_buff, protocol);
+			emit_half_load(r_A, r_skb, off, ctx);
+#ifdef CONFIG_CPU_LITTLE_ENDIAN
+			/* This needs little endian fixup */
+			emit_wsbh(r_A, r_A, ctx);
+#endif
+			break;
+		case BPF_S_ANC_CPU:
+			ctx->flags |= SEEN_A | SEEN_OFF;
+			/* A = current_thread_info()->cpu */
+			BUILD_BUG_ON(FIELD_SIZEOF(struct thread_info,
+						  cpu) != 4);
+			off = offsetof(struct thread_info, cpu);
+			/* $28/gp points to the thread_info struct */
+			emit_load(r_A, 28, off, ctx);
+			break;
+		case BPF_S_ANC_IFINDEX:
+			/* A = skb->dev->ifindex */
+			ctx->flags |= SEEN_SKB | SEEN_A | SEEN_S0;
+			off = offsetof(struct sk_buff, dev);
+			emit_load(r_s0, r_skb, off, ctx);
+			/* error (0) in the delay slot */
+			emit_bcond(MIPS_COND_EQ, r_s0, r_zero,
+				   b_imm(prog->len, ctx), ctx);
+			emit_reg_move(r_ret, r_zero, ctx);
+			BUILD_BUG_ON(FIELD_SIZEOF(struct net_device,
+						  ifindex) != 4);
+			off = offsetof(struct net_device, ifindex);
+			emit_load(r_A, r_s0, off, ctx);
+			break;
+		case BPF_S_ANC_MARK:
+			ctx->flags |= SEEN_SKB | SEEN_A;
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, mark) != 4);
+			off = offsetof(struct sk_buff, mark);
+			emit_load(r_A, r_skb, off, ctx);
+			break;
+		case BPF_S_ANC_RXHASH:
+			ctx->flags |= SEEN_SKB | SEEN_A;
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, rxhash) != 4);
+			off = offsetof(struct sk_buff, rxhash);
+			emit_load(r_A, r_skb, off, ctx);
+			break;
+		case BPF_S_ANC_VLAN_TAG_PRESENT:
+			ctx->flags |= SEEN_SKB | SEEN_S0 | SEEN_A;
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff,
+						  vlan_tci) != 2);
+			off = offsetof(struct sk_buff, vlan_tci);
+			emit_half_load(r_s0, r_skb, off, ctx);
+			if (inst->code == BPF_S_ANC_VLAN_TAG)
+				emit_and(r_A, r_s0, VLAN_VID_MASK, ctx);
+			else
+				emit_and(r_A, r_s0, VLAN_TAG_PRESENT, ctx);
+			break;
+		case BPF_S_ANC_QUEUE:
+			ctx->flags |= SEEN_SKB | SEEN_A;
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff,
+						  queue_mapping) != 2);
+			BUILD_BUG_ON(offsetof(struct sk_buff,
+					      queue_mapping) > 0xff);
+			off = offsetof(struct sk_buff, queue_mapping);
+			emit_half_load(r_A, r_skb, off, ctx);
+			break;
+		default:
+			pr_warn("%s: Unhandled opcode: 0x%02x\n", __FILE__,
+				inst->code);
+			return -1;
+		}
+	}
+
+	/* compute offsets only during the first pass */
+	if (ctx->target == NULL)
+		ctx->offsets[i] = ctx->idx * 4;
+
+	return 0;
+
+}
+
+int bpf_jit_enable __read_mostly;
+
+void bpf_jit_compile(struct sk_filter *fp)
+{
+	struct jit_ctx ctx;
+	unsigned int alloc_size, tmp_idx;
+
+	if (!bpf_jit_enable)
+		return;
+
+	memset(&ctx, 0, sizeof(ctx));
+
+	ctx.offsets = kcalloc(fp->len, sizeof(*ctx.offsets), GFP_KERNEL);
+	if (ctx.offsets == NULL)
+		return;
+
+	ctx.skf = fp;
+
+	if (unlikely(build_body(&ctx)))
+		goto out;
+
+	tmp_idx = ctx.idx;
+	build_prologue(&ctx);
+	ctx.prologue_bytes = (ctx.idx - tmp_idx) * 4;
+	/* just to complete the ctx.idx count */
+	build_epilogue(&ctx);
+
+	alloc_size = 4 * ctx.idx;
+	ctx.target = module_alloc(alloc_size);
+	if (ctx.target == NULL)
+		goto out;
+
+	/* Clean it */
+	memset(ctx.target, 0, alloc_size);
+
+	ctx.idx = 0;
+
+	/* Generate the actual JIT code */
+	build_prologue(&ctx);
+	build_body(&ctx);
+	build_epilogue(&ctx);
+
+	/* Update the icache */
+	flush_icache_range((ptr)ctx.target, (ptr)(ctx.target + ctx.idx));
+
+	if (bpf_jit_enable > 1)
+		/* Dump JIT code */
+		bpf_jit_dump(fp->len, alloc_size, 2, ctx.target);
+
+	fp->bpf_func = (void *)ctx.target;
+
+out:
+	kfree(ctx.offsets);
+}
+
+void bpf_jit_free(struct sk_filter *fp)
+{
+	if (fp->bpf_func != sk_run_filter)
+		module_free(NULL, fp->bpf_func);
+	kfree(fp);
+}
diff --git a/arch/mips/net/bpf_jit.h b/arch/mips/net/bpf_jit.h
new file mode 100644
index 0000000..3a5751b
--- /dev/null
+++ b/arch/mips/net/bpf_jit.h
@@ -0,0 +1,44 @@
+/*
+ * Just-In-Time compiler for BPF filters on MIPS
+ *
+ * Copyright (c) 2014 Imagination Technologies Ltd.
+ * Author: Markos Chandras <markos.chandras@imgtec.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; version 2 of the License.
+ */
+
+#ifndef BPF_JIT_MIPS_OP_H
+#define BPF_JIT_MIPS_OP_H
+
+/* Registers used by JIT */
+#define MIPS_R_ZERO	0
+#define MIPS_R_V0	2
+#define MIPS_R_V1	3
+#define MIPS_R_A0	4
+#define MIPS_R_A1	5
+#define MIPS_R_T6	14
+#define MIPS_R_T7	15
+#define MIPS_R_S0	16
+#define MIPS_R_S1	17
+#define MIPS_R_S2	18
+#define MIPS_R_S3	19
+#define MIPS_R_S4	20
+#define MIPS_R_S5	21
+#define MIPS_R_S6	22
+#define MIPS_R_S7	23
+#define MIPS_R_SP	29
+#define MIPS_R_RA	31
+
+/* Conditional codes */
+#define MIPS_COND_EQ	0x1
+#define MIPS_COND_GE	(0x1 << 1)
+#define MIPS_COND_GT	(0x1 << 2)
+#define MIPS_COND_NE	(0x1 << 3)
+#define MIPS_COND_ALL	(0x1 << 4)
+/* Conditionals on X register or K immediate */
+#define MIPS_COND_X	(0x1 << 5)
+#define MIPS_COND_K	(0x1 << 6)
+
+#endif /* BPF_JIT_MIPS_OP_H */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 14/14] MIPS: Enable the BPF_JIT symbol for MIPS
@ 2014-04-08 11:47   ` Markos Chandras
  0 siblings, 0 replies; 53+ messages in thread
From: Markos Chandras @ 2014-04-08 11:47 UTC (permalink / raw)
  To: linux-mips; +Cc: Markos Chandras

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
---
 arch/mips/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 16d5ab1..7a6edca 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -12,6 +12,7 @@ config MIPS
 	select HAVE_ARCH_KGDB
 	select HAVE_ARCH_SECCOMP_FILTER
 	select HAVE_ARCH_TRACEHOOK
+	select HAVE_BPF_JIT if CPU_MIPSR2 && !CPU_MICROMIPS
 	select ARCH_HAVE_CUSTOM_GPIO_H
 	select HAVE_FUNCTION_TRACER
 	select HAVE_FUNCTION_TRACE_MCOUNT_TEST
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 14/14] MIPS: Enable the BPF_JIT symbol for MIPS
@ 2014-04-08 11:47   ` Markos Chandras
  0 siblings, 0 replies; 53+ messages in thread
From: Markos Chandras @ 2014-04-08 11:47 UTC (permalink / raw)
  To: linux-mips; +Cc: Markos Chandras

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
---
 arch/mips/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 16d5ab1..7a6edca 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -12,6 +12,7 @@ config MIPS
 	select HAVE_ARCH_KGDB
 	select HAVE_ARCH_SECCOMP_FILTER
 	select HAVE_ARCH_TRACEHOOK
+	select HAVE_BPF_JIT if CPU_MIPSR2 && !CPU_MICROMIPS
 	select ARCH_HAVE_CUSTOM_GPIO_H
 	select HAVE_FUNCTION_TRACER
 	select HAVE_FUNCTION_TRACE_MCOUNT_TEST
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH 00/14] Initial BPF-JIT support for MIPS
  2014-04-08 11:47 ` Markos Chandras
                   ` (14 preceding siblings ...)
  (?)
@ 2014-04-08 18:38 ` Florian Fainelli
  2014-04-09  8:55   ` Markos Chandras
  -1 siblings, 1 reply; 53+ messages in thread
From: Florian Fainelli @ 2014-04-08 18:38 UTC (permalink / raw)
  To: Markos Chandras; +Cc: Linux-MIPS

Hi,

2014-04-08 4:47 GMT-07:00 Markos Chandras <markos.chandras@imgtec.com>:
> Hi,
>
> This adds support for BPF-JIT for MIPS. Tested on mips32 LE/BE and mips64 BE
> with a few networking tools such as tcpdump and dhcp but not all opcodes have
> been tested as far as I can tell. There are a few optimizations left to be made
> (fastpath for load operations instead of calling the helper functions) but
> these can be added later on. If someone has complex network setups in place and
> would like to give it a try, that would be much appreciated.
>
> This patchset is for the upstream-sfr/mips-for-linux-next tree

You should have probably CC'd netdev@vger.kernel.org to get their
review on the specific JIT implementation.

BPF_JIT is made conditional to MIPS32/64R2 processors, I could not
spot easily in the implementation whether this is because you are
using r2-only instructions, or this is just the targets you tested. Is
there any chance to make that work on MIPS32r1 CPUs for instance?
Those are used in low-end devices which could benefit from such a
performance boost.

Thanks!

>
> Markos Chandras (14):
>   MIPS: uasm: Add u3u2u1 instruction builders
>   MIPS: uasm: Add u2u1 instruction builders
>   MIPS: uasm: Add sllv uasm instruction
>   MIPS: uasm: Add srlv uasm instruction
>   MIPS: uasm: Add divu uasm instruction
>   MIPS: uasm: Add mfhi uasm instruction
>   MIPS: uasm: Add jalr uasm instruction
>   MIPS: uasm: Add sltiu uasm instruction
>   MIPS: uasm: Add sltu uasm instruction
>   MIPS: uasm: Add wsbh uasm instruction
>   MIPS: uasm: Add lh uam instruction
>   MIPS: uasm: Add mul uasm instruction
>   MIPS: net: Add BPF JIT
>   MIPS: Enable the BPF_JIT symbol for MIPS
>
>  arch/mips/Kbuild                  |    1 +
>  arch/mips/Kconfig                 |    1 +
>  arch/mips/include/asm/uasm.h      |   16 +
>  arch/mips/include/uapi/asm/inst.h |   17 +
>  arch/mips/mm/uasm-micromips.c     |   10 +
>  arch/mips/mm/uasm-mips.c          |   10 +
>  arch/mips/mm/uasm.c               |   39 +-
>  arch/mips/net/Makefile            |    3 +
>  arch/mips/net/bpf_jit.c           | 1327 +++++++++++++++++++++++++++++++++++++
>  arch/mips/net/bpf_jit.h           |   45 ++
>  10 files changed, 1462 insertions(+), 7 deletions(-)
>  create mode 100644 arch/mips/net/Makefile
>  create mode 100644 arch/mips/net/bpf_jit.c
>  create mode 100644 arch/mips/net/bpf_jit.h
>
> --
> 1.9.1
>
>



-- 
Florian

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 00/14] Initial BPF-JIT support for MIPS
  2014-04-08 18:38 ` [PATCH 00/14] Initial BPF-JIT support " Florian Fainelli
@ 2014-04-09  8:55   ` Markos Chandras
  2014-04-09 10:11     ` Jonas Gorski
  0 siblings, 1 reply; 53+ messages in thread
From: Markos Chandras @ 2014-04-09  8:55 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: Linux-MIPS

Hi Florian,

On 04/08/2014 07:38 PM, Florian Fainelli wrote:
> Hi,
>
> 2014-04-08 4:47 GMT-07:00 Markos Chandras <markos.chandras@imgtec.com>:
>> Hi,
>>
>> This adds support for BPF-JIT for MIPS. Tested on mips32 LE/BE and mips64 BE
>> with a few networking tools such as tcpdump and dhcp but not all opcodes have
>> been tested as far as I can tell. There are a few optimizations left to be made
>> (fastpath for load operations instead of calling the helper functions) but
>> these can be added later on. If someone has complex network setups in place and
>> would like to give it a try, that would be much appreciated.
>>
>> This patchset is for the upstream-sfr/mips-for-linux-next tree
>
> You should have probably CC'd netdev@vger.kernel.org to get their
> review on the specific JIT implementation.
>
> BPF_JIT is made conditional to MIPS32/64R2 processors, I could not
> spot easily in the implementation whether this is because you are
> using r2-only instructions, or this is just the targets you tested. Is
> there any chance to make that work on MIPS32r1 CPUs for instance?
> Those are used in low-end devices which could benefit from such a
> performance boost.
>
> Thanks!
>
>>

I used in fact only R2 devices. I also use R2 specific instructions such 
as "ins" and "wsbh". I haven't really considered supporting older ISAs 
for a couple of reasons. R2 has been around for a long time so a ~7y old 
device is probably R2 capable. Furthermore, I was not sure whether 
mips32/64R1 devices are likely to run the latest kernel (and/or use 
BPF-JIT at all). But I could easily be wrong.
Having said that, it's possible to support R1 devices but I'd like to 
avoid the overhead so a few #ifdefs are needed in the code. My personal 
preference is to support R2 in the initial patch, and add R1 support 
later on (along with the optimizations I have in mind).

-- 
markos

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 00/14] Initial BPF-JIT support for MIPS
  2014-04-09  8:55   ` Markos Chandras
@ 2014-04-09 10:11     ` Jonas Gorski
  2014-04-09 10:56         ` Markos Chandras
  0 siblings, 1 reply; 53+ messages in thread
From: Jonas Gorski @ 2014-04-09 10:11 UTC (permalink / raw)
  To: Markos Chandras; +Cc: Florian Fainelli, Linux-MIPS

On Wed, Apr 9, 2014 at 10:55 AM, Markos Chandras
<Markos.Chandras@imgtec.com> wrote:
> I used in fact only R2 devices. I also use R2 specific instructions such as
> "ins" and "wsbh". I haven't really considered supporting older ISAs for a
> couple of reasons. R2 has been around for a long time so a ~7y old device is
> probably R2 capable. Furthermore, I was not sure whether mips32/64R1 devices
> are likely to run the latest kernel (and/or use BPF-JIT at all). But I could
> easily be wrong.
> Having said that, it's possible to support R1 devices but I'd like to avoid
> the overhead so a few #ifdefs are needed in the code. My personal preference
> is to support R2 in the initial patch, and add R1 support later on (along
> with the optimizations I have in mind).

As far as I can tell on a first glance, you are only using "wsbh", and
only if the device is running little endian. There is code to emit
"ins", but it isn't evoked from anywhere.

The biggest supplier of linux-running R1 chips is probably Broadcom
with its xDSL/GPON SoCs; there are still 4k cores, even (afaik) the
newest ones introduced last year. And since they started introducing
ARM based variants, the I would assume the chances of them switching
to R2 or newer are nil. Luckily they are running in big endian mode,
so they should be able to run it even now.

That said, I don't see a problem in making it R2 first, then adding R1
support later.


Regards
Jonas

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 00/14] Initial BPF-JIT support for MIPS
@ 2014-04-09 10:56         ` Markos Chandras
  0 siblings, 0 replies; 53+ messages in thread
From: Markos Chandras @ 2014-04-09 10:56 UTC (permalink / raw)
  To: linux-mips

Hi Jonas,

On 04/09/2014 11:11 AM, Jonas Gorski wrote:
> On Wed, Apr 9, 2014 at 10:55 AM, Markos Chandras
> <Markos.Chandras@imgtec.com> wrote:
>> I used in fact only R2 devices. I also use R2 specific instructions such as
>> "ins" and "wsbh". I haven't really considered supporting older ISAs for a
>> couple of reasons. R2 has been around for a long time so a ~7y old device is
>> probably R2 capable. Furthermore, I was not sure whether mips32/64R1 devices
>> are likely to run the latest kernel (and/or use BPF-JIT at all). But I could
>> easily be wrong.
>> Having said that, it's possible to support R1 devices but I'd like to avoid
>> the overhead so a few #ifdefs are needed in the code. My personal preference
>> is to support R2 in the initial patch, and add R1 support later on (along
>> with the optimizations I have in mind).
>
> As far as I can tell on a first glance, you are only using "wsbh", and
> only if the device is running little endian. There is code to emit
> "ins", but it isn't evoked from anywhere.

You are right. Seems like a leftover while I was playing with different 
implementations. I will drop it. Regarding the wsbh, I assume a few ORs 
and shifts will emulate the wsbh instruction for R1. I will have a look 
and provide a v2 soon.

-- 
markos

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 00/14] Initial BPF-JIT support for MIPS
@ 2014-04-09 10:56         ` Markos Chandras
  0 siblings, 0 replies; 53+ messages in thread
From: Markos Chandras @ 2014-04-09 10:56 UTC (permalink / raw)
  To: linux-mips

Hi Jonas,

On 04/09/2014 11:11 AM, Jonas Gorski wrote:
> On Wed, Apr 9, 2014 at 10:55 AM, Markos Chandras
> <Markos.Chandras@imgtec.com> wrote:
>> I used in fact only R2 devices. I also use R2 specific instructions such as
>> "ins" and "wsbh". I haven't really considered supporting older ISAs for a
>> couple of reasons. R2 has been around for a long time so a ~7y old device is
>> probably R2 capable. Furthermore, I was not sure whether mips32/64R1 devices
>> are likely to run the latest kernel (and/or use BPF-JIT at all). But I could
>> easily be wrong.
>> Having said that, it's possible to support R1 devices but I'd like to avoid
>> the overhead so a few #ifdefs are needed in the code. My personal preference
>> is to support R2 in the initial patch, and add R1 support later on (along
>> with the optimizations I have in mind).
>
> As far as I can tell on a first glance, you are only using "wsbh", and
> only if the device is running little endian. There is code to emit
> "ins", but it isn't evoked from anywhere.

You are right. Seems like a leftover while I was playing with different 
implementations. I will drop it. Regarding the wsbh, I assume a few ORs 
and shifts will emulate the wsbh instruction for R1. I will have a look 
and provide a v2 soon.

-- 
markos

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v2 13/14] MIPS: net: Add BPF JIT
@ 2014-04-09 16:00     ` Markos Chandras
  0 siblings, 0 replies; 53+ messages in thread
From: Markos Chandras @ 2014-04-09 16:00 UTC (permalink / raw)
  To: linux-mips; +Cc: Markos Chandras, David S. Miller, netdev

This adds initial support for BPF-JIT on MIPS

Tested on mips32 LE/BE and mips64 BE/n64 using
dhcp, ping and various tcpdump filters.

Benchmarking:

Assuming the remote MIPS target uses 192.168.154.181
as its IP address, and the local host uses 192.168.154.136,
the following results can be obtained using the following
tcpdump filter (catches no frames) and a simple
'time ping -f -c 1000000' command.

[root@(none) ~]# tcpdump -p -n -s 0 -i eth0 net 10.0.0.0/24 -d
(000) ldh      [12]
(001) jeq      #0x800           jt 2	jf 8
(002) ld       [26]
(003) and      #0xffffff00
(004) jeq      #0xa000000       jt 16	jf 5
(005) ld       [30]
(006) and      #0xffffff00
(007) jeq      #0xa000000       jt 16	jf 17
(008) jeq      #0x806           jt 10	jf 9
(009) jeq      #0x8035          jt 10	jf 17
(010) ld       [28]
(011) and      #0xffffff00
(012) jeq      #0xa000000       jt 16	jf 13
(013) ld       [38]
(014) and      #0xffffff00
(015) jeq      #0xa000000       jt 16	jf 17
(016) ret      #65535

- BPF-JIT Disabled

real    1m38.005s
user    0m1.510s
sys     0m6.710s

- BPF-JIT Enabled

real    1m35.215s
user    0m1.200s
sys     0m4.140s

Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
---
Changes since v1:

- Add 'wsbh' implementation for MIPS32/64R1
- Rename 'rxhash' to 'hash' because of
61b905da33ae25edb6b9d2a5de21e34c3a77efe3
- Initialize 'fp->jited' because of
f8bbbfc3b97f4c7a6c7c23185e520b22bfc3a21d
---
 arch/mips/Kbuild        |    1 +
 arch/mips/net/Makefile  |    3 +
 arch/mips/net/bpf_jit.c | 1334 +++++++++++++++++++++++++++++++++++++++++++++++
 arch/mips/net/bpf_jit.h |   44 ++
 4 files changed, 1382 insertions(+)
 create mode 100644 arch/mips/net/Makefile
 create mode 100644 arch/mips/net/bpf_jit.c
 create mode 100644 arch/mips/net/bpf_jit.h

diff --git a/arch/mips/Kbuild b/arch/mips/Kbuild
index d2cfe45..9179336 100644
--- a/arch/mips/Kbuild
+++ b/arch/mips/Kbuild
@@ -17,6 +17,7 @@ obj- := $(platform-)
 obj-y += kernel/
 obj-y += mm/
 obj-y += math-emu/
+obj-y += net/
 
 ifdef CONFIG_KVM
 obj-y += kvm/
diff --git a/arch/mips/net/Makefile b/arch/mips/net/Makefile
new file mode 100644
index 0000000..ae74b3a
--- /dev/null
+++ b/arch/mips/net/Makefile
@@ -0,0 +1,3 @@
+# MIPS networking code
+
+obj-$(CONFIG_BPF_JIT) += bpf_jit.o
diff --git a/arch/mips/net/bpf_jit.c b/arch/mips/net/bpf_jit.c
new file mode 100644
index 0000000..a12f843
--- /dev/null
+++ b/arch/mips/net/bpf_jit.c
@@ -0,0 +1,1334 @@
+/*
+ * Just-In-Time compiler for BPF filters on MIPS
+ *
+ * Copyright (c) 2014 Imagination Technologies Ltd.
+ * Author: Markos Chandras <markos.chandras@imgtec.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; version 2 of the License.
+ */
+
+#include <linux/bitops.h>
+#include <linux/compiler.h>
+#include <linux/errno.h>
+#include <linux/filter.h>
+#include <linux/if_vlan.h>
+#include <linux/kconfig.h>
+#include <linux/moduleloader.h>
+#include <linux/netdevice.h>
+#include <linux/string.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+#include <asm/bitops.h>
+#include <asm/cacheflush.h>
+#include <asm/uasm.h>
+
+#include "bpf_jit.h"
+
+/* ABI
+ *
+ * s0	1st scratch register
+ * s1	2nd scratch register
+ * s2	offset register
+ * s3	BPF register A
+ * s4	BPF register X
+ * s5	*skb
+ * s6	*scratch memory
+ *
+ * On entry (*bpf_func)(*skb, *filter)
+ * a0 = MIPS_R_A0 = skb;
+ * a1 = MIPS_R_A1 = filter;
+ *
+ * Stack
+ * ...
+ * M[15]
+ * M[14]
+ * M[13]
+ * ...
+ * M[0] <-- r_M
+ * saved reg k-1
+ * saved reg k-2
+ * ...
+ * saved reg 0 <-- r_sp
+ * <no argument area>
+ *
+ *                     Packet layout
+ *
+ * <--------------------- len ------------------------>
+ * <--skb-len(r_skb_hl)-->< ----- skb->data_len ------>
+ * ----------------------------------------------------
+ * |                  skb->data                       |
+ * ----------------------------------------------------
+ */
+
+#define RSIZE	(sizeof(unsigned long))
+#define ptr typeof(unsigned long)
+
+/* ABI specific return values */
+#ifdef CONFIG_32BIT /* O32 */
+#ifdef CONFIG_CPU_LITTLE_ENDIAN
+#define r_err	MIPS_R_V1
+#define r_val	MIPS_R_V0
+#else /* CONFIG_CPU_LITTLE_ENDIAN */
+#define r_err	MIPS_R_V0
+#define r_val	MIPS_R_V1
+#endif
+#else /* N64 */
+#define r_err	MIPS_R_V0
+#define r_val	MIPS_R_V0
+#endif
+
+#define r_ret	MIPS_R_V0
+
+/*
+ * Use 2 scratch registers to avoid pipeline interlocks.
+ * There is no overhead during epilogue and prologue since
+ * any of the $s0-$s6 registers will only be preserved if
+ * they are going to actually be used.
+ */
+#define r_s0		MIPS_R_S0 /* scratch reg 1 */
+#define r_s1		MIPS_R_S1 /* scratch reg 2 */
+#define r_off		MIPS_R_S2
+#define r_A		MIPS_R_S3
+#define r_X		MIPS_R_S4
+#define r_skb		MIPS_R_S5
+#define r_M		MIPS_R_S6
+#define r_tmp_imm	MIPS_R_T6 /* No need to preserve this */
+#define r_tmp		MIPS_R_T7 /* No need to preserve this */
+#define r_zero		MIPS_R_ZERO
+#define r_sp		MIPS_R_SP
+#define r_ra		MIPS_R_RA
+
+#define SCRATCH_OFF(k)		(4 * (k))
+
+/* JIT flags */
+#define SEEN_CALL		(1 << BPF_MEMWORDS)
+#define SEEN_SREG_SFT		(BPF_MEMWORDS + 1)
+#define SEEN_SREG_BASE		(1 << SEEN_SREG_SFT)
+#define SEEN_SREG(x)		(SEEN_SREG_BASE << (x))
+#define SEEN_S0			SEEN_SREG(0)
+#define SEEN_S1			SEEN_SREG(1)
+#define SEEN_OFF		SEEN_SREG(2)
+#define SEEN_A			SEEN_SREG(3)
+#define SEEN_X			SEEN_SREG(4)
+#define SEEN_SKB		SEEN_SREG(5)
+#define SEEN_MEM		SEEN_SREG(6)
+
+/* Arguments used by JIT */
+#define ARGS_USED_BY_JIT	2 /* only applicable to 64-bit */
+
+#define FLAG_NEED_X_RESET	(1 << 0)
+
+#define SBIT(x)			(1 << (x)) /* Signed version of BIT() */
+
+/**
+ * struct jit_ctx - JIT context
+ * @skf:		The sk_filter
+ * @prologue_bytes:	Number of bytes for prologue
+ * @idx:		Instruction index
+ * @flags:		JIT flags
+ * @offsets:		Instruction offsets
+ * @target:		Memory location for the compiled filter
+ */
+struct jit_ctx {
+	const struct sk_filter *skf;
+	unsigned int prologue_bytes;
+	u32 idx;
+	u32 flags;
+	u32 *offsets;
+	u32 *target;
+};
+
+
+static inline int optimize_div(u32 *k)
+{
+	/* power of 2 divides can be implemented with right shift */
+	if (!(*k & (*k-1))) {
+		*k = ilog2(*k);
+		return 1;
+	}
+
+	return 0;
+}
+
+/* Simply emit the instruction if the JIT memory space has been allocated */
+#define emit_instr(ctx, func, ...)			\
+do {							\
+	if ((ctx)->target != NULL) {			\
+		u32 *p = &(ctx)->target[ctx->idx];	\
+		uasm_i_##func(&p, ##__VA_ARGS__);	\
+	}						\
+	(ctx)->idx++;					\
+} while (0)
+
+/* Determine if immediate is within the 16-bit signed range */
+static inline bool is_range16(s32 imm)
+{
+	if (imm >= SBIT(15) || imm < -SBIT(15))
+		return true;
+	return false;
+}
+
+static inline void emit_addu(unsigned int dst, unsigned int src1,
+			     unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, addu, dst, src1, src2);
+}
+
+static inline void emit_nop(struct jit_ctx *ctx)
+{
+	emit_instr(ctx, nop);
+}
+
+/* Load a u32 immediate to a register */
+static inline void emit_load_imm(unsigned int dst, u32 imm, struct jit_ctx *ctx)
+{
+	if (ctx->target != NULL) {
+		/* addiu can only handle s16 */
+		if (is_range16(imm)) {
+			u32 *p = &ctx->target[ctx->idx];
+			uasm_i_lui(&p, r_tmp_imm, (s32)imm >> 16);
+			p = &ctx->target[ctx->idx + 1];
+			uasm_i_ori(&p, dst, r_tmp_imm, imm & 0xffff);
+		} else {
+			u32 *p = &ctx->target[ctx->idx];
+			uasm_i_addiu(&p, dst, r_zero, imm);
+		}
+	}
+	ctx->idx++;
+
+	if (is_range16(imm))
+		ctx->idx++;
+}
+
+static inline void emit_or(unsigned int dst, unsigned int src1,
+			   unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, or, dst, src1, src2);
+}
+
+static inline void emit_ori(unsigned int dst, unsigned src, u32 imm,
+			    struct jit_ctx *ctx)
+{
+	if (imm >= BIT(16)) {
+		emit_load_imm(r_tmp, imm, ctx);
+		emit_or(dst, src, r_tmp, ctx);
+	} else {
+		emit_instr(ctx, ori, dst, src, imm);
+	}
+
+}
+
+
+static inline void emit_daddu(unsigned int dst, unsigned int src1,
+			      unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, daddu, dst, src1, src2);
+}
+
+static inline void emit_daddiu(unsigned int dst, unsigned int src,
+			       int imm, struct jit_ctx *ctx)
+{
+	/*
+	 * Only used for stack, so the imm is relatively small
+	 * and it fits in 15-bits
+	 */
+	emit_instr(ctx, daddiu, dst, src, imm);
+}
+
+static inline void emit_addiu(unsigned int dst, unsigned int src,
+			      u32 imm, struct jit_ctx *ctx)
+{
+	if (is_range16(imm)) {
+		emit_load_imm(r_tmp, imm, ctx);
+		emit_addu(dst, r_tmp, src, ctx);
+	} else {
+		emit_instr(ctx, addiu, dst, src, imm);
+	}
+}
+
+static inline void emit_and(unsigned int dst, unsigned int src1,
+			    unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, and, dst, src1, src2);
+}
+
+static inline void emit_andi(unsigned int dst, unsigned int src,
+			     u32 imm, struct jit_ctx *ctx)
+{
+	/* If imm does not fit in u16 then load it to register */
+	if (imm >= BIT(16)) {
+		emit_load_imm(r_tmp, imm, ctx);
+		emit_and(dst, src, r_tmp, ctx);
+	} else {
+		emit_instr(ctx, andi, dst, src, imm);
+	}
+}
+
+static inline void emit_xor(unsigned int dst, unsigned int src1,
+			    unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, xor, dst, src1, src2);
+}
+
+static inline void emit_xori(ptr dst, ptr src, u32 imm, struct jit_ctx *ctx)
+{
+	/* If imm does not fit in u16 then load it to register */
+	if (imm >= BIT(16)) {
+		emit_load_imm(r_tmp, imm, ctx);
+		emit_xor(dst, src, r_tmp, ctx);
+	} else {
+		emit_instr(ctx, xori, dst, src, imm);
+	}
+}
+
+static inline void emit_stack_offset(int offset, struct jit_ctx *ctx)
+{
+	if (config_enabled(CONFIG_64BIT))
+		emit_instr(ctx, daddiu, r_sp, r_sp, offset);
+	else
+		emit_instr(ctx, addiu, r_sp, r_sp, offset);
+
+}
+
+static inline void emit_subu(unsigned int dst, unsigned int src1,
+			     unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, subu, dst, src1, src2);
+}
+
+static inline void emit_neg(unsigned int reg, struct jit_ctx *ctx)
+{
+	emit_subu(reg, r_zero, reg, ctx);
+}
+
+static inline void emit_sllv(unsigned int dst, unsigned int src,
+			     unsigned int sa, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, sllv, dst, src, sa);
+}
+
+static inline void emit_sll(unsigned int dst, unsigned int src,
+			    unsigned int sa, struct jit_ctx *ctx)
+{
+	/* sa is 5-bits long */
+	BUG_ON(sa >= BIT(5));
+	emit_instr(ctx, sll, dst, src, sa);
+}
+
+static inline void emit_srlv(unsigned int dst, unsigned int src,
+			     unsigned int sa, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, srlv, dst, src, sa);
+}
+
+static inline void emit_srl(unsigned int dst, unsigned int src,
+			    unsigned int sa, struct jit_ctx *ctx)
+{
+	/* sa is 5-bits long */
+	BUG_ON(sa >= BIT(5));
+	emit_instr(ctx, srl, dst, src, sa);
+}
+
+static inline void emit_sltu(unsigned int dst, unsigned int src1,
+			     unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, sltu, dst, src1, src2);
+}
+
+static inline void emit_sltiu(unsigned dst, unsigned int src,
+			      unsigned int imm, struct jit_ctx *ctx)
+{
+	/* 16 bit immediate */
+	if (is_range16((s32)imm)) {
+		emit_load_imm(r_tmp, imm, ctx);
+		emit_sltu(dst, src, r_tmp, ctx);
+	} else {
+		emit_instr(ctx, sltiu, dst, src, imm);
+	}
+
+}
+
+/* Store register on the stack */
+static inline void emit_store_stack_reg(ptr reg, ptr base,
+					unsigned int offset,
+					struct jit_ctx *ctx)
+{
+	if (config_enabled(CONFIG_64BIT))
+		emit_instr(ctx, sd, reg, offset, base);
+	else
+		emit_instr(ctx, sw, reg, offset, base);
+}
+
+static inline void emit_store(ptr reg, ptr base, unsigned int offset,
+			      struct jit_ctx *ctx)
+{
+	emit_instr(ctx, sw, reg, offset, base);
+}
+
+static inline void emit_load_stack_reg(ptr reg, ptr base,
+				       unsigned int offset,
+				       struct jit_ctx *ctx)
+{
+	if (config_enabled(CONFIG_64BIT))
+		emit_instr(ctx, ld, reg, offset, base);
+	else
+		emit_instr(ctx, lw, reg, offset, base);
+
+}
+
+static inline void emit_load(unsigned int reg, unsigned int base,
+			     unsigned int offset, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, lw, reg, offset, base);
+}
+
+static inline void emit_half_load(unsigned int reg, unsigned int base,
+				  unsigned int offset, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, lh, reg, offset, base);
+}
+
+static inline void emit_mul(unsigned int dst, unsigned int src1,
+			    unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, mul, dst, src1, src2);
+}
+
+static inline void emit_div(unsigned int dst, unsigned int src,
+			    struct jit_ctx *ctx)
+{
+	if (ctx->target != NULL) {
+		u32 *p = &ctx->target[ctx->idx];
+		uasm_i_divu(&p, dst, src);
+		p = &ctx->target[ctx->idx + 1];
+		uasm_i_mfhi(&p, dst);
+	}
+	ctx->idx += 2; /* 2 insts */
+}
+
+static inline void emit_dsll(unsigned int dst, unsigned int src,
+			     unsigned int sa, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, dsll, dst, src, sa);
+}
+
+static inline void emit_dsrl32(unsigned int dst, unsigned int src,
+			       unsigned int sa, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, dsrl32, dst, src, sa);
+}
+
+static inline void emit_wsbh(unsigned int dst, unsigned int src,
+			     struct jit_ctx *ctx)
+{
+	emit_instr(ctx, wsbh, dst, src);
+}
+
+/* load a function pointer to register */
+static inline void emit_load_func(unsigned int reg, ptr imm,
+				  struct jit_ctx *ctx)
+{
+	if (config_enabled(CONFIG_64BIT)) {
+		/* At this point imm is always 64-bit */
+		emit_load_imm(r_tmp, (u64)imm >> 32, ctx);
+		emit_dsll(r_tmp_imm, r_tmp, 16, ctx); /* left shift by 16 */
+		emit_ori(r_tmp, r_tmp_imm, (imm >> 16) & 0xffff, ctx);
+		emit_dsll(r_tmp_imm, r_tmp, 16, ctx); /* left shift by 16 */
+		emit_ori(reg, r_tmp_imm, imm & 0xffff, ctx);
+	} else {
+		emit_load_imm(reg, imm, ctx);
+	}
+}
+
+/* Move to real MIPS register */
+static inline void emit_reg_move(ptr dst, ptr src, struct jit_ctx *ctx)
+{
+	if (config_enabled(CONFIG_64BIT))
+		emit_daddu(dst, src, r_zero, ctx);
+	else
+		emit_addu(dst, src, r_zero, ctx);
+}
+
+/* Move to JIT (32-bit) register */
+static inline void emit_jit_reg_move(ptr dst, ptr src, struct jit_ctx *ctx)
+{
+	emit_addu(dst, src, r_zero, ctx);
+}
+
+/* Compute the immediate value for PC-relative branches. */
+static inline u32 b_imm(unsigned int tgt, struct jit_ctx *ctx)
+{
+	if (ctx->target == NULL)
+		return 0;
+
+	/*
+	 * We want a pc-relative branch. We only do forward branches
+	 * so tgt is always after pc. tgt is the instruction offset
+	 * we want to jump to.
+
+	 * Branch on MIPS:
+	 * I: target_offset <- sign_extend(offset)
+	 * I+1: PC += target_offset (delay slot)
+	 *
+	 * ctx->idx currently points to the branch instruction
+	 * but the offset is added to the delay slot so we need
+	 * to subtract 4.
+	 */
+	return ctx->offsets[tgt] -
+		(ctx->idx * 4 - ctx->prologue_bytes) - 4;
+}
+
+static inline void emit_bcond(int cond, unsigned int reg1, unsigned int reg2,
+			     unsigned int imm, struct jit_ctx *ctx)
+{
+	if (ctx->target != NULL) {
+		u32 *p = &ctx->target[ctx->idx];
+
+		switch (cond) {
+		case MIPS_COND_EQ:
+			uasm_i_beq(&p, reg1, reg2, imm);
+			break;
+		case MIPS_COND_NE:
+			uasm_i_bne(&p, reg1, reg2, imm);
+			break;
+		case MIPS_COND_ALL:
+			uasm_i_b(&p, imm);
+			break;
+		default:
+			pr_warn("%s: Unhandled branch conditional: %d\n",
+				__func__, cond);
+		}
+	}
+	ctx->idx++;
+}
+
+static inline void emit_b(unsigned int imm, struct jit_ctx *ctx)
+{
+	emit_bcond(MIPS_COND_ALL, r_zero, r_zero, imm, ctx);
+}
+
+static inline void emit_jalr(unsigned int link, unsigned int reg,
+			     struct jit_ctx *ctx)
+{
+	emit_instr(ctx, jalr, link, reg);
+}
+
+static inline void emit_jr(unsigned int reg, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, jr, reg);
+}
+
+static inline u16 align_sp(unsigned int num)
+{
+	/* Double word alignment for 32-bit, quadword for 64-bit */
+	unsigned int align = config_enabled(CONFIG_64BIT) ? 16 : 8;
+	num = (num + (align - 1)) & -align;
+	return num;
+}
+
+static inline void update_on_xread(struct jit_ctx *ctx)
+{
+	if (!(ctx->flags & SEEN_X))
+		ctx->flags |= FLAG_NEED_X_RESET;
+
+	ctx->flags |= SEEN_X;
+}
+
+static bool is_load_to_a(u16 inst)
+{
+	switch (inst) {
+	case BPF_S_LD_W_LEN:
+	case BPF_S_LD_W_ABS:
+	case BPF_S_LD_H_ABS:
+	case BPF_S_LD_B_ABS:
+	case BPF_S_ANC_CPU:
+	case BPF_S_ANC_IFINDEX:
+	case BPF_S_ANC_MARK:
+	case BPF_S_ANC_PROTOCOL:
+	case BPF_S_ANC_RXHASH:
+	case BPF_S_ANC_VLAN_TAG:
+	case BPF_S_ANC_VLAN_TAG_PRESENT:
+	case BPF_S_ANC_QUEUE:
+		return true;
+	default:
+		return false;
+	}
+}
+
+static void save_bpf_jit_regs(struct jit_ctx *ctx, unsigned offset)
+{
+	int i = 0, real_off = 0;
+	u32 sflags, tmp_flags;
+
+	/* Adjust the stack pointer */
+	emit_stack_offset(-align_sp(offset), ctx);
+
+	if (ctx->flags & SEEN_CALL) {
+		/* Argument save area */
+		if (config_enabled(CONFIG_64BIT))
+			/* Bottom of current frame */
+			real_off = align_sp(offset) - RSIZE;
+		else
+			/* Top of previous frame */
+			real_off = align_sp(offset) + RSIZE;
+		emit_store_stack_reg(MIPS_R_A0, r_sp, real_off, ctx);
+		emit_store_stack_reg(MIPS_R_A1, r_sp, real_off + RSIZE, ctx);
+
+		real_off = 0;
+	}
+
+	tmp_flags = sflags = ctx->flags >> SEEN_SREG_SFT;
+	/* sflags is essentially a bitmap */
+	pr_debug("%s: register flags: 0x%08x\n", __func__, tmp_flags);
+	while (tmp_flags) {
+		if ((sflags >> i) & 0x1) {
+			pr_debug("%s: preserving register %d\n", __func__,
+				 MIPS_R_S0 + i);
+			emit_store_stack_reg(MIPS_R_S0 + i, r_sp, real_off,
+					     ctx);
+			real_off += RSIZE;
+		}
+		i++;
+		tmp_flags >>= 1;
+	}
+
+	/* save return address */
+	if (ctx->flags & SEEN_CALL) {
+		emit_store_stack_reg(r_ra, r_sp, real_off, ctx);
+		real_off += RSIZE;
+	}
+
+	/* Setup r_M leaving the alignment gap if necessary */
+	if (ctx->flags & SEEN_MEM) {
+		if (real_off % (RSIZE * 2))
+			real_off += RSIZE;
+		emit_addiu(r_M, r_sp, real_off, ctx);
+	}
+
+}
+
+static void restore_bpf_jit_regs(struct jit_ctx *ctx,
+				 unsigned int offset)
+{
+	int i, real_off = 0;
+	u32 sflags, tmp_flags;
+
+	if (ctx->flags & SEEN_CALL) {
+		if (config_enabled(CONFIG_64BIT))
+			/* Bottom of current frame */
+			real_off = align_sp(offset) - RSIZE;
+		else
+			/* Top of previous frame */
+			real_off = align_sp(offset) + RSIZE;
+		emit_load_stack_reg(MIPS_R_A0, r_sp, real_off, ctx);
+		emit_load_stack_reg(MIPS_R_A1, r_sp, real_off + RSIZE, ctx);
+
+		real_off = 0;
+	}
+
+	tmp_flags = sflags = ctx->flags >> SEEN_SREG_SFT;
+	/* sflags is a bitmap */
+	pr_debug("%s: register flags: 0x%08x\n", __func__, tmp_flags);
+	i = 0;
+	while (tmp_flags) {
+		if ((sflags >> i) & 0x1) {
+			pr_debug("%s: restoring register %d\n", __func__,
+				 MIPS_R_S0 + i);
+			emit_load_stack_reg(MIPS_R_S0 + i, r_sp, real_off,
+					    ctx);
+			real_off += RSIZE;
+		}
+		i++;
+		tmp_flags >>= 1;
+	}
+
+	/* restore return address */
+	if (ctx->flags & SEEN_CALL)
+		emit_load_stack_reg(r_ra, r_sp, real_off, ctx);
+
+	/* Restore the sp and discard the scrach memory */
+	emit_stack_offset(align_sp(offset), ctx);
+
+}
+
+static unsigned int get_stack_depth(struct jit_ctx *ctx)
+{
+	int sp_off = 0;
+
+
+	/* How may s* regs do we need to preserved? */
+	sp_off += hweight32(ctx->flags >> SEEN_SREG_SFT) * RSIZE;
+
+	if (ctx->flags & SEEN_MEM)
+		sp_off += 4 * BPF_MEMWORDS; /* BPF_MEMWORDS are 32-bit */
+
+	if (ctx->flags & SEEN_CALL)
+		/*
+		 * The JIT code make calls to external functions using 2
+		 * arguments. Therefore, for o32 we don't need to allocate
+		 * space because we don't care if the argumetns are lost
+		 * across calls. We do need however to preserve incoming
+		 * arguments but the space is already allocated for us by
+		 * the caller. On the other hand, for n64, we need to allocate
+		 * this space ourselves. We need to preserve $ra as well.
+		 */
+		sp_off += config_enabled(CONFIG_64BIT) ?
+			(ARGS_USED_BY_JIT + 1) * RSIZE : RSIZE;
+
+	/*
+	 * Subtract the bytes for the last registers since we only care about
+	 * the location on the stack pointer.
+	 */
+	return sp_off - RSIZE;
+}
+
+static void build_prologue(struct jit_ctx *ctx)
+{
+	u16 first_inst = ctx->skf->insns[0].code;
+	int sp_off;
+
+	/* Calculate the total offset for the stack pointer */
+	sp_off = get_stack_depth(ctx);
+	save_bpf_jit_regs(ctx, sp_off);
+
+	if (ctx->flags & SEEN_SKB)
+		emit_reg_move(r_skb, MIPS_R_A0, ctx);
+
+	if (ctx->flags & FLAG_NEED_X_RESET)
+		emit_jit_reg_move(r_X, r_zero, ctx);
+
+	/* Do not leak kernel data to userspace */
+	if ((first_inst != BPF_S_RET_K) && !(is_load_to_a(first_inst)))
+		emit_jit_reg_move(r_A, r_zero, ctx);
+
+}
+
+static void build_epilogue(struct jit_ctx *ctx)
+{
+	unsigned int sp_off;
+
+	/* Calculate the total offset for the stack pointer */
+
+	sp_off = get_stack_depth(ctx);
+	restore_bpf_jit_regs(ctx, sp_off);
+
+	/* Return */
+	emit_jr(r_ra, ctx);
+	emit_nop(ctx);
+}
+
+static u64 jit_get_skb_b(struct sk_buff *skb, unsigned offset)
+{
+	u8 ret;
+	int err;
+
+	err = skb_copy_bits(skb, offset, &ret, 1);
+
+	return (u64)err << 32 | ret;
+}
+
+static u64 jit_get_skb_h(struct sk_buff *skb, unsigned offset)
+{
+	u16 ret;
+	int err;
+
+	err = skb_copy_bits(skb, offset, &ret, 2);
+
+	return (u64)err << 32 | ntohs(ret);
+}
+
+static u64 jit_get_skb_w(struct sk_buff *skb, unsigned offset)
+{
+	u32 ret;
+	int err;
+
+	err = skb_copy_bits(skb, offset, &ret, 4);
+
+	return (u64)err << 32 | ntohl(ret);
+}
+
+static int build_body(struct jit_ctx *ctx)
+{
+	void *load_func[] = {jit_get_skb_b, jit_get_skb_h, jit_get_skb_w};
+	const struct sk_filter *prog = ctx->skf;
+	const struct sock_filter *inst;
+	unsigned int i, off, load_order, condt;
+	u32 k, b_off __maybe_unused;
+
+	for (i = 0; i < prog->len; i++) {
+		inst = &(prog->insns[i]);
+		pr_debug("%s: code->0x%02x, jt->0x%x, jf->0x%x, k->0x%x\n",
+			 __func__, inst->code, inst->jt, inst->jf, inst->k);
+		k = inst->k;
+
+		if (ctx->target == NULL)
+			ctx->offsets[i] = ctx->idx * 4;
+
+		switch (inst->code) {
+		case BPF_S_LD_IMM:
+			/* A <- k ==> li r_A, k */
+			ctx->flags |= SEEN_A;
+			emit_load_imm(r_A, k, ctx);
+			break;
+		case BPF_S_LD_W_LEN:
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, len) != 4);
+			/* A <- len ==> lw r_A, offset(skb) */
+			ctx->flags |= SEEN_SKB | SEEN_A;
+			off = offsetof(struct sk_buff, len);
+			emit_load(r_A, r_skb, off, ctx);
+			break;
+		case BPF_S_LD_MEM:
+			/* A <- M[k] ==> lw r_A, offset(M) */
+			ctx->flags |= SEEN_MEM | SEEN_A;
+			emit_load(r_A, r_M, SCRATCH_OFF(k), ctx);
+			break;
+		case BPF_S_LD_W_ABS:
+			/* A <- P[k:4] */
+			load_order = 2;
+			goto load;
+		case BPF_S_LD_H_ABS:
+			/* A <- P[k:2] */
+			load_order = 1;
+			goto load;
+		case BPF_S_LD_B_ABS:
+			/* A <- P[k:1] */
+			load_order = 0;
+load:
+			emit_load_imm(r_off, k, ctx);
+load_common:
+			ctx->flags |= SEEN_CALL | SEEN_OFF | SEEN_S0 |
+				SEEN_SKB | SEEN_A;
+
+			emit_load_func(r_s0, (ptr)load_func[load_order],
+				      ctx);
+			emit_reg_move(MIPS_R_A0, r_skb, ctx);
+			emit_jalr(MIPS_R_RA, r_s0, ctx);
+			/* Load second argument to delay slot */
+			emit_reg_move(MIPS_R_A1, r_off, ctx);
+			/* Check the error value */
+			if (config_enabled(CONFIG_64BIT)) {
+				/* Get error code from the top 32-bits */
+				emit_dsrl32(r_s0, r_val, 0, ctx);
+				/* Branch to 3 instructions ahead */
+				emit_bcond(MIPS_COND_NE, r_s0, r_zero, 3 << 2,
+					   ctx);
+			} else {
+				/* Branch to 3 instructions ahead */
+				emit_bcond(MIPS_COND_NE, r_err, r_zero, 3 << 2,
+					   ctx);
+			}
+			emit_nop(ctx);
+			/* We are good */
+			emit_b(b_imm(i + 1, ctx), ctx);
+			emit_jit_reg_move(r_A, r_val, ctx);
+			/* Return with error */
+			emit_b(b_imm(prog->len, ctx), ctx);
+			emit_reg_move(r_ret, r_zero, ctx);
+			break;
+		case BPF_S_LD_W_IND:
+			/* A <- P[X + k:4] */
+			load_order = 2;
+			goto load_ind;
+		case BPF_S_LD_H_IND:
+			/* A <- P[X + k:2] */
+			load_order = 1;
+			goto load_ind;
+		case BPF_S_LD_B_IND:
+			/* A <- P[X + k:1] */
+			load_order = 0;
+load_ind:
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_OFF | SEEN_X;
+			emit_addiu(r_off, r_X, k, ctx);
+		goto load_common;
+		case BPF_S_LDX_IMM:
+			/* X <- k */
+			ctx->flags |= SEEN_X;
+			emit_load_imm(r_X, k, ctx);
+			break;
+		case BPF_S_LDX_MEM:
+			/* X <- M[k] */
+			ctx->flags |= SEEN_X | SEEN_MEM;
+			emit_load(r_X, r_M, SCRATCH_OFF(k), ctx);
+			break;
+		case BPF_S_LDX_W_LEN:
+			/* X <- len */
+			ctx->flags |= SEEN_X | SEEN_SKB;
+			off = offsetof(struct sk_buff, len);
+			emit_load(r_X, r_skb, off, ctx);
+			break;
+		case BPF_S_LDX_B_MSH:
+			/* X <- 4 * (P[k:1] & 0xf) */
+			ctx->flags |= SEEN_X | SEEN_CALL | SEEN_S0 | SEEN_SKB;
+			/* Load offset to a1 */
+			emit_load_func(r_s0, (ptr)jit_get_skb_b, ctx);
+			/*
+			 * This may emit two instructions so it may not fit
+			 * in the delay slot. So use a0 in the delay slot.
+			 */
+			emit_load_imm(MIPS_R_A1, k, ctx);
+			emit_jalr(MIPS_R_RA, r_s0, ctx);
+			emit_reg_move(MIPS_R_A0, r_skb, ctx); /* delay slot */
+			/* Check the error value */
+			if (config_enabled(CONFIG_64BIT)) {
+				/* Top 32-bits of $v0 on 64-bit */
+				emit_dsrl32(r_s0, r_val, 0, ctx);
+				emit_bcond(MIPS_COND_NE, r_s0, r_zero,
+					   3 << 2, ctx);
+			} else {
+				emit_bcond(MIPS_COND_NE, r_err, r_zero,
+					   3 << 2, ctx);
+			}
+			/* No need for delay slot */
+			/* We are good */
+			/* X <- P[1:K] & 0xf */
+			emit_andi(r_X, r_val, 0xf, ctx);
+			/* X << 2 */
+			emit_b(b_imm(i + 1, ctx), ctx);
+			emit_sll(r_X, r_X, 2, ctx); /* delay slot */
+			/* Return with error */
+			emit_b(b_imm(prog->len, ctx), ctx);
+			emit_load_imm(r_ret, 0, ctx); /* delay slot */
+			break;
+		case BPF_S_ST:
+			/* M[k] <- A */
+			ctx->flags |= SEEN_MEM | SEEN_A;
+			emit_store(r_A, r_M, SCRATCH_OFF(k), ctx);
+			break;
+		case BPF_S_STX:
+			/* M[k] <- X */
+			ctx->flags |= SEEN_MEM | SEEN_X;
+			emit_store(r_X, r_M, SCRATCH_OFF(k), ctx);
+			break;
+		case BPF_S_ALU_ADD_K:
+			/* A += K */
+			ctx->flags |= SEEN_A;
+			emit_addiu(r_A, r_A, k, ctx);
+			break;
+		case BPF_S_ALU_SUB_K:
+			/* A -= K */
+			ctx->flags |= SEEN_A;
+			emit_addiu(r_A, r_A, -k, ctx);
+			break;
+		case BPF_S_ALU_MUL_K:
+			/* A *= K */
+			/* Load K to scratch register before MUL */
+			ctx->flags |= SEEN_A | SEEN_S0;
+			emit_load_imm(r_s0, k, ctx);
+			emit_mul(r_A, r_A, r_s0, ctx);
+			break;
+		case BPF_S_ALU_MUL_X:
+			/* A *= X */
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_A | SEEN_X;
+			emit_mul(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_DIV_K:
+			/* A /= k */
+			if (k == 1)
+				break;
+			if (optimize_div(&k)) {
+				ctx->flags |= SEEN_A;
+				emit_srl(r_A, r_A, k, ctx);
+				break;
+			}
+			ctx->flags |= SEEN_A | SEEN_S0;
+			emit_load_imm(r_s0, k, ctx);
+			emit_div(r_A, r_s0, ctx);
+			break;
+		case BPF_S_ALU_DIV_X:
+			/* A /= X */
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_X | SEEN_A;
+			/* Check if r_X is zero */
+			emit_bcond(MIPS_COND_EQ, r_X, r_zero,
+				   b_imm(prog->len, ctx), ctx);
+			emit_load_imm(r_val, 0, ctx); /* delay slot */
+			emit_div(r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_OR_K:
+			/* A |= K */
+			ctx->flags |= SEEN_A;
+			emit_ori(r_A, r_A, k, ctx);
+			break;
+		case BPF_S_ALU_OR_X:
+			/* A |= X */
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_A;
+			emit_ori(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_XOR_K:
+			/* A ^= k */
+			ctx->flags |= SEEN_A;
+			emit_xori(r_A, r_A, k, ctx);
+			break;
+		case BPF_S_ANC_ALU_XOR_X:
+		case BPF_S_ALU_XOR_X:
+			/* A ^= X */
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_A;
+			emit_xor(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_AND_K:
+			/* A &= K */
+			ctx->flags |= SEEN_A;
+			emit_andi(r_A, r_A, k, ctx);
+			break;
+		case BPF_S_ALU_AND_X:
+			/* A &= X */
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_A | SEEN_X;
+			emit_and(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_LSH_K:
+			/* A <<= K */
+			ctx->flags |= SEEN_A;
+			emit_sll(r_A, r_A, k, ctx);
+			break;
+		case BPF_S_ALU_LSH_X:
+			/* A <<= X */
+			ctx->flags |= SEEN_A | SEEN_X;
+			update_on_xread(ctx);
+			emit_sllv(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_RSH_K:
+			/* A >>= K */
+			ctx->flags |= SEEN_A;
+			emit_srl(r_A, r_A, k, ctx);
+			break;
+		case BPF_S_ALU_RSH_X:
+			ctx->flags |= SEEN_A | SEEN_X;
+			update_on_xread(ctx);
+			emit_srlv(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_NEG:
+			/* A = -A */
+			ctx->flags |= SEEN_A;
+			emit_neg(r_A, ctx);
+			break;
+		case BPF_S_JMP_JA:
+			/* pc += K */
+			emit_b(b_imm(i + k + 1, ctx), ctx);
+			emit_nop(ctx);
+			break;
+		case BPF_S_JMP_JEQ_K:
+			/* pc += ( A == K ) ? pc->jt : pc->jf */
+			condt = MIPS_COND_EQ | MIPS_COND_K;
+			goto jmp_cmp;
+		case BPF_S_JMP_JEQ_X:
+			ctx->flags |= SEEN_X;
+			/* pc += ( A == X ) ? pc->jt : pc->jf */
+			condt = MIPS_COND_EQ | MIPS_COND_X;
+			goto jmp_cmp;
+		case BPF_S_JMP_JGE_K:
+			/* pc += ( A >= K ) ? pc->jt : pc->jf */
+			condt = MIPS_COND_GE | MIPS_COND_K;
+			goto jmp_cmp;
+		case BPF_S_JMP_JGE_X:
+			ctx->flags |= SEEN_X;
+			/* pc += ( A >= X ) ? pc->jt : pc->jf */
+			condt = MIPS_COND_GE | MIPS_COND_X;
+			goto jmp_cmp;
+		case BPF_S_JMP_JGT_K:
+			/* pc += ( A > K ) ? pc->jt : pc->jf */
+			condt = MIPS_COND_GT | MIPS_COND_K;
+			goto jmp_cmp;
+		case BPF_S_JMP_JGT_X:
+			ctx->flags |= SEEN_X;
+			/* pc += ( A > X ) ? pc->jt : pc->jf */
+			condt = MIPS_COND_GT | MIPS_COND_X;
+jmp_cmp:
+			/* Greater or Equal */
+			if ((condt & MIPS_COND_GE) ||
+			    (condt & MIPS_COND_GT)) {
+				if (condt & MIPS_COND_K) { /* K */
+					ctx->flags |= SEEN_S0 | SEEN_A;
+					emit_sltiu(r_s0, r_A, k, ctx);
+				} else { /* X */
+					ctx->flags |= SEEN_S0 | SEEN_A |
+						SEEN_X;
+					emit_sltu(r_s0, r_A, r_X, ctx);
+				}
+				/* A < (K|X) ? r_scrach = 1 */
+				b_off = b_imm(i + inst->jf + 1, ctx);
+				emit_bcond(MIPS_COND_GT, r_s0, r_zero, b_off,
+					   ctx);
+				emit_nop(ctx);
+				/* A > (K|X) ? scratch = 0 */
+				if (condt & MIPS_COND_GT) {
+					/* Checking for equality */
+					ctx->flags |= SEEN_S0 | SEEN_A | SEEN_X;
+					if (condt & MIPS_COND_K)
+						emit_load_imm(r_s0, k, ctx);
+					else
+						emit_jit_reg_move(r_s0, r_X,
+								  ctx);
+					b_off = b_imm(i + inst->jf + 1, ctx);
+					emit_bcond(MIPS_COND_EQ, r_A, r_s0,
+						   b_off, ctx);
+					emit_nop(ctx);
+					/* Finally, A > K|X */
+					b_off = b_imm(i + inst->jt + 1, ctx);
+					emit_b(b_off, ctx);
+					emit_nop(ctx);
+				} else {
+					/* A >= (K|X) so jump */
+					b_off = b_imm(i + inst->jt + 1, ctx);
+					emit_b(b_off, ctx);
+					emit_nop(ctx);
+				}
+			} else {
+				/* A == K|X */
+				if (condt & MIPS_COND_K) { /* K */
+					ctx->flags |= SEEN_S0 | SEEN_A;
+					emit_load_imm(r_s0, k, ctx);
+					/* jump true */
+					b_off = b_imm(i + inst->jt + 1, ctx);
+					emit_bcond(MIPS_COND_EQ, r_A, r_s0,
+						   b_off, ctx);
+					emit_nop(ctx);
+					/* jump false */
+					b_off = b_imm(i + inst->jf + 1,
+						      ctx);
+					emit_bcond(MIPS_COND_NE, r_A, r_s0,
+						   b_off, ctx);
+					emit_nop(ctx);
+				} else { /* X */
+					/* jump true */
+					ctx->flags |= SEEN_A | SEEN_X;
+					b_off = b_imm(i + inst->jt + 1,
+						      ctx);
+					emit_bcond(MIPS_COND_EQ, r_A, r_X,
+						   b_off, ctx);
+					emit_nop(ctx);
+					/* jump false */
+					b_off = b_imm(i + inst->jf + 1, ctx);
+					emit_bcond(MIPS_COND_NE, r_A, r_X,
+						   b_off, ctx);
+					emit_nop(ctx);
+				}
+			}
+			break;
+		case BPF_S_JMP_JSET_K:
+			ctx->flags |= SEEN_S0 | SEEN_S1 | SEEN_A;
+			/* pc += (A & K) ? pc -> jt : pc -> jf */
+			emit_load_imm(r_s1, k, ctx);
+			emit_and(r_s0, r_A, r_s1, ctx);
+			/* jump true */
+			b_off = b_imm(i + inst->jt + 1, ctx);
+			emit_bcond(MIPS_COND_NE, r_s0, r_zero, b_off, ctx);
+			emit_nop(ctx);
+			/* jump false */
+			b_off = b_imm(i + inst->jf + 1, ctx);
+			emit_b(b_off, ctx);
+			emit_nop(ctx);
+			break;
+		case BPF_S_JMP_JSET_X:
+			ctx->flags |= SEEN_S0 | SEEN_X | SEEN_A;
+			/* pc += (A & X) ? pc -> jt : pc -> jf */
+			emit_and(r_s0, r_A, r_X, ctx);
+			/* jump true */
+			b_off = b_imm(i + inst->jt + 1, ctx);
+			emit_bcond(MIPS_COND_NE, r_s0, r_zero, b_off, ctx);
+			emit_nop(ctx);
+			/* jump false */
+			b_off = b_imm(i + inst->jf + 1, ctx);
+			emit_b(b_off, ctx);
+			emit_nop(ctx);
+			break;
+		case BPF_S_RET_A:
+			ctx->flags |= SEEN_A;
+			if (i != prog->len - 1)
+				/*
+				 * If this is not the last instruction
+				 * then jump to the epilogue
+				 */
+				emit_b(b_imm(prog->len, ctx), ctx);
+			emit_reg_move(r_ret, r_A, ctx); /* delay slot */
+			break;
+		case BPF_S_RET_K:
+			/*
+			 * It can emit two instructions so it does not fit on
+			 * the delay slot.
+			 */
+			emit_load_imm(r_ret, k, ctx);
+			if (i != prog->len - 1) {
+				/*
+				 * If this is not the last instruction
+				 * then jump to the epilogue
+				 */
+				emit_b(b_imm(prog->len, ctx), ctx);
+				emit_nop(ctx);
+			}
+			break;
+		case BPF_S_MISC_TAX:
+			/* X = A */
+			ctx->flags |= SEEN_X | SEEN_A;
+			emit_jit_reg_move(r_X, r_A, ctx);
+			break;
+		case BPF_S_MISC_TXA:
+			/* A = X */
+			ctx->flags |= SEEN_A | SEEN_X;
+			update_on_xread(ctx);
+			emit_jit_reg_move(r_A, r_X, ctx);
+			break;
+		/* AUX */
+		case BPF_S_ANC_PROTOCOL:
+			/* A = ntohs(skb->protocol */
+			ctx->flags |= SEEN_SKB | SEEN_OFF | SEEN_A;
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff,
+						  protocol) != 2);
+			off = offsetof(struct sk_buff, protocol);
+			emit_half_load(r_A, r_skb, off, ctx);
+#ifdef CONFIG_CPU_LITTLE_ENDIAN
+			/* This needs little endian fixup */
+			if (config_enabled(CPU_MIPSR1)) {
+				/* Get first byte */
+				emit_andi(r_tmp_imm, r_A, 0xff, ctx);
+				/* Shift it */
+				emit_sll(r_tmp, r_tmp_imm, 8, ctx);
+				/* Get second byte */
+				emit_srl(r_tmp_imm, r_A, 8, ctx);
+				emit_andi(r_tmp_imm, r_tmp_imm, 0xff, ctx);
+				/* Put everyting together in r_A */
+				emit_or(r_A, r_tmp, r_tmp_imm, ctx);
+			} else {
+				/* R2 and later have the wsbh instruction */
+				emit_wsbh(r_A, r_A, ctx);
+			}
+#endif
+			break;
+		case BPF_S_ANC_CPU:
+			ctx->flags |= SEEN_A | SEEN_OFF;
+			/* A = current_thread_info()->cpu */
+			BUILD_BUG_ON(FIELD_SIZEOF(struct thread_info,
+						  cpu) != 4);
+			off = offsetof(struct thread_info, cpu);
+			/* $28/gp points to the thread_info struct */
+			emit_load(r_A, 28, off, ctx);
+			break;
+		case BPF_S_ANC_IFINDEX:
+			/* A = skb->dev->ifindex */
+			ctx->flags |= SEEN_SKB | SEEN_A | SEEN_S0;
+			off = offsetof(struct sk_buff, dev);
+			emit_load(r_s0, r_skb, off, ctx);
+			/* error (0) in the delay slot */
+			emit_bcond(MIPS_COND_EQ, r_s0, r_zero,
+				   b_imm(prog->len, ctx), ctx);
+			emit_reg_move(r_ret, r_zero, ctx);
+			BUILD_BUG_ON(FIELD_SIZEOF(struct net_device,
+						  ifindex) != 4);
+			off = offsetof(struct net_device, ifindex);
+			emit_load(r_A, r_s0, off, ctx);
+			break;
+		case BPF_S_ANC_MARK:
+			ctx->flags |= SEEN_SKB | SEEN_A;
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, mark) != 4);
+			off = offsetof(struct sk_buff, mark);
+			emit_load(r_A, r_skb, off, ctx);
+			break;
+		case BPF_S_ANC_RXHASH:
+			ctx->flags |= SEEN_SKB | SEEN_A;
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, hash) != 4);
+			off = offsetof(struct sk_buff, hash);
+			emit_load(r_A, r_skb, off, ctx);
+			break;
+		case BPF_S_ANC_VLAN_TAG_PRESENT:
+			ctx->flags |= SEEN_SKB | SEEN_S0 | SEEN_A;
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff,
+						  vlan_tci) != 2);
+			off = offsetof(struct sk_buff, vlan_tci);
+			emit_half_load(r_s0, r_skb, off, ctx);
+			if (inst->code == BPF_S_ANC_VLAN_TAG)
+				emit_and(r_A, r_s0, VLAN_VID_MASK, ctx);
+			else
+				emit_and(r_A, r_s0, VLAN_TAG_PRESENT, ctx);
+			break;
+		case BPF_S_ANC_QUEUE:
+			ctx->flags |= SEEN_SKB | SEEN_A;
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff,
+						  queue_mapping) != 2);
+			BUILD_BUG_ON(offsetof(struct sk_buff,
+					      queue_mapping) > 0xff);
+			off = offsetof(struct sk_buff, queue_mapping);
+			emit_half_load(r_A, r_skb, off, ctx);
+			break;
+		default:
+			pr_warn("%s: Unhandled opcode: 0x%02x\n", __FILE__,
+				inst->code);
+			return -1;
+		}
+	}
+
+	/* compute offsets only during the first pass */
+	if (ctx->target == NULL)
+		ctx->offsets[i] = ctx->idx * 4;
+
+	return 0;
+
+}
+
+int bpf_jit_enable __read_mostly;
+
+void bpf_jit_compile(struct sk_filter *fp)
+{
+	struct jit_ctx ctx;
+	unsigned int alloc_size, tmp_idx;
+
+	if (!bpf_jit_enable)
+		return;
+
+	memset(&ctx, 0, sizeof(ctx));
+
+	ctx.offsets = kcalloc(fp->len, sizeof(*ctx.offsets), GFP_KERNEL);
+	if (ctx.offsets == NULL)
+		return;
+
+	ctx.skf = fp;
+
+	if (unlikely(build_body(&ctx)))
+		goto out;
+
+	tmp_idx = ctx.idx;
+	build_prologue(&ctx);
+	ctx.prologue_bytes = (ctx.idx - tmp_idx) * 4;
+	/* just to complete the ctx.idx count */
+	build_epilogue(&ctx);
+
+	alloc_size = 4 * ctx.idx;
+	ctx.target = module_alloc(alloc_size);
+	if (ctx.target == NULL)
+		goto out;
+
+	/* Clean it */
+	memset(ctx.target, 0, alloc_size);
+
+	ctx.idx = 0;
+
+	/* Generate the actual JIT code */
+	build_prologue(&ctx);
+	build_body(&ctx);
+	build_epilogue(&ctx);
+
+	/* Update the icache */
+	flush_icache_range((ptr)ctx.target, (ptr)(ctx.target + ctx.idx));
+
+	if (bpf_jit_enable > 1)
+		/* Dump JIT code */
+		bpf_jit_dump(fp->len, alloc_size, 2, ctx.target);
+
+	fp->bpf_func = (void *)ctx.target;
+	fp->jited = 1;
+
+out:
+	kfree(ctx.offsets);
+}
+
+void bpf_jit_free(struct sk_filter *fp)
+{
+	if (fp->jited)
+		module_free(NULL, fp->bpf_func);
+	kfree(fp);
+}
diff --git a/arch/mips/net/bpf_jit.h b/arch/mips/net/bpf_jit.h
new file mode 100644
index 0000000..3a5751b
--- /dev/null
+++ b/arch/mips/net/bpf_jit.h
@@ -0,0 +1,44 @@
+/*
+ * Just-In-Time compiler for BPF filters on MIPS
+ *
+ * Copyright (c) 2014 Imagination Technologies Ltd.
+ * Author: Markos Chandras <markos.chandras@imgtec.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; version 2 of the License.
+ */
+
+#ifndef BPF_JIT_MIPS_OP_H
+#define BPF_JIT_MIPS_OP_H
+
+/* Registers used by JIT */
+#define MIPS_R_ZERO	0
+#define MIPS_R_V0	2
+#define MIPS_R_V1	3
+#define MIPS_R_A0	4
+#define MIPS_R_A1	5
+#define MIPS_R_T6	14
+#define MIPS_R_T7	15
+#define MIPS_R_S0	16
+#define MIPS_R_S1	17
+#define MIPS_R_S2	18
+#define MIPS_R_S3	19
+#define MIPS_R_S4	20
+#define MIPS_R_S5	21
+#define MIPS_R_S6	22
+#define MIPS_R_S7	23
+#define MIPS_R_SP	29
+#define MIPS_R_RA	31
+
+/* Conditional codes */
+#define MIPS_COND_EQ	0x1
+#define MIPS_COND_GE	(0x1 << 1)
+#define MIPS_COND_GT	(0x1 << 2)
+#define MIPS_COND_NE	(0x1 << 3)
+#define MIPS_COND_ALL	(0x1 << 4)
+/* Conditionals on X register or K immediate */
+#define MIPS_COND_X	(0x1 << 5)
+#define MIPS_COND_K	(0x1 << 6)
+
+#endif /* BPF_JIT_MIPS_OP_H */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v2 13/14] MIPS: net: Add BPF JIT
@ 2014-04-09 16:00     ` Markos Chandras
  0 siblings, 0 replies; 53+ messages in thread
From: Markos Chandras @ 2014-04-09 16:00 UTC (permalink / raw)
  To: linux-mips; +Cc: Markos Chandras, David S. Miller, netdev

This adds initial support for BPF-JIT on MIPS

Tested on mips32 LE/BE and mips64 BE/n64 using
dhcp, ping and various tcpdump filters.

Benchmarking:

Assuming the remote MIPS target uses 192.168.154.181
as its IP address, and the local host uses 192.168.154.136,
the following results can be obtained using the following
tcpdump filter (catches no frames) and a simple
'time ping -f -c 1000000' command.

[root@(none) ~]# tcpdump -p -n -s 0 -i eth0 net 10.0.0.0/24 -d
(000) ldh      [12]
(001) jeq      #0x800           jt 2	jf 8
(002) ld       [26]
(003) and      #0xffffff00
(004) jeq      #0xa000000       jt 16	jf 5
(005) ld       [30]
(006) and      #0xffffff00
(007) jeq      #0xa000000       jt 16	jf 17
(008) jeq      #0x806           jt 10	jf 9
(009) jeq      #0x8035          jt 10	jf 17
(010) ld       [28]
(011) and      #0xffffff00
(012) jeq      #0xa000000       jt 16	jf 13
(013) ld       [38]
(014) and      #0xffffff00
(015) jeq      #0xa000000       jt 16	jf 17
(016) ret      #65535

- BPF-JIT Disabled

real    1m38.005s
user    0m1.510s
sys     0m6.710s

- BPF-JIT Enabled

real    1m35.215s
user    0m1.200s
sys     0m4.140s

Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
---
Changes since v1:

- Add 'wsbh' implementation for MIPS32/64R1
- Rename 'rxhash' to 'hash' because of
61b905da33ae25edb6b9d2a5de21e34c3a77efe3
- Initialize 'fp->jited' because of
f8bbbfc3b97f4c7a6c7c23185e520b22bfc3a21d
---
 arch/mips/Kbuild        |    1 +
 arch/mips/net/Makefile  |    3 +
 arch/mips/net/bpf_jit.c | 1334 +++++++++++++++++++++++++++++++++++++++++++++++
 arch/mips/net/bpf_jit.h |   44 ++
 4 files changed, 1382 insertions(+)
 create mode 100644 arch/mips/net/Makefile
 create mode 100644 arch/mips/net/bpf_jit.c
 create mode 100644 arch/mips/net/bpf_jit.h

diff --git a/arch/mips/Kbuild b/arch/mips/Kbuild
index d2cfe45..9179336 100644
--- a/arch/mips/Kbuild
+++ b/arch/mips/Kbuild
@@ -17,6 +17,7 @@ obj- := $(platform-)
 obj-y += kernel/
 obj-y += mm/
 obj-y += math-emu/
+obj-y += net/
 
 ifdef CONFIG_KVM
 obj-y += kvm/
diff --git a/arch/mips/net/Makefile b/arch/mips/net/Makefile
new file mode 100644
index 0000000..ae74b3a
--- /dev/null
+++ b/arch/mips/net/Makefile
@@ -0,0 +1,3 @@
+# MIPS networking code
+
+obj-$(CONFIG_BPF_JIT) += bpf_jit.o
diff --git a/arch/mips/net/bpf_jit.c b/arch/mips/net/bpf_jit.c
new file mode 100644
index 0000000..a12f843
--- /dev/null
+++ b/arch/mips/net/bpf_jit.c
@@ -0,0 +1,1334 @@
+/*
+ * Just-In-Time compiler for BPF filters on MIPS
+ *
+ * Copyright (c) 2014 Imagination Technologies Ltd.
+ * Author: Markos Chandras <markos.chandras@imgtec.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; version 2 of the License.
+ */
+
+#include <linux/bitops.h>
+#include <linux/compiler.h>
+#include <linux/errno.h>
+#include <linux/filter.h>
+#include <linux/if_vlan.h>
+#include <linux/kconfig.h>
+#include <linux/moduleloader.h>
+#include <linux/netdevice.h>
+#include <linux/string.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+#include <asm/bitops.h>
+#include <asm/cacheflush.h>
+#include <asm/uasm.h>
+
+#include "bpf_jit.h"
+
+/* ABI
+ *
+ * s0	1st scratch register
+ * s1	2nd scratch register
+ * s2	offset register
+ * s3	BPF register A
+ * s4	BPF register X
+ * s5	*skb
+ * s6	*scratch memory
+ *
+ * On entry (*bpf_func)(*skb, *filter)
+ * a0 = MIPS_R_A0 = skb;
+ * a1 = MIPS_R_A1 = filter;
+ *
+ * Stack
+ * ...
+ * M[15]
+ * M[14]
+ * M[13]
+ * ...
+ * M[0] <-- r_M
+ * saved reg k-1
+ * saved reg k-2
+ * ...
+ * saved reg 0 <-- r_sp
+ * <no argument area>
+ *
+ *                     Packet layout
+ *
+ * <--------------------- len ------------------------>
+ * <--skb-len(r_skb_hl)-->< ----- skb->data_len ------>
+ * ----------------------------------------------------
+ * |                  skb->data                       |
+ * ----------------------------------------------------
+ */
+
+#define RSIZE	(sizeof(unsigned long))
+#define ptr typeof(unsigned long)
+
+/* ABI specific return values */
+#ifdef CONFIG_32BIT /* O32 */
+#ifdef CONFIG_CPU_LITTLE_ENDIAN
+#define r_err	MIPS_R_V1
+#define r_val	MIPS_R_V0
+#else /* CONFIG_CPU_LITTLE_ENDIAN */
+#define r_err	MIPS_R_V0
+#define r_val	MIPS_R_V1
+#endif
+#else /* N64 */
+#define r_err	MIPS_R_V0
+#define r_val	MIPS_R_V0
+#endif
+
+#define r_ret	MIPS_R_V0
+
+/*
+ * Use 2 scratch registers to avoid pipeline interlocks.
+ * There is no overhead during epilogue and prologue since
+ * any of the $s0-$s6 registers will only be preserved if
+ * they are going to actually be used.
+ */
+#define r_s0		MIPS_R_S0 /* scratch reg 1 */
+#define r_s1		MIPS_R_S1 /* scratch reg 2 */
+#define r_off		MIPS_R_S2
+#define r_A		MIPS_R_S3
+#define r_X		MIPS_R_S4
+#define r_skb		MIPS_R_S5
+#define r_M		MIPS_R_S6
+#define r_tmp_imm	MIPS_R_T6 /* No need to preserve this */
+#define r_tmp		MIPS_R_T7 /* No need to preserve this */
+#define r_zero		MIPS_R_ZERO
+#define r_sp		MIPS_R_SP
+#define r_ra		MIPS_R_RA
+
+#define SCRATCH_OFF(k)		(4 * (k))
+
+/* JIT flags */
+#define SEEN_CALL		(1 << BPF_MEMWORDS)
+#define SEEN_SREG_SFT		(BPF_MEMWORDS + 1)
+#define SEEN_SREG_BASE		(1 << SEEN_SREG_SFT)
+#define SEEN_SREG(x)		(SEEN_SREG_BASE << (x))
+#define SEEN_S0			SEEN_SREG(0)
+#define SEEN_S1			SEEN_SREG(1)
+#define SEEN_OFF		SEEN_SREG(2)
+#define SEEN_A			SEEN_SREG(3)
+#define SEEN_X			SEEN_SREG(4)
+#define SEEN_SKB		SEEN_SREG(5)
+#define SEEN_MEM		SEEN_SREG(6)
+
+/* Arguments used by JIT */
+#define ARGS_USED_BY_JIT	2 /* only applicable to 64-bit */
+
+#define FLAG_NEED_X_RESET	(1 << 0)
+
+#define SBIT(x)			(1 << (x)) /* Signed version of BIT() */
+
+/**
+ * struct jit_ctx - JIT context
+ * @skf:		The sk_filter
+ * @prologue_bytes:	Number of bytes for prologue
+ * @idx:		Instruction index
+ * @flags:		JIT flags
+ * @offsets:		Instruction offsets
+ * @target:		Memory location for the compiled filter
+ */
+struct jit_ctx {
+	const struct sk_filter *skf;
+	unsigned int prologue_bytes;
+	u32 idx;
+	u32 flags;
+	u32 *offsets;
+	u32 *target;
+};
+
+
+static inline int optimize_div(u32 *k)
+{
+	/* power of 2 divides can be implemented with right shift */
+	if (!(*k & (*k-1))) {
+		*k = ilog2(*k);
+		return 1;
+	}
+
+	return 0;
+}
+
+/* Simply emit the instruction if the JIT memory space has been allocated */
+#define emit_instr(ctx, func, ...)			\
+do {							\
+	if ((ctx)->target != NULL) {			\
+		u32 *p = &(ctx)->target[ctx->idx];	\
+		uasm_i_##func(&p, ##__VA_ARGS__);	\
+	}						\
+	(ctx)->idx++;					\
+} while (0)
+
+/* Determine if immediate is within the 16-bit signed range */
+static inline bool is_range16(s32 imm)
+{
+	if (imm >= SBIT(15) || imm < -SBIT(15))
+		return true;
+	return false;
+}
+
+static inline void emit_addu(unsigned int dst, unsigned int src1,
+			     unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, addu, dst, src1, src2);
+}
+
+static inline void emit_nop(struct jit_ctx *ctx)
+{
+	emit_instr(ctx, nop);
+}
+
+/* Load a u32 immediate to a register */
+static inline void emit_load_imm(unsigned int dst, u32 imm, struct jit_ctx *ctx)
+{
+	if (ctx->target != NULL) {
+		/* addiu can only handle s16 */
+		if (is_range16(imm)) {
+			u32 *p = &ctx->target[ctx->idx];
+			uasm_i_lui(&p, r_tmp_imm, (s32)imm >> 16);
+			p = &ctx->target[ctx->idx + 1];
+			uasm_i_ori(&p, dst, r_tmp_imm, imm & 0xffff);
+		} else {
+			u32 *p = &ctx->target[ctx->idx];
+			uasm_i_addiu(&p, dst, r_zero, imm);
+		}
+	}
+	ctx->idx++;
+
+	if (is_range16(imm))
+		ctx->idx++;
+}
+
+static inline void emit_or(unsigned int dst, unsigned int src1,
+			   unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, or, dst, src1, src2);
+}
+
+static inline void emit_ori(unsigned int dst, unsigned src, u32 imm,
+			    struct jit_ctx *ctx)
+{
+	if (imm >= BIT(16)) {
+		emit_load_imm(r_tmp, imm, ctx);
+		emit_or(dst, src, r_tmp, ctx);
+	} else {
+		emit_instr(ctx, ori, dst, src, imm);
+	}
+
+}
+
+
+static inline void emit_daddu(unsigned int dst, unsigned int src1,
+			      unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, daddu, dst, src1, src2);
+}
+
+static inline void emit_daddiu(unsigned int dst, unsigned int src,
+			       int imm, struct jit_ctx *ctx)
+{
+	/*
+	 * Only used for stack, so the imm is relatively small
+	 * and it fits in 15-bits
+	 */
+	emit_instr(ctx, daddiu, dst, src, imm);
+}
+
+static inline void emit_addiu(unsigned int dst, unsigned int src,
+			      u32 imm, struct jit_ctx *ctx)
+{
+	if (is_range16(imm)) {
+		emit_load_imm(r_tmp, imm, ctx);
+		emit_addu(dst, r_tmp, src, ctx);
+	} else {
+		emit_instr(ctx, addiu, dst, src, imm);
+	}
+}
+
+static inline void emit_and(unsigned int dst, unsigned int src1,
+			    unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, and, dst, src1, src2);
+}
+
+static inline void emit_andi(unsigned int dst, unsigned int src,
+			     u32 imm, struct jit_ctx *ctx)
+{
+	/* If imm does not fit in u16 then load it to register */
+	if (imm >= BIT(16)) {
+		emit_load_imm(r_tmp, imm, ctx);
+		emit_and(dst, src, r_tmp, ctx);
+	} else {
+		emit_instr(ctx, andi, dst, src, imm);
+	}
+}
+
+static inline void emit_xor(unsigned int dst, unsigned int src1,
+			    unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, xor, dst, src1, src2);
+}
+
+static inline void emit_xori(ptr dst, ptr src, u32 imm, struct jit_ctx *ctx)
+{
+	/* If imm does not fit in u16 then load it to register */
+	if (imm >= BIT(16)) {
+		emit_load_imm(r_tmp, imm, ctx);
+		emit_xor(dst, src, r_tmp, ctx);
+	} else {
+		emit_instr(ctx, xori, dst, src, imm);
+	}
+}
+
+static inline void emit_stack_offset(int offset, struct jit_ctx *ctx)
+{
+	if (config_enabled(CONFIG_64BIT))
+		emit_instr(ctx, daddiu, r_sp, r_sp, offset);
+	else
+		emit_instr(ctx, addiu, r_sp, r_sp, offset);
+
+}
+
+static inline void emit_subu(unsigned int dst, unsigned int src1,
+			     unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, subu, dst, src1, src2);
+}
+
+static inline void emit_neg(unsigned int reg, struct jit_ctx *ctx)
+{
+	emit_subu(reg, r_zero, reg, ctx);
+}
+
+static inline void emit_sllv(unsigned int dst, unsigned int src,
+			     unsigned int sa, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, sllv, dst, src, sa);
+}
+
+static inline void emit_sll(unsigned int dst, unsigned int src,
+			    unsigned int sa, struct jit_ctx *ctx)
+{
+	/* sa is 5-bits long */
+	BUG_ON(sa >= BIT(5));
+	emit_instr(ctx, sll, dst, src, sa);
+}
+
+static inline void emit_srlv(unsigned int dst, unsigned int src,
+			     unsigned int sa, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, srlv, dst, src, sa);
+}
+
+static inline void emit_srl(unsigned int dst, unsigned int src,
+			    unsigned int sa, struct jit_ctx *ctx)
+{
+	/* sa is 5-bits long */
+	BUG_ON(sa >= BIT(5));
+	emit_instr(ctx, srl, dst, src, sa);
+}
+
+static inline void emit_sltu(unsigned int dst, unsigned int src1,
+			     unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, sltu, dst, src1, src2);
+}
+
+static inline void emit_sltiu(unsigned dst, unsigned int src,
+			      unsigned int imm, struct jit_ctx *ctx)
+{
+	/* 16 bit immediate */
+	if (is_range16((s32)imm)) {
+		emit_load_imm(r_tmp, imm, ctx);
+		emit_sltu(dst, src, r_tmp, ctx);
+	} else {
+		emit_instr(ctx, sltiu, dst, src, imm);
+	}
+
+}
+
+/* Store register on the stack */
+static inline void emit_store_stack_reg(ptr reg, ptr base,
+					unsigned int offset,
+					struct jit_ctx *ctx)
+{
+	if (config_enabled(CONFIG_64BIT))
+		emit_instr(ctx, sd, reg, offset, base);
+	else
+		emit_instr(ctx, sw, reg, offset, base);
+}
+
+static inline void emit_store(ptr reg, ptr base, unsigned int offset,
+			      struct jit_ctx *ctx)
+{
+	emit_instr(ctx, sw, reg, offset, base);
+}
+
+static inline void emit_load_stack_reg(ptr reg, ptr base,
+				       unsigned int offset,
+				       struct jit_ctx *ctx)
+{
+	if (config_enabled(CONFIG_64BIT))
+		emit_instr(ctx, ld, reg, offset, base);
+	else
+		emit_instr(ctx, lw, reg, offset, base);
+
+}
+
+static inline void emit_load(unsigned int reg, unsigned int base,
+			     unsigned int offset, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, lw, reg, offset, base);
+}
+
+static inline void emit_half_load(unsigned int reg, unsigned int base,
+				  unsigned int offset, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, lh, reg, offset, base);
+}
+
+static inline void emit_mul(unsigned int dst, unsigned int src1,
+			    unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, mul, dst, src1, src2);
+}
+
+static inline void emit_div(unsigned int dst, unsigned int src,
+			    struct jit_ctx *ctx)
+{
+	if (ctx->target != NULL) {
+		u32 *p = &ctx->target[ctx->idx];
+		uasm_i_divu(&p, dst, src);
+		p = &ctx->target[ctx->idx + 1];
+		uasm_i_mfhi(&p, dst);
+	}
+	ctx->idx += 2; /* 2 insts */
+}
+
+static inline void emit_dsll(unsigned int dst, unsigned int src,
+			     unsigned int sa, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, dsll, dst, src, sa);
+}
+
+static inline void emit_dsrl32(unsigned int dst, unsigned int src,
+			       unsigned int sa, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, dsrl32, dst, src, sa);
+}
+
+static inline void emit_wsbh(unsigned int dst, unsigned int src,
+			     struct jit_ctx *ctx)
+{
+	emit_instr(ctx, wsbh, dst, src);
+}
+
+/* load a function pointer to register */
+static inline void emit_load_func(unsigned int reg, ptr imm,
+				  struct jit_ctx *ctx)
+{
+	if (config_enabled(CONFIG_64BIT)) {
+		/* At this point imm is always 64-bit */
+		emit_load_imm(r_tmp, (u64)imm >> 32, ctx);
+		emit_dsll(r_tmp_imm, r_tmp, 16, ctx); /* left shift by 16 */
+		emit_ori(r_tmp, r_tmp_imm, (imm >> 16) & 0xffff, ctx);
+		emit_dsll(r_tmp_imm, r_tmp, 16, ctx); /* left shift by 16 */
+		emit_ori(reg, r_tmp_imm, imm & 0xffff, ctx);
+	} else {
+		emit_load_imm(reg, imm, ctx);
+	}
+}
+
+/* Move to real MIPS register */
+static inline void emit_reg_move(ptr dst, ptr src, struct jit_ctx *ctx)
+{
+	if (config_enabled(CONFIG_64BIT))
+		emit_daddu(dst, src, r_zero, ctx);
+	else
+		emit_addu(dst, src, r_zero, ctx);
+}
+
+/* Move to JIT (32-bit) register */
+static inline void emit_jit_reg_move(ptr dst, ptr src, struct jit_ctx *ctx)
+{
+	emit_addu(dst, src, r_zero, ctx);
+}
+
+/* Compute the immediate value for PC-relative branches. */
+static inline u32 b_imm(unsigned int tgt, struct jit_ctx *ctx)
+{
+	if (ctx->target == NULL)
+		return 0;
+
+	/*
+	 * We want a pc-relative branch. We only do forward branches
+	 * so tgt is always after pc. tgt is the instruction offset
+	 * we want to jump to.
+
+	 * Branch on MIPS:
+	 * I: target_offset <- sign_extend(offset)
+	 * I+1: PC += target_offset (delay slot)
+	 *
+	 * ctx->idx currently points to the branch instruction
+	 * but the offset is added to the delay slot so we need
+	 * to subtract 4.
+	 */
+	return ctx->offsets[tgt] -
+		(ctx->idx * 4 - ctx->prologue_bytes) - 4;
+}
+
+static inline void emit_bcond(int cond, unsigned int reg1, unsigned int reg2,
+			     unsigned int imm, struct jit_ctx *ctx)
+{
+	if (ctx->target != NULL) {
+		u32 *p = &ctx->target[ctx->idx];
+
+		switch (cond) {
+		case MIPS_COND_EQ:
+			uasm_i_beq(&p, reg1, reg2, imm);
+			break;
+		case MIPS_COND_NE:
+			uasm_i_bne(&p, reg1, reg2, imm);
+			break;
+		case MIPS_COND_ALL:
+			uasm_i_b(&p, imm);
+			break;
+		default:
+			pr_warn("%s: Unhandled branch conditional: %d\n",
+				__func__, cond);
+		}
+	}
+	ctx->idx++;
+}
+
+static inline void emit_b(unsigned int imm, struct jit_ctx *ctx)
+{
+	emit_bcond(MIPS_COND_ALL, r_zero, r_zero, imm, ctx);
+}
+
+static inline void emit_jalr(unsigned int link, unsigned int reg,
+			     struct jit_ctx *ctx)
+{
+	emit_instr(ctx, jalr, link, reg);
+}
+
+static inline void emit_jr(unsigned int reg, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, jr, reg);
+}
+
+static inline u16 align_sp(unsigned int num)
+{
+	/* Double word alignment for 32-bit, quadword for 64-bit */
+	unsigned int align = config_enabled(CONFIG_64BIT) ? 16 : 8;
+	num = (num + (align - 1)) & -align;
+	return num;
+}
+
+static inline void update_on_xread(struct jit_ctx *ctx)
+{
+	if (!(ctx->flags & SEEN_X))
+		ctx->flags |= FLAG_NEED_X_RESET;
+
+	ctx->flags |= SEEN_X;
+}
+
+static bool is_load_to_a(u16 inst)
+{
+	switch (inst) {
+	case BPF_S_LD_W_LEN:
+	case BPF_S_LD_W_ABS:
+	case BPF_S_LD_H_ABS:
+	case BPF_S_LD_B_ABS:
+	case BPF_S_ANC_CPU:
+	case BPF_S_ANC_IFINDEX:
+	case BPF_S_ANC_MARK:
+	case BPF_S_ANC_PROTOCOL:
+	case BPF_S_ANC_RXHASH:
+	case BPF_S_ANC_VLAN_TAG:
+	case BPF_S_ANC_VLAN_TAG_PRESENT:
+	case BPF_S_ANC_QUEUE:
+		return true;
+	default:
+		return false;
+	}
+}
+
+static void save_bpf_jit_regs(struct jit_ctx *ctx, unsigned offset)
+{
+	int i = 0, real_off = 0;
+	u32 sflags, tmp_flags;
+
+	/* Adjust the stack pointer */
+	emit_stack_offset(-align_sp(offset), ctx);
+
+	if (ctx->flags & SEEN_CALL) {
+		/* Argument save area */
+		if (config_enabled(CONFIG_64BIT))
+			/* Bottom of current frame */
+			real_off = align_sp(offset) - RSIZE;
+		else
+			/* Top of previous frame */
+			real_off = align_sp(offset) + RSIZE;
+		emit_store_stack_reg(MIPS_R_A0, r_sp, real_off, ctx);
+		emit_store_stack_reg(MIPS_R_A1, r_sp, real_off + RSIZE, ctx);
+
+		real_off = 0;
+	}
+
+	tmp_flags = sflags = ctx->flags >> SEEN_SREG_SFT;
+	/* sflags is essentially a bitmap */
+	pr_debug("%s: register flags: 0x%08x\n", __func__, tmp_flags);
+	while (tmp_flags) {
+		if ((sflags >> i) & 0x1) {
+			pr_debug("%s: preserving register %d\n", __func__,
+				 MIPS_R_S0 + i);
+			emit_store_stack_reg(MIPS_R_S0 + i, r_sp, real_off,
+					     ctx);
+			real_off += RSIZE;
+		}
+		i++;
+		tmp_flags >>= 1;
+	}
+
+	/* save return address */
+	if (ctx->flags & SEEN_CALL) {
+		emit_store_stack_reg(r_ra, r_sp, real_off, ctx);
+		real_off += RSIZE;
+	}
+
+	/* Setup r_M leaving the alignment gap if necessary */
+	if (ctx->flags & SEEN_MEM) {
+		if (real_off % (RSIZE * 2))
+			real_off += RSIZE;
+		emit_addiu(r_M, r_sp, real_off, ctx);
+	}
+
+}
+
+static void restore_bpf_jit_regs(struct jit_ctx *ctx,
+				 unsigned int offset)
+{
+	int i, real_off = 0;
+	u32 sflags, tmp_flags;
+
+	if (ctx->flags & SEEN_CALL) {
+		if (config_enabled(CONFIG_64BIT))
+			/* Bottom of current frame */
+			real_off = align_sp(offset) - RSIZE;
+		else
+			/* Top of previous frame */
+			real_off = align_sp(offset) + RSIZE;
+		emit_load_stack_reg(MIPS_R_A0, r_sp, real_off, ctx);
+		emit_load_stack_reg(MIPS_R_A1, r_sp, real_off + RSIZE, ctx);
+
+		real_off = 0;
+	}
+
+	tmp_flags = sflags = ctx->flags >> SEEN_SREG_SFT;
+	/* sflags is a bitmap */
+	pr_debug("%s: register flags: 0x%08x\n", __func__, tmp_flags);
+	i = 0;
+	while (tmp_flags) {
+		if ((sflags >> i) & 0x1) {
+			pr_debug("%s: restoring register %d\n", __func__,
+				 MIPS_R_S0 + i);
+			emit_load_stack_reg(MIPS_R_S0 + i, r_sp, real_off,
+					    ctx);
+			real_off += RSIZE;
+		}
+		i++;
+		tmp_flags >>= 1;
+	}
+
+	/* restore return address */
+	if (ctx->flags & SEEN_CALL)
+		emit_load_stack_reg(r_ra, r_sp, real_off, ctx);
+
+	/* Restore the sp and discard the scrach memory */
+	emit_stack_offset(align_sp(offset), ctx);
+
+}
+
+static unsigned int get_stack_depth(struct jit_ctx *ctx)
+{
+	int sp_off = 0;
+
+
+	/* How may s* regs do we need to preserved? */
+	sp_off += hweight32(ctx->flags >> SEEN_SREG_SFT) * RSIZE;
+
+	if (ctx->flags & SEEN_MEM)
+		sp_off += 4 * BPF_MEMWORDS; /* BPF_MEMWORDS are 32-bit */
+
+	if (ctx->flags & SEEN_CALL)
+		/*
+		 * The JIT code make calls to external functions using 2
+		 * arguments. Therefore, for o32 we don't need to allocate
+		 * space because we don't care if the argumetns are lost
+		 * across calls. We do need however to preserve incoming
+		 * arguments but the space is already allocated for us by
+		 * the caller. On the other hand, for n64, we need to allocate
+		 * this space ourselves. We need to preserve $ra as well.
+		 */
+		sp_off += config_enabled(CONFIG_64BIT) ?
+			(ARGS_USED_BY_JIT + 1) * RSIZE : RSIZE;
+
+	/*
+	 * Subtract the bytes for the last registers since we only care about
+	 * the location on the stack pointer.
+	 */
+	return sp_off - RSIZE;
+}
+
+static void build_prologue(struct jit_ctx *ctx)
+{
+	u16 first_inst = ctx->skf->insns[0].code;
+	int sp_off;
+
+	/* Calculate the total offset for the stack pointer */
+	sp_off = get_stack_depth(ctx);
+	save_bpf_jit_regs(ctx, sp_off);
+
+	if (ctx->flags & SEEN_SKB)
+		emit_reg_move(r_skb, MIPS_R_A0, ctx);
+
+	if (ctx->flags & FLAG_NEED_X_RESET)
+		emit_jit_reg_move(r_X, r_zero, ctx);
+
+	/* Do not leak kernel data to userspace */
+	if ((first_inst != BPF_S_RET_K) && !(is_load_to_a(first_inst)))
+		emit_jit_reg_move(r_A, r_zero, ctx);
+
+}
+
+static void build_epilogue(struct jit_ctx *ctx)
+{
+	unsigned int sp_off;
+
+	/* Calculate the total offset for the stack pointer */
+
+	sp_off = get_stack_depth(ctx);
+	restore_bpf_jit_regs(ctx, sp_off);
+
+	/* Return */
+	emit_jr(r_ra, ctx);
+	emit_nop(ctx);
+}
+
+static u64 jit_get_skb_b(struct sk_buff *skb, unsigned offset)
+{
+	u8 ret;
+	int err;
+
+	err = skb_copy_bits(skb, offset, &ret, 1);
+
+	return (u64)err << 32 | ret;
+}
+
+static u64 jit_get_skb_h(struct sk_buff *skb, unsigned offset)
+{
+	u16 ret;
+	int err;
+
+	err = skb_copy_bits(skb, offset, &ret, 2);
+
+	return (u64)err << 32 | ntohs(ret);
+}
+
+static u64 jit_get_skb_w(struct sk_buff *skb, unsigned offset)
+{
+	u32 ret;
+	int err;
+
+	err = skb_copy_bits(skb, offset, &ret, 4);
+
+	return (u64)err << 32 | ntohl(ret);
+}
+
+static int build_body(struct jit_ctx *ctx)
+{
+	void *load_func[] = {jit_get_skb_b, jit_get_skb_h, jit_get_skb_w};
+	const struct sk_filter *prog = ctx->skf;
+	const struct sock_filter *inst;
+	unsigned int i, off, load_order, condt;
+	u32 k, b_off __maybe_unused;
+
+	for (i = 0; i < prog->len; i++) {
+		inst = &(prog->insns[i]);
+		pr_debug("%s: code->0x%02x, jt->0x%x, jf->0x%x, k->0x%x\n",
+			 __func__, inst->code, inst->jt, inst->jf, inst->k);
+		k = inst->k;
+
+		if (ctx->target == NULL)
+			ctx->offsets[i] = ctx->idx * 4;
+
+		switch (inst->code) {
+		case BPF_S_LD_IMM:
+			/* A <- k ==> li r_A, k */
+			ctx->flags |= SEEN_A;
+			emit_load_imm(r_A, k, ctx);
+			break;
+		case BPF_S_LD_W_LEN:
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, len) != 4);
+			/* A <- len ==> lw r_A, offset(skb) */
+			ctx->flags |= SEEN_SKB | SEEN_A;
+			off = offsetof(struct sk_buff, len);
+			emit_load(r_A, r_skb, off, ctx);
+			break;
+		case BPF_S_LD_MEM:
+			/* A <- M[k] ==> lw r_A, offset(M) */
+			ctx->flags |= SEEN_MEM | SEEN_A;
+			emit_load(r_A, r_M, SCRATCH_OFF(k), ctx);
+			break;
+		case BPF_S_LD_W_ABS:
+			/* A <- P[k:4] */
+			load_order = 2;
+			goto load;
+		case BPF_S_LD_H_ABS:
+			/* A <- P[k:2] */
+			load_order = 1;
+			goto load;
+		case BPF_S_LD_B_ABS:
+			/* A <- P[k:1] */
+			load_order = 0;
+load:
+			emit_load_imm(r_off, k, ctx);
+load_common:
+			ctx->flags |= SEEN_CALL | SEEN_OFF | SEEN_S0 |
+				SEEN_SKB | SEEN_A;
+
+			emit_load_func(r_s0, (ptr)load_func[load_order],
+				      ctx);
+			emit_reg_move(MIPS_R_A0, r_skb, ctx);
+			emit_jalr(MIPS_R_RA, r_s0, ctx);
+			/* Load second argument to delay slot */
+			emit_reg_move(MIPS_R_A1, r_off, ctx);
+			/* Check the error value */
+			if (config_enabled(CONFIG_64BIT)) {
+				/* Get error code from the top 32-bits */
+				emit_dsrl32(r_s0, r_val, 0, ctx);
+				/* Branch to 3 instructions ahead */
+				emit_bcond(MIPS_COND_NE, r_s0, r_zero, 3 << 2,
+					   ctx);
+			} else {
+				/* Branch to 3 instructions ahead */
+				emit_bcond(MIPS_COND_NE, r_err, r_zero, 3 << 2,
+					   ctx);
+			}
+			emit_nop(ctx);
+			/* We are good */
+			emit_b(b_imm(i + 1, ctx), ctx);
+			emit_jit_reg_move(r_A, r_val, ctx);
+			/* Return with error */
+			emit_b(b_imm(prog->len, ctx), ctx);
+			emit_reg_move(r_ret, r_zero, ctx);
+			break;
+		case BPF_S_LD_W_IND:
+			/* A <- P[X + k:4] */
+			load_order = 2;
+			goto load_ind;
+		case BPF_S_LD_H_IND:
+			/* A <- P[X + k:2] */
+			load_order = 1;
+			goto load_ind;
+		case BPF_S_LD_B_IND:
+			/* A <- P[X + k:1] */
+			load_order = 0;
+load_ind:
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_OFF | SEEN_X;
+			emit_addiu(r_off, r_X, k, ctx);
+		goto load_common;
+		case BPF_S_LDX_IMM:
+			/* X <- k */
+			ctx->flags |= SEEN_X;
+			emit_load_imm(r_X, k, ctx);
+			break;
+		case BPF_S_LDX_MEM:
+			/* X <- M[k] */
+			ctx->flags |= SEEN_X | SEEN_MEM;
+			emit_load(r_X, r_M, SCRATCH_OFF(k), ctx);
+			break;
+		case BPF_S_LDX_W_LEN:
+			/* X <- len */
+			ctx->flags |= SEEN_X | SEEN_SKB;
+			off = offsetof(struct sk_buff, len);
+			emit_load(r_X, r_skb, off, ctx);
+			break;
+		case BPF_S_LDX_B_MSH:
+			/* X <- 4 * (P[k:1] & 0xf) */
+			ctx->flags |= SEEN_X | SEEN_CALL | SEEN_S0 | SEEN_SKB;
+			/* Load offset to a1 */
+			emit_load_func(r_s0, (ptr)jit_get_skb_b, ctx);
+			/*
+			 * This may emit two instructions so it may not fit
+			 * in the delay slot. So use a0 in the delay slot.
+			 */
+			emit_load_imm(MIPS_R_A1, k, ctx);
+			emit_jalr(MIPS_R_RA, r_s0, ctx);
+			emit_reg_move(MIPS_R_A0, r_skb, ctx); /* delay slot */
+			/* Check the error value */
+			if (config_enabled(CONFIG_64BIT)) {
+				/* Top 32-bits of $v0 on 64-bit */
+				emit_dsrl32(r_s0, r_val, 0, ctx);
+				emit_bcond(MIPS_COND_NE, r_s0, r_zero,
+					   3 << 2, ctx);
+			} else {
+				emit_bcond(MIPS_COND_NE, r_err, r_zero,
+					   3 << 2, ctx);
+			}
+			/* No need for delay slot */
+			/* We are good */
+			/* X <- P[1:K] & 0xf */
+			emit_andi(r_X, r_val, 0xf, ctx);
+			/* X << 2 */
+			emit_b(b_imm(i + 1, ctx), ctx);
+			emit_sll(r_X, r_X, 2, ctx); /* delay slot */
+			/* Return with error */
+			emit_b(b_imm(prog->len, ctx), ctx);
+			emit_load_imm(r_ret, 0, ctx); /* delay slot */
+			break;
+		case BPF_S_ST:
+			/* M[k] <- A */
+			ctx->flags |= SEEN_MEM | SEEN_A;
+			emit_store(r_A, r_M, SCRATCH_OFF(k), ctx);
+			break;
+		case BPF_S_STX:
+			/* M[k] <- X */
+			ctx->flags |= SEEN_MEM | SEEN_X;
+			emit_store(r_X, r_M, SCRATCH_OFF(k), ctx);
+			break;
+		case BPF_S_ALU_ADD_K:
+			/* A += K */
+			ctx->flags |= SEEN_A;
+			emit_addiu(r_A, r_A, k, ctx);
+			break;
+		case BPF_S_ALU_SUB_K:
+			/* A -= K */
+			ctx->flags |= SEEN_A;
+			emit_addiu(r_A, r_A, -k, ctx);
+			break;
+		case BPF_S_ALU_MUL_K:
+			/* A *= K */
+			/* Load K to scratch register before MUL */
+			ctx->flags |= SEEN_A | SEEN_S0;
+			emit_load_imm(r_s0, k, ctx);
+			emit_mul(r_A, r_A, r_s0, ctx);
+			break;
+		case BPF_S_ALU_MUL_X:
+			/* A *= X */
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_A | SEEN_X;
+			emit_mul(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_DIV_K:
+			/* A /= k */
+			if (k == 1)
+				break;
+			if (optimize_div(&k)) {
+				ctx->flags |= SEEN_A;
+				emit_srl(r_A, r_A, k, ctx);
+				break;
+			}
+			ctx->flags |= SEEN_A | SEEN_S0;
+			emit_load_imm(r_s0, k, ctx);
+			emit_div(r_A, r_s0, ctx);
+			break;
+		case BPF_S_ALU_DIV_X:
+			/* A /= X */
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_X | SEEN_A;
+			/* Check if r_X is zero */
+			emit_bcond(MIPS_COND_EQ, r_X, r_zero,
+				   b_imm(prog->len, ctx), ctx);
+			emit_load_imm(r_val, 0, ctx); /* delay slot */
+			emit_div(r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_OR_K:
+			/* A |= K */
+			ctx->flags |= SEEN_A;
+			emit_ori(r_A, r_A, k, ctx);
+			break;
+		case BPF_S_ALU_OR_X:
+			/* A |= X */
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_A;
+			emit_ori(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_XOR_K:
+			/* A ^= k */
+			ctx->flags |= SEEN_A;
+			emit_xori(r_A, r_A, k, ctx);
+			break;
+		case BPF_S_ANC_ALU_XOR_X:
+		case BPF_S_ALU_XOR_X:
+			/* A ^= X */
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_A;
+			emit_xor(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_AND_K:
+			/* A &= K */
+			ctx->flags |= SEEN_A;
+			emit_andi(r_A, r_A, k, ctx);
+			break;
+		case BPF_S_ALU_AND_X:
+			/* A &= X */
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_A | SEEN_X;
+			emit_and(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_LSH_K:
+			/* A <<= K */
+			ctx->flags |= SEEN_A;
+			emit_sll(r_A, r_A, k, ctx);
+			break;
+		case BPF_S_ALU_LSH_X:
+			/* A <<= X */
+			ctx->flags |= SEEN_A | SEEN_X;
+			update_on_xread(ctx);
+			emit_sllv(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_RSH_K:
+			/* A >>= K */
+			ctx->flags |= SEEN_A;
+			emit_srl(r_A, r_A, k, ctx);
+			break;
+		case BPF_S_ALU_RSH_X:
+			ctx->flags |= SEEN_A | SEEN_X;
+			update_on_xread(ctx);
+			emit_srlv(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_NEG:
+			/* A = -A */
+			ctx->flags |= SEEN_A;
+			emit_neg(r_A, ctx);
+			break;
+		case BPF_S_JMP_JA:
+			/* pc += K */
+			emit_b(b_imm(i + k + 1, ctx), ctx);
+			emit_nop(ctx);
+			break;
+		case BPF_S_JMP_JEQ_K:
+			/* pc += ( A == K ) ? pc->jt : pc->jf */
+			condt = MIPS_COND_EQ | MIPS_COND_K;
+			goto jmp_cmp;
+		case BPF_S_JMP_JEQ_X:
+			ctx->flags |= SEEN_X;
+			/* pc += ( A == X ) ? pc->jt : pc->jf */
+			condt = MIPS_COND_EQ | MIPS_COND_X;
+			goto jmp_cmp;
+		case BPF_S_JMP_JGE_K:
+			/* pc += ( A >= K ) ? pc->jt : pc->jf */
+			condt = MIPS_COND_GE | MIPS_COND_K;
+			goto jmp_cmp;
+		case BPF_S_JMP_JGE_X:
+			ctx->flags |= SEEN_X;
+			/* pc += ( A >= X ) ? pc->jt : pc->jf */
+			condt = MIPS_COND_GE | MIPS_COND_X;
+			goto jmp_cmp;
+		case BPF_S_JMP_JGT_K:
+			/* pc += ( A > K ) ? pc->jt : pc->jf */
+			condt = MIPS_COND_GT | MIPS_COND_K;
+			goto jmp_cmp;
+		case BPF_S_JMP_JGT_X:
+			ctx->flags |= SEEN_X;
+			/* pc += ( A > X ) ? pc->jt : pc->jf */
+			condt = MIPS_COND_GT | MIPS_COND_X;
+jmp_cmp:
+			/* Greater or Equal */
+			if ((condt & MIPS_COND_GE) ||
+			    (condt & MIPS_COND_GT)) {
+				if (condt & MIPS_COND_K) { /* K */
+					ctx->flags |= SEEN_S0 | SEEN_A;
+					emit_sltiu(r_s0, r_A, k, ctx);
+				} else { /* X */
+					ctx->flags |= SEEN_S0 | SEEN_A |
+						SEEN_X;
+					emit_sltu(r_s0, r_A, r_X, ctx);
+				}
+				/* A < (K|X) ? r_scrach = 1 */
+				b_off = b_imm(i + inst->jf + 1, ctx);
+				emit_bcond(MIPS_COND_GT, r_s0, r_zero, b_off,
+					   ctx);
+				emit_nop(ctx);
+				/* A > (K|X) ? scratch = 0 */
+				if (condt & MIPS_COND_GT) {
+					/* Checking for equality */
+					ctx->flags |= SEEN_S0 | SEEN_A | SEEN_X;
+					if (condt & MIPS_COND_K)
+						emit_load_imm(r_s0, k, ctx);
+					else
+						emit_jit_reg_move(r_s0, r_X,
+								  ctx);
+					b_off = b_imm(i + inst->jf + 1, ctx);
+					emit_bcond(MIPS_COND_EQ, r_A, r_s0,
+						   b_off, ctx);
+					emit_nop(ctx);
+					/* Finally, A > K|X */
+					b_off = b_imm(i + inst->jt + 1, ctx);
+					emit_b(b_off, ctx);
+					emit_nop(ctx);
+				} else {
+					/* A >= (K|X) so jump */
+					b_off = b_imm(i + inst->jt + 1, ctx);
+					emit_b(b_off, ctx);
+					emit_nop(ctx);
+				}
+			} else {
+				/* A == K|X */
+				if (condt & MIPS_COND_K) { /* K */
+					ctx->flags |= SEEN_S0 | SEEN_A;
+					emit_load_imm(r_s0, k, ctx);
+					/* jump true */
+					b_off = b_imm(i + inst->jt + 1, ctx);
+					emit_bcond(MIPS_COND_EQ, r_A, r_s0,
+						   b_off, ctx);
+					emit_nop(ctx);
+					/* jump false */
+					b_off = b_imm(i + inst->jf + 1,
+						      ctx);
+					emit_bcond(MIPS_COND_NE, r_A, r_s0,
+						   b_off, ctx);
+					emit_nop(ctx);
+				} else { /* X */
+					/* jump true */
+					ctx->flags |= SEEN_A | SEEN_X;
+					b_off = b_imm(i + inst->jt + 1,
+						      ctx);
+					emit_bcond(MIPS_COND_EQ, r_A, r_X,
+						   b_off, ctx);
+					emit_nop(ctx);
+					/* jump false */
+					b_off = b_imm(i + inst->jf + 1, ctx);
+					emit_bcond(MIPS_COND_NE, r_A, r_X,
+						   b_off, ctx);
+					emit_nop(ctx);
+				}
+			}
+			break;
+		case BPF_S_JMP_JSET_K:
+			ctx->flags |= SEEN_S0 | SEEN_S1 | SEEN_A;
+			/* pc += (A & K) ? pc -> jt : pc -> jf */
+			emit_load_imm(r_s1, k, ctx);
+			emit_and(r_s0, r_A, r_s1, ctx);
+			/* jump true */
+			b_off = b_imm(i + inst->jt + 1, ctx);
+			emit_bcond(MIPS_COND_NE, r_s0, r_zero, b_off, ctx);
+			emit_nop(ctx);
+			/* jump false */
+			b_off = b_imm(i + inst->jf + 1, ctx);
+			emit_b(b_off, ctx);
+			emit_nop(ctx);
+			break;
+		case BPF_S_JMP_JSET_X:
+			ctx->flags |= SEEN_S0 | SEEN_X | SEEN_A;
+			/* pc += (A & X) ? pc -> jt : pc -> jf */
+			emit_and(r_s0, r_A, r_X, ctx);
+			/* jump true */
+			b_off = b_imm(i + inst->jt + 1, ctx);
+			emit_bcond(MIPS_COND_NE, r_s0, r_zero, b_off, ctx);
+			emit_nop(ctx);
+			/* jump false */
+			b_off = b_imm(i + inst->jf + 1, ctx);
+			emit_b(b_off, ctx);
+			emit_nop(ctx);
+			break;
+		case BPF_S_RET_A:
+			ctx->flags |= SEEN_A;
+			if (i != prog->len - 1)
+				/*
+				 * If this is not the last instruction
+				 * then jump to the epilogue
+				 */
+				emit_b(b_imm(prog->len, ctx), ctx);
+			emit_reg_move(r_ret, r_A, ctx); /* delay slot */
+			break;
+		case BPF_S_RET_K:
+			/*
+			 * It can emit two instructions so it does not fit on
+			 * the delay slot.
+			 */
+			emit_load_imm(r_ret, k, ctx);
+			if (i != prog->len - 1) {
+				/*
+				 * If this is not the last instruction
+				 * then jump to the epilogue
+				 */
+				emit_b(b_imm(prog->len, ctx), ctx);
+				emit_nop(ctx);
+			}
+			break;
+		case BPF_S_MISC_TAX:
+			/* X = A */
+			ctx->flags |= SEEN_X | SEEN_A;
+			emit_jit_reg_move(r_X, r_A, ctx);
+			break;
+		case BPF_S_MISC_TXA:
+			/* A = X */
+			ctx->flags |= SEEN_A | SEEN_X;
+			update_on_xread(ctx);
+			emit_jit_reg_move(r_A, r_X, ctx);
+			break;
+		/* AUX */
+		case BPF_S_ANC_PROTOCOL:
+			/* A = ntohs(skb->protocol */
+			ctx->flags |= SEEN_SKB | SEEN_OFF | SEEN_A;
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff,
+						  protocol) != 2);
+			off = offsetof(struct sk_buff, protocol);
+			emit_half_load(r_A, r_skb, off, ctx);
+#ifdef CONFIG_CPU_LITTLE_ENDIAN
+			/* This needs little endian fixup */
+			if (config_enabled(CPU_MIPSR1)) {
+				/* Get first byte */
+				emit_andi(r_tmp_imm, r_A, 0xff, ctx);
+				/* Shift it */
+				emit_sll(r_tmp, r_tmp_imm, 8, ctx);
+				/* Get second byte */
+				emit_srl(r_tmp_imm, r_A, 8, ctx);
+				emit_andi(r_tmp_imm, r_tmp_imm, 0xff, ctx);
+				/* Put everyting together in r_A */
+				emit_or(r_A, r_tmp, r_tmp_imm, ctx);
+			} else {
+				/* R2 and later have the wsbh instruction */
+				emit_wsbh(r_A, r_A, ctx);
+			}
+#endif
+			break;
+		case BPF_S_ANC_CPU:
+			ctx->flags |= SEEN_A | SEEN_OFF;
+			/* A = current_thread_info()->cpu */
+			BUILD_BUG_ON(FIELD_SIZEOF(struct thread_info,
+						  cpu) != 4);
+			off = offsetof(struct thread_info, cpu);
+			/* $28/gp points to the thread_info struct */
+			emit_load(r_A, 28, off, ctx);
+			break;
+		case BPF_S_ANC_IFINDEX:
+			/* A = skb->dev->ifindex */
+			ctx->flags |= SEEN_SKB | SEEN_A | SEEN_S0;
+			off = offsetof(struct sk_buff, dev);
+			emit_load(r_s0, r_skb, off, ctx);
+			/* error (0) in the delay slot */
+			emit_bcond(MIPS_COND_EQ, r_s0, r_zero,
+				   b_imm(prog->len, ctx), ctx);
+			emit_reg_move(r_ret, r_zero, ctx);
+			BUILD_BUG_ON(FIELD_SIZEOF(struct net_device,
+						  ifindex) != 4);
+			off = offsetof(struct net_device, ifindex);
+			emit_load(r_A, r_s0, off, ctx);
+			break;
+		case BPF_S_ANC_MARK:
+			ctx->flags |= SEEN_SKB | SEEN_A;
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, mark) != 4);
+			off = offsetof(struct sk_buff, mark);
+			emit_load(r_A, r_skb, off, ctx);
+			break;
+		case BPF_S_ANC_RXHASH:
+			ctx->flags |= SEEN_SKB | SEEN_A;
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, hash) != 4);
+			off = offsetof(struct sk_buff, hash);
+			emit_load(r_A, r_skb, off, ctx);
+			break;
+		case BPF_S_ANC_VLAN_TAG_PRESENT:
+			ctx->flags |= SEEN_SKB | SEEN_S0 | SEEN_A;
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff,
+						  vlan_tci) != 2);
+			off = offsetof(struct sk_buff, vlan_tci);
+			emit_half_load(r_s0, r_skb, off, ctx);
+			if (inst->code == BPF_S_ANC_VLAN_TAG)
+				emit_and(r_A, r_s0, VLAN_VID_MASK, ctx);
+			else
+				emit_and(r_A, r_s0, VLAN_TAG_PRESENT, ctx);
+			break;
+		case BPF_S_ANC_QUEUE:
+			ctx->flags |= SEEN_SKB | SEEN_A;
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff,
+						  queue_mapping) != 2);
+			BUILD_BUG_ON(offsetof(struct sk_buff,
+					      queue_mapping) > 0xff);
+			off = offsetof(struct sk_buff, queue_mapping);
+			emit_half_load(r_A, r_skb, off, ctx);
+			break;
+		default:
+			pr_warn("%s: Unhandled opcode: 0x%02x\n", __FILE__,
+				inst->code);
+			return -1;
+		}
+	}
+
+	/* compute offsets only during the first pass */
+	if (ctx->target == NULL)
+		ctx->offsets[i] = ctx->idx * 4;
+
+	return 0;
+
+}
+
+int bpf_jit_enable __read_mostly;
+
+void bpf_jit_compile(struct sk_filter *fp)
+{
+	struct jit_ctx ctx;
+	unsigned int alloc_size, tmp_idx;
+
+	if (!bpf_jit_enable)
+		return;
+
+	memset(&ctx, 0, sizeof(ctx));
+
+	ctx.offsets = kcalloc(fp->len, sizeof(*ctx.offsets), GFP_KERNEL);
+	if (ctx.offsets == NULL)
+		return;
+
+	ctx.skf = fp;
+
+	if (unlikely(build_body(&ctx)))
+		goto out;
+
+	tmp_idx = ctx.idx;
+	build_prologue(&ctx);
+	ctx.prologue_bytes = (ctx.idx - tmp_idx) * 4;
+	/* just to complete the ctx.idx count */
+	build_epilogue(&ctx);
+
+	alloc_size = 4 * ctx.idx;
+	ctx.target = module_alloc(alloc_size);
+	if (ctx.target == NULL)
+		goto out;
+
+	/* Clean it */
+	memset(ctx.target, 0, alloc_size);
+
+	ctx.idx = 0;
+
+	/* Generate the actual JIT code */
+	build_prologue(&ctx);
+	build_body(&ctx);
+	build_epilogue(&ctx);
+
+	/* Update the icache */
+	flush_icache_range((ptr)ctx.target, (ptr)(ctx.target + ctx.idx));
+
+	if (bpf_jit_enable > 1)
+		/* Dump JIT code */
+		bpf_jit_dump(fp->len, alloc_size, 2, ctx.target);
+
+	fp->bpf_func = (void *)ctx.target;
+	fp->jited = 1;
+
+out:
+	kfree(ctx.offsets);
+}
+
+void bpf_jit_free(struct sk_filter *fp)
+{
+	if (fp->jited)
+		module_free(NULL, fp->bpf_func);
+	kfree(fp);
+}
diff --git a/arch/mips/net/bpf_jit.h b/arch/mips/net/bpf_jit.h
new file mode 100644
index 0000000..3a5751b
--- /dev/null
+++ b/arch/mips/net/bpf_jit.h
@@ -0,0 +1,44 @@
+/*
+ * Just-In-Time compiler for BPF filters on MIPS
+ *
+ * Copyright (c) 2014 Imagination Technologies Ltd.
+ * Author: Markos Chandras <markos.chandras@imgtec.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; version 2 of the License.
+ */
+
+#ifndef BPF_JIT_MIPS_OP_H
+#define BPF_JIT_MIPS_OP_H
+
+/* Registers used by JIT */
+#define MIPS_R_ZERO	0
+#define MIPS_R_V0	2
+#define MIPS_R_V1	3
+#define MIPS_R_A0	4
+#define MIPS_R_A1	5
+#define MIPS_R_T6	14
+#define MIPS_R_T7	15
+#define MIPS_R_S0	16
+#define MIPS_R_S1	17
+#define MIPS_R_S2	18
+#define MIPS_R_S3	19
+#define MIPS_R_S4	20
+#define MIPS_R_S5	21
+#define MIPS_R_S6	22
+#define MIPS_R_S7	23
+#define MIPS_R_SP	29
+#define MIPS_R_RA	31
+
+/* Conditional codes */
+#define MIPS_COND_EQ	0x1
+#define MIPS_COND_GE	(0x1 << 1)
+#define MIPS_COND_GT	(0x1 << 2)
+#define MIPS_COND_NE	(0x1 << 3)
+#define MIPS_COND_ALL	(0x1 << 4)
+/* Conditionals on X register or K immediate */
+#define MIPS_COND_X	(0x1 << 5)
+#define MIPS_COND_K	(0x1 << 6)
+
+#endif /* BPF_JIT_MIPS_OP_H */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v2 14/14] MIPS: Enable the BPF_JIT symbol for MIPS
@ 2014-04-09 16:02     ` Markos Chandras
  0 siblings, 0 replies; 53+ messages in thread
From: Markos Chandras @ 2014-04-09 16:02 UTC (permalink / raw)
  To: linux-mips; +Cc: Markos Chandras

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
---
Changes since v1:
- Drop the MIPS32/64R2 dependency
---
 arch/mips/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 885cbf2..ca24cfc0 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -12,6 +12,7 @@ config MIPS
 	select HAVE_ARCH_KGDB
 	select HAVE_ARCH_SECCOMP_FILTER
 	select HAVE_ARCH_TRACEHOOK
+	select HAVE_BPF_JIT if !CPU_MICROMIPS
 	select ARCH_HAVE_CUSTOM_GPIO_H
 	select HAVE_FUNCTION_TRACER
 	select HAVE_FUNCTION_TRACE_MCOUNT_TEST
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v2 14/14] MIPS: Enable the BPF_JIT symbol for MIPS
@ 2014-04-09 16:02     ` Markos Chandras
  0 siblings, 0 replies; 53+ messages in thread
From: Markos Chandras @ 2014-04-09 16:02 UTC (permalink / raw)
  To: linux-mips; +Cc: Markos Chandras

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
---
Changes since v1:
- Drop the MIPS32/64R2 dependency
---
 arch/mips/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 885cbf2..ca24cfc0 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -12,6 +12,7 @@ config MIPS
 	select HAVE_ARCH_KGDB
 	select HAVE_ARCH_SECCOMP_FILTER
 	select HAVE_ARCH_TRACEHOOK
+	select HAVE_BPF_JIT if !CPU_MICROMIPS
 	select ARCH_HAVE_CUSTOM_GPIO_H
 	select HAVE_FUNCTION_TRACER
 	select HAVE_FUNCTION_TRACE_MCOUNT_TEST
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH v2 13/14] MIPS: net: Add BPF JIT
  2014-04-09 16:00     ` Markos Chandras
  (?)
@ 2014-04-09 22:28     ` Jonas Gorski
  2014-04-10  7:46         ` Markos Chandras
  -1 siblings, 1 reply; 53+ messages in thread
From: Jonas Gorski @ 2014-04-09 22:28 UTC (permalink / raw)
  To: Markos Chandras; +Cc: MIPS Mailing List, David S. Miller, netdev

On Wed, Apr 9, 2014 at 6:00 PM, Markos Chandras
<markos.chandras@imgtec.com> wrote:
> This adds initial support for BPF-JIT on MIPS
>
> Tested on mips32 LE/BE and mips64 BE/n64 using
> dhcp, ping and various tcpdump filters.
>
> Benchmarking:
>
> Assuming the remote MIPS target uses 192.168.154.181
> as its IP address, and the local host uses 192.168.154.136,
> the following results can be obtained using the following
> tcpdump filter (catches no frames) and a simple
> 'time ping -f -c 1000000' command.
>
> [root@(none) ~]# tcpdump -p -n -s 0 -i eth0 net 10.0.0.0/24 -d
> (000) ldh      [12]
> (001) jeq      #0x800           jt 2    jf 8
> (002) ld       [26]
> (003) and      #0xffffff00
> (004) jeq      #0xa000000       jt 16   jf 5
> (005) ld       [30]
> (006) and      #0xffffff00
> (007) jeq      #0xa000000       jt 16   jf 17
> (008) jeq      #0x806           jt 10   jf 9
> (009) jeq      #0x8035          jt 10   jf 17
> (010) ld       [28]
> (011) and      #0xffffff00
> (012) jeq      #0xa000000       jt 16   jf 13
> (013) ld       [38]
> (014) and      #0xffffff00
> (015) jeq      #0xa000000       jt 16   jf 17
> (016) ret      #65535
>
> - BPF-JIT Disabled
>
> real    1m38.005s
> user    0m1.510s
> sys     0m6.710s
>
> - BPF-JIT Enabled
>
> real    1m35.215s
> user    0m1.200s
> sys     0m4.140s
>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: netdev@vger.kernel.org
> Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
> ---
> Changes since v1:
>
> - Add 'wsbh' implementation for MIPS32/64R1
> - Rename 'rxhash' to 'hash' because of
> 61b905da33ae25edb6b9d2a5de21e34c3a77efe3
> - Initialize 'fp->jited' because of
> f8bbbfc3b97f4c7a6c7c23185e520b22bfc3a21d
> ---

(snip)

> diff --git a/arch/mips/net/bpf_jit.c b/arch/mips/net/bpf_jit.c
> new file mode 100644
> index 0000000..a12f843
> --- /dev/null
> +++ b/arch/mips/net/bpf_jit.c

(snip)

> +#ifdef CONFIG_CPU_LITTLE_ENDIAN
> +                       /* This needs little endian fixup */
> +                       if (config_enabled(CPU_MIPSR1)) {

Hm, looking at arch/mips/Kconfig, this will falsely identify CPU_BMIPS
as R2 as it does not select CPU_MIPSR1.

How about "if (cpu_has_mips_r1)"? There are some targets that have
both R1 and R2 cores (e.g. bcm47xx), and even if we built the kernel
for R1, it does not prevent us from emitting R2 instructions because
we do this at runtime when we can check for it.

> +                               /* Get first byte */
> +                               emit_andi(r_tmp_imm, r_A, 0xff, ctx);
> +                               /* Shift it */
> +                               emit_sll(r_tmp, r_tmp_imm, 8, ctx);
> +                               /* Get second byte */
> +                               emit_srl(r_tmp_imm, r_A, 8, ctx);
> +                               emit_andi(r_tmp_imm, r_tmp_imm, 0xff, ctx);
> +                               /* Put everyting together in r_A */
> +                               emit_or(r_A, r_tmp, r_tmp_imm, ctx);
> +                       } else {
> +                               /* R2 and later have the wsbh instruction */
> +                               emit_wsbh(r_A, r_A, ctx);
> +                       }
> +#endif


Regards
Jonas

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v2 13/14] MIPS: net: Add BPF JIT
  2014-04-09 16:00     ` Markos Chandras
  (?)
  (?)
@ 2014-04-10  3:40     ` Alexei Starovoitov
  2014-04-10  8:06         ` Markos Chandras
  -1 siblings, 1 reply; 53+ messages in thread
From: Alexei Starovoitov @ 2014-04-10  3:40 UTC (permalink / raw)
  To: Markos Chandras; +Cc: linux-mips, David S. Miller, netdev, Daniel Borkmann

On Wed, Apr 9, 2014 at 9:00 AM, Markos Chandras
<markos.chandras@imgtec.com> wrote:
> This adds initial support for BPF-JIT on MIPS

Great work!

btw, net-next is closed and we're waiting to submit classic+internal
BPF testsuite
that would have helped in testing and benchmarking.

> Benchmarking:
> - BPF-JIT Disabled
> real    1m38.005s
> - BPF-JIT Enabled
> real    1m35.215s

it's hard to see the difference in a such setup.
In bpf only tests we see 4-20x speedup from jit.
I think mips arch should see something similar.

Few questions:
- why did you implement only this small set of bpf extensions?
  was there a use case or they were easier comparing to others?

- this patch set depends on 12 other patches.
  would be easier to review if the whole set is cc-ed to netdev

- did you consider doing jit over internal bpf?
  all bpf extensions support would have come for free and it would work
  for seccomp and tracing filters in the future.

Few comments:

> +#define RSIZE  (sizeof(unsigned long))
> +#define ptr typeof(unsigned long)

these are odd looking macros.

> +static inline void emit_bcond(int cond, unsigned int reg1, unsigned int reg2,
> +                            unsigned int imm, struct jit_ctx *ctx)
> +{
> +       if (ctx->target != NULL) {
> +               u32 *p = &ctx->target[ctx->idx];
> +
> +               switch (cond) {
> +               case MIPS_COND_EQ:
> +                       uasm_i_beq(&p, reg1, reg2, imm);
> +                       break;
> +               case MIPS_COND_NE:
> +                       uasm_i_bne(&p, reg1, reg2, imm);
> +                       break;
> +               case MIPS_COND_ALL:
> +                       uasm_i_b(&p, imm);
> +                       break;
> +               default:
> +                       pr_warn("%s: Unhandled branch conditional: %d\n",
> +                               __func__, cond);

shouldn't it be BUG_ON instead?
can it spam kernel logs?

> +static bool is_load_to_a(u16 inst)
> +{
> +       switch (inst) {
> +       case BPF_S_LD_W_LEN:
> +       case BPF_S_LD_W_ABS:
> +       case BPF_S_LD_H_ABS:
> +       case BPF_S_LD_B_ABS:
> +       case BPF_S_ANC_CPU:
> +       case BPF_S_ANC_IFINDEX:
> +       case BPF_S_ANC_MARK:
> +       case BPF_S_ANC_PROTOCOL:
> +       case BPF_S_ANC_RXHASH:
> +       case BPF_S_ANC_VLAN_TAG:
> +       case BPF_S_ANC_VLAN_TAG_PRESENT:
> +       case BPF_S_ANC_QUEUE:

it seems this switch() statement handles more extensions
that actually jitted later. Future proofing?

> +static void save_bpf_jit_regs(struct jit_ctx *ctx, unsigned offset)
> +{
> +       int i = 0, real_off = 0;
> +       u32 sflags, tmp_flags;
> +
> +       /* Adjust the stack pointer */
> +       emit_stack_offset(-align_sp(offset), ctx);
> +
> +       if (ctx->flags & SEEN_CALL) {
> +               /* Argument save area */
> +               if (config_enabled(CONFIG_64BIT))
> +                       /* Bottom of current frame */
> +                       real_off = align_sp(offset) - RSIZE;
> +               else
> +                       /* Top of previous frame */
> +                       real_off = align_sp(offset) + RSIZE;
> +               emit_store_stack_reg(MIPS_R_A0, r_sp, real_off, ctx);
> +               emit_store_stack_reg(MIPS_R_A1, r_sp, real_off + RSIZE, ctx);
> +
> +               real_off = 0;
> +       }
> +
> +       tmp_flags = sflags = ctx->flags >> SEEN_SREG_SFT;
> +       /* sflags is essentially a bitmap */
> +       pr_debug("%s: register flags: 0x%08x\n", __func__, tmp_flags);

that will spam logs. please remove.

> +       while (tmp_flags) {
> +               if ((sflags >> i) & 0x1) {
> +                       pr_debug("%s: preserving register %d\n", __func__,
> +                                MIPS_R_S0 + i);

likewise.

> +static void restore_bpf_jit_regs(struct jit_ctx *ctx,
> +                                unsigned int offset)
> +{
> +       int i, real_off = 0;
> +       u32 sflags, tmp_flags;
> +
> +       if (ctx->flags & SEEN_CALL) {
> +               if (config_enabled(CONFIG_64BIT))
> +                       /* Bottom of current frame */
> +                       real_off = align_sp(offset) - RSIZE;
> +               else
> +                       /* Top of previous frame */
> +                       real_off = align_sp(offset) + RSIZE;
> +               emit_load_stack_reg(MIPS_R_A0, r_sp, real_off, ctx);
> +               emit_load_stack_reg(MIPS_R_A1, r_sp, real_off + RSIZE, ctx);
> +
> +               real_off = 0;
> +       }
> +
> +       tmp_flags = sflags = ctx->flags >> SEEN_SREG_SFT;
> +       /* sflags is a bitmap */
> +       pr_debug("%s: register flags: 0x%08x\n", __func__, tmp_flags);

likewise. please remove.

> +       i = 0;
> +       while (tmp_flags) {
> +               if ((sflags >> i) & 0x1) {
> +                       pr_debug("%s: restoring register %d\n", __func__,
> +                                MIPS_R_S0 + i);

likewise.

> +static u64 jit_get_skb_b(struct sk_buff *skb, unsigned offset)
> +{
> +       u8 ret;
> +       int err;
> +
> +       err = skb_copy_bits(skb, offset, &ret, 1);
> +
> +       return (u64)err << 32 | ret;
> +}

negative offsets are not supported intentionally?

> +static int build_body(struct jit_ctx *ctx)
> +{
> +       void *load_func[] = {jit_get_skb_b, jit_get_skb_h, jit_get_skb_w};
> +       const struct sk_filter *prog = ctx->skf;
> +       const struct sock_filter *inst;
> +       unsigned int i, off, load_order, condt;
> +       u32 k, b_off __maybe_unused;
> +
> +       for (i = 0; i < prog->len; i++) {
> +               inst = &(prog->insns[i]);
> +               pr_debug("%s: code->0x%02x, jt->0x%x, jf->0x%x, k->0x%x\n",
> +                        __func__, inst->code, inst->jt, inst->jf, inst->k);

please remove.

> +load_ind:
> +                       update_on_xread(ctx);
> +                       ctx->flags |= SEEN_OFF | SEEN_X;
> +                       emit_addiu(r_off, r_X, k, ctx);
> +               goto load_common;

needs extra tab of formatting

> +               case BPF_S_ANC_VLAN_TAG_PRESENT:
> +                       ctx->flags |= SEEN_SKB | SEEN_S0 | SEEN_A;
> +                       BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff,
> +                                                 vlan_tci) != 2);
> +                       off = offsetof(struct sk_buff, vlan_tci);
> +                       emit_half_load(r_s0, r_skb, off, ctx);
> +                       if (inst->code == BPF_S_ANC_VLAN_TAG)

this branch can never be hit. Did you miss 'case VLAN_TAG' few lines above?

> +                               emit_and(r_A, r_s0, VLAN_VID_MASK, ctx);
> +                       else
> +                               emit_and(r_A, r_s0, VLAN_TAG_PRESENT, ctx);
> +                       break;
> +               case BPF_S_ANC_QUEUE:
> +                       ctx->flags |= SEEN_SKB | SEEN_A;
> +                       BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff,
> +                                                 queue_mapping) != 2);
> +                       BUILD_BUG_ON(offsetof(struct sk_buff,
> +                                             queue_mapping) > 0xff);
> +                       off = offsetof(struct sk_buff, queue_mapping);
> +                       emit_half_load(r_A, r_skb, off, ctx);
> +                       break;
> +               default:
> +                       pr_warn("%s: Unhandled opcode: 0x%02x\n", __FILE__,
> +                               inst->code);

that will spam the logs. please remove.

> +                       return -1;
> +               }
> +       }
> +
> +       /* compute offsets only during the first pass */
> +       if (ctx->target == NULL)
> +               ctx->offsets[i] = ctx->idx * 4;
> +
> +       return 0;
> +
> +}
> +
> +int bpf_jit_enable __read_mostly;
> +
> +void bpf_jit_compile(struct sk_filter *fp)
> +{
> +       struct jit_ctx ctx;
> +       unsigned int alloc_size, tmp_idx;
> +
> +       if (!bpf_jit_enable)
> +               return;
> +
> +       memset(&ctx, 0, sizeof(ctx));
> +
> +       ctx.offsets = kcalloc(fp->len, sizeof(*ctx.offsets), GFP_KERNEL);
> +       if (ctx.offsets == NULL)
> +               return;
> +
> +       ctx.skf = fp;
> +
> +       if (unlikely(build_body(&ctx)))

why 'unlikely'? this jit doesn't support all extensions
so it may very well be 'likely' for some use cases.

> +               goto out;
> +
> +       tmp_idx = ctx.idx;
> +       build_prologue(&ctx);
> +       ctx.prologue_bytes = (ctx.idx - tmp_idx) * 4;
> +       /* just to complete the ctx.idx count */
> +       build_epilogue(&ctx);
> +
> +       alloc_size = 4 * ctx.idx;
> +       ctx.target = module_alloc(alloc_size);
> +       if (ctx.target == NULL)
> +               goto out;
> +
> +       /* Clean it */
> +       memset(ctx.target, 0, alloc_size);
> +
> +       ctx.idx = 0;
> +
> +       /* Generate the actual JIT code */
> +       build_prologue(&ctx);
> +       build_body(&ctx);

do you want to add BUG_ON to make sure that 2nd build_body() succeeds?

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v2 13/14] MIPS: net: Add BPF JIT
@ 2014-04-10  7:46         ` Markos Chandras
  0 siblings, 0 replies; 53+ messages in thread
From: Markos Chandras @ 2014-04-10  7:46 UTC (permalink / raw)
  To: Jonas Gorski; +Cc: MIPS Mailing List, David S. Miller, netdev

On 04/09/2014 11:28 PM, Jonas Gorski wrote:
>
>> +#ifdef CONFIG_CPU_LITTLE_ENDIAN
>> +                       /* This needs little endian fixup */
>> +                       if (config_enabled(CPU_MIPSR1)) {
>
> Hm, looking at arch/mips/Kconfig, this will falsely identify CPU_BMIPS
> as R2 as it does not select CPU_MIPSR1.
>
> How about "if (cpu_has_mips_r1)"? There are some targets that have
> both R1 and R2 cores (e.g. bcm47xx), and even if we built the kernel
> for R1, it does not prevent us from emitting R2 instructions because
> we do this at runtime when we can check for it.

Sounds about right. Let me have a look


-- 
markos

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v2 13/14] MIPS: net: Add BPF JIT
@ 2014-04-10  7:46         ` Markos Chandras
  0 siblings, 0 replies; 53+ messages in thread
From: Markos Chandras @ 2014-04-10  7:46 UTC (permalink / raw)
  To: Jonas Gorski; +Cc: MIPS Mailing List, David S. Miller, netdev

On 04/09/2014 11:28 PM, Jonas Gorski wrote:
>
>> +#ifdef CONFIG_CPU_LITTLE_ENDIAN
>> +                       /* This needs little endian fixup */
>> +                       if (config_enabled(CPU_MIPSR1)) {
>
> Hm, looking at arch/mips/Kconfig, this will falsely identify CPU_BMIPS
> as R2 as it does not select CPU_MIPSR1.
>
> How about "if (cpu_has_mips_r1)"? There are some targets that have
> both R1 and R2 cores (e.g. bcm47xx), and even if we built the kernel
> for R1, it does not prevent us from emitting R2 instructions because
> we do this at runtime when we can check for it.

Sounds about right. Let me have a look


-- 
markos

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v2 13/14] MIPS: net: Add BPF JIT
@ 2014-04-10  8:06         ` Markos Chandras
  0 siblings, 0 replies; 53+ messages in thread
From: Markos Chandras @ 2014-04-10  8:06 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: linux-mips, David S. Miller, netdev, Daniel Borkmann

Hi Alexei,

On 04/10/2014 04:40 AM, Alexei Starovoitov wrote:
> On Wed, Apr 9, 2014 at 9:00 AM, Markos Chandras
> <markos.chandras@imgtec.com> wrote:
>> This adds initial support for BPF-JIT on MIPS
>
> Great work!
>
> btw, net-next is closed and we're waiting to submit classic+internal
> BPF testsuite
> that would have helped in testing and benchmarking.

That would be very useful thanks.

>
>> Benchmarking:
>> - BPF-JIT Disabled
>> real    1m38.005s
>> - BPF-JIT Enabled
>> real    1m35.215s
>
> it's hard to see the difference in a such setup.
> In bpf only tests we see 4-20x speedup from jit.
> I think mips arch should see something similar.
>
> Few questions:
> - why did you implement only this small set of bpf extensions?
>    was there a use case or they were easier comparing to others?

I assume you are referring to the BPF_S_ANC_* opcodes? I may have 
overlooked something. I just compared that to ARM and it seems i 
implemented the same extensions, but it seems I lack a few compared to x86.

>
> - this patch set depends on 12 other patches.
>    would be easier to review if the whole set is cc-ed to netdev

The rest of the patches add uasm instructions so they are not netdev@ 
related. An example of such patch is here.
http://patchwork.linux-mips.org/patch/6725/

>
> - did you consider doing jit over internal bpf?
>    all bpf extensions support would have come for free and it would work
>    for seccomp and tracing filters in the future.
>
Is this the recommended way? (could you point me to some info about 
internal bpf+jit). I pretty much did that other architectures are doing 
at the moment.

> Few comments:
>
>> +#define RSIZE  (sizeof(unsigned long))
>> +#define ptr typeof(unsigned long)
>
> these are odd looking macros.
>
Indeed but the code is aimed to run in 32 and 64-bit processors so i 
needed some kind of abstraction. I open to suggestions.

>> +static inline void emit_bcond(int cond, unsigned int reg1, unsigned int reg2,
>> +                            unsigned int imm, struct jit_ctx *ctx)
>> +{
>> +       if (ctx->target != NULL) {
>> +               u32 *p = &ctx->target[ctx->idx];
>> +
>> +               switch (cond) {
>> +               case MIPS_COND_EQ:
>> +                       uasm_i_beq(&p, reg1, reg2, imm);
>> +                       break;
>> +               case MIPS_COND_NE:
>> +                       uasm_i_bne(&p, reg1, reg2, imm);
>> +                       break;
>> +               case MIPS_COND_ALL:
>> +                       uasm_i_b(&p, imm);
>> +                       break;
>> +               default:
>> +                       pr_warn("%s: Unhandled branch conditional: %d\n",
>> +                               __func__, cond);
>
> shouldn't it be BUG_ON instead?
> can it spam kernel logs?

BUG_ON() can be disabled (CONFIG_BUG) so spamming the log was 
intentional to make sure this will not go unnoticed.

>
>> +static bool is_load_to_a(u16 inst)
>> +{
>> +       switch (inst) {
>> +       case BPF_S_LD_W_LEN:
>> +       case BPF_S_LD_W_ABS:
>> +       case BPF_S_LD_H_ABS:
>> +       case BPF_S_LD_B_ABS:
>> +       case BPF_S_ANC_CPU:
>> +       case BPF_S_ANC_IFINDEX:
>> +       case BPF_S_ANC_MARK:
>> +       case BPF_S_ANC_PROTOCOL:
>> +       case BPF_S_ANC_RXHASH:
>> +       case BPF_S_ANC_VLAN_TAG:
>> +       case BPF_S_ANC_VLAN_TAG_PRESENT:
>> +       case BPF_S_ANC_QUEUE:
>
> it seems this switch() statement handles more extensions
> that actually jitted later. Future proofing?

It's likely i overlooked something again. I will double check.

>
>> +static void save_bpf_jit_regs(struct jit_ctx *ctx, unsigned offset)
>> +{
>> +       int i = 0, real_off = 0;
>> +       u32 sflags, tmp_flags;
>> +
>> +       /* Adjust the stack pointer */
>> +       emit_stack_offset(-align_sp(offset), ctx);
>> +
>> +       if (ctx->flags & SEEN_CALL) {
>> +               /* Argument save area */
>> +               if (config_enabled(CONFIG_64BIT))
>> +                       /* Bottom of current frame */
>> +                       real_off = align_sp(offset) - RSIZE;
>> +               else
>> +                       /* Top of previous frame */
>> +                       real_off = align_sp(offset) + RSIZE;
>> +               emit_store_stack_reg(MIPS_R_A0, r_sp, real_off, ctx);
>> +               emit_store_stack_reg(MIPS_R_A1, r_sp, real_off + RSIZE, ctx);
>> +
>> +               real_off = 0;
>> +       }
>> +
>> +       tmp_flags = sflags = ctx->flags >> SEEN_SREG_SFT;
>> +       /* sflags is essentially a bitmap */
>> +       pr_debug("%s: register flags: 0x%08x\n", __func__, tmp_flags);
>
> that will spam logs. please remove.

This will only spam the logs if you build with -DDEBUG. This is an 
unlikely build situation so spamming the logs is desired because if you 
added -DDEBUG yourself, you need to see as many details as you want 
(same for the rest of pr_debug() calls)


>
>> +static u64 jit_get_skb_b(struct sk_buff *skb, unsigned offset)
>> +{
>> +       u8 ret;
>> +       int err;
>> +
>> +       err = skb_copy_bits(skb, offset, &ret, 1);
>> +
>> +       return (u64)err << 32 | ret;
>> +}
>
> negative offsets are not supported intentionally?
I am sorry. I don't understand what you mean (and this code is identical 
to ARM)

>> +load_ind:
>> +                       update_on_xread(ctx);
>> +                       ctx->flags |= SEEN_OFF | SEEN_X;
>> +                       emit_addiu(r_off, r_X, k, ctx);
>> +               goto load_common;
>
> needs extra tab of formatting
Thanks. I will fix it.

>
>> +               case BPF_S_ANC_VLAN_TAG_PRESENT:
>> +                       ctx->flags |= SEEN_SKB | SEEN_S0 | SEEN_A;
>> +                       BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff,
>> +                                                 vlan_tci) != 2);
>> +                       off = offsetof(struct sk_buff, vlan_tci);
>> +                       emit_half_load(r_s0, r_skb, off, ctx);
>> +                       if (inst->code == BPF_S_ANC_VLAN_TAG)
>
> this branch can never be hit. Did you miss 'case VLAN_TAG' few lines above?

Indeed I did...


>> +
>> +       ctx.skf = fp;
>> +
>> +       if (unlikely(build_body(&ctx)))
>
> why 'unlikely'? this jit doesn't support all extensions
> so it may very well be 'likely' for some use cases.

I will remove it once I add the missing extensions.

>
>> +               goto out;
>> +
>> +       tmp_idx = ctx.idx;
>> +       build_prologue(&ctx);
>> +       ctx.prologue_bytes = (ctx.idx - tmp_idx) * 4;
>> +       /* just to complete the ctx.idx count */
>> +       build_epilogue(&ctx);
>> +
>> +       alloc_size = 4 * ctx.idx;
>> +       ctx.target = module_alloc(alloc_size);
>> +       if (ctx.target == NULL)
>> +               goto out;
>> +
>> +       /* Clean it */
>> +       memset(ctx.target, 0, alloc_size);
>> +
>> +       ctx.idx = 0;
>> +
>> +       /* Generate the actual JIT code */
>> +       build_prologue(&ctx);
>> +       build_body(&ctx);
>
> do you want to add BUG_ON to make sure that 2nd build_body() succeeds?
If the first one managed to succeed why would the second one fail?

Thanks for the review!

-- 
markos

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v2 13/14] MIPS: net: Add BPF JIT
@ 2014-04-10  8:06         ` Markos Chandras
  0 siblings, 0 replies; 53+ messages in thread
From: Markos Chandras @ 2014-04-10  8:06 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: linux-mips, David S. Miller, netdev, Daniel Borkmann

Hi Alexei,

On 04/10/2014 04:40 AM, Alexei Starovoitov wrote:
> On Wed, Apr 9, 2014 at 9:00 AM, Markos Chandras
> <markos.chandras@imgtec.com> wrote:
>> This adds initial support for BPF-JIT on MIPS
>
> Great work!
>
> btw, net-next is closed and we're waiting to submit classic+internal
> BPF testsuite
> that would have helped in testing and benchmarking.

That would be very useful thanks.

>
>> Benchmarking:
>> - BPF-JIT Disabled
>> real    1m38.005s
>> - BPF-JIT Enabled
>> real    1m35.215s
>
> it's hard to see the difference in a such setup.
> In bpf only tests we see 4-20x speedup from jit.
> I think mips arch should see something similar.
>
> Few questions:
> - why did you implement only this small set of bpf extensions?
>    was there a use case or they were easier comparing to others?

I assume you are referring to the BPF_S_ANC_* opcodes? I may have 
overlooked something. I just compared that to ARM and it seems i 
implemented the same extensions, but it seems I lack a few compared to x86.

>
> - this patch set depends on 12 other patches.
>    would be easier to review if the whole set is cc-ed to netdev

The rest of the patches add uasm instructions so they are not netdev@ 
related. An example of such patch is here.
http://patchwork.linux-mips.org/patch/6725/

>
> - did you consider doing jit over internal bpf?
>    all bpf extensions support would have come for free and it would work
>    for seccomp and tracing filters in the future.
>
Is this the recommended way? (could you point me to some info about 
internal bpf+jit). I pretty much did that other architectures are doing 
at the moment.

> Few comments:
>
>> +#define RSIZE  (sizeof(unsigned long))
>> +#define ptr typeof(unsigned long)
>
> these are odd looking macros.
>
Indeed but the code is aimed to run in 32 and 64-bit processors so i 
needed some kind of abstraction. I open to suggestions.

>> +static inline void emit_bcond(int cond, unsigned int reg1, unsigned int reg2,
>> +                            unsigned int imm, struct jit_ctx *ctx)
>> +{
>> +       if (ctx->target != NULL) {
>> +               u32 *p = &ctx->target[ctx->idx];
>> +
>> +               switch (cond) {
>> +               case MIPS_COND_EQ:
>> +                       uasm_i_beq(&p, reg1, reg2, imm);
>> +                       break;
>> +               case MIPS_COND_NE:
>> +                       uasm_i_bne(&p, reg1, reg2, imm);
>> +                       break;
>> +               case MIPS_COND_ALL:
>> +                       uasm_i_b(&p, imm);
>> +                       break;
>> +               default:
>> +                       pr_warn("%s: Unhandled branch conditional: %d\n",
>> +                               __func__, cond);
>
> shouldn't it be BUG_ON instead?
> can it spam kernel logs?

BUG_ON() can be disabled (CONFIG_BUG) so spamming the log was 
intentional to make sure this will not go unnoticed.

>
>> +static bool is_load_to_a(u16 inst)
>> +{
>> +       switch (inst) {
>> +       case BPF_S_LD_W_LEN:
>> +       case BPF_S_LD_W_ABS:
>> +       case BPF_S_LD_H_ABS:
>> +       case BPF_S_LD_B_ABS:
>> +       case BPF_S_ANC_CPU:
>> +       case BPF_S_ANC_IFINDEX:
>> +       case BPF_S_ANC_MARK:
>> +       case BPF_S_ANC_PROTOCOL:
>> +       case BPF_S_ANC_RXHASH:
>> +       case BPF_S_ANC_VLAN_TAG:
>> +       case BPF_S_ANC_VLAN_TAG_PRESENT:
>> +       case BPF_S_ANC_QUEUE:
>
> it seems this switch() statement handles more extensions
> that actually jitted later. Future proofing?

It's likely i overlooked something again. I will double check.

>
>> +static void save_bpf_jit_regs(struct jit_ctx *ctx, unsigned offset)
>> +{
>> +       int i = 0, real_off = 0;
>> +       u32 sflags, tmp_flags;
>> +
>> +       /* Adjust the stack pointer */
>> +       emit_stack_offset(-align_sp(offset), ctx);
>> +
>> +       if (ctx->flags & SEEN_CALL) {
>> +               /* Argument save area */
>> +               if (config_enabled(CONFIG_64BIT))
>> +                       /* Bottom of current frame */
>> +                       real_off = align_sp(offset) - RSIZE;
>> +               else
>> +                       /* Top of previous frame */
>> +                       real_off = align_sp(offset) + RSIZE;
>> +               emit_store_stack_reg(MIPS_R_A0, r_sp, real_off, ctx);
>> +               emit_store_stack_reg(MIPS_R_A1, r_sp, real_off + RSIZE, ctx);
>> +
>> +               real_off = 0;
>> +       }
>> +
>> +       tmp_flags = sflags = ctx->flags >> SEEN_SREG_SFT;
>> +       /* sflags is essentially a bitmap */
>> +       pr_debug("%s: register flags: 0x%08x\n", __func__, tmp_flags);
>
> that will spam logs. please remove.

This will only spam the logs if you build with -DDEBUG. This is an 
unlikely build situation so spamming the logs is desired because if you 
added -DDEBUG yourself, you need to see as many details as you want 
(same for the rest of pr_debug() calls)


>
>> +static u64 jit_get_skb_b(struct sk_buff *skb, unsigned offset)
>> +{
>> +       u8 ret;
>> +       int err;
>> +
>> +       err = skb_copy_bits(skb, offset, &ret, 1);
>> +
>> +       return (u64)err << 32 | ret;
>> +}
>
> negative offsets are not supported intentionally?
I am sorry. I don't understand what you mean (and this code is identical 
to ARM)

>> +load_ind:
>> +                       update_on_xread(ctx);
>> +                       ctx->flags |= SEEN_OFF | SEEN_X;
>> +                       emit_addiu(r_off, r_X, k, ctx);
>> +               goto load_common;
>
> needs extra tab of formatting
Thanks. I will fix it.

>
>> +               case BPF_S_ANC_VLAN_TAG_PRESENT:
>> +                       ctx->flags |= SEEN_SKB | SEEN_S0 | SEEN_A;
>> +                       BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff,
>> +                                                 vlan_tci) != 2);
>> +                       off = offsetof(struct sk_buff, vlan_tci);
>> +                       emit_half_load(r_s0, r_skb, off, ctx);
>> +                       if (inst->code == BPF_S_ANC_VLAN_TAG)
>
> this branch can never be hit. Did you miss 'case VLAN_TAG' few lines above?

Indeed I did...


>> +
>> +       ctx.skf = fp;
>> +
>> +       if (unlikely(build_body(&ctx)))
>
> why 'unlikely'? this jit doesn't support all extensions
> so it may very well be 'likely' for some use cases.

I will remove it once I add the missing extensions.

>
>> +               goto out;
>> +
>> +       tmp_idx = ctx.idx;
>> +       build_prologue(&ctx);
>> +       ctx.prologue_bytes = (ctx.idx - tmp_idx) * 4;
>> +       /* just to complete the ctx.idx count */
>> +       build_epilogue(&ctx);
>> +
>> +       alloc_size = 4 * ctx.idx;
>> +       ctx.target = module_alloc(alloc_size);
>> +       if (ctx.target == NULL)
>> +               goto out;
>> +
>> +       /* Clean it */
>> +       memset(ctx.target, 0, alloc_size);
>> +
>> +       ctx.idx = 0;
>> +
>> +       /* Generate the actual JIT code */
>> +       build_prologue(&ctx);
>> +       build_body(&ctx);
>
> do you want to add BUG_ON to make sure that 2nd build_body() succeeds?
If the first one managed to succeed why would the second one fail?

Thanks for the review!

-- 
markos

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v3 13/14] MIPS: net: Add BPF JIT
@ 2014-04-24 14:50     ` Markos Chandras
  0 siblings, 0 replies; 53+ messages in thread
From: Markos Chandras @ 2014-04-24 14:50 UTC (permalink / raw)
  To: linux-mips; +Cc: Markos Chandras, David S. Miller, netdev

This adds initial support for BPF-JIT on MIPS

Tested on mips32 LE/BE and mips64 BE/n64 using
dhcp, ping and various tcpdump filters.

Benchmarking:

Assuming the remote MIPS target uses 192.168.154.181
as its IP address, and the local host uses 192.168.154.136,
the following results can be obtained using the following
tcpdump filter (catches no frames) and a simple
'time ping -f -c 1000000' command.

[root@(none) ~]# tcpdump -p -n -s 0 -i eth0 net 10.0.0.0/24 -d
(000) ldh      [12]
(001) jeq      #0x800           jt 2	jf 8
(002) ld       [26]
(003) and      #0xffffff00
(004) jeq      #0xa000000       jt 16	jf 5
(005) ld       [30]
(006) and      #0xffffff00
(007) jeq      #0xa000000       jt 16	jf 17
(008) jeq      #0x806           jt 10	jf 9
(009) jeq      #0x8035          jt 10	jf 17
(010) ld       [28]
(011) and      #0xffffff00
(012) jeq      #0xa000000       jt 16	jf 13
(013) ld       [38]
(014) and      #0xffffff00
(015) jeq      #0xa000000       jt 16	jf 17
(016) ret      #65535

- BPF-JIT Disabled

real    1m38.005s
user    0m1.510s
sys     0m6.710s

- BPF-JIT Enabled

real    1m35.215s
user    0m1.200s
sys     0m4.140s

Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
---
Changes since v2
- Use cpu_has_mips_r1 instead of CONFIG_CPU_MIPSR1 for runtime MIPS32/64 R1 detection
- Implement missing opcodes
- Remove a few pr_debug statements
- A bit of clean up
---
 arch/mips/Kbuild        |    1 +
 arch/mips/net/Makefile  |    3 +
 arch/mips/net/bpf_jit.c | 1399 +++++++++++++++++++++++++++++++++++++++++++++++
 arch/mips/net/bpf_jit.h |   44 ++
 4 files changed, 1447 insertions(+)
 create mode 100644 arch/mips/net/Makefile
 create mode 100644 arch/mips/net/bpf_jit.c
 create mode 100644 arch/mips/net/bpf_jit.h

diff --git a/arch/mips/Kbuild b/arch/mips/Kbuild
index d2cfe45..9179336 100644
--- a/arch/mips/Kbuild
+++ b/arch/mips/Kbuild
@@ -17,6 +17,7 @@ obj- := $(platform-)
 obj-y += kernel/
 obj-y += mm/
 obj-y += math-emu/
+obj-y += net/
 
 ifdef CONFIG_KVM
 obj-y += kvm/
diff --git a/arch/mips/net/Makefile b/arch/mips/net/Makefile
new file mode 100644
index 0000000..ae74b3a
--- /dev/null
+++ b/arch/mips/net/Makefile
@@ -0,0 +1,3 @@
+# MIPS networking code
+
+obj-$(CONFIG_BPF_JIT) += bpf_jit.o
diff --git a/arch/mips/net/bpf_jit.c b/arch/mips/net/bpf_jit.c
new file mode 100644
index 0000000..864e5b7
--- /dev/null
+++ b/arch/mips/net/bpf_jit.c
@@ -0,0 +1,1399 @@
+/*
+ * Just-In-Time compiler for BPF filters on MIPS
+ *
+ * Copyright (c) 2014 Imagination Technologies Ltd.
+ * Author: Markos Chandras <markos.chandras@imgtec.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; version 2 of the License.
+ */
+
+#include <linux/bitops.h>
+#include <linux/compiler.h>
+#include <linux/errno.h>
+#include <linux/filter.h>
+#include <linux/if_vlan.h>
+#include <linux/kconfig.h>
+#include <linux/moduleloader.h>
+#include <linux/netdevice.h>
+#include <linux/string.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+#include <asm/bitops.h>
+#include <asm/cacheflush.h>
+#include <asm/cpu-features.h>
+#include <asm/uasm.h>
+
+#include "bpf_jit.h"
+
+/* ABI
+ *
+ * s0	1st scratch register
+ * s1	2nd scratch register
+ * s2	offset register
+ * s3	BPF register A
+ * s4	BPF register X
+ * s5	*skb
+ * s6	*scratch memory
+ *
+ * On entry (*bpf_func)(*skb, *filter)
+ * a0 = MIPS_R_A0 = skb;
+ * a1 = MIPS_R_A1 = filter;
+ *
+ * Stack
+ * ...
+ * M[15]
+ * M[14]
+ * M[13]
+ * ...
+ * M[0] <-- r_M
+ * saved reg k-1
+ * saved reg k-2
+ * ...
+ * saved reg 0 <-- r_sp
+ * <no argument area>
+ *
+ *                     Packet layout
+ *
+ * <--------------------- len ------------------------>
+ * <--skb-len(r_skb_hl)-->< ----- skb->data_len ------>
+ * ----------------------------------------------------
+ * |                  skb->data                       |
+ * ----------------------------------------------------
+ */
+
+#define RSIZE	(sizeof(unsigned long))
+#define ptr typeof(unsigned long)
+
+/* ABI specific return values */
+#ifdef CONFIG_32BIT /* O32 */
+#ifdef CONFIG_CPU_LITTLE_ENDIAN
+#define r_err	MIPS_R_V1
+#define r_val	MIPS_R_V0
+#else /* CONFIG_CPU_LITTLE_ENDIAN */
+#define r_err	MIPS_R_V0
+#define r_val	MIPS_R_V1
+#endif
+#else /* N64 */
+#define r_err	MIPS_R_V0
+#define r_val	MIPS_R_V0
+#endif
+
+#define r_ret	MIPS_R_V0
+
+/*
+ * Use 2 scratch registers to avoid pipeline interlocks.
+ * There is no overhead during epilogue and prologue since
+ * any of the $s0-$s6 registers will only be preserved if
+ * they are going to actually be used.
+ */
+#define r_s0		MIPS_R_S0 /* scratch reg 1 */
+#define r_s1		MIPS_R_S1 /* scratch reg 2 */
+#define r_off		MIPS_R_S2
+#define r_A		MIPS_R_S3
+#define r_X		MIPS_R_S4
+#define r_skb		MIPS_R_S5
+#define r_M		MIPS_R_S6
+#define r_tmp_imm	MIPS_R_T6 /* No need to preserve this */
+#define r_tmp		MIPS_R_T7 /* No need to preserve this */
+#define r_zero		MIPS_R_ZERO
+#define r_sp		MIPS_R_SP
+#define r_ra		MIPS_R_RA
+
+#define SCRATCH_OFF(k)		(4 * (k))
+
+/* JIT flags */
+#define SEEN_CALL		(1 << BPF_MEMWORDS)
+#define SEEN_SREG_SFT		(BPF_MEMWORDS + 1)
+#define SEEN_SREG_BASE		(1 << SEEN_SREG_SFT)
+#define SEEN_SREG(x)		(SEEN_SREG_BASE << (x))
+#define SEEN_S0			SEEN_SREG(0)
+#define SEEN_S1			SEEN_SREG(1)
+#define SEEN_OFF		SEEN_SREG(2)
+#define SEEN_A			SEEN_SREG(3)
+#define SEEN_X			SEEN_SREG(4)
+#define SEEN_SKB		SEEN_SREG(5)
+#define SEEN_MEM		SEEN_SREG(6)
+
+/* Arguments used by JIT */
+#define ARGS_USED_BY_JIT	2 /* only applicable to 64-bit */
+
+#define FLAG_NEED_X_RESET	(1 << 0)
+
+#define SBIT(x)			(1 << (x)) /* Signed version of BIT() */
+
+/**
+ * struct jit_ctx - JIT context
+ * @skf:		The sk_filter
+ * @prologue_bytes:	Number of bytes for prologue
+ * @idx:		Instruction index
+ * @flags:		JIT flags
+ * @offsets:		Instruction offsets
+ * @target:		Memory location for the compiled filter
+ */
+struct jit_ctx {
+	const struct sk_filter *skf;
+	unsigned int prologue_bytes;
+	u32 idx;
+	u32 flags;
+	u32 *offsets;
+	u32 *target;
+};
+
+
+static inline int optimize_div(u32 *k)
+{
+	/* power of 2 divides can be implemented with right shift */
+	if (!(*k & (*k-1))) {
+		*k = ilog2(*k);
+		return 1;
+	}
+
+	return 0;
+}
+
+/* Simply emit the instruction if the JIT memory space has been allocated */
+#define emit_instr(ctx, func, ...)			\
+do {							\
+	if ((ctx)->target != NULL) {			\
+		u32 *p = &(ctx)->target[ctx->idx];	\
+		uasm_i_##func(&p, ##__VA_ARGS__);	\
+	}						\
+	(ctx)->idx++;					\
+} while (0)
+
+/* Determine if immediate is within the 16-bit signed range */
+static inline bool is_range16(s32 imm)
+{
+	if (imm >= SBIT(15) || imm < -SBIT(15))
+		return true;
+	return false;
+}
+
+static inline void emit_addu(unsigned int dst, unsigned int src1,
+			     unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, addu, dst, src1, src2);
+}
+
+static inline void emit_nop(struct jit_ctx *ctx)
+{
+	emit_instr(ctx, nop);
+}
+
+/* Load a u32 immediate to a register */
+static inline void emit_load_imm(unsigned int dst, u32 imm, struct jit_ctx *ctx)
+{
+	if (ctx->target != NULL) {
+		/* addiu can only handle s16 */
+		if (is_range16(imm)) {
+			u32 *p = &ctx->target[ctx->idx];
+			uasm_i_lui(&p, r_tmp_imm, (s32)imm >> 16);
+			p = &ctx->target[ctx->idx + 1];
+			uasm_i_ori(&p, dst, r_tmp_imm, imm & 0xffff);
+		} else {
+			u32 *p = &ctx->target[ctx->idx];
+			uasm_i_addiu(&p, dst, r_zero, imm);
+		}
+	}
+	ctx->idx++;
+
+	if (is_range16(imm))
+		ctx->idx++;
+}
+
+static inline void emit_or(unsigned int dst, unsigned int src1,
+			   unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, or, dst, src1, src2);
+}
+
+static inline void emit_ori(unsigned int dst, unsigned src, u32 imm,
+			    struct jit_ctx *ctx)
+{
+	if (imm >= BIT(16)) {
+		emit_load_imm(r_tmp, imm, ctx);
+		emit_or(dst, src, r_tmp, ctx);
+	} else {
+		emit_instr(ctx, ori, dst, src, imm);
+	}
+}
+
+
+static inline void emit_daddu(unsigned int dst, unsigned int src1,
+			      unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, daddu, dst, src1, src2);
+}
+
+static inline void emit_daddiu(unsigned int dst, unsigned int src,
+			       int imm, struct jit_ctx *ctx)
+{
+	/*
+	 * Only used for stack, so the imm is relatively small
+	 * and it fits in 15-bits
+	 */
+	emit_instr(ctx, daddiu, dst, src, imm);
+}
+
+static inline void emit_addiu(unsigned int dst, unsigned int src,
+			      u32 imm, struct jit_ctx *ctx)
+{
+	if (is_range16(imm)) {
+		emit_load_imm(r_tmp, imm, ctx);
+		emit_addu(dst, r_tmp, src, ctx);
+	} else {
+		emit_instr(ctx, addiu, dst, src, imm);
+	}
+}
+
+static inline void emit_and(unsigned int dst, unsigned int src1,
+			    unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, and, dst, src1, src2);
+}
+
+static inline void emit_andi(unsigned int dst, unsigned int src,
+			     u32 imm, struct jit_ctx *ctx)
+{
+	/* If imm does not fit in u16 then load it to register */
+	if (imm >= BIT(16)) {
+		emit_load_imm(r_tmp, imm, ctx);
+		emit_and(dst, src, r_tmp, ctx);
+	} else {
+		emit_instr(ctx, andi, dst, src, imm);
+	}
+}
+
+static inline void emit_xor(unsigned int dst, unsigned int src1,
+			    unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, xor, dst, src1, src2);
+}
+
+static inline void emit_xori(ptr dst, ptr src, u32 imm, struct jit_ctx *ctx)
+{
+	/* If imm does not fit in u16 then load it to register */
+	if (imm >= BIT(16)) {
+		emit_load_imm(r_tmp, imm, ctx);
+		emit_xor(dst, src, r_tmp, ctx);
+	} else {
+		emit_instr(ctx, xori, dst, src, imm);
+	}
+}
+
+static inline void emit_stack_offset(int offset, struct jit_ctx *ctx)
+{
+	if (config_enabled(CONFIG_64BIT))
+		emit_instr(ctx, daddiu, r_sp, r_sp, offset);
+	else
+		emit_instr(ctx, addiu, r_sp, r_sp, offset);
+
+}
+
+static inline void emit_subu(unsigned int dst, unsigned int src1,
+			     unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, subu, dst, src1, src2);
+}
+
+static inline void emit_neg(unsigned int reg, struct jit_ctx *ctx)
+{
+	emit_subu(reg, r_zero, reg, ctx);
+}
+
+static inline void emit_sllv(unsigned int dst, unsigned int src,
+			     unsigned int sa, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, sllv, dst, src, sa);
+}
+
+static inline void emit_sll(unsigned int dst, unsigned int src,
+			    unsigned int sa, struct jit_ctx *ctx)
+{
+	/* sa is 5-bits long */
+	BUG_ON(sa >= BIT(5));
+	emit_instr(ctx, sll, dst, src, sa);
+}
+
+static inline void emit_srlv(unsigned int dst, unsigned int src,
+			     unsigned int sa, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, srlv, dst, src, sa);
+}
+
+static inline void emit_srl(unsigned int dst, unsigned int src,
+			    unsigned int sa, struct jit_ctx *ctx)
+{
+	/* sa is 5-bits long */
+	BUG_ON(sa >= BIT(5));
+	emit_instr(ctx, srl, dst, src, sa);
+}
+
+static inline void emit_sltu(unsigned int dst, unsigned int src1,
+			     unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, sltu, dst, src1, src2);
+}
+
+static inline void emit_sltiu(unsigned dst, unsigned int src,
+			      unsigned int imm, struct jit_ctx *ctx)
+{
+	/* 16 bit immediate */
+	if (is_range16((s32)imm)) {
+		emit_load_imm(r_tmp, imm, ctx);
+		emit_sltu(dst, src, r_tmp, ctx);
+	} else {
+		emit_instr(ctx, sltiu, dst, src, imm);
+	}
+
+}
+
+/* Store register on the stack */
+static inline void emit_store_stack_reg(ptr reg, ptr base,
+					unsigned int offset,
+					struct jit_ctx *ctx)
+{
+	if (config_enabled(CONFIG_64BIT))
+		emit_instr(ctx, sd, reg, offset, base);
+	else
+		emit_instr(ctx, sw, reg, offset, base);
+}
+
+static inline void emit_store(ptr reg, ptr base, unsigned int offset,
+			      struct jit_ctx *ctx)
+{
+	emit_instr(ctx, sw, reg, offset, base);
+}
+
+static inline void emit_load_stack_reg(ptr reg, ptr base,
+				       unsigned int offset,
+				       struct jit_ctx *ctx)
+{
+	if (config_enabled(CONFIG_64BIT))
+		emit_instr(ctx, ld, reg, offset, base);
+	else
+		emit_instr(ctx, lw, reg, offset, base);
+}
+
+static inline void emit_load(unsigned int reg, unsigned int base,
+			     unsigned int offset, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, lw, reg, offset, base);
+}
+
+static inline void emit_load_byte(unsigned int reg, unsigned int base,
+				  unsigned int offset, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, lb, reg, offset, base);
+}
+
+static inline void emit_half_load(unsigned int reg, unsigned int base,
+				  unsigned int offset, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, lh, reg, offset, base);
+}
+
+static inline void emit_mul(unsigned int dst, unsigned int src1,
+			    unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, mul, dst, src1, src2);
+}
+
+static inline void emit_div(unsigned int dst, unsigned int src,
+			    struct jit_ctx *ctx)
+{
+	if (ctx->target != NULL) {
+		u32 *p = &ctx->target[ctx->idx];
+		uasm_i_divu(&p, dst, src);
+		p = &ctx->target[ctx->idx + 1];
+		uasm_i_mfhi(&p, dst);
+	}
+	ctx->idx += 2; /* 2 insts */
+}
+
+static inline void emit_mod(unsigned int dst, unsigned int src,
+			    struct jit_ctx *ctx)
+{
+	if (ctx->target != NULL) {
+		u32 *p = &ctx->target[ctx->idx];
+		uasm_i_divu(&p, dst, src);
+		p = &ctx->target[ctx->idx + 1];
+		uasm_i_mflo(&p, dst);
+	}
+	ctx->idx += 2; /* 2 insts */
+}
+
+static inline void emit_dsll(unsigned int dst, unsigned int src,
+			     unsigned int sa, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, dsll, dst, src, sa);
+}
+
+static inline void emit_dsrl32(unsigned int dst, unsigned int src,
+			       unsigned int sa, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, dsrl32, dst, src, sa);
+}
+
+static inline void emit_wsbh(unsigned int dst, unsigned int src,
+			     struct jit_ctx *ctx)
+{
+	emit_instr(ctx, wsbh, dst, src);
+}
+
+/* load a function pointer to register */
+static inline void emit_load_func(unsigned int reg, ptr imm,
+				  struct jit_ctx *ctx)
+{
+	if (config_enabled(CONFIG_64BIT)) {
+		/* At this point imm is always 64-bit */
+		emit_load_imm(r_tmp, (u64)imm >> 32, ctx);
+		emit_dsll(r_tmp_imm, r_tmp, 16, ctx); /* left shift by 16 */
+		emit_ori(r_tmp, r_tmp_imm, (imm >> 16) & 0xffff, ctx);
+		emit_dsll(r_tmp_imm, r_tmp, 16, ctx); /* left shift by 16 */
+		emit_ori(reg, r_tmp_imm, imm & 0xffff, ctx);
+	} else {
+		emit_load_imm(reg, imm, ctx);
+	}
+}
+
+/* Move to real MIPS register */
+static inline void emit_reg_move(ptr dst, ptr src, struct jit_ctx *ctx)
+{
+	if (config_enabled(CONFIG_64BIT))
+		emit_daddu(dst, src, r_zero, ctx);
+	else
+		emit_addu(dst, src, r_zero, ctx);
+}
+
+/* Move to JIT (32-bit) register */
+static inline void emit_jit_reg_move(ptr dst, ptr src, struct jit_ctx *ctx)
+{
+	emit_addu(dst, src, r_zero, ctx);
+}
+
+/* Compute the immediate value for PC-relative branches. */
+static inline u32 b_imm(unsigned int tgt, struct jit_ctx *ctx)
+{
+	if (ctx->target == NULL)
+		return 0;
+
+	/*
+	 * We want a pc-relative branch. We only do forward branches
+	 * so tgt is always after pc. tgt is the instruction offset
+	 * we want to jump to.
+
+	 * Branch on MIPS:
+	 * I: target_offset <- sign_extend(offset)
+	 * I+1: PC += target_offset (delay slot)
+	 *
+	 * ctx->idx currently points to the branch instruction
+	 * but the offset is added to the delay slot so we need
+	 * to subtract 4.
+	 */
+	return ctx->offsets[tgt] -
+		(ctx->idx * 4 - ctx->prologue_bytes) - 4;
+}
+
+static inline void emit_bcond(int cond, unsigned int reg1, unsigned int reg2,
+			     unsigned int imm, struct jit_ctx *ctx)
+{
+	if (ctx->target != NULL) {
+		u32 *p = &ctx->target[ctx->idx];
+
+		switch (cond) {
+		case MIPS_COND_EQ:
+			uasm_i_beq(&p, reg1, reg2, imm);
+			break;
+		case MIPS_COND_NE:
+			uasm_i_bne(&p, reg1, reg2, imm);
+			break;
+		case MIPS_COND_ALL:
+			uasm_i_b(&p, imm);
+			break;
+		default:
+			pr_warn("%s: Unhandled branch conditional: %d\n",
+				__func__, cond);
+		}
+	}
+	ctx->idx++;
+}
+
+static inline void emit_b(unsigned int imm, struct jit_ctx *ctx)
+{
+	emit_bcond(MIPS_COND_ALL, r_zero, r_zero, imm, ctx);
+}
+
+static inline void emit_jalr(unsigned int link, unsigned int reg,
+			     struct jit_ctx *ctx)
+{
+	emit_instr(ctx, jalr, link, reg);
+}
+
+static inline void emit_jr(unsigned int reg, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, jr, reg);
+}
+
+static inline u16 align_sp(unsigned int num)
+{
+	/* Double word alignment for 32-bit, quadword for 64-bit */
+	unsigned int align = config_enabled(CONFIG_64BIT) ? 16 : 8;
+	num = (num + (align - 1)) & -align;
+	return num;
+}
+
+static inline void update_on_xread(struct jit_ctx *ctx)
+{
+	if (!(ctx->flags & SEEN_X))
+		ctx->flags |= FLAG_NEED_X_RESET;
+
+	ctx->flags |= SEEN_X;
+}
+
+static bool is_load_to_a(u16 inst)
+{
+	switch (inst) {
+	case BPF_S_LD_W_LEN:
+	case BPF_S_LD_W_ABS:
+	case BPF_S_LD_H_ABS:
+	case BPF_S_LD_B_ABS:
+	case BPF_S_ANC_CPU:
+	case BPF_S_ANC_IFINDEX:
+	case BPF_S_ANC_MARK:
+	case BPF_S_ANC_PROTOCOL:
+	case BPF_S_ANC_RXHASH:
+	case BPF_S_ANC_VLAN_TAG:
+	case BPF_S_ANC_VLAN_TAG_PRESENT:
+	case BPF_S_ANC_QUEUE:
+		return true;
+	default:
+		return false;
+	}
+}
+
+static void save_bpf_jit_regs(struct jit_ctx *ctx, unsigned offset)
+{
+	int i = 0, real_off = 0;
+	u32 sflags, tmp_flags;
+
+	/* Adjust the stack pointer */
+	emit_stack_offset(-align_sp(offset), ctx);
+
+	if (ctx->flags & SEEN_CALL) {
+		/* Argument save area */
+		if (config_enabled(CONFIG_64BIT))
+			/* Bottom of current frame */
+			real_off = align_sp(offset) - RSIZE;
+		else
+			/* Top of previous frame */
+			real_off = align_sp(offset) + RSIZE;
+		emit_store_stack_reg(MIPS_R_A0, r_sp, real_off, ctx);
+		emit_store_stack_reg(MIPS_R_A1, r_sp, real_off + RSIZE, ctx);
+
+		real_off = 0;
+	}
+
+	tmp_flags = sflags = ctx->flags >> SEEN_SREG_SFT;
+	/* sflags is essentially a bitmap */
+	while (tmp_flags) {
+		if ((sflags >> i) & 0x1) {
+			emit_store_stack_reg(MIPS_R_S0 + i, r_sp, real_off,
+					     ctx);
+			real_off += RSIZE;
+		}
+		i++;
+		tmp_flags >>= 1;
+	}
+
+	/* save return address */
+	if (ctx->flags & SEEN_CALL) {
+		emit_store_stack_reg(r_ra, r_sp, real_off, ctx);
+		real_off += RSIZE;
+	}
+
+	/* Setup r_M leaving the alignment gap if necessary */
+	if (ctx->flags & SEEN_MEM) {
+		if (real_off % (RSIZE * 2))
+			real_off += RSIZE;
+		emit_addiu(r_M, r_sp, real_off, ctx);
+	}
+}
+
+static void restore_bpf_jit_regs(struct jit_ctx *ctx,
+				 unsigned int offset)
+{
+	int i, real_off = 0;
+	u32 sflags, tmp_flags;
+
+	if (ctx->flags & SEEN_CALL) {
+		if (config_enabled(CONFIG_64BIT))
+			/* Bottom of current frame */
+			real_off = align_sp(offset) - RSIZE;
+		else
+			/* Top of previous frame */
+			real_off = align_sp(offset) + RSIZE;
+		emit_load_stack_reg(MIPS_R_A0, r_sp, real_off, ctx);
+		emit_load_stack_reg(MIPS_R_A1, r_sp, real_off + RSIZE, ctx);
+
+		real_off = 0;
+	}
+
+	tmp_flags = sflags = ctx->flags >> SEEN_SREG_SFT;
+	/* sflags is a bitmap */
+	i = 0;
+	while (tmp_flags) {
+		if ((sflags >> i) & 0x1) {
+			emit_load_stack_reg(MIPS_R_S0 + i, r_sp, real_off,
+					    ctx);
+			real_off += RSIZE;
+		}
+		i++;
+		tmp_flags >>= 1;
+	}
+
+	/* restore return address */
+	if (ctx->flags & SEEN_CALL)
+		emit_load_stack_reg(r_ra, r_sp, real_off, ctx);
+
+	/* Restore the sp and discard the scrach memory */
+	emit_stack_offset(align_sp(offset), ctx);
+}
+
+static unsigned int get_stack_depth(struct jit_ctx *ctx)
+{
+	int sp_off = 0;
+
+
+	/* How may s* regs do we need to preserved? */
+	sp_off += hweight32(ctx->flags >> SEEN_SREG_SFT) * RSIZE;
+
+	if (ctx->flags & SEEN_MEM)
+		sp_off += 4 * BPF_MEMWORDS; /* BPF_MEMWORDS are 32-bit */
+
+	if (ctx->flags & SEEN_CALL)
+		/*
+		 * The JIT code make calls to external functions using 2
+		 * arguments. Therefore, for o32 we don't need to allocate
+		 * space because we don't care if the argumetns are lost
+		 * across calls. We do need however to preserve incoming
+		 * arguments but the space is already allocated for us by
+		 * the caller. On the other hand, for n64, we need to allocate
+		 * this space ourselves. We need to preserve $ra as well.
+		 */
+		sp_off += config_enabled(CONFIG_64BIT) ?
+			(ARGS_USED_BY_JIT + 1) * RSIZE : RSIZE;
+
+	/*
+	 * Subtract the bytes for the last registers since we only care about
+	 * the location on the stack pointer.
+	 */
+	return sp_off - RSIZE;
+}
+
+static void build_prologue(struct jit_ctx *ctx)
+{
+	u16 first_inst = ctx->skf->insns[0].code;
+	int sp_off;
+
+	/* Calculate the total offset for the stack pointer */
+	sp_off = get_stack_depth(ctx);
+	save_bpf_jit_regs(ctx, sp_off);
+
+	if (ctx->flags & SEEN_SKB)
+		emit_reg_move(r_skb, MIPS_R_A0, ctx);
+
+	if (ctx->flags & FLAG_NEED_X_RESET)
+		emit_jit_reg_move(r_X, r_zero, ctx);
+
+	/* Do not leak kernel data to userspace */
+	if ((first_inst != BPF_S_RET_K) && !(is_load_to_a(first_inst)))
+		emit_jit_reg_move(r_A, r_zero, ctx);
+}
+
+static void build_epilogue(struct jit_ctx *ctx)
+{
+	unsigned int sp_off;
+
+	/* Calculate the total offset for the stack pointer */
+
+	sp_off = get_stack_depth(ctx);
+	restore_bpf_jit_regs(ctx, sp_off);
+
+	/* Return */
+	emit_jr(r_ra, ctx);
+	emit_nop(ctx);
+}
+
+static u64 jit_get_skb_b(struct sk_buff *skb, unsigned offset)
+{
+	u8 ret;
+	int err;
+
+	err = skb_copy_bits(skb, offset, &ret, 1);
+
+	return (u64)err << 32 | ret;
+}
+
+static u64 jit_get_skb_h(struct sk_buff *skb, unsigned offset)
+{
+	u16 ret;
+	int err;
+
+	err = skb_copy_bits(skb, offset, &ret, 2);
+
+	return (u64)err << 32 | ntohs(ret);
+}
+
+static u64 jit_get_skb_w(struct sk_buff *skb, unsigned offset)
+{
+	u32 ret;
+	int err;
+
+	err = skb_copy_bits(skb, offset, &ret, 4);
+
+	return (u64)err << 32 | ntohl(ret);
+}
+
+#define PKT_TYPE_MAX 7
+static int pkt_type_offset(void)
+{
+	struct sk_buff skb_probe = {
+		.pkt_type = ~0,
+	};
+	char *ct = (char *)&skb_probe;
+	unsigned int off;
+
+	for (off = 0; off < sizeof(struct sk_buff); off++) {
+		if (ct[off] == PKT_TYPE_MAX)
+			return off;
+	}
+	pr_err_once("Please fix pkt_type_offset(), as pkt_type couldn't be found\n");
+	return -1;
+}
+
+static int build_body(struct jit_ctx *ctx)
+{
+	void *load_func[] = {jit_get_skb_b, jit_get_skb_h, jit_get_skb_w};
+	const struct sk_filter *prog = ctx->skf;
+	const struct sock_filter *inst;
+	unsigned int i, off, load_order, condt;
+	u32 k, b_off __maybe_unused;
+
+	for (i = 0; i < prog->len; i++) {
+		inst = &(prog->insns[i]);
+		pr_debug("%s: code->0x%02x, jt->0x%x, jf->0x%x, k->0x%x\n",
+			 __func__, inst->code, inst->jt, inst->jf, inst->k);
+		k = inst->k;
+
+		if (ctx->target == NULL)
+			ctx->offsets[i] = ctx->idx * 4;
+
+		switch (inst->code) {
+		case BPF_S_LD_IMM:
+			/* A <- k ==> li r_A, k */
+			ctx->flags |= SEEN_A;
+			emit_load_imm(r_A, k, ctx);
+			break;
+		case BPF_S_LD_W_LEN:
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, len) != 4);
+			/* A <- len ==> lw r_A, offset(skb) */
+			ctx->flags |= SEEN_SKB | SEEN_A;
+			off = offsetof(struct sk_buff, len);
+			emit_load(r_A, r_skb, off, ctx);
+			break;
+		case BPF_S_LD_MEM:
+			/* A <- M[k] ==> lw r_A, offset(M) */
+			ctx->flags |= SEEN_MEM | SEEN_A;
+			emit_load(r_A, r_M, SCRATCH_OFF(k), ctx);
+			break;
+		case BPF_S_LD_W_ABS:
+			/* A <- P[k:4] */
+			load_order = 2;
+			goto load;
+		case BPF_S_LD_H_ABS:
+			/* A <- P[k:2] */
+			load_order = 1;
+			goto load;
+		case BPF_S_LD_B_ABS:
+			/* A <- P[k:1] */
+			load_order = 0;
+load:
+			emit_load_imm(r_off, k, ctx);
+load_common:
+			ctx->flags |= SEEN_CALL | SEEN_OFF | SEEN_S0 |
+				SEEN_SKB | SEEN_A;
+
+			emit_load_func(r_s0, (ptr)load_func[load_order],
+				      ctx);
+			emit_reg_move(MIPS_R_A0, r_skb, ctx);
+			emit_jalr(MIPS_R_RA, r_s0, ctx);
+			/* Load second argument to delay slot */
+			emit_reg_move(MIPS_R_A1, r_off, ctx);
+			/* Check the error value */
+			if (config_enabled(CONFIG_64BIT)) {
+				/* Get error code from the top 32-bits */
+				emit_dsrl32(r_s0, r_val, 0, ctx);
+				/* Branch to 3 instructions ahead */
+				emit_bcond(MIPS_COND_NE, r_s0, r_zero, 3 << 2,
+					   ctx);
+			} else {
+				/* Branch to 3 instructions ahead */
+				emit_bcond(MIPS_COND_NE, r_err, r_zero, 3 << 2,
+					   ctx);
+			}
+			emit_nop(ctx);
+			/* We are good */
+			emit_b(b_imm(i + 1, ctx), ctx);
+			emit_jit_reg_move(r_A, r_val, ctx);
+			/* Return with error */
+			emit_b(b_imm(prog->len, ctx), ctx);
+			emit_reg_move(r_ret, r_zero, ctx);
+			break;
+		case BPF_S_LD_W_IND:
+			/* A <- P[X + k:4] */
+			load_order = 2;
+			goto load_ind;
+		case BPF_S_LD_H_IND:
+			/* A <- P[X + k:2] */
+			load_order = 1;
+			goto load_ind;
+		case BPF_S_LD_B_IND:
+			/* A <- P[X + k:1] */
+			load_order = 0;
+load_ind:
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_OFF | SEEN_X;
+			emit_addiu(r_off, r_X, k, ctx);
+			goto load_common;
+		case BPF_S_LDX_IMM:
+			/* X <- k */
+			ctx->flags |= SEEN_X;
+			emit_load_imm(r_X, k, ctx);
+			break;
+		case BPF_S_LDX_MEM:
+			/* X <- M[k] */
+			ctx->flags |= SEEN_X | SEEN_MEM;
+			emit_load(r_X, r_M, SCRATCH_OFF(k), ctx);
+			break;
+		case BPF_S_LDX_W_LEN:
+			/* X <- len */
+			ctx->flags |= SEEN_X | SEEN_SKB;
+			off = offsetof(struct sk_buff, len);
+			emit_load(r_X, r_skb, off, ctx);
+			break;
+		case BPF_S_LDX_B_MSH:
+			/* X <- 4 * (P[k:1] & 0xf) */
+			ctx->flags |= SEEN_X | SEEN_CALL | SEEN_S0 | SEEN_SKB;
+			/* Load offset to a1 */
+			emit_load_func(r_s0, (ptr)jit_get_skb_b, ctx);
+			/*
+			 * This may emit two instructions so it may not fit
+			 * in the delay slot. So use a0 in the delay slot.
+			 */
+			emit_load_imm(MIPS_R_A1, k, ctx);
+			emit_jalr(MIPS_R_RA, r_s0, ctx);
+			emit_reg_move(MIPS_R_A0, r_skb, ctx); /* delay slot */
+			/* Check the error value */
+			if (config_enabled(CONFIG_64BIT)) {
+				/* Top 32-bits of $v0 on 64-bit */
+				emit_dsrl32(r_s0, r_val, 0, ctx);
+				emit_bcond(MIPS_COND_NE, r_s0, r_zero,
+					   3 << 2, ctx);
+			} else {
+				emit_bcond(MIPS_COND_NE, r_err, r_zero,
+					   3 << 2, ctx);
+			}
+			/* No need for delay slot */
+			/* We are good */
+			/* X <- P[1:K] & 0xf */
+			emit_andi(r_X, r_val, 0xf, ctx);
+			/* X << 2 */
+			emit_b(b_imm(i + 1, ctx), ctx);
+			emit_sll(r_X, r_X, 2, ctx); /* delay slot */
+			/* Return with error */
+			emit_b(b_imm(prog->len, ctx), ctx);
+			emit_load_imm(r_ret, 0, ctx); /* delay slot */
+			break;
+		case BPF_S_ST:
+			/* M[k] <- A */
+			ctx->flags |= SEEN_MEM | SEEN_A;
+			emit_store(r_A, r_M, SCRATCH_OFF(k), ctx);
+			break;
+		case BPF_S_STX:
+			/* M[k] <- X */
+			ctx->flags |= SEEN_MEM | SEEN_X;
+			emit_store(r_X, r_M, SCRATCH_OFF(k), ctx);
+			break;
+		case BPF_S_ALU_ADD_K:
+			/* A += K */
+			ctx->flags |= SEEN_A;
+			emit_addiu(r_A, r_A, k, ctx);
+			break;
+		case BPF_S_ALU_ADD_X:
+			/* A += X */
+			ctx->flags |= SEEN_A | SEEN_X;
+			emit_addu(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_SUB_K:
+			/* A -= K */
+			ctx->flags |= SEEN_A;
+			emit_addiu(r_A, r_A, -k, ctx);
+			break;
+		case BPF_S_ALU_SUB_X:
+			/* A -= X */
+			ctx->flags |= SEEN_A | SEEN_X;
+			emit_subu(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_MUL_K:
+			/* A *= K */
+			/* Load K to scratch register before MUL */
+			ctx->flags |= SEEN_A | SEEN_S0;
+			emit_load_imm(r_s0, k, ctx);
+			emit_mul(r_A, r_A, r_s0, ctx);
+			break;
+		case BPF_S_ALU_MUL_X:
+			/* A *= X */
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_A | SEEN_X;
+			emit_mul(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_DIV_K:
+			/* A /= k */
+			if (k == 1)
+				break;
+			if (optimize_div(&k)) {
+				ctx->flags |= SEEN_A;
+				emit_srl(r_A, r_A, k, ctx);
+				break;
+			}
+			ctx->flags |= SEEN_A | SEEN_S0;
+			emit_load_imm(r_s0, k, ctx);
+			emit_div(r_A, r_s0, ctx);
+			break;
+		case BPF_S_ALU_MOD_K:
+			/* A %= k */
+			if (k == 1 || optimize_div(&k)) {
+				ctx->flags |= SEEN_A;
+				emit_jit_reg_move(r_A, r_zero, ctx);
+			} else {
+				ctx->flags |= SEEN_A | SEEN_S0;
+				emit_load_imm(r_s0, k, ctx);
+				emit_mod(r_A, r_s0, ctx);
+			}
+			break;
+		case BPF_S_ALU_DIV_X:
+			/* A /= X */
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_X | SEEN_A;
+			/* Check if r_X is zero */
+			emit_bcond(MIPS_COND_EQ, r_X, r_zero,
+				   b_imm(prog->len, ctx), ctx);
+			emit_load_imm(r_val, 0, ctx); /* delay slot */
+			emit_div(r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_MOD_X:
+			/* A %= X */
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_X | SEEN_A;
+			/* Check if r_X is zero */
+			emit_bcond(MIPS_COND_EQ, r_X, r_zero,
+				   b_imm(prog->len, ctx), ctx);
+			emit_load_imm(r_val, 0, ctx); /* delay slot */
+			emit_mod(r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_OR_K:
+			/* A |= K */
+			ctx->flags |= SEEN_A;
+			emit_ori(r_A, r_A, k, ctx);
+			break;
+		case BPF_S_ALU_OR_X:
+			/* A |= X */
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_A;
+			emit_ori(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_XOR_K:
+			/* A ^= k */
+			ctx->flags |= SEEN_A;
+			emit_xori(r_A, r_A, k, ctx);
+			break;
+		case BPF_S_ANC_ALU_XOR_X:
+		case BPF_S_ALU_XOR_X:
+			/* A ^= X */
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_A;
+			emit_xor(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_AND_K:
+			/* A &= K */
+			ctx->flags |= SEEN_A;
+			emit_andi(r_A, r_A, k, ctx);
+			break;
+		case BPF_S_ALU_AND_X:
+			/* A &= X */
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_A | SEEN_X;
+			emit_and(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_LSH_K:
+			/* A <<= K */
+			ctx->flags |= SEEN_A;
+			emit_sll(r_A, r_A, k, ctx);
+			break;
+		case BPF_S_ALU_LSH_X:
+			/* A <<= X */
+			ctx->flags |= SEEN_A | SEEN_X;
+			update_on_xread(ctx);
+			emit_sllv(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_RSH_K:
+			/* A >>= K */
+			ctx->flags |= SEEN_A;
+			emit_srl(r_A, r_A, k, ctx);
+			break;
+		case BPF_S_ALU_RSH_X:
+			ctx->flags |= SEEN_A | SEEN_X;
+			update_on_xread(ctx);
+			emit_srlv(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_NEG:
+			/* A = -A */
+			ctx->flags |= SEEN_A;
+			emit_neg(r_A, ctx);
+			break;
+		case BPF_S_JMP_JA:
+			/* pc += K */
+			emit_b(b_imm(i + k + 1, ctx), ctx);
+			emit_nop(ctx);
+			break;
+		case BPF_S_JMP_JEQ_K:
+			/* pc += ( A == K ) ? pc->jt : pc->jf */
+			condt = MIPS_COND_EQ | MIPS_COND_K;
+			goto jmp_cmp;
+		case BPF_S_JMP_JEQ_X:
+			ctx->flags |= SEEN_X;
+			/* pc += ( A == X ) ? pc->jt : pc->jf */
+			condt = MIPS_COND_EQ | MIPS_COND_X;
+			goto jmp_cmp;
+		case BPF_S_JMP_JGE_K:
+			/* pc += ( A >= K ) ? pc->jt : pc->jf */
+			condt = MIPS_COND_GE | MIPS_COND_K;
+			goto jmp_cmp;
+		case BPF_S_JMP_JGE_X:
+			ctx->flags |= SEEN_X;
+			/* pc += ( A >= X ) ? pc->jt : pc->jf */
+			condt = MIPS_COND_GE | MIPS_COND_X;
+			goto jmp_cmp;
+		case BPF_S_JMP_JGT_K:
+			/* pc += ( A > K ) ? pc->jt : pc->jf */
+			condt = MIPS_COND_GT | MIPS_COND_K;
+			goto jmp_cmp;
+		case BPF_S_JMP_JGT_X:
+			ctx->flags |= SEEN_X;
+			/* pc += ( A > X ) ? pc->jt : pc->jf */
+			condt = MIPS_COND_GT | MIPS_COND_X;
+jmp_cmp:
+			/* Greater or Equal */
+			if ((condt & MIPS_COND_GE) ||
+			    (condt & MIPS_COND_GT)) {
+				if (condt & MIPS_COND_K) { /* K */
+					ctx->flags |= SEEN_S0 | SEEN_A;
+					emit_sltiu(r_s0, r_A, k, ctx);
+				} else { /* X */
+					ctx->flags |= SEEN_S0 | SEEN_A |
+						SEEN_X;
+					emit_sltu(r_s0, r_A, r_X, ctx);
+				}
+				/* A < (K|X) ? r_scrach = 1 */
+				b_off = b_imm(i + inst->jf + 1, ctx);
+				emit_bcond(MIPS_COND_GT, r_s0, r_zero, b_off,
+					   ctx);
+				emit_nop(ctx);
+				/* A > (K|X) ? scratch = 0 */
+				if (condt & MIPS_COND_GT) {
+					/* Checking for equality */
+					ctx->flags |= SEEN_S0 | SEEN_A | SEEN_X;
+					if (condt & MIPS_COND_K)
+						emit_load_imm(r_s0, k, ctx);
+					else
+						emit_jit_reg_move(r_s0, r_X,
+								  ctx);
+					b_off = b_imm(i + inst->jf + 1, ctx);
+					emit_bcond(MIPS_COND_EQ, r_A, r_s0,
+						   b_off, ctx);
+					emit_nop(ctx);
+					/* Finally, A > K|X */
+					b_off = b_imm(i + inst->jt + 1, ctx);
+					emit_b(b_off, ctx);
+					emit_nop(ctx);
+				} else {
+					/* A >= (K|X) so jump */
+					b_off = b_imm(i + inst->jt + 1, ctx);
+					emit_b(b_off, ctx);
+					emit_nop(ctx);
+				}
+			} else {
+				/* A == K|X */
+				if (condt & MIPS_COND_K) { /* K */
+					ctx->flags |= SEEN_S0 | SEEN_A;
+					emit_load_imm(r_s0, k, ctx);
+					/* jump true */
+					b_off = b_imm(i + inst->jt + 1, ctx);
+					emit_bcond(MIPS_COND_EQ, r_A, r_s0,
+						   b_off, ctx);
+					emit_nop(ctx);
+					/* jump false */
+					b_off = b_imm(i + inst->jf + 1,
+						      ctx);
+					emit_bcond(MIPS_COND_NE, r_A, r_s0,
+						   b_off, ctx);
+					emit_nop(ctx);
+				} else { /* X */
+					/* jump true */
+					ctx->flags |= SEEN_A | SEEN_X;
+					b_off = b_imm(i + inst->jt + 1,
+						      ctx);
+					emit_bcond(MIPS_COND_EQ, r_A, r_X,
+						   b_off, ctx);
+					emit_nop(ctx);
+					/* jump false */
+					b_off = b_imm(i + inst->jf + 1, ctx);
+					emit_bcond(MIPS_COND_NE, r_A, r_X,
+						   b_off, ctx);
+					emit_nop(ctx);
+				}
+			}
+			break;
+		case BPF_S_JMP_JSET_K:
+			ctx->flags |= SEEN_S0 | SEEN_S1 | SEEN_A;
+			/* pc += (A & K) ? pc -> jt : pc -> jf */
+			emit_load_imm(r_s1, k, ctx);
+			emit_and(r_s0, r_A, r_s1, ctx);
+			/* jump true */
+			b_off = b_imm(i + inst->jt + 1, ctx);
+			emit_bcond(MIPS_COND_NE, r_s0, r_zero, b_off, ctx);
+			emit_nop(ctx);
+			/* jump false */
+			b_off = b_imm(i + inst->jf + 1, ctx);
+			emit_b(b_off, ctx);
+			emit_nop(ctx);
+			break;
+		case BPF_S_JMP_JSET_X:
+			ctx->flags |= SEEN_S0 | SEEN_X | SEEN_A;
+			/* pc += (A & X) ? pc -> jt : pc -> jf */
+			emit_and(r_s0, r_A, r_X, ctx);
+			/* jump true */
+			b_off = b_imm(i + inst->jt + 1, ctx);
+			emit_bcond(MIPS_COND_NE, r_s0, r_zero, b_off, ctx);
+			emit_nop(ctx);
+			/* jump false */
+			b_off = b_imm(i + inst->jf + 1, ctx);
+			emit_b(b_off, ctx);
+			emit_nop(ctx);
+			break;
+		case BPF_S_RET_A:
+			ctx->flags |= SEEN_A;
+			if (i != prog->len - 1)
+				/*
+				 * If this is not the last instruction
+				 * then jump to the epilogue
+				 */
+				emit_b(b_imm(prog->len, ctx), ctx);
+			emit_reg_move(r_ret, r_A, ctx); /* delay slot */
+			break;
+		case BPF_S_RET_K:
+			/*
+			 * It can emit two instructions so it does not fit on
+			 * the delay slot.
+			 */
+			emit_load_imm(r_ret, k, ctx);
+			if (i != prog->len - 1) {
+				/*
+				 * If this is not the last instruction
+				 * then jump to the epilogue
+				 */
+				emit_b(b_imm(prog->len, ctx), ctx);
+				emit_nop(ctx);
+			}
+			break;
+		case BPF_S_MISC_TAX:
+			/* X = A */
+			ctx->flags |= SEEN_X | SEEN_A;
+			emit_jit_reg_move(r_X, r_A, ctx);
+			break;
+		case BPF_S_MISC_TXA:
+			/* A = X */
+			ctx->flags |= SEEN_A | SEEN_X;
+			update_on_xread(ctx);
+			emit_jit_reg_move(r_A, r_X, ctx);
+			break;
+		/* AUX */
+		case BPF_S_ANC_PROTOCOL:
+			/* A = ntohs(skb->protocol */
+			ctx->flags |= SEEN_SKB | SEEN_OFF | SEEN_A;
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff,
+						  protocol) != 2);
+			off = offsetof(struct sk_buff, protocol);
+			emit_half_load(r_A, r_skb, off, ctx);
+#ifdef CONFIG_CPU_LITTLE_ENDIAN
+			/* This needs little endian fixup */
+			if (cpu_has_mips_r1) {
+				/* Get first byte */
+				emit_andi(r_tmp_imm, r_A, 0xff, ctx);
+				/* Shift it */
+				emit_sll(r_tmp, r_tmp_imm, 8, ctx);
+				/* Get second byte */
+				emit_srl(r_tmp_imm, r_A, 8, ctx);
+				emit_andi(r_tmp_imm, r_tmp_imm, 0xff, ctx);
+				/* Put everyting together in r_A */
+				emit_or(r_A, r_tmp, r_tmp_imm, ctx);
+			} else {
+				/* R2 and later have the wsbh instruction */
+				emit_wsbh(r_A, r_A, ctx);
+			}
+#endif
+			break;
+		case BPF_S_ANC_CPU:
+			ctx->flags |= SEEN_A | SEEN_OFF;
+			/* A = current_thread_info()->cpu */
+			BUILD_BUG_ON(FIELD_SIZEOF(struct thread_info,
+						  cpu) != 4);
+			off = offsetof(struct thread_info, cpu);
+			/* $28/gp points to the thread_info struct */
+			emit_load(r_A, 28, off, ctx);
+			break;
+		case BPF_S_ANC_IFINDEX:
+			/* A = skb->dev->ifindex */
+			ctx->flags |= SEEN_SKB | SEEN_A | SEEN_S0;
+			off = offsetof(struct sk_buff, dev);
+			emit_load(r_s0, r_skb, off, ctx);
+			/* error (0) in the delay slot */
+			emit_bcond(MIPS_COND_EQ, r_s0, r_zero,
+				   b_imm(prog->len, ctx), ctx);
+			emit_reg_move(r_ret, r_zero, ctx);
+			BUILD_BUG_ON(FIELD_SIZEOF(struct net_device,
+						  ifindex) != 4);
+			off = offsetof(struct net_device, ifindex);
+			emit_load(r_A, r_s0, off, ctx);
+			break;
+		case BPF_S_ANC_MARK:
+			ctx->flags |= SEEN_SKB | SEEN_A;
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, mark) != 4);
+			off = offsetof(struct sk_buff, mark);
+			emit_load(r_A, r_skb, off, ctx);
+			break;
+		case BPF_S_ANC_RXHASH:
+			ctx->flags |= SEEN_SKB | SEEN_A;
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, hash) != 4);
+			off = offsetof(struct sk_buff, hash);
+			emit_load(r_A, r_skb, off, ctx);
+			break;
+		case BPF_S_ANC_VLAN_TAG:
+		case BPF_S_ANC_VLAN_TAG_PRESENT:
+			ctx->flags |= SEEN_SKB | SEEN_S0 | SEEN_A;
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff,
+						  vlan_tci) != 2);
+			off = offsetof(struct sk_buff, vlan_tci);
+			emit_half_load(r_s0, r_skb, off, ctx);
+			if (inst->code == BPF_S_ANC_VLAN_TAG)
+				emit_and(r_A, r_s0, VLAN_VID_MASK, ctx);
+			else
+				emit_and(r_A, r_s0, VLAN_TAG_PRESENT, ctx);
+			break;
+		case BPF_S_ANC_PKTTYPE:
+			off = pkt_type_offset();
+
+			if (off < 0)
+				return -1;
+			emit_load_byte(r_tmp, r_skb, off, ctx);
+			/* Keep only the last 3 bits */
+			emit_andi(r_A, r_tmp, PKT_TYPE_MAX, ctx);
+			break;
+		case BPF_S_ANC_QUEUE:
+			ctx->flags |= SEEN_SKB | SEEN_A;
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff,
+						  queue_mapping) != 2);
+			BUILD_BUG_ON(offsetof(struct sk_buff,
+					      queue_mapping) > 0xff);
+			off = offsetof(struct sk_buff, queue_mapping);
+			emit_half_load(r_A, r_skb, off, ctx);
+			break;
+		default:
+			pr_warn("%s: Unhandled opcode: 0x%02x\n", __FILE__,
+				inst->code);
+			return -1;
+		}
+	}
+
+	/* compute offsets only during the first pass */
+	if (ctx->target == NULL)
+		ctx->offsets[i] = ctx->idx * 4;
+
+	return 0;
+}
+
+int bpf_jit_enable __read_mostly;
+
+void bpf_jit_compile(struct sk_filter *fp)
+{
+	struct jit_ctx ctx;
+	unsigned int alloc_size, tmp_idx;
+
+	if (!bpf_jit_enable)
+		return;
+
+	memset(&ctx, 0, sizeof(ctx));
+
+	ctx.offsets = kcalloc(fp->len, sizeof(*ctx.offsets), GFP_KERNEL);
+	if (ctx.offsets == NULL)
+		return;
+
+	ctx.skf = fp;
+
+	if (build_body(&ctx))
+		goto out;
+
+	tmp_idx = ctx.idx;
+	build_prologue(&ctx);
+	ctx.prologue_bytes = (ctx.idx - tmp_idx) * 4;
+	/* just to complete the ctx.idx count */
+	build_epilogue(&ctx);
+
+	alloc_size = 4 * ctx.idx;
+	ctx.target = module_alloc(alloc_size);
+	if (ctx.target == NULL)
+		goto out;
+
+	/* Clean it */
+	memset(ctx.target, 0, alloc_size);
+
+	ctx.idx = 0;
+
+	/* Generate the actual JIT code */
+	build_prologue(&ctx);
+	build_body(&ctx);
+	build_epilogue(&ctx);
+
+	/* Update the icache */
+	flush_icache_range((ptr)ctx.target, (ptr)(ctx.target + ctx.idx));
+
+	if (bpf_jit_enable > 1)
+		/* Dump JIT code */
+		bpf_jit_dump(fp->len, alloc_size, 2, ctx.target);
+
+	fp->bpf_func = (void *)ctx.target;
+	fp->jited = 1;
+
+out:
+	kfree(ctx.offsets);
+}
+
+void bpf_jit_free(struct sk_filter *fp)
+{
+	if (fp->jited)
+		module_free(NULL, fp->bpf_func);
+	kfree(fp);
+}
diff --git a/arch/mips/net/bpf_jit.h b/arch/mips/net/bpf_jit.h
new file mode 100644
index 0000000..3a5751b
--- /dev/null
+++ b/arch/mips/net/bpf_jit.h
@@ -0,0 +1,44 @@
+/*
+ * Just-In-Time compiler for BPF filters on MIPS
+ *
+ * Copyright (c) 2014 Imagination Technologies Ltd.
+ * Author: Markos Chandras <markos.chandras@imgtec.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; version 2 of the License.
+ */
+
+#ifndef BPF_JIT_MIPS_OP_H
+#define BPF_JIT_MIPS_OP_H
+
+/* Registers used by JIT */
+#define MIPS_R_ZERO	0
+#define MIPS_R_V0	2
+#define MIPS_R_V1	3
+#define MIPS_R_A0	4
+#define MIPS_R_A1	5
+#define MIPS_R_T6	14
+#define MIPS_R_T7	15
+#define MIPS_R_S0	16
+#define MIPS_R_S1	17
+#define MIPS_R_S2	18
+#define MIPS_R_S3	19
+#define MIPS_R_S4	20
+#define MIPS_R_S5	21
+#define MIPS_R_S6	22
+#define MIPS_R_S7	23
+#define MIPS_R_SP	29
+#define MIPS_R_RA	31
+
+/* Conditional codes */
+#define MIPS_COND_EQ	0x1
+#define MIPS_COND_GE	(0x1 << 1)
+#define MIPS_COND_GT	(0x1 << 2)
+#define MIPS_COND_NE	(0x1 << 3)
+#define MIPS_COND_ALL	(0x1 << 4)
+/* Conditionals on X register or K immediate */
+#define MIPS_COND_X	(0x1 << 5)
+#define MIPS_COND_K	(0x1 << 6)
+
+#endif /* BPF_JIT_MIPS_OP_H */
-- 
1.9.2

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 13/14] MIPS: net: Add BPF JIT
@ 2014-04-24 14:50     ` Markos Chandras
  0 siblings, 0 replies; 53+ messages in thread
From: Markos Chandras @ 2014-04-24 14:50 UTC (permalink / raw)
  To: linux-mips; +Cc: Markos Chandras, David S. Miller, netdev

This adds initial support for BPF-JIT on MIPS

Tested on mips32 LE/BE and mips64 BE/n64 using
dhcp, ping and various tcpdump filters.

Benchmarking:

Assuming the remote MIPS target uses 192.168.154.181
as its IP address, and the local host uses 192.168.154.136,
the following results can be obtained using the following
tcpdump filter (catches no frames) and a simple
'time ping -f -c 1000000' command.

[root@(none) ~]# tcpdump -p -n -s 0 -i eth0 net 10.0.0.0/24 -d
(000) ldh      [12]
(001) jeq      #0x800           jt 2	jf 8
(002) ld       [26]
(003) and      #0xffffff00
(004) jeq      #0xa000000       jt 16	jf 5
(005) ld       [30]
(006) and      #0xffffff00
(007) jeq      #0xa000000       jt 16	jf 17
(008) jeq      #0x806           jt 10	jf 9
(009) jeq      #0x8035          jt 10	jf 17
(010) ld       [28]
(011) and      #0xffffff00
(012) jeq      #0xa000000       jt 16	jf 13
(013) ld       [38]
(014) and      #0xffffff00
(015) jeq      #0xa000000       jt 16	jf 17
(016) ret      #65535

- BPF-JIT Disabled

real    1m38.005s
user    0m1.510s
sys     0m6.710s

- BPF-JIT Enabled

real    1m35.215s
user    0m1.200s
sys     0m4.140s

Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
---
Changes since v2
- Use cpu_has_mips_r1 instead of CONFIG_CPU_MIPSR1 for runtime MIPS32/64 R1 detection
- Implement missing opcodes
- Remove a few pr_debug statements
- A bit of clean up
---
 arch/mips/Kbuild        |    1 +
 arch/mips/net/Makefile  |    3 +
 arch/mips/net/bpf_jit.c | 1399 +++++++++++++++++++++++++++++++++++++++++++++++
 arch/mips/net/bpf_jit.h |   44 ++
 4 files changed, 1447 insertions(+)
 create mode 100644 arch/mips/net/Makefile
 create mode 100644 arch/mips/net/bpf_jit.c
 create mode 100644 arch/mips/net/bpf_jit.h

diff --git a/arch/mips/Kbuild b/arch/mips/Kbuild
index d2cfe45..9179336 100644
--- a/arch/mips/Kbuild
+++ b/arch/mips/Kbuild
@@ -17,6 +17,7 @@ obj- := $(platform-)
 obj-y += kernel/
 obj-y += mm/
 obj-y += math-emu/
+obj-y += net/
 
 ifdef CONFIG_KVM
 obj-y += kvm/
diff --git a/arch/mips/net/Makefile b/arch/mips/net/Makefile
new file mode 100644
index 0000000..ae74b3a
--- /dev/null
+++ b/arch/mips/net/Makefile
@@ -0,0 +1,3 @@
+# MIPS networking code
+
+obj-$(CONFIG_BPF_JIT) += bpf_jit.o
diff --git a/arch/mips/net/bpf_jit.c b/arch/mips/net/bpf_jit.c
new file mode 100644
index 0000000..864e5b7
--- /dev/null
+++ b/arch/mips/net/bpf_jit.c
@@ -0,0 +1,1399 @@
+/*
+ * Just-In-Time compiler for BPF filters on MIPS
+ *
+ * Copyright (c) 2014 Imagination Technologies Ltd.
+ * Author: Markos Chandras <markos.chandras@imgtec.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; version 2 of the License.
+ */
+
+#include <linux/bitops.h>
+#include <linux/compiler.h>
+#include <linux/errno.h>
+#include <linux/filter.h>
+#include <linux/if_vlan.h>
+#include <linux/kconfig.h>
+#include <linux/moduleloader.h>
+#include <linux/netdevice.h>
+#include <linux/string.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+#include <asm/bitops.h>
+#include <asm/cacheflush.h>
+#include <asm/cpu-features.h>
+#include <asm/uasm.h>
+
+#include "bpf_jit.h"
+
+/* ABI
+ *
+ * s0	1st scratch register
+ * s1	2nd scratch register
+ * s2	offset register
+ * s3	BPF register A
+ * s4	BPF register X
+ * s5	*skb
+ * s6	*scratch memory
+ *
+ * On entry (*bpf_func)(*skb, *filter)
+ * a0 = MIPS_R_A0 = skb;
+ * a1 = MIPS_R_A1 = filter;
+ *
+ * Stack
+ * ...
+ * M[15]
+ * M[14]
+ * M[13]
+ * ...
+ * M[0] <-- r_M
+ * saved reg k-1
+ * saved reg k-2
+ * ...
+ * saved reg 0 <-- r_sp
+ * <no argument area>
+ *
+ *                     Packet layout
+ *
+ * <--------------------- len ------------------------>
+ * <--skb-len(r_skb_hl)-->< ----- skb->data_len ------>
+ * ----------------------------------------------------
+ * |                  skb->data                       |
+ * ----------------------------------------------------
+ */
+
+#define RSIZE	(sizeof(unsigned long))
+#define ptr typeof(unsigned long)
+
+/* ABI specific return values */
+#ifdef CONFIG_32BIT /* O32 */
+#ifdef CONFIG_CPU_LITTLE_ENDIAN
+#define r_err	MIPS_R_V1
+#define r_val	MIPS_R_V0
+#else /* CONFIG_CPU_LITTLE_ENDIAN */
+#define r_err	MIPS_R_V0
+#define r_val	MIPS_R_V1
+#endif
+#else /* N64 */
+#define r_err	MIPS_R_V0
+#define r_val	MIPS_R_V0
+#endif
+
+#define r_ret	MIPS_R_V0
+
+/*
+ * Use 2 scratch registers to avoid pipeline interlocks.
+ * There is no overhead during epilogue and prologue since
+ * any of the $s0-$s6 registers will only be preserved if
+ * they are going to actually be used.
+ */
+#define r_s0		MIPS_R_S0 /* scratch reg 1 */
+#define r_s1		MIPS_R_S1 /* scratch reg 2 */
+#define r_off		MIPS_R_S2
+#define r_A		MIPS_R_S3
+#define r_X		MIPS_R_S4
+#define r_skb		MIPS_R_S5
+#define r_M		MIPS_R_S6
+#define r_tmp_imm	MIPS_R_T6 /* No need to preserve this */
+#define r_tmp		MIPS_R_T7 /* No need to preserve this */
+#define r_zero		MIPS_R_ZERO
+#define r_sp		MIPS_R_SP
+#define r_ra		MIPS_R_RA
+
+#define SCRATCH_OFF(k)		(4 * (k))
+
+/* JIT flags */
+#define SEEN_CALL		(1 << BPF_MEMWORDS)
+#define SEEN_SREG_SFT		(BPF_MEMWORDS + 1)
+#define SEEN_SREG_BASE		(1 << SEEN_SREG_SFT)
+#define SEEN_SREG(x)		(SEEN_SREG_BASE << (x))
+#define SEEN_S0			SEEN_SREG(0)
+#define SEEN_S1			SEEN_SREG(1)
+#define SEEN_OFF		SEEN_SREG(2)
+#define SEEN_A			SEEN_SREG(3)
+#define SEEN_X			SEEN_SREG(4)
+#define SEEN_SKB		SEEN_SREG(5)
+#define SEEN_MEM		SEEN_SREG(6)
+
+/* Arguments used by JIT */
+#define ARGS_USED_BY_JIT	2 /* only applicable to 64-bit */
+
+#define FLAG_NEED_X_RESET	(1 << 0)
+
+#define SBIT(x)			(1 << (x)) /* Signed version of BIT() */
+
+/**
+ * struct jit_ctx - JIT context
+ * @skf:		The sk_filter
+ * @prologue_bytes:	Number of bytes for prologue
+ * @idx:		Instruction index
+ * @flags:		JIT flags
+ * @offsets:		Instruction offsets
+ * @target:		Memory location for the compiled filter
+ */
+struct jit_ctx {
+	const struct sk_filter *skf;
+	unsigned int prologue_bytes;
+	u32 idx;
+	u32 flags;
+	u32 *offsets;
+	u32 *target;
+};
+
+
+static inline int optimize_div(u32 *k)
+{
+	/* power of 2 divides can be implemented with right shift */
+	if (!(*k & (*k-1))) {
+		*k = ilog2(*k);
+		return 1;
+	}
+
+	return 0;
+}
+
+/* Simply emit the instruction if the JIT memory space has been allocated */
+#define emit_instr(ctx, func, ...)			\
+do {							\
+	if ((ctx)->target != NULL) {			\
+		u32 *p = &(ctx)->target[ctx->idx];	\
+		uasm_i_##func(&p, ##__VA_ARGS__);	\
+	}						\
+	(ctx)->idx++;					\
+} while (0)
+
+/* Determine if immediate is within the 16-bit signed range */
+static inline bool is_range16(s32 imm)
+{
+	if (imm >= SBIT(15) || imm < -SBIT(15))
+		return true;
+	return false;
+}
+
+static inline void emit_addu(unsigned int dst, unsigned int src1,
+			     unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, addu, dst, src1, src2);
+}
+
+static inline void emit_nop(struct jit_ctx *ctx)
+{
+	emit_instr(ctx, nop);
+}
+
+/* Load a u32 immediate to a register */
+static inline void emit_load_imm(unsigned int dst, u32 imm, struct jit_ctx *ctx)
+{
+	if (ctx->target != NULL) {
+		/* addiu can only handle s16 */
+		if (is_range16(imm)) {
+			u32 *p = &ctx->target[ctx->idx];
+			uasm_i_lui(&p, r_tmp_imm, (s32)imm >> 16);
+			p = &ctx->target[ctx->idx + 1];
+			uasm_i_ori(&p, dst, r_tmp_imm, imm & 0xffff);
+		} else {
+			u32 *p = &ctx->target[ctx->idx];
+			uasm_i_addiu(&p, dst, r_zero, imm);
+		}
+	}
+	ctx->idx++;
+
+	if (is_range16(imm))
+		ctx->idx++;
+}
+
+static inline void emit_or(unsigned int dst, unsigned int src1,
+			   unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, or, dst, src1, src2);
+}
+
+static inline void emit_ori(unsigned int dst, unsigned src, u32 imm,
+			    struct jit_ctx *ctx)
+{
+	if (imm >= BIT(16)) {
+		emit_load_imm(r_tmp, imm, ctx);
+		emit_or(dst, src, r_tmp, ctx);
+	} else {
+		emit_instr(ctx, ori, dst, src, imm);
+	}
+}
+
+
+static inline void emit_daddu(unsigned int dst, unsigned int src1,
+			      unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, daddu, dst, src1, src2);
+}
+
+static inline void emit_daddiu(unsigned int dst, unsigned int src,
+			       int imm, struct jit_ctx *ctx)
+{
+	/*
+	 * Only used for stack, so the imm is relatively small
+	 * and it fits in 15-bits
+	 */
+	emit_instr(ctx, daddiu, dst, src, imm);
+}
+
+static inline void emit_addiu(unsigned int dst, unsigned int src,
+			      u32 imm, struct jit_ctx *ctx)
+{
+	if (is_range16(imm)) {
+		emit_load_imm(r_tmp, imm, ctx);
+		emit_addu(dst, r_tmp, src, ctx);
+	} else {
+		emit_instr(ctx, addiu, dst, src, imm);
+	}
+}
+
+static inline void emit_and(unsigned int dst, unsigned int src1,
+			    unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, and, dst, src1, src2);
+}
+
+static inline void emit_andi(unsigned int dst, unsigned int src,
+			     u32 imm, struct jit_ctx *ctx)
+{
+	/* If imm does not fit in u16 then load it to register */
+	if (imm >= BIT(16)) {
+		emit_load_imm(r_tmp, imm, ctx);
+		emit_and(dst, src, r_tmp, ctx);
+	} else {
+		emit_instr(ctx, andi, dst, src, imm);
+	}
+}
+
+static inline void emit_xor(unsigned int dst, unsigned int src1,
+			    unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, xor, dst, src1, src2);
+}
+
+static inline void emit_xori(ptr dst, ptr src, u32 imm, struct jit_ctx *ctx)
+{
+	/* If imm does not fit in u16 then load it to register */
+	if (imm >= BIT(16)) {
+		emit_load_imm(r_tmp, imm, ctx);
+		emit_xor(dst, src, r_tmp, ctx);
+	} else {
+		emit_instr(ctx, xori, dst, src, imm);
+	}
+}
+
+static inline void emit_stack_offset(int offset, struct jit_ctx *ctx)
+{
+	if (config_enabled(CONFIG_64BIT))
+		emit_instr(ctx, daddiu, r_sp, r_sp, offset);
+	else
+		emit_instr(ctx, addiu, r_sp, r_sp, offset);
+
+}
+
+static inline void emit_subu(unsigned int dst, unsigned int src1,
+			     unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, subu, dst, src1, src2);
+}
+
+static inline void emit_neg(unsigned int reg, struct jit_ctx *ctx)
+{
+	emit_subu(reg, r_zero, reg, ctx);
+}
+
+static inline void emit_sllv(unsigned int dst, unsigned int src,
+			     unsigned int sa, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, sllv, dst, src, sa);
+}
+
+static inline void emit_sll(unsigned int dst, unsigned int src,
+			    unsigned int sa, struct jit_ctx *ctx)
+{
+	/* sa is 5-bits long */
+	BUG_ON(sa >= BIT(5));
+	emit_instr(ctx, sll, dst, src, sa);
+}
+
+static inline void emit_srlv(unsigned int dst, unsigned int src,
+			     unsigned int sa, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, srlv, dst, src, sa);
+}
+
+static inline void emit_srl(unsigned int dst, unsigned int src,
+			    unsigned int sa, struct jit_ctx *ctx)
+{
+	/* sa is 5-bits long */
+	BUG_ON(sa >= BIT(5));
+	emit_instr(ctx, srl, dst, src, sa);
+}
+
+static inline void emit_sltu(unsigned int dst, unsigned int src1,
+			     unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, sltu, dst, src1, src2);
+}
+
+static inline void emit_sltiu(unsigned dst, unsigned int src,
+			      unsigned int imm, struct jit_ctx *ctx)
+{
+	/* 16 bit immediate */
+	if (is_range16((s32)imm)) {
+		emit_load_imm(r_tmp, imm, ctx);
+		emit_sltu(dst, src, r_tmp, ctx);
+	} else {
+		emit_instr(ctx, sltiu, dst, src, imm);
+	}
+
+}
+
+/* Store register on the stack */
+static inline void emit_store_stack_reg(ptr reg, ptr base,
+					unsigned int offset,
+					struct jit_ctx *ctx)
+{
+	if (config_enabled(CONFIG_64BIT))
+		emit_instr(ctx, sd, reg, offset, base);
+	else
+		emit_instr(ctx, sw, reg, offset, base);
+}
+
+static inline void emit_store(ptr reg, ptr base, unsigned int offset,
+			      struct jit_ctx *ctx)
+{
+	emit_instr(ctx, sw, reg, offset, base);
+}
+
+static inline void emit_load_stack_reg(ptr reg, ptr base,
+				       unsigned int offset,
+				       struct jit_ctx *ctx)
+{
+	if (config_enabled(CONFIG_64BIT))
+		emit_instr(ctx, ld, reg, offset, base);
+	else
+		emit_instr(ctx, lw, reg, offset, base);
+}
+
+static inline void emit_load(unsigned int reg, unsigned int base,
+			     unsigned int offset, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, lw, reg, offset, base);
+}
+
+static inline void emit_load_byte(unsigned int reg, unsigned int base,
+				  unsigned int offset, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, lb, reg, offset, base);
+}
+
+static inline void emit_half_load(unsigned int reg, unsigned int base,
+				  unsigned int offset, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, lh, reg, offset, base);
+}
+
+static inline void emit_mul(unsigned int dst, unsigned int src1,
+			    unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, mul, dst, src1, src2);
+}
+
+static inline void emit_div(unsigned int dst, unsigned int src,
+			    struct jit_ctx *ctx)
+{
+	if (ctx->target != NULL) {
+		u32 *p = &ctx->target[ctx->idx];
+		uasm_i_divu(&p, dst, src);
+		p = &ctx->target[ctx->idx + 1];
+		uasm_i_mfhi(&p, dst);
+	}
+	ctx->idx += 2; /* 2 insts */
+}
+
+static inline void emit_mod(unsigned int dst, unsigned int src,
+			    struct jit_ctx *ctx)
+{
+	if (ctx->target != NULL) {
+		u32 *p = &ctx->target[ctx->idx];
+		uasm_i_divu(&p, dst, src);
+		p = &ctx->target[ctx->idx + 1];
+		uasm_i_mflo(&p, dst);
+	}
+	ctx->idx += 2; /* 2 insts */
+}
+
+static inline void emit_dsll(unsigned int dst, unsigned int src,
+			     unsigned int sa, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, dsll, dst, src, sa);
+}
+
+static inline void emit_dsrl32(unsigned int dst, unsigned int src,
+			       unsigned int sa, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, dsrl32, dst, src, sa);
+}
+
+static inline void emit_wsbh(unsigned int dst, unsigned int src,
+			     struct jit_ctx *ctx)
+{
+	emit_instr(ctx, wsbh, dst, src);
+}
+
+/* load a function pointer to register */
+static inline void emit_load_func(unsigned int reg, ptr imm,
+				  struct jit_ctx *ctx)
+{
+	if (config_enabled(CONFIG_64BIT)) {
+		/* At this point imm is always 64-bit */
+		emit_load_imm(r_tmp, (u64)imm >> 32, ctx);
+		emit_dsll(r_tmp_imm, r_tmp, 16, ctx); /* left shift by 16 */
+		emit_ori(r_tmp, r_tmp_imm, (imm >> 16) & 0xffff, ctx);
+		emit_dsll(r_tmp_imm, r_tmp, 16, ctx); /* left shift by 16 */
+		emit_ori(reg, r_tmp_imm, imm & 0xffff, ctx);
+	} else {
+		emit_load_imm(reg, imm, ctx);
+	}
+}
+
+/* Move to real MIPS register */
+static inline void emit_reg_move(ptr dst, ptr src, struct jit_ctx *ctx)
+{
+	if (config_enabled(CONFIG_64BIT))
+		emit_daddu(dst, src, r_zero, ctx);
+	else
+		emit_addu(dst, src, r_zero, ctx);
+}
+
+/* Move to JIT (32-bit) register */
+static inline void emit_jit_reg_move(ptr dst, ptr src, struct jit_ctx *ctx)
+{
+	emit_addu(dst, src, r_zero, ctx);
+}
+
+/* Compute the immediate value for PC-relative branches. */
+static inline u32 b_imm(unsigned int tgt, struct jit_ctx *ctx)
+{
+	if (ctx->target == NULL)
+		return 0;
+
+	/*
+	 * We want a pc-relative branch. We only do forward branches
+	 * so tgt is always after pc. tgt is the instruction offset
+	 * we want to jump to.
+
+	 * Branch on MIPS:
+	 * I: target_offset <- sign_extend(offset)
+	 * I+1: PC += target_offset (delay slot)
+	 *
+	 * ctx->idx currently points to the branch instruction
+	 * but the offset is added to the delay slot so we need
+	 * to subtract 4.
+	 */
+	return ctx->offsets[tgt] -
+		(ctx->idx * 4 - ctx->prologue_bytes) - 4;
+}
+
+static inline void emit_bcond(int cond, unsigned int reg1, unsigned int reg2,
+			     unsigned int imm, struct jit_ctx *ctx)
+{
+	if (ctx->target != NULL) {
+		u32 *p = &ctx->target[ctx->idx];
+
+		switch (cond) {
+		case MIPS_COND_EQ:
+			uasm_i_beq(&p, reg1, reg2, imm);
+			break;
+		case MIPS_COND_NE:
+			uasm_i_bne(&p, reg1, reg2, imm);
+			break;
+		case MIPS_COND_ALL:
+			uasm_i_b(&p, imm);
+			break;
+		default:
+			pr_warn("%s: Unhandled branch conditional: %d\n",
+				__func__, cond);
+		}
+	}
+	ctx->idx++;
+}
+
+static inline void emit_b(unsigned int imm, struct jit_ctx *ctx)
+{
+	emit_bcond(MIPS_COND_ALL, r_zero, r_zero, imm, ctx);
+}
+
+static inline void emit_jalr(unsigned int link, unsigned int reg,
+			     struct jit_ctx *ctx)
+{
+	emit_instr(ctx, jalr, link, reg);
+}
+
+static inline void emit_jr(unsigned int reg, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, jr, reg);
+}
+
+static inline u16 align_sp(unsigned int num)
+{
+	/* Double word alignment for 32-bit, quadword for 64-bit */
+	unsigned int align = config_enabled(CONFIG_64BIT) ? 16 : 8;
+	num = (num + (align - 1)) & -align;
+	return num;
+}
+
+static inline void update_on_xread(struct jit_ctx *ctx)
+{
+	if (!(ctx->flags & SEEN_X))
+		ctx->flags |= FLAG_NEED_X_RESET;
+
+	ctx->flags |= SEEN_X;
+}
+
+static bool is_load_to_a(u16 inst)
+{
+	switch (inst) {
+	case BPF_S_LD_W_LEN:
+	case BPF_S_LD_W_ABS:
+	case BPF_S_LD_H_ABS:
+	case BPF_S_LD_B_ABS:
+	case BPF_S_ANC_CPU:
+	case BPF_S_ANC_IFINDEX:
+	case BPF_S_ANC_MARK:
+	case BPF_S_ANC_PROTOCOL:
+	case BPF_S_ANC_RXHASH:
+	case BPF_S_ANC_VLAN_TAG:
+	case BPF_S_ANC_VLAN_TAG_PRESENT:
+	case BPF_S_ANC_QUEUE:
+		return true;
+	default:
+		return false;
+	}
+}
+
+static void save_bpf_jit_regs(struct jit_ctx *ctx, unsigned offset)
+{
+	int i = 0, real_off = 0;
+	u32 sflags, tmp_flags;
+
+	/* Adjust the stack pointer */
+	emit_stack_offset(-align_sp(offset), ctx);
+
+	if (ctx->flags & SEEN_CALL) {
+		/* Argument save area */
+		if (config_enabled(CONFIG_64BIT))
+			/* Bottom of current frame */
+			real_off = align_sp(offset) - RSIZE;
+		else
+			/* Top of previous frame */
+			real_off = align_sp(offset) + RSIZE;
+		emit_store_stack_reg(MIPS_R_A0, r_sp, real_off, ctx);
+		emit_store_stack_reg(MIPS_R_A1, r_sp, real_off + RSIZE, ctx);
+
+		real_off = 0;
+	}
+
+	tmp_flags = sflags = ctx->flags >> SEEN_SREG_SFT;
+	/* sflags is essentially a bitmap */
+	while (tmp_flags) {
+		if ((sflags >> i) & 0x1) {
+			emit_store_stack_reg(MIPS_R_S0 + i, r_sp, real_off,
+					     ctx);
+			real_off += RSIZE;
+		}
+		i++;
+		tmp_flags >>= 1;
+	}
+
+	/* save return address */
+	if (ctx->flags & SEEN_CALL) {
+		emit_store_stack_reg(r_ra, r_sp, real_off, ctx);
+		real_off += RSIZE;
+	}
+
+	/* Setup r_M leaving the alignment gap if necessary */
+	if (ctx->flags & SEEN_MEM) {
+		if (real_off % (RSIZE * 2))
+			real_off += RSIZE;
+		emit_addiu(r_M, r_sp, real_off, ctx);
+	}
+}
+
+static void restore_bpf_jit_regs(struct jit_ctx *ctx,
+				 unsigned int offset)
+{
+	int i, real_off = 0;
+	u32 sflags, tmp_flags;
+
+	if (ctx->flags & SEEN_CALL) {
+		if (config_enabled(CONFIG_64BIT))
+			/* Bottom of current frame */
+			real_off = align_sp(offset) - RSIZE;
+		else
+			/* Top of previous frame */
+			real_off = align_sp(offset) + RSIZE;
+		emit_load_stack_reg(MIPS_R_A0, r_sp, real_off, ctx);
+		emit_load_stack_reg(MIPS_R_A1, r_sp, real_off + RSIZE, ctx);
+
+		real_off = 0;
+	}
+
+	tmp_flags = sflags = ctx->flags >> SEEN_SREG_SFT;
+	/* sflags is a bitmap */
+	i = 0;
+	while (tmp_flags) {
+		if ((sflags >> i) & 0x1) {
+			emit_load_stack_reg(MIPS_R_S0 + i, r_sp, real_off,
+					    ctx);
+			real_off += RSIZE;
+		}
+		i++;
+		tmp_flags >>= 1;
+	}
+
+	/* restore return address */
+	if (ctx->flags & SEEN_CALL)
+		emit_load_stack_reg(r_ra, r_sp, real_off, ctx);
+
+	/* Restore the sp and discard the scrach memory */
+	emit_stack_offset(align_sp(offset), ctx);
+}
+
+static unsigned int get_stack_depth(struct jit_ctx *ctx)
+{
+	int sp_off = 0;
+
+
+	/* How may s* regs do we need to preserved? */
+	sp_off += hweight32(ctx->flags >> SEEN_SREG_SFT) * RSIZE;
+
+	if (ctx->flags & SEEN_MEM)
+		sp_off += 4 * BPF_MEMWORDS; /* BPF_MEMWORDS are 32-bit */
+
+	if (ctx->flags & SEEN_CALL)
+		/*
+		 * The JIT code make calls to external functions using 2
+		 * arguments. Therefore, for o32 we don't need to allocate
+		 * space because we don't care if the argumetns are lost
+		 * across calls. We do need however to preserve incoming
+		 * arguments but the space is already allocated for us by
+		 * the caller. On the other hand, for n64, we need to allocate
+		 * this space ourselves. We need to preserve $ra as well.
+		 */
+		sp_off += config_enabled(CONFIG_64BIT) ?
+			(ARGS_USED_BY_JIT + 1) * RSIZE : RSIZE;
+
+	/*
+	 * Subtract the bytes for the last registers since we only care about
+	 * the location on the stack pointer.
+	 */
+	return sp_off - RSIZE;
+}
+
+static void build_prologue(struct jit_ctx *ctx)
+{
+	u16 first_inst = ctx->skf->insns[0].code;
+	int sp_off;
+
+	/* Calculate the total offset for the stack pointer */
+	sp_off = get_stack_depth(ctx);
+	save_bpf_jit_regs(ctx, sp_off);
+
+	if (ctx->flags & SEEN_SKB)
+		emit_reg_move(r_skb, MIPS_R_A0, ctx);
+
+	if (ctx->flags & FLAG_NEED_X_RESET)
+		emit_jit_reg_move(r_X, r_zero, ctx);
+
+	/* Do not leak kernel data to userspace */
+	if ((first_inst != BPF_S_RET_K) && !(is_load_to_a(first_inst)))
+		emit_jit_reg_move(r_A, r_zero, ctx);
+}
+
+static void build_epilogue(struct jit_ctx *ctx)
+{
+	unsigned int sp_off;
+
+	/* Calculate the total offset for the stack pointer */
+
+	sp_off = get_stack_depth(ctx);
+	restore_bpf_jit_regs(ctx, sp_off);
+
+	/* Return */
+	emit_jr(r_ra, ctx);
+	emit_nop(ctx);
+}
+
+static u64 jit_get_skb_b(struct sk_buff *skb, unsigned offset)
+{
+	u8 ret;
+	int err;
+
+	err = skb_copy_bits(skb, offset, &ret, 1);
+
+	return (u64)err << 32 | ret;
+}
+
+static u64 jit_get_skb_h(struct sk_buff *skb, unsigned offset)
+{
+	u16 ret;
+	int err;
+
+	err = skb_copy_bits(skb, offset, &ret, 2);
+
+	return (u64)err << 32 | ntohs(ret);
+}
+
+static u64 jit_get_skb_w(struct sk_buff *skb, unsigned offset)
+{
+	u32 ret;
+	int err;
+
+	err = skb_copy_bits(skb, offset, &ret, 4);
+
+	return (u64)err << 32 | ntohl(ret);
+}
+
+#define PKT_TYPE_MAX 7
+static int pkt_type_offset(void)
+{
+	struct sk_buff skb_probe = {
+		.pkt_type = ~0,
+	};
+	char *ct = (char *)&skb_probe;
+	unsigned int off;
+
+	for (off = 0; off < sizeof(struct sk_buff); off++) {
+		if (ct[off] == PKT_TYPE_MAX)
+			return off;
+	}
+	pr_err_once("Please fix pkt_type_offset(), as pkt_type couldn't be found\n");
+	return -1;
+}
+
+static int build_body(struct jit_ctx *ctx)
+{
+	void *load_func[] = {jit_get_skb_b, jit_get_skb_h, jit_get_skb_w};
+	const struct sk_filter *prog = ctx->skf;
+	const struct sock_filter *inst;
+	unsigned int i, off, load_order, condt;
+	u32 k, b_off __maybe_unused;
+
+	for (i = 0; i < prog->len; i++) {
+		inst = &(prog->insns[i]);
+		pr_debug("%s: code->0x%02x, jt->0x%x, jf->0x%x, k->0x%x\n",
+			 __func__, inst->code, inst->jt, inst->jf, inst->k);
+		k = inst->k;
+
+		if (ctx->target == NULL)
+			ctx->offsets[i] = ctx->idx * 4;
+
+		switch (inst->code) {
+		case BPF_S_LD_IMM:
+			/* A <- k ==> li r_A, k */
+			ctx->flags |= SEEN_A;
+			emit_load_imm(r_A, k, ctx);
+			break;
+		case BPF_S_LD_W_LEN:
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, len) != 4);
+			/* A <- len ==> lw r_A, offset(skb) */
+			ctx->flags |= SEEN_SKB | SEEN_A;
+			off = offsetof(struct sk_buff, len);
+			emit_load(r_A, r_skb, off, ctx);
+			break;
+		case BPF_S_LD_MEM:
+			/* A <- M[k] ==> lw r_A, offset(M) */
+			ctx->flags |= SEEN_MEM | SEEN_A;
+			emit_load(r_A, r_M, SCRATCH_OFF(k), ctx);
+			break;
+		case BPF_S_LD_W_ABS:
+			/* A <- P[k:4] */
+			load_order = 2;
+			goto load;
+		case BPF_S_LD_H_ABS:
+			/* A <- P[k:2] */
+			load_order = 1;
+			goto load;
+		case BPF_S_LD_B_ABS:
+			/* A <- P[k:1] */
+			load_order = 0;
+load:
+			emit_load_imm(r_off, k, ctx);
+load_common:
+			ctx->flags |= SEEN_CALL | SEEN_OFF | SEEN_S0 |
+				SEEN_SKB | SEEN_A;
+
+			emit_load_func(r_s0, (ptr)load_func[load_order],
+				      ctx);
+			emit_reg_move(MIPS_R_A0, r_skb, ctx);
+			emit_jalr(MIPS_R_RA, r_s0, ctx);
+			/* Load second argument to delay slot */
+			emit_reg_move(MIPS_R_A1, r_off, ctx);
+			/* Check the error value */
+			if (config_enabled(CONFIG_64BIT)) {
+				/* Get error code from the top 32-bits */
+				emit_dsrl32(r_s0, r_val, 0, ctx);
+				/* Branch to 3 instructions ahead */
+				emit_bcond(MIPS_COND_NE, r_s0, r_zero, 3 << 2,
+					   ctx);
+			} else {
+				/* Branch to 3 instructions ahead */
+				emit_bcond(MIPS_COND_NE, r_err, r_zero, 3 << 2,
+					   ctx);
+			}
+			emit_nop(ctx);
+			/* We are good */
+			emit_b(b_imm(i + 1, ctx), ctx);
+			emit_jit_reg_move(r_A, r_val, ctx);
+			/* Return with error */
+			emit_b(b_imm(prog->len, ctx), ctx);
+			emit_reg_move(r_ret, r_zero, ctx);
+			break;
+		case BPF_S_LD_W_IND:
+			/* A <- P[X + k:4] */
+			load_order = 2;
+			goto load_ind;
+		case BPF_S_LD_H_IND:
+			/* A <- P[X + k:2] */
+			load_order = 1;
+			goto load_ind;
+		case BPF_S_LD_B_IND:
+			/* A <- P[X + k:1] */
+			load_order = 0;
+load_ind:
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_OFF | SEEN_X;
+			emit_addiu(r_off, r_X, k, ctx);
+			goto load_common;
+		case BPF_S_LDX_IMM:
+			/* X <- k */
+			ctx->flags |= SEEN_X;
+			emit_load_imm(r_X, k, ctx);
+			break;
+		case BPF_S_LDX_MEM:
+			/* X <- M[k] */
+			ctx->flags |= SEEN_X | SEEN_MEM;
+			emit_load(r_X, r_M, SCRATCH_OFF(k), ctx);
+			break;
+		case BPF_S_LDX_W_LEN:
+			/* X <- len */
+			ctx->flags |= SEEN_X | SEEN_SKB;
+			off = offsetof(struct sk_buff, len);
+			emit_load(r_X, r_skb, off, ctx);
+			break;
+		case BPF_S_LDX_B_MSH:
+			/* X <- 4 * (P[k:1] & 0xf) */
+			ctx->flags |= SEEN_X | SEEN_CALL | SEEN_S0 | SEEN_SKB;
+			/* Load offset to a1 */
+			emit_load_func(r_s0, (ptr)jit_get_skb_b, ctx);
+			/*
+			 * This may emit two instructions so it may not fit
+			 * in the delay slot. So use a0 in the delay slot.
+			 */
+			emit_load_imm(MIPS_R_A1, k, ctx);
+			emit_jalr(MIPS_R_RA, r_s0, ctx);
+			emit_reg_move(MIPS_R_A0, r_skb, ctx); /* delay slot */
+			/* Check the error value */
+			if (config_enabled(CONFIG_64BIT)) {
+				/* Top 32-bits of $v0 on 64-bit */
+				emit_dsrl32(r_s0, r_val, 0, ctx);
+				emit_bcond(MIPS_COND_NE, r_s0, r_zero,
+					   3 << 2, ctx);
+			} else {
+				emit_bcond(MIPS_COND_NE, r_err, r_zero,
+					   3 << 2, ctx);
+			}
+			/* No need for delay slot */
+			/* We are good */
+			/* X <- P[1:K] & 0xf */
+			emit_andi(r_X, r_val, 0xf, ctx);
+			/* X << 2 */
+			emit_b(b_imm(i + 1, ctx), ctx);
+			emit_sll(r_X, r_X, 2, ctx); /* delay slot */
+			/* Return with error */
+			emit_b(b_imm(prog->len, ctx), ctx);
+			emit_load_imm(r_ret, 0, ctx); /* delay slot */
+			break;
+		case BPF_S_ST:
+			/* M[k] <- A */
+			ctx->flags |= SEEN_MEM | SEEN_A;
+			emit_store(r_A, r_M, SCRATCH_OFF(k), ctx);
+			break;
+		case BPF_S_STX:
+			/* M[k] <- X */
+			ctx->flags |= SEEN_MEM | SEEN_X;
+			emit_store(r_X, r_M, SCRATCH_OFF(k), ctx);
+			break;
+		case BPF_S_ALU_ADD_K:
+			/* A += K */
+			ctx->flags |= SEEN_A;
+			emit_addiu(r_A, r_A, k, ctx);
+			break;
+		case BPF_S_ALU_ADD_X:
+			/* A += X */
+			ctx->flags |= SEEN_A | SEEN_X;
+			emit_addu(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_SUB_K:
+			/* A -= K */
+			ctx->flags |= SEEN_A;
+			emit_addiu(r_A, r_A, -k, ctx);
+			break;
+		case BPF_S_ALU_SUB_X:
+			/* A -= X */
+			ctx->flags |= SEEN_A | SEEN_X;
+			emit_subu(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_MUL_K:
+			/* A *= K */
+			/* Load K to scratch register before MUL */
+			ctx->flags |= SEEN_A | SEEN_S0;
+			emit_load_imm(r_s0, k, ctx);
+			emit_mul(r_A, r_A, r_s0, ctx);
+			break;
+		case BPF_S_ALU_MUL_X:
+			/* A *= X */
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_A | SEEN_X;
+			emit_mul(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_DIV_K:
+			/* A /= k */
+			if (k == 1)
+				break;
+			if (optimize_div(&k)) {
+				ctx->flags |= SEEN_A;
+				emit_srl(r_A, r_A, k, ctx);
+				break;
+			}
+			ctx->flags |= SEEN_A | SEEN_S0;
+			emit_load_imm(r_s0, k, ctx);
+			emit_div(r_A, r_s0, ctx);
+			break;
+		case BPF_S_ALU_MOD_K:
+			/* A %= k */
+			if (k == 1 || optimize_div(&k)) {
+				ctx->flags |= SEEN_A;
+				emit_jit_reg_move(r_A, r_zero, ctx);
+			} else {
+				ctx->flags |= SEEN_A | SEEN_S0;
+				emit_load_imm(r_s0, k, ctx);
+				emit_mod(r_A, r_s0, ctx);
+			}
+			break;
+		case BPF_S_ALU_DIV_X:
+			/* A /= X */
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_X | SEEN_A;
+			/* Check if r_X is zero */
+			emit_bcond(MIPS_COND_EQ, r_X, r_zero,
+				   b_imm(prog->len, ctx), ctx);
+			emit_load_imm(r_val, 0, ctx); /* delay slot */
+			emit_div(r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_MOD_X:
+			/* A %= X */
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_X | SEEN_A;
+			/* Check if r_X is zero */
+			emit_bcond(MIPS_COND_EQ, r_X, r_zero,
+				   b_imm(prog->len, ctx), ctx);
+			emit_load_imm(r_val, 0, ctx); /* delay slot */
+			emit_mod(r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_OR_K:
+			/* A |= K */
+			ctx->flags |= SEEN_A;
+			emit_ori(r_A, r_A, k, ctx);
+			break;
+		case BPF_S_ALU_OR_X:
+			/* A |= X */
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_A;
+			emit_ori(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_XOR_K:
+			/* A ^= k */
+			ctx->flags |= SEEN_A;
+			emit_xori(r_A, r_A, k, ctx);
+			break;
+		case BPF_S_ANC_ALU_XOR_X:
+		case BPF_S_ALU_XOR_X:
+			/* A ^= X */
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_A;
+			emit_xor(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_AND_K:
+			/* A &= K */
+			ctx->flags |= SEEN_A;
+			emit_andi(r_A, r_A, k, ctx);
+			break;
+		case BPF_S_ALU_AND_X:
+			/* A &= X */
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_A | SEEN_X;
+			emit_and(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_LSH_K:
+			/* A <<= K */
+			ctx->flags |= SEEN_A;
+			emit_sll(r_A, r_A, k, ctx);
+			break;
+		case BPF_S_ALU_LSH_X:
+			/* A <<= X */
+			ctx->flags |= SEEN_A | SEEN_X;
+			update_on_xread(ctx);
+			emit_sllv(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_RSH_K:
+			/* A >>= K */
+			ctx->flags |= SEEN_A;
+			emit_srl(r_A, r_A, k, ctx);
+			break;
+		case BPF_S_ALU_RSH_X:
+			ctx->flags |= SEEN_A | SEEN_X;
+			update_on_xread(ctx);
+			emit_srlv(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_NEG:
+			/* A = -A */
+			ctx->flags |= SEEN_A;
+			emit_neg(r_A, ctx);
+			break;
+		case BPF_S_JMP_JA:
+			/* pc += K */
+			emit_b(b_imm(i + k + 1, ctx), ctx);
+			emit_nop(ctx);
+			break;
+		case BPF_S_JMP_JEQ_K:
+			/* pc += ( A == K ) ? pc->jt : pc->jf */
+			condt = MIPS_COND_EQ | MIPS_COND_K;
+			goto jmp_cmp;
+		case BPF_S_JMP_JEQ_X:
+			ctx->flags |= SEEN_X;
+			/* pc += ( A == X ) ? pc->jt : pc->jf */
+			condt = MIPS_COND_EQ | MIPS_COND_X;
+			goto jmp_cmp;
+		case BPF_S_JMP_JGE_K:
+			/* pc += ( A >= K ) ? pc->jt : pc->jf */
+			condt = MIPS_COND_GE | MIPS_COND_K;
+			goto jmp_cmp;
+		case BPF_S_JMP_JGE_X:
+			ctx->flags |= SEEN_X;
+			/* pc += ( A >= X ) ? pc->jt : pc->jf */
+			condt = MIPS_COND_GE | MIPS_COND_X;
+			goto jmp_cmp;
+		case BPF_S_JMP_JGT_K:
+			/* pc += ( A > K ) ? pc->jt : pc->jf */
+			condt = MIPS_COND_GT | MIPS_COND_K;
+			goto jmp_cmp;
+		case BPF_S_JMP_JGT_X:
+			ctx->flags |= SEEN_X;
+			/* pc += ( A > X ) ? pc->jt : pc->jf */
+			condt = MIPS_COND_GT | MIPS_COND_X;
+jmp_cmp:
+			/* Greater or Equal */
+			if ((condt & MIPS_COND_GE) ||
+			    (condt & MIPS_COND_GT)) {
+				if (condt & MIPS_COND_K) { /* K */
+					ctx->flags |= SEEN_S0 | SEEN_A;
+					emit_sltiu(r_s0, r_A, k, ctx);
+				} else { /* X */
+					ctx->flags |= SEEN_S0 | SEEN_A |
+						SEEN_X;
+					emit_sltu(r_s0, r_A, r_X, ctx);
+				}
+				/* A < (K|X) ? r_scrach = 1 */
+				b_off = b_imm(i + inst->jf + 1, ctx);
+				emit_bcond(MIPS_COND_GT, r_s0, r_zero, b_off,
+					   ctx);
+				emit_nop(ctx);
+				/* A > (K|X) ? scratch = 0 */
+				if (condt & MIPS_COND_GT) {
+					/* Checking for equality */
+					ctx->flags |= SEEN_S0 | SEEN_A | SEEN_X;
+					if (condt & MIPS_COND_K)
+						emit_load_imm(r_s0, k, ctx);
+					else
+						emit_jit_reg_move(r_s0, r_X,
+								  ctx);
+					b_off = b_imm(i + inst->jf + 1, ctx);
+					emit_bcond(MIPS_COND_EQ, r_A, r_s0,
+						   b_off, ctx);
+					emit_nop(ctx);
+					/* Finally, A > K|X */
+					b_off = b_imm(i + inst->jt + 1, ctx);
+					emit_b(b_off, ctx);
+					emit_nop(ctx);
+				} else {
+					/* A >= (K|X) so jump */
+					b_off = b_imm(i + inst->jt + 1, ctx);
+					emit_b(b_off, ctx);
+					emit_nop(ctx);
+				}
+			} else {
+				/* A == K|X */
+				if (condt & MIPS_COND_K) { /* K */
+					ctx->flags |= SEEN_S0 | SEEN_A;
+					emit_load_imm(r_s0, k, ctx);
+					/* jump true */
+					b_off = b_imm(i + inst->jt + 1, ctx);
+					emit_bcond(MIPS_COND_EQ, r_A, r_s0,
+						   b_off, ctx);
+					emit_nop(ctx);
+					/* jump false */
+					b_off = b_imm(i + inst->jf + 1,
+						      ctx);
+					emit_bcond(MIPS_COND_NE, r_A, r_s0,
+						   b_off, ctx);
+					emit_nop(ctx);
+				} else { /* X */
+					/* jump true */
+					ctx->flags |= SEEN_A | SEEN_X;
+					b_off = b_imm(i + inst->jt + 1,
+						      ctx);
+					emit_bcond(MIPS_COND_EQ, r_A, r_X,
+						   b_off, ctx);
+					emit_nop(ctx);
+					/* jump false */
+					b_off = b_imm(i + inst->jf + 1, ctx);
+					emit_bcond(MIPS_COND_NE, r_A, r_X,
+						   b_off, ctx);
+					emit_nop(ctx);
+				}
+			}
+			break;
+		case BPF_S_JMP_JSET_K:
+			ctx->flags |= SEEN_S0 | SEEN_S1 | SEEN_A;
+			/* pc += (A & K) ? pc -> jt : pc -> jf */
+			emit_load_imm(r_s1, k, ctx);
+			emit_and(r_s0, r_A, r_s1, ctx);
+			/* jump true */
+			b_off = b_imm(i + inst->jt + 1, ctx);
+			emit_bcond(MIPS_COND_NE, r_s0, r_zero, b_off, ctx);
+			emit_nop(ctx);
+			/* jump false */
+			b_off = b_imm(i + inst->jf + 1, ctx);
+			emit_b(b_off, ctx);
+			emit_nop(ctx);
+			break;
+		case BPF_S_JMP_JSET_X:
+			ctx->flags |= SEEN_S0 | SEEN_X | SEEN_A;
+			/* pc += (A & X) ? pc -> jt : pc -> jf */
+			emit_and(r_s0, r_A, r_X, ctx);
+			/* jump true */
+			b_off = b_imm(i + inst->jt + 1, ctx);
+			emit_bcond(MIPS_COND_NE, r_s0, r_zero, b_off, ctx);
+			emit_nop(ctx);
+			/* jump false */
+			b_off = b_imm(i + inst->jf + 1, ctx);
+			emit_b(b_off, ctx);
+			emit_nop(ctx);
+			break;
+		case BPF_S_RET_A:
+			ctx->flags |= SEEN_A;
+			if (i != prog->len - 1)
+				/*
+				 * If this is not the last instruction
+				 * then jump to the epilogue
+				 */
+				emit_b(b_imm(prog->len, ctx), ctx);
+			emit_reg_move(r_ret, r_A, ctx); /* delay slot */
+			break;
+		case BPF_S_RET_K:
+			/*
+			 * It can emit two instructions so it does not fit on
+			 * the delay slot.
+			 */
+			emit_load_imm(r_ret, k, ctx);
+			if (i != prog->len - 1) {
+				/*
+				 * If this is not the last instruction
+				 * then jump to the epilogue
+				 */
+				emit_b(b_imm(prog->len, ctx), ctx);
+				emit_nop(ctx);
+			}
+			break;
+		case BPF_S_MISC_TAX:
+			/* X = A */
+			ctx->flags |= SEEN_X | SEEN_A;
+			emit_jit_reg_move(r_X, r_A, ctx);
+			break;
+		case BPF_S_MISC_TXA:
+			/* A = X */
+			ctx->flags |= SEEN_A | SEEN_X;
+			update_on_xread(ctx);
+			emit_jit_reg_move(r_A, r_X, ctx);
+			break;
+		/* AUX */
+		case BPF_S_ANC_PROTOCOL:
+			/* A = ntohs(skb->protocol */
+			ctx->flags |= SEEN_SKB | SEEN_OFF | SEEN_A;
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff,
+						  protocol) != 2);
+			off = offsetof(struct sk_buff, protocol);
+			emit_half_load(r_A, r_skb, off, ctx);
+#ifdef CONFIG_CPU_LITTLE_ENDIAN
+			/* This needs little endian fixup */
+			if (cpu_has_mips_r1) {
+				/* Get first byte */
+				emit_andi(r_tmp_imm, r_A, 0xff, ctx);
+				/* Shift it */
+				emit_sll(r_tmp, r_tmp_imm, 8, ctx);
+				/* Get second byte */
+				emit_srl(r_tmp_imm, r_A, 8, ctx);
+				emit_andi(r_tmp_imm, r_tmp_imm, 0xff, ctx);
+				/* Put everyting together in r_A */
+				emit_or(r_A, r_tmp, r_tmp_imm, ctx);
+			} else {
+				/* R2 and later have the wsbh instruction */
+				emit_wsbh(r_A, r_A, ctx);
+			}
+#endif
+			break;
+		case BPF_S_ANC_CPU:
+			ctx->flags |= SEEN_A | SEEN_OFF;
+			/* A = current_thread_info()->cpu */
+			BUILD_BUG_ON(FIELD_SIZEOF(struct thread_info,
+						  cpu) != 4);
+			off = offsetof(struct thread_info, cpu);
+			/* $28/gp points to the thread_info struct */
+			emit_load(r_A, 28, off, ctx);
+			break;
+		case BPF_S_ANC_IFINDEX:
+			/* A = skb->dev->ifindex */
+			ctx->flags |= SEEN_SKB | SEEN_A | SEEN_S0;
+			off = offsetof(struct sk_buff, dev);
+			emit_load(r_s0, r_skb, off, ctx);
+			/* error (0) in the delay slot */
+			emit_bcond(MIPS_COND_EQ, r_s0, r_zero,
+				   b_imm(prog->len, ctx), ctx);
+			emit_reg_move(r_ret, r_zero, ctx);
+			BUILD_BUG_ON(FIELD_SIZEOF(struct net_device,
+						  ifindex) != 4);
+			off = offsetof(struct net_device, ifindex);
+			emit_load(r_A, r_s0, off, ctx);
+			break;
+		case BPF_S_ANC_MARK:
+			ctx->flags |= SEEN_SKB | SEEN_A;
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, mark) != 4);
+			off = offsetof(struct sk_buff, mark);
+			emit_load(r_A, r_skb, off, ctx);
+			break;
+		case BPF_S_ANC_RXHASH:
+			ctx->flags |= SEEN_SKB | SEEN_A;
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, hash) != 4);
+			off = offsetof(struct sk_buff, hash);
+			emit_load(r_A, r_skb, off, ctx);
+			break;
+		case BPF_S_ANC_VLAN_TAG:
+		case BPF_S_ANC_VLAN_TAG_PRESENT:
+			ctx->flags |= SEEN_SKB | SEEN_S0 | SEEN_A;
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff,
+						  vlan_tci) != 2);
+			off = offsetof(struct sk_buff, vlan_tci);
+			emit_half_load(r_s0, r_skb, off, ctx);
+			if (inst->code == BPF_S_ANC_VLAN_TAG)
+				emit_and(r_A, r_s0, VLAN_VID_MASK, ctx);
+			else
+				emit_and(r_A, r_s0, VLAN_TAG_PRESENT, ctx);
+			break;
+		case BPF_S_ANC_PKTTYPE:
+			off = pkt_type_offset();
+
+			if (off < 0)
+				return -1;
+			emit_load_byte(r_tmp, r_skb, off, ctx);
+			/* Keep only the last 3 bits */
+			emit_andi(r_A, r_tmp, PKT_TYPE_MAX, ctx);
+			break;
+		case BPF_S_ANC_QUEUE:
+			ctx->flags |= SEEN_SKB | SEEN_A;
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff,
+						  queue_mapping) != 2);
+			BUILD_BUG_ON(offsetof(struct sk_buff,
+					      queue_mapping) > 0xff);
+			off = offsetof(struct sk_buff, queue_mapping);
+			emit_half_load(r_A, r_skb, off, ctx);
+			break;
+		default:
+			pr_warn("%s: Unhandled opcode: 0x%02x\n", __FILE__,
+				inst->code);
+			return -1;
+		}
+	}
+
+	/* compute offsets only during the first pass */
+	if (ctx->target == NULL)
+		ctx->offsets[i] = ctx->idx * 4;
+
+	return 0;
+}
+
+int bpf_jit_enable __read_mostly;
+
+void bpf_jit_compile(struct sk_filter *fp)
+{
+	struct jit_ctx ctx;
+	unsigned int alloc_size, tmp_idx;
+
+	if (!bpf_jit_enable)
+		return;
+
+	memset(&ctx, 0, sizeof(ctx));
+
+	ctx.offsets = kcalloc(fp->len, sizeof(*ctx.offsets), GFP_KERNEL);
+	if (ctx.offsets == NULL)
+		return;
+
+	ctx.skf = fp;
+
+	if (build_body(&ctx))
+		goto out;
+
+	tmp_idx = ctx.idx;
+	build_prologue(&ctx);
+	ctx.prologue_bytes = (ctx.idx - tmp_idx) * 4;
+	/* just to complete the ctx.idx count */
+	build_epilogue(&ctx);
+
+	alloc_size = 4 * ctx.idx;
+	ctx.target = module_alloc(alloc_size);
+	if (ctx.target == NULL)
+		goto out;
+
+	/* Clean it */
+	memset(ctx.target, 0, alloc_size);
+
+	ctx.idx = 0;
+
+	/* Generate the actual JIT code */
+	build_prologue(&ctx);
+	build_body(&ctx);
+	build_epilogue(&ctx);
+
+	/* Update the icache */
+	flush_icache_range((ptr)ctx.target, (ptr)(ctx.target + ctx.idx));
+
+	if (bpf_jit_enable > 1)
+		/* Dump JIT code */
+		bpf_jit_dump(fp->len, alloc_size, 2, ctx.target);
+
+	fp->bpf_func = (void *)ctx.target;
+	fp->jited = 1;
+
+out:
+	kfree(ctx.offsets);
+}
+
+void bpf_jit_free(struct sk_filter *fp)
+{
+	if (fp->jited)
+		module_free(NULL, fp->bpf_func);
+	kfree(fp);
+}
diff --git a/arch/mips/net/bpf_jit.h b/arch/mips/net/bpf_jit.h
new file mode 100644
index 0000000..3a5751b
--- /dev/null
+++ b/arch/mips/net/bpf_jit.h
@@ -0,0 +1,44 @@
+/*
+ * Just-In-Time compiler for BPF filters on MIPS
+ *
+ * Copyright (c) 2014 Imagination Technologies Ltd.
+ * Author: Markos Chandras <markos.chandras@imgtec.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; version 2 of the License.
+ */
+
+#ifndef BPF_JIT_MIPS_OP_H
+#define BPF_JIT_MIPS_OP_H
+
+/* Registers used by JIT */
+#define MIPS_R_ZERO	0
+#define MIPS_R_V0	2
+#define MIPS_R_V1	3
+#define MIPS_R_A0	4
+#define MIPS_R_A1	5
+#define MIPS_R_T6	14
+#define MIPS_R_T7	15
+#define MIPS_R_S0	16
+#define MIPS_R_S1	17
+#define MIPS_R_S2	18
+#define MIPS_R_S3	19
+#define MIPS_R_S4	20
+#define MIPS_R_S5	21
+#define MIPS_R_S6	22
+#define MIPS_R_S7	23
+#define MIPS_R_SP	29
+#define MIPS_R_RA	31
+
+/* Conditional codes */
+#define MIPS_COND_EQ	0x1
+#define MIPS_COND_GE	(0x1 << 1)
+#define MIPS_COND_GT	(0x1 << 2)
+#define MIPS_COND_NE	(0x1 << 3)
+#define MIPS_COND_ALL	(0x1 << 4)
+/* Conditionals on X register or K immediate */
+#define MIPS_COND_X	(0x1 << 5)
+#define MIPS_COND_K	(0x1 << 6)
+
+#endif /* BPF_JIT_MIPS_OP_H */
-- 
1.9.2

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 13/14] MIPS: net: Add BPF JIT
@ 2014-04-24 16:31       ` Paul Burton
  0 siblings, 0 replies; 53+ messages in thread
From: Paul Burton @ 2014-04-24 16:31 UTC (permalink / raw)
  To: Markos Chandras; +Cc: linux-mips, David S. Miller, netdev

[-- Attachment #1: Type: text/plain, Size: 1069 bytes --]

On Thu, Apr 24, 2014 at 03:50:13PM +0100, Markos Chandras wrote:
> diff --git a/arch/mips/net/bpf_jit.c b/arch/mips/net/bpf_jit.c
> new file mode 100644
> index 0000000..864e5b7
> --- /dev/null
> +++ b/arch/mips/net/bpf_jit.c

... cut ...

> +			off = offsetof(struct sk_buff, protocol);
> +			emit_half_load(r_A, r_skb, off, ctx);
> +#ifdef CONFIG_CPU_LITTLE_ENDIAN
> +			/* This needs little endian fixup */
> +			if (cpu_has_mips_r1) {

Doesn't cpu_has_mips_r1 cover everything >= r1? ie. everything will now
take this path, including systems >= r2. Don't you want the inverse
(!cpu_has_mips_r2) instead?

Paul

> +				/* Get first byte */
> +				emit_andi(r_tmp_imm, r_A, 0xff, ctx);
> +				/* Shift it */
> +				emit_sll(r_tmp, r_tmp_imm, 8, ctx);
> +				/* Get second byte */
> +				emit_srl(r_tmp_imm, r_A, 8, ctx);
> +				emit_andi(r_tmp_imm, r_tmp_imm, 0xff, ctx);
> +				/* Put everyting together in r_A */
> +				emit_or(r_A, r_tmp, r_tmp_imm, ctx);
> +			} else {
> +				/* R2 and later have the wsbh instruction */
> +				emit_wsbh(r_A, r_A, ctx);
> +			}

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 13/14] MIPS: net: Add BPF JIT
@ 2014-04-24 16:31       ` Paul Burton
  0 siblings, 0 replies; 53+ messages in thread
From: Paul Burton @ 2014-04-24 16:31 UTC (permalink / raw)
  To: Markos Chandras; +Cc: linux-mips, David S. Miller, netdev

[-- Attachment #1: Type: text/plain, Size: 1069 bytes --]

On Thu, Apr 24, 2014 at 03:50:13PM +0100, Markos Chandras wrote:
> diff --git a/arch/mips/net/bpf_jit.c b/arch/mips/net/bpf_jit.c
> new file mode 100644
> index 0000000..864e5b7
> --- /dev/null
> +++ b/arch/mips/net/bpf_jit.c

... cut ...

> +			off = offsetof(struct sk_buff, protocol);
> +			emit_half_load(r_A, r_skb, off, ctx);
> +#ifdef CONFIG_CPU_LITTLE_ENDIAN
> +			/* This needs little endian fixup */
> +			if (cpu_has_mips_r1) {

Doesn't cpu_has_mips_r1 cover everything >= r1? ie. everything will now
take this path, including systems >= r2. Don't you want the inverse
(!cpu_has_mips_r2) instead?

Paul

> +				/* Get first byte */
> +				emit_andi(r_tmp_imm, r_A, 0xff, ctx);
> +				/* Shift it */
> +				emit_sll(r_tmp, r_tmp_imm, 8, ctx);
> +				/* Get second byte */
> +				emit_srl(r_tmp_imm, r_A, 8, ctx);
> +				emit_andi(r_tmp_imm, r_tmp_imm, 0xff, ctx);
> +				/* Put everyting together in r_A */
> +				emit_or(r_A, r_tmp, r_tmp_imm, ctx);
> +			} else {
> +				/* R2 and later have the wsbh instruction */
> +				emit_wsbh(r_A, r_A, ctx);
> +			}

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 13/14] MIPS: net: Add BPF JIT
@ 2014-04-25 12:06         ` Markos Chandras
  0 siblings, 0 replies; 53+ messages in thread
From: Markos Chandras @ 2014-04-25 12:06 UTC (permalink / raw)
  To: Paul Burton; +Cc: linux-mips, David S. Miller, netdev

On Thu, Apr 24, 2014 at 05:31:14PM +0100, Paul Burton wrote:
> On Thu, Apr 24, 2014 at 03:50:13PM +0100, Markos Chandras wrote:
> > diff --git a/arch/mips/net/bpf_jit.c b/arch/mips/net/bpf_jit.c
> > new file mode 100644
> > index 0000000..864e5b7
> > --- /dev/null
> > +++ b/arch/mips/net/bpf_jit.c
> 
> ... cut ...
> 
> > +			off = offsetof(struct sk_buff, protocol);
> > +			emit_half_load(r_A, r_skb, off, ctx);
> > +#ifdef CONFIG_CPU_LITTLE_ENDIAN
> > +			/* This needs little endian fixup */
> > +			if (cpu_has_mips_r1) {
> 
> Doesn't cpu_has_mips_r1 cover everything >= r1? ie. everything will now
> take this path, including systems >= r2. Don't you want the inverse
> (!cpu_has_mips_r2) instead?
> 
> Paul

Hi Paul,

Yes you are right. I will fix it and submit a new version.

-- 
markos

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 13/14] MIPS: net: Add BPF JIT
@ 2014-04-25 12:06         ` Markos Chandras
  0 siblings, 0 replies; 53+ messages in thread
From: Markos Chandras @ 2014-04-25 12:06 UTC (permalink / raw)
  To: Paul Burton; +Cc: linux-mips, David S. Miller, netdev

On Thu, Apr 24, 2014 at 05:31:14PM +0100, Paul Burton wrote:
> On Thu, Apr 24, 2014 at 03:50:13PM +0100, Markos Chandras wrote:
> > diff --git a/arch/mips/net/bpf_jit.c b/arch/mips/net/bpf_jit.c
> > new file mode 100644
> > index 0000000..864e5b7
> > --- /dev/null
> > +++ b/arch/mips/net/bpf_jit.c
> 
> ... cut ...
> 
> > +			off = offsetof(struct sk_buff, protocol);
> > +			emit_half_load(r_A, r_skb, off, ctx);
> > +#ifdef CONFIG_CPU_LITTLE_ENDIAN
> > +			/* This needs little endian fixup */
> > +			if (cpu_has_mips_r1) {
> 
> Doesn't cpu_has_mips_r1 cover everything >= r1? ie. everything will now
> take this path, including systems >= r2. Don't you want the inverse
> (!cpu_has_mips_r2) instead?
> 
> Paul

Hi Paul,

Yes you are right. I will fix it and submit a new version.

-- 
markos

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v4 13/14] MIPS: net: Add BPF JIT
@ 2014-04-29 15:58       ` Markos Chandras
  0 siblings, 0 replies; 53+ messages in thread
From: Markos Chandras @ 2014-04-29 15:58 UTC (permalink / raw)
  To: linux-mips; +Cc: Markos Chandras, David S. Miller, netdev

This adds initial support for BPF-JIT on MIPS

Tested on mips32 LE/BE and mips64 BE/n64 using
dhcp, ping and various tcpdump filters.

Benchmarking:

Assuming the remote MIPS target uses 192.168.154.181
as its IP address, and the local host uses 192.168.154.136,
the following results can be obtained using the following
tcpdump filter (catches no frames) and a simple
'time ping -f -c 1000000' command.

[root@(none) ~]# tcpdump -p -n -s 0 -i eth0 net 10.0.0.0/24 -d
(000) ldh      [12]
(001) jeq      #0x800           jt 2	jf 8
(002) ld       [26]
(003) and      #0xffffff00
(004) jeq      #0xa000000       jt 16	jf 5
(005) ld       [30]
(006) and      #0xffffff00
(007) jeq      #0xa000000       jt 16	jf 17
(008) jeq      #0x806           jt 10	jf 9
(009) jeq      #0x8035          jt 10	jf 17
(010) ld       [28]
(011) and      #0xffffff00
(012) jeq      #0xa000000       jt 16	jf 13
(013) ld       [38]
(014) and      #0xffffff00
(015) jeq      #0xa000000       jt 16	jf 17
(016) ret      #65535

- BPF-JIT Disabled

real    1m38.005s
user    0m1.510s
sys     0m6.710s

- BPF-JIT Enabled

real    1m35.215s
user    0m1.200s
sys     0m4.140s

Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
---
Changes since v3:
- Replace cpu_has_mips_r1 with !cpu_has_mips_r2 to avoid taking the slow path
even on MIPS R2 processors
---
 arch/mips/Kbuild        |    1 +
 arch/mips/net/Makefile  |    3 +
 arch/mips/net/bpf_jit.c | 1399 +++++++++++++++++++++++++++++++++++++++++++++++
 arch/mips/net/bpf_jit.h |   44 ++
 4 files changed, 1447 insertions(+)
 create mode 100644 arch/mips/net/Makefile
 create mode 100644 arch/mips/net/bpf_jit.c
 create mode 100644 arch/mips/net/bpf_jit.h

diff --git a/arch/mips/Kbuild b/arch/mips/Kbuild
index d2cfe45..9179336 100644
--- a/arch/mips/Kbuild
+++ b/arch/mips/Kbuild
@@ -17,6 +17,7 @@ obj- := $(platform-)
 obj-y += kernel/
 obj-y += mm/
 obj-y += math-emu/
+obj-y += net/
 
 ifdef CONFIG_KVM
 obj-y += kvm/
diff --git a/arch/mips/net/Makefile b/arch/mips/net/Makefile
new file mode 100644
index 0000000..ae74b3a
--- /dev/null
+++ b/arch/mips/net/Makefile
@@ -0,0 +1,3 @@
+# MIPS networking code
+
+obj-$(CONFIG_BPF_JIT) += bpf_jit.o
diff --git a/arch/mips/net/bpf_jit.c b/arch/mips/net/bpf_jit.c
new file mode 100644
index 0000000..98e9d23
--- /dev/null
+++ b/arch/mips/net/bpf_jit.c
@@ -0,0 +1,1399 @@
+/*
+ * Just-In-Time compiler for BPF filters on MIPS
+ *
+ * Copyright (c) 2014 Imagination Technologies Ltd.
+ * Author: Markos Chandras <markos.chandras@imgtec.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; version 2 of the License.
+ */
+
+#include <linux/bitops.h>
+#include <linux/compiler.h>
+#include <linux/errno.h>
+#include <linux/filter.h>
+#include <linux/if_vlan.h>
+#include <linux/kconfig.h>
+#include <linux/moduleloader.h>
+#include <linux/netdevice.h>
+#include <linux/string.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+#include <asm/bitops.h>
+#include <asm/cacheflush.h>
+#include <asm/cpu-features.h>
+#include <asm/uasm.h>
+
+#include "bpf_jit.h"
+
+/* ABI
+ *
+ * s0	1st scratch register
+ * s1	2nd scratch register
+ * s2	offset register
+ * s3	BPF register A
+ * s4	BPF register X
+ * s5	*skb
+ * s6	*scratch memory
+ *
+ * On entry (*bpf_func)(*skb, *filter)
+ * a0 = MIPS_R_A0 = skb;
+ * a1 = MIPS_R_A1 = filter;
+ *
+ * Stack
+ * ...
+ * M[15]
+ * M[14]
+ * M[13]
+ * ...
+ * M[0] <-- r_M
+ * saved reg k-1
+ * saved reg k-2
+ * ...
+ * saved reg 0 <-- r_sp
+ * <no argument area>
+ *
+ *                     Packet layout
+ *
+ * <--------------------- len ------------------------>
+ * <--skb-len(r_skb_hl)-->< ----- skb->data_len ------>
+ * ----------------------------------------------------
+ * |                  skb->data                       |
+ * ----------------------------------------------------
+ */
+
+#define RSIZE	(sizeof(unsigned long))
+#define ptr typeof(unsigned long)
+
+/* ABI specific return values */
+#ifdef CONFIG_32BIT /* O32 */
+#ifdef CONFIG_CPU_LITTLE_ENDIAN
+#define r_err	MIPS_R_V1
+#define r_val	MIPS_R_V0
+#else /* CONFIG_CPU_LITTLE_ENDIAN */
+#define r_err	MIPS_R_V0
+#define r_val	MIPS_R_V1
+#endif
+#else /* N64 */
+#define r_err	MIPS_R_V0
+#define r_val	MIPS_R_V0
+#endif
+
+#define r_ret	MIPS_R_V0
+
+/*
+ * Use 2 scratch registers to avoid pipeline interlocks.
+ * There is no overhead during epilogue and prologue since
+ * any of the $s0-$s6 registers will only be preserved if
+ * they are going to actually be used.
+ */
+#define r_s0		MIPS_R_S0 /* scratch reg 1 */
+#define r_s1		MIPS_R_S1 /* scratch reg 2 */
+#define r_off		MIPS_R_S2
+#define r_A		MIPS_R_S3
+#define r_X		MIPS_R_S4
+#define r_skb		MIPS_R_S5
+#define r_M		MIPS_R_S6
+#define r_tmp_imm	MIPS_R_T6 /* No need to preserve this */
+#define r_tmp		MIPS_R_T7 /* No need to preserve this */
+#define r_zero		MIPS_R_ZERO
+#define r_sp		MIPS_R_SP
+#define r_ra		MIPS_R_RA
+
+#define SCRATCH_OFF(k)		(4 * (k))
+
+/* JIT flags */
+#define SEEN_CALL		(1 << BPF_MEMWORDS)
+#define SEEN_SREG_SFT		(BPF_MEMWORDS + 1)
+#define SEEN_SREG_BASE		(1 << SEEN_SREG_SFT)
+#define SEEN_SREG(x)		(SEEN_SREG_BASE << (x))
+#define SEEN_S0			SEEN_SREG(0)
+#define SEEN_S1			SEEN_SREG(1)
+#define SEEN_OFF		SEEN_SREG(2)
+#define SEEN_A			SEEN_SREG(3)
+#define SEEN_X			SEEN_SREG(4)
+#define SEEN_SKB		SEEN_SREG(5)
+#define SEEN_MEM		SEEN_SREG(6)
+
+/* Arguments used by JIT */
+#define ARGS_USED_BY_JIT	2 /* only applicable to 64-bit */
+
+#define FLAG_NEED_X_RESET	(1 << 0)
+
+#define SBIT(x)			(1 << (x)) /* Signed version of BIT() */
+
+/**
+ * struct jit_ctx - JIT context
+ * @skf:		The sk_filter
+ * @prologue_bytes:	Number of bytes for prologue
+ * @idx:		Instruction index
+ * @flags:		JIT flags
+ * @offsets:		Instruction offsets
+ * @target:		Memory location for the compiled filter
+ */
+struct jit_ctx {
+	const struct sk_filter *skf;
+	unsigned int prologue_bytes;
+	u32 idx;
+	u32 flags;
+	u32 *offsets;
+	u32 *target;
+};
+
+
+static inline int optimize_div(u32 *k)
+{
+	/* power of 2 divides can be implemented with right shift */
+	if (!(*k & (*k-1))) {
+		*k = ilog2(*k);
+		return 1;
+	}
+
+	return 0;
+}
+
+/* Simply emit the instruction if the JIT memory space has been allocated */
+#define emit_instr(ctx, func, ...)			\
+do {							\
+	if ((ctx)->target != NULL) {			\
+		u32 *p = &(ctx)->target[ctx->idx];	\
+		uasm_i_##func(&p, ##__VA_ARGS__);	\
+	}						\
+	(ctx)->idx++;					\
+} while (0)
+
+/* Determine if immediate is within the 16-bit signed range */
+static inline bool is_range16(s32 imm)
+{
+	if (imm >= SBIT(15) || imm < -SBIT(15))
+		return true;
+	return false;
+}
+
+static inline void emit_addu(unsigned int dst, unsigned int src1,
+			     unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, addu, dst, src1, src2);
+}
+
+static inline void emit_nop(struct jit_ctx *ctx)
+{
+	emit_instr(ctx, nop);
+}
+
+/* Load a u32 immediate to a register */
+static inline void emit_load_imm(unsigned int dst, u32 imm, struct jit_ctx *ctx)
+{
+	if (ctx->target != NULL) {
+		/* addiu can only handle s16 */
+		if (is_range16(imm)) {
+			u32 *p = &ctx->target[ctx->idx];
+			uasm_i_lui(&p, r_tmp_imm, (s32)imm >> 16);
+			p = &ctx->target[ctx->idx + 1];
+			uasm_i_ori(&p, dst, r_tmp_imm, imm & 0xffff);
+		} else {
+			u32 *p = &ctx->target[ctx->idx];
+			uasm_i_addiu(&p, dst, r_zero, imm);
+		}
+	}
+	ctx->idx++;
+
+	if (is_range16(imm))
+		ctx->idx++;
+}
+
+static inline void emit_or(unsigned int dst, unsigned int src1,
+			   unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, or, dst, src1, src2);
+}
+
+static inline void emit_ori(unsigned int dst, unsigned src, u32 imm,
+			    struct jit_ctx *ctx)
+{
+	if (imm >= BIT(16)) {
+		emit_load_imm(r_tmp, imm, ctx);
+		emit_or(dst, src, r_tmp, ctx);
+	} else {
+		emit_instr(ctx, ori, dst, src, imm);
+	}
+}
+
+
+static inline void emit_daddu(unsigned int dst, unsigned int src1,
+			      unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, daddu, dst, src1, src2);
+}
+
+static inline void emit_daddiu(unsigned int dst, unsigned int src,
+			       int imm, struct jit_ctx *ctx)
+{
+	/*
+	 * Only used for stack, so the imm is relatively small
+	 * and it fits in 15-bits
+	 */
+	emit_instr(ctx, daddiu, dst, src, imm);
+}
+
+static inline void emit_addiu(unsigned int dst, unsigned int src,
+			      u32 imm, struct jit_ctx *ctx)
+{
+	if (is_range16(imm)) {
+		emit_load_imm(r_tmp, imm, ctx);
+		emit_addu(dst, r_tmp, src, ctx);
+	} else {
+		emit_instr(ctx, addiu, dst, src, imm);
+	}
+}
+
+static inline void emit_and(unsigned int dst, unsigned int src1,
+			    unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, and, dst, src1, src2);
+}
+
+static inline void emit_andi(unsigned int dst, unsigned int src,
+			     u32 imm, struct jit_ctx *ctx)
+{
+	/* If imm does not fit in u16 then load it to register */
+	if (imm >= BIT(16)) {
+		emit_load_imm(r_tmp, imm, ctx);
+		emit_and(dst, src, r_tmp, ctx);
+	} else {
+		emit_instr(ctx, andi, dst, src, imm);
+	}
+}
+
+static inline void emit_xor(unsigned int dst, unsigned int src1,
+			    unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, xor, dst, src1, src2);
+}
+
+static inline void emit_xori(ptr dst, ptr src, u32 imm, struct jit_ctx *ctx)
+{
+	/* If imm does not fit in u16 then load it to register */
+	if (imm >= BIT(16)) {
+		emit_load_imm(r_tmp, imm, ctx);
+		emit_xor(dst, src, r_tmp, ctx);
+	} else {
+		emit_instr(ctx, xori, dst, src, imm);
+	}
+}
+
+static inline void emit_stack_offset(int offset, struct jit_ctx *ctx)
+{
+	if (config_enabled(CONFIG_64BIT))
+		emit_instr(ctx, daddiu, r_sp, r_sp, offset);
+	else
+		emit_instr(ctx, addiu, r_sp, r_sp, offset);
+
+}
+
+static inline void emit_subu(unsigned int dst, unsigned int src1,
+			     unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, subu, dst, src1, src2);
+}
+
+static inline void emit_neg(unsigned int reg, struct jit_ctx *ctx)
+{
+	emit_subu(reg, r_zero, reg, ctx);
+}
+
+static inline void emit_sllv(unsigned int dst, unsigned int src,
+			     unsigned int sa, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, sllv, dst, src, sa);
+}
+
+static inline void emit_sll(unsigned int dst, unsigned int src,
+			    unsigned int sa, struct jit_ctx *ctx)
+{
+	/* sa is 5-bits long */
+	BUG_ON(sa >= BIT(5));
+	emit_instr(ctx, sll, dst, src, sa);
+}
+
+static inline void emit_srlv(unsigned int dst, unsigned int src,
+			     unsigned int sa, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, srlv, dst, src, sa);
+}
+
+static inline void emit_srl(unsigned int dst, unsigned int src,
+			    unsigned int sa, struct jit_ctx *ctx)
+{
+	/* sa is 5-bits long */
+	BUG_ON(sa >= BIT(5));
+	emit_instr(ctx, srl, dst, src, sa);
+}
+
+static inline void emit_sltu(unsigned int dst, unsigned int src1,
+			     unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, sltu, dst, src1, src2);
+}
+
+static inline void emit_sltiu(unsigned dst, unsigned int src,
+			      unsigned int imm, struct jit_ctx *ctx)
+{
+	/* 16 bit immediate */
+	if (is_range16((s32)imm)) {
+		emit_load_imm(r_tmp, imm, ctx);
+		emit_sltu(dst, src, r_tmp, ctx);
+	} else {
+		emit_instr(ctx, sltiu, dst, src, imm);
+	}
+
+}
+
+/* Store register on the stack */
+static inline void emit_store_stack_reg(ptr reg, ptr base,
+					unsigned int offset,
+					struct jit_ctx *ctx)
+{
+	if (config_enabled(CONFIG_64BIT))
+		emit_instr(ctx, sd, reg, offset, base);
+	else
+		emit_instr(ctx, sw, reg, offset, base);
+}
+
+static inline void emit_store(ptr reg, ptr base, unsigned int offset,
+			      struct jit_ctx *ctx)
+{
+	emit_instr(ctx, sw, reg, offset, base);
+}
+
+static inline void emit_load_stack_reg(ptr reg, ptr base,
+				       unsigned int offset,
+				       struct jit_ctx *ctx)
+{
+	if (config_enabled(CONFIG_64BIT))
+		emit_instr(ctx, ld, reg, offset, base);
+	else
+		emit_instr(ctx, lw, reg, offset, base);
+}
+
+static inline void emit_load(unsigned int reg, unsigned int base,
+			     unsigned int offset, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, lw, reg, offset, base);
+}
+
+static inline void emit_load_byte(unsigned int reg, unsigned int base,
+				  unsigned int offset, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, lb, reg, offset, base);
+}
+
+static inline void emit_half_load(unsigned int reg, unsigned int base,
+				  unsigned int offset, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, lh, reg, offset, base);
+}
+
+static inline void emit_mul(unsigned int dst, unsigned int src1,
+			    unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, mul, dst, src1, src2);
+}
+
+static inline void emit_div(unsigned int dst, unsigned int src,
+			    struct jit_ctx *ctx)
+{
+	if (ctx->target != NULL) {
+		u32 *p = &ctx->target[ctx->idx];
+		uasm_i_divu(&p, dst, src);
+		p = &ctx->target[ctx->idx + 1];
+		uasm_i_mfhi(&p, dst);
+	}
+	ctx->idx += 2; /* 2 insts */
+}
+
+static inline void emit_mod(unsigned int dst, unsigned int src,
+			    struct jit_ctx *ctx)
+{
+	if (ctx->target != NULL) {
+		u32 *p = &ctx->target[ctx->idx];
+		uasm_i_divu(&p, dst, src);
+		p = &ctx->target[ctx->idx + 1];
+		uasm_i_mflo(&p, dst);
+	}
+	ctx->idx += 2; /* 2 insts */
+}
+
+static inline void emit_dsll(unsigned int dst, unsigned int src,
+			     unsigned int sa, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, dsll, dst, src, sa);
+}
+
+static inline void emit_dsrl32(unsigned int dst, unsigned int src,
+			       unsigned int sa, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, dsrl32, dst, src, sa);
+}
+
+static inline void emit_wsbh(unsigned int dst, unsigned int src,
+			     struct jit_ctx *ctx)
+{
+	emit_instr(ctx, wsbh, dst, src);
+}
+
+/* load a function pointer to register */
+static inline void emit_load_func(unsigned int reg, ptr imm,
+				  struct jit_ctx *ctx)
+{
+	if (config_enabled(CONFIG_64BIT)) {
+		/* At this point imm is always 64-bit */
+		emit_load_imm(r_tmp, (u64)imm >> 32, ctx);
+		emit_dsll(r_tmp_imm, r_tmp, 16, ctx); /* left shift by 16 */
+		emit_ori(r_tmp, r_tmp_imm, (imm >> 16) & 0xffff, ctx);
+		emit_dsll(r_tmp_imm, r_tmp, 16, ctx); /* left shift by 16 */
+		emit_ori(reg, r_tmp_imm, imm & 0xffff, ctx);
+	} else {
+		emit_load_imm(reg, imm, ctx);
+	}
+}
+
+/* Move to real MIPS register */
+static inline void emit_reg_move(ptr dst, ptr src, struct jit_ctx *ctx)
+{
+	if (config_enabled(CONFIG_64BIT))
+		emit_daddu(dst, src, r_zero, ctx);
+	else
+		emit_addu(dst, src, r_zero, ctx);
+}
+
+/* Move to JIT (32-bit) register */
+static inline void emit_jit_reg_move(ptr dst, ptr src, struct jit_ctx *ctx)
+{
+	emit_addu(dst, src, r_zero, ctx);
+}
+
+/* Compute the immediate value for PC-relative branches. */
+static inline u32 b_imm(unsigned int tgt, struct jit_ctx *ctx)
+{
+	if (ctx->target == NULL)
+		return 0;
+
+	/*
+	 * We want a pc-relative branch. We only do forward branches
+	 * so tgt is always after pc. tgt is the instruction offset
+	 * we want to jump to.
+
+	 * Branch on MIPS:
+	 * I: target_offset <- sign_extend(offset)
+	 * I+1: PC += target_offset (delay slot)
+	 *
+	 * ctx->idx currently points to the branch instruction
+	 * but the offset is added to the delay slot so we need
+	 * to subtract 4.
+	 */
+	return ctx->offsets[tgt] -
+		(ctx->idx * 4 - ctx->prologue_bytes) - 4;
+}
+
+static inline void emit_bcond(int cond, unsigned int reg1, unsigned int reg2,
+			     unsigned int imm, struct jit_ctx *ctx)
+{
+	if (ctx->target != NULL) {
+		u32 *p = &ctx->target[ctx->idx];
+
+		switch (cond) {
+		case MIPS_COND_EQ:
+			uasm_i_beq(&p, reg1, reg2, imm);
+			break;
+		case MIPS_COND_NE:
+			uasm_i_bne(&p, reg1, reg2, imm);
+			break;
+		case MIPS_COND_ALL:
+			uasm_i_b(&p, imm);
+			break;
+		default:
+			pr_warn("%s: Unhandled branch conditional: %d\n",
+				__func__, cond);
+		}
+	}
+	ctx->idx++;
+}
+
+static inline void emit_b(unsigned int imm, struct jit_ctx *ctx)
+{
+	emit_bcond(MIPS_COND_ALL, r_zero, r_zero, imm, ctx);
+}
+
+static inline void emit_jalr(unsigned int link, unsigned int reg,
+			     struct jit_ctx *ctx)
+{
+	emit_instr(ctx, jalr, link, reg);
+}
+
+static inline void emit_jr(unsigned int reg, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, jr, reg);
+}
+
+static inline u16 align_sp(unsigned int num)
+{
+	/* Double word alignment for 32-bit, quadword for 64-bit */
+	unsigned int align = config_enabled(CONFIG_64BIT) ? 16 : 8;
+	num = (num + (align - 1)) & -align;
+	return num;
+}
+
+static inline void update_on_xread(struct jit_ctx *ctx)
+{
+	if (!(ctx->flags & SEEN_X))
+		ctx->flags |= FLAG_NEED_X_RESET;
+
+	ctx->flags |= SEEN_X;
+}
+
+static bool is_load_to_a(u16 inst)
+{
+	switch (inst) {
+	case BPF_S_LD_W_LEN:
+	case BPF_S_LD_W_ABS:
+	case BPF_S_LD_H_ABS:
+	case BPF_S_LD_B_ABS:
+	case BPF_S_ANC_CPU:
+	case BPF_S_ANC_IFINDEX:
+	case BPF_S_ANC_MARK:
+	case BPF_S_ANC_PROTOCOL:
+	case BPF_S_ANC_RXHASH:
+	case BPF_S_ANC_VLAN_TAG:
+	case BPF_S_ANC_VLAN_TAG_PRESENT:
+	case BPF_S_ANC_QUEUE:
+		return true;
+	default:
+		return false;
+	}
+}
+
+static void save_bpf_jit_regs(struct jit_ctx *ctx, unsigned offset)
+{
+	int i = 0, real_off = 0;
+	u32 sflags, tmp_flags;
+
+	/* Adjust the stack pointer */
+	emit_stack_offset(-align_sp(offset), ctx);
+
+	if (ctx->flags & SEEN_CALL) {
+		/* Argument save area */
+		if (config_enabled(CONFIG_64BIT))
+			/* Bottom of current frame */
+			real_off = align_sp(offset) - RSIZE;
+		else
+			/* Top of previous frame */
+			real_off = align_sp(offset) + RSIZE;
+		emit_store_stack_reg(MIPS_R_A0, r_sp, real_off, ctx);
+		emit_store_stack_reg(MIPS_R_A1, r_sp, real_off + RSIZE, ctx);
+
+		real_off = 0;
+	}
+
+	tmp_flags = sflags = ctx->flags >> SEEN_SREG_SFT;
+	/* sflags is essentially a bitmap */
+	while (tmp_flags) {
+		if ((sflags >> i) & 0x1) {
+			emit_store_stack_reg(MIPS_R_S0 + i, r_sp, real_off,
+					     ctx);
+			real_off += RSIZE;
+		}
+		i++;
+		tmp_flags >>= 1;
+	}
+
+	/* save return address */
+	if (ctx->flags & SEEN_CALL) {
+		emit_store_stack_reg(r_ra, r_sp, real_off, ctx);
+		real_off += RSIZE;
+	}
+
+	/* Setup r_M leaving the alignment gap if necessary */
+	if (ctx->flags & SEEN_MEM) {
+		if (real_off % (RSIZE * 2))
+			real_off += RSIZE;
+		emit_addiu(r_M, r_sp, real_off, ctx);
+	}
+}
+
+static void restore_bpf_jit_regs(struct jit_ctx *ctx,
+				 unsigned int offset)
+{
+	int i, real_off = 0;
+	u32 sflags, tmp_flags;
+
+	if (ctx->flags & SEEN_CALL) {
+		if (config_enabled(CONFIG_64BIT))
+			/* Bottom of current frame */
+			real_off = align_sp(offset) - RSIZE;
+		else
+			/* Top of previous frame */
+			real_off = align_sp(offset) + RSIZE;
+		emit_load_stack_reg(MIPS_R_A0, r_sp, real_off, ctx);
+		emit_load_stack_reg(MIPS_R_A1, r_sp, real_off + RSIZE, ctx);
+
+		real_off = 0;
+	}
+
+	tmp_flags = sflags = ctx->flags >> SEEN_SREG_SFT;
+	/* sflags is a bitmap */
+	i = 0;
+	while (tmp_flags) {
+		if ((sflags >> i) & 0x1) {
+			emit_load_stack_reg(MIPS_R_S0 + i, r_sp, real_off,
+					    ctx);
+			real_off += RSIZE;
+		}
+		i++;
+		tmp_flags >>= 1;
+	}
+
+	/* restore return address */
+	if (ctx->flags & SEEN_CALL)
+		emit_load_stack_reg(r_ra, r_sp, real_off, ctx);
+
+	/* Restore the sp and discard the scrach memory */
+	emit_stack_offset(align_sp(offset), ctx);
+}
+
+static unsigned int get_stack_depth(struct jit_ctx *ctx)
+{
+	int sp_off = 0;
+
+
+	/* How may s* regs do we need to preserved? */
+	sp_off += hweight32(ctx->flags >> SEEN_SREG_SFT) * RSIZE;
+
+	if (ctx->flags & SEEN_MEM)
+		sp_off += 4 * BPF_MEMWORDS; /* BPF_MEMWORDS are 32-bit */
+
+	if (ctx->flags & SEEN_CALL)
+		/*
+		 * The JIT code make calls to external functions using 2
+		 * arguments. Therefore, for o32 we don't need to allocate
+		 * space because we don't care if the argumetns are lost
+		 * across calls. We do need however to preserve incoming
+		 * arguments but the space is already allocated for us by
+		 * the caller. On the other hand, for n64, we need to allocate
+		 * this space ourselves. We need to preserve $ra as well.
+		 */
+		sp_off += config_enabled(CONFIG_64BIT) ?
+			(ARGS_USED_BY_JIT + 1) * RSIZE : RSIZE;
+
+	/*
+	 * Subtract the bytes for the last registers since we only care about
+	 * the location on the stack pointer.
+	 */
+	return sp_off - RSIZE;
+}
+
+static void build_prologue(struct jit_ctx *ctx)
+{
+	u16 first_inst = ctx->skf->insns[0].code;
+	int sp_off;
+
+	/* Calculate the total offset for the stack pointer */
+	sp_off = get_stack_depth(ctx);
+	save_bpf_jit_regs(ctx, sp_off);
+
+	if (ctx->flags & SEEN_SKB)
+		emit_reg_move(r_skb, MIPS_R_A0, ctx);
+
+	if (ctx->flags & FLAG_NEED_X_RESET)
+		emit_jit_reg_move(r_X, r_zero, ctx);
+
+	/* Do not leak kernel data to userspace */
+	if ((first_inst != BPF_S_RET_K) && !(is_load_to_a(first_inst)))
+		emit_jit_reg_move(r_A, r_zero, ctx);
+}
+
+static void build_epilogue(struct jit_ctx *ctx)
+{
+	unsigned int sp_off;
+
+	/* Calculate the total offset for the stack pointer */
+
+	sp_off = get_stack_depth(ctx);
+	restore_bpf_jit_regs(ctx, sp_off);
+
+	/* Return */
+	emit_jr(r_ra, ctx);
+	emit_nop(ctx);
+}
+
+static u64 jit_get_skb_b(struct sk_buff *skb, unsigned offset)
+{
+	u8 ret;
+	int err;
+
+	err = skb_copy_bits(skb, offset, &ret, 1);
+
+	return (u64)err << 32 | ret;
+}
+
+static u64 jit_get_skb_h(struct sk_buff *skb, unsigned offset)
+{
+	u16 ret;
+	int err;
+
+	err = skb_copy_bits(skb, offset, &ret, 2);
+
+	return (u64)err << 32 | ntohs(ret);
+}
+
+static u64 jit_get_skb_w(struct sk_buff *skb, unsigned offset)
+{
+	u32 ret;
+	int err;
+
+	err = skb_copy_bits(skb, offset, &ret, 4);
+
+	return (u64)err << 32 | ntohl(ret);
+}
+
+#define PKT_TYPE_MAX 7
+static int pkt_type_offset(void)
+{
+	struct sk_buff skb_probe = {
+		.pkt_type = ~0,
+	};
+	char *ct = (char *)&skb_probe;
+	unsigned int off;
+
+	for (off = 0; off < sizeof(struct sk_buff); off++) {
+		if (ct[off] == PKT_TYPE_MAX)
+			return off;
+	}
+	pr_err_once("Please fix pkt_type_offset(), as pkt_type couldn't be found\n");
+	return -1;
+}
+
+static int build_body(struct jit_ctx *ctx)
+{
+	void *load_func[] = {jit_get_skb_b, jit_get_skb_h, jit_get_skb_w};
+	const struct sk_filter *prog = ctx->skf;
+	const struct sock_filter *inst;
+	unsigned int i, off, load_order, condt;
+	u32 k, b_off __maybe_unused;
+
+	for (i = 0; i < prog->len; i++) {
+		inst = &(prog->insns[i]);
+		pr_debug("%s: code->0x%02x, jt->0x%x, jf->0x%x, k->0x%x\n",
+			 __func__, inst->code, inst->jt, inst->jf, inst->k);
+		k = inst->k;
+
+		if (ctx->target == NULL)
+			ctx->offsets[i] = ctx->idx * 4;
+
+		switch (inst->code) {
+		case BPF_S_LD_IMM:
+			/* A <- k ==> li r_A, k */
+			ctx->flags |= SEEN_A;
+			emit_load_imm(r_A, k, ctx);
+			break;
+		case BPF_S_LD_W_LEN:
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, len) != 4);
+			/* A <- len ==> lw r_A, offset(skb) */
+			ctx->flags |= SEEN_SKB | SEEN_A;
+			off = offsetof(struct sk_buff, len);
+			emit_load(r_A, r_skb, off, ctx);
+			break;
+		case BPF_S_LD_MEM:
+			/* A <- M[k] ==> lw r_A, offset(M) */
+			ctx->flags |= SEEN_MEM | SEEN_A;
+			emit_load(r_A, r_M, SCRATCH_OFF(k), ctx);
+			break;
+		case BPF_S_LD_W_ABS:
+			/* A <- P[k:4] */
+			load_order = 2;
+			goto load;
+		case BPF_S_LD_H_ABS:
+			/* A <- P[k:2] */
+			load_order = 1;
+			goto load;
+		case BPF_S_LD_B_ABS:
+			/* A <- P[k:1] */
+			load_order = 0;
+load:
+			emit_load_imm(r_off, k, ctx);
+load_common:
+			ctx->flags |= SEEN_CALL | SEEN_OFF | SEEN_S0 |
+				SEEN_SKB | SEEN_A;
+
+			emit_load_func(r_s0, (ptr)load_func[load_order],
+				      ctx);
+			emit_reg_move(MIPS_R_A0, r_skb, ctx);
+			emit_jalr(MIPS_R_RA, r_s0, ctx);
+			/* Load second argument to delay slot */
+			emit_reg_move(MIPS_R_A1, r_off, ctx);
+			/* Check the error value */
+			if (config_enabled(CONFIG_64BIT)) {
+				/* Get error code from the top 32-bits */
+				emit_dsrl32(r_s0, r_val, 0, ctx);
+				/* Branch to 3 instructions ahead */
+				emit_bcond(MIPS_COND_NE, r_s0, r_zero, 3 << 2,
+					   ctx);
+			} else {
+				/* Branch to 3 instructions ahead */
+				emit_bcond(MIPS_COND_NE, r_err, r_zero, 3 << 2,
+					   ctx);
+			}
+			emit_nop(ctx);
+			/* We are good */
+			emit_b(b_imm(i + 1, ctx), ctx);
+			emit_jit_reg_move(r_A, r_val, ctx);
+			/* Return with error */
+			emit_b(b_imm(prog->len, ctx), ctx);
+			emit_reg_move(r_ret, r_zero, ctx);
+			break;
+		case BPF_S_LD_W_IND:
+			/* A <- P[X + k:4] */
+			load_order = 2;
+			goto load_ind;
+		case BPF_S_LD_H_IND:
+			/* A <- P[X + k:2] */
+			load_order = 1;
+			goto load_ind;
+		case BPF_S_LD_B_IND:
+			/* A <- P[X + k:1] */
+			load_order = 0;
+load_ind:
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_OFF | SEEN_X;
+			emit_addiu(r_off, r_X, k, ctx);
+			goto load_common;
+		case BPF_S_LDX_IMM:
+			/* X <- k */
+			ctx->flags |= SEEN_X;
+			emit_load_imm(r_X, k, ctx);
+			break;
+		case BPF_S_LDX_MEM:
+			/* X <- M[k] */
+			ctx->flags |= SEEN_X | SEEN_MEM;
+			emit_load(r_X, r_M, SCRATCH_OFF(k), ctx);
+			break;
+		case BPF_S_LDX_W_LEN:
+			/* X <- len */
+			ctx->flags |= SEEN_X | SEEN_SKB;
+			off = offsetof(struct sk_buff, len);
+			emit_load(r_X, r_skb, off, ctx);
+			break;
+		case BPF_S_LDX_B_MSH:
+			/* X <- 4 * (P[k:1] & 0xf) */
+			ctx->flags |= SEEN_X | SEEN_CALL | SEEN_S0 | SEEN_SKB;
+			/* Load offset to a1 */
+			emit_load_func(r_s0, (ptr)jit_get_skb_b, ctx);
+			/*
+			 * This may emit two instructions so it may not fit
+			 * in the delay slot. So use a0 in the delay slot.
+			 */
+			emit_load_imm(MIPS_R_A1, k, ctx);
+			emit_jalr(MIPS_R_RA, r_s0, ctx);
+			emit_reg_move(MIPS_R_A0, r_skb, ctx); /* delay slot */
+			/* Check the error value */
+			if (config_enabled(CONFIG_64BIT)) {
+				/* Top 32-bits of $v0 on 64-bit */
+				emit_dsrl32(r_s0, r_val, 0, ctx);
+				emit_bcond(MIPS_COND_NE, r_s0, r_zero,
+					   3 << 2, ctx);
+			} else {
+				emit_bcond(MIPS_COND_NE, r_err, r_zero,
+					   3 << 2, ctx);
+			}
+			/* No need for delay slot */
+			/* We are good */
+			/* X <- P[1:K] & 0xf */
+			emit_andi(r_X, r_val, 0xf, ctx);
+			/* X << 2 */
+			emit_b(b_imm(i + 1, ctx), ctx);
+			emit_sll(r_X, r_X, 2, ctx); /* delay slot */
+			/* Return with error */
+			emit_b(b_imm(prog->len, ctx), ctx);
+			emit_load_imm(r_ret, 0, ctx); /* delay slot */
+			break;
+		case BPF_S_ST:
+			/* M[k] <- A */
+			ctx->flags |= SEEN_MEM | SEEN_A;
+			emit_store(r_A, r_M, SCRATCH_OFF(k), ctx);
+			break;
+		case BPF_S_STX:
+			/* M[k] <- X */
+			ctx->flags |= SEEN_MEM | SEEN_X;
+			emit_store(r_X, r_M, SCRATCH_OFF(k), ctx);
+			break;
+		case BPF_S_ALU_ADD_K:
+			/* A += K */
+			ctx->flags |= SEEN_A;
+			emit_addiu(r_A, r_A, k, ctx);
+			break;
+		case BPF_S_ALU_ADD_X:
+			/* A += X */
+			ctx->flags |= SEEN_A | SEEN_X;
+			emit_addu(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_SUB_K:
+			/* A -= K */
+			ctx->flags |= SEEN_A;
+			emit_addiu(r_A, r_A, -k, ctx);
+			break;
+		case BPF_S_ALU_SUB_X:
+			/* A -= X */
+			ctx->flags |= SEEN_A | SEEN_X;
+			emit_subu(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_MUL_K:
+			/* A *= K */
+			/* Load K to scratch register before MUL */
+			ctx->flags |= SEEN_A | SEEN_S0;
+			emit_load_imm(r_s0, k, ctx);
+			emit_mul(r_A, r_A, r_s0, ctx);
+			break;
+		case BPF_S_ALU_MUL_X:
+			/* A *= X */
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_A | SEEN_X;
+			emit_mul(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_DIV_K:
+			/* A /= k */
+			if (k == 1)
+				break;
+			if (optimize_div(&k)) {
+				ctx->flags |= SEEN_A;
+				emit_srl(r_A, r_A, k, ctx);
+				break;
+			}
+			ctx->flags |= SEEN_A | SEEN_S0;
+			emit_load_imm(r_s0, k, ctx);
+			emit_div(r_A, r_s0, ctx);
+			break;
+		case BPF_S_ALU_MOD_K:
+			/* A %= k */
+			if (k == 1 || optimize_div(&k)) {
+				ctx->flags |= SEEN_A;
+				emit_jit_reg_move(r_A, r_zero, ctx);
+			} else {
+				ctx->flags |= SEEN_A | SEEN_S0;
+				emit_load_imm(r_s0, k, ctx);
+				emit_mod(r_A, r_s0, ctx);
+			}
+			break;
+		case BPF_S_ALU_DIV_X:
+			/* A /= X */
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_X | SEEN_A;
+			/* Check if r_X is zero */
+			emit_bcond(MIPS_COND_EQ, r_X, r_zero,
+				   b_imm(prog->len, ctx), ctx);
+			emit_load_imm(r_val, 0, ctx); /* delay slot */
+			emit_div(r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_MOD_X:
+			/* A %= X */
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_X | SEEN_A;
+			/* Check if r_X is zero */
+			emit_bcond(MIPS_COND_EQ, r_X, r_zero,
+				   b_imm(prog->len, ctx), ctx);
+			emit_load_imm(r_val, 0, ctx); /* delay slot */
+			emit_mod(r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_OR_K:
+			/* A |= K */
+			ctx->flags |= SEEN_A;
+			emit_ori(r_A, r_A, k, ctx);
+			break;
+		case BPF_S_ALU_OR_X:
+			/* A |= X */
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_A;
+			emit_ori(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_XOR_K:
+			/* A ^= k */
+			ctx->flags |= SEEN_A;
+			emit_xori(r_A, r_A, k, ctx);
+			break;
+		case BPF_S_ANC_ALU_XOR_X:
+		case BPF_S_ALU_XOR_X:
+			/* A ^= X */
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_A;
+			emit_xor(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_AND_K:
+			/* A &= K */
+			ctx->flags |= SEEN_A;
+			emit_andi(r_A, r_A, k, ctx);
+			break;
+		case BPF_S_ALU_AND_X:
+			/* A &= X */
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_A | SEEN_X;
+			emit_and(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_LSH_K:
+			/* A <<= K */
+			ctx->flags |= SEEN_A;
+			emit_sll(r_A, r_A, k, ctx);
+			break;
+		case BPF_S_ALU_LSH_X:
+			/* A <<= X */
+			ctx->flags |= SEEN_A | SEEN_X;
+			update_on_xread(ctx);
+			emit_sllv(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_RSH_K:
+			/* A >>= K */
+			ctx->flags |= SEEN_A;
+			emit_srl(r_A, r_A, k, ctx);
+			break;
+		case BPF_S_ALU_RSH_X:
+			ctx->flags |= SEEN_A | SEEN_X;
+			update_on_xread(ctx);
+			emit_srlv(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_NEG:
+			/* A = -A */
+			ctx->flags |= SEEN_A;
+			emit_neg(r_A, ctx);
+			break;
+		case BPF_S_JMP_JA:
+			/* pc += K */
+			emit_b(b_imm(i + k + 1, ctx), ctx);
+			emit_nop(ctx);
+			break;
+		case BPF_S_JMP_JEQ_K:
+			/* pc += ( A == K ) ? pc->jt : pc->jf */
+			condt = MIPS_COND_EQ | MIPS_COND_K;
+			goto jmp_cmp;
+		case BPF_S_JMP_JEQ_X:
+			ctx->flags |= SEEN_X;
+			/* pc += ( A == X ) ? pc->jt : pc->jf */
+			condt = MIPS_COND_EQ | MIPS_COND_X;
+			goto jmp_cmp;
+		case BPF_S_JMP_JGE_K:
+			/* pc += ( A >= K ) ? pc->jt : pc->jf */
+			condt = MIPS_COND_GE | MIPS_COND_K;
+			goto jmp_cmp;
+		case BPF_S_JMP_JGE_X:
+			ctx->flags |= SEEN_X;
+			/* pc += ( A >= X ) ? pc->jt : pc->jf */
+			condt = MIPS_COND_GE | MIPS_COND_X;
+			goto jmp_cmp;
+		case BPF_S_JMP_JGT_K:
+			/* pc += ( A > K ) ? pc->jt : pc->jf */
+			condt = MIPS_COND_GT | MIPS_COND_K;
+			goto jmp_cmp;
+		case BPF_S_JMP_JGT_X:
+			ctx->flags |= SEEN_X;
+			/* pc += ( A > X ) ? pc->jt : pc->jf */
+			condt = MIPS_COND_GT | MIPS_COND_X;
+jmp_cmp:
+			/* Greater or Equal */
+			if ((condt & MIPS_COND_GE) ||
+			    (condt & MIPS_COND_GT)) {
+				if (condt & MIPS_COND_K) { /* K */
+					ctx->flags |= SEEN_S0 | SEEN_A;
+					emit_sltiu(r_s0, r_A, k, ctx);
+				} else { /* X */
+					ctx->flags |= SEEN_S0 | SEEN_A |
+						SEEN_X;
+					emit_sltu(r_s0, r_A, r_X, ctx);
+				}
+				/* A < (K|X) ? r_scrach = 1 */
+				b_off = b_imm(i + inst->jf + 1, ctx);
+				emit_bcond(MIPS_COND_GT, r_s0, r_zero, b_off,
+					   ctx);
+				emit_nop(ctx);
+				/* A > (K|X) ? scratch = 0 */
+				if (condt & MIPS_COND_GT) {
+					/* Checking for equality */
+					ctx->flags |= SEEN_S0 | SEEN_A | SEEN_X;
+					if (condt & MIPS_COND_K)
+						emit_load_imm(r_s0, k, ctx);
+					else
+						emit_jit_reg_move(r_s0, r_X,
+								  ctx);
+					b_off = b_imm(i + inst->jf + 1, ctx);
+					emit_bcond(MIPS_COND_EQ, r_A, r_s0,
+						   b_off, ctx);
+					emit_nop(ctx);
+					/* Finally, A > K|X */
+					b_off = b_imm(i + inst->jt + 1, ctx);
+					emit_b(b_off, ctx);
+					emit_nop(ctx);
+				} else {
+					/* A >= (K|X) so jump */
+					b_off = b_imm(i + inst->jt + 1, ctx);
+					emit_b(b_off, ctx);
+					emit_nop(ctx);
+				}
+			} else {
+				/* A == K|X */
+				if (condt & MIPS_COND_K) { /* K */
+					ctx->flags |= SEEN_S0 | SEEN_A;
+					emit_load_imm(r_s0, k, ctx);
+					/* jump true */
+					b_off = b_imm(i + inst->jt + 1, ctx);
+					emit_bcond(MIPS_COND_EQ, r_A, r_s0,
+						   b_off, ctx);
+					emit_nop(ctx);
+					/* jump false */
+					b_off = b_imm(i + inst->jf + 1,
+						      ctx);
+					emit_bcond(MIPS_COND_NE, r_A, r_s0,
+						   b_off, ctx);
+					emit_nop(ctx);
+				} else { /* X */
+					/* jump true */
+					ctx->flags |= SEEN_A | SEEN_X;
+					b_off = b_imm(i + inst->jt + 1,
+						      ctx);
+					emit_bcond(MIPS_COND_EQ, r_A, r_X,
+						   b_off, ctx);
+					emit_nop(ctx);
+					/* jump false */
+					b_off = b_imm(i + inst->jf + 1, ctx);
+					emit_bcond(MIPS_COND_NE, r_A, r_X,
+						   b_off, ctx);
+					emit_nop(ctx);
+				}
+			}
+			break;
+		case BPF_S_JMP_JSET_K:
+			ctx->flags |= SEEN_S0 | SEEN_S1 | SEEN_A;
+			/* pc += (A & K) ? pc -> jt : pc -> jf */
+			emit_load_imm(r_s1, k, ctx);
+			emit_and(r_s0, r_A, r_s1, ctx);
+			/* jump true */
+			b_off = b_imm(i + inst->jt + 1, ctx);
+			emit_bcond(MIPS_COND_NE, r_s0, r_zero, b_off, ctx);
+			emit_nop(ctx);
+			/* jump false */
+			b_off = b_imm(i + inst->jf + 1, ctx);
+			emit_b(b_off, ctx);
+			emit_nop(ctx);
+			break;
+		case BPF_S_JMP_JSET_X:
+			ctx->flags |= SEEN_S0 | SEEN_X | SEEN_A;
+			/* pc += (A & X) ? pc -> jt : pc -> jf */
+			emit_and(r_s0, r_A, r_X, ctx);
+			/* jump true */
+			b_off = b_imm(i + inst->jt + 1, ctx);
+			emit_bcond(MIPS_COND_NE, r_s0, r_zero, b_off, ctx);
+			emit_nop(ctx);
+			/* jump false */
+			b_off = b_imm(i + inst->jf + 1, ctx);
+			emit_b(b_off, ctx);
+			emit_nop(ctx);
+			break;
+		case BPF_S_RET_A:
+			ctx->flags |= SEEN_A;
+			if (i != prog->len - 1)
+				/*
+				 * If this is not the last instruction
+				 * then jump to the epilogue
+				 */
+				emit_b(b_imm(prog->len, ctx), ctx);
+			emit_reg_move(r_ret, r_A, ctx); /* delay slot */
+			break;
+		case BPF_S_RET_K:
+			/*
+			 * It can emit two instructions so it does not fit on
+			 * the delay slot.
+			 */
+			emit_load_imm(r_ret, k, ctx);
+			if (i != prog->len - 1) {
+				/*
+				 * If this is not the last instruction
+				 * then jump to the epilogue
+				 */
+				emit_b(b_imm(prog->len, ctx), ctx);
+				emit_nop(ctx);
+			}
+			break;
+		case BPF_S_MISC_TAX:
+			/* X = A */
+			ctx->flags |= SEEN_X | SEEN_A;
+			emit_jit_reg_move(r_X, r_A, ctx);
+			break;
+		case BPF_S_MISC_TXA:
+			/* A = X */
+			ctx->flags |= SEEN_A | SEEN_X;
+			update_on_xread(ctx);
+			emit_jit_reg_move(r_A, r_X, ctx);
+			break;
+		/* AUX */
+		case BPF_S_ANC_PROTOCOL:
+			/* A = ntohs(skb->protocol */
+			ctx->flags |= SEEN_SKB | SEEN_OFF | SEEN_A;
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff,
+						  protocol) != 2);
+			off = offsetof(struct sk_buff, protocol);
+			emit_half_load(r_A, r_skb, off, ctx);
+#ifdef CONFIG_CPU_LITTLE_ENDIAN
+			/* This needs little endian fixup */
+			if (!cpu_has_mips_r2) {
+				/* Get first byte */
+				emit_andi(r_tmp_imm, r_A, 0xff, ctx);
+				/* Shift it */
+				emit_sll(r_tmp, r_tmp_imm, 8, ctx);
+				/* Get second byte */
+				emit_srl(r_tmp_imm, r_A, 8, ctx);
+				emit_andi(r_tmp_imm, r_tmp_imm, 0xff, ctx);
+				/* Put everyting together in r_A */
+				emit_or(r_A, r_tmp, r_tmp_imm, ctx);
+			} else {
+				/* R2 and later have the wsbh instruction */
+				emit_wsbh(r_A, r_A, ctx);
+			}
+#endif
+			break;
+		case BPF_S_ANC_CPU:
+			ctx->flags |= SEEN_A | SEEN_OFF;
+			/* A = current_thread_info()->cpu */
+			BUILD_BUG_ON(FIELD_SIZEOF(struct thread_info,
+						  cpu) != 4);
+			off = offsetof(struct thread_info, cpu);
+			/* $28/gp points to the thread_info struct */
+			emit_load(r_A, 28, off, ctx);
+			break;
+		case BPF_S_ANC_IFINDEX:
+			/* A = skb->dev->ifindex */
+			ctx->flags |= SEEN_SKB | SEEN_A | SEEN_S0;
+			off = offsetof(struct sk_buff, dev);
+			emit_load(r_s0, r_skb, off, ctx);
+			/* error (0) in the delay slot */
+			emit_bcond(MIPS_COND_EQ, r_s0, r_zero,
+				   b_imm(prog->len, ctx), ctx);
+			emit_reg_move(r_ret, r_zero, ctx);
+			BUILD_BUG_ON(FIELD_SIZEOF(struct net_device,
+						  ifindex) != 4);
+			off = offsetof(struct net_device, ifindex);
+			emit_load(r_A, r_s0, off, ctx);
+			break;
+		case BPF_S_ANC_MARK:
+			ctx->flags |= SEEN_SKB | SEEN_A;
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, mark) != 4);
+			off = offsetof(struct sk_buff, mark);
+			emit_load(r_A, r_skb, off, ctx);
+			break;
+		case BPF_S_ANC_RXHASH:
+			ctx->flags |= SEEN_SKB | SEEN_A;
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, hash) != 4);
+			off = offsetof(struct sk_buff, hash);
+			emit_load(r_A, r_skb, off, ctx);
+			break;
+		case BPF_S_ANC_VLAN_TAG:
+		case BPF_S_ANC_VLAN_TAG_PRESENT:
+			ctx->flags |= SEEN_SKB | SEEN_S0 | SEEN_A;
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff,
+						  vlan_tci) != 2);
+			off = offsetof(struct sk_buff, vlan_tci);
+			emit_half_load(r_s0, r_skb, off, ctx);
+			if (inst->code == BPF_S_ANC_VLAN_TAG)
+				emit_and(r_A, r_s0, VLAN_VID_MASK, ctx);
+			else
+				emit_and(r_A, r_s0, VLAN_TAG_PRESENT, ctx);
+			break;
+		case BPF_S_ANC_PKTTYPE:
+			off = pkt_type_offset();
+
+			if (off < 0)
+				return -1;
+			emit_load_byte(r_tmp, r_skb, off, ctx);
+			/* Keep only the last 3 bits */
+			emit_andi(r_A, r_tmp, PKT_TYPE_MAX, ctx);
+			break;
+		case BPF_S_ANC_QUEUE:
+			ctx->flags |= SEEN_SKB | SEEN_A;
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff,
+						  queue_mapping) != 2);
+			BUILD_BUG_ON(offsetof(struct sk_buff,
+					      queue_mapping) > 0xff);
+			off = offsetof(struct sk_buff, queue_mapping);
+			emit_half_load(r_A, r_skb, off, ctx);
+			break;
+		default:
+			pr_warn("%s: Unhandled opcode: 0x%02x\n", __FILE__,
+				inst->code);
+			return -1;
+		}
+	}
+
+	/* compute offsets only during the first pass */
+	if (ctx->target == NULL)
+		ctx->offsets[i] = ctx->idx * 4;
+
+	return 0;
+}
+
+int bpf_jit_enable __read_mostly;
+
+void bpf_jit_compile(struct sk_filter *fp)
+{
+	struct jit_ctx ctx;
+	unsigned int alloc_size, tmp_idx;
+
+	if (!bpf_jit_enable)
+		return;
+
+	memset(&ctx, 0, sizeof(ctx));
+
+	ctx.offsets = kcalloc(fp->len, sizeof(*ctx.offsets), GFP_KERNEL);
+	if (ctx.offsets == NULL)
+		return;
+
+	ctx.skf = fp;
+
+	if (build_body(&ctx))
+		goto out;
+
+	tmp_idx = ctx.idx;
+	build_prologue(&ctx);
+	ctx.prologue_bytes = (ctx.idx - tmp_idx) * 4;
+	/* just to complete the ctx.idx count */
+	build_epilogue(&ctx);
+
+	alloc_size = 4 * ctx.idx;
+	ctx.target = module_alloc(alloc_size);
+	if (ctx.target == NULL)
+		goto out;
+
+	/* Clean it */
+	memset(ctx.target, 0, alloc_size);
+
+	ctx.idx = 0;
+
+	/* Generate the actual JIT code */
+	build_prologue(&ctx);
+	build_body(&ctx);
+	build_epilogue(&ctx);
+
+	/* Update the icache */
+	flush_icache_range((ptr)ctx.target, (ptr)(ctx.target + ctx.idx));
+
+	if (bpf_jit_enable > 1)
+		/* Dump JIT code */
+		bpf_jit_dump(fp->len, alloc_size, 2, ctx.target);
+
+	fp->bpf_func = (void *)ctx.target;
+	fp->jited = 1;
+
+out:
+	kfree(ctx.offsets);
+}
+
+void bpf_jit_free(struct sk_filter *fp)
+{
+	if (fp->jited)
+		module_free(NULL, fp->bpf_func);
+	kfree(fp);
+}
diff --git a/arch/mips/net/bpf_jit.h b/arch/mips/net/bpf_jit.h
new file mode 100644
index 0000000..3a5751b
--- /dev/null
+++ b/arch/mips/net/bpf_jit.h
@@ -0,0 +1,44 @@
+/*
+ * Just-In-Time compiler for BPF filters on MIPS
+ *
+ * Copyright (c) 2014 Imagination Technologies Ltd.
+ * Author: Markos Chandras <markos.chandras@imgtec.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; version 2 of the License.
+ */
+
+#ifndef BPF_JIT_MIPS_OP_H
+#define BPF_JIT_MIPS_OP_H
+
+/* Registers used by JIT */
+#define MIPS_R_ZERO	0
+#define MIPS_R_V0	2
+#define MIPS_R_V1	3
+#define MIPS_R_A0	4
+#define MIPS_R_A1	5
+#define MIPS_R_T6	14
+#define MIPS_R_T7	15
+#define MIPS_R_S0	16
+#define MIPS_R_S1	17
+#define MIPS_R_S2	18
+#define MIPS_R_S3	19
+#define MIPS_R_S4	20
+#define MIPS_R_S5	21
+#define MIPS_R_S6	22
+#define MIPS_R_S7	23
+#define MIPS_R_SP	29
+#define MIPS_R_RA	31
+
+/* Conditional codes */
+#define MIPS_COND_EQ	0x1
+#define MIPS_COND_GE	(0x1 << 1)
+#define MIPS_COND_GT	(0x1 << 2)
+#define MIPS_COND_NE	(0x1 << 3)
+#define MIPS_COND_ALL	(0x1 << 4)
+/* Conditionals on X register or K immediate */
+#define MIPS_COND_X	(0x1 << 5)
+#define MIPS_COND_K	(0x1 << 6)
+
+#endif /* BPF_JIT_MIPS_OP_H */
-- 
1.9.2

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v4 13/14] MIPS: net: Add BPF JIT
@ 2014-04-29 15:58       ` Markos Chandras
  0 siblings, 0 replies; 53+ messages in thread
From: Markos Chandras @ 2014-04-29 15:58 UTC (permalink / raw)
  To: linux-mips; +Cc: Markos Chandras, David S. Miller, netdev

This adds initial support for BPF-JIT on MIPS

Tested on mips32 LE/BE and mips64 BE/n64 using
dhcp, ping and various tcpdump filters.

Benchmarking:

Assuming the remote MIPS target uses 192.168.154.181
as its IP address, and the local host uses 192.168.154.136,
the following results can be obtained using the following
tcpdump filter (catches no frames) and a simple
'time ping -f -c 1000000' command.

[root@(none) ~]# tcpdump -p -n -s 0 -i eth0 net 10.0.0.0/24 -d
(000) ldh      [12]
(001) jeq      #0x800           jt 2	jf 8
(002) ld       [26]
(003) and      #0xffffff00
(004) jeq      #0xa000000       jt 16	jf 5
(005) ld       [30]
(006) and      #0xffffff00
(007) jeq      #0xa000000       jt 16	jf 17
(008) jeq      #0x806           jt 10	jf 9
(009) jeq      #0x8035          jt 10	jf 17
(010) ld       [28]
(011) and      #0xffffff00
(012) jeq      #0xa000000       jt 16	jf 13
(013) ld       [38]
(014) and      #0xffffff00
(015) jeq      #0xa000000       jt 16	jf 17
(016) ret      #65535

- BPF-JIT Disabled

real    1m38.005s
user    0m1.510s
sys     0m6.710s

- BPF-JIT Enabled

real    1m35.215s
user    0m1.200s
sys     0m4.140s

Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
---
Changes since v3:
- Replace cpu_has_mips_r1 with !cpu_has_mips_r2 to avoid taking the slow path
even on MIPS R2 processors
---
 arch/mips/Kbuild        |    1 +
 arch/mips/net/Makefile  |    3 +
 arch/mips/net/bpf_jit.c | 1399 +++++++++++++++++++++++++++++++++++++++++++++++
 arch/mips/net/bpf_jit.h |   44 ++
 4 files changed, 1447 insertions(+)
 create mode 100644 arch/mips/net/Makefile
 create mode 100644 arch/mips/net/bpf_jit.c
 create mode 100644 arch/mips/net/bpf_jit.h

diff --git a/arch/mips/Kbuild b/arch/mips/Kbuild
index d2cfe45..9179336 100644
--- a/arch/mips/Kbuild
+++ b/arch/mips/Kbuild
@@ -17,6 +17,7 @@ obj- := $(platform-)
 obj-y += kernel/
 obj-y += mm/
 obj-y += math-emu/
+obj-y += net/
 
 ifdef CONFIG_KVM
 obj-y += kvm/
diff --git a/arch/mips/net/Makefile b/arch/mips/net/Makefile
new file mode 100644
index 0000000..ae74b3a
--- /dev/null
+++ b/arch/mips/net/Makefile
@@ -0,0 +1,3 @@
+# MIPS networking code
+
+obj-$(CONFIG_BPF_JIT) += bpf_jit.o
diff --git a/arch/mips/net/bpf_jit.c b/arch/mips/net/bpf_jit.c
new file mode 100644
index 0000000..98e9d23
--- /dev/null
+++ b/arch/mips/net/bpf_jit.c
@@ -0,0 +1,1399 @@
+/*
+ * Just-In-Time compiler for BPF filters on MIPS
+ *
+ * Copyright (c) 2014 Imagination Technologies Ltd.
+ * Author: Markos Chandras <markos.chandras@imgtec.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; version 2 of the License.
+ */
+
+#include <linux/bitops.h>
+#include <linux/compiler.h>
+#include <linux/errno.h>
+#include <linux/filter.h>
+#include <linux/if_vlan.h>
+#include <linux/kconfig.h>
+#include <linux/moduleloader.h>
+#include <linux/netdevice.h>
+#include <linux/string.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+#include <asm/bitops.h>
+#include <asm/cacheflush.h>
+#include <asm/cpu-features.h>
+#include <asm/uasm.h>
+
+#include "bpf_jit.h"
+
+/* ABI
+ *
+ * s0	1st scratch register
+ * s1	2nd scratch register
+ * s2	offset register
+ * s3	BPF register A
+ * s4	BPF register X
+ * s5	*skb
+ * s6	*scratch memory
+ *
+ * On entry (*bpf_func)(*skb, *filter)
+ * a0 = MIPS_R_A0 = skb;
+ * a1 = MIPS_R_A1 = filter;
+ *
+ * Stack
+ * ...
+ * M[15]
+ * M[14]
+ * M[13]
+ * ...
+ * M[0] <-- r_M
+ * saved reg k-1
+ * saved reg k-2
+ * ...
+ * saved reg 0 <-- r_sp
+ * <no argument area>
+ *
+ *                     Packet layout
+ *
+ * <--------------------- len ------------------------>
+ * <--skb-len(r_skb_hl)-->< ----- skb->data_len ------>
+ * ----------------------------------------------------
+ * |                  skb->data                       |
+ * ----------------------------------------------------
+ */
+
+#define RSIZE	(sizeof(unsigned long))
+#define ptr typeof(unsigned long)
+
+/* ABI specific return values */
+#ifdef CONFIG_32BIT /* O32 */
+#ifdef CONFIG_CPU_LITTLE_ENDIAN
+#define r_err	MIPS_R_V1
+#define r_val	MIPS_R_V0
+#else /* CONFIG_CPU_LITTLE_ENDIAN */
+#define r_err	MIPS_R_V0
+#define r_val	MIPS_R_V1
+#endif
+#else /* N64 */
+#define r_err	MIPS_R_V0
+#define r_val	MIPS_R_V0
+#endif
+
+#define r_ret	MIPS_R_V0
+
+/*
+ * Use 2 scratch registers to avoid pipeline interlocks.
+ * There is no overhead during epilogue and prologue since
+ * any of the $s0-$s6 registers will only be preserved if
+ * they are going to actually be used.
+ */
+#define r_s0		MIPS_R_S0 /* scratch reg 1 */
+#define r_s1		MIPS_R_S1 /* scratch reg 2 */
+#define r_off		MIPS_R_S2
+#define r_A		MIPS_R_S3
+#define r_X		MIPS_R_S4
+#define r_skb		MIPS_R_S5
+#define r_M		MIPS_R_S6
+#define r_tmp_imm	MIPS_R_T6 /* No need to preserve this */
+#define r_tmp		MIPS_R_T7 /* No need to preserve this */
+#define r_zero		MIPS_R_ZERO
+#define r_sp		MIPS_R_SP
+#define r_ra		MIPS_R_RA
+
+#define SCRATCH_OFF(k)		(4 * (k))
+
+/* JIT flags */
+#define SEEN_CALL		(1 << BPF_MEMWORDS)
+#define SEEN_SREG_SFT		(BPF_MEMWORDS + 1)
+#define SEEN_SREG_BASE		(1 << SEEN_SREG_SFT)
+#define SEEN_SREG(x)		(SEEN_SREG_BASE << (x))
+#define SEEN_S0			SEEN_SREG(0)
+#define SEEN_S1			SEEN_SREG(1)
+#define SEEN_OFF		SEEN_SREG(2)
+#define SEEN_A			SEEN_SREG(3)
+#define SEEN_X			SEEN_SREG(4)
+#define SEEN_SKB		SEEN_SREG(5)
+#define SEEN_MEM		SEEN_SREG(6)
+
+/* Arguments used by JIT */
+#define ARGS_USED_BY_JIT	2 /* only applicable to 64-bit */
+
+#define FLAG_NEED_X_RESET	(1 << 0)
+
+#define SBIT(x)			(1 << (x)) /* Signed version of BIT() */
+
+/**
+ * struct jit_ctx - JIT context
+ * @skf:		The sk_filter
+ * @prologue_bytes:	Number of bytes for prologue
+ * @idx:		Instruction index
+ * @flags:		JIT flags
+ * @offsets:		Instruction offsets
+ * @target:		Memory location for the compiled filter
+ */
+struct jit_ctx {
+	const struct sk_filter *skf;
+	unsigned int prologue_bytes;
+	u32 idx;
+	u32 flags;
+	u32 *offsets;
+	u32 *target;
+};
+
+
+static inline int optimize_div(u32 *k)
+{
+	/* power of 2 divides can be implemented with right shift */
+	if (!(*k & (*k-1))) {
+		*k = ilog2(*k);
+		return 1;
+	}
+
+	return 0;
+}
+
+/* Simply emit the instruction if the JIT memory space has been allocated */
+#define emit_instr(ctx, func, ...)			\
+do {							\
+	if ((ctx)->target != NULL) {			\
+		u32 *p = &(ctx)->target[ctx->idx];	\
+		uasm_i_##func(&p, ##__VA_ARGS__);	\
+	}						\
+	(ctx)->idx++;					\
+} while (0)
+
+/* Determine if immediate is within the 16-bit signed range */
+static inline bool is_range16(s32 imm)
+{
+	if (imm >= SBIT(15) || imm < -SBIT(15))
+		return true;
+	return false;
+}
+
+static inline void emit_addu(unsigned int dst, unsigned int src1,
+			     unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, addu, dst, src1, src2);
+}
+
+static inline void emit_nop(struct jit_ctx *ctx)
+{
+	emit_instr(ctx, nop);
+}
+
+/* Load a u32 immediate to a register */
+static inline void emit_load_imm(unsigned int dst, u32 imm, struct jit_ctx *ctx)
+{
+	if (ctx->target != NULL) {
+		/* addiu can only handle s16 */
+		if (is_range16(imm)) {
+			u32 *p = &ctx->target[ctx->idx];
+			uasm_i_lui(&p, r_tmp_imm, (s32)imm >> 16);
+			p = &ctx->target[ctx->idx + 1];
+			uasm_i_ori(&p, dst, r_tmp_imm, imm & 0xffff);
+		} else {
+			u32 *p = &ctx->target[ctx->idx];
+			uasm_i_addiu(&p, dst, r_zero, imm);
+		}
+	}
+	ctx->idx++;
+
+	if (is_range16(imm))
+		ctx->idx++;
+}
+
+static inline void emit_or(unsigned int dst, unsigned int src1,
+			   unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, or, dst, src1, src2);
+}
+
+static inline void emit_ori(unsigned int dst, unsigned src, u32 imm,
+			    struct jit_ctx *ctx)
+{
+	if (imm >= BIT(16)) {
+		emit_load_imm(r_tmp, imm, ctx);
+		emit_or(dst, src, r_tmp, ctx);
+	} else {
+		emit_instr(ctx, ori, dst, src, imm);
+	}
+}
+
+
+static inline void emit_daddu(unsigned int dst, unsigned int src1,
+			      unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, daddu, dst, src1, src2);
+}
+
+static inline void emit_daddiu(unsigned int dst, unsigned int src,
+			       int imm, struct jit_ctx *ctx)
+{
+	/*
+	 * Only used for stack, so the imm is relatively small
+	 * and it fits in 15-bits
+	 */
+	emit_instr(ctx, daddiu, dst, src, imm);
+}
+
+static inline void emit_addiu(unsigned int dst, unsigned int src,
+			      u32 imm, struct jit_ctx *ctx)
+{
+	if (is_range16(imm)) {
+		emit_load_imm(r_tmp, imm, ctx);
+		emit_addu(dst, r_tmp, src, ctx);
+	} else {
+		emit_instr(ctx, addiu, dst, src, imm);
+	}
+}
+
+static inline void emit_and(unsigned int dst, unsigned int src1,
+			    unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, and, dst, src1, src2);
+}
+
+static inline void emit_andi(unsigned int dst, unsigned int src,
+			     u32 imm, struct jit_ctx *ctx)
+{
+	/* If imm does not fit in u16 then load it to register */
+	if (imm >= BIT(16)) {
+		emit_load_imm(r_tmp, imm, ctx);
+		emit_and(dst, src, r_tmp, ctx);
+	} else {
+		emit_instr(ctx, andi, dst, src, imm);
+	}
+}
+
+static inline void emit_xor(unsigned int dst, unsigned int src1,
+			    unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, xor, dst, src1, src2);
+}
+
+static inline void emit_xori(ptr dst, ptr src, u32 imm, struct jit_ctx *ctx)
+{
+	/* If imm does not fit in u16 then load it to register */
+	if (imm >= BIT(16)) {
+		emit_load_imm(r_tmp, imm, ctx);
+		emit_xor(dst, src, r_tmp, ctx);
+	} else {
+		emit_instr(ctx, xori, dst, src, imm);
+	}
+}
+
+static inline void emit_stack_offset(int offset, struct jit_ctx *ctx)
+{
+	if (config_enabled(CONFIG_64BIT))
+		emit_instr(ctx, daddiu, r_sp, r_sp, offset);
+	else
+		emit_instr(ctx, addiu, r_sp, r_sp, offset);
+
+}
+
+static inline void emit_subu(unsigned int dst, unsigned int src1,
+			     unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, subu, dst, src1, src2);
+}
+
+static inline void emit_neg(unsigned int reg, struct jit_ctx *ctx)
+{
+	emit_subu(reg, r_zero, reg, ctx);
+}
+
+static inline void emit_sllv(unsigned int dst, unsigned int src,
+			     unsigned int sa, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, sllv, dst, src, sa);
+}
+
+static inline void emit_sll(unsigned int dst, unsigned int src,
+			    unsigned int sa, struct jit_ctx *ctx)
+{
+	/* sa is 5-bits long */
+	BUG_ON(sa >= BIT(5));
+	emit_instr(ctx, sll, dst, src, sa);
+}
+
+static inline void emit_srlv(unsigned int dst, unsigned int src,
+			     unsigned int sa, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, srlv, dst, src, sa);
+}
+
+static inline void emit_srl(unsigned int dst, unsigned int src,
+			    unsigned int sa, struct jit_ctx *ctx)
+{
+	/* sa is 5-bits long */
+	BUG_ON(sa >= BIT(5));
+	emit_instr(ctx, srl, dst, src, sa);
+}
+
+static inline void emit_sltu(unsigned int dst, unsigned int src1,
+			     unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, sltu, dst, src1, src2);
+}
+
+static inline void emit_sltiu(unsigned dst, unsigned int src,
+			      unsigned int imm, struct jit_ctx *ctx)
+{
+	/* 16 bit immediate */
+	if (is_range16((s32)imm)) {
+		emit_load_imm(r_tmp, imm, ctx);
+		emit_sltu(dst, src, r_tmp, ctx);
+	} else {
+		emit_instr(ctx, sltiu, dst, src, imm);
+	}
+
+}
+
+/* Store register on the stack */
+static inline void emit_store_stack_reg(ptr reg, ptr base,
+					unsigned int offset,
+					struct jit_ctx *ctx)
+{
+	if (config_enabled(CONFIG_64BIT))
+		emit_instr(ctx, sd, reg, offset, base);
+	else
+		emit_instr(ctx, sw, reg, offset, base);
+}
+
+static inline void emit_store(ptr reg, ptr base, unsigned int offset,
+			      struct jit_ctx *ctx)
+{
+	emit_instr(ctx, sw, reg, offset, base);
+}
+
+static inline void emit_load_stack_reg(ptr reg, ptr base,
+				       unsigned int offset,
+				       struct jit_ctx *ctx)
+{
+	if (config_enabled(CONFIG_64BIT))
+		emit_instr(ctx, ld, reg, offset, base);
+	else
+		emit_instr(ctx, lw, reg, offset, base);
+}
+
+static inline void emit_load(unsigned int reg, unsigned int base,
+			     unsigned int offset, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, lw, reg, offset, base);
+}
+
+static inline void emit_load_byte(unsigned int reg, unsigned int base,
+				  unsigned int offset, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, lb, reg, offset, base);
+}
+
+static inline void emit_half_load(unsigned int reg, unsigned int base,
+				  unsigned int offset, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, lh, reg, offset, base);
+}
+
+static inline void emit_mul(unsigned int dst, unsigned int src1,
+			    unsigned int src2, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, mul, dst, src1, src2);
+}
+
+static inline void emit_div(unsigned int dst, unsigned int src,
+			    struct jit_ctx *ctx)
+{
+	if (ctx->target != NULL) {
+		u32 *p = &ctx->target[ctx->idx];
+		uasm_i_divu(&p, dst, src);
+		p = &ctx->target[ctx->idx + 1];
+		uasm_i_mfhi(&p, dst);
+	}
+	ctx->idx += 2; /* 2 insts */
+}
+
+static inline void emit_mod(unsigned int dst, unsigned int src,
+			    struct jit_ctx *ctx)
+{
+	if (ctx->target != NULL) {
+		u32 *p = &ctx->target[ctx->idx];
+		uasm_i_divu(&p, dst, src);
+		p = &ctx->target[ctx->idx + 1];
+		uasm_i_mflo(&p, dst);
+	}
+	ctx->idx += 2; /* 2 insts */
+}
+
+static inline void emit_dsll(unsigned int dst, unsigned int src,
+			     unsigned int sa, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, dsll, dst, src, sa);
+}
+
+static inline void emit_dsrl32(unsigned int dst, unsigned int src,
+			       unsigned int sa, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, dsrl32, dst, src, sa);
+}
+
+static inline void emit_wsbh(unsigned int dst, unsigned int src,
+			     struct jit_ctx *ctx)
+{
+	emit_instr(ctx, wsbh, dst, src);
+}
+
+/* load a function pointer to register */
+static inline void emit_load_func(unsigned int reg, ptr imm,
+				  struct jit_ctx *ctx)
+{
+	if (config_enabled(CONFIG_64BIT)) {
+		/* At this point imm is always 64-bit */
+		emit_load_imm(r_tmp, (u64)imm >> 32, ctx);
+		emit_dsll(r_tmp_imm, r_tmp, 16, ctx); /* left shift by 16 */
+		emit_ori(r_tmp, r_tmp_imm, (imm >> 16) & 0xffff, ctx);
+		emit_dsll(r_tmp_imm, r_tmp, 16, ctx); /* left shift by 16 */
+		emit_ori(reg, r_tmp_imm, imm & 0xffff, ctx);
+	} else {
+		emit_load_imm(reg, imm, ctx);
+	}
+}
+
+/* Move to real MIPS register */
+static inline void emit_reg_move(ptr dst, ptr src, struct jit_ctx *ctx)
+{
+	if (config_enabled(CONFIG_64BIT))
+		emit_daddu(dst, src, r_zero, ctx);
+	else
+		emit_addu(dst, src, r_zero, ctx);
+}
+
+/* Move to JIT (32-bit) register */
+static inline void emit_jit_reg_move(ptr dst, ptr src, struct jit_ctx *ctx)
+{
+	emit_addu(dst, src, r_zero, ctx);
+}
+
+/* Compute the immediate value for PC-relative branches. */
+static inline u32 b_imm(unsigned int tgt, struct jit_ctx *ctx)
+{
+	if (ctx->target == NULL)
+		return 0;
+
+	/*
+	 * We want a pc-relative branch. We only do forward branches
+	 * so tgt is always after pc. tgt is the instruction offset
+	 * we want to jump to.
+
+	 * Branch on MIPS:
+	 * I: target_offset <- sign_extend(offset)
+	 * I+1: PC += target_offset (delay slot)
+	 *
+	 * ctx->idx currently points to the branch instruction
+	 * but the offset is added to the delay slot so we need
+	 * to subtract 4.
+	 */
+	return ctx->offsets[tgt] -
+		(ctx->idx * 4 - ctx->prologue_bytes) - 4;
+}
+
+static inline void emit_bcond(int cond, unsigned int reg1, unsigned int reg2,
+			     unsigned int imm, struct jit_ctx *ctx)
+{
+	if (ctx->target != NULL) {
+		u32 *p = &ctx->target[ctx->idx];
+
+		switch (cond) {
+		case MIPS_COND_EQ:
+			uasm_i_beq(&p, reg1, reg2, imm);
+			break;
+		case MIPS_COND_NE:
+			uasm_i_bne(&p, reg1, reg2, imm);
+			break;
+		case MIPS_COND_ALL:
+			uasm_i_b(&p, imm);
+			break;
+		default:
+			pr_warn("%s: Unhandled branch conditional: %d\n",
+				__func__, cond);
+		}
+	}
+	ctx->idx++;
+}
+
+static inline void emit_b(unsigned int imm, struct jit_ctx *ctx)
+{
+	emit_bcond(MIPS_COND_ALL, r_zero, r_zero, imm, ctx);
+}
+
+static inline void emit_jalr(unsigned int link, unsigned int reg,
+			     struct jit_ctx *ctx)
+{
+	emit_instr(ctx, jalr, link, reg);
+}
+
+static inline void emit_jr(unsigned int reg, struct jit_ctx *ctx)
+{
+	emit_instr(ctx, jr, reg);
+}
+
+static inline u16 align_sp(unsigned int num)
+{
+	/* Double word alignment for 32-bit, quadword for 64-bit */
+	unsigned int align = config_enabled(CONFIG_64BIT) ? 16 : 8;
+	num = (num + (align - 1)) & -align;
+	return num;
+}
+
+static inline void update_on_xread(struct jit_ctx *ctx)
+{
+	if (!(ctx->flags & SEEN_X))
+		ctx->flags |= FLAG_NEED_X_RESET;
+
+	ctx->flags |= SEEN_X;
+}
+
+static bool is_load_to_a(u16 inst)
+{
+	switch (inst) {
+	case BPF_S_LD_W_LEN:
+	case BPF_S_LD_W_ABS:
+	case BPF_S_LD_H_ABS:
+	case BPF_S_LD_B_ABS:
+	case BPF_S_ANC_CPU:
+	case BPF_S_ANC_IFINDEX:
+	case BPF_S_ANC_MARK:
+	case BPF_S_ANC_PROTOCOL:
+	case BPF_S_ANC_RXHASH:
+	case BPF_S_ANC_VLAN_TAG:
+	case BPF_S_ANC_VLAN_TAG_PRESENT:
+	case BPF_S_ANC_QUEUE:
+		return true;
+	default:
+		return false;
+	}
+}
+
+static void save_bpf_jit_regs(struct jit_ctx *ctx, unsigned offset)
+{
+	int i = 0, real_off = 0;
+	u32 sflags, tmp_flags;
+
+	/* Adjust the stack pointer */
+	emit_stack_offset(-align_sp(offset), ctx);
+
+	if (ctx->flags & SEEN_CALL) {
+		/* Argument save area */
+		if (config_enabled(CONFIG_64BIT))
+			/* Bottom of current frame */
+			real_off = align_sp(offset) - RSIZE;
+		else
+			/* Top of previous frame */
+			real_off = align_sp(offset) + RSIZE;
+		emit_store_stack_reg(MIPS_R_A0, r_sp, real_off, ctx);
+		emit_store_stack_reg(MIPS_R_A1, r_sp, real_off + RSIZE, ctx);
+
+		real_off = 0;
+	}
+
+	tmp_flags = sflags = ctx->flags >> SEEN_SREG_SFT;
+	/* sflags is essentially a bitmap */
+	while (tmp_flags) {
+		if ((sflags >> i) & 0x1) {
+			emit_store_stack_reg(MIPS_R_S0 + i, r_sp, real_off,
+					     ctx);
+			real_off += RSIZE;
+		}
+		i++;
+		tmp_flags >>= 1;
+	}
+
+	/* save return address */
+	if (ctx->flags & SEEN_CALL) {
+		emit_store_stack_reg(r_ra, r_sp, real_off, ctx);
+		real_off += RSIZE;
+	}
+
+	/* Setup r_M leaving the alignment gap if necessary */
+	if (ctx->flags & SEEN_MEM) {
+		if (real_off % (RSIZE * 2))
+			real_off += RSIZE;
+		emit_addiu(r_M, r_sp, real_off, ctx);
+	}
+}
+
+static void restore_bpf_jit_regs(struct jit_ctx *ctx,
+				 unsigned int offset)
+{
+	int i, real_off = 0;
+	u32 sflags, tmp_flags;
+
+	if (ctx->flags & SEEN_CALL) {
+		if (config_enabled(CONFIG_64BIT))
+			/* Bottom of current frame */
+			real_off = align_sp(offset) - RSIZE;
+		else
+			/* Top of previous frame */
+			real_off = align_sp(offset) + RSIZE;
+		emit_load_stack_reg(MIPS_R_A0, r_sp, real_off, ctx);
+		emit_load_stack_reg(MIPS_R_A1, r_sp, real_off + RSIZE, ctx);
+
+		real_off = 0;
+	}
+
+	tmp_flags = sflags = ctx->flags >> SEEN_SREG_SFT;
+	/* sflags is a bitmap */
+	i = 0;
+	while (tmp_flags) {
+		if ((sflags >> i) & 0x1) {
+			emit_load_stack_reg(MIPS_R_S0 + i, r_sp, real_off,
+					    ctx);
+			real_off += RSIZE;
+		}
+		i++;
+		tmp_flags >>= 1;
+	}
+
+	/* restore return address */
+	if (ctx->flags & SEEN_CALL)
+		emit_load_stack_reg(r_ra, r_sp, real_off, ctx);
+
+	/* Restore the sp and discard the scrach memory */
+	emit_stack_offset(align_sp(offset), ctx);
+}
+
+static unsigned int get_stack_depth(struct jit_ctx *ctx)
+{
+	int sp_off = 0;
+
+
+	/* How may s* regs do we need to preserved? */
+	sp_off += hweight32(ctx->flags >> SEEN_SREG_SFT) * RSIZE;
+
+	if (ctx->flags & SEEN_MEM)
+		sp_off += 4 * BPF_MEMWORDS; /* BPF_MEMWORDS are 32-bit */
+
+	if (ctx->flags & SEEN_CALL)
+		/*
+		 * The JIT code make calls to external functions using 2
+		 * arguments. Therefore, for o32 we don't need to allocate
+		 * space because we don't care if the argumetns are lost
+		 * across calls. We do need however to preserve incoming
+		 * arguments but the space is already allocated for us by
+		 * the caller. On the other hand, for n64, we need to allocate
+		 * this space ourselves. We need to preserve $ra as well.
+		 */
+		sp_off += config_enabled(CONFIG_64BIT) ?
+			(ARGS_USED_BY_JIT + 1) * RSIZE : RSIZE;
+
+	/*
+	 * Subtract the bytes for the last registers since we only care about
+	 * the location on the stack pointer.
+	 */
+	return sp_off - RSIZE;
+}
+
+static void build_prologue(struct jit_ctx *ctx)
+{
+	u16 first_inst = ctx->skf->insns[0].code;
+	int sp_off;
+
+	/* Calculate the total offset for the stack pointer */
+	sp_off = get_stack_depth(ctx);
+	save_bpf_jit_regs(ctx, sp_off);
+
+	if (ctx->flags & SEEN_SKB)
+		emit_reg_move(r_skb, MIPS_R_A0, ctx);
+
+	if (ctx->flags & FLAG_NEED_X_RESET)
+		emit_jit_reg_move(r_X, r_zero, ctx);
+
+	/* Do not leak kernel data to userspace */
+	if ((first_inst != BPF_S_RET_K) && !(is_load_to_a(first_inst)))
+		emit_jit_reg_move(r_A, r_zero, ctx);
+}
+
+static void build_epilogue(struct jit_ctx *ctx)
+{
+	unsigned int sp_off;
+
+	/* Calculate the total offset for the stack pointer */
+
+	sp_off = get_stack_depth(ctx);
+	restore_bpf_jit_regs(ctx, sp_off);
+
+	/* Return */
+	emit_jr(r_ra, ctx);
+	emit_nop(ctx);
+}
+
+static u64 jit_get_skb_b(struct sk_buff *skb, unsigned offset)
+{
+	u8 ret;
+	int err;
+
+	err = skb_copy_bits(skb, offset, &ret, 1);
+
+	return (u64)err << 32 | ret;
+}
+
+static u64 jit_get_skb_h(struct sk_buff *skb, unsigned offset)
+{
+	u16 ret;
+	int err;
+
+	err = skb_copy_bits(skb, offset, &ret, 2);
+
+	return (u64)err << 32 | ntohs(ret);
+}
+
+static u64 jit_get_skb_w(struct sk_buff *skb, unsigned offset)
+{
+	u32 ret;
+	int err;
+
+	err = skb_copy_bits(skb, offset, &ret, 4);
+
+	return (u64)err << 32 | ntohl(ret);
+}
+
+#define PKT_TYPE_MAX 7
+static int pkt_type_offset(void)
+{
+	struct sk_buff skb_probe = {
+		.pkt_type = ~0,
+	};
+	char *ct = (char *)&skb_probe;
+	unsigned int off;
+
+	for (off = 0; off < sizeof(struct sk_buff); off++) {
+		if (ct[off] == PKT_TYPE_MAX)
+			return off;
+	}
+	pr_err_once("Please fix pkt_type_offset(), as pkt_type couldn't be found\n");
+	return -1;
+}
+
+static int build_body(struct jit_ctx *ctx)
+{
+	void *load_func[] = {jit_get_skb_b, jit_get_skb_h, jit_get_skb_w};
+	const struct sk_filter *prog = ctx->skf;
+	const struct sock_filter *inst;
+	unsigned int i, off, load_order, condt;
+	u32 k, b_off __maybe_unused;
+
+	for (i = 0; i < prog->len; i++) {
+		inst = &(prog->insns[i]);
+		pr_debug("%s: code->0x%02x, jt->0x%x, jf->0x%x, k->0x%x\n",
+			 __func__, inst->code, inst->jt, inst->jf, inst->k);
+		k = inst->k;
+
+		if (ctx->target == NULL)
+			ctx->offsets[i] = ctx->idx * 4;
+
+		switch (inst->code) {
+		case BPF_S_LD_IMM:
+			/* A <- k ==> li r_A, k */
+			ctx->flags |= SEEN_A;
+			emit_load_imm(r_A, k, ctx);
+			break;
+		case BPF_S_LD_W_LEN:
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, len) != 4);
+			/* A <- len ==> lw r_A, offset(skb) */
+			ctx->flags |= SEEN_SKB | SEEN_A;
+			off = offsetof(struct sk_buff, len);
+			emit_load(r_A, r_skb, off, ctx);
+			break;
+		case BPF_S_LD_MEM:
+			/* A <- M[k] ==> lw r_A, offset(M) */
+			ctx->flags |= SEEN_MEM | SEEN_A;
+			emit_load(r_A, r_M, SCRATCH_OFF(k), ctx);
+			break;
+		case BPF_S_LD_W_ABS:
+			/* A <- P[k:4] */
+			load_order = 2;
+			goto load;
+		case BPF_S_LD_H_ABS:
+			/* A <- P[k:2] */
+			load_order = 1;
+			goto load;
+		case BPF_S_LD_B_ABS:
+			/* A <- P[k:1] */
+			load_order = 0;
+load:
+			emit_load_imm(r_off, k, ctx);
+load_common:
+			ctx->flags |= SEEN_CALL | SEEN_OFF | SEEN_S0 |
+				SEEN_SKB | SEEN_A;
+
+			emit_load_func(r_s0, (ptr)load_func[load_order],
+				      ctx);
+			emit_reg_move(MIPS_R_A0, r_skb, ctx);
+			emit_jalr(MIPS_R_RA, r_s0, ctx);
+			/* Load second argument to delay slot */
+			emit_reg_move(MIPS_R_A1, r_off, ctx);
+			/* Check the error value */
+			if (config_enabled(CONFIG_64BIT)) {
+				/* Get error code from the top 32-bits */
+				emit_dsrl32(r_s0, r_val, 0, ctx);
+				/* Branch to 3 instructions ahead */
+				emit_bcond(MIPS_COND_NE, r_s0, r_zero, 3 << 2,
+					   ctx);
+			} else {
+				/* Branch to 3 instructions ahead */
+				emit_bcond(MIPS_COND_NE, r_err, r_zero, 3 << 2,
+					   ctx);
+			}
+			emit_nop(ctx);
+			/* We are good */
+			emit_b(b_imm(i + 1, ctx), ctx);
+			emit_jit_reg_move(r_A, r_val, ctx);
+			/* Return with error */
+			emit_b(b_imm(prog->len, ctx), ctx);
+			emit_reg_move(r_ret, r_zero, ctx);
+			break;
+		case BPF_S_LD_W_IND:
+			/* A <- P[X + k:4] */
+			load_order = 2;
+			goto load_ind;
+		case BPF_S_LD_H_IND:
+			/* A <- P[X + k:2] */
+			load_order = 1;
+			goto load_ind;
+		case BPF_S_LD_B_IND:
+			/* A <- P[X + k:1] */
+			load_order = 0;
+load_ind:
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_OFF | SEEN_X;
+			emit_addiu(r_off, r_X, k, ctx);
+			goto load_common;
+		case BPF_S_LDX_IMM:
+			/* X <- k */
+			ctx->flags |= SEEN_X;
+			emit_load_imm(r_X, k, ctx);
+			break;
+		case BPF_S_LDX_MEM:
+			/* X <- M[k] */
+			ctx->flags |= SEEN_X | SEEN_MEM;
+			emit_load(r_X, r_M, SCRATCH_OFF(k), ctx);
+			break;
+		case BPF_S_LDX_W_LEN:
+			/* X <- len */
+			ctx->flags |= SEEN_X | SEEN_SKB;
+			off = offsetof(struct sk_buff, len);
+			emit_load(r_X, r_skb, off, ctx);
+			break;
+		case BPF_S_LDX_B_MSH:
+			/* X <- 4 * (P[k:1] & 0xf) */
+			ctx->flags |= SEEN_X | SEEN_CALL | SEEN_S0 | SEEN_SKB;
+			/* Load offset to a1 */
+			emit_load_func(r_s0, (ptr)jit_get_skb_b, ctx);
+			/*
+			 * This may emit two instructions so it may not fit
+			 * in the delay slot. So use a0 in the delay slot.
+			 */
+			emit_load_imm(MIPS_R_A1, k, ctx);
+			emit_jalr(MIPS_R_RA, r_s0, ctx);
+			emit_reg_move(MIPS_R_A0, r_skb, ctx); /* delay slot */
+			/* Check the error value */
+			if (config_enabled(CONFIG_64BIT)) {
+				/* Top 32-bits of $v0 on 64-bit */
+				emit_dsrl32(r_s0, r_val, 0, ctx);
+				emit_bcond(MIPS_COND_NE, r_s0, r_zero,
+					   3 << 2, ctx);
+			} else {
+				emit_bcond(MIPS_COND_NE, r_err, r_zero,
+					   3 << 2, ctx);
+			}
+			/* No need for delay slot */
+			/* We are good */
+			/* X <- P[1:K] & 0xf */
+			emit_andi(r_X, r_val, 0xf, ctx);
+			/* X << 2 */
+			emit_b(b_imm(i + 1, ctx), ctx);
+			emit_sll(r_X, r_X, 2, ctx); /* delay slot */
+			/* Return with error */
+			emit_b(b_imm(prog->len, ctx), ctx);
+			emit_load_imm(r_ret, 0, ctx); /* delay slot */
+			break;
+		case BPF_S_ST:
+			/* M[k] <- A */
+			ctx->flags |= SEEN_MEM | SEEN_A;
+			emit_store(r_A, r_M, SCRATCH_OFF(k), ctx);
+			break;
+		case BPF_S_STX:
+			/* M[k] <- X */
+			ctx->flags |= SEEN_MEM | SEEN_X;
+			emit_store(r_X, r_M, SCRATCH_OFF(k), ctx);
+			break;
+		case BPF_S_ALU_ADD_K:
+			/* A += K */
+			ctx->flags |= SEEN_A;
+			emit_addiu(r_A, r_A, k, ctx);
+			break;
+		case BPF_S_ALU_ADD_X:
+			/* A += X */
+			ctx->flags |= SEEN_A | SEEN_X;
+			emit_addu(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_SUB_K:
+			/* A -= K */
+			ctx->flags |= SEEN_A;
+			emit_addiu(r_A, r_A, -k, ctx);
+			break;
+		case BPF_S_ALU_SUB_X:
+			/* A -= X */
+			ctx->flags |= SEEN_A | SEEN_X;
+			emit_subu(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_MUL_K:
+			/* A *= K */
+			/* Load K to scratch register before MUL */
+			ctx->flags |= SEEN_A | SEEN_S0;
+			emit_load_imm(r_s0, k, ctx);
+			emit_mul(r_A, r_A, r_s0, ctx);
+			break;
+		case BPF_S_ALU_MUL_X:
+			/* A *= X */
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_A | SEEN_X;
+			emit_mul(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_DIV_K:
+			/* A /= k */
+			if (k == 1)
+				break;
+			if (optimize_div(&k)) {
+				ctx->flags |= SEEN_A;
+				emit_srl(r_A, r_A, k, ctx);
+				break;
+			}
+			ctx->flags |= SEEN_A | SEEN_S0;
+			emit_load_imm(r_s0, k, ctx);
+			emit_div(r_A, r_s0, ctx);
+			break;
+		case BPF_S_ALU_MOD_K:
+			/* A %= k */
+			if (k == 1 || optimize_div(&k)) {
+				ctx->flags |= SEEN_A;
+				emit_jit_reg_move(r_A, r_zero, ctx);
+			} else {
+				ctx->flags |= SEEN_A | SEEN_S0;
+				emit_load_imm(r_s0, k, ctx);
+				emit_mod(r_A, r_s0, ctx);
+			}
+			break;
+		case BPF_S_ALU_DIV_X:
+			/* A /= X */
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_X | SEEN_A;
+			/* Check if r_X is zero */
+			emit_bcond(MIPS_COND_EQ, r_X, r_zero,
+				   b_imm(prog->len, ctx), ctx);
+			emit_load_imm(r_val, 0, ctx); /* delay slot */
+			emit_div(r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_MOD_X:
+			/* A %= X */
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_X | SEEN_A;
+			/* Check if r_X is zero */
+			emit_bcond(MIPS_COND_EQ, r_X, r_zero,
+				   b_imm(prog->len, ctx), ctx);
+			emit_load_imm(r_val, 0, ctx); /* delay slot */
+			emit_mod(r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_OR_K:
+			/* A |= K */
+			ctx->flags |= SEEN_A;
+			emit_ori(r_A, r_A, k, ctx);
+			break;
+		case BPF_S_ALU_OR_X:
+			/* A |= X */
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_A;
+			emit_ori(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_XOR_K:
+			/* A ^= k */
+			ctx->flags |= SEEN_A;
+			emit_xori(r_A, r_A, k, ctx);
+			break;
+		case BPF_S_ANC_ALU_XOR_X:
+		case BPF_S_ALU_XOR_X:
+			/* A ^= X */
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_A;
+			emit_xor(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_AND_K:
+			/* A &= K */
+			ctx->flags |= SEEN_A;
+			emit_andi(r_A, r_A, k, ctx);
+			break;
+		case BPF_S_ALU_AND_X:
+			/* A &= X */
+			update_on_xread(ctx);
+			ctx->flags |= SEEN_A | SEEN_X;
+			emit_and(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_LSH_K:
+			/* A <<= K */
+			ctx->flags |= SEEN_A;
+			emit_sll(r_A, r_A, k, ctx);
+			break;
+		case BPF_S_ALU_LSH_X:
+			/* A <<= X */
+			ctx->flags |= SEEN_A | SEEN_X;
+			update_on_xread(ctx);
+			emit_sllv(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_RSH_K:
+			/* A >>= K */
+			ctx->flags |= SEEN_A;
+			emit_srl(r_A, r_A, k, ctx);
+			break;
+		case BPF_S_ALU_RSH_X:
+			ctx->flags |= SEEN_A | SEEN_X;
+			update_on_xread(ctx);
+			emit_srlv(r_A, r_A, r_X, ctx);
+			break;
+		case BPF_S_ALU_NEG:
+			/* A = -A */
+			ctx->flags |= SEEN_A;
+			emit_neg(r_A, ctx);
+			break;
+		case BPF_S_JMP_JA:
+			/* pc += K */
+			emit_b(b_imm(i + k + 1, ctx), ctx);
+			emit_nop(ctx);
+			break;
+		case BPF_S_JMP_JEQ_K:
+			/* pc += ( A == K ) ? pc->jt : pc->jf */
+			condt = MIPS_COND_EQ | MIPS_COND_K;
+			goto jmp_cmp;
+		case BPF_S_JMP_JEQ_X:
+			ctx->flags |= SEEN_X;
+			/* pc += ( A == X ) ? pc->jt : pc->jf */
+			condt = MIPS_COND_EQ | MIPS_COND_X;
+			goto jmp_cmp;
+		case BPF_S_JMP_JGE_K:
+			/* pc += ( A >= K ) ? pc->jt : pc->jf */
+			condt = MIPS_COND_GE | MIPS_COND_K;
+			goto jmp_cmp;
+		case BPF_S_JMP_JGE_X:
+			ctx->flags |= SEEN_X;
+			/* pc += ( A >= X ) ? pc->jt : pc->jf */
+			condt = MIPS_COND_GE | MIPS_COND_X;
+			goto jmp_cmp;
+		case BPF_S_JMP_JGT_K:
+			/* pc += ( A > K ) ? pc->jt : pc->jf */
+			condt = MIPS_COND_GT | MIPS_COND_K;
+			goto jmp_cmp;
+		case BPF_S_JMP_JGT_X:
+			ctx->flags |= SEEN_X;
+			/* pc += ( A > X ) ? pc->jt : pc->jf */
+			condt = MIPS_COND_GT | MIPS_COND_X;
+jmp_cmp:
+			/* Greater or Equal */
+			if ((condt & MIPS_COND_GE) ||
+			    (condt & MIPS_COND_GT)) {
+				if (condt & MIPS_COND_K) { /* K */
+					ctx->flags |= SEEN_S0 | SEEN_A;
+					emit_sltiu(r_s0, r_A, k, ctx);
+				} else { /* X */
+					ctx->flags |= SEEN_S0 | SEEN_A |
+						SEEN_X;
+					emit_sltu(r_s0, r_A, r_X, ctx);
+				}
+				/* A < (K|X) ? r_scrach = 1 */
+				b_off = b_imm(i + inst->jf + 1, ctx);
+				emit_bcond(MIPS_COND_GT, r_s0, r_zero, b_off,
+					   ctx);
+				emit_nop(ctx);
+				/* A > (K|X) ? scratch = 0 */
+				if (condt & MIPS_COND_GT) {
+					/* Checking for equality */
+					ctx->flags |= SEEN_S0 | SEEN_A | SEEN_X;
+					if (condt & MIPS_COND_K)
+						emit_load_imm(r_s0, k, ctx);
+					else
+						emit_jit_reg_move(r_s0, r_X,
+								  ctx);
+					b_off = b_imm(i + inst->jf + 1, ctx);
+					emit_bcond(MIPS_COND_EQ, r_A, r_s0,
+						   b_off, ctx);
+					emit_nop(ctx);
+					/* Finally, A > K|X */
+					b_off = b_imm(i + inst->jt + 1, ctx);
+					emit_b(b_off, ctx);
+					emit_nop(ctx);
+				} else {
+					/* A >= (K|X) so jump */
+					b_off = b_imm(i + inst->jt + 1, ctx);
+					emit_b(b_off, ctx);
+					emit_nop(ctx);
+				}
+			} else {
+				/* A == K|X */
+				if (condt & MIPS_COND_K) { /* K */
+					ctx->flags |= SEEN_S0 | SEEN_A;
+					emit_load_imm(r_s0, k, ctx);
+					/* jump true */
+					b_off = b_imm(i + inst->jt + 1, ctx);
+					emit_bcond(MIPS_COND_EQ, r_A, r_s0,
+						   b_off, ctx);
+					emit_nop(ctx);
+					/* jump false */
+					b_off = b_imm(i + inst->jf + 1,
+						      ctx);
+					emit_bcond(MIPS_COND_NE, r_A, r_s0,
+						   b_off, ctx);
+					emit_nop(ctx);
+				} else { /* X */
+					/* jump true */
+					ctx->flags |= SEEN_A | SEEN_X;
+					b_off = b_imm(i + inst->jt + 1,
+						      ctx);
+					emit_bcond(MIPS_COND_EQ, r_A, r_X,
+						   b_off, ctx);
+					emit_nop(ctx);
+					/* jump false */
+					b_off = b_imm(i + inst->jf + 1, ctx);
+					emit_bcond(MIPS_COND_NE, r_A, r_X,
+						   b_off, ctx);
+					emit_nop(ctx);
+				}
+			}
+			break;
+		case BPF_S_JMP_JSET_K:
+			ctx->flags |= SEEN_S0 | SEEN_S1 | SEEN_A;
+			/* pc += (A & K) ? pc -> jt : pc -> jf */
+			emit_load_imm(r_s1, k, ctx);
+			emit_and(r_s0, r_A, r_s1, ctx);
+			/* jump true */
+			b_off = b_imm(i + inst->jt + 1, ctx);
+			emit_bcond(MIPS_COND_NE, r_s0, r_zero, b_off, ctx);
+			emit_nop(ctx);
+			/* jump false */
+			b_off = b_imm(i + inst->jf + 1, ctx);
+			emit_b(b_off, ctx);
+			emit_nop(ctx);
+			break;
+		case BPF_S_JMP_JSET_X:
+			ctx->flags |= SEEN_S0 | SEEN_X | SEEN_A;
+			/* pc += (A & X) ? pc -> jt : pc -> jf */
+			emit_and(r_s0, r_A, r_X, ctx);
+			/* jump true */
+			b_off = b_imm(i + inst->jt + 1, ctx);
+			emit_bcond(MIPS_COND_NE, r_s0, r_zero, b_off, ctx);
+			emit_nop(ctx);
+			/* jump false */
+			b_off = b_imm(i + inst->jf + 1, ctx);
+			emit_b(b_off, ctx);
+			emit_nop(ctx);
+			break;
+		case BPF_S_RET_A:
+			ctx->flags |= SEEN_A;
+			if (i != prog->len - 1)
+				/*
+				 * If this is not the last instruction
+				 * then jump to the epilogue
+				 */
+				emit_b(b_imm(prog->len, ctx), ctx);
+			emit_reg_move(r_ret, r_A, ctx); /* delay slot */
+			break;
+		case BPF_S_RET_K:
+			/*
+			 * It can emit two instructions so it does not fit on
+			 * the delay slot.
+			 */
+			emit_load_imm(r_ret, k, ctx);
+			if (i != prog->len - 1) {
+				/*
+				 * If this is not the last instruction
+				 * then jump to the epilogue
+				 */
+				emit_b(b_imm(prog->len, ctx), ctx);
+				emit_nop(ctx);
+			}
+			break;
+		case BPF_S_MISC_TAX:
+			/* X = A */
+			ctx->flags |= SEEN_X | SEEN_A;
+			emit_jit_reg_move(r_X, r_A, ctx);
+			break;
+		case BPF_S_MISC_TXA:
+			/* A = X */
+			ctx->flags |= SEEN_A | SEEN_X;
+			update_on_xread(ctx);
+			emit_jit_reg_move(r_A, r_X, ctx);
+			break;
+		/* AUX */
+		case BPF_S_ANC_PROTOCOL:
+			/* A = ntohs(skb->protocol */
+			ctx->flags |= SEEN_SKB | SEEN_OFF | SEEN_A;
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff,
+						  protocol) != 2);
+			off = offsetof(struct sk_buff, protocol);
+			emit_half_load(r_A, r_skb, off, ctx);
+#ifdef CONFIG_CPU_LITTLE_ENDIAN
+			/* This needs little endian fixup */
+			if (!cpu_has_mips_r2) {
+				/* Get first byte */
+				emit_andi(r_tmp_imm, r_A, 0xff, ctx);
+				/* Shift it */
+				emit_sll(r_tmp, r_tmp_imm, 8, ctx);
+				/* Get second byte */
+				emit_srl(r_tmp_imm, r_A, 8, ctx);
+				emit_andi(r_tmp_imm, r_tmp_imm, 0xff, ctx);
+				/* Put everyting together in r_A */
+				emit_or(r_A, r_tmp, r_tmp_imm, ctx);
+			} else {
+				/* R2 and later have the wsbh instruction */
+				emit_wsbh(r_A, r_A, ctx);
+			}
+#endif
+			break;
+		case BPF_S_ANC_CPU:
+			ctx->flags |= SEEN_A | SEEN_OFF;
+			/* A = current_thread_info()->cpu */
+			BUILD_BUG_ON(FIELD_SIZEOF(struct thread_info,
+						  cpu) != 4);
+			off = offsetof(struct thread_info, cpu);
+			/* $28/gp points to the thread_info struct */
+			emit_load(r_A, 28, off, ctx);
+			break;
+		case BPF_S_ANC_IFINDEX:
+			/* A = skb->dev->ifindex */
+			ctx->flags |= SEEN_SKB | SEEN_A | SEEN_S0;
+			off = offsetof(struct sk_buff, dev);
+			emit_load(r_s0, r_skb, off, ctx);
+			/* error (0) in the delay slot */
+			emit_bcond(MIPS_COND_EQ, r_s0, r_zero,
+				   b_imm(prog->len, ctx), ctx);
+			emit_reg_move(r_ret, r_zero, ctx);
+			BUILD_BUG_ON(FIELD_SIZEOF(struct net_device,
+						  ifindex) != 4);
+			off = offsetof(struct net_device, ifindex);
+			emit_load(r_A, r_s0, off, ctx);
+			break;
+		case BPF_S_ANC_MARK:
+			ctx->flags |= SEEN_SKB | SEEN_A;
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, mark) != 4);
+			off = offsetof(struct sk_buff, mark);
+			emit_load(r_A, r_skb, off, ctx);
+			break;
+		case BPF_S_ANC_RXHASH:
+			ctx->flags |= SEEN_SKB | SEEN_A;
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, hash) != 4);
+			off = offsetof(struct sk_buff, hash);
+			emit_load(r_A, r_skb, off, ctx);
+			break;
+		case BPF_S_ANC_VLAN_TAG:
+		case BPF_S_ANC_VLAN_TAG_PRESENT:
+			ctx->flags |= SEEN_SKB | SEEN_S0 | SEEN_A;
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff,
+						  vlan_tci) != 2);
+			off = offsetof(struct sk_buff, vlan_tci);
+			emit_half_load(r_s0, r_skb, off, ctx);
+			if (inst->code == BPF_S_ANC_VLAN_TAG)
+				emit_and(r_A, r_s0, VLAN_VID_MASK, ctx);
+			else
+				emit_and(r_A, r_s0, VLAN_TAG_PRESENT, ctx);
+			break;
+		case BPF_S_ANC_PKTTYPE:
+			off = pkt_type_offset();
+
+			if (off < 0)
+				return -1;
+			emit_load_byte(r_tmp, r_skb, off, ctx);
+			/* Keep only the last 3 bits */
+			emit_andi(r_A, r_tmp, PKT_TYPE_MAX, ctx);
+			break;
+		case BPF_S_ANC_QUEUE:
+			ctx->flags |= SEEN_SKB | SEEN_A;
+			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff,
+						  queue_mapping) != 2);
+			BUILD_BUG_ON(offsetof(struct sk_buff,
+					      queue_mapping) > 0xff);
+			off = offsetof(struct sk_buff, queue_mapping);
+			emit_half_load(r_A, r_skb, off, ctx);
+			break;
+		default:
+			pr_warn("%s: Unhandled opcode: 0x%02x\n", __FILE__,
+				inst->code);
+			return -1;
+		}
+	}
+
+	/* compute offsets only during the first pass */
+	if (ctx->target == NULL)
+		ctx->offsets[i] = ctx->idx * 4;
+
+	return 0;
+}
+
+int bpf_jit_enable __read_mostly;
+
+void bpf_jit_compile(struct sk_filter *fp)
+{
+	struct jit_ctx ctx;
+	unsigned int alloc_size, tmp_idx;
+
+	if (!bpf_jit_enable)
+		return;
+
+	memset(&ctx, 0, sizeof(ctx));
+
+	ctx.offsets = kcalloc(fp->len, sizeof(*ctx.offsets), GFP_KERNEL);
+	if (ctx.offsets == NULL)
+		return;
+
+	ctx.skf = fp;
+
+	if (build_body(&ctx))
+		goto out;
+
+	tmp_idx = ctx.idx;
+	build_prologue(&ctx);
+	ctx.prologue_bytes = (ctx.idx - tmp_idx) * 4;
+	/* just to complete the ctx.idx count */
+	build_epilogue(&ctx);
+
+	alloc_size = 4 * ctx.idx;
+	ctx.target = module_alloc(alloc_size);
+	if (ctx.target == NULL)
+		goto out;
+
+	/* Clean it */
+	memset(ctx.target, 0, alloc_size);
+
+	ctx.idx = 0;
+
+	/* Generate the actual JIT code */
+	build_prologue(&ctx);
+	build_body(&ctx);
+	build_epilogue(&ctx);
+
+	/* Update the icache */
+	flush_icache_range((ptr)ctx.target, (ptr)(ctx.target + ctx.idx));
+
+	if (bpf_jit_enable > 1)
+		/* Dump JIT code */
+		bpf_jit_dump(fp->len, alloc_size, 2, ctx.target);
+
+	fp->bpf_func = (void *)ctx.target;
+	fp->jited = 1;
+
+out:
+	kfree(ctx.offsets);
+}
+
+void bpf_jit_free(struct sk_filter *fp)
+{
+	if (fp->jited)
+		module_free(NULL, fp->bpf_func);
+	kfree(fp);
+}
diff --git a/arch/mips/net/bpf_jit.h b/arch/mips/net/bpf_jit.h
new file mode 100644
index 0000000..3a5751b
--- /dev/null
+++ b/arch/mips/net/bpf_jit.h
@@ -0,0 +1,44 @@
+/*
+ * Just-In-Time compiler for BPF filters on MIPS
+ *
+ * Copyright (c) 2014 Imagination Technologies Ltd.
+ * Author: Markos Chandras <markos.chandras@imgtec.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; version 2 of the License.
+ */
+
+#ifndef BPF_JIT_MIPS_OP_H
+#define BPF_JIT_MIPS_OP_H
+
+/* Registers used by JIT */
+#define MIPS_R_ZERO	0
+#define MIPS_R_V0	2
+#define MIPS_R_V1	3
+#define MIPS_R_A0	4
+#define MIPS_R_A1	5
+#define MIPS_R_T6	14
+#define MIPS_R_T7	15
+#define MIPS_R_S0	16
+#define MIPS_R_S1	17
+#define MIPS_R_S2	18
+#define MIPS_R_S3	19
+#define MIPS_R_S4	20
+#define MIPS_R_S5	21
+#define MIPS_R_S6	22
+#define MIPS_R_S7	23
+#define MIPS_R_SP	29
+#define MIPS_R_RA	31
+
+/* Conditional codes */
+#define MIPS_COND_EQ	0x1
+#define MIPS_COND_GE	(0x1 << 1)
+#define MIPS_COND_GT	(0x1 << 2)
+#define MIPS_COND_NE	(0x1 << 3)
+#define MIPS_COND_ALL	(0x1 << 4)
+/* Conditionals on X register or K immediate */
+#define MIPS_COND_X	(0x1 << 5)
+#define MIPS_COND_K	(0x1 << 6)
+
+#endif /* BPF_JIT_MIPS_OP_H */
-- 
1.9.2

^ permalink raw reply related	[flat|nested] 53+ messages in thread

end of thread, other threads:[~2014-04-29 15:58 UTC | newest]

Thread overview: 53+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-04-08 11:47 [PATCH 00/14] Initial BPF-JIT support for MIPS Markos Chandras
2014-04-08 11:47 ` Markos Chandras
2014-04-08 11:47 ` [PATCH 01/14] MIPS: uasm: Add u3u2u1 instruction builders Markos Chandras
2014-04-08 11:47   ` Markos Chandras
2014-04-08 11:47 ` [PATCH 02/14] MIPS: uasm: Add u2u1 " Markos Chandras
2014-04-08 11:47   ` Markos Chandras
2014-04-08 11:47 ` [PATCH 03/14] MIPS: uasm: Add sllv uasm instruction Markos Chandras
2014-04-08 11:47   ` Markos Chandras
2014-04-08 11:47 ` [PATCH 04/14] MIPS: uasm: Add srlv " Markos Chandras
2014-04-08 11:47   ` Markos Chandras
2014-04-08 11:47 ` [PATCH 05/14] MIPS: uasm: Add divu " Markos Chandras
2014-04-08 11:47   ` Markos Chandras
2014-04-08 11:47 ` [PATCH 06/14] MIPS: uasm: Add mfhi " Markos Chandras
2014-04-08 11:47   ` Markos Chandras
2014-04-08 11:47 ` [PATCH 07/14] MIPS: uasm: Add jalr " Markos Chandras
2014-04-08 11:47   ` Markos Chandras
2014-04-08 11:47 ` [PATCH 08/14] MIPS: uasm: Add sltiu " Markos Chandras
2014-04-08 11:47   ` Markos Chandras
2014-04-08 11:47 ` [PATCH 09/14] MIPS: uasm: Add sltu " Markos Chandras
2014-04-08 11:47   ` Markos Chandras
2014-04-08 11:47 ` [PATCH 10/14] MIPS: uasm: Add wsbh " Markos Chandras
2014-04-08 11:47   ` Markos Chandras
2014-04-08 11:47 ` [PATCH 11/14] MIPS: uasm: Add lh uam instruction Markos Chandras
2014-04-08 11:47   ` Markos Chandras
2014-04-08 11:47 ` [PATCH 12/14] MIPS: uasm: Add mul uasm instruction Markos Chandras
2014-04-08 11:47   ` Markos Chandras
2014-04-08 11:47 ` [PATCH 13/14] MIPS: net: Add BPF JIT Markos Chandras
2014-04-08 11:47   ` Markos Chandras
2014-04-09 16:00   ` [PATCH v2 " Markos Chandras
2014-04-09 16:00     ` Markos Chandras
2014-04-09 22:28     ` Jonas Gorski
2014-04-10  7:46       ` Markos Chandras
2014-04-10  7:46         ` Markos Chandras
2014-04-10  3:40     ` Alexei Starovoitov
2014-04-10  8:06       ` Markos Chandras
2014-04-10  8:06         ` Markos Chandras
2014-04-24 14:50   ` [PATCH v3 " Markos Chandras
2014-04-24 14:50     ` Markos Chandras
2014-04-24 16:31     ` Paul Burton
2014-04-24 16:31       ` Paul Burton
2014-04-25 12:06       ` Markos Chandras
2014-04-25 12:06         ` Markos Chandras
2014-04-29 15:58     ` [PATCH v4 " Markos Chandras
2014-04-29 15:58       ` Markos Chandras
2014-04-08 11:47 ` [PATCH 14/14] MIPS: Enable the BPF_JIT symbol for MIPS Markos Chandras
2014-04-08 11:47   ` Markos Chandras
2014-04-09 16:02   ` [PATCH v2 " Markos Chandras
2014-04-09 16:02     ` Markos Chandras
2014-04-08 18:38 ` [PATCH 00/14] Initial BPF-JIT support " Florian Fainelli
2014-04-09  8:55   ` Markos Chandras
2014-04-09 10:11     ` Jonas Gorski
2014-04-09 10:56       ` Markos Chandras
2014-04-09 10:56         ` Markos Chandras

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.