* [RFC PATCH bpf-next v1 00/14] MIPS: eBPF: refactor code, add MIPS32 JIT
From: Tony Ambardar @ 2021-07-12  0:34 UTC
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Thomas Bogendoerfer, Paul Burton
  Cc: Tony Ambardar, netdev, bpf, linux-mips, Johan Almbladh,
	Hassan Naveed, David Daney, Luke Nelson, Serge Semin,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh

Greetings!

This patch series adds an eBPF JIT for MIPS32. The approach taken first
updates existing code to support MIPS64/MIPS32 systems, then refactors
the source into a common core and dependent MIPS64 JIT, and finally adds a
MIPS32 eBPF JIT implementation using the common framework.

Compared to writing a standalone MIPS32 JIT, this approach has benefits
for long-term maintainability, but has taken much longer than expected.
This RFC posting is intended to share progress, gather feedback, and
raise some questions for BPF and MIPS experts (covered below).


Code Overview
=============

The initial code updates and refactoring exposed a number of problems in
the existing MIPS64 JIT, which the first several patches fix. Patch #11
updates common code to support MIPS64/MIPS32 operation. Patch #12
separates the common core from the MIPS64 JIT code. Patch #13 adds a
needed MIPS32 uasm opcode, while patch #14 adds the MIPS32 eBPF JIT.

On MIPS32, 64-bit BPF registers are mapped to 32-bit register pairs, and
all 64-bit operations are built from 32-bit subregister ops; the MIPS32
tailcall counter, however, is stored on the stack due to register
pressure. Notable changes from the MIPS64 JIT include (see the 64-bit
ADD sketch after the list):

  * BPF_JMP32: implement all conditionals
  * BPF_JMP | JSET | BPF_K: drop bbit insns only usable on MIPS64 Octeon
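
For illustration, a 64-bit add (BPF_ALU64|BPF_ADD|BPF_X) can be built
from 32-bit subregister ops roughly as below. This is a hand-written
sketch using the series' HI()/LO() pair-selection helpers, not the
exact emit sequence:

  emit_instr(ctx, addu, LO(dst), LO(dst), LO(src));   /* add low words  */
  emit_instr(ctx, sltu, MIPS_R_AT, LO(dst), LO(src)); /* compute carry  */
  emit_instr(ctx, addu, HI(dst), HI(dst), HI(src));   /* add high words */
  emit_instr(ctx, addu, HI(dst), HI(dst), MIPS_R_AT); /* add the carry  */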

Since MIPS32 lacks 64-bit div/mod and atomic opcodes, the following BPF
insns are implemented by directly calling equivalent built-in kernel
functions, as sketched below (with thanks to Luke Nelson for posting
similar code online):

  * BPF_STX   | BPF_DW  | BPF_XADD
  * BPF_ALU64 | BPF_DIV | BPF_X
  * BPF_ALU64 | BPF_DIV | BPF_K
  * BPF_ALU64 | BPF_MOD | BPF_X
  * BPF_ALU64 | BPF_MOD | BPF_K
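
As a rough sketch of the mechanism (the helper plumbing shown is
illustrative, not the series' exact code), such a call spills the
caller-saved BPF registers, marshals the 64-bit operands per the O32
ABI, and jumps through $t9, e.g. for DIV64 via the kernel's
div64_u64():

  emit_caller_save(ctx);                 /* spill caller-saved regs */
  /* ... move dividend/divisor into the $a0-$a3 register pairs ... */
  emit_const_to_reg(ctx, MIPS_R_T9, (u64)(unsigned long)&div64_u64);
  emit_instr(ctx, jalr, MIPS_R_RA, MIPS_R_T9);
  emit_instr(ctx, nop);                  /* branch delay slot */
  /* ... move the $v0:$v1 result back into the BPF dst pair ... */
  emit_caller_restore(ctx);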


Testing
=======

Testing used LTS kernel 5.10.x and stable 5.13.x running under QEMU.
The test suite included the 'test_bpf' module and 'test_verifier' from
kselftests. Using 'test_progs' from kselftests is too difficult in general
since cross-compilation depends on libbpf/bpftool, which does not support
cross-endian builds.

The matrix of test configurations executed for this series covered the
expected register sizes, MIPS ISA releases, and JIT settings:

  WORDSIZE={64-bit,32-bit} x ISA={R2,R6} x JIT={off,on,hardened}
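
(The JIT settings map onto the usual sysctls: "off"/"on" toggle
net.core.bpf_jit_enable, while "hardened" here means additionally
setting net.core.bpf_jit_harden=2.)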

On MIPS32BE and MIPS32LE, interpreter and JIT-backed test runs showed
general parity in the numbers of PASSED, SKIPPED, and FAILED tests, as
did MIPS64 retesting.

For example, the results below on MIPS32 are typical. Note that skipped
tests 854 and 855 are "scale" tests that result in OOM on the QEMU Malta
MIPS32 test systems.

  root@OpenWrt:~# sysctl net.core.bpf_jit_enable=1
  root@OpenWrt:~# modprobe test_bpf
  ...
  test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed]
  root@OpenWrt:~# ./test_verifier 0 853
  ...
  Summary: 1127 PASSED, 0 SKIPPED, 89 FAILED
  root@OpenWrt:~# ./test_verifier 855 1149
  ...
  Summary: 408 PASSED, 7 SKIPPED, 53 FAILED


Open Questions
==============

1. As seen in the patch series, the static analysis used by the MIPS64 JIT
tends to be fragile in the face of verifier, insn and patching changes.
After tracking down and fixing several related bugs, I wonder whether
it would be better to remove the static analysis entirely, leaving
things more robust and maintainable going forward.

Paul, Thomas, David, what are your views? Do you have thoughts on how best
to do this?

Could the static analysis be replaced by accessing the verifier's own
analysis results from the JIT? Daniel, Alexei, Andrii, any thoughts?
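
For what it's worth, JITs already consume a few verifier results via
struct bpf_prog_aux, which this series also relies on, e.g.:

  prog->aux->verifier_zext        /* verifier inserted zext insns  */
  prog->aux->stack_depth          /* verified max stack usage      */
  prog->aux->tail_call_reachable  /* prog may reach a tail call    */

so perhaps the remaining analysis results could be exposed similarly.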


2. The series tries to correctly handle the tailcall counter across
bpf2bpf calls and tailcalls, and it would be nice to properly support
mixing these,
but this is still a WIP for me. Much of what I've read seems very specific
to the x86_64 JIT. Is there a good summary of the required changes for a
JIT in general?

Note: I built a MIPS32LE 'test_progs' after some horrible, ugly hacking,
and the 'tailcall' tests pass but the 'tailcall_bpf2bpf' tests fail
cryptically. I can send a log and strace if someone could kindly
take a look. Is there an alternative, good standalone test available?



Possible Next Steps
===================

1. Implementing the new BPF_ATOMIC insns *should* be straightforward
on MIPS32. I'm less certain of MIPS64 given the static analysis and
related zext/sext logic.
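
For the 32-bit cases at least, a standard ll/sc retry loop should
suffice; a rough, untested sketch of BPF_ATOMIC | BPF_ADD | BPF_W in
MIPS asm:

  1:  ll    $t8, off($dst)      # load-linked old value
      addu  $t8, $t8, $src      # apply the add
      sc    $t8, off($dst)      # store-conditional the new value
      beqz  $t8, 1b             # retry if the store failed
      nop                       # branch delay slot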

2. The BPF_JMP32 class is another big gap on MIPS64. Has anyone looked at
this before? It also ties into the static analysis, but on first glance
appears feasible.
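
One plausible approach (again only a sketch, untested): sign-extend
copies of both operands into temporaries, which preserves both the
signed and unsigned 32-bit orderings, then reuse the existing 64-bit
conditional branch emission:

  emit_instr(ctx, sll, MIPS_R_AT, dst, 0);  /* AT = sext32(dst) */
  emit_instr(ctx, sll, MIPS_R_T8, src, 0);  /* T8 = sext32(src) */
  /* ... emit the usual BPF_JMP compare/branch on AT vs T8 ... */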



Thanks in advance for any feedback or suggestions!


Tony Ambardar (14):
  MIPS: eBPF: support BPF_TAIL_CALL in JIT static analysis
  MIPS: eBPF: mask 32-bit index for tail calls
  MIPS: eBPF: fix BPF_ALU|ARSH handling in JIT static analysis
  MIPS: eBPF: support BPF_JMP32 in JIT static analysis
  MIPS: eBPF: fix system hang with verifier dead-code patching
  MIPS: eBPF: fix JIT static analysis hang with bounded loops
  MIPS: eBPF: fix MOD64 insn on R6 ISA
  MIPS: eBPF: support long jump for BPF_JMP|EXIT
  MIPS: eBPF: drop src_reg restriction in BPF_LD|BPF_DW|BPF_IMM
  MIPS: eBPF: improve and clarify enum 'which_ebpf_reg'
  MIPS: eBPF: add core support for 32/64-bit systems
  MIPS: eBPF: refactor common MIPS64/MIPS32 functions and headers
  MIPS: uasm: Enable muhu opcode for MIPS R6
  MIPS: eBPF: add MIPS32 JIT

 Documentation/admin-guide/sysctl/net.rst |    6 +-
 Documentation/networking/filter.rst      |    6 +-
 arch/mips/Kconfig                        |    4 +-
 arch/mips/include/asm/uasm.h             |    1 +
 arch/mips/mm/uasm-mips.c                 |    4 +-
 arch/mips/mm/uasm.c                      |    3 +-
 arch/mips/net/Makefile                   |    8 +-
 arch/mips/net/ebpf_jit.c                 | 1935 ----------------------
 arch/mips/net/ebpf_jit.h                 |  295 ++++
 arch/mips/net/ebpf_jit_comp32.c          | 1241 ++++++++++++++
 arch/mips/net/ebpf_jit_comp64.c          |  987 +++++++++++
 arch/mips/net/ebpf_jit_core.c            | 1118 +++++++++++++
 12 files changed, 3663 insertions(+), 1945 deletions(-)
 delete mode 100644 arch/mips/net/ebpf_jit.c
 create mode 100644 arch/mips/net/ebpf_jit.h
 create mode 100644 arch/mips/net/ebpf_jit_comp32.c
 create mode 100644 arch/mips/net/ebpf_jit_comp64.c
 create mode 100644 arch/mips/net/ebpf_jit_core.c

-- 
2.25.1


* [RFC PATCH bpf-next v1 01/14] MIPS: eBPF: support BPF_TAIL_CALL in JIT static analysis
From: Tony Ambardar @ 2021-07-12  0:34 UTC

Add support in reg_val_propagate_range() for BPF_TAIL_CALL, fixing many
kernel log WARNINGs ("Unhandled BPF_JMP case") seen during JIT testing.
Treat BPF_TAIL_CALL like a NOP, falling through as if the tail call failed.

Fixes: b6bd53f9c4e8 ("MIPS: Add missing file for eBPF JIT.")
Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
---
 arch/mips/net/ebpf_jit.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/mips/net/ebpf_jit.c b/arch/mips/net/ebpf_jit.c
index 939dd06764bc..ed47a10d533f 100644
--- a/arch/mips/net/ebpf_jit.c
+++ b/arch/mips/net/ebpf_jit.c
@@ -1714,6 +1714,9 @@ static int reg_val_propagate_range(struct jit_ctx *ctx, u64 initial_rvt,
 				for (reg = BPF_REG_0; reg <= BPF_REG_5; reg++)
 					set_reg_val_type(&exit_rvt, reg, REG_64BIT);
 
+				rvt[idx] |= RVT_DONE;
+				break;
+			case BPF_TAIL_CALL:
 				rvt[idx] |= RVT_DONE;
 				break;
 			default:
-- 
2.25.1


* [RFC PATCH bpf-next v1 02/14] MIPS: eBPF: mask 32-bit index for tail calls
From: Tony Ambardar @ 2021-07-12  0:34 UTC

The program array index for tail-calls should be 32-bit, so zero-extend to
sanitize the value. This fixes failures seen for test_verifier test:

  852/p runtime/jit: pass > 32bit index to tail_call FAIL retval 2 != 42

Fixes: b6bd53f9c4e8 ("MIPS: Add missing file for eBPF JIT.")
Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
---
 arch/mips/net/ebpf_jit.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/mips/net/ebpf_jit.c b/arch/mips/net/ebpf_jit.c
index ed47a10d533f..64f76c7191b1 100644
--- a/arch/mips/net/ebpf_jit.c
+++ b/arch/mips/net/ebpf_jit.c
@@ -611,6 +611,8 @@ static int emit_bpf_tail_call(struct jit_ctx *ctx, int this_idx)
 	 * if (index >= array->map.max_entries)
 	 *     goto out;
 	 */
+	/* Mask index as 32-bit */
+	emit_instr(ctx, dinsu, MIPS_R_A2, MIPS_R_ZERO, 32, 32);
 	off = offsetof(struct bpf_array, map.max_entries);
 	emit_instr(ctx, lwu, MIPS_R_T5, off, MIPS_R_A1);
 	emit_instr(ctx, sltu, MIPS_R_AT, MIPS_R_T5, MIPS_R_A2);
-- 
2.25.1


* [RFC PATCH bpf-next v1 03/14] MIPS: eBPF: fix BPF_ALU|ARSH handling in JIT static analysis
From: Tony Ambardar @ 2021-07-12  0:34 UTC

Update reg_val_propagate_range() to add the missing case for BPF_ALU|ARSH,
otherwise zero-extension breaks for this opcode.

Resolves failures for test_verifier tests:
  963/p arsh32 reg zero extend check FAIL retval -1 != 0
  964/u arsh32 imm zero extend check FAIL retval -1 != 0
  964/p arsh32 imm zero extend check FAIL retval -1 != 0

Fixes: b6bd53f9c4e8 ("MIPS: Add missing file for eBPF JIT.")
Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
---
 arch/mips/net/ebpf_jit.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/mips/net/ebpf_jit.c b/arch/mips/net/ebpf_jit.c
index 64f76c7191b1..67b45d502435 100644
--- a/arch/mips/net/ebpf_jit.c
+++ b/arch/mips/net/ebpf_jit.c
@@ -1587,6 +1587,7 @@ static int reg_val_propagate_range(struct jit_ctx *ctx, u64 initial_rvt,
 			case BPF_AND:
 			case BPF_LSH:
 			case BPF_RSH:
+			case BPF_ARSH:
 			case BPF_NEG:
 			case BPF_MOD:
 			case BPF_XOR:
-- 
2.25.1


* [RFC PATCH bpf-next v1 04/14] MIPS: eBPF: support BPF_JMP32 in JIT static analysis
From: Tony Ambardar @ 2021-07-12  0:34 UTC

While the MIPS64 JIT rejects programs with JMP32 insns, it still performs
initial static analysis. Add support in reg_val_propagate_range() for
BPF_JMP32, fixing kernel log WARNINGs ("Unhandled BPF_JMP case") seen
during JIT testing. Handle BPF_JMP32 the same as BPF_JMP.

Fixes: 092ed0968bb6 ("bpf: verifier support JMP32")
Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
---
 arch/mips/net/ebpf_jit.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/mips/net/ebpf_jit.c b/arch/mips/net/ebpf_jit.c
index 67b45d502435..ad0e54a842fc 100644
--- a/arch/mips/net/ebpf_jit.c
+++ b/arch/mips/net/ebpf_jit.c
@@ -1683,6 +1683,7 @@ static int reg_val_propagate_range(struct jit_ctx *ctx, u64 initial_rvt,
 			rvt[idx] |= RVT_DONE;
 			break;
 		case BPF_JMP:
+		case BPF_JMP32:
 			switch (BPF_OP(insn->code)) {
 			case BPF_EXIT:
 				rvt[idx] = RVT_DONE | exit_rvt;
-- 
2.25.1


* [RFC PATCH bpf-next v1 05/14] MIPS: eBPF: fix system hang with verifier dead-code patching
From: Tony Ambardar @ 2021-07-12  0:34 UTC

Commit 2a5418a13fcf changed verifier dead code handling from patching with
NOPs to using a loop trap made with BPF_JMP_IMM(BPF_JA, 0, 0, -1). This
confuses the JIT static analysis, which follows the loop assuming the
verifier passed safe code, and results in a system hang and RCU stall.
Update reg_val_propagate_range() to fall through these trap insns.

Trigger the bug using test_verifier "check known subreg with unknown reg".

Fixes: 2a5418a13fcf ("bpf: improve dead code sanitizing")
Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
---
 arch/mips/net/ebpf_jit.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/mips/net/ebpf_jit.c b/arch/mips/net/ebpf_jit.c
index ad0e54a842fc..e60a089ee3b3 100644
--- a/arch/mips/net/ebpf_jit.c
+++ b/arch/mips/net/ebpf_jit.c
@@ -1691,6 +1691,14 @@ static int reg_val_propagate_range(struct jit_ctx *ctx, u64 initial_rvt,
 				return idx;
 			case BPF_JA:
 				rvt[idx] |= RVT_DONE;
+				/*
+				 * Verifier dead code patching can use
+				 * infinite-loop traps, causing hangs and
+				 * RCU stalls here. Treat traps as nops
+				 * if detected and fall through.
+				 */
+				if (insn->off == -1)
+					break;
 				idx += insn->off;
 				break;
 			case BPF_JEQ:
-- 
2.25.1


* [RFC PATCH bpf-next v1 06/14] MIPS: eBPF: fix JIT static analysis hang with bounded loops
From: Tony Ambardar @ 2021-07-12  0:34 UTC

Support for bounded loops allowed the verifier to output backward jumps
such as BPF_JMP_A(-4). These trap the JIT's static analysis in a loop,
resulting in a system hang and eventual RCU stall.

Fix by updating reg_val_propagate_range() to skip backward jumps when in
fallthrough mode and if the jump target has been visited already.

Trigger the bug using the test_verifier test "bounded loop that jumps out
rather than in".

Fixes: 2589726d12a1 ("bpf: introduce bounded loops")
Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
---
 arch/mips/net/ebpf_jit.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/mips/net/ebpf_jit.c b/arch/mips/net/ebpf_jit.c
index e60a089ee3b3..4f641dcb2031 100644
--- a/arch/mips/net/ebpf_jit.c
+++ b/arch/mips/net/ebpf_jit.c
@@ -1690,6 +1690,10 @@ static int reg_val_propagate_range(struct jit_ctx *ctx, u64 initial_rvt,
 				rvt[prog->len] = exit_rvt;
 				return idx;
 			case BPF_JA:
+			{
+				int tgt = idx + 1 + insn->off;
+				bool visited = (rvt[tgt] & RVT_FALL_THROUGH);
+
 				rvt[idx] |= RVT_DONE;
 				/*
 				 * Verifier dead code patching can use
@@ -1699,8 +1703,16 @@ static int reg_val_propagate_range(struct jit_ctx *ctx, u64 initial_rvt,
 				 */
 				if (insn->off == -1)
 					break;
+				/*
+				 * Bounded loops cause the same issues in
+				 * fallthrough mode; follow only if jump
+				 * target is unvisited to mitigate.
+				 */
+				if (insn->off < 0 && !follow_taken && visited)
+					break;
 				idx += insn->off;
 				break;
+			}
 			case BPF_JEQ:
 			case BPF_JGT:
 			case BPF_JGE:
-- 
2.25.1


* [RFC PATCH bpf-next v1 07/14] MIPS: eBPF: fix MOD64 insn on R6 ISA
From: Tony Ambardar @ 2021-07-12  0:34 UTC

The BPF_ALU64 | BPF_MOD implementation is broken on MIPS64R6 due to use of
a 32-bit "modu" insn, as shown by the test_verifier failures:

  455/p MOD64 overflow, check 1 FAIL retval 0 != 1 (run 1/1)
  456/p MOD64 overflow, check 2 FAIL retval 0 != 1 (run 1/1)

Resolve by using the 64-bit "dmodu" instead.

Fixes: 6c2c8a188868 ("MIPS: eBPF: Provide eBPF support for MIPS64R6")
Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
---
 arch/mips/net/ebpf_jit.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/mips/net/ebpf_jit.c b/arch/mips/net/ebpf_jit.c
index 4f641dcb2031..e8c403c6cfa3 100644
--- a/arch/mips/net/ebpf_jit.c
+++ b/arch/mips/net/ebpf_jit.c
@@ -800,7 +800,7 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 			if (bpf_op == BPF_DIV)
 				emit_instr(ctx, ddivu_r6, dst, dst, MIPS_R_AT);
 			else
-				emit_instr(ctx, modu, dst, dst, MIPS_R_AT);
+				emit_instr(ctx, dmodu, dst, dst, MIPS_R_AT);
 			break;
 		}
 		emit_instr(ctx, ddivu, dst, MIPS_R_AT);
@@ -882,7 +882,7 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 					emit_instr(ctx, ddivu_r6,
 							dst, dst, src);
 				else
-					emit_instr(ctx, modu, dst, dst, src);
+					emit_instr(ctx, dmodu, dst, dst, src);
 				break;
 			}
 			emit_instr(ctx, ddivu, dst, src);
-- 
2.25.1


* [RFC PATCH bpf-next v1 08/14] MIPS: eBPF: support long jump for BPF_JMP|EXIT
From: Tony Ambardar @ 2021-07-12  0:34 UTC

The existing JIT supports only short (18-bit) branches for BPF_JMP|EXIT,
which results in some tests from module test_bpf not being JITed. Update
the code to fall back to long (28-bit) jumps where short branches are
insufficient.

Before:
  test_bpf: #296 BPF_MAXINSNS: exec all MSH jited:0 1556004 PASS
  test_bpf: #297 BPF_MAXINSNS: ld_abs+get_processor_id jited:0 824957 PASS
  test_bpf: Summary: 378 PASSED, 0 FAILED, [364/366 JIT'ed]

After:
  test_bpf: #296 BPF_MAXINSNS: exec all MSH jited:1 221998 PASS
  test_bpf: #297 BPF_MAXINSNS: ld_abs+get_processor_id jited:1 490507 PASS
  test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed]

Fixes: b6bd53f9c4e8 ("MIPS: Add missing file for eBPF JIT.")
Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
---
 arch/mips/net/ebpf_jit.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/mips/net/ebpf_jit.c b/arch/mips/net/ebpf_jit.c
index e8c403c6cfa3..f510c692975e 100644
--- a/arch/mips/net/ebpf_jit.c
+++ b/arch/mips/net/ebpf_jit.c
@@ -994,9 +994,14 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 	case BPF_JMP | BPF_EXIT:
 		if (this_idx + 1 < exit_idx) {
 			b_off = b_imm(exit_idx, ctx);
-			if (is_bad_offset(b_off))
-				return -E2BIG;
-			emit_instr(ctx, beq, MIPS_R_ZERO, MIPS_R_ZERO, b_off);
+			if (is_bad_offset(b_off)) {
+				target = j_target(ctx, exit_idx);
+				if (target == (unsigned int)-1)
+					return -E2BIG;
+				emit_instr(ctx, j, target);
+			} else {
+				emit_instr(ctx, b, b_off);
+			}
 			emit_instr(ctx, nop);
 		}
 		break;
-- 
2.25.1


* [RFC PATCH bpf-next v1 09/14] MIPS: eBPF: drop src_reg restriction in BPF_LD|BPF_DW|BPF_IMM
From: Tony Ambardar @ 2021-07-12  0:34 UTC

Stop enforcing (insn->src_reg == 0) and allow special verifier insns such
as "BPF_LD_MAP_FD(BPF_REG_2, 0)", used to refer to a process-local map fd
and introduced in 0246e64d9a5f ("bpf: handle pseudo BPF_LD_IMM64 insn").
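
For reference, BPF_LD_MAP_FD(dst, fd) expands to a two-insn
BPF_LD_IMM64 whose src_reg field carries BPF_PSEUDO_MAP_FD (1) rather
than 0, which is exactly what the old check rejected:

  BPF_RAW_INSN(BPF_LD | BPF_DW | BPF_IMM, dst, BPF_PSEUDO_MAP_FD, 0, fd),
  BPF_RAW_INSN(0, 0, 0, 0, 0), /* imm of insn pair holds upper 32 bits */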

This is consistent with other JITs such as riscv32 and also used by
test_verifier (e.g. "runtime/jit: tail_call within bounds, prog once").

Fixes: b6bd53f9c4e8 ("MIPS: Add missing file for eBPF JIT.")
Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
---
 arch/mips/net/ebpf_jit.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/mips/net/ebpf_jit.c b/arch/mips/net/ebpf_jit.c
index f510c692975e..bba41f334d07 100644
--- a/arch/mips/net/ebpf_jit.c
+++ b/arch/mips/net/ebpf_jit.c
@@ -1303,8 +1303,6 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 		emit_instr(ctx, nop);
 		break;
 	case BPF_LD | BPF_DW | BPF_IMM:
-		if (insn->src_reg != 0)
-			return -EINVAL;
 		dst = ebpf_to_mips_reg(ctx, insn, dst_reg);
 		if (dst < 0)
 			return dst;
-- 
2.25.1


* [RFC PATCH bpf-next v1 10/14] MIPS: eBPF: improve and clarify enum 'which_ebpf_reg'
From: Tony Ambardar @ 2021-07-12  0:34 UTC

Update enum and literals to be more consistent and less confusing. Prior
usage resulted in 'dst_reg' variables and 'dst_reg' enum literals, often
adjacent to each other and hampering maintenance.

Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
---
 arch/mips/net/ebpf_jit.c | 64 ++++++++++++++++++++--------------------
 1 file changed, 32 insertions(+), 32 deletions(-)

diff --git a/arch/mips/net/ebpf_jit.c b/arch/mips/net/ebpf_jit.c
index bba41f334d07..61d7894051aa 100644
--- a/arch/mips/net/ebpf_jit.c
+++ b/arch/mips/net/ebpf_jit.c
@@ -177,11 +177,11 @@ static u32 b_imm(unsigned int tgt, struct jit_ctx *ctx)
 		(ctx->idx * 4) - 4;
 }
 
-enum which_ebpf_reg {
-	src_reg,
-	src_reg_no_fp,
-	dst_reg,
-	dst_reg_fp_ok
+enum reg_usage {
+	REG_SRC_FP_OK,
+	REG_SRC_NO_FP,
+	REG_DST_FP_OK,
+	REG_DST_NO_FP
 };
 
 /*
@@ -192,9 +192,9 @@ enum which_ebpf_reg {
  */
 static int ebpf_to_mips_reg(struct jit_ctx *ctx,
 			    const struct bpf_insn *insn,
-			    enum which_ebpf_reg w)
+			    enum reg_usage u)
 {
-	int ebpf_reg = (w == src_reg || w == src_reg_no_fp) ?
+	int ebpf_reg = (u == REG_SRC_FP_OK || u == REG_SRC_NO_FP) ?
 		insn->src_reg : insn->dst_reg;
 
 	switch (ebpf_reg) {
@@ -223,7 +223,7 @@ static int ebpf_to_mips_reg(struct jit_ctx *ctx,
 		ctx->flags |= EBPF_SAVE_S3;
 		return MIPS_R_S3;
 	case BPF_REG_10:
-		if (w == dst_reg || w == src_reg_no_fp)
+		if (u == REG_DST_NO_FP || u == REG_SRC_NO_FP)
 			goto bad_reg;
 		ctx->flags |= EBPF_SEEN_FP;
 		/*
@@ -423,7 +423,7 @@ static int gen_imm_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 			int idx)
 {
 	int upper_bound, lower_bound;
-	int dst = ebpf_to_mips_reg(ctx, insn, dst_reg);
+	int dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
 
 	if (dst < 0)
 		return dst;
@@ -696,7 +696,7 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 			return r;
 		break;
 	case BPF_ALU64 | BPF_MUL | BPF_K: /* ALU64_IMM */
-		dst = ebpf_to_mips_reg(ctx, insn, dst_reg);
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
 		if (dst < 0)
 			return dst;
 		if (get_reg_val_type(ctx, this_idx, insn->dst_reg) == REG_32BIT)
@@ -712,7 +712,7 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 		}
 		break;
 	case BPF_ALU64 | BPF_NEG | BPF_K: /* ALU64_IMM */
-		dst = ebpf_to_mips_reg(ctx, insn, dst_reg);
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
 		if (dst < 0)
 			return dst;
 		if (get_reg_val_type(ctx, this_idx, insn->dst_reg) == REG_32BIT)
@@ -720,7 +720,7 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 		emit_instr(ctx, dsubu, dst, MIPS_R_ZERO, dst);
 		break;
 	case BPF_ALU | BPF_MUL | BPF_K: /* ALU_IMM */
-		dst = ebpf_to_mips_reg(ctx, insn, dst_reg);
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
 		if (dst < 0)
 			return dst;
 		td = get_reg_val_type(ctx, this_idx, insn->dst_reg);
@@ -739,7 +739,7 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 		}
 		break;
 	case BPF_ALU | BPF_NEG | BPF_K: /* ALU_IMM */
-		dst = ebpf_to_mips_reg(ctx, insn, dst_reg);
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
 		if (dst < 0)
 			return dst;
 		td = get_reg_val_type(ctx, this_idx, insn->dst_reg);
@@ -753,7 +753,7 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 	case BPF_ALU | BPF_MOD | BPF_K: /* ALU_IMM */
 		if (insn->imm == 0)
 			return -EINVAL;
-		dst = ebpf_to_mips_reg(ctx, insn, dst_reg);
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
 		if (dst < 0)
 			return dst;
 		td = get_reg_val_type(ctx, this_idx, insn->dst_reg);
@@ -784,7 +784,7 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 	case BPF_ALU64 | BPF_MOD | BPF_K: /* ALU_IMM */
 		if (insn->imm == 0)
 			return -EINVAL;
-		dst = ebpf_to_mips_reg(ctx, insn, dst_reg);
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
 		if (dst < 0)
 			return dst;
 		if (get_reg_val_type(ctx, this_idx, insn->dst_reg) == REG_32BIT)
@@ -821,8 +821,8 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 	case BPF_ALU64 | BPF_LSH | BPF_X: /* ALU64_REG */
 	case BPF_ALU64 | BPF_RSH | BPF_X: /* ALU64_REG */
 	case BPF_ALU64 | BPF_ARSH | BPF_X: /* ALU64_REG */
-		src = ebpf_to_mips_reg(ctx, insn, src_reg);
-		dst = ebpf_to_mips_reg(ctx, insn, dst_reg);
+		src = ebpf_to_mips_reg(ctx, insn, REG_SRC_FP_OK);
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
 		if (src < 0 || dst < 0)
 			return -EINVAL;
 		if (get_reg_val_type(ctx, this_idx, insn->dst_reg) == REG_32BIT)
@@ -917,8 +917,8 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 	case BPF_ALU | BPF_LSH | BPF_X: /* ALU_REG */
 	case BPF_ALU | BPF_RSH | BPF_X: /* ALU_REG */
 	case BPF_ALU | BPF_ARSH | BPF_X: /* ALU_REG */
-		src = ebpf_to_mips_reg(ctx, insn, src_reg_no_fp);
-		dst = ebpf_to_mips_reg(ctx, insn, dst_reg);
+		src = ebpf_to_mips_reg(ctx, insn, REG_SRC_NO_FP);
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
 		if (src < 0 || dst < 0)
 			return -EINVAL;
 		td = get_reg_val_type(ctx, this_idx, insn->dst_reg);
@@ -1008,7 +1008,7 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 	case BPF_JMP | BPF_JEQ | BPF_K: /* JMP_IMM */
 	case BPF_JMP | BPF_JNE | BPF_K: /* JMP_IMM */
 		cmp_eq = (bpf_op == BPF_JEQ);
-		dst = ebpf_to_mips_reg(ctx, insn, dst_reg_fp_ok);
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_FP_OK);
 		if (dst < 0)
 			return dst;
 		if (insn->imm == 0) {
@@ -1029,8 +1029,8 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 	case BPF_JMP | BPF_JGT | BPF_X:
 	case BPF_JMP | BPF_JGE | BPF_X:
 	case BPF_JMP | BPF_JSET | BPF_X:
-		src = ebpf_to_mips_reg(ctx, insn, src_reg_no_fp);
-		dst = ebpf_to_mips_reg(ctx, insn, dst_reg);
+		src = ebpf_to_mips_reg(ctx, insn, REG_SRC_NO_FP);
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
 		if (src < 0 || dst < 0)
 			return -EINVAL;
 		td = get_reg_val_type(ctx, this_idx, insn->dst_reg);
@@ -1160,7 +1160,7 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 	case BPF_JMP | BPF_JSLT | BPF_K: /* JMP_IMM */
 	case BPF_JMP | BPF_JSLE | BPF_K: /* JMP_IMM */
 		cmp_eq = (bpf_op == BPF_JSGE);
-		dst = ebpf_to_mips_reg(ctx, insn, dst_reg_fp_ok);
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_FP_OK);
 		if (dst < 0)
 			return dst;
 
@@ -1235,7 +1235,7 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 	case BPF_JMP | BPF_JLT | BPF_K:
 	case BPF_JMP | BPF_JLE | BPF_K:
 		cmp_eq = (bpf_op == BPF_JGE);
-		dst = ebpf_to_mips_reg(ctx, insn, dst_reg_fp_ok);
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_FP_OK);
 		if (dst < 0)
 			return dst;
 		/*
@@ -1258,7 +1258,7 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 		goto jeq_common;
 
 	case BPF_JMP | BPF_JSET | BPF_K: /* JMP_IMM */
-		dst = ebpf_to_mips_reg(ctx, insn, dst_reg_fp_ok);
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_FP_OK);
 		if (dst < 0)
 			return dst;
 
@@ -1303,7 +1303,7 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 		emit_instr(ctx, nop);
 		break;
 	case BPF_LD | BPF_DW | BPF_IMM:
-		dst = ebpf_to_mips_reg(ctx, insn, dst_reg);
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
 		if (dst < 0)
 			return dst;
 		t64 = ((u64)(u32)insn->imm) | ((u64)(insn + 1)->imm << 32);
@@ -1326,7 +1326,7 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 
 	case BPF_ALU | BPF_END | BPF_FROM_BE:
 	case BPF_ALU | BPF_END | BPF_FROM_LE:
-		dst = ebpf_to_mips_reg(ctx, insn, dst_reg);
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
 		if (dst < 0)
 			return dst;
 		td = get_reg_val_type(ctx, this_idx, insn->dst_reg);
@@ -1369,7 +1369,7 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 			dst = MIPS_R_SP;
 			mem_off = insn->off + MAX_BPF_STACK;
 		} else {
-			dst = ebpf_to_mips_reg(ctx, insn, dst_reg);
+			dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
 			if (dst < 0)
 				return dst;
 			mem_off = insn->off;
@@ -1400,12 +1400,12 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 			src = MIPS_R_SP;
 			mem_off = insn->off + MAX_BPF_STACK;
 		} else {
-			src = ebpf_to_mips_reg(ctx, insn, src_reg_no_fp);
+			src = ebpf_to_mips_reg(ctx, insn, REG_SRC_NO_FP);
 			if (src < 0)
 				return src;
 			mem_off = insn->off;
 		}
-		dst = ebpf_to_mips_reg(ctx, insn, dst_reg);
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
 		if (dst < 0)
 			return dst;
 		switch (BPF_SIZE(insn->code)) {
@@ -1435,12 +1435,12 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 			dst = MIPS_R_SP;
 			mem_off = insn->off + MAX_BPF_STACK;
 		} else {
-			dst = ebpf_to_mips_reg(ctx, insn, dst_reg);
+			dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
 			if (dst < 0)
 				return dst;
 			mem_off = insn->off;
 		}
-		src = ebpf_to_mips_reg(ctx, insn, src_reg_no_fp);
+		src = ebpf_to_mips_reg(ctx, insn, REG_SRC_NO_FP);
 		if (src < 0)
 			return src;
 		if (BPF_MODE(insn->code) == BPF_ATOMIC) {
-- 
2.25.1


* [RFC PATCH bpf-next v1 11/14] MIPS: eBPF: add core support for 32/64-bit systems
From: Tony Ambardar @ 2021-07-12  0:34 UTC

Update register definitions and flags for both 32/64-bit operation. Add
a common register lookup table, modify ebpf_to_mips_reg() to use it, and
add common helpers is64bit() and isbigend().

On MIPS32, BPF registers are held in register pairs defined by the base
register. Word-size and endian-aware helper macros select 32-bit registers
from a pair and generate 32-bit word memory storage offsets. The BPF TCC
is stored to the stack due to register pressure.
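
As a usage illustration only, storing a 64-bit BPF register pair to
the stack on MIPS32 with the new helpers then becomes:

  emit_instr(ctx, sw, HI(reg), OFFHI(mem_off), MIPS_R_SP);
  emit_instr(ctx, sw, LO(reg), OFFLO(mem_off), MIPS_R_SP);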

Modify build_int_prologue() and build_int_epilogue() to handle MIPS32
registers and any adjustments needed during program entry/exit/transfer
if transitioning between the native N64/O32 ABI and the BPF 64-bit ABI.
Also ensure ABI-consistent stack alignment and use verifier-provided stack
depth during setup.

Update emit_const_to_reg() to work across MIPS64 and MIPS32 systems and
optimize gen_imm_to_reg() to only set the lower halfword if needed.
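
As an illustrative sketch of the MIPS32 case: a 64-bit immediate lands
in a register pair, one lui/ori sequence per 32-bit half, where the
ori can be skipped if the low halfword is zero:

  emit_instr(ctx, lui, HI(dst), hi32 >> 16);
  emit_instr(ctx, ori, HI(dst), HI(dst), hi32 & 0xffff);
  emit_instr(ctx, lui, LO(dst), lo32 >> 16);
  emit_instr(ctx, ori, LO(dst), LO(dst), lo32 & 0xffff);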

Rework emit_bpf_tail_call() to also support MIPS32 usage and add common
helpers, emit_bpf_call() and emit_push_args(), handling TCC and ABI
variations on MIPS32/MIPS64. Add tail_call_present() and update tailcall
handling to support mixing BPF2BPF subprograms and tailcalls.

Add sign and zero-extension helpers usable with verifier zext insertion,
gen_zext_insn() and gen_sext_insn().

Add common functions emit_caller_save() and emit_caller_restore(), which
push and pop all caller-saved BPF registers to the stack, for use with
JIT-internal kernel calls such as those needed for BPF insns unsupported
by native system ISA opcodes. Since these calls would be hidden from any
BPF C compiler, which would normally spill needed registers during a call,
the JIT must handle save/restore itself.

Adopt a dedicated BPF FP (in MIPS_R_S8), and relax FP usage within insns.
This reduces ad-hoc code doing $sp manipulation with temp registers, and
allows wider usage of BPF FP for comparison and arithmetic. For example,
the following test_verifier tests are now JITed (previously they were not):

  939/p store PTR_TO_STACK in R10 to array map using BPF_B
  981/p unpriv: cmp pointer with pointer
  984/p unpriv: indirectly pass pointer on stack to helper function
  985/p unpriv: mangle pointer on stack 1
  986/p unpriv: mangle pointer on stack 2
  1001/p unpriv: partial copy of pointer
  1097/p xadd/w check whether src/dst got mangled, 1
  1098/p xadd/w check whether src/dst got mangled, 2

Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
---
 arch/mips/net/ebpf_jit.c | 750 +++++++++++++++++++++++++++++----------
 1 file changed, 563 insertions(+), 187 deletions(-)

diff --git a/arch/mips/net/ebpf_jit.c b/arch/mips/net/ebpf_jit.c
index 61d7894051aa..525f1df8db21 100644
--- a/arch/mips/net/ebpf_jit.c
+++ b/arch/mips/net/ebpf_jit.c
@@ -1,11 +1,13 @@
 // SPDX-License-Identifier: GPL-2.0-only
 /*
- * Just-In-Time compiler for eBPF filters on MIPS
- *
- * Copyright (c) 2017 Cavium, Inc.
+ * Just-In-Time compiler for eBPF filters on MIPS32/MIPS64
+ * Copyright (c) 2021 Tony Ambardar <Tony.Ambardar@gmail.com>
  *
  * Based on code from:
  *
+ * Copyright (c) 2017 Cavium, Inc.
+ * Author: David Daney <david.daney@cavium.com>
+ *
  * Copyright (c) 2014 Imagination Technologies Ltd.
  * Author: Markos Chandras <markos.chandras@imgtec.com>
  */
@@ -22,31 +24,42 @@
 #include <asm/isa-rev.h>
 #include <asm/uasm.h>
 
-/* Registers used by JIT */
+/* Registers used by JIT:	  (MIPS32)	(MIPS64) */
 #define MIPS_R_ZERO	0
 #define MIPS_R_AT	1
-#define MIPS_R_V0	2	/* BPF_R0 */
-#define MIPS_R_V1	3
-#define MIPS_R_A0	4	/* BPF_R1 */
-#define MIPS_R_A1	5	/* BPF_R2 */
-#define MIPS_R_A2	6	/* BPF_R3 */
-#define MIPS_R_A3	7	/* BPF_R4 */
-#define MIPS_R_A4	8	/* BPF_R5 */
-#define MIPS_R_T4	12	/* BPF_AX */
-#define MIPS_R_T5	13
-#define MIPS_R_T6	14
-#define MIPS_R_T7	15
-#define MIPS_R_S0	16	/* BPF_R6 */
-#define MIPS_R_S1	17	/* BPF_R7 */
-#define MIPS_R_S2	18	/* BPF_R8 */
-#define MIPS_R_S3	19	/* BPF_R9 */
-#define MIPS_R_S4	20	/* BPF_TCC */
-#define MIPS_R_S5	21
-#define MIPS_R_S6	22
-#define MIPS_R_S7	23
-#define MIPS_R_T8	24
-#define MIPS_R_T9	25
+#define MIPS_R_V0	2	/* BPF_R0	BPF_R0 */
+#define MIPS_R_V1	3	/* BPF_R0	BPF_TCC */
+#define MIPS_R_A0	4	/* BPF_R1	BPF_R1 */
+#define MIPS_R_A1	5	/* BPF_R1	BPF_R2 */
+#define MIPS_R_A2	6	/* BPF_R2	BPF_R3 */
+#define MIPS_R_A3	7	/* BPF_R2	BPF_R4 */
+
+/* MIPS64 replaces T0-T3 scratch regs with extra arguments A4-A7. */
+#ifdef CONFIG_64BIT
+#  define MIPS_R_A4	8	/* (n/a)	BPF_R5 */
+#else
+#  define MIPS_R_T0	8	/* BPF_R3	(n/a)  */
+#  define MIPS_R_T1	9	/* BPF_R3	(n/a)  */
+#  define MIPS_R_T2	10	/* BPF_R4	(n/a)  */
+#  define MIPS_R_T3	11	/* BPF_R4	(n/a)  */
+#endif
+
+#define MIPS_R_T4	12	/* BPF_R5	BPF_AX */
+#define MIPS_R_T5	13	/* BPF_R5	(free) */
+#define MIPS_R_T6	14	/* BPF_AX	(used) */
+#define MIPS_R_T7	15	/* BPF_AX	(free) */
+#define MIPS_R_S0	16	/* BPF_R6	BPF_R6 */
+#define MIPS_R_S1	17	/* BPF_R6	BPF_R7 */
+#define MIPS_R_S2	18	/* BPF_R7	BPF_R8 */
+#define MIPS_R_S3	19	/* BPF_R7	BPF_R9 */
+#define MIPS_R_S4	20	/* BPF_R8	BPF_TCC */
+#define MIPS_R_S5	21	/* BPF_R8	(free) */
+#define MIPS_R_S6	22	/* BPF_R9	(free) */
+#define MIPS_R_S7	23	/* BPF_R9	(free) */
+#define MIPS_R_T8	24	/* (used)	(used) */
+#define MIPS_R_T9	25	/* (used)	(used) */
 #define MIPS_R_SP	29
+#define MIPS_R_S8	30	/* BPF_R10	BPF_R10 */
 #define MIPS_R_RA	31
 
 /* eBPF flags */
@@ -55,10 +68,117 @@
 #define EBPF_SAVE_S2	BIT(2)
 #define EBPF_SAVE_S3	BIT(3)
 #define EBPF_SAVE_S4	BIT(4)
-#define EBPF_SAVE_RA	BIT(5)
-#define EBPF_SEEN_FP	BIT(6)
-#define EBPF_SEEN_TC	BIT(7)
-#define EBPF_TCC_IN_V1	BIT(8)
+#define EBPF_SAVE_S5	BIT(5)
+#define EBPF_SAVE_S6	BIT(6)
+#define EBPF_SAVE_S7	BIT(7)
+#define EBPF_SAVE_S8	BIT(8)
+#define EBPF_SAVE_RA	BIT(9)
+#define EBPF_SEEN_FP	BIT(10)
+#define EBPF_SEEN_TC	BIT(11)
+#define EBPF_TCC_IN_RUN	BIT(12)
+
+/*
+ * Extra JIT registers dedicated to holding TCC during runtime or saving
+ * across calls.
+ */
+enum {
+	JIT_RUN_TCC = MAX_BPF_JIT_REG,
+	JIT_SAV_TCC
+};
+/* Temporary register for passing TCC if nothing dedicated. */
+#define TEMP_PASS_TCC MIPS_R_T8
+
+/*
+ * Word-size and endianness-aware helpers for building MIPS32 vs MIPS64
+ * tables and selecting 32-bit subregisters from a register pair base.
+ * Simplify use by emulating MIPS_R_SP and MIPS_R_ZERO as register pairs
+ * and adding HI/LO word memory offsets.
+ */
+#ifdef CONFIG_64BIT
+#  define HI(reg) (reg)
+#  define LO(reg) (reg)
+#  define OFFHI(mem) (mem)
+#  define OFFLO(mem) (mem)
+#else	/* CONFIG_32BIT */
+#  ifdef __BIG_ENDIAN
+#    define HI(reg) ((reg) == MIPS_R_SP ? MIPS_R_ZERO : \
+		     (reg) == MIPS_R_S8 ? MIPS_R_ZERO : \
+		     (reg))
+#    define LO(reg) ((reg) == MIPS_R_ZERO ? (reg) : \
+		     (reg) == MIPS_R_SP ? (reg) : \
+		     (reg) == MIPS_R_S8 ? (reg) : \
+		     (reg) + 1)
+#    define OFFHI(mem) (mem)
+#    define OFFLO(mem) ((mem) + sizeof(long))
+#  else	/* __LITTLE_ENDIAN */
+#    define HI(reg) ((reg) == MIPS_R_ZERO ? (reg) : \
+		     (reg) == MIPS_R_SP ? MIPS_R_ZERO : \
+		     (reg) == MIPS_R_S8 ? MIPS_R_ZERO : \
+		     (reg) + 1)
+#    define LO(reg) (reg)
+#    define OFFHI(mem) ((mem) + sizeof(long))
+#    define OFFLO(mem) (mem)
+#  endif
+#endif
+
+#ifdef CONFIG_64BIT
+#  define M(expr32, expr64) (expr64)
+#else
+#  define M(expr32, expr64) (expr32)
+#endif
+const struct {
+	/* Register or pair base */
+	int reg;
+	/* Register flags */
+	u32 flags;
+	/* Usage table:   (MIPS32)			 (MIPS64) */
+} bpf2mips[] = {
+	/* Return value from in-kernel function, and exit value from eBPF. */
+	[BPF_REG_0] =  {M(MIPS_R_V0,			MIPS_R_V0)},
+	/* Arguments from eBPF program to in-kernel/BPF functions. */
+	[BPF_REG_1] =  {M(MIPS_R_A0,			MIPS_R_A0)},
+	[BPF_REG_2] =  {M(MIPS_R_A2,			MIPS_R_A1)},
+	[BPF_REG_3] =  {M(MIPS_R_T0,			MIPS_R_A2)},
+	[BPF_REG_4] =  {M(MIPS_R_T2,			MIPS_R_A3)},
+	[BPF_REG_5] =  {M(MIPS_R_T4,			MIPS_R_A4)},
+	/* Callee-saved registers preserved by in-kernel/BPF functions. */
+	[BPF_REG_6] =  {M(MIPS_R_S0,			MIPS_R_S0),
+			M(EBPF_SAVE_S0|EBPF_SAVE_S1,	EBPF_SAVE_S0)},
+	[BPF_REG_7] =  {M(MIPS_R_S2,			MIPS_R_S1),
+			M(EBPF_SAVE_S2|EBPF_SAVE_S3,	EBPF_SAVE_S1)},
+	[BPF_REG_8] =  {M(MIPS_R_S4,			MIPS_R_S2),
+			M(EBPF_SAVE_S4|EBPF_SAVE_S5,	EBPF_SAVE_S2)},
+	[BPF_REG_9] =  {M(MIPS_R_S6,			MIPS_R_S3),
+			M(EBPF_SAVE_S6|EBPF_SAVE_S7,	EBPF_SAVE_S3)},
+	[BPF_REG_10] = {M(MIPS_R_S8,			MIPS_R_S8),
+			M(EBPF_SAVE_S8|EBPF_SEEN_FP,	EBPF_SAVE_S8|EBPF_SEEN_FP)},
+	/* Internal register for rewriting insns during JIT blinding. */
+	[BPF_REG_AX] = {M(MIPS_R_T6,			MIPS_R_T4)},
+	/*
+	 * Internal registers for TCC runtime holding and saving during
+	 * calls. A zero save register indicates using scratch space on
+	 * the stack for storage during calls. A zero hold register means
+	 * no dedicated register holds TCC during runtime (but a temp reg
+	 * still passes TCC to tailcall or bpf2bpf call).
+	 */
+	[JIT_RUN_TCC] =	{M(0,				MIPS_R_V1)},
+	[JIT_SAV_TCC] =	{M(0,				MIPS_R_S4),
+			 M(0,				EBPF_SAVE_S4)}
+};
+#undef M
+
+static inline bool is64bit(void)
+{
+	return IS_ENABLED(CONFIG_64BIT);
+}
+
+static inline bool isbigend(void)
+{
+	return IS_ENABLED(CONFIG_CPU_BIG_ENDIAN);
+}
+
+/* Stack region alignment under N64 and O32 ABIs */
+#define STACK_ALIGN (2 * sizeof(long))
 
 /*
  * For the mips64 ISA, we need to track the value range or type for
@@ -100,6 +220,7 @@ enum reg_val_type {
 struct jit_ctx {
 	const struct bpf_prog *skf;
 	int stack_size;
+	int bpf_stack_off;
 	u32 idx;
 	u32 flags;
 	u32 *offsets;
@@ -177,6 +298,34 @@ static u32 b_imm(unsigned int tgt, struct jit_ctx *ctx)
 		(ctx->idx * 4) - 4;
 }
 
+/* Sign-extend dst register or HI 32-bit reg of pair. */
+static inline void gen_sext_insn(int dst, struct jit_ctx *ctx)
+{
+	if (is64bit())
+		emit_instr(ctx, sll, dst, dst, 0);
+	else
+		emit_instr(ctx, sra, HI(dst), LO(dst), 31);
+}
+
+/*
+ * Zero-extend dst register or HI 32-bit reg of pair, if either forced
+ * or the BPF verifier does not insert its own zext insns.
+ */
+static inline void gen_zext_insn(int dst, bool force, struct jit_ctx *ctx)
+{
+	if (!ctx->skf->aux->verifier_zext || force) {
+		if (is64bit())
+			emit_instr(ctx, dinsu, dst, MIPS_R_ZERO, 32, 32);
+		else
+			emit_instr(ctx, and, HI(dst), MIPS_R_ZERO, MIPS_R_ZERO);
+	}
+}
+
+static inline bool tail_call_present(struct jit_ctx *ctx)
+{
+	return ctx->flags & EBPF_SEEN_TC || ctx->skf->aux->tail_call_reachable;
+}
+
 enum reg_usage {
 	REG_SRC_FP_OK,
 	REG_SRC_NO_FP,
@@ -186,9 +335,8 @@ enum reg_usage {
 
 /*
  * For eBPF, the register mapping naturally falls out of the
- * requirements of eBPF and the MIPS n64 ABI.  We don't maintain a
- * separate frame pointer, so BPF_REG_10 relative accesses are
- * adjusted to be $sp relative.
+ * requirements of eBPF and the MIPS N64/O32 ABIs. We also maintain
+ * a separate frame pointer, setting BPF_REG_10 relative to $sp.
  */
 static int ebpf_to_mips_reg(struct jit_ctx *ctx,
 			    const struct bpf_insn *insn,
@@ -199,110 +347,139 @@ static int ebpf_to_mips_reg(struct jit_ctx *ctx,
 
 	switch (ebpf_reg) {
 	case BPF_REG_0:
-		return MIPS_R_V0;
 	case BPF_REG_1:
-		return MIPS_R_A0;
 	case BPF_REG_2:
-		return MIPS_R_A1;
 	case BPF_REG_3:
-		return MIPS_R_A2;
 	case BPF_REG_4:
-		return MIPS_R_A3;
 	case BPF_REG_5:
-		return MIPS_R_A4;
 	case BPF_REG_6:
-		ctx->flags |= EBPF_SAVE_S0;
-		return MIPS_R_S0;
 	case BPF_REG_7:
-		ctx->flags |= EBPF_SAVE_S1;
-		return MIPS_R_S1;
 	case BPF_REG_8:
-		ctx->flags |= EBPF_SAVE_S2;
-		return MIPS_R_S2;
 	case BPF_REG_9:
-		ctx->flags |= EBPF_SAVE_S3;
-		return MIPS_R_S3;
+	case BPF_REG_AX:
+		ctx->flags |= bpf2mips[ebpf_reg].flags;
+		return bpf2mips[ebpf_reg].reg;
 	case BPF_REG_10:
 		if (u == REG_DST_NO_FP || u == REG_SRC_NO_FP)
 			goto bad_reg;
-		ctx->flags |= EBPF_SEEN_FP;
-		/*
-		 * Needs special handling, return something that
-		 * cannot be clobbered just in case.
-		 */
-		return MIPS_R_ZERO;
-	case BPF_REG_AX:
-		return MIPS_R_T4;
+		ctx->flags |= bpf2mips[ebpf_reg].flags;
+		return bpf2mips[ebpf_reg].reg;
 	default:
 bad_reg:
 		WARN(1, "Illegal bpf reg: %d\n", ebpf_reg);
 		return -EINVAL;
 	}
 }
+
 /*
  * eBPF stack frame will be something like:
  *
  *  Entry $sp ------>   +--------------------------------+
  *                      |   $ra  (optional)              |
  *                      +--------------------------------+
- *                      |   $s0  (optional)              |
+ *                      |   $s8  (optional)              |
  *                      +--------------------------------+
- *                      |   $s1  (optional)              |
+ *                      |   $s7  (optional)              |
  *                      +--------------------------------+
- *                      |   $s2  (optional)              |
+ *                      |   $s6  (optional)              |
  *                      +--------------------------------+
- *                      |   $s3  (optional)              |
+ *                      |   $s5  (optional)              |
  *                      +--------------------------------+
  *                      |   $s4  (optional)              |
  *                      +--------------------------------+
- *                      |   tmp-storage  (if $ra saved)  |
- * $sp + tmp_offset --> +--------------------------------+ <--BPF_REG_10
+ *                      |   $s3  (optional)              |
+ *                      +--------------------------------+
+ *                      |   $s2  (optional)              |
+ *                      +--------------------------------+
+ *                      |   $s1  (optional)              |
+ *                      +--------------------------------+
+ *                      |   $s0  (optional)              |
+ *                      +--------------------------------+
+ *                      |   tmp-storage  (optional)      |
+ * $sp + bpf_stack_off->+--------------------------------+ <--BPF_REG_10
  *                      |   BPF_REG_10 relative storage  |
  *                      |    MAX_BPF_STACK (optional)    |
  *                      |      .                         |
  *                      |      .                         |
  *                      |      .                         |
- *     $sp -------->    +--------------------------------+
+ *        $sp ------>   +--------------------------------+
  *
  * If BPF_REG_10 is never referenced, then the MAX_BPF_STACK sized
  * area is not allocated.
  */
-static int gen_int_prologue(struct jit_ctx *ctx)
+static int build_int_prologue(struct jit_ctx *ctx)
 {
+	int tcc_run = bpf2mips[JIT_RUN_TCC].reg ?
+		      bpf2mips[JIT_RUN_TCC].reg :
+		      TEMP_PASS_TCC;
+	int tcc_sav = bpf2mips[JIT_SAV_TCC].reg;
+	const struct bpf_prog *prog = ctx->skf;
+	int r10 = bpf2mips[BPF_REG_10].reg;
+	int r1 = bpf2mips[BPF_REG_1].reg;
 	int stack_adjust = 0;
 	int store_offset;
 	int locals_size;
 
 	if (ctx->flags & EBPF_SAVE_RA)
-		/*
-		 * If RA we are doing a function call and may need
-		 * extra 8-byte tmp area.
-		 */
-		stack_adjust += 2 * sizeof(long);
-	if (ctx->flags & EBPF_SAVE_S0)
 		stack_adjust += sizeof(long);
-	if (ctx->flags & EBPF_SAVE_S1)
+	if (ctx->flags & EBPF_SAVE_S8)
 		stack_adjust += sizeof(long);
-	if (ctx->flags & EBPF_SAVE_S2)
+	if (ctx->flags & EBPF_SAVE_S7)
 		stack_adjust += sizeof(long);
-	if (ctx->flags & EBPF_SAVE_S3)
+	if (ctx->flags & EBPF_SAVE_S6)
+		stack_adjust += sizeof(long);
+	if (ctx->flags & EBPF_SAVE_S5)
 		stack_adjust += sizeof(long);
 	if (ctx->flags & EBPF_SAVE_S4)
 		stack_adjust += sizeof(long);
+	if (ctx->flags & EBPF_SAVE_S3)
+		stack_adjust += sizeof(long);
+	if (ctx->flags & EBPF_SAVE_S2)
+		stack_adjust += sizeof(long);
+	if (ctx->flags & EBPF_SAVE_S1)
+		stack_adjust += sizeof(long);
+	if (ctx->flags & EBPF_SAVE_S0)
+		stack_adjust += sizeof(long);
+	if (tail_call_present(ctx) &&
+	    !(ctx->flags & EBPF_TCC_IN_RUN) && !tcc_sav)
+		/* Allocate scratch space for holding TCC if needed. */
+		stack_adjust += sizeof(long);
 
-	BUILD_BUG_ON(MAX_BPF_STACK & 7);
-	locals_size = (ctx->flags & EBPF_SEEN_FP) ? MAX_BPF_STACK : 0;
+	stack_adjust = ALIGN(stack_adjust, STACK_ALIGN);
+
+	locals_size = (ctx->flags & EBPF_SEEN_FP) ? prog->aux->stack_depth : 0;
+	locals_size = ALIGN(locals_size, STACK_ALIGN);
 
 	stack_adjust += locals_size;
 
 	ctx->stack_size = stack_adjust;
+	ctx->bpf_stack_off = locals_size;
+
+	/*
+	 * First instruction initializes the tail call count (TCC) if
+	 * called from kernel or via BPF tail call. A BPF tail-caller
+	 * will skip this instruction and pass the TCC via register.
+	 * As a BPF2BPF subprog, we are called directly and must avoid
+	 * resetting the TCC.
+	 */
+	if (!ctx->skf->is_func)
+		emit_instr(ctx, addiu, tcc_run, MIPS_R_ZERO, MAX_TAIL_CALL_CNT);
 
 	/*
-	 * First instruction initializes the tail call count (TCC).
-	 * On tail call we skip this instruction, and the TCC is
-	 * passed in $v1 from the caller.
+	 * If called from kernel under O32 ABI we must set up BPF R1 context,
+	 * since BPF R1 is an endian-order register pair ($a0:$a1 or $a1:$a0)
+	 * but context is always passed in $a0 as 32-bit pointer. Entry from
+	 * a tail-call looks just like a kernel call, which means the caller
+	 * must set up R1 context according to the kernel call ABI. If we are
+	 * a BPF2BPF call then all registers are already correctly set up.
 	 */
-	emit_instr(ctx, addiu, MIPS_R_V1, MIPS_R_ZERO, MAX_TAIL_CALL_CNT);
+	if (!is64bit() && !ctx->skf->is_func) {
+		if (isbigend())
+			emit_instr(ctx, move, LO(r1), MIPS_R_A0);
+		/* Sanitize upper 32-bit reg */
+		gen_zext_insn(r1, true, ctx);
+	}
+
 	if (stack_adjust)
 		emit_instr_long(ctx, daddiu, addiu,
 					MIPS_R_SP, MIPS_R_SP, -stack_adjust);
@@ -316,24 +493,24 @@ static int gen_int_prologue(struct jit_ctx *ctx)
 					MIPS_R_RA, store_offset, MIPS_R_SP);
 		store_offset -= sizeof(long);
 	}
-	if (ctx->flags & EBPF_SAVE_S0) {
+	if (ctx->flags & EBPF_SAVE_S8) {
 		emit_instr_long(ctx, sd, sw,
-					MIPS_R_S0, store_offset, MIPS_R_SP);
+					MIPS_R_S8, store_offset, MIPS_R_SP);
 		store_offset -= sizeof(long);
 	}
-	if (ctx->flags & EBPF_SAVE_S1) {
+	if (ctx->flags & EBPF_SAVE_S7) {
 		emit_instr_long(ctx, sd, sw,
-					MIPS_R_S1, store_offset, MIPS_R_SP);
+					MIPS_R_S7, store_offset, MIPS_R_SP);
 		store_offset -= sizeof(long);
 	}
-	if (ctx->flags & EBPF_SAVE_S2) {
+	if (ctx->flags & EBPF_SAVE_S6) {
 		emit_instr_long(ctx, sd, sw,
-					MIPS_R_S2, store_offset, MIPS_R_SP);
+					MIPS_R_S6, store_offset, MIPS_R_SP);
 		store_offset -= sizeof(long);
 	}
-	if (ctx->flags & EBPF_SAVE_S3) {
+	if (ctx->flags & EBPF_SAVE_S5) {
 		emit_instr_long(ctx, sd, sw,
-					MIPS_R_S3, store_offset, MIPS_R_SP);
+					MIPS_R_S5, store_offset, MIPS_R_SP);
 		store_offset -= sizeof(long);
 	}
 	if (ctx->flags & EBPF_SAVE_S4) {
@@ -341,10 +518,40 @@ static int gen_int_prologue(struct jit_ctx *ctx)
 					MIPS_R_S4, store_offset, MIPS_R_SP);
 		store_offset -= sizeof(long);
 	}
+	if (ctx->flags & EBPF_SAVE_S3) {
+		emit_instr_long(ctx, sd, sw,
+					MIPS_R_S3, store_offset, MIPS_R_SP);
+		store_offset -= sizeof(long);
+	}
+	if (ctx->flags & EBPF_SAVE_S2) {
+		emit_instr_long(ctx, sd, sw,
+					MIPS_R_S2, store_offset, MIPS_R_SP);
+		store_offset -= sizeof(long);
+	}
+	if (ctx->flags & EBPF_SAVE_S1) {
+		emit_instr_long(ctx, sd, sw,
+					MIPS_R_S1, store_offset, MIPS_R_SP);
+		store_offset -= sizeof(long);
+	}
+	if (ctx->flags & EBPF_SAVE_S0) {
+		emit_instr_long(ctx, sd, sw,
+					MIPS_R_S0, store_offset, MIPS_R_SP);
+		store_offset -= sizeof(long);
+	}
+
+	/* Store TCC in backup register or stack scratch space if indicated. */
+	if (tail_call_present(ctx) && !(ctx->flags & EBPF_TCC_IN_RUN)) {
+		if (tcc_sav)
+			emit_instr(ctx, move, tcc_sav, tcc_run);
+		else
+			emit_instr_long(ctx, sd, sw,
+					tcc_run, ctx->bpf_stack_off, MIPS_R_SP);
+	}
 
-	if ((ctx->flags & EBPF_SEEN_TC) && !(ctx->flags & EBPF_TCC_IN_V1))
-		emit_instr_long(ctx, daddu, addu,
-					MIPS_R_S4, MIPS_R_V1, MIPS_R_ZERO);
+	/* Prepare BPF FP as single-reg ptr, emulate upper 32 bits as needed. */
+	if (ctx->flags & EBPF_SEEN_FP)
+		emit_instr_long(ctx, daddiu, addiu, r10,
+						MIPS_R_SP, ctx->bpf_stack_off);
 
 	return 0;
 }
@@ -354,14 +561,38 @@ static int build_int_epilogue(struct jit_ctx *ctx, int dest_reg)
 	const struct bpf_prog *prog = ctx->skf;
 	int stack_adjust = ctx->stack_size;
 	int store_offset = stack_adjust - sizeof(long);
+	int r1 = bpf2mips[BPF_REG_1].reg;
+	int r0 = bpf2mips[BPF_REG_0].reg;
 	enum reg_val_type td;
-	int r0 = MIPS_R_V0;
 
-	if (dest_reg == MIPS_R_RA) {
-		/* Don't let zero extended value escape. */
-		td = get_reg_val_type(ctx, prog->len, BPF_REG_0);
-		if (td == REG_64BIT)
-			emit_instr(ctx, sll, r0, r0, 0);
+	/*
+	 * Returns from BPF2BPF calls consistently use the BPF 64-bit ABI
+	 * i.e. register usage and mapping between JIT and OS is unchanged.
+	 * Returning to the kernel must follow the N64 or O32 ABI, and for
+	 * the latter requires fixup of BPF R0 to MIPS V0 register mapping.
+	 *
+	 * Tail calls must ensure the passed R1 context is consistent with
+	 * the kernel ABI, which requires fixup on MIPS32 big-endian systems.
+	 */
+	if (dest_reg == MIPS_R_RA && !ctx->skf->is_func) { /* kernel return */
+		if (is64bit()) {
+			/* Don't let zero extended value escape. */
+			td = get_reg_val_type(ctx, prog->len, BPF_REG_0);
+			if (td == REG_64BIT)
+				gen_sext_insn(r0, ctx);
+		} else if (isbigend()) { /* and 32-bit */
+			/*
+			 * O32 ABI specifies 32-bit return value always
+			 * placed in MIPS_R_V0 regardless of the native
+			 * endianness. This would be in the wrong position
+			 * in a BPF R0 reg pair on big-endian systems, so
+			 * we must relocate.
+			 */
+			emit_instr(ctx, move, MIPS_R_V0, LO(r0));
+		}
+	} else if (dest_reg == MIPS_R_T9) { /* tail call */
+		if (!is64bit() && isbigend())
+			emit_instr(ctx, move, MIPS_R_A0, LO(r1));
 	}
 
 	if (ctx->flags & EBPF_SAVE_RA) {
@@ -369,24 +600,24 @@ static int build_int_epilogue(struct jit_ctx *ctx, int dest_reg)
 					MIPS_R_RA, store_offset, MIPS_R_SP);
 		store_offset -= sizeof(long);
 	}
-	if (ctx->flags & EBPF_SAVE_S0) {
+	if (ctx->flags & EBPF_SAVE_S8) {
 		emit_instr_long(ctx, ld, lw,
-					MIPS_R_S0, store_offset, MIPS_R_SP);
+					MIPS_R_S8, store_offset, MIPS_R_SP);
 		store_offset -= sizeof(long);
 	}
-	if (ctx->flags & EBPF_SAVE_S1) {
+	if (ctx->flags & EBPF_SAVE_S7) {
 		emit_instr_long(ctx, ld, lw,
-					MIPS_R_S1, store_offset, MIPS_R_SP);
+					MIPS_R_S7, store_offset, MIPS_R_SP);
 		store_offset -= sizeof(long);
 	}
-	if (ctx->flags & EBPF_SAVE_S2) {
+	if (ctx->flags & EBPF_SAVE_S6) {
 		emit_instr_long(ctx, ld, lw,
-				MIPS_R_S2, store_offset, MIPS_R_SP);
+					MIPS_R_S6, store_offset, MIPS_R_SP);
 		store_offset -= sizeof(long);
 	}
-	if (ctx->flags & EBPF_SAVE_S3) {
+	if (ctx->flags & EBPF_SAVE_S5) {
 		emit_instr_long(ctx, ld, lw,
-					MIPS_R_S3, store_offset, MIPS_R_SP);
+					MIPS_R_S5, store_offset, MIPS_R_SP);
 		store_offset -= sizeof(long);
 	}
 	if (ctx->flags & EBPF_SAVE_S4) {
@@ -394,8 +625,29 @@ static int build_int_epilogue(struct jit_ctx *ctx, int dest_reg)
 					MIPS_R_S4, store_offset, MIPS_R_SP);
 		store_offset -= sizeof(long);
 	}
+	if (ctx->flags & EBPF_SAVE_S3) {
+		emit_instr_long(ctx, ld, lw,
+					MIPS_R_S3, store_offset, MIPS_R_SP);
+		store_offset -= sizeof(long);
+	}
+	if (ctx->flags & EBPF_SAVE_S2) {
+		emit_instr_long(ctx, ld, lw,
+					MIPS_R_S2, store_offset, MIPS_R_SP);
+		store_offset -= sizeof(long);
+	}
+	if (ctx->flags & EBPF_SAVE_S1) {
+		emit_instr_long(ctx, ld, lw,
+					MIPS_R_S1, store_offset, MIPS_R_SP);
+		store_offset -= sizeof(long);
+	}
+	if (ctx->flags & EBPF_SAVE_S0) {
+		emit_instr_long(ctx, ld, lw,
+					MIPS_R_S0, store_offset, MIPS_R_SP);
+		store_offset -= sizeof(long);
+	}
 	emit_instr(ctx, jr, dest_reg);
 
+	/* Delay slot */
 	if (stack_adjust)
 		emit_instr_long(ctx, daddiu, addiu,
 					MIPS_R_SP, MIPS_R_SP, stack_adjust);
@@ -415,7 +667,9 @@ static void gen_imm_to_reg(const struct bpf_insn *insn, int reg,
 		int upper = insn->imm - lower;
 
 		emit_instr(ctx, lui, reg, upper >> 16);
-		emit_instr(ctx, addiu, reg, reg, lower);
+		/* lui already clears lower halfword */
+		if (lower)
+			emit_instr(ctx, addiu, reg, reg, lower);
 	}
 }
 
@@ -564,12 +818,12 @@ static int gen_imm_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 	return 0;
 }
 
-static void emit_const_to_reg(struct jit_ctx *ctx, int dst, u64 value)
+static void emit_const_to_reg(struct jit_ctx *ctx, int dst, unsigned long value)
 {
-	if (value >= 0xffffffffffff8000ull || value < 0x8000ull) {
-		emit_instr(ctx, daddiu, dst, MIPS_R_ZERO, (int)value);
-	} else if (value >= 0xffffffff80000000ull ||
-		   (value < 0x80000000 && value > 0xffff)) {
+	if (value >= S16_MIN || value <= S16_MAX) {
+		emit_instr_long(ctx, daddiu, addiu, dst, MIPS_R_ZERO, (int)value);
+	} else if (value >= S32_MIN ||
+		   (value <= S32_MAX && value > U16_MAX)) {
 		emit_instr(ctx, lui, dst, (s32)(s16)(value >> 16));
 		emit_instr(ctx, ori, dst, dst, (unsigned int)(value & 0xffff));
 	} else {
@@ -601,54 +855,145 @@ static void emit_const_to_reg(struct jit_ctx *ctx, int dst, u64 value)
 	}
 }
 
+/*
+ * Push BPF regs R3-R5 to the stack, skipping BPF regs R1-R2 which are
+ * passed via MIPS register pairs in $a0-$a3. Register order within pairs
+ * and the memory storage order are identical, i.e. endian-native.
+ */
+static void emit_push_args(struct jit_ctx *ctx)
+{
+	int store_offset = 2 * sizeof(u64); /* Skip R1-R2 in $a0-$a3 */
+	int bpf, reg;
+
+	for (bpf = BPF_REG_3; bpf <= BPF_REG_5; bpf++) {
+		reg = bpf2mips[bpf].reg;
+
+		emit_instr(ctx, sw, LO(reg), OFFLO(store_offset), MIPS_R_SP);
+		emit_instr(ctx, sw, HI(reg), OFFHI(store_offset), MIPS_R_SP);
+		store_offset += sizeof(u64);
+	}
+}
+
+/*
+ * Common helper for BPF_CALL insn, handling TCC and ABI variations.
+ * Kernel calls under O32 ABI require arguments passed on the stack,
+ * while BPF2BPF calls need the TCC passed via register as expected
+ * by the subprog's prologue.
+ *
+ * Under MIPS32 O32 ABI calling convention, u64 BPF regs R1-R2 are passed
+ * via reg pairs in $a0-$a3, while BPF regs R3-R5 are passed via the stack.
+ * Stack space is still reserved for $a0-$a3, and the whole area aligned.
+ */
+#define ARGS_SIZE (5 * sizeof(u64))
+
+void emit_bpf_call(struct jit_ctx *ctx, const struct bpf_insn *insn)
+{
+	int stack_adjust = ALIGN(ARGS_SIZE, STACK_ALIGN);
+	int tcc_run = bpf2mips[JIT_RUN_TCC].reg ?
+		      bpf2mips[JIT_RUN_TCC].reg :
+		      TEMP_PASS_TCC;
+	int tcc_sav = bpf2mips[JIT_SAV_TCC].reg;
+	long func_addr;
+
+	ctx->flags |= EBPF_SAVE_RA;
+
+	/* Ensure TCC passed into BPF subprog */
+	if ((insn->src_reg == BPF_PSEUDO_CALL) &&
+	    tail_call_present(ctx) && !(ctx->flags & EBPF_TCC_IN_RUN)) {
+		/* Set TCC from reg or stack */
+		if (tcc_sav)
+			emit_instr(ctx, move, tcc_run, tcc_sav);
+		else
+			emit_instr_long(ctx, ld, lw, tcc_run,
+						ctx->bpf_stack_off, MIPS_R_SP);
+	}
+
+	/* Push O32 stack args for kernel call */
+	if (!is64bit() && (insn->src_reg != BPF_PSEUDO_CALL)) {
+		emit_instr(ctx, addiu, MIPS_R_SP, MIPS_R_SP, -stack_adjust);
+		emit_push_args(ctx);
+	}
+
+	func_addr = (long)__bpf_call_base + insn->imm;
+	emit_const_to_reg(ctx, MIPS_R_T9, func_addr);
+	emit_instr(ctx, jalr, MIPS_R_RA, MIPS_R_T9);
+	/* Delay slot */
+	emit_instr(ctx, nop);
+
+	/* Restore stack */
+	if (!is64bit() && (insn->src_reg != BPF_PSEUDO_CALL))
+		emit_instr(ctx, addiu, MIPS_R_SP, MIPS_R_SP, stack_adjust);
+}
+
+/*
+ * Tail call helper arguments are passed via the BPF ABI as u64 parameters. On
+ * MIPS64 N64 ABI systems these are native regs, while on MIPS32 O32 ABI
+ * systems these are reg pairs:
+ *
+ * R1 -> &ctx
+ * R2 -> &array
+ * R3 -> index
+ */
 static int emit_bpf_tail_call(struct jit_ctx *ctx, int this_idx)
 {
+	int tcc_run = bpf2mips[JIT_RUN_TCC].reg ?
+		      bpf2mips[JIT_RUN_TCC].reg :
+		      TEMP_PASS_TCC;
+	int tcc_sav = bpf2mips[JIT_SAV_TCC].reg;
+	int r2 = bpf2mips[BPF_REG_2].reg;
+	int r3 = bpf2mips[BPF_REG_3].reg;
 	int off, b_off;
-	int tcc_reg;
+	int tcc;
 
 	ctx->flags |= EBPF_SEEN_TC;
 	/*
 	 * if (index >= array->map.max_entries)
 	 *     goto out;
 	 */
-	/* Mask index as 32-bit */
-	emit_instr(ctx, dinsu, MIPS_R_A2, MIPS_R_ZERO, 32, 32);
+	if (is64bit())
+		/* Mask index as 32-bit */
+		gen_zext_insn(r3, true, ctx);
 	off = offsetof(struct bpf_array, map.max_entries);
-	emit_instr(ctx, lwu, MIPS_R_T5, off, MIPS_R_A1);
-	emit_instr(ctx, sltu, MIPS_R_AT, MIPS_R_T5, MIPS_R_A2);
+	emit_instr_long(ctx, lwu, lw, MIPS_R_AT, off, LO(r2));
+	emit_instr(ctx, sltu, MIPS_R_AT, MIPS_R_AT, LO(r3));
 	b_off = b_imm(this_idx + 1, ctx);
-	emit_instr(ctx, bne, MIPS_R_AT, MIPS_R_ZERO, b_off);
+	emit_instr(ctx, bnez, MIPS_R_AT, b_off);
 	/*
 	 * if (TCC-- < 0)
 	 *     goto out;
 	 */
 	/* Delay slot */
-	tcc_reg = (ctx->flags & EBPF_TCC_IN_V1) ? MIPS_R_V1 : MIPS_R_S4;
-	emit_instr(ctx, daddiu, MIPS_R_T5, tcc_reg, -1);
+	tcc = (ctx->flags & EBPF_TCC_IN_RUN) ? tcc_run : tcc_sav;
+	/* Get TCC from reg or stack */
+	if (tcc)
+		emit_instr(ctx, move, MIPS_R_T8, tcc);
+	else
+		emit_instr_long(ctx, ld, lw, MIPS_R_T8,
+						ctx->bpf_stack_off, MIPS_R_SP);
 	b_off = b_imm(this_idx + 1, ctx);
-	emit_instr(ctx, bltz, tcc_reg, b_off);
+	emit_instr(ctx, bltz, MIPS_R_T8, b_off);
 	/*
 	 * prog = array->ptrs[index];
 	 * if (prog == NULL)
 	 *     goto out;
 	 */
 	/* Delay slot */
-	emit_instr(ctx, dsll, MIPS_R_T8, MIPS_R_A2, 3);
-	emit_instr(ctx, daddu, MIPS_R_T8, MIPS_R_T8, MIPS_R_A1);
+	emit_instr_long(ctx, dsll, sll, MIPS_R_AT, LO(r3), ilog2(sizeof(long)));
+	emit_instr_long(ctx, daddu, addu, MIPS_R_AT, MIPS_R_AT, LO(r2));
 	off = offsetof(struct bpf_array, ptrs);
-	emit_instr(ctx, ld, MIPS_R_AT, off, MIPS_R_T8);
+	emit_instr_long(ctx, ld, lw, MIPS_R_AT, off, MIPS_R_AT);
 	b_off = b_imm(this_idx + 1, ctx);
-	emit_instr(ctx, beq, MIPS_R_AT, MIPS_R_ZERO, b_off);
+	emit_instr(ctx, beqz, MIPS_R_AT, b_off);
 	/* Delay slot */
 	emit_instr(ctx, nop);
 
 	/* goto *(prog->bpf_func + 4); */
 	off = offsetof(struct bpf_prog, bpf_func);
-	emit_instr(ctx, ld, MIPS_R_T9, off, MIPS_R_AT);
-	/* All systems are go... propagate TCC */
-	emit_instr(ctx, daddu, MIPS_R_V1, MIPS_R_T5, MIPS_R_ZERO);
+	emit_instr_long(ctx, ld, lw, MIPS_R_T9, off, MIPS_R_AT);
+	/* All systems are go... decrement and propagate TCC */
+	emit_instr_long(ctx, daddiu, addiu, tcc_run, MIPS_R_T8, -1);
 	/* Skip first instruction (TCC initialization) */
-	emit_instr(ctx, daddiu, MIPS_R_T9, MIPS_R_T9, 4);
+	emit_instr_long(ctx, daddiu, addiu, MIPS_R_T9, MIPS_R_T9, 4);
 	return build_int_epilogue(ctx, MIPS_R_T9);
 }
 
@@ -828,15 +1173,7 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 		if (get_reg_val_type(ctx, this_idx, insn->dst_reg) == REG_32BIT)
 			emit_instr(ctx, dinsu, dst, MIPS_R_ZERO, 32, 32);
 		did_move = false;
-		if (insn->src_reg == BPF_REG_10) {
-			if (bpf_op == BPF_MOV) {
-				emit_instr(ctx, daddiu, dst, MIPS_R_SP, MAX_BPF_STACK);
-				did_move = true;
-			} else {
-				emit_instr(ctx, daddiu, MIPS_R_AT, MIPS_R_SP, MAX_BPF_STACK);
-				src = MIPS_R_AT;
-			}
-		} else if (get_reg_val_type(ctx, this_idx, insn->src_reg) == REG_32BIT) {
+		if (get_reg_val_type(ctx, this_idx, insn->src_reg) == REG_32BIT) {
 			int tmp_reg = MIPS_R_AT;
 
 			if (bpf_op == BPF_MOV) {
@@ -917,7 +1254,7 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 	case BPF_ALU | BPF_LSH | BPF_X: /* ALU_REG */
 	case BPF_ALU | BPF_RSH | BPF_X: /* ALU_REG */
 	case BPF_ALU | BPF_ARSH | BPF_X: /* ALU_REG */
-		src = ebpf_to_mips_reg(ctx, insn, REG_SRC_NO_FP);
+		src = ebpf_to_mips_reg(ctx, insn, REG_SRC_FP_OK);
 		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
 		if (src < 0 || dst < 0)
 			return -EINVAL;
@@ -1029,8 +1366,8 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 	case BPF_JMP | BPF_JGT | BPF_X:
 	case BPF_JMP | BPF_JGE | BPF_X:
 	case BPF_JMP | BPF_JSET | BPF_X:
-		src = ebpf_to_mips_reg(ctx, insn, REG_SRC_NO_FP);
-		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
+		src = ebpf_to_mips_reg(ctx, insn, REG_SRC_FP_OK);
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_FP_OK);
 		if (src < 0 || dst < 0)
 			return -EINVAL;
 		td = get_reg_val_type(ctx, this_idx, insn->dst_reg);
@@ -1311,12 +1648,7 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 		return 2; /* Double slot insn */
 
 	case BPF_JMP | BPF_CALL:
-		ctx->flags |= EBPF_SAVE_RA;
-		t64s = (s64)insn->imm + (long)__bpf_call_base;
-		emit_const_to_reg(ctx, MIPS_R_T9, (u64)t64s);
-		emit_instr(ctx, jalr, MIPS_R_RA, MIPS_R_T9);
-		/* delay slot */
-		emit_instr(ctx, nop);
+		emit_bpf_call(ctx, insn);
 		break;
 
 	case BPF_JMP | BPF_TAIL_CALL:
@@ -1364,16 +1696,10 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 	case BPF_ST | BPF_H | BPF_MEM:
 	case BPF_ST | BPF_W | BPF_MEM:
 	case BPF_ST | BPF_DW | BPF_MEM:
-		if (insn->dst_reg == BPF_REG_10) {
-			ctx->flags |= EBPF_SEEN_FP;
-			dst = MIPS_R_SP;
-			mem_off = insn->off + MAX_BPF_STACK;
-		} else {
-			dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
-			if (dst < 0)
-				return dst;
-			mem_off = insn->off;
-		}
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_FP_OK);
+		if (dst < 0)
+			return dst;
+		mem_off = insn->off;
 		gen_imm_to_reg(insn, MIPS_R_AT, ctx);
 		switch (BPF_SIZE(insn->code)) {
 		case BPF_B:
@@ -1395,19 +1721,11 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 	case BPF_LDX | BPF_H | BPF_MEM:
 	case BPF_LDX | BPF_W | BPF_MEM:
 	case BPF_LDX | BPF_DW | BPF_MEM:
-		if (insn->src_reg == BPF_REG_10) {
-			ctx->flags |= EBPF_SEEN_FP;
-			src = MIPS_R_SP;
-			mem_off = insn->off + MAX_BPF_STACK;
-		} else {
-			src = ebpf_to_mips_reg(ctx, insn, REG_SRC_NO_FP);
-			if (src < 0)
-				return src;
-			mem_off = insn->off;
-		}
 		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
-		if (dst < 0)
-			return dst;
+		src = ebpf_to_mips_reg(ctx, insn, REG_SRC_FP_OK);
+		if (dst < 0 || src < 0)
+			return -EINVAL;
+		mem_off = insn->off;
 		switch (BPF_SIZE(insn->code)) {
 		case BPF_B:
 			emit_instr(ctx, lbu, dst, mem_off, src);
@@ -1430,25 +1748,16 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 	case BPF_STX | BPF_DW | BPF_MEM:
 	case BPF_STX | BPF_W | BPF_ATOMIC:
 	case BPF_STX | BPF_DW | BPF_ATOMIC:
-		if (insn->dst_reg == BPF_REG_10) {
-			ctx->flags |= EBPF_SEEN_FP;
-			dst = MIPS_R_SP;
-			mem_off = insn->off + MAX_BPF_STACK;
-		} else {
-			dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
-			if (dst < 0)
-				return dst;
-			mem_off = insn->off;
-		}
-		src = ebpf_to_mips_reg(ctx, insn, REG_SRC_NO_FP);
-		if (src < 0)
-			return src;
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_FP_OK);
+		src = ebpf_to_mips_reg(ctx, insn, REG_SRC_FP_OK);
+		if (src < 0 || dst < 0)
+			return -EINVAL;
+		mem_off = insn->off;
 		if (BPF_MODE(insn->code) == BPF_ATOMIC) {
 			if (insn->imm != BPF_ADD) {
 				pr_err("ATOMIC OP %02x NOT HANDLED\n", insn->imm);
 				return -EINVAL;
 			}
-
 			/*
 			 * If mem_off does not fit within the 9 bit ll/sc
 			 * instruction immediate field, use a temp reg.
@@ -1829,6 +2138,73 @@ static void jit_fill_hole(void *area, unsigned int size)
 		uasm_i_break(&p, BRK_BUG); /* Increments p */
 }
 
+/*
+ * Save and restore the BPF VM state across a direct kernel call. This
+ * includes the caller-saved registers used for BPF_REG_0 .. BPF_REG_5
+ * and BPF_REG_AX used by the verifier for blinding and other dark arts.
+ * Restore avoids clobbering bpf_ret, which holds the call return value.
+ * BPF_REG_6 .. BPF_REG_10 and TCC are already callee-saved or on stack.
+ */
+static const int bpf_caller_save[] = {
+	BPF_REG_0,
+	BPF_REG_1,
+	BPF_REG_2,
+	BPF_REG_3,
+	BPF_REG_4,
+	BPF_REG_5,
+	BPF_REG_AX,
+};
+
+#define CALLER_ENV_SIZE (ARRAY_SIZE(bpf_caller_save) * sizeof(u64))
+
+void emit_caller_save(struct jit_ctx *ctx)
+{
+	int stack_adj = ALIGN(CALLER_ENV_SIZE, STACK_ALIGN);
+	int i, bpf, reg, store_offset;
+
+	emit_instr_long(ctx, daddiu, addiu, MIPS_R_SP, MIPS_R_SP, -stack_adj);
+
+	for (i = 0; i < ARRAY_SIZE(bpf_caller_save); i++) {
+		bpf = bpf_caller_save[i];
+		reg = bpf2mips[bpf].reg;
+		store_offset = i * sizeof(u64);
+
+		if (is64bit()) {
+			emit_instr(ctx, sd, reg, store_offset, MIPS_R_SP);
+		} else {
+			emit_instr(ctx, sw, LO(reg),
+						OFFLO(store_offset), MIPS_R_SP);
+			emit_instr(ctx, sw, HI(reg),
+						OFFHI(store_offset), MIPS_R_SP);
+		}
+	}
+}
+
+void emit_caller_restore(struct jit_ctx *ctx, int bpf_ret)
+{
+	int stack_adj = ALIGN(CALLER_ENV_SIZE, STACK_ALIGN);
+	int i, bpf, reg, store_offset;
+
+	for (i = 0; i < ARRAY_SIZE(bpf_caller_save); i++) {
+		bpf = bpf_caller_save[i];
+		reg = bpf2mips[bpf].reg;
+		store_offset = i * sizeof(u64);
+		if (bpf == bpf_ret)
+			continue;
+
+		if (is64bit()) {
+			emit_instr(ctx, ld, reg, store_offset, MIPS_R_SP);
+		} else {
+			emit_instr(ctx, lw, LO(reg),
+						OFFLO(store_offset), MIPS_R_SP);
+			emit_instr(ctx, lw, HI(reg),
+						OFFHI(store_offset), MIPS_R_SP);
+		}
+	}
+
+	emit_instr_long(ctx, daddiu, addiu, MIPS_R_SP, MIPS_R_SP, stack_adj);
+}
+
 struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 {
 	struct bpf_prog *orig_prog = prog;
@@ -1889,14 +2265,14 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 		goto out_err;
 
 	/*
-	 * If no calls are made (EBPF_SAVE_RA), then tail call count
-	 * in $v1, else we must save in n$s4.
+	 * If calls are made (EBPF_SAVE_RA), the tail call count must be
+	 * backed up to the save reg or stack; otherwise it may stay in
+	 * the runtime reg if one is defined.
 	 */
-	if (ctx.flags & EBPF_SEEN_TC) {
+	if (tail_call_present(&ctx)) {
 		if (ctx.flags & EBPF_SAVE_RA)
-			ctx.flags |= EBPF_SAVE_S4;
-		else
-			ctx.flags |= EBPF_TCC_IN_V1;
+			ctx.flags |= bpf2mips[JIT_SAV_TCC].flags;
+		else if (bpf2mips[JIT_RUN_TCC].reg)
+			ctx.flags |= EBPF_TCC_IN_RUN;
 	}
 
 	/*
@@ -1910,7 +2286,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 		ctx.idx = 0;
 		ctx.gen_b_offsets = 1;
 		ctx.long_b_conversion = 0;
-		if (gen_int_prologue(&ctx))
+		if (build_int_prologue(&ctx))
 			goto out_err;
 		if (build_int_body(&ctx))
 			goto out_err;
@@ -1929,7 +2305,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 
 	/* Third pass generates the code */
 	ctx.idx = 0;
-	if (gen_int_prologue(&ctx))
+	if (build_int_prologue(&ctx))
 		goto out_err;
 	if (build_int_body(&ctx))
 		goto out_err;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH bpf-next v1 13/14] MIPS: uasm: Enable muhu opcode for MIPS R6
  2021-07-12  0:34 [RFC PATCH bpf-next v1 00/14] MIPS: eBPF: refactor code, add MIPS32 JIT Tony Ambardar
                   ` (10 preceding siblings ...)
  2021-07-12  0:34 ` [RFC PATCH bpf-next v1 11/14] MIPS: eBPF: add core support for 32/64-bit systems Tony Ambardar
@ 2021-07-12  0:34 ` Tony Ambardar
  2021-07-12  0:35 ` [RFC PATCH bpf-next v1 14/14] MIPS: eBPF: add MIPS32 JIT Tony Ambardar
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 32+ messages in thread
From: Tony Ambardar @ 2021-07-12  0:34 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Thomas Bogendoerfer, Paul Burton
  Cc: Tony Ambardar, netdev, bpf, linux-mips, Johan Almbladh,
	Hassan Naveed, David Daney, Luke Nelson, Serge Semin,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh

Enable the 'muhu' instruction, complementing the existing 'mulu', needed
to implement a MIPS32 BPF JIT.

Also fix a typo in the existing definition of 'dmulu'.
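
For reference, a minimal sketch of how a JIT might emit the mulu/muhu pair
to form the full 64-bit product of two 32-bit registers on MIPS32 R6. It
assumes the usual uasm convention of a cursor into the instruction buffer
and destination-first operand order; the register numbers are purely
illustrative:

  u32 *p = buf;
  /* $2:$3 = (u64)$4 * (u64)$5, as high:low 32-bit halves */
  uasm_i_muhu(&p, 2, 4, 5);   /* $2 = upper 32 bits of $4 * $5 */
  uasm_i_mulu(&p, 3, 4, 5);   /* $3 = lower 32 bits of $4 * $5 */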

Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
---
 arch/mips/include/asm/uasm.h | 1 +
 arch/mips/mm/uasm-mips.c     | 4 +++-
 arch/mips/mm/uasm.c          | 3 ++-
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/mips/include/asm/uasm.h b/arch/mips/include/asm/uasm.h
index f7effca791a5..5efa4e2dc9ab 100644
--- a/arch/mips/include/asm/uasm.h
+++ b/arch/mips/include/asm/uasm.h
@@ -145,6 +145,7 @@ Ip_u1(_mtlo);
 Ip_u3u1u2(_mul);
 Ip_u1u2(_multu);
 Ip_u3u1u2(_mulu);
+Ip_u3u1u2(_muhu);
 Ip_u3u1u2(_nor);
 Ip_u3u1u2(_or);
 Ip_u2u1u3(_ori);
diff --git a/arch/mips/mm/uasm-mips.c b/arch/mips/mm/uasm-mips.c
index 7154a1d99aad..e15c6700cd08 100644
--- a/arch/mips/mm/uasm-mips.c
+++ b/arch/mips/mm/uasm-mips.c
@@ -90,7 +90,7 @@ static const struct insn insn_table[insn_invalid] = {
 				RS | RT | RD},
 	[insn_dmtc0]	= {M(cop0_op, dmtc_op, 0, 0, 0, 0), RT | RD | SET},
 	[insn_dmultu]	= {M(spec_op, 0, 0, 0, 0, dmultu_op), RS | RT},
-	[insn_dmulu]	= {M(spec_op, 0, 0, 0, dmult_dmul_op, dmultu_op),
+	[insn_dmulu]	= {M(spec_op, 0, 0, 0, dmultu_dmulu_op, dmultu_op),
 				RS | RT | RD},
 	[insn_drotr]	= {M(spec_op, 1, 0, 0, 0, dsrl_op), RT | RD | RE},
 	[insn_drotr32]	= {M(spec_op, 1, 0, 0, 0, dsrl32_op), RT | RD | RE},
@@ -150,6 +150,8 @@ static const struct insn insn_table[insn_invalid] = {
 	[insn_mtlo]	= {M(spec_op, 0, 0, 0, 0, mtlo_op), RS},
 	[insn_mulu]	= {M(spec_op, 0, 0, 0, multu_mulu_op, multu_op),
 				RS | RT | RD},
+	[insn_muhu]	= {M(spec_op, 0, 0, 0, multu_muhu_op, multu_op),
+				RS | RT | RD},
 #ifndef CONFIG_CPU_MIPSR6
 	[insn_mul]	= {M(spec2_op, 0, 0, 0, 0, mul_op), RS | RT | RD},
 #else
diff --git a/arch/mips/mm/uasm.c b/arch/mips/mm/uasm.c
index 81dd226d6b6b..125140979d62 100644
--- a/arch/mips/mm/uasm.c
+++ b/arch/mips/mm/uasm.c
@@ -59,7 +59,7 @@ enum opcode {
 	insn_lddir, insn_ldpte, insn_ldx, insn_lh, insn_lhu, insn_ll, insn_lld,
 	insn_lui, insn_lw, insn_lwu, insn_lwx, insn_mfc0, insn_mfhc0, insn_mfhi,
 	insn_mflo, insn_modu, insn_movn, insn_movz, insn_mtc0, insn_mthc0,
-	insn_mthi, insn_mtlo, insn_mul, insn_multu, insn_mulu, insn_nor,
+	insn_mthi, insn_mtlo, insn_mul, insn_multu, insn_mulu, insn_muhu, insn_nor,
 	insn_or, insn_ori, insn_pref, insn_rfe, insn_rotr, insn_sb, insn_sc,
 	insn_scd, insn_seleqz, insn_selnez, insn_sd, insn_sh, insn_sll,
 	insn_sllv, insn_slt, insn_slti, insn_sltiu, insn_sltu, insn_sra,
@@ -344,6 +344,7 @@ I_u1(_mtlo)
 I_u3u1u2(_mul)
 I_u1u2(_multu)
 I_u3u1u2(_mulu)
+I_u3u1u2(_muhu)
 I_u3u1u2(_nor)
 I_u3u1u2(_or)
 I_u2u1u3(_ori)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH bpf-next v1 14/14] MIPS: eBPF: add MIPS32 JIT
  2021-07-12  0:34 [RFC PATCH bpf-next v1 00/14] MIPS: eBPF: refactor code, add MIPS32 JIT Tony Ambardar
                   ` (11 preceding siblings ...)
  2021-07-12  0:34 ` [RFC PATCH bpf-next v1 13/14] MIPS: uasm: Enable muhu opcode for MIPS R6 Tony Ambardar
@ 2021-07-12  0:35 ` Tony Ambardar
  2021-07-20  1:25 ` [RFC PATCH bpf-next v1 00/14] MIPS: eBPF: refactor code, " Johan Almbladh
  2021-10-05  8:26 ` [RFC PATCH bpf-next v2 00/16] " Tony Ambardar
  14 siblings, 0 replies; 32+ messages in thread
From: Tony Ambardar @ 2021-07-12  0:35 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Thomas Bogendoerfer, Paul Burton
  Cc: Tony Ambardar, netdev, bpf, linux-mips, Johan Almbladh,
	Hassan Naveed, David Daney, Luke Nelson, Serge Semin,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh

Add a new variant of build_one_insn() supporting MIPS32, leveraging the
previously added common functions, and disable static analysis as unneeded
on MIPS32. Also define bpf_jit_needs_zext() to request verifier zext
insertion. Handle these zext insns, and add conditional zext for all ALU32
and LDX word-size operations for cases where the verifier is unable to do
so (e.g. test_bpf bypasses verifier).
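
As a sketch, the arch hook requesting this behaviour has a trivial
definition (the hook and the zext marker below come from the core BPF
code, not this series):

  /* Ask the verifier to insert explicit zero-extension insns. */
  bool bpf_jit_needs_zext(void)
  {
          return true;
  }

The verifier then emits BPF_ALU | BPF_MOV | BPF_X insns with imm == 1,
which insn_is_zext() recognizes so the JIT can clear the high word of the
destination register pair.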

Aside from mapping 64-bit BPF registers to pairs of 32-bit MIPS registers
(see the pairing sketch after this list), notable changes from the MIPS64
version of build_one_insn() include:

  BPF_JMP32: implement all conditionals.

  BPF_ALU{,64} | {DIV,MOD} | BPF_K: drop the divide-by-zero guard as the
  underlying insns do not raise exceptions.

  BPF_JMP | JSET | BPF_K: drop bbit insns only usable on MIPS64 Octeon.
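
As an illustration of the register pairing, the series selects the halves
of a pair with LO()/HI() helpers. The definitions below are a hypothetical
reconstruction for this description only, matching the behaviour visible in
the patch (an even-numbered base register plus an endian-dependent offset):

  /* Hypothetical sketch: pick the 32-bit halves of a register pair. */
  #define LO(reg) ((reg) + (isbigend() ? 1 : 0))  /* low word */
  #define HI(reg) ((reg) + (isbigend() ? 0 : 1))  /* high word */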

The MIPS32 ISA does not include 64-bit div/mod or atomic opcodes. Add the
emit_bpf_divmod64() and emit_bpf_atomic64() functions, which use built-in
kernel functions to implement the following BPF insns:

  BPF_STX   | BPF_DW  | BPF_ATOMIC
  BPF_ALU64 | BPF_DIV | BPF_X
  BPF_ALU64 | BPF_DIV | BPF_K
  BPF_ALU64 | BPF_MOD | BPF_X
  BPF_ALU64 | BPF_MOD | BPF_K

Atomics other than BPF_ADD or using BPF_FETCH are currently unsupported.
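
As a sketch of the div/mod mapping: the kernel's div64_u64_rem() from
include/linux/math64.h returns the quotient and stores the remainder
through a pointer, so a single call covers both insns:

  u64 rem;
  u64 quot = div64_u64_rem(dst, src, &rem);  /* dst / src and dst % src */
  /* BPF_DIV keeps quot in the dst reg pair; BPF_MOD keeps rem instead. */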

Test and development primarily used LTS kernel 5.10.x and then 5.13.x,
running under QEMU. Test suites included the 'test_bpf' module and the
'test_verifier' program from kselftests. Testing with 'test_progs' from
kselftests was not possible in general since cross-compilation depends on
libbpf/bpftool, which does not support cross-endian builds (see also [1]).

The matrix of test configurations executed for this series covers:
MIPSWORD={64-bit,32-bit} x MIPSISA={R2,R6} x JIT={off,on,hardened}

On MIPS32BE and MIPS32LE there was general parity between the results of
interpreter vs. JIT-backed tests with respect to the numbers of PASSED,
SKIPPED, and FAILED tests.

  root@OpenWrt:~# sysctl net.core.bpf_jit_enable=1
  root@OpenWrt:~# modprobe test_bpf
  ...
  test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed]
  root@OpenWrt:~# ./test_verifier 0 853
  ...
  Summary: 1127 PASSED, 0 SKIPPED, 89 FAILED
  root@OpenWrt:~# ./test_verifier 855 1149
  ...
  Summary: 408 PASSED, 7 SKIPPED, 53 FAILED

Link: [1] https://lore.kernel.org/bpf/CAEf4BzZCnP3oB81w4BDL4TCmvO3vPw8MucOTbVnjbW8UuCtejw@mail.gmail.com/

Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
---
 Documentation/admin-guide/sysctl/net.rst |    6 +-
 Documentation/networking/filter.rst      |    6 +-
 arch/mips/Kconfig                        |    4 +-
 arch/mips/net/Makefile                   |    8 +-
 arch/mips/net/ebpf_jit_comp32.c          | 1241 ++++++++++++++++++++++
 arch/mips/net/ebpf_jit_core.c            |   20 +-
 6 files changed, 1270 insertions(+), 15 deletions(-)
 create mode 100644 arch/mips/net/ebpf_jit_comp32.c

diff --git a/Documentation/admin-guide/sysctl/net.rst b/Documentation/admin-guide/sysctl/net.rst
index 4150f74c521a..099e3efbf38e 100644
--- a/Documentation/admin-guide/sysctl/net.rst
+++ b/Documentation/admin-guide/sysctl/net.rst
@@ -66,14 +66,16 @@ two flavors of JITs, the newer eBPF JIT currently supported on:
   - ppc64
   - ppc32
   - sparc64
-  - mips64
+  - mips64 (R2+)
+  - mips32 (R2+)
   - s390x
   - riscv64
   - riscv32
 
 And the older cBPF JIT supported on the following archs:
 
-  - mips
+  - mips64 (R1)
+  - mips32 (R1)
   - sparc
 
 eBPF JITs are a superset of cBPF JITs, meaning the kernel will
diff --git a/Documentation/networking/filter.rst b/Documentation/networking/filter.rst
index 3e2221f4abe4..31101411da0e 100644
--- a/Documentation/networking/filter.rst
+++ b/Documentation/networking/filter.rst
@@ -637,9 +637,9 @@ skb pointer). All constraints and restrictions from bpf_check_classic() apply
 before a conversion to the new layout is being done behind the scenes!
 
 Currently, the classic BPF format is being used for JITing on most
-32-bit architectures, whereas x86-64, aarch64, s390x, powerpc64,
-sparc64, arm32, riscv64, riscv32 perform JIT compilation from eBPF
-instruction set.
+32-bit architectures, whereas x86-64, aarch64, s390x, powerpc64, sparc64,
+mips64, riscv64, arm32, riscv32, and mips32 perform JIT compilation from
+the eBPF instruction set.
 
 Some core changes of the new internal format:
 
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index ed51970c08e7..d096d2332fe4 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -55,7 +55,7 @@ config MIPS
 	select HAVE_ARCH_TRACEHOOK
 	select HAVE_ARCH_TRANSPARENT_HUGEPAGE if CPU_SUPPORTS_HUGEPAGES
 	select HAVE_ASM_MODVERSIONS
-	select HAVE_CBPF_JIT if !64BIT && !CPU_MICROMIPS
+	select HAVE_CBPF_JIT if !CPU_MICROMIPS && TARGET_ISA_REV < 2
 	select HAVE_CONTEXT_TRACKING
 	select HAVE_TIF_NOHZ
 	select HAVE_C_RECORDMCOUNT
@@ -63,7 +63,7 @@ config MIPS
 	select HAVE_DEBUG_STACKOVERFLOW
 	select HAVE_DMA_CONTIGUOUS
 	select HAVE_DYNAMIC_FTRACE
-	select HAVE_EBPF_JIT if 64BIT && !CPU_MICROMIPS && TARGET_ISA_REV >= 2
+	select HAVE_EBPF_JIT if !CPU_MICROMIPS && TARGET_ISA_REV >= 2
 	select HAVE_EXIT_THREAD
 	select HAVE_FAST_GUP
 	select HAVE_FTRACE_MCOUNT_RECORD
diff --git a/arch/mips/net/Makefile b/arch/mips/net/Makefile
index de42f4a4db56..5f804bc54629 100644
--- a/arch/mips/net/Makefile
+++ b/arch/mips/net/Makefile
@@ -2,4 +2,10 @@
 # MIPS networking code
 
 obj-$(CONFIG_MIPS_CBPF_JIT) += bpf_jit.o bpf_jit_asm.o
-obj-$(CONFIG_MIPS_EBPF_JIT) += ebpf_jit_core.o ebpf_jit_comp64.o
+
+obj-$(CONFIG_MIPS_EBPF_JIT) += ebpf_jit_core.o
+ifeq ($(CONFIG_CPU_MIPS64),y)
+	obj-$(CONFIG_MIPS_EBPF_JIT) += ebpf_jit_comp64.o
+else
+	obj-$(CONFIG_MIPS_EBPF_JIT) += ebpf_jit_comp32.o
+endif
diff --git a/arch/mips/net/ebpf_jit_comp32.c b/arch/mips/net/ebpf_jit_comp32.c
new file mode 100644
index 000000000000..069b3e044b89
--- /dev/null
+++ b/arch/mips/net/ebpf_jit_comp32.c
@@ -0,0 +1,1241 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Just-In-Time compiler for eBPF filters on MIPS32/MIPS64
+ * Copyright (c) 2021 Tony Ambardar <Tony.Ambardar@gmail.com>
+ *
+ * Based on code from:
+ *
+ * Copyright (c) 2017 Cavium, Inc.
+ * Author: David Daney <david.daney@cavium.com>
+ *
+ * Copyright (c) 2014 Imagination Technologies Ltd.
+ * Author: Markos Chandras <markos.chandras@imgtec.com>
+ */
+
+#include <linux/errno.h>
+#include <linux/filter.h>
+#include <asm/uasm.h>
+
+#include "ebpf_jit.h"
+
+static int gen_imm_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
+			int idx)
+{
+	int dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
+	int upper_bound, lower_bound, shamt;
+	int imm = insn->imm;
+
+	if (dst < 0)
+		return dst;
+
+	switch (BPF_OP(insn->code)) {
+	case BPF_MOV:
+	case BPF_ADD:
+		upper_bound = S16_MAX;
+		lower_bound = S16_MIN;
+		break;
+	case BPF_SUB:
+		upper_bound = -(int)S16_MIN;
+		lower_bound = -(int)S16_MAX;
+		break;
+	case BPF_AND:
+	case BPF_OR:
+	case BPF_XOR:
+		upper_bound = 0xffff;
+		lower_bound = 0;
+		break;
+	case BPF_RSH:
+	case BPF_LSH:
+	case BPF_ARSH:
+		/* Shift amounts are truncated, no need for bounds */
+		upper_bound = S32_MAX;
+		lower_bound = S32_MIN;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	/*
+	 * Immediate move clobbers the register, so no sign/zero
+	 * extension needed.
+	 */
+	if (lower_bound <= imm && imm <= upper_bound) {
+		/* single insn immediate case */
+		switch (BPF_OP(insn->code) | BPF_CLASS(insn->code)) {
+		case BPF_ALU64 | BPF_MOV:
+			emit_instr(ctx, addiu, LO(dst), MIPS_R_ZERO, imm);
+			if (imm < 0)
+				gen_sext_insn(dst, ctx);
+			else
+				gen_zext_insn(dst, true, ctx);
+			break;
+		case BPF_ALU | BPF_MOV:
+			emit_instr(ctx, addiu, LO(dst), MIPS_R_ZERO, imm);
+			break;
+		case BPF_ALU64 | BPF_AND:
+			if (imm >= 0)
+				gen_zext_insn(dst, true, ctx);
+			fallthrough;
+		case BPF_ALU | BPF_AND:
+			emit_instr(ctx, andi, LO(dst), LO(dst), imm);
+			break;
+		case BPF_ALU64 | BPF_OR:
+			if (imm < 0)
+				emit_instr(ctx, nor, HI(dst),
+						MIPS_R_ZERO, MIPS_R_ZERO);
+			fallthrough;
+		case BPF_ALU | BPF_OR:
+			emit_instr(ctx, ori, LO(dst), LO(dst), imm);
+			break;
+		case BPF_ALU64 | BPF_XOR:
+			if (imm < 0)
+				emit_instr(ctx, nor, HI(dst),
+							HI(dst), MIPS_R_ZERO);
+			fallthrough;
+		case BPF_ALU | BPF_XOR:
+			emit_instr(ctx, xori, LO(dst), LO(dst), imm);
+			break;
+		case BPF_ALU64 | BPF_ADD:
+			emit_instr(ctx, addiu, LO(dst), LO(dst), imm);
+			if (imm < 0)
+				emit_instr(ctx, addiu, HI(dst), HI(dst), -1);
+			emit_instr(ctx, sltiu, MIPS_R_AT, LO(dst), imm);
+			emit_instr(ctx, addu, HI(dst), HI(dst), MIPS_R_AT);
+			break;
+		case BPF_ALU64 | BPF_SUB:
+			emit_instr(ctx, addiu, MIPS_R_AT, LO(dst), -imm);
+			if (imm < 0)
+				emit_instr(ctx, addiu, HI(dst), HI(dst), 1);
+			emit_instr(ctx, sltu, MIPS_R_AT, LO(dst), MIPS_R_AT);
+			emit_instr(ctx, subu, HI(dst), HI(dst), MIPS_R_AT);
+			emit_instr(ctx, addiu, LO(dst), LO(dst), -imm);
+			break;
+		case BPF_ALU64 | BPF_ARSH:
+			shamt = imm & 0x3f;
+			if (shamt >= 32) {
+				emit_instr(ctx, sra, LO(dst),
+							HI(dst), shamt - 32);
+				emit_instr(ctx, sra, HI(dst), HI(dst), 31);
+			} else if (shamt > 0) {
+				emit_instr(ctx, srl, LO(dst), LO(dst), shamt);
+				emit_instr(ctx, ins, LO(dst), HI(dst),
+							32 - shamt, shamt);
+				emit_instr(ctx, sra, HI(dst), HI(dst), shamt);
+			}
+			break;
+		case BPF_ALU64 | BPF_RSH:
+			shamt = imm & 0x3f;
+			if (shamt >= 32) {
+				emit_instr(ctx, srl, LO(dst),
+							HI(dst), shamt - 32);
+				emit_instr(ctx, and, HI(dst),
+							HI(dst), MIPS_R_ZERO);
+			} else if (shamt > 0) {
+				emit_instr(ctx, srl, LO(dst), LO(dst), shamt);
+				emit_instr(ctx, ins, LO(dst), HI(dst),
+							32 - shamt, shamt);
+				emit_instr(ctx, srl, HI(dst), HI(dst), shamt);
+			}
+			break;
+		case BPF_ALU64 | BPF_LSH:
+			shamt = imm & 0x3f;
+			if (shamt >= 32) {
+				emit_instr(ctx, sll, HI(dst),
+							LO(dst), shamt - 32);
+				emit_instr(ctx, and, LO(dst),
+							LO(dst), MIPS_R_ZERO);
+			} else if (shamt > 0) {
+				emit_instr(ctx, srl, MIPS_R_AT,
+							LO(dst), 32 - shamt);
+				emit_instr(ctx, sll, HI(dst), HI(dst), shamt);
+				emit_instr(ctx, sll, LO(dst), LO(dst), shamt);
+				emit_instr(ctx, or, HI(dst),
+							HI(dst), MIPS_R_AT);
+			}
+			break;
+		case BPF_ALU | BPF_RSH:
+			emit_instr(ctx, srl, LO(dst), LO(dst), imm & 0x1f);
+			break;
+		case BPF_ALU | BPF_LSH:
+			emit_instr(ctx, sll, LO(dst), LO(dst), imm & 0x1f);
+			break;
+		case BPF_ALU | BPF_ARSH:
+			emit_instr(ctx, sra, LO(dst), LO(dst), imm & 0x1f);
+			break;
+		case BPF_ALU | BPF_ADD:
+			emit_instr(ctx, addiu, LO(dst), LO(dst), imm);
+			break;
+		case BPF_ALU | BPF_SUB:
+			emit_instr(ctx, addiu, LO(dst), LO(dst), -imm);
+			break;
+		default:
+			return -EINVAL;
+		}
+	} else {
+		/* multi insn immediate case */
+		if (BPF_OP(insn->code) == BPF_MOV) {
+			gen_imm_to_reg(insn, LO(dst), ctx);
+			if (BPF_CLASS(insn->code) == BPF_ALU64)
+				gen_sext_insn(dst, ctx);
+		} else {
+			gen_imm_to_reg(insn, MIPS_R_AT, ctx);
+			switch (BPF_OP(insn->code) | BPF_CLASS(insn->code)) {
+			case BPF_ALU64 | BPF_AND:
+				if (imm >= 0)
+					gen_zext_insn(dst, true, ctx);
+				fallthrough;
+			case BPF_ALU | BPF_AND:
+				emit_instr(ctx, and, LO(dst), LO(dst),
+								MIPS_R_AT);
+				break;
+			case BPF_ALU64 | BPF_OR:
+				if (imm < 0)
+					emit_instr(ctx, nor, HI(dst),
+						MIPS_R_ZERO, MIPS_R_ZERO);
+				fallthrough;
+			case BPF_ALU | BPF_OR:
+				emit_instr(ctx, or, LO(dst), LO(dst),
+								MIPS_R_AT);
+				break;
+			case BPF_ALU64 | BPF_XOR:
+				if (imm < 0)
+					emit_instr(ctx, nor, HI(dst),
+							HI(dst), MIPS_R_ZERO);
+				fallthrough;
+			case BPF_ALU | BPF_XOR:
+				emit_instr(ctx, xor, LO(dst), LO(dst),
+								MIPS_R_AT);
+				break;
+			case BPF_ALU64 | BPF_ADD:
+				emit_instr(ctx, addu, LO(dst),
+							LO(dst), MIPS_R_AT);
+				if (imm < 0)
+					emit_instr(ctx, addiu, HI(dst), HI(dst), -1);
+				emit_instr(ctx, sltu, MIPS_R_AT,
+							LO(dst), MIPS_R_AT);
+				emit_instr(ctx, addu, HI(dst),
+							HI(dst), MIPS_R_AT);
+				break;
+			case BPF_ALU64 | BPF_SUB:
+				emit_instr(ctx, subu, LO(dst),
+							LO(dst), MIPS_R_AT);
+				if (imm < 0)
+					emit_instr(ctx, addiu, HI(dst), HI(dst), 1);
+				emit_instr(ctx, sltu, MIPS_R_AT,
+							MIPS_R_AT, LO(dst));
+				emit_instr(ctx, subu, HI(dst),
+							HI(dst), MIPS_R_AT);
+				break;
+			case BPF_ALU | BPF_ADD:
+				emit_instr(ctx, addu, LO(dst), LO(dst),
+								MIPS_R_AT);
+				break;
+			case BPF_ALU | BPF_SUB:
+				emit_instr(ctx, subu, LO(dst), LO(dst),
+								MIPS_R_AT);
+				break;
+			default:
+				return -EINVAL;
+			}
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * Implement 64-bit BPF div/mod insns on 32-bit systems by calling the
+ * equivalent built-in kernel function. The function args may be mixed
+ * 64/32-bit types, unlike the uniform u64 args of BPF kernel helpers.
+ * Func proto: u64 div64_u64_rem(u64 dividend, u64 divisor, u64 *remainder)
+ */
+static int emit_bpf_divmod64(struct jit_ctx *ctx, const struct bpf_insn *insn)
+{
+	const int bpf_src = BPF_SRC(insn->code);
+	const int bpf_op = BPF_OP(insn->code);
+	int rem_off, arg_off;
+	int src, dst, tmp;
+	u32 func_addr;
+
+	ctx->flags |= EBPF_SAVE_RA;
+
+	dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
+	if (dst < 0)
+		return -EINVAL;
+
+	if (bpf_src == BPF_X) {
+		src = ebpf_to_mips_reg(ctx, insn, REG_SRC_FP_OK);
+		if (src < 0)
+			return -EINVAL;
+		/*
+		 * Use MIPS_R_T8 as temp reg pair to avoid target
+		 * of dst from clobbering src.
+		 */
+		if (src == MIPS_R_A0) {
+			tmp = MIPS_R_T8;
+			emit_instr(ctx, move, LO(tmp), LO(src));
+			emit_instr(ctx, move, HI(tmp), HI(src));
+			src = tmp;
+		}
+	}
+
+	/* Save caller registers */
+	emit_caller_save(ctx);
+	/* Push O32 stack, aligned space for u64, u64, u64 *, u64 */
+	emit_instr(ctx, addiu, MIPS_R_SP, MIPS_R_SP, -32);
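+	/*
+	 * Stack layout: 0..15 is the O32 home area for $a0-$a3,
+	 * offset 16 holds the remainder pointer (stack arg 3), and
+	 * offsets 24..31 hold the u64 remainder itself.
+	 */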
+
+	func_addr = (u32) &div64_u64_rem;
+	/* Move u64 dst to arg 1 as needed */
+	if (dst != MIPS_R_A0) {
+		emit_instr(ctx, move, LO(MIPS_R_A0), LO(dst));
+		emit_instr(ctx, move, HI(MIPS_R_A0), HI(dst));
+	}
+	/* Load imm or move u64 src to arg 2 as needed */
+	if (bpf_src == BPF_K) {
+		gen_imm_to_reg(insn, LO(MIPS_R_A2), ctx);
+		gen_sext_insn(MIPS_R_A2, ctx);
+	} else if (src != MIPS_R_A2) { /* BPF_X */
+		emit_instr(ctx, move, LO(MIPS_R_A2), LO(src));
+		emit_instr(ctx, move, HI(MIPS_R_A2), HI(src));
+	}
+	/* Set up stack arg 3 as ptr to u64 remainder on stack */
+	arg_off = 16;
+	rem_off = 24;
+	emit_instr(ctx, addiu, MIPS_R_AT, MIPS_R_SP, rem_off);
+	emit_instr(ctx, sw, MIPS_R_AT, arg_off, MIPS_R_SP);
+
+	emit_const_to_reg(ctx, MIPS_R_T9, func_addr);
+	emit_instr(ctx, jalr, MIPS_R_RA, MIPS_R_T9);
+	/* Delay slot */
+	emit_instr(ctx, nop);
+
+	/* Move return value to dst as needed */
+	switch (bpf_op) {
+	case BPF_DIV:
+		/* Quotient in MIPS_R_V0 reg pair */
+		if (dst != MIPS_R_V0) {
+			emit_instr(ctx, move, LO(dst), LO(MIPS_R_V0));
+			emit_instr(ctx, move, HI(dst), HI(MIPS_R_V0));
+		}
+		break;
+	case BPF_MOD:
+		/* Remainder on stack */
+		emit_instr(ctx, lw, LO(dst), OFFLO(rem_off), MIPS_R_SP);
+		emit_instr(ctx, lw, HI(dst), OFFHI(rem_off), MIPS_R_SP);
+		break;
+	}
+
+	/* Pop O32 call stack */
+	emit_instr(ctx, addiu, MIPS_R_SP, MIPS_R_SP, 32);
+	/* Restore all caller registers except the call return value */
+	emit_caller_restore(ctx, insn->dst_reg);
+
+	return 0;
+}
+
+/*
+ * Implement 64-bit BPF atomic insns on 32-bit systems by calling the
+ * equivalent built-in kernel function. The function args may be mixed
+ * 64/32-bit types, unlike the uniform u64 args of BPF kernel helpers.
+ * Func proto: void atomic64_add(s64 a, atomic64_t *v)
+ */
+static int emit_bpf_atomic64(struct jit_ctx *ctx, const struct bpf_insn *insn)
+{
+	int src, dst, mem_off;
+	u32 func_addr;
+
+	ctx->flags |= EBPF_SAVE_RA;
+
+	dst = ebpf_to_mips_reg(ctx, insn, REG_DST_FP_OK);
+	src = ebpf_to_mips_reg(ctx, insn, REG_SRC_FP_OK);
+	if (src < 0 || dst < 0)
+		return -EINVAL;
+	mem_off = insn->off;
+
+	/* Save caller registers */
+	emit_caller_save(ctx);
+
+	switch (insn->imm) {
+	case BPF_ADD:
+		func_addr = (u32) &atomic64_add;
+		/* Move s64 src to arg 1 as needed */
+		if (src != MIPS_R_A0) {
+			emit_instr(ctx, move, LO(MIPS_R_A0), LO(src));
+			emit_instr(ctx, move, HI(MIPS_R_A0), HI(src));
+		}
+		/* Set up dst ptr in arg 2 base register */
+		emit_instr(ctx, addiu, MIPS_R_A2, LO(dst), mem_off);
+		break;
+	default:
+		pr_err("ATOMIC OP %02x NOT HANDLED\n", insn->imm);
+		return -EINVAL;
+	}
+
+	emit_const_to_reg(ctx, MIPS_R_T9, func_addr);
+	emit_instr(ctx, jalr, MIPS_R_RA, MIPS_R_T9);
+	/* Delay slot */
+	/* Push minimal O32 stack */
+	emit_instr(ctx, addiu, MIPS_R_SP, MIPS_R_SP, -16);
+
+	/* Pop minimal O32 stack */
+	emit_instr(ctx, addiu, MIPS_R_SP, MIPS_R_SP, 16);
+	/* Restore all caller registers since none clobbered by call */
+	emit_caller_restore(ctx, BPF_REG_FP);
+
+	return 0;
+}
+
+/* Returns the number of insn slots consumed. */
+int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
+			  int this_idx, int exit_idx)
+{
+	const int bpf_class = BPF_CLASS(insn->code);
+	const int bpf_size = BPF_SIZE(insn->code);
+	const int bpf_src = BPF_SRC(insn->code);
+	const int bpf_op = BPF_OP(insn->code);
+	int src, dst, r, mem_off, b_off;
+	bool need_swap, cmp_eq;
+	unsigned int target = 0;
+	u64 t64u;
+
+	switch (insn->code) {
+	case BPF_ALU64 | BPF_ADD | BPF_K: /* ALU64_IMM */
+	case BPF_ALU64 | BPF_SUB | BPF_K: /* ALU64_IMM */
+	case BPF_ALU64 | BPF_LSH | BPF_K: /* ALU64_IMM */
+	case BPF_ALU64 | BPF_RSH | BPF_K: /* ALU64_IMM */
+	case BPF_ALU64 | BPF_ARSH | BPF_K: /* ALU64_IMM */
+	case BPF_ALU64 | BPF_XOR | BPF_K: /* ALU64_IMM */
+	case BPF_ALU64 | BPF_MOV | BPF_K: /* ALU64_IMM */
+	case BPF_ALU64 | BPF_OR | BPF_K: /* ALU64_IMM */
+	case BPF_ALU64 | BPF_AND | BPF_K: /* ALU64_IMM */
+	case BPF_ALU | BPF_MOV | BPF_K: /* ALU32_IMM */
+	case BPF_ALU | BPF_ADD | BPF_K: /* ALU32_IMM */
+	case BPF_ALU | BPF_SUB | BPF_K: /* ALU32_IMM */
+	case BPF_ALU | BPF_OR | BPF_K: /* ALU32_IMM */
+	case BPF_ALU | BPF_AND | BPF_K: /* ALU32_IMM */
+	case BPF_ALU | BPF_LSH | BPF_K: /* ALU32_IMM */
+	case BPF_ALU | BPF_RSH | BPF_K: /* ALU32_IMM */
+	case BPF_ALU | BPF_XOR | BPF_K: /* ALU32_IMM */
+	case BPF_ALU | BPF_ARSH | BPF_K: /* ALU32_IMM */
+		r = gen_imm_insn(insn, ctx, this_idx);
+		if (r < 0)
+			return r;
+		break;
+	case BPF_ALU64 | BPF_MUL | BPF_K: /* ALU64_IMM */
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
+		if (dst < 0)
+			return dst;
+		if (insn->imm == 1) /* Mult by 1 is a nop */
+			break;
+		src = MIPS_R_T8; /* Use tmp reg pair for imm */
+		gen_imm_to_reg(insn, LO(src), ctx);
+		emit_instr(ctx, sra, HI(src), LO(src), 31);
+		goto case_alu64_mul_x;
+
+	case BPF_ALU64 | BPF_NEG | BPF_K: /* ALU64_IMM */
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
+		if (dst < 0)
+			return dst;
+		emit_instr(ctx, subu, LO(dst), MIPS_R_ZERO, LO(dst));
+		emit_instr(ctx, subu, HI(dst), MIPS_R_ZERO, HI(dst));
+		emit_instr(ctx, sltu, MIPS_R_AT, MIPS_R_ZERO, LO(dst));
+		emit_instr(ctx, subu, HI(dst), HI(dst), MIPS_R_AT);
+		break;
+	case BPF_ALU | BPF_MUL | BPF_K: /* ALU_IMM */
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
+		if (dst < 0)
+			return dst;
+		if (insn->imm == 1) /* Mult by 1 is a nop */
+			break;
+		gen_imm_to_reg(insn, MIPS_R_AT, ctx);
+		if (MIPS_ISA_REV >= 6) {
+			emit_instr(ctx, mulu, LO(dst), LO(dst), MIPS_R_AT);
+		} else {
+			emit_instr(ctx, multu, LO(dst), MIPS_R_AT);
+			emit_instr(ctx, mflo, LO(dst));
+		}
+		break;
+	case BPF_ALU | BPF_NEG | BPF_K: /* ALU_IMM */
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
+		if (dst < 0)
+			return dst;
+		emit_instr(ctx, subu, LO(dst), MIPS_R_ZERO, LO(dst));
+		break;
+	case BPF_ALU | BPF_DIV | BPF_K: /* ALU_IMM */
+	case BPF_ALU | BPF_MOD | BPF_K: /* ALU_IMM */
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
+		if (dst < 0)
+			return dst;
+		if (insn->imm == 1) {
+			/* div by 1 is a nop, mod by 1 is zero */
+			if (bpf_op == BPF_MOD)
+				emit_instr(ctx, move, LO(dst), MIPS_R_ZERO);
+			break;
+		}
+		gen_imm_to_reg(insn, MIPS_R_AT, ctx);
+		if (MIPS_ISA_REV >= 6) {
+			if (bpf_op == BPF_DIV)
+				emit_instr(ctx, divu_r6, LO(dst),
+							LO(dst), MIPS_R_AT);
+			else
+				emit_instr(ctx, modu, LO(dst),
+							LO(dst), MIPS_R_AT);
+			break;
+		}
+		emit_instr(ctx, divu, LO(dst), MIPS_R_AT);
+		if (bpf_op == BPF_DIV)
+			emit_instr(ctx, mflo, LO(dst));
+		else
+			emit_instr(ctx, mfhi, LO(dst));
+		break;
+	case BPF_ALU64 | BPF_DIV | BPF_K: /* ALU64_IMM */
+	case BPF_ALU64 | BPF_MOD | BPF_K: /* ALU64_IMM */
+	case BPF_ALU64 | BPF_DIV | BPF_X: /* ALU64_REG */
+	case BPF_ALU64 | BPF_MOD | BPF_X: /* ALU64_REG */
+		r = emit_bpf_divmod64(ctx, insn);
+		if (r < 0)
+			return r;
+		break;
+	case BPF_ALU64 | BPF_MUL | BPF_X: /* ALU64_REG */
+	case BPF_ALU64 | BPF_ADD | BPF_X: /* ALU64_REG */
+	case BPF_ALU64 | BPF_SUB | BPF_X: /* ALU64_REG */
+	case BPF_ALU64 | BPF_MOV | BPF_X: /* ALU64_REG */
+	case BPF_ALU64 | BPF_XOR | BPF_X: /* ALU64_REG */
+	case BPF_ALU64 | BPF_OR | BPF_X: /* ALU64_REG */
+	case BPF_ALU64 | BPF_AND | BPF_X: /* ALU64_REG */
+	case BPF_ALU64 | BPF_LSH | BPF_X: /* ALU64_REG */
+	case BPF_ALU64 | BPF_RSH | BPF_X: /* ALU64_REG */
+	case BPF_ALU64 | BPF_ARSH | BPF_X: /* ALU64_REG */
+		src = ebpf_to_mips_reg(ctx, insn, REG_SRC_FP_OK);
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
+		if (src < 0 || dst < 0)
+			return -EINVAL;
+		switch (bpf_op) {
+		case BPF_MOV:
+			emit_instr(ctx, move, LO(dst), LO(src));
+			emit_instr(ctx, move, HI(dst), HI(src));
+			break;
+		case BPF_ADD:
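+			/*
+			 * 64-bit add: sltu detects the carry out of the
+			 * low-word add so it can be folded into the
+			 * high word.
+			 */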
+			emit_instr(ctx, addu, HI(dst), HI(dst), HI(src));
+			emit_instr(ctx, addu, MIPS_R_AT, LO(dst), LO(src));
+			emit_instr(ctx, sltu, MIPS_R_AT, MIPS_R_AT, LO(dst));
+			emit_instr(ctx, addu, HI(dst), HI(dst), MIPS_R_AT);
+			emit_instr(ctx, addu, LO(dst), LO(dst), LO(src));
+			break;
+		case BPF_SUB:
+			emit_instr(ctx, subu, HI(dst), HI(dst), HI(src));
+			emit_instr(ctx, subu, MIPS_R_AT, LO(dst), LO(src));
+			emit_instr(ctx, sltu, MIPS_R_AT, LO(dst), MIPS_R_AT);
+			emit_instr(ctx, subu, HI(dst), HI(dst), MIPS_R_AT);
+			emit_instr(ctx, subu, LO(dst), LO(dst), LO(src));
+			break;
+		case BPF_XOR:
+			emit_instr(ctx, xor, LO(dst), LO(dst), LO(src));
+			emit_instr(ctx, xor, HI(dst), HI(dst), HI(src));
+			break;
+		case BPF_OR:
+			emit_instr(ctx, or, LO(dst), LO(dst), LO(src));
+			emit_instr(ctx, or, HI(dst), HI(dst), HI(src));
+			break;
+		case BPF_AND:
+			emit_instr(ctx, and, LO(dst), LO(dst), LO(src));
+			emit_instr(ctx, and, HI(dst), HI(dst), HI(src));
+			break;
+		case BPF_MUL:
+case_alu64_mul_x:
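+			/*
+			 * 64x64->64 multiply from 32-bit pieces:
+			 *   HI(dst) = HI(dst)*LO(src) + LO(dst)*HI(src)
+			 *             + upper32(LO(dst)*LO(src)),
+			 *   LO(dst) = lower32(LO(dst)*LO(src)).
+			 */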
+			emit_instr(ctx, mul, HI(dst), HI(dst), LO(src));
+			emit_instr(ctx, mul, MIPS_R_AT, LO(dst), HI(src));
+			emit_instr(ctx, addu, HI(dst), HI(dst), MIPS_R_AT);
+			if (MIPS_ISA_REV >= 6) {
+				emit_instr(ctx, muhu, MIPS_R_AT, LO(dst), LO(src));
+				emit_instr(ctx, mul, LO(dst), LO(dst), LO(src));
+			} else {
+				emit_instr(ctx, multu, LO(dst), LO(src));
+				emit_instr(ctx, mfhi, MIPS_R_AT);
+				emit_instr(ctx, mflo, LO(dst));
+			}
+			emit_instr(ctx, addu, HI(dst), HI(dst), MIPS_R_AT);
+			break;
+		case BPF_DIV:
+		case BPF_MOD:
+			return -EINVAL;
+		case BPF_LSH:
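+			/*
+			 * 64-bit left shift built from 32-bit ops: for
+			 * shifts >= 32, HI(dst) = LO(dst) << (n - 32)
+			 * and LO(dst) = 0; otherwise shift both words
+			 * and carry the spilled LO bits into HI.
+			 */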
+			emit_instr(ctx, beqz, LO(src), 11 * 4);
+			emit_instr(ctx, addiu, MIPS_R_AT, LO(src), -32);
+			emit_instr(ctx, bltz, MIPS_R_AT, 4 * 4);
+			emit_instr(ctx, nop);
+			emit_instr(ctx, sllv, HI(dst), LO(dst), MIPS_R_AT);
+			emit_instr(ctx, and, LO(dst), LO(dst), MIPS_R_ZERO);
+			emit_instr(ctx, b, 5 * 4);
+			emit_instr(ctx, subu, MIPS_R_AT, MIPS_R_ZERO, MIPS_R_AT);
+			emit_instr(ctx, srlv, MIPS_R_AT, LO(dst), MIPS_R_AT);
+			emit_instr(ctx, sllv, HI(dst), HI(dst), LO(src));
+			emit_instr(ctx, sllv, LO(dst), LO(dst), LO(src));
+			emit_instr(ctx, or, HI(dst), HI(dst), MIPS_R_AT);
+			break;
+		case BPF_RSH:
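+			/*
+			 * 64-bit logical right shift: for shifts >= 32,
+			 * LO(dst) = HI(dst) >> (n - 32) and HI(dst) = 0;
+			 * otherwise shift both words and carry the
+			 * spilled HI bits into LO.
+			 */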
+			emit_instr(ctx, beqz, LO(src), 11 * 4);
+			emit_instr(ctx, addiu, MIPS_R_AT, LO(src), -32);
+			emit_instr(ctx, bltz, MIPS_R_AT, 4 * 4);
+			emit_instr(ctx, nop);
+			emit_instr(ctx, srlv, LO(dst), HI(dst), MIPS_R_AT);
+			emit_instr(ctx, and, HI(dst), HI(dst), MIPS_R_ZERO);
+			emit_instr(ctx, b, 5 * 4);
+			emit_instr(ctx, subu, MIPS_R_AT, MIPS_R_ZERO, MIPS_R_AT);
+			emit_instr(ctx, sllv, MIPS_R_AT, HI(dst), MIPS_R_AT);
+			emit_instr(ctx, srlv, HI(dst), HI(dst), LO(src));
+			emit_instr(ctx, srlv, LO(dst), LO(dst), LO(src));
+			emit_instr(ctx, or, LO(dst), LO(dst), MIPS_R_AT);
+			break;
+		case BPF_ARSH:
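+			/*
+			 * 64-bit arithmetic right shift: for shifts >= 32,
+			 * LO(dst) = HI(dst) >> (n - 32) with sign extension
+			 * and HI(dst) fills with sign bits; otherwise shift
+			 * both words and carry HI into LO.
+			 */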
+			emit_instr(ctx, beqz, LO(src), 11 * 4);
+			emit_instr(ctx, addiu, MIPS_R_AT, LO(src), -32);
+			emit_instr(ctx, bltz, MIPS_R_AT, 4 * 4);
+			emit_instr(ctx, nop);
+			emit_instr(ctx, srav, LO(dst), HI(dst), MIPS_R_AT);
+			emit_instr(ctx, sra, HI(dst), HI(dst), 31);
+			emit_instr(ctx, b, 5 * 4);
+			emit_instr(ctx, subu, MIPS_R_AT, MIPS_R_ZERO, MIPS_R_AT);
+			emit_instr(ctx, sllv, MIPS_R_AT, HI(dst), MIPS_R_AT);
+			emit_instr(ctx, srav, HI(dst), HI(dst), LO(src));
+			emit_instr(ctx, srlv, LO(dst), LO(dst), LO(src));
+			emit_instr(ctx, or, LO(dst), LO(dst), MIPS_R_AT);
+			break;
+		default:
+			pr_err("ALU64_REG NOT HANDLED\n");
+			return -EINVAL;
+		}
+		break;
+	case BPF_ALU | BPF_MOV | BPF_X: /* ALU_REG */
+	case BPF_ALU | BPF_ADD | BPF_X: /* ALU_REG */
+	case BPF_ALU | BPF_SUB | BPF_X: /* ALU_REG */
+	case BPF_ALU | BPF_XOR | BPF_X: /* ALU_REG */
+	case BPF_ALU | BPF_OR | BPF_X: /* ALU_REG */
+	case BPF_ALU | BPF_AND | BPF_X: /* ALU_REG */
+	case BPF_ALU | BPF_MUL | BPF_X: /* ALU_REG */
+	case BPF_ALU | BPF_DIV | BPF_X: /* ALU_REG */
+	case BPF_ALU | BPF_MOD | BPF_X: /* ALU_REG */
+	case BPF_ALU | BPF_LSH | BPF_X: /* ALU_REG */
+	case BPF_ALU | BPF_RSH | BPF_X: /* ALU_REG */
+	case BPF_ALU | BPF_ARSH | BPF_X: /* ALU_REG */
+		src = ebpf_to_mips_reg(ctx, insn, REG_SRC_FP_OK);
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
+		if (src < 0 || dst < 0)
+			return -EINVAL;
+		/* Special BPF_MOV zext insn from verifier. */
+		if (insn_is_zext(insn)) {
+			gen_zext_insn(dst, true, ctx);
+			break;
+		}
+		switch (bpf_op) {
+		case BPF_MOV:
+			emit_instr(ctx, move, LO(dst), LO(src));
+			break;
+		case BPF_ADD:
+			emit_instr(ctx, addu, LO(dst), LO(dst), LO(src));
+			break;
+		case BPF_SUB:
+			emit_instr(ctx, subu, LO(dst), LO(dst), LO(src));
+			break;
+		case BPF_XOR:
+			emit_instr(ctx, xor, LO(dst), LO(dst), LO(src));
+			break;
+		case BPF_OR:
+			emit_instr(ctx, or, LO(dst), LO(dst), LO(src));
+			break;
+		case BPF_AND:
+			emit_instr(ctx, and, LO(dst), LO(dst), LO(src));
+			break;
+		case BPF_MUL:
+			emit_instr(ctx, mul, LO(dst), LO(dst), LO(src));
+			break;
+		case BPF_DIV:
+		case BPF_MOD:
+			if (MIPS_ISA_REV >= 6) {
+				if (bpf_op == BPF_DIV)
+					emit_instr(ctx, divu_r6, LO(dst),
+							LO(dst), LO(src));
+				else
+					emit_instr(ctx, modu, LO(dst),
+							LO(dst), LO(src));
+				break;
+			}
+			emit_instr(ctx, divu, LO(dst), LO(src));
+			if (bpf_op == BPF_DIV)
+				emit_instr(ctx, mflo, LO(dst));
+			else
+				emit_instr(ctx, mfhi, LO(dst));
+			break;
+		case BPF_LSH:
+			emit_instr(ctx, sllv, LO(dst), LO(dst), LO(src));
+			break;
+		case BPF_RSH:
+			emit_instr(ctx, srlv, LO(dst), LO(dst), LO(src));
+			break;
+		case BPF_ARSH:
+			emit_instr(ctx, srav, LO(dst), LO(dst), LO(src));
+			break;
+		default:
+			pr_err("ALU_REG NOT HANDLED\n");
+			return -EINVAL;
+		}
+		break;
+	case BPF_JMP | BPF_EXIT:
+		if (this_idx + 1 < exit_idx) {
+			b_off = b_imm(exit_idx, ctx);
+			if (is_bad_offset(b_off)) {
+				target = j_target(ctx, exit_idx);
+				if (target == (unsigned int)-1)
+					return -E2BIG;
+				emit_instr(ctx, j, target);
+			} else {
+				emit_instr(ctx, b, b_off);
+			}
+			emit_instr(ctx, nop);
+		}
+		break;
+	case BPF_JMP32 | BPF_JSLT | BPF_X:
+	case BPF_JMP32 | BPF_JSLE | BPF_X:
+	case BPF_JMP32 | BPF_JSGT | BPF_X:
+	case BPF_JMP32 | BPF_JSGE | BPF_X:
+	case BPF_JMP32 | BPF_JSGT | BPF_K:
+	case BPF_JMP32 | BPF_JSGE | BPF_K:
+	case BPF_JMP32 | BPF_JSLT | BPF_K:
+	case BPF_JMP32 | BPF_JSLE | BPF_K:
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
+		if (dst < 0)
+			return -EINVAL;
+
+		if (bpf_src == BPF_X) {
+			src = ebpf_to_mips_reg(ctx, insn, REG_SRC_NO_FP);
+			if (src < 0)
+				return -EINVAL;
+		} else if (insn->imm == 0) { /* and BPF_K */
+			src = MIPS_R_ZERO;
+		} else {
+			src = MIPS_R_T8;
+			gen_imm_to_reg(insn, LO(src), ctx);
+		}
+
+		cmp_eq = bpf_op == BPF_JSLE || bpf_op == BPF_JSGE;
+		switch (bpf_op) {
+		case BPF_JSGE:
+			emit_instr(ctx, slt, MIPS_R_AT, LO(dst), LO(src));
+			break;
+		case BPF_JSLT:
+			emit_instr(ctx, slt, MIPS_R_AT, LO(dst), LO(src));
+			break;
+		case BPF_JSGT:
+			emit_instr(ctx, slt, MIPS_R_AT, LO(src), LO(dst));
+			break;
+		case BPF_JSLE:
+			emit_instr(ctx, slt, MIPS_R_AT, LO(src), LO(dst));
+			break;
+		}
+
+		src = MIPS_R_AT;
+		dst = MIPS_R_ZERO;
+		goto jeq_common;
+
+	case BPF_JMP | BPF_JSLT | BPF_X:
+	case BPF_JMP | BPF_JSLE | BPF_X:
+	case BPF_JMP | BPF_JSGT | BPF_X:
+	case BPF_JMP | BPF_JSGE | BPF_X:
+	case BPF_JMP | BPF_JSGT | BPF_K:
+	case BPF_JMP | BPF_JSGE | BPF_K:
+	case BPF_JMP | BPF_JSLT | BPF_K:
+	case BPF_JMP | BPF_JSLE | BPF_K:
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
+		if (dst < 0)
+			return -EINVAL;
+
+		if (bpf_src == BPF_X) {
+			src = ebpf_to_mips_reg(ctx, insn, REG_SRC_NO_FP);
+			if (src < 0)
+				return -EINVAL;
+		} else if (insn->imm == 0) { /* and BPF_K */
+			src = MIPS_R_ZERO;
+		} else {
+			src = MIPS_R_T8;
+			gen_imm_to_reg(insn, LO(src), ctx);
+			if (insn->imm < 0)
+				gen_sext_insn(src, ctx);
+			else
+				gen_zext_insn(src, true, ctx);
+		}
+
+		cmp_eq = bpf_op == BPF_JSGT || bpf_op == BPF_JSGE;
+
+		if (bpf_op == BPF_JSGT || bpf_op == BPF_JSLE) {
+			/* Check dst <= src */
+			emit_instr(ctx, bne, HI(dst), HI(src), 4 * 4);
+			/* Delay slot */
+			emit_instr(ctx, slt, MIPS_R_AT, HI(dst), HI(src));
+			emit_instr(ctx, bne, LO(dst), LO(src), 2 * 4);
+			/* Delay slot */
+			emit_instr(ctx, sltu, MIPS_R_AT, LO(dst), LO(src));
+			emit_instr(ctx, nor, MIPS_R_AT, MIPS_R_ZERO, MIPS_R_AT);
+		} else {
+			/* Check dst < src */
+			emit_instr(ctx, bne, HI(dst), HI(src), 2 * 4);
+			/* Delay slot */
+			emit_instr(ctx, slt, MIPS_R_AT, HI(dst), HI(src));
+			emit_instr(ctx, sltu, MIPS_R_AT, LO(dst), LO(src));
+		}
+
+		src = MIPS_R_AT;
+		dst = MIPS_R_ZERO;
+		goto jeq_common;
+
+	case BPF_JMP | BPF_JLT | BPF_X:
+	case BPF_JMP | BPF_JLE | BPF_X:
+	case BPF_JMP | BPF_JGT | BPF_X:
+	case BPF_JMP | BPF_JGE | BPF_X:
+	case BPF_JMP | BPF_JGT | BPF_K:
+	case BPF_JMP | BPF_JGE | BPF_K:
+	case BPF_JMP | BPF_JLT | BPF_K:
+	case BPF_JMP | BPF_JLE | BPF_K:
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
+		if (dst < 0)
+			return -EINVAL;
+
+		if (bpf_src == BPF_X) {
+			src = ebpf_to_mips_reg(ctx, insn, REG_SRC_NO_FP);
+			if (src < 0)
+				return -EINVAL;
+		} else if (insn->imm == 0) { /* and BPF_K */
+			src = MIPS_R_ZERO;
+		} else {
+			src = MIPS_R_T8;
+			gen_imm_to_reg(insn, LO(src), ctx);
+			if (insn->imm < 0)
+				gen_sext_insn(src, ctx);
+			else
+				gen_zext_insn(src, true, ctx);
+		}
+
+		cmp_eq = bpf_op == BPF_JGT || bpf_op == BPF_JGE;
+
+		if (bpf_op == BPF_JGT || bpf_op == BPF_JLE) {
+			/* Check dst <= src */
+			emit_instr(ctx, bne, HI(dst), HI(src), 4 * 4);
+			/* Delay slot */
+			emit_instr(ctx, sltu, MIPS_R_AT, HI(dst), HI(src));
+			emit_instr(ctx, bne, LO(dst), LO(src), 2 * 4);
+			/* Delay slot */
+			emit_instr(ctx, sltu, MIPS_R_AT, LO(dst), LO(src));
+			emit_instr(ctx, nor, MIPS_R_AT, MIPS_R_ZERO, MIPS_R_AT);
+		} else {
+			/* Check dst < src */
+			emit_instr(ctx, bne, HI(dst), HI(src), 2 * 4);
+			/* Delay slot */
+			emit_instr(ctx, sltu, MIPS_R_AT, HI(dst), HI(src));
+			emit_instr(ctx, sltu, MIPS_R_AT, LO(dst), LO(src));
+		}
+
+		src = MIPS_R_AT;
+		dst = MIPS_R_ZERO;
+		goto jeq_common;
+
+	case BPF_JMP32 | BPF_JLT | BPF_X:
+	case BPF_JMP32 | BPF_JLE | BPF_X:
+	case BPF_JMP32 | BPF_JGT | BPF_X:
+	case BPF_JMP32 | BPF_JGE | BPF_X:
+	case BPF_JMP32 | BPF_JGT | BPF_K:
+	case BPF_JMP32 | BPF_JGE | BPF_K:
+	case BPF_JMP32 | BPF_JLT | BPF_K:
+	case BPF_JMP32 | BPF_JLE | BPF_K:
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
+		if (dst < 0)
+			return -EINVAL;
+
+		if (bpf_src == BPF_X) {
+			src = ebpf_to_mips_reg(ctx, insn, REG_SRC_NO_FP);
+			if (src < 0)
+				return -EINVAL;
+		} else if (insn->imm == 0) { /* and BPF_K */
+			src = MIPS_R_ZERO;
+		} else {
+			src = MIPS_R_T8;
+			gen_imm_to_reg(insn, LO(src), ctx);
+		}
+
+		cmp_eq = bpf_op == BPF_JLE || bpf_op == BPF_JGE;
+		switch (bpf_op) {
+		case BPF_JGE:
+			emit_instr(ctx, sltu, MIPS_R_AT, LO(dst), LO(src));
+			break;
+		case BPF_JLT:
+			emit_instr(ctx, sltu, MIPS_R_AT, LO(dst), LO(src));
+			break;
+		case BPF_JGT:
+			emit_instr(ctx, sltu, MIPS_R_AT, LO(src), LO(dst));
+			break;
+		case BPF_JLE:
+			emit_instr(ctx, sltu, MIPS_R_AT, LO(src), LO(dst));
+			break;
+		}
+
+		src = MIPS_R_AT;
+		dst = MIPS_R_ZERO;
+		goto jeq_common;
+
+	case BPF_JMP | BPF_JEQ | BPF_X: /* JMP_REG */
+	case BPF_JMP | BPF_JNE | BPF_X:
+	case BPF_JMP32 | BPF_JEQ | BPF_X:
+	case BPF_JMP32 | BPF_JNE | BPF_X:
+		src = ebpf_to_mips_reg(ctx, insn, REG_SRC_FP_OK);
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_FP_OK);
+		if (src < 0 || dst < 0)
+			return -EINVAL;
+
+		cmp_eq = (bpf_op == BPF_JEQ);
+		if (bpf_class == BPF_JMP) {
+			emit_instr(ctx, beq, HI(dst), HI(src), 2 * 4);
+			/* Delay slot */
+			emit_instr(ctx, move, MIPS_R_AT, LO(src));
+			/* Make low words unequal if high word unequal. */
+			emit_instr(ctx, addu, MIPS_R_AT, LO(dst), MIPS_R_SP);
+			dst = LO(dst);
+			src = MIPS_R_AT;
+		} else { /* BPF_JMP32 */
+			dst = LO(dst);
+			src = LO(src);
+		}
+		goto jeq_common;
+
+	case BPF_JMP | BPF_JSET | BPF_X: /* JMP_REG */
+	case BPF_JMP32 | BPF_JSET | BPF_X:
+		src = ebpf_to_mips_reg(ctx, insn, REG_SRC_NO_FP);
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
+		if (src < 0 || dst < 0)
+			return -EINVAL;
+		emit_instr(ctx, and, MIPS_R_AT, LO(dst), LO(src));
+		if (bpf_class == BPF_JMP) {
+			emit_instr(ctx, and, MIPS_R_T8, HI(dst), HI(src));
+			emit_instr(ctx, or, MIPS_R_AT, MIPS_R_AT, MIPS_R_T8);
+		}
+		cmp_eq = false;
+		dst = MIPS_R_AT;
+		src = MIPS_R_ZERO;
+jeq_common:
+		/*
+		 * If the next insn is EXIT and we are jumping around
+		 * only it, invert the sense of the compare and
+		 * conditionally jump to the exit.  Poor man's branch
+		 * chaining.
+		 */
+		if ((insn + 1)->code == (BPF_JMP | BPF_EXIT) && insn->off == 1) {
+			b_off = b_imm(exit_idx, ctx);
+			if (is_bad_offset(b_off)) {
+				target = j_target(ctx, exit_idx);
+				if (target == (unsigned int)-1)
+					return -E2BIG;
+				cmp_eq = !cmp_eq;
+				b_off = 4 * 3;
+				if (!(ctx->offsets[this_idx] & OFFSETS_B_CONV)) {
+					ctx->offsets[this_idx] |= OFFSETS_B_CONV;
+					ctx->long_b_conversion = 1;
+				}
+			}
+
+			if (cmp_eq)
+				emit_instr(ctx, bne, dst, src, b_off);
+			else
+				emit_instr(ctx, beq, dst, src, b_off);
+			emit_instr(ctx, nop);
+			if (ctx->offsets[this_idx] & OFFSETS_B_CONV) {
+				emit_instr(ctx, j, target);
+				emit_instr(ctx, nop);
+			}
+			return 2; /* We consumed the exit. */
+		}
+		b_off = b_imm(this_idx + insn->off + 1, ctx);
+		if (is_bad_offset(b_off)) {
+			target = j_target(ctx, this_idx + insn->off + 1);
+			if (target == (unsigned int)-1)
+				return -E2BIG;
+			cmp_eq = !cmp_eq;
+			b_off = 4 * 3;
+			if (!(ctx->offsets[this_idx] & OFFSETS_B_CONV)) {
+				ctx->offsets[this_idx] |= OFFSETS_B_CONV;
+				ctx->long_b_conversion = 1;
+			}
+		}
+
+		if (cmp_eq)
+			emit_instr(ctx, beq, dst, src, b_off);
+		else
+			emit_instr(ctx, bne, dst, src, b_off);
+		emit_instr(ctx, nop);
+		if (ctx->offsets[this_idx] & OFFSETS_B_CONV) {
+			emit_instr(ctx, j, target);
+			emit_instr(ctx, nop);
+		}
+		break;
+
+	case BPF_JMP | BPF_JEQ | BPF_K: /* JMP_IMM */
+	case BPF_JMP | BPF_JNE | BPF_K: /* JMP_IMM */
+	case BPF_JMP32 | BPF_JEQ | BPF_K: /* JMP_IMM */
+	case BPF_JMP32 | BPF_JNE | BPF_K: /* JMP_IMM */
+		cmp_eq = (bpf_op == BPF_JEQ);
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_FP_OK);
+		if (dst < 0)
+			return dst;
+		if (insn->imm == 0) {
+			src = MIPS_R_ZERO;
+			if (bpf_class == BPF_JMP32) {
+				dst = LO(dst);
+			} else { /* BPF_JMP */
+				emit_instr(ctx, or, MIPS_R_AT, LO(dst), HI(dst));
+				dst = MIPS_R_AT;
+			}
+		} else if (bpf_class == BPF_JMP32) {
+			gen_imm_to_reg(insn, MIPS_R_AT, ctx);
+			src = MIPS_R_AT;
+			dst = LO(dst);
+		} else { /* BPF_JMP */
+			gen_imm_to_reg(insn, MIPS_R_AT, ctx);
+			/* If low words equal, check high word vs imm sign. */
+			emit_instr(ctx, beq, LO(dst), MIPS_R_AT, 2 * 4);
+			emit_instr(ctx, nop);
+			/* Make high word signs unequal if low words unequal. */
+			emit_instr(ctx, nor, MIPS_R_AT, MIPS_R_ZERO, HI(dst));
+			emit_instr(ctx, sra, MIPS_R_AT, MIPS_R_AT, 31);
+			src = MIPS_R_AT;
+			dst = HI(dst);
+		}
+		goto jeq_common;
+
+	case BPF_JMP | BPF_JSET | BPF_K: /* JMP_IMM */
+	case BPF_JMP32 | BPF_JSET | BPF_K: /* JMP_IMM */
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
+		if (dst < 0)
+			return dst;
+
+		t64u = (u32)insn->imm;
+		gen_imm_to_reg(insn, MIPS_R_AT, ctx);
+		emit_instr(ctx, and, MIPS_R_AT, LO(dst), MIPS_R_AT);
+		if (bpf_class == BPF_JMP && insn->imm < 0)
+			emit_instr(ctx, or, MIPS_R_AT, MIPS_R_AT, HI(dst));
+		src = MIPS_R_AT;
+		dst = MIPS_R_ZERO;
+		cmp_eq = false;
+		goto jeq_common;
+
+	case BPF_JMP | BPF_JA:
+		/*
+		 * Prefer relative branch for easier debugging, but
+		 * fall back if needed.
+		 */
+		b_off = b_imm(this_idx + insn->off + 1, ctx);
+		if (is_bad_offset(b_off)) {
+			target = j_target(ctx, this_idx + insn->off + 1);
+			if (target == (unsigned int)-1)
+				return -E2BIG;
+			emit_instr(ctx, j, target);
+		} else {
+			emit_instr(ctx, b, b_off);
+		}
+		emit_instr(ctx, nop);
+		break;
+	case BPF_LD | BPF_DW | BPF_IMM:
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
+		if (dst < 0)
+			return dst;
+		gen_imm_to_reg(insn, LO(dst), ctx);
+		gen_imm_to_reg(insn+1, HI(dst), ctx);
+		return 2; /* Double slot insn */
+
+	case BPF_JMP | BPF_CALL:
+		emit_bpf_call(ctx, insn);
+		break;
+	case BPF_JMP | BPF_TAIL_CALL:
+		if (emit_bpf_tail_call(ctx, this_idx))
+			return -EINVAL;
+		break;
+
+	case BPF_ALU | BPF_END | BPF_FROM_BE:
+	case BPF_ALU | BPF_END | BPF_FROM_LE:
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
+		if (dst < 0)
+			return dst;
+#ifdef __BIG_ENDIAN
+		need_swap = (bpf_src == BPF_FROM_LE);
+#else
+		need_swap = (bpf_src == BPF_FROM_BE);
+#endif
+		if (insn->imm == 16) {
+			if (need_swap)
+				emit_instr(ctx, wsbh, LO(dst), LO(dst));
+			emit_instr(ctx, andi, LO(dst), LO(dst), 0xffff);
+		} else if (insn->imm == 32) {
+			if (need_swap) {
+				emit_instr(ctx, wsbh, LO(dst), LO(dst));
+				emit_instr(ctx, rotr, LO(dst), LO(dst), 16);
+			}
+		} else { /* 64-bit */
+			if (need_swap) {
+				emit_instr(ctx, wsbh, MIPS_R_AT, LO(dst));
+				emit_instr(ctx, wsbh, LO(dst), HI(dst));
+				emit_instr(ctx, rotr, HI(dst), MIPS_R_AT, 16);
+				emit_instr(ctx, rotr, LO(dst), LO(dst), 16);
+			}
+		}
+		break;
+
+	case BPF_ST | BPF_DW | BPF_MEM:
+	case BPF_ST | BPF_B | BPF_MEM:
+	case BPF_ST | BPF_H | BPF_MEM:
+	case BPF_ST | BPF_W | BPF_MEM:
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_FP_OK);
+		if (dst < 0)
+			return -EINVAL;
+		mem_off = insn->off;
+		gen_imm_to_reg(insn, MIPS_R_AT, ctx);
+
+		switch (bpf_size) {
+		case BPF_B:
+			emit_instr(ctx, sb, MIPS_R_AT, mem_off, LO(dst));
+			break;
+		case BPF_H:
+			emit_instr(ctx, sh, MIPS_R_AT, mem_off, LO(dst));
+			break;
+		case BPF_W:
+			emit_instr(ctx, sw, MIPS_R_AT, mem_off, LO(dst));
+			break;
+		case BPF_DW:
+			/* Memory order == register order in pair */
+			emit_instr(ctx, sw, MIPS_R_AT, OFFLO(mem_off), LO(dst));
+			if (insn->imm < 0) {
+				emit_instr(ctx, nor, MIPS_R_AT,
+						MIPS_R_ZERO, MIPS_R_ZERO);
+				emit_instr(ctx, sw, MIPS_R_AT,
+						OFFHI(mem_off), LO(dst));
+			} else {
+				emit_instr(ctx, sw, MIPS_R_ZERO,
+						OFFHI(mem_off), LO(dst));
+			}
+			break;
+		}
+		break;
+
+	case BPF_LDX | BPF_DW | BPF_MEM:
+	case BPF_LDX | BPF_B | BPF_MEM:
+	case BPF_LDX | BPF_H | BPF_MEM:
+	case BPF_LDX | BPF_W | BPF_MEM:
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
+		src = ebpf_to_mips_reg(ctx, insn, REG_SRC_FP_OK);
+		if (src < 0 || dst < 0)
+			return -EINVAL;
+		mem_off = insn->off;
+
+		switch (bpf_size) {
+		case BPF_B:
+			emit_instr(ctx, lbu, LO(dst), mem_off, LO(src));
+			break;
+		case BPF_H:
+			emit_instr(ctx, lhu, LO(dst), mem_off, LO(src));
+			break;
+		case BPF_W:
+			emit_instr(ctx, lw, LO(dst), mem_off, LO(src));
+			break;
+		case BPF_DW:
+			/*
+			 * Careful: update HI(dst) first in case dst == src,
+			 * since only LO(src) is the usable pointer.
+			 */
+			emit_instr(ctx, lw, HI(dst), OFFHI(mem_off), LO(src));
+			emit_instr(ctx, lw, LO(dst), OFFLO(mem_off), LO(src));
+			break;
+		}
+		break;
+
+	case BPF_STX | BPF_DW | BPF_ATOMIC:
+		r = emit_bpf_atomic64(ctx, insn);
+		if (r < 0)
+			return r;
+		break;
+	case BPF_STX | BPF_W | BPF_ATOMIC:
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_FP_OK);
+		src = ebpf_to_mips_reg(ctx, insn, REG_SRC_FP_OK);
+		if (src < 0 || dst < 0)
+			return -EINVAL;
+		mem_off = insn->off;
+		if (insn->imm != BPF_ADD) {
+			pr_err("ATOMIC OP %02x NOT HANDLED\n", insn->imm);
+			return -EINVAL;
+		}
+		/*
+		 * Drop reg pair scheme for more efficient temp register usage
+		 * given BPF_W mode.
+		 */
+		dst = LO(dst);
+		src = LO(src);
+		/*
+		 * If mem_off does not fit within the 9-bit ll/sc instruction
+		 * immediate field, use a temp reg.
+		 */
+		if (MIPS_ISA_REV >= 6 &&
+		    (mem_off >= BIT(8) || mem_off < -BIT(8))) {
+			emit_instr(ctx, addiu, MIPS_R_T9, dst, mem_off);
+			mem_off = 0;
+			dst = MIPS_R_T9;
+		}
+		emit_instr(ctx, ll, MIPS_R_AT, mem_off, dst);
+		emit_instr(ctx, addu, MIPS_R_AT, MIPS_R_AT, src);
+		emit_instr(ctx, sc, MIPS_R_AT, mem_off, dst);
+		/*
+		 * On failure, branch back to the ll (-4 insns of 4 bytes each).
+		 */
+		emit_instr(ctx, beqz, MIPS_R_AT, -4 * 4);
+		emit_instr(ctx, nop);
+		break;
+
+	case BPF_STX | BPF_DW | BPF_MEM:
+	case BPF_STX | BPF_B | BPF_MEM:
+	case BPF_STX | BPF_H | BPF_MEM:
+	case BPF_STX | BPF_W | BPF_MEM:
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_FP_OK);
+		src = ebpf_to_mips_reg(ctx, insn, REG_SRC_FP_OK);
+		if (src < 0 || dst < 0)
+			return -EINVAL;
+		mem_off = insn->off;
+
+		switch (bpf_size) {
+		case BPF_B:
+			emit_instr(ctx, sb, LO(src), mem_off, LO(dst));
+			break;
+		case BPF_H:
+			emit_instr(ctx, sh, LO(src), mem_off, LO(dst));
+			break;
+		case BPF_W:
+			emit_instr(ctx, sw, LO(src), mem_off, LO(dst));
+			break;
+		case BPF_DW:
+			emit_instr(ctx, sw, HI(src), OFFHI(mem_off), LO(dst));
+			emit_instr(ctx, sw, LO(src), OFFLO(mem_off), LO(dst));
+			break;
+		}
+		break;
+
+	default:
+		pr_err("NOT HANDLED %d - (%02x)\n",
+		       this_idx, (unsigned int)insn->code);
+		return -EINVAL;
+	}
+	/*
+	 * Handle zero-extension if the verifier is unable to patch and
+	 * insert its own special zext insns.
+	 */
+	if ((bpf_class == BPF_ALU && !(bpf_op == BPF_END && insn->imm == 64)) ||
+	    (bpf_class == BPF_LDX && bpf_size != BPF_DW))
+		gen_zext_insn(dst, false, ctx);
+	return 1;
+}
+
+/* Enable the verifier to insert zext insn for ALU32 ops as needed. */
+bool bpf_jit_needs_zext(void)
+{
+	return true;
+}
diff --git a/arch/mips/net/ebpf_jit_core.c b/arch/mips/net/ebpf_jit_core.c
index 5bc33b4bbb2a..5ea5d4afd661 100644
--- a/arch/mips/net/ebpf_jit_core.c
+++ b/arch/mips/net/ebpf_jit_core.c
@@ -633,7 +633,8 @@ static int build_int_body(struct jit_ctx *ctx)
 
 	for (i = 0; i < prog->len; ) {
 		insn = prog->insnsi + i;
-		if ((ctx->reg_val_types[i] & RVT_VISITED_MASK) == 0) {
+		if (is64bit() && (ctx->reg_val_types[i] &
+				  RVT_VISITED_MASK) == 0) {
 			/* dead instruction, don't emit it. */
 			i++;
 			continue;
@@ -1019,14 +1020,19 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 	if (ctx.offsets == NULL)
 		goto out_err;
 
-	ctx.reg_val_types = kcalloc(prog->len + 1, sizeof(*ctx.reg_val_types), GFP_KERNEL);
-	if (ctx.reg_val_types == NULL)
-		goto out_err;
-
 	ctx.skf = prog;
 
-	if (reg_val_propagate(&ctx))
-		goto out_err;
+	/* Static analysis only used for MIPS64. */
+	if (is64bit()) {
+		ctx.reg_val_types = kcalloc(prog->len + 1,
+					    sizeof(*ctx.reg_val_types),
+					    GFP_KERNEL);
+		if (ctx.reg_val_types == NULL)
+			goto out_err;
+
+		if (reg_val_propagate(&ctx))
+			goto out_err;
+	}
 
 	/*
 	 * First pass discovers used resources and instruction offsets
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH bpf-next v1 00/14] MIPS: eBPF: refactor code, add MIPS32 JIT
  2021-07-12  0:34 [RFC PATCH bpf-next v1 00/14] MIPS: eBPF: refactor code, add MIPS32 JIT Tony Ambardar
                   ` (12 preceding siblings ...)
  2021-07-12  0:35 ` [RFC PATCH bpf-next v1 14/14] MIPS: eBPF: add MIPS32 JIT Tony Ambardar
@ 2021-07-20  1:25 ` Johan Almbladh
  2021-07-20 17:47   ` Alexei Starovoitov
  2021-10-05  8:26 ` [RFC PATCH bpf-next v2 00/16] " Tony Ambardar
  14 siblings, 1 reply; 32+ messages in thread
From: Johan Almbladh @ 2021-07-20  1:25 UTC (permalink / raw)
  To: Tony Ambardar
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Thomas Bogendoerfer, Paul Burton, netdev, bpf, linux-mips,
	Hassan Naveed, David Daney, Luke Nelson, Serge Semin,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh

Hi Tony,

I am glad that there are more people interested in having a JIT for
MIPS32. We seem to have been working in parallel on the same thing
though. I sent a summary on the state of the MIPS32 JIT on the
linux-mips list a couple of months ago, asking for feedback on the
best way to complete it. When I received no response, I started to
work on a MIPS32 JIT implementation myself. I'll be glad to share what
I have got so we can work together on this.

When I dug deeper into the 64-bit JIT code, I realised that a lot of
fundamental things such as 32-bit register mappings were completely
missing. Most 32-bit operations were unimplemented. The code is
also quite complex already, so adding full 32-bit hardware support
into the mix did not seem like a good idea. I am sure there is some
common code that can be factored out and re-used, but I do think the
64-bit and 32-bit JITs would be better off as two different
implementations.

My 32-bit implementation is now complete and I am currently testing
it. Test suite output below. What remains to be tested is tail calls.

test_bpf: Summary: 676 PASSED, 0 FAILED, [664/664 JIT'ed]
Tested with kernel 5.14 on MIPS32r2 big-endian and little-endian under QEMU.
Also tested with kernel 5.4 on MIPS 24KEc (MT7628) physical hardware.
(I have added a lot of new tests in the eBPF test suite during the JIT
development, which explains the higher count)

The implementation supports both 32-bit and 64-bit eBPF instructions,
including all atomic operations. 64-bit atomics and div/mod are
implemented as function calls to atomic64 functions, while 32-bit
variants are implemented natively by the JIT.

Register mapping
=================
My 32-bit implementation maps all 64-bit eBPF registers to native
32-bit MIPS registers. In addition, there are four temporary 32-bit
registers available, which is precisely what is needed for doing the
more complex ALU64 operations. This means that the JIT does not use
any stack scratch space for registers, which should be good for
performance. The register mapping is as follows.

R0: v0,v1 (return)
R1-R2: a0-a3 (args passed in registers)
R3-R5: t0-t5 (args passed on stack)
R6-R9: s0-s7 (callee-saved)
R10: r0,fp (frame pointer)
AX: gp,at (constant blinding)
Temp: t6-t9

To squeeze out enough MIPS registers for the eBPF mapping I had to
make a few unusual choices. First, I use the at (assembler temporary)
register, which should be fine because the JIT is the assembler. I
also use the gp (global pointer) register. It is callee-saved, so
I save it on stack and restore it in the epilogue. The eBPF frame
pointer R10 is mapped to fp, also callee-saved, and r0. The latter is
always zero, but on a 32-bit architecture it will also be used to
"store" zeros, so it should be perfectly fine for the 32-bit JIT.
According to the ISA documentation r0 is valid both as a source and a
destination operand.

The complete register mapping simplifies the code since we get rid of
all the swapping to/from the stack scratch space.
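
For illustration, this mapping could be written as a lookup table
along the lines below. This is a sketch only: the struct shape, the
macro names, and the lo/hi assignment within each pair are
assumptions, not the actual implementation.

  /* Sketch: 64-bit eBPF register -> MIPS32 register pair. */
  struct reg_pair { u8 lo; u8 hi; };

  static const struct reg_pair bpf2mips32[] = {
          [BPF_REG_0]  = { MIPS_R_V0, MIPS_R_V1 }, /* return value */
          [BPF_REG_1]  = { MIPS_R_A0, MIPS_R_A1 }, /* register args */
          [BPF_REG_2]  = { MIPS_R_A2, MIPS_R_A3 },
          [BPF_REG_3]  = { MIPS_R_T0, MIPS_R_T1 }, /* stack-passed args */
          [BPF_REG_4]  = { MIPS_R_T2, MIPS_R_T3 },
          [BPF_REG_5]  = { MIPS_R_T4, MIPS_R_T5 },
          [BPF_REG_6]  = { MIPS_R_S0, MIPS_R_S1 }, /* callee-saved */
          [BPF_REG_7]  = { MIPS_R_S2, MIPS_R_S3 },
          [BPF_REG_8]  = { MIPS_R_S4, MIPS_R_S5 },
          [BPF_REG_9]  = { MIPS_R_S6, MIPS_R_S7 },
          [BPF_REG_FP] = { MIPS_R_FP, MIPS_R_ZERO }, /* fp + r0 (zero) */
          [BPF_REG_AX] = { MIPS_R_GP, MIPS_R_AT },   /* constant blinding */
  };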

I have been focusing on the code the last couple of weeks so I didn't
see your email until now. I am sure that this comes as much of a
surprise to you as it did to me. Anyway, I can send a patch with my JIT
implementation tomorrow.

Cheers,
Johan

On Mon, Jul 12, 2021 at 2:35 AM Tony Ambardar <tony.ambardar@gmail.com> wrote:
>
> Greetings!
>
> This patch series adds an eBPF JIT for MIPS32. The approach taken first
> updates existing code to support MIPS64/MIPS32 systems, then refactors
> source into a common core and dependent MIPS64 JIT, and finally adds a
> MIPS32 eBPF JIT implementation using the common framework.
>
> Compared to writing a standalone MIPS32 JIT, this approach has benefits
> for long-term maintainability, but has taken much longer than expected.
> This RFC posting is intended to share progress, gather feedback, and
> raise some questions with BPF and MIPS experts (which I'll cover later).
>
>
> Code Overview
> =============
>
> The initial code updates and refactoring exposed a number of problems in
> the existing MIPS64 JIT, which the first several patches fix. Patch #11
> updates common code to support MIPS64/MIPS32 operation. Patch #12
> separates the common core from the MIPS64 JIT code. Patch #13 adds a
> needed MIPS32 uasm opcode, while patch #14 adds the MIPS32 eBPF JIT.
>
> On MIPS32, 64-bit BPF registers are mapped to 32-bit register pairs, and
> all 64-bit operations are built on 32-bit subregister ops. The MIPS32
> tailcall counter is stored on the stack however. Notable changes from the
> MIPS64 JIT include:
>
>   * BPF_JMP32: implement all conditionals
>   * BPF_JMP | JSET | BPF_K: drop bbit insns only usable on MIPS64 Octeon
>
> Since MIPS32 does not include 64-bit div/mod or atomic opcodes, these BPF
> insns are implemented by directly calling the built-in kernel functions:
> (with thanks to Luke Nelson for posting similar code online)
>
>   * BPF_STX   | BPF_DW  | BPF_XADD
>   * BPF_ALU64 | BPF_DIV | BPF_X
>   * BPF_ALU64 | BPF_DIV | BPF_K
>   * BPF_ALU64 | BPF_MOD | BPF_X
>   * BPF_ALU64 | BPF_MOD | BPF_K
>
>
> Testing
> =======
>
> Testing used LTS kernel 5.10.x and stable 5.13.x running under QEMU.
> The test suite included the 'test_bpf' module and 'test_verifier' from
> kselftests. Using 'test_progs' from kselftests is too difficult in general
> since cross-compilation depends on libbpf/bpftool, which does not support
> cross-endian builds.
>
> The matrix of test configurations executed for this series covered the
> expected register sizes, MIPS ISA releases, and JIT settings:
>
>   WORDSIZE={64-bit,32-bit} x ISA={R2,R6} x JIT={off,on,hardened}
>
> On MIPS32BE and MIPS32LE there was general parity between the results of
> interpreter vs. JIT-backed tests with respect to the numbers of PASSED,
> SKIPPED, and FAILED tests. The same was also true of MIPS64 retesting.
>
> For example, the results below on MIPS32 are typical. Note that skipped
> tests 854 and 855 are "scale" tests which result in OOM on the QEMU malta
> MIPS32 test systems.
>
>   root@OpenWrt:~# sysctl net.core.bpf_jit_enable=1
>   root@OpenWrt:~# modprobe test_bpf
>   ...
>   test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed]
>   root@OpenWrt:~# ./test_verifier 0 853
>   ...
>   Summary: 1127 PASSED, 0 SKIPPED, 89 FAILED
>   root@OpenWrt:~# ./test_verifier 855 1149
>   ...
>   Summary: 408 PASSED, 7 SKIPPED, 53 FAILED
>
>
> Open Questions
> ==============
>
> 1. As seen in the patch series, the static analysis used by the MIPS64 JIT
> tends to be fragile in the face of verifier, insn and patching changes.
> After tracking down and fixing several related bugs, I wonder if it were
> better to remove the static analysis and leave things more robust and
> maintainable going forward.
>
> Paul, Thomas, David, what are your views? Do you have thoughts on how best
> to do this?
>
> Would it be possible to replace the static analysis by accessing verifier
> analysis results from a JIT? Daniel, Alexei, or Andrii?
>
>
> 2. The series tries to correctly handle tailcall counter across bpf2bpf
> and tailcalls, and it would be nice to properly support mixing these,
> but this is still a WIP for me. Much of what I've read seems very specific
> to the x86_64 JIT. Is there a good summary of the required changes for a
> JIT in general?
>
> Note: I built a MIPS32LE 'test_progs' after some horrible, ugly hacking,
> and the 'tailcall' tests pass but the 'tailcall_bpf2bpf' tests fail
> cryptically. I can send a log and strace if someone helpful could kindly
> take a look. Is there an alternative, good standalone test available?
>
>
>
> Possible Next Steps
> ===================
>
> 1. Implementing the new BPF_ATOMIC insns *should* be straightforward
> on MIPS32. I'm less certain of MIPS64 given the static analysis and
> related zext/sext logic.
>
> 2. The BPF_JMP32 class is another big gap on MIPS64. Has anyone looked at
> this before? It also ties to the static analysis, but on first glance
> appears feasible.
>
>
>
> Thanks in advance for any feedback or suggestions!
>
>
> Tony Ambardar (14):
>   MIPS: eBPF: support BPF_TAIL_CALL in JIT static analysis
>   MIPS: eBPF: mask 32-bit index for tail calls
>   MIPS: eBPF: fix BPF_ALU|ARSH handling in JIT static analysis
>   MIPS: eBPF: support BPF_JMP32 in JIT static analysis
>   MIPS: eBPF: fix system hang with verifier dead-code patching
>   MIPS: eBPF: fix JIT static analysis hang with bounded loops
>   MIPS: eBPF: fix MOD64 insn on R6 ISA
>   MIPS: eBPF: support long jump for BPF_JMP|EXIT
>   MIPS: eBPF: drop src_reg restriction in BPF_LD|BPF_DW|BPF_IMM
>   MIPS: eBPF: improve and clarify enum 'which_ebpf_reg'
>   MIPS: eBPF: add core support for 32/64-bit systems
>   MIPS: eBPF: refactor common MIPS64/MIPS32 functions and headers
>   MIPS: uasm: Enable muhu opcode for MIPS R6
>   MIPS: eBPF: add MIPS32 JIT
>
>  Documentation/admin-guide/sysctl/net.rst |    6 +-
>  Documentation/networking/filter.rst      |    6 +-
>  arch/mips/Kconfig                        |    4 +-
>  arch/mips/include/asm/uasm.h             |    1 +
>  arch/mips/mm/uasm-mips.c                 |    4 +-
>  arch/mips/mm/uasm.c                      |    3 +-
>  arch/mips/net/Makefile                   |    8 +-
>  arch/mips/net/ebpf_jit.c                 | 1935 ----------------------
>  arch/mips/net/ebpf_jit.h                 |  295 ++++
>  arch/mips/net/ebpf_jit_comp32.c          | 1241 ++++++++++++++
>  arch/mips/net/ebpf_jit_comp64.c          |  987 +++++++++++
>  arch/mips/net/ebpf_jit_core.c            | 1118 +++++++++++++
>  12 files changed, 3663 insertions(+), 1945 deletions(-)
>  delete mode 100644 arch/mips/net/ebpf_jit.c
>  create mode 100644 arch/mips/net/ebpf_jit.h
>  create mode 100644 arch/mips/net/ebpf_jit_comp32.c
>  create mode 100644 arch/mips/net/ebpf_jit_comp64.c
>  create mode 100644 arch/mips/net/ebpf_jit_core.c
>
> --
> 2.25.1
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH bpf-next v1 00/14] MIPS: eBPF: refactor code, add MIPS32 JIT
  2021-07-20  1:25 ` [RFC PATCH bpf-next v1 00/14] MIPS: eBPF: refactor code, " Johan Almbladh
@ 2021-07-20 17:47   ` Alexei Starovoitov
  0 siblings, 0 replies; 32+ messages in thread
From: Alexei Starovoitov @ 2021-07-20 17:47 UTC (permalink / raw)
  To: Johan Almbladh
  Cc: Tony Ambardar, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Thomas Bogendoerfer, Paul Burton,
	Network Development, bpf, linux-mips, Hassan Naveed, David Daney,
	Luke Nelson, Serge Semin, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh

On Mon, Jul 19, 2021 at 6:25 PM Johan Almbladh
<johan.almbladh@anyfinetworks.com> wrote:
>
> I have been focusing on the code the last couple of weeks so I didn't
> see your email until now. I am sure that this comes as much of a
> surprise to you as it did to me. Anyway, can send a patch with my JIT
> implementation tomorrow.

It is surprising to have not one but two mips32 JITs :)
I really hope you folks can figure out the common path forward.
It sounds to me that the register mapping choices in both
implementations are the same (which would be the most debated
part to agree on).
Without seeing Johan's patches it's hard to make any comparison.
So far I like Tony's patches. The refactoring and code sharing is great.

Tony,
what 'static analysis' by the JIT are you referring to?
re: bpf_jit_needs_zext issue between JIT and the verifier.
It's a difficult one.
opt_subreg_zext_lo32_rnd_hi32() shouldn't depend on JIT
(other than bpf_jit_needs_zext).
But you're setting that callback the same way as x86-32 JIT.
So the same bug should be seen there too.
Could you double check if it's the case?
It's either a regression (if both x86-32 and mips32 JITs fail
this test_verifier test) or endianness related (if it's mips32 JIT only).
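
For reference, when bpf_jit_needs_zext() returns true the verifier
inserts an explicit zero-extending mov32 after 32-bit subregister
writes. A minimal sketch of how a JIT detects those inserted insns,
mirroring insn_is_zext() from include/linux/filter.h:

  /* The verifier marks its inserted zext as a mov32 with imm == 1. */
  static inline bool insn_is_zext(const struct bpf_insn *insn)
  {
          return insn->code == (BPF_ALU | BPF_MOV | BPF_X) && insn->imm == 1;
  }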

Thank you both for the exciting work!

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RFC PATCH bpf-next v2 00/16] MIPS: eBPF: refactor code, add MIPS32 JIT
  2021-07-12  0:34 [RFC PATCH bpf-next v1 00/14] MIPS: eBPF: refactor code, add MIPS32 JIT Tony Ambardar
                   ` (13 preceding siblings ...)
  2021-07-20  1:25 ` [RFC PATCH bpf-next v1 00/14] MIPS: eBPF: refactor code, " Johan Almbladh
@ 2021-10-05  8:26 ` Tony Ambardar
  2021-10-05  8:26   ` [RFC PATCH bpf-next v2 01/16] MIPS: eBPF: support BPF_TAIL_CALL in JIT static analysis Tony Ambardar
                     ` (14 more replies)
  14 siblings, 15 replies; 32+ messages in thread
From: Tony Ambardar @ 2021-10-05  8:26 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Thomas Bogendoerfer, Paul Burton
  Cc: Tony Ambardar, netdev, bpf, linux-mips, Johan Almbladh,
	Tiezhu Yang, Hassan Naveed, David Daney, Luke Nelson,
	Serge Semin, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh

This patch series adds an eBPF JIT for MIPS32. The approach taken fixes and
updates existing code to support MIPS64/MIPS32 systems, then refactors
source into a common core and dependent MIPS64 JIT, and finally adds a
MIPS32 eBPF JIT implementation using the common framework. This approach of
developing MIPS64 and MIPS32 JITs in tandem has desirable benefits for
consistency and long-term maintainability, and the iterative refactoring
has helped identify several problems.


Overview
========

The initial code updates and refactoring exposed a number of problems in
the existing MIPS64 JIT, which are fixed in patches #1 to #9. Patch #10
updates common code to support MIPS64/MIPS32 operation. Patch #12 separates
the common core from the MIPS64 JIT code. Patches #13 and #14 add MIPS64
support for BPF_ATOMIC and BPF_JMP32 insns. Patch #15 adds a needed MIPS32
uasm opcode, while patch #16 adds the MIPS32 eBPF JIT.

Updates to the common core notably include support for bpf2bpf calls and
making tailcalls from BPF subprograms (e.g. patch #11). Some lower priority
features such as MIPS R1 ISA support, direct kernel calls (e.g. for TCP
congestion control experiments) and PROBE_MEM support have been omitted.

On MIPS32, 64-bit BPF registers are mapped to 32-bit register pairs, and
all 64-bit operations are built on 32-bit subregister ops. A few
differences from the MIPS64 JIT include:

  * BPF_TAIL_CALL: counter stored on the stack due to register pressure.
  * BPF_JMP | JSET | BPF_K: drop bbit insns only usable on MIPS64 Octeon

Since MIPS32 does not include 64-bit div/mod or atomic opcodes, these BPF
insns are implemented by directly calling the built-in kernel functions
below; a sketch of the call-out approach follows the list:
(with thanks to Luke Nelson for posting similar code online)

  * BPF_ATOMIC | BPF_DW | BPF_ADD (+BPF_FETCH)
  * BPF_ATOMIC | BPF_DW | BPF_AND (+BPF_FETCH)
  * BPF_ATOMIC | BPF_DW | BPF_XOR (+BPF_FETCH)
  * BPF_ATOMIC | BPF_DW | BPF_OR  (+BPF_FETCH)
  * BPF_ATOMIC | BPF_DW | BPF_XCHG
  * BPF_ATOMIC | BPF_DW | BPF_CMPXCHG
  * BPF_ALU64  | BPF_DIV | BPF_X
  * BPF_ALU64  | BPF_DIV | BPF_K
  * BPF_ALU64  | BPF_MOD | BPF_X
  * BPF_ALU64  | BPF_MOD | BPF_K
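
As an illustration of the call-out approach, BPF_ALU64|BPF_DIV|BPF_X
might be emitted as a call to the kernel's div64_u64() roughly as
below. This is only a sketch: the emit_* helper names follow this
series, while the O32 argument-pair word order and the caller-save
details are simplified.

  /* dst /= src, via u64 div64_u64(u64 dividend, u64 divisor) */
  emit_caller_save(ctx);                      /* spill caller-saved regs */
  emit_instr(ctx, move, MIPS_R_A0, HI(dst));  /* dividend in a0:a1 */
  emit_instr(ctx, move, MIPS_R_A1, LO(dst));
  emit_instr(ctx, move, MIPS_R_A2, HI(src));  /* divisor in a2:a3 */
  emit_instr(ctx, move, MIPS_R_A3, LO(src));
  emit_const_to_reg(ctx, MIPS_R_T9, (u64)(unsigned long)&div64_u64);
  emit_instr(ctx, jalr, MIPS_R_RA, MIPS_R_T9);
  emit_instr(ctx, nop);                       /* branch delay slot */
  emit_instr(ctx, move, HI(dst), MIPS_R_V0);  /* 64-bit result in v0:v1 */
  emit_instr(ctx, move, LO(dst), MIPS_R_V1);
  emit_caller_restore(ctx);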


Testing
=======

Testing used LTS kernel 5.10.x and stable 5.13.x running on QEMU/OpenWRT.
The test suite included the 'test_bpf' module, and 'test_verifier' and
'test_progs' from kselftests.

Using 'test_progs' from kselftests proved to be difficult in general
since cross-compilation depends on libbpf/bpftool, which does not support
cross-endian builds. A very hacked build was used, primarily for testing
bpf2bpf calls and tailcalls.

The matrix of test configurations executed for this series covered the
expected register sizes, MIPS ISA releases, and JIT settings:

  WORDSIZE={64-bit,32-bit} x ISA={R2,R6} x JIT={off,on,hardened}

On MIPS32BE and MIPS32LE there was general parity between the results of
interpreter vs. JIT-backed tests with respect to the numbers of PASSED,
SKIPPED, and FAILED tests. The same was also true of MIPS64 retesting.

For example, the results below on MIPS32 are typical. Note that skipped
test 885 is a "scale" test which results in OOM on the QEMU malta MIPS32
test systems used.

  root@OpenWrt:~# sysctl net.core.bpf_jit_enable=1
  root@OpenWrt:~# modprobe test_bpf
  ...
  test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed]
  root@OpenWrt:~# ./test_verifier 0 884
  ...
  Summary: 1231 PASSED, 0 SKIPPED, 20 FAILED
  root@OpenWrt:~# ./test_verifier 886 1184
  ...
  Summary: 459 PASSED, 1 SKIPPED, 2 FAILED
  root@OpenWrt:~# ./test_progs -n 105,106
  ...
  105 subprogs:OK
  106/1 tailcall_1:OK
  106/2 tailcall_2:OK
  106/3 tailcall_3:OK
  106/4 tailcall_4:OK
  106/5 tailcall_5:OK
  106/6 tailcall_bpf2bpf_1:OK
  106/7 tailcall_bpf2bpf_2:OK
  106/8 tailcall_bpf2bpf_3:OK
  106/9 tailcall_bpf2bpf_4:OK
  106 tailcalls:OK
  Summary: 2/9 PASSED, 0 SKIPPED, 0 FAILED


All feedback and suggestions are much appreciated!

---
Change History:

rfc v2:
* Implement all BPF_ATOMIC ops. For MIPS32 BPF_DW insns, call built-in
  64-bit kernel functions.
* Add MIPS64 support for BPF_JMP32 conditionals.
* Support making tailcalls from bpf2bpf functions.
* Support bpf2bpf calls with an extra JIT pass to patch call addresses.
* Add JIT support for bpf_line_info via bpf_prog_fill_jited_linfo().
* Further code optimizations, cleanup and simplification.
* Update kernel docs.

rfc v1:
* Initial code proposal, focused on consistency and maintainability for
  both MIPS32/MIPS64.
* Several MIPS64 bugfixes and factoring out common shareable code.
* Addition of MIPS32 JIT, roughly matching MIPS64 capabilities.

---
Tony Ambardar (16):
  MIPS: eBPF: support BPF_TAIL_CALL in JIT static analysis
  MIPS: eBPF: mask 32-bit index for tail calls
  MIPS: eBPF: fix BPF_ALU|ARSH handling in JIT static analysis
  MIPS: eBPF: support BPF_JMP32 in JIT static analysis
  MIPS: eBPF: fix system hang with verifier dead-code patching
  MIPS: eBPF: fix JIT static analysis hang with bounded loops
  MIPS: eBPF: fix MOD64 insn on R6 ISA
  MIPS: eBPF: support long jump for BPF_JMP|EXIT
  MIPS: eBPF: drop src_reg restriction in BPF_LD|BPF_DW|BPF_IMM
  MIPS: eBPF: add core support for 32/64-bit systems
  bpf: allow tailcalls in subprograms for MIPS64/MIPS32
  MIPS: eBPF: refactor common 32/64-bit functions and headers
  MIPS: eBPF64: support BPF_JMP32 conditionals
  MIPS: eBPF64: implement all BPF_ATOMIC ops
  MIPS: uasm: Enable muhu opcode for MIPS R6
  MIPS: eBPF: add MIPS32 JIT

 Documentation/admin-guide/sysctl/net.rst |    6 +-
 Documentation/networking/filter.rst      |    6 +-
 arch/mips/Kconfig                        |    4 +-
 arch/mips/include/asm/uasm.h             |    1 +
 arch/mips/mm/uasm-mips.c                 |    4 +-
 arch/mips/mm/uasm.c                      |    3 +-
 arch/mips/net/Makefile                   |    8 +-
 arch/mips/net/ebpf_jit.c                 | 1938 ----------------------
 arch/mips/net/ebpf_jit.h                 |  297 ++++
 arch/mips/net/ebpf_jit_comp32.c          | 1398 ++++++++++++++++
 arch/mips/net/ebpf_jit_comp64.c          | 1085 ++++++++++++
 arch/mips/net/ebpf_jit_core.c            | 1193 +++++++++++++
 arch/x86/net/bpf_jit_comp.c              |    6 +
 include/linux/filter.h                   |    1 +
 kernel/bpf/core.c                        |    6 +
 kernel/bpf/verifier.c                    |    3 +-
 16 files changed, 4010 insertions(+), 1949 deletions(-)
 delete mode 100644 arch/mips/net/ebpf_jit.c
 create mode 100644 arch/mips/net/ebpf_jit.h
 create mode 100644 arch/mips/net/ebpf_jit_comp32.c
 create mode 100644 arch/mips/net/ebpf_jit_comp64.c
 create mode 100644 arch/mips/net/ebpf_jit_core.c

-- 
2.25.1


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RFC PATCH bpf-next v2 01/16] MIPS: eBPF: support BPF_TAIL_CALL in JIT static analysis
  2021-10-05  8:26 ` [RFC PATCH bpf-next v2 00/16] " Tony Ambardar
@ 2021-10-05  8:26   ` Tony Ambardar
  2021-10-05  8:26   ` [RFC PATCH bpf-next v2 02/16] MIPS: eBPF: mask 32-bit index for tail calls Tony Ambardar
                     ` (13 subsequent siblings)
  14 siblings, 0 replies; 32+ messages in thread
From: Tony Ambardar @ 2021-10-05  8:26 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Thomas Bogendoerfer, Paul Burton
  Cc: Tony Ambardar, netdev, bpf, linux-mips, Johan Almbladh,
	Tiezhu Yang, Hassan Naveed, David Daney, Luke Nelson,
	Serge Semin, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh

Add support in reg_val_propagate_range() for BPF_TAIL_CALL, fixing many
kernel log WARNINGs ("Unhandled BPF_JMP case") seen during JIT testing.
Treat BPF_TAIL_CALL like a NOP, falling through as if the tail call failed.

Fixes: b6bd53f9c4e8 ("MIPS: Add missing file for eBPF JIT.")
Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
---
 arch/mips/net/ebpf_jit.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/mips/net/ebpf_jit.c b/arch/mips/net/ebpf_jit.c
index 3a73e9375712..0e99cb790564 100644
--- a/arch/mips/net/ebpf_jit.c
+++ b/arch/mips/net/ebpf_jit.c
@@ -1717,6 +1717,9 @@ static int reg_val_propagate_range(struct jit_ctx *ctx, u64 initial_rvt,
 				for (reg = BPF_REG_0; reg <= BPF_REG_5; reg++)
 					set_reg_val_type(&exit_rvt, reg, REG_64BIT);
 
+				rvt[idx] |= RVT_DONE;
+				break;
+			case BPF_TAIL_CALL:
 				rvt[idx] |= RVT_DONE;
 				break;
 			default:
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH bpf-next v2 02/16] MIPS: eBPF: mask 32-bit index for tail calls
  2021-10-05  8:26 ` [RFC PATCH bpf-next v2 00/16] " Tony Ambardar
  2021-10-05  8:26   ` [RFC PATCH bpf-next v2 01/16] MIPS: eBPF: support BPF_TAIL_CALL in JIT static analysis Tony Ambardar
@ 2021-10-05  8:26   ` Tony Ambardar
  2021-10-05  8:26   ` [RFC PATCH bpf-next v2 03/16] MIPS: eBPF: fix BPF_ALU|ARSH handling in JIT static analysis Tony Ambardar
                     ` (12 subsequent siblings)
  14 siblings, 0 replies; 32+ messages in thread
From: Tony Ambardar @ 2021-10-05  8:26 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Thomas Bogendoerfer, Paul Burton
  Cc: Tony Ambardar, netdev, bpf, linux-mips, Johan Almbladh,
	Tiezhu Yang, Hassan Naveed, David Daney, Luke Nelson,
	Serge Semin, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh

The program array index for tail-calls should be 32-bit, so zero-extend to
sanitize the value. This fixes failures seen for test_verifier test:

  852/p runtime/jit: pass > 32bit index to tail_call FAIL retval 2 != 42

Fixes: b6bd53f9c4e8 ("MIPS: Add missing file for eBPF JIT.")
Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
---
 arch/mips/net/ebpf_jit.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/mips/net/ebpf_jit.c b/arch/mips/net/ebpf_jit.c
index 0e99cb790564..82ea20399b70 100644
--- a/arch/mips/net/ebpf_jit.c
+++ b/arch/mips/net/ebpf_jit.c
@@ -611,6 +611,8 @@ static int emit_bpf_tail_call(struct jit_ctx *ctx, int this_idx)
 	 * if (index >= array->map.max_entries)
 	 *     goto out;
 	 */
+	/* Mask index as 32-bit */
+	emit_instr(ctx, dinsu, MIPS_R_A2, MIPS_R_ZERO, 32, 32);
 	off = offsetof(struct bpf_array, map.max_entries);
 	emit_instr(ctx, lwu, MIPS_R_T5, off, MIPS_R_A1);
 	emit_instr(ctx, sltu, MIPS_R_AT, MIPS_R_T5, MIPS_R_A2);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH bpf-next v2 03/16] MIPS: eBPF: fix BPF_ALU|ARSH handling in JIT static analysis
  2021-10-05  8:26 ` [RFC PATCH bpf-next v2 00/16] " Tony Ambardar
  2021-10-05  8:26   ` [RFC PATCH bpf-next v2 01/16] MIPS: eBPF: support BPF_TAIL_CALL in JIT static analysis Tony Ambardar
  2021-10-05  8:26   ` [RFC PATCH bpf-next v2 02/16] MIPS: eBPF: mask 32-bit index for tail calls Tony Ambardar
@ 2021-10-05  8:26   ` Tony Ambardar
  2021-10-05  8:26   ` [RFC PATCH bpf-next v2 04/16] MIPS: eBPF: support BPF_JMP32 " Tony Ambardar
                     ` (11 subsequent siblings)
  14 siblings, 0 replies; 32+ messages in thread
From: Tony Ambardar @ 2021-10-05  8:26 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Thomas Bogendoerfer, Paul Burton
  Cc: Tony Ambardar, netdev, bpf, linux-mips, Johan Almbladh,
	Tiezhu Yang, Hassan Naveed, David Daney, Luke Nelson,
	Serge Semin, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh

Update reg_val_propagate_range() to add the missing case for BPF_ALU|ARSH,
otherwise zero-extension breaks for this opcode.

Resolves failures for test_verifier tests:
  963/p arsh32 reg zero extend check FAIL retval -1 != 0
  964/u arsh32 imm zero extend check FAIL retval -1 != 0
  964/p arsh32 imm zero extend check FAIL retval -1 != 0

Fixes: b6bd53f9c4e8 ("MIPS: Add missing file for eBPF JIT.")
Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
---
 arch/mips/net/ebpf_jit.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/mips/net/ebpf_jit.c b/arch/mips/net/ebpf_jit.c
index 82ea20399b70..b41ebcfb90c4 100644
--- a/arch/mips/net/ebpf_jit.c
+++ b/arch/mips/net/ebpf_jit.c
@@ -1590,6 +1590,7 @@ static int reg_val_propagate_range(struct jit_ctx *ctx, u64 initial_rvt,
 			case BPF_AND:
 			case BPF_LSH:
 			case BPF_RSH:
+			case BPF_ARSH:
 			case BPF_NEG:
 			case BPF_MOD:
 			case BPF_XOR:
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH bpf-next v2 04/16] MIPS: eBPF: support BPF_JMP32 in JIT static analysis
  2021-10-05  8:26 ` [RFC PATCH bpf-next v2 00/16] " Tony Ambardar
                     ` (2 preceding siblings ...)
  2021-10-05  8:26   ` [RFC PATCH bpf-next v2 03/16] MIPS: eBPF: fix BPF_ALU|ARSH handling in JIT static analysis Tony Ambardar
@ 2021-10-05  8:26   ` Tony Ambardar
  2021-10-05  8:26   ` [RFC PATCH bpf-next v2 05/16] MIPS: eBPF: fix system hang with verifier dead-code patching Tony Ambardar
                     ` (10 subsequent siblings)
  14 siblings, 0 replies; 32+ messages in thread
From: Tony Ambardar @ 2021-10-05  8:26 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Thomas Bogendoerfer, Paul Burton
  Cc: Tony Ambardar, netdev, bpf, linux-mips, Johan Almbladh,
	Tiezhu Yang, Hassan Naveed, David Daney, Luke Nelson,
	Serge Semin, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh

While the MIPS64 JIT rejects programs with JMP32 insns, it still performs
initial static analysis. Add support in reg_val_propagate_range() for
BPF_JMP32, fixing kernel log WARNINGs ("Unhandled BPF_JMP case") seen
during JIT testing. Handle BPF_JMP32 the same as BPF_JMP.

Fixes: 092ed0968bb6 ("bpf: verifier support JMP32")
Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
---
 arch/mips/net/ebpf_jit.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/mips/net/ebpf_jit.c b/arch/mips/net/ebpf_jit.c
index b41ebcfb90c4..dbde5d6eefa6 100644
--- a/arch/mips/net/ebpf_jit.c
+++ b/arch/mips/net/ebpf_jit.c
@@ -1686,6 +1686,7 @@ static int reg_val_propagate_range(struct jit_ctx *ctx, u64 initial_rvt,
 			rvt[idx] |= RVT_DONE;
 			break;
 		case BPF_JMP:
+		case BPF_JMP32:
 			switch (BPF_OP(insn->code)) {
 			case BPF_EXIT:
 				rvt[idx] = RVT_DONE | exit_rvt;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH bpf-next v2 05/16] MIPS: eBPF: fix system hang with verifier dead-code patching
  2021-10-05  8:26 ` [RFC PATCH bpf-next v2 00/16] " Tony Ambardar
                     ` (3 preceding siblings ...)
  2021-10-05  8:26   ` [RFC PATCH bpf-next v2 04/16] MIPS: eBPF: support BPF_JMP32 " Tony Ambardar
@ 2021-10-05  8:26   ` Tony Ambardar
  2021-10-05  8:26   ` [RFC PATCH bpf-next v2 06/16] MIPS: eBPF: fix JIT static analysis hang with bounded loops Tony Ambardar
                     ` (9 subsequent siblings)
  14 siblings, 0 replies; 32+ messages in thread
From: Tony Ambardar @ 2021-10-05  8:26 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Thomas Bogendoerfer, Paul Burton
  Cc: Tony Ambardar, netdev, bpf, linux-mips, Johan Almbladh,
	Tiezhu Yang, Hassan Naveed, David Daney, Luke Nelson,
	Serge Semin, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh

Commit 2a5418a13fcf changed verifier dead code handling from patching with
NOPs to using a loop trap made with BPF_JMP_IMM(BPF_JA, 0, 0, -1). This
confuses the JIT static analysis, which follows the loop assuming the
verifier passed safe code, and results in a system hang and RCU stall.
Update reg_val_propagate_range() to fall through these trap insns.

Trigger the bug using test_verifier "check known subreg with unknown reg".

Fixes: 2a5418a13fcf ("bpf: improve dead code sanitizing")
Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
---
 arch/mips/net/ebpf_jit.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/mips/net/ebpf_jit.c b/arch/mips/net/ebpf_jit.c
index dbde5d6eefa6..0928d86cb3b0 100644
--- a/arch/mips/net/ebpf_jit.c
+++ b/arch/mips/net/ebpf_jit.c
@@ -1694,6 +1694,14 @@ static int reg_val_propagate_range(struct jit_ctx *ctx, u64 initial_rvt,
 				return idx;
 			case BPF_JA:
 				rvt[idx] |= RVT_DONE;
+				/*
+				 * Verifier dead code patching can use
+				 * infinite-loop traps, causing hangs and
+				 * RCU stalls here. Treat traps as nops
+				 * if detected and fall through.
+				 */
+				if (insn->off == -1)
+					break;
 				idx += insn->off;
 				break;
 			case BPF_JEQ:
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH bpf-next v2 06/16] MIPS: eBPF: fix JIT static analysis hang with bounded loops
  2021-10-05  8:26 ` [RFC PATCH bpf-next v2 00/16] " Tony Ambardar
                     ` (4 preceding siblings ...)
  2021-10-05  8:26   ` [RFC PATCH bpf-next v2 05/16] MIPS: eBPF: fix system hang with verifier dead-code patching Tony Ambardar
@ 2021-10-05  8:26   ` Tony Ambardar
  2021-10-05  8:26   ` [RFC PATCH bpf-next v2 07/16] MIPS: eBPF: fix MOD64 insn on R6 ISA Tony Ambardar
                     ` (8 subsequent siblings)
  14 siblings, 0 replies; 32+ messages in thread
From: Tony Ambardar @ 2021-10-05  8:26 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Thomas Bogendoerfer, Paul Burton
  Cc: Tony Ambardar, netdev, bpf, linux-mips, Johan Almbladh,
	Tiezhu Yang, Hassan Naveed, David Daney, Luke Nelson,
	Serge Semin, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh

Support for bounded loops allowed the verifier to output backward jumps
such as BPF_JMP_A(-4). These trap the JIT's static analysis in a loop,
resulting in a system hang and eventual RCU stall.

Fix by updating reg_val_propagate_range() to skip backward jumps when in
fallthrough mode and if the jump target has been visited already.

Trigger the bug using the test_verifier test "bounded loop that jumps out
rather than in".

Fixes: 2589726d12a1 ("bpf: introduce bounded loops")
Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
---
 arch/mips/net/ebpf_jit.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/mips/net/ebpf_jit.c b/arch/mips/net/ebpf_jit.c
index 0928d86cb3b0..b8dc6cebefab 100644
--- a/arch/mips/net/ebpf_jit.c
+++ b/arch/mips/net/ebpf_jit.c
@@ -1693,6 +1693,10 @@ static int reg_val_propagate_range(struct jit_ctx *ctx, u64 initial_rvt,
 				rvt[prog->len] = exit_rvt;
 				return idx;
 			case BPF_JA:
+			{
+				int tgt = idx + 1 + insn->off;
+				bool visited = (rvt[tgt] & RVT_FALL_THROUGH);
+
 				rvt[idx] |= RVT_DONE;
 				/*
 				 * Verifier dead code patching can use
@@ -1702,8 +1706,16 @@ static int reg_val_propagate_range(struct jit_ctx *ctx, u64 initial_rvt,
 				 */
 				if (insn->off == -1)
 					break;
+				/*
+				 * Bounded loops cause the same issues in
+				 * fallthrough mode; follow only if jump
+				 * target is unvisited to mitigate.
+				 */
+				if (insn->off < 0 && !follow_taken && visited)
+					break;
 				idx += insn->off;
 				break;
+			}
 			case BPF_JEQ:
 			case BPF_JGT:
 			case BPF_JGE:
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH bpf-next v2 07/16] MIPS: eBPF: fix MOD64 insn on R6 ISA
  2021-10-05  8:26 ` [RFC PATCH bpf-next v2 00/16] " Tony Ambardar
                     ` (5 preceding siblings ...)
  2021-10-05  8:26   ` [RFC PATCH bpf-next v2 06/16] MIPS: eBPF: fix JIT static analysis hang with bounded loops Tony Ambardar
@ 2021-10-05  8:26   ` Tony Ambardar
  2021-10-05  8:26   ` [RFC PATCH bpf-next v2 08/16] MIPS: eBPF: support long jump for BPF_JMP|EXIT Tony Ambardar
                     ` (7 subsequent siblings)
  14 siblings, 0 replies; 32+ messages in thread
From: Tony Ambardar @ 2021-10-05  8:26 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Thomas Bogendoerfer, Paul Burton
  Cc: Tony Ambardar, netdev, bpf, linux-mips, Johan Almbladh,
	Tiezhu Yang, Hassan Naveed, David Daney, Luke Nelson,
	Serge Semin, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh

The BPF_ALU64 | BPF_MOD implementation is broken on MIPS64R6 due to use of
a 32-bit "modu" insn, as shown by the test_verifier failures:

  455/p MOD64 overflow, check 1 FAIL retval 0 != 1 (run 1/1)
  456/p MOD64 overflow, check 2 FAIL retval 0 != 1 (run 1/1)

Resolve by using the 64-bit "dmodu" instead.

Fixes: 6c2c8a188868 ("MIPS: eBPF: Provide eBPF support for MIPS64R6")
Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
---
 arch/mips/net/ebpf_jit.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/mips/net/ebpf_jit.c b/arch/mips/net/ebpf_jit.c
index b8dc6cebefab..00dc20bc0def 100644
--- a/arch/mips/net/ebpf_jit.c
+++ b/arch/mips/net/ebpf_jit.c
@@ -800,7 +800,7 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 			if (bpf_op == BPF_DIV)
 				emit_instr(ctx, ddivu_r6, dst, dst, MIPS_R_AT);
 			else
-				emit_instr(ctx, modu, dst, dst, MIPS_R_AT);
+				emit_instr(ctx, dmodu, dst, dst, MIPS_R_AT);
 			break;
 		}
 		emit_instr(ctx, ddivu, dst, MIPS_R_AT);
@@ -882,7 +882,7 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 					emit_instr(ctx, ddivu_r6,
 							dst, dst, src);
 				else
-					emit_instr(ctx, modu, dst, dst, src);
+					emit_instr(ctx, dmodu, dst, dst, src);
 				break;
 			}
 			emit_instr(ctx, ddivu, dst, src);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH bpf-next v2 08/16] MIPS: eBPF: support long jump for BPF_JMP|EXIT
  2021-10-05  8:26 ` [RFC PATCH bpf-next v2 00/16] " Tony Ambardar
                     ` (6 preceding siblings ...)
  2021-10-05  8:26   ` [RFC PATCH bpf-next v2 07/16] MIPS: eBPF: fix MOD64 insn on R6 ISA Tony Ambardar
@ 2021-10-05  8:26   ` Tony Ambardar
  2021-10-05  8:26   ` [RFC PATCH bpf-next v2 09/16] MIPS: eBPF: drop src_reg restriction in BPF_LD|BPF_DW|BPF_IMM Tony Ambardar
                     ` (6 subsequent siblings)
  14 siblings, 0 replies; 32+ messages in thread
From: Tony Ambardar @ 2021-10-05  8:26 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Thomas Bogendoerfer, Paul Burton
  Cc: Tony Ambardar, netdev, bpf, linux-mips, Johan Almbladh,
	Tiezhu Yang, Hassan Naveed, David Daney, Luke Nelson,
	Serge Semin, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh

Existing JIT code supports only short (18-bit) branches for BPF EXIT, and
results in some tests from module test_bpf not being jited. Update code
to fall back to long (28-bit) jumps if short branches are insufficient.

Before:
  test_bpf: #296 BPF_MAXINSNS: exec all MSH jited:0 1556004 PASS
  test_bpf: #297 BPF_MAXINSNS: ld_abs+get_processor_id jited:0 824957 PASS
  test_bpf: Summary: 378 PASSED, 0 FAILED, [364/366 JIT'ed]

After:
  test_bpf: #296 BPF_MAXINSNS: exec all MSH jited:1 221998 PASS
  test_bpf: #297 BPF_MAXINSNS: ld_abs+get_processor_id jited:1 490507 PASS
  test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed]

Fixes: b6bd53f9c4e8 ("MIPS: Add missing file for eBPF JIT.")
Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
---
 arch/mips/net/ebpf_jit.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/mips/net/ebpf_jit.c b/arch/mips/net/ebpf_jit.c
index 00dc20bc0def..7252cd44ff63 100644
--- a/arch/mips/net/ebpf_jit.c
+++ b/arch/mips/net/ebpf_jit.c
@@ -994,9 +994,14 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 	case BPF_JMP | BPF_EXIT:
 		if (this_idx + 1 < exit_idx) {
 			b_off = b_imm(exit_idx, ctx);
-			if (is_bad_offset(b_off))
-				return -E2BIG;
-			emit_instr(ctx, beq, MIPS_R_ZERO, MIPS_R_ZERO, b_off);
+			if (is_bad_offset(b_off)) {
+				target = j_target(ctx, exit_idx);
+				if (target == (unsigned int)-1)
+					return -E2BIG;
+				emit_instr(ctx, j, target);
+			} else {
+				emit_instr(ctx, b, b_off);
+			}
 			emit_instr(ctx, nop);
 		}
 		break;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH bpf-next v2 09/16] MIPS: eBPF: drop src_reg restriction in BPF_LD|BPF_DW|BPF_IMM
  2021-10-05  8:26 ` [RFC PATCH bpf-next v2 00/16] " Tony Ambardar
                     ` (7 preceding siblings ...)
  2021-10-05  8:26   ` [RFC PATCH bpf-next v2 08/16] MIPS: eBPF: support long jump for BPF_JMP|EXIT Tony Ambardar
@ 2021-10-05  8:26   ` Tony Ambardar
  2021-10-05  8:26   ` [RFC PATCH bpf-next v2 10/16] MIPS: eBPF: add core support for 32/64-bit systems Tony Ambardar
                     ` (5 subsequent siblings)
  14 siblings, 0 replies; 32+ messages in thread
From: Tony Ambardar @ 2021-10-05  8:26 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Thomas Bogendoerfer, Paul Burton
  Cc: Tony Ambardar, netdev, bpf, linux-mips, Johan Almbladh,
	Tiezhu Yang, Hassan Naveed, David Daney, Luke Nelson,
	Serge Semin, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh

Stop enforcing (insn->src_reg == 0) and allow special verifier insns such
as "BPF_LD_MAP_FD(BPF_REG_2, 0)", used to refer to a process-local map fd
and introduced in 0246e64d9a5f ("bpf: handle pseudo BPF_LD_IMM64 insn").

This is consistent with other JITs such as riscv32 and also used by
test_verifier (e.g. "runtime/jit: tail_call within bounds, prog once").

Fixes: b6bd53f9c4e8 ("MIPS: Add missing file for eBPF JIT.")
Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
---
 arch/mips/net/ebpf_jit.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/mips/net/ebpf_jit.c b/arch/mips/net/ebpf_jit.c
index 7252cd44ff63..7fbd4e371c80 100644
--- a/arch/mips/net/ebpf_jit.c
+++ b/arch/mips/net/ebpf_jit.c
@@ -1303,8 +1303,6 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 		emit_instr(ctx, nop);
 		break;
 	case BPF_LD | BPF_DW | BPF_IMM:
-		if (insn->src_reg != 0)
-			return -EINVAL;
 		dst = ebpf_to_mips_reg(ctx, insn, dst_reg);
 		if (dst < 0)
 			return dst;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH bpf-next v2 10/16] MIPS: eBPF: add core support for 32/64-bit systems
  2021-10-05  8:26 ` [RFC PATCH bpf-next v2 00/16] " Tony Ambardar
                     ` (8 preceding siblings ...)
  2021-10-05  8:26   ` [RFC PATCH bpf-next v2 09/16] MIPS: eBPF: drop src_reg restriction in BPF_LD|BPF_DW|BPF_IMM Tony Ambardar
@ 2021-10-05  8:26   ` Tony Ambardar
  2021-10-05  8:26   ` [RFC PATCH bpf-next v2 11/16] bpf: allow tailcalls in subprograms for MIPS64/MIPS32 Tony Ambardar
                     ` (4 subsequent siblings)
  14 siblings, 0 replies; 32+ messages in thread
From: Tony Ambardar @ 2021-10-05  8:26 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Thomas Bogendoerfer, Paul Burton
  Cc: Tony Ambardar, netdev, bpf, linux-mips, Johan Almbladh,
	Tiezhu Yang, Hassan Naveed, David Daney, Luke Nelson,
	Serge Semin, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh

Update register definitions and flags for both 32/64-bit operation. Add a
common register lookup table, modifying ebpf_to_mips_reg() to use this,
and update enum and literals for ebpf_to_mips_reg() to be more consistent
and less confusing. Add is64bit() and isbigend() common helper functions.

On MIPS32, BPF registers are held in register pairs defined by the base
register. Word-size and endian-aware helper macros select 32-bit registers
from a pair and generate 32-bit word memory storage offsets. The BPF TCC
is stored to the stack due to register pressure.

Update bpf_int_jit_compile() to properly enable BPF2BPF calls, by adding
support for the extra pass needed to fix up function target addresses.
Also provide bpf line info by calling bpf_prog_fill_jited_linfo().
Modify build_int_prologue() and build_int_epilogue() to handle MIPS32
registers and any adjustments needed during program entry/exit/transfer
when transitioning between the native N64/O32 ABI and the BPF 64-bit ABI.
Also ensure ABI-consistent stack alignment and use the verifier-provided
stack depth during setup, saving considerable stack space.

Update emit_const_to_reg() to work across MIPS64 and MIPS32 systems and
optimize gen_imm_to_reg() to only set the lower halfword if needed.

Rework emit_bpf_tail_call() to also support MIPS32 usage and add common
helpers, emit_bpf_call() and emit_push_args(), handling TCC and ABI
variations on MIPS32/MIPS64. Add tail_call_present() and update tailcall
handling to support mixing BPF2BPF subprograms and tailcalls.

Add sign and zero-extension helpers usable with verifier zext insertion,
gen_zext_insn() and gen_sext_insn().
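
On MIPS32 the cores of these helpers are small; a plausible sketch
(parameter handling assumed, not taken verbatim from the patch):

  /* Zero-extend: clear the high word of the register pair. */
  emit_instr(ctx, move, HI(reg), MIPS_R_ZERO);

  /* Sign-extend: copy the low word's sign bit across the high word. */
  emit_instr(ctx, sra, HI(reg), LO(reg), 31);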

Add common functions emit_caller_save() and emit_caller_restore(), which
push and pop all caller-saved BPF registers to the stack, for use with
JIT-internal kernel calls such as those needed for BPF insns unsupported
by native system ISA opcodes. Since these calls would be hidden from any
BPF C compiler, which would normally spill needed registers during a call,
the JIT must handle save/restore itself.

Adopt a dedicated BPF FP (in MIPS_R_S8), and relax FP usage within insns.
This reduces ad-hoc code doing $sp manipulation with temp registers, and
allows wider usage of BPF FP for comparison and arithmetic. For example,
the following tests from test_verifier are now jited, where previously
they were not:

  939/p store PTR_TO_STACK in R10 to array map using BPF_B
  981/p unpriv: cmp pointer with pointer
  984/p unpriv: indirectly pass pointer on stack to helper function
  985/p unpriv: mangle pointer on stack 1
  986/p unpriv: mangle pointer on stack 2
  1001/p unpriv: partial copy of pointer
  1097/p xadd/w check whether src/dst got mangled, 1
  1098/p xadd/w check whether src/dst got mangled, 2

Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
---
 arch/mips/net/ebpf_jit.c | 1008 +++++++++++++++++++++++++++-----------
 1 file changed, 729 insertions(+), 279 deletions(-)

diff --git a/arch/mips/net/ebpf_jit.c b/arch/mips/net/ebpf_jit.c
index 7fbd4e371c80..7d8ed8bb19ab 100644
--- a/arch/mips/net/ebpf_jit.c
+++ b/arch/mips/net/ebpf_jit.c
@@ -1,11 +1,13 @@
 // SPDX-License-Identifier: GPL-2.0-only
 /*
- * Just-In-Time compiler for eBPF filters on MIPS
- *
- * Copyright (c) 2017 Cavium, Inc.
+ * Just-In-Time compiler for eBPF filters on MIPS32/MIPS64
+ * Copyright (c) 2021 Tony Ambardar <Tony.Ambardar@gmail.com>
  *
  * Based on code from:
  *
+ * Copyright (c) 2017 Cavium, Inc.
+ * Author: David Daney <david.daney@cavium.com>
+ *
  * Copyright (c) 2014 Imagination Technologies Ltd.
  * Author: Markos Chandras <markos.chandras@imgtec.com>
  */
@@ -22,31 +24,42 @@
 #include <asm/isa-rev.h>
 #include <asm/uasm.h>
 
-/* Registers used by JIT */
+/* Registers used by JIT:	  (MIPS32)	(MIPS64) */
 #define MIPS_R_ZERO	0
 #define MIPS_R_AT	1
-#define MIPS_R_V0	2	/* BPF_R0 */
-#define MIPS_R_V1	3
-#define MIPS_R_A0	4	/* BPF_R1 */
-#define MIPS_R_A1	5	/* BPF_R2 */
-#define MIPS_R_A2	6	/* BPF_R3 */
-#define MIPS_R_A3	7	/* BPF_R4 */
-#define MIPS_R_A4	8	/* BPF_R5 */
-#define MIPS_R_T4	12	/* BPF_AX */
-#define MIPS_R_T5	13
-#define MIPS_R_T6	14
-#define MIPS_R_T7	15
-#define MIPS_R_S0	16	/* BPF_R6 */
-#define MIPS_R_S1	17	/* BPF_R7 */
-#define MIPS_R_S2	18	/* BPF_R8 */
-#define MIPS_R_S3	19	/* BPF_R9 */
-#define MIPS_R_S4	20	/* BPF_TCC */
-#define MIPS_R_S5	21
-#define MIPS_R_S6	22
-#define MIPS_R_S7	23
-#define MIPS_R_T8	24
-#define MIPS_R_T9	25
+#define MIPS_R_V0	2	/* BPF_R0	BPF_R0 */
+#define MIPS_R_V1	3	/* BPF_R0	BPF_TCC */
+#define MIPS_R_A0	4	/* BPF_R1	BPF_R1 */
+#define MIPS_R_A1	5	/* BPF_R1	BPF_R2 */
+#define MIPS_R_A2	6	/* BPF_R2	BPF_R3 */
+#define MIPS_R_A3	7	/* BPF_R2	BPF_R4 */
+
+/* MIPS64 replaces T0-T3 scratch regs with extra arguments A4-A7. */
+#ifdef CONFIG_64BIT
+#  define MIPS_R_A4	8	/* (n/a)	BPF_R5 */
+#else
+#  define MIPS_R_T0	8	/* BPF_R3	(n/a)  */
+#  define MIPS_R_T1	9	/* BPF_R3	(n/a)  */
+#  define MIPS_R_T2	10	/* BPF_R4	(n/a)  */
+#  define MIPS_R_T3	11	/* BPF_R4	(n/a)  */
+#endif
+
+#define MIPS_R_T4	12	/* BPF_R5	BPF_AX */
+#define MIPS_R_T5	13	/* BPF_R5	(free) */
+#define MIPS_R_T6	14	/* BPF_AX	(used) */
+#define MIPS_R_T7	15	/* BPF_AX	(free) */
+#define MIPS_R_S0	16	/* BPF_R6	BPF_R6 */
+#define MIPS_R_S1	17	/* BPF_R6	BPF_R7 */
+#define MIPS_R_S2	18	/* BPF_R7	BPF_R8 */
+#define MIPS_R_S3	19	/* BPF_R7	BPF_R9 */
+#define MIPS_R_S4	20	/* BPF_R8	BPF_TCC */
+#define MIPS_R_S5	21	/* BPF_R8	(free) */
+#define MIPS_R_S6	22	/* BPF_R9	(free) */
+#define MIPS_R_S7	23	/* BPF_R9	(free) */
+#define MIPS_R_T8	24	/* (used)	(used) */
+#define MIPS_R_T9	25	/* (used)	(used) */
 #define MIPS_R_SP	29
+#define MIPS_R_S8	30	/* BPF_R10	BPF_R10 */
 #define MIPS_R_RA	31
 
 /* eBPF flags */
@@ -55,10 +68,117 @@
 #define EBPF_SAVE_S2	BIT(2)
 #define EBPF_SAVE_S3	BIT(3)
 #define EBPF_SAVE_S4	BIT(4)
-#define EBPF_SAVE_RA	BIT(5)
-#define EBPF_SEEN_FP	BIT(6)
-#define EBPF_SEEN_TC	BIT(7)
-#define EBPF_TCC_IN_V1	BIT(8)
+#define EBPF_SAVE_S5	BIT(5)
+#define EBPF_SAVE_S6	BIT(6)
+#define EBPF_SAVE_S7	BIT(7)
+#define EBPF_SAVE_S8	BIT(8)
+#define EBPF_SAVE_RA	BIT(9)
+#define EBPF_SEEN_FP	BIT(10)
+#define EBPF_SEEN_TC	BIT(11)
+#define EBPF_TCC_IN_RUN	BIT(12)
+
+/*
+ * Extra JIT registers dedicated to holding TCC during runtime or saving
+ * across calls.
+ */
+enum {
+	JIT_RUN_TCC = MAX_BPF_JIT_REG,
+	JIT_SAV_TCC
+};
+/* Temporary register for passing TCC if nothing dedicated. */
+#define TEMP_PASS_TCC MIPS_R_T8
+
+/*
+ * Word-size and endianness-aware helpers for building MIPS32 vs MIPS64
+ * tables and selecting 32-bit subregisters from a register pair base.
+ * Simplify use by emulating MIPS_R_SP and MIPS_R_ZERO as register pairs
+ * and adding HI/LO word memory offsets.
+ */
+#ifdef CONFIG_64BIT
+#  define HI(reg) (reg)
+#  define LO(reg) (reg)
+#  define OFFHI(mem) (mem)
+#  define OFFLO(mem) (mem)
+#else	/* CONFIG_32BIT */
+#  ifdef __BIG_ENDIAN
+#    define HI(reg) ((reg) == MIPS_R_SP ? MIPS_R_ZERO : \
+		     (reg) == MIPS_R_S8 ? MIPS_R_ZERO : \
+		     (reg))
+#    define LO(reg) ((reg) == MIPS_R_ZERO ? (reg) : \
+		     (reg) == MIPS_R_SP ? (reg) : \
+		     (reg) == MIPS_R_S8 ? (reg) : \
+		     (reg) + 1)
+#    define OFFHI(mem) (mem)
+#    define OFFLO(mem) ((mem) + sizeof(long))
+#  else	/* __LITTLE_ENDIAN */
+#    define HI(reg) ((reg) == MIPS_R_ZERO ? (reg) : \
+		     (reg) == MIPS_R_SP ? MIPS_R_ZERO : \
+		     (reg) == MIPS_R_S8 ? MIPS_R_ZERO : \
+		     (reg) + 1)
+#    define LO(reg) (reg)
+#    define OFFHI(mem) ((mem) + sizeof(long))
+#    define OFFLO(mem) (mem)
+#  endif
+#endif
+
+#ifdef CONFIG_64BIT
+#  define M(expr32, expr64) (expr64)
+#else
+#  define M(expr32, expr64) (expr32)
+#endif
+const struct {
+	/* Register or pair base */
+	int reg;
+	/* Register flags */
+	u32 flags;
+	/* Usage table:   (MIPS32)			 (MIPS64) */
+} bpf2mips[] = {
+	/* Return value from in-kernel function, and exit value from eBPF. */
+	[BPF_REG_0] =  {M(MIPS_R_V0,			MIPS_R_V0)},
+	/* Arguments from eBPF program to in-kernel/BPF functions. */
+	[BPF_REG_1] =  {M(MIPS_R_A0,			MIPS_R_A0)},
+	[BPF_REG_2] =  {M(MIPS_R_A2,			MIPS_R_A1)},
+	[BPF_REG_3] =  {M(MIPS_R_T0,			MIPS_R_A2)},
+	[BPF_REG_4] =  {M(MIPS_R_T2,			MIPS_R_A3)},
+	[BPF_REG_5] =  {M(MIPS_R_T4,			MIPS_R_A4)},
+	/* Callee-saved registers preserved by in-kernel/BPF functions. */
+	[BPF_REG_6] =  {M(MIPS_R_S0,			MIPS_R_S0),
+			M(EBPF_SAVE_S0|EBPF_SAVE_S1,	EBPF_SAVE_S0)},
+	[BPF_REG_7] =  {M(MIPS_R_S2,			MIPS_R_S1),
+			M(EBPF_SAVE_S2|EBPF_SAVE_S3,	EBPF_SAVE_S1)},
+	[BPF_REG_8] =  {M(MIPS_R_S4,			MIPS_R_S2),
+			M(EBPF_SAVE_S4|EBPF_SAVE_S5,	EBPF_SAVE_S2)},
+	[BPF_REG_9] =  {M(MIPS_R_S6,			MIPS_R_S3),
+			M(EBPF_SAVE_S6|EBPF_SAVE_S7,	EBPF_SAVE_S3)},
+	[BPF_REG_10] = {M(MIPS_R_S8,			MIPS_R_S8),
+			M(EBPF_SAVE_S8|EBPF_SEEN_FP,	EBPF_SAVE_S8|EBPF_SEEN_FP)},
+	/* Internal register for rewriting insns during JIT blinding. */
+	[BPF_REG_AX] = {M(MIPS_R_T6,			MIPS_R_T4)},
+	/*
+	 * Internal registers for TCC runtime holding and saving during
+	 * calls. A zero save register indicates using scratch space on
+	 * the stack for storage during calls. A zero hold register means
+	 * no dedicated register holds TCC during runtime (but a temp reg
+	 * still passes TCC to tailcall or bpf2bpf call).
+	 */
+	[JIT_RUN_TCC] =	{M(0,				MIPS_R_V1)},
+	[JIT_SAV_TCC] =	{M(0,				MIPS_R_S4),
+			 M(0,				EBPF_SAVE_S4)}
+};
+#undef M
+
+static inline bool is64bit(void)
+{
+	return IS_ENABLED(CONFIG_64BIT);
+}
+
+static inline bool isbigend(void)
+{
+	return IS_ENABLED(CONFIG_CPU_BIG_ENDIAN);
+}
+
+/* Stack region alignment under N64 and O32 ABIs */
+#define STACK_ALIGN (2 * sizeof(long))
 
 /*
  * For the mips64 ISA, we need to track the value range or type for
@@ -89,17 +209,21 @@ enum reg_val_type {
 
 /**
  * struct jit_ctx - JIT context
- * @skf:		The sk_filter
+ * @prog:		The program
  * @stack_size:		eBPF stack size
+ * @bpf_stack_off:	eBPF FP offset
+ * @prolog_skip:	Prologue insns skipped by BPF-ABI callers
  * @idx:		Instruction index
  * @flags:		JIT flags
  * @offsets:		Instruction offsets
- * @target:		Memory location for the compiled filter
- * @reg_val_types	Packed enum reg_val_type for each register.
+ * @target:		Memory location for compiled instructions
+ * @reg_val_types:	Packed enum reg_val_type for each register
  */
 struct jit_ctx {
-	const struct bpf_prog *skf;
+	const struct bpf_prog *prog;
 	int stack_size;
+	int bpf_stack_off;
+	int prolog_skip;
 	u32 idx;
 	u32 flags;
 	u32 *offsets;
@@ -177,132 +301,192 @@ static u32 b_imm(unsigned int tgt, struct jit_ctx *ctx)
 		(ctx->idx * 4) - 4;
 }
 
-enum which_ebpf_reg {
-	src_reg,
-	src_reg_no_fp,
-	dst_reg,
-	dst_reg_fp_ok
+/* Sign-extend dst register or HI 32-bit reg of pair. */
+static inline void gen_sext_insn(int dst, struct jit_ctx *ctx)
+{
+	if (is64bit())
+		emit_instr(ctx, sll, dst, dst, 0);
+	else
+		emit_instr(ctx, sra, HI(dst), LO(dst), 31);
+}
+
+/*
+ * Zero-extend dst register or HI 32-bit reg of pair, if either forced
+ * or the BPF verifier does not insert its own zext insns.
+ */
+static inline void gen_zext_insn(int dst, bool force, struct jit_ctx *ctx)
+{
+	if (!ctx->prog->aux->verifier_zext || force) {
+		if (is64bit())
+			emit_instr(ctx, dinsu, dst, MIPS_R_ZERO, 32, 32);
+		else
+			emit_instr(ctx, and, HI(dst), MIPS_R_ZERO, MIPS_R_ZERO);
+	}
+}
+
+static inline bool tail_call_present(struct jit_ctx *ctx)
+{
+	return ctx->flags & EBPF_SEEN_TC || ctx->prog->aux->tail_call_reachable;
+}
+
+enum reg_usage {
+	REG_SRC_FP_OK,
+	REG_SRC_NO_FP,
+	REG_DST_FP_OK,
+	REG_DST_NO_FP
 };
 
 /*
  * For eBPF, the register mapping naturally falls out of the
- * requirements of eBPF and the MIPS n64 ABI.  We don't maintain a
- * separate frame pointer, so BPF_REG_10 relative accesses are
- * adjusted to be $sp relative.
+ * requirements of eBPF and the MIPS N64/O32 ABIs. We also maintain
+ * a separate frame pointer, setting BPF_REG_10 relative to $sp.
  */
 static int ebpf_to_mips_reg(struct jit_ctx *ctx,
 			    const struct bpf_insn *insn,
-			    enum which_ebpf_reg w)
+			    enum reg_usage u)
 {
-	int ebpf_reg = (w == src_reg || w == src_reg_no_fp) ?
+	int ebpf_reg = (u == REG_SRC_FP_OK || u == REG_SRC_NO_FP) ?
 		insn->src_reg : insn->dst_reg;
 
 	switch (ebpf_reg) {
 	case BPF_REG_0:
-		return MIPS_R_V0;
 	case BPF_REG_1:
-		return MIPS_R_A0;
 	case BPF_REG_2:
-		return MIPS_R_A1;
 	case BPF_REG_3:
-		return MIPS_R_A2;
 	case BPF_REG_4:
-		return MIPS_R_A3;
 	case BPF_REG_5:
-		return MIPS_R_A4;
 	case BPF_REG_6:
-		ctx->flags |= EBPF_SAVE_S0;
-		return MIPS_R_S0;
 	case BPF_REG_7:
-		ctx->flags |= EBPF_SAVE_S1;
-		return MIPS_R_S1;
 	case BPF_REG_8:
-		ctx->flags |= EBPF_SAVE_S2;
-		return MIPS_R_S2;
 	case BPF_REG_9:
-		ctx->flags |= EBPF_SAVE_S3;
-		return MIPS_R_S3;
+	case BPF_REG_AX:
+		ctx->flags |= bpf2mips[ebpf_reg].flags;
+		return bpf2mips[ebpf_reg].reg;
 	case BPF_REG_10:
-		if (w == dst_reg || w == src_reg_no_fp)
+		if (u == REG_DST_NO_FP || u == REG_SRC_NO_FP)
 			goto bad_reg;
-		ctx->flags |= EBPF_SEEN_FP;
-		/*
-		 * Needs special handling, return something that
-		 * cannot be clobbered just in case.
-		 */
-		return MIPS_R_ZERO;
-	case BPF_REG_AX:
-		return MIPS_R_T4;
+		ctx->flags |= bpf2mips[ebpf_reg].flags;
+		return bpf2mips[ebpf_reg].reg;
 	default:
 bad_reg:
 		WARN(1, "Illegal bpf reg: %d\n", ebpf_reg);
 		return -EINVAL;
 	}
 }
+
 /*
  * eBPF stack frame will be something like:
  *
  *  Entry $sp ------>   +--------------------------------+
  *                      |   $ra  (optional)              |
  *                      +--------------------------------+
- *                      |   $s0  (optional)              |
+ *                      |   $s8  (optional)              |
  *                      +--------------------------------+
- *                      |   $s1  (optional)              |
+ *                      |   $s7  (optional)              |
  *                      +--------------------------------+
- *                      |   $s2  (optional)              |
+ *                      |   $s6  (optional)              |
  *                      +--------------------------------+
- *                      |   $s3  (optional)              |
+ *                      |   $s5  (optional)              |
  *                      +--------------------------------+
  *                      |   $s4  (optional)              |
  *                      +--------------------------------+
- *                      |   tmp-storage  (if $ra saved)  |
- * $sp + tmp_offset --> +--------------------------------+ <--BPF_REG_10
+ *                      |   $s3  (optional)              |
+ *                      +--------------------------------+
+ *                      |   $s2  (optional)              |
+ *                      +--------------------------------+
+ *                      |   $s1  (optional)              |
+ *                      +--------------------------------+
+ *                      |   $s0  (optional)              |
+ *                      +--------------------------------+
+ *                      |   tmp-storage  (optional)      |
+ * $sp + bpf_stack_off->+--------------------------------+ <--BPF_REG_10
  *                      |   BPF_REG_10 relative storage  |
  *                      |    MAX_BPF_STACK (optional)    |
  *                      |      .                         |
  *                      |      .                         |
  *                      |      .                         |
- *     $sp -------->    +--------------------------------+
+ *        $sp ------>   +--------------------------------+
  *
  * If BPF_REG_10 is never referenced, then the MAX_BPF_STACK sized
  * area is not allocated.
  */
-static int gen_int_prologue(struct jit_ctx *ctx)
+static int build_int_prologue(struct jit_ctx *ctx)
 {
+	int tcc_run = bpf2mips[JIT_RUN_TCC].reg ?
+		      bpf2mips[JIT_RUN_TCC].reg :
+		      TEMP_PASS_TCC;
+	int tcc_sav = bpf2mips[JIT_SAV_TCC].reg;
+	const struct bpf_prog *prog = ctx->prog;
+	int r10 = bpf2mips[BPF_REG_10].reg;
+	int r1 = bpf2mips[BPF_REG_1].reg;
 	int stack_adjust = 0;
 	int store_offset;
 	int locals_size;
+	int start_idx;
 
 	if (ctx->flags & EBPF_SAVE_RA)
-		/*
-		 * If RA we are doing a function call and may need
-		 * extra 8-byte tmp area.
-		 */
-		stack_adjust += 2 * sizeof(long);
-	if (ctx->flags & EBPF_SAVE_S0)
 		stack_adjust += sizeof(long);
-	if (ctx->flags & EBPF_SAVE_S1)
+	if (ctx->flags & EBPF_SAVE_S8)
 		stack_adjust += sizeof(long);
-	if (ctx->flags & EBPF_SAVE_S2)
+	if (ctx->flags & EBPF_SAVE_S7)
 		stack_adjust += sizeof(long);
-	if (ctx->flags & EBPF_SAVE_S3)
+	if (ctx->flags & EBPF_SAVE_S6)
+		stack_adjust += sizeof(long);
+	if (ctx->flags & EBPF_SAVE_S5)
 		stack_adjust += sizeof(long);
 	if (ctx->flags & EBPF_SAVE_S4)
 		stack_adjust += sizeof(long);
+	if (ctx->flags & EBPF_SAVE_S3)
+		stack_adjust += sizeof(long);
+	if (ctx->flags & EBPF_SAVE_S2)
+		stack_adjust += sizeof(long);
+	if (ctx->flags & EBPF_SAVE_S1)
+		stack_adjust += sizeof(long);
+	if (ctx->flags & EBPF_SAVE_S0)
+		stack_adjust += sizeof(long);
+	if (tail_call_present(ctx) &&
+	    !(ctx->flags & EBPF_TCC_IN_RUN) && !tcc_sav)
+		/* Allocate scratch space for holding TCC if needed. */
+		stack_adjust += sizeof(long);
+
+	stack_adjust = ALIGN(stack_adjust, STACK_ALIGN);
 
-	BUILD_BUG_ON(MAX_BPF_STACK & 7);
-	locals_size = (ctx->flags & EBPF_SEEN_FP) ? MAX_BPF_STACK : 0;
+	locals_size = (ctx->flags & EBPF_SEEN_FP) ? prog->aux->stack_depth : 0;
+	locals_size = ALIGN(locals_size, STACK_ALIGN);
 
 	stack_adjust += locals_size;
 
 	ctx->stack_size = stack_adjust;
+	ctx->bpf_stack_off = locals_size;
 
 	/*
-	 * First instruction initializes the tail call count (TCC).
-	 * On tail call we skip this instruction, and the TCC is
-	 * passed in $v1 from the caller.
+	 * First instruction initializes the tail call count (TCC) and
+	 * assumes a call from kernel using the native ABI. Calls made
+	 * using the BPF ABI (bpf2bpf or tail call) will skip this insn
+	 * and pass the TCC via register.
 	 */
-	emit_instr(ctx, addiu, MIPS_R_V1, MIPS_R_ZERO, MAX_TAIL_CALL_CNT);
+	start_idx = ctx->idx;
+	emit_instr(ctx, addiu, tcc_run, MIPS_R_ZERO, MAX_TAIL_CALL_CNT);
+
+	/*
+	 * When called from kernel under O32 ABI we must set up BPF R1
+	 * context, since BPF R1 is an endian-order register pair ($a0:$a1
+	 * or $a1:$a0) but context is always passed in $a0 as a 32-bit
+	 * pointer. As above, bpf2bpf and tail calls will skip these insns
+	 * since all registers are correctly set up already.
+	 */
+	if (!is64bit()) {
+		if (isbigend())
+			emit_instr(ctx, move, LO(r1), MIPS_R_A0);
+		/* Sanitize upper 32-bit reg */
+		gen_zext_insn(r1, true, ctx);
+	}
+	/*
+	 * Calls using BPF ABI (bpf2bpf and tail calls) will skip TCC
+	 * initialization and R1 context fixup needed by kernel calls.
+	 */
+	ctx->prolog_skip = (ctx->idx - start_idx) * 4;
+
 	if (stack_adjust)
 		emit_instr_long(ctx, daddiu, addiu,
 					MIPS_R_SP, MIPS_R_SP, -stack_adjust);
@@ -316,24 +500,24 @@ static int gen_int_prologue(struct jit_ctx *ctx)
 					MIPS_R_RA, store_offset, MIPS_R_SP);
 		store_offset -= sizeof(long);
 	}
-	if (ctx->flags & EBPF_SAVE_S0) {
+	if (ctx->flags & EBPF_SAVE_S8) {
 		emit_instr_long(ctx, sd, sw,
-					MIPS_R_S0, store_offset, MIPS_R_SP);
+					MIPS_R_S8, store_offset, MIPS_R_SP);
 		store_offset -= sizeof(long);
 	}
-	if (ctx->flags & EBPF_SAVE_S1) {
+	if (ctx->flags & EBPF_SAVE_S7) {
 		emit_instr_long(ctx, sd, sw,
-					MIPS_R_S1, store_offset, MIPS_R_SP);
+					MIPS_R_S7, store_offset, MIPS_R_SP);
 		store_offset -= sizeof(long);
 	}
-	if (ctx->flags & EBPF_SAVE_S2) {
+	if (ctx->flags & EBPF_SAVE_S6) {
 		emit_instr_long(ctx, sd, sw,
-					MIPS_R_S2, store_offset, MIPS_R_SP);
+					MIPS_R_S6, store_offset, MIPS_R_SP);
 		store_offset -= sizeof(long);
 	}
-	if (ctx->flags & EBPF_SAVE_S3) {
+	if (ctx->flags & EBPF_SAVE_S5) {
 		emit_instr_long(ctx, sd, sw,
-					MIPS_R_S3, store_offset, MIPS_R_SP);
+					MIPS_R_S5, store_offset, MIPS_R_SP);
 		store_offset -= sizeof(long);
 	}
 	if (ctx->flags & EBPF_SAVE_S4) {
@@ -341,27 +525,95 @@ static int gen_int_prologue(struct jit_ctx *ctx)
 					MIPS_R_S4, store_offset, MIPS_R_SP);
 		store_offset -= sizeof(long);
 	}
+	if (ctx->flags & EBPF_SAVE_S3) {
+		emit_instr_long(ctx, sd, sw,
+					MIPS_R_S3, store_offset, MIPS_R_SP);
+		store_offset -= sizeof(long);
+	}
+	if (ctx->flags & EBPF_SAVE_S2) {
+		emit_instr_long(ctx, sd, sw,
+					MIPS_R_S2, store_offset, MIPS_R_SP);
+		store_offset -= sizeof(long);
+	}
+	if (ctx->flags & EBPF_SAVE_S1) {
+		emit_instr_long(ctx, sd, sw,
+					MIPS_R_S1, store_offset, MIPS_R_SP);
+		store_offset -= sizeof(long);
+	}
+	if (ctx->flags & EBPF_SAVE_S0) {
+		emit_instr_long(ctx, sd, sw,
+					MIPS_R_S0, store_offset, MIPS_R_SP);
+		store_offset -= sizeof(long);
+	}
+
+	/* Store TCC in backup register or stack scratch space if indicated. */
+	if (tail_call_present(ctx) && !(ctx->flags & EBPF_TCC_IN_RUN)) {
+		if (tcc_sav)
+			emit_instr(ctx, move, tcc_sav, tcc_run);
+		else
+			emit_instr_long(ctx, sd, sw,
+					tcc_run, ctx->bpf_stack_off, MIPS_R_SP);
+	}
 
-	if ((ctx->flags & EBPF_SEEN_TC) && !(ctx->flags & EBPF_TCC_IN_V1))
-		emit_instr_long(ctx, daddu, addu,
-					MIPS_R_S4, MIPS_R_V1, MIPS_R_ZERO);
+	/* Prepare BPF FP as single-reg ptr, emulate upper 32 bits as needed. */
+	if (ctx->flags & EBPF_SEEN_FP)
+		emit_instr_long(ctx, daddiu, addiu, r10,
+						MIPS_R_SP, ctx->bpf_stack_off);
 
 	return 0;
 }
 
 static int build_int_epilogue(struct jit_ctx *ctx, int dest_reg)
 {
-	const struct bpf_prog *prog = ctx->skf;
+	const struct bpf_prog *prog = ctx->prog;
 	int stack_adjust = ctx->stack_size;
 	int store_offset = stack_adjust - sizeof(long);
+	int ax = bpf2mips[BPF_REG_AX].reg;
+	int r0 = bpf2mips[BPF_REG_0].reg;
 	enum reg_val_type td;
-	int r0 = MIPS_R_V0;
 
-	if (dest_reg == MIPS_R_RA) {
-		/* Don't let zero extended value escape. */
-		td = get_reg_val_type(ctx, prog->len, BPF_REG_0);
-		if (td == REG_64BIT)
-			emit_instr(ctx, sll, r0, r0, 0);
+	/*
+	 * As in prologue code, we default to assuming exit to the kernel.
+	 * Returns to the kernel follow the N64 or O32 ABI. For N64, the
+	 * BPF R0 return value may need to be sign-extended, while O32 may
+	 * need fixup of BPF R0 to place the 32-bit return value in MIPS V0.
+	 *
+	 * Returns to BPF2BPF callers consistently use the BPF 64-bit ABI,
+	 * so register usage and mapping between JIT and OS is unchanged.
+	 * Accommodate by saving unmodified R0 register data to allow a
+	 * BPF caller to restore R0 after we return.
+	 */
+	if (dest_reg == MIPS_R_RA) { /* kernel or bpf2bpf function return */
+		if (is64bit()) {
+			/*
+			 * Backup BPF R0 to AX, allowing the caller to
+			 * restore it in case this is a BPF2BPF rather
+			 * than a kernel return.
+			 */
+			emit_instr(ctx, move, ax, r0);
+			/*
+			 * Don't let zero-extended R0 value escape to
+			 * kernel on return, so sign-extend if needed.
+			 */
+			td = get_reg_val_type(ctx, prog->len, BPF_REG_0);
+			if (td == REG_64BIT)
+				gen_sext_insn(r0, ctx);
+		} else if (isbigend()) { /* and 32-bit */
+			/*
+			 * Backup high 32-bit register of BPF R0 to AX,
+			 * since it occupies MIPS_R_V0 which needs to be
+			 * clobbered for a kernel return.
+			 */
+			emit_instr(ctx, move, HI(ax), HI(r0));
+			/*
+			 * O32 ABI specifies 32-bit return value always
+			 * placed in MIPS_R_V0 regardless of the native
+			 * endianness. This would be in the wrong position
+			 * in a BPF R0 reg pair on big-endian systems, so
+			 * we must relocate.
+			 */
+			emit_instr(ctx, move, MIPS_R_V0, LO(r0));
+		}
 	}
 
 	if (ctx->flags & EBPF_SAVE_RA) {
@@ -369,24 +621,24 @@ static int build_int_epilogue(struct jit_ctx *ctx, int dest_reg)
 					MIPS_R_RA, store_offset, MIPS_R_SP);
 		store_offset -= sizeof(long);
 	}
-	if (ctx->flags & EBPF_SAVE_S0) {
+	if (ctx->flags & EBPF_SAVE_S8) {
 		emit_instr_long(ctx, ld, lw,
-					MIPS_R_S0, store_offset, MIPS_R_SP);
+					MIPS_R_S8, store_offset, MIPS_R_SP);
 		store_offset -= sizeof(long);
 	}
-	if (ctx->flags & EBPF_SAVE_S1) {
+	if (ctx->flags & EBPF_SAVE_S7) {
 		emit_instr_long(ctx, ld, lw,
-					MIPS_R_S1, store_offset, MIPS_R_SP);
+					MIPS_R_S7, store_offset, MIPS_R_SP);
 		store_offset -= sizeof(long);
 	}
-	if (ctx->flags & EBPF_SAVE_S2) {
+	if (ctx->flags & EBPF_SAVE_S6) {
 		emit_instr_long(ctx, ld, lw,
-				MIPS_R_S2, store_offset, MIPS_R_SP);
+					MIPS_R_S6, store_offset, MIPS_R_SP);
 		store_offset -= sizeof(long);
 	}
-	if (ctx->flags & EBPF_SAVE_S3) {
+	if (ctx->flags & EBPF_SAVE_S5) {
 		emit_instr_long(ctx, ld, lw,
-					MIPS_R_S3, store_offset, MIPS_R_SP);
+					MIPS_R_S5, store_offset, MIPS_R_SP);
 		store_offset -= sizeof(long);
 	}
 	if (ctx->flags & EBPF_SAVE_S4) {
@@ -394,8 +646,29 @@ static int build_int_epilogue(struct jit_ctx *ctx, int dest_reg)
 					MIPS_R_S4, store_offset, MIPS_R_SP);
 		store_offset -= sizeof(long);
 	}
+	if (ctx->flags & EBPF_SAVE_S3) {
+		emit_instr_long(ctx, ld, lw,
+					MIPS_R_S3, store_offset, MIPS_R_SP);
+		store_offset -= sizeof(long);
+	}
+	if (ctx->flags & EBPF_SAVE_S2) {
+		emit_instr_long(ctx, ld, lw,
+					MIPS_R_S2, store_offset, MIPS_R_SP);
+		store_offset -= sizeof(long);
+	}
+	if (ctx->flags & EBPF_SAVE_S1) {
+		emit_instr_long(ctx, ld, lw,
+					MIPS_R_S1, store_offset, MIPS_R_SP);
+		store_offset -= sizeof(long);
+	}
+	if (ctx->flags & EBPF_SAVE_S0) {
+		emit_instr_long(ctx, ld, lw,
+					MIPS_R_S0, store_offset, MIPS_R_SP);
+		store_offset -= sizeof(long);
+	}
 	emit_instr(ctx, jr, dest_reg);
 
+	/* Delay slot */
 	if (stack_adjust)
 		emit_instr_long(ctx, daddiu, addiu,
 					MIPS_R_SP, MIPS_R_SP, stack_adjust);
@@ -415,7 +688,9 @@ static void gen_imm_to_reg(const struct bpf_insn *insn, int reg,
 		int upper = insn->imm - lower;
 
 		emit_instr(ctx, lui, reg, upper >> 16);
-		emit_instr(ctx, addiu, reg, reg, lower);
+		/* lui already clears lower halfword */
+		if (lower)
+			emit_instr(ctx, addiu, reg, reg, lower);
 	}
 }
 
@@ -423,7 +698,7 @@ static int gen_imm_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 			int idx)
 {
 	int upper_bound, lower_bound;
-	int dst = ebpf_to_mips_reg(ctx, insn, dst_reg);
+	int dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
 
 	if (dst < 0)
 		return dst;
@@ -564,12 +839,12 @@ static int gen_imm_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 	return 0;
 }
 
-static void emit_const_to_reg(struct jit_ctx *ctx, int dst, u64 value)
+static void emit_const_to_reg(struct jit_ctx *ctx, int dst, unsigned long value)
 {
-	if (value >= 0xffffffffffff8000ull || value < 0x8000ull) {
-		emit_instr(ctx, daddiu, dst, MIPS_R_ZERO, (int)value);
-	} else if (value >= 0xffffffff80000000ull ||
-		   (value < 0x80000000 && value > 0xffff)) {
+	if (value >= S16_MIN || value <= S16_MAX) {
+		emit_instr_long(ctx, daddiu, addiu, dst, MIPS_R_ZERO, (int)value);
+	} else if (value >= S32_MIN ||
+		   (value <= S32_MAX && value > U16_MAX)) {
 		emit_instr(ctx, lui, dst, (s32)(s16)(value >> 16));
 		emit_instr(ctx, ori, dst, dst, (unsigned int)(value & 0xffff));
 	} else {
@@ -601,54 +876,167 @@ static void emit_const_to_reg(struct jit_ctx *ctx, int dst, u64 value)
 	}
 }
 
+/*
+ * Push BPF regs R3-R5 to the stack, skipping BPF regs R1-R2 which are
+ * passed via MIPS register pairs in $a0-$a3. Register order within pairs
+ * and the memory storage order are identical i.e. endian native.
+ */
+static void emit_push_args(struct jit_ctx *ctx)
+{
+	int store_offset = 2 * sizeof(u64); /* Skip R1-R2 in $a0-$a3 */
+	int bpf, reg;
+
+	for (bpf = BPF_REG_3; bpf <= BPF_REG_5; bpf++) {
+		reg = bpf2mips[bpf].reg;
+
+		emit_instr(ctx, sw, LO(reg), OFFLO(store_offset), MIPS_R_SP);
+		emit_instr(ctx, sw, HI(reg), OFFHI(store_offset), MIPS_R_SP);
+		store_offset += sizeof(u64);
+	}
+}
+
+/*
+ * Common helper for BPF_CALL insn, handling TCC and ABI variations.
+ * Kernel calls under O32 ABI require arguments passed on the stack,
+ * while BPF2BPF calls need the TCC passed via register as expected
+ * by the subprog's prologue.
+ *
+ * Under MIPS32 O32 ABI calling convention, u64 BPF regs R1-R2 are passed
+ * via reg pairs in $a0-$a3, while BPF regs R3-R5 are passed via the stack.
+ * Stack space is still reserved for $a0-$a3, and the whole area aligned.
+ */
+#define ARGS_SIZE (5 * sizeof(u64))
+
+void emit_bpf_call(struct jit_ctx *ctx, const struct bpf_insn *insn)
+{
+	int stack_adjust = ALIGN(ARGS_SIZE, STACK_ALIGN);
+	int tcc_run = bpf2mips[JIT_RUN_TCC].reg ?
+		      bpf2mips[JIT_RUN_TCC].reg :
+		      TEMP_PASS_TCC;
+	int tcc_sav = bpf2mips[JIT_SAV_TCC].reg;
+	int ax = bpf2mips[BPF_REG_AX].reg;
+	int r0 = bpf2mips[BPF_REG_0].reg;
+	long func_addr;
+
+	ctx->flags |= EBPF_SAVE_RA;
+
+	/* Ensure TCC passed into BPF subprog */
+	if ((insn->src_reg == BPF_PSEUDO_CALL) &&
+	    tail_call_present(ctx) && !(ctx->flags & EBPF_TCC_IN_RUN)) {
+		/* Set TCC from reg or stack */
+		if (tcc_sav)
+			emit_instr(ctx, move, tcc_run, tcc_sav);
+		else
+			emit_instr_long(ctx, ld, lw, tcc_run,
+						ctx->bpf_stack_off, MIPS_R_SP);
+	}
+
+	/* Push O32 stack args for kernel call */
+	if (!is64bit() && (insn->src_reg != BPF_PSEUDO_CALL)) {
+		emit_instr(ctx, addiu, MIPS_R_SP, MIPS_R_SP, -stack_adjust);
+		emit_push_args(ctx);
+	}
+
+	func_addr = (long)__bpf_call_base + insn->imm;
+
+	/* Skip TCC init and R1 register fixup with BPF ABI. */
+	if (insn->src_reg == BPF_PSEUDO_CALL)
+		func_addr += ctx->prolog_skip;
+
+	emit_const_to_reg(ctx, MIPS_R_T9, func_addr);
+	emit_instr(ctx, jalr, MIPS_R_RA, MIPS_R_T9);
+	/* Delay slot */
+	emit_instr(ctx, nop);
+
+	/* Restore stack */
+	if (!is64bit() && (insn->src_reg != BPF_PSEUDO_CALL))
+		emit_instr(ctx, addiu, MIPS_R_SP, MIPS_R_SP, stack_adjust);
+
+	/*
+	 * Assuming a kernel return, a MIPS64 function epilogue may
+	 * sign-extend R0, while MIPS32BE mangles the R0 register pair.
+	 * Undo both for a bpf2bpf call return.
+	 */
+	if (insn->src_reg == BPF_PSEUDO_CALL) {
+		/* Restore BPF R0 from AX */
+		if (is64bit()) {
+			emit_instr(ctx, move, r0, ax);
+		} else if (isbigend()) { /* and 32-bit */
+			emit_instr(ctx, move, LO(r0), MIPS_R_V0);
+			emit_instr(ctx, move, HI(r0), HI(ax));
+		}
+	}
+}
+
+/*
+ * Tail call helper arguments passed via BPF ABI as u64 parameters. On
+ * MIPS64 N64 ABI systems these are native regs, while on MIPS32 O32 ABI
+ * systems these are reg pairs:
+ *
+ * R1 -> &ctx
+ * R2 -> &array
+ * R3 -> index
+ */
 static int emit_bpf_tail_call(struct jit_ctx *ctx, int this_idx)
 {
+	int tcc_run = bpf2mips[JIT_RUN_TCC].reg ?
+		      bpf2mips[JIT_RUN_TCC].reg :
+		      TEMP_PASS_TCC;
+	int tcc_sav = bpf2mips[JIT_SAV_TCC].reg;
+	int r2 = bpf2mips[BPF_REG_2].reg;
+	int r3 = bpf2mips[BPF_REG_3].reg;
 	int off, b_off;
-	int tcc_reg;
+	int tcc;
 
 	ctx->flags |= EBPF_SEEN_TC;
 	/*
 	 * if (index >= array->map.max_entries)
 	 *     goto out;
 	 */
-	/* Mask index as 32-bit */
-	emit_instr(ctx, dinsu, MIPS_R_A2, MIPS_R_ZERO, 32, 32);
+	if (is64bit())
+		/* Mask index as 32-bit */
+		gen_zext_insn(r3, true, ctx);
 	off = offsetof(struct bpf_array, map.max_entries);
-	emit_instr(ctx, lwu, MIPS_R_T5, off, MIPS_R_A1);
-	emit_instr(ctx, sltu, MIPS_R_AT, MIPS_R_T5, MIPS_R_A2);
+	emit_instr_long(ctx, lwu, lw, MIPS_R_AT, off, LO(r2));
+	emit_instr(ctx, sltu, MIPS_R_AT, MIPS_R_AT, LO(r3));
 	b_off = b_imm(this_idx + 1, ctx);
-	emit_instr(ctx, bne, MIPS_R_AT, MIPS_R_ZERO, b_off);
+	emit_instr(ctx, bnez, MIPS_R_AT, b_off);
 	/*
 	 * if (TCC-- < 0)
 	 *     goto out;
 	 */
 	/* Delay slot */
-	tcc_reg = (ctx->flags & EBPF_TCC_IN_V1) ? MIPS_R_V1 : MIPS_R_S4;
-	emit_instr(ctx, daddiu, MIPS_R_T5, tcc_reg, -1);
+	tcc = (ctx->flags & EBPF_TCC_IN_RUN) ? tcc_run : tcc_sav;
+	/* Get TCC from reg or stack */
+	if (tcc)
+		emit_instr(ctx, move, MIPS_R_T8, tcc);
+	else
+		emit_instr_long(ctx, ld, lw, MIPS_R_T8,
+						ctx->bpf_stack_off, MIPS_R_SP);
 	b_off = b_imm(this_idx + 1, ctx);
-	emit_instr(ctx, bltz, tcc_reg, b_off);
+	emit_instr(ctx, bltz, MIPS_R_T8, b_off);
 	/*
 	 * prog = array->ptrs[index];
 	 * if (prog == NULL)
 	 *     goto out;
 	 */
 	/* Delay slot */
-	emit_instr(ctx, dsll, MIPS_R_T8, MIPS_R_A2, 3);
-	emit_instr(ctx, daddu, MIPS_R_T8, MIPS_R_T8, MIPS_R_A1);
+	emit_instr_long(ctx, dsll, sll, MIPS_R_AT, LO(r3), ilog2(sizeof(long)));
+	emit_instr_long(ctx, daddu, addu, MIPS_R_AT, MIPS_R_AT, LO(r2));
 	off = offsetof(struct bpf_array, ptrs);
-	emit_instr(ctx, ld, MIPS_R_AT, off, MIPS_R_T8);
+	emit_instr_long(ctx, ld, lw, MIPS_R_AT, off, MIPS_R_AT);
 	b_off = b_imm(this_idx + 1, ctx);
-	emit_instr(ctx, beq, MIPS_R_AT, MIPS_R_ZERO, b_off);
+	emit_instr(ctx, beqz, MIPS_R_AT, b_off);
 	/* Delay slot */
 	emit_instr(ctx, nop);
 
-	/* goto *(prog->bpf_func + 4); */
+	/* goto *(prog->bpf_func + skip); */
 	off = offsetof(struct bpf_prog, bpf_func);
-	emit_instr(ctx, ld, MIPS_R_T9, off, MIPS_R_AT);
-	/* All systems are go... propagate TCC */
-	emit_instr(ctx, daddu, MIPS_R_V1, MIPS_R_T5, MIPS_R_ZERO);
-	/* Skip first instruction (TCC initialization) */
-	emit_instr(ctx, daddiu, MIPS_R_T9, MIPS_R_T9, 4);
+	emit_instr_long(ctx, ld, lw, MIPS_R_T9, off, MIPS_R_AT);
+	/* All systems are go... decrement and propagate TCC */
+	emit_instr_long(ctx, daddiu, addiu, tcc_run, MIPS_R_T8, -1);
+	/* Skip first instructions (TCC init and R1 fixup) */
+	emit_instr_long(ctx, daddiu, addiu, MIPS_R_T9, MIPS_R_T9, ctx->prolog_skip);
 	return build_int_epilogue(ctx, MIPS_R_T9);
 }
 
@@ -696,7 +1084,7 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 			return r;
 		break;
 	case BPF_ALU64 | BPF_MUL | BPF_K: /* ALU64_IMM */
-		dst = ebpf_to_mips_reg(ctx, insn, dst_reg);
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
 		if (dst < 0)
 			return dst;
 		if (get_reg_val_type(ctx, this_idx, insn->dst_reg) == REG_32BIT)
@@ -712,7 +1100,7 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 		}
 		break;
 	case BPF_ALU64 | BPF_NEG | BPF_K: /* ALU64_IMM */
-		dst = ebpf_to_mips_reg(ctx, insn, dst_reg);
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
 		if (dst < 0)
 			return dst;
 		if (get_reg_val_type(ctx, this_idx, insn->dst_reg) == REG_32BIT)
@@ -720,7 +1108,7 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 		emit_instr(ctx, dsubu, dst, MIPS_R_ZERO, dst);
 		break;
 	case BPF_ALU | BPF_MUL | BPF_K: /* ALU_IMM */
-		dst = ebpf_to_mips_reg(ctx, insn, dst_reg);
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
 		if (dst < 0)
 			return dst;
 		td = get_reg_val_type(ctx, this_idx, insn->dst_reg);
@@ -739,7 +1127,7 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 		}
 		break;
 	case BPF_ALU | BPF_NEG | BPF_K: /* ALU_IMM */
-		dst = ebpf_to_mips_reg(ctx, insn, dst_reg);
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
 		if (dst < 0)
 			return dst;
 		td = get_reg_val_type(ctx, this_idx, insn->dst_reg);
@@ -753,7 +1141,7 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 	case BPF_ALU | BPF_MOD | BPF_K: /* ALU_IMM */
 		if (insn->imm == 0)
 			return -EINVAL;
-		dst = ebpf_to_mips_reg(ctx, insn, dst_reg);
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
 		if (dst < 0)
 			return dst;
 		td = get_reg_val_type(ctx, this_idx, insn->dst_reg);
@@ -784,7 +1172,7 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 	case BPF_ALU64 | BPF_MOD | BPF_K: /* ALU_IMM */
 		if (insn->imm == 0)
 			return -EINVAL;
-		dst = ebpf_to_mips_reg(ctx, insn, dst_reg);
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
 		if (dst < 0)
 			return dst;
 		if (get_reg_val_type(ctx, this_idx, insn->dst_reg) == REG_32BIT)
@@ -821,22 +1209,14 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 	case BPF_ALU64 | BPF_LSH | BPF_X: /* ALU64_REG */
 	case BPF_ALU64 | BPF_RSH | BPF_X: /* ALU64_REG */
 	case BPF_ALU64 | BPF_ARSH | BPF_X: /* ALU64_REG */
-		src = ebpf_to_mips_reg(ctx, insn, src_reg);
-		dst = ebpf_to_mips_reg(ctx, insn, dst_reg);
+		src = ebpf_to_mips_reg(ctx, insn, REG_SRC_FP_OK);
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
 		if (src < 0 || dst < 0)
 			return -EINVAL;
 		if (get_reg_val_type(ctx, this_idx, insn->dst_reg) == REG_32BIT)
 			emit_instr(ctx, dinsu, dst, MIPS_R_ZERO, 32, 32);
 		did_move = false;
-		if (insn->src_reg == BPF_REG_10) {
-			if (bpf_op == BPF_MOV) {
-				emit_instr(ctx, daddiu, dst, MIPS_R_SP, MAX_BPF_STACK);
-				did_move = true;
-			} else {
-				emit_instr(ctx, daddiu, MIPS_R_AT, MIPS_R_SP, MAX_BPF_STACK);
-				src = MIPS_R_AT;
-			}
-		} else if (get_reg_val_type(ctx, this_idx, insn->src_reg) == REG_32BIT) {
+		if (get_reg_val_type(ctx, this_idx, insn->src_reg) == REG_32BIT) {
 			int tmp_reg = MIPS_R_AT;
 
 			if (bpf_op == BPF_MOV) {
@@ -917,8 +1297,8 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 	case BPF_ALU | BPF_LSH | BPF_X: /* ALU_REG */
 	case BPF_ALU | BPF_RSH | BPF_X: /* ALU_REG */
 	case BPF_ALU | BPF_ARSH | BPF_X: /* ALU_REG */
-		src = ebpf_to_mips_reg(ctx, insn, src_reg_no_fp);
-		dst = ebpf_to_mips_reg(ctx, insn, dst_reg);
+		src = ebpf_to_mips_reg(ctx, insn, REG_SRC_FP_OK);
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
 		if (src < 0 || dst < 0)
 			return -EINVAL;
 		td = get_reg_val_type(ctx, this_idx, insn->dst_reg);
@@ -1008,7 +1388,7 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 	case BPF_JMP | BPF_JEQ | BPF_K: /* JMP_IMM */
 	case BPF_JMP | BPF_JNE | BPF_K: /* JMP_IMM */
 		cmp_eq = (bpf_op == BPF_JEQ);
-		dst = ebpf_to_mips_reg(ctx, insn, dst_reg_fp_ok);
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_FP_OK);
 		if (dst < 0)
 			return dst;
 		if (insn->imm == 0) {
@@ -1029,8 +1409,8 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 	case BPF_JMP | BPF_JGT | BPF_X:
 	case BPF_JMP | BPF_JGE | BPF_X:
 	case BPF_JMP | BPF_JSET | BPF_X:
-		src = ebpf_to_mips_reg(ctx, insn, src_reg_no_fp);
-		dst = ebpf_to_mips_reg(ctx, insn, dst_reg);
+		src = ebpf_to_mips_reg(ctx, insn, REG_SRC_FP_OK);
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_FP_OK);
 		if (src < 0 || dst < 0)
 			return -EINVAL;
 		td = get_reg_val_type(ctx, this_idx, insn->dst_reg);
@@ -1160,7 +1540,7 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 	case BPF_JMP | BPF_JSLT | BPF_K: /* JMP_IMM */
 	case BPF_JMP | BPF_JSLE | BPF_K: /* JMP_IMM */
 		cmp_eq = (bpf_op == BPF_JSGE);
-		dst = ebpf_to_mips_reg(ctx, insn, dst_reg_fp_ok);
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_FP_OK);
 		if (dst < 0)
 			return dst;
 
@@ -1235,7 +1615,7 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 	case BPF_JMP | BPF_JLT | BPF_K:
 	case BPF_JMP | BPF_JLE | BPF_K:
 		cmp_eq = (bpf_op == BPF_JGE);
-		dst = ebpf_to_mips_reg(ctx, insn, dst_reg_fp_ok);
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_FP_OK);
 		if (dst < 0)
 			return dst;
 		/*
@@ -1258,7 +1638,7 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 		goto jeq_common;
 
 	case BPF_JMP | BPF_JSET | BPF_K: /* JMP_IMM */
-		dst = ebpf_to_mips_reg(ctx, insn, dst_reg_fp_ok);
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_FP_OK);
 		if (dst < 0)
 			return dst;
 
@@ -1303,7 +1683,7 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 		emit_instr(ctx, nop);
 		break;
 	case BPF_LD | BPF_DW | BPF_IMM:
-		dst = ebpf_to_mips_reg(ctx, insn, dst_reg);
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
 		if (dst < 0)
 			return dst;
 		t64 = ((u64)(u32)insn->imm) | ((u64)(insn + 1)->imm << 32);
@@ -1311,12 +1691,7 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 		return 2; /* Double slot insn */
 
 	case BPF_JMP | BPF_CALL:
-		ctx->flags |= EBPF_SAVE_RA;
-		t64s = (s64)insn->imm + (long)__bpf_call_base;
-		emit_const_to_reg(ctx, MIPS_R_T9, (u64)t64s);
-		emit_instr(ctx, jalr, MIPS_R_RA, MIPS_R_T9);
-		/* delay slot */
-		emit_instr(ctx, nop);
+		emit_bpf_call(ctx, insn);
 		break;
 
 	case BPF_JMP | BPF_TAIL_CALL:
@@ -1326,7 +1701,7 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 
 	case BPF_ALU | BPF_END | BPF_FROM_BE:
 	case BPF_ALU | BPF_END | BPF_FROM_LE:
-		dst = ebpf_to_mips_reg(ctx, insn, dst_reg);
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
 		if (dst < 0)
 			return dst;
 		td = get_reg_val_type(ctx, this_idx, insn->dst_reg);
@@ -1367,16 +1742,10 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 	case BPF_ST | BPF_H | BPF_MEM:
 	case BPF_ST | BPF_W | BPF_MEM:
 	case BPF_ST | BPF_DW | BPF_MEM:
-		if (insn->dst_reg == BPF_REG_10) {
-			ctx->flags |= EBPF_SEEN_FP;
-			dst = MIPS_R_SP;
-			mem_off = insn->off + MAX_BPF_STACK;
-		} else {
-			dst = ebpf_to_mips_reg(ctx, insn, dst_reg);
-			if (dst < 0)
-				return dst;
-			mem_off = insn->off;
-		}
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_FP_OK);
+		if (dst < 0)
+			return dst;
+		mem_off = insn->off;
 		gen_imm_to_reg(insn, MIPS_R_AT, ctx);
 		switch (BPF_SIZE(insn->code)) {
 		case BPF_B:
@@ -1398,19 +1767,11 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 	case BPF_LDX | BPF_H | BPF_MEM:
 	case BPF_LDX | BPF_W | BPF_MEM:
 	case BPF_LDX | BPF_DW | BPF_MEM:
-		if (insn->src_reg == BPF_REG_10) {
-			ctx->flags |= EBPF_SEEN_FP;
-			src = MIPS_R_SP;
-			mem_off = insn->off + MAX_BPF_STACK;
-		} else {
-			src = ebpf_to_mips_reg(ctx, insn, src_reg_no_fp);
-			if (src < 0)
-				return src;
-			mem_off = insn->off;
-		}
-		dst = ebpf_to_mips_reg(ctx, insn, dst_reg);
-		if (dst < 0)
-			return dst;
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
+		src = ebpf_to_mips_reg(ctx, insn, REG_SRC_FP_OK);
+		if (dst < 0 || src < 0)
+			return -EINVAL;
+		mem_off = insn->off;
 		switch (BPF_SIZE(insn->code)) {
 		case BPF_B:
 			emit_instr(ctx, lbu, dst, mem_off, src);
@@ -1433,25 +1794,16 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 	case BPF_STX | BPF_DW | BPF_MEM:
 	case BPF_STX | BPF_W | BPF_ATOMIC:
 	case BPF_STX | BPF_DW | BPF_ATOMIC:
-		if (insn->dst_reg == BPF_REG_10) {
-			ctx->flags |= EBPF_SEEN_FP;
-			dst = MIPS_R_SP;
-			mem_off = insn->off + MAX_BPF_STACK;
-		} else {
-			dst = ebpf_to_mips_reg(ctx, insn, dst_reg);
-			if (dst < 0)
-				return dst;
-			mem_off = insn->off;
-		}
-		src = ebpf_to_mips_reg(ctx, insn, src_reg_no_fp);
-		if (src < 0)
-			return src;
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_FP_OK);
+		src = ebpf_to_mips_reg(ctx, insn, REG_SRC_FP_OK);
+		if (src < 0 || dst < 0)
+			return -EINVAL;
+		mem_off = insn->off;
 		if (BPF_MODE(insn->code) == BPF_ATOMIC) {
 			if (insn->imm != BPF_ADD) {
 				pr_err("ATOMIC OP %02x NOT HANDLED\n", insn->imm);
 				return -EINVAL;
 			}
-
 			/*
 			 * If mem_off does not fit within the 9 bit ll/sc
 			 * instruction immediate field, use a temp reg.
@@ -1530,7 +1882,7 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 
 static int build_int_body(struct jit_ctx *ctx)
 {
-	const struct bpf_prog *prog = ctx->skf;
+	const struct bpf_prog *prog = ctx->prog;
 	const struct bpf_insn *insn;
 	int i, r;
 
@@ -1572,7 +1924,7 @@ static int build_int_body(struct jit_ctx *ctx)
 static int reg_val_propagate_range(struct jit_ctx *ctx, u64 initial_rvt,
 				   int start_idx, bool follow_taken)
 {
-	const struct bpf_prog *prog = ctx->skf;
+	const struct bpf_prog *prog = ctx->prog;
 	const struct bpf_insn *insn;
 	u64 exit_rvt = initial_rvt;
 	u64 *rvt = ctx->reg_val_types;
@@ -1773,7 +2125,7 @@ static int reg_val_propagate_range(struct jit_ctx *ctx, u64 initial_rvt,
  */
 static int reg_val_propagate(struct jit_ctx *ctx)
 {
-	const struct bpf_prog *prog = ctx->skf;
+	const struct bpf_prog *prog = ctx->prog;
 	u64 exit_rvt;
 	int reg;
 	int i;
@@ -1832,23 +2184,86 @@ static void jit_fill_hole(void *area, unsigned int size)
 		uasm_i_break(&p, BRK_BUG); /* Increments p */
 }
 
+/*
+ * Save and restore the BPF VM state across a direct kernel call. This
+ * includes the caller-saved registers used for BPF_REG_0 .. BPF_REG_5
+ * and BPF_REG_AX used by the verifier for blinding and other dark arts.
+ * Restore avoids clobbering bpf_ret, which holds the call return value.
+ * BPF_REG_6 .. BPF_REG_10 and TCC are already callee-saved or on stack.
+ */
+static const int bpf_caller_save[] = {
+	BPF_REG_0,
+	BPF_REG_1,
+	BPF_REG_2,
+	BPF_REG_3,
+	BPF_REG_4,
+	BPF_REG_5,
+	BPF_REG_AX,
+};
+
+#define CALLER_ENV_SIZE (ARRAY_SIZE(bpf_caller_save) * sizeof(u64))
+
+void emit_caller_save(struct jit_ctx *ctx)
+{
+	int stack_adj = ALIGN(CALLER_ENV_SIZE, STACK_ALIGN);
+	int i, bpf, reg, store_offset;
+
+	emit_instr_long(ctx, daddiu, addiu, MIPS_R_SP, MIPS_R_SP, -stack_adj);
+
+	for (i = 0; i < ARRAY_SIZE(bpf_caller_save); i++) {
+		bpf = bpf_caller_save[i];
+		reg = bpf2mips[bpf].reg;
+		store_offset = i * sizeof(u64);
+
+		if (is64bit()) {
+			emit_instr(ctx, sd, reg, store_offset, MIPS_R_SP);
+		} else {
+			emit_instr(ctx, sw, LO(reg),
+						OFFLO(store_offset), MIPS_R_SP);
+			emit_instr(ctx, sw, HI(reg),
+						OFFHI(store_offset), MIPS_R_SP);
+		}
+	}
+}
+
+void emit_caller_restore(struct jit_ctx *ctx, int bpf_ret)
+{
+	int stack_adj = ALIGN(CALLER_ENV_SIZE, STACK_ALIGN);
+	int i, bpf, reg, store_offset;
+
+	for (i = 0; i < ARRAY_SIZE(bpf_caller_save); i++) {
+		bpf = bpf_caller_save[i];
+		reg = bpf2mips[bpf].reg;
+		store_offset = i * sizeof(u64);
+		if (bpf == bpf_ret)
+			continue;
+
+		if (is64bit()) {
+			emit_instr(ctx, ld, reg, store_offset, MIPS_R_SP);
+		} else {
+			emit_instr(ctx, lw, LO(reg),
+						OFFLO(store_offset), MIPS_R_SP);
+			emit_instr(ctx, lw, HI(reg),
+						OFFHI(store_offset), MIPS_R_SP);
+		}
+	}
+
+	emit_instr_long(ctx, daddiu, addiu, MIPS_R_SP, MIPS_R_SP, stack_adj);
+}
+
 struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 {
-	struct bpf_prog *orig_prog = prog;
-	bool tmp_blinded = false;
-	struct bpf_prog *tmp;
+	bool tmp_blinded = false, extra_pass = false;
+	struct bpf_prog *tmp, *orig_prog = prog;
 	struct bpf_binary_header *header = NULL;
-	struct jit_ctx ctx;
-	unsigned int image_size;
-	u8 *image_ptr;
+	unsigned int image_size, pass = 3;
+	struct jit_ctx *ctx;
 
 	if (!prog->jit_requested)
-		return prog;
+		return orig_prog;
 
+	/* Attempt blinding but fall back to the interpreter on failure. */
 	tmp = bpf_jit_blind_constants(prog);
-	/* If blinding was requested and we failed during blinding,
-	 * we must fall back to the interpreter.
-	 */
 	if (IS_ERR(tmp))
 		return orig_prog;
 	if (tmp != prog) {
@@ -1856,50 +2271,75 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 		prog = tmp;
 	}
 
-	memset(&ctx, 0, sizeof(ctx));
+	ctx = prog->aux->jit_data;
+	if (!ctx) {
+		ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+		if (!ctx) {
+			prog = orig_prog;
+			goto out;
+		}
+	}
 
-	preempt_disable();
-	switch (current_cpu_type()) {
-	case CPU_CAVIUM_OCTEON:
-	case CPU_CAVIUM_OCTEON_PLUS:
-	case CPU_CAVIUM_OCTEON2:
-	case CPU_CAVIUM_OCTEON3:
-		ctx.use_bbit_insns = 1;
-		break;
-	default:
-		ctx.use_bbit_insns = 0;
+	/*
+	 * Assume extra pass needed for patching addresses if previous
+	 * ctx exists in saved jit_data, so skip to code generation.
+	 */
+	if (ctx->offsets) {
+		extra_pass = true;
+		pass++;
+		image_size = 4 * ctx->idx;
+		header = bpf_jit_binary_hdr(ctx->prog);
+		goto skip_init_ctx;
 	}
-	preempt_enable();
 
-	ctx.offsets = kcalloc(prog->len + 1, sizeof(*ctx.offsets), GFP_KERNEL);
-	if (ctx.offsets == NULL)
+	ctx->prog = prog;
+	ctx->offsets = kcalloc(prog->len + 1,
+			       sizeof(*ctx->offsets),
+			       GFP_KERNEL);
+	if (!ctx->offsets)
 		goto out_err;
 
-	ctx.reg_val_types = kcalloc(prog->len + 1, sizeof(*ctx.reg_val_types), GFP_KERNEL);
-	if (ctx.reg_val_types == NULL)
-		goto out_err;
+	/* Check Octeon bbit ops only for MIPS64. */
+	if (is64bit()) {
+		preempt_disable();
+		switch (current_cpu_type()) {
+		case CPU_CAVIUM_OCTEON:
+		case CPU_CAVIUM_OCTEON_PLUS:
+		case CPU_CAVIUM_OCTEON2:
+		case CPU_CAVIUM_OCTEON3:
+			ctx->use_bbit_insns = 1;
+			break;
+		default:
+			ctx->use_bbit_insns = 0;
+		}
+		preempt_enable();
+	}
 
-	ctx.skf = prog;
+	ctx->reg_val_types = kcalloc(prog->len + 1,
+				     sizeof(*ctx->reg_val_types),
+				     GFP_KERNEL);
+	if (!ctx->reg_val_types)
+		goto out_err;
 
-	if (reg_val_propagate(&ctx))
+	if (reg_val_propagate(ctx))
 		goto out_err;
 
 	/*
 	 * First pass discovers used resources and instruction offsets
 	 * assuming short branches are used.
 	 */
-	if (build_int_body(&ctx))
+	if (build_int_body(ctx))
 		goto out_err;
 
 	/*
-	 * If no calls are made (EBPF_SAVE_RA), then tail call count
-	 * in $v1, else we must save in n$s4.
+	 * If calls are made (EBPF_SAVE_RA), the tailcall count is backed
+	 * up in a saved reg or on stack; else it may stay in its runtime reg.
 	 */
-	if (ctx.flags & EBPF_SEEN_TC) {
-		if (ctx.flags & EBPF_SAVE_RA)
-			ctx.flags |= EBPF_SAVE_S4;
-		else
-			ctx.flags |= EBPF_TCC_IN_V1;
+	if (tail_call_present(ctx)) {
+		if (ctx->flags & EBPF_SAVE_RA)
+			ctx->flags |= bpf2mips[JIT_SAV_TCC].flags;
+		else if (bpf2mips[JIT_RUN_TCC].reg)
+			ctx->flags |= EBPF_TCC_IN_RUN;
 	}
 
 	/*
@@ -1910,59 +2350,69 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 	 * necessary.
 	 */
 	do {
-		ctx.idx = 0;
-		ctx.gen_b_offsets = 1;
-		ctx.long_b_conversion = 0;
-		if (gen_int_prologue(&ctx))
+		ctx->idx = 0;
+		ctx->gen_b_offsets = 1;
+		ctx->long_b_conversion = 0;
+		if (build_int_prologue(ctx))
 			goto out_err;
-		if (build_int_body(&ctx))
+		if (build_int_body(ctx))
 			goto out_err;
-		if (build_int_epilogue(&ctx, MIPS_R_RA))
+		if (build_int_epilogue(ctx, MIPS_R_RA))
 			goto out_err;
-	} while (ctx.long_b_conversion);
+	} while (ctx->long_b_conversion);
 
-	image_size = 4 * ctx.idx;
+	image_size = 4 * ctx->idx;
 
-	header = bpf_jit_binary_alloc(image_size, &image_ptr,
+	header = bpf_jit_binary_alloc(image_size, (void *)&ctx->target,
 				      sizeof(u32), jit_fill_hole);
-	if (header == NULL)
+	if (!header)
 		goto out_err;
 
-	ctx.target = (u32 *)image_ptr;
+skip_init_ctx:
 
-	/* Third pass generates the code */
-	ctx.idx = 0;
-	if (gen_int_prologue(&ctx))
+	/* Third pass generates the code (fourth patches call addresses) */
+	ctx->idx = 0;
+	if (build_int_prologue(ctx))
 		goto out_err;
-	if (build_int_body(&ctx))
+	if (build_int_body(ctx))
 		goto out_err;
-	if (build_int_epilogue(&ctx, MIPS_R_RA))
+	if (build_int_epilogue(ctx, MIPS_R_RA))
 		goto out_err;
 
-	/* Update the icache */
-	flush_icache_range((unsigned long)ctx.target,
-			   (unsigned long)&ctx.target[ctx.idx]);
-
 	if (bpf_jit_enable > 1)
 		/* Dump JIT code */
-		bpf_jit_dump(prog->len, image_size, 2, ctx.target);
+		bpf_jit_dump(prog->len, image_size, pass, ctx->target);
+
+	/* Update the icache */
+	flush_icache_range((unsigned long)ctx->target,
+			   (unsigned long)&ctx->target[ctx->idx]);
+
+	if (!prog->is_func || extra_pass)
+		bpf_jit_binary_lock_ro(header);
+	else
+		prog->aux->jit_data = ctx;
 
-	bpf_jit_binary_lock_ro(header);
-	prog->bpf_func = (void *)ctx.target;
+	prog->bpf_func = (void *)ctx->target;
 	prog->jited = 1;
 	prog->jited_len = image_size;
-out_normal:
+
+	if (!prog->is_func || extra_pass) {
+		bpf_prog_fill_jited_linfo(prog, ctx->offsets + 1);
+out_ctx:
+		kfree(ctx->offsets);
+		kfree(ctx->reg_val_types);
+		kfree(ctx);
+		prog->aux->jit_data = NULL;
+	}
+out:
 	if (tmp_blinded)
 		bpf_jit_prog_release_other(prog, prog == orig_prog ?
 					   tmp : orig_prog);
-	kfree(ctx.offsets);
-	kfree(ctx.reg_val_types);
-
 	return prog;
 
 out_err:
 	prog = orig_prog;
 	if (header)
 		bpf_jit_binary_free(header);
-	goto out_normal;
+	goto out_ctx;
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH bpf-next v2 11/16] bpf: allow tailcalls in subprograms for MIPS64/MIPS32
  2021-10-05  8:26 ` [RFC PATCH bpf-next v2 00/16] " Tony Ambardar
                     ` (9 preceding siblings ...)
  2021-10-05  8:26   ` [RFC PATCH bpf-next v2 10/16] MIPS: eBPF: add core support for 32/64-bit systems Tony Ambardar
@ 2021-10-05  8:26   ` Tony Ambardar
  2021-10-05  8:26   ` [RFC PATCH bpf-next v2 13/16] MIPS: eBPF64: support BPF_JMP32 conditionals Tony Ambardar
                     ` (3 subsequent siblings)
  14 siblings, 0 replies; 32+ messages in thread
From: Tony Ambardar @ 2021-10-05  8:26 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Thomas Bogendoerfer, Paul Burton
  Cc: Tony Ambardar, netdev, bpf, linux-mips, Johan Almbladh,
	Tiezhu Yang, Hassan Naveed, David Daney, Luke Nelson,
	Serge Semin, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh

The BPF core/verifier is hard-coded to permit mixing bpf2bpf and tail
calls only for x86-64. Change the logic to instead rely on a new weak
function 'bool bpf_jit_supports_subprog_tailcalls(void)', which a capable
JIT backend can override.

Update the x86-64 eBPF JIT to reflect this, and also enable the feature
for the MIPS64/MIPS32 JIT.
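
Any other JIT gaining such support can opt in the same way; e.g. a
hypothetical arm64 opt-in would be simply:

  /* arch/arm64/net/bpf_jit_comp.c */
  bool bpf_jit_supports_subprog_tailcalls(void)
  {
  	return true;
  }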

Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
---
 arch/mips/net/ebpf_jit.c    | 6 ++++++
 arch/x86/net/bpf_jit_comp.c | 6 ++++++
 include/linux/filter.h      | 1 +
 kernel/bpf/core.c           | 6 ++++++
 kernel/bpf/verifier.c       | 3 ++-
 5 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/mips/net/ebpf_jit.c b/arch/mips/net/ebpf_jit.c
index 7d8ed8bb19ab..501c1d532be6 100644
--- a/arch/mips/net/ebpf_jit.c
+++ b/arch/mips/net/ebpf_jit.c
@@ -2416,3 +2416,9 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 		bpf_jit_binary_free(header);
 	goto out_ctx;
 }
+
+/* Indicate the JIT backend supports mixing bpf2bpf and tailcalls. */
+bool bpf_jit_supports_subprog_tailcalls(void)
+{
+	return true;
+}
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 3ae729bb2475..2d78588fb5b5 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -2367,3 +2367,9 @@ bool bpf_jit_supports_kfunc_call(void)
 {
 	return true;
 }
+
+/* Indicate the JIT backend supports mixing bpf2bpf and tailcalls. */
+bool bpf_jit_supports_subprog_tailcalls(void)
+{
+	return true;
+}
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 16e5cebea82c..50b50fb271b5 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -933,6 +933,7 @@ u64 __bpf_call_base(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
 struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog);
 void bpf_jit_compile(struct bpf_prog *prog);
 bool bpf_jit_needs_zext(void);
+bool bpf_jit_supports_subprog_tailcalls(void);
 bool bpf_jit_supports_kfunc_call(void);
 bool bpf_helper_changes_pkt_data(void *func);
 
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 189934d2a3f2..c82b48ed0005 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2366,6 +2366,12 @@ bool __weak bpf_jit_needs_zext(void)
 	return false;
 }
 
+/* Return TRUE if the JIT backend supports mixing bpf2bpf and tailcalls. */
+bool __weak bpf_jit_supports_subprog_tailcalls(void)
+{
+	return false;
+}
+
 bool __weak bpf_jit_supports_kfunc_call(void)
 {
 	return false;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 6e2ebcb0d66f..267de5428fad 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5144,7 +5144,8 @@ static bool may_update_sockmap(struct bpf_verifier_env *env, int func_id)
 
 static bool allow_tail_call_in_subprogs(struct bpf_verifier_env *env)
 {
-	return env->prog->jit_requested && IS_ENABLED(CONFIG_X86_64);
+	return env->prog->jit_requested &&
+	       bpf_jit_supports_subprog_tailcalls();
 }
 
 static int check_map_func_compatibility(struct bpf_verifier_env *env,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH bpf-next v2 13/16] MIPS: eBPF64: support BPF_JMP32 conditionals
  2021-10-05  8:26 ` [RFC PATCH bpf-next v2 00/16] " Tony Ambardar
                     ` (10 preceding siblings ...)
  2021-10-05  8:26   ` [RFC PATCH bpf-next v2 11/16] bpf: allow tailcalls in subprograms for MIPS64/MIPS32 Tony Ambardar
@ 2021-10-05  8:26   ` Tony Ambardar
  2021-10-05  8:26   ` [RFC PATCH bpf-next v2 14/16] MIPS: eBPF64: implement all BPF_ATOMIC ops Tony Ambardar
                     ` (2 subsequent siblings)
  14 siblings, 0 replies; 32+ messages in thread
From: Tony Ambardar @ 2021-10-05  8:26 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Thomas Bogendoerfer, Paul Burton
  Cc: Tony Ambardar, netdev, bpf, linux-mips, Johan Almbladh,
	Tiezhu Yang, Hassan Naveed, David Daney, Luke Nelson,
	Serge Semin, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh

Simplify BPF_JGT/JLE/JSGT/JSLE and related code by dropping unneeded use
of the MIPS_R_T8/T9 registers, jump-around branches, and extra comparison
arithmetic. Also reorganize variable declarations and add a 'bpf_class'
helper constant.

Implement BPF_JMP32 branches using sign or zero-extended temporaries as
needed for comparisons. This enables JITing of many more BPF programs, and
provides greater test coverage by e.g. 'test_verifier'.
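
For instance, an unsigned BPF_JMP32 compare is lowered by zero-extending
both operands into scratch registers before the usual 64-bit compare,
roughly:

  /* BPF_JMP32 | BPF_JLT | BPF_X */
  emit_instr(ctx, move, MIPS_R_T8, dst);
  emit_instr(ctx, move, MIPS_R_T9, src);
  emit_instr(ctx, dinsu, MIPS_R_T8, MIPS_R_ZERO, 32, 32);
  emit_instr(ctx, dinsu, MIPS_R_T9, MIPS_R_ZERO, 32, 32);
  emit_instr(ctx, sltu, MIPS_R_AT, MIPS_R_T8, MIPS_R_T9);
  emit_instr(ctx, bne, MIPS_R_AT, MIPS_R_ZERO, b_off);

Signed compares instead sign-extend both operands with 'sll ..., 0'
before using slt.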

Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
---
 arch/mips/net/ebpf_jit_comp64.c | 206 +++++++++++++++++++-------------
 1 file changed, 122 insertions(+), 84 deletions(-)

diff --git a/arch/mips/net/ebpf_jit_comp64.c b/arch/mips/net/ebpf_jit_comp64.c
index c38d93d37ce3..842e516ce749 100644
--- a/arch/mips/net/ebpf_jit_comp64.c
+++ b/arch/mips/net/ebpf_jit_comp64.c
@@ -167,12 +167,13 @@ static int gen_imm_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 		   int this_idx, int exit_idx)
 {
-	int src, dst, r, td, ts, mem_off, b_off;
+	const int bpf_class = BPF_CLASS(insn->code);
+	const int bpf_op = BPF_OP(insn->code);
 	bool need_swap, did_move, cmp_eq;
 	unsigned int target = 0;
+	int src, dst, r, td, ts;
+	int mem_off, b_off;
 	u64 t64;
-	s64 t64s;
-	int bpf_op = BPF_OP(insn->code);
 
 	switch (insn->code) {
 	case BPF_ALU64 | BPF_ADD | BPF_K: /* ALU64_IMM */
@@ -500,7 +501,9 @@ int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 		}
 		break;
 	case BPF_JMP | BPF_JEQ | BPF_K: /* JMP_IMM */
-	case BPF_JMP | BPF_JNE | BPF_K: /* JMP_IMM */
+	case BPF_JMP | BPF_JNE | BPF_K:
+	case BPF_JMP32 | BPF_JEQ | BPF_K: /* JMP_IMM */
+	case BPF_JMP32 | BPF_JNE | BPF_K:
 		cmp_eq = (bpf_op == BPF_JEQ);
 		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_FP_OK);
 		if (dst < 0)
@@ -511,6 +514,16 @@ int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 			gen_imm_to_reg(insn, MIPS_R_AT, ctx);
 			src = MIPS_R_AT;
 		}
+		if (bpf_class == BPF_JMP32) {
+			emit_instr(ctx, move, MIPS_R_T8, dst);
+			emit_instr(ctx, dinsu, MIPS_R_T8, MIPS_R_ZERO, 32, 32);
+			dst = MIPS_R_T8;
+			if (src != MIPS_R_ZERO) {
+				emit_instr(ctx, move, MIPS_R_T9, src);
+				emit_instr(ctx, dinsu, MIPS_R_T9, MIPS_R_ZERO, 32, 32);
+				src = MIPS_R_T9;
+			}
+		}
 		goto jeq_common;
 	case BPF_JMP | BPF_JEQ | BPF_X: /* JMP_REG */
 	case BPF_JMP | BPF_JNE | BPF_X:
@@ -523,18 +536,46 @@ int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 	case BPF_JMP | BPF_JGT | BPF_X:
 	case BPF_JMP | BPF_JGE | BPF_X:
 	case BPF_JMP | BPF_JSET | BPF_X:
+	case BPF_JMP32 | BPF_JEQ | BPF_X: /* JMP_REG */
+	case BPF_JMP32 | BPF_JNE | BPF_X:
+	case BPF_JMP32 | BPF_JSLT | BPF_X:
+	case BPF_JMP32 | BPF_JSLE | BPF_X:
+	case BPF_JMP32 | BPF_JSGT | BPF_X:
+	case BPF_JMP32 | BPF_JSGE | BPF_X:
+	case BPF_JMP32 | BPF_JLT | BPF_X:
+	case BPF_JMP32 | BPF_JLE | BPF_X:
+	case BPF_JMP32 | BPF_JGT | BPF_X:
+	case BPF_JMP32 | BPF_JGE | BPF_X:
+	case BPF_JMP32 | BPF_JSET | BPF_X:
 		src = ebpf_to_mips_reg(ctx, insn, REG_SRC_FP_OK);
 		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_FP_OK);
 		if (src < 0 || dst < 0)
 			return -EINVAL;
 		td = get_reg_val_type(ctx, this_idx, insn->dst_reg);
 		ts = get_reg_val_type(ctx, this_idx, insn->src_reg);
-		if (td == REG_32BIT && ts != REG_32BIT) {
-			emit_instr(ctx, sll, MIPS_R_AT, src, 0);
-			src = MIPS_R_AT;
-		} else if (ts == REG_32BIT && td != REG_32BIT) {
-			emit_instr(ctx, sll, MIPS_R_AT, dst, 0);
-			dst = MIPS_R_AT;
+		if (bpf_class == BPF_JMP) {
+			if (td == REG_32BIT && ts != REG_32BIT) {
+				emit_instr(ctx, sll, MIPS_R_AT, src, 0);
+				src = MIPS_R_AT;
+			} else if (ts == REG_32BIT && td != REG_32BIT) {
+				emit_instr(ctx, sll, MIPS_R_AT, dst, 0);
+				dst = MIPS_R_AT;
+			}
+		} else { /* BPF_JMP32 */
+			if (bpf_op == BPF_JSGT || bpf_op == BPF_JSLT ||
+			    bpf_op == BPF_JSGE || bpf_op == BPF_JSLE) {
+				emit_instr(ctx, sll, MIPS_R_T8, dst, 0);
+				emit_instr(ctx, sll, MIPS_R_T9, src, 0);
+				dst = MIPS_R_T8;
+				src = MIPS_R_T9;
+			} else {
+				emit_instr(ctx, move, MIPS_R_T8, dst);
+				emit_instr(ctx, move, MIPS_R_T9, src);
+				emit_instr(ctx, dinsu, MIPS_R_T8, MIPS_R_ZERO, 32, 32);
+				emit_instr(ctx, dinsu, MIPS_R_T9, MIPS_R_ZERO, 32, 32);
+				dst = MIPS_R_T8;
+				src = MIPS_R_T9;
+			}
 		}
 		if (bpf_op == BPF_JSET) {
 			emit_instr(ctx, and, MIPS_R_AT, dst, src);
@@ -542,48 +583,20 @@ int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 			dst = MIPS_R_AT;
 			src = MIPS_R_ZERO;
 		} else if (bpf_op == BPF_JSGT || bpf_op == BPF_JSLE) {
-			emit_instr(ctx, dsubu, MIPS_R_AT, dst, src);
-			if ((insn + 1)->code == (BPF_JMP | BPF_EXIT) && insn->off == 1) {
-				b_off = b_imm(exit_idx, ctx);
-				if (is_bad_offset(b_off))
-					return -E2BIG;
-				if (bpf_op == BPF_JSGT)
-					emit_instr(ctx, blez, MIPS_R_AT, b_off);
-				else
-					emit_instr(ctx, bgtz, MIPS_R_AT, b_off);
-				emit_instr(ctx, nop);
-				return 2; /* We consumed the exit. */
-			}
-			b_off = b_imm(this_idx + insn->off + 1, ctx);
-			if (is_bad_offset(b_off))
-				return -E2BIG;
-			if (bpf_op == BPF_JSGT)
-				emit_instr(ctx, bgtz, MIPS_R_AT, b_off);
-			else
-				emit_instr(ctx, blez, MIPS_R_AT, b_off);
-			emit_instr(ctx, nop);
-			break;
+			/* swap dst and src to simplify comparison */
+			emit_instr(ctx, slt, MIPS_R_AT, src, dst);
+			cmp_eq = bpf_op == BPF_JSLE;
+			dst = MIPS_R_AT;
+			src = MIPS_R_ZERO;
 		} else if (bpf_op == BPF_JSGE || bpf_op == BPF_JSLT) {
 			emit_instr(ctx, slt, MIPS_R_AT, dst, src);
 			cmp_eq = bpf_op == BPF_JSGE;
 			dst = MIPS_R_AT;
 			src = MIPS_R_ZERO;
 		} else if (bpf_op == BPF_JGT || bpf_op == BPF_JLE) {
-			/* dst or src could be AT */
-			emit_instr(ctx, dsubu, MIPS_R_T8, dst, src);
-			emit_instr(ctx, sltu, MIPS_R_AT, dst, src);
-			/* SP known to be non-zero, movz becomes boolean not */
-			if (MIPS_ISA_REV >= 6) {
-				emit_instr(ctx, seleqz, MIPS_R_T9,
-						MIPS_R_SP, MIPS_R_T8);
-			} else {
-				emit_instr(ctx, movz, MIPS_R_T9,
-						MIPS_R_SP, MIPS_R_T8);
-				emit_instr(ctx, movn, MIPS_R_T9,
-						MIPS_R_ZERO, MIPS_R_T8);
-			}
-			emit_instr(ctx, or, MIPS_R_AT, MIPS_R_T9, MIPS_R_AT);
-			cmp_eq = bpf_op == BPF_JGT;
+			/* swap dst and src to simplify comparison */
+			emit_instr(ctx, sltu, MIPS_R_AT, src, dst);
+			cmp_eq = bpf_op == BPF_JLE;
 			dst = MIPS_R_AT;
 			src = MIPS_R_ZERO;
 		} else if (bpf_op == BPF_JGE || bpf_op == BPF_JLT) {
@@ -650,14 +663,20 @@ int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 		}
 		break;
 	case BPF_JMP | BPF_JSGT | BPF_K: /* JMP_IMM */
-	case BPF_JMP | BPF_JSGE | BPF_K: /* JMP_IMM */
-	case BPF_JMP | BPF_JSLT | BPF_K: /* JMP_IMM */
-	case BPF_JMP | BPF_JSLE | BPF_K: /* JMP_IMM */
-		cmp_eq = (bpf_op == BPF_JSGE);
+	case BPF_JMP | BPF_JSGE | BPF_K:
+	case BPF_JMP | BPF_JSLT | BPF_K:
+	case BPF_JMP | BPF_JSLE | BPF_K:
+	case BPF_JMP32 | BPF_JSGT | BPF_K: /* JMP_IMM */
+	case BPF_JMP32 | BPF_JSGE | BPF_K:
+	case BPF_JMP32 | BPF_JSLT | BPF_K:
+	case BPF_JMP32 | BPF_JSLE | BPF_K:
 		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_FP_OK);
 		if (dst < 0)
 			return dst;
-
+		if (bpf_class == BPF_JMP32) {
+			emit_instr(ctx, sll, MIPS_R_T8, dst, 0);
+			dst = MIPS_R_T8;
+		}
 		if (insn->imm == 0) {
 			if ((insn + 1)->code == (BPF_JMP | BPF_EXIT) && insn->off == 1) {
 				b_off = b_imm(exit_idx, ctx);
@@ -700,26 +719,24 @@ int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 			emit_instr(ctx, nop);
 			break;
 		}
-		/*
-		 * only "LT" compare available, so we must use imm + 1
-		 * to generate "GT" and imm -1 to generate LE
-		 */
-		if (bpf_op == BPF_JSGT)
-			t64s = insn->imm + 1;
-		else if (bpf_op == BPF_JSLE)
-			t64s = insn->imm + 1;
-		else
-			t64s = insn->imm;
 
-		cmp_eq = bpf_op == BPF_JSGT || bpf_op == BPF_JSGE;
-		if (t64s >= S16_MIN && t64s <= S16_MAX) {
-			emit_instr(ctx, slti, MIPS_R_AT, dst, (int)t64s);
-			src = MIPS_R_AT;
-			dst = MIPS_R_ZERO;
-			goto jeq_common;
+		gen_imm_to_reg(insn, MIPS_R_T9, ctx);
+		src = MIPS_R_T9;
+		cmp_eq = bpf_op == BPF_JSLE || bpf_op == BPF_JSGE;
+		switch (bpf_op) {
+		case BPF_JSGE:
+			emit_instr(ctx, slt, MIPS_R_AT, dst, src);
+			break;
+		case BPF_JSLT:
+			emit_instr(ctx, slt, MIPS_R_AT, dst, src);
+			break;
+		case BPF_JSGT:
+			emit_instr(ctx, slt, MIPS_R_AT, src, dst);
+			break;
+		case BPF_JSLE:
+			emit_instr(ctx, slt, MIPS_R_AT, src, dst);
+			break;
 		}
-		emit_const_to_reg(ctx, MIPS_R_AT, (u64)t64s);
-		emit_instr(ctx, slt, MIPS_R_AT, dst, MIPS_R_AT);
 		src = MIPS_R_AT;
 		dst = MIPS_R_ZERO;
 		goto jeq_common;
@@ -728,33 +745,52 @@ int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 	case BPF_JMP | BPF_JGE | BPF_K:
 	case BPF_JMP | BPF_JLT | BPF_K:
 	case BPF_JMP | BPF_JLE | BPF_K:
-		cmp_eq = (bpf_op == BPF_JGE);
+	case BPF_JMP32 | BPF_JGT | BPF_K:
+	case BPF_JMP32 | BPF_JGE | BPF_K:
+	case BPF_JMP32 | BPF_JLT | BPF_K:
+	case BPF_JMP32 | BPF_JLE | BPF_K:
 		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_FP_OK);
 		if (dst < 0)
 			return dst;
-		/*
-		 * only "LT" compare available, so we must use imm + 1
-		 * to generate "GT" and imm -1 to generate LE
-		 */
-		if (bpf_op == BPF_JGT)
-			t64s = (u64)(u32)(insn->imm) + 1;
-		else if (bpf_op == BPF_JLE)
-			t64s = (u64)(u32)(insn->imm) + 1;
-		else
-			t64s = (u64)(u32)(insn->imm);
-
-		cmp_eq = bpf_op == BPF_JGT || bpf_op == BPF_JGE;
+		if (bpf_class == BPF_JMP32) {
+			emit_instr(ctx, move, MIPS_R_T8, dst);
+			gen_zext_insn(MIPS_R_T8, true, ctx);
+			dst = MIPS_R_T8;
+		}
+		gen_imm_to_reg(insn, MIPS_R_T9, ctx);
+		if (bpf_class == BPF_JMP32 && insn->imm < 0)
+			gen_zext_insn(MIPS_R_T9, true, ctx);
+		src = MIPS_R_T9;
 
-		emit_const_to_reg(ctx, MIPS_R_AT, (u64)t64s);
-		emit_instr(ctx, sltu, MIPS_R_AT, dst, MIPS_R_AT);
+		cmp_eq = bpf_op == BPF_JLE || bpf_op == BPF_JGE;
+		switch (bpf_op) {
+		case BPF_JGE:
+			emit_instr(ctx, sltu, MIPS_R_AT, dst, src);
+			break;
+		case BPF_JLT:
+			emit_instr(ctx, sltu, MIPS_R_AT, dst, src);
+			break;
+		case BPF_JGT:
+			emit_instr(ctx, sltu, MIPS_R_AT, src, dst);
+			break;
+		case BPF_JLE:
+			emit_instr(ctx, sltu, MIPS_R_AT, src, dst);
+			break;
+		}
 		src = MIPS_R_AT;
 		dst = MIPS_R_ZERO;
 		goto jeq_common;
 
 	case BPF_JMP | BPF_JSET | BPF_K: /* JMP_IMM */
+	case BPF_JMP32 | BPF_JSET | BPF_K: /* JMP_IMM */
 		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_FP_OK);
 		if (dst < 0)
 			return dst;
+		if (bpf_class == BPF_JMP32) {
+			emit_instr(ctx, move, MIPS_R_T8, dst);
+			emit_instr(ctx, dinsu, MIPS_R_T8, MIPS_R_ZERO, 32, 32);
+			dst = MIPS_R_T8;
+		}
 
 		if (ctx->use_bbit_insns && hweight32((u32)insn->imm) == 1) {
 			if ((insn + 1)->code == (BPF_JMP | BPF_EXIT) && insn->off == 1) {
@@ -774,6 +810,8 @@ int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 		}
 		t64 = (u32)insn->imm;
 		emit_const_to_reg(ctx, MIPS_R_AT, t64);
+		if (bpf_class == BPF_JMP32)
+			emit_instr(ctx, dinsu, MIPS_R_AT, MIPS_R_ZERO, 32, 32);
 		emit_instr(ctx, and, MIPS_R_AT, dst, MIPS_R_AT);
 		src = MIPS_R_AT;
 		dst = MIPS_R_ZERO;
-- 
2.25.1



* [RFC PATCH bpf-next v2 14/16] MIPS: eBPF64: implement all BPF_ATOMIC ops
  2021-10-05  8:26 ` [RFC PATCH bpf-next v2 00/16] " Tony Ambardar
                     ` (11 preceding siblings ...)
  2021-10-05  8:26   ` [RFC PATCH bpf-next v2 13/16] MIPS: eBPF64: support BPF_JMP32 conditionals Tony Ambardar
@ 2021-10-05  8:26   ` Tony Ambardar
  2021-10-05  8:26   ` [RFC PATCH bpf-next v2 15/16] MIPS: uasm: Enable muhu opcode for MIPS R6 Tony Ambardar
  2021-10-05  8:27   ` [RFC PATCH bpf-next v2 16/16] MIPS: eBPF: add MIPS32 JIT Tony Ambardar
  14 siblings, 0 replies; 32+ messages in thread
From: Tony Ambardar @ 2021-10-05  8:26 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Thomas Bogendoerfer, Paul Burton
  Cc: Tony Ambardar, netdev, bpf, linux-mips, Johan Almbladh,
	Tiezhu Yang, Hassan Naveed, David Daney, Luke Nelson,
	Serge Semin, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh

Reorganize code for BPF_ATOMIC and BPF_MEM, and add the atomic ops AND,
OR, XOR, XCHG and CMPXCHG, with support for BPF_FETCH.
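
For illustration, a minimal C sketch (not actual JIT output) of the
retry semantics each ll/sc sequence provides, shown for BPF_W BPF_ADD
with BPF_FETCH; the JIT open-codes this loop using ll/sc, or lld/scd
for BPF_DW:

  #include <linux/atomic.h>

  /* Sketch only: the JIT emits an ll/sc loop, not a call to this. */
  static u32 sketch_fetch_add_w(u32 *addr, u32 val)
  {
          u32 old;

          /* 'll' loads old; 'sc' stores old + val, retried on failure. */
          do {
                  old = READ_ONCE(*addr);
          } while (cmpxchg(addr, old, old + val) != old);

          return old;     /* BPF_FETCH returns the old value */
  }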

Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
---
 arch/mips/net/ebpf_jit_comp64.c | 181 +++++++++++++++++++++-----------
 1 file changed, 119 insertions(+), 62 deletions(-)

diff --git a/arch/mips/net/ebpf_jit_comp64.c b/arch/mips/net/ebpf_jit_comp64.c
index 842e516ce749..35c8c8307b64 100644
--- a/arch/mips/net/ebpf_jit_comp64.c
+++ b/arch/mips/net/ebpf_jit_comp64.c
@@ -167,7 +167,15 @@ static int gen_imm_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 		   int this_idx, int exit_idx)
 {
+	/*
+	 * Since CMPXCHG uses R0 implicitly, outside of a passed
+	 * bpf_insn, we fake a lookup to get the MIPS base reg.
+	 */
+	const struct bpf_insn r0_insn = {.src_reg = BPF_REG_0};
+	const int r0 = ebpf_to_mips_reg(ctx, &r0_insn,
+					REG_SRC_NO_FP);
 	const int bpf_class = BPF_CLASS(insn->code);
+	const int bpf_size = BPF_SIZE(insn->code);
 	const int bpf_op = BPF_OP(insn->code);
 	bool need_swap, did_move, cmp_eq;
 	unsigned int target = 0;
@@ -944,6 +952,32 @@ int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 	case BPF_STX | BPF_H | BPF_MEM:
 	case BPF_STX | BPF_W | BPF_MEM:
 	case BPF_STX | BPF_DW | BPF_MEM:
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_FP_OK);
+		src = ebpf_to_mips_reg(ctx, insn, REG_SRC_FP_OK);
+		if (src < 0 || dst < 0)
+			return -EINVAL;
+		mem_off = insn->off;
+		switch (BPF_SIZE(insn->code)) {
+		case BPF_B:
+			emit_instr(ctx, sb, src, mem_off, dst);
+			break;
+		case BPF_H:
+			emit_instr(ctx, sh, src, mem_off, dst);
+			break;
+		case BPF_W:
+			emit_instr(ctx, sw, src, mem_off, dst);
+			break;
+		case BPF_DW:
+			if (get_reg_val_type(ctx, this_idx, insn->src_reg) == REG_32BIT) {
+				emit_instr(ctx, daddu, MIPS_R_AT, src, MIPS_R_ZERO);
+				emit_instr(ctx, dinsu, MIPS_R_AT, MIPS_R_ZERO, 32, 32);
+				src = MIPS_R_AT;
+			}
+			emit_instr(ctx, sd, src, mem_off, dst);
+			break;
+		}
+		break;
+
 	case BPF_STX | BPF_W | BPF_ATOMIC:
 	case BPF_STX | BPF_DW | BPF_ATOMIC:
 		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_FP_OK);
@@ -951,71 +985,94 @@ int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 		if (src < 0 || dst < 0)
 			return -EINVAL;
 		mem_off = insn->off;
-		if (BPF_MODE(insn->code) == BPF_ATOMIC) {
-			if (insn->imm != BPF_ADD) {
-				pr_err("ATOMIC OP %02x NOT HANDLED\n", insn->imm);
-				return -EINVAL;
+		/*
+		 * If mem_off does not fit within the 9 bit ll/sc
+		 * instruction immediate field, use a temp reg.
+		 */
+		if (MIPS_ISA_REV >= 6 &&
+		    (mem_off >= BIT(8) || mem_off < -BIT(8))) {
+			emit_instr(ctx, daddiu, MIPS_R_T6, dst, mem_off);
+			mem_off = 0;
+			dst = MIPS_R_T6;
+		}
+		/* Copy or adjust 32-bit src regs based on BPF op size. */
+		ts = get_reg_val_type(ctx, this_idx, insn->src_reg);
+		if (bpf_size == BPF_W) {
+			if (ts == REG_32BIT) {
+				emit_instr(ctx, sll, MIPS_R_T9, src, 0);
+				src = MIPS_R_T9;
 			}
+			/* Ensure a proper old == new comparison. */
+			if (insn->imm == BPF_CMPXCHG)
+				emit_instr(ctx, sll, r0, r0, 0);
+		}
+		if (bpf_size == BPF_DW && ts == REG_32BIT) {
+			emit_instr(ctx, move, MIPS_R_T9, src);
+			emit_instr(ctx, dinsu, MIPS_R_T9, MIPS_R_ZERO, 32, 32);
+			src = MIPS_R_T9;
+		}
+
+/* Helper to simplify using BPF_DW/BPF_W atomic opcodes. */
+#define emit_instr_size(ctx, func64, func32, ...)                              \
+do {                                                                           \
+	if (bpf_size == BPF_DW)                                                \
+		emit_instr(ctx, func64, ##__VA_ARGS__);                        \
+	else                                                                   \
+		emit_instr(ctx, func32, ##__VA_ARGS__);                        \
+} while (0)
+
+		/* Track variable branch offset due to CMPXCHG. */
+		b_off = ctx->idx;
+		emit_instr_size(ctx, lld, ll, MIPS_R_AT, mem_off, dst);
+		switch (insn->imm) {
+		case BPF_AND | BPF_FETCH:
+		case BPF_AND:
+			emit_instr(ctx, and, MIPS_R_T8, MIPS_R_AT, src);
+			break;
+		case BPF_OR | BPF_FETCH:
+		case BPF_OR:
+			emit_instr(ctx, or, MIPS_R_T8, MIPS_R_AT, src);
+			break;
+		case BPF_XOR | BPF_FETCH:
+		case BPF_XOR:
+			emit_instr(ctx, xor, MIPS_R_T8, MIPS_R_AT, src);
+			break;
+		case BPF_ADD | BPF_FETCH:
+		case BPF_ADD:
+			emit_instr_size(ctx, daddu, addu, MIPS_R_T8, MIPS_R_AT, src);
+			break;
+		case BPF_XCHG:
+			emit_instr_size(ctx, daddu, addu, MIPS_R_T8, MIPS_R_ZERO, src);
+			break;
+		case BPF_CMPXCHG:
 			/*
-			 * If mem_off does not fit within the 9 bit ll/sc
-			 * instruction immediate field, use a temp reg.
+			 * If R0 != old_val then break out of LL/SC loop
 			 */
-			if (MIPS_ISA_REV >= 6 &&
-			    (mem_off >= BIT(8) || mem_off < -BIT(8))) {
-				emit_instr(ctx, daddiu, MIPS_R_T6,
-						dst, mem_off);
-				mem_off = 0;
-				dst = MIPS_R_T6;
-			}
-			switch (BPF_SIZE(insn->code)) {
-			case BPF_W:
-				if (get_reg_val_type(ctx, this_idx, insn->src_reg) == REG_32BIT) {
-					emit_instr(ctx, sll, MIPS_R_AT, src, 0);
-					src = MIPS_R_AT;
-				}
-				emit_instr(ctx, ll, MIPS_R_T8, mem_off, dst);
-				emit_instr(ctx, addu, MIPS_R_T8, MIPS_R_T8, src);
-				emit_instr(ctx, sc, MIPS_R_T8, mem_off, dst);
-				/*
-				 * On failure back up to LL (-4
-				 * instructions of 4 bytes each
-				 */
-				emit_instr(ctx, beq, MIPS_R_T8, MIPS_R_ZERO, -4 * 4);
-				emit_instr(ctx, nop);
-				break;
-			case BPF_DW:
-				if (get_reg_val_type(ctx, this_idx, insn->src_reg) == REG_32BIT) {
-					emit_instr(ctx, daddu, MIPS_R_AT, src, MIPS_R_ZERO);
-					emit_instr(ctx, dinsu, MIPS_R_AT, MIPS_R_ZERO, 32, 32);
-					src = MIPS_R_AT;
-				}
-				emit_instr(ctx, lld, MIPS_R_T8, mem_off, dst);
-				emit_instr(ctx, daddu, MIPS_R_T8, MIPS_R_T8, src);
-				emit_instr(ctx, scd, MIPS_R_T8, mem_off, dst);
-				emit_instr(ctx, beq, MIPS_R_T8, MIPS_R_ZERO, -4 * 4);
-				emit_instr(ctx, nop);
-				break;
-			}
-		} else { /* BPF_MEM */
-			switch (BPF_SIZE(insn->code)) {
-			case BPF_B:
-				emit_instr(ctx, sb, src, mem_off, dst);
-				break;
-			case BPF_H:
-				emit_instr(ctx, sh, src, mem_off, dst);
-				break;
-			case BPF_W:
-				emit_instr(ctx, sw, src, mem_off, dst);
-				break;
-			case BPF_DW:
-				if (get_reg_val_type(ctx, this_idx, insn->src_reg) == REG_32BIT) {
-					emit_instr(ctx, daddu, MIPS_R_AT, src, MIPS_R_ZERO);
-					emit_instr(ctx, dinsu, MIPS_R_AT, MIPS_R_ZERO, 32, 32);
-					src = MIPS_R_AT;
-				}
-				emit_instr(ctx, sd, src, mem_off, dst);
-				break;
-			}
+			emit_instr(ctx, bne, r0, MIPS_R_AT, 4 * 4);
+			/* Delay slot */
+			emit_instr_size(ctx, daddu, addu, MIPS_R_T8, MIPS_R_ZERO, src);
+			/* Return old_val in R0 */
+			src = r0;
+			break;
+		default:
+			pr_err("ATOMIC OP %02x NOT HANDLED\n", insn->imm);
+			return -EINVAL;
+		}
+		emit_instr_size(ctx, scd, sc, MIPS_R_T8, mem_off, dst);
+#undef emit_instr_size
+		/*
+		 * On failure back up to LL (calculate # insns)
+		 */
+		b_off = (b_off - ctx->idx - 1) * 4;
+		emit_instr(ctx, beqz, MIPS_R_T8, b_off);
+		emit_instr(ctx, nop);
+		/*
+		 * Using fetch returns old value in src or R0
+		 */
+		if (insn->imm & BPF_FETCH) {
+			if (bpf_size == BPF_W)
+				emit_instr(ctx, dinsu, MIPS_R_AT, MIPS_R_ZERO, 32, 32);
+			emit_instr(ctx, move, src, MIPS_R_AT);
 		}
 		break;
 
-- 
2.25.1



* [RFC PATCH bpf-next v2 15/16] MIPS: uasm: Enable muhu opcode for MIPS R6
  2021-10-05  8:26 ` [RFC PATCH bpf-next v2 00/16] " Tony Ambardar
                     ` (12 preceding siblings ...)
  2021-10-05  8:26   ` [RFC PATCH bpf-next v2 14/16] MIPS: eBPF64: implement all BPF_ATOMIC ops Tony Ambardar
@ 2021-10-05  8:26   ` Tony Ambardar
  2021-10-05  8:27   ` [RFC PATCH bpf-next v2 16/16] MIPS: eBPF: add MIPS32 JIT Tony Ambardar
  14 siblings, 0 replies; 32+ messages in thread
From: Tony Ambardar @ 2021-10-05  8:26 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Thomas Bogendoerfer, Paul Burton
  Cc: Tony Ambardar, netdev, bpf, linux-mips, Johan Almbladh,
	Tiezhu Yang, Hassan Naveed, David Daney, Luke Nelson,
	Serge Semin, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh

Enable the 'muhu' instruction, which complements the existing 'mulu' and
is needed to implement the MIPS32 BPF JIT.

Also fix a typo in the existing definition of 'dmulu'.
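
For reference, a hedged C sketch (function name illustrative) of the
32x32->64 unsigned multiply split that 'mulu'/'muhu' provide on R6:

  /* mulu yields the low word of the product, muhu the high word. */
  static inline void umul64_r6(u32 a, u32 b, u32 *hi, u32 *lo)
  {
          u64 prod = (u64)a * b;

          *lo = (u32)prod;                /* mulu rd, rs, rt */
          *hi = (u32)(prod >> 32);        /* muhu rd, rs, rt */
  }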

Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
---
 arch/mips/include/asm/uasm.h | 1 +
 arch/mips/mm/uasm-mips.c     | 4 +++-
 arch/mips/mm/uasm.c          | 3 ++-
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/mips/include/asm/uasm.h b/arch/mips/include/asm/uasm.h
index f7effca791a5..5efa4e2dc9ab 100644
--- a/arch/mips/include/asm/uasm.h
+++ b/arch/mips/include/asm/uasm.h
@@ -145,6 +145,7 @@ Ip_u1(_mtlo);
 Ip_u3u1u2(_mul);
 Ip_u1u2(_multu);
 Ip_u3u1u2(_mulu);
+Ip_u3u1u2(_muhu);
 Ip_u3u1u2(_nor);
 Ip_u3u1u2(_or);
 Ip_u2u1u3(_ori);
diff --git a/arch/mips/mm/uasm-mips.c b/arch/mips/mm/uasm-mips.c
index 7154a1d99aad..e15c6700cd08 100644
--- a/arch/mips/mm/uasm-mips.c
+++ b/arch/mips/mm/uasm-mips.c
@@ -90,7 +90,7 @@ static const struct insn insn_table[insn_invalid] = {
 				RS | RT | RD},
 	[insn_dmtc0]	= {M(cop0_op, dmtc_op, 0, 0, 0, 0), RT | RD | SET},
 	[insn_dmultu]	= {M(spec_op, 0, 0, 0, 0, dmultu_op), RS | RT},
-	[insn_dmulu]	= {M(spec_op, 0, 0, 0, dmult_dmul_op, dmultu_op),
+	[insn_dmulu]	= {M(spec_op, 0, 0, 0, dmultu_dmulu_op, dmultu_op),
 				RS | RT | RD},
 	[insn_drotr]	= {M(spec_op, 1, 0, 0, 0, dsrl_op), RT | RD | RE},
 	[insn_drotr32]	= {M(spec_op, 1, 0, 0, 0, dsrl32_op), RT | RD | RE},
@@ -150,6 +150,8 @@ static const struct insn insn_table[insn_invalid] = {
 	[insn_mtlo]	= {M(spec_op, 0, 0, 0, 0, mtlo_op), RS},
 	[insn_mulu]	= {M(spec_op, 0, 0, 0, multu_mulu_op, multu_op),
 				RS | RT | RD},
+	[insn_muhu]	= {M(spec_op, 0, 0, 0, multu_muhu_op, multu_op),
+				RS | RT | RD},
 #ifndef CONFIG_CPU_MIPSR6
 	[insn_mul]	= {M(spec2_op, 0, 0, 0, 0, mul_op), RS | RT | RD},
 #else
diff --git a/arch/mips/mm/uasm.c b/arch/mips/mm/uasm.c
index 81dd226d6b6b..125140979d62 100644
--- a/arch/mips/mm/uasm.c
+++ b/arch/mips/mm/uasm.c
@@ -59,7 +59,7 @@ enum opcode {
 	insn_lddir, insn_ldpte, insn_ldx, insn_lh, insn_lhu, insn_ll, insn_lld,
 	insn_lui, insn_lw, insn_lwu, insn_lwx, insn_mfc0, insn_mfhc0, insn_mfhi,
 	insn_mflo, insn_modu, insn_movn, insn_movz, insn_mtc0, insn_mthc0,
-	insn_mthi, insn_mtlo, insn_mul, insn_multu, insn_mulu, insn_nor,
+	insn_mthi, insn_mtlo, insn_mul, insn_multu, insn_mulu, insn_muhu, insn_nor,
 	insn_or, insn_ori, insn_pref, insn_rfe, insn_rotr, insn_sb, insn_sc,
 	insn_scd, insn_seleqz, insn_selnez, insn_sd, insn_sh, insn_sll,
 	insn_sllv, insn_slt, insn_slti, insn_sltiu, insn_sltu, insn_sra,
@@ -344,6 +344,7 @@ I_u1(_mtlo)
 I_u3u1u2(_mul)
 I_u1u2(_multu)
 I_u3u1u2(_mulu)
+I_u3u1u2(_muhu)
 I_u3u1u2(_nor)
 I_u3u1u2(_or)
 I_u2u1u3(_ori)
-- 
2.25.1



* [RFC PATCH bpf-next v2 16/16] MIPS: eBPF: add MIPS32 JIT
  2021-10-05  8:26 ` [RFC PATCH bpf-next v2 00/16] " Tony Ambardar
                     ` (13 preceding siblings ...)
  2021-10-05  8:26   ` [RFC PATCH bpf-next v2 15/16] MIPS: uasm: Enable muhu opcode for MIPS R6 Tony Ambardar
@ 2021-10-05  8:27   ` Tony Ambardar
  14 siblings, 0 replies; 32+ messages in thread
From: Tony Ambardar @ 2021-10-05  8:27 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Thomas Bogendoerfer, Paul Burton
  Cc: Tony Ambardar, netdev, bpf, linux-mips, Johan Almbladh,
	Tiezhu Yang, Hassan Naveed, David Daney, Luke Nelson,
	Serge Semin, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh

Add a new variant of build_one_insn() supporting MIPS32, leveraging the
previously added common functions, and disable static analysis, which is
not needed on MIPS32. Also define bpf_jit_needs_zext() to request
verifier zext insertion. Handle these zext insns, and add conditional
zext for all ALU32 and LDX word-size operations for cases where the
verifier is unable to insert them (e.g. test_bpf bypasses the verifier).
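
The zext hook itself is the stock kernel one; a minimal sketch, with
its exact placement in ebpf_jit_core.c assumed here:

  /* Ask the verifier to insert explicit zero-extension insns. */
  bool bpf_jit_needs_zext(void)
  {
          return true;
  }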

Aside from mapping 64-bit BPF registers to pairs of 32-bit MIPS registers,
notable changes from the MIPS64 version of build_one_insn() include:

  BPF_ALU{,64} | {DIV,MOD} | BPF_K: drop the divide-by-zero guard, since
  the underlying insns do not raise exceptions.

  BPF_JMP | JSET | BPF_K: drop bbit insns only usable on MIPS64 Octeon.

In addition to BPF_ADD, implement atomic insns AND, OR, XOR, XCHG and
CMPXCHG, together with support for BPF_FETCH, in mode BPF_W.

The MIPS32 ISA does not include 64-bit div/mod or atomic opcodes. Add the
emit_bpf_divmod64() and emit_bpf_atomic64() functions, which use built-in
kernel functions to implement the following BPF insns (a sketch of the
div/mod helper call follows the list below):

  BPF_STX   | BPF_DW  | BPF_ATOMIC
    (AND, OR, XOR, XCHG, CMPXCHG, FETCH)
  BPF_ALU64 | BPF_DIV | BPF_X
  BPF_ALU64 | BPF_DIV | BPF_K
  BPF_ALU64 | BPF_MOD | BPF_X
  BPF_ALU64 | BPF_MOD | BPF_K
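
As a rough sketch only (assuming nothing beyond the kernel helper
div64_u64_rem() from <linux/math64.h>), the semantics obtained by the
div/mod helper call are:

  #include <linux/math64.h>

  /* Sketch: BPF_ALU64 BPF_DIV/BPF_MOD lowered to a kernel helper call. */
  static u64 sketch_divmod64(u64 dividend, u64 divisor, bool is_mod)
  {
          u64 rem;
          u64 quot = div64_u64_rem(dividend, divisor, &rem);

          return is_mod ? rem : quot;
  }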

Test and development primarily used LTS kernel 5.10.x and then 5.13.x,
running under QEMU. Test suites included the 'test_bpf' module and the
'test_verifier' program from kselftests. Testing with 'test_progs' from
kselftests was not possible in general since cross-compilation depends on
libbpf/bpftool, which does not support cross-endian builds (see also [1]).

The matrix of test configurations executed for this series covers:
MIPSWORD={64-bit,32-bit} x MIPSISA={R2,R6} x JIT={off,on,hardened}

On MIPS32BE and MIPS32LE there was general parity between the results of
interpreter vs. JIT-backed tests with respect to the numbers of PASSED,
SKIPPED, and FAILED tests.

A sample of results on QEMU/MIPS32LE from kernel 5.13.x:

  root@OpenWrt:~# sysctl net.core.bpf_jit_enable=1
  root@OpenWrt:~# modprobe test_bpf
  ...
  test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed]
  root@OpenWrt:~# ./test_verifier 0 884
  ...
  Summary: 1231 PASSED, 0 SKIPPED, 20 FAILED
  root@OpenWrt:~# ./test_verifier 886 1184
  ...
  Summary: 459 PASSED, 1 SKIPPED, 2 FAILED
  root@OpenWrt:~# ./test_progs -n 105,106
  ...
  105 subprogs:OK
  106/1 tailcall_1:OK
  106/2 tailcall_2:OK
  106/3 tailcall_3:OK
  106/4 tailcall_4:OK
  106/5 tailcall_5:OK
  106/6 tailcall_bpf2bpf_1:OK
  106/7 tailcall_bpf2bpf_2:OK
  106/8 tailcall_bpf2bpf_3:OK
  106/9 tailcall_bpf2bpf_4:OK
  106 tailcalls:OK
  Summary: 2/9 PASSED, 0 SKIPPED, 0 FAILED

Link: [1] https://lore.kernel.org/bpf/CAEf4BzZCnP3oB81w4BDL4TCmvO3vPw8MucOTbVnjbW8UuCtejw@mail.gmail.com/

Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
---
 Documentation/admin-guide/sysctl/net.rst |    6 +-
 Documentation/networking/filter.rst      |    6 +-
 arch/mips/Kconfig                        |    4 +-
 arch/mips/net/Makefile                   |    8 +-
 arch/mips/net/ebpf_jit_comp32.c          | 1398 ++++++++++++++++++++++
 arch/mips/net/ebpf_jit_core.c            |   20 +-
 6 files changed, 1426 insertions(+), 16 deletions(-)
 create mode 100644 arch/mips/net/ebpf_jit_comp32.c

diff --git a/Documentation/admin-guide/sysctl/net.rst b/Documentation/admin-guide/sysctl/net.rst
index 4150f74c521a..099e3efbf38e 100644
--- a/Documentation/admin-guide/sysctl/net.rst
+++ b/Documentation/admin-guide/sysctl/net.rst
@@ -66,14 +66,16 @@ two flavors of JITs, the newer eBPF JIT currently supported on:
   - ppc64
   - ppc32
   - sparc64
-  - mips64
+  - mips64 (R2+)
+  - mips32 (R2+)
   - s390x
   - riscv64
   - riscv32
 
 And the older cBPF JIT supported on the following archs:
 
-  - mips
+  - mips64 (R1)
+  - mips32 (R1)
   - sparc
 
 eBPF JITs are a superset of cBPF JITs, meaning the kernel will
diff --git a/Documentation/networking/filter.rst b/Documentation/networking/filter.rst
index 3e2221f4abe4..31101411da0e 100644
--- a/Documentation/networking/filter.rst
+++ b/Documentation/networking/filter.rst
@@ -637,9 +637,9 @@ skb pointer). All constraints and restrictions from bpf_check_classic() apply
 before a conversion to the new layout is being done behind the scenes!
 
 Currently, the classic BPF format is being used for JITing on most
-32-bit architectures, whereas x86-64, aarch64, s390x, powerpc64,
-sparc64, arm32, riscv64, riscv32 perform JIT compilation from eBPF
-instruction set.
+32-bit architectures, whereas x86-64, aarch64, s390x, powerpc64, sparc64,
+mips64, riscv64, arm32, riscv32, and mips32 perform JIT compilation from
+eBPF instruction set.
 
 Some core changes of the new internal format:
 
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index ed51970c08e7..d096d2332fe4 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -55,7 +55,7 @@ config MIPS
 	select HAVE_ARCH_TRACEHOOK
 	select HAVE_ARCH_TRANSPARENT_HUGEPAGE if CPU_SUPPORTS_HUGEPAGES
 	select HAVE_ASM_MODVERSIONS
-	select HAVE_CBPF_JIT if !64BIT && !CPU_MICROMIPS
+	select HAVE_CBPF_JIT if !CPU_MICROMIPS && TARGET_ISA_REV < 2
 	select HAVE_CONTEXT_TRACKING
 	select HAVE_TIF_NOHZ
 	select HAVE_C_RECORDMCOUNT
@@ -63,7 +63,7 @@ config MIPS
 	select HAVE_DEBUG_STACKOVERFLOW
 	select HAVE_DMA_CONTIGUOUS
 	select HAVE_DYNAMIC_FTRACE
-	select HAVE_EBPF_JIT if 64BIT && !CPU_MICROMIPS && TARGET_ISA_REV >= 2
+	select HAVE_EBPF_JIT if !CPU_MICROMIPS && TARGET_ISA_REV >= 2
 	select HAVE_EXIT_THREAD
 	select HAVE_FAST_GUP
 	select HAVE_FTRACE_MCOUNT_RECORD
diff --git a/arch/mips/net/Makefile b/arch/mips/net/Makefile
index de42f4a4db56..5f804bc54629 100644
--- a/arch/mips/net/Makefile
+++ b/arch/mips/net/Makefile
@@ -2,4 +2,10 @@
 # MIPS networking code
 
 obj-$(CONFIG_MIPS_CBPF_JIT) += bpf_jit.o bpf_jit_asm.o
-obj-$(CONFIG_MIPS_EBPF_JIT) += ebpf_jit_core.o ebpf_jit_comp64.o
+
+obj-$(CONFIG_MIPS_EBPF_JIT) += ebpf_jit_core.o
+ifeq ($(CONFIG_CPU_MIPS64),y)
+	obj-$(CONFIG_MIPS_EBPF_JIT) += ebpf_jit_comp64.o
+else
+	obj-$(CONFIG_MIPS_EBPF_JIT) += ebpf_jit_comp32.o
+endif
diff --git a/arch/mips/net/ebpf_jit_comp32.c b/arch/mips/net/ebpf_jit_comp32.c
new file mode 100644
index 000000000000..13623224f78e
--- /dev/null
+++ b/arch/mips/net/ebpf_jit_comp32.c
@@ -0,0 +1,1398 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Just-In-Time compiler for eBPF filters on MIPS32
+ * Copyright (c) 2021 Tony Ambardar <Tony.Ambardar@gmail.com>
+ *
+ * Based on code from:
+ *
+ * Copyright (c) 2017 Cavium, Inc.
+ * Author: David Daney <david.daney@cavium.com>
+ *
+ * Copyright (c) 2014 Imagination Technologies Ltd.
+ * Author: Markos Chandras <markos.chandras@imgtec.com>
+ */
+
+#include <linux/errno.h>
+#include <linux/filter.h>
+#include <asm/uasm.h>
+
+#include "ebpf_jit.h"
+
+static int gen_imm_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
+			int idx)
+{
+	int dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
+	int upper_bound, lower_bound, shamt;
+	int imm = insn->imm;
+
+	if (dst < 0)
+		return dst;
+
+	switch (BPF_OP(insn->code)) {
+	case BPF_MOV:
+	case BPF_ADD:
+		upper_bound = S16_MAX;
+		lower_bound = S16_MIN;
+		break;
+	case BPF_SUB:
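+		/* Emitted as addiu with -imm, so the addiu bounds are negated. */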
+		upper_bound = -(int)S16_MIN;
+		lower_bound = -(int)S16_MAX;
+		break;
+	case BPF_AND:
+	case BPF_OR:
+	case BPF_XOR:
+		upper_bound = 0xffff;
+		lower_bound = 0;
+		break;
+	case BPF_RSH:
+	case BPF_LSH:
+	case BPF_ARSH:
+		/* Shift amounts are truncated, no need for bounds */
+		upper_bound = S32_MAX;
+		lower_bound = S32_MIN;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	/*
+	 * Immediate move clobbers the register, so no sign/zero
+	 * extension needed.
+	 */
+	if (lower_bound <= imm && imm <= upper_bound) {
+		/* single insn immediate case */
+		switch (BPF_OP(insn->code) | BPF_CLASS(insn->code)) {
+		case BPF_ALU64 | BPF_MOV:
+			emit_instr(ctx, addiu, LO(dst), MIPS_R_ZERO, imm);
+			if (imm < 0)
+				gen_sext_insn(dst, ctx);
+			else
+				gen_zext_insn(dst, true, ctx);
+			break;
+		case BPF_ALU | BPF_MOV:
+			emit_instr(ctx, addiu, LO(dst), MIPS_R_ZERO, imm);
+			break;
+		case BPF_ALU64 | BPF_AND:
+			if (imm >= 0)
+				gen_zext_insn(dst, true, ctx);
+			fallthrough;
+		case BPF_ALU | BPF_AND:
+			emit_instr(ctx, andi, LO(dst), LO(dst), imm);
+			break;
+		case BPF_ALU64 | BPF_OR:
+			if (imm < 0)
+				emit_instr(ctx, nor, HI(dst),
+						MIPS_R_ZERO, MIPS_R_ZERO);
+			fallthrough;
+		case BPF_ALU | BPF_OR:
+			emit_instr(ctx, ori, LO(dst), LO(dst), imm);
+			break;
+		case BPF_ALU64 | BPF_XOR:
+			if (imm < 0)
+				emit_instr(ctx, nor, HI(dst),
+							HI(dst), MIPS_R_ZERO);
+			fallthrough;
+		case BPF_ALU | BPF_XOR:
+			emit_instr(ctx, xori, LO(dst), LO(dst), imm);
+			break;
+		case BPF_ALU64 | BPF_ADD:
+			emit_instr(ctx, addiu, LO(dst), LO(dst), imm);
+			if (imm < 0)
+				emit_instr(ctx, addiu, HI(dst), HI(dst), -1);
+			emit_instr(ctx, sltiu, MIPS_R_AT, LO(dst), imm);
+			emit_instr(ctx, addu, HI(dst), HI(dst), MIPS_R_AT);
+			break;
+		case BPF_ALU64 | BPF_SUB:
+			emit_instr(ctx, addiu, MIPS_R_AT, LO(dst), -imm);
+			if (imm < 0)
+				emit_instr(ctx, addiu, HI(dst), HI(dst), 1);
+			emit_instr(ctx, sltu, MIPS_R_AT, LO(dst), MIPS_R_AT);
+			emit_instr(ctx, subu, HI(dst), HI(dst), MIPS_R_AT);
+			emit_instr(ctx, addiu, LO(dst), LO(dst), -imm);
+			break;
+		case BPF_ALU64 | BPF_ARSH:
+			shamt = imm & 0x3f;
+			if (shamt >= 32) {
+				emit_instr(ctx, sra, LO(dst),
+							HI(dst), shamt - 32);
+				emit_instr(ctx, sra, HI(dst), HI(dst), 31);
+			} else if (shamt > 0) {
+				emit_instr(ctx, srl, LO(dst), LO(dst), shamt);
+				emit_instr(ctx, ins, LO(dst), HI(dst),
+							32 - shamt, shamt);
+				emit_instr(ctx, sra, HI(dst), HI(dst), shamt);
+			}
+			break;
+		case BPF_ALU64 | BPF_RSH:
+			shamt = imm & 0x3f;
+			if (shamt >= 32) {
+				emit_instr(ctx, srl, LO(dst),
+							HI(dst), shamt - 32);
+				emit_instr(ctx, and, HI(dst),
+							HI(dst), MIPS_R_ZERO);
+			} else if (shamt > 0) {
+				emit_instr(ctx, srl, LO(dst), LO(dst), shamt);
+				emit_instr(ctx, ins, LO(dst), HI(dst),
+							32 - shamt, shamt);
+				emit_instr(ctx, srl, HI(dst), HI(dst), shamt);
+			}
+			break;
+		case BPF_ALU64 | BPF_LSH:
+			shamt = imm & 0x3f;
+			if (shamt >= 32) {
+				emit_instr(ctx, sll, HI(dst),
+							LO(dst), shamt - 32);
+				emit_instr(ctx, and, LO(dst),
+							LO(dst), MIPS_R_ZERO);
+			} else if (shamt > 0) {
+				emit_instr(ctx, srl, MIPS_R_AT,
+							LO(dst), 32 - shamt);
+				emit_instr(ctx, sll, HI(dst), HI(dst), shamt);
+				emit_instr(ctx, sll, LO(dst), LO(dst), shamt);
+				emit_instr(ctx, or, HI(dst),
+							HI(dst), MIPS_R_AT);
+			}
+			break;
+		case BPF_ALU | BPF_RSH:
+			emit_instr(ctx, srl, LO(dst), LO(dst), imm & 0x1f);
+			break;
+		case BPF_ALU | BPF_LSH:
+			emit_instr(ctx, sll, LO(dst), LO(dst), imm & 0x1f);
+			break;
+		case BPF_ALU | BPF_ARSH:
+			emit_instr(ctx, sra, LO(dst), LO(dst), imm & 0x1f);
+			break;
+		case BPF_ALU | BPF_ADD:
+			emit_instr(ctx, addiu, LO(dst), LO(dst), imm);
+			break;
+		case BPF_ALU | BPF_SUB:
+			emit_instr(ctx, addiu, LO(dst), LO(dst), -imm);
+			break;
+		default:
+			return -EINVAL;
+		}
+	} else {
+		/* multi insn immediate case */
+		if (BPF_OP(insn->code) == BPF_MOV) {
+			gen_imm_to_reg(insn, LO(dst), ctx);
+			if (BPF_CLASS(insn->code) == BPF_ALU64) {
+				if (imm < 0)
+					gen_sext_insn(dst, ctx);
+				else
+					gen_zext_insn(dst, true, ctx);
+			}
+		} else {
+			gen_imm_to_reg(insn, MIPS_R_AT, ctx);
+			switch (BPF_OP(insn->code) | BPF_CLASS(insn->code)) {
+			case BPF_ALU64 | BPF_AND:
+				if (imm >= 0)
+					gen_zext_insn(dst, true, ctx);
+				fallthrough;
+			case BPF_ALU | BPF_AND:
+				emit_instr(ctx, and, LO(dst), LO(dst),
+								MIPS_R_AT);
+				break;
+			case BPF_ALU64 | BPF_OR:
+				if (imm < 0)
+					emit_instr(ctx, nor, HI(dst),
+						MIPS_R_ZERO, MIPS_R_ZERO);
+			fallthrough;
+			case BPF_ALU | BPF_OR:
+				emit_instr(ctx, or, LO(dst), LO(dst),
+								MIPS_R_AT);
+				break;
+			case BPF_ALU64 | BPF_XOR:
+				if (imm < 0)
+					emit_instr(ctx, nor, HI(dst),
+							HI(dst), MIPS_R_ZERO);
+			fallthrough;
+			case BPF_ALU | BPF_XOR:
+				emit_instr(ctx, xor, LO(dst), LO(dst),
+								MIPS_R_AT);
+				break;
+			case BPF_ALU64 | BPF_ADD:
+				emit_instr(ctx, addu, LO(dst),
+							LO(dst), MIPS_R_AT);
+				if (imm < 0)
+					emit_instr(ctx, addiu, HI(dst), HI(dst), -1);
+				emit_instr(ctx, sltu, MIPS_R_AT,
+							LO(dst), MIPS_R_AT);
+				emit_instr(ctx, addu, HI(dst),
+							HI(dst), MIPS_R_AT);
+				break;
+			case BPF_ALU64 | BPF_SUB:
+				emit_instr(ctx, subu, LO(dst),
+							LO(dst), MIPS_R_AT);
+				if (imm < 0)
+					emit_instr(ctx, addiu, HI(dst), HI(dst), 1);
+				emit_instr(ctx, sltu, MIPS_R_AT,
+							MIPS_R_AT, LO(dst));
+				emit_instr(ctx, subu, HI(dst),
+							HI(dst), MIPS_R_AT);
+				break;
+			case BPF_ALU | BPF_ADD:
+				emit_instr(ctx, addu, LO(dst), LO(dst),
+								MIPS_R_AT);
+				break;
+			case BPF_ALU | BPF_SUB:
+				emit_instr(ctx, subu, LO(dst), LO(dst),
+								MIPS_R_AT);
+				break;
+			default:
+				return -EINVAL;
+			}
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * Implement 64-bit BPF div/mod insns on 32-bit systems by calling the
+ * equivalent built-in kernel function. The function args may be mixed
+ * 64/32-bit types, unlike the uniform u64 args of BPF kernel helpers.
+ * Func proto: u64 div64_u64_rem(u64 dividend, u64 divisor, u64 *remainder)
+ */
+static int emit_bpf_divmod64(struct jit_ctx *ctx, const struct bpf_insn *insn)
+{
+	const int bpf_src = BPF_SRC(insn->code);
+	const int bpf_op = BPF_OP(insn->code);
+	int rem_off, arg_off;
+	int src, dst, tmp;
+	u32 func_addr;
+
+	ctx->flags |= EBPF_SAVE_RA;
+
+	dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
+	if (dst < 0)
+		return -EINVAL;
+
+	if (bpf_src == BPF_X) {
+		src = ebpf_to_mips_reg(ctx, insn, REG_SRC_FP_OK);
+		if (src < 0)
+			return -EINVAL;
+		/*
+		 * Use MIPS_R_T8 as a temp reg pair so the move of
+		 * dst into arg 1 below does not clobber src.
+		 */
+		if (src == MIPS_R_A0) {
+			tmp = MIPS_R_T8;
+			emit_instr(ctx, move, LO(tmp), LO(src));
+			emit_instr(ctx, move, HI(tmp), HI(src));
+			src = tmp;
+		}
+	}
+
+	/* Save caller registers */
+	emit_caller_save(ctx);
+	/* Push O32 stack, aligned space for u64, u64, u64 *, u64 */
+	emit_instr(ctx, addiu, MIPS_R_SP, MIPS_R_SP, -32);
+
+	func_addr = (u32) &div64_u64_rem;
+	/* Move u64 dst to arg 1 as needed */
+	if (dst != MIPS_R_A0) {
+		emit_instr(ctx, move, LO(MIPS_R_A0), LO(dst));
+		emit_instr(ctx, move, HI(MIPS_R_A0), HI(dst));
+	}
+	/* Load imm or move u64 src to arg 2 as needed */
+	if (bpf_src == BPF_K) {
+		gen_imm_to_reg(insn, LO(MIPS_R_A2), ctx);
+		gen_sext_insn(MIPS_R_A2, ctx);
+	} else if (src != MIPS_R_A2) { /* BPF_X */
+		emit_instr(ctx, move, LO(MIPS_R_A2), LO(src));
+		emit_instr(ctx, move, HI(MIPS_R_A2), HI(src));
+	}
+	/* Set up stack arg 3 as ptr to u64 remainder on stack */
+	arg_off = 16;
+	rem_off = 24;
+	emit_instr(ctx, addiu, MIPS_R_AT, MIPS_R_SP, rem_off);
+	emit_instr(ctx, sw, MIPS_R_AT, arg_off, MIPS_R_SP);
+
+	emit_const_to_reg(ctx, MIPS_R_T9, func_addr);
+	emit_instr(ctx, jalr, MIPS_R_RA, MIPS_R_T9);
+	/* Delay slot */
+	emit_instr(ctx, nop);
+
+	/* Move return value to dst as needed */
+	switch (bpf_op) {
+	case BPF_DIV:
+		/* Quotient in MIPS_R_V0 reg pair */
+		if (dst != MIPS_R_V0) {
+			emit_instr(ctx, move, LO(dst), LO(MIPS_R_V0));
+			emit_instr(ctx, move, HI(dst), HI(MIPS_R_V0));
+		}
+		break;
+	case BPF_MOD:
+		/* Remainder on stack */
+		emit_instr(ctx, lw, LO(dst), OFFLO(rem_off), MIPS_R_SP);
+		emit_instr(ctx, lw, HI(dst), OFFHI(rem_off), MIPS_R_SP);
+		break;
+	}
+
+	/* Pop O32 call stack */
+	emit_instr(ctx, addiu, MIPS_R_SP, MIPS_R_SP, 32);
+	/* Restore all caller registers except the call return value */
+	emit_caller_restore(ctx, insn->dst_reg);
+
+	return 0;
+}
+
+/*
+ * Implement 64-bit BPF atomic insns on 32-bit systems by calling the
+ * equivalent built-in kernel function. The function args may be mixed
+ * 64/32-bit types, unlike the uniform u64 args of BPF kernel helpers.
+ */
+static int emit_bpf_atomic64(struct jit_ctx *ctx, const struct bpf_insn *insn)
+{
+	/*
+	 * Since CMPXCHG uses R0 implicitly, outside of a passed
+	 * bpf_insn, we fake a lookup to get the MIPS base reg.
+	 */
+	const struct bpf_insn r0_insn = {.src_reg = BPF_REG_0};
+	const int r0 = ebpf_to_mips_reg(ctx, &r0_insn,
+					REG_SRC_NO_FP);
+	int mem_off, arg_off, stack_adj;
+	int src, dst, fetch;
+	u32 func_addr;
+
+	ctx->flags |= EBPF_SAVE_RA;
+
+	dst = ebpf_to_mips_reg(ctx, insn, REG_DST_FP_OK);
+	src = ebpf_to_mips_reg(ctx, insn, REG_SRC_FP_OK);
+	if (src < 0 || dst < 0)
+		return -EINVAL;
+	mem_off = insn->off;
+
+	/* Save caller registers */
+	emit_caller_save(ctx);
+
+	/*
+	 * Push O32 ABI stack, noting CMPXCHG requires 1 u64 arg passed
+	 * on the stack.
+	 */
+	stack_adj = insn->imm == BPF_CMPXCHG ? 24 : 16;
+	emit_instr(ctx, addiu, MIPS_R_SP, MIPS_R_SP, -stack_adj);
+
+	switch (insn->imm) {
+	case BPF_AND | BPF_FETCH:
+		/* void atomic64_fetch_and(s64 a, atomic64_t *v) */
+		func_addr = (u32) &atomic64_fetch_and;
+		goto args_common;
+	case BPF_OR | BPF_FETCH:
+		/* void atomic64_fetch_or(s64 a, atomic64_t *v) */
+		func_addr = (u32) &atomic64_fetch_or;
+		goto args_common;
+	case BPF_XOR | BPF_FETCH:
+		/* void atomic64_fetch_xor(s64 a, atomic64_t *v) */
+		func_addr = (u32) &atomic64_fetch_xor;
+		goto args_common;
+	case BPF_ADD | BPF_FETCH:
+		/* void atomic64_fetch_add(s64 a, atomic64_t *v) */
+		func_addr = (u32) &atomic64_fetch_add;
+		goto args_common;
+	case BPF_AND:
+		/* void atomic64_and(s64 a, atomic64_t *v) */
+		func_addr = (u32) &atomic64_and;
+		goto args_common;
+	case BPF_OR:
+		/* void atomic64_or(s64 a, atomic64_t *v) */
+		func_addr = (u32) &atomic64_or;
+		goto args_common;
+	case BPF_XOR:
+		/* void atomic64_xor(s64 a, atomic64_t *v) */
+		func_addr = (u32) &atomic64_xor;
+		goto args_common;
+	case BPF_ADD:
+		/* void atomic64_add(s64 a, atomic64_t *v) */
+		func_addr = (u32) &atomic64_add;
+args_common:
+		fetch = src;
+		/* Set up dst ptr in tmp reg if conflicting */
+		if (dst == MIPS_R_A0) {
+			emit_instr(ctx, move, LO(MIPS_R_T8), LO(dst));
+			dst = MIPS_R_T8;
+		}
+		/* Move s64 src to arg 1 as needed */
+		if (src != MIPS_R_A0) {
+			emit_instr(ctx, move, LO(MIPS_R_A0), LO(src));
+			emit_instr(ctx, move, HI(MIPS_R_A0), HI(src));
+		}
+		/* Set up dst ptr in arg 2 base register */
+		emit_instr(ctx, addiu, MIPS_R_A2, LO(dst), mem_off);
+		break;
+	case BPF_XCHG:
+		/* s64 atomic64_xchg(atomic64_t *v, s64 i) */
+		func_addr = (u32) &atomic64_xchg;
+		fetch = src;
+		/* Set up dst ptr in tmp reg if conflicting */
+		if (dst == MIPS_R_A2) {
+			emit_instr(ctx, move, LO(MIPS_R_T8), LO(dst));
+			dst = MIPS_R_T8;
+		}
+		emit_instr(ctx, addiu, MIPS_R_AT, LO(dst), mem_off);
+		/* Move s64 src to arg 2 as needed */
+		if (src != MIPS_R_A2) {
+			emit_instr(ctx, move, LO(MIPS_R_A2), LO(src));
+			emit_instr(ctx, move, HI(MIPS_R_A2), HI(src));
+		}
+		/* Set up dst ptr in arg 1 base register */
+		emit_instr(ctx, addiu, MIPS_R_A0, LO(dst), mem_off);
+		break;
+	case BPF_CMPXCHG:
+		/* s64 atomic64_cmpxchg(atomic64_t *v, s64 old, s64 new) */
+		func_addr = (u32) &atomic64_cmpxchg;
+		fetch = r0;
+		/* Move s64 src to arg 3 on stack */
+		arg_off = 16;
+		emit_instr(ctx, sw, LO(src), OFFLO(arg_off), MIPS_R_SP);
+		emit_instr(ctx, sw, HI(src), OFFHI(arg_off), MIPS_R_SP);
+		/* Set up dst ptr in arg 1 base register */
+		emit_instr(ctx, addiu, MIPS_R_A0, LO(dst), mem_off);
+		/* Move s64 R0 to arg 2 */
+		emit_instr(ctx, move, LO(MIPS_R_A2), LO(r0));
+		emit_instr(ctx, move, HI(MIPS_R_A2), HI(r0));
+		break;
+	default:
+		pr_err("ATOMIC OP %02x NOT HANDLED\n", insn->imm);
+		return -EINVAL;
+	}
+
+	emit_const_to_reg(ctx, MIPS_R_T9, func_addr);
+	emit_instr(ctx, jalr, MIPS_R_RA, MIPS_R_T9);
+	/* Delay slot */
+	emit_instr(ctx, nop);
+
+	/* Pop O32 stack */
+	emit_instr(ctx, addiu, MIPS_R_SP, MIPS_R_SP, stack_adj);
+
+	if (insn->imm & BPF_FETCH) {
+		/* Set up returned value */
+		if (fetch != MIPS_R_V0) {
+			emit_instr(ctx, move, LO(fetch), LO(MIPS_R_V0));
+			emit_instr(ctx, move, HI(fetch), HI(MIPS_R_V0));
+		}
+		/* Restore all caller registers except one fetched */
+		if (insn->imm == BPF_CMPXCHG)
+			emit_caller_restore(ctx, BPF_REG_0);
+		else
+			emit_caller_restore(ctx, insn->src_reg);
+	} else {
+		/* Restore all caller registers since none clobbered */
+		emit_caller_restore(ctx, BPF_REG_FP);
+	}
+
+	return 0;
+}
+
+/* Returns the number of insn slots consumed. */
+int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
+		   int this_idx, int exit_idx)
+{
+	/*
+	 * Since CMPXCHG uses R0 implicitly, outside of a passed
+	 * bpf_insn, we fake a lookup to get the MIPS base reg.
+	 */
+	const struct bpf_insn r0_insn = {.src_reg = BPF_REG_0};
+	const int r0 = ebpf_to_mips_reg(ctx, &r0_insn,
+					REG_SRC_NO_FP);
+	const int bpf_class = BPF_CLASS(insn->code);
+	const int bpf_size = BPF_SIZE(insn->code);
+	const int bpf_src = BPF_SRC(insn->code);
+	const int bpf_op = BPF_OP(insn->code);
+	int src, dst, r, mem_off, b_off;
+	bool need_swap, cmp_eq;
+	unsigned int target = 0;
+
+	switch (insn->code) {
+	case BPF_ALU64 | BPF_ADD | BPF_K: /* ALU64_IMM */
+	case BPF_ALU64 | BPF_SUB | BPF_K: /* ALU64_IMM */
+	case BPF_ALU64 | BPF_LSH | BPF_K: /* ALU64_IMM */
+	case BPF_ALU64 | BPF_RSH | BPF_K: /* ALU64_IMM */
+	case BPF_ALU64 | BPF_ARSH | BPF_K: /* ALU64_IMM */
+	case BPF_ALU64 | BPF_XOR | BPF_K: /* ALU64_IMM */
+	case BPF_ALU64 | BPF_MOV | BPF_K: /* ALU64_IMM */
+	case BPF_ALU64 | BPF_OR | BPF_K: /* ALU64_IMM */
+	case BPF_ALU64 | BPF_AND | BPF_K: /* ALU64_IMM */
+	case BPF_ALU | BPF_MOV | BPF_K: /* ALU32_IMM */
+	case BPF_ALU | BPF_ADD | BPF_K: /* ALU32_IMM */
+	case BPF_ALU | BPF_SUB | BPF_K: /* ALU32_IMM */
+	case BPF_ALU | BPF_OR | BPF_K: /* ALU32_IMM */
+	case BPF_ALU | BPF_AND | BPF_K: /* ALU32_IMM */
+	case BPF_ALU | BPF_LSH | BPF_K: /* ALU32_IMM */
+	case BPF_ALU | BPF_RSH | BPF_K: /* ALU32_IMM */
+	case BPF_ALU | BPF_XOR | BPF_K: /* ALU32_IMM */
+	case BPF_ALU | BPF_ARSH | BPF_K: /* ALU32_IMM */
+		r = gen_imm_insn(insn, ctx, this_idx);
+		if (r < 0)
+			return r;
+		break;
+	case BPF_ALU64 | BPF_MUL | BPF_K: /* ALU64_IMM */
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
+		if (dst < 0)
+			return dst;
+		if (insn->imm == 1) /* Mult by 1 is a nop */
+			break;
+		src = MIPS_R_T8; /* Use tmp reg pair for imm */
+		gen_imm_to_reg(insn, LO(src), ctx);
+		emit_instr(ctx, sra, HI(src), LO(src), 31);
+		goto case_alu64_mul_x;
+
+	case BPF_ALU64 | BPF_NEG | BPF_K: /* ALU64_IMM */
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
+		if (dst < 0)
+			return dst;
+		emit_instr(ctx, subu, LO(dst), MIPS_R_ZERO, LO(dst));
+		emit_instr(ctx, subu, HI(dst), MIPS_R_ZERO, HI(dst));
+		emit_instr(ctx, sltu, MIPS_R_AT, MIPS_R_ZERO, LO(dst));
+		emit_instr(ctx, subu, HI(dst), HI(dst), MIPS_R_AT);
+		break;
+	case BPF_ALU | BPF_MUL | BPF_K: /* ALU_IMM */
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
+		if (dst < 0)
+			return dst;
+		if (insn->imm == 1) /* Mult by 1 is a nop */
+			break;
+		gen_imm_to_reg(insn, MIPS_R_AT, ctx);
+		if (MIPS_ISA_REV >= 6) {
+			emit_instr(ctx, mulu, LO(dst), LO(dst), MIPS_R_AT);
+		} else {
+			emit_instr(ctx, multu, LO(dst), MIPS_R_AT);
+			emit_instr(ctx, mflo, LO(dst));
+		}
+		break;
+	case BPF_ALU | BPF_NEG | BPF_K: /* ALU_IMM */
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
+		if (dst < 0)
+			return dst;
+		emit_instr(ctx, subu, LO(dst), MIPS_R_ZERO, LO(dst));
+		break;
+	case BPF_ALU | BPF_DIV | BPF_K: /* ALU_IMM */
+	case BPF_ALU | BPF_MOD | BPF_K: /* ALU_IMM */
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
+		if (dst < 0)
+			return dst;
+		if (insn->imm == 1) {
+			/* div by 1 is a nop, mod by 1 is zero */
+			if (bpf_op == BPF_MOD)
+				emit_instr(ctx, move, LO(dst), MIPS_R_ZERO);
+			break;
+		}
+		gen_imm_to_reg(insn, MIPS_R_AT, ctx);
+		if (MIPS_ISA_REV >= 6) {
+			if (bpf_op == BPF_DIV)
+				emit_instr(ctx, divu_r6, LO(dst),
+							LO(dst), MIPS_R_AT);
+			else
+				emit_instr(ctx, modu, LO(dst),
+							LO(dst), MIPS_R_AT);
+			break;
+		}
+		emit_instr(ctx, divu, LO(dst), MIPS_R_AT);
+		if (bpf_op == BPF_DIV)
+			emit_instr(ctx, mflo, LO(dst));
+		else
+			emit_instr(ctx, mfhi, LO(dst));
+		break;
+	case BPF_ALU64 | BPF_DIV | BPF_K: /* ALU64_IMM */
+	case BPF_ALU64 | BPF_MOD | BPF_K: /* ALU64_IMM */
+	case BPF_ALU64 | BPF_DIV | BPF_X: /* ALU64_REG */
+	case BPF_ALU64 | BPF_MOD | BPF_X: /* ALU64_REG */
+		r = emit_bpf_divmod64(ctx, insn);
+		if (r < 0)
+			return r;
+		break;
+	case BPF_ALU64 | BPF_MUL | BPF_X: /* ALU64_REG */
+	case BPF_ALU64 | BPF_ADD | BPF_X: /* ALU64_REG */
+	case BPF_ALU64 | BPF_SUB | BPF_X: /* ALU64_REG */
+	case BPF_ALU64 | BPF_MOV | BPF_X: /* ALU64_REG */
+	case BPF_ALU64 | BPF_XOR | BPF_X: /* ALU64_REG */
+	case BPF_ALU64 | BPF_OR | BPF_X: /* ALU64_REG */
+	case BPF_ALU64 | BPF_AND | BPF_X: /* ALU64_REG */
+	case BPF_ALU64 | BPF_LSH | BPF_X: /* ALU64_REG */
+	case BPF_ALU64 | BPF_RSH | BPF_X: /* ALU64_REG */
+	case BPF_ALU64 | BPF_ARSH | BPF_X: /* ALU64_REG */
+		src = ebpf_to_mips_reg(ctx, insn, REG_SRC_FP_OK);
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
+		if (src < 0 || dst < 0)
+			return -EINVAL;
+		switch (bpf_op) {
+		case BPF_MOV:
+			emit_instr(ctx, move, LO(dst), LO(src));
+			emit_instr(ctx, move, HI(dst), HI(src));
+			break;
+		case BPF_ADD:
+			emit_instr(ctx, addu, HI(dst), HI(dst), HI(src));
+			emit_instr(ctx, addu, MIPS_R_AT, LO(dst), LO(src));
+			emit_instr(ctx, sltu, MIPS_R_AT, MIPS_R_AT, LO(dst));
+			emit_instr(ctx, addu, HI(dst), HI(dst), MIPS_R_AT);
+			emit_instr(ctx, addu, LO(dst), LO(dst), LO(src));
+			break;
+		case BPF_SUB:
+			emit_instr(ctx, subu, HI(dst), HI(dst), HI(src));
+			emit_instr(ctx, subu, MIPS_R_AT, LO(dst), LO(src));
+			emit_instr(ctx, sltu, MIPS_R_AT, LO(dst), MIPS_R_AT);
+			emit_instr(ctx, subu, HI(dst), HI(dst), MIPS_R_AT);
+			emit_instr(ctx, subu, LO(dst), LO(dst), LO(src));
+			break;
+		case BPF_XOR:
+			emit_instr(ctx, xor, LO(dst), LO(dst), LO(src));
+			emit_instr(ctx, xor, HI(dst), HI(dst), HI(src));
+			break;
+		case BPF_OR:
+			emit_instr(ctx, or, LO(dst), LO(dst), LO(src));
+			emit_instr(ctx, or, HI(dst), HI(dst), HI(src));
+			break;
+		case BPF_AND:
+			emit_instr(ctx, and, LO(dst), LO(dst), LO(src));
+			emit_instr(ctx, and, HI(dst), HI(dst), HI(src));
+			break;
+		case BPF_MUL:
+case_alu64_mul_x:
+			emit_instr(ctx, mul, HI(dst), HI(dst), LO(src));
+			emit_instr(ctx, mul, MIPS_R_AT, LO(dst), HI(src));
+			emit_instr(ctx, addu, HI(dst), HI(dst), MIPS_R_AT);
+			if (MIPS_ISA_REV >= 6) {
+				emit_instr(ctx, muhu, MIPS_R_AT, LO(dst), LO(src));
+				emit_instr(ctx, mul, LO(dst), LO(dst), LO(src));
+			} else {
+				emit_instr(ctx, multu, LO(dst), LO(src));
+				emit_instr(ctx, mfhi, MIPS_R_AT);
+				emit_instr(ctx, mflo, LO(dst));
+			}
+			emit_instr(ctx, addu, HI(dst), HI(dst), MIPS_R_AT);
+			break;
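+		/*
+		 * 64-bit shifts on a register pair: a zero count is a
+		 * no-op; counts >= 32 move one word across the pair;
+		 * smaller counts combine both words using an opposing
+		 * shift by (32 - count).
+		 */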
+		case BPF_LSH:
+			emit_instr(ctx, beqz, LO(src), 11 * 4);
+			emit_instr(ctx, addiu, MIPS_R_AT, LO(src), -32);
+			emit_instr(ctx, bltz, MIPS_R_AT, 4 * 4);
+			emit_instr(ctx, nop);
+			emit_instr(ctx, sllv, HI(dst), LO(dst), MIPS_R_AT);
+			emit_instr(ctx, and, LO(dst), LO(dst), MIPS_R_ZERO);
+			emit_instr(ctx, b, 5 * 4);
+			emit_instr(ctx, subu, MIPS_R_AT, MIPS_R_ZERO, MIPS_R_AT);
+			emit_instr(ctx, srlv, MIPS_R_AT, LO(dst), MIPS_R_AT);
+			emit_instr(ctx, sllv, HI(dst), HI(dst), LO(src));
+			emit_instr(ctx, sllv, LO(dst), LO(dst), LO(src));
+			emit_instr(ctx, or, HI(dst), HI(dst), MIPS_R_AT);
+			break;
+		case BPF_RSH:
+			emit_instr(ctx, beqz, LO(src), 11 * 4);
+			emit_instr(ctx, addiu, MIPS_R_AT, LO(src), -32);
+			emit_instr(ctx, bltz, MIPS_R_AT, 4 * 4);
+			emit_instr(ctx, nop);
+			emit_instr(ctx, srlv, LO(dst), HI(dst), MIPS_R_AT);
+			emit_instr(ctx, and, HI(dst), HI(dst), MIPS_R_ZERO);
+			emit_instr(ctx, b, 5 * 4);
+			emit_instr(ctx, subu, MIPS_R_AT, MIPS_R_ZERO, MIPS_R_AT);
+			emit_instr(ctx, sllv, MIPS_R_AT, HI(dst), MIPS_R_AT);
+			emit_instr(ctx, srlv, HI(dst), HI(dst), LO(src));
+			emit_instr(ctx, srlv, LO(dst), LO(dst), LO(src));
+			emit_instr(ctx, or, LO(dst), LO(dst), MIPS_R_AT);
+			break;
+		case BPF_ARSH:
+			emit_instr(ctx, beqz, LO(src), 11 * 4);
+			emit_instr(ctx, addiu, MIPS_R_AT, LO(src), -32);
+			emit_instr(ctx, bltz, MIPS_R_AT, 4 * 4);
+			emit_instr(ctx, nop);
+			emit_instr(ctx, srav, LO(dst), HI(dst), MIPS_R_AT);
+			emit_instr(ctx, sra, HI(dst), HI(dst), 31);
+			emit_instr(ctx, b, 5 * 4);
+			emit_instr(ctx, subu, MIPS_R_AT, MIPS_R_ZERO, MIPS_R_AT);
+			emit_instr(ctx, sllv, MIPS_R_AT, HI(dst), MIPS_R_AT);
+			emit_instr(ctx, srav, HI(dst), HI(dst), LO(src));
+			emit_instr(ctx, srlv, LO(dst), LO(dst), LO(src));
+			emit_instr(ctx, or, LO(dst), LO(dst), MIPS_R_AT);
+			break;
+		default:
+			pr_err("ALU64_REG NOT HANDLED\n");
+			return -EINVAL;
+		}
+		break;
+	case BPF_ALU | BPF_MOV | BPF_X: /* ALU_REG */
+	case BPF_ALU | BPF_ADD | BPF_X: /* ALU_REG */
+	case BPF_ALU | BPF_SUB | BPF_X: /* ALU_REG */
+	case BPF_ALU | BPF_XOR | BPF_X: /* ALU_REG */
+	case BPF_ALU | BPF_OR | BPF_X: /* ALU_REG */
+	case BPF_ALU | BPF_AND | BPF_X: /* ALU_REG */
+	case BPF_ALU | BPF_MUL | BPF_X: /* ALU_REG */
+	case BPF_ALU | BPF_DIV | BPF_X: /* ALU_REG */
+	case BPF_ALU | BPF_MOD | BPF_X: /* ALU_REG */
+	case BPF_ALU | BPF_LSH | BPF_X: /* ALU_REG */
+	case BPF_ALU | BPF_RSH | BPF_X: /* ALU_REG */
+	case BPF_ALU | BPF_ARSH | BPF_X: /* ALU_REG */
+		src = ebpf_to_mips_reg(ctx, insn, REG_SRC_FP_OK);
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
+		if (src < 0 || dst < 0)
+			return -EINVAL;
+		/* Special BPF_MOV zext insn from verifier. */
+		if (insn_is_zext(insn)) {
+			gen_zext_insn(dst, true, ctx);
+			break;
+		}
+		switch (bpf_op) {
+		case BPF_MOV:
+			emit_instr(ctx, move, LO(dst), LO(src));
+			break;
+		case BPF_ADD:
+			emit_instr(ctx, addu, LO(dst), LO(dst), LO(src));
+			break;
+		case BPF_SUB:
+			emit_instr(ctx, subu, LO(dst), LO(dst), LO(src));
+			break;
+		case BPF_XOR:
+			emit_instr(ctx, xor, LO(dst), LO(dst), LO(src));
+			break;
+		case BPF_OR:
+			emit_instr(ctx, or, LO(dst), LO(dst), LO(src));
+			break;
+		case BPF_AND:
+			emit_instr(ctx, and, LO(dst), LO(dst), LO(src));
+			break;
+		case BPF_MUL:
+			emit_instr(ctx, mul, LO(dst), LO(dst), LO(src));
+			break;
+		case BPF_DIV:
+		case BPF_MOD:
+			if (MIPS_ISA_REV >= 6) {
+				if (bpf_op == BPF_DIV)
+					emit_instr(ctx, divu_r6, LO(dst),
+							LO(dst), LO(src));
+				else
+					emit_instr(ctx, modu, LO(dst),
+							LO(dst), LO(src));
+				break;
+			}
+			emit_instr(ctx, divu, LO(dst), LO(src));
+			if (bpf_op == BPF_DIV)
+				emit_instr(ctx, mflo, LO(dst));
+			else
+				emit_instr(ctx, mfhi, LO(dst));
+			break;
+		case BPF_LSH:
+			emit_instr(ctx, sllv, LO(dst), LO(dst), LO(src));
+			break;
+		case BPF_RSH:
+			emit_instr(ctx, srlv, LO(dst), LO(dst), LO(src));
+			break;
+		case BPF_ARSH:
+			emit_instr(ctx, srav, LO(dst), LO(dst), LO(src));
+			break;
+		default:
+			pr_err("ALU_REG NOT HANDLED\n");
+			return -EINVAL;
+		}
+		break;
+	case BPF_JMP | BPF_EXIT:
+		if (this_idx + 1 < exit_idx) {
+			b_off = b_imm(exit_idx, ctx);
+			if (is_bad_offset(b_off)) {
+				target = j_target(ctx, exit_idx);
+				if (target == (unsigned int)-1)
+					return -E2BIG;
+				emit_instr(ctx, j, target);
+			} else {
+				emit_instr(ctx, b, b_off);
+			}
+			emit_instr(ctx, nop);
+		}
+		break;
+	case BPF_JMP32 | BPF_JSLT | BPF_X:
+	case BPF_JMP32 | BPF_JSLE | BPF_X:
+	case BPF_JMP32 | BPF_JSGT | BPF_X:
+	case BPF_JMP32 | BPF_JSGE | BPF_X:
+	case BPF_JMP32 | BPF_JSGT | BPF_K:
+	case BPF_JMP32 | BPF_JSGE | BPF_K:
+	case BPF_JMP32 | BPF_JSLT | BPF_K:
+	case BPF_JMP32 | BPF_JSLE | BPF_K:
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
+		if (dst < 0)
+			return -EINVAL;
+
+		if (bpf_src == BPF_X) {
+			src = ebpf_to_mips_reg(ctx, insn, REG_SRC_NO_FP);
+			if (src < 0)
+				return -EINVAL;
+		} else if (insn->imm == 0) { /* and BPF_K */
+			src = MIPS_R_ZERO;
+		} else {
+			src = MIPS_R_T8;
+			gen_imm_to_reg(insn, LO(src), ctx);
+		}
+
+		cmp_eq = bpf_op == BPF_JSLE || bpf_op == BPF_JSGE;
+		switch (bpf_op) {
+		case BPF_JSGE:
+			emit_instr(ctx, slt, MIPS_R_AT, LO(dst), LO(src));
+			break;
+		case BPF_JSLT:
+			emit_instr(ctx, slt, MIPS_R_AT, LO(dst), LO(src));
+			break;
+		case BPF_JSGT:
+			emit_instr(ctx, slt, MIPS_R_AT, LO(src), LO(dst));
+			break;
+		case BPF_JSLE:
+			emit_instr(ctx, slt, MIPS_R_AT, LO(src), LO(dst));
+			break;
+		}
+
+		src = MIPS_R_AT;
+		dst = MIPS_R_ZERO;
+		goto jeq_common;
+
+	case BPF_JMP | BPF_JSLT | BPF_X:
+	case BPF_JMP | BPF_JSLE | BPF_X:
+	case BPF_JMP | BPF_JSGT | BPF_X:
+	case BPF_JMP | BPF_JSGE | BPF_X:
+	case BPF_JMP | BPF_JSGT | BPF_K:
+	case BPF_JMP | BPF_JSGE | BPF_K:
+	case BPF_JMP | BPF_JSLT | BPF_K:
+	case BPF_JMP | BPF_JSLE | BPF_K:
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
+		if (dst < 0)
+			return -EINVAL;
+
+		if (bpf_src == BPF_X) {
+			src = ebpf_to_mips_reg(ctx, insn, REG_SRC_NO_FP);
+			if (src < 0)
+				return -EINVAL;
+		} else if (insn->imm == 0) { /* and BPF_K */
+			src = MIPS_R_ZERO;
+		} else {
+			src = MIPS_R_T8;
+			gen_imm_to_reg(insn, LO(src), ctx);
+			if (insn->imm < 0)
+				gen_sext_insn(src, ctx);
+			else
+				gen_zext_insn(src, true, ctx);
+		}
+
+		cmp_eq = bpf_op == BPF_JSGT || bpf_op == BPF_JSGE;
+
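+		/*
+		 * 64-bit signed compare on a register pair: the high
+		 * words decide (signed); only when they are equal does
+		 * the unsigned low-word compare decide the result.
+		 */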
+		if (bpf_op == BPF_JSGT || bpf_op == BPF_JSLE) {
+			/* Check dst <= src */
+			emit_instr(ctx, bne, HI(dst), HI(src), 4 * 4);
+			/* Delay slot */
+			emit_instr(ctx, slt, MIPS_R_AT, HI(dst), HI(src));
+			emit_instr(ctx, bne, LO(dst), LO(src), 2 * 4);
+			/* Delay slot */
+			emit_instr(ctx, sltu, MIPS_R_AT, LO(dst), LO(src));
+			emit_instr(ctx, nor, MIPS_R_AT, MIPS_R_ZERO, MIPS_R_AT);
+		} else {
+			/* Check dst < src */
+			emit_instr(ctx, bne, HI(dst), HI(src), 2 * 4);
+			/* Delay slot */
+			emit_instr(ctx, slt, MIPS_R_AT, HI(dst), HI(src));
+			emit_instr(ctx, sltu, MIPS_R_AT, LO(dst), LO(src));
+		}
+
+		src = MIPS_R_AT;
+		dst = MIPS_R_ZERO;
+		goto jeq_common;
+
+	case BPF_JMP | BPF_JLT | BPF_X:
+	case BPF_JMP | BPF_JLE | BPF_X:
+	case BPF_JMP | BPF_JGT | BPF_X:
+	case BPF_JMP | BPF_JGE | BPF_X:
+	case BPF_JMP | BPF_JGT | BPF_K:
+	case BPF_JMP | BPF_JGE | BPF_K:
+	case BPF_JMP | BPF_JLT | BPF_K:
+	case BPF_JMP | BPF_JLE | BPF_K:
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
+		if (dst < 0)
+			return -EINVAL;
+
+		if (bpf_src == BPF_X) {
+			src = ebpf_to_mips_reg(ctx, insn, REG_SRC_NO_FP);
+			if (src < 0)
+				return -EINVAL;
+		} else if (insn->imm == 0) { /* and BPF_K */
+			src = MIPS_R_ZERO;
+		} else {
+			src = MIPS_R_T8;
+			gen_imm_to_reg(insn, LO(src), ctx);
+			if (insn->imm < 0)
+				gen_sext_insn(src, ctx);
+			else
+				gen_zext_insn(src, true, ctx);
+		}
+
+		cmp_eq = bpf_op == BPF_JGT || bpf_op == BPF_JGE;
+
+		if (bpf_op == BPF_JGT || bpf_op == BPF_JLE) {
+			/* Check dst <= src */
+			emit_instr(ctx, bne, HI(dst), HI(src), 4 * 4);
+			/* Delay slot */
+			emit_instr(ctx, sltu, MIPS_R_AT, HI(dst), HI(src));
+			emit_instr(ctx, bne, LO(dst), LO(src), 2 * 4);
+			/* Delay slot */
+			emit_instr(ctx, sltu, MIPS_R_AT, LO(dst), LO(src));
+			emit_instr(ctx, nor, MIPS_R_AT, MIPS_R_ZERO, MIPS_R_AT);
+		} else {
+			/* Check dst < src */
+			emit_instr(ctx, bne, HI(dst), HI(src), 2 * 4);
+			/* Delay slot */
+			emit_instr(ctx, sltu, MIPS_R_AT, HI(dst), HI(src));
+			emit_instr(ctx, sltu, MIPS_R_AT, LO(dst), LO(src));
+		}
+
+		src = MIPS_R_AT;
+		dst = MIPS_R_ZERO;
+		goto jeq_common;
+
+	case BPF_JMP32 | BPF_JLT | BPF_X:
+	case BPF_JMP32 | BPF_JLE | BPF_X:
+	case BPF_JMP32 | BPF_JGT | BPF_X:
+	case BPF_JMP32 | BPF_JGE | BPF_X:
+	case BPF_JMP32 | BPF_JGT | BPF_K:
+	case BPF_JMP32 | BPF_JGE | BPF_K:
+	case BPF_JMP32 | BPF_JLT | BPF_K:
+	case BPF_JMP32 | BPF_JLE | BPF_K:
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
+		if (dst < 0)
+			return -EINVAL;
+
+		if (bpf_src == BPF_X) {
+			src = ebpf_to_mips_reg(ctx, insn, REG_SRC_NO_FP);
+			if (src < 0)
+				return -EINVAL;
+		} else if (insn->imm == 0) { /* and BPF_K */
+			src = MIPS_R_ZERO;
+		} else {
+			src = MIPS_R_T8;
+			gen_imm_to_reg(insn, LO(src), ctx);
+		}
+
+		cmp_eq = bpf_op == BPF_JLE || bpf_op == BPF_JGE;
+		switch (bpf_op) {
+		case BPF_JGE:
+			emit_instr(ctx, sltu, MIPS_R_AT, LO(dst), LO(src));
+			break;
+		case BPF_JLT:
+			emit_instr(ctx, sltu, MIPS_R_AT, LO(dst), LO(src));
+			break;
+		case BPF_JGT:
+			emit_instr(ctx, sltu, MIPS_R_AT, LO(src), LO(dst));
+			break;
+		case BPF_JLE:
+			emit_instr(ctx, sltu, MIPS_R_AT, LO(src), LO(dst));
+			break;
+		}
+
+		src = MIPS_R_AT;
+		dst = MIPS_R_ZERO;
+		goto jeq_common;
+
+	case BPF_JMP | BPF_JEQ | BPF_X: /* JMP_REG */
+	case BPF_JMP | BPF_JNE | BPF_X:
+	case BPF_JMP32 | BPF_JEQ | BPF_X:
+	case BPF_JMP32 | BPF_JNE | BPF_X:
+		src = ebpf_to_mips_reg(ctx, insn, REG_SRC_FP_OK);
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_FP_OK);
+		if (src < 0 || dst < 0)
+			return -EINVAL;
+
+		cmp_eq = (bpf_op == BPF_JEQ);
+		if (bpf_class == BPF_JMP) {
+			emit_instr(ctx, beq, HI(dst), HI(src), 2 * 4);
+			/* Delay slot */
+			emit_instr(ctx, move, MIPS_R_AT, LO(src));
+			/* Make low words unequal if high word unequal. */
+			emit_instr(ctx, addu, MIPS_R_AT, LO(dst), MIPS_R_SP);
+			dst = LO(dst);
+			src = MIPS_R_AT;
+		} else { /* BPF_JMP32 */
+			dst = LO(dst);
+			src = LO(src);
+		}
+		goto jeq_common;
+
+	case BPF_JMP | BPF_JSET | BPF_X: /* JMP_REG */
+	case BPF_JMP32 | BPF_JSET | BPF_X:
+		src = ebpf_to_mips_reg(ctx, insn, REG_SRC_NO_FP);
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
+		if (src < 0 || dst < 0)
+			return -EINVAL;
+		emit_instr(ctx, and, MIPS_R_AT, LO(dst), LO(src));
+		if (bpf_class == BPF_JMP) {
+			emit_instr(ctx, and, MIPS_R_T8, HI(dst), HI(src));
+			emit_instr(ctx, or, MIPS_R_AT, MIPS_R_AT, MIPS_R_T8);
+		}
+		cmp_eq = false;
+		dst = MIPS_R_AT;
+		src = MIPS_R_ZERO;
+jeq_common:
+		/*
+		 * If the next insn is EXIT and we are jumping around
+		 * only it, invert the sense of the compare and
+		 * conditionally jump to the exit.  Poor man's branch
+		 * chaining.
+		 */
+		if ((insn + 1)->code == (BPF_JMP | BPF_EXIT) && insn->off == 1) {
+			b_off = b_imm(exit_idx, ctx);
+			if (is_bad_offset(b_off)) {
+				target = j_target(ctx, exit_idx);
+				if (target == (unsigned int)-1)
+					return -E2BIG;
+				cmp_eq = !cmp_eq;
+				b_off = 4 * 3;
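+				/* Skip the 'j target; nop' emitted below. */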
+				if (!(ctx->offsets[this_idx] & OFFSETS_B_CONV)) {
+					ctx->offsets[this_idx] |= OFFSETS_B_CONV;
+					ctx->long_b_conversion = 1;
+				}
+			}
+
+			if (cmp_eq)
+				emit_instr(ctx, bne, dst, src, b_off);
+			else
+				emit_instr(ctx, beq, dst, src, b_off);
+			emit_instr(ctx, nop);
+			if (ctx->offsets[this_idx] & OFFSETS_B_CONV) {
+				emit_instr(ctx, j, target);
+				emit_instr(ctx, nop);
+			}
+			return 2; /* We consumed the exit. */
+		}
+		b_off = b_imm(this_idx + insn->off + 1, ctx);
+		if (is_bad_offset(b_off)) {
+			target = j_target(ctx, this_idx + insn->off + 1);
+			if (target == (unsigned int)-1)
+				return -E2BIG;
+			cmp_eq = !cmp_eq;
+			b_off = 4 * 3;
+			if (!(ctx->offsets[this_idx] & OFFSETS_B_CONV)) {
+				ctx->offsets[this_idx] |= OFFSETS_B_CONV;
+				ctx->long_b_conversion = 1;
+			}
+		}
+
+		if (cmp_eq)
+			emit_instr(ctx, beq, dst, src, b_off);
+		else
+			emit_instr(ctx, bne, dst, src, b_off);
+		emit_instr(ctx, nop);
+		if (ctx->offsets[this_idx] & OFFSETS_B_CONV) {
+			emit_instr(ctx, j, target);
+			emit_instr(ctx, nop);
+		}
+		break;
+
+	case BPF_JMP | BPF_JEQ | BPF_K: /* JMP_IMM */
+	case BPF_JMP | BPF_JNE | BPF_K: /* JMP_IMM */
+	case BPF_JMP32 | BPF_JEQ | BPF_K: /* JMP_IMM */
+	case BPF_JMP32 | BPF_JNE | BPF_K: /* JMP_IMM */
+		cmp_eq = (bpf_op == BPF_JEQ);
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_FP_OK);
+		if (dst < 0)
+			return dst;
+		if (insn->imm == 0) {
+			src = MIPS_R_ZERO;
+			if (bpf_class == BPF_JMP32) {
+				dst = LO(dst);
+			} else { /* BPF_JMP */
+				emit_instr(ctx, or, MIPS_R_AT, LO(dst), HI(dst));
+				dst = MIPS_R_AT;
+			}
+		} else if (bpf_class == BPF_JMP32) {
+			gen_imm_to_reg(insn, MIPS_R_AT, ctx);
+			src = MIPS_R_AT;
+			dst = LO(dst);
+		} else { /* BPF_JMP */
+			gen_imm_to_reg(insn, MIPS_R_AT, ctx);
+			/* If low words equal, check high word vs imm sign. */
+			emit_instr(ctx, beq, LO(dst), MIPS_R_AT, 2 * 4);
+			emit_instr(ctx, nop);
+			/* If low words differ, force AT != HI(dst) via sign bits. */
+			emit_instr(ctx, nor, MIPS_R_AT, MIPS_R_ZERO, HI(dst));
+			emit_instr(ctx, sra, MIPS_R_AT, MIPS_R_AT, 31);
+			src = MIPS_R_AT;
+			dst = HI(dst);
+		}
+		goto jeq_common;
+
+	case BPF_JMP | BPF_JSET | BPF_K: /* JMP_IMM */
+	case BPF_JMP32 | BPF_JSET | BPF_K: /* JMP_IMM */
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
+		if (dst < 0)
+			return dst;
+
+		gen_imm_to_reg(insn, MIPS_R_AT, ctx);
+		emit_instr(ctx, and, MIPS_R_AT, LO(dst), MIPS_R_AT);
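+		/* A negative imm sign-extends to all-ones in the high word. */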
+		if (bpf_class == BPF_JMP && insn->imm < 0)
+			emit_instr(ctx, or, MIPS_R_AT, MIPS_R_AT, HI(dst));
+		src = MIPS_R_AT;
+		dst = MIPS_R_ZERO;
+		cmp_eq = false;
+		goto jeq_common;
+
+	case BPF_JMP | BPF_JA:
+		/*
+		 * Prefer relative branch for easier debugging, but
+		 * fall back if needed.
+		 */
+		b_off = b_imm(this_idx + insn->off + 1, ctx);
+		if (is_bad_offset(b_off)) {
+			target = j_target(ctx, this_idx + insn->off + 1);
+			if (target == (unsigned int)-1)
+				return -E2BIG;
+			emit_instr(ctx, j, target);
+		} else {
+			emit_instr(ctx, b, b_off);
+		}
+		emit_instr(ctx, nop);
+		break;
+	case BPF_LD | BPF_DW | BPF_IMM:
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
+		if (dst < 0)
+			return dst;
+		gen_imm_to_reg(insn, LO(dst), ctx);
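+		/* The insn pair holds imm64: low word first, high word second. */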
+		gen_imm_to_reg(insn+1, HI(dst), ctx);
+		return 2; /* Double slot insn */
+
+	case BPF_JMP | BPF_CALL:
+		emit_bpf_call(ctx, insn);
+		break;
+	case BPF_JMP | BPF_TAIL_CALL:
+		if (emit_bpf_tail_call(ctx, this_idx))
+			return -EINVAL;
+		break;
+
+	case BPF_ALU | BPF_END | BPF_FROM_BE:
+	case BPF_ALU | BPF_END | BPF_FROM_LE:
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
+		if (dst < 0)
+			return dst;
+#ifdef __BIG_ENDIAN
+		need_swap = (bpf_src == BPF_FROM_LE);
+#else
+		need_swap = (bpf_src == BPF_FROM_BE);
+#endif
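+		/* Swap bytes only when source order differs from host order. */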
+		if (insn->imm == 16) {
+			if (need_swap)
+				emit_instr(ctx, wsbh, LO(dst), LO(dst));
+			emit_instr(ctx, andi, LO(dst), LO(dst), 0xffff);
+		} else if (insn->imm == 32) {
+			if (need_swap) {
+				emit_instr(ctx, wsbh, LO(dst), LO(dst));
+				emit_instr(ctx, rotr, LO(dst), LO(dst), 16);
+			}
+		} else { /* 64-bit */
+			if (need_swap) {
+				emit_instr(ctx, wsbh, MIPS_R_AT, LO(dst));
+				emit_instr(ctx, wsbh, LO(dst), HI(dst));
+				emit_instr(ctx, rotr, HI(dst), MIPS_R_AT, 16);
+				emit_instr(ctx, rotr, LO(dst), LO(dst), 16);
+			}
+		}
+		break;
+
+	case BPF_ST | BPF_NOSPEC: /* speculation barrier */
+		break;
+
+	case BPF_ST | BPF_B | BPF_MEM:
+	case BPF_ST | BPF_H | BPF_MEM:
+	case BPF_ST | BPF_W | BPF_MEM:
+	case BPF_ST | BPF_DW | BPF_MEM:
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_FP_OK);
+		if (dst < 0)
+			return -EINVAL;
+		mem_off = insn->off;
+		gen_imm_to_reg(insn, MIPS_R_AT, ctx);
+
+		switch (bpf_size) {
+		case BPF_B:
+			emit_instr(ctx, sb, MIPS_R_AT, mem_off, LO(dst));
+			break;
+		case BPF_H:
+			emit_instr(ctx, sh, MIPS_R_AT, mem_off, LO(dst));
+			break;
+		case BPF_W:
+			emit_instr(ctx, sw, MIPS_R_AT, mem_off, LO(dst));
+			break;
+		case BPF_DW:
+			/* Memory order == register order in pair */
+			emit_instr(ctx, sw, MIPS_R_AT, OFFLO(mem_off), LO(dst));
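+			/* Store the sign extension of imm as the high word. */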
+			if (insn->imm < 0) {
+				emit_instr(ctx, nor, MIPS_R_AT,
+						MIPS_R_ZERO, MIPS_R_ZERO);
+				emit_instr(ctx, sw, MIPS_R_AT,
+						OFFHI(mem_off), LO(dst));
+			} else {
+				emit_instr(ctx, sw, MIPS_R_ZERO,
+						OFFHI(mem_off), LO(dst));
+			}
+			break;
+		}
+		break;
+
+	case BPF_LDX | BPF_B | BPF_MEM:
+	case BPF_LDX | BPF_H | BPF_MEM:
+	case BPF_LDX | BPF_W | BPF_MEM:
+	case BPF_LDX | BPF_DW | BPF_MEM:
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
+		src = ebpf_to_mips_reg(ctx, insn, REG_SRC_FP_OK);
+		if (src < 0 || dst < 0)
+			return -EINVAL;
+		mem_off = insn->off;
+
+		switch (bpf_size) {
+		case BPF_B:
+			emit_instr(ctx, lbu, LO(dst), mem_off, LO(src));
+			break;
+		case BPF_H:
+			emit_instr(ctx, lhu, LO(dst), mem_off, LO(src));
+			break;
+		case BPF_W:
+			emit_instr(ctx, lw, LO(dst), mem_off, LO(src));
+			break;
+		case BPF_DW:
+			/*
+			 * Careful: update HI(dst) first in case dst == src,
+			 * since only LO(src) is the usable pointer.
+			 */
+			emit_instr(ctx, lw, HI(dst), OFFHI(mem_off), LO(src));
+			emit_instr(ctx, lw, LO(dst), OFFLO(mem_off), LO(src));
+			break;
+		}
+		break;
+
+	case BPF_STX | BPF_DW | BPF_ATOMIC:
+		r = emit_bpf_atomic64(ctx, insn);
+		if (r < 0)
+			return r;
+		break;
+	case BPF_STX | BPF_W | BPF_ATOMIC:
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_FP_OK);
+		src = ebpf_to_mips_reg(ctx, insn, REG_SRC_FP_OK);
+		if (src < 0 || dst < 0)
+			return -EINVAL;
+		mem_off = insn->off;
+		/*
+		 * BPF_W mode needs only the low words, so drop the reg
+		 * pair scheme for more efficient temp register usage.
+		 */
+		dst = LO(dst);
+		src = LO(src);
+		/*
+		 * If mem_off does not fit within the 9-bit ll/sc
+		 * instruction immediate field of MIPS R6, use a temp reg.
+		 */
+		if (MIPS_ISA_REV >= 6 &&
+		    (mem_off >= BIT(8) || mem_off < -BIT(8))) {
+			emit_instr(ctx, addiu, MIPS_R_T9, dst, mem_off);
+			mem_off = 0;
+			dst = MIPS_R_T9;
+		}
+		/* Track variable branch offset due to CMPXCHG. */
+		b_off = ctx->idx;
+		emit_instr(ctx, ll, MIPS_R_AT, mem_off, dst);
+		switch (insn->imm) {
+		case BPF_AND | BPF_FETCH:
+		case BPF_AND:
+			emit_instr(ctx, and, MIPS_R_T8, MIPS_R_AT, src);
+			break;
+		case BPF_OR | BPF_FETCH:
+		case BPF_OR:
+			emit_instr(ctx, or, MIPS_R_T8, MIPS_R_AT, src);
+			break;
+		case BPF_XOR | BPF_FETCH:
+		case BPF_XOR:
+			emit_instr(ctx, xor, MIPS_R_T8, MIPS_R_AT, src);
+			break;
+		case BPF_ADD | BPF_FETCH:
+		case BPF_ADD:
+			emit_instr(ctx, addu, MIPS_R_T8, MIPS_R_AT, src);
+			break;
+		case BPF_XCHG:
+			emit_instr(ctx, move, MIPS_R_T8, src);
+			break;
+		case BPF_CMPXCHG:
+			/*
+			 * If R0 != old_val then break out of LL/SC loop
+			 */
+			emit_instr(ctx, bne, LO(r0), MIPS_R_AT, 4 * 4);
+			/* Delay slot */
+			emit_instr(ctx, move, MIPS_R_T8, src);
+			/* Return old_val in R0 */
+			src = LO(r0);
+			break;
+		default:
+			pr_err("ATOMIC OP %02x NOT HANDLED\n", insn->imm);
+			return -EINVAL;
+		}
+		emit_instr(ctx, sc, MIPS_R_T8, mem_off, dst);
+		/*
+		 * SC leaves its success flag in T8; on failure branch
+		 * back up to the LL (calculate # insns).
+		 */
+		b_off = (b_off - ctx->idx - 1) * 4;
+		emit_instr(ctx, beqz, MIPS_R_T8, b_off);
+		emit_instr(ctx, nop);
+		/*
+		 * FETCH variants return the old value in src (in R0 for
+		 * CMPXCHG, via the src override above).
+		 */
+		if (insn->imm & BPF_FETCH)
+			emit_instr(ctx, move, src, MIPS_R_AT);
+		break;
+
+	case BPF_STX | BPF_DW | BPF_MEM:
+	case BPF_STX | BPF_B | BPF_MEM:
+	case BPF_STX | BPF_H | BPF_MEM:
+	case BPF_STX | BPF_W | BPF_MEM:
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_FP_OK);
+		src = ebpf_to_mips_reg(ctx, insn, REG_SRC_FP_OK);
+		if (src < 0 || dst < 0)
+			return -EINVAL;
+		mem_off = insn->off;
+
+		switch (bpf_size) {
+		case BPF_B:
+			emit_instr(ctx, sb, LO(src), mem_off, LO(dst));
+			break;
+		case BPF_H:
+			emit_instr(ctx, sh, LO(src), mem_off, LO(dst));
+			break;
+		case BPF_W:
+			emit_instr(ctx, sw, LO(src), mem_off, LO(dst));
+			break;
+		case BPF_DW:
+			emit_instr(ctx, sw, HI(src), OFFHI(mem_off), LO(dst));
+			emit_instr(ctx, sw, LO(src), OFFLO(mem_off), LO(dst));
+			break;
+		}
+		break;
+
+	default:
+		pr_err("NOT HANDLED %d - (%02x)\n",
+		       this_idx, (unsigned int)insn->code);
+		return -EINVAL;
+	}
+	/*
+	 * Handle zero-extension if the verifier is unable to patch and
+	 * insert its own special zext insns. BPF_END with imm == 64
+	 * operates on the whole 64-bit value, so it needs no zext.
+	 */
+	if ((bpf_class == BPF_ALU && !(bpf_op == BPF_END && insn->imm == 64)) ||
+	    (bpf_class == BPF_LDX && bpf_size != BPF_DW)) {
+		dst = ebpf_to_mips_reg(ctx, insn, REG_DST_NO_FP);
+		if (dst < 0)
+			return -EINVAL;
+		gen_zext_insn(dst, false, ctx);
+	}
+	if (insn->code == (BPF_STX|BPF_ATOMIC|BPF_W) && insn->imm & BPF_FETCH) {
+		if (insn->imm == BPF_CMPXCHG) {
+			gen_zext_insn(r0, false, ctx);
+		} else {
+			src = ebpf_to_mips_reg(ctx, insn, REG_SRC_NO_FP);
+			if (src < 0)
+				return -EINVAL;
+			gen_zext_insn(src, false, ctx);
+		}
+	}
+
+	return 1;
+}
+
+/* Enable the verifier to insert zext insn for ALU32 ops as needed. */
+bool bpf_jit_needs_zext(void)
+{
+	return true;
+}
diff --git a/arch/mips/net/ebpf_jit_core.c b/arch/mips/net/ebpf_jit_core.c
index 37b496f47ddb..6eac8150ce2a 100644
--- a/arch/mips/net/ebpf_jit_core.c
+++ b/arch/mips/net/ebpf_jit_core.c
@@ -637,7 +637,8 @@ static int build_int_body(struct jit_ctx *ctx)
 
 	for (i = 0; i < prog->len; ) {
 		insn = prog->insnsi + i;
-		if ((ctx->reg_val_types[i] & RVT_VISITED_MASK) == 0) {
+		if (is64bit() && (ctx->reg_val_types[i] &
+				  RVT_VISITED_MASK) == 0) {
 			/* dead instruction, don't emit it. */
 			i++;
 			continue;
@@ -1080,14 +1081,17 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 		preempt_enable();
 	}
 
-	ctx->reg_val_types = kcalloc(prog->len + 1,
-				     sizeof(*ctx->reg_val_types),
-				     GFP_KERNEL);
-	if (!ctx->reg_val_types)
-		goto out_err;
+	/* Static analysis only used for MIPS64. */
+	if (is64bit()) {
+		ctx->reg_val_types = kcalloc(prog->len + 1,
+					     sizeof(*ctx->reg_val_types),
+					     GFP_KERNEL);
+		if (!ctx->reg_val_types)
+			goto out_err;
 
-	if (reg_val_propagate(ctx))
-		goto out_err;
+		if (reg_val_propagate(ctx))
+			goto out_err;
+	}
 
 	/*
 	 * First pass discovers used resources and instruction offsets
-- 
2.25.1


Thread overview: 32+ messages
2021-07-12  0:34 [RFC PATCH bpf-next v1 00/14] MIPS: eBPF: refactor code, add MIPS32 JIT Tony Ambardar
2021-07-12  0:34 ` [RFC PATCH bpf-next v1 01/14] MIPS: eBPF: support BPF_TAIL_CALL in JIT static analysis Tony Ambardar
2021-07-12  0:34 ` [RFC PATCH bpf-next v1 02/14] MIPS: eBPF: mask 32-bit index for tail calls Tony Ambardar
2021-07-12  0:34 ` [RFC PATCH bpf-next v1 03/14] MIPS: eBPF: fix BPF_ALU|ARSH handling in JIT static analysis Tony Ambardar
2021-07-12  0:34 ` [RFC PATCH bpf-next v1 04/14] MIPS: eBPF: support BPF_JMP32 " Tony Ambardar
2021-07-12  0:34 ` [RFC PATCH bpf-next v1 05/14] MIPS: eBPF: fix system hang with verifier dead-code patching Tony Ambardar
2021-07-12  0:34 ` [RFC PATCH bpf-next v1 06/14] MIPS: eBPF: fix JIT static analysis hang with bounded loops Tony Ambardar
2021-07-12  0:34 ` [RFC PATCH bpf-next v1 07/14] MIPS: eBPF: fix MOD64 insn on R6 ISA Tony Ambardar
2021-07-12  0:34 ` [RFC PATCH bpf-next v1 08/14] MIPS: eBPF: support long jump for BPF_JMP|EXIT Tony Ambardar
2021-07-12  0:34 ` [RFC PATCH bpf-next v1 09/14] MIPS: eBPF: drop src_reg restriction in BPF_LD|BPF_DW|BPF_IMM Tony Ambardar
2021-07-12  0:34 ` [RFC PATCH bpf-next v1 10/14] MIPS: eBPF: improve and clarify enum 'which_ebpf_reg' Tony Ambardar
2021-07-12  0:34 ` [RFC PATCH bpf-next v1 11/14] MIPS: eBPF: add core support for 32/64-bit systems Tony Ambardar
2021-07-12  0:34 ` [RFC PATCH bpf-next v1 13/14] MIPS: uasm: Enable muhu opcode for MIPS R6 Tony Ambardar
2021-07-12  0:35 ` [RFC PATCH bpf-next v1 14/14] MIPS: eBPF: add MIPS32 JIT Tony Ambardar
2021-07-20  1:25 ` [RFC PATCH bpf-next v1 00/14] MIPS: eBPF: refactor code, " Johan Almbladh
2021-07-20 17:47   ` Alexei Starovoitov
2021-10-05  8:26 ` [RFC PATCH bpf-next v2 00/16] " Tony Ambardar
2021-10-05  8:26   ` [RFC PATCH bpf-next v2 01/16] MIPS: eBPF: support BPF_TAIL_CALL in JIT static analysis Tony Ambardar
2021-10-05  8:26   ` [RFC PATCH bpf-next v2 02/16] MIPS: eBPF: mask 32-bit index for tail calls Tony Ambardar
2021-10-05  8:26   ` [RFC PATCH bpf-next v2 03/16] MIPS: eBPF: fix BPF_ALU|ARSH handling in JIT static analysis Tony Ambardar
2021-10-05  8:26   ` [RFC PATCH bpf-next v2 04/16] MIPS: eBPF: support BPF_JMP32 " Tony Ambardar
2021-10-05  8:26   ` [RFC PATCH bpf-next v2 05/16] MIPS: eBPF: fix system hang with verifier dead-code patching Tony Ambardar
2021-10-05  8:26   ` [RFC PATCH bpf-next v2 06/16] MIPS: eBPF: fix JIT static analysis hang with bounded loops Tony Ambardar
2021-10-05  8:26   ` [RFC PATCH bpf-next v2 07/16] MIPS: eBPF: fix MOD64 insn on R6 ISA Tony Ambardar
2021-10-05  8:26   ` [RFC PATCH bpf-next v2 08/16] MIPS: eBPF: support long jump for BPF_JMP|EXIT Tony Ambardar
2021-10-05  8:26   ` [RFC PATCH bpf-next v2 09/16] MIPS: eBPF: drop src_reg restriction in BPF_LD|BPF_DW|BPF_IMM Tony Ambardar
2021-10-05  8:26   ` [RFC PATCH bpf-next v2 10/16] MIPS: eBPF: add core support for 32/64-bit systems Tony Ambardar
2021-10-05  8:26   ` [RFC PATCH bpf-next v2 11/16] bpf: allow tailcalls in subprograms for MIPS64/MIPS32 Tony Ambardar
2021-10-05  8:26   ` [RFC PATCH bpf-next v2 13/16] MIPS: eBPF64: support BPF_JMP32 conditionals Tony Ambardar
2021-10-05  8:26   ` [RFC PATCH bpf-next v2 14/16] MIPS: eBPF64: implement all BPF_ATOMIC ops Tony Ambardar
2021-10-05  8:26   ` [RFC PATCH bpf-next v2 15/16] MIPS: uasm: Enable muhu opcode for MIPS R6 Tony Ambardar
2021-10-05  8:27   ` [RFC PATCH bpf-next v2 16/16] MIPS: eBPF: add MIPS32 JIT Tony Ambardar
