* [RFC PATCH 0/3] RV64G eBPF JIT
@ 2019-01-15 8:35 Björn Töpel
From: Björn Töpel @ 2019-01-15 8:35 UTC (permalink / raw)
To: linux-riscv; +Cc: Björn Töpel, daniel, palmer, davidlee, netdev
Hi!
I've been hacking on a RV64G eBPF JIT compiler, and would like some
feedback.
Codewise, it needs some refactoring. Currently there's a bit too much
copy-and-paste going on, and I know some places where I could optimize
the code generation a bit (mostly BPF_K type of instructions, dealing
with immediates).
From a features perspective, two things are missing:
* tail calls
* "far-branches", i.e. conditional branches whose offsets exceed the 13-bit (±4 KiB) branch-immediate range.
The test_bpf.ko (only tested on 4.20!) passes all tests.
I've done all the tests on QEMU (version 3.1.50), so no real hardware.
Some questions/observations:
* I've added "HAVE_EFFICIENT_UNALIGNED_ACCESS" to
arch/riscv/Kconfig. Is this assumption correct?
* emit_imm() just relies on lui, adds and shifts. No fancy xori cost
optimizations like GCC does.
* Suggestions are welcome on how to implement tail calls, given that the
prologue/epilogue has variable size. I will dig into the details of
mips/arm64/x86. :-)
Next steps (prior to a proper patch submission) are cleaning up the
code, adding tail calls, and making sure that bpftool disassembly works
correctly.
All input is welcome. This is my first RISC-V hack, so I'm sure there
are a lot of things to improve!
Thanks,
Björn
Björn Töpel (3):
riscv: set HAVE_EFFICIENT_UNALIGNED_ACCESS
riscv: add build infra for JIT compiler
bpf, riscv: added eBPF JIT for RV64G
arch/riscv/Kconfig | 2 +
arch/riscv/Makefile | 4 +
arch/riscv/net/Makefile | 5 +
arch/riscv/net/bpf_jit_comp.c | 1612 +++++++++++++++++++++++++++++++++
4 files changed, 1623 insertions(+)
create mode 100644 arch/riscv/net/Makefile
create mode 100644 arch/riscv/net/bpf_jit_comp.c
--
2.19.1
* [RFC PATCH 1/3] riscv: set HAVE_EFFICIENT_UNALIGNED_ACCESS
From: Björn Töpel @ 2019-01-15 8:35 UTC (permalink / raw)
To: linux-riscv; +Cc: Björn Töpel, daniel, palmer, davidlee, netdev

Signed-off-by: Björn Töpel <bjorn.topel@gmail.com>
---
 arch/riscv/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index feeeaa60697c..f13220904d7c 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -49,6 +49,7 @@ config RISCV
 	select RISCV_TIMER
 	select GENERIC_IRQ_MULTI_HANDLER
 	select ARCH_HAS_PTE_SPECIAL
+	select HAVE_EFFICIENT_UNALIGNED_ACCESS
 
 config MMU
 	def_bool y
--
2.19.1
* Re: [RFC PATCH 1/3] riscv: set HAVE_EFFICIENT_UNALIGNED_ACCESS
From: Christoph Hellwig @ 2019-01-15 15:39 UTC (permalink / raw)
To: Björn Töpel; +Cc: linux-riscv, palmer, davidlee, daniel, netdev

Hmm, while the RISC-V spec requires misaligned load/store support,
who says they are efficient?  Maybe add a little comment that says
on which cpus they are efficient.
* Re: [RFC PATCH 1/3] riscv: set HAVE_EFFICIENT_UNALIGNED_ACCESS
From: Björn Töpel @ 2019-01-15 16:06 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-riscv, Palmer Dabbelt, davidlee, Daniel Borkmann, Netdev

On Tue, 15 Jan 2019 at 16:39, Christoph Hellwig <hch@infradead.org> wrote:
>
> Hmm, while the RISC-V spec requires misaligned load/store support,
> who says they are efficient?  Maybe add a little comment that says
> on which cpus they are efficient.

Good point! :-) I need to check how other architectures do this.
Enabling it for *all* RV64 is probably not correct.
* Re: [RFC PATCH 1/3] riscv: set HAVE_EFFICIENT_UNALIGNED_ACCESS
From: Palmer Dabbelt @ 2019-01-25 20:21 UTC (permalink / raw)
To: bjorn.topel, Jim Wilson
Cc: Christoph Hellwig, linux-riscv, davidlee, daniel, netdev

On Tue, 15 Jan 2019 08:06:47 PST (-0800), bjorn.topel@gmail.com wrote:
> On Tue, 15 Jan 2019 at 16:39, Christoph Hellwig <hch@infradead.org> wrote:
>>
>> Hmm, while the RISC-V spec requires misaligned load/store support,
>> who says they are efficient?  Maybe add a little comment that says
>> on which cpus they are efficient.
>
> Good point! :-) I need to check how other architectures do this.
> Enabling it for *all* RV64 is probably not correct.

RISC-V mandates that misaligned memory accesses execute correctly in
S-mode, but allows them to be trapped and emulated in M-mode.  As a
result they can be quite slow.  Every microarchitecture I know of traps
misaligned accesses into M-mode, so for now we're probably safe just
unconditionally saying they're slow.

GCC does have a tuning parameter that says "are misaligned accesses
fast?" that we set depending on -mtune, but it doesn't appear to be
exposed as a preprocessor macro.  I think it's probably best to just
expose the tuning parameter as a macro so software that needs to know
this has one standard way of doing it.  Jim, would you be opposed to
something like this?

diff --git a/riscv-c-api.md b/riscv-c-api.md
index 0b0236c38826..a790f5cc23ee 100644
--- a/riscv-c-api.md
+++ b/riscv-c-api.md
@@ -52,6 +52,10 @@ https://creativecommons.org/licenses/by/4.0/.
 * `__riscv_cmodel_medlow`
 * `__riscv_cmodel_medany`
 * `__riscv_cmodel_pic`
+* `__riscv_tune_misaligned_load_cost`: The number of cycles a word-sized
+  misaligned load will take.
+* `__riscv_tune_misaligned_store_cost`: The number of cycles a word-sized
+  misaligned store will take.
 
 ## Function Attributes

Which I think shouldn't be too much of a headache to implement in GCC --
I haven't compiled this yet, though...

diff --git a/gcc/config/riscv/riscv-c.c b/gcc/config/riscv/riscv-c.c
index ca72de74a7b4..fa71a4a22104 100644
--- a/gcc/config/riscv/riscv-c.c
+++ b/gcc/config/riscv/riscv-c.c
@@ -98,4 +98,9 @@ riscv_cpu_cpp_builtins (cpp_reader *pfile)
       builtin_define ("__riscv_cmodel_pic");
       break;
     }
+
+  builtin_define_with_int_value ("__riscv_tune_misaligned_load_cost",
+    riscv_tune_info->slow_unaligned_access ? 1024 : 1);
+  builtin_define_with_int_value ("__riscv_tune_misaligned_store_cost",
+    riscv_tune_info->slow_unaligned_access ? 1024 : 1);
 }
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index a3ab6cec33b4..d58a307d27b4 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -39,4 +39,6 @@ enum riscv_code_model {
 };
 extern enum riscv_code_model riscv_cmodel;
 
+extern struct riscv_tune_info riscv_tune_info;
+
 #endif /* ! GCC_RISCV_OPTS_H */
diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index bf4571d91b8c..671c2ddaaa0f 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -226,7 +226,7 @@ struct riscv_cpu_info {
   const char *name;
 
   /* Tuning parameters for this CPU.  */
-  const struct riscv_tune_info *tune_info;
+  const struct riscv_tune_info *riscv_tune_info;
 };
 
 /* Global variables for machine-dependent things.  */
@@ -243,7 +243,7 @@ unsigned riscv_stack_boundary;
 static int epilogue_cfa_sp_offset;
 
 /* Which tuning parameters to use.  */
-static const struct riscv_tune_info *tune_info;
+const struct riscv_tune_info *riscv_tune_info;
 
 /* Index R is the smallest register class that contains register R.  */
 const enum reg_class riscv_regno_to_class[FIRST_PSEUDO_REGISTER] = {
@@ -1528,7 +1528,7 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
	 instructions it needs.  */
       if ((cost = riscv_address_insns (XEXP (x, 0), mode, true)) > 0)
	{
-	  *total = COSTS_N_INSNS (cost + tune_info->memory_cost);
+	  *total = COSTS_N_INSNS (cost + riscv_tune_info->memory_cost);
	  return true;
	}
       /* Otherwise use the default handling.  */
@@ -1592,7 +1592,7 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
	 mode instead.  */
       mode = GET_MODE (XEXP (x, 0));
       if (float_mode_p)
-	*total = tune_info->fp_add[mode == DFmode];
+	*total = riscv_tune_info->fp_add[mode == DFmode];
       else
	*total = riscv_binary_cost (x, 1, 3);
       return false;
@@ -1601,14 +1601,14 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
     case ORDERED:
       /* (FEQ(A, A) & FEQ(B, B)) compared against 0.  */
       mode = GET_MODE (XEXP (x, 0));
-      *total = tune_info->fp_add[mode == DFmode] + COSTS_N_INSNS (2);
+      *total = riscv_tune_info->fp_add[mode == DFmode] + COSTS_N_INSNS (2);
       return false;
 
     case UNEQ:
     case LTGT:
       /* (FEQ(A, A) & FEQ(B, B)) compared against FEQ(A, B).  */
       mode = GET_MODE (XEXP (x, 0));
-      *total = tune_info->fp_add[mode == DFmode] + COSTS_N_INSNS (3);
+      *total = riscv_tune_info->fp_add[mode == DFmode] + COSTS_N_INSNS (3);
       return false;
 
     case UNGE:
@@ -1617,13 +1617,13 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
     case UNLT:
       /* FLT or FLE, but guarded by an FFLAGS read and write.  */
       mode = GET_MODE (XEXP (x, 0));
-      *total = tune_info->fp_add[mode == DFmode] + COSTS_N_INSNS (4);
+      *total = riscv_tune_info->fp_add[mode == DFmode] + COSTS_N_INSNS (4);
       return false;
 
     case MINUS:
     case PLUS:
       if (float_mode_p)
-	*total = tune_info->fp_add[mode == DFmode];
+	*total = riscv_tune_info->fp_add[mode == DFmode];
       else
	*total = riscv_binary_cost (x, 1, 4);
       return false;
@@ -1633,7 +1633,7 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
	rtx op = XEXP (x, 0);
	if (GET_CODE (op) == FMA && !HONOR_SIGNED_ZEROS (mode))
	  {
-	    *total = (tune_info->fp_mul[mode == DFmode]
+	    *total = (riscv_tune_info->fp_mul[mode == DFmode]
		      + set_src_cost (XEXP (op, 0), mode, speed)
		      + set_src_cost (XEXP (op, 1), mode, speed)
		      + set_src_cost (XEXP (op, 2), mode, speed));
@@ -1642,23 +1642,23 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
	}
 
       if (float_mode_p)
-	*total = tune_info->fp_add[mode == DFmode];
+	*total = riscv_tune_info->fp_add[mode == DFmode];
       else
	*total = COSTS_N_INSNS (GET_MODE_SIZE (mode) > UNITS_PER_WORD ? 4 : 1);
       return false;
 
     case MULT:
       if (float_mode_p)
-	*total = tune_info->fp_mul[mode == DFmode];
+	*total = riscv_tune_info->fp_mul[mode == DFmode];
       else if (!TARGET_MUL)
	/* Estimate the cost of a library call.  */
	*total = COSTS_N_INSNS (speed ? 32 : 6);
       else if (GET_MODE_SIZE (mode) > UNITS_PER_WORD)
-	*total = 3 * tune_info->int_mul[0] + COSTS_N_INSNS (2);
+	*total = 3 * riscv_tune_info->int_mul[0] + COSTS_N_INSNS (2);
       else if (!speed)
	*total = COSTS_N_INSNS (1);
       else
-	*total = tune_info->int_mul[mode == DImode];
+	*total = riscv_tune_info->int_mul[mode == DImode];
       return false;
 
     case DIV:
@@ -1666,7 +1666,7 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
     case MOD:
       if (float_mode_p)
	{
-	  *total = tune_info->fp_div[mode == DFmode];
+	  *total = riscv_tune_info->fp_div[mode == DFmode];
	  return false;
	}
       /* Fall through.  */
@@ -1677,7 +1677,7 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
	/* Estimate the cost of a library call.  */
	*total = COSTS_N_INSNS (speed ? 32 : 6);
       else if (speed)
-	*total = tune_info->int_div[mode == DImode];
+	*total = riscv_tune_info->int_div[mode == DImode];
       else
	*total = COSTS_N_INSNS (1);
       return false;
@@ -1699,11 +1699,11 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
     case FIX:
     case FLOAT_EXTEND:
     case FLOAT_TRUNCATE:
-      *total = tune_info->fp_add[mode == DFmode];
+      *total = riscv_tune_info->fp_add[mode == DFmode];
       return false;
 
     case FMA:
-      *total = (tune_info->fp_mul[mode == DFmode]
+      *total = (riscv_tune_info->fp_mul[mode == DFmode]
		+ set_src_cost (XEXP (x, 0), mode, speed)
		+ set_src_cost (XEXP (x, 1), mode, speed)
		+ set_src_cost (XEXP (x, 2), mode, speed));
@@ -4165,7 +4165,7 @@ riscv_class_max_nregs (reg_class_t rclass, machine_mode mode)
 static int
 riscv_memory_move_cost (machine_mode mode, reg_class_t rclass, bool in)
 {
-  return (tune_info->memory_cost
+  return (riscv_tune_info->memory_cost
	  + memory_move_secondary_cost (mode, rclass, in));
 }
 
@@ -4174,7 +4174,7 @@ riscv_memory_move_cost (machine_mode mode, reg_class_t rclass, bool in)
 static int
 riscv_issue_rate (void)
 {
-  return tune_info->issue_rate;
+  return riscv_tune_info->issue_rate;
 }
 
 /* Implement TARGET_ASM_FILE_START.  */
@@ -4307,22 +4307,22 @@ riscv_option_override (void)
   /* Handle -mtune.  */
   cpu = riscv_parse_cpu (riscv_tune_string ? riscv_tune_string :
			 RISCV_TUNE_STRING_DEFAULT);
-  tune_info = optimize_size ? &optimize_size_tune_info : cpu->tune_info;
+  riscv_tune_info = optimize_size ? &optimize_size_tune_info : cpu->riscv_tune_info;
 
   /* Use -mtune's setting for slow_unaligned_access, even when optimizing
      for size.  For architectures that trap and emulate unaligned accesses,
      the performance cost is too great, even for -Os.  Similarly, if
      -m[no-]strict-align is left unspecified, heed -mtune's advice.  */
-  riscv_slow_unaligned_access_p = (cpu->tune_info->slow_unaligned_access
+  riscv_slow_unaligned_access_p = (cpu->riscv_tune_info->slow_unaligned_access
				   || TARGET_STRICT_ALIGN);
   if ((target_flags_explicit & MASK_STRICT_ALIGN) == 0
-      && cpu->tune_info->slow_unaligned_access)
+      && cpu->riscv_tune_info->slow_unaligned_access)
     target_flags |= MASK_STRICT_ALIGN;
 
   /* If the user hasn't specified a branch cost, use the processor's
      default.  */
   if (riscv_branch_cost == 0)
-    riscv_branch_cost = tune_info->branch_cost;
+    riscv_branch_cost = riscv_tune_info->branch_cost;
 
   /* Function to allocate machine-dependent function status.  */
   init_machine_status = &riscv_init_machine_status;
* Re: [RFC PATCH 1/3] riscv: set HAVE_EFFICIENT_UNALIGNED_ACCESS
From: Jim Wilson @ 2019-01-26 1:33 UTC (permalink / raw)
To: Palmer Dabbelt
Cc: bjorn.topel, Christoph Hellwig, linux-riscv, David Lee, daniel, netdev

On Fri, Jan 25, 2019 at 12:21 PM Palmer Dabbelt <palmer@sifive.com> wrote:
> Jim, would you be opposed to something like this?

This looks OK to me.

> +  builtin_define_with_int_value ("__riscv_tune_misaligned_load_cost",
> +    riscv_tune_info->slow_unaligned_access ? 1024 : 1);
> +  builtin_define_with_int_value ("__riscv_tune_misaligned_store_cost",
> +    riscv_tune_info->slow_unaligned_access ? 1024 : 1);

It would be nice to have a better way to compute these values, maybe
an extra field in the tune structure, but we can always worry about
that later when we need it.

Jim
* Re: [RFC PATCH 1/3] riscv: set HAVE_EFFICIENT_UNALIGNED_ACCESS
From: Palmer Dabbelt @ 2019-01-29 2:43 UTC (permalink / raw)
To: Jim Wilson
Cc: bjorn.topel, Christoph Hellwig, linux-riscv, davidlee, daniel, netdev

On Fri, 25 Jan 2019 17:33:50 PST (-0800), Jim Wilson wrote:
> On Fri, Jan 25, 2019 at 12:21 PM Palmer Dabbelt <palmer@sifive.com> wrote:
>> Jim, would you be opposed to something like this?
>
> This looks OK to me.

OK, thanks.  I'll send some patches around :)

> It would be nice to have a better way to compute these values, maybe
> an extra field in the tune structure, but we can always worry about
> that later when we need it.

I agree.  I just went and designed the external interface first and hid
the ugliness here.  The internal interfaces are easier to change :)
* [RFC PATCH 2/3] riscv: add build infra for JIT compiler
From: Björn Töpel @ 2019-01-15 8:35 UTC (permalink / raw)
To: linux-riscv; +Cc: Björn Töpel, daniel, palmer, davidlee, netdev

Signed-off-by: Björn Töpel <bjorn.topel@gmail.com>
---
 arch/riscv/Kconfig            | 1 +
 arch/riscv/Makefile           | 4 ++++
 arch/riscv/net/Makefile       | 5 +++++
 arch/riscv/net/bpf_jit_comp.c | 4 ++++
 4 files changed, 14 insertions(+)
 create mode 100644 arch/riscv/net/Makefile
 create mode 100644 arch/riscv/net/bpf_jit_comp.c

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index f13220904d7c..3edaa5958262 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -50,6 +50,7 @@ config RISCV
 	select GENERIC_IRQ_MULTI_HANDLER
 	select ARCH_HAS_PTE_SPECIAL
 	select HAVE_EFFICIENT_UNALIGNED_ACCESS
+	select HAVE_EBPF_JIT if 64BIT
 
 config MMU
 	def_bool y
diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
index 4b594f2e4f7e..ad487f3c1d7b 100644
--- a/arch/riscv/Makefile
+++ b/arch/riscv/Makefile
@@ -79,6 +79,10 @@ head-y := arch/riscv/kernel/head.o
 
 core-y += arch/riscv/kernel/ arch/riscv/mm/
 
+ifeq ($(CONFIG_ARCH_RV64I),y)
+core-y += arch/riscv/net/
+endif
+
 libs-y += arch/riscv/lib/
 
 PHONY += vdso_install
diff --git a/arch/riscv/net/Makefile b/arch/riscv/net/Makefile
new file mode 100644
index 000000000000..b0b6ac13edf5
--- /dev/null
+++ b/arch/riscv/net/Makefile
@@ -0,0 +1,5 @@
+#
+# RISCV networking code
+#
+
+obj-$(CONFIG_BPF_JIT) += bpf_jit_comp.o
diff --git a/arch/riscv/net/bpf_jit_comp.c b/arch/riscv/net/bpf_jit_comp.c
new file mode 100644
index 000000000000..7e359d3249ee
--- /dev/null
+++ b/arch/riscv/net/bpf_jit_comp.c
@@ -0,0 +1,4 @@
+struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
+{
+	return prog;
+}
--
2.19.1
* Re: [RFC PATCH 2/3] riscv: add build infra for JIT compiler
From: Christoph Hellwig @ 2019-01-15 15:43 UTC (permalink / raw)
To: Björn Töpel; +Cc: linux-riscv, palmer, davidlee, daniel, netdev

> core-y += arch/riscv/kernel/ arch/riscv/mm/
>
> +ifeq ($(CONFIG_ARCH_RV64I),y)
> +core-y += arch/riscv/net/
> +endif

I think this should be core-$(CONFIG_ARCH_RV64I) to get the same result.
Or even better just core-y given that the Kconfig dependencies should
ensure you can't ever enable CONFIG_BPF_JIT for 32-bit builds.

> new file mode 100644
> index 000000000000..b0b6ac13edf5
> --- /dev/null
> +++ b/arch/riscv/net/Makefile
> @@ -0,0 +1,5 @@
> +#
> +# RISCV networking code
> +#

I don't think this comment adds any value.  In fact it is highly
confusing given that we use bpf for a lot more than networking these
days.

> diff --git a/arch/riscv/net/bpf_jit_comp.c b/arch/riscv/net/bpf_jit_comp.c
> new file mode 100644
> index 000000000000..7e359d3249ee
> --- /dev/null
> +++ b/arch/riscv/net/bpf_jit_comp.c
> @@ -0,0 +1,4 @@
> +struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
> +{
> +	return prog;
> +}

Please don't just add stub files.  This patch should probably be merged
into the one adding the actual implementation.
* Re: [RFC PATCH 2/3] riscv: add build infra for JIT compiler
From: Björn Töpel @ 2019-01-15 16:09 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-riscv, Palmer Dabbelt, davidlee, Daniel Borkmann, Netdev

On Tue, 15 Jan 2019 at 16:43, Christoph Hellwig <hch@infradead.org> wrote:
>
> > core-y += arch/riscv/kernel/ arch/riscv/mm/
> >
> > +ifeq ($(CONFIG_ARCH_RV64I),y)
> > +core-y += arch/riscv/net/
> > +endif
>
> I think this should be core-$(CONFIG_ARCH_RV64I) to get the same result.
> Or even better just core-y given that the Kconfig dependencies should
> ensure you can't ever enable CONFIG_BPF_JIT for 32-bit builds.

Good point! I'll address that!

> > new file mode 100644
> > index 000000000000..b0b6ac13edf5
> > --- /dev/null
> > +++ b/arch/riscv/net/Makefile
> > @@ -0,0 +1,5 @@
> > +#
> > +# RISCV networking code
> > +#
>
> I don't think this comment adds any value.  In fact it is highly
> confusing given that we use bpf for a lot more than networking these
> days.

Yeah, I agree. I'll remove that.

> > diff --git a/arch/riscv/net/bpf_jit_comp.c b/arch/riscv/net/bpf_jit_comp.c
> > new file mode 100644
> > index 000000000000..7e359d3249ee
> > --- /dev/null
> > +++ b/arch/riscv/net/bpf_jit_comp.c
> > @@ -0,0 +1,4 @@
> > +struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
> > +{
> > +	return prog;
> > +}
>
> Please don't just add stub files.  This patch should probably be merged
> into the one adding the actual implementation.

Noted! I'll remove that!

Thanks for taking a look, Christoph!
Björn
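For illustration, Christoph's simpler variant amounts to something like the following fragment (a sketch of the suggestion, assuming the Kconfig dependency `HAVE_EBPF_JIT if 64BIT` already prevents CONFIG_BPF_JIT on 32-bit builds):

```make
# arch/riscv/Makefile (sketch): no ifeq block needed when Kconfig
# gates the JIT; arch/riscv/net/ only builds objects under
# obj-$(CONFIG_BPF_JIT) anyway.
core-y += arch/riscv/net/
```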
* [RFC PATCH 3/3] bpf, riscv: added eBPF JIT for RV64G
From: Björn Töpel @ 2019-01-15 8:35 UTC (permalink / raw)
To: linux-riscv; +Cc: Björn Töpel, daniel, palmer, davidlee, netdev

This commit adds an eBPF JIT for RV64G.

Codewise, it needs some refactoring. Currently there's a bit too much
copy-and-paste going on, and I know some places where I could optimize
the code generation a bit (mostly BPF_K type of instructions, dealing
with immediates).

From a features perspective, two things are missing:

* tail calls
* "far-branches", i.e. conditional branches whose offsets exceed the
  13-bit (±4 KiB) branch-immediate range.

The test_bpf.ko passes all tests.
Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> --- arch/riscv/net/bpf_jit_comp.c | 1608 +++++++++++++++++++++++++++++++++ 1 file changed, 1608 insertions(+) diff --git a/arch/riscv/net/bpf_jit_comp.c b/arch/riscv/net/bpf_jit_comp.c index 7e359d3249ee..562d56eb8d23 100644 --- a/arch/riscv/net/bpf_jit_comp.c +++ b/arch/riscv/net/bpf_jit_comp.c @@ -1,4 +1,1612 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * BPF JIT compiler for RV64G + * + * Copyright(c) 2019 Björn Töpel <bjorn.topel@gmail.com> + * + */ + +#include <linux/bpf.h> +#include <linux/filter.h> +#include <asm/cacheflush.h> + +#define TMP_REG_0 (MAX_BPF_JIT_REG + 0) +#define TMP_REG_1 (MAX_BPF_JIT_REG + 1) +#define TAIL_CALL_REG (MAX_BPF_JIT_REG + 2) + +enum rv_register { + RV_REG_ZERO = 0, /* The constant value 0 */ + RV_REG_RA = 1, /* Return address */ + RV_REG_SP = 2, /* Stack pointer */ + RV_REG_GP = 3, /* Global pointer */ + RV_REG_TP = 4, /* Thread pointer */ + RV_REG_T0 = 5, /* Temporaries */ + RV_REG_T1 = 6, + RV_REG_T2 = 7, + RV_REG_FP = 8, + RV_REG_S1 = 9, /* Saved registers */ + RV_REG_A0 = 10, /* Function argument/return values */ + RV_REG_A1 = 11, /* Function arguments */ + RV_REG_A2 = 12, + RV_REG_A3 = 13, + RV_REG_A4 = 14, + RV_REG_A5 = 15, + RV_REG_A6 = 16, + RV_REG_A7 = 17, + RV_REG_S2 = 18, /* Saved registers */ + RV_REG_S3 = 19, + RV_REG_S4 = 20, + RV_REG_S5 = 21, + RV_REG_S6 = 22, + RV_REG_S7 = 23, + RV_REG_S8 = 24, + RV_REG_S9 = 25, + RV_REG_S10 = 26, + RV_REG_S11 = 27, + RV_REG_T3 = 28, /* Temporaries */ + RV_REG_T4 = 29, + RV_REG_T5 = 30, + RV_REG_T6 = 31, +}; + +struct rv_jit_context { + struct bpf_prog *prog; + u32 *insns; /* RV insns */ + int ninsns; + int epilogue_offset; + int *offset; /* BPF to RV */ + unsigned long seen_reg_bits; + int stack_size; +}; + +struct rv_jit_data { + struct bpf_binary_header *header; + u8 *image; + struct rv_jit_context ctx; +}; + +static u8 bpf_to_rv_reg(int bpf_reg, struct rv_jit_context *ctx) +{ + switch (bpf_reg) { + /* Return value */ + case 
BPF_REG_0: + __set_bit(RV_REG_A5, &ctx->seen_reg_bits); + return RV_REG_A5; + /* Function arguments */ + case BPF_REG_1: + __set_bit(RV_REG_A0, &ctx->seen_reg_bits); + return RV_REG_A0; + case BPF_REG_2: + __set_bit(RV_REG_A1, &ctx->seen_reg_bits); + return RV_REG_A1; + case BPF_REG_3: + __set_bit(RV_REG_A2, &ctx->seen_reg_bits); + return RV_REG_A2; + case BPF_REG_4: + __set_bit(RV_REG_A3, &ctx->seen_reg_bits); + return RV_REG_A3; + case BPF_REG_5: + __set_bit(RV_REG_A4, &ctx->seen_reg_bits); + return RV_REG_A4; + /* Callee saved registers */ + case BPF_REG_6: + __set_bit(RV_REG_S1, &ctx->seen_reg_bits); + return RV_REG_S1; + case BPF_REG_7: + __set_bit(RV_REG_S2, &ctx->seen_reg_bits); + return RV_REG_S2; + case BPF_REG_8: + __set_bit(RV_REG_S3, &ctx->seen_reg_bits); + return RV_REG_S3; + case BPF_REG_9: + __set_bit(RV_REG_S4, &ctx->seen_reg_bits); + return RV_REG_S4; + /* Stack read-only frame pointer to access stack */ + case BPF_REG_FP: + __set_bit(RV_REG_S5, &ctx->seen_reg_bits); + return RV_REG_S5; + /* Temporary register */ + case BPF_REG_AX: + __set_bit(RV_REG_T0, &ctx->seen_reg_bits); + return RV_REG_T0; + /* Tail call counter */ + case TAIL_CALL_REG: + __set_bit(RV_REG_S6, &ctx->seen_reg_bits); + return RV_REG_S6; + default: + return 0; + } +}; + +static void seen_call(struct rv_jit_context *ctx) +{ + __set_bit(RV_REG_RA, &ctx->seen_reg_bits); +} + +static bool seen_reg(int rv_reg, struct rv_jit_context *ctx) +{ + return test_bit(rv_reg, &ctx->seen_reg_bits); +} + +static void emit(const u32 insn, struct rv_jit_context *ctx) +{ + if (ctx->insns) + ctx->insns[ctx->ninsns] = insn; + + ctx->ninsns++; +} + +static u32 rv_r_insn(u8 funct7, u8 rs2, u8 rs1, u8 funct3, u8 rd, u8 opcode) +{ + return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | + (rd << 7) | opcode; +} + +static u32 rv_i_insn(u16 imm11_0, u8 rs1, u8 funct3, u8 rd, u8 opcode) +{ + return (imm11_0 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | + opcode; +} + +static u32 
rv_s_insn(u16 imm11_0, u8 rs2, u8 rs1, u8 funct3, u8 opcode) +{ + u8 imm11_5 = imm11_0 >> 5, imm4_0 = imm11_0 & 0x1f; + + return (imm11_5 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | + (imm4_0 << 7) | opcode; +} + +static u32 rv_sb_insn(u16 imm12_1, u8 rs2, u8 rs1, u8 funct3, u8 opcode) +{ + u8 imm12 = ((imm12_1 & 0x800) >> 5) | ((imm12_1 & 0x3f0) >> 4); + u8 imm4_1 = ((imm12_1 & 0xf) << 1) | ((imm12_1 & 0x400) >> 10); + + return (imm12 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | + (imm4_1 << 7) | opcode; +} + +static u32 rv_u_insn(u32 imm31_12, u8 rd, u8 opcode) +{ + return (imm31_12 << 12) | (rd << 7) | opcode; +} + +static u32 rv_uj_insn(u32 imm20_1, u8 rd, u8 opcode) +{ + u32 imm; + + imm = (imm20_1 & 0x80000) | ((imm20_1 & 0x3ff) << 9) | + ((imm20_1 & 0x400) >> 2) | ((imm20_1 & 0x7f800) >> 11); + + return (imm << 12) | (rd << 7) | opcode; +} + +static u32 rv_amo_insn(u8 funct5, u8 aq, u8 rl, u8 rs2, u8 rs1, + u8 funct3, u8 rd, u8 opcode) +{ + u8 funct7 = (funct5 << 2) | (aq << 1) | rl; + + return rv_r_insn(funct7, rs2, rs1, funct3, rd, opcode); +} + +static u32 rv_addiw(u8 rd, u8 rs1, u16 imm11_0) +{ + return rv_i_insn(imm11_0, rs1, 0, rd, 0x1b); +} + +static u32 rv_addi(u8 rd, u8 rs1, u16 imm11_0) +{ + return rv_i_insn(imm11_0, rs1, 0, rd, 0x13); +} + +static u32 rv_addw(u8 rd, u8 rs1, u8 rs2) +{ + return rv_r_insn(0, rs2, rs1, 0, rd, 0x3b); +} + +static u32 rv_add(u8 rd, u8 rs1, u8 rs2) +{ + return rv_r_insn(0, rs2, rs1, 0, rd, 0x33); +} + +static u32 rv_subw(u8 rd, u8 rs1, u8 rs2) +{ + return rv_r_insn(0x20, rs2, rs1, 0, rd, 0x3b); +} + +static u32 rv_sub(u8 rd, u8 rs1, u8 rs2) +{ + return rv_r_insn(0x20, rs2, rs1, 0, rd, 0x33); +} + +static u32 rv_and(u8 rd, u8 rs1, u8 rs2) +{ + return rv_r_insn(0, rs2, rs1, 7, rd, 0x33); +} + +static u32 rv_or(u8 rd, u8 rs1, u8 rs2) +{ + return rv_r_insn(0, rs2, rs1, 6, rd, 0x33); +} + +static u32 rv_xor(u8 rd, u8 rs1, u8 rs2) +{ + return rv_r_insn(0, rs2, rs1, 4, rd, 0x33); +} + +static u32 rv_mulw(u8 
rd, u8 rs1, u8 rs2) +{ + return rv_r_insn(1, rs2, rs1, 0, rd, 0x3b); +} + +static u32 rv_mul(u8 rd, u8 rs1, u8 rs2) +{ + return rv_r_insn(1, rs2, rs1, 0, rd, 0x33); +} + +static u32 rv_divuw(u8 rd, u8 rs1, u8 rs2) +{ + return rv_r_insn(1, rs2, rs1, 5, rd, 0x3b); +} + +static u32 rv_divu(u8 rd, u8 rs1, u8 rs2) +{ + return rv_r_insn(1, rs2, rs1, 5, rd, 0x33); +} + +static u32 rv_remuw(u8 rd, u8 rs1, u8 rs2) +{ + return rv_r_insn(1, rs2, rs1, 7, rd, 0x3b); +} + +static u32 rv_remu(u8 rd, u8 rs1, u8 rs2) +{ + return rv_r_insn(1, rs2, rs1, 7, rd, 0x33); +} + +static u32 rv_sllw(u8 rd, u8 rs1, u8 rs2) +{ + return rv_r_insn(0, rs2, rs1, 1, rd, 0x3b); +} + +static u32 rv_sll(u8 rd, u8 rs1, u8 rs2) +{ + return rv_r_insn(0, rs2, rs1, 1, rd, 0x33); +} + +static u32 rv_srlw(u8 rd, u8 rs1, u8 rs2) +{ + return rv_r_insn(0, rs2, rs1, 5, rd, 0x3b); +} + +static u32 rv_srl(u8 rd, u8 rs1, u8 rs2) +{ + return rv_r_insn(0, rs2, rs1, 5, rd, 0x33); +} + +static u32 rv_sraw(u8 rd, u8 rs1, u8 rs2) +{ + return rv_r_insn(0x20, rs2, rs1, 5, rd, 0x3b); +} + +static u32 rv_sra(u8 rd, u8 rs1, u8 rs2) +{ + return rv_r_insn(0x20, rs2, rs1, 5, rd, 0x33); +} + +static u32 rv_lui(u8 rd, u32 imm31_12) +{ + return rv_u_insn(imm31_12, rd, 0x37); +} + +static u32 rv_slli(u8 rd, u8 rs1, u16 imm11_0) +{ + return rv_i_insn(imm11_0, rs1, 1, rd, 0x13); +} + +static u32 rv_andi(u8 rd, u8 rs1, u16 imm11_0) +{ + return rv_i_insn(imm11_0, rs1, 7, rd, 0x13); +} + +static u32 rv_ori(u8 rd, u8 rs1, u16 imm11_0) +{ + return rv_i_insn(imm11_0, rs1, 6, rd, 0x13); +} + +static u32 rv_xori(u8 rd, u8 rs1, u16 imm11_0) +{ + return rv_i_insn(imm11_0, rs1, 4, rd, 0x13); +} + +static u32 rv_slliw(u8 rd, u8 rs1, u16 imm11_0) +{ + return rv_i_insn(imm11_0, rs1, 1, rd, 0x1b); +} + +static u32 rv_srliw(u8 rd, u8 rs1, u16 imm11_0) +{ + return rv_i_insn(imm11_0, rs1, 5, rd, 0x1b); +} + +static u32 rv_srli(u8 rd, u8 rs1, u16 imm11_0) +{ + return rv_i_insn(imm11_0, rs1, 5, rd, 0x13); +} + +static u32 rv_sraiw(u8 rd, u8 rs1, u16 
imm11_0) +{ + return rv_i_insn(0x400 | imm11_0, rs1, 5, rd, 0x1b); +} + +static u32 rv_srai(u8 rd, u8 rs1, u16 imm11_0) +{ + return rv_i_insn(0x400 | imm11_0, rs1, 5, rd, 0x13); +} + +#if 0 +static u32 rv_auipc(u8 rd, u32 imm31_12) +{ + return rv_u_insn(imm31_12, rd, 0x17); +} +#endif + +static u32 rv_jal(u8 rd, u32 imm20_1) +{ + return rv_uj_insn(imm20_1, rd, 0x6f); +} + +static u32 rv_jalr(u8 rd, u8 rs1, u16 imm11_0) +{ + return rv_i_insn(imm11_0, rs1, 0, rd, 0x67); +} + +static u32 rv_beq(u8 rs1, u8 rs2, u16 imm12_1) +{ + return rv_sb_insn(imm12_1, rs2, rs1, 0, 0x63); +} + +static u32 rv_bltu(u8 rs1, u8 rs2, u16 imm12_1) +{ + return rv_sb_insn(imm12_1, rs2, rs1, 6, 0x63); +} + +static u32 rv_bgeu(u8 rs1, u8 rs2, u16 imm12_1) +{ + return rv_sb_insn(imm12_1, rs2, rs1, 7, 0x63); +} + +static u32 rv_bne(u8 rs1, u8 rs2, u16 imm12_1) +{ + return rv_sb_insn(imm12_1, rs2, rs1, 1, 0x63); +} + +static u32 rv_blt(u8 rs1, u8 rs2, u16 imm12_1) +{ + return rv_sb_insn(imm12_1, rs2, rs1, 4, 0x63); +} + +static u32 rv_bge(u8 rs1, u8 rs2, u16 imm12_1) +{ + return rv_sb_insn(imm12_1, rs2, rs1, 5, 0x63); +} + +static u32 rv_sb(u8 rs1, u16 imm11_0, u8 rs2) +{ + return rv_s_insn(imm11_0, rs2, rs1, 0, 0x23); +} + +static u32 rv_sh(u8 rs1, u16 imm11_0, u8 rs2) +{ + return rv_s_insn(imm11_0, rs2, rs1, 1, 0x23); +} + +static u32 rv_sw(u8 rs1, u16 imm11_0, u8 rs2) +{ + return rv_s_insn(imm11_0, rs2, rs1, 2, 0x23); +} + +static u32 rv_sd(u8 rs1, u16 imm11_0, u8 rs2) +{ + return rv_s_insn(imm11_0, rs2, rs1, 3, 0x23); +} + +#if 0 +static u32 rv_lb(u8 rd, u16 imm11_0, u8 rs1) +{ + return rv_i_insn(imm11_0, rs1, 0, rd, 0x03); +} +#endif + +static u32 rv_lbu(u8 rd, u16 imm11_0, u8 rs1) +{ + return rv_i_insn(imm11_0, rs1, 4, rd, 0x03); +} + +#if 0 +static u32 rv_lh(u8 rd, u16 imm11_0, u8 rs1) +{ + return rv_i_insn(imm11_0, rs1, 1, rd, 0x03); +} +#endif + +static u32 rv_lhu(u8 rd, u16 imm11_0, u8 rs1) +{ + return rv_i_insn(imm11_0, rs1, 5, rd, 0x03); +} + +#if 0 +static u32 rv_lw(u8 rd, u16 
imm11_0, u8 rs1)
+{
+	return rv_i_insn(imm11_0, rs1, 2, rd, 0x03);
+}
+#endif
+
+static u32 rv_lwu(u8 rd, u16 imm11_0, u8 rs1)
+{
+	return rv_i_insn(imm11_0, rs1, 6, rd, 0x03);
+}
+
+static u32 rv_ld(u8 rd, u16 imm11_0, u8 rs1)
+{
+	return rv_i_insn(imm11_0, rs1, 3, rd, 0x03);
+}
+
+static u32 rv_amoadd_w(u8 rd, u8 rs2, u8 rs1, u8 aq, u8 rl)
+{
+	return rv_amo_insn(0, aq, rl, rs2, rs1, 2, rd, 0x2f);
+}
+
+static u32 rv_amoadd_d(u8 rd, u8 rs2, u8 rs1, u8 aq, u8 rl)
+{
+	return rv_amo_insn(0, aq, rl, rs2, rs1, 3, rd, 0x2f);
+}
+
+static bool is_12b_int(s64 val)
+{
+	return -(1 << 11) <= val && val < (1 << 11);
+}
+
+static bool is_32b_int(s64 val)
+{
+	return -(1L << 31) <= val && val < (1L << 31);
+}
+
+/* jumps */
+static bool is_21b_int(s64 val)
+{
+	return -(1L << 20) <= val && val < (1L << 20);
+}
+
+/* conditional branches */
+static bool is_13b_int(s64 val)
+{
+	return -(1 << 12) <= val && val < (1 << 12);
+}
+
+static void emit_imm(u8 rd, s64 val, struct rv_jit_context *ctx)
+{
+	/* Note that the immediate from the add is sign-extended,
+	 * which means that we need to compensate this by adding 2^12,
+	 * when the 12th bit is set. A simpler way of doing this, and
+	 * getting rid of the check, is to just add 2^11 before the
+	 * shift. The "Loading a 32-Bit constant" example from the
+	 * "Computer Organization and Design, RISC-V edition" book by
+	 * Patterson/Hennessy highlights this fact.
+	 *
+	 * This also means that we need to process LSB to MSB.
+	 */
+	s64 upper = (val + (1 << 11)) >> 12, lower = val & 0xfff;
+	int shift;
+
+	if (is_32b_int(val)) {
+		if (upper)
+			emit(rv_lui(rd, upper), ctx);
+
+		if (!upper) {
+			emit(rv_addi(rd, RV_REG_ZERO, lower), ctx);
+			return;
+		}
+
+		emit(rv_addiw(rd, rd, lower), ctx);
+		return;
+	}
+
+	shift = __ffs(upper);
+	upper >>= shift;
+	shift += 12;
+
+	emit_imm(rd, upper, ctx);
+
+	emit(rv_slli(rd, rd, shift), ctx);
+	if (lower)
+		emit(rv_addi(rd, rd, lower), ctx);
+}
+
+static int rv_offset(int bpf_to, int bpf_from, struct rv_jit_context *ctx)
+{
+	int from = ctx->offset[bpf_from] - 1, to = ctx->offset[bpf_to];
+
+	return (to - from) << 2;
+}
+
+static int epilogue_offset(struct rv_jit_context *ctx)
+{
+	int to = ctx->epilogue_offset, from = ctx->ninsns;
+
+	return (to - from) << 2;
+}
+
+static int emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
+		     bool extra_pass)
+{
+	bool is64 = BPF_CLASS(insn->code) == BPF_ALU64;
+	int rvoff, i = insn - ctx->prog->insnsi;
+	u8 rd, rs, code = insn->code;
+	s16 off = insn->off;
+	s32 imm = insn->imm;
+
+	switch (code) {
+	/* dst = src */
+	case BPF_ALU | BPF_MOV | BPF_X:
+	case BPF_ALU64 | BPF_MOV | BPF_X:
+		rs = bpf_to_rv_reg(insn->src_reg, ctx);
+		rd = bpf_to_rv_reg(insn->dst_reg, ctx);
+		emit(is64 ? rv_addi(rd, rs, 0) : rv_addiw(rd, rs, 0), ctx);
+		if (!is64) {
+			emit(rv_slli(rd, rd, 32), ctx);
+			emit(rv_srli(rd, rd, 32), ctx);
+		}
+		break;
+
+	/* dst = dst OP src */
+	case BPF_ALU | BPF_ADD | BPF_X:
+	case BPF_ALU64 | BPF_ADD | BPF_X:
+		rs = bpf_to_rv_reg(insn->src_reg, ctx);
+		rd = bpf_to_rv_reg(insn->dst_reg, ctx);
+		emit(is64 ? rv_add(rd, rd, rs) : rv_addw(rd, rd, rs), ctx);
+		break;
+	case BPF_ALU | BPF_SUB | BPF_X:
+	case BPF_ALU64 | BPF_SUB | BPF_X:
+		rs = bpf_to_rv_reg(insn->src_reg, ctx);
+		rd = bpf_to_rv_reg(insn->dst_reg, ctx);
+		emit(is64 ?
rv_sub(rd, rd, rs) : rv_subw(rd, rd, rs), ctx); + break; + case BPF_ALU | BPF_AND | BPF_X: + case BPF_ALU64 | BPF_AND | BPF_X: + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(rv_and(rd, rd, rs), ctx); + break; + case BPF_ALU | BPF_OR | BPF_X: + case BPF_ALU64 | BPF_OR | BPF_X: + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(rv_or(rd, rd, rs), ctx); + break; + case BPF_ALU | BPF_XOR | BPF_X: + case BPF_ALU64 | BPF_XOR | BPF_X: + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(rv_xor(rd, rd, rs), ctx); + break; + case BPF_ALU | BPF_MUL | BPF_X: + case BPF_ALU64 | BPF_MUL | BPF_X: + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(is64 ? rv_mul(rd, rd, rs) : rv_mulw(rd, rd, rs), ctx); + if (!is64) { + emit(rv_slli(rd, rd, 32), ctx); + emit(rv_srli(rd, rd, 32), ctx); + } + break; + case BPF_ALU | BPF_DIV | BPF_X: + case BPF_ALU64 | BPF_DIV | BPF_X: + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(is64 ? rv_divu(rd, rd, rs) : rv_divuw(rd, rd, rs), ctx); + if (!is64) { + emit(rv_slli(rd, rd, 32), ctx); + emit(rv_srli(rd, rd, 32), ctx); + } + break; + case BPF_ALU | BPF_MOD | BPF_X: + case BPF_ALU64 | BPF_MOD | BPF_X: + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(is64 ? rv_remu(rd, rd, rs) : rv_remuw(rd, rd, rs), ctx); + if (!is64) { + emit(rv_slli(rd, rd, 32), ctx); + emit(rv_srli(rd, rd, 32), ctx); + } + break; + case BPF_ALU | BPF_LSH | BPF_X: + case BPF_ALU64 | BPF_LSH | BPF_X: + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(is64 ? rv_sll(rd, rd, rs) : rv_sllw(rd, rd, rs), ctx); + break; + case BPF_ALU | BPF_RSH | BPF_X: + case BPF_ALU64 | BPF_RSH | BPF_X: + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(is64 ? 
rv_srl(rd, rd, rs) : rv_srlw(rd, rd, rs), ctx); + break; + case BPF_ALU | BPF_ARSH | BPF_X: + case BPF_ALU64 | BPF_ARSH | BPF_X: + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(is64 ? rv_sra(rd, rd, rs) : rv_sraw(rd, rd, rs), ctx); + break; + + /* dst = -dst */ + case BPF_ALU | BPF_NEG: + case BPF_ALU64 | BPF_NEG: + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(is64 ? + rv_sub(rd, RV_REG_ZERO, rd) : + rv_subw(rd, RV_REG_ZERO, rd), + ctx); + break; + + /* dst = BSWAP##imm(dst) */ + case BPF_ALU | BPF_END | BPF_FROM_LE: + { + int shift = 64 - imm; + + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(rv_slli(rd, rd, shift), ctx); + emit(rv_srli(rd, rd, shift), ctx); + break; + } + case BPF_ALU | BPF_END | BPF_FROM_BE: + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + + emit(rv_addi(RV_REG_T2, RV_REG_ZERO, 0), ctx); + + emit(rv_andi(RV_REG_T1, rd, 0xff), ctx); + emit(rv_add(RV_REG_T2, RV_REG_T2, RV_REG_T1), ctx); + emit(rv_slli(RV_REG_T2, RV_REG_T2, 8), ctx); + emit(rv_srli(rd, rd, 8), ctx); + if (imm == 16) + goto out_be; + + emit(rv_andi(RV_REG_T1, rd, 0xff), ctx); + emit(rv_add(RV_REG_T2, RV_REG_T2, RV_REG_T1), ctx); + emit(rv_slli(RV_REG_T2, RV_REG_T2, 8), ctx); + emit(rv_srli(rd, rd, 8), ctx); + + emit(rv_andi(RV_REG_T1, rd, 0xff), ctx); + emit(rv_add(RV_REG_T2, RV_REG_T2, RV_REG_T1), ctx); + emit(rv_slli(RV_REG_T2, RV_REG_T2, 8), ctx); + emit(rv_srli(rd, rd, 8), ctx); + if (imm == 32) + goto out_be; + + emit(rv_andi(RV_REG_T1, rd, 0xff), ctx); + emit(rv_add(RV_REG_T2, RV_REG_T2, RV_REG_T1), ctx); + emit(rv_slli(RV_REG_T2, RV_REG_T2, 8), ctx); + emit(rv_srli(rd, rd, 8), ctx); + + emit(rv_andi(RV_REG_T1, rd, 0xff), ctx); + emit(rv_add(RV_REG_T2, RV_REG_T2, RV_REG_T1), ctx); + emit(rv_slli(RV_REG_T2, RV_REG_T2, 8), ctx); + emit(rv_srli(rd, rd, 8), ctx); + + emit(rv_andi(RV_REG_T1, rd, 0xff), ctx); + emit(rv_add(RV_REG_T2, RV_REG_T2, RV_REG_T1), ctx); + emit(rv_slli(RV_REG_T2, RV_REG_T2, 8), ctx); + emit(rv_srli(rd, rd, 8), ctx); + 
+ emit(rv_andi(RV_REG_T1, rd, 0xff), ctx); + emit(rv_add(RV_REG_T2, RV_REG_T2, RV_REG_T1), ctx); + emit(rv_slli(RV_REG_T2, RV_REG_T2, 8), ctx); + emit(rv_srli(rd, rd, 8), ctx); + out_be: + emit(rv_andi(RV_REG_T1, rd, 0xff), ctx); + emit(rv_add(RV_REG_T2, RV_REG_T2, RV_REG_T1), ctx); + + emit(rv_addi(rd, RV_REG_T2, 0), ctx); + break; + + /* dst = imm */ + case BPF_ALU | BPF_MOV | BPF_K: + case BPF_ALU64 | BPF_MOV | BPF_K: + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit_imm(rd, imm, ctx); + if (!is64) { + emit(rv_slli(rd, rd, 32), ctx); + emit(rv_srli(rd, rd, 32), ctx); + } + break; + + /* dst = dst OP imm */ + case BPF_ALU | BPF_ADD | BPF_K: + case BPF_ALU64 | BPF_ADD | BPF_K: + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + if (is_12b_int(imm)) { + emit(is64 ? rv_addi(rd, rd, imm) : + rv_addiw(rd, rd, imm), ctx); + if (!is64) { + emit(rv_slli(rd, rd, 32), ctx); + emit(rv_srli(rd, rd, 32), ctx); + } + break; + } + emit_imm(RV_REG_T1, imm, ctx); + emit(is64 ? rv_add(rd, rd, RV_REG_T1) : + rv_addw(rd, rd, RV_REG_T1), ctx); + if (!is64) { + emit(rv_slli(rd, rd, 32), ctx); + emit(rv_srli(rd, rd, 32), ctx); + } + break; + case BPF_ALU | BPF_SUB | BPF_K: + case BPF_ALU64 | BPF_SUB | BPF_K: + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + if (is_12b_int(-imm)) { + emit(is64 ? rv_addi(rd, rd, -imm) : + rv_addiw(rd, rd, -imm), ctx); + if (!is64) { + emit(rv_slli(rd, rd, 32), ctx); + emit(rv_srli(rd, rd, 32), ctx); + } + break; + } + emit_imm(RV_REG_T1, imm, ctx); + emit(is64 ? 
rv_sub(rd, rd, RV_REG_T1) : + rv_subw(rd, rd, RV_REG_T1), ctx); + if (!is64) { + emit(rv_slli(rd, rd, 32), ctx); + emit(rv_srli(rd, rd, 32), ctx); + } + break; + case BPF_ALU | BPF_AND | BPF_K: + case BPF_ALU64 | BPF_AND | BPF_K: + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + if (is_12b_int(imm)) { + emit(rv_andi(rd, rd, imm), ctx); + if (!is64) { + emit(rv_slli(rd, rd, 32), ctx); + emit(rv_srli(rd, rd, 32), ctx); + } + break; + } + emit_imm(RV_REG_T1, imm, ctx); + emit(rv_and(rd, rd, RV_REG_T1), ctx); + if (!is64) { + emit(rv_slli(rd, rd, 32), ctx); + emit(rv_srli(rd, rd, 32), ctx); + } + break; + case BPF_ALU | BPF_OR | BPF_K: + case BPF_ALU64 | BPF_OR | BPF_K: + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + if (is_12b_int(imm)) { + emit(rv_ori(rd, rd, imm), ctx); + if (!is64) { + emit(rv_slli(rd, rd, 32), ctx); + emit(rv_srli(rd, rd, 32), ctx); + } + break; + } + emit_imm(RV_REG_T1, imm, ctx); + emit(rv_or(rd, rd, RV_REG_T1), ctx); + if (!is64) { + emit(rv_slli(rd, rd, 32), ctx); + emit(rv_srli(rd, rd, 32), ctx); + } + break; + case BPF_ALU | BPF_XOR | BPF_K: + case BPF_ALU64 | BPF_XOR | BPF_K: + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + if (is_12b_int(imm)) { + emit(rv_xori(rd, rd, imm), ctx); + if (!is64) { + emit(rv_slli(rd, rd, 32), ctx); + emit(rv_srli(rd, rd, 32), ctx); + } + break; + } + emit_imm(RV_REG_T1, imm, ctx); + emit(rv_xor(rd, rd, RV_REG_T1), ctx); + if (!is64) { + emit(rv_slli(rd, rd, 32), ctx); + emit(rv_srli(rd, rd, 32), ctx); + } + break; + case BPF_ALU | BPF_MUL | BPF_K: + case BPF_ALU64 | BPF_MUL | BPF_K: + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit_imm(RV_REG_T1, imm, ctx); + emit(is64 ? rv_mul(rd, rd, RV_REG_T1) : + rv_mulw(rd, rd, RV_REG_T1), ctx); + if (!is64) { + emit(rv_slli(rd, rd, 32), ctx); + emit(rv_srli(rd, rd, 32), ctx); + } + break; + case BPF_ALU | BPF_DIV | BPF_K: + case BPF_ALU64 | BPF_DIV | BPF_K: + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit_imm(RV_REG_T1, imm, ctx); + emit(is64 ? 
rv_divu(rd, rd, RV_REG_T1) : + rv_divuw(rd, rd, RV_REG_T1), ctx); + if (!is64) { + emit(rv_slli(rd, rd, 32), ctx); + emit(rv_srli(rd, rd, 32), ctx); + } + break; + case BPF_ALU | BPF_MOD | BPF_K: + case BPF_ALU64 | BPF_MOD | BPF_K: + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit_imm(RV_REG_T1, imm, ctx); + emit(is64 ? rv_remu(rd, rd, RV_REG_T1) : + rv_remuw(rd, rd, RV_REG_T1), ctx); + if (!is64) { + emit(rv_slli(rd, rd, 32), ctx); + emit(rv_srli(rd, rd, 32), ctx); + } + break; + case BPF_ALU | BPF_LSH | BPF_K: + case BPF_ALU64 | BPF_LSH | BPF_K: + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(is64 ? rv_slli(rd, rd, imm) : + rv_slliw(rd, rd, imm), ctx); + break; + case BPF_ALU | BPF_RSH | BPF_K: + case BPF_ALU64 | BPF_RSH | BPF_K: + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(is64 ? rv_srli(rd, rd, imm) : + rv_srliw(rd, rd, imm), ctx); + break; + case BPF_ALU | BPF_ARSH | BPF_K: + case BPF_ALU64 | BPF_ARSH | BPF_K: + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(is64 ? rv_srai(rd, rd, imm) : + rv_sraiw(rd, rd, imm), ctx); + break; + + /* JUMP off */ + case BPF_JMP | BPF_JA: + rvoff = rv_offset(i + off, i, ctx); + if (!is_21b_int(rvoff)) { + pr_err("bpf-jit: %d offset=%d not supported yet!\n", + __LINE__, rvoff); + return -1; + } + + emit(rv_jal(RV_REG_ZERO, rvoff >> 1), ctx); + break; + + /* IF (dst COND src) JUMP off */ + case BPF_JMP | BPF_JEQ | BPF_X: + rvoff = rv_offset(i + off, i, ctx); + if (!is_13b_int(rvoff)) { + pr_err("bpf-jit: %d offset=%d not supported yet!\n", + __LINE__, (int)rvoff); + return -1; + } + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(rv_beq(rd, rs, rvoff >> 1), ctx); + break; + case BPF_JMP | BPF_JGT | BPF_X: + rvoff = rv_offset(i + off, i, ctx); + if (!is_13b_int(rvoff)) { + pr_err("bpf-jit: %d offset=%d not supported yet!\n", + __LINE__, (int)rvoff); + return -1; + } + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(rv_bltu(rs, rd, rvoff >> 1), 
ctx); + break; + case BPF_JMP | BPF_JLT | BPF_X: + rvoff = rv_offset(i + off, i, ctx); + if (!is_13b_int(rvoff)) { + pr_err("bpf-jit: %d offset=%d not supported yet!\n", + __LINE__, (int)rvoff); + return -1; + } + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(rv_bltu(rd, rs, rvoff >> 1), ctx); + break; + case BPF_JMP | BPF_JGE | BPF_X: + rvoff = rv_offset(i + off, i, ctx); + if (!is_13b_int(rvoff)) { + pr_err("bpf-jit: %d offset=%d not supported yet!\n", + __LINE__, (int)rvoff); + return -1; + } + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(rv_bgeu(rd, rs, rvoff >> 1), ctx); + break; + case BPF_JMP | BPF_JLE | BPF_X: + rvoff = rv_offset(i + off, i, ctx); + if (!is_13b_int(rvoff)) { + pr_err("bpf-jit: %d offset=%d not supported yet!\n", + __LINE__, (int)rvoff); + return -1; + } + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(rv_bgeu(rs, rd, rvoff >> 1), ctx); + break; + case BPF_JMP | BPF_JNE | BPF_X: + rvoff = rv_offset(i + off, i, ctx); + if (!is_13b_int(rvoff)) { + pr_err("bpf-jit: %d offset=%d not supported yet!\n", + __LINE__, (int)rvoff); + return -1; + } + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(rv_bne(rd, rs, rvoff >> 1), ctx); + break; + case BPF_JMP | BPF_JSGT | BPF_X: + rvoff = rv_offset(i + off, i, ctx); + if (!is_13b_int(rvoff)) { + pr_err("bpf-jit: %d offset=%d not supported yet!\n", + __LINE__, (int)rvoff); + return -1; + } + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(rv_blt(rs, rd, rvoff >> 1), ctx); + break; + case BPF_JMP | BPF_JSLT | BPF_X: + rvoff = rv_offset(i + off, i, ctx); + if (!is_13b_int(rvoff)) { + pr_err("bpf-jit: %d offset=%d not supported yet!\n", + __LINE__, (int)rvoff); + return -1; + } + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(rv_blt(rd, rs, rvoff >> 1), ctx); + 
break; + case BPF_JMP | BPF_JSGE | BPF_X: + rvoff = rv_offset(i + off, i, ctx); + if (!is_13b_int(rvoff)) { + pr_err("bpf-jit: %d offset=%d not supported yet!\n", + __LINE__, (int)rvoff); + return -1; + } + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(rv_bge(rd, rs, rvoff >> 1), ctx); + break; + case BPF_JMP | BPF_JSLE | BPF_X: + rvoff = rv_offset(i + off, i, ctx); + if (!is_13b_int(rvoff)) { + pr_err("bpf-jit: %d offset=%d not supported yet!\n", + __LINE__, (int)rvoff); + return -1; + } + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(rv_bge(rs, rd, rvoff >> 1), ctx); + break; + case BPF_JMP | BPF_JSET | BPF_X: + rvoff = rv_offset(i + off, i, ctx); + if (!is_13b_int(rvoff)) { + pr_err("bpf-jit: %d offset=%d not supported yet!\n", + __LINE__, (int)rvoff); + return -1; + } + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(rv_and(RV_REG_T1, rd, rs), ctx); + emit(rv_bne(RV_REG_T1, RV_REG_ZERO, rvoff >> 1), ctx); + break; + + /* IF (dst COND imm) JUMP off */ + case BPF_JMP | BPF_JEQ | BPF_K: + rvoff = rv_offset(i + off, i, ctx); + if (!is_13b_int(rvoff)) { + pr_err("bpf-jit: %d offset=%d not supported yet!\n", + __LINE__, (int)rvoff); + return -1; + } + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit_imm(RV_REG_T1, imm, ctx); + emit(rv_beq(rd, RV_REG_T1, rvoff >> 1), ctx); + break; + case BPF_JMP | BPF_JGT | BPF_K: + rvoff = rv_offset(i + off, i, ctx); + if (!is_13b_int(rvoff)) { + pr_err("bpf-jit: %d offset=%d not supported yet!\n", + __LINE__, (int)rvoff); + return -1; + } + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit_imm(RV_REG_T1, imm, ctx); + emit(rv_bltu(RV_REG_T1, rd, rvoff >> 1), ctx); + break; + case BPF_JMP | BPF_JLT | BPF_K: + rvoff = rv_offset(i + off, i, ctx); + if (!is_13b_int(rvoff)) { + pr_err("bpf-jit: %d offset=%d not supported yet!\n", + __LINE__, (int)rvoff); + return -1; + } + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + 
emit_imm(RV_REG_T1, imm, ctx); + emit(rv_bltu(rd, RV_REG_T1, rvoff >> 1), ctx); + break; + case BPF_JMP | BPF_JGE | BPF_K: + rvoff = rv_offset(i + off, i, ctx); + if (!is_13b_int(rvoff)) { + pr_err("bpf-jit: %d offset=%d not supported yet!\n", + __LINE__, (int)rvoff); + return -1; + } + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit_imm(RV_REG_T1, imm, ctx); + emit(rv_bgeu(rd, RV_REG_T1, rvoff >> 1), ctx); + break; + case BPF_JMP | BPF_JLE | BPF_K: + rvoff = rv_offset(i + off, i, ctx); + if (!is_13b_int(rvoff)) { + pr_err("bpf-jit: %d offset=%d not supported yet!\n", + __LINE__, (int)rvoff); + return -1; + } + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit_imm(RV_REG_T1, imm, ctx); + emit(rv_bgeu(RV_REG_T1, rd, rvoff >> 1), ctx); + break; + case BPF_JMP | BPF_JNE | BPF_K: + rvoff = rv_offset(i + off, i, ctx); + if (!is_13b_int(rvoff)) { + pr_err("bpf-jit: %d offset=%d not supported yet!\n", + __LINE__, (int)rvoff); + return -1; + } + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit_imm(RV_REG_T1, imm, ctx); + emit(rv_bne(rd, RV_REG_T1, rvoff >> 1), ctx); + break; + case BPF_JMP | BPF_JSGT | BPF_K: + rvoff = rv_offset(i + off, i, ctx); + if (!is_13b_int(rvoff)) { + pr_err("bpf-jit: %d offset=%d not supported yet!\n", + __LINE__, (int)rvoff); + return -1; + } + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit_imm(RV_REG_T1, imm, ctx); + emit(rv_blt(RV_REG_T1, rd, rvoff >> 1), ctx); + break; + case BPF_JMP | BPF_JSLT | BPF_K: + rvoff = rv_offset(i + off, i, ctx); + if (!is_13b_int(rvoff)) { + pr_err("bpf-jit: %d offset=%d not supported yet!\n", + __LINE__, (int)rvoff); + return -1; + } + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit_imm(RV_REG_T1, imm, ctx); + emit(rv_blt(rd, RV_REG_T1, rvoff >> 1), ctx); + break; + case BPF_JMP | BPF_JSGE | BPF_K: + rvoff = rv_offset(i + off, i, ctx); + if (!is_13b_int(rvoff)) { + pr_err("bpf-jit: %d offset=%d not supported yet!\n", + __LINE__, (int)rvoff); + return -1; + } + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit_imm(RV_REG_T1, 
imm, ctx); + emit(rv_bge(rd, RV_REG_T1, rvoff >> 1), ctx); + break; + case BPF_JMP | BPF_JSLE | BPF_K: + rvoff = rv_offset(i + off, i, ctx); + if (!is_13b_int(rvoff)) { + pr_err("bpf-jit: %d offset=%d not supported yet!\n", + __LINE__, (int)rvoff); + return -1; + } + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit_imm(RV_REG_T1, imm, ctx); + emit(rv_bge(RV_REG_T1, rd, rvoff >> 1), ctx); + break; + case BPF_JMP | BPF_JSET | BPF_K: + rvoff = rv_offset(i + off, i, ctx); + if (!is_13b_int(rvoff)) { + pr_err("bpf-jit: %d offset=%d not supported yet!\n", + __LINE__, (int)rvoff); + return -1; + } + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit_imm(RV_REG_T2, imm, ctx); + emit(rv_and(RV_REG_T1, rd, RV_REG_T2), ctx); + emit(rv_bne(RV_REG_T1, RV_REG_ZERO, rvoff >> 1), ctx); + break; + + /* function call */ + case BPF_JMP | BPF_CALL: + { + bool fixed; + int i, ret; + u64 addr; + + seen_call(ctx); + ret = bpf_jit_get_func_addr(ctx->prog, insn, extra_pass, &addr, + &fixed); + if (ret < 0) + return ret; + if (fixed) { + emit_imm(RV_REG_T1, addr, ctx); + } else { + i = ctx->ninsns; + emit_imm(RV_REG_T1, addr, ctx); + for (i = ctx->ninsns - i; i < 8; i++) { + /* nop */ + emit(rv_addi(RV_REG_ZERO, RV_REG_ZERO, 0), + ctx); + } + } + emit(rv_jalr(RV_REG_RA, RV_REG_T1, 0), ctx); + rd = bpf_to_rv_reg(BPF_REG_0, ctx); + emit(rv_addi(rd, RV_REG_A0, 0), ctx); + break; + } + /* tail call */ + case BPF_JMP | BPF_TAIL_CALL: + rd = bpf_to_rv_reg(TAIL_CALL_REG, ctx); + pr_err("bpf-jit: tail call not supported yet!\n"); + return -1; + + /* function return */ + case BPF_JMP | BPF_EXIT: + if (i == ctx->prog->len - 1) + break; + + rvoff = epilogue_offset(ctx); + if (!is_21b_int(rvoff)) { + pr_err("bpf-jit: %d offset=%d not supported yet!\n", + __LINE__, rvoff); + return -1; + } + + emit(rv_jal(RV_REG_ZERO, rvoff >> 1), ctx); + break; + + /* dst = imm64 */ + case BPF_LD | BPF_IMM | BPF_DW: + { + struct bpf_insn insn1 = insn[1]; + u64 imm64; + + imm64 = (u64)insn1.imm << 32 | (u32)imm; + rd = 
bpf_to_rv_reg(insn->dst_reg, ctx); + emit_imm(rd, imm64, ctx); + return 1; + } + + /* LDX: dst = *(size *)(src + off) */ + case BPF_LDX | BPF_MEM | BPF_B: + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + if (is_12b_int(off)) { + emit(rv_lbu(rd, off, rs), ctx); + break; + } + + emit_imm(RV_REG_T1, off, ctx); + emit(rv_add(RV_REG_T1, RV_REG_T1, rs), ctx); + emit(rv_lbu(rd, 0, RV_REG_T1), ctx); + break; + case BPF_LDX | BPF_MEM | BPF_H: + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + if (is_12b_int(off)) { + emit(rv_lhu(rd, off, rs), ctx); + break; + } + + emit_imm(RV_REG_T1, off, ctx); + emit(rv_add(RV_REG_T1, RV_REG_T1, rs), ctx); + emit(rv_lhu(rd, 0, RV_REG_T1), ctx); + break; + case BPF_LDX | BPF_MEM | BPF_W: + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + if (is_12b_int(off)) { + emit(rv_lwu(rd, off, rs), ctx); + break; + } + + emit_imm(RV_REG_T1, off, ctx); + emit(rv_add(RV_REG_T1, RV_REG_T1, rs), ctx); + emit(rv_lwu(rd, 0, RV_REG_T1), ctx); + break; + case BPF_LDX | BPF_MEM | BPF_DW: + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + if (is_12b_int(off)) { + emit(rv_ld(rd, off, rs), ctx); + break; + } + + emit_imm(RV_REG_T1, off, ctx); + emit(rv_add(RV_REG_T1, RV_REG_T1, rs), ctx); + emit(rv_ld(rd, 0, RV_REG_T1), ctx); + break; + + /* ST: *(size *)(dst + off) = imm */ + case BPF_ST | BPF_MEM | BPF_B: + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit_imm(RV_REG_T1, imm, ctx); + if (is_12b_int(off)) { + emit(rv_sb(rd, off, RV_REG_T1), ctx); + break; + } + + emit_imm(RV_REG_T2, off, ctx); + emit(rv_add(RV_REG_T2, RV_REG_T2, rd), ctx); + emit(rv_sb(RV_REG_T2, 0, RV_REG_T1), ctx); + break; + + case BPF_ST | BPF_MEM | BPF_H: + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit_imm(RV_REG_T1, imm, ctx); + if (is_12b_int(off)) { + emit(rv_sh(rd, off, RV_REG_T1), ctx); + break; + } + + emit_imm(RV_REG_T2, off, ctx); + 
emit(rv_add(RV_REG_T2, RV_REG_T2, rd), ctx); + emit(rv_sh(RV_REG_T2, 0, RV_REG_T1), ctx); + break; + case BPF_ST | BPF_MEM | BPF_W: + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit_imm(RV_REG_T1, imm, ctx); + if (is_12b_int(off)) { + emit(rv_sw(rd, off, RV_REG_T1), ctx); + break; + } + + emit_imm(RV_REG_T2, off, ctx); + emit(rv_add(RV_REG_T2, RV_REG_T2, rd), ctx); + emit(rv_sw(RV_REG_T2, 0, RV_REG_T1), ctx); + break; + case BPF_ST | BPF_MEM | BPF_DW: + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit_imm(RV_REG_T1, imm, ctx); + if (is_12b_int(off)) { + emit(rv_sd(rd, off, RV_REG_T1), ctx); + break; + } + + emit_imm(RV_REG_T2, off, ctx); + emit(rv_add(RV_REG_T2, RV_REG_T2, rd), ctx); + emit(rv_sd(RV_REG_T2, 0, RV_REG_T1), ctx); + break; + + /* STX: *(size *)(dst + off) = src */ + case BPF_STX | BPF_MEM | BPF_B: + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + if (is_12b_int(off)) { + emit(rv_sb(rd, off, rs), ctx); + break; + } + + emit_imm(RV_REG_T1, off, ctx); + emit(rv_add(RV_REG_T1, RV_REG_T1, rd), ctx); + emit(rv_sb(RV_REG_T1, 0, rs), ctx); + break; + case BPF_STX | BPF_MEM | BPF_H: + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + if (is_12b_int(off)) { + emit(rv_sh(rd, off, rs), ctx); + break; + } + + emit_imm(RV_REG_T1, off, ctx); + emit(rv_add(RV_REG_T1, RV_REG_T1, rd), ctx); + emit(rv_sh(RV_REG_T1, 0, rs), ctx); + break; + case BPF_STX | BPF_MEM | BPF_W: + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + if (is_12b_int(off)) { + emit(rv_sw(rd, off, rs), ctx); + break; + } + + emit_imm(RV_REG_T1, off, ctx); + emit(rv_add(RV_REG_T1, RV_REG_T1, rd), ctx); + emit(rv_sw(RV_REG_T1, 0, rs), ctx); + break; + case BPF_STX | BPF_MEM | BPF_DW: + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + if (is_12b_int(off)) { + emit(rv_sd(rd, off, rs), ctx); + break; + } + + emit_imm(RV_REG_T1, off, ctx); + emit(rv_add(RV_REG_T1, 
RV_REG_T1, rd), ctx); + emit(rv_sd(RV_REG_T1, 0, rs), ctx); + break; + /* STX XADD: lock *(u32 *)(dst + off) += src */ + case BPF_STX | BPF_XADD | BPF_W: + /* STX XADD: lock *(u64 *)(dst + off) += src */ + case BPF_STX | BPF_XADD | BPF_DW: + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + if (off) { + if (is_12b_int(off)) { + emit(rv_addi(RV_REG_T1, rd, off), ctx); + } else { + emit_imm(RV_REG_T1, off, ctx); + emit(rv_add(RV_REG_T1, RV_REG_T1, rd), ctx); + } + + rd = RV_REG_T1; + } + + emit(BPF_SIZE(code) == BPF_W ? + rv_amoadd_w(RV_REG_ZERO, rs, rd, 0, 0) : + rv_amoadd_d(RV_REG_ZERO, rs, rd, 0, 0), ctx); + break; + default: + pr_err("bpf-jit: unknown opcode %02x\n", code); + return -EINVAL; + } + + return 0; +} + +static void build_prologue(struct rv_jit_context *ctx) +{ + int stack_adjust = 0, store_offset, bpf_stack_adjust; + + if (seen_reg(RV_REG_RA, ctx)) + stack_adjust += 8; + stack_adjust += 8; /* RV_REG_FP */ + if (seen_reg(RV_REG_S1, ctx)) + stack_adjust += 8; + if (seen_reg(RV_REG_S2, ctx)) + stack_adjust += 8; + if (seen_reg(RV_REG_S3, ctx)) + stack_adjust += 8; + if (seen_reg(RV_REG_S4, ctx)) + stack_adjust += 8; + if (seen_reg(RV_REG_S5, ctx)) + stack_adjust += 8; + if (seen_reg(RV_REG_S6, ctx)) + stack_adjust += 8; + + stack_adjust = round_up(stack_adjust, 16); + bpf_stack_adjust = round_up(ctx->prog->aux->stack_depth, 16); + stack_adjust += bpf_stack_adjust; + + store_offset = stack_adjust - 8; + + emit(rv_addi(RV_REG_SP, RV_REG_SP, -stack_adjust), ctx); + + if (seen_reg(RV_REG_RA, ctx)) { + emit(rv_sd(RV_REG_SP, store_offset, RV_REG_RA), ctx); + store_offset -= 8; + } + emit(rv_sd(RV_REG_SP, store_offset, RV_REG_FP), ctx); + store_offset -= 8; + if (seen_reg(RV_REG_S1, ctx)) { + emit(rv_sd(RV_REG_SP, store_offset, RV_REG_S1), ctx); + store_offset -= 8; + } + if (seen_reg(RV_REG_S2, ctx)) { + emit(rv_sd(RV_REG_SP, store_offset, RV_REG_S2), ctx); + store_offset -= 8; + } + if (seen_reg(RV_REG_S3, ctx)) { + 
emit(rv_sd(RV_REG_SP, store_offset, RV_REG_S3), ctx); + store_offset -= 8; + } + if (seen_reg(RV_REG_S4, ctx)) { + emit(rv_sd(RV_REG_SP, store_offset, RV_REG_S4), ctx); + store_offset -= 8; + } + if (seen_reg(RV_REG_S5, ctx)) { + emit(rv_sd(RV_REG_SP, store_offset, RV_REG_S5), ctx); + store_offset -= 8; + } + if (seen_reg(RV_REG_S6, ctx)) { + emit(rv_sd(RV_REG_SP, store_offset, RV_REG_S6), ctx); + store_offset -= 8; + } + + emit(rv_addi(RV_REG_FP, RV_REG_SP, stack_adjust), ctx); + + if (bpf_stack_adjust) { + if (!seen_reg(RV_REG_S5, ctx)) + pr_warn("bpf-jit: not seen BPF_REG_FP, stack is %d\n", + bpf_stack_adjust); + emit(rv_addi(RV_REG_S5, RV_REG_SP, bpf_stack_adjust), ctx); + } + + ctx->stack_size = stack_adjust; +} + +static void build_epilogue(struct rv_jit_context *ctx) +{ + int stack_adjust = ctx->stack_size, store_offset = stack_adjust - 8; + + if (seen_reg(RV_REG_RA, ctx)) { + emit(rv_ld(RV_REG_RA, store_offset, RV_REG_SP), ctx); + store_offset -= 8; + } + emit(rv_ld(RV_REG_FP, store_offset, RV_REG_SP), ctx); + store_offset -= 8; + if (seen_reg(RV_REG_S1, ctx)) { + emit(rv_ld(RV_REG_S1, store_offset, RV_REG_SP), ctx); + store_offset -= 8; + } + if (seen_reg(RV_REG_S2, ctx)) { + emit(rv_ld(RV_REG_S2, store_offset, RV_REG_SP), ctx); + store_offset -= 8; + } + if (seen_reg(RV_REG_S3, ctx)) { + emit(rv_ld(RV_REG_S3, store_offset, RV_REG_SP), ctx); + store_offset -= 8; + } + if (seen_reg(RV_REG_S4, ctx)) { + emit(rv_ld(RV_REG_S4, store_offset, RV_REG_SP), ctx); + store_offset -= 8; + } + if (seen_reg(RV_REG_S5, ctx)) { + emit(rv_ld(RV_REG_S5, store_offset, RV_REG_SP), ctx); + store_offset -= 8; + } + if (seen_reg(RV_REG_S6, ctx)) { + emit(rv_ld(RV_REG_S6, store_offset, RV_REG_SP), ctx); + store_offset -= 8; + } + + emit(rv_addi(RV_REG_SP, RV_REG_SP, stack_adjust), ctx); + /* Set return value. 
*/ + emit(rv_addi(RV_REG_A0, RV_REG_A5, 0), ctx); + emit(rv_jalr(RV_REG_ZERO, RV_REG_RA, 0), ctx); +} + +static int build_body(struct rv_jit_context *ctx, bool extra_pass) +{ + const struct bpf_prog *prog = ctx->prog; + int i; + + for (i = 0; i < prog->len; i++) { + const struct bpf_insn *insn = &prog->insnsi[i]; + int ret; + + ret = emit_insn(insn, ctx, extra_pass); + if (ret > 0) { + i++; + if (ctx->insns == NULL) + ctx->offset[i] = ctx->ninsns; + continue; + } + if (ctx->insns == NULL) + ctx->offset[i] = ctx->ninsns; + if (ret) + return ret; + } + return 0; +} + +static void bpf_fill_ill_insns(void *area, unsigned int size) +{ + memset(area, 0, size); +} + +static void bpf_flush_icache(void *start, void *end) +{ + flush_icache_range((unsigned long)start, (unsigned long)end); +} + struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog) { + bool tmp_blinded = false, extra_pass = false; + struct bpf_prog *tmp, *orig_prog = prog; + struct rv_jit_data *jit_data; + struct rv_jit_context *ctx; + unsigned int image_size; + + if (!prog->jit_requested) + return orig_prog; + + tmp = bpf_jit_blind_constants(prog); + if (IS_ERR(tmp)) + return orig_prog; + if (tmp != prog) { + tmp_blinded = true; + prog = tmp; + } + + jit_data = prog->aux->jit_data; + if (!jit_data) { + jit_data = kzalloc(sizeof(*jit_data), GFP_KERNEL); + if (!jit_data) { + prog = orig_prog; + goto out; + } + prog->aux->jit_data = jit_data; + } + + ctx = &jit_data->ctx; + + if (ctx->offset) { + extra_pass = true; + image_size = sizeof(u32) * ctx->ninsns; + goto skip_init_ctx; + } + + ctx->prog = prog; + ctx->offset = kcalloc(prog->len, sizeof(int), GFP_KERNEL); + if (!ctx->offset) { + prog = orig_prog; + goto out_offset; + } + + /* First pass generates the ctx->offset, but does not emit an image. 
*/ + if (build_body(ctx, extra_pass)) { + prog = orig_prog; + goto out_offset; + } + build_prologue(ctx); + ctx->epilogue_offset = ctx->ninsns; + build_epilogue(ctx); + + /* Allocate image, now that we know the size. */ + image_size = sizeof(u32) * ctx->ninsns; + jit_data->header = bpf_jit_binary_alloc(image_size, &jit_data->image, + sizeof(u32), + bpf_fill_ill_insns); + if (!jit_data->header) { + prog = orig_prog; + goto out_offset; + } + + /* Second, real pass, that actually emits the image. */ + ctx->insns = (u32 *)jit_data->image; +skip_init_ctx: + ctx->ninsns = 0; + + build_prologue(ctx); + if (build_body(ctx, extra_pass)) { + bpf_jit_binary_free(jit_data->header); + prog = orig_prog; + goto out_offset; + } + build_epilogue(ctx); + + if (bpf_jit_enable > 1) + bpf_jit_dump(prog->len, image_size, 2, ctx->insns); + + prog->bpf_func = (void *)ctx->insns; + prog->jited = 1; + prog->jited_len = image_size; + + bpf_flush_icache(jit_data->header, (u8 *)ctx->insns + ctx->ninsns); + + if (!prog->is_func || extra_pass) { +out_offset: + kfree(ctx->offset); + kfree(jit_data); + prog->aux->jit_data = NULL; + } +out: + if (tmp_blinded) + bpf_jit_prog_release_other(prog, prog == orig_prog ? + tmp : orig_prog); return prog; } -- 2.19.1 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* [RFC PATCH 3/3] bpf, riscv: added eBPF JIT for RV64G 2019-01-15 8:35 ` [RFC PATCH 3/3] bpf, riscv: added eBPF JIT for RV64G Björn Töpel @ 2019-01-15 8:35 ` Björn Töpel 2019-01-15 23:49 ` Daniel Borkmann 1 sibling, 0 replies; 23+ messages in thread From: Björn Töpel @ 2019-01-15 8:35 UTC (permalink / raw) To: linux-riscv; +Cc: Björn Töpel, daniel, palmer, davidlee, netdev This commit adds eBPF JIT for RV64G. Codewise, it needs some refactoring. Currently there's a bit too much copy-and-paste going on, and I know some places where I could optimize the code generation a bit (mostly BPF_K type of instructions, dealing with immediates). From a features perspective, two things are missing: * tail calls * "far-branches", i.e. conditional branches that reach beyond 13b. The test_bpf.ko passes all tests. Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> --- arch/riscv/net/bpf_jit_comp.c | 1608 +++++++++++++++++++++++++++++++++ 1 file changed, 1608 insertions(+) diff --git a/arch/riscv/net/bpf_jit_comp.c b/arch/riscv/net/bpf_jit_comp.c index 7e359d3249ee..562d56eb8d23 100644 --- a/arch/riscv/net/bpf_jit_comp.c +++ b/arch/riscv/net/bpf_jit_comp.c @@ -1,4 +1,1612 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * BPF JIT compiler for RV64G + * + * Copyright(c) 2019 Björn Töpel <bjorn.topel@gmail.com> + * + */ + +#include <linux/bpf.h> +#include <linux/filter.h> +#include <asm/cacheflush.h> + +#define TMP_REG_0 (MAX_BPF_JIT_REG + 0) +#define TMP_REG_1 (MAX_BPF_JIT_REG + 1) +#define TAIL_CALL_REG (MAX_BPF_JIT_REG + 2) + +enum rv_register { + RV_REG_ZERO = 0, /* The constant value 0 */ + RV_REG_RA = 1, /* Return address */ + RV_REG_SP = 2, /* Stack pointer */ + RV_REG_GP = 3, /* Global pointer */ + RV_REG_TP = 4, /* Thread pointer */ + RV_REG_T0 = 5, /* Temporaries */ + RV_REG_T1 = 6, + RV_REG_T2 = 7, + RV_REG_FP = 8, + RV_REG_S1 = 9, /* Saved registers */ + RV_REG_A0 = 10, /* Function argument/return values */ + RV_REG_A1 = 11, /* Function arguments */ + RV_REG_A2 = 12, + 
RV_REG_A3 = 13, + RV_REG_A4 = 14, + RV_REG_A5 = 15, + RV_REG_A6 = 16, + RV_REG_A7 = 17, + RV_REG_S2 = 18, /* Saved registers */ + RV_REG_S3 = 19, + RV_REG_S4 = 20, + RV_REG_S5 = 21, + RV_REG_S6 = 22, + RV_REG_S7 = 23, + RV_REG_S8 = 24, + RV_REG_S9 = 25, + RV_REG_S10 = 26, + RV_REG_S11 = 27, + RV_REG_T3 = 28, /* Temporaries */ + RV_REG_T4 = 29, + RV_REG_T5 = 30, + RV_REG_T6 = 31, +}; + +struct rv_jit_context { + struct bpf_prog *prog; + u32 *insns; /* RV insns */ + int ninsns; + int epilogue_offset; + int *offset; /* BPF to RV */ + unsigned long seen_reg_bits; + int stack_size; +}; + +struct rv_jit_data { + struct bpf_binary_header *header; + u8 *image; + struct rv_jit_context ctx; +}; + +static u8 bpf_to_rv_reg(int bpf_reg, struct rv_jit_context *ctx) +{ + switch (bpf_reg) { + /* Return value */ + case BPF_REG_0: + __set_bit(RV_REG_A5, &ctx->seen_reg_bits); + return RV_REG_A5; + /* Function arguments */ + case BPF_REG_1: + __set_bit(RV_REG_A0, &ctx->seen_reg_bits); + return RV_REG_A0; + case BPF_REG_2: + __set_bit(RV_REG_A1, &ctx->seen_reg_bits); + return RV_REG_A1; + case BPF_REG_3: + __set_bit(RV_REG_A2, &ctx->seen_reg_bits); + return RV_REG_A2; + case BPF_REG_4: + __set_bit(RV_REG_A3, &ctx->seen_reg_bits); + return RV_REG_A3; + case BPF_REG_5: + __set_bit(RV_REG_A4, &ctx->seen_reg_bits); + return RV_REG_A4; + /* Callee saved registers */ + case BPF_REG_6: + __set_bit(RV_REG_S1, &ctx->seen_reg_bits); + return RV_REG_S1; + case BPF_REG_7: + __set_bit(RV_REG_S2, &ctx->seen_reg_bits); + return RV_REG_S2; + case BPF_REG_8: + __set_bit(RV_REG_S3, &ctx->seen_reg_bits); + return RV_REG_S3; + case BPF_REG_9: + __set_bit(RV_REG_S4, &ctx->seen_reg_bits); + return RV_REG_S4; + /* Stack read-only frame pointer to access stack */ + case BPF_REG_FP: + __set_bit(RV_REG_S5, &ctx->seen_reg_bits); + return RV_REG_S5; + /* Temporary register */ + case BPF_REG_AX: + __set_bit(RV_REG_T0, &ctx->seen_reg_bits); + return RV_REG_T0; + /* Tail call counter */ + case TAIL_CALL_REG: + 
__set_bit(RV_REG_S6, &ctx->seen_reg_bits); + return RV_REG_S6; + default: + return 0; + } +}; + +static void seen_call(struct rv_jit_context *ctx) +{ + __set_bit(RV_REG_RA, &ctx->seen_reg_bits); +} + +static bool seen_reg(int rv_reg, struct rv_jit_context *ctx) +{ + return test_bit(rv_reg, &ctx->seen_reg_bits); +} + +static void emit(const u32 insn, struct rv_jit_context *ctx) +{ + if (ctx->insns) + ctx->insns[ctx->ninsns] = insn; + + ctx->ninsns++; +} + +static u32 rv_r_insn(u8 funct7, u8 rs2, u8 rs1, u8 funct3, u8 rd, u8 opcode) +{ + return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | + (rd << 7) | opcode; +} + +static u32 rv_i_insn(u16 imm11_0, u8 rs1, u8 funct3, u8 rd, u8 opcode) +{ + return (imm11_0 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | + opcode; +} + +static u32 rv_s_insn(u16 imm11_0, u8 rs2, u8 rs1, u8 funct3, u8 opcode) +{ + u8 imm11_5 = imm11_0 >> 5, imm4_0 = imm11_0 & 0x1f; + + return (imm11_5 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | + (imm4_0 << 7) | opcode; +} + +static u32 rv_sb_insn(u16 imm12_1, u8 rs2, u8 rs1, u8 funct3, u8 opcode) +{ + u8 imm12 = ((imm12_1 & 0x800) >> 5) | ((imm12_1 & 0x3f0) >> 4); + u8 imm4_1 = ((imm12_1 & 0xf) << 1) | ((imm12_1 & 0x400) >> 10); + + return (imm12 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | + (imm4_1 << 7) | opcode; +} + +static u32 rv_u_insn(u32 imm31_12, u8 rd, u8 opcode) +{ + return (imm31_12 << 12) | (rd << 7) | opcode; +} + +static u32 rv_uj_insn(u32 imm20_1, u8 rd, u8 opcode) +{ + u32 imm; + + imm = (imm20_1 & 0x80000) | ((imm20_1 & 0x3ff) << 9) | + ((imm20_1 & 0x400) >> 2) | ((imm20_1 & 0x7f800) >> 11); + + return (imm << 12) | (rd << 7) | opcode; +} + +static u32 rv_amo_insn(u8 funct5, u8 aq, u8 rl, u8 rs2, u8 rs1, + u8 funct3, u8 rd, u8 opcode) +{ + u8 funct7 = (funct5 << 2) | (aq << 1) | rl; + + return rv_r_insn(funct7, rs2, rs1, funct3, rd, opcode); +} + +static u32 rv_addiw(u8 rd, u8 rs1, u16 imm11_0) +{ + return rv_i_insn(imm11_0, rs1, 0, rd, 0x1b); 
+} + +static u32 rv_addi(u8 rd, u8 rs1, u16 imm11_0) +{ + return rv_i_insn(imm11_0, rs1, 0, rd, 0x13); +} + +static u32 rv_addw(u8 rd, u8 rs1, u8 rs2) +{ + return rv_r_insn(0, rs2, rs1, 0, rd, 0x3b); +} + +static u32 rv_add(u8 rd, u8 rs1, u8 rs2) +{ + return rv_r_insn(0, rs2, rs1, 0, rd, 0x33); +} + +static u32 rv_subw(u8 rd, u8 rs1, u8 rs2) +{ + return rv_r_insn(0x20, rs2, rs1, 0, rd, 0x3b); +} + +static u32 rv_sub(u8 rd, u8 rs1, u8 rs2) +{ + return rv_r_insn(0x20, rs2, rs1, 0, rd, 0x33); +} + +static u32 rv_and(u8 rd, u8 rs1, u8 rs2) +{ + return rv_r_insn(0, rs2, rs1, 7, rd, 0x33); +} + +static u32 rv_or(u8 rd, u8 rs1, u8 rs2) +{ + return rv_r_insn(0, rs2, rs1, 6, rd, 0x33); +} + +static u32 rv_xor(u8 rd, u8 rs1, u8 rs2) +{ + return rv_r_insn(0, rs2, rs1, 4, rd, 0x33); +} + +static u32 rv_mulw(u8 rd, u8 rs1, u8 rs2) +{ + return rv_r_insn(1, rs2, rs1, 0, rd, 0x3b); +} + +static u32 rv_mul(u8 rd, u8 rs1, u8 rs2) +{ + return rv_r_insn(1, rs2, rs1, 0, rd, 0x33); +} + +static u32 rv_divuw(u8 rd, u8 rs1, u8 rs2) +{ + return rv_r_insn(1, rs2, rs1, 5, rd, 0x3b); +} + +static u32 rv_divu(u8 rd, u8 rs1, u8 rs2) +{ + return rv_r_insn(1, rs2, rs1, 5, rd, 0x33); +} + +static u32 rv_remuw(u8 rd, u8 rs1, u8 rs2) +{ + return rv_r_insn(1, rs2, rs1, 7, rd, 0x3b); +} + +static u32 rv_remu(u8 rd, u8 rs1, u8 rs2) +{ + return rv_r_insn(1, rs2, rs1, 7, rd, 0x33); +} + +static u32 rv_sllw(u8 rd, u8 rs1, u8 rs2) +{ + return rv_r_insn(0, rs2, rs1, 1, rd, 0x3b); +} + +static u32 rv_sll(u8 rd, u8 rs1, u8 rs2) +{ + return rv_r_insn(0, rs2, rs1, 1, rd, 0x33); +} + +static u32 rv_srlw(u8 rd, u8 rs1, u8 rs2) +{ + return rv_r_insn(0, rs2, rs1, 5, rd, 0x3b); +} + +static u32 rv_srl(u8 rd, u8 rs1, u8 rs2) +{ + return rv_r_insn(0, rs2, rs1, 5, rd, 0x33); +} + +static u32 rv_sraw(u8 rd, u8 rs1, u8 rs2) +{ + return rv_r_insn(0x20, rs2, rs1, 5, rd, 0x3b); +} + +static u32 rv_sra(u8 rd, u8 rs1, u8 rs2) +{ + return rv_r_insn(0x20, rs2, rs1, 5, rd, 0x33); +} + +static u32 rv_lui(u8 rd, u32 imm31_12) +{ + 
return rv_u_insn(imm31_12, rd, 0x37); +} + +static u32 rv_slli(u8 rd, u8 rs1, u16 imm11_0) +{ + return rv_i_insn(imm11_0, rs1, 1, rd, 0x13); +} + +static u32 rv_andi(u8 rd, u8 rs1, u16 imm11_0) +{ + return rv_i_insn(imm11_0, rs1, 7, rd, 0x13); +} + +static u32 rv_ori(u8 rd, u8 rs1, u16 imm11_0) +{ + return rv_i_insn(imm11_0, rs1, 6, rd, 0x13); +} + +static u32 rv_xori(u8 rd, u8 rs1, u16 imm11_0) +{ + return rv_i_insn(imm11_0, rs1, 4, rd, 0x13); +} + +static u32 rv_slliw(u8 rd, u8 rs1, u16 imm11_0) +{ + return rv_i_insn(imm11_0, rs1, 1, rd, 0x1b); +} + +static u32 rv_srliw(u8 rd, u8 rs1, u16 imm11_0) +{ + return rv_i_insn(imm11_0, rs1, 5, rd, 0x1b); +} + +static u32 rv_srli(u8 rd, u8 rs1, u16 imm11_0) +{ + return rv_i_insn(imm11_0, rs1, 5, rd, 0x13); +} + +static u32 rv_sraiw(u8 rd, u8 rs1, u16 imm11_0) +{ + return rv_i_insn(0x400 | imm11_0, rs1, 5, rd, 0x1b); +} + +static u32 rv_srai(u8 rd, u8 rs1, u16 imm11_0) +{ + return rv_i_insn(0x400 | imm11_0, rs1, 5, rd, 0x13); +} + +#if 0 +static u32 rv_auipc(u8 rd, u32 imm31_12) +{ + return rv_u_insn(imm31_12, rd, 0x17); +} +#endif + +static u32 rv_jal(u8 rd, u32 imm20_1) +{ + return rv_uj_insn(imm20_1, rd, 0x6f); +} + +static u32 rv_jalr(u8 rd, u8 rs1, u16 imm11_0) +{ + return rv_i_insn(imm11_0, rs1, 0, rd, 0x67); +} + +static u32 rv_beq(u8 rs1, u8 rs2, u16 imm12_1) +{ + return rv_sb_insn(imm12_1, rs2, rs1, 0, 0x63); +} + +static u32 rv_bltu(u8 rs1, u8 rs2, u16 imm12_1) +{ + return rv_sb_insn(imm12_1, rs2, rs1, 6, 0x63); +} + +static u32 rv_bgeu(u8 rs1, u8 rs2, u16 imm12_1) +{ + return rv_sb_insn(imm12_1, rs2, rs1, 7, 0x63); +} + +static u32 rv_bne(u8 rs1, u8 rs2, u16 imm12_1) +{ + return rv_sb_insn(imm12_1, rs2, rs1, 1, 0x63); +} + +static u32 rv_blt(u8 rs1, u8 rs2, u16 imm12_1) +{ + return rv_sb_insn(imm12_1, rs2, rs1, 4, 0x63); +} + +static u32 rv_bge(u8 rs1, u8 rs2, u16 imm12_1) +{ + return rv_sb_insn(imm12_1, rs2, rs1, 5, 0x63); +} + +static u32 rv_sb(u8 rs1, u16 imm11_0, u8 rs2) +{ + return rv_s_insn(imm11_0, rs2, 
rs1, 0, 0x23); +} + +static u32 rv_sh(u8 rs1, u16 imm11_0, u8 rs2) +{ + return rv_s_insn(imm11_0, rs2, rs1, 1, 0x23); +} + +static u32 rv_sw(u8 rs1, u16 imm11_0, u8 rs2) +{ + return rv_s_insn(imm11_0, rs2, rs1, 2, 0x23); +} + +static u32 rv_sd(u8 rs1, u16 imm11_0, u8 rs2) +{ + return rv_s_insn(imm11_0, rs2, rs1, 3, 0x23); +} + +#if 0 +static u32 rv_lb(u8 rd, u16 imm11_0, u8 rs1) +{ + return rv_i_insn(imm11_0, rs1, 0, rd, 0x03); +} +#endif + +static u32 rv_lbu(u8 rd, u16 imm11_0, u8 rs1) +{ + return rv_i_insn(imm11_0, rs1, 4, rd, 0x03); +} + +#if 0 +static u32 rv_lh(u8 rd, u16 imm11_0, u8 rs1) +{ + return rv_i_insn(imm11_0, rs1, 1, rd, 0x03); +} +#endif + +static u32 rv_lhu(u8 rd, u16 imm11_0, u8 rs1) +{ + return rv_i_insn(imm11_0, rs1, 5, rd, 0x03); +} + +#if 0 +static u32 rv_lw(u8 rd, u16 imm11_0, u8 rs1) +{ + return rv_i_insn(imm11_0, rs1, 2, rd, 0x03); +} +#endif + +static u32 rv_lwu(u8 rd, u16 imm11_0, u8 rs1) +{ + return rv_i_insn(imm11_0, rs1, 6, rd, 0x03); +} + +static u32 rv_ld(u8 rd, u16 imm11_0, u8 rs1) +{ + return rv_i_insn(imm11_0, rs1, 3, rd, 0x03); +} + +static u32 rv_amoadd_w(u8 rd, u8 rs2, u8 rs1, u8 aq, u8 rl) +{ + return rv_amo_insn(0, aq, rl, rs2, rs1, 2, rd, 0x2f); +} + +static u32 rv_amoadd_d(u8 rd, u8 rs2, u8 rs1, u8 aq, u8 rl) +{ + return rv_amo_insn(0, aq, rl, rs2, rs1, 3, rd, 0x2f); +} + +static bool is_12b_int(s64 val) +{ + return -(1 << 11) <= val && val < (1 << 11); +} + +static bool is_32b_int(s64 val) +{ + return -(1L << 31) <= val && val < (1L << 31); +} + +/* jumps */ +static bool is_21b_int(s64 val) +{ + return -(1L << 20) <= val && val < (1L << 20); + +} + +/* conditional branches */ +static bool is_13b_int(s64 val) +{ + return -(1 << 12) <= val && val < (1 << 12); +} + +static void emit_imm(u8 rd, s64 val, struct rv_jit_context *ctx) +{ + /* Note that the immediate from the add is sign-extended, + * which means that we need to compensate this by adding 2^12, + * when the 12th bit is set. 
A simpler way of doing this, and + * getting rid of the check, is to just add 2**11 before the + * shift. The "Loading a 32-Bit constant" example from the + * "Computer Organization and Design, RISC-V edition" book by + * Patterson/Hennessy highlights this fact. + * + * This also means that we need to process LSB to MSB. + */ + s64 upper = (val + (1 << 11)) >> 12, lower = val & 0xfff; + int shift; + + if (is_32b_int(val)) { + if (upper) + emit(rv_lui(rd, upper), ctx); + + if (!upper) { + emit(rv_addi(rd, RV_REG_ZERO, lower), ctx); + return; + } + + emit(rv_addiw(rd, rd, lower), ctx); + return; + } + + shift = __ffs(upper); + upper >>= shift; + shift += 12; + + emit_imm(rd, upper, ctx); + + emit(rv_slli(rd, rd, shift), ctx); + if (lower) + emit(rv_addi(rd, rd, lower), ctx); +} + +static int rv_offset(int bpf_to, int bpf_from, struct rv_jit_context *ctx) +{ + int from = ctx->offset[bpf_from] - 1, to = ctx->offset[bpf_to]; + + return (to - from) << 2; +} + +static int epilogue_offset(struct rv_jit_context *ctx) +{ + int to = ctx->epilogue_offset, from = ctx->ninsns; + + return (to - from) << 2; +} + +static int emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx, + bool extra_pass) +{ + bool is64 = BPF_CLASS(insn->code) == BPF_ALU64; + int rvoff, i = insn - ctx->prog->insnsi; + u8 rd, rs, code = insn->code; + s16 off = insn->off; + s32 imm = insn->imm; + + switch (code) { + /* dst = src */ + case BPF_ALU | BPF_MOV | BPF_X: + case BPF_ALU64 | BPF_MOV | BPF_X: + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(is64 ? rv_addi(rd, rs, 0) : rv_addiw(rd, rs, 0), ctx); + if (!is64) { + emit(rv_slli(rd, rd, 32), ctx); + emit(rv_srli(rd, rd, 32), ctx); + } + break; + + /* dst = dst OP src */ + case BPF_ALU | BPF_ADD | BPF_X: + case BPF_ALU64 | BPF_ADD | BPF_X: + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(is64 ? 
rv_add(rd, rd, rs) : rv_addw(rd, rd, rs), ctx); + break; + case BPF_ALU | BPF_SUB | BPF_X: + case BPF_ALU64 | BPF_SUB | BPF_X: + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(is64 ? rv_sub(rd, rd, rs) : rv_subw(rd, rd, rs), ctx); + break; + case BPF_ALU | BPF_AND | BPF_X: + case BPF_ALU64 | BPF_AND | BPF_X: + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(rv_and(rd, rd, rs), ctx); + break; + case BPF_ALU | BPF_OR | BPF_X: + case BPF_ALU64 | BPF_OR | BPF_X: + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(rv_or(rd, rd, rs), ctx); + break; + case BPF_ALU | BPF_XOR | BPF_X: + case BPF_ALU64 | BPF_XOR | BPF_X: + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(rv_xor(rd, rd, rs), ctx); + break; + case BPF_ALU | BPF_MUL | BPF_X: + case BPF_ALU64 | BPF_MUL | BPF_X: + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(is64 ? rv_mul(rd, rd, rs) : rv_mulw(rd, rd, rs), ctx); + if (!is64) { + emit(rv_slli(rd, rd, 32), ctx); + emit(rv_srli(rd, rd, 32), ctx); + } + break; + case BPF_ALU | BPF_DIV | BPF_X: + case BPF_ALU64 | BPF_DIV | BPF_X: + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(is64 ? rv_divu(rd, rd, rs) : rv_divuw(rd, rd, rs), ctx); + if (!is64) { + emit(rv_slli(rd, rd, 32), ctx); + emit(rv_srli(rd, rd, 32), ctx); + } + break; + case BPF_ALU | BPF_MOD | BPF_X: + case BPF_ALU64 | BPF_MOD | BPF_X: + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(is64 ? rv_remu(rd, rd, rs) : rv_remuw(rd, rd, rs), ctx); + if (!is64) { + emit(rv_slli(rd, rd, 32), ctx); + emit(rv_srli(rd, rd, 32), ctx); + } + break; + case BPF_ALU | BPF_LSH | BPF_X: + case BPF_ALU64 | BPF_LSH | BPF_X: + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(is64 ? 
rv_sll(rd, rd, rs) : rv_sllw(rd, rd, rs), ctx); + break; + case BPF_ALU | BPF_RSH | BPF_X: + case BPF_ALU64 | BPF_RSH | BPF_X: + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(is64 ? rv_srl(rd, rd, rs) : rv_srlw(rd, rd, rs), ctx); + break; + case BPF_ALU | BPF_ARSH | BPF_X: + case BPF_ALU64 | BPF_ARSH | BPF_X: + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(is64 ? rv_sra(rd, rd, rs) : rv_sraw(rd, rd, rs), ctx); + break; + + /* dst = -dst */ + case BPF_ALU | BPF_NEG: + case BPF_ALU64 | BPF_NEG: + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(is64 ? + rv_sub(rd, RV_REG_ZERO, rd) : + rv_subw(rd, RV_REG_ZERO, rd), + ctx); + break; + + /* dst = BSWAP##imm(dst) */ + case BPF_ALU | BPF_END | BPF_FROM_LE: + { + int shift = 64 - imm; + + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(rv_slli(rd, rd, shift), ctx); + emit(rv_srli(rd, rd, shift), ctx); + break; + } + case BPF_ALU | BPF_END | BPF_FROM_BE: + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + + emit(rv_addi(RV_REG_T2, RV_REG_ZERO, 0), ctx); + + emit(rv_andi(RV_REG_T1, rd, 0xff), ctx); + emit(rv_add(RV_REG_T2, RV_REG_T2, RV_REG_T1), ctx); + emit(rv_slli(RV_REG_T2, RV_REG_T2, 8), ctx); + emit(rv_srli(rd, rd, 8), ctx); + if (imm == 16) + goto out_be; + + emit(rv_andi(RV_REG_T1, rd, 0xff), ctx); + emit(rv_add(RV_REG_T2, RV_REG_T2, RV_REG_T1), ctx); + emit(rv_slli(RV_REG_T2, RV_REG_T2, 8), ctx); + emit(rv_srli(rd, rd, 8), ctx); + + emit(rv_andi(RV_REG_T1, rd, 0xff), ctx); + emit(rv_add(RV_REG_T2, RV_REG_T2, RV_REG_T1), ctx); + emit(rv_slli(RV_REG_T2, RV_REG_T2, 8), ctx); + emit(rv_srli(rd, rd, 8), ctx); + if (imm == 32) + goto out_be; + + emit(rv_andi(RV_REG_T1, rd, 0xff), ctx); + emit(rv_add(RV_REG_T2, RV_REG_T2, RV_REG_T1), ctx); + emit(rv_slli(RV_REG_T2, RV_REG_T2, 8), ctx); + emit(rv_srli(rd, rd, 8), ctx); + + emit(rv_andi(RV_REG_T1, rd, 0xff), ctx); + emit(rv_add(RV_REG_T2, RV_REG_T2, RV_REG_T1), ctx); + emit(rv_slli(RV_REG_T2, 
RV_REG_T2, 8), ctx); + emit(rv_srli(rd, rd, 8), ctx); + + emit(rv_andi(RV_REG_T1, rd, 0xff), ctx); + emit(rv_add(RV_REG_T2, RV_REG_T2, RV_REG_T1), ctx); + emit(rv_slli(RV_REG_T2, RV_REG_T2, 8), ctx); + emit(rv_srli(rd, rd, 8), ctx); + + emit(rv_andi(RV_REG_T1, rd, 0xff), ctx); + emit(rv_add(RV_REG_T2, RV_REG_T2, RV_REG_T1), ctx); + emit(rv_slli(RV_REG_T2, RV_REG_T2, 8), ctx); + emit(rv_srli(rd, rd, 8), ctx); + out_be: + emit(rv_andi(RV_REG_T1, rd, 0xff), ctx); + emit(rv_add(RV_REG_T2, RV_REG_T2, RV_REG_T1), ctx); + + emit(rv_addi(rd, RV_REG_T2, 0), ctx); + break; + + /* dst = imm */ + case BPF_ALU | BPF_MOV | BPF_K: + case BPF_ALU64 | BPF_MOV | BPF_K: + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit_imm(rd, imm, ctx); + if (!is64) { + emit(rv_slli(rd, rd, 32), ctx); + emit(rv_srli(rd, rd, 32), ctx); + } + break; + + /* dst = dst OP imm */ + case BPF_ALU | BPF_ADD | BPF_K: + case BPF_ALU64 | BPF_ADD | BPF_K: + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + if (is_12b_int(imm)) { + emit(is64 ? rv_addi(rd, rd, imm) : + rv_addiw(rd, rd, imm), ctx); + if (!is64) { + emit(rv_slli(rd, rd, 32), ctx); + emit(rv_srli(rd, rd, 32), ctx); + } + break; + } + emit_imm(RV_REG_T1, imm, ctx); + emit(is64 ? rv_add(rd, rd, RV_REG_T1) : + rv_addw(rd, rd, RV_REG_T1), ctx); + if (!is64) { + emit(rv_slli(rd, rd, 32), ctx); + emit(rv_srli(rd, rd, 32), ctx); + } + break; + case BPF_ALU | BPF_SUB | BPF_K: + case BPF_ALU64 | BPF_SUB | BPF_K: + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + if (is_12b_int(-imm)) { + emit(is64 ? rv_addi(rd, rd, -imm) : + rv_addiw(rd, rd, -imm), ctx); + if (!is64) { + emit(rv_slli(rd, rd, 32), ctx); + emit(rv_srli(rd, rd, 32), ctx); + } + break; + } + emit_imm(RV_REG_T1, imm, ctx); + emit(is64 ? 
rv_sub(rd, rd, RV_REG_T1) : + rv_subw(rd, rd, RV_REG_T1), ctx); + if (!is64) { + emit(rv_slli(rd, rd, 32), ctx); + emit(rv_srli(rd, rd, 32), ctx); + } + break; + case BPF_ALU | BPF_AND | BPF_K: + case BPF_ALU64 | BPF_AND | BPF_K: + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + if (is_12b_int(imm)) { + emit(rv_andi(rd, rd, imm), ctx); + if (!is64) { + emit(rv_slli(rd, rd, 32), ctx); + emit(rv_srli(rd, rd, 32), ctx); + } + break; + } + emit_imm(RV_REG_T1, imm, ctx); + emit(rv_and(rd, rd, RV_REG_T1), ctx); + if (!is64) { + emit(rv_slli(rd, rd, 32), ctx); + emit(rv_srli(rd, rd, 32), ctx); + } + break; + case BPF_ALU | BPF_OR | BPF_K: + case BPF_ALU64 | BPF_OR | BPF_K: + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + if (is_12b_int(imm)) { + emit(rv_ori(rd, rd, imm), ctx); + if (!is64) { + emit(rv_slli(rd, rd, 32), ctx); + emit(rv_srli(rd, rd, 32), ctx); + } + break; + } + emit_imm(RV_REG_T1, imm, ctx); + emit(rv_or(rd, rd, RV_REG_T1), ctx); + if (!is64) { + emit(rv_slli(rd, rd, 32), ctx); + emit(rv_srli(rd, rd, 32), ctx); + } + break; + case BPF_ALU | BPF_XOR | BPF_K: + case BPF_ALU64 | BPF_XOR | BPF_K: + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + if (is_12b_int(imm)) { + emit(rv_xori(rd, rd, imm), ctx); + if (!is64) { + emit(rv_slli(rd, rd, 32), ctx); + emit(rv_srli(rd, rd, 32), ctx); + } + break; + } + emit_imm(RV_REG_T1, imm, ctx); + emit(rv_xor(rd, rd, RV_REG_T1), ctx); + if (!is64) { + emit(rv_slli(rd, rd, 32), ctx); + emit(rv_srli(rd, rd, 32), ctx); + } + break; + case BPF_ALU | BPF_MUL | BPF_K: + case BPF_ALU64 | BPF_MUL | BPF_K: + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit_imm(RV_REG_T1, imm, ctx); + emit(is64 ? rv_mul(rd, rd, RV_REG_T1) : + rv_mulw(rd, rd, RV_REG_T1), ctx); + if (!is64) { + emit(rv_slli(rd, rd, 32), ctx); + emit(rv_srli(rd, rd, 32), ctx); + } + break; + case BPF_ALU | BPF_DIV | BPF_K: + case BPF_ALU64 | BPF_DIV | BPF_K: + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit_imm(RV_REG_T1, imm, ctx); + emit(is64 ? 
rv_divu(rd, rd, RV_REG_T1) : + rv_divuw(rd, rd, RV_REG_T1), ctx); + if (!is64) { + emit(rv_slli(rd, rd, 32), ctx); + emit(rv_srli(rd, rd, 32), ctx); + } + break; + case BPF_ALU | BPF_MOD | BPF_K: + case BPF_ALU64 | BPF_MOD | BPF_K: + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit_imm(RV_REG_T1, imm, ctx); + emit(is64 ? rv_remu(rd, rd, RV_REG_T1) : + rv_remuw(rd, rd, RV_REG_T1), ctx); + if (!is64) { + emit(rv_slli(rd, rd, 32), ctx); + emit(rv_srli(rd, rd, 32), ctx); + } + break; + case BPF_ALU | BPF_LSH | BPF_K: + case BPF_ALU64 | BPF_LSH | BPF_K: + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(is64 ? rv_slli(rd, rd, imm) : + rv_slliw(rd, rd, imm), ctx); + break; + case BPF_ALU | BPF_RSH | BPF_K: + case BPF_ALU64 | BPF_RSH | BPF_K: + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(is64 ? rv_srli(rd, rd, imm) : + rv_srliw(rd, rd, imm), ctx); + break; + case BPF_ALU | BPF_ARSH | BPF_K: + case BPF_ALU64 | BPF_ARSH | BPF_K: + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(is64 ? rv_srai(rd, rd, imm) : + rv_sraiw(rd, rd, imm), ctx); + break; + + /* JUMP off */ + case BPF_JMP | BPF_JA: + rvoff = rv_offset(i + off, i, ctx); + if (!is_21b_int(rvoff)) { + pr_err("bpf-jit: %d offset=%d not supported yet!\n", + __LINE__, rvoff); + return -1; + } + + emit(rv_jal(RV_REG_ZERO, rvoff >> 1), ctx); + break; + + /* IF (dst COND src) JUMP off */ + case BPF_JMP | BPF_JEQ | BPF_X: + rvoff = rv_offset(i + off, i, ctx); + if (!is_13b_int(rvoff)) { + pr_err("bpf-jit: %d offset=%d not supported yet!\n", + __LINE__, (int)rvoff); + return -1; + } + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(rv_beq(rd, rs, rvoff >> 1), ctx); + break; + case BPF_JMP | BPF_JGT | BPF_X: + rvoff = rv_offset(i + off, i, ctx); + if (!is_13b_int(rvoff)) { + pr_err("bpf-jit: %d offset=%d not supported yet!\n", + __LINE__, (int)rvoff); + return -1; + } + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(rv_bltu(rs, rd, rvoff >> 1), 
ctx); + break; + case BPF_JMP | BPF_JLT | BPF_X: + rvoff = rv_offset(i + off, i, ctx); + if (!is_13b_int(rvoff)) { + pr_err("bpf-jit: %d offset=%d not supported yet!\n", + __LINE__, (int)rvoff); + return -1; + } + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(rv_bltu(rd, rs, rvoff >> 1), ctx); + break; + case BPF_JMP | BPF_JGE | BPF_X: + rvoff = rv_offset(i + off, i, ctx); + if (!is_13b_int(rvoff)) { + pr_err("bpf-jit: %d offset=%d not supported yet!\n", + __LINE__, (int)rvoff); + return -1; + } + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(rv_bgeu(rd, rs, rvoff >> 1), ctx); + break; + case BPF_JMP | BPF_JLE | BPF_X: + rvoff = rv_offset(i + off, i, ctx); + if (!is_13b_int(rvoff)) { + pr_err("bpf-jit: %d offset=%d not supported yet!\n", + __LINE__, (int)rvoff); + return -1; + } + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(rv_bgeu(rs, rd, rvoff >> 1), ctx); + break; + case BPF_JMP | BPF_JNE | BPF_X: + rvoff = rv_offset(i + off, i, ctx); + if (!is_13b_int(rvoff)) { + pr_err("bpf-jit: %d offset=%d not supported yet!\n", + __LINE__, (int)rvoff); + return -1; + } + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(rv_bne(rd, rs, rvoff >> 1), ctx); + break; + case BPF_JMP | BPF_JSGT | BPF_X: + rvoff = rv_offset(i + off, i, ctx); + if (!is_13b_int(rvoff)) { + pr_err("bpf-jit: %d offset=%d not supported yet!\n", + __LINE__, (int)rvoff); + return -1; + } + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(rv_blt(rs, rd, rvoff >> 1), ctx); + break; + case BPF_JMP | BPF_JSLT | BPF_X: + rvoff = rv_offset(i + off, i, ctx); + if (!is_13b_int(rvoff)) { + pr_err("bpf-jit: %d offset=%d not supported yet!\n", + __LINE__, (int)rvoff); + return -1; + } + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(rv_blt(rd, rs, rvoff >> 1), ctx); + 
break; + case BPF_JMP | BPF_JSGE | BPF_X: + rvoff = rv_offset(i + off, i, ctx); + if (!is_13b_int(rvoff)) { + pr_err("bpf-jit: %d offset=%d not supported yet!\n", + __LINE__, (int)rvoff); + return -1; + } + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(rv_bge(rd, rs, rvoff >> 1), ctx); + break; + case BPF_JMP | BPF_JSLE | BPF_X: + rvoff = rv_offset(i + off, i, ctx); + if (!is_13b_int(rvoff)) { + pr_err("bpf-jit: %d offset=%d not supported yet!\n", + __LINE__, (int)rvoff); + return -1; + } + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(rv_bge(rs, rd, rvoff >> 1), ctx); + break; + case BPF_JMP | BPF_JSET | BPF_X: + rvoff = rv_offset(i + off, i, ctx); + if (!is_13b_int(rvoff)) { + pr_err("bpf-jit: %d offset=%d not supported yet!\n", + __LINE__, (int)rvoff); + return -1; + } + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit(rv_and(RV_REG_T1, rd, rs), ctx); + emit(rv_bne(RV_REG_T1, RV_REG_ZERO, rvoff >> 1), ctx); + break; + + /* IF (dst COND imm) JUMP off */ + case BPF_JMP | BPF_JEQ | BPF_K: + rvoff = rv_offset(i + off, i, ctx); + if (!is_13b_int(rvoff)) { + pr_err("bpf-jit: %d offset=%d not supported yet!\n", + __LINE__, (int)rvoff); + return -1; + } + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit_imm(RV_REG_T1, imm, ctx); + emit(rv_beq(rd, RV_REG_T1, rvoff >> 1), ctx); + break; + case BPF_JMP | BPF_JGT | BPF_K: + rvoff = rv_offset(i + off, i, ctx); + if (!is_13b_int(rvoff)) { + pr_err("bpf-jit: %d offset=%d not supported yet!\n", + __LINE__, (int)rvoff); + return -1; + } + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit_imm(RV_REG_T1, imm, ctx); + emit(rv_bltu(RV_REG_T1, rd, rvoff >> 1), ctx); + break; + case BPF_JMP | BPF_JLT | BPF_K: + rvoff = rv_offset(i + off, i, ctx); + if (!is_13b_int(rvoff)) { + pr_err("bpf-jit: %d offset=%d not supported yet!\n", + __LINE__, (int)rvoff); + return -1; + } + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + 
emit_imm(RV_REG_T1, imm, ctx); + emit(rv_bltu(rd, RV_REG_T1, rvoff >> 1), ctx); + break; + case BPF_JMP | BPF_JGE | BPF_K: + rvoff = rv_offset(i + off, i, ctx); + if (!is_13b_int(rvoff)) { + pr_err("bpf-jit: %d offset=%d not supported yet!\n", + __LINE__, (int)rvoff); + return -1; + } + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit_imm(RV_REG_T1, imm, ctx); + emit(rv_bgeu(rd, RV_REG_T1, rvoff >> 1), ctx); + break; + case BPF_JMP | BPF_JLE | BPF_K: + rvoff = rv_offset(i + off, i, ctx); + if (!is_13b_int(rvoff)) { + pr_err("bpf-jit: %d offset=%d not supported yet!\n", + __LINE__, (int)rvoff); + return -1; + } + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit_imm(RV_REG_T1, imm, ctx); + emit(rv_bgeu(RV_REG_T1, rd, rvoff >> 1), ctx); + break; + case BPF_JMP | BPF_JNE | BPF_K: + rvoff = rv_offset(i + off, i, ctx); + if (!is_13b_int(rvoff)) { + pr_err("bpf-jit: %d offset=%d not supported yet!\n", + __LINE__, (int)rvoff); + return -1; + } + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit_imm(RV_REG_T1, imm, ctx); + emit(rv_bne(rd, RV_REG_T1, rvoff >> 1), ctx); + break; + case BPF_JMP | BPF_JSGT | BPF_K: + rvoff = rv_offset(i + off, i, ctx); + if (!is_13b_int(rvoff)) { + pr_err("bpf-jit: %d offset=%d not supported yet!\n", + __LINE__, (int)rvoff); + return -1; + } + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit_imm(RV_REG_T1, imm, ctx); + emit(rv_blt(RV_REG_T1, rd, rvoff >> 1), ctx); + break; + case BPF_JMP | BPF_JSLT | BPF_K: + rvoff = rv_offset(i + off, i, ctx); + if (!is_13b_int(rvoff)) { + pr_err("bpf-jit: %d offset=%d not supported yet!\n", + __LINE__, (int)rvoff); + return -1; + } + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit_imm(RV_REG_T1, imm, ctx); + emit(rv_blt(rd, RV_REG_T1, rvoff >> 1), ctx); + break; + case BPF_JMP | BPF_JSGE | BPF_K: + rvoff = rv_offset(i + off, i, ctx); + if (!is_13b_int(rvoff)) { + pr_err("bpf-jit: %d offset=%d not supported yet!\n", + __LINE__, (int)rvoff); + return -1; + } + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit_imm(RV_REG_T1, 
imm, ctx); + emit(rv_bge(rd, RV_REG_T1, rvoff >> 1), ctx); + break; + case BPF_JMP | BPF_JSLE | BPF_K: + rvoff = rv_offset(i + off, i, ctx); + if (!is_13b_int(rvoff)) { + pr_err("bpf-jit: %d offset=%d not supported yet!\n", + __LINE__, (int)rvoff); + return -1; + } + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit_imm(RV_REG_T1, imm, ctx); + emit(rv_bge(RV_REG_T1, rd, rvoff >> 1), ctx); + break; + case BPF_JMP | BPF_JSET | BPF_K: + rvoff = rv_offset(i + off, i, ctx); + if (!is_13b_int(rvoff)) { + pr_err("bpf-jit: %d offset=%d not supported yet!\n", + __LINE__, (int)rvoff); + return -1; + } + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit_imm(RV_REG_T2, imm, ctx); + emit(rv_and(RV_REG_T1, rd, RV_REG_T2), ctx); + emit(rv_bne(RV_REG_T1, RV_REG_ZERO, rvoff >> 1), ctx); + break; + + /* function call */ + case BPF_JMP | BPF_CALL: + { + bool fixed; + int i, ret; + u64 addr; + + seen_call(ctx); + ret = bpf_jit_get_func_addr(ctx->prog, insn, extra_pass, &addr, + &fixed); + if (ret < 0) + return ret; + if (fixed) { + emit_imm(RV_REG_T1, addr, ctx); + } else { + i = ctx->ninsns; + emit_imm(RV_REG_T1, addr, ctx); + for (i = ctx->ninsns - i; i < 8; i++) { + /* nop */ + emit(rv_addi(RV_REG_ZERO, RV_REG_ZERO, 0), + ctx); + } + } + emit(rv_jalr(RV_REG_RA, RV_REG_T1, 0), ctx); + rd = bpf_to_rv_reg(BPF_REG_0, ctx); + emit(rv_addi(rd, RV_REG_A0, 0), ctx); + break; + } + /* tail call */ + case BPF_JMP | BPF_TAIL_CALL: + rd = bpf_to_rv_reg(TAIL_CALL_REG, ctx); + pr_err("bpf-jit: tail call not supported yet!\n"); + return -1; + + /* function return */ + case BPF_JMP | BPF_EXIT: + if (i == ctx->prog->len - 1) + break; + + rvoff = epilogue_offset(ctx); + if (!is_21b_int(rvoff)) { + pr_err("bpf-jit: %d offset=%d not supported yet!\n", + __LINE__, rvoff); + return -1; + } + + emit(rv_jal(RV_REG_ZERO, rvoff >> 1), ctx); + break; + + /* dst = imm64 */ + case BPF_LD | BPF_IMM | BPF_DW: + { + struct bpf_insn insn1 = insn[1]; + u64 imm64; + + imm64 = (u64)insn1.imm << 32 | (u32)imm; + rd = 
bpf_to_rv_reg(insn->dst_reg, ctx); + emit_imm(rd, imm64, ctx); + return 1; + } + + /* LDX: dst = *(size *)(src + off) */ + case BPF_LDX | BPF_MEM | BPF_B: + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + if (is_12b_int(off)) { + emit(rv_lbu(rd, off, rs), ctx); + break; + } + + emit_imm(RV_REG_T1, off, ctx); + emit(rv_add(RV_REG_T1, RV_REG_T1, rs), ctx); + emit(rv_lbu(rd, 0, RV_REG_T1), ctx); + break; + case BPF_LDX | BPF_MEM | BPF_H: + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + if (is_12b_int(off)) { + emit(rv_lhu(rd, off, rs), ctx); + break; + } + + emit_imm(RV_REG_T1, off, ctx); + emit(rv_add(RV_REG_T1, RV_REG_T1, rs), ctx); + emit(rv_lhu(rd, 0, RV_REG_T1), ctx); + break; + case BPF_LDX | BPF_MEM | BPF_W: + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + if (is_12b_int(off)) { + emit(rv_lwu(rd, off, rs), ctx); + break; + } + + emit_imm(RV_REG_T1, off, ctx); + emit(rv_add(RV_REG_T1, RV_REG_T1, rs), ctx); + emit(rv_lwu(rd, 0, RV_REG_T1), ctx); + break; + case BPF_LDX | BPF_MEM | BPF_DW: + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + if (is_12b_int(off)) { + emit(rv_ld(rd, off, rs), ctx); + break; + } + + emit_imm(RV_REG_T1, off, ctx); + emit(rv_add(RV_REG_T1, RV_REG_T1, rs), ctx); + emit(rv_ld(rd, 0, RV_REG_T1), ctx); + break; + + /* ST: *(size *)(dst + off) = imm */ + case BPF_ST | BPF_MEM | BPF_B: + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit_imm(RV_REG_T1, imm, ctx); + if (is_12b_int(off)) { + emit(rv_sb(rd, off, RV_REG_T1), ctx); + break; + } + + emit_imm(RV_REG_T2, off, ctx); + emit(rv_add(RV_REG_T2, RV_REG_T2, rd), ctx); + emit(rv_sb(RV_REG_T2, 0, RV_REG_T1), ctx); + break; + + case BPF_ST | BPF_MEM | BPF_H: + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit_imm(RV_REG_T1, imm, ctx); + if (is_12b_int(off)) { + emit(rv_sh(rd, off, RV_REG_T1), ctx); + break; + } + + emit_imm(RV_REG_T2, off, ctx); + 
emit(rv_add(RV_REG_T2, RV_REG_T2, rd), ctx); + emit(rv_sh(RV_REG_T2, 0, RV_REG_T1), ctx); + break; + case BPF_ST | BPF_MEM | BPF_W: + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit_imm(RV_REG_T1, imm, ctx); + if (is_12b_int(off)) { + emit(rv_sw(rd, off, RV_REG_T1), ctx); + break; + } + + emit_imm(RV_REG_T2, off, ctx); + emit(rv_add(RV_REG_T2, RV_REG_T2, rd), ctx); + emit(rv_sw(RV_REG_T2, 0, RV_REG_T1), ctx); + break; + case BPF_ST | BPF_MEM | BPF_DW: + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + emit_imm(RV_REG_T1, imm, ctx); + if (is_12b_int(off)) { + emit(rv_sd(rd, off, RV_REG_T1), ctx); + break; + } + + emit_imm(RV_REG_T2, off, ctx); + emit(rv_add(RV_REG_T2, RV_REG_T2, rd), ctx); + emit(rv_sd(RV_REG_T2, 0, RV_REG_T1), ctx); + break; + + /* STX: *(size *)(dst + off) = src */ + case BPF_STX | BPF_MEM | BPF_B: + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + if (is_12b_int(off)) { + emit(rv_sb(rd, off, rs), ctx); + break; + } + + emit_imm(RV_REG_T1, off, ctx); + emit(rv_add(RV_REG_T1, RV_REG_T1, rd), ctx); + emit(rv_sb(RV_REG_T1, 0, rs), ctx); + break; + case BPF_STX | BPF_MEM | BPF_H: + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + if (is_12b_int(off)) { + emit(rv_sh(rd, off, rs), ctx); + break; + } + + emit_imm(RV_REG_T1, off, ctx); + emit(rv_add(RV_REG_T1, RV_REG_T1, rd), ctx); + emit(rv_sh(RV_REG_T1, 0, rs), ctx); + break; + case BPF_STX | BPF_MEM | BPF_W: + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + if (is_12b_int(off)) { + emit(rv_sw(rd, off, rs), ctx); + break; + } + + emit_imm(RV_REG_T1, off, ctx); + emit(rv_add(RV_REG_T1, RV_REG_T1, rd), ctx); + emit(rv_sw(RV_REG_T1, 0, rs), ctx); + break; + case BPF_STX | BPF_MEM | BPF_DW: + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + if (is_12b_int(off)) { + emit(rv_sd(rd, off, rs), ctx); + break; + } + + emit_imm(RV_REG_T1, off, ctx); + emit(rv_add(RV_REG_T1, 
RV_REG_T1, rd), ctx); + emit(rv_sd(RV_REG_T1, 0, rs), ctx); + break; + /* STX XADD: lock *(u32 *)(dst + off) += src */ + case BPF_STX | BPF_XADD | BPF_W: + /* STX XADD: lock *(u64 *)(dst + off) += src */ + case BPF_STX | BPF_XADD | BPF_DW: + rs = bpf_to_rv_reg(insn->src_reg, ctx); + rd = bpf_to_rv_reg(insn->dst_reg, ctx); + if (off) { + if (is_12b_int(off)) { + emit(rv_addi(RV_REG_T1, rd, off), ctx); + } else { + emit_imm(RV_REG_T1, off, ctx); + emit(rv_add(RV_REG_T1, RV_REG_T1, rd), ctx); + } + + rd = RV_REG_T1; + } + + emit(BPF_SIZE(code) == BPF_W ? + rv_amoadd_w(RV_REG_ZERO, rs, rd, 0, 0) : + rv_amoadd_d(RV_REG_ZERO, rs, rd, 0, 0), ctx); + break; + default: + pr_err("bpf-jit: unknown opcode %02x\n", code); + return -EINVAL; + } + + return 0; +} + +static void build_prologue(struct rv_jit_context *ctx) +{ + int stack_adjust = 0, store_offset, bpf_stack_adjust; + + if (seen_reg(RV_REG_RA, ctx)) + stack_adjust += 8; + stack_adjust += 8; /* RV_REG_FP */ + if (seen_reg(RV_REG_S1, ctx)) + stack_adjust += 8; + if (seen_reg(RV_REG_S2, ctx)) + stack_adjust += 8; + if (seen_reg(RV_REG_S3, ctx)) + stack_adjust += 8; + if (seen_reg(RV_REG_S4, ctx)) + stack_adjust += 8; + if (seen_reg(RV_REG_S5, ctx)) + stack_adjust += 8; + if (seen_reg(RV_REG_S6, ctx)) + stack_adjust += 8; + + stack_adjust = round_up(stack_adjust, 16); + bpf_stack_adjust = round_up(ctx->prog->aux->stack_depth, 16); + stack_adjust += bpf_stack_adjust; + + store_offset = stack_adjust - 8; + + emit(rv_addi(RV_REG_SP, RV_REG_SP, -stack_adjust), ctx); + + if (seen_reg(RV_REG_RA, ctx)) { + emit(rv_sd(RV_REG_SP, store_offset, RV_REG_RA), ctx); + store_offset -= 8; + } + emit(rv_sd(RV_REG_SP, store_offset, RV_REG_FP), ctx); + store_offset -= 8; + if (seen_reg(RV_REG_S1, ctx)) { + emit(rv_sd(RV_REG_SP, store_offset, RV_REG_S1), ctx); + store_offset -= 8; + } + if (seen_reg(RV_REG_S2, ctx)) { + emit(rv_sd(RV_REG_SP, store_offset, RV_REG_S2), ctx); + store_offset -= 8; + } + if (seen_reg(RV_REG_S3, ctx)) { + 
emit(rv_sd(RV_REG_SP, store_offset, RV_REG_S3), ctx); + store_offset -= 8; + } + if (seen_reg(RV_REG_S4, ctx)) { + emit(rv_sd(RV_REG_SP, store_offset, RV_REG_S4), ctx); + store_offset -= 8; + } + if (seen_reg(RV_REG_S5, ctx)) { + emit(rv_sd(RV_REG_SP, store_offset, RV_REG_S5), ctx); + store_offset -= 8; + } + if (seen_reg(RV_REG_S6, ctx)) { + emit(rv_sd(RV_REG_SP, store_offset, RV_REG_S6), ctx); + store_offset -= 8; + } + + emit(rv_addi(RV_REG_FP, RV_REG_SP, stack_adjust), ctx); + + if (bpf_stack_adjust) { + if (!seen_reg(RV_REG_S5, ctx)) + pr_warn("bpf-jit: not seen BPF_REG_FP, stack is %d\n", + bpf_stack_adjust); + emit(rv_addi(RV_REG_S5, RV_REG_SP, bpf_stack_adjust), ctx); + } + + ctx->stack_size = stack_adjust; +} + +static void build_epilogue(struct rv_jit_context *ctx) +{ + int stack_adjust = ctx->stack_size, store_offset = stack_adjust - 8; + + if (seen_reg(RV_REG_RA, ctx)) { + emit(rv_ld(RV_REG_RA, store_offset, RV_REG_SP), ctx); + store_offset -= 8; + } + emit(rv_ld(RV_REG_FP, store_offset, RV_REG_SP), ctx); + store_offset -= 8; + if (seen_reg(RV_REG_S1, ctx)) { + emit(rv_ld(RV_REG_S1, store_offset, RV_REG_SP), ctx); + store_offset -= 8; + } + if (seen_reg(RV_REG_S2, ctx)) { + emit(rv_ld(RV_REG_S2, store_offset, RV_REG_SP), ctx); + store_offset -= 8; + } + if (seen_reg(RV_REG_S3, ctx)) { + emit(rv_ld(RV_REG_S3, store_offset, RV_REG_SP), ctx); + store_offset -= 8; + } + if (seen_reg(RV_REG_S4, ctx)) { + emit(rv_ld(RV_REG_S4, store_offset, RV_REG_SP), ctx); + store_offset -= 8; + } + if (seen_reg(RV_REG_S5, ctx)) { + emit(rv_ld(RV_REG_S5, store_offset, RV_REG_SP), ctx); + store_offset -= 8; + } + if (seen_reg(RV_REG_S6, ctx)) { + emit(rv_ld(RV_REG_S6, store_offset, RV_REG_SP), ctx); + store_offset -= 8; + } + + emit(rv_addi(RV_REG_SP, RV_REG_SP, stack_adjust), ctx); + /* Set return value. 
*/ + emit(rv_addi(RV_REG_A0, RV_REG_A5, 0), ctx); + emit(rv_jalr(RV_REG_ZERO, RV_REG_RA, 0), ctx); +} + +static int build_body(struct rv_jit_context *ctx, bool extra_pass) +{ + const struct bpf_prog *prog = ctx->prog; + int i; + + for (i = 0; i < prog->len; i++) { + const struct bpf_insn *insn = &prog->insnsi[i]; + int ret; + + ret = emit_insn(insn, ctx, extra_pass); + if (ret > 0) { + i++; + if (ctx->insns == NULL) + ctx->offset[i] = ctx->ninsns; + continue; + } + if (ctx->insns == NULL) + ctx->offset[i] = ctx->ninsns; + if (ret) + return ret; + } + return 0; +} + +static void bpf_fill_ill_insns(void *area, unsigned int size) +{ + memset(area, 0, size); +} + +static void bpf_flush_icache(void *start, void *end) +{ + flush_icache_range((unsigned long)start, (unsigned long)end); +} + struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog) { + bool tmp_blinded = false, extra_pass = false; + struct bpf_prog *tmp, *orig_prog = prog; + struct rv_jit_data *jit_data; + struct rv_jit_context *ctx; + unsigned int image_size; + + if (!prog->jit_requested) + return orig_prog; + + tmp = bpf_jit_blind_constants(prog); + if (IS_ERR(tmp)) + return orig_prog; + if (tmp != prog) { + tmp_blinded = true; + prog = tmp; + } + + jit_data = prog->aux->jit_data; + if (!jit_data) { + jit_data = kzalloc(sizeof(*jit_data), GFP_KERNEL); + if (!jit_data) { + prog = orig_prog; + goto out; + } + prog->aux->jit_data = jit_data; + } + + ctx = &jit_data->ctx; + + if (ctx->offset) { + extra_pass = true; + image_size = sizeof(u32) * ctx->ninsns; + goto skip_init_ctx; + } + + ctx->prog = prog; + ctx->offset = kcalloc(prog->len, sizeof(int), GFP_KERNEL); + if (!ctx->offset) { + prog = orig_prog; + goto out_offset; + } + + /* First pass generates the ctx->offset, but does not emit an image. 
*/ + if (build_body(ctx, extra_pass)) { + prog = orig_prog; + goto out_offset; + } + build_prologue(ctx); + ctx->epilogue_offset = ctx->ninsns; + build_epilogue(ctx); + + /* Allocate image, now that we know the size. */ + image_size = sizeof(u32) * ctx->ninsns; + jit_data->header = bpf_jit_binary_alloc(image_size, &jit_data->image, + sizeof(u32), + bpf_fill_ill_insns); + if (!jit_data->header) { + prog = orig_prog; + goto out_offset; + } + + /* Second, real pass, that actually emits the image. */ + ctx->insns = (u32 *)jit_data->image; +skip_init_ctx: + ctx->ninsns = 0; + + build_prologue(ctx); + if (build_body(ctx, extra_pass)) { + bpf_jit_binary_free(jit_data->header); + prog = orig_prog; + goto out_offset; + } + build_epilogue(ctx); + + if (bpf_jit_enable > 1) + bpf_jit_dump(prog->len, image_size, 2, ctx->insns); + + prog->bpf_func = (void *)ctx->insns; + prog->jited = 1; + prog->jited_len = image_size; + + bpf_flush_icache(jit_data->header, ctx->insns + ctx->ninsns); + + if (!prog->is_func || extra_pass) { +out_offset: + kfree(ctx->offset); + kfree(jit_data); + prog->aux->jit_data = NULL; + } +out: + if (tmp_blinded) + bpf_jit_prog_release_other(prog, prog == orig_prog ? + tmp : orig_prog); return prog; } -- 2.19.1
* Re: [RFC PATCH 3/3] bpf, riscv: added eBPF JIT for RV64G 2019-01-15 8:35 ` [RFC PATCH 3/3] bpf, riscv: added eBPF JIT for RV64G Björn Töpel 2019-01-15 8:35 ` Björn Töpel @ 2019-01-15 23:49 ` Daniel Borkmann 2019-01-16 7:23 ` Björn Töpel 1 sibling, 1 reply; 23+ messages in thread From: Daniel Borkmann @ 2019-01-15 23:49 UTC (permalink / raw) To: Björn Töpel, linux-riscv; +Cc: palmer, davidlee, netdev On 01/15/2019 09:35 AM, Björn Töpel wrote: > This commit adds eBPF JIT for RV64G. > > Codewise, it needs some refactoring. Currently there's a bit too much > copy-and-paste going on, and I know some places where I could optimize > the code generation a bit (mostly BPF_K type of instructions, dealing > with immediates). Nice work! :) > From a features perspective, two things are missing: > > * tail calls > * "far-branches", i.e. conditional branches that reach beyond 13b. > > The test_bpf.ko passes all tests. Did you also check test_verifier under jit with/without jit hardening enabled? That one contains lots of runtime tests as well. Probably makes sense to check under CONFIG_BPF_JIT_ALWAYS_ON to see what fails the JIT; the test_verifier also contains various tail call tests targeted at JITs, for example. Nit: please definitely also add a MAINTAINERS entry with at least yourself under BPF JIT section, and update Documentation/sysctl/net.txt with riscv64. 
> Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> > --- > arch/riscv/net/bpf_jit_comp.c | 1608 +++++++++++++++++++++++++++++++++ > 1 file changed, 1608 insertions(+) > > diff --git a/arch/riscv/net/bpf_jit_comp.c b/arch/riscv/net/bpf_jit_comp.c > index 7e359d3249ee..562d56eb8d23 100644 > --- a/arch/riscv/net/bpf_jit_comp.c > +++ b/arch/riscv/net/bpf_jit_comp.c > @@ -1,4 +1,1612 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * BPF JIT compiler for RV64G > + * > + * Copyright(c) 2019 Björn Töpel <bjorn.topel@gmail.com> > + * > + */ > + > +#include <linux/bpf.h> > +#include <linux/filter.h> > +#include <asm/cacheflush.h> > + > +#define TMP_REG_0 (MAX_BPF_JIT_REG + 0) > +#define TMP_REG_1 (MAX_BPF_JIT_REG + 1) Not used? > +#define TAIL_CALL_REG (MAX_BPF_JIT_REG + 2) > + > +enum rv_register { > + RV_REG_ZERO = 0, /* The constant value 0 */ > + RV_REG_RA = 1, /* Return address */ > + RV_REG_SP = 2, /* Stack pointer */ > + RV_REG_GP = 3, /* Global pointer */ > + RV_REG_TP = 4, /* Thread pointer */ > + RV_REG_T0 = 5, /* Temporaries */ > + RV_REG_T1 = 6, > + RV_REG_T2 = 7, > + RV_REG_FP = 8, > + RV_REG_S1 = 9, /* Saved registers */ > + RV_REG_A0 = 10, /* Function argument/return values */ > + RV_REG_A1 = 11, /* Function arguments */ > + RV_REG_A2 = 12, > + RV_REG_A3 = 13, > + RV_REG_A4 = 14, > + RV_REG_A5 = 15, > + RV_REG_A6 = 16, > + RV_REG_A7 = 17, > + RV_REG_S2 = 18, /* Saved registers */ > + RV_REG_S3 = 19, > + RV_REG_S4 = 20, > + RV_REG_S5 = 21, > + RV_REG_S6 = 22, > + RV_REG_S7 = 23, > + RV_REG_S8 = 24, > + RV_REG_S9 = 25, > + RV_REG_S10 = 26, > + RV_REG_S11 = 27, > + RV_REG_T3 = 28, /* Temporaries */ > + RV_REG_T4 = 29, > + RV_REG_T5 = 30, > + RV_REG_T6 = 31, > +}; > + > +struct rv_jit_context { > + struct bpf_prog *prog; > + u32 *insns; /* RV insns */ > + int ninsns; > + int epilogue_offset; > + int *offset; /* BPF to RV */ > + unsigned long seen_reg_bits; > + int stack_size; > +}; > + > +struct rv_jit_data { > + struct bpf_binary_header *header; > + u8 
*image; > + struct rv_jit_context ctx; > +}; > + > +static u8 bpf_to_rv_reg(int bpf_reg, struct rv_jit_context *ctx) > +{ This one can also be simplified by having a simple mapping as in other JITs and then mark __set_bit(<reg>) in the small bpf_to_rv_reg() helper. > + switch (bpf_reg) { > + /* Return value */ > + case BPF_REG_0: > + __set_bit(RV_REG_A5, &ctx->seen_reg_bits); > + return RV_REG_A5; > + /* Function arguments */ > + case BPF_REG_1: > + __set_bit(RV_REG_A0, &ctx->seen_reg_bits); > + return RV_REG_A0; > + case BPF_REG_2: > + __set_bit(RV_REG_A1, &ctx->seen_reg_bits); > + return RV_REG_A1; > + case BPF_REG_3: > + __set_bit(RV_REG_A2, &ctx->seen_reg_bits); > + return RV_REG_A2; > + case BPF_REG_4: > + __set_bit(RV_REG_A3, &ctx->seen_reg_bits); > + return RV_REG_A3; > + case BPF_REG_5: > + __set_bit(RV_REG_A4, &ctx->seen_reg_bits); > + return RV_REG_A4; > + /* Callee saved registers */ > + case BPF_REG_6: > + __set_bit(RV_REG_S1, &ctx->seen_reg_bits); > + return RV_REG_S1; > + case BPF_REG_7: > + __set_bit(RV_REG_S2, &ctx->seen_reg_bits); > + return RV_REG_S2; > + case BPF_REG_8: > + __set_bit(RV_REG_S3, &ctx->seen_reg_bits); > + return RV_REG_S3; > + case BPF_REG_9: > + __set_bit(RV_REG_S4, &ctx->seen_reg_bits); > + return RV_REG_S4; > + /* Stack read-only frame pointer to access stack */ > + case BPF_REG_FP: > + __set_bit(RV_REG_S5, &ctx->seen_reg_bits); > + return RV_REG_S5; > + /* Temporary register */ > + case BPF_REG_AX: > + __set_bit(RV_REG_T0, &ctx->seen_reg_bits); > + return RV_REG_T0; > + /* Tail call counter */ > + case TAIL_CALL_REG: > + __set_bit(RV_REG_S6, &ctx->seen_reg_bits); > + return RV_REG_S6; > + default: > + return 0; > + } > +}; [...] 
> + /* tail call */ > + case BPF_JMP | BPF_TAIL_CALL: > + rd = bpf_to_rv_reg(TAIL_CALL_REG, ctx); > + pr_err("bpf-jit: tail call not supported yet!\n"); > + return -1; There are two options here, either fixed size prologue where you can then jump over it in tail call case, or dynamic one which would make it slower due to reg restore but shrinks image for non-tail calls. > + /* function return */ > + case BPF_JMP | BPF_EXIT: > + if (i == ctx->prog->len - 1) > + break; > + > + rvoff = epilogue_offset(ctx); > + if (!is_21b_int(rvoff)) { > + pr_err("bpf-jit: %d offset=%d not supported yet!\n", > + __LINE__, rvoff); > + return -1; > + } > + > + emit(rv_jal(RV_REG_ZERO, rvoff >> 1), ctx); > + break; > + > + /* dst = imm64 */ > + case BPF_LD | BPF_IMM | BPF_DW: > + { > + struct bpf_insn insn1 = insn[1]; > + u64 imm64; > + [...] > + > +static void build_prologue(struct rv_jit_context *ctx) > +{ > + int stack_adjust = 0, store_offset, bpf_stack_adjust; > + > + if (seen_reg(RV_REG_RA, ctx)) > + stack_adjust += 8; > + stack_adjust += 8; /* RV_REG_FP */ > + if (seen_reg(RV_REG_S1, ctx)) > + stack_adjust += 8; > + if (seen_reg(RV_REG_S2, ctx)) > + stack_adjust += 8; > + if (seen_reg(RV_REG_S3, ctx)) > + stack_adjust += 8; > + if (seen_reg(RV_REG_S4, ctx)) > + stack_adjust += 8; > + if (seen_reg(RV_REG_S5, ctx)) > + stack_adjust += 8; > + if (seen_reg(RV_REG_S6, ctx)) > + stack_adjust += 8; > + > + stack_adjust = round_up(stack_adjust, 16); > + bpf_stack_adjust = round_up(ctx->prog->aux->stack_depth, 16); > + stack_adjust += bpf_stack_adjust; > + > + store_offset = stack_adjust - 8; > + > + emit(rv_addi(RV_REG_SP, RV_REG_SP, -stack_adjust), ctx); > + > + if (seen_reg(RV_REG_RA, ctx)) { > + emit(rv_sd(RV_REG_SP, store_offset, RV_REG_RA), ctx); > + store_offset -= 8; > + } > + emit(rv_sd(RV_REG_SP, store_offset, RV_REG_FP), ctx); > + store_offset -= 8; > + if (seen_reg(RV_REG_S1, ctx)) { > + emit(rv_sd(RV_REG_SP, store_offset, RV_REG_S1), ctx); > + store_offset -= 8; > + } > + if 
(seen_reg(RV_REG_S2, ctx)) { > + emit(rv_sd(RV_REG_SP, store_offset, RV_REG_S2), ctx); > + store_offset -= 8; > + } > + if (seen_reg(RV_REG_S3, ctx)) { > + emit(rv_sd(RV_REG_SP, store_offset, RV_REG_S3), ctx); > + store_offset -= 8; > + } > + if (seen_reg(RV_REG_S4, ctx)) { > + emit(rv_sd(RV_REG_SP, store_offset, RV_REG_S4), ctx); > + store_offset -= 8; > + } > + if (seen_reg(RV_REG_S5, ctx)) { > + emit(rv_sd(RV_REG_SP, store_offset, RV_REG_S5), ctx); > + store_offset -= 8; > + } > + if (seen_reg(RV_REG_S6, ctx)) { > + emit(rv_sd(RV_REG_SP, store_offset, RV_REG_S6), ctx); > + store_offset -= 8; > + } > + > + emit(rv_addi(RV_REG_FP, RV_REG_SP, stack_adjust), ctx); > + > + if (bpf_stack_adjust) { > + if (!seen_reg(RV_REG_S5, ctx)) > + pr_warn("bpf-jit: not seen BPF_REG_FP, stack is %d\n", > + bpf_stack_adjust); > + emit(rv_addi(RV_REG_S5, RV_REG_SP, bpf_stack_adjust), ctx); > + } > + > + ctx->stack_size = stack_adjust; > +} > + > +static void build_epilogue(struct rv_jit_context *ctx) > +{ > + int stack_adjust = ctx->stack_size, store_offset = stack_adjust - 8; > + > + if (seen_reg(RV_REG_RA, ctx)) { > + emit(rv_ld(RV_REG_RA, store_offset, RV_REG_SP), ctx); > + store_offset -= 8; > + } > + emit(rv_ld(RV_REG_FP, store_offset, RV_REG_SP), ctx); > + store_offset -= 8; > + if (seen_reg(RV_REG_S1, ctx)) { > + emit(rv_ld(RV_REG_S1, store_offset, RV_REG_SP), ctx); > + store_offset -= 8; > + } > + if (seen_reg(RV_REG_S2, ctx)) { > + emit(rv_ld(RV_REG_S2, store_offset, RV_REG_SP), ctx); > + store_offset -= 8; > + } > + if (seen_reg(RV_REG_S3, ctx)) { > + emit(rv_ld(RV_REG_S3, store_offset, RV_REG_SP), ctx); > + store_offset -= 8; > + } > + if (seen_reg(RV_REG_S4, ctx)) { > + emit(rv_ld(RV_REG_S4, store_offset, RV_REG_SP), ctx); > + store_offset -= 8; > + } > + if (seen_reg(RV_REG_S5, ctx)) { > + emit(rv_ld(RV_REG_S5, store_offset, RV_REG_SP), ctx); > + store_offset -= 8; > + } > + if (seen_reg(RV_REG_S6, ctx)) { > + emit(rv_ld(RV_REG_S6, store_offset, RV_REG_SP), ctx); > + 
store_offset -= 8; > + } > + > + emit(rv_addi(RV_REG_SP, RV_REG_SP, stack_adjust), ctx); > + /* Set return value. */ > + emit(rv_addi(RV_REG_A0, RV_REG_A5, 0), ctx); > + emit(rv_jalr(RV_REG_ZERO, RV_REG_RA, 0), ctx); > +} > + > +static int build_body(struct rv_jit_context *ctx, bool extra_pass) > +{ > + const struct bpf_prog *prog = ctx->prog; > + int i; > + > + for (i = 0; i < prog->len; i++) { > + const struct bpf_insn *insn = &prog->insnsi[i]; > + int ret; > + > + ret = emit_insn(insn, ctx, extra_pass); > + if (ret > 0) { > + i++; > + if (ctx->insns == NULL) > + ctx->offset[i] = ctx->ninsns; > + continue; > + } > + if (ctx->insns == NULL) > + ctx->offset[i] = ctx->ninsns; > + if (ret) > + return ret; > + } > + return 0; > +} > + > +static void bpf_fill_ill_insns(void *area, unsigned int size) > +{ > + memset(area, 0, size); Needs update as well? > +} > + > +static void bpf_flush_icache(void *start, void *end) > +{ > + flush_icache_range((unsigned long)start, (unsigned long)end); > +} > + ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH 3/3] bpf, riscv: added eBPF JIT for RV64G 2019-01-15 23:49 ` Daniel Borkmann @ 2019-01-16 7:23 ` Björn Töpel 2019-01-16 15:41 ` Daniel Borkmann 0 siblings, 1 reply; 23+ messages in thread From: Björn Töpel @ 2019-01-16 7:23 UTC (permalink / raw) To: Daniel Borkmann; +Cc: linux-riscv, Palmer Dabbelt, davidlee, Netdev On Wed, Jan 16, 2019 at 12:50 AM Daniel Borkmann <daniel@iogearbox.net> wrote: > > On 01/15/2019 09:35 AM, Björn Töpel wrote: > > This commit adds eBPF JIT for RV64G. > > > > Codewise, it needs some refactoring. Currently there's a bit too much > > copy-and-paste going on, and I know some places where I could optimize > > the code generation a bit (mostly BPF_K type of instructions, dealing > > with immediates). > > Nice work! :) > > > From a features perspective, two things are missing: > > > > * tail calls > > * "far-branches", i.e. conditional branches that reach beyond 13b. > > > > The test_bpf.ko passes all tests. > > Did you also check test_verifier under jit with/without jit hardening > enabled? That one contains lots of runtime tests as well. Probably makes > sense to check under CONFIG_BPF_JIT_ALWAYS_ON to see what fails the JIT; > the test_verifier also contains various tail call tests targeted at JITs, > for example. > Good point! I will do that. The only selftests/bpf program that I ran > (and passed) was "test_progs". I'll make sure that the complete bpf > selftests suite passes as well! > >> Nit: please definitely also add a MAINTAINERS entry with at least yourself > under BPF JIT section, and update Documentation/sysctl/net.txt with riscv64. > Ah! Yes, I'll fix that.
> > Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> > > --- > > arch/riscv/net/bpf_jit_comp.c | 1608 +++++++++++++++++++++++++++++++++ > > 1 file changed, 1608 insertions(+) > > > > diff --git a/arch/riscv/net/bpf_jit_comp.c b/arch/riscv/net/bpf_jit_comp.c > > index 7e359d3249ee..562d56eb8d23 100644 > > --- a/arch/riscv/net/bpf_jit_comp.c > > +++ b/arch/riscv/net/bpf_jit_comp.c > > @@ -1,4 +1,1612 @@ > > +// SPDX-License-Identifier: GPL-2.0 > > +/* > > + * BPF JIT compiler for RV64G > > + * > > + * Copyright(c) 2019 Björn Töpel <bjorn.topel@gmail.com> > > + * > > + */ > > + > > +#include <linux/bpf.h> > > +#include <linux/filter.h> > > +#include <asm/cacheflush.h> > > + > > +#define TMP_REG_0 (MAX_BPF_JIT_REG + 0) > > +#define TMP_REG_1 (MAX_BPF_JIT_REG + 1) > > Not used? > Correct! I'll get rid of them. > > +#define TAIL_CALL_REG (MAX_BPF_JIT_REG + 2) > > + > > +enum rv_register { > > + RV_REG_ZERO = 0, /* The constant value 0 */ > > + RV_REG_RA = 1, /* Return address */ > > + RV_REG_SP = 2, /* Stack pointer */ > > + RV_REG_GP = 3, /* Global pointer */ > > + RV_REG_TP = 4, /* Thread pointer */ > > + RV_REG_T0 = 5, /* Temporaries */ > > + RV_REG_T1 = 6, > > + RV_REG_T2 = 7, > > + RV_REG_FP = 8, > > + RV_REG_S1 = 9, /* Saved registers */ > > + RV_REG_A0 = 10, /* Function argument/return values */ > > + RV_REG_A1 = 11, /* Function arguments */ > > + RV_REG_A2 = 12, > > + RV_REG_A3 = 13, > > + RV_REG_A4 = 14, > > + RV_REG_A5 = 15, > > + RV_REG_A6 = 16, > > + RV_REG_A7 = 17, > > + RV_REG_S2 = 18, /* Saved registers */ > > + RV_REG_S3 = 19, > > + RV_REG_S4 = 20, > > + RV_REG_S5 = 21, > > + RV_REG_S6 = 22, > > + RV_REG_S7 = 23, > > + RV_REG_S8 = 24, > > + RV_REG_S9 = 25, > > + RV_REG_S10 = 26, > > + RV_REG_S11 = 27, > > + RV_REG_T3 = 28, /* Temporaries */ > > + RV_REG_T4 = 29, > > + RV_REG_T5 = 30, > > + RV_REG_T6 = 31, > > +}; > > + > > +struct rv_jit_context { > > + struct bpf_prog *prog; > > + u32 *insns; /* RV insns */ > > + int ninsns; > > + int epilogue_offset; 
> > + int *offset; /* BPF to RV */ > > + unsigned long seen_reg_bits; > > + int stack_size; > > +}; > > + > > +struct rv_jit_data { > > + struct bpf_binary_header *header; > > + u8 *image; > > + struct rv_jit_context ctx; > > +}; > > + > > +static u8 bpf_to_rv_reg(int bpf_reg, struct rv_jit_context *ctx) > > +{ > > This one can also be simplified by having a simple mapping as in > other JITs and then mark __set_bit(<reg>) in the small bpf_to_rv_reg() > helper. > Yeah, I agree. Much better. I'll take that route. > > + switch (bpf_reg) { > > + /* Return value */ > > + case BPF_REG_0: > > + __set_bit(RV_REG_A5, &ctx->seen_reg_bits); > > + return RV_REG_A5; > > + /* Function arguments */ > > + case BPF_REG_1: > > + __set_bit(RV_REG_A0, &ctx->seen_reg_bits); > > + return RV_REG_A0; > > + case BPF_REG_2: > > + __set_bit(RV_REG_A1, &ctx->seen_reg_bits); > > + return RV_REG_A1; > > + case BPF_REG_3: > > + __set_bit(RV_REG_A2, &ctx->seen_reg_bits); > > + return RV_REG_A2; > > + case BPF_REG_4: > > + __set_bit(RV_REG_A3, &ctx->seen_reg_bits); > > + return RV_REG_A3; > > + case BPF_REG_5: > > + __set_bit(RV_REG_A4, &ctx->seen_reg_bits); > > + return RV_REG_A4; > > + /* Callee saved registers */ > > + case BPF_REG_6: > > + __set_bit(RV_REG_S1, &ctx->seen_reg_bits); > > + return RV_REG_S1; > > + case BPF_REG_7: > > + __set_bit(RV_REG_S2, &ctx->seen_reg_bits); > > + return RV_REG_S2; > > + case BPF_REG_8: > > + __set_bit(RV_REG_S3, &ctx->seen_reg_bits); > > + return RV_REG_S3; > > + case BPF_REG_9: > > + __set_bit(RV_REG_S4, &ctx->seen_reg_bits); > > + return RV_REG_S4; > > + /* Stack read-only frame pointer to access stack */ > > + case BPF_REG_FP: > > + __set_bit(RV_REG_S5, &ctx->seen_reg_bits); > > + return RV_REG_S5; > > + /* Temporary register */ > > + case BPF_REG_AX: > > + __set_bit(RV_REG_T0, &ctx->seen_reg_bits); > > + return RV_REG_T0; > > + /* Tail call counter */ > > + case TAIL_CALL_REG: > > + __set_bit(RV_REG_S6, &ctx->seen_reg_bits); > > + return RV_REG_S6; > > + 
default: > > + return 0; > > + } > > +}; > [...] > > + /* tail call */ > > + case BPF_JMP | BPF_TAIL_CALL: > > + rd = bpf_to_rv_reg(TAIL_CALL_REG, ctx); > > + pr_err("bpf-jit: tail call not supported yet!\n"); > > + return -1; > > There are two options here, either fixed size prologue where you can > > then jump over it in tail call case, or dynamic one which would make > > it slower due to reg restore but shrinks image for non-tail calls. > So, it would be the latter then, which is pretty much like a more expensive (due to the tail call depth checks) function call. For the fixed prologue: how does, say, x86 deal with BPF stack usage in the tail call case? If the caller doesn't use the bpf stack, but the callee does. From a quick glance in the code, the x86 prologue still uses aux->stack_depth. If the callee has a different stack usage than the caller, and then the callee does a function call, wouldn't this mess up the frame? (Yeah, obviously missing something! :-)) > > + /* function return */ > > + case BPF_JMP | BPF_EXIT: > > + if (i == ctx->prog->len - 1) > > + break; > > + > > + rvoff = epilogue_offset(ctx); > > + if (!is_21b_int(rvoff)) { > > + pr_err("bpf-jit: %d offset=%d not supported yet!\n", > > + __LINE__, rvoff); > > + return -1; > > + } > > + > > + emit(rv_jal(RV_REG_ZERO, rvoff >> 1), ctx); > > + break; > > + > > + /* dst = imm64 */ > > + case BPF_LD | BPF_IMM | BPF_DW: > > + { > > + struct bpf_insn insn1 = insn[1]; > > + u64 imm64; > > + > [...]
> > + > > +static void build_prologue(struct rv_jit_context *ctx) > > +{ > > + int stack_adjust = 0, store_offset, bpf_stack_adjust; > > + > > + if (seen_reg(RV_REG_RA, ctx)) > > + stack_adjust += 8; > > + stack_adjust += 8; /* RV_REG_FP */ > > + if (seen_reg(RV_REG_S1, ctx)) > > + stack_adjust += 8; > > + if (seen_reg(RV_REG_S2, ctx)) > > + stack_adjust += 8; > > + if (seen_reg(RV_REG_S3, ctx)) > > + stack_adjust += 8; > > + if (seen_reg(RV_REG_S4, ctx)) > > + stack_adjust += 8; > > + if (seen_reg(RV_REG_S5, ctx)) > > + stack_adjust += 8; > > + if (seen_reg(RV_REG_S6, ctx)) > > + stack_adjust += 8; > > + > > + stack_adjust = round_up(stack_adjust, 16); > > + bpf_stack_adjust = round_up(ctx->prog->aux->stack_depth, 16); > > + stack_adjust += bpf_stack_adjust; > > + > > + store_offset = stack_adjust - 8; > > + > > + emit(rv_addi(RV_REG_SP, RV_REG_SP, -stack_adjust), ctx); > > + > > + if (seen_reg(RV_REG_RA, ctx)) { > > + emit(rv_sd(RV_REG_SP, store_offset, RV_REG_RA), ctx); > > + store_offset -= 8; > > + } > > + emit(rv_sd(RV_REG_SP, store_offset, RV_REG_FP), ctx); > > + store_offset -= 8; > > + if (seen_reg(RV_REG_S1, ctx)) { > > + emit(rv_sd(RV_REG_SP, store_offset, RV_REG_S1), ctx); > > + store_offset -= 8; > > + } > > + if (seen_reg(RV_REG_S2, ctx)) { > > + emit(rv_sd(RV_REG_SP, store_offset, RV_REG_S2), ctx); > > + store_offset -= 8; > > + } > > + if (seen_reg(RV_REG_S3, ctx)) { > > + emit(rv_sd(RV_REG_SP, store_offset, RV_REG_S3), ctx); > > + store_offset -= 8; > > + } > > + if (seen_reg(RV_REG_S4, ctx)) { > > + emit(rv_sd(RV_REG_SP, store_offset, RV_REG_S4), ctx); > > + store_offset -= 8; > > + } > > + if (seen_reg(RV_REG_S5, ctx)) { > > + emit(rv_sd(RV_REG_SP, store_offset, RV_REG_S5), ctx); > > + store_offset -= 8; > > + } > > + if (seen_reg(RV_REG_S6, ctx)) { > > + emit(rv_sd(RV_REG_SP, store_offset, RV_REG_S6), ctx); > > + store_offset -= 8; > > + } > > + > > + emit(rv_addi(RV_REG_FP, RV_REG_SP, stack_adjust), ctx); > > + > > + if (bpf_stack_adjust) { > > 
+ if (!seen_reg(RV_REG_S5, ctx)) > > + pr_warn("bpf-jit: not seen BPF_REG_FP, stack is %d\n", > > + bpf_stack_adjust); > > + emit(rv_addi(RV_REG_S5, RV_REG_SP, bpf_stack_adjust), ctx); > > + } > > + > > + ctx->stack_size = stack_adjust; > > +} > > + > > +static void build_epilogue(struct rv_jit_context *ctx) > > +{ > > + int stack_adjust = ctx->stack_size, store_offset = stack_adjust - 8; > > + > > + if (seen_reg(RV_REG_RA, ctx)) { > > + emit(rv_ld(RV_REG_RA, store_offset, RV_REG_SP), ctx); > > + store_offset -= 8; > > + } > > + emit(rv_ld(RV_REG_FP, store_offset, RV_REG_SP), ctx); > > + store_offset -= 8; > > + if (seen_reg(RV_REG_S1, ctx)) { > > + emit(rv_ld(RV_REG_S1, store_offset, RV_REG_SP), ctx); > > + store_offset -= 8; > > + } > > + if (seen_reg(RV_REG_S2, ctx)) { > > + emit(rv_ld(RV_REG_S2, store_offset, RV_REG_SP), ctx); > > + store_offset -= 8; > > + } > > + if (seen_reg(RV_REG_S3, ctx)) { > > + emit(rv_ld(RV_REG_S3, store_offset, RV_REG_SP), ctx); > > + store_offset -= 8; > > + } > > + if (seen_reg(RV_REG_S4, ctx)) { > > + emit(rv_ld(RV_REG_S4, store_offset, RV_REG_SP), ctx); > > + store_offset -= 8; > > + } > > + if (seen_reg(RV_REG_S5, ctx)) { > > + emit(rv_ld(RV_REG_S5, store_offset, RV_REG_SP), ctx); > > + store_offset -= 8; > > + } > > + if (seen_reg(RV_REG_S6, ctx)) { > > + emit(rv_ld(RV_REG_S6, store_offset, RV_REG_SP), ctx); > > + store_offset -= 8; > > + } > > + > > + emit(rv_addi(RV_REG_SP, RV_REG_SP, stack_adjust), ctx); > > + /* Set return value. 
*/ > > + emit(rv_addi(RV_REG_A0, RV_REG_A5, 0), ctx); > > + emit(rv_jalr(RV_REG_ZERO, RV_REG_RA, 0), ctx); > > +} > > + > > +static int build_body(struct rv_jit_context *ctx, bool extra_pass) > > +{ > > + const struct bpf_prog *prog = ctx->prog; > > + int i; > > + > > + for (i = 0; i < prog->len; i++) { > > + const struct bpf_insn *insn = &prog->insnsi[i]; > > + int ret; > > + > > + ret = emit_insn(insn, ctx, extra_pass); > > + if (ret > 0) { > > + i++; > > + if (ctx->insns == NULL) > > + ctx->offset[i] = ctx->ninsns; > > + continue; > > + } > > + if (ctx->insns == NULL) > > + ctx->offset[i] = ctx->ninsns; > > + if (ret) > > + return ret; > > + } > > + return 0; > > +} > > + > > +static void bpf_fill_ill_insns(void *area, unsigned int size) > > +{ > > + memset(area, 0, size); > > Needs update as well? > No, a bit pattern of all zeros is an illegal instruction, but a comment would be good! > > +} > > + > > +static void bpf_flush_icache(void *start, void *end) > > +{ > > + flush_icache_range((unsigned long)start, (unsigned long)end); > > +} > > + Thanks a lot for the comments, Daniel! I'll get back with a v2. Björn
* Re: [RFC PATCH 3/3] bpf, riscv: added eBPF JIT for RV64G 2019-01-16 7:23 ` Björn Töpel @ 2019-01-16 15:41 ` Daniel Borkmann 2019-01-16 19:06 ` Björn Töpel 0 siblings, 1 reply; 23+ messages in thread From: Daniel Borkmann @ 2019-01-16 15:41 UTC (permalink / raw) To: Björn Töpel; +Cc: linux-riscv, Palmer Dabbelt, davidlee, Netdev On 01/16/2019 08:23 AM, Björn Töpel wrote: > Den ons 16 jan. 2019 kl 00:50 skrev Daniel Borkmann <daniel@iogearbox.net>: >> >> On 01/15/2019 09:35 AM, Björn Töpel wrote: >>> This commit adds eBPF JIT for RV64G. >>> >>> Codewise, it needs some refactoring. Currently there's a bit too much >>> copy-and-paste going on, and I know some places where I could optimize >>> the code generation a bit (mostly BPF_K type of instructions, dealing >>> with immediates). >> >> Nice work! :) >> >>> From a features perspective, two things are missing: >>> >>> * tail calls >>> * "far-branches", i.e. conditional branches that reach beyond 13b. >>> >>> The test_bpf.ko passes all tests. >> >> Did you also check test_verifier under jit with/without jit hardening >> enabled? That one contains lots of runtime tests as well. Probably makes >> sense to check under CONFIG_BPF_JIT_ALWAYS_ON to see what fails the JIT; >> the test_verifier also contains various tail call tests targeted at JITs, >> for example. >> > > Good point! I will do that. The only selftests/bpf program that I ran > (and passed) was "test_progs". I'll make sure that the complete bpf > selftests suite passes as well! > >> Nit: please definitely also add a MAINTAINERS entry with at least yourself >> under BPF JIT section, and update Documentation/sysctl/net.txt with riscv64. >> > > Ah! Yes, I'll fix that. 
> >>> Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> >>> --- >>> arch/riscv/net/bpf_jit_comp.c | 1608 +++++++++++++++++++++++++++++++++ >>> 1 file changed, 1608 insertions(+) >>> >>> diff --git a/arch/riscv/net/bpf_jit_comp.c b/arch/riscv/net/bpf_jit_comp.c >>> index 7e359d3249ee..562d56eb8d23 100644 >>> --- a/arch/riscv/net/bpf_jit_comp.c >>> +++ b/arch/riscv/net/bpf_jit_comp.c >>> @@ -1,4 +1,1612 @@ >>> +// SPDX-License-Identifier: GPL-2.0 >>> +/* >>> + * BPF JIT compiler for RV64G >>> + * >>> + * Copyright(c) 2019 Björn Töpel <bjorn.topel@gmail.com> >>> + * >>> + */ >>> + >>> +#include <linux/bpf.h> >>> +#include <linux/filter.h> >>> +#include <asm/cacheflush.h> >>> + >>> +#define TMP_REG_0 (MAX_BPF_JIT_REG + 0) >>> +#define TMP_REG_1 (MAX_BPF_JIT_REG + 1) >> >> Not used? >> > > Correct! I'll get rid of them. > >>> +#define TAIL_CALL_REG (MAX_BPF_JIT_REG + 2) >>> + >>> +enum rv_register { >>> + RV_REG_ZERO = 0, /* The constant value 0 */ >>> + RV_REG_RA = 1, /* Return address */ >>> + RV_REG_SP = 2, /* Stack pointer */ >>> + RV_REG_GP = 3, /* Global pointer */ >>> + RV_REG_TP = 4, /* Thread pointer */ >>> + RV_REG_T0 = 5, /* Temporaries */ >>> + RV_REG_T1 = 6, >>> + RV_REG_T2 = 7, >>> + RV_REG_FP = 8, >>> + RV_REG_S1 = 9, /* Saved registers */ >>> + RV_REG_A0 = 10, /* Function argument/return values */ >>> + RV_REG_A1 = 11, /* Function arguments */ >>> + RV_REG_A2 = 12, >>> + RV_REG_A3 = 13, >>> + RV_REG_A4 = 14, >>> + RV_REG_A5 = 15, >>> + RV_REG_A6 = 16, >>> + RV_REG_A7 = 17, >>> + RV_REG_S2 = 18, /* Saved registers */ >>> + RV_REG_S3 = 19, >>> + RV_REG_S4 = 20, >>> + RV_REG_S5 = 21, >>> + RV_REG_S6 = 22, >>> + RV_REG_S7 = 23, >>> + RV_REG_S8 = 24, >>> + RV_REG_S9 = 25, >>> + RV_REG_S10 = 26, >>> + RV_REG_S11 = 27, >>> + RV_REG_T3 = 28, /* Temporaries */ >>> + RV_REG_T4 = 29, >>> + RV_REG_T5 = 30, >>> + RV_REG_T6 = 31, >>> +}; >>> + >>> +struct rv_jit_context { >>> + struct bpf_prog *prog; >>> + u32 *insns; /* RV insns */ >>> + int ninsns; >>> + int 
epilogue_offset; >>> + int *offset; /* BPF to RV */ >>> + unsigned long seen_reg_bits; >>> + int stack_size; >>> +}; >>> + >>> +struct rv_jit_data { >>> + struct bpf_binary_header *header; >>> + u8 *image; >>> + struct rv_jit_context ctx; >>> +}; >>> + >>> +static u8 bpf_to_rv_reg(int bpf_reg, struct rv_jit_context *ctx) >>> +{ >> >> This one can also be simplified by having a simple mapping as in >> other JITs and then mark __set_bit(<reg>) in the small bpf_to_rv_reg() >> helper. >> > > Yeah, I agree. Much better. I'll take that route. > >>> + switch (bpf_reg) { >>> + /* Return value */ >>> + case BPF_REG_0: >>> + __set_bit(RV_REG_A5, &ctx->seen_reg_bits); >>> + return RV_REG_A5; >>> + /* Function arguments */ >>> + case BPF_REG_1: >>> + __set_bit(RV_REG_A0, &ctx->seen_reg_bits); >>> + return RV_REG_A0; >>> + case BPF_REG_2: >>> + __set_bit(RV_REG_A1, &ctx->seen_reg_bits); >>> + return RV_REG_A1; >>> + case BPF_REG_3: >>> + __set_bit(RV_REG_A2, &ctx->seen_reg_bits); >>> + return RV_REG_A2; >>> + case BPF_REG_4: >>> + __set_bit(RV_REG_A3, &ctx->seen_reg_bits); >>> + return RV_REG_A3; >>> + case BPF_REG_5: >>> + __set_bit(RV_REG_A4, &ctx->seen_reg_bits); >>> + return RV_REG_A4; >>> + /* Callee saved registers */ >>> + case BPF_REG_6: >>> + __set_bit(RV_REG_S1, &ctx->seen_reg_bits); >>> + return RV_REG_S1; >>> + case BPF_REG_7: >>> + __set_bit(RV_REG_S2, &ctx->seen_reg_bits); >>> + return RV_REG_S2; >>> + case BPF_REG_8: >>> + __set_bit(RV_REG_S3, &ctx->seen_reg_bits); >>> + return RV_REG_S3; >>> + case BPF_REG_9: >>> + __set_bit(RV_REG_S4, &ctx->seen_reg_bits); >>> + return RV_REG_S4; >>> + /* Stack read-only frame pointer to access stack */ >>> + case BPF_REG_FP: >>> + __set_bit(RV_REG_S5, &ctx->seen_reg_bits); >>> + return RV_REG_S5; >>> + /* Temporary register */ >>> + case BPF_REG_AX: >>> + __set_bit(RV_REG_T0, &ctx->seen_reg_bits); >>> + return RV_REG_T0; >>> + /* Tail call counter */ >>> + case TAIL_CALL_REG: >>> + __set_bit(RV_REG_S6, &ctx->seen_reg_bits); 
>>> + return RV_REG_S6; >>> + default: >>> + return 0; >>> + } >>> +}; >> [...] >>> + /* tail call */ >>> + case BPF_JMP | BPF_TAIL_CALL: >>> + rd = bpf_to_rv_reg(TAIL_CALL_REG, ctx); >>> + pr_err("bpf-jit: tail call not supported yet!\n"); >>> + return -1; >> >> There are two options here, either fixed size prologue where you can >> then jump over it in tail call case, or dynamic one which would make >> it slower due to reg restore but shrinks image for non-tail calls. > > So, it would be the latter then, which is pretty much like a more > expensive (due to the tail call depth checks) function call. Right. > For the fixed prologue: how does, say x86, deal with BPF stack usage > in the tail call case? If the caller doesn't use the bpf stack, but > the callee does. From a quick glance in the code, the x86 prologue > still uses aux->stack_depth. If the callee has a different stack usage > that the caller, and then the callee does a function call, wouldn't > this mess up the frame? (Yeah, obviously missing something! :-)) Basically in this case verifier sets stack size to MAX_BPF_STACK when it finds a tail call in the prog, meaning the callee will be reusing <= stack size than the caller and then upon exit unwinds it via leave+ret. Cheers, Daniel ^ permalink raw reply [flat|nested] 23+ messages in thread
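Daniel's suggested simplification of bpf_to_rv_reg() — a flat mapping table plus a seen-register mark in one small helper — might look roughly like the sketch below. The RISC-V register numbers follow the enum quoted above, but the BPF indices, the table, and the bitmask stand-in for __set_bit() are illustrative, not the code that landed:

```c
#include <assert.h>

/* Subset of the patch's RISC-V register numbering, for illustration. */
enum { RV_REG_T0 = 5, RV_REG_S1 = 9, RV_REG_A0 = 10, RV_REG_A1 = 11,
       RV_REG_A2 = 12, RV_REG_A3 = 13, RV_REG_A4 = 14, RV_REG_A5 = 15,
       RV_REG_S2 = 18, RV_REG_S3 = 19, RV_REG_S4 = 20, RV_REG_S5 = 21,
       RV_REG_S6 = 22 };

/* Illustrative BPF register indices (the kernel's differ slightly). */
enum { BPF_REG_0, BPF_REG_1, BPF_REG_2, BPF_REG_3, BPF_REG_4,
       BPF_REG_5, BPF_REG_6, BPF_REG_7, BPF_REG_8, BPF_REG_9,
       BPF_REG_FP, BPF_REG_AX, TAIL_CALL_REG };

static const unsigned char regmap[] = {
	[BPF_REG_0]	= RV_REG_A5,	/* return value */
	[BPF_REG_1]	= RV_REG_A0,	/* function arguments */
	[BPF_REG_2]	= RV_REG_A1,
	[BPF_REG_3]	= RV_REG_A2,
	[BPF_REG_4]	= RV_REG_A3,
	[BPF_REG_5]	= RV_REG_A4,
	[BPF_REG_6]	= RV_REG_S1,	/* callee-saved */
	[BPF_REG_7]	= RV_REG_S2,
	[BPF_REG_8]	= RV_REG_S3,
	[BPF_REG_9]	= RV_REG_S4,
	[BPF_REG_FP]	= RV_REG_S5,	/* read-only frame pointer */
	[BPF_REG_AX]	= RV_REG_T0,	/* temporary */
	[TAIL_CALL_REG]	= RV_REG_S6,	/* tail-call counter */
};

struct rv_jit_context { unsigned long seen_reg_bits; };

/* Map a BPF register and record that its RISC-V counterpart was used,
 * so the prologue/epilogue only save/restore what the program touches. */
static unsigned char bpf_to_rv_reg(int bpf_reg, struct rv_jit_context *ctx)
{
	unsigned char reg = regmap[bpf_reg];

	ctx->seen_reg_bits |= 1UL << reg;	/* stand-in for __set_bit() */
	return reg;
}
```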
* Re: [RFC PATCH 3/3] bpf, riscv: added eBPF JIT for RV64G 2019-01-16 15:41 ` Daniel Borkmann @ 2019-01-16 19:06 ` Björn Töpel 0 siblings, 0 replies; 23+ messages in thread From: Björn Töpel @ 2019-01-16 19:06 UTC (permalink / raw) To: Daniel Borkmann; +Cc: linux-riscv, Palmer Dabbelt, davidlee, Netdev Den ons 16 jan. 2019 kl 16:41 skrev Daniel Borkmann <daniel@iogearbox.net>: > [...] > > > For the fixed prologue: how does, say x86, deal with BPF stack usage > > in the tail call case? If the caller doesn't use the bpf stack, but > > the callee does. From a quick glance in the code, the x86 prologue > > still uses aux->stack_depth. If the callee has a different stack usage > > that the caller, and then the callee does a function call, wouldn't > > this mess up the frame? (Yeah, obviously missing something! :-)) > > Basically in this case verifier sets stack size to MAX_BPF_STACK when it > finds a tail call in the prog, meaning the callee will be reusing <= stack > size than the caller and then upon exit unwinds it via leave+ret. > Ugh, so for "dynamic" tail calls this would mean "more expensive functions calls with maximum stack usage per call"? I.e. each tail call consumes MAX_BPF_STACK plus regular pro-/epilogue plus depth tracking. I'd still prefer optimizing to regular functions calls, than tail calls -- or is that naive? What is most common in larger bpf deployments, say, Katran or Cilium? Cheers! Björn > Cheers, > Daniel ^ permalink raw reply [flat|nested] 23+ messages in thread
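The stack-size rule Daniel describes — and that Björn finds expensive — comes down to one decision: if the verifier sees a tail call anywhere in the program, the frame is sized for the worst case so any callee's frame fits inside it. A toy model, with a hypothetical helper name (MAX_BPF_STACK is 512 bytes in the kernel):

```c
#include <assert.h>

#define MAX_BPF_STACK 512	/* kernel's maximum BPF stack size */

/* Hypothetical helper: programs containing a tail call get a
 * worst-case frame; everything else gets its verifier-tracked
 * stack depth. */
static int effective_stack_depth(int stack_depth, int has_tail_call)
{
	return has_tail_call ? MAX_BPF_STACK : stack_depth;
}
```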
* Re: [RFC PATCH 0/3] RV64G eBPF JIT 2019-01-15 8:35 [RFC PATCH 0/3] RV64G eBPF JIT Björn Töpel ` (3 preceding siblings ...) 2019-01-15 8:35 ` [RFC PATCH 3/3] bpf, riscv: added eBPF JIT for RV64G Björn Töpel @ 2019-01-15 15:40 ` Christoph Hellwig 2019-01-15 16:03 ` Björn Töpel 2019-01-25 19:54 ` Paul Walmsley 2019-01-30 2:02 ` Palmer Dabbelt 6 siblings, 1 reply; 23+ messages in thread From: Christoph Hellwig @ 2019-01-15 15:40 UTC (permalink / raw) To: Björn Töpel; +Cc: linux-riscv, palmer, davidlee, daniel, netdev Hi Björn, at least for me patch 3 didn't make it to the list. On Tue, Jan 15, 2019 at 09:35:15AM +0100, Björn Töpel wrote: > Hi! > > I've been hacking on a RV64G eBPF JIT compiler, and would like some > feedback. > > Codewise, it needs some refactoring. Currently there's a bit too much > copy-and-paste going on, and I know some places where I could optimize > the code generation a bit (mostly BPF_K type of instructions, dealing > with immediates). > > From a features perspective, two things are missing: > > * tail calls > * "far-branches", i.e. conditional branches that reach beyond 13b. > > The test_bpf.ko (only tested on 4.20!) passes all tests. > > I've done all the tests on QEMU (version 3.1.50), so no real hardware. > > Some questions/observations: > > * I've added "HAVE_EFFICIENT_UNALIGNED_ACCESS" to > arch/riscv/Kconfig. Is this assumption correct? > > * emit_imm() just relies on lui, adds and shifts. No fancy xori cost > optimizations like GCC does. > > * Suggestions on how to implement the tail call, given that the > prologue/epilogue has variable size. I will dig into the details of > mips/arm64/x86. :-) > > Next steps (prior patch proper) is cleaning up the code, add tail > calls, and making sure that bpftool disassembly works correctly. > > All input are welcome. This is my first RISC-V hack, so I sure there > are a lot things to improve! 
> > > Thanks, > Björn > > > Björn Töpel (3): > riscv: set HAVE_EFFICIENT_UNALIGNED_ACCESS > riscv: add build infra for JIT compiler > bpf, riscv: added eBPF JIT for RV64G > > arch/riscv/Kconfig | 2 + > arch/riscv/Makefile | 4 + > arch/riscv/net/Makefile | 5 + > arch/riscv/net/bpf_jit_comp.c | 1612 +++++++++++++++++++++++++++++++++ > 4 files changed, 1623 insertions(+) > create mode 100644 arch/riscv/net/Makefile > create mode 100644 arch/riscv/net/bpf_jit_comp.c > > -- > 2.19.1 > > > _______________________________________________ > linux-riscv mailing list > linux-riscv@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-riscv ---end quoted text--- ^ permalink raw reply [flat|nested] 23+ messages in thread
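The "far-branches" item in the cover letter refers to RISC-V's B-type conditional branches, which encode a signed 13-bit byte offset (±4 KiB). A range check like the hypothetical helper below is the usual first step; when the offset does not fit, the JIT can emit an inverted branch over an unconditional jal instead:

```c
#include <assert.h>
#include <stdbool.h>

/* True if val fits the signed 13-bit offset of a B-type branch.
 * Hypothetical helper name, for illustration. */
static bool is_13b_int(long val)
{
	return val >= -4096 && val < 4096;
}
```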
* Re: [RFC PATCH 0/3] RV64G eBPF JIT 2019-01-15 15:40 ` [RFC PATCH 0/3] RV64G eBPF JIT Christoph Hellwig @ 2019-01-15 16:03 ` Björn Töpel 2019-01-25 19:02 ` Palmer Dabbelt 0 siblings, 1 reply; 23+ messages in thread From: Björn Töpel @ 2019-01-15 16:03 UTC (permalink / raw) To: Christoph Hellwig Cc: linux-riscv, Palmer Dabbelt, davidlee, Daniel Borkmann, Netdev Den tis 15 jan. 2019 kl 16:40 skrev Christoph Hellwig <hch@infradead.org>: > > Hi Björn, > > at least for me patch 3 didn't make it to the list. > Hmm, held back: "Your message to linux-riscv awaits moderator approval". Exceeded the 40k limit. I'll wait until the moderator wakes up (Palmer?). Björn > On Tue, Jan 15, 2019 at 09:35:15AM +0100, Björn Töpel wrote: > > Hi! > > > > I've been hacking on a RV64G eBPF JIT compiler, and would like some > > feedback. > > > > Codewise, it needs some refactoring. Currently there's a bit too much > > copy-and-paste going on, and I know some places where I could optimize > > the code generation a bit (mostly BPF_K type of instructions, dealing > > with immediates). > > > > From a features perspective, two things are missing: > > > > * tail calls > > * "far-branches", i.e. conditional branches that reach beyond 13b. > > > > The test_bpf.ko (only tested on 4.20!) passes all tests. > > > > I've done all the tests on QEMU (version 3.1.50), so no real hardware. > > > > Some questions/observations: > > > > * I've added "HAVE_EFFICIENT_UNALIGNED_ACCESS" to > > arch/riscv/Kconfig. Is this assumption correct? > > > > * emit_imm() just relies on lui, adds and shifts. No fancy xori cost > > optimizations like GCC does. > > > > * Suggestions on how to implement the tail call, given that the > > prologue/epilogue has variable size. I will dig into the details of > > mips/arm64/x86. :-) > > > > Next steps (prior patch proper) is cleaning up the code, add tail > > calls, and making sure that bpftool disassembly works correctly. > > > > All input are welcome. 
This is my first RISC-V hack, so I sure there > > are a lot things to improve! > > > > > > Thanks, > > Björn > > > > > > Björn Töpel (3): > > riscv: set HAVE_EFFICIENT_UNALIGNED_ACCESS > > riscv: add build infra for JIT compiler > > bpf, riscv: added eBPF JIT for RV64G > > > > arch/riscv/Kconfig | 2 + > > arch/riscv/Makefile | 4 + > > arch/riscv/net/Makefile | 5 + > > arch/riscv/net/bpf_jit_comp.c | 1612 +++++++++++++++++++++++++++++++++ > > 4 files changed, 1623 insertions(+) > > create mode 100644 arch/riscv/net/Makefile > > create mode 100644 arch/riscv/net/bpf_jit_comp.c > > > > -- > > 2.19.1 > > > > > > _______________________________________________ > > linux-riscv mailing list > > linux-riscv@lists.infradead.org > > http://lists.infradead.org/mailman/listinfo/linux-riscv > ---end quoted text--- ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH 0/3] RV64G eBPF JIT 2019-01-15 16:03 ` Björn Töpel @ 2019-01-25 19:02 ` Palmer Dabbelt 0 siblings, 0 replies; 23+ messages in thread From: Palmer Dabbelt @ 2019-01-25 19:02 UTC (permalink / raw) To: bjorn.topel; +Cc: Christoph Hellwig, linux-riscv, davidlee, daniel, netdev On Tue, 15 Jan 2019 08:03:18 PST (-0800), bjorn.topel@gmail.com wrote: > Den tis 15 jan. 2019 kl 16:40 skrev Christoph Hellwig <hch@infradead.org>: >> >> Hi Björn, >> >> at least for me patch 3 didn't make it to the list. >> > > Hmm, held back: "Your message to linux-riscv awaits moderator > approval". Exceeded the 40k limit. > > I'll wait until the moderator wakes up (Palmer?). Sorry, it took me a while to wake up :) > > > Björn > >> On Tue, Jan 15, 2019 at 09:35:15AM +0100, Björn Töpel wrote: >> > Hi! >> > >> > I've been hacking on a RV64G eBPF JIT compiler, and would like some >> > feedback. >> > >> > Codewise, it needs some refactoring. Currently there's a bit too much >> > copy-and-paste going on, and I know some places where I could optimize >> > the code generation a bit (mostly BPF_K type of instructions, dealing >> > with immediates). >> > >> > From a features perspective, two things are missing: >> > >> > * tail calls >> > * "far-branches", i.e. conditional branches that reach beyond 13b. >> > >> > The test_bpf.ko (only tested on 4.20!) passes all tests. >> > >> > I've done all the tests on QEMU (version 3.1.50), so no real hardware. >> > >> > Some questions/observations: >> > >> > * I've added "HAVE_EFFICIENT_UNALIGNED_ACCESS" to >> > arch/riscv/Kconfig. Is this assumption correct? >> > >> > * emit_imm() just relies on lui, adds and shifts. No fancy xori cost >> > optimizations like GCC does. >> > >> > * Suggestions on how to implement the tail call, given that the >> > prologue/epilogue has variable size. I will dig into the details of >> > mips/arm64/x86. 
:-) >> > >> > Next steps (prior patch proper) is cleaning up the code, add tail >> > calls, and making sure that bpftool disassembly works correctly. >> > >> > All input are welcome. This is my first RISC-V hack, so I sure there >> > are a lot things to improve! >> > >> > >> > Thanks, >> > Björn >> > >> > >> > Björn Töpel (3): >> > riscv: set HAVE_EFFICIENT_UNALIGNED_ACCESS >> > riscv: add build infra for JIT compiler >> > bpf, riscv: added eBPF JIT for RV64G >> > >> > arch/riscv/Kconfig | 2 + >> > arch/riscv/Makefile | 4 + >> > arch/riscv/net/Makefile | 5 + >> > arch/riscv/net/bpf_jit_comp.c | 1612 +++++++++++++++++++++++++++++++++ >> > 4 files changed, 1623 insertions(+) >> > create mode 100644 arch/riscv/net/Makefile >> > create mode 100644 arch/riscv/net/bpf_jit_comp.c >> > >> > -- >> > 2.19.1 >> > >> > >> > _______________________________________________ >> > linux-riscv mailing list >> > linux-riscv@lists.infradead.org >> > http://lists.infradead.org/mailman/listinfo/linux-riscv >> ---end quoted text--- ^ permalink raw reply [flat|nested] 23+ messages in thread
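The emit_imm() strategy from the cover letter — lui, adds and shifts, with no xori cost tricks — can be modeled on the host. Rather than emitting instructions, the sketch below computes the value the emitted sequence would leave in the destination register, which makes the scheme easy to check; it illustrates the approach, not the patch's code:

```c
#include <assert.h>
#include <stdint.h>

/* Model of lui/addi/slli immediate materialization: peel off the
 * sign-extended low 12 bits (the final addi), then either a single
 * lui for the rest, or recurse and shift the upper part into place. */
static int64_t build_imm(int64_t val)
{
	/* sign-extended low 12 bits: what the trailing addi contributes */
	int64_t lo = ((val & 0xfff) ^ 0x800) - 0x800;
	/* exact multiple-of-4096 remainder, scaled down */
	int64_t hi = (val - lo) / 4096;

	if (hi == 0)
		return lo;			/* addi rd, zero, lo */
	if (hi >= -524288 && hi <= 524287)	/* fits lui's 20-bit imm */
		return hi * 4096 + lo;		/* lui rd, hi; addi rd, rd, lo */
	/* build the upper part, shift it up, then add the low part */
	return build_imm(hi) * 4096 + lo;	/* ...; slli rd, rd, 12; addi */
}
```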
* Re: [RFC PATCH 0/3] RV64G eBPF JIT 2019-01-15 8:35 [RFC PATCH 0/3] RV64G eBPF JIT Björn Töpel ` (4 preceding siblings ...) 2019-01-15 15:40 ` [RFC PATCH 0/3] RV64G eBPF JIT Christoph Hellwig @ 2019-01-25 19:54 ` Paul Walmsley 2019-01-27 12:28 ` Björn Töpel 2019-01-30 2:02 ` Palmer Dabbelt 6 siblings, 1 reply; 23+ messages in thread From: Paul Walmsley @ 2019-01-25 19:54 UTC (permalink / raw) To: Björn Töpel; +Cc: linux-riscv, palmer, davidlee, daniel, netdev [-- Attachment #1: Type: text/plain, Size: 475 bytes --] Hi, thanks for taking a shot at the eBPF JIT; this will be very useful. On Tue, 15 Jan 2019, Björn Töpel wrote: > * I've added "HAVE_EFFICIENT_UNALIGNED_ACCESS" to > arch/riscv/Kconfig. Is this assumption correct? From a hardware point of view, this is not the case on the Linux-capable RISC-V ASICs that the public can buy right now (to the best of my knowledge this is only the SiFive FU540). So I'd recommend not including this for now. - Paul ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH 0/3] RV64G eBPF JIT 2019-01-25 19:54 ` Paul Walmsley @ 2019-01-27 12:28 ` Björn Töpel 0 siblings, 0 replies; 23+ messages in thread From: Björn Töpel @ 2019-01-27 12:28 UTC (permalink / raw) To: Paul Walmsley Cc: linux-riscv, Palmer Dabbelt, davidlee, Daniel Borkmann, Netdev Den fre 25 jan. 2019 kl 20:54 skrev Paul Walmsley <paul.walmsley@sifive.com>: > > Hi, > > thanks for taking a shot at the eBPF JIT; this will be very useful. > > On Tue, 15 Jan 2019, Björn Töpel wrote: > > > * I've added "HAVE_EFFICIENT_UNALIGNED_ACCESS" to > > arch/riscv/Kconfig. Is this assumption correct? > > From a hardware point of view, this is not the case on the Linux-capable > RISC-V ASICs that the public can buy right now (to the best of my > knowledge this is only the SiFive FU540). > > So I'd recommend not including this for now. > Got it! Thanks for clearing that up for me! Hopefully, I'll find some time the coming week to get a v2 out with tail-call support and most comments addressed. Cheers, Björn > > - Paul ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH 0/3] RV64G eBPF JIT 2019-01-15 8:35 [RFC PATCH 0/3] RV64G eBPF JIT Björn Töpel ` (5 preceding siblings ...) 2019-01-25 19:54 ` Paul Walmsley @ 2019-01-30 2:02 ` Palmer Dabbelt 6 siblings, 0 replies; 23+ messages in thread From: Palmer Dabbelt @ 2019-01-30 2:02 UTC (permalink / raw) To: bjorn.topel; +Cc: linux-riscv, bjorn.topel, daniel, davidlee, netdev On Tue, 15 Jan 2019 00:35:15 PST (-0800), bjorn.topel@gmail.com wrote: > Hi! > > I've been hacking on a RV64G eBPF JIT compiler, and would like some > feedback. > > Codewise, it needs some refactoring. Currently there's a bit too much > copy-and-paste going on, and I know some places where I could optimize > the code generation a bit (mostly BPF_K type of instructions, dealing > with immediates). > > From a features perspective, two things are missing: > > * tail calls > * "far-branches", i.e. conditional branches that reach beyond 13b. > > The test_bpf.ko (only tested on 4.20!) passes all tests. > > I've done all the tests on QEMU (version 3.1.50), so no real hardware. > > Some questions/observations: > > * I've added "HAVE_EFFICIENT_UNALIGNED_ACCESS" to > arch/riscv/Kconfig. Is this assumption correct? > > * emit_imm() just relies on lui, adds and shifts. No fancy xori cost > optimizations like GCC does. > > * Suggestions on how to implement the tail call, given that the > prologue/epilogue has variable size. I will dig into the details of > mips/arm64/x86. :-) > > Next steps (prior patch proper) is cleaning up the code, add tail > calls, and making sure that bpftool disassembly works correctly. > > All input are welcome. This is my first RISC-V hack, so I sure there > are a lot things to improve! 
> > > Thanks, > Björn > > > Björn Töpel (3): > riscv: set HAVE_EFFICIENT_UNALIGNED_ACCESS > riscv: add build infra for JIT compiler > bpf, riscv: added eBPF JIT for RV64G > > arch/riscv/Kconfig | 2 + > arch/riscv/Makefile | 4 + > arch/riscv/net/Makefile | 5 + > arch/riscv/net/bpf_jit_comp.c | 1612 +++++++++++++++++++++++++++++++++ > 4 files changed, 1623 insertions(+) > create mode 100644 arch/riscv/net/Makefile > create mode 100644 arch/riscv/net/bpf_jit_comp.c Thanks for doing this. I saw a few reviews go by and I'm behind on email, so I'm going to drop this until a v2. ^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2019-01-30 2:02 UTC | newest] Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-01-15 8:35 [RFC PATCH 0/3] RV64G eBPF JIT Björn Töpel 2019-01-15 8:35 ` Björn Töpel 2019-01-15 8:35 ` [RFC PATCH 1/3] riscv: set HAVE_EFFICIENT_UNALIGNED_ACCESS Björn Töpel 2019-01-15 15:39 ` Christoph Hellwig 2019-01-15 16:06 ` Björn Töpel 2019-01-25 20:21 ` Palmer Dabbelt 2019-01-26 1:33 ` Jim Wilson 2019-01-29 2:43 ` Palmer Dabbelt 2019-01-15 8:35 ` [RFC PATCH 2/3] riscv: add build infra for JIT compiler Björn Töpel 2019-01-15 15:43 ` Christoph Hellwig 2019-01-15 16:09 ` Björn Töpel 2019-01-15 8:35 ` [RFC PATCH 3/3] bpf, riscv: added eBPF JIT for RV64G Björn Töpel 2019-01-15 8:35 ` Björn Töpel 2019-01-15 23:49 ` Daniel Borkmann 2019-01-16 7:23 ` Björn Töpel 2019-01-16 15:41 ` Daniel Borkmann 2019-01-16 19:06 ` Björn Töpel 2019-01-15 15:40 ` [RFC PATCH 0/3] RV64G eBPF JIT Christoph Hellwig 2019-01-15 16:03 ` Björn Töpel 2019-01-25 19:02 ` Palmer Dabbelt 2019-01-25 19:54 ` Paul Walmsley 2019-01-27 12:28 ` Björn Töpel 2019-01-30 2:02 ` Palmer Dabbelt