[PATCH bpf-next v2 0/6] nfp: bpf: add multiplication and divide support on NFP JIT

* [PATCH bpf-next v2 0/6] nfp: bpf: add multiplication and divide support on NFP JIT
@ 2018-07-06 22:13 Jakub Kicinski
  2018-07-06 22:13 ` [PATCH bpf-next v2 1/6] lib: reciprocal_div: implement the improved algorithm on the paper mentioned Jakub Kicinski
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: Jakub Kicinski @ 2018-07-06 22:13 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: Song Liu, oss-drivers, netdev, Jakub Kicinski

Jiong says:

NFP supports u16 and u32 multiplication. Multiplication is done 8-bits per
step, therefore we need 2 steps for u16 and 4 steps for u32.

We also need one start instruction to initialize the sequence and one or
two instructions to fetch the result depending on either you need the high
halve of u32 multiplication.

For ALU64, if either operand is beyond u32's value range, we reject it. One
thing to note, if the source operand is BPF_K, then we need to check "imm"
field directly, and we'd reject it if it is negative.  Because for ALU64,
"imm" (with s32 type) is expected to be sign extended to s64 which NFP mul
doesn't support. For ALU32, it is fine for "imm" be negative though,
because the result is 32-bits and here is no difference on the low halve
of result for signed/unsigned mul, so we will get correct result.

NFP doesn't have integer divide instruction, this patch set uses reciprocal
algorithm (the basic one, reciprocal_div) to emulate it.

For each u32 divide, we would need 11 instructions to finish the operation.

   7 (for multiplication) + 4 (various ALUs) = 11

Given NFP only supports multiplication no bigger than u32, we'd require
divisor and dividend no bigger than that as well.

Also eBPF doesn't support signed divide and has enforced this on C language
level by failing compilation. However LLVM assembler hasn't enforced this,
so it is possible for negative constant to leak in as a BPF_K operand
through assembly code, we reject such cases as well.

Meanwhile reciprocal_div.h only implemented the basic version of:

   "Division by Invariant Integers Using Multiplication"
                          - Torbjörn Granlund and Peter L. Montgomery

This patch set further implements the optimized version (Figure 4.2 in the
paper) inside existing reciprocal_div.h. When the divider is even and the
calculated reciprocal magic number doesn't fit u32, we could reduce the
required ALU instructions from 4 to 2 or 1 for some cases.

The advanced version requires more complex calculation to get the
reciprocal multiplier and other control variables, but then could reduce
the required emulation operations. It makes sense to use it for JIT divide
code generation (for example eBPF JIT backends) for which we are willing to
trade performance of JITed code with that of host.

v2:
  - add warning in l == 32 code path. (Song Liu/Jakub)
  - jit separate insn sequence for l == 32. (Jakub/Edwin)
  - should use unrestricted operand for mul.
  - once place should use "1U << exp" instead of "1 << exp".

Jiong Wang (6):
  lib: reciprocal_div: implement the improved algorithm on the paper
    mentioned
  nfp: bpf: rename umin/umax to umin_src/umax_src
  nfp: bpf: copy range info for all operands of all ALU operations
  nfp: bpf: support u16 and u32 multiplications
  nfp: bpf: support u32 divide using reciprocal_div.h
  nfp: bpf: migrate to advanced reciprocal divide in reciprocal_div.h

 drivers/net/ethernet/netronome/nfp/bpf/jit.c  | 249 +++++++++++++++++-
 drivers/net/ethernet/netronome/nfp/bpf/main.h |  43 +--
 .../net/ethernet/netronome/nfp/bpf/offload.c  |   6 +-
 .../net/ethernet/netronome/nfp/bpf/verifier.c |  85 +++++-
 drivers/net/ethernet/netronome/nfp/nfp_asm.h  |  28 ++
 include/linux/reciprocal_div.h                |  68 +++++
 lib/reciprocal_div.c                          |  41 +++
 7 files changed, 483 insertions(+), 37 deletions(-)

-- 
2.17.1

^ permalink raw reply	[flat|nested] 8+ messages in thread