[PATH bpf-next 00/13] bpf: propose new jmp32 instructions

* [PATH bpf-next 00/13] bpf: propose new jmp32 instructions
@ 2018-12-19 22:44 Jiong Wang
  2018-12-19 22:44 ` [PATH bpf-next 01/13] bpf: encoding description and macros for JMP32 Jiong Wang
                   ` (13 more replies)
  0 siblings, 14 replies; 18+ messages in thread
From: Jiong Wang @ 2018-12-19 22:44 UTC (permalink / raw)
  To: ast, daniel
  Cc: netdev, oss-drivers, Jiong Wang, David S . Miller, Paul Burton,
	Wang YanQing, Zi Shen Lim, Shubham Bansal, Naveen N . Rao,
	Sandipan Das, Martin Schwidefsky, Heiko Carstens

Current eBPF ISA has 32-bit sub-register and has defined a set of ALU32
instructions.

However, there is no JMP32 instructions, the consequence is code-gen for
32-bit sub-registers is not efficient. For example, explicit sign-extension
from 32-bit to 64-bit is needed for signed comparison.

Adding JMP32 instruction therefore could complete eBPF ISA on 32-bit
sub-register support. This also match those JMP32 instructions in most JIT
backends, for example x64-64 and AArch64. These new eBPF JMP32 instructions
could have one-to-one map on them.

A few verifier ALU32 related bugs has been fixed recently, and JMP32
introduced by this set further improves BPF sub-register ecosystem. Once
this is landed, BPF programs using 32-bit sub-register ISA could get
reasonably good support from verifier and JIT compilers. Users then could
compare the runtime efficiency of one BPF program under both modes, and
could use the one benchmarked as better. One good thing is JMP32 is making 
32-bit JIT more efficient, because it only has 32-bit use, no def, so
unlike ALU32, no need to clear high bits. Hence, even without data-flow
analysis, JMP32 is making better code-gen then JMP64. More benchmark
results are listed below in this cover letter.

 - Encoding

   Ideally, JMP32 could use new CLASS BPF_JMP32, just like BPF_ALU and
   BPF_ALU32. But we only has one class number 0x06 unused. I am not sure
   if we want to keep it for other extension purpose. For example restore
   it as BPF_MISC which could then redefine the interpretation of all the 
   remaining bits in bis[7:1];

   So, I am following the coding style used by BPF_PSEUDO_CALL, that is to
   use reserved bits under BPF_JMP. When BPF_SRC(code) == BPF_X, the
   encoding is 0x1 at insn->imm. When BPF_SRC(code) == BPF_K, the encoding
   is 0x1 at insn->src_reg. All other bits in imm and src_reg are still
   reserved and should be zeroed.

 - Testing

   A couple of unit tests has been added and included in this set. Also
   LLVM code-gen for JMP32 has been added, so you could just compile any
   BPF C program with both -mcpu=probe and -mattr=+alu32 specified if
   you are compiling on a machine with kernel patched by this set, LLVM
   will select the ISA automatically based on host probe results.
   Otherwise specify -mcpu=v3 and -mattr=+alu32 to force use JMP32 ISA
   and enable sub-register code-gen.

   LLVM support could be found at:

     https://github.com/Netronome/llvm/commit/607f088b92ebfb09f026a84a9443a59237cf6628

   (will send out merge request once kernel set reached consensus.
    Hopefully could get into LLVM 8.0 which will be branched at
    16-Jan-2019)

   I have compiled BPF selftest with JMP32 enabled. The methodology is
   BPF selftest Makefile has introduced a new variable "BPF_SELFTEST_32BIT"
   which allows BPF C programs contained inside the testsuite compiled
   using sub-register mode for which ALU32 and JMP32 instructions will be
   generated once the kernel installed on the compilation machine support
   them. From my tests, no regression on this sub-register test mode except
   when loading bpf_flow.o which somehow verifier doesn't reason the pkt
   range accurately. test_progs which contains quite a few BPF C tests
   passed cleanly.

   Using an env variable to control test mode seems bring smallest change to
   the Makefile, and would require "make check" with BPF_SELFTEST_32BIT
   defined in your test driver script for this new test mode.

   Would appreicate if any better idea on how to enable extra test mode for
   BPF selftests.

 - JIT backends support

   A couple of JIT backends has been supported in this set except SPARC
   and MIPS which I need maintainer's help on implementing them.
   @David, @Paul, would appreciate if you could help on this.

   Also those implemented in this set needs port maintainer's review and
   tests. I have only tested x86_64 and NFP.

 - Benchmarking

   Below are some benchmark results from Cilium BPF programs. After JMP32
   enabled, we could see consistently code size reduction and processed
   instruction numbers are reduced in general as well.

   Text size in bytes (generated by "size")
   ===
   LLVM code-gen option     default   alu32   alu32 + jmp32  change
                                                             (Vs. alu32)
   bpf_lb-DLB_L3.o:         6456      6280    6160           -1.91%
   bpf_lb-DLB_L4.o:         7848      7664    7136           -6.89%
   bpf_lb-DUNKNOWN.o:       2680      2664    2568           -3.60%
   bpf_lxc.o:               104824    104744  97360          -7.05%
   bpf_netdev.o:            23456     23576   21632          -8.25%
   bpf_overlay.o:           16184     16304   14648          -10.16%

   Processed insn number
   ===
   LLVM code-gen option     default   alu32   alu32 + jmp32  change

   bpf_lb-DLB_L3.o:         1579      1281    1304           +1.79%
   bpf_lb-DLB_L4.o:         2045      1663    1554           -6.55%
   bpf_lb-DUNKNOWN.o:       606       513     505            -1.56%
   bpf_lxc.o:               85381     103218  102666         -0.53%
   bpf_netdev.o:            5246      5809    5376           -7.45%
   bpf_overlay.o:           2443      2705    2460           -9.05%

   JITed insn num (on NFP, other 32-bit arches could be similar)
   ===
   LLVM code-gen option     default   alu32   alu32 + jmp32  change
                                                             (Vs. alu32)
   one ~300 line C program  632       612     597            -2.45%
   (NFP contains some fixed sequence, so the real improvements is higher)

Thanks.

Cc: David S. Miller <davem@davemloft.net>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Wang YanQing <udknight@gmail.com>
Cc: Zi Shen Lim <zlim.lnx@gmail.com>
Cc: Shubham Bansal <illusionist.neo@gmail.com>
Cc: Naveen N. Rao <naveen.n.rao@linux.ibm.com>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>

Jiong Wang (13):
  bpf: encoding description and macros for JMP32
  bpf: interpreter support for JMP32
  bpf: JIT blinds support JMP32
  x86_64: bpf: implement jitting of JMP32
  x32: bpf: implement jitting of JMP32
  arm64: bpf: implement jitting of JMP32
  arm: bpf: implement jitting of JMP32
  ppc: bpf: implement jitting of JMP32
  s390: bpf: implement jitting of JMP32
  nfp: bpf: implement jitting of JMP32
  bpf: verifier support JMP32
  bpf: unit tests for JMP32
  selftests: bpf: makefile support sub-register code-gen test mode

 Documentation/networking/filter.txt          |  10 +
 arch/arm/net/bpf_jit_32.c                    |  23 +-
 arch/arm64/net/bpf_jit_comp.c                |  10 +-
 arch/powerpc/net/bpf_jit_comp64.c            |  50 ++++-
 arch/s390/net/bpf_jit_comp.c                 |  12 +-
 arch/x86/net/bpf_jit_comp.c                  |  13 +-
 arch/x86/net/bpf_jit_comp32.c                |  46 ++--
 drivers/net/ethernet/netronome/nfp/bpf/jit.c |  69 ++++--
 include/linux/filter.h                       |  19 ++
 include/uapi/linux/bpf.h                     |   4 +
 kernel/bpf/core.c                            |  60 +++--
 kernel/bpf/verifier.c                        | 178 +++++++++++----
 lib/test_bpf.c                               | 321 ++++++++++++++++++++++++++-
 tools/include/uapi/linux/bpf.h               |   4 +
 tools/testing/selftests/bpf/Makefile         |   4 +
 15 files changed, 696 insertions(+), 127 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 18+ messages in thread