bpf.vger.kernel.org archive mirror
* [PATCH v6 bpf-next 00/17] bpf: eliminate zero extensions for sub-register writes
@ 2019-05-03 10:42 Jiong Wang
  2019-05-03 10:42 ` [PATCH v6 bpf-next 01/17] bpf: verifier: offer more accurate helper function arg and return type Jiong Wang
                   ` (16 more replies)
  0 siblings, 17 replies; 40+ messages in thread
From: Jiong Wang @ 2019-05-03 10:42 UTC (permalink / raw)
  To: alexei.starovoitov, daniel
  Cc: bpf, netdev, oss-drivers, Jiong Wang, David S . Miller,
	Paul Burton, Wang YanQing, Zi Shen Lim, Shubham Bansal,
	Naveen N . Rao, Sandipan Das, Martin Schwidefsky, Heiko Carstens,
	Jakub Kicinski

v6:
  - Fixed s390 kbuild test robot error. (kbuild)
  - Make comment style in backends patches more consistent.

v5:
  - Adjusted several test_verifier helpers to make them work on hosts
    with and without hardware zext. (Naveen)
  - Make sure the zext flag is not set when the verifier is bypassed, for
    example lib/test_bpf.ko. (Naveen)
  - Conservatively mark bpf main return value as 64-bit. (Alexei)
  - Make sure read flag is either READ64 or READ32, not the mix of both.
    (Alexei)
  - Merged patch 1 and 2 in v4. (Alexei)
  - Fixed kbuild test robot warning on NFP. (kbuild)
  - Proposed new BPF_ZEXT insn to have optimal code-gen for various JIT
    back-ends.
  - Conservatively set zext flags for patched insns.
  - Fixed return value zext for helper function calls.
  - Also adjusted the test_verifier scalability unit test to avoid
    triggering too many insn patches, which would hang the machine.
  - re-tested on x86 host with llvm 9.0, no regression on test_verifier,
    test_progs, test_progs_32.
  - re-tested offload target (nfp), no regression on local testsuite.

v4:
  - added the two missing fixes which address two of Jakub's review
    comments on v3.
  - rebase on top of bpf-next.

v3:
  - remove redundant check in "propagate_liveness_reg". (Jakub)
  - add extra check in "mark_reg_read" to prune more search. (Jakub)
  - re-implemented "prog_flags" passing mechanism, removed use of
    global switch inside libbpf.
  - enabled high 32-bit randomization beyond "test_verifier" and
    "test_progs". It is now enabled for all possible tests. Re-ran all
    tests, no regression noticed.
  - remove RFC tag.

v2:
  - rebased on top of bpf-next master.
  - added comments explaining what the sub-register def index is.
    (Edward, Alexei)
  - removed patch 1 which turns bit mask from enum to macro. (Alexei)
  - removed sysctl/bpf_jit_32bit_opt. (Alexei)
  - merged sub-register def insn index into reg state. (Alexei)
  - changed test methodology (Alexei):
      + instead of simple unit tests on x86_64, for which this optimization
        is not enabled due to hardware support, poison the high 32-bit of
        any def identified as safe to do so. This lets the correctness of
        this patch set be checked by the daily bpf selftest runs, which
        deliver very stressful testing on host machines like x86_64.
      + hi32 poisoning is gated by a new BPF_F_TEST_RND_HI32 prog flag.
      + BPF_F_TEST_RND_HI32 is enabled for all tests of "test_progs" and
        "test_verifier"; the latter needs a minor tweak to two unit tests,
        please see the patch for the change.
      + introduced a new global variable "libbpf_test_mode" into libbpf.
        Once it is set to true, BPF_F_TEST_RND_HI32 is set for all later
        PROG_LOAD syscalls; the goal is to ease enabling hi32 poisoning on
        the existing testsuite.
        We could also introduce new APIs, for example "bpf_prog_test_load",
        then use -Dbpf_prog_load=bpf_prog_test_load to migrate tests under
        test_progs, but there are several load APIs, and such a new API
        needs some changes to structures like "struct bpf_prog_load_attr".
      + removed old unit tests. They were based on insn scanning and
        required quite a few generic test_verifier code changes. Given hi32
        randomization offers good test coverage, the unit tests don't add
        much extra test value.
  - enhanced the register width check ("is_reg64") when recording
    sub-register writes; it now returns a more accurate width.
  - Re-ran all tests under "test_progs" and "test_verifier" on x86_64, no
    regression. Fixed a couple of bugs exposed:
      1. ctx field size transformation was not taken into account.
      2. insn patching could cause loss of the original aux data, which is
         important for ctx field conversion.
      3. the return value of propagate_liveness was wrong and caused a
         regression in the processed insn number.
      4. helper call args weren't handled properly, so path pruning could
         lose 64-bit read info from the pruned path.
  - Re-ran the Cilium bpf progs for processed-insn-number benchmarking, no
    regression.

v1:
  - Fixed the missing handling of callee-saved registers for bpf-to-bpf
    calls; sub-register defs therefore moved to frame state. (Jakub Kicinski)
  - Removed redundant "cross_reg". (Jakub Kicinski)
  - Various coding styles & grammar fixes. (Jakub Kicinski, Quentin Monnet)

The eBPF ISA specification requires the high 32-bit to be cleared whenever
the low 32-bit sub-register is written. This applies to the destination
register of ALU32, LD_H/B/W etc. JIT back-ends must guarantee this semantic
when doing code-gen.

The x86-64 and arm64 ISAs have the same semantic, so the corresponding JIT
back-ends don't need to do extra work. However, 32-bit arches (arm, nfp,
etc.) and some other 64-bit arches (powerpc, sparc, etc.) need an explicit
zero extension sequence to meet this semantic.

This is important, because for C code like the following:

  u64_value = (u64) u32_value
  ... other uses of u64_value

the compiler can exploit the semantic described above and omit the zero
extension when extending u32_value to u64_value; hardware, the runtime, or
the BPF JIT back-end is then responsible for guaranteeing it. Some
benchmarks show ~40% of total insns are sub-register writes, meaning ~40%
extra code-gen, and even more for arches requiring two shifts for zero
extension. All this is because the JIT back-end has to do extra code-gen
for every such instruction, always.

However, this is not always necessary: the u32 value may never be cast into
a u64, which is quite common in real-life programs. So it would be really
good if we could identify the places where such a cast happens and only do
zero extension for them, not for the others. This could save a lot of BPF
code-gen.
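
To make the cost concrete, here is a rough sketch in BPF pseudo-assembly
(illustrative only, not taken from any particular JIT) of what a back-end
without hardware zero extension effectively has to emit for a single ALU32
add today:

  w0 += w2        /* 32-bit add, defines only the low 32-bit of r0  */
  r0 <<= 32       /* zero extension the JIT must guarantee ...      */
  r0 >>= 32       /* ... whether or not the high bits are ever read */

With this patch set, that extra sequence is only kept when the verifier
proves the high 32-bit of the destination is actually read later.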

Algo
====
We could use insn-scan based static analysis to tell whether a sub-register
def doesn't need zero extension. However, with such static analysis we must
make conservative assumptions at branching points where multiple uses could
be introduced. So, any sub-register def that is active at a branching point
needs to be marked as needing zero extension. This could introduce quite a
few false alarms, for example ~25% on Cilium bpf_lxc.

It is far better to use the dynamic data-flow tracking which the verifier
fortunately already has, and which can easily be extended to serve the
purpose of this patch set.

 - Record the indices of instructions that do a sub-register def (write).
   These indices need to stay with the function state so path pruning and
   bpf-to-bpf function calls can be handled properly.

   These indices are kept up to date while doing insn walk.

 - A full register read on an active sub-register def marks the def insn as
   needing zero extension on dst register.

 - A new sub-register write overrides the old one.

   A new full register write makes the register free of zero extension on
   dst register.

 - When propagating read64 during path pruning, also mark the def insns of
   any still-active sub-register defs, if a read64 is seen from the
   equivalent (pruned-against) state.

 The core patch in this set is patch 4.
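
As a minimal worked example of the tracking (insn indices and registers are
made up for illustration):

  0: w1 = w2    /* sub-register def, remember subreg_def(r1) = insn 0  */
  1: r3 = r1    /* full 64-bit read of r1, mark insn 0 as needing zext */
  2: w1 = 5     /* new sub-register def overrides, subreg_def(r1) = 2  */
  3: w4 = w1    /* only a 32-bit read, insn 2 stays unmarked           */
  4: exit       /* no 64-bit read of r1 after insn 2, no zext needed   */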

Benchmark
=========
 - I estimate the JITed image could be ~25% smaller on average on all the
   affected arches (nfp, arm, x32, riscv, ppc, sparc, s390).

 - The implementation is based on the existing register read liveness
   tracking infrastructure, so it is dynamic tracking and traces all
   possible code paths; therefore, there shouldn't be any false alarms.

   For Cilium bpf_lxc, there are ~11500 insns in the compiled binary (using
   the latest LLVM snapshot, with -mcpu=v3 -mattr=+alu32 enabled), 4460 of
   which have sub-register writes (~40%). Calculated by:

    cat dump | grep -P "\tw" | wc -l       (ALU32)
    cat dump | grep -P "r.*=.*u32" | wc -l (READ_W)
    cat dump | grep -P "r.*=.*u16" | wc -l (READ_H)
    cat dump | grep -P "r.*=.*u8" | wc -l  (READ_B)

   With this patch set enabled, at most 647 out of those 4460 are
   identified as really needing zero extension on the destination, so it is
   safe for JIT back-ends to eliminate zero extension for all the other
   instructions, which are ~85% of all those sub-register write insns, or
   33% of total insns. It is a significant saving.

   For the insns marked as needing zero extension, some of them are setting
   up u64 parameters for helper calls; the remaining ones are those whose
   sub-register defs really have 64-bit reads.

Cc: David S. Miller <davem@davemloft.net>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Wang YanQing <udknight@gmail.com>
Cc: Zi Shen Lim <zlim.lnx@gmail.com>
Cc: Shubham Bansal <illusionist.neo@gmail.com>
Cc: Naveen N. Rao <naveen.n.rao@linux.ibm.com>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>

Jiong Wang (17):
  bpf: verifier: offer more accurate helper function arg and return type
  bpf: verifier: mark verified-insn with sub-register zext flag
  bpf: verifier: mark patched-insn with sub-register zext flag
  bpf: introduce new alu insn BPF_ZEXT for explicit zero extension
  bpf: verifier: insert BPF_ZEXT according to zext analysis result
  bpf: introduce new bpf prog load flags "BPF_F_TEST_RND_HI32"
  bpf: verifier: randomize high 32-bit when BPF_F_TEST_RND_HI32 is set
  libbpf: add "prog_flags" to
    bpf_program/bpf_prog_load_attr/bpf_load_program_attr
  selftests: bpf: adjust several test_verifier helpers for insn
    insertion
  selftests: bpf: enable hi32 randomization for all tests
  arm: bpf: eliminate zero extension code-gen
  powerpc: bpf: eliminate zero extension code-gen
  s390: bpf: eliminate zero extension code-gen
  sparc: bpf: eliminate zero extension code-gen
  x32: bpf: eliminate zero extension code-gen
  riscv: bpf: eliminate zero extension code-gen
  nfp: bpf: eliminate zero extension code-gen

 Documentation/networking/filter.txt                |  10 +
 arch/arm/net/bpf_jit_32.c                          |  35 +-
 arch/powerpc/net/bpf_jit_comp64.c                  |  13 +-
 arch/riscv/net/bpf_jit_comp.c                      |  36 ++-
 arch/s390/net/bpf_jit_comp.c                       |  20 +-
 arch/sparc/net/bpf_jit_comp_64.c                   |  12 +-
 arch/x86/net/bpf_jit_comp32.c                      |  39 ++-
 drivers/net/ethernet/netronome/nfp/bpf/jit.c       | 115 ++++---
 drivers/net/ethernet/netronome/nfp/bpf/main.h      |   2 +
 drivers/net/ethernet/netronome/nfp/bpf/verifier.c  |  12 +
 include/linux/bpf.h                                |   7 +-
 include/linux/bpf_verifier.h                       |  14 +-
 include/linux/filter.h                             |   1 +
 include/uapi/linux/bpf.h                           |  21 ++
 kernel/bpf/core.c                                  |  14 +-
 kernel/bpf/helpers.c                               |  10 +-
 kernel/bpf/syscall.c                               |   4 +-
 kernel/bpf/verifier.c                              | 352 +++++++++++++++++++--
 kernel/trace/bpf_trace.c                           |   4 +-
 net/core/filter.c                                  |  38 +--
 tools/include/uapi/linux/bpf.h                     |  21 ++
 tools/lib/bpf/bpf.c                                |   1 +
 tools/lib/bpf/bpf.h                                |   1 +
 tools/lib/bpf/libbpf.c                             |   3 +
 tools/lib/bpf/libbpf.h                             |   1 +
 tools/testing/selftests/bpf/Makefile               |  10 +-
 .../selftests/bpf/prog_tests/bpf_verif_scale.c     |   1 +
 tools/testing/selftests/bpf/test_sock_addr.c       |   1 +
 tools/testing/selftests/bpf/test_sock_fields.c     |   1 +
 tools/testing/selftests/bpf/test_socket_cookie.c   |   1 +
 tools/testing/selftests/bpf/test_stub.c            |  40 +++
 tools/testing/selftests/bpf/test_verifier.c        |  31 +-
 32 files changed, 716 insertions(+), 155 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/test_stub.c

-- 
2.7.4


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v6 bpf-next 01/17] bpf: verifier: offer more accurate helper function arg and return type
  2019-05-03 10:42 [PATCH v6 bpf-next 00/17] bpf: eliminate zero extensions for sub-register writes Jiong Wang
@ 2019-05-03 10:42 ` Jiong Wang
  2019-05-06 13:57   ` Daniel Borkmann
  2019-05-06 15:50   ` Alexei Starovoitov
  2019-05-03 10:42 ` [PATCH v6 bpf-next 02/17] bpf: verifier: mark verified-insn with sub-register zext flag Jiong Wang
                   ` (15 subsequent siblings)
  16 siblings, 2 replies; 40+ messages in thread
From: Jiong Wang @ 2019-05-03 10:42 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: bpf, netdev, oss-drivers, Jiong Wang

A BPF helper call transfers execution from eBPF insns to a native function,
while the verifier insn walker only walks eBPF insns. So, the verifier can
only know argument and return value types from explicit helper function
prototype descriptions.

For the 32-bit optimization, it is important to know whether an argument
(register use from an eBPF insn) and the return value (register def from
the external function) are 32-bit or 64-bit, so the corresponding registers
can be zero-extended correctly.

Arguments are register uses; we conservatively treat all of them as 64-bit
by default, while the following new bpf_arg_types are added so we can start
to mark frequently used helper functions with more accurate argument types.

  ARG_CONST_SIZE32
  ARG_CONST_SIZE32_OR_ZERO
  ARG_ANYTHING32

A few helper functions that show up frequently inside Cilium bpf programs
are updated to use these new types.

Return values are register defs; we need to know the accurate width for
correct zero extension. Given most helper functions that return integers
return a 32-bit value, RET_INTEGER keeps meaning 32-bit, and a new
RET_INTEGER64 is added for those functions that return a 64-bit value. All
related helper functions are updated.
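
For example, after this patch the bpf_get_current_comm() proto (see the
kernel/bpf/helpers.c hunk below) reads:

  const struct bpf_func_proto bpf_get_current_comm_proto = {
          .func           = bpf_get_current_comm,
          .gpl_only       = false,
          .ret_type       = RET_INTEGER,           /* 32-bit return value */
          .arg1_type      = ARG_PTR_TO_UNINIT_MEM,
          .arg2_type      = ARG_CONST_SIZE32,      /* size fits in 32-bit */
  };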

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 include/linux/bpf.h      |  6 +++++-
 kernel/bpf/core.c        |  2 +-
 kernel/bpf/helpers.c     | 10 +++++-----
 kernel/bpf/verifier.c    | 15 ++++++++++-----
 kernel/trace/bpf_trace.c |  4 ++--
 net/core/filter.c        | 38 +++++++++++++++++++-------------------
 6 files changed, 42 insertions(+), 33 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 9a21848..11a5fb9 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -198,9 +198,12 @@ enum bpf_arg_type {
 
 	ARG_CONST_SIZE,		/* number of bytes accessed from memory */
 	ARG_CONST_SIZE_OR_ZERO,	/* number of bytes accessed from memory or 0 */
+	ARG_CONST_SIZE32,	/* Likewise, but size fits into 32-bit */
+	ARG_CONST_SIZE32_OR_ZERO,	/* Ditto */
 
 	ARG_PTR_TO_CTX,		/* pointer to context */
 	ARG_ANYTHING,		/* any (initialized) argument is ok */
+	ARG_ANYTHING32,		/* Likewise, but it is a 32-bit argument */
 	ARG_PTR_TO_SPIN_LOCK,	/* pointer to bpf_spin_lock */
 	ARG_PTR_TO_SOCK_COMMON,	/* pointer to sock_common */
 	ARG_PTR_TO_INT,		/* pointer to int */
@@ -210,7 +213,8 @@ enum bpf_arg_type {
 
 /* type of values returned from helper functions */
 enum bpf_return_type {
-	RET_INTEGER,			/* function returns integer */
+	RET_INTEGER,			/* function returns 32-bit integer */
+	RET_INTEGER64,			/* function returns 64-bit integer */
 	RET_VOID,			/* function doesn't return anything */
 	RET_PTR_TO_MAP_VALUE,		/* returns a pointer to map elem value */
 	RET_PTR_TO_MAP_VALUE_OR_NULL,	/* returns a pointer to map elem value or NULL */
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index ace8c22..2792eda 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2067,7 +2067,7 @@ const struct bpf_func_proto bpf_tail_call_proto = {
 	.ret_type	= RET_VOID,
 	.arg1_type	= ARG_PTR_TO_CTX,
 	.arg2_type	= ARG_CONST_MAP_PTR,
-	.arg3_type	= ARG_ANYTHING,
+	.arg3_type	= ARG_ANYTHING32,
 };
 
 /* Stub for JITs that only support cBPF. eBPF programs are interpreted.
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 4266ffd..60f6e31 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -157,7 +157,7 @@ BPF_CALL_0(bpf_ktime_get_ns)
 const struct bpf_func_proto bpf_ktime_get_ns_proto = {
 	.func		= bpf_ktime_get_ns,
 	.gpl_only	= true,
-	.ret_type	= RET_INTEGER,
+	.ret_type	= RET_INTEGER64,
 };
 
 BPF_CALL_0(bpf_get_current_pid_tgid)
@@ -173,7 +173,7 @@ BPF_CALL_0(bpf_get_current_pid_tgid)
 const struct bpf_func_proto bpf_get_current_pid_tgid_proto = {
 	.func		= bpf_get_current_pid_tgid,
 	.gpl_only	= false,
-	.ret_type	= RET_INTEGER,
+	.ret_type	= RET_INTEGER64,
 };
 
 BPF_CALL_0(bpf_get_current_uid_gid)
@@ -193,7 +193,7 @@ BPF_CALL_0(bpf_get_current_uid_gid)
 const struct bpf_func_proto bpf_get_current_uid_gid_proto = {
 	.func		= bpf_get_current_uid_gid,
 	.gpl_only	= false,
-	.ret_type	= RET_INTEGER,
+	.ret_type	= RET_INTEGER64,
 };
 
 BPF_CALL_2(bpf_get_current_comm, char *, buf, u32, size)
@@ -221,7 +221,7 @@ const struct bpf_func_proto bpf_get_current_comm_proto = {
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
-	.arg2_type	= ARG_CONST_SIZE,
+	.arg2_type	= ARG_CONST_SIZE32,
 };
 
 #if defined(CONFIG_QUEUED_SPINLOCKS) || defined(CONFIG_BPF_ARCH_SPINLOCK)
@@ -331,7 +331,7 @@ BPF_CALL_0(bpf_get_current_cgroup_id)
 const struct bpf_func_proto bpf_get_current_cgroup_id_proto = {
 	.func		= bpf_get_current_cgroup_id,
 	.gpl_only	= false,
-	.ret_type	= RET_INTEGER,
+	.ret_type	= RET_INTEGER64,
 };
 
 #ifdef CONFIG_CGROUP_BPF
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 2717172..07ab563 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -2492,7 +2492,9 @@ static bool arg_type_is_mem_ptr(enum bpf_arg_type type)
 static bool arg_type_is_mem_size(enum bpf_arg_type type)
 {
 	return type == ARG_CONST_SIZE ||
-	       type == ARG_CONST_SIZE_OR_ZERO;
+	       type == ARG_CONST_SIZE_OR_ZERO ||
+	       type == ARG_CONST_SIZE32 ||
+	       type == ARG_CONST_SIZE32_OR_ZERO;
 }
 
 static bool arg_type_is_int_ptr(enum bpf_arg_type type)
@@ -2526,7 +2528,7 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
 	if (err)
 		return err;
 
-	if (arg_type == ARG_ANYTHING) {
+	if (arg_type == ARG_ANYTHING || arg_type == ARG_ANYTHING32) {
 		if (is_pointer_value(env, regno)) {
 			verbose(env, "R%d leaks addr into helper function\n",
 				regno);
@@ -2554,7 +2556,9 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
 			 type != expected_type)
 			goto err_type;
 	} else if (arg_type == ARG_CONST_SIZE ||
-		   arg_type == ARG_CONST_SIZE_OR_ZERO) {
+		   arg_type == ARG_CONST_SIZE_OR_ZERO ||
+		   arg_type == ARG_CONST_SIZE32 ||
+		   arg_type == ARG_CONST_SIZE32_OR_ZERO) {
 		expected_type = SCALAR_VALUE;
 		if (type != expected_type)
 			goto err_type;
@@ -2660,7 +2664,8 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
 					      meta->map_ptr->value_size, false,
 					      meta);
 	} else if (arg_type_is_mem_size(arg_type)) {
-		bool zero_size_allowed = (arg_type == ARG_CONST_SIZE_OR_ZERO);
+		bool zero_size_allowed = (arg_type == ARG_CONST_SIZE_OR_ZERO ||
+					  arg_type == ARG_CONST_SIZE32_OR_ZERO);
 
 		/* remember the mem_size which may be used later
 		 * to refine return values.
@@ -3333,7 +3338,7 @@ static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn
 	}
 
 	/* update return register (already marked as written above) */
-	if (fn->ret_type == RET_INTEGER) {
+	if (fn->ret_type == RET_INTEGER || fn->ret_type == RET_INTEGER64) {
 		/* sets type to SCALAR_VALUE */
 		mark_reg_unknown(env, regs, BPF_REG_0);
 	} else if (fn->ret_type == RET_VOID) {
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 8607aba..f300b68 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -370,7 +370,7 @@ BPF_CALL_2(bpf_perf_event_read, struct bpf_map *, map, u64, flags)
 static const struct bpf_func_proto bpf_perf_event_read_proto = {
 	.func		= bpf_perf_event_read,
 	.gpl_only	= true,
-	.ret_type	= RET_INTEGER,
+	.ret_type	= RET_INTEGER64,
 	.arg1_type	= ARG_CONST_MAP_PTR,
 	.arg2_type	= ARG_ANYTHING,
 };
@@ -503,7 +503,7 @@ BPF_CALL_0(bpf_get_current_task)
 static const struct bpf_func_proto bpf_get_current_task_proto = {
 	.func		= bpf_get_current_task,
 	.gpl_only	= true,
-	.ret_type	= RET_INTEGER,
+	.ret_type	= RET_INTEGER64,
 };
 
 BPF_CALL_2(bpf_current_task_under_cgroup, struct bpf_map *, map, u32, idx)
diff --git a/net/core/filter.c b/net/core/filter.c
index 55bfc94..72f2abe 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1695,9 +1695,9 @@ static const struct bpf_func_proto bpf_skb_store_bytes_proto = {
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
-	.arg2_type	= ARG_ANYTHING,
+	.arg2_type	= ARG_ANYTHING32,
 	.arg3_type	= ARG_PTR_TO_MEM,
-	.arg4_type	= ARG_CONST_SIZE,
+	.arg4_type	= ARG_CONST_SIZE32,
 	.arg5_type	= ARG_ANYTHING,
 };
 
@@ -1760,9 +1760,9 @@ static const struct bpf_func_proto bpf_flow_dissector_load_bytes_proto = {
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
-	.arg2_type	= ARG_ANYTHING,
+	.arg2_type	= ARG_ANYTHING32,
 	.arg3_type	= ARG_PTR_TO_UNINIT_MEM,
-	.arg4_type	= ARG_CONST_SIZE,
+	.arg4_type	= ARG_CONST_SIZE32,
 };
 
 BPF_CALL_5(bpf_skb_load_bytes_relative, const struct sk_buff *, skb,
@@ -1911,7 +1911,7 @@ static const struct bpf_func_proto bpf_l3_csum_replace_proto = {
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
-	.arg2_type	= ARG_ANYTHING,
+	.arg2_type	= ARG_ANYTHING32,
 	.arg3_type	= ARG_ANYTHING,
 	.arg4_type	= ARG_ANYTHING,
 	.arg5_type	= ARG_ANYTHING,
@@ -1964,7 +1964,7 @@ static const struct bpf_func_proto bpf_l4_csum_replace_proto = {
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
-	.arg2_type	= ARG_ANYTHING,
+	.arg2_type	= ARG_ANYTHING32,
 	.arg3_type	= ARG_ANYTHING,
 	.arg4_type	= ARG_ANYTHING,
 	.arg5_type	= ARG_ANYTHING,
@@ -2003,9 +2003,9 @@ static const struct bpf_func_proto bpf_csum_diff_proto = {
 	.pkt_access	= true,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_MEM_OR_NULL,
-	.arg2_type	= ARG_CONST_SIZE_OR_ZERO,
+	.arg2_type	= ARG_CONST_SIZE32_OR_ZERO,
 	.arg3_type	= ARG_PTR_TO_MEM_OR_NULL,
-	.arg4_type	= ARG_CONST_SIZE_OR_ZERO,
+	.arg4_type	= ARG_CONST_SIZE32_OR_ZERO,
 	.arg5_type	= ARG_ANYTHING,
 };
 
@@ -2186,7 +2186,7 @@ static const struct bpf_func_proto bpf_redirect_proto = {
 	.func           = bpf_redirect,
 	.gpl_only       = false,
 	.ret_type       = RET_INTEGER,
-	.arg1_type      = ARG_ANYTHING,
+	.arg1_type      = ARG_ANYTHING32,
 	.arg2_type      = ARG_ANYTHING,
 };
 
@@ -2964,7 +2964,7 @@ static const struct bpf_func_proto bpf_skb_change_proto_proto = {
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
-	.arg2_type	= ARG_ANYTHING,
+	.arg2_type	= ARG_ANYTHING32,
 	.arg3_type	= ARG_ANYTHING,
 };
 
@@ -2984,7 +2984,7 @@ static const struct bpf_func_proto bpf_skb_change_type_proto = {
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
-	.arg2_type	= ARG_ANYTHING,
+	.arg2_type	= ARG_ANYTHING32,
 };
 
 static u32 bpf_skb_net_base_len(const struct sk_buff *skb)
@@ -3287,7 +3287,7 @@ static const struct bpf_func_proto bpf_skb_change_tail_proto = {
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
-	.arg2_type	= ARG_ANYTHING,
+	.arg2_type	= ARG_ANYTHING32,
 	.arg3_type	= ARG_ANYTHING,
 };
 
@@ -3883,7 +3883,7 @@ static const struct bpf_func_proto bpf_skb_get_tunnel_key_proto = {
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
 	.arg2_type	= ARG_PTR_TO_UNINIT_MEM,
-	.arg3_type	= ARG_CONST_SIZE,
+	.arg3_type	= ARG_CONST_SIZE32,
 	.arg4_type	= ARG_ANYTHING,
 };
 
@@ -3992,7 +3992,7 @@ static const struct bpf_func_proto bpf_skb_set_tunnel_key_proto = {
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
 	.arg2_type	= ARG_PTR_TO_MEM,
-	.arg3_type	= ARG_CONST_SIZE,
+	.arg3_type	= ARG_CONST_SIZE32,
 	.arg4_type	= ARG_ANYTHING,
 };
 
@@ -4091,7 +4091,7 @@ BPF_CALL_1(bpf_skb_cgroup_id, const struct sk_buff *, skb)
 static const struct bpf_func_proto bpf_skb_cgroup_id_proto = {
 	.func           = bpf_skb_cgroup_id,
 	.gpl_only       = false,
-	.ret_type       = RET_INTEGER,
+	.ret_type       = RET_INTEGER64,
 	.arg1_type      = ARG_PTR_TO_CTX,
 };
 
@@ -4116,7 +4116,7 @@ BPF_CALL_2(bpf_skb_ancestor_cgroup_id, const struct sk_buff *, skb, int,
 static const struct bpf_func_proto bpf_skb_ancestor_cgroup_id_proto = {
 	.func           = bpf_skb_ancestor_cgroup_id,
 	.gpl_only       = false,
-	.ret_type       = RET_INTEGER,
+	.ret_type       = RET_INTEGER64,
 	.arg1_type      = ARG_PTR_TO_CTX,
 	.arg2_type      = ARG_ANYTHING,
 };
@@ -4162,7 +4162,7 @@ BPF_CALL_1(bpf_get_socket_cookie, struct sk_buff *, skb)
 static const struct bpf_func_proto bpf_get_socket_cookie_proto = {
 	.func           = bpf_get_socket_cookie,
 	.gpl_only       = false,
-	.ret_type       = RET_INTEGER,
+	.ret_type       = RET_INTEGER64,
 	.arg1_type      = ARG_PTR_TO_CTX,
 };
 
@@ -4174,7 +4174,7 @@ BPF_CALL_1(bpf_get_socket_cookie_sock_addr, struct bpf_sock_addr_kern *, ctx)
 static const struct bpf_func_proto bpf_get_socket_cookie_sock_addr_proto = {
 	.func		= bpf_get_socket_cookie_sock_addr,
 	.gpl_only	= false,
-	.ret_type	= RET_INTEGER,
+	.ret_type	= RET_INTEGER64,
 	.arg1_type	= ARG_PTR_TO_CTX,
 };
 
@@ -4186,7 +4186,7 @@ BPF_CALL_1(bpf_get_socket_cookie_sock_ops, struct bpf_sock_ops_kern *, ctx)
 static const struct bpf_func_proto bpf_get_socket_cookie_sock_ops_proto = {
 	.func		= bpf_get_socket_cookie_sock_ops,
 	.gpl_only	= false,
-	.ret_type	= RET_INTEGER,
+	.ret_type	= RET_INTEGER64,
 	.arg1_type	= ARG_PTR_TO_CTX,
 };
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v6 bpf-next 02/17] bpf: verifier: mark verified-insn with sub-register zext flag
  2019-05-03 10:42 [PATCH v6 bpf-next 00/17] bpf: eliminate zero extensions for sub-register writes Jiong Wang
  2019-05-03 10:42 ` [PATCH v6 bpf-next 01/17] bpf: verifier: offer more accurate helper function arg and return type Jiong Wang
@ 2019-05-03 10:42 ` Jiong Wang
  2019-05-06 13:49   ` Daniel Borkmann
  2019-05-03 10:42 ` [PATCH v6 bpf-next 03/17] bpf: verifier: mark patched-insn " Jiong Wang
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 40+ messages in thread
From: Jiong Wang @ 2019-05-03 10:42 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: bpf, netdev, oss-drivers, Jiong Wang

The eBPF ISA specification requires the high 32-bit to be cleared whenever
the low 32-bit sub-register is written. This applies to the destination
register of ALU32 etc. JIT back-ends must guarantee this semantic when
doing code-gen.

The x86-64 and arm64 ISAs have the same semantic, so the corresponding JIT
back-ends don't need to do extra work. However, 32-bit arches (arm, nfp,
etc.) and some other 64-bit arches (powerpc, sparc, etc.) need an explicit
zero extension sequence to meet this semantic.

This is important, because for code like the following:

  u64_value = (u64) u32_value
  ... other uses of u64_value

the compiler can exploit the semantic described above and omit the zero
extension when extending u32_value to u64_value; hardware, the runtime, or
the BPF JIT back-end is then responsible for guaranteeing it. Some
benchmarks show ~40% of total insns are sub-register writes, meaning ~40%
extra code-gen (even more for arches which require two shifts for zero
extension) because the JIT back-end needs to do extra code-gen for all such
instructions.

However, this is not always necessary: u32_value may never be cast into a
u64, which is quite common in real-life programs. So it would be really
good if we could identify the places where such a cast happens and only do
zero extension for them, not for the others. This could save a lot of BPF
code-gen.

Algo:
 - Split read flags into READ32 and READ64.

 - Record the indices of instructions that do a sub-register def (write).
   These indices need to stay with the reg state so path pruning and
   bpf-to-bpf function calls can be handled properly.

   These indices are kept up to date while doing insn walk.

 - A full register read on an active sub-register def marks the def insn as
   needing zero extension on dst register.

 - A new sub-register write overrides the old one.

   A new full register write makes the register free of zero extension on
   dst register.

 - When propagating read64 during path pruning, also mark the def insns of
   any still-active sub-register defs.

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 include/linux/bpf_verifier.h |  14 ++-
 kernel/bpf/verifier.c        | 213 ++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 211 insertions(+), 16 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 1305ccb..6a0b12c 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -36,9 +36,11 @@
  */
 enum bpf_reg_liveness {
 	REG_LIVE_NONE = 0, /* reg hasn't been read or written this branch */
-	REG_LIVE_READ, /* reg was read, so we're sensitive to initial value */
-	REG_LIVE_WRITTEN, /* reg was written first, screening off later reads */
-	REG_LIVE_DONE = 4, /* liveness won't be updating this register anymore */
+	REG_LIVE_READ32 = 0x1, /* reg was read, so we're sensitive to initial value */
+	REG_LIVE_READ64 = 0x2, /* likewise, but full 64-bit content matters */
+	REG_LIVE_READ = REG_LIVE_READ32 | REG_LIVE_READ64,
+	REG_LIVE_WRITTEN = 0x4, /* reg was written first, screening off later reads */
+	REG_LIVE_DONE = 0x8, /* liveness won't be updating this register anymore */
 };
 
 struct bpf_reg_state {
@@ -131,6 +133,11 @@ struct bpf_reg_state {
 	 * pointing to bpf_func_state.
 	 */
 	u32 frameno;
+	/* Tracks subreg definition. The stored value is the insn_idx of the
+	 * writing insn. This is safe because subreg_def is used before any insn
+	 * patching which only happens after main verification finished.
+	 */
+	s32 subreg_def;
 	enum bpf_reg_liveness live;
 };
 
@@ -232,6 +239,7 @@ struct bpf_insn_aux_data {
 	int ctx_field_size; /* the ctx field size for load insn, maybe 0 */
 	int sanitize_stack_off; /* stack slot to be cleared */
 	bool seen; /* this insn was processed by the verifier */
+	bool zext_dst; /* this insn zero extend dst reg */
 	u8 alu_state; /* used in combination with alu_limit */
 	unsigned int orig_idx; /* original instruction index */
 };
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 07ab563..43ea665 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -981,6 +981,7 @@ static void mark_reg_not_init(struct bpf_verifier_env *env,
 	__mark_reg_not_init(regs + regno);
 }
 
+#define DEF_NOT_SUBREG	(-1)
 static void init_reg_state(struct bpf_verifier_env *env,
 			   struct bpf_func_state *state)
 {
@@ -991,6 +992,7 @@ static void init_reg_state(struct bpf_verifier_env *env,
 		mark_reg_not_init(env, regs, i);
 		regs[i].live = REG_LIVE_NONE;
 		regs[i].parent = NULL;
+		regs[i].subreg_def = DEF_NOT_SUBREG;
 	}
 
 	/* frame pointer */
@@ -1136,7 +1138,7 @@ static int check_subprogs(struct bpf_verifier_env *env)
  */
 static int mark_reg_read(struct bpf_verifier_env *env,
 			 const struct bpf_reg_state *state,
-			 struct bpf_reg_state *parent)
+			 struct bpf_reg_state *parent, u8 flag)
 {
 	bool writes = parent == state->parent; /* Observe write marks */
 	int cnt = 0;
@@ -1151,17 +1153,26 @@ static int mark_reg_read(struct bpf_verifier_env *env,
 				parent->var_off.value, parent->off);
 			return -EFAULT;
 		}
-		if (parent->live & REG_LIVE_READ)
+		/* The first condition is more likely to be true than the
+		 * second, checked it first.
+		 */
+		if ((parent->live & REG_LIVE_READ) == flag ||
+		    parent->live & REG_LIVE_READ64)
 			/* The parentage chain never changes and
 			 * this parent was already marked as LIVE_READ.
 			 * There is no need to keep walking the chain again and
 			 * keep re-marking all parents as LIVE_READ.
 			 * This case happens when the same register is read
 			 * multiple times without writes into it in-between.
+			 * Also, if parent has the stronger REG_LIVE_READ64 set,
+			 * then no need to set the weak REG_LIVE_READ32.
 			 */
 			break;
 		/* ... then we depend on parent's value */
-		parent->live |= REG_LIVE_READ;
+		parent->live |= flag;
+		/* REG_LIVE_READ64 overrides REG_LIVE_READ32. */
+		if (flag == REG_LIVE_READ64)
+			parent->live &= ~REG_LIVE_READ32;
 		state = parent;
 		parent = state->parent;
 		writes = true;
@@ -1173,12 +1184,146 @@ static int mark_reg_read(struct bpf_verifier_env *env,
 	return 0;
 }
 
+static bool helper_call_arg64(struct bpf_verifier_env *env, int func_id,
+			      u32 regno)
+{
+	/* get_func_proto must succeed, otherwise it should have been rejected
+	 * early inside check_helper_call.
+	 */
+	const struct bpf_func_proto *fn =
+		env->ops->get_func_proto(func_id, env->prog);
+	enum bpf_arg_type arg_type;
+
+	switch (regno) {
+	case BPF_REG_1:
+		arg_type = fn->arg1_type;
+		break;
+	case BPF_REG_2:
+		arg_type = fn->arg2_type;
+		break;
+	case BPF_REG_3:
+		arg_type = fn->arg3_type;
+		break;
+	case BPF_REG_4:
+		arg_type = fn->arg4_type;
+		break;
+	case BPF_REG_5:
+		arg_type = fn->arg5_type;
+		break;
+	default:
+		arg_type = ARG_DONTCARE;
+	}
+
+	return arg_type != ARG_CONST_SIZE32 &&
+	       arg_type != ARG_CONST_SIZE32_OR_ZERO &&
+	       arg_type != ARG_ANYTHING32;
+}
+
+/* This function is supposed to be used by the following 32-bit optimization
+ * code only. It returns TRUE if the source or destination register operates
+ * on 64-bit, otherwise return FALSE.
+ */
+static bool is_reg64(struct bpf_verifier_env *env, struct bpf_insn *insn,
+		     u32 regno, struct bpf_reg_state *reg, enum reg_arg_type t)
+{
+	u8 code, class, op;
+
+	code = insn->code;
+	class = BPF_CLASS(code);
+	op = BPF_OP(code);
+	if (class == BPF_JMP) {
+		/* BPF_EXIT for "main" will reach here. Return TRUE
+		 * conservatively.
+		 */
+		if (op == BPF_EXIT)
+			return true;
+		if (op == BPF_CALL) {
+			/* BPF to BPF call will reach here because of marking
+			 * caller saved clobber with DST_OP_NO_MARK for which we
+			 * don't care the register def because they are anyway
+			 * marked as NOT_INIT already.
+			 */
+			if (insn->src_reg == BPF_PSEUDO_CALL)
+				return false;
+			/* Helper call will reach here because of arg type
+			 * check.
+			 */
+			if (t == SRC_OP)
+				return helper_call_arg64(env, insn->imm, regno);
+
+			return false;
+		}
+	}
+
+	if (class == BPF_ALU64 || class == BPF_JMP ||
+	    /* BPF_END always use BPF_ALU class. */
+	    (class == BPF_ALU && op == BPF_END && insn->imm == 64))
+		return true;
+
+	if (class == BPF_ALU || class == BPF_JMP32)
+		return false;
+
+	if (class == BPF_LDX) {
+		if (t != SRC_OP)
+			return BPF_SIZE(code) == BPF_DW;
+		/* LDX source must be ptr. */
+		return true;
+	}
+
+	if (class == BPF_STX) {
+		if (reg->type != SCALAR_VALUE)
+			return true;
+		return BPF_SIZE(code) == BPF_DW;
+	}
+
+	if (class == BPF_LD) {
+		u8 mode = BPF_MODE(code);
+
+		/* LD_IMM64 */
+		if (mode == BPF_IMM)
+			return true;
+
+		/* Both LD_IND and LD_ABS return 32-bit data. */
+		if (t != SRC_OP)
+			return  false;
+
+		/* Implicit ctx ptr. */
+		if (regno == BPF_REG_6)
+			return true;
+
+		/* Explicit source could be any width. */
+		return true;
+	}
+
+	if (class == BPF_ST)
+		/* The only source register for BPF_ST is a ptr. */
+		return true;
+
+	/* Conservatively return true at default. */
+	return true;
+}
+
+static void mark_insn_zext(struct bpf_verifier_env *env,
+			   struct bpf_reg_state *reg)
+{
+	s32 def_idx = reg->subreg_def;
+
+	if (def_idx == DEF_NOT_SUBREG)
+		return;
+
+	env->insn_aux_data[def_idx].zext_dst = true;
+	/* The dst will be zero extended, so won't be sub-register anymore. */
+	reg->subreg_def = DEF_NOT_SUBREG;
+}
+
 static int check_reg_arg(struct bpf_verifier_env *env, u32 regno,
 			 enum reg_arg_type t)
 {
 	struct bpf_verifier_state *vstate = env->cur_state;
 	struct bpf_func_state *state = vstate->frame[vstate->curframe];
+	struct bpf_insn *insn = env->prog->insnsi + env->insn_idx;
 	struct bpf_reg_state *reg, *regs = state->regs;
+	bool rw64;
 
 	if (regno >= MAX_BPF_REG) {
 		verbose(env, "R%d is invalid\n", regno);
@@ -1186,6 +1331,7 @@ static int check_reg_arg(struct bpf_verifier_env *env, u32 regno,
 	}
 
 	reg = &regs[regno];
+	rw64 = is_reg64(env, insn, regno, reg, t);
 	if (t == SRC_OP) {
 		/* check whether register used as source operand can be read */
 		if (reg->type == NOT_INIT) {
@@ -1196,7 +1342,11 @@ static int check_reg_arg(struct bpf_verifier_env *env, u32 regno,
 		if (regno == BPF_REG_FP)
 			return 0;
 
-		return mark_reg_read(env, reg, reg->parent);
+		if (rw64)
+			mark_insn_zext(env, reg);
+
+		return mark_reg_read(env, reg, reg->parent,
+				     rw64 ? REG_LIVE_READ64 : REG_LIVE_READ32);
 	} else {
 		/* check whether register used as dest operand can be written to */
 		if (regno == BPF_REG_FP) {
@@ -1204,6 +1354,7 @@ static int check_reg_arg(struct bpf_verifier_env *env, u32 regno,
 			return -EACCES;
 		}
 		reg->live |= REG_LIVE_WRITTEN;
+		reg->subreg_def = rw64 ? DEF_NOT_SUBREG : env->insn_idx;
 		if (t == DST_OP)
 			mark_reg_unknown(env, regs, regno);
 	}
@@ -1383,7 +1534,8 @@ static int check_stack_read(struct bpf_verifier_env *env,
 			state->regs[value_regno].live |= REG_LIVE_WRITTEN;
 		}
 		mark_reg_read(env, &reg_state->stack[spi].spilled_ptr,
-			      reg_state->stack[spi].spilled_ptr.parent);
+			      reg_state->stack[spi].spilled_ptr.parent,
+			      REG_LIVE_READ64);
 		return 0;
 	} else {
 		int zeros = 0;
@@ -1400,7 +1552,9 @@ static int check_stack_read(struct bpf_verifier_env *env,
 			return -EACCES;
 		}
 		mark_reg_read(env, &reg_state->stack[spi].spilled_ptr,
-			      reg_state->stack[spi].spilled_ptr.parent);
+			      reg_state->stack[spi].spilled_ptr.parent,
+			      size == BPF_REG_SIZE
+			      ? REG_LIVE_READ64 : REG_LIVE_READ32);
 		if (value_regno >= 0) {
 			if (zeros == size) {
 				/* any size read into register is zero extended,
@@ -2109,6 +2263,12 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 						    value_regno);
 				if (reg_type_may_be_null(reg_type))
 					regs[value_regno].id = ++env->id_gen;
+				/* A load of ctx field could have different
+				 * actual load size with the one encoded in the
+				 * insn. When the dst is PTR, it is for sure not
+				 * a sub-register.
+				 */
+				regs[value_regno].subreg_def = DEF_NOT_SUBREG;
 			}
 			regs[value_regno].type = reg_type;
 		}
@@ -2368,7 +2528,9 @@ static int check_stack_boundary(struct bpf_verifier_env *env, int regno,
 		 * the whole slot to be marked as 'read'
 		 */
 		mark_reg_read(env, &state->stack[spi].spilled_ptr,
-			      state->stack[spi].spilled_ptr.parent);
+			      state->stack[spi].spilled_ptr.parent,
+			      access_size == BPF_REG_SIZE
+			      ? REG_LIVE_READ64 : REG_LIVE_READ32);
 	}
 	return update_stack_depth(env, state, min_off);
 }
@@ -3337,10 +3499,16 @@ static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn
 		check_reg_arg(env, caller_saved[i], DST_OP_NO_MARK);
 	}
 
+	/* assume helper call has returned 64-bit value. */
+	regs[BPF_REG_0].subreg_def = DEF_NOT_SUBREG;
+
 	/* update return register (already marked as written above) */
 	if (fn->ret_type == RET_INTEGER || fn->ret_type == RET_INTEGER64) {
 		/* sets type to SCALAR_VALUE */
 		mark_reg_unknown(env, regs, BPF_REG_0);
+		/* RET_INTEGER returns sub-register. */
+		if (fn->ret_type == RET_INTEGER)
+			regs[BPF_REG_0].subreg_def = insn_idx;
 	} else if (fn->ret_type == RET_VOID) {
 		regs[BPF_REG_0].type = NOT_INIT;
 	} else if (fn->ret_type == RET_PTR_TO_MAP_VALUE_OR_NULL ||
@@ -4268,6 +4436,7 @@ static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)
 				 */
 				*dst_reg = *src_reg;
 				dst_reg->live |= REG_LIVE_WRITTEN;
+				dst_reg->subreg_def = DEF_NOT_SUBREG;
 			} else {
 				/* R1 = (u32) R2 */
 				if (is_pointer_value(env, insn->src_reg)) {
@@ -4278,6 +4447,7 @@ static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)
 				} else if (src_reg->type == SCALAR_VALUE) {
 					*dst_reg = *src_reg;
 					dst_reg->live |= REG_LIVE_WRITTEN;
+					dst_reg->subreg_def = env->insn_idx;
 				} else {
 					mark_reg_unknown(env, regs,
 							 insn->dst_reg);
@@ -5341,6 +5511,8 @@ static int check_ld_abs(struct bpf_verifier_env *env, struct bpf_insn *insn)
 	 * Already marked as written above.
 	 */
 	mark_reg_unknown(env, regs, BPF_REG_0);
+	/* ld_abs load up to 32-bit skb data. */
+	regs[BPF_REG_0].subreg_def = env->insn_idx;
 	return 0;
 }
 
@@ -6281,20 +6453,33 @@ static bool states_equal(struct bpf_verifier_env *env,
 	return true;
 }
 
+/* Return 0 if no propagation happened. Return negative error code if error
+ * happened. Otherwise, return the propagated bit.
+ */
 static int propagate_liveness_reg(struct bpf_verifier_env *env,
 				  struct bpf_reg_state *reg,
 				  struct bpf_reg_state *parent_reg)
 {
+	u8 parent_flag = parent_reg->live & REG_LIVE_READ;
+	u8 flag = reg->live & REG_LIVE_READ;
 	int err;
 
-	if (parent_reg->live & REG_LIVE_READ || !(reg->live & REG_LIVE_READ))
+	/* When comes here, read flags of PARENT_REG or REG could be any of
+	 * REG_LIVE_READ64, REG_LIVE_READ32, REG_LIVE_NONE. There is no need
+	 * of propagation if PARENT_REG has strongest REG_LIVE_READ64.
+	 */
+	if (parent_flag == REG_LIVE_READ64 ||
+	    /* Or if there is no read flag from REG. */
+	    !flag ||
+	    /* Or if the read flag from REG is the same as PARENT_REG. */
+	    parent_flag == flag)
 		return 0;
 
-	err = mark_reg_read(env, reg, parent_reg);
+	err = mark_reg_read(env, reg, parent_reg, flag);
 	if (err)
 		return err;
 
-	return 0;
+	return flag;
 }
 
 /* A write screens off any subsequent reads; but write marks come from the
@@ -6328,8 +6513,10 @@ static int propagate_liveness(struct bpf_verifier_env *env,
 		for (i = frame < vstate->curframe ? BPF_REG_6 : 0; i < BPF_REG_FP; i++) {
 			err = propagate_liveness_reg(env, &state_reg[i],
 						     &parent_reg[i]);
-			if (err)
+			if (err < 0)
 				return err;
+			if (err == REG_LIVE_READ64)
+				mark_insn_zext(env, &parent_reg[i]);
 		}
 
 		/* Propagate stack slots. */
@@ -6339,11 +6526,11 @@ static int propagate_liveness(struct bpf_verifier_env *env,
 			state_reg = &state->stack[i].spilled_ptr;
 			err = propagate_liveness_reg(env, state_reg,
 						     parent_reg);
-			if (err)
+			if (err < 0)
 				return err;
 		}
 	}
-	return err;
+	return 0;
 }
 
 static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v6 bpf-next 03/17] bpf: verifier: mark patched-insn with sub-register zext flag
  2019-05-03 10:42 [PATCH v6 bpf-next 00/17] bpf: eliminate zero extensions for sub-register writes Jiong Wang
  2019-05-03 10:42 ` [PATCH v6 bpf-next 01/17] bpf: verifier: offer more accurate helper function arg and return type Jiong Wang
  2019-05-03 10:42 ` [PATCH v6 bpf-next 02/17] bpf: verifier: mark verified-insn with sub-register zext flag Jiong Wang
@ 2019-05-03 10:42 ` Jiong Wang
  2019-05-03 10:42 ` [PATCH v6 bpf-next 04/17] bpf: introduce new alu insn BPF_ZEXT for explicit zero extension Jiong Wang
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 40+ messages in thread
From: Jiong Wang @ 2019-05-03 10:42 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: bpf, netdev, oss-drivers, Jiong Wang

Patched insns do not go through generic verification, therefore they don't
have zero extension information collected during insn walking.

We don't bother analyzing them at the moment; for any sub-register def that
comes from them, just conservatively mark it as needing zero extension.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 kernel/bpf/verifier.c | 37 +++++++++++++++++++++++++++++++++----
 1 file changed, 33 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 43ea665..b43e8a2 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1303,6 +1303,24 @@ static bool is_reg64(struct bpf_verifier_env *env, struct bpf_insn *insn,
 	return true;
 }
 
+/* Return TRUE if INSN doesn't have explicit value define. */
+static bool insn_no_def(struct bpf_insn *insn)
+{
+	u8 class = BPF_CLASS(insn->code);
+
+	return (class == BPF_JMP || class == BPF_JMP32 ||
+		class == BPF_STX || class == BPF_ST);
+}
+
+/* Return TRUE if INSN has defined any 32-bit value explicitly. */
+static bool insn_has_def32(struct bpf_verifier_env *env, struct bpf_insn *insn)
+{
+	if (insn_no_def(insn))
+		return false;
+
+	return !is_reg64(env, insn, insn->dst_reg, NULL, DST_OP);
+}
+
 static void mark_insn_zext(struct bpf_verifier_env *env,
 			   struct bpf_reg_state *reg)
 {
@@ -7306,14 +7324,23 @@ static void convert_pseudo_ld_imm64(struct bpf_verifier_env *env)
  * insni[off, off + cnt).  Adjust corresponding insn_aux_data by copying
  * [0, off) and [off, end) to new locations, so the patched range stays zero
  */
-static int adjust_insn_aux_data(struct bpf_verifier_env *env, u32 prog_len,
-				u32 off, u32 cnt)
+static int adjust_insn_aux_data(struct bpf_verifier_env *env,
+				struct bpf_prog *new_prog, u32 off, u32 cnt)
 {
 	struct bpf_insn_aux_data *new_data, *old_data = env->insn_aux_data;
+	struct bpf_insn *insn = new_prog->insnsi;
+	u32 prog_len;
 	int i;
 
+	/* aux info at OFF always needs adjustment, no matter fast path
+	 * (cnt == 1) is taken or not. There is no guarantee INSN at OFF is the
+	 * original insn at old prog.
+	 */
+	old_data[off].zext_dst = insn_has_def32(env, insn + off + cnt - 1);
+
 	if (cnt == 1)
 		return 0;
+	prog_len = new_prog->len;
 	new_data = vzalloc(array_size(prog_len,
 				      sizeof(struct bpf_insn_aux_data)));
 	if (!new_data)
@@ -7321,8 +7348,10 @@ static int adjust_insn_aux_data(struct bpf_verifier_env *env, u32 prog_len,
 	memcpy(new_data, old_data, sizeof(struct bpf_insn_aux_data) * off);
 	memcpy(new_data + off + cnt - 1, old_data + off,
 	       sizeof(struct bpf_insn_aux_data) * (prog_len - off - cnt + 1));
-	for (i = off; i < off + cnt - 1; i++)
+	for (i = off; i < off + cnt - 1; i++) {
 		new_data[i].seen = true;
+		new_data[i].zext_dst = insn_has_def32(env, insn + i);
+	}
 	env->insn_aux_data = new_data;
 	vfree(old_data);
 	return 0;
@@ -7355,7 +7384,7 @@ static struct bpf_prog *bpf_patch_insn_data(struct bpf_verifier_env *env, u32 of
 				env->insn_aux_data[off].orig_idx);
 		return NULL;
 	}
-	if (adjust_insn_aux_data(env, new_prog->len, off, len))
+	if (adjust_insn_aux_data(env, new_prog, off, len))
 		return NULL;
 	adjust_subprog_starts(env, off, len);
 	return new_prog;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v6 bpf-next 04/17] bpf: introduce new alu insn BPF_ZEXT for explicit zero extension
  2019-05-03 10:42 [PATCH v6 bpf-next 00/17] bpf: eliminate zero extensions for sub-register writes Jiong Wang
                   ` (2 preceding siblings ...)
  2019-05-03 10:42 ` [PATCH v6 bpf-next 03/17] bpf: verifier: mark patched-insn " Jiong Wang
@ 2019-05-03 10:42 ` Jiong Wang
  2019-05-06 15:57   ` Alexei Starovoitov
  2019-05-03 10:42 ` [PATCH v6 bpf-next 05/17] bpf: verifier: insert BPF_ZEXT according to zext analysis result Jiong Wang
                   ` (12 subsequent siblings)
  16 siblings, 1 reply; 40+ messages in thread
From: Jiong Wang @ 2019-05-03 10:42 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: bpf, netdev, oss-drivers, Jiong Wang

This patch introduces a new alu32 insn BPF_ZEXT and allocates the unused
opcode 0xe0 to it.

Compared with the other alu32 insns, zero extension of the low 32-bit is
the only semantic of this instruction. It also allows various JIT back-ends
to generate optimal zero extension code.

BPF_ZEXT is supposed to be encoded with BPF_ALU only, and is supposed to be
generated only by the later 32-bit optimization code inside the verifier,
for those arches that do not support implicit hardware zero extension.

It is not supposed to be used in user programs directly at the moment,
therefore there is no need to recognize it inside the generic verification
code. It just needs to be supported for execution by the interpreter and
related JIT back-ends.
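
For reference, a sketch of how the verifier insertion pass (patch 5) builds
the insn and what the interpreter does with it (register choice is
illustrative):

  /* built by the verifier, clears the high 32-bit of dst_reg */
  struct bpf_insn zext = BPF_ALU32_IMM(BPF_ZEXT, BPF_REG_2, 0);
  /* opcode: BPF_ALU | BPF_ZEXT | BPF_K == 0x04 | 0xe0 | 0x00 == 0xe4 */

  /* interpreter semantics, see the core.c hunk below */
  ALU_ZEXT:
          DST = (u32) DST;
          CONT;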

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 Documentation/networking/filter.txt | 10 ++++++++++
 include/uapi/linux/bpf.h            |  3 +++
 kernel/bpf/core.c                   |  4 ++++
 tools/include/uapi/linux/bpf.h      |  3 +++
 4 files changed, 20 insertions(+)

diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
index 319e5e0..1cb3e42 100644
--- a/Documentation/networking/filter.txt
+++ b/Documentation/networking/filter.txt
@@ -903,6 +903,16 @@ If BPF_CLASS(code) == BPF_ALU or BPF_ALU64 [ in eBPF ], BPF_OP(code) is one of:
   BPF_MOV   0xb0  /* eBPF only: mov reg to reg */
   BPF_ARSH  0xc0  /* eBPF only: sign extending shift right */
   BPF_END   0xd0  /* eBPF only: endianness conversion */
+  BPF_ZEXT  0xe0  /* eBPF BPF_ALU only: zero-extends low 32-bit */
+
+Compared with BPF_ALU | BPF_MOV, which zero-extends the low 32-bit implicitly,
+BPF_ALU | BPF_ZEXT zero-extends the low 32-bit explicitly. Such zero extension
+is not the main semantic of the former, but it is of the latter. Therefore, a
+JIT optimizer may optimize out the zero extension for the former when it is
+concluded safe to do so, but must never do so for the latter. LLVM will not
+generate BPF_ZEXT and hand-written assembly is not supposed to use it; the
+verifier's 32-bit optimization pass, which removes zero extension semantics
+from the other BPF_ALU instructions, is the only place that generates it.
 
 If BPF_CLASS(code) == BPF_JMP or BPF_JMP32 [ in eBPF ], BPF_OP(code) is one of:
 
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 72336ba..22ccdf4 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -32,6 +32,9 @@
 #define BPF_FROM_LE	BPF_TO_LE
 #define BPF_FROM_BE	BPF_TO_BE
 
+/* zero extend low 32-bit */
+#define BPF_ZEXT	0xe0
+
 /* jmp encodings */
 #define BPF_JNE		0x50	/* jump != */
 #define BPF_JLT		0xa0	/* LT is unsigned, '<' */
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 2792eda..ee8703d 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1152,6 +1152,7 @@ EXPORT_SYMBOL_GPL(__bpf_call_base);
 	INSN_2(ALU, NEG),			\
 	INSN_3(ALU, END, TO_BE),		\
 	INSN_3(ALU, END, TO_LE),		\
+	INSN_2(ALU, ZEXT),			\
 	/*   Immediate based. */		\
 	INSN_3(ALU, ADD,  K),			\
 	INSN_3(ALU, SUB,  K),			\
@@ -1352,6 +1353,9 @@ static u64 ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn, u64 *stack)
 	ALU64_NEG:
 		DST = -DST;
 		CONT;
+	ALU_ZEXT:
+		DST = (u32) DST;
+		CONT;
 	ALU_MOV_X:
 		DST = (u32) SRC;
 		CONT;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 72336ba..22ccdf4 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -32,6 +32,9 @@
 #define BPF_FROM_LE	BPF_TO_LE
 #define BPF_FROM_BE	BPF_TO_BE
 
+/* zero extend low 32-bit */
+#define BPF_ZEXT	0xe0
+
 /* jmp encodings */
 #define BPF_JNE		0x50	/* jump != */
 #define BPF_JLT		0xa0	/* LT is unsigned, '<' */
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v6 bpf-next 05/17] bpf: verifier: insert BPF_ZEXT according to zext analysis result
  2019-05-03 10:42 [PATCH v6 bpf-next 00/17] bpf: eliminate zero extensions for sub-register writes Jiong Wang
                   ` (3 preceding siblings ...)
  2019-05-03 10:42 ` [PATCH v6 bpf-next 04/17] bpf: introduce new alu insn BPF_ZEXT for explicit zero extension Jiong Wang
@ 2019-05-03 10:42 ` Jiong Wang
  2019-05-03 10:42 ` [PATCH v6 bpf-next 06/17] bpf: introduce new bpf prog load flags "BPF_F_TEST_RND_HI32" Jiong Wang
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 40+ messages in thread
From: Jiong Wang @ 2019-05-03 10:42 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: bpf, netdev, oss-drivers, Jiong Wang

After the previous patches, the verifier has marked those instructions that
really need zero extension on dst_reg.

It is then up to each back-end to decide how to use such information to
eliminate unnecessary zero extension code-gen during JIT compilation.

One approach is:
  1. The verifier inserts explicit zero extension for those instructions
     that need it.
  2. All JIT back-ends stop generating zero extension for sub-register
     writes.

The good thing about this approach is that there is no major change to the
JIT back-end interface, so all back-ends can get this optimization.

However, only those back-ends that do not have hardware zero extension
want this optimization. For back-ends like x86_64 and AArch64, there is
hardware support, so zext insertion should be disabled.

This patch introduces a new target hook "bpf_jit_hardware_zext" which
defaults to true, meaning the underlying hardware does zero extension
implicitly and zext insertion by the verifier is therefore disabled. Once a
back-end overrides this hook to return false, the verifier will insert
BPF_ZEXT to clear the high 32-bit of definitions when necessary.

Offload targets do not use this native target hook; instead, they can get
the optimization results via bpf_prog_offload_ops.finalize.
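
A back-end without hardware zero extension opts in by overriding the weak
hook, roughly as follows (sketch; the real per-arch overrides come in the
later back-end patches):

  bool bpf_jit_hardware_zext(void)
  {
          return false;   /* ask the verifier to insert BPF_ZEXT */
  }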

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 include/linux/bpf.h    |  1 +
 include/linux/filter.h |  1 +
 kernel/bpf/core.c      |  8 ++++++++
 kernel/bpf/verifier.c  | 40 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 50 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 11a5fb9..cf3c3f3 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -373,6 +373,7 @@ struct bpf_prog_aux {
 	u32 id;
 	u32 func_cnt; /* used by non-func prog as the number of func progs */
 	u32 func_idx; /* 0 for non-func prog, the index in func array for func prog */
+	bool verifier_zext; /* Zero extensions has been inserted by verifier. */
 	bool offload_requested;
 	struct bpf_prog **func;
 	void *jit_data; /* JIT specific data. arch dependent */
diff --git a/include/linux/filter.h b/include/linux/filter.h
index fb0edad..8750657 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -821,6 +821,7 @@ u64 __bpf_call_base(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
 
 struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog);
 void bpf_jit_compile(struct bpf_prog *prog);
+bool bpf_jit_hardware_zext(void);
 bool bpf_helper_changes_pkt_data(void *func);
 
 static inline bool bpf_dump_raw_ok(void)
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index ee8703d..9754346 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2095,6 +2095,14 @@ bool __weak bpf_helper_changes_pkt_data(void *func)
 	return false;
 }
 
+/* Return TRUE if the target hardware of the JIT will do zero extension to high
+ * when writing to low 32-bit of one register. Otherwise, return FALSE.
+ */
+bool __weak bpf_jit_hardware_zext(void)
+{
+	return true;
+}
+
 /* To execute LD_ABS/LD_IND instructions __bpf_prog_run() may call
  * skb_copy_bits(), so provide a weak definition of it for NET-less config.
  */
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index b43e8a2..999da02 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -7648,6 +7648,37 @@ static int opt_remove_nops(struct bpf_verifier_env *env)
 	return 0;
 }
 
+static int opt_subreg_zext_lo32(struct bpf_verifier_env *env)
+{
+	struct bpf_insn_aux_data *aux = env->insn_aux_data;
+	struct bpf_insn *insns = env->prog->insnsi;
+	int i, delta = 0, len = env->prog->len;
+	struct bpf_insn zext_patch[2];
+	struct bpf_prog *new_prog;
+
+	zext_patch[1] = BPF_ALU32_IMM(BPF_ZEXT, 0, 0);
+	for (i = 0; i < len; i++) {
+		int adj_idx = i + delta;
+		struct bpf_insn insn;
+
+		if (!aux[adj_idx].zext_dst)
+			continue;
+
+		insn = insns[adj_idx];
+		zext_patch[0] = insn;
+		zext_patch[1].dst_reg = insn.dst_reg;
+		new_prog = bpf_patch_insn_data(env, adj_idx, zext_patch, 2);
+		if (!new_prog)
+			return -ENOMEM;
+		env->prog = new_prog;
+		insns = new_prog->insnsi;
+		aux = env->insn_aux_data;
+		delta += 2;
+	}
+
+	return 0;
+}
+
 /* convert load instructions that access fields of a context type into a
  * sequence of instructions that access fields of the underlying structure:
  *     struct __sk_buff    -> struct sk_buff
@@ -8499,6 +8530,15 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr,
 	if (ret == 0)
 		ret = fixup_bpf_calls(env);
 
+	/* do 32-bit optimization after insn patching is done so that patched
+	 * insns can be handled correctly.
+	 */
+	if (ret == 0 && !bpf_jit_hardware_zext() &&
+	    !bpf_prog_is_dev_bound(env->prog->aux)) {
+		ret = opt_subreg_zext_lo32(env);
+		env->prog->aux->verifier_zext = !ret;
+	}
+
 	if (ret == 0)
 		ret = fixup_call_args(env);
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v6 bpf-next 06/17] bpf: introduce new bpf prog load flags "BPF_F_TEST_RND_HI32"
  2019-05-03 10:42 [PATCH v6 bpf-next 00/17] bpf: eliminate zero extensions for sub-register writes Jiong Wang
                   ` (4 preceding siblings ...)
  2019-05-03 10:42 ` [PATCH v6 bpf-next 05/17] bpf: verifier: insert BPF_ZEXT according to zext analysis result Jiong Wang
@ 2019-05-03 10:42 ` Jiong Wang
  2019-05-03 10:42 ` [PATCH v6 bpf-next 07/17] bpf: verifier: randomize high 32-bit when BPF_F_TEST_RND_HI32 is set Jiong Wang
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 40+ messages in thread
From: Jiong Wang @ 2019-05-03 10:42 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: bpf, netdev, oss-drivers, Jiong Wang

x86_64 and AArch64 are perhaps the two arches on which the bpf testsuite
runs most frequently, yet the zero extension insertion pass is not enabled
for them because of their hardware support.

It is critical to guarantee the correctness of this pass, as it is supposed
to be enabled by default on a couple of other arches, for example PowerPC,
SPARC, arm, NFP etc. Therefore, it would be very useful to have a way to
test this pass on, for example, x86_64.

The test methodology employed by this set is "poisoning" useless bits. The
high 32 bits of a definition are randomized if they are identified as not
used by any later instruction. Such randomization is only enabled in
testing mode, which is gated by the new bpf prog load flag
"BPF_F_TEST_RND_HI32".
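
As a rough sketch of how a loader opts in (the wrapper name and program
type are placeholders, and this assumes the libbpf "prog_flags" plumbing
added later in this series):

	static int load_with_hi32_poison(const struct bpf_insn *insns,
					 size_t insns_cnt, char *log_buf,
					 size_t log_buf_sz)
	{
		struct bpf_load_program_attr attr = {};

		attr.prog_type  = BPF_PROG_TYPE_SOCKET_FILTER;
		attr.insns      = insns;
		attr.insns_cnt  = insns_cnt;
		attr.license    = "GPL";
		attr.prog_flags = BPF_F_TEST_RND_HI32;	/* enable hi32 poisoning */

		return bpf_load_program_xattr(&attr, log_buf, log_buf_sz);
	}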

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 include/uapi/linux/bpf.h       | 18 ++++++++++++++++++
 kernel/bpf/syscall.c           |  4 +++-
 tools/include/uapi/linux/bpf.h | 18 ++++++++++++++++++
 3 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 22ccdf4..1bf32c3 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -263,6 +263,24 @@ enum bpf_attach_type {
  */
 #define BPF_F_ANY_ALIGNMENT	(1U << 1)
 
+/* BPF_F_TEST_RND_HI32 is used in BPF_PROG_LOAD command for testing purposes.
+ * Verifier does sub-register def/use analysis and identifies instructions whose
+ * def only matters for the low 32 bits; the high 32 bits are never referenced
+ * later through implicit zero extension. Therefore verifier notifies JIT
+ * back-ends that it is safe to skip clearing the high 32 bits for these
+ * instructions. This saves some back-ends a lot of code-gen. However, such an
+ * optimization is not necessary on some arches, for example x86_64 and arm64,
+ * whose JIT back-ends therefore do not use the verifier's analysis result. But
+ * we really want a way to verify the correctness of the described optimization
+ * on x86_64, on which testsuites are frequently exercised.
+ *
+ * So, this flag is introduced. Once it is set, verifier will randomize the
+ * high 32 bits for those instructions identified as not needing zero
+ * extension. Then, if verifier is not doing correct analysis, such
+ * randomization will regress tests and expose bugs.
+ */
+#define BPF_F_TEST_RND_HI32	(1U << 2)
+
 /* When BPF ldimm64's insn[0].src_reg != 0 then this can have
  * two extensions:
  *
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index ad3ccf8..ec1b42c 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1601,7 +1601,9 @@ static int bpf_prog_load(union bpf_attr *attr, union bpf_attr __user *uattr)
 	if (CHECK_ATTR(BPF_PROG_LOAD))
 		return -EINVAL;
 
-	if (attr->prog_flags & ~(BPF_F_STRICT_ALIGNMENT | BPF_F_ANY_ALIGNMENT))
+	if (attr->prog_flags & ~(BPF_F_STRICT_ALIGNMENT |
+				 BPF_F_ANY_ALIGNMENT |
+				 BPF_F_TEST_RND_HI32))
 		return -EINVAL;
 
 	if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) &&
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 22ccdf4..1bf32c3 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -263,6 +263,24 @@ enum bpf_attach_type {
  */
 #define BPF_F_ANY_ALIGNMENT	(1U << 1)
 
+/* BPF_F_TEST_RND_HI32 is used in BPF_PROG_LOAD command for testing purposes.
+ * Verifier does sub-register def/use analysis and identifies instructions whose
+ * def only matters for the low 32 bits; the high 32 bits are never referenced
+ * later through implicit zero extension. Therefore verifier notifies JIT
+ * back-ends that it is safe to skip clearing the high 32 bits for these
+ * instructions. This saves some back-ends a lot of code-gen. However, such an
+ * optimization is not necessary on some arches, for example x86_64 and arm64,
+ * whose JIT back-ends therefore do not use the verifier's analysis result. But
+ * we really want a way to verify the correctness of the described optimization
+ * on x86_64, on which testsuites are frequently exercised.
+ *
+ * So, this flag is introduced. Once it is set, verifier will randomize the
+ * high 32 bits for those instructions identified as not needing zero
+ * extension. Then, if verifier is not doing correct analysis, such
+ * randomization will regress tests and expose bugs.
+ */
+#define BPF_F_TEST_RND_HI32	(1U << 2)
+
 /* When BPF ldimm64's insn[0].src_reg != 0 then this can have
  * two extensions:
  *
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v6 bpf-next 07/17] bpf: verifier: randomize high 32-bit when BPF_F_TEST_RND_HI32 is set
  2019-05-03 10:42 [PATCH v6 bpf-next 00/17] bpf: eliminate zero extensions for sub-register writes Jiong Wang
                   ` (5 preceding siblings ...)
  2019-05-03 10:42 ` [PATCH v6 bpf-next 06/17] bpf: introduce new bpf prog load flags "BPF_F_TEST_RND_HI32" Jiong Wang
@ 2019-05-03 10:42 ` Jiong Wang
  2019-05-03 10:42 ` [PATCH v6 bpf-next 08/17] libbpf: add "prog_flags" to bpf_program/bpf_prog_load_attr/bpf_load_program_attr Jiong Wang
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 40+ messages in thread
From: Jiong Wang @ 2019-05-03 10:42 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: bpf, netdev, oss-drivers, Jiong Wang

This patch randomizes the high 32 bits of a definition when
BPF_F_TEST_RND_HI32 is set.

It does this whenever the flag is set, regardless of whether the hardware
supports zero extension, because this is a test feature and we want to
deliver the most stressful test.
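
Conceptually, for a 32-bit definition of "dst" whose high bits were found
unused, the verifier rewrites it into the following sequence (a sketch of
the patch buffer built in this patch):

	insn                                        /* original 32-bit def of dst */
	BPF_ALU64_IMM(BPF_MOV, BPF_REG_AX, imm_rnd) /* random 32-bit value */
	BPF_ALU64_IMM(BPF_LSH, BPF_REG_AX, 32)      /* move it into the high half */
	BPF_ALU64_REG(BPF_OR, dst_reg, BPF_REG_AX)  /* poison dst's high 32 bits */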

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 kernel/bpf/verifier.c | 69 +++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 58 insertions(+), 11 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 999da02..31ffbef 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -7648,32 +7648,79 @@ static int opt_remove_nops(struct bpf_verifier_env *env)
 	return 0;
 }
 
-static int opt_subreg_zext_lo32(struct bpf_verifier_env *env)
+static int opt_subreg_zext_lo32_rnd_hi32(struct bpf_verifier_env *env,
+					 const union bpf_attr *attr)
 {
+	struct bpf_insn *patch, zext_patch[2], rnd_hi32_patch[4];
 	struct bpf_insn_aux_data *aux = env->insn_aux_data;
+	int i, patch_len, delta = 0, len = env->prog->len;
 	struct bpf_insn *insns = env->prog->insnsi;
-	int i, delta = 0, len = env->prog->len;
-	struct bpf_insn zext_patch[2];
 	struct bpf_prog *new_prog;
+	bool rnd_hi32;
+
+	rnd_hi32 = attr->prog_flags & BPF_F_TEST_RND_HI32;
 
 	zext_patch[1] = BPF_ALU32_IMM(BPF_ZEXT, 0, 0);
+	rnd_hi32_patch[1] = BPF_ALU64_IMM(BPF_MOV, BPF_REG_AX, 0);
+	rnd_hi32_patch[2] = BPF_ALU64_IMM(BPF_LSH, BPF_REG_AX, 32);
+	rnd_hi32_patch[3] = BPF_ALU64_REG(BPF_OR, 0, BPF_REG_AX);
 	for (i = 0; i < len; i++) {
 		int adj_idx = i + delta;
 		struct bpf_insn insn;
 
-		if (!aux[adj_idx].zext_dst)
+		insn = insns[adj_idx];
+		if (!aux[adj_idx].zext_dst) {
+			u8 code, class;
+			u32 imm_rnd;
+
+			if (!rnd_hi32)
+				continue;
+
+			code = insn.code;
+			class = BPF_CLASS(code);
+			if (insn_no_def(&insn))
+				continue;
+
+			/* NOTE: arg "reg" (the fourth one) is only used for
+			 *       BPF_STX, which has been ruled out in the above
+			 *       check, so it is safe to pass NULL here.
+			 */
+			if (is_reg64(env, &insn, insn.dst_reg, NULL, DST_OP)) {
+				if (class == BPF_LD &&
+				    BPF_MODE(code) == BPF_IMM)
+					i++;
+				continue;
+			}
+
+			/* ctx load could be transformed into wider load. */
+			if (class == BPF_LDX &&
+			    aux[adj_idx].ptr_type == PTR_TO_CTX)
+				continue;
+
+			imm_rnd = get_random_int();
+			rnd_hi32_patch[0] = insn;
+			rnd_hi32_patch[1].imm = imm_rnd;
+			rnd_hi32_patch[3].dst_reg = insn.dst_reg;
+			patch = rnd_hi32_patch;
+			patch_len = 4;
+			goto apply_patch_buffer;
+		}
+
+		if (bpf_jit_hardware_zext())
 			continue;
 
-		insn = insns[adj_idx];
 		zext_patch[0] = insn;
 		zext_patch[1].dst_reg = insn.dst_reg;
-		new_prog = bpf_patch_insn_data(env, adj_idx, zext_patch, 2);
+		patch = zext_patch;
+		patch_len = 2;
+apply_patch_buffer:
+		new_prog = bpf_patch_insn_data(env, adj_idx, patch, patch_len);
 		if (!new_prog)
 			return -ENOMEM;
 		env->prog = new_prog;
 		insns = new_prog->insnsi;
 		aux = env->insn_aux_data;
-		delta += 2;
+		delta += patch_len - 1;
 	}
 
 	return 0;
@@ -8533,10 +8580,10 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr,
 	/* do 32-bit optimization after insn patching is done so that patched
 	 * insns can be handled correctly.
 	 */
-	if (ret == 0 && !bpf_jit_hardware_zext() &&
-	    !bpf_prog_is_dev_bound(env->prog->aux)) {
-		ret = opt_subreg_zext_lo32(env);
-		env->prog->aux->verifier_zext = !ret;
+	if (ret == 0 && !bpf_prog_is_dev_bound(env->prog->aux)) {
+		ret = opt_subreg_zext_lo32_rnd_hi32(env, attr);
+		env->prog->aux->verifier_zext =
+			bpf_jit_hardware_zext() ? false : !ret;
 	}
 
 	if (ret == 0)
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v6 bpf-next 08/17] libbpf: add "prog_flags" to bpf_program/bpf_prog_load_attr/bpf_load_program_attr
  2019-05-03 10:42 [PATCH v6 bpf-next 00/17] bpf: eliminate zero extensions for sub-register writes Jiong Wang
                   ` (6 preceding siblings ...)
  2019-05-03 10:42 ` [PATCH v6 bpf-next 07/17] bpf: verifier: randomize high 32-bit when BPF_F_TEST_RND_HI32 is set Jiong Wang
@ 2019-05-03 10:42 ` Jiong Wang
  2019-05-03 10:42 ` [PATCH v6 bpf-next 09/17] selftests: bpf: adjust several test_verifier helpers for insn insertion Jiong Wang
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 40+ messages in thread
From: Jiong Wang @ 2019-05-03 10:42 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: bpf, netdev, oss-drivers, Jiong Wang

libbpf doesn't allow passing "prog_flags" during bpf program load through a
couple of load related APIs: "bpf_load_program_xattr", "load_program" and
"bpf_prog_load_xattr".

It makes sense to allow passing "prog_flags", which is useful for
customizing program loading.
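
With this in place a caller can, for example, request hi32 poisoning when
loading an object file (a minimal sketch; the file name and program type
are placeholders):

	struct bpf_prog_load_attr attr = {
		.file       = "test_prog.o",
		.prog_type  = BPF_PROG_TYPE_SCHED_CLS,
		.prog_flags = BPF_F_TEST_RND_HI32,
	};
	struct bpf_object *obj;
	int prog_fd, err;

	err = bpf_prog_load_xattr(&attr, &obj, &prog_fd);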

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 tools/lib/bpf/bpf.c    | 1 +
 tools/lib/bpf/bpf.h    | 1 +
 tools/lib/bpf/libbpf.c | 3 +++
 tools/lib/bpf/libbpf.h | 1 +
 4 files changed, 6 insertions(+)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 955191c..f79ec49 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -254,6 +254,7 @@ int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
 	if (load_attr->name)
 		memcpy(attr.prog_name, load_attr->name,
 		       min(strlen(load_attr->name), BPF_OBJ_NAME_LEN - 1));
+	attr.prog_flags = load_attr->prog_flags;
 
 	fd = sys_bpf_prog_load(&attr, sizeof(attr));
 	if (fd >= 0)
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 9593fec..ff42ca0 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -87,6 +87,7 @@ struct bpf_load_program_attr {
 	const void *line_info;
 	__u32 line_info_cnt;
 	__u32 log_level;
+	__u32 prog_flags;
 };
 
 /* Flags to direct loading requirements */
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 11a65db..debca21 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -184,6 +184,7 @@ struct bpf_program {
 	void *line_info;
 	__u32 line_info_rec_size;
 	__u32 line_info_cnt;
+	__u32 prog_flags;
 };
 
 enum libbpf_map_type {
@@ -1949,6 +1950,7 @@ load_program(struct bpf_program *prog, struct bpf_insn *insns, int insns_cnt,
 	load_attr.line_info_rec_size = prog->line_info_rec_size;
 	load_attr.line_info_cnt = prog->line_info_cnt;
 	load_attr.log_level = prog->log_level;
+	load_attr.prog_flags = prog->prog_flags;
 	if (!load_attr.insns || !load_attr.insns_cnt)
 		return -EINVAL;
 
@@ -3394,6 +3396,7 @@ int bpf_prog_load_xattr(const struct bpf_prog_load_attr *attr,
 						      expected_attach_type);
 
 		prog->log_level = attr->log_level;
+		prog->prog_flags = attr->prog_flags;
 		if (!first_prog)
 			first_prog = prog;
 	}
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index c5ff005..5abc237 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -320,6 +320,7 @@ struct bpf_prog_load_attr {
 	enum bpf_attach_type expected_attach_type;
 	int ifindex;
 	int log_level;
+	int prog_flags;
 };
 
 LIBBPF_API int bpf_prog_load_xattr(const struct bpf_prog_load_attr *attr,
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v6 bpf-next 09/17] selftests: bpf: adjust several test_verifier helpers for insn insertion
  2019-05-03 10:42 [PATCH v6 bpf-next 00/17] bpf: eliminate zero extensions for sub-register writes Jiong Wang
                   ` (7 preceding siblings ...)
  2019-05-03 10:42 ` [PATCH v6 bpf-next 08/17] libbpf: add "prog_flags" to bpf_program/bpf_prog_load_attr/bpf_load_program_attr Jiong Wang
@ 2019-05-03 10:42 ` Jiong Wang
  2019-05-03 10:42 ` [PATCH v6 bpf-next 10/17] selftests: bpf: enable hi32 randomization for all tests Jiong Wang
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 40+ messages in thread
From: Jiong Wang @ 2019-05-03 10:42 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: bpf, netdev, oss-drivers, Jiong Wang

  - bpf_fill_ld_abs_vlan_push_pop:
    Prevent zext from happening inside the PUSH_CNT loop. It could happen
    because of BPF_LD_ABS (32-bit def) + BPF_JMP (64-bit use), or
    BPF_LD_ABS + EXIT (64-bit use of R0). So, change BPF_JMP to BPF_JMP32
    and redefine R0 on the exit path to cut off the data-flow from inside
    the loop (see the sketch after this list).

  - bpf_fill_jump_around_ld_abs:
    Jump range is limited to 16 bit. Every ld_abs is replaced by 6 insns,
    but on arches like arm, ppc etc, there will be one BPF_ZEXT inserted
    to extend the error value of the inlined ld_abs sequence, which then
    contains 7 insns. So, set the divisor to 7 so the testcase can work
    on all arches.

  - bpf_fill_scale1/bpf_fill_scale2:
    Both contain ~1M BPF_ALU32_IMM insns, which would trigger ~1M insn
    patcher calls because of hi32 randomization later when
    BPF_F_TEST_RND_HI32 is set for bpf selftests. The insn patcher is not
    efficient enough for 1M calls and would hang the machine. So, change
    to BPF_ALU64_IMM to avoid hi32 randomization.
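
As a sketch of the pattern the first item avoids, a 32-bit def followed by
a 64-bit use forces a zext, while a 32-bit use does not (the jump offsets
are placeholders):

	BPF_LD_ABS(BPF_B, 0),                         /* 32-bit def of R0 */
	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0x34, off),   /* 64-bit use, needs zext */
	BPF_JMP32_IMM(BPF_JNE, BPF_REG_0, 0x34, off), /* 32-bit use, no zext */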

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 tools/testing/selftests/bpf/test_verifier.c | 29 +++++++++++++++++++----------
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index ccd896b..3dcdfd4 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -138,32 +138,36 @@ static void bpf_fill_ld_abs_vlan_push_pop(struct bpf_test *self)
 loop:
 	for (j = 0; j < PUSH_CNT; j++) {
 		insn[i++] = BPF_LD_ABS(BPF_B, 0);
-		insn[i] = BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0x34, len - i - 2);
+		/* jump to error label */
+		insn[i] = BPF_JMP32_IMM(BPF_JNE, BPF_REG_0, 0x34, len - i - 3);
 		i++;
 		insn[i++] = BPF_MOV64_REG(BPF_REG_1, BPF_REG_6);
 		insn[i++] = BPF_MOV64_IMM(BPF_REG_2, 1);
 		insn[i++] = BPF_MOV64_IMM(BPF_REG_3, 2);
 		insn[i++] = BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
 					 BPF_FUNC_skb_vlan_push),
-		insn[i] = BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, len - i - 2);
+		insn[i] = BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, len - i - 3);
 		i++;
 	}
 
 	for (j = 0; j < PUSH_CNT; j++) {
 		insn[i++] = BPF_LD_ABS(BPF_B, 0);
-		insn[i] = BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0x34, len - i - 2);
+		insn[i] = BPF_JMP32_IMM(BPF_JNE, BPF_REG_0, 0x34, len - i - 3);
 		i++;
 		insn[i++] = BPF_MOV64_REG(BPF_REG_1, BPF_REG_6);
 		insn[i++] = BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
 					 BPF_FUNC_skb_vlan_pop),
-		insn[i] = BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, len - i - 2);
+		insn[i] = BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, len - i - 3);
 		i++;
 	}
 	if (++k < 5)
 		goto loop;
 
-	for (; i < len - 1; i++)
-		insn[i] = BPF_ALU32_IMM(BPF_MOV, BPF_REG_0, 0xbef);
+	for (; i < len - 3; i++)
+		insn[i] = BPF_ALU64_IMM(BPF_MOV, BPF_REG_0, 0xbef);
+	insn[len - 3] = BPF_JMP_A(1);
+	/* error label */
+	insn[len - 2] = BPF_MOV32_IMM(BPF_REG_0, 0);
 	insn[len - 1] = BPF_EXIT_INSN();
 	self->prog_len = len;
 }
@@ -171,8 +175,13 @@ static void bpf_fill_ld_abs_vlan_push_pop(struct bpf_test *self)
 static void bpf_fill_jump_around_ld_abs(struct bpf_test *self)
 {
 	struct bpf_insn *insn = self->fill_insns;
-	/* jump range is limited to 16 bit. every ld_abs is replaced by 6 insns */
-	unsigned int len = (1 << 15) / 6;
+	/* jump range is limited to 16 bit. every ld_abs is replaced by 6 insns,
+	 * but on arches like arm, ppc etc, there will be one BPF_ZEXT inserted
+	 * to extend the error value of the inlined ld_abs sequence, which then
+	 * contains 7 insns. so, set the divisor to 7 so the testcase could
+	 * work on all arches.
+	 */
+	unsigned int len = (1 << 15) / 7;
 	int i = 0;
 
 	insn[i++] = BPF_MOV64_REG(BPF_REG_6, BPF_REG_1);
@@ -230,7 +239,7 @@ static void bpf_fill_scale1(struct bpf_test *self)
 	 * within 1m limit add MAX_TEST_INSNS - 1025 MOVs and 1 EXIT
 	 */
 	while (i < MAX_TEST_INSNS - 1025)
-		insn[i++] = BPF_ALU32_IMM(BPF_MOV, BPF_REG_0, 42);
+		insn[i++] = BPF_ALU64_IMM(BPF_MOV, BPF_REG_0, 42);
 	insn[i] = BPF_EXIT_INSN();
 	self->prog_len = i + 1;
 	self->retval = 42;
@@ -261,7 +270,7 @@ static void bpf_fill_scale2(struct bpf_test *self)
 	 * within 1m limit add MAX_TEST_INSNS - 1025 MOVs and 1 EXIT
 	 */
 	while (i < MAX_TEST_INSNS - 1025)
-		insn[i++] = BPF_ALU32_IMM(BPF_MOV, BPF_REG_0, 42);
+		insn[i++] = BPF_ALU64_IMM(BPF_MOV, BPF_REG_0, 42);
 	insn[i] = BPF_EXIT_INSN();
 	self->prog_len = i + 1;
 	self->retval = 42;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v6 bpf-next 10/17] selftests: bpf: enable hi32 randomization for all tests
  2019-05-03 10:42 [PATCH v6 bpf-next 00/17] bpf: eliminate zero extensions for sub-register writes Jiong Wang
                   ` (8 preceding siblings ...)
  2019-05-03 10:42 ` [PATCH v6 bpf-next 09/17] selftests: bpf: adjust several test_verifier helpers for insn insertion Jiong Wang
@ 2019-05-03 10:42 ` Jiong Wang
  2019-05-03 10:42 ` [PATCH v6 bpf-next 11/17] arm: bpf: eliminate zero extension code-gen Jiong Wang
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 40+ messages in thread
From: Jiong Wang @ 2019-05-03 10:42 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: bpf, netdev, oss-drivers, Jiong Wang

The previous libbpf patch allows the user to specify "prog_flags" for the
bpf program load APIs. To enable high 32-bit randomization for a test, we
need to set BPF_F_TEST_RND_HI32 in "prog_flags".

To enable such randomization for all tests, we need to make sure all places
pass BPF_F_TEST_RND_HI32. Changing them one by one is not convenient;
also, it would be better if a test could be switched back to "normal"
running mode without a code change.

Given the program load APIs used across bpf selftests are mostly:
  bpf_prog_load:      load from file
  bpf_load_program:   load from raw insns

A test_stub.c is implemented for bpf selftests; it offers two functions for
testing purposes:

  bpf_prog_test_load
  bpf_test_load_program

These are the same as "bpf_prog_load" and "bpf_load_program", except that
they also set BPF_F_TEST_RND_HI32. Given that the *_xattr functions are the
APIs for customizing any "prog_flags", it makes little sense to put these
two functions into libbpf.

Then, the following CFLAGS are passed to compilations for host programs:
  -Dbpf_prog_load=bpf_prog_test_load
  -Dbpf_load_program=bpf_test_load_program

They redirect the load APIs used to the test versions, hence enabling high
32-bit randomization for these tests without changing the source code.
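
In effect, an unmodified call site in a test is redirected at compile time
(a sketch; the program type here is just an example):

	/* as written in a selftest: */
	err = bpf_prog_load(file, BPF_PROG_TYPE_XDP, &obj, &prog_fd);

	/* with -Dbpf_prog_load=bpf_prog_test_load it compiles as: */
	err = bpf_prog_test_load(file, BPF_PROG_TYPE_XDP, &obj, &prog_fd);
	/* which sets BPF_F_TEST_RND_HI32 before calling bpf_prog_load_xattr() */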

Besides all these, several testcases use "bpf_prog_load_attr" directly;
their call sites are updated to pass BPF_F_TEST_RND_HI32.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 tools/testing/selftests/bpf/Makefile               | 10 +++---
 .../selftests/bpf/prog_tests/bpf_verif_scale.c     |  1 +
 tools/testing/selftests/bpf/test_sock_addr.c       |  1 +
 tools/testing/selftests/bpf/test_sock_fields.c     |  1 +
 tools/testing/selftests/bpf/test_socket_cookie.c   |  1 +
 tools/testing/selftests/bpf/test_stub.c            | 40 ++++++++++++++++++++++
 tools/testing/selftests/bpf/test_verifier.c        |  2 +-
 7 files changed, 51 insertions(+), 5 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/test_stub.c

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 66f2dca..3f2c131 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -15,7 +15,9 @@ LLC		?= llc
 LLVM_OBJCOPY	?= llvm-objcopy
 LLVM_READELF	?= llvm-readelf
 BTF_PAHOLE	?= pahole
-CFLAGS += -Wall -O2 -I$(APIDIR) -I$(LIBDIR) -I$(BPFDIR) -I$(GENDIR) $(GENFLAGS) -I../../../include
+CFLAGS += -Wall -O2 -I$(APIDIR) -I$(LIBDIR) -I$(BPFDIR) -I$(GENDIR) $(GENFLAGS) -I../../../include \
+	  -Dbpf_prog_load=bpf_prog_test_load \
+	  -Dbpf_load_program=bpf_test_load_program
 LDLIBS += -lcap -lelf -lrt -lpthread
 
 # Order correspond to 'make run_tests' order
@@ -78,9 +80,9 @@ $(OUTPUT)/test_maps: map_tests/*.c
 
 BPFOBJ := $(OUTPUT)/libbpf.a
 
-$(TEST_GEN_PROGS): $(BPFOBJ)
+$(TEST_GEN_PROGS): test_stub.o $(BPFOBJ)
 
-$(TEST_GEN_PROGS_EXTENDED): $(OUTPUT)/libbpf.a
+$(TEST_GEN_PROGS_EXTENDED): test_stub.o $(OUTPUT)/libbpf.a
 
 $(OUTPUT)/test_dev_cgroup: cgroup_helpers.c
 $(OUTPUT)/test_skb_cgroup_id_user: cgroup_helpers.c
@@ -176,7 +178,7 @@ $(ALU32_BUILD_DIR)/test_progs_32: test_progs.c $(OUTPUT)/libbpf.a\
 						$(ALU32_BUILD_DIR)/urandom_read
 	$(CC) $(TEST_PROGS_CFLAGS) $(CFLAGS) \
 		-o $(ALU32_BUILD_DIR)/test_progs_32 \
-		test_progs.c trace_helpers.c prog_tests/*.c \
+		test_progs.c test_stub.c trace_helpers.c prog_tests/*.c \
 		$(OUTPUT)/libbpf.a $(LDLIBS)
 
 $(ALU32_BUILD_DIR)/test_progs_32: $(PROG_TESTS_H)
diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c b/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c
index 23b159d..2623d15 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c
@@ -22,6 +22,7 @@ static int check_load(const char *file)
 	attr.file = file;
 	attr.prog_type = BPF_PROG_TYPE_SCHED_CLS;
 	attr.log_level = 4;
+	attr.prog_flags = BPF_F_TEST_RND_HI32;
 	err = bpf_prog_load_xattr(&attr, &obj, &prog_fd);
 	bpf_object__close(obj);
 	if (err)
diff --git a/tools/testing/selftests/bpf/test_sock_addr.c b/tools/testing/selftests/bpf/test_sock_addr.c
index 3f110ea..5d0c4f0 100644
--- a/tools/testing/selftests/bpf/test_sock_addr.c
+++ b/tools/testing/selftests/bpf/test_sock_addr.c
@@ -745,6 +745,7 @@ static int load_path(const struct sock_addr_test *test, const char *path)
 	attr.file = path;
 	attr.prog_type = BPF_PROG_TYPE_CGROUP_SOCK_ADDR;
 	attr.expected_attach_type = test->expected_attach_type;
+	attr.prog_flags = BPF_F_TEST_RND_HI32;
 
 	if (bpf_prog_load_xattr(&attr, &obj, &prog_fd)) {
 		if (test->expected_result != LOAD_REJECT)
diff --git a/tools/testing/selftests/bpf/test_sock_fields.c b/tools/testing/selftests/bpf/test_sock_fields.c
index e089477..f0fc103 100644
--- a/tools/testing/selftests/bpf/test_sock_fields.c
+++ b/tools/testing/selftests/bpf/test_sock_fields.c
@@ -414,6 +414,7 @@ int main(int argc, char **argv)
 	struct bpf_prog_load_attr attr = {
 		.file = "test_sock_fields_kern.o",
 		.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+		.prog_flags = BPF_F_TEST_RND_HI32,
 	};
 	int cgroup_fd, egress_fd, ingress_fd, err;
 	struct bpf_program *ingress_prog;
diff --git a/tools/testing/selftests/bpf/test_socket_cookie.c b/tools/testing/selftests/bpf/test_socket_cookie.c
index e51d637..cac8ee5 100644
--- a/tools/testing/selftests/bpf/test_socket_cookie.c
+++ b/tools/testing/selftests/bpf/test_socket_cookie.c
@@ -148,6 +148,7 @@ static int run_test(int cgfd)
 	memset(&attr, 0, sizeof(attr));
 	attr.file = SOCKET_COOKIE_PROG;
 	attr.prog_type = BPF_PROG_TYPE_UNSPEC;
+	attr.prog_flags = BPF_F_TEST_RND_HI32;
 
 	err = bpf_prog_load_xattr(&attr, &pobj, &prog_fd);
 	if (err) {
diff --git a/tools/testing/selftests/bpf/test_stub.c b/tools/testing/selftests/bpf/test_stub.c
new file mode 100644
index 0000000..84e81a8
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_stub.c
@@ -0,0 +1,40 @@
+// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+/* Copyright (C) 2019 Netronome Systems, Inc. */
+
+#include <bpf/bpf.h>
+#include <bpf/libbpf.h>
+#include <string.h>
+
+int bpf_prog_test_load(const char *file, enum bpf_prog_type type,
+		       struct bpf_object **pobj, int *prog_fd)
+{
+	struct bpf_prog_load_attr attr;
+
+	memset(&attr, 0, sizeof(struct bpf_prog_load_attr));
+	attr.file = file;
+	attr.prog_type = type;
+	attr.expected_attach_type = 0;
+	attr.prog_flags = BPF_F_TEST_RND_HI32;
+
+	return bpf_prog_load_xattr(&attr, pobj, prog_fd);
+}
+
+int bpf_test_load_program(enum bpf_prog_type type, const struct bpf_insn *insns,
+			  size_t insns_cnt, const char *license,
+			  __u32 kern_version, char *log_buf,
+		     size_t log_buf_sz)
+{
+	struct bpf_load_program_attr load_attr;
+
+	memset(&load_attr, 0, sizeof(struct bpf_load_program_attr));
+	load_attr.prog_type = type;
+	load_attr.expected_attach_type = 0;
+	load_attr.name = NULL;
+	load_attr.insns = insns;
+	load_attr.insns_cnt = insns_cnt;
+	load_attr.license = license;
+	load_attr.kern_version = kern_version;
+	load_attr.prog_flags = BPF_F_TEST_RND_HI32;
+
+	return bpf_load_program_xattr(&load_attr, log_buf, log_buf_sz);
+}
diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index 3dcdfd4..71704de 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -879,7 +879,7 @@ static void do_test_single(struct bpf_test *test, bool unpriv,
 	if (fixup_skips != skips)
 		return;
 
-	pflags = 0;
+	pflags = BPF_F_TEST_RND_HI32;
 	if (test->flags & F_LOAD_WITH_STRICT_ALIGNMENT)
 		pflags |= BPF_F_STRICT_ALIGNMENT;
 	if (test->flags & F_NEEDS_EFFICIENT_UNALIGNED_ACCESS)
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v6 bpf-next 11/17] arm: bpf: eliminate zero extension code-gen
  2019-05-03 10:42 [PATCH v6 bpf-next 00/17] bpf: eliminate zero extensions for sub-register writes Jiong Wang
                   ` (9 preceding siblings ...)
  2019-05-03 10:42 ` [PATCH v6 bpf-next 10/17] selftests: bpf: enable hi32 randomization for all tests Jiong Wang
@ 2019-05-03 10:42 ` Jiong Wang
  2019-05-03 10:42 ` [PATCH v6 bpf-next 12/17] powerpc: " Jiong Wang
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 40+ messages in thread
From: Jiong Wang @ 2019-05-03 10:42 UTC (permalink / raw)
  To: alexei.starovoitov, daniel
  Cc: bpf, netdev, oss-drivers, Jiong Wang, Shubham Bansal

Cc: Shubham Bansal <illusionist.neo@gmail.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 arch/arm/net/bpf_jit_32.c | 35 ++++++++++++++++++++++++++---------
 1 file changed, 26 insertions(+), 9 deletions(-)

diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index c8bfbbf..a6f78c8 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -736,7 +736,8 @@ static inline void emit_a32_alu_r64(const bool is64, const s8 dst[],
 
 		/* ALU operation */
 		emit_alu_r(rd[1], rs, true, false, op, ctx);
-		emit_a32_mov_i(rd[0], 0, ctx);
+		if (!ctx->prog->aux->verifier_zext)
+			emit_a32_mov_i(rd[0], 0, ctx);
 	}
 
 	arm_bpf_put_reg64(dst, rd, ctx);
@@ -758,8 +759,9 @@ static inline void emit_a32_mov_r64(const bool is64, const s8 dst[],
 				  struct jit_ctx *ctx) {
 	if (!is64) {
 		emit_a32_mov_r(dst_lo, src_lo, ctx);
-		/* Zero out high 4 bytes */
-		emit_a32_mov_i(dst_hi, 0, ctx);
+		if (!ctx->prog->aux->verifier_zext)
+			/* Zero out high 4 bytes */
+			emit_a32_mov_i(dst_hi, 0, ctx);
 	} else if (__LINUX_ARM_ARCH__ < 6 &&
 		   ctx->cpu_architecture < CPU_ARCH_ARMv5TE) {
 		/* complete 8 byte move */
@@ -1060,17 +1062,20 @@ static inline void emit_ldx_r(const s8 dst[], const s8 src,
 	case BPF_B:
 		/* Load a Byte */
 		emit(ARM_LDRB_I(rd[1], rm, off), ctx);
-		emit_a32_mov_i(rd[0], 0, ctx);
+		if (!ctx->prog->aux->verifier_zext)
+			emit_a32_mov_i(rd[0], 0, ctx);
 		break;
 	case BPF_H:
 		/* Load a HalfWord */
 		emit(ARM_LDRH_I(rd[1], rm, off), ctx);
-		emit_a32_mov_i(rd[0], 0, ctx);
+		if (!ctx->prog->aux->verifier_zext)
+			emit_a32_mov_i(rd[0], 0, ctx);
 		break;
 	case BPF_W:
 		/* Load a Word */
 		emit(ARM_LDR_I(rd[1], rm, off), ctx);
-		emit_a32_mov_i(rd[0], 0, ctx);
+		if (!ctx->prog->aux->verifier_zext)
+			emit_a32_mov_i(rd[0], 0, ctx);
 		break;
 	case BPF_DW:
 		/* Load a Double Word */
@@ -1352,6 +1357,10 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
 	switch (code) {
 	/* ALU operations */
 
+	/* dst = (u32) dst */
+	case BPF_ALU | BPF_ZEXT:
+		emit_a32_mov_i(dst_hi, 0, ctx);
+		break;
 	/* dst = src */
 	case BPF_ALU | BPF_MOV | BPF_K:
 	case BPF_ALU | BPF_MOV | BPF_X:
@@ -1438,7 +1447,8 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
 		}
 		emit_udivmod(rd_lo, rd_lo, rt, ctx, BPF_OP(code));
 		arm_bpf_put_reg32(dst_lo, rd_lo, ctx);
-		emit_a32_mov_i(dst_hi, 0, ctx);
+		if (!ctx->prog->aux->verifier_zext)
+			emit_a32_mov_i(dst_hi, 0, ctx);
 		break;
 	case BPF_ALU64 | BPF_DIV | BPF_K:
 	case BPF_ALU64 | BPF_DIV | BPF_X:
@@ -1453,7 +1463,8 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
 			return -EINVAL;
 		if (imm)
 			emit_a32_alu_i(dst_lo, imm, ctx, BPF_OP(code));
-		emit_a32_mov_i(dst_hi, 0, ctx);
+		if (!ctx->prog->aux->verifier_zext)
+			emit_a32_mov_i(dst_hi, 0, ctx);
 		break;
 	/* dst = dst << imm */
 	case BPF_ALU64 | BPF_LSH | BPF_K:
@@ -1488,7 +1499,8 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
 	/* dst = ~dst */
 	case BPF_ALU | BPF_NEG:
 		emit_a32_alu_i(dst_lo, 0, ctx, BPF_OP(code));
-		emit_a32_mov_i(dst_hi, 0, ctx);
+		if (!ctx->prog->aux->verifier_zext)
+			emit_a32_mov_i(dst_hi, 0, ctx);
 		break;
 	/* dst = ~dst (64 bit) */
 	case BPF_ALU64 | BPF_NEG:
@@ -1838,6 +1850,11 @@ void bpf_jit_compile(struct bpf_prog *prog)
 	/* Nothing to do here. We support Internal BPF. */
 }
 
+bool bpf_jit_hardware_zext(void)
+{
+	return false;
+}
+
 struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 {
 	struct bpf_prog *tmp, *orig_prog = prog;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v6 bpf-next 12/17] powerpc: bpf: eliminate zero extension code-gen
  2019-05-03 10:42 [PATCH v6 bpf-next 00/17] bpf: eliminate zero extensions for sub-register writes Jiong Wang
                   ` (10 preceding siblings ...)
  2019-05-03 10:42 ` [PATCH v6 bpf-next 11/17] arm: bpf: eliminate zero extension code-gen Jiong Wang
@ 2019-05-03 10:42 ` Jiong Wang
  2019-05-03 10:42 ` [PATCH v6 bpf-next 13/17] s390: " Jiong Wang
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 40+ messages in thread
From: Jiong Wang @ 2019-05-03 10:42 UTC (permalink / raw)
  To: alexei.starovoitov, daniel
  Cc: bpf, netdev, oss-drivers, Jiong Wang, Naveen N . Rao, Sandipan Das

Cc: Naveen N. Rao <naveen.n.rao@linux.ibm.com>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 arch/powerpc/net/bpf_jit_comp64.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index 21a1dcd..9fef73dc 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -557,9 +557,15 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
 				goto bpf_alu32_trunc;
 			break;
 
+		/*
+		 * ZEXT
+		 */
+		case BPF_ALU | BPF_ZEXT:
+			PPC_RLWINM(dst_reg, dst_reg, 0, 0, 31);
+			break;
 bpf_alu32_trunc:
 		/* Truncate to 32-bits */
-		if (BPF_CLASS(code) == BPF_ALU)
+		if (BPF_CLASS(code) == BPF_ALU && !fp->aux->verifier_zext)
 			PPC_RLWINM(dst_reg, dst_reg, 0, 0, 31);
 		break;
 
@@ -1046,6 +1052,11 @@ struct powerpc64_jit_data {
 	struct codegen_context ctx;
 };
 
+bool bpf_jit_hardware_zext(void)
+{
+	return false;
+}
+
 struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 {
 	u32 proglen;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v6 bpf-next 13/17] s390: bpf: eliminate zero extension code-gen
  2019-05-03 10:42 [PATCH v6 bpf-next 00/17] bpf: eliminate zero extensions for sub-register writes Jiong Wang
                   ` (11 preceding siblings ...)
  2019-05-03 10:42 ` [PATCH v6 bpf-next 12/17] powerpc: " Jiong Wang
@ 2019-05-03 10:42 ` Jiong Wang
  2019-05-03 13:41   ` Heiko Carstens
  2019-05-03 10:42 ` [PATCH v6 bpf-next 14/17] sparc: " Jiong Wang
                   ` (3 subsequent siblings)
  16 siblings, 1 reply; 40+ messages in thread
From: Jiong Wang @ 2019-05-03 10:42 UTC (permalink / raw)
  To: alexei.starovoitov, daniel
  Cc: bpf, netdev, oss-drivers, Jiong Wang, Martin Schwidefsky, Heiko Carstens

Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 arch/s390/net/bpf_jit_comp.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
index 51dd026..8315b2e 100644
--- a/arch/s390/net/bpf_jit_comp.c
+++ b/arch/s390/net/bpf_jit_comp.c
@@ -299,9 +299,11 @@ static inline void reg_set_seen(struct bpf_jit *jit, u32 b1)
 
 #define EMIT_ZERO(b1)						\
 ({								\
-	/* llgfr %dst,%dst (zero extend to 64 bit) */		\
-	EMIT4(0xb9160000, b1, b1);				\
-	REG_SET_SEEN(b1);					\
+	if (!fp->aux->verifier_zext) {				\
+		/* llgfr %dst,%dst (zero extend to 64 bit) */	\
+		EMIT4(0xb9160000, b1, b1);			\
+		REG_SET_SEEN(b1);				\
+	}							\
 })
 
 /*
@@ -515,6 +517,13 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp, int i
 		jit->seen |= SEEN_REG_AX;
 	switch (insn->code) {
 	/*
+	 * BPF_ZEXT
+	 */
+	case BPF_ALU | BPF_ZEXT: /* dst = (u32) dst */
+		/* llgfr %dst,%dst */
+		EMIT4(0xb9160000, dst_reg, dst_reg);
+		break;
+	/*
 	 * BPF_MOV
 	 */
 	case BPF_ALU | BPF_MOV | BPF_X: /* dst = (u32) src */
@@ -1282,6 +1291,11 @@ static int bpf_jit_prog(struct bpf_jit *jit, struct bpf_prog *fp)
 	return 0;
 }
 
+bool bpf_jit_hardware_zext(void)
+{
+	return false;
+}
+
 /*
  * Compile eBPF program "fp"
  */
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v6 bpf-next 14/17] sparc: bpf: eliminate zero extension code-gen
  2019-05-03 10:42 [PATCH v6 bpf-next 00/17] bpf: eliminate zero extensions for sub-register writes Jiong Wang
                   ` (12 preceding siblings ...)
  2019-05-03 10:42 ` [PATCH v6 bpf-next 13/17] s390: " Jiong Wang
@ 2019-05-03 10:42 ` Jiong Wang
  2019-05-03 10:42 ` [PATCH v6 bpf-next 15/17] x32: " Jiong Wang
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 40+ messages in thread
From: Jiong Wang @ 2019-05-03 10:42 UTC (permalink / raw)
  To: alexei.starovoitov, daniel
  Cc: bpf, netdev, oss-drivers, Jiong Wang, David S . Miller

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 arch/sparc/net/bpf_jit_comp_64.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/sparc/net/bpf_jit_comp_64.c b/arch/sparc/net/bpf_jit_comp_64.c
index 65428e7..8bac761 100644
--- a/arch/sparc/net/bpf_jit_comp_64.c
+++ b/arch/sparc/net/bpf_jit_comp_64.c
@@ -905,6 +905,10 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
 		ctx->saw_frame_pointer = true;
 
 	switch (code) {
+	/* dst = (u32) dst */
+	case BPF_ALU | BPF_ZEXT:
+		emit_alu_K(SRL, dst, 0, ctx);
+		break;
 	/* dst = src */
 	case BPF_ALU | BPF_MOV | BPF_X:
 		emit_alu3_K(SRL, src, 0, dst, ctx);
@@ -1144,7 +1148,8 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
 		break;
 
 	do_alu32_trunc:
-		if (BPF_CLASS(code) == BPF_ALU)
+		if (BPF_CLASS(code) == BPF_ALU &&
+		    !ctx->prog->aux->verifier_zext)
 			emit_alu_K(SRL, dst, 0, ctx);
 		break;
 
@@ -1432,6 +1437,11 @@ static void jit_fill_hole(void *area, unsigned int size)
 		*ptr++ = 0x91d02005; /* ta 5 */
 }
 
+bool bpf_jit_hardware_zext(void)
+{
+	return false;
+}
+
 struct sparc64_jit_data {
 	struct bpf_binary_header *header;
 	u8 *image;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v6 bpf-next 15/17] x32: bpf: eliminate zero extension code-gen
  2019-05-03 10:42 [PATCH v6 bpf-next 00/17] bpf: eliminate zero extensions for sub-register writes Jiong Wang
                   ` (13 preceding siblings ...)
  2019-05-03 10:42 ` [PATCH v6 bpf-next 14/17] sparc: " Jiong Wang
@ 2019-05-03 10:42 ` Jiong Wang
  2019-05-03 10:42 ` [PATCH v6 bpf-next 16/17] riscv: " Jiong Wang
  2019-05-03 10:42 ` [PATCH v6 bpf-next 17/17] nfp: " Jiong Wang
  16 siblings, 0 replies; 40+ messages in thread
From: Jiong Wang @ 2019-05-03 10:42 UTC (permalink / raw)
  To: alexei.starovoitov, daniel
  Cc: bpf, netdev, oss-drivers, Jiong Wang, Wang YanQing

Cc: Wang YanQing <udknight@gmail.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 arch/x86/net/bpf_jit_comp32.c | 39 ++++++++++++++++++++++++++++-----------
 1 file changed, 28 insertions(+), 11 deletions(-)

diff --git a/arch/x86/net/bpf_jit_comp32.c b/arch/x86/net/bpf_jit_comp32.c
index 0d9cdff..16c4f4e 100644
--- a/arch/x86/net/bpf_jit_comp32.c
+++ b/arch/x86/net/bpf_jit_comp32.c
@@ -567,7 +567,7 @@ static inline void emit_ia32_alu_r(const bool is64, const bool hi, const u8 op,
 static inline void emit_ia32_alu_r64(const bool is64, const u8 op,
 				     const u8 dst[], const u8 src[],
 				     bool dstk,  bool sstk,
-				     u8 **pprog)
+				     u8 **pprog, const struct bpf_prog_aux *aux)
 {
 	u8 *prog = *pprog;
 
@@ -575,7 +575,7 @@ static inline void emit_ia32_alu_r64(const bool is64, const u8 op,
 	if (is64)
 		emit_ia32_alu_r(is64, true, op, dst_hi, src_hi, dstk, sstk,
 				&prog);
-	else
+	else if (!aux->verifier_zext)
 		emit_ia32_mov_i(dst_hi, 0, dstk, &prog);
 	*pprog = prog;
 }
@@ -666,7 +666,8 @@ static inline void emit_ia32_alu_i(const bool is64, const bool hi, const u8 op,
 /* ALU operation (64 bit) */
 static inline void emit_ia32_alu_i64(const bool is64, const u8 op,
 				     const u8 dst[], const u32 val,
-				     bool dstk, u8 **pprog)
+				     bool dstk, u8 **pprog,
+				     const struct bpf_prog_aux *aux)
 {
 	u8 *prog = *pprog;
 	u32 hi = 0;
@@ -677,7 +678,7 @@ static inline void emit_ia32_alu_i64(const bool is64, const u8 op,
 	emit_ia32_alu_i(is64, false, op, dst_lo, val, dstk, &prog);
 	if (is64)
 		emit_ia32_alu_i(is64, true, op, dst_hi, hi, dstk, &prog);
-	else
+	else if (!aux->verifier_zext)
 		emit_ia32_mov_i(dst_hi, 0, dstk, &prog);
 
 	*pprog = prog;
@@ -1642,6 +1643,10 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
 
 		switch (code) {
 		/* ALU operations */
+		/* dst = (u32) dst */
+		case BPF_ALU | BPF_ZEXT:
+			emit_ia32_mov_i(dst_hi, 0, dstk, &prog);
+			break;
 		/* dst = src */
 		case BPF_ALU | BPF_MOV | BPF_K:
 		case BPF_ALU | BPF_MOV | BPF_X:
@@ -1690,11 +1695,13 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
 			switch (BPF_SRC(code)) {
 			case BPF_X:
 				emit_ia32_alu_r64(is64, BPF_OP(code), dst,
-						  src, dstk, sstk, &prog);
+						  src, dstk, sstk, &prog,
+						  bpf_prog->aux);
 				break;
 			case BPF_K:
 				emit_ia32_alu_i64(is64, BPF_OP(code), dst,
-						  imm32, dstk, &prog);
+						  imm32, dstk, &prog,
+						  bpf_prog->aux);
 				break;
 			}
 			break;
@@ -1713,7 +1720,8 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
 						false, &prog);
 				break;
 			}
-			emit_ia32_mov_i(dst_hi, 0, dstk, &prog);
+			if (!bpf_prog->aux->verifier_zext)
+				emit_ia32_mov_i(dst_hi, 0, dstk, &prog);
 			break;
 		case BPF_ALU | BPF_LSH | BPF_X:
 		case BPF_ALU | BPF_RSH | BPF_X:
@@ -1733,7 +1741,8 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
 						  &prog);
 				break;
 			}
-			emit_ia32_mov_i(dst_hi, 0, dstk, &prog);
+			if (!bpf_prog->aux->verifier_zext)
+				emit_ia32_mov_i(dst_hi, 0, dstk, &prog);
 			break;
 		/* dst = dst / src(imm) */
 		/* dst = dst % src(imm) */
@@ -1755,7 +1764,8 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
 						    &prog);
 				break;
 			}
-			emit_ia32_mov_i(dst_hi, 0, dstk, &prog);
+			if (!bpf_prog->aux->verifier_zext)
+				emit_ia32_mov_i(dst_hi, 0, dstk, &prog);
 			break;
 		case BPF_ALU64 | BPF_DIV | BPF_K:
 		case BPF_ALU64 | BPF_DIV | BPF_X:
@@ -1772,7 +1782,8 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
 			EMIT2_off32(0xC7, add_1reg(0xC0, IA32_ECX), imm32);
 			emit_ia32_shift_r(BPF_OP(code), dst_lo, IA32_ECX, dstk,
 					  false, &prog);
-			emit_ia32_mov_i(dst_hi, 0, dstk, &prog);
+			if (!bpf_prog->aux->verifier_zext)
+				emit_ia32_mov_i(dst_hi, 0, dstk, &prog);
 			break;
 		/* dst = dst << imm */
 		case BPF_ALU64 | BPF_LSH | BPF_K:
@@ -1808,7 +1819,8 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
 		case BPF_ALU | BPF_NEG:
 			emit_ia32_alu_i(is64, false, BPF_OP(code),
 					dst_lo, 0, dstk, &prog);
-			emit_ia32_mov_i(dst_hi, 0, dstk, &prog);
+			if (!bpf_prog->aux->verifier_zext)
+				emit_ia32_mov_i(dst_hi, 0, dstk, &prog);
 			break;
 		/* dst = ~dst (64 bit) */
 		case BPF_ALU64 | BPF_NEG:
@@ -2367,6 +2379,11 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
 	return proglen;
 }
 
+bool bpf_jit_hardware_zext(void)
+{
+	return false;
+}
+
 struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 {
 	struct bpf_binary_header *header = NULL;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v6 bpf-next 16/17] riscv: bpf: eliminate zero extension code-gen
  2019-05-03 10:42 [PATCH v6 bpf-next 00/17] bpf: eliminate zero extensions for sub-register writes Jiong Wang
                   ` (14 preceding siblings ...)
  2019-05-03 10:42 ` [PATCH v6 bpf-next 15/17] x32: " Jiong Wang
@ 2019-05-03 10:42 ` Jiong Wang
  2019-05-03 10:42 ` [PATCH v6 bpf-next 17/17] nfp: " Jiong Wang
  16 siblings, 0 replies; 40+ messages in thread
From: Jiong Wang @ 2019-05-03 10:42 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: bpf, netdev, oss-drivers, Jiong Wang

Acked-by: Björn Töpel <bjorn.topel@gmail.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 arch/riscv/net/bpf_jit_comp.c | 36 +++++++++++++++++++++++-------------
 1 file changed, 23 insertions(+), 13 deletions(-)

diff --git a/arch/riscv/net/bpf_jit_comp.c b/arch/riscv/net/bpf_jit_comp.c
index 80b12aa..3074c9b 100644
--- a/arch/riscv/net/bpf_jit_comp.c
+++ b/arch/riscv/net/bpf_jit_comp.c
@@ -731,6 +731,7 @@ static int emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 {
 	bool is64 = BPF_CLASS(insn->code) == BPF_ALU64 ||
 		    BPF_CLASS(insn->code) == BPF_JMP;
+	struct bpf_prog_aux *aux = ctx->prog->aux;
 	int rvoff, i = insn - ctx->prog->insnsi;
 	u8 rd = -1, rs = -1, code = insn->code;
 	s16 off = insn->off;
@@ -739,11 +740,15 @@ static int emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 	init_regs(&rd, &rs, insn, ctx);
 
 	switch (code) {
+	/* dst = (u32) dst */
+	case BPF_ALU | BPF_ZEXT:
+		emit_zext_32(rd, ctx);
+		break;
 	/* dst = src */
 	case BPF_ALU | BPF_MOV | BPF_X:
 	case BPF_ALU64 | BPF_MOV | BPF_X:
 		emit(is64 ? rv_addi(rd, rs, 0) : rv_addiw(rd, rs, 0), ctx);
-		if (!is64)
+		if (!is64 && !aux->verifier_zext)
 			emit_zext_32(rd, ctx);
 		break;
 
@@ -771,19 +776,19 @@ static int emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 	case BPF_ALU | BPF_MUL | BPF_X:
 	case BPF_ALU64 | BPF_MUL | BPF_X:
 		emit(is64 ? rv_mul(rd, rd, rs) : rv_mulw(rd, rd, rs), ctx);
-		if (!is64)
+		if (!is64 && !aux->verifier_zext)
 			emit_zext_32(rd, ctx);
 		break;
 	case BPF_ALU | BPF_DIV | BPF_X:
 	case BPF_ALU64 | BPF_DIV | BPF_X:
 		emit(is64 ? rv_divu(rd, rd, rs) : rv_divuw(rd, rd, rs), ctx);
-		if (!is64)
+		if (!is64 && !aux->verifier_zext)
 			emit_zext_32(rd, ctx);
 		break;
 	case BPF_ALU | BPF_MOD | BPF_X:
 	case BPF_ALU64 | BPF_MOD | BPF_X:
 		emit(is64 ? rv_remu(rd, rd, rs) : rv_remuw(rd, rd, rs), ctx);
-		if (!is64)
+		if (!is64 && !aux->verifier_zext)
 			emit_zext_32(rd, ctx);
 		break;
 	case BPF_ALU | BPF_LSH | BPF_X:
@@ -867,7 +872,7 @@ static int emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 	case BPF_ALU | BPF_MOV | BPF_K:
 	case BPF_ALU64 | BPF_MOV | BPF_K:
 		emit_imm(rd, imm, ctx);
-		if (!is64)
+		if (!is64 && !aux->verifier_zext)
 			emit_zext_32(rd, ctx);
 		break;
 
@@ -882,7 +887,7 @@ static int emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 			emit(is64 ? rv_add(rd, rd, RV_REG_T1) :
 			     rv_addw(rd, rd, RV_REG_T1), ctx);
 		}
-		if (!is64)
+		if (!is64 && !aux->verifier_zext)
 			emit_zext_32(rd, ctx);
 		break;
 	case BPF_ALU | BPF_SUB | BPF_K:
@@ -895,7 +900,7 @@ static int emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 			emit(is64 ? rv_sub(rd, rd, RV_REG_T1) :
 			     rv_subw(rd, rd, RV_REG_T1), ctx);
 		}
-		if (!is64)
+		if (!is64 && !aux->verifier_zext)
 			emit_zext_32(rd, ctx);
 		break;
 	case BPF_ALU | BPF_AND | BPF_K:
@@ -906,7 +911,7 @@ static int emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 			emit_imm(RV_REG_T1, imm, ctx);
 			emit(rv_and(rd, rd, RV_REG_T1), ctx);
 		}
-		if (!is64)
+		if (!is64 && !aux->verifier_zext)
 			emit_zext_32(rd, ctx);
 		break;
 	case BPF_ALU | BPF_OR | BPF_K:
@@ -917,7 +922,7 @@ static int emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 			emit_imm(RV_REG_T1, imm, ctx);
 			emit(rv_or(rd, rd, RV_REG_T1), ctx);
 		}
-		if (!is64)
+		if (!is64 && !aux->verifier_zext)
 			emit_zext_32(rd, ctx);
 		break;
 	case BPF_ALU | BPF_XOR | BPF_K:
@@ -928,7 +933,7 @@ static int emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 			emit_imm(RV_REG_T1, imm, ctx);
 			emit(rv_xor(rd, rd, RV_REG_T1), ctx);
 		}
-		if (!is64)
+		if (!is64 && !aux->verifier_zext)
 			emit_zext_32(rd, ctx);
 		break;
 	case BPF_ALU | BPF_MUL | BPF_K:
@@ -936,7 +941,7 @@ static int emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 		emit_imm(RV_REG_T1, imm, ctx);
 		emit(is64 ? rv_mul(rd, rd, RV_REG_T1) :
 		     rv_mulw(rd, rd, RV_REG_T1), ctx);
-		if (!is64)
+		if (!is64 && !aux->verifier_zext)
 			emit_zext_32(rd, ctx);
 		break;
 	case BPF_ALU | BPF_DIV | BPF_K:
@@ -944,7 +949,7 @@ static int emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 		emit_imm(RV_REG_T1, imm, ctx);
 		emit(is64 ? rv_divu(rd, rd, RV_REG_T1) :
 		     rv_divuw(rd, rd, RV_REG_T1), ctx);
-		if (!is64)
+		if (!is64 && !aux->verifier_zext)
 			emit_zext_32(rd, ctx);
 		break;
 	case BPF_ALU | BPF_MOD | BPF_K:
@@ -952,7 +957,7 @@ static int emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 		emit_imm(RV_REG_T1, imm, ctx);
 		emit(is64 ? rv_remu(rd, rd, RV_REG_T1) :
 		     rv_remuw(rd, rd, RV_REG_T1), ctx);
-		if (!is64)
+		if (!is64 && !aux->verifier_zext)
 			emit_zext_32(rd, ctx);
 		break;
 	case BPF_ALU | BPF_LSH | BPF_K:
@@ -1503,6 +1508,11 @@ static void bpf_flush_icache(void *start, void *end)
 	flush_icache_range((unsigned long)start, (unsigned long)end);
 }
 
+bool bpf_jit_hardware_zext(void)
+{
+	return false;
+}
+
 struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 {
 	bool tmp_blinded = false, extra_pass = false;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v6 bpf-next 17/17] nfp: bpf: eliminate zero extension code-gen
  2019-05-03 10:42 [PATCH v6 bpf-next 00/17] bpf: eliminate zero extensions for sub-register writes Jiong Wang
                   ` (15 preceding siblings ...)
  2019-05-03 10:42 ` [PATCH v6 bpf-next 16/17] riscv: " Jiong Wang
@ 2019-05-03 10:42 ` Jiong Wang
  16 siblings, 0 replies; 40+ messages in thread
From: Jiong Wang @ 2019-05-03 10:42 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: bpf, netdev, oss-drivers, Jiong Wang

This patch eliminates zero extension code-gen for instructions including
both alu and load/store. The only exception is ctx load, because the
offload target doesn't go through the host ctx conversion logic, so we do a
customized load and ignore the zext flag set by the verifier.

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/bpf/jit.c      | 115 +++++++++++++---------
 drivers/net/ethernet/netronome/nfp/bpf/main.h     |   2 +
 drivers/net/ethernet/netronome/nfp/bpf/verifier.c |  12 +++
 3 files changed, 81 insertions(+), 48 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/jit.c b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
index f272247..634bae0 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/jit.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
@@ -612,6 +612,13 @@ static void wrp_immed(struct nfp_prog *nfp_prog, swreg dst, u32 imm)
 }
 
 static void
+wrp_zext(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta, u8 dst)
+{
+	if (meta->flags & FLAG_INSN_DO_ZEXT)
+		wrp_immed(nfp_prog, reg_both(dst + 1), 0);
+}
+
+static void
 wrp_immed_relo(struct nfp_prog *nfp_prog, swreg dst, u32 imm,
 	       enum nfp_relo_type relo)
 {
@@ -847,7 +854,8 @@ static int nfp_cpp_memcpy(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 }
 
 static int
-data_ld(struct nfp_prog *nfp_prog, swreg offset, u8 dst_gpr, int size)
+data_ld(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta, swreg offset,
+	u8 dst_gpr, int size)
 {
 	unsigned int i;
 	u16 shift, sz;
@@ -870,14 +878,15 @@ data_ld(struct nfp_prog *nfp_prog, swreg offset, u8 dst_gpr, int size)
 			wrp_mov(nfp_prog, reg_both(dst_gpr + i), reg_xfer(i));
 
 	if (i < 2)
-		wrp_immed(nfp_prog, reg_both(dst_gpr + 1), 0);
+		wrp_zext(nfp_prog, meta, dst_gpr);
 
 	return 0;
 }
 
 static int
-data_ld_host_order(struct nfp_prog *nfp_prog, u8 dst_gpr,
-		   swreg lreg, swreg rreg, int size, enum cmd_mode mode)
+data_ld_host_order(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
+		   u8 dst_gpr, swreg lreg, swreg rreg, int size,
+		   enum cmd_mode mode)
 {
 	unsigned int i;
 	u8 mask, sz;
@@ -900,33 +909,34 @@ data_ld_host_order(struct nfp_prog *nfp_prog, u8 dst_gpr,
 			wrp_mov(nfp_prog, reg_both(dst_gpr + i), reg_xfer(i));
 
 	if (i < 2)
-		wrp_immed(nfp_prog, reg_both(dst_gpr + 1), 0);
+		wrp_zext(nfp_prog, meta, dst_gpr);
 
 	return 0;
 }
 
 static int
-data_ld_host_order_addr32(struct nfp_prog *nfp_prog, u8 src_gpr, swreg offset,
-			  u8 dst_gpr, u8 size)
+data_ld_host_order_addr32(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
+			  u8 src_gpr, swreg offset, u8 dst_gpr, u8 size)
 {
-	return data_ld_host_order(nfp_prog, dst_gpr, reg_a(src_gpr), offset,
-				  size, CMD_MODE_32b);
+	return data_ld_host_order(nfp_prog, meta, dst_gpr, reg_a(src_gpr),
+				  offset, size, CMD_MODE_32b);
 }
 
 static int
-data_ld_host_order_addr40(struct nfp_prog *nfp_prog, u8 src_gpr, swreg offset,
-			  u8 dst_gpr, u8 size)
+data_ld_host_order_addr40(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
+			  u8 src_gpr, swreg offset, u8 dst_gpr, u8 size)
 {
 	swreg rega, regb;
 
 	addr40_offset(nfp_prog, src_gpr, offset, &rega, &regb);
 
-	return data_ld_host_order(nfp_prog, dst_gpr, rega, regb,
+	return data_ld_host_order(nfp_prog, meta, dst_gpr, rega, regb,
 				  size, CMD_MODE_40b_BA);
 }
 
 static int
-construct_data_ind_ld(struct nfp_prog *nfp_prog, u16 offset, u16 src, u8 size)
+construct_data_ind_ld(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
+		      u16 offset, u16 src, u8 size)
 {
 	swreg tmp_reg;
 
@@ -942,10 +952,12 @@ construct_data_ind_ld(struct nfp_prog *nfp_prog, u16 offset, u16 src, u8 size)
 	emit_br_relo(nfp_prog, BR_BLO, BR_OFF_RELO, 0, RELO_BR_GO_ABORT);
 
 	/* Load data */
-	return data_ld(nfp_prog, imm_b(nfp_prog), 0, size);
+	return data_ld(nfp_prog, meta, imm_b(nfp_prog), 0, size);
 }
 
-static int construct_data_ld(struct nfp_prog *nfp_prog, u16 offset, u8 size)
+static int
+construct_data_ld(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
+		  u16 offset, u8 size)
 {
 	swreg tmp_reg;
 
@@ -956,7 +968,7 @@ static int construct_data_ld(struct nfp_prog *nfp_prog, u16 offset, u8 size)
 
 	/* Load data */
 	tmp_reg = re_load_imm_any(nfp_prog, offset, imm_b(nfp_prog));
-	return data_ld(nfp_prog, tmp_reg, 0, size);
+	return data_ld(nfp_prog, meta, tmp_reg, 0, size);
 }
 
 static int
@@ -1193,7 +1205,7 @@ mem_op_stack(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
 	}
 
 	if (clr_gpr && size < 8)
-		wrp_immed(nfp_prog, reg_both(gpr + 1), 0);
+		wrp_zext(nfp_prog, meta, gpr);
 
 	while (size) {
 		u32 slice_end;
@@ -1294,9 +1306,10 @@ wrp_alu32_imm(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
 	      enum alu_op alu_op)
 {
 	const struct bpf_insn *insn = &meta->insn;
+	u8 dst = insn->dst_reg * 2;
 
-	wrp_alu_imm(nfp_prog, insn->dst_reg * 2, alu_op, insn->imm);
-	wrp_immed(nfp_prog, reg_both(insn->dst_reg * 2 + 1), 0);
+	wrp_alu_imm(nfp_prog, dst, alu_op, insn->imm);
+	wrp_zext(nfp_prog, meta, dst);
 
 	return 0;
 }
@@ -1308,7 +1321,7 @@ wrp_alu32_reg(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
 	u8 dst = meta->insn.dst_reg * 2, src = meta->insn.src_reg * 2;
 
 	emit_alu(nfp_prog, reg_both(dst), reg_a(dst), alu_op, reg_b(src));
-	wrp_immed(nfp_prog, reg_both(meta->insn.dst_reg * 2 + 1), 0);
+	wrp_zext(nfp_prog, meta, dst);
 
 	return 0;
 }
@@ -2385,12 +2398,14 @@ static int neg_reg(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 	u8 dst = meta->insn.dst_reg * 2;
 
 	emit_alu(nfp_prog, reg_both(dst), reg_imm(0), ALU_OP_SUB, reg_b(dst));
-	wrp_immed(nfp_prog, reg_both(meta->insn.dst_reg * 2 + 1), 0);
+	wrp_zext(nfp_prog, meta, dst);
 
 	return 0;
 }
 
-static int __ashr_imm(struct nfp_prog *nfp_prog, u8 dst, u8 shift_amt)
+static int
+__ashr_imm(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta, u8 dst,
+	   u8 shift_amt)
 {
 	if (shift_amt) {
 		/* Set signedness bit (MSB of result). */
@@ -2399,7 +2414,7 @@ static int __ashr_imm(struct nfp_prog *nfp_prog, u8 dst, u8 shift_amt)
 		emit_shf(nfp_prog, reg_both(dst), reg_none(), SHF_OP_ASHR,
 			 reg_b(dst), SHF_SC_R_SHF, shift_amt);
 	}
-	wrp_immed(nfp_prog, reg_both(dst + 1), 0);
+	wrp_zext(nfp_prog, meta, dst);
 
 	return 0;
 }
@@ -2414,7 +2429,7 @@ static int ashr_reg(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 	umin = meta->umin_src;
 	umax = meta->umax_src;
 	if (umin == umax)
-		return __ashr_imm(nfp_prog, dst, umin);
+		return __ashr_imm(nfp_prog, meta, dst, umin);
 
 	src = insn->src_reg * 2;
 	/* NOTE: the first insn will set both indirect shift amount (source A)
@@ -2423,7 +2438,7 @@ static int ashr_reg(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 	emit_alu(nfp_prog, reg_none(), reg_a(src), ALU_OP_OR, reg_b(dst));
 	emit_shf_indir(nfp_prog, reg_both(dst), reg_none(), SHF_OP_ASHR,
 		       reg_b(dst), SHF_SC_R_SHF);
-	wrp_immed(nfp_prog, reg_both(dst + 1), 0);
+	wrp_zext(nfp_prog, meta, dst);
 
 	return 0;
 }
@@ -2433,15 +2448,17 @@ static int ashr_imm(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 	const struct bpf_insn *insn = &meta->insn;
 	u8 dst = insn->dst_reg * 2;
 
-	return __ashr_imm(nfp_prog, dst, insn->imm);
+	return __ashr_imm(nfp_prog, meta, dst, insn->imm);
 }
 
-static int __shr_imm(struct nfp_prog *nfp_prog, u8 dst, u8 shift_amt)
+static int
+__shr_imm(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta, u8 dst,
+	  u8 shift_amt)
 {
 	if (shift_amt)
 		emit_shf(nfp_prog, reg_both(dst), reg_none(), SHF_OP_NONE,
 			 reg_b(dst), SHF_SC_R_SHF, shift_amt);
-	wrp_immed(nfp_prog, reg_both(dst + 1), 0);
+	wrp_zext(nfp_prog, meta, dst);
 	return 0;
 }
 
@@ -2450,7 +2467,7 @@ static int shr_imm(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 	const struct bpf_insn *insn = &meta->insn;
 	u8 dst = insn->dst_reg * 2;
 
-	return __shr_imm(nfp_prog, dst, insn->imm);
+	return __shr_imm(nfp_prog, meta, dst, insn->imm);
 }
 
 static int shr_reg(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
@@ -2463,22 +2480,24 @@ static int shr_reg(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 	umin = meta->umin_src;
 	umax = meta->umax_src;
 	if (umin == umax)
-		return __shr_imm(nfp_prog, dst, umin);
+		return __shr_imm(nfp_prog, meta, dst, umin);
 
 	src = insn->src_reg * 2;
 	emit_alu(nfp_prog, reg_none(), reg_a(src), ALU_OP_OR, reg_imm(0));
 	emit_shf_indir(nfp_prog, reg_both(dst), reg_none(), SHF_OP_NONE,
 		       reg_b(dst), SHF_SC_R_SHF);
-	wrp_immed(nfp_prog, reg_both(dst + 1), 0);
+	wrp_zext(nfp_prog, meta, dst);
 	return 0;
 }
 
-static int __shl_imm(struct nfp_prog *nfp_prog, u8 dst, u8 shift_amt)
+static int
+__shl_imm(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta, u8 dst,
+	  u8 shift_amt)
 {
 	if (shift_amt)
 		emit_shf(nfp_prog, reg_both(dst), reg_none(), SHF_OP_NONE,
 			 reg_b(dst), SHF_SC_L_SHF, shift_amt);
-	wrp_immed(nfp_prog, reg_both(dst + 1), 0);
+	wrp_zext(nfp_prog, meta, dst);
 	return 0;
 }
 
@@ -2487,7 +2506,7 @@ static int shl_imm(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 	const struct bpf_insn *insn = &meta->insn;
 	u8 dst = insn->dst_reg * 2;
 
-	return __shl_imm(nfp_prog, dst, insn->imm);
+	return __shl_imm(nfp_prog, meta, dst, insn->imm);
 }
 
 static int shl_reg(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
@@ -2500,11 +2519,11 @@ static int shl_reg(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 	umin = meta->umin_src;
 	umax = meta->umax_src;
 	if (umin == umax)
-		return __shl_imm(nfp_prog, dst, umin);
+		return __shl_imm(nfp_prog, meta, dst, umin);
 
 	src = insn->src_reg * 2;
 	shl_reg64_lt32_low(nfp_prog, dst, src);
-	wrp_immed(nfp_prog, reg_both(dst + 1), 0);
+	wrp_zext(nfp_prog, meta, dst);
 	return 0;
 }
 
@@ -2566,34 +2585,34 @@ static int imm_ld8(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 
 static int data_ld1(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 {
-	return construct_data_ld(nfp_prog, meta->insn.imm, 1);
+	return construct_data_ld(nfp_prog, meta, meta->insn.imm, 1);
 }
 
 static int data_ld2(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 {
-	return construct_data_ld(nfp_prog, meta->insn.imm, 2);
+	return construct_data_ld(nfp_prog, meta, meta->insn.imm, 2);
 }
 
 static int data_ld4(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 {
-	return construct_data_ld(nfp_prog, meta->insn.imm, 4);
+	return construct_data_ld(nfp_prog, meta, meta->insn.imm, 4);
 }
 
 static int data_ind_ld1(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 {
-	return construct_data_ind_ld(nfp_prog, meta->insn.imm,
+	return construct_data_ind_ld(nfp_prog, meta, meta->insn.imm,
 				     meta->insn.src_reg * 2, 1);
 }
 
 static int data_ind_ld2(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 {
-	return construct_data_ind_ld(nfp_prog, meta->insn.imm,
+	return construct_data_ind_ld(nfp_prog, meta, meta->insn.imm,
 				     meta->insn.src_reg * 2, 2);
 }
 
 static int data_ind_ld4(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 {
-	return construct_data_ind_ld(nfp_prog, meta->insn.imm,
+	return construct_data_ind_ld(nfp_prog, meta, meta->insn.imm,
 				     meta->insn.src_reg * 2, 4);
 }
 
@@ -2671,7 +2690,7 @@ mem_ldx_data(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
 
 	tmp_reg = re_load_imm_any(nfp_prog, meta->insn.off, imm_b(nfp_prog));
 
-	return data_ld_host_order_addr32(nfp_prog, meta->insn.src_reg * 2,
+	return data_ld_host_order_addr32(nfp_prog, meta, meta->insn.src_reg * 2,
 					 tmp_reg, meta->insn.dst_reg * 2, size);
 }
 
@@ -2683,7 +2702,7 @@ mem_ldx_emem(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
 
 	tmp_reg = re_load_imm_any(nfp_prog, meta->insn.off, imm_b(nfp_prog));
 
-	return data_ld_host_order_addr40(nfp_prog, meta->insn.src_reg * 2,
+	return data_ld_host_order_addr40(nfp_prog, meta, meta->insn.src_reg * 2,
 					 tmp_reg, meta->insn.dst_reg * 2, size);
 }
 
@@ -2744,7 +2763,7 @@ mem_ldx_data_from_pktcache_unaligned(struct nfp_prog *nfp_prog,
 	wrp_reg_subpart(nfp_prog, dst_lo, src_lo, len_lo, off);
 
 	if (!len_mid) {
-		wrp_immed(nfp_prog, dst_hi, 0);
+		wrp_zext(nfp_prog, meta, dst_gpr);
 		return 0;
 	}
 
@@ -2752,7 +2771,7 @@ mem_ldx_data_from_pktcache_unaligned(struct nfp_prog *nfp_prog,
 
 	if (size <= REG_WIDTH) {
 		wrp_reg_or_subpart(nfp_prog, dst_lo, src_mid, len_mid, len_lo);
-		wrp_immed(nfp_prog, dst_hi, 0);
+		wrp_zext(nfp_prog, meta, dst_gpr);
 	} else {
 		swreg src_hi = reg_xfer(idx + 2);
 
@@ -2783,10 +2802,10 @@ mem_ldx_data_from_pktcache_aligned(struct nfp_prog *nfp_prog,
 
 	if (size < REG_WIDTH) {
 		wrp_reg_subpart(nfp_prog, dst_lo, src_lo, size, 0);
-		wrp_immed(nfp_prog, dst_hi, 0);
+		wrp_zext(nfp_prog, meta, dst_gpr);
 	} else if (size == REG_WIDTH) {
 		wrp_mov(nfp_prog, dst_lo, src_lo);
-		wrp_immed(nfp_prog, dst_hi, 0);
+		wrp_zext(nfp_prog, meta, dst_gpr);
 	} else {
 		swreg src_hi = reg_xfer(idx + 1);
 
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h b/drivers/net/ethernet/netronome/nfp/bpf/main.h
index e54d1ac..57d6ff5 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.h
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h
@@ -238,6 +238,8 @@ struct nfp_bpf_reg_state {
 #define FLAG_INSN_SKIP_PREC_DEPENDENT		BIT(4)
 /* Instruction is optimized by the verifier */
 #define FLAG_INSN_SKIP_VERIFIER_OPT		BIT(5)
+/* Instruction needs to zero extend to high 32-bit */
+#define FLAG_INSN_DO_ZEXT			BIT(6)
 
 #define FLAG_INSN_SKIP_MASK		(FLAG_INSN_SKIP_NOOP | \
 					 FLAG_INSN_SKIP_PREC_DEPENDENT | \
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
index 36f56eb..e92ee51 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
@@ -744,6 +744,17 @@ static unsigned int nfp_bpf_get_stack_usage(struct nfp_prog *nfp_prog)
 	goto continue_subprog;
 }
 
+static void nfp_bpf_insn_flag_zext(struct nfp_prog *nfp_prog,
+				   struct bpf_insn_aux_data *aux)
+{
+	struct nfp_insn_meta *meta;
+
+	list_for_each_entry(meta, &nfp_prog->insns, l) {
+		if (aux[meta->n].zext_dst)
+			meta->flags |= FLAG_INSN_DO_ZEXT;
+	}
+}
+
 int nfp_bpf_finalize(struct bpf_verifier_env *env)
 {
 	struct bpf_subprog_info *info;
@@ -784,6 +795,7 @@ int nfp_bpf_finalize(struct bpf_verifier_env *env)
 		return -EOPNOTSUPP;
 	}
 
+	nfp_bpf_insn_flag_zext(nfp_prog, env->insn_aux_data);
 	return 0;
 }
 
-- 
2.7.4
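
The hunks above replace the unconditional wrp_immed(..., reg_both(dst + 1), 0)
clears of the high GPR with calls to wrp_zext(); the helper's definition sits
in an earlier hunk of this patch that is not quoted here. A minimal sketch of
the intended behavior, assuming it keys off the new FLAG_INSN_DO_ZEXT flag set
by nfp_bpf_insn_flag_zext() in verifier.c above:

  /* Sketch only: clear the high 32-bit GPR when the verifier marked this
   * instruction as needing zero extension, otherwise emit nothing.
   */
  static void
  wrp_zext(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta, u8 dst)
  {
  	if (meta->flags & FLAG_INSN_DO_ZEXT)
  		wrp_immed(nfp_prog, reg_both(dst + 1), 0);
  }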


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH v6 bpf-next 13/17] s390: bpf: eliminate zero extension code-gen
  2019-05-03 10:42 ` [PATCH v6 bpf-next 13/17] s390: " Jiong Wang
@ 2019-05-03 13:41   ` Heiko Carstens
  2019-05-03 13:50     ` Eric Dumazet
  2019-05-03 14:09     ` Jiong Wang
  0 siblings, 2 replies; 40+ messages in thread
From: Heiko Carstens @ 2019-05-03 13:41 UTC (permalink / raw)
  To: Jiong Wang
  Cc: alexei.starovoitov, daniel, bpf, netdev, oss-drivers, Martin Schwidefsky

On Fri, May 03, 2019 at 11:42:40AM +0100, Jiong Wang wrote:
> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
> ---
>  arch/s390/net/bpf_jit_comp.c | 20 +++++++++++++++++---
>  1 file changed, 17 insertions(+), 3 deletions(-)

When sending patches which affect s390, could you please add Martin
and me on cc to _all_ patches? We now received only the cover-letter
plus one patch. It's always hard in such circumstances to figure out if
the code is doing the right thing.

Usually I end up looking up the missing patches within other mailing
lists; however, I haven't subscribed to the bpf and netdev mailing lists.

The extra e-mail volume because of being added to CC really doesn't
matter at all.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v6 bpf-next 13/17] s390: bpf: eliminate zero extension code-gen
  2019-05-03 13:41   ` Heiko Carstens
@ 2019-05-03 13:50     ` Eric Dumazet
  2019-05-03 14:09     ` Jiong Wang
  1 sibling, 0 replies; 40+ messages in thread
From: Eric Dumazet @ 2019-05-03 13:50 UTC (permalink / raw)
  To: Heiko Carstens, Jiong Wang
  Cc: alexei.starovoitov, daniel, bpf, netdev, oss-drivers, Martin Schwidefsky



On 5/3/19 9:41 AM, Heiko Carstens wrote:
> On Fri, May 03, 2019 at 11:42:40AM +0100, Jiong Wang wrote:
>> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
>> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
>> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
>> ---
>>  arch/s390/net/bpf_jit_comp.c | 20 +++++++++++++++++---
>>  1 file changed, 17 insertions(+), 3 deletions(-)
> 
> When sending patches which affect s390, could you please add Martin
> and me on cc to _all_ patches? We now received only the cover-letter
> plus one patch. It's always hard in such circumstances to figure out if
> the code is doing the right thing.
> 
>
One possible way is to use the --signed-off-by-cc option in git send-email

       --[no-]signed-off-by-cc
           If this is set, add emails found in Signed-off-by: or Cc: lines to the cc list.
           Default is the value of sendemail.signedoffbycc configuration value; if that is
           unspecified, default to --signed-off-by-cc.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v6 bpf-next 13/17] s390: bpf: eliminate zero extension code-gen
  2019-05-03 13:41   ` Heiko Carstens
  2019-05-03 13:50     ` Eric Dumazet
@ 2019-05-03 14:09     ` Jiong Wang
  1 sibling, 0 replies; 40+ messages in thread
From: Jiong Wang @ 2019-05-03 14:09 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Jiong Wang, alexei.starovoitov, daniel, bpf, netdev, oss-drivers,
	Martin Schwidefsky


Heiko Carstens writes:

> On Fri, May 03, 2019 at 11:42:40AM +0100, Jiong Wang wrote:
>> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
>> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
>> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
>> ---
>>  arch/s390/net/bpf_jit_comp.c | 20 +++++++++++++++++---
>>  1 file changed, 17 insertions(+), 3 deletions(-)
>
> When sending patches which affect s390, could you please add Martin
> and me on cc to _all_ patches? We now received only the cover-letter
> plus one patch. It's always hard in such circumstances to figure out if
> the code is doing the right thing.

OK, will do it next time.

Will just CC back-end maintainers on all patches, including patches for
the other back-ends, to keep the information complete.

Regards,
Jiong

> Usually I end up looking up the missing patches within other mailing
> lists, however I haven't subscribed the bpf and netdev mailing lists.
>
> The extra e-mail volume because of being added to CC really doesn't
> matter at all.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v6 bpf-next 02/17] bpf: verifier: mark verified-insn with sub-register zext flag
  2019-05-03 10:42 ` [PATCH v6 bpf-next 02/17] bpf: verifier: mark verified-insn with sub-register zext flag Jiong Wang
@ 2019-05-06 13:49   ` Daniel Borkmann
  2019-05-06 14:49     ` Daniel Borkmann
  2019-05-06 22:14     ` Jiong Wang
  0 siblings, 2 replies; 40+ messages in thread
From: Daniel Borkmann @ 2019-05-06 13:49 UTC (permalink / raw)
  To: Jiong Wang, alexei.starovoitov; +Cc: bpf, netdev, oss-drivers

On 05/03/2019 12:42 PM, Jiong Wang wrote:
> eBPF ISA specification requires high 32-bit cleared when low 32-bit
> sub-register is written. This applies to destination register of ALU32 etc.
> JIT back-ends must guarantee this semantic when doing code-gen.
> 
> x86-64 and arm64 ISA has the same semantic, so the corresponding JIT
> back-end doesn't need to do extra work. However, 32-bit arches (arm, nfp
> etc.) and some other 64-bit arches (powerpc, sparc etc), need explicit zero
> extension sequence to meet such semantic.
> 
> This is important, because for code the following:
> 
>   u64_value = (u64) u32_value
>   ... other uses of u64_value
> 
> compiler could exploit the semantic described above and save those zero
> extensions for extending u32_value to u64_value. Hardware, runtime, or BPF
> JIT back-ends, are responsible for guaranteeing this. Some benchmarks show
> ~40% sub-register writes out of total insns, meaning ~40% extra code-gen (
> could go up to more for some arches which requires two shifts for zero
> extension) because JIT back-end needs to do extra code-gen for all such
> instructions.
> 
> However this is not always necessary in case u32_value is never cast into
> a u64, which is quite normal in real life program. So, it would be really
> good if we could identify those places where such type cast happened, and
> only do zero extensions for them, not for the others. This could save a lot
> of BPF code-gen.
> 
> Algo:
>  - Split read flags into READ32 and READ64.
> 
>  - Record indices of instructions that do sub-register def (write). And
>    these indices need to stay with reg state so path pruning and bpf
>    to bpf function call could be handled properly.
> 
>    These indices are kept up to date while doing insn walk.
> 
>  - A full register read on an active sub-register def marks the def insn as
>    needing zero extension on dst register.
> 
>  - A new sub-register write overrides the old one.
> 
>    A new full register write makes the register free of zero extension on
>    dst register.
> 
>  - When propagating read64 during path pruning, also marks def insns whose
>    defs are hanging active sub-register.
> 
> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>

[...]
> +/* This function is supposed to be used by the following 32-bit optimization
> + * code only. It returns TRUE if the source or destination register operates
> + * on 64-bit, otherwise return FALSE.
> + */
> +static bool is_reg64(struct bpf_verifier_env *env, struct bpf_insn *insn,
> +		     u32 regno, struct bpf_reg_state *reg, enum reg_arg_type t)
> +{
> +	u8 code, class, op;
> +
> +	code = insn->code;
> +	class = BPF_CLASS(code);
> +	op = BPF_OP(code);
> +	if (class == BPF_JMP) {
> +		/* BPF_EXIT for "main" will reach here. Return TRUE
> +		 * conservatively.
> +		 */
> +		if (op == BPF_EXIT)
> +			return true;
> +		if (op == BPF_CALL) {
> +			/* BPF to BPF call will reach here because of marking
> +			 * caller saved clobber with DST_OP_NO_MARK for which we
> +			 * don't care the register def because they are anyway
> +			 * marked as NOT_INIT already.
> +			 */
> +			if (insn->src_reg == BPF_PSEUDO_CALL)
> +				return false;
> +			/* Helper call will reach here because of arg type
> +			 * check.
> +			 */
> +			if (t == SRC_OP)
> +				return helper_call_arg64(env, insn->imm, regno);
> +
> +			return false;
> +		}
> +	}
> +
> +	if (class == BPF_ALU64 || class == BPF_JMP ||
> +	    /* BPF_END always use BPF_ALU class. */
> +	    (class == BPF_ALU && op == BPF_END && insn->imm == 64))
> +		return true;

For the BPF_JMP + JA case we don't look at registers, but I presume here
we 'pretend' to use 64 bit regs to be more conservative as verifier would
otherwise need to do more complex analysis at the jump target wrt zero
extension, correct?

> +
> +	if (class == BPF_ALU || class == BPF_JMP32)
> +		return false;
> +
> +	if (class == BPF_LDX) {
> +		if (t != SRC_OP)
> +			return BPF_SIZE(code) == BPF_DW;
> +		/* LDX source must be ptr. */
> +		return true;
> +	}
> +
> +	if (class == BPF_STX) {
> +		if (reg->type != SCALAR_VALUE)
> +			return true;
> +		return BPF_SIZE(code) == BPF_DW;
> +	}
> +
> +	if (class == BPF_LD) {
> +		u8 mode = BPF_MODE(code);
> +
> +		/* LD_IMM64 */
> +		if (mode == BPF_IMM)
> +			return true;
> +
> +		/* Both LD_IND and LD_ABS return 32-bit data. */
> +		if (t != SRC_OP)
> +			return  false;
> +
> +		/* Implicit ctx ptr. */
> +		if (regno == BPF_REG_6)
> +			return true;
> +
> +		/* Explicit source could be any width. */
> +		return true;
> +	}
> +
> +	if (class == BPF_ST)
> +		/* The only source register for BPF_ST is a ptr. */
> +		return true;
> +
> +	/* Conservatively return true at default. */
> +	return true;
> +}

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v6 bpf-next 01/17] bpf: verifier: offer more accurate helper function arg and return type
  2019-05-03 10:42 ` [PATCH v6 bpf-next 01/17] bpf: verifier: offer more accurate helper function arg and return type Jiong Wang
@ 2019-05-06 13:57   ` Daniel Borkmann
  2019-05-06 22:25     ` Jiong Wang
  2019-05-06 15:50   ` Alexei Starovoitov
  1 sibling, 1 reply; 40+ messages in thread
From: Daniel Borkmann @ 2019-05-06 13:57 UTC (permalink / raw)
  To: Jiong Wang, alexei.starovoitov; +Cc: bpf, netdev, oss-drivers

On 05/03/2019 12:42 PM, Jiong Wang wrote:
> BPF helper call transfers execution from eBPF insns to native functions
> while verifier insn walker only walks eBPF insns. So, verifier can only
> knows argument and return value types from explicit helper function
> prototype descriptions.
> 
> For 32-bit optimization, it is important to know whether argument (register
> use from eBPF insn) and return value (register define from external
> function) is 32-bit or 64-bit, so corresponding registers could be
> zero-extended correctly.
> 
> For arguments, they are register uses, we conservatively treat all of them
> as 64-bit at default, while the following new bpf_arg_type are added so we
> could start to mark those frequently used helper functions with more
> accurate argument type.
> 
>   ARG_CONST_SIZE32
>   ARG_CONST_SIZE32_OR_ZERO

For the above two, I was wondering: is there a case where the passed size
is not used as 32 bit? That is, couldn't we generally assume 32 bit here
without adding these two extra arg types? For ARG_ANYTHING32 and
RET_INTEGER64, these definitely make sense (btw, an opt-in value like
RET_INTEGER32 might have been easier for reviewing the converted helpers).

>   ARG_ANYTHING32
> 
> A few helper functions shown up frequently inside Cilium bpf program are
> updated using these new types.
> 
> For return values, they are register defs, we need to know accurate width
> for correct zero extensions. Given most of the helper functions returning
> integers return 32-bit value, a new RET_INTEGER64 is added to make those
> functions return 64-bit value. All related helper functions are updated.
> 
> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
[...]

> @@ -2003,9 +2003,9 @@ static const struct bpf_func_proto bpf_csum_diff_proto = {
>  	.pkt_access	= true,
>  	.ret_type	= RET_INTEGER,
>  	.arg1_type	= ARG_PTR_TO_MEM_OR_NULL,
> -	.arg2_type	= ARG_CONST_SIZE_OR_ZERO,
> +	.arg2_type	= ARG_CONST_SIZE32_OR_ZERO,
>  	.arg3_type	= ARG_PTR_TO_MEM_OR_NULL,
> -	.arg4_type	= ARG_CONST_SIZE_OR_ZERO,
> +	.arg4_type	= ARG_CONST_SIZE32_OR_ZERO,
>  	.arg5_type	= ARG_ANYTHING,
>  };

I noticed that the above, and also bpf_csum_update(), would need to be
converted to RET_INTEGER64 as they would break otherwise: they return either
an error or a u32 csum value, so the usage for error checking would be
s64 ret = bpf_csum_xyz(...).
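
A minimal usage sketch of that point (variable names illustrative; the key
part is keeping the full signed 64-bit return value for the error check):

  s64 ret;

  ret = bpf_csum_diff(from, from_size, to, to_size, seed);
  if (ret < 0)
  	return ret;		/* negative error code */
  csum = (__u32)ret;		/* otherwise the low 32 bits hold the csum */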

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v6 bpf-next 02/17] bpf: verifier: mark verified-insn with sub-register zext flag
  2019-05-06 13:49   ` Daniel Borkmann
@ 2019-05-06 14:49     ` Daniel Borkmann
  2019-05-06 22:14     ` Jiong Wang
  1 sibling, 0 replies; 40+ messages in thread
From: Daniel Borkmann @ 2019-05-06 14:49 UTC (permalink / raw)
  To: Jiong Wang, alexei.starovoitov; +Cc: bpf, netdev, oss-drivers

On 05/06/2019 03:49 PM, Daniel Borkmann wrote:
> On 05/03/2019 12:42 PM, Jiong Wang wrote:
>> eBPF ISA specification requires high 32-bit cleared when low 32-bit
>> sub-register is written. This applies to destination register of ALU32 etc.
>> JIT back-ends must guarantee this semantic when doing code-gen.
>>
>> x86-64 and arm64 ISA has the same semantic, so the corresponding JIT
>> back-end doesn't need to do extra work. However, 32-bit arches (arm, nfp
>> etc.) and some other 64-bit arches (powerpc, sparc etc), need explicit zero
>> extension sequence to meet such semantic.
>>
>> This is important, because for code the following:
>>
>>   u64_value = (u64) u32_value
>>   ... other uses of u64_value
>>
>> compiler could exploit the semantic described above and save those zero
>> extensions for extending u32_value to u64_value. Hardware, runtime, or BPF
>> JIT back-ends, are responsible for guaranteeing this. Some benchmarks show
>> ~40% sub-register writes out of total insns, meaning ~40% extra code-gen (
>> could go up to more for some arches which requires two shifts for zero
>> extension) because JIT back-end needs to do extra code-gen for all such
>> instructions.
>>
>> However this is not always necessary in case u32_value is never cast into
>> a u64, which is quite normal in real life program. So, it would be really
>> good if we could identify those places where such type cast happened, and
>> only do zero extensions for them, not for the others. This could save a lot
>> of BPF code-gen.
>>
>> Algo:
>>  - Split read flags into READ32 and READ64.
>>
>>  - Record indices of instructions that do sub-register def (write). And
>>    these indices need to stay with reg state so path pruning and bpf
>>    to bpf function call could be handled properly.
>>
>>    These indices are kept up to date while doing insn walk.
>>
>>  - A full register read on an active sub-register def marks the def insn as
>>    needing zero extension on dst register.
>>
>>  - A new sub-register write overrides the old one.
>>
>>    A new full register write makes the register free of zero extension on
>>    dst register.
>>
>>  - When propagating read64 during path pruning, also marks def insns whose
>>    defs are hanging active sub-register.
>>
>> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
> 
> [...]
>> +/* This function is supposed to be used by the following 32-bit optimization
>> + * code only. It returns TRUE if the source or destination register operates
>> + * on 64-bit, otherwise return FALSE.
>> + */
>> +static bool is_reg64(struct bpf_verifier_env *env, struct bpf_insn *insn,
>> +		     u32 regno, struct bpf_reg_state *reg, enum reg_arg_type t)
>> +{
>> +	u8 code, class, op;
>> +
>> +	code = insn->code;
>> +	class = BPF_CLASS(code);
>> +	op = BPF_OP(code);
>> +	if (class == BPF_JMP) {
>> +		/* BPF_EXIT for "main" will reach here. Return TRUE
>> +		 * conservatively.
>> +		 */
>> +		if (op == BPF_EXIT)
>> +			return true;
>> +		if (op == BPF_CALL) {
>> +			/* BPF to BPF call will reach here because of marking
>> +			 * caller saved clobber with DST_OP_NO_MARK for which we
>> +			 * don't care the register def because they are anyway
>> +			 * marked as NOT_INIT already.
>> +			 */
>> +			if (insn->src_reg == BPF_PSEUDO_CALL)
>> +				return false;
>> +			/* Helper call will reach here because of arg type
>> +			 * check.
>> +			 */
>> +			if (t == SRC_OP)
>> +				return helper_call_arg64(env, insn->imm, regno);
>> +
>> +			return false;
>> +		}
>> +	}
>> +
>> +	if (class == BPF_ALU64 || class == BPF_JMP ||
>> +	    /* BPF_END always use BPF_ALU class. */
>> +	    (class == BPF_ALU && op == BPF_END && insn->imm == 64))
>> +		return true;
> 
> For the BPF_JMP + JA case we don't look at registers, but I presume here
> we 'pretend' to use 64 bit regs to be more conservative as verifier would
> otherwise need to do more complex analysis at the jump target wrt zero
> extension, correct?

Hmm, scratch that last thought. Shouldn't it behave the same as with the
below class == BPF_JMP32 case?

>> +	if (class == BPF_ALU || class == BPF_JMP32)
>> +		return false;
>> +
>> +	if (class == BPF_LDX) {
>> +		if (t != SRC_OP)
>> +			return BPF_SIZE(code) == BPF_DW;
>> +		/* LDX source must be ptr. */
>> +		return true;
>> +	}
>> +
>> +	if (class == BPF_STX) {
>> +		if (reg->type != SCALAR_VALUE)
>> +			return true;
>> +		return BPF_SIZE(code) == BPF_DW;
>> +	}
>> +
>> +	if (class == BPF_LD) {
>> +		u8 mode = BPF_MODE(code);
>> +
>> +		/* LD_IMM64 */
>> +		if (mode == BPF_IMM)
>> +			return true;
>> +
>> +		/* Both LD_IND and LD_ABS return 32-bit data. */
>> +		if (t != SRC_OP)
>> +			return  false;
>> +
>> +		/* Implicit ctx ptr. */
>> +		if (regno == BPF_REG_6)
>> +			return true;
>> +
>> +		/* Explicit source could be any width. */
>> +		return true;
>> +	}
>> +
>> +	if (class == BPF_ST)
>> +		/* The only source register for BPF_ST is a ptr. */
>> +		return true;
>> +
>> +	/* Conservatively return true at default. */
>> +	return true;
>> +}


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v6 bpf-next 01/17] bpf: verifier: offer more accurate helper function arg and return type
  2019-05-03 10:42 ` [PATCH v6 bpf-next 01/17] bpf: verifier: offer more accurate helper function arg and return type Jiong Wang
  2019-05-06 13:57   ` Daniel Borkmann
@ 2019-05-06 15:50   ` Alexei Starovoitov
  2019-05-08 14:45     ` Jiong Wang
  1 sibling, 1 reply; 40+ messages in thread
From: Alexei Starovoitov @ 2019-05-06 15:50 UTC (permalink / raw)
  To: Jiong Wang; +Cc: daniel, bpf, netdev, oss-drivers

On Fri, May 03, 2019 at 11:42:28AM +0100, Jiong Wang wrote:
> BPF helper call transfers execution from eBPF insns to native functions
> while verifier insn walker only walks eBPF insns. So, verifier can only
> knows argument and return value types from explicit helper function
> prototype descriptions.
> 
> For 32-bit optimization, it is important to know whether argument (register
> use from eBPF insn) and return value (register define from external
> function) is 32-bit or 64-bit, so corresponding registers could be
> zero-extended correctly.
> 
> For arguments, they are register uses, we conservatively treat all of them
> as 64-bit at default, while the following new bpf_arg_type are added so we
> could start to mark those frequently used helper functions with more
> accurate argument type.
> 
>   ARG_CONST_SIZE32
>   ARG_CONST_SIZE32_OR_ZERO
>   ARG_ANYTHING32
> 
> A few helper functions shown up frequently inside Cilium bpf program are
> updated using these new types.
> 
> For return values, they are register defs, we need to know accurate width
> for correct zero extensions. Given most of the helper functions returning
> integers return 32-bit value, a new RET_INTEGER64 is added to make those
> functions return 64-bit value. All related helper functions are updated.
> 
> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
> ---
>  include/linux/bpf.h      |  6 +++++-
>  kernel/bpf/core.c        |  2 +-
>  kernel/bpf/helpers.c     | 10 +++++-----
>  kernel/bpf/verifier.c    | 15 ++++++++++-----
>  kernel/trace/bpf_trace.c |  4 ++--
>  net/core/filter.c        | 38 +++++++++++++++++++-------------------
>  6 files changed, 42 insertions(+), 33 deletions(-)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 9a21848..11a5fb9 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -198,9 +198,12 @@ enum bpf_arg_type {
>  
>  	ARG_CONST_SIZE,		/* number of bytes accessed from memory */
>  	ARG_CONST_SIZE_OR_ZERO,	/* number of bytes accessed from memory or 0 */
> +	ARG_CONST_SIZE32,	/* Likewise, but size fits into 32-bit */
> +	ARG_CONST_SIZE32_OR_ZERO,	/* Ditto */

these two should not be necessary. The program must pass a constant into
the helper. The verifier is already smart enough to see it through.
Looks like patch 2 is using them as a band-aid.

>  	ARG_PTR_TO_CTX,		/* pointer to context */
>  	ARG_ANYTHING,		/* any (initialized) argument is ok */
> +	ARG_ANYTHING32,		/* Likewise, but it is a 32-bit argument */

Such annotation has subtle semantics that I don't think we've explored
in the past and I don't see from commit logs of this patch set that
the necessary analysis was done.
In particular 'int' in the helper argument does not mean that the verifier
will reject 64-bit values. It also doesn't mean that the helper
will reject them at run-time. In most cases it's a silent truncation.
For example, bpf_tail_call will accept a 64-bit value and cast it to u32
before doing the max_entries bounds check.
In other cases it could be signed vs unsigned interpretation inside
the helper.
imo the code base is not ready for semi-automatic re-marking of
helpers with ARG_ANYTHING32 when a helper accepts a 32-bit value.
Definitely not something short term and not a prerequisite for this set.

Longer term... if we introduce ARG_ANYTHING32, what would that mean?
Would the verifier automatically insert zext before calling the helper, so
that we can drop the truncation in the helper? That may be useful.
What about passing a negative value?
ld_imm64 r2, -1
call foo
is a valid program.

>  	ARG_PTR_TO_SPIN_LOCK,	/* pointer to bpf_spin_lock */
>  	ARG_PTR_TO_SOCK_COMMON,	/* pointer to sock_common */
>  	ARG_PTR_TO_INT,		/* pointer to int */
> @@ -210,7 +213,8 @@ enum bpf_arg_type {
>  
>  /* type of values returned from helper functions */
>  enum bpf_return_type {
> -	RET_INTEGER,			/* function returns integer */
> +	RET_INTEGER,			/* function returns 32-bit integer */
> +	RET_INTEGER64,			/* function returns 64-bit integer */

These types of annotations are dangerous too, since they don't consider the
sign of the return value. In the BPF ISA all arguments and return values are
64-bit. When it's a full 64-bit value we don't need to specify the sign,
since the sign bit will be interpreted by the program and the verifier
doesn't need to worry. If we say that a helper returns 32-bit, we need to
state whether it's signed or not for the verifier to analyze it properly.

I think it's best to drop this patch. I don't think the rest of the set
really needs it. It looks to me like a last-minute optimization that I
really wish wasn't there, because we've discussed the rest in depth and the
set was practically ready to land, but now bpf-next is closed.
Please resubmit after it reopens.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v6 bpf-next 04/17] bpf: introduce new alu insn BPF_ZEXT for explicit zero extension
  2019-05-03 10:42 ` [PATCH v6 bpf-next 04/17] bpf: introduce new alu insn BPF_ZEXT for explicit zero extension Jiong Wang
@ 2019-05-06 15:57   ` Alexei Starovoitov
  2019-05-06 23:19     ` Jiong Wang
  0 siblings, 1 reply; 40+ messages in thread
From: Alexei Starovoitov @ 2019-05-06 15:57 UTC (permalink / raw)
  To: Jiong Wang; +Cc: daniel, bpf, netdev, oss-drivers

On Fri, May 03, 2019 at 11:42:31AM +0100, Jiong Wang wrote:
> This patch introduce new alu32 insn BPF_ZEXT, and allocate the unused
> opcode 0xe0 to it.
> 
> Compared with the other alu32 insns, zero extension on low 32-bit is the
> only semantics for this instruction. It also allows various JIT back-ends
> to do optimal zero extension code-gen.
> 
> BPF_ZEXT is supposed to be encoded with BPF_ALU only, and is supposed to be
> generated by the latter 32-bit optimization code inside verifier for those
> arches that do not support hardware implicit zero extension only.
> 
> It is not supposed to be used in user's program directly at the moment.
> Therefore, no need to recognize it inside generic verification code. It
> just need to be supported for execution on interpreter or related JIT
> back-ends.

uapi and the doc define it, but "it is not supposed to be used" ?!

> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
> ---
>  Documentation/networking/filter.txt | 10 ++++++++++
>  include/uapi/linux/bpf.h            |  3 +++
>  kernel/bpf/core.c                   |  4 ++++
>  tools/include/uapi/linux/bpf.h      |  3 +++
>  4 files changed, 20 insertions(+)
> 
> diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
> index 319e5e0..1cb3e42 100644
> --- a/Documentation/networking/filter.txt
> +++ b/Documentation/networking/filter.txt
> @@ -903,6 +903,16 @@ If BPF_CLASS(code) == BPF_ALU or BPF_ALU64 [ in eBPF ], BPF_OP(code) is one of:
>    BPF_MOV   0xb0  /* eBPF only: mov reg to reg */
>    BPF_ARSH  0xc0  /* eBPF only: sign extending shift right */
>    BPF_END   0xd0  /* eBPF only: endianness conversion */
> +  BPF_ZEXT  0xe0  /* eBPF BPF_ALU only: zero-extends low 32-bit */
> +
> +Compared with BPF_ALU | BPF_MOV which zero-extends low 32-bit implicitly,
> +BPF_ALU | BPF_ZEXT zero-extends low 32-bit explicitly. Such zero extension is

wait. that's an excellent observation. alu|mov is exactly it.
we do not need another insn.
we probably can teach the verifier to recognize <<32, >>32 and replace with mov32
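
For reference, the two encodings being compared, sketched with the insn
macros from include/linux/filter.h (register choice arbitrary, assumes the
header is included):

  /* explicit zext via two 64-bit shifts: r2 <<= 32; r2 >>= 32 (logical) */
  struct bpf_insn zext_shifts[] = {
  	BPF_ALU64_IMM(BPF_LSH, BPF_REG_2, 32),
  	BPF_ALU64_IMM(BPF_RSH, BPF_REG_2, 32),
  };

  /* single alu32 mov to self: low 32 bits copied, high 32 bits cleared
   * by the sub-register write semantics
   */
  struct bpf_insn zext_mov32[] = {
  	BPF_MOV32_REG(BPF_REG_2, BPF_REG_2),
  };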


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v6 bpf-next 02/17] bpf: verifier: mark verified-insn with sub-register zext flag
  2019-05-06 13:49   ` Daniel Borkmann
  2019-05-06 14:49     ` Daniel Borkmann
@ 2019-05-06 22:14     ` Jiong Wang
  1 sibling, 0 replies; 40+ messages in thread
From: Jiong Wang @ 2019-05-06 22:14 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: Jiong Wang, alexei.starovoitov, bpf, netdev, oss-drivers


Daniel Borkmann writes:

> On 05/03/2019 12:42 PM, Jiong Wang wrote:
>> eBPF ISA specification requires high 32-bit cleared when low 32-bit
>> sub-register is written. This applies to destination register of ALU32 etc.
>> JIT back-ends must guarantee this semantic when doing code-gen.
>> 
>> x86-64 and arm64 ISA has the same semantic, so the corresponding JIT
>> back-end doesn't need to do extra work. However, 32-bit arches (arm, nfp
>> etc.) and some other 64-bit arches (powerpc, sparc etc), need explicit zero
>> extension sequence to meet such semantic.
>> 
>> This is important, because for code the following:
>> 
>>   u64_value = (u64) u32_value
>>   ... other uses of u64_value
>> 
>> compiler could exploit the semantic described above and save those zero
>> extensions for extending u32_value to u64_value. Hardware, runtime, or BPF
>> JIT back-ends, are responsible for guaranteeing this. Some benchmarks show
>> ~40% sub-register writes out of total insns, meaning ~40% extra code-gen (
>> could go up to more for some arches which requires two shifts for zero
>> extension) because JIT back-end needs to do extra code-gen for all such
>> instructions.
>> 
>> However this is not always necessary in case u32_value is never cast into
>> a u64, which is quite normal in real life program. So, it would be really
>> good if we could identify those places where such type cast happened, and
>> only do zero extensions for them, not for the others. This could save a lot
>> of BPF code-gen.
>> 
>> Algo:
>>  - Split read flags into READ32 and READ64.
>> 
>>  - Record indices of instructions that do sub-register def (write). And
>>    these indices need to stay with reg state so path pruning and bpf
>>    to bpf function call could be handled properly.
>> 
>>    These indices are kept up to date while doing insn walk.
>> 
>>  - A full register read on an active sub-register def marks the def insn as
>>    needing zero extension on dst register.
>> 
>>  - A new sub-register write overrides the old one.
>> 
>>    A new full register write makes the register free of zero extension on
>>    dst register.
>> 
>>  - When propagating read64 during path pruning, also marks def insns whose
>>    defs are hanging active sub-register.
>> 
>> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
>
> [...]
>> +/* This function is supposed to be used by the following 32-bit optimization
>> + * code only. It returns TRUE if the source or destination register operates
>> + * on 64-bit, otherwise return FALSE.
>> + */
>> +static bool is_reg64(struct bpf_verifier_env *env, struct bpf_insn *insn,
>> +		     u32 regno, struct bpf_reg_state *reg, enum reg_arg_type t)
>> +{
>> +	u8 code, class, op;
>> +
>> +	code = insn->code;
>> +	class = BPF_CLASS(code);
>> +	op = BPF_OP(code);
>> +	if (class == BPF_JMP) {
>> +		/* BPF_EXIT for "main" will reach here. Return TRUE
>> +		 * conservatively.
>> +		 */
>> +		if (op == BPF_EXIT)
>> +			return true;
>> +		if (op == BPF_CALL) {
>> +			/* BPF to BPF call will reach here because of marking
>> +			 * caller saved clobber with DST_OP_NO_MARK for which we
>> +			 * don't care the register def because they are anyway
>> +			 * marked as NOT_INIT already.
>> +			 */
>> +			if (insn->src_reg == BPF_PSEUDO_CALL)
>> +				return false;
>> +			/* Helper call will reach here because of arg type
>> +			 * check.
>> +			 */
>> +			if (t == SRC_OP)
>> +				return helper_call_arg64(env, insn->imm, regno);
>> +
>> +			return false;
>> +		}
>> +	}
>> +
>> +	if (class == BPF_ALU64 || class == BPF_JMP ||
>> +	    /* BPF_END always use BPF_ALU class. */
>> +	    (class == BPF_ALU && op == BPF_END && insn->imm == 64))
>> +		return true;
>
> For the BPF_JMP + JA case we don't look at registers, but I presume here
> we 'pretend' to use 64 bit regs to be more conservative as verifier would
> otherwise need to do more complex analysis at the jump target wrt zero
> extension, correct?

is_reg64 is only called for instructions that define or use a register
value. BPF_JMP + JA has no register define or use, so it won't define a
sub-register, nor does it have a potential 64-bit read that would cause the
associated 32-bit sub-register to be marked as needing zero extension.

So, BPF_JMP + JA actually doesn't matter to the 32-bit opt and won't
trigger is_reg64.

Regards,
Jiong

>
>> +
>> +	if (class == BPF_ALU || class == BPF_JMP32)
>> +		return false;
>> +
>> +	if (class == BPF_LDX) {
>> +		if (t != SRC_OP)
>> +			return BPF_SIZE(code) == BPF_DW;
>> +		/* LDX source must be ptr. */
>> +		return true;
>> +	}
>> +
>> +	if (class == BPF_STX) {
>> +		if (reg->type != SCALAR_VALUE)
>> +			return true;
>> +		return BPF_SIZE(code) == BPF_DW;
>> +	}
>> +
>> +	if (class == BPF_LD) {
>> +		u8 mode = BPF_MODE(code);
>> +
>> +		/* LD_IMM64 */
>> +		if (mode == BPF_IMM)
>> +			return true;
>> +
>> +		/* Both LD_IND and LD_ABS return 32-bit data. */
>> +		if (t != SRC_OP)
>> +			return  false;
>> +
>> +		/* Implicit ctx ptr. */
>> +		if (regno == BPF_REG_6)
>> +			return true;
>> +
>> +		/* Explicit source could be any width. */
>> +		return true;
>> +	}
>> +
>> +	if (class == BPF_ST)
>> +		/* The only source register for BPF_ST is a ptr. */
>> +		return true;
>> +
>> +	/* Conservatively return true at default. */
>> +	return true;
>> +}


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v6 bpf-next 01/17] bpf: verifier: offer more accurate helper function arg and return type
  2019-05-06 13:57   ` Daniel Borkmann
@ 2019-05-06 22:25     ` Jiong Wang
  2019-05-08 11:12       ` Jiong Wang
  0 siblings, 1 reply; 40+ messages in thread
From: Jiong Wang @ 2019-05-06 22:25 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: Jiong Wang, alexei.starovoitov, bpf, netdev, oss-drivers


Daniel Borkmann writes:

> On 05/03/2019 12:42 PM, Jiong Wang wrote:
>> BPF helper call transfers execution from eBPF insns to native functions
>> while verifier insn walker only walks eBPF insns. So, verifier can only
>> knows argument and return value types from explicit helper function
>> prototype descriptions.
>> 
>> For 32-bit optimization, it is important to know whether argument (register
>> use from eBPF insn) and return value (register define from external
>> function) is 32-bit or 64-bit, so corresponding registers could be
>> zero-extended correctly.
>> 
>> For arguments, they are register uses, we conservatively treat all of them
>> as 64-bit at default, while the following new bpf_arg_type are added so we
>> could start to mark those frequently used helper functions with more
>> accurate argument type.
>> 
>>   ARG_CONST_SIZE32
>>   ARG_CONST_SIZE32_OR_ZERO
>
> For the above two, I was wondering is there a case where the passed size is
> not used as 32 bit aka couldn't we generally assume 32 bit here w/o adding
> these two extra arg types?

Will give a detailed reply tomorrow. IIRC there was. I was benchmarking
bpf_lxc and found it contains quite a few helper calls which generate a
fair percentage of unnecessary zext on parameters.

> For ARG_ANYTHING32 and RET_INTEGER64 definitely
> makes sense (btw, opt-in value like RET_INTEGER32 might have been easier for
> reviewing converted helpers).
>
>>   ARG_ANYTHING32
>> 
>> A few helper functions shown up frequently inside Cilium bpf program are
>> updated using these new types.
>> 
>> For return values, they are register defs, we need to know accurate width
>> for correct zero extensions. Given most of the helper functions returning
>> integers return 32-bit value, a new RET_INTEGER64 is added to make those
>> functions return 64-bit value. All related helper functions are updated.
>> 
>> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
> [...]
>
>> @@ -2003,9 +2003,9 @@ static const struct bpf_func_proto bpf_csum_diff_proto = {
>>  	.pkt_access	= true,
>>  	.ret_type	= RET_INTEGER,
>>  	.arg1_type	= ARG_PTR_TO_MEM_OR_NULL,
>> -	.arg2_type	= ARG_CONST_SIZE_OR_ZERO,
>> +	.arg2_type	= ARG_CONST_SIZE32_OR_ZERO,
>>  	.arg3_type	= ARG_PTR_TO_MEM_OR_NULL,
>> -	.arg4_type	= ARG_CONST_SIZE_OR_ZERO,
>> +	.arg4_type	= ARG_CONST_SIZE32_OR_ZERO,
>>  	.arg5_type	= ARG_ANYTHING,
>>  };
>
> I noticed that the above and also bpf_csum_update() would need to be converted
> to RET_INTEGER64 as they would break otherwise: it's returning error but also
> u32 csum value, so use for error checking would be s64 ret =
> bpf_csum_xyz(...).

Ack.

(I did search ^u64 inside the uapi header, should also search ^s64, will
double-check all changes)

>
> Thanks,
> Daniel


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v6 bpf-next 04/17] bpf: introduce new alu insn BPF_ZEXT for explicit zero extension
  2019-05-06 15:57   ` Alexei Starovoitov
@ 2019-05-06 23:19     ` Jiong Wang
  2019-05-07  4:29       ` Jiong Wang
  0 siblings, 1 reply; 40+ messages in thread
From: Jiong Wang @ 2019-05-06 23:19 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Jiong Wang, daniel, bpf, netdev, oss-drivers


Alexei Starovoitov writes:

> On Fri, May 03, 2019 at 11:42:31AM +0100, Jiong Wang wrote:
>> This patch introduce new alu32 insn BPF_ZEXT, and allocate the unused
>> opcode 0xe0 to it.
>> 
>> Compared with the other alu32 insns, zero extension on low 32-bit is the
>> only semantics for this instruction. It also allows various JIT back-ends
>> to do optimal zero extension code-gen.
>> 
>> BPF_ZEXT is supposed to be encoded with BPF_ALU only, and is supposed to be
>> generated by the latter 32-bit optimization code inside verifier for those
>> arches that do not support hardware implicit zero extension only.
>> 
>> It is not supposed to be used in user's program directly at the moment.
>> Therefore, no need to recognize it inside generic verification code. It
>> just need to be supported for execution on interpreter or related JIT
>> back-ends.
>
> uapi and the doc define it, but "it is not supposed to be used" ?!
>
>> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
>> ---
>>  Documentation/networking/filter.txt | 10 ++++++++++
>>  include/uapi/linux/bpf.h            |  3 +++
>>  kernel/bpf/core.c                   |  4 ++++
>>  tools/include/uapi/linux/bpf.h      |  3 +++
>>  4 files changed, 20 insertions(+)
>> 
>> diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
>> index 319e5e0..1cb3e42 100644
>> --- a/Documentation/networking/filter.txt
>> +++ b/Documentation/networking/filter.txt
>> @@ -903,6 +903,16 @@ If BPF_CLASS(code) == BPF_ALU or BPF_ALU64 [ in eBPF ], BPF_OP(code) is one of:
>>    BPF_MOV   0xb0  /* eBPF only: mov reg to reg */
>>    BPF_ARSH  0xc0  /* eBPF only: sign extending shift right */
>>    BPF_END   0xd0  /* eBPF only: endianness conversion */
>> +  BPF_ZEXT  0xe0  /* eBPF BPF_ALU only: zero-extends low 32-bit */
>> +
>> +Compared with BPF_ALU | BPF_MOV which zero-extends low 32-bit implicitly,
>> +BPF_ALU | BPF_ZEXT zero-extends low 32-bit explicitly. Such zero extension is
>
> wait. that's an excellent observation. alu|mov is exactly it.
> we do not need another insn.
> we probably can teach the verifier to recognize <<32, >>32 and replace
> with mov32

Hmm, I am silly: in v6, patched insns will be conservatively marked as
always needing zext, so it looks like there is no problem just inserting
mov32 as the zext. But some backends need a minor opt, because this will be
a special mov with the same src and dst: it only needs to clear the high
32-bit, no actual mov is needed.

Regards,
Jiong


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v6 bpf-next 04/17] bpf: introduce new alu insn BPF_ZEXT for explicit zero extension
  2019-05-06 23:19     ` Jiong Wang
@ 2019-05-07  4:29       ` Jiong Wang
  2019-05-07  4:40         ` Alexei Starovoitov
  0 siblings, 1 reply; 40+ messages in thread
From: Jiong Wang @ 2019-05-07  4:29 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: daniel, bpf, netdev, oss-drivers


Jiong Wang writes:

> Alexei Starovoitov writes:
>
>> On Fri, May 03, 2019 at 11:42:31AM +0100, Jiong Wang wrote:
>>> This patch introduce new alu32 insn BPF_ZEXT, and allocate the unused
>>> opcode 0xe0 to it.
>>> 
>>> Compared with the other alu32 insns, zero extension on low 32-bit is the
>>> only semantics for this instruction. It also allows various JIT back-ends
>>> to do optimal zero extension code-gen.
>>> 
>>> BPF_ZEXT is supposed to be encoded with BPF_ALU only, and is supposed to be
>>> generated by the latter 32-bit optimization code inside verifier for those
>>> arches that do not support hardware implicit zero extension only.
>>> 
>>> It is not supposed to be used in user's program directly at the moment.
>>> Therefore, no need to recognize it inside generic verification code. It
>>> just need to be supported for execution on interpreter or related JIT
>>> back-ends.
>>
>> uapi and the doc define it, but "it is not supposed to be used" ?!
>>
>>> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
>>> ---
>>>  Documentation/networking/filter.txt | 10 ++++++++++
>>>  include/uapi/linux/bpf.h            |  3 +++
>>>  kernel/bpf/core.c                   |  4 ++++
>>>  tools/include/uapi/linux/bpf.h      |  3 +++
>>>  4 files changed, 20 insertions(+)
>>> 
>>> diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
>>> index 319e5e0..1cb3e42 100644
>>> --- a/Documentation/networking/filter.txt
>>> +++ b/Documentation/networking/filter.txt
>>> @@ -903,6 +903,16 @@ If BPF_CLASS(code) == BPF_ALU or BPF_ALU64 [ in eBPF ], BPF_OP(code) is one of:
>>>    BPF_MOV   0xb0  /* eBPF only: mov reg to reg */
>>>    BPF_ARSH  0xc0  /* eBPF only: sign extending shift right */
>>>    BPF_END   0xd0  /* eBPF only: endianness conversion */
>>> +  BPF_ZEXT  0xe0  /* eBPF BPF_ALU only: zero-extends low 32-bit */
>>> +
>>> +Compared with BPF_ALU | BPF_MOV which zero-extends low 32-bit implicitly,
>>> +BPF_ALU | BPF_ZEXT zero-extends low 32-bit explicitly. Such zero extension is
>>
>> wait. that's an excellent observation. alu|mov is exactly it.
>> we do not need another insn.
>> we probably can teach the verifier to recognize <<32, >>32 and replace
>> with mov32
>
> Hmm, I am silly, in v6, patched insn will be conservatively marked as
> always needing zext, so looks like no problem to just insert mov32 as
> zext. But some backends needs minor opt, because this will be special mov,
> with the same src and dst, just need to clear high 32-bit, no need of
> mov.

I take it back.

Recalled the reason why the new ZEXT was introduced. The instruction
insertion based approach doesn't push analysis results down to the JIT
back-ends; instead, it removes the clear-high-32-bit semantics from all
sub-register write instructions and then inserts a new explicit ZEXT insn,
either as a combination of 64-bit shifts or as this newly introduced ZEXT.
What's important is that the inserted "ZEXT" must not be affected by
"env->verifier_zext" and has to be performed unconditionally.

That is to say, with the current zero extension insertion based approach,
once the flag env->verifier_zext is set the JIT back-ends trust that the
verifier has done full zero extension insertion and rewritten the
instruction sequence, so the JIT back-end does NOT clear the high 32-bit
for the existing sub-register write instructions, for example ALU32 and
narrowed loads; those rely on the newly inserted "unconditional ZEXT" insns
to do the job where needed. So, if we insert mov32, there actually won't be
any zero extension performed for it.

The inserted "ZEXT" needs zero extension semantics that are not affected by
env->verifier_zext. BPF_ZEXT was introduced because of this: low-32 zext is
its only semantics and it is done unconditionally.

"mov32" could be used as the "ZEXT" only if there were no such removal of
zero extension semantics from alu32, or if the JIT back-end had
instruction-level information, for example the analyzed instruction-level
zero extension information pushed down to the JIT back-ends. The newly
inserted "mov32" would then carry information like "zext_dst" set to true,
and the JIT back-end would generate the zero extension for it.
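
To make the contract concrete, a hypothetical back-end snippet (emit_add32()
and emit_clear_high32() are placeholder names, not real NFP or arm functions;
verifier_zext stands for whatever flag the verifier uses to advertise that it
has inserted all needed zero extensions):

  case BPF_ALU | BPF_ADD | BPF_X:
  	emit_add32(ctx, dst, src);
  	/* implicit zext only when the verifier did not insert one */
  	if (!verifier_zext)
  		emit_clear_high32(ctx, dst);
  	break;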

Regards,
Jiong

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v6 bpf-next 04/17] bpf: introduce new alu insn BPF_ZEXT for explicit zero extension
  2019-05-07  4:29       ` Jiong Wang
@ 2019-05-07  4:40         ` Alexei Starovoitov
  0 siblings, 0 replies; 40+ messages in thread
From: Alexei Starovoitov @ 2019-05-07  4:40 UTC (permalink / raw)
  To: Jiong Wang; +Cc: daniel, bpf, netdev, oss-drivers

On Tue, May 07, 2019 at 05:29:09AM +0100, Jiong Wang wrote:
> 
> Jiong Wang writes:
> 
> > Alexei Starovoitov writes:
> >
> >> On Fri, May 03, 2019 at 11:42:31AM +0100, Jiong Wang wrote:
> >>> This patch introduce new alu32 insn BPF_ZEXT, and allocate the unused
> >>> opcode 0xe0 to it.
> >>> 
> >>> Compared with the other alu32 insns, zero extension on low 32-bit is the
> >>> only semantics for this instruction. It also allows various JIT back-ends
> >>> to do optimal zero extension code-gen.
> >>> 
> >>> BPF_ZEXT is supposed to be encoded with BPF_ALU only, and is supposed to be
> >>> generated by the latter 32-bit optimization code inside verifier for those
> >>> arches that do not support hardware implicit zero extension only.
> >>> 
> >>> It is not supposed to be used in user's program directly at the moment.
> >>> Therefore, no need to recognize it inside generic verification code. It
> >>> just need to be supported for execution on interpreter or related JIT
> >>> back-ends.
> >>
> >> uapi and the doc define it, but "it is not supposed to be used" ?!
> >>
> >>> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
> >>> ---
> >>>  Documentation/networking/filter.txt | 10 ++++++++++
> >>>  include/uapi/linux/bpf.h            |  3 +++
> >>>  kernel/bpf/core.c                   |  4 ++++
> >>>  tools/include/uapi/linux/bpf.h      |  3 +++
> >>>  4 files changed, 20 insertions(+)
> >>> 
> >>> diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
> >>> index 319e5e0..1cb3e42 100644
> >>> --- a/Documentation/networking/filter.txt
> >>> +++ b/Documentation/networking/filter.txt
> >>> @@ -903,6 +903,16 @@ If BPF_CLASS(code) == BPF_ALU or BPF_ALU64 [ in eBPF ], BPF_OP(code) is one of:
> >>>    BPF_MOV   0xb0  /* eBPF only: mov reg to reg */
> >>>    BPF_ARSH  0xc0  /* eBPF only: sign extending shift right */
> >>>    BPF_END   0xd0  /* eBPF only: endianness conversion */
> >>> +  BPF_ZEXT  0xe0  /* eBPF BPF_ALU only: zero-extends low 32-bit */
> >>> +
> >>> +Compared with BPF_ALU | BPF_MOV which zero-extends low 32-bit implicitly,
> >>> +BPF_ALU | BPF_ZEXT zero-extends low 32-bit explicitly. Such zero extension is
> >>
> >> wait. that's an excellent observation. alu|mov is exactly it.
> >> we do not need another insn.
> >> we probably can teach the verifier to recognize <<32, >>32 and replace
> >> with mov32
> >
> > Hmm, I am silly, in v6, patched insn will be conservatively marked as
> > always needing zext, so looks like no problem to just insert mov32 as
> > zext. But some backends needs minor opt, because this will be special mov,
> > with the same src and dst, just need to clear high 32-bit, no need of
> > mov.
> 
> I take it back.
> 
> Recalled the reason why new ZEXT was introduced. It was because instruction
> insertion based approach doesn't push analysis results down to JIT
> back-ends, instead, it removes the clear-high-32bit semantics from
> all sub-register write instructions, then insert new explicit ZEXT insn,
> either by 64bit shifts combination or this new introduced ZEXT, what's
> important, the inserted "ZEXT" should not be affected by
> "env->verifier_zext", and be performed unconditionally.
> 
> That is to say, for the current zero extension insertion based approach,
> JIT back-ends trust verifier has done full zero extension insertion and
> rewritten the instruction sequence once the flag env->verifier_zext is set,
> and then JIT back-end do NOT clear high 32-bit for all existing
> sub-register write instructions, for example ALU32 and narrowed load, they
> rely on those new inserted "unconditional ZEXT" to do the job if it is
> needed. So, if we insert mov32, there actually won't be zero extension
> insertion performed for it.
> 
> The inserted "ZEXT" needs to have zero extension semantics that is not
> affected by env->verifier_zext. BPF_ZEXT was introduced because of this,
> low32 zext is its only semantics and should be unconditionally done.
> 
> "mov32" could be used as "ZEXT" only when there is no such removal of zero
> extension semantics from alu32, or if JIT back-end could have instruction
> level information, for example the analyzed instruction level zero extension
> information pushed down to JIT back-ends. The new inserted "mov32" would
> then has information like "zext_dst" be true, JIT back-end then will
> generate zero extension for it.

JITs could simply always do zext for mov32. No need for extra flags.
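
That matches what the interpreter already does for a 32-bit mov: BPF
registers are u64 there, and the assignment goes through a u32 cast which
clears bits 63..32 (roughly as in ___bpf_prog_run() in kernel/bpf/core.c):

  ALU_MOV_X:
  	DST = (u32) SRC;	/* DST is a u64 register, so the cast zero-extends */
  	CONT;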


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v6 bpf-next 01/17] bpf: verifier: offer more accurate helper function arg and return type
  2019-05-06 22:25     ` Jiong Wang
@ 2019-05-08 11:12       ` Jiong Wang
  0 siblings, 0 replies; 40+ messages in thread
From: Jiong Wang @ 2019-05-08 11:12 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: alexei.starovoitov, bpf, netdev, oss-drivers


Jiong Wang writes:

> Daniel Borkmann writes:
>
>> On 05/03/2019 12:42 PM, Jiong Wang wrote:
>>> BPF helper call transfers execution from eBPF insns to native functions
>>> while verifier insn walker only walks eBPF insns. So, verifier can only
>>> knows argument and return value types from explicit helper function
>>> prototype descriptions.
>>> 
>>> For 32-bit optimization, it is important to know whether argument (register
>>> use from eBPF insn) and return value (register define from external
>>> function) is 32-bit or 64-bit, so corresponding registers could be
>>> zero-extended correctly.
>>> 
>>> For arguments, they are register uses, we conservatively treat all of them
>>> as 64-bit at default, while the following new bpf_arg_type are added so we
>>> could start to mark those frequently used helper functions with more
>>> accurate argument type.
>>> 
>>>   ARG_CONST_SIZE32
>>>   ARG_CONST_SIZE32_OR_ZERO
>>
>> For the above two, I was wondering is there a case where the passed size is
>> not used as 32 bit aka couldn't we generally assume 32 bit here w/o adding
>> these two extra arg types?
>
> Will give a detailed reply tomorrow. IIRC there was.

"bpf_perf_event_output" etc inside kernel/trace/bpf_trace.c. They are using
ARG_CONST_SIZE_OR_ZERO for "u64 size" which should have been a mistake,
because "size" parameter for bpf_perf_event_output is used to initialize
the same field inside struct perf_raw_record which is u32. This lead me
thinking people might use in-accurate arg type description.

I kept the original ARG_CONST_SIZE/OR_ZERO with a 64-bit meaning by
default, mostly because I think it is safer. If we assume
ARG_CONST_SIZE/OR_ZERO are 32-bit by default, we must check all helper
functions to make sure their arg types are correct, and make sure all
helpers added in the future have correct arg types as well. Otherwise, if a
helper function has a u64 arg whose value comes from a u32 zext, forgetting
to use the new ARG_CONST_SIZE64 would leave "val" not zero extended, which
is a correctness issue:

  u32 val
  helper_call((u64)val)

Instead, if we assume the existing ARG_CONST_SIZE/OR_ZERO are u64, this
only introduces a redundant zext, not a correctness issue.
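
As a minimal C illustration of that trade-off (helper_call is hypothetical,
it only stands in for any helper that consumes the full 64 bits):

  #include <stdint.h>

  /* Hypothetical helper that consumes the full 64 bits of its argument. */
  static uint64_t helper_call(uint64_t size)
  {
          return size;
  }

  uint64_t example(uint32_t val)
  {
          /* In C the cast guarantees zero extension; at the eBPF level the
           * verifier must insert the equivalent zext unless the def of
           * "val" is already known to clear the upper 32 bits. */
          return helper_call((uint64_t)val);
  }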

Regards,
Jiong

>> For ARG_ANYTHING32 and RET_INTEGER64 definitely
>> makes sense (btw, opt-in value like RET_INTEGER32 might have been easier for
>> reviewing converted helpers

>>> A few helper functions shown up frequently inside Cilium bpf program are
>>> updated using these new types.
>>> 
>>> For return values, they are register defs, we need to know accurate width
>>> for correct zero extensions. Given most of the helper functions returning
>>> integers return 32-bit value, a new RET_INTEGER64 is added to make those
>>> functions return 64-bit value. All related helper functions are updated.
>>> 
>>> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
>> [...]
>>
>>> @@ -2003,9 +2003,9 @@ static const struct bpf_func_proto bpf_csum_diff_proto = {
>>>  	.pkt_access	= true,
>>>  	.ret_type	= RET_INTEGER,
>>>  	.arg1_type	= ARG_PTR_TO_MEM_OR_NULL,
>>> -	.arg2_type	= ARG_CONST_SIZE_OR_ZERO,
>>> +	.arg2_type	= ARG_CONST_SIZE32_OR_ZERO,
>>>  	.arg3_type	= ARG_PTR_TO_MEM_OR_NULL,
>>> -	.arg4_type	= ARG_CONST_SIZE_OR_ZERO,
>>> +	.arg4_type	= ARG_CONST_SIZE32_OR_ZERO,
>>>  	.arg5_type	= ARG_ANYTHING,
>>>  };
>>
>> I noticed that the above and also bpf_csum_update() would need to be converted
>> to RET_INTEGER64 as they would break otherwise: it's returning error but also
>> u32 csum value, so use for error checking would be s64 ret =
>> bpf_csum_xyz(...).
>
> Ack.
>
> (I did search ^u64 inside the uapi header, should also search ^s64, will
> double-check all changes)
>
>>
>> Thanks,
>> Daniel


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v6 bpf-next 01/17] bpf: verifier: offer more accurate helper function arg and return type
  2019-05-06 15:50   ` Alexei Starovoitov
@ 2019-05-08 14:45     ` Jiong Wang
  2019-05-08 17:51       ` Alexei Starovoitov
  0 siblings, 1 reply; 40+ messages in thread
From: Jiong Wang @ 2019-05-08 14:45 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Jiong Wang, daniel, bpf, netdev, oss-drivers


Alexei Starovoitov writes:

> On Fri, May 03, 2019 at 11:42:28AM +0100, Jiong Wang wrote:
>> BPF helper call transfers execution from eBPF insns to native functions
>> while verifier insn walker only walks eBPF insns. So, verifier can only
>> knows argument and return value types from explicit helper function
>> prototype descriptions.
>> 
>> For 32-bit optimization, it is important to know whether argument (register
>> use from eBPF insn) and return value (register define from external
>> function) is 32-bit or 64-bit, so corresponding registers could be
>> zero-extended correctly.
>> 
>> For arguments, they are register uses, we conservatively treat all of them
>> as 64-bit at default, while the following new bpf_arg_type are added so we
>> could start to mark those frequently used helper functions with more
>> accurate argument type.
>> 
>>   ARG_CONST_SIZE32
>>   ARG_CONST_SIZE32_OR_ZERO
>>   ARG_ANYTHING32
>> 
>> A few helper functions shown up frequently inside Cilium bpf program are
>> updated using these new types.
>> 
>> For return values, they are register defs, we need to know accurate width
>> for correct zero extensions. Given most of the helper functions returning
>> integers return 32-bit value, a new RET_INTEGER64 is added to make those
>> functions return 64-bit value. All related helper functions are updated.
>> 
>> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
>> ---
>>  include/linux/bpf.h      |  6 +++++-
>>  kernel/bpf/core.c        |  2 +-
>>  kernel/bpf/helpers.c     | 10 +++++-----
>>  kernel/bpf/verifier.c    | 15 ++++++++++-----
>>  kernel/trace/bpf_trace.c |  4 ++--
>>  net/core/filter.c        | 38 +++++++++++++++++++-------------------
>>  6 files changed, 42 insertions(+), 33 deletions(-)
>> 
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index 9a21848..11a5fb9 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -198,9 +198,12 @@ enum bpf_arg_type {
>>  
>>  	ARG_CONST_SIZE,		/* number of bytes accessed from memory */
>>  	ARG_CONST_SIZE_OR_ZERO,	/* number of bytes accessed from memory or 0 */
>> +	ARG_CONST_SIZE32,	/* Likewise, but size fits into 32-bit */
>> +	ARG_CONST_SIZE32_OR_ZERO,	/* Ditto */
>
> these two should not be necessary. The program must pass constant into
> the helper. The verifier is smart enough already to see it through.
> Looks like patch 2 is using it as a band-aid.
>
>>  	ARG_PTR_TO_CTX,		/* pointer to context */
>>  	ARG_ANYTHING,		/* any (initialized) argument is ok */
>> +	ARG_ANYTHING32,		/* Likewise, but it is a 32-bit argument */
>
> Such annotation has subtle semantics that I don't think we've explored
> in the past and I don't see from commit logs of this patch set that
> the necessary analysis was done.
> In particular 'int' in the helper argument does not mean that the verifier
> will reject 64-bit values. It also doesn't mean that the helper
> will reject them at run-time. In most cases it's a silent truncation.
> Like bpf_tail_call will accept 64-bit value, and will cast it to u32
> before doing max_entries bounds check.
> In other cases it could be signed vs unsigned interpretation inside
> the helper.

I might be misunderstanding your points, please just shout if I am wrong.

Suppose the following BPF code:

  unsigned helper(unsigned long long, unsigned long long);
  unsigned long long test(unsigned *a, unsigned int c)
  {
    unsigned int b = *a;
    c += 10;
    return helper(b, c);
  }

We get the following instruction sequence by latest llvm
(-O2 -mattr=+alu32 -mcpu=v3)

  test:
    1: w1 = *(u32 *)(r1 + 0)
    2: w2 += 10
    3: call helper
    4: exit

Argument Types
===
Now instructions 1 and 2 are sub-register defines, and instruction 3, the
call, uses them implicitly.

Without the introduction of the new ARG_CONST_SIZE32 and
ARG_CONST_SIZE32_OR_ZERO, we don't know what should be done with w1 and
w2; zero-extending them would be fine for all cases, but could result in a
few unnecessary zero extensions being inserted.

By introducing the new types, we know both are u64, so we should
zero-extend them.

This comes down to how the high 32-bit zero extension semantics are
exploited by the compiler or by hand-written assembly.

The LLVM compiler exploits the ISA zext semantics in quite a few cases:

  1. Narrow loads are handled naturally by an instruction combine pattern,
  e.g.:

  def : Pat<(i64 (zextloadi32 ADDRri:$src)),
              (SUBREG_TO_REG (i64 0), (LDW32 ADDRri:$src), sub_32)>;

  2. For ALUs, there is a BPF peephole pass trying to do it; the sequence
  for preparing argument 2 would otherwise have been:
  
     w2 = 16
     r2 <<= 32
     r2 >>= 32
     
  But the peephole pass can see that the two shifts operate on a value
  coming from a sub-register MOV, so it is safe to remove the two redundant
  shifts.

So the BPF instruction sequence we finally get is the one shown earlier;
without the new argument types, the verifier doesn't know how to deal with
zext.

A static compiler has header files containing function prototypes/C types
to guide code-gen. IMHO, the verifier needs the same thing, which is the
BPF helper function prototype description.

And that's why I introduced these new argument types; without them, more
than 10% extra zext could be inserted on benchmarks like bpf_lxc.

Return Types
===
If we change the testcase a little by introducing one new function,
helper1:

  unsigned helper(unsigned long long, unsigned long long);
  unsigned long long helper1(unsigned long long);

  unsigned long long test(unsigned *a, unsigned int c)
  {
    unsigned int b = *a;
    c += 10;
    return helper1(helper(b, c));
  }

Code-gen for -O2 -mattr=+alu32 -mcpu=v3 will be:

  test:
    1: w1 = *(u32 *)(r1 + 0)
    2: w2 += 10
    3: call helper
    4: r1 = r0
    5: call helper1
    6: exit

The return value of "helper" is u32 and is used directly as a u64 arg for
helper1. For bpf-to-bpf calls, I think this is correct, because the value
definition happens inside the bpf program, and the 32-bit opt pass
propagates the definition information from callee to caller at exit, so the
sub-register define instruction will be marked and the register will be
zero extended; there is no problem passing it directly as 64-bit to
helper1.

But helper functions are implemented in native code, which may not follow
this convention. For example, on arm32, calling a helper function is just a
jump to and execution of native code. And if the helper returns u32, it
only sets r0, with no clearing of r1, which is the high 32-bit in the
register pair modeling eBPF R0.

Inside the verifier, we assume helper return values are all 64-bit. Without
the new return types there won't be any zero extension of a u32 return
value, so correctness relies on 32-bit JIT back-ends clearing the high
32-bit of the return value themselves whenever the helper is treated as
64-bit. This requires JIT back-ends to check helper function prototypes and
do code-gen accordingly. Maybe it is too strict to ask JIT back-ends to do
this; should we instead just let the compiler always generate an extra zero
extension when a 32-bit return value comes from an external helper
function?
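
To make the arm32 concern above concrete, here is a minimal C sketch of the
register-pair model (a simplified illustration, not the arm32 JIT's actual
data structures):

  #include <stdint.h>

  /* On a 32-bit host each 64-bit eBPF register is backed by a pair of
   * native registers (simplified model, for illustration only). */
  struct ebpf_reg_pair {
          uint32_t lo;    /* a u32-returning helper only defines this half */
          uint32_t hi;    /* stays stale unless somebody clears it */
  };

  /* Before the BPF program reads R0 as 64-bit, either the JIT or a
   * verifier-inserted zext has to clear the high half. */
  static inline void zext_r0(struct ebpf_reg_pair *r0)
  {
          r0->hi = 0;
  }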

> imo the code base is not ready for semi-automatic remarking of
> helpers with ARG_ANYTHING32 when helper is accepting 32-bit value.
> Definitely not something short term and not a prerequisite for this set.
>
> Longer term... if we introduce ARG_ANYTHING32, what would that mean?

At the moment I am thinking ARG_ANYTHING32 just means that only the low
32-bit of the value is read.

> Would the verifier insert zext before calling the helper automatically
> and we can drop truncation in the helper? May be useful?

Yes, with the 32-bit opt enabled, JIT back-ends don't need to generate high
32-bit clearing instructions when an instruction is not marked. The
verifier is responsible for inserting zext before calling the helper if the
definition of an argument is a sub-register define and the argument usage
is 64-bit.

> What about passing negative value ?
> ld_imm64 r2, -1
> call foo
> is a valid program.

ld_imm64 is a 64-bit define, so the 32-bit opt pass won't touch this
sequence.

>
>>  	ARG_PTR_TO_SPIN_LOCK,	/* pointer to bpf_spin_lock */
>>  	ARG_PTR_TO_SOCK_COMMON,	/* pointer to sock_common */
>>  	ARG_PTR_TO_INT,		/* pointer to int */
>> @@ -210,7 +213,8 @@ enum bpf_arg_type {
>>  
>>  /* type of values returned from helper functions */
>>  enum bpf_return_type {
>> -	RET_INTEGER,			/* function returns integer */
>> +	RET_INTEGER,			/* function returns 32-bit integer */
>> +	RET_INTEGER64,			/* function returns 64-bit integer */
>
> These types of annotations are dangerous too, since they don't consider the
> sign of the return value. In the BPF ISA all arguments and return values
> are 64-bit. When it's full 64-bit we don't need to specify the sign, since
> the sign bit will be interpreted by the program and the verifier doesn't
> need to worry. If we say that a helper returns 32-bit, we need to state
> whether it's signed or not for the verifier to analyze it properly.

I feel signed or unsigned will just work, for example:

  s32 helper(u64, u64);
  s64 helper1(u64);

  s64 test(u32 *a, u32 c)
  {
    u32 b = *a;
    c += 10;
    return helper1(helper(b, c));
  }

  test:
    w1 = *(u32 *)(r1 + 0)
    w2 += 10
    call helper
    r1 = r0
    r1 <<= 32
    r1 s>>= 32
    call helper1
    exit
    
"helper" is RET_INTEGER which means returning 32-bit, and helper1 is
RET_INTEGER64. LLVM compiler should have generated explicit signed
extension sequence like above to sign extend return value of "helper". And
because is is RET_INTEGER and because there is later 64-bit use in the
following "r1 = r0" move, it is fine to insert zero extension after the
call, the later signed extension sequence will still work.

Regards,
Jiong

> I think it's best to drop this patch. I don't think the rest of the set
> really needs it. It looks to me like a last minute optimization that I really
> wish wasn't there, because the rest we've discussed in depth and the set
> was practically ready to land, but now bpf-next is closed.
> Please resubmit after it reopens.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v6 bpf-next 01/17] bpf: verifier: offer more accurate helper function arg and return type
  2019-05-08 14:45     ` Jiong Wang
@ 2019-05-08 17:51       ` Alexei Starovoitov
  2019-05-09 12:32         ` Jiong Wang
  0 siblings, 1 reply; 40+ messages in thread
From: Alexei Starovoitov @ 2019-05-08 17:51 UTC (permalink / raw)
  To: Jiong Wang; +Cc: daniel, bpf, netdev, oss-drivers

On Wed, May 08, 2019 at 03:45:12PM +0100, Jiong Wang wrote:
> 
> I might be misunderstanding your points, please just shout if I am wrong.
> 
> Suppose the following BPF code:
> 
>   unsigned helper(unsigned long long, unsigned long long);
>   unsigned long long test(unsigned *a, unsigned int c)
>   {
>     unsigned int b = *a;
>     c += 10;
>     return helper(b, c);
>   }
> 
> We get the following instruction sequence by latest llvm
> (-O2 -mattr=+alu32 -mcpu=v3)
> 
>   test:
>     1: w1 = *(u32 *)(r1 + 0)
>     2: w2 += 10
>     3: call helper
>     4: exit
> 
> Argument Types
> ===
> Now instruction 1 and 2 are sub-register defines, and instruction 3, the
> call, use them implicitly.
> 
> Without the introduction of the new ARG_CONST_SIZE32 and
> ARG_CONST_SIZE32_OR_ZERO, we don't know what should be done with w1 and
> w2, zero-extend them should be fine for all cases, but could resulting in a
> few unnecessary zero-extension inserted.

I don't think we're on the same page.
The argument type is _const_.
In the example above they are not _const_.

> 
> And that why I introduce these new argument types, without them, there
> could be more than 10% extra zext inserted on benchmarks like bpf_lxc.

10% extra ? so be it.
We're talking past each other here.
I agree with your optimization goal, but I think you're missing
the safety concerns I'm trying to explain.

> 
> But for helper functions, they are done by native code which may not follow
> this convention. For example, on arm32, calling helper functions are just
> jump to and execute native code. And if the helper returns u32, it just set
> r0, no clearing of r1 which is the high 32-bit in the register pair
> modeling eBPF R0.

it's arm32 bug then. All helpers _must_ return 64-bit back to bpf prog
and _must_ accept 64-bit from bpf prog.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v6 bpf-next 01/17] bpf: verifier: offer more accurate helper function arg and return type
  2019-05-08 17:51       ` Alexei Starovoitov
@ 2019-05-09 12:32         ` Jiong Wang
  2019-05-09 17:31           ` Jiong Wang
  2019-05-10  1:53           ` Alexei Starovoitov
  0 siblings, 2 replies; 40+ messages in thread
From: Jiong Wang @ 2019-05-09 12:32 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Jiong Wang, daniel, bpf, netdev, oss-drivers


Alexei Starovoitov writes:

> On Wed, May 08, 2019 at 03:45:12PM +0100, Jiong Wang wrote:
>> 
>> I might be misunderstanding your points, please just shout if I am wrong.
>> 
>> Suppose the following BPF code:
>> 
>>   unsigned helper(unsigned long long, unsigned long long);
>>   unsigned long long test(unsigned *a, unsigned int c)
>>   {
>>     unsigned int b = *a;
>>     c += 10;
>>     return helper(b, c);
>>   }
>> 
>> We get the following instruction sequence by latest llvm
>> (-O2 -mattr=+alu32 -mcpu=v3)
>> 
>>   test:
>>     1: w1 = *(u32 *)(r1 + 0)
>>     2: w2 += 10
>>     3: call helper
>>     4: exit
>> 
>> Argument Types
>> ===
>> Now instruction 1 and 2 are sub-register defines, and instruction 3, the
>> call, use them implicitly.
>> 
>> Without the introduction of the new ARG_CONST_SIZE32 and
>> ARG_CONST_SIZE32_OR_ZERO, we don't know what should be done with w1 and
>> w2, zero-extend them should be fine for all cases, but could resulting in a
>> few unnecessary zero-extension inserted.
>
> I don't think we're on the same page.
> The argument type is _const_.
> In the example above they are not _const_.

Right, I have read check_func_arg + check_helper_mem_access again.

It looks like ARG_CONST_SIZE* are designed for describing memory access
size for things like bounds checking. The size must be a constant for stack
access, otherwise the prog will be rejected, but it looks to me like
variables are allowed for pkt/map access.

But pkt/map access has extra range info. So, indeed, ARG_CONST_SIZE32* are
unnecessary; the width could be figured out from the range.
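
For illustration, a sketch of the kind of check this implies (the struct
and field follow the kernel's struct bpf_reg_state, but the function itself
is hypothetical):

  #include <linux/bpf_verifier.h>
  #include <linux/kernel.h>

  /* Hypothetical: a non-constant size argument is known to fit in 32 bits
   * when the verifier's tracked unsigned maximum says so. */
  static bool size_fits_u32(const struct bpf_reg_state *reg)
  {
          return reg->umax_value <= U32_MAX;
  }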

Will just drop this patch in next version.

And sorry for repeating it again, but I am still concerned about the issue
described at https://www.spinics.net/lists/netdev/msg568678.html.

To put it simply, zext insertion is based on the eBPF ISA and assumes all
sub-register defines from alu32 or narrow loads need it if the underlying
hardware arch doesn't do it. However, some arches support hardware zext
partially. For example, PowerPC, SPARC etc. are 64-bit arches; while they
don't do hardware zext on alu32, they do it for narrow loads. And RISC-V is
even more special: some alu32 instructions have hardware zext, some don't.

At the moment we have a single backend hook, "bpf_jit_hardware_zext"; once
a backend enables it, the verifier just inserts zero extension for all
identified alu32 and narrow loads.

Given that verifier analysis info is not pushed down to JIT back-ends, the
verifier needs more back-end info pushed up from them. Do you think it
makes sense to introduce another hook, "bpf_jit_hardware_zext_narrow_load",
to at least prevent unnecessary zext being inserted for narrowed loads on
arches like PowerPC and SPARC?

The hooks to control verifier zext insertion then becomes two:

  bpf_jit_hardware_zext_alu32
  bpf_jit_hardware_zext_narrow_load

>> And that why I introduce these new argument types, without them, there
>> could be more than 10% extra zext inserted on benchmarks like bpf_lxc.
>
> 10% extra ? so be it.
> We're talking past each other here.
> I agree with your optimization goal, but I think you're missing
> the safety concerns I'm trying to explain.
>> But for helper functions, they are done by native code which may not follow
>> this convention. For example, on arm32, calling helper functions are just
>> jump to and execute native code. And if the helper returns u32, it just set
>> r0, no clearing of r1 which is the high 32-bit in the register pair
>> modeling eBPF R0.
>
> it's arm32 bug then. All helpers _must_ return 64-bit back to bpf prog
> and _must_ accept 64-bit from bpf prog.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v6 bpf-next 01/17] bpf: verifier: offer more accurate helper function arg and return type
  2019-05-09 12:32         ` Jiong Wang
@ 2019-05-09 17:31           ` Jiong Wang
  2019-05-10  1:53           ` Alexei Starovoitov
  1 sibling, 0 replies; 40+ messages in thread
From: Jiong Wang @ 2019-05-09 17:31 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: daniel, bpf, netdev, oss-drivers


Jiong Wang writes:

<snip>

> At the moment we have single backend hook "bpf_jit_hardware_zext", once a
> backend enable it, verifier just insert zero extension for all identified
> alu32 and narrow loads.
>
> Given verifier analysis info is not pushed down to JIT back-ends, verifier
> needs more back-end info pushed up from back-ends. Do you think make sense
> to introduce another hook "bpf_jit_hardware_zext_narrow_load"

Maybe just keep the current "bpf_jit_hardware_zext", but let it return an
int/enum instead of a bool. Then the verifier could learn the hardware's
ability from the enum value?
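
A rough sketch of what such an enum might look like (names are purely
illustrative, this was never part of the posted patches):

  /* Hypothetical capability levels a back-end could report instead of a
   * plain bool. */
  enum bpf_zext_hw_support {
          BPF_ZEXT_HW_NONE,        /* verifier must insert zext for alu32
                                    * and narrow loads */
          BPF_ZEXT_HW_NARROW_LOAD, /* narrow loads zero-extend in hw,
                                    * alu32 does not (e.g. PowerPC, SPARC) */
          BPF_ZEXT_HW_FULL,        /* no verifier zext needed
                                    * (e.g. x86_64, aarch64) */
  };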


> to at least
> prevent unnecessary zext inserted for narrowed loads for arches like
> PowerPC, SPARC?
>
> The hooks to control verifier zext insertion then becomes two:
>
>   bpf_jit_hardware_zext_alu32
>   bpf_jit_hardware_zext_narrow_load
>
>>> And that why I introduce these new argument types, without them, there
>>> could be more than 10% extra zext inserted on benchmarks like bpf_lxc.
>>
>> 10% extra ? so be it.
>> We're talking past each other here.
>> I agree with your optimization goal, but I think you're missing
>> the safety concerns I'm trying to explain.
>>> But for helper functions, they are done by native code which may not follow
>>> this convention. For example, on arm32, calling helper functions are just
>>> jump to and execute native code. And if the helper returns u32, it just set
>>> r0, no clearing of r1 which is the high 32-bit in the register pair
>>> modeling eBPF R0.
>>
>> it's arm32 bug then. All helpers _must_ return 64-bit back to bpf prog
>> and _must_ accept 64-bit from bpf prog.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v6 bpf-next 01/17] bpf: verifier: offer more accurate helper function arg and return type
  2019-05-09 12:32         ` Jiong Wang
  2019-05-09 17:31           ` Jiong Wang
@ 2019-05-10  1:53           ` Alexei Starovoitov
  2019-05-10  8:30             ` Jiong Wang
  1 sibling, 1 reply; 40+ messages in thread
From: Alexei Starovoitov @ 2019-05-10  1:53 UTC (permalink / raw)
  To: Jiong Wang; +Cc: daniel, bpf, netdev, oss-drivers

On Thu, May 09, 2019 at 01:32:30PM +0100, Jiong Wang wrote:
> 
> Alexei Starovoitov writes:
> 
> > On Wed, May 08, 2019 at 03:45:12PM +0100, Jiong Wang wrote:
> >> 
> >> I might be misunderstanding your points, please just shout if I am wrong.
> >> 
> >> Suppose the following BPF code:
> >> 
> >>   unsigned helper(unsigned long long, unsigned long long);
> >>   unsigned long long test(unsigned *a, unsigned int c)
> >>   {
> >>     unsigned int b = *a;
> >>     c += 10;
> >>     return helper(b, c);
> >>   }
> >> 
> >> We get the following instruction sequence by latest llvm
> >> (-O2 -mattr=+alu32 -mcpu=v3)
> >> 
> >>   test:
> >>     1: w1 = *(u32 *)(r1 + 0)
> >>     2: w2 += 10
> >>     3: call helper
> >>     4: exit
> >> 
> >> Argument Types
> >> ===
> >> Now instruction 1 and 2 are sub-register defines, and instruction 3, the
> >> call, use them implicitly.
> >> 
> >> Without the introduction of the new ARG_CONST_SIZE32 and
> >> ARG_CONST_SIZE32_OR_ZERO, we don't know what should be done with w1 and
> >> w2, zero-extend them should be fine for all cases, but could resulting in a
> >> few unnecessary zero-extension inserted.
> >
> > I don't think we're on the same page.
> > The argument type is _const_.
> > In the example above they are not _const_.
> 
> Right, have read check_func_arg + check_helper_mem_access again.
> 
> Looks like ARG_CONST_SIZE* are designed for describing memory access size
> for things like bounds checking. It must be a constant for stack access,
> otherwise prog will be rejected, but it looks to me variables are allowed
> for pkt/map access.
> 
> But pkt/map has extra range info. So, indeed, ARG_CONST_SIZE32* are
> unnecessary, the width could be figured out through the range.
> 
> Will just drop this patch in next version.
> 
> And sorry for repeating it again, I am still concerned on the issue
> described at https://www.spinics.net/lists/netdev/msg568678.html.
> 
> To be simple, zext insertion is based on eBPF ISA and assumes all
> sub-register defines from alu32 or narrow loads need it if the underlying

It's not an assumption. It's a requirement. If JIT is not zeroing
upper 32-bits after 32-bit alu or narrow load it's a bug.

> hardware arches don't do it. However, some arches support hardware zext
> partially. For example, PowerPC, SPARC etc are 64-bit arches, while they
> don't do hardware zext on alu32, they do it for narrow loads. And RISCV is
> even more special, some alu32 has hardware zext, some don't.
> 
> At the moment we have single backend hook "bpf_jit_hardware_zext", once a
> backend enable it, verifier just insert zero extension for all identified
> alu32 and narrow loads.
> 
> Given verifier analysis info is not pushed down to JIT back-ends, verifier
> needs more back-end info pushed up from back-ends. Do you think make sense
> to introduce another hook "bpf_jit_hardware_zext_narrow_load" to at least
> prevent unnecessary zext inserted for narrowed loads for arches like
> PowerPC, SPARC?
> 
> The hooks to control verifier zext insertion then becomes two:
> 
>   bpf_jit_hardware_zext_alu32
>   bpf_jit_hardware_zext_narrow_load

and what to do with other combinations?
Like in some cases a narrow load on a particular arch will be zero extended
by hw, but if it's misaligned or some other condition holds then it will not be?
It doesn't feel like we can enumerate all such combinations.
It feels like the 'bpf_jit_hardware_zext' backend hook isn't quite working.
It optimizes out some zext, but may be adding unnecessary extra zexts.

Maybe it should be a global flag from the verifier, unidirectional to JITs,
that will say "the verifier inserted MOV32 where necessary. The JIT doesn't
need to do zext manually".
And then JITs will remove the MOV32 when hw does it automatically.
Removal should be easy, since such an insn will be right after the
corresponding alu32 or narrow load.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v6 bpf-next 01/17] bpf: verifier: offer more accurate helper function arg and return type
  2019-05-10  1:53           ` Alexei Starovoitov
@ 2019-05-10  8:30             ` Jiong Wang
  2019-05-10 20:10               ` Alexei Starovoitov
  0 siblings, 1 reply; 40+ messages in thread
From: Jiong Wang @ 2019-05-10  8:30 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Jiong Wang, daniel, bpf, netdev, oss-drivers


Alexei Starovoitov writes:

> On Thu, May 09, 2019 at 01:32:30PM +0100, Jiong Wang wrote:
>> 
>> Alexei Starovoitov writes:
>> 
>> > On Wed, May 08, 2019 at 03:45:12PM +0100, Jiong Wang wrote:
>> >> 
>> >> I might be misunderstanding your points, please just shout if I am wrong.
>> >> 
>> >> Suppose the following BPF code:
>> >> 
>> >>   unsigned helper(unsigned long long, unsigned long long);
>> >>   unsigned long long test(unsigned *a, unsigned int c)
>> >>   {
>> >>     unsigned int b = *a;
>> >>     c += 10;
>> >>     return helper(b, c);
>> >>   }
>> >> 
>> >> We get the following instruction sequence by latest llvm
>> >> (-O2 -mattr=+alu32 -mcpu=v3)
>> >> 
>> >>   test:
>> >>     1: w1 = *(u32 *)(r1 + 0)
>> >>     2: w2 += 10
>> >>     3: call helper
>> >>     4: exit
>> >> 
>> >> Argument Types
>> >> ===
>> >> Now instruction 1 and 2 are sub-register defines, and instruction 3, the
>> >> call, use them implicitly.
>> >> 
>> >> Without the introduction of the new ARG_CONST_SIZE32 and
>> >> ARG_CONST_SIZE32_OR_ZERO, we don't know what should be done with w1 and
>> >> w2, zero-extend them should be fine for all cases, but could resulting in a
>> >> few unnecessary zero-extension inserted.
>> >
>> > I don't think we're on the same page.
>> > The argument type is _const_.
>> > In the example above they are not _const_.
>> 
>> Right, have read check_func_arg + check_helper_mem_access again.
>> 
>> Looks like ARG_CONST_SIZE* are designed for describing memory access size
>> for things like bounds checking. It must be a constant for stack access,
>> otherwise prog will be rejected, but it looks to me variables are allowed
>> for pkt/map access.
>> 
>> But pkt/map has extra range info. So, indeed, ARG_CONST_SIZE32* are
>> unnecessary, the width could be figured out through the range.
>> 
>> Will just drop this patch in next version.
>> 
>> And sorry for repeating it again, I am still concerned on the issue
>> described at https://www.spinics.net/lists/netdev/msg568678.html.
>> 
>> To be simple, zext insertion is based on eBPF ISA and assumes all
>> sub-register defines from alu32 or narrow loads need it if the underlying
>
> It's not an assumption. It's a requirement. If JIT is not zeroing
> upper 32-bits after 32-bit alu or narrow load it's a bug.
>
>> hardware arches don't do it. However, some arches support hardware zext
>> partially. For example, PowerPC, SPARC etc are 64-bit arches, while they
>> don't do hardware zext on alu32, they do it for narrow loads. And RISCV is
>> even more special, some alu32 has hardware zext, some don't.
>> 
>> At the moment we have single backend hook "bpf_jit_hardware_zext", once a
>> backend enable it, verifier just insert zero extension for all identified
>> alu32 and narrow loads.
>> 
>> Given verifier analysis info is not pushed down to JIT back-ends, verifier
>> needs more back-end info pushed up from back-ends. Do you think make sense
>> to introduce another hook "bpf_jit_hardware_zext_narrow_load" to at least
>> prevent unnecessary zext inserted for narrowed loads for arches like
>> PowerPC, SPARC?
>> 
>> The hooks to control verifier zext insertion then becomes two:
>> 
>>   bpf_jit_hardware_zext_alu32
>>   bpf_jit_hardware_zext_narrow_load
>
> and what to do with other combinations?
> Like in some cases narrow load on particular arch will be zero extended
> by hw and if it's misaligned or some other condition then it will not be? 
> It doesn't feel that we can enumerate all such combinations.

Yes, and the narrow_load above is just an example. As mentioned, behaviour
on alu32 also diverges on some arches.

> It feels 'bpf_jit_hardware_zext' backend hook isn't quite working.

It is still useful for x86_64 and aarch64 to disable the verifier insertion
pass completely. But then it should perhaps be renamed to
"bpf_jit_verifier_zext"; returning false means the verifier should disable
the insertion completely.

> It optimizes out some zext, but may be adding unnecessary extra zexts.

This is exactly my concern.

> May be it should be a global flag from the verifier unidirectional to JITs
> that will say "the verifier inserted MOV32 where necessary. JIT doesn't
> need to do zext manually".
> And then JITs will remove MOV32 when hw does it automatically.
> Removal should be easy, since such insn will be right after corresponding
> alu32 or narrow load.

OK, so you mean doing a simple peephole to combine insns: the JIT looks
ahead at the next insn, and if it is a mov32 with dst_reg == src_reg, skips
it. It only does this when jitting a sub-register write eBPF insn and there
is hardware zext support.

I guess such a special mov32 won't be generated by the compiler, so it
won't be a jump destination, hence skipping it is safe.
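
A minimal C sketch of that look-ahead, under the assumptions in this mail
(the special zext is a mov32 with dst_reg == src_reg; later in the thread
imm==1 is suggested as an extra marker). Field and macro names follow the
kernel's uapi bpf headers, but the function itself is only illustrative:

  #include <linux/bpf.h>
  #include <linux/filter.h>

  /* Return true if @next looks like the verifier-inserted zero extension
   * that a back-end with hardware zext on the previous insn may skip. */
  static bool is_verifier_zext(const struct bpf_insn *next)
  {
          return BPF_CLASS(next->code) == BPF_ALU &&
                 BPF_OP(next->code) == BPF_MOV &&
                 BPF_SRC(next->code) == BPF_X &&
                 next->dst_reg == next->src_reg;
  }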

For the zero extension insertion part of this set, I am going to make the
following changes in the next version:

  1. The verifier inserts a special "mov32" (dst_reg == src_reg) as the
     "zext". JITs could still omit zext for other "mov32" insns, but must
     always do zext for this special "mov32".
  2. Rename 'bpf_jit_hardware_zext' to 'bpf_jit_verifier_zext', which
     returns false by default to disable zext insertion.
  3. JITs that want verifier zext override bpf_jit_verifier_zext to return
     true, and should skip the unnecessary mov32 as described above.

Looks good?

Regards,
Jiong



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v6 bpf-next 01/17] bpf: verifier: offer more accurate helper function arg and return type
  2019-05-10  8:30             ` Jiong Wang
@ 2019-05-10 20:10               ` Alexei Starovoitov
  2019-05-10 21:59                 ` Jiong Wang
  0 siblings, 1 reply; 40+ messages in thread
From: Alexei Starovoitov @ 2019-05-10 20:10 UTC (permalink / raw)
  To: Jiong Wang; +Cc: daniel, bpf, netdev, oss-drivers

On Fri, May 10, 2019 at 09:30:28AM +0100, Jiong Wang wrote:
> 
> Alexei Starovoitov writes:
> 
> > On Thu, May 09, 2019 at 01:32:30PM +0100, Jiong Wang wrote:
> >> 
> >> Alexei Starovoitov writes:
> >> 
> >> > On Wed, May 08, 2019 at 03:45:12PM +0100, Jiong Wang wrote:
> >> >> 
> >> >> I might be misunderstanding your points, please just shout if I am wrong.
> >> >> 
> >> >> Suppose the following BPF code:
> >> >> 
> >> >>   unsigned helper(unsigned long long, unsigned long long);
> >> >>   unsigned long long test(unsigned *a, unsigned int c)
> >> >>   {
> >> >>     unsigned int b = *a;
> >> >>     c += 10;
> >> >>     return helper(b, c);
> >> >>   }
> >> >> 
> >> >> We get the following instruction sequence by latest llvm
> >> >> (-O2 -mattr=+alu32 -mcpu=v3)
> >> >> 
> >> >>   test:
> >> >>     1: w1 = *(u32 *)(r1 + 0)
> >> >>     2: w2 += 10
> >> >>     3: call helper
> >> >>     4: exit
> >> >> 
> >> >> Argument Types
> >> >> ===
> >> >> Now instruction 1 and 2 are sub-register defines, and instruction 3, the
> >> >> call, use them implicitly.
> >> >> 
> >> >> Without the introduction of the new ARG_CONST_SIZE32 and
> >> >> ARG_CONST_SIZE32_OR_ZERO, we don't know what should be done with w1 and
> >> >> w2, zero-extend them should be fine for all cases, but could resulting in a
> >> >> few unnecessary zero-extension inserted.
> >> >
> >> > I don't think we're on the same page.
> >> > The argument type is _const_.
> >> > In the example above they are not _const_.
> >> 
> >> Right, have read check_func_arg + check_helper_mem_access again.
> >> 
> >> Looks like ARG_CONST_SIZE* are designed for describing memory access size
> >> for things like bounds checking. It must be a constant for stack access,
> >> otherwise prog will be rejected, but it looks to me variables are allowed
> >> for pkt/map access.
> >> 
> >> But pkt/map has extra range info. So, indeed, ARG_CONST_SIZE32* are
> >> unnecessary, the width could be figured out through the range.
> >> 
> >> Will just drop this patch in next version.
> >> 
> >> And sorry for repeating it again, I am still concerned on the issue
> >> described at https://www.spinics.net/lists/netdev/msg568678.html.
> >> 
> >> To be simple, zext insertion is based on eBPF ISA and assumes all
> >> sub-register defines from alu32 or narrow loads need it if the underlying
> >
> > It's not an assumption. It's a requirement. If JIT is not zeroing
> > upper 32-bits after 32-bit alu or narrow load it's a bug.
> >
> >> hardware arches don't do it. However, some arches support hardware zext
> >> partially. For example, PowerPC, SPARC etc are 64-bit arches, while they
> >> don't do hardware zext on alu32, they do it for narrow loads. And RISCV is
> >> even more special, some alu32 has hardware zext, some don't.
> >> 
> >> At the moment we have single backend hook "bpf_jit_hardware_zext", once a
> >> backend enable it, verifier just insert zero extension for all identified
> >> alu32 and narrow loads.
> >> 
> >> Given verifier analysis info is not pushed down to JIT back-ends, verifier
> >> needs more back-end info pushed up from back-ends. Do you think make sense
> >> to introduce another hook "bpf_jit_hardware_zext_narrow_load" to at least
> >> prevent unnecessary zext inserted for narrowed loads for arches like
> >> PowerPC, SPARC?
> >> 
> >> The hooks to control verifier zext insertion then becomes two:
> >> 
> >>   bpf_jit_hardware_zext_alu32
> >>   bpf_jit_hardware_zext_narrow_load
> >
> > and what to do with other combinations?
> > Like in some cases narrow load on particular arch will be zero extended
> > by hw and if it's misaligned or some other condition then it will not be? 
> > It doesn't feel that we can enumerate all such combinations.
> 
> Yes, and above narrow_load is just an example. As mentioned, behaviour on
> alu32 also diverse on some arches.
> 
> > It feels 'bpf_jit_hardware_zext' backend hook isn't quite working.
> 
> It is still useful for x86_64 and aarch64 to disable verifier insertion
> pass completely. But then perhaps should be renamed into
> "bpf_jit_verifier_zext". Returning false means verifier should disable the
> insertion completely.

I think the name is too cryptic.
Maybe call it bpf_jit_needs_zext?
x64/arm64 will set it to false and the rest to true?

> > It optimizes out some zext, but may be adding unnecessary extra zexts.
> 
> This is exactly my concern.
> 
> > May be it should be a global flag from the verifier unidirectional to JITs
> > that will say "the verifier inserted MOV32 where necessary. JIT doesn't
> > need to do zext manually".
> > And then JITs will remove MOV32 when hw does it automatically.
> > Removal should be easy, since such insn will be right after corresponding
> > alu32 or narrow load.
> 
> OK, so you mean do a simple peephole to combine insns. JIT looks forward
> the next insn, if it is mov32 with dst_src == src_reg, then skip it. And
> only do this when jitting a sub-register write eBPF insn and there is
> hardware zext support.
> 
> I guess such special mov32 won't be generated by compiler that it won't be
> jump destination hence skip it is safe.
> 
> For zero extension insertion part of this set, I am going to do the
> following changes in next version:
> 
>   1. verifier inserts special "mov32" (dst_reg == src_reg) as "zext".
>      JIT could still save zext for the other "mov32", but should always do
>      zext for this special "mov32".

Maybe use mov32 with imm==1 as an indicator that such a mov32 is special?

>   2. rename 'bpf_jit_hardware_zext' to 'bpf_jit_verifier_zext' which
>      returns false at default to disable zext insertion.
>   3. JITs want verifier zext override bpf_jit_verifier_zext to return
>      true and should skip unnecessary mov32 as described above.
> 
> Looks good?

Kinda makes sense, but when x64/arm64 say 'don't do zext', what is the
verifier supposed to do? It will still do the analysis and liveness marks,
but only the mov32 won't be inserted?
I guess that's fine, since BPF_F_TEST_RND_HI32 will use
the results of the analysis?
Please double check that BPF_F_TEST_RND_HI32 poisoning works on
32-bit archs too.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v6 bpf-next 01/17] bpf: verifier: offer more accurate helper function arg and return type
  2019-05-10 20:10               ` Alexei Starovoitov
@ 2019-05-10 21:59                 ` Jiong Wang
  0 siblings, 0 replies; 40+ messages in thread
From: Jiong Wang @ 2019-05-10 21:59 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Jiong Wang, daniel, bpf, netdev, oss-drivers


Alexei Starovoitov writes:

> On Fri, May 10, 2019 at 09:30:28AM +0100, Jiong Wang wrote:
>> 
>> Alexei Starovoitov writes:
>> 
>> > On Thu, May 09, 2019 at 01:32:30PM +0100, Jiong Wang wrote:
>> >> 
>> >> Alexei Starovoitov writes:
>> >> 
>> >> > On Wed, May 08, 2019 at 03:45:12PM +0100, Jiong Wang wrote:
>> >> >> 
>> >> >> I might be misunderstanding your points, please just shout if I am wrong.
>> >> >> 
>> >> >> Suppose the following BPF code:
>> >> >> 
>> >> >>   unsigned helper(unsigned long long, unsigned long long);
>> >> >>   unsigned long long test(unsigned *a, unsigned int c)
>> >> >>   {
>> >> >>     unsigned int b = *a;
>> >> >>     c += 10;
>> >> >>     return helper(b, c);
>> >> >>   }
>> >> >> 
>> >> >> We get the following instruction sequence by latest llvm
>> >> >> (-O2 -mattr=+alu32 -mcpu=v3)
>> >> >> 
>> >> >>   test:
>> >> >>     1: w1 = *(u32 *)(r1 + 0)
>> >> >>     2: w2 += 10
>> >> >>     3: call helper
>> >> >>     4: exit
>> >> >> 
>> >> >> Argument Types
>> >> >> ===
>> >> >> Now instruction 1 and 2 are sub-register defines, and instruction 3, the
>> >> >> call, use them implicitly.
>> >> >> 
>> >> >> Without the introduction of the new ARG_CONST_SIZE32 and
>> >> >> ARG_CONST_SIZE32_OR_ZERO, we don't know what should be done with w1 and
>> >> >> w2, zero-extend them should be fine for all cases, but could resulting in a
>> >> >> few unnecessary zero-extension inserted.
>> >> >
>> >> > I don't think we're on the same page.
>> >> > The argument type is _const_.
>> >> > In the example above they are not _const_.
>> >> 
>> >> Right, have read check_func_arg + check_helper_mem_access again.
>> >> 
>> >> Looks like ARG_CONST_SIZE* are designed for describing memory access size
>> >> for things like bounds checking. It must be a constant for stack access,
>> >> otherwise prog will be rejected, but it looks to me variables are allowed
>> >> for pkt/map access.
>> >> 
>> >> But pkt/map has extra range info. So, indeed, ARG_CONST_SIZE32* are
>> >> unnecessary, the width could be figured out through the range.
>> >> 
>> >> Will just drop this patch in next version.
>> >> 
>> >> And sorry for repeating it again, I am still concerned on the issue
>> >> described at https://www.spinics.net/lists/netdev/msg568678.html.
>> >> 
>> >> To be simple, zext insertion is based on eBPF ISA and assumes all
>> >> sub-register defines from alu32 or narrow loads need it if the underlying
>> >
>> > It's not an assumption. It's a requirement. If JIT is not zeroing
>> > upper 32-bits after 32-bit alu or narrow load it's a bug.
>> >
>> >> hardware arches don't do it. However, some arches support hardware zext
>> >> partially. For example, PowerPC, SPARC etc are 64-bit arches, while they
>> >> don't do hardware zext on alu32, they do it for narrow loads. And RISCV is
>> >> even more special, some alu32 has hardware zext, some don't.
>> >> 
>> >> At the moment we have single backend hook "bpf_jit_hardware_zext", once a
>> >> backend enable it, verifier just insert zero extension for all identified
>> >> alu32 and narrow loads.
>> >> 
>> >> Given verifier analysis info is not pushed down to JIT back-ends, verifier
>> >> needs more back-end info pushed up from back-ends. Do you think make sense
>> >> to introduce another hook "bpf_jit_hardware_zext_narrow_load" to at least
>> >> prevent unnecessary zext inserted for narrowed loads for arches like
>> >> PowerPC, SPARC?
>> >> 
>> >> The hooks to control verifier zext insertion then becomes two:
>> >> 
>> >>   bpf_jit_hardware_zext_alu32
>> >>   bpf_jit_hardware_zext_narrow_load
>> >
>> > and what to do with other combinations?
>> > Like in some cases narrow load on particular arch will be zero extended
>> > by hw and if it's misaligned or some other condition then it will not be? 
>> > It doesn't feel that we can enumerate all such combinations.
>> 
>> Yes, and above narrow_load is just an example. As mentioned, behaviour on
>> alu32 also diverse on some arches.
>> 
>> > It feels 'bpf_jit_hardware_zext' backend hook isn't quite working.
>> 
>> It is still useful for x86_64 and aarch64 to disable verifier insertion
>> pass completely. But then perhaps should be renamed into
>> "bpf_jit_verifier_zext". Returning false means verifier should disable the
>> insertion completely.
>
> I think the name is too cryptic.
> May be call it bpf_jit_needs_zext ?

Ack.

> x64/arm64 will set it false and the rest to true ?

Ack.

>> > It optimizes out some zext, but may be adding unnecessary extra zexts.
>> 
>> This is exactly my concern.
>> 
>> > May be it should be a global flag from the verifier unidirectional to JITs
>> > that will say "the verifier inserted MOV32 where necessary. JIT doesn't
>> > need to do zext manually".
>> > And then JITs will remove MOV32 when hw does it automatically.
>> > Removal should be easy, since such insn will be right after corresponding
>> > alu32 or narrow load.
>> 
>> OK, so you mean do a simple peephole to combine insns. JIT looks forward
>> the next insn, if it is mov32 with dst_src == src_reg, then skip it. And
>> only do this when jitting a sub-register write eBPF insn and there is
>> hardware zext support.
>> 
>> I guess such special mov32 won't be generated by compiler that it won't be
>> jump destination hence skip it is safe.
>> 
>> For zero extension insertion part of this set, I am going to do the
>> following changes in next version:
>> 
>>   1. verifier inserts special "mov32" (dst_reg == src_reg) as "zext".
>>      JIT could still save zext for the other "mov32", but should always do
>>      zext for this special "mov32".
>
> May be used mov32 with imm==1 as indicator that such mov32 is special?

OK, will go with it.

>
>>   2. rename 'bpf_jit_hardware_zext' to 'bpf_jit_verifier_zext' which
>>      returns false at default to disable zext insertion.
>>   3. JITs want verifier zext override bpf_jit_verifier_zext to return
>>      true and should skip unnecessary mov32 as described above.
>> 
>> Looks good?
>
> Kinda makes sense, but when x64/arm64 are saying 'dont do zext'
> what verifier suppose to do? It will still do the analysis
> and liveness marks, but only mov32 won't be inserted?

Yes. The analysis part is still enabled; it is well integrated with the
existing liveness tracking infrastructure and is not heavy.

The zext insertion part is disabled.

> I guess that's fine, since BPF_F_TEST_RND_HI32 will use
> the results of the analysis?

Yes, hi32 poisoning needs it.

> Please double check that BPF_F_TEST_RND_HI32 poisoning works on
> 32-bit archs too.

OK. I don't have 32-bit host env at the moment, but will try to sort this
out.

Regards,
Jiong

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2019-05-10 22:00 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-03 10:42 [PATCH v6 bpf-next 00/17] bpf: eliminate zero extensions for sub-register writes Jiong Wang
2019-05-03 10:42 ` [PATCH v6 bpf-next 01/17] bpf: verifier: offer more accurate helper function arg and return type Jiong Wang
2019-05-06 13:57   ` Daniel Borkmann
2019-05-06 22:25     ` Jiong Wang
2019-05-08 11:12       ` Jiong Wang
2019-05-06 15:50   ` Alexei Starovoitov
2019-05-08 14:45     ` Jiong Wang
2019-05-08 17:51       ` Alexei Starovoitov
2019-05-09 12:32         ` Jiong Wang
2019-05-09 17:31           ` Jiong Wang
2019-05-10  1:53           ` Alexei Starovoitov
2019-05-10  8:30             ` Jiong Wang
2019-05-10 20:10               ` Alexei Starovoitov
2019-05-10 21:59                 ` Jiong Wang
2019-05-03 10:42 ` [PATCH v6 bpf-next 02/17] bpf: verifier: mark verified-insn with sub-register zext flag Jiong Wang
2019-05-06 13:49   ` Daniel Borkmann
2019-05-06 14:49     ` Daniel Borkmann
2019-05-06 22:14     ` Jiong Wang
2019-05-03 10:42 ` [PATCH v6 bpf-next 03/17] bpf: verifier: mark patched-insn " Jiong Wang
2019-05-03 10:42 ` [PATCH v6 bpf-next 04/17] bpf: introduce new alu insn BPF_ZEXT for explicit zero extension Jiong Wang
2019-05-06 15:57   ` Alexei Starovoitov
2019-05-06 23:19     ` Jiong Wang
2019-05-07  4:29       ` Jiong Wang
2019-05-07  4:40         ` Alexei Starovoitov
2019-05-03 10:42 ` [PATCH v6 bpf-next 05/17] bpf: verifier: insert BPF_ZEXT according to zext analysis result Jiong Wang
2019-05-03 10:42 ` [PATCH v6 bpf-next 06/17] bpf: introduce new bpf prog load flags "BPF_F_TEST_RND_HI32" Jiong Wang
2019-05-03 10:42 ` [PATCH v6 bpf-next 07/17] bpf: verifier: randomize high 32-bit when BPF_F_TEST_RND_HI32 is set Jiong Wang
2019-05-03 10:42 ` [PATCH v6 bpf-next 08/17] libbpf: add "prog_flags" to bpf_program/bpf_prog_load_attr/bpf_load_program_attr Jiong Wang
2019-05-03 10:42 ` [PATCH v6 bpf-next 09/17] selftests: bpf: adjust several test_verifier helpers for insn insertion Jiong Wang
2019-05-03 10:42 ` [PATCH v6 bpf-next 10/17] selftests: bpf: enable hi32 randomization for all tests Jiong Wang
2019-05-03 10:42 ` [PATCH v6 bpf-next 11/17] arm: bpf: eliminate zero extension code-gen Jiong Wang
2019-05-03 10:42 ` [PATCH v6 bpf-next 12/17] powerpc: " Jiong Wang
2019-05-03 10:42 ` [PATCH v6 bpf-next 13/17] s390: " Jiong Wang
2019-05-03 13:41   ` Heiko Carstens
2019-05-03 13:50     ` Eric Dumazet
2019-05-03 14:09     ` Jiong Wang
2019-05-03 10:42 ` [PATCH v6 bpf-next 14/17] sparc: " Jiong Wang
2019-05-03 10:42 ` [PATCH v6 bpf-next 15/17] x32: " Jiong Wang
2019-05-03 10:42 ` [PATCH v6 bpf-next 16/17] riscv: " Jiong Wang
2019-05-03 10:42 ` [PATCH v6 bpf-next 17/17] nfp: " Jiong Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).