* [PATCH/RFC bpf-next 00/16] bpf: eliminate zero extensions for sub-register writes
@ 2019-03-26 18:05 Jiong Wang
  2019-03-26 18:05 ` [PATCH/RFC bpf-next 01/16] bpf: turn "enum bpf_reg_liveness" into bit representation Jiong Wang
                   ` (15 more replies)
  0 siblings, 16 replies; 39+ messages in thread
From: Jiong Wang @ 2019-03-26 18:05 UTC (permalink / raw)
  To: alexei.starovoitov, daniel
  Cc: bpf, netdev, oss-drivers, Jiong Wang, David S . Miller,
	Paul Burton, Wang YanQing, Zi Shen Lim, Shubham Bansal,
	Naveen N . Rao, Sandipan Das, Martin Schwidefsky, Heiko Carstens,
	Jakub Kicinski

The eBPF ISA specification requires the high 32 bits to be cleared whenever
the low 32-bit sub-register is written. This applies to the destination
register of ALU32, LD_H/B/W etc. JIT back-ends must guarantee this semantic
when doing code-gen.

The x86-64 and arm64 ISAs have the same semantics, so the corresponding JIT
back-ends don't need to do extra work. However, 32-bit arches (arm, nfp,
etc.) and some other 64-bit arches (powerpc, sparc, etc.) need an explicit
zero extension sequence to meet this semantic.

This is important, because for C code like the following:

  u64_value = (u64) u32_value
  ... other uses of u64_value

the compiler can exploit the semantic described above and omit the zero
extensions when widening u32_value to u64_value. Hardware, the runtime, or
the BPF JIT back-end is then responsible for guaranteeing it. Some
benchmarks show ~40% of total insns are sub-register writes, meaning ~40%
extra code-gen (and more on arches that need two shifts for zero
extension), all because the JIT back-end has to emit the extra code for
every such instruction, unconditionally.

However, this is not necessary when the u32 value is never cast into a
u64, which is quite common in real-life programs. So it would be really
good if we could identify the places where such a cast happens and only do
zero extension for them, not for the others. This could save a lot of BPF
code-gen.

Algo
====
We could use insn-scan-based static analysis to tell whether a
sub-register def needs zero extension. However, such static analysis has to
make conservative assumptions at branch points, where multiple uses could
be introduced, so any sub-register def that is live at a branch point would
have to be marked as needing zero extension. This introduces quite a few
false alarms, for example ~25% on Cilium bpf_lxc.

It is far better to use the dynamic data-flow tracking that the verifier
fortunately already has, and which can easily be extended to serve the
purpose of this patch set.

 - Record the indices of instructions that do a sub-register def (write).
   These indices need to stay with the function state so path pruning and
   bpf-to-bpf function calls can be handled properly.

   These indices are kept up to date while doing insn walk.

 - A full register read on an active sub-register def marks the def insn as
   needing zero extension on dst register.

 - A new sub-register write overrides the old one.

   A new full register write clears any pending zero extension requirement
   on the register.

 - When propagating read64 liveness during path pruning, the def insns of
   registers whose sub-register defs are still active are also marked, if a
   read64 is seen from the equivalent state.

 The core patch in this set is patch 4.
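
 A small example illustrating the rules above (a minimal sketch; registers
 and values are invented for the example, using the insn macros from
 include/linux/filter.h):

    struct bpf_insn example[] = {
        /* insn 0: sub-register def of r1, recorded in subreg_def[1] */
        BPF_ALU32_IMM(BPF_MOV, BPF_REG_1, 1),
        /* insn 1: sub-register def of r2, recorded in subreg_def[2] */
        BPF_ALU32_IMM(BPF_MOV, BPF_REG_2, 2),
        /* insn 2: 64-bit read of r1, so insn 0 is marked as needing zext */
        BPF_ALU64_REG(BPF_MOV, BPF_REG_3, BPF_REG_1),
        /* insn 3: 32-bit read of r2, so insn 1 stays unmarked */
        BPF_ALU32_REG(BPF_MOV, BPF_REG_4, BPF_REG_2),
    };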

Benchmark
=========
 - I estimate the JITed image could be 25% smaller on average on all these
   affected arches (nfp, arm, x32, riscv, ppc, sparc, s390).

 - The implementation is based on the existing register read liveness
   tracking infrastructure, so it is dynamic tracking that covers all
   possible code paths; therefore, there shouldn't be any false alarms.

   For Cilium bpf_lxc, there are ~11500 insns in the compiled binary (using
   the latest LLVM snapshot, with -mcpu=v3 -mattr=+alu32 enabled), and 4460
   of them are sub-register writes (~40%). Calculated by:

    cat dump | grep -P "\tw" | wc -l       (ALU32)
    cat dump | grep -P "r.*=.*u32" | wc -l (READ_W)
    cat dump | grep -P "r.*=.*u16" | wc -l (READ_H)
    cat dump | grep -P "r.*=.*u8" | wc -l  (READ_B)

   With this patch set enabled, 647 out of those 4460 are identified as
   really needing zero extension on the destination, so it is safe for JIT
   back-ends to eliminate zero extension for all the other instructions,
   which is ~85% of those sub-register write insns, or ~33% of total insns.
   It is a significant saving.

   Of those 647 insns marked as needing zero extension, some set up u64
   parameters for helper calls; the remaining ones are those whose
   sub-register defs really have 64-bit reads.

Testing
=======
A couple of unit tests have been added, including cases for path pruning
etc. It would be great if people on any affected host arch could apply this
patch set and then run the bpf selftests under sub-register code-gen mode;
test_progs, which contains quite a few C programs, will provide very good
coverage.

ToDo
====
 - eBPF doesn't have a zero extension instruction, so a left shift plus a
   right shift are used to do it, meaning two native insns, while JIT
   back-ends could use a single truncating insn if they understood that the
   lshift + rshift pair is just doing zero extension.
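
   For reference, a minimal sketch of the shift-based sequence (dst
   register r1 picked arbitrarily; this mirrors the zext_patch insns in
   patch 7):

     BPF_ALU64_IMM(BPF_LSH, BPF_REG_1, 32), /* move low 32 bits to the top */
     BPF_ALU64_IMM(BPF_RSH, BPF_REG_1, 32), /* shift back: high 32 bits now zero */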

   There are three approaches to fix this:

     - Minor change to the back-end JIT hook: also pass aux_insn information
       to back-ends so they have per-insn information and can do zero
       extension for the marked insns themselves using the most efficient
       native insn.

     - Introduce a zero extension insn for eBPF. Then the verifier could
       insert the new zext insn instead of lshift + rshift. zext could be
       JITed more efficiently.

       NOTE: the existing MOV64 can't be used as the zero extension insn,
             because once the zext optimization is enabled, MOV64 doesn't
             necessarily clear the high 32 bits.

     - Otherwise, JIT back-ends need a peephole pass to catch lshift + rshift
       pairs and turn them into a native zext.

 - In this set, for JIT back-ends other than NFP, I have only enabled the
   optimization for ALU32 instructions, though it could easily be enabled
   for load instructions as well. I could do a follow-up patch set for this.

Reviews
=======
 - Fixed the missing handling of callee-saved registers for bpf-to-bpf
   calls; sub-register defs therefore moved to the frame state. (Jakub
   Kicinski)
 - Removed redundant "cross_reg". (Jakub Kicinski)
 - Various coding style & grammar fixes. (Jakub Kicinski, Quentin Monnet)

Cc: David S. Miller <davem@davemloft.net>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Wang YanQing <udknight@gmail.com>
Cc: Zi Shen Lim <zlim.lnx@gmail.com>
Cc: Shubham Bansal <illusionist.neo@gmail.com>
Cc: Naveen N. Rao <naveen.n.rao@linux.ibm.com>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>

Jiong Wang (16):
  bpf: turn "enum bpf_reg_liveness" into bit representation
  bpf: refactor propagate_live implementation
  bpf: split read liveness into REG_LIVE_READ64 and REG_LIVE_READ32
  bpf: mark sub-register writes that really need zero extension to high
    bits
  bpf: reduce false alarm by refining "enum bpf_arg_type"
  bpf: new sysctl "bpf_jit_32bit_opt"
  bpf: insert explicit zero extension instructions when
    bpf_jit_32bit_opt is true
  arm: bpf: eliminate zero extension code-gen
  powerpc: bpf: eliminate zero extension code-gen
  s390: bpf: eliminate zero extension code-gen
  sparc: bpf: eliminate zero extension code-gen
  x32: bpf: eliminate zero extension code-gen
  riscv: bpf: eliminate zero extension code-gen
  nfp: bpf: eliminate zero extension code-gen
  selftests: bpf: new field "xlated_insns" for insn scan test after
    verification
  selftests: bpf: unit testcases for zero extension insertion pass

 Documentation/sysctl/net.txt                      |  15 +
 arch/arm/net/bpf_jit_32.c                         |  22 +-
 arch/powerpc/net/bpf_jit_comp64.c                 |   7 +-
 arch/riscv/net/bpf_jit_comp.c                     |  32 +-
 arch/s390/net/bpf_jit_comp.c                      |  13 +-
 arch/sparc/net/bpf_jit_comp_64.c                  |   8 +-
 arch/x86/net/bpf_jit_comp32.c                     |  32 +-
 drivers/net/ethernet/netronome/nfp/bpf/jit.c      | 119 ++--
 drivers/net/ethernet/netronome/nfp/bpf/main.h     |   2 +
 drivers/net/ethernet/netronome/nfp/bpf/verifier.c |  12 +
 include/linux/bpf.h                               |   4 +
 include/linux/bpf_verifier.h                      |  23 +-
 include/linux/filter.h                            |   2 +
 kernel/bpf/core.c                                 |  18 +-
 kernel/bpf/helpers.c                              |   2 +-
 kernel/bpf/verifier.c                             | 216 +++++--
 net/core/filter.c                                 |  28 +-
 net/core/sysctl_net_core.c                        |   9 +
 tools/testing/selftests/bpf/test_verifier.c       | 220 +++++++-
 tools/testing/selftests/bpf/verifier/zext.c       | 651 ++++++++++++++++++++++
 20 files changed, 1286 insertions(+), 149 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/verifier/zext.c

-- 
2.7.4



* [PATCH/RFC bpf-next 01/16] bpf: turn "enum bpf_reg_liveness" into bit representation
  2019-03-26 18:05 [PATCH/RFC bpf-next 00/16] bpf: eliminate zero extensions for sub-register writes Jiong Wang
@ 2019-03-26 18:05 ` Jiong Wang
  2019-03-27 15:44   ` Alexei Starovoitov
  2019-03-26 18:05 ` [PATCH/RFC bpf-next 02/16] bpf: refactor propagate_live implementation Jiong Wang
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 39+ messages in thread
From: Jiong Wang @ 2019-03-26 18:05 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: bpf, netdev, oss-drivers, Jiong Wang

"enum bpf_reg_liveness" is actually used as bit instead of integer. For
example:

        if (live & (REG_LIVE_READ | REG_LIVE_WRITTEN | REG_LIVE_DONE))

Using an enum to represent bits is error prone; it is better to use
explicit bit macros.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 include/linux/bpf_verifier.h | 16 +++++++++-------
 kernel/bpf/verifier.c        |  5 ++---
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 7d8228d..f03c86a 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -34,12 +34,14 @@
  * but of the link between it and its parent.  See mark_reg_read() and
  * mark_stack_slot_read() in kernel/bpf/verifier.c.
  */
-enum bpf_reg_liveness {
-	REG_LIVE_NONE = 0, /* reg hasn't been read or written this branch */
-	REG_LIVE_READ, /* reg was read, so we're sensitive to initial value */
-	REG_LIVE_WRITTEN, /* reg was written first, screening off later reads */
-	REG_LIVE_DONE = 4, /* liveness won't be updating this register anymore */
-};
+/* Reg hasn't been read or written this branch. */
+#define REG_LIVE_NONE		0x0
+/* Reg was read, so we're sensitive to initial value. */
+#define REG_LIVE_READ		0x1
+/* Reg was written first, screening off later reads. */
+#define REG_LIVE_WRITTEN	0x2
+/* Liveness won't be updating this register anymore. */
+#define REG_LIVE_DONE		0x4
 
 struct bpf_reg_state {
 	/* Ordering of fields matters.  See states_equal() */
@@ -131,7 +133,7 @@ struct bpf_reg_state {
 	 * pointing to bpf_func_state.
 	 */
 	u32 frameno;
-	enum bpf_reg_liveness live;
+	u8 live;
 };
 
 enum bpf_stack_slot_type {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index dffeec3..6cc8c38 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -407,8 +407,7 @@ static char slot_type_char[] = {
 	[STACK_ZERO]	= '0',
 };
 
-static void print_liveness(struct bpf_verifier_env *env,
-			   enum bpf_reg_liveness live)
+static void print_liveness(struct bpf_verifier_env *env, u8 live)
 {
 	if (live & (REG_LIVE_READ | REG_LIVE_WRITTEN | REG_LIVE_DONE))
 	    verbose(env, "_");
@@ -5687,8 +5686,8 @@ static bool check_ids(u32 old_id, u32 cur_id, struct idpair *idmap)
 static void clean_func_state(struct bpf_verifier_env *env,
 			     struct bpf_func_state *st)
 {
-	enum bpf_reg_liveness live;
 	int i, j;
+	u8 live;
 
 	for (i = 0; i < BPF_REG_FP; i++) {
 		live = st->regs[i].live;
-- 
2.7.4



* [PATCH/RFC bpf-next 02/16] bpf: refactor propagate_live implementation
  2019-03-26 18:05 [PATCH/RFC bpf-next 00/16] bpf: eliminate zero extensions for sub-register writes Jiong Wang
  2019-03-26 18:05 ` [PATCH/RFC bpf-next 01/16] bpf: turn "enum bpf_reg_liveness" into bit representation Jiong Wang
@ 2019-03-26 18:05 ` Jiong Wang
  2019-03-26 18:26   ` Jann Horn
  2019-03-27 16:35   ` Alexei Starovoitov
  2019-03-26 18:05 ` [PATCH/RFC bpf-next 03/16] bpf: split read liveness into REG_LIVE_READ64 and REG_LIVE_READ32 Jiong Wang
                   ` (13 subsequent siblings)
  15 siblings, 2 replies; 39+ messages in thread
From: Jiong Wang @ 2019-03-26 18:05 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: bpf, netdev, oss-drivers, Jiong Wang

Some code inside the current implementation of "propagate_liveness" is a
little bit verbose.

This patch refactors it so the code looks simpler and clearer.

The redundant usage of "vparent->frame[vstate->curframe]" is removed while
we are here. It is safe to do this because "states_equal" has guaranteed
that vstate->curframe is equal to vparent->curframe.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 kernel/bpf/verifier.c | 44 ++++++++++++++++++++++++++++++--------------
 1 file changed, 30 insertions(+), 14 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 6cc8c38..245bb3c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -6050,6 +6050,22 @@ static bool states_equal(struct bpf_verifier_env *env,
 	return true;
 }
 
+static int propagate_liveness_reg(struct bpf_verifier_env *env,
+				  struct bpf_reg_state *reg,
+				  struct bpf_reg_state *parent_reg, u8 flag)
+{
+	int err;
+
+	if (parent_reg->live & flag || !(reg->live & flag))
+		return 0;
+
+	err = mark_reg_read(env, reg, parent_reg);
+	if (err)
+		return err;
+
+	return 1;
+}
+
 /* A write screens off any subsequent reads; but write marks come from the
  * straight-line code between a state and its parent.  When we arrive at an
  * equivalent state (jump target or such) we didn't arrive by the straight-line
@@ -6061,8 +6077,9 @@ static int propagate_liveness(struct bpf_verifier_env *env,
 			      const struct bpf_verifier_state *vstate,
 			      struct bpf_verifier_state *vparent)
 {
-	int i, frame, err = 0;
+	struct bpf_reg_state *regs, *parent_regs;
 	struct bpf_func_state *state, *parent;
+	int i, frame, err = 0;
 
 	if (vparent->curframe != vstate->curframe) {
 		WARN(1, "propagate_live: parent frame %d current frame %d\n",
@@ -6071,16 +6088,13 @@ static int propagate_liveness(struct bpf_verifier_env *env,
 	}
 	/* Propagate read liveness of registers... */
 	BUILD_BUG_ON(BPF_REG_FP + 1 != MAX_BPF_REG);
+	parent_regs = vparent->frame[vparent->curframe]->regs;
+	regs = vstate->frame[vstate->curframe]->regs;
 	/* We don't need to worry about FP liveness because it's read-only */
 	for (i = 0; i < BPF_REG_FP; i++) {
-		if (vparent->frame[vparent->curframe]->regs[i].live & REG_LIVE_READ)
-			continue;
-		if (vstate->frame[vstate->curframe]->regs[i].live & REG_LIVE_READ) {
-			err = mark_reg_read(env, &vstate->frame[vstate->curframe]->regs[i],
-					    &vparent->frame[vstate->curframe]->regs[i]);
-			if (err)
-				return err;
-		}
+		err = propagate_liveness_reg(env, &regs[i], &parent_regs[i]);
+		if (err < 0)
+			return err;
 	}
 
 	/* ... and stack slots */
@@ -6089,11 +6103,13 @@ static int propagate_liveness(struct bpf_verifier_env *env,
 		parent = vparent->frame[frame];
 		for (i = 0; i < state->allocated_stack / BPF_REG_SIZE &&
 			    i < parent->allocated_stack / BPF_REG_SIZE; i++) {
-			if (parent->stack[i].spilled_ptr.live & REG_LIVE_READ)
-				continue;
-			if (state->stack[i].spilled_ptr.live & REG_LIVE_READ)
-				mark_reg_read(env, &state->stack[i].spilled_ptr,
-					      &parent->stack[i].spilled_ptr);
+			struct bpf_reg_state *parent_reg, *reg;
+
+			parent_reg = &parent->stack[i].spilled_ptr;
+			reg = &state->stack[i].spilled_ptr;
+			err = propagate_liveness_reg(env, reg, parent_reg);
+			if (err < 0)
+				return err;
 		}
 	}
 	return err;
-- 
2.7.4



* [PATCH/RFC bpf-next 03/16] bpf: split read liveness into REG_LIVE_READ64 and REG_LIVE_READ32
  2019-03-26 18:05 [PATCH/RFC bpf-next 00/16] bpf: eliminate zero extensions for sub-register writes Jiong Wang
  2019-03-26 18:05 ` [PATCH/RFC bpf-next 01/16] bpf: turn "enum bpf_reg_liveness" into bit representation Jiong Wang
  2019-03-26 18:05 ` [PATCH/RFC bpf-next 02/16] bpf: refactor propagate_live implementation Jiong Wang
@ 2019-03-26 18:05 ` Jiong Wang
  2019-03-26 20:21   ` Jann Horn
  2019-03-27 16:38   ` Alexei Starovoitov
  2019-03-26 18:05 ` [PATCH/RFC bpf-next 04/16] bpf: mark sub-register writes that really need zero extension to high bits Jiong Wang
                   ` (12 subsequent siblings)
  15 siblings, 2 replies; 39+ messages in thread
From: Jiong Wang @ 2019-03-26 18:05 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: bpf, netdev, oss-drivers, Jiong Wang

In the previous patch, we split the register arg type for sub-register
reads, but haven't touched read liveness.

This patch further splits read liveness into REG_LIVE_READ64 and
REG_LIVE_READ32. The liveness propagation code is updated accordingly.

After this split, customized actions can be defined when propagating a full
register read (REG_LIVE_READ64) or a sub-register read (REG_LIVE_READ32).

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 include/linux/bpf_verifier.h |  9 ++++++---
 kernel/bpf/verifier.c        | 30 +++++++++++++++++++++---------
 2 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index f03c86a..27761ab 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -37,11 +37,14 @@
 /* Reg hasn't been read or written this branch. */
 #define REG_LIVE_NONE		0x0
 /* Reg was read, so we're sensitive to initial value. */
-#define REG_LIVE_READ		0x1
+#define REG_LIVE_READ32		0x1
+/* Likewise, but full 64-bit content matters. */
+#define REG_LIVE_READ64		0x2
+#define REG_LIVE_READ		(REG_LIVE_READ32 | REG_LIVE_READ64)
 /* Reg was written first, screening off later reads. */
-#define REG_LIVE_WRITTEN	0x2
+#define REG_LIVE_WRITTEN	0x4
 /* Liveness won't be updating this register anymore. */
-#define REG_LIVE_DONE		0x4
+#define REG_LIVE_DONE		0x8
 
 struct bpf_reg_state {
 	/* Ordering of fields matters.  See states_equal() */
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 245bb3c..b95c438 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1126,7 +1126,7 @@ static int check_subprogs(struct bpf_verifier_env *env)
  */
 static int mark_reg_read(struct bpf_verifier_env *env,
 			 const struct bpf_reg_state *state,
-			 struct bpf_reg_state *parent)
+			 struct bpf_reg_state *parent, bool dw_read)
 {
 	bool writes = parent == state->parent; /* Observe write marks */
 
@@ -1141,7 +1141,7 @@ static int mark_reg_read(struct bpf_verifier_env *env,
 			return -EFAULT;
 		}
 		/* ... then we depend on parent's value */
-		parent->live |= REG_LIVE_READ;
+		parent->live |= dw_read ? REG_LIVE_READ64 : REG_LIVE_READ32;
 		state = parent;
 		parent = state->parent;
 		writes = true;
@@ -1170,7 +1170,7 @@ static int check_reg_arg(struct bpf_verifier_env *env, u32 regno,
 		/* We don't need to worry about FP liveness because it's read-only */
 		if (regno != BPF_REG_FP)
 			return mark_reg_read(env, &regs[regno],
-					     regs[regno].parent);
+					     regs[regno].parent, true);
 	} else {
 		/* check whether register used as dest operand can be written to */
 		if (regno == BPF_REG_FP) {
@@ -1357,7 +1357,7 @@ static int check_stack_read(struct bpf_verifier_env *env,
 			state->regs[value_regno].live |= REG_LIVE_WRITTEN;
 		}
 		mark_reg_read(env, &reg_state->stack[spi].spilled_ptr,
-			      reg_state->stack[spi].spilled_ptr.parent);
+			      reg_state->stack[spi].spilled_ptr.parent, true);
 		return 0;
 	} else {
 		int zeros = 0;
@@ -1374,7 +1374,8 @@ static int check_stack_read(struct bpf_verifier_env *env,
 			return -EACCES;
 		}
 		mark_reg_read(env, &reg_state->stack[spi].spilled_ptr,
-			      reg_state->stack[spi].spilled_ptr.parent);
+			      reg_state->stack[spi].spilled_ptr.parent,
+			      size == BPF_REG_SIZE);
 		if (value_regno >= 0) {
 			if (zeros == size) {
 				/* any size read into register is zero extended,
@@ -2220,7 +2221,8 @@ static int check_stack_boundary(struct bpf_verifier_env *env, int regno,
 		 * the whole slot to be marked as 'read'
 		 */
 		mark_reg_read(env, &state->stack[spi].spilled_ptr,
-			      state->stack[spi].spilled_ptr.parent);
+			      state->stack[spi].spilled_ptr.parent,
+			      access_size == BPF_REG_SIZE);
 	}
 	return update_stack_depth(env, state, off);
 }
@@ -6059,7 +6061,7 @@ static int propagate_liveness_reg(struct bpf_verifier_env *env,
 	if (parent_reg->live & flag || !(reg->live & flag))
 		return 0;
 
-	err = mark_reg_read(env, reg, parent_reg);
+	err = mark_reg_read(env, reg, parent_reg, flag == REG_LIVE_READ64);
 	if (err)
 		return err;
 
@@ -6092,7 +6094,12 @@ static int propagate_liveness(struct bpf_verifier_env *env,
 	regs = vstate->frame[vstate->curframe]->regs;
 	/* We don't need to worry about FP liveness because it's read-only */
 	for (i = 0; i < BPF_REG_FP; i++) {
-		err = propagate_liveness_reg(env, &regs[i], &parent_regs[i]);
+		err = propagate_liveness_reg(env, &regs[i], &parent_regs[i],
+					     REG_LIVE_READ64);
+		if (err < 0)
+			return err;
+		err = propagate_liveness_reg(env, &regs[i], &parent_regs[i],
+					     REG_LIVE_READ32);
 		if (err < 0)
 			return err;
 	}
@@ -6107,7 +6114,12 @@ static int propagate_liveness(struct bpf_verifier_env *env,
 
 			parent_reg = &parent->stack[i].spilled_ptr;
 			reg = &state->stack[i].spilled_ptr;
-			err = propagate_liveness_reg(env, reg, parent_reg);
+			err = propagate_liveness_reg(env, reg, parent_reg,
+						     REG_LIVE_READ64);
+			if (err < 0)
+				return err;
+			err = propagate_liveness_reg(env, reg, parent_reg,
+						     REG_LIVE_READ32);
 			if (err < 0)
 				return err;
 		}
-- 
2.7.4



* [PATCH/RFC bpf-next 04/16] bpf: mark sub-register writes that really need zero extension to high bits
  2019-03-26 18:05 [PATCH/RFC bpf-next 00/16] bpf: eliminate zero extensions for sub-register writes Jiong Wang
                   ` (2 preceding siblings ...)
  2019-03-26 18:05 ` [PATCH/RFC bpf-next 03/16] bpf: split read liveness into REG_LIVE_READ64 and REG_LIVE_READ32 Jiong Wang
@ 2019-03-26 18:05 ` Jiong Wang
  2019-03-26 18:44   ` Edward Cree
  2019-03-27 16:50   ` Alexei Starovoitov
  2019-03-26 18:05 ` [PATCH/RFC bpf-next 05/16] bpf: reduce false alarm by refining "enum bpf_arg_type" Jiong Wang
                   ` (11 subsequent siblings)
  15 siblings, 2 replies; 39+ messages in thread
From: Jiong Wang @ 2019-03-26 18:05 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: bpf, netdev, oss-drivers, Jiong Wang

The eBPF ISA specification requires the high 32 bits to be cleared whenever
the low 32-bit sub-register is written. This applies to the destination
register of ALU32 etc. JIT back-ends must guarantee this semantic when doing
code-gen.

The x86-64 and arm64 ISAs have the same semantic, so the corresponding JIT
back-ends don't need to do extra work. However, 32-bit arches (arm, nfp,
etc.) and some other 64-bit arches (powerpc, sparc, etc.) need an explicit
zero extension sequence to meet this semantic.

This is important because, for code like the following:

  u64_value = (u64) u32_value
  ... other uses of u64_value

the compiler can exploit the semantic described above and omit the zero
extensions when widening u32_value to u64_value. Hardware, the runtime, or
the BPF JIT back-end is then responsible for guaranteeing it. Some
benchmarks show ~40% of total insns are sub-register writes, meaning ~40%
extra code-gen (and more on arches that require two shifts for zero
extension), because the JIT back-end needs to do extra code-gen for all
such instructions.

However, this is not necessary when u32_value is never cast into a u64,
which is quite common in real-life programs. So it would be really good if
we could identify the places where such a cast happens and only do zero
extension for them, not for the others. This could save a lot of BPF
code-gen.

Algo:
 - Record indices of instructions that do sub-register def (write). And
   these indices need to stay with function state so path pruning and bpf
   to bpf function call could be handled properly.

   These indices are kept up to date while doing insn walk.

 - A full register read on an active sub-register def marks the def insn as
   needing zero extension on dst register.

 - A new sub-register write overrides the old one.

   A new full register write makes the register free of zero extension on
   dst register.

 - When propagating read64 liveness during path pruning, the def insns of
   registers whose sub-register defs are still active are also marked, if a
   read64 is seen from the equivalent state.

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 include/linux/bpf_verifier.h |  4 +++
 kernel/bpf/verifier.c        | 85 +++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 84 insertions(+), 5 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 27761ab..0ae9a3f 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -181,6 +181,9 @@ struct bpf_func_state {
 	 */
 	u32 subprogno;
 
+	/* tracks subreg definition. */
+	s32 subreg_def[MAX_BPF_REG];
+
 	/* The following fields should be last. See copy_func_state() */
 	int acquired_refs;
 	struct bpf_reference_state *refs;
@@ -232,6 +235,7 @@ struct bpf_insn_aux_data {
 	int ctx_field_size; /* the ctx field size for load insn, maybe 0 */
 	int sanitize_stack_off; /* stack slot to be cleared */
 	bool seen; /* this insn was processed by the verifier */
+	bool zext_dst; /* this insn zero extend dst reg */
 	u8 alu_state; /* used in combination with alu_limit */
 	unsigned int orig_idx; /* original instruction index */
 };
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index b95c438..66e5e65 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -971,16 +971,19 @@ static void mark_reg_not_init(struct bpf_verifier_env *env,
 	__mark_reg_not_init(regs + regno);
 }
 
+#define DEF_NOT_SUBREG	(-1)
 static void init_reg_state(struct bpf_verifier_env *env,
 			   struct bpf_func_state *state)
 {
 	struct bpf_reg_state *regs = state->regs;
+	s32 *subreg_def = state->subreg_def;
 	int i;
 
 	for (i = 0; i < MAX_BPF_REG; i++) {
 		mark_reg_not_init(env, regs, i);
 		regs[i].live = REG_LIVE_NONE;
 		regs[i].parent = NULL;
+		subreg_def[i] = DEF_NOT_SUBREG;
 	}
 
 	/* frame pointer */
@@ -1149,18 +1152,66 @@ static int mark_reg_read(struct bpf_verifier_env *env,
 	return 0;
 }
 
+/* This function is supposed to be used by the following check_reg_arg only. */
+static bool insn_has_reg64(struct bpf_insn *insn)
+{
+	u8 code, class, op;
+
+	code = insn->code;
+	class = BPF_CLASS(code);
+	op = BPF_OP(code);
+
+	/* BPF_EXIT will reach here because of return value readability test for
+	 * "main" which has s32 return value.
+	 * BPF_CALL will reach here because of marking caller saved clobber with
+	 * DST_OP_NO_MARK for which we don't care the register def because they
+	 * are anyway marked as NOT_INIT already.
+	 *
+	 * So, return false for both.
+	 */
+	if (class == BPF_JMP && (op == BPF_EXIT || op == BPF_CALL))
+		return false;
+
+	if (class == BPF_ALU64 || class == BPF_JMP ||
+	    /* BPF_END always use BPF_ALU class. */
+	    (class == BPF_ALU && op == BPF_END && insn->imm == 64))
+		return true;
+
+	if (class == BPF_ALU || class == BPF_JMP32)
+		return false;
+
+	/* LD/ST/LDX/STX */
+	return BPF_SIZE(code) == BPF_DW;
+}
+
+static void mark_insn_zext(struct bpf_verifier_env *env,
+			   struct bpf_func_state *state, u8 regno)
+{
+	s32 def_idx = state->subreg_def[regno];
+
+	if (def_idx == DEF_NOT_SUBREG)
+		return;
+
+	env->insn_aux_data[def_idx].zext_dst = true;
+	/* The dst will be zero extended, so won't be sub-register anymore. */
+	state->subreg_def[regno] = DEF_NOT_SUBREG;
+}
+
 static int check_reg_arg(struct bpf_verifier_env *env, u32 regno,
 			 enum reg_arg_type t)
 {
 	struct bpf_verifier_state *vstate = env->cur_state;
 	struct bpf_func_state *state = vstate->frame[vstate->curframe];
+	struct bpf_insn *insn = env->prog->insnsi + env->insn_idx;
 	struct bpf_reg_state *regs = state->regs;
+	bool dw_reg;
 
 	if (regno >= MAX_BPF_REG) {
 		verbose(env, "R%d is invalid\n", regno);
 		return -EINVAL;
 	}
 
+	dw_reg = insn_has_reg64(insn);
 	if (t == SRC_OP) {
 		/* check whether register used as source operand can be read */
 		if (regs[regno].type == NOT_INIT) {
@@ -1168,9 +1219,12 @@ static int check_reg_arg(struct bpf_verifier_env *env, u32 regno,
 			return -EACCES;
 		}
 		/* We don't need to worry about FP liveness because it's read-only */
-		if (regno != BPF_REG_FP)
+		if (regno != BPF_REG_FP) {
+			if (dw_reg)
+				mark_insn_zext(env, state, regno);
 			return mark_reg_read(env, &regs[regno],
-					     regs[regno].parent, true);
+					     regs[regno].parent, dw_reg);
+		}
 	} else {
 		/* check whether register used as dest operand can be written to */
 		if (regno == BPF_REG_FP) {
@@ -1178,6 +1232,8 @@ static int check_reg_arg(struct bpf_verifier_env *env, u32 regno,
 			return -EACCES;
 		}
 		regs[regno].live |= REG_LIVE_WRITTEN;
+		state->subreg_def[regno] =
+			dw_reg ? DEF_NOT_SUBREG : env->insn_idx;
 		if (t == DST_OP)
 			mark_reg_unknown(env, regs, regno);
 	}
@@ -2360,6 +2416,9 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
 	if (err)
 		return err;
 
+	/* arg_type doesn't differentiate 32 and 64-bit arg, always zext. */
+	mark_insn_zext(env, cur_func(env), regno);
+
 	if (arg_type == ARG_ANYTHING) {
 		if (is_pointer_value(env, regno)) {
 			verbose(env, "R%d leaks addr into helper function\n",
@@ -2880,10 +2939,13 @@ static int check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 		return err;
 
 	/* copy r1 - r5 args that callee can access.  The copy includes parent
-	 * pointers, which connects us up to the liveness chain
+	 * pointers, which connects us up to the liveness chain. subreg_def for
+	 * them need to be copied as well.
 	 */
-	for (i = BPF_REG_1; i <= BPF_REG_5; i++)
+	for (i = BPF_REG_1; i <= BPF_REG_5; i++) {
 		callee->regs[i] = caller->regs[i];
+		callee->subreg_def[i] = caller->subreg_def[i];
+	}
 
 	/* after the call registers r0 - r5 were scratched */
 	for (i = 0; i < CALLER_SAVED_REGS; i++) {
@@ -2928,8 +2990,11 @@ static int prepare_func_exit(struct bpf_verifier_env *env, int *insn_idx)
 
 	state->curframe--;
 	caller = state->frame[state->curframe];
-	/* return to the caller whatever r0 had in the callee */
+	/* return to the caller whatever r0 had in the callee, subreg_def should
+	 * be copied to caller as well.
+	 */
 	caller->regs[BPF_REG_0] = *r0;
+	caller->subreg_def[BPF_REG_0] = callee->subreg_def[BPF_REG_0];
 
 	/* Transfer references to the caller */
 	err = transfer_reference_state(caller, callee);
@@ -3118,6 +3183,9 @@ static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn
 		check_reg_arg(env, caller_saved[i], DST_OP_NO_MARK);
 	}
 
+	/* helper call must return full 64-bit R0. */
+	cur_func(env)->subreg_def[BPF_REG_0] = DEF_NOT_SUBREG;
+
 	/* update return register (already marked as written above) */
 	if (fn->ret_type == RET_INTEGER) {
 		/* sets type to SCALAR_VALUE */
@@ -5114,6 +5182,8 @@ static int check_ld_abs(struct bpf_verifier_env *env, struct bpf_insn *insn)
 	 * Already marked as written above.
 	 */
 	mark_reg_unknown(env, regs, BPF_REG_0);
+	/* ld_abs load up to 32-bit skb data. */
+	cur_func(env)->subreg_def[BPF_REG_0] = env->insn_idx;
 	return 0;
 }
 
@@ -6092,12 +6162,17 @@ static int propagate_liveness(struct bpf_verifier_env *env,
 	BUILD_BUG_ON(BPF_REG_FP + 1 != MAX_BPF_REG);
 	parent_regs = vparent->frame[vparent->curframe]->regs;
 	regs = vstate->frame[vstate->curframe]->regs;
+	parent = vparent->frame[vparent->curframe];
 	/* We don't need to worry about FP liveness because it's read-only */
 	for (i = 0; i < BPF_REG_FP; i++) {
 		err = propagate_liveness_reg(env, &regs[i], &parent_regs[i],
 					     REG_LIVE_READ64);
 		if (err < 0)
 			return err;
+
+		if (err > 0)
+			mark_insn_zext(env, parent, i);
+
 		err = propagate_liveness_reg(env, &regs[i], &parent_regs[i],
 					     REG_LIVE_READ32);
 		if (err < 0)
-- 
2.7.4



* [PATCH/RFC bpf-next 05/16] bpf: reduce false alarm by refining "enum bpf_arg_type"
  2019-03-26 18:05 [PATCH/RFC bpf-next 00/16] bpf: eliminate zero extensions for sub-register writes Jiong Wang
                   ` (3 preceding siblings ...)
  2019-03-26 18:05 ` [PATCH/RFC bpf-next 04/16] bpf: mark sub-register writes that really need zero extension to high bits Jiong Wang
@ 2019-03-26 18:05 ` Jiong Wang
  2019-03-26 18:05 ` [PATCH/RFC bpf-next 06/16] bpf: new sysctl "bpf_jit_32bit_opt" Jiong Wang
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 39+ messages in thread
From: Jiong Wang @ 2019-03-26 18:05 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: bpf, netdev, oss-drivers, Jiong Wang

Unlike a BPF-to-BPF function call, a BPF helper call is a call into native
insns that the verifier can't walk. So the verifier needs helper prototype
descriptions for data-flow purposes.

Such information already exists, but it doesn't differentiate sub-register
reads from full register reads.

This patch splits "enum bpf_arg_type" for sub-register reads, and updates
the descriptions of several functions that show frequent usage in one
Cilium benchmark.

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 include/linux/bpf.h   |  3 +++
 kernel/bpf/core.c     |  2 +-
 kernel/bpf/helpers.c  |  2 +-
 kernel/bpf/verifier.c | 19 +++++++++++++------
 net/core/filter.c     | 28 ++++++++++++++--------------
 5 files changed, 32 insertions(+), 22 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f628971..5616a58 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -190,9 +190,12 @@ enum bpf_arg_type {
 
 	ARG_CONST_SIZE,		/* number of bytes accessed from memory */
 	ARG_CONST_SIZE_OR_ZERO,	/* number of bytes accessed from memory or 0 */
+	ARG_CONST_SIZE32,	/* Likewise, but size fits into 32-bit */
+	ARG_CONST_SIZE32_OR_ZERO,	/* Ditto */
 
 	ARG_PTR_TO_CTX,		/* pointer to context */
 	ARG_ANYTHING,		/* any (initialized) argument is ok */
+	ARG_ANYTHING32,		/* Likewise, but it is a 32-bit argument */
 	ARG_PTR_TO_SPIN_LOCK,	/* pointer to bpf_spin_lock */
 	ARG_PTR_TO_SOCK_COMMON,	/* pointer to sock_common */
 };
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index ff09d32..8834d80 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2065,7 +2065,7 @@ const struct bpf_func_proto bpf_tail_call_proto = {
 	.ret_type	= RET_VOID,
 	.arg1_type	= ARG_PTR_TO_CTX,
 	.arg2_type	= ARG_CONST_MAP_PTR,
-	.arg3_type	= ARG_ANYTHING,
+	.arg3_type	= ARG_ANYTHING32,
 };
 
 /* Stub for JITs that only support cBPF. eBPF programs are interpreted.
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index a411fc1..6b7453e 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -218,7 +218,7 @@ const struct bpf_func_proto bpf_get_current_comm_proto = {
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
-	.arg2_type	= ARG_CONST_SIZE,
+	.arg2_type	= ARG_CONST_SIZE32,
 };
 
 #if defined(CONFIG_QUEUED_SPINLOCKS) || defined(CONFIG_BPF_ARCH_SPINLOCK)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 66e5e65..83448bb 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -2398,7 +2398,9 @@ static bool arg_type_is_mem_ptr(enum bpf_arg_type type)
 static bool arg_type_is_mem_size(enum bpf_arg_type type)
 {
 	return type == ARG_CONST_SIZE ||
-	       type == ARG_CONST_SIZE_OR_ZERO;
+	       type == ARG_CONST_SIZE_OR_ZERO ||
+	       type == ARG_CONST_SIZE32 ||
+	       type == ARG_CONST_SIZE32_OR_ZERO;
 }
 
 static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
@@ -2416,10 +2418,12 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
 	if (err)
 		return err;
 
-	/* arg_type doesn't differentiate 32 and 64-bit arg, always zext. */
-	mark_insn_zext(env, cur_func(env), regno);
+	if (arg_type != ARG_ANYTHING32 &&
+	    arg_type != ARG_CONST_SIZE32 &&
+	    arg_type != ARG_CONST_SIZE32_OR_ZERO)
+		mark_insn_zext(env, cur_func(env), regno);
 
-	if (arg_type == ARG_ANYTHING) {
+	if (arg_type == ARG_ANYTHING || arg_type == ARG_ANYTHING32) {
 		if (is_pointer_value(env, regno)) {
 			verbose(env, "R%d leaks addr into helper function\n",
 				regno);
@@ -2442,7 +2446,9 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
 		    type != expected_type)
 			goto err_type;
 	} else if (arg_type == ARG_CONST_SIZE ||
-		   arg_type == ARG_CONST_SIZE_OR_ZERO) {
+		   arg_type == ARG_CONST_SIZE_OR_ZERO ||
+		   arg_type == ARG_CONST_SIZE32 ||
+		   arg_type == ARG_CONST_SIZE32_OR_ZERO) {
 		expected_type = SCALAR_VALUE;
 		if (type != expected_type)
 			goto err_type;
@@ -2536,7 +2542,8 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
 					      meta->map_ptr->value_size, false,
 					      meta);
 	} else if (arg_type_is_mem_size(arg_type)) {
-		bool zero_size_allowed = (arg_type == ARG_CONST_SIZE_OR_ZERO);
+		bool zero_size_allowed = (arg_type == ARG_CONST_SIZE_OR_ZERO ||
+					  arg_type == ARG_CONST_SIZE32_OR_ZERO);
 
 		/* remember the mem_size which may be used later
 		 * to refine return values.
diff --git a/net/core/filter.c b/net/core/filter.c
index 22eb2ed..3f6d8af 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1693,9 +1693,9 @@ static const struct bpf_func_proto bpf_skb_store_bytes_proto = {
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
-	.arg2_type	= ARG_ANYTHING,
+	.arg2_type	= ARG_ANYTHING32,
 	.arg3_type	= ARG_PTR_TO_MEM,
-	.arg4_type	= ARG_CONST_SIZE,
+	.arg4_type	= ARG_CONST_SIZE32,
 	.arg5_type	= ARG_ANYTHING,
 };
 
@@ -1724,9 +1724,9 @@ static const struct bpf_func_proto bpf_skb_load_bytes_proto = {
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
-	.arg2_type	= ARG_ANYTHING,
+	.arg2_type	= ARG_ANYTHING32,
 	.arg3_type	= ARG_PTR_TO_UNINIT_MEM,
-	.arg4_type	= ARG_CONST_SIZE,
+	.arg4_type	= ARG_CONST_SIZE32,
 };
 
 BPF_CALL_5(bpf_skb_load_bytes_relative, const struct sk_buff *, skb,
@@ -1875,7 +1875,7 @@ static const struct bpf_func_proto bpf_l3_csum_replace_proto = {
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
-	.arg2_type	= ARG_ANYTHING,
+	.arg2_type	= ARG_ANYTHING32,
 	.arg3_type	= ARG_ANYTHING,
 	.arg4_type	= ARG_ANYTHING,
 	.arg5_type	= ARG_ANYTHING,
@@ -1928,7 +1928,7 @@ static const struct bpf_func_proto bpf_l4_csum_replace_proto = {
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
-	.arg2_type	= ARG_ANYTHING,
+	.arg2_type	= ARG_ANYTHING32,
 	.arg3_type	= ARG_ANYTHING,
 	.arg4_type	= ARG_ANYTHING,
 	.arg5_type	= ARG_ANYTHING,
@@ -1967,9 +1967,9 @@ static const struct bpf_func_proto bpf_csum_diff_proto = {
 	.pkt_access	= true,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_MEM_OR_NULL,
-	.arg2_type	= ARG_CONST_SIZE_OR_ZERO,
+	.arg2_type	= ARG_CONST_SIZE32_OR_ZERO,
 	.arg3_type	= ARG_PTR_TO_MEM_OR_NULL,
-	.arg4_type	= ARG_CONST_SIZE_OR_ZERO,
+	.arg4_type	= ARG_CONST_SIZE32_OR_ZERO,
 	.arg5_type	= ARG_ANYTHING,
 };
 
@@ -2150,7 +2150,7 @@ static const struct bpf_func_proto bpf_redirect_proto = {
 	.func           = bpf_redirect,
 	.gpl_only       = false,
 	.ret_type       = RET_INTEGER,
-	.arg1_type      = ARG_ANYTHING,
+	.arg1_type      = ARG_ANYTHING32,
 	.arg2_type      = ARG_ANYTHING,
 };
 
@@ -2928,7 +2928,7 @@ static const struct bpf_func_proto bpf_skb_change_proto_proto = {
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
-	.arg2_type	= ARG_ANYTHING,
+	.arg2_type	= ARG_ANYTHING32,
 	.arg3_type	= ARG_ANYTHING,
 };
 
@@ -2948,7 +2948,7 @@ static const struct bpf_func_proto bpf_skb_change_type_proto = {
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
-	.arg2_type	= ARG_ANYTHING,
+	.arg2_type	= ARG_ANYTHING32,
 };
 
 static u32 bpf_skb_net_base_len(const struct sk_buff *skb)
@@ -3236,7 +3236,7 @@ static const struct bpf_func_proto bpf_skb_change_tail_proto = {
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
-	.arg2_type	= ARG_ANYTHING,
+	.arg2_type	= ARG_ANYTHING32,
 	.arg3_type	= ARG_ANYTHING,
 };
 
@@ -3832,7 +3832,7 @@ static const struct bpf_func_proto bpf_skb_get_tunnel_key_proto = {
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
 	.arg2_type	= ARG_PTR_TO_UNINIT_MEM,
-	.arg3_type	= ARG_CONST_SIZE,
+	.arg3_type	= ARG_CONST_SIZE32,
 	.arg4_type	= ARG_ANYTHING,
 };
 
@@ -3941,7 +3941,7 @@ static const struct bpf_func_proto bpf_skb_set_tunnel_key_proto = {
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
 	.arg2_type	= ARG_PTR_TO_MEM,
-	.arg3_type	= ARG_CONST_SIZE,
+	.arg3_type	= ARG_CONST_SIZE32,
 	.arg4_type	= ARG_ANYTHING,
 };
 
-- 
2.7.4



* [PATCH/RFC bpf-next 06/16] bpf: new sysctl "bpf_jit_32bit_opt"
  2019-03-26 18:05 [PATCH/RFC bpf-next 00/16] bpf: eliminate zero extensions for sub-register writes Jiong Wang
                   ` (4 preceding siblings ...)
  2019-03-26 18:05 ` [PATCH/RFC bpf-next 05/16] bpf: reduce false alarm by refining "enum bpf_arg_type" Jiong Wang
@ 2019-03-26 18:05 ` Jiong Wang
  2019-03-27 17:00   ` Alexei Starovoitov
  2019-03-26 18:05 ` [PATCH/RFC bpf-next 07/16] bpf: insert explicit zero extension instructions when bpf_jit_32bit_opt is true Jiong Wang
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 39+ messages in thread
From: Jiong Wang @ 2019-03-26 18:05 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: bpf, netdev, oss-drivers, Jiong Wang

After the previous patches, the verifier has marked those instructions that
really need zero extension on dst_reg.

It is then up to the back-ends to decide how to use this information to
eliminate unnecessary zero extension code-gen during JIT compilation.

One approach is:
  1. The verifier inserts explicit zero extension for those instructions
     that need it.
  2. All JIT back-ends do NOT generate zero extension for sub-register
     writes any more.

The good thing about this approach is that there is no major change to the
JIT back-end interface, so all back-ends can get this optimization.

However, only those back-ends that do not have hardware zero extension want
this optimization. Back-ends like x86_64 and AArch64 have hardware support,
so the optimization should be disabled for them.

This patch introduces a new sysctl, "bpf_jit_32bit_opt", which controls
whether the optimization is enabled.

It is initialized from the target hook bpf_jit_hardware_zext, which defaults
to true, meaning the underlying hardware does zero extension automatically,
so the optimization is disabled by default.

Offload targets do not use this native target hook; instead, they can get
the optimization results using bpf_prog_offload_ops.finalize.

The user could always enable or disable the optimization by using:

   sysctl net/core/bpf_jit_32bit_opt=[0 | 1]

A brief diagram below to show the logic of how the optimization is
controlled.

        arm                       ppc                      x86_64
bpf_jit_hardware_zext()  bpf_jit_hardware_zext()  bpf_jit_hardware_zext()
         |                         |                         |
         V                         V                         V
       false                     false                      true
            |                      |                          |
             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                   |
                                   V
                            bpf_jit_32bit_opt
                                   /\
                                  /  \
                                true false -> disable optimization
                                 |
                                 V
                               enable optimization
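
For reference, a minimal sketch of how a JIT without hardware zero extension
opts in by overriding the weak hook (this is what the arm, powerpc, s390,
sparc, x32 and riscv patches later in this series do):

	bool bpf_jit_hardware_zext(void)
	{
		/* no hardware zext: ask the verifier to insert it for us */
		return false;
	}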

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 Documentation/sysctl/net.txt | 15 +++++++++++++++
 include/linux/filter.h       |  2 ++
 kernel/bpf/core.c            | 16 ++++++++++++++++
 net/core/sysctl_net_core.c   |  9 +++++++++
 4 files changed, 42 insertions(+)

diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt
index 2ae91d3..f820e3b 100644
--- a/Documentation/sysctl/net.txt
+++ b/Documentation/sysctl/net.txt
@@ -101,6 +101,21 @@ compiler in order to reject unprivileged JIT requests once it has
 been surpassed. bpf_jit_limit contains the value of the global limit
 in bytes.
 
+bpf_jit_32bit_opt
+-----------------
+
+This enables verifier optimizations related to sub-register access. These
+optimizations aim to help JIT back-ends do code-gen efficiently. There is
+only one such optimization at the moment: the zero extension insertion pass.
+Once it is enabled, the verifier will guarantee high-bit clearing semantics
+when doing sub-register writes whenever it is necessary. Without this, JIT
+back-ends always need to do code-gen for high-bit clearing, which leads to
+significant redundancy.
+
+Values :
+	0 - disable these optimization passes
+	1 - enable these optimization passes
+
 dev_weight
 --------------
 
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 6074aa0..b66a4d9 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -819,6 +819,7 @@ u64 __bpf_call_base(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
 
 struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog);
 void bpf_jit_compile(struct bpf_prog *prog);
+bool bpf_jit_hardware_zext(void);
 bool bpf_helper_changes_pkt_data(void *func);
 
 static inline bool bpf_dump_raw_ok(void)
@@ -905,6 +906,7 @@ extern int bpf_jit_enable;
 extern int bpf_jit_harden;
 extern int bpf_jit_kallsyms;
 extern long bpf_jit_limit;
+extern int bpf_jit_32bit_opt;
 
 typedef void (*bpf_jit_fill_hole_t)(void *area, unsigned int size);
 
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 8834d80..cc7f0fd 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -524,6 +524,14 @@ int bpf_jit_enable   __read_mostly = IS_BUILTIN(CONFIG_BPF_JIT_ALWAYS_ON);
 int bpf_jit_harden   __read_mostly;
 int bpf_jit_kallsyms __read_mostly;
 long bpf_jit_limit   __read_mostly;
+int bpf_jit_32bit_opt __read_mostly;
+
+static int __init bpf_jit_32bit_opt_init(void)
+{
+	bpf_jit_32bit_opt = !bpf_jit_hardware_zext();
+	return 0;
+}
+pure_initcall(bpf_jit_32bit_opt_init);
 
 static __always_inline void
 bpf_get_prog_addr_region(const struct bpf_prog *prog,
@@ -2089,6 +2097,14 @@ bool __weak bpf_helper_changes_pkt_data(void *func)
 	return false;
 }
 
+/* Return TRUE if the target hardware of JIT will do zero extension to high bits
+ * when writing to low 32-bit of one register. Otherwise, return FALSE.
+ */
+bool __weak bpf_jit_hardware_zext(void)
+{
+	return true;
+}
+
 /* To execute LD_ABS/LD_IND instructions __bpf_prog_run() may call
  * skb_copy_bits(), so provide a weak definition of it for NET-less config.
  */
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 84bf286..68be151 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -416,6 +416,15 @@ static struct ctl_table net_core_table[] = {
 		.extra1		= &zero,
 		.extra2		= &one,
 	},
+	{
+		.procname	= "bpf_jit_32bit_opt",
+		.data		= &bpf_jit_32bit_opt,
+		.maxlen		= sizeof(int),
+		.mode		= 0600,
+		.proc_handler	= proc_dointvec_minmax_bpf_restricted,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
 # endif
 	{
 		.procname	= "bpf_jit_limit",
-- 
2.7.4



* [PATCH/RFC bpf-next 07/16] bpf: insert explicit zero extension instructions when bpf_jit_32bit_opt is true
  2019-03-26 18:05 [PATCH/RFC bpf-next 00/16] bpf: eliminate zero extensions for sub-register writes Jiong Wang
                   ` (5 preceding siblings ...)
  2019-03-26 18:05 ` [PATCH/RFC bpf-next 06/16] bpf: new sysctl "bpf_jit_32bit_opt" Jiong Wang
@ 2019-03-26 18:05 ` Jiong Wang
  2019-03-26 18:05 ` [PATCH/RFC bpf-next 08/16] arm: bpf: eliminate zero extension code-gen Jiong Wang
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 39+ messages in thread
From: Jiong Wang @ 2019-03-26 18:05 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: bpf, netdev, oss-drivers, Jiong Wang

This patch implements the zero extension insertion pass using
bpf_patch_insn_data infrastructure.

Once zero extensions are inserted, JIT back-ends are told about this
through the new boolean field "no_verifier_zext" in bpf_prog_aux. We need
this because the user can enable or disable the insertion pass at will
through the sysctl variable.
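
A sketch of what the rewrite looks like for one marked instruction (a
32-bit add with dst r2, chosen just for illustration):

  before:
    BPF_ALU32_REG(BPF_ADD, BPF_REG_2, BPF_REG_3)   /* marked zext_dst */

  after the pass:
    BPF_ALU32_REG(BPF_ADD, BPF_REG_2, BPF_REG_3)   /* original insn, zext_patch[0] */
    BPF_ALU64_IMM(BPF_LSH, BPF_REG_2, 32)          /* inserted, zext_patch[1] */
    BPF_ALU64_IMM(BPF_RSH, BPF_REG_2, 32)          /* inserted, zext_patch[2] */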

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 include/linux/bpf.h   |  1 +
 kernel/bpf/verifier.c | 45 ++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 5616a58..3336f93 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -359,6 +359,7 @@ struct bpf_prog_aux {
 	u32 id;
 	u32 func_cnt; /* used by non-func prog as the number of func progs */
 	u32 func_idx; /* 0 for non-func prog, the index in func array for func prog */
+	bool no_verifier_zext; /* No zero extension insertion by verifier. */
 	bool offload_requested;
 	struct bpf_prog **func;
 	void *jit_data; /* JIT specific data. arch dependent */
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 83448bb..57db451 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -7224,6 +7224,38 @@ static int opt_remove_nops(struct bpf_verifier_env *env)
 	return 0;
 }
 
+static int opt_subreg_zext(struct bpf_verifier_env *env)
+{
+	struct bpf_insn_aux_data *aux = env->insn_aux_data;
+	int i, delta = 0, len = env->prog->len;
+	struct bpf_insn *insns = env->prog->insnsi;
+	struct bpf_insn zext_patch[3];
+	struct bpf_prog *new_prog;
+
+	zext_patch[1] = BPF_ALU64_IMM(BPF_LSH, 0, 32);
+	zext_patch[2] = BPF_ALU64_IMM(BPF_RSH, 0, 32);
+	for (i = 0; i < len; i++) {
+		struct bpf_insn insn;
+
+		if (!aux[i + delta].zext_dst)
+			continue;
+
+		insn = insns[i + delta];
+		zext_patch[0] = insn;
+		zext_patch[1].dst_reg = insn.dst_reg;
+		zext_patch[2].dst_reg = insn.dst_reg;
+		new_prog = bpf_patch_insn_data(env, i + delta, zext_patch, 3);
+		if (!new_prog)
+			return -ENOMEM;
+		env->prog = new_prog;
+		insns = new_prog->insnsi;
+		aux = env->insn_aux_data;
+		delta += 2;
+	}
+
+	return 0;
+}
+
 /* convert load instructions that access fields of a context type into a
  * sequence of instructions that access fields of the underlying structure:
  *     struct __sk_buff    -> struct sk_buff
@@ -8022,7 +8054,18 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr,
 	if (ret == 0)
 		ret = check_max_stack_depth(env);
 
-	/* instruction rewrites happen after this point */
+	/* Instruction rewrites happen after this point.
+	 * For offload target, finalize hook has all aux insn info, do any
+	 * customized work there.
+	 */
+	if (ret == 0 && bpf_jit_32bit_opt &&
+	    !bpf_prog_is_dev_bound(env->prog->aux)) {
+		ret = opt_subreg_zext(env);
+		env->prog->aux->no_verifier_zext = !!ret;
+	} else {
+		env->prog->aux->no_verifier_zext = true;
+	}
+
 	if (is_priv) {
 		if (ret == 0)
 			opt_hard_wire_dead_code_branches(env);
-- 
2.7.4



* [PATCH/RFC bpf-next 08/16] arm: bpf: eliminate zero extension code-gen
  2019-03-26 18:05 [PATCH/RFC bpf-next 00/16] bpf: eliminate zero extensions for sub-register writes Jiong Wang
                   ` (6 preceding siblings ...)
  2019-03-26 18:05 ` [PATCH/RFC bpf-next 07/16] bpf: insert explicit zero extension instructions when bpf_jit_32bit_opt is true Jiong Wang
@ 2019-03-26 18:05 ` Jiong Wang
  2019-03-26 18:05 ` [PATCH/RFC bpf-next 09/16] powerpc: " Jiong Wang
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 39+ messages in thread
From: Jiong Wang @ 2019-03-26 18:05 UTC (permalink / raw)
  To: alexei.starovoitov, daniel
  Cc: bpf, netdev, oss-drivers, Jiong Wang, Shubham Bansal

Cc: Shubham Bansal <illusionist.neo@gmail.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 arch/arm/net/bpf_jit_32.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index c8bfbbf..8cecd06 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -736,7 +736,8 @@ static inline void emit_a32_alu_r64(const bool is64, const s8 dst[],
 
 		/* ALU operation */
 		emit_alu_r(rd[1], rs, true, false, op, ctx);
-		emit_a32_mov_i(rd[0], 0, ctx);
+		if (ctx->prog->aux->no_verifier_zext)
+			emit_a32_mov_i(rd[0], 0, ctx);
 	}
 
 	arm_bpf_put_reg64(dst, rd, ctx);
@@ -758,8 +759,9 @@ static inline void emit_a32_mov_r64(const bool is64, const s8 dst[],
 				  struct jit_ctx *ctx) {
 	if (!is64) {
 		emit_a32_mov_r(dst_lo, src_lo, ctx);
-		/* Zero out high 4 bytes */
-		emit_a32_mov_i(dst_hi, 0, ctx);
+		if (ctx->prog->aux->no_verifier_zext)
+			/* Zero out high 4 bytes */
+			emit_a32_mov_i(dst_hi, 0, ctx);
 	} else if (__LINUX_ARM_ARCH__ < 6 &&
 		   ctx->cpu_architecture < CPU_ARCH_ARMv5TE) {
 		/* complete 8 byte move */
@@ -1438,7 +1440,8 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
 		}
 		emit_udivmod(rd_lo, rd_lo, rt, ctx, BPF_OP(code));
 		arm_bpf_put_reg32(dst_lo, rd_lo, ctx);
-		emit_a32_mov_i(dst_hi, 0, ctx);
+		if (ctx->prog->aux->no_verifier_zext)
+			emit_a32_mov_i(dst_hi, 0, ctx);
 		break;
 	case BPF_ALU64 | BPF_DIV | BPF_K:
 	case BPF_ALU64 | BPF_DIV | BPF_X:
@@ -1453,7 +1456,8 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
 			return -EINVAL;
 		if (imm)
 			emit_a32_alu_i(dst_lo, imm, ctx, BPF_OP(code));
-		emit_a32_mov_i(dst_hi, 0, ctx);
+		if (ctx->prog->aux->no_verifier_zext)
+			emit_a32_mov_i(dst_hi, 0, ctx);
 		break;
 	/* dst = dst << imm */
 	case BPF_ALU64 | BPF_LSH | BPF_K:
@@ -1488,7 +1492,8 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
 	/* dst = ~dst */
 	case BPF_ALU | BPF_NEG:
 		emit_a32_alu_i(dst_lo, 0, ctx, BPF_OP(code));
-		emit_a32_mov_i(dst_hi, 0, ctx);
+		if (ctx->prog->aux->no_verifier_zext)
+			emit_a32_mov_i(dst_hi, 0, ctx);
 		break;
 	/* dst = ~dst (64 bit) */
 	case BPF_ALU64 | BPF_NEG:
@@ -1838,6 +1843,11 @@ void bpf_jit_compile(struct bpf_prog *prog)
 	/* Nothing to do here. We support Internal BPF. */
 }
 
+bool bpf_jit_hardware_zext(void)
+{
+	return false;
+}
+
 struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 {
 	struct bpf_prog *tmp, *orig_prog = prog;
-- 
2.7.4



* [PATCH/RFC bpf-next 09/16] powerpc: bpf: eliminate zero extension code-gen
  2019-03-26 18:05 [PATCH/RFC bpf-next 00/16] bpf: eliminate zero extensions for sub-register writes Jiong Wang
                   ` (7 preceding siblings ...)
  2019-03-26 18:05 ` [PATCH/RFC bpf-next 08/16] arm: bpf: eliminate zero extension code-gen Jiong Wang
@ 2019-03-26 18:05 ` Jiong Wang
  2019-03-26 18:05 ` [PATCH/RFC bpf-next 10/16] s390: " Jiong Wang
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 39+ messages in thread
From: Jiong Wang @ 2019-03-26 18:05 UTC (permalink / raw)
  To: alexei.starovoitov, daniel
  Cc: bpf, netdev, oss-drivers, Jiong Wang, Naveen N . Rao, Sandipan Das

Cc: Naveen N. Rao <naveen.n.rao@linux.ibm.com>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 arch/powerpc/net/bpf_jit_comp64.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index 21a1dcd..d10621b 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -559,7 +559,7 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
 
 bpf_alu32_trunc:
 		/* Truncate to 32-bits */
-		if (BPF_CLASS(code) == BPF_ALU)
+		if (BPF_CLASS(code) == BPF_ALU && fp->aux->no_verifier_zext)
 			PPC_RLWINM(dst_reg, dst_reg, 0, 0, 31);
 		break;
 
@@ -1046,6 +1046,11 @@ struct powerpc64_jit_data {
 	struct codegen_context ctx;
 };
 
+bool bpf_jit_hardware_zext(void)
+{
+	return false;
+}
+
 struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 {
 	u32 proglen;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH/RFC bpf-next 10/16] s390: bpf: eliminate zero extension code-gen
  2019-03-26 18:05 [PATCH/RFC bpf-next 00/16] bpf: eliminate zero extensions for sub-register writes Jiong Wang
                   ` (8 preceding siblings ...)
  2019-03-26 18:05 ` [PATCH/RFC bpf-next 09/16] powerpc: " Jiong Wang
@ 2019-03-26 18:05 ` Jiong Wang
  2019-03-26 18:05 ` [PATCH/RFC bpf-next 11/16] sparc: " Jiong Wang
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 39+ messages in thread
From: Jiong Wang @ 2019-03-26 18:05 UTC (permalink / raw)
  To: alexei.starovoitov, daniel
  Cc: bpf, netdev, oss-drivers, Jiong Wang, Martin Schwidefsky, Heiko Carstens

Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 arch/s390/net/bpf_jit_comp.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
index 51dd026..59592d7 100644
--- a/arch/s390/net/bpf_jit_comp.c
+++ b/arch/s390/net/bpf_jit_comp.c
@@ -299,9 +299,11 @@ static inline void reg_set_seen(struct bpf_jit *jit, u32 b1)
 
 #define EMIT_ZERO(b1)						\
 ({								\
-	/* llgfr %dst,%dst (zero extend to 64 bit) */		\
-	EMIT4(0xb9160000, b1, b1);				\
-	REG_SET_SEEN(b1);					\
+	if (fp->aux->no_verifier_zext) {			\
+		/* llgfr %dst,%dst (zero extend to 64 bit) */	\
+		EMIT4(0xb9160000, b1, b1);			\
+		REG_SET_SEEN(b1);				\
+	}							\
 })
 
 /*
@@ -1282,6 +1284,11 @@ static int bpf_jit_prog(struct bpf_jit *jit, struct bpf_prog *fp)
 	return 0;
 }
 
+bool bpf_jit_hardware_zext(void)
+{
+	return false;
+}
+
 /*
  * Compile eBPF program "fp"
  */
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH/RFC bpf-next 11/16] sparc: bpf: eliminate zero extension code-gen
  2019-03-26 18:05 [PATCH/RFC bpf-next 00/16] bpf: eliminate zero extensions for sub-register writes Jiong Wang
                   ` (9 preceding siblings ...)
  2019-03-26 18:05 ` [PATCH/RFC bpf-next 10/16] s390: " Jiong Wang
@ 2019-03-26 18:05 ` Jiong Wang
  2019-03-26 18:05 ` [PATCH/RFC bpf-next 12/16] x32: " Jiong Wang
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 39+ messages in thread
From: Jiong Wang @ 2019-03-26 18:05 UTC (permalink / raw)
  To: alexei.starovoitov, daniel
  Cc: bpf, netdev, oss-drivers, Jiong Wang, David S . Miller

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 arch/sparc/net/bpf_jit_comp_64.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/sparc/net/bpf_jit_comp_64.c b/arch/sparc/net/bpf_jit_comp_64.c
index 65428e7..e58f84e 100644
--- a/arch/sparc/net/bpf_jit_comp_64.c
+++ b/arch/sparc/net/bpf_jit_comp_64.c
@@ -1144,7 +1144,8 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
 		break;
 
 	do_alu32_trunc:
-		if (BPF_CLASS(code) == BPF_ALU)
+		if (BPF_CLASS(code) == BPF_ALU &&
+		    ctx->prog->aux->no_verifier_zext)
 			emit_alu_K(SRL, dst, 0, ctx);
 		break;
 
@@ -1432,6 +1433,11 @@ static void jit_fill_hole(void *area, unsigned int size)
 		*ptr++ = 0x91d02005; /* ta 5 */
 }
 
+bool bpf_jit_hardware_zext(void)
+{
+	return false;
+}
+
 struct sparc64_jit_data {
 	struct bpf_binary_header *header;
 	u8 *image;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH/RFC bpf-next 12/16] x32: bpf: eliminate zero extension code-gen
  2019-03-26 18:05 [PATCH/RFC bpf-next 00/16] bpf: eliminate zero extensions for sub-register writes Jiong Wang
                   ` (10 preceding siblings ...)
  2019-03-26 18:05 ` [PATCH/RFC bpf-next 11/16] sparc: " Jiong Wang
@ 2019-03-26 18:05 ` Jiong Wang
  2019-03-26 18:05 ` [PATCH/RFC bpf-next 13/16] riscv: " Jiong Wang
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 39+ messages in thread
From: Jiong Wang @ 2019-03-26 18:05 UTC (permalink / raw)
  To: alexei.starovoitov, daniel
  Cc: bpf, netdev, oss-drivers, Jiong Wang, Wang YanQing

Cc: Wang YanQing <udknight@gmail.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 arch/x86/net/bpf_jit_comp32.c | 32 ++++++++++++++++++++++----------
 1 file changed, 22 insertions(+), 10 deletions(-)

diff --git a/arch/x86/net/bpf_jit_comp32.c b/arch/x86/net/bpf_jit_comp32.c
index 0d9cdff..8c6cf22 100644
--- a/arch/x86/net/bpf_jit_comp32.c
+++ b/arch/x86/net/bpf_jit_comp32.c
@@ -567,7 +567,7 @@ static inline void emit_ia32_alu_r(const bool is64, const bool hi, const u8 op,
 static inline void emit_ia32_alu_r64(const bool is64, const u8 op,
 				     const u8 dst[], const u8 src[],
 				     bool dstk,  bool sstk,
-				     u8 **pprog)
+				     u8 **pprog, const struct bpf_prog_aux *aux)
 {
 	u8 *prog = *pprog;
 
@@ -575,7 +575,7 @@ static inline void emit_ia32_alu_r64(const bool is64, const u8 op,
 	if (is64)
 		emit_ia32_alu_r(is64, true, op, dst_hi, src_hi, dstk, sstk,
 				&prog);
-	else
+	else if (aux->no_verifier_zext)
 		emit_ia32_mov_i(dst_hi, 0, dstk, &prog);
 	*pprog = prog;
 }
@@ -666,7 +666,8 @@ static inline void emit_ia32_alu_i(const bool is64, const bool hi, const u8 op,
 /* ALU operation (64 bit) */
 static inline void emit_ia32_alu_i64(const bool is64, const u8 op,
 				     const u8 dst[], const u32 val,
-				     bool dstk, u8 **pprog)
+				     bool dstk, u8 **pprog,
+				     const struct bpf_prog_aux *aux)
 {
 	u8 *prog = *pprog;
 	u32 hi = 0;
@@ -677,7 +678,7 @@ static inline void emit_ia32_alu_i64(const bool is64, const u8 op,
 	emit_ia32_alu_i(is64, false, op, dst_lo, val, dstk, &prog);
 	if (is64)
 		emit_ia32_alu_i(is64, true, op, dst_hi, hi, dstk, &prog);
-	else
+	else if (aux->no_verifier_zext)
 		emit_ia32_mov_i(dst_hi, 0, dstk, &prog);
 
 	*pprog = prog;
@@ -1690,11 +1691,13 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
 			switch (BPF_SRC(code)) {
 			case BPF_X:
 				emit_ia32_alu_r64(is64, BPF_OP(code), dst,
-						  src, dstk, sstk, &prog);
+						  src, dstk, sstk, &prog,
+						  bpf_prog->aux);
 				break;
 			case BPF_K:
 				emit_ia32_alu_i64(is64, BPF_OP(code), dst,
-						  imm32, dstk, &prog);
+						  imm32, dstk, &prog,
+						  bpf_prog->aux);
 				break;
 			}
 			break;
@@ -1713,7 +1716,8 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
 						false, &prog);
 				break;
 			}
-			emit_ia32_mov_i(dst_hi, 0, dstk, &prog);
+			if (bpf_prog->aux->no_verifier_zext)
+				emit_ia32_mov_i(dst_hi, 0, dstk, &prog);
 			break;
 		case BPF_ALU | BPF_LSH | BPF_X:
 		case BPF_ALU | BPF_RSH | BPF_X:
@@ -1733,7 +1737,8 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
 						  &prog);
 				break;
 			}
-			emit_ia32_mov_i(dst_hi, 0, dstk, &prog);
+			if (bpf_prog->aux->no_verifier_zext)
+				emit_ia32_mov_i(dst_hi, 0, dstk, &prog);
 			break;
 		/* dst = dst / src(imm) */
 		/* dst = dst % src(imm) */
@@ -1755,7 +1760,8 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
 						    &prog);
 				break;
 			}
-			emit_ia32_mov_i(dst_hi, 0, dstk, &prog);
+			if (bpf_prog->aux->no_verifier_zext)
+				emit_ia32_mov_i(dst_hi, 0, dstk, &prog);
 			break;
 		case BPF_ALU64 | BPF_DIV | BPF_K:
 		case BPF_ALU64 | BPF_DIV | BPF_X:
@@ -1772,7 +1778,8 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
 			EMIT2_off32(0xC7, add_1reg(0xC0, IA32_ECX), imm32);
 			emit_ia32_shift_r(BPF_OP(code), dst_lo, IA32_ECX, dstk,
 					  false, &prog);
-			emit_ia32_mov_i(dst_hi, 0, dstk, &prog);
+			if (bpf_prog->aux->no_verifier_zext)
+				emit_ia32_mov_i(dst_hi, 0, dstk, &prog);
 			break;
 		/* dst = dst << imm */
 		case BPF_ALU64 | BPF_LSH | BPF_K:
@@ -2367,6 +2374,11 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
 	return proglen;
 }
 
+bool bpf_jit_hardware_zext(void)
+{
+	return false;
+}
+
 struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 {
 	struct bpf_binary_header *header = NULL;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH/RFC bpf-next 13/16] riscv: bpf: eliminate zero extension code-gen
  2019-03-26 18:05 [PATCH/RFC bpf-next 00/16] bpf: eliminate zero extensions for sub-register writes Jiong Wang
                   ` (11 preceding siblings ...)
  2019-03-26 18:05 ` [PATCH/RFC bpf-next 12/16] x32: " Jiong Wang
@ 2019-03-26 18:05 ` Jiong Wang
  2019-03-26 18:05 ` [PATCH/RFC bpf-next 14/16] nfp: " Jiong Wang
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 39+ messages in thread
From: Jiong Wang @ 2019-03-26 18:05 UTC (permalink / raw)
  To: alexei.starovoitov, daniel
  Cc: bpf, netdev, oss-drivers, Jiong Wang, Björn Töpel

CC: Björn Töpel <bjorn.topel@gmail.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 arch/riscv/net/bpf_jit_comp.c | 32 +++++++++++++++++++-------------
 1 file changed, 19 insertions(+), 13 deletions(-)

diff --git a/arch/riscv/net/bpf_jit_comp.c b/arch/riscv/net/bpf_jit_comp.c
index 80b12aa..9cba262 100644
--- a/arch/riscv/net/bpf_jit_comp.c
+++ b/arch/riscv/net/bpf_jit_comp.c
@@ -731,6 +731,7 @@ static int emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 {
 	bool is64 = BPF_CLASS(insn->code) == BPF_ALU64 ||
 		    BPF_CLASS(insn->code) == BPF_JMP;
+	struct bpf_prog_aux *aux = ctx->prog->aux;
 	int rvoff, i = insn - ctx->prog->insnsi;
 	u8 rd = -1, rs = -1, code = insn->code;
 	s16 off = insn->off;
@@ -743,7 +744,7 @@ static int emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 	case BPF_ALU | BPF_MOV | BPF_X:
 	case BPF_ALU64 | BPF_MOV | BPF_X:
 		emit(is64 ? rv_addi(rd, rs, 0) : rv_addiw(rd, rs, 0), ctx);
-		if (!is64)
+		if (!is64 && aux->no_verifier_zext)
 			emit_zext_32(rd, ctx);
 		break;
 
@@ -771,19 +772,19 @@ static int emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 	case BPF_ALU | BPF_MUL | BPF_X:
 	case BPF_ALU64 | BPF_MUL | BPF_X:
 		emit(is64 ? rv_mul(rd, rd, rs) : rv_mulw(rd, rd, rs), ctx);
-		if (!is64)
+		if (!is64 && aux->no_verifier_zext)
 			emit_zext_32(rd, ctx);
 		break;
 	case BPF_ALU | BPF_DIV | BPF_X:
 	case BPF_ALU64 | BPF_DIV | BPF_X:
 		emit(is64 ? rv_divu(rd, rd, rs) : rv_divuw(rd, rd, rs), ctx);
-		if (!is64)
+		if (!is64 && aux->no_verifier_zext)
 			emit_zext_32(rd, ctx);
 		break;
 	case BPF_ALU | BPF_MOD | BPF_X:
 	case BPF_ALU64 | BPF_MOD | BPF_X:
 		emit(is64 ? rv_remu(rd, rd, rs) : rv_remuw(rd, rd, rs), ctx);
-		if (!is64)
+		if (!is64 && aux->no_verifier_zext)
 			emit_zext_32(rd, ctx);
 		break;
 	case BPF_ALU | BPF_LSH | BPF_X:
@@ -867,7 +868,7 @@ static int emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 	case BPF_ALU | BPF_MOV | BPF_K:
 	case BPF_ALU64 | BPF_MOV | BPF_K:
 		emit_imm(rd, imm, ctx);
-		if (!is64)
+		if (!is64 && aux->no_verifier_zext)
 			emit_zext_32(rd, ctx);
 		break;
 
@@ -882,7 +883,7 @@ static int emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 			emit(is64 ? rv_add(rd, rd, RV_REG_T1) :
 			     rv_addw(rd, rd, RV_REG_T1), ctx);
 		}
-		if (!is64)
+		if (!is64 && aux->no_verifier_zext)
 			emit_zext_32(rd, ctx);
 		break;
 	case BPF_ALU | BPF_SUB | BPF_K:
@@ -895,7 +896,7 @@ static int emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 			emit(is64 ? rv_sub(rd, rd, RV_REG_T1) :
 			     rv_subw(rd, rd, RV_REG_T1), ctx);
 		}
-		if (!is64)
+		if (!is64 && aux->no_verifier_zext)
 			emit_zext_32(rd, ctx);
 		break;
 	case BPF_ALU | BPF_AND | BPF_K:
@@ -906,7 +907,7 @@ static int emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 			emit_imm(RV_REG_T1, imm, ctx);
 			emit(rv_and(rd, rd, RV_REG_T1), ctx);
 		}
-		if (!is64)
+		if (!is64 && aux->no_verifier_zext)
 			emit_zext_32(rd, ctx);
 		break;
 	case BPF_ALU | BPF_OR | BPF_K:
@@ -917,7 +918,7 @@ static int emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 			emit_imm(RV_REG_T1, imm, ctx);
 			emit(rv_or(rd, rd, RV_REG_T1), ctx);
 		}
-		if (!is64)
+		if (!is64 && aux->no_verifier_zext)
 			emit_zext_32(rd, ctx);
 		break;
 	case BPF_ALU | BPF_XOR | BPF_K:
@@ -928,7 +929,7 @@ static int emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 			emit_imm(RV_REG_T1, imm, ctx);
 			emit(rv_xor(rd, rd, RV_REG_T1), ctx);
 		}
-		if (!is64)
+		if (!is64 && aux->no_verifier_zext)
 			emit_zext_32(rd, ctx);
 		break;
 	case BPF_ALU | BPF_MUL | BPF_K:
@@ -936,7 +937,7 @@ static int emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 		emit_imm(RV_REG_T1, imm, ctx);
 		emit(is64 ? rv_mul(rd, rd, RV_REG_T1) :
 		     rv_mulw(rd, rd, RV_REG_T1), ctx);
-		if (!is64)
+		if (!is64 && aux->no_verifier_zext)
 			emit_zext_32(rd, ctx);
 		break;
 	case BPF_ALU | BPF_DIV | BPF_K:
@@ -944,7 +945,7 @@ static int emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 		emit_imm(RV_REG_T1, imm, ctx);
 		emit(is64 ? rv_divu(rd, rd, RV_REG_T1) :
 		     rv_divuw(rd, rd, RV_REG_T1), ctx);
-		if (!is64)
+		if (!is64 && aux->no_verifier_zext)
 			emit_zext_32(rd, ctx);
 		break;
 	case BPF_ALU | BPF_MOD | BPF_K:
@@ -952,7 +953,7 @@ static int emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 		emit_imm(RV_REG_T1, imm, ctx);
 		emit(is64 ? rv_remu(rd, rd, RV_REG_T1) :
 		     rv_remuw(rd, rd, RV_REG_T1), ctx);
-		if (!is64)
+		if (!is64 && aux->no_verifier_zext)
 			emit_zext_32(rd, ctx);
 		break;
 	case BPF_ALU | BPF_LSH | BPF_K:
@@ -1503,6 +1504,11 @@ static void bpf_flush_icache(void *start, void *end)
 	flush_icache_range((unsigned long)start, (unsigned long)end);
 }
 
+bool bpf_jit_hardware_zext(void)
+{
+	return false;
+}
+
 struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 {
 	bool tmp_blinded = false, extra_pass = false;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH/RFC bpf-next 14/16] nfp: bpf: eliminate zero extension code-gen
  2019-03-26 18:05 [PATCH/RFC bpf-next 00/16] bpf: eliminate zero extensions for sub-register writes Jiong Wang
                   ` (12 preceding siblings ...)
  2019-03-26 18:05 ` [PATCH/RFC bpf-next 13/16] riscv: " Jiong Wang
@ 2019-03-26 18:05 ` Jiong Wang
  2019-03-26 18:05 ` [PATCH/RFC bpf-next 15/16] selftests: bpf: new field "xlated_insns" for insn scan test after verification Jiong Wang
  2019-03-26 18:05 ` [PATCH/RFC bpf-next 16/16] selftests: bpf: unit testcases for zero extension insertion pass Jiong Wang
  15 siblings, 0 replies; 39+ messages in thread
From: Jiong Wang @ 2019-03-26 18:05 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: bpf, netdev, oss-drivers, Jiong Wang

This patch eliminates zero extension code-gen for instructions other than
load/store when possible.

Elimination for load/store will be supported in upcoming patches.
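
In short, the change follows a two-step pattern; the snippet below is a
condensed sketch of the hunks in the diff, not a complete listing:

  /* verifier.c: copy the verifier's per-insn decision into nfp metadata */
  list_for_each_entry(meta, &nfp_prog->insns, l)
          if (aux[meta->n].zext_dst)
                  meta->flags |= FLAG_INSN_DO_ZEXT;

  /* jit.c: clear the high 32-bit half only when the flag is set */
  static void
  wrp_zext(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta, u8 dst)
  {
          if (meta->flags & FLAG_INSN_DO_ZEXT)
                  wrp_immed(nfp_prog, reg_both(dst + 1), 0);
  }

Call sites that previously cleared the high word unconditionally with
wrp_immed(nfp_prog, reg_both(dst + 1), 0) now go through wrp_zext()
instead.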

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/bpf/jit.c      | 119 +++++++++++++---------
 drivers/net/ethernet/netronome/nfp/bpf/main.h     |   2 +
 drivers/net/ethernet/netronome/nfp/bpf/verifier.c |  12 +++
 3 files changed, 83 insertions(+), 50 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/jit.c b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
index f272247..eb30c52 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/jit.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
@@ -612,6 +612,13 @@ static void wrp_immed(struct nfp_prog *nfp_prog, swreg dst, u32 imm)
 }
 
 static void
+wrp_zext(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta, u8 dst)
+{
+	if (meta->flags & FLAG_INSN_DO_ZEXT)
+		wrp_immed(nfp_prog, reg_both(dst + 1), 0);
+}
+
+static void
 wrp_immed_relo(struct nfp_prog *nfp_prog, swreg dst, u32 imm,
 	       enum nfp_relo_type relo)
 {
@@ -847,7 +854,8 @@ static int nfp_cpp_memcpy(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 }
 
 static int
-data_ld(struct nfp_prog *nfp_prog, swreg offset, u8 dst_gpr, int size)
+data_ld(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta, swreg offset,
+	u8 dst_gpr, int size)
 {
 	unsigned int i;
 	u16 shift, sz;
@@ -870,14 +878,15 @@ data_ld(struct nfp_prog *nfp_prog, swreg offset, u8 dst_gpr, int size)
 			wrp_mov(nfp_prog, reg_both(dst_gpr + i), reg_xfer(i));
 
 	if (i < 2)
-		wrp_immed(nfp_prog, reg_both(dst_gpr + 1), 0);
+		wrp_zext(nfp_prog, meta, dst_gpr);
 
 	return 0;
 }
 
 static int
-data_ld_host_order(struct nfp_prog *nfp_prog, u8 dst_gpr,
-		   swreg lreg, swreg rreg, int size, enum cmd_mode mode)
+data_ld_host_order(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
+		   u8 dst_gpr, swreg lreg, swreg rreg, int size,
+		   enum cmd_mode mode)
 {
 	unsigned int i;
 	u8 mask, sz;
@@ -900,33 +909,34 @@ data_ld_host_order(struct nfp_prog *nfp_prog, u8 dst_gpr,
 			wrp_mov(nfp_prog, reg_both(dst_gpr + i), reg_xfer(i));
 
 	if (i < 2)
-		wrp_immed(nfp_prog, reg_both(dst_gpr + 1), 0);
+		wrp_zext(nfp_prog, meta, dst_gpr);
 
 	return 0;
 }
 
 static int
-data_ld_host_order_addr32(struct nfp_prog *nfp_prog, u8 src_gpr, swreg offset,
-			  u8 dst_gpr, u8 size)
+data_ld_host_order_addr32(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
+			  u8 src_gpr, swreg offset, u8 dst_gpr, u8 size)
 {
-	return data_ld_host_order(nfp_prog, dst_gpr, reg_a(src_gpr), offset,
-				  size, CMD_MODE_32b);
+	return data_ld_host_order(nfp_prog, meta, dst_gpr, reg_a(src_gpr),
+				  offset, size, CMD_MODE_32b);
 }
 
 static int
-data_ld_host_order_addr40(struct nfp_prog *nfp_prog, u8 src_gpr, swreg offset,
-			  u8 dst_gpr, u8 size)
+data_ld_host_order_addr40(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
+			  u8 src_gpr, swreg offset, u8 dst_gpr, u8 size)
 {
 	swreg rega, regb;
 
 	addr40_offset(nfp_prog, src_gpr, offset, &rega, &regb);
 
-	return data_ld_host_order(nfp_prog, dst_gpr, rega, regb,
+	return data_ld_host_order(nfp_prog, meta, dst_gpr, rega, regb,
 				  size, CMD_MODE_40b_BA);
 }
 
 static int
-construct_data_ind_ld(struct nfp_prog *nfp_prog, u16 offset, u16 src, u8 size)
+construct_data_ind_ld(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
+		      u16 offset, u16 src, u8 size)
 {
 	swreg tmp_reg;
 
@@ -942,10 +952,12 @@ construct_data_ind_ld(struct nfp_prog *nfp_prog, u16 offset, u16 src, u8 size)
 	emit_br_relo(nfp_prog, BR_BLO, BR_OFF_RELO, 0, RELO_BR_GO_ABORT);
 
 	/* Load data */
-	return data_ld(nfp_prog, imm_b(nfp_prog), 0, size);
+	return data_ld(nfp_prog, meta, imm_b(nfp_prog), 0, size);
 }
 
-static int construct_data_ld(struct nfp_prog *nfp_prog, u16 offset, u8 size)
+static int
+construct_data_ld(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
+		  u16 offset, u8 size)
 {
 	swreg tmp_reg;
 
@@ -956,7 +968,7 @@ static int construct_data_ld(struct nfp_prog *nfp_prog, u16 offset, u8 size)
 
 	/* Load data */
 	tmp_reg = re_load_imm_any(nfp_prog, offset, imm_b(nfp_prog));
-	return data_ld(nfp_prog, tmp_reg, 0, size);
+	return data_ld(nfp_prog, meta, tmp_reg, 0, size);
 }
 
 static int
@@ -1193,7 +1205,7 @@ mem_op_stack(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
 	}
 
 	if (clr_gpr && size < 8)
-		wrp_immed(nfp_prog, reg_both(gpr + 1), 0);
+		wrp_zext(nfp_prog, meta, gpr);
 
 	while (size) {
 		u32 slice_end;
@@ -1294,9 +1306,10 @@ wrp_alu32_imm(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
 	      enum alu_op alu_op)
 {
 	const struct bpf_insn *insn = &meta->insn;
+	u8 dst = insn->dst_reg * 2;
 
-	wrp_alu_imm(nfp_prog, insn->dst_reg * 2, alu_op, insn->imm);
-	wrp_immed(nfp_prog, reg_both(insn->dst_reg * 2 + 1), 0);
+	wrp_alu_imm(nfp_prog, dst, alu_op, insn->imm);
+	wrp_zext(nfp_prog, meta, dst);
 
 	return 0;
 }
@@ -1308,7 +1321,7 @@ wrp_alu32_reg(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
 	u8 dst = meta->insn.dst_reg * 2, src = meta->insn.src_reg * 2;
 
 	emit_alu(nfp_prog, reg_both(dst), reg_a(dst), alu_op, reg_b(src));
-	wrp_immed(nfp_prog, reg_both(meta->insn.dst_reg * 2 + 1), 0);
+	wrp_zext(nfp_prog, meta, dst);
 
 	return 0;
 }
@@ -2385,12 +2398,14 @@ static int neg_reg(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 	u8 dst = meta->insn.dst_reg * 2;
 
 	emit_alu(nfp_prog, reg_both(dst), reg_imm(0), ALU_OP_SUB, reg_b(dst));
-	wrp_immed(nfp_prog, reg_both(meta->insn.dst_reg * 2 + 1), 0);
+	wrp_zext(nfp_prog, meta, dst);
 
 	return 0;
 }
 
-static int __ashr_imm(struct nfp_prog *nfp_prog, u8 dst, u8 shift_amt)
+static int
+__ashr_imm(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta, u8 dst,
+	   u8 shift_amt)
 {
 	if (shift_amt) {
 		/* Set signedness bit (MSB of result). */
@@ -2399,7 +2414,7 @@ static int __ashr_imm(struct nfp_prog *nfp_prog, u8 dst, u8 shift_amt)
 		emit_shf(nfp_prog, reg_both(dst), reg_none(), SHF_OP_ASHR,
 			 reg_b(dst), SHF_SC_R_SHF, shift_amt);
 	}
-	wrp_immed(nfp_prog, reg_both(dst + 1), 0);
+	wrp_zext(nfp_prog, meta, dst);
 
 	return 0;
 }
@@ -2414,7 +2429,7 @@ static int ashr_reg(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 	umin = meta->umin_src;
 	umax = meta->umax_src;
 	if (umin == umax)
-		return __ashr_imm(nfp_prog, dst, umin);
+		return __ashr_imm(nfp_prog, meta, dst, umin);
 
 	src = insn->src_reg * 2;
 	/* NOTE: the first insn will set both indirect shift amount (source A)
@@ -2423,7 +2438,7 @@ static int ashr_reg(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 	emit_alu(nfp_prog, reg_none(), reg_a(src), ALU_OP_OR, reg_b(dst));
 	emit_shf_indir(nfp_prog, reg_both(dst), reg_none(), SHF_OP_ASHR,
 		       reg_b(dst), SHF_SC_R_SHF);
-	wrp_immed(nfp_prog, reg_both(dst + 1), 0);
+	wrp_zext(nfp_prog, meta, dst);
 
 	return 0;
 }
@@ -2433,15 +2448,17 @@ static int ashr_imm(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 	const struct bpf_insn *insn = &meta->insn;
 	u8 dst = insn->dst_reg * 2;
 
-	return __ashr_imm(nfp_prog, dst, insn->imm);
+	return __ashr_imm(nfp_prog, meta, dst, insn->imm);
 }
 
-static int __shr_imm(struct nfp_prog *nfp_prog, u8 dst, u8 shift_amt)
+static int
+__shr_imm(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta, u8 dst,
+	  u8 shift_amt)
 {
 	if (shift_amt)
 		emit_shf(nfp_prog, reg_both(dst), reg_none(), SHF_OP_NONE,
 			 reg_b(dst), SHF_SC_R_SHF, shift_amt);
-	wrp_immed(nfp_prog, reg_both(dst + 1), 0);
+	wrp_zext(nfp_prog, meta, dst);
 	return 0;
 }
 
@@ -2450,7 +2467,7 @@ static int shr_imm(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 	const struct bpf_insn *insn = &meta->insn;
 	u8 dst = insn->dst_reg * 2;
 
-	return __shr_imm(nfp_prog, dst, insn->imm);
+	return __shr_imm(nfp_prog, meta, dst, insn->imm);
 }
 
 static int shr_reg(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
@@ -2463,22 +2480,24 @@ static int shr_reg(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 	umin = meta->umin_src;
 	umax = meta->umax_src;
 	if (umin == umax)
-		return __shr_imm(nfp_prog, dst, umin);
+		return __shr_imm(nfp_prog, meta, dst, umin);
 
 	src = insn->src_reg * 2;
 	emit_alu(nfp_prog, reg_none(), reg_a(src), ALU_OP_OR, reg_imm(0));
 	emit_shf_indir(nfp_prog, reg_both(dst), reg_none(), SHF_OP_NONE,
 		       reg_b(dst), SHF_SC_R_SHF);
-	wrp_immed(nfp_prog, reg_both(dst + 1), 0);
+	wrp_zext(nfp_prog, meta, dst);
 	return 0;
 }
 
-static int __shl_imm(struct nfp_prog *nfp_prog, u8 dst, u8 shift_amt)
+static int
+__shl_imm(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta, u8 dst,
+	  u8 shift_amt)
 {
 	if (shift_amt)
 		emit_shf(nfp_prog, reg_both(dst), reg_none(), SHF_OP_NONE,
 			 reg_b(dst), SHF_SC_L_SHF, shift_amt);
-	wrp_immed(nfp_prog, reg_both(dst + 1), 0);
+	wrp_zext(nfp_prog, meta, dst);
 	return 0;
 }
 
@@ -2487,7 +2506,7 @@ static int shl_imm(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 	const struct bpf_insn *insn = &meta->insn;
 	u8 dst = insn->dst_reg * 2;
 
-	return __shl_imm(nfp_prog, dst, insn->imm);
+	return __shl_imm(nfp_prog, meta, dst, insn->imm);
 }
 
 static int shl_reg(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
@@ -2500,11 +2519,11 @@ static int shl_reg(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 	umin = meta->umin_src;
 	umax = meta->umax_src;
 	if (umin == umax)
-		return __shl_imm(nfp_prog, dst, umin);
+		return __shl_imm(nfp_prog, meta, dst, umin);
 
 	src = insn->src_reg * 2;
 	shl_reg64_lt32_low(nfp_prog, dst, src);
-	wrp_immed(nfp_prog, reg_both(dst + 1), 0);
+	wrp_zext(nfp_prog, meta, dst);
 	return 0;
 }
 
@@ -2566,34 +2585,34 @@ static int imm_ld8(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 
 static int data_ld1(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 {
-	return construct_data_ld(nfp_prog, meta->insn.imm, 1);
+	return construct_data_ld(nfp_prog, meta, meta->insn.imm, 1);
 }
 
 static int data_ld2(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 {
-	return construct_data_ld(nfp_prog, meta->insn.imm, 2);
+	return construct_data_ld(nfp_prog, meta, meta->insn.imm, 2);
 }
 
 static int data_ld4(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 {
-	return construct_data_ld(nfp_prog, meta->insn.imm, 4);
+	return construct_data_ld(nfp_prog, meta, meta->insn.imm, 4);
 }
 
 static int data_ind_ld1(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 {
-	return construct_data_ind_ld(nfp_prog, meta->insn.imm,
+	return construct_data_ind_ld(nfp_prog, meta, meta->insn.imm,
 				     meta->insn.src_reg * 2, 1);
 }
 
 static int data_ind_ld2(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 {
-	return construct_data_ind_ld(nfp_prog, meta->insn.imm,
+	return construct_data_ind_ld(nfp_prog, meta, meta->insn.imm,
 				     meta->insn.src_reg * 2, 2);
 }
 
 static int data_ind_ld4(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 {
-	return construct_data_ind_ld(nfp_prog, meta->insn.imm,
+	return construct_data_ind_ld(nfp_prog, meta, meta->insn.imm,
 				     meta->insn.src_reg * 2, 4);
 }
 
@@ -2632,7 +2651,7 @@ static int mem_ldx_skb(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
 		return -EOPNOTSUPP;
 	}
 
-	wrp_immed(nfp_prog, reg_both(meta->insn.dst_reg * 2 + 1), 0);
+	wrp_zext(nfp_prog, meta, dst);
 
 	return 0;
 }
@@ -2658,7 +2677,7 @@ static int mem_ldx_xdp(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
 		return -EOPNOTSUPP;
 	}
 
-	wrp_immed(nfp_prog, reg_both(meta->insn.dst_reg * 2 + 1), 0);
+	wrp_zext(nfp_prog, meta, dst);
 
 	return 0;
 }
@@ -2671,7 +2690,7 @@ mem_ldx_data(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
 
 	tmp_reg = re_load_imm_any(nfp_prog, meta->insn.off, imm_b(nfp_prog));
 
-	return data_ld_host_order_addr32(nfp_prog, meta->insn.src_reg * 2,
+	return data_ld_host_order_addr32(nfp_prog, meta, meta->insn.src_reg * 2,
 					 tmp_reg, meta->insn.dst_reg * 2, size);
 }
 
@@ -2683,7 +2702,7 @@ mem_ldx_emem(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
 
 	tmp_reg = re_load_imm_any(nfp_prog, meta->insn.off, imm_b(nfp_prog));
 
-	return data_ld_host_order_addr40(nfp_prog, meta->insn.src_reg * 2,
+	return data_ld_host_order_addr40(nfp_prog, meta, meta->insn.src_reg * 2,
 					 tmp_reg, meta->insn.dst_reg * 2, size);
 }
 
@@ -2744,7 +2763,7 @@ mem_ldx_data_from_pktcache_unaligned(struct nfp_prog *nfp_prog,
 	wrp_reg_subpart(nfp_prog, dst_lo, src_lo, len_lo, off);
 
 	if (!len_mid) {
-		wrp_immed(nfp_prog, dst_hi, 0);
+		wrp_zext(nfp_prog, meta, dst_gpr);
 		return 0;
 	}
 
@@ -2752,7 +2771,7 @@ mem_ldx_data_from_pktcache_unaligned(struct nfp_prog *nfp_prog,
 
 	if (size <= REG_WIDTH) {
 		wrp_reg_or_subpart(nfp_prog, dst_lo, src_mid, len_mid, len_lo);
-		wrp_immed(nfp_prog, dst_hi, 0);
+		wrp_zext(nfp_prog, meta, dst_gpr);
 	} else {
 		swreg src_hi = reg_xfer(idx + 2);
 
@@ -2783,10 +2802,10 @@ mem_ldx_data_from_pktcache_aligned(struct nfp_prog *nfp_prog,
 
 	if (size < REG_WIDTH) {
 		wrp_reg_subpart(nfp_prog, dst_lo, src_lo, size, 0);
-		wrp_immed(nfp_prog, dst_hi, 0);
+		wrp_zext(nfp_prog, meta, dst_gpr);
 	} else if (size == REG_WIDTH) {
 		wrp_mov(nfp_prog, dst_lo, src_lo);
-		wrp_immed(nfp_prog, dst_hi, 0);
+		wrp_zext(nfp_prog, meta, dst_gpr);
 	} else {
 		swreg src_hi = reg_xfer(idx + 1);
 
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h b/drivers/net/ethernet/netronome/nfp/bpf/main.h
index b25a482..7369bdf 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.h
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h
@@ -249,6 +249,8 @@ struct nfp_bpf_reg_state {
 #define FLAG_INSN_SKIP_PREC_DEPENDENT		BIT(4)
 /* Instruction is optimized by the verifier */
 #define FLAG_INSN_SKIP_VERIFIER_OPT		BIT(5)
+/* Instruction needs to zero extend to high 32-bit */
+#define FLAG_INSN_DO_ZEXT			BIT(6)
 
 #define FLAG_INSN_SKIP_MASK		(FLAG_INSN_SKIP_NOOP | \
 					 FLAG_INSN_SKIP_PREC_DEPENDENT | \
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
index 36f56eb..e92ee51 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
@@ -744,6 +744,17 @@ static unsigned int nfp_bpf_get_stack_usage(struct nfp_prog *nfp_prog)
 	goto continue_subprog;
 }
 
+static void nfp_bpf_insn_flag_zext(struct nfp_prog *nfp_prog,
+				   struct bpf_insn_aux_data *aux)
+{
+	struct nfp_insn_meta *meta;
+
+	list_for_each_entry(meta, &nfp_prog->insns, l) {
+		if (aux[meta->n].zext_dst)
+			meta->flags |= FLAG_INSN_DO_ZEXT;
+	}
+}
+
 int nfp_bpf_finalize(struct bpf_verifier_env *env)
 {
 	struct bpf_subprog_info *info;
@@ -784,6 +795,7 @@ int nfp_bpf_finalize(struct bpf_verifier_env *env)
 		return -EOPNOTSUPP;
 	}
 
+	nfp_bpf_insn_flag_zext(nfp_prog, env->insn_aux_data);
 	return 0;
 }
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH/RFC bpf-next 15/16] selftests: bpf: new field "xlated_insns" for insn scan test after verification
  2019-03-26 18:05 [PATCH/RFC bpf-next 00/16] bpf: eliminate zero extensions for sub-register writes Jiong Wang
                   ` (13 preceding siblings ...)
  2019-03-26 18:05 ` [PATCH/RFC bpf-next 14/16] nfp: " Jiong Wang
@ 2019-03-26 18:05 ` Jiong Wang
  2019-03-26 18:05 ` [PATCH/RFC bpf-next 16/16] selftests: bpf: unit testcases for zero extension insertion pass Jiong Wang
  15 siblings, 0 replies; 39+ messages in thread
From: Jiong Wang @ 2019-03-26 18:05 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: bpf, netdev, oss-drivers, Jiong Wang

Instruction scan is needed to test the new zero extension insertion pass.

This patch introduces the new "xlated_insns" field. Once it is set, the
instructions in "xlated_insns" are compared with the instruction sequence
returned by the prog query syscall after verification.

A failure is reported if there is a mismatch, meaning the transformations
did not happen as expected.
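
For illustration, a hypothetical entry (not part of this series; real
testcases are added by the next patch) would look like the following,
where a 32-bit def followed by a 64-bit use is expected to get the shift
pair inserted:

  {
          "zext: example",
          .insns = {
          BPF_MOV32_IMM(BPF_REG_1, 1),            /* 32-bit def of R1 */
          BPF_MOV64_REG(BPF_REG_0, BPF_REG_1),    /* 64-bit use of R1 */
          BPF_EXIT_INSN(),
          },
          .xlated_insns = {
          BPF_MOV32_IMM(BPF_REG_1, 1),
          BPF_ALU64_IMM(BPF_LSH, BPF_REG_1, 32),  /* inserted by the pass */
          BPF_ALU64_IMM(BPF_RSH, BPF_REG_1, 32),
          BPF_MOV64_REG(BPF_REG_0, BPF_REG_1),
          BPF_EXIT_INSN(),
          },
          .result = ACCEPT,
          .retval = 1,
  },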

One thing to note: we always want to run such tests, but the test host
does not necessarily have this optimization enabled.

So, we need to set the sysctl variable "bpf_jit_32bit_opt" to true
manually before running such tests and restore its original value
afterwards.

Also, we disable JIT blinding, which could otherwise cause mismatches
when comparing instructions.

We only run insn scan tests under privileged mode.
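
Roughly, the setup the harness performs around each such test looks like
the following (a simplified sketch using the helpers added below, with
error handling omitted):

  int saved_opt = read_bpf_procfs("bpf_jit_32bit_opt");
  int saved_harden = read_bpf_procfs("bpf_jit_harden");

  write_bpf_procfs("bpf_jit_harden", '0');      /* avoid blinded insns */
  write_bpf_procfs("bpf_jit_32bit_opt", '1');   /* force the new pass on */

  /* ... load the program and compare against test->xlated_insns ... */

  write_bpf_procfs("bpf_jit_32bit_opt", '0' + saved_opt);
  write_bpf_procfs("bpf_jit_harden", '0' + saved_harden);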

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 tools/testing/selftests/bpf/test_verifier.c | 220 ++++++++++++++++++++++++++--
 1 file changed, 210 insertions(+), 10 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index 19b5d03..aeb2566 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -65,7 +65,8 @@ static int skips;
 
 struct bpf_test {
 	const char *descr;
-	struct bpf_insn	insns[MAX_INSNS];
+	struct bpf_insn insns[MAX_INSNS];
+	struct bpf_insn xlated_insns[MAX_INSNS];
 	int fixup_map_hash_8b[MAX_FIXUPS];
 	int fixup_map_hash_48b[MAX_FIXUPS];
 	int fixup_map_hash_16b[MAX_FIXUPS];
@@ -257,14 +258,33 @@ static struct bpf_test tests[] = {
 #undef FILL_ARRAY
 };
 
-static int probe_filter_length(const struct bpf_insn *fp)
+static int probe_filter_length_bidir(const struct bpf_insn *fp, bool reverse)
 {
 	int len;
 
-	for (len = MAX_INSNS - 1; len > 0; --len)
-		if (fp[len].code != 0 || fp[len].imm != 0)
+	if (reverse) {
+		for (len = MAX_INSNS - 1; len > 0; --len)
+			if (fp[len].code != 0 || fp[len].imm != 0)
+				break;
+		return len + 1;
+	}
+
+	for (len = 0; len < MAX_INSNS; len++)
+		if (fp[len].code == 0 && fp[len].imm == 0)
 			break;
-	return len + 1;
+
+	return len;
+}
+
+static int probe_filter_length(const struct bpf_insn *fp)
+{
+	return probe_filter_length_bidir(fp, true);
+}
+
+static int probe_xlated_filter_length(const struct bpf_insn *fp)
+{
+	/* Translated insn array is very likely to be empty. */
+	return probe_filter_length_bidir(fp, false);
 }
 
 static bool skip_unsupported_map(enum bpf_map_type map_type)
@@ -698,13 +718,130 @@ static int do_prog_test_run(int fd_prog, bool unpriv, uint32_t expected_val,
 	return 0;
 }
 
+static inline __u64 ptr_to_u64(const void *ptr)
+{
+	return (__u64)(unsigned long)ptr;
+}
+
+static int read_bpf_procfs(const char *name)
+{
+	char path[64], *endptr, *line = NULL;
+	size_t len = 0;
+	FILE *fd;
+	int res;
+
+	snprintf(path, sizeof(path), "/proc/sys/net/core/%s", name);
+
+	fd = fopen(path, "r");
+	if (!fd)
+		return -1;
+
+	res = getline(&line, &len, fd);
+	fclose(fd);
+	if (res < 0)
+		return -1;
+
+	errno = 0;
+	res = strtol(line, &endptr, 10);
+	if (errno || *line == '\0' || *endptr != '\n')
+		res = -1;
+	free(line);
+
+	return res;
+}
+
+static int write_bpf_procfs(const char *name, const char value)
+{
+	char path[64];
+	FILE *fd;
+	int res;
+
+	snprintf(path, sizeof(path), "/proc/sys/net/core/%s", name);
+
+	fd = fopen(path, "w");
+	if (!fd)
+		return -1;
+
+	res = fwrite(&value, 1, 1, fd);
+	fclose(fd);
+	if (res != 1)
+		return -1;
+
+	return 0;
+}
+
+static int check_xlated_insn(int fd_prog, struct bpf_test *test, int xlated_len)
+{
+	struct bpf_insn *xlated_insn_buf;
+	__u32 len, *member_len, buf_size;
+	struct bpf_prog_info info;
+	__u64 *member_ptr;
+	int err, idx;
+
+	len = sizeof(info);
+	memset(&info, 0, sizeof(info));
+	member_len = &info.xlated_prog_len;
+	member_ptr = &info.xlated_prog_insns;
+	err = bpf_obj_get_info_by_fd(fd_prog, &info, &len);
+	if (err) {
+		printf("FAIL\nFailed to get prog info '%s'!\n",
+		       strerror(errno));
+		return -1;
+	}
+	if (!*member_len) {
+		printf("FAIL\nNo xlated insn returned!\n");
+		return -1;
+	}
+	buf_size = *member_len;
+	xlated_insn_buf = malloc(buf_size);
+	if (!xlated_insn_buf) {
+		printf("FAIL\nFailed to alloc xlated insn buffer!\n");
+		return -1;
+	}
+
+	memset(&info, 0, sizeof(info));
+	*member_ptr = ptr_to_u64(xlated_insn_buf);
+	*member_len = buf_size;
+	err = bpf_obj_get_info_by_fd(fd_prog, &info, &len);
+	if (err) {
+		printf("FAIL\nFailed to get prog info '%s'!\n",
+		       strerror(errno));
+		return -1;
+	}
+	if (*member_len > buf_size) {
+		printf("FAIL\nToo many xlated insns returned!\n");
+		return -1;
+	}
+	for (idx = 0; idx < xlated_len; idx++) {
+		struct bpf_insn expect_insn = test->xlated_insns[idx];
+		struct bpf_insn got_insn = xlated_insn_buf[idx];
+		bool match_fail;
+
+		/* Verifier will rewrite call imm/offset, just compare code. */
+		if (expect_insn.code == (BPF_JMP | BPF_CALL))
+			match_fail = got_insn.code != expect_insn.code;
+		else /* Full match. */
+			match_fail = memcmp(&got_insn, &expect_insn,
+					    sizeof(struct bpf_insn));
+
+		if (match_fail) {
+			printf("FAIL\nFailed to match xlated insns[%d]\n", idx);
+			return -1;
+		}
+	}
+
+	return 0;
+}
+
 static void do_test_single(struct bpf_test *test, bool unpriv,
 			   int *passes, int *errors)
 {
-	int fd_prog, expected_ret, alignment_prevented_execution;
-	int prog_len, prog_type = test->prog_type;
+	int fd_prog = -1, expected_ret, alignment_prevented_execution;
+	int original_jit_blind = 0, original_jit_32bit_opt = 0;
+	int xlated_len, prog_len, prog_type = test->prog_type;
 	struct bpf_insn *prog = test->insns;
 	int run_errs, run_successes;
+	bool has_xlated_insn_test;
 	int map_fds[MAX_NR_MAPS];
 	const char *expected_err;
 	int fixup_skips;
@@ -724,6 +861,45 @@ static void do_test_single(struct bpf_test *test, bool unpriv,
 	if (fixup_skips != skips)
 		return;
 	prog_len = probe_filter_length(prog);
+	xlated_len = probe_xlated_filter_length(test->xlated_insns);
+	expected_ret = unpriv && test->result_unpriv != UNDEF ?
+		       test->result_unpriv : test->result;
+	has_xlated_insn_test = expected_ret == ACCEPT && xlated_len;
+	if (!unpriv) {
+		/* Disable 32-bit optimization for all the other tests. The
+		 * inserted shifts could break some test assumption, for
+		 * example, those hard coded map fixup insn indexes.
+		 */
+		char opt_enable = '0';
+
+		original_jit_32bit_opt = read_bpf_procfs("bpf_jit_32bit_opt");
+		if (original_jit_32bit_opt < 0) {
+			printf("FAIL\nRead jit 32bit opt proc info\n");
+			goto fail;
+		}
+		/* Disable JIT blinding and enable 32-bit optimization when
+		 * there is translated insn match test.
+		 */
+		if (has_xlated_insn_test) {
+			original_jit_blind = read_bpf_procfs("bpf_jit_harden");
+			if (original_jit_blind < 0) {
+				printf("FAIL\nRead jit blinding proc info\n");
+				goto fail;
+			}
+			err = write_bpf_procfs("bpf_jit_harden", '0');
+			if (err < 0) {
+				printf("FAIL\nDisable jit blinding\n");
+				goto fail;
+			}
+
+			opt_enable = '1';
+		}
+		err = write_bpf_procfs("bpf_jit_32bit_opt", opt_enable);
+		if (err < 0) {
+			printf("FAIL\nSetting jit 32-bit opt enablement\n");
+			goto fail;
+		}
+	}
 
 	pflags = 0;
 	if (test->flags & F_LOAD_WITH_STRICT_ALIGNMENT)
@@ -738,8 +914,6 @@ static void do_test_single(struct bpf_test *test, bool unpriv,
 		goto close_fds;
 	}
 
-	expected_ret = unpriv && test->result_unpriv != UNDEF ?
-		       test->result_unpriv : test->result;
 	expected_err = unpriv && test->errstr_unpriv ?
 		       test->errstr_unpriv : test->errstr;
 
@@ -756,6 +930,31 @@ static void do_test_single(struct bpf_test *test, bool unpriv,
 		    (test->flags & F_NEEDS_EFFICIENT_UNALIGNED_ACCESS))
 			alignment_prevented_execution = 1;
 #endif
+
+		if (!unpriv) {
+			/* Restore 32-bit optimization variable . */
+			err = write_bpf_procfs("bpf_jit_32bit_opt",
+					       '0' + original_jit_32bit_opt);
+			if (err < 0) {
+				printf("FAIL\nRestore jit 32-bit opt\n");
+				goto fail;
+			}
+			if (has_xlated_insn_test) {
+				char c = '0' + original_jit_blind;
+
+				/* Restore JIT blinding variable . */
+				err = write_bpf_procfs("bpf_jit_harden", c);
+				if (err < 0) {
+					printf("FAIL\nRestore jit blinding\n");
+					goto fail;
+				}
+				/* Do xlated insn comparisons. */
+				err = check_xlated_insn(fd_prog, test,
+							xlated_len);
+				if (err < 0)
+					goto fail;
+			}
+		}
 	} else {
 		if (fd_prog >= 0) {
 			printf("FAIL\nUnexpected success to load!\n");
@@ -836,8 +1035,9 @@ static void do_test_single(struct bpf_test *test, bool unpriv,
 	sched_yield();
 	return;
 fail_log:
-	(*errors)++;
 	printf("%s", bpf_vlog);
+fail:
+	(*errors)++;
 	goto close_fds;
 }
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH/RFC bpf-next 16/16] selftests: bpf: unit testcases for zero extension insertion pass
  2019-03-26 18:05 [PATCH/RFC bpf-next 00/16] bpf: eliminate zero extensions for sub-register writes Jiong Wang
                   ` (14 preceding siblings ...)
  2019-03-26 18:05 ` [PATCH/RFC bpf-next 15/16] selftests: bpf: new field "xlated_insns" for insn scan test after verification Jiong Wang
@ 2019-03-26 18:05 ` Jiong Wang
  15 siblings, 0 replies; 39+ messages in thread
From: Jiong Wang @ 2019-03-26 18:05 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: bpf, netdev, oss-drivers, Jiong Wang

This patch adds some unit testcases.

There are a couple of code paths inside the verifier that do register
read/write marking, and these are the places that could trigger the zero
extension insertion logic. Create one test for each of them.

A couple of testcases for complex CFGs are also included. They cover
register read propagation during path pruning, etc.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 tools/testing/selftests/bpf/verifier/zext.c | 651 ++++++++++++++++++++++++++++
 1 file changed, 651 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/verifier/zext.c

diff --git a/tools/testing/selftests/bpf/verifier/zext.c b/tools/testing/selftests/bpf/verifier/zext.c
new file mode 100644
index 0000000..b45a429
--- /dev/null
+++ b/tools/testing/selftests/bpf/verifier/zext.c
@@ -0,0 +1,651 @@
+/* There are a couple of code paths inside verifier doing register
+ * read/write marking. Create one test for each.
+ */
+{
+	"zext: basic 1",
+	.insns = {
+	BPF_LD_IMM64(BPF_REG_0, 0x100000000ULL),
+	BPF_MOV32_IMM(BPF_REG_0, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+	BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1),
+	BPF_EXIT_INSN(),
+	},
+	.xlated_insns = {
+	BPF_LD_IMM64(BPF_REG_0, 0x100000000ULL),
+	BPF_MOV32_IMM(BPF_REG_0, 0),
+	BPF_ALU64_IMM(BPF_LSH, BPF_REG_0, 32),
+	BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+	BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1),
+	BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.retval = 0,
+},
+{
+	"zext: basic 2",
+	.insns = {
+	BPF_LD_IMM64(BPF_REG_0, 0x100000001ULL),
+	BPF_MOV32_IMM(BPF_REG_0, 1),
+	BPF_ALU64_IMM(BPF_NEG, BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.xlated_insns = {
+	BPF_LD_IMM64(BPF_REG_0, 0x100000001ULL),
+	BPF_MOV32_IMM(BPF_REG_0, 1),
+	BPF_ALU64_IMM(BPF_LSH, BPF_REG_0, 32),
+	BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32),
+	BPF_ALU64_IMM(BPF_NEG, BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.retval = -1,
+},
+{
+	"zext: basic 3",
+	.insns = {
+	BPF_LD_IMM64(BPF_REG_0, 0x100000000ULL),
+	BPF_MOV32_IMM(BPF_REG_0, 0),
+	BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32),
+	BPF_EXIT_INSN(),
+	},
+	.xlated_insns = {
+	BPF_LD_IMM64(BPF_REG_0, 0x100000000ULL),
+	BPF_MOV32_IMM(BPF_REG_0, 0),
+	BPF_ALU64_IMM(BPF_LSH, BPF_REG_0, 32),
+	BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32),
+	BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32),
+	BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.retval = 0,
+},
+{
+	"zext: basic 4",
+	.insns = {
+	BPF_LD_IMM64(BPF_REG_1, 0x300000001ULL),
+	BPF_MOV32_IMM(BPF_REG_1, 1),
+	BPF_ALU64_IMM(BPF_RSH, BPF_REG_1, 32),
+	BPF_MOV32_IMM(BPF_REG_2, 2),
+	BPF_JMP_REG(BPF_JSLE, BPF_REG_1, BPF_REG_2, 2),
+	BPF_MOV64_IMM(BPF_REG_0, 3),
+	BPF_EXIT_INSN(),
+	BPF_MOV32_IMM(BPF_REG_0, 4),
+	BPF_EXIT_INSN(),
+	},
+	.xlated_insns = {
+	BPF_LD_IMM64(BPF_REG_1, 0x300000001ULL),
+	BPF_MOV32_IMM(BPF_REG_1, 1),
+	BPF_ALU64_IMM(BPF_LSH, BPF_REG_1, 32),
+	BPF_ALU64_IMM(BPF_RSH, BPF_REG_1, 32),
+	BPF_ALU64_IMM(BPF_RSH, BPF_REG_1, 32),
+	BPF_MOV32_IMM(BPF_REG_2, 2),
+	BPF_ALU64_IMM(BPF_LSH, BPF_REG_2, 32),
+	BPF_ALU64_IMM(BPF_RSH, BPF_REG_2, 32),
+	BPF_JMP_REG(BPF_JSLE, BPF_REG_1, BPF_REG_2, 2),
+	BPF_MOV64_IMM(BPF_REG_0, 3),
+	BPF_EXIT_INSN(),
+	BPF_MOV32_IMM(BPF_REG_0, 4),
+	BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.retval = 4,
+},
+{
+	"zext: basic 5",
+	.insns = {
+	BPF_LD_IMM64(BPF_REG_1, 0x100000001ULL),
+	BPF_MOV32_IMM(BPF_REG_1, 0),
+	BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_1, -8),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_10, -8),
+	BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32),
+	BPF_EXIT_INSN(),
+	},
+	.xlated_insns = {
+	BPF_LD_IMM64(BPF_REG_1, 0x100000001ULL),
+	BPF_MOV32_IMM(BPF_REG_1, 0),
+	BPF_ALU64_IMM(BPF_LSH, BPF_REG_1, 32),
+	BPF_ALU64_IMM(BPF_RSH, BPF_REG_1, 32),
+	BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_1, -8),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_10, -8),
+	BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32),
+	BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.retval = 0,
+},
+{
+	"zext: basic 6",
+	.insns = {
+	BPF_LD_IMM64(BPF_REG_0, 0x100000000ULL),
+	BPF_MOV32_IMM(BPF_REG_0, 0),
+	BPF_MOV32_IMM(BPF_REG_1, 1),
+	BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1),
+	BPF_EXIT_INSN(),
+	},
+	.xlated_insns = {
+	BPF_LD_IMM64(BPF_REG_0, 0x100000000ULL),
+	BPF_MOV32_IMM(BPF_REG_0, 0),
+	BPF_ALU64_IMM(BPF_LSH, BPF_REG_0, 32),
+	BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32),
+	BPF_MOV32_IMM(BPF_REG_1, 1),
+	BPF_ALU64_IMM(BPF_LSH, BPF_REG_1, 32),
+	BPF_ALU64_IMM(BPF_RSH, BPF_REG_1, 32),
+	BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1),
+	BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.retval = 1,
+},
+{
+	"zext: ret from main",
+	.insns = {
+	BPF_LD_IMM64(BPF_REG_0, 0x100000001ULL),
+	BPF_MOV32_IMM(BPF_REG_0, 1),
+	BPF_EXIT_INSN(),
+	},
+	.xlated_insns = {
+	BPF_LD_IMM64(BPF_REG_0, 0x100000001ULL),
+	BPF_MOV32_IMM(BPF_REG_0, 1),
+	BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.retval = 1,
+},
+{
+	"zext: ret from helper",
+	.insns = {
+	BPF_MOV32_IMM(BPF_REG_0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32),
+	BPF_MOV64_IMM(BPF_REG_8, BPF_REG_0),
+	BPF_MOV32_IMM(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.xlated_insns = {
+	/* Shouldn't do zext. */
+	BPF_MOV32_IMM(BPF_REG_0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32),
+	BPF_MOV64_IMM(BPF_REG_8, BPF_REG_0),
+	BPF_MOV32_IMM(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.retval = 0,
+},
+{
+	"zext: xadd",
+	.insns = {
+	BPF_LD_IMM64(BPF_REG_0, 0x100000001ULL),
+	BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -8),
+	BPF_MOV32_IMM(BPF_REG_0, 1),
+	BPF_STX_XADD(BPF_DW, BPF_REG_10, BPF_REG_0, -8),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_10, -8),
+	BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32),
+	BPF_EXIT_INSN(),
+	},
+	.xlated_insns = {
+	BPF_LD_IMM64(BPF_REG_0, 0x100000001ULL),
+	BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -8),
+	BPF_MOV32_IMM(BPF_REG_0, 1),
+	BPF_ALU64_IMM(BPF_LSH, BPF_REG_0, 32),
+	BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32),
+	BPF_STX_XADD(BPF_DW, BPF_REG_10, BPF_REG_0, -8),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_10, -8),
+	BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32),
+	BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.retval = 1,
+},
+{
+	"zext: ld_abs ind",
+	.insns = {
+	BPF_ALU64_REG(BPF_MOV, BPF_REG_6, BPF_REG_1),
+	BPF_LD_IMM64(BPF_REG_8, 0x100000000ULL),
+	BPF_MOV32_IMM(BPF_REG_8, 2),
+	BPF_ALU64_IMM(BPF_RSH, BPF_REG_8, 32),
+	BPF_LD_IND(BPF_B, BPF_REG_8, 0),
+	BPF_EXIT_INSN(),
+	},
+	.xlated_insns = {
+	BPF_ALU64_REG(BPF_MOV, BPF_REG_6, BPF_REG_1),
+	BPF_LD_IMM64(BPF_REG_8, 0x100000000ULL),
+	BPF_MOV32_IMM(BPF_REG_8, 2),
+	BPF_ALU64_IMM(BPF_LSH, BPF_REG_8, 32),
+	BPF_ALU64_IMM(BPF_RSH, BPF_REG_8, 32),
+	BPF_ALU64_IMM(BPF_RSH, BPF_REG_8, 32),
+	},
+	.data = {
+		10, 20, 30, 40, 50,
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.result = ACCEPT,
+	.retval = 10,
+},
+{
+	"zext: multi paths, all 32-bit use",
+	.insns = {
+	BPF_MOV32_IMM(BPF_REG_8, 2),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32),
+	BPF_MOV64_REG(BPF_REG_7, BPF_REG_0),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_7, 0, 2),
+	BPF_MOV32_REG(BPF_REG_0, BPF_REG_8),
+	BPF_EXIT_INSN(),
+	BPF_MOV32_REG(BPF_REG_7, BPF_REG_8),
+	BPF_MOV32_REG(BPF_REG_0, BPF_REG_7),
+	BPF_EXIT_INSN(),
+	},
+	.xlated_insns = {
+	BPF_MOV32_IMM(BPF_REG_8, 2),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32),
+	BPF_MOV64_REG(BPF_REG_7, BPF_REG_0),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_7, 0, 2),
+	BPF_MOV32_REG(BPF_REG_0, BPF_REG_8),
+	BPF_EXIT_INSN(),
+	BPF_MOV32_REG(BPF_REG_7, BPF_REG_8),
+	BPF_MOV32_REG(BPF_REG_0, BPF_REG_7),
+	BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.retval = 2,
+},
+{
+	"zext: multi paths, partial 64-bit use",
+	.insns = {
+	BPF_LD_IMM64(BPF_REG_8, 0x100000001ULL),
+	BPF_MOV32_IMM(BPF_REG_8, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32),
+	BPF_MOV64_REG(BPF_REG_7, BPF_REG_0),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_7, 0, 2),
+	BPF_MOV32_REG(BPF_REG_0, BPF_REG_8),
+	BPF_EXIT_INSN(),
+	BPF_MOV64_REG(BPF_REG_0, BPF_REG_8),
+	BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32),
+	BPF_EXIT_INSN(),
+	},
+	.xlated_insns = {
+	BPF_LD_IMM64(BPF_REG_8, 0x100000001ULL),
+	BPF_MOV32_IMM(BPF_REG_8, 0),
+	BPF_ALU64_IMM(BPF_LSH, BPF_REG_8, 32),
+	BPF_ALU64_IMM(BPF_RSH, BPF_REG_8, 32),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32),
+	BPF_MOV64_REG(BPF_REG_7, BPF_REG_0),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_7, 0, 2),
+	BPF_MOV32_REG(BPF_REG_0, BPF_REG_8),
+	BPF_EXIT_INSN(),
+	BPF_MOV64_REG(BPF_REG_0, BPF_REG_8),
+	BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32),
+	BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.retval = 0,
+},
+{
+	"zext: multi paths, 32-bit def override",
+	.insns = {
+	BPF_MOV32_IMM(BPF_REG_8, 2),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32),
+	BPF_MOV64_REG(BPF_REG_7, BPF_REG_0),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_7, 0, 3),
+	BPF_MOV32_IMM(BPF_REG_8, 2),
+	BPF_MOV32_REG(BPF_REG_0, BPF_REG_8),
+	BPF_EXIT_INSN(),
+	BPF_MOV32_IMM(BPF_REG_8, 2),
+	BPF_MOV64_REG(BPF_REG_7, BPF_REG_8),
+	BPF_MOV32_REG(BPF_REG_0, BPF_REG_7),
+	BPF_EXIT_INSN(),
+	},
+	.xlated_insns = {
+	BPF_MOV32_IMM(BPF_REG_8, 2),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32),
+	BPF_MOV64_REG(BPF_REG_7, BPF_REG_0),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_7, 0, 3),
+	BPF_MOV32_IMM(BPF_REG_8, 2),
+	BPF_MOV32_REG(BPF_REG_0, BPF_REG_8),
+	BPF_EXIT_INSN(),
+	BPF_MOV32_IMM(BPF_REG_8, 2),
+	BPF_ALU64_IMM(BPF_LSH, BPF_REG_8, 32),
+	BPF_ALU64_IMM(BPF_RSH, BPF_REG_8, 32),
+	BPF_MOV64_REG(BPF_REG_7, BPF_REG_8),
+	BPF_MOV32_REG(BPF_REG_0, BPF_REG_7),
+	BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.retval = 2,
+},
+{
+	/* Diamond CFG
+	 *
+	 *          -----
+	 *         | BB0 |
+	 *          -----
+	 *           /\
+	 *          /  \
+	 *         /    \
+	 *      -----     -----
+	 *     | BB1 |   | BB2 | u32 def
+	 *      -----     -----
+	 *         \     /
+	 *          \   / -> pruned, but u64 read should propagate backward
+	 *           \ /
+	 *          -----
+	 *         | BB3 | u64 read
+	 *          -----
+	 */
+	"zext: complex cfg 1",
+	.insns = {
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32),
+	BPF_MOV64_REG(BPF_REG_7, BPF_REG_0),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_7, 0, 2),
+	/* BB1 */
+	BPF_MOV64_IMM(BPF_REG_8, 2),
+	BPF_JMP_IMM(BPF_JA, 0, 0, 2),
+	/* BB2 */
+	BPF_MOV32_IMM(BPF_REG_8, 2),
+	BPF_JMP_IMM(BPF_JA, 0, 0, 0),
+	/* BB3, 64-bit R8 read should be prop backward. */
+	BPF_MOV64_REG(BPF_REG_0, BPF_REG_8),
+	BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32),
+	BPF_EXIT_INSN(),
+	},
+	.xlated_insns = {
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32),
+	BPF_MOV64_REG(BPF_REG_7, BPF_REG_0),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_7, 0, 2),
+	/* BB1 */
+	BPF_MOV64_IMM(BPF_REG_8, 2),
+	BPF_JMP_IMM(BPF_JA, 0, 0, 3),
+	/* BB2 */
+	BPF_MOV32_IMM(BPF_REG_8, 2),
+	BPF_ALU64_IMM(BPF_LSH, BPF_REG_8, 32),
+	BPF_ALU64_IMM(BPF_RSH, BPF_REG_8, 32),
+	/* BB3 */
+	BPF_MOV64_REG(BPF_REG_0, BPF_REG_8),
+	BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32),
+	BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.retval = 0,
+},
+{
+	/* Diamond CFG
+	 *
+	 *          -----
+	 *         | BB0 | u32 def
+	 *          -----
+	 *           /\
+	 *          /  \
+	 *         /    \
+	 *      -----     -----
+	 *     | BB1 |   | BB2 | u32 def
+	 *      -----     -----
+	 *         \     /
+	 *          \   / -> pruned, but u64 read should propagate backward
+	 *           \ /
+	 *          -----
+	 *         | BB3 | u64 read
+	 *          -----
+	 */
+	"zext: complex cfg 2",
+	.insns = {
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32),
+	BPF_MOV64_REG(BPF_REG_7, BPF_REG_0),
+	BPF_MOV32_IMM(BPF_REG_6, 2),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_7, 0, 2),
+	/* BB1 */
+	BPF_MOV64_IMM(BPF_REG_8, 2),
+	BPF_JMP_IMM(BPF_JA, 0, 0, 2),
+	/* BB2 */
+	BPF_MOV32_IMM(BPF_REG_8, 2),
+	BPF_JMP_IMM(BPF_JA, 0, 0, 0),
+	/* BB3, 64-bit R8 read should be prop backward. */
+	BPF_MOV64_REG(BPF_REG_0, BPF_REG_8),
+	BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32),
+	BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_6),
+	BPF_EXIT_INSN(),
+	},
+	.xlated_insns = {
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32),
+	BPF_MOV64_REG(BPF_REG_7, BPF_REG_0),
+	BPF_MOV32_IMM(BPF_REG_6, 2),
+	BPF_ALU64_IMM(BPF_LSH, BPF_REG_6, 32),
+	BPF_ALU64_IMM(BPF_RSH, BPF_REG_6, 32),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_7, 0, 2),
+	/* BB1 */
+	BPF_MOV64_IMM(BPF_REG_8, 2),
+	BPF_JMP_IMM(BPF_JA, 0, 0, 3),
+	/* BB2 */
+	BPF_MOV32_IMM(BPF_REG_8, 2),
+	BPF_ALU64_IMM(BPF_LSH, BPF_REG_8, 32),
+	BPF_ALU64_IMM(BPF_RSH, BPF_REG_8, 32),
+	/* BB3 */
+	BPF_MOV64_REG(BPF_REG_0, BPF_REG_8),
+	BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32),
+	BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_6),
+	BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.retval = 2,
+},
+{
+	/* Diamond CFG
+	 *
+	 *                 -----
+	 *                | BB0 | u32 def A
+	 *                 -----
+	 *                  /\
+	 *                 /  \
+	 *                /    \
+	 *             -----    -----
+	 *  u64 def A | BB1 |  | BB2 | u32 def B
+	 *             -----    -----
+	 *                 \     /
+	 *                  \   /
+	 *                   \ /
+	 *                  -----
+	 *                 | BB3 | u64 read A and B
+	 *                  -----
+	 */
+	"zext: complex cfg 3",
+	.insns = {
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32),
+	BPF_MOV64_REG(BPF_REG_7, BPF_REG_0),
+	BPF_MOV32_IMM(BPF_REG_6, 2),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_7, 0, 3),
+	/* BB1 */
+	BPF_MOV64_IMM(BPF_REG_8, 2),
+	BPF_MOV64_IMM(BPF_REG_6, 2),
+	BPF_JMP_IMM(BPF_JA, 0, 0, 2),
+	/* BB2 */
+	BPF_MOV32_IMM(BPF_REG_8, 2),
+	BPF_JMP_IMM(BPF_JA, 0, 0, 0),
+	/* BB3, 64-bit R8 read should be prop backward. */
+	BPF_MOV64_REG(BPF_REG_0, BPF_REG_8),
+	BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32),
+	BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_6),
+	BPF_EXIT_INSN(),
+	},
+	.xlated_insns = {
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32),
+	BPF_MOV64_REG(BPF_REG_7, BPF_REG_0),
+	BPF_MOV32_IMM(BPF_REG_6, 2),
+	BPF_ALU64_IMM(BPF_LSH, BPF_REG_6, 32),
+	BPF_ALU64_IMM(BPF_RSH, BPF_REG_6, 32),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_7, 0, 3),
+	/* BB1 */
+	BPF_MOV64_IMM(BPF_REG_8, 2),
+	BPF_MOV64_IMM(BPF_REG_6, 2),
+	BPF_JMP_IMM(BPF_JA, 0, 0, 3),
+	/* BB2 */
+	BPF_MOV32_IMM(BPF_REG_8, 2),
+	BPF_ALU64_IMM(BPF_LSH, BPF_REG_8, 32),
+	BPF_ALU64_IMM(BPF_RSH, BPF_REG_8, 32),
+	/* BB3 */
+	BPF_MOV64_REG(BPF_REG_0, BPF_REG_8),
+	BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32),
+	BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_6),
+	BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.retval = 2,
+},
+{
+	/* Diamond CFG
+	 *
+	 *                 -----
+	 *                | BB0 | u32 def A
+	 *                 -----
+	 *                  /\
+	 *                 /  \
+	 *                /    \
+	 *             -----    -----
+	 *  u64 def A | BB1 |  | BB2 | u64 def A and u32 def B
+	 *             -----    -----
+	 *                 \     /
+	 *                  \   /
+	 *                   \ /
+	 *                  -----
+	 *                 | BB3 | u64 read A and B
+	 *                  -----
+	 */
+	"zext: complex cfg 4",
+	.insns = {
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32),
+	BPF_MOV64_REG(BPF_REG_7, BPF_REG_0),
+	BPF_MOV32_IMM(BPF_REG_6, 2),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_7, 0, 3),
+	/* BB1 */
+	BPF_MOV64_IMM(BPF_REG_8, 2),
+	BPF_MOV64_IMM(BPF_REG_6, 3),
+	BPF_JMP_IMM(BPF_JA, 0, 0, 3),
+	/* BB2 */
+	BPF_MOV32_IMM(BPF_REG_8, 2),
+	BPF_MOV64_IMM(BPF_REG_6, 3),
+	BPF_JMP_IMM(BPF_JA, 0, 0, 0),
+	/* BB3, 64-bit R8 read should be prop backward. */
+	BPF_MOV64_REG(BPF_REG_0, BPF_REG_8),
+	BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32),
+	BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_6),
+	BPF_EXIT_INSN(),
+	},
+	.xlated_insns = {
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32),
+	BPF_MOV64_REG(BPF_REG_7, BPF_REG_0),
+	BPF_MOV32_IMM(BPF_REG_6, 2),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_7, 0, 3),
+	/* BB1 */
+	BPF_MOV64_IMM(BPF_REG_8, 2),
+	BPF_MOV64_IMM(BPF_REG_6, 3),
+	BPF_JMP_IMM(BPF_JA, 0, 0, 4),
+	/* BB2 */
+	BPF_MOV32_IMM(BPF_REG_8, 2),
+	BPF_ALU64_IMM(BPF_LSH, BPF_REG_8, 32),
+	BPF_ALU64_IMM(BPF_RSH, BPF_REG_8, 32),
+	BPF_MOV64_IMM(BPF_REG_6, 3),
+	/* BB3 */
+	BPF_MOV64_REG(BPF_REG_0, BPF_REG_8),
+	BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32),
+	BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_6),
+	BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.retval = 3,
+},
+{
+	"zext: callee-saved",
+	.insns = {
+	BPF_MOV32_IMM(BPF_REG_6, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 2),
+	BPF_MOV64_REG(BPF_REG_0, BPF_REG_6),
+	BPF_EXIT_INSN(),
+	/* callee */
+	BPF_MOV32_IMM(BPF_REG_6, 1),
+	BPF_EXIT_INSN(),
+	},
+	.xlated_insns = {
+	/* caller u32 def should be zero extended. */
+	BPF_MOV32_IMM(BPF_REG_6, 0),
+	BPF_ALU64_IMM(BPF_LSH, BPF_REG_6, 32),
+	BPF_ALU64_IMM(BPF_RSH, BPF_REG_6, 32),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 2),
+	/* u64 use. */
+	BPF_MOV64_REG(BPF_REG_0, BPF_REG_6),
+	BPF_EXIT_INSN(),
+	/* callee u32 def shouldn't be affected. */
+	BPF_MOV32_IMM(BPF_REG_6, 1),
+	BPF_EXIT_INSN(),
+	},
+	.errstr_unpriv = "function calls to other bpf functions are allowed for root only",
+	.result_unpriv = REJECT,
+	.result = ACCEPT,
+	.retval = 0,
+},
+{
+	"zext: arg regs",
+	.insns = {
+	BPF_MOV32_IMM(BPF_REG_1, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 2),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	/* callee */
+	BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+	BPF_EXIT_INSN(),
+	},
+	.xlated_insns = {
+	/* caller u32 def should be zero extended. */
+	BPF_MOV32_IMM(BPF_REG_1, 0),
+	BPF_ALU64_IMM(BPF_LSH, BPF_REG_1, 32),
+	BPF_ALU64_IMM(BPF_RSH, BPF_REG_1, 32),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 2),
+	/* u64 use. */
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	/* callee u64 use on caller-saved reg. */
+	BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+	BPF_EXIT_INSN(),
+	},
+	.errstr_unpriv = "function calls to other bpf functions are allowed for root only",
+	.result_unpriv = REJECT,
+	.result = ACCEPT,
+	.retval = 0,
+},
+{
+	"zext: return arg",
+	.insns = {
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 5),
+	BPF_ALU32_IMM(BPF_ADD, BPF_REG_0, 1),
+	BPF_MOV32_REG(BPF_REG_6, BPF_REG_0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 4),
+	BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_6),
+	BPF_EXIT_INSN(),
+	/* callee 1 */
+	BPF_MOV32_IMM(BPF_REG_0, 1),
+	BPF_EXIT_INSN(),
+	/* callee 2 */
+	BPF_MOV32_IMM(BPF_REG_0, 1),
+	BPF_EXIT_INSN(),
+	},
+	.xlated_insns = {
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 7),
+	BPF_ALU32_IMM(BPF_ADD, BPF_REG_0, 1),
+	BPF_MOV32_REG(BPF_REG_6, BPF_REG_0),
+	BPF_ALU64_IMM(BPF_LSH, BPF_REG_6, 32),
+	BPF_ALU64_IMM(BPF_RSH, BPF_REG_6, 32),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 4),
+	BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_6),
+	BPF_EXIT_INSN(),
+	/* callee 1 */
+	BPF_MOV32_IMM(BPF_REG_0, 1),
+	BPF_EXIT_INSN(),
+	/* callee 2 */
+	BPF_MOV32_IMM(BPF_REG_0, 1),
+	BPF_ALU64_IMM(BPF_LSH, BPF_REG_0, 32),
+	BPF_ALU64_IMM(BPF_RSH, BPF_REG_0, 32),
+	BPF_EXIT_INSN(),
+	},
+	.errstr_unpriv = "function calls to other bpf functions are allowed for root only",
+	.result_unpriv = REJECT,
+	.result = ACCEPT,
+	.retval = 3,
+},
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH/RFC bpf-next 02/16] bpf: refactor propagate_live implementation
  2019-03-26 18:05 ` [PATCH/RFC bpf-next 02/16] bpf: refactor propagate_live implementation Jiong Wang
@ 2019-03-26 18:26   ` Jann Horn
  2019-03-26 19:45     ` Jiong Wang
  2019-03-27 16:35   ` Alexei Starovoitov
  1 sibling, 1 reply; 39+ messages in thread
From: Jann Horn @ 2019-03-26 18:26 UTC (permalink / raw)
  To: Jiong Wang
  Cc: Alexei Starovoitov, Daniel Borkmann, bpf, Network Development,
	oss-drivers

On Tue, Mar 26, 2019 at 7:07 PM Jiong Wang <jiong.wang@netronome.com> wrote:
> Some code inside current implementation of "propagate_liveness" is a little
> bit verbose.
>
> This patch refactor them so the code looks more simple and more clear.
>
> The redundant usage of "vparent->frame[vstate->curframe]" is removed as we
> are here. It is safe to do this because "state_equal" has guaranteed that
> vstate->curframe must be equal with vparent->curframe.
[...]
> @@ -6050,6 +6050,22 @@ static bool states_equal(struct bpf_verifier_env *env,
>         return true;
>  }
>
> +static int propagate_liveness_reg(struct bpf_verifier_env *env,
> +                                 struct bpf_reg_state *reg,
> +                                 struct bpf_reg_state *parent_reg, u8 flag)

This function takes four arguments...

[...]
> @@ -6071,16 +6088,13 @@ static int propagate_liveness(struct bpf_verifier_env *env,
[...]
> +               err = propagate_liveness_reg(env, &regs[i], &parent_regs[i]);

.. but both here...

[...]
> @@ -6089,11 +6103,13 @@ static int propagate_liveness(struct bpf_verifier_env *env,
[...]
> +                       err = propagate_liveness_reg(env, reg, parent_reg);

... and here you only pass in three arguments? Does this compile?

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH/RFC bpf-next 04/16] bpf: mark sub-register writes that really need zero extension to high bits
  2019-03-26 18:05 ` [PATCH/RFC bpf-next 04/16] bpf: mark sub-register writes that really need zero extension to high bits Jiong Wang
@ 2019-03-26 18:44   ` Edward Cree
  2019-03-26 19:47     ` Jiong Wang
  2019-04-05 20:44     ` Jiong Wang
  2019-03-27 16:50   ` Alexei Starovoitov
  1 sibling, 2 replies; 39+ messages in thread
From: Edward Cree @ 2019-03-26 18:44 UTC (permalink / raw)
  To: Jiong Wang; +Cc: alexei.starovoitov, daniel, bpf, netdev, oss-drivers

On 26/03/2019 18:05, Jiong Wang wrote:
> eBPF ISA specification requires high 32-bit cleared when low 32-bit
> sub-register is written. This applies to destination register of ALU32 etc.
> JIT back-ends must guarantee this semantic when doing code-gen.
>
> x86-64 and arm64 ISA has the same semantic, so the corresponding JIT
> back-end doesn't need to do extra work. However, 32-bit arches (arm, nfp
> etc.) and some other 64-bit arches (powerpc, sparc etc), need explicit zero
> extension sequence to meet such semantic.
>
> This is important, because for code the following:
>
>   u64_value = (u64) u32_value
>   ... other uses of u64_value
>
> compiler could exploit the semantic described above and save those zero
> extensions for extending u32_value to u64_value. Hardware, runtime, or BPF
> JIT back-ends, are responsible for guaranteeing this. Some benchmarks show
> ~40% sub-register writes out of total insns, meaning ~40% extra code-gen (
> could go up to more for some arches which requires two shifts for zero
> extension) because JIT back-end needs to do extra code-gen for all such
> instructions.
>
> However this is not always necessary in case u32_value is never cast into
> a u64, which is quite normal in real life program. So, it would be really
> good if we could identify those places where such type cast happened, and
> only do zero extensions for them, not for the others. This could save a lot
> of BPF code-gen.
>
> Algo:
>  - Record indices of instructions that do sub-register def (write). And
>    these indices need to stay with function state so path pruning and bpf
>    to bpf function call could be handled properly.
>
>    These indices are kept up to date while doing insn walk.
>
>  - A full register read on an active sub-register def marks the def insn as
>    needing zero extension on dst register.
>
>  - A new sub-register write overrides the old one.
>
>    A new full register write makes the register free of zero extension on
>    dst register.
>
>  - When propagating register read64 during path pruning, it also marks def
>    insns whose defs are hanging active sub-register, if there is any read64
>    from shown from the equal state.
>
> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
> ---
>  include/linux/bpf_verifier.h |  4 +++
>  kernel/bpf/verifier.c        | 85 +++++++++++++++++++++++++++++++++++++++++---
>  2 files changed, 84 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index 27761ab..0ae9a3f 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -181,6 +181,9 @@ struct bpf_func_state {
>  	 */
>  	u32 subprogno;
>  
> +	/* tracks subreg definition. */
Ideally this comment should mention that the stored value is the insn_idx
 of the writing insn.  Perhaps also that this is safe because patching
 (bpf_patch_insn_data()) only happens after main verification completes.

-Ed

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH/RFC bpf-next 02/16] bpf: refactor propagate_live implementation
  2019-03-26 18:26   ` Jann Horn
@ 2019-03-26 19:45     ` Jiong Wang
  0 siblings, 0 replies; 39+ messages in thread
From: Jiong Wang @ 2019-03-26 19:45 UTC (permalink / raw)
  To: Jann Horn
  Cc: Alexei Starovoitov, Daniel Borkmann, bpf, Network Development,
	oss-drivers

On 26/03/2019 18:26, Jann Horn wrote:
> On Tue, Mar 26, 2019 at 7:07 PM Jiong Wang <jiong.wang@netronome.com> wrote:
>> Some code inside current implementation of "propagate_liveness" is a little
>> bit verbose.
>>
>> This patch refactor them so the code looks more simple and more clear.
>>
>> The redundant usage of "vparent->frame[vstate->curframe]" is removed as we
>> are here. It is safe to do this because "state_equal" has guaranteed that
>> vstate->curframe must be equal with vparent->curframe.
> [...]
>> @@ -6050,6 +6050,22 @@ static bool states_equal(struct bpf_verifier_env *env,
>>          return true;
>>   }
>>
>> +static int propagate_liveness_reg(struct bpf_verifier_env *env,
>> +                                 struct bpf_reg_state *reg,
>> +                                 struct bpf_reg_state *parent_reg, u8 flag)
> This function takes four arguments...
>
> [...]
>> @@ -6071,16 +6088,13 @@ static int propagate_liveness(struct bpf_verifier_env *env,
> [...]
>> +               err = propagate_liveness_reg(env, &regs[i], &parent_regs[i]);
> .. but both here...
>
> [...]
>> @@ -6089,11 +6103,13 @@ static int propagate_liveness(struct bpf_verifier_env *env,
> [...]
>> +                       err = propagate_liveness_reg(env, reg, parent_reg);
> ... and here you only pass in three arguments? Does this compile?

Yes, it compiles. It is fixed in patch 03/16...

I was doing some simplification on 03/16, then found it was better to move
it into 02/16, and forgot to update the function prototype there.

Apologies for this.
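
For reference, the 03/16 version passes the missing flag at the call sites,
e.g.:

		err = propagate_liveness_reg(env, &regs[i], &parent_regs[i],
					     REG_LIVE_READ64);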

Regards,
Jiong


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH/RFC bpf-next 04/16] bpf: mark sub-register writes that really need zero extension to high bits
  2019-03-26 18:44   ` Edward Cree
@ 2019-03-26 19:47     ` Jiong Wang
  2019-04-05 20:44     ` Jiong Wang
  1 sibling, 0 replies; 39+ messages in thread
From: Jiong Wang @ 2019-03-26 19:47 UTC (permalink / raw)
  To: Edward Cree; +Cc: alexei.starovoitov, daniel, bpf, netdev, oss-drivers

On 26/03/2019 18:44, Edward Cree wrote:
> On 26/03/2019 18:05, Jiong Wang wrote:
>> eBPF ISA specification requires high 32-bit cleared when low 32-bit
>> sub-register is written. This applies to destination register of ALU32 etc.
>> JIT back-ends must guarantee this semantic when doing code-gen.
>>
>> x86-64 and arm64 ISA has the same semantic, so the corresponding JIT
>> back-end doesn't need to do extra work. However, 32-bit arches (arm, nfp
>> etc.) and some other 64-bit arches (powerpc, sparc etc), need explicit zero
>> extension sequence to meet such semantic.
>>
>> This is important, because for code the following:
>>
>>    u64_value = (u64) u32_value
>>    ... other uses of u64_value
>>
>> compiler could exploit the semantic described above and save those zero
>> extensions for extending u32_value to u64_value. Hardware, runtime, or BPF
>> JIT back-ends, are responsible for guaranteeing this. Some benchmarks show
>> ~40% sub-register writes out of total insns, meaning ~40% extra code-gen (
>> could go up to more for some arches which requires two shifts for zero
>> extension) because JIT back-end needs to do extra code-gen for all such
>> instructions.
>>
>> However this is not always necessary in case u32_value is never cast into
>> a u64, which is quite normal in real life program. So, it would be really
>> good if we could identify those places where such type cast happened, and
>> only do zero extensions for them, not for the others. This could save a lot
>> of BPF code-gen.
>>
>> Algo:
>>   - Record indices of instructions that do sub-register def (write). And
>>     these indices need to stay with function state so path pruning and bpf
>>     to bpf function call could be handled properly.
>>
>>     These indices are kept up to date while doing insn walk.
>>
>>   - A full register read on an active sub-register def marks the def insn as
>>     needing zero extension on dst register.
>>
>>   - A new sub-register write overrides the old one.
>>
>>     A new full register write makes the register free of zero extension on
>>     dst register.
>>
>>   - When propagating register read64 during path pruning, it also marks def
>>     insns whose defs are hanging active sub-register, if there is any read64
>>     from shown from the equal state.
>>
>> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
>> ---
>>   include/linux/bpf_verifier.h |  4 +++
>>   kernel/bpf/verifier.c        | 85 +++++++++++++++++++++++++++++++++++++++++---
>>   2 files changed, 84 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
>> index 27761ab..0ae9a3f 100644
>> --- a/include/linux/bpf_verifier.h
>> +++ b/include/linux/bpf_verifier.h
>> @@ -181,6 +181,9 @@ struct bpf_func_state {
>>   	 */
>>   	u32 subprogno;
>>   
>> +	/* tracks subreg definition. */
> Ideally this comment should mention that the stored value is the insn_idx
>   of the writing insn.  Perhaps also that this is safe because patching
>   (bpf_patch_insn_data()) only happens after main verification completes.

OK, will add more comments for this.
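
Roughly something like the below (wording not final):

	/* Tracks insn_idx of the latest sub-register (32-bit) definition of
	 * each register.  Relying on insn_idx is safe because instruction
	 * patching (bpf_patch_insn_data) only happens after main verification
	 * has completed.
	 */
	s32 subreg_def[MAX_BPF_REG];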

Regards,
Jiong


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH/RFC bpf-next 03/16] bpf: split read liveness into REG_LIVE_READ64 and REG_LIVE_READ32
  2019-03-26 18:05 ` [PATCH/RFC bpf-next 03/16] bpf: split read liveness into REG_LIVE_READ64 and REG_LIVE_READ32 Jiong Wang
@ 2019-03-26 20:21   ` Jann Horn
  2019-03-26 20:50     ` Jiong Wang
  2019-03-27 16:38   ` Alexei Starovoitov
  1 sibling, 1 reply; 39+ messages in thread
From: Jann Horn @ 2019-03-26 20:21 UTC (permalink / raw)
  To: Jiong Wang
  Cc: Alexei Starovoitov, Daniel Borkmann, bpf, Network Development,
	oss-drivers

On Tue, Mar 26, 2019 at 7:06 PM Jiong Wang <jiong.wang@netronome.com> wrote:
>
> In previous patch, we have split register arg type for sub-register read,
> but haven't touch read liveness.
>
> This patch further split read liveness into REG_LIVE_READ64 and
> REG_LIVE_READ32. Liveness propagation code are updated accordingly.
>
> After this split, customized actions could be defined when propagating full
> register read (REG_LIVE_READ64) or sub-register read (REG_LIVE_READ32).
>
> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
[...]
> @@ -1374,7 +1374,8 @@ static int check_stack_read(struct bpf_verifier_env *env,
>                         return -EACCES;
>                 }
>                 mark_reg_read(env, &reg_state->stack[spi].spilled_ptr,
> -                             reg_state->stack[spi].spilled_ptr.parent);
> +                             reg_state->stack[spi].spilled_ptr.parent,
> +                             size == BPF_REG_SIZE);

Isn't it possible to use a 4-byte read on the upper half of an 8-byte
stack slot?

>                 if (value_regno >= 0) {
>                         if (zeros == size) {
>                                 /* any size read into register is zero extended,
> @@ -2220,7 +2221,8 @@ static int check_stack_boundary(struct bpf_verifier_env *env, int regno,
>                  * the whole slot to be marked as 'read'
>                  */
>                 mark_reg_read(env, &state->stack[spi].spilled_ptr,
> -                             state->stack[spi].spilled_ptr.parent);
> +                             state->stack[spi].spilled_ptr.parent,
> +                             access_size == BPF_REG_SIZE);

Same thing as above.

>         }
>         return update_stack_depth(env, state, off);
>  }
[...]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH/RFC bpf-next 03/16] bpf: split read liveness into REG_LIVE_READ64 and REG_LIVE_READ32
  2019-03-26 20:21   ` Jann Horn
@ 2019-03-26 20:50     ` Jiong Wang
  0 siblings, 0 replies; 39+ messages in thread
From: Jiong Wang @ 2019-03-26 20:50 UTC (permalink / raw)
  To: Jann Horn
  Cc: Jiong Wang, Alexei Starovoitov, Daniel Borkmann, bpf,
	Network Development, oss-drivers


Jann Horn writes:

> On Tue, Mar 26, 2019 at 7:06 PM Jiong Wang <jiong.wang@netronome.com> wrote:
>>
>> In previous patch, we have split register arg type for sub-register read,
>> but haven't touch read liveness.
>>
>> This patch further split read liveness into REG_LIVE_READ64 and
>> REG_LIVE_READ32. Liveness propagation code are updated accordingly.
>>
>> After this split, customized actions could be defined when propagating full
>> register read (REG_LIVE_READ64) or sub-register read (REG_LIVE_READ32).
>>
>> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
> [...]
>> @@ -1374,7 +1374,8 @@ static int check_stack_read(struct bpf_verifier_env *env,
>>                         return -EACCES;
>>                 }
>>                 mark_reg_read(env, &reg_state->stack[spi].spilled_ptr,
>> -                             reg_state->stack[spi].spilled_ptr.parent);
>> +                             reg_state->stack[spi].spilled_ptr.parent,
>> +                             size == BPF_REG_SIZE);
>
> Isn't it possible to use a 4-byte read on the upper half of an 8-byte
> stack slot?

I think that's fine, and it is irrelevant to zero extension on the register.

If the 8-byte stack slot comes from a register spill, then the
definition of that register will already have been marked as needing
zero extension if it was produced by a sub-register write.

Regards,
Jiong

>
>>                 if (value_regno >= 0) {
>>                         if (zeros == size) {
>>                                 /* any size read into register is zero extended,
>> @@ -2220,7 +2221,8 @@ static int check_stack_boundary(struct bpf_verifier_env *env, int regno,
>>                  * the whole slot to be marked as 'read'
>>                  */
>>                 mark_reg_read(env, &state->stack[spi].spilled_ptr,
>> -                             state->stack[spi].spilled_ptr.parent);
>> +                             state->stack[spi].spilled_ptr.parent,
>> +                             access_size == BPF_REG_SIZE);
>
> Same thing as above.
>
>>         }
>>         return update_stack_depth(env, state, off);
>>  }
> [...]


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH/RFC bpf-next 01/16] bpf: turn "enum bpf_reg_liveness" into bit representation
  2019-03-26 18:05 ` [PATCH/RFC bpf-next 01/16] bpf: turn "enum bpf_reg_liveness" into bit representation Jiong Wang
@ 2019-03-27 15:44   ` Alexei Starovoitov
  0 siblings, 0 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2019-03-27 15:44 UTC (permalink / raw)
  To: Jiong Wang; +Cc: daniel, bpf, netdev, oss-drivers

On Tue, Mar 26, 2019 at 06:05:24PM +0000, Jiong Wang wrote:
> "enum bpf_reg_liveness" is actually used as bit instead of integer. For
> example:
> 
>         if (live & (REG_LIVE_READ | REG_LIVE_WRITTEN | REG_LIVE_DONE))
> 
> Using enum to represent bit is error prone, better to use explicit bit
> macros.
> 
> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
> ---
>  include/linux/bpf_verifier.h | 16 +++++++++-------
>  kernel/bpf/verifier.c        |  5 ++---
>  2 files changed, 11 insertions(+), 10 deletions(-)
> 
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index 7d8228d..f03c86a 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -34,12 +34,14 @@
>   * but of the link between it and its parent.  See mark_reg_read() and
>   * mark_stack_slot_read() in kernel/bpf/verifier.c.
>   */
> -enum bpf_reg_liveness {
> -	REG_LIVE_NONE = 0, /* reg hasn't been read or written this branch */
> -	REG_LIVE_READ, /* reg was read, so we're sensitive to initial value */
> -	REG_LIVE_WRITTEN, /* reg was written first, screening off later reads */
> -	REG_LIVE_DONE = 4, /* liveness won't be updating this register anymore */
> -};

yes. it is enum that is used as a bitfield, but I prefer to keep it as enum
because clang -Wassign-enum can do at least some type checking.
I also find it easier to review the code when it has
'enum bpf_reg_liveness' instead of 'u8'
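
e.g. keep the enum and just make the bit values explicit (sketch):

	enum bpf_reg_liveness {
		REG_LIVE_NONE    = 0,   /* reg hasn't been read or written this branch */
		REG_LIVE_READ    = 0x1, /* reg was read, so we're sensitive to initial value */
		REG_LIVE_WRITTEN = 0x2, /* reg was written first, screening off later reads */
		REG_LIVE_DONE    = 0x4, /* liveness won't be updating this register anymore */
	};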


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH/RFC bpf-next 02/16] bpf: refactor propagate_live implementation
  2019-03-26 18:05 ` [PATCH/RFC bpf-next 02/16] bpf: refactor propagate_live implementation Jiong Wang
  2019-03-26 18:26   ` Jann Horn
@ 2019-03-27 16:35   ` Alexei Starovoitov
  2019-03-27 16:44     ` Jiong Wang
  1 sibling, 1 reply; 39+ messages in thread
From: Alexei Starovoitov @ 2019-03-27 16:35 UTC (permalink / raw)
  To: Jiong Wang; +Cc: daniel, bpf, netdev, oss-drivers

On Tue, Mar 26, 2019 at 06:05:25PM +0000, Jiong Wang wrote:
> Some code inside current implementation of "propagate_liveness" is a little
> bit verbose.
> 
> This patch refactor them so the code looks more simple and more clear.
> 
> The redundant usage of "vparent->frame[vstate->curframe]" is removed as we
> are here. It is safe to do this because "state_equal" has guaranteed that
> vstate->curframe must be equal with vparent->curframe.
> 
> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
> ---
>  kernel/bpf/verifier.c | 44 ++++++++++++++++++++++++++++++--------------
>  1 file changed, 30 insertions(+), 14 deletions(-)
> 
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 6cc8c38..245bb3c 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -6050,6 +6050,22 @@ static bool states_equal(struct bpf_verifier_env *env,
>  	return true;
>  }
>  
> +static int propagate_liveness_reg(struct bpf_verifier_env *env,
> +				  struct bpf_reg_state *reg,
> +				  struct bpf_reg_state *parent_reg, u8 flag)
> +{
> +	int err;
> +
> +	if (parent_reg->live & flag || !(reg->live & flag))
> +		return 0;
> +
> +	err = mark_reg_read(env, reg, parent_reg);
> +	if (err)
> +		return err;
> +
> +	return 1;
> +}

what is the difference between 1 and 0 ? it doesn't seem to be used.

> +
>  /* A write screens off any subsequent reads; but write marks come from the
>   * straight-line code between a state and its parent.  When we arrive at an
>   * equivalent state (jump target or such) we didn't arrive by the straight-line
> @@ -6061,8 +6077,9 @@ static int propagate_liveness(struct bpf_verifier_env *env,
>  			      const struct bpf_verifier_state *vstate,
>  			      struct bpf_verifier_state *vparent)
>  {
> -	int i, frame, err = 0;
> +	struct bpf_reg_state *regs, *parent_regs;
>  	struct bpf_func_state *state, *parent;
> +	int i, frame, err = 0;
>  
>  	if (vparent->curframe != vstate->curframe) {
>  		WARN(1, "propagate_live: parent frame %d current frame %d\n",
> @@ -6071,16 +6088,13 @@ static int propagate_liveness(struct bpf_verifier_env *env,
>  	}
>  	/* Propagate read liveness of registers... */
>  	BUILD_BUG_ON(BPF_REG_FP + 1 != MAX_BPF_REG);
> +	parent_regs = vparent->frame[vparent->curframe]->regs;
> +	regs = vstate->frame[vstate->curframe]->regs;


may be do:
frame = vstate->curframe;
if (vparent->curframe != frame) { WARN...
parent_regs = vparent->frame[frame]->regs;
regs = vstate->frame[frame]->regs;

?

>  	/* We don't need to worry about FP liveness because it's read-only */
>  	for (i = 0; i < BPF_REG_FP; i++) {
> -		if (vparent->frame[vparent->curframe]->regs[i].live & REG_LIVE_READ)
> -			continue;
> -		if (vstate->frame[vstate->curframe]->regs[i].live & REG_LIVE_READ) {
> -			err = mark_reg_read(env, &vstate->frame[vstate->curframe]->regs[i],
> -					    &vparent->frame[vstate->curframe]->regs[i]);
> -			if (err)
> -				return err;
> -		}
> +		err = propagate_liveness_reg(env, &regs[i], &parent_regs[i]);
> +		if (err < 0)
> +			return err;
>  	}
>  
>  	/* ... and stack slots */
> @@ -6089,11 +6103,13 @@ static int propagate_liveness(struct bpf_verifier_env *env,
>  		parent = vparent->frame[frame];
>  		for (i = 0; i < state->allocated_stack / BPF_REG_SIZE &&
>  			    i < parent->allocated_stack / BPF_REG_SIZE; i++) {
> -			if (parent->stack[i].spilled_ptr.live & REG_LIVE_READ)
> -				continue;
> -			if (state->stack[i].spilled_ptr.live & REG_LIVE_READ)
> -				mark_reg_read(env, &state->stack[i].spilled_ptr,
> -					      &parent->stack[i].spilled_ptr);
> +			struct bpf_reg_state *parent_reg, *reg;
> +
> +			parent_reg = &parent->stack[i].spilled_ptr;
> +			reg = &state->stack[i].spilled_ptr;
> +			err = propagate_liveness_reg(env, reg, parent_reg);
> +			if (err < 0)
> +				return err;
>  		}
>  	}
>  	return err;
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH/RFC bpf-next 03/16] bpf: split read liveness into REG_LIVE_READ64 and REG_LIVE_READ32
  2019-03-26 18:05 ` [PATCH/RFC bpf-next 03/16] bpf: split read liveness into REG_LIVE_READ64 and REG_LIVE_READ32 Jiong Wang
  2019-03-26 20:21   ` Jann Horn
@ 2019-03-27 16:38   ` Alexei Starovoitov
  1 sibling, 0 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2019-03-27 16:38 UTC (permalink / raw)
  To: Jiong Wang; +Cc: daniel, bpf, netdev, oss-drivers

On Tue, Mar 26, 2019 at 06:05:26PM +0000, Jiong Wang wrote:
> In previous patch, we have split register arg type for sub-register read,
> but haven't touch read liveness.
> 
> This patch further split read liveness into REG_LIVE_READ64 and
> REG_LIVE_READ32. Liveness propagation code are updated accordingly.
> 
> After this split, customized actions could be defined when propagating full
> register read (REG_LIVE_READ64) or sub-register read (REG_LIVE_READ32).
> 
> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
> ---
>  include/linux/bpf_verifier.h |  9 ++++++---
>  kernel/bpf/verifier.c        | 30 +++++++++++++++++++++---------
>  2 files changed, 27 insertions(+), 12 deletions(-)
> 
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index f03c86a..27761ab 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -37,11 +37,14 @@
>  /* Reg hasn't been read or written this branch. */
>  #define REG_LIVE_NONE		0x0
>  /* Reg was read, so we're sensitive to initial value. */
> -#define REG_LIVE_READ		0x1
> +#define REG_LIVE_READ32		0x1
> +/* Likewise, but full 64-bit content matters. */
> +#define REG_LIVE_READ64		0x2
> +#define REG_LIVE_READ		(REG_LIVE_READ32 | REG_LIVE_READ64)
>  /* Reg was written first, screening off later reads. */
> -#define REG_LIVE_WRITTEN	0x2
> +#define REG_LIVE_WRITTEN	0x4
>  /* Liveness won't be updating this register anymore. */
> -#define REG_LIVE_DONE		0x4
> +#define REG_LIVE_DONE		0x8
>  
>  struct bpf_reg_state {
>  	/* Ordering of fields matters.  See states_equal() */
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 245bb3c..b95c438 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -1126,7 +1126,7 @@ static int check_subprogs(struct bpf_verifier_env *env)
>   */
>  static int mark_reg_read(struct bpf_verifier_env *env,
>  			 const struct bpf_reg_state *state,
> -			 struct bpf_reg_state *parent)
> +			 struct bpf_reg_state *parent, bool dw_read)
>  {
>  	bool writes = parent == state->parent; /* Observe write marks */
>  
> @@ -1141,7 +1141,7 @@ static int mark_reg_read(struct bpf_verifier_env *env,
>  			return -EFAULT;
>  		}
>  		/* ... then we depend on parent's value */
> -		parent->live |= REG_LIVE_READ;
> +		parent->live |= dw_read ? REG_LIVE_READ64 : REG_LIVE_READ32;
>  		state = parent;
>  		parent = state->parent;
>  		writes = true;
> @@ -1170,7 +1170,7 @@ static int check_reg_arg(struct bpf_verifier_env *env, u32 regno,
>  		/* We don't need to worry about FP liveness because it's read-only */
>  		if (regno != BPF_REG_FP)
>  			return mark_reg_read(env, &regs[regno],
> -					     regs[regno].parent);
> +					     regs[regno].parent, true);
>  	} else {
>  		/* check whether register used as dest operand can be written to */
>  		if (regno == BPF_REG_FP) {
> @@ -1357,7 +1357,7 @@ static int check_stack_read(struct bpf_verifier_env *env,
>  			state->regs[value_regno].live |= REG_LIVE_WRITTEN;
>  		}
>  		mark_reg_read(env, &reg_state->stack[spi].spilled_ptr,
> -			      reg_state->stack[spi].spilled_ptr.parent);
> +			      reg_state->stack[spi].spilled_ptr.parent, true);
>  		return 0;
>  	} else {
>  		int zeros = 0;
> @@ -1374,7 +1374,8 @@ static int check_stack_read(struct bpf_verifier_env *env,
>  			return -EACCES;
>  		}
>  		mark_reg_read(env, &reg_state->stack[spi].spilled_ptr,
> -			      reg_state->stack[spi].spilled_ptr.parent);
> +			      reg_state->stack[spi].spilled_ptr.parent,
> +			      size == BPF_REG_SIZE);
>  		if (value_regno >= 0) {
>  			if (zeros == size) {
>  				/* any size read into register is zero extended,
> @@ -2220,7 +2221,8 @@ static int check_stack_boundary(struct bpf_verifier_env *env, int regno,
>  		 * the whole slot to be marked as 'read'
>  		 */
>  		mark_reg_read(env, &state->stack[spi].spilled_ptr,
> -			      state->stack[spi].spilled_ptr.parent);
> +			      state->stack[spi].spilled_ptr.parent,
> +			      access_size == BPF_REG_SIZE);
>  	}
>  	return update_stack_depth(env, state, off);
>  }
> @@ -6059,7 +6061,7 @@ static int propagate_liveness_reg(struct bpf_verifier_env *env,
>  	if (parent_reg->live & flag || !(reg->live & flag))
>  		return 0;
>  
> -	err = mark_reg_read(env, reg, parent_reg);
> +	err = mark_reg_read(env, reg, parent_reg, flag == REG_LIVE_READ64);
>  	if (err)
>  		return err;
>  
> @@ -6092,7 +6094,12 @@ static int propagate_liveness(struct bpf_verifier_env *env,
>  	regs = vstate->frame[vstate->curframe]->regs;
>  	/* We don't need to worry about FP liveness because it's read-only */
>  	for (i = 0; i < BPF_REG_FP; i++) {
> -		err = propagate_liveness_reg(env, &regs[i], &parent_regs[i]);
> +		err = propagate_liveness_reg(env, &regs[i], &parent_regs[i],
> +					     REG_LIVE_READ64);
> +		if (err < 0)
> +			return err;
> +		err = propagate_liveness_reg(env, &regs[i], &parent_regs[i],
> +					     REG_LIVE_READ32);

so the parentage chain will be walked twice.
Please do it in one loop.
It's already slow.
I'm working on patches that speed up this walk, but doing it twice is wasteful regardless.
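
e.g. something along these lines (untested sketch, assumes mark_reg_read()
is changed to OR the passed read flags into parent->live instead of taking
a bool):

		/* propagate only the read bits the parent doesn't have yet,
		 * walking the parentage chain once per register
		 */
		u8 flags = regs[i].live & REG_LIVE_READ & ~parent_regs[i].live;

		if (flags) {
			err = mark_reg_read(env, &regs[i], &parent_regs[i], flags);
			if (err)
				return err;
		}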


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH/RFC bpf-next 02/16] bpf: refactor propagate_live implementation
  2019-03-27 16:35   ` Alexei Starovoitov
@ 2019-03-27 16:44     ` Jiong Wang
  0 siblings, 0 replies; 39+ messages in thread
From: Jiong Wang @ 2019-03-27 16:44 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Daniel Borkmann, bpf, netdev, oss-drivers


> On 27 Mar 2019, at 16:35, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> 
> On Tue, Mar 26, 2019 at 06:05:25PM +0000, Jiong Wang wrote:
>> Some code inside current implementation of "propagate_liveness" is a little
>> bit verbose.
>> 
>> This patch refactor them so the code looks more simple and more clear.
>> 
>> The redundant usage of "vparent->frame[vstate->curframe]" is removed as we
>> are here. It is safe to do this because "state_equal" has guaranteed that
>> vstate->curframe must be equal with vparent->curframe.
>> 
>> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
>> ---
>> kernel/bpf/verifier.c | 44 ++++++++++++++++++++++++++++++--------------
>> 1 file changed, 30 insertions(+), 14 deletions(-)
>> 
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index 6cc8c38..245bb3c 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -6050,6 +6050,22 @@ static bool states_equal(struct bpf_verifier_env *env,
>> 	return true;
>> }
>> 
>> +static int propagate_liveness_reg(struct bpf_verifier_env *env,
>> +				  struct bpf_reg_state *reg,
>> +				  struct bpf_reg_state *parent_reg, u8 flag)
>> +{
>> +	int err;
>> +
>> +	if (parent_reg->live & flag || !(reg->live & flag))
>> +		return 0;
>> +
>> +	err = mark_reg_read(env, reg, parent_reg);
>> +	if (err)
>> +		return err;
>> +
>> +	return 1;
>> +}
> 
> what is the difference between 1 and 0 ? it doesn't seem to be used.

0 means no propagation has been done, 1 means propagation has been done.

The return value is used later in patch 4: if propagation happened, it will
trigger insn marking.

Will add a comment for this.
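
Roughly (will refine the wording):

	/* Return 0 if no liveness was propagated for this register, 1 if the
	 * read mark was propagated to the parent (patch 4 uses this to
	 * trigger zero extension marking on the defining insn), or a negative
	 * error code from mark_reg_read().
	 */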


> 
>> +
>> /* A write screens off any subsequent reads; but write marks come from the
>>  * straight-line code between a state and its parent.  When we arrive at an
>>  * equivalent state (jump target or such) we didn't arrive by the straight-line
>> @@ -6061,8 +6077,9 @@ static int propagate_liveness(struct bpf_verifier_env *env,
>> 			      const struct bpf_verifier_state *vstate,
>> 			      struct bpf_verifier_state *vparent)
>> {
>> -	int i, frame, err = 0;
>> +	struct bpf_reg_state *regs, *parent_regs;
>> 	struct bpf_func_state *state, *parent;
>> +	int i, frame, err = 0;
>> 
>> 	if (vparent->curframe != vstate->curframe) {
>> 		WARN(1, "propagate_live: parent frame %d current frame %d\n",
>> @@ -6071,16 +6088,13 @@ static int propagate_liveness(struct bpf_verifier_env *env,
>> 	}
>> 	/* Propagate read liveness of registers... */
>> 	BUILD_BUG_ON(BPF_REG_FP + 1 != MAX_BPF_REG);
>> +	parent_regs = vparent->frame[vparent->curframe]->regs;
>> +	regs = vstate->frame[vstate->curframe]->regs;
> 
> 
> may be do:
> frame = vstate->curframe;
> if (vparent->curframe != frame) { WARN...
> parent_regs = vparent->frame[frame]->regs;
> regs = vstate->frame[frame]->regs;
> 
> ?

Ack, will factor out "vstate->curframe" into "frame".

And there is already a check and warning on the equality, just a few lines above:

   if (vparent->curframe != vstate->curframe) {                             
     WARN(1, "propagate_live: parent frame %d current frame %d\n",

Regards,
Jiong
 
> 
>> 	/* We don't need to worry about FP liveness because it's read-only */
>> 	for (i = 0; i < BPF_REG_FP; i++) {
>> -		if (vparent->frame[vparent->curframe]->regs[i].live & REG_LIVE_READ)
>> -			continue;
>> -		if (vstate->frame[vstate->curframe]->regs[i].live & REG_LIVE_READ) {
>> -			err = mark_reg_read(env, &vstate->frame[vstate->curframe]->regs[i],
>> -					    &vparent->frame[vstate->curframe]->regs[i]);
>> -			if (err)
>> -				return err;
>> -		}
>> +		err = propagate_liveness_reg(env, &regs[i], &parent_regs[i]);
>> +		if (err < 0)
>> +			return err;
>> 	}
>> 
>> 	/* ... and stack slots */
>> @@ -6089,11 +6103,13 @@ static int propagate_liveness(struct bpf_verifier_env *env,
>> 		parent = vparent->frame[frame];
>> 		for (i = 0; i < state->allocated_stack / BPF_REG_SIZE &&
>> 			    i < parent->allocated_stack / BPF_REG_SIZE; i++) {
>> -			if (parent->stack[i].spilled_ptr.live & REG_LIVE_READ)
>> -				continue;
>> -			if (state->stack[i].spilled_ptr.live & REG_LIVE_READ)
>> -				mark_reg_read(env, &state->stack[i].spilled_ptr,
>> -					      &parent->stack[i].spilled_ptr);
>> +			struct bpf_reg_state *parent_reg, *reg;
>> +
>> +			parent_reg = &parent->stack[i].spilled_ptr;
>> +			reg = &state->stack[i].spilled_ptr;
>> +			err = propagate_liveness_reg(env, reg, parent_reg);
>> +			if (err < 0)
>> +				return err;
>> 		}
>> 	}
>> 	return err;
>> -- 
>> 2.7.4
>> 


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH/RFC bpf-next 04/16] bpf: mark sub-register writes that really need zero extension to high bits
  2019-03-26 18:05 ` [PATCH/RFC bpf-next 04/16] bpf: mark sub-register writes that really need zero extension to high bits Jiong Wang
  2019-03-26 18:44   ` Edward Cree
@ 2019-03-27 16:50   ` Alexei Starovoitov
  2019-03-27 17:06     ` Jiong Wang
  1 sibling, 1 reply; 39+ messages in thread
From: Alexei Starovoitov @ 2019-03-27 16:50 UTC (permalink / raw)
  To: Jiong Wang; +Cc: daniel, bpf, netdev, oss-drivers

On Tue, Mar 26, 2019 at 06:05:27PM +0000, Jiong Wang wrote:
> eBPF ISA specification requires high 32-bit cleared when low 32-bit
> sub-register is written. This applies to destination register of ALU32 etc.
> JIT back-ends must guarantee this semantic when doing code-gen.
> 
> x86-64 and arm64 ISA has the same semantic, so the corresponding JIT
> back-end doesn't need to do extra work. However, 32-bit arches (arm, nfp
> etc.) and some other 64-bit arches (powerpc, sparc etc), need explicit zero
> extension sequence to meet such semantic.
> 
> This is important, because for code the following:
> 
>   u64_value = (u64) u32_value
>   ... other uses of u64_value
> 
> compiler could exploit the semantic described above and save those zero
> extensions for extending u32_value to u64_value. Hardware, runtime, or BPF
> JIT back-ends, are responsible for guaranteeing this. Some benchmarks show
> ~40% sub-register writes out of total insns, meaning ~40% extra code-gen (
> could go up to more for some arches which requires two shifts for zero
> extension) because JIT back-end needs to do extra code-gen for all such
> instructions.
> 
> However this is not always necessary in case u32_value is never cast into
> a u64, which is quite normal in real life program. So, it would be really
> good if we could identify those places where such type cast happened, and
> only do zero extensions for them, not for the others. This could save a lot
> of BPF code-gen.
> 
> Algo:
>  - Record indices of instructions that do sub-register def (write). And
>    these indices need to stay with function state so path pruning and bpf
>    to bpf function call could be handled properly.
> 
>    These indices are kept up to date while doing insn walk.
> 
>  - A full register read on an active sub-register def marks the def insn as
>    needing zero extension on dst register.
> 
>  - A new sub-register write overrides the old one.
> 
>    A new full register write makes the register free of zero extension on
>    dst register.
> 
>  - When propagating register read64 during path pruning, it also marks def
>    insns whose defs are hanging active sub-register, if there is any read64
>    from shown from the equal state.
> 
> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
> ---
>  include/linux/bpf_verifier.h |  4 +++
>  kernel/bpf/verifier.c        | 85 +++++++++++++++++++++++++++++++++++++++++---
>  2 files changed, 84 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index 27761ab..0ae9a3f 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -181,6 +181,9 @@ struct bpf_func_state {
>  	 */
>  	u32 subprogno;
>  
> +	/* tracks subreg definition. */
> +	s32 subreg_def[MAX_BPF_REG];

Same comment as Ed and another question: why it's not part of bpf_reg_state ?


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH/RFC bpf-next 06/16] bpf: new sysctl "bpf_jit_32bit_opt"
  2019-03-26 18:05 ` [PATCH/RFC bpf-next 06/16] bpf: new sysctl "bpf_jit_32bit_opt" Jiong Wang
@ 2019-03-27 17:00   ` Alexei Starovoitov
  2019-03-27 17:06     ` Jiong Wang
  0 siblings, 1 reply; 39+ messages in thread
From: Alexei Starovoitov @ 2019-03-27 17:00 UTC (permalink / raw)
  To: Jiong Wang; +Cc: daniel, bpf, netdev, oss-drivers

On Tue, Mar 26, 2019 at 06:05:29PM +0000, Jiong Wang wrote:
> After previous patches, verifier has marked those instructions that really
> need zero extension on dst_reg.
> 
> It is then for all back-ends to decide how to use such information to
> eliminate unnecessary zero extension codegen during JIT compilation.
> 
> One approach is:
>   1. Verifier insert explicit zero extension for those instructions that
>      need zero extension.
>   2. All JIT back-ends do NOT generate zero extension for sub-register
>      write any more.
> 
> The good thing for this approach is no major change on JIT back-end
> interface, all back-ends could get this optimization.
> 
> However, only those back-ends that do not have hardware zero extension
> want this optimization. For back-ends like x86_64 and AArch64, there is
> hardware support, so this optimization should be disabled.
> 
> This patch introduces new sysctl "bpf_jit_32bit_opt" which is the control
> variable for whether the optimization should be enabled.
> 
> It is initialized using target hook bpf_jit_hardware_zext which is default
> true, meaning the underlying hardware will do zero extension automatically,
> therefore the optimization will be disabled.
> 
> Offload targets do not use this native target hook, instead, they could
> get the optimization results using bpf_prog_offload_ops.finalize.
> 
> The user could always enable or disable the optimization by using:
> 
>    sysctl net/core/bpf_jit_32bit_opt=[0 | 1]

I don't think there should be a sysctl for this.
32-bit archs and ppc/sparc should always do it, since there is no downside
and for x64/arm64 the verifier won't be inserting shifts because
bpf_jit_hardware_zext==true.
We probably need a new insn for this. I'm guessing shifts are not always
optimal on all archs. There could be arch specific insns that do it.
I'd love to hear from arch folks though.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH/RFC bpf-next 06/16] bpf: new sysctl "bpf_jit_32bit_opt"
  2019-03-27 17:00   ` Alexei Starovoitov
@ 2019-03-27 17:06     ` Jiong Wang
  2019-03-27 17:17       ` Alexei Starovoitov
  0 siblings, 1 reply; 39+ messages in thread
From: Jiong Wang @ 2019-03-27 17:06 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Daniel Borkmann, bpf, netdev, oss-drivers


> On 27 Mar 2019, at 17:00, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> 
> On Tue, Mar 26, 2019 at 06:05:29PM +0000, Jiong Wang wrote:
>> After previous patches, verifier has marked those instructions that really
>> need zero extension on dst_reg.
>> 
>> It is then for all back-ends to decide how to use such information to
>> eliminate unnecessary zero extension codegen during JIT compilation.
>> 
>> One approach is:
>>  1. Verifier insert explicit zero extension for those instructions that
>>     need zero extension.
>>  2. All JIT back-ends do NOT generate zero extension for sub-register
>>     write any more.
>> 
>> The good thing for this approach is no major change on JIT back-end
>> interface, all back-ends could get this optimization.
>> 
>> However, only those back-ends that do not have hardware zero extension
>> want this optimization. For back-ends like x86_64 and AArch64, there is
>> hardware support, so this optimization should be disabled.
>> 
>> This patch introduces new sysctl "bpf_jit_32bit_opt" which is the control
>> variable for whether the optimization should be enabled.
>> 
>> It is initialized using target hook bpf_jit_hardware_zext which is default
>> true, meaning the underlying hardware will do zero extension automatically,
>> therefore the optimization will be disabled.
>> 
>> Offload targets do not use this native target hook, instead, they could
>> get the optimization results using bpf_prog_offload_ops.finalize.
>> 
>> The user could always enable or disable the optimization by using:
>> 
>>   sysctl net/core/bpf_jit_32bit_opt=[0 | 1]
> 
> I don't think there should be a sysctl for this.

The sysctl was introduced mostly because I think it could be useful for
testing. For example on x86_64, with this sysctl we can enable the
optimisation and run the selftests.

Does this make sense?

Or, when one insn is marked, we could print verbose info so the tester can
catch it from the log?
 

> 32-bit archs and ppc/sparc should always do it, since there is no downside
> and for x64/arm64 the verifier won't be inserting shifts because
> bpf_jit_hardware_zext==true.
> We probably need a new insn for this.

Agreed, and I mentioned this issue in the ToDo of the cover letter.

> I'm guessing shifts are not always
> optimal on all archs. There could be arch specific insns that do it.
> I'd love to hear from arch folks though.
> 

Regards,
Jiong

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH/RFC bpf-next 04/16] bpf: mark sub-register writes that really need zero extension to high bits
  2019-03-27 16:50   ` Alexei Starovoitov
@ 2019-03-27 17:06     ` Jiong Wang
  0 siblings, 0 replies; 39+ messages in thread
From: Jiong Wang @ 2019-03-27 17:06 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Daniel Borkmann, bpf, netdev, oss-drivers


> On 27 Mar 2019, at 16:50, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> 
> On Tue, Mar 26, 2019 at 06:05:27PM +0000, Jiong Wang wrote:
>> eBPF ISA specification requires high 32-bit cleared when low 32-bit
>> sub-register is written. This applies to destination register of ALU32 etc.
>> JIT back-ends must guarantee this semantic when doing code-gen.
>> 
>> x86-64 and arm64 ISA has the same semantic, so the corresponding JIT
>> back-end doesn't need to do extra work. However, 32-bit arches (arm, nfp
>> etc.) and some other 64-bit arches (powerpc, sparc etc), need explicit zero
>> extension sequence to meet such semantic.
>> 
>> This is important, because for code the following:
>> 
>>  u64_value = (u64) u32_value
>>  ... other uses of u64_value
>> 
>> compiler could exploit the semantic described above and save those zero
>> extensions for extending u32_value to u64_value. Hardware, runtime, or BPF
>> JIT back-ends, are responsible for guaranteeing this. Some benchmarks show
>> ~40% sub-register writes out of total insns, meaning ~40% extra code-gen (
>> could go up to more for some arches which requires two shifts for zero
>> extension) because JIT back-end needs to do extra code-gen for all such
>> instructions.
>> 
>> However this is not always necessary in case u32_value is never cast into
>> a u64, which is quite normal in real life program. So, it would be really
>> good if we could identify those places where such type cast happened, and
>> only do zero extensions for them, not for the others. This could save a lot
>> of BPF code-gen.
>> 
>> Algo:
>> - Record indices of instructions that do sub-register def (write). And
>>   these indices need to stay with function state so path pruning and bpf
>>   to bpf function call could be handled properly.
>> 
>>   These indices are kept up to date while doing insn walk.
>> 
>> - A full register read on an active sub-register def marks the def insn as
>>   needing zero extension on dst register.
>> 
>> - A new sub-register write overrides the old one.
>> 
>>   A new full register write makes the register free of zero extension on
>>   dst register.
>> 
>> - When propagating register read64 during path pruning, it also marks def
>>   insns whose defs are hanging active sub-register, if there is any read64
>>   from shown from the equal state.
>> 
>> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
>> ---
>> include/linux/bpf_verifier.h |  4 +++
>> kernel/bpf/verifier.c        | 85 +++++++++++++++++++++++++++++++++++++++++---
>> 2 files changed, 84 insertions(+), 5 deletions(-)
>> 
>> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
>> index 27761ab..0ae9a3f 100644
>> --- a/include/linux/bpf_verifier.h
>> +++ b/include/linux/bpf_verifier.h
>> @@ -181,6 +181,9 @@ struct bpf_func_state {
>> 	 */
>> 	u32 subprogno;
>> 
>> +	/* tracks subreg definition. */
>> +	s32 subreg_def[MAX_BPF_REG];
> 
> Same comment as Ed and another question: why it's not part of bpf_reg_state ?

Thanks for spotting this. Indeed, it looks like a perfect fit to be merged into bpf_reg_state.

Regards,
Jiong

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH/RFC bpf-next 06/16] bpf: new sysctl "bpf_jit_32bit_opt"
  2019-03-27 17:06     ` Jiong Wang
@ 2019-03-27 17:17       ` Alexei Starovoitov
  2019-03-27 17:18         ` Jiong Wang
  0 siblings, 1 reply; 39+ messages in thread
From: Alexei Starovoitov @ 2019-03-27 17:17 UTC (permalink / raw)
  To: Jiong Wang; +Cc: Daniel Borkmann, bpf, netdev, oss-drivers

On Wed, Mar 27, 2019 at 05:06:01PM +0000, Jiong Wang wrote:
> 
> > On 27 Mar 2019, at 17:00, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> > 
> > On Tue, Mar 26, 2019 at 06:05:29PM +0000, Jiong Wang wrote:
> >> After previous patches, verifier has marked those instructions that really
> >> need zero extension on dst_reg.
> >> 
> >> It is then for all back-ends to decide how to use such information to
> >> eliminate unnecessary zero extension codegen during JIT compilation.
> >> 
> >> One approach is:
> >>  1. Verifier insert explicit zero extension for those instructions that
> >>     need zero extension.
> >>  2. All JIT back-ends do NOT generate zero extension for sub-register
> >>     write any more.
> >> 
> >> The good thing for this approach is no major change on JIT back-end
> >> interface, all back-ends could get this optimization.
> >> 
> >> However, only those back-ends that do not have hardware zero extension
> >> want this optimization. For back-ends like x86_64 and AArch64, there is
> >> hardware support, so this optimization should be disabled.
> >> 
> >> This patch introduces new sysctl "bpf_jit_32bit_opt" which is the control
> >> variable for whether the optimization should be enabled.
> >> 
> >> It is initialized using target hook bpf_jit_hardware_zext which is default
> >> true, meaning the underlying hardware will do zero extension automatically,
> >> therefore the optimization will be disabled.
> >> 
> >> Offload targets do not use this native target hook, instead, they could
> >> get the optimization results using bpf_prog_offload_ops.finalize.
> >> 
> >> The user could always enable or disable the optimization by using:
> >> 
> >>   sysctl net/core/bpf_jit_32bit_opt=[0 | 1]
> > 
> > I don't think there should be a sysctl for this.
> 
> The sysctl introduced mostly because I think it could be useful for testing.
> For example on x86_64, with this sysctl, we can enable the optimisation and
> can run selftest.
> 
> Does this make sense?
> 
> Or when one insn is marked, we print verbose info, so the tester could catch
> it from log?

sysctl in this patch only triggers insertion of shifts.
what kind of testing does it enable on x64?
The writing insn is already 32-bit and hw does zero extend.
These two shifts are always a nop?
a sysctl to test that the verifier inserted shifts in the right place?


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH/RFC bpf-next 06/16] bpf: new sysctl "bpf_jit_32bit_opt"
  2019-03-27 17:17       ` Alexei Starovoitov
@ 2019-03-27 17:18         ` Jiong Wang
  2019-03-27 17:45           ` Alexei Starovoitov
  0 siblings, 1 reply; 39+ messages in thread
From: Jiong Wang @ 2019-03-27 17:18 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Daniel Borkmann, bpf, netdev, oss-drivers


> On 27 Mar 2019, at 17:17, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> 
> On Wed, Mar 27, 2019 at 05:06:01PM +0000, Jiong Wang wrote:
>> 
>>> On 27 Mar 2019, at 17:00, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>>> 
>>> On Tue, Mar 26, 2019 at 06:05:29PM +0000, Jiong Wang wrote:
>>>> After previous patches, verifier has marked those instructions that really
>>>> need zero extension on dst_reg.
>>>> 
>>>> It is then for all back-ends to decide how to use such information to
>>>> eliminate unnecessary zero extension codegen during JIT compilation.
>>>> 
>>>> One approach is:
>>>> 1. Verifier insert explicit zero extension for those instructions that
>>>>    need zero extension.
>>>> 2. All JIT back-ends do NOT generate zero extension for sub-register
>>>>    write any more.
>>>> 
>>>> The good thing for this approach is no major change on JIT back-end
>>>> interface, all back-ends could get this optimization.
>>>> 
>>>> However, only those back-ends that do not have hardware zero extension
>>>> want this optimization. For back-ends like x86_64 and AArch64, there is
>>>> hardware support, so this optimization should be disabled.
>>>> 
>>>> This patch introduces new sysctl "bpf_jit_32bit_opt" which is the control
>>>> variable for whether the optimization should be enabled.
>>>> 
>>>> It is initialized using target hook bpf_jit_hardware_zext which is default
>>>> true, meaning the underlying hardware will do zero extension automatically,
>>>> therefore the optimization will be disabled.
>>>> 
>>>> Offload targets do not use this native target hook, instead, they could
>>>> get the optimization results using bpf_prog_offload_ops.finalize.
>>>> 
>>>> The user could always enable or disable the optimization by using:
>>>> 
>>>>  sysctl net/core/bpf_jit_32bit_opt=[0 | 1]
>>> 
>>> I don't think there should be a sysctl for this.
>> 
>> The sysctl introduced mostly because I think it could be useful for testing.
>> For example on x86_64, with this sysctl, we can enable the optimisation and
>> can run selftest.
>> 
>> Does this make sense?
>> 
>> Or when one insn is marked, we print verbose info, so the tester could catch
>> it from log?
> 
> sysctl in this patch only triggers insertion of shifts.
> what kind of testing does it enable on x64?
> The writing insn is already 32-bit and hw does zero extend.
> These two shifts is always a nop?
> a sysctl to test that the verifier inserted shifts in the right place?

Yes, that's the test methodology I am using: match the instruction sequence
after shift insertion.

Regards,
Jiong


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH/RFC bpf-next 06/16] bpf: new sysctl "bpf_jit_32bit_opt"
  2019-03-27 17:18         ` Jiong Wang
@ 2019-03-27 17:45           ` Alexei Starovoitov
  2019-03-27 19:13             ` Jiong Wang
  0 siblings, 1 reply; 39+ messages in thread
From: Alexei Starovoitov @ 2019-03-27 17:45 UTC (permalink / raw)
  To: Jiong Wang; +Cc: Daniel Borkmann, bpf, netdev, oss-drivers

On Wed, Mar 27, 2019 at 05:18:35PM +0000, Jiong Wang wrote:
> 
> > On 27 Mar 2019, at 17:17, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> > 
> > On Wed, Mar 27, 2019 at 05:06:01PM +0000, Jiong Wang wrote:
> >> 
> >>> On 27 Mar 2019, at 17:00, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> >>> 
> >>> On Tue, Mar 26, 2019 at 06:05:29PM +0000, Jiong Wang wrote:
> >>>> After previous patches, verifier has marked those instructions that really
> >>>> need zero extension on dst_reg.
> >>>> 
> >>>> It is then for all back-ends to decide how to use such information to
> >>>> eliminate unnecessary zero extension codegen during JIT compilation.
> >>>> 
> >>>> One approach is:
> >>>> 1. Verifier insert explicit zero extension for those instructions that
> >>>>    need zero extension.
> >>>> 2. All JIT back-ends do NOT generate zero extension for sub-register
> >>>>    write any more.
> >>>> 
> >>>> The good thing for this approach is no major change on JIT back-end
> >>>> interface, all back-ends could get this optimization.
> >>>> 
> >>>> However, only those back-ends that do not have hardware zero extension
> >>>> want this optimization. For back-ends like x86_64 and AArch64, there is
> >>>> hardware support, so this optimization should be disabled.
> >>>> 
> >>>> This patch introduces new sysctl "bpf_jit_32bit_opt" which is the control
> >>>> variable for whether the optimization should be enabled.
> >>>> 
> >>>> It is initialized using target hook bpf_jit_hardware_zext which is default
> >>>> true, meaning the underlying hardware will do zero extension automatically,
> >>>> therefore the optimization will be disabled.
> >>>> 
> >>>> Offload targets do not use this native target hook, instead, they could
> >>>> get the optimization results using bpf_prog_offload_ops.finalize.
> >>>> 
> >>>> The user could always enable or disable the optimization by using:
> >>>> 
> >>>>  sysctl net/core/bpf_jit_32bit_opt=[0 | 1]
> >>> 
> >>> I don't think there should be a sysctl for this.
> >> 
> >> The sysctl introduced mostly because I think it could be useful for testing.
> >> For example on x86_64, with this sysctl, we can enable the optimisation and
> >> can run selftest.
> >> 
> >> Does this make sense?
> >> 
> >> Or when one insn is marked, we print verbose info, so the tester could catch
> >> it from log?
> > 
> > sysctl in this patch only triggers insertion of shifts.
> > what kind of testing does it enable on x64?
> > The writing insn is already 32-bit and hw does zero extend.
> > These two shifts is always a nop?
> > a sysctl to test that the verifier inserted shifts in the right place?
> 
> Yes, that’s the test methodology I am using. Match the instruction sequence after
> shifts insertion.

I see. I don't think such extra shifts right after hw zero extend will catch much.
imo it would be better to populate the upper 32 bits with random values on x64
where verifier analysis showed that it's ok to do so.
Such extra insns can be inserted by the verifier. Since such debugging
has a run-time cost, we'd need a flag to turn it on.
Maybe a new flag during prog load instead of sysctl?
It can be a global switch inside libbpf, so test_verifier and test_progs
wouldn't need to pass it everywhere explicitly. It would double the test time,
but it's worth always doing it on all archs, especially on x64.

other thoughts...
I guess it's ok to stick with shifts for now.
Introducing a new insn would be nice, but we can do it later.
Changing all jits for this new insn as a pre-patch to this set is too much.

A peephole to convert shifts is probably useful regardless.
The bpf backend emits a bunch of useless shifts when alu32 is not used.
It would be great if the x86 jit could optimize this for such lazy users
(and users who don't upgrade llvm fast enough or don't know about alu32).


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH/RFC bpf-next 06/16] bpf: new sysctl "bpf_jit_32bit_opt"
  2019-03-27 17:45           ` Alexei Starovoitov
@ 2019-03-27 19:13             ` Jiong Wang
  0 siblings, 0 replies; 39+ messages in thread
From: Jiong Wang @ 2019-03-27 19:13 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Jiong Wang, Daniel Borkmann, bpf, netdev, oss-drivers


Alexei Starovoitov writes:

> On Wed, Mar 27, 2019 at 05:18:35PM +0000, Jiong Wang wrote:
>> 
>> > On 27 Mar 2019, at 17:17, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>> > 
>> > On Wed, Mar 27, 2019 at 05:06:01PM +0000, Jiong Wang wrote:
>> >> 
>> >>> On 27 Mar 2019, at 17:00, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>> >>> 
>> >>> On Tue, Mar 26, 2019 at 06:05:29PM +0000, Jiong Wang wrote:
>> >>>> After previous patches, verifier has marked those instructions that really
>> >>>> need zero extension on dst_reg.
>> >>>> 
>> >>>> It is then for all back-ends to decide how to use such information to
>> >>>> eliminate unnecessary zero extension codegen during JIT compilation.
>> >>>> 
>> >>>> One approach is:
>> >>>> 1. Verifier insert explicit zero extension for those instructions that
>> >>>>    need zero extension.
>> >>>> 2. All JIT back-ends do NOT generate zero extension for sub-register
>> >>>>    write any more.
>> >>>> 
>> >>>> The good thing for this approach is no major change on JIT back-end
>> >>>> interface, all back-ends could get this optimization.
>> >>>> 
>> >>>> However, only those back-ends that do not have hardware zero extension
>> >>>> want this optimization. For back-ends like x86_64 and AArch64, there is
>> >>>> hardware support, so this optimization should be disabled.
>> >>>> 
>> >>>> This patch introduces new sysctl "bpf_jit_32bit_opt" which is the control
>> >>>> variable for whether the optimization should be enabled.
>> >>>> 
>> >>>> It is initialized using target hook bpf_jit_hardware_zext which is default
>> >>>> true, meaning the underlying hardware will do zero extension automatically,
>> >>>> therefore the optimization will be disabled.
>> >>>> 
>> >>>> Offload targets do not use this native target hook, instead, they could
>> >>>> get the optimization results using bpf_prog_offload_ops.finalize.
>> >>>> 
>> >>>> The user could always enable or disable the optimization by using:
>> >>>> 
>> >>>>  sysctl net/core/bpf_jit_32bit_opt=[0 | 1]
>> >>> 
>> >>> I don't think there should be a sysctl for this.
>> >> 
>> >> The sysctl introduced mostly because I think it could be useful for testing.
>> >> For example on x86_64, with this sysctl, we can enable the optimisation and
>> >> can run selftest.
>> >> 
>> >> Does this make sense?
>> >> 
>> >> Or when one insn is marked, we print verbose info, so the tester could catch
>> >> it from log?
>> > 
>> > sysctl in this patch only triggers insertion of shifts.
>> > what kind of testing does it enable on x64?
>> > The writing insn is already 32-bit and hw does zero extend.
>> > These two shifts is always a nop?
>> > a sysctl to test that the verifier inserted shifts in the right place?
>> 
>> Yes, that’s the test methodology I am using. Match the instruction sequence after
>> shifts insertion.
>
> I see. I don't think such extra shifts right after hw zero extend will catch much.
> imo it would be better to populate upper 32-bit with random values on x64
> where verifier analysis showed that it's ok to do so.

Sounds like a good idea; it indeed gives a much more stressful test on x64,
and if all tests pass under test_progs + -mattr=+alu32, that would be very
good assurance of correctness.
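
For illustration, the kind of poison sequence the verifier could patch in
after a sub-register write it has proven free of zero extension might be
something like the following (BPF_REG_AX as the verifier's scratch register,
imm_rnd a fresh random 32-bit value per patched insn; only a sketch of the
idea, not a final encoding):

	/* the original 32-bit write to dst_reg stays as-is, then: */
	BPF_ALU64_IMM(BPF_MOV, BPF_REG_AX, imm_rnd),	/* rAX = imm_rnd */
	BPF_ALU64_IMM(BPF_LSH, BPF_REG_AX, 32),		/* rAX <<= 32 */
	BPF_ALU64_REG(BPF_OR, insn->dst_reg, BPF_REG_AX),	/* dst |= rAX */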

> Such extra insns can be inserted by the verifier. Since such debugging
> has run-time cost we'd need a flag to turn it on.
> May be a new flag during prog load instead of sysctl?

OK, I will explore along this line and see if I can come up with a clean solution.

> It can be a global switch inside libbpf, so test_verifier and test_progs
> wouldn't need to pass it everywhere explictly. It would double the test time,
> but it's worth doing always on all archs. Especially on x64.
>
> other thoughts...
> I guess it's ok to stick with shifts for now.
> Introducing new insn would be nice, but we can do it later.
> Changing all jits for this new insn as pre-patch to this set is too much.

+1

> peephole to convert shifts is probably useful regardless.
> bpf backend emits a bunch of useless shifts when alu32 is not used.
> Would be great if x86 jit can optimize it for such lazy users
> (and users who don't upgrade llvm fast enough or don't know about alu32)

Will do some checks on generic eBPF code-gen later to see how many peephole
opportunities there are.
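
For instance, without alu32 the backend tends to emit a shift pair for u32
truncation, which is redundant on x86-64 when the preceding insn already does
a 32-bit write (and hence zero extends), so an x86 jit peephole could simply
elide it. A sketch of the pattern:

  r1 = *(u32 *)(r6 + 0)	/* 32-bit load already clears the upper 32 bits on x86-64 */
  r1 <<= 32		/* this shift pair only re-clears them, so the jit */
  r1 >>= 32		/* could recognize it and skip emitting both insns */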

Regards,
Jiong

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH/RFC bpf-next 04/16] bpf: mark sub-register writes that really need zero extension to high bits
  2019-03-26 18:44   ` Edward Cree
  2019-03-26 19:47     ` Jiong Wang
@ 2019-04-05 20:44     ` Jiong Wang
  2019-04-06  3:41       ` Alexei Starovoitov
  1 sibling, 1 reply; 39+ messages in thread
From: Jiong Wang @ 2019-04-05 20:44 UTC (permalink / raw)
  To: Edward Cree, Alexei Starovoitov; +Cc: Daniel Borkmann, bpf, netdev, oss-drivers


> On 26 Mar 2019, at 18:44, Edward Cree <ecree@solarflare.com> wrote:
> 
> On 26/03/2019 18:05, Jiong Wang wrote:
>> eBPF ISA specification requires high 32-bit cleared when low 32-bit
>> sub-register is written. This applies to destination register of ALU32 etc.
>> JIT back-ends must guarantee this semantic when doing code-gen.
>> 
>> x86-64 and arm64 ISA has the same semantic, so the corresponding JIT
>> back-end doesn't need to do extra work. However, 32-bit arches (arm, nfp
>> etc.) and some other 64-bit arches (powerpc, sparc etc), need explicit zero
>> extension sequence to meet such semantic.
>> 
>> This is important, because for code the following:
>> 
>>  u64_value = (u64) u32_value
>>  ... other uses of u64_value
>> 
>> compiler could exploit the semantic described above and save those zero
>> extensions for extending u32_value to u64_value. Hardware, runtime, or BPF
>> JIT back-ends, are responsible for guaranteeing this. Some benchmarks show
>> ~40% sub-register writes out of total insns, meaning ~40% extra code-gen (
>> could go up to more for some arches which requires two shifts for zero
>> extension) because JIT back-end needs to do extra code-gen for all such
>> instructions.
>> 
>> However this is not always necessary in case u32_value is never cast into
>> a u64, which is quite normal in real life program. So, it would be really
>> good if we could identify those places where such type cast happened, and
>> only do zero extensions for them, not for the others. This could save a lot
>> of BPF code-gen.
>> 
>> Algo:
>> - Record indices of instructions that do sub-register def (write). And
>>   these indices need to stay with function state so path pruning and bpf
>>   to bpf function call could be handled properly.
>> 
>>   These indices are kept up to date while doing insn walk.
>> 
>> - A full register read on an active sub-register def marks the def insn as
>>   needing zero extension on dst register.
>> 
>> - A new sub-register write overrides the old one.
>> 
>>   A new full register write makes the register free of zero extension on
>>   dst register.
>> 
>> - When propagating register read64 during path pruning, it also marks def
>>   insns whose defs are hanging active sub-register, if there is any read64
>>   from shown from the equal state.
>> 
>> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
>> ---
>> include/linux/bpf_verifier.h |  4 +++
>> kernel/bpf/verifier.c        | 85 +++++++++++++++++++++++++++++++++++++++++---
>> 2 files changed, 84 insertions(+), 5 deletions(-)
>> 
>> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
>> index 27761ab..0ae9a3f 100644
>> --- a/include/linux/bpf_verifier.h
>> +++ b/include/linux/bpf_verifier.h
>> @@ -181,6 +181,9 @@ struct bpf_func_state {
>> 	 */
>> 	u32 subprogno;
>> 
>> +	/* tracks subreg definition. */
> Ideally this comment should mention that the stored value is the insn_idx
>  of the writing insn.  Perhaps also that this is safe because patching
>  (bpf_patch_insn_data()) only happens after main verification completes.

During full x86_64 host tests, I found one new issue.

“convert_ctx_accesses” changes the load size: a BPF_W load could be transformed
into BPF_DW or kept as BPF_W depending on the underlying ctx field size. And
“convert_ctx_accesses” happens after zero extension insertion.

So, a BPF_W load could have been marked and zero extensions inserted after
it; however, the later “convert_ctx_accesses” may find that the transformed
load size is actually BPF_DW and rewrite the load to that. The previously
inserted zero extensions then break things: the high 32 bits are wrongly
cleared. For example:

1: r2 = *(u32 *)(r1 + 80)
2: r1 = *(u32 *)(r1 + 76)
3: r3 = r1
4: r3 += 14
5: if r3 > r2 goto +35

Insns 1 and 2 could be turned into BPF_DW loads if they are loading xdp "data"
and "data_end". No zero extension should be inserted after them, as it would
destroy the pointer. However, they are treated as 32-bit loads initially, and
later, due to the 64-bit uses at insns 3 and 5, they are marked as needing
zero extension.

I am thinking that normally the field sizes in *_md inside uapi/linux/bpf.h
are the same as those in the real underlying context; only when one field is
of pointer type could there be a u32 to u64 conversion. So, I guess we just
need to mark the dst register as a full 64-bit register write inside
check_mem_access when, for PTR_TO_CTX, the reg type of the dst reg returned
by check_ctx_access is a pointer type.
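
Roughly along these lines inside check_mem_access() for the PTR_TO_CTX case
(mark_full_reg_write() is a made-up name here for whatever helper ends up
flagging the dst as a 64-bit def; check_ctx_access() already reports the
converted reg type back through its reg_type argument):

	enum bpf_reg_type reg_type = SCALAR_VALUE;

	err = check_ctx_access(env, insn_idx, off, size, t, &reg_type);
	if (!err && t == BPF_READ && value_regno >= 0 &&
	    reg_type != SCALAR_VALUE)
		/* the ctx field is really a pointer (e.g. xdp data/data_end),
		 * so treat the load as a full 64-bit def and never insert
		 * zero extension after it.
		 */
		mark_full_reg_write(env, value_regno);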

Please let me know if my thinking is wrong.

Thanks.

Regards,
Jiong

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH/RFC bpf-next 04/16] bpf: mark sub-register writes that really need zero extension to high bits
  2019-04-05 20:44     ` Jiong Wang
@ 2019-04-06  3:41       ` Alexei Starovoitov
  2019-04-06  6:56         ` Jiong Wang
  0 siblings, 1 reply; 39+ messages in thread
From: Alexei Starovoitov @ 2019-04-06  3:41 UTC (permalink / raw)
  To: Jiong Wang
  Cc: Edward Cree, Alexei Starovoitov, Daniel Borkmann, bpf, netdev,
	oss-drivers

On Fri, Apr 05, 2019 at 09:44:49PM +0100, Jiong Wang wrote:
> 
> > On 26 Mar 2019, at 18:44, Edward Cree <ecree@solarflare.com> wrote:
> > 
> > On 26/03/2019 18:05, Jiong Wang wrote:
> >> eBPF ISA specification requires high 32-bit cleared when low 32-bit
> >> sub-register is written. This applies to destination register of ALU32 etc.
> >> JIT back-ends must guarantee this semantic when doing code-gen.
> >> 
> >> x86-64 and arm64 ISA has the same semantic, so the corresponding JIT
> >> back-end doesn't need to do extra work. However, 32-bit arches (arm, nfp
> >> etc.) and some other 64-bit arches (powerpc, sparc etc), need explicit zero
> >> extension sequence to meet such semantic.
> >> 
> >> This is important, because for code the following:
> >> 
> >>  u64_value = (u64) u32_value
> >>  ... other uses of u64_value
> >> 
> >> compiler could exploit the semantic described above and save those zero
> >> extensions for extending u32_value to u64_value. Hardware, runtime, or BPF
> >> JIT back-ends, are responsible for guaranteeing this. Some benchmarks show
> >> ~40% sub-register writes out of total insns, meaning ~40% extra code-gen (
> >> could go up to more for some arches which requires two shifts for zero
> >> extension) because JIT back-end needs to do extra code-gen for all such
> >> instructions.
> >> 
> >> However this is not always necessary in case u32_value is never cast into
> >> a u64, which is quite normal in real life program. So, it would be really
> >> good if we could identify those places where such type cast happened, and
> >> only do zero extensions for them, not for the others. This could save a lot
> >> of BPF code-gen.
> >> 
> >> Algo:
> >> - Record indices of instructions that do sub-register def (write). And
> >>   these indices need to stay with function state so path pruning and bpf
> >>   to bpf function call could be handled properly.
> >> 
> >>   These indices are kept up to date while doing insn walk.
> >> 
> >> - A full register read on an active sub-register def marks the def insn as
> >>   needing zero extension on dst register.
> >> 
> >> - A new sub-register write overrides the old one.
> >> 
> >>   A new full register write makes the register free of zero extension on
> >>   dst register.
> >> 
> >> - When propagating register read64 during path pruning, it also marks def
> >>   insns whose defs are hanging active sub-register, if there is any read64
> >>   from shown from the equal state.
> >> 
> >> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> >> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
> >> ---
> >> include/linux/bpf_verifier.h |  4 +++
> >> kernel/bpf/verifier.c        | 85 +++++++++++++++++++++++++++++++++++++++++---
> >> 2 files changed, 84 insertions(+), 5 deletions(-)
> >> 
> >> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> >> index 27761ab..0ae9a3f 100644
> >> --- a/include/linux/bpf_verifier.h
> >> +++ b/include/linux/bpf_verifier.h
> >> @@ -181,6 +181,9 @@ struct bpf_func_state {
> >> 	 */
> >> 	u32 subprogno;
> >> 
> >> +	/* tracks subreg definition. */
> > Ideally this comment should mention that the stored value is the insn_idx
> >  of the writing insn.  Perhaps also that this is safe because patching
> >  (bpf_patch_insn_data()) only happens after main verification completes.
> 
> During full x86_64 host tests, found one new issue.                                    
>                                                                                          
> “convert_ctx_accesses” will change load size, A BPF_W load could be transformed          
> into BPF_DW or kept as BPF_W depending on the underlying ctx field size. And             
> “convert_ctx_accesses” happens after zero extension insertion.                           
>                                                                                          
> So, a BPF_W load could have been marked and zero extensions inserted after               
> it, however, the later happened “convert_ctx_accesses” then figured out it’s             
> transformed load size is actually BPF_DW then re-write to that. But the                  
> previously inserted zero extensions then break things, the high 32 bits are              
> wrongly cleared. For example:
> 
> 1: r2 = *(u32 *)(r1 + 80)                                                                
> 2: r1 = *(u32 *)(r1 + 76)                                                                
> 3: r3 = r1                                                                               
> 4: r3 += 14                                                                              
> 5: if r3 > r2 goto +35                                                                   
>                                                                                          
> insn 1 and 2 could be turned into BPF_DW load if they are loading xdp “data"
> and “data_end". There shouldn’t be zero-extension inserted after them will
> will destroy the pointer. However they are treated as 32-bit load initially,
> and later due to 64-bit use at insn 3 and 5, they are marked as needing zero
> extension.                                                                        
>                                                                                          
> I am thinking normally the field sizes in *_md inside uapi/linux/bpf.h are
> the same those in real underlying context, only when one field is pointer
> type, then it could be possible be a u32 to u64 conversion. So, I guess
> we just need to mark the dst register as a full 64-bit register write 
> inside check_mem_access when for PTR_TO_CTX, the reg type of the dust reg
> returned by check_ctx_access is ptr type.

Since the register containing ctx->data was used later in the load insn and
its type was pointer, the analysis should have marked it as a 64-bit access.

It feels like there is an issue in propagating 64-bit access through the
parentage chain. Since insn 5 above recognized r2 as a 64-bit access,
how come insn 1 was still allowed to poison the upper bits?


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH/RFC bpf-next 04/16] bpf: mark sub-register writes that really need zero extension to high bits
  2019-04-06  3:41       ` Alexei Starovoitov
@ 2019-04-06  6:56         ` Jiong Wang
  2019-04-07  2:51           ` Alexei Starovoitov
  0 siblings, 1 reply; 39+ messages in thread
From: Jiong Wang @ 2019-04-06  6:56 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Jiong Wang, Edward Cree, Alexei Starovoitov, Daniel Borkmann,
	bpf, netdev, oss-drivers


Alexei Starovoitov writes:

> On Fri, Apr 05, 2019 at 09:44:49PM +0100, Jiong Wang wrote:
>> 
>> > On 26 Mar 2019, at 18:44, Edward Cree <ecree@solarflare.com> wrote:
>> > 
>> > On 26/03/2019 18:05, Jiong Wang wrote:
>> >> eBPF ISA specification requires high 32-bit cleared when low 32-bit
>> >> sub-register is written. This applies to destination register of ALU32 etc.
>> >> JIT back-ends must guarantee this semantic when doing code-gen.
>> >> 
>> >> x86-64 and arm64 ISA has the same semantic, so the corresponding JIT
>> >> back-end doesn't need to do extra work. However, 32-bit arches (arm, nfp
>> >> etc.) and some other 64-bit arches (powerpc, sparc etc), need explicit zero
>> >> extension sequence to meet such semantic.
>> >> 
>> >> This is important, because for code the following:
>> >> 
>> >>  u64_value = (u64) u32_value
>> >>  ... other uses of u64_value
>> >> 
>> >> compiler could exploit the semantic described above and save those zero
>> >> extensions for extending u32_value to u64_value. Hardware, runtime, or BPF
>> >> JIT back-ends, are responsible for guaranteeing this. Some benchmarks show
>> >> ~40% sub-register writes out of total insns, meaning ~40% extra code-gen (
>> >> could go up to more for some arches which requires two shifts for zero
>> >> extension) because JIT back-end needs to do extra code-gen for all such
>> >> instructions.
>> >> 
>> >> However this is not always necessary in case u32_value is never cast into
>> >> a u64, which is quite normal in real life program. So, it would be really
>> >> good if we could identify those places where such type cast happened, and
>> >> only do zero extensions for them, not for the others. This could save a lot
>> >> of BPF code-gen.
>> >> 
>> >> Algo:
>> >> - Record indices of instructions that do sub-register def (write). And
>> >>   these indices need to stay with function state so path pruning and bpf
>> >>   to bpf function call could be handled properly.
>> >> 
>> >>   These indices are kept up to date while doing insn walk.
>> >> 
>> >> - A full register read on an active sub-register def marks the def insn as
>> >>   needing zero extension on dst register.
>> >> 
>> >> - A new sub-register write overrides the old one.
>> >> 
>> >>   A new full register write makes the register free of zero extension on
>> >>   dst register.
>> >> 
>> >> - When propagating register read64 during path pruning, it also marks def
>> >>   insns whose defs are hanging active sub-register, if there is any read64
>> >>   from shown from the equal state.
>> >> 
>> >> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>> >> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
>> >> ---
>> >> include/linux/bpf_verifier.h |  4 +++
>> >> kernel/bpf/verifier.c        | 85 +++++++++++++++++++++++++++++++++++++++++---
>> >> 2 files changed, 84 insertions(+), 5 deletions(-)
>> >> 
>> >> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
>> >> index 27761ab..0ae9a3f 100644
>> >> --- a/include/linux/bpf_verifier.h
>> >> +++ b/include/linux/bpf_verifier.h
>> >> @@ -181,6 +181,9 @@ struct bpf_func_state {
>> >> 	 */
>> >> 	u32 subprogno;
>> >> 
>> >> +	/* tracks subreg definition. */
>> > Ideally this comment should mention that the stored value is the insn_idx
>> >  of the writing insn.  Perhaps also that this is safe because patching
>> >  (bpf_patch_insn_data()) only happens after main verification completes.
>> 
>> During full x86_64 host tests, found one new issue.                                    
>>                                                                                          
>> “convert_ctx_accesses” will change load size, A BPF_W load could be transformed          
>> into BPF_DW or kept as BPF_W depending on the underlying ctx field size. And             
>> “convert_ctx_accesses” happens after zero extension insertion.                           
>>                                                                                          
>> So, a BPF_W load could have been marked and zero extensions inserted after               
>> it, however, the later happened “convert_ctx_accesses” then figured out it’s             
>> transformed load size is actually BPF_DW then re-write to that. But the                  
>> previously inserted zero extensions then break things, the high 32 bits are              
>> wrongly cleared. For example:
>> 
>> 1: r2 = *(u32 *)(r1 + 80)                                                                
>> 2: r1 = *(u32 *)(r1 + 76)                                                                
>> 3: r3 = r1                                                                               
>> 4: r3 += 14                                                                              
>> 5: if r3 > r2 goto +35                                                                   
>>                                                                                          
>> insn 1 and 2 could be turned into BPF_DW load if they are loading xdp “data"
>> and “data_end". There shouldn’t be zero-extension inserted after them will
>> will destroy the pointer. However they are treated as 32-bit load initially,
>> and later due to 64-bit use at insn 3 and 5, they are marked as needing zero
>> extension.                                                                        
>>                                                                                          
>> I am thinking normally the field sizes in *_md inside uapi/linux/bpf.h are
>> the same those in real underlying context, only when one field is pointer
>> type, then it could be possible be a u32 to u64 conversion. So, I guess
>> we just need to mark the dst register as a full 64-bit register write 
>> inside check_mem_access when for PTR_TO_CTX, the reg type of the dust reg
>> returned by check_ctx_access is ptr type.
>
> Since the register containing ctx->data was used later in the load insn and
> it's type was pointer the analysis should have marked it as 64-bit access.
>
> It feels that there is an issue in propagating 64-bit access through
> parentage chain. Since insn 5 above recognized r2 as 64-bit access
> then how come insn 1 was still allowed to poison upper bits?

I guess my description was misleading. The high bits of insn 1 were not
poisoned, they were truncated; the analysis pass is correct here.

It is a BPF_W (4-byte) load, so initially it is marked as a sub-register
def and the JIT compiler doesn't need to guarantee the high 32 bits are
cleared. However, at insn 5 the analysis later found a 64-bit use of it (as
a 64-bit operand in the comparison), so it became mandatory to guarantee the
high 32 bits are cleared, and the sequence was transformed into:

1: r2 = *(u32 *)(r1 + 80)
2: r2 <<= 32
3: r2 >>= 32
4: r1 = *(u32 *)(r1 + 76)
5: r1 <<= 32
6: r1 >>= 32
7: r3 = r1
8: r3 += 14
9: if r3 > r2 goto +35

After the zero extension insertion, later in convert_ctx_accesses, it will
be further transformed into something like:

1: r2 = *(u64 *)(r1 + 80)
2: r2 <<= 32
3: r2 >>= 32

However, the inserted zero extension (insns 2/3) is still there and will
clear the high 32 bits of the loaded 64-bit value.

This issue should have been exposed before. But as described in the cover
letter, the opt is disabled on x86; my previous test methodology on x86 was
forcing it on through sysctl for a couple of insn-matching unit tests only,
and I was hoping other host arches like ppc could give it a full run on the
bpf selftests, so until then the correctness of the opt had not been verified
by the full bpf selftests. I have done a full run of some internal offload
tests, which could be good coverage, but the offload handles PTR_TO_CTX in a
different way, so this issue was not caught.

Now, as you suggested, the test methodology has switched to poisoning the
high 32 bits on x86, so the full bpf selftest run can be enabled on x86 and
this issue is caught.

Regards,
Jiong

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH/RFC bpf-next 04/16] bpf: mark sub-register writes that really need zero extension to high bits
  2019-04-06  6:56         ` Jiong Wang
@ 2019-04-07  2:51           ` Alexei Starovoitov
  0 siblings, 0 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2019-04-07  2:51 UTC (permalink / raw)
  To: Jiong Wang
  Cc: Edward Cree, Alexei Starovoitov, Daniel Borkmann, bpf, netdev,
	oss-drivers

On Sat, Apr 06, 2019 at 07:56:25AM +0100, Jiong Wang wrote:
> 
> Alexei Starovoitov writes:
> 
> > On Fri, Apr 05, 2019 at 09:44:49PM +0100, Jiong Wang wrote:
> >> 
> >> > On 26 Mar 2019, at 18:44, Edward Cree <ecree@solarflare.com> wrote:
> >> > 
> >> > On 26/03/2019 18:05, Jiong Wang wrote:
> >> >> eBPF ISA specification requires high 32-bit cleared when low 32-bit
> >> >> sub-register is written. This applies to destination register of ALU32 etc.
> >> >> JIT back-ends must guarantee this semantic when doing code-gen.
> >> >> 
> >> >> x86-64 and arm64 ISA has the same semantic, so the corresponding JIT
> >> >> back-end doesn't need to do extra work. However, 32-bit arches (arm, nfp
> >> >> etc.) and some other 64-bit arches (powerpc, sparc etc), need explicit zero
> >> >> extension sequence to meet such semantic.
> >> >> 
> >> >> This is important, because for code the following:
> >> >> 
> >> >>  u64_value = (u64) u32_value
> >> >>  ... other uses of u64_value
> >> >> 
> >> >> compiler could exploit the semantic described above and save those zero
> >> >> extensions for extending u32_value to u64_value. Hardware, runtime, or BPF
> >> >> JIT back-ends, are responsible for guaranteeing this. Some benchmarks show
> >> >> ~40% sub-register writes out of total insns, meaning ~40% extra code-gen (
> >> >> could go up to more for some arches which requires two shifts for zero
> >> >> extension) because JIT back-end needs to do extra code-gen for all such
> >> >> instructions.
> >> >> 
> >> >> However this is not always necessary in case u32_value is never cast into
> >> >> a u64, which is quite normal in real life program. So, it would be really
> >> >> good if we could identify those places where such type cast happened, and
> >> >> only do zero extensions for them, not for the others. This could save a lot
> >> >> of BPF code-gen.
> >> >> 
> >> >> Algo:
> >> >> - Record indices of instructions that do sub-register def (write). And
> >> >>   these indices need to stay with function state so path pruning and bpf
> >> >>   to bpf function call could be handled properly.
> >> >> 
> >> >>   These indices are kept up to date while doing insn walk.
> >> >> 
> >> >> - A full register read on an active sub-register def marks the def insn as
> >> >>   needing zero extension on dst register.
> >> >> 
> >> >> - A new sub-register write overrides the old one.
> >> >> 
> >> >>   A new full register write makes the register free of zero extension on
> >> >>   dst register.
> >> >> 
> >> >> - When propagating register read64 during path pruning, it also marks def
> >> >>   insns whose defs are hanging active sub-register, if there is any read64
> >> >>   from shown from the equal state.
> >> >> 
> >> >> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> >> >> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
> >> >> ---
> >> >> include/linux/bpf_verifier.h |  4 +++
> >> >> kernel/bpf/verifier.c        | 85 +++++++++++++++++++++++++++++++++++++++++---
> >> >> 2 files changed, 84 insertions(+), 5 deletions(-)
> >> >> 
> >> >> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> >> >> index 27761ab..0ae9a3f 100644
> >> >> --- a/include/linux/bpf_verifier.h
> >> >> +++ b/include/linux/bpf_verifier.h
> >> >> @@ -181,6 +181,9 @@ struct bpf_func_state {
> >> >> 	 */
> >> >> 	u32 subprogno;
> >> >> 
> >> >> +	/* tracks subreg definition. */
> >> > Ideally this comment should mention that the stored value is the insn_idx
> >> >  of the writing insn.  Perhaps also that this is safe because patching
> >> >  (bpf_patch_insn_data()) only happens after main verification completes.
> >> 
> >> During full x86_64 host tests, found one new issue.                                    
> >>                                                                                          
> >> “convert_ctx_accesses” will change load size, A BPF_W load could be transformed          
> >> into BPF_DW or kept as BPF_W depending on the underlying ctx field size. And             
> >> “convert_ctx_accesses” happens after zero extension insertion.                           
> >>                                                                                          
> >> So, a BPF_W load could have been marked and zero extensions inserted after               
> >> it, however, the later happened “convert_ctx_accesses” then figured out it’s             
> >> transformed load size is actually BPF_DW then re-write to that. But the                  
> >> previously inserted zero extensions then break things, the high 32 bits are              
> >> wrongly cleared. For example:
> >> 
> >> 1: r2 = *(u32 *)(r1 + 80)                                                                
> >> 2: r1 = *(u32 *)(r1 + 76)                                                                
> >> 3: r3 = r1                                                                               
> >> 4: r3 += 14                                                                              
> >> 5: if r3 > r2 goto +35                                                                   
> >>                                                                                          
> >> insn 1 and 2 could be turned into BPF_DW load if they are loading xdp “data"
> >> and “data_end". There shouldn’t be zero-extension inserted after them will
> >> will destroy the pointer. However they are treated as 32-bit load initially,
> >> and later due to 64-bit use at insn 3 and 5, they are marked as needing zero
> >> extension.                                                                        
> >>                                                                                          
> >> I am thinking normally the field sizes in *_md inside uapi/linux/bpf.h are
> >> the same those in real underlying context, only when one field is pointer
> >> type, then it could be possible be a u32 to u64 conversion. So, I guess
> >> we just need to mark the dst register as a full 64-bit register write 
> >> inside check_mem_access when for PTR_TO_CTX, the reg type of the dust reg
> >> returned by check_ctx_access is ptr type.
> >
> > Since the register containing ctx->data was used later in the load insn and
> > it's type was pointer the analysis should have marked it as 64-bit access.
> >
> > It feels that there is an issue in propagating 64-bit access through
> > parentage chain. Since insn 5 above recognized r2 as 64-bit access
> > then how come insn 1 was still allowed to poison upper bits?
> 
> Guess my description was misleading. The high bits of insn 1 was not
> poisoned, they are truncated, the analysis pass is correct here.
> 
> It is a BPF_W (4-byte) load, so initially it is marked as a sub-register
> def and JIT compiler doesn't need to guarantee high 32-bit cleared. However
> later insn 5 found it has a 64-bit use (as 64-bit operand in the
> comparison), so it become mandatory to guarantee high 32-bit cleared, so
> sequence transformed into:
> 
> 1: r2 = *(u32 *)(r1 + 80)
> 2. r2 <<= 32
> 3. r2 >>= 32
> 4: r1 = *(u32 *)(r1 + 76)
> 5: r1 <<=  32
> 6: r1 >>= 32
> 5: r3 = r1                                                                               
> 6: r3 += 14                                                                              
> 7: if r3 > r2 goto +35
> 
> After the zero extension insertion, later in convert_ctx_access, it will
> be further transformed into something like:
> 
> 1: r2 = *(u64 *)(r1 + 80)
> 2. r2 <<= 32
> 3. r2 >>= 32
> 
> However, the inserted zero extension (insn 2/3) is still there and will
> clear the high 32-bit of the loaded 64-bit value.
> 
> This issue should have been exposed before. But as described in the cover
> letter, the opt is disabled on x86, my previous test methodology on x86 was
> forcing it on through sysctl for a couple of insn matching unit tests only,
> and was hoping other host arches like ppc could give it a full run on bpf
> selftest before which the correctness of the opt was not verified by full
> bpf selftest. I have done full run of some internal offload tests which
> could be a good coverage, but the offload handles PTR_TO_CTX in a different
> way so this issue was not caught.
> 
> Now as you suggested, the test methodology switched to poisoning high
> 32-bit on x86, so full test on bpf selftest is able to be enabled on x86
> test and this issue is caught.

Got it. Yes, checking that check_ctx_access returns a pointer type will be enough.


^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2019-04-07  2:51 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-26 18:05 [PATCH/RFC bpf-next 00/16] bpf: eliminate zero extensions for sub-register writes Jiong Wang
2019-03-26 18:05 ` [PATCH/RFC bpf-next 01/16] bpf: turn "enum bpf_reg_liveness" into bit representation Jiong Wang
2019-03-27 15:44   ` Alexei Starovoitov
2019-03-26 18:05 ` [PATCH/RFC bpf-next 02/16] bpf: refactor propagate_live implementation Jiong Wang
2019-03-26 18:26   ` Jann Horn
2019-03-26 19:45     ` Jiong Wang
2019-03-27 16:35   ` Alexei Starovoitov
2019-03-27 16:44     ` Jiong Wang
2019-03-26 18:05 ` [PATCH/RFC bpf-next 03/16] bpf: split read liveness into REG_LIVE_READ64 and REG_LIVE_READ32 Jiong Wang
2019-03-26 20:21   ` Jann Horn
2019-03-26 20:50     ` Jiong Wang
2019-03-27 16:38   ` Alexei Starovoitov
2019-03-26 18:05 ` [PATCH/RFC bpf-next 04/16] bpf: mark sub-register writes that really need zero extension to high bits Jiong Wang
2019-03-26 18:44   ` Edward Cree
2019-03-26 19:47     ` Jiong Wang
2019-04-05 20:44     ` Jiong Wang
2019-04-06  3:41       ` Alexei Starovoitov
2019-04-06  6:56         ` Jiong Wang
2019-04-07  2:51           ` Alexei Starovoitov
2019-03-27 16:50   ` Alexei Starovoitov
2019-03-27 17:06     ` Jiong Wang
2019-03-26 18:05 ` [PATCH/RFC bpf-next 05/16] bpf: reduce false alarm by refining "enum bpf_arg_type" Jiong Wang
2019-03-26 18:05 ` [PATCH/RFC bpf-next 06/16] bpf: new sysctl "bpf_jit_32bit_opt" Jiong Wang
2019-03-27 17:00   ` Alexei Starovoitov
2019-03-27 17:06     ` Jiong Wang
2019-03-27 17:17       ` Alexei Starovoitov
2019-03-27 17:18         ` Jiong Wang
2019-03-27 17:45           ` Alexei Starovoitov
2019-03-27 19:13             ` Jiong Wang
2019-03-26 18:05 ` [PATCH/RFC bpf-next 07/16] bpf: insert explicit zero extension instructions when bpf_jit_32bit_opt is true Jiong Wang
2019-03-26 18:05 ` [PATCH/RFC bpf-next 08/16] arm: bpf: eliminate zero extension code-gen Jiong Wang
2019-03-26 18:05 ` [PATCH/RFC bpf-next 09/16] powerpc: " Jiong Wang
2019-03-26 18:05 ` [PATCH/RFC bpf-next 10/16] s390: " Jiong Wang
2019-03-26 18:05 ` [PATCH/RFC bpf-next 11/16] sparc: " Jiong Wang
2019-03-26 18:05 ` [PATCH/RFC bpf-next 12/16] x32: " Jiong Wang
2019-03-26 18:05 ` [PATCH/RFC bpf-next 13/16] riscv: " Jiong Wang
2019-03-26 18:05 ` [PATCH/RFC bpf-next 14/16] nfp: " Jiong Wang
2019-03-26 18:05 ` [PATCH/RFC bpf-next 15/16] selftests: bpf: new field "xlated_insns" for insn scan test after verification Jiong Wang
2019-03-26 18:05 ` [PATCH/RFC bpf-next 16/16] selftests: bpf: unit testcases for zero extension insertion pass Jiong Wang
