* [RFC PATCH net-next 0/5] bpf: rewrite value tracking in verifier
From: Edward Cree @ 2017-06-07 14:55 UTC
  To: davem, Alexei Starovoitov, Alexei Starovoitov, Daniel Borkmann
  Cc: netdev, iovisor-dev, LKML

This series simplifies alignment tracking, generalises bounds tracking and
 fixes some bounds-tracking bugs in the BPF verifier.  Pointer arithmetic on
 packet pointers, stack pointers, map value pointers and context pointers has
 been unified, and bounds on these pointers are only checked when the pointer
 is dereferenced.
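The known-bits ("tnum") idea that patch 2/5 uses for alignment tracking can be illustrated with the userspace sketch below. tn_const(), tn_add() and tn_is_aligned() mirror the kernel/bpf/tnum.c functions added in that patch, with u64 replaced by uint64_t. The example values are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Known bits live in 'value'; a set bit in 'mask' means "unknown". */
struct tnum {
	uint64_t value;
	uint64_t mask;
};

static struct tnum tn_const(uint64_t v)
{
	return (struct tnum){ .value = v, .mask = 0 };
}

/* Same algorithm as tn_add() in patch 2/5: any bit that a carry out of
 * an unknown bit could reach becomes unknown in the sum.
 */
static struct tnum tn_add(struct tnum a, struct tnum b)
{
	uint64_t sm = a.mask + b.mask;
	uint64_t sv = a.value + b.value;
	uint64_t sigma = sm + sv;
	uint64_t chi = sigma ^ sv;
	uint64_t mu = chi | a.mask | b.mask;

	return (struct tnum){ .value = sv & ~mu, .mask = mu };
}

/* True if every value the tnum can hold is a multiple of size
 * (size must be a power of two).
 */
static bool tn_is_aligned(struct tnum a, uint64_t size)
{
	if (!size)
		return true;
	return !((a.value | a.mask) & (size - 1));
}
```

For instance, an index known only to be a multiple of 4 below 64 is the tnum (0; 0x3c). Adding the constant 16 gives (0; 0x7c), which is still provably 4-byte aligned but not 8-byte aligned; adding 2 gives (0x2; 0x3c), which is only 2-byte aligned.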
Operations on pointers which destroy all relation to the original pointer
 (such as multiplies and shifts) are disallowed if !env->allow_ptr_leaks;
 otherwise they convert the pointer to an unknown scalar and feed it to the
 normal scalar arithmetic handling.
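Below is a minimal sketch of that policy. For illustration it assumes that only ADD and SUB keep a pointer's relation to its base. The names classify_ptr_alu(), enum alu_op and enum alu_path are hypothetical and do not appear in the patches; the verifier's real ALU handling tracks far more state.

```c
#include <assert.h>
#include <stdbool.h>

enum reg_type { SCALAR_VALUE, PTR_TO_MAP_VALUE };
enum alu_op { ALU_ADD, ALU_SUB, ALU_MUL, ALU_LSH };
enum alu_path { PTR_PATH, SCALAR_PATH, REJECTED };

struct reg {
	enum reg_type type;	/* offsets and bounds omitted in this sketch */
};

static enum alu_path classify_ptr_alu(struct reg *dst, enum alu_op op,
				      bool allow_ptr_leaks)
{
	/* Assumed: add/sub keep the pointer relation and stay on the
	 * pointer-arithmetic path (not modelled here).
	 */
	if (op == ALU_ADD || op == ALU_SUB)
		return PTR_PATH;
	/* Unprivileged: the result would leak pointer bits, so refuse. */
	if (!allow_ptr_leaks)
		return REJECTED;
	/* Privileged: forget that it was a pointer.  A fuller model would
	 * also reset offsets and bounds to "unknown" here before handing
	 * the register to the scalar ALU code.
	 */
	dst->type = SCALAR_VALUE;
	return SCALAR_PATH;
}
```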
Pointer types have been unified with the corresponding adjusted-pointer types
 where those existed (e.g. PTR_TO_MAP_VALUE[_ADJ] or FRAME_PTR vs
 PTR_TO_STACK); similarly, CONST_IMM and UNKNOWN_VALUE have been unified into
 SCALAR_VALUE.
Pointer types (except CONST_PTR_TO_MAP, PTR_TO_MAP_VALUE_OR_NULL and
 PTR_TO_PACKET_END, which do not allow arithmetic) have a 'fixed offset' and
 a 'variable offset'; the former is used when e.g. adding an immediate or a
 known-constant register, as long as it does not overflow.  Otherwise the
 latter is used, and any operation creating a new variable offset creates a
 new 'id' (and, for PTR_TO_PACKET, clears the 'range').
SCALAR_VALUEs use the 'variable offset' fields to track the range of possible
 values; the 'fixed offset' should never be set on a scalar.
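The fixed/variable split can be modelled roughly as below. struct ptr_reg, new_var_off(), ptr_add_const() and ptr_add_scalar() are hypothetical names for this sketch. The fields loosely follow off, id, range and align in the patched struct bpf_reg_state, and tn_add() is the carry-propagating addition from kernel/bpf/tnum.c.

```c
#include <assert.h>
#include <stdint.h>

/* Tracked number: bits set in mask are unknown, the rest are in value. */
struct tnum {
	uint64_t value;
	uint64_t mask;
};

static struct tnum tn_const(uint64_t v)
{
	return (struct tnum){ .value = v, .mask = 0 };
}

/* Carry-propagating add, as tn_add() in kernel/bpf/tnum.c (patch 2/5). */
static struct tnum tn_add(struct tnum a, struct tnum b)
{
	uint64_t sm = a.mask + b.mask;
	uint64_t sv = a.value + b.value;
	uint64_t chi = (sm + sv) ^ sv;
	uint64_t mu = chi | a.mask | b.mask;

	return (struct tnum){ .value = sv & ~mu, .mask = mu };
}

/* Hypothetical cut-down pointer register. */
struct ptr_reg {
	int32_t off;		/* fixed offset */
	struct tnum var_off;	/* variable offset ("align" in the patch) */
	uint32_t id;		/* shared by pointers with the same var_off */
	uint32_t range;		/* packet pointers: bytes proven safe */
};

static uint32_t next_id = 1;

/* Any new variable offset gets a fresh id and drops the packet range. */
static void new_var_off(struct ptr_reg *p, struct tnum delta)
{
	p->var_off = tn_add(p->var_off, delta);
	p->id = next_id++;
	p->range = 0;
}

/* Known constant: fold into the s32 fixed offset unless it would overflow. */
static void ptr_add_const(struct ptr_reg *p, int64_t imm)
{
	int64_t lo = (int64_t)INT32_MIN - p->off;
	int64_t hi = (int64_t)INT32_MAX - p->off;

	if (imm >= lo && imm <= hi)
		p->off = (int32_t)(p->off + imm);
	else
		new_var_off(p, tn_const((uint64_t)imm));
}

/* Unknown scalar: always goes into the variable offset. */
static void ptr_add_scalar(struct ptr_reg *p, struct tnum s)
{
	new_var_off(p, s);
}
```

In this model, adding 14 to a packet pointer only moves the fixed offset and keeps its id and range. Adding an unknown multiple of 4 creates a new variable offset, a new id and a zero range. A constant too large for the s32 fixed offset also falls back to the variable offset.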

Patch 2/5 is rather on the big side, but since it changes the contents and
 semantics of a fairly central data structure, I'm not really sure how to go
 about splitting it up further without producing broken intermediate states.

With the changes in patch 5/5, all tools/testing/selftests/bpf/test_verifier
 tests pass.

Edward Cree (5):
  selftests/bpf: add test for mixed signed and unsigned bounds checks
  bpf/verifier: rework value tracking
  bpf/verifier: feed pointer-to-unknown-scalar casts into scalar ALU
    path
  bpf/verifier: track signed and unsigned min/max values
  selftests/bpf: change test_verifier expectations

 include/linux/bpf.h                         |   34 +-
 include/linux/bpf_verifier.h                |   56 +-
 include/linux/tnum.h                        |   58 +
 kernel/bpf/Makefile                         |    2 +-
 kernel/bpf/tnum.c                           |  163 +++
 kernel/bpf/verifier.c                       | 1852 ++++++++++++++++-----------
 tools/testing/selftests/bpf/test_verifier.c |  248 ++--
 7 files changed, 1482 insertions(+), 931 deletions(-)
 create mode 100644 include/linux/tnum.h
 create mode 100644 kernel/bpf/tnum.c

* [RFC PATCH net-next 1/5] selftests/bpf: add test for mixed signed and unsigned bounds checks
From: Edward Cree @ 2017-06-07 14:58 UTC
  To: davem, Alexei Starovoitov, Alexei Starovoitov, Daniel Borkmann
  Cc: netdev, iovisor-dev, LKML

Currently fails due to a bug in the verifier's bounds handling.

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 tools/testing/selftests/bpf/test_verifier.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index cabb19b..5074cfa 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -5169,6 +5169,32 @@ static struct bpf_test tests[] = {
 		},
 		.result = ACCEPT,
 	},
+	{
+		"bounds checks mixing signed and unsigned",
+		.insns = {
+			BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+			BPF_LD_MAP_FD(BPF_REG_1, 0),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+				     BPF_FUNC_map_lookup_elem),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 7),
+			BPF_ST_MEM(BPF_DW, BPF_REG_10, -16, -8),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_10, -16),
+			BPF_MOV64_IMM(BPF_REG_2, -1),
+			BPF_JMP_REG(BPF_JGT, BPF_REG_1, BPF_REG_2, 3),
+			BPF_JMP_IMM(BPF_JSGT, BPF_REG_1, 1, 2),
+			BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1),
+			BPF_ST_MEM(BPF_B, BPF_REG_0, 0, 0),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.fixup_map1 = { 3 },
+		.errstr_unpriv = "R0 pointer arithmetic prohibited",
+		.errstr = "R0 min value is negative, either use unsigned index or do a if (index >=0) check.",
+		.result = REJECT,
+		.result_unpriv = REJECT,
+	},
 };
 
 static int probe_filter_length(const struct bpf_insn *fp)

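A small model of signed and unsigned bounds, in the spirit of patch 4/5, shows why the test program above must be rejected. This is a hypothetical sketch: struct bounds, bounds_sync(), the fallthrough_*() helpers and index_ok() are illustrative names, the 8-byte value size is an assumption, and the sketch does not reproduce the old verifier's bug. R1 is reloaded from a stack slot written with BPF_ST_MEM, so the verifier should treat it as an unknown scalar. The unsigned test against -1 (U64_MAX) can never be taken, so its fall-through teaches nothing. The signed test against 1 only caps the value from above. R1 = -8 reaches the store on both fall-through paths, so the signed minimum stays negative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bounds record tracking both readings of a 64-bit value. */
struct bounds {
	int64_t smin, smax;
	uint64_t umin, umax;
};

static struct bounds bounds_unknown(void)
{
	return (struct bounds){ INT64_MIN, INT64_MAX, 0, UINT64_MAX };
}

/* If either range cannot cross the sign bit, the two readings coincide. */
static void bounds_sync(struct bounds *b)
{
	if (b->umax <= (uint64_t)INT64_MAX) {
		if (b->smin < (int64_t)b->umin)
			b->smin = (int64_t)b->umin;
		if (b->smax > (int64_t)b->umax)
			b->smax = (int64_t)b->umax;
	}
	if (b->smin >= 0) {
		if (b->umin < (uint64_t)b->smin)
			b->umin = (uint64_t)b->smin;
		if (b->umax > (uint64_t)b->smax)
			b->umax = (uint64_t)b->smax;
	}
}

/* Fall-through of "if x > k goto" (BPF_JGT, unsigned): x <= k. */
static void fallthrough_jgt(struct bounds *b, uint64_t k)
{
	if (b->umax > k)
		b->umax = k;
	bounds_sync(b);
}

/* Fall-through of "if x s> k goto" (BPF_JSGT, signed): x <= k. */
static void fallthrough_jsgt(struct bounds *b, int64_t k)
{
	if (b->smax > k)
		b->smax = k;
	bounds_sync(b);
}

/* A variable index is safe for a one-byte access only if provably in
 * [0, value_size).
 */
static bool index_ok(const struct bounds *b, uint64_t value_size)
{
	return b->smin >= 0 && b->umax < value_size;
}
```

In this model, an unsigned check such as "if r1 > 7 goto" would instead bound R1 to [0, 7] under both readings, and the index would be accepted.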
* [RFC PATCH net-next 2/5] bpf/verifier: rework value tracking
From: Edward Cree @ 2017-06-07 14:58 UTC
  To: davem, Alexei Starovoitov, Alexei Starovoitov, Daniel Borkmann
  Cc: netdev, iovisor-dev, LKML

Tracks value alignment by means of tracking known & unknown bits.
Tightens some min/max value checks and fixes a couple of bugs therein.

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 include/linux/bpf.h          |   34 +-
 include/linux/bpf_verifier.h |   40 +-
 include/linux/tnum.h         |   58 ++
 kernel/bpf/Makefile          |    2 +-
 kernel/bpf/tnum.c            |  163 +++++
 kernel/bpf/verifier.c        | 1641 +++++++++++++++++++++++-------------------
 6 files changed, 1170 insertions(+), 768 deletions(-)
 create mode 100644 include/linux/tnum.h
 create mode 100644 kernel/bpf/tnum.c

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 6bb38d7..5ac19ab 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -115,35 +115,25 @@ enum bpf_access_type {
 };
 
 /* types of values stored in eBPF registers */
+/* Pointer types represent:
+ * pointer
+ * pointer + imm
+ * pointer + (u16) var
+ * pointer + (u16) var + imm
+ * if (range > 0) then [ptr, ptr + range - off) is safe to access
+ * if (id > 0) means that some 'var' was added
+ * if (off > 0) means that 'imm' was added
+ */
 enum bpf_reg_type {
 	NOT_INIT = 0,		 /* nothing was written into register */
-	UNKNOWN_VALUE,		 /* reg doesn't contain a valid pointer */
+	SCALAR_VALUE,		 /* reg doesn't contain a valid pointer */
 	PTR_TO_CTX,		 /* reg points to bpf_context */
 	CONST_PTR_TO_MAP,	 /* reg points to struct bpf_map */
 	PTR_TO_MAP_VALUE,	 /* reg points to map element value */
 	PTR_TO_MAP_VALUE_OR_NULL,/* points to map elem value or NULL */
-	FRAME_PTR,		 /* reg == frame_pointer */
-	PTR_TO_STACK,		 /* reg == frame_pointer + imm */
-	CONST_IMM,		 /* constant integer value */
-
-	/* PTR_TO_PACKET represents:
-	 * skb->data
-	 * skb->data + imm
-	 * skb->data + (u16) var
-	 * skb->data + (u16) var + imm
-	 * if (range > 0) then [ptr, ptr + range - off) is safe to access
-	 * if (id > 0) means that some 'var' was added
-	 * if (off > 0) menas that 'imm' was added
-	 */
-	PTR_TO_PACKET,
+	PTR_TO_STACK,		 /* reg == frame_pointer + offset */
+	PTR_TO_PACKET,		 /* reg points to skb->data */
 	PTR_TO_PACKET_END,	 /* skb->data + headlen */
-
-	/* PTR_TO_MAP_VALUE_ADJ is used for doing pointer math inside of a map
-	 * elem value.  We only allow this if we can statically verify that
-	 * access from this register are going to fall within the size of the
-	 * map element.
-	 */
-	PTR_TO_MAP_VALUE_ADJ,
 };
 
 struct bpf_prog;
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index d5093b5..e341469 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -9,6 +9,7 @@
 
 #include <linux/bpf.h> /* for enum bpf_reg_type */
 #include <linux/filter.h> /* for MAX_BPF_STACK */
+#include <linux/tnum.h>
 
  /* Just some arbitrary values so we can safely do math without overflowing and
   * are obviously wrong for any sort of memory access.
@@ -19,30 +20,39 @@
 struct bpf_reg_state {
 	enum bpf_reg_type type;
 	union {
-		/* valid when type == CONST_IMM | PTR_TO_STACK | UNKNOWN_VALUE */
-		s64 imm;
-
-		/* valid when type == PTR_TO_PACKET* */
-		struct {
-			u16 off;
-			u16 range;
-		};
+		/* valid when type == PTR_TO_PACKET */
+		u32 range;
 
 		/* valid when type == CONST_PTR_TO_MAP | PTR_TO_MAP_VALUE |
 		 *   PTR_TO_MAP_VALUE_OR_NULL
 		 */
 		struct bpf_map *map_ptr;
 	};
+	/* Fixed part of pointer offset, pointer types only */
+	s32 off;
+	/* Used to find other pointers with the same variable offset, so they
+	 * can share range knowledge.
+	 * Exception: for PTR_TO_MAP_VALUE_OR_NULL this is used to share which
+	 * map value we came from, when one is tested for != NULL.  Note that
+	 * this overloading means that we can't do pointer arithmetic on a
+	 * PTR_TO_MAP_VALUE_OR_NULL, we have to NULL-check it _first_.
+	 */
 	u32 id;
+	/* These three fields must be last.  See states_equal() */
+	/* For scalar types (SCALAR_VALUE), this represents our knowledge of
+	 * the actual value.
+	 * For pointer types, this represents the variable part of the offset
+	 * from the pointed-to object, and is shared with all bpf_reg_states
+	 * with the same id as us.
+	 */
+	struct tnum align;
 	/* Used to determine if any memory access using this register will
-	 * result in a bad access. These two fields must be last.
-	 * See states_equal()
+	 * result in a bad access.
+	 * These refer to the same value as align, not necessarily the actual
+	 * contents of the register.
 	 */
-	s64 min_value;
-	u64 max_value;
-	u32 min_align;
-	u32 aux_off;
-	u32 aux_off_align;
+	s64 min_value; /* minimum possible (s64)value */
+	u64 max_value; /* maximum possible (u64)value */
 };
 
 enum bpf_stack_slot_type {
diff --git a/include/linux/tnum.h b/include/linux/tnum.h
new file mode 100644
index 0000000..d9279a6
--- /dev/null
+++ b/include/linux/tnum.h
@@ -0,0 +1,58 @@
+/* tnum: tracked (or tristate) numbers
+ *
+ * A tnum tracks knowledge about the bits of a value.  Each bit can be either
+ * known (0 or 1), or unknown (x).  Arithmetic operations on tnums will
+ * propagate the unknown bits such that the tnum result represents all the
+ * possible results for possible values of the operands.
+ */
+#include <linux/types.h>
+
+struct tnum {
+	u64 value;
+	u64 mask;
+};
+
+/* Constructors */
+/* Represent a known constant as a tnum. */
+struct tnum tn_const(u64 value);
+/* A completely unknown value */
+extern const struct tnum tn_unknown;
+
+/* Arithmetic and logical ops */
+/* Shift a tnum left (by a fixed shift) */
+struct tnum tn_sl(struct tnum a, u8 shift);
+/* Shift a tnum right (by a fixed shift) */
+struct tnum tn_sr(struct tnum a, u8 shift);
+/* Add two tnums, return %a + %b */
+struct tnum tn_add(struct tnum a, struct tnum b);
+/* Subtract two tnums, return %a - %b */
+struct tnum tn_sub(struct tnum a, struct tnum b);
+/* Bitwise-AND, return %a & %b */
+struct tnum tn_and(struct tnum a, struct tnum b);
+/* Bitwise-OR, return %a | %b */
+struct tnum tn_or(struct tnum a, struct tnum b);
+/* Bitwise-XOR, return %a ^ %b */
+struct tnum tn_xor(struct tnum a, struct tnum b);
+/* Multiply two tnums, return %a * %b */
+struct tnum tn_mul(struct tnum a, struct tnum b);
+
+/* Return a tnum representing numbers satisfying both %a and %b */
+struct tnum tn_intersect(struct tnum a, struct tnum b);
+
+/* Returns true if %a is known to be a multiple of %size.
+ * %size must be a power of two.
+ */
+bool tn_is_aligned(struct tnum a, u64 size);
+
+/* Returns true if %b represents a subset of %a. */
+bool tn_in(struct tnum a, struct tnum b);
+
+/* Formatting functions.  These have snprintf-like semantics: they will write
+ * up to size bytes (including the terminating NUL byte), and return the number
+ * of bytes (excluding the terminating NUL) which would have been written had
+ * sufficient space been available.  (Thus tn_sbin always returns 64.)
+ */
+/* Format a tnum as a pair of hex numbers (value; mask) */
+int tn_strn(char *str, size_t size, struct tnum a);
+/* Format a tnum as tristate binary expansion */
+int tn_sbin(char *str, size_t size, struct tnum a);
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index e1e5e65..df14def 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -1,6 +1,6 @@
 obj-y := core.o
 
-obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o
+obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o
 obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o
 ifeq ($(CONFIG_PERF_EVENTS),y)
 obj-$(CONFIG_BPF_SYSCALL) += stackmap.o
diff --git a/kernel/bpf/tnum.c b/kernel/bpf/tnum.c
new file mode 100644
index 0000000..cd167f4
--- /dev/null
+++ b/kernel/bpf/tnum.c
@@ -0,0 +1,163 @@
+/* tnum: tracked (or tristate) numbers
+ *
+ * A tnum tracks knowledge about the bits of a value.  Each bit can be either
+ * known (0 or 1), or unknown (x).  Arithmetic operations on tnums will
+ * propagate the unknown bits such that the tnum result represents all the
+ * possible results for possible values of the operands.
+ */
+#include <linux/kernel.h>
+#include <linux/tnum.h>
+
+#define TNUM(_v, _m)	(struct tnum){.value = _v, .mask = _m}
+/* A completely unknown value */
+const struct tnum tn_unknown = { .value = 0, .mask = -1 };
+
+struct tnum tn_const(u64 value)
+{
+	return TNUM(value, 0);
+}
+
+struct tnum tn_phi(struct tnum a, struct tnum b)
+{
+	u64 delta, mu;
+
+	delta = a.value ^ b.value;
+	mu = a.mask | b.mask | delta;
+	return TNUM(a.value & ~mu, mu);
+}
+
+struct tnum tn_sl(struct tnum a, u8 shift)
+{
+	return TNUM(a.value << shift, a.mask << shift);
+}
+
+struct tnum tn_sr(struct tnum a, u8 shift)
+{
+	return TNUM(a.value >> shift, a.mask >> shift);
+}
+
+struct tnum tn_add(struct tnum a, struct tnum b)
+{
+	u64 sm, sv, sigma, chi, mu;
+
+	sm = a.mask + b.mask;
+	sv = a.value + b.value;
+	sigma = sm + sv;
+	chi = sigma ^ sv;
+	mu = chi | a.mask | b.mask;
+	return TNUM(sv & ~mu, mu);
+}
+
+struct tnum tn_sub(struct tnum a, struct tnum b)
+{
+	u64 dv, alpha, beta, chi, mu;
+
+	dv = a.value - b.value;
+	alpha = dv + a.mask;
+	beta = dv - b.mask;
+	chi = alpha ^ beta;
+	mu = chi | a.mask | b.mask;
+	return TNUM(dv & ~mu, mu);
+}
+
+struct tnum tn_and(struct tnum a, struct tnum b)
+{
+	u64 alpha, beta, v;
+
+	alpha = a.value | a.mask;
+	beta = b.value | b.mask;
+	v = a.value & b.value;
+	return TNUM(v, alpha & beta & ~v);
+}
+
+struct tnum tn_or(struct tnum a, struct tnum b)
+{
+	u64 v, mu;
+
+	v = a.value | b.value;
+	mu = a.mask | b.mask;
+	return TNUM(v, mu & ~v);
+}
+
+struct tnum tn_xor(struct tnum a, struct tnum b)
+{
+	u64 v, mu;
+
+	v = a.value ^ b.value;
+	mu = a.mask | b.mask;
+	return TNUM(v & ~mu, mu);
+}
+
+/* half-multiply add: acc += (unknown * mask * value) */
+static struct tnum hma(struct tnum acc, u64 value, u64 mask)
+{
+	while (mask) {
+		if (mask & 1)
+			acc = tn_add(acc, TNUM(0, value));
+		mask >>= 1;
+		value <<= 1;
+	}
+	return acc;
+}
+
+struct tnum tn_mul(struct tnum a, struct tnum b)
+{
+	struct tnum acc;
+	u64 pi;
+
+	pi = a.value * b.value;
+	acc = hma(TNUM(pi, 0), a.mask, b.mask | b.value);
+	return hma(acc, b.mask, a.value);
+}
+
+/* Note that if a and b disagree - i.e. one has a 'known 1' where the other has
+ * a 'known 0' - this will return a 'known 1' for that bit.
+ */
+struct tnum tn_intersect(struct tnum a, struct tnum b)
+{
+	u64 v, mu;
+
+	v = a.value | b.value;
+	mu = a.mask & b.mask;
+	return TNUM(v & ~mu, mu);
+}
+
+bool tn_is_aligned(struct tnum a, u64 size)
+{
+	if (!size)
+		return true;
+	return !((a.value | a.mask) & (size - 1));
+}
+
+bool tn_in(struct tnum a, struct tnum b)
+{
+	if (b.mask & ~a.mask)
+		return false;
+	b.value &= ~a.mask;
+	return a.value == b.value;
+}
+
+int tn_strn(char *str, size_t size, struct tnum a)
+{
+	return snprintf(str, size, "(%#llx; %#llx)", a.value, a.mask);
+}
+
+int tn_sbin(char *str, size_t size, struct tnum a)
+{
+	size_t n;
+
+	for (n = 64; n; n--) {
+		if (n < size) {
+			if (a.mask & 1)
+				str[n - 1] = 'x';
+			else if (a.value & 1)
+				str[n - 1] = '1';
+			else
+				str[n - 1] = '0';
+		}
+		a.mask >>= 1;
+		a.value >>= 1;
+	}
+	str[min(size - 1, (size_t)64)] = 0;
+	return 64;
+}
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 339c8a1..dd06e4e 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -61,12 +61,12 @@
  * (and -20 constant is saved for further stack bounds checking).
  * Meaning that this reg is a pointer to stack plus known immediate constant.
  *
- * Most of the time the registers have UNKNOWN_VALUE type, which
+ * Most of the time the registers have SCALAR_VALUE type, which
  * means the register has some value, but it's not a valid pointer.
- * (like pointer plus pointer becomes UNKNOWN_VALUE type)
+ * (like pointer plus pointer becomes SCALAR_VALUE type)
  *
  * When verifier sees load or store instructions the type of base register
- * can be: PTR_TO_MAP_VALUE, PTR_TO_CTX, FRAME_PTR. These are three pointer
+ * can be: PTR_TO_MAP_VALUE, PTR_TO_CTX, PTR_TO_STACK. These are three pointer
  * types recognized by check_mem_access() function.
  *
  * PTR_TO_MAP_VALUE means that this register is pointing to 'map element value'
@@ -180,15 +180,12 @@ static __printf(1, 2) void verbose(const char *fmt, ...)
 /* string representation of 'enum bpf_reg_type' */
 static const char * const reg_type_str[] = {
 	[NOT_INIT]		= "?",
-	[UNKNOWN_VALUE]		= "inv",
+	[SCALAR_VALUE]		= "inv",
 	[PTR_TO_CTX]		= "ctx",
 	[CONST_PTR_TO_MAP]	= "map_ptr",
 	[PTR_TO_MAP_VALUE]	= "map_value",
 	[PTR_TO_MAP_VALUE_OR_NULL] = "map_value_or_null",
-	[PTR_TO_MAP_VALUE_ADJ]	= "map_value_adj",
-	[FRAME_PTR]		= "fp",
 	[PTR_TO_STACK]		= "fp",
-	[CONST_IMM]		= "imm",
 	[PTR_TO_PACKET]		= "pkt",
 	[PTR_TO_PACKET_END]	= "pkt_end",
 };
@@ -221,32 +218,36 @@ static void print_verifier_state(struct bpf_verifier_state *state)
 		if (t == NOT_INIT)
 			continue;
 		verbose(" R%d=%s", i, reg_type_str[t]);
-		if (t == CONST_IMM || t == PTR_TO_STACK)
-			verbose("%lld", reg->imm);
-		else if (t == PTR_TO_PACKET)
-			verbose("(id=%d,off=%d,r=%d)",
-				reg->id, reg->off, reg->range);
-		else if (t == UNKNOWN_VALUE && reg->imm)
-			verbose("%lld", reg->imm);
-		else if (t == CONST_PTR_TO_MAP || t == PTR_TO_MAP_VALUE ||
-			 t == PTR_TO_MAP_VALUE_OR_NULL ||
-			 t == PTR_TO_MAP_VALUE_ADJ)
-			verbose("(ks=%d,vs=%d,id=%u)",
-				reg->map_ptr->key_size,
-				reg->map_ptr->value_size,
-				reg->id);
-		if (reg->min_value != BPF_REGISTER_MIN_RANGE)
-			verbose(",min_value=%lld",
-				(long long)reg->min_value);
-		if (reg->max_value != BPF_REGISTER_MAX_RANGE)
-			verbose(",max_value=%llu",
-				(unsigned long long)reg->max_value);
-		if (reg->min_align)
-			verbose(",min_align=%u", reg->min_align);
-		if (reg->aux_off)
-			verbose(",aux_off=%u", reg->aux_off);
-		if (reg->aux_off_align)
-			verbose(",aux_off_align=%u", reg->aux_off_align);
+		if ((t == SCALAR_VALUE || t == PTR_TO_STACK) &&
+		    !reg->align.mask) {
+			/* reg->off should be 0 for SCALAR_VALUE */
+			verbose("%lld", reg->align.value + reg->off);
+		} else {
+			verbose("(id=%d", reg->id);
+			if (t != SCALAR_VALUE)
+				verbose(",off=%d", reg->off);
+			if (t == PTR_TO_PACKET)
+				verbose(",r=%d", reg->range);
+			else if (t == CONST_PTR_TO_MAP ||
+				 t == PTR_TO_MAP_VALUE ||
+				 t == PTR_TO_MAP_VALUE_OR_NULL)
+				verbose(",ks=%d,vs=%d",
+					reg->map_ptr->key_size,
+					reg->map_ptr->value_size);
+			if (reg->min_value != BPF_REGISTER_MIN_RANGE)
+				verbose(",min_value=%lld",
+					(long long)reg->min_value);
+			if (reg->max_value != BPF_REGISTER_MAX_RANGE)
+				verbose(",max_value=%llu",
+					(unsigned long long)reg->max_value);
+			if (~reg->align.mask) {
+				char tn_buf[48];
+
+				tn_strn(tn_buf, sizeof(tn_buf), reg->align);
+				verbose(",align=%s", tn_buf);
+			}
+			verbose(")");
+		}
 	}
 	for (i = 0; i < MAX_BPF_STACK; i += BPF_REG_SIZE) {
 		if (state->stack_slot_type[i] == STACK_SPILL)
@@ -463,55 +464,59 @@ static const int caller_saved[CALLER_SAVED_REGS] = {
 	BPF_REG_0, BPF_REG_1, BPF_REG_2, BPF_REG_3, BPF_REG_4, BPF_REG_5
 };
 
-static void mark_reg_not_init(struct bpf_reg_state *regs, u32 regno)
+/* Mark the 'variable offset' part of a register as zero.  This should be
+ * used only on registers holding a pointer type.
+ */
+static void __mark_reg_known_zero(struct bpf_reg_state *reg)
 {
-	BUG_ON(regno >= MAX_BPF_REG);
-
-	memset(&regs[regno], 0, sizeof(regs[regno]));
-	regs[regno].type = NOT_INIT;
-	regs[regno].min_value = BPF_REGISTER_MIN_RANGE;
-	regs[regno].max_value = BPF_REGISTER_MAX_RANGE;
+	reg->align = tn_const(0);
+	reg->min_value = 0;
+	reg->max_value = 0;
 }
 
-static void init_reg_state(struct bpf_reg_state *regs)
+static void mark_reg_known_zero(struct bpf_reg_state *regs, u32 regno)
 {
-	int i;
-
-	for (i = 0; i < MAX_BPF_REG; i++)
-		mark_reg_not_init(regs, i);
-
-	/* frame pointer */
-	regs[BPF_REG_FP].type = FRAME_PTR;
-
-	/* 1st arg to a function */
-	regs[BPF_REG_1].type = PTR_TO_CTX;
+	BUG_ON(regno >= MAX_BPF_REG);
+	__mark_reg_known_zero(regs + regno);
 }
 
-static void __mark_reg_unknown_value(struct bpf_reg_state *regs, u32 regno)
+/* Mark a register as having a completely unknown (scalar) value. */
+static void __mark_reg_unknown(struct bpf_reg_state *reg)
 {
-	regs[regno].type = UNKNOWN_VALUE;
-	regs[regno].id = 0;
-	regs[regno].imm = 0;
+	reg->type = SCALAR_VALUE;
+	reg->id = 0;
+	reg->off = 0;
+	reg->align = tn_unknown;
+	reg->min_value = BPF_REGISTER_MIN_RANGE;
+	reg->max_value = BPF_REGISTER_MAX_RANGE;
 }
 
-static void mark_reg_unknown_value(struct bpf_reg_state *regs, u32 regno)
+static void mark_reg_unknown(struct bpf_reg_state *regs, u32 regno)
 {
 	BUG_ON(regno >= MAX_BPF_REG);
-	__mark_reg_unknown_value(regs, regno);
+	__mark_reg_unknown(regs + regno);
 }
 
-static void reset_reg_range_values(struct bpf_reg_state *regs, u32 regno)
+static void mark_reg_not_init(struct bpf_reg_state *regs, u32 regno)
 {
-	regs[regno].min_value = BPF_REGISTER_MIN_RANGE;
-	regs[regno].max_value = BPF_REGISTER_MAX_RANGE;
-	regs[regno].min_align = 0;
+	mark_reg_unknown(regs, regno);
+	regs[regno].type = NOT_INIT;
 }
 
-static void mark_reg_unknown_value_and_range(struct bpf_reg_state *regs,
-					     u32 regno)
+static void init_reg_state(struct bpf_reg_state *regs)
 {
-	mark_reg_unknown_value(regs, regno);
-	reset_reg_range_values(regs, regno);
+	int i;
+
+	for (i = 0; i < MAX_BPF_REG; i++)
+		mark_reg_not_init(regs, i);
+
+	/* frame pointer */
+	regs[BPF_REG_FP].type = PTR_TO_STACK;
+	mark_reg_known_zero(regs, BPF_REG_FP);
+
+	/* 1st arg to a function */
+	regs[BPF_REG_1].type = PTR_TO_CTX;
+	mark_reg_known_zero(regs, BPF_REG_1);
 }
 
 enum reg_arg_type {
@@ -541,7 +546,7 @@ static int check_reg_arg(struct bpf_reg_state *regs, u32 regno,
 			return -EACCES;
 		}
 		if (t == DST_OP)
-			mark_reg_unknown_value(regs, regno);
+			mark_reg_unknown(regs, regno);
 	}
 	return 0;
 }
@@ -565,12 +570,10 @@ static bool is_spillable_regtype(enum bpf_reg_type type)
 	switch (type) {
 	case PTR_TO_MAP_VALUE:
 	case PTR_TO_MAP_VALUE_OR_NULL:
-	case PTR_TO_MAP_VALUE_ADJ:
 	case PTR_TO_STACK:
 	case PTR_TO_CTX:
 	case PTR_TO_PACKET:
 	case PTR_TO_PACKET_END:
-	case FRAME_PTR:
 	case CONST_PTR_TO_MAP:
 		return true;
 	default:
@@ -650,14 +653,13 @@ static int check_stack_read(struct bpf_verifier_state *state, int off, int size,
 		}
 		if (value_regno >= 0)
 			/* have read misc data from the stack */
-			mark_reg_unknown_value_and_range(state->regs,
-							 value_regno);
+			mark_reg_unknown(state->regs, value_regno);
 		return 0;
 	}
 }
 
 /* check read/write into map element returned by bpf_map_lookup_elem() */
-static int check_map_access(struct bpf_verifier_env *env, u32 regno, int off,
+static int __check_map_access(struct bpf_verifier_env *env, u32 regno, int off,
 			    int size)
 {
 	struct bpf_map *map = env->cur_state.regs[regno].map_ptr;
@@ -670,22 +672,25 @@ static int check_map_access(struct bpf_verifier_env *env, u32 regno, int off,
 	return 0;
 }
 
-/* check read/write into an adjusted map element */
-static int check_map_access_adj(struct bpf_verifier_env *env, u32 regno,
+/* check read/write into a map element with possible variable offset */
+static int check_map_access(struct bpf_verifier_env *env, u32 regno,
 				int off, int size)
 {
 	struct bpf_verifier_state *state = &env->cur_state;
 	struct bpf_reg_state *reg = &state->regs[regno];
 	int err;
 
-	/* We adjusted the register to this map value, so we
-	 * need to change off and size to min_value and max_value
-	 * respectively to make sure our theoretical access will be
-	 * safe.
+	/* We may have adjusted the register to this map value, so we
+	 * need to try adding each of min_value and max_value to off
+	 * to make sure our theoretical access will be safe.
 	 */
 	if (log_level)
 		print_verifier_state(state);
-	env->varlen_map_value_access = true;
+	/* If the offset is variable, we will need to be stricter in state
+	 * pruning from now on.
+	 */
+	if (reg->align.mask)
+		env->varlen_map_value_access = true;
 	/* The minimum value is only important with signed
 	 * comparisons where we can't assume the floor of a
 	 * value is 0.  If we are using signed variables for our
@@ -697,10 +702,9 @@ static int check_map_access_adj(struct bpf_verifier_env *env, u32 regno,
 			regno);
 		return -EACCES;
 	}
-	err = check_map_access(env, regno, reg->min_value + off, size);
+	err = __check_map_access(env, regno, reg->min_value + off, size);
 	if (err) {
-		verbose("R%d min value is outside of the array range\n",
-			regno);
+		verbose("R%d min value is outside of the array range\n", regno);
 		return err;
 	}
 
@@ -712,7 +716,10 @@ static int check_map_access_adj(struct bpf_verifier_env *env, u32 regno,
 			regno);
 		return -EACCES;
 	}
-	return check_map_access(env, regno, reg->max_value + off, size);
+	err = __check_map_access(env, regno, reg->max_value + off, size);
+	if (err)
+		verbose("R%d max value is outside of the array range\n", regno);
+	return err;
 }
 
 #define MAX_PACKET_OFF 0xffff
@@ -742,14 +749,14 @@ static bool may_access_direct_pkt_data(struct bpf_verifier_env *env,
 	}
 }
 
-static int check_packet_access(struct bpf_verifier_env *env, u32 regno, int off,
-			       int size)
+static int __check_packet_access(struct bpf_verifier_env *env, u32 regno,
+				 int off, int size)
 {
 	struct bpf_reg_state *regs = env->cur_state.regs;
 	struct bpf_reg_state *reg = &regs[regno];
 
-	off += reg->off;
-	if (off < 0 || size <= 0 || off + size > reg->range) {
+	if (off < 0 || size <= 0 || off > MAX_PACKET_OFF ||
+	    off + size > reg->range) {
 		verbose("invalid access to packet, off=%d size=%d, R%d(id=%d,off=%d,r=%d)\n",
 			off, size, regno, reg->id, reg->off, reg->range);
 		return -EACCES;
@@ -757,7 +764,35 @@ static int check_packet_access(struct bpf_verifier_env *env, u32 regno, int off,
 	return 0;
 }
 
-/* check access to 'struct bpf_context' fields */
+static int check_packet_access(struct bpf_verifier_env *env, u32 regno, int off,
+			       int size)
+{
+	struct bpf_reg_state *regs = env->cur_state.regs;
+	struct bpf_reg_state *reg = &regs[regno];
+	int err;
+
+	/* We may have added a variable offset to the packet pointer; but any
+	 * reg->range we have comes after that.  We are only checking the fixed
+	 * offset.
+	 */
+
+	/* We don't allow negative numbers, because we aren't tracking enough
+	 * detail to prove they're safe.
+	 */
+	if (reg->min_value < 0) {
+		verbose("R%d min value is negative, either use unsigned index or do a if (index >=0) check.\n",
+			regno);
+		return -EACCES;
+	}
+	err = __check_packet_access(env, regno, off, size);
+	if (err) {
+		verbose("R%d offset is outside of the packet\n", regno);
+		return err;
+	}
+	return 0;
+}
+
+/* check access to 'struct bpf_context' fields.  Supports fixed offsets only */
 static int check_ctx_access(struct bpf_verifier_env *env, int off, int size,
 			    enum bpf_access_type t, enum bpf_reg_type *reg_type)
 {
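check_packet_access() now rejects a negative min_value before it passes the fixed offset to __check_packet_access(). The sketch below follows the same order of tests under these assumptions: `off` already includes reg->off, and `range` is counted from the same base. The struct is a hypothetical stand-in for the relevant fields of struct bpf_reg_state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PACKET_OFF 0xffff	/* value used by the verifier hunk */

/* Hypothetical stand-in for the packet-pointer fields that matter here. */
struct pkt_reg_model {
	int64_t min_value;	/* lower bound of the variable offset */
	int range;		/* bytes proven readable from the pointer's base */
};

static bool pkt_access_ok(const struct pkt_reg_model *reg, int off, int size)
{
	assert(reg != NULL);
	if (reg->min_value < 0)
		return false;	/* negative variable offsets are not tracked */
	return !(off < 0 || size <= 0 || off > MAX_PACKET_OFF ||
		 off + size > reg->range);
}
```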
@@ -782,35 +817,19 @@ static bool is_pointer_value(struct bpf_verifier_env *env, int regno)
 	if (env->allow_ptr_leaks)
 		return false;
 
-	switch (env->cur_state.regs[regno].type) {
-	case UNKNOWN_VALUE:
-	case CONST_IMM:
-		return false;
-	default:
-		return true;
-	}
+	return env->cur_state.regs[regno].type != SCALAR_VALUE;
 }
 
 static int check_pkt_ptr_alignment(const struct bpf_reg_state *reg,
 				   int off, int size, bool strict)
 {
+	struct tnum reg_off;
 	int ip_align;
-	int reg_off;
 
 	/* Byte size accesses are always allowed. */
 	if (!strict || size == 1)
 		return 0;
 
-	reg_off = reg->off;
-	if (reg->id) {
-		if (reg->aux_off_align % size) {
-			verbose("Packet access is only %u byte aligned, %d byte access not allowed\n",
-				reg->aux_off_align, size);
-			return -EACCES;
-		}
-		reg_off += reg->aux_off;
-	}
-
 	/* For platforms that do not have a Kconfig enabling
 	 * CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS the value of
 	 * NET_IP_ALIGN is universally set to '2'.  And on platforms
@@ -820,20 +839,37 @@ static int check_pkt_ptr_alignment(const struct bpf_reg_state *reg,
 	 * unconditional IP align value of '2'.
 	 */
 	ip_align = 2;
-	if ((ip_align + reg_off + off) % size != 0) {
-		verbose("misaligned packet access off %d+%d+%d size %d\n",
-			ip_align, reg_off, off, size);
+
+	reg_off = tn_add(reg->align, tn_const(ip_align + reg->off + off));
+	if (!tn_is_aligned(reg_off, size)) {
+		char tn_buf[48];
+
+		tn_strn(tn_buf, sizeof(tn_buf), reg->align);
+		verbose("misaligned packet access off %d+%s+%d+%d size %d\n",
+			ip_align, tn_buf, reg->off, off, size);
 		return -EACCES;
 	}
 
 	return 0;
 }
 
-static int check_val_ptr_alignment(const struct bpf_reg_state *reg,
-				   int size, bool strict)
+static int check_generic_ptr_alignment(const struct bpf_reg_state *reg,
+				       const char *pointer_desc,
+				       int off, int size, bool strict)
 {
-	if (strict && size != 1) {
-		verbose("Unknown alignment. Only byte-sized access allowed in value access.\n");
+	struct tnum reg_off;
+
+	/* Byte size accesses are always allowed. */
+	if (!strict || size == 1)
+		return 0;
+
+	reg_off = tn_add(reg->align, tn_const(reg->off + off));
+	if (!tn_is_aligned(reg_off, size)) {
+		char tn_buf[48];
+
+		tn_strn(tn_buf, sizeof(tn_buf), reg->align);
+		verbose("misaligned %saccess off %s+%d+%d size %d\n",
+			pointer_desc, tn_buf, reg->off, off, size);
 		return -EACCES;
 	}
 
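The strict-alignment checks above no longer track a separate aux_off_align. Instead, they add the constant part of the offset to the register's tnum and ask whether the low bits are known to be zero. The sketch below is self-contained and assumes that tn_add() and tn_is_aligned() behave like the tnum helpers this series introduces. Their definitions are not in this hunk, so they are re-implemented here; tn_add() uses the carry-propagation formulation found in the kernel's later tnum_add().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the series' tnum: bits set in mask are unknown, all other
 * bits equal the corresponding bits of value (value & mask == 0).
 */
struct tnum {
	uint64_t value;
	uint64_t mask;
};

static struct tnum tn_const(uint64_t v)
{
	struct tnum t = { v, 0 };
	return t;
}

/* Addition that propagates possible carries out of unknown bits. */
static struct tnum tn_add(struct tnum a, struct tnum b)
{
	uint64_t sm = a.mask + b.mask;
	uint64_t sv = a.value + b.value;
	uint64_t sigma = sm + sv;
	uint64_t chi = sigma ^ sv;	/* bits a carry could have flipped */
	uint64_t mu = chi | a.mask | b.mask;
	struct tnum r = { sv & ~mu, mu };
	return r;
}

/* Aligned iff the low log2(size) bits are known to be zero. */
static bool tn_is_aligned(struct tnum a, uint64_t size)
{
	assert(size && !(size & (size - 1)));	/* power of two */
	return !((a.value | a.mask) & (size - 1));
}

/* Models the strict-mode test in check_pkt_ptr_alignment(). */
static bool pkt_access_aligned(struct tnum align, int32_t reg_off, int off,
			       int size)
{
	const int ip_align = 2;	/* NET_IP_ALIGN, as assumed in the hunk */

	if (size == 1)
		return true;	/* byte accesses are always allowed */
	return tn_is_aligned(tn_add(align, tn_const(ip_align + reg_off + off)),
			     size);
}
```

Consider a variable part known to be a multiple of 4 (mask 0x3c) behind a 14-byte Ethernet header. NET_IP_ALIGN brings the constant part to 16, so a 4-byte load passes the check but an 8-byte load does not.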
@@ -845,21 +881,25 @@ static int check_ptr_alignment(struct bpf_verifier_env *env,
 			       int off, int size)
 {
 	bool strict = env->strict_alignment;
+	const char *pointer_desc = "";
 
 	switch (reg->type) {
 	case PTR_TO_PACKET:
+		/* special case, because of NET_IP_ALIGN */
 		return check_pkt_ptr_alignment(reg, off, size, strict);
-	case PTR_TO_MAP_VALUE_ADJ:
-		return check_val_ptr_alignment(reg, size, strict);
+	case PTR_TO_MAP_VALUE:
+		pointer_desc = "value ";
+		break;
+	case PTR_TO_CTX:
+		pointer_desc = "context ";
+		break;
+	case PTR_TO_STACK:
+		pointer_desc = "stack ";
+		break;
 	default:
-		if (off % size != 0) {
-			verbose("misaligned access off %d size %d\n",
-				off, size);
-			return -EACCES;
-		}
-
-		return 0;
+		break;
 	}
+	return check_generic_ptr_alignment(reg, pointer_desc, off, size, strict);
 }
 
 /* check whether memory at (regno + off) is accessible for t = (read | write)
@@ -876,52 +916,78 @@ static int check_mem_access(struct bpf_verifier_env *env, u32 regno, int off,
 	struct bpf_reg_state *reg = &state->regs[regno];
 	int size, err = 0;
 
-	if (reg->type == PTR_TO_STACK)
-		off += reg->imm;
-
 	size = bpf_size_to_bytes(bpf_size);
 	if (size < 0)
 		return size;
 
+	/* alignment checks will add in reg->off themselves */
 	err = check_ptr_alignment(env, reg, off, size);
 	if (err)
 		return err;
 
-	if (reg->type == PTR_TO_MAP_VALUE ||
-	    reg->type == PTR_TO_MAP_VALUE_ADJ) {
+	/* for access checks, reg->off is just part of off */
+	off += reg->off;
+
+	if (reg->type == PTR_TO_MAP_VALUE) {
 		if (t == BPF_WRITE && value_regno >= 0 &&
 		    is_pointer_value(env, value_regno)) {
 			verbose("R%d leaks addr into map\n", value_regno);
 			return -EACCES;
 		}
 
-		if (reg->type == PTR_TO_MAP_VALUE_ADJ)
-			err = check_map_access_adj(env, regno, off, size);
-		else
-			err = check_map_access(env, regno, off, size);
+		err = check_map_access(env, regno, off, size);
 		if (!err && t == BPF_READ && value_regno >= 0)
-			mark_reg_unknown_value_and_range(state->regs,
-							 value_regno);
+			mark_reg_unknown(state->regs, value_regno);
 
 	} else if (reg->type == PTR_TO_CTX) {
-		enum bpf_reg_type reg_type = UNKNOWN_VALUE;
+		enum bpf_reg_type reg_type = SCALAR_VALUE;
 
 		if (t == BPF_WRITE && value_regno >= 0 &&
 		    is_pointer_value(env, value_regno)) {
 			verbose("R%d leaks addr into ctx\n", value_regno);
 			return -EACCES;
 		}
+		/* ctx accesses must be at a fixed offset, so that we can
+		 * determine what type of data were returned.
+		 */
+		if (reg->align.mask) {
+			char tn_buf[48];
+
+			tn_strn(tn_buf, sizeof(tn_buf), reg->align);
+			verbose("variable ctx access align=%s off=%d size=%d\n",
+				tn_buf, off, size);
+			return -EACCES;
+		}
+		off += reg->align.value;
 		err = check_ctx_access(env, off, size, t, &reg_type);
 		if (!err && t == BPF_READ && value_regno >= 0) {
-			mark_reg_unknown_value_and_range(state->regs,
-							 value_regno);
-			/* note that reg.[id|off|range] == 0 */
+			/* ctx access returns either a scalar, or a
+			 * PTR_TO_PACKET[_END].  In the latter case, we know
+			 * the offset is zero.
+			 */
+			if (reg_type == SCALAR_VALUE)
+				mark_reg_unknown(state->regs, value_regno);
+			else
+				mark_reg_known_zero(state->regs, value_regno);
+			state->regs[value_regno].id = 0;
+			state->regs[value_regno].off = 0;
+			state->regs[value_regno].range = 0;
 			state->regs[value_regno].type = reg_type;
-			state->regs[value_regno].aux_off = 0;
-			state->regs[value_regno].aux_off_align = 0;
 		}
 
-	} else if (reg->type == FRAME_PTR || reg->type == PTR_TO_STACK) {
+	} else if (reg->type == PTR_TO_STACK) {
+		/* stack accesses must be at a fixed offset, so that we can
+		 * determine what type of data were returned.
+		 */
+		if (reg->align.mask) {
+			char tn_buf[48];
+
+			tn_strn(tn_buf, sizeof(tn_buf), reg->align);
+			verbose("variable stack access align=%s off=%d size=%d\n",
+				tn_buf, off, size);
+			return -EACCES;
+		}
+		off += reg->align.value;
 		if (off >= 0 || off < -MAX_BPF_STACK) {
 			verbose("invalid stack off=%d size=%d\n", off, size);
 			return -EACCES;
@@ -937,7 +1003,7 @@ static int check_mem_access(struct bpf_verifier_env *env, u32 regno, int off,
 		} else {
 			err = check_stack_read(state, off, size, value_regno);
 		}
-	} else if (state->regs[regno].type == PTR_TO_PACKET) {
+	} else if (reg->type == PTR_TO_PACKET) {
 		if (t == BPF_WRITE && !may_access_direct_pkt_data(env, NULL, t)) {
 			verbose("cannot write into packet\n");
 			return -EACCES;
@@ -949,21 +1015,23 @@ static int check_mem_access(struct bpf_verifier_env *env, u32 regno, int off,
 		}
 		err = check_packet_access(env, regno, off, size);
 		if (!err && t == BPF_READ && value_regno >= 0)
-			mark_reg_unknown_value_and_range(state->regs,
-							 value_regno);
+			mark_reg_unknown(state->regs, value_regno);
 	} else {
 		verbose("R%d invalid mem access '%s'\n",
 			regno, reg_type_str[reg->type]);
 		return -EACCES;
 	}
 
-	if (!err && size <= 2 && value_regno >= 0 && env->allow_ptr_leaks &&
-	    state->regs[value_regno].type == UNKNOWN_VALUE) {
-		/* 1 or 2 byte load zero-extends, determine the number of
-		 * zero upper bits. Not doing it fo 4 byte load, since
-		 * such values cannot be added to ptr_to_packet anyway.
-		 */
-		state->regs[value_regno].imm = 64 - size * 8;
+	if (!err && size < BPF_REG_SIZE && value_regno >= 0 && t == BPF_READ &&
+	    state->regs[value_regno].type == SCALAR_VALUE) {
+		/* b/h/w load zero-extends, mark upper bits as known 0 */
+		state->regs[value_regno].align.value &= (1ULL << (size * 8)) - 1;
+		state->regs[value_regno].align.mask &= (1ULL << (size * 8)) - 1;
+		/* sign bit is known zero, so we can bound the value */
+		state->regs[value_regno].min_value = 0;
+		state->regs[value_regno].max_value = min_t(u64,
+					state->regs[value_regno].align.mask,
+					BPF_REGISTER_MAX_RANGE);
 	}
 	return err;
 }
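The new tail of check_mem_access() handles a zero-extending 1-, 2- or 4-byte load: the upper bits become known zero and the value gets the range [0, max]. Below is a minimal sketch of that bookkeeping on a hypothetical scalar model. BPF_REGISTER_MAX_RANGE is assumed to be 1024 * 1024 * 1024, which was its value in bpf_verifier.h around this series.

```c
#include <assert.h>
#include <stdint.h>

#define BPF_REG_SIZE 8
/* Assumption: matches BPF_REGISTER_MAX_RANGE in bpf_verifier.h of this era */
#define BPF_REGISTER_MAX_RANGE (1024ULL * 1024 * 1024)

struct tnum {
	uint64_t value;	/* known bit values */
	uint64_t mask;	/* unknown bits */
};

/* Hypothetical subset of struct bpf_reg_state for a SCALAR_VALUE. */
struct scalar_model {
	struct tnum align;
	int64_t min_value;
	uint64_t max_value;
};

/* Apply the effect of a zero-extending load of `size` bytes.  Like the
 * hunk, the upper bound uses align.mask alone, which is exact when
 * align.value is 0 (as it is for a freshly loaded unknown register).
 */
static void mark_zext_load(struct scalar_model *reg, int size)
{
	uint64_t low;

	assert(size > 0 && size < BPF_REG_SIZE);
	low = (1ULL << (size * 8)) - 1;	/* bits the load can set */
	reg->align.value &= low;
	reg->align.mask &= low;
	reg->min_value = 0;		/* the sign bit is now known zero */
	reg->max_value = reg->align.mask < BPF_REGISTER_MAX_RANGE ?
			 reg->align.mask : BPF_REGISTER_MAX_RANGE;
}
```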
@@ -1000,9 +1068,18 @@ static int check_xadd(struct bpf_verifier_env *env, struct bpf_insn *insn)
 				BPF_SIZE(insn->code), BPF_WRITE, -1);
 }
 
+/* Does this register contain a constant zero? */
+static bool register_is_null(struct bpf_reg_state reg)
+{
+	return reg.type == SCALAR_VALUE && reg.align.mask == 0 &&
+	       reg.align.value == 0;
+}
+
 /* when register 'regno' is passed into function that will read 'access_size'
  * bytes from that pointer, make sure that it's within stack boundary
- * and all elements of stack are initialized
+ * and all elements of stack are initialized.
+ * Unlike most pointer bounds-checking functions, this one doesn't take an
+ * 'off' argument, so it has to add in reg->off itself.
  */
 static int check_stack_boundary(struct bpf_verifier_env *env, int regno,
 				int access_size, bool zero_size_allowed,
@@ -1013,9 +1090,9 @@ static int check_stack_boundary(struct bpf_verifier_env *env, int regno,
 	int off, i;
 
 	if (regs[regno].type != PTR_TO_STACK) {
+		/* Allow zero-byte read from NULL, regardless of pointer type */
 		if (zero_size_allowed && access_size == 0 &&
-		    regs[regno].type == CONST_IMM &&
-		    regs[regno].imm  == 0)
+		    register_is_null(regs[regno]))
 			return 0;
 
 		verbose("R%d type=%s expected=%s\n", regno,
@@ -1024,7 +1101,16 @@ static int check_stack_boundary(struct bpf_verifier_env *env, int regno,
 		return -EACCES;
 	}
 
-	off = regs[regno].imm;
+	/* Only allow fixed-offset stack reads */
+	if (regs[regno].align.mask) {
+		char tn_buf[48];
+
+		tn_strn(tn_buf, sizeof(tn_buf), regs[regno].align);
+		verbose("invalid variable stack read R%d align=%s\n",
+			regno, tn_buf);
+		return -EACCES;
+	}
+	off = regs[regno].off + regs[regno].align.value;
 	if (off >= 0 || off < -MAX_BPF_STACK || off + access_size > 0 ||
 	    access_size <= 0) {
 		verbose("invalid stack type R%d off=%d access_size=%d\n",
@@ -1052,16 +1137,14 @@ static int check_helper_mem_access(struct bpf_verifier_env *env, int regno,
 				   int access_size, bool zero_size_allowed,
 				   struct bpf_call_arg_meta *meta)
 {
-	struct bpf_reg_state *regs = env->cur_state.regs;
+	struct bpf_reg_state *regs = env->cur_state.regs, *reg = &regs[regno];
 
-	switch (regs[regno].type) {
+	switch (reg->type) {
 	case PTR_TO_PACKET:
-		return check_packet_access(env, regno, 0, access_size);
+		return check_packet_access(env, regno, reg->off, access_size);
 	case PTR_TO_MAP_VALUE:
-		return check_map_access(env, regno, 0, access_size);
-	case PTR_TO_MAP_VALUE_ADJ:
-		return check_map_access_adj(env, regno, 0, access_size);
-	default: /* const_imm|ptr_to_stack or invalid ptr */
+		return check_map_access(env, regno, reg->off, access_size);
+	default: /* scalar_value|ptr_to_stack or invalid ptr */
 		return check_stack_boundary(env, regno, access_size,
 					    zero_size_allowed, meta);
 	}
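check_stack_boundary() is intended to accept only a fixed stack offset, meaning reg->off plus the known value of the tnum. It then requires the whole helper buffer to sit inside the stack below the frame pointer. The sketch below shows that window test with hypothetical parameter names. MAX_BPF_STACK is the kernel's 512-byte BPF stack limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_BPF_STACK 512	/* BPF stack size in bytes */

/* Returns true if a helper may read [off, off + access_size) from the stack,
 * where off = reg_off + align_value is relative to the frame pointer.
 */
static bool stack_window_ok(int32_t reg_off, uint64_t align_value,
			    uint64_t align_mask, int access_size)
{
	int64_t off;

	if (align_mask)
		return false;	/* variable offset: rejected */
	off = reg_off + (int64_t)align_value;
	return !(off >= 0 || off < -MAX_BPF_STACK || off + access_size > 0 ||
		 access_size <= 0);
}
```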
@@ -1104,11 +1187,8 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
 			goto err_type;
 	} else if (arg_type == ARG_CONST_SIZE ||
 		   arg_type == ARG_CONST_SIZE_OR_ZERO) {
-		expected_type = CONST_IMM;
-		/* One exception. Allow UNKNOWN_VALUE registers when the
-		 * boundaries are known and don't cause unsafe memory accesses
-		 */
-		if (type != UNKNOWN_VALUE && type != expected_type)
+		expected_type = SCALAR_VALUE;
+		if (type != expected_type)
 			goto err_type;
 	} else if (arg_type == ARG_CONST_MAP_PTR) {
 		expected_type = CONST_PTR_TO_MAP;
@@ -1122,13 +1202,13 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
 		   arg_type == ARG_PTR_TO_UNINIT_MEM) {
 		expected_type = PTR_TO_STACK;
 		/* One exception here. In case function allows for NULL to be
-		 * passed in as argument, it's a CONST_IMM type. Final test
+		 * passed in as argument, it's a SCALAR_VALUE type. Final test
 		 * happens during stack boundary checking.
 		 */
-		if (type == CONST_IMM && reg->imm == 0)
+		if (register_is_null(*reg))
 			/* final test in check_stack_boundary() */;
 		else if (type != PTR_TO_PACKET && type != PTR_TO_MAP_VALUE &&
-			 type != PTR_TO_MAP_VALUE_ADJ && type != expected_type)
+			 type != expected_type)
 			goto err_type;
 		meta->raw_mode = arg_type == ARG_PTR_TO_UNINIT_MEM;
 	} else {
@@ -1154,7 +1234,7 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
 			return -EACCES;
 		}
 		if (type == PTR_TO_PACKET)
-			err = check_packet_access(env, regno, 0,
+			err = check_packet_access(env, regno, reg->off,
 						  meta->map_ptr->key_size);
 		else
 			err = check_stack_boundary(env, regno,
@@ -1170,7 +1250,7 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
 			return -EACCES;
 		}
 		if (type == PTR_TO_PACKET)
-			err = check_packet_access(env, regno, 0,
+			err = check_packet_access(env, regno, reg->off,
 						  meta->map_ptr->value_size);
 		else
 			err = check_stack_boundary(env, regno,
@@ -1190,10 +1270,11 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
 			return -EACCES;
 		}
 
-		/* If the register is UNKNOWN_VALUE, the access check happens
-		 * using its boundaries. Otherwise, just use its imm
+		/* The register is SCALAR_VALUE; the access check
+		 * happens using its boundaries.
 		 */
-		if (type == UNKNOWN_VALUE) {
+
+		if (reg->align.mask)
 			/* For unprivileged variable accesses, disable raw
 			 * mode so that the program is required to
 			 * initialize all the memory that the helper could
@@ -1201,35 +1282,28 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
 			 */
 			meta = NULL;
 
-			if (reg->min_value < 0) {
-				verbose("R%d min value is negative, either use unsigned or 'var &= const'\n",
-					regno);
-				return -EACCES;
-			}
-
-			if (reg->min_value == 0) {
-				err = check_helper_mem_access(env, regno - 1, 0,
-							      zero_size_allowed,
-							      meta);
-				if (err)
-					return err;
-			}
+		if (reg->min_value < 0) {
+			verbose("R%d min value is negative, either use unsigned or 'var &= const'\n",
+				regno);
+			return -EACCES;
+		}
 
-			if (reg->max_value == BPF_REGISTER_MAX_RANGE) {
-				verbose("R%d unbounded memory access, use 'var &= const' or 'if (var < const)'\n",
-					regno);
-				return -EACCES;
-			}
-			err = check_helper_mem_access(env, regno - 1,
-						      reg->max_value,
-						      zero_size_allowed, meta);
+		if (reg->min_value == 0) {
+			err = check_helper_mem_access(env, regno - 1, 0,
+						      zero_size_allowed,
+						      meta);
 			if (err)
 				return err;
-		} else {
-			/* register is CONST_IMM */
-			err = check_helper_mem_access(env, regno - 1, reg->imm,
-						      zero_size_allowed, meta);
 		}
+
+		if (reg->max_value == BPF_REGISTER_MAX_RANGE) {
+			verbose("R%d unbounded memory access, use 'var &= const' or 'if (var < const)'\n",
+				regno);
+			return -EACCES;
+		}
+		err = check_helper_mem_access(env, regno - 1,
+					      reg->max_value,
+					      zero_size_allowed, meta);
 	}
 
 	return err;
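For an ARG_CONST_SIZE argument, the register is now always a SCALAR_VALUE, and the check above works from its bounds:

- a negative min_value is rejected;
- min_value == 0 means a zero-length call must be legal;
- an unbounded max_value is rejected before the buffer is checked at max_value.

The sketch below follows that control flow. mem_ok() is a hypothetical stand-in for check_helper_mem_access() over a 16-byte buffer. BPF_REGISTER_MAX_RANGE is assumed to be 1024 * 1024 * 1024.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the verifier's "unbounded" marker of this era */
#define BPF_REGISTER_MAX_RANGE (1024ULL * 1024 * 1024)

/* Hypothetical stand-in for check_helper_mem_access(): a 16-byte buffer
 * where a zero-length access is legal only if the helper allows it.
 */
static bool mem_ok(uint64_t access_size, bool zero_size_allowed)
{
	if (access_size == 0)
		return zero_size_allowed;
	return access_size <= 16;
}

/* Mirrors the ARG_CONST_SIZE[_OR_ZERO] checks in check_func_arg(). */
static bool size_arg_ok(int64_t min_value, uint64_t max_value,
			bool zero_size_allowed)
{
	if (min_value < 0)
		return false;			/* size could be negative */
	if (min_value == 0 && !mem_ok(0, zero_size_allowed))
		return false;			/* size could be zero */
	if (max_value == BPF_REGISTER_MAX_RANGE)
		return false;			/* size is unbounded */
	return mem_ok(max_value, zero_size_allowed);
}
```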
@@ -1321,6 +1395,9 @@ static int check_raw_mode(const struct bpf_func_proto *fn)
 	return count > 1 ? -EINVAL : 0;
 }
 
+/* Packet data might have moved, so any old PTR_TO_PACKET[_END] pointers are
+ * now invalid; turn them into unknown SCALAR_VALUEs.
+ */
 static void clear_all_pkt_pointers(struct bpf_verifier_env *env)
 {
 	struct bpf_verifier_state *state = &env->cur_state;
@@ -1330,7 +1407,7 @@ static void clear_all_pkt_pointers(struct bpf_verifier_env *env)
 	for (i = 0; i < MAX_BPF_REG; i++)
 		if (regs[i].type == PTR_TO_PACKET ||
 		    regs[i].type == PTR_TO_PACKET_END)
-			mark_reg_unknown_value(regs, i);
+			mark_reg_unknown(regs, i);
 
 	for (i = 0; i < MAX_BPF_STACK; i += BPF_REG_SIZE) {
 		if (state->stack_slot_type[i] != STACK_SPILL)
@@ -1339,8 +1416,7 @@ static void clear_all_pkt_pointers(struct bpf_verifier_env *env)
 		if (reg->type != PTR_TO_PACKET &&
 		    reg->type != PTR_TO_PACKET_END)
 			continue;
-		reg->type = UNKNOWN_VALUE;
-		reg->imm = 0;
+		__mark_reg_unknown(reg);
 	}
 }
 
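In the large hunk that replaces check_packet_ptr_add(), adjust_ptr_min_max_vals() folds a known constant into the pointer's s32 'fixed offset' only if the 64-bit sum survives a round trip through s32. Otherwise the constant goes into the variable offset. Below is a sketch of that test, using a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Try to fold a known constant k into a pointer's s32 fixed offset.
 * Returns false when the sum does not fit, in which case the caller would
 * fall back to the variable offset (and a fresh id).
 */
static bool fold_into_fixed_off(int32_t off, int64_t k, int32_t *new_off)
{
	int64_t sum;

	assert(k > INT64_MIN / 2 && k < INT64_MAX / 2);	/* keep sum in range */
	sum = (int64_t)off + k;
	/* Any int32_t result differs from an out-of-range sum. */
	if (sum != (int64_t)(int32_t)sum)
		return false;
	*new_off = (int32_t)sum;
	return true;
}
```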
@@ -1420,14 +1496,17 @@ static int check_call(struct bpf_verifier_env *env, int func_id, int insn_idx)
 
 	/* update return register */
 	if (fn->ret_type == RET_INTEGER) {
-		regs[BPF_REG_0].type = UNKNOWN_VALUE;
+		/* sets type to SCALAR_VALUE */
+		mark_reg_unknown(regs, BPF_REG_0);
 	} else if (fn->ret_type == RET_VOID) {
 		regs[BPF_REG_0].type = NOT_INIT;
 	} else if (fn->ret_type == RET_PTR_TO_MAP_VALUE_OR_NULL) {
 		struct bpf_insn_aux_data *insn_aux;
 
 		regs[BPF_REG_0].type = PTR_TO_MAP_VALUE_OR_NULL;
-		regs[BPF_REG_0].max_value = regs[BPF_REG_0].min_value = 0;
+		/* There is no offset yet applied, variable or fixed */
+		mark_reg_known_zero(regs, BPF_REG_0);
+		regs[BPF_REG_0].off = 0;
 		/* remember map_ptr, so that check_map_access()
 		 * can check 'value_size' boundary of memory access
 		 * to map element returned from bpf_map_lookup_elem()
@@ -1458,371 +1537,421 @@ static int check_call(struct bpf_verifier_env *env, int func_id, int insn_idx)
 	return 0;
 }
 
-static int check_packet_ptr_add(struct bpf_verifier_env *env,
-				struct bpf_insn *insn)
+static void check_reg_overflow(struct bpf_reg_state *reg)
 {
-	struct bpf_reg_state *regs = env->cur_state.regs;
-	struct bpf_reg_state *dst_reg = &regs[insn->dst_reg];
-	struct bpf_reg_state *src_reg = &regs[insn->src_reg];
-	struct bpf_reg_state tmp_reg;
-	s32 imm;
-
-	if (BPF_SRC(insn->code) == BPF_K) {
-		/* pkt_ptr += imm */
-		imm = insn->imm;
-
-add_imm:
-		if (imm < 0) {
-			verbose("addition of negative constant to packet pointer is not allowed\n");
-			return -EACCES;
-		}
-		if (imm >= MAX_PACKET_OFF ||
-		    imm + dst_reg->off >= MAX_PACKET_OFF) {
-			verbose("constant %d is too large to add to packet pointer\n",
-				imm);
-			return -EACCES;
-		}
-		/* a constant was added to pkt_ptr.
-		 * Remember it while keeping the same 'id'
-		 */
-		dst_reg->off += imm;
-	} else {
-		bool had_id;
-
-		if (src_reg->type == PTR_TO_PACKET) {
-			/* R6=pkt(id=0,off=0,r=62) R7=imm22; r7 += r6 */
-			tmp_reg = *dst_reg;  /* save r7 state */
-			*dst_reg = *src_reg; /* copy pkt_ptr state r6 into r7 */
-			src_reg = &tmp_reg;  /* pretend it's src_reg state */
-			/* if the checks below reject it, the copy won't matter,
-			 * since we're rejecting the whole program. If all ok,
-			 * then imm22 state will be added to r7
-			 * and r7 will be pkt(id=0,off=22,r=62) while
-			 * r6 will stay as pkt(id=0,off=0,r=62)
-			 */
-		}
-
-		if (src_reg->type == CONST_IMM) {
-			/* pkt_ptr += reg where reg is known constant */
-			imm = src_reg->imm;
-			goto add_imm;
-		}
-		/* disallow pkt_ptr += reg
-		 * if reg is not uknown_value with guaranteed zero upper bits
-		 * otherwise pkt_ptr may overflow and addition will become
-		 * subtraction which is not allowed
-		 */
-		if (src_reg->type != UNKNOWN_VALUE) {
-			verbose("cannot add '%s' to ptr_to_packet\n",
-				reg_type_str[src_reg->type]);
-			return -EACCES;
-		}
-		if (src_reg->imm < 48) {
-			verbose("cannot add integer value with %lld upper zero bits to ptr_to_packet\n",
-				src_reg->imm);
-			return -EACCES;
-		}
-
-		had_id = (dst_reg->id != 0);
-
-		/* dst_reg stays as pkt_ptr type and since some positive
-		 * integer value was added to the pointer, increment its 'id'
-		 */
-		dst_reg->id = ++env->id_gen;
+	if (reg->max_value > BPF_REGISTER_MAX_RANGE)
+		reg->max_value = BPF_REGISTER_MAX_RANGE;
+	if (reg->min_value < BPF_REGISTER_MIN_RANGE ||
+	    reg->min_value > BPF_REGISTER_MAX_RANGE)
+		reg->min_value = BPF_REGISTER_MIN_RANGE;
+}
 
-		/* something was added to pkt_ptr, set range to zero */
-		dst_reg->aux_off += dst_reg->off;
-		dst_reg->off = 0;
-		dst_reg->range = 0;
-		if (had_id)
-			dst_reg->aux_off_align = min(dst_reg->aux_off_align,
-						     src_reg->min_align);
-		else
-			dst_reg->aux_off_align = src_reg->min_align;
+static void coerce_reg_to_32(struct bpf_reg_state *reg)
+{
+	/* 32-bit values can't be negative as an s64 */
+	if (reg->min_value < 0)
+		reg->min_value = 0;
+	/* clear high 32 bits */
+	reg->align.value &= (u32)-1;
+	reg->align.mask &= (u32)-1;
+	/* Did value become known?  Then update bounds */
+	if (!reg->align.mask) {
+		if ((s64)reg->align.value > BPF_REGISTER_MIN_RANGE)
+			reg->min_value = reg->align.value;
+		if (reg->align.value < BPF_REGISTER_MAX_RANGE)
+			reg->max_value = reg->align.value;
 	}
-	return 0;
 }
 
-static int evaluate_reg_alu(struct bpf_verifier_env *env, struct bpf_insn *insn)
+/* Handles arithmetic on a pointer and a scalar: computes new min/max and align.
+ * Caller must check_reg_overflow all argument regs beforehand.
+ * Caller should also handle BPF_MOV case separately.
+ */
+static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
+				   struct bpf_insn *insn,
+				   struct bpf_reg_state *ptr_reg,
+				   struct bpf_reg_state *off_reg)
 {
-	struct bpf_reg_state *regs = env->cur_state.regs;
-	struct bpf_reg_state *dst_reg = &regs[insn->dst_reg];
+	struct bpf_reg_state *regs = env->cur_state.regs, *dst_reg;
+	bool known = !off_reg->align.mask;
+	s64 min_val = off_reg->min_value;
+	u64 max_val = off_reg->max_value;
 	u8 opcode = BPF_OP(insn->code);
-	s64 imm_log2;
+	u32 dst = insn->dst_reg;
 
-	/* for type == UNKNOWN_VALUE:
-	 * imm > 0 -> number of zero upper bits
-	 * imm == 0 -> don't track which is the same as all bits can be non-zero
-	 */
+	dst_reg = &regs[dst];
 
-	if (BPF_SRC(insn->code) == BPF_X) {
-		struct bpf_reg_state *src_reg = &regs[insn->src_reg];
-
-		if (src_reg->type == UNKNOWN_VALUE && src_reg->imm > 0 &&
-		    dst_reg->imm && opcode == BPF_ADD) {
-			/* dreg += sreg
-			 * where both have zero upper bits. Adding them
-			 * can only result making one more bit non-zero
-			 * in the larger value.
-			 * Ex. 0xffff (imm=48) + 1 (imm=63) = 0x10000 (imm=47)
-			 *     0xffff (imm=48) + 0xffff = 0x1fffe (imm=47)
-			 */
-			dst_reg->imm = min(dst_reg->imm, src_reg->imm);
-			dst_reg->imm--;
-			return 0;
+	if (WARN_ON_ONCE(known && (min_val != max_val))) {
+		verbose("verifier internal error\n");
+		return -EINVAL;
+	}
+
+	if (BPF_CLASS(insn->code) != BPF_ALU64) {
+		/* 32-bit ALU ops on pointers produce (meaningless) scalars */
+		if (!env->allow_ptr_leaks) {
+			verbose("R%d 32-bit pointer arithmetic prohibited\n",
+				dst);
+			return -EACCES;
 		}
-		if (src_reg->type == CONST_IMM && src_reg->imm > 0 &&
-		    dst_reg->imm && opcode == BPF_ADD) {
-			/* dreg += sreg
-			 * where dreg has zero upper bits and sreg is const.
-			 * Adding them can only result making one more bit
-			 * non-zero in the larger value.
-			 */
-			imm_log2 = __ilog2_u64((long long)src_reg->imm);
-			dst_reg->imm = min(dst_reg->imm, 63 - imm_log2);
-			dst_reg->imm--;
-			return 0;
+		__mark_reg_unknown(dst_reg);
+		/* High bits are known zero */
+		dst_reg->align.mask = (u32)-1;
+		return 0;
+	}
+
+	if (ptr_reg->type == PTR_TO_MAP_VALUE_OR_NULL) {
+		if (!env->allow_ptr_leaks) {
+			verbose("R%d pointer arithmetic on PTR_TO_MAP_VALUE_OR_NULL prohibited, null-check it first\n",
+				dst);
+			return -EACCES;
+		}
+		__mark_reg_unknown(dst_reg);
+		return 0;
+	}
+	if (ptr_reg->type == CONST_PTR_TO_MAP) {
+		if (!env->allow_ptr_leaks) {
+			verbose("R%d pointer arithmetic on CONST_PTR_TO_MAP prohibited\n",
+				dst);
+			return -EACCES;
 		}
-		/* all other cases non supported yet, just mark dst_reg */
-		dst_reg->imm = 0;
+		__mark_reg_unknown(dst_reg);
+		return 0;
+	}
+	if (ptr_reg->type == PTR_TO_PACKET_END) {
+		if (!env->allow_ptr_leaks) {
+			verbose("R%d pointer arithmetic on PTR_TO_PACKET_END prohibited\n",
+				dst);
+			return -EACCES;
+		}
+		__mark_reg_unknown(dst_reg);
 		return 0;
 	}
 
-	/* sign extend 32-bit imm into 64-bit to make sure that
-	 * negative values occupy bit 63. Note ilog2() would have
-	 * been incorrect, since sizeof(insn->imm) == 4
+	/* In case of 'scalar += pointer', dst_reg inherits pointer type and id.
+	 * The id may be overwritten later if we create a new variable offset.
 	 */
-	imm_log2 = __ilog2_u64((long long)insn->imm);
+	dst_reg->type = ptr_reg->type;
+	dst_reg->id = ptr_reg->id;
 
-	if (dst_reg->imm && opcode == BPF_LSH) {
-		/* reg <<= imm
-		 * if reg was a result of 2 byte load, then its imm == 48
-		 * which means that upper 48 bits are zero and shifting this reg
-		 * left by 4 would mean that upper 44 bits are still zero
+	switch (opcode) {
+	case BPF_ADD:
+		/* We can take a fixed offset as long as it doesn't overflow
+		 * the s32 'off' field
 		 */
-		dst_reg->imm -= insn->imm;
-	} else if (dst_reg->imm && opcode == BPF_MUL) {
-		/* reg *= imm
-		 * if multiplying by 14 subtract 4
-		 * This is conservative calculation of upper zero bits.
-		 * It's not trying to special case insn->imm == 1 or 0 cases
+		if (known && (ptr_reg->off + min_val ==
+			      (s64)(s32)(ptr_reg->off + min_val))) {
+			/* pointer += K.  Accumulate it into fixed offset */
+			dst_reg->min_value = ptr_reg->min_value;
+			dst_reg->max_value = ptr_reg->max_value;
+			dst_reg->align = ptr_reg->align;
+			dst_reg->off = ptr_reg->off + min_val;
+			break;
+		}
+		if (max_val == BPF_REGISTER_MAX_RANGE) {
+			verbose("R%d tried to add unbounded value to pointer\n",
+				dst);
+			return -EACCES;
+		}
+		/* A new variable offset is created.  Note that off_reg->off
+		 * == 0, since it's a scalar.
+		 * dst_reg gets the pointer type and, since a variable integer
+		 * value was added to the pointer, a fresh 'id'.
+		 * This creates a new 'base' pointer: off_reg (variable) gets
+		 * added into the variable offset, and we copy the fixed offset
+		 * from ptr_reg.
+		 */
-		dst_reg->imm -= imm_log2 + 1;
-	} else if (opcode == BPF_AND) {
-		/* reg &= imm */
-		dst_reg->imm = 63 - imm_log2;
-	} else if (dst_reg->imm && opcode == BPF_ADD) {
-		/* reg += imm */
-		dst_reg->imm = min(dst_reg->imm, 63 - imm_log2);
-		dst_reg->imm--;
-	} else if (opcode == BPF_RSH) {
-		/* reg >>= imm
-		 * which means that after right shift, upper bits will be zero
-		 * note that verifier already checked that
-		 * 0 <= imm < 64 for shift insn
+		dst_reg->min_value = (min_val <= BPF_REGISTER_MIN_RANGE ||
+			ptr_reg->min_value == BPF_REGISTER_MIN_RANGE) ?
+			BPF_REGISTER_MIN_RANGE : ptr_reg->min_value + min_val;
+		dst_reg->max_value = ptr_reg->max_value ==
+			BPF_REGISTER_MAX_RANGE ? BPF_REGISTER_MAX_RANGE :
+			ptr_reg->max_value + max_val;
+		dst_reg->align = tn_add(ptr_reg->align, off_reg->align);
+		dst_reg->off = ptr_reg->off;
+		dst_reg->id = ++env->id_gen;
+		if (ptr_reg->type == PTR_TO_PACKET)
+			/* something was added to pkt_ptr, set range to zero */
+			dst_reg->range = 0;
+		break;
+	case BPF_SUB:
+		if (dst_reg == off_reg) {
+			/* scalar -= pointer.  Creates an unknown scalar */
+			if (!env->allow_ptr_leaks) {
+				verbose("R%d tried to subtract pointer from scalar\n",
+					dst);
+				return -EACCES;
+			}
+			/* Make it an unknown scalar */
+			__mark_reg_unknown(dst_reg);
+			break;
+		}
+		/* We don't allow subtraction from FP, because (according to
+		 * test_verifier.c test "invalid fp arithmetic") JITs might not
+		 * be able to deal with it.
 		 */
-		dst_reg->imm += insn->imm;
-		if (unlikely(dst_reg->imm > 64))
-			/* some dumb code did:
-			 * r2 = *(u32 *)mem;
-			 * r2 >>= 32;
-			 * and all bits are zero now */
-			dst_reg->imm = 64;
-	} else {
-		/* all other alu ops, means that we don't know what will
-		 * happen to the value, mark it with unknown number of zero bits
+		if (ptr_reg->type == PTR_TO_STACK) {
+			if (!env->allow_ptr_leaks) {
+				verbose("R%d subtraction from stack pointer prohibited\n",
+					dst);
+				return -EACCES;
+			}
+			/* Make it an unknown scalar */
+			__mark_reg_unknown(dst_reg);
+			break;
+		}
+		if (known && (ptr_reg->off - min_val ==
+			      (s64)(s32)(ptr_reg->off - min_val))) {
+			/* pointer -= K.  Subtract it from fixed offset */
+			dst_reg->min_value = ptr_reg->min_value;
+			dst_reg->max_value = ptr_reg->max_value;
+			dst_reg->align = ptr_reg->align;
+			dst_reg->id = ptr_reg->id;
+			dst_reg->off = ptr_reg->off - min_val;
+			break;
+		}
+		/* Subtracting a negative value will just confuse everything.
+		 * This can happen if off_reg is an immediate.
 		 */
-		dst_reg->imm = 0;
-	}
-
-	if (dst_reg->imm < 0) {
-		/* all 64 bits of the register can contain non-zero bits
-		 * and such value cannot be added to ptr_to_packet, since it
-		 * may overflow, mark it as unknown to avoid further eval
+		if ((s64)max_val < 0) {
+			if (!env->allow_ptr_leaks) {
+				verbose("R%d tried to subtract negative max_val %lld from pointer\n",
+					dst, (s64)max_val);
+				return -EACCES;
+			}
+			/* Make it an unknown scalar */
+			__mark_reg_unknown(dst_reg);
+			break;
+		}
+		/* A new variable offset is created.  If the subtrahend is known
+		 * nonnegative, then any reg->range we had before is still good.
 		 */
-		dst_reg->imm = 0;
-	}
-	return 0;
-}
-
-static int evaluate_reg_imm_alu(struct bpf_verifier_env *env,
-				struct bpf_insn *insn)
-{
-	struct bpf_reg_state *regs = env->cur_state.regs;
-	struct bpf_reg_state *dst_reg = &regs[insn->dst_reg];
-	struct bpf_reg_state *src_reg = &regs[insn->src_reg];
-	u8 opcode = BPF_OP(insn->code);
-	u64 dst_imm = dst_reg->imm;
-
-	/* dst_reg->type == CONST_IMM here. Simulate execution of insns
-	 * containing ALU ops. Don't care about overflow or negative
-	 * values, just add/sub/... them; registers are in u64.
-	 */
-	if (opcode == BPF_ADD && BPF_SRC(insn->code) == BPF_K) {
-		dst_imm += insn->imm;
-	} else if (opcode == BPF_ADD && BPF_SRC(insn->code) == BPF_X &&
-		   src_reg->type == CONST_IMM) {
-		dst_imm += src_reg->imm;
-	} else if (opcode == BPF_SUB && BPF_SRC(insn->code) == BPF_K) {
-		dst_imm -= insn->imm;
-	} else if (opcode == BPF_SUB && BPF_SRC(insn->code) == BPF_X &&
-		   src_reg->type == CONST_IMM) {
-		dst_imm -= src_reg->imm;
-	} else if (opcode == BPF_MUL && BPF_SRC(insn->code) == BPF_K) {
-		dst_imm *= insn->imm;
-	} else if (opcode == BPF_MUL && BPF_SRC(insn->code) == BPF_X &&
-		   src_reg->type == CONST_IMM) {
-		dst_imm *= src_reg->imm;
-	} else if (opcode == BPF_OR && BPF_SRC(insn->code) == BPF_K) {
-		dst_imm |= insn->imm;
-	} else if (opcode == BPF_OR && BPF_SRC(insn->code) == BPF_X &&
-		   src_reg->type == CONST_IMM) {
-		dst_imm |= src_reg->imm;
-	} else if (opcode == BPF_AND && BPF_SRC(insn->code) == BPF_K) {
-		dst_imm &= insn->imm;
-	} else if (opcode == BPF_AND && BPF_SRC(insn->code) == BPF_X &&
-		   src_reg->type == CONST_IMM) {
-		dst_imm &= src_reg->imm;
-	} else if (opcode == BPF_RSH && BPF_SRC(insn->code) == BPF_K) {
-		dst_imm >>= insn->imm;
-	} else if (opcode == BPF_RSH && BPF_SRC(insn->code) == BPF_X &&
-		   src_reg->type == CONST_IMM) {
-		dst_imm >>= src_reg->imm;
-	} else if (opcode == BPF_LSH && BPF_SRC(insn->code) == BPF_K) {
-		dst_imm <<= insn->imm;
-	} else if (opcode == BPF_LSH && BPF_SRC(insn->code) == BPF_X &&
-		   src_reg->type == CONST_IMM) {
-		dst_imm <<= src_reg->imm;
-	} else {
-		mark_reg_unknown_value(regs, insn->dst_reg);
-		goto out;
+		if (max_val >= BPF_REGISTER_MAX_RANGE)
+			dst_reg->min_value = BPF_REGISTER_MIN_RANGE;
+		if (dst_reg->min_value != BPF_REGISTER_MIN_RANGE)
+			dst_reg->min_value -= max_val;
+		if (min_val <= BPF_REGISTER_MIN_RANGE)
+			dst_reg->max_value = BPF_REGISTER_MAX_RANGE;
+		if (dst_reg->max_value != BPF_REGISTER_MAX_RANGE)
+			dst_reg->max_value -= min_val;
+		dst_reg->align = tn_sub(ptr_reg->align, off_reg->align);
+		dst_reg->off = ptr_reg->off;
+		dst_reg->id = ++env->id_gen;
+		if (ptr_reg->type == PTR_TO_PACKET && min_val < 0)
+			/* subtracting a negative advances pkt_ptr, so zero the range */
+			dst_reg->range = 0;
+		break;
+	case BPF_AND:
+	case BPF_OR:
+	case BPF_XOR:
+		/* bitwise ops on pointers are troublesome, prohibit for now.
+		 * (However, in principle we could allow some cases, e.g.
+		 * ptr &= ~3 which would reduce min_value by 3.)
+		 */
+		if (!env->allow_ptr_leaks) {
+			verbose("R%d bitwise operator %s on pointer prohibited\n",
+				dst, bpf_alu_string[opcode >> 4]);
+			return -EACCES;
+		}
+		/* Make it an unknown scalar */
+		__mark_reg_unknown(dst_reg);
+	default:
+		/* other operators (e.g. MUL, LSH) produce non-pointer results */
+		if (!env->allow_ptr_leaks) {
+			verbose("R%d pointer arithmetic with %s operator prohibited\n",
+				dst, bpf_alu_string[opcode >> 4]);
+			return -EACCES;
+		}
+		/* Make it an unknown scalar */
+		__mark_reg_unknown(dst_reg);
 	}
 
-	dst_reg->imm = dst_imm;
-out:
+	check_reg_overflow(dst_reg);
 	return 0;
 }
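The pointer-minus-scalar bounds update above is interval subtraction with saturating sentinels. Subtracting [smin, smax] from [dmin, dmax] gives [dmin - smax, dmax - smin], and an unbounded end of the subtrahend makes the opposite end of the result unbounded. Below is a minimal user-space sketch. It assumes sentinel values of our own choosing; the real BPF_REGISTER_MIN_RANGE and BPF_REGISTER_MAX_RANGE may differ. `struct range`, range_sub() and range_clamp() are hypothetical names, and range_clamp() mirrors the check_reg_overflow() body in this hunk.

```c
#include <stdint.h>

/* Hypothetical sentinels standing in for BPF_REGISTER_MIN_RANGE and
 * BPF_REGISTER_MAX_RANGE; the verifier's actual values may differ. */
#define RANGE_MIN_SENTINEL ((int64_t)-1)
#define RANGE_MAX_SENTINEL ((uint64_t)1 << 30)

struct range {
	int64_t min;	/* RANGE_MIN_SENTINEL: no usable lower bound */
	uint64_t max;	/* RANGE_MAX_SENTINEL: no usable upper bound */
};

/* dst - [smin, smax]: the new min comes from the largest subtrahend and
 * the new max from the smallest, with unbounded ends propagated. */
static struct range range_sub(struct range dst, int64_t smin, uint64_t smax)
{
	if (smax >= RANGE_MAX_SENTINEL)
		dst.min = RANGE_MIN_SENTINEL;
	if (dst.min != RANGE_MIN_SENTINEL)
		dst.min -= (int64_t)smax;
	if (smin <= RANGE_MIN_SENTINEL)
		dst.max = RANGE_MAX_SENTINEL;
	if (dst.max != RANGE_MAX_SENTINEL)
		dst.max -= (uint64_t)smin;	/* may wrap; clamped below */
	return dst;
}

/* Saturate out-of-range results, as check_reg_overflow() does. */
static void range_clamp(struct range *r)
{
	if (r->max > RANGE_MAX_SENTINEL)
		r->max = RANGE_MAX_SENTINEL;
	if (r->min < RANGE_MIN_SENTINEL ||
	    r->min > (int64_t)RANGE_MAX_SENTINEL)
		r->min = RANGE_MIN_SENTINEL;
}
```

The scalar BPF_SUB case in adjust_reg_min_max_vals() now uses the same pairing (min with max_val, max with min_val). The removed lines there subtracted min_val from the minimum and max_val from the maximum. Whenever the subtrahend is not a constant, that overstates the lower bound and understates the upper one.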
 
-static void check_reg_overflow(struct bpf_reg_state *reg)
-{
-	if (reg->max_value > BPF_REGISTER_MAX_RANGE)
-		reg->max_value = BPF_REGISTER_MAX_RANGE;
-	if (reg->min_value < BPF_REGISTER_MIN_RANGE ||
-	    reg->min_value > BPF_REGISTER_MAX_RANGE)
-		reg->min_value = BPF_REGISTER_MIN_RANGE;
-}
-
-static u32 calc_align(u32 imm)
-{
-	if (!imm)
-		return 1U << 31;
-	return imm - ((imm - 1) & imm);
-}
-
-static void adjust_reg_min_max_vals(struct bpf_verifier_env *env,
-				    struct bpf_insn *insn)
+/* Handles ALU ops other than BPF_END, BPF_NEG and BPF_MOV: computes new min/max
+ * and align.
+ * TODO: check this is legit for ALU32, particularly around negatives
+ */
+static int adjust_reg_min_max_vals(struct bpf_verifier_env *env,
+				   struct bpf_insn *insn)
 {
-	struct bpf_reg_state *regs = env->cur_state.regs, *dst_reg;
+	struct bpf_reg_state *regs = env->cur_state.regs, *dst_reg, *src_reg;
+	struct bpf_reg_state *ptr_reg = NULL, off_reg = {0};
 	s64 min_val = BPF_REGISTER_MIN_RANGE;
 	u64 max_val = BPF_REGISTER_MAX_RANGE;
 	u8 opcode = BPF_OP(insn->code);
-	u32 dst_align, src_align;
+	bool src_known, dst_known;
 
 	dst_reg = &regs[insn->dst_reg];
-	src_align = 0;
+	check_reg_overflow(dst_reg);
+	src_reg = NULL;
+	if (dst_reg->type != SCALAR_VALUE)
+		ptr_reg = dst_reg;
 	if (BPF_SRC(insn->code) == BPF_X) {
-		check_reg_overflow(&regs[insn->src_reg]);
-		min_val = regs[insn->src_reg].min_value;
-		max_val = regs[insn->src_reg].max_value;
-
-		/* If the source register is a random pointer then the
-		 * min_value/max_value values represent the range of the known
-		 * accesses into that value, not the actual min/max value of the
-		 * register itself.  In this case we have to reset the reg range
-		 * values so we know it is not safe to look at.
-		 */
-		if (regs[insn->src_reg].type != CONST_IMM &&
-		    regs[insn->src_reg].type != UNKNOWN_VALUE) {
-			min_val = BPF_REGISTER_MIN_RANGE;
-			max_val = BPF_REGISTER_MAX_RANGE;
-			src_align = 0;
-		} else {
-			src_align = regs[insn->src_reg].min_align;
+		src_reg = &regs[insn->src_reg];
+		check_reg_overflow(src_reg);
+
+		if (src_reg->type != SCALAR_VALUE) {
+			if (dst_reg->type != SCALAR_VALUE) {
+				/* Combining two pointers by any ALU op yields
+				 * an arbitrary scalar.
+				 */
+				if (!env->allow_ptr_leaks) {
+					verbose("R%d pointer %s pointer prohibited\n",
+						insn->dst_reg,
+						bpf_alu_string[opcode >> 4]);
+					return -EACCES;
+				}
+				mark_reg_unknown(regs, insn->dst_reg);
+				return 0;
+			} else {
+				/* scalar += pointer
+				 * This is legal, but we have to reverse our
+				 * src/dest handling in computing the range
+				 */
+				return adjust_ptr_min_max_vals(env, insn,
+							       src_reg, dst_reg);
+			}
+		} else if (ptr_reg) {
+			/* pointer += scalar */
+			return adjust_ptr_min_max_vals(env, insn,
+						       dst_reg, src_reg);
 		}
-	} else if (insn->imm < BPF_REGISTER_MAX_RANGE &&
-		   (s64)insn->imm > BPF_REGISTER_MIN_RANGE) {
-		min_val = max_val = insn->imm;
-		src_align = calc_align(insn->imm);
+	} else {
+		/* Pretend the src is a reg with a known value, since we only
+		 * need to be able to read from this state.
+		 */
+		off_reg.type = SCALAR_VALUE;
+		off_reg.align = tn_const(insn->imm);
+		off_reg.min_value = insn->imm;
+		off_reg.max_value = insn->imm;
+		src_reg = &off_reg;
+		if (ptr_reg) /* pointer += K */
+			return adjust_ptr_min_max_vals(env, insn,
+						       ptr_reg, src_reg);
+	}
+
+	/* Getting here implies an ALU op on two SCALAR_VALUEs */
+	if (WARN_ON_ONCE(ptr_reg)) {
+		verbose("verifier internal error\n");
+		return -EINVAL;
 	}
-
-	dst_align = dst_reg->min_align;
-
-	/* We don't know anything about what was done to this register, mark it
-	 * as unknown.
-	 */
-	if (min_val == BPF_REGISTER_MIN_RANGE &&
-	    max_val == BPF_REGISTER_MAX_RANGE) {
-		reset_reg_range_values(regs, insn->dst_reg);
-		return;
+	if (WARN_ON(!src_reg)) {
+		verbose("verifier internal error\n");
+		return -EINVAL;
 	}
-
-	/* If one of our values was at the end of our ranges then we can't just
-	 * do our normal operations to the register, we need to set the values
-	 * to the min/max since they are undefined.
-	 */
-	if (min_val == BPF_REGISTER_MIN_RANGE)
-		dst_reg->min_value = BPF_REGISTER_MIN_RANGE;
-	if (max_val == BPF_REGISTER_MAX_RANGE)
-		dst_reg->max_value = BPF_REGISTER_MAX_RANGE;
+	if (BPF_CLASS(insn->code) != BPF_ALU64) {
+		/* 32-bit ALU ops are (32,32)->64 */
+		coerce_reg_to_32(dst_reg);
+		coerce_reg_to_32(src_reg);
+	}
+	min_val = src_reg->min_value;
+	max_val = src_reg->max_value;
+	src_known = !src_reg->align.mask;
+	dst_known = !dst_reg->align.mask;
 
 	switch (opcode) {
 	case BPF_ADD:
+		if (min_val == BPF_REGISTER_MIN_RANGE)
+			dst_reg->min_value = BPF_REGISTER_MIN_RANGE;
 		if (dst_reg->min_value != BPF_REGISTER_MIN_RANGE)
 			dst_reg->min_value += min_val;
+		/* if max_val is MAX_RANGE, this will saturate dst->max */
 		if (dst_reg->max_value != BPF_REGISTER_MAX_RANGE)
 			dst_reg->max_value += max_val;
-		dst_reg->min_align = min(src_align, dst_align);
+		dst_reg->align = tn_add(dst_reg->align, src_reg->align);
 		break;
 	case BPF_SUB:
+		if (max_val == BPF_REGISTER_MAX_RANGE)
+			dst_reg->min_value = BPF_REGISTER_MIN_RANGE;
 		if (dst_reg->min_value != BPF_REGISTER_MIN_RANGE)
-			dst_reg->min_value -= min_val;
+			dst_reg->min_value -= max_val;
+		if (min_val == BPF_REGISTER_MIN_RANGE)
+			dst_reg->max_value = BPF_REGISTER_MAX_RANGE;
 		if (dst_reg->max_value != BPF_REGISTER_MAX_RANGE)
-			dst_reg->max_value -= max_val;
-		dst_reg->min_align = min(src_align, dst_align);
+			dst_reg->max_value -= min_val;
+		dst_reg->align = tn_sub(dst_reg->align, src_reg->align);
 		break;
 	case BPF_MUL:
-		if (dst_reg->min_value != BPF_REGISTER_MIN_RANGE)
-			dst_reg->min_value *= min_val;
+		if (min_val < 0 || dst_reg->min_value < 0) {
+			/* Ain't nobody got time to multiply that sign */
+			__mark_reg_unknown(dst_reg);
+			break;
+		}
+		dst_reg->min_value *= min_val;
+		/* if max_val is MAX_RANGE, this will saturate dst->max.
+		 * We know MAX_RANGE ** 2 won't overflow a u64, because
+		 * MAX_RANGE itself fits in a u32.
+		 */
+		BUILD_BUG_ON(BPF_REGISTER_MAX_RANGE > (u32)-1);
 		if (dst_reg->max_value != BPF_REGISTER_MAX_RANGE)
 			dst_reg->max_value *= max_val;
-		dst_reg->min_align = max(src_align, dst_align);
+		dst_reg->align = tn_mul(dst_reg->align, src_reg->align);
 		break;
 	case BPF_AND:
-		/* Disallow AND'ing of negative numbers, ain't nobody got time
-		 * for that.  Otherwise the minimum is 0 and the max is the max
-		 * value we could AND against.
+		if (src_known && dst_known) {
+			u64 value = dst_reg->align.value & src_reg->align.value;
+
+			dst_reg->align = tn_const(value);
+			dst_reg->min_value = dst_reg->max_value = min_t(u64,
+					value, BPF_REGISTER_MAX_RANGE);
+			break;
+		}
+		/* Lose min_value when AND'ing negative numbers, ain't nobody
+		 * got time for that.  Otherwise we get our minimum from the
+		 * align, since that's inherently bitwise.
+		 * Our maximum is the minimum of the operands' maxima.
 		 */
-		if (min_val < 0)
+		dst_reg->align = tn_and(dst_reg->align, src_reg->align);
+		if (min_val < 0 && dst_reg->min_value < 0)
 			dst_reg->min_value = BPF_REGISTER_MIN_RANGE;
 		else
-			dst_reg->min_value = 0;
-		dst_reg->max_value = max_val;
-		dst_reg->min_align = max(src_align, dst_align);
+			dst_reg->min_value = dst_reg->align.value;
+		dst_reg->max_value = min(dst_reg->max_value, max_val);
+		break;
+	case BPF_OR:
+		if (src_known && dst_known) {
+			u64 value = dst_reg->align.value | src_reg->align.value;
+
+			dst_reg->align = tn_const(value);
+			dst_reg->min_value = dst_reg->max_value = min_t(u64,
+					value, BPF_REGISTER_MAX_RANGE);
+			break;
+		}
+		/* Lose ranges when OR'ing negative numbers, ain't nobody got
+		 * time for that.  Otherwise we get our maximum from the align,
+		 * and our minimum is the maximum of the operands' minima.
+		 */
+		dst_reg->align = tn_or(dst_reg->align, src_reg->align);
+		if (min_val < 0 || dst_reg->min_value < 0) {
+			dst_reg->min_value = BPF_REGISTER_MIN_RANGE;
+			dst_reg->max_value = BPF_REGISTER_MAX_RANGE;
+		} else {
+			dst_reg->min_value = max(dst_reg->min_value, min_val);
+			dst_reg->max_value = dst_reg->align.value | dst_reg->align.mask;
+		}
 		break;
 	case BPF_LSH:
+		if (min_val < 0) {
+			/* LSH by a negative number is undefined */
+			mark_reg_unknown(regs, insn->dst_reg);
+			break;
+		}
 		/* Gotta have special overflow logic here, if we're shifting
 		 * more than MAX_RANGE then just assume we have an invalid
 		 * range.
 		 */
 		if (min_val > ilog2(BPF_REGISTER_MAX_RANGE)) {
 			dst_reg->min_value = BPF_REGISTER_MIN_RANGE;
-			dst_reg->min_align = 1;
+			dst_reg->align = tn_unknown;
 		} else {
 			if (dst_reg->min_value != BPF_REGISTER_MIN_RANGE)
 				dst_reg->min_value <<= min_val;
-			if (!dst_reg->min_align)
-				dst_reg->min_align = 1;
-			dst_reg->min_align <<= min_val;
+			if (src_known)
+				dst_reg->align = tn_sl(dst_reg->align, min_val);
+			else
+				dst_reg->align = tn_sl(tn_unknown, min_val);
 		}
 		if (max_val > ilog2(BPF_REGISTER_MAX_RANGE))
 			dst_reg->max_value = BPF_REGISTER_MAX_RANGE;
@@ -1830,37 +1959,41 @@ static void adjust_reg_min_max_vals(struct bpf_verifier_env *env,
 			dst_reg->max_value <<= max_val;
 		break;
 	case BPF_RSH:
-		/* RSH by a negative number is undefined, and the BPF_RSH is an
-		 * unsigned shift, so make the appropriate casts.
-		 */
-		if (min_val < 0 || dst_reg->min_value < 0) {
-			dst_reg->min_value = BPF_REGISTER_MIN_RANGE;
+		if (min_val < 0) {
+			/* RSH by a negative number is undefined */
+			mark_reg_unknown(regs, insn->dst_reg);
+			break;
+		}
+		/* BPF_RSH is an unsigned shift, so make the appropriate casts */
+		if (dst_reg->min_value < 0) {
+			if (min_val)
+				/* Sign bit will be cleared */
+				dst_reg->min_value = 0;
 		} else {
 			dst_reg->min_value =
-				(u64)(dst_reg->min_value) >> min_val;
+				(u64)(dst_reg->min_value) >> max_val;
 		}
-		if (min_val < 0) {
-			dst_reg->min_align = 1;
-		} else {
-			dst_reg->min_align >>= (u64) min_val;
-			if (!dst_reg->min_align)
-				dst_reg->min_align = 1;
-		}
-		if (dst_reg->max_value != BPF_REGISTER_MAX_RANGE)
-			dst_reg->max_value >>= max_val;
+		if (src_known)
+			dst_reg->align = tn_sr(dst_reg->align, min_val);
+		else
+			dst_reg->align = tn_sr(tn_unknown, min_val);
+		if (dst_reg->max_value == BPF_REGISTER_MAX_RANGE)
+			dst_reg->max_value = ~0;
+		dst_reg->max_value >>= min_val;
 		break;
 	default:
-		reset_reg_range_values(regs, insn->dst_reg);
+		mark_reg_unknown(regs, insn->dst_reg);
 		break;
 	}
 
 	check_reg_overflow(dst_reg);
+	return 0;
 }
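The code treats the `align` field as a tristate number. `value` holds the bits known to be 1, `mask` holds the bits whose value is unknown, and `mask == 0` means a known constant. The tn_add()/tn_sub()/tn_and()/tn_or()/tn_sl()/tn_sr() helpers come from another patch in the series and are not shown here. The following is therefore only a user-space sketch of how such operations can be computed, modeled on the tnum code that later landed upstream as kernel/bpf/tnum.c. `struct tri` and the `tri_*` names are hypothetical.

```c
#include <stdint.h>

/* Tristate number: a bit set in mask is unknown; every other bit is
 * known and equals the corresponding bit of value (value & mask == 0). */
struct tri {
	uint64_t value;
	uint64_t mask;
};

static struct tri tri_const(uint64_t v) { return (struct tri){ v, 0 }; }
static const struct tri tri_unknown = { 0, ~(uint64_t)0 };

static struct tri tri_add(struct tri a, struct tri b)
{
	uint64_t sm = a.mask + b.mask;
	uint64_t sv = a.value + b.value;	/* smallest possible sum */
	uint64_t sigma = sm + sv;		/* largest possible sum */
	uint64_t chi = sigma ^ sv;		/* bits that differ between them */
	uint64_t mu = chi | a.mask | b.mask;

	return (struct tri){ sv & ~mu, mu };
}

static struct tri tri_sub(struct tri a, struct tri b)
{
	uint64_t dv = a.value - b.value;
	uint64_t alpha = dv + a.mask;	/* largest difference (mod 2^64) */
	uint64_t beta = dv - b.mask;	/* smallest difference (mod 2^64) */
	uint64_t chi = alpha ^ beta;	/* bits a borrow may flip */
	uint64_t mu = chi | a.mask | b.mask;

	return (struct tri){ dv & ~mu, mu };
}

static struct tri tri_and(struct tri a, struct tri b)
{
	uint64_t alpha = a.value | a.mask;	/* bits that may be 1 in a */
	uint64_t beta = b.value | b.mask;	/* bits that may be 1 in b */
	uint64_t v = a.value & b.value;		/* bits known 1 in both */

	return (struct tri){ v, alpha & beta & ~v };
}

static struct tri tri_or(struct tri a, struct tri b)
{
	uint64_t v = a.value | b.value;		/* bits known 1 in either */
	uint64_t mu = a.mask | b.mask;

	return (struct tri){ v, mu & ~v };
}

/* Shift counts are assumed to be below 64. */
static struct tri tri_lsh(struct tri a, unsigned int s)
{
	return (struct tri){ a.value << s, a.mask << s };
}

static struct tri tri_rsh(struct tri a, unsigned int s)
{
	return (struct tri){ a.value >> s, a.mask >> s };
}
```

A result with `mask == 0` is a known constant, which is what the `align.mask == 0` tests in check_cond_jmp_op() check. For non-negative operands, the patch also reads range bounds straight off these results. After AND, `value` (the bits known to be 1) is a lower bound. After OR, `value | mask` is an upper bound: for example, 0x30 OR'd with an unknown low nibble is at most 0x3F.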
 
 /* check validity of 32-bit and 64-bit arithmetic operations */
 static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)
 {
-	struct bpf_reg_state *regs = env->cur_state.regs, *dst_reg;
+	struct bpf_reg_state *regs = env->cur_state.regs;
 	u8 opcode = BPF_OP(insn->code);
 	int err;
 
@@ -1920,11 +2053,6 @@ static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)
 		if (err)
 			return err;
 
-		/* we are setting our register to something new, we need to
-		 * reset its range values.
-		 */
-		reset_reg_range_values(regs, insn->dst_reg);
-
 		if (BPF_SRC(insn->code) == BPF_X) {
 			if (BPF_CLASS(insn->code) == BPF_ALU64) {
 				/* case: R1 = R2
@@ -1932,22 +2060,27 @@ static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)
 				 */
 				regs[insn->dst_reg] = regs[insn->src_reg];
 			} else {
+				/* R1 = (u32) R2 */
 				if (is_pointer_value(env, insn->src_reg)) {
 					verbose("R%d partial copy of pointer\n",
 						insn->src_reg);
 					return -EACCES;
 				}
-				mark_reg_unknown_value(regs, insn->dst_reg);
+				mark_reg_unknown(regs, insn->dst_reg);
+				/* high 32 bits are known zero.  But this is
+				 * still out of range for max_value, so leave
+				 * that.
+				 */
+				regs[insn->dst_reg].align.mask &= (u32)-1;
 			}
 		} else {
 			/* case: R = imm
 			 * remember the value we stored into this reg
 			 */
-			regs[insn->dst_reg].type = CONST_IMM;
-			regs[insn->dst_reg].imm = insn->imm;
+			regs[insn->dst_reg].type = SCALAR_VALUE;
+			regs[insn->dst_reg].align = tn_const(insn->imm);
 			regs[insn->dst_reg].max_value = insn->imm;
 			regs[insn->dst_reg].min_value = insn->imm;
-			regs[insn->dst_reg].min_align = calc_align(insn->imm);
 		}
 
 	} else if (opcode > BPF_END) {
@@ -1998,68 +2131,7 @@ static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)
 		if (err)
 			return err;
 
-		dst_reg = &regs[insn->dst_reg];
-
-		/* first we want to adjust our ranges. */
-		adjust_reg_min_max_vals(env, insn);
-
-		/* pattern match 'bpf_add Rx, imm' instruction */
-		if (opcode == BPF_ADD && BPF_CLASS(insn->code) == BPF_ALU64 &&
-		    dst_reg->type == FRAME_PTR && BPF_SRC(insn->code) == BPF_K) {
-			dst_reg->type = PTR_TO_STACK;
-			dst_reg->imm = insn->imm;
-			return 0;
-		} else if (opcode == BPF_ADD &&
-			   BPF_CLASS(insn->code) == BPF_ALU64 &&
-			   dst_reg->type == PTR_TO_STACK &&
-			   ((BPF_SRC(insn->code) == BPF_X &&
-			     regs[insn->src_reg].type == CONST_IMM) ||
-			    BPF_SRC(insn->code) == BPF_K)) {
-			if (BPF_SRC(insn->code) == BPF_X)
-				dst_reg->imm += regs[insn->src_reg].imm;
-			else
-				dst_reg->imm += insn->imm;
-			return 0;
-		} else if (opcode == BPF_ADD &&
-			   BPF_CLASS(insn->code) == BPF_ALU64 &&
-			   (dst_reg->type == PTR_TO_PACKET ||
-			    (BPF_SRC(insn->code) == BPF_X &&
-			     regs[insn->src_reg].type == PTR_TO_PACKET))) {
-			/* ptr_to_packet += K|X */
-			return check_packet_ptr_add(env, insn);
-		} else if (BPF_CLASS(insn->code) == BPF_ALU64 &&
-			   dst_reg->type == UNKNOWN_VALUE &&
-			   env->allow_ptr_leaks) {
-			/* unknown += K|X */
-			return evaluate_reg_alu(env, insn);
-		} else if (BPF_CLASS(insn->code) == BPF_ALU64 &&
-			   dst_reg->type == CONST_IMM &&
-			   env->allow_ptr_leaks) {
-			/* reg_imm += K|X */
-			return evaluate_reg_imm_alu(env, insn);
-		} else if (is_pointer_value(env, insn->dst_reg)) {
-			verbose("R%d pointer arithmetic prohibited\n",
-				insn->dst_reg);
-			return -EACCES;
-		} else if (BPF_SRC(insn->code) == BPF_X &&
-			   is_pointer_value(env, insn->src_reg)) {
-			verbose("R%d pointer arithmetic prohibited\n",
-				insn->src_reg);
-			return -EACCES;
-		}
-
-		/* If we did pointer math on a map value then just set it to our
-		 * PTR_TO_MAP_VALUE_ADJ type so we can deal with any stores or
-		 * loads to this register appropriately, otherwise just mark the
-		 * register as unknown.
-		 */
-		if (env->allow_ptr_leaks &&
-		    BPF_CLASS(insn->code) == BPF_ALU64 && opcode == BPF_ADD &&
-		    (dst_reg->type == PTR_TO_MAP_VALUE ||
-		     dst_reg->type == PTR_TO_MAP_VALUE_ADJ))
-			dst_reg->type = PTR_TO_MAP_VALUE_ADJ;
-		else
-			mark_reg_unknown_value(regs, insn->dst_reg);
+		return adjust_reg_min_max_vals(env, insn);
 	}
 
 	return 0;
@@ -2071,6 +2143,10 @@ static void find_good_pkt_pointers(struct bpf_verifier_state *state,
 	struct bpf_reg_state *regs = state->regs, *reg;
 	int i;
 
+	if (dst_reg->off < 0)
+		/* This doesn't give us any range */
+		return;
+
 	/* LLVM can generate two kind of checks:
 	 *
 	 * Type 1:
@@ -2104,20 +2180,21 @@ static void find_good_pkt_pointers(struct bpf_verifier_state *state,
 	for (i = 0; i < MAX_BPF_REG; i++)
 		if (regs[i].type == PTR_TO_PACKET && regs[i].id == dst_reg->id)
 			/* keep the maximum range already checked */
-			regs[i].range = max(regs[i].range, dst_reg->off);
+			regs[i].range = max_t(u32, regs[i].range, dst_reg->off);
 
 	for (i = 0; i < MAX_BPF_STACK; i += BPF_REG_SIZE) {
 		if (state->stack_slot_type[i] != STACK_SPILL)
 			continue;
 		reg = &state->spilled_regs[i / BPF_REG_SIZE];
 		if (reg->type == PTR_TO_PACKET && reg->id == dst_reg->id)
-			reg->range = max(reg->range, dst_reg->off);
+			reg->range = max_t(u32, reg->range, dst_reg->off);
 	}
 }
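The new `dst_reg->off < 0` early return protects the `max_t(u32, ...)` updates that follow it in find_good_pkt_pointers(). Converted to u32, a negative offset becomes a number close to 2^32 and would mark nearly any access as in range. Below is a minimal model. It assumes `off` is a signed 32-bit fixed offset and `range` is an unsigned byte count. widen_range() is a hypothetical helper, not a verifier function.

```c
#include <stdint.h>

/* Hypothetical model of the range update in find_good_pkt_pointers():
 * 'range' counts bytes proven accessible from the packet pointer and
 * 'off' is the signed fixed offset that was checked against data_end. */
static uint32_t widen_range(uint32_t range, int32_t off)
{
	/* A check at a negative offset proves nothing past the pointer;
	 * without this guard, (uint32_t)off would be close to 2^32. */
	if (off < 0)
		return range;
	/* Keep the larger of the old and newly proven ranges. */
	return range > (uint32_t)off ? range : (uint32_t)off;
}
```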
 
 /* Adjusts the register min/max values in the case that the dst_reg is the
  * variable register that we are working on, and src_reg is a constant or we're
  * simply doing a BPF_K check.
+ * In JEQ/JNE cases we also adjust the align values.
  */
 static void reg_set_min_max(struct bpf_reg_state *true_reg,
 			    struct bpf_reg_state *false_reg, u64 val,
@@ -2129,34 +2206,52 @@ static void reg_set_min_max(struct bpf_reg_state *true_reg,
 		 * true then we know for sure.
 		 */
 		true_reg->max_value = true_reg->min_value = val;
+		true_reg->align = tn_const(val);
 		break;
 	case BPF_JNE:
 		/* If this is true we know nothing Jon Snow, but if it is false
 		 * we know the value for sure;
 		 */
 		false_reg->max_value = false_reg->min_value = val;
+		false_reg->align = tn_const(val);
 		break;
 	case BPF_JGT:
-		/* Unsigned comparison, the minimum value is 0. */
-		false_reg->min_value = 0;
-		/* fallthrough */
-	case BPF_JSGT:
-		/* If this is false then we know the maximum val is val,
-		 * otherwise we know the min val is val+1.
+		/* Unsigned comparison, can only tell us about max_value (since
+		 * min_value is signed), unless we learn the sign bit.
 		 */
 		false_reg->max_value = val;
+		/* If we're not unsigned-greater-than a positive value, then
+		 * we can't be negative.
+		 */
+		if ((s64)val >= 0 && false_reg->min_value < 0)
+			false_reg->min_value = 0;
+		break;
+	case BPF_JSGT:
+		/* Signed comparison, can only tell us about min_value (since
+		 * max_value is unsigned), unless we already know the sign bit.
+		 */
 		true_reg->min_value = val + 1;
+		/* If we're not signed-greater-than val, and we're not negative,
+		 * then we can't be unsigned-greater than val either.
+		 */
+		if (false_reg->min_value >= 0)
+			false_reg->max_value = val;
 		break;
 	case BPF_JGE:
-		/* Unsigned comparison, the minimum value is 0. */
-		false_reg->min_value = 0;
-		/* fallthrough */
-	case BPF_JSGE:
-		/* If this is false then we know the maximum value is val - 1,
-		 * otherwise we know the mimimum value is val.
-		 */
 		false_reg->max_value = val - 1;
+		/* If we're not unsigned-ge a positive value, then we can't be
+		 * negative.
+		 */
+		if ((s64)val >= 0 && false_reg->min_value < 0)
+			false_reg->min_value = 0;
+		break;
+	case BPF_JSGE:
 		true_reg->min_value = val;
+		/* If we're not signed-ge val, and we're not negative, then we
+		 * can't be unsigned-ge val either.
+		 */
+		if (false_reg->min_value >= 0)
+			false_reg->max_value = val - 1;
 		break;
 	default:
 		break;
@@ -2166,8 +2261,8 @@ static void reg_set_min_max(struct bpf_reg_state *true_reg,
 	check_reg_overflow(true_reg);
 }
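The new JGT/JSGT handling keeps each bound in its own domain. `min_value` is signed and `max_value` is unsigned, so an unsigned compare directly tightens only the upper bound and a signed compare only the lower one, with the sign bit bridging the two. The sketch below models only the JGT fall-through branch and the two JSGT branches under those assumptions. The struct and function names are hypothetical, and the check_reg_overflow() clamping is omitted.

```c
#include <stdint.h>

/* Hypothetical bounds model: min is a signed lower bound, max an unsigned
 * upper bound, matching how the patch treats min_value and max_value. */
struct bounds {
	int64_t min;
	uint64_t max;
};

/* Fall-through of "if (reg > val) goto" with BPF_JGT: reg <= val unsigned. */
static void jgt_false(struct bounds *b, uint64_t val)
{
	b->max = val;
	/* If val's sign bit is clear, reg's must be too, so reg >= 0. */
	if ((int64_t)val >= 0 && b->min < 0)
		b->min = 0;
}

/* Taken branch of "if (reg s> val) goto" with BPF_JSGT: reg > val signed. */
static void jsgt_true(struct bounds *b, int64_t val)
{
	b->min = val + 1;
}

/* Fall-through of BPF_JSGT: reg <= val signed.  Only when reg is known
 * non-negative does that also bound it as an unsigned value. */
static void jsgt_false(struct bounds *b, int64_t val)
{
	if (b->min >= 0)
		b->max = (uint64_t)val;
}
```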
 
-/* Same as above, but for the case that dst_reg is a CONST_IMM reg and src_reg
- * is the variable reg.
+/* Same as above, but for the case that dst_reg holds a constant and src_reg is
+ * the variable reg.
  */
 static void reg_set_min_max_inv(struct bpf_reg_state *true_reg,
 				struct bpf_reg_state *false_reg, u64 val,
@@ -2179,35 +2274,52 @@ static void reg_set_min_max_inv(struct bpf_reg_state *true_reg,
 		 * true then we know for sure.
 		 */
 		true_reg->max_value = true_reg->min_value = val;
+		true_reg->align = tn_const(val);
 		break;
 	case BPF_JNE:
 		/* If this is true we know nothing Jon Snow, but if it is false
 		 * we know the value for sure;
 		 */
 		false_reg->max_value = false_reg->min_value = val;
+		false_reg->align = tn_const(val);
 		break;
 	case BPF_JGT:
-		/* Unsigned comparison, the minimum value is 0. */
-		true_reg->min_value = 0;
-		/* fallthrough */
+		/* Unsigned comparison, can only tell us about max_value (since
+		 * min_value is signed), unless we learn the sign bit.
+		 */
+		true_reg->max_value = val - 1;
+		/* If a positive value is unsigned-greater-than us, then we
+		 * can't be negative.
+		 */
+		if ((s64)val >= 0 && true_reg->min_value < 0)
+			true_reg->min_value = 0;
+		break;
 	case BPF_JSGT:
-		/*
-		 * If this is false, then the val is <= the register, if it is
-		 * true the register <= to the val.
+		/* Signed comparison, can only tell us about min_value (since
+		 * max_value is unsigned), unless we already know the sign bit.
 		 */
 		false_reg->min_value = val;
-		true_reg->max_value = val - 1;
+		/* If val is signed-greater-than us, and we're not negative,
+		 * then val must be unsigned-greater-than us.
+		 */
+		if (true_reg->min_value >= 0)
+			true_reg->max_value = val - 1;
 		break;
 	case BPF_JGE:
-		/* Unsigned comparison, the minimum value is 0. */
-		true_reg->min_value = 0;
-		/* fallthrough */
-	case BPF_JSGE:
-		/* If this is false then constant < register, if it is true then
-		 * the register < constant.
+		true_reg->max_value = val;
+		/* If a positive value is unsigned-ge us, then we can't be
+		 * negative.
 		 */
+		if ((s64)val >= 0 && true_reg->min_value < 0)
+			true_reg->min_value = 0;
+		break;
+	case BPF_JSGE:
 		false_reg->min_value = val + 1;
-		true_reg->max_value = val;
+		/* If val is signed-ge us, and we're not negative, then val
+		 * must be unsigned-ge us.
+		 */
+		if (true_reg->min_value >= 0)
+			true_reg->max_value = val;
 		break;
 	default:
 		break;
@@ -2217,19 +2329,58 @@ static void reg_set_min_max_inv(struct bpf_reg_state *true_reg,
 	check_reg_overflow(true_reg);
 }
 
+/* Regs are known to be equal, so intersect their min/max/align */
+static void __reg_combine_min_max(struct bpf_reg_state *src_reg,
+				  struct bpf_reg_state *dst_reg)
+{
+	src_reg->min_value = dst_reg->min_value = max(src_reg->min_value,
+						      dst_reg->min_value);
+	src_reg->max_value = dst_reg->max_value = min(src_reg->max_value,
+						      dst_reg->max_value);
+	src_reg->align = dst_reg->align = tn_intersect(src_reg->align,
+						       dst_reg->align);
+	check_reg_overflow(src_reg);
+	check_reg_overflow(dst_reg);
+}
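Sometimes a JEQ/JNE compares two scalars, neither of which is a known constant. The registers are then equal on one branch, so __reg_combine_min_max() gives each register the other's knowledge: the tighter of the two ranges, and the intersection of the tristate `align` values (value: bits known to be 1, mask: unknown bits). Below is a sketch under those assumptions. tri_intersect() is modeled on the tnum_intersect() helper that later landed upstream, and all names here are hypothetical.

```c
#include <stdint.h>

struct tri {
	uint64_t value;	/* bits known to be 1 */
	uint64_t mask;	/* unknown bits */
};

struct bounds {
	int64_t min;	/* signed lower bound */
	uint64_t max;	/* unsigned upper bound */
};

/* Bits known in either input are known in the result; a bit stays
 * unknown only if both inputs leave it unknown.  Contradictory inputs
 * (a dead branch) are not detected. */
static struct tri tri_intersect(struct tri a, struct tri b)
{
	uint64_t v = a.value | b.value;
	uint64_t mu = a.mask & b.mask;

	return (struct tri){ v & ~mu, mu };
}

/* Both registers hold the same value, so both get the tighter bounds. */
static struct bounds bounds_intersect(struct bounds a, struct bounds b)
{
	struct bounds r = {
		.min = a.min > b.min ? a.min : b.min,
		.max = a.max < b.max ? a.max : b.max,
	};
	return r;
}
```

Take one register known to be 0x1? (high nibble of the low byte known, low nibble unknown) and another known to be 0x?3 (low nibble known). Equality then pins both to 0x13.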
+
+static void reg_combine_min_max(struct bpf_reg_state *true_src,
+				struct bpf_reg_state *true_dst,
+				struct bpf_reg_state *false_src,
+				struct bpf_reg_state *false_dst,
+				u8 opcode)
+{
+	switch (opcode) {
+	case BPF_JEQ:
+		__reg_combine_min_max(true_src, true_dst);
+		break;
+	case BPF_JNE:
+		__reg_combine_min_max(false_src, false_dst);
+	}
+}
+
 static void mark_map_reg(struct bpf_reg_state *regs, u32 regno, u32 id,
-			 enum bpf_reg_type type)
+			 bool is_null)
 {
 	struct bpf_reg_state *reg = &regs[regno];
 
 	if (reg->type == PTR_TO_MAP_VALUE_OR_NULL && reg->id == id) {
-		if (type == UNKNOWN_VALUE) {
-			__mark_reg_unknown_value(regs, regno);
+		/* Old offset (both fixed and variable parts) should
+		 * have been known-zero, because we don't allow pointer
+		 * arithmetic on pointers that might be NULL.
+		 */
+		if (WARN_ON_ONCE(reg->min_value || reg->max_value ||
+				 reg->align.value || reg->align.mask ||
+				 reg->off)) {
+			reg->min_value = reg->max_value = reg->off = 0;
+			reg->align = tn_const(0);
+		}
+		if (is_null) {
+			reg->type = SCALAR_VALUE;
 		} else if (reg->map_ptr->inner_map_meta) {
 			reg->type = CONST_PTR_TO_MAP;
 			reg->map_ptr = reg->map_ptr->inner_map_meta;
 		} else {
-			reg->type = type;
+			reg->type = PTR_TO_MAP_VALUE;
 		}
 		/* We don't need id from this point onwards anymore, thus we
 		 * should better reset it, so that state pruning has chances
@@ -2243,19 +2394,19 @@ static void mark_map_reg(struct bpf_reg_state *regs, u32 regno, u32 id,
  * be folded together at some point.
  */
 static void mark_map_regs(struct bpf_verifier_state *state, u32 regno,
-			  enum bpf_reg_type type)
+			  bool is_null)
 {
 	struct bpf_reg_state *regs = state->regs;
 	u32 id = regs[regno].id;
 	int i;
 
 	for (i = 0; i < MAX_BPF_REG; i++)
-		mark_map_reg(regs, i, id, type);
+		mark_map_reg(regs, i, id, is_null);
 
 	for (i = 0; i < MAX_BPF_STACK; i += BPF_REG_SIZE) {
 		if (state->stack_slot_type[i] != STACK_SPILL)
 			continue;
-		mark_map_reg(state->spilled_regs, i / BPF_REG_SIZE, id, type);
+		mark_map_reg(state->spilled_regs, i / BPF_REG_SIZE, id, is_null);
 	}
 }
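The bool argument replaces the old type argument. For `if (r == 0) goto` (BPF_JEQ), the pointer is non-NULL on the fall-through path and NULL on the taken path; BPF_JNE is the reverse. That is what `opcode == BPF_JNE` for this_branch and `opcode == BPF_JEQ` for other_branch encode. Below is a simplified model with hypothetical types. It ignores the inner-map (CONST_PTR_TO_MAP) case and spilled stack slots, and, like the patch, it turns the NULL case into a scalar.

```c
#include <stdbool.h>
#include <stdint.h>

enum reg_kind { K_SCALAR, K_MAP_VALUE_OR_NULL, K_MAP_VALUE };

struct reg_model {
	enum reg_kind kind;
	uint32_t id;
};

#define NREGS 4

/* Once a NULL check on one copy of a map-lookup result resolves, every
 * register carrying the same id is resolved the same way. */
static void resolve_null(struct reg_model *regs, uint32_t id, bool is_null)
{
	for (int i = 0; i < NREGS; i++) {
		if (regs[i].kind != K_MAP_VALUE_OR_NULL || regs[i].id != id)
			continue;
		regs[i].kind = is_null ? K_SCALAR : K_MAP_VALUE;
		regs[i].id = 0;	/* id no longer needed; helps pruning */
	}
}

/* For "if (r == 0) goto" (is_jne false) or "if (r != 0) goto" (is_jne
 * true): is r NULL on the fall-through path, and on the taken path? */
static bool null_on_fallthrough(bool is_jne) { return is_jne; }
static bool null_on_taken(bool is_jne) { return !is_jne; }
```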
 
@@ -2305,7 +2456,9 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
 	/* detect if R == 0 where R was initialized to zero earlier */
 	if (BPF_SRC(insn->code) == BPF_K &&
 	    (opcode == BPF_JEQ || opcode == BPF_JNE) &&
-	    dst_reg->type == CONST_IMM && dst_reg->imm == insn->imm) {
+	    dst_reg->type == SCALAR_VALUE &&
+	    dst_reg->align.value == insn->imm &&
+	    dst_reg->align.mask == 0) {
 		if (opcode == BPF_JEQ) {
 			/* if (imm == imm) goto pc+off;
 			 * only follow the goto, ignore fall-through
@@ -2327,17 +2480,30 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
 
 	/* detect if we are comparing against a constant value so we can adjust
 	 * our min/max values for our dst register.
+	 * this is only legit if both are scalars (or pointers to the same
+	 * object, I suppose, but we don't support that right now), because
+	 * otherwise the different base pointers mean the offsets aren't
+	 * comparable.
 	 */
 	if (BPF_SRC(insn->code) == BPF_X) {
-		if (regs[insn->src_reg].type == CONST_IMM)
-			reg_set_min_max(&other_branch->regs[insn->dst_reg],
-					dst_reg, regs[insn->src_reg].imm,
-					opcode);
-		else if (dst_reg->type == CONST_IMM)
-			reg_set_min_max_inv(&other_branch->regs[insn->src_reg],
-					    &regs[insn->src_reg], dst_reg->imm,
-					    opcode);
-	} else {
+		if (dst_reg->type == SCALAR_VALUE &&
+		    regs[insn->src_reg].type == SCALAR_VALUE) {
+			if (regs[insn->src_reg].align.mask == 0)
+				reg_set_min_max(&other_branch->regs[insn->dst_reg],
+						dst_reg, regs[insn->src_reg].align.value,
+						opcode);
+			else if (dst_reg->align.mask == 0)
+				reg_set_min_max_inv(&other_branch->regs[insn->src_reg],
+						    &regs[insn->src_reg],
+						    dst_reg->align.value, opcode);
+			else if (opcode == BPF_JEQ || opcode == BPF_JNE)
+				/* Comparing for equality, we can combine knowledge */
+				reg_combine_min_max(&other_branch->regs[insn->src_reg],
+						    &other_branch->regs[insn->dst_reg],
+						    &regs[insn->src_reg],
+						    &regs[insn->dst_reg], opcode);
+		}
+	} else if (dst_reg->type == SCALAR_VALUE) {
 		reg_set_min_max(&other_branch->regs[insn->dst_reg],
 					dst_reg, insn->imm, opcode);
 	}
@@ -2349,10 +2515,8 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
 		/* Mark all identical map registers in each branch as either
 		 * safe or unknown depending R == 0 or R != 0 conditional.
 		 */
-		mark_map_regs(this_branch, insn->dst_reg,
-			      opcode == BPF_JEQ ? PTR_TO_MAP_VALUE : UNKNOWN_VALUE);
-		mark_map_regs(other_branch, insn->dst_reg,
-			      opcode == BPF_JEQ ? UNKNOWN_VALUE : PTR_TO_MAP_VALUE);
+		mark_map_regs(this_branch, insn->dst_reg, opcode == BPF_JNE);
+		mark_map_regs(other_branch, insn->dst_reg, opcode == BPF_JEQ);
 	} else if (BPF_SRC(insn->code) == BPF_X && opcode == BPF_JGT &&
 		   dst_reg->type == PTR_TO_PACKET &&
 		   regs[insn->src_reg].type == PTR_TO_PACKET_END) {
@@ -2400,8 +2564,11 @@ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
 	if (insn->src_reg == 0) {
 		u64 imm = ((u64)(insn + 1)->imm << 32) | (u32)insn->imm;
 
-		regs[insn->dst_reg].type = CONST_IMM;
-		regs[insn->dst_reg].imm = imm;
+		regs[insn->dst_reg].type = SCALAR_VALUE;
+		regs[insn->dst_reg].min_value = imm;
+		regs[insn->dst_reg].max_value = imm;
+		check_reg_overflow(&regs[insn->dst_reg]);
+		regs[insn->dst_reg].align = tn_const(imm);
 		return 0;
 	}
 
@@ -2482,7 +2649,7 @@ static int check_ld_abs(struct bpf_verifier_env *env, struct bpf_insn *insn)
 	/* mark destination R0 register as readable, since it contains
 	 * the value fetched from the packet
 	 */
-	regs[BPF_REG_0].type = UNKNOWN_VALUE;
+	mark_reg_unknown(regs, BPF_REG_0);
 	return 0;
 }
 
@@ -2685,57 +2852,102 @@ static int check_cfg(struct bpf_verifier_env *env)
 	return ret;
 }
 
-/* the following conditions reduce the number of explored insns
- * from ~140k to ~80k for ultra large programs that use a lot of ptr_to_packet
- */
-static bool compare_ptrs_to_packet(struct bpf_verifier_env *env,
-				   struct bpf_reg_state *old,
-				   struct bpf_reg_state *cur)
+/* check %cur's range satisfies %old's */
+static bool range_within(struct bpf_reg_state *old,
+			 struct bpf_reg_state *cur)
 {
-	if (old->id != cur->id)
-		return false;
+	return old->min_value <= cur->min_value &&
+	       old->max_value >= cur->max_value;
+}
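For state pruning the check is one-directional: everything the current scalar might hold must already be covered by the explored (old) state. The current range therefore has to lie inside the old one, and the current tristate value has to describe a subset of the old one's values. Below is a sketch of both checks. `struct tri`, tri_in() and scalar_safe() are hypothetical names, and tri_in() is modeled on the tnum_in() helper that later landed upstream.

```c
#include <stdbool.h>
#include <stdint.h>

struct tri {
	uint64_t value;	/* bits known to be 1 */
	uint64_t mask;	/* unknown bits */
};

/* True if every concrete value b allows is also allowed by a. */
static bool tri_in(struct tri a, struct tri b)
{
	if (b.mask & ~a.mask)
		return false;	/* b leaves open a bit that a pinned down */
	return a.value == (b.value & ~a.mask);
}

/* Pruning check for two scalars: the already-verified (old) state is
 * at least as permissive as the current one in both range and bits. */
static bool scalar_safe(int64_t old_min, uint64_t old_max, struct tri old_bits,
			int64_t cur_min, uint64_t cur_max, struct tri cur_bits)
{
	return old_min <= cur_min && old_max >= cur_max &&
	       tri_in(old_bits, cur_bits);
}
```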
 
-	/* old ptr_to_packet is more conservative, since it allows smaller
-	 * range. Ex:
-	 * old(off=0,r=10) is equal to cur(off=0,r=20), because
-	 * old(off=0,r=10) means that with range=10 the verifier proceeded
-	 * further and found no issues with the program. Now we're in the same
-	 * spot with cur(off=0,r=20), so we're safe too, since anything further
-	 * will only be looking at most 10 bytes after this pointer.
-	 */
-	if (old->off == cur->off && old->range < cur->range)
+/* Returns true if (rold safe implies rcur safe) */
+static bool regsafe(struct bpf_reg_state *rold,
+		    struct bpf_reg_state *rcur,
+		    bool varlen_map_access)
+{
+	if (memcmp(rold, rcur, sizeof(*rold)) == 0)
 		return true;
 
-	/* old(off=20,r=10) is equal to cur(off=22,re=22 or 5 or 0)
-	 * since both cannot be used for packet access and safe(old)
-	 * pointer has smaller off that could be used for further
-	 * 'if (ptr > data_end)' check
-	 * Ex:
-	 * old(off=20,r=10) and cur(off=22,r=22) and cur(off=22,r=0) mean
-	 * that we cannot access the packet.
-	 * The safe range is:
-	 * [ptr, ptr + range - off)
-	 * so whenever off >=range, it means no safe bytes from this pointer.
-	 * When comparing old->off <= cur->off, it means that older code
-	 * went with smaller offset and that offset was later
-	 * used to figure out the safe range after 'if (ptr > data_end)' check
-	 * Say, 'old' state was explored like:
-	 * ... R3(off=0, r=0)
-	 * R4 = R3 + 20
-	 * ... now R4(off=20,r=0)  <-- here
-	 * if (R4 > data_end)
-	 * ... R4(off=20,r=20), R3(off=0,r=20) and R3 can be used to access.
-	 * ... the code further went all the way to bpf_exit.
-	 * Now the 'cur' state at the mark 'here' has R4(off=30,r=0).
-	 * old_R4(off=20,r=0) equal to cur_R4(off=30,r=0), since if the verifier
-	 * goes further, such cur_R4 will give larger safe packet range after
-	 * 'if (R4 > data_end)' and all further insn were already good with r=20,
-	 * so they will be good with r=30 and we can prune the search.
-	 */
-	if (!env->strict_alignment && old->off <= cur->off &&
-	    old->off >= old->range && cur->off >= cur->range)
+	if (rold->type == NOT_INIT)
+		/* explored state can't have used this */
 		return true;
+	if (rcur->type == NOT_INIT)
+		return false;
+	switch (rold->type) {
+	case SCALAR_VALUE:
+		if (rcur->type == SCALAR_VALUE) {
+			/* new val must satisfy old val knowledge */
+			return range_within(rold, rcur) &&
+			       tn_in(rold->align, rcur->align);
+		} else {
+			/* if we knew anything about the old value, we're not
+			 * equal, because we can't know anything about the
+			 * scalar value of the pointer in the new value.
+			 */
+			return rold->min_value == BPF_REGISTER_MIN_RANGE &&
+			       rold->max_value == BPF_REGISTER_MAX_RANGE &&
+			       !~rold->align.mask;
+		}
+	case PTR_TO_MAP_VALUE:
+		if (varlen_map_access) {
+			/* If the new min/max/align satisfy the old ones and
+			 * everything else matches, we are OK.
+			 * We don't care about the 'id' value, because nothing
+			 * uses it for PTR_TO_MAP_VALUE (only for ..._OR_NULL)
+			 */
+			return memcmp(rold, rcur, offsetof(struct bpf_reg_state, id)) == 0 &&
+			       range_within(rold, rcur) &&
+			       tn_in(rold->align, rcur->align);
+		} else {
+			/* If the ranges/align were not the same, but
+			 * everything else was and we didn't do a variable
+			 * access into a map then we are a-ok.
+			 */
+			return memcmp(rold, rcur, offsetof(struct bpf_reg_state, id)) == 0;
+		}
+	case PTR_TO_MAP_VALUE_OR_NULL:
+		/* a PTR_TO_MAP_VALUE with no offset (fixed or variable) can
+		 * safely be used as a PTR_TO_MAP_VALUE_OR_NULL into the same
+		 * map.  (We can't do the same thing for a CONST_PTR_TO_MAP,
+		 * because its map_ptr changed when we NULL-checked it.)
+		 */
+		return rcur->type == PTR_TO_MAP_VALUE &&
+		       rcur->map_ptr == rold->map_ptr &&
+		       rcur->align.mask == 0 &&
+		       rcur->off == 0;
+	case PTR_TO_PACKET:
+		if (rcur->type != PTR_TO_PACKET)
+			return false;
+		/* We must have at least as much range as the old ptr
+		 * did, so that any accesses which were safe before are
+		 * still safe.  This is true even if old range < old off,
+		 * since someone could have accessed through (ptr - k), or
+		 * even done ptr -= k in a register, to get a safe access.
+		 */
+		if (rold->range > rcur->range)
+			return false;
+		/* If the offsets don't match, we can't trust our align;
+		 * nor can we be sure that we won't fall out of range.
+		 */
+		if (rold->off != rcur->off)
+			return false;
+		/* new val must satisfy old val knowledge */
+		return range_within(rold, rcur) &&
+		       tn_in(rold->align, rcur->align);
+	case PTR_TO_CTX:
+	case CONST_PTR_TO_MAP:
+	case PTR_TO_STACK:
+	case PTR_TO_PACKET_END:
+		/* Only valid matches are exact, which memcmp() above
+		 * would have accepted
+		 */
+	default:
+		/* Don't know what's going on, just say it's not safe */
+		return false;
+	}
 
+	/* Shouldn't get here; if we do, say it's not safe */
+	WARN_ON_ONCE(1);
 	return false;
 }
 
@@ -2770,43 +2982,11 @@ static bool states_equal(struct bpf_verifier_env *env,
 			 struct bpf_verifier_state *cur)
 {
 	bool varlen_map_access = env->varlen_map_value_access;
-	struct bpf_reg_state *rold, *rcur;
 	int i;
 
 	for (i = 0; i < MAX_BPF_REG; i++) {
-		rold = &old->regs[i];
-		rcur = &cur->regs[i];
-
-		if (memcmp(rold, rcur, sizeof(*rold)) == 0)
-			continue;
-
-		/* If the ranges were not the same, but everything else was and
-		 * we didn't do a variable access into a map then we are a-ok.
-		 */
-		if (!varlen_map_access &&
-		    memcmp(rold, rcur, offsetofend(struct bpf_reg_state, id)) == 0)
-			continue;
-
-		/* If we didn't map access then again we don't care about the
-		 * mismatched range values and it's ok if our old type was
-		 * UNKNOWN and we didn't go to a NOT_INIT'ed reg.
-		 */
-		if (rold->type == NOT_INIT ||
-		    (!varlen_map_access && rold->type == UNKNOWN_VALUE &&
-		     rcur->type != NOT_INIT))
-			continue;
-
-		/* Don't care about the reg->id in this case. */
-		if (rold->type == PTR_TO_MAP_VALUE_OR_NULL &&
-		    rcur->type == PTR_TO_MAP_VALUE_OR_NULL &&
-		    rold->map_ptr == rcur->map_ptr)
-			continue;
-
-		if (rold->type == PTR_TO_PACKET && rcur->type == PTR_TO_PACKET &&
-		    compare_ptrs_to_packet(env, rold, rcur))
-			continue;
-
-		return false;
+		if (!regsafe(&old->regs[i], &cur->regs[i], varlen_map_access))
+			return false;
 	}
 
 	for (i = 0; i < MAX_BPF_STACK; i++) {
@@ -2821,16 +3001,18 @@ static bool states_equal(struct bpf_verifier_env *env,
 			return false;
 		if (i % BPF_REG_SIZE)
 			continue;
-		if (memcmp(&old->spilled_regs[i / BPF_REG_SIZE],
-			   &cur->spilled_regs[i / BPF_REG_SIZE],
-			   sizeof(old->spilled_regs[0])))
-			/* when explored and current stack slot types are
-			 * the same, check that stored pointers types
+		if (old->stack_slot_type[i] == STACK_MISC)
+			continue;
+		if (!regsafe(&old->spilled_regs[i / BPF_REG_SIZE],
+			     &cur->spilled_regs[i / BPF_REG_SIZE],
+			     varlen_map_access))
+			/* when explored and current stack slot are both storing
+			 * spilled registers, check that stored pointers types
 			 * are the same as well.
 			 * Ex: explored safe path could have stored
-			 * (bpf_reg_state) {.type = PTR_TO_STACK, .imm = -8}
+			 * (bpf_reg_state) {.type = PTR_TO_STACK, .off = -8}
 			 * but current path has stored:
-			 * (bpf_reg_state) {.type = PTR_TO_STACK, .imm = -16}
+			 * (bpf_reg_state) {.type = PTR_TO_STACK, .off = -16}
 			 * such verifier states are not equivalent.
 			 * return false to continue verification of this path
 			 */
@@ -3158,7 +3340,6 @@ static int do_check(struct bpf_verifier_env *env)
 				verbose("invalid BPF_LD mode\n");
 				return -EINVAL;
 			}
-			reset_reg_range_values(regs, insn->dst_reg);
 		} else {
 			verbose("unknown insn class %d\n", class);
 			return -EINVAL;
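The scalar case of regsafe() above can be modelled in user space to see why pruning is sound. An explored state was already proven safe, so a current state may be pruned if it is at least as constrained: its bounds lie inside the old bounds, and every bit the old tnum knew is known to the same value. This is a minimal sketch. The struct scalar_state type and scalar_safe() are hypothetical names, while range_within() and tn_in() copy the logic added by this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* User-space stand-in for struct tnum: a bit set in mask is unknown,
 * otherwise the bit equals the corresponding bit of value.
 */
struct tnum {
	uint64_t value;
	uint64_t mask;
};

/* Hypothetical: only the fields regsafe() consults for two scalars. */
struct scalar_state {
	int64_t min_value;
	uint64_t max_value;
	struct tnum align;
};

/* Same logic as tn_in(): true if every value b allows is allowed by a. */
static bool tn_in(struct tnum a, struct tnum b)
{
	if (b.mask & ~a.mask)
		return false;	/* b leaves unknown a bit that a knows */
	b.value &= ~a.mask;
	return a.value == b.value;
}

/* Same logic as range_within(): cur's bounds lie inside old's. */
static bool range_within(const struct scalar_state *old,
			 const struct scalar_state *cur)
{
	return old->min_value <= cur->min_value &&
	       old->max_value >= cur->max_value;
}

/* Scalar branch of regsafe(): old was verified safe, so cur is safe
 * whenever it is at least as constrained.
 */
static bool scalar_safe(const struct scalar_state *old,
			const struct scalar_state *cur)
{
	return range_within(old, cur) && tn_in(old->align, cur->align);
}
```

For example, an explored state {0, 16, {0, 0x1f}} (a value in [0, 16] with only the low five bits unknown) accepts a current state {4, 8, {0, 0xc}}, but not the other way round, because the reverse direction widens the range.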


* [RFC PATCH net-next 2/5] bpf/verifier: rework value tracking
@ 2017-06-07 14:58   ` Edward Cree via iovisor-dev
From: Edward Cree via iovisor-dev @ 2017-06-07 14:58 UTC
  To: davem, Alexei Starovoitov,
	Alexei Starovoitov, Daniel Borkmann
  Cc: netdev, iovisor-dev, LKML

Tracks value alignment by means of tracking known & unknown bits.
Tightens some min/max value checks and fixes a couple of bugs therein.
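To illustrate what tracking known and unknown bits buys for alignment: a byte loaded from a packet is a tnum whose low eight bits are unknown; shifting it left by two turns the low two bits into known zeros, and adding NET_IP_ALIGN (2) plus a fixed offset of 14 keeps every possible result a multiple of 4. The sketch below re-implements tn_const(), tn_sl(), tn_add() and tn_is_aligned() in user space with the same algorithms as the tnum.c added by this patch; the scenario itself is only an assumed example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same layout as struct tnum: mask bits are unknown, value holds known 1s */
struct tnum {
	uint64_t value;
	uint64_t mask;
};

static struct tnum tn_const(uint64_t value)
{
	return (struct tnum){ value, 0 };
}

/* Mirrors tn_sl(); shift must be < 64.  Shifted-in bits are known 0s. */
static struct tnum tn_sl(struct tnum a, uint8_t shift)
{
	return (struct tnum){ a.value << shift, a.mask << shift };
}

/* Mirrors tn_add() */
static struct tnum tn_add(struct tnum a, struct tnum b)
{
	uint64_t sm = a.mask + b.mask;
	uint64_t sv = a.value + b.value;
	uint64_t sigma = sm + sv;
	uint64_t chi = sigma ^ sv;	/* bits a carry may have changed */
	uint64_t mu = chi | a.mask | b.mask;

	return (struct tnum){ sv & ~mu, mu };
}

/* Mirrors tn_is_aligned(); size must be zero or a power of two */
static bool tn_is_aligned(struct tnum a, uint64_t size)
{
	assert((size & (size - 1)) == 0);
	if (!size)
		return true;
	return !((a.value | a.mask) & (size - 1));
}
```

With byte = {0, 0xff}, tn_is_aligned(tn_add(tn_sl(byte, 2), tn_const(2 + 14)), 4) is true, while the same expression with a fixed offset of 13 is false, which is the kind of question the strict alignment checks in this patch ask.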

Signed-off-by: Edward Cree <ecree-s/n/eUQHGBpZroRs9YW3xA@public.gmane.org>
---
 include/linux/bpf.h          |   34 +-
 include/linux/bpf_verifier.h |   40 +-
 include/linux/tnum.h         |   58 ++
 kernel/bpf/Makefile          |    2 +-
 kernel/bpf/tnum.c            |  163 +++++
 kernel/bpf/verifier.c        | 1641 +++++++++++++++++++++++-------------------
 6 files changed, 1170 insertions(+), 768 deletions(-)
 create mode 100644 include/linux/tnum.h
 create mode 100644 kernel/bpf/tnum.c

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 6bb38d7..5ac19ab 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -115,35 +115,25 @@ enum bpf_access_type {
 };
 
 /* types of values stored in eBPF registers */
+/* Pointer types represent:
+ * pointer
+ * pointer + imm
+ * pointer + (u16) var
+ * pointer + (u16) var + imm
+ * if (range > 0) then [ptr, ptr + range - off) is safe to access
+ * if (id > 0) means that some 'var' was added
+ * if (off > 0) means that 'imm' was added
+ */
 enum bpf_reg_type {
 	NOT_INIT = 0,		 /* nothing was written into register */
-	UNKNOWN_VALUE,		 /* reg doesn't contain a valid pointer */
+	SCALAR_VALUE,		 /* reg doesn't contain a valid pointer */
 	PTR_TO_CTX,		 /* reg points to bpf_context */
 	CONST_PTR_TO_MAP,	 /* reg points to struct bpf_map */
 	PTR_TO_MAP_VALUE,	 /* reg points to map element value */
 	PTR_TO_MAP_VALUE_OR_NULL,/* points to map elem value or NULL */
-	FRAME_PTR,		 /* reg == frame_pointer */
-	PTR_TO_STACK,		 /* reg == frame_pointer + imm */
-	CONST_IMM,		 /* constant integer value */
-
-	/* PTR_TO_PACKET represents:
-	 * skb->data
-	 * skb->data + imm
-	 * skb->data + (u16) var
-	 * skb->data + (u16) var + imm
-	 * if (range > 0) then [ptr, ptr + range - off) is safe to access
-	 * if (id > 0) means that some 'var' was added
-	 * if (off > 0) menas that 'imm' was added
-	 */
-	PTR_TO_PACKET,
+	PTR_TO_STACK,		 /* reg == frame_pointer + offset */
+	PTR_TO_PACKET,		 /* reg points to skb->data */
 	PTR_TO_PACKET_END,	 /* skb->data + headlen */
-
-	/* PTR_TO_MAP_VALUE_ADJ is used for doing pointer math inside of a map
-	 * elem value.  We only allow this if we can statically verify that
-	 * access from this register are going to fall within the size of the
-	 * map element.
-	 */
-	PTR_TO_MAP_VALUE_ADJ,
 };
 
 struct bpf_prog;
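The rule "[ptr, ptr + range - off) is safe to access" can be read as a single predicate on the total fixed offset of an access. The sketch below mirrors the conditions this patch adds to check_packet_access() and __check_packet_access() in verifier.c, where off is the register's fixed offset plus the instruction's offset (check_mem_access() adds reg->off before the bounds check). The function name and flat parameter list are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_PACKET_OFF 0xffff	/* same limit as in verifier.c */

/* min_value: lower bound of the pointer's variable offset
 * off:       reg->off plus the instruction's offset
 * range:     bytes proven to be inside the packet (reg->range)
 */
static bool pkt_access_ok(int64_t min_value, int off, int size,
			  uint32_t range)
{
	if (min_value < 0)
		return false;	/* possibly negative variable offset */
	if (off < 0 || size <= 0 || off > MAX_PACKET_OFF)
		return false;
	return (int64_t)off + size <= (int64_t)range;
}
```

For instance, a pointer with reg->off = 10 and r = 14 may read 4 bytes at instruction offset 0 (10 + 0 + 4 = 14) but not at offset 2, matching the range - off = 4 safe bytes the comment promises.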
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index d5093b5..e341469 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -9,6 +9,7 @@
 
 #include <linux/bpf.h> /* for enum bpf_reg_type */
 #include <linux/filter.h> /* for MAX_BPF_STACK */
+#include <linux/tnum.h>
 
  /* Just some arbitrary values so we can safely do math without overflowing and
   * are obviously wrong for any sort of memory access.
@@ -19,30 +20,39 @@
 struct bpf_reg_state {
 	enum bpf_reg_type type;
 	union {
-		/* valid when type == CONST_IMM | PTR_TO_STACK | UNKNOWN_VALUE */
-		s64 imm;
-
-		/* valid when type == PTR_TO_PACKET* */
-		struct {
-			u16 off;
-			u16 range;
-		};
+		/* valid when type == PTR_TO_PACKET */
+		u32 range;
 
 		/* valid when type == CONST_PTR_TO_MAP | PTR_TO_MAP_VALUE |
 		 *   PTR_TO_MAP_VALUE_OR_NULL
 		 */
 		struct bpf_map *map_ptr;
 	};
+	/* Fixed part of pointer offset, pointer types only */
+	s32 off;
+	/* Used to find other pointers with the same variable offset, so they
+	 * can share range knowledge.
+	 * Exception: for PTR_TO_MAP_VALUE_OR_NULL this is used to share which
+	 * map value we came from, when one is tested for != NULL.  Note that
+	 * this overloading means that we can't do pointer arithmetic on a
+	 * PTR_TO_MAP_VALUE_OR_NULL, we have to NULL-check it _first_.
+	 */
 	u32 id;
+	/* These three fields must be last.  See states_equal() */
+	/* For scalar types (SCALAR_VALUE), this represents our knowledge of
+	 * the actual value.
+	 * For pointer types, this represents the variable part of the offset
+	 * from the pointed-to object, and is shared with all bpf_reg_states
+	 * with the same id as us.
+	 */
+	struct tnum align;
 	/* Used to determine if any memory access using this register will
-	 * result in a bad access. These two fields must be last.
-	 * See states_equal()
+	 * result in a bad access.
+	 * These refer to the same value as align, not necessarily the actual
+	 * contents of the register.
 	 */
-	s64 min_value;
-	u64 max_value;
-	u32 min_align;
-	u32 aux_off;
-	u32 aux_off_align;
+	s64 min_value; /* minimum possible (s64)value */
+	u64 max_value; /* maximum possible (u64)value */
 };
 
 enum bpf_stack_slot_type {
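The split between the fixed 'off' and the variable 'align' fields can be illustrated with a simplified model of the rule in the cover letter: adding a known constant folds into off as long as it fits in an s32, while any other addend is folded into align with tn_add(), gets a fresh id and, for packet pointers, loses its range. This is a sketch under those assumptions only. The struct ptr_reg type, ptr_add_scalar() and the id counter are hypothetical and are not the verifier's pointer-arithmetic code.

```c
#include <assert.h>
#include <stdint.h>

struct tnum {
	uint64_t value;
	uint64_t mask;
};

/* Same algorithm as tn_add() in kernel/bpf/tnum.c */
static struct tnum tn_add(struct tnum a, struct tnum b)
{
	uint64_t sm = a.mask + b.mask;
	uint64_t sv = a.value + b.value;
	uint64_t sigma = sm + sv;
	uint64_t chi = sigma ^ sv;
	uint64_t mu = chi | a.mask | b.mask;

	return (struct tnum){ sv & ~mu, mu };
}

/* Hypothetical cut-down pointer register (packet pointer fields only) */
struct ptr_reg {
	int32_t off;		/* fixed part of the offset */
	uint32_t id;		/* shared by pointers with the same variable part */
	uint32_t range;		/* bytes proven safe, PTR_TO_PACKET only */
	struct tnum align;	/* variable part of the offset */
};

static uint32_t next_id = 1;	/* hypothetical id allocator */

/* ptr += scalar, where s describes the scalar's bits */
static void ptr_add_scalar(struct ptr_reg *p, struct tnum s)
{
	/* treat a known constant as a signed (two's-complement) addend */
	int64_t k = (int64_t)s.value;

	if (!s.mask && k >= INT32_MIN && k <= INT32_MAX) {
		int64_t sum = (int64_t)p->off + k;

		if (sum >= INT32_MIN && sum <= INT32_MAX) {
			p->off = (int32_t)sum;	/* stays in the fixed part */
			return;
		}
	}
	/* unknown or overflowing addend: goes into the variable part */
	p->align = tn_add(p->align, s);
	p->id = next_id++;
	p->range = 0;
}
```

Adding the constant 4 to a packet pointer with r = 14 only moves off; adding a value known to be a multiple of 4 in [0, 60] (tnum {0, 0x3c}) leaves off alone, sets align to {0, 0x3c}, assigns a new id and clears the range.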
diff --git a/include/linux/tnum.h b/include/linux/tnum.h
new file mode 100644
index 0000000..d9279a6
--- /dev/null
+++ b/include/linux/tnum.h
@@ -0,0 +1,58 @@
+/* tnum: tracked (or tristate) numbers
+ *
+ * A tnum tracks knowledge about the bits of a value.  Each bit can be either
+ * known (0 or 1), or unknown (x).  Arithmetic operations on tnums will
+ * propagate the unknown bits such that the tnum result represents all the
+ * possible results for possible values of the operands.
+ */
+#include <linux/types.h>
+
+struct tnum {
+	u64 value;
+	u64 mask;
+};
+
+/* Constructors */
+/* Represent a known constant as a tnum. */
+struct tnum tn_const(u64 value);
+/* A completely unknown value */
+extern const struct tnum tn_unknown;
+
+/* Arithmetic and logical ops */
+/* Shift a tnum left (by a fixed shift) */
+struct tnum tn_sl(struct tnum a, u8 shift);
+/* Shift a tnum right (by a fixed shift) */
+struct tnum tn_sr(struct tnum a, u8 shift);
+/* Add two tnums, return %a + %b */
+struct tnum tn_add(struct tnum a, struct tnum b);
+/* Subtract two tnums, return %a - %b */
+struct tnum tn_sub(struct tnum a, struct tnum b);
+/* Bitwise-AND, return %a & %b */
+struct tnum tn_and(struct tnum a, struct tnum b);
+/* Bitwise-OR, return %a | %b */
+struct tnum tn_or(struct tnum a, struct tnum b);
+/* Bitwise-XOR, return %a ^ %b */
+struct tnum tn_xor(struct tnum a, struct tnum b);
+/* Multiply two tnums, return %a * %b */
+struct tnum tn_mul(struct tnum a, struct tnum b);
+
+/* Return a tnum representing numbers satisfying both %a and %b */
+struct tnum tn_intersect(struct tnum a, struct tnum b);
+
+/* Returns true if %a is known to be a multiple of %size.
+ * %size must be a power of two.
+ */
+bool tn_is_aligned(struct tnum a, u64 size);
+
+/* Returns true if %b represents a subset of %a. */
+bool tn_in(struct tnum a, struct tnum b);
+
+/* Formatting functions.  These have snprintf-like semantics: they will write
+ * up to size bytes (including the terminating NUL byte), and return the number
+ * of bytes (excluding the terminating NUL) which would have been written had
+ * sufficient space been available.  (Thus tn_sbin always returns 64.)
+ */
+/* Format a tnum as a pair of hex numbers (value; mask) */
+int tn_strn(char *str, size_t size, struct tnum a);
+/* Format a tnum as tristate binary expansion */
+int tn_sbin(char *str, size_t size, struct tnum a);
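The snprintf-like contract described above can be checked against a user-space port of tn_sbin(). The loop is copied from the kernel/bpf/tnum.c added by this patch: it emits the most significant bit first and truncates to size - 1 characters. The early return for size == 0 is an addition of this sketch, so that nothing is written when there is no room (without it, size - 1 wraps and the terminator would be stored at str[64]).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct tnum {
	uint64_t value;
	uint64_t mask;
};

/* Writes the 64-bit tristate expansion, MSB first, truncated to
 * size - 1 characters plus a NUL; always returns 64.
 */
static int tn_sbin(char *str, size_t size, struct tnum a)
{
	size_t n;

	if (!size)
		return 64;	/* sketch-only guard: nothing to write */
	for (n = 64; n; n--) {	/* first iteration handles bit 0 */
		if (n < size)
			str[n - 1] = (a.mask & 1) ? 'x' :
				     (a.value & 1) ? '1' : '0';
		a.mask >>= 1;
		a.value >>= 1;
	}
	str[size - 1 < 64 ? size - 1 : 64] = '\0';
	return 64;
}
```

For the tnum {0x5, 0x2} a 65-byte buffer receives 61 '0' characters followed by "1x1", while a 4-byte buffer receives only the top three bits, "000"; both calls return 64.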
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index e1e5e65..df14def 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -1,6 +1,6 @@
 obj-y := core.o
 
-obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o
+obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o
 obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o
 ifeq ($(CONFIG_PERF_EVENTS),y)
 obj-$(CONFIG_BPF_SYSCALL) += stackmap.o
diff --git a/kernel/bpf/tnum.c b/kernel/bpf/tnum.c
new file mode 100644
index 0000000..cd167f4
--- /dev/null
+++ b/kernel/bpf/tnum.c
@@ -0,0 +1,163 @@
+/* tnum: tracked (or tristate) numbers
+ *
+ * A tnum tracks knowledge about the bits of a value.  Each bit can be either
+ * known (0 or 1), or unknown (x).  Arithmetic operations on tnums will
+ * propagate the unknown bits such that the tnum result represents all the
+ * possible results for possible values of the operands.
+ */
+#include <linux/kernel.h>
+#include <linux/tnum.h>
+
+#define TNUM(_v, _m)	(struct tnum){.value = _v, .mask = _m}
+/* A completely unknown value */
+const struct tnum tn_unknown = { .value = 0, .mask = -1 };
+
+struct tnum tn_const(u64 value)
+{
+	return TNUM(value, 0);
+}
+
+struct tnum tn_phi(struct tnum a, struct tnum b)
+{
+	u64 delta, mu;
+
+	delta = a.value ^ b.value;
+	mu = a.mask | b.mask | delta;
+	return TNUM(a.value & ~mu, mu);
+}
+
+struct tnum tn_sl(struct tnum a, u8 shift)
+{
+	return TNUM(a.value << shift, a.mask << shift);
+}
+
+struct tnum tn_sr(struct tnum a, u8 shift)
+{
+	return TNUM(a.value >> shift, a.mask >> shift);
+}
+
+struct tnum tn_add(struct tnum a, struct tnum b)
+{
+	u64 sm, sv, sigma, chi, mu;
+
+	sm = a.mask + b.mask;
+	sv = a.value + b.value;
+	sigma = sm + sv;
+	chi = sigma ^ sv;
+	mu = chi | a.mask | b.mask;
+	return TNUM(sv & ~mu, mu);
+}
+
+struct tnum tn_sub(struct tnum a, struct tnum b)
+{
+	u64 dv, alpha, beta, chi, mu;
+
+	dv = a.value - b.value;
+	alpha = dv + a.mask;
+	beta = dv - b.mask;
+	chi = alpha ^ beta;
+	mu = chi | a.mask | b.mask;
+	return TNUM(dv & ~mu, mu);
+}
+
+struct tnum tn_and(struct tnum a, struct tnum b)
+{
+	u64 alpha, beta, v;
+
+	alpha = a.value | a.mask;
+	beta = b.value | b.mask;
+	v = a.value & b.value;
+	return TNUM(v, alpha & beta & ~v);
+}
+
+struct tnum tn_or(struct tnum a, struct tnum b)
+{
+	u64 v, mu;
+
+	v = a.value | b.value;
+	mu = a.mask | b.mask;
+	return TNUM(v, mu & ~v);
+}
+
+struct tnum tn_xor(struct tnum a, struct tnum b)
+{
+	u64 v, mu;
+
+	v = a.value ^ b.value;
+	mu = a.mask | b.mask;
+	return TNUM(v & ~mu, mu);
+}
+
+/* half-multiply add: acc += (unknown * mask * value) */
+static struct tnum hma(struct tnum acc, u64 value, u64 mask)
+{
+	while (mask) {
+		if (mask & 1)
+			acc = tn_add(acc, TNUM(0, value));
+		mask >>= 1;
+		value <<= 1;
+	}
+	return acc;
+}
+
+struct tnum tn_mul(struct tnum a, struct tnum b)
+{
+	struct tnum acc;
+	u64 pi;
+
+	pi = a.value * b.value;
+	acc = hma(TNUM(pi, 0), a.mask, b.mask | b.value);
+	return hma(acc, b.mask, a.value);
+}
+
+/* Note that if a and b disagree - i.e. one has a 'known 1' where the other has
+ * a 'known 0' - this will return a 'known 1' for that bit.
+ */
+struct tnum tn_intersect(struct tnum a, struct tnum b)
+{
+	u64 v, mu;
+
+	v = a.value | b.value;
+	mu = a.mask & b.mask;
+	return TNUM(v & ~mu, mu);
+}
+
+bool tn_is_aligned(struct tnum a, u64 size)
+{
+	if (!size)
+		return true;
+	return !((a.value | a.mask) & (size - 1));
+}
+
+bool tn_in(struct tnum a, struct tnum b)
+{
+	if (b.mask & ~a.mask)
+		return false;
+	b.value &= ~a.mask;
+	return a.value == b.value;
+}
+
+int tn_strn(char *str, size_t size, struct tnum a)
+{
+	return snprintf(str, size, "(%#llx; %#llx)", a.value, a.mask);
+}
+
+int tn_sbin(char *str, size_t size, struct tnum a)
+{
+	size_t n;
+
+	for (n = 64; n; n--) {
+		if (n < size) {
+			if (a.mask & 1)
+				str[n - 1] = 'x';
+			else if (a.value & 1)
+				str[n - 1] = '1';
+			else
+				str[n - 1] = '0';
+		}
+		a.mask >>= 1;
+		a.value >>= 1;
+	}
+	str[min(size - 1, (size_t)64)] = 0;
+	return 64;
+}
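One way to gain confidence in tn_add() is to check it exhaustively on small operands: every concrete sum, for values drawn from what the two tnums allow, must be a member of the result. The harness below copies tn_add() from this file; tn_member() and add_is_sound() are hypothetical helpers that enumerate all submasks of each operand's mask. It also shows the precision lost to carries: {0 or 1} + 1 yields a tnum with the low two bits unknown, a superset of {1, 2}.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct tnum {
	uint64_t value;
	uint64_t mask;
};

/* Same algorithm as tn_add() above */
static struct tnum tn_add(struct tnum a, struct tnum b)
{
	uint64_t sm = a.mask + b.mask;
	uint64_t sv = a.value + b.value;
	uint64_t sigma = sm + sv;
	uint64_t chi = sigma ^ sv;
	uint64_t mu = chi | a.mask | b.mask;

	return (struct tnum){ sv & ~mu, mu };
}

/* True if the concrete value x is one of the values t represents */
static bool tn_member(struct tnum t, uint64_t x)
{
	return (x & ~t.mask) == t.value;
}

/* Exhaustive soundness check; only practical for masks with few bits */
static bool add_is_sound(struct tnum a, struct tnum b)
{
	struct tnum r = tn_add(a, b);
	uint64_t sa = a.mask;

	for (;;) {
		uint64_t sb = b.mask;

		for (;;) {
			/* value | submask is one concrete member */
			if (!tn_member(r, (a.value | sa) + (b.value | sb)))
				return false;
			if (!sb)
				break;
			sb = (sb - 1) & b.mask;	/* next smaller submask */
		}
		if (!sa)
			break;
		sa = (sa - 1) & a.mask;
	}
	return true;
}
```

Because each submask enumeration visits 2^popcount(mask) values, keep the operand masks small when using a harness like this.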
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 339c8a1..dd06e4e 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -61,12 +61,12 @@
  * (and -20 constant is saved for further stack bounds checking).
  * Meaning that this reg is a pointer to stack plus known immediate constant.
  *
- * Most of the time the registers have UNKNOWN_VALUE type, which
+ * Most of the time the registers have SCALAR_VALUE type, which
  * means the register has some value, but it's not a valid pointer.
- * (like pointer plus pointer becomes UNKNOWN_VALUE type)
+ * (like pointer plus pointer becomes SCALAR_VALUE type)
  *
  * When verifier sees load or store instructions the type of base register
- * can be: PTR_TO_MAP_VALUE, PTR_TO_CTX, FRAME_PTR. These are three pointer
+ * can be: PTR_TO_MAP_VALUE, PTR_TO_CTX, PTR_TO_STACK. These are three pointer
  * types recognized by check_mem_access() function.
  *
  * PTR_TO_MAP_VALUE means that this register is pointing to 'map element value'
@@ -180,15 +180,12 @@ static __printf(1, 2) void verbose(const char *fmt, ...)
 /* string representation of 'enum bpf_reg_type' */
 static const char * const reg_type_str[] = {
 	[NOT_INIT]		= "?",
-	[UNKNOWN_VALUE]		= "inv",
+	[SCALAR_VALUE]		= "inv",
 	[PTR_TO_CTX]		= "ctx",
 	[CONST_PTR_TO_MAP]	= "map_ptr",
 	[PTR_TO_MAP_VALUE]	= "map_value",
 	[PTR_TO_MAP_VALUE_OR_NULL] = "map_value_or_null",
-	[PTR_TO_MAP_VALUE_ADJ]	= "map_value_adj",
-	[FRAME_PTR]		= "fp",
 	[PTR_TO_STACK]		= "fp",
-	[CONST_IMM]		= "imm",
 	[PTR_TO_PACKET]		= "pkt",
 	[PTR_TO_PACKET_END]	= "pkt_end",
 };
@@ -221,32 +218,36 @@ static void print_verifier_state(struct bpf_verifier_state *state)
 		if (t == NOT_INIT)
 			continue;
 		verbose(" R%d=%s", i, reg_type_str[t]);
-		if (t == CONST_IMM || t == PTR_TO_STACK)
-			verbose("%lld", reg->imm);
-		else if (t == PTR_TO_PACKET)
-			verbose("(id=%d,off=%d,r=%d)",
-				reg->id, reg->off, reg->range);
-		else if (t == UNKNOWN_VALUE && reg->imm)
-			verbose("%lld", reg->imm);
-		else if (t == CONST_PTR_TO_MAP || t == PTR_TO_MAP_VALUE ||
-			 t == PTR_TO_MAP_VALUE_OR_NULL ||
-			 t == PTR_TO_MAP_VALUE_ADJ)
-			verbose("(ks=%d,vs=%d,id=%u)",
-				reg->map_ptr->key_size,
-				reg->map_ptr->value_size,
-				reg->id);
-		if (reg->min_value != BPF_REGISTER_MIN_RANGE)
-			verbose(",min_value=%lld",
-				(long long)reg->min_value);
-		if (reg->max_value != BPF_REGISTER_MAX_RANGE)
-			verbose(",max_value=%llu",
-				(unsigned long long)reg->max_value);
-		if (reg->min_align)
-			verbose(",min_align=%u", reg->min_align);
-		if (reg->aux_off)
-			verbose(",aux_off=%u", reg->aux_off);
-		if (reg->aux_off_align)
-			verbose(",aux_off_align=%u", reg->aux_off_align);
+		if ((t == SCALAR_VALUE || t == PTR_TO_STACK) &&
+		    !reg->align.mask) {
+			/* reg->off should be 0 for SCALAR_VALUE */
+			verbose("%lld", reg->align.value + reg->off);
+		} else {
+			verbose("(id=%d", reg->id);
+			if (t != SCALAR_VALUE)
+				verbose(",off=%d", reg->off);
+			if (t == PTR_TO_PACKET)
+				verbose(",r=%d", reg->range);
+			else if (t == CONST_PTR_TO_MAP ||
+				 t == PTR_TO_MAP_VALUE ||
+				 t == PTR_TO_MAP_VALUE_OR_NULL)
+				verbose(",ks=%d,vs=%d",
+					reg->map_ptr->key_size,
+					reg->map_ptr->value_size);
+			if (reg->min_value != BPF_REGISTER_MIN_RANGE)
+				verbose(",min_value=%lld",
+					(long long)reg->min_value);
+			if (reg->max_value != BPF_REGISTER_MAX_RANGE)
+				verbose(",max_value=%llu",
+					(unsigned long long)reg->max_value);
+			if (~reg->align.mask) {
+				char tn_buf[48];
+
+				tn_strn(tn_buf, sizeof(tn_buf), reg->align);
+				verbose(",align=%s", tn_buf);
+			}
+			verbose(")");
+		}
 	}
 	for (i = 0; i < MAX_BPF_STACK; i += BPF_REG_SIZE) {
 		if (state->stack_slot_type[i] == STACK_SPILL)
@@ -463,55 +464,59 @@ static const int caller_saved[CALLER_SAVED_REGS] = {
 	BPF_REG_0, BPF_REG_1, BPF_REG_2, BPF_REG_3, BPF_REG_4, BPF_REG_5
 };
 
-static void mark_reg_not_init(struct bpf_reg_state *regs, u32 regno)
+/* Mark the 'variable offset' part of a register as zero.  This should be
+ * used only on registers holding a pointer type.
+ */
+static void __mark_reg_known_zero(struct bpf_reg_state *reg)
 {
-	BUG_ON(regno >= MAX_BPF_REG);
-
-	memset(&regs[regno], 0, sizeof(regs[regno]));
-	regs[regno].type = NOT_INIT;
-	regs[regno].min_value = BPF_REGISTER_MIN_RANGE;
-	regs[regno].max_value = BPF_REGISTER_MAX_RANGE;
+	reg->align = tn_const(0);
+	reg->min_value = 0;
+	reg->max_value = 0;
 }
 
-static void init_reg_state(struct bpf_reg_state *regs)
+static void mark_reg_known_zero(struct bpf_reg_state *regs, u32 regno)
 {
-	int i;
-
-	for (i = 0; i < MAX_BPF_REG; i++)
-		mark_reg_not_init(regs, i);
-
-	/* frame pointer */
-	regs[BPF_REG_FP].type = FRAME_PTR;
-
-	/* 1st arg to a function */
-	regs[BPF_REG_1].type = PTR_TO_CTX;
+	BUG_ON(regno >= MAX_BPF_REG);
+	__mark_reg_known_zero(regs + regno);
 }
 
-static void __mark_reg_unknown_value(struct bpf_reg_state *regs, u32 regno)
+/* Mark a register as having a completely unknown (scalar) value. */
+static void __mark_reg_unknown(struct bpf_reg_state *reg)
 {
-	regs[regno].type = UNKNOWN_VALUE;
-	regs[regno].id = 0;
-	regs[regno].imm = 0;
+	reg->type = SCALAR_VALUE;
+	reg->id = 0;
+	reg->off = 0;
+	reg->align = tn_unknown;
+	reg->min_value = BPF_REGISTER_MIN_RANGE;
+	reg->max_value = BPF_REGISTER_MAX_RANGE;
 }
 
-static void mark_reg_unknown_value(struct bpf_reg_state *regs, u32 regno)
+static void mark_reg_unknown(struct bpf_reg_state *regs, u32 regno)
 {
 	BUG_ON(regno >= MAX_BPF_REG);
-	__mark_reg_unknown_value(regs, regno);
+	__mark_reg_unknown(regs + regno);
 }
 
-static void reset_reg_range_values(struct bpf_reg_state *regs, u32 regno)
+static void mark_reg_not_init(struct bpf_reg_state *regs, u32 regno)
 {
-	regs[regno].min_value = BPF_REGISTER_MIN_RANGE;
-	regs[regno].max_value = BPF_REGISTER_MAX_RANGE;
-	regs[regno].min_align = 0;
+	mark_reg_unknown(regs, regno);
+	regs[regno].type = NOT_INIT;
 }
 
-static void mark_reg_unknown_value_and_range(struct bpf_reg_state *regs,
-					     u32 regno)
+static void init_reg_state(struct bpf_reg_state *regs)
 {
-	mark_reg_unknown_value(regs, regno);
-	reset_reg_range_values(regs, regno);
+	int i;
+
+	for (i = 0; i < MAX_BPF_REG; i++)
+		mark_reg_not_init(regs, i);
+
+	/* frame pointer */
+	regs[BPF_REG_FP].type = PTR_TO_STACK;
+	mark_reg_known_zero(regs, BPF_REG_FP);
+
+	/* 1st arg to a function */
+	regs[BPF_REG_1].type = PTR_TO_CTX;
+	mark_reg_known_zero(regs, BPF_REG_1);
 }
 
 enum reg_arg_type {
@@ -541,7 +546,7 @@ static int check_reg_arg(struct bpf_reg_state *regs, u32 regno,
 			return -EACCES;
 		}
 		if (t == DST_OP)
-			mark_reg_unknown_value(regs, regno);
+			mark_reg_unknown(regs, regno);
 	}
 	return 0;
 }
@@ -565,12 +570,10 @@ static bool is_spillable_regtype(enum bpf_reg_type type)
 	switch (type) {
 	case PTR_TO_MAP_VALUE:
 	case PTR_TO_MAP_VALUE_OR_NULL:
-	case PTR_TO_MAP_VALUE_ADJ:
 	case PTR_TO_STACK:
 	case PTR_TO_CTX:
 	case PTR_TO_PACKET:
 	case PTR_TO_PACKET_END:
-	case FRAME_PTR:
 	case CONST_PTR_TO_MAP:
 		return true;
 	default:
@@ -650,14 +653,13 @@ static int check_stack_read(struct bpf_verifier_state *state, int off, int size,
 		}
 		if (value_regno >= 0)
 			/* have read misc data from the stack */
-			mark_reg_unknown_value_and_range(state->regs,
-							 value_regno);
+			mark_reg_unknown(state->regs, value_regno);
 		return 0;
 	}
 }
 
 /* check read/write into map element returned by bpf_map_lookup_elem() */
-static int check_map_access(struct bpf_verifier_env *env, u32 regno, int off,
+static int __check_map_access(struct bpf_verifier_env *env, u32 regno, int off,
 			    int size)
 {
 	struct bpf_map *map = env->cur_state.regs[regno].map_ptr;
@@ -670,22 +672,25 @@ static int check_map_access(struct bpf_verifier_env *env, u32 regno, int off,
 	return 0;
 }
 
-/* check read/write into an adjusted map element */
-static int check_map_access_adj(struct bpf_verifier_env *env, u32 regno,
+/* check read/write into a map element with possible variable offset */
+static int check_map_access(struct bpf_verifier_env *env, u32 regno,
 				int off, int size)
 {
 	struct bpf_verifier_state *state = &env->cur_state;
 	struct bpf_reg_state *reg = &state->regs[regno];
 	int err;
 
-	/* We adjusted the register to this map value, so we
-	 * need to change off and size to min_value and max_value
-	 * respectively to make sure our theoretical access will be
-	 * safe.
+	/* We may have adjusted the register to this map value, so we
+	 * need to try adding each of min_value and max_value to off
+	 * to make sure our theoretical access will be safe.
 	 */
 	if (log_level)
 		print_verifier_state(state);
-	env->varlen_map_value_access = true;
+	/* If the offset is variable, we will need to be stricter in state
+	 * pruning from now on.
+	 */
+	if (reg->align.mask)
+		env->varlen_map_value_access = true;
 	/* The minimum value is only important with signed
 	 * comparisons where we can't assume the floor of a
 	 * value is 0.  If we are using signed variables for our
@@ -697,10 +702,9 @@ static int check_map_access_adj(struct bpf_verifier_env *env, u32 regno,
 			regno);
 		return -EACCES;
 	}
-	err = check_map_access(env, regno, reg->min_value + off, size);
+	err = __check_map_access(env, regno, reg->min_value + off, size);
 	if (err) {
-		verbose("R%d min value is outside of the array range\n",
-			regno);
+		verbose("R%d min value is outside of the array range\n", regno);
 		return err;
 	}
 
@@ -712,7 +716,10 @@ static int check_map_access_adj(struct bpf_verifier_env *env, u32 regno,
 			regno);
 		return -EACCES;
 	}
-	return check_map_access(env, regno, reg->max_value + off, size);
+	err = __check_map_access(env, regno, reg->max_value + off, size);
+	if (err)
+		verbose("R%d max value is outside of the array range\n", regno);
+	return err;
 }
 
 #define MAX_PACKET_OFF 0xffff
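check_map_access() now tests the access twice, once with reg->min_value and once with reg->max_value added to the offset, so a variable index is accepted only if both extremes stay inside the map value. A user-space sketch of that reasoning follows. The bounds test in in_value() is an assumption about what __check_map_access() enforces (its body is not shown in this hunk), the overflow guard is specific to the sketch, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bounds test: the window [off, off + size) lies inside the value */
static bool in_value(uint32_t value_size, int64_t off, int size)
{
	return off >= 0 && size > 0 && off + size <= (int64_t)value_size;
}

/* Hypothetical model of check_map_access(): try both extremes of the
 * variable offset.  in_value() accepts a contiguous range of offsets,
 * so passing at min and max covers every value in between.
 */
static bool map_access_ok(uint32_t value_size, int64_t min_value,
			  uint64_t max_value, int off, int size)
{
	if (min_value < 0)
		return false;	/* possibly negative index */
	if (min_value > INT32_MAX || max_value > INT32_MAX)
		return false;	/* sketch-only guard against overflow below */
	return in_value(value_size, min_value + off, size) &&
	       in_value(value_size, (int64_t)max_value + off, size);
}
```

With a 16-byte value, a 4-byte access through an index in [0, 12] passes, while an index that may reach 13 fails the max_value check, and an index that may be negative is rejected before either bound is tried.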
@@ -742,14 +749,14 @@ static bool may_access_direct_pkt_data(struct bpf_verifier_env *env,
 	}
 }
 
-static int check_packet_access(struct bpf_verifier_env *env, u32 regno, int off,
-			       int size)
+static int __check_packet_access(struct bpf_verifier_env *env, u32 regno,
+				 int off, int size)
 {
 	struct bpf_reg_state *regs = env->cur_state.regs;
 	struct bpf_reg_state *reg = &regs[regno];
 
-	off += reg->off;
-	if (off < 0 || size <= 0 || off + size > reg->range) {
+	if (off < 0 || size <= 0 || off > MAX_PACKET_OFF ||
+	    off + size > reg->range) {
 		verbose("invalid access to packet, off=%d size=%d, R%d(id=%d,off=%d,r=%d)\n",
 			off, size, regno, reg->id, reg->off, reg->range);
 		return -EACCES;
@@ -757,7 +764,35 @@ static int check_packet_access(struct bpf_verifier_env *env, u32 regno, int off,
 	return 0;
 }
 
-/* check access to 'struct bpf_context' fields */
+static int check_packet_access(struct bpf_verifier_env *env, u32 regno, int off,
+			       int size)
+{
+	struct bpf_reg_state *regs = env->cur_state.regs;
+	struct bpf_reg_state *reg = &regs[regno];
+	int err;
+
+	/* We may have added a variable offset to the packet pointer; but any
+	 * reg->range we have comes after that.  We are only checking the fixed
+	 * offset.
+	 */
+
+	/* We don't allow negative numbers, because we aren't tracking enough
+	 * detail to prove they're safe.
+	 */
+	if (reg->min_value < 0) {
+		verbose("R%d min value is negative, either use unsigned index or do a if (index >=0) check.\n",
+			regno);
+		return -EACCES;
+	}
+	err = __check_packet_access(env, regno, off, size);
+	if (err) {
+		verbose("R%d offset is outside of the packet\n", regno);
+		return err;
+	}
+	return err;
+}
+
+/* check access to 'struct bpf_context' fields.  Supports fixed offsets only */
 static int check_ctx_access(struct bpf_verifier_env *env, int off, int size,
 			    enum bpf_access_type t, enum bpf_reg_type *reg_type)
 {
@@ -782,35 +817,19 @@ static bool is_pointer_value(struct bpf_verifier_env *env, int regno)
 	if (env->allow_ptr_leaks)
 		return false;
 
-	switch (env->cur_state.regs[regno].type) {
-	case UNKNOWN_VALUE:
-	case CONST_IMM:
-		return false;
-	default:
-		return true;
-	}
+	return env->cur_state.regs[regno].type != SCALAR_VALUE;
 }
 
 static int check_pkt_ptr_alignment(const struct bpf_reg_state *reg,
 				   int off, int size, bool strict)
 {
+	struct tnum reg_off;
 	int ip_align;
-	int reg_off;
 
 	/* Byte size accesses are always allowed. */
 	if (!strict || size == 1)
 		return 0;
 
-	reg_off = reg->off;
-	if (reg->id) {
-		if (reg->aux_off_align % size) {
-			verbose("Packet access is only %u byte aligned, %d byte access not allowed\n",
-				reg->aux_off_align, size);
-			return -EACCES;
-		}
-		reg_off += reg->aux_off;
-	}
-
 	/* For platforms that do not have a Kconfig enabling
 	 * CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS the value of
 	 * NET_IP_ALIGN is universally set to '2'.  And on platforms
@@ -820,20 +839,37 @@ static int check_pkt_ptr_alignment(const struct bpf_reg_state *reg,
 	 * unconditional IP align value of '2'.
 	 */
 	ip_align = 2;
-	if ((ip_align + reg_off + off) % size != 0) {
-		verbose("misaligned packet access off %d+%d+%d size %d\n",
-			ip_align, reg_off, off, size);
+
+	reg_off = tn_add(reg->align, tn_const(ip_align + reg->off + off));
+	if (!tn_is_aligned(reg_off, size)) {
+		char tn_buf[48];
+
+		tn_strn(tn_buf, sizeof(tn_buf), reg->align);
+		verbose("misaligned packet access off %d+%s+%d+%d size %d\n",
+			ip_align, tn_buf, reg->off, off, size);
 		return -EACCES;
 	}
 
 	return 0;
 }
 
-static int check_val_ptr_alignment(const struct bpf_reg_state *reg,
-				   int size, bool strict)
+static int check_generic_ptr_alignment(const struct bpf_reg_state *reg,
+				       const char *pointer_desc,
+				       int off, int size, bool strict)
 {
-	if (strict && size != 1) {
-		verbose("Unknown alignment. Only byte-sized access allowed in value access.\n");
+	struct tnum reg_off;
+
+	/* Byte size accesses are always allowed. */
+	if (!strict || size == 1)
+		return 0;
+
+	reg_off = tn_add(reg->align, tn_const(reg->off + off));
+	if (!tn_is_aligned(reg_off, size)) {
+		char tn_buf[48];
+
+		tn_strn(tn_buf, sizeof(tn_buf), reg->align);
+		verbose("misaligned %saccess off %s+%d+%d size %d\n",
+			pointer_desc, tn_buf, reg->off, off, size);
 		return -EACCES;
 	}
 
@@ -845,21 +881,25 @@ static int check_ptr_alignment(struct bpf_verifier_env *env,
 			       int off, int size)
 {
 	bool strict = env->strict_alignment;
+	const char *pointer_desc = "";
 
 	switch (reg->type) {
 	case PTR_TO_PACKET:
+		/* special case, because of NET_IP_ALIGN */
 		return check_pkt_ptr_alignment(reg, off, size, strict);
-	case PTR_TO_MAP_VALUE_ADJ:
-		return check_val_ptr_alignment(reg, size, strict);
+	case PTR_TO_MAP_VALUE:
+		pointer_desc = "value ";
+		break;
+	case PTR_TO_CTX:
+		pointer_desc = "context ";
+		break;
+	case PTR_TO_STACK:
+		pointer_desc = "stack ";
+		break;
 	default:
-		if (off % size != 0) {
-			verbose("misaligned access off %d size %d\n",
-				off, size);
-			return -EACCES;
-		}
-
-		return 0;
+		break;
 	}
+	return check_generic_ptr_alignment(reg, pointer_desc, off, size, strict);
 }
 
 /* check whether memory at (regno + off) is accessible for t = (read | write)
@@ -876,52 +916,78 @@ static int check_mem_access(struct bpf_verifier_env *env, u32 regno, int off,
 	struct bpf_reg_state *reg = &state->regs[regno];
 	int size, err = 0;
 
-	if (reg->type == PTR_TO_STACK)
-		off += reg->imm;
-
 	size = bpf_size_to_bytes(bpf_size);
 	if (size < 0)
 		return size;
 
+	/* alignment checks will add in reg->off themselves */
 	err = check_ptr_alignment(env, reg, off, size);
 	if (err)
 		return err;
 
-	if (reg->type == PTR_TO_MAP_VALUE ||
-	    reg->type == PTR_TO_MAP_VALUE_ADJ) {
+	/* for access checks, reg->off is just part of off */
+	off += reg->off;
+
+	if (reg->type == PTR_TO_MAP_VALUE) {
 		if (t == BPF_WRITE && value_regno >= 0 &&
 		    is_pointer_value(env, value_regno)) {
 			verbose("R%d leaks addr into map\n", value_regno);
 			return -EACCES;
 		}
 
-		if (reg->type == PTR_TO_MAP_VALUE_ADJ)
-			err = check_map_access_adj(env, regno, off, size);
-		else
-			err = check_map_access(env, regno, off, size);
+		err = check_map_access(env, regno, off, size);
 		if (!err && t == BPF_READ && value_regno >= 0)
-			mark_reg_unknown_value_and_range(state->regs,
-							 value_regno);
+			mark_reg_unknown(state->regs, value_regno);
 
 	} else if (reg->type == PTR_TO_CTX) {
-		enum bpf_reg_type reg_type = UNKNOWN_VALUE;
+		enum bpf_reg_type reg_type = SCALAR_VALUE;
 
 		if (t == BPF_WRITE && value_regno >= 0 &&
 		    is_pointer_value(env, value_regno)) {
 			verbose("R%d leaks addr into ctx\n", value_regno);
 			return -EACCES;
 		}
+		/* ctx accesses must be at a fixed offset, so that we can
+		 * determine what type of data were returned.
+		 */
+		if (reg->align.mask) {
+			char tn_buf[48];
+
+			tn_strn(tn_buf, sizeof(tn_buf), reg->align);
+			verbose("variable ctx access align=%s off=%d size=%d\n",
+				tn_buf, off, size);
+			return -EACCES;
+		}
+		off += reg->align.value;
 		err = check_ctx_access(env, off, size, t, &reg_type);
 		if (!err && t == BPF_READ && value_regno >= 0) {
-			mark_reg_unknown_value_and_range(state->regs,
-							 value_regno);
-			/* note that reg.[id|off|range] == 0 */
+			/* ctx access returns either a scalar, or a
+			 * PTR_TO_PACKET[_END].  In the latter case, we know
+			 * the offset is zero.
+			 */
+			if (reg_type == SCALAR_VALUE)
+				mark_reg_unknown(state->regs, value_regno);
+			else
+				mark_reg_known_zero(state->regs, value_regno);
+			state->regs[value_regno].id = 0;
+			state->regs[value_regno].off = 0;
+			state->regs[value_regno].range = 0;
 			state->regs[value_regno].type = reg_type;
-			state->regs[value_regno].aux_off = 0;
-			state->regs[value_regno].aux_off_align = 0;
 		}
 
-	} else if (reg->type == FRAME_PTR || reg->type == PTR_TO_STACK) {
+	} else if (reg->type == PTR_TO_STACK) {
+		/* stack accesses must be at a fixed offset, so that we can
+		 * determine what type of data were returned.
+		 */
+		if (reg->align.mask) {
+			char tn_buf[48];
+
+			tn_strn(tn_buf, sizeof(tn_buf), reg->align);
+			verbose("variable stack access align=%s off=%d size=%d\n",
+				tn_buf, off, size);
+			return -EACCES;
+		}
+		off += reg->align.value;
 		if (off >= 0 || off < -MAX_BPF_STACK) {
 			verbose("invalid stack off=%d size=%d\n", off, size);
 			return -EACCES;
@@ -937,7 +1003,7 @@ static int check_mem_access(struct bpf_verifier_env *env, u32 regno, int off,
 		} else {
 			err = check_stack_read(state, off, size, value_regno);
 		}
-	} else if (state->regs[regno].type == PTR_TO_PACKET) {
+	} else if (reg->type == PTR_TO_PACKET) {
 		if (t == BPF_WRITE && !may_access_direct_pkt_data(env, NULL, t)) {
 			verbose("cannot write into packet\n");
 			return -EACCES;
@@ -949,21 +1015,23 @@ static int check_mem_access(struct bpf_verifier_env *env, u32 regno, int off,
 		}
 		err = check_packet_access(env, regno, off, size);
 		if (!err && t == BPF_READ && value_regno >= 0)
-			mark_reg_unknown_value_and_range(state->regs,
-							 value_regno);
+			mark_reg_unknown(state->regs, value_regno);
 	} else {
 		verbose("R%d invalid mem access '%s'\n",
 			regno, reg_type_str[reg->type]);
 		return -EACCES;
 	}
 
-	if (!err && size <= 2 && value_regno >= 0 && env->allow_ptr_leaks &&
-	    state->regs[value_regno].type == UNKNOWN_VALUE) {
-		/* 1 or 2 byte load zero-extends, determine the number of
-		 * zero upper bits. Not doing it fo 4 byte load, since
-		 * such values cannot be added to ptr_to_packet anyway.
-		 */
-		state->regs[value_regno].imm = 64 - size * 8;
+	if (!err && size < BPF_REG_SIZE && value_regno >= 0 && t == BPF_READ &&
+	    state->regs[value_regno].type == SCALAR_VALUE) {
+		/* b/h/w load zero-extends, mark upper bits as known 0 */
+		state->regs[value_regno].align.value &= (1ULL << (size * 8)) - 1;
+		state->regs[value_regno].align.mask &= (1ULL << (size * 8)) - 1;
+		/* sign bit is known zero, so we can bound the value */
+		state->regs[value_regno].min_value = 0;
+		state->regs[value_regno].max_value = min_t(u64,
+					state->regs[value_regno].align.mask,
+					BPF_REGISTER_MAX_RANGE);
 	}
 	return err;
 }
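The zero-extension fixup at the end of check_mem_access() can be modelled in isolation. In this hypothetical sketch, a 1-, 2- or 4-byte load keeps only the low size*8 bits of the tnum and bounds the result to [0, mask], clamped to an assumed BPF_REGISTER_MAX_RANGE of 2^30. The BPF_READ and SCALAR_VALUE conditions from the patch are left out, and max_value is taken from the mask alone, which is tight only because the destination was just marked unknown (all known bits zero).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values; not taken from this patch. */
#define LOAD_MAX_RANGE (1024ULL * 1024 * 1024)	/* BPF_REGISTER_MAX_RANGE */
#define LOAD_REG_SIZE 8				/* BPF_REG_SIZE */

/* Hypothetical register model: tnum plus min/max bounds. */
struct load_reg {
	uint64_t value, mask;
	int64_t min_value;
	uint64_t max_value;
};

/* Model of the b/h/w zero-extension step after a narrow load. */
static void zext_narrow_load(struct load_reg *r, int size)
{
	uint64_t low;

	if (size >= LOAD_REG_SIZE)
		return;			/* 8-byte loads are not narrowed */
	low = (1ULL << (size * 8)) - 1;	/* bits the load can actually set */
	r->value &= low;		/* upper bits become known zero */
	r->mask &= low;
	r->min_value = 0;		/* sign bit is known zero */
	r->max_value = r->mask < LOAD_MAX_RANGE ? r->mask : LOAD_MAX_RANGE;
}
```

A 2-byte load thus yields the range [0, 0xffff], while a 4-byte load keeps a 32-bit mask but has its max_value clamped to the tracking limit.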
@@ -1000,9 +1068,18 @@ static int check_xadd(struct bpf_verifier_env *env, struct bpf_insn *insn)
 				BPF_SIZE(insn->code), BPF_WRITE, -1);
 }
 
+/* Does this register contain a constant zero? */
+static bool register_is_null(struct bpf_reg_state reg)
+{
+	return reg.type == SCALAR_VALUE && reg.align.mask == 0 &&
+	       reg.align.value == 0;
+}
+
 /* when register 'regno' is passed into function that will read 'access_size'
  * bytes from that pointer, make sure that it's within stack boundary
- * and all elements of stack are initialized
+ * and all elements of stack are initialized.
+ * Unlike most pointer bounds-checking functions, this one doesn't take an
+ * 'off' argument, so it has to add in reg->off itself.
  */
 static int check_stack_boundary(struct bpf_verifier_env *env, int regno,
 				int access_size, bool zero_size_allowed,
@@ -1013,9 +1090,9 @@ static int check_stack_boundary(struct bpf_verifier_env *env, int regno,
 	int off, i;
 
 	if (regs[regno].type != PTR_TO_STACK) {
+		/* Allow zero-byte read from NULL, regardless of pointer type */
 		if (zero_size_allowed && access_size == 0 &&
-		    regs[regno].type == CONST_IMM &&
-		    regs[regno].imm  == 0)
+		    register_is_null(regs[regno]))
 			return 0;
 
 		verbose("R%d type=%s expected=%s\n", regno,
@@ -1024,7 +1101,15 @@ static int check_stack_boundary(struct bpf_verifier_env *env, int regno,
 		return -EACCES;
 	}
 
-	off = regs[regno].imm;
+	/* Only allow fixed-offset stack reads */
+	if (regs[regno].align.mask) {
+		char tn_buf[48];
+
+		tn_strn(tn_buf, sizeof(tn_buf), regs[regno].align);
+		verbose("invalid variable stack read R%d align=%s\n", regno, tn_buf);
+		return -EACCES;
+	}
+	off = regs[regno].off + regs[regno].align.value;
 	if (off >= 0 || off < -MAX_BPF_STACK || off + access_size > 0 ||
 	    access_size <= 0) {
 		verbose("invalid stack type R%d off=%d access_size=%d\n",
@@ -1052,16 +1137,14 @@ static int check_helper_mem_access(struct bpf_verifier_env *env, int regno,
 				   int access_size, bool zero_size_allowed,
 				   struct bpf_call_arg_meta *meta)
 {
-	struct bpf_reg_state *regs = env->cur_state.regs;
+	struct bpf_reg_state *regs = env->cur_state.regs, *reg = &regs[regno];
 
-	switch (regs[regno].type) {
+	switch (reg->type) {
 	case PTR_TO_PACKET:
-		return check_packet_access(env, regno, 0, access_size);
+		return check_packet_access(env, regno, reg->off, access_size);
 	case PTR_TO_MAP_VALUE:
-		return check_map_access(env, regno, 0, access_size);
-	case PTR_TO_MAP_VALUE_ADJ:
-		return check_map_access_adj(env, regno, 0, access_size);
-	default: /* const_imm|ptr_to_stack or invalid ptr */
+		return check_map_access(env, regno, reg->off, access_size);
+	default: /* scalar_value|ptr_to_stack or invalid ptr */
 		return check_stack_boundary(env, regno, access_size,
 					    zero_size_allowed, meta);
 	}
@@ -1104,11 +1187,8 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
 			goto err_type;
 	} else if (arg_type == ARG_CONST_SIZE ||
 		   arg_type == ARG_CONST_SIZE_OR_ZERO) {
-		expected_type = CONST_IMM;
-		/* One exception. Allow UNKNOWN_VALUE registers when the
-		 * boundaries are known and don't cause unsafe memory accesses
-		 */
-		if (type != UNKNOWN_VALUE && type != expected_type)
+		expected_type = SCALAR_VALUE;
+		if (type != expected_type)
 			goto err_type;
 	} else if (arg_type == ARG_CONST_MAP_PTR) {
 		expected_type = CONST_PTR_TO_MAP;
@@ -1122,13 +1202,13 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
 		   arg_type == ARG_PTR_TO_UNINIT_MEM) {
 		expected_type = PTR_TO_STACK;
 		/* One exception here. In case function allows for NULL to be
-		 * passed in as argument, it's a CONST_IMM type. Final test
+		 * passed in as argument, it's a SCALAR_VALUE type. Final test
 		 * happens during stack boundary checking.
 		 */
-		if (type == CONST_IMM && reg->imm == 0)
+		if (register_is_null(*reg))
 			/* final test in check_stack_boundary() */;
 		else if (type != PTR_TO_PACKET && type != PTR_TO_MAP_VALUE &&
-			 type != PTR_TO_MAP_VALUE_ADJ && type != expected_type)
+			 type != expected_type)
 			goto err_type;
 		meta->raw_mode = arg_type == ARG_PTR_TO_UNINIT_MEM;
 	} else {
@@ -1154,7 +1234,7 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
 			return -EACCES;
 		}
 		if (type == PTR_TO_PACKET)
-			err = check_packet_access(env, regno, 0,
+			err = check_packet_access(env, regno, reg->off,
 						  meta->map_ptr->key_size);
 		else
 			err = check_stack_boundary(env, regno,
@@ -1170,7 +1250,7 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
 			return -EACCES;
 		}
 		if (type == PTR_TO_PACKET)
-			err = check_packet_access(env, regno, 0,
+			err = check_packet_access(env, regno, reg->off,
 						  meta->map_ptr->value_size);
 		else
 			err = check_stack_boundary(env, regno,
@@ -1190,10 +1270,11 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
 			return -EACCES;
 		}
 
-		/* If the register is UNKNOWN_VALUE, the access check happens
-		 * using its boundaries. Otherwise, just use its imm
+		/* The register is SCALAR_VALUE; the access check
+		 * happens using its boundaries.
 		 */
-		if (type == UNKNOWN_VALUE) {
+
+		if (reg->align.mask)
 			/* For unprivileged variable accesses, disable raw
 			 * mode so that the program is required to
 			 * initialize all the memory that the helper could
@@ -1201,35 +1282,28 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
 			 */
 			meta = NULL;
 
-			if (reg->min_value < 0) {
-				verbose("R%d min value is negative, either use unsigned or 'var &= const'\n",
-					regno);
-				return -EACCES;
-			}
-
-			if (reg->min_value == 0) {
-				err = check_helper_mem_access(env, regno - 1, 0,
-							      zero_size_allowed,
-							      meta);
-				if (err)
-					return err;
-			}
+		if (reg->min_value < 0) {
+			verbose("R%d min value is negative, either use unsigned or 'var &= const'\n",
+				regno);
+			return -EACCES;
+		}
 
-			if (reg->max_value == BPF_REGISTER_MAX_RANGE) {
-				verbose("R%d unbounded memory access, use 'var &= const' or 'if (var < const)'\n",
-					regno);
-				return -EACCES;
-			}
-			err = check_helper_mem_access(env, regno - 1,
-						      reg->max_value,
-						      zero_size_allowed, meta);
+		if (reg->min_value == 0) {
+			err = check_helper_mem_access(env, regno - 1, 0,
+						      zero_size_allowed,
+						      meta);
 			if (err)
 				return err;
-		} else {
-			/* register is CONST_IMM */
-			err = check_helper_mem_access(env, regno - 1, reg->imm,
-						      zero_size_allowed, meta);
 		}
+
+		if (reg->max_value == BPF_REGISTER_MAX_RANGE) {
+			verbose("R%d unbounded memory access, use 'var &= const' or 'if (var < const)'\n",
+				regno);
+			return -EACCES;
+		}
+		err = check_helper_mem_access(env, regno - 1,
+					      reg->max_value,
+					      zero_size_allowed, meta);
 	}
 
 	return err;
@@ -1321,6 +1395,9 @@ static int check_raw_mode(const struct bpf_func_proto *fn)
 	return count > 1 ? -EINVAL : 0;
 }
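In check_func_arg() above, an ARG_CONST_SIZE[_OR_ZERO] argument is now always a SCALAR_VALUE and is judged purely by its bounds. The hypothetical model below (classify_size_arg() and SIZE_MAX_RANGE are illustrative names, and the range value is an assumption) shows the order of those checks: reject a negative minimum, validate a zero-size access if the minimum is 0, reject an unbounded maximum, and otherwise validate an access of max_value bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value of BPF_REGISTER_MAX_RANGE; not taken from this patch. */
#define SIZE_MAX_RANGE (1024ULL * 1024 * 1024)

enum size_verdict { SIZE_NEGATIVE, SIZE_UNBOUNDED, SIZE_OK };

/* Hypothetical model of the ARG_CONST_SIZE[_OR_ZERO] bounds checks.
 * *check_zero is set when a zero-length access must also be validated;
 * on SIZE_OK, *check_max is the largest size the helper may touch.
 */
static enum size_verdict classify_size_arg(int64_t min_value,
					   uint64_t max_value,
					   int *check_zero,
					   uint64_t *check_max)
{
	*check_zero = 0;
	if (min_value < 0)
		return SIZE_NEGATIVE;	/* "min value is negative" */
	if (min_value == 0)
		*check_zero = 1;	/* a zero-size access is possible */
	if (max_value == SIZE_MAX_RANGE)
		return SIZE_UNBOUNDED;	/* "unbounded memory access" */
	*check_max = max_value;
	return SIZE_OK;
}
```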
 
+/* Packet data might have moved, any old PTR_TO_PACKET[_END] are now invalid,
+ * so turn them into unknown SCALAR_VALUE.
+ */
 static void clear_all_pkt_pointers(struct bpf_verifier_env *env)
 {
 	struct bpf_verifier_state *state = &env->cur_state;
@@ -1330,7 +1407,7 @@ static void clear_all_pkt_pointers(struct bpf_verifier_env *env)
 	for (i = 0; i < MAX_BPF_REG; i++)
 		if (regs[i].type == PTR_TO_PACKET ||
 		    regs[i].type == PTR_TO_PACKET_END)
-			mark_reg_unknown_value(regs, i);
+			mark_reg_unknown(regs, i);
 
 	for (i = 0; i < MAX_BPF_STACK; i += BPF_REG_SIZE) {
 		if (state->stack_slot_type[i] != STACK_SPILL)
@@ -1339,8 +1416,7 @@ static void clear_all_pkt_pointers(struct bpf_verifier_env *env)
 		if (reg->type != PTR_TO_PACKET &&
 		    reg->type != PTR_TO_PACKET_END)
 			continue;
-		reg->type = UNKNOWN_VALUE;
-		reg->imm = 0;
+		__mark_reg_unknown(reg);
 	}
 }
 
@@ -1420,14 +1496,17 @@ static int check_call(struct bpf_verifier_env *env, int func_id, int insn_idx)
 
 	/* update return register */
 	if (fn->ret_type == RET_INTEGER) {
-		regs[BPF_REG_0].type = UNKNOWN_VALUE;
+		/* sets type to SCALAR_VALUE */
+		mark_reg_unknown(regs, BPF_REG_0);
 	} else if (fn->ret_type == RET_VOID) {
 		regs[BPF_REG_0].type = NOT_INIT;
 	} else if (fn->ret_type == RET_PTR_TO_MAP_VALUE_OR_NULL) {
 		struct bpf_insn_aux_data *insn_aux;
 
 		regs[BPF_REG_0].type = PTR_TO_MAP_VALUE_OR_NULL;
-		regs[BPF_REG_0].max_value = regs[BPF_REG_0].min_value = 0;
+		/* There is no offset yet applied, variable or fixed */
+		mark_reg_known_zero(regs, BPF_REG_0);
+		regs[BPF_REG_0].off = 0;
 		/* remember map_ptr, so that check_map_access()
 		 * can check 'value_size' boundary of memory access
 		 * to map element returned from bpf_map_lookup_elem()
@@ -1458,371 +1537,421 @@ static int check_call(struct bpf_verifier_env *env, int func_id, int insn_idx)
 	return 0;
 }
 
-static int check_packet_ptr_add(struct bpf_verifier_env *env,
-				struct bpf_insn *insn)
+static void check_reg_overflow(struct bpf_reg_state *reg)
 {
-	struct bpf_reg_state *regs = env->cur_state.regs;
-	struct bpf_reg_state *dst_reg = &regs[insn->dst_reg];
-	struct bpf_reg_state *src_reg = &regs[insn->src_reg];
-	struct bpf_reg_state tmp_reg;
-	s32 imm;
-
-	if (BPF_SRC(insn->code) == BPF_K) {
-		/* pkt_ptr += imm */
-		imm = insn->imm;
-
-add_imm:
-		if (imm < 0) {
-			verbose("addition of negative constant to packet pointer is not allowed\n");
-			return -EACCES;
-		}
-		if (imm >= MAX_PACKET_OFF ||
-		    imm + dst_reg->off >= MAX_PACKET_OFF) {
-			verbose("constant %d is too large to add to packet pointer\n",
-				imm);
-			return -EACCES;
-		}
-		/* a constant was added to pkt_ptr.
-		 * Remember it while keeping the same 'id'
-		 */
-		dst_reg->off += imm;
-	} else {
-		bool had_id;
-
-		if (src_reg->type == PTR_TO_PACKET) {
-			/* R6=pkt(id=0,off=0,r=62) R7=imm22; r7 += r6 */
-			tmp_reg = *dst_reg;  /* save r7 state */
-			*dst_reg = *src_reg; /* copy pkt_ptr state r6 into r7 */
-			src_reg = &tmp_reg;  /* pretend it's src_reg state */
-			/* if the checks below reject it, the copy won't matter,
-			 * since we're rejecting the whole program. If all ok,
-			 * then imm22 state will be added to r7
-			 * and r7 will be pkt(id=0,off=22,r=62) while
-			 * r6 will stay as pkt(id=0,off=0,r=62)
-			 */
-		}
-
-		if (src_reg->type == CONST_IMM) {
-			/* pkt_ptr += reg where reg is known constant */
-			imm = src_reg->imm;
-			goto add_imm;
-		}
-		/* disallow pkt_ptr += reg
-		 * if reg is not uknown_value with guaranteed zero upper bits
-		 * otherwise pkt_ptr may overflow and addition will become
-		 * subtraction which is not allowed
-		 */
-		if (src_reg->type != UNKNOWN_VALUE) {
-			verbose("cannot add '%s' to ptr_to_packet\n",
-				reg_type_str[src_reg->type]);
-			return -EACCES;
-		}
-		if (src_reg->imm < 48) {
-			verbose("cannot add integer value with %lld upper zero bits to ptr_to_packet\n",
-				src_reg->imm);
-			return -EACCES;
-		}
-
-		had_id = (dst_reg->id != 0);
-
-		/* dst_reg stays as pkt_ptr type and since some positive
-		 * integer value was added to the pointer, increment its 'id'
-		 */
-		dst_reg->id = ++env->id_gen;
+	if (reg->max_value > BPF_REGISTER_MAX_RANGE)
+		reg->max_value = BPF_REGISTER_MAX_RANGE;
+	if (reg->min_value < BPF_REGISTER_MIN_RANGE ||
+	    reg->min_value > BPF_REGISTER_MAX_RANGE)
+		reg->min_value = BPF_REGISTER_MIN_RANGE;
+}
 
-		/* something was added to pkt_ptr, set range to zero */
-		dst_reg->aux_off += dst_reg->off;
-		dst_reg->off = 0;
-		dst_reg->range = 0;
-		if (had_id)
-			dst_reg->aux_off_align = min(dst_reg->aux_off_align,
-						     src_reg->min_align);
-		else
-			dst_reg->aux_off_align = src_reg->min_align;
+static void coerce_reg_to_32(struct bpf_reg_state *reg)
+{
+	/* 32-bit values can't be negative as an s64 */
+	if (reg->min_value < 0)
+		reg->min_value = 0;
+	/* clear high 32 bits */
+	reg->align.value &= (u32)-1;
+	reg->align.mask &= (u32)-1;
+	/* Did value become known?  Then update bounds */
+	if (!reg->align.mask) {
+		if ((s64)reg->align.value > BPF_REGISTER_MIN_RANGE)
+			reg->min_value = reg->align.value;
+		if (reg->align.value < BPF_REGISTER_MAX_RANGE)
+			reg->max_value = reg->align.value;
 	}
-	return 0;
 }
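check_reg_overflow() clamps bounds back into the tracked window, and coerce_reg_to_32() models truncation of a value to its low 32 bits, collapsing the bounds to a single point when every remaining bit is known. A hypothetical standalone model of both follows; struct reg_model and the REG_*_RANGE values are assumptions, not definitions from this series.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values of BPF_REGISTER_{MIN,MAX}_RANGE in this era. */
#define REG_MIN_RANGE (-1LL)
#define REG_MAX_RANGE (1024ULL * 1024 * 1024)

/* Hypothetical register model: bounds plus a tnum (value/mask). */
struct reg_model {
	int64_t min_value;
	uint64_t max_value;
	uint64_t value, mask;
};

/* Clamp bounds that drifted outside the tracked window. */
static void model_check_overflow(struct reg_model *r)
{
	if (r->max_value > REG_MAX_RANGE)
		r->max_value = REG_MAX_RANGE;
	if (r->min_value < REG_MIN_RANGE ||
	    r->min_value > (int64_t)REG_MAX_RANGE)
		r->min_value = REG_MIN_RANGE;
}

/* Truncate to 32 bits; a fully known result pins both bounds. */
static void model_coerce_to_32(struct reg_model *r)
{
	if (r->min_value < 0)
		r->min_value = 0;	/* a u32 is never negative as s64 */
	r->value &= UINT32_MAX;		/* clear the high 32 bits */
	r->mask &= UINT32_MAX;
	if (!r->mask) {
		if ((int64_t)r->value > REG_MIN_RANGE)
			r->min_value = (int64_t)r->value;
		if (r->value < REG_MAX_RANGE)
			r->max_value = r->value;
	}
}
```

For instance, a known 0x100000007 coerces to exactly 7, while a fully unknown register only gains min_value = 0 and a 32-bit mask.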
 
-static int evaluate_reg_alu(struct bpf_verifier_env *env, struct bpf_insn *insn)
+/* Handles arithmetic on a pointer and a scalar: computes new min/max and align.
+ * Caller must check_reg_overflow all argument regs beforehand.
+ * Caller should also handle BPF_MOV case separately.
+ */
+static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
+				   struct bpf_insn *insn,
+				   struct bpf_reg_state *ptr_reg,
+				   struct bpf_reg_state *off_reg)
 {
-	struct bpf_reg_state *regs = env->cur_state.regs;
-	struct bpf_reg_state *dst_reg = &regs[insn->dst_reg];
+	struct bpf_reg_state *regs = env->cur_state.regs, *dst_reg;
+	bool known = !off_reg->align.mask;
+	s64 min_val = off_reg->min_value;
+	u64 max_val = off_reg->max_value;
 	u8 opcode = BPF_OP(insn->code);
-	s64 imm_log2;
+	u32 dst = insn->dst_reg;
 
-	/* for type == UNKNOWN_VALUE:
-	 * imm > 0 -> number of zero upper bits
-	 * imm == 0 -> don't track which is the same as all bits can be non-zero
-	 */
+	dst_reg = &regs[dst];
 
-	if (BPF_SRC(insn->code) == BPF_X) {
-		struct bpf_reg_state *src_reg = &regs[insn->src_reg];
-
-		if (src_reg->type == UNKNOWN_VALUE && src_reg->imm > 0 &&
-		    dst_reg->imm && opcode == BPF_ADD) {
-			/* dreg += sreg
-			 * where both have zero upper bits. Adding them
-			 * can only result making one more bit non-zero
-			 * in the larger value.
-			 * Ex. 0xffff (imm=48) + 1 (imm=63) = 0x10000 (imm=47)
-			 *     0xffff (imm=48) + 0xffff = 0x1fffe (imm=47)
-			 */
-			dst_reg->imm = min(dst_reg->imm, src_reg->imm);
-			dst_reg->imm--;
-			return 0;
+	if (WARN_ON_ONCE(known && (min_val != max_val))) {
+		verbose("verifier internal error\n");
+		return -EINVAL;
+	}
+
+	if (BPF_CLASS(insn->code) != BPF_ALU64) {
+		/* 32-bit ALU ops on pointers produce (meaningless) scalars */
+		if (!env->allow_ptr_leaks) {
+			verbose("R%d 32-bit pointer arithmetic prohibited\n",
+				dst);
+			return -EACCES;
 		}
-		if (src_reg->type == CONST_IMM && src_reg->imm > 0 &&
-		    dst_reg->imm && opcode == BPF_ADD) {
-			/* dreg += sreg
-			 * where dreg has zero upper bits and sreg is const.
-			 * Adding them can only result making one more bit
-			 * non-zero in the larger value.
-			 */
-			imm_log2 = __ilog2_u64((long long)src_reg->imm);
-			dst_reg->imm = min(dst_reg->imm, 63 - imm_log2);
-			dst_reg->imm--;
-			return 0;
+		__mark_reg_unknown(dst_reg);
+		/* High bits are known zero */
+		dst_reg->align.mask = (u32)-1;
+		return 0;
+	}
+
+	if (ptr_reg->type == PTR_TO_MAP_VALUE_OR_NULL) {
+		if (!env->allow_ptr_leaks) {
+			verbose("R%d pointer arithmetic on PTR_TO_MAP_VALUE_OR_NULL prohibited, null-check it first\n",
+				dst);
+			return -EACCES;
+		}
+		__mark_reg_unknown(dst_reg);
+		return 0;
+	}
+	if (ptr_reg->type == CONST_PTR_TO_MAP) {
+		if (!env->allow_ptr_leaks) {
+			verbose("R%d pointer arithmetic on CONST_PTR_TO_MAP prohibited\n",
+				dst);
+			return -EACCES;
 		}
-		/* all other cases non supported yet, just mark dst_reg */
-		dst_reg->imm = 0;
+		__mark_reg_unknown(dst_reg);
+		return 0;
+	}
+	if (ptr_reg->type == PTR_TO_PACKET_END) {
+		if (!env->allow_ptr_leaks) {
+			verbose("R%d pointer arithmetic on PTR_TO_PACKET_END prohibited\n",
+				dst);
+			return -EACCES;
+		}
+		__mark_reg_unknown(dst_reg);
 		return 0;
 	}
 
-	/* sign extend 32-bit imm into 64-bit to make sure that
-	 * negative values occupy bit 63. Note ilog2() would have
-	 * been incorrect, since sizeof(insn->imm) == 4
+	/* In case of 'scalar += pointer', dst_reg inherits pointer type and id.
+	 * The id may be overwritten later if we create a new variable offset.
 	 */
-	imm_log2 = __ilog2_u64((long long)insn->imm);
+	dst_reg->type = ptr_reg->type;
+	dst_reg->id = ptr_reg->id;
 
-	if (dst_reg->imm && opcode == BPF_LSH) {
-		/* reg <<= imm
-		 * if reg was a result of 2 byte load, then its imm == 48
-		 * which means that upper 48 bits are zero and shifting this reg
-		 * left by 4 would mean that upper 44 bits are still zero
+	switch (opcode) {
+	case BPF_ADD:
+		/* We can take a fixed offset as long as it doesn't overflow
+		 * the s32 'off' field
 		 */
-		dst_reg->imm -= insn->imm;
-	} else if (dst_reg->imm && opcode == BPF_MUL) {
-		/* reg *= imm
-		 * if multiplying by 14 subtract 4
-		 * This is conservative calculation of upper zero bits.
-		 * It's not trying to special case insn->imm == 1 or 0 cases
+		if (known && (ptr_reg->off + min_val ==
+			      (s64)(s32)(ptr_reg->off + min_val))) {
+			/* pointer += K.  Accumulate it into fixed offset */
+			dst_reg->min_value = ptr_reg->min_value;
+			dst_reg->max_value = ptr_reg->max_value;
+			dst_reg->align = ptr_reg->align;
+			dst_reg->off = ptr_reg->off + min_val;
+			break;
+		}
+		if (max_val == BPF_REGISTER_MAX_RANGE) {
+			verbose("R%d tried to add unbounded value to pointer\n",
+				dst);
+			return -EACCES;
+		}
+		/* A new variable offset is created.  Note that off_reg->off
+		 * == 0, since it's a scalar.
+		 * dst_reg gets the pointer type and, since some integer value
+		 * was added to the pointer, a fresh 'id'.
+		 * This creates a new 'base' pointer: off_reg (variable) gets
+		 * added into the variable offset, and we copy the fixed offset
+		 * from ptr_reg.
 		 */
-		dst_reg->imm -= imm_log2 + 1;
-	} else if (opcode == BPF_AND) {
-		/* reg &= imm */
-		dst_reg->imm = 63 - imm_log2;
-	} else if (dst_reg->imm && opcode == BPF_ADD) {
-		/* reg += imm */
-		dst_reg->imm = min(dst_reg->imm, 63 - imm_log2);
-		dst_reg->imm--;
-	} else if (opcode == BPF_RSH) {
-		/* reg >>= imm
-		 * which means that after right shift, upper bits will be zero
-		 * note that verifier already checked that
-		 * 0 <= imm < 64 for shift insn
+		if (min_val <= BPF_REGISTER_MIN_RANGE)
+			dst_reg->min_value = BPF_REGISTER_MIN_RANGE;
+		if (dst_reg->min_value != BPF_REGISTER_MIN_RANGE)
+			dst_reg->min_value += min_val;
+		if (dst_reg->max_value != BPF_REGISTER_MAX_RANGE)
+			dst_reg->max_value += max_val;
+		dst_reg->align = tn_add(ptr_reg->align, off_reg->align);
+		dst_reg->off = ptr_reg->off;
+		dst_reg->id = ++env->id_gen;
+		if (ptr_reg->type == PTR_TO_PACKET)
+			/* something was added to pkt_ptr, set range to zero */
+			dst_reg->range = 0;
+		break;
+	case BPF_SUB:
+		if (dst_reg == off_reg) {
+			/* scalar -= pointer.  Creates an unknown scalar */
+			if (!env->allow_ptr_leaks) {
+				verbose("R%d tried to subtract pointer from scalar\n",
+					dst);
+				return -EACCES;
+			}
+			/* Make it an unknown scalar */
+			__mark_reg_unknown(dst_reg);
+			break;
+		}
+		/* We don't allow subtraction from FP because (according to
+		 * test_verifier.c test "invalid fp arithmetic") JITs might not
+		 * be able to deal with it.
 		 */
-		dst_reg->imm += insn->imm;
-		if (unlikely(dst_reg->imm > 64))
-			/* some dumb code did:
-			 * r2 = *(u32 *)mem;
-			 * r2 >>= 32;
-			 * and all bits are zero now */
-			dst_reg->imm = 64;
-	} else {
-		/* all other alu ops, means that we don't know what will
-		 * happen to the value, mark it with unknown number of zero bits
+		if (ptr_reg->type == PTR_TO_STACK) {
+			if (!env->allow_ptr_leaks) {
+				verbose("R%d subtraction from stack pointer prohibited\n",
+					dst);
+				return -EACCES;
+			}
+			/* Make it an unknown scalar */
+			__mark_reg_unknown(dst_reg);
+			break;
+		}
+		if (known && (ptr_reg->off - min_val ==
+			      (s64)(s32)(ptr_reg->off - min_val))) {
+			/* pointer -= K.  Subtract it from fixed offset */
+			dst_reg->min_value = ptr_reg->min_value;
+			dst_reg->max_value = ptr_reg->max_value;
+			dst_reg->align = ptr_reg->align;
+			dst_reg->id = ptr_reg->id;
+			dst_reg->off = ptr_reg->off - min_val;
+			break;
+		}
+		/* Subtracting a negative value will just confuse everything.
+		 * This can happen if off_reg is an immediate.
 		 */
-		dst_reg->imm = 0;
-	}
-
-	if (dst_reg->imm < 0) {
-		/* all 64 bits of the register can contain non-zero bits
-		 * and such value cannot be added to ptr_to_packet, since it
-		 * may overflow, mark it as unknown to avoid further eval
+		if ((s64)max_val < 0) {
+			if (!env->allow_ptr_leaks) {
+				verbose("R%d tried to subtract negative max_val %lld from pointer\n",
+					dst, (s64)max_val);
+				return -EACCES;
+			}
+			/* Make it an unknown scalar */
+			__mark_reg_unknown(dst_reg);
+			break;
+		}
+		/* A new variable offset is created.  If the subtrahend is known
+		 * nonnegative, then any reg->range we had before is still good.
 		 */
-		dst_reg->imm = 0;
-	}
-	return 0;
-}
-
-static int evaluate_reg_imm_alu(struct bpf_verifier_env *env,
-				struct bpf_insn *insn)
-{
-	struct bpf_reg_state *regs = env->cur_state.regs;
-	struct bpf_reg_state *dst_reg = &regs[insn->dst_reg];
-	struct bpf_reg_state *src_reg = &regs[insn->src_reg];
-	u8 opcode = BPF_OP(insn->code);
-	u64 dst_imm = dst_reg->imm;
-
-	/* dst_reg->type == CONST_IMM here. Simulate execution of insns
-	 * containing ALU ops. Don't care about overflow or negative
-	 * values, just add/sub/... them; registers are in u64.
-	 */
-	if (opcode == BPF_ADD && BPF_SRC(insn->code) == BPF_K) {
-		dst_imm += insn->imm;
-	} else if (opcode == BPF_ADD && BPF_SRC(insn->code) == BPF_X &&
-		   src_reg->type == CONST_IMM) {
-		dst_imm += src_reg->imm;
-	} else if (opcode == BPF_SUB && BPF_SRC(insn->code) == BPF_K) {
-		dst_imm -= insn->imm;
-	} else if (opcode == BPF_SUB && BPF_SRC(insn->code) == BPF_X &&
-		   src_reg->type == CONST_IMM) {
-		dst_imm -= src_reg->imm;
-	} else if (opcode == BPF_MUL && BPF_SRC(insn->code) == BPF_K) {
-		dst_imm *= insn->imm;
-	} else if (opcode == BPF_MUL && BPF_SRC(insn->code) == BPF_X &&
-		   src_reg->type == CONST_IMM) {
-		dst_imm *= src_reg->imm;
-	} else if (opcode == BPF_OR && BPF_SRC(insn->code) == BPF_K) {
-		dst_imm |= insn->imm;
-	} else if (opcode == BPF_OR && BPF_SRC(insn->code) == BPF_X &&
-		   src_reg->type == CONST_IMM) {
-		dst_imm |= src_reg->imm;
-	} else if (opcode == BPF_AND && BPF_SRC(insn->code) == BPF_K) {
-		dst_imm &= insn->imm;
-	} else if (opcode == BPF_AND && BPF_SRC(insn->code) == BPF_X &&
-		   src_reg->type == CONST_IMM) {
-		dst_imm &= src_reg->imm;
-	} else if (opcode == BPF_RSH && BPF_SRC(insn->code) == BPF_K) {
-		dst_imm >>= insn->imm;
-	} else if (opcode == BPF_RSH && BPF_SRC(insn->code) == BPF_X &&
-		   src_reg->type == CONST_IMM) {
-		dst_imm >>= src_reg->imm;
-	} else if (opcode == BPF_LSH && BPF_SRC(insn->code) == BPF_K) {
-		dst_imm <<= insn->imm;
-	} else if (opcode == BPF_LSH && BPF_SRC(insn->code) == BPF_X &&
-		   src_reg->type == CONST_IMM) {
-		dst_imm <<= src_reg->imm;
-	} else {
-		mark_reg_unknown_value(regs, insn->dst_reg);
-		goto out;
+		if (max_val >= BPF_REGISTER_MAX_RANGE)
+			dst_reg->min_value = BPF_REGISTER_MIN_RANGE;
+		if (dst_reg->min_value != BPF_REGISTER_MIN_RANGE)
+			dst_reg->min_value -= max_val;
+		if (min_val <= BPF_REGISTER_MIN_RANGE)
+			dst_reg->max_value = BPF_REGISTER_MAX_RANGE;
+		if (dst_reg->max_value != BPF_REGISTER_MAX_RANGE)
+			dst_reg->max_value -= min_val;
+		dst_reg->align = tn_sub(ptr_reg->align, off_reg->align);
+		dst_reg->off = ptr_reg->off;
+		dst_reg->id = ++env->id_gen;
+		if (ptr_reg->type == PTR_TO_PACKET && min_val < 0)
+			/* something was added to pkt_ptr, set range to zero */
+			dst_reg->range = 0;
+		break;
+	case BPF_AND:
+	case BPF_OR:
+	case BPF_XOR:
+		/* bitwise ops on pointers are troublesome, prohibit for now.
+		 * (However, in principle we could allow some cases, e.g.
+		 * ptr &= ~3 which would reduce min_value by 3.)
+		 */
+		if (!env->allow_ptr_leaks) {
+			verbose("R%d bitwise operator %s on pointer prohibited\n",
+				dst, bpf_alu_string[opcode >> 4]);
+			return -EACCES;
+		}
+		__mark_reg_unknown(dst_reg);
+		break;
+	default:
+		/* other operators (e.g. MUL, LSH) produce non-pointer results */
+		if (!env->allow_ptr_leaks) {
+			verbose("R%d pointer arithmetic with %s operator prohibited\n",
+				dst, bpf_alu_string[opcode >> 4]);
+			return -EACCES;
+		}
+		/* Make it an unknown scalar */
+		__mark_reg_unknown(dst_reg);
 	}
 
-	dst_reg->imm = dst_imm;
-out:
+	check_reg_overflow(dst_reg);
 	return 0;
 }
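The BPF_ADD branch of adjust_ptr_min_max_vals() above folds a known constant into the s32 fixed offset when the sum still fits, and otherwise opens a new variable offset with widened bounds and a fresh id. The model below is hypothetical and deliberately simplified: it tracks only the fixed offset, the variable-offset bounds and the id, starts from the pointer's own bounds, omits the tnum and the PTR_TO_PACKET range, and uses assumed values for the range sentinels. It also checks the s32 fit with an explicit range test rather than the patch's (s64)(s32) round-trip cast, whose out-of-range conversion is implementation-defined in ISO C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PMAX_RANGE (1024ULL * 1024 * 1024)	/* assumed BPF_REGISTER_MAX_RANGE */
#define PMIN_RANGE (-1LL)			/* assumed BPF_REGISTER_MIN_RANGE */

/* Hypothetical pointer model: fixed offset plus variable-offset bounds. */
struct ptr_model {
	int32_t off;		/* fixed offset */
	int64_t min_value;	/* bounds of the variable offset */
	uint64_t max_value;
	uint32_t id;
};

/* Models 'pointer += scalar'; returns false where the verifier rejects. */
static bool model_ptr_add(struct ptr_model *dst, const struct ptr_model *ptr,
			  bool known, int64_t min_val, uint64_t max_val,
			  uint32_t *id_gen)
{
	int64_t sum = (int64_t)ptr->off + min_val;

	*dst = *ptr;
	if (known && sum >= INT32_MIN && sum <= INT32_MAX) {
		/* pointer += K: fold into the fixed offset, keep the id */
		dst->off = (int32_t)sum;
		return true;
	}
	if (max_val == PMAX_RANGE)
		return false;	/* "tried to add unbounded value to pointer" */
	/* Variable offset: widen bounds, saturating at the sentinels */
	if (min_val <= PMIN_RANGE)
		dst->min_value = PMIN_RANGE;
	if (dst->min_value != PMIN_RANGE)
		dst->min_value += min_val;
	if (dst->max_value != PMAX_RANGE)
		dst->max_value += max_val;
	dst->id = ++*id_gen;	/* a new 'base' pointer */
	return true;
}
```

Note that a known constant which would overflow the s32 fixed offset (e.g. 1 added to an offset of INT32_MAX) is not rejected; it simply takes the variable-offset path.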
 
-static void check_reg_overflow(struct bpf_reg_state *reg)
-{
-	if (reg->max_value > BPF_REGISTER_MAX_RANGE)
-		reg->max_value = BPF_REGISTER_MAX_RANGE;
-	if (reg->min_value < BPF_REGISTER_MIN_RANGE ||
-	    reg->min_value > BPF_REGISTER_MAX_RANGE)
-		reg->min_value = BPF_REGISTER_MIN_RANGE;
-}
-
-static u32 calc_align(u32 imm)
-{
-	if (!imm)
-		return 1U << 31;
-	return imm - ((imm - 1) & imm);
-}
-
-static void adjust_reg_min_max_vals(struct bpf_verifier_env *env,
-				    struct bpf_insn *insn)
+/* Handles ALU ops other than BPF_END, BPF_NEG and BPF_MOV: computes new min/max
+ * and align.
+ * TODO: check this is legit for ALU32, particularly around negatives
+ */
+static int adjust_reg_min_max_vals(struct bpf_verifier_env *env,
+				   struct bpf_insn *insn)
 {
-	struct bpf_reg_state *regs = env->cur_state.regs, *dst_reg;
+	struct bpf_reg_state *regs = env->cur_state.regs, *dst_reg, *src_reg;
+	struct bpf_reg_state *ptr_reg = NULL, off_reg = {0};
 	s64 min_val = BPF_REGISTER_MIN_RANGE;
 	u64 max_val = BPF_REGISTER_MAX_RANGE;
 	u8 opcode = BPF_OP(insn->code);
-	u32 dst_align, src_align;
+	bool src_known, dst_known;
 
 	dst_reg = &regs[insn->dst_reg];
-	src_align = 0;
+	check_reg_overflow(dst_reg);
+	src_reg = NULL;
+	if (dst_reg->type != SCALAR_VALUE)
+		ptr_reg = dst_reg;
 	if (BPF_SRC(insn->code) == BPF_X) {
-		check_reg_overflow(&regs[insn->src_reg]);
-		min_val = regs[insn->src_reg].min_value;
-		max_val = regs[insn->src_reg].max_value;
-
-		/* If the source register is a random pointer then the
-		 * min_value/max_value values represent the range of the known
-		 * accesses into that value, not the actual min/max value of the
-		 * register itself.  In this case we have to reset the reg range
-		 * values so we know it is not safe to look at.
-		 */
-		if (regs[insn->src_reg].type != CONST_IMM &&
-		    regs[insn->src_reg].type != UNKNOWN_VALUE) {
-			min_val = BPF_REGISTER_MIN_RANGE;
-			max_val = BPF_REGISTER_MAX_RANGE;
-			src_align = 0;
-		} else {
-			src_align = regs[insn->src_reg].min_align;
+		src_reg = &regs[insn->src_reg];
+		check_reg_overflow(src_reg);
+
+		if (src_reg->type != SCALAR_VALUE) {
+			if (dst_reg->type != SCALAR_VALUE) {
+				/* Combining two pointers by any ALU op yields
+				 * an arbitrary scalar.
+				 */
+				if (!env->allow_ptr_leaks) {
+					verbose("R%d pointer %s pointer prohibited\n",
+						insn->dst_reg,
+						bpf_alu_string[opcode >> 4]);
+					return -EACCES;
+				}
+				mark_reg_unknown(regs, insn->dst_reg);
+				return 0;
+			} else {
+				/* scalar += pointer
+				 * This is legal, but we have to reverse our
+				 * src/dest handling in computing the range
+				 */
+				return adjust_ptr_min_max_vals(env, insn,
+							       src_reg, dst_reg);
+			}
+		} else if (ptr_reg) {
+			/* pointer += scalar */
+			return adjust_ptr_min_max_vals(env, insn,
+						       dst_reg, src_reg);
 		}
-	} else if (insn->imm < BPF_REGISTER_MAX_RANGE &&
-		   (s64)insn->imm > BPF_REGISTER_MIN_RANGE) {
-		min_val = max_val = insn->imm;
-		src_align = calc_align(insn->imm);
+	} else {
+		/* Pretend the src is a reg with a known value, since we only
+		 * need to be able to read from this state.
+		 */
+		off_reg.type = SCALAR_VALUE;
+		off_reg.align = tn_const(insn->imm);
+		off_reg.min_value = insn->imm;
+		off_reg.max_value = insn->imm;
+		src_reg = &off_reg;
+		if (ptr_reg) /* pointer += K */
+			return adjust_ptr_min_max_vals(env, insn,
+						       ptr_reg, src_reg);
+	}
+
+	/* Got here implies operating on two SCALAR_VALUEs */
+	if (WARN_ON_ONCE(ptr_reg)) {
+		verbose("verifier internal error\n");
+		return -EINVAL;
 	}
-
-	dst_align = dst_reg->min_align;
-
-	/* We don't know anything about what was done to this register, mark it
-	 * as unknown.
-	 */
-	if (min_val == BPF_REGISTER_MIN_RANGE &&
-	    max_val == BPF_REGISTER_MAX_RANGE) {
-		reset_reg_range_values(regs, insn->dst_reg);
-		return;
+	if (WARN_ON(!src_reg)) {
+		verbose("verifier internal error\n");
+		return -EINVAL;
 	}
-
-	/* If one of our values was at the end of our ranges then we can't just
-	 * do our normal operations to the register, we need to set the values
-	 * to the min/max since they are undefined.
-	 */
-	if (min_val == BPF_REGISTER_MIN_RANGE)
-		dst_reg->min_value = BPF_REGISTER_MIN_RANGE;
-	if (max_val == BPF_REGISTER_MAX_RANGE)
-		dst_reg->max_value = BPF_REGISTER_MAX_RANGE;
+	if (BPF_CLASS(insn->code) != BPF_ALU64) {
+		/* 32-bit ALU ops are (32,32)->64 */
+		coerce_reg_to_32(dst_reg);
+		coerce_reg_to_32(src_reg);
+	}
+	min_val = src_reg->min_value;
+	max_val = src_reg->max_value;
+	src_known = !src_reg->align.mask;
+	dst_known = !dst_reg->align.mask;
 
 	switch (opcode) {
 	case BPF_ADD:
+		if (min_val == BPF_REGISTER_MIN_RANGE)
+			dst_reg->min_value = BPF_REGISTER_MIN_RANGE;
 		if (dst_reg->min_value != BPF_REGISTER_MIN_RANGE)
 			dst_reg->min_value += min_val;
+		/* if max_val is MAX_RANGE, this will saturate dst->max */
 		if (dst_reg->max_value != BPF_REGISTER_MAX_RANGE)
 			dst_reg->max_value += max_val;
-		dst_reg->min_align = min(src_align, dst_align);
+		dst_reg->align = tn_add(dst_reg->align, src_reg->align);
 		break;
 	case BPF_SUB:
+		if (max_val == BPF_REGISTER_MAX_RANGE)
+			dst_reg->min_value = BPF_REGISTER_MIN_RANGE;
 		if (dst_reg->min_value != BPF_REGISTER_MIN_RANGE)
-			dst_reg->min_value -= min_val;
+			dst_reg->min_value -= max_val;
+		if (min_val == BPF_REGISTER_MIN_RANGE)
+			dst_reg->max_value = BPF_REGISTER_MAX_RANGE;
 		if (dst_reg->max_value != BPF_REGISTER_MAX_RANGE)
-			dst_reg->max_value -= max_val;
-		dst_reg->min_align = min(src_align, dst_align);
+			dst_reg->max_value -= min_val;
+		dst_reg->align = tn_sub(dst_reg->align, src_reg->align);
 		break;
 	case BPF_MUL:
-		if (dst_reg->min_value != BPF_REGISTER_MIN_RANGE)
-			dst_reg->min_value *= min_val;
+		if (min_val < 0 || dst_reg->min_value < 0) {
+			/* Ain't nobody got time to multiply that sign */
+			__mark_reg_unknown(dst_reg);
+			break;
+		}
+		dst_reg->min_value *= min_val;
+		/* if max_val is MAX_RANGE, this will saturate dst->max.
+		 * We know MAX_RANGE ** 2 won't overflow a u64, because
+		 * MAX_RANGE itself fits in a u32.
+		 */
+		BUILD_BUG_ON(BPF_REGISTER_MAX_RANGE > (u32)-1);
 		if (dst_reg->max_value != BPF_REGISTER_MAX_RANGE)
 			dst_reg->max_value *= max_val;
-		dst_reg->min_align = max(src_align, dst_align);
+		dst_reg->align = tn_mul(dst_reg->align, src_reg->align);
 		break;
 	case BPF_AND:
-		/* Disallow AND'ing of negative numbers, ain't nobody got time
-		 * for that.  Otherwise the minimum is 0 and the max is the max
-		 * value we could AND against.
+		if (src_known && dst_known) {
+			u64 value = dst_reg->align.value & src_reg->align.value;
+
+			dst_reg->align = tn_const(value);
+			dst_reg->min_value = dst_reg->max_value = min_t(u64,
+					value, BPF_REGISTER_MAX_RANGE);
+			break;
+		}
+		/* Lose min_value when AND'ing negative numbers, ain't nobody
+		 * got time for that.  Otherwise we get our minimum from the
+		 * align, since that's inherently bitwise.
+		 * Our maximum is the minimum of the operands' maxima.
 		 */
-		if (min_val < 0)
+		dst_reg->align = tn_and(dst_reg->align, src_reg->align);
+		if (min_val < 0 && dst_reg->min_value < 0)
 			dst_reg->min_value = BPF_REGISTER_MIN_RANGE;
 		else
-			dst_reg->min_value = 0;
-		dst_reg->max_value = max_val;
-		dst_reg->min_align = max(src_align, dst_align);
+			dst_reg->min_value = dst_reg->align.value;
+		dst_reg->max_value = min(dst_reg->max_value, max_val);
+		break;
+	case BPF_OR:
+		if (src_known && dst_known) {
+			u64 value = dst_reg->align.value | src_reg->align.value;
+
+			dst_reg->align = tn_const(value);
+			dst_reg->min_value = dst_reg->max_value = min_t(u64,
+					value, BPF_REGISTER_MAX_RANGE);
+			break;
+		}
+		/* Lose ranges when OR'ing negative numbers, ain't nobody got
+		 * time for that.  Otherwise we get our maximum from the align,
+		 * and our minimum is the maximum of the operands' minima.
+		 */
+		dst_reg->align = tn_or(dst_reg->align, src_reg->align);
+		if (min_val < 0 || dst_reg->min_value < 0) {
+			dst_reg->min_value = BPF_REGISTER_MIN_RANGE;
+			dst_reg->max_value = BPF_REGISTER_MAX_RANGE;
+		} else {
+			dst_reg->min_value = max(dst_reg->min_value, min_val);
+			dst_reg->max_value = dst_reg->align.value | dst_reg->align.mask;
+		}
 		break;
 	case BPF_LSH:
+		if (min_val < 0) {
+			/* LSH by a negative number is undefined */
+			mark_reg_unknown(regs, insn->dst_reg);
+			break;
+		}
 		/* Gotta have special overflow logic here, if we're shifting
 		 * more than MAX_RANGE then just assume we have an invalid
 		 * range.
 		 */
 		if (min_val > ilog2(BPF_REGISTER_MAX_RANGE)) {
 			dst_reg->min_value = BPF_REGISTER_MIN_RANGE;
-			dst_reg->min_align = 1;
+			dst_reg->align = tn_unknown;
 		} else {
 			if (dst_reg->min_value != BPF_REGISTER_MIN_RANGE)
 				dst_reg->min_value <<= min_val;
-			if (!dst_reg->min_align)
-				dst_reg->min_align = 1;
-			dst_reg->min_align <<= min_val;
+			if (src_known)
+				dst_reg->align = tn_sl(dst_reg->align, min_val);
+			else
+				dst_reg->align = tn_sl(tn_unknown, min_val);
 		}
 		if (max_val > ilog2(BPF_REGISTER_MAX_RANGE))
 			dst_reg->max_value = BPF_REGISTER_MAX_RANGE;
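The hunk above drops the old `min_align` heuristics. In their place is a tristate `align` value, combined through `tn_const()`, `tn_add()`, `tn_sub()`, `tn_mul()`, `tn_and()`, `tn_or()` and `tn_sl()`, whose definitions are not part of this excerpt. The following is a hypothetical standalone model of such a value, meant to help follow the arithmetic. Each bit set in `mask` is unknown, and every other bit equals the same bit of `value`. The type and function names are made up for illustration. The bodies of the add, AND, OR and left-shift operations follow the tristate-number formulas that later appeared in the kernel's `kernel/bpf/tnum.c`, not necessarily this patch's own code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the patch's align type: bits set in 'mask'
 * are unknown, all other bits are known to equal 'value'.
 * Invariant: (value & mask) == 0.
 */
struct tn {
	uint64_t value;
	uint64_t mask;
};

static struct tn tn_make(uint64_t value, uint64_t mask)
{
	struct tn t = { value, mask };
	return t;
}

/* A fully known constant: no unknown bits. */
static struct tn tn_const_sketch(uint64_t v)
{
	return tn_make(v, 0);
}

/* Addition: any bit a carry could reach from an unknown bit is unknown. */
static struct tn tn_add_sketch(struct tn a, struct tn b)
{
	uint64_t sm = a.mask + b.mask;
	uint64_t sv = a.value + b.value;
	uint64_t sigma = sm + sv;
	uint64_t chi = sigma ^ sv;	/* bits where carries may differ */
	uint64_t mu = chi | a.mask | b.mask;

	return tn_make(sv & ~mu, mu);
}

/* AND: known 1 only if both inputs are known 1; known 0 if either is. */
static struct tn tn_and_sketch(struct tn a, struct tn b)
{
	uint64_t alpha = a.value | a.mask;	/* bits that may be 1 in a */
	uint64_t beta = b.value | b.mask;	/* bits that may be 1 in b */
	uint64_t v = a.value & b.value;

	return tn_make(v, alpha & beta & ~v);
}

/* OR: known 1 if either input is known 1; unknown if either is unknown. */
static struct tn tn_or_sketch(struct tn a, struct tn b)
{
	uint64_t v = a.value | b.value;
	uint64_t mu = a.mask | b.mask;

	return tn_make(v, mu & ~v);
}

/* Left shift by a known amount: vacated low bits become known zero. */
static struct tn tn_sl_sketch(struct tn a, unsigned int shift)
{
	assert(shift < 64);
	return tn_make(a.value << shift, a.mask << shift);
}
```

For example, adding the constant 4 to a value whose low two bits are unknown gives `{ .value = 4, .mask = 3 }` (any of 4 to 7). Adding 1 instead makes the low three bits unknown, which is a sound over-approximation of 1 to 4.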
@@ -1830,37 +1959,41 @@ static void adjust_reg_min_max_vals(struct bpf_verifier_env *env,
 			dst_reg->max_value <<= max_val;
 		break;
 	case BPF_RSH:
-		/* RSH by a negative number is undefined, and the BPF_RSH is an
-		 * unsigned shift, so make the appropriate casts.
-		 */
-		if (min_val < 0 || dst_reg->min_value < 0) {
-			dst_reg->min_value = BPF_REGISTER_MIN_RANGE;
+		if (min_val < 0) {
+			/* RSH by a negative number is undefined */
+			mark_reg_unknown(regs, insn->dst_reg);
+			break;
+		}
+		/* BPF_RSH is an unsigned shift, so make the appropriate casts */
+		if (dst_reg->min_value < 0) {
+			if (min_val)
+				/* Sign bit will be cleared */
+				dst_reg->min_value = 0;
 		} else {
 			dst_reg->min_value =
 				(u64)(dst_reg->min_value) >> min_val;
 		}
-		if (min_val < 0) {
-			dst_reg->min_align = 1;
-		} else {
-			dst_reg->min_align >>= (u64) min_val;
-			if (!dst_reg->min_align)
-				dst_reg->min_align = 1;
-		}
-		if (dst_reg->max_value != BPF_REGISTER_MAX_RANGE)
-			dst_reg->max_value >>= max_val;
+		if (src_known)
+			dst_reg->align = tn_sr(dst_reg->align, min_val);
+		else
+			dst_reg->align = tn_sr(tn_unknown, min_val);
+		if (dst_reg->max_value == BPF_REGISTER_MAX_RANGE)
+			dst_reg->max_value = ~0;
+		dst_reg->max_value >>= max_val;
 		break;
 	default:
-		reset_reg_range_values(regs, insn->dst_reg);
+		mark_reg_unknown(regs, insn->dst_reg);
 		break;
 	}
 
 	check_reg_overflow(dst_reg);
+	return 0;
 }
 
 /* check validity of 32-bit and 64-bit arithmetic operations */
 static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)
 {
-	struct bpf_reg_state *regs = env->cur_state.regs, *dst_reg;
+	struct bpf_reg_state *regs = env->cur_state.regs;
 	u8 opcode = BPF_OP(insn->code);
 	int err;
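The bounds updates in `adjust_reg_min_max_vals()` are interval arithmetic on a signed `min_value` and an unsigned `max_value`, with sentinel values meaning "unknown". This sketch restates two cases:

* The corrected `BPF_SUB` rule from the previous hunk, where each bound of the destination is combined with the opposite bound of the source.
* An unsigned right shift by a variable amount.

The `struct range`, the function names and the sentinel values are hypothetical placeholders, not the verifier's definitions. For a variable shift amount, a sound bound pairs the largest shift with the minimum and the smallest shift with the maximum. The `BPF_RSH` case in this hunk pairs them the other way round (`min_val` for the new minimum, `max_val` for the new maximum), which agrees with the sketch only when the shift amount is constant.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder sentinels for "bound unknown"; NOT the verifier's values. */
#define SKETCH_MAX_RANGE ((uint64_t)1 << 30)
#define SKETCH_MIN_RANGE (-((int64_t)1 << 30))

/* Hypothetical bounds pair: signed lower bound, unsigned upper bound.
 * Inputs are assumed to be already clamped by range_clamp().
 */
struct range {
	int64_t min;
	uint64_t max;
};

/* Saturate bounds that left the trackable window, in the spirit of
 * check_reg_overflow().
 */
static void range_clamp(struct range *r)
{
	if (r->max > SKETCH_MAX_RANGE)
		r->max = SKETCH_MAX_RANGE;
	if (r->min < SKETCH_MIN_RANGE || r->min > (int64_t)SKETCH_MAX_RANGE)
		r->min = SKETCH_MIN_RANGE;
}

/* dst -= src: the smallest difference is dst.min - src.max and the
 * largest is dst.max - src.min.  An unknown input bound makes the
 * corresponding result bound unknown.
 */
static void range_sub(struct range *dst, const struct range *src)
{
	if (dst->min == SKETCH_MIN_RANGE || src->max == SKETCH_MAX_RANGE)
		dst->min = SKETCH_MIN_RANGE;
	else
		dst->min -= (int64_t)src->max;
	/* If src.min exceeds dst.max the subtraction wraps to a huge value,
	 * which range_clamp() then turns into "unknown".
	 */
	if (dst->max == SKETCH_MAX_RANGE || src->min == SKETCH_MIN_RANGE)
		dst->max = SKETCH_MAX_RANGE;
	else
		dst->max -= (uint64_t)src->min;
	range_clamp(dst);
}

/* dst >>= src as an unsigned shift whose amount lies in [src.min, src.max].
 * x >> s shrinks as s grows, so the new minimum uses the largest shift and
 * the new maximum uses the smallest one.
 */
static void range_rsh(struct range *dst, const struct range *src)
{
	uint64_t max = dst->max == SKETCH_MAX_RANGE ? UINT64_MAX : dst->max;

	assert(src->min >= 0);	/* negative shifts are handled separately */
	if (dst->min < 0) {
		/* Shifting by at least one clears the sign bit. */
		if (src->min > 0)
			dst->min = 0;
	} else {
		dst->min = src->max >= 64 ? 0 :
			   (int64_t)((uint64_t)dst->min >> src->max);
	}
	dst->max = src->min >= 64 ? 0 : max >> src->min;
	range_clamp(dst);
}
```

With `dst` in [10, 20] and `src` in [3, 5], `range_sub()` gives [5, 17]. The pre-patch rule would have given [7, 15], which excludes the reachable result 10 - 5 = 5.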
 
@@ -1920,11 +2053,6 @@ static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)
 		if (err)
 			return err;
 
-		/* we are setting our register to something new, we need to
-		 * reset its range values.
-		 */
-		reset_reg_range_values(regs, insn->dst_reg);
-
 		if (BPF_SRC(insn->code) == BPF_X) {
 			if (BPF_CLASS(insn->code) == BPF_ALU64) {
 				/* case: R1 = R2
@@ -1932,22 +2060,27 @@ static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)
 				 */
 				regs[insn->dst_reg] = regs[insn->src_reg];
 			} else {
+				/* R1 = (u32) R2 */
 				if (is_pointer_value(env, insn->src_reg)) {
 					verbose("R%d partial copy of pointer\n",
 						insn->src_reg);
 					return -EACCES;
 				}
-				mark_reg_unknown_value(regs, insn->dst_reg);
+				mark_reg_unknown(regs, insn->dst_reg);
+				/* high 32 bits are known zero.  But this is
+				 * still out of range for max_value, so leave
+				 * that.
+				 */
+				regs[insn->dst_reg].align.mask &= (u32)-1;
 			}
 		} else {
 			/* case: R = imm
 			 * remember the value we stored into this reg
 			 */
-			regs[insn->dst_reg].type = CONST_IMM;
-			regs[insn->dst_reg].imm = insn->imm;
+			regs[insn->dst_reg].type = SCALAR_VALUE;
+			regs[insn->dst_reg].align = tn_const(insn->imm);
 			regs[insn->dst_reg].max_value = insn->imm;
 			regs[insn->dst_reg].min_value = insn->imm;
-			regs[insn->dst_reg].min_align = calc_align(insn->imm);
 		}
 
 	} else if (opcode > BPF_END) {
@@ -1998,68 +2131,7 @@ static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)
 		if (err)
 			return err;
 
-		dst_reg = &regs[insn->dst_reg];
-
-		/* first we want to adjust our ranges. */
-		adjust_reg_min_max_vals(env, insn);
-
-		/* pattern match 'bpf_add Rx, imm' instruction */
-		if (opcode == BPF_ADD && BPF_CLASS(insn->code) == BPF_ALU64 &&
-		    dst_reg->type == FRAME_PTR && BPF_SRC(insn->code) == BPF_K) {
-			dst_reg->type = PTR_TO_STACK;
-			dst_reg->imm = insn->imm;
-			return 0;
-		} else if (opcode == BPF_ADD &&
-			   BPF_CLASS(insn->code) == BPF_ALU64 &&
-			   dst_reg->type == PTR_TO_STACK &&
-			   ((BPF_SRC(insn->code) == BPF_X &&
-			     regs[insn->src_reg].type == CONST_IMM) ||
-			    BPF_SRC(insn->code) == BPF_K)) {
-			if (BPF_SRC(insn->code) == BPF_X)
-				dst_reg->imm += regs[insn->src_reg].imm;
-			else
-				dst_reg->imm += insn->imm;
-			return 0;
-		} else if (opcode == BPF_ADD &&
-			   BPF_CLASS(insn->code) == BPF_ALU64 &&
-			   (dst_reg->type == PTR_TO_PACKET ||
-			    (BPF_SRC(insn->code) == BPF_X &&
-			     regs[insn->src_reg].type == PTR_TO_PACKET))) {
-			/* ptr_to_packet += K|X */
-			return check_packet_ptr_add(env, insn);
-		} else if (BPF_CLASS(insn->code) == BPF_ALU64 &&
-			   dst_reg->type == UNKNOWN_VALUE &&
-			   env->allow_ptr_leaks) {
-			/* unknown += K|X */
-			return evaluate_reg_alu(env, insn);
-		} else if (BPF_CLASS(insn->code) == BPF_ALU64 &&
-			   dst_reg->type == CONST_IMM &&
-			   env->allow_ptr_leaks) {
-			/* reg_imm += K|X */
-			return evaluate_reg_imm_alu(env, insn);
-		} else if (is_pointer_value(env, insn->dst_reg)) {
-			verbose("R%d pointer arithmetic prohibited\n",
-				insn->dst_reg);
-			return -EACCES;
-		} else if (BPF_SRC(insn->code) == BPF_X &&
-			   is_pointer_value(env, insn->src_reg)) {
-			verbose("R%d pointer arithmetic prohibited\n",
-				insn->src_reg);
-			return -EACCES;
-		}
-
-		/* If we did pointer math on a map value then just set it to our
-		 * PTR_TO_MAP_VALUE_ADJ type so we can deal with any stores or
-		 * loads to this register appropriately, otherwise just mark the
-		 * register as unknown.
-		 */
-		if (env->allow_ptr_leaks &&
-		    BPF_CLASS(insn->code) == BPF_ALU64 && opcode == BPF_ADD &&
-		    (dst_reg->type == PTR_TO_MAP_VALUE ||
-		     dst_reg->type == PTR_TO_MAP_VALUE_ADJ))
-			dst_reg->type = PTR_TO_MAP_VALUE_ADJ;
-		else
-			mark_reg_unknown_value(regs, insn->dst_reg);
+		return adjust_reg_min_max_vals(env, insn);
 	}
 
 	return 0;
@@ -2071,6 +2143,10 @@ static void find_good_pkt_pointers(struct bpf_verifier_state *state,
 	struct bpf_reg_state *regs = state->regs, *reg;
 	int i;
 
+	if (dst_reg->off < 0)
+		/* This doesn't give us any range */
+		return;
+
 	/* LLVM can generate two kind of checks:
 	 *
 	 * Type 1:
@@ -2104,20 +2180,21 @@ static void find_good_pkt_pointers(struct bpf_verifier_state *state,
 	for (i = 0; i < MAX_BPF_REG; i++)
 		if (regs[i].type == PTR_TO_PACKET && regs[i].id == dst_reg->id)
 			/* keep the maximum range already checked */
-			regs[i].range = max(regs[i].range, dst_reg->off);
+			regs[i].range = max_t(u32, regs[i].range, dst_reg->off);
 
 	for (i = 0; i < MAX_BPF_STACK; i += BPF_REG_SIZE) {
 		if (state->stack_slot_type[i] != STACK_SPILL)
 			continue;
 		reg = &state->spilled_regs[i / BPF_REG_SIZE];
 		if (reg->type == PTR_TO_PACKET && reg->id == dst_reg->id)
-			reg->range = max(reg->range, dst_reg->off);
+			reg->range = max_t(u32, reg->range, dst_reg->off);
 	}
 }
 
 /* Adjusts the register min/max values in the case that the dst_reg is the
  * variable register that we are working on, and src_reg is a constant or we're
  * simply doing a BPF_K check.
+ * In JEQ/JNE cases we also adjust the align values.
  */
 static void reg_set_min_max(struct bpf_reg_state *true_reg,
 			    struct bpf_reg_state *false_reg, u64 val,
@@ -2129,34 +2206,52 @@ static void reg_set_min_max(struct bpf_reg_state *true_reg,
 		 * true then we know for sure.
 		 */
 		true_reg->max_value = true_reg->min_value = val;
+		true_reg->align = tn_const(val);
 		break;
 	case BPF_JNE:
 		/* If this is true we know nothing Jon Snow, but if it is false
 		 * we know the value for sure;
 		 */
 		false_reg->max_value = false_reg->min_value = val;
+		false_reg->align = tn_const(val);
 		break;
 	case BPF_JGT:
-		/* Unsigned comparison, the minimum value is 0. */
-		false_reg->min_value = 0;
-		/* fallthrough */
-	case BPF_JSGT:
-		/* If this is false then we know the maximum val is val,
-		 * otherwise we know the min val is val+1.
+		/* Unsigned comparison, can only tell us about max_value (since
+		 * min_value is signed), unless we learn sign bit.
 		 */
 		false_reg->max_value = val;
+		/* If we're not unsigned-greater-than a non-negative value,
+		 * then we can't be negative.
+		 */
+		if ((s64)val >= 0 && false_reg->min_value < 0)
+			false_reg->min_value = 0;
+		break;
+	case BPF_JSGT:
+		/* Signed comparison, can only tell us about min_value (since
+		 * max_value is unsigned), unless we already know sign bit.
+		 */
 		true_reg->min_value = val + 1;
+		/* If we're not signed-greater than val, and we're not negative,
+		 * then we can't be unsigned-greater than val either.
+		 */
+		if (false_reg->min_value >= 0)
+			false_reg->max_value = val;
 		break;
 	case BPF_JGE:
-		/* Unsigned comparison, the minimum value is 0. */
-		false_reg->min_value = 0;
-		/* fallthrough */
-	case BPF_JSGE:
-		/* If this is false then we know the maximum value is val - 1,
-		 * otherwise we know the mimimum value is val.
-		 */
 		false_reg->max_value = val - 1;
+		/* If we're not unsigned-ge a non-negative value, then we
+		 * can't be negative.
+		 */
+		if ((s64)val >= 0 && false_reg->min_value < 0)
+			false_reg->min_value = 0;
+		break;
+	case BPF_JSGE:
 		true_reg->min_value = val;
+		/* If we're not signed-ge val, and we're not negative, then we
+		 * can't be unsigned-ge val either.
+		 */
+		if (false_reg->min_value >= 0)
+			false_reg->max_value = val - 1;
 		break;
 	default:
 		break;
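The new `BPF_JGT`/`BPF_JSGT` cases rely on a representation detail. `min_value` is a signed bound and `max_value` an unsigned one. An unsigned comparison therefore mainly constrains `max_value`, and a signed one mainly constrains `min_value`. One can inform the other only when the sign bit is pinned down. The hypothetical helpers below restate that reasoning on a plain struct. Unlike the hunk, they keep the tighter of the old and new bounds rather than overwriting the existing one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bounds pair, mirroring min_value (signed) and max_value
 * (unsigned) in struct bpf_reg_state.
 */
struct range {
	int64_t min;
	uint64_t max;
};

/* "reg > val" (unsigned) was false, so reg <= val as a u64. */
static void refine_not_ugt(struct range *r, uint64_t val)
{
	if (val < r->max)
		r->max = val;
	/* val < 2^63 forces reg < 2^63, i.e. reg is non-negative as an s64. */
	if ((int64_t)val >= 0 && r->min < 0)
		r->min = 0;
}

/* "reg > val" (signed) was true, so reg >= val + 1 as an s64. */
static void refine_sgt(struct range *r, int64_t val)
{
	assert(val < INT64_MAX);	/* otherwise this branch is unreachable */
	if (val + 1 > r->min)
		r->min = val + 1;
}

/* "reg > val" (signed) was false, so reg <= val as an s64.  That only
 * bounds the unsigned maximum when reg is already known non-negative.
 */
static void refine_not_sgt(struct range *r, int64_t val)
{
	if (r->min >= 0 && val >= 0 && (uint64_t)val < r->max)
		r->max = (uint64_t)val;
}
```

A register that might be negative learns on the false branch of an unsigned `> 100` that it lies in [0, 100]. A signed comparison learns nothing about its unsigned maximum unless its minimum is already non-negative.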
@@ -2166,8 +2261,8 @@ static void reg_set_min_max(struct bpf_reg_state *true_reg,
 	check_reg_overflow(true_reg);
 }
 
-/* Same as above, but for the case that dst_reg is a CONST_IMM reg and src_reg
- * is the variable reg.
+/* Same as above, but for the case that dst_reg holds a constant and src_reg is
+ * the variable reg.
  */
 static void reg_set_min_max_inv(struct bpf_reg_state *true_reg,
 				struct bpf_reg_state *false_reg, u64 val,
@@ -2179,35 +2274,52 @@ static void reg_set_min_max_inv(struct bpf_reg_state *true_reg,
 		 * true then we know for sure.
 		 */
 		true_reg->max_value = true_reg->min_value = val;
+		true_reg->align = tn_const(val);
 		break;
 	case BPF_JNE:
 		/* If this is true we know nothing Jon Snow, but if it is false
 		 * we know the value for sure;
 		 */
 		false_reg->max_value = false_reg->min_value = val;
+		false_reg->align = tn_const(val);
 		break;
 	case BPF_JGT:
-		/* Unsigned comparison, the minimum value is 0. */
-		true_reg->min_value = 0;
-		/* fallthrough */
+		/* Unsigned comparison, can only tell us about max_value (since
+		 * min_value is signed), unless we learn sign bit.
+		 */
+		true_reg->max_value = val - 1;
+		/* If a non-negative value is unsigned-greater-than us, then
+		 * we can't be negative.
+		 */
+		if ((s64)val >= 0 && true_reg->min_value < 0)
+			true_reg->min_value = 0;
+		break;
 	case BPF_JSGT:
-		/*
-		 * If this is false, then the val is <= the register, if it is
-		 * true the register <= to the val.
+		/* Signed comparison, can only tell us about min_value (since
+		 * max_value is unsigned), unless we already know sign bit.
 		 */
 		false_reg->min_value = val;
-		true_reg->max_value = val - 1;
+		/* If val is signed-greater-than us, and we're not negative,
+		 * then val must be unsigned-greater-than us.
+		 */
+		if (true_reg->min_value >= 0)
+			true_reg->max_value = val - 1;
 		break;
 	case BPF_JGE:
-		/* Unsigned comparison, the minimum value is 0. */
-		true_reg->min_value = 0;
-		/* fallthrough */
-	case BPF_JSGE:
-		/* If this is false then constant < register, if it is true then
-		 * the register < constant.
+		true_reg->max_value = val;
+		/* If a non-negative value is unsigned-ge us, then we can't
+		 * be negative.
 		 */
+		if ((s64)val >= 0 && true_reg->min_value < 0)
+			true_reg->min_value = 0;
+		break;
+	case BPF_JSGE:
 		false_reg->min_value = val + 1;
-		true_reg->max_value = val;
+		/* If val is signed-ge us, and we're not negative, then val
+		 * must be unsigned-ge us.
+		 */
+		if (true_reg->min_value >= 0)
+			true_reg->max_value = val;
 		break;
 	default:
 		break;
@@ -2217,19 +2329,58 @@ static void reg_set_min_max_inv(struct bpf_reg_state *true_reg,
 	check_reg_overflow(true_reg);
 }
 
+/* Regs are known to be equal, so intersect their min/max/align */
+static void __reg_combine_min_max(struct bpf_reg_state *src_reg,
+				  struct bpf_reg_state *dst_reg)
+{
+	src_reg->min_value = dst_reg->min_value = max(src_reg->min_value,
+						      dst_reg->min_value);
+	src_reg->max_value = dst_reg->max_value = min(src_reg->max_value,
+						      dst_reg->max_value);
+	src_reg->align = dst_reg->align = tn_intersect(src_reg->align,
+						       dst_reg->align);
+	check_reg_overflow(src_reg);
+	check_reg_overflow(dst_reg);
+}
+
+static void reg_combine_min_max(struct bpf_reg_state *true_src,
+				struct bpf_reg_state *true_dst,
+				struct bpf_reg_state *false_src,
+				struct bpf_reg_state *false_dst,
+				u8 opcode)
+{
+	switch (opcode) {
+	case BPF_JEQ:
+		__reg_combine_min_max(true_src, true_dst);
+		break;
+	case BPF_JNE:
+		__reg_combine_min_max(false_src, false_dst);
+	}
+}
+
 static void mark_map_reg(struct bpf_reg_state *regs, u32 regno, u32 id,
-			 enum bpf_reg_type type)
+			 bool is_null)
 {
 	struct bpf_reg_state *reg = &regs[regno];
 
 	if (reg->type == PTR_TO_MAP_VALUE_OR_NULL && reg->id == id) {
-		if (type == UNKNOWN_VALUE) {
-			__mark_reg_unknown_value(regs, regno);
+		/* Old offset (both fixed and variable parts) should
+		 * have been known-zero, because we don't allow pointer
+		 * arithmetic on pointers that might be NULL.
+		 */
+		if (WARN_ON_ONCE(reg->min_value || reg->max_value ||
+				 reg->align.value || reg->align.mask ||
+				 reg->off)) {
+			reg->min_value = reg->max_value = reg->off = 0;
+			reg->align = tn_const(0);
+		}
+		if (is_null) {
+			reg->type = SCALAR_VALUE;
 		} else if (reg->map_ptr->inner_map_meta) {
 			reg->type = CONST_PTR_TO_MAP;
 			reg->map_ptr = reg->map_ptr->inner_map_meta;
 		} else {
-			reg->type = type;
+			reg->type = PTR_TO_MAP_VALUE;
 		}
 		/* We don't need id from this point onwards anymore, thus we
 		 * should better reset it, so that state pruning has chances
@@ -2243,19 +2394,19 @@ static void mark_map_reg(struct bpf_reg_state *regs, u32 regno, u32 id,
  * be folded together at some point.
  */
 static void mark_map_regs(struct bpf_verifier_state *state, u32 regno,
-			  enum bpf_reg_type type)
+			  bool is_null)
 {
 	struct bpf_reg_state *regs = state->regs;
 	u32 id = regs[regno].id;
 	int i;
 
 	for (i = 0; i < MAX_BPF_REG; i++)
-		mark_map_reg(regs, i, id, type);
+		mark_map_reg(regs, i, id, is_null);
 
 	for (i = 0; i < MAX_BPF_STACK; i += BPF_REG_SIZE) {
 		if (state->stack_slot_type[i] != STACK_SPILL)
 			continue;
-		mark_map_reg(state->spilled_regs, i / BPF_REG_SIZE, id, type);
+		mark_map_reg(state->spilled_regs, i / BPF_REG_SIZE, id, is_null);
 	}
 }
 
@@ -2305,7 +2456,9 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
 	/* detect if R == 0 where R was initialized to zero earlier */
 	if (BPF_SRC(insn->code) == BPF_K &&
 	    (opcode == BPF_JEQ || opcode == BPF_JNE) &&
-	    dst_reg->type == CONST_IMM && dst_reg->imm == insn->imm) {
+	    dst_reg->type == SCALAR_VALUE &&
+	    dst_reg->align.value == insn->imm &&
+	    dst_reg->align.mask == 0) {
 		if (opcode == BPF_JEQ) {
 			/* if (imm == imm) goto pc+off;
 			 * only follow the goto, ignore fall-through
@@ -2327,17 +2480,30 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
 
 	/* detect if we are comparing against a constant value so we can adjust
 	 * our min/max values for our dst register.
+	 * this is only legit if both are scalars (or pointers to the same
+	 * object, I suppose, but we don't support that right now), because
+	 * otherwise the different base pointers mean the offsets aren't
+	 * comparable.
 	 */
 	if (BPF_SRC(insn->code) == BPF_X) {
-		if (regs[insn->src_reg].type == CONST_IMM)
-			reg_set_min_max(&other_branch->regs[insn->dst_reg],
-					dst_reg, regs[insn->src_reg].imm,
-					opcode);
-		else if (dst_reg->type == CONST_IMM)
-			reg_set_min_max_inv(&other_branch->regs[insn->src_reg],
-					    &regs[insn->src_reg], dst_reg->imm,
-					    opcode);
-	} else {
+		if (dst_reg->type == SCALAR_VALUE &&
+		    regs[insn->src_reg].type == SCALAR_VALUE) {
+			if (regs[insn->src_reg].align.mask == 0)
+				reg_set_min_max(&other_branch->regs[insn->dst_reg],
+						dst_reg, regs[insn->src_reg].align.value,
+						opcode);
+			else if (dst_reg->align.mask == 0)
+				reg_set_min_max_inv(&other_branch->regs[insn->src_reg],
+						    &regs[insn->src_reg],
+						    dst_reg->align.value, opcode);
+			else if (opcode == BPF_JEQ || opcode == BPF_JNE)
+				/* Comparing for equality, we can combine knowledge */
+				reg_combine_min_max(&other_branch->regs[insn->src_reg],
+						    &other_branch->regs[insn->dst_reg],
+						    &regs[insn->src_reg],
+						    &regs[insn->dst_reg], opcode);
+		}
+	} else if (dst_reg->type == SCALAR_VALUE) {
 		reg_set_min_max(&other_branch->regs[insn->dst_reg],
 					dst_reg, insn->imm, opcode);
 	}
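Consider a `JEQ`/`JNE` where both operands are non-constant scalars. The hunk above calls `reg_combine_min_max()`, and on the branch where the two registers are equal each can adopt everything known about the other. The ranges intersect (larger minimum, smaller maximum), and a bit is known if either side knows it. The following is a hypothetical standalone version of that step. `tn_intersect_sketch()` is modelled on the tristate intersection later found in `kernel/bpf/tnum.c`. It assumes the two descriptions are consistent.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tristate number: bits set in mask are unknown. */
struct tn {
	uint64_t value;
	uint64_t mask;
};

/* Both tnums describe the same value: a bit is known if either knows it. */
static struct tn tn_intersect_sketch(struct tn a, struct tn b)
{
	uint64_t v = a.value | b.value;
	uint64_t mu = a.mask & b.mask;
	struct tn r = { v & ~mu, mu };

	return r;
}

/* Hypothetical register summary: signed min, unsigned max, known bits. */
struct reg_sketch {
	int64_t min;
	uint64_t max;
	struct tn t;
};

/* Registers known to be equal share the intersection of their knowledge. */
static void combine_equal(struct reg_sketch *a, struct reg_sketch *b)
{
	int64_t min = a->min > b->min ? a->min : b->min;
	uint64_t max = a->max < b->max ? a->max : b->max;
	struct tn t = tn_intersect_sketch(a->t, b->t);

	a->min = b->min = min;
	a->max = b->max = max;
	a->t = b->t = t;
}
```

Suppose one register is known to be `01xx` in binary and the other `xx01`. Once the equality branch is taken, both are pinned to the constant 5.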
@@ -2349,10 +2515,8 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
 		/* Mark all identical map registers in each branch as either
 		 * safe or unknown depending R == 0 or R != 0 conditional.
 		 */
-		mark_map_regs(this_branch, insn->dst_reg,
-			      opcode == BPF_JEQ ? PTR_TO_MAP_VALUE : UNKNOWN_VALUE);
-		mark_map_regs(other_branch, insn->dst_reg,
-			      opcode == BPF_JEQ ? UNKNOWN_VALUE : PTR_TO_MAP_VALUE);
+		mark_map_regs(this_branch, insn->dst_reg, opcode == BPF_JNE);
+		mark_map_regs(other_branch, insn->dst_reg, opcode == BPF_JEQ);
 	} else if (BPF_SRC(insn->code) == BPF_X && opcode == BPF_JGT &&
 		   dst_reg->type == PTR_TO_PACKET &&
 		   regs[insn->src_reg].type == PTR_TO_PACKET_END) {
@@ -2400,8 +2564,11 @@ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
 	if (insn->src_reg == 0) {
 		u64 imm = ((u64)(insn + 1)->imm << 32) | (u32)insn->imm;
 
-		regs[insn->dst_reg].type = CONST_IMM;
-		regs[insn->dst_reg].imm = imm;
+		regs[insn->dst_reg].type = SCALAR_VALUE;
+		regs[insn->dst_reg].min_value = imm;
+		regs[insn->dst_reg].max_value = imm;
+		check_reg_overflow(&regs[insn->dst_reg]);
+		regs[insn->dst_reg].align = tn_const(imm);
 		return 0;
 	}
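For `BPF_LD | BPF_IMM | BPF_DW`, `check_ld_imm()` assembles one 64-bit constant from the two 32-bit `imm` fields, low half first. The register then becomes a `SCALAR_VALUE` with `min_value` and `max_value` set to that constant (then clamped by `check_reg_overflow()`) and `align = tn_const(imm)`. The hypothetical helper below isolates the assembly expression. The `(u32)` cast on the low half matters: without it, a negative low half would be sign-extended across the high 32 bits.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the expression in check_ld_imm(). */
static uint64_t ld_imm64_value(int32_t lo, int32_t hi)
{
	/* Sign extension of 'hi' into 64 bits is shifted out by << 32;
	 * 'lo' goes through uint32_t so it is zero-extended.
	 */
	return ((uint64_t)hi << 32) | (uint32_t)lo;
}
```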
 
@@ -2482,7 +2649,7 @@ static int check_ld_abs(struct bpf_verifier_env *env, struct bpf_insn *insn)
 	/* mark destination R0 register as readable, since it contains
 	 * the value fetched from the packet
 	 */
-	regs[BPF_REG_0].type = UNKNOWN_VALUE;
+	mark_reg_unknown(regs, BPF_REG_0);
 	return 0;
 }
 
@@ -2685,57 +2852,102 @@ static int check_cfg(struct bpf_verifier_env *env)
 	return ret;
 }
 
-/* the following conditions reduce the number of explored insns
- * from ~140k to ~80k for ultra large programs that use a lot of ptr_to_packet
- */
-static bool compare_ptrs_to_packet(struct bpf_verifier_env *env,
-				   struct bpf_reg_state *old,
-				   struct bpf_reg_state *cur)
+/* check %cur's range satisfies %old's */
+static bool range_within(struct bpf_reg_state *old,
+			 struct bpf_reg_state *cur)
 {
-	if (old->id != cur->id)
-		return false;
+	return old->min_value <= cur->min_value &&
+	       old->max_value >= cur->max_value;
+}
 
-	/* old ptr_to_packet is more conservative, since it allows smaller
-	 * range. Ex:
-	 * old(off=0,r=10) is equal to cur(off=0,r=20), because
-	 * old(off=0,r=10) means that with range=10 the verifier proceeded
-	 * further and found no issues with the program. Now we're in the same
-	 * spot with cur(off=0,r=20), so we're safe too, since anything further
-	 * will only be looking at most 10 bytes after this pointer.
-	 */
-	if (old->off == cur->off && old->range < cur->range)
+/* Returns true if (rold safe implies rcur safe) */
+static bool regsafe(struct bpf_reg_state *rold,
+		    struct bpf_reg_state *rcur,
+		    bool varlen_map_access)
+{
+	if (memcmp(rold, rcur, sizeof(*rold)) == 0)
 		return true;
 
-	/* old(off=20,r=10) is equal to cur(off=22,re=22 or 5 or 0)
-	 * since both cannot be used for packet access and safe(old)
-	 * pointer has smaller off that could be used for further
-	 * 'if (ptr > data_end)' check
-	 * Ex:
-	 * old(off=20,r=10) and cur(off=22,r=22) and cur(off=22,r=0) mean
-	 * that we cannot access the packet.
-	 * The safe range is:
-	 * [ptr, ptr + range - off)
-	 * so whenever off >=range, it means no safe bytes from this pointer.
-	 * When comparing old->off <= cur->off, it means that older code
-	 * went with smaller offset and that offset was later
-	 * used to figure out the safe range after 'if (ptr > data_end)' check
-	 * Say, 'old' state was explored like:
-	 * ... R3(off=0, r=0)
-	 * R4 = R3 + 20
-	 * ... now R4(off=20,r=0)  <-- here
-	 * if (R4 > data_end)
-	 * ... R4(off=20,r=20), R3(off=0,r=20) and R3 can be used to access.
-	 * ... the code further went all the way to bpf_exit.
-	 * Now the 'cur' state at the mark 'here' has R4(off=30,r=0).
-	 * old_R4(off=20,r=0) equal to cur_R4(off=30,r=0), since if the verifier
-	 * goes further, such cur_R4 will give larger safe packet range after
-	 * 'if (R4 > data_end)' and all further insn were already good with r=20,
-	 * so they will be good with r=30 and we can prune the search.
-	 */
-	if (!env->strict_alignment && old->off <= cur->off &&
-	    old->off >= old->range && cur->off >= cur->range)
+	if (rold->type == NOT_INIT)
+		/* explored state can't have used this */
 		return true;
+	if (rcur->type == NOT_INIT)
+		return false;
+	switch (rold->type) {
+	case SCALAR_VALUE:
+		if (rcur->type == SCALAR_VALUE) {
+			/* new val must satisfy old val knowledge */
+			return range_within(rold, rcur) &&
+			       tn_in(rold->align, rcur->align);
+		} else {
+			/* if we knew anything about the old value, we're not
+			 * equal, because we can't know anything about the
+			 * scalar value of the pointer in the new value.
+			 */
+			return rold->min_value == BPF_REGISTER_MIN_RANGE &&
+			       rold->max_value == BPF_REGISTER_MAX_RANGE &&
+			       !~rold->align.mask;
+		}
+	case PTR_TO_MAP_VALUE:
+		if (varlen_map_access) {
+			/* If the new min/max/align satisfy the old ones and
+			 * everything else matches, we are OK.
+			 * We don't care about the 'id' value, because nothing
+			 * uses it for PTR_TO_MAP_VALUE (only for ..._OR_NULL)
+			 */
+			return memcmp(rold, rcur, offsetof(struct bpf_reg_state, id)) == 0 &&
+			       range_within(rold, rcur) &&
+			       tn_in(rold->align, rcur->align);
+		} else {
+			/* If the ranges/align were not the same, but
+			 * everything else was and we didn't do a variable
+			 * access into a map then we are a-ok.
+			 */
+			return memcmp(rold, rcur, offsetof(struct bpf_reg_state, id)) == 0;
+		}
+	case PTR_TO_MAP_VALUE_OR_NULL:
+		/* a PTR_TO_MAP_VALUE with no offset (fixed or variable) can
+		 * safely be used as a PTR_TO_MAP_VALUE_OR_NULL into the same
+		 * map.  (We can't do the same thing for a CONST_PTR_TO_MAP,
+		 * because its map_ptr changed when we NULL-checked it.)
+		 */
+		return rcur->type == PTR_TO_MAP_VALUE &&
+		       rcur->map_ptr == rold->map_ptr &&
+		       rcur->align.mask == 0 &&
+		       rcur->off == 0;
+	case PTR_TO_PACKET:
+		if (rcur->type != PTR_TO_PACKET)
+			return false;
+		/* We must have at least as much range as the old ptr
+		 * did, so that any accesses which were safe before are
+		 * still safe.  This is true even if old range < old off,
+		 * since someone could have accessed through (ptr - k), or
+		 * even done ptr -= k in a register, to get a safe access.
+		 */
+		if (rold->range > rcur->range)
+			return false;
+		/* If the offsets don't match, we can't trust our align;
+		 * nor can we be sure that we won't fall out of range.
+		 */
+		if (rold->off != rcur->off)
+			return false;
+		/* new val must satisfy old val knowledge */
+		return range_within(rold, rcur) &&
+		       tn_in(rold->align, rcur->align);
+	case PTR_TO_CTX:
+	case CONST_PTR_TO_MAP:
+	case PTR_TO_STACK:
+	case PTR_TO_PACKET_END:
+		/* Only valid matches are exact, which memcmp() above
+		 * would have accepted
+		 */
+	default:
+		/* Don't know what's going on, just say it's not safe */
+		return false;
+	}
 
+	/* Shouldn't get here; if we do, say it's not safe */
+	WARN_ON_ONCE(1);
 	return false;
 }
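For scalars, `regsafe()` prunes the current path when the current register is at least as constrained as the one in the explored state. Two conditions must hold. First, its range must fit inside the old range (`range_within()`). Second, every bit the old state knew must be known, with the same value, in the current state (`tn_in()`). The definitions of `tn_in()` and of the `align` type lie outside this excerpt. The hypothetical standalone version below is modelled on the `tnum_in()` helper later found in `kernel/bpf/tnum.c`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tristate number: bits set in mask are unknown. */
struct tn {
	uint64_t value;
	uint64_t mask;
};

/* True if every value described by b is also described by a. */
static bool tn_in_sketch(struct tn a, struct tn b)
{
	if (b.mask & ~a.mask)
		return false;	/* b leaves unknown a bit that a pins down */
	b.value &= ~a.mask;	/* ignore bits a does not care about */
	return a.value == b.value;
}

/* Hypothetical scalar summary: signed min, unsigned max, known bits. */
struct scalar_sketch {
	int64_t min;
	uint64_t max;
	struct tn t;
};

/* Safe to prune if cur's bounds and known bits are at least as tight. */
static bool scalar_safe(const struct scalar_sketch *old,
			const struct scalar_sketch *cur)
{
	return old->min <= cur->min && old->max >= cur->max &&
	       tn_in_sketch(old->t, cur->t);
}
```

An explored state that only knew "0 to 15, low four bits unknown" covers a current state of "4 to 7", but not the other way round.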
 
@@ -2770,43 +2982,11 @@ static bool states_equal(struct bpf_verifier_env *env,
 			 struct bpf_verifier_state *cur)
 {
 	bool varlen_map_access = env->varlen_map_value_access;
-	struct bpf_reg_state *rold, *rcur;
 	int i;
 
 	for (i = 0; i < MAX_BPF_REG; i++) {
-		rold = &old->regs[i];
-		rcur = &cur->regs[i];
-
-		if (memcmp(rold, rcur, sizeof(*rold)) == 0)
-			continue;
-
-		/* If the ranges were not the same, but everything else was and
-		 * we didn't do a variable access into a map then we are a-ok.
-		 */
-		if (!varlen_map_access &&
-		    memcmp(rold, rcur, offsetofend(struct bpf_reg_state, id)) == 0)
-			continue;
-
-		/* If we didn't map access then again we don't care about the
-		 * mismatched range values and it's ok if our old type was
-		 * UNKNOWN and we didn't go to a NOT_INIT'ed reg.
-		 */
-		if (rold->type == NOT_INIT ||
-		    (!varlen_map_access && rold->type == UNKNOWN_VALUE &&
-		     rcur->type != NOT_INIT))
-			continue;
-
-		/* Don't care about the reg->id in this case. */
-		if (rold->type == PTR_TO_MAP_VALUE_OR_NULL &&
-		    rcur->type == PTR_TO_MAP_VALUE_OR_NULL &&
-		    rold->map_ptr == rcur->map_ptr)
-			continue;
-
-		if (rold->type == PTR_TO_PACKET && rcur->type == PTR_TO_PACKET &&
-		    compare_ptrs_to_packet(env, rold, rcur))
-			continue;
-
-		return false;
+		if (!regsafe(&old->regs[i], &cur->regs[i], varlen_map_access))
+			return false;
 	}
 
 	for (i = 0; i < MAX_BPF_STACK; i++) {
@@ -2821,16 +3001,18 @@ static bool states_equal(struct bpf_verifier_env *env,
 			return false;
 		if (i % BPF_REG_SIZE)
 			continue;
-		if (memcmp(&old->spilled_regs[i / BPF_REG_SIZE],
-			   &cur->spilled_regs[i / BPF_REG_SIZE],
-			   sizeof(old->spilled_regs[0])))
-			/* when explored and current stack slot types are
-			 * the same, check that stored pointers types
+		if (old->stack_slot_type[i] == STACK_MISC)
+			continue;
+		if (!regsafe(&old->spilled_regs[i / BPF_REG_SIZE],
+			     &cur->spilled_regs[i / BPF_REG_SIZE],
+			     varlen_map_access))
+			/* when explored and current stack slot are both storing
+			 * spilled registers, check that stored pointer types
 			 * are the same as well.
 			 * Ex: explored safe path could have stored
-			 * (bpf_reg_state) {.type = PTR_TO_STACK, .imm = -8}
+			 * (bpf_reg_state) {.type = PTR_TO_STACK, .off = -8}
 			 * but current path has stored:
-			 * (bpf_reg_state) {.type = PTR_TO_STACK, .imm = -16}
+			 * (bpf_reg_state) {.type = PTR_TO_STACK, .off = -16}
 			 * such verifier states are not equivalent.
 			 * return false to continue verification of this path
 			 */
@@ -3158,7 +3340,6 @@ static int do_check(struct bpf_verifier_env *env)
 				verbose("invalid BPF_LD mode\n");
 				return -EINVAL;
 			}
-			reset_reg_range_values(regs, insn->dst_reg);
 		} else {
 			verbose("unknown insn class %d\n", class);
 			return -EINVAL;

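The packet-pointer case of regsafe() above prunes only when the current state's knowledge is at least as precise as the explored state's knowledge. The current bounds must lie within the old bounds (range_within()), and the current tnum must be contained in the old tnum (tn_in()).

The sketch below is a minimal standalone model of those two checks. It assumes a 64-bit (value, mask) tnum in which a set mask bit means "unknown" and known bits live in value. It also assumes the single min_value/max_value pair that the register state still has at this point in the series. The struct and function names (struct tnum, struct reg, tnum_contains(), bounds_within(), reg_knowledge_within()) are hypothetical re-implementations for illustration. They are not the verifier's own code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical standalone tnum: bits set in mask are unknown,
 * all other bits are known and held in value (value & mask == 0).
 */
struct tnum {
	uint64_t value;
	uint64_t mask;
};

/* Hypothetical register model: the min/max pair plus the tnum ("align"). */
struct reg {
	int64_t min_value;
	uint64_t max_value;
	struct tnum align;
};

/* True if every value described by b is also described by a. */
static bool tnum_contains(struct tnum a, struct tnum b)
{
	/* b may not be uncertain about a bit that a claims to know */
	if (b.mask & ~a.mask)
		return false;
	/* on the bits that a knows, b's known bits must agree */
	return a.value == (b.value & ~a.mask);
}

/* True if cur's bounds lie inside old's bounds. */
static bool bounds_within(const struct reg *old, const struct reg *cur)
{
	return old->min_value <= cur->min_value &&
	       old->max_value >= cur->max_value;
}

/* "New val must satisfy old val knowledge": both checks must pass. */
static bool reg_knowledge_within(const struct reg *old, const struct reg *cur)
{
	return bounds_within(old, cur) && tnum_contains(old->align, cur->align);
}
```

For example, an old state that only knew "4-byte aligned, in [0, 16]" accepts a current state holding the constant 8. It rejects a current state whose upper bound is wider.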
* [RFC PATCH net-next 3/5] bpf/verifier: feed pointer-to-unknown-scalar casts into scalar ALU path
@ 2017-06-07 14:58   ` Edward Cree via iovisor-dev
  0 siblings, 0 replies; 45+ messages in thread
From: Edward Cree @ 2017-06-07 14:58 UTC (permalink / raw)
  To: davem, Alexei Starovoitov, Alexei Starovoitov, Daniel Borkmann
  Cc: netdev, iovisor-dev, LKML

If pointer leaks are allowed, and adjust_ptr_min_max_vals returns -EACCES,
 treat the pointer as an unknown scalar and try again, because we might be
 able to conclude something about the result (e.g. pointer & 0x40 is either
 0 or 0x40).
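The example above can be made concrete with tristate numbers (tnums), the (value, mask) representation this series uses for 'align'. Once the pointer is demoted to an unknown scalar, its tnum is "every bit unknown". ANDing it with the constant 0x40 leaves only bit 6 uncertain.

The sketch below re-implements tnum AND in standalone form. The names struct tnum, TNUM_UNKNOWN, tnum_const() and tnum_and() are hypothetical illustrations of the technique, not the verifier's API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical standalone tnum: a bit set in mask is unknown,
 * otherwise value holds that bit's known state.
 */
struct tnum {
	uint64_t value;
	uint64_t mask;
};

/* Nothing known about any bit: what a pointer becomes once it is
 * treated as an unknown scalar.
 */
static const struct tnum TNUM_UNKNOWN = { 0, ~(uint64_t)0 };

static struct tnum tnum_const(uint64_t v)
{
	struct tnum t = { v, 0 };
	return t;
}

/* Bitwise AND of two tnums.  A result bit is known 1 only if it is
 * known 1 in both inputs, and known 0 if it is known 0 in either
 * input; every other bit is unknown.
 */
static struct tnum tnum_and(struct tnum a, struct tnum b)
{
	uint64_t alpha = a.value | a.mask; /* bits that might be 1 in a */
	uint64_t beta = b.value | b.mask;  /* bits that might be 1 in b */
	uint64_t v = a.value & b.value;    /* bits known to be 1 in both */
	struct tnum t = { v, alpha & beta & ~v };
	return t;
}
```

Here tnum_and(TNUM_UNKNOWN, tnum_const(0x40)) yields value 0 and mask 0x40, so the scalar path can conclude the result is either 0 or 0x40. It does not have to mark the whole result unknown.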

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 kernel/bpf/verifier.c | 244 ++++++++++++++++++++++++++------------------------
 1 file changed, 127 insertions(+), 117 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index dd06e4e..1ff5b5d 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1566,6 +1566,8 @@ static void coerce_reg_to_32(struct bpf_reg_state *reg)
 /* Handles arithmetic on a pointer and a scalar: computes new min/max and align.
  * Caller must check_reg_overflow all argument regs beforehand.
  * Caller should also handle BPF_MOV case separately.
+ * If we return -EACCES, caller may want to try again treating pointer as a
+ * scalar.  So we only emit a diagnostic if !env->allow_ptr_leaks.
  */
 static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
 				   struct bpf_insn *insn,
@@ -1588,43 +1590,29 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
 
 	if (BPF_CLASS(insn->code) != BPF_ALU64) {
 		/* 32-bit ALU ops on pointers produce (meaningless) scalars */
-		if (!env->allow_ptr_leaks) {
+		if (!env->allow_ptr_leaks)
 			verbose("R%d 32-bit pointer arithmetic prohibited\n",
 				dst);
-			return -EACCES;
-		}
-		__mark_reg_unknown(dst_reg);
-		/* High bits are known zero */
-		dst_reg->align.mask = (u32)-1;
-		return 0;
+		return -EACCES;
 	}
 
 	if (ptr_reg->type == PTR_TO_MAP_VALUE_OR_NULL) {
-		if (!env->allow_ptr_leaks) {
+		if (!env->allow_ptr_leaks)
 			verbose("R%d pointer arithmetic on PTR_TO_MAP_VALUE_OR_NULL prohibited, null-check it first\n",
 				dst);
-			return -EACCES;
-		}
-		__mark_reg_unknown(dst_reg);
-		return 0;
+		return -EACCES;
 	}
 	if (ptr_reg->type == CONST_PTR_TO_MAP) {
-		if (!env->allow_ptr_leaks) {
+		if (!env->allow_ptr_leaks)
 			verbose("R%d pointer arithmetic on CONST_PTR_TO_MAP prohibited\n",
 				dst);
-			return -EACCES;
-		}
-		__mark_reg_unknown(dst_reg);
-		return 0;
+		return -EACCES;
 	}
 	if (ptr_reg->type == PTR_TO_PACKET_END) {
-		if (!env->allow_ptr_leaks) {
+		if (!env->allow_ptr_leaks)
 			verbose("R%d pointer arithmetic on PTR_TO_PACKET_END prohibited\n",
 				dst);
-			return -EACCES;
-		}
-		__mark_reg_unknown(dst_reg);
-		return 0;
+		return -EACCES;
 	}
 
 	/* In case of 'scalar += pointer', dst_reg inherits pointer type and id.
@@ -1648,8 +1636,9 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
 			break;
 		}
 		if (max_val == BPF_REGISTER_MAX_RANGE) {
-			verbose("R%d tried to add unbounded value to pointer\n",
-				dst);
+			if (!env->allow_ptr_leaks)
+				verbose("R%d tried to add unbounded value to pointer\n",
+					dst);
 			return -EACCES;
 		}
 		/* A new variable offset is created.  Note that off_reg->off
@@ -1676,28 +1665,20 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
 	case BPF_SUB:
 		if (dst_reg == off_reg) {
 			/* scalar -= pointer.  Creates an unknown scalar */
-			if (!env->allow_ptr_leaks) {
+			if (!env->allow_ptr_leaks)
 				verbose("R%d tried to subtract pointer from scalar\n",
 					dst);
-				return -EACCES;
-			}
-			/* Make it an unknown scalar */
-			__mark_reg_unknown(dst_reg);
-			break;
+			return -EACCES;
 		}
 		/* We don't allow subtraction from FP, because (according to
 		 * test_verifier.c test "invalid fp arithmetic", JITs might not
 		 * be able to deal with it.
 		 */
 		if (ptr_reg->type == PTR_TO_STACK) {
-			if (!env->allow_ptr_leaks) {
+			if (!env->allow_ptr_leaks)
 				verbose("R%d subtraction from stack pointer prohibited\n",
 					dst);
-				return -EACCES;
-			}
-			/* Make it an unknown scalar */
-			__mark_reg_unknown(dst_reg);
-			break;
+			return -EACCES;
 		}
 		if (known && (ptr_reg->off - min_val ==
 			      (s64)(s32)(ptr_reg->off - min_val))) {
@@ -1713,14 +1694,10 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
 		 * This can happen if off_reg is an immediate.
 		 */
 		if ((s64)max_val < 0) {
-			if (!env->allow_ptr_leaks) {
+			if (!env->allow_ptr_leaks)
 				verbose("R%d tried to subtract negative max_val %lld from pointer\n",
 					dst, (s64)max_val);
-				return -EACCES;
-			}
-			/* Make it an unknown scalar */
-			__mark_reg_unknown(dst_reg);
-			break;
+			return -EACCES;
 		}
 		/* A new variable offset is created.  If the subtrahend is known
 		 * nonnegative, then any reg->range we had before is still good.
@@ -1747,99 +1724,37 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
 		 * (However, in principle we could allow some cases, e.g.
 		 * ptr &= ~3 which would reduce min_value by 3.)
 		 */
-		if (!env->allow_ptr_leaks) {
+		if (!env->allow_ptr_leaks)
 			verbose("R%d bitwise operator %s on pointer prohibited\n",
 				dst, bpf_alu_string[opcode >> 4]);
-			return -EACCES;
-		}
-		/* Make it an unknown scalar */
-		__mark_reg_unknown(dst_reg);
+		return -EACCES;
 	default:
 		/* other operators (e.g. MUL,LSH) produce non-pointer results */
-		if (!env->allow_ptr_leaks) {
+		if (!env->allow_ptr_leaks)
 			verbose("R%d pointer arithmetic with %s operator prohibited\n",
 				dst, bpf_alu_string[opcode >> 4]);
-			return -EACCES;
-		}
-		/* Make it an unknown scalar */
-		__mark_reg_unknown(dst_reg);
+		return -EACCES;
 	}
 
 	check_reg_overflow(dst_reg);
 	return 0;
 }
 
-/* Handles ALU ops other than BPF_END, BPF_NEG and BPF_MOV: computes new min/max
- * and align.
- * TODO: check this is legit for ALU32, particularly around negatives
- */
-static int adjust_reg_min_max_vals(struct bpf_verifier_env *env,
-				   struct bpf_insn *insn)
+static int adjust_scalar_min_max_vals(struct bpf_verifier_env *env,
+				      struct bpf_insn *insn,
+				      struct bpf_reg_state *dst_reg,
+				      struct bpf_reg_state *src_reg)
 {
-	struct bpf_reg_state *regs = env->cur_state.regs, *dst_reg, *src_reg;
-	struct bpf_reg_state *ptr_reg = NULL, off_reg = {0};
+	struct bpf_reg_state *regs = env->cur_state.regs;
 	s64 min_val = BPF_REGISTER_MIN_RANGE;
 	u64 max_val = BPF_REGISTER_MAX_RANGE;
 	u8 opcode = BPF_OP(insn->code);
 	bool src_known, dst_known;
 
-	dst_reg = &regs[insn->dst_reg];
-	check_reg_overflow(dst_reg);
-	src_reg = NULL;
-	if (dst_reg->type != SCALAR_VALUE)
-		ptr_reg = dst_reg;
-	if (BPF_SRC(insn->code) == BPF_X) {
-		src_reg = &regs[insn->src_reg];
-		check_reg_overflow(src_reg);
-
-		if (src_reg->type != SCALAR_VALUE) {
-			if (dst_reg->type != SCALAR_VALUE) {
-				/* Combining two pointers by any ALU op yields
-				 * an arbitrary scalar.
-				 */
-				if (!env->allow_ptr_leaks) {
-					verbose("R%d pointer %s pointer prohibited\n",
-						insn->dst_reg,
-						bpf_alu_string[opcode >> 4]);
-					return -EACCES;
-				}
-				mark_reg_unknown(regs, insn->dst_reg);
-				return 0;
-			} else {
-				/* scalar += pointer
-				 * This is legal, but we have to reverse our
-				 * src/dest handling in computing the range
-				 */
-				return adjust_ptr_min_max_vals(env, insn,
-							       src_reg, dst_reg);
-			}
-		} else if (ptr_reg) {
-			/* pointer += scalar */
-			return adjust_ptr_min_max_vals(env, insn,
-						       dst_reg, src_reg);
-		}
-	} else {
-		/* Pretend the src is a reg with a known value, since we only
-		 * need to be able to read from this state.
-		 */
-		off_reg.type = SCALAR_VALUE;
-		off_reg.align = tn_const(insn->imm);
-		off_reg.min_value = insn->imm;
-		off_reg.max_value = insn->imm;
-		src_reg = &off_reg;
-		if (ptr_reg) /* pointer += K */
-			return adjust_ptr_min_max_vals(env, insn,
-						       ptr_reg, src_reg);
-	}
-
-	/* Got here implies adding two SCALAR_VALUEs */
-	if (WARN_ON_ONCE(ptr_reg)) {
-		verbose("verifier internal error\n");
-		return -EINVAL;
-	}
-	if (WARN_ON(!src_reg)) {
-		verbose("verifier internal error\n");
-		return -EINVAL;
+	if (BPF_CLASS(insn->code) != BPF_ALU64) {
+		/* 32-bit ALU ops are (32,32)->64 */
+		coerce_reg_to_32(dst_reg);
+		coerce_reg_to_32(src_reg);
 	}
 	if (BPF_CLASS(insn->code) != BPF_ALU64) {
 		/* 32-bit ALU ops are (32,32)->64 */
@@ -1990,6 +1905,101 @@ static int adjust_reg_min_max_vals(struct bpf_verifier_env *env,
 	return 0;
 }
 
+/* Handles ALU ops other than BPF_END, BPF_NEG and BPF_MOV: computes new min/max
+ * and align.
+ */
+static int adjust_reg_min_max_vals(struct bpf_verifier_env *env,
+				   struct bpf_insn *insn)
+{
+	struct bpf_reg_state *regs = env->cur_state.regs, *dst_reg, *src_reg;
+	struct bpf_reg_state *ptr_reg = NULL, off_reg = {0};
+	u8 opcode = BPF_OP(insn->code);
+	int rc;
+
+	dst_reg = &regs[insn->dst_reg];
+	check_reg_overflow(dst_reg);
+	src_reg = NULL;
+	if (dst_reg->type != SCALAR_VALUE)
+		ptr_reg = dst_reg;
+	if (BPF_SRC(insn->code) == BPF_X) {
+		src_reg = &regs[insn->src_reg];
+		check_reg_overflow(src_reg);
+
+		if (src_reg->type != SCALAR_VALUE) {
+			if (dst_reg->type != SCALAR_VALUE) {
+				/* Combining two pointers by any ALU op yields
+				 * an arbitrary scalar.
+				 */
+				if (!env->allow_ptr_leaks) {
+					verbose("R%d pointer %s pointer prohibited\n",
+						insn->dst_reg,
+						bpf_alu_string[opcode >> 4]);
+					return -EACCES;
+				}
+				mark_reg_unknown(regs, insn->dst_reg);
+				return 0;
+			} else {
+				/* scalar += pointer
+				 * This is legal, but we have to reverse our
+				 * src/dest handling in computing the range
+				 */
+				rc = adjust_ptr_min_max_vals(env, insn,
+							     src_reg, dst_reg);
+				if (rc == -EACCES && env->allow_ptr_leaks) {
+					/* scalar += unknown scalar */
+					__mark_reg_unknown(&off_reg);
+					return adjust_scalar_min_max_vals(
+							env, insn,
+							dst_reg, &off_reg);
+				}
+				return rc;
+			}
+		} else if (ptr_reg) {
+			/* pointer += scalar */
+			rc = adjust_ptr_min_max_vals(env, insn,
+						     dst_reg, src_reg);
+			if (rc == -EACCES && env->allow_ptr_leaks) {
+				/* unknown scalar += scalar */
+				__mark_reg_unknown(dst_reg);
+				return adjust_scalar_min_max_vals(
+						env, insn, dst_reg, src_reg);
+			}
+			return rc;
+		}
+	} else {
+		/* Pretend the src is a reg with a known value, since we only
+		 * need to be able to read from this state.
+		 */
+		off_reg.type = SCALAR_VALUE;
+		off_reg.align = tn_const(insn->imm);
+		off_reg.min_value = insn->imm;
+		off_reg.max_value = insn->imm;
+		src_reg = &off_reg;
+		if (ptr_reg) { /* pointer += K */
+			rc = adjust_ptr_min_max_vals(env, insn,
+						     ptr_reg, src_reg);
+			if (rc == -EACCES && env->allow_ptr_leaks) {
+				/* unknown scalar += K */
+				__mark_reg_unknown(dst_reg);
+				return adjust_scalar_min_max_vals(
+						env, insn, dst_reg, &off_reg);
+			}
+			return rc;
+		}
+	}
+
+	/* Got here implies adding two SCALAR_VALUEs */
+	if (WARN_ON_ONCE(ptr_reg)) {
+		verbose("verifier internal error\n");
+		return -EINVAL;
+	}
+	if (WARN_ON(!src_reg)) {
+		verbose("verifier internal error\n");
+		return -EINVAL;
+	}
+	return adjust_scalar_min_max_vals(env, insn, dst_reg, src_reg);
+}
+
 /* check validity of 32-bit and 64-bit arithmetic operations */
 static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)
 {

* [RFC PATCH net-next 4/5] bpf/verifier: track signed and unsigned min/max values
  2017-06-07 14:55 ` Edward Cree via iovisor-dev
                   ` (3 preceding siblings ...)
  (?)
@ 2017-06-07 14:59 ` Edward Cree
  2017-06-08  2:40     ` Alexei Starovoitov via iovisor-dev
  -1 siblings, 1 reply; 45+ messages in thread
From: Edward Cree @ 2017-06-07 14:59 UTC (permalink / raw)
  To: davem, Alexei Starovoitov, Alexei Starovoitov, Daniel Borkmann
  Cc: netdev, iovisor-dev, LKML

This sometimes lets us combine information from a signed check of one
 bound and an unsigned check of the other.
We now track the full range of possible values, rather than restricting
 ourselves to [0, 1<<30) and treating anything beyond that as unknown.
 While this is probably not necessary, it makes the code more
 straightforward and symmetrical between signed and unsigned bounds.
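The sketch below shows how the four bounds can inform one another. It follows the logic of __reg_deduce_bounds() in the diff below, applied to a hypothetical struct bounds rather than the kernel's struct bpf_reg_state. The first test case is the situation described above: a signed check gives x s>= 0 and an unsigned check gives x u<= 9, which together pin x to [0, 9] under both interpretations. The sketch assumes two's-complement conversion between int64_t and uint64_t, as the kernel does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical standalone model of a scalar's signed and unsigned bounds. */
struct bounds {
	int64_t smin, smax;
	uint64_t umin, umax;
};

static uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }
static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }

/* Use the signed bounds to tighten the unsigned bounds, and vice versa. */
static void deduce_bounds(struct bounds *b)
{
	if (b->smin >= 0 || b->smax < 0) {
		/* Signed range does not cross the sign boundary, so the
		 * signed and unsigned orderings agree: intersect them.
		 */
		uint64_t lo = max_u64((uint64_t)b->smin, b->umin);
		uint64_t hi = min_u64((uint64_t)b->smax, b->umax);

		b->umin = lo;
		b->smin = (int64_t)lo;
		b->umax = hi;
		b->smax = (int64_t)hi;
		return;
	}
	if ((int64_t)b->umax >= 0) {
		/* Unsigned range lies in [0, INT64_MAX]: value is non-negative. */
		uint64_t hi = min_u64((uint64_t)b->smax, b->umax);

		b->smin = (int64_t)b->umin;
		b->umax = hi;
		b->smax = (int64_t)hi;
	} else if ((int64_t)b->umin < 0) {
		/* Unsigned range lies in [2^63, U64_MAX]: value is negative. */
		uint64_t lo = max_u64((uint64_t)b->smin, b->umin);

		b->umin = lo;
		b->smin = (int64_t)lo;
		b->smax = (int64_t)b->umax;
	}
}
```

In the second branch, a signed upper bound alone (x s<= 9) still allows negative x. Adding an unsigned upper bound such as x u<= 100 rules the negatives out, giving [0, 9].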

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 include/linux/bpf_verifier.h |  22 +-
 kernel/bpf/verifier.c        | 661 +++++++++++++++++++++++++------------------
 2 files changed, 395 insertions(+), 288 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index e341469..10a5944 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -11,11 +11,15 @@
 #include <linux/filter.h> /* for MAX_BPF_STACK */
 #include <linux/tnum.h>
 
- /* Just some arbitrary values so we can safely do math without overflowing and
-  * are obviously wrong for any sort of memory access.
-  */
-#define BPF_REGISTER_MAX_RANGE (1024 * 1024 * 1024)
-#define BPF_REGISTER_MIN_RANGE -1
+/* Maximum variable offset umax_value permitted when resolving memory accesses.
+ * In practice this is far bigger than any realistic pointer offset; this limit
+ * ensures that umax_value + (int)off + (int)size cannot overflow a u64.
+ */
+#define BPF_MAX_VAR_OFF	(1ULL << 31)
+/* Maximum variable size permitted for ARG_CONST_SIZE[_OR_ZERO].  This ensures
+ * that converting umax_value to int cannot overflow.
+ */
+#define BPF_MAX_VAR_SIZ	INT_MAX
 
 struct bpf_reg_state {
 	enum bpf_reg_type type;
@@ -38,7 +42,7 @@ struct bpf_reg_state {
 	 * PTR_TO_MAP_VALUE_OR_NULL, we have to NULL-check it _first_.
 	 */
 	u32 id;
-	/* These three fields must be last.  See states_equal() */
+	/* These five fields must be last.  See states_equal() */
 	/* For scalar types (SCALAR_VALUE), this represents our knowledge of
 	 * the actual value.
 	 * For pointer types, this represents the variable part of the offset
@@ -51,8 +55,10 @@ struct bpf_reg_state {
 	 * These refer to the same value as align, not necessarily the actual
 	 * contents of the register.
 	 */
-	s64 min_value; /* minimum possible (s64)value */
-	u64 max_value; /* maximum possible (u64)value */
+	s64 smin_value; /* minimum possible (s64)value */
+	s64 smax_value; /* maximum possible (s64)value */
+	u64 umin_value; /* minimum possible (u64)value */
+	u64 umax_value; /* maximum possible (u64)value */
 };
 
 enum bpf_stack_slot_type {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 1ff5b5d..a5bb3f1 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -234,12 +234,20 @@ static void print_verifier_state(struct bpf_verifier_state *state)
 				verbose(",ks=%d,vs=%d",
 					reg->map_ptr->key_size,
 					reg->map_ptr->value_size);
-			if (reg->min_value != BPF_REGISTER_MIN_RANGE)
-				verbose(",min_value=%lld",
-					(long long)reg->min_value);
-			if (reg->max_value != BPF_REGISTER_MAX_RANGE)
-				verbose(",max_value=%llu",
-					(unsigned long long)reg->max_value);
+			if (reg->smin_value != reg->umin_value &&
+			    reg->smin_value != S64_MIN)
+				verbose(",smin_value=%lld",
+					(long long)reg->smin_value);
+			if (reg->smax_value != reg->umax_value &&
+			    reg->smax_value != S64_MAX)
+				verbose(",smax_value=%lld",
+					(long long)reg->smax_value);
+			if (reg->umin_value != 0)
+				verbose(",umin_value=%llu",
+					(unsigned long long)reg->umin_value);
+			if (reg->umax_value != U64_MAX)
+				verbose(",umax_value=%llu",
+					(unsigned long long)reg->umax_value);
 			if (~reg->align.mask) {
 				char tn_buf[48];
 
@@ -464,14 +472,24 @@ static const int caller_saved[CALLER_SAVED_REGS] = {
 	BPF_REG_0, BPF_REG_1, BPF_REG_2, BPF_REG_3, BPF_REG_4, BPF_REG_5
 };
 
+/* Mark the unknown part of a register (variable offset or scalar value) as
+ * known to have the value @imm.
+ */
+static void __mark_reg_known(struct bpf_reg_state *reg, u64 imm)
+{
+	reg->align = tn_const(imm);
+	reg->smin_value = (s64)imm;
+	reg->smax_value = (s64)imm;
+	reg->umin_value = imm;
+	reg->umax_value = imm;
+}
+
 /* Mark the 'variable offset' part of a register as zero.  This should be
  * used only on registers holding a pointer type.
  */
 static void __mark_reg_known_zero(struct bpf_reg_state *reg)
 {
-	reg->align = tn_const(0);
-	reg->min_value = 0;
-	reg->max_value = 0;
+	__mark_reg_known(reg, 0);
 }
 
 static void mark_reg_known_zero(struct bpf_reg_state *regs, u32 regno)
@@ -480,6 +498,63 @@ static void mark_reg_known_zero(struct bpf_reg_state *regs, u32 regno)
 	__mark_reg_known_zero(regs + regno);
 }
 
+/* Attempts to improve min/max values based on align information */
+static void __update_reg_bounds(struct bpf_reg_state *reg)
+{
+	/* min signed is max(sign bit) | min(other bits) */
+	reg->smin_value = max_t(s64, reg->smin_value,
+				reg->align.value | (reg->align.mask & S64_MIN));
+	/* max signed is min(sign bit) | max(other bits) */
+	reg->smax_value = min_t(s64, reg->smax_value,
+				reg->align.value | (reg->align.mask & S64_MAX));
+	reg->umin_value = max(reg->umin_value, reg->align.value);
+	reg->umax_value = min(reg->umax_value, reg->align.value | reg->align.mask);
+}
+
+/* Uses signed min/max values to inform unsigned, and vice-versa */
+static void __reg_deduce_bounds(struct bpf_reg_state *reg)
+{
+	/* Learn sign from signed bounds.
+	 * If we cannot cross the sign boundary, then signed and unsigned bounds
+	 * are the same, so combine.  This works even in the negative case, e.g.
+	 * -3 s<= x s<= -1 implies 0xf...fd u<= x u<= 0xf...ff.
+	 */
+	if (reg->smin_value >= 0 || reg->smax_value < 0) {
+		reg->smin_value = reg->umin_value = max_t(u64, reg->smin_value,
+							  reg->umin_value);
+		reg->smax_value = reg->umax_value = min_t(u64, reg->smax_value,
+							  reg->umax_value);
+		return;
+	}
+	/* Learn sign from unsigned bounds.  Signed bounds cross the sign
+	 * boundary, so we must be careful.
+	 */
+	if ((s64)reg->umax_value >= 0) {
+		/* Positive.  We can't learn anything from the smin, but smax
+		 * is positive, hence safe.
+		 */
+		reg->smin_value = reg->umin_value;
+		reg->smax_value = reg->umax_value = min_t(u64, reg->smax_value,
+							  reg->umax_value);
+	} else if ((s64)reg->umin_value < 0) {
+		/* Negative.  We can't learn anything from the smax, but smin
+		 * is negative, hence safe.
+		 */
+		reg->smin_value = reg->umin_value = max_t(u64, reg->smin_value,
+							  reg->umin_value);
+		reg->smax_value = reg->umax_value;
+	}
+}
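The standalone model below follows the same branch structure as __reg_deduce_bounds(), so the sign-boundary reasoning can be tried outside the kernel. struct demo_reg and the min/max helpers are hypothetical and hold only the four bound fields. In the kernel the result also passes through max_t()/min_t() and is stored back into both fields in a single chained assignment. The sketch does the same thing in two steps.

```c
#include <stdint.h>

/* Hypothetical: only the four bound fields of struct bpf_reg_state. */
struct demo_reg {
	int64_t smin_value, smax_value;
	uint64_t umin_value, umax_value;
};

static uint64_t demo_max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }
static uint64_t demo_min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }

/* Uses signed bounds to tighten unsigned ones and vice versa. */
static void demo_deduce_bounds(struct demo_reg *reg)
{
	if (reg->smin_value >= 0 || reg->smax_value < 0) {
		/* The range cannot cross the sign boundary, so both views agree. */
		reg->umin_value = demo_max_u64((uint64_t)reg->smin_value, reg->umin_value);
		reg->umax_value = demo_min_u64((uint64_t)reg->smax_value, reg->umax_value);
		reg->smin_value = (int64_t)reg->umin_value;
		reg->smax_value = (int64_t)reg->umax_value;
		return;
	}
	if ((int64_t)reg->umax_value >= 0) {
		/* The unsigned bounds say the value is non-negative. */
		reg->smin_value = (int64_t)reg->umin_value;
		reg->umax_value = demo_min_u64((uint64_t)reg->smax_value, reg->umax_value);
		reg->smax_value = (int64_t)reg->umax_value;
	} else if ((int64_t)reg->umin_value < 0) {
		/* The unsigned bounds say the value is negative. */
		reg->umin_value = demo_max_u64((uint64_t)reg->smin_value, reg->umin_value);
		reg->smin_value = (int64_t)reg->umin_value;
		reg->smax_value = (int64_t)reg->umax_value;
	}
}
```

Feeding in -3 s<= x s<= -1 with no unsigned information reproduces the example from the patch comment: 0xf...fd u<= x u<= 0xf...ff.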
+
+/* Reset the min/max bounds of a register */
+static void __mark_reg_unbounded(struct bpf_reg_state *reg)
+{
+	reg->smin_value = S64_MIN;
+	reg->smax_value = S64_MAX;
+	reg->umin_value = 0;
+	reg->umax_value = U64_MAX;
+}
+
 /* Mark a register as having a completely unknown (scalar) value. */
 static void __mark_reg_unknown(struct bpf_reg_state *reg)
 {
@@ -487,8 +562,7 @@ static void __mark_reg_unknown(struct bpf_reg_state *reg)
 	reg->id = 0;
 	reg->off = 0;
 	reg->align = tn_unknown;
-	reg->min_value = BPF_REGISTER_MIN_RANGE;
-	reg->max_value = BPF_REGISTER_MAX_RANGE;
+	__mark_reg_unbounded(reg);
 }
 
 static void mark_reg_unknown(struct bpf_reg_state *regs, u32 regno)
@@ -697,26 +771,27 @@ static int check_map_access(struct bpf_verifier_env *env, u32 regno,
 	 * index'es we need to make sure that whatever we use
 	 * will have a set floor within our range.
 	 */
-	if (reg->min_value < 0) {
+	if (reg->smin_value < 0) {
 		verbose("R%d min value is negative, either use unsigned index or do a if (index >=0) check.\n",
 			regno);
 		return -EACCES;
 	}
-	err = __check_map_access(env, regno, reg->min_value + off, size);
+	err = __check_map_access(env, regno, reg->smin_value + off, size);
 	if (err) {
 		verbose("R%d min value is outside of the array range\n", regno);
 		return err;
 	}
 
-	/* If we haven't set a max value then we need to bail
-	 * since we can't be sure we won't do bad things.
+	/* If we haven't set a max value then we need to bail since we can't be
+	 * sure we won't do bad things.
+	 * If reg->umax_value + off could overflow, treat that as unbounded too.
 	 */
-	if (reg->max_value == BPF_REGISTER_MAX_RANGE) {
+	if (reg->umax_value >= BPF_MAX_VAR_OFF) {
 		verbose("R%d unbounded memory access, make sure to bounds check any array access into a map\n",
 			regno);
 		return -EACCES;
 	}
-	err = __check_map_access(env, regno, reg->max_value + off, size);
+	err = __check_map_access(env, regno, reg->umax_value + off, size);
 	if (err)
 		verbose("R%d max value is outside of the array range\n", regno);
 	return err;
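The sketch below approximates the range checks that check_map_access() applies to a variable offset after this change. DEMO_MAX_VAR_OFF is a hypothetical placeholder, because this hunk does not show the value of BPF_MAX_VAR_OFF. demo_in_value() approximates __check_map_access(): it accepts an access only if [off, off + size) lies inside the map value. The checks run in a slightly different order from the kernel's, and an extra consistency check is added, so that the additions in the sketch cannot overflow.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical placeholder for BPF_MAX_VAR_OFF. */
#define DEMO_MAX_VAR_OFF (1ULL << 29)

/* Approximation of __check_map_access(): [off, off + size) must fit in the value. */
static bool demo_in_value(int64_t off, int64_t size, int64_t value_size)
{
	return off >= 0 && size > 0 && off + size <= value_size;
}

/* Sketch of the variable-offset checks done before a map value access. */
static bool demo_map_access_ok(int64_t smin_value, uint64_t umax_value,
			       int32_t off, int32_t size, uint32_t value_size)
{
	if (smin_value < 0)
		return false;  /* the index might be negative */
	if (umax_value >= DEMO_MAX_VAR_OFF)
		return false;  /* unbounded, or umax + off might overflow */
	if ((uint64_t)smin_value > umax_value)
		return false;  /* inconsistent bounds (sketch-only guard) */
	/* Both the lowest and the highest possible access must be in range. */
	return demo_in_value(smin_value + off, size, value_size) &&
	       demo_in_value((int64_t)umax_value + off, size, value_size);
}
```

For a 16-byte value and a 4-byte load, an index bounded to [0, 12] passes, and [0, 13] is rejected.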
@@ -779,7 +854,7 @@ static int check_packet_access(struct bpf_verifier_env *env, u32 regno, int off,
 	/* We don't allow negative numbers, because we aren't tracking enough
 	 * detail to prove they're safe.
 	 */
-	if (reg->min_value < 0) {
+	if (reg->smin_value < 0) {
 		verbose("R%d min value is negative, either use unsigned index or do a if (index >=0) check.\n",
 			regno);
 		return -EACCES;
@@ -1027,11 +1102,7 @@ static int check_mem_access(struct bpf_verifier_env *env, u32 regno, int off,
 		/* b/h/w load zero-extends, mark upper bits as known 0 */
 		state->regs[value_regno].align.value &= (1ULL << (size * 8)) - 1;
 		state->regs[value_regno].align.mask &= (1ULL << (size * 8)) - 1;
-		/* sign bit is known zero, so we can bound the value */
-		state->regs[value_regno].min_value = 0;
-		state->regs[value_regno].max_value = min_t(u64,
-					state->regs[value_regno].align.mask,
-					BPF_REGISTER_MAX_RANGE);
+		__update_reg_bounds(&state->regs[value_regno]);
 	}
 	return err;
 }
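The hunk above replaces the hand-written bounds for a zero-extending b/h/w load with a call to __update_reg_bounds(), so the bounds now follow from the known-zero upper bits. The model below shows that effect. The demo_* names are hypothetical. demo_update_reg_bounds() uses the same expressions as __update_reg_bounds() in this patch. demo_zext_load() assumes the loaded bits themselves are completely unknown.

```c
#include <stdint.h>

struct demo_tnum { uint64_t value, mask; };  /* mask bits are unknown */
struct demo_reg {
	struct demo_tnum align;
	int64_t smin_value, smax_value;
	uint64_t umin_value, umax_value;
};

/* Same expressions as __update_reg_bounds(): tighten min/max from known bits. */
static void demo_update_reg_bounds(struct demo_reg *reg)
{
	/* Smallest signed value: unknown sign bit set, other unknown bits clear. */
	int64_t smin = (int64_t)(reg->align.value | (reg->align.mask & (uint64_t)INT64_MIN));
	/* Largest signed value: unknown sign bit clear, other unknown bits set. */
	int64_t smax = (int64_t)(reg->align.value | (reg->align.mask & (uint64_t)INT64_MAX));
	uint64_t umax = reg->align.value | reg->align.mask;

	if (smin > reg->smin_value) reg->smin_value = smin;
	if (smax < reg->smax_value) reg->smax_value = smax;
	if (reg->align.value > reg->umin_value) reg->umin_value = reg->align.value;
	if (umax < reg->umax_value) reg->umax_value = umax;
}

/* Model a zero-extending load of size 1, 2 or 4 bytes into a register. */
static void demo_zext_load(struct demo_reg *reg, unsigned int size)
{
	uint64_t low = (1ULL << (size * 8)) - 1;

	reg->align.value = 0;
	reg->align.mask = ~0ULL & low;  /* loaded bits unknown, upper bits zero */
	reg->smin_value = INT64_MIN;    /* start from completely unbounded */
	reg->smax_value = INT64_MAX;
	reg->umin_value = 0;
	reg->umax_value = UINT64_MAX;
	demo_update_reg_bounds(reg);
}
```

A 1-byte load ends up with 0 <= x <= 0xff in both views, which matches the bounds the removed code set by hand.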
@@ -1282,13 +1353,13 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
 			 */
 			meta = NULL;
 
-		if (reg->min_value < 0) {
+		if (reg->smin_value < 0) {
 			verbose("R%d min value is negative, either use unsigned or 'var &= const'\n",
 				regno);
 			return -EACCES;
 		}
 
-		if (reg->min_value == 0) {
+		if (reg->umin_value == 0) {
 			err = check_helper_mem_access(env, regno - 1, 0,
 						      zero_size_allowed,
 						      meta);
@@ -1296,13 +1367,13 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
 				return err;
 		}
 
-		if (reg->max_value == BPF_REGISTER_MAX_RANGE) {
+		if (reg->umax_value >= BPF_MAX_VAR_SIZ) {
 			verbose("R%d unbounded memory access, use 'var &= const' or 'if (var < const)'\n",
 				regno);
 			return -EACCES;
 		}
 		err = check_helper_mem_access(env, regno - 1,
-					      reg->max_value,
+					      reg->umax_value,
 					      zero_size_allowed, meta);
 	}
 
@@ -1537,34 +1608,36 @@ static int check_call(struct bpf_verifier_env *env, int func_id, int insn_idx)
 	return 0;
 }
 
-static void check_reg_overflow(struct bpf_reg_state *reg)
-{
-	if (reg->max_value > BPF_REGISTER_MAX_RANGE)
-		reg->max_value = BPF_REGISTER_MAX_RANGE;
-	if (reg->min_value < BPF_REGISTER_MIN_RANGE ||
-	    reg->min_value > BPF_REGISTER_MAX_RANGE)
-		reg->min_value = BPF_REGISTER_MIN_RANGE;
-}
-
 static void coerce_reg_to_32(struct bpf_reg_state *reg)
 {
-	/* 32-bit values can't be negative as an s64 */
-	if (reg->min_value < 0)
-		reg->min_value = 0;
 	/* clear high 32 bits */
 	reg->align.value &= (u32)-1;
 	reg->align.mask &= (u32)-1;
-	/* Did value become known?  Then update bounds */
-	if (!reg->align.mask) {
-		if ((s64)reg->align.value > BPF_REGISTER_MIN_RANGE)
-			reg->min_value = reg->align.value;
-		if (reg->align.value < BPF_REGISTER_MAX_RANGE)
-			reg->max_value = reg->align.value;
-	}
+	/* Update bounds */
+	__update_reg_bounds(reg);
+}
+
+static bool signed_add_overflows(s64 a, s64 b)
+{
+	/* Do the add in u64, where overflow is well-defined */
+	s64 res = (s64)((u64)a + (u64)b);
+
+	if (b < 0)
+		return res > a;
+	return res < a;
+}
+
+static bool signed_sub_overflows(s64 a, s64 b)
+{
+	/* Do the sub in u64, where overflow is well-defined */
+	s64 res = (s64)((u64)a - (u64)b);
+
+	if (b < 0)
+		return res < a;
+	return res > a;
 }
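Signed overflow is undefined behaviour in C, so the two helpers above do the arithmetic in u64, where wraparound is well defined, and then compare the result with the first operand. Here is the same technique as plain userspace C with stdint types. The demo_ prefix marks these as copies made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

static bool demo_signed_add_overflows(int64_t a, int64_t b)
{
	/* Wrapping add in uint64_t, then reinterpret as signed. */
	int64_t res = (int64_t)((uint64_t)a + (uint64_t)b);

	if (b < 0)
		return res > a;  /* adding a negative must not increase a */
	return res < a;          /* adding a non-negative must not decrease a */
}

static bool demo_signed_sub_overflows(int64_t a, int64_t b)
{
	/* Wrapping subtract in uint64_t, then reinterpret as signed. */
	int64_t res = (int64_t)((uint64_t)a - (uint64_t)b);

	if (b < 0)
		return res < a;  /* subtracting a negative must not decrease a */
	return res > a;          /* subtracting a non-negative must not increase a */
}
```

For example, INT64_MAX + 1 and 0 - INT64_MIN are both reported as overflowing, and -1 + 1 is not.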
 
 /* Handles arithmetic on a pointer and a scalar: computes new min/max and align.
- * Caller must check_reg_overflow all argument regs beforehand.
  * Caller should also handle BPF_MOV case separately.
  * If we return -EACCES, caller may want to try again treating pointer as a
  * scalar.  So we only emit a diagnostic if !env->allow_ptr_leaks.
@@ -1576,14 +1649,20 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
 {
 	struct bpf_reg_state *regs = env->cur_state.regs, *dst_reg;
 	bool known = !off_reg->align.mask;
-	s64 min_val = off_reg->min_value;
-	u64 max_val = off_reg->max_value;
+	s64 smin_val = off_reg->smin_value, smax_val = off_reg->smax_value,
+	    smin_ptr = ptr_reg->smin_value, smax_ptr = ptr_reg->smax_value;
+	u64 umin_val = off_reg->umin_value, umax_val = off_reg->umax_value,
+	    umin_ptr = ptr_reg->umin_value, umax_ptr = ptr_reg->umax_value;
 	u8 opcode = BPF_OP(insn->code);
 	u32 dst = insn->dst_reg;
 
 	dst_reg = &regs[dst];
 
-	if (WARN_ON_ONCE(known && (min_val != max_val))) {
+	if (WARN_ON_ONCE(known && (smin_val != smax_val))) {
+		verbose("verifier internal error\n");
+		return -EINVAL;
+	}
+	if (WARN_ON_ONCE(known && (umin_val != umax_val))) {
 		verbose("verifier internal error\n");
 		return -EINVAL;
 	}
@@ -1626,21 +1705,17 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
 		/* We can take a fixed offset as long as it doesn't overflow
 		 * the s32 'off' field
 		 */
-		if (known && (ptr_reg->off + min_val ==
-			      (s64)(s32)(ptr_reg->off + min_val))) {
+		if (known && (ptr_reg->off + smin_val ==
+			      (s64)(s32)(ptr_reg->off + smin_val))) {
 			/* pointer += K.  Accumulate it into fixed offset */
-			dst_reg->min_value = ptr_reg->min_value;
-			dst_reg->max_value = ptr_reg->max_value;
+			dst_reg->smin_value = smin_ptr;
+			dst_reg->smax_value = smax_ptr;
+			dst_reg->umin_value = umin_ptr;
+			dst_reg->umax_value = umax_ptr;
 			dst_reg->align = ptr_reg->align;
-			dst_reg->off = ptr_reg->off + min_val;
+			dst_reg->off = ptr_reg->off + smin_val;
 			break;
 		}
-		if (max_val == BPF_REGISTER_MAX_RANGE) {
-			if (!env->allow_ptr_leaks)
-				verbose("R%d tried to add unbounded value to pointer\n",
-					dst);
-			return -EACCES;
-		}
 		/* A new variable offset is created.  Note that off_reg->off
 		 * == 0, since it's a scalar.
 		 * dst_reg gets the pointer type and since some positive
@@ -1649,12 +1724,22 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
 		 * added into the variable offset, and we copy the fixed offset
 		 * from ptr_reg.
 		 */
-		if (min_val <= BPF_REGISTER_MIN_RANGE)
-			dst_reg->min_value = BPF_REGISTER_MIN_RANGE;
-		if (dst_reg->min_value != BPF_REGISTER_MIN_RANGE)
-			dst_reg->min_value += min_val;
-		if (dst_reg->max_value != BPF_REGISTER_MAX_RANGE)
-			dst_reg->max_value += max_val;
+		if (signed_add_overflows(smin_ptr, smin_val) ||
+		    signed_add_overflows(smax_ptr, smax_val)) {
+			dst_reg->smin_value = S64_MIN;
+			dst_reg->smax_value = S64_MAX;
+		} else {
+			dst_reg->smin_value = smin_ptr + smin_val;
+			dst_reg->smax_value = smax_ptr + smax_val;
+		}
+		if (umin_ptr + umin_val < umin_ptr ||
+		    umax_ptr + umax_val < umax_ptr) {
+			dst_reg->umin_value = 0;
+			dst_reg->umax_value = U64_MAX;
+		} else {
+			dst_reg->umin_value = umin_ptr + umin_val;
+			dst_reg->umax_value = umax_ptr + umax_val;
+		}
 		dst_reg->align = tn_add(ptr_reg->align, off_reg->align);
 		dst_reg->off = ptr_reg->off;
 		dst_reg->id = ++env->id_gen;
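When a variable scalar is added to a pointer, the new bounds come from interval addition in each view separately. If either end might wrap, that view falls back to "unbounded". The sketch below isolates that step. struct demo_bounds and the function names are hypothetical, and the tnum and id handling is left out.

```c
#include <stdbool.h>
#include <stdint.h>

struct demo_bounds {
	int64_t smin_value, smax_value;
	uint64_t umin_value, umax_value;
};

static bool demo_signed_add_overflows(int64_t a, int64_t b)
{
	int64_t res = (int64_t)((uint64_t)a + (uint64_t)b);
	return b < 0 ? res > a : res < a;
}

/* [a, b] + [c, d] = [a + c, b + d] in each view, unless an end might wrap. */
static struct demo_bounds demo_bounds_add(struct demo_bounds ptr, struct demo_bounds off)
{
	struct demo_bounds r;

	if (demo_signed_add_overflows(ptr.smin_value, off.smin_value) ||
	    demo_signed_add_overflows(ptr.smax_value, off.smax_value)) {
		r.smin_value = INT64_MIN;  /* might wrap: know nothing */
		r.smax_value = INT64_MAX;
	} else {
		r.smin_value = ptr.smin_value + off.smin_value;
		r.smax_value = ptr.smax_value + off.smax_value;
	}
	if (ptr.umin_value + off.umin_value < ptr.umin_value ||
	    ptr.umax_value + off.umax_value < ptr.umax_value) {
		r.umin_value = 0;          /* might wrap: know nothing */
		r.umax_value = UINT64_MAX;
	} else {
		r.umin_value = ptr.umin_value + off.umin_value;
		r.umax_value = ptr.umax_value + off.umax_value;
	}
	return r;
}
```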
@@ -1680,40 +1765,43 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
 					dst);
 			return -EACCES;
 		}
-		if (known && (ptr_reg->off - min_val ==
-			      (s64)(s32)(ptr_reg->off - min_val))) {
+		if (known && (ptr_reg->off - smin_val ==
+			      (s64)(s32)(ptr_reg->off - smin_val))) {
 			/* pointer -= K.  Subtract it from fixed offset */
-			dst_reg->min_value = ptr_reg->min_value;
-			dst_reg->max_value = ptr_reg->max_value;
+			dst_reg->smin_value = smin_ptr;
+			dst_reg->smax_value = smax_ptr;
+			dst_reg->umin_value = umin_ptr;
+			dst_reg->umax_value = umax_ptr;
 			dst_reg->align = ptr_reg->align;
 			dst_reg->id = ptr_reg->id;
-			dst_reg->off = ptr_reg->off - min_val;
+			dst_reg->off = ptr_reg->off - smin_val;
 			break;
 		}
-		/* Subtracting a negative value will just confuse everything.
-		 * This can happen if off_reg is an immediate.
-		 */
-		if ((s64)max_val < 0) {
-			if (!env->allow_ptr_leaks)
-				verbose("R%d tried to subtract negative max_val %lld from pointer\n",
-					dst, (s64)max_val);
-			return -EACCES;
-		}
 		/* A new variable offset is created.  If the subtrahend is known
 		 * nonnegative, then any reg->range we had before is still good.
 		 */
-		if (max_val >= BPF_REGISTER_MAX_RANGE)
-			dst_reg->min_value = BPF_REGISTER_MIN_RANGE;
-		if (dst_reg->min_value != BPF_REGISTER_MIN_RANGE)
-			dst_reg->min_value -= max_val;
-		if (min_val <= BPF_REGISTER_MIN_RANGE)
-			dst_reg->max_value = BPF_REGISTER_MAX_RANGE;
-		if (dst_reg->max_value != BPF_REGISTER_MAX_RANGE)
-			dst_reg->max_value -= min_val;
+		if (signed_sub_overflows(smin_ptr, smax_val) ||
+		    signed_sub_overflows(smax_ptr, smin_val)) {
+			/* Overflow possible, we know nothing */
+			dst_reg->smin_value = S64_MIN;
+			dst_reg->smax_value = S64_MAX;
+		} else {
+			dst_reg->smin_value = smin_ptr - smax_val;
+			dst_reg->smax_value = smax_ptr - smin_val;
+		}
+		if (umin_ptr < umax_val) {
+			/* Overflow possible, we know nothing */
+			dst_reg->umin_value = 0;
+			dst_reg->umax_value = U64_MAX;
+		} else {
+			/* Cannot overflow (as long as bounds are consistent) */
+			dst_reg->umin_value = umin_ptr - umax_val;
+			dst_reg->umax_value = umax_ptr - umin_val;
+		}
 		dst_reg->align = tn_sub(ptr_reg->align, off_reg->align);
 		dst_reg->off = ptr_reg->off;
 		dst_reg->id = ++env->id_gen;
-		if (ptr_reg->type == PTR_TO_PACKET && min_val < 0)
+		if (ptr_reg->type == PTR_TO_PACKET && smin_val < 0)
 			/* something was added to pkt_ptr, set range to zero */
 			dst_reg->range = 0;
 		break;
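Subtraction swaps the pairing of the ends: [a, b] - [c, d] = [a - d, b - c]. In the unsigned view, the result can wrap below zero exactly when the smallest minuend might be less than the largest subtrahend. The sketch below shows only that bounds step. The demo_* names are hypothetical, and consistent input bounds are assumed, as in the patch comment.

```c
#include <stdbool.h>
#include <stdint.h>

struct demo_bounds {
	int64_t smin_value, smax_value;
	uint64_t umin_value, umax_value;
};

static bool demo_signed_sub_overflows(int64_t a, int64_t b)
{
	int64_t res = (int64_t)((uint64_t)a - (uint64_t)b);
	return b < 0 ? res < a : res > a;
}

/* [a.min, a.max] - [b.min, b.max] = [a.min - b.max, a.max - b.min] */
static struct demo_bounds demo_bounds_sub(struct demo_bounds a, struct demo_bounds b)
{
	struct demo_bounds r;

	if (demo_signed_sub_overflows(a.smin_value, b.smax_value) ||
	    demo_signed_sub_overflows(a.smax_value, b.smin_value)) {
		r.smin_value = INT64_MIN;  /* overflow possible, we know nothing */
		r.smax_value = INT64_MAX;
	} else {
		r.smin_value = a.smin_value - b.smax_value;
		r.smax_value = a.smax_value - b.smin_value;
	}
	if (a.umin_value < b.umax_value) {
		r.umin_value = 0;          /* result might go below zero and wrap */
		r.umax_value = UINT64_MAX;
	} else {
		r.umin_value = a.umin_value - b.umax_value;
		r.umax_value = a.umax_value - b.umin_value;
	}
	return r;
}
```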
@@ -1736,7 +1824,7 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
 		return -EACCES;
 	}
 
-	check_reg_overflow(dst_reg);
+	__reg_deduce_bounds(dst_reg);
 	return 0;
 }
 
@@ -1746,10 +1834,10 @@ static int adjust_scalar_min_max_vals(struct bpf_verifier_env *env,
 				      struct bpf_reg_state *src_reg)
 {
 	struct bpf_reg_state *regs = env->cur_state.regs;
-	s64 min_val = BPF_REGISTER_MIN_RANGE;
-	u64 max_val = BPF_REGISTER_MAX_RANGE;
 	u8 opcode = BPF_OP(insn->code);
 	bool src_known, dst_known;
+	s64 smin_val, smax_val;
+	u64 umin_val, umax_val;
 
 	if (BPF_CLASS(insn->code) != BPF_ALU64) {
 		/* 32-bit ALU ops are (32,32)->64 */
@@ -1761,147 +1849,204 @@ static int adjust_scalar_min_max_vals(struct bpf_verifier_env *env,
 		coerce_reg_to_32(dst_reg);
 		coerce_reg_to_32(src_reg);
 	}
-	min_val = src_reg->min_value;
-	max_val = src_reg->max_value;
+	smin_val = src_reg->smin_value;
+	smax_val = src_reg->smax_value;
+	umin_val = src_reg->umin_value;
+	umax_val = src_reg->umax_value;
 	src_known = !src_reg->align.mask;
 	dst_known = !dst_reg->align.mask;
 
 	switch (opcode) {
 	case BPF_ADD:
-		if (min_val == BPF_REGISTER_MIN_RANGE)
-			dst_reg->min_value = BPF_REGISTER_MIN_RANGE;
-		if (dst_reg->min_value != BPF_REGISTER_MIN_RANGE)
-			dst_reg->min_value += min_val;
-		/* if max_val is MAX_RANGE, this will saturate dst->max */
-		if (dst_reg->max_value != BPF_REGISTER_MAX_RANGE)
-			dst_reg->max_value += max_val;
+		if (signed_add_overflows(dst_reg->smin_value, smin_val) ||
+		    signed_add_overflows(dst_reg->smax_value, smax_val)) {
+			dst_reg->smin_value = S64_MIN;
+			dst_reg->smax_value = S64_MAX;
+		} else {
+			dst_reg->smin_value += smin_val;
+			dst_reg->smax_value += smax_val;
+		}
+		if (dst_reg->umin_value + umin_val < umin_val ||
+		    dst_reg->umax_value + umax_val < umax_val) {
+			dst_reg->umin_value = 0;
+			dst_reg->umax_value = U64_MAX;
+		} else {
+			dst_reg->umin_value += umin_val;
+			dst_reg->umax_value += umax_val;
+		}
 		dst_reg->align = tn_add(dst_reg->align, src_reg->align);
 		break;
 	case BPF_SUB:
-		if (max_val == BPF_REGISTER_MAX_RANGE)
-			dst_reg->min_value = BPF_REGISTER_MIN_RANGE;
-		if (dst_reg->min_value != BPF_REGISTER_MIN_RANGE)
-			dst_reg->min_value -= max_val;
-		if (min_val == BPF_REGISTER_MIN_RANGE)
-			dst_reg->max_value = BPF_REGISTER_MAX_RANGE;
-		if (dst_reg->max_value != BPF_REGISTER_MAX_RANGE)
-			dst_reg->max_value -= min_val;
+		if (signed_sub_overflows(dst_reg->smin_value, smax_val) ||
+		    signed_sub_overflows(dst_reg->smax_value, smin_val)) {
+			/* Overflow possible, we know nothing */
+			dst_reg->smin_value = S64_MIN;
+			dst_reg->smax_value = S64_MAX;
+		} else {
+			dst_reg->smin_value -= smax_val;
+			dst_reg->smax_value -= smin_val;
+		}
+		if (dst_reg->umin_value < umax_val) {
+			/* Overflow possible, we know nothing */
+			dst_reg->umin_value = 0;
+			dst_reg->umax_value = U64_MAX;
+		} else {
+			/* Cannot overflow (as long as bounds are consistent) */
+			dst_reg->umin_value -= umax_val;
+			dst_reg->umax_value -= umin_val;
+		}
 		dst_reg->align = tn_sub(dst_reg->align, src_reg->align);
 		break;
 	case BPF_MUL:
-		if (min_val < 0 || dst_reg->min_value < 0) {
+		dst_reg->align = tn_mul(dst_reg->align, src_reg->align);
+		if (smin_val < 0 || dst_reg->smin_value < 0) {
 			/* Ain't nobody got time to multiply that sign */
-			__mark_reg_unknown(dst_reg);
+			__mark_reg_unbounded(dst_reg);
+			__update_reg_bounds(dst_reg);
 			break;
 		}
-		dst_reg->min_value *= min_val;
-		/* if max_val is MAX_RANGE, this will saturate dst->max.
-		 * We know MAX_RANGE ** 2 won't overflow a u64, because
-		 * MAX_RANGE itself fits in a u32.
+		/* Both values are positive, so we can work with unsigned and
+		 * copy the result to signed (unless it exceeds S64_MAX).
 		 */
-		BUILD_BUG_ON(BPF_REGISTER_MAX_RANGE > (u32)-1);
-		if (dst_reg->max_value != BPF_REGISTER_MAX_RANGE)
-			dst_reg->max_value *= max_val;
-		dst_reg->align = tn_mul(dst_reg->align, src_reg->align);
+		if (umax_val > U32_MAX || dst_reg->umax_value > U32_MAX) {
+			/* Potential overflow, we know nothing */
+			__mark_reg_unbounded(dst_reg);
+			/* (except what we can learn from the align) */
+			__update_reg_bounds(dst_reg);
+			break;
+		}
+		dst_reg->umin_value *= umin_val;
+		dst_reg->umax_value *= umax_val;
+		if (dst_reg->umax_value > S64_MAX) {
+			/* Overflow possible, we know nothing */
+			dst_reg->smin_value = S64_MIN;
+			dst_reg->smax_value = S64_MAX;
+		} else {
+			dst_reg->smin_value = dst_reg->umin_value;
+			dst_reg->smax_value = dst_reg->umax_value;
+		}
 		break;
 	case BPF_AND:
 		if (src_known && dst_known) {
-			u64 value = dst_reg->align.value & src_reg->align.value;
-
-			dst_reg->align = tn_const(value);
-			dst_reg->min_value = dst_reg->max_value = min_t(u64,
-					value, BPF_REGISTER_MAX_RANGE);
+			__mark_reg_known(dst_reg, dst_reg->align.value &
+						  src_reg->align.value);
 			break;
 		}
-		/* Lose min_value when AND'ing negative numbers, ain't nobody
-		 * got time for that.  Otherwise we get our minimum from the
-		 * align, since that's inherently bitwise.
-		 * Our maximum is the minimum of the operands' maxima.
+		/* We get our minimum from the align, since that's inherently
+		 * bitwise.  Our maximum is the minimum of the operands' maxima.
 		 */
 		dst_reg->align = tn_and(dst_reg->align, src_reg->align);
-		if (min_val < 0 && dst_reg->min_value < 0)
-			dst_reg->min_value = BPF_REGISTER_MIN_RANGE;
-		else
-			dst_reg->min_value = dst_reg->align.value;
-		dst_reg->max_value = min(dst_reg->max_value, max_val);
+		dst_reg->umin_value = dst_reg->align.value;
+		dst_reg->umax_value = min(dst_reg->umax_value, umax_val);
+		if (dst_reg->smin_value < 0 || smin_val < 0) {
+			/* Lose signed bounds when ANDing negative numbers,
+			 * ain't nobody got time for that.
+			 */
+			dst_reg->smin_value = S64_MIN;
+			dst_reg->smax_value = S64_MAX;
+		} else {
+			/* ANDing two positives gives a positive, so safe to
+			 * cast result into s64.
+			 */
+			dst_reg->smin_value = dst_reg->umin_value;
+			dst_reg->smax_value = dst_reg->umax_value;
+		}
+		/* We may learn something more from the align */
+		__update_reg_bounds(dst_reg);
 		break;
 	case BPF_OR:
 		if (src_known && dst_known) {
-			u64 value = dst_reg->align.value | src_reg->align.value;
-
-			dst_reg->align = tn_const(value);
-			dst_reg->min_value = dst_reg->max_value = min_t(u64,
-					value, BPF_REGISTER_MAX_RANGE);
+			__mark_reg_known(dst_reg, dst_reg->align.value |
+						  src_reg->align.value);
 			break;
 		}
-		/* Lose ranges when OR'ing negative numbers, ain't nobody got
-		 * time for that.  Otherwise we get our maximum from the align,
-		 * and our minimum is the maximum of the operands' minima.
+		/* We get our maximum from the align, and our minimum is the
+		 * maximum of the operands' minima
 		 */
 		dst_reg->align = tn_or(dst_reg->align, src_reg->align);
-		if (min_val < 0 || dst_reg->min_value < 0) {
-			dst_reg->min_value = BPF_REGISTER_MIN_RANGE;
-			dst_reg->max_value = BPF_REGISTER_MAX_RANGE;
+		dst_reg->umin_value = max(dst_reg->umin_value, umin_val);
+		dst_reg->umax_value = dst_reg->align.value | dst_reg->align.mask;
+		if (dst_reg->smin_value < 0 || smin_val < 0) {
+			/* Lose signed bounds when ORing negative numbers,
+			 * ain't nobody got time for that.
+			 */
+			dst_reg->smin_value = S64_MIN;
+			dst_reg->smax_value = S64_MAX;
 		} else {
-			dst_reg->min_value = max(dst_reg->min_value, min_val);
-			dst_reg->max_value = dst_reg->align.value | dst_reg->align.mask;
+			/* ORing two positives gives a positive, so safe to
+			 * cast result into s64.
+			 */
+			dst_reg->smin_value = dst_reg->umin_value;
+			dst_reg->smax_value = dst_reg->umax_value;
 		}
+		/* We may learn something more from the align */
+		__update_reg_bounds(dst_reg);
 		break;
 	case BPF_LSH:
-		if (min_val < 0) {
-			/* LSH by a negative number is undefined */
+		if (umax_val > 63) {
+			/* Shifts greater than 63 are undefined.  This includes
+			 * shifts by a negative number.
+			 */
 			mark_reg_unknown(regs, insn->dst_reg);
 			break;
 		}
-		/* Gotta have special overflow logic here, if we're shifting
-		 * more than MAX_RANGE then just assume we have an invalid
-		 * range.
+		/* We lose all sign bit information (except what we can pick
+		 * up from align)
 		 */
-		if (min_val > ilog2(BPF_REGISTER_MAX_RANGE)) {
-			dst_reg->min_value = BPF_REGISTER_MIN_RANGE;
-			dst_reg->align = tn_unknown;
+		dst_reg->smin_value = S64_MIN;
+		dst_reg->smax_value = S64_MAX;
+		/* If we might shift our top bit out, then we know nothing */
+		if (dst_reg->umax_value > 1ULL << (63 - umax_val)) {
+			dst_reg->umin_value = 0;
+			dst_reg->umax_value = U64_MAX;
 		} else {
-			if (dst_reg->min_value != BPF_REGISTER_MIN_RANGE)
-				dst_reg->min_value <<= min_val;
-			if (src_known)
-				dst_reg->align = tn_sl(dst_reg->align, min_val);
-			else
-				dst_reg->align = tn_sl(tn_unknown, min_val);
+			dst_reg->umin_value <<= umin_val;
+			dst_reg->umax_value <<= umax_val;
 		}
-		if (max_val > ilog2(BPF_REGISTER_MAX_RANGE))
-			dst_reg->max_value = BPF_REGISTER_MAX_RANGE;
-		else if (dst_reg->max_value != BPF_REGISTER_MAX_RANGE)
-			dst_reg->max_value <<= max_val;
+		if (src_known)
+			dst_reg->align = tn_sl(dst_reg->align, umin_val);
+		else
+			dst_reg->align = tn_sl(tn_unknown, umin_val);
+		/* We may learn something more from the align */
+		__update_reg_bounds(dst_reg);
 		break;
 	case BPF_RSH:
-		if (min_val < 0) {
-			/* RSH by a negative number is undefined */
+		if (umax_val > 63) {
+			/* Shifts greater than 63 are undefined.  This includes
+			 * shifts by a negative number.
+			 */
 			mark_reg_unknown(regs, insn->dst_reg);
 			break;
 		}
 		/* BPF_RSH is an unsigned shift, so make the appropriate casts */
-		if (dst_reg->min_value < 0) {
-			if (min_val)
+		if (dst_reg->smin_value < 0) {
+			if (umin_val) {
 				/* Sign bit will be cleared */
-				dst_reg->min_value = 0;
+				dst_reg->smin_value = 0;
+			} else {
+				/* Lost sign bit information */
+				dst_reg->smin_value = S64_MIN;
+				dst_reg->smax_value = S64_MAX;
+			}
 		} else {
-			dst_reg->min_value =
-				(u64)(dst_reg->min_value) >> min_val;
+			dst_reg->smin_value =
+				(u64)(dst_reg->smin_value) >> umax_val;
 		}
 		if (src_known)
-			dst_reg->align = tn_sr(dst_reg->align, min_val);
+			dst_reg->align = tn_sr(dst_reg->align, umin_val);
 		else
-			dst_reg->align = tn_sr(tn_unknown, min_val);
-		if (dst_reg->max_value == BPF_REGISTER_MAX_RANGE)
-			dst_reg->max_value = ~0;
-		dst_reg->max_value >>= max_val;
+			dst_reg->align = tn_sr(tn_unknown, umin_val);
+		dst_reg->umin_value >>= umax_val;
+		dst_reg->umax_value >>= umin_val;
+		/* We may learn something more from the align */
+		__update_reg_bounds(dst_reg);
 		break;
 	default:
 		mark_reg_unknown(regs, insn->dst_reg);
 		break;
 	}
 
-	check_reg_overflow(dst_reg);
+	__reg_deduce_bounds(dst_reg);
 	return 0;
 }
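The new BPF_MUL case multiplies bounds only when both operands are known to be non-negative and to fit in 32 bits. Under those conditions the u64 product cannot wrap, because (2^32 - 1)^2 < 2^64, but it can still exceed S64_MAX. The sketch below models just that bounds logic. The demo_* names are hypothetical, and the tnum refinement the patch applies afterwards is left out.

```c
#include <stdint.h>

struct demo_bounds {
	int64_t smin_value, smax_value;
	uint64_t umin_value, umax_value;
};

static void demo_mark_unbounded(struct demo_bounds *r)
{
	r->smin_value = INT64_MIN;
	r->smax_value = INT64_MAX;
	r->umin_value = 0;
	r->umax_value = UINT64_MAX;
}

/* Bounds for dst *= src, following the BPF_MUL case of this patch. */
static void demo_bounds_mul(struct demo_bounds *dst, const struct demo_bounds *src)
{
	if (src->smin_value < 0 || dst->smin_value < 0 ||
	    src->umax_value > UINT32_MAX || dst->umax_value > UINT32_MAX) {
		/* The patch then recovers what it can from the tnum; omitted here. */
		demo_mark_unbounded(dst);
		return;
	}
	/* Both factors are below 2^32, so these products fit in 64 bits. */
	dst->umin_value *= src->umin_value;
	dst->umax_value *= src->umax_value;
	if (dst->umax_value > INT64_MAX) {
		dst->smin_value = INT64_MIN;  /* too big for the signed view */
		dst->smax_value = INT64_MAX;
	} else {
		dst->smin_value = (int64_t)dst->umin_value;
		dst->smax_value = (int64_t)dst->umax_value;
	}
}
```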
 
@@ -1917,14 +2062,11 @@ static int adjust_reg_min_max_vals(struct bpf_verifier_env *env,
 	int rc;
 
 	dst_reg = &regs[insn->dst_reg];
-	check_reg_overflow(dst_reg);
 	src_reg = NULL;
 	if (dst_reg->type != SCALAR_VALUE)
 		ptr_reg = dst_reg;
 	if (BPF_SRC(insn->code) == BPF_X) {
 		src_reg = &regs[insn->src_reg];
-		check_reg_overflow(src_reg);
-
 		if (src_reg->type != SCALAR_VALUE) {
 			if (dst_reg->type != SCALAR_VALUE) {
 				/* Combining two pointers by any ALU op yields
@@ -1972,8 +2114,10 @@ static int adjust_reg_min_max_vals(struct bpf_verifier_env *env,
 		 */
 		off_reg.type = SCALAR_VALUE;
 		off_reg.align = tn_const(insn->imm);
-		off_reg.min_value = insn->imm;
-		off_reg.max_value = insn->imm;
+		off_reg.smin_value = (s64)insn->imm;
+		off_reg.smax_value = (s64)insn->imm;
+		off_reg.umin_value = insn->imm;
+		off_reg.umax_value = insn->imm;
 		src_reg = &off_reg;
 		if (ptr_reg) { /* pointer += K */
 			rc = adjust_ptr_min_max_vals(env, insn,
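insn->imm is a 32-bit signed field in struct bpf_insn. Assigning it to the u64 umin_value/umax_value fields therefore sign-extends it, so an immediate of -1 has an unsigned view of U64_MAX, not 0xffffffff. The sketch below spells out that conversion. demo_bounds_from_imm() is a hypothetical helper.

```c
#include <stdint.h>

struct demo_bounds {
	int64_t smin_value, smax_value;
	uint64_t umin_value, umax_value;
};

/* Seed both views from a 32-bit immediate, as the hunk above does for off_reg. */
static struct demo_bounds demo_bounds_from_imm(int32_t imm)
{
	struct demo_bounds r;

	r.smin_value = r.smax_value = (int64_t)imm;              /* sign-extend */
	r.umin_value = r.umax_value = (uint64_t)(int64_t)imm;    /* -1 -> UINT64_MAX */
	return r;
}
```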
@@ -2077,20 +2221,16 @@ static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)
 					return -EACCES;
 				}
 				mark_reg_unknown(regs, insn->dst_reg);
-				/* high 32 bits are known zero.  But this is
-				 * still out of range for max_value, so leave
-				 * that.
-				 */
+				/* high 32 bits are known zero. */
 				regs[insn->dst_reg].align.mask &= (u32)-1;
+				__update_reg_bounds(&regs[insn->dst_reg]);
 			}
 		} else {
 			/* case: R = imm
 			 * remember the value we stored into this reg
 			 */
 			regs[insn->dst_reg].type = SCALAR_VALUE;
-			regs[insn->dst_reg].align = tn_const(insn->imm);
-			regs[insn->dst_reg].max_value = insn->imm;
-			regs[insn->dst_reg].min_value = insn->imm;
+			__mark_reg_known(regs + insn->dst_reg, insn->imm);
 		}
 
 	} else if (opcode > BPF_END) {
@@ -2215,60 +2355,35 @@ static void reg_set_min_max(struct bpf_reg_state *true_reg,
 		/* If this is false then we know nothing Jon Snow, but if it is
 		 * true then we know for sure.
 		 */
-		true_reg->max_value = true_reg->min_value = val;
-		true_reg->align = tn_const(val);
+		__mark_reg_known(true_reg, val);
 		break;
 	case BPF_JNE:
 		/* If this is true we know nothing Jon Snow, but if it is false
 		 * we know the value for sure;
 		 */
-		false_reg->max_value = false_reg->min_value = val;
-		false_reg->align = tn_const(val);
+		__mark_reg_known(false_reg, val);
 		break;
 	case BPF_JGT:
-		/* Unsigned comparison, can only tell us about max_value (since
-		 * min_value is signed), unless we learn sign bit.
-		 */
-		false_reg->max_value = val;
-		/* If we're not unsigned-greater-than a positive value, then
-		 * we can't be negative.
-		 */
-		if ((s64)val >= 0 && false_reg->min_value < 0)
-			false_reg->min_value = 0;
+		false_reg->umax_value = min(false_reg->umax_value, val);
+		true_reg->umin_value = max(true_reg->umin_value, val + 1);
 		break;
 	case BPF_JSGT:
-		/* Signed comparison, can only tell us about min_value (since
-		 * max_value is unsigned), unless we already know sign bit.
-		 */
-		true_reg->min_value = val + 1;
-		/* If we're not signed-greater than val, and we're not negative,
-		 * then we can't be unsigned-greater than val either.
-		 */
-		if (false_reg->min_value >= 0)
-			false_reg->max_value = val;
+		false_reg->smax_value = min_t(s64, false_reg->smax_value, val);
+		true_reg->smin_value = max_t(s64, true_reg->smin_value, val + 1);
 		break;
 	case BPF_JGE:
-		false_reg->max_value = val - 1;
-		/* If we're not unsigned-ge a positive value, then we can't be
-		 * negative.
-		 */
-		if ((s64)val >= 0 && false_reg->min_value < 0)
-			false_reg->min_value = 0;
+		false_reg->umax_value = min(false_reg->umax_value, val - 1);
+		true_reg->umin_value = max(true_reg->umin_value, val);
 		break;
 	case BPF_JSGE:
-		true_reg->min_value = val;
-		/* If we're not signed-ge val, and we're not negative, then we
-		 * can't be unsigned-ge val either.
-		 */
-		if (false_reg->min_value >= 0)
-			false_reg->max_value = val - 1;
+		false_reg->smax_value = min_t(s64, false_reg->smax_value, val - 1);
+		true_reg->smin_value = max_t(s64, true_reg->smin_value, val);
 		break;
 	default:
 		break;
 	}
-
-	check_reg_overflow(false_reg);
-	check_reg_overflow(true_reg);
+	__reg_deduce_bounds(false_reg);
+	__reg_deduce_bounds(true_reg);
 }
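After this change, a conditional jump only tightens the view it actually tests: BPF_JGT touches the unsigned bounds and BPF_JSGT touches the signed ones. __reg_deduce_bounds() then carries the information across to the other view. The sketch below models the two "greater than" cases. The names are hypothetical. In the signed case, the val + 1 step is guarded against overflow. The kernel instead computes val + 1 in u64 and casts the result.

```c
#include <stdint.h>

struct demo_bounds {
	int64_t smin_value, smax_value;
	uint64_t umin_value, umax_value;
};

/* "if (reg > val)", unsigned, as in the BPF_JGT case. */
static void demo_refine_jgt(struct demo_bounds *true_reg,
			    struct demo_bounds *false_reg, uint64_t val)
{
	if (false_reg->umax_value > val)
		false_reg->umax_value = val;     /* not taken: reg <= val */
	if (true_reg->umin_value < val + 1)
		true_reg->umin_value = val + 1;  /* taken: reg >= val + 1 (no-op if val + 1 wraps) */
}

/* "if (reg s> val)", signed, as in the BPF_JSGT case. */
static void demo_refine_jsgt(struct demo_bounds *true_reg,
			     struct demo_bounds *false_reg, int64_t val)
{
	if (false_reg->smax_value > val)
		false_reg->smax_value = val;     /* not taken: reg s<= val */
	if (val < INT64_MAX && true_reg->smin_value < val + 1)
		true_reg->smin_value = val + 1;  /* taken: reg s>= val + 1 */
}
```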
 
 /* Same as above, but for the case that dst_reg holds a constant and src_reg is
@@ -2283,74 +2398,58 @@ static void reg_set_min_max_inv(struct bpf_reg_state *true_reg,
 		/* If this is false then we know nothing Jon Snow, but if it is
 		 * true then we know for sure.
 		 */
-		true_reg->max_value = true_reg->min_value = val;
-		true_reg->align = tn_const(val);
+		__mark_reg_known(true_reg, val);
 		break;
 	case BPF_JNE:
 		/* If this is true we know nothing Jon Snow, but if it is false
 		 * we know the value for sure;
 		 */
-		false_reg->max_value = false_reg->min_value = val;
-		false_reg->align = tn_const(val);
+		__mark_reg_known(false_reg, val);
 		break;
 	case BPF_JGT:
-		/* Unsigned comparison, can only tell us about max_value (since
-		 * min_value is signed), unless we learn sign bit.
-		 */
-		true_reg->max_value = val - 1;
-		/* If a positive value is unsigned-greater-than us, then we
-		 * can't be negative.
-		 */
-		if ((s64)val >= 0 && true_reg->min_value < 0)
-			true_reg->min_value = 0;
+		true_reg->umax_value = min(true_reg->umax_value, val - 1);
+		false_reg->umin_value = max(false_reg->umin_value, val);
 		break;
 	case BPF_JSGT:
-		/* Signed comparison, can only tell us about min_value (since
-		 * max_value is unsigned), unless we already know sign bit.
-		 */
-		false_reg->min_value = val;
-		/* If val is signed-greater-than us, and we're not negative,
-		 * then val must be unsigned-greater-than us.
-		 */
-		if (true_reg->min_value >= 0)
-			true_reg->max_value = val - 1;
+		true_reg->smax_value = min_t(s64, true_reg->smax_value, val - 1);
+		false_reg->smin_value = max_t(s64, false_reg->smin_value, val);
 		break;
 	case BPF_JGE:
-		true_reg->max_value = val;
-		/* If a positive value is unsigned-ge us, then we can't be
-		 * negative.
-		 */
-		if ((s64)val >= 0 && true_reg->min_value < 0)
-			true_reg->min_value = 0;
+		true_reg->umax_value = min(true_reg->umax_value, val);
+		false_reg->umin_value = max(false_reg->umin_value, val + 1);
 		break;
 	case BPF_JSGE:
-		false_reg->min_value = val + 1;
-		/* If val is signed-ge us, and we're not negative, then val
-		 * must be unsigned-ge us.
-		 */
-		if (true_reg->min_value >= 0)
-			true_reg->max_value = val;
+		true_reg->smax_value = min_t(s64, true_reg->smax_value, val);
+		false_reg->smin_value = max_t(s64, false_reg->smin_value, val + 1);
 		break;
 	default:
 		break;
 	}
 
-	check_reg_overflow(false_reg);
-	check_reg_overflow(true_reg);
+	__reg_deduce_bounds(false_reg);
+	__reg_deduce_bounds(true_reg);
 }
 
 /* Regs are known to be equal, so intersect their min/max/align */
 static void __reg_combine_min_max(struct bpf_reg_state *src_reg,
 				  struct bpf_reg_state *dst_reg)
 {
-	src_reg->min_value = dst_reg->min_value = max(src_reg->min_value,
-						      dst_reg->min_value);
-	src_reg->max_value = dst_reg->max_value = min(src_reg->max_value,
-						      dst_reg->max_value);
+	src_reg->umin_value = dst_reg->umin_value = max(src_reg->umin_value,
+							dst_reg->umin_value);
+	src_reg->umax_value = dst_reg->umax_value = min(src_reg->umax_value,
+							dst_reg->umax_value);
+	src_reg->smin_value = dst_reg->smin_value = max(src_reg->smin_value,
+							dst_reg->smin_value);
+	src_reg->smax_value = dst_reg->smax_value = min(src_reg->smax_value,
+							dst_reg->smax_value);
 	src_reg->align = dst_reg->align = tn_intersect(src_reg->align,
 						       dst_reg->align);
-	check_reg_overflow(src_reg);
-	check_reg_overflow(dst_reg);
+	/* We might have learned new bounds from the align. */
+	__update_reg_bounds(src_reg);
+	__update_reg_bounds(dst_reg);
+	/* We might have learned something about the sign bit. */
+	__reg_deduce_bounds(src_reg);
+	__reg_deduce_bounds(dst_reg);
 }
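When two registers are known to be equal, __reg_combine_min_max() narrows both of them to the intersection of their ranges, separately in each view. The sketch below shows only that intersection. demo_combine() is hypothetical, and the tnum intersection and re-deduction steps are left out.

```c
#include <stdint.h>

struct demo_bounds {
	int64_t smin_value, smax_value;
	uint64_t umin_value, umax_value;
};

/* Both registers hold the same value, so each keeps the tighter bound. */
static void demo_combine(struct demo_bounds *a, struct demo_bounds *b)
{
	a->umin_value = b->umin_value = a->umin_value > b->umin_value ? a->umin_value : b->umin_value;
	a->umax_value = b->umax_value = a->umax_value < b->umax_value ? a->umax_value : b->umax_value;
	a->smin_value = b->smin_value = a->smin_value > b->smin_value ? a->smin_value : b->smin_value;
	a->smax_value = b->smax_value = a->smax_value < b->smax_value ? a->smax_value : b->smax_value;
}
```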
 
 static void reg_combine_min_max(struct bpf_reg_state *true_src,
@@ -2365,6 +2464,7 @@ static void reg_combine_min_max(struct bpf_reg_state *true_src,
 		break;
 	case BPF_JNE:
 		__reg_combine_min_max(false_src, false_dst);
+		break;
 	}
 }
 
@@ -2378,11 +2478,11 @@ static void mark_map_reg(struct bpf_reg_state *regs, u32 regno, u32 id,
 		 * have been known-zero, because we don't allow pointer
 		 * arithmetic on pointers that might be NULL.
 		 */
-		if (WARN_ON_ONCE(reg->min_value || reg->max_value ||
+		if (WARN_ON_ONCE(reg->smin_value || reg->smax_value ||
 				 reg->align.value || reg->align.mask ||
 				 reg->off)) {
-			reg->min_value = reg->max_value = reg->off = 0;
-			reg->align = tn_const(0);
+			__mark_reg_known_zero(reg);
+			reg->off = 0;
 		}
 		if (is_null) {
 			reg->type = SCALAR_VALUE;
@@ -2575,10 +2675,7 @@ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
 		u64 imm = ((u64)(insn + 1)->imm << 32) | (u32)insn->imm;
 
 		regs[insn->dst_reg].type = SCALAR_VALUE;
-		regs[insn->dst_reg].min_value = imm;
-		regs[insn->dst_reg].max_value = imm;
-		check_reg_overflow(&regs[insn->dst_reg]);
-		regs[insn->dst_reg].align = tn_const(imm);
+		__mark_reg_known(&regs[insn->dst_reg], imm);
 		return 0;
 	}
 
@@ -2866,8 +2963,10 @@ static int check_cfg(struct bpf_verifier_env *env)
 static bool range_within(struct bpf_reg_state *old,
 			 struct bpf_reg_state *cur)
 {
-	return old->min_value <= cur->min_value &&
-	       old->max_value >= cur->max_value;
+	return old->umin_value <= cur->umin_value &&
+	       old->umax_value >= cur->umax_value &&
+	       old->smin_value <= cur->smin_value &&
+	       old->smax_value >= cur->smax_value;
 }
 
 /* Returns true if (rold safe implies rcur safe) */
@@ -2894,8 +2993,10 @@ static bool regsafe(struct bpf_reg_state *rold,
 			 * equal, because we can't know anything about the
 			 * scalar value of the pointer in the new value.
 			 */
-			return rold->min_value == BPF_REGISTER_MIN_RANGE &&
-			       rold->max_value == BPF_REGISTER_MAX_RANGE &&
+			return rold->umin_value == 0 &&
+			       rold->umax_value == U64_MAX &&
+			       rold->smin_value == S64_MIN &&
+			       rold->smax_value == S64_MAX &&
 			       !~rold->align.mask;
 		}
 	case PTR_TO_MAP_VALUE:

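The bounds handling in this patch can be modelled in a few lines of ordinary C. The sketch below is an illustration under stated assumptions, not the verifier's code. `struct bounds`, `make_bounds()`, `combine_equal()` and `deduce_bounds()` are hypothetical. The tnum (`align`) intersection and __update_reg_bounds() are left out. The sign-bit rules in deduce_bounds() are one plausible reading of the "We might have learned something about the sign bit" comment in __reg_combine_min_max().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the bound fields of struct bpf_reg_state;
 * the tnum ('align') field is deliberately left out. */
struct bounds {
	uint64_t umin, umax;	/* unsigned view of the 64-bit value */
	int64_t smin, smax;	/* signed view of the same value */
};

static uint64_t u_max(uint64_t a, uint64_t b) { return a > b ? a : b; }
static uint64_t u_min(uint64_t a, uint64_t b) { return a < b ? a : b; }
static int64_t s_max(int64_t a, int64_t b) { return a > b ? a : b; }
static int64_t s_min(int64_t a, int64_t b) { return a < b ? a : b; }

static struct bounds make_bounds(uint64_t umin, uint64_t umax,
				 int64_t smin, int64_t smax)
{
	struct bounds b = { umin, umax, smin, smax };

	assert(umin <= umax && smin <= smax);	/* non-empty ranges only */
	return b;
}

/* Share information between the signed and the unsigned range. */
static void deduce_bounds(struct bounds *b)
{
	if (b->smin >= 0 || b->smax < 0) {
		/* The signed range does not cross zero, so signed and
		 * unsigned order agree on it: merge the two ranges. */
		b->umin = u_max(b->umin, (uint64_t)b->smin);
		b->umax = u_min(b->umax, (uint64_t)b->smax);
		b->smin = (int64_t)b->umin;
		b->smax = (int64_t)b->umax;
	} else if ((int64_t)b->umax >= 0) {
		/* umax < 2^63: the sign bit is known to be clear. */
		b->smin = (int64_t)b->umin;
		b->umax = u_min(b->umax, (uint64_t)b->smax);
		b->smax = (int64_t)b->umax;
	} else if ((int64_t)b->umin < 0) {
		/* umin >= 2^63: the sign bit is known to be set. */
		b->umin = u_max(b->umin, (uint64_t)b->smin);
		b->smin = (int64_t)b->umin;
		b->smax = (int64_t)b->umax;
	}
	/* Otherwise neither view pins the sign bit; nothing to learn. */
}

/* Both registers hold the same value (e.g. the taken branch of a JEQ),
 * so each range can be narrowed to the intersection of the two. */
static void combine_equal(struct bounds *a, struct bounds *b)
{
	a->umin = b->umin = u_max(a->umin, b->umin);
	a->umax = b->umax = u_min(a->umax, b->umax);
	a->smin = b->smin = s_max(a->smin, b->smin);
	a->smax = b->smax = s_min(a->smax, b->smax);
	deduce_bounds(a);
	deduce_bounds(b);
}
```

Start with r1 = u:[0, 100] s:[S64_MIN, S64_MAX] and r2 = u:[10, U64_MAX] s:[-5, 50]. combine_equal() leaves both at u:[10, 50] s:[10, 50]. The intersected signed range [-5, 50] still crosses zero, but the unsigned maximum of 100 proves the sign bit clear, so the signed minimum rises to 10 and the unsigned maximum falls to 50. By contrast, a value known only to satisfy x u> 0 and x s<= 48 keeps a negative signed minimum. This appears to be why several selftests in 5/5 replace BPF_JGE with BPF_JSGE before passing the value as a helper size.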

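After this patch, range_within() only lets the verifier prune the current path against an already-verified state when the old register's bounds contain the current ones in both the unsigned and the signed view. The minimal sketch below uses a hypothetical `struct bounds` rather than the kernel struct, and `bounds_unknown()` and `pruning_example()` are likewise illustrative names. It shows a case where checking only the unsigned range would prune wrongly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the bound fields of struct bpf_reg_state. */
struct bounds {
	uint64_t umin, umax;
	int64_t smin, smax;
};

/* Mirrors the four comparisons in the patched range_within(): every
 * value the current register may hold must also have been possible in
 * the old (already verified) state, in both views. */
static bool range_within(const struct bounds *old, const struct bounds *cur)
{
	return old->umin <= cur->umin && old->umax >= cur->umax &&
	       old->smin <= cur->smin && old->smax >= cur->smax;
}

/* Mirrors the bounds part of the regsafe() test for an old scalar about
 * which nothing is known (the tnum mask check is left out). */
static bool bounds_unknown(const struct bounds *b)
{
	return b->umin == 0 && b->umax == UINT64_MAX &&
	       b->smin == INT64_MIN && b->smax == INT64_MAX;
}

static void pruning_example(void)
{
	/* Old state: -4 s<= x s<= 4, so its unsigned range is the full
	 * [0, U64_MAX]. */
	struct bounds old = { 0, UINT64_MAX, -4, 4 };
	/* Current state: -8 s<= x s<= -1. */
	struct bounds cur = { UINT64_MAX - 7, UINT64_MAX, -8, -1 };

	/* The unsigned ranges nest, but x == -8 was never verified. */
	assert(old.umin <= cur.umin && old.umax >= cur.umax);
	assert(!range_within(&old, &cur));
}
```

Requiring containment in both views is conservative. It may decline to prune states that are in fact equivalent. As far as the bounds are concerned, though, it cannot accept a state that reaches a value the old state never covered.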
* [RFC PATCH net-next 5/5] selftests/bpf: change test_verifier expectations
@ 2017-06-07 15:00   ` Edward Cree via iovisor-dev
  0 siblings, 0 replies; 45+ messages in thread
From: Edward Cree @ 2017-06-07 15:00 UTC (permalink / raw)
  To: davem, Alexei Starovoitov, Alexei Starovoitov, Daniel Borkmann
  Cc: netdev, iovisor-dev, LKML

Some of the verifier's error messages have changed, and some constructs
 that previously couldn't be verified are now accepted.

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 tools/testing/selftests/bpf/test_verifier.c | 226 ++++++++++++++--------------
 1 file changed, 116 insertions(+), 110 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index 5074cfa..f5281df 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -421,7 +421,7 @@ static struct bpf_test tests[] = {
 			BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0, 0),
 			BPF_EXIT_INSN(),
 		},
-		.errstr_unpriv = "R1 pointer arithmetic",
+		.errstr_unpriv = "R1 subtraction from stack pointer",
 		.result_unpriv = REJECT,
 		.errstr = "R1 invalid mem access",
 		.result = REJECT,
@@ -603,8 +603,9 @@ static struct bpf_test tests[] = {
 			BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_2, -4),
 			BPF_EXIT_INSN(),
 		},
-		.errstr = "misaligned access",
+		.errstr = "misaligned stack access",
 		.result = REJECT,
+		.flags = F_LOAD_WITH_STRICT_ALIGNMENT,
 	},
 	{
 		"invalid map_fd for function call",
@@ -650,8 +651,9 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map1 = { 3 },
-		.errstr = "misaligned access",
+		.errstr = "misaligned value access",
 		.result = REJECT,
+		.flags = F_LOAD_WITH_STRICT_ALIGNMENT,
 	},
 	{
 		"sometimes access memory with incorrect alignment",
@@ -672,6 +674,7 @@ static struct bpf_test tests[] = {
 		.errstr = "R0 invalid mem access",
 		.errstr_unpriv = "R0 leaks addr",
 		.result = REJECT,
+		.flags = F_LOAD_WITH_STRICT_ALIGNMENT,
 	},
 	{
 		"jump test 1",
@@ -1184,8 +1187,9 @@ static struct bpf_test tests[] = {
 				    offsetof(struct __sk_buff, cb[0]) + 1),
 			BPF_EXIT_INSN(),
 		},
-		.errstr = "misaligned access",
+		.errstr = "misaligned context access",
 		.result = REJECT,
+		.flags = F_LOAD_WITH_STRICT_ALIGNMENT,
 	},
 	{
 		"check cb access: half, oob 1",
@@ -1279,8 +1283,9 @@ static struct bpf_test tests[] = {
 				    offsetof(struct __sk_buff, cb[0]) + 2),
 			BPF_EXIT_INSN(),
 		},
-		.errstr = "misaligned access",
+		.errstr = "misaligned context access",
 		.result = REJECT,
+		.flags = F_LOAD_WITH_STRICT_ALIGNMENT,
 	},
 	{
 		"check cb access: word, unaligned 2",
@@ -1290,8 +1295,9 @@ static struct bpf_test tests[] = {
 				    offsetof(struct __sk_buff, cb[4]) + 1),
 			BPF_EXIT_INSN(),
 		},
-		.errstr = "misaligned access",
+		.errstr = "misaligned context access",
 		.result = REJECT,
+		.flags = F_LOAD_WITH_STRICT_ALIGNMENT,
 	},
 	{
 		"check cb access: word, unaligned 3",
@@ -1301,8 +1307,9 @@ static struct bpf_test tests[] = {
 				    offsetof(struct __sk_buff, cb[4]) + 2),
 			BPF_EXIT_INSN(),
 		},
-		.errstr = "misaligned access",
+		.errstr = "misaligned context access",
 		.result = REJECT,
+		.flags = F_LOAD_WITH_STRICT_ALIGNMENT,
 	},
 	{
 		"check cb access: word, unaligned 4",
@@ -1312,8 +1319,9 @@ static struct bpf_test tests[] = {
 				    offsetof(struct __sk_buff, cb[4]) + 3),
 			BPF_EXIT_INSN(),
 		},
-		.errstr = "misaligned access",
+		.errstr = "misaligned context access",
 		.result = REJECT,
+		.flags = F_LOAD_WITH_STRICT_ALIGNMENT,
 	},
 	{
 		"check cb access: double",
@@ -1339,8 +1347,9 @@ static struct bpf_test tests[] = {
 				    offsetof(struct __sk_buff, cb[1])),
 			BPF_EXIT_INSN(),
 		},
-		.errstr = "misaligned access",
+		.errstr = "misaligned context access",
 		.result = REJECT,
+		.flags = F_LOAD_WITH_STRICT_ALIGNMENT,
 	},
 	{
 		"check cb access: double, unaligned 2",
@@ -1350,8 +1359,9 @@ static struct bpf_test tests[] = {
 				    offsetof(struct __sk_buff, cb[3])),
 			BPF_EXIT_INSN(),
 		},
-		.errstr = "misaligned access",
+		.errstr = "misaligned context access",
 		.result = REJECT,
+		.flags = F_LOAD_WITH_STRICT_ALIGNMENT,
 	},
 	{
 		"check cb access: double, oob 1",
@@ -1505,7 +1515,8 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.result = REJECT,
-		.errstr = "misaligned access off -6 size 8",
+		.errstr = "misaligned stack access off (0x0; 0x0)+-8+2 size 8",
+		.flags = F_LOAD_WITH_STRICT_ALIGNMENT,
 	},
 	{
 		"PTR_TO_STACK store/load - bad alignment on reg",
@@ -1517,7 +1528,8 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.result = REJECT,
-		.errstr = "misaligned access off -2 size 8",
+		.errstr = "misaligned stack access off (0x0; 0x0)+-10+8 size 8",
+		.flags = F_LOAD_WITH_STRICT_ALIGNMENT,
 	},
 	{
 		"PTR_TO_STACK store/load - out of bounds low",
@@ -1561,8 +1573,6 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.result = ACCEPT,
-		.result_unpriv = REJECT,
-		.errstr_unpriv = "R1 pointer arithmetic",
 	},
 	{
 		"unpriv: add pointer to pointer",
@@ -1573,7 +1583,7 @@ static struct bpf_test tests[] = {
 		},
 		.result = ACCEPT,
 		.result_unpriv = REJECT,
-		.errstr_unpriv = "R1 pointer arithmetic",
+		.errstr_unpriv = "R1 pointer += pointer",
 	},
 	{
 		"unpriv: neg pointer",
@@ -1914,10 +1924,7 @@ static struct bpf_test tests[] = {
 			BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0, -8),
 			BPF_EXIT_INSN(),
 		},
-		.errstr_unpriv = "pointer arithmetic prohibited",
-		.result_unpriv = REJECT,
-		.errstr = "R1 invalid mem access",
-		.result = REJECT,
+		.result = ACCEPT,
 	},
 	{
 		"unpriv: cmp of stack pointer",
@@ -1981,7 +1988,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.result = REJECT,
-		.errstr = "invalid stack type R3",
+		.errstr = "R4 min value is negative",
 		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 	},
 	{
@@ -1998,7 +2005,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.result = REJECT,
-		.errstr = "invalid stack type R3",
+		.errstr = "R4 min value is negative",
 		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 	},
 	{
@@ -2200,7 +2207,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.result = REJECT,
-		.errstr = "invalid stack type R3 off=-1 access_size=-1",
+		.errstr = "R4 min value is negative",
 		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 	},
 	{
@@ -2217,7 +2224,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.result = REJECT,
-		.errstr = "invalid stack type R3 off=-1 access_size=2147483647",
+		.errstr = "R4 unbounded memory access, use 'var &= const' or 'if (var < const)'",
 		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 	},
 	{
@@ -2234,7 +2241,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.result = REJECT,
-		.errstr = "invalid stack type R3 off=-512 access_size=2147483647",
+		.errstr = "R4 unbounded memory access, use 'var &= const' or 'if (var < const)'",
 		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 	},
 	{
@@ -2634,7 +2641,7 @@ static struct bpf_test tests[] = {
 			BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 1),
 			BPF_JMP_A(-6),
 		},
-		.errstr = "misaligned packet access off 2+15+-4 size 4",
+		.errstr = "misaligned packet access off 2+(0x0; 0x0)+15+-4 size 4",
 		.result = REJECT,
 		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 		.flags = F_LOAD_WITH_STRICT_ALIGNMENT,
@@ -2929,7 +2936,7 @@ static struct bpf_test tests[] = {
 		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 	},
 	{
-		"helper access to packet: test14, cls helper fail sub",
+		"helper access to packet: test14, cls helper ok sub",
 		.insns = {
 			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
 				    offsetof(struct __sk_buff, data)),
@@ -2949,12 +2956,36 @@ static struct bpf_test tests[] = {
 			BPF_MOV64_IMM(BPF_REG_0, 0),
 			BPF_EXIT_INSN(),
 		},
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	},
+	{
+		"helper access to packet: test15, cls helper fail sub",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
+				    offsetof(struct __sk_buff, data)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_1,
+				    offsetof(struct __sk_buff, data_end)),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_6, 1),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 7),
+			BPF_JMP_REG(BPF_JGT, BPF_REG_1, BPF_REG_7, 6),
+			BPF_ALU64_IMM(BPF_SUB, BPF_REG_1, 12),
+			BPF_MOV64_IMM(BPF_REG_2, 4),
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+			BPF_MOV64_IMM(BPF_REG_4, 0),
+			BPF_MOV64_IMM(BPF_REG_5, 0),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+				     BPF_FUNC_csum_diff),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
 		.result = REJECT,
-		.errstr = "type=inv expected=fp",
+		.errstr = "invalid access to packet",
 		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 	},
 	{
-		"helper access to packet: test15, cls helper fail range 1",
+		"helper access to packet: test16, cls helper fail range 1",
 		.insns = {
 			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
 				    offsetof(struct __sk_buff, data)),
@@ -2979,7 +3010,7 @@ static struct bpf_test tests[] = {
 		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 	},
 	{
-		"helper access to packet: test16, cls helper fail range 2",
+		"helper access to packet: test17, cls helper fail range 2",
 		.insns = {
 			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
 				    offsetof(struct __sk_buff, data)),
@@ -3000,11 +3031,11 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.result = REJECT,
-		.errstr = "invalid access to packet",
+		.errstr = "R2 min value is negative",
 		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 	},
 	{
-		"helper access to packet: test17, cls helper fail range 3",
+		"helper access to packet: test18, cls helper fail range 3",
 		.insns = {
 			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
 				    offsetof(struct __sk_buff, data)),
@@ -3025,11 +3056,11 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.result = REJECT,
-		.errstr = "invalid access to packet",
+		.errstr = "R2 min value is negative",
 		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 	},
 	{
-		"helper access to packet: test18, cls helper fail range zero",
+		"helper access to packet: test19, cls helper fail range zero",
 		.insns = {
 			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
 				    offsetof(struct __sk_buff, data)),
@@ -3054,7 +3085,7 @@ static struct bpf_test tests[] = {
 		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 	},
 	{
-		"helper access to packet: test19, pkt end as input",
+		"helper access to packet: test20, pkt end as input",
 		.insns = {
 			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
 				    offsetof(struct __sk_buff, data)),
@@ -3079,7 +3110,7 @@ static struct bpf_test tests[] = {
 		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 	},
 	{
-		"helper access to packet: test20, wrong reg",
+		"helper access to packet: test21, wrong reg",
 		.insns = {
 			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
 				    offsetof(struct __sk_buff, data)),
@@ -3139,7 +3170,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
-		.errstr_unpriv = "R0 pointer arithmetic prohibited",
+		.errstr_unpriv = "R0 leaks addr",
 		.result_unpriv = REJECT,
 		.result = ACCEPT,
 		.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
@@ -3163,7 +3194,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
-		.errstr_unpriv = "R0 pointer arithmetic prohibited",
+		.errstr_unpriv = "R0 leaks addr",
 		.result_unpriv = REJECT,
 		.result = ACCEPT,
 		.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
@@ -3191,7 +3222,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
-		.errstr_unpriv = "R0 pointer arithmetic prohibited",
+		.errstr_unpriv = "R0 leaks addr",
 		.result_unpriv = REJECT,
 		.result = ACCEPT,
 		.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
@@ -3232,9 +3263,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
-		.errstr_unpriv = "R0 pointer arithmetic prohibited",
 		.errstr = "R0 min value is outside of the array range",
-		.result_unpriv = REJECT,
 		.result = REJECT,
 		.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
 	},
@@ -3256,9 +3285,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
-		.errstr_unpriv = "R0 pointer arithmetic prohibited",
-		.errstr = "R0 min value is negative, either use unsigned index or do a if (index >=0) check.",
-		.result_unpriv = REJECT,
+		.errstr = "R0 unbounded memory access, make sure to bounds check any array access into a map",
 		.result = REJECT,
 		.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
 	},
@@ -3272,7 +3299,7 @@ static struct bpf_test tests[] = {
 			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
 				     BPF_FUNC_map_lookup_elem),
 			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 7),
-			BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_0, 0),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0, 0),
 			BPF_MOV32_IMM(BPF_REG_2, MAX_ENTRIES),
 			BPF_JMP_REG(BPF_JSGT, BPF_REG_2, BPF_REG_1, 1),
 			BPF_MOV32_IMM(BPF_REG_1, 0),
@@ -3283,8 +3310,8 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
-		.errstr_unpriv = "R0 pointer arithmetic prohibited",
-		.errstr = "R0 min value is negative, either use unsigned index or do a if (index >=0) check.",
+		.errstr_unpriv = "R0 leaks addr",
+		.errstr = "R0 unbounded memory access",
 		.result_unpriv = REJECT,
 		.result = REJECT,
 		.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
@@ -3310,7 +3337,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
-		.errstr_unpriv = "R0 pointer arithmetic prohibited",
+		.errstr_unpriv = "R0 leaks addr",
 		.errstr = "invalid access to map value, value_size=48 off=44 size=8",
 		.result_unpriv = REJECT,
 		.result = REJECT,
@@ -3340,8 +3367,8 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3, 11 },
-		.errstr_unpriv = "R0 pointer arithmetic prohibited",
-		.errstr = "R0 min value is negative, either use unsigned index or do a if (index >=0) check.",
+		.errstr_unpriv = "R0 pointer += pointer",
+		.errstr = "R0 invalid mem access 'inv'",
 		.result_unpriv = REJECT,
 		.result = REJECT,
 		.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
@@ -3483,34 +3510,6 @@ static struct bpf_test tests[] = {
 		.prog_type = BPF_PROG_TYPE_SCHED_CLS
 	},
 	{
-		"multiple registers share map_lookup_elem bad reg type",
-		.insns = {
-			BPF_MOV64_IMM(BPF_REG_1, 10),
-			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_1, -8),
-			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
-			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
-			BPF_LD_MAP_FD(BPF_REG_1, 0),
-			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
-				     BPF_FUNC_map_lookup_elem),
-			BPF_MOV64_REG(BPF_REG_2, BPF_REG_0),
-			BPF_MOV64_REG(BPF_REG_3, BPF_REG_0),
-			BPF_MOV64_REG(BPF_REG_4, BPF_REG_0),
-			BPF_MOV64_REG(BPF_REG_5, BPF_REG_0),
-			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
-			BPF_MOV64_IMM(BPF_REG_1, 1),
-			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
-			BPF_MOV64_IMM(BPF_REG_1, 2),
-			BPF_JMP_IMM(BPF_JEQ, BPF_REG_3, 0, 1),
-			BPF_ST_MEM(BPF_DW, BPF_REG_3, 0, 0),
-			BPF_MOV64_IMM(BPF_REG_1, 3),
-			BPF_EXIT_INSN(),
-		},
-		.fixup_map1 = { 4 },
-		.result = REJECT,
-		.errstr = "R3 invalid mem access 'inv'",
-		.prog_type = BPF_PROG_TYPE_SCHED_CLS
-	},
-	{
 		"invalid map access from else condition",
 		.insns = {
 			BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
@@ -3528,9 +3527,9 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
-		.errstr = "R0 unbounded memory access, make sure to bounds check any array access into a map",
+		.errstr = "R0 unbounded memory access",
 		.result = REJECT,
-		.errstr_unpriv = "R0 pointer arithmetic prohibited",
+		.errstr_unpriv = "R0 leaks addr",
 		.result_unpriv = REJECT,
 		.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
 	},
@@ -3842,7 +3841,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
-		.errstr = "invalid access to map value, value_size=48 off=0 size=-8",
+		.errstr = "R2 min value is negative",
 		.result = REJECT,
 		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
 	},
@@ -3954,7 +3953,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
-		.errstr = "invalid access to map value, value_size=48 off=4 size=-8",
+		.errstr = "R2 min value is negative",
 		.result = REJECT,
 		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
 	},
@@ -3976,7 +3975,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
-		.errstr = "R1 min value is outside of the array range",
+		.errstr = "R2 min value is negative",
 		.result = REJECT,
 		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
 	},
@@ -4092,7 +4091,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
-		.errstr = "invalid access to map value, value_size=48 off=4 size=-8",
+		.errstr = "R2 min value is negative",
 		.result = REJECT,
 		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
 	},
@@ -4115,7 +4114,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
-		.errstr = "R1 min value is outside of the array range",
+		.errstr = "R2 min value is negative",
 		.result = REJECT,
 		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
 	},
@@ -4203,13 +4202,13 @@ static struct bpf_test tests[] = {
 			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
 			BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_0, 0),
 			BPF_ALU64_REG(BPF_ADD, BPF_REG_1, BPF_REG_3),
-			BPF_MOV64_IMM(BPF_REG_2, 0),
+			BPF_MOV64_IMM(BPF_REG_2, 1),
 			BPF_MOV64_IMM(BPF_REG_3, 0),
 			BPF_EMIT_CALL(BPF_FUNC_probe_read),
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
-		.errstr = "R1 min value is negative, either use unsigned index or do a if (index >=0) check",
+		.errstr = "R1 unbounded memory access",
 		.result = REJECT,
 		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
 	},
@@ -4329,7 +4328,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
-		.errstr_unpriv = "R0 pointer arithmetic prohibited",
+		.errstr_unpriv = "R0 leaks addr",
 		.result = ACCEPT,
 		.result_unpriv = REJECT,
 		.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
@@ -4357,7 +4356,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
-		.errstr_unpriv = "R0 pointer arithmetic prohibited",
+		.errstr_unpriv = "R0 leaks addr",
 		.result = ACCEPT,
 		.result_unpriv = REJECT,
 		.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
@@ -4376,7 +4375,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
-		.errstr_unpriv = "R0 pointer arithmetic prohibited",
+		.errstr_unpriv = "R0 bitwise operator &= on pointer",
 		.errstr = "invalid mem access 'inv'",
 		.result = REJECT,
 		.result_unpriv = REJECT,
@@ -4395,7 +4394,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
-		.errstr_unpriv = "R0 pointer arithmetic prohibited",
+		.errstr_unpriv = "R0 32-bit pointer arithmetic prohibited",
 		.errstr = "invalid mem access 'inv'",
 		.result = REJECT,
 		.result_unpriv = REJECT,
@@ -4414,7 +4413,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
-		.errstr_unpriv = "R0 pointer arithmetic prohibited",
+		.errstr_unpriv = "R0 pointer arithmetic with /= operator",
 		.errstr = "invalid mem access 'inv'",
 		.result = REJECT,
 		.result_unpriv = REJECT,
@@ -4457,10 +4456,8 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
-		.errstr_unpriv = "R0 invalid mem access 'inv'",
 		.errstr = "R0 invalid mem access 'inv'",
 		.result = REJECT,
-		.result_unpriv = REJECT,
 	},
 	{
 		"map element value is preserved across register spilling",
@@ -4482,7 +4479,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
-		.errstr_unpriv = "R0 pointer arithmetic prohibited",
+		.errstr_unpriv = "R0 leaks addr",
 		.result = ACCEPT,
 		.result_unpriv = REJECT,
 		.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
@@ -4664,7 +4661,8 @@ static struct bpf_test tests[] = {
 			BPF_MOV64_IMM(BPF_REG_0, 0),
 			BPF_EXIT_INSN(),
 		},
-		.errstr = "R2 unbounded memory access",
+		/* because max wasn't checked, signed min is negative */
+		.errstr = "R2 min value is negative, either use unsigned or 'var &= const'",
 		.result = REJECT,
 		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
 	},
@@ -4720,7 +4718,7 @@ static struct bpf_test tests[] = {
 			BPF_JMP_IMM(BPF_JSGT, BPF_REG_2,
 				sizeof(struct test_val), 4),
 			BPF_MOV64_IMM(BPF_REG_4, 0),
-			BPF_JMP_REG(BPF_JGE, BPF_REG_4, BPF_REG_2, 2),
+			BPF_JMP_REG(BPF_JSGE, BPF_REG_4, BPF_REG_2, 2),
 			BPF_MOV64_IMM(BPF_REG_3, 0),
 			BPF_EMIT_CALL(BPF_FUNC_probe_read),
 			BPF_MOV64_IMM(BPF_REG_0, 0),
@@ -4746,7 +4744,7 @@ static struct bpf_test tests[] = {
 			BPF_JMP_IMM(BPF_JSGT, BPF_REG_2,
 				sizeof(struct test_val) + 1, 4),
 			BPF_MOV64_IMM(BPF_REG_4, 0),
-			BPF_JMP_REG(BPF_JGE, BPF_REG_4, BPF_REG_2, 2),
+			BPF_JMP_REG(BPF_JSGE, BPF_REG_4, BPF_REG_2, 2),
 			BPF_MOV64_IMM(BPF_REG_3, 0),
 			BPF_EMIT_CALL(BPF_FUNC_probe_read),
 			BPF_MOV64_IMM(BPF_REG_0, 0),
@@ -4774,7 +4772,7 @@ static struct bpf_test tests[] = {
 			BPF_JMP_IMM(BPF_JSGT, BPF_REG_2,
 				sizeof(struct test_val) - 20, 4),
 			BPF_MOV64_IMM(BPF_REG_4, 0),
-			BPF_JMP_REG(BPF_JGE, BPF_REG_4, BPF_REG_2, 2),
+			BPF_JMP_REG(BPF_JSGE, BPF_REG_4, BPF_REG_2, 2),
 			BPF_MOV64_IMM(BPF_REG_3, 0),
 			BPF_EMIT_CALL(BPF_FUNC_probe_read),
 			BPF_MOV64_IMM(BPF_REG_0, 0),
@@ -4801,7 +4799,7 @@ static struct bpf_test tests[] = {
 			BPF_JMP_IMM(BPF_JSGT, BPF_REG_2,
 				sizeof(struct test_val) - 19, 4),
 			BPF_MOV64_IMM(BPF_REG_4, 0),
-			BPF_JMP_REG(BPF_JGE, BPF_REG_4, BPF_REG_2, 2),
+			BPF_JMP_REG(BPF_JSGE, BPF_REG_4, BPF_REG_2, 2),
 			BPF_MOV64_IMM(BPF_REG_3, 0),
 			BPF_EMIT_CALL(BPF_FUNC_probe_read),
 			BPF_MOV64_IMM(BPF_REG_0, 0),
@@ -4813,6 +4811,20 @@ static struct bpf_test tests[] = {
 		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
 	},
 	{
+		"helper access to variable memory: size = 0 allowed on NULL",
+		.insns = {
+			BPF_MOV64_IMM(BPF_REG_1, 0),
+			BPF_MOV64_IMM(BPF_REG_2, 0),
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+			BPF_MOV64_IMM(BPF_REG_4, 0),
+			BPF_MOV64_IMM(BPF_REG_5, 0),
+			BPF_EMIT_CALL(BPF_FUNC_csum_diff),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	},
+	{
 		"helper access to variable memory: size > 0 not allowed on NULL",
 		.insns = {
 			BPF_MOV64_IMM(BPF_REG_1, 0),
@@ -4826,7 +4838,7 @@ static struct bpf_test tests[] = {
 			BPF_EMIT_CALL(BPF_FUNC_csum_diff),
 			BPF_EXIT_INSN(),
 		},
-		.errstr = "R1 type=imm expected=fp",
+		.errstr = "R1 type=inv expected=fp",
 		.result = REJECT,
 		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 	},
@@ -4911,7 +4923,7 @@ static struct bpf_test tests[] = {
 			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
 				     BPF_FUNC_map_lookup_elem),
 			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 4),
-			BPF_MOV64_IMM(BPF_REG_1, 6),
+			BPF_LDX_MEM(BPF_B, BPF_REG_1, BPF_REG_0, 0),
 			BPF_ALU64_IMM(BPF_AND, BPF_REG_1, -4),
 			BPF_ALU64_IMM(BPF_LSH, BPF_REG_1, 2),
 			BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1),
@@ -4920,10 +4932,8 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
-		.errstr_unpriv = "R0 pointer arithmetic prohibited",
-		.errstr = "R0 min value is negative, either use unsigned index or do a if (index >=0) check.",
+		.errstr = "R0 max value is outside of the array range",
 		.result = REJECT,
-		.result_unpriv = REJECT,
 		.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
 	},
 	{
@@ -4952,10 +4962,8 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
-		.errstr_unpriv = "R0 pointer arithmetic prohibited",
-		.errstr = "R0 min value is negative, either use unsigned index or do a if (index >=0) check.",
+		.errstr = "R0 max value is outside of the array range",
 		.result = REJECT,
-		.result_unpriv = REJECT,
 		.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
 	},
 	{
@@ -5002,7 +5010,7 @@ static struct bpf_test tests[] = {
 		},
 		.fixup_map_in_map = { 3 },
 		.errstr = "R1 type=inv expected=map_ptr",
-		.errstr_unpriv = "R1 pointer arithmetic prohibited",
+		.errstr_unpriv = "R1 pointer arithmetic on CONST_PTR_TO_MAP prohibited",
 		.result = REJECT,
 	},
 	{
@@ -5190,10 +5198,8 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map1 = { 3 },
-		.errstr_unpriv = "R0 pointer arithmetic prohibited",
 		.errstr = "R0 min value is negative, either use unsigned index or do a if (index >=0) check.",
 		.result = REJECT,
-		.result_unpriv = REJECT,
 	},
 };
 

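The new expected messages in the alignment selftests spell the offset out term by term. For packet accesses this appears to be NET_IP_ALIGN, then the register's variable offset as a (value; mask) pair, then its fixed offset and the instruction offset. Stack messages lack the first term. The sketch below recomputes whether such an access is aligned, and shows why the two PTR_TO_STACK tests fail under F_LOAD_WITH_STRICT_ALIGNMENT. `struct tri`, `tri_const()`, `tri_add()`, `tri_aligned()` and `access_aligned()` are hypothetical names. The addition rule is a standard tristate-number construction and is not necessarily identical to the series' own tn_ helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tristate number: bits set in 'mask' are unknown, the
 * other bits are known and equal to the matching bits of 'value'. */
struct tri {
	uint64_t value;
	uint64_t mask;
};

static struct tri tri_const(uint64_t v)
{
	struct tri t = { v, 0 };
	return t;
}

/* sv is the sum with every unknown bit taken as 0, sm + sv the sum with
 * every unknown bit taken as 1; where they differ, carries from unknown
 * bits may reach, so those bits become unknown too. */
static struct tri tri_add(struct tri a, struct tri b)
{
	uint64_t sm = a.mask + b.mask;
	uint64_t sv = a.value + b.value;
	uint64_t chi = (sm + sv) ^ sv;
	uint64_t mu = chi | a.mask | b.mask;
	struct tri t = { sv & ~mu, mu };
	return t;
}

/* A 'size'-byte access (size a power of two) is provably aligned only
 * if the low bits of the offset are known to be zero. */
static bool tri_aligned(struct tri a, uint64_t size)
{
	return size == 0 || !((a.value | a.mask) & (size - 1));
}

/* Offset terms in the order the selftest messages print them. */
static bool access_aligned(int64_t ip_align, struct tri var_off,
			   int64_t reg_off, int64_t insn_off, uint64_t size)
{
	struct tri off = tri_add(var_off,
			tri_const((uint64_t)(ip_align + reg_off + insn_off)));

	return tri_aligned(off, size);
}
```

For example, access_aligned(0, tri_const(0), -8, 2, 8) returns false: -8 + 2 = -6 is not a multiple of 8, which matches "misaligned stack access off (0x0; 0x0)+-8+2 size 8". A variable offset of (0x0; 0x3c), i.e. 4 * (x & 0xf), plus 16 is still provably 4-byte aligned. It is no longer provably 8-byte aligned.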
 	},
 	{
-		"helper access to packet: test14, cls helper fail sub",
+		"helper access to packet: test14, cls helper ok sub",
 		.insns = {
 			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
 				    offsetof(struct __sk_buff, data)),
@@ -2949,12 +2956,36 @@ static struct bpf_test tests[] = {
 			BPF_MOV64_IMM(BPF_REG_0, 0),
 			BPF_EXIT_INSN(),
 		},
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	},
+	{
+		"helper access to packet: test15, cls helper fail sub",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
+				    offsetof(struct __sk_buff, data)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_1,
+				    offsetof(struct __sk_buff, data_end)),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_6, 1),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 7),
+			BPF_JMP_REG(BPF_JGT, BPF_REG_1, BPF_REG_7, 6),
+			BPF_ALU64_IMM(BPF_SUB, BPF_REG_1, 12),
+			BPF_MOV64_IMM(BPF_REG_2, 4),
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+			BPF_MOV64_IMM(BPF_REG_4, 0),
+			BPF_MOV64_IMM(BPF_REG_5, 0),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+				     BPF_FUNC_csum_diff),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
 		.result = REJECT,
-		.errstr = "type=inv expected=fp",
+		.errstr = "invalid access to packet",
 		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 	},
 	{
-		"helper access to packet: test15, cls helper fail range 1",
+		"helper access to packet: test16, cls helper fail range 1",
 		.insns = {
 			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
 				    offsetof(struct __sk_buff, data)),
@@ -2979,7 +3010,7 @@ static struct bpf_test tests[] = {
 		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 	},
 	{
-		"helper access to packet: test16, cls helper fail range 2",
+		"helper access to packet: test17, cls helper fail range 2",
 		.insns = {
 			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
 				    offsetof(struct __sk_buff, data)),
@@ -3000,11 +3031,11 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.result = REJECT,
-		.errstr = "invalid access to packet",
+		.errstr = "R2 min value is negative",
 		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 	},
 	{
-		"helper access to packet: test17, cls helper fail range 3",
+		"helper access to packet: test18, cls helper fail range 3",
 		.insns = {
 			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
 				    offsetof(struct __sk_buff, data)),
@@ -3025,11 +3056,11 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.result = REJECT,
-		.errstr = "invalid access to packet",
+		.errstr = "R2 min value is negative",
 		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 	},
 	{
-		"helper access to packet: test18, cls helper fail range zero",
+		"helper access to packet: test19, cls helper fail range zero",
 		.insns = {
 			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
 				    offsetof(struct __sk_buff, data)),
@@ -3054,7 +3085,7 @@ static struct bpf_test tests[] = {
 		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 	},
 	{
-		"helper access to packet: test19, pkt end as input",
+		"helper access to packet: test20, pkt end as input",
 		.insns = {
 			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
 				    offsetof(struct __sk_buff, data)),
@@ -3079,7 +3110,7 @@ static struct bpf_test tests[] = {
 		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 	},
 	{
-		"helper access to packet: test20, wrong reg",
+		"helper access to packet: test21, wrong reg",
 		.insns = {
 			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
 				    offsetof(struct __sk_buff, data)),
@@ -3139,7 +3170,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
-		.errstr_unpriv = "R0 pointer arithmetic prohibited",
+		.errstr_unpriv = "R0 leaks addr",
 		.result_unpriv = REJECT,
 		.result = ACCEPT,
 		.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
@@ -3163,7 +3194,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
-		.errstr_unpriv = "R0 pointer arithmetic prohibited",
+		.errstr_unpriv = "R0 leaks addr",
 		.result_unpriv = REJECT,
 		.result = ACCEPT,
 		.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
@@ -3191,7 +3222,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
-		.errstr_unpriv = "R0 pointer arithmetic prohibited",
+		.errstr_unpriv = "R0 leaks addr",
 		.result_unpriv = REJECT,
 		.result = ACCEPT,
 		.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
@@ -3232,9 +3263,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
-		.errstr_unpriv = "R0 pointer arithmetic prohibited",
 		.errstr = "R0 min value is outside of the array range",
-		.result_unpriv = REJECT,
 		.result = REJECT,
 		.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
 	},
@@ -3256,9 +3285,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
-		.errstr_unpriv = "R0 pointer arithmetic prohibited",
-		.errstr = "R0 min value is negative, either use unsigned index or do a if (index >=0) check.",
-		.result_unpriv = REJECT,
+		.errstr = "R0 unbounded memory access, make sure to bounds check any array access into a map",
 		.result = REJECT,
 		.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
 	},
@@ -3272,7 +3299,7 @@ static struct bpf_test tests[] = {
 			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
 				     BPF_FUNC_map_lookup_elem),
 			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 7),
-			BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_0, 0),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0, 0),
 			BPF_MOV32_IMM(BPF_REG_2, MAX_ENTRIES),
 			BPF_JMP_REG(BPF_JSGT, BPF_REG_2, BPF_REG_1, 1),
 			BPF_MOV32_IMM(BPF_REG_1, 0),
@@ -3283,8 +3310,8 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
-		.errstr_unpriv = "R0 pointer arithmetic prohibited",
-		.errstr = "R0 min value is negative, either use unsigned index or do a if (index >=0) check.",
+		.errstr_unpriv = "R0 leaks addr",
+		.errstr = "R0 unbounded memory access",
 		.result_unpriv = REJECT,
 		.result = REJECT,
 		.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
@@ -3310,7 +3337,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
-		.errstr_unpriv = "R0 pointer arithmetic prohibited",
+		.errstr_unpriv = "R0 leaks addr",
 		.errstr = "invalid access to map value, value_size=48 off=44 size=8",
 		.result_unpriv = REJECT,
 		.result = REJECT,
@@ -3340,8 +3367,8 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3, 11 },
-		.errstr_unpriv = "R0 pointer arithmetic prohibited",
-		.errstr = "R0 min value is negative, either use unsigned index or do a if (index >=0) check.",
+		.errstr_unpriv = "R0 pointer += pointer",
+		.errstr = "R0 invalid mem access 'inv'",
 		.result_unpriv = REJECT,
 		.result = REJECT,
 		.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
@@ -3483,34 +3510,6 @@ static struct bpf_test tests[] = {
 		.prog_type = BPF_PROG_TYPE_SCHED_CLS
 	},
 	{
-		"multiple registers share map_lookup_elem bad reg type",
-		.insns = {
-			BPF_MOV64_IMM(BPF_REG_1, 10),
-			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_1, -8),
-			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
-			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
-			BPF_LD_MAP_FD(BPF_REG_1, 0),
-			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
-				     BPF_FUNC_map_lookup_elem),
-			BPF_MOV64_REG(BPF_REG_2, BPF_REG_0),
-			BPF_MOV64_REG(BPF_REG_3, BPF_REG_0),
-			BPF_MOV64_REG(BPF_REG_4, BPF_REG_0),
-			BPF_MOV64_REG(BPF_REG_5, BPF_REG_0),
-			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
-			BPF_MOV64_IMM(BPF_REG_1, 1),
-			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
-			BPF_MOV64_IMM(BPF_REG_1, 2),
-			BPF_JMP_IMM(BPF_JEQ, BPF_REG_3, 0, 1),
-			BPF_ST_MEM(BPF_DW, BPF_REG_3, 0, 0),
-			BPF_MOV64_IMM(BPF_REG_1, 3),
-			BPF_EXIT_INSN(),
-		},
-		.fixup_map1 = { 4 },
-		.result = REJECT,
-		.errstr = "R3 invalid mem access 'inv'",
-		.prog_type = BPF_PROG_TYPE_SCHED_CLS
-	},
-	{
 		"invalid map access from else condition",
 		.insns = {
 			BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
@@ -3528,9 +3527,9 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
-		.errstr = "R0 unbounded memory access, make sure to bounds check any array access into a map",
+		.errstr = "R0 unbounded memory access",
 		.result = REJECT,
-		.errstr_unpriv = "R0 pointer arithmetic prohibited",
+		.errstr_unpriv = "R0 leaks addr",
 		.result_unpriv = REJECT,
 		.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
 	},
@@ -3842,7 +3841,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
-		.errstr = "invalid access to map value, value_size=48 off=0 size=-8",
+		.errstr = "R2 min value is negative",
 		.result = REJECT,
 		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
 	},
@@ -3954,7 +3953,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
-		.errstr = "invalid access to map value, value_size=48 off=4 size=-8",
+		.errstr = "R2 min value is negative",
 		.result = REJECT,
 		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
 	},
@@ -3976,7 +3975,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
-		.errstr = "R1 min value is outside of the array range",
+		.errstr = "R2 min value is negative",
 		.result = REJECT,
 		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
 	},
@@ -4092,7 +4091,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
-		.errstr = "invalid access to map value, value_size=48 off=4 size=-8",
+		.errstr = "R2 min value is negative",
 		.result = REJECT,
 		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
 	},
@@ -4115,7 +4114,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
-		.errstr = "R1 min value is outside of the array range",
+		.errstr = "R2 min value is negative",
 		.result = REJECT,
 		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
 	},
@@ -4203,13 +4202,13 @@ static struct bpf_test tests[] = {
 			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
 			BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_0, 0),
 			BPF_ALU64_REG(BPF_ADD, BPF_REG_1, BPF_REG_3),
-			BPF_MOV64_IMM(BPF_REG_2, 0),
+			BPF_MOV64_IMM(BPF_REG_2, 1),
 			BPF_MOV64_IMM(BPF_REG_3, 0),
 			BPF_EMIT_CALL(BPF_FUNC_probe_read),
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
-		.errstr = "R1 min value is negative, either use unsigned index or do a if (index >=0) check",
+		.errstr = "R1 unbounded memory access",
 		.result = REJECT,
 		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
 	},
@@ -4329,7 +4328,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
-		.errstr_unpriv = "R0 pointer arithmetic prohibited",
+		.errstr_unpriv = "R0 leaks addr",
 		.result = ACCEPT,
 		.result_unpriv = REJECT,
 		.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
@@ -4357,7 +4356,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
-		.errstr_unpriv = "R0 pointer arithmetic prohibited",
+		.errstr_unpriv = "R0 leaks addr",
 		.result = ACCEPT,
 		.result_unpriv = REJECT,
 		.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
@@ -4376,7 +4375,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
-		.errstr_unpriv = "R0 pointer arithmetic prohibited",
+		.errstr_unpriv = "R0 bitwise operator &= on pointer",
 		.errstr = "invalid mem access 'inv'",
 		.result = REJECT,
 		.result_unpriv = REJECT,
@@ -4395,7 +4394,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
-		.errstr_unpriv = "R0 pointer arithmetic prohibited",
+		.errstr_unpriv = "R0 32-bit pointer arithmetic prohibited",
 		.errstr = "invalid mem access 'inv'",
 		.result = REJECT,
 		.result_unpriv = REJECT,
@@ -4414,7 +4413,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
-		.errstr_unpriv = "R0 pointer arithmetic prohibited",
+		.errstr_unpriv = "R0 pointer arithmetic with /= operator",
 		.errstr = "invalid mem access 'inv'",
 		.result = REJECT,
 		.result_unpriv = REJECT,
@@ -4457,10 +4456,8 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
-		.errstr_unpriv = "R0 invalid mem access 'inv'",
 		.errstr = "R0 invalid mem access 'inv'",
 		.result = REJECT,
-		.result_unpriv = REJECT,
 	},
 	{
 		"map element value is preserved across register spilling",
@@ -4482,7 +4479,7 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
-		.errstr_unpriv = "R0 pointer arithmetic prohibited",
+		.errstr_unpriv = "R0 leaks addr",
 		.result = ACCEPT,
 		.result_unpriv = REJECT,
 		.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
@@ -4664,7 +4661,8 @@ static struct bpf_test tests[] = {
 			BPF_MOV64_IMM(BPF_REG_0, 0),
 			BPF_EXIT_INSN(),
 		},
-		.errstr = "R2 unbounded memory access",
+		/* because max wasn't checked, signed min is negative */
+		.errstr = "R2 min value is negative, either use unsigned or 'var &= const'",
 		.result = REJECT,
 		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
 	},
@@ -4720,7 +4718,7 @@ static struct bpf_test tests[] = {
 			BPF_JMP_IMM(BPF_JSGT, BPF_REG_2,
 				sizeof(struct test_val), 4),
 			BPF_MOV64_IMM(BPF_REG_4, 0),
-			BPF_JMP_REG(BPF_JGE, BPF_REG_4, BPF_REG_2, 2),
+			BPF_JMP_REG(BPF_JSGE, BPF_REG_4, BPF_REG_2, 2),
 			BPF_MOV64_IMM(BPF_REG_3, 0),
 			BPF_EMIT_CALL(BPF_FUNC_probe_read),
 			BPF_MOV64_IMM(BPF_REG_0, 0),
@@ -4746,7 +4744,7 @@ static struct bpf_test tests[] = {
 			BPF_JMP_IMM(BPF_JSGT, BPF_REG_2,
 				sizeof(struct test_val) + 1, 4),
 			BPF_MOV64_IMM(BPF_REG_4, 0),
-			BPF_JMP_REG(BPF_JGE, BPF_REG_4, BPF_REG_2, 2),
+			BPF_JMP_REG(BPF_JSGE, BPF_REG_4, BPF_REG_2, 2),
 			BPF_MOV64_IMM(BPF_REG_3, 0),
 			BPF_EMIT_CALL(BPF_FUNC_probe_read),
 			BPF_MOV64_IMM(BPF_REG_0, 0),
@@ -4774,7 +4772,7 @@ static struct bpf_test tests[] = {
 			BPF_JMP_IMM(BPF_JSGT, BPF_REG_2,
 				sizeof(struct test_val) - 20, 4),
 			BPF_MOV64_IMM(BPF_REG_4, 0),
-			BPF_JMP_REG(BPF_JGE, BPF_REG_4, BPF_REG_2, 2),
+			BPF_JMP_REG(BPF_JSGE, BPF_REG_4, BPF_REG_2, 2),
 			BPF_MOV64_IMM(BPF_REG_3, 0),
 			BPF_EMIT_CALL(BPF_FUNC_probe_read),
 			BPF_MOV64_IMM(BPF_REG_0, 0),
@@ -4801,7 +4799,7 @@ static struct bpf_test tests[] = {
 			BPF_JMP_IMM(BPF_JSGT, BPF_REG_2,
 				sizeof(struct test_val) - 19, 4),
 			BPF_MOV64_IMM(BPF_REG_4, 0),
-			BPF_JMP_REG(BPF_JGE, BPF_REG_4, BPF_REG_2, 2),
+			BPF_JMP_REG(BPF_JSGE, BPF_REG_4, BPF_REG_2, 2),
 			BPF_MOV64_IMM(BPF_REG_3, 0),
 			BPF_EMIT_CALL(BPF_FUNC_probe_read),
 			BPF_MOV64_IMM(BPF_REG_0, 0),
@@ -4813,6 +4811,20 @@ static struct bpf_test tests[] = {
 		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
 	},
 	{
+		"helper access to variable memory: size = 0 allowed on NULL",
+		.insns = {
+			BPF_MOV64_IMM(BPF_REG_1, 0),
+			BPF_MOV64_IMM(BPF_REG_2, 0),
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+			BPF_MOV64_IMM(BPF_REG_4, 0),
+			BPF_MOV64_IMM(BPF_REG_5, 0),
+			BPF_EMIT_CALL(BPF_FUNC_csum_diff),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	},
+	{
 		"helper access to variable memory: size > 0 not allowed on NULL",
 		.insns = {
 			BPF_MOV64_IMM(BPF_REG_1, 0),
@@ -4826,7 +4838,7 @@ static struct bpf_test tests[] = {
 			BPF_EMIT_CALL(BPF_FUNC_csum_diff),
 			BPF_EXIT_INSN(),
 		},
-		.errstr = "R1 type=imm expected=fp",
+		.errstr = "R1 type=inv expected=fp",
 		.result = REJECT,
 		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 	},
@@ -4911,7 +4923,7 @@ static struct bpf_test tests[] = {
 			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
 				     BPF_FUNC_map_lookup_elem),
 			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 4),
-			BPF_MOV64_IMM(BPF_REG_1, 6),
+			BPF_LDX_MEM(BPF_B, BPF_REG_1, BPF_REG_0, 0),
 			BPF_ALU64_IMM(BPF_AND, BPF_REG_1, -4),
 			BPF_ALU64_IMM(BPF_LSH, BPF_REG_1, 2),
 			BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1),
@@ -4920,10 +4932,8 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
-		.errstr_unpriv = "R0 pointer arithmetic prohibited",
-		.errstr = "R0 min value is negative, either use unsigned index or do a if (index >=0) check.",
+		.errstr = "R0 max value is outside of the array range",
 		.result = REJECT,
-		.result_unpriv = REJECT,
 		.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
 	},
 	{
@@ -4952,10 +4962,8 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
-		.errstr_unpriv = "R0 pointer arithmetic prohibited",
-		.errstr = "R0 min value is negative, either use unsigned index or do a if (index >=0) check.",
+		.errstr = "R0 max value is outside of the array range",
 		.result = REJECT,
-		.result_unpriv = REJECT,
 		.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
 	},
 	{
@@ -5002,7 +5010,7 @@ static struct bpf_test tests[] = {
 		},
 		.fixup_map_in_map = { 3 },
 		.errstr = "R1 type=inv expected=map_ptr",
-		.errstr_unpriv = "R1 pointer arithmetic prohibited",
+		.errstr_unpriv = "R1 pointer arithmetic on CONST_PTR_TO_MAP prohibited",
 		.result = REJECT,
 	},
 	{
@@ -5190,10 +5198,8 @@ static struct bpf_test tests[] = {
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map1 = { 3 },
-		.errstr_unpriv = "R0 pointer arithmetic prohibited",
 		.errstr = "R0 min value is negative, either use unsigned index or do a if (index >=0) check.",
 		.result = REJECT,
-		.result_unpriv = REJECT,
 	},
 };

* Re: [RFC PATCH net-next 2/5] bpf/verifier: rework value tracking
@ 2017-06-08  2:32     ` Alexei Starovoitov via iovisor-dev
  0 siblings, 0 replies; 45+ messages in thread
From: Alexei Starovoitov @ 2017-06-08  2:32 UTC (permalink / raw)
  To: Edward Cree
  Cc: davem, Alexei Starovoitov, Daniel Borkmann, netdev, iovisor-dev, LKML

On Wed, Jun 07, 2017 at 03:58:31PM +0100, Edward Cree wrote:
> Tracks value alignment by means of tracking known & unknown bits.
> Tightens some min/max value checks and fixes a couple of bugs therein.
> 
> Signed-off-by: Edward Cree <ecree@solarflare.com>
> ---
>  include/linux/bpf.h          |   34 +-
>  include/linux/bpf_verifier.h |   40 +-
>  include/linux/tnum.h         |   58 ++
>  kernel/bpf/Makefile          |    2 +-
>  kernel/bpf/tnum.c            |  163 +++++
>  kernel/bpf/verifier.c        | 1641 +++++++++++++++++++++++-------------------
>  6 files changed, 1170 insertions(+), 768 deletions(-)

yeah! That's cool. Overall I like the direction.
I don't understand it completely yet, so only a few nits so far:

> +/* Arithmetic and logical ops */
> +/* Shift a tnum left (by a fixed shift) */
> +struct tnum tn_sl(struct tnum a, u8 shift);
> +/* Shift a tnum right (by a fixed shift) */
> +struct tnum tn_sr(struct tnum a, u8 shift);

I think in a few months we will forget what these abbreviations mean.
Can you change them to tnum_rshift, tnum_lshift, tnum_add?

> +/* half-multiply add: acc += (unknown * mask * value) */
> +static struct tnum hma(struct tnum acc, u64 value, u64 mask)

hma? is it a standard abbreviation?
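For readers following the naming discussion, here is a minimal user-space sketch of what shift and add on a tnum (a "tristate number": 'mask' bits are unknown, the other bits are given by 'value') might look like. It uses stdint.h types instead of the kernel's u64, borrows the tnum_lshift/tnum_rshift/tnum_add names suggested above, and assumes shift counts below 64. The addition uses one known carry-propagation technique (compare the sum with all unknown bits clear against the sum with all unknown bits set); it is an illustration, not necessarily the code in this patch.

```c
#include <stdint.h>

/* Tristate number: bits set in 'mask' are unknown; all other bits
 * are known and equal to the corresponding bit of 'value'.
 * Invariant kept by tnum_make(): (value & mask) == 0. */
struct tnum {
	uint64_t value;
	uint64_t mask;
};

static struct tnum tnum_make(uint64_t value, uint64_t mask)
{
	struct tnum t = { value & ~mask, mask };
	return t;
}

/* Shift by a constant (assumed < 64): known and unknown bits move
 * together, and vacated bits become known zero. */
static struct tnum tnum_lshift(struct tnum a, uint8_t shift)
{
	return tnum_make(a.value << shift, a.mask << shift);
}

static struct tnum tnum_rshift(struct tnum a, uint8_t shift)
{
	return tnum_make(a.value >> shift, a.mask >> shift);
}

/* Add: any bit where the "all unknowns 0" sum and the "all unknowns 1"
 * sum differ (because carries may differ) becomes unknown. */
static struct tnum tnum_add(struct tnum a, struct tnum b)
{
	uint64_t sm = a.mask + b.mask;     /* unknown parts, all set */
	uint64_t sv = a.value + b.value;   /* known parts only */
	uint64_t sigma = sm + sv;          /* largest possible sum */
	uint64_t chi = sigma ^ sv;         /* bits that may differ */
	uint64_t mu = chi | a.mask | b.mask;

	return tnum_make(sv & ~mu, mu);
}
```

For example, adding "0 or 1" (value 0, mask 1) to the constant 1 yields value 0, mask 3, a safe over-approximation of {1, 2}.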

> -static void init_reg_state(struct bpf_reg_state *regs)
> +static void mark_reg_known_zero(struct bpf_reg_state *regs, u32 regno)
>  {
> -	int i;
> -
> -	for (i = 0; i < MAX_BPF_REG; i++)
> -		mark_reg_not_init(regs, i);
> -
> -	/* frame pointer */
> -	regs[BPF_REG_FP].type = FRAME_PTR;
> -
> -	/* 1st arg to a function */
> -	regs[BPF_REG_1].type = PTR_TO_CTX;
> +	BUG_ON(regno >= MAX_BPF_REG);
> +	__mark_reg_known_zero(regs + regno);

I know we have BUG_ONs in the code and they have never been hit,
but since you're rewriting it, please change it to WARN_ON and
set all regs to NOT_INIT in that case.
This way, if we really have a bug, it hopefully won't crash.
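A minimal user-space sketch of the pattern being requested: warn and poison every register instead of halting. The warn_on() helper, the reg_state struct and the int return value are hypothetical stand-ins (the kernel function returns void and would use WARN_ON() and verbose()); MAX_BPF_REG is 11, covering R0 to R10.

```c
#include <stdbool.h>
#include <stdio.h>

#define MAX_BPF_REG 11   /* R0..R10 */

enum reg_type { NOT_INIT = 0, SCALAR_VALUE };

/* Hypothetical, simplified register state. */
struct reg_state {
	enum reg_type type;
	unsigned long long value, mask;
};

/* Stand-in for WARN_ON(): report the condition, keep running. */
static bool warn_on(bool cond, const char *what)
{
	if (cond)
		fprintf(stderr, "WARNING: %s\n", what);
	return cond;
}

static void mark_all_not_init(struct reg_state *regs)
{
	for (int i = 0; i < MAX_BPF_REG; i++)
		regs[i].type = NOT_INIT;
}

/* Returns 0 on success, -1 if regno was out of range. */
static int mark_reg_known_zero(struct reg_state *regs, unsigned int regno)
{
	if (warn_on(regno >= MAX_BPF_REG, "mark_reg_known_zero: bad regno")) {
		/* Poison everything so any later use of a register is
		 * rejected by the verifier instead of crashing the box. */
		mark_all_not_init(regs);
		return -1;
	}
	regs[regno].type = SCALAR_VALUE;
	regs[regno].value = 0;
	regs[regno].mask = 0;
	return 0;
}
```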

> -/* check read/write into an adjusted map element */
> -static int check_map_access_adj(struct bpf_verifier_env *env, u32 regno,
> +/* check read/write into a map element with possible variable offset */
> +static int check_map_access(struct bpf_verifier_env *env, u32 regno,
>  				int off, int size)
>  {
>  	struct bpf_verifier_state *state = &env->cur_state;
>  	struct bpf_reg_state *reg = &state->regs[regno];
>  	int err;
>  
> -	/* We adjusted the register to this map value, so we
> -	 * need to change off and size to min_value and max_value
> -	 * respectively to make sure our theoretical access will be
> -	 * safe.
> +	/* We may have adjusted the register to this map value, so we
> +	 * need to try adding each of min_value and max_value to off
> +	 * to make sure our theoretical access will be safe.
>  	 */
>  	if (log_level)
>  		print_verifier_state(state);
> -	env->varlen_map_value_access = true;
> +	/* If the offset is variable, we will need to be stricter in state
> +	 * pruning from now on.
> +	 */
> +	if (reg->align.mask)
> +		env->varlen_map_value_access = true;

I think this align.mask access is used in a few places.
Maybe it's worth adding a static inline helper with a clear name?

>  	switch (reg->type) {
>  	case PTR_TO_PACKET:
> +		/* special case, because of NET_IP_ALIGN */
>  		return check_pkt_ptr_alignment(reg, off, size, strict);
> -	case PTR_TO_MAP_VALUE_ADJ:
> -		return check_val_ptr_alignment(reg, size, strict);
> +	case PTR_TO_MAP_VALUE:
> +		pointer_desc = "value ";
> +		break;
> +	case PTR_TO_CTX:
> +		pointer_desc = "context ";
> +		break;
> +	case PTR_TO_STACK:
> +		pointer_desc = "stack ";
> +		break;

thank you for making errors more human readable.
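Judging from the updated selftest strings, e.g. "misaligned stack access off (0x0; 0x0)+-8+2 size 8", the new messages appear to print the variable offset as "(value; mask)", then the register's fixed offset, then the instruction offset. The sketch below reproduces that shape in user space; the function names are hypothetical, and the ordering is inferred from the tests rather than taken from the patch. Packet messages seem to carry an extra leading IP-alignment term ("2+") that is not modelled here. The "0x" is written explicitly because glibc's "%#llx" prints a bare "0" for zero, while the expected strings show "0x0".

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct tnum {
	uint64_t value;
	uint64_t mask;
};

/* Format a tnum as "(0x<value>; 0x<mask>)". */
static int tnum_strn(char *buf, size_t size, struct tnum a)
{
	return snprintf(buf, size, "(0x%llx; 0x%llx)",
			(unsigned long long)a.value,
			(unsigned long long)a.mask);
}

/* pointer_desc carries its own trailing space ("stack ", "value ",
 * "context "), as in the quoted hunk. */
static int misaligned_msg(char *buf, size_t size, const char *pointer_desc,
			  struct tnum var_off, int reg_off, int insn_off,
			  int access_size)
{
	char tn_buf[48];

	tnum_strn(tn_buf, sizeof(tn_buf), var_off);
	return snprintf(buf, size, "misaligned %saccess off %s+%d+%d size %d",
			pointer_desc, tn_buf, reg_off, insn_off, access_size);
}
```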

> +			char tn_buf[48];
> +
> +			tn_strn(tn_buf, sizeof(tn_buf), reg->align);
> +			verbose("variable ctx access align=%s off=%d size=%d",
> +				tn_buf, off, size);
> +			return -EACCES;
> +		}
> +		off += reg->align.value;

I think 'align' is an odd name for this field.
Maybe rename the off/align fields to:
s32 fixed_off;
struct tnum var_off;

>  
> -	} else if (reg->type == FRAME_PTR || reg->type == PTR_TO_STACK) {
> +	} else if (reg->type == PTR_TO_STACK) {
> +		/* stack accesses must be at a fixed offset, so that we can
> +		 * determine what type of data were returned.
> +		 */
> +		if (reg->align.mask) {
> +			char tn_buf[48];
> +
> +			tn_strn(tn_buf, sizeof(tn_buf), reg->align);
> +			verbose("variable stack access align=%s off=%d size=%d",
> +				tn_buf, off, size);
> +			return -EACCES;

Hmm, why this restriction?
I thought one of the key points of the diff was that the ptr+var tracking logic
would now apply not only to map_value, but to stack_ptr as well?

>  	}
>  
> -	if (!err && size <= 2 && value_regno >= 0 && env->allow_ptr_leaks &&
> -	    state->regs[value_regno].type == UNKNOWN_VALUE) {
> -		/* 1 or 2 byte load zero-extends, determine the number of
> -		 * zero upper bits. Not doing it fo 4 byte load, since
> -		 * such values cannot be added to ptr_to_packet anyway.
> -		 */
> -		state->regs[value_regno].imm = 64 - size * 8;
> +	if (!err && size < BPF_REG_SIZE && value_regno >= 0 && t == BPF_READ &&
> +	    state->regs[value_regno].type == SCALAR_VALUE) {
> +		/* b/h/w load zero-extends, mark upper bits as known 0 */
> +		state->regs[value_regno].align.value &= (1ULL << (size * 8)) - 1;
> +		state->regs[value_regno].align.mask &= (1ULL << (size * 8)) - 1;

Probably another helper in tnum.h is needed here.
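A sketch of what such a helper might look like (the name tnum_cast is hypothetical): truncate a tnum to its low 'size' bytes, so every higher bit becomes known zero, which is what a zero-extending b/h/w load implies. The shift is only defined for size < 8, matching the size < BPF_REG_SIZE guard in the hunk.

```c
#include <stdint.h>

struct tnum {
	uint64_t value;
	uint64_t mask;
};

/* Keep only the low 'size' bytes (size must be 1, 2 or 4); all bits
 * above them become known zero. */
static struct tnum tnum_cast(struct tnum a, uint8_t size)
{
	uint64_t keep = (1ULL << (size * 8)) - 1;

	a.value &= keep;
	a.mask &= keep;
	return a;
}
```

The two assignments in the hunk would then collapse to regs[value_regno].align = tnum_cast(regs[value_regno].align, size).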

> +		/* sign bit is known zero, so we can bound the value */
> +		state->regs[value_regno].min_value = 0;
> +		state->regs[value_regno].max_value = min_t(u64,
> +					state->regs[value_regno].align.mask,
> +					BPF_REGISTER_MAX_RANGE);

min_t with mask? should it be align.value?
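For reference, the unsigned bounds implied by a tnum are value (all unknown bits clear) and value | mask (all unknown bits set). The helpers below are a hypothetical user-space sketch. After a fresh zero-extending load, value would presumably be 0 and the two expressions coincide, but once some bits are known to be one, mask alone understates the maximum (e.g. (0x10; 0x0f) can reach 0x1f).

```c
#include <stdint.h>

struct tnum {
	uint64_t value;
	uint64_t mask;
};

/* Smallest value: every unknown bit is 0. */
static uint64_t tnum_umin(struct tnum a)
{
	return a.value;
}

/* Largest value: every unknown bit is 1. */
static uint64_t tnum_umax(struct tnum a)
{
	return a.value | a.mask;
}
```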

>  	}
>  	return err;
>  }
> @@ -1000,9 +1068,18 @@ static int check_xadd(struct bpf_verifier_env *env, struct bpf_insn *insn)
>  				BPF_SIZE(insn->code), BPF_WRITE, -1);
>  }
>  
> +/* Does this register contain a constant zero? */
> +static bool register_is_null(struct bpf_reg_state reg)
> +{
> +	return reg.type == SCALAR_VALUE && reg.align.mask == 0 &&
> +	       reg.align.value == 0;

Move align.mask == 0 && align.value == 0 into a helper in tnum.h?
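A possible form of that helper, sketched with a hypothetical name (tnum_equals_const): true only if the tnum is fully known and equal to the given constant.

```c
#include <stdbool.h>
#include <stdint.h>

struct tnum {
	uint64_t value;
	uint64_t mask;
};

/* Fully known and equal to b? */
static inline bool tnum_equals_const(struct tnum a, uint64_t b)
{
	return a.mask == 0 && a.value == b;
}
```

register_is_null() would then read: return reg.type == SCALAR_VALUE && tnum_equals_const(reg.align, 0);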

> @@ -1024,7 +1101,15 @@ static int check_stack_boundary(struct bpf_verifier_env *env, int regno,
>  		return -EACCES;
>  	}
>  
> -	off = regs[regno].imm;
> +	/* Only allow fixed-offset stack reads */
> +	if (regs[regno].align.mask) {
> +		char tn_buf[48];
> +
> +		tn_strn(tn_buf, sizeof(tn_buf), regs[regno].align);
> +		verbose("invalid variable stack read R%d align=%s\n",
> +			regno, tn_buf);
> +	}

Same question as before: can it be relaxed?
Support for char arr[32]; access arr[n] has been requested several times,
and folks have used map_value[n] as a workaround.
It seems that with this var stack logic it's one step away, no?
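To illustrate what "one step away" could mean for the bounds part alone, here is a hypothetical user-space check that a variable-offset stack access stays inside [-MAX_BPF_STACK, 0) for every value the tnum allows. It deliberately ignores the other concern raised in the patch comment, namely knowing what type of data each stack slot holds, which a real relaxation would also have to solve.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_BPF_STACK 512   /* BPF stack size in bytes */

struct tnum {
	uint64_t value;
	uint64_t mask;
};

/* fixed_off is relative to the frame pointer (so negative);
 * var_off is the variable part, e.g. an index masked with '&= 31'. */
static bool var_stack_access_ok(int32_t fixed_off, struct tnum var_off,
				int size)
{
	uint64_t umax = var_off.value | var_off.mask;
	int64_t lo, hi;

	/* A variable part bigger than the whole stack can never fit;
	 * this also keeps the signed casts below in range. */
	if (umax > MAX_BPF_STACK)
		return false;
	lo = (int64_t)fixed_off + (int64_t)var_off.value;
	hi = (int64_t)fixed_off + (int64_t)umax;
	/* lowest byte on the stack, highest byte below the frame pointer */
	return lo >= -MAX_BPF_STACK && hi + size <= 0;
}
```

With char arr[32] at fp-32, a byte access at arr[n & 31] passes (offsets -32 to -1), while arr[n & 63] is rejected.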

> -		if (src_reg->imm < 48) {
> -			verbose("cannot add integer value with %lld upper zero bits to ptr_to_packet\n",
> -				src_reg->imm);
> -			return -EACCES;
> -		}
> -
> -		had_id = (dst_reg->id != 0);
> -
> -		/* dst_reg stays as pkt_ptr type and since some positive
> -		 * integer value was added to the pointer, increment its 'id'
> -		 */
> -		dst_reg->id = ++env->id_gen;

great to see it's being generalized.

> +	if (ptr_reg->type == PTR_TO_MAP_VALUE_OR_NULL) {
> +		if (!env->allow_ptr_leaks) {
> +			verbose("R%d pointer arithmetic on PTR_TO_MAP_VALUE_OR_NULL prohibited, null-check it first\n",
> +				dst);
> +			return -EACCES;
> +		}

I guess the mark_map_reg() logic will cover the good cases, and
actual math on ptr_to_map_or_null will happen only in broken programs.
It just feels a bit fragile, since it probably depends on the order in which
we evaluate the branches? It's not an issue with this patch; we have
the same situation today. Just thinking out loud.

> +	/* Got here implies adding two SCALAR_VALUEs */
> +	if (WARN_ON_ONCE(ptr_reg)) {
> +		verbose("verifier internal error\n");
> +		return -EINVAL;
...
> +	if (WARN_ON(!src_reg)) {
> +		verbose("verifier internal error\n");
> +		return -EINVAL;
>  	}

I'm lost with these bits.
Can you add a comment explaining in what circumstances this can be hit
and what the consequences would be?

> +/* Returns true if (rold safe implies rcur safe) */
> +static bool regsafe(struct bpf_reg_state *rold,
> +		    struct bpf_reg_state *rcur,
> +		    bool varlen_map_access)
> +{
> +	if (memcmp(rold, rcur, sizeof(*rold)) == 0)
>  		return true;
> +	if (rold->type == NOT_INIT)
> +		/* explored state can't have used this */
>  		return true;
> +	if (rcur->type == NOT_INIT)
> +		return false;
> +	switch (rold->type) {
> +	case SCALAR_VALUE:
> +		if (rcur->type == SCALAR_VALUE) {
> +			/* new val must satisfy old val knowledge */
> +			return range_within(rold, rcur) &&
> +			       tn_in(rold->align, rcur->align);
> +		} else {
> +			/* if we knew anything about the old value, we're not
> +			 * equal, because we can't know anything about the
> +			 * scalar value of the pointer in the new value.
> +			 */
> +			return rold->min_value == BPF_REGISTER_MIN_RANGE &&
> +			       rold->max_value == BPF_REGISTER_MAX_RANGE &&
> +			       !~rold->align.mask;
> +		}
> +	case PTR_TO_MAP_VALUE:
> +		if (varlen_map_access) {
> +			/* If the new min/max/align satisfy the old ones and
> +			 * everything else matches, we are OK.
> +			 * We don't care about the 'id' value, because nothing
> +			 * uses it for PTR_TO_MAP_VALUE (only for ..._OR_NULL)
> +			 */
> +			return memcmp(rold, rcur, offsetof(struct bpf_reg_state, id)) == 0 &&
> +			       range_within(rold, rcur) &&
> +			       tn_in(rold->align, rcur->align);
> +		} else {
> +			/* If the ranges/align were not the same, but
> +			 * everything else was and we didn't do a variable
> +			 * access into a map then we are a-ok.
> +			 */
> +			return memcmp(rold, rcur, offsetof(struct bpf_reg_state, id)) == 0;
> +		}
> +	case PTR_TO_MAP_VALUE_OR_NULL:

Does this new state comparison logic help?
Do you have any before/after numbers for how many insns it had to process
for the tests in selftests?
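For context on the comparison being asked about, here is a user-space sketch of the two predicates regsafe() relies on for scalars: tnum containment (every value the new tnum allows is also allowed by the old one) and range containment. The containment rule shown is the natural definition under the value/mask encoding; the struct range type is a hypothetical simplification of the register's min_value/max_value pair, not the patch's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

struct tnum {
	uint64_t value;
	uint64_t mask;
};

/* Does 'a' cover every value 'b' could hold?  Each bit unknown in b
 * must be unknown in a, and a's known bits must match b's. */
static bool tnum_in(struct tnum a, struct tnum b)
{
	if (b.mask & ~a.mask)
		return false;
	b.value &= ~a.mask;
	return a.value == b.value;
}

/* Hypothetical simplified bounds pair. */
struct range {
	int64_t min_value;
	int64_t max_value;
};

/* Is cur's range contained in old's? */
static bool range_within(struct range old, struct range cur)
{
	return old.min_value <= cur.min_value &&
	       old.max_value >= cur.max_value;
}
```

If both hold, a state already proven safe with the old register also covers the current one, so that branch can be pruned.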

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH net-next 2/5] bpf/verifier: rework value tracking
@ 2017-06-08  2:32     ` Alexei Starovoitov via iovisor-dev
  0 siblings, 0 replies; 45+ messages in thread
From: Alexei Starovoitov via iovisor-dev @ 2017-06-08  2:32 UTC (permalink / raw)
  To: Edward Cree
  Cc: Alexei Starovoitov, netdev-u79uwXL29TY76Z2rM5mHXA, iovisor-dev,
	LKML, davem-fT/PcQaiUtIeIZ0/mPfg9Q

On Wed, Jun 07, 2017 at 03:58:31PM +0100, Edward Cree wrote:
> Tracks value alignment by means of tracking known & unknown bits.
> Tightens some min/max value checks and fixes a couple of bugs therein.
> 
> Signed-off-by: Edward Cree <ecree-s/n/eUQHGBpZroRs9YW3xA@public.gmane.org>
> ---
>  include/linux/bpf.h          |   34 +-
>  include/linux/bpf_verifier.h |   40 +-
>  include/linux/tnum.h         |   58 ++
>  kernel/bpf/Makefile          |    2 +-
>  kernel/bpf/tnum.c            |  163 +++++
>  kernel/bpf/verifier.c        | 1641 +++++++++++++++++++++++-------------------
>  6 files changed, 1170 insertions(+), 768 deletions(-)

yeah! That's cool. Overall I like the direction.
I don't understand it completely yet, so ony few nits so far:

> +/* Arithmetic and logical ops */
> +/* Shift a tnum left (by a fixed shift) */
> +struct tnum tn_sl(struct tnum a, u8 shift);
> +/* Shift a tnum right (by a fixed shift) */
> +struct tnum tn_sr(struct tnum a, u8 shift);

I think in few month we will forget what these abbreviations mean.
Can you change it to tnum_rshift, tnum_lshift, tnum_add ?

> +/* half-multiply add: acc += (unknown * mask * value) */
> +static struct tnum hma(struct tnum acc, u64 value, u64 mask)

hma? is it a standard abbreviation?

> -static void init_reg_state(struct bpf_reg_state *regs)
> +static void mark_reg_known_zero(struct bpf_reg_state *regs, u32 regno)
>  {
> -	int i;
> -
> -	for (i = 0; i < MAX_BPF_REG; i++)
> -		mark_reg_not_init(regs, i);
> -
> -	/* frame pointer */
> -	regs[BPF_REG_FP].type = FRAME_PTR;
> -
> -	/* 1st arg to a function */
> -	regs[BPF_REG_1].type = PTR_TO_CTX;
> +	BUG_ON(regno >= MAX_BPF_REG);
> +	__mark_reg_known_zero(regs + regno);

I know we have BUG_ONs in the code and it was never hit,
but since you're rewriting it please change it to WARN_ON and
set all regs into NOT_INIT in such case.
This way if we really have a bug, it hopefully won't crash.

> -/* check read/write into an adjusted map element */
> -static int check_map_access_adj(struct bpf_verifier_env *env, u32 regno,
> +/* check read/write into a map element with possible variable offset */
> +static int check_map_access(struct bpf_verifier_env *env, u32 regno,
>  				int off, int size)
>  {
>  	struct bpf_verifier_state *state = &env->cur_state;
>  	struct bpf_reg_state *reg = &state->regs[regno];
>  	int err;
>  
> -	/* We adjusted the register to this map value, so we
> -	 * need to change off and size to min_value and max_value
> -	 * respectively to make sure our theoretical access will be
> -	 * safe.
> +	/* We may have adjusted the register to this map value, so we
> +	 * need to try adding each of min_value and max_value to off
> +	 * to make sure our theoretical access will be safe.
>  	 */
>  	if (log_level)
>  		print_verifier_state(state);
> -	env->varlen_map_value_access = true;
> +	/* If the offset is variable, we will need to be stricter in state
> +	 * pruning from now on.
> +	 */
> +	if (reg->align.mask)
> +		env->varlen_map_value_access = true;

I think this align.mask access is used in a few places.
Maybe it's worth adding a static inline helper with a clear name?
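For instance, a pair of small helpers along these lines would let the call sites say what they mean instead of testing align.mask directly. Both names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct tnum {
	uint64_t value;	/* known bits */
	uint64_t mask;	/* unknown bits */
};

/* Hypothetical helper: no bit is unknown, so the tnum is one value. */
static inline bool tnum_is_const(struct tnum a)
{
	return a.mask == 0;
}

/* Hypothetical helper: the offset held in this tnum can take more than
 * one value, which is what "if (reg->align.mask)" asks in the hunks. */
static inline bool tnum_is_variable(struct tnum a)
{
	return !tnum_is_const(a);
}
```

The check above would then read: if (tnum_is_variable(reg->align)) env->varlen_map_value_access = true;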

>  	switch (reg->type) {
>  	case PTR_TO_PACKET:
> +		/* special case, because of NET_IP_ALIGN */
>  		return check_pkt_ptr_alignment(reg, off, size, strict);
> -	case PTR_TO_MAP_VALUE_ADJ:
> -		return check_val_ptr_alignment(reg, size, strict);
> +	case PTR_TO_MAP_VALUE:
> +		pointer_desc = "value ";
> +		break;
> +	case PTR_TO_CTX:
> +		pointer_desc = "context ";
> +		break;
> +	case PTR_TO_STACK:
> +		pointer_desc = "stack ";
> +		break;

Thank you for making the errors more human-readable.

> +			char tn_buf[48];
> +
> +			tn_strn(tn_buf, sizeof(tn_buf), reg->align);
> +			verbose("variable ctx access align=%s off=%d size=%d",
> +				tn_buf, off, size);
> +			return -EACCES;
> +		}
> +		off += reg->align.value;

I think 'align' is an odd name for this field.
Maybe rename the off/align fields to:
s32 fixed_off;
struct tnum var_off;

>  
> -	} else if (reg->type == FRAME_PTR || reg->type == PTR_TO_STACK) {
> +	} else if (reg->type == PTR_TO_STACK) {
> +		/* stack accesses must be at a fixed offset, so that we can
> +		 * determine what type of data were returned.
> +		 */
> +		if (reg->align.mask) {
> +			char tn_buf[48];
> +
> +			tn_strn(tn_buf, sizeof(tn_buf), reg->align);
> +			verbose("variable stack access align=%s off=%d size=%d",
> +				tn_buf, off, size);
> +			return -EACCES;

Hmm, why this restriction?
I thought one of the key points of the diff was that the ptr+var tracking logic
will now apply not only to map_value, but to stack_ptr as well?

>  	}
>  
> -	if (!err && size <= 2 && value_regno >= 0 && env->allow_ptr_leaks &&
> -	    state->regs[value_regno].type == UNKNOWN_VALUE) {
> -		/* 1 or 2 byte load zero-extends, determine the number of
> -		 * zero upper bits. Not doing it fo 4 byte load, since
> -		 * such values cannot be added to ptr_to_packet anyway.
> -		 */
> -		state->regs[value_regno].imm = 64 - size * 8;
> +	if (!err && size < BPF_REG_SIZE && value_regno >= 0 && t == BPF_READ &&
> +	    state->regs[value_regno].type == SCALAR_VALUE) {
> +		/* b/h/w load zero-extends, mark upper bits as known 0 */
> +		state->regs[value_regno].align.value &= (1ULL << (size * 8)) - 1;
> +		state->regs[value_regno].align.mask &= (1ULL << (size * 8)) - 1;

Probably another helper in tnum.h is needed.

> +		/* sign bit is known zero, so we can bound the value */
> +		state->regs[value_regno].min_value = 0;
> +		state->regs[value_regno].max_value = min_t(u64,
> +					state->regs[value_regno].align.mask,
> +					BPF_REGISTER_MAX_RANGE);

min_t with mask? Should it be align.value?
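For reference when choosing between mask and value here: the smallest value consistent with a tnum is value (every unknown bit 0), and the largest is value | mask (every unknown bit 1). So mask alone is an upper bound only when no bit is known to be 1, and value alone only when no bit is unknown. Below is a minimal sketch of a truncation helper like the one suggested above, plus the bounds it implies. tnum_cast(), tnum_umin() and tnum_umax() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

struct tnum {
	uint64_t value;	/* known bits */
	uint64_t mask;	/* unknown bits */
};

/* Hypothetical helper: keep only the low `size` bytes, as a
 * zero-extending b/h/w load does, so the upper bits become known 0. */
static struct tnum tnum_cast(struct tnum a, uint8_t size)
{
	if (size >= 8)
		return a;	/* nothing to truncate; also avoids a shift by 64 */
	uint64_t keep = (1ULL << (size * 8)) - 1;
	return (struct tnum){ a.value & keep, a.mask & keep };
}

/* Smallest value the tnum allows: every unknown bit 0. */
static uint64_t tnum_umin(struct tnum a)
{
	return a.value;
}

/* Largest value the tnum allows: every unknown bit 1. */
static uint64_t tnum_umax(struct tnum a)
{
	return a.value | a.mask;
}
```

Take { 0x10, 0x0f } (bit 4 known 1, low nibble unknown). Its mask (0x0f) and its value (0x10) are both below the true maximum 0x1f.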

>  	}
>  	return err;
>  }
> @@ -1000,9 +1068,18 @@ static int check_xadd(struct bpf_verifier_env *env, struct bpf_insn *insn)
>  				BPF_SIZE(insn->code), BPF_WRITE, -1);
>  }
>  
> +/* Does this register contain a constant zero? */
> +static bool register_is_null(struct bpf_reg_state reg)
> +{
> +	return reg.type == SCALAR_VALUE && reg.align.mask == 0 &&
> +	       reg.align.value == 0;

Move align.mask == 0 && align.value == 0 into a helper in tnum.h?
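Something like the following, with a hypothetical tnum_equals_const() helper and a simplified register state, would turn register_is_null() into a single call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct tnum {
	uint64_t value;	/* known bits */
	uint64_t mask;	/* unknown bits */
};

/* Hypothetical helper: fully known and equal to b. */
static inline bool tnum_equals_const(struct tnum a, uint64_t b)
{
	return a.mask == 0 && a.value == b;
}

enum reg_type { NOT_INIT, SCALAR_VALUE, PTR_TO_STACK };

/* Simplified register state with only the fields used below. */
struct reg_state {
	enum reg_type type;
	struct tnum align;
};

/* Does this register contain a constant zero? */
static bool register_is_null(struct reg_state reg)
{
	return reg.type == SCALAR_VALUE && tnum_equals_const(reg.align, 0);
}
```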

> @@ -1024,7 +1101,15 @@ static int check_stack_boundary(struct bpf_verifier_env *env, int regno,
>  		return -EACCES;
>  	}
>  
> -	off = regs[regno].imm;
> +	/* Only allow fixed-offset stack reads */
> +	if (regs[regno].align.mask) {
> +		char tn_buf[48];
> +
> +		tn_strn(tn_buf, sizeof(tn_buf), regs[regno].align);
> +		verbose("invalid variable stack read R%d align=%s\n",
> +			regno, tn_buf);
> +	}

Same question as before: can it be relaxed?
Support for 'char arr[32];' with an access like arr[n] was requested several times,
and folks used map_value[n] as a workaround.
It seems that with this var stack logic it's just one step away, no?
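Here is a rough illustration of the bounds side of this question. It assumes a simplified stack pointer made of a fixed offset from the frame pointer plus an inclusive [var_min, var_max] range for the variable part. var_stack_access_ok() is a hypothetical name. Checking that every possible access stays inside the stack is simple. The harder part, per the comment in the quoted hunk, is knowing what type of data each possibly-touched stack slot holds, and this sketch does not attempt that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_BPF_STACK 512	/* BPF stack size in bytes */

/* Hypothetical description of a stack pointer with a variable part. */
struct stack_ptr {
	int32_t fixed_off;	/* known offset from the frame pointer */
	int64_t var_min;	/* inclusive bounds of the variable part */
	int64_t var_max;
};

/* True if a `size`-byte access at insn offset `off` stays within
 * [-MAX_BPF_STACK, 0) for every possible value of the variable part. */
static bool var_stack_access_ok(const struct stack_ptr *p, int32_t off,
				int32_t size)
{
	int64_t lo, end;

	if (size <= 0 || p->var_min > p->var_max)
		return false;
	/* Bound the variable part first so the sums below cannot overflow. */
	if (p->var_min < -MAX_BPF_STACK || p->var_max > MAX_BPF_STACK)
		return false;
	lo = (int64_t)p->fixed_off + off + p->var_min;		/* lowest first byte */
	end = (int64_t)p->fixed_off + off + p->var_max + size;	/* one past highest byte */
	return lo >= -MAX_BPF_STACK && end <= 0;
}
```

With arr at fp-32 and a 1-byte access, an index known to be in [0, 31] passes and [0, 32] is rejected.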

> -		if (src_reg->imm < 48) {
> -			verbose("cannot add integer value with %lld upper zero bits to ptr_to_packet\n",
> -				src_reg->imm);
> -			return -EACCES;
> -		}
> -
> -		had_id = (dst_reg->id != 0);
> -
> -		/* dst_reg stays as pkt_ptr type and since some positive
> -		 * integer value was added to the pointer, increment its 'id'
> -		 */
> -		dst_reg->id = ++env->id_gen;

great to see it's being generalized.

> +	if (ptr_reg->type == PTR_TO_MAP_VALUE_OR_NULL) {
> +		if (!env->allow_ptr_leaks) {
> +			verbose("R%d pointer arithmetic on PTR_TO_MAP_VALUE_OR_NULL prohibited, null-check it first\n",
> +				dst);
> +			return -EACCES;
> +		}

I guess the mark_map_reg() logic will cover the good cases, and
actual math on ptr_to_map_or_null will happen only in broken programs.
It just feels a bit fragile, since it probably depends on the order in which we
evaluate the branches? It's not an issue with this patch; we have
the same situation today. Just thinking out loud.

> +	/* Got here implies adding two SCALAR_VALUEs */
> +	if (WARN_ON_ONCE(ptr_reg)) {
> +		verbose("verifier internal error\n");
> +		return -EINVAL;
...
> +	if (WARN_ON(!src_reg)) {
> +		verbose("verifier internal error\n");
> +		return -EINVAL;
>  	}

I'm lost with these bits.
Can you add a comment explaining in what circumstances this can be hit
and what the consequences would be?

> +/* Returns true if (rold safe implies rcur safe) */
> +static bool regsafe(struct bpf_reg_state *rold,
> +		    struct bpf_reg_state *rcur,
> +		    bool varlen_map_access)
> +{
> +	if (memcmp(rold, rcur, sizeof(*rold)) == 0)
>  		return true;
> +	if (rold->type == NOT_INIT)
> +		/* explored state can't have used this */
>  		return true;
> +	if (rcur->type == NOT_INIT)
> +		return false;
> +	switch (rold->type) {
> +	case SCALAR_VALUE:
> +		if (rcur->type == SCALAR_VALUE) {
> +			/* new val must satisfy old val knowledge */
> +			return range_within(rold, rcur) &&
> +			       tn_in(rold->align, rcur->align);
> +		} else {
> +			/* if we knew anything about the old value, we're not
> +			 * equal, because we can't know anything about the
> +			 * scalar value of the pointer in the new value.
> +			 */
> +			return rold->min_value == BPF_REGISTER_MIN_RANGE &&
> +			       rold->max_value == BPF_REGISTER_MAX_RANGE &&
> +			       !~rold->align.mask;
> +		}
> +	case PTR_TO_MAP_VALUE:
> +		if (varlen_map_access) {
> +			/* If the new min/max/align satisfy the old ones and
> +			 * everything else matches, we are OK.
> +			 * We don't care about the 'id' value, because nothing
> +			 * uses it for PTR_TO_MAP_VALUE (only for ..._OR_NULL)
> +			 */
> +			return memcmp(rold, rcur, offsetof(struct bpf_reg_state, id)) == 0 &&
> +			       range_within(rold, rcur) &&
> +			       tn_in(rold->align, rcur->align);
> +		} else {
> +			/* If the ranges/align were not the same, but
> +			 * everything else was and we didn't do a variable
> +			 * access into a map then we are a-ok.
> +			 */
> +			return memcmp(rold, rcur, offsetof(struct bpf_reg_state, id)) == 0;
> +		}
> +	case PTR_TO_MAP_VALUE_OR_NULL:

Does this new state comparison logic help?
Do you have before/after numbers for the number of insns it had to process
for the tests in selftests?
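For readers following the SCALAR_VALUE branch of regsafe() above: the old (already explored) state covers the current one if every value the current register may hold was also possible in the old state. Below is a minimal sketch of the two subset tests it combines. The simplified fields mirror the patch's min_value, max_value and align. The bodies of tn_in() and range_within() are not in the quoted hunk, so these versions are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct tnum {
	uint64_t value;	/* known bits */
	uint64_t mask;	/* unknown bits */
};

/* Simplified scalar state with the bounds fields used by the patch. */
struct scalar {
	int64_t min_value;
	uint64_t max_value;
	struct tnum align;
};

/* Is every value described by b also described by a? (assumed body) */
static bool tnum_in(struct tnum a, struct tnum b)
{
	if (b.mask & ~a.mask)
		return false;	/* b leaves unknown a bit that a pins down */
	b.value &= ~a.mask;	/* ignore bits a does not care about */
	return a.value == b.value;
}

/* Does cur's [min, max] range lie inside old's? (assumed body) */
static bool range_within(const struct scalar *old, const struct scalar *cur)
{
	return old->min_value <= cur->min_value &&
	       old->max_value >= cur->max_value;
}

/* Old state safe implies current state safe, for two scalars. */
static bool scalar_regsafe(const struct scalar *old, const struct scalar *cur)
{
	return range_within(old, cur) && tnum_in(old->align, cur->align);
}
```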

* Re: [RFC PATCH net-next 3/5] bpf/verifier: feed pointer-to-unknown-scalar casts into scalar ALU path
@ 2017-06-08  2:35     ` Alexei Starovoitov via iovisor-dev
  0 siblings, 0 replies; 45+ messages in thread
From: Alexei Starovoitov @ 2017-06-08  2:35 UTC
  To: Edward Cree
  Cc: davem, Alexei Starovoitov, Daniel Borkmann, netdev, iovisor-dev, LKML

On Wed, Jun 07, 2017 at 03:58:50PM +0100, Edward Cree wrote:
> If pointer leaks are allowed, and adjust_ptr_min_max_vals returns -EACCES,
>  treat the pointer as an unknown scalar and try again, because we might be
>  able to conclude something about the result (e.g. pointer & 0x40 is either
>  0 or 0x40).
> 
> Signed-off-by: Edward Cree <ecree@solarflare.com>
> ---
>  kernel/bpf/verifier.c | 244 ++++++++++++++++++++++++++------------------------
>  1 file changed, 127 insertions(+), 117 deletions(-)
> 
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index dd06e4e..1ff5b5d 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -1566,6 +1566,8 @@ static void coerce_reg_to_32(struct bpf_reg_state *reg)
>  /* Handles arithmetic on a pointer and a scalar: computes new min/max and align.
>   * Caller must check_reg_overflow all argument regs beforehand.
>   * Caller should also handle BPF_MOV case separately.
> + * If we return -EACCES, caller may want to try again treating pointer as a
> + * scalar.  So we only emit a diagnostic if !env->allow_ptr_leaks.
>   */
>  static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
>  				   struct bpf_insn *insn,
> @@ -1588,43 +1590,29 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
>  
>  	if (BPF_CLASS(insn->code) != BPF_ALU64) {
>  		/* 32-bit ALU ops on pointers produce (meaningless) scalars */
> -		if (!env->allow_ptr_leaks) {
> +		if (!env->allow_ptr_leaks)
>  			verbose("R%d 32-bit pointer arithmetic prohibited\n",
>  				dst);
> -			return -EACCES;
> -		}
> -		__mark_reg_unknown(dst_reg);
> -		/* High bits are known zero */
> -		dst_reg->align.mask = (u32)-1;
> -		return 0;
> +		return -EACCES;
>  	}
>  
>  	if (ptr_reg->type == PTR_TO_MAP_VALUE_OR_NULL) {
> -		if (!env->allow_ptr_leaks) {
> +		if (!env->allow_ptr_leaks)
>  			verbose("R%d pointer arithmetic on PTR_TO_MAP_VALUE_OR_NULL prohibited, null-check it first\n",
>  				dst);
> -			return -EACCES;
> -		}
> -		__mark_reg_unknown(dst_reg);
> -		return 0;
> +		return -EACCES;
>  	}
>  	if (ptr_reg->type == CONST_PTR_TO_MAP) {
> -		if (!env->allow_ptr_leaks) {
> +		if (!env->allow_ptr_leaks)
>  			verbose("R%d pointer arithmetic on CONST_PTR_TO_MAP prohibited\n",
>  				dst);
> -			return -EACCES;
> -		}
> -		__mark_reg_unknown(dst_reg);
> -		return 0;
> +		return -EACCES;
>  	}
>  	if (ptr_reg->type == PTR_TO_PACKET_END) {
> -		if (!env->allow_ptr_leaks) {
> +		if (!env->allow_ptr_leaks)
>  			verbose("R%d pointer arithmetic on PTR_TO_PACKET_END prohibited\n",
>  				dst);
> -			return -EACCES;
> -		}
> -		__mark_reg_unknown(dst_reg);
> -		return 0;
> +		return -EACCES;
>  	}
>  
>  	/* In case of 'scalar += pointer', dst_reg inherits pointer type and id.
> @@ -1648,8 +1636,9 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
>  			break;
>  		}
>  		if (max_val == BPF_REGISTER_MAX_RANGE) {
> -			verbose("R%d tried to add unbounded value to pointer\n",
> -				dst);
> +			if (!env->allow_ptr_leaks)
> +				verbose("R%d tried to add unbounded value to pointer\n",
> +					dst);
>  			return -EACCES;
>  		}
>  		/* A new variable offset is created.  Note that off_reg->off
> @@ -1676,28 +1665,20 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
>  	case BPF_SUB:
>  		if (dst_reg == off_reg) {
>  			/* scalar -= pointer.  Creates an unknown scalar */
> -			if (!env->allow_ptr_leaks) {
> +			if (!env->allow_ptr_leaks)
>  				verbose("R%d tried to subtract pointer from scalar\n",
>  					dst);
> -				return -EACCES;
> -			}
> -			/* Make it an unknown scalar */
> -			__mark_reg_unknown(dst_reg);
> -			break;
> +			return -EACCES;
>  		}
>  		/* We don't allow subtraction from FP, because (according to
>  		 * test_verifier.c test "invalid fp arithmetic", JITs might not
>  		 * be able to deal with it.
>  		 */
>  		if (ptr_reg->type == PTR_TO_STACK) {
> -			if (!env->allow_ptr_leaks) {
> +			if (!env->allow_ptr_leaks)
>  				verbose("R%d subtraction from stack pointer prohibited\n",
>  					dst);
> -				return -EACCES;
> -			}
> -			/* Make it an unknown scalar */
> -			__mark_reg_unknown(dst_reg);
> -			break;
> +			return -EACCES;
>  		}
>  		if (known && (ptr_reg->off - min_val ==
>  			      (s64)(s32)(ptr_reg->off - min_val))) {
> @@ -1713,14 +1694,10 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
>  		 * This can happen if off_reg is an immediate.
>  		 */
>  		if ((s64)max_val < 0) {
> -			if (!env->allow_ptr_leaks) {
> +			if (!env->allow_ptr_leaks)
>  				verbose("R%d tried to subtract negative max_val %lld from pointer\n",
>  					dst, (s64)max_val);
> -				return -EACCES;
> -			}
> -			/* Make it an unknown scalar */
> -			__mark_reg_unknown(dst_reg);
> -			break;
> +			return -EACCES;
>  		}
>  		/* A new variable offset is created.  If the subtrahend is known
>  		 * nonnegative, then any reg->range we had before is still good.
> @@ -1747,99 +1724,37 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
>  		 * (However, in principle we could allow some cases, e.g.
>  		 * ptr &= ~3 which would reduce min_value by 3.)
>  		 */
> -		if (!env->allow_ptr_leaks) {
> +		if (!env->allow_ptr_leaks)
>  			verbose("R%d bitwise operator %s on pointer prohibited\n",
>  				dst, bpf_alu_string[opcode >> 4]);
> -			return -EACCES;
> -		}
> -		/* Make it an unknown scalar */
> -		__mark_reg_unknown(dst_reg);
> +		return -EACCES;
>  	default:
>  		/* other operators (e.g. MUL,LSH) produce non-pointer results */
> -		if (!env->allow_ptr_leaks) {
> +		if (!env->allow_ptr_leaks)
>  			verbose("R%d pointer arithmetic with %s operator prohibited\n",
>  				dst, bpf_alu_string[opcode >> 4]);
> -			return -EACCES;
> -		}
> -		/* Make it an unknown scalar */
> -		__mark_reg_unknown(dst_reg);
> +		return -EACCES;
>  	}
>  
>  	check_reg_overflow(dst_reg);
>  	return 0;
>  }
>  
> -/* Handles ALU ops other than BPF_END, BPF_NEG and BPF_MOV: computes new min/max
> - * and align.
> - * TODO: check this is legit for ALU32, particularly around negatives
> - */
> -static int adjust_reg_min_max_vals(struct bpf_verifier_env *env,
> -				   struct bpf_insn *insn)
> +static int adjust_scalar_min_max_vals(struct bpf_verifier_env *env,
> +				      struct bpf_insn *insn,
> +				      struct bpf_reg_state *dst_reg,
> +				      struct bpf_reg_state *src_reg)
>  {
> -	struct bpf_reg_state *regs = env->cur_state.regs, *dst_reg, *src_reg;
> -	struct bpf_reg_state *ptr_reg = NULL, off_reg = {0};
> +	struct bpf_reg_state *regs = env->cur_state.regs;
>  	s64 min_val = BPF_REGISTER_MIN_RANGE;
>  	u64 max_val = BPF_REGISTER_MAX_RANGE;
>  	u8 opcode = BPF_OP(insn->code);
>  	bool src_known, dst_known;
>  
> -	dst_reg = &regs[insn->dst_reg];
> -	check_reg_overflow(dst_reg);
> -	src_reg = NULL;
> -	if (dst_reg->type != SCALAR_VALUE)
> -		ptr_reg = dst_reg;
> -	if (BPF_SRC(insn->code) == BPF_X) {
> -		src_reg = &regs[insn->src_reg];
> -		check_reg_overflow(src_reg);
> -
> -		if (src_reg->type != SCALAR_VALUE) {
> -			if (dst_reg->type != SCALAR_VALUE) {
> -				/* Combining two pointers by any ALU op yields
> -				 * an arbitrary scalar.
> -				 */
> -				if (!env->allow_ptr_leaks) {
> -					verbose("R%d pointer %s pointer prohibited\n",
> -						insn->dst_reg,
> -						bpf_alu_string[opcode >> 4]);
> -					return -EACCES;
> -				}
> -				mark_reg_unknown(regs, insn->dst_reg);
> -				return 0;
> -			} else {
> -				/* scalar += pointer
> -				 * This is legal, but we have to reverse our
> -				 * src/dest handling in computing the range
> -				 */
> -				return adjust_ptr_min_max_vals(env, insn,
> -							       src_reg, dst_reg);
> -			}
> -		} else if (ptr_reg) {
> -			/* pointer += scalar */
> -			return adjust_ptr_min_max_vals(env, insn,
> -						       dst_reg, src_reg);
> -		}
> -	} else {
> -		/* Pretend the src is a reg with a known value, since we only
> -		 * need to be able to read from this state.
> -		 */
> -		off_reg.type = SCALAR_VALUE;
> -		off_reg.align = tn_const(insn->imm);
> -		off_reg.min_value = insn->imm;
> -		off_reg.max_value = insn->imm;
> -		src_reg = &off_reg;
> -		if (ptr_reg) /* pointer += K */
> -			return adjust_ptr_min_max_vals(env, insn,
> -						       ptr_reg, src_reg);
> -	}
> -
> -	/* Got here implies adding two SCALAR_VALUEs */
> -	if (WARN_ON_ONCE(ptr_reg)) {
> -		verbose("verifier internal error\n");
> -		return -EINVAL;
> -	}
> -	if (WARN_ON(!src_reg)) {
> -		verbose("verifier internal error\n");
> -		return -EINVAL;

Such a large back-and-forth move doesn't help reviewing.
Maybe just merge it into the previous patch?
Or put that function in the right place already in patch 2?
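Below is a rough userspace sketch of the commit message's example (pointer & 0x40 is either 0 or 0x40) and of the retry described in the new comment on adjust_ptr_min_max_vals(). It assumes a simplified register model. adjust_ptr() and alu_and_imm() are hypothetical stand-ins. tnum_and() is one standard way to AND two tnums, not necessarily the patch's.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

struct tnum {
	uint64_t value;	/* known bits */
	uint64_t mask;	/* unknown bits */
};

/* AND of two tnums: a bit is known 1 only if known 1 in both inputs, and
 * unknown if it might be 1 in both inputs but is not known 1. */
static struct tnum tnum_and(struct tnum a, struct tnum b)
{
	uint64_t alpha = a.value | a.mask;	/* bits that may be 1 in a */
	uint64_t beta = b.value | b.mask;	/* bits that may be 1 in b */
	uint64_t v = a.value & b.value;		/* bits known 1 in both */

	return (struct tnum){ v, alpha & beta & ~v };
}

enum kind { SCALAR, POINTER };

struct reg {
	enum kind kind;
	struct tnum val;	/* meaningful for SCALAR only */
};

/* Hypothetical stand-in for adjust_ptr_min_max_vals(): it refuses
 * bitwise ops on pointers with -EACCES, meaning "retry as a scalar". */
static int adjust_ptr(struct reg *dst, uint64_t imm)
{
	(void)dst;
	(void)imm;
	return -EACCES;
}

/* dst &= imm, with the fallback described in the patch. */
static int alu_and_imm(struct reg *dst, uint64_t imm, bool allow_ptr_leaks)
{
	if (dst->kind == POINTER) {
		int err = adjust_ptr(dst, imm);

		if (err != -EACCES || !allow_ptr_leaks)
			return err;
		/* Forget the pointer and retry on a fully unknown scalar. */
		dst->kind = SCALAR;
		dst->val = (struct tnum){ 0, UINT64_MAX };
	}
	dst->val = tnum_and(dst->val, (struct tnum){ imm, 0 });
	return 0;
}
```

With leaks disallowed the error propagates. With leaks allowed the result is the tnum {value 0, mask 0x40}, i.e. either 0 or 0x40.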

* Re: [RFC PATCH net-next 4/5] bpf/verifier: track signed and unsigned min/max values
@ 2017-06-08  2:40     ` Alexei Starovoitov via iovisor-dev
  0 siblings, 0 replies; 45+ messages in thread
From: Alexei Starovoitov @ 2017-06-08  2:40 UTC
  To: Edward Cree
  Cc: davem, Alexei Starovoitov, Daniel Borkmann, netdev, iovisor-dev, LKML

On Wed, Jun 07, 2017 at 03:59:25PM +0100, Edward Cree wrote:
> Allows us to, sometimes, combine information from a signed check of one
>  bound and an unsigned check of the other.
> We now track the full range of possible values, rather than restricting
>  ourselves to [0, 1<<30) and considering anything beyond that as
>  unknown.  While this is probably not necessary, it makes the code more
>  straightforward and symmetrical between signed and unsigned bounds.
> 
> Signed-off-by: Edward Cree <ecree@solarflare.com>
> ---
>  include/linux/bpf_verifier.h |  22 +-
>  kernel/bpf/verifier.c        | 661 +++++++++++++++++++++++++------------------
>  2 files changed, 395 insertions(+), 288 deletions(-)
> 
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index e341469..10a5944 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -11,11 +11,15 @@
>  #include <linux/filter.h> /* for MAX_BPF_STACK */
>  #include <linux/tnum.h>
>  
> - /* Just some arbitrary values so we can safely do math without overflowing and
> -  * are obviously wrong for any sort of memory access.
> -  */
> -#define BPF_REGISTER_MAX_RANGE (1024 * 1024 * 1024)
> -#define BPF_REGISTER_MIN_RANGE -1
> +/* Maximum variable offset umax_value permitted when resolving memory accesses.
> + * In practice this is far bigger than any realistic pointer offset; this limit
> + * ensures that umax_value + (int)off + (int)size cannot overflow a u64.
> + */
> +#define BPF_MAX_VAR_OFF	(1ULL << 31)
> +/* Maximum variable size permitted for ARG_CONST_SIZE[_OR_ZERO].  This ensures
> + * that converting umax_value to int cannot overflow.
> + */
> +#define BPF_MAX_VAR_SIZ	INT_MAX
>  
>  struct bpf_reg_state {
>  	enum bpf_reg_type type;
> @@ -38,7 +42,7 @@ struct bpf_reg_state {
>  	 * PTR_TO_MAP_VALUE_OR_NULL, we have to NULL-check it _first_.
>  	 */
>  	u32 id;
> -	/* These three fields must be last.  See states_equal() */
> +	/* These five fields must be last.  See states_equal() */
>  	/* For scalar types (SCALAR_VALUE), this represents our knowledge of
>  	 * the actual value.
>  	 * For pointer types, this represents the variable part of the offset
> @@ -51,8 +55,10 @@ struct bpf_reg_state {
>  	 * These refer to the same value as align, not necessarily the actual
>  	 * contents of the register.
>  	 */
> -	s64 min_value; /* minimum possible (s64)value */
> -	u64 max_value; /* maximum possible (u64)value */
> +	s64 smin_value; /* minimum possible (s64)value */
> +	s64 smax_value; /* maximum possible (s64)value */
> +	u64 umin_value; /* minimum possible (u64)value */
> +	u64 umax_value; /* maximum possible (u64)value */

I have an uneasy feeling about this one.
It's 16 extra bytes to be stored in every reg_state and memcmp'd later,
while we haven't had cases where people wanted negative values
in ptr+var cases. Why bother then?

>  unknown.  While this is probably not necessary, it makes the code more
>  straightforward and symmetrical between signed and unsigned bounds.

It's hard for me to see the 'straightforward' part yet.
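Below is a minimal sketch of the kind of combination the commit message describes. It assumes a simplified record with the four proposed fields. sync_bounds() is a hypothetical name and applies only one deduction rule: when the signed range lies entirely on one side of zero, signed and unsigned order agree, so the two ranges can be intersected. For example, a signed check x s< 0 gives only an upper bound, and an unsigned check x u>= 0xffffffffffffff00 gives only a lower bound. Together they pin x to [-256, -1].

```c
#include <assert.h>
#include <stdint.h>

/* Simplified bounds record mirroring the four fields the patch adds. */
struct bounds {
	int64_t smin, smax;	/* signed range */
	uint64_t umin, umax;	/* unsigned range */
};

static uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }
static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }

/* Hypothetical helper applying one rule: if the signed range does not
 * straddle zero, the sign bit is known.  Within either half of the number
 * line, signed and unsigned order agree, so both ranges can be replaced
 * by their intersection.  (lo > hi would mean the branch is infeasible;
 * not handled here.)  The u64 -> s64 casts assume two's complement. */
static void sync_bounds(struct bounds *b)
{
	if (b->smin >= 0 || b->smax < 0) {
		uint64_t lo = max_u64((uint64_t)b->smin, b->umin);
		uint64_t hi = min_u64((uint64_t)b->smax, b->umax);

		b->umin = lo;
		b->umax = hi;
		b->smin = (int64_t)lo;
		b->smax = (int64_t)hi;
	}
}
```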

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH net-next 4/5] bpf/verifier: track signed and unsigned min/max values
@ 2017-06-08  2:40     ` Alexei Starovoitov via iovisor-dev
  0 siblings, 0 replies; 45+ messages in thread
From: Alexei Starovoitov via iovisor-dev @ 2017-06-08  2:40 UTC (permalink / raw)
  To: Edward Cree
  Cc: Alexei Starovoitov, netdev-u79uwXL29TY76Z2rM5mHXA, iovisor-dev,
	LKML, davem-fT/PcQaiUtIeIZ0/mPfg9Q

On Wed, Jun 07, 2017 at 03:59:25PM +0100, Edward Cree wrote:
> Allows us to, sometimes, combine information from a signed check of one
>  bound and an unsigned check of the other.
> We now track the full range of possible values, rather than restricting
>  ourselves to [0, 1<<30) and considering anything beyond that as
>  unknown.  While this is probably not necessary, it makes the code more
>  straightforward and symmetrical between signed and unsigned bounds.
> 
> Signed-off-by: Edward Cree <ecree-s/n/eUQHGBpZroRs9YW3xA@public.gmane.org>
> ---
>  include/linux/bpf_verifier.h |  22 +-
>  kernel/bpf/verifier.c        | 661 +++++++++++++++++++++++++------------------
>  2 files changed, 395 insertions(+), 288 deletions(-)
> 
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index e341469..10a5944 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -11,11 +11,15 @@
>  #include <linux/filter.h> /* for MAX_BPF_STACK */
>  #include <linux/tnum.h>
>  
> - /* Just some arbitrary values so we can safely do math without overflowing and
> -  * are obviously wrong for any sort of memory access.
> -  */
> -#define BPF_REGISTER_MAX_RANGE (1024 * 1024 * 1024)
> -#define BPF_REGISTER_MIN_RANGE -1
> +/* Maximum variable offset umax_value permitted when resolving memory accesses.
> + * In practice this is far bigger than any realistic pointer offset; this limit
> + * ensures that umax_value + (int)off + (int)size cannot overflow a u64.
> + */
> +#define BPF_MAX_VAR_OFF	(1ULL << 31)
> +/* Maximum variable size permitted for ARG_CONST_SIZE[_OR_ZERO].  This ensures
> + * that converting umax_value to int cannot overflow.
> + */
> +#define BPF_MAX_VAR_SIZ	INT_MAX
>  
>  struct bpf_reg_state {
>  	enum bpf_reg_type type;
> @@ -38,7 +42,7 @@ struct bpf_reg_state {
>  	 * PTR_TO_MAP_VALUE_OR_NULL, we have to NULL-check it _first_.
>  	 */
>  	u32 id;
> -	/* These three fields must be last.  See states_equal() */
> +	/* These five fields must be last.  See states_equal() */
>  	/* For scalar types (SCALAR_VALUE), this represents our knowledge of
>  	 * the actual value.
>  	 * For pointer types, this represents the variable part of the offset
> @@ -51,8 +55,10 @@ struct bpf_reg_state {
>  	 * These refer to the same value as align, not necessarily the actual
>  	 * contents of the register.
>  	 */
> -	s64 min_value; /* minimum possible (s64)value */
> -	u64 max_value; /* maximum possible (u64)value */
> +	s64 smin_value; /* minimum possible (s64)value */
> +	s64 smax_value; /* maximum possible (s64)value */
> +	u64 umin_value; /* minimum possible (u64)value */
> +	u64 umax_value; /* maximum possible (u64)value */

I have an uneasy feeling about this one.
It's 16 extra bytes to be stored in every reg_state and memcmp'd later,
while we didn't have cases where people wanted negative values
in ptr+var cases. Why bother, then?

>  unknown.  While this is probably not necessary, it makes the code more
>  straightforward and symmetrical between signed and unsigned bounds.

it's hard for me to see the 'straightforward' part yet.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH net-next 5/5] selftests/bpf: change test_verifier expectations
@ 2017-06-08  2:43     ` Alexei Starovoitov via iovisor-dev
  0 siblings, 0 replies; 45+ messages in thread
From: Alexei Starovoitov @ 2017-06-08  2:43 UTC (permalink / raw)
  To: Edward Cree
  Cc: davem, Alexei Starovoitov, Daniel Borkmann, netdev, iovisor-dev, LKML

On Wed, Jun 07, 2017 at 04:00:02PM +0100, Edward Cree wrote:
> Some of the verifier's error messages have changed, and some constructs
>  that previously couldn't be verified are now accepted.
> 
> Signed-off-by: Edward Cree <ecree@solarflare.com>
> ---
>  tools/testing/selftests/bpf/test_verifier.c | 226 ++++++++++++++--------------
>  1 file changed, 116 insertions(+), 110 deletions(-)

IMO this rewrite needs more than one additional test.
I counted at least 2 new verifier features (e.g. negative values and ptr & 0x40).
All the new logic needs to be covered by tests.

* Re: [RFC PATCH net-next 2/5] bpf/verifier: rework value tracking
  2017-06-08  2:32     ` Alexei Starovoitov via iovisor-dev
@ 2017-06-08 14:53       ` Edward Cree via iovisor-dev
  -1 siblings, 0 replies; 45+ messages in thread
From: Edward Cree @ 2017-06-08 14:53 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: davem, Alexei Starovoitov, Daniel Borkmann, netdev, iovisor-dev, LKML

On 08/06/17 03:32, Alexei Starovoitov wrote:
> On Wed, Jun 07, 2017 at 03:58:31PM +0100, Edward Cree wrote:
>> +/* Arithmetic and logical ops */
>> +/* Shift a tnum left (by a fixed shift) */
>> +struct tnum tn_sl(struct tnum a, u8 shift);
>> +/* Shift a tnum right (by a fixed shift) */
>> +struct tnum tn_sr(struct tnum a, u8 shift);
> I think in a few months we will forget what these abbreviations mean.
> Can you change them to tnum_rshift, tnum_lshift, tnum_add?
Sure, will do.
>> +/* half-multiply add: acc += (unknown * mask * value) */
>> +static struct tnum hma(struct tnum acc, u64 value, u64 mask)
> hma? Is it a standard abbreviation?
No, just a weird operation that appears in my multiply algorithm.  Since
 it's static I didn't worry too much about naming it well.
(The abbreviation was inspired by floating point 'fma', fused multiply-add.)
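To make the half-multiply-add idea concrete, here is a minimal C sketch of a tnum multiply built from such a step. It assumes a tnum is a (value, mask) pair in which set mask bits are unknown and value holds the known bits; `struct tnum`, `tnum_make()`, `tnum_add()`, `hma()`, `tnum_mul()` and `tnum_contains()` are illustrative userspace names, not necessarily the code in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* A tracked number: bits set in mask are unknown; the other bits are
 * known to equal the corresponding bits of value. */
struct tnum {
	uint64_t value;
	uint64_t mask;
};

static struct tnum tnum_make(uint64_t value, uint64_t mask)
{
	struct tnum t = { value & ~mask, mask };
	return t;
}

/* Sound addition: any bit that an unknown input bit or an unknown
 * carry could influence becomes unknown. */
static struct tnum tnum_add(struct tnum a, struct tnum b)
{
	uint64_t sm = a.mask + b.mask;
	uint64_t sv = a.value + b.value;
	uint64_t sigma = sm + sv;
	uint64_t chi = sigma ^ sv;	/* bits whose carry may differ */
	uint64_t mu = chi | a.mask | b.mask;

	return tnum_make(sv & ~mu, mu);
}

/* Half-multiply-add: for every bit i set in 'sel', add a partial product
 * whose set bits are some unknown subset of (value << i). */
static struct tnum hma(struct tnum acc, uint64_t value, uint64_t sel)
{
	while (sel) {
		if (sel & 1)
			acc = tnum_add(acc, tnum_make(0, value));
		sel >>= 1;
		value <<= 1;
	}
	return acc;
}

/* a * b = a.value * b.value
 *       + (unknown part of a) * b
 *       + a.value * (unknown part of b)      (all mod 2^64) */
static struct tnum tnum_mul(struct tnum a, struct tnum b)
{
	struct tnum acc = tnum_make(a.value * b.value, 0);

	acc = hma(acc, a.mask, b.value | b.mask);
	return hma(acc, b.mask, a.value);
}

/* Is the concrete number x one of the values t describes? */
static int tnum_contains(struct tnum t, uint64_t x)
{
	return (x & ~t.mask) == t.value;
}
```

For example, multiplying {2, 3} (value 2, mask 1) by the constant 5 yields a tnum that admits both 10 and 15. The result also admits a few other values, which is the usual price of a sound, bit-level over-approximation.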
>> -static void init_reg_state(struct bpf_reg_state *regs)
>> +static void mark_reg_known_zero(struct bpf_reg_state *regs, u32 regno)
>>  {
>> -	int i;
>> -
>> -	for (i = 0; i < MAX_BPF_REG; i++)
>> -		mark_reg_not_init(regs, i);
>> -
>> -	/* frame pointer */
>> -	regs[BPF_REG_FP].type = FRAME_PTR;
>> -
>> -	/* 1st arg to a function */
>> -	regs[BPF_REG_1].type = PTR_TO_CTX;
>> +	BUG_ON(regno >= MAX_BPF_REG);
>> +	__mark_reg_known_zero(regs + regno);
> I know we have BUG_ONs in the code and they were never hit,
> but since you're rewriting it please change it to WARN_ON and
> set all regs to NOT_INIT in such a case.
> This way if we really have a bug, it hopefully won't crash.
Sure, will do.
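A userspace sketch of the suggested WARN_ON-and-poison pattern, assuming eleven registers (R0-R10) and a hypothetical `warn_on()` stand-in for the kernel macro. An out-of-range register number is reported and every register is reset to NOT_INIT instead of the machine crashing.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_REG 11 /* assumed: R0-R10 */

enum reg_type { NOT_INIT, SCALAR_VALUE };

struct reg_state {
	enum reg_type type;
	uint64_t value, mask; /* tnum: known bits and unknown bits */
};

/* Userspace stand-in for WARN_ON(): report the condition, don't crash. */
static int warn_on(int cond, const char *what)
{
	if (cond)
		fprintf(stderr, "WARNING: %s\n", what);
	return cond;
}

static void mark_reg_known_zero(struct reg_state *regs, uint32_t regno)
{
	if (warn_on(regno >= MAX_REG, "mark_reg_known_zero: bad regno")) {
		/* Verifier bug: poison all registers, so a verifier that
		 * rejects reads of NOT_INIT registers fails the program
		 * rather than trusting bogus state. */
		for (uint32_t i = 0; i < MAX_REG; i++)
			regs[i].type = NOT_INIT;
		return;
	}
	regs[regno].type = SCALAR_VALUE;
	regs[regno].value = 0;
	regs[regno].mask = 0; /* every bit known: exactly zero */
}
```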
>> -/* check read/write into an adjusted map element */
>> -static int check_map_access_adj(struct bpf_verifier_env *env, u32 regno,
>> +/* check read/write into a map element with possible variable offset */
>> +static int check_map_access(struct bpf_verifier_env *env, u32 regno,
>>  				int off, int size)
>>  {
>>  	struct bpf_verifier_state *state = &env->cur_state;
>>  	struct bpf_reg_state *reg = &state->regs[regno];
>>  	int err;
>>  
>> -	/* We adjusted the register to this map value, so we
>> -	 * need to change off and size to min_value and max_value
>> -	 * respectively to make sure our theoretical access will be
>> -	 * safe.
>> +	/* We may have adjusted the register to this map value, so we
>> +	 * need to try adding each of min_value and max_value to off
>> +	 * to make sure our theoretical access will be safe.
>>  	 */
>>  	if (log_level)
>>  		print_verifier_state(state);
>> -	env->varlen_map_value_access = true;
>> +	/* If the offset is variable, we will need to be stricter in state
>> +	 * pruning from now on.
>> +	 */
>> +	if (reg->align.mask)
>> +		env->varlen_map_value_access = true;
> I think this align.mask access is used in a few places.
> Maybe it's worth adding a static inline helper with a clear name?
Sure, seems reasonable.
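One possible shape for such helpers, assuming the (value, mask) tnum layout used in this series. `tnum_is_const()` and `tnum_equals_const()` are illustrative names. The first would replace the open-coded `reg->align.mask` tests; the second could express the zero check in `register_is_null()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct tnum {
	uint64_t value; /* known bits */
	uint64_t mask;  /* unknown bits */
};

/* True if every bit is known, i.e. the tnum describes exactly one number. */
static inline bool tnum_is_const(struct tnum t)
{
	return t.mask == 0;
}

/* True if the tnum is known to hold exactly v. */
static inline bool tnum_equals_const(struct tnum t, uint64_t v)
{
	return tnum_is_const(t) && t.value == v;
}
```

With these, the map-access check would read `if (!tnum_is_const(reg->align))` and the null test `reg.type == SCALAR_VALUE && tnum_equals_const(reg.align, 0)`.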
>> +			char tn_buf[48];
>> +
>> +			tn_strn(tn_buf, sizeof(tn_buf), reg->align);
>> +			verbose("variable ctx access align=%s off=%d size=%d",
>> +				tn_buf, off, size);
>> +			return -EACCES;
>> +		}
>> +		off += reg->align.value;
> I think 'align' is an odd name for this field.
> Maybe rename the off/align fields to
> s32 fixed_off;
> struct tnum var_off;
Yeah, it got that name for 'historical' reasons, i.e. this patch series
 started out as just a rewrite of the alignment tracking, then grew...
I'll do the rename in the next version.
>>  
>> -	} else if (reg->type == FRAME_PTR || reg->type == PTR_TO_STACK) {
>> +	} else if (reg->type == PTR_TO_STACK) {
>> +		/* stack accesses must be at a fixed offset, so that we can
>> +		 * determine what type of data were returned.
>> +		 */
>> +		if (reg->align.mask) {
>> +			char tn_buf[48];
>> +
>> +			tn_strn(tn_buf, sizeof(tn_buf), reg->align);
>> +			verbose("variable stack access align=%s off=%d size=%d",
>> +				tn_buf, off, size);
>> +			return -EACCES;
> Hmm, why this restriction?
> I thought one of the key points of the diff was that the ptr+var tracking logic
> would now apply not only to map_value, but to stack_ptr as well?
As the comment above it says, we need to determine what was returned:
 was it STACK_MISC or STACK_SPILL, and if the latter, what kind of pointer
 was spilled there?  See check_stack_read(), which I should probably
 mention in the comment.
>>  	}
>>  
>> -	if (!err && size <= 2 && value_regno >= 0 && env->allow_ptr_leaks &&
>> -	    state->regs[value_regno].type == UNKNOWN_VALUE) {
>> -		/* 1 or 2 byte load zero-extends, determine the number of
>> -		 * zero upper bits. Not doing it fo 4 byte load, since
>> -		 * such values cannot be added to ptr_to_packet anyway.
>> -		 */
>> -		state->regs[value_regno].imm = 64 - size * 8;
>> +	if (!err && size < BPF_REG_SIZE && value_regno >= 0 && t == BPF_READ &&
>> +	    state->regs[value_regno].type == SCALAR_VALUE) {
>> +		/* b/h/w load zero-extends, mark upper bits as known 0 */
>> +		state->regs[value_regno].align.value &= (1ULL << (size * 8)) - 1;
>> +		state->regs[value_regno].align.mask &= (1ULL << (size * 8)) - 1;
> Probably another helper in tnum.h is needed.
I could rewrite as
 reg->align = tn_and(reg->align, tn_const((1ULL << (size * 8)) - 1))
 or do you mean a helper that takes 'size' as an argument?
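A helper that takes the load size could look like the following sketch; the name `tnum_zext()` is hypothetical. Clearing the same high bits in both value and mask is equivalent to ANDing with the constant tnum `(1ULL << (size * 8)) - 1`, and the size >= 8 guard avoids an undefined 64-bit shift.

```c
#include <assert.h>
#include <stdint.h>

struct tnum {
	uint64_t value; /* known bits */
	uint64_t mask;  /* unknown bits */
};

/* Model a zero-extending load of 'size' bytes (1, 2 or 4): every bit
 * above the low size*8 bits becomes known zero.  For size >= 8 nothing
 * is truncated. */
static struct tnum tnum_zext(struct tnum t, unsigned int size)
{
	uint64_t keep;

	if (size >= 8)
		return t;
	keep = (1ULL << (size * 8)) - 1;
	t.value &= keep;
	t.mask &= keep;
	return t;
}
```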
>> +		/* sign bit is known zero, so we can bound the value */
>> +		state->regs[value_regno].min_value = 0;
>> +		state->regs[value_regno].max_value = min_t(u64,
>> +					state->regs[value_regno].align.mask,
>> +					BPF_REGISTER_MAX_RANGE);
> min_t with mask? Should it be align.value?
Hmm, I think actually it should be (mask | value), because this is the
 max (we're taking the min of two maxes to see which is tighter).
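The reasoning can be checked with a small example. A tnum with value 0x10 and mask 0x01 holds either 0x10 or 0x11, so neither mask (0x01) nor value (0x10) is a valid upper bound, while value | mask (0x11) is. The sketch below uses hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

struct tnum {
	uint64_t value; /* known bits */
	uint64_t mask;  /* unknown bits */
};

/* Smallest number the tnum can hold: all unknown bits zero. */
static uint64_t tnum_umin(struct tnum t)
{
	return t.value;
}

/* Largest number the tnum can hold: all unknown bits one. */
static uint64_t tnum_umax(struct tnum t)
{
	return t.value | t.mask;
}

static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/* Take the tighter of an existing upper bound and the tnum's own max. */
static uint64_t tighter_umax(uint64_t bound, struct tnum t)
{
	return min_u64(bound, tnum_umax(t));
}
```

Here `tighter_umax(1ULL << 30, t)` mirrors the `min_t()` against BPF_REGISTER_MAX_RANGE (1024 * 1024 * 1024) in the hunk above.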
>>  	}
>>  	return err;
>>  }
>> @@ -1000,9 +1068,18 @@ static int check_xadd(struct bpf_verifier_env *env, struct bpf_insn *insn)
>>  				BPF_SIZE(insn->code), BPF_WRITE, -1);
>>  }
>>  
>> +/* Does this register contain a constant zero? */
>> +static bool register_is_null(struct bpf_reg_state reg)
>> +{
>> +	return reg.type == SCALAR_VALUE && reg.align.mask == 0 &&
>> +	       reg.align.value == 0;
> Move align.mask == 0 && align.value == 0 into a helper in tnum.h?
Could do, but it seems unnecessary; I don't think anything but this
 function would use it.
>> +	/* Got here implies adding two SCALAR_VALUEs */
>> +	if (WARN_ON_ONCE(ptr_reg)) {
>> +		verbose("verifier internal error\n");
>> +		return -EINVAL;
> ...
>> +	if (WARN_ON(!src_reg)) {
>> +		verbose("verifier internal error\n");
>> +		return -EINVAL;
>>  	}
> I'm lost with these bits.
> Can you add a comment explaining in what circumstances these can be hit
> and what the consequences would be?
It should be impossible to hit either of these cases.  If we let the
 first through, we'd probably do invalid pointer arithmetic (e.g. we
 could multiply a pointer by two and think we'd just multiplied the
 variable offset).  As for the latter, we dereference that pointer,
 so if it were NULL we would promptly oops.
>> +/* Returns true if (rold safe implies rcur safe) */
>> +static bool regsafe(struct bpf_reg_state *rold,
>> +		    struct bpf_reg_state *rcur,
>> +		    bool varlen_map_access)
>> +{
>> +	if (memcmp(rold, rcur, sizeof(*rold)) == 0)
>>  		return true;
>> +	if (rold->type == NOT_INIT)
>> +		/* explored state can't have used this */
>>  		return true;
>> +	if (rcur->type == NOT_INIT)
>> +		return false;
>> +	switch (rold->type) {
>> +	case SCALAR_VALUE:
>> +		if (rcur->type == SCALAR_VALUE) {
>> +			/* new val must satisfy old val knowledge */
>> +			return range_within(rold, rcur) &&
>> +			       tn_in(rold->align, rcur->align);
>> +		} else {
>> +			/* if we knew anything about the old value, we're not
>> +			 * equal, because we can't know anything about the
>> +			 * scalar value of the pointer in the new value.
>> +			 */
>> +			return rold->min_value == BPF_REGISTER_MIN_RANGE &&
>> +			       rold->max_value == BPF_REGISTER_MAX_RANGE &&
>> +			       !~rold->align.mask;
>> +		}
>> +	case PTR_TO_MAP_VALUE:
>> +		if (varlen_map_access) {
>> +			/* If the new min/max/align satisfy the old ones and
>> +			 * everything else matches, we are OK.
>> +			 * We don't care about the 'id' value, because nothing
>> +			 * uses it for PTR_TO_MAP_VALUE (only for ..._OR_NULL)
>> +			 */
>> +			return memcmp(rold, rcur, offsetof(struct bpf_reg_state, id)) == 0 &&
>> +			       range_within(rold, rcur) &&
>> +			       tn_in(rold->align, rcur->align);
>> +		} else {
>> +			/* If the ranges/align were not the same, but
>> +			 * everything else was and we didn't do a variable
>> +			 * access into a map then we are a-ok.
>> +			 */
>> +			return memcmp(rold, rcur, offsetof(struct bpf_reg_state, id)) == 0;
>> +		}
>> +	case PTR_TO_MAP_VALUE_OR_NULL:
> Does this new state comparison logic help? Do you have before/after numbers for the number of insns it had to process for the tests in selftests?
I don't have the numbers, no (I'll try to collect them).  This rewrite was
 more because the data structures had changed, so the old code needed
 changing to match.  It's mainly just a refactor and reimplementation of the existing
 logic, I think, extended to cover the new 'align' member as well.
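For scalars, the pruning test reduces to two subset checks: the current range must lie within the old one, and the current tnum must describe a subset of the old tnum. The following is a minimal sketch, assuming a simplified register with only an unsigned range and a tnum; `struct scalar`, `range_within()`, `tnum_in()` and `scalar_safe()` are illustrative and not the patch's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct tnum {
	uint64_t value; /* known bits */
	uint64_t mask;  /* unknown bits */
};

struct scalar {
	uint64_t umin, umax; /* unsigned range */
	struct tnum var;     /* bit-level knowledge */
};

/* Is every number allowed by b also allowed by a? */
static bool tnum_in(struct tnum a, struct tnum b)
{
	if (b.mask & ~a.mask)	/* b has an unknown bit that a claims to know */
		return false;
	b.value &= ~a.mask;	/* ignore bits a does not constrain */
	return a.value == b.value;
}

static bool range_within(const struct scalar *old, const struct scalar *cur)
{
	return old->umin <= cur->umin && cur->umax <= old->umax;
}

/* If 'old' was verified safe, 'cur' is safe when it is at least as precise. */
static bool scalar_safe(const struct scalar *old, const struct scalar *cur)
{
	return range_within(old, cur) && tnum_in(old->var, cur->var);
}
```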

-Ed

* Re: [RFC PATCH net-next 4/5] bpf/verifier: track signed and unsigned min/max values
  2017-06-08  2:40     ` Alexei Starovoitov via iovisor-dev
@ 2017-06-08 15:23       ` Edward Cree via iovisor-dev
  -1 siblings, 0 replies; 45+ messages in thread
From: Edward Cree @ 2017-06-08 15:23 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: davem, Alexei Starovoitov, Daniel Borkmann, netdev, iovisor-dev, LKML

On 08/06/17 03:40, Alexei Starovoitov wrote:
> On Wed, Jun 07, 2017 at 03:59:25PM +0100, Edward Cree wrote:
>> Allows us to, sometimes, combine information from a signed check of one
>>  bound and an unsigned check of the other.
>> We now track the full range of possible values, rather than restricting
>>  ourselves to [0, 1<<30) and considering anything beyond that as
>>  unknown.  While this is probably not necessary, it makes the code more
>>  straightforward and symmetrical between signed and unsigned bounds.
>>
>> Signed-off-by: Edward Cree <ecree@solarflare.com>
>> ---
>>  include/linux/bpf_verifier.h |  22 +-
>>  kernel/bpf/verifier.c        | 661 +++++++++++++++++++++++++------------------
>>  2 files changed, 395 insertions(+), 288 deletions(-)
>>
>> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
>> index e341469..10a5944 100644
>> --- a/include/linux/bpf_verifier.h
>> +++ b/include/linux/bpf_verifier.h
>> @@ -11,11 +11,15 @@
>>  #include <linux/filter.h> /* for MAX_BPF_STACK */
>>  #include <linux/tnum.h>
>>  
>> - /* Just some arbitrary values so we can safely do math without overflowing and
>> -  * are obviously wrong for any sort of memory access.
>> -  */
>> -#define BPF_REGISTER_MAX_RANGE (1024 * 1024 * 1024)
>> -#define BPF_REGISTER_MIN_RANGE -1
>> +/* Maximum variable offset umax_value permitted when resolving memory accesses.
>> + * In practice this is far bigger than any realistic pointer offset; this limit
>> + * ensures that umax_value + (int)off + (int)size cannot overflow a u64.
>> + */
>> +#define BPF_MAX_VAR_OFF	(1ULL << 31)
>> +/* Maximum variable size permitted for ARG_CONST_SIZE[_OR_ZERO].  This ensures
>> + * that converting umax_value to int cannot overflow.
>> + */
>> +#define BPF_MAX_VAR_SIZ	INT_MAX
>>  
>>  struct bpf_reg_state {
>>  	enum bpf_reg_type type;
>> @@ -38,7 +42,7 @@ struct bpf_reg_state {
>>  	 * PTR_TO_MAP_VALUE_OR_NULL, we have to NULL-check it _first_.
>>  	 */
>>  	u32 id;
>> -	/* These three fields must be last.  See states_equal() */
>> +	/* These five fields must be last.  See states_equal() */
>>  	/* For scalar types (SCALAR_VALUE), this represents our knowledge of
>>  	 * the actual value.
>>  	 * For pointer types, this represents the variable part of the offset
>> @@ -51,8 +55,10 @@ struct bpf_reg_state {
>>  	 * These refer to the same value as align, not necessarily the actual
>>  	 * contents of the register.
>>  	 */
>> -	s64 min_value; /* minimum possible (s64)value */
>> -	u64 max_value; /* maximum possible (u64)value */
>> +	s64 smin_value; /* minimum possible (s64)value */
>> +	s64 smax_value; /* maximum possible (s64)value */
>> +	u64 umin_value; /* minimum possible (u64)value */
>> +	u64 umax_value; /* maximum possible (u64)value */
> I have an uneasy feeling about this one.
> It's 16 extra bytes to be stored in every reg_state and memcmp'd later,
> while we didn't have cases where people wanted negative values
> in ptr+var cases. Why bother, then?
It was the only way I could see to both pass my new test (correctly reject
 an uninformative combination of JGT and JSGT), and still pass one of the
 other tests where we have to accept an informative combination of JGT and
 JSGT.  This isn't so much about supporting negative numbers as it is about
 deducing the right bounds from signed checks, or a mixture of signed and
 unsigned checks on the same value.
For instance, if you check that a register is s< 5, you know nothing yet
 about its unsigned maximum (it could be -1, i.e. 0xffffffffffffffff as a
 u64).  But if you then check that it's u< 10, or even that it's s>= 0,
 you've now learned its sign bit, so you can conclude from the previous
 check that it's u< 5.  But to conclude that, you have to have stored the
 bound from the previous check.
I'm not too worried about the extra 16 bytes, because this is a control-
 plane operation, and I'd be surprised if its performance really turned out
 to be a problem.  But if there's a better way to handle these checks, I'm
 all ears.
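The s< 5 example can be played through with a small model. This sketch assumes a register's bounds are just the four fields smin/smax/umin/umax; `deduce_bounds()` is a hypothetical stand-in for the patch's `__reg_deduce_bounds()`, not its actual code. Once either view shows that all possible values share a sign bit, the signed and unsigned orderings agree, so the two ranges can be intersected.

```c
#include <assert.h>
#include <stdint.h>

struct bounds {
	int64_t smin, smax;   /* signed range */
	uint64_t umin, umax;  /* unsigned range */
};

static struct bounds unknown_bounds(void)
{
	struct bounds b = { INT64_MIN, INT64_MAX, 0, UINT64_MAX };
	return b;
}

static uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }
static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }
static int64_t max_s64(int64_t a, int64_t b) { return a > b ? a : b; }
static int64_t min_s64(int64_t a, int64_t b) { return a < b ? a : b; }

/* Propagate knowledge between the signed and unsigned views.  An empty
 * intersection would mean the branch is dead; that is not handled here. */
static void deduce_bounds(struct bounds *b)
{
	if (b->smin >= 0 || b->smax < 0) {
		/* Signed range doesn't cross zero: intersect as unsigned. */
		uint64_t lo = max_u64((uint64_t)b->smin, b->umin);
		uint64_t hi = min_u64((uint64_t)b->smax, b->umax);

		b->umin = lo;
		b->umax = hi;
		b->smin = (int64_t)lo;
		b->smax = (int64_t)hi;
	} else if ((b->umin >> 63) == (b->umax >> 63)) {
		/* Unsigned range doesn't cross 1<<63: intersect as signed. */
		int64_t lo = max_s64((int64_t)b->umin, b->smin);
		int64_t hi = min_s64((int64_t)b->umax, b->smax);

		b->smin = lo;
		b->smax = hi;
		b->umin = (uint64_t)lo;
		b->umax = (uint64_t)hi;
	}
}
```

After "s< 5" alone, umax is still UINT64_MAX; after a further "s>= 0" or "u< 10", umax drops to 4.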
>>  unknown.  While this is probably not necessary, it makes the code more
>>  straightforward and symmetrical between signed and unsigned bounds.
> it's hard for me to see the 'straightforward' part yet.
Well, the new reg_set_min_max[_inv]() are simpler, as they just update the
 relevant bound then call __reg_deduce_bounds() to propagate that knowledge
 into the others, rather than having confusing (and, as we've seen, buggy)
 logic in each case about "if we did this kind of check we've learned that
 thing in this branch".
Also, all the care to check "did we exceed BPF_REGISTER_MAX_RANGE?" goes
 away, as does special handling of negatives to turn them into
 BPF_REGISTER_MIN_RANGE (again, this has bugs in the current code).  Instead
 we just have to check "does our operation on the bounds overflow?", and if
 so, mark our bounds as unknown.
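The "does this overflow?  No?  Then compute new bounds" pattern for an addition might be sketched as follows. The sketch assumes the signed (smin/smax) and unsigned (umin/umax) bound fields introduced by this patch; `bounds_add()` and `add_s64_overflows()` are hypothetical names. Each view is handled independently, so a signed overflow does not discard still-valid unsigned bounds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct bounds {
	int64_t smin, smax;   /* signed range */
	uint64_t umin, umax;  /* unsigned range */
};

/* Portable check for signed 64-bit addition overflow. */
static bool add_s64_overflows(int64_t a, int64_t b)
{
	return (b > 0 && a > INT64_MAX - b) || (b < 0 && a < INT64_MIN - b);
}

/* dst += src, applied to the bounds of two scalars. */
static void bounds_add(struct bounds *dst, const struct bounds *src)
{
	if (add_s64_overflows(dst->smin, src->smin) ||
	    add_s64_overflows(dst->smax, src->smax)) {
		/* Some sum may wrap: the signed bounds become unknown. */
		dst->smin = INT64_MIN;
		dst->smax = INT64_MAX;
	} else {
		dst->smin += src->smin;
		dst->smax += src->smax;
	}

	/* umin <= umax in both operands, so if the umax sum doesn't wrap,
	 * the umin sum doesn't either. */
	if (dst->umax + src->umax < dst->umax) {
		dst->umin = 0;
		dst->umax = UINT64_MAX;
	} else {
		dst->umin += src->umin;
		dst->umax += src->umax;
	}
}
```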
I think a lot of the arithmetic ops become a more mechanical "does this
 overflow?  No?  Then let's compute new bounds".  But then, that's partly
 because the semantics of the old min_value and max_value weren't documented
 anywhere (do they refer to the signed or the unsigned value in the
 register?) and so it's unclear to me why some of the code does what it does.

-Ed

* Re: [RFC PATCH net-next 4/5] bpf/verifier: track signed and unsigned min/max values
@ 2017-06-08 15:23       ` Edward Cree via iovisor-dev
  0 siblings, 0 replies; 45+ messages in thread
From: Edward Cree via iovisor-dev @ 2017-06-08 15:23 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, netdev-u79uwXL29TY76Z2rM5mHXA, iovisor-dev,
	LKML, davem-fT/PcQaiUtIeIZ0/mPfg9Q

On 08/06/17 03:40, Alexei Starovoitov wrote:
> On Wed, Jun 07, 2017 at 03:59:25PM +0100, Edward Cree wrote:
>> Allows us to, sometimes, combine information from a signed check of one
>>  bound and an unsigned check of the other.
>> We now track the full range of possible values, rather than restricting
>>  ourselves to [0, 1<<30) and considering anything beyond that as
>>  unknown.  While this is probably not necessary, it makes the code more
>>  straightforward and symmetrical between signed and unsigned bounds.
>>
>> Signed-off-by: Edward Cree <ecree-s/n/eUQHGBpZroRs9YW3xA@public.gmane.org>
>> ---
>>  include/linux/bpf_verifier.h |  22 +-
>>  kernel/bpf/verifier.c        | 661 +++++++++++++++++++++++++------------------
>>  2 files changed, 395 insertions(+), 288 deletions(-)
>>
>> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
>> index e341469..10a5944 100644
>> --- a/include/linux/bpf_verifier.h
>> +++ b/include/linux/bpf_verifier.h
>> @@ -11,11 +11,15 @@
>>  #include <linux/filter.h> /* for MAX_BPF_STACK */
>>  #include <linux/tnum.h>
>>  
>> - /* Just some arbitrary values so we can safely do math without overflowing and
>> -  * are obviously wrong for any sort of memory access.
>> -  */
>> -#define BPF_REGISTER_MAX_RANGE (1024 * 1024 * 1024)
>> -#define BPF_REGISTER_MIN_RANGE -1
>> +/* Maximum variable offset umax_value permitted when resolving memory accesses.
>> + * In practice this is far bigger than any realistic pointer offset; this limit
>> + * ensures that umax_value + (int)off + (int)size cannot overflow a u64.
>> + */
>> +#define BPF_MAX_VAR_OFF	(1ULL << 31)
>> +/* Maximum variable size permitted for ARG_CONST_SIZE[_OR_ZERO].  This ensures
>> + * that converting umax_value to int cannot overflow.
>> + */
>> +#define BPF_MAX_VAR_SIZ	INT_MAX
>>  
>>  struct bpf_reg_state {
>>  	enum bpf_reg_type type;
>> @@ -38,7 +42,7 @@ struct bpf_reg_state {
>>  	 * PTR_TO_MAP_VALUE_OR_NULL, we have to NULL-check it _first_.
>>  	 */
>>  	u32 id;
>> -	/* These three fields must be last.  See states_equal() */
>> +	/* These five fields must be last.  See states_equal() */
>>  	/* For scalar types (SCALAR_VALUE), this represents our knowledge of
>>  	 * the actual value.
>>  	 * For pointer types, this represents the variable part of the offset
>> @@ -51,8 +55,10 @@ struct bpf_reg_state {
>>  	 * These refer to the same value as align, not necessarily the actual
>>  	 * contents of the register.
>>  	 */
>> -	s64 min_value; /* minimum possible (s64)value */
>> -	u64 max_value; /* maximum possible (u64)value */
>> +	s64 smin_value; /* minimum possible (s64)value */
>> +	s64 smax_value; /* maximum possible (s64)value */
>> +	u64 umin_value; /* minimum possible (u64)value */
>> +	u64 umax_value; /* maximum possible (u64)value */
> have uneasy feeling about this one.
> It's 16 extra bytes to be stored in every reg_state and memcmp later
> while we didn't have cases where people wanted negative values
> in ptr+var cases. Why bother than?
It was the only way I could see to both pass my new test (correctly reject
 an uninformative combination of JGT and JSGT), and still pass one of the
 other tests where we have to accept an informative combination of JGT and
 JSGT.  This isn't so much about supporting negative numbers as it is about
 deducing the right bounds from signed checks, or a mixture of signed and
 unsigned checks on the same value.
For instance, if you check a register is s< 5, you know nothing yet about
 its unsigned maximum (it could be -1).  But if you then check it's u< 10,
 or even if you check it's s>= 0, you've now learned its sign bit so you
 can conclude from the previous check that it's u< 5.  But to conclude
 that, you have to have stored the bound from the previous check.
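A minimal C sketch of the deduction described above. It assumes a hypothetical cut-down `struct bounds` holding only the four fields from the patch, and it is not the verifier's `__reg_deduce_bounds()`. The idea: once either view fixes the sign bit, the signed and unsigned ranges describe the same ordered interval, so they can be intersected.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cut-down register state: only the four bounds from the
 * patch, not the real struct bpf_reg_state. */
struct bounds {
	int64_t smin, smax;   /* range of the value read as s64 */
	uint64_t umin, umax;  /* range of the same value read as u64 */
};

static const struct bounds UNKNOWN = { INT64_MIN, INT64_MAX, 0, UINT64_MAX };

static int64_t s64_max(int64_t a, int64_t b) { return a > b ? a : b; }
static int64_t s64_min(int64_t a, int64_t b) { return a < b ? a : b; }
static uint64_t u64_max(uint64_t a, uint64_t b) { return a > b ? a : b; }
static uint64_t u64_min(uint64_t a, uint64_t b) { return a < b ? a : b; }

/* If either view fixes the sign bit, the signed and unsigned orderings
 * agree on the remaining values, so intersect the two ranges and write
 * the result back to both views. */
static void deduce_bounds(struct bounds *b)
{
	uint64_t lo, hi;

	if (b->smin >= 0 || b->umax <= (uint64_t)INT64_MAX) {
		/* sign bit 0: value is in [0, INT64_MAX] */
		lo = u64_max((uint64_t)s64_max(b->smin, 0), b->umin);
		hi = u64_min(u64_min((uint64_t)b->smax, b->umax),
			     (uint64_t)INT64_MAX);
	} else if (b->smax < 0 || b->umin > (uint64_t)INT64_MAX) {
		/* sign bit 1: value is in [INT64_MIN, -1] */
		lo = u64_max((uint64_t)b->smin,
			     u64_max(b->umin, (uint64_t)INT64_MAX + 1));
		hi = u64_min((uint64_t)s64_min(b->smax, -1), b->umax);
	} else {
		return; /* sign bit unknown: nothing to combine */
	}
	b->umin = lo;
	b->umax = hi;
	b->smin = (int64_t)lo;
	b->smax = (int64_t)hi;
}

/* "r s< 5" alone says nothing about r's unsigned maximum; adding
 * "r u< 10" fixes the sign bit and yields "r u< 5". */
static struct bounds example(void)
{
	struct bounds r = UNKNOWN;

	r.smax = 4;                   /* from a signed check: r s< 5 */
	deduce_bounds(&r);
	assert(r.umax == UINT64_MAX); /* r could still be -1 */
	r.umax = 9;                   /* from an unsigned check: r u< 10 */
	deduce_bounds(&r);
	return r;                     /* umin == 0, umax == 4 */
}
```

Setting `smin = 0` instead of `umax = 9` (the "s>= 0" case) reaches the same `umax == 4` through the first branch.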
I'm not too worried about the extra 16 bytes, because this is a control-
 plane operation, and I'd be surprised if its performance really turned out
 to be a problem.  But if there's a better way to handle these checks, I'm
 all ears.
>>  unknown.  While this is probably not necessary, it makes the code more
>>  straightforward and symmetrical between signed and unsigned bounds.
> it's hard for me to see the 'straightforward' part yet.
Well, the new reg_set_min_max[_inv]() are simpler, as they just update the
 relevant bound then call __reg_deduce_bounds() to propagate that knowledge
 into the others, rather than having confusing (and, as we've seen, buggy)
 logic in each case about "if we did this kind of check we've learned that
 thing in this branch".
Also, all the care to check "did we exceed BPF_REGISTER_MAX_RANGE?" goes
 away, as does special handling of negatives to turn them into
 BPF_REGISTER_MIN_RANGE (again, this has bugs in the current code).  Instead
 we just have to check "does our operation on the bounds overflow?", and if
 so, mark our bounds as unknown.
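For addition, the "does this overflow? then mark unknown" pattern might look like the following sketch. `struct bounds` and `scalar_add_bounds()` are hypothetical names. `__builtin_add_overflow` is a GCC/Clang builtin and is not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cut-down register state (not struct bpf_reg_state). */
struct bounds {
	int64_t smin, smax;
	uint64_t umin, umax;
};

/* Bounds for dst += src on two scalars. Addition is monotonic in each
 * view, so if neither extreme overflows, no sum in between can either;
 * if one does, that view's bounds become fully unknown. Overflow checks
 * use the GCC/Clang __builtin_add_overflow builtin. */
static void scalar_add_bounds(struct bounds *dst, const struct bounds *src)
{
	int64_t smin, smax;
	uint64_t umin, umax;

	assert(dst->smin <= dst->smax && dst->umin <= dst->umax);

	if (__builtin_add_overflow(dst->smin, src->smin, &smin) ||
	    __builtin_add_overflow(dst->smax, src->smax, &smax)) {
		dst->smin = INT64_MIN;
		dst->smax = INT64_MAX;
	} else {
		dst->smin = smin;
		dst->smax = smax;
	}

	if (__builtin_add_overflow(dst->umin, src->umin, &umin) ||
	    __builtin_add_overflow(dst->umax, src->umax, &umax)) {
		dst->umin = 0;
		dst->umax = UINT64_MAX;
	} else {
		dst->umin = umin;
		dst->umax = umax;
	}
}
```

The signed and unsigned views are handled independently, so an overflow in one view does not discard what is known in the other.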
I think a lot of the arithmetic ops become a more mechanical "does this
 overflow?  No?  Then let's compute new bounds".  But then, that's partly
 because the semantics of the old min_value and max_value weren't documented
 anywhere (do they refer to the signed or the unsigned value in the
 register?) and so it's unclear to me why some of the code does what it does.

-Ed

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH net-next 3/5] bpf/verifier: feed pointer-to-unknown-scalar casts into scalar ALU path
  2017-06-08  2:35     ` Alexei Starovoitov via iovisor-dev
@ 2017-06-08 15:25       ` Edward Cree via iovisor-dev
  -1 siblings, 0 replies; 45+ messages in thread
From: Edward Cree @ 2017-06-08 15:25 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: davem, Alexei Starovoitov, Daniel Borkmann, netdev, iovisor-dev, LKML

On 08/06/17 03:35, Alexei Starovoitov wrote:
> such large back and forth move doesn't help reviewing.
> may be just merge it into previous patch?
> Or keep that function in the right place in patch 2 already?
I think 'diff' got a bit confused, and maybe with different options I could
 have got it to produce something more readable.  But I think I will just
 merge this into patch 2; it's only separate because it started out as an
 experiment.

-Ed

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH net-next 5/5] selftests/bpf: change test_verifier expectations
  2017-06-08  2:43     ` Alexei Starovoitov via iovisor-dev
@ 2017-06-08 15:27       ` Edward Cree via iovisor-dev
  -1 siblings, 0 replies; 45+ messages in thread
From: Edward Cree @ 2017-06-08 15:27 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: davem, Alexei Starovoitov, Daniel Borkmann, netdev, iovisor-dev, LKML

On 08/06/17 03:43, Alexei Starovoitov wrote:
> On Wed, Jun 07, 2017 at 04:00:02PM +0100, Edward Cree wrote:
>> Some of the verifier's error messages have changed, and some constructs
>>  that previously couldn't be verified are now accepted.
>>
>> Signed-off-by: Edward Cree <ecree@solarflare.com>
>> ---
>>  tools/testing/selftests/bpf/test_verifier.c | 226 ++++++++++++++--------------
>>  1 file changed, 116 insertions(+), 110 deletions(-)
> imo this rewrite needs more than one additional test.
> I counted at least 2 new verifier features (like negative and ptr & 0x40).
> All the new logic needs to be covered by tests.
Yes, I will write some new tests to cover the new features.  I just wanted
 to get some comments on the patch first, in case I was barking up entirely
 the wrong tree.

-Ed

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH net-next 2/5] bpf/verifier: rework value tracking
@ 2017-06-08 16:45         ` Alexei Starovoitov via iovisor-dev
  0 siblings, 0 replies; 45+ messages in thread
From: Alexei Starovoitov @ 2017-06-08 16:45 UTC (permalink / raw)
  To: Edward Cree
  Cc: davem, Alexei Starovoitov, Daniel Borkmann, netdev, iovisor-dev, LKML

On Thu, Jun 08, 2017 at 03:53:36PM +0100, Edward Cree wrote:
> >>  
> >> -	} else if (reg->type == FRAME_PTR || reg->type == PTR_TO_STACK) {
> >> +	} else if (reg->type == PTR_TO_STACK) {
> >> +		/* stack accesses must be at a fixed offset, so that we can
> >> +		 * determine what type of data were returned.
> >> +		 */
> >> +		if (reg->align.mask) {
> >> +			char tn_buf[48];
> >> +
> >> +			tn_strn(tn_buf, sizeof(tn_buf), reg->align);
> >> +			verbose("variable stack access align=%s off=%d size=%d",
> >> +				tn_buf, off, size);
> >> +			return -EACCES;
> > hmm. why this restriction?
> > I thought one of key points of the diff that ptr+var tracking logic
> > will now apply not only to map_value, but to stack_ptr as well?
> As the comment above it says, we need to determine what was returned:
>  was it STACK_MISC or STACK_SPILL, and if the latter, what kind of pointer
>  was spilled there?  See check_stack_read(), which I should probably
>  mention in the comment.

this piece of code handles not only spill/fill but also normal ldx/stx stack access.
Consider a frequent pattern that many folks have tried:
bpf_prog()
{
  char buf[64];
  int len;

  bpf_probe_read(&len, sizeof(len), kernel_ptr_to_filename_len);
  bpf_probe_read(buf, sizeof(buf), kernel_ptr_to_filename);
  buf[len & (sizeof(buf) - 1)] = 0;
...

currently the above is not supported, but when 'buf' is a pointer to a map value
it works fine. Allocating an extra bpf map just for such a workaround
isn't nice, and since this patch generalized map_value_adj with ptr_to_stack
we can support the above code too.
We can check that all bytes of stack for this variable access were
initialized already.
In the example above that happens via bpf_probe_read (in the verifier code):
        for (i = 0; i < meta.access_size; i++) {
                err = check_mem_access(env, meta.regno, i, BPF_B, BPF_WRITE, -1);
so at the time of
  buf[len & ..] = 0
we can check that 'stx' is within the range of initialized stack and allow it.
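Here is a simplified sketch of the check proposed above. It assumes a hypothetical per-byte `written[]` map in place of the verifier's real stack slot types. The masked index `len & 63` is known to lie in [0, 63], so a 1-byte store to `buf` at fp-64 is accepted only if every byte it could touch is inside the 512-byte stack (the value of `MAX_BPF_STACK`) and has already been initialized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STACK_SIZE 512 /* same value as MAX_BPF_STACK */

/* Hypothetical per-byte record of initialized stack; offsets are relative
 * to the frame pointer and lie in [-STACK_SIZE, -1]. */
struct stack_state {
	bool written[STACK_SIZE];
};

/* Record that bytes [off, off + size) were written, e.g. by a simulated
 * bpf_probe_read(buf, 64, ...) into buf at fp-64. */
static void mark_written(struct stack_state *st, int off, int size)
{
	assert(off >= -STACK_SIZE && size >= 0 && off + size <= 0);
	for (int i = 0; i < size; i++)
		st->written[off + i + STACK_SIZE] = true;
}

/* Accept a size-byte access at fp + off + var, where var is only known
 * to lie in [var_min, var_max], if every byte it could touch is inside
 * the stack and already initialized. */
static bool var_stack_access_ok(const struct stack_state *st, int off,
				uint64_t var_min, uint64_t var_max, int size)
{
	int64_t lo, hi;

	if (var_min > var_max || var_max > STACK_SIZE || size <= 0)
		return false;
	lo = (int64_t)off + (int64_t)var_min;
	hi = (int64_t)off + (int64_t)var_max + size; /* exclusive end */
	if (lo < -STACK_SIZE || hi > 0)
		return false;
	for (int64_t i = lo; i < hi; i++)
		if (!st->written[i + STACK_SIZE])
			return false;
	return true;
}
```

Before the simulated `bpf_probe_read` the store at `fp-64 + [0, 63]` is rejected. Afterwards it is accepted, while a range reaching past fp-1 is still rejected.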

> >> +	if (!err && size < BPF_REG_SIZE && value_regno >= 0 && t == BPF_READ &&
> >> +	    state->regs[value_regno].type == SCALAR_VALUE) {
> >> +		/* b/h/w load zero-extends, mark upper bits as known 0 */
> >> +		state->regs[value_regno].align.value &= (1ULL << (size * 8)) - 1;
> >> +		state->regs[value_regno].align.mask &= (1ULL << (size * 8)) - 1;
> > probably another helper from tnum.h is needed.
> I could rewrite as
>  reg->align = tn_and(reg->align, tn_const((1ULL << (size * 8)) - 1))

yep. that's perfect.
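For reference, here is a self-contained sketch of the tnum operations involved. The struct and helper names mirror the `tn_*` helpers quoted in the thread but are written from scratch here. `tn_and` uses the usual tristate rule: a bit is known 0 if it is known 0 in either input, and known 1 only if it is known 1 in both.

```c
#include <assert.h>
#include <stdint.h>

/* Tristate number: a bit is known iff its mask bit is 0, in which case
 * its value is the corresponding bit of `value`. Illustrative names. */
struct tnum {
	uint64_t value;
	uint64_t mask;
};

static struct tnum tn_const(uint64_t v) { return (struct tnum){ v, 0 }; }
static struct tnum tn_unknown(void) { return (struct tnum){ 0, ~0ULL }; }

static struct tnum tn_and(struct tnum a, struct tnum b)
{
	uint64_t alpha = a.value | a.mask;  /* bits that may be 1 in a */
	uint64_t beta = b.value | b.mask;   /* bits that may be 1 in b */
	uint64_t v = a.value & b.value;     /* bits known 1 in both */

	return (struct tnum){ v, alpha & beta & ~v };
}

/* Largest value consistent with the known bits. */
static uint64_t tn_umax(struct tnum t) { return t.value | t.mask; }

/* A 1-, 2- or 4-byte load zero-extends, so the bits above size*8 are
 * known 0. size must be < 8: shifting a 64-bit value by 64 is undefined. */
static struct tnum zext_load(struct tnum reg, int size)
{
	assert(size > 0 && size < 8);
	return tn_and(reg, tn_const((1ULL << (size * 8)) - 1));
}
```

The same `tn_and` also explains why `len & (sizeof(buf) - 1)` in the earlier example is bounded: ANDing an unknown value with 63 leaves at most the low six bits unknown, so its maximum is 63.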

> >> +	/* Got here implies adding two SCALAR_VALUEs */
> >> +	if (WARN_ON_ONCE(ptr_reg)) {
> >> +		verbose("verifier internal error\n");
> >> +		return -EINVAL;
> > ...
> >> +	if (WARN_ON(!src_reg)) {
> >> +		verbose("verifier internal error\n");
> >> +		return -EINVAL;
> >>  	}
> > i'm lost with these bits.
> > Can you add a comment in what circumstances this can be hit
> > and what would be the consequences?
> It should be impossible to hit either of these cases.  If we let the
>  first through, we'd probably do invalid pointer arithmetic (e.g. we
>  could multiply a pointer by two and think we'd just multiplied the
>  variable offset).  As for the latter, we access through that pointer
>  so if it were NULL we would promptly oops.

I see. Maybe print the verifier state in such WARN_ONs and make the error
more human-readable?

> >> +	case PTR_TO_MAP_VALUE_OR_NULL:
> > does this new state comparison logic help? Do you have any numbers before/after in the number of insns it had to process for the tests in selftests?
> I don't have the numbers, no (I'll try to collect them).  This rewrite was

Thanks. The main concern is that right now some complex programs
that cilium is using are close to the verifier complexity limit, and these
big changes to the amount of info recognized by the verifier can cause pruning
to be ineffective, so we need to test on big programs.
I think Daniel will be happy to test your next rev of the patches.
I'll test them as well.
At least 'insn_processed' from C code in tools/testing/selftests/bpf/
is a good estimate of how these changes affect pruning.

btw, I'm working on bpf_call support and also refactoring the verifier
quite a bit, but my stuff is far from ready and I'll wait for
your rewrite to land first.
One of the things I'm working on is trying to get rid of state pruning
heuristics and use register+stack liveness information instead.
It's all experimental so far.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH net-next 4/5] bpf/verifier: track signed and unsigned min/max values
  2017-06-08 15:23       ` Edward Cree via iovisor-dev
  (?)
@ 2017-06-08 16:47       ` Alexei Starovoitov
  -1 siblings, 0 replies; 45+ messages in thread
From: Alexei Starovoitov @ 2017-06-08 16:47 UTC (permalink / raw)
  To: Edward Cree
  Cc: davem, Alexei Starovoitov, Daniel Borkmann, netdev, iovisor-dev, LKML

On Thu, Jun 08, 2017 at 04:23:24PM +0100, Edward Cree wrote:
> On 08/06/17 03:40, Alexei Starovoitov wrote:
> > On Wed, Jun 07, 2017 at 03:59:25PM +0100, Edward Cree wrote:
> >> Allows us to, sometimes, combine information from a signed check of one
> >>  bound and an unsigned check of the other.
> >> We now track the full range of possible values, rather than restricting
> >>  ourselves to [0, 1<<30) and considering anything beyond that as
> >>  unknown.  While this is probably not necessary, it makes the code more
> >>  straightforward and symmetrical between signed and unsigned bounds.
> >>
> >> Signed-off-by: Edward Cree <ecree@solarflare.com>
> >> ---
> >>  include/linux/bpf_verifier.h |  22 +-
> >>  kernel/bpf/verifier.c        | 661 +++++++++++++++++++++++++------------------
> >>  2 files changed, 395 insertions(+), 288 deletions(-)
> >>
> >> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> >> index e341469..10a5944 100644
> >> --- a/include/linux/bpf_verifier.h
> >> +++ b/include/linux/bpf_verifier.h
> >> @@ -11,11 +11,15 @@
> >>  #include <linux/filter.h> /* for MAX_BPF_STACK */
> >>  #include <linux/tnum.h>
> >>  
> >> - /* Just some arbitrary values so we can safely do math without overflowing and
> >> -  * are obviously wrong for any sort of memory access.
> >> -  */
> >> -#define BPF_REGISTER_MAX_RANGE (1024 * 1024 * 1024)
> >> -#define BPF_REGISTER_MIN_RANGE -1
> >> +/* Maximum variable offset umax_value permitted when resolving memory accesses.
> >> + * In practice this is far bigger than any realistic pointer offset; this limit
> >> + * ensures that umax_value + (int)off + (int)size cannot overflow a u64.
> >> + */
> >> +#define BPF_MAX_VAR_OFF	(1ULL << 31)
> >> +/* Maximum variable size permitted for ARG_CONST_SIZE[_OR_ZERO].  This ensures
> >> + * that converting umax_value to int cannot overflow.
> >> + */
> >> +#define BPF_MAX_VAR_SIZ	INT_MAX
> >>  
> >>  struct bpf_reg_state {
> >>  	enum bpf_reg_type type;
> >> @@ -38,7 +42,7 @@ struct bpf_reg_state {
> >>  	 * PTR_TO_MAP_VALUE_OR_NULL, we have to NULL-check it _first_.
> >>  	 */
> >>  	u32 id;
> >> -	/* These three fields must be last.  See states_equal() */
> >> +	/* These five fields must be last.  See states_equal() */
> >>  	/* For scalar types (SCALAR_VALUE), this represents our knowledge of
> >>  	 * the actual value.
> >>  	 * For pointer types, this represents the variable part of the offset
> >> @@ -51,8 +55,10 @@ struct bpf_reg_state {
> >>  	 * These refer to the same value as align, not necessarily the actual
> >>  	 * contents of the register.
> >>  	 */
> >> -	s64 min_value; /* minimum possible (s64)value */
> >> -	u64 max_value; /* maximum possible (u64)value */
> >> +	s64 smin_value; /* minimum possible (s64)value */
> >> +	s64 smax_value; /* maximum possible (s64)value */
> >> +	u64 umin_value; /* minimum possible (u64)value */
> >> +	u64 umax_value; /* maximum possible (u64)value */
> > I have an uneasy feeling about this one.
> > It's 16 extra bytes to be stored in every reg_state and memcmp'd later,
> > while we didn't have cases where people wanted negative values
> > in ptr+var cases. Why bother then?
> It was the only way I could see to both pass my new test (correctly reject
>  an uninformative combination of JGT and JSGT), and still pass one of the
>  other tests where we have to accept an informative combination of JGT and
>  JSGT.  This isn't so much about supporting negative numbers as it is about
>  deducing the right bounds from signed checks, or a mixture of signed and
>  unsigned checks on the same value.
> For instance, if you check a register is s< 5, you know nothing yet about
>  its unsigned maximum (it could be -1).  But if you then check it's u< 10,
>  or even if you check it's s>= 0, you've now learned its sign bit so you
>  can conclude from the previous check that it's u< 5.  But to conclude
>  that, you have to have stored the bound from the previous check.
> I'm not too worried about the extra 16 bytes, because this is a control-
>  plane operation, and I'd be surprised if its performance really turned out
>  to be a problem.  But if there's a better way to handle these checks, I'm
>  all ears.
> >>  unknown.  While this is probably not necessary, it makes the code more
> >>  straightforward and symmetrical between signed and unsigned bounds.
> > it's hard for me to see the 'straightforward' part yet.
> Well, the new reg_set_min_max[_inv]() are simpler, as they just update the
>  relevant bound then call __reg_deduce_bounds() to propagate that knowledge
>  into the others, rather than having confusing (and, as we've seen, buggy)
>  logic in each case about "if we did this kind of check we've learned that
>  thing in this branch".
> Also, all the care to check "did we exceed BPF_REGISTER_MAX_RANGE?" goes
>  away, as does special handling of negatives to turn them into
>  BPF_REGISTER_MIN_RANGE (again, this has bugs in the current code).  Instead
>  we just have to check "does our operation on the bounds overflow?", and if
>  so, mark our bounds as unknown.
> I think a lot of the arithmetic ops become a more mechanical "does this
>  overflow?  No?  Then let's compute new bounds".  But then, that's partly
>  because the semantics of the old min_value and max_value weren't documented
>  anywhere (do they refer to the signed or the unsigned value in the
>  register?) and so it's unclear to me why some of the code does what it does.

got it. that all makes sense.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH net-next 3/5] bpf/verifier: feed pointer-to-unknown-scalar casts into scalar ALU path
  2017-06-08 15:25       ` Edward Cree via iovisor-dev
  (?)
@ 2017-06-08 16:50       ` Alexei Starovoitov
  2017-06-08 17:12           ` Edward Cree via iovisor-dev
  -1 siblings, 1 reply; 45+ messages in thread
From: Alexei Starovoitov @ 2017-06-08 16:50 UTC (permalink / raw)
  To: Edward Cree
  Cc: davem, Alexei Starovoitov, Daniel Borkmann, netdev, iovisor-dev, LKML

On Thu, Jun 08, 2017 at 04:25:39PM +0100, Edward Cree wrote:
> On 08/06/17 03:35, Alexei Starovoitov wrote:
> > such large back and forth move doesn't help reviewing.
> > may be just merge it into previous patch?
> > Or keep that function in the right place in patch 2 already?
> I think 'diff' got a bit confused, and maybe with different options I could
>  have got it to produce something more readable.  But I think I will just
>  merge this into patch 2; it's only separate because it started out as an
>  experiment.

after sleeping on it I'm not sure we should be allowing such pointer
arithmetic. In normal C code people do fancy tricks with lower 3 bits
of the pointer, but in bpf code I cannot see such use case.
What kind of realistic code will be doing ptr & 0x40 ?

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH net-next 3/5] bpf/verifier: feed pointer-to-unknown-scalar casts into scalar ALU path
@ 2017-06-08 17:12           ` Edward Cree via iovisor-dev
  0 siblings, 0 replies; 45+ messages in thread
From: Edward Cree @ 2017-06-08 17:12 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: davem, Alexei Starovoitov, Daniel Borkmann, netdev, iovisor-dev, LKML

On 08/06/17 17:50, Alexei Starovoitov wrote:
> On Thu, Jun 08, 2017 at 04:25:39PM +0100, Edward Cree wrote:
>> On 08/06/17 03:35, Alexei Starovoitov wrote:
>>> such large back and forth move doesn't help reviewing.
>>> may be just merge it into previous patch?
>>> Or keep that function in the right place in patch 2 already?
>> I think 'diff' got a bit confused, and maybe with different options I could
>>  have got it to produce something more readable.  But I think I will just
>>  merge this into patch 2; it's only separate because it started out as an
>>  experiment.
> after sleeping on it I'm not sure we should be allowing such pointer
> arithmetic. In normal C code people do fancy tricks with lower 3 bits
> of the pointer, but in bpf code I cannot see such use case.
> What kind of realistic code will be doing ptr & 0x40 ?
Well, I didn't support it because I saw a use case.  I supported it because
 it seemed easy to do and the code came out reasonably elegant-looking.
Since this is guarded by env->allow_ptr_leaks, I can't see any reason _not_
 to let people try fancy tricks with the low bits of pointers.
I agree ptr & 0x40 is a crazy thing with no imaginable use case, but...
"Unix was not designed to stop its users from doing stupid things, as that
 would also stop them from doing clever things." ;-)

-Ed

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH net-next 3/5] bpf/verifier: feed pointer-to-unknown-scalar casts into scalar ALU path
@ 2017-06-08 18:41             ` Alexei Starovoitov via iovisor-dev
  0 siblings, 0 replies; 45+ messages in thread
From: Alexei Starovoitov @ 2017-06-08 18:41 UTC (permalink / raw)
  To: Edward Cree
  Cc: davem, Alexei Starovoitov, Daniel Borkmann, netdev, iovisor-dev, LKML

On Thu, Jun 08, 2017 at 06:12:39PM +0100, Edward Cree wrote:
> On 08/06/17 17:50, Alexei Starovoitov wrote:
> > On Thu, Jun 08, 2017 at 04:25:39PM +0100, Edward Cree wrote:
> >> On 08/06/17 03:35, Alexei Starovoitov wrote:
> >>> such large back and forth move doesn't help reviewing.
> >>> may be just merge it into previous patch?
> >>> Or keep that function in the right place in patch 2 already?
> >> I think 'diff' got a bit confused, and maybe with different options I could
> >>  have got it to produce something more readable.  But I think I will just
> >>  merge this into patch 2; it's only separate because it started out as an
> >>  experiment.
> > after sleeping on it I'm not sure we should be allowing such pointer
> > arithmetic. In normal C code people do fancy tricks with lower 3 bits
> > of the pointer, but in bpf code I cannot see such use case.
> > What kind of realistic code will be doing ptr & 0x40 ?
> Well, I didn't support it because I saw a use case.  I supported it because
>  it seemed easy to do and the code came out reasonably elegant-looking.
> Since this is guarded by env->allow_ptr_leaks, I can't see any reason _not_
>  to let people try fancy tricks with the low bits of pointers.
> I agree ptr & 0x40 is a crazy thing with no imaginable use case, but...
> "Unix was not designed to stop its users from doing stupid things, as that
>  would also stop them from doing clever things." ;-)

well, I agree with the philosophy :) but I also see a few reasons not to allow it:
1. it immediately becomes uapi, and if we later find out that it's preventing us
from doing something we actually really need, we'll be stuck looking for a workaround
2. it's the same pruning concern. It probably doesn't fully apply here, but
the reason we don't track 'if (reg == 1) ...' is that if we mark that
register as a known const_imm in the true branch, it will screw up
pruning quite badly. It's trivial to track and may seem useful,
but it hurts instead.
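The pruning concern can be illustrated with a simplified model. The `urange` type and `range_within()` helper below are hypothetical, and real state comparison looks at far more than one register's range. A new state can only be pruned if an already-verified state allowed every value the new one allows. Recording a register as exactly 1 makes the verified state narrower, so it matches fewer later states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical one-register summary of a verifier state. */
struct urange {
	uint64_t umin, umax;
};

/* A new state may be pruned against an already-verified one only if
 * every value it allows was also allowed (and therefore checked) there. */
static bool range_within(struct urange old, struct urange cur)
{
	assert(old.umin <= old.umax && cur.umin <= cur.umax);
	return old.umin <= cur.umin && cur.umax <= old.umax;
}
```

With a verified range of `{0, 100}`, an incoming `{0, 50}` can be pruned. With `{1, 1}`, which is what a tracked `if (reg == 1)` would record in the true branch, the same incoming state has to be explored again.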

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH net-next 3/5] bpf/verifier: feed pointer-to-unknown-scalar casts into scalar ALU path
@ 2017-06-08 19:07               ` Edward Cree via iovisor-dev
  0 siblings, 0 replies; 45+ messages in thread
From: Edward Cree @ 2017-06-08 19:07 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: davem, Alexei Starovoitov, Daniel Borkmann, netdev, iovisor-dev, LKML

On 08/06/17 19:41, Alexei Starovoitov wrote:
> On Thu, Jun 08, 2017 at 06:12:39PM +0100, Edward Cree wrote:
>> On 08/06/17 17:50, Alexei Starovoitov wrote:
>>> On Thu, Jun 08, 2017 at 04:25:39PM +0100, Edward Cree wrote:
>>>> On 08/06/17 03:35, Alexei Starovoitov wrote:
>>>>> such a large back-and-forth move doesn't help reviewing.
>>>>> maybe just merge it into the previous patch?
>>>>> Or keep that function in the right place in patch 2 already?
>>>> I think 'diff' got a bit confused, and maybe with different options I could
>>>>  have got it to produce something more readable.  But I think I will just
>>>>  merge this into patch 2; it's only separate because it started out as an
>>>>  experiment.
>>> after sleeping on it I'm not sure we should be allowing such pointer
>>> arithmetic. In normal C code people do fancy tricks with lower 3 bits
>>> of the pointer, but in bpf code I cannot see such use case.
>>> What kind of realistic code will be doing ptr & 0x40 ?
>> Well, I didn't support it because I saw a use case.  I supported it because
>>  it seemed easy to do and the code came out reasonably elegant-looking.
>> Since this is guarded by env->allow_ptr_leaks, I can't see any reason _not_
>>  to let people try fancy tricks with the low bits of pointers.
>> I agree ptr & 0x40 is a crazy thing with no imaginable use case, but...
>> "Unix was not designed to stop its users from doing stupid things, as that
>>  would also stop them from doing clever things." ;-)
> well, I agree with the philosophy :) but I also see a few reasons not to allow it:
> 1. it immediately becomes uapi, and if we later find out that it's preventing us
> from doing something we really need, we'll be stuck looking for a workaround
What could it prevent us from doing, though?  It's basically equivalent to giving
 BPF an opcode that casts a pointer to a u64, which of course is only allowed if
 allow_ptr_leaks is true.  And since we don't feed any knowledge about the pointer
 into the verifier, it's just like any other way of filling a register with
 arbitrary, unknown bits.
I can fully appreciate why you're being cautious, what with uapi and all.  But I
 don't think there's any actual problem here.  Open to being convinced, though.
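The claim that a pointer-to-scalar cast feeds no knowledge into the verifier can be sketched with a tristate number ("tnum"), the `value`/`mask` pair this series uses to track known and unknown bits. The helpers below are modelled on the tnum AND rule, but treat the names `tnum_const()`, `tnum_unknown` and `tnum_and()` as assumptions for this sketch rather than the patch's exact API.

```c
#include <stdint.h>

typedef uint64_t u64;

/* Tristate number: bits set in 'mask' are unknown; every other bit is
 * known and its value is in 'value'. Invariant: (value & mask) == 0.
 */
struct tnum {
	u64 value;
	u64 mask;
};

static struct tnum tnum_const(u64 v)
{
	struct tnum t = { v, 0 };
	return t;
}

/* A pointer cast to a scalar: nothing is known about any of its bits. */
static const struct tnum tnum_unknown = { 0, ~0ULL };

/* Bitwise AND: a result bit is known 1 only if it is known 1 in both
 * inputs, known 0 if it is known 0 in either, and unknown otherwise.
 */
static struct tnum tnum_and(struct tnum a, struct tnum b)
{
	u64 alpha = a.value | a.mask;	/* bits that might be 1 in a */
	u64 beta = b.value | b.mask;	/* bits that might be 1 in b */
	u64 v = a.value & b.value;	/* bits known to be 1 in both */
	struct tnum r = { v, alpha & beta & ~v };
	return r;
}
```

So `tnum_and(tnum_unknown, tnum_const(0x40))` yields `value == 0, mask == 0x40`: the result is either 0 or 0x40, and nothing about the original pointer survives.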
> 2. it's the same pruning concern. it probably doesn't fully apply here, but
> the reason we don't track 'if (reg == 1) ...'
Don't we though?
http://elixir.free-electrons.com/linux/v4.12-rc4/source/kernel/bpf/verifier.c#L2127
> is that if we mark that
> register as a known const_imm in the true branch, it will screw up
> pruning quite badly. It's trivial to track and may seem useful,
> but it hurts instead.
(Thinking out loud...)

What would be really nice is a way to propagate limits backwards as well as
 forwards, so that the verifier can say "when I tested this branch, I used
 this part of the state, I read four bytes past this pointer".  Then when it
 wants to prune, it can say "well, the state this time isn't as strong, but
 it still satisfies everything I actually used".
But that sounds like it would be very hard indeed to do.
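One way to picture "it still satisfies everything I actually used": record, for an explored state, both the packet range it was verified with and the number of bytes the walk from there actually read, then prune against the latter. A hypothetical sketch; `struct explored_state` and both prune functions are invented for illustration and do not exist in the verifier.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record kept for an already-verified state. */
struct explored_state {
	uint32_t pkt_range;	/* bytes proven accessible past the packet pointer */
	uint32_t pkt_used;	/* most bytes actually read past it on any path from here */
};

/* Forward-only rule: the current state must be at least as strong as the old one. */
static bool prune_forward_only(const struct explored_state *old, uint32_t cur_range)
{
	return cur_range >= old->pkt_range;
}

/* With usage propagated backwards, satisfying what was actually used is enough. */
static bool prune_with_usage(const struct explored_state *old, uint32_t cur_range)
{
	return cur_range >= old->pkt_used;
}
```

With a verified range of 16 of which only 4 bytes were read, a new state proving 8 bytes is re-walked under the forward-only rule but pruned under the usage-based one.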

Maybe with the basic-block DAG stuff David's been talking about, we could
 find all the paths that reach a block, and take the union of their states,
 and then run through the block feeding it that combined state.  But that
 could reject code that relies on correlation of the state (i.e. if r1 != 0
 then r2 is valid ptr I can access, etc) so would still need the 'walk with
 each individual state' as a fallback.  Though at least you'd have all the
 states at once so you could find out which ones were subsumed, instead of
 hoping you get to them in the right order.

-Ed

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH net-next 2/5] bpf/verifier: rework value tracking
@ 2017-06-08 19:38           ` Edward Cree via iovisor-dev
  0 siblings, 0 replies; 45+ messages in thread
From: Edward Cree @ 2017-06-08 19:38 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: davem, Alexei Starovoitov, Daniel Borkmann, netdev, iovisor-dev, LKML

On 08/06/17 17:45, Alexei Starovoitov wrote:
> On Thu, Jun 08, 2017 at 03:53:36PM +0100, Edward Cree wrote:
>>>>  
>>>> -	} else if (reg->type == FRAME_PTR || reg->type == PTR_TO_STACK) {
>>>> +	} else if (reg->type == PTR_TO_STACK) {
>>>> +		/* stack accesses must be at a fixed offset, so that we can
>>>> +		 * determine what type of data were returned.
>>>> +		 */
>>>> +		if (reg->align.mask) {
>>>> +			char tn_buf[48];
>>>> +
>>>> +			tn_strn(tn_buf, sizeof(tn_buf), reg->align);
>>>> +			verbose("variable stack access align=%s off=%d size=%d",
>>>> +				tn_buf, off, size);
>>>> +			return -EACCES;
>>> hmm. why this restriction?
>>> I thought one of the key points of the diff was that ptr+var tracking logic
>>> will now apply not only to map_value, but to stack_ptr as well?
>> As the comment above it says, we need to determine what was returned:
>>  was it STACK_MISC or STACK_SPILL, and if the latter, what kind of pointer
>>  was spilled there?  See check_stack_read(), which I should probably
>>  mention in the comment.
> this piece of code handles not only spill/fill, but also normal ldx/stx stack access.
> Consider the frequent pattern that many folks tried to do:
> bpf_prog()
> {
>   char buf[64];
>   int len;
>
>   bpf_probe_read(&len, sizeof(len), kernel_ptr_to_filename_len);
>   bpf_probe_read(buf, sizeof(buf), kernel_ptr_to_filename);
>   buf[len & (sizeof(buf) - 1)] = 0;
> ...
>
> currently the above is not supported, but when 'buf' is a pointer to a map value
> it works fine. Allocating an extra bpf map just for such a workaround
> isn't nice, and since this patch generalized map_value_adj with ptr_to_stack
> we can support the above code too.
> We can check that all bytes of stack for this variable access were
> initialized already.
> In the example above that is done by bpf_probe_read (in the verifier code):
>         for (i = 0; i < meta.access_size; i++) {
>                 err = check_mem_access(env, meta.regno, i, BPF_B, BPF_WRITE, -1);
> so at the time of
>   buf[len & ..] = 0
> we can check that 'stx' is within the range of initialized stack and allow it.
Yes, we could check every byte of the stack within the range [buf, buf+63]
 is a STACK_MISC and if so allow it.  But since this is not supported by the
 existing code (so it's not a regression), I'd prefer to leave that for a
 future patch - this one is quite big enough already ;-)
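The check described above (every byte a variable-offset access could touch must already be STACK_MISC) might look roughly like this. A hypothetical sketch: the flat per-byte slot array, the enum and `check_var_stack_access()` are modelled loosely on the verifier's stack tracking, not taken from it; offsets are relative to the frame pointer, and the stack is assumed to be 512 bytes (MAX_BPF_STACK).

```c
#include <stdbool.h>
#include <stdint.h>

#define STACK_SIZE 512	/* assumed BPF stack size */

enum slot_type { STACK_INVALID, STACK_MISC, STACK_SPILL };

/* slot[i] describes the byte at fp - STACK_SIZE + i. */
struct stack_state {
	enum slot_type slot[STACK_SIZE];
};

/* Allow a 'size'-byte access at fp + off + v, for any v in [vmin, vmax],
 * only if every byte it could touch lies inside the stack and holds
 * initialized plain data (STACK_MISC), never a spilled pointer.
 */
static bool check_var_stack_access(const struct stack_state *st, int off,
				   int64_t vmin, int64_t vmax, int size)
{
	int64_t lo = off + vmin;	/* lowest byte touched, relative to fp */
	int64_t hi = off + vmax + size;	/* one past the highest byte touched */

	if (vmin > vmax || size <= 0 || lo < -STACK_SIZE || hi > 0)
		return false;
	for (int64_t i = lo; i < hi; i++)
		if (st->slot[i + STACK_SIZE] != STACK_MISC)
			return false;
	return true;
}
```

For Alexei's example, `buf` sits at fp-64 and `len & (sizeof(buf) - 1)` lies in [0, 63], so a 1-byte store touches fp-64 to fp-1; once bpf_probe_read() has marked those 64 bytes STACK_MISC, the access passes.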
>>>> +	if (!err && size < BPF_REG_SIZE && value_regno >= 0 && t == BPF_READ &&
>>>> +	    state->regs[value_regno].type == SCALAR_VALUE) {
>>>> +		/* b/h/w load zero-extends, mark upper bits as known 0 */
>>>> +		state->regs[value_regno].align.value &= (1ULL << (size * 8)) - 1;
>>>> +		state->regs[value_regno].align.mask &= (1ULL << (size * 8)) - 1;
>>> probably another helper from tnum.h is needed.
>> I could rewrite as
>>  reg->align = tn_and(reg->align, tn_const((1ULL << (size * 8)) - 1))
> yep. that's perfect.
In the end I settled on adding a helper
    struct tnum tnum_cast(struct tnum a, u8 size);
 since I have a bunch of other places that cast things to 32 bits.
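A sketch of what such a helper might contain; the prototype is the one given above, but the body is an assumption. It clears the bits above `size` bytes in both `value` and `mask`, so they become known zero, as a zero-extending b/h/w load requires. The explicit `size >= 8` case avoids `1ULL << 64`, which is undefined behaviour in C.

```c
#include <stdint.h>

typedef uint8_t u8;
typedef uint64_t u64;

struct tnum {
	u64 value;	/* values of the known bits */
	u64 mask;	/* unknown bits; (value & mask) == 0 */
};

/* Truncate to the low 'size' bytes: the upper bits become known zero. */
struct tnum tnum_cast(struct tnum a, u8 size)
{
	u64 keep;

	if (size >= 8)
		return a;	/* full width: nothing to clear */
	keep = (1ULL << (size * 8)) - 1;
	a.value &= keep;
	a.mask &= keep;
	return a;
}
```

For a well-formed tnum this gives the same result as the `tn_and(reg->align, tn_const((1ULL << (size * 8)) - 1))` form above: ANDing with a known constant keeps exactly the bits of `value` and of `mask` that fall inside the constant.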
> I see. Maybe print the verifier state in such warn_ons and make the error
> more human-readable?
Good idea, I'll do that.
>>>> +	case PTR_TO_MAP_VALUE_OR_NULL:
>>> does this new state comparison logic help? Do you have any numbers before/after in the number of insns it had to process for the tests in selftests?
>> I don't have the numbers, no (I'll try to collect them).  This rewrite was
> Thanks. The main concern is that right now some complex programs
> that cilium is using are close to the verifier complexity limit and these
> big changes to the amount of info recognized by the verifier can cause pruning
> to be ineffective, so we need to test on big programs.
> I think Daniel will be happy to test your next rev of the patches.
> I'll test them as well.
> At least 'insn_processed' from C code in tools/testing/selftests/bpf/
> is a good estimate of how these changes affect pruning.
It looks like the only place this gets recorded is as "processed %d insns"
 in the log_buf.  Is there a convenient way to get at this, or am I going
 to have to make bpf_verify_program grovel through the log sscanf()ing for
 a matching line?
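If grovelling through the log does turn out to be the only way, it is at least short. A minimal sketch, assuming the count appears in log_buf as `processed N insns` (the format quoted above); `log_insns_processed()` is a hypothetical name. The `%n` conversion confirms that the literal `insns` actually matched, since `sscanf()`'s return value only counts the `%d`.

```c
#include <stdio.h>
#include <string.h>

/* Return N from the first "processed N insns" line in a verifier log,
 * or -1 if there is none.
 */
static int log_insns_processed(const char *log)
{
	const char *p = log;

	while ((p = strstr(p, "processed ")) != NULL) {
		int n, end = -1;

		/* %n is only reached if the literal "insns" matched too */
		if (sscanf(p, "processed %d insns%n", &n, &end) == 1 && end > 0)
			return n;
		p++;	/* false match: keep scanning after it */
	}
	return -1;
}
```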

-Ed

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH net-next 0/5] bpf: rewrite value tracking in verifier
  2017-06-07 14:55 ` Edward Cree via iovisor-dev
                   ` (5 preceding siblings ...)
  (?)
@ 2017-06-08 20:18 ` David Miller
  -1 siblings, 0 replies; 45+ messages in thread
From: David Miller @ 2017-06-08 20:18 UTC (permalink / raw)
  To: ecree; +Cc: alexei.starovoitov, ast, daniel, netdev, iovisor-dev, linux-kernel

From: Edward Cree <ecree@solarflare.com>
Date: Wed, 7 Jun 2017 15:55:57 +0100

> This series simplifies alignment tracking, generalises bounds tracking and
>  fixes some bounds-tracking bugs in the BPF verifier.  Pointer arithmetic on
>  packet pointers, stack pointers, map value pointers and context pointers has
>  been unified, and bounds on these pointers are only checked when the pointer
>  is dereferenced.
> Operations on pointers which destroy all relation to the original pointer
>  (such as multiplies and shifts) are disallowed if !env->allow_ptr_leaks,
>  otherwise they convert the pointer to an unknown scalar and feed it to the
>  normal scalar arithmetic handling.
> Pointer types have been unified with the corresponding adjusted-pointer types
>  where those existed (e.g. PTR_TO_MAP_VALUE[_ADJ] or FRAME_PTR vs
>  PTR_TO_STACK); similarly, CONST_IMM and UNKNOWN_VALUE have been unified into
>  SCALAR_VALUE.
> Pointer types (except CONST_PTR_TO_MAP, PTR_TO_MAP_VALUE_OR_NULL and
>  PTR_TO_PACKET_END, which do not allow arithmetic) have a 'fixed offset' and
>  a 'variable offset'; the former is used when e.g. adding an immediate or a
>  known-constant register, as long as it does not overflow.  Otherwise the
>  latter is used, and any operation creating a new variable offset creates a
>  new 'id' (and, for PTR_TO_PACKET, clears the 'range').
> SCALAR_VALUEs use the 'variable offset' fields to track the range of possible
>  values; the 'fixed offset' should never be set on a scalar.
> 
> Patch 2/5 is rather on the big side, but since it changes the contents and
>  semantics of a fairly central data structure, I'm not really sure how to go
>  about splitting it up further without producing broken intermediate states.
> 
> With the changes in patch 5/5, all tools/testing/selftests/bpf/test_verifier
>  tests pass.

Edward, I haven't had a chance to review this yet, but I wanted to thank you
for working on this.

I will find some time to test your work on sparc too.

Thanks again!

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH net-next 3/5] bpf/verifier: feed pointer-to-unknown-scalar casts into scalar ALU path
@ 2017-06-08 21:17                 ` Alexei Starovoitov via iovisor-dev
  0 siblings, 0 replies; 45+ messages in thread
From: Alexei Starovoitov @ 2017-06-08 21:17 UTC (permalink / raw)
  To: Edward Cree
  Cc: davem, Alexei Starovoitov, Daniel Borkmann, netdev, iovisor-dev, LKML

On Thu, Jun 08, 2017 at 08:07:53PM +0100, Edward Cree wrote:
> On 08/06/17 19:41, Alexei Starovoitov wrote:
> > On Thu, Jun 08, 2017 at 06:12:39PM +0100, Edward Cree wrote:
> >> On 08/06/17 17:50, Alexei Starovoitov wrote:
> >>> On Thu, Jun 08, 2017 at 04:25:39PM +0100, Edward Cree wrote:
> >>>> On 08/06/17 03:35, Alexei Starovoitov wrote:
> >>>>> such a large back-and-forth move doesn't help reviewing.
> >>>>> maybe just merge it into the previous patch?
> >>>>> Or keep that function in the right place in patch 2 already?
> >>>> I think 'diff' got a bit confused, and maybe with different options I could
> >>>>  have got it to produce something more readable.  But I think I will just
> >>>>  merge this into patch 2; it's only separate because it started out as an
> >>>>  experiment.
> >>> after sleeping on it I'm not sure we should be allowing such pointer
> >>> arithmetic. In normal C code people do fancy tricks with lower 3 bits
> >>> of the pointer, but in bpf code I cannot see such use case.
> >>> What kind of realistic code will be doing ptr & 0x40 ?
> >> Well, I didn't support it because I saw a use case.  I supported it because
> >>  it seemed easy to do and the code came out reasonably elegant-looking.
> >> Since this is guarded by env->allow_ptr_leaks, I can't see any reason _not_
> >>  to let people try fancy tricks with the low bits of pointers.
> >> I agree ptr & 0x40 is a crazy thing with no imaginable use case, but...
> >> "Unix was not designed to stop its users from doing stupid things, as that
> >>  would also stop them from doing clever things." ;-)
> > well, I agree with the philosophy :) but I also see a few reasons not to allow it:
> > 1. it immediately becomes uapi, and if we later find out that it's preventing us
> > from doing something we really need, we'll be stuck looking for a workaround
> What could it prevent us from doing, though?  It's basically equivalent to giving
>  BPF an opcode that casts a pointer to a u64, which of course is only allowed if
>  allow_ptr_leaks is true.  And since we don't feed any knowledge about the pointer
>  into the verifier, it's just like any other way of filling a register with
>  arbitrary, unknown bits.
> I can fully appreciate why you're being cautious, what with uapi and all.  But I
>  don't think there's any actual problem here.  Open to being convinced, though.

The leaking is not a concern. The concern is that once we start accepting a certain
class of programs, we need to keep accepting them in the future.
Another reason is that 'ptr & mask' could simply be a bug, and rejecting it
is supposed to help users find issues sooner...
but I don't have a strong opinion here.

> > 2. it's the same pruning concern. it probably doesn't fully apply here, but
> > the reason we don't track 'if (reg == 1) ...'
> Don't we though?
> http://elixir.free-electrons.com/linux/v4.12-rc4/source/kernel/bpf/verifier.c#L2127
> > is that if we mark that
> > register as a known const_imm in the true branch, it will screw up
> > pruning quite badly. It's trivial to track and may seem useful,
> > but it hurts instead.
> (Thinking out loud...)
> 
> What would be really nice is a way to propagate limits backwards as well as
>  forwards, so that the verifier can say "when I tested this branch, I used
>  this part of the state, I read four bytes past this pointer".  Then when it
>  wants to prune, it can say "well, the state this time isn't as strong, but
>  it still satisfies everything I actually used".
> But that sounds like it would be very hard indeed to do.

that's more or less what I'm trying to do: liveness info per basic block
will trim the state.
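Roughly, per-block liveness means the pruning comparison can skip registers that are never read again from that point. A hypothetical sketch: `struct vstate`, `states_equal_live()` and the `live_mask` bitmask are invented for illustration, and real liveness tracking would also have to cover stack slots.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_REGS 11	/* r0..r10, as in eBPF */

struct urange {
	uint64_t umin, umax;
};

struct vstate {
	struct urange reg[NR_REGS];
};

/* Prune if, for every register live at this instruction, the old
 * (already verified) range contains the current one. Dead registers are
 * skipped, so differences in them no longer defeat pruning.
 */
static bool states_equal_live(const struct vstate *old, const struct vstate *cur,
			      uint16_t live_mask)
{
	for (int i = 0; i < NR_REGS; i++) {
		if (!(live_mask & (1u << i)))
			continue;
		if (old->reg[i].umin > cur->reg[i].umin ||
		    cur->reg[i].umax > old->reg[i].umax)
			return false;
	}
	return true;
}
```

A register refined on one path (say r1 == 1) only blocks pruning while it is live; once the remaining code never reads r1, the two states compare equal.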

> Maybe with the basic-block DAG stuff David's been talking about, we could
>  find all the paths that reach a block, and take the union of their states,
>  and then run through the block feeding it that combined state.  But that
>  could reject code that relies on correlation of the state (i.e. if r1 != 0
>  then r2 is valid ptr I can access, etc) so would still need the 'walk with
>  each individual state' as a fallback.  Though at least you'd have all the
>  states at once so you could find out which ones were subsumed, instead of
>  hoping you get to them in the right order.

I think it's important to optimize verification speed for good programs.
If a bad program takes slightly longer, that's not a big deal. Right now we have
a global lock which needs to go away, but that's a minor fix.
In that sense I see that combining the state can help find bad programs
sooner, but I don't see how it helps good programs.
Also we already have programs like:
if (...) {
  var1 = ptr
  var2 = size
} else {
  var1 = different ptr
  var2 = different size
}
call_helper(...var1, var2)
So the state needs to be considered as a whole; we cannot just mix and match.
Initially I was thinking of building Use/Def chains for all operands
of loads, stores and calls, and following them from each Use to all of its
Defs recursively to determine validity, but the above use case breaks that.
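This example is also where the "union of states" idea would over-reject. A hypothetical sketch: each path is reduced to the size of the buffer var1 points to and the size var2 passed with it; joining the paths into ranges keeps the bounds but loses which buffer goes with which size. `struct path_state`, `path_ok()` and `joined_ok()` are invented names.

```c
#include <stdbool.h>
#include <stdint.h>

/* One path's state at the helper call: var1 points to 'mem_size' bytes,
 * var2 is the access size passed alongside it.
 */
struct path_state {
	uint32_t mem_size;
	uint32_t access;
};

/* Per-path check: each path is judged on its own pairing. */
static bool path_ok(struct path_state s)
{
	return s.access <= s.mem_size;
}

/* Joined check: only ranges survive, so the smallest buffer must be
 * assumed to meet the largest access.
 */
static bool joined_ok(const struct path_state *s, int n)
{
	uint32_t min_mem = UINT32_MAX, max_access = 0;

	for (int i = 0; i < n; i++) {
		if (s[i].mem_size < min_mem)
			min_mem = s[i].mem_size;
		if (s[i].access > max_access)
			max_access = s[i].access;
	}
	return max_access <= min_mem;
}
```

With paths {16-byte buffer, size 16} and {64-byte buffer, size 64}, each path is fine on its own, but the join has to assume a 64-byte access into a 16-byte buffer, which is why a per-state walk would still be needed as a fallback.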

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH net-next 2/5] bpf/verifier: rework value tracking
@ 2017-06-08 21:20             ` Alexei Starovoitov via iovisor-dev
  0 siblings, 0 replies; 45+ messages in thread
From: Alexei Starovoitov @ 2017-06-08 21:20 UTC (permalink / raw)
  To: Edward Cree
  Cc: davem, Alexei Starovoitov, Daniel Borkmann, netdev, iovisor-dev, LKML

On Thu, Jun 08, 2017 at 08:38:29PM +0100, Edward Cree wrote:
> On 08/06/17 17:45, Alexei Starovoitov wrote:
> > On Thu, Jun 08, 2017 at 03:53:36PM +0100, Edward Cree wrote:
> >>>>  
> >>>> -	} else if (reg->type == FRAME_PTR || reg->type == PTR_TO_STACK) {
> >>>> +	} else if (reg->type == PTR_TO_STACK) {
> >>>> +		/* stack accesses must be at a fixed offset, so that we can
> >>>> +		 * determine what type of data were returned.
> >>>> +		 */
> >>>> +		if (reg->align.mask) {
> >>>> +			char tn_buf[48];
> >>>> +
> >>>> +			tn_strn(tn_buf, sizeof(tn_buf), reg->align);
> >>>> +			verbose("variable stack access align=%s off=%d size=%d",
> >>>> +				tn_buf, off, size);
> >>>> +			return -EACCES;
> >>> hmm. why this restriction?
> >>> I thought one of the key points of the diff was that ptr+var tracking logic
> >>> will now apply not only to map_value, but to stack_ptr as well?
> >> As the comment above it says, we need to determine what was returned:
> >>  was it STACK_MISC or STACK_SPILL, and if the latter, what kind of pointer
> >>  was spilled there?  See check_stack_read(), which I should probably
> >>  mention in the comment.
> > this piece of code handles not only spill/fill, but also normal ldx/stx stack access.
> > Consider the frequent pattern that many folks tried to do:
> > bpf_prog()
> > {
> >   char buf[64];
> >   int len;
> >
> >   bpf_probe_read(&len, sizeof(len), kernel_ptr_to_filename_len);
> >   bpf_probe_read(buf, sizeof(buf), kernel_ptr_to_filename);
> >   buf[len & (sizeof(buf) - 1)] = 0;
> > ...
> >
> > currently the above is not supported, but when 'buf' is a pointer to a map value
> > it works fine. Allocating an extra bpf map just for such a workaround
> > isn't nice, and since this patch generalized map_value_adj with ptr_to_stack
> > we can support the above code too.
> > We can check that all bytes of stack for this variable access were
> > initialized already.
> > In the example above it will happen by bpf_probe_read (in the verifier code):
> >         for (i = 0; i < meta.access_size; i++) {
> >                 err = check_mem_access(env, meta.regno, i, BPF_B, BPF_WRITE, -1);
> > so at the time of
> >   buf[len & ..] = 0
> > we can check that 'stx' is within the range of inited stack and allow it.
> Yes, we could check every byte of the stack within the range [buf, buf+63]
>  is a STACK_MISC and if so allow it.  But since this is not supported by the
>  existing code (so it's not a regression), I'd prefer to leave that for a
>  future patch - this one is quite big enough already ;-)

of course! just exploring.
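For illustration only, the per-byte check Edward describes (allow a variable-offset stack write only if every byte it could reach is already STACK_MISC) might look roughly like the standalone C sketch below. It is not verifier code: the enum, `var_stack_access_ok()` and the flat, non-negative stack indexing are hypothetical simplifications, and the variable offset is modelled as a (value, mask) tristate number whose mask bits are unknown. For `len & (sizeof(buf) - 1)` with an unknown `len`, that pair is {0, 63}, so the reachable range is [buf, buf+63].

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-byte stack state, loosely modelled on the
 * verifier's STACK_MISC / STACK_SPILL distinction. */
enum slot_type { SLOT_INVALID = 0, SLOT_MISC, SLOT_SPILL };

/* Tristate number: bits set in 'mask' are unknown, the rest are 'value'. */
struct tnum {
	uint64_t value;
	uint64_t mask;
};

/* Return true if a 'size'-byte write at stack[base + var_off] can only
 * touch bytes that are already SLOT_MISC (initialised, not a spill).
 * Offsets here are non-negative indices into a flat array, not the
 * negative frame-pointer-relative offsets the verifier uses. */
static bool var_stack_access_ok(const enum slot_type *stack, int stack_size,
				int base, struct tnum var_off, int size)
{
	uint64_t min, max, i;

	assert((var_off.value & var_off.mask) == 0); /* well-formed tnum */
	min = var_off.value;                /* all unknown bits = 0 */
	max = var_off.value | var_off.mask; /* all unknown bits = 1 */

	if (stack_size < 0 || base < 0 || size <= 0 ||
	    max > (uint64_t)stack_size)
		return false;
	if ((uint64_t)base + max + (uint64_t)size > (uint64_t)stack_size)
		return false;

	/* Every byte in [base + min, base + max + size) must be initialised. */
	for (i = (uint64_t)base + min;
	     i < (uint64_t)base + max + (uint64_t)size; i++)
		if (stack[i] != SLOT_MISC)
			return false;
	return true;
}
```

With `buf` occupying indices 8..71 and fully written, a 1-byte store with offset {0, 63} is accepted; an unmasked `len` ({0, ~0}) is rejected because its maximum falls outside the stack.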

> >>>> +	if (!err && size < BPF_REG_SIZE && value_regno >= 0 && t == BPF_READ &&
> >>>> +	    state->regs[value_regno].type == SCALAR_VALUE) {
> >>>> +		/* b/h/w load zero-extends, mark upper bits as known 0 */
> >>>> +		state->regs[value_regno].align.value &= (1ULL << (size * 8)) - 1;
> >>>> +		state->regs[value_regno].align.mask &= (1ULL << (size * 8)) - 1;
> >>> probably another helper from tnum.h is needed.
> >> I could rewrite as
> >>  reg->align = tn_and(reg->align, tn_const((1ULL << (size * 8)) - 1))
> > yep. that's perfect.
> In the end I settled on adding a helper
>     struct tnum tnum_cast(struct tnum a, u8 size);
>  since I have a bunch of other places that cast things to 32 bits.

sounds good to me
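A minimal sketch of what such a helper could look like, assuming a tnum is a (value, mask) pair with mask bits unknown. The names `tn_const`/`tn_and` follow the RFC code quoted above and `tnum_cast` follows Edward's stated signature, but the bodies here are an illustrative guess, not the patch's implementation. The `size >= 8` guard is an addition to avoid shifting a 64-bit value by 64, which is undefined in C.

```c
#include <assert.h>
#include <stdint.h>

struct tnum {
	uint64_t value; /* known bit values */
	uint64_t mask;  /* 1 = bit unknown */
};

/* A constant: every bit known. */
static struct tnum tn_const(uint64_t v)
{
	struct tnum t = { v, 0 };
	return t;
}

/* AND of two tnums: a result bit is known 1 if known 1 in both inputs,
 * unknown if it may be 1 in both but is not known 1, else known 0. */
static struct tnum tn_and(struct tnum a, struct tnum b)
{
	uint64_t alpha = a.value | a.mask; /* bits that may be 1 in a */
	uint64_t beta = b.value | b.mask;  /* bits that may be 1 in b */
	uint64_t v = a.value & b.value;    /* bits known 1 in both */
	struct tnum t = { v, alpha & beta & ~v };
	return t;
}

/* Keep only the low 'size' bytes, as a zero-extending b/h/w load does. */
static struct tnum tnum_cast(struct tnum a, uint8_t size)
{
	assert((a.value & a.mask) == 0); /* well-formed tnum */
	if (size >= 8)
		return a;
	a.value &= (1ULL << (size * 8)) - 1;
	a.mask &= (1ULL << (size * 8)) - 1;
	return a;
}
```

For a well-formed tnum, `tnum_cast(a, size)` gives the same result as `tn_and(a, tn_const((1ULL << (size * 8)) - 1))`, which is why the two spellings discussed above are interchangeable.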

> > I see. Maybe print the verifier state in such warn_ons and make the error
> > more human-readable?
> Good idea, I'll do that.
> >>>> +	case PTR_TO_MAP_VALUE_OR_NULL:
> >>> does this new state comparison logic help? Do you have any numbers before/after in the number of insns it had to process for the tests in selftests?
> >> I don't have the numbers, no (I'll try to collect them).  This rewrite was
> > Thanks. The main concern is that right now some complex programs
> > that cilium is using are close to the verifier complexity limit and these
> > big changes to the amount of info recognized by the verifier can cause pruning
> > to be ineffective, so we need to test on big programs.
> > I think Daniel will be happy to test your next rev of the patches.
> > I'll test them as well.
> > At least 'insn_processed' from C code in tools/testing/selftests/bpf/
> > is a good estimate of how these changes affect pruning.
> It looks like the only place this gets recorded is as "processed %d insns"
>  in the log_buf.  Is there a convenient way to get at this, or am I going
>  to have to make bpf_verify_program grovel through the log sscanf()ing for
>  a matching line?

typically we just run the tests with a hacked log_level and grep;
Dave did something similar in test_align.c.
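If one did want to do this in C rather than with grep, a minimal sketch might scan the captured log text for the last "processed N insns" line with sscanf(). The function name is hypothetical; it assumes the log has already been captured into a NUL-terminated buffer (e.g. the log_buf passed when loading the program) and that the summary line keeps the "processed %d insns" prefix quoted above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the count from the last
 * "processed N insns" line in a verifier log, or -1 if none is found. */
static int parse_insns_processed(const char *log)
{
	const char *p = log;
	int last = -1;

	assert(log != NULL);
	while ((p = strstr(p, "processed ")) != NULL) {
		int n, consumed = 0;

		/* %n is only stored if the literal " insns" also matched. */
		if (sscanf(p, "processed %d insns%n", &n, &consumed) == 1 &&
		    consumed > 0)
			last = n;
		p += strlen("processed ");
	}
	return last;
}
```

Using `%n` guards against a bare "processed <number>" elsewhere in the log being counted, since sscanf() returns 1 as soon as `%d` converts even if the trailing literal fails.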

* Re: [RFC PATCH net-next 2/5] bpf/verifier: rework value tracking
  2017-06-08 16:45         ` Alexei Starovoitov via iovisor-dev
  (?)
  (?)
@ 2017-06-09 13:25         ` Daniel Borkmann
  -1 siblings, 0 replies; 45+ messages in thread
From: Daniel Borkmann @ 2017-06-09 13:25 UTC (permalink / raw)
  To: Alexei Starovoitov, Edward Cree
  Cc: davem, Alexei Starovoitov, netdev, iovisor-dev, LKML

On 06/08/2017 06:45 PM, Alexei Starovoitov wrote:
[...]
> I think Daniel will be happy to test your next rev of the patches.
> I'll test them as well.
> At least 'insn_processed' from C code in tools/testing/selftests/bpf/
> is a good estimate of how these changes affect pruning.

Without having looked more deeply (yet), I ran a couple of tests with
the cilium test suite to track complexity. Overall, the programs load
with the set applied; the worst-case increase I've seen for some of the
current progs was ~80%, from ~33k to ~60k insns. I will still go over
the code for an initial review either today or tomorrow.
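As a quick check, the approximate figures above are consistent with the stated increase:

```latex
\frac{60\,000 - 33\,000}{33\,000} = \frac{27\,000}{33\,000} \approx 0.82
```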

> btw, I'm working on bpf_call support and also refactoring verifier
> quite a bit, but my stuff is far from ready and I'll wait for
> your rewrite to land first.
> One of the things I'm working on is trying to get rid of state pruning
> heuristics and use register+stack liveness information instead.
> It's all experimental so far.

Thanks,
Daniel

Thread overview: 45+ messages
2017-06-07 14:55 [RFC PATCH net-next 0/5] bpf: rewrite value tracking in verifier Edward Cree
2017-06-07 14:55 ` Edward Cree via iovisor-dev
2017-06-07 14:58 ` [RFC PATCH net-next 1/5] selftests/bpf: add test for mixed signed and unsigned bounds checks Edward Cree
2017-06-07 14:58   ` Edward Cree via iovisor-dev
2017-06-07 14:58 ` [RFC PATCH net-next 2/5] bpf/verifier: rework value tracking Edward Cree
2017-06-07 14:58   ` Edward Cree via iovisor-dev
2017-06-08  2:32   ` Alexei Starovoitov
2017-06-08  2:32     ` Alexei Starovoitov via iovisor-dev
2017-06-08 14:53     ` Edward Cree
2017-06-08 14:53       ` Edward Cree via iovisor-dev
2017-06-08 16:45       ` Alexei Starovoitov
2017-06-08 16:45         ` Alexei Starovoitov via iovisor-dev
2017-06-08 19:38         ` Edward Cree
2017-06-08 19:38           ` Edward Cree via iovisor-dev
2017-06-08 21:20           ` Alexei Starovoitov
2017-06-08 21:20             ` Alexei Starovoitov via iovisor-dev
2017-06-09 13:25         ` Daniel Borkmann
2017-06-07 14:58 ` [RFC PATCH net-next 3/5] bpf/verifier: feed pointer-to-unknown-scalar casts into scalar ALU path Edward Cree
2017-06-07 14:58   ` Edward Cree via iovisor-dev
2017-06-08  2:35   ` Alexei Starovoitov
2017-06-08  2:35     ` Alexei Starovoitov via iovisor-dev
2017-06-08 15:25     ` Edward Cree
2017-06-08 15:25       ` Edward Cree via iovisor-dev
2017-06-08 16:50       ` Alexei Starovoitov
2017-06-08 17:12         ` Edward Cree
2017-06-08 17:12           ` Edward Cree via iovisor-dev
2017-06-08 18:41           ` Alexei Starovoitov
2017-06-08 18:41             ` Alexei Starovoitov via iovisor-dev
2017-06-08 19:07             ` Edward Cree
2017-06-08 19:07               ` Edward Cree via iovisor-dev
2017-06-08 21:17               ` Alexei Starovoitov
2017-06-08 21:17                 ` Alexei Starovoitov via iovisor-dev
2017-06-07 14:59 ` [RFC PATCH net-next 4/5] bpf/verifier: track signed and unsigned min/max values Edward Cree
2017-06-08  2:40   ` Alexei Starovoitov
2017-06-08  2:40     ` Alexei Starovoitov via iovisor-dev
2017-06-08 15:23     ` Edward Cree
2017-06-08 15:23       ` Edward Cree via iovisor-dev
2017-06-08 16:47       ` Alexei Starovoitov
2017-06-07 15:00 ` [RFC PATCH net-next 5/5] selftests/bpf: change test_verifier expectations Edward Cree
2017-06-07 15:00   ` Edward Cree via iovisor-dev
2017-06-08  2:43   ` Alexei Starovoitov
2017-06-08  2:43     ` Alexei Starovoitov via iovisor-dev
2017-06-08 15:27     ` Edward Cree
2017-06-08 15:27       ` Edward Cree via iovisor-dev
2017-06-08 20:18 ` [RFC PATCH net-next 0/5] bpf: rewrite value tracking in verifier David Miller
