bpf.vger.kernel.org archive mirror
* [RFC bpf-next 0/5] Support for BPF_ST instruction in LLVM C compiler
@ 2022-12-31 16:31 Eduard Zingerman
  2022-12-31 16:31 ` [RFC bpf-next 1/5] bpf: more precise stack write reasoning for BPF_ST instruction Eduard Zingerman
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: Eduard Zingerman @ 2022-12-31 16:31 UTC (permalink / raw)
  To: bpf, ast; +Cc: andrii, daniel, kernel-team, yhs, Eduard Zingerman

BPF has two documented (non-atomic) memory store instructions:

BPF_STX: *(size *) (dst_reg + off) = src_reg
BPF_ST : *(size *) (dst_reg + off) = imm32

Currently the LLVM BPF back-end does not emit the BPF_ST instruction
and does not allow it to be specified in inline assembly.
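
For illustration, with BPF_ST available a store of an immediate becomes
a single instruction. In inline assembly it could be written roughly as
below (a sketch, assuming the asm syntax added by the LLVM patch [1]):

  asm volatile ("*(u64 *)(r10 - 8) = 42;" ::: "memory");

Without BPF_ST the same store takes two instructions: a move of the
immediate into a scratch register followed by a BPF_STX.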

Recently I've been exploring ways to port some of the verifier test
cases from tools/testing/selftests/bpf/verifier/*.c to use inline
assembly and the machinery provided in
tools/testing/selftests/bpf/test_loader.c (which should hopefully
simplify test maintenance). The BPF_ST instruction is popular in these
tests: it is used in 52 of 94 files.
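
A ported test could look roughly like the sketch below (a hypothetical
test body; it assumes the bpf_misc.h-style annotations used with
test_loader.c and an assembler that accepts the BPF_ST form):

  SEC("socket")
  __success
  __naked void bpf_st_mem_example(void)
  {
          asm volatile (
                  "*(u64 *)(r10 - 8) = 42;"   /* single BPF_ST insn */
                  "r0 = *(u64 *)(r10 - 8);"
                  "exit;"
                  ::: "memory");
  }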

While it is possible to adjust LLVM to support BPF_ST only in inline
assembly blocks, that seems a bit wasteful. Instead, this patch-set
contains the verifier changes necessary when LLVM is allowed to freely
emit BPF_ST instructions (the source code is available here [1]).
The changes include:
 - update to the verifier.c:check_stack_write_*() functions to track
   constant values spilled to the stack via the BPF_ST instruction in
   the same way stack spills of known registers by BPF_STX are tracked;
 - updates to verifier.c:convert_ctx_access() and its callbacks to
   handle the BPF_ST instruction in a way similar to BPF_STX;
 - some test adjustments and a few new tests.

With the above changes applied and the LLVM from [1], all
test_verifier, test_maps, test_progs and test_progs-no_alu32 test
cases pass.

When built using the LLVM version from [1], BPF programs generated for
the selftests and for Cilium (taken from [2]) see a certain reduction
in size; e.g. below are the total instruction counts for files with
over 5K instructions:

File                                    Insns   Insns   Insns   Diff
                                        w/o     with    diff    pct
                                        BPF_ST  BPF_ST
cilium/bpf_host.o                       44620   43774   -846    -1.90%
cilium/bpf_lxc.o                        36842   36060   -782    -2.12%
cilium/bpf_overlay.o                    23557   23186   -371    -1.57%
cilium/bpf_xdp.o                        26397   25931   -466    -1.77%
selftests/core_kern.bpf.o               12359   12359    0       0.00%
selftests/linked_list_fail.bpf.o        5501    5302    -199    -3.62%
selftests/profiler1.bpf.o               17828   17709   -119    -0.67%
selftests/pyperf100.bpf.o               49793   49268   -525    -1.05%
selftests/pyperf180.bpf.o               88738   87813   -925    -1.04%
selftests/pyperf50.bpf.o                25388   25113   -275    -1.08%
selftests/pyperf600.bpf.o               78330   78300   -30     -0.04%
selftests/pyperf_global.bpf.o           5244    5188    -56     -1.07%
selftests/pyperf_subprogs.bpf.o         5262    5192    -70     -1.33%
selftests/strobemeta.bpf.o              17154   16065   -1089   -6.35%
selftests/test_verif_scale2.bpf.o       11337   11337    0       0.00%

(Instructions are counted as the number of instruction lines in the
 disassembly.)

Is the community interested in this work?
Are there any omissions in my changes to the verifier?

Known issue:

There are two programs (one from Cilium, one from the selftests) that
exhibit an anomalous increase in the number of instructions processed
by the verifier (and thus verification time) with this patch-set:

 File                 Program                        Insns (A)  Insns (B)  Insns   (DIFF)
 -------------------  -----------------------------  ---------  ---------  --------------
 bpf_host.o           tail_ipv6_host_policy_ingress       1576       2403  +827 (+52.47%)
 map_kptr.bpf.o       test_map_kptr                        400        475   +75 (+18.75%)
 -------------------  -----------------------------  ---------  ---------  --------------

These are under investigation.

Thanks,
Eduard

[1] https://reviews.llvm.org/D140804
[2] git@github.com:anakryiko/cilium.git

Eduard Zingerman (5):
  bpf: more precise stack write reasoning for BPF_ST instruction
  selftests/bpf: check if verifier tracks constants spilled by
    BPF_ST_MEM
  bpf: allow ctx writes using BPF_ST_MEM instruction
  selftests/bpf: test if pointer type is tracked for BPF_ST_MEM
  selftests/bpf: don't match exact insn index in expected error message

 kernel/bpf/cgroup.c                           |  49 +++++---
 kernel/bpf/verifier.c                         | 102 +++++++++-------
 net/core/filter.c                             |  72 ++++++------
 .../selftests/bpf/prog_tests/log_fixup.c      |   2 +-
 .../selftests/bpf/prog_tests/spin_lock.c      |   8 +-
 .../bpf/verifier/bounds_mix_sign_unsign.c     | 110 ++++++++++--------
 .../selftests/bpf/verifier/bpf_st_mem.c       |  29 +++++
 tools/testing/selftests/bpf/verifier/ctx.c    |  11 --
 tools/testing/selftests/bpf/verifier/unpriv.c |  23 ++++
 9 files changed, 249 insertions(+), 157 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/verifier/bpf_st_mem.c

-- 
2.39.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [RFC bpf-next 1/5] bpf: more precise stack write reasoning for BPF_ST instruction
  2022-12-31 16:31 [RFC bpf-next 0/5] Support for BPF_ST instruction in LLVM C compiler Eduard Zingerman
@ 2022-12-31 16:31 ` Eduard Zingerman
  2022-12-31 16:31 ` [RFC bpf-next 2/5] selftests/bpf: check if verifier tracks constants spilled by BPF_ST_MEM Eduard Zingerman
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Eduard Zingerman @ 2022-12-31 16:31 UTC (permalink / raw)
  To: bpf, ast; +Cc: andrii, daniel, kernel-team, yhs, Eduard Zingerman

For aligned stack writes using the BPF_ST instruction, track stored
values in the same way BPF_STX is handled, i.e. make sure that the
following fragments produce equivalent verifier knowledge:

  fp[-8] = 42;             r1 = 42;
                           fp[-8] = r1;

This covers two cases:
 - non-zero values written to the stack are stored as spills of fake
   registers with a known value;
 - zero values written to the stack are stored as STACK_ZERO marks.

Previously both cases above used STACK_MISC marks instead.
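
For example (a sketch in the macro form used by the selftests; both
stores below previously produced only STACK_MISC marks):

  BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 42), /* fp[-8] spilled as known scalar 42 */
  BPF_ST_MEM(BPF_DW, BPF_REG_10, -16, 0), /* fp[-16] marked STACK_ZERO */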

Some verifier test cases relied on the old logic to obtain STACK_MISC
marks for some stack values. These test cases are updated in the same
commit to avoid failures during bisect.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
---
 kernel/bpf/verifier.c                         |  23 +++-
 .../bpf/verifier/bounds_mix_sign_unsign.c     | 110 ++++++++++--------
 2 files changed, 84 insertions(+), 49 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 4a25375ebb0d..585edea642e1 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3257,6 +3257,11 @@ static void save_register_state(struct bpf_func_state *state,
 		scrub_spilled_slot(&state->stack[spi].slot_type[i - 1]);
 }
 
+static bool is_bpf_st_mem(struct bpf_insn *insn)
+{
+	return BPF_CLASS(insn->code) == BPF_ST && BPF_MODE(insn->code) == BPF_MEM;
+}
+
 /* check_stack_{read,write}_fixed_off functions track spill/fill of registers,
  * stack boundary and alignment are checked in check_mem_access()
  */
@@ -3268,8 +3273,9 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
 {
 	struct bpf_func_state *cur; /* state of the current function */
 	int i, slot = -off - 1, spi = slot / BPF_REG_SIZE, err;
-	u32 dst_reg = env->prog->insnsi[insn_idx].dst_reg;
+	struct bpf_insn *insn = &env->prog->insnsi[insn_idx];
 	struct bpf_reg_state *reg = NULL;
+	u32 dst_reg = insn->dst_reg;
 
 	err = grow_stack_state(state, round_up(slot + 1, BPF_REG_SIZE));
 	if (err)
@@ -3316,6 +3322,14 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
 				return err;
 		}
 		save_register_state(state, spi, reg, size);
+	} else if (!reg && !(off % BPF_REG_SIZE) && is_bpf_st_mem(insn) &&
+		   insn->imm != 0 && env->bpf_capable) {
+		struct bpf_reg_state fake_reg = {};
+
+		__mark_reg_known(&fake_reg, (u32)insn->imm);
+		fake_reg.type = SCALAR_VALUE;
+		fake_reg.parent = state->stack[spi].spilled_ptr.parent;
+		save_register_state(state, spi, &fake_reg, size);
 	} else if (reg && is_spillable_regtype(reg->type)) {
 		/* register containing pointer is being spilled into stack */
 		if (size != BPF_REG_SIZE) {
@@ -3350,7 +3364,8 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
 			state->stack[spi].spilled_ptr.live |= REG_LIVE_WRITTEN;
 
 		/* when we zero initialize stack slots mark them as such */
-		if (reg && register_is_null(reg)) {
+		if ((reg && register_is_null(reg)) ||
+		    (!reg && is_bpf_st_mem(insn) && insn->imm == 0)) {
 			/* backtracking doesn't work for STACK_ZERO yet. */
 			err = mark_chain_precision(env, value_regno);
 			if (err)
@@ -3395,6 +3410,7 @@ static int check_stack_write_var_off(struct bpf_verifier_env *env,
 	int min_off, max_off;
 	int i, err;
 	struct bpf_reg_state *ptr_reg = NULL, *value_reg = NULL;
+	struct bpf_insn *insn = &env->prog->insnsi[insn_idx];
 	bool writing_zero = false;
 	/* set if the fact that we're writing a zero is used to let any
 	 * stack slots remain STACK_ZERO
@@ -3407,7 +3423,8 @@ static int check_stack_write_var_off(struct bpf_verifier_env *env,
 	max_off = ptr_reg->smax_value + off + size;
 	if (value_regno >= 0)
 		value_reg = &cur->regs[value_regno];
-	if (value_reg && register_is_null(value_reg))
+	if ((value_reg && register_is_null(value_reg)) ||
+	    (!value_reg && is_bpf_st_mem(insn) && insn->imm == 0))
 		writing_zero = true;
 
 	err = grow_stack_state(state, round_up(-min_off, BPF_REG_SIZE));
diff --git a/tools/testing/selftests/bpf/verifier/bounds_mix_sign_unsign.c b/tools/testing/selftests/bpf/verifier/bounds_mix_sign_unsign.c
index c2aa6f26738b..bf82b923c5fe 100644
--- a/tools/testing/selftests/bpf/verifier/bounds_mix_sign_unsign.c
+++ b/tools/testing/selftests/bpf/verifier/bounds_mix_sign_unsign.c
@@ -1,13 +1,14 @@
 {
 	"bounds checks mixing signed and unsigned, positive bounds",
 	.insns = {
+	BPF_EMIT_CALL(BPF_FUNC_ktime_get_ns),
+	BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -16),
 	BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
 	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
 	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
 	BPF_LD_MAP_FD(BPF_REG_1, 0),
 	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
-	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 7),
-	BPF_ST_MEM(BPF_DW, BPF_REG_10, -16, -8),
+	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 6),
 	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_10, -16),
 	BPF_MOV64_IMM(BPF_REG_2, 2),
 	BPF_JMP_REG(BPF_JGE, BPF_REG_2, BPF_REG_1, 3),
@@ -17,20 +18,21 @@
 	BPF_MOV64_IMM(BPF_REG_0, 0),
 	BPF_EXIT_INSN(),
 	},
-	.fixup_map_hash_8b = { 3 },
+	.fixup_map_hash_8b = { 5 },
 	.errstr = "unbounded min value",
 	.result = REJECT,
 },
 {
 	"bounds checks mixing signed and unsigned",
 	.insns = {
+	BPF_EMIT_CALL(BPF_FUNC_ktime_get_ns),
+	BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -16),
 	BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
 	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
 	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
 	BPF_LD_MAP_FD(BPF_REG_1, 0),
 	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
-	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 7),
-	BPF_ST_MEM(BPF_DW, BPF_REG_10, -16, -8),
+	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 6),
 	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_10, -16),
 	BPF_MOV64_IMM(BPF_REG_2, -1),
 	BPF_JMP_REG(BPF_JGT, BPF_REG_1, BPF_REG_2, 3),
@@ -40,20 +42,21 @@
 	BPF_MOV64_IMM(BPF_REG_0, 0),
 	BPF_EXIT_INSN(),
 	},
-	.fixup_map_hash_8b = { 3 },
+	.fixup_map_hash_8b = { 5 },
 	.errstr = "unbounded min value",
 	.result = REJECT,
 },
 {
 	"bounds checks mixing signed and unsigned, variant 2",
 	.insns = {
+	BPF_EMIT_CALL(BPF_FUNC_ktime_get_ns),
+	BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -16),
 	BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
 	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
 	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
 	BPF_LD_MAP_FD(BPF_REG_1, 0),
 	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
-	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 9),
-	BPF_ST_MEM(BPF_DW, BPF_REG_10, -16, -8),
+	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 8),
 	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_10, -16),
 	BPF_MOV64_IMM(BPF_REG_2, -1),
 	BPF_JMP_REG(BPF_JGT, BPF_REG_1, BPF_REG_2, 5),
@@ -65,20 +68,21 @@
 	BPF_MOV64_IMM(BPF_REG_0, 0),
 	BPF_EXIT_INSN(),
 	},
-	.fixup_map_hash_8b = { 3 },
+	.fixup_map_hash_8b = { 5 },
 	.errstr = "unbounded min value",
 	.result = REJECT,
 },
 {
 	"bounds checks mixing signed and unsigned, variant 3",
 	.insns = {
+	BPF_EMIT_CALL(BPF_FUNC_ktime_get_ns),
+	BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -16),
 	BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
 	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
 	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
 	BPF_LD_MAP_FD(BPF_REG_1, 0),
 	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
-	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 8),
-	BPF_ST_MEM(BPF_DW, BPF_REG_10, -16, -8),
+	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 7),
 	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_10, -16),
 	BPF_MOV64_IMM(BPF_REG_2, -1),
 	BPF_JMP_REG(BPF_JGT, BPF_REG_1, BPF_REG_2, 4),
@@ -89,20 +93,21 @@
 	BPF_MOV64_IMM(BPF_REG_0, 0),
 	BPF_EXIT_INSN(),
 	},
-	.fixup_map_hash_8b = { 3 },
+	.fixup_map_hash_8b = { 5 },
 	.errstr = "unbounded min value",
 	.result = REJECT,
 },
 {
 	"bounds checks mixing signed and unsigned, variant 4",
 	.insns = {
+	BPF_EMIT_CALL(BPF_FUNC_ktime_get_ns),
+	BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -16),
 	BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
 	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
 	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
 	BPF_LD_MAP_FD(BPF_REG_1, 0),
 	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
-	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 7),
-	BPF_ST_MEM(BPF_DW, BPF_REG_10, -16, -8),
+	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 6),
 	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_10, -16),
 	BPF_MOV64_IMM(BPF_REG_2, 1),
 	BPF_ALU64_REG(BPF_AND, BPF_REG_1, BPF_REG_2),
@@ -112,19 +117,20 @@
 	BPF_MOV64_IMM(BPF_REG_0, 0),
 	BPF_EXIT_INSN(),
 	},
-	.fixup_map_hash_8b = { 3 },
+	.fixup_map_hash_8b = { 5 },
 	.result = ACCEPT,
 },
 {
 	"bounds checks mixing signed and unsigned, variant 5",
 	.insns = {
+	BPF_EMIT_CALL(BPF_FUNC_ktime_get_ns),
+	BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -16),
 	BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
 	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
 	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
 	BPF_LD_MAP_FD(BPF_REG_1, 0),
 	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
-	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 9),
-	BPF_ST_MEM(BPF_DW, BPF_REG_10, -16, -8),
+	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 8),
 	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_10, -16),
 	BPF_MOV64_IMM(BPF_REG_2, -1),
 	BPF_JMP_REG(BPF_JGT, BPF_REG_1, BPF_REG_2, 5),
@@ -135,17 +141,20 @@
 	BPF_MOV64_IMM(BPF_REG_0, 0),
 	BPF_EXIT_INSN(),
 	},
-	.fixup_map_hash_8b = { 3 },
+	.fixup_map_hash_8b = { 5 },
 	.errstr = "unbounded min value",
 	.result = REJECT,
 },
 {
 	"bounds checks mixing signed and unsigned, variant 6",
 	.insns = {
+	BPF_MOV64_REG(BPF_REG_9, BPF_REG_1),
+	BPF_EMIT_CALL(BPF_FUNC_ktime_get_ns),
+	BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -16),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_9),
 	BPF_MOV64_IMM(BPF_REG_2, 0),
 	BPF_MOV64_REG(BPF_REG_3, BPF_REG_10),
 	BPF_ALU64_IMM(BPF_ADD, BPF_REG_3, -512),
-	BPF_ST_MEM(BPF_DW, BPF_REG_10, -16, -8),
 	BPF_LDX_MEM(BPF_DW, BPF_REG_4, BPF_REG_10, -16),
 	BPF_MOV64_IMM(BPF_REG_6, -1),
 	BPF_JMP_REG(BPF_JGT, BPF_REG_4, BPF_REG_6, 5),
@@ -163,13 +172,14 @@
 {
 	"bounds checks mixing signed and unsigned, variant 7",
 	.insns = {
+	BPF_EMIT_CALL(BPF_FUNC_ktime_get_ns),
+	BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -16),
 	BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
 	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
 	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
 	BPF_LD_MAP_FD(BPF_REG_1, 0),
 	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
-	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 7),
-	BPF_ST_MEM(BPF_DW, BPF_REG_10, -16, -8),
+	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 6),
 	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_10, -16),
 	BPF_MOV64_IMM(BPF_REG_2, 1024 * 1024 * 1024),
 	BPF_JMP_REG(BPF_JGT, BPF_REG_1, BPF_REG_2, 3),
@@ -179,19 +189,20 @@
 	BPF_MOV64_IMM(BPF_REG_0, 0),
 	BPF_EXIT_INSN(),
 	},
-	.fixup_map_hash_8b = { 3 },
+	.fixup_map_hash_8b = { 5 },
 	.result = ACCEPT,
 },
 {
 	"bounds checks mixing signed and unsigned, variant 8",
 	.insns = {
+	BPF_EMIT_CALL(BPF_FUNC_ktime_get_ns),
+	BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -16),
 	BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
 	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
 	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
 	BPF_LD_MAP_FD(BPF_REG_1, 0),
 	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
-	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 9),
-	BPF_ST_MEM(BPF_DW, BPF_REG_10, -16, -8),
+	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 8),
 	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_10, -16),
 	BPF_MOV64_IMM(BPF_REG_2, -1),
 	BPF_JMP_REG(BPF_JGT, BPF_REG_2, BPF_REG_1, 2),
@@ -203,20 +214,21 @@
 	BPF_MOV64_IMM(BPF_REG_0, 0),
 	BPF_EXIT_INSN(),
 	},
-	.fixup_map_hash_8b = { 3 },
+	.fixup_map_hash_8b = { 5 },
 	.errstr = "unbounded min value",
 	.result = REJECT,
 },
 {
 	"bounds checks mixing signed and unsigned, variant 9",
 	.insns = {
+	BPF_EMIT_CALL(BPF_FUNC_ktime_get_ns),
+	BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -16),
 	BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
 	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
 	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
 	BPF_LD_MAP_FD(BPF_REG_1, 0),
 	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
-	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 10),
-	BPF_ST_MEM(BPF_DW, BPF_REG_10, -16, -8),
+	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 9),
 	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_10, -16),
 	BPF_LD_IMM64(BPF_REG_2, -9223372036854775808ULL),
 	BPF_JMP_REG(BPF_JGT, BPF_REG_2, BPF_REG_1, 2),
@@ -228,19 +240,20 @@
 	BPF_MOV64_IMM(BPF_REG_0, 0),
 	BPF_EXIT_INSN(),
 	},
-	.fixup_map_hash_8b = { 3 },
+	.fixup_map_hash_8b = { 5 },
 	.result = ACCEPT,
 },
 {
 	"bounds checks mixing signed and unsigned, variant 10",
 	.insns = {
+	BPF_EMIT_CALL(BPF_FUNC_ktime_get_ns),
+	BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -16),
 	BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
 	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
 	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
 	BPF_LD_MAP_FD(BPF_REG_1, 0),
 	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
-	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 9),
-	BPF_ST_MEM(BPF_DW, BPF_REG_10, -16, -8),
+	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 8),
 	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_10, -16),
 	BPF_MOV64_IMM(BPF_REG_2, 0),
 	BPF_JMP_REG(BPF_JGT, BPF_REG_2, BPF_REG_1, 2),
@@ -252,20 +265,21 @@
 	BPF_MOV64_IMM(BPF_REG_0, 0),
 	BPF_EXIT_INSN(),
 	},
-	.fixup_map_hash_8b = { 3 },
+	.fixup_map_hash_8b = { 5 },
 	.errstr = "unbounded min value",
 	.result = REJECT,
 },
 {
 	"bounds checks mixing signed and unsigned, variant 11",
 	.insns = {
+	BPF_EMIT_CALL(BPF_FUNC_ktime_get_ns),
+	BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -16),
 	BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
 	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
 	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
 	BPF_LD_MAP_FD(BPF_REG_1, 0),
 	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
-	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 9),
-	BPF_ST_MEM(BPF_DW, BPF_REG_10, -16, -8),
+	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 8),
 	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_10, -16),
 	BPF_MOV64_IMM(BPF_REG_2, -1),
 	BPF_JMP_REG(BPF_JGE, BPF_REG_2, BPF_REG_1, 2),
@@ -278,20 +292,21 @@
 	BPF_MOV64_IMM(BPF_REG_0, 0),
 	BPF_EXIT_INSN(),
 	},
-	.fixup_map_hash_8b = { 3 },
+	.fixup_map_hash_8b = { 5 },
 	.errstr = "unbounded min value",
 	.result = REJECT,
 },
 {
 	"bounds checks mixing signed and unsigned, variant 12",
 	.insns = {
+	BPF_EMIT_CALL(BPF_FUNC_ktime_get_ns),
+	BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -16),
 	BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
 	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
 	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
 	BPF_LD_MAP_FD(BPF_REG_1, 0),
 	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
-	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 9),
-	BPF_ST_MEM(BPF_DW, BPF_REG_10, -16, -8),
+	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 8),
 	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_10, -16),
 	BPF_MOV64_IMM(BPF_REG_2, -6),
 	BPF_JMP_REG(BPF_JGE, BPF_REG_2, BPF_REG_1, 2),
@@ -303,20 +318,21 @@
 	BPF_MOV64_IMM(BPF_REG_0, 0),
 	BPF_EXIT_INSN(),
 	},
-	.fixup_map_hash_8b = { 3 },
+	.fixup_map_hash_8b = { 5 },
 	.errstr = "unbounded min value",
 	.result = REJECT,
 },
 {
 	"bounds checks mixing signed and unsigned, variant 13",
 	.insns = {
+	BPF_EMIT_CALL(BPF_FUNC_ktime_get_ns),
+	BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -16),
 	BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
 	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
 	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
 	BPF_LD_MAP_FD(BPF_REG_1, 0),
 	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
-	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 6),
-	BPF_ST_MEM(BPF_DW, BPF_REG_10, -16, -8),
+	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 5),
 	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_10, -16),
 	BPF_MOV64_IMM(BPF_REG_2, 2),
 	BPF_JMP_REG(BPF_JGE, BPF_REG_2, BPF_REG_1, 2),
@@ -331,7 +347,7 @@
 	BPF_MOV64_IMM(BPF_REG_0, 0),
 	BPF_EXIT_INSN(),
 	},
-	.fixup_map_hash_8b = { 3 },
+	.fixup_map_hash_8b = { 5 },
 	.errstr = "unbounded min value",
 	.result = REJECT,
 },
@@ -340,13 +356,14 @@
 	.insns = {
 	BPF_LDX_MEM(BPF_W, BPF_REG_9, BPF_REG_1,
 		    offsetof(struct __sk_buff, mark)),
+	BPF_EMIT_CALL(BPF_FUNC_ktime_get_ns),
+	BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -16),
 	BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
 	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
 	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
 	BPF_LD_MAP_FD(BPF_REG_1, 0),
 	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
-	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 8),
-	BPF_ST_MEM(BPF_DW, BPF_REG_10, -16, -8),
+	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 7),
 	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_10, -16),
 	BPF_MOV64_IMM(BPF_REG_2, -1),
 	BPF_MOV64_IMM(BPF_REG_8, 2),
@@ -360,20 +377,21 @@
 	BPF_JMP_REG(BPF_JGT, BPF_REG_1, BPF_REG_2, -3),
 	BPF_JMP_IMM(BPF_JA, 0, 0, -7),
 	},
-	.fixup_map_hash_8b = { 4 },
+	.fixup_map_hash_8b = { 6 },
 	.errstr = "unbounded min value",
 	.result = REJECT,
 },
 {
 	"bounds checks mixing signed and unsigned, variant 15",
 	.insns = {
+	BPF_EMIT_CALL(BPF_FUNC_ktime_get_ns),
+	BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -16),
 	BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
 	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
 	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
 	BPF_LD_MAP_FD(BPF_REG_1, 0),
 	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
-	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 4),
-	BPF_ST_MEM(BPF_DW, BPF_REG_10, -16, -8),
+	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 3),
 	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_10, -16),
 	BPF_MOV64_IMM(BPF_REG_2, -6),
 	BPF_JMP_REG(BPF_JGE, BPF_REG_2, BPF_REG_1, 2),
@@ -387,7 +405,7 @@
 	BPF_MOV64_IMM(BPF_REG_0, 0),
 	BPF_EXIT_INSN(),
 	},
-	.fixup_map_hash_8b = { 3 },
+	.fixup_map_hash_8b = { 5 },
 	.errstr = "unbounded min value",
 	.result = REJECT,
 },
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC bpf-next 2/5] selftests/bpf: check if verifier tracks constants spilled by BPF_ST_MEM
  2022-12-31 16:31 [RFC bpf-next 0/5] Support for BPF_ST instruction in LLVM C compiler Eduard Zingerman
  2022-12-31 16:31 ` [RFC bpf-next 1/5] bpf: more precise stack write reasoning for BPF_ST instruction Eduard Zingerman
@ 2022-12-31 16:31 ` Eduard Zingerman
  2022-12-31 16:31 ` [RFC bpf-next 3/5] bpf: allow ctx writes using BPF_ST_MEM instruction Eduard Zingerman
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Eduard Zingerman @ 2022-12-31 16:31 UTC (permalink / raw)
  To: bpf, ast; +Cc: andrii, daniel, kernel-team, yhs, Eduard Zingerman

Check that the verifier tracks the value of 'imm' spilled to the stack
by the BPF_ST_MEM instruction. Cover the following cases:
- write of a non-zero constant to the stack;
- write of a zero constant to the stack.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
---
 .../selftests/bpf/verifier/bpf_st_mem.c       | 29 +++++++++++++++++++
 1 file changed, 29 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/verifier/bpf_st_mem.c

diff --git a/tools/testing/selftests/bpf/verifier/bpf_st_mem.c b/tools/testing/selftests/bpf/verifier/bpf_st_mem.c
new file mode 100644
index 000000000000..d3aa293f1a9d
--- /dev/null
+++ b/tools/testing/selftests/bpf/verifier/bpf_st_mem.c
@@ -0,0 +1,29 @@
+{
+	"BPF_ST_MEM stack imm non-zero",
+	.insns = {
+	BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 42),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_10, -8),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, -42),
+	/* if value is tracked correctly R0 is zero */
+	BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+},
+{
+	"BPF_ST_MEM stack imm zero",
+	.insns = {
+	/* mark stack 0000 0000 */
+	BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+	/* read and sum a few bytes */
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_LDX_MEM(BPF_B, BPF_REG_1, BPF_REG_10, -8),
+	BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1),
+	BPF_LDX_MEM(BPF_B, BPF_REG_1, BPF_REG_10, -4),
+	BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1),
+	BPF_LDX_MEM(BPF_B, BPF_REG_1, BPF_REG_10, -1),
+	BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1),
+	/* if value is tracked correctly R0 is zero */
+	BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+},
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC bpf-next 3/5] bpf: allow ctx writes using BPF_ST_MEM instruction
  2022-12-31 16:31 [RFC bpf-next 0/5] Support for BPF_ST instruction in LLVM C compiler Eduard Zingerman
  2022-12-31 16:31 ` [RFC bpf-next 1/5] bpf: more precise stack write reasoning for BPF_ST instruction Eduard Zingerman
  2022-12-31 16:31 ` [RFC bpf-next 2/5] selftests/bpf: check if verifier tracks constants spilled by BPF_ST_MEM Eduard Zingerman
@ 2022-12-31 16:31 ` Eduard Zingerman
  2022-12-31 16:31 ` [RFC bpf-next 4/5] selftests/bpf: test if pointer type is tracked for BPF_ST_MEM Eduard Zingerman
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Eduard Zingerman @ 2022-12-31 16:31 UTC (permalink / raw)
  To: bpf, ast; +Cc: andrii, daniel, kernel-team, yhs, Eduard Zingerman

Lift the verifier restriction that forbids BPF_ST_MEM instructions
from writing to context data structures. This requires the following
changes (see the sketch after this list):
 - verifier.c:do_check() is updated for BPF_ST to:
   - no longer forbid writes to registers of type PTR_TO_CTX;
   - track the dst_reg type in the env->insn_aux_data[...].ptr_type
     field (the same way it is done for BPF_STX and BPF_LDX
     instructions).
 - verifier.c:convert_ctx_access() and the various callbacks invoked
   by it are updated to handle the BPF_ST instruction alongside
   BPF_STX.
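
As a sketch (the field and offset below are only illustrative), a
context write such as:

  BPF_ST_MEM(BPF_W, BPF_REG_1, offsetof(struct __sk_buff, mark), 7)

is now accepted and rewritten by convert_ctx_access() into a BPF_ST
store to the underlying kernel object, preserving insn->imm, roughly:

  BPF_RAW_INSN(BPF_ST | BPF_MEM | BPF_W, si->dst_reg, 0,
               offsetof(struct sk_buff, mark), 7)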

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
---
 kernel/bpf/cgroup.c                        | 49 ++++++++------
 kernel/bpf/verifier.c                      | 79 +++++++++++-----------
 net/core/filter.c                          | 72 ++++++++++----------
 tools/testing/selftests/bpf/verifier/ctx.c | 11 ---
 4 files changed, 108 insertions(+), 103 deletions(-)

diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index bf2fdb33fb31..a57f1b44dc6c 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -2223,10 +2223,12 @@ static u32 sysctl_convert_ctx_access(enum bpf_access_type type,
 				BPF_FIELD_SIZEOF(struct bpf_sysctl_kern, ppos),
 				treg, si->dst_reg,
 				offsetof(struct bpf_sysctl_kern, ppos));
-			*insn++ = BPF_STX_MEM(
-				BPF_SIZEOF(u32), treg, si->src_reg,
+			*insn++ = BPF_RAW_INSN(
+				BPF_CLASS(si->code) | BPF_MEM | BPF_SIZEOF(u32),
+				treg, si->src_reg,
 				bpf_ctx_narrow_access_offset(
-					0, sizeof(u32), sizeof(loff_t)));
+					0, sizeof(u32), sizeof(loff_t)),
+				si->imm);
 			*insn++ = BPF_LDX_MEM(
 				BPF_DW, treg, si->dst_reg,
 				offsetof(struct bpf_sysctl_kern, tmp_reg));
@@ -2376,10 +2378,17 @@ static bool cg_sockopt_is_valid_access(int off, int size,
 	return true;
 }
 
-#define CG_SOCKOPT_ACCESS_FIELD(T, F)					\
-	T(BPF_FIELD_SIZEOF(struct bpf_sockopt_kern, F),			\
-	  si->dst_reg, si->src_reg,					\
-	  offsetof(struct bpf_sockopt_kern, F))
+#define CG_SOCKOPT_READ_FIELD(F)					\
+	BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct bpf_sockopt_kern, F),	\
+		    si->dst_reg, si->src_reg,				\
+		    offsetof(struct bpf_sockopt_kern, F))
+
+#define CG_SOCKOPT_WRITE_FIELD(F)					\
+	BPF_RAW_INSN((BPF_FIELD_SIZEOF(struct bpf_sockopt_kern, F) |	\
+		      BPF_MEM | BPF_CLASS(si->code)),			\
+		     si->dst_reg, si->src_reg,				\
+		     offsetof(struct bpf_sockopt_kern, F),		\
+		     si->imm)
 
 static u32 cg_sockopt_convert_ctx_access(enum bpf_access_type type,
 					 const struct bpf_insn *si,
@@ -2391,25 +2400,25 @@ static u32 cg_sockopt_convert_ctx_access(enum bpf_access_type type,
 
 	switch (si->off) {
 	case offsetof(struct bpf_sockopt, sk):
-		*insn++ = CG_SOCKOPT_ACCESS_FIELD(BPF_LDX_MEM, sk);
+		*insn++ = CG_SOCKOPT_READ_FIELD(sk);
 		break;
 	case offsetof(struct bpf_sockopt, level):
 		if (type == BPF_WRITE)
-			*insn++ = CG_SOCKOPT_ACCESS_FIELD(BPF_STX_MEM, level);
+			*insn++ = CG_SOCKOPT_WRITE_FIELD(level);
 		else
-			*insn++ = CG_SOCKOPT_ACCESS_FIELD(BPF_LDX_MEM, level);
+			*insn++ = CG_SOCKOPT_READ_FIELD(level);
 		break;
 	case offsetof(struct bpf_sockopt, optname):
 		if (type == BPF_WRITE)
-			*insn++ = CG_SOCKOPT_ACCESS_FIELD(BPF_STX_MEM, optname);
+			*insn++ = CG_SOCKOPT_WRITE_FIELD(optname);
 		else
-			*insn++ = CG_SOCKOPT_ACCESS_FIELD(BPF_LDX_MEM, optname);
+			*insn++ = CG_SOCKOPT_READ_FIELD(optname);
 		break;
 	case offsetof(struct bpf_sockopt, optlen):
 		if (type == BPF_WRITE)
-			*insn++ = CG_SOCKOPT_ACCESS_FIELD(BPF_STX_MEM, optlen);
+			*insn++ = CG_SOCKOPT_WRITE_FIELD(optlen);
 		else
-			*insn++ = CG_SOCKOPT_ACCESS_FIELD(BPF_LDX_MEM, optlen);
+			*insn++ = CG_SOCKOPT_READ_FIELD(optlen);
 		break;
 	case offsetof(struct bpf_sockopt, retval):
 		BUILD_BUG_ON(offsetof(struct bpf_cg_run_ctx, run_ctx) != 0);
@@ -2429,9 +2438,11 @@ static u32 cg_sockopt_convert_ctx_access(enum bpf_access_type type,
 			*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct task_struct, bpf_ctx),
 					      treg, treg,
 					      offsetof(struct task_struct, bpf_ctx));
-			*insn++ = BPF_STX_MEM(BPF_FIELD_SIZEOF(struct bpf_cg_run_ctx, retval),
-					      treg, si->src_reg,
-					      offsetof(struct bpf_cg_run_ctx, retval));
+			*insn++ = BPF_RAW_INSN(BPF_CLASS(si->code) | BPF_MEM |
+					       BPF_FIELD_SIZEOF(struct bpf_cg_run_ctx, retval),
+					       treg, si->src_reg,
+					       offsetof(struct bpf_cg_run_ctx, retval),
+					       si->imm);
 			*insn++ = BPF_LDX_MEM(BPF_DW, treg, si->dst_reg,
 					      offsetof(struct bpf_sockopt_kern, tmp_reg));
 		} else {
@@ -2447,10 +2458,10 @@ static u32 cg_sockopt_convert_ctx_access(enum bpf_access_type type,
 		}
 		break;
 	case offsetof(struct bpf_sockopt, optval):
-		*insn++ = CG_SOCKOPT_ACCESS_FIELD(BPF_LDX_MEM, optval);
+		*insn++ = CG_SOCKOPT_READ_FIELD(optval);
 		break;
 	case offsetof(struct bpf_sockopt, optval_end):
-		*insn++ = CG_SOCKOPT_ACCESS_FIELD(BPF_LDX_MEM, optval_end);
+		*insn++ = CG_SOCKOPT_READ_FIELD(optval_end);
 		break;
 	}
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 585edea642e1..be7d8df7257d 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -13742,6 +13742,31 @@ static bool reg_type_mismatch(enum bpf_reg_type src, enum bpf_reg_type prev)
 			       !reg_type_mismatch_ok(prev));
 }
 
+static int save_aux_ptr_type(struct bpf_verifier_env *env, enum bpf_reg_type type)
+{
+	enum bpf_reg_type *prev_type = &env->insn_aux_data[env->insn_idx].ptr_type;
+
+	if (*prev_type == NOT_INIT) {
+		/* Saw a valid insn
+		 * dst_reg = *(u32 *)(src_reg + off)
+		 * save type to validate intersecting paths
+		 */
+		*prev_type = type;
+	} else if (reg_type_mismatch(type, *prev_type)) {
+		/* Abuser program is trying to use the same insn
+		 * dst_reg = *(u32*) (src_reg + off)
+		 * with different pointer types:
+		 * src_reg == ctx in one branch and
+		 * src_reg == stack|map in some other branch.
+		 * Reject it.
+		 */
+		verbose(env, "same insn cannot be used with different pointers\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 static int do_check(struct bpf_verifier_env *env)
 {
 	bool pop_log = !(env->log.level & BPF_LOG_LEVEL2);
@@ -13851,7 +13876,7 @@ static int do_check(struct bpf_verifier_env *env)
 				return err;
 
 		} else if (class == BPF_LDX) {
-			enum bpf_reg_type *prev_src_type, src_reg_type;
+			enum bpf_reg_type src_reg_type;
 
 			/* check for reserved fields is already done */
 
@@ -13875,29 +13900,11 @@ static int do_check(struct bpf_verifier_env *env)
 			if (err)
 				return err;
 
-			prev_src_type = &env->insn_aux_data[env->insn_idx].ptr_type;
-
-			if (*prev_src_type == NOT_INIT) {
-				/* saw a valid insn
-				 * dst_reg = *(u32 *)(src_reg + off)
-				 * save type to validate intersecting paths
-				 */
-				*prev_src_type = src_reg_type;
-
-			} else if (reg_type_mismatch(src_reg_type, *prev_src_type)) {
-				/* ABuser program is trying to use the same insn
-				 * dst_reg = *(u32*) (src_reg + off)
-				 * with different pointer types:
-				 * src_reg == ctx in one branch and
-				 * src_reg == stack|map in some other branch.
-				 * Reject it.
-				 */
-				verbose(env, "same insn cannot be used with different pointers\n");
-				return -EINVAL;
-			}
-
+			err = save_aux_ptr_type(env, src_reg_type);
+			if (err)
+				return err;
 		} else if (class == BPF_STX) {
-			enum bpf_reg_type *prev_dst_type, dst_reg_type;
+			enum bpf_reg_type dst_reg_type;
 
 			if (BPF_MODE(insn->code) == BPF_ATOMIC) {
 				err = check_atomic(env, env->insn_idx, insn);
@@ -13930,16 +13937,12 @@ static int do_check(struct bpf_verifier_env *env)
 			if (err)
 				return err;
 
-			prev_dst_type = &env->insn_aux_data[env->insn_idx].ptr_type;
-
-			if (*prev_dst_type == NOT_INIT) {
-				*prev_dst_type = dst_reg_type;
-			} else if (reg_type_mismatch(dst_reg_type, *prev_dst_type)) {
-				verbose(env, "same insn cannot be used with different pointers\n");
-				return -EINVAL;
-			}
-
+			err = save_aux_ptr_type(env, dst_reg_type);
+			if (err)
+				return err;
 		} else if (class == BPF_ST) {
+			enum bpf_reg_type dst_reg_type;
+
 			if (BPF_MODE(insn->code) != BPF_MEM ||
 			    insn->src_reg != BPF_REG_0) {
 				verbose(env, "BPF_ST uses reserved fields\n");
@@ -13950,12 +13953,7 @@ static int do_check(struct bpf_verifier_env *env)
 			if (err)
 				return err;
 
-			if (is_ctx_reg(env, insn->dst_reg)) {
-				verbose(env, "BPF_ST stores into R%d %s is not allowed\n",
-					insn->dst_reg,
-					reg_type_str(env, reg_state(env, insn->dst_reg)->type));
-				return -EACCES;
-			}
+			dst_reg_type = regs[insn->dst_reg].type;
 
 			/* check that memory (dst_reg + off) is writeable */
 			err = check_mem_access(env, env->insn_idx, insn->dst_reg,
@@ -13964,6 +13962,9 @@ static int do_check(struct bpf_verifier_env *env)
 			if (err)
 				return err;
 
+			err = save_aux_ptr_type(env, dst_reg_type);
+			if (err)
+				return err;
 		} else if (class == BPF_JMP || class == BPF_JMP32) {
 			u8 opcode = BPF_OP(insn->code);
 
@@ -15087,7 +15088,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
 			   insn->code == (BPF_ST | BPF_MEM | BPF_W) ||
 			   insn->code == (BPF_ST | BPF_MEM | BPF_DW)) {
 			type = BPF_WRITE;
-			ctx_access = BPF_CLASS(insn->code) == BPF_STX;
+			ctx_access = true;
 		} else {
 			continue;
 		}
diff --git a/net/core/filter.c b/net/core/filter.c
index c746e4d77214..1353bb1d476a 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -9221,11 +9221,15 @@ static struct bpf_insn *bpf_convert_tstamp_write(const struct bpf_prog *prog,
 #endif
 
 	/* <store>: skb->tstamp = tstamp */
-	*insn++ = BPF_STX_MEM(BPF_DW, skb_reg, value_reg,
-			      offsetof(struct sk_buff, tstamp));
+	*insn++ = BPF_RAW_INSN(BPF_CLASS(si->code) | BPF_DW | BPF_MEM,
+			       skb_reg, value_reg, offsetof(struct sk_buff, tstamp), si->imm);
 	return insn;
 }
 
+#define BPF_COPY_STORE(size, si, off)					\
+	BPF_RAW_INSN((si)->code | (size) | BPF_MEM,			\
+		     (si)->dst_reg, (si)->src_reg, (off), (si)->imm)
+
 static u32 bpf_convert_ctx_access(enum bpf_access_type type,
 				  const struct bpf_insn *si,
 				  struct bpf_insn *insn_buf,
@@ -9255,9 +9259,9 @@ static u32 bpf_convert_ctx_access(enum bpf_access_type type,
 
 	case offsetof(struct __sk_buff, priority):
 		if (type == BPF_WRITE)
-			*insn++ = BPF_STX_MEM(BPF_W, si->dst_reg, si->src_reg,
-					      bpf_target_off(struct sk_buff, priority, 4,
-							     target_size));
+			*insn++ = BPF_COPY_STORE(BPF_W, si,
+						 bpf_target_off(struct sk_buff, priority, 4,
+								target_size));
 		else
 			*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->src_reg,
 					      bpf_target_off(struct sk_buff, priority, 4,
@@ -9288,9 +9292,9 @@ static u32 bpf_convert_ctx_access(enum bpf_access_type type,
 
 	case offsetof(struct __sk_buff, mark):
 		if (type == BPF_WRITE)
-			*insn++ = BPF_STX_MEM(BPF_W, si->dst_reg, si->src_reg,
-					      bpf_target_off(struct sk_buff, mark, 4,
-							     target_size));
+			*insn++ = BPF_COPY_STORE(BPF_W, si,
+						 bpf_target_off(struct sk_buff, mark, 4,
+								target_size));
 		else
 			*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->src_reg,
 					      bpf_target_off(struct sk_buff, mark, 4,
@@ -9310,10 +9314,10 @@ static u32 bpf_convert_ctx_access(enum bpf_access_type type,
 	case offsetof(struct __sk_buff, queue_mapping):
 		if (type == BPF_WRITE) {
 			*insn++ = BPF_JMP_IMM(BPF_JGE, si->src_reg, NO_QUEUE_MAPPING, 1);
-			*insn++ = BPF_STX_MEM(BPF_H, si->dst_reg, si->src_reg,
-					      bpf_target_off(struct sk_buff,
-							     queue_mapping,
-							     2, target_size));
+			*insn++ = BPF_COPY_STORE(BPF_H, si,
+						 bpf_target_off(struct sk_buff,
+								queue_mapping,
+								2, target_size));
 		} else {
 			*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->src_reg,
 					      bpf_target_off(struct sk_buff,
@@ -9349,8 +9353,7 @@ static u32 bpf_convert_ctx_access(enum bpf_access_type type,
 		off += offsetof(struct sk_buff, cb);
 		off += offsetof(struct qdisc_skb_cb, data);
 		if (type == BPF_WRITE)
-			*insn++ = BPF_STX_MEM(BPF_SIZE(si->code), si->dst_reg,
-					      si->src_reg, off);
+			*insn++ = BPF_COPY_STORE(BPF_SIZE(si->code), si, off);
 		else
 			*insn++ = BPF_LDX_MEM(BPF_SIZE(si->code), si->dst_reg,
 					      si->src_reg, off);
@@ -9365,8 +9368,7 @@ static u32 bpf_convert_ctx_access(enum bpf_access_type type,
 		off += offsetof(struct qdisc_skb_cb, tc_classid);
 		*target_size = 2;
 		if (type == BPF_WRITE)
-			*insn++ = BPF_STX_MEM(BPF_H, si->dst_reg,
-					      si->src_reg, off);
+			*insn++ = BPF_COPY_STORE(BPF_H, si, off);
 		else
 			*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg,
 					      si->src_reg, off);
@@ -9399,9 +9401,9 @@ static u32 bpf_convert_ctx_access(enum bpf_access_type type,
 	case offsetof(struct __sk_buff, tc_index):
 #ifdef CONFIG_NET_SCHED
 		if (type == BPF_WRITE)
-			*insn++ = BPF_STX_MEM(BPF_H, si->dst_reg, si->src_reg,
-					      bpf_target_off(struct sk_buff, tc_index, 2,
-							     target_size));
+			*insn++ = BPF_COPY_STORE(BPF_H, si,
+						 bpf_target_off(struct sk_buff, tc_index, 2,
+								target_size));
 		else
 			*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->src_reg,
 					      bpf_target_off(struct sk_buff, tc_index, 2,
@@ -9602,8 +9604,8 @@ u32 bpf_sock_convert_ctx_access(enum bpf_access_type type,
 		BUILD_BUG_ON(sizeof_field(struct sock, sk_bound_dev_if) != 4);
 
 		if (type == BPF_WRITE)
-			*insn++ = BPF_STX_MEM(BPF_W, si->dst_reg, si->src_reg,
-					offsetof(struct sock, sk_bound_dev_if));
+			*insn++ = BPF_COPY_STORE(BPF_W, si,
+						 offsetof(struct sock, sk_bound_dev_if));
 		else
 			*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->src_reg,
 				      offsetof(struct sock, sk_bound_dev_if));
@@ -9613,8 +9615,8 @@ u32 bpf_sock_convert_ctx_access(enum bpf_access_type type,
 		BUILD_BUG_ON(sizeof_field(struct sock, sk_mark) != 4);
 
 		if (type == BPF_WRITE)
-			*insn++ = BPF_STX_MEM(BPF_W, si->dst_reg, si->src_reg,
-					offsetof(struct sock, sk_mark));
+			*insn++ = BPF_COPY_STORE(BPF_W, si,
+						 offsetof(struct sock, sk_mark));
 		else
 			*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->src_reg,
 				      offsetof(struct sock, sk_mark));
@@ -9624,8 +9626,8 @@ u32 bpf_sock_convert_ctx_access(enum bpf_access_type type,
 		BUILD_BUG_ON(sizeof_field(struct sock, sk_priority) != 4);
 
 		if (type == BPF_WRITE)
-			*insn++ = BPF_STX_MEM(BPF_W, si->dst_reg, si->src_reg,
-					offsetof(struct sock, sk_priority));
+			*insn++ = BPF_COPY_STORE(BPF_W, si,
+						 offsetof(struct sock, sk_priority));
 		else
 			*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->src_reg,
 				      offsetof(struct sock, sk_priority));
@@ -9890,10 +9892,12 @@ static u32 xdp_convert_ctx_access(enum bpf_access_type type,
 				      offsetof(S, TF));			       \
 		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(S, F), tmp_reg,	       \
 				      si->dst_reg, offsetof(S, F));	       \
-		*insn++ = BPF_STX_MEM(SIZE, tmp_reg, si->src_reg,	       \
+		*insn++ = BPF_RAW_INSN(SIZE | BPF_MEM | BPF_CLASS(si->code),   \
+				       tmp_reg, si->src_reg,		       \
 			bpf_target_off(NS, NF, sizeof_field(NS, NF),	       \
 				       target_size)			       \
-				+ OFF);					       \
+				       + OFF,				       \
+				       si->imm);			       \
 		*insn++ = BPF_LDX_MEM(BPF_DW, tmp_reg, si->dst_reg,	       \
 				      offsetof(S, TF));			       \
 	} while (0)
@@ -10128,9 +10132,11 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
 						struct bpf_sock_ops_kern, sk),\
 				      reg, si->dst_reg,			      \
 				      offsetof(struct bpf_sock_ops_kern, sk));\
-		*insn++ = BPF_STX_MEM(BPF_FIELD_SIZEOF(OBJ, OBJ_FIELD),	      \
-				      reg, si->src_reg,			      \
-				      offsetof(OBJ, OBJ_FIELD));	      \
+		*insn++ = BPF_RAW_INSN(BPF_FIELD_SIZEOF(OBJ, OBJ_FIELD) |     \
+				       BPF_MEM | BPF_CLASS(si->code),	      \
+				       reg, si->src_reg,		      \
+				       offsetof(OBJ, OBJ_FIELD),	      \
+				       si->imm);			      \
 		*insn++ = BPF_LDX_MEM(BPF_DW, reg, si->dst_reg,		      \
 				      offsetof(struct bpf_sock_ops_kern,      \
 					       temp));			      \
@@ -10165,8 +10171,7 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
 		off -= offsetof(struct bpf_sock_ops, replylong[0]);
 		off += offsetof(struct bpf_sock_ops_kern, replylong[0]);
 		if (type == BPF_WRITE)
-			*insn++ = BPF_STX_MEM(BPF_W, si->dst_reg, si->src_reg,
-					      off);
+			*insn++ = BPF_COPY_STORE(BPF_W, si, off);
 		else
 			*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->src_reg,
 					      off);
@@ -10523,8 +10528,7 @@ static u32 sk_skb_convert_ctx_access(enum bpf_access_type type,
 		off += offsetof(struct sk_buff, cb);
 		off += offsetof(struct sk_skb_cb, data);
 		if (type == BPF_WRITE)
-			*insn++ = BPF_STX_MEM(BPF_SIZE(si->code), si->dst_reg,
-					      si->src_reg, off);
+			*insn++ = BPF_COPY_STORE(BPF_SIZE(si->code), si, off);
 		else
 			*insn++ = BPF_LDX_MEM(BPF_SIZE(si->code), si->dst_reg,
 					      si->src_reg, off);
diff --git a/tools/testing/selftests/bpf/verifier/ctx.c b/tools/testing/selftests/bpf/verifier/ctx.c
index c8eaf0536c24..2fd31612c0b8 100644
--- a/tools/testing/selftests/bpf/verifier/ctx.c
+++ b/tools/testing/selftests/bpf/verifier/ctx.c
@@ -1,14 +1,3 @@
-{
-	"context stores via ST",
-	.insns = {
-	BPF_MOV64_IMM(BPF_REG_0, 0),
-	BPF_ST_MEM(BPF_DW, BPF_REG_1, offsetof(struct __sk_buff, mark), 0),
-	BPF_EXIT_INSN(),
-	},
-	.errstr = "BPF_ST stores into R1 ctx is not allowed",
-	.result = REJECT,
-	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
-},
 {
 	"context stores via BPF_ATOMIC",
 	.insns = {
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC bpf-next 4/5] selftests/bpf: test if pointer type is tracked for BPF_ST_MEM
  2022-12-31 16:31 [RFC bpf-next 0/5] Support for BPF_ST instruction in LLVM C compiler Eduard Zingerman
                   ` (2 preceding siblings ...)
  2022-12-31 16:31 ` [RFC bpf-next 3/5] bpf: allow ctx writes using BPF_ST_MEM instruction Eduard Zingerman
@ 2022-12-31 16:31 ` Eduard Zingerman
  2022-12-31 16:31 ` [RFC bpf-next 5/5] selftests/bpf: don't match exact insn index in expected error message Eduard Zingerman
  2023-01-04 22:10 ` [RFC bpf-next 0/5] Support for BPF_ST instruction in LLVM C compiler Andrii Nakryiko
  5 siblings, 0 replies; 16+ messages in thread
From: Eduard Zingerman @ 2022-12-31 16:31 UTC (permalink / raw)
  To: bpf, ast; +Cc: andrii, daniel, kernel-team, yhs, Eduard Zingerman

Check that the verifier tracks pointer types for BPF_ST_MEM
instructions and reports an error if the pointer types do not match
across different execution branches.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
---
 tools/testing/selftests/bpf/verifier/unpriv.c | 23 +++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/tools/testing/selftests/bpf/verifier/unpriv.c b/tools/testing/selftests/bpf/verifier/unpriv.c
index 878ca26c3f0a..af0c0f336625 100644
--- a/tools/testing/selftests/bpf/verifier/unpriv.c
+++ b/tools/testing/selftests/bpf/verifier/unpriv.c
@@ -239,6 +239,29 @@
 	.errstr = "same insn cannot be used with different pointers",
 	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 },
+{
+	/* Same as above, but use BPF_ST_MEM to save 42
+	 * instead of BPF_STX_MEM.
+	 */
+	"unpriv: spill/fill of different pointers st",
+	.insns = {
+	BPF_ALU64_REG(BPF_MOV, BPF_REG_6, BPF_REG_10),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_6, -8),
+	BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 3),
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -16),
+	BPF_STX_MEM(BPF_DW, BPF_REG_6, BPF_REG_2, 0),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 1),
+	BPF_STX_MEM(BPF_DW, BPF_REG_6, BPF_REG_1, 0),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_6, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_1, offsetof(struct __sk_buff, mark), 42),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.result = REJECT,
+	.errstr = "same insn cannot be used with different pointers",
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+},
 {
 	"unpriv: spill/fill of different pointers stx - ctx and sock",
 	.insns = {
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC bpf-next 5/5] selftests/bpf: don't match exact insn index in expected error message
  2022-12-31 16:31 [RFC bpf-next 0/5] Support for BPF_ST instruction in LLVM C compiler Eduard Zingerman
                   ` (3 preceding siblings ...)
  2022-12-31 16:31 ` [RFC bpf-next 4/5] selftests/bpf: test if pointer type is tracked for BPF_ST_MEM Eduard Zingerman
@ 2022-12-31 16:31 ` Eduard Zingerman
  2023-01-04 22:10 ` [RFC bpf-next 0/5] Support for BPF_ST instruction in LLVM C compiler Andrii Nakryiko
  5 siblings, 0 replies; 16+ messages in thread
From: Eduard Zingerman @ 2022-12-31 16:31 UTC (permalink / raw)
  To: bpf, ast; +Cc: andrii, daniel, kernel-team, yhs, Eduard Zingerman

Depending on the behavior of the C compiler, a statement like the one
below could be translated to 1 or 2 instructions:

  C:  int credit = 0;

BPF:  *(u32 *)(r10 -4) = 0

      - or -

      r1 = 0
      *(u32 *)(r10 -4) = r1

This commit relaxes the expected error messages for a few tests to
avoid matching the exact instruction index.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
---
 tools/testing/selftests/bpf/prog_tests/log_fixup.c | 2 +-
 tools/testing/selftests/bpf/prog_tests/spin_lock.c | 8 ++++----
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/log_fixup.c b/tools/testing/selftests/bpf/prog_tests/log_fixup.c
index f4ffdcabf4e4..760bd3155ea2 100644
--- a/tools/testing/selftests/bpf/prog_tests/log_fixup.c
+++ b/tools/testing/selftests/bpf/prog_tests/log_fixup.c
@@ -123,7 +123,7 @@ static void missing_map(void)
 	ASSERT_FALSE(bpf_map__autocreate(skel->maps.missing_map), "missing_map_autocreate");
 
 	ASSERT_HAS_SUBSTR(log_buf,
-			  "8: <invalid BPF map reference>\n"
+			  ": <invalid BPF map reference>\n"
 			  "BPF map 'missing_map' is referenced but wasn't created\n",
 			  "log_buf");
 
diff --git a/tools/testing/selftests/bpf/prog_tests/spin_lock.c b/tools/testing/selftests/bpf/prog_tests/spin_lock.c
index d9270bd3d920..1bdb99b588f0 100644
--- a/tools/testing/selftests/bpf/prog_tests/spin_lock.c
+++ b/tools/testing/selftests/bpf/prog_tests/spin_lock.c
@@ -19,12 +19,12 @@ static struct {
 	  "; R1_w=map_value(off=0,ks=4,vs=4,imm=0)\n2: (85) call bpf_this_cpu_ptr#154\n"
 	  "R1 type=map_value expected=percpu_ptr_" },
 	{ "lock_id_mapval_preserve",
-	  "8: (bf) r1 = r0                       ; R0_w=map_value(id=1,off=0,ks=4,vs=8,imm=0) "
-	  "R1_w=map_value(id=1,off=0,ks=4,vs=8,imm=0)\n9: (85) call bpf_this_cpu_ptr#154\n"
+	  ": (bf) r1 = r0                       ; R0_w=map_value(id=1,off=0,ks=4,vs=8,imm=0) "
+	  "R1_w=map_value(id=1,off=0,ks=4,vs=8,imm=0)\n8: (85) call bpf_this_cpu_ptr#154\n"
 	  "R1 type=map_value expected=percpu_ptr_" },
 	{ "lock_id_innermapval_preserve",
-	  "13: (bf) r1 = r0                      ; R0=map_value(id=2,off=0,ks=4,vs=8,imm=0) "
-	  "R1_w=map_value(id=2,off=0,ks=4,vs=8,imm=0)\n14: (85) call bpf_this_cpu_ptr#154\n"
+	  ": (bf) r1 = r0                      ; R0=map_value(id=2,off=0,ks=4,vs=8,imm=0) "
+	  "R1_w=map_value(id=2,off=0,ks=4,vs=8,imm=0)\n13: (85) call bpf_this_cpu_ptr#154\n"
 	  "R1 type=map_value expected=percpu_ptr_" },
 	{ "lock_id_mismatch_kptr_kptr", "bpf_spin_unlock of different lock" },
 	{ "lock_id_mismatch_kptr_global", "bpf_spin_unlock of different lock" },
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [RFC bpf-next 0/5] Support for BPF_ST instruction in LLVM C compiler
  2022-12-31 16:31 [RFC bpf-next 0/5] Support for BPF_ST instruction in LLVM C compiler Eduard Zingerman
                   ` (4 preceding siblings ...)
  2022-12-31 16:31 ` [RFC bpf-next 5/5] selftests/bpf: don't match exact insn index in expected error message Eduard Zingerman
@ 2023-01-04 22:10 ` Andrii Nakryiko
  2023-01-05 10:06   ` Jose E. Marchesi
  5 siblings, 1 reply; 16+ messages in thread
From: Andrii Nakryiko @ 2023-01-04 22:10 UTC (permalink / raw)
  To: Eduard Zingerman; +Cc: bpf, ast, andrii, daniel, kernel-team, yhs

On Sat, Dec 31, 2022 at 8:31 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> BPF has two documented (non-atomic) memory store instructions:
>
> BPF_STX: *(size *) (dst_reg + off) = src_reg
> BPF_ST : *(size *) (dst_reg + off) = imm32
>
> Currently the LLVM BPF back-end does not emit the BPF_ST instruction
> and does not allow it to be specified in inline assembly.
>
> Recently I've been exploring ways to port some of the verifier test
> cases from tools/testing/selftests/bpf/verifier/*.c to use inline
> assembly and the machinery provided in
> tools/testing/selftests/bpf/test_loader.c (which should hopefully
> simplify test maintenance). The BPF_ST instruction is popular in
> these tests: it is used in 52 of 94 files.
>
> While it is possible to adjust LLVM to support BPF_ST only in inline
> assembly blocks, that seems a bit wasteful. Instead, this patch-set
> contains the verifier changes necessary when LLVM is allowed to
> freely emit BPF_ST instructions (the source code is available
> here [1]).

Would we gate LLVM's emitting of BPF_ST for C code behind some new
cpu=v4? What is the benefit for the compiler to start automatically
emitting such instructions? Just thinking about logistics: if there
isn't much benefit, then as a BPF application owner I wouldn't bother
enabling this behavior and risking regressions on old kernels that
don't have these changes.

So I feel like the biggest benefit is to be able to use this
instruction in embedded assembly, to make writing and maintaining
tests easier.

> The changes include:
>  - update to the verifier.c:check_stack_write_*() functions to track
>    constant values spilled to the stack via the BPF_ST instruction
>    in the same way stack spills of known registers by BPF_STX are
>    tracked;
>  - updates to verifier.c:convert_ctx_access() and its callbacks to
>    handle the BPF_ST instruction in a way similar to BPF_STX;
>  - some test adjustments and a few new tests.
>
> With the above changes applied and the LLVM from [1], all
> test_verifier, test_maps, test_progs and test_progs-no_alu32 test
> cases pass.
>
> When built using the LLVM version from [1], BPF programs generated
> for the selftests and for Cilium (taken from [2]) see a certain
> reduction in size; e.g. below are the total instruction counts for
> files with over 5K instructions:
>
> File                                    Insns   Insns   Insns   Diff
>                                         w/o     with    diff    pct
>                                         BPF_ST  BPF_ST
> cilium/bpf_host.o                       44620   43774   -846    -1.90%
> cilium/bpf_lxc.o                        36842   36060   -782    -2.12%
> cilium/bpf_overlay.o                    23557   23186   -371    -1.57%
> cilium/bpf_xdp.o                        26397   25931   -466    -1.77%
> selftests/core_kern.bpf.o               12359   12359    0       0.00%
> selftests/linked_list_fail.bpf.o        5501    5302    -199    -3.62%
> selftests/profiler1.bpf.o               17828   17709   -119    -0.67%
> selftests/pyperf100.bpf.o               49793   49268   -525    -1.05%
> selftests/pyperf180.bpf.o               88738   87813   -925    -1.04%
> selftests/pyperf50.bpf.o                25388   25113   -275    -1.08%
> selftests/pyperf600.bpf.o               78330   78300   -30     -0.04%
> selftests/pyperf_global.bpf.o           5244    5188    -56     -1.07%
> selftests/pyperf_subprogs.bpf.o         5262    5192    -70     -1.33%
> selftests/strobemeta.bpf.o              17154   16065   -1089   -6.35%
> selftests/test_verif_scale2.bpf.o       11337   11337    0       0.00%
>
> (Instructions are counted as the number of instruction lines in the
>  disassembly.)
>
> Is the community interested in this work?
> Are there any omissions in my changes to the verifier?
>
> Known issue:
>
> There are two programs (one from Cilium, one from the selftests)
> that exhibit an anomalous increase in the number of instructions
> processed by the verifier (and thus verification time) with this
> patch-set:
>
>  File                 Program                        Insns (A)  Insns (B)  Insns   (DIFF)
>  -------------------  -----------------------------  ---------  ---------  --------------
>  bpf_host.o           tail_ipv6_host_policy_ingress       1576       2403  +827 (+52.47%)
>  map_kptr.bpf.o       test_map_kptr                        400        475   +75 (+18.75%)
>  -------------------  -----------------------------  ---------  ---------  --------------
>
> These are under investigation.
>
> Thanks,
> Eduard
>
> [1] https://reviews.llvm.org/D140804
> [2] git@github.com:anakryiko/cilium.git
>
> Eduard Zingerman (5):
>   bpf: more precise stack write reasoning for BPF_ST instruction
>   selftests/bpf: check if verifier tracks constants spilled by
>     BPF_ST_MEM
>   bpf: allow ctx writes using BPF_ST_MEM instruction
>   selftests/bpf: test if pointer type is tracked for BPF_ST_MEM
>   selftests/bpf: don't match exact insn index in expected error message
>
>  kernel/bpf/cgroup.c                           |  49 +++++---
>  kernel/bpf/verifier.c                         | 102 +++++++++-------
>  net/core/filter.c                             |  72 ++++++------
>  .../selftests/bpf/prog_tests/log_fixup.c      |   2 +-
>  .../selftests/bpf/prog_tests/spin_lock.c      |   8 +-
>  .../bpf/verifier/bounds_mix_sign_unsign.c     | 110 ++++++++++--------
>  .../selftests/bpf/verifier/bpf_st_mem.c       |  29 +++++
>  tools/testing/selftests/bpf/verifier/ctx.c    |  11 --
>  tools/testing/selftests/bpf/verifier/unpriv.c |  23 ++++
>  9 files changed, 249 insertions(+), 157 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/verifier/bpf_st_mem.c
>
> --
> 2.39.0
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC bpf-next 0/5] Support for BPF_ST instruction in LLVM C compiler
  2023-01-04 22:10 ` [RFC bpf-next 0/5] Support for BPF_ST instruction in LLVM C compiler Andrii Nakryiko
@ 2023-01-05 10:06   ` Jose E. Marchesi
  2023-01-05 12:07     ` Eduard Zingerman
  0 siblings, 1 reply; 16+ messages in thread
From: Jose E. Marchesi @ 2023-01-05 10:06 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Eduard Zingerman, bpf, ast, andrii, daniel, kernel-team, yhs,
	david.faust, James Hilliard


> On Sat, Dec 31, 2022 at 8:31 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>>
>> BPF has two documented (non-atomic) memory store instructions:
>>
>> BPF_STX: *(size *) (dst_reg + off) = src_reg
>> BPF_ST : *(size *) (dst_reg + off) = imm32
>>
>> Currently LLVM BPF back-end does not emit BPF_ST instruction and does
>> not allow one to be specified as inline assembly.
>>
>> Recently I've been exploring ways to port some of the verifier test
>> cases from tools/testing/selftests/bpf/verifier/*.c to use inline assembly
>> and machinery provided in tools/testing/selftests/bpf/test_loader.c
>> (which should hopefully simplify tests maintenance).
>> The BPF_ST instruction is popular in these tests: used in 52 of 94 files.
>>
>> While it is possible to adjust LLVM to only support BPF_ST for inline
>> assembly blocks it seems a bit wasteful. This patch-set contains a set
>> of changes to verifier necessary in case when LLVM is allowed to
>> freely emit BPF_ST instructions (source code is available here [1]).
>
> Would we gate LLVM's emitting of BPF_ST for C code behind some new
> cpu=v4? What is the benefit for compiler to start automatically emit
> such instructions? Such thinking about logistics, if there isn't much
> benefit, as BPF application owner I wouldn't bother enabling this
> behavior risking regressions on old kernels that don't have these
> changes.

Hmm, GCC happily generates BPF_ST instructions:

  $ echo 'int v; void foo () {  v = 666; }' | bpf-unknown-none-gcc -O2 -xc -S -o foo.s -
  $ cat foo.s
        .file	"<stdin>"
        .text
        .align	3
        .global	foo
        .type	foo, @function
  foo:
        lddw	%r0,v
        stw	[%r0+0],666
        exit
        .size	foo, .-foo
        .global	v
        .type	v, @object
        .lcomm	v,4,4
        .ident	"GCC: (GNU) 12.0.0 20211206 (experimental)"

GCC has been doing that since October 2019; I think that was before the
cpu versioning mechanism was put in place?

We weren't aware this was problematic.  Does the verifier reject such
instructions?

> So I feel like the biggest benefit is to be able to use this
> instruction in embedded assembly, to make writing and maintaining
> tests easier.
>
>> The changes include:
>>  - update to verifier.c:check_stack_write_*() functions to track
>>    constant values spilled to stack via BPF_ST instruction in a same
>>    way stack spills of known registers by BPF_STX are tracked;
>>  - updates to verifier.c:convert_ctx_access() and it's callbacks to
>>    handle BPF_ST instruction in a way similar to BPF_STX;
>>  - some test adjustments and a few new tests.
>>
>> With the above changes applied and LLVM from [1] all test_verifier,
>> test_maps, test_progs and test_progs-no_alu32 test cases are passing.
>>
>> When built using the LLVM version from [1] BPF programs generated for
>> selftests and Cilium programs (taken from [2]) see certain reduction
>> in size, e.g. below are total numbers of instructions for files with
>> over 5K instructions:
>>
>> File                                    Insns   Insns   Insns   Diff
>>                                         w/o     with    diff    pct
>>                                         BPF_ST  BPF_ST
>> cilium/bpf_host.o                       44620   43774   -846    -1.90%
>> cilium/bpf_lxc.o                        36842   36060   -782    -2.12%
>> cilium/bpf_overlay.o                    23557   23186   -371    -1.57%
>> cilium/bpf_xdp.o                        26397   25931   -466    -1.77%
>> selftests/core_kern.bpf.o               12359   12359    0       0.00%
>> selftests/linked_list_fail.bpf.o        5501    5302    -199    -3.62%
>> selftests/profiler1.bpf.o               17828   17709   -119    -0.67%
>> selftests/pyperf100.bpf.o               49793   49268   -525    -1.05%
>> selftests/pyperf180.bpf.o               88738   87813   -925    -1.04%
>> selftests/pyperf50.bpf.o                25388   25113   -275    -1.08%
>> selftests/pyperf600.bpf.o               78330   78300   -30     -0.04%
>> selftests/pyperf_global.bpf.o           5244    5188    -56     -1.07%
>> selftests/pyperf_subprogs.bpf.o         5262    5192    -70     -1.33%
>> selftests/strobemeta.bpf.o              17154   16065   -1089   -6.35%
>> selftests/test_verif_scale2.bpf.o       11337   11337    0       0.00%
>>
>> (Instructions are counted by counting the number of instruction lines
>>  in disassembly).
>>
>> Is community interested in this work?
>> Are there any omissions in my changes to the verifier?
>>
>> Known issue:
>>
>> There are two programs (one Cilium, one selftest) that exhibit
>> anomalous increase in verifier processing time with this patch-set:
>>
>>  File                 Program                        Insns (A)  Insns (B)  Insns   (DIFF)
>>  -------------------  -----------------------------  ---------  ---------  --------------
>>  bpf_host.o           tail_ipv6_host_policy_ingress       1576       2403  +827 (+52.47%)
>>  map_kptr.bpf.o       test_map_kptr                        400        475   +75 (+18.75%)
>>  -------------------  -----------------------------  ---------  ---------  --------------
>>
>> These are under investigation.
>>
>> Thanks,
>> Eduard
>>
>> [1] https://reviews.llvm.org/D140804
>> [2] git@github.com:anakryiko/cilium.git
>>
>> Eduard Zingerman (5):
>>   bpf: more precise stack write reasoning for BPF_ST instruction
>>   selftests/bpf: check if verifier tracks constants spilled by
>>     BPF_ST_MEM
>>   bpf: allow ctx writes using BPF_ST_MEM instruction
>>   selftests/bpf: test if pointer type is tracked for BPF_ST_MEM
>>   selftests/bpf: don't match exact insn index in expected error message
>>
>>  kernel/bpf/cgroup.c                           |  49 +++++---
>>  kernel/bpf/verifier.c                         | 102 +++++++++-------
>>  net/core/filter.c                             |  72 ++++++------
>>  .../selftests/bpf/prog_tests/log_fixup.c      |   2 +-
>>  .../selftests/bpf/prog_tests/spin_lock.c      |   8 +-
>>  .../bpf/verifier/bounds_mix_sign_unsign.c     | 110 ++++++++++--------
>>  .../selftests/bpf/verifier/bpf_st_mem.c       |  29 +++++
>>  tools/testing/selftests/bpf/verifier/ctx.c    |  11 --
>>  tools/testing/selftests/bpf/verifier/unpriv.c |  23 ++++
>>  9 files changed, 249 insertions(+), 157 deletions(-)
>>  create mode 100644 tools/testing/selftests/bpf/verifier/bpf_st_mem.c
>>
>> --
>> 2.39.0
>>


* Re: [RFC bpf-next 0/5] Support for BPF_ST instruction in LLVM C compiler
  2023-01-05 10:06   ` Jose E. Marchesi
@ 2023-01-05 12:07     ` Eduard Zingerman
  2023-01-05 15:07       ` Jose E. Marchesi
  2023-01-12 22:27       ` Alexei Starovoitov
  0 siblings, 2 replies; 16+ messages in thread
From: Eduard Zingerman @ 2023-01-05 12:07 UTC (permalink / raw)
  To: Jose E. Marchesi, Andrii Nakryiko
  Cc: bpf, ast, andrii, daniel, kernel-team, yhs, david.faust, James Hilliard

On Thu, 2023-01-05 at 11:06 +0100, Jose E. Marchesi wrote:
> > On Sat, Dec 31, 2022 at 8:31 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > > 
> > > BPF has two documented (non-atomic) memory store instructions:
> > > 
> > > BPF_STX: *(size *) (dst_reg + off) = src_reg
> > > BPF_ST : *(size *) (dst_reg + off) = imm32
> > > 
> > > Currently LLVM BPF back-end does not emit BPF_ST instruction and does
> > > not allow one to be specified as inline assembly.
> > > 
> > > Recently I've been exploring ways to port some of the verifier test
> > > cases from tools/testing/selftests/bpf/verifier/*.c to use inline assembly
> > > and machinery provided in tools/testing/selftests/bpf/test_loader.c
> > > (which should hopefully simplify tests maintenance).
> > > The BPF_ST instruction is popular in these tests: used in 52 of 94 files.
> > > 
> > > While it is possible to adjust LLVM to only support BPF_ST for inline
> > > assembly blocks it seems a bit wasteful. This patch-set contains a set
> > > of changes to verifier necessary in case when LLVM is allowed to
> > > freely emit BPF_ST instructions (source code is available here [1]).
> > 
> > Would we gate LLVM's emitting of BPF_ST for C code behind some new
> > cpu=v4? What is the benefit for compiler to start automatically emit
> > such instructions? Such thinking about logistics, if there isn't much
> > benefit, as BPF application owner I wouldn't bother enabling this
> > behavior risking regressions on old kernels that don't have these
> > changes.
> 
> Hmm, GCC happily generates BPF_ST instructions:
> 
>   $ echo 'int v; void foo () {  v = 666; }' | bpf-unknown-none-gcc -O2 -xc -S -o foo.s -
>   $ cat foo.s
>         .file	"<stdin>"
>         .text
>         .align	3
>         .global	foo
>         .type	foo, @function
>   foo:
>         lddw	%r0,v
>         stw	[%r0+0],666
>         exit
>         .size	foo, .-foo
>         .global	v
>         .type	v, @object
>         .lcomm	v,4,4
>         .ident	"GCC: (GNU) 12.0.0 20211206 (experimental)"
> 
> Been doing that since October 2019, I think before the cpu versioning
> mechanism was got in place?
> 
> We weren't aware this was problematic.  Does the verifier reject such
> instructions?

Interesting, do BPF selftests generated by GCC pass the same way they
do if generated by clang?

I had to make the following changes to the verifier to get the
selftests to pass when the BPF_ST instruction is allowed for selection:

- patch #1 in this patchset: track values of constants written to
  stack using BPF_ST. Currently these are tracked imprecisely, unlike
  the writes using BPF_STX, e.g.:
  
    fp[-8] = 42;   currently verifier assumes that fp[-8]=mmmmmmmm
                   after such instruction, where m stands for "misc",
                   just a note that something is written at fp[-8].
                   
    r1 = 42;       verifier tracks r1=42 after this instruction.
    fp[-8] = r1;   verifier tracks fp[-8]=42 after this instruction.

  So the patch makes both cases equivalent.
  
- patch #3 in this patchset: adjusts verifier.c:convert_ctx_access()
  to operate on BPF_ST alongside BPF_STX.
  
  Context parameters for some BPF program types are "fake" data
  structures. The verifier matches all BPF_STX and BPF_LDX
  instructions that operate on pointers to such contexts and rewrites
  these instructions. It might change an offset or add another layer
  of indirection, etc. E.g. see filter.c:bpf_convert_ctx_access().
  (This also implies that the verifier forbids writes to non-constant
   offsets inside such structures).
   
  So the patch extends this logic to also handle BPF_ST.
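
For illustration, here is a minimal sketch of a program that would
exercise this path, assuming the LLVM from [1] selects BPF_ST for the
constant store (program and section names are made up):

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  SEC("tc")
  int set_mark(struct __sk_buff *skb)
  {
          /* may compile to a single BPF_ST:
           *   *(u32 *)(r1 + offsetof(struct __sk_buff, mark)) = 42
           * convert_ctx_access() has to rewrite this store the same
           * way it rewrites the equivalent BPF_STX form */
          skb->mark = 42;
          return 0;
  }

  char LICENSE[] SEC("license") = "GPL";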

> 
> > So I feel like the biggest benefit is to be able to use this
> > instruction in embedded assembly, to make writing and maintaining
> > tests easier.
> > 
> > > The changes include:
> > >  - update to verifier.c:check_stack_write_*() functions to track
> > >    constant values spilled to stack via BPF_ST instruction in a same
> > >    way stack spills of known registers by BPF_STX are tracked;
> > >  - updates to verifier.c:convert_ctx_access() and it's callbacks to
> > >    handle BPF_ST instruction in a way similar to BPF_STX;
> > >  - some test adjustments and a few new tests.
> > > 
> > > With the above changes applied and LLVM from [1] all test_verifier,
> > > test_maps, test_progs and test_progs-no_alu32 test cases are passing.
> > > 
> > > When built using the LLVM version from [1] BPF programs generated for
> > > selftests and Cilium programs (taken from [2]) see certain reduction
> > > in size, e.g. below are total numbers of instructions for files with
> > > over 5K instructions:
> > > 
> > > File                                    Insns   Insns   Insns   Diff
> > >                                         w/o     with    diff    pct
> > >                                         BPF_ST  BPF_ST
> > > cilium/bpf_host.o                       44620   43774   -846    -1.90%
> > > cilium/bpf_lxc.o                        36842   36060   -782    -2.12%
> > > cilium/bpf_overlay.o                    23557   23186   -371    -1.57%
> > > cilium/bpf_xdp.o                        26397   25931   -466    -1.77%
> > > selftests/core_kern.bpf.o               12359   12359    0       0.00%
> > > selftests/linked_list_fail.bpf.o        5501    5302    -199    -3.62%
> > > selftests/profiler1.bpf.o               17828   17709   -119    -0.67%
> > > selftests/pyperf100.bpf.o               49793   49268   -525    -1.05%
> > > selftests/pyperf180.bpf.o               88738   87813   -925    -1.04%
> > > selftests/pyperf50.bpf.o                25388   25113   -275    -1.08%
> > > selftests/pyperf600.bpf.o               78330   78300   -30     -0.04%
> > > selftests/pyperf_global.bpf.o           5244    5188    -56     -1.07%
> > > selftests/pyperf_subprogs.bpf.o         5262    5192    -70     -1.33%
> > > selftests/strobemeta.bpf.o              17154   16065   -1089   -6.35%
> > > selftests/test_verif_scale2.bpf.o       11337   11337    0       0.00%
> > > 
> > > (Instructions are counted by counting the number of instruction lines
> > >  in disassembly).
> > > 
> > > Is community interested in this work?
> > > Are there any omissions in my changes to the verifier?
> > > 
> > > Known issue:
> > > 
> > > There are two programs (one Cilium, one selftest) that exhibit
> > > anomalous increase in verifier processing time with this patch-set:
> > > 
> > >  File                 Program                        Insns (A)  Insns (B)  Insns   (DIFF)
> > >  -------------------  -----------------------------  ---------  ---------  --------------
> > >  bpf_host.o           tail_ipv6_host_policy_ingress       1576       2403  +827 (+52.47%)
> > >  map_kptr.bpf.o       test_map_kptr                        400        475   +75 (+18.75%)
> > >  -------------------  -----------------------------  ---------  ---------  --------------
> > > 
> > > These are under investigation.
> > > 
> > > Thanks,
> > > Eduard
> > > 
> > > [1] https://reviews.llvm.org/D140804
> > > [2] git@github.com:anakryiko/cilium.git
> > > 
> > > Eduard Zingerman (5):
> > >   bpf: more precise stack write reasoning for BPF_ST instruction
> > >   selftests/bpf: check if verifier tracks constants spilled by
> > >     BPF_ST_MEM
> > >   bpf: allow ctx writes using BPF_ST_MEM instruction
> > >   selftests/bpf: test if pointer type is tracked for BPF_ST_MEM
> > >   selftests/bpf: don't match exact insn index in expected error message
> > > 
> > >  kernel/bpf/cgroup.c                           |  49 +++++---
> > >  kernel/bpf/verifier.c                         | 102 +++++++++-------
> > >  net/core/filter.c                             |  72 ++++++------
> > >  .../selftests/bpf/prog_tests/log_fixup.c      |   2 +-
> > >  .../selftests/bpf/prog_tests/spin_lock.c      |   8 +-
> > >  .../bpf/verifier/bounds_mix_sign_unsign.c     | 110 ++++++++++--------
> > >  .../selftests/bpf/verifier/bpf_st_mem.c       |  29 +++++
> > >  tools/testing/selftests/bpf/verifier/ctx.c    |  11 --
> > >  tools/testing/selftests/bpf/verifier/unpriv.c |  23 ++++
> > >  9 files changed, 249 insertions(+), 157 deletions(-)
> > >  create mode 100644 tools/testing/selftests/bpf/verifier/bpf_st_mem.c
> > > 
> > > --
> > > 2.39.0
> > > 



* Re: [RFC bpf-next 0/5] Support for BPF_ST instruction in LLVM C compiler
  2023-01-05 12:07     ` Eduard Zingerman
@ 2023-01-05 15:07       ` Jose E. Marchesi
  2023-01-12 22:27       ` Alexei Starovoitov
  1 sibling, 0 replies; 16+ messages in thread
From: Jose E. Marchesi @ 2023-01-05 15:07 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Andrii Nakryiko, bpf, ast, andrii, daniel, kernel-team, yhs,
	david.faust, James Hilliard, Yonghong Song


> On Thu, 2023-01-05 at 11:06 +0100, Jose E. Marchesi wrote:
>> > On Sat, Dec 31, 2022 at 8:31 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>> > > 
>> > > BPF has two documented (non-atomic) memory store instructions:
>> > > 
>> > > BPF_STX: *(size *) (dst_reg + off) = src_reg
>> > > BPF_ST : *(size *) (dst_reg + off) = imm32
>> > > 
>> > > Currently LLVM BPF back-end does not emit BPF_ST instruction and does
>> > > not allow one to be specified as inline assembly.
>> > > 
>> > > Recently I've been exploring ways to port some of the verifier test
>> > > cases from tools/testing/selftests/bpf/verifier/*.c to use inline assembly
>> > > and machinery provided in tools/testing/selftests/bpf/test_loader.c
>> > > (which should hopefully simplify tests maintenance).
>> > > The BPF_ST instruction is popular in these tests: used in 52 of 94 files.
>> > > 
>> > > While it is possible to adjust LLVM to only support BPF_ST for inline
>> > > assembly blocks it seems a bit wasteful. This patch-set contains a set
>> > > of changes to verifier necessary in case when LLVM is allowed to
>> > > freely emit BPF_ST instructions (source code is available here [1]).
>> > 
>> > Would we gate LLVM's emitting of BPF_ST for C code behind some new
>> > cpu=v4? What is the benefit for compiler to start automatically emit
>> > such instructions? Such thinking about logistics, if there isn't much
>> > benefit, as BPF application owner I wouldn't bother enabling this
>> > behavior risking regressions on old kernels that don't have these
>> > changes.
>> 
>> Hmm, GCC happily generates BPF_ST instructions:
>> 
>>   $ echo 'int v; void foo () {  v = 666; }' | bpf-unknown-none-gcc -O2 -xc -S -o foo.s -
>>   $ cat foo.s
>>         .file	"<stdin>"
>>         .text
>>         .align	3
>>         .global	foo
>>         .type	foo, @function
>>   foo:
>>         lddw	%r0,v
>>         stw	[%r0+0],666
>>         exit
>>         .size	foo, .-foo
>>         .global	v
>>         .type	v, @object
>>         .lcomm	v,4,4
>>         .ident	"GCC: (GNU) 12.0.0 20211206 (experimental)"
>> 
>> Been doing that since October 2019, I think before the cpu versioning
>> mechanism was got in place?
>> 
>> We weren't aware this was problematic.  Does the verifier reject such
>> instructions?
>
> Interesting, do BPF selftests generated by GCC pass the same way they
> do if generated by clang?

We are still working on other issues in GCC; we are not yet on par with
clang when it comes to building/running the BPF selftests.  So I guess
we haven't reached that particular breaking point yet ;)

> I had to do the following changes to the verifier to make the
> selftests pass when BPF_ST instruction is allowed for selection:

So, if the current state of affairs is that the kernel rejects BPF_ST
instructions, we shouldn't be generating them from GCC.

I like Andrii's idea of introducing -mcpu=v4 and conditioning the
compiler's generation of BPF_ST on it.  GCC defaults to -mcpu=v3 atm.

Yonghong, WDYT?  Would that be acceptable for clang as well?

>
> - patch #1 in this patchset: track values of constants written to
>   stack using BPF_ST. Currently these are tracked imprecisely, unlike
>   the writes using BPF_STX, e.g.:
>   
>     fp[-8] = 42;   currently verifier assumes that fp[-8]=mmmmmmmm
>                    after such instruction, where m stands for "misc",
>                    just a note that something is written at fp[-8].
>                    
>     r1 = 42;       verifier tracks r1=42 after this instruction.
>     fp[-8] = r1;   verifier tracks fp[-8]=42 after this instruction.
>
>   So the patch makes both cases equivalent.
>   
> - patch #3 in this patchset: adjusts verifier.c:convert_ctx_access()
>   to operate on BPF_ST alongside BPF_STX.
>   
>   Context parameters for some BPF programs types are "fake" data
>   structures. The verifier matches all BPF_STX and BPF_LDX
>   instructions that operate on pointers to such contexts and rewrites
>   these instructions. It might change an offset or add another layer
>   of indirection, etc. E.g. see filter.c:bpf_convert_ctx_access().
>   (This also implies that verifier forbids writes to non-constant
>    offsets inside such structures).
>    
>   So the patch extends this logic to also handle BPF_ST.
>
>> 
>> > So I feel like the biggest benefit is to be able to use this
>> > instruction in embedded assembly, to make writing and maintaining
>> > tests easier.
>> > 
>> > > The changes include:
>> > >  - update to verifier.c:check_stack_write_*() functions to track
>> > >    constant values spilled to stack via BPF_ST instruction in a same
>> > >    way stack spills of known registers by BPF_STX are tracked;
>> > >  - updates to verifier.c:convert_ctx_access() and it's callbacks to
>> > >    handle BPF_ST instruction in a way similar to BPF_STX;
>> > >  - some test adjustments and a few new tests.
>> > > 
>> > > With the above changes applied and LLVM from [1] all test_verifier,
>> > > test_maps, test_progs and test_progs-no_alu32 test cases are passing.
>> > > 
>> > > When built using the LLVM version from [1] BPF programs generated for
>> > > selftests and Cilium programs (taken from [2]) see certain reduction
>> > > in size, e.g. below are total numbers of instructions for files with
>> > > over 5K instructions:
>> > > 
>> > > File                                    Insns   Insns   Insns   Diff
>> > >                                         w/o     with    diff    pct
>> > >                                         BPF_ST  BPF_ST
>> > > cilium/bpf_host.o                       44620   43774   -846    -1.90%
>> > > cilium/bpf_lxc.o                        36842   36060   -782    -2.12%
>> > > cilium/bpf_overlay.o                    23557   23186   -371    -1.57%
>> > > cilium/bpf_xdp.o                        26397   25931   -466    -1.77%
>> > > selftests/core_kern.bpf.o               12359   12359    0       0.00%
>> > > selftests/linked_list_fail.bpf.o        5501    5302    -199    -3.62%
>> > > selftests/profiler1.bpf.o               17828   17709   -119    -0.67%
>> > > selftests/pyperf100.bpf.o               49793   49268   -525    -1.05%
>> > > selftests/pyperf180.bpf.o               88738   87813   -925    -1.04%
>> > > selftests/pyperf50.bpf.o                25388   25113   -275    -1.08%
>> > > selftests/pyperf600.bpf.o               78330   78300   -30     -0.04%
>> > > selftests/pyperf_global.bpf.o           5244    5188    -56     -1.07%
>> > > selftests/pyperf_subprogs.bpf.o         5262    5192    -70     -1.33%
>> > > selftests/strobemeta.bpf.o              17154   16065   -1089   -6.35%
>> > > selftests/test_verif_scale2.bpf.o       11337   11337    0       0.00%
>> > > 
>> > > (Instructions are counted by counting the number of instruction lines
>> > >  in disassembly).
>> > > 
>> > > Is community interested in this work?
>> > > Are there any omissions in my changes to the verifier?
>> > > 
>> > > Known issue:
>> > > 
>> > > There are two programs (one Cilium, one selftest) that exhibit
>> > > anomalous increase in verifier processing time with this patch-set:
>> > > 
>> > >  File                 Program                        Insns (A)  Insns (B)  Insns   (DIFF)
>> > >  -------------------  -----------------------------  ---------  ---------  --------------
>> > >  bpf_host.o           tail_ipv6_host_policy_ingress       1576       2403  +827 (+52.47%)
>> > >  map_kptr.bpf.o       test_map_kptr                        400        475   +75 (+18.75%)
>> > >  -------------------  -----------------------------  ---------  ---------  --------------
>> > > 
>> > > These are under investigation.
>> > > 
>> > > Thanks,
>> > > Eduard
>> > > 
>> > > [1] https://reviews.llvm.org/D140804
>> > > [2] git@github.com:anakryiko/cilium.git
>> > > 
>> > > Eduard Zingerman (5):
>> > >   bpf: more precise stack write reasoning for BPF_ST instruction
>> > >   selftests/bpf: check if verifier tracks constants spilled by
>> > >     BPF_ST_MEM
>> > >   bpf: allow ctx writes using BPF_ST_MEM instruction
>> > >   selftests/bpf: test if pointer type is tracked for BPF_ST_MEM
>> > >   selftests/bpf: don't match exact insn index in expected error message
>> > > 
>> > >  kernel/bpf/cgroup.c                           |  49 +++++---
>> > >  kernel/bpf/verifier.c                         | 102 +++++++++-------
>> > >  net/core/filter.c                             |  72 ++++++------
>> > >  .../selftests/bpf/prog_tests/log_fixup.c      |   2 +-
>> > >  .../selftests/bpf/prog_tests/spin_lock.c      |   8 +-
>> > >  .../bpf/verifier/bounds_mix_sign_unsign.c     | 110 ++++++++++--------
>> > >  .../selftests/bpf/verifier/bpf_st_mem.c       |  29 +++++
>> > >  tools/testing/selftests/bpf/verifier/ctx.c    |  11 --
>> > >  tools/testing/selftests/bpf/verifier/unpriv.c |  23 ++++
>> > >  9 files changed, 249 insertions(+), 157 deletions(-)
>> > >  create mode 100644 tools/testing/selftests/bpf/verifier/bpf_st_mem.c
>> > > 
>> > > --
>> > > 2.39.0
>> > > 


* Re: [RFC bpf-next 0/5] Support for BPF_ST instruction in LLVM C compiler
  2023-01-05 12:07     ` Eduard Zingerman
  2023-01-05 15:07       ` Jose E. Marchesi
@ 2023-01-12 22:27       ` Alexei Starovoitov
  2023-01-13  8:02         ` Yonghong Song
  1 sibling, 1 reply; 16+ messages in thread
From: Alexei Starovoitov @ 2023-01-12 22:27 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Jose E. Marchesi, Andrii Nakryiko, bpf, ast, andrii, daniel,
	kernel-team, yhs, david.faust, James Hilliard

On Thu, Jan 05, 2023 at 02:07:05PM +0200, Eduard Zingerman wrote:
> On Thu, 2023-01-05 at 11:06 +0100, Jose E. Marchesi wrote:
> > > On Sat, Dec 31, 2022 at 8:31 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > > > 
> > > > BPF has two documented (non-atomic) memory store instructions:
> > > > 
> > > > BPF_STX: *(size *) (dst_reg + off) = src_reg
> > > > BPF_ST : *(size *) (dst_reg + off) = imm32
> > > > 
> > > > Currently LLVM BPF back-end does not emit BPF_ST instruction and does
> > > > not allow one to be specified as inline assembly.
> > > > 
> > > > Recently I've been exploring ways to port some of the verifier test
> > > > cases from tools/testing/selftests/bpf/verifier/*.c to use inline assembly
> > > > and machinery provided in tools/testing/selftests/bpf/test_loader.c
> > > > (which should hopefully simplify tests maintenance).
> > > > The BPF_ST instruction is popular in these tests: used in 52 of 94 files.
> > > > 
> > > > While it is possible to adjust LLVM to only support BPF_ST for inline
> > > > assembly blocks it seems a bit wasteful. This patch-set contains a set
> > > > of changes to verifier necessary in case when LLVM is allowed to
> > > > freely emit BPF_ST instructions (source code is available here [1]).
> > > 
> > > Would we gate LLVM's emitting of BPF_ST for C code behind some new
> > > cpu=v4? What is the benefit for compiler to start automatically emit
> > > such instructions? Such thinking about logistics, if there isn't much
> > > benefit, as BPF application owner I wouldn't bother enabling this
> > > behavior risking regressions on old kernels that don't have these
> > > changes.
> > 
> > Hmm, GCC happily generates BPF_ST instructions:
> > 
> >   $ echo 'int v; void foo () {  v = 666; }' | bpf-unknown-none-gcc -O2 -xc -S -o foo.s -
> >   $ cat foo.s
> >         .file	"<stdin>"
> >         .text
> >         .align	3
> >         .global	foo
> >         .type	foo, @function
> >   foo:
> >         lddw	%r0,v
> >         stw	[%r0+0],666
> >         exit
> >         .size	foo, .-foo
> >         .global	v
> >         .type	v, @object
> >         .lcomm	v,4,4
> >         .ident	"GCC: (GNU) 12.0.0 20211206 (experimental)"
> > 
> > Been doing that since October 2019, I think before the cpu versioning
> > mechanism was got in place?
> > 
> > We weren't aware this was problematic.  Does the verifier reject such
> > instructions?
> 
> Interesting, do BPF selftests generated by GCC pass the same way they
> do if generated by clang?
> 
> I had to do the following changes to the verifier to make the
> selftests pass when BPF_ST instruction is allowed for selection:
> 
> - patch #1 in this patchset: track values of constants written to
>   stack using BPF_ST. Currently these are tracked imprecisely, unlike
>   the writes using BPF_STX, e.g.:
>   
>     fp[-8] = 42;   currently verifier assumes that fp[-8]=mmmmmmmm
>                    after such instruction, where m stands for "misc",
>                    just a note that something is written at fp[-8].
>                    
>     r1 = 42;       verifier tracks r1=42 after this instruction.
>     fp[-8] = r1;   verifier tracks fp[-8]=42 after this instruction.
> 
>   So the patch makes both cases equivalent.
>   
> - patch #3 in this patchset: adjusts verifier.c:convert_ctx_access()
>   to operate on BPF_ST alongside BPF_STX.
>   
>   Context parameters for some BPF programs types are "fake" data
>   structures. The verifier matches all BPF_STX and BPF_LDX
>   instructions that operate on pointers to such contexts and rewrites
>   these instructions. It might change an offset or add another layer
>   of indirection, etc. E.g. see filter.c:bpf_convert_ctx_access().
>   (This also implies that verifier forbids writes to non-constant
>    offsets inside such structures).
>    
>   So the patch extends this logic to also handle BPF_ST.

Patch 3 needs to land before llvm starts generating 'st' for ctx access.
That's clear, but I'm missing why patch 1 is necessary.
Sure, it makes the verifier understand scalar spills with 'st' and
makes 'st' equivalent to 'stx', but I'm missing why that's necessary.
What kind of programs fail to be verified when llvm starts generating 'st'?

Regarding -mcpu=v4.
I think we need to add all of our upcoming instructions behind a single flag.
Otherwise we'll have -mcpu=v5,v6,v7 and full combinations of them.

-mcpu=v4 could mean:
- ST
- sign extending loads
- sign extend a register
- 32-bit JA
- proper bswap insns: bswap16, bswap32, bswap64

The sign and 32-bit JA we've discussed earlier.
The bswap was on my wish list forever.
The existing TO_LE, TO_BE insns are really odd from compiler pov.
The compiler should translate bswap IR op into proper bswap insn
just like it does on all cpus.
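
For illustration, this is the C-level operation in question; the
builtins below lower to the compiler's byteswap IR op, which dedicated
bswap16/32/64 insns would match directly (plain C sketch, nothing
BPF-specific assumed):

  #include <stdint.h>

  uint16_t swap16(uint16_t x) { return __builtin_bswap16(x); }
  uint32_t swap32(uint32_t x) { return __builtin_bswap32(x); }
  uint64_t swap64(uint64_t x) { return __builtin_bswap64(x); }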

Maybe add SDIV to -mcpu=v4 as well?


* Re: [RFC bpf-next 0/5] Support for BPF_ST instruction in LLVM C compiler
  2023-01-12 22:27       ` Alexei Starovoitov
@ 2023-01-13  8:02         ` Yonghong Song
  2023-01-13  8:53           ` Jose E. Marchesi
  0 siblings, 1 reply; 16+ messages in thread
From: Yonghong Song @ 2023-01-13  8:02 UTC (permalink / raw)
  To: Alexei Starovoitov, Eduard Zingerman
  Cc: Jose E. Marchesi, Andrii Nakryiko, bpf, ast, andrii, daniel,
	kernel-team, yhs, david.faust, James Hilliard



On 1/12/23 2:27 PM, Alexei Starovoitov wrote:
> On Thu, Jan 05, 2023 at 02:07:05PM +0200, Eduard Zingerman wrote:
>> On Thu, 2023-01-05 at 11:06 +0100, Jose E. Marchesi wrote:
>>>> On Sat, Dec 31, 2022 at 8:31 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>>>>>
>>>>> BPF has two documented (non-atomic) memory store instructions:
>>>>>
>>>>> BPF_STX: *(size *) (dst_reg + off) = src_reg
>>>>> BPF_ST : *(size *) (dst_reg + off) = imm32
>>>>>
>>>>> Currently LLVM BPF back-end does not emit BPF_ST instruction and does
>>>>> not allow one to be specified as inline assembly.
>>>>>
>>>>> Recently I've been exploring ways to port some of the verifier test
>>>>> cases from tools/testing/selftests/bpf/verifier/*.c to use inline assembly
>>>>> and machinery provided in tools/testing/selftests/bpf/test_loader.c
>>>>> (which should hopefully simplify tests maintenance).
>>>>> The BPF_ST instruction is popular in these tests: used in 52 of 94 files.
>>>>>
>>>>> While it is possible to adjust LLVM to only support BPF_ST for inline
>>>>> assembly blocks it seems a bit wasteful. This patch-set contains a set
>>>>> of changes to verifier necessary in case when LLVM is allowed to
>>>>> freely emit BPF_ST instructions (source code is available here [1]).
>>>>
>>>> Would we gate LLVM's emitting of BPF_ST for C code behind some new
>>>> cpu=v4? What is the benefit for compiler to start automatically emit
>>>> such instructions? Such thinking about logistics, if there isn't much
>>>> benefit, as BPF application owner I wouldn't bother enabling this
>>>> behavior risking regressions on old kernels that don't have these
>>>> changes.
>>>
>>> Hmm, GCC happily generates BPF_ST instructions:
>>>
>>>    $ echo 'int v; void foo () {  v = 666; }' | bpf-unknown-none-gcc -O2 -xc -S -o foo.s -
>>>    $ cat foo.s
>>>          .file	"<stdin>"
>>>          .text
>>>          .align	3
>>>          .global	foo
>>>          .type	foo, @function
>>>    foo:
>>>          lddw	%r0,v
>>>          stw	[%r0+0],666
>>>          exit
>>>          .size	foo, .-foo
>>>          .global	v
>>>          .type	v, @object
>>>          .lcomm	v,4,4
>>>          .ident	"GCC: (GNU) 12.0.0 20211206 (experimental)"
>>>
>>> Been doing that since October 2019, I think before the cpu versioning
>>> mechanism was got in place?
>>>
>>> We weren't aware this was problematic.  Does the verifier reject such
>>> instructions?
>>
>> Interesting, do BPF selftests generated by GCC pass the same way they
>> do if generated by clang?
>>
>> I had to do the following changes to the verifier to make the
>> selftests pass when BPF_ST instruction is allowed for selection:
>>
>> - patch #1 in this patchset: track values of constants written to
>>    stack using BPF_ST. Currently these are tracked imprecisely, unlike
>>    the writes using BPF_STX, e.g.:
>>    
>>      fp[-8] = 42;   currently verifier assumes that fp[-8]=mmmmmmmm
>>                     after such instruction, where m stands for "misc",
>>                     just a note that something is written at fp[-8].
>>                     
>>      r1 = 42;       verifier tracks r1=42 after this instruction.
>>      fp[-8] = r1;   verifier tracks fp[-8]=42 after this instruction.
>>
>>    So the patch makes both cases equivalent.
>>    
>> - patch #3 in this patchset: adjusts verifier.c:convert_ctx_access()
>>    to operate on BPF_ST alongside BPF_STX.
>>    
>>    Context parameters for some BPF programs types are "fake" data
>>    structures. The verifier matches all BPF_STX and BPF_LDX
>>    instructions that operate on pointers to such contexts and rewrites
>>    these instructions. It might change an offset or add another layer
>>    of indirection, etc. E.g. see filter.c:bpf_convert_ctx_access().
>>    (This also implies that verifier forbids writes to non-constant
>>     offsets inside such structures).
>>     
>>    So the patch extends this logic to also handle BPF_ST.
> 
> The patch 3 is necessary to land before llvm starts generating 'st' for ctx access.
> That's clear, but I'm missing why patch 1 is necessary.
> Sure, it's making the verifier understand scalar spills with 'st' and
> makes 'st' equivalent to 'stx', but I'm missing why it's necessary.
> What kind of programs fail to be verified when llvm starts generating 'st' ?
> 
> Regarind -mcpu=v4.
> I think we need to add all of our upcoming instructions as a single flag.
> Otherwise we'll have -mcpu=v5,v6,v7 and full combinations of them.
> 
> -mcpu=v4 could mean:
> - ST
> - sign extending loads
> - sign extend a register
> - 32-bit JA
> - proper bswap insns: bswap16, bswap32, bswap64
> 
> The sign and 32-bit JA we've discussed earlier.
> The bswap was on my wish list forever.
> The existing TO_LE, TO_BE insns are really odd from compiler pov.
> The compiler should translate bswap IR op into proper bswap insn
> just like it does on all cpus.
> 
> Maybe add SDIV to -mcpu=v4 as well?

Right, we should add these insns in llvm17 with -mcpu=v4, so we
can keep the number of cpu generations to a minimum.



* Re: [RFC bpf-next 0/5] Support for BPF_ST instruction in LLVM C compiler
  2023-01-13  8:02         ` Yonghong Song
@ 2023-01-13  8:53           ` Jose E. Marchesi
  2023-01-13 16:47             ` Eduard Zingerman
  2023-01-13 19:23             ` Yonghong Song
  0 siblings, 2 replies; 16+ messages in thread
From: Jose E. Marchesi @ 2023-01-13  8:53 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Alexei Starovoitov, Eduard Zingerman, Andrii Nakryiko, bpf, ast,
	andrii, daniel, kernel-team, yhs, david.faust, James Hilliard


> On 1/12/23 2:27 PM, Alexei Starovoitov wrote:
>> On Thu, Jan 05, 2023 at 02:07:05PM +0200, Eduard Zingerman wrote:
>>> On Thu, 2023-01-05 at 11:06 +0100, Jose E. Marchesi wrote:
>>>>> On Sat, Dec 31, 2022 at 8:31 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>>>>>>
>>>>>> BPF has two documented (non-atomic) memory store instructions:
>>>>>>
>>>>>> BPF_STX: *(size *) (dst_reg + off) = src_reg
>>>>>> BPF_ST : *(size *) (dst_reg + off) = imm32
>>>>>>
>>>>>> Currently LLVM BPF back-end does not emit BPF_ST instruction and does
>>>>>> not allow one to be specified as inline assembly.
>>>>>>
>>>>>> Recently I've been exploring ways to port some of the verifier test
>>>>>> cases from tools/testing/selftests/bpf/verifier/*.c to use inline assembly
>>>>>> and machinery provided in tools/testing/selftests/bpf/test_loader.c
>>>>>> (which should hopefully simplify tests maintenance).
>>>>>> The BPF_ST instruction is popular in these tests: used in 52 of 94 files.
>>>>>>
>>>>>> While it is possible to adjust LLVM to only support BPF_ST for inline
>>>>>> assembly blocks it seems a bit wasteful. This patch-set contains a set
>>>>>> of changes to verifier necessary in case when LLVM is allowed to
>>>>>> freely emit BPF_ST instructions (source code is available here [1]).
>>>>>
>>>>> Would we gate LLVM's emitting of BPF_ST for C code behind some new
>>>>> cpu=v4? What is the benefit for compiler to start automatically emit
>>>>> such instructions? Such thinking about logistics, if there isn't much
>>>>> benefit, as BPF application owner I wouldn't bother enabling this
>>>>> behavior risking regressions on old kernels that don't have these
>>>>> changes.
>>>>
>>>> Hmm, GCC happily generates BPF_ST instructions:
>>>>
>>>>    $ echo 'int v; void foo () {  v = 666; }' | bpf-unknown-none-gcc -O2 -xc -S -o foo.s -
>>>>    $ cat foo.s
>>>>          .file	"<stdin>"
>>>>          .text
>>>>          .align	3
>>>>          .global	foo
>>>>          .type	foo, @function
>>>>    foo:
>>>>          lddw	%r0,v
>>>>          stw	[%r0+0],666
>>>>          exit
>>>>          .size	foo, .-foo
>>>>          .global	v
>>>>          .type	v, @object
>>>>          .lcomm	v,4,4
>>>>          .ident	"GCC: (GNU) 12.0.0 20211206 (experimental)"
>>>>
>>>> Been doing that since October 2019, I think before the cpu versioning
>>>> mechanism was got in place?
>>>>
>>>> We weren't aware this was problematic.  Does the verifier reject such
>>>> instructions?
>>>
>>> Interesting, do BPF selftests generated by GCC pass the same way they
>>> do if generated by clang?
>>>
>>> I had to do the following changes to the verifier to make the
>>> selftests pass when BPF_ST instruction is allowed for selection:
>>>
>>> - patch #1 in this patchset: track values of constants written to
>>>    stack using BPF_ST. Currently these are tracked imprecisely, unlike
>>>    the writes using BPF_STX, e.g.:
>>>         fp[-8] = 42;   currently verifier assumes that
>>> fp[-8]=mmmmmmmm
>>>                     after such instruction, where m stands for "misc",
>>>                     just a note that something is written at fp[-8].
>>>                          r1 = 42;       verifier tracks r1=42 after
>>> this instruction.
>>>      fp[-8] = r1;   verifier tracks fp[-8]=42 after this instruction.
>>>
>>>    So the patch makes both cases equivalent.
>>>    - patch #3 in this patchset: adjusts
>>> verifier.c:convert_ctx_access()
>>>    to operate on BPF_ST alongside BPF_STX.
>>>       Context parameters for some BPF programs types are "fake"
>>> data
>>>    structures. The verifier matches all BPF_STX and BPF_LDX
>>>    instructions that operate on pointers to such contexts and rewrites
>>>    these instructions. It might change an offset or add another layer
>>>    of indirection, etc. E.g. see filter.c:bpf_convert_ctx_access().
>>>    (This also implies that verifier forbids writes to non-constant
>>>     offsets inside such structures).
>>>        So the patch extends this logic to also handle BPF_ST.
>> The patch 3 is necessary to land before llvm starts generating 'st'
>> for ctx access.
>> That's clear, but I'm missing why patch 1 is necessary.
>> Sure, it's making the verifier understand scalar spills with 'st' and
>> makes 'st' equivalent to 'stx', but I'm missing why it's necessary.
>> What kind of programs fail to be verified when llvm starts generating 'st' ?
>> Regarind -mcpu=v4.
>> I think we need to add all of our upcoming instructions as a single flag.
>> Otherwise we'll have -mcpu=v5,v6,v7 and full combinations of them.
>> -mcpu=v4 could mean:
>> - ST
>> - sign extending loads
>> - sign extend a register
>> - 32-bit JA
>> - proper bswap insns: bswap16, bswap32, bswap64
>> The sign and 32-bit JA we've discussed earlier.
>> The bswap was on my wish list forever.
>> The existing TO_LE, TO_BE insns are really odd from compiler pov.
>> The compiler should translate bswap IR op into proper bswap insn
>> just like it does on all cpus.
>> Maybe add SDIV to -mcpu=v4 as well?
>
> Right, we should add these insns in llvm17 with -mcpu=v4, so we
> can keep the number of cpu generations minimum.

How do you plan to encode the sign-extend load instructions?

I guess a possibility would be to use one of the op-mode values for
load instructions that are currently marked as reserved.  For example:

   IMM  = 0b000
   ABS  = 0b001
   IND  = 0b010
   MEM  = 0b011
   SEM  = 0b100  <- new

Then we would have the following code fields for sign-extending LDX
instructions:

   op-mode:SEM op-size:{W,H,B,DW} op-class:LDX


* Re: [RFC bpf-next 0/5] Support for BPF_ST instruction in LLVM C compiler
  2023-01-13  8:53           ` Jose E. Marchesi
@ 2023-01-13 16:47             ` Eduard Zingerman
  2023-01-26  5:49               ` Alexei Starovoitov
  2023-01-13 19:23             ` Yonghong Song
  1 sibling, 1 reply; 16+ messages in thread
From: Eduard Zingerman @ 2023-01-13 16:47 UTC (permalink / raw)
  To: Jose E. Marchesi, Yonghong Song, Alexei Starovoitov
  Cc: Andrii Nakryiko, bpf, ast, andrii, daniel, kernel-team, yhs,
	david.faust, James Hilliard

On Fri, 2023-01-13 at 09:53 +0100, Jose E. Marchesi wrote:
> > On 1/12/23 2:27 PM, Alexei Starovoitov wrote:
> > > On Thu, Jan 05, 2023 at 02:07:05PM +0200, Eduard Zingerman wrote:
> > > > On Thu, 2023-01-05 at 11:06 +0100, Jose E. Marchesi wrote:
> > > > > > On Sat, Dec 31, 2022 at 8:31 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > > > > > > 
> > > > > > > BPF has two documented (non-atomic) memory store instructions:
> > > > > > > 
> > > > > > > BPF_STX: *(size *) (dst_reg + off) = src_reg
> > > > > > > BPF_ST : *(size *) (dst_reg + off) = imm32
> > > > > > > 
> > > > > > > Currently LLVM BPF back-end does not emit BPF_ST instruction and does
> > > > > > > not allow one to be specified as inline assembly.
> > > > > > > 
> > > > > > > Recently I've been exploring ways to port some of the verifier test
> > > > > > > cases from tools/testing/selftests/bpf/verifier/*.c to use inline assembly
> > > > > > > and machinery provided in tools/testing/selftests/bpf/test_loader.c
> > > > > > > (which should hopefully simplify tests maintenance).
> > > > > > > The BPF_ST instruction is popular in these tests: used in 52 of 94 files.
> > > > > > > 
> > > > > > > While it is possible to adjust LLVM to only support BPF_ST for inline
> > > > > > > assembly blocks it seems a bit wasteful. This patch-set contains a set
> > > > > > > of changes to verifier necessary in case when LLVM is allowed to
> > > > > > > freely emit BPF_ST instructions (source code is available here [1]).
> > > > > > 
> > > > > > Would we gate LLVM's emitting of BPF_ST for C code behind some new
> > > > > > cpu=v4? What is the benefit for compiler to start automatically emit
> > > > > > such instructions? Such thinking about logistics, if there isn't much
> > > > > > benefit, as BPF application owner I wouldn't bother enabling this
> > > > > > behavior risking regressions on old kernels that don't have these
> > > > > > changes.
> > > > > 
> > > > > Hmm, GCC happily generates BPF_ST instructions:
> > > > > 
> > > > >    $ echo 'int v; void foo () {  v = 666; }' | bpf-unknown-none-gcc -O2 -xc -S -o foo.s -
> > > > >    $ cat foo.s
> > > > >          .file	"<stdin>"
> > > > >          .text
> > > > >          .align	3
> > > > >          .global	foo
> > > > >          .type	foo, @function
> > > > >    foo:
> > > > >          lddw	%r0,v
> > > > >          stw	[%r0+0],666
> > > > >          exit
> > > > >          .size	foo, .-foo
> > > > >          .global	v
> > > > >          .type	v, @object
> > > > >          .lcomm	v,4,4
> > > > >          .ident	"GCC: (GNU) 12.0.0 20211206 (experimental)"
> > > > > 
> > > > > Been doing that since October 2019, I think before the cpu versioning
> > > > > mechanism was got in place?
> > > > > 
> > > > > We weren't aware this was problematic.  Does the verifier reject such
> > > > > instructions?
> > > > 
> > > > Interesting, do BPF selftests generated by GCC pass the same way they
> > > > do if generated by clang?
> > > > 
> > > > I had to do the following changes to the verifier to make the
> > > > selftests pass when BPF_ST instruction is allowed for selection:
> > > > 
> > > > - patch #1 in this patchset: track values of constants written to
> > > >    stack using BPF_ST. Currently these are tracked imprecisely, unlike
> > > >    the writes using BPF_STX, e.g.:
> > > >         fp[-8] = 42;   currently verifier assumes that
> > > > fp[-8]=mmmmmmmm
> > > >                     after such instruction, where m stands for "misc",
> > > >                     just a note that something is written at fp[-8].
> > > >                          r1 = 42;       verifier tracks r1=42 after
> > > > this instruction.
> > > >      fp[-8] = r1;   verifier tracks fp[-8]=42 after this instruction.
> > > > 
> > > >    So the patch makes both cases equivalent.
> > > >    - patch #3 in this patchset: adjusts
> > > > verifier.c:convert_ctx_access()
> > > >    to operate on BPF_ST alongside BPF_STX.
> > > >       Context parameters for some BPF programs types are "fake"
> > > > data
> > > >    structures. The verifier matches all BPF_STX and BPF_LDX
> > > >    instructions that operate on pointers to such contexts and rewrites
> > > >    these instructions. It might change an offset or add another layer
> > > >    of indirection, etc. E.g. see filter.c:bpf_convert_ctx_access().
> > > >    (This also implies that verifier forbids writes to non-constant
> > > >     offsets inside such structures).
> > > >        So the patch extends this logic to also handle BPF_ST.
> > > The patch 3 is necessary to land before llvm starts generating 'st'
> > > for ctx access.
> > > That's clear, but I'm missing why patch 1 is necessary.
> > > Sure, it's making the verifier understand scalar spills with 'st' and
> > > makes 'st' equivalent to 'stx', but I'm missing why it's necessary.
> > > What kind of programs fail to be verified when llvm starts generating 'st' ?

I should have added an example to the summary. There are a few
test_progs tests that fail w/o this patch, namely atomic_bounds,
dynptr, for_each, xdp_noinline, xdp_synproxy.

Here is how atomic_bounds looks:

  SEC("fentry/bpf_fentry_test1")
  int BPF_PROG(sub, int x)
  {
          int a = 0;
          int b = __sync_fetch_and_add(&a, 1);
          /* b is certainly 0 here. Can the verifier tell? */
          while (b)
                  continue;
          return 0;
  }

    Compiled w/o BPF_ST                  Compiled with BPF_ST
  
  <sub>:                               <sub>:
    w1 = 0x0
    *(u32 *)(r10 - 0x4) = r1             *(u32 *)(r10 - 0x4) = 0x0
    w1 = 0x1                             w1 = 0x1
    lock *(u32 *)(r10 - 0x4) += r1       lock *(u32 *)(r10 - 0x4) += r1
    if w1 == 0x0 goto +0x1 <LBB0_2>      if w1 == 0x0 goto +0x1 <LBB0_2>
  
  <LBB0_1>:                            <LBB0_1>:
    goto -0x1 <LBB0_1>                   goto -0x1 <LBB0_1>
  
  <LBB0_2>:                            <LBB0_2>:
    w0 = 0x0                             w0 = 0x0
    exit                                 exit

When compiled with BPF_ST and verified w/o patch #1, the verification
log looks as follows:

  0: R1=ctx(off=0,imm=0) R10=fp0
  0: (62) *(u32 *)(r10 -4) = 0          ; R10=fp0 fp-8=mmmm????
  1: (b4) w1 = 1                        ; R1_w=1
  2: (c3) r1 = atomic_fetch_add((u32 *)(r10 -4), r1)    ; R1_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0 fp-8=mmmm????
  3: (16) if w1 == 0x0 goto pc+1        ; R1_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff))
  4: (05) goto pc-1
  4: (05) goto pc-1
  4: (05) goto pc-1
  4: (05) goto pc-1
  infinite loop detected at insn 4

When compiled w/o BPF_ST and verified w/o patch #1, the verification
log looks as follows:

  func#0 @0
  reg type unsupported for arg#0 function sub#5
  0: R1=ctx(off=0,imm=0) R10=fp0
  0: (b4) w1 = 0                        ; R1_w=0
  1: (63) *(u32 *)(r10 -4) = r1
  last_idx 1 first_idx 0
  regs=2 stack=0 before 0: (b4) w1 = 0
  2: R1_w=0 R10=fp0 fp-8=0000????
  2: (b4) w1 = 1                        ; R1_w=1
  ; int b = __sync_fetch_and_add(&a, 1);
  3: (c3) r1 = atomic_fetch_add((u32 *)(r10 -4), r1)    ; R1_w=P0 R10=fp0 fp-8=mmmm????
  4: (16) if w1 == 0x0 goto pc+1
  6: R1_w=P0
  6: (b4) w0 = 0                        ; R0_w=0
  7: (95) exit
  processed 7 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0

The difference comes from the way the zero write to `r10-4` is
processed: with BPF_ST it is tracked as `fp-8=mmmm????` after the
write, without BPF_ST it is tracked as `fp-8=0000????` after the write.

This is caused by the way `check_stack_write_fixed_off()` is written.
For register spills it saves either the complete register state or
STACK_ZERO if the register is known to be zero. For BPF_ST, however,
it just saves STACK_MISC. Hence patch #1.
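
In other words (a standalone C model of the tracking change, with all
names made up; this is not the kernel code):

  #include <stdint.h>
  #include <stdbool.h>

  /* a stack slot either holds a precisely known constant or just the
   * fact that "something" was written there (STACK_MISC) */
  struct slot { bool known; int64_t val; };

  /* current behaviour: a BPF_ST write forgets the immediate */
  static void track_st_current(struct slot *s, int32_t imm)
  {
          (void)imm;
          s->known = false;        /* fp-8=mmmm???? */
  }

  /* with patch #1: BPF_ST is tracked like a spill of a known register */
  static void track_st_patched(struct slot *s, int32_t imm)
  {
          s->known = true;
          s->val = imm;            /* fp-8=0000???? for imm == 0 */
  }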

> > > Regarind -mcpu=v4.
> > > I think we need to add all of our upcoming instructions as a single flag.
> > > Otherwise we'll have -mcpu=v5,v6,v7 and full combinations of them.
> > > -mcpu=v4 could mean:
> > > - ST
> > > - sign extending loads
> > > - sign extend a register
> > > - 32-bit JA
> > > - proper bswap insns: bswap16, bswap32, bswap64
> > > The sign and 32-bit JA we've discussed earlier.
> > > The bswap was on my wish list forever.
> > > The existing TO_LE, TO_BE insns are really odd from compiler pov.
> > > The compiler should translate bswap IR op into proper bswap insn
> > > just like it does on all cpus.
> > > Maybe add SDIV to -mcpu=v4 as well?

There is also BPF_JSET ("PC += off if dst & src"), which is not
currently emitted by the LLVM backend as far as I can tell.

> > 
> > Right, we should add these insns in llvm17 with -mcpu=v4, so we
> > can keep the number of cpu generations minimum.
> 
> How do you plan to encode the sign-extend load instructions?
> 
> I guess a possibility would be to use one of the available op-mode for
> load instructions that are currently marked as reserved.  For example:
> 
>    IMM  = 0b000
>    ABS  = 0b001
>    IND  = 0b010
>    MEM  = 0b011
>    SEM = 0b100  <- new
> 
> Then we would have the following code fields for sign-extending LDX
> instructions:
> 
>    op-mode:SEM op-size:{W,H,B,DW} op-class:LDX

I second Jose's question about the encoding for the new instructions.

> - sign extend a register

E.g. like this:

BPF_SEXT = 0xb0 
opcode: BPF_SEXT | BPF_X | BPF_ALU
imm32: {8,16,32}                    // to be consistent with BPF_END insn
    or BPF_{B,H,W}                  // to be consistent with LDX/STX

Sign extend an 8, 16 or 32-bit value from src into the 64-bit dst register:
  dst = sext{8,16,32}(src)
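
The intended semantics for the three widths would then be (C model,
function names just for illustration):

  #include <stdint.h>

  int64_t sext8 (int64_t src) { return (int8_t) src; }
  int64_t sext16(int64_t src) { return (int16_t)src; }
  int64_t sext32(int64_t src) { return (int32_t)src; }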

> The sign and 32-bit JA we've discussed earlier.

Could you please share a link to that discussion?

> - proper bswap insns: bswap16, bswap32, bswap64

Could you please elaborate on this?
Current instructions operate on a single register, e.g.:

  dst_reg = htole16(dst_reg).

It looks like this could be expressed in x86 assembly as a single
instruction using either bswap or xchg (see [1] and [2]).
x86/net/bpf_jit_comp.c:1266 can probably be adjusted to use xchg for
the 16-bit case.

For ARM, rev16, rev32 and rev64 allow using the same register for
source and destination ([3]).

[1] https://www.felixcloutier.com/x86/bswap
[2] https://www.felixcloutier.com/x86/xchg
[3] https://developer.arm.com/documentation/ddi0602/2022-06/Base-Instructions/REV16--Reverse-bytes-in-16-bit-halfwords-?lang=en
    https://developer.arm.com/documentation/ddi0602/2022-06/Base-Instructions/REV32--Reverse-bytes-in-32-bit-words-?lang=en
    https://developer.arm.com/documentation/ddi0602/2022-06/Base-Instructions/REV64--Reverse-Bytes--an-alias-of-REV-?lang=en
    


* Re: [RFC bpf-next 0/5] Support for BPF_ST instruction in LLVM C compiler
  2023-01-13  8:53           ` Jose E. Marchesi
  2023-01-13 16:47             ` Eduard Zingerman
@ 2023-01-13 19:23             ` Yonghong Song
  1 sibling, 0 replies; 16+ messages in thread
From: Yonghong Song @ 2023-01-13 19:23 UTC (permalink / raw)
  To: Jose E. Marchesi
  Cc: Alexei Starovoitov, Eduard Zingerman, Andrii Nakryiko, bpf, ast,
	andrii, daniel, kernel-team, yhs, david.faust, James Hilliard



On 1/13/23 12:53 AM, Jose E. Marchesi wrote:
> 
>> On 1/12/23 2:27 PM, Alexei Starovoitov wrote:
>>> On Thu, Jan 05, 2023 at 02:07:05PM +0200, Eduard Zingerman wrote:
>>>> On Thu, 2023-01-05 at 11:06 +0100, Jose E. Marchesi wrote:
>>>>>> On Sat, Dec 31, 2022 at 8:31 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>>>>>>>
>>>>>>> BPF has two documented (non-atomic) memory store instructions:
>>>>>>>
>>>>>>> BPF_STX: *(size *) (dst_reg + off) = src_reg
>>>>>>> BPF_ST : *(size *) (dst_reg + off) = imm32
>>>>>>>
>>>>>>> Currently LLVM BPF back-end does not emit BPF_ST instruction and does
>>>>>>> not allow one to be specified as inline assembly.
>>>>>>>
>>>>>>> Recently I've been exploring ways to port some of the verifier test
>>>>>>> cases from tools/testing/selftests/bpf/verifier/*.c to use inline assembly
>>>>>>> and machinery provided in tools/testing/selftests/bpf/test_loader.c
>>>>>>> (which should hopefully simplify tests maintenance).
>>>>>>> The BPF_ST instruction is popular in these tests: used in 52 of 94 files.
>>>>>>>
>>>>>>> While it is possible to adjust LLVM to only support BPF_ST for inline
>>>>>>> assembly blocks it seems a bit wasteful. This patch-set contains a set
>>>>>>> of changes to verifier necessary in case when LLVM is allowed to
>>>>>>> freely emit BPF_ST instructions (source code is available here [1]).
>>>>>>
>>>>>> Would we gate LLVM's emitting of BPF_ST for C code behind some new
>>>>>> cpu=v4? What is the benefit for compiler to start automatically emit
>>>>>> such instructions? Such thinking about logistics, if there isn't much
>>>>>> benefit, as BPF application owner I wouldn't bother enabling this
>>>>>> behavior risking regressions on old kernels that don't have these
>>>>>> changes.
>>>>>
>>>>> Hmm, GCC happily generates BPF_ST instructions:
>>>>>
>>>>>     $ echo 'int v; void foo () {  v = 666; }' | bpf-unknown-none-gcc -O2 -xc -S -o foo.s -
>>>>>     $ cat foo.s
>>>>>           .file	"<stdin>"
>>>>>           .text
>>>>>           .align	3
>>>>>           .global	foo
>>>>>           .type	foo, @function
>>>>>     foo:
>>>>>           lddw	%r0,v
>>>>>           stw	[%r0+0],666
>>>>>           exit
>>>>>           .size	foo, .-foo
>>>>>           .global	v
>>>>>           .type	v, @object
>>>>>           .lcomm	v,4,4
>>>>>           .ident	"GCC: (GNU) 12.0.0 20211206 (experimental)"
>>>>>
>>>>> Been doing that since October 2019, I think before the cpu versioning
>>>>> mechanism was got in place?
>>>>>
>>>>> We weren't aware this was problematic.  Does the verifier reject such
>>>>> instructions?
>>>>
>>>> Interesting, do BPF selftests generated by GCC pass the same way they
>>>> do if generated by clang?
>>>>
>>>> I had to do the following changes to the verifier to make the
>>>> selftests pass when BPF_ST instruction is allowed for selection:
>>>>
>>>> - patch #1 in this patchset: track values of constants written to
>>>>   stack using BPF_ST. Currently these are tracked imprecisely, unlike
>>>>   the writes using BPF_STX, e.g.:
>>>>
>>>>       fp[-8] = 42;   currently verifier assumes that fp[-8]=mmmmmmmm
>>>>                      after such instruction, where m stands for "misc",
>>>>                      just a note that something is written at fp[-8];
>>>>
>>>>       r1 = 42;       verifier tracks r1=42 after this instruction;
>>>>       fp[-8] = r1;   verifier tracks fp[-8]=42 after this instruction.
>>>>
>>>>   So the patch makes both cases equivalent.
>>>> - patch #3 in this patchset: adjusts verifier.c:convert_ctx_access()
>>>>   to operate on BPF_ST alongside BPF_STX.
>>>>
>>>>   Context parameters for some BPF program types are "fake" data
>>>>   structures. The verifier matches all BPF_STX and BPF_LDX
>>>>   instructions that operate on pointers to such contexts and rewrites
>>>>   these instructions. It might change an offset, add another layer
>>>>   of indirection, etc. E.g. see filter.c:bpf_convert_ctx_access().
>>>>   (This also implies that the verifier forbids writes to non-constant
>>>>   offsets inside such structures.)
>>>>
>>>>   So the patch extends this logic to also handle BPF_ST.
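>>>>
>>>>   A rough sketch of the idea (not the actual patch, details
>>>>   simplified): the classifier in verifier.c:convert_ctx_accesses()
>>>>   would treat BPF_ST stores like BPF_STX stores when deciding which
>>>>   instructions need rewriting:
>>>>
>>>>       /* sketch: decide whether insn is a ctx access to rewrite */
>>>>       if (BPF_CLASS(insn->code) == BPF_LDX &&
>>>>           BPF_MODE(insn->code) == BPF_MEM)
>>>>               type = BPF_READ;
>>>>       else if ((BPF_CLASS(insn->code) == BPF_STX ||
>>>>                 BPF_CLASS(insn->code) == BPF_ST) && /* new: BPF_ST */
>>>>                BPF_MODE(insn->code) == BPF_MEM)
>>>>               type = BPF_WRITE;
>>>>       else
>>>>               continue;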
>>> The patch 3 is necessary to land before llvm starts generating 'st'
>>> for ctx access.
>>> That's clear, but I'm missing why patch 1 is necessary.
>>> Sure, it's making the verifier understand scalar spills with 'st' and
>>> makes 'st' equivalent to 'stx', but I'm missing why it's necessary.
>>> What kind of programs fail to be verified when llvm starts generating 'st' ?
>>> Regarding -mcpu=v4.
>>> I think we need to add all of our upcoming instructions as a single flag.
>>> Otherwise we'll have -mcpu=v5,v6,v7 and full combinations of them.
>>> -mcpu=v4 could mean:
>>> - ST
>>> - sign extending loads
>>> - sign extend a register
>>> - 32-bit JA
>>> - proper bswap insns: bswap16, bswap32, bswap64
>>> The sign and 32-bit JA we've discussed earlier.
>>> The bswap was on my wish list forever.
>>> The existing TO_LE, TO_BE insns are really odd from compiler pov.
>>> The compiler should translate bswap IR op into proper bswap insn
>>> just like it does on all cpus.
>>> Maybe add SDIV to -mcpu=v4 as well?
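>>> For illustration (a sketch, not actual backend output): an
>>> unconditional byte swap in C,
>>>
>>>     unsigned int swap32(unsigned int x)
>>>     {
>>>             return __builtin_bswap32(x); /* always swaps */
>>>     }
>>>
>>> cannot map directly onto TO_LE/TO_BE, since those are effectively
>>> no-ops when the target already has the requested endianness: the
>>> backend has to emit "r0 = be32 r0" on little-endian targets and
>>> "r0 = le32 r0" on big-endian ones. A true bswap16/32/64 would lower
>>> this to a single endianness-independent instruction.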
>>
>> Right, we should add these insns in llvm17 with -mcpu=v4, so we
>> can keep the number of cpu generations to a minimum.
> 
> How do you plan to encode the sign-extend load instructions?
> 
> I guess a possibility would be to use one of the available op-mode
> values for load instructions that are currently marked as reserved.
> For example:
> 
>     IMM  = 0b000
>     ABS  = 0b001
>     IND  = 0b010
>     MEM  = 0b011
>     SEM  = 0b100  <- new
> 
> Then we would have the following code fields for sign-extending LDX
> instructions:
> 
>     op-mode:SEM op-size:{W,H,B,DW} op-class:LDX
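> 
> For example, under this encoding (BPF_SEM is a hypothetical name for
> the proposed mode) the opcode byte of a sign-extending 32-bit load
> would work out to:
> 
>     #include <linux/bpf.h>   /* BPF_W, BPF_LDX */
> 
>     #define BPF_SEM 0x80     /* proposed mode 0b100 in opcode bits 7-5 */
> 
>     /* opcode = mode | size | class = 0x80 | 0x00 | 0x01 = 0x81 */
>     unsigned char ldxs_w = BPF_SEM | BPF_W | BPF_LDX;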

Right, this is what I plan to do as well. See my incomplete llvm
prototype here:
   https://reviews.llvm.org/D133464


* Re: [RFC bpf-next 0/5] Support for BPF_ST instruction in LLVM C compiler
  2023-01-13 16:47             ` Eduard Zingerman
@ 2023-01-26  5:49               ` Alexei Starovoitov
  0 siblings, 0 replies; 16+ messages in thread
From: Alexei Starovoitov @ 2023-01-26  5:49 UTC (permalink / raw)
  To: Eduard Zingerman, Jose E. Marchesi, Yonghong Song, Alexei Starovoitov
  Cc: Andrii Nakryiko, bpf, ast, andrii, daniel, kernel-team, yhs,
	david.faust, James Hilliard

On 1/13/23 8:47 AM, Eduard Zingerman wrote:
>>> So the patch extends this logic to also handle BPF_ST.
>> The patch 3 is necessary to land before llvm starts generating 'st'
>> for ctx access.
>> That's clear, but I'm missing why patch 1 is necessary.
>> Sure, it's making the verifier understand scalar spills with 'st' and
>> makes 'st' equivalent to 'stx', but I'm missing why it's necessary.
>> What kind of programs fail to be verified when llvm starts generating 'st'?
> 
> I should have added an example to the summary. There are a few
> test_progs tests that fail w/o this patch, namely atomic_bounds,
> dynptr, for_each, xdp_noinline, xdp_synproxy.
> 
> Here is how atomic_bounds looks:
> 
>    SEC("fentry/bpf_fentry_test1")
>    int BPF_PROG(sub, int x)
>    {
>            int a = 0;
>            int b = __sync_fetch_and_add(&a, 1);
>            /* b is certainly 0 here. Can the verifier tell? */
>            while (b)
>                    continue;
>            return 0;
>    }
> 
>      Compiled w/o BPF_ST                  Compiled with BPF_ST
>    
>    <sub>:                               <sub>:
>      w1 = 0x0
>      *(u32 *)(r10 - 0x4) = r1             *(u32 *)(r10 - 0x4) = 0x0
>      w1 = 0x1                             w1 = 0x1
>      lock *(u32 *)(r10 - 0x4) += r1       lock *(u32 *)(r10 - 0x4) += r1
>      if w1 == 0x0 goto +0x1 <LBB0_2>      if w1 == 0x0 goto +0x1 <LBB0_2>
>    
>    <LBB0_1>:                            <LBB0_1>:
>      goto -0x1 <LBB0_1>                   goto -0x1 <LBB0_1>
>    
>    <LBB0_2>:                            <LBB0_2>:
>      w0 = 0x0                             w0 = 0x0
>      exit                                 exit
> 
> When compiled with BPF_ST and verified w/o the patch #1 verification log
> looks as follows:
> 
>    0: R1=ctx(off=0,imm=0) R10=fp0
>    0: (62) *(u32 *)(r10 -4) = 0          ; R10=fp0 fp-8=mmmm????
>    1: (b4) w1 = 1                        ; R1_w=1
>    2: (c3) r1 = atomic_fetch_add((u32 *)(r10 -4), r1)    ; R1_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0 fp-8=mmmm????
>    3: (16) if w1 == 0x0 goto pc+1        ; R1_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff))
>    4: (05) goto pc-1
>    4: (05) goto pc-1
>    4: (05) goto pc-1
>    4: (05) goto pc-1
>    infinite loop detected at insn 4
> 
> When compiled w/o BPF_ST and verified w/o the patch #1 verification log
> looks as follows:
> 
>    func#0 @0
>    reg type unsupported for arg#0 function sub#5
>    0: R1=ctx(off=0,imm=0) R10=fp0
>    0: (b4) w1 = 0                        ; R1_w=0
>    1: (63) *(u32 *)(r10 -4) = r1
>    last_idx 1 first_idx 0
>    regs=2 stack=0 before 0: (b4) w1 = 0
>    2: R1_w=0 R10=fp0 fp-8=0000????
>    2: (b4) w1 = 1                        ; R1_w=1
>    ; int b = __sync_fetch_and_add(&a, 1);
>    3: (c3) r1 = atomic_fetch_add((u32 *)(r10 -4), r1)    ; R1_w=P0 R10=fp0 fp-8=mmmm????
>    4: (16) if w1 == 0x0 goto pc+1
>    6: R1_w=P0
>    6: (b4) w0 = 0                        ; R0_w=0
>    7: (95) exit
>    processed 7 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
> 
> The difference comes from the way the zero write to `r10-4` is processed:
> with BPF_ST it is tracked as `fp-8=mmmm????` after the write, without
> BPF_ST it is tracked as `fp-8=0000????` after the write.
> 
> This is caused by the way `check_stack_write_fixed_off()` is written.
> For register spills it saves either the complete register state, or
> STACK_ZERO if the register is known to be zero. However, for BPF_ST it
> just saves STACK_MISC. Hence patch #1.
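> 
> A minimal sketch of the idea (not the actual patch; the helper name
> is assumed): when the write is a BPF_ST of a known constant, build a
> fake known-scalar register and spill it the same way a known register
> would be spilled:
> 
>     /* inside check_stack_write_fixed_off(), sketch only */
>     if (!reg && is_bpf_st_mem(insn)) {
>             struct bpf_reg_state fake_reg = {};
> 
>             __mark_reg_known(&fake_reg, insn->imm); /* imm32 is known */
>             save_register_state(state, spi, &fake_reg, size);
>     }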

Thank you for explaining.
Though the llvm changes are not ready, let's land the required
kernel changes now.
Please craft asm test cases for the issues you've seen with BPF_ST.
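
Something along these lines, perhaps (a hypothetical sketch following
the test_loader.c / bpf_misc.h conventions, with bpf_misc.h and
bpf_helpers.h included; assumes an LLVM that accepts BPF_ST in inline
assembly):

    SEC("socket")
    __description("BPF_ST_MEM spill of a known constant")
    __success __retval(42)
    __naked void st_mem_const_spill(void)
    {
            asm volatile (
                    "*(u64*)(r10 - 8) = 42;"  /* BPF_ST spill of imm */
                    "r0 = *(u64*)(r10 - 8);"  /* verifier should know r0=42 */
                    "exit;"
                    ::: __clobber_all);
    }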


Thread overview: 16+ messages
2022-12-31 16:31 [RFC bpf-next 0/5] Support for BPF_ST instruction in LLVM C compiler Eduard Zingerman
2022-12-31 16:31 ` [RFC bpf-next 1/5] bpf: more precise stack write reasoning for BPF_ST instruction Eduard Zingerman
2022-12-31 16:31 ` [RFC bpf-next 2/5] selftests/bpf: check if verifier tracks constants spilled by BPF_ST_MEM Eduard Zingerman
2022-12-31 16:31 ` [RFC bpf-next 3/5] bpf: allow ctx writes using BPF_ST_MEM instruction Eduard Zingerman
2022-12-31 16:31 ` [RFC bpf-next 4/5] selftests/bpf: test if pointer type is tracked for BPF_ST_MEM Eduard Zingerman
2022-12-31 16:31 ` [RFC bpf-next 5/5] selftests/bpf: don't match exact insn index in expected error message Eduard Zingerman
2023-01-04 22:10 ` [RFC bpf-next 0/5] Support for BPF_ST instruction in LLVM C compiler Andrii Nakryiko
2023-01-05 10:06   ` Jose E. Marchesi
2023-01-05 12:07     ` Eduard Zingerman
2023-01-05 15:07       ` Jose E. Marchesi
2023-01-12 22:27       ` Alexei Starovoitov
2023-01-13  8:02         ` Yonghong Song
2023-01-13  8:53           ` Jose E. Marchesi
2023-01-13 16:47             ` Eduard Zingerman
2023-01-26  5:49               ` Alexei Starovoitov
2023-01-13 19:23             ` Yonghong Song
