* [PATCH bpf-next 00/17] BPF open-coded iterators
@ 2023-03-02 23:49 Andrii Nakryiko
  2023-03-02 23:49 ` [PATCH bpf-next 01/17] bpf: improve stack slot state printing Andrii Nakryiko
                   ` (17 more replies)
  0 siblings, 18 replies; 35+ messages in thread
From: Andrii Nakryiko @ 2023-03-02 23:49 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Tejun Heo

Add support for open-coded (aka inline) iterators in the BPF world. This is
the next evolution in gradually allowing more powerful and less restrictive
looping and iteration capabilities in BPF programs.

We set up a framework for implementing all kinds of iterators (e.g., cgroup,
task, file, etc. iterators), but this patch set only implements the numbers
iterator, which is used to implement an ergonomic bpf_for() for-like construct
(see patch #15). We also add bpf_for_each(), which is a generic foreach-like
construct that will work with any kind of open-coded iterator implementation,
as long as it sticks with the bpf_iter_<type>_{new,next,destroy}() naming pattern.
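
As a teaser, here's roughly what the ergonomic form from patch #15 looks
like from the BPF program's side (a minimal sketch; the exact macro
arguments are an assumption based on the description above, see that patch
for the authoritative definition):

  int i, sum = 0;

  bpf_for(i, 0, 100) {
      sum += i; /* i takes values 0 through 99, end is exclusive */
  }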

Patches #1 through #12 are various preparatory patches. The first eight of
them are from the preliminaries patch set ([0]), which hasn't landed yet, so
I just merged them together to let CI do end-to-end testing of everything
properly. A few new patches further add some necessary functionality in the
verifier (like fixed-size read-only memory access for `int *`-returning
kfuncs).

The meat of the verifier-side logic is in lucky patch #13. Patch #14
implements the numbers iterator. I kept them separate to have a clean
reference for how to integrate new iterator types, and it keeps verifier
core logic changes abstracted from any particularities of the numbers
iterator. Patch #15 adds bpf_for(), bpf_for_each(), and bpf_repeat() macros
to bpf_misc.h, and also adds yet another pyperf test variant, now with
a bpf_for() loop. Patch #16 is verification tests, based on the numbers
iterator (as the only one available right now). Patch #17 actually tests the
runtime behavior of the numbers iterator.

Most of the relevant details are in corresponding commit messages or code
comments.

  [0] https://patchwork.kernel.org/project/netdevbpf/list/?series=725996&state=*

Cc: Tejun Heo <tj@kernel.org>

Andrii Nakryiko (17):
  bpf: improve stack slot state printing
  bpf: improve regsafe() checks for PTR_TO_{MEM,BUF,TP_BUFFER}
  selftests/bpf: enhance align selftest's expected log matching
  bpf: honor env->test_state_freq flag in is_state_visited()
  selftests/bpf: adjust log_fixup's buffer size for proper truncation
  bpf: clean up visit_insn()'s instruction processing
  bpf: fix visit_insn()'s detection of BPF_FUNC_timer_set_callback
    helper
  bpf: ensure that r0 is marked scratched after any function call
  bpf: move kfunc_call_arg_meta higher in the file
  bpf: mark PTR_TO_MEM as non-null register type
  bpf: generalize dynptr_get_spi to be usable for iters
  bpf: add support for fixed-size memory pointer returns for kfuncs
  bpf: add support for open-coded iterator loops
  bpf: implement number iterator
  selftests/bpf: add bpf_for_each(), bpf_for(), and bpf_repeat() macros
  selftests/bpf: add iterators tests
  selftests/bpf: add number iterator tests

 include/linux/bpf.h                           |  19 +-
 include/linux/bpf_verifier.h                  |  22 +-
 include/uapi/linux/bpf.h                      |   6 +
 kernel/bpf/bpf_iter.c                         |  71 ++
 kernel/bpf/helpers.c                          |   3 +
 kernel/bpf/verifier.c                         | 851 ++++++++++++++++--
 tools/include/uapi/linux/bpf.h                |   6 +
 .../testing/selftests/bpf/prog_tests/align.c  |  18 +-
 .../bpf/prog_tests/bpf_verif_scale.c          |   6 +
 .../testing/selftests/bpf/prog_tests/iters.c  |  62 ++
 .../selftests/bpf/prog_tests/log_fixup.c      |   2 +-
 .../bpf/prog_tests/uprobe_autoattach.c        |   1 -
 tools/testing/selftests/bpf/progs/bpf_misc.h  |  77 ++
 tools/testing/selftests/bpf/progs/iters.c     | 720 +++++++++++++++
 .../selftests/bpf/progs/iters_looping.c       | 163 ++++
 tools/testing/selftests/bpf/progs/iters_num.c | 242 +++++
 .../selftests/bpf/progs/iters_state_safety.c  | 455 ++++++++++
 tools/testing/selftests/bpf/progs/lsm.c       |   4 +-
 tools/testing/selftests/bpf/progs/pyperf.h    |  14 +-
 .../selftests/bpf/progs/pyperf600_iter.c      |   7 +
 .../selftests/bpf/progs/pyperf600_nounroll.c  |   3 -
 21 files changed, 2641 insertions(+), 111 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/iters.c
 create mode 100644 tools/testing/selftests/bpf/progs/iters.c
 create mode 100644 tools/testing/selftests/bpf/progs/iters_looping.c
 create mode 100644 tools/testing/selftests/bpf/progs/iters_num.c
 create mode 100644 tools/testing/selftests/bpf/progs/iters_state_safety.c
 create mode 100644 tools/testing/selftests/bpf/progs/pyperf600_iter.c

-- 
2.30.2



* [PATCH bpf-next 01/17] bpf: improve stack slot state printing
  2023-03-02 23:49 [PATCH bpf-next 00/17] BPF open-coded iterators Andrii Nakryiko
@ 2023-03-02 23:49 ` Andrii Nakryiko
  2023-03-02 23:50 ` [PATCH bpf-next 02/17] bpf: improve regsafe() checks for PTR_TO_{MEM,BUF,TP_BUFFER} Andrii Nakryiko
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 35+ messages in thread
From: Andrii Nakryiko @ 2023-03-02 23:49 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Tejun Heo

Improve stack slot state printing to provide more useful and relevant
information, especially for dynptrs. While previously we'd see something
like:

  8: (85) call bpf_ringbuf_reserve_dynptr#198   ; R0_w=scalar() fp-8_w=dddddddd fp-16_w=dddddddd refs=2

Now we'll see something way more useful:

  8: (85) call bpf_ringbuf_reserve_dynptr#198   ; R0_w=scalar() fp-16_w=dynptr_ringbuf(ref_id=2) refs=2

I experimented with printing the range of slots taken by dynptr,
something like:

  fp-16..8_w=dynptr_ringbuf(ref_id=2)

But it felt very awkward and pretty useless. So we print the lowest
address (most negative offset) only.

The general structure of this code is now also set up for easier
extension and will accommodate ITER slots naturally.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/bpf/verifier.c | 75 ++++++++++++++++++++++++++++---------------
 1 file changed, 49 insertions(+), 26 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index bf580f246a01..60cc8473faa8 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -705,6 +705,25 @@ static const char *kernel_type_name(const struct btf* btf, u32 id)
 	return btf_name_by_offset(btf, btf_type_by_id(btf, id)->name_off);
 }
 
+static const char *dynptr_type_str(enum bpf_dynptr_type type)
+{
+	switch (type) {
+	case BPF_DYNPTR_TYPE_LOCAL:
+		return "local";
+	case BPF_DYNPTR_TYPE_RINGBUF:
+		return "ringbuf";
+	case BPF_DYNPTR_TYPE_SKB:
+		return "skb";
+	case BPF_DYNPTR_TYPE_XDP:
+		return "xdp";
+	case BPF_DYNPTR_TYPE_INVALID:
+		return "<invalid>";
+	default:
+		WARN_ONCE(1, "unknown dynptr type %d\n", type);
+		return "<unknown>";
+	}
+}
+
 static void mark_reg_scratched(struct bpf_verifier_env *env, u32 regno)
 {
 	env->scratched_regs |= 1U << regno;
@@ -1176,26 +1195,49 @@ static void print_verifier_state(struct bpf_verifier_env *env,
 		for (j = 0; j < BPF_REG_SIZE; j++) {
 			if (state->stack[i].slot_type[j] != STACK_INVALID)
 				valid = true;
-			types_buf[j] = slot_type_char[
-					state->stack[i].slot_type[j]];
+			types_buf[j] = slot_type_char[state->stack[i].slot_type[j]];
 		}
 		types_buf[BPF_REG_SIZE] = 0;
 		if (!valid)
 			continue;
 		if (!print_all && !stack_slot_scratched(env, i))
 			continue;
-		verbose(env, " fp%d", (-i - 1) * BPF_REG_SIZE);
-		print_liveness(env, state->stack[i].spilled_ptr.live);
-		if (is_spilled_reg(&state->stack[i])) {
+		switch (state->stack[i].slot_type[BPF_REG_SIZE - 1]) {
+		case STACK_SPILL:
 			reg = &state->stack[i].spilled_ptr;
 			t = reg->type;
+
+			verbose(env, " fp%d", (-i - 1) * BPF_REG_SIZE);
+			print_liveness(env, reg->live);
 			verbose(env, "=%s", t == SCALAR_VALUE ? "" : reg_type_str(env, t));
 			if (t == SCALAR_VALUE && reg->precise)
 				verbose(env, "P");
 			if (t == SCALAR_VALUE && tnum_is_const(reg->var_off))
 				verbose(env, "%lld", reg->var_off.value + reg->off);
-		} else {
+			break;
+		case STACK_DYNPTR:
+			i += BPF_DYNPTR_NR_SLOTS - 1;
+			reg = &state->stack[i].spilled_ptr;
+
+			verbose(env, " fp%d", (-i - 1) * BPF_REG_SIZE);
+			print_liveness(env, reg->live);
+			verbose(env, "=dynptr_%s", dynptr_type_str(reg->dynptr.type));
+			if (reg->ref_obj_id)
+				verbose(env, "(ref_id=%d)", reg->ref_obj_id);
+			break;
+		case STACK_MISC:
+		case STACK_ZERO:
+		default:
+			reg = &state->stack[i].spilled_ptr;
+
+			for (j = 0; j < BPF_REG_SIZE; j++)
+				types_buf[j] = slot_type_char[state->stack[i].slot_type[j]];
+			types_buf[BPF_REG_SIZE] = 0;
+
+			verbose(env, " fp%d", (-i - 1) * BPF_REG_SIZE);
+			print_liveness(env, reg->live);
 			verbose(env, "=%s", types_buf);
+			break;
 		}
 	}
 	if (state->acquired_refs && state->refs[0].id) {
@@ -6312,28 +6354,9 @@ static int process_dynptr_func(struct bpf_verifier_env *env, int regno, int insn
 
 		/* Fold modifiers (in this case, MEM_RDONLY) when checking expected type */
 		if (!is_dynptr_type_expected(env, reg, arg_type & ~MEM_RDONLY)) {
-			const char *err_extra = "";
-
-			switch (arg_type & DYNPTR_TYPE_FLAG_MASK) {
-			case DYNPTR_TYPE_LOCAL:
-				err_extra = "local";
-				break;
-			case DYNPTR_TYPE_RINGBUF:
-				err_extra = "ringbuf";
-				break;
-			case DYNPTR_TYPE_SKB:
-				err_extra = "skb ";
-				break;
-			case DYNPTR_TYPE_XDP:
-				err_extra = "xdp ";
-				break;
-			default:
-				err_extra = "<unknown>";
-				break;
-			}
 			verbose(env,
 				"Expected a dynptr of type %s as arg #%d\n",
-				err_extra, regno);
+				dynptr_type_str(arg_to_dynptr_type(arg_type)), regno);
 			return -EINVAL;
 		}
 
-- 
2.30.2



* [PATCH bpf-next 02/17] bpf: improve regsafe() checks for PTR_TO_{MEM,BUF,TP_BUFFER}
  2023-03-02 23:49 [PATCH bpf-next 00/17] BPF open-coded iterators Andrii Nakryiko
  2023-03-02 23:49 ` [PATCH bpf-next 01/17] bpf: improve stack slot state printing Andrii Nakryiko
@ 2023-03-02 23:50 ` Andrii Nakryiko
  2023-03-02 23:50 ` [PATCH bpf-next 03/17] selftests/bpf: enhance align selftest's expected log matching Andrii Nakryiko
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 35+ messages in thread
From: Andrii Nakryiko @ 2023-03-02 23:50 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Tejun Heo

Teach regsafe() logic to handle PTR_TO_MEM, PTR_TO_BUF, and
PTR_TO_TP_BUFFER similarly to PTR_TO_MAP_{KEY,VALUE}. That is, instead of
an exact match for var_off and range, use tnum_in() and range_within()
checks, allowing a more general verified state to subsume a more specific
current state. This allows matching a wider range of valid and safe
states, speeding up verification and detecting a wider range of equivalent
states for the upcoming open-coded iteration looping logic.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/bpf/verifier.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 60cc8473faa8..97f03f9fc711 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -14114,13 +14114,17 @@ static bool regsafe(struct bpf_verifier_env *env, struct bpf_reg_state *rold,
 		       tnum_in(rold->var_off, rcur->var_off);
 	case PTR_TO_MAP_KEY:
 	case PTR_TO_MAP_VALUE:
+	case PTR_TO_MEM:
+	case PTR_TO_BUF:
+	case PTR_TO_TP_BUFFER:
 		/* If the new min/max/var_off satisfy the old ones and
 		 * everything else matches, we are OK.
 		 */
 		return memcmp(rold, rcur, offsetof(struct bpf_reg_state, var_off)) == 0 &&
 		       range_within(rold, rcur) &&
 		       tnum_in(rold->var_off, rcur->var_off) &&
-		       check_ids(rold->id, rcur->id, idmap);
+		       check_ids(rold->id, rcur->id, idmap) &&
+		       check_ids(rold->ref_obj_id, rcur->ref_obj_id, idmap);
 	case PTR_TO_PACKET_META:
 	case PTR_TO_PACKET:
 		/* We must have at least as much range as the old ptr
-- 
2.30.2



* [PATCH bpf-next 03/17] selftests/bpf: enhance align selftest's expected log matching
  2023-03-02 23:49 [PATCH bpf-next 00/17] BPF open-coded iterators Andrii Nakryiko
  2023-03-02 23:49 ` [PATCH bpf-next 01/17] bpf: improve stack slot state printing Andrii Nakryiko
  2023-03-02 23:50 ` [PATCH bpf-next 02/17] bpf: improve regsafe() checks for PTR_TO_{MEM,BUF,TP_BUFFER} Andrii Nakryiko
@ 2023-03-02 23:50 ` Andrii Nakryiko
  2023-03-02 23:50 ` [PATCH bpf-next 04/17] bpf: honor env->test_state_freq flag in is_state_visited() Andrii Nakryiko
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 35+ messages in thread
From: Andrii Nakryiko @ 2023-03-02 23:50 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Tejun Heo

Allow searching for the expected register state in all the verifier log
output that's related to the specified instruction number.

See the added comment for an example of a possible situation that arises
due to a simple enhancement done in the next patch, which fixes handling
of the env->test_state_freq flag in state checkpointing logic.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/testing/selftests/bpf/prog_tests/align.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/align.c b/tools/testing/selftests/bpf/prog_tests/align.c
index 4666f88f2bb4..c94fa8d6c4f6 100644
--- a/tools/testing/selftests/bpf/prog_tests/align.c
+++ b/tools/testing/selftests/bpf/prog_tests/align.c
@@ -660,16 +660,22 @@ static int do_test_single(struct bpf_align_test *test)
 			 * func#0 @0
 			 * 0: R1=ctx(off=0,imm=0) R10=fp0
 			 * 0: (b7) r3 = 2                 ; R3_w=2
+			 *
+			 * Sometimes it's actually two lines below, e.g. when
+			 * searching for "6: R3_w=scalar(umax=255,var_off=(0x0; 0xff))":
+			 *   from 4 to 6: R0_w=pkt(off=8,r=8,imm=0) R1=ctx(off=0,imm=0) R2_w=pkt(off=0,r=8,imm=0) R3_w=pkt_end(off=0,imm=0) R10=fp0
+			 *   6: R0_w=pkt(off=8,r=8,imm=0) R1=ctx(off=0,imm=0) R2_w=pkt(off=0,r=8,imm=0) R3_w=pkt_end(off=0,imm=0) R10=fp0
+			 *   6: (71) r3 = *(u8 *)(r2 +0)           ; R2_w=pkt(off=0,r=8,imm=0) R3_w=scalar(umax=255,var_off=(0x0; 0xff))
 			 */
-			if (!strstr(line_ptr, m.match)) {
+			while (!strstr(line_ptr, m.match)) {
 				cur_line = -1;
 				line_ptr = strtok(NULL, "\n");
-				sscanf(line_ptr, "%u: ", &cur_line);
+				sscanf(line_ptr ?: "", "%u: ", &cur_line);
+				if (!line_ptr || cur_line != m.line)
+					break;
 			}
-			if (cur_line != m.line || !line_ptr ||
-			    !strstr(line_ptr, m.match)) {
-				printf("Failed to find match %u: %s\n",
-				       m.line, m.match);
+			if (cur_line != m.line || !line_ptr || !strstr(line_ptr, m.match)) {
+				printf("Failed to find match %u: %s\n", m.line, m.match);
 				ret = 1;
 				printf("%s", bpf_vlog);
 				break;
-- 
2.30.2



* [PATCH bpf-next 04/17] bpf: honor env->test_state_freq flag in is_state_visited()
  2023-03-02 23:49 [PATCH bpf-next 00/17] BPF open-coded iterators Andrii Nakryiko
                   ` (2 preceding siblings ...)
  2023-03-02 23:50 ` [PATCH bpf-next 03/17] selftests/bpf: enhance align selftest's expected log matching Andrii Nakryiko
@ 2023-03-02 23:50 ` Andrii Nakryiko
  2023-03-02 23:50 ` [PATCH bpf-next 05/17] selftests/bpf: adjust log_fixup's buffer size for proper truncation Andrii Nakryiko
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 35+ messages in thread
From: Andrii Nakryiko @ 2023-03-02 23:50 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Tejun Heo

The env->test_state_freq flag can be set by the user by passing the
BPF_F_TEST_STATE_FREQ program flag. This is used in a bunch of selftests
to have predictable state checkpoints at every jump and so on.

Currently, the bounded loop handling heuristic ignores this flag if the
number of processed jumps and/or number of processed instructions is below
some thresholds, which throws off that reliable state checkpointing.

Honor this flag in all circumstances by disabling the heuristic if
env->test_state_freq is set.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/bpf/verifier.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 97f03f9fc711..154f5d251ecb 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -14556,7 +14556,8 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
 			 * This threshold shouldn't be too high either, since states
 			 * at the end of the loop are likely to be useful in pruning.
 			 */
-			if (env->jmps_processed - env->prev_jmps_processed < 20 &&
+			if (!env->test_state_freq &&
+			    env->jmps_processed - env->prev_jmps_processed < 20 &&
 			    env->insn_processed - env->prev_insn_processed < 100)
 				add_new_state = false;
 			goto miss;
-- 
2.30.2



* [PATCH bpf-next 05/17] selftests/bpf: adjust log_fixup's buffer size for proper truncation
  2023-03-02 23:49 [PATCH bpf-next 00/17] BPF open-coded iterators Andrii Nakryiko
                   ` (3 preceding siblings ...)
  2023-03-02 23:50 ` [PATCH bpf-next 04/17] bpf: honor env->test_state_freq flag in is_state_visited() Andrii Nakryiko
@ 2023-03-02 23:50 ` Andrii Nakryiko
  2023-03-02 23:50 ` [PATCH bpf-next 06/17] bpf: clean up visit_insn()'s instruction processing Andrii Nakryiko
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 35+ messages in thread
From: Andrii Nakryiko @ 2023-03-02 23:50 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Tejun Heo

Adjust log_fixup's expected buffer length to fix the test. It's pretty
finicky in its length expectation, but it doesn't break often. So just
adjust the length to work on the current kernel and with the follow-up
iterator changes as well.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/testing/selftests/bpf/prog_tests/log_fixup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/log_fixup.c b/tools/testing/selftests/bpf/prog_tests/log_fixup.c
index f4ffdcabf4e4..239e1c5753b0 100644
--- a/tools/testing/selftests/bpf/prog_tests/log_fixup.c
+++ b/tools/testing/selftests/bpf/prog_tests/log_fixup.c
@@ -141,7 +141,7 @@ void test_log_fixup(void)
 	if (test__start_subtest("bad_core_relo_trunc_partial"))
 		bad_core_relo(300, TRUNC_PARTIAL /* truncate original log a bit */);
 	if (test__start_subtest("bad_core_relo_trunc_full"))
-		bad_core_relo(250, TRUNC_FULL  /* truncate also libbpf's message patch */);
+		bad_core_relo(210, TRUNC_FULL  /* truncate also libbpf's message patch */);
 	if (test__start_subtest("bad_core_relo_subprog"))
 		bad_core_relo_subprog();
 	if (test__start_subtest("missing_map"))
-- 
2.30.2



* [PATCH bpf-next 06/17] bpf: clean up visit_insn()'s instruction processing
  2023-03-02 23:49 [PATCH bpf-next 00/17] BPF open-coded iterators Andrii Nakryiko
                   ` (4 preceding siblings ...)
  2023-03-02 23:50 ` [PATCH bpf-next 05/17] selftests/bpf: adjust log_fixup's buffer size for proper truncation Andrii Nakryiko
@ 2023-03-02 23:50 ` Andrii Nakryiko
  2023-03-02 23:50 ` [PATCH bpf-next 07/17] bpf: fix visit_insn()'s detection of BPF_FUNC_timer_set_callback helper Andrii Nakryiko
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 35+ messages in thread
From: Andrii Nakryiko @ 2023-03-02 23:50 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Tejun Heo

Instead of referencing the processed instruction repeatedly as insns[t]
throughout the entire visit_insn() function, take a local insn pointer and
work with it in a cleaner way.

It makes enhancing this function further a bit easier as well.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/bpf/verifier.c | 25 ++++++++++++-------------
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 154f5d251ecb..f8055f3d9b47 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -13389,44 +13389,43 @@ static int visit_func_call_insn(int t, struct bpf_insn *insns,
  */
 static int visit_insn(int t, struct bpf_verifier_env *env)
 {
-	struct bpf_insn *insns = env->prog->insnsi;
+	struct bpf_insn *insns = env->prog->insnsi, *insn = &insns[t];
 	int ret;
 
-	if (bpf_pseudo_func(insns + t))
+	if (bpf_pseudo_func(insn))
 		return visit_func_call_insn(t, insns, env, true);
 
 	/* All non-branch instructions have a single fall-through edge. */
-	if (BPF_CLASS(insns[t].code) != BPF_JMP &&
-	    BPF_CLASS(insns[t].code) != BPF_JMP32)
+	if (BPF_CLASS(insn->code) != BPF_JMP &&
+	    BPF_CLASS(insn->code) != BPF_JMP32)
 		return push_insn(t, t + 1, FALLTHROUGH, env, false);
 
-	switch (BPF_OP(insns[t].code)) {
+	switch (BPF_OP(insn->code)) {
 	case BPF_EXIT:
 		return DONE_EXPLORING;
 
 	case BPF_CALL:
-		if (insns[t].imm == BPF_FUNC_timer_set_callback)
+		if (insn->imm == BPF_FUNC_timer_set_callback)
 			/* Mark this call insn as a prune point to trigger
 			 * is_state_visited() check before call itself is
 			 * processed by __check_func_call(). Otherwise new
 			 * async state will be pushed for further exploration.
 			 */
 			mark_prune_point(env, t);
-		return visit_func_call_insn(t, insns, env,
-					    insns[t].src_reg == BPF_PSEUDO_CALL);
+		return visit_func_call_insn(t, insns, env, insn->src_reg == BPF_PSEUDO_CALL);
 
 	case BPF_JA:
-		if (BPF_SRC(insns[t].code) != BPF_K)
+		if (BPF_SRC(insn->code) != BPF_K)
 			return -EINVAL;
 
 		/* unconditional jump with single edge */
-		ret = push_insn(t, t + insns[t].off + 1, FALLTHROUGH, env,
+		ret = push_insn(t, t + insn->off + 1, FALLTHROUGH, env,
 				true);
 		if (ret)
 			return ret;
 
-		mark_prune_point(env, t + insns[t].off + 1);
-		mark_jmp_point(env, t + insns[t].off + 1);
+		mark_prune_point(env, t + insn->off + 1);
+		mark_jmp_point(env, t + insn->off + 1);
 
 		return ret;
 
@@ -13438,7 +13437,7 @@ static int visit_insn(int t, struct bpf_verifier_env *env)
 		if (ret)
 			return ret;
 
-		return push_insn(t, t + insns[t].off + 1, BRANCH, env, true);
+		return push_insn(t, t + insn->off + 1, BRANCH, env, true);
 	}
 }
 
-- 
2.30.2



* [PATCH bpf-next 07/17] bpf: fix visit_insn()'s detection of BPF_FUNC_timer_set_callback helper
  2023-03-02 23:49 [PATCH bpf-next 00/17] BPF open-coded iterators Andrii Nakryiko
                   ` (5 preceding siblings ...)
  2023-03-02 23:50 ` [PATCH bpf-next 06/17] bpf: clean up visit_insn()'s instruction processing Andrii Nakryiko
@ 2023-03-02 23:50 ` Andrii Nakryiko
  2023-03-02 23:50 ` [PATCH bpf-next 08/17] bpf: ensure that r0 is marked scratched after any function call Andrii Nakryiko
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 35+ messages in thread
From: Andrii Nakryiko @ 2023-03-02 23:50 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Tejun Heo

It's not correct to assume that any BPF_CALL instruction is a helper
call. Fix visit_insn()'s detection of the bpf_timer_set_callback() helper
by also checking insn->src_reg == 0. For kfuncs insn->src_reg would be set
to BPF_PSEUDO_KFUNC_CALL, and for subprog calls it will be BPF_PSEUDO_CALL.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/bpf/verifier.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index f8055f3d9b47..666e416dc8a2 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -13405,7 +13405,7 @@ static int visit_insn(int t, struct bpf_verifier_env *env)
 		return DONE_EXPLORING;
 
 	case BPF_CALL:
-		if (insn->imm == BPF_FUNC_timer_set_callback)
+		if (insn->src_reg == 0 && insn->imm == BPF_FUNC_timer_set_callback)
 			/* Mark this call insn as a prune point to trigger
 			 * is_state_visited() check before call itself is
 			 * processed by __check_func_call(). Otherwise new
-- 
2.30.2



* [PATCH bpf-next 08/17] bpf: ensure that r0 is marked scratched after any function call
  2023-03-02 23:49 [PATCH bpf-next 00/17] BPF open-coded iterators Andrii Nakryiko
                   ` (6 preceding siblings ...)
  2023-03-02 23:50 ` [PATCH bpf-next 07/17] bpf: fix visit_insn()'s detection of BPF_FUNC_timer_set_callback helper Andrii Nakryiko
@ 2023-03-02 23:50 ` Andrii Nakryiko
  2023-03-02 23:50 ` [PATCH bpf-next 09/17] bpf: move kfunc_call_arg_meta higher in the file Andrii Nakryiko
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 35+ messages in thread
From: Andrii Nakryiko @ 2023-03-02 23:50 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Tejun Heo

r0 is important in verifier logs (unless the called function is
void-returning, but that's taken care of by print_verifier_state()
anyways). Currently for helpers we seem to print it in the verifier log,
but for kfuncs we don't.

Instead of figuring out where in the maze of code we accidentally set r0
as scratched for helpers and why we don't do that for kfuncs, just
enforce that after any function call r0 is marked as scratched.

Also, perhaps, we should reconsider "scratched" terminology, as it's
mightily confusing. "Touched" would seem more appropriate. But I left
that for follow ups for now.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/bpf/verifier.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 666e416dc8a2..0004c9f3737f 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -15001,6 +15001,8 @@ static int do_check(struct bpf_verifier_env *env)
 					err = check_helper_call(env, insn, &env->insn_idx);
 				if (err)
 					return err;
+
+				mark_reg_scratched(env, BPF_REG_0);
 			} else if (opcode == BPF_JA) {
 				if (BPF_SRC(insn->code) != BPF_K ||
 				    insn->imm != 0 ||
-- 
2.30.2



* [PATCH bpf-next 09/17] bpf: move kfunc_call_arg_meta higher in the file
  2023-03-02 23:49 [PATCH bpf-next 00/17] BPF open-coded iterators Andrii Nakryiko
                   ` (7 preceding siblings ...)
  2023-03-02 23:50 ` [PATCH bpf-next 08/17] bpf: ensure that r0 is marked scratched after any function call Andrii Nakryiko
@ 2023-03-02 23:50 ` Andrii Nakryiko
  2023-03-02 23:50 ` [PATCH bpf-next 10/17] bpf: mark PTR_TO_MEM as non-null register type Andrii Nakryiko
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 35+ messages in thread
From: Andrii Nakryiko @ 2023-03-02 23:50 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Tejun Heo

Move struct bpf_kfunc_call_arg_meta higher in the file and put it next
to struct bpf_call_arg_meta, so it can be used from more functions.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/bpf/verifier.c | 70 +++++++++++++++++++++----------------------
 1 file changed, 35 insertions(+), 35 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 0004c9f3737f..a75d909f4a59 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -270,6 +270,41 @@ struct bpf_call_arg_meta {
 	struct btf_field *kptr_field;
 };
 
+struct bpf_kfunc_call_arg_meta {
+	/* In parameters */
+	struct btf *btf;
+	u32 func_id;
+	u32 kfunc_flags;
+	const struct btf_type *func_proto;
+	const char *func_name;
+	/* Out parameters */
+	u32 ref_obj_id;
+	u8 release_regno;
+	bool r0_rdonly;
+	u32 ret_btf_id;
+	u64 r0_size;
+	u32 subprogno;
+	struct {
+		u64 value;
+		bool found;
+	} arg_constant;
+	struct {
+		struct btf *btf;
+		u32 btf_id;
+	} arg_obj_drop;
+	struct {
+		struct btf_field *field;
+	} arg_list_head;
+	struct {
+		struct btf_field *field;
+	} arg_rbtree_root;
+	struct {
+		enum bpf_dynptr_type type;
+		u32 id;
+	} initialized_dynptr;
+	u64 mem_size;
+};
+
 struct btf *btf_vmlinux;
 
 static DEFINE_MUTEX(bpf_verifier_lock);
@@ -8712,41 +8747,6 @@ static void mark_btf_func_reg_size(struct bpf_verifier_env *env, u32 regno,
 	}
 }
 
-struct bpf_kfunc_call_arg_meta {
-	/* In parameters */
-	struct btf *btf;
-	u32 func_id;
-	u32 kfunc_flags;
-	const struct btf_type *func_proto;
-	const char *func_name;
-	/* Out parameters */
-	u32 ref_obj_id;
-	u8 release_regno;
-	bool r0_rdonly;
-	u32 ret_btf_id;
-	u64 r0_size;
-	u32 subprogno;
-	struct {
-		u64 value;
-		bool found;
-	} arg_constant;
-	struct {
-		struct btf *btf;
-		u32 btf_id;
-	} arg_obj_drop;
-	struct {
-		struct btf_field *field;
-	} arg_list_head;
-	struct {
-		struct btf_field *field;
-	} arg_rbtree_root;
-	struct {
-		enum bpf_dynptr_type type;
-		u32 id;
-	} initialized_dynptr;
-	u64 mem_size;
-};
-
 static bool is_kfunc_acquire(struct bpf_kfunc_call_arg_meta *meta)
 {
 	return meta->kfunc_flags & KF_ACQUIRE;
-- 
2.30.2



* [PATCH bpf-next 10/17] bpf: mark PTR_TO_MEM as non-null register type
  2023-03-02 23:49 [PATCH bpf-next 00/17] BPF open-coded iterators Andrii Nakryiko
                   ` (8 preceding siblings ...)
  2023-03-02 23:50 ` [PATCH bpf-next 09/17] bpf: move kfunc_call_arg_meta higher in the file Andrii Nakryiko
@ 2023-03-02 23:50 ` Andrii Nakryiko
  2023-03-02 23:50 ` [PATCH bpf-next 11/17] bpf: generalize dynptr_get_spi to be usable for iters Andrii Nakryiko
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 35+ messages in thread
From: Andrii Nakryiko @ 2023-03-02 23:50 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Tejun Heo

A PTR_TO_MEM register without PTR_MAYBE_NULL is indeed non-null. This is
important for the BPF verifier to be able to prune branches that are
guaranteed not to be taken. This is always the case with open-coded
iterators.
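
As an illustration (not part of the patch), consider some pointer p that
the verifier tracks as PTR_TO_MEM without PTR_MAYBE_NULL, e.g. what an
open-coded iterator's next method returns once the NULL check has already
been passed; with this change the check below is provably never taken and
the dead branch can be pruned:

  if (!p)        /* branch known not taken: p is non-null PTR_TO_MEM */
      return 0;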

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/bpf/verifier.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index a75d909f4a59..4ed53280ce95 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -487,7 +487,8 @@ static bool reg_type_not_null(enum bpf_reg_type type)
 		type == PTR_TO_TCP_SOCK ||
 		type == PTR_TO_MAP_VALUE ||
 		type == PTR_TO_MAP_KEY ||
-		type == PTR_TO_SOCK_COMMON;
+		type == PTR_TO_SOCK_COMMON ||
+		type == PTR_TO_MEM;
 }
 
 static bool type_is_ptr_alloc_obj(u32 type)
-- 
2.30.2



* [PATCH bpf-next 11/17] bpf: generalize dynptr_get_spi to be usable for iters
  2023-03-02 23:49 [PATCH bpf-next 00/17] BPF open-coded iterators Andrii Nakryiko
                   ` (9 preceding siblings ...)
  2023-03-02 23:50 ` [PATCH bpf-next 10/17] bpf: mark PTR_TO_MEM as non-null register type Andrii Nakryiko
@ 2023-03-02 23:50 ` Andrii Nakryiko
  2023-03-02 23:50 ` [PATCH bpf-next 12/17] bpf: add support for fixed-size memory pointer returns for kfuncs Andrii Nakryiko
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 35+ messages in thread
From: Andrii Nakryiko @ 2023-03-02 23:50 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Tejun Heo

Generalize the logic of fetching special stack slot object state using
spi (stack slot index). This will be used by STACK_ITER logic next.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/bpf/verifier.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 4ed53280ce95..641a36204493 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -710,32 +710,38 @@ static bool is_spi_bounds_valid(struct bpf_func_state *state, int spi, int nr_sl
        return spi - nr_slots + 1 >= 0 && spi < allocated_slots;
 }
 
-static int dynptr_get_spi(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
+static int stack_slot_obj_get_spi(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
+			          const char *obj_kind, int nr_slots)
 {
 	int off, spi;
 
 	if (!tnum_is_const(reg->var_off)) {
-		verbose(env, "dynptr has to be at a constant offset\n");
+		verbose(env, "%s has to be at a constant offset\n", obj_kind);
 		return -EINVAL;
 	}
 
 	off = reg->off + reg->var_off.value;
 	if (off % BPF_REG_SIZE) {
-		verbose(env, "cannot pass in dynptr at an offset=%d\n", off);
+		verbose(env, "cannot pass in %s at an offset=%d\n", obj_kind, off);
 		return -EINVAL;
 	}
 
 	spi = __get_spi(off);
-	if (spi < 1) {
-		verbose(env, "cannot pass in dynptr at an offset=%d\n", off);
+	if (spi + 1 < nr_slots) {
+		verbose(env, "cannot pass in %s at an offset=%d\n", obj_kind, off);
 		return -EINVAL;
 	}
 
-	if (!is_spi_bounds_valid(func(env, reg), spi, BPF_DYNPTR_NR_SLOTS))
+	if (!is_spi_bounds_valid(func(env, reg), spi, nr_slots))
 		return -ERANGE;
 	return spi;
 }
 
+static int dynptr_get_spi(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
+{
+	return stack_slot_obj_get_spi(env, reg, "dynptr", BPF_DYNPTR_NR_SLOTS);
+}
+
 static const char *kernel_type_name(const struct btf* btf, u32 id)
 {
 	return btf_name_by_offset(btf, btf_type_by_id(btf, id)->name_off);
-- 
2.30.2



* [PATCH bpf-next 12/17] bpf: add support for fixed-size memory pointer returns for kfuncs
  2023-03-02 23:49 [PATCH bpf-next 00/17] BPF open-coded iterators Andrii Nakryiko
                   ` (10 preceding siblings ...)
  2023-03-02 23:50 ` [PATCH bpf-next 11/17] bpf: generalize dynptr_get_spi to be usable for iters Andrii Nakryiko
@ 2023-03-02 23:50 ` Andrii Nakryiko
  2023-03-02 23:50 ` [PATCH bpf-next 13/17] bpf: add support for open-coded iterator loops Andrii Nakryiko
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 35+ messages in thread
From: Andrii Nakryiko @ 2023-03-02 23:50 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Tejun Heo

Support direct fixed-size (and, for now, read-only) memory access when
a kfunc's return type is a pointer to a non-struct type. Calculate the type
size and let the BPF program access that many bytes directly. This is
crucial for the numbers iterator.
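
For illustration, this is roughly what it enables on the BPF program side,
assuming a kfunc that returns `int *`, like the numbers iterator's next
method added later in this series (the declaration below is a sketch, not
taken from the patches):

  extern int *bpf_iter_num_next(struct bpf_iter *it) __ksym;

  int *v = bpf_iter_num_next(&it);

  if (v)                          /* NULL check is still required */
      bpf_printk("X = %d", *v);   /* direct 4-byte read-only access */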

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/bpf/verifier.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 641a36204493..0ff9dd9170ef 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -10241,6 +10241,14 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 				return -EFAULT;
 			}
 		} else if (!__btf_type_is_struct(ptr_type)) {
+			if (!meta.r0_size) {
+				__u32 sz;
+
+				if (!IS_ERR(btf_resolve_size(desc_btf, ptr_type, &sz))) {
+					meta.r0_size = sz;
+					meta.r0_rdonly = true;
+				}
+			}
 			if (!meta.r0_size) {
 				ptr_type_name = btf_name_by_offset(desc_btf,
 								   ptr_type->name_off);
-- 
2.30.2



* [PATCH bpf-next 13/17] bpf: add support for open-coded iterator loops
  2023-03-02 23:49 [PATCH bpf-next 00/17] BPF open-coded iterators Andrii Nakryiko
                   ` (11 preceding siblings ...)
  2023-03-02 23:50 ` [PATCH bpf-next 12/17] bpf: add support for fixed-size memory pointer returns for kfuncs Andrii Nakryiko
@ 2023-03-02 23:50 ` Andrii Nakryiko
  2023-03-04 20:02   ` Alexei Starovoitov
  2023-03-02 23:50 ` [PATCH bpf-next 14/17] bpf: implement number iterator Andrii Nakryiko
                   ` (4 subsequent siblings)
  17 siblings, 1 reply; 35+ messages in thread
From: Andrii Nakryiko @ 2023-03-02 23:50 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Tejun Heo

Teach the verifier about the concept of open-coded (or inline) iterators.

This patch adds generic iterator loop verification logic, a new STACK_ITER
stack slot type to contain iterator state, and the necessary kfunc plumbing
for an iterator's constructor, destructor and next "methods". The next patch
implements the first specific iterator (a number iterator for implementing
a for loop). Such a split allows having more focused commits for verifier
logic and a separate commit that we could later point to as a reference for
what it takes to add a new kind of iterator.

First, we add new fixed-size opaque struct bpf_iter (24-byte long) to
contain runtime state of any possible iterator. struct bpf_iter state is
supposed to be on BPF program's stack, so there will be no way to change
its size later on. 24 bytes are chosen as a compromise between using too
much stack and providing too little pre-allocated space for future
iterator types. The numbers iterator in the next patch needs only 8 bytes
and thus fits perfectly and won't require any runtime memory allocation. But
if some other iterator implementation would require more than 24 bytes
to represent iterator runtime state, we'd have to perform runtime
allocation for extra state, which leads to performance hit and general
unreliability (especially in restricted environments of NMI or IRQ).
3 words were chosen to at least accommodate 1 pointer to "collection"
(e.g., bpf_map), 1 pointer to "current/next element", and still leaving
1 word for extra parameters and flags. As such it should hopefully make
a lot of other iterator implementations much simpler and reliable.

The way the BPF verifier logic is implemented, there are no artificial
restrictions on the number of active iterators; it should work correctly
with multiple of them in use at the same time. This also means you can have
multiple nested iteration loops. A struct bpf_iter reference can be safely
passed to subprograms as well.

The general flow is easiest to demonstrate with a simple example using
the number iterator implemented in the next patch. Here's the simplest possible
loop:

  struct bpf_iter it;
  int *v;

  bpf_iter_num_new(&it, 2, 5);
  while ((v = bpf_iter_num_next(&it))) {
      bpf_printk("X = %d", *v);
  }
  bpf_iter_num_destroy(&it);

Above snippet should output "X = 2", "X = 3", "X = 4". Note that 5 is
exclusive and is not returned.

In the above example, we see a trio of functions:
  - constructor, bpf_iter_num_new(), which initializes iterator state
  (struct bpf_iter it) on the stack. If any of the input arguments are
  invalid, the constructor should make sure to still initialize it such that
  subsequent bpf_iter_num_next() calls will return NULL. I.e., on error,
  return an error and construct an empty iterator (see the sketch after
  this list).
  - next method, bpf_iter_num_next(), which accepts pointer to iterator
  state and produces an element. The next method should always return
  a pointer. The contract with the BPF verifier is that the next method will
  always eventually return NULL when elements are exhausted. Once NULL is
  returned, subsequent next calls should keep returning NULL. In the
  case of the numbers iterator, bpf_iter_num_next() returns a pointer to int
  (where the current integer value itself is stored inside the iterator
  state), which can be dereferenced after the corresponding NULL check.
  - once done with the iterator, it's mandated that the user cleans up its
  state with the destructor, bpf_iter_num_destroy() in this case. The destructor
  frees up any resources and marks stack space used by struct bpf_iter
  as usable for something else.
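
For example, a constructor honoring this "empty iterator on error" contract
could look roughly like the sketch below. This is illustrative only: the
real bpf_iter_num_new() lands in the next patch, and the internal state
struct and field names here are assumptions, not taken from that patch.

  __bpf_kfunc int bpf_iter_num_new(struct bpf_iter *it, int start, int end)
  {
      /* hypothetical kernel-side view of the opaque struct bpf_iter */
      struct bpf_iter_num_state { int cur, end; } *s = (void *)it;

      if (start > end) {
          s->cur = s->end = 0;  /* empty: first next() call returns NULL */
          return -EINVAL;
      }

      s->cur = start;
      s->end = end;
      return 0;
  }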

Any other iterator implementation will have to implement at least these
three methods. It is enforced that for any given type of iterator only
applicable constructor/destructor/next are callable. I.e., the verifier
ensures you can't pass a number iterator into, say, a cgroup iterator's
next method.
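
And since the number of active iterators isn't artificially limited (as
noted above), nesting is just a matter of keeping two independent iterator
states on the stack. A sketch, again using the numbers iterator:

  struct bpf_iter it1, it2;
  int *i, *j;

  bpf_iter_num_new(&it1, 0, 3);
  while ((i = bpf_iter_num_next(&it1))) {
      bpf_iter_num_new(&it2, 0, 3);
      while ((j = bpf_iter_num_next(&it2)))
          bpf_printk("%d x %d", *i, *j);
      bpf_iter_num_destroy(&it2);
  }
  bpf_iter_num_destroy(&it1);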

It is important to keep naming consistent to be able to create generic
helpers/macros to help with bpf_iter usability. E.g., one of the follow-up
patches adds a generic bpf_for_each() macro to bpf_misc.h in selftests,
which allows utilizing the iterator "trio" nicely without having to code
the above somewhat tedious loop explicitly every time.

**So it is expected that new iterator implementations will follow
bpf_iter_<kind>_{new,next,destroy}() naming.**

At the implementation level, iterator state tracking for verification
purposes is very similar to dynptr. We add STACK_ITER stack slot type,
reserve 3 slots, keep track of necessary extra state in the "main" slot.
Other slots are marked as STACK_ITER, but with an invalid iterator type.
This seems simpler than having a separate "is_first_slot" flag. We
should consider reworking STACK_DYNPTR to follow similar approach.

Another big distinction is that STACK_ITER is *always refcounted*, which
simplifies implementation a bit without sacrificing usability. So no
need for extra "iter_id", no need to anticipate reuse of STACK_ITER
slots for new constructors, etc. Keeping it simpler.

As far as verification logic, there are two extensive comments, in
process_iter_next_call() and iter_active_depths_differ(), please refer
to them for details.

But from a 10,000-foot point of view, next methods are the points of
forking, which are conceptually similar to what verifier is doing when
validating conditional jump. We branch out at call bpf_iter_<type>_next
instruction and simulate two situations: NULL (iteration is done) and
non-NULL (new element returned). NULL is simulated first and is supposed
to reach exit without looping. After that non-NULL case is validated and
it either reaches exit (for trivial examples with no real loop), or
reaches another call bpf_iter_<type>_next instruction with a state
equivalent to one already (partially) validated. State equivalency at that
point means we technically are going to be looping forever without
"breaking out" out of established "state envelope" (i.e., subsequent
iterations don't add any new knowledge or constraints to verifier state,
so running 1, 2, 10, or a million iterations doesn't matter). But taking
into account the contract that the iterator's next method *has to* return
NULL eventually, we can conclude that the loop body is safe. Given we
validated logic outside of the loop (NULL case), and concluded that the
loop body is safe, though potentially looping many times, the verifier can
claim safety of the overall program logic.

The rest of the patch is the necessary plumbing for state tracking, marking,
and validation, plus the kfunc infrastructure to allow implementing
iterator constructor, destructor, and next methods.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/bpf.h            |   7 +
 include/linux/bpf_verifier.h   |  22 +-
 include/uapi/linux/bpf.h       |   6 +
 kernel/bpf/verifier.c          | 621 ++++++++++++++++++++++++++++++++-
 tools/include/uapi/linux/bpf.h |   6 +
 5 files changed, 653 insertions(+), 9 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 23ec684e660d..a968282ba324 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -620,6 +620,8 @@ enum bpf_type_flag {
 #define DYNPTR_TYPE_FLAG_MASK	(DYNPTR_TYPE_LOCAL | DYNPTR_TYPE_RINGBUF | DYNPTR_TYPE_SKB \
 				 | DYNPTR_TYPE_XDP)
 
+#define ITER_TYPE_FLAG_MASK	(0)
+
 /* Max number of base types. */
 #define BPF_BASE_TYPE_LIMIT	(1UL << BPF_BASE_TYPE_BITS)
 
@@ -663,6 +665,7 @@ enum bpf_arg_type {
 	ARG_PTR_TO_TIMER,	/* pointer to bpf_timer */
 	ARG_PTR_TO_KPTR,	/* pointer to referenced kptr */
 	ARG_PTR_TO_DYNPTR,      /* pointer to bpf_dynptr. See bpf_type_flag for dynptr type */
+	ARG_PTR_TO_ITER,	/* pointer to bpf_iter. See bpf_type_flag for iter type */
 	__BPF_ARG_TYPE_MAX,
 
 	/* Extended arg_types. */
@@ -1162,6 +1165,10 @@ enum bpf_dynptr_type {
 int bpf_dynptr_check_size(u32 size);
 u32 bpf_dynptr_get_size(const struct bpf_dynptr_kern *ptr);
 
+enum bpf_iter_type {
+	BPF_ITER_TYPE_INVALID,
+};
+
 #ifdef CONFIG_BPF_JIT
 int bpf_trampoline_link_prog(struct bpf_tramp_link *link, struct bpf_trampoline *tr);
 int bpf_trampoline_unlink_prog(struct bpf_tramp_link *link, struct bpf_trampoline *tr);
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index b26ff2a8f63b..493a4bb239fe 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -59,6 +59,12 @@ struct bpf_active_lock {
 	u32 id;
 };
 
+enum bpf_iter_state {
+	BPF_ITER_STATE_INVALID, /* for non-first slot */
+	BPF_ITER_STATE_ACTIVE,
+	BPF_ITER_STATE_DRAINED,
+};
+
 struct bpf_reg_state {
 	/* Ordering of fields matters.  See states_equal() */
 	enum bpf_reg_type type;
@@ -103,6 +109,13 @@ struct bpf_reg_state {
 			bool first_slot;
 		} dynptr;
 
+		/* For bpf_iter stack slots */
+		struct {
+			enum bpf_iter_type type;
+			enum bpf_iter_state state;
+			int depth;
+		} iter;
+
 		/* Max size from any of the above. */
 		struct {
 			unsigned long raw1;
@@ -141,6 +154,8 @@ struct bpf_reg_state {
 	 * same reference to the socket, to determine proper reference freeing.
 	 * For stack slots that are dynptrs, this is used to track references to
 	 * the dynptr to determine proper reference freeing.
+	 * Similarly to dynptrs, we use ID to track "belonging" of a reference
+	 * to a specific instance of bpf_iter.
 	 */
 	u32 id;
 	/* PTR_TO_SOCKET and PTR_TO_TCP_SOCK could be a ptr returned
@@ -211,11 +226,16 @@ enum bpf_stack_slot_type {
 	 * is stored in bpf_stack_state->spilled_ptr.dynptr.type
 	 */
 	STACK_DYNPTR,
+	STACK_ITER,
 };
 
 #define BPF_REG_SIZE 8	/* size of eBPF register in bytes */
+
 #define BPF_DYNPTR_SIZE		sizeof(struct bpf_dynptr_kern)
-#define BPF_DYNPTR_NR_SLOTS		(BPF_DYNPTR_SIZE / BPF_REG_SIZE)
+#define BPF_DYNPTR_NR_SLOTS	(BPF_DYNPTR_SIZE / BPF_REG_SIZE)
+
+#define BPF_ITER_SIZE		sizeof(struct bpf_iter)
+#define BPF_ITER_NR_SLOTS	(BPF_ITER_SIZE / BPF_REG_SIZE)
 
 struct bpf_stack_state {
 	struct bpf_reg_state spilled_ptr;
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c9699304aed2..c4b506193365 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -6927,6 +6927,12 @@ struct bpf_dynptr {
 	__u64 :64;
 } __attribute__((aligned(8)));
 
+struct bpf_iter {
+	__u64 :64;
+	__u64 :64;
+	__u64 :64;
+} __attribute__((aligned(8)));
+
 struct bpf_list_head {
 	__u64 :64;
 	__u64 :64;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 0ff9dd9170ef..58754929ee33 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -302,6 +302,10 @@ struct bpf_kfunc_call_arg_meta {
 		enum bpf_dynptr_type type;
 		u32 id;
 	} initialized_dynptr;
+	struct {
+		u8 spi;
+		u8 frameno;
+	} iter;
 	u64 mem_size;
 };
 
@@ -668,6 +672,7 @@ static char slot_type_char[] = {
 	[STACK_MISC]	= 'm',
 	[STACK_ZERO]	= '0',
 	[STACK_DYNPTR]	= 'd',
+	[STACK_ITER]	= 'i',
 };
 
 static void print_liveness(struct bpf_verifier_env *env,
@@ -742,6 +747,11 @@ static int dynptr_get_spi(struct bpf_verifier_env *env, struct bpf_reg_state *re
 	return stack_slot_obj_get_spi(env, reg, "dynptr", BPF_DYNPTR_NR_SLOTS);
 }
 
+static int iter_get_spi(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
+{
+	return stack_slot_obj_get_spi(env, reg, "iter", BPF_ITER_NR_SLOTS);
+}
+
 static const char *kernel_type_name(const struct btf* btf, u32 id)
 {
 	return btf_name_by_offset(btf, btf_type_by_id(btf, id)->name_off);
@@ -766,6 +776,32 @@ static const char *dynptr_type_str(enum bpf_dynptr_type type)
 	}
 }
 
+static const char *iter_type_str(enum bpf_iter_type type)
+{
+	switch (type) {
+	case BPF_ITER_TYPE_INVALID:
+		return "<invalid>";
+	default:
+		WARN_ONCE(1, "unknown iter type %d\n", type);
+		return "<unknown>";
+	}
+}
+
+static const char *iter_state_str(enum bpf_iter_state state)
+{
+	switch (state) {
+	case BPF_ITER_STATE_ACTIVE:
+		return "active";
+	case BPF_ITER_STATE_DRAINED:
+		return "drained";
+	case BPF_ITER_STATE_INVALID:
+		return "<invalid>";
+	default:
+		WARN_ONCE(1, "unknown iter state %d\n", state);
+		return "<unknown>";
+	}
+}
+
 static void mark_reg_scratched(struct bpf_verifier_env *env, u32 regno)
 {
 	env->scratched_regs |= 1U << regno;
@@ -1118,6 +1154,179 @@ static bool is_dynptr_type_expected(struct bpf_verifier_env *env, struct bpf_reg
 	}
 }
 
+static enum bpf_iter_type arg_to_iter_type(enum bpf_arg_type arg_type)
+{
+	switch (arg_type & ITER_TYPE_FLAG_MASK) {
+	default:
+		return BPF_ITER_TYPE_INVALID;
+	}
+}
+
+static void __mark_reg_known_zero(struct bpf_reg_state *reg);
+
+static int mark_stack_slots_iter(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
+				 enum bpf_arg_type arg_type, int insn_idx)
+{
+	struct bpf_func_state *state = func(env, reg);
+	enum bpf_iter_type type;
+	int spi, i, j, id;
+
+	spi = iter_get_spi(env, reg);
+	if (spi < 0)
+		return spi;
+
+	type = arg_to_iter_type(arg_type);
+	if (type == BPF_ITER_TYPE_INVALID)
+		return -EINVAL;
+
+	id = acquire_reference_state(env, insn_idx);
+	if (id < 0)
+		return id;
+
+	for (i = 0; i < BPF_ITER_NR_SLOTS; i++) {
+		struct bpf_stack_state *slot = &state->stack[spi - i];
+		struct bpf_reg_state *st = &slot->spilled_ptr;
+
+		__mark_reg_known_zero(st);
+		st->type = PTR_TO_STACK; /* we don't have dedicated reg type */
+		st->live |= REG_LIVE_WRITTEN;
+		st->ref_obj_id = i == 0 ? id : 0;
+		st->iter.type = i == 0 ? type : BPF_ITER_TYPE_INVALID;
+		st->iter.state = BPF_ITER_STATE_ACTIVE;
+		st->iter.depth = 0;
+
+		for (j = 0; j < BPF_REG_SIZE; j++)
+			slot->slot_type[j] = STACK_ITER;
+
+		mark_stack_slot_scratched(env, spi - i);
+	}
+
+	return 0;
+}
+
+static int unmark_stack_slots_iter(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
+{
+	struct bpf_func_state *state = func(env, reg);
+	int spi, i, j;
+
+	spi = iter_get_spi(env, reg);
+	if (spi < 0)
+		return spi;
+
+	for (i = 0; i < BPF_ITER_NR_SLOTS; i++) {
+		struct bpf_stack_state *slot = &state->stack[spi - i];
+		struct bpf_reg_state *st = &slot->spilled_ptr;
+
+		if (i == 0)
+			WARN_ON_ONCE(release_reference(env, st->ref_obj_id));
+
+		__mark_reg_not_init(env, st);
+
+		/* see unmark_stack_slots_dynptr() for why we need to set REG_LIVE_WRITTEN */
+		st->live |= REG_LIVE_WRITTEN;
+
+		for (j = 0; j < BPF_REG_SIZE; j++)
+			slot->slot_type[j] = STACK_INVALID;
+
+		mark_stack_slot_scratched(env, spi - i);
+	}
+
+	return 0;
+}
+
+static bool is_iter_reg_valid_uninit(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
+{
+	struct bpf_func_state *state = func(env, reg);
+	int spi, i, j;
+
+	/* For -ERANGE (i.e. spi not falling into allocated stack slots), we
+	 * will do check_mem_access to check and update stack bounds later, so
+	 * return true for that case.
+	 */
+	spi = iter_get_spi(env, reg);
+	if (spi == -ERANGE)
+		return true;
+	if (spi < 0)
+		return spi;
+
+	for (i = 0; i < BPF_ITER_NR_SLOTS; i++) {
+		struct bpf_stack_state *slot = &state->stack[spi - i];
+
+		for (j = 0; j < BPF_REG_SIZE; j++)
+			if (slot->slot_type[j] == STACK_ITER)
+				return false;
+	}
+
+	return true;
+}
+
+static bool is_iter_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
+{
+	struct bpf_func_state *state = func(env, reg);
+	int spi, i, j;
+
+	spi = iter_get_spi(env, reg);
+	if (spi < 0)
+		return false;
+
+	for (i = 0; i < BPF_ITER_NR_SLOTS; i++) {
+		struct bpf_stack_state *slot = &state->stack[spi - i];
+		struct bpf_reg_state *st = &slot->spilled_ptr;
+
+		/* only first slot contains valid iterator type */
+		if (i == 0 && st->iter.type == BPF_ITER_TYPE_INVALID)
+			return false;
+
+		for (j = 0; j < BPF_REG_SIZE; j++)
+			if (slot->slot_type[j] != STACK_ITER)
+				return false;
+	}
+
+	return true;
+}
+
+static bool is_iter_type_compatible(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
+				    enum bpf_arg_type arg_type)
+{
+	struct bpf_func_state *state = func(env, reg);
+	enum bpf_iter_type iter_type;
+	int spi;
+
+	/* ARG_PTR_TO_ITER takes any type of iter */
+	if (arg_type == ARG_PTR_TO_ITER)
+		return true;
+
+	spi = iter_get_spi(env, reg);
+	if (spi < 0)
+		return false;
+
+	iter_type = arg_to_iter_type(arg_type);
+	return state->stack[spi].spilled_ptr.iter.type == iter_type;
+}
+
+/* Check if given stack slot is "special":
+ *   - spilled register state (STACK_SPILL);
+ *   - dynptr state (STACK_DYNPTR);
+ *   - iter state (STACK_ITER).
+ */
+static bool is_stack_slot_special(const struct bpf_stack_state *stack)
+{
+	enum bpf_stack_slot_type type = stack->slot_type[BPF_REG_SIZE - 1];
+
+	switch (type) {
+	case STACK_SPILL:
+	case STACK_DYNPTR:
+	case STACK_ITER:
+		return true;
+	case STACK_INVALID:
+	case STACK_MISC:
+	case STACK_ZERO:
+		return false;
+	default:
+		WARN_ONCE(1, "unknown stack slot type %d\n", type);
+		return true;
+	}
+}
+
 /* The reg state of a pointer or a bounded scalar was saved when
  * it was spilled to the stack.
  */
@@ -1267,6 +1476,16 @@ static void print_verifier_state(struct bpf_verifier_env *env,
 			if (reg->ref_obj_id)
 				verbose(env, "(ref_id=%d)", reg->ref_obj_id);
 			break;
+		case STACK_ITER:
+			i += BPF_ITER_NR_SLOTS - 1;
+			reg = &state->stack[i].spilled_ptr;
+
+			verbose(env, " fp%d", (-i - 1) * BPF_REG_SIZE);
+			print_liveness(env, reg->live);
+			verbose(env, "=iter_%s(ref_id=%d,state=%s,depth=%u)",
+				iter_type_str(reg->iter.type), reg->ref_obj_id,
+				iter_state_str(reg->iter.state), reg->iter.depth);
+			break;
 		case STACK_MISC:
 		case STACK_ZERO:
 		default:
@@ -2710,6 +2929,24 @@ static int mark_dynptr_read(struct bpf_verifier_env *env, struct bpf_reg_state *
 			     state->stack[spi - 1].spilled_ptr.parent, REG_LIVE_READ64);
 }
 
+static int mark_iter_read(struct bpf_verifier_env *env, struct bpf_reg_state *reg, int spi)
+{
+	struct bpf_func_state *state = func(env, reg);
+	int err, i;
+
+	for (i = 0; i < BPF_ITER_NR_SLOTS; i++) {
+		struct bpf_reg_state *st = &state->stack[spi - i].spilled_ptr;
+
+		err = mark_reg_read(env, st, st->parent, REG_LIVE_READ64);
+		if (err)
+			return err;
+
+		mark_stack_slot_scratched(env, spi - i);
+	}
+
+	return 0;
+}
+
 /* This function is supposed to be used by the following 32-bit optimization
  * code only. It returns TRUE if the source or destination register operates
  * on 64-bit, otherwise return FALSE.
@@ -3691,8 +3928,8 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
 
 		/* regular write of data into stack destroys any spilled ptr */
 		state->stack[spi].spilled_ptr.type = NOT_INIT;
-		/* Mark slots as STACK_MISC if they belonged to spilled ptr. */
-		if (is_spilled_reg(&state->stack[spi]))
+		/* Mark slots as STACK_MISC if they belonged to spilled ptr/dynptr/iter. */
+		if (is_stack_slot_special(&state->stack[spi]))
 			for (i = 0; i < BPF_REG_SIZE; i++)
 				scrub_spilled_slot(&state->stack[spi].slot_type[i]);
 
@@ -6407,6 +6644,168 @@ static int process_dynptr_func(struct bpf_verifier_env *env, int regno, int insn
 	return err;
 }
 
+static u32 iter_ref_obj_id(struct bpf_verifier_env *env, struct bpf_reg_state *reg, int spi)
+{
+	struct bpf_func_state *state = func(env, reg);
+
+	return state->stack[spi].spilled_ptr.ref_obj_id;
+}
+
+static int process_iter_arg(struct bpf_verifier_env *env, int regno, int insn_idx,
+			    enum bpf_arg_type arg_type,
+			    struct bpf_kfunc_call_arg_meta *meta)
+{
+	struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[regno];
+	int spi, err, i;
+
+	spi = iter_get_spi(env, reg);
+	if (spi < 0 && spi != -ERANGE)
+		return spi;
+
+	meta->iter.spi = spi;
+	meta->iter.frameno = reg->frameno;
+
+	if (arg_type & MEM_UNINIT) {
+		if (!is_iter_reg_valid_uninit(env, reg)) {
+			verbose(env, "expected uninitialized iter as arg #%d\n", regno);
+			return -EINVAL;
+		}
+
+		/* we write BPF_DW bits (8 bytes) at a time */
+		for (i = 0; i < BPF_ITER_SIZE; i += BPF_REG_SIZE) {
+			err = check_mem_access(env, insn_idx, regno,
+					       i, BPF_DW, BPF_WRITE, -1, false);
+			if (err)
+				return err;
+		}
+
+		err = mark_stack_slots_iter(env, reg, arg_type, insn_idx);
+		if (err)
+			return err;
+	} else {
+		if (!is_iter_reg_valid_init(env, reg)) {
+			verbose(env, "expected an initialized iter as arg #%d\n", regno);
+			return -EINVAL;
+		}
+
+		if (!is_iter_type_compatible(env, reg, arg_type)) {
+			verbose(env, "expected an iter of type %s as arg #%d\n",
+				iter_type_str(arg_to_iter_type(arg_type)), regno);
+			return -EINVAL;
+		}
+
+		err = mark_iter_read(env, reg, spi);
+		if (err)
+			return err;
+
+		meta->ref_obj_id = iter_ref_obj_id(env, reg, spi);
+
+		if (arg_type & OBJ_RELEASE) {
+			err = unmark_stack_slots_iter(env, reg);
+			if (err)
+				return err;
+		}
+	}
+	return 0;
+}
+
+/* process_iter_next_call() is called when verifier gets to iterator's next "method"
+ * (e.g., bpf_iter_num_next() for numbers iterator) call. We'll refer to it as
+ * just "iter_next()" in comments below.
+ *
+ * BPF verifier relies on a crucial contract for any iter_next()
+ * implementation: it should *eventually* return NULL, and once that happens
+ * it should keep returning NULL. That is, once iterator exhausts elements to
+ * iterate, it should never reset or spuriously return new elements.
+ *
+ * With the assumption of such contract, process_iter_next_call() simulates
+ * a fork in verifier state to validate loop logic correctness and safety
+ * without having to simulate an infinite amount of iterations.
+ *
+ * In the current state, we assume that iter_next() returned NULL and the
+ * iterator state went to BPF_ITER_STATE_DRAINED. Under such conditions we
+ * should not form an infinite loop and should eventually reach exit.
+ *
+ * Besides that, we also fork current state and enqueue it for later
+ * verification. In a forked state we keep iterator state as
+ * BPF_ITER_STATE_ACTIVE and assume non-null return from iter_next(). We also
+ * bump iteration depth to prevent erroneous infinite loop detection later on
+ * (see iter_active_depths_differ() comment for details). In this state we
+ * assume that we'll eventually loop back to another iter_next() call (it
+ * could be in exactly same location or some other one, it doesn't matter, we
+ * don't make any unnecessary assumptions about this, everything revolves
+ * around iterator state in a stack slot, not which instruction is calling
+ * iter_next()). When that happens, we'll either come to iter_next() with an
+ * equivalent state and can conclude that the next iteration will proceed in
+ * exactly the same way as the one we just verified, so it's safe to assume
+ * the loop converges. If not, we'll go on to simulate another iteration with
+ * a different input state.
+ *
+ * This way, we will either exhaustively discover all possible input states
+ * that the iterator loop can start with and eventually converge, or we'll
+ * effectively regress into bounded loop simulation logic and either reach the
+ * maximum number of instructions (if the loop is not provably convergent) or
+ * rely on some statically known limit on the number of iterations (e.g., an
+ * explicit `if n > 100 then break;` statement somewhere in the loop).
+ *
+ * One very subtle but very important aspect is that we *always* simulate NULL
+ * condition first (as current state) before we simulate non-NULL case. This
+ * has to do with intricacies of scalar precision tracking. By simulating
+ * "exit condition" of iter_next() returning NULL first, we make sure all the
+ * relevant precision marks *that will be set **after** we exit iterator loop*
+ * are propagated backwards to common parent state of NULL and non-NULL
+ * branches. Thanks to that, state equivalence checks done later in forked
+ * state, when reaching iter_next() for ACTIVE iterator, can assume that
+ * precision marks are finalized and won't change. Because simulating another
+ * ACTIVE iterator iteration won't change them (because given same input
+ * states we'll end up with exactly same output states which we are currently
+ * comparing; and verification after the loop already propagated back what
+ * needs to be **additionally** tracked as precise). It's subtle, grok
+ * precision tracking for more intuitive understanding.
+ */
+static int process_iter_next_call(struct bpf_verifier_env *env, int insn_idx,
+				  struct bpf_kfunc_call_arg_meta *meta)
+{
+	struct bpf_verifier_state *cur_st = env->cur_state, *queued_st;
+	struct bpf_func_state *cur_fr = cur_st->frame[cur_st->curframe], *queued_fr;
+	struct bpf_reg_state *cur_iter, *queued_iter;
+	int iter_frameno = meta->iter.frameno;
+	int iter_spi = meta->iter.spi;
+
+	BTF_TYPE_EMIT(struct bpf_iter);
+
+	cur_iter = &env->cur_state->frame[iter_frameno]->stack[iter_spi].spilled_ptr;
+
+	if (cur_iter->iter.state != BPF_ITER_STATE_ACTIVE &&
+	    cur_iter->iter.state != BPF_ITER_STATE_DRAINED) {
+		verbose(env, "verifier internal error: unexpected iterator state %d (%s)\n",
+			cur_iter->iter.state, iter_state_str(cur_iter->iter.state));
+		return -EFAULT;
+	}
+
+	if (cur_iter->iter.state == BPF_ITER_STATE_ACTIVE) {
+		/* branch out active iter state */
+		queued_st = push_stack(env, insn_idx + 1, insn_idx, false);
+		if (!queued_st)
+			return -ENOMEM;
+
+		queued_iter = &queued_st->frame[iter_frameno]->stack[iter_spi].spilled_ptr;
+		queued_iter->iter.state = BPF_ITER_STATE_ACTIVE;
+		queued_iter->iter.depth++;
+
+		queued_fr = queued_st->frame[queued_st->curframe];
+		mark_ptr_not_null_reg(&queued_fr->regs[BPF_REG_0]);
+	}
+
+	/* switch to DRAINED state (i.e., assume iter_next() returned NULL),
+	 * but keep the iteration depth unchanged */
+	cur_iter->iter.state = BPF_ITER_STATE_DRAINED;
+	__mark_reg_known_zero(&cur_fr->regs[BPF_REG_0]);
+	cur_fr->regs[BPF_REG_0].type = SCALAR_VALUE;
+
+	return 0;
+}
+
 static bool arg_type_is_mem_size(enum bpf_arg_type type)
 {
 	return type == ARG_CONST_SIZE ||
@@ -6423,6 +6822,11 @@ static bool arg_type_is_dynptr(enum bpf_arg_type type)
 	return base_type(type) == ARG_PTR_TO_DYNPTR;
 }
 
+static bool arg_type_is_iter(enum bpf_arg_type type)
+{
+	return base_type(type) == ARG_PTR_TO_ITER;
+}
+
 static int int_ptr_type_to_size(enum bpf_arg_type type)
 {
 	if (type == ARG_PTR_TO_INT)
@@ -6550,6 +6954,7 @@ static const struct bpf_reg_types dynptr_types = {
 		CONST_PTR_TO_DYNPTR,
 	}
 };
+static const struct bpf_reg_types iter_types = { .types = { PTR_TO_STACK } };
 
 static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
 	[ARG_PTR_TO_MAP_KEY]		= &mem_types,
@@ -6577,6 +6982,7 @@ static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
 	[ARG_PTR_TO_TIMER]		= &timer_types,
 	[ARG_PTR_TO_KPTR]		= &kptr_types,
 	[ARG_PTR_TO_DYNPTR]		= &dynptr_types,
+	[ARG_PTR_TO_ITER]		= &iter_types,
 };
 
 static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
@@ -6728,6 +7134,9 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
 		if (arg_type_is_dynptr(arg_type) && type == PTR_TO_STACK)
 			return 0;
 
+		if (arg_type_is_iter(arg_type))
+			return 0;
+
 		if ((type_is_ptr_alloc_obj(type) || type_is_non_owning_ref(type)) && reg->off) {
 			if (reg_find_field_offset(reg, reg->off, BPF_GRAPH_NODE_OR_ROOT))
 				return __check_ptr_off_reg(env, reg, regno, true);
@@ -8879,6 +9288,7 @@ static bool is_kfunc_arg_scalar_with_name(const struct btf *btf,
 
 enum {
 	KF_ARG_DYNPTR_ID,
+	KF_ARG_ITER_ID,
 	KF_ARG_LIST_HEAD_ID,
 	KF_ARG_LIST_NODE_ID,
 	KF_ARG_RB_ROOT_ID,
@@ -8887,6 +9297,7 @@ enum {
 
 BTF_ID_LIST(kf_arg_btf_ids)
 BTF_ID(struct, bpf_dynptr_kern)
+BTF_ID(struct, bpf_iter)
 BTF_ID(struct, bpf_list_head)
 BTF_ID(struct, bpf_list_node)
 BTF_ID(struct, bpf_rb_root)
@@ -8914,6 +9325,11 @@ static bool is_kfunc_arg_dynptr(const struct btf *btf, const struct btf_param *a
 	return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_DYNPTR_ID);
 }
 
+static bool is_kfunc_arg_iter(const struct btf *btf, const struct btf_param *arg)
+{
+	return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_ITER_ID);
+}
+
 static bool is_kfunc_arg_list_head(const struct btf *btf, const struct btf_param *arg)
 {
 	return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_LIST_HEAD_ID);
@@ -9000,6 +9416,7 @@ enum kfunc_ptr_arg_type {
 	KF_ARG_PTR_TO_ALLOC_BTF_ID,  /* Allocated object */
 	KF_ARG_PTR_TO_KPTR,	     /* PTR_TO_KPTR but type specific */
 	KF_ARG_PTR_TO_DYNPTR,
+	KF_ARG_PTR_TO_ITER,
 	KF_ARG_PTR_TO_LIST_HEAD,
 	KF_ARG_PTR_TO_LIST_NODE,
 	KF_ARG_PTR_TO_BTF_ID,	     /* Also covers reg2btf_ids conversions */
@@ -9077,6 +9494,11 @@ static bool is_kfunc_bpf_rcu_read_unlock(struct bpf_kfunc_call_arg_meta *meta)
 	return meta->func_id == special_kfunc_list[KF_bpf_rcu_read_unlock];
 }
 
+static bool is_iter_next_kfunc(int btf_id)
+{
+	return false;
+}
+
 static enum kfunc_ptr_arg_type
 get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
 		       struct bpf_kfunc_call_arg_meta *meta,
@@ -9121,6 +9543,9 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
 	if (is_kfunc_arg_dynptr(meta->btf, &args[argno]))
 		return KF_ARG_PTR_TO_DYNPTR;
 
+	if (is_kfunc_arg_iter(meta->btf, &args[argno]))
+		return KF_ARG_PTR_TO_ITER;
+
 	if (is_kfunc_arg_list_head(meta->btf, &args[argno]))
 		return KF_ARG_PTR_TO_LIST_HEAD;
 
@@ -9749,6 +10174,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 			break;
 		case KF_ARG_PTR_TO_KPTR:
 		case KF_ARG_PTR_TO_DYNPTR:
+		case KF_ARG_PTR_TO_ITER:
 		case KF_ARG_PTR_TO_LIST_HEAD:
 		case KF_ARG_PTR_TO_LIST_NODE:
 		case KF_ARG_PTR_TO_RB_ROOT:
@@ -9845,6 +10271,18 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 
 			break;
 		}
+		case KF_ARG_PTR_TO_ITER:
+		{
+			enum bpf_arg_type iter_arg_type = ARG_PTR_TO_ITER;
+
+			if (is_kfunc_arg_uninit(btf, &args[i]))
+				iter_arg_type |= MEM_UNINIT;
+
+			ret = process_iter_arg(env, regno, insn_idx, iter_arg_type,  meta);
+			if (ret < 0)
+				return ret;
+			break;
+		}
 		case KF_ARG_PTR_TO_LIST_HEAD:
 			if (reg->type != PTR_TO_MAP_VALUE &&
 			    reg->type != (PTR_TO_BTF_ID | MEM_ALLOC)) {
@@ -10315,6 +10753,12 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			mark_btf_func_reg_size(env, regno, t->size);
 	}
 
+	if (is_iter_next_kfunc(meta.func_id)) {
+		err = process_iter_next_call(env, insn_idx, &meta);
+		if (err)
+			return err;
+	}
+
 	return 0;
 }
 
@@ -13427,6 +13871,8 @@ static int visit_insn(int t, struct bpf_verifier_env *env)
 			 * async state will be pushed for further exploration.
 			 */
 			mark_prune_point(env, t);
+		if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL && is_iter_next_kfunc(insn->imm))
+			mark_prune_point(env, t);
 		return visit_func_call_insn(t, insns, env, insn->src_reg == BPF_PSEUDO_CALL);
 
 	case BPF_JA:
@@ -14180,6 +14626,8 @@ static bool stacksafe(struct bpf_verifier_env *env, struct bpf_func_state *old,
 	 * didn't use them
 	 */
 	for (i = 0; i < old->allocated_stack; i++) {
+		struct bpf_reg_state *old_reg, *cur_reg;
+
 		spi = i / BPF_REG_SIZE;
 
 		if (!(old->stack[spi].spilled_ptr.live & REG_LIVE_READ)) {
@@ -14236,9 +14684,6 @@ static bool stacksafe(struct bpf_verifier_env *env, struct bpf_func_state *old,
 				return false;
 			break;
 		case STACK_DYNPTR:
-		{
-			const struct bpf_reg_state *old_reg, *cur_reg;
-
 			old_reg = &old->stack[spi].spilled_ptr;
 			cur_reg = &cur->stack[spi].spilled_ptr;
 			if (old_reg->dynptr.type != cur_reg->dynptr.type ||
@@ -14246,7 +14691,14 @@ static bool stacksafe(struct bpf_verifier_env *env, struct bpf_func_state *old,
 			    !check_ids(old_reg->ref_obj_id, cur_reg->ref_obj_id, idmap))
 				return false;
 			break;
-		}
+		case STACK_ITER:
+			old_reg = &old->stack[spi].spilled_ptr;
+			cur_reg = &cur->stack[spi].spilled_ptr;
+			if (old_reg->iter.type != cur_reg->iter.type ||
+			    old_reg->iter.state != cur_reg->iter.state ||
+			    !check_ids(old_reg->ref_obj_id, cur_reg->ref_obj_id, idmap))
+				return false;
+			break;
 		case STACK_MISC:
 		case STACK_ZERO:
 		case STACK_INVALID:
@@ -14505,6 +14957,119 @@ static bool states_maybe_looping(struct bpf_verifier_state *old,
 	return true;
 }
 
+static bool is_iter_next_insn(struct bpf_verifier_env *env, int insn_idx, int *reg_idx)
+{
+	struct bpf_insn *insn = &env->prog->insnsi[insn_idx];
+	const struct btf_param *args;
+	const struct btf_type *t;
+	const struct btf *btf;
+	int nargs, i;
+
+	if (!bpf_pseudo_kfunc_call(insn))
+		return false;
+	if (!is_iter_next_kfunc(insn->imm))
+		return false;
+
+	btf = find_kfunc_desc_btf(env, insn->off);
+	if (IS_ERR(btf))
+		return false;
+
+	t = btf_type_by_id(btf, insn->imm);	/* FUNC */
+	t = btf_type_by_id(btf, t->type);	/* FUNC_PROTO */
+
+	args = btf_params(t);
+	nargs = btf_vlen(t);
+	for (i = 0; i < nargs; i++) {
+		if (is_kfunc_arg_iter(btf, &args[i])) {
+			*reg_idx = BPF_REG_1 + i;
+			return true;
+		}
+	}
+
+	return false;
+}
+
+/* is_state_visited() handles iter_next() (see process_iter_next_call() for
+ * terminology) calls specially: as opposed to bounded BPF loops, it *expects*
+ * state matching, which otherwise looks like an infinite loop. So while
+ * iter_next() calls are taken care of, we still need to be careful to
+ * prevent an erroneous and too eager declaration of an "infinite loop" when
+ * iterators are involved.
+ *
+ * Here's a situation in pseudo-BPF assembly form:
+ *
+ *   0: again:                          ; set up iter_next() call args
+ *   1:   r1 = &it                      ; <CHECKPOINT HERE>
+ *   2:   call bpf_iter_num_next        ; this is iter_next() call
+ *   3:   if r0 == 0 goto done
+ *   4:   ... something useful here ...
+ *   5:   goto again                    ; another iteration
+ *   6: done:
+ *   7:   r1 = &it
+ *   8:   call bpf_iter_num_destroy     ; clean up iter state
+ *   9:   exit
+ *
+ * This is a typical loop. Let's assume that we have a prune point at 1:,
+ * before we get to `call bpf_iter_num_next` (e.g., because of that `goto
+ * again`, assuming other heuristics don't get in the way).
+ *
+ * When we first come to 1:, let's say we have some state X. We proceed
+ * to 2:, fork states, enqueue ACTIVE, validate NULL case successfully, exit.
+ * Now we come back to validate that forked ACTIVE state. We proceed through
+ * 3-5, come to goto, jump to 1:. Let's assume our state didn't change, so we
+ * are converging. But the problem is that we don't know that yet, as this
+ * convergence has to happen only at the iter_next() call site. So if nothing
+ * is done, at 1: the verifier will use bounded loop logic and declare infinite
+ * looping (and would be *technically* correct, if not for iterator "eventual
+ * sticky NULL" contract, see process_iter_next_call()). But we don't want
+ * that. So what we do in process_iter_next_call(), when we go on another
+ * ACTIVE iteration, is bump slot->iter.depth to mark that it's a different
+ * iteration. So when we detect a soon-to-be-declared infinite loop, we also
+ * check whether any *ACTIVE* iterator state's depth differs. If yes, we
+ * pretend we are not looping and wait for the next iter_next() call.
+ *
+ * This only applies to ACTIVE state. In DRAINED state we don't expect to
+ * loop, because that would actually mean an infinite loop, as DRAINED state
+ * is "sticky", and so we'll keep returning to the same instruction with the
+ * same state (at least in one of the possible code paths).
+ *
+ * This approach lets us keep the infinite loop heuristic even in the face of
+ * an active iterator. E.g., the C snippet below will be detected as (and is)
+ * looping:
+ *
+ *   struct bpf_iter it;
+ *   int *p, x;
+ *
+ *   bpf_iter_num_new(&it, 0, 10);
+ *   while ((p = bpf_iter_num_next(&it))) {
+ *       x = *p;
+ *       while (x--) {} // <<-- infinite loop here
+ *   }
+ *
+ */
+static bool iter_active_depths_differ(struct bpf_verifier_state *old, struct bpf_verifier_state *cur)
+{
+	struct bpf_reg_state *slot, *cur_slot;
+	struct bpf_func_state *state;
+	int i, fr;
+
+	for (fr = old->curframe; fr >= 0; fr--) {
+		state = old->frame[fr];
+		for (i = 0; i < state->allocated_stack / BPF_REG_SIZE; i++) {
+			if (state->stack[i].slot_type[0] != STACK_ITER)
+				continue;
+
+			slot = &state->stack[i].spilled_ptr;
+			if (slot->iter.state != BPF_ITER_STATE_ACTIVE)
+				continue;
+
+			cur_slot = &cur->frame[fr]->stack[i].spilled_ptr;
+			if (cur_slot->iter.depth != slot->iter.depth)
+				return true;
+		}
+	}
+	return false;
+}
 
 static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
 {
@@ -14538,6 +15103,7 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
 
 		if (sl->state.branches) {
 			struct bpf_func_state *frame = sl->state.frame[sl->state.curframe];
+			int iter_arg_reg_idx;
 
 			if (frame->in_async_callback_fn &&
 			    frame->async_entry_cnt != cur->frame[cur->curframe]->async_entry_cnt) {
@@ -14552,8 +15118,45 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
 				 * Since the verifier still needs to catch infinite loops
 				 * inside async callbacks.
 				 */
-			} else if (states_maybe_looping(&sl->state, cur) &&
-				   states_equal(env, &sl->state, cur)) {
+				goto skip_inf_loop_check;
+			}
+			/* BPF open-coded iterators loop detection is special.
+			 * states_maybe_looping() logic is too simplistic in detecting
+			 * states that *might* be equivalent, because it doesn't know
+			 * about ID remapping, so don't even perform it.
+			 * See process_iter_next_call() and iter_active_depths_differ()
+			 * for an overview of the logic. When the current state and one
+			 * of the parent states are detected as equivalent, it's a good
+			 * thing: we prove convergence and can stop simulating further
+			 * iterations. It's safe to assume that the iterator loop will
+			 * finish, taking into account iter_next()'s contract of
+			 * eventually returning a sticky NULL result.
+			 */
+			if (is_iter_next_insn(env, insn_idx, &iter_arg_reg_idx)) {
+				if (states_equal(env, &sl->state, cur)) {
+					struct bpf_func_state *cur_frame;
+					struct bpf_reg_state *iter_state, *iter_reg;
+					int spi;
+
+					/* current state is valid due to states_equal(),
+					 * so we can assume valid iter state, no need for extra
+					 * (re-)validations
+					 */
+					cur_frame = cur->frame[cur->curframe];
+					iter_reg = &cur_frame->regs[iter_arg_reg_idx];
+					spi = iter_get_spi(env, iter_reg);
+					if (spi < 0)
+						return spi;
+					iter_state = &func(env, iter_reg)->stack[spi].spilled_ptr;
+					if (iter_state->iter.state == BPF_ITER_STATE_ACTIVE)
+						goto hit;
+				}
+				goto skip_inf_loop_check;
+			}
+			/* attempt to detect infinite loop to avoid unnecessary doomed work */
+			if (states_maybe_looping(&sl->state, cur) &&
+			    states_equal(env, &sl->state, cur) &&
+			    !iter_active_depths_differ(&sl->state, cur)) {
 				verbose_linfo(env, insn_idx, "; ");
 				verbose(env, "infinite loop detected at insn %d\n", insn_idx);
 				return -EINVAL;
@@ -14570,6 +15173,7 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
 			 * This threshold shouldn't be too high either, since states
 			 * at the end of the loop are likely to be useful in pruning.
 			 */
+skip_inf_loop_check:
 			if (!env->test_state_freq &&
 			    env->jmps_processed - env->prev_jmps_processed < 20 &&
 			    env->insn_processed - env->prev_insn_processed < 100)
@@ -14577,6 +15181,7 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
 			goto miss;
 		}
 		if (states_equal(env, &sl->state, cur)) {
+hit:
 			sl->hit_cnt++;
 			/* reached equivalent register/stack state,
 			 * prune the search.
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index c9699304aed2..c4b506193365 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -6927,6 +6927,12 @@ struct bpf_dynptr {
 	__u64 :64;
 } __attribute__((aligned(8)));
 
+struct bpf_iter {
+	__u64 :64;
+	__u64 :64;
+	__u64 :64;
+} __attribute__((aligned(8)));
+
 struct bpf_list_head {
 	__u64 :64;
 	__u64 :64;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH bpf-next 14/17] bpf: implement number iterator
  2023-03-02 23:49 [PATCH bpf-next 00/17] BPF open-coded iterators Andrii Nakryiko
                   ` (12 preceding siblings ...)
  2023-03-02 23:50 ` [PATCH bpf-next 13/17] bpf: add support for open-coded iterator loops Andrii Nakryiko
@ 2023-03-02 23:50 ` Andrii Nakryiko
  2023-03-04 20:21   ` Alexei Starovoitov
  2023-03-02 23:50 ` [PATCH bpf-next 15/17] selftests/bpf: add bpf_for_each(), bpf_for(), and bpf_repeat() macros Andrii Nakryiko
                   ` (3 subsequent siblings)
  17 siblings, 1 reply; 35+ messages in thread
From: Andrii Nakryiko @ 2023-03-02 23:50 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Tejun Heo

Implement the first open-coded iterator type, iterating over a range of
integers.

Its public API consists of:
  - bpf_iter_num_new() constructor, which accepts a [start, end) range
    (that is, start is inclusive, while end is exclusive).
  - bpf_iter_num_next(), which keeps returning a 4-byte read-only
    pointer to int until the range is exhausted, at which point NULL is
    returned. If bpf_iter_num_next() keeps being called after that, NULL
    is persistently returned.
  - bpf_iter_num_destroy() destructor, which needs to be called at some
    point to clean up iterator state. BPF verifier enforces that the
    iterator destructor is called before the BPF program exits.
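
Putting it together, a minimal usage sketch (illustrative only, with
hypothetical range values; it assumes the kfuncs are declared for the
BPF program, e.g., via __ksym externs) would look like:

  struct bpf_iter it;
  int *v;

  bpf_iter_num_new(&it, 2, 5);
  while ((v = bpf_iter_num_next(&it))) {
      /* *v takes values 2, 3, 4; the pointer is 4-byte read-only */
      bpf_printk("v=%d", *v);
  }
  bpf_iter_num_destroy(&it);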

Note that `start = end = X` is a valid combination to set up an empty
iterator. bpf_iter_num_new() will return 0 (success) for any such
combination.

If bpf_iter_num_new() detects an invalid combination of input arguments,
it returns an error and resets the iterator state to, effectively, an
empty iterator, so any subsequent call to bpf_iter_num_next() will keep
returning NULL.

BPF verifier has no knowledge that the returned integers are in the
[start, end) value range, as neither `start` nor `end` is statically
known/enforced; they are runtime values only.

While the implementation is pretty trivial, some care needs to be taken
to avoid overflows and underflows: e.g., the range size end - start has
to be computed in 64 bits, as with start == INT_MIN and end == INT_MAX
the 32-bit difference would overflow. Subsequent selftests will validate
correctness of the [start, end) semantics, especially around the
extremes (INT_MIN and INT_MAX).

Similarly to bpf_loop(), we enforce that no more than BPF_MAX_LOOPS can
be specified.

bpf_iter_num_{new,next,destroy}() is a logical evolution of bounded
BPF loops and the bpf_loop() helper and is the basis for implementing
ergonomic BPF loops with no statically known and verified bounds.
Subsequent patches implement the bpf_for() macro, demonstrating how this
can be wrapped into something that works and feels like a normal for()
loop in the C language.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/bpf.h   | 14 +++++++--
 kernel/bpf/bpf_iter.c | 71 +++++++++++++++++++++++++++++++++++++++++++
 kernel/bpf/helpers.c  |  3 ++
 kernel/bpf/verifier.c | 24 +++++++++++++--
 4 files changed, 107 insertions(+), 5 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index a968282ba324..2a730759a471 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -613,6 +613,9 @@ enum bpf_type_flag {
 	/* DYNPTR points to xdp_buff */
 	DYNPTR_TYPE_XDP		= BIT(16 + BPF_BASE_TYPE_BITS),
 
+	/* ITER of integers */
+	ITER_TYPE_NUM		= BIT(17 + BPF_BASE_TYPE_BITS),
+
 	__BPF_TYPE_FLAG_MAX,
 	__BPF_TYPE_LAST_FLAG	= __BPF_TYPE_FLAG_MAX - 1,
 };
@@ -620,7 +623,7 @@ enum bpf_type_flag {
 #define DYNPTR_TYPE_FLAG_MASK	(DYNPTR_TYPE_LOCAL | DYNPTR_TYPE_RINGBUF | DYNPTR_TYPE_SKB \
 				 | DYNPTR_TYPE_XDP)
 
-#define ITER_TYPE_FLAG_MASK	(0)
+#define ITER_TYPE_FLAG_MASK	(ITER_TYPE_NUM)
 
 /* Max number of base types. */
 #define BPF_BASE_TYPE_LIMIT	(1UL << BPF_BASE_TYPE_BITS)
@@ -1167,6 +1170,7 @@ u32 bpf_dynptr_get_size(const struct bpf_dynptr_kern *ptr);
 
 enum bpf_iter_type {
 	BPF_ITER_TYPE_INVALID,
+	BPF_ITER_TYPE_NUM,
 };
 
 #ifdef CONFIG_BPF_JIT
@@ -1622,8 +1626,12 @@ struct bpf_array {
 #define BPF_COMPLEXITY_LIMIT_INSNS      1000000 /* yes. 1M insns */
 #define MAX_TAIL_CALL_CNT 33
 
-/* Maximum number of loops for bpf_loop */
-#define BPF_MAX_LOOPS	BIT(23)
+/* Maximum number of loops for bpf_loop and bpf_iter_num.
+ * It's an enum to expose it (and thus make it discoverable) through BTF.
+ */
+enum {
+	BPF_MAX_LOOPS = 8 * 1024 * 1024,
+};
 
 #define BPF_F_ACCESS_MASK	(BPF_F_RDONLY |		\
 				 BPF_F_RDONLY_PROG |	\
diff --git a/kernel/bpf/bpf_iter.c b/kernel/bpf/bpf_iter.c
index 5dc307bdeaeb..504189a3b474 100644
--- a/kernel/bpf/bpf_iter.c
+++ b/kernel/bpf/bpf_iter.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0-only
 /* Copyright (c) 2020 Facebook */
 
+#include "linux/build_bug.h"
 #include <linux/fs.h>
 #include <linux/anon_inodes.h>
 #include <linux/filter.h>
@@ -776,3 +777,73 @@ const struct bpf_func_proto bpf_loop_proto = {
 	.arg3_type	= ARG_PTR_TO_STACK_OR_NULL,
 	.arg4_type	= ARG_ANYTHING,
 };
+
+struct bpf_iter_num_kern {
+	int cur; /* current value, inclusive */
+	int end; /* final value, exclusive */
+	__u64 :64;
+	__u64 :64;
+} __aligned(8);
+
+__diag_push();
+__diag_ignore_all("-Wmissing-prototypes",
+		  "Global functions as their definitions will be in vmlinux BTF");
+
+__bpf_kfunc int bpf_iter_num_new(struct bpf_iter *it__uninit, int start, int end)
+{
+	struct bpf_iter_num_kern *s = (void *)it__uninit;
+
+	BUILD_BUG_ON(sizeof(struct bpf_iter_num_kern) != sizeof(struct bpf_iter));
+	BUILD_BUG_ON(__alignof__(struct bpf_iter_num_kern) != __alignof__(struct bpf_iter));
+
+	/* start == end is legit, it's an empty range and we'll just get NULL
+	 * on first (and any subsequent) bpf_iter_num_next() call
+	 */
+	if (start > end) {
+		s->cur = s->end = 0;
+		return -EINVAL;
+	}
+
+	/* avoid overflows, e.g., if start == INT_MIN and end == INT_MAX */
+	if ((s64)end - (s64)start > BPF_MAX_LOOPS) {
+		s->cur = s->end = 0;
+		return -E2BIG;
+	}
+
+	/* user will call bpf_iter_num_next() first,
+	 * which will set s->cur to exactly start value;
+	 * underflow shouldn't matter
+	 */
+	s->cur = start - 1;
+	s->end = end;
+
+	return 0;
+}
+
+__bpf_kfunc int *bpf_iter_num_next(struct bpf_iter *it)
+{
+	struct bpf_iter_num_kern *s = (void *)it;
+
+	/* check failed initialization or if we are done (same behavior);
+	 * need to be careful about overflow, so convert to s64 for checks,
+	 * e.g., if s->cur == s->end == INT_MAX, we can't just do
+	 * s->cur + 1 >= s->end
+	 */
+	if ((s64)(s->cur + 1) >= s->end) {
+		s->cur = s->end = 0;
+		return NULL;
+	}
+
+	s->cur++;
+
+	return &s->cur;
+}
+
+__bpf_kfunc void bpf_iter_num_destroy(struct bpf_iter *it)
+{
+	struct bpf_iter_num_kern *s = (void *)it;
+
+	s->cur = s->end = 0;
+}
+
+__diag_pop();
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index de9ef8476e29..23c8f2313d5a 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2398,6 +2398,9 @@ BTF_ID_FLAGS(func, bpf_rcu_read_lock)
 BTF_ID_FLAGS(func, bpf_rcu_read_unlock)
 BTF_ID_FLAGS(func, bpf_dynptr_slice, KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_dynptr_slice_rdwr, KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_iter_num_new)
+BTF_ID_FLAGS(func, bpf_iter_num_next, KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_iter_num_destroy)
 BTF_SET8_END(common_btf_ids)
 
 static const struct btf_kfunc_id_set common_kfunc_set = {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 58754929ee33..9671b4f354e9 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -779,6 +779,8 @@ static const char *dynptr_type_str(enum bpf_dynptr_type type)
 static const char *iter_type_str(enum bpf_iter_type type)
 {
 	switch (type) {
+	case BPF_ITER_TYPE_NUM:
+		return "num";
 	case BPF_ITER_TYPE_INVALID:
 		return "<invalid>";
 	default:
@@ -1157,6 +1159,8 @@ static bool is_dynptr_type_expected(struct bpf_verifier_env *env, struct bpf_reg
 static enum bpf_dynptr_type arg_to_iter_type(enum bpf_arg_type arg_type)
 {
 	switch (arg_type & ITER_TYPE_FLAG_MASK) {
+	case ITER_TYPE_NUM:
+		return BPF_ITER_TYPE_NUM;
 	default:
 		return BPF_ITER_TYPE_INVALID;
 	}
@@ -9445,6 +9449,9 @@ enum special_kfunc_type {
 	KF_bpf_dynptr_from_xdp,
 	KF_bpf_dynptr_slice,
 	KF_bpf_dynptr_slice_rdwr,
+	KF_bpf_iter_num_new,
+	KF_bpf_iter_num_next,
+	KF_bpf_iter_num_destroy,
 };
 
 BTF_SET_START(special_kfunc_set)
@@ -9483,6 +9490,9 @@ BTF_ID(func, bpf_dynptr_from_skb)
 BTF_ID(func, bpf_dynptr_from_xdp)
 BTF_ID(func, bpf_dynptr_slice)
 BTF_ID(func, bpf_dynptr_slice_rdwr)
+BTF_ID(func, bpf_iter_num_new)
+BTF_ID(func, bpf_iter_num_next)
+BTF_ID(func, bpf_iter_num_destroy)
 
 static bool is_kfunc_bpf_rcu_read_lock(struct bpf_kfunc_call_arg_meta *meta)
 {
@@ -9496,7 +9506,7 @@ static bool is_kfunc_bpf_rcu_read_unlock(struct bpf_kfunc_call_arg_meta *meta)
 
 static bool is_iter_next_kfunc(int btf_id)
 {
-	return false;
+	return btf_id == special_kfunc_list[KF_bpf_iter_num_next];
 }
 
 static enum kfunc_ptr_arg_type
@@ -10278,7 +10288,17 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 			if (is_kfunc_arg_uninit(btf, &args[i]))
 				iter_arg_type |= MEM_UNINIT;
 
-			ret = process_iter_arg(env, regno, insn_idx, iter_arg_type,  meta);
+			if (meta->func_id == special_kfunc_list[KF_bpf_iter_num_new] ||
+			    meta->func_id == special_kfunc_list[KF_bpf_iter_num_next]) {
+				iter_arg_type |= ITER_TYPE_NUM;
+			} else if (meta->func_id == special_kfunc_list[KF_bpf_iter_num_destroy]) {
+				iter_arg_type |= ITER_TYPE_NUM | OBJ_RELEASE;
+			} else {
+				verbose(env, "verifier internal error: unrecognized iterator kfunc\n");
+				return -EFAULT;
+			}
+
+			ret = process_iter_arg(env, regno, insn_idx, iter_arg_type, meta);
 			if (ret < 0)
 				return ret;
 			break;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH bpf-next 15/17] selftests/bpf: add bpf_for_each(), bpf_for(), and bpf_repeat() macros
  2023-03-02 23:49 [PATCH bpf-next 00/17] BPF open-coded iterators Andrii Nakryiko
                   ` (13 preceding siblings ...)
  2023-03-02 23:50 ` [PATCH bpf-next 14/17] bpf: implement number iterator Andrii Nakryiko
@ 2023-03-02 23:50 ` Andrii Nakryiko
  2023-03-04 20:34   ` Alexei Starovoitov
  2023-03-02 23:50 ` [PATCH bpf-next 16/17] selftests/bpf: add iterators tests Andrii Nakryiko
                   ` (2 subsequent siblings)
  17 siblings, 1 reply; 35+ messages in thread
From: Andrii Nakryiko @ 2023-03-02 23:50 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Tejun Heo

Add bpf_for_each(), bpf_for() and bpf_repeat() macros that make writing
open-coded iterator-based loops much more convenient and natural. These
macros utilize the cleanup attribute to ensure proper destruction of the
iterator and, thanks to that, manage to provide ergonomics very close to
the C language for() construct. A typical integer loop would look like:

  int i;
  int arr[N];

  bpf_for(i, 0, N) {
      /* verifier will know that i >= 0 && i < N, so it could be used to
       * directly access array elements with no extra checks
       */
      arr[i] = i;
  }

bpf_repeat() is very similar, but it doesn't expose the iteration number
and is meant as a simple "repeat action N times" construct:

  bpf_repeat(N) { /* whatever */ }

Note that break and continue inside the {} block work as expected.
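
For example (an illustrative snippet, not taken from the selftests):

  int i, sum = 0;

  bpf_for(i, 0, 100) {
      if (i % 2)
          continue; /* skip odd numbers */
      if (i > 10)
          break;    /* stop the loop early */
      sum += i;
  }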

bpf_for_each() is a generalization over any kind of BPF open-coded
iterator, allowing a for-each-like approach instead of calling the
low-level bpf_iter_<type>_{new,next,destroy}() APIs explicitly. E.g.:

  struct cgroup *cg;

  bpf_for_each(cgroup, cg, some, input, args) {
      /* do something with each cg */
  }

This would call the (right now hypothetical)
bpf_iter_cgroup_{new,next,destroy}() functions to form a loop over
cgroups, where `some, input, args` are passed verbatim into the
constructor as bpf_iter_cgroup_new(&it, some, input, args), as sketched
below.
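
Ignoring the cleanup-attribute mechanics the macro relies on, the above
is conceptually equivalent to the following hand-written loop (again,
using the hypothetical cgroup iterator and the `cg` variable from the
example above):

  struct bpf_iter it;

  bpf_iter_cgroup_new(&it, some, input, args);
  while ((cg = bpf_iter_cgroup_next(&it))) {
      /* do something with each cg */
  }
  bpf_iter_cgroup_destroy(&it);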

As a demonstration, add a pyperf variant based on the bpf_for() loop.

Also clean up a few tests that either included the bpf_misc.h header
unnecessarily from user space or included it before any common types
were defined.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 .../bpf/prog_tests/bpf_verif_scale.c          |  6 ++
 .../bpf/prog_tests/uprobe_autoattach.c        |  1 -
 tools/testing/selftests/bpf/progs/bpf_misc.h  | 76 +++++++++++++++++++
 tools/testing/selftests/bpf/progs/lsm.c       |  4 +-
 tools/testing/selftests/bpf/progs/pyperf.h    | 14 +++-
 .../selftests/bpf/progs/pyperf600_iter.c      |  7 ++
 .../selftests/bpf/progs/pyperf600_nounroll.c  |  3 -
 7 files changed, 101 insertions(+), 10 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/pyperf600_iter.c

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c b/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c
index 5ca252823294..731c343897d8 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c
@@ -144,6 +144,12 @@ void test_verif_scale_pyperf600_nounroll()
 	scale_test("pyperf600_nounroll.bpf.o", BPF_PROG_TYPE_RAW_TRACEPOINT, false);
 }
 
+void test_verif_scale_pyperf600_iter()
+{
+	/* open-coded BPF iterator version */
+	scale_test("pyperf600_iter.bpf.o", BPF_PROG_TYPE_RAW_TRACEPOINT, false);
+}
+
 void test_verif_scale_loop1()
 {
 	scale_test("loop1.bpf.o", BPF_PROG_TYPE_RAW_TRACEPOINT, false);
diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c b/tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c
index 6558c857e620..d5b3377aa33c 100644
--- a/tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c
+++ b/tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c
@@ -3,7 +3,6 @@
 
 #include <test_progs.h>
 #include "test_uprobe_autoattach.skel.h"
-#include "progs/bpf_misc.h"
 
 /* uprobe attach point */
 static noinline int autoattach_trigger_func(int arg1, int arg2, int arg3,
diff --git a/tools/testing/selftests/bpf/progs/bpf_misc.h b/tools/testing/selftests/bpf/progs/bpf_misc.h
index f704885aa534..08a791f307a6 100644
--- a/tools/testing/selftests/bpf/progs/bpf_misc.h
+++ b/tools/testing/selftests/bpf/progs/bpf_misc.h
@@ -75,5 +75,81 @@
 #define FUNC_REG_ARG_CNT 5
 #endif
 
+struct bpf_iter;
+
+extern int bpf_iter_num_new(struct bpf_iter *it__uninit, int start, int end) __ksym;
+extern int *bpf_iter_num_next(struct bpf_iter *it) __ksym;
+extern void bpf_iter_num_destroy(struct bpf_iter *it) __ksym;
+
+#ifndef bpf_for_each
+/* bpf_for_each(iter_kind, elem, args...) provides generic construct for using BPF
+ * open-coded iterators without having to write mundane explicit low-level
+ * loop. Instead, it provides for()-like generic construct that can be used
+ * pretty naturally. E.g., for some hypothetical cgroup iterator, you'd write:
+ *
+ * struct cgroup *cg, *parent_cg = <...>;
+ *
+ * bpf_for_each(cgroup, cg, parent_cg, CG_ITER_CHILDREN) {
+ *     bpf_printk("Child cgroup id = %d", cg->cgroup_id);
+ *     if (cg->cgroup_id == 123)
+ *         break;
+ * }
+ *
+ * I.e., it looks almost like high-level for each loop in other languages,
+ * supports continue/break, and is verifiable by BPF verifier.
+ *
+ * For iterating integers, the difference between bpf_for_each(num, i, N, M)
+ * and bpf_for(i, N, M) is in that bpf_for() provides additional proof to
+ * verifier that i is in [N, M) range, and in bpf_for_each() case i is `int
+ * *`, not just `int`. So for integers bpf_for() is more convenient.
+ */
+#define bpf_for_each(type, cur, args...) for (						  \
+	/* initialize and define destructor */						  \
+	struct bpf_iter ___it __attribute__((cleanup(bpf_iter_##type##_destroy))),	  \
+	/* ___p pointer is just to call bpf_iter_##type##_new() *once* to init ___it */	  \
+			*___p = (bpf_iter_##type##_new(&___it, ##args),		  \
+	/* this is a workaround for Clang bug: it currently doesn't emit BTF */		  \
+	/* for bpf_iter_##type##_destroy when used from cleanup() attribute */		  \
+				(void)bpf_iter_##type##_destroy, (void *)0);		  \
+	/* iteration and termination check */						  \
+	((cur = bpf_iter_##type##_next(&___it)));					  \
+	/* nothing here  */								  \
+)
+#endif /* bpf_for_each */
+
+#ifndef bpf_for
+/* bpf_for(i, start, end) proves to verifier that i is in [start, end) */
+#define bpf_for(i, start, end) for (							  \
+	/* initialize and define destructor */						  \
+	struct bpf_iter ___it __attribute__((cleanup(bpf_iter_num_destroy))),		  \
+	/* ___p pointer is necessary to call bpf_iter_num_new() *once* to init ___it */	  \
+			*___p = (bpf_iter_num_new(&___it, (start), (end)),		  \
+	/* this is a workaround for Clang bug: it currently doesn't emit BTF */		  \
+	/* for bpf_iter_num_destroy when used from cleanup() attribute */		  \
+				(void)bpf_iter_num_destroy, (void *)0);			  \
+	({										  \
+		/* iteration step */							  \
+		int *___t = bpf_iter_num_next(&___it);					  \
+		/* termination and bounds check */					  \
+		(___t && ((i) = *___t, i >= (start) && i < (end)));			  \
+	});										  \
+	/* nothing here  */								  \
+)
+#endif /* bpf_for */
+
+#ifndef bpf_repeat
+/* bpf_repeat(N) performs N iterations without exposing iteration number */
+#define bpf_repeat(N) for (								  \
+	/* initialize and define destructor */						  \
+	struct bpf_iter ___it __attribute__((cleanup(bpf_iter_num_destroy))),		  \
+	/* ___p pointer is necessary to call bpf_iter_num_new() *once* to init ___it */	  \
+			*___p = (bpf_iter_num_new(&___it, 0, (N)),			  \
+	/* this is a workaround for Clang bug: it currently doesn't emit BTF */		  \
+	/* for bpf_iter_num_destroy when used from cleanup() attribute */		  \
+				(void)bpf_iter_num_destroy, (void *)0);			  \
+	bpf_iter_num_next(&___it);							  \
+	/* nothing here  */								  \
+)
+#endif /* bpf_repeat */
 
 #endif
diff --git a/tools/testing/selftests/bpf/progs/lsm.c b/tools/testing/selftests/bpf/progs/lsm.c
index dc93887ed34c..fadfdd98707c 100644
--- a/tools/testing/selftests/bpf/progs/lsm.c
+++ b/tools/testing/selftests/bpf/progs/lsm.c
@@ -4,12 +4,12 @@
  * Copyright 2020 Google LLC.
  */
 
-#include "bpf_misc.h"
 #include "vmlinux.h"
+#include <errno.h>
 #include <bpf/bpf_core_read.h>
 #include <bpf/bpf_helpers.h>
 #include <bpf/bpf_tracing.h>
-#include <errno.h>
+#include "bpf_misc.h"
 
 struct {
 	__uint(type, BPF_MAP_TYPE_ARRAY);
diff --git a/tools/testing/selftests/bpf/progs/pyperf.h b/tools/testing/selftests/bpf/progs/pyperf.h
index 6c7b1fb268d6..f2e7a31c8d75 100644
--- a/tools/testing/selftests/bpf/progs/pyperf.h
+++ b/tools/testing/selftests/bpf/progs/pyperf.h
@@ -7,6 +7,7 @@
 #include <stdbool.h>
 #include <linux/bpf.h>
 #include <bpf/bpf_helpers.h>
+#include "bpf_misc.h"
 
 #define FUNCTION_NAME_LEN 64
 #define FILE_NAME_LEN 128
@@ -294,17 +295,22 @@ int __on_event(struct bpf_raw_tracepoint_args *ctx)
 	if (ctx.done)
 		return 0;
 #else
-#ifdef NO_UNROLL
+#if defined(USE_ITER)
+/* no for loop, no unrolling */
+#elif defined(NO_UNROLL)
 #pragma clang loop unroll(disable)
-#else
-#ifdef UNROLL_COUNT
+#elif defined(UNROLL_COUNT)
 #pragma clang loop unroll_count(UNROLL_COUNT)
 #else
 #pragma clang loop unroll(full)
-#endif
 #endif /* NO_UNROLL */
 		/* Unwind python stack */
+#ifdef USE_ITER
+		int i;
+		bpf_for(i, 0, STACK_MAX_LEN) {
+#else /* !USE_ITER */
 		for (int i = 0; i < STACK_MAX_LEN; ++i) {
+#endif
 			if (frame_ptr && get_frame_data(frame_ptr, pidData, &frame, &sym)) {
 				int32_t new_symbol_id = *symbol_counter * 64 + cur_cpu;
 				int32_t *symbol_id = bpf_map_lookup_elem(&symbolmap, &sym);
diff --git a/tools/testing/selftests/bpf/progs/pyperf600_iter.c b/tools/testing/selftests/bpf/progs/pyperf600_iter.c
new file mode 100644
index 000000000000..d62e1b200c30
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/pyperf600_iter.c
@@ -0,0 +1,7 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2023 Meta Platforms, Inc. and affiliates.
+#define STACK_MAX_LEN 600
+#define SUBPROGS
+#define NO_UNROLL
+#define USE_ITER
+#include "pyperf.h"
diff --git a/tools/testing/selftests/bpf/progs/pyperf600_nounroll.c b/tools/testing/selftests/bpf/progs/pyperf600_nounroll.c
index 6beff7502f4d..520b58c4f8db 100644
--- a/tools/testing/selftests/bpf/progs/pyperf600_nounroll.c
+++ b/tools/testing/selftests/bpf/progs/pyperf600_nounroll.c
@@ -2,7 +2,4 @@
 // Copyright (c) 2019 Facebook
 #define STACK_MAX_LEN 600
 #define NO_UNROLL
-/* clang will not unroll at all.
- * Total program size is around 2k insns
- */
 #include "pyperf.h"
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH bpf-next 16/17] selftests/bpf: add iterators tests
  2023-03-02 23:49 [PATCH bpf-next 00/17] BPF open-coded iterators Andrii Nakryiko
                   ` (14 preceding siblings ...)
  2023-03-02 23:50 ` [PATCH bpf-next 15/17] selftests/bpf: add bpf_for_each(), bpf_for(), and bpf_repeat() macros Andrii Nakryiko
@ 2023-03-02 23:50 ` Andrii Nakryiko
  2023-03-04 20:39   ` Alexei Starovoitov
  2023-03-04 21:09   ` Jiri Olsa
  2023-03-02 23:50 ` [PATCH bpf-next 17/17] selftests/bpf: add number iterator tests Andrii Nakryiko
  2023-03-04 19:30 ` [PATCH bpf-next 00/17] BPF open-coded iterators patchwork-bot+netdevbpf
  17 siblings, 2 replies; 35+ messages in thread
From: Andrii Nakryiko @ 2023-03-02 23:50 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Tejun Heo

Add various tests for open-coded iterators. Some of them exercise
various possible coding patterns in C, while others go down to low-level
assembly for more control over various conditions.

We also make use of the bpf_for(), bpf_for_each() and bpf_repeat() macros.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 .../testing/selftests/bpf/prog_tests/iters.c  |  15 +
 tools/testing/selftests/bpf/progs/bpf_misc.h  |   1 +
 tools/testing/selftests/bpf/progs/iters.c     | 720 ++++++++++++++++++
 .../selftests/bpf/progs/iters_looping.c       | 163 ++++
 .../selftests/bpf/progs/iters_state_safety.c  | 455 +++++++++++
 5 files changed, 1354 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/iters.c
 create mode 100644 tools/testing/selftests/bpf/progs/iters.c
 create mode 100644 tools/testing/selftests/bpf/progs/iters_looping.c
 create mode 100644 tools/testing/selftests/bpf/progs/iters_state_safety.c

diff --git a/tools/testing/selftests/bpf/prog_tests/iters.c b/tools/testing/selftests/bpf/prog_tests/iters.c
new file mode 100644
index 000000000000..414fb8d82145
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/iters.c
@@ -0,0 +1,15 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */
+
+#include <test_progs.h>
+
+#include "iters.skel.h"
+#include "iters_state_safety.skel.h"
+#include "iters_looping.skel.h"
+
+void test_iters(void)
+{
+	RUN_TESTS(iters_state_safety);
+	RUN_TESTS(iters_looping);
+	RUN_TESTS(iters);
+}
diff --git a/tools/testing/selftests/bpf/progs/bpf_misc.h b/tools/testing/selftests/bpf/progs/bpf_misc.h
index 08a791f307a6..955c09e67abb 100644
--- a/tools/testing/selftests/bpf/progs/bpf_misc.h
+++ b/tools/testing/selftests/bpf/progs/bpf_misc.h
@@ -36,6 +36,7 @@
 #define __clobber_common "r0", "r1", "r2", "r3", "r4", "r5", "memory"
 #define __imm(name) [name]"i"(name)
 #define __imm_addr(name) [name]"i"(&name)
+#define __imm_ptr(name) [name]"p"(&name)
 
 #if defined(__TARGET_ARCH_x86)
 #define SYSCALL_WRAPPER 1
diff --git a/tools/testing/selftests/bpf/progs/iters.c b/tools/testing/selftests/bpf/progs/iters.c
new file mode 100644
index 000000000000..caba8e71c288
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/iters.c
@@ -0,0 +1,720 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */
+
+#include <stdbool.h>
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_misc.h"
+
+#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
+
+static volatile int zero = 0;
+
+int my_pid;
+int arr[256];
+int small_arr[16] SEC(".data.small_arr");
+
+#ifdef REAL_TEST
+#define MY_PID_GUARD() if (my_pid != (bpf_get_current_pid_tgid() >> 32)) return 0
+#else
+#define MY_PID_GUARD() ({ })
+#endif
+
+SEC("?raw_tp")
+__failure __msg("math between map_value pointer and register with unbounded min value is not allowed")
+int iter_err_unsafe_c_loop(const void *ctx)
+{
+	struct bpf_iter it;
+	int *v, i = zero; /* obscure initial value of i */
+
+	MY_PID_GUARD();
+
+	bpf_iter_num_new(&it, 0, 1000);
+	while ((v = bpf_iter_num_next(&it))) {
+		i++;
+	}
+	bpf_iter_num_destroy(&it);
+
+	small_arr[i] = 123; /* invalid */
+
+	return 0;
+}
+
+SEC("?raw_tp")
+__failure __msg("unbounded memory access")
+int iter_err_unsafe_asm_loop(const void *ctx)
+{
+	struct bpf_iter it;
+	int *v, i = 0;
+
+	MY_PID_GUARD();
+
+	asm volatile (
+		"r6 = %[zero];" /* iteration counter */
+		"r1 = %[it];" /* iterator state */
+		"r2 = 0;"
+		"r3 = 1000;"
+		"r4 = 1;"
+		"call %[bpf_iter_num_new];"
+	"loop:"
+		"r1 = %[it];"
+		"call %[bpf_iter_num_next];"
+		"if r0 == 0 goto out;"
+		"r6 += 1;"
+		"goto loop;"
+	"out:"
+		"r1 = %[it];"
+		"call %[bpf_iter_num_destroy];"
+		"r1 = %[small_arr];"
+		"r2 = r6;"
+		"r2 <<= 2;"
+		"r1 += r2;"
+		"*(u32 *)(r1 + 0) = r6;" /* invalid */
+		:
+		: [it]"r"(&it),
+		  [small_arr]"p"(small_arr),
+		  [zero]"p"(zero),
+		  __imm(bpf_iter_num_new),
+		  __imm(bpf_iter_num_next),
+		  __imm(bpf_iter_num_destroy)
+		: __clobber_common, "r6"
+	);
+
+	return 0;
+}
+
+SEC("raw_tp")
+__success
+int iter_while_loop(const void *ctx)
+{
+	struct bpf_iter it;
+	int *v, i;
+
+	MY_PID_GUARD();
+
+	bpf_iter_num_new(&it, 0, 3);
+	while ((v = bpf_iter_num_next(&it))) {
+		bpf_printk("ITER_BASIC: E1 VAL: v=%d", *v);
+	}
+	bpf_iter_num_destroy(&it);
+
+	return 0;
+}
+
+SEC("raw_tp")
+__success
+int iter_while_loop_auto_cleanup(const void *ctx)
+{
+	__attribute__((cleanup(bpf_iter_num_destroy))) struct bpf_iter it;
+	int *v, i;
+
+	MY_PID_GUARD();
+
+	bpf_iter_num_new(&it, 0, 3);
+	while ((v = bpf_iter_num_next(&it))) {
+		bpf_printk("ITER_BASIC: E1 VAL: v=%d", *v);
+	}
+	/* (!) no explicit bpf_iter_num_destroy() */
+
+	return 0;
+}
+
+SEC("raw_tp")
+__success
+int iter_for_loop(const void *ctx)
+{
+	struct bpf_iter it;
+	int *v, i;
+
+	MY_PID_GUARD();
+
+	bpf_iter_num_new(&it, 5, 10);
+	for (v = bpf_iter_num_next(&it); v; v = bpf_iter_num_next(&it)) {
+		bpf_printk("ITER_BASIC: E2 VAL: v=%d", *v);
+	}
+	bpf_iter_num_destroy(&it);
+
+	return 0;
+}
+
+SEC("raw_tp")
+__success
+int iter_bpf_for_each_macro(const void *ctx)
+{
+	int *v;
+
+	MY_PID_GUARD();
+
+	bpf_for_each(num, v, 5, 10) {
+		bpf_printk("ITER_BASIC: E2 VAL: v=%d", *v);
+	}
+
+	return 0;
+}
+
+SEC("raw_tp")
+__success
+int iter_bpf_for_macro(const void *ctx)
+{
+	int i;
+
+	MY_PID_GUARD();
+
+	bpf_for(i, 5, 10) {
+		bpf_printk("ITER_BASIC: E2 VAL: v=%d", i);
+	}
+
+	return 0;
+}
+
+SEC("raw_tp")
+__success
+int iter_pragma_unroll_loop(const void *ctx)
+{
+	struct bpf_iter it;
+	int *v, i;
+
+	MY_PID_GUARD();
+
+	bpf_iter_num_new(&it, 0, 2);
+#pragma nounroll
+	for (i = 0; i < 3; i++) {
+		v = bpf_iter_num_next(&it);
+		bpf_printk("ITER_BASIC: E3 VAL: i=%d v=%d", i, v ? *v : -1);
+	}
+	bpf_iter_num_destroy(&it);
+
+	return 0;
+}
+
+SEC("raw_tp")
+__success
+int iter_manual_unroll_loop(const void *ctx)
+{
+	struct bpf_iter it;
+	int *v, i;
+
+	MY_PID_GUARD();
+
+	bpf_iter_num_new(&it, 100, 200);
+	v = bpf_iter_num_next(&it);
+	bpf_printk("ITER_BASIC: E4 VAL: v=%d", v ? *v : -1);
+	v = bpf_iter_num_next(&it);
+	bpf_printk("ITER_BASIC: E4 VAL: v=%d", v ? *v : -1);
+	v = bpf_iter_num_next(&it);
+	bpf_printk("ITER_BASIC: E4 VAL: v=%d", v ? *v : -1);
+	v = bpf_iter_num_next(&it);
+	bpf_printk("ITER_BASIC: E4 VAL: v=%d\n", v ? *v : -1);
+	bpf_iter_num_destroy(&it);
+
+	return 0;
+}
+
+SEC("raw_tp")
+__success
+int iter_multiple_sequential_loops(const void *ctx)
+{
+	struct bpf_iter it;
+	int *v, i;
+
+	MY_PID_GUARD();
+
+	bpf_iter_num_new(&it, 0, 3);
+	while ((v = bpf_iter_num_next(&it))) {
+		bpf_printk("ITER_BASIC: E1 VAL: v=%d", *v);
+	}
+	bpf_iter_num_destroy(&it);
+
+	bpf_iter_num_new(&it, 5, 10);
+	for (v = bpf_iter_num_next(&it); v; v = bpf_iter_num_next(&it)) {
+		bpf_printk("ITER_BASIC: E2 VAL: v=%d", *v);
+	}
+	bpf_iter_num_destroy(&it);
+
+	bpf_iter_num_new(&it, 0, 2);
+#pragma nounroll
+	for (i = 0; i < 3; i++) {
+		v = bpf_iter_num_next(&it);
+		bpf_printk("ITER_BASIC: E3 VAL: i=%d v=%d", i, v ? *v : -1);
+	}
+	bpf_iter_num_destroy(&it);
+
+	bpf_iter_num_new(&it, 100, 200);
+	v = bpf_iter_num_next(&it);
+	bpf_printk("ITER_BASIC: E4 VAL: v=%d", v ? *v : -1);
+	v = bpf_iter_num_next(&it);
+	bpf_printk("ITER_BASIC: E4 VAL: v=%d", v ? *v : -1);
+	v = bpf_iter_num_next(&it);
+	bpf_printk("ITER_BASIC: E4 VAL: v=%d", v ? *v : -1);
+	v = bpf_iter_num_next(&it);
+	bpf_printk("ITER_BASIC: E4 VAL: v=%d\n", v ? *v : -1);
+	bpf_iter_num_destroy(&it);
+
+	return 0;
+}
+
+SEC("raw_tp")
+__success
+int iter_limit_cond_break_loop(const void *ctx)
+{
+	struct bpf_iter it;
+	int *v, i = 0, sum = 0;
+
+	MY_PID_GUARD();
+
+	bpf_iter_num_new(&it, 0, 10);
+	while ((v = bpf_iter_num_next(&it))) {
+		bpf_printk("ITER_SIMPLE: i=%d v=%d", i, *v);
+		sum += *v;
+
+		i++;
+		if (i > 3)
+			break;
+	}
+	bpf_iter_num_destroy(&it);
+
+	bpf_printk("ITER_SIMPLE: sum=%d\n", sum);
+
+	return 0;
+}
+
+SEC("raw_tp")
+__success
+int iter_obfuscate_counter(const void *ctx)
+{
+	struct bpf_iter it;
+	int *v, sum = 0;
+	/* Make i's initial value unknowable for verifier to prevent it from
+	 * pruning if/else branch inside the loop body and marking i as precise.
+	 */
+	int i = zero;
+
+	MY_PID_GUARD();
+
+	bpf_iter_num_new(&it, 0, 10);
+	while ((v = bpf_iter_num_next(&it))) {
+		int x;
+
+		i += 1;
+
+		/* If we initialized i as `int i = 0;` above, verifier would
+		 * track that i becomes 1 on first iteration after increment
+		 * above, and here verifier would eagerly prune else branch
+		 * and mark i as precise, ruining open-coded iterator logic
+		 * completely, as each next iteration would have a different
+		 * *precise* value of i, and thus there would be no
+		 * convergence of state. This would result in reaching maximum
+		 * instruction limit, no matter what the limit is.
+		 */
+		if (i == 1)
+			x = 123;
+		else
+			x = i * 3 + 1;
+
+		bpf_printk("ITER_OBFUSCATE_COUNTER: i=%d v=%d x=%d", i, *v, x);
+
+		sum += x;
+	}
+	bpf_iter_num_destroy(&it);
+
+	bpf_printk("ITER_OBFUSCATE_COUNTER: sum=%d\n", sum);
+
+	return 0;
+}
+
+SEC("raw_tp")
+__success
+int iter_search_loop(const void *ctx)
+{
+	struct bpf_iter it;
+	int *v, *elem = NULL;
+	bool found = false;
+
+	MY_PID_GUARD();
+
+	bpf_iter_num_new(&it, 0, 10);
+
+	while ((v = bpf_iter_num_next(&it))) {
+		bpf_printk("ITER_SEARCH_LOOP: v=%d", *v);
+
+		if (*v == 2) {
+			found = true;
+			elem = v;
+			barrier_var(elem);
+		}
+	}
+
+	/* should fail to verify if bpf_iter_num_destroy() is here */
+
+	if (found)
+		/* here the found element will be wrong, we should have copied
+		 * the value to a variable, but here we want to make sure we can
+		 * access memory after the loop anyway
+		 */
+		bpf_printk("ITER_SEARCH_LOOP: FOUND IT = %d!\n", *elem);
+	else
+		bpf_printk("ITER_SEARCH_LOOP: NOT FOUND IT!\n");
+
+	bpf_iter_num_destroy(&it);
+
+	return 0;
+}
+
+SEC("raw_tp")
+__success
+int iter_array_fill(const void *ctx)
+{
+	int sum, i;
+
+	MY_PID_GUARD();
+
+	bpf_for(i, 0, ARRAY_SIZE(arr)) {
+		arr[i] = i * 2;
+	}
+
+	sum = 0;
+	bpf_for(i, 0, ARRAY_SIZE(arr)) {
+		sum += arr[i];
+	}
+
+	bpf_printk("ITER_ARRAY_FILL: sum=%d (should be %d)\n", sum, 255 * 256);
+
+	return 0;
+}
+
+static int arr2d[4][5];
+static int arr2d_row_sums[4];
+static int arr2d_col_sums[5];
+
+SEC("raw_tp")
+__success
+int iter_nested_iters(const void *ctx)
+{
+	int sum, row, col;
+
+	MY_PID_GUARD();
+
+	bpf_for(row, 0, ARRAY_SIZE(arr2d)) {
+		bpf_for(col, 0, ARRAY_SIZE(arr2d[0])) {
+			arr2d[row][col] = row * col;
+		}
+	}
+
+	/* zero-initialize sums */
+	sum = 0;
+	bpf_for(row, 0, ARRAY_SIZE(arr2d)) {
+		arr2d_row_sums[row] = 0;
+	}
+	bpf_for(col, 0, ARRAY_SIZE(arr2d[0])) {
+		arr2d_col_sums[col] = 0;
+	}
+
+	/* calculate sums */
+	bpf_for(row, 0, ARRAY_SIZE(arr2d)) {
+		bpf_for(col, 0, ARRAY_SIZE(arr2d[0])) {
+			sum += arr2d[row][col];
+			arr2d_row_sums[row] += arr2d[row][col];
+			arr2d_col_sums[col] += arr2d[row][col];
+		}
+	}
+
+	bpf_printk("ITER_NESTED_ITERS: total sum=%d", sum);
+	bpf_for(row, 0, ARRAY_SIZE(arr2d)) {
+		bpf_printk("ITER_NESTED_ITERS: row #%d sum=%d", row, arr2d_row_sums[row]);
+	}
+	bpf_for(col, 0, ARRAY_SIZE(arr2d[0])) {
+		bpf_printk("ITER_NESTED_ITERS: col #%d sum=%d%s",
+			   col, arr2d_col_sums[col],
+			   col == ARRAY_SIZE(arr2d[0]) - 1 ? "\n" : "");
+	}
+
+	return 0;
+}
+
+SEC("raw_tp")
+__success
+int iter_nested_deeply_iters(const void *ctx)
+{
+	int sum = 0;
+
+	MY_PID_GUARD();
+
+	bpf_repeat(10) {
+		bpf_repeat(10) {
+			bpf_repeat(10) {
+				bpf_repeat(10) {
+					bpf_repeat(10) {
+						sum += 1;
+					}
+				}
+			}
+		}
+		/* validate that we can break from inside bpf_repeat() */
+		break;
+	}
+
+	return sum;
+}
+
+static __noinline void fill_inner_dimension(int row)
+{
+	int col;
+
+	bpf_for(col, 0, ARRAY_SIZE(arr2d[0])) {
+		arr2d[row][col] = row * col;
+	}
+}
+
+static __noinline int sum_inner_dimension(int row)
+{
+	int sum = 0, col;
+
+	bpf_for(col, 0, ARRAY_SIZE(arr2d[0])) {
+		sum += arr2d[row][col];
+		arr2d_row_sums[row] += arr2d[row][col];
+		arr2d_col_sums[col] += arr2d[row][col];
+	}
+
+	return sum;
+}
+
+SEC("raw_tp")
+__success
+int iter_subprog_iters(const void *ctx)
+{
+	int sum, row, col;
+
+	MY_PID_GUARD();
+
+	bpf_for(row, 0, ARRAY_SIZE(arr2d)) {
+		fill_inner_dimension(row);
+	}
+
+	/* zero-initialize sums */
+	sum = 0;
+	bpf_for(row, 0, ARRAY_SIZE(arr2d)) {
+		arr2d_row_sums[row] = 0;
+	}
+	bpf_for(col, 0, ARRAY_SIZE(arr2d[0])) {
+		arr2d_col_sums[col] = 0;
+	}
+
+	/* calculate sums */
+	bpf_for(row, 0, ARRAY_SIZE(arr2d)) {
+		sum += sum_inner_dimension(row);
+	}
+
+	bpf_printk("ITER_SUBPROG_ITERS: total sum=%d", sum);
+	bpf_for(row, 0, ARRAY_SIZE(arr2d)) {
+		bpf_printk("ITER_SUBPROG_ITERS: row #%d sum=%d",
+			   row, arr2d_row_sums[row]);
+	}
+	bpf_for(col, 0, ARRAY_SIZE(arr2d[0])) {
+		bpf_printk("ITER_SUBPROG_ITERS: col #%d sum=%d%s",
+			   col, arr2d_col_sums[col],
+			   col == ARRAY_SIZE(arr2d[0]) - 1 ? "\n" : "");
+	}
+
+	return 0;
+}
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__type(key, int);
+	__type(value, int);
+	__uint(max_entries, 1000);
+} arr_map SEC(".maps");
+
+SEC("?raw_tp")
+__failure __msg("invalid mem access 'scalar'")
+int iter_err_too_permissive1(const void *ctx)
+{
+	int *map_val = NULL;
+	int key = 0;
+
+	MY_PID_GUARD();
+
+	map_val = bpf_map_lookup_elem(&arr_map, &key);
+	if (!map_val)
+		return 0;
+
+	bpf_repeat(1000000) {
+		map_val = NULL;
+	}
+
+	*map_val = 123;
+
+	return 0;
+}
+
+SEC("?raw_tp")
+__failure __msg("invalid mem access 'map_value_or_null'")
+int iter_err_too_permissive2(const void *ctx)
+{
+	int *map_val = NULL;
+	int key = 0;
+
+	MY_PID_GUARD();
+
+	map_val = bpf_map_lookup_elem(&arr_map, &key);
+	if (!map_val)
+		return 0;
+
+	bpf_repeat(1000000) {
+		map_val = bpf_map_lookup_elem(&arr_map, &key);
+	}
+
+	*map_val = 123;
+
+	return 0;
+}
+
+SEC("?raw_tp")
+__failure __msg("invalid mem access 'map_value_or_null'")
+int iter_err_too_permissive3(const void *ctx)
+{
+	int *map_val = NULL;
+	int key = 0;
+	bool found = false;
+
+	MY_PID_GUARD();
+
+	bpf_repeat(1000000) {
+		map_val = bpf_map_lookup_elem(&arr_map, &key);
+		found = true;
+	}
+
+	if (found)
+		*map_val = 123;
+
+	return 0;
+}
+
+SEC("raw_tp")
+__success
+int iter_tricky_but_fine(const void *ctx)
+{
+	int *map_val = NULL;
+	int key = 0;
+	bool found = false;
+
+	MY_PID_GUARD();
+
+	bpf_repeat(1000000) {
+		map_val = bpf_map_lookup_elem(&arr_map, &key);
+		if (map_val) {
+			found = true;
+			break;
+		}
+	}
+
+	if (found)
+		*map_val = 123;
+
+	return 0;
+}
+
+#define __bpf_memzero(p, sz) bpf_probe_read_kernel((p), (sz), 0)
+
+SEC("raw_tp")
+__success
+int iter_stack_array_loop(const void *ctx)
+{
+	long arr1[16], arr2[16], sum = 0;
+	int *v, i;
+
+	MY_PID_GUARD();
+
+	/* zero-init arr1 and arr2 in such a way that verifier doesn't know
+	 * it's all zeros; if we don't do that, we'll make BPF verifier track
+	 * all combination of zero/non-zero stack slots for arr1/arr2, which
+	 * will lead to O(2^(ARRAY_SIZE(arr1)+ARRAY_SIZE(arr2))) different
+	 * states
+	 */
+	__bpf_memzero(arr1, sizeof(arr1));
+	__bpf_memzero(arr2, sizeof(arr2));
+
+	/* validate that we can break and continue when using bpf_for() */
+	bpf_for(i, 0, ARRAY_SIZE(arr1)) {
+		if (i & 1) {
+			arr1[i] = i;
+			continue;
+		} else {
+			arr2[i] = i;
+			break;
+		}
+	}
+
+	bpf_for(i, 0, ARRAY_SIZE(arr1)) {
+		sum += arr1[i] + arr2[i];
+	}
+
+	return sum;
+}
+
+static __noinline void fill(struct bpf_iter *it, int *arr, __u32 n, int mul)
+{
+	int *t, i;
+
+	while ((t = bpf_iter_num_next(it))) {
+		i = *t;
+		if (i >= n)
+			break;
+		arr[i] = i * mul;
+	}
+}
+
+static __noinline int sum(struct bpf_iter *it, int *arr, __u32 n)
+{
+	int *t, i, sum = 0;
+
+	while ((t = bpf_iter_num_next(it))) {
+		i = *t;
+		if (i >= n)
+			break;
+		sum += arr[i];
+	}
+
+	return sum;
+}
+
+SEC("raw_tp")
+__success
+int iter_pass_iter_ptr_to_subprog(const void *ctx)
+{
+	int arr1[16], arr2[32];
+	struct bpf_iter it;
+	int n, sum1, sum2;
+
+	MY_PID_GUARD();
+
+	/* fill arr1 */
+	n = ARRAY_SIZE(arr1);
+	bpf_iter_num_new(&it, 0, n);
+	fill(&it, arr1, n, 2);
+	bpf_iter_num_destroy(&it);
+
+	/* fill arr2 */
+	n = ARRAY_SIZE(arr2);
+	bpf_iter_num_new(&it, 0, n);
+	fill(&it, arr2, n, 10);
+	bpf_iter_num_destroy(&it);
+
+	/* sum arr1 */
+	n = ARRAY_SIZE(arr1);
+	bpf_iter_num_new(&it, 0, n);
+	sum1 = sum(&it, arr1, n);
+	bpf_iter_num_destroy(&it);
+
+	/* sum arr2 */
+	n = ARRAY_SIZE(arr2);
+	bpf_iter_num_new(&it, 0, n);
+	sum2 = sum(&it, arr2, n);
+	bpf_iter_num_destroy(&it);
+
+	bpf_printk("sum1=%d, sum2=%d", sum1, sum2);
+
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/iters_looping.c b/tools/testing/selftests/bpf/progs/iters_looping.c
new file mode 100644
index 000000000000..f2b289be9dd0
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/iters_looping.c
@@ -0,0 +1,163 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */
+
+#include <errno.h>
+#include <string.h>
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_misc.h"
+
+char _license[] SEC("license") = "GPL";
+
+#define ITER_HELPERS						\
+	  __imm(bpf_iter_num_new),				\
+	  __imm(bpf_iter_num_next),				\
+	  __imm(bpf_iter_num_destroy)
+
+SEC("?raw_tp")
+__success
+int force_clang_to_emit_btf_for_externs(void *ctx)
+{
+	/* we need this as a workaround to enforce compiler emitting BTF
+	 * information for bpf_iter_num_{new,next,destroy}() kfuncs,
+	 * as, apparently, it doesn't emit it for symbols only referenced from
+	 * assembly (or cleanup attribute, for that matter, as well)
+	 */
+	bpf_repeat(0);
+
+	return 0;
+}
+
+SEC("?raw_tp")
+__success
+int consume_first_item_only(void *ctx)
+{
+	struct bpf_iter iter;
+
+	asm volatile (
+		/* create iterator */
+		"r1 = %[iter];"
+		"r2 = 0;"
+		"r3 = 1000;"
+		"call %[bpf_iter_num_new];"
+
+		/* consume first item */
+		"r1 = %[iter];"
+		"call %[bpf_iter_num_next];"
+
+		"if r0 == 0 goto +1;"
+		"r0 = *(u32 *)(r0 + 0);"
+
+		/* destroy iterator */
+		"r1 = %[iter];"
+		"call %[bpf_iter_num_destroy];"
+		:
+		: __imm_ptr(iter), ITER_HELPERS
+		: __clobber_common
+	);
+
+	return 0;
+}
+
+SEC("?raw_tp")
+__failure __msg("R0 invalid mem access 'scalar'")
+int missing_null_check_fail(void *ctx)
+{
+	struct bpf_iter iter;
+
+	asm volatile (
+		/* create iterator */
+		"r1 = %[iter];"
+		"r2 = 0;"
+		"r3 = 1000;"
+		"call %[bpf_iter_num_new];"
+
+		/* consume first element */
+		"r1 = %[iter];"
+		"call %[bpf_iter_num_next];"
+
+		/* FAIL: deref with no NULL check */
+		"r1 = *(u32 *)(r0 + 0);"
+
+		/* destroy iterator */
+		"r1 = %[iter];"
+		"call %[bpf_iter_num_destroy];"
+		:
+		: __imm_ptr(iter), ITER_HELPERS
+		: __clobber_common
+	);
+
+	return 0;
+}
+
+SEC("?raw_tp")
+__failure
+__msg("invalid access to memory, mem_size=4 off=0 size=8")
+__msg("R0 min value is outside of the allowed memory range")
+int wrong_sized_read_fail(void *ctx)
+{
+	struct bpf_iter iter;
+
+	asm volatile (
+		/* create iterator */
+		"r1 = %[iter];"
+		"r2 = 0;"
+		"r3 = 1000;"
+		"call %[bpf_iter_num_new];"
+
+		/* consume first element */
+		"r1 = %[iter];"
+		"call %[bpf_iter_num_next];"
+
+		"if r0 == 0 goto +1;"
+		/* FAIL: deref more than available 4 bytes */
+		"r0 = *(u64 *)(r0 + 0);"
+
+		/* destroy iterator */
+		"r1 = %[iter];"
+		"call %[bpf_iter_num_destroy];"
+		:
+		: __imm_ptr(iter), ITER_HELPERS
+		: __clobber_common
+	);
+
+	return 0;
+}
+
+SEC("?raw_tp")
+__success __log_level(2)
+__flag(BPF_F_TEST_STATE_FREQ)
+int simplest_loop(void *ctx)
+{
+	struct bpf_iter iter;
+
+	asm volatile (
+		"r6 = 0;" /* init sum */
+
+		/* create iterator */
+		"r1 = %[iter];"
+		"r2 = 0;"
+		"r3 = 10;"
+		"call %[bpf_iter_num_new];"
+
+	"1:"
+		/* consume next item */
+		"r1 = %[iter];"
+		"call %[bpf_iter_num_next];"
+
+		"if r0 == 0 goto 2f;"
+		"r0 = *(u32 *)(r0 + 0);"
+		"r6 += r0;" /* accumulate sum */
+		"goto 1b;"
+
+	"2:"
+		/* destroy iterator */
+		"r1 = %[iter];"
+		"call %[bpf_iter_num_destroy];"
+		:
+		: __imm_ptr(iter), ITER_HELPERS
+		: __clobber_common, "r6"
+	);
+
+	return 0;
+}
diff --git a/tools/testing/selftests/bpf/progs/iters_state_safety.c b/tools/testing/selftests/bpf/progs/iters_state_safety.c
new file mode 100644
index 000000000000..01fe4c7f0a2b
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/iters_state_safety.c
@@ -0,0 +1,455 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Facebook */
+
+#include <errno.h>
+#include <string.h>
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_misc.h"
+
+char _license[] SEC("license") = "GPL";
+
+#define ITER_HELPERS						\
+	  __imm(bpf_iter_num_new),				\
+	  __imm(bpf_iter_num_next),				\
+	  __imm(bpf_iter_num_destroy)
+
+SEC("?raw_tp")
+__success
+int force_clang_to_emit_btf_for_externs(void *ctx)
+{
+	/* we need this as a workaround to enforce compiler emitting BTF
+	 * information for bpf_iter_num_{new,next,destroy}() kfuncs,
+	 * as, apparently, it doesn't emit it for symbols only referenced from
+	 * assembly (or cleanup attribute, for that matter, as well)
+	 */
+	bpf_repeat(0);
+
+	return 0;
+}
+
+SEC("?raw_tp")
+__success __log_level(2)
+__msg("fp-24_w=iter_num(ref_id=1,state=active,depth=0)")
+int create_and_destroy(void *ctx)
+{
+	struct bpf_iter iter;
+
+	asm volatile (
+		/* create iterator */
+		"r1 = %[iter];"
+		"r2 = 0;"
+		"r3 = 1000;"
+		"call %[bpf_iter_num_new];"
+		/* destroy iterator */
+		"r1 = %[iter];"
+		"call %[bpf_iter_num_destroy];"
+		:
+		: __imm_ptr(iter), ITER_HELPERS
+		: __clobber_common
+	);
+
+	return 0;
+}
+
+SEC("?raw_tp")
+__failure __msg("Unreleased reference id=1")
+int create_and_forget_to_destroy_fail(void *ctx)
+{
+	struct bpf_iter iter;
+
+	asm volatile (
+		/* create iterator */
+		"r1 = %[iter];"
+		"r2 = 0;"
+		"r3 = 1000;"
+		"call %[bpf_iter_num_new];"
+		:
+		: __imm_ptr(iter), ITER_HELPERS
+		: __clobber_common
+	);
+
+	return 0;
+}
+
+SEC("?raw_tp")
+__failure __msg("expected an initialized iter as arg #1")
+int destroy_without_creating_fail(void *ctx)
+{
+	/* iter is never constructed with bpf_iter_num_new(), so destroy below must be rejected */
+	struct bpf_iter iter;
+
+	asm volatile (
+		"r1 = %[iter];"
+		"call %[bpf_iter_num_destroy];"
+		:
+		: __imm_ptr(iter), ITER_HELPERS
+		: __clobber_common
+	);
+
+	return 0;
+}
+
+SEC("?raw_tp")
+__failure __msg("expected an initialized iter as arg #1")
+int compromise_iter_w_direct_write_fail(void *ctx)
+{
+	struct bpf_iter iter;
+
+	asm volatile (
+		/* create iterator */
+		"r1 = %[iter];"
+		"r2 = 0;"
+		"r3 = 1000;"
+		"call %[bpf_iter_num_new];"
+
+		/* directly write over first half of iter state */
+		"*(u64 *)(%[iter] + 0) = r0;"
+
+		/* (attempt to) destroy iterator */
+		"r1 = %[iter];"
+		"call %[bpf_iter_num_destroy];"
+		:
+		: __imm_ptr(iter), ITER_HELPERS
+		: __clobber_common
+	);
+
+	return 0;
+}
+
+SEC("?raw_tp")
+__failure __msg("expected an initialized iter as arg #1")
+int compromise_iter_w_direct_write2_fail(void *ctx)
+{
+	struct bpf_iter iter;
+
+	asm volatile (
+		/* create iterator */
+		"r1 = %[iter];"
+		"r2 = 0;"
+		"r3 = 1000;"
+		"call %[bpf_iter_num_new];"
+
+		/* directly write over second half of iter state */
+		"*(u64 *)(%[iter] + 8) = r0;"
+
+		/* (attempt to) destroy iterator */
+		"r1 = %[iter];"
+		"call %[bpf_iter_num_destroy];"
+		:
+		: __imm_ptr(iter), ITER_HELPERS
+		: __clobber_common
+	);
+
+	return 0;
+}
+
+SEC("?raw_tp")
+__failure __msg("Unreleased reference id=1")
+int compromise_iter_w_direct_write_and_skip_destroy_fail(void *ctx)
+{
+	struct bpf_iter iter;
+
+	asm volatile (
+		/* create iterator */
+		"r1 = %[iter];"
+		"r2 = 0;"
+		"r3 = 1000;"
+		"call %[bpf_iter_num_new];"
+
+		/* directly write over first half of iter state */
+		"*(u64 *)(%[iter] + 0) = r0;"
+
+		/* don't destroy iter, leaking ref, which should fail */
+		:
+		: __imm_ptr(iter), ITER_HELPERS
+		: __clobber_common
+	);
+
+	return 0;
+}
+
+SEC("?raw_tp")
+__failure __msg("expected an initialized iter as arg #1")
+int compromise_iter_w_helper_write_fail(void *ctx)
+{
+	struct bpf_iter iter;
+
+	asm volatile (
+		/* create iterator */
+		"r1 = %[iter];"
+		"r2 = 0;"
+		"r3 = 1000;"
+		"call %[bpf_iter_num_new];"
+
+		/* overwrite 8th byte with bpf_probe_read_kernel() */
+		"r1 = %[iter];"
+		"r1 += 7;"
+		"r2 = 1;"
+		"r3 = 0;" /* NULL */
+		"call %[bpf_probe_read_kernel];"
+
+		/* (attempt to) destroy iterator */
+		"r1 = %[iter];"
+		"call %[bpf_iter_num_destroy];"
+		:
+		: __imm_ptr(iter), ITER_HELPERS, __imm(bpf_probe_read_kernel)
+		: __clobber_common
+	);
+
+	return 0;
+}
+
+static __noinline void subprog_with_iter(void)
+{
+	struct bpf_iter iter;
+
+	bpf_iter_num_new(&iter, 0, 1);
+
+	return;
+}
+
+SEC("?raw_tp")
+__failure
+/* ensure there was a call to subprog, which might not happen without __noinline */
+__msg("returning from callee:")
+__msg("Unreleased reference id=1")
+int leak_iter_from_subprog_fail(void *ctx)
+{
+	subprog_with_iter();
+
+	return 0;
+}
+
+SEC("?raw_tp")
+__success __log_level(2)
+__msg("fp-24_w=iter_num(ref_id=1,state=active,depth=0)")
+int valid_stack_reuse(void *ctx)
+{
+	struct bpf_iter iter;
+
+	asm volatile (
+		/* create iterator */
+		"r1 = %[iter];"
+		"r2 = 0;"
+		"r3 = 1000;"
+		"call %[bpf_iter_num_new];"
+		/* destroy iterator */
+		"r1 = %[iter];"
+		"call %[bpf_iter_num_destroy];"
+
+		/* now reuse same stack slots */
+
+		/* create iterator */
+		"r1 = %[iter];"
+		"r2 = 0;"
+		"r3 = 1000;"
+		"call %[bpf_iter_num_new];"
+		/* destroy iterator */
+		"r1 = %[iter];"
+		"call %[bpf_iter_num_destroy];"
+		:
+		: __imm_ptr(iter), ITER_HELPERS
+		: __clobber_common
+	);
+
+	return 0;
+}
+
+SEC("?raw_tp")
+__failure __msg("expected uninitialized iter as arg #1")
+int double_create_fail(void *ctx)
+{
+	struct bpf_iter iter;
+
+	asm volatile (
+		/* create iterator */
+		"r1 = %[iter];"
+		"r2 = 0;"
+		"r3 = 1000;"
+		"call %[bpf_iter_num_new];"
+		/* (attempt to) create iterator again */
+		"r1 = %[iter];"
+		"r2 = 0;"
+		"r3 = 1000;"
+		"call %[bpf_iter_num_new];"
+		/* destroy iterator */
+		"r1 = %[iter];"
+		"call %[bpf_iter_num_destroy];"
+		:
+		: __imm_ptr(iter), ITER_HELPERS
+		: __clobber_common
+	);
+
+	return 0;
+}
+
+SEC("?raw_tp")
+__failure __msg("expected an initialized iter as arg #1")
+int double_destroy_fail(void *ctx)
+{
+	struct bpf_iter iter;
+
+	asm volatile (
+		/* create iterator */
+		"r1 = %[iter];"
+		"r2 = 0;"
+		"r3 = 1000;"
+		"call %[bpf_iter_num_new];"
+		/* destroy iterator */
+		"r1 = %[iter];"
+		"call %[bpf_iter_num_destroy];"
+		/* (attempt to) destroy iterator again */
+		"r1 = %[iter];"
+		"call %[bpf_iter_num_destroy];"
+		:
+		: __imm_ptr(iter), ITER_HELPERS
+		: __clobber_common
+	);
+
+	return 0;
+}
+
+SEC("?raw_tp")
+__failure __msg("expected an initialized iter as arg #1")
+int next_without_new_fail(void *ctx)
+{
+	struct bpf_iter iter;
+
+	asm volatile (
+		/* don't create iterator and try to iterate */
+		"r1 = %[iter];"
+		"call %[bpf_iter_num_next];"
+		/* destroy iterator */
+		"r1 = %[iter];"
+		"call %[bpf_iter_num_destroy];"
+		:
+		: __imm_ptr(iter), ITER_HELPERS
+		: __clobber_common
+	);
+
+	return 0;
+}
+
+SEC("?raw_tp")
+__failure __msg("expected an initialized iter as arg #1")
+int next_after_destroy_fail(void *ctx)
+{
+	struct bpf_iter iter;
+
+	asm volatile (
+		/* create iterator */
+		"r1 = %[iter];"
+		"r2 = 0;"
+		"r3 = 1000;"
+		"call %[bpf_iter_num_new];"
+		/* destroy iterator */
+		"r1 = %[iter];"
+		"call %[bpf_iter_num_destroy];"
+		/* iterator is already destroyed, but try to iterate anyway */
+		"r1 = %[iter];"
+		"call %[bpf_iter_num_next];"
+		:
+		: __imm_ptr(iter), ITER_HELPERS
+		: __clobber_common
+	);
+
+	return 0;
+}
+
+SEC("?raw_tp")
+__failure __msg("invalid read from stack")
+int __naked read_from_iter_slot_fail(void)
+{
+	asm volatile (
+		/* r6 points to struct bpf_iter on the stack */
+		"r6 = r10;"
+		"r6 += -24;"
+
+		/* create iterator */
+		"r1 = r6;"
+		"r2 = 0;"
+		"r3 = 1000;"
+		"call %[bpf_iter_num_new];"
+
+		/* attempt to leak bpf_iter state */
+		"r7 = *(u64 *)(r6 + 0);"
+		"r8 = *(u64 *)(r6 + 8);"
+
+		/* destroy iterator */
+		"r1 = r6;"
+		"call %[bpf_iter_num_destroy];"
+
+		/* leak bpf_iter state */
+		"r0 = r7;"
+		"if r7 > r8 goto +1;"
+		"r0 = r8;"
+		"exit;"
+		:
+		: ITER_HELPERS
+		: __clobber_common, "r6", "r7", "r8"
+	);
+}
+
+int zero;
+
+SEC("?raw_tp")
+__failure
+__flag(BPF_F_TEST_STATE_FREQ)
+__msg("Unreleased reference")
+int stacksafe_should_not_conflate_stack_spill_and_iter(void *ctx)
+{
+	struct bpf_iter iter;
+
+	asm volatile (
+		/* Create a fork in logic, with general setup as follows:
+		 *   - fallthrough (first) path is valid;
+		 *   - branch (second) path is invalid.
+		 * Then depending on what we do in fallthrough vs branch path,
+		 * we try to detect bugs in func_states_equal(), regsafe(),
+		 * refsafe(), stack_safe(), and similar by tricking verifier
+		 * into believing that branch state is a valid subset of
+		 * a fallthrough state. Verifier should reject overall
+		 * validation, unless there is a bug somewhere in verifier
+		 * logic.
+		 */
+		"call %[bpf_get_prandom_u32];"
+		"r6 = r0;"
+		"call %[bpf_get_prandom_u32];"
+		"r7 = r0;"
+
+		"if r6 > r7 goto bad;" /* fork */
+
+		/* spill r6 into stack slot of bpf_iter var */
+		"*(u64 *)(%[iter] + 0) = r6;"
+		"*(u64 *)(%[iter] + 8) = r6;"
+
+		"goto skip_bad;"
+
+	"bad:"
+		/* create iterator in the same stack slot */
+		"r1 = %[iter];"
+		"r2 = 0;"
+		"r3 = 1000;"
+		"call %[bpf_iter_num_new];"
+
+		/* but then forget about it and overwrite it back to r6 spill */
+		"*(u64 *)(%[iter] + 0) = r6;"
+		"*(u64 *)(%[iter] + 8) = r6;"
+
+	"skip_bad:"
+		"goto +0;" /* force checkpoint */
+
+		/* corrupt stack slots, if they are really iter state */
+		"*(u64 *)(%[iter] + 8) = r6;"
+		:
+		: __imm_ptr(iter),
+		  __imm_addr(zero),
+		  __imm(bpf_get_prandom_u32),
+		  __imm(bpf_dynptr_from_mem),
+		  ITER_HELPERS
+		: __clobber_common, "r6", "r7"
+	);
+
+	return 0;
+}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH bpf-next 17/17] selftests/bpf: add number iterator tests
  2023-03-02 23:49 [PATCH bpf-next 00/17] BPF open-coded iterators Andrii Nakryiko
                   ` (15 preceding siblings ...)
  2023-03-02 23:50 ` [PATCH bpf-next 16/17] selftests/bpf: add iterators tests Andrii Nakryiko
@ 2023-03-02 23:50 ` Andrii Nakryiko
  2023-03-04 19:30 ` [PATCH bpf-next 00/17] BPF open-coded iterators patchwork-bot+netdevbpf
  17 siblings, 0 replies; 35+ messages in thread
From: Andrii Nakryiko @ 2023-03-02 23:50 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Tejun Heo

Add number iterator (bpf_iter_num_{new,next,destroy}()) specific tests,
validating its correct handling of various corner cases *at runtime*.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 .../testing/selftests/bpf/prog_tests/iters.c  |  47 ++++
 tools/testing/selftests/bpf/progs/iters_num.c | 242 ++++++++++++++++++
 2 files changed, 289 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/iters_num.c

diff --git a/tools/testing/selftests/bpf/prog_tests/iters.c b/tools/testing/selftests/bpf/prog_tests/iters.c
index 414fb8d82145..d467f49dcf9b 100644
--- a/tools/testing/selftests/bpf/prog_tests/iters.c
+++ b/tools/testing/selftests/bpf/prog_tests/iters.c
@@ -6,10 +6,57 @@
 #include "iters.skel.h"
 #include "iters_state_safety.skel.h"
 #include "iters_looping.skel.h"
+#include "iters_num.skel.h"
+
+static void subtest_num_iters(void)
+{
+	struct iters_num *skel;
+	int err;
+
+	skel = iters_num__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_open_and_load"))
+		return;
+
+	err = iters_num__attach(skel);
+	if (!ASSERT_OK(err, "skel_attach"))
+		goto cleanup;
+
+	usleep(1);
+	iters_num__detach(skel);
+
+#define VALIDATE_CASE(case_name)					\
+	ASSERT_EQ(skel->bss->res_##case_name,				\
+		  skel->rodata->exp_##case_name,			\
+		  #case_name)
+
+	VALIDATE_CASE(empty_zero);
+	VALIDATE_CASE(empty_int_min);
+	VALIDATE_CASE(empty_int_max);
+	VALIDATE_CASE(empty_minus_one);
+
+	VALIDATE_CASE(simple_sum);
+	VALIDATE_CASE(neg_sum);
+	VALIDATE_CASE(very_neg_sum);
+	VALIDATE_CASE(neg_pos_sum);
+
+	VALIDATE_CASE(invalid_range);
+	VALIDATE_CASE(max_range);
+	VALIDATE_CASE(e2big_range);
+
+	VALIDATE_CASE(succ_elem_cnt);
+	VALIDATE_CASE(overfetched_elem_cnt);
+	VALIDATE_CASE(fail_elem_cnt);
+
+cleanup:
+	iters_num__destroy(skel);
+}
 
 void test_iters(void)
 {
 	RUN_TESTS(iters_state_safety);
 	RUN_TESTS(iters_looping);
 	RUN_TESTS(iters);
+
+	if (test__start_subtest("num"))
+		subtest_num_iters();
 }
diff --git a/tools/testing/selftests/bpf/progs/iters_num.c b/tools/testing/selftests/bpf/progs/iters_num.c
new file mode 100644
index 000000000000..09df877d31bc
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/iters_num.c
@@ -0,0 +1,242 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */
+
+#include <limits.h>
+#include <linux/errno.h>
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include "bpf_misc.h"
+
+const volatile __s64 exp_empty_zero = 0 + 1;
+__s64 res_empty_zero;
+
+SEC("raw_tp/sys_enter")
+int empty_zero(const void *ctx)
+{
+	__s64 sum = 0, i;
+
+	bpf_for(i, 0, 0) sum += i;
+	res_empty_zero = 1 + sum;
+
+	return 0;
+}
+
+const volatile __s64 exp_empty_int_min = 0 + 2;
+__s64 res_empty_int_min;
+
+SEC("raw_tp/sys_enter")
+int empty_int_min(const void *ctx)
+{
+	__s64 sum = 0, i;
+
+	bpf_for(i, INT_MIN, INT_MIN) sum += i;
+	res_empty_int_min = 2 + sum;
+
+	return 0;
+}
+
+const volatile __s64 exp_empty_int_max = 0 + 3;
+__s64 res_empty_int_max;
+
+SEC("raw_tp/sys_enter")
+int empty_int_max(const void *ctx)
+{
+	__s64 sum = 0, i;
+
+	bpf_for(i, INT_MAX, INT_MAX) sum += i;
+	res_empty_int_max = 3 + sum;
+
+	return 0;
+}
+
+const volatile __s64 exp_empty_minus_one = 0 + 4;
+__s64 res_empty_minus_one;
+
+SEC("raw_tp/sys_enter")
+int empty_minus_one(const void *ctx)
+{
+	__s64 sum = 0, i;
+
+	bpf_for(i, -1, -1) sum += i;
+	res_empty_minus_one = 4 + sum;
+
+	return 0;
+}
+
+const volatile __s64 exp_simple_sum = 9 * 10 / 2;
+__s64 res_simple_sum;
+
+SEC("raw_tp/sys_enter")
+int simple_sum(const void *ctx)
+{
+	__s64 sum = 0, i;
+
+	bpf_for(i, 0, 10) sum += i;
+	res_simple_sum = sum;
+
+	return 0;
+}
+
+const volatile __s64 exp_neg_sum = -11 * 10 / 2;
+__s64 res_neg_sum;
+
+SEC("raw_tp/sys_enter")
+int neg_sum(const void *ctx)
+{
+	__s64 sum = 0, i;
+
+	bpf_for(i, -10, 0) sum += i;
+	res_neg_sum = sum;
+
+	return 0;
+}
+
+const volatile __s64 exp_very_neg_sum = INT_MIN + (__s64)(INT_MIN + 1);
+__s64 res_very_neg_sum;
+
+SEC("raw_tp/sys_enter")
+int very_neg_sum(const void *ctx)
+{
+	__s64 sum = 0, i;
+
+	bpf_for(i, INT_MIN, INT_MIN + 2) sum += i;
+	res_very_neg_sum = sum;
+
+	return 0;
+}
+
+const volatile __s64 exp_very_big_sum = (__s64)(INT_MAX - 1) + (__s64)(INT_MAX - 2);
+__s64 res_very_big_sum;
+
+SEC("raw_tp/sys_enter")
+int very_big_sum(const void *ctx)
+{
+	__s64 sum = 0, i;
+
+	bpf_for(i, INT_MAX - 2, INT_MAX) sum += i;
+	res_very_big_sum = sum;
+
+	return 0;
+}
+
+const volatile __s64 exp_neg_pos_sum = -3;
+__s64 res_neg_pos_sum;
+
+SEC("raw_tp/sys_enter")
+int neg_pos_sum(const void *ctx)
+{
+	__s64 sum = 0, i;
+
+	bpf_for(i, -3, 3) sum += i;
+	res_neg_pos_sum = sum;
+
+	return 0;
+}
+
+const volatile __s64 exp_invalid_range = -EINVAL;
+__s64 res_invalid_range;
+
+SEC("raw_tp/sys_enter")
+int invalid_range(const void *ctx)
+{
+	struct bpf_iter it;
+
+	res_invalid_range = bpf_iter_num_new(&it, 1, 0);
+	bpf_iter_num_destroy(&it);
+
+	return 0;
+}
+
+const volatile __s64 exp_max_range = 0 + 10;
+__s64 res_max_range;
+
+SEC("raw_tp/sys_enter")
+int max_range(const void *ctx)
+{
+	struct bpf_iter it;
+
+	res_max_range = 10 + bpf_iter_num_new(&it, 0, BPF_MAX_LOOPS);
+	bpf_iter_num_destroy(&it);
+
+	return 0;
+}
+
+const volatile __s64 exp_e2big_range = -E2BIG;
+__s64 res_e2big_range;
+
+SEC("raw_tp/sys_enter")
+int e2big_range(const void *ctx)
+{
+	struct bpf_iter it;
+
+	res_e2big_range = bpf_iter_num_new(&it, -1, BPF_MAX_LOOPS);
+	bpf_iter_num_destroy(&it);
+
+	return 0;
+}
+
+const volatile __s64 exp_succ_elem_cnt = 10;
+__s64 res_succ_elem_cnt;
+
+SEC("raw_tp/sys_enter")
+int succ_elem_cnt(const void *ctx)
+{
+	struct bpf_iter it;
+	int cnt = 0, *v;
+
+	bpf_iter_num_new(&it, 0, 10);
+	while ((v = bpf_iter_num_next(&it))) {
+		cnt++;
+	}
+	bpf_iter_num_destroy(&it);
+
+	res_succ_elem_cnt = cnt;
+
+	return 0;
+}
+
+const volatile __s64 exp_overfetched_elem_cnt = 5;
+__s64 res_overfetched_elem_cnt;
+
+SEC("raw_tp/sys_enter")
+int overfetched_elem_cnt(const void *ctx)
+{
+	struct bpf_iter it;
+	int cnt = 0, *v, i;
+
+	bpf_iter_num_new(&it, 0, 5);
+	for (i = 0; i < 10; i++) {
+		v = bpf_iter_num_next(&it);
+		if (v)
+			cnt++;
+	}
+	bpf_iter_num_destroy(&it);
+
+	res_overfetched_elem_cnt = cnt;
+
+	return 0;
+}
+
+const volatile __s64 exp_fail_elem_cnt = 20 + 0;
+__s64 res_fail_elem_cnt;
+
+SEC("raw_tp/sys_enter")
+int fail_elem_cnt(const void *ctx)
+{
+	struct bpf_iter it;
+	int cnt = 0, *v, i;
+
+	bpf_iter_num_new(&it, 100, 10);
+	for (i = 0; i < 10; i++) {
+		v = bpf_iter_num_next(&it);
+		if (v)
+			cnt++;
+	}
+	bpf_iter_num_destroy(&it);
+
+	res_fail_elem_cnt = 20 + cnt;
+
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next 00/17] BPF open-coded iterators
  2023-03-02 23:49 [PATCH bpf-next 00/17] BPF open-coded iterators Andrii Nakryiko
                   ` (16 preceding siblings ...)
  2023-03-02 23:50 ` [PATCH bpf-next 17/17] selftests/bpf: add number iterator tests Andrii Nakryiko
@ 2023-03-04 19:30 ` patchwork-bot+netdevbpf
  17 siblings, 0 replies; 35+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-03-04 19:30 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, ast, daniel, kernel-team, tj

Hello:

This series was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <ast@kernel.org>:

On Thu, 2 Mar 2023 15:49:58 -0800 you wrote:
> Add support for open-coded (aka inline) iterators in BPF world. This is a next
> evolution of gradually allowing more powerful and less restrictive looping and
> iteration capabilities to BPF programs.
> 
> We set up a framework for implementing all kinds of iterators (e.g., cgroup,
> task, file, etc, iterators), but this patch set only implements numbers
> iterator, which is used to implement ergonomic bpf_for() for-like construct
> (see patch #15). We also add bpf_for_each(), which is a generic foreach-like
> construct that will work with any kind of open-coded iterator implementation,
> as long as we stick with bpf_iter_<type>_{new,next,destroy}() naming pattern.
> 
> [...]

Here is the summary with links:
  - [bpf-next,01/17] bpf: improve stack slot state printing
    https://git.kernel.org/bpf/bpf-next/c/d54e0f6c1adf
  - [bpf-next,02/17] bpf: improve regsafe() checks for PTR_TO_{MEM,BUF,TP_BUFFER}
    https://git.kernel.org/bpf/bpf-next/c/567da5d253cd
  - [bpf-next,03/17] selftests/bpf: enhance align selftest's expected log matching
    https://git.kernel.org/bpf/bpf-next/c/6f876e75d316
  - [bpf-next,04/17] bpf: honor env->test_state_freq flag in is_state_visited()
    https://git.kernel.org/bpf/bpf-next/c/98ddcf389d1b
  - [bpf-next,05/17] selftests/bpf: adjust log_fixup's buffer size for proper truncation
    https://git.kernel.org/bpf/bpf-next/c/fffc893b6bf2
  - [bpf-next,06/17] bpf: clean up visit_insn()'s instruction processing
    https://git.kernel.org/bpf/bpf-next/c/653ae3a874ac
  - [bpf-next,07/17] bpf: fix visit_insn()'s detection of BPF_FUNC_timer_set_callback helper
    https://git.kernel.org/bpf/bpf-next/c/c1ee85a9806a
  - [bpf-next,08/17] bpf: ensure that r0 is marked scratched after any function call
    https://git.kernel.org/bpf/bpf-next/c/553a64a85c5d
  - [bpf-next,09/17] bpf: move kfunc_call_arg_meta higher in the file
    https://git.kernel.org/bpf/bpf-next/c/d0e1ac227945
  - [bpf-next,10/17] bpf: mark PTR_TO_MEM as non-null register type
    https://git.kernel.org/bpf/bpf-next/c/d5271c5b1950
  - [bpf-next,11/17] bpf: generalize dynptr_get_spi to be usable for iters
    https://git.kernel.org/bpf/bpf-next/c/a461f5adf177
  - [bpf-next,12/17] bpf: add support for fixed-size memory pointer returns for kfuncs
    https://git.kernel.org/bpf/bpf-next/c/f4b4eee6169b
  - [bpf-next,13/17] bpf: add support for open-coded iterator loops
    (no matching commit)
  - [bpf-next,14/17] bpf: implement number iterator
    (no matching commit)
  - [bpf-next,15/17] selftests/bpf: add bpf_for_each(), bpf_for(), and bpf_repeat() macros
    (no matching commit)
  - [bpf-next,16/17] selftests/bpf: add iterators tests
    (no matching commit)
  - [bpf-next,17/17] selftests/bpf: add number iterator tests
    (no matching commit)

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next 13/17] bpf: add support for open-coded iterator loops
  2023-03-02 23:50 ` [PATCH bpf-next 13/17] bpf: add support for open-coded iterator loops Andrii Nakryiko
@ 2023-03-04 20:02   ` Alexei Starovoitov
  2023-03-04 23:27     ` Andrii Nakryiko
  0 siblings, 1 reply; 35+ messages in thread
From: Alexei Starovoitov @ 2023-03-04 20:02 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, ast, daniel, kernel-team, Tejun Heo

On Thu, Mar 02, 2023 at 03:50:11PM -0800, Andrii Nakryiko wrote:
> Teach verifier about the concept of open-coded (or inline) iterators.
> 
> This patch adds generic iterator loop verification logic, new STACK_ITER
> stack slot type to contain iterator state, and necessary kfunc plumbing
> for iterator's constructor, destructor and next "methods". Next patch
> implements first specific version of iterator (number iterator for
> implementing for loop). Such split allows to have more focused commits
> for verifier logic and separate commit that we could point later to what
> it takes to add new kind of iterator.
> 
> First, we add new fixed-size opaque struct bpf_iter (24-byte long) to
> contain runtime state of any possible iterator. struct bpf_iter state is

Looking at the verifier changes it seems that it should be possible to support
any sized iterator and we don't have to fit all of them in 24-bytes.
The same bpf_iter_<kind>_{new,next,destroy}() naming convention can apply
to types and we can have 8-byte struct bpf_iter_num.
The bpf_for() macros would work with bpf_iter_<kind> too.
iirc that was your plan earlier (to have different structs).
What prompted you to change that plan?

> Any other iterator implementation will have to implement at least these
> three methods. It is enforced that for any given type of iterator only
> applicable constructor/destructor/next are callable. I.e., verifier
> ensures you can't pass number iterator into, say, cgroup iterator's next
> method.

is_iter_type_compatible() does that, right?

> +
> +static int mark_stack_slots_iter(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
> +				 enum bpf_arg_type arg_type, int insn_idx)
> +{
> +	struct bpf_func_state *state = func(env, reg);
> +	enum bpf_iter_type type;
> +	int spi, i, j, id;
> +
> +	spi = iter_get_spi(env, reg);
> +	if (spi < 0)
> +		return spi;
> +
> +	type = arg_to_iter_type(arg_type);
> +	if (type == BPF_ITER_TYPE_INVALID)
> +		return -EINVAL;

Do we need destroy_if_dynptr_stack_slot() equivalent here?

> +	id = acquire_reference_state(env, insn_idx);
> +	if (id < 0)
> +		return id;
> +
> +	for (i = 0; i < BPF_ITER_NR_SLOTS; i++) {
> +		struct bpf_stack_state *slot = &state->stack[spi - i];
> +		struct bpf_reg_state *st = &slot->spilled_ptr;
> +
> +		__mark_reg_known_zero(st);
> +		st->type = PTR_TO_STACK; /* we don't have dedicated reg type */
> +		st->live |= REG_LIVE_WRITTEN;
> +		st->ref_obj_id = i == 0 ? id : 0;
> +		st->iter.type = i == 0 ? type : BPF_ITER_TYPE_INVALID;
> +		st->iter.state = BPF_ITER_STATE_ACTIVE;
> +		st->iter.depth = 0;
> +
> +		for (j = 0; j < BPF_REG_SIZE; j++)
> +			slot->slot_type[j] = STACK_ITER;
> +
> +		mark_stack_slot_scratched(env, spi - i);

dynptr needs similar mark_stack_slot_scratched() fix, right?

> +	}
> +
> +	return 0;
> +}

...

> @@ -3691,8 +3928,8 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
>  
>  		/* regular write of data into stack destroys any spilled ptr */
>  		state->stack[spi].spilled_ptr.type = NOT_INIT;
> -		/* Mark slots as STACK_MISC if they belonged to spilled ptr. */
> -		if (is_spilled_reg(&state->stack[spi]))
> +		/* Mark slots as STACK_MISC if they belonged to spilled ptr/dynptr/iter. */
> +		if (is_stack_slot_special(&state->stack[spi]))
>  			for (i = 0; i < BPF_REG_SIZE; i++)
>  				scrub_spilled_slot(&state->stack[spi].slot_type[i]);

It fixes something for dynptr, right?

> +static int process_iter_next_call(struct bpf_verifier_env *env, int insn_idx,
> +				  struct bpf_kfunc_call_arg_meta *meta)
> +{
> +	struct bpf_verifier_state *cur_st = env->cur_state, *queued_st;
> +	struct bpf_func_state *cur_fr = cur_st->frame[cur_st->curframe], *queued_fr;
> +	struct bpf_reg_state *cur_iter, *queued_iter;
> +	int iter_frameno = meta->iter.frameno;
> +	int iter_spi = meta->iter.spi;
> +
> +	BTF_TYPE_EMIT(struct bpf_iter);
> +
> +	cur_iter = &env->cur_state->frame[iter_frameno]->stack[iter_spi].spilled_ptr;
> +
> +	if (cur_iter->iter.state != BPF_ITER_STATE_ACTIVE &&
> +	    cur_iter->iter.state != BPF_ITER_STATE_DRAINED) {
> +		verbose(env, "verifier internal error: unexpected iterator state %d (%s)\n",
> +			cur_iter->iter.state, iter_state_str(cur_iter->iter.state));
> +		return -EFAULT;
> +	}
> +
> +	if (cur_iter->iter.state == BPF_ITER_STATE_ACTIVE) {
> +		/* branch out active iter state */
> +		queued_st = push_stack(env, insn_idx + 1, insn_idx, false);
> +		if (!queued_st)
> +			return -ENOMEM;
> +
> +		queued_iter = &queued_st->frame[iter_frameno]->stack[iter_spi].spilled_ptr;
> +		queued_iter->iter.state = BPF_ITER_STATE_ACTIVE;
> +		queued_iter->iter.depth++;
> +
> +		queued_fr = queued_st->frame[queued_st->curframe];
> +		mark_ptr_not_null_reg(&queued_fr->regs[BPF_REG_0]);
> +	}
> +
> +	/* switch to DRAINED state, but keep the depth unchanged */
> +	/* mark current iter state as drained and assume returned NULL */
> +	cur_iter->iter.state = BPF_ITER_STATE_DRAINED;
> +	__mark_reg_known_zero(&cur_fr->regs[BPF_REG_0]);
> +	cur_fr->regs[BPF_REG_0].type = SCALAR_VALUE;

__mark_reg_const_zero() instead?

> +
> +	return 0;
> +}
...
> +static bool is_iter_next_insn(struct bpf_verifier_env *env, int insn_idx, int *reg_idx)
> +{
> +	struct bpf_insn *insn = &env->prog->insnsi[insn_idx];
> +	const struct btf_param *args;
> +	const struct btf_type *t;
> +	const struct btf *btf;
> +	int nargs, i;
> +
> +	if (!bpf_pseudo_kfunc_call(insn))
> +		return false;
> +	if (!is_iter_next_kfunc(insn->imm))
> +		return false;
> +
> +	btf = find_kfunc_desc_btf(env, insn->off);
> +	if (IS_ERR(btf))
> +		return false;
> +
> +	t = btf_type_by_id(btf, insn->imm);	/* FUNC */
> +	t = btf_type_by_id(btf, t->type);	/* FUNC_PROTO */
> +
> +	args = btf_params(t);
> +	nargs = btf_vlen(t);
> +	for (i = 0; i < nargs; i++) {
> +		if (is_kfunc_arg_iter(btf, &args[i])) {
> +			*reg_idx = BPF_REG_1 + i;
> +			return true;
> +		}
> +	}

This is some future-proofing ?
The commit log says that all iterators have to be in the form:
bpf_iter_<kind>_next(struct bpf_iter* it)
Should we check for one and only arg here instead?

> +
> +	return false;
> +}
> +
> +/* is_state_visited() handles iter_next() (see process_iter_next_call() for
> + * terminology) calls specially: as opposed to bounded BPF loops, it *expects*
> + * state matching, which otherwise looks like an infinite loop. So while
> + * iter_next() calls are taken care of, we still need to be careful and
> + * prevent erroneous and too eager declaration of "infinite loop", when
> + * iterators are involved.
> + *
> + * Here's a situation in pseudo-BPF assembly form:
> + *
> + *   0: again:                          ; set up iter_next() call args
> + *   1:   r1 = &it                      ; <CHECKPOINT HERE>
> + *   2:   call bpf_iter_num_next        ; this is iter_next() call
> + *   3:   if r0 == 0 goto done
> + *   4:   ... something useful here ...
> + *   5:   goto again                    ; another iteration
> + *   6: done:
> + *   7:   r1 = &it
> + *   8:   call bpf_iter_num_destroy     ; clean up iter state
> + *   9:   exit
> + *
> + * This is a typical loop. Let's assume that we have a prune point at 1:,
> + * before we get to `call bpf_iter_num_next` (e.g., because of that `goto
> + * again`, assuming other heuristics don't get in a way).
> + *
> + * When we first time come to 1:, let's say we have some state X. We proceed
> + * to 2:, fork states, enqueue ACTIVE, validate NULL case successfully, exit.
> + * Now we come back to validate that forked ACTIVE state. We proceed through
> + * 3-5, come to goto, jump to 1:. Let's assume our state didn't change, so we
> + * are converging. But the problem is that we don't know that yet, as this
> + * convergence has to happen at iter_next() call site only. So if nothing is
> + * done, at 1: verifier will use bounded loop logic and declare infinite
> + * looping (and would be *technically* correct, if not for iterator "eventual
> + * sticky NULL" contract, see process_iter_next_call()). But we don't want
> + * that. So what we do in process_iter_next_call() when we go on another
> + * ACTIVE iteration, we bump slot->iter.depth, to mark that it's a different
> + * iteration. So when we detect soon-to-be-declared infinite loop, we also
> + * check if any of *ACTIVE* iterator state's depth differs. If yes, we pretend
> + * we are not looping and wait for next iter_next() call.

'depth' part of bpf_reg_state will be checked for equality in regsafe(), right?
Everytime we branch out in process_iter_next_call() there is depth++
So how come we're able to converge with:
 +                     if (is_iter_next_insn(env, insn_idx, &iter_arg_reg_idx)) {
 +                             if (states_equal(env, &sl->state, cur)) {
That's because states_equal() is done right before doing process_iter_next_call(), right?

So depth counts the number of times bpf_iter*_next() was processed.
In other words it's a number of ways the body of the loop can be walked?

> +			if (is_iter_next_insn(env, insn_idx, &iter_arg_reg_idx)) {
> +				if (states_equal(env, &sl->state, cur)) {
> +					struct bpf_func_state *cur_frame;
> +					struct bpf_reg_state *iter_state, *iter_reg;
> +					int spi;
> +
> +					/* current state is valid due to states_equal(),
> +					 * so we can assume valid iter state, no need for extra
> +					 * (re-)validations
> +					 */
> +					cur_frame = cur->frame[cur->curframe];
> +					iter_reg = &cur_frame->regs[iter_arg_reg_idx];
> +					spi = iter_get_spi(env, iter_reg);
> +					if (spi < 0)
> +						return spi;
> +					iter_state = &func(env, iter_reg)->stack[spi].spilled_ptr;
> +					if (iter_state->iter.state == BPF_ITER_STATE_ACTIVE)
> +						goto hit;
> +				}
> +				goto skip_inf_loop_check;

This goto is "optional", right?
Meaning that if we remove it the states_maybe_looping() + states_equal() won't match anyway.
The goto avoids wasting cycles.

> +			}
> +			/* attempt to detect infinite loop to avoid unnecessary doomed work */
> +			if (states_maybe_looping(&sl->state, cur) &&

Maybe cleaner is to remove above 'goto' and do '} else if (states_maybe_looping' here ?

> +			    states_equal(env, &sl->state, cur) &&
> +			    !iter_active_depths_differ(&sl->state, cur)) {
>  				verbose_linfo(env, insn_idx, "; ");
>  				verbose(env, "infinite loop detected at insn %d\n", insn_idx);
>  				return -EINVAL;

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next 14/17] bpf: implement number iterator
  2023-03-02 23:50 ` [PATCH bpf-next 14/17] bpf: implement number iterator Andrii Nakryiko
@ 2023-03-04 20:21   ` Alexei Starovoitov
  2023-03-04 23:27     ` Andrii Nakryiko
  0 siblings, 1 reply; 35+ messages in thread
From: Alexei Starovoitov @ 2023-03-04 20:21 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, ast, daniel, kernel-team, Tejun Heo

On Thu, Mar 02, 2023 at 03:50:12PM -0800, Andrii Nakryiko wrote:
>  
>  static enum kfunc_ptr_arg_type
> @@ -10278,7 +10288,17 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
>  			if (is_kfunc_arg_uninit(btf, &args[i]))
>  				iter_arg_type |= MEM_UNINIT;
>  
> -			ret = process_iter_arg(env, regno, insn_idx, iter_arg_type,  meta);
> +			if (meta->func_id == special_kfunc_list[KF_bpf_iter_num_new] ||
> +			    meta->func_id == special_kfunc_list[KF_bpf_iter_num_next]) {
> +				iter_arg_type |= ITER_TYPE_NUM;
> +			} else if (meta->func_id == special_kfunc_list[KF_bpf_iter_num_destroy]) {
> +				iter_arg_type |= ITER_TYPE_NUM | OBJ_RELEASE;

Since OBJ_RELEASE is set pretty late here and kfuncs are not marked with KF_RELEASE,
the arg_type_is_release() in check_func_arg_reg_off() won't trigger.
So I'm confused why there is:
+               if (arg_type_is_iter(arg_type))
+                       return 0;
in the previous patch.
Will it ever trigger?

Separate question: What happens when the user does:
bpf_iter_destroy(&it);
bpf_iter_destroy(&it);

+               if (!is_iter_reg_valid_init(env, reg)) {
+                       verbose(env, "expected an initialized iter as arg #%d\n", regno);
will trigger, right?
I didn't find such selftest.
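
For reference, a minimal C-level sketch of what such a test could look like
(function name is made up; it assumes the __failure/__msg annotations and the
bpf_iter_num_*() externs from bpf_misc.h added in patch 15, and that the
second destroy sees an already-invalidated iter):

  SEC("?raw_tp")
  __failure __msg("expected an initialized iter as arg #1")
  int double_destroy_from_c(const void *ctx)
  {
  	struct bpf_iter it;

  	bpf_iter_num_new(&it, 0, 10);
  	bpf_iter_num_destroy(&it);
  	/* the first destroy invalidates iter state, so the verifier
  	 * should reject this second destroy
  	 */
  	bpf_iter_num_destroy(&it);

  	return 0;
  }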

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next 15/17] selftests/bpf: add bpf_for_each(), bpf_for(), and bpf_repeat() macros
  2023-03-02 23:50 ` [PATCH bpf-next 15/17] selftests/bpf: add bpf_for_each(), bpf_for(), and bpf_repeat() macros Andrii Nakryiko
@ 2023-03-04 20:34   ` Alexei Starovoitov
  2023-03-04 23:28     ` Andrii Nakryiko
  0 siblings, 1 reply; 35+ messages in thread
From: Alexei Starovoitov @ 2023-03-04 20:34 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, ast, daniel, kernel-team, Tejun Heo

On Thu, Mar 02, 2023 at 03:50:13PM -0800, Andrii Nakryiko wrote:
> Add bpf_for_each(), bpf_for() and bpf_repeat() macros that make writing
> open-coded iterator-based loops much more convenient and natural. These
> macros utilize the cleanup attribute to ensure proper destruction of the
> iterator and, thanks to that, manage to provide ergonomics very close to
> C language's for construct. A typical integer loop would look like:
> 
>   int i;
>   int arr[N];
> 
>   bpf_for(i, 0, N) {
>   /* verifier will know that i >= 0 && i < N, so could be used to
>    * directly access array elements with no extra checks
>    */
>    arr[i] = i;
>   }
> 
> bpf_repeat() is very similar, but it doesn't expose iteration number and
> is meant as a simple "repeat action N times":
> 
>   bpf_repeat(N) { /* whatever */ }
> 
> Note that break and continue inside the {} block work as expected.
> 
> bpf_for_each() is a generalization over any kind of BPF open-coded
> iterator allowing to use for-each-like approach instead of calling
> low-level bpf_iter_<type>_{new,next,destroy}() APIs explicitly. E.g.:
> 
>   struct cgroup *cg;
> 
>   bpf_for_each(cgroup, cg, some, input, args) {
>       /* do something with each cg */
>   }
> 
> Would call (right now hypothetical) bpf_iter_cgroup_{new,next,destroy}()
> functions to form a loop over cgroups, where `some, input, args` are
> passed verbatim into constructor as
> bpf_iter_cgroup_new(&it, some, input, args).
> 
> As a demonstration, add pyperf variant based on bpf_for() loop.
> 
> Also clean up few tests that either included bpf_misc.h header
> unnecessarily from user-space or included it before any common types are
> defined.
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>  .../bpf/prog_tests/bpf_verif_scale.c          |  6 ++
>  .../bpf/prog_tests/uprobe_autoattach.c        |  1 -
>  tools/testing/selftests/bpf/progs/bpf_misc.h  | 76 +++++++++++++++++++
>  tools/testing/selftests/bpf/progs/lsm.c       |  4 +-
>  tools/testing/selftests/bpf/progs/pyperf.h    | 14 +++-
>  .../selftests/bpf/progs/pyperf600_iter.c      |  7 ++
>  .../selftests/bpf/progs/pyperf600_nounroll.c  |  3 -
>  7 files changed, 101 insertions(+), 10 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/progs/pyperf600_iter.c
> 
> diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c b/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c
> index 5ca252823294..731c343897d8 100644
> --- a/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c
> +++ b/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c
> @@ -144,6 +144,12 @@ void test_verif_scale_pyperf600_nounroll()
>  	scale_test("pyperf600_nounroll.bpf.o", BPF_PROG_TYPE_RAW_TRACEPOINT, false);
>  }
>  
> +void test_verif_scale_pyperf600_iter()
> +{
> +	/* open-coded BPF iterator version */
> +	scale_test("pyperf600_iter.bpf.o", BPF_PROG_TYPE_RAW_TRACEPOINT, false);
> +}
> +
>  void test_verif_scale_loop1()
>  {
>  	scale_test("loop1.bpf.o", BPF_PROG_TYPE_RAW_TRACEPOINT, false);
> diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c b/tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c
> index 6558c857e620..d5b3377aa33c 100644
> --- a/tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c
> +++ b/tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c
> @@ -3,7 +3,6 @@
>  
>  #include <test_progs.h>
>  #include "test_uprobe_autoattach.skel.h"
> -#include "progs/bpf_misc.h"
>  
>  /* uprobe attach point */
>  static noinline int autoattach_trigger_func(int arg1, int arg2, int arg3,
> diff --git a/tools/testing/selftests/bpf/progs/bpf_misc.h b/tools/testing/selftests/bpf/progs/bpf_misc.h
> index f704885aa534..08a791f307a6 100644
> --- a/tools/testing/selftests/bpf/progs/bpf_misc.h
> +++ b/tools/testing/selftests/bpf/progs/bpf_misc.h
> @@ -75,5 +75,81 @@
>  #define FUNC_REG_ARG_CNT 5
>  #endif
>  
> +struct bpf_iter;
> +
> +extern int bpf_iter_num_new(struct bpf_iter *it__uninit, int start, int end) __ksym;
> +extern int *bpf_iter_num_next(struct bpf_iter *it) __ksym;
> +extern void bpf_iter_num_destroy(struct bpf_iter *it) __ksym;
> +
> +#ifndef bpf_for_each
> +/* bpf_for_each(iter_kind, elem, args...) provides generic construct for using BPF
> + * open-coded iterators without having to write mundane explicit low-level
> + * loop. Instead, it provides for()-like generic construct that can be used
> + * pretty naturally. E.g., for some hypothetical cgroup iterator, you'd write:
> + *
> + * struct cgroup *cg, *parent_cg = <...>;
> + *
> + * bpf_for_each(cgroup, cg, parent_cg, CG_ITER_CHILDREN) {
> + *     bpf_printk("Child cgroup id = %d", cg->cgroup_id);
> + *     if (cg->cgroup_id == 123)
> + *         break;
> + * }
> + *
> + * I.e., it looks almost like high-level for each loop in other languages,
> + * supports continue/break, and is verifiable by BPF verifier.
> + *
> + * For iterating integers, the difference between bpf_for_each(num, i, N, M)
> + * and bpf_for(i, N, M) is in that bpf_for() provides additional proof to
> + * verifier that i is in [N, M) range, and in bpf_for_each() case i is `int
> + * *`, not just `int`. So for integers bpf_for() is more convenient.
> + */
> +#define bpf_for_each(type, cur, args...) for (						  \
> +	/* initialize and define destructor */						  \
> +	struct bpf_iter ___it __attribute__((cleanup(bpf_iter_##type##_destroy))),	  \

We should probably say somewhere that it requires C99 with some flag that allows
declaring variables inside the loop.

Also what are the rules for attr(cleanup())?
When does it get called?
My understanding is that the visibility of ___it is only within the for() body.
So when the prog does:
bpf_for(i, 0, 10) sum += i;
bpf_for(i, 0, 10) sum += i;

the compiler should generate bpf_iter_num_destroy right after each bpf_for() ?
Or will it group them at the end of function body and destroy all iterators ?
Will compiler reuse the stack space used by ___it in case there are multiple bpf_for-s ?

> +	/* ___p pointer is just to call bpf_iter_##type##_new() *once* to init ___it */	  \
> +			*___p = (bpf_iter_##type##_new(&___it, ##args),		  \
> +	/* this is a workaround for Clang bug: it currently doesn't emit BTF */		  \
> +	/* for bpf_iter_##type##_destroy when used from cleanup() attribute */		  \
> +				(void)bpf_iter_##type##_destroy, (void *)0);		  \
> +	/* iteration and termination check */						  \
> +	((cur = bpf_iter_##type##_next(&___it)));					  \
> +	/* nothing here  */								  \
> +)
> +#endif /* bpf_for_each */
> +
> +#ifndef bpf_for
> +/* bpf_for(i, start, end) proves to verifier that i is in [start, end) */
> +#define bpf_for(i, start, end) for (							  \
> +	/* initialize and define destructor */						  \
> +	struct bpf_iter ___it __attribute__((cleanup(bpf_iter_num_destroy))),		  \
> +	/* ___p pointer is necessary to call bpf_iter_num_new() *once* to init ___it */	  \
> +			*___p = (bpf_iter_num_new(&___it, (start), (end)),		  \
> +	/* this is a workaround for Clang bug: it currently doesn't emit BTF */		  \
> +	/* for bpf_iter_num_destroy when used from cleanup() attribute */		  \
> +				(void)bpf_iter_num_destroy, (void *)0);			  \
> +	({										  \
> +		/* iteration step */							  \
> +		int *___t = bpf_iter_num_next(&___it);					  \
> +		/* termination and bounds check */					  \
> +		(___t && ((i) = *___t, i >= (start) && i < (end)));			  \

The i >= (start) && i < (end) is necessary to make sure that the verifier
tightens the range of 'i' inside the body of the loop and
when the program does arr[i] access the verifier will know that 'i' is within bounds, right?

In such case should we add __builtin_constant_p() check for 'start' and 'end' ?
int arr[100];
if (var < 100)
  bpf_for(i, 0, global_var) sum += arr[i];
will fail the verifier and the users might complain of dumb verifier.

Also if start and end are variables they can potentially change between bpf_iter_num_new()
and each iteration of the loop.
__builtin_constant_p() might be too restrictive.
May be read start/end once, at least?

> +	});										  \
> +	/* nothing here  */								  \
> +)
> +#endif /* bpf_for */
> +

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next 16/17] selftests/bpf: add iterators tests
  2023-03-02 23:50 ` [PATCH bpf-next 16/17] selftests/bpf: add iterators tests Andrii Nakryiko
@ 2023-03-04 20:39   ` Alexei Starovoitov
  2023-03-04 23:29     ` Andrii Nakryiko
  2023-03-04 21:09   ` Jiri Olsa
  1 sibling, 1 reply; 35+ messages in thread
From: Alexei Starovoitov @ 2023-03-04 20:39 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, ast, daniel, kernel-team, Tejun Heo

On Thu, Mar 02, 2023 at 03:50:14PM -0800, Andrii Nakryiko wrote:
> +
> +#ifdef REAL_TEST

Looks like REAL_TEST is never set.

and all bpf_printk-s in tests are never executed, because the tests are 'load-only'
to check the verifier?

It looks like all of them can be run (once printks are removed and converted to if-s).
That would nicely complement patch 17 runners.

It can be a follow up, of course.

Great stuff overall!

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next 16/17] selftests/bpf: add iterators tests
  2023-03-02 23:50 ` [PATCH bpf-next 16/17] selftests/bpf: add iterators tests Andrii Nakryiko
  2023-03-04 20:39   ` Alexei Starovoitov
@ 2023-03-04 21:09   ` Jiri Olsa
  2023-03-04 23:29     ` Andrii Nakryiko
  1 sibling, 1 reply; 35+ messages in thread
From: Jiri Olsa @ 2023-03-04 21:09 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, ast, daniel, kernel-team, Tejun Heo

On Thu, Mar 02, 2023 at 03:50:14PM -0800, Andrii Nakryiko wrote:

SNIP

> +
> +SEC("raw_tp")
> +__success
> +int iter_pass_iter_ptr_to_subprog(const void *ctx)
> +{
> +	int arr1[16], arr2[32];
> +	struct bpf_iter it;
> +	int n, sum1, sum2;
> +
> +	MY_PID_GUARD();
> +
> +	/* fill arr1 */
> +	n = ARRAY_SIZE(arr1);
> +	bpf_iter_num_new(&it, 0, n);
> +	fill(&it, arr1, n, 2);
> +	bpf_iter_num_destroy(&it);
> +
> +	/* fill arr2 */
> +	n = ARRAY_SIZE(arr2);
> +	bpf_iter_num_new(&it, 0, n);
> +	fill(&it, arr2, n, 10);
> +	bpf_iter_num_destroy(&it);
> +
> +	/* sum arr1 */
> +	n = ARRAY_SIZE(arr1);
> +	bpf_iter_num_new(&it, 0, n);
> +	sum1 = sum(&it, arr1, n);
> +	bpf_iter_num_destroy(&it);
> +
> +	/* sum arr2 */
> +	n = ARRAY_SIZE(arr2);
> +	bpf_iter_num_new(&it, 0, n);
> +	sum2 = sum(&it, arr2, n);
> +	bpf_iter_num_destroy(&it);
> +
> +	bpf_printk("sum1=%d, sum2=%d", sum1, sum2);

got to remove this to compile it, debug leftover?

jirka

> +
> +	return 0;
> +}
> +
> +char _license[] SEC("license") = "GPL";

SNIP

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next 13/17] bpf: add support for open-coded iterator loops
  2023-03-04 20:02   ` Alexei Starovoitov
@ 2023-03-04 23:27     ` Andrii Nakryiko
  2023-03-05 23:46       ` Alexei Starovoitov
  0 siblings, 1 reply; 35+ messages in thread
From: Andrii Nakryiko @ 2023-03-04 23:27 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, bpf, ast, daniel, kernel-team, Tejun Heo

On Sat, Mar 4, 2023 at 12:02 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Mar 02, 2023 at 03:50:11PM -0800, Andrii Nakryiko wrote:
> > Teach verifier about the concept of open-coded (or inline) iterators.
> >
> > This patch adds generic iterator loop verification logic, new STACK_ITER
> > stack slot type to contain iterator state, and necessary kfunc plumbing
> > for iterator's constructor, destructor and next "methods". Next patch
> > implements first specific version of iterator (number iterator for
> > implementing for loop). Such split allows to have more focused commits
> > for verifier logic and separate commit that we could point later to what
> > it takes to add new kind of iterator.
> >
> > First, we add new fixed-size opaque struct bpf_iter (24-byte long) to
> > contain runtime state of any possible iterator. struct bpf_iter state is
>
> Looking at the verifier changes it seems that it should be possible to support
> any sized iterator and we don't have to fit all of them in 24-bytes.
> The same bpf_iter_<kind>_{new,next,destroy}() naming convention can apply
> to types and we can have 8-byte struct bpf_iter_num.
> The bpf_for() macros would work with bpf_iter_<kind> too.
> iirc that was your plan earlier (to have different structs).
> What prompted you to change that plan?

Not really, single user-facing `struct bpf_iter` was what I was going
for from the very beginning, similar to single `struct bpf_dynptr`.
But I guess they are a bit different in that there are many generic
operations applicable across different types of dynptr
(bpf_dynptr_read, write, data, slice, etc), so it makes sense to
abstract it behind single type.

Here, with iterators, besides something like "is iterator exhausted"
(not that I'm planning to add something like that), there isn't much
generic to be shared, so yeah, we can switch to one public UAPI struct
per each type of iterator and then those can be tailored to each
particular type. This will automatically work better for kfuncs as
new/next/destroy trios will have the same `struct bpf_iter_<type> *`
and it won't be possible to accidentally pass wrong bpf_iter_<type> to
wrong new/next/destroy method.
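
Roughly something like this (just a sketch, the exact opaque size/layout per
iterator type is still to be settled):

  struct bpf_iter_num { __u64 __opaque[1]; } __attribute__((aligned(8)));

  extern int bpf_iter_num_new(struct bpf_iter_num *it, int start, int end) __ksym;
  extern int *bpf_iter_num_next(struct bpf_iter_num *it) __ksym;
  extern void bpf_iter_num_destroy(struct bpf_iter_num *it) __ksym;

with the bpf_for()/bpf_for_each() macros switched over to the per-type struct.
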

>
> > Any other iterator implementation will have to implement at least these
> > three methods. It is enforced that for any given type of iterator only
> > applicable constructor/destructor/next are callable. I.e., verifier
> > ensures you can't pass number iterator into, say, cgroup iterator's next
> > method.
>
> is_iter_type_compatible() does that, right?

yep. Currently it also anticipates generic iterator functions, but as
I mentioned above, I don't think we'll have any common methods
anyways.

>
> > +
> > +static int mark_stack_slots_iter(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
> > +                              enum bpf_arg_type arg_type, int insn_idx)
> > +{
> > +     struct bpf_func_state *state = func(env, reg);
> > +     enum bpf_iter_type type;
> > +     int spi, i, j, id;
> > +
> > +     spi = iter_get_spi(env, reg);
> > +     if (spi < 0)
> > +             return spi;
> > +
> > +     type = arg_to_iter_type(arg_type);
> > +     if (type == BPF_ITER_TYPE_INVALID)
> > +             return -EINVAL;
>
> Do we need destroy_if_dynptr_stack_slot() equivalent here?

no, because bpf_iter is always ref-counted, so we'll always have
explicit unmark_stack_slots_iter() call, which will reset slot types

destroy_if_dynptr_stack_slot() was added because LOCAL dynptr doesn't
require explicit destruction. I mentioned this difference
(simplification for bpf_iter case) somewhere in the commit message.

>
> > +     id = acquire_reference_state(env, insn_idx);
> > +     if (id < 0)
> > +             return id;
> > +
> > +     for (i = 0; i < BPF_ITER_NR_SLOTS; i++) {
> > +             struct bpf_stack_state *slot = &state->stack[spi - i];
> > +             struct bpf_reg_state *st = &slot->spilled_ptr;
> > +
> > +             __mark_reg_known_zero(st);
> > +             st->type = PTR_TO_STACK; /* we don't have dedicated reg type */
> > +             st->live |= REG_LIVE_WRITTEN;
> > +             st->ref_obj_id = i == 0 ? id : 0;
> > +             st->iter.type = i == 0 ? type : BPF_ITER_TYPE_INVALID;
> > +             st->iter.state = BPF_ITER_STATE_ACTIVE;
> > +             st->iter.depth = 0;
> > +
> > +             for (j = 0; j < BPF_REG_SIZE; j++)
> > +                     slot->slot_type[j] = STACK_ITER;
> > +
> > +             mark_stack_slot_scratched(env, spi - i);
>
> dynptr needs similar mark_stack_slot_scratched() fix, right?

probably yes. destroy_if_dynptr_stack_slot() is scratching slots, but
we don't call that on OBJ_RELEASE (in unmark_stack_slots_dynptr), so
yeah, we should add this for dynptrs as well

>
> > +     }
> > +
> > +     return 0;
> > +}
>
> ...
>
> > @@ -3691,8 +3928,8 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
> >
> >               /* regular write of data into stack destroys any spilled ptr */
> >               state->stack[spi].spilled_ptr.type = NOT_INIT;
> > -             /* Mark slots as STACK_MISC if they belonged to spilled ptr. */
> > -             if (is_spilled_reg(&state->stack[spi]))
> > +             /* Mark slots as STACK_MISC if they belonged to spilled ptr/dynptr/iter. */
> > +             if (is_stack_slot_special(&state->stack[spi]))
> >                       for (i = 0; i < BPF_REG_SIZE; i++)
> >                               scrub_spilled_slot(&state->stack[spi].slot_type[i]);
>
> It fixes something for dynptr, right?

It's convoluted, I think it might not have a visible effect. This is
the situation of partial (e.g., single byte) overwrite of
STACK_DYNPTR/STACK_ITER, and without this change we'll leave some
slot_types as STACK_MISC, while others as STACK_DYNPTR/STACK_ITER.
This is an unexpected state, but I think existing code always checks that
for STACK_DYNPTR all 8 slot_type entries are STACK_DYNPTR.

So I think it's a good clean up, but no consequences for dynptr
correctness. And for STACK_ITER I don't have to worry about such mix,
if any of the slot_type[i] is STACK_ITER, then it's a correct
iterator.
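
To illustrate (offsets here are hypothetical), with iterator state spilled at,
say, fp-24..fp-1, a one-byte store into that region like

  *(u8 *)(r10 - 16) = r1;

would previously leave that 8-byte slot with mixed slot_type entries (one
STACK_MISC, the rest still STACK_ITER), while with this change all 8 entries
of the touched slot get scrubbed to STACK_MISC.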

>
> > +static int process_iter_next_call(struct bpf_verifier_env *env, int insn_idx,
> > +                               struct bpf_kfunc_call_arg_meta *meta)
> > +{
> > +     struct bpf_verifier_state *cur_st = env->cur_state, *queued_st;
> > +     struct bpf_func_state *cur_fr = cur_st->frame[cur_st->curframe], *queued_fr;
> > +     struct bpf_reg_state *cur_iter, *queued_iter;
> > +     int iter_frameno = meta->iter.frameno;
> > +     int iter_spi = meta->iter.spi;
> > +
> > +     BTF_TYPE_EMIT(struct bpf_iter);
> > +
> > +     cur_iter = &env->cur_state->frame[iter_frameno]->stack[iter_spi].spilled_ptr;
> > +
> > +     if (cur_iter->iter.state != BPF_ITER_STATE_ACTIVE &&
> > +         cur_iter->iter.state != BPF_ITER_STATE_DRAINED) {
> > +             verbose(env, "verifier internal error: unexpected iterator state %d (%s)\n",
> > +                     cur_iter->iter.state, iter_state_str(cur_iter->iter.state));
> > +             return -EFAULT;
> > +     }
> > +
> > +     if (cur_iter->iter.state == BPF_ITER_STATE_ACTIVE) {
> > +             /* branch out active iter state */
> > +             queued_st = push_stack(env, insn_idx + 1, insn_idx, false);
> > +             if (!queued_st)
> > +                     return -ENOMEM;
> > +
> > +             queued_iter = &queued_st->frame[iter_frameno]->stack[iter_spi].spilled_ptr;
> > +             queued_iter->iter.state = BPF_ITER_STATE_ACTIVE;
> > +             queued_iter->iter.depth++;
> > +
> > +             queued_fr = queued_st->frame[queued_st->curframe];
> > +             mark_ptr_not_null_reg(&queued_fr->regs[BPF_REG_0]);
> > +     }
> > +
> > +     /* switch to DRAINED state, but keep the depth unchanged */
> > +     /* mark current iter state as drained and assume returned NULL */
> > +     cur_iter->iter.state = BPF_ITER_STATE_DRAINED;
> > +     __mark_reg_known_zero(&cur_fr->regs[BPF_REG_0]);
> > +     cur_fr->regs[BPF_REG_0].type = SCALAR_VALUE;
>
> __mark_reg_const_zero() instead?

sure, didn't know about it

>
> > +
> > +     return 0;
> > +}
> ...
> > +static bool is_iter_next_insn(struct bpf_verifier_env *env, int insn_idx, int *reg_idx)
> > +{
> > +     struct bpf_insn *insn = &env->prog->insnsi[insn_idx];
> > +     const struct btf_param *args;
> > +     const struct btf_type *t;
> > +     const struct btf *btf;
> > +     int nargs, i;
> > +
> > +     if (!bpf_pseudo_kfunc_call(insn))
> > +             return false;
> > +     if (!is_iter_next_kfunc(insn->imm))
> > +             return false;
> > +
> > +     btf = find_kfunc_desc_btf(env, insn->off);
> > +     if (IS_ERR(btf))
> > +             return false;
> > +
> > +     t = btf_type_by_id(btf, insn->imm);     /* FUNC */
> > +     t = btf_type_by_id(btf, t->type);       /* FUNC_PROTO */
> > +
> > +     args = btf_params(t);
> > +     nargs = btf_vlen(t);
> > +     for (i = 0; i < nargs; i++) {
> > +             if (is_kfunc_arg_iter(btf, &args[i])) {
> > +                     *reg_idx = BPF_REG_1 + i;
> > +                     return true;
> > +             }
> > +     }
>
> This is some future-proofing ?
> The commit log says that all iterators have to be in the form:
> bpf_iter_<kind>_next(struct bpf_iter* it)
> Should we check for one and only arg here instead?

Yeah, a bit of generality. For a long time I had a hardcoded assumption
that the first argument is struct bpf_iter *, but that felt unclean, so
I generalized it before submission.

But I can't think of a reason why we wouldn't just dictate (and
enforce) that `struct bpf_iter *` comes first. It makes sense, it's
clean, and we lose nothing. This is another thing that differs between
dynptr and iter; for dynptr such a restriction wouldn't make sense.

Where would be a good place to enforce this for iter kfuncs?

>
> > +
> > +     return false;
> > +}
> > +
> > +/* is_state_visited() handles iter_next() (see process_iter_next_call() for
> > + * terminology) calls specially: as opposed to bounded BPF loops, it *expects*
> > + * state matching, which otherwise looks like an infinite loop. So while
> > + * iter_next() calls are taken care of, we still need to be careful and
> > + * prevent erroneous and too eager declaration of "infinite loop", when
> > + * iterators are involved.
> > + *
> > + * Here's a situation in pseudo-BPF assembly form:
> > + *
> > + *   0: again:                          ; set up iter_next() call args
> > + *   1:   r1 = &it                      ; <CHECKPOINT HERE>
> > + *   2:   call bpf_iter_num_next        ; this is iter_next() call
> > + *   3:   if r0 == 0 goto done
> > + *   4:   ... something useful here ...
> > + *   5:   goto again                    ; another iteration
> > + *   6: done:
> > + *   7:   r1 = &it
> > + *   8:   call bpf_iter_num_destroy     ; clean up iter state
> > + *   9:   exit
> > + *
> > + * This is a typical loop. Let's assume that we have a prune point at 1:,
> > + * before we get to `call bpf_iter_num_next` (e.g., because of that `goto
> > + * again`, assuming other heuristics don't get in a way).
> > + *
> > + * When we first come to 1:, let's say we have some state X. We proceed
> > + * to 2:, fork states, enqueue ACTIVE, validate NULL case successfully, exit.
> > + * Now we come back to validate that forked ACTIVE state. We proceed through
> > + * 3-5, come to goto, jump to 1:. Let's assume our state didn't change, so we
> > + * are converging. But the problem is that we don't know that yet, as this
> > + * convergence has to happen at iter_next() call site only. So if nothing is
> > + * done, at 1: verifier will use bounded loop logic and declare infinite
> > + * looping (and would be *technically* correct, if not for iterator "eventual
> > + * sticky NULL" contract, see process_iter_next_call()). But we don't want
> > + * that. So what we do in process_iter_next_call() when we go on another
> > + * ACTIVE iteration, we bump slot->iter.depth, to mark that it's a different
> > + * iteration. So when we detect soon-to-be-declared infinite loop, we also
> > + * check if any of *ACTIVE* iterator state's depth differs. If yes, we pretend
> > + * we are not looping and wait for next iter_next() call.
>
> 'depth' part of bpf_reg_state will be checked for equality in regsafe(), right?

no, it is explicitly skipped (and it's actually stacksafe(), not
regsafe()). I can add an explicit comment that we *ignore* depth.

I was considering adding a flag to states_equal() controlling whether
to check depth or not (that would make iter_active_depths_differ
unnecessary), but it doesn't feel right. Any preferences one way or the
other?
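
Roughly, iter_active_depths_differ() just walks all frames' stack slots
and compares the depth of ACTIVE iterators between old and cur states,
something like this sketch (actual patch code may differ slightly):

static bool iter_active_depths_differ(struct bpf_verifier_state *old,
				      struct bpf_verifier_state *cur)
{
	struct bpf_reg_state *slot, *cur_slot;
	struct bpf_func_state *state;
	int i, fr;

	for (fr = old->curframe; fr >= 0; fr--) {
		state = old->frame[fr];
		for (i = 0; i < state->allocated_stack / BPF_REG_SIZE; i++) {
			/* only iterator slots are interesting */
			if (state->stack[i].slot_type[0] != STACK_ITER)
				continue;

			slot = &state->stack[i].spilled_ptr;
			if (slot->iter.state != BPF_ITER_STATE_ACTIVE)
				continue;

			cur_slot = &cur->frame[fr]->stack[i].spilled_ptr;
			if (cur_slot->iter.depth != slot->iter.depth)
				return true;
		}
	}
	return false;
}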

> Every time we branch out in process_iter_next_call() there is depth++
> So how come we're able to converge with:
>  +                     if (is_iter_next_insn(env, insn_idx, &iter_arg_reg_idx)) {
>  +                             if (states_equal(env, &sl->state, cur)) {
> That's because states_equal() is done right before doing process_iter_next_call(), right?

Yes, we check convergence before we process_iter_next_call. We do
converge because we ignore depth, as I mentioned above.

>
> So depth counts the number of times bpf_iter*_next() was processed.
> In other words it's a number of ways the body of the loop can be walked?

More or less, yes. It's the number of sequential unrolls of the loop
body, each time with a different starting state. But all of that is
only within the current code path. So it's not like "how many different
loop states we could have" in total. It's the number of loop unrolls
under the condition "assuming the current code path that led to the
start of the iterator loop". Some other code path could lead to a state
(before the iterator loop starts) that converges faster or slower,
which is why I'm pointing out the distinction.

But I think "yes" would be the answer to the question you had in mind.

>
> > +                     if (is_iter_next_insn(env, insn_idx, &iter_arg_reg_idx)) {
> > +                             if (states_equal(env, &sl->state, cur)) {
> > +                                     struct bpf_func_state *cur_frame;
> > +                                     struct bpf_reg_state *iter_state, *iter_reg;
> > +                                     int spi;
> > +
> > +                                     /* current state is valid due to states_equal(),
> > +                                      * so we can assume valid iter state, no need for extra
> > +                                      * (re-)validations
> > +                                      */
> > +                                     cur_frame = cur->frame[cur->curframe];
> > +                                     iter_reg = &cur_frame->regs[iter_arg_reg_idx];
> > +                                     spi = iter_get_spi(env, iter_reg);
> > +                                     if (spi < 0)
> > +                                             return spi;
> > +                                     iter_state = &func(env, iter_reg)->stack[spi].spilled_ptr;
> > +                                     if (iter_state->iter.state == BPF_ITER_STATE_ACTIVE)
> > +                                             goto hit;
> > +                             }
> > +                             goto skip_inf_loop_check;
>
> This goto is "optional", right?
> Meaning that if we remove it the states_maybe_looping() + states_equal() won't match anyway.
> The goto avoids wasting cycles.

yes, it avoids doing this check again, but it also felt cleaner to be
explicit about skipping the infinite loop check

>
> > +                     }
> > +                     /* attempt to detect infinite loop to avoid unnecessary doomed work */
> > +                     if (states_maybe_looping(&sl->state, cur) &&
>
> Maybe cleaner is to remove above 'goto' and do '} else if (states_maybe_looping' here ?

I can undo this; it felt cleaner with an explicit "skip infinite loop
check" both for the new code and for that async_entry_cnt check above.
But I can revert to the if/else if/else if pattern, though I find it
harder to follow, given that all this code (plus comments) is pretty
long, so it's easy to lose track when reading


>
> > +                         states_equal(env, &sl->state, cur) &&
> > +                         !iter_active_depths_differ(&sl->state, cur)) {
> >                               verbose_linfo(env, insn_idx, "; ");
> >                               verbose(env, "infinite loop detected at insn %d\n", insn_idx);
> >                               return -EINVAL;

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next 14/17] bpf: implement number iterator
  2023-03-04 20:21   ` Alexei Starovoitov
@ 2023-03-04 23:27     ` Andrii Nakryiko
  2023-03-05 23:49       ` Alexei Starovoitov
  0 siblings, 1 reply; 35+ messages in thread
From: Andrii Nakryiko @ 2023-03-04 23:27 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, bpf, ast, daniel, kernel-team, Tejun Heo

On Sat, Mar 4, 2023 at 12:21 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Mar 02, 2023 at 03:50:12PM -0800, Andrii Nakryiko wrote:
> >
> >  static enum kfunc_ptr_arg_type
> > @@ -10278,7 +10288,17 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
> >                       if (is_kfunc_arg_uninit(btf, &args[i]))
> >                               iter_arg_type |= MEM_UNINIT;
> >
> > -                     ret = process_iter_arg(env, regno, insn_idx, iter_arg_type,  meta);
> > +                     if (meta->func_id == special_kfunc_list[KF_bpf_iter_num_new] ||
> > +                         meta->func_id == special_kfunc_list[KF_bpf_iter_num_next]) {
> > +                             iter_arg_type |= ITER_TYPE_NUM;
> > +                     } else if (meta->func_id == special_kfunc_list[KF_bpf_iter_num_destroy]) {
> > +                             iter_arg_type |= ITER_TYPE_NUM | OBJ_RELEASE;
>
> Since OBJ_RELEASE is set pretty late here and kfuncs are not marked with KF_RELEASE,
> the arg_type_is_release() in check_func_arg_reg_off() won't trigger.

yeah, I had trouble doing this release using the existing scheme.
KF_RELEASE doesn't work, as it makes some extra assumptions about what
was acquired, which didn't fit iters. And I didn't have a precedent in
dynptr to learn from, as the RINGBUF dynptr is "acquired" and
"released" using helpers. Basically, we don't have a dynptr release
kfunc yet.

So I set the OBJ_RELEASE flag for process_iter_arg to do an explicit release.

I'd appreciate guidance on how to do this more cleanly. A naive attempt
to set KF_ACQUIRE for bpf_iter_num_new() and KF_RELEASE for
bpf_iter_num_destroy() didn't work.


> So I'm confused why there is:
> +               if (arg_type_is_iter(arg_type))
> +                       return 0;
> in the previous patch.
> Will it ever trigger?

maybe not, just followed what dynptr is doing

>
> Separate question: What happens when the user does:
> bpf_iter_destroy(&it);
> bpf_iter_destroy(&it);

After the first destroy the stack slots are marked STACK_INVALID, so
the next bpf_iter_destroy(&it) will complain about not seeing an
initialized iterator.
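
I.e., roughly this shape gets rejected (a sketch; the actual selftest
is mentioned below):

	struct bpf_iter it;

	bpf_iter_num_new(&it, 0, 10);
	bpf_iter_num_destroy(&it);	/* slots become STACK_INVALID here */
	bpf_iter_num_destroy(&it);	/* -> "expected an initialized iter as arg #1" */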

>
> +               if (!is_iter_reg_valid_init(env, reg)) {
> +                       verbose(env, "expected an initialized iter as arg #%d\n", regno);
> will trigger, right?
> I didn't find such selftest.

yep, that's the idea. I just checked, and I do have such a test; it's
in iters_state_safety.c:

__failure __msg("expected an initialized iter as arg #1")
int double_destroy_fail(void *ctx)

There are also next_after_destroy_fail, next_without_new_fail, and
other obvious error conditions. But it would be good for a few people
to check them with a fresh eye. I added them a long time ago and might
have missed something.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next 15/17] selftests/bpf: add bpf_for_each(), bpf_for(), and bpf_repeat() macros
  2023-03-04 20:34   ` Alexei Starovoitov
@ 2023-03-04 23:28     ` Andrii Nakryiko
  2023-03-06  0:12       ` Alexei Starovoitov
  0 siblings, 1 reply; 35+ messages in thread
From: Andrii Nakryiko @ 2023-03-04 23:28 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, bpf, ast, daniel, kernel-team, Tejun Heo

On Sat, Mar 4, 2023 at 12:34 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Mar 02, 2023 at 03:50:13PM -0800, Andrii Nakryiko wrote:
> > Add bpf_for_each(), bpf_for() and bpf_repeat() macros that make writing
> > open-coded iterator-based loops much more convenient and natural. These
> > macros utilize the cleanup attribute to ensure proper destruction of the
> > iterator and, thanks to that, manage to provide ergonomics very close to
> > the C language for construct. A typical integer loop would look like:
> >
> >   int i;
> >   int arr[N];
> >
> >   bpf_for(i, 0, N) {
> >   /* verifier will know that i >= 0 && i < N, so could be used to
> >    * directly access array elements with no extra checks
> >    */
> >    arr[i] = i;
> >   }
> >
> > bpf_repeat() is very similar, but it doesn't expose the iteration number
> > and is meant as a simple "repeat action N times":
> >
> >   bpf_repeat(N) { /* whatever */ }
> >
> > Note that break and continue inside the {} block work as expected.
> >
> > bpf_for_each() is a generalization over any kind of BPF open-coded
> > iterator allowing to use for-each-like approach instead of calling
> > low-level bpf_iter_<type>_{new,next,destroy}() APIs explicitly. E.g.:
> >
> >   struct cgroup *cg;
> >
> >   bpf_for_each(cgroup, cg, some, input, args) {
> >       /* do something with each cg */
> >   }
> >
> > Would call (right now hypothetical) bpf_iter_cgroup_{new,next,destroy}()
> > functions to form a loop over cgroups, where `some, input, args` are
> > passed verbatim into constructor as
> > bpf_iter_cgroup_new(&it, some, input, args).
> >
> > As a demonstration, add pyperf variant based on bpf_for() loop.
> >
> > Also clean up a few tests that either included the bpf_misc.h header
> > unnecessarily from user-space or included it before any common types are
> > defined.
> >
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > ---
> >  .../bpf/prog_tests/bpf_verif_scale.c          |  6 ++
> >  .../bpf/prog_tests/uprobe_autoattach.c        |  1 -
> >  tools/testing/selftests/bpf/progs/bpf_misc.h  | 76 +++++++++++++++++++
> >  tools/testing/selftests/bpf/progs/lsm.c       |  4 +-
> >  tools/testing/selftests/bpf/progs/pyperf.h    | 14 +++-
> >  .../selftests/bpf/progs/pyperf600_iter.c      |  7 ++
> >  .../selftests/bpf/progs/pyperf600_nounroll.c  |  3 -
> >  7 files changed, 101 insertions(+), 10 deletions(-)
> >  create mode 100644 tools/testing/selftests/bpf/progs/pyperf600_iter.c
> >
> > diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c b/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c
> > index 5ca252823294..731c343897d8 100644
> > --- a/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c
> > +++ b/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c
> > @@ -144,6 +144,12 @@ void test_verif_scale_pyperf600_nounroll()
> >       scale_test("pyperf600_nounroll.bpf.o", BPF_PROG_TYPE_RAW_TRACEPOINT, false);
> >  }
> >
> > +void test_verif_scale_pyperf600_iter()
> > +{
> > +     /* open-coded BPF iterator version */
> > +     scale_test("pyperf600_iter.bpf.o", BPF_PROG_TYPE_RAW_TRACEPOINT, false);
> > +}
> > +
> >  void test_verif_scale_loop1()
> >  {
> >       scale_test("loop1.bpf.o", BPF_PROG_TYPE_RAW_TRACEPOINT, false);
> > diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c b/tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c
> > index 6558c857e620..d5b3377aa33c 100644
> > --- a/tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c
> > +++ b/tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c
> > @@ -3,7 +3,6 @@
> >
> >  #include <test_progs.h>
> >  #include "test_uprobe_autoattach.skel.h"
> > -#include "progs/bpf_misc.h"
> >
> >  /* uprobe attach point */
> >  static noinline int autoattach_trigger_func(int arg1, int arg2, int arg3,
> > diff --git a/tools/testing/selftests/bpf/progs/bpf_misc.h b/tools/testing/selftests/bpf/progs/bpf_misc.h
> > index f704885aa534..08a791f307a6 100644
> > --- a/tools/testing/selftests/bpf/progs/bpf_misc.h
> > +++ b/tools/testing/selftests/bpf/progs/bpf_misc.h
> > @@ -75,5 +75,81 @@
> >  #define FUNC_REG_ARG_CNT 5
> >  #endif
> >
> > +struct bpf_iter;
> > +
> > +extern int bpf_iter_num_new(struct bpf_iter *it__uninit, int start, int end) __ksym;
> > +extern int *bpf_iter_num_next(struct bpf_iter *it) __ksym;
> > +extern void bpf_iter_num_destroy(struct bpf_iter *it) __ksym;
> > +
> > +#ifndef bpf_for_each
> > +/* bpf_for_each(iter_kind, elem, args...) provides generic construct for using BPF
> > + * open-coded iterators without having to write mundane explicit low-level
> > + * loop. Instead, it provides for()-like generic construct that can be used
> > + * pretty naturally. E.g., for some hypothetical cgroup iterator, you'd write:
> > + *
> > + * struct cgroup *cg, *parent_cg = <...>;
> > + *
> > + * bpf_for_each(cgroup, cg, parent_cg, CG_ITER_CHILDREN) {
> > + *     bpf_printk("Child cgroup id = %d", cg->cgroup_id);
> > + *     if (cg->cgroup_id == 123)
> > + *         break;
> > + * }
> > + *
> > + * I.e., it looks almost like high-level for each loop in other languages,
> > + * supports continue/break, and is verifiable by BPF verifier.
> > + *
> > + * For iterating integers, the difference between bpf_for_each(num, i, N, M)
> > + * and bpf_for(i, N, M) is in that bpf_for() provides additional proof to
> > + * verifier that i is in [N, M) range, and in bpf_for_each() case i is `int
> > + * *`, not just `int`. So for integers bpf_for() is more convenient.
> > + */
> > +#define bpf_for_each(type, cur, args...) for (                                                 \
> > +     /* initialize and define destructor */                                            \
> > +     struct bpf_iter ___it __attribute__((cleanup(bpf_iter_##type##_destroy))),        \
>
> We should probably say somewhere that it requires C99 with some flag that allows
> declaring variables inside the loop.

yes, I'll add a comment. I think the cleanup attribute isn't standard
either, so I'll mention that too. This shouldn't be restrictive,
though, as we expect a very modern Clang (or eventually GCC), which
definitely supports all of that. And I feel like most people don't
restrict their BPF-side code to strict C89 anyway.

>
> Also what are the rules for attr(cleanup()).
> When does it get called?

From GCC documentation:

  > The cleanup attribute runs a function when the variable goes out of scope.

So given that ___it is bound to the for loop, the destructor runs on
any code path that leads to loop exit (i.e., when the condition turns
false or when *breaking* out of the loop, which is why I use cleanup;
this was the saving grace that made this approach work at all).
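
E.g., a sketch (should_stop() is just a stand-in for some condition):

	bpf_for(i, 0, 100) {
		if (should_stop(i))
			break;	/* ___it goes out of scope here too, so the
				 * compiler emits bpf_iter_num_destroy() on
				 * this early-exit path as well, not only on
				 * the normal loop termination
				 */
	}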


> My understanding is that the visibility of ___it is only within the for() body.

right

> So when the prog does:
> bpf_for(i, 0, 10) sum += i;
> bpf_for(i, 0, 10) sum += i;
>
> the compiler should generate bpf_iter_num_destroy right after each bpf_for() ?

Conceptually, yes, but see the note about breaking out of the loop
above. How the actual assembly code is generated is beyond our control.
If the compiler generates multiple separate code paths, each with its
own destroy, that's fine as well. No assumptions are made in the
verifier; we just need to see one bpf_iter_<type>_destroy() for each
instance of the iterator.

> Or will it group them at the end of function body and destroy all iterators ?

That would be a bug, as the documentation states that cleanup happens
as soon as a variable goes out of scope. Delaying cleanup could result
in program logic bugs; i.e., we rely on destructors being called as
soon as possible.

> Will compiler reuse the stack space used by ___it in case there are multiple bpf_for-s ?

That's a question for compiler developers, but I'd assume that, yes,
it should. Why not?

And looking at, for example, iter_pass_iter_ptr_to_subprog which has 4
sequential bpf_for() loops:

0000000000002328 <iter_pass_iter_ptr_to_subprog>:
    1125:       bf a6 00 00 00 00 00 00 r6 = r10
    1126:       07 06 00 00 28 ff ff ff r6 += -216    <-- THIS IS ITER
    1127:       bf 61 00 00 00 00 00 00 r1 = r6
    1128:       b4 02 00 00 00 00 00 00 w2 = 0
    1129:       b4 03 00 00 10 00 00 00 w3 = 16
    1130:       85 10 00 00 ff ff ff ff call -1
                0000000000002350:  R_BPF_64_32  bpf_iter_num_new
    1131:       bf a7 00 00 00 00 00 00 r7 = r10
    1132:       07 07 00 00 c0 ff ff ff r7 += -64
    1133:       bf 61 00 00 00 00 00 00 r1 = r6
    1134:       bf 72 00 00 00 00 00 00 r2 = r7
    1135:       b4 03 00 00 10 00 00 00 w3 = 16
    1136:       b4 04 00 00 02 00 00 00 w4 = 2
    1137:       85 10 00 00 53 00 00 00 call 83
                0000000000002388:  R_BPF_64_32  .text
    1138:       bf 61 00 00 00 00 00 00 r1 = r6
    1139:       85 10 00 00 ff ff ff ff call -1
                0000000000002398:  R_BPF_64_32  bpf_iter_num_destroy
    1140:       bf 61 00 00 00 00 00 00 r1 = r6     <<--- HERE WE REUSE
    1141:       b4 02 00 00 00 00 00 00 w2 = 0
    1142:       b4 03 00 00 20 00 00 00 w3 = 32
    1143:       85 10 00 00 ff ff ff ff call -1
                00000000000023b8:  R_BPF_64_32  bpf_iter_num_new

Note that r6 is set to fp-216 and is just reused as-is for the second
bpf_for loop (second bpf_iter_num_new call).

>
> > +     /* ___p pointer is just to call bpf_iter_##type##_new() *once* to init ___it */   \
> > +                     *___p = (bpf_iter_##type##_new(&___it, ##args),           \
> > +     /* this is a workaround for Clang bug: it currently doesn't emit BTF */           \
> > +     /* for bpf_iter_##type##_destroy when used from cleanup() attribute */            \
> > +                             (void)bpf_iter_##type##_destroy, (void *)0);              \
> > +     /* iteration and termination check */                                             \
> > +     ((cur = bpf_iter_##type##_next(&___it)));                                         \
> > +     /* nothing here  */                                                               \
> > +)
> > +#endif /* bpf_for_each */
> > +
> > +#ifndef bpf_for
> > +/* bpf_for(i, start, end) proves to verifier that i is in [start, end) */
> > +#define bpf_for(i, start, end) for (                                                   \
> > +     /* initialize and define destructor */                                            \
> > +     struct bpf_iter ___it __attribute__((cleanup(bpf_iter_num_destroy))),             \
> > +     /* ___p pointer is necessary to call bpf_iter_num_new() *once* to init ___it */   \
> > +                     *___p = (bpf_iter_num_new(&___it, (start), (end)),                \
> > +     /* this is a workaround for Clang bug: it currently doesn't emit BTF */           \
> > +     /* for bpf_iter_num_destroy when used from cleanup() attribute */                 \
> > +                             (void)bpf_iter_num_destroy, (void *)0);                   \
> > +     ({                                                                                \
> > +             /* iteration step */                                                      \
> > +             int *___t = bpf_iter_num_next(&___it);                                    \
> > +             /* termination and bounds check */                                        \
> > +             (___t && ((i) = *___t, i >= (start) && i < (end)));                       \
>
> The i >= (start) && i < (end) is necessary to make sure that the verifier
> tightens the range of 'i' inside the body of the loop and
> when the program does arr[i] access the verifier will know that 'i' is within bounds, right?

yes, it feels like a common pattern, but I was contemplating adding
bpf_for_uncheck(), where we "expose" the i value but don't do the
check. I decided to keep it simple, as most examples actually required
bounds checks anyway. And for cases where we don't, bpf_repeat() often
suffices. One other way would be to expose i from bpf_repeat(), just
with no checks.

One restriction with this approach is that I can't define both `struct
bpf_iter ___it` and `int i` inside the for loop, so cur/i has to be
declared and passed into bpf_for/bpf_for_each by the user explicitly.
So for bpf_repeat() to expose i, it would always require doing:

int i;

bpf_repeat(i, 100) { /* i is set to 0, 1, ..., 99 */ }

which, if you don't care about the iteration number, is a bit of a
waste. So I don't know, I'm undecided on bpf_repeat with i.

>
> In such case should we add __builtin_constant_p() check for 'start' and 'end' ?

that seems to defeat the purpose: if you know start/end statically,
you might as well just write either an unrolled loop or a bounded
for() loop

> int arr[100];
> if (var < 100)
>   bpf_for(i, 0, global_var) sum += arr[i];
> will fail the verifier and the users might complain of dumb verifier.


but I'm not following this example, so maybe my answer above would be
different if I did. What are var and global_var? Are they supposed to
be the same thing? If so, why would that bpf_for() loop fail?

I suspect you are conflating this with the other pattern I pointed out:

int cnt = 0;

bpf_for_each(...) {
   if (cnt++ > 100)
      break;
}

It's different, as cnt comes from outside the loop and is updated on
each iteration. While for

bpf_for(i, 0, var) {
   if (i > 100)
        break;
}

it should work fine, as i is originally unknowable, then narrowed to
*a range* [0, 99]. But that [0, 99] knowledge is part of "inner loop
body" state, so it will still converge as it's going to be basically
ignored during the equivalence check on next bpf_iter_num_next() (old
state didn't know about i yet).
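
So, as a sketch (var being some runtime value, e.g. a global), something
like this should be fine:

	int arr[128], sum = 0, i;

	bpf_for(i, 0, var) {
		if (i > 100)
			break;
		sum += arr[i];	/* i is known to be in [0, 100] here, within arr bounds */
	}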

>
> Also if start and end are variables they potentially can change between bpf_iter_num_new()
> and in each iteration of the loop.
> __builtin_constant_p() might be too restrictive.

yep, I think so

> May be read start/end once, at least?

I can't do that, as for() allows only one type of variable to be
declared (note the `*___p` hack as well), so there is no place to
remember start/end, unfortunately...

So it's a tradeoff. I can drop range validation, but then every
example and lots of production code would be re-checking these
conditions.


>
> > +     });                                                                               \
> > +     /* nothing here  */                                                               \
> > +)
> > +#endif /* bpf_for */
> > +

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next 16/17] selftests/bpf: add iterators tests
  2023-03-04 20:39   ` Alexei Starovoitov
@ 2023-03-04 23:29     ` Andrii Nakryiko
  2023-03-06  0:14       ` Alexei Starovoitov
  0 siblings, 1 reply; 35+ messages in thread
From: Andrii Nakryiko @ 2023-03-04 23:29 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, bpf, ast, daniel, kernel-team, Tejun Heo

On Sat, Mar 4, 2023 at 12:39 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Mar 02, 2023 at 03:50:14PM -0800, Andrii Nakryiko wrote:
> > +
> > +#ifdef REAL_TEST
>
> Looks like REAL_TEST is never set.
>
> and all bpf_printk-s in tests are never executed, because the tests are 'load-only'
> to check the verifier?
>
> It looks like all of them can be run (once printks are removed and converted to if-s).
> That would nicely complement patch 17 runners.
>

Yes, it's a bit sloppy. I used these also as manual tests during
development. I did have an ad-hoc test that attaches and triggers
these programs. And I just manually looked at printk output in
trace_pipe to confirm it does actually work as expected.

And I felt bad dropping all that, so I just added that REAL_TEST hack
to make the program code simpler (no extra states for those pid
conditions); it was simpler to debug verification failures, with fewer
states to consider.

I did try to quickly extend RUN_TESTS with the ability to specify a
callback that will be called on success, but it's not trivial if we
want to preserve skeletons, so I abandoned that effort to save a bit
of time. I still want RUN_TESTS to have the ability to specify a
callback in the form of:

static void on_success(struct <my_skeleton_type> *skel,
                       struct bpf_program *prog)
{
    ...
}

but it needs more thought and macro magic (or something else), so I
postponed it and wrote simple number iterator tests in patch #17.

> It can be a follow up, of course.

yep, let's keep the bpf_printks, as they currently serve as consumers
of variables, preventing the compiler from optimizing loops too much.
This shouldn't be a problem for verification-only tests. And then with
the RUN_TESTS() additions, we can actually start executing these.

>
> Great stuff overall!

Thanks!

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next 16/17] selftests/bpf: add iterators tests
  2023-03-04 21:09   ` Jiri Olsa
@ 2023-03-04 23:29     ` Andrii Nakryiko
  0 siblings, 0 replies; 35+ messages in thread
From: Andrii Nakryiko @ 2023-03-04 23:29 UTC (permalink / raw)
  To: Jiri Olsa; +Cc: Andrii Nakryiko, bpf, ast, daniel, kernel-team, Tejun Heo

On Sat, Mar 4, 2023 at 1:09 PM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Thu, Mar 02, 2023 at 03:50:14PM -0800, Andrii Nakryiko wrote:
>
> SNIP
>
> > +
> > +SEC("raw_tp")
> > +__success
> > +int iter_pass_iter_ptr_to_subprog(const void *ctx)
> > +{
> > +     int arr1[16], arr2[32];
> > +     struct bpf_iter it;
> > +     int n, sum1, sum2;
> > +
> > +     MY_PID_GUARD();
> > +
> > +     /* fill arr1 */
> > +     n = ARRAY_SIZE(arr1);
> > +     bpf_iter_num_new(&it, 0, n);
> > +     fill(&it, arr1, n, 2);
> > +     bpf_iter_num_destroy(&it);
> > +
> > +     /* fill arr2 */
> > +     n = ARRAY_SIZE(arr2);
> > +     bpf_iter_num_new(&it, 0, n);
> > +     fill(&it, arr2, n, 10);
> > +     bpf_iter_num_destroy(&it);
> > +
> > +     /* sum arr1 */
> > +     n = ARRAY_SIZE(arr1);
> > +     bpf_iter_num_new(&it, 0, n);
> > +     sum1 = sum(&it, arr1, n);
> > +     bpf_iter_num_destroy(&it);
> > +
> > +     /* sum arr2 */
> > +     n = ARRAY_SIZE(arr2);
> > +     bpf_iter_num_new(&it, 0, n);
> > +     sum1 = sum(&it, arr2, n);

this should have been sum2, not sum1; a mechanical mistake

> > +     bpf_iter_num_destroy(&it);
> > +
> > +     bpf_printk("sum1=%d, sum2=%d", sum1, sum2);
>
> got to remove this to compile it, debug leftover?

nope, it's a sum1->sum2 typo, which wasn't caught because Dave's patch
to error out on uninitialized variable reads landed a bit later.

I'll fix this in v2, of course.
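
I.e., the fixed line will simply be:

	sum2 = sum(&it, arr2, n);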


>
> jirka
>
> > +
> > +     return 0;
> > +}
> > +
> > +char _license[] SEC("license") = "GPL";
>
> SNIP

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next 13/17] bpf: add support for open-coded iterator loops
  2023-03-04 23:27     ` Andrii Nakryiko
@ 2023-03-05 23:46       ` Alexei Starovoitov
  2023-03-07 21:54         ` Andrii Nakryiko
  0 siblings, 1 reply; 35+ messages in thread
From: Alexei Starovoitov @ 2023-03-05 23:46 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: Andrii Nakryiko, bpf, ast, daniel, kernel-team, Tejun Heo

On Sat, Mar 04, 2023 at 03:27:46PM -0800, Andrii Nakryiko wrote:
> particular type. This will automatically work better for kfuncs as
> new/next/destroy trios will have the same `struct bpf_iter_<type> *`
> and it won't be possible to accidentally pass wrong bpf_iter_<type> to
> wrong new/next/destroy method.

Exactly.

> >
> > > +
> > > +static int mark_stack_slots_iter(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
> > > +                              enum bpf_arg_type arg_type, int insn_idx)
> > > +{
> > > +     struct bpf_func_state *state = func(env, reg);
> > > +     enum bpf_iter_type type;
> > > +     int spi, i, j, id;
> > > +
> > > +     spi = iter_get_spi(env, reg);
> > > +     if (spi < 0)
> > > +             return spi;
> > > +
> > > +     type = arg_to_iter_type(arg_type);
> > > +     if (type == BPF_ITER_TYPE_INVALID)
> > > +             return -EINVAL;
> >
> > Do we need destroy_if_dynptr_stack_slot() equivalent here?
> 
> no, because bpf_iter is always ref-counted, so we'll always have
> explicit unmark_stack_slots_iter() call, which will reset slot types
> 
> destroy_if_dynptr_stack_slot() was added because LOCAL dynptr doesn't
> require explicit destruction. I mentioned this difference
> (simplification for bpf_iter case) somewhere in the commit message.

I see. Makes sense.

> > >
> > >               /* regular write of data into stack destroys any spilled ptr */
> > >               state->stack[spi].spilled_ptr.type = NOT_INIT;
> > > -             /* Mark slots as STACK_MISC if they belonged to spilled ptr. */
> > > -             if (is_spilled_reg(&state->stack[spi]))
> > > +             /* Mark slots as STACK_MISC if they belonged to spilled ptr/dynptr/iter. */
> > > +             if (is_stack_slot_special(&state->stack[spi]))
> > >                       for (i = 0; i < BPF_REG_SIZE; i++)
> > >                               scrub_spilled_slot(&state->stack[spi].slot_type[i]);
> >
> > It fixes something for dynptr, right?
> 
> It's convoluted, I think it might not have a visible effect. This is
> the situation of partial (e.g., single byte) overwrite of
> STACK_DYNPTR/STACK_ITER, and without this change we'll leave some
> slot_types as STACK_MISC, while others as STACK_DYNPTR/STACK_ITER.
> This is unexpected state, but I think existing code always checks that
> for STACK_DYNPTR's all 8 slots are STACK_DYNPTR.
> 
> So I think it's a good clean up, but no consequences for dynptr
> correctness. And for STACK_ITER I don't have to worry about such mix,
> if any of the slot_type[i] is STACK_ITER, then it's a correct
> iterator.

agree. I was just curious.

> > > +static bool is_iter_next_insn(struct bpf_verifier_env *env, int insn_idx, int *reg_idx)
> > > +{
> > > +     struct bpf_insn *insn = &env->prog->insnsi[insn_idx];
> > > +     const struct btf_param *args;
> > > +     const struct btf_type *t;
> > > +     const struct btf *btf;
> > > +     int nargs, i;
> > > +
> > > +     if (!bpf_pseudo_kfunc_call(insn))
> > > +             return false;
> > > +     if (!is_iter_next_kfunc(insn->imm))
> > > +             return false;
> > > +
> > > +     btf = find_kfunc_desc_btf(env, insn->off);
> > > +     if (IS_ERR(btf))
> > > +             return false;
> > > +
> > > +     t = btf_type_by_id(btf, insn->imm);     /* FUNC */
> > > +     t = btf_type_by_id(btf, t->type);       /* FUNC_PROTO */
> > > +
> > > +     args = btf_params(t);
> > > +     nargs = btf_vlen(t);
> > > +     for (i = 0; i < nargs; i++) {
> > > +             if (is_kfunc_arg_iter(btf, &args[i])) {
> > > +                     *reg_idx = BPF_REG_1 + i;
> > > +                     return true;
> > > +             }
> > > +     }
> >
> > This is some future-proofing ?
> > The commit log says that all iterators have to be in the form:
> > bpf_iter_<kind>_next(struct bpf_iter* it)
> > Should we check for one and only arg here instead?
> 
> Yeah, a bit of generality. For a long time I had an assumption
> hardcoded about first argument being struct bpf_iter *, but that felt
> unclean, so I generalized that before submission.
> 
> But I can't think why we wouldn't just dictate (and enforce) that
> `struct bpf_iter *` is first. It makes sense, it's clean, and we lose
> nothing. This is another thing that differs between dynptr and iter,
> for dynptr such restriction wouldn't make sense.
> 
> Where would be a good place to enforce this for iter kfuncs?

I would probably just remove is_iter_next_insn() completely, hard-code BPF_REG_1,
and add a big comment for now.

In a follow-up we can figure out how to:
BUILD_BUG_ON(!__same_type(bpf_iter_num_next, some canonical proto for iter_next));

Like we do:
  BUILD_BUG_ON(!__same_type(ops->map_lookup_elem,
               (void *(*)(struct bpf_map *map, void *key))NULL));
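
So, hypothetically, something along these lines (assuming
bpf_iter_num_next's declaration is visible where the check is done):

  /* enforce the canonical iter_next prototype at build time */
  BUILD_BUG_ON(!__same_type(&bpf_iter_num_next,
                            (int *(*)(struct bpf_iter *it))NULL));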

> >
> > 'depth' part of bpf_reg_state will be checked for equality in regsafe(), right?
> 
> no, it is explicitly skipped (and it's actually stacksafe(), not
> regsafe()). I can add explicit comment that we *ignore* depth

Ahh. That's stacksafe() indeed.
Would be great to add a comment to:
+                       if (old_reg->iter.type != cur_reg->iter.type ||
+                           old_reg->iter.state != cur_reg->iter.state ||
+                           !check_ids(old_reg->ref_obj_id, cur_reg->ref_obj_id, idmap))
+                               return false;

that depth is explicitly not compared.

> I was considering adding a flag to states_equal() whether to check
> depth or not (that would make iter_active_depths_differ unnecessary),
> but it doesn't feel right. Any preferences one way or the other?

probably overkill. just a comment should be enough.

> > Every time we branch out in process_iter_next_call() there is depth++
> > So how come we're able to converge with:
> >  +                     if (is_iter_next_insn(env, insn_idx, &iter_arg_reg_idx)) {
> >  +                             if (states_equal(env, &sl->state, cur)) {
> > That's because states_equal() is done right before doing process_iter_next_call(), right?
> 
> Yes, we check convergence before we process_iter_next_call. We do
> converge because we ignore depth, as I mentioned above.
> 
> >
> > So depth counts the number of times bpf_iter*_next() was processed.
> > In other words it's a number of ways the body of the loop can be walked?
> 
> More or less, yes. It's a number of sequential unrolls of loop body,
> each time with a different starting state. But all that only in the
> current code path. So it's not like "how many different loop states we
> could have" in total. It's number of loop unrolls with the condition
> "assuming current code path that led to start of iterator loop". Some
> other code path could lead to the state (before iterator loop starts)
> that converges faster or slower, which is why I'm pointing out the
> distinction.

got it. I know the comments are big and extensive already, but a little bit more
won't hurt.

> >
> > > +                     }
> > > +                     /* attempt to detect infinite loop to avoid unnecessary doomed work */
> > > +                     if (states_maybe_looping(&sl->state, cur) &&
> >
> > Maybe cleaner is to remove above 'goto' and do '} else if (states_maybe_looping' here ?
> 
> I can undo this, it felt cleaner with explicit "skip infinite loop
> check" both for new code and for that async_entry_cnt check above. But
> I can revert to if/else if/else if pattern, though I find it harder to
> follow, given all this code (plus comments) is pretty long, so it's
> easy to lose track when reading

I'm fine either way. I just remembered that you typically try to avoid goto-s,
and seeing a goto here that could have easily been an 'else' raised my internal
alarm that I could be missing something subtle in the code here.

> 
> >
> > > +                         states_equal(env, &sl->state, cur) &&
> > > +                         !iter_active_depths_differ(&sl->state, cur)) {
> > >                               verbose_linfo(env, insn_idx, "; ");
> > >                               verbose(env, "infinite loop detected at insn %d\n", insn_idx);
> > >                               return -EINVAL;

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next 14/17] bpf: implement number iterator
  2023-03-04 23:27     ` Andrii Nakryiko
@ 2023-03-05 23:49       ` Alexei Starovoitov
  0 siblings, 0 replies; 35+ messages in thread
From: Alexei Starovoitov @ 2023-03-05 23:49 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: Andrii Nakryiko, bpf, ast, daniel, kernel-team, Tejun Heo

On Sat, Mar 04, 2023 at 03:27:57PM -0800, Andrii Nakryiko wrote:
> On Sat, Mar 4, 2023 at 12:21 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Thu, Mar 02, 2023 at 03:50:12PM -0800, Andrii Nakryiko wrote:
> > >
> > >  static enum kfunc_ptr_arg_type
> > > @@ -10278,7 +10288,17 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
> > >                       if (is_kfunc_arg_uninit(btf, &args[i]))
> > >                               iter_arg_type |= MEM_UNINIT;
> > >
> > > -                     ret = process_iter_arg(env, regno, insn_idx, iter_arg_type,  meta);
> > > +                     if (meta->func_id == special_kfunc_list[KF_bpf_iter_num_new] ||
> > > +                         meta->func_id == special_kfunc_list[KF_bpf_iter_num_next]) {
> > > +                             iter_arg_type |= ITER_TYPE_NUM;
> > > +                     } else if (meta->func_id == special_kfunc_list[KF_bpf_iter_num_destroy]) {
> > > +                             iter_arg_type |= ITER_TYPE_NUM | OBJ_RELEASE;
> >
> > Since OBJ_RELEASE is set pretty late here and kfuncs are not marked with KF_RELEASE,
> > the arg_type_is_release() in check_func_arg_reg_off() won't trigger.
> 
> yeah, I had troubles with doing this release using existing scheme.
> KF_RELEASE doesn't work, as it makes some extra assumptions about what
> was acquired, it didn't fit iters. And I didn't have a precedent in
> dynptr to learn from, as RINGBUF dynptr is "acquired" and "released"
> using helper. Basically, we don't have dynptr release kfunc yet.
> 
> So I set the OBJ_RELEASE flag for process_iter_arg to do an explicit release.
> 
> I'd appreciate guidance on how to do this cleaner. Naive attempt to
> set KF_ACQUIRE for bpf_iter_num_new() and KF_RELEASE for
> bpf_iter_num_destroy() didn't work.

yep. KF_ACQUIRE and KF_RELEASE don't fit here, since they need the register
to be ref_obj_id-ed, while here it's on the stack.

The current patch is fine. We can generalize iter and dynptr later in a follow-up.

> 
> > So I'm confused why there is:
> > +               if (arg_type_is_iter(arg_type))
> > +                       return 0;
> > in the previous patch.
> > Will it ever trigger?
> 
> maybe not, just followed what dynptr is doing
> 
> >
> > Separate question: What happens when the user does:
> > bpf_iter_destroy(&it);
> > bpf_iter_destroy(&it);
> 
> After the first destroy stack slots are marked STACK_INVALID, so next
> bpf_iter_destroy(&it) will complain about not seeing the initialized
> iterator.
> 
> >
> > +               if (!is_iter_reg_valid_init(env, reg)) {
> > +                       verbose(env, "expected an initialized iter as arg #%d\n", regno);
> > will trigger, right?
> > I didn't find such selftest.
> 
> yep, that's the idea, I just checked, I do have such test, it's in
> iters_state_safety.c:
> 
> __failure __msg("expected an initialized iter as arg #1")
> int double_destroy_fail(void *ctx)

See it now. Thanks for checking.

> There is also next_after_destroy_fail, next_without_new_fail, and
> other obvious error conditions. But it would be good for few people to
> check that with a fresh eye. I added them a long time ago, and might
> have missed something.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next 15/17] selftests/bpf: add bpf_for_each(), bpf_for(), and bpf_repeat() macros
  2023-03-04 23:28     ` Andrii Nakryiko
@ 2023-03-06  0:12       ` Alexei Starovoitov
  2023-03-07 21:54         ` Andrii Nakryiko
  0 siblings, 1 reply; 35+ messages in thread
From: Alexei Starovoitov @ 2023-03-06  0:12 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: Andrii Nakryiko, bpf, ast, daniel, kernel-team, Tejun Heo

On Sat, Mar 04, 2023 at 03:28:03PM -0800, Andrii Nakryiko wrote:
> On Sat, Mar 4, 2023 at 12:34 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Thu, Mar 02, 2023 at 03:50:13PM -0800, Andrii Nakryiko wrote:
> > > Add bpf_for_each(), bpf_for() and bpf_repeat() macros that make writing
> > > open-coded iterator-based loops much more convenient and natural. These
> > > macros utilize the cleanup attribute to ensure proper destruction of the
> > > iterator and, thanks to that, manage to provide ergonomics very close to
> > > the C language for construct. A typical integer loop would look like:
> > >
> > >   int i;
> > >   int arr[N];
> > >
> > >   bpf_for(i, 0, N) {
> > >   /* verifier will know that i >= 0 && i < N, so could be used to
> > >    * directly access array elements with no extra checks
> > >    */
> > >    arr[i] = i;
> > >   }
> > >
> > > bpf_repeat() is very similar, but it doesn't expose the iteration number
> > > and is meant as a simple "repeat action N times":
> > >
> > >   bpf_repeat(N) { /* whatever */ }
> > >
> > > Note that break and continue inside the {} block work as expected.
> > >
> > > bpf_for_each() is a generalization over any kind of BPF open-coded
> > > iterator allowing to use for-each-like approach instead of calling
> > > low-level bpf_iter_<type>_{new,next,destroy}() APIs explicitly. E.g.:
> > >
> > >   struct cgroup *cg;
> > >
> > >   bpf_for_each(cgroup, cg, some, input, args) {
> > >       /* do something with each cg */
> > >   }
> > >
> > > Would call (right now hypothetical) bpf_iter_cgroup_{new,next,destroy}()
> > > functions to form a loop over cgroups, where `some, input, args` are
> > > passed verbatim into constructor as
> > > bpf_iter_cgroup_new(&it, some, input, args).
> > >
> > > As a demonstration, add pyperf variant based on bpf_for() loop.
> > >
> > > Also clean up a few tests that either included the bpf_misc.h header
> > > unnecessarily from user-space or included it before any common types are
> > > defined.
> > >
> > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > > ---
> > >  .../bpf/prog_tests/bpf_verif_scale.c          |  6 ++
> > >  .../bpf/prog_tests/uprobe_autoattach.c        |  1 -
> > >  tools/testing/selftests/bpf/progs/bpf_misc.h  | 76 +++++++++++++++++++
> > >  tools/testing/selftests/bpf/progs/lsm.c       |  4 +-
> > >  tools/testing/selftests/bpf/progs/pyperf.h    | 14 +++-
> > >  .../selftests/bpf/progs/pyperf600_iter.c      |  7 ++
> > >  .../selftests/bpf/progs/pyperf600_nounroll.c  |  3 -
> > >  7 files changed, 101 insertions(+), 10 deletions(-)
> > >  create mode 100644 tools/testing/selftests/bpf/progs/pyperf600_iter.c
> > >
> > > diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c b/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c
> > > index 5ca252823294..731c343897d8 100644
> > > --- a/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c
> > > +++ b/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c
> > > @@ -144,6 +144,12 @@ void test_verif_scale_pyperf600_nounroll()
> > >       scale_test("pyperf600_nounroll.bpf.o", BPF_PROG_TYPE_RAW_TRACEPOINT, false);
> > >  }
> > >
> > > +void test_verif_scale_pyperf600_iter()
> > > +{
> > > +     /* open-coded BPF iterator version */
> > > +     scale_test("pyperf600_iter.bpf.o", BPF_PROG_TYPE_RAW_TRACEPOINT, false);
> > > +}
> > > +
> > >  void test_verif_scale_loop1()
> > >  {
> > >       scale_test("loop1.bpf.o", BPF_PROG_TYPE_RAW_TRACEPOINT, false);
> > > diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c b/tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c
> > > index 6558c857e620..d5b3377aa33c 100644
> > > --- a/tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c
> > > +++ b/tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c
> > > @@ -3,7 +3,6 @@
> > >
> > >  #include <test_progs.h>
> > >  #include "test_uprobe_autoattach.skel.h"
> > > -#include "progs/bpf_misc.h"
> > >
> > >  /* uprobe attach point */
> > >  static noinline int autoattach_trigger_func(int arg1, int arg2, int arg3,
> > > diff --git a/tools/testing/selftests/bpf/progs/bpf_misc.h b/tools/testing/selftests/bpf/progs/bpf_misc.h
> > > index f704885aa534..08a791f307a6 100644
> > > --- a/tools/testing/selftests/bpf/progs/bpf_misc.h
> > > +++ b/tools/testing/selftests/bpf/progs/bpf_misc.h
> > > @@ -75,5 +75,81 @@
> > >  #define FUNC_REG_ARG_CNT 5
> > >  #endif
> > >
> > > +struct bpf_iter;
> > > +
> > > +extern int bpf_iter_num_new(struct bpf_iter *it__uninit, int start, int end) __ksym;
> > > +extern int *bpf_iter_num_next(struct bpf_iter *it) __ksym;
> > > +extern void bpf_iter_num_destroy(struct bpf_iter *it) __ksym;
> > > +
> > > +#ifndef bpf_for_each
> > > +/* bpf_for_each(iter_kind, elem, args...) provides generic construct for using BPF
> > > + * open-coded iterators without having to write mundane explicit low-level
> > > + * loop. Instead, it provides for()-like generic construct that can be used
> > > + * pretty naturally. E.g., for some hypothetical cgroup iterator, you'd write:
> > > + *
> > > + * struct cgroup *cg, *parent_cg = <...>;
> > > + *
> > > + * bpf_for_each(cgroup, cg, parent_cg, CG_ITER_CHILDREN) {
> > > + *     bpf_printk("Child cgroup id = %d", cg->cgroup_id);
> > > + *     if (cg->cgroup_id == 123)
> > > + *         break;
> > > + * }
> > > + *
> > > + * I.e., it looks almost like high-level for each loop in other languages,
> > > + * supports continue/break, and is verifiable by BPF verifier.
> > > + *
> > > + * For iterating integers, the difference between bpf_for_each(num, i, N, M)
> > > + * and bpf_for(i, N, M) is in that bpf_for() provides additional proof to
> > > + * verifier that i is in [N, M) range, and in bpf_for_each() case i is `int
> > > + * *`, not just `int`. So for integers bpf_for() is more convenient.
> > > + */
> > > +#define bpf_for_each(type, cur, args...) for (                                                 \
> > > +     /* initialize and define destructor */                                            \
> > > +     struct bpf_iter ___it __attribute__((cleanup(bpf_iter_##type##_destroy))),        \
> >
> > We should probably say somewhere that it requires C99 with some flag that allows
> > declaring variables inside the loop.
> 
> yes, I'll add a comment. I think cleanup attribute isn't standard as
> well, I'll mention it. This shouldn't be restrictive, though, as we
> expect very modern Clang (or eventually GCC), which definitely will
> support all of that. And I feel like most people don't restrict their
> BPF-side code to strict C89 anyways.

yep. No UX concerns. A comment to manage expectations should be enough.

> >
> > Also what are the rules for attr(cleanup()).
> > When does it get called?
> 
> From GCC documentation:
> 
>   > The cleanup attribute runs a function when the variable goes out of scope.
> 
> So given ___it is bound to for loop, any code path that leads to loop
> exit (so, when condition turns false or *breaking* out of the loop,
> which is why I use cleanup, this was a saving grace for this approach
> to work at all).

+1

> 
> > My understanding is that the visibility of ___it is only within the for() body.
> 
> right
> 
> > So when the prog does:
> > bpf_for(i, 0, 10) sum += i;
> > bpf_for(i, 0, 10) sum += i;
> >
> > the compiler should generate bpf_iter_num_destroy right after each bpf_for() ?
> 
> Conceptually, yes, but see the note about breaking out of the loop
> above. How actual assembly code is generated is beyond our control. If
> the compiler generates multiple separate code paths, each with its own
> destroy, that's fine as well. No assumptions are made in the verifier,
> we just need to see one bpf_iter_<type>_destroy() for each instance of
> iterator.
> 
> > Or will it group them at the end of function body and destroy all iterators ?
> 
> That would be a bug, as documentation states that clean up happens as
> soon as a variable goes out of scope. Delaying clean up could result
> in program logic bugs. I.e., we rely on destructors to be called as
> soon as possible.
> 
> > Will compiler reuse the stack space used by ___it in case there are multiple bpf_for-s ?
> 
> That's the question to compiler developers, but I'd assume that, yes,
> it should. Why not?
> 
> And looking at, for example, iter_pass_iter_ptr_to_subprog which has 4
> sequential bpf_for() loops:
> 
> 0000000000002328 <iter_pass_iter_ptr_to_subprog>:
>     1125:       bf a6 00 00 00 00 00 00 r6 = r10
>     1126:       07 06 00 00 28 ff ff ff r6 += -216    <-- THIS IS ITER
>     1127:       bf 61 00 00 00 00 00 00 r1 = r6
>     1128:       b4 02 00 00 00 00 00 00 w2 = 0
>     1129:       b4 03 00 00 10 00 00 00 w3 = 16
>     1130:       85 10 00 00 ff ff ff ff call -1
>                 0000000000002350:  R_BPF_64_32  bpf_iter_num_new
>     1131:       bf a7 00 00 00 00 00 00 r7 = r10
>     1132:       07 07 00 00 c0 ff ff ff r7 += -64
>     1133:       bf 61 00 00 00 00 00 00 r1 = r6
>     1134:       bf 72 00 00 00 00 00 00 r2 = r7
>     1135:       b4 03 00 00 10 00 00 00 w3 = 16
>     1136:       b4 04 00 00 02 00 00 00 w4 = 2
>     1137:       85 10 00 00 53 00 00 00 call 83
>                 0000000000002388:  R_BPF_64_32  .text
>     1138:       bf 61 00 00 00 00 00 00 r1 = r6
>     1139:       85 10 00 00 ff ff ff ff call -1
>                 0000000000002398:  R_BPF_64_32  bpf_iter_num_destroy
>     1140:       bf 61 00 00 00 00 00 00 r1 = r6     <<--- HERE WE REUSE
>     1141:       b4 02 00 00 00 00 00 00 w2 = 0
>     1142:       b4 03 00 00 20 00 00 00 w3 = 32
>     1143:       85 10 00 00 ff ff ff ff call -1
>                 00000000000023b8:  R_BPF_64_32  bpf_iter_num_new
> 
> Note that r6 is set to fp-216 and is just reused as is for second
> bpf_for loop (second bpf_iter_num_new) call.

Great. Thanks for checking. I was worried that attr(cleanup) would force the
compiler to think that different objects cannot reuse the stack slots, which
would lead to stack exhaustion. Especially with 24-byte bpf_iter-s and our
tiny 512-byte limit.

> >
> > > +     /* ___p pointer is just to call bpf_iter_##type##_new() *once* to init ___it */   \
> > > +                     *___p = (bpf_iter_##type##_new(&___it, ##args),           \
> > > +     /* this is a workaround for Clang bug: it currently doesn't emit BTF */           \
> > > +     /* for bpf_iter_##type##_destroy when used from cleanup() attribute */            \
> > > +                             (void)bpf_iter_##type##_destroy, (void *)0);              \
> > > +     /* iteration and termination check */                                             \
> > > +     ((cur = bpf_iter_##type##_next(&___it)));                                         \
> > > +     /* nothing here  */                                                               \
> > > +)
> > > +#endif /* bpf_for_each */
> > > +
> > > +#ifndef bpf_for
> > > +/* bpf_for(i, start, end) proves to verifier that i is in [start, end) */
> > > +#define bpf_for(i, start, end) for (                                                   \
> > > +     /* initialize and define destructor */                                            \
> > > +     struct bpf_iter ___it __attribute__((cleanup(bpf_iter_num_destroy))),             \
> > > +     /* ___p pointer is necessary to call bpf_iter_num_new() *once* to init ___it */   \
> > > +                     *___p = (bpf_iter_num_new(&___it, (start), (end)),                \
> > > +     /* this is a workaround for Clang bug: it currently doesn't emit BTF */           \
> > > +     /* for bpf_iter_num_destroy when used from cleanup() attribute */                 \
> > > +                             (void)bpf_iter_num_destroy, (void *)0);                   \
> > > +     ({                                                                                \
> > > +             /* iteration step */                                                      \
> > > +             int *___t = bpf_iter_num_next(&___it);                                    \
> > > +             /* termination and bounds check */                                        \
> > > +             (___t && ((i) = *___t, i >= (start) && i < (end)));                       \
> >
> > The i >= (start) && i < (end) is necessary to make sure that the verifier
> > tightens the range of 'i' inside the body of the loop and
> > when the program does arr[i] access the verifier will know that 'i' is within bounds, right?
> 
> yes, it feels like a common pattern, but I was contemplating to add
> bpf_for_uncheck() where we "expose" i value, but don't do check. I
> decided to keep it simple, as most examples actually required bounds
> checks anyways. And for cases where we don't, often bpf_repeat()
> suffices. One other way would be to expose i from bpf_repeat(), just
> no checks.
> 
> One restriction with this approach is that I can't define both `struct
> bpf_iter __it` and `int i` inside for loop, so cur/i has to be
> declared and passed into bpf_for/bpf_for_each by user explicitly. So
> for bpf_repeat() to expose i would require always doing:
> 
> int i;
> 
> bpf_repeat(i, 100) { /* i is set to 0, 1, ..., 99 */ }
> 
> which, if you don't care about iteration number, is a bit of waste. So
> don't know, I'm undecided on bpf_repeat with i.

It feels that the proposed bpf_repeat() without 'i' is cleaner.

> >
> > In such case should we add __builtin_constant_p() check for 'start' and 'end' ?
> 
> that seems to defy the purpose, as if you know start/end statically,
> you might as well just write either unrolled loop or bounded for()
> loop
> 
> > int arr[100];
> > if (var < 100)
> >   bpf_for(i, 0, global_var) sum += arr[i];
> > will fail the verifier and the users might complain of dumb verifier.
> 
> 
> but I'm not following this example, so maybe the answer to above would
> be different if I would. What's var and global_var? 

typo. I meant the same var in both places.

> Are they supposed
> to be the same thing? If yes, why would that bpf_for() loop fail?
> 
> I suspect you are conflating the other pattern I pointed out with:
> 
> int cnt = 0;
> 
> bpf_for_each(...) {
>    if (cnt++ > 100)
>       break;
> }
> 
> It's different, as cnt comes from outside the loop and is updated on
> each iteration. While for
> 
> bpf_for(i, 0, var) {
>    if (i > 100)
>         break;
> }
> 
> it should work fine, as i is originally unknowable, then narrowed to
> *a range* [0, 99]. But that [0, 99] knowledge is part of "inner loop
> body" state, so it will still converge as it's going to be basically
> ignored during the equivalence check on next bpf_iter_num_next() (old
> state didn't know about i yet).
> 
> >
> > Also, if start and end are variables, they can potentially change between bpf_iter_num_new()
> > and each iteration of the loop.
> > __builtin_constant_p() might be too restrictive.
> 
> yep, I think so
> 
> > May be read start/end once, at least?
> 
> I can't do that, as for() allows only one type of variables to be
> defined (note `*___p` hack as well), so there is no place to remember
> start/end, unfortunately...

I see. Abusing bpf_iter like:
for (struct bpf_iter_num __it, *___p, start_end = (struct bpf_iter_num) {start, end};
is probably too ugly just to READ_ONCE(start).
Especially since bpf_iter_num is opaque.

Please add a comment to warn users that if start/end are variables they
better not change during bpf_for().
I guess plain C loop: for (int i = 0; i < j; i++) will go crazy too if 'j' changes.
I'm not sure what standard says here.
Whether compiler has to reload 'j' every iteration or not.

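Not a fix, but a user-side sketch of what such a comment would warn about:
snapshot a bound that might change into a local before the loop, so that
bpf_iter_num_new() and the per-iteration range check see the same value
('global_cnt' is made up, and a volatile read may be warranted if the bound
can really change concurrently):

  int global_cnt;             /* may be updated elsewhere while we loop */

  /* ... inside a BPF program, with bpf_misc.h included ... */
      int i, n, sum = 0;

      n = global_cnt;         /* read the bound once, before bpf_for() */
      bpf_for(i, 0, n)
          sum += i;           /* new() and the [0, n) check agree on n */
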
> So it's a tradeoff. I can drop range validation, but then every
> example and lots of production code would be re-checking these
> conditions.

Understood. Keep it as-is.

> >
> > > +     });                                                                               \
> > > +     /* nothing here  */                                                               \

btw that 'nothing here' comment is distracting when reading this macro.


* Re: [PATCH bpf-next 16/17] selftests/bpf: add iterators tests
  2023-03-04 23:29     ` Andrii Nakryiko
@ 2023-03-06  0:14       ` Alexei Starovoitov
  0 siblings, 0 replies; 35+ messages in thread
From: Alexei Starovoitov @ 2023-03-06  0:14 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: Andrii Nakryiko, bpf, ast, daniel, kernel-team, Tejun Heo

On Sat, Mar 04, 2023 at 03:29:23PM -0800, Andrii Nakryiko wrote:
> On Sat, Mar 4, 2023 at 12:39 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Thu, Mar 02, 2023 at 03:50:14PM -0800, Andrii Nakryiko wrote:
> > > +
> > > +#ifdef REAL_TEST
> >
> > Looks like REAL_TEST is never set.
> >
> > and all bpf_printk-s in tests are never executed, because the tests are 'load-only'
> > to check the verifier?
> >
> > It looks like all of them can be run (once printks are removed and converted to if-s).
> > That would nicely complement patch 17 runners.
> >
> 
> Yes, it's a bit sloppy. I used these also as manual tests during
> development. I did have an ad-hoc test that attaches and triggers
> these programs. And I just manually looked at printk output in
> trace_pipe to confirm it does actually work as expected.
> 
> And I felt sorry to drop all that, so just added that REAL_TEST hack
> to make program code simpler (no extra states for those pid
> conditions), it was simpler to debug verification failures, less
> states to consider.
> 
> I did try to quickly extend RUN_TESTS with the ability to specify a
> callback that will be called on success, but it's not trivial if we
> want to preserve skeletons, so I abandoned that effort, trying to save
> a bit of time. I still want to have RUN_TESTS with ability to specify
> callback in the form of:
> 
> static void on_success(struct <my_skeleton_type> *skel, struct
> bpf_program *prog) {
>     ...
> }
> 
> but it needs more thought and macro magic (or something else), so I
> postponed it and wrote simple number iterator tests in patch #17.

Sounds good to me. Follow up is fine.

> > It can be a follow up, of course.
> 
> yep, let's keep bpf_printks, as they currently serve as consumers of
> variables, preventing the compiler from optimizing loops too much.
> This shouldn't be a problem for verification-only kind of tests. And
> then with RUN_TESTS() additions, we can actually start executing this.

+1
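
For reference, the pattern being kept is roughly the following (a sketch only;
the section and program name are made up, and the usual vmlinux.h /
bpf_helpers.h / bpf_misc.h includes are assumed):

  SEC("raw_tp/sys_enter")
  int iter_loadonly_sketch(const void *ctx)
  {
      int i, sum = 0;

      bpf_for(i, 0, 1000)
          sum += i;

      /* bpf_printk() consumes 'sum', so the compiler can't drop the loop,
       * even though these tests are load-only and never executed
       */
      bpf_printk("sum=%d", sum);
      return 0;
  }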


* Re: [PATCH bpf-next 13/17] bpf: add support for open-coded iterator loops
  2023-03-05 23:46       ` Alexei Starovoitov
@ 2023-03-07 21:54         ` Andrii Nakryiko
  0 siblings, 0 replies; 35+ messages in thread
From: Andrii Nakryiko @ 2023-03-07 21:54 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, bpf, ast, daniel, kernel-team, Tejun Heo

On Sun, Mar 5, 2023 at 3:46 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Sat, Mar 04, 2023 at 03:27:46PM -0800, Andrii Nakryiko wrote:
> > particular type. This will automatically work better for kfuncs as
> > new/next/destroy trios will have the same `struct bpf_iter_<type> *`
> > and it won't be possible to accidentally pass wrong bpf_iter_<type> to
> > wrong new/next/destroy method.
>
> Exactly.
>

cool, I'll change this to `struct bpf_iter_<type>`

[...]

>
> > > > +static bool is_iter_next_insn(struct bpf_verifier_env *env, int insn_idx, int *reg_idx)
> > > > +{
> > > > +     struct bpf_insn *insn = &env->prog->insnsi[insn_idx];
> > > > +     const struct btf_param *args;
> > > > +     const struct btf_type *t;
> > > > +     const struct btf *btf;
> > > > +     int nargs, i;
> > > > +
> > > > +     if (!bpf_pseudo_kfunc_call(insn))
> > > > +             return false;
> > > > +     if (!is_iter_next_kfunc(insn->imm))
> > > > +             return false;
> > > > +
> > > > +     btf = find_kfunc_desc_btf(env, insn->off);
> > > > +     if (IS_ERR(btf))
> > > > +             return false;
> > > > +
> > > > +     t = btf_type_by_id(btf, insn->imm);     /* FUNC */
> > > > +     t = btf_type_by_id(btf, t->type);       /* FUNC_PROTO */
> > > > +
> > > > +     args = btf_params(t);
> > > > +     nargs = btf_vlen(t);
> > > > +     for (i = 0; i < nargs; i++) {
> > > > +             if (is_kfunc_arg_iter(btf, &args[i])) {
> > > > +                     *reg_idx = BPF_REG_1 + i;
> > > > +                     return true;
> > > > +             }
> > > > +     }
> > >
> > > This is some future-proofing ?
> > > The commit log says that all iterators has to in the form:
> > > bpf_iter_<kind>_next(struct bpf_iter* it)
> > > Should we check for one and only arg here instead?
> >
> > Yeah, a bit of generality. For a long time I had an assumption
> > hardcoded about first argument being struct bpf_iter *, but that felt
> > unclean, so I generalized that before submission.
> >
> > But I can't think why we wouldn't just dictate (and enforce) that
> > `struct bpf_iter *` is first. It makes sense, it's clean, and we lose
> > nothing. This is another thing that differs between dynptr and iter,
> > for dynptr such restriction wouldn't make sense.
> >
> > Where would be a good place to enforce this for iter kfuncs?
>
> I would probably just remove is_iter_next_insn() completely, hard code BPF_REG_1
> and add a big comment for now.
>
> In the follow up we can figure out how to:
> BUILD_BUG_ON(!__same_type(bpf_iter_num_next, some canonical proto for iter_next));
>
> Like we do:
>   BUILD_BUG_ON(!__same_type(ops->map_lookup_elem,
>                (void *(*)(struct bpf_map *map, void *key))NULL));
>

I ended up adding proper enforcement of iter kfunc prototypes and
generalizing everything with new KF_ITER_xxx flags. It's a pretty
major change that allows implementing new iterators even easier,
please see v2. It's now possible to define iterators in kernel modules
as well, thanks to not having to update core verifier logic for each
new set of iter functions.

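For context, a rough sketch of the BUILD_BUG_ON()/__same_type() idea mentioned
above; purely illustrative (the typedef and helper names are made up), since
the actual v2 enforcement is BTF-based via the KF_ITER_* flags:

  #include <linux/build_bug.h>
  #include <linux/compiler.h>

  struct bpf_iter_num;                    /* opaque iterator state */

  int *bpf_iter_num_next(struct bpf_iter_num *it);

  /* hypothetical canonical prototype for a numbers-iterator "next" kfunc */
  typedef int *(*iter_num_next_proto_t)(struct bpf_iter_num *it);

  static inline void check_iter_num_proto(void)
  {
      BUILD_BUG_ON(!__same_type(&bpf_iter_num_next,
                                (iter_num_next_proto_t)NULL));
  }
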
> > >
> > > 'depth' part of bpf_reg_state will be checked for equality in regsafe(), right?
> >
> > no, it is explicitly skipped (and it's actually stacksafe(), not
> > regsafe()). I can add explicit comment that we *ignore* depth
>
> Ahh. That's stacksafe() indeed.
> Would be great to add a comment to:
> +                       if (old_reg->iter.type != cur_reg->iter.type ||
> +                           old_reg->iter.state != cur_reg->iter.state ||
> +                           !check_ids(old_reg->ref_obj_id, cur_reg->ref_obj_id, idmap))
> +                               return false;
>
> that depth is explicitly not compared.

ok, added

>
> > I was considering adding a flag to states_equal() whether to check
> > depth or not (that would make iter_active_depths_differ unnecessary),
> > but it doesn't feel right. Any preferences one way or the other?
>
> probably overkill. just comment should be enough.
>

[...]

>
> > >
> > > > +                     }
> > > > +                     /* attempt to detect infinite loop to avoid unnecessary doomed work */
> > > > +                     if (states_maybe_looping(&sl->state, cur) &&
> > >
> > > Maybe cleaner is to remove above 'goto' and do '} else if (states_maybe_looping' here ?
> >
> > I can undo this, it felt cleaner with explicit "skip infinite loop
> > check" both for new code and for that async_entry_cnt check above. But
> > I can revert to if/else if/else if pattern, though I find it harder to
> > follow, given all this code (plus comments) is pretty long, so it's
> > easy to lose track when reading
>
> I'm fine whichever way. I just remembered that you typically try to avoid goto-s
> and seeing goto here that could have easily been 'else' raised my internal alarm
> that I could be missing something subtle in the code here.

Well, it depends, as usual. I certainly don't like something like
"goto process_bpf_exit" in do_check(), which jumps into the middle of
a completely unrelated if/else branch, so there's that. But these
"skip a bunch of checks" gotos are cleaner, IMO, if we are talking
about a bunch of complicated if/else if/else checks. Let's go with
explicit goto for now, we can always revert.



>
> >
> > >
> > > > +                         states_equal(env, &sl->state, cur) &&
> > > > +                         !iter_active_depths_differ(&sl->state, cur)) {
> > > >                               verbose_linfo(env, insn_idx, "; ");
> > > >                               verbose(env, "infinite loop detected at insn %d\n", insn_idx);
> > > >                               return -EINVAL;


* Re: [PATCH bpf-next 15/17] selftests/bpf: add bpf_for_each(), bpf_for(), and bpf_repeat() macros
  2023-03-06  0:12       ` Alexei Starovoitov
@ 2023-03-07 21:54         ` Andrii Nakryiko
  0 siblings, 0 replies; 35+ messages in thread
From: Andrii Nakryiko @ 2023-03-07 21:54 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, bpf, ast, daniel, kernel-team, Tejun Heo

On Sun, Mar 5, 2023 at 4:12 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Sat, Mar 04, 2023 at 03:28:03PM -0800, Andrii Nakryiko wrote:
> > On Sat, Mar 4, 2023 at 12:34 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Thu, Mar 02, 2023 at 03:50:13PM -0800, Andrii Nakryiko wrote:
> > > > Add bpf_for_each(), bpf_for() and bpf_repeat() macros that make writing
> > > > open-coded iterator-based loops much more convenient and natural. These
> > > > macros utilize the cleanup attribute to ensure proper destruction of the
> > > > iterator and thanks to that manage to provide ergonomics very close to the
> > > > C language for() construct. A typical integer loop would look like:
> > > >
> > > >   int i;
> > > >   int arr[N];
> > > >
> > > >   bpf_for(i, 0, N) {
> > > >   /* verifier will know that i >= 0 && i < N, so could be used to
> > > >    * directly access array elements with no extra checks
> > > >    */
> > > >    arr[i] = i;
> > > >   }
> > > >
> > > > bpf_repeat() is very similar, but it doesn't expose the iteration number and
> > > > is meant as a simple "repeat action N times":
> > > >
> > > >   bpf_repeat(N) { /* whatever */ }
> > > >
> > > > Note that break and continue inside the {} block work as expected.
> > > >
> > > > bpf_for_each() is a generalization over any kind of BPF open-coded
> > > > iterator, allowing a for-each-like approach instead of calling
> > > > low-level bpf_iter_<type>_{new,next,destroy}() APIs explicitly. E.g.:
> > > >
> > > >   struct cgroup *cg;
> > > >
> > > >   bpf_for_each(cgroup, cg, some, input, args) {
> > > >       /* do something with each cg */
> > > >   }
> > > >
> > > > Would call (right now hypothetical) bpf_iter_cgroup_{new,next,destroy}()
> > > > functions to form a loop over cgroups, where `some, input, args` are
> > > > passed verbatim into constructor as
> > > > bpf_iter_cgroup_new(&it, some, input, args).
> > > >
> > > > As a demonstration, add pyperf variant based on bpf_for() loop.
> > > >
> > > > Also clean up a few tests that either included the bpf_misc.h header
> > > > unnecessarily from user-space or included it before any common types are
> > > > defined.
> > > >
> > > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > > > ---
> > > >  .../bpf/prog_tests/bpf_verif_scale.c          |  6 ++
> > > >  .../bpf/prog_tests/uprobe_autoattach.c        |  1 -
> > > >  tools/testing/selftests/bpf/progs/bpf_misc.h  | 76 +++++++++++++++++++
> > > >  tools/testing/selftests/bpf/progs/lsm.c       |  4 +-
> > > >  tools/testing/selftests/bpf/progs/pyperf.h    | 14 +++-
> > > >  .../selftests/bpf/progs/pyperf600_iter.c      |  7 ++
> > > >  .../selftests/bpf/progs/pyperf600_nounroll.c  |  3 -
> > > >  7 files changed, 101 insertions(+), 10 deletions(-)
> > > >  create mode 100644 tools/testing/selftests/bpf/progs/pyperf600_iter.c
> > > >
> > > > diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c b/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c
> > > > index 5ca252823294..731c343897d8 100644
> > > > --- a/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c
> > > > +++ b/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c
> > > > @@ -144,6 +144,12 @@ void test_verif_scale_pyperf600_nounroll()
> > > >       scale_test("pyperf600_nounroll.bpf.o", BPF_PROG_TYPE_RAW_TRACEPOINT, false);
> > > >  }
> > > >
> > > > +void test_verif_scale_pyperf600_iter()
> > > > +{
> > > > +     /* open-coded BPF iterator version */
> > > > +     scale_test("pyperf600_iter.bpf.o", BPF_PROG_TYPE_RAW_TRACEPOINT, false);
> > > > +}
> > > > +
> > > >  void test_verif_scale_loop1()
> > > >  {
> > > >       scale_test("loop1.bpf.o", BPF_PROG_TYPE_RAW_TRACEPOINT, false);
> > > > diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c b/tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c
> > > > index 6558c857e620..d5b3377aa33c 100644
> > > > --- a/tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c
> > > > +++ b/tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c
> > > > @@ -3,7 +3,6 @@
> > > >
> > > >  #include <test_progs.h>
> > > >  #include "test_uprobe_autoattach.skel.h"
> > > > -#include "progs/bpf_misc.h"
> > > >
> > > >  /* uprobe attach point */
> > > >  static noinline int autoattach_trigger_func(int arg1, int arg2, int arg3,
> > > > diff --git a/tools/testing/selftests/bpf/progs/bpf_misc.h b/tools/testing/selftests/bpf/progs/bpf_misc.h
> > > > index f704885aa534..08a791f307a6 100644
> > > > --- a/tools/testing/selftests/bpf/progs/bpf_misc.h
> > > > +++ b/tools/testing/selftests/bpf/progs/bpf_misc.h
> > > > @@ -75,5 +75,81 @@
> > > >  #define FUNC_REG_ARG_CNT 5
> > > >  #endif
> > > >
> > > > +struct bpf_iter;
> > > > +
> > > > +extern int bpf_iter_num_new(struct bpf_iter *it__uninit, int start, int end) __ksym;
> > > > +extern int *bpf_iter_num_next(struct bpf_iter *it) __ksym;
> > > > +extern void bpf_iter_num_destroy(struct bpf_iter *it) __ksym;
> > > > +
> > > > +#ifndef bpf_for_each
> > > > +/* bpf_for_each(iter_kind, elem, args...) provides a generic construct for using BPF
> > > > + * open-coded iterators without having to write a mundane explicit low-level
> > > > + * loop. Instead, it provides a for()-like generic construct that can be used
> > > > + * pretty naturally. E.g., for some hypothetical cgroup iterator, you'd write:
> > > > + *
> > > > + * struct cgroup *cg, *parent_cg = <...>;
> > > > + *
> > > > + * bpf_for_each(cgroup, cg, parent_cg, CG_ITER_CHILDREN) {
> > > > + *     bpf_printk("Child cgroup id = %d", cg->cgroup_id);
> > > > + *     if (cg->cgroup_id == 123)
> > > > + *         break;
> > > > + * }
> > > > + *
> > > > + * I.e., it looks almost like a high-level for-each loop in other languages,
> > > > + * supports continue/break, and is verifiable by BPF verifier.
> > > > + *
> > > > + * For iterating integers, the difference between bpf_for_each(num, i, N, M)
> > > > + * and bpf_for(i, N, M) is that bpf_for() provides an additional proof to the
> > > > + * verifier that i is in the [N, M) range, and in the bpf_for_each() case i is
> > > > + * `int *`, not just `int`. So for integers bpf_for() is more convenient.
> > > > + */
> > > > +#define bpf_for_each(type, cur, args...) for (                                                 \
> > > > +     /* initialize and define destructor */                                            \
> > > > +     struct bpf_iter ___it __attribute__((cleanup(bpf_iter_##type##_destroy))),        \
> > >
> > > We should probably say somewhere that it requires C99 with some flag that allows
> > > declaring variables inside the loop.
> >
> > yes, I'll add a comment. I think cleanup attribute isn't standard as
> > well, I'll mention it. This shouldn't be restrictive, though, as we
> > expect very modern Clang (or eventually GCC), which definitely will
> > support all of that. And I feel like most people don't restrict their
> > BPF-side code to strict C89 anyways.
>
> yep. No UX concerns. A comment to manage expectations should be enough.
>

added


> > >
> > > Also what are the rules for attr(cleanup()).
> > > When does it get called?
> >
> > From GCC documentation:
> >
> >   > The cleanup attribute runs a function when the variable goes out of scope.
> >
> > So given ___it is bound to the for loop, any code path that leads to loop
> > exit (either the condition turning false or *breaking* out of the loop)
> > runs the destructor, which is why I use cleanup; this was the saving grace
> > for this approach to work at all.
>
> +1
>
> >
> > > My understanding that the visibility of ___it is only within for() body.
> >
> > right
> >
> > > So when the prog does:
> > > bpf_for(i, 0, 10) sum += i;
> > > bpf_for(i, 0, 10) sum += i;
> > >
> > > the compiler should generate bpf_iter_num_destroy right after each bpf_for() ?
> >
> > Conceptually, yes, but see the note about breaking out of the loop
> > above. How actual assembly code is generated is beyond our control. If
> > the compiler generates multiple separate code paths, each with its own
> > destroy, that's fine as well. No assumptions are made in the verifier,
> > we just need to see one bpf_iter_<type>_destroy() for each instance of
> > iterator.
> >

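As a tiny stand-alone illustration of that cleanup() behaviour (plain
userspace C; the file name and helper are made up):

  #include <fcntl.h>
  #include <unistd.h>

  static void close_fd(int *fd)
  {
      if (*fd >= 0)
          close(*fd);
  }

  void cleanup_demo(void)
  {
      int fd __attribute__((cleanup(close_fd))) = open("/tmp/demo", O_RDONLY);

      if (fd < 0)
          return;     /* close_fd(&fd) still runs here (and is a no-op) */

      /* ... use fd ...
       * close_fd(&fd) runs on every path out of this scope, including early
       * returns and breaks; this is the same property bpf_for()/bpf_for_each()
       * rely on to guarantee bpf_iter_<type>_destroy() is always called
       */
  }
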
[...]

> > One restriction with this approach is that I can't define both `struct
> > bpf_iter __it` and `int i` inside for loop, so cur/i has to be
> > declared and passed into bpf_for/bpf_for_each by user explicitly. So
> > for bpf_repeat() to expose i would require always doing:
> >
> > int i;
> >
> > bpf_repeat(i, 100) { /* i is set to 0, 1, ..., 99 */ }
> >
> > which, if you don't care about iteration number, is a bit of waste. So
> > don't know, I'm undecided on bpf_repeat with i.
>
> It feels that the proposed bpf_repeat() without 'i' is cleaner.
>

agreed, I'm keeping it as is

> > >
> > > In such case should we add __builtin_constant_p() check for 'start' and 'end' ?
> >
> > that seems to defy the purpose, as if you know start/end statically,
> > you might as well just write either unrolled loop or bounded for()
> > loop
> >
> > > int arr[100];
> > > if (var < 100)
> > >   bpf_for(i, 0, global_var) sum += arr[i];
> > > will fail the verifier and the users might complain of dumb verifier.
> >
> >

[...]

> >
> > I can't do that, as for() allows only one type of variables to be
> > defined (note `*___p` hack as well), so there is no place to remember
> > start/end, unfortunately...
>
> I see. Abusing bpf_iter like:
> for (struct bpf_iter_num __it, *___p, start_end = (struct bpf_iter_num) {start, end};
> is probably too ugly just to READ_ONCE(start).
> Especially since bpf_iter_num is opaque.

yeah, that seems like too much

>
> Please add a comment to warn users that if start/end are variables they
> better not change during bpf_for().
> I guess plain C loop: for (int i = 0; i < j; i++) will go crazy too if 'j' changes.
> I'm not sure what standard says here.
> Whether compiler has to reload 'j' every iteration or not.

no idea, but yep, will add a comment

>
> > So it's a tradeoff. I can drop range validation, but then every
> > example and lots of production code would be re-checking these
> > conditions.
>
> Understood. Keep it as-is.
>
> > >
> > > > +     });                                                                               \
> > > > +     /* nothing here  */                                                               \
>
> btw that 'nothing here' comment is distracting when reading this macro.

ok, dropped this


end of thread, other threads:[~2023-03-07 21:55 UTC | newest]

Thread overview: 35+ messages
2023-03-02 23:49 [PATCH bpf-next 00/17] BPF open-coded iterators Andrii Nakryiko
2023-03-02 23:49 ` [PATCH bpf-next 01/17] bpf: improve stack slot state printing Andrii Nakryiko
2023-03-02 23:50 ` [PATCH bpf-next 02/17] bpf: improve regsafe() checks for PTR_TO_{MEM,BUF,TP_BUFFER} Andrii Nakryiko
2023-03-02 23:50 ` [PATCH bpf-next 03/17] selftests/bpf: enhance align selftest's expected log matching Andrii Nakryiko
2023-03-02 23:50 ` [PATCH bpf-next 04/17] bpf: honor env->test_state_freq flag in is_state_visited() Andrii Nakryiko
2023-03-02 23:50 ` [PATCH bpf-next 05/17] selftests/bpf: adjust log_fixup's buffer size for proper truncation Andrii Nakryiko
2023-03-02 23:50 ` [PATCH bpf-next 06/17] bpf: clean up visit_insn()'s instruction processing Andrii Nakryiko
2023-03-02 23:50 ` [PATCH bpf-next 07/17] bpf: fix visit_insn()'s detection of BPF_FUNC_timer_set_callback helper Andrii Nakryiko
2023-03-02 23:50 ` [PATCH bpf-next 08/17] bpf: ensure that r0 is marked scratched after any function call Andrii Nakryiko
2023-03-02 23:50 ` [PATCH bpf-next 09/17] bpf: move kfunc_call_arg_meta higher in the file Andrii Nakryiko
2023-03-02 23:50 ` [PATCH bpf-next 10/17] bpf: mark PTR_TO_MEM as non-null register type Andrii Nakryiko
2023-03-02 23:50 ` [PATCH bpf-next 11/17] bpf: generalize dynptr_get_spi to be usable for iters Andrii Nakryiko
2023-03-02 23:50 ` [PATCH bpf-next 12/17] bpf: add support for fixed-size memory pointer returns for kfuncs Andrii Nakryiko
2023-03-02 23:50 ` [PATCH bpf-next 13/17] bpf: add support for open-coded iterator loops Andrii Nakryiko
2023-03-04 20:02   ` Alexei Starovoitov
2023-03-04 23:27     ` Andrii Nakryiko
2023-03-05 23:46       ` Alexei Starovoitov
2023-03-07 21:54         ` Andrii Nakryiko
2023-03-02 23:50 ` [PATCH bpf-next 14/17] bpf: implement number iterator Andrii Nakryiko
2023-03-04 20:21   ` Alexei Starovoitov
2023-03-04 23:27     ` Andrii Nakryiko
2023-03-05 23:49       ` Alexei Starovoitov
2023-03-02 23:50 ` [PATCH bpf-next 15/17] selftests/bpf: add bpf_for_each(), bpf_for(), and bpf_repeat() macros Andrii Nakryiko
2023-03-04 20:34   ` Alexei Starovoitov
2023-03-04 23:28     ` Andrii Nakryiko
2023-03-06  0:12       ` Alexei Starovoitov
2023-03-07 21:54         ` Andrii Nakryiko
2023-03-02 23:50 ` [PATCH bpf-next 16/17] selftests/bpf: add iterators tests Andrii Nakryiko
2023-03-04 20:39   ` Alexei Starovoitov
2023-03-04 23:29     ` Andrii Nakryiko
2023-03-06  0:14       ` Alexei Starovoitov
2023-03-04 21:09   ` Jiri Olsa
2023-03-04 23:29     ` Andrii Nakryiko
2023-03-02 23:50 ` [PATCH bpf-next 17/17] selftests/bpf: add number iterator tests Andrii Nakryiko
2023-03-04 19:30 ` [PATCH bpf-next 00/17] BPF open-coded iterators patchwork-bot+netdevbpf
