bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 bpf-next 0/2] bpf: Add a generic bits iterator
@ 2024-02-25 10:06 Yafang Shao
  2024-02-25 10:06 ` [PATCH v2 bpf-next 1/2] bpf: Add " Yafang Shao
  2024-02-25 10:06 ` [PATCH v2 bpf-next 2/2] selftests/bpf: Add selftest for bits iter Yafang Shao
  0 siblings, 2 replies; 11+ messages in thread
From: Yafang Shao @ 2024-02-25 10:06 UTC (permalink / raw)
  To: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa
  Cc: bpf, Yafang Shao

Three new kfuncs, namely bpf_iter_bits_{new,next,destroy}, have been
added for the new bpf_iter_bits functionality. These kfuncs enable the
iteration of the bits from a given address and a given number of bits.

- bpf_iter_bits_new
  Initialize a new bits iterator for a given memory area. Due to the
  limitation of bpf memalloc, the max number of bits to be iterated
  over is (4096 * 8).
- bpf_iter_bits_next
  Get the next bit in a bpf_iter_bits
- bpf_iter_bits_destroy
  Destroy a bpf_iter_bits

The bits iterator can be used in any context and on any address.

In our specific use case, we leverage the cgroup iterator to traverse
percpu data, subsequently exposing it to userspace through a seq file.
Refer to example in patch #2 for the usage.

Changes:
- v1->v2:
  - Simplify the CPU number verification code to avoid the failure on s390x
    (Eduard)
- bpf: Add bpf_iter_cpumask
  https://lwn.net/Articles/961104/
- bpf: Add new bpf helper bpf_for_each_cpu
  https://lwn.net/Articles/939939/

Yafang Shao (2):
  bpf: Add bits iterator
  selftests/bpf: Add selftest for bits iter

 kernel/bpf/helpers.c                          | 100 +++++++++++++
 .../selftests/bpf/prog_tests/bits_iter.c      | 132 ++++++++++++++++++
 .../bpf/progs/test_bits_iter_failure.c        |  54 +++++++
 .../bpf/progs/test_bits_iter_success.c        | 112 +++++++++++++++
 4 files changed, 398 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/bits_iter.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_bits_iter_failure.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_bits_iter_success.c

-- 
2.39.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2 bpf-next 1/2] bpf: Add bits iterator
  2024-02-25 10:06 [PATCH v2 bpf-next 0/2] bpf: Add a generic bits iterator Yafang Shao
@ 2024-02-25 10:06 ` Yafang Shao
  2024-02-28  1:24   ` Andrii Nakryiko
  2024-02-25 10:06 ` [PATCH v2 bpf-next 2/2] selftests/bpf: Add selftest for bits iter Yafang Shao
  1 sibling, 1 reply; 11+ messages in thread
From: Yafang Shao @ 2024-02-25 10:06 UTC (permalink / raw)
  To: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa
  Cc: bpf, Yafang Shao

Add three new kfuncs for the bits iterator:
- bpf_iter_bits_new
  Initialize a new bits iterator for a given memory area. Due to the
  limitation of bpf memalloc, the max number of bits that can be iterated
  over is limited to (4096 * 8).
- bpf_iter_bits_next
  Get the next bit in a bpf_iter_bits
- bpf_iter_bits_destroy
  Destroy a bpf_iter_bits

The bits iterator facilitates the iteration of the bits of a memory area,
such as cpumask. It can be used in any context and on any address.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 kernel/bpf/helpers.c | 100 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 100 insertions(+)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 93edf730d288..052f63891834 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2542,6 +2542,103 @@ __bpf_kfunc void bpf_throw(u64 cookie)
 	WARN(1, "A call to BPF exception callback should never return\n");
 }
 
+struct bpf_iter_bits {
+	__u64 __opaque[2];
+} __aligned(8);
+
+struct bpf_iter_bits_kern {
+	unsigned long *bits;
+	u32 nr_bits;
+	int bit;
+} __aligned(8);
+
+/**
+ * bpf_iter_bits_new() - Initialize a new bits iterator for a given memory area
+ * @it: The new bpf_iter_bits to be created
+ * @unsafe_ptr__ign: A ponter pointing to a memory area to be iterated over
+ * @nr_bits: The number of bits to be iterated over. Due to the limitation of
+ * memalloc, it can't greater than (4096 * 8).
+ *
+ * This function initializes a new bpf_iter_bits structure for iterating over
+ * a memory area which is specified by the @unsafe_ptr__ign and @nr_bits. It
+ * copy the data of the memory area to the newly created bpf_iter_bits @it for
+ * subsequent iteration operations.
+ *
+ * On success, 0 is returned. On failure, ERR is returned.
+ */
+__bpf_kfunc int
+bpf_iter_bits_new(struct bpf_iter_bits *it, const void *unsafe_ptr__ign, u32 nr_bits)
+{
+	struct bpf_iter_bits_kern *kit = (void *)it;
+	u32 size = BITS_TO_BYTES(nr_bits);
+	int err;
+
+	BUILD_BUG_ON(sizeof(struct bpf_iter_bits_kern) != sizeof(struct bpf_iter_bits));
+	BUILD_BUG_ON(__alignof__(struct bpf_iter_bits_kern) !=
+		     __alignof__(struct bpf_iter_bits));
+
+	if (!unsafe_ptr__ign || !nr_bits) {
+		kit->bits = NULL;
+		return -EINVAL;
+	}
+
+	kit->bits = bpf_mem_alloc(&bpf_global_ma, size);
+	if (!kit->bits)
+		return -ENOMEM;
+
+	err = bpf_probe_read_kernel_common(kit->bits, size, unsafe_ptr__ign);
+	if (err) {
+		bpf_mem_free(&bpf_global_ma, kit->bits);
+		kit->bits = NULL;
+		return err;
+	}
+
+	kit->nr_bits = nr_bits;
+	kit->bit = -1;
+	return 0;
+}
+
+/**
+ * bpf_iter_bits_next() - Get the next bit in a bpf_iter_bits
+ * @it: The bpf_iter_bits to be checked
+ *
+ * This function returns a pointer to a number representing the value of the
+ * next bit in the bits.
+ *
+ * If there are no further bit available, it returns NULL.
+ */
+__bpf_kfunc int *bpf_iter_bits_next(struct bpf_iter_bits *it)
+{
+	struct bpf_iter_bits_kern *kit = (void *)it;
+	const unsigned long *bits = kit->bits;
+	int bit;
+
+	if (!bits)
+		return NULL;
+
+	bit = find_next_bit(bits, kit->nr_bits, kit->bit + 1);
+	if (bit >= kit->nr_bits)
+		return NULL;
+
+	kit->bit = bit;
+	return &kit->bit;
+}
+
+/**
+ * bpf_iter_bits_destroy() - Destroy a bpf_iter_bits
+ * @it: The bpf_iter_bits to be destroyed
+ *
+ * Destroy the resource associated with the bpf_iter_bits.
+ */
+__bpf_kfunc void bpf_iter_bits_destroy(struct bpf_iter_bits *it)
+{
+	struct bpf_iter_bits_kern *kit = (void *)it;
+
+	if (!kit->bits)
+		return;
+	bpf_mem_free(&bpf_global_ma, kit->bits);
+}
+
 __bpf_kfunc_end_defs();
 
 BTF_KFUNCS_START(generic_btf_ids)
@@ -2618,6 +2715,9 @@ BTF_ID_FLAGS(func, bpf_dynptr_is_null)
 BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
 BTF_ID_FLAGS(func, bpf_dynptr_size)
 BTF_ID_FLAGS(func, bpf_dynptr_clone)
+BTF_ID_FLAGS(func, bpf_iter_bits_new, KF_ITER_NEW)
+BTF_ID_FLAGS(func, bpf_iter_bits_next, KF_ITER_NEXT | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_iter_bits_destroy, KF_ITER_DESTROY)
 BTF_KFUNCS_END(common_btf_ids)
 
 static const struct btf_kfunc_id_set common_kfunc_set = {
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 bpf-next 2/2] selftests/bpf: Add selftest for bits iter
  2024-02-25 10:06 [PATCH v2 bpf-next 0/2] bpf: Add a generic bits iterator Yafang Shao
  2024-02-25 10:06 ` [PATCH v2 bpf-next 1/2] bpf: Add " Yafang Shao
@ 2024-02-25 10:06 ` Yafang Shao
  1 sibling, 0 replies; 11+ messages in thread
From: Yafang Shao @ 2024-02-25 10:06 UTC (permalink / raw)
  To: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa
  Cc: bpf, Yafang Shao

Add selftests for the newly added bits iter.
- bits_iter_success
  - percpu data extracted from the percpu struct should be expected
  - RCU lock is not required
  - It is fine without calling bpf_iter_cpumask_next()
  - It can work as expected when invalid arguments are passed

- bits_iter_failure
  - bpf_iter_bits_destroy() is required after calling
    bpf_iter_bits_new()
  - bpf_iter_bits_destroy() can only destroy an initialized iter
  - bpf_iter_bits_next() must use an initialized iter

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 .../selftests/bpf/prog_tests/bits_iter.c      | 132 ++++++++++++++++++
 .../bpf/progs/test_bits_iter_failure.c        |  54 +++++++
 .../bpf/progs/test_bits_iter_success.c        | 112 +++++++++++++++
 3 files changed, 298 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/bits_iter.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_bits_iter_failure.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_bits_iter_success.c

diff --git a/tools/testing/selftests/bpf/prog_tests/bits_iter.c b/tools/testing/selftests/bpf/prog_tests/bits_iter.c
new file mode 100644
index 000000000000..8148b8991276
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/bits_iter.c
@@ -0,0 +1,132 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2024 Yafang Shao <laoar.shao@gmail.com> */
+
+#define _GNU_SOURCE
+#include <sched.h>
+
+#include <test_progs.h>
+#include "test_bits_iter_success.skel.h"
+#include "test_bits_iter_failure.skel.h"
+#include "cgroup_helpers.h"
+
+static const char * const positive_testcases[] = {
+	"cpumask_iter",
+};
+
+static const char * const negative_testcases[] = {
+	"null_pointer",
+	"zero_bit",
+	"no_mem",
+};
+
+static int read_percpu_data(struct bpf_link *link)
+{
+	int iter_fd, len;
+	char buf[128];
+	size_t left;
+	char *p;
+
+	iter_fd = bpf_iter_create(bpf_link__fd(link));
+	if (!ASSERT_GE(iter_fd, 0, "iter_fd"))
+		return -1;
+
+	memset(buf, 0, sizeof(buf));
+	left = ARRAY_SIZE(buf);
+	p = buf;
+	while ((len = read(iter_fd, p, left)) > 0) {
+		p += len;
+		left -= len;
+	}
+
+	close(iter_fd);
+	return 0;
+}
+
+static void verify_iter_success(const char *prog_name, bool negative)
+{
+	DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
+	struct test_bits_iter_success *skel;
+	union bpf_iter_link_info linfo;
+	struct bpf_program *prog;
+	struct bpf_link *link;
+	int cgrp_fd, err, i;
+	cpu_set_t set;
+
+	if (setup_cgroup_environment())
+		return;
+
+	/* Utilize the cgroup iter */
+	cgrp_fd = get_root_cgroup();
+	if (!ASSERT_GE(cgrp_fd, 0, "create_cgrp"))
+		goto cleanup;
+
+	skel = test_bits_iter_success__open();
+	if (!ASSERT_OK_PTR(skel, "cpumask_iter_success__open"))
+		goto close_fd;
+
+	skel->bss->pid = getpid();
+
+	err = test_bits_iter_success__load(skel);
+	if (!ASSERT_OK(err, "cpumask_iter_success__load"))
+		goto destroy;
+
+	prog = bpf_object__find_program_by_name(skel->obj, prog_name);
+	if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
+		goto destroy;
+
+	memset(&linfo, 0, sizeof(linfo));
+	linfo.cgroup.cgroup_fd = cgrp_fd;
+	linfo.cgroup.order = BPF_CGROUP_ITER_SELF_ONLY;
+	opts.link_info = &linfo;
+	opts.link_info_len = sizeof(linfo);
+	link = bpf_program__attach_iter(prog, &opts);
+	if (!ASSERT_OK_PTR(link, "bpf_program__attach"))
+		goto destroy;
+
+	if (negative)
+		goto negative;
+
+	CPU_ZERO(&set);
+	for (i = 0; i < libbpf_num_possible_cpus(); i++)
+		CPU_SET(i, &set);
+	err = sched_setaffinity(skel->bss->pid, sizeof(set), &set);
+	if (!ASSERT_OK(err, "setaffinity_all_cpus"))
+		goto free_link;
+	err = read_percpu_data(link);
+	if (!ASSERT_OK(err, "read_percpu_data"))
+		goto free_link;
+
+negative:
+	ASSERT_OK(skel->bss->err, "not_zero");
+
+free_link:
+	bpf_link__destroy(link);
+destroy:
+	test_bits_iter_success__destroy(skel);
+close_fd:
+	close(cgrp_fd);
+cleanup:
+	cleanup_cgroup_environment();
+}
+
+void test_bits_iter(void)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(positive_testcases); i++) {
+		if (!test__start_subtest(positive_testcases[i]))
+			continue;
+
+		verify_iter_success(positive_testcases[i], true);
+	}
+
+	for (i = 0; i < ARRAY_SIZE(negative_testcases); i++) {
+		if (!test__start_subtest(negative_testcases[i]))
+			continue;
+
+		verify_iter_success(negative_testcases[i], true);
+	}
+
+	RUN_TESTS(test_bits_iter_success);
+	RUN_TESTS(test_bits_iter_failure);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_bits_iter_failure.c b/tools/testing/selftests/bpf/progs/test_bits_iter_failure.c
new file mode 100644
index 000000000000..974d0b7a540e
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_bits_iter_failure.c
@@ -0,0 +1,54 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2024 Yafang Shao <laoar.shao@gmail.com> */
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+#include "bpf_misc.h"
+#include "task_kfunc_common.h"
+
+char _license[] SEC("license") = "GPL";
+
+int bpf_iter_bits_new(struct bpf_iter_bits *it, const void *unsafe_ptr__ign,
+		      u32 nr_bits) __ksym __weak;
+int *bpf_iter_bits_next(struct bpf_iter_bits *it) __ksym __weak;
+void bpf_iter_bits_destroy(struct bpf_iter_bits *it) __ksym __weak;
+
+SEC("iter.s/cgroup")
+__failure __msg("Unreleased reference id=3 alloc_insn=10")
+int BPF_PROG(no_destroy, struct bpf_iter_meta *meta, struct cgroup *cgrp)
+{
+	struct bpf_iter_bits it;
+	struct task_struct *p;
+
+	p = bpf_task_from_pid(1);
+	if (!p)
+		return 1;
+
+	bpf_iter_bits_new(&it, p->cpus_ptr, 8192);
+
+	bpf_iter_bits_next(&it);
+	bpf_task_release(p);
+	return 0;
+}
+
+SEC("iter/cgroup")
+__failure __msg("expected an initialized iter_bits as arg #1")
+int BPF_PROG(next_uninit, struct bpf_iter_meta *meta, struct cgroup *cgrp)
+{
+	struct bpf_iter_bits *it = NULL;
+
+	bpf_iter_bits_next(it);
+	return 0;
+}
+
+SEC("iter/cgroup")
+__failure __msg("expected an initialized iter_bits as arg #1")
+int BPF_PROG(destroy_uninit, struct bpf_iter_meta *meta, struct cgroup *cgrp)
+{
+	struct bpf_iter_bits it = {};
+
+	bpf_iter_bits_destroy(&it);
+	return 0;
+}
diff --git a/tools/testing/selftests/bpf/progs/test_bits_iter_success.c b/tools/testing/selftests/bpf/progs/test_bits_iter_success.c
new file mode 100644
index 000000000000..aaa14e491c0c
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_bits_iter_success.c
@@ -0,0 +1,112 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2024 Yafang Shao <laoar.shao@gmail.com> */
+
+#include "vmlinux.h"
+#include <linux/const.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+#include "task_kfunc_common.h"
+
+char _license[] SEC("license") = "GPL";
+
+extern const struct rq runqueues __ksym __weak;
+
+int bpf_iter_bits_new(struct bpf_iter_bits *it, const void *unsafe_ptr__ign,
+		      u32 nr_bits) __ksym __weak;
+int *bpf_iter_bits_next(struct bpf_iter_bits *it) __ksym __weak;
+void bpf_iter_bits_destroy(struct bpf_iter_bits *it) __ksym __weak;
+
+int pid, err;
+
+SEC("iter.s/cgroup")
+int BPF_PROG(cpumask_iter, struct bpf_iter_meta *meta, struct cgroup *cgrp)
+{
+	struct task_struct *p;
+	u32 nr_running = 0;
+	struct rq *rq;
+	int *cpu;
+
+	/* epilogue */
+	if (!cgrp)
+		return 0;
+
+	p = bpf_task_from_pid(pid);
+	if (!p)
+		return 1;
+
+	bpf_for_each(bits, cpu, p->cpus_ptr, 8192) {
+		rq = (struct rq *)bpf_per_cpu_ptr(&runqueues, *cpu);
+		/* Each valid CPU must have a runqueue, even if it is offline. */
+		if (!rq)
+			break;
+
+		nr_running += rq->nr_running;
+	}
+	if (nr_running == 0)
+		err = 1;
+	bpf_task_release(p);
+	return 0;
+}
+
+SEC("iter.s/cgroup")
+int BPF_PROG(null_pointer, struct bpf_iter_meta *meta, struct cgroup *cgrp)
+{
+	int *cpu;
+
+	bpf_for_each(bits, cpu, NULL, 8192)
+		err++;
+	return 0;
+}
+
+SEC("iter.s/cgroup")
+int BPF_PROG(zero_bit, struct bpf_iter_meta *meta, struct cgroup *cgrp)
+{
+	struct task_struct *p;
+	int *cpu;
+
+	p = bpf_task_from_pid(pid);
+	if (!p)
+		return 1;
+
+	bpf_for_each(bits, cpu, p->cpus_ptr, 0)
+		err++;
+	bpf_task_release(p);
+	return 0;
+}
+
+SEC("iter.s/cgroup")
+int BPF_PROG(no_mem, struct bpf_iter_meta *meta, struct cgroup *cgrp)
+{
+	struct task_struct *p;
+	int *cpu;
+
+	p = bpf_task_from_pid(pid);
+	if (!p)
+		return 1;
+
+	/* The max size of memalloc is 4096, so it will fail to allocate (8192 * 8) */
+	bpf_for_each(bits, cpu, p->cpus_ptr, 8192 * 8)
+		err++;
+	bpf_task_release(p);
+	return 0;
+}
+
+SEC("iter.s/cgroup")
+int BPF_PROG(no_next, struct bpf_iter_meta *meta, struct cgroup *cgrp)
+{
+	struct bpf_iter_bits it;
+	struct task_struct *p;
+
+	p = bpf_task_from_pid(1);
+	if (!p)
+		return 1;
+
+	bpf_iter_bits_new(&it, p->cpus_ptr, 8192);
+
+	/* It is fine without calling bpf_iter_bits_next(). */
+
+	bpf_iter_bits_destroy(&it);
+	bpf_task_release(p);
+	return 0;
+}
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 bpf-next 1/2] bpf: Add bits iterator
  2024-02-25 10:06 ` [PATCH v2 bpf-next 1/2] bpf: Add " Yafang Shao
@ 2024-02-28  1:24   ` Andrii Nakryiko
  2024-02-28  2:25     ` Yafang Shao
  0 siblings, 1 reply; 11+ messages in thread
From: Andrii Nakryiko @ 2024-02-28  1:24 UTC (permalink / raw)
  To: Yafang Shao
  Cc: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, bpf

On Sun, Feb 25, 2024 at 2:07 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> Add three new kfuncs for the bits iterator:
> - bpf_iter_bits_new
>   Initialize a new bits iterator for a given memory area. Due to the
>   limitation of bpf memalloc, the max number of bits that can be iterated
>   over is limited to (4096 * 8).
> - bpf_iter_bits_next
>   Get the next bit in a bpf_iter_bits
> - bpf_iter_bits_destroy
>   Destroy a bpf_iter_bits
>
> The bits iterator facilitates the iteration of the bits of a memory area,
> such as cpumask. It can be used in any context and on any address.
>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---
>  kernel/bpf/helpers.c | 100 +++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 100 insertions(+)
>
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 93edf730d288..052f63891834 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -2542,6 +2542,103 @@ __bpf_kfunc void bpf_throw(u64 cookie)
>         WARN(1, "A call to BPF exception callback should never return\n");
>  }
>
> +struct bpf_iter_bits {
> +       __u64 __opaque[2];
> +} __aligned(8);
> +
> +struct bpf_iter_bits_kern {
> +       unsigned long *bits;
> +       u32 nr_bits;
> +       int bit;
> +} __aligned(8);
> +
> +/**
> + * bpf_iter_bits_new() - Initialize a new bits iterator for a given memory area
> + * @it: The new bpf_iter_bits to be created
> + * @unsafe_ptr__ign: A ponter pointing to a memory area to be iterated over
> + * @nr_bits: The number of bits to be iterated over. Due to the limitation of
> + * memalloc, it can't greater than (4096 * 8).
> + *
> + * This function initializes a new bpf_iter_bits structure for iterating over
> + * a memory area which is specified by the @unsafe_ptr__ign and @nr_bits. It
> + * copy the data of the memory area to the newly created bpf_iter_bits @it for
> + * subsequent iteration operations.
> + *
> + * On success, 0 is returned. On failure, ERR is returned.
> + */
> +__bpf_kfunc int
> +bpf_iter_bits_new(struct bpf_iter_bits *it, const void *unsafe_ptr__ign, u32 nr_bits)
> +{
> +       struct bpf_iter_bits_kern *kit = (void *)it;
> +       u32 size = BITS_TO_BYTES(nr_bits);
> +       int err;
> +
> +       BUILD_BUG_ON(sizeof(struct bpf_iter_bits_kern) != sizeof(struct bpf_iter_bits));
> +       BUILD_BUG_ON(__alignof__(struct bpf_iter_bits_kern) !=
> +                    __alignof__(struct bpf_iter_bits));
> +
> +       if (!unsafe_ptr__ign || !nr_bits) {
> +               kit->bits = NULL;
> +               return -EINVAL;
> +       }
> +
> +       kit->bits = bpf_mem_alloc(&bpf_global_ma, size);
> +       if (!kit->bits)
> +               return -ENOMEM;

it's probably going to be a pretty common case to do bits iteration
for nr_bits<=64, right? So as an optimization, instead of doing
bpf_mem_alloc() for this case, you can just copy up to 8 bytes and
store it in a union of `unsigned long *bits` and `unsigned long
bits_copy`. As a performance optimization (and to reduce dependency on
memory allocation). WDYT?

> +
> +       err = bpf_probe_read_kernel_common(kit->bits, size, unsafe_ptr__ign);
> +       if (err) {
> +               bpf_mem_free(&bpf_global_ma, kit->bits);
> +               kit->bits = NULL;
> +               return err;
> +       }
> +
> +       kit->nr_bits = nr_bits;
> +       kit->bit = -1;
> +       return 0;
> +}
> +
> +/**
> + * bpf_iter_bits_next() - Get the next bit in a bpf_iter_bits
> + * @it: The bpf_iter_bits to be checked
> + *
> + * This function returns a pointer to a number representing the value of the
> + * next bit in the bits.
> + *
> + * If there are no further bit available, it returns NULL.
> + */
> +__bpf_kfunc int *bpf_iter_bits_next(struct bpf_iter_bits *it)
> +{
> +       struct bpf_iter_bits_kern *kit = (void *)it;
> +       const unsigned long *bits = kit->bits;
> +       int bit;
> +
> +       if (!bits)
> +               return NULL;
> +
> +       bit = find_next_bit(bits, kit->nr_bits, kit->bit + 1);
> +       if (bit >= kit->nr_bits)
> +               return NULL;
> +
> +       kit->bit = bit;
> +       return &kit->bit;
> +}
> +
> +/**
> + * bpf_iter_bits_destroy() - Destroy a bpf_iter_bits
> + * @it: The bpf_iter_bits to be destroyed
> + *
> + * Destroy the resource associated with the bpf_iter_bits.
> + */
> +__bpf_kfunc void bpf_iter_bits_destroy(struct bpf_iter_bits *it)
> +{
> +       struct bpf_iter_bits_kern *kit = (void *)it;
> +
> +       if (!kit->bits)
> +               return;
> +       bpf_mem_free(&bpf_global_ma, kit->bits);
> +}
> +
>  __bpf_kfunc_end_defs();
>
>  BTF_KFUNCS_START(generic_btf_ids)
> @@ -2618,6 +2715,9 @@ BTF_ID_FLAGS(func, bpf_dynptr_is_null)
>  BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
>  BTF_ID_FLAGS(func, bpf_dynptr_size)
>  BTF_ID_FLAGS(func, bpf_dynptr_clone)
> +BTF_ID_FLAGS(func, bpf_iter_bits_new, KF_ITER_NEW)
> +BTF_ID_FLAGS(func, bpf_iter_bits_next, KF_ITER_NEXT | KF_RET_NULL)
> +BTF_ID_FLAGS(func, bpf_iter_bits_destroy, KF_ITER_DESTROY)
>  BTF_KFUNCS_END(common_btf_ids)
>
>  static const struct btf_kfunc_id_set common_kfunc_set = {
> --
> 2.39.1
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 bpf-next 1/2] bpf: Add bits iterator
  2024-02-28  1:24   ` Andrii Nakryiko
@ 2024-02-28  2:25     ` Yafang Shao
  2024-02-28  6:04       ` Andrii Nakryiko
  0 siblings, 1 reply; 11+ messages in thread
From: Yafang Shao @ 2024-02-28  2:25 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, bpf

On Wed, Feb 28, 2024 at 9:24 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Sun, Feb 25, 2024 at 2:07 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > Add three new kfuncs for the bits iterator:
> > - bpf_iter_bits_new
> >   Initialize a new bits iterator for a given memory area. Due to the
> >   limitation of bpf memalloc, the max number of bits that can be iterated
> >   over is limited to (4096 * 8).
> > - bpf_iter_bits_next
> >   Get the next bit in a bpf_iter_bits
> > - bpf_iter_bits_destroy
> >   Destroy a bpf_iter_bits
> >
> > The bits iterator facilitates the iteration of the bits of a memory area,
> > such as cpumask. It can be used in any context and on any address.
> >
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > ---
> >  kernel/bpf/helpers.c | 100 +++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 100 insertions(+)
> >
> > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > index 93edf730d288..052f63891834 100644
> > --- a/kernel/bpf/helpers.c
> > +++ b/kernel/bpf/helpers.c
> > @@ -2542,6 +2542,103 @@ __bpf_kfunc void bpf_throw(u64 cookie)
> >         WARN(1, "A call to BPF exception callback should never return\n");
> >  }
> >
> > +struct bpf_iter_bits {
> > +       __u64 __opaque[2];
> > +} __aligned(8);
> > +
> > +struct bpf_iter_bits_kern {
> > +       unsigned long *bits;
> > +       u32 nr_bits;
> > +       int bit;
> > +} __aligned(8);
> > +
> > +/**
> > + * bpf_iter_bits_new() - Initialize a new bits iterator for a given memory area
> > + * @it: The new bpf_iter_bits to be created
> > + * @unsafe_ptr__ign: A ponter pointing to a memory area to be iterated over
> > + * @nr_bits: The number of bits to be iterated over. Due to the limitation of
> > + * memalloc, it can't greater than (4096 * 8).
> > + *
> > + * This function initializes a new bpf_iter_bits structure for iterating over
> > + * a memory area which is specified by the @unsafe_ptr__ign and @nr_bits. It
> > + * copy the data of the memory area to the newly created bpf_iter_bits @it for
> > + * subsequent iteration operations.
> > + *
> > + * On success, 0 is returned. On failure, ERR is returned.
> > + */
> > +__bpf_kfunc int
> > +bpf_iter_bits_new(struct bpf_iter_bits *it, const void *unsafe_ptr__ign, u32 nr_bits)
> > +{
> > +       struct bpf_iter_bits_kern *kit = (void *)it;
> > +       u32 size = BITS_TO_BYTES(nr_bits);
> > +       int err;
> > +
> > +       BUILD_BUG_ON(sizeof(struct bpf_iter_bits_kern) != sizeof(struct bpf_iter_bits));
> > +       BUILD_BUG_ON(__alignof__(struct bpf_iter_bits_kern) !=
> > +                    __alignof__(struct bpf_iter_bits));
> > +
> > +       if (!unsafe_ptr__ign || !nr_bits) {
> > +               kit->bits = NULL;
> > +               return -EINVAL;
> > +       }
> > +
> > +       kit->bits = bpf_mem_alloc(&bpf_global_ma, size);
> > +       if (!kit->bits)
> > +               return -ENOMEM;
>
> it's probably going to be a pretty common case to do bits iteration
> for nr_bits<=64, right?

It's highly unlikely.
Consider the CPU count as an example; There are 256 CPUs on our AMD
EPYC servers.

>  So as an optimization, instead of doing
> bpf_mem_alloc() for this case, you can just copy up to 8 bytes and
> store it in a union of `unsigned long *bits` and `unsigned long
> bits_copy`. As a performance optimization (and to reduce dependency on
> memory allocation). WDYT?
>

-- 
Regards
Yafang

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 bpf-next 1/2] bpf: Add bits iterator
  2024-02-28  2:25     ` Yafang Shao
@ 2024-02-28  6:04       ` Andrii Nakryiko
  2024-02-29  2:15         ` Yafang Shao
  0 siblings, 1 reply; 11+ messages in thread
From: Andrii Nakryiko @ 2024-02-28  6:04 UTC (permalink / raw)
  To: Yafang Shao
  Cc: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, bpf

On Tue, Feb 27, 2024 at 6:25 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Wed, Feb 28, 2024 at 9:24 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Sun, Feb 25, 2024 at 2:07 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> > >
> > > Add three new kfuncs for the bits iterator:
> > > - bpf_iter_bits_new
> > >   Initialize a new bits iterator for a given memory area. Due to the
> > >   limitation of bpf memalloc, the max number of bits that can be iterated
> > >   over is limited to (4096 * 8).
> > > - bpf_iter_bits_next
> > >   Get the next bit in a bpf_iter_bits
> > > - bpf_iter_bits_destroy
> > >   Destroy a bpf_iter_bits
> > >
> > > The bits iterator facilitates the iteration of the bits of a memory area,
> > > such as cpumask. It can be used in any context and on any address.
> > >
> > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > > ---
> > >  kernel/bpf/helpers.c | 100 +++++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 100 insertions(+)
> > >
> > > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > > index 93edf730d288..052f63891834 100644
> > > --- a/kernel/bpf/helpers.c
> > > +++ b/kernel/bpf/helpers.c
> > > @@ -2542,6 +2542,103 @@ __bpf_kfunc void bpf_throw(u64 cookie)
> > >         WARN(1, "A call to BPF exception callback should never return\n");
> > >  }
> > >
> > > +struct bpf_iter_bits {
> > > +       __u64 __opaque[2];
> > > +} __aligned(8);
> > > +
> > > +struct bpf_iter_bits_kern {
> > > +       unsigned long *bits;
> > > +       u32 nr_bits;
> > > +       int bit;
> > > +} __aligned(8);
> > > +
> > > +/**
> > > + * bpf_iter_bits_new() - Initialize a new bits iterator for a given memory area
> > > + * @it: The new bpf_iter_bits to be created
> > > + * @unsafe_ptr__ign: A ponter pointing to a memory area to be iterated over
> > > + * @nr_bits: The number of bits to be iterated over. Due to the limitation of
> > > + * memalloc, it can't greater than (4096 * 8).
> > > + *
> > > + * This function initializes a new bpf_iter_bits structure for iterating over
> > > + * a memory area which is specified by the @unsafe_ptr__ign and @nr_bits. It
> > > + * copy the data of the memory area to the newly created bpf_iter_bits @it for
> > > + * subsequent iteration operations.
> > > + *
> > > + * On success, 0 is returned. On failure, ERR is returned.
> > > + */
> > > +__bpf_kfunc int
> > > +bpf_iter_bits_new(struct bpf_iter_bits *it, const void *unsafe_ptr__ign, u32 nr_bits)
> > > +{
> > > +       struct bpf_iter_bits_kern *kit = (void *)it;
> > > +       u32 size = BITS_TO_BYTES(nr_bits);
> > > +       int err;
> > > +
> > > +       BUILD_BUG_ON(sizeof(struct bpf_iter_bits_kern) != sizeof(struct bpf_iter_bits));
> > > +       BUILD_BUG_ON(__alignof__(struct bpf_iter_bits_kern) !=
> > > +                    __alignof__(struct bpf_iter_bits));
> > > +
> > > +       if (!unsafe_ptr__ign || !nr_bits) {
> > > +               kit->bits = NULL;
> > > +               return -EINVAL;
> > > +       }
> > > +
> > > +       kit->bits = bpf_mem_alloc(&bpf_global_ma, size);
> > > +       if (!kit->bits)
> > > +               return -ENOMEM;
> >
> > it's probably going to be a pretty common case to do bits iteration
> > for nr_bits<=64, right?
>
> It's highly unlikely.
> Consider the CPU count as an example; There are 256 CPUs on our AMD
> EPYC servers.

Also consider u64-based bit masks (like struct backtrack_state in
verifier code, which has u32 reg_mask and u64 stack_mask). This
iterator is a generic bits iterator, there are tons of cases of
u64/u32 masks in practice.

>
> >  So as an optimization, instead of doing
> > bpf_mem_alloc() for this case, you can just copy up to 8 bytes and
> > store it in a union of `unsigned long *bits` and `unsigned long
> > bits_copy`. As a performance optimization (and to reduce dependency on
> > memory allocation). WDYT?
> >
>
> --
> Regards
> Yafang

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 bpf-next 1/2] bpf: Add bits iterator
  2024-02-28  6:04       ` Andrii Nakryiko
@ 2024-02-29  2:15         ` Yafang Shao
  2024-02-29 17:26           ` Andrii Nakryiko
  0 siblings, 1 reply; 11+ messages in thread
From: Yafang Shao @ 2024-02-29  2:15 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, bpf

On Wed, Feb 28, 2024 at 2:04 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Tue, Feb 27, 2024 at 6:25 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > On Wed, Feb 28, 2024 at 9:24 AM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Sun, Feb 25, 2024 at 2:07 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > >
> > > > Add three new kfuncs for the bits iterator:
> > > > - bpf_iter_bits_new
> > > >   Initialize a new bits iterator for a given memory area. Due to the
> > > >   limitation of bpf memalloc, the max number of bits that can be iterated
> > > >   over is limited to (4096 * 8).
> > > > - bpf_iter_bits_next
> > > >   Get the next bit in a bpf_iter_bits
> > > > - bpf_iter_bits_destroy
> > > >   Destroy a bpf_iter_bits
> > > >
> > > > The bits iterator facilitates the iteration of the bits of a memory area,
> > > > such as cpumask. It can be used in any context and on any address.
> > > >
> > > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > > > ---
> > > >  kernel/bpf/helpers.c | 100 +++++++++++++++++++++++++++++++++++++++++++
> > > >  1 file changed, 100 insertions(+)
> > > >
> > > > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > > > index 93edf730d288..052f63891834 100644
> > > > --- a/kernel/bpf/helpers.c
> > > > +++ b/kernel/bpf/helpers.c
> > > > @@ -2542,6 +2542,103 @@ __bpf_kfunc void bpf_throw(u64 cookie)
> > > >         WARN(1, "A call to BPF exception callback should never return\n");
> > > >  }
> > > >
> > > > +struct bpf_iter_bits {
> > > > +       __u64 __opaque[2];
> > > > +} __aligned(8);
> > > > +
> > > > +struct bpf_iter_bits_kern {
> > > > +       unsigned long *bits;
> > > > +       u32 nr_bits;
> > > > +       int bit;
> > > > +} __aligned(8);
> > > > +
> > > > +/**
> > > > + * bpf_iter_bits_new() - Initialize a new bits iterator for a given memory area
> > > > + * @it: The new bpf_iter_bits to be created
> > > > + * @unsafe_ptr__ign: A ponter pointing to a memory area to be iterated over
> > > > + * @nr_bits: The number of bits to be iterated over. Due to the limitation of
> > > > + * memalloc, it can't greater than (4096 * 8).
> > > > + *
> > > > + * This function initializes a new bpf_iter_bits structure for iterating over
> > > > + * a memory area which is specified by the @unsafe_ptr__ign and @nr_bits. It
> > > > + * copy the data of the memory area to the newly created bpf_iter_bits @it for
> > > > + * subsequent iteration operations.
> > > > + *
> > > > + * On success, 0 is returned. On failure, ERR is returned.
> > > > + */
> > > > +__bpf_kfunc int
> > > > +bpf_iter_bits_new(struct bpf_iter_bits *it, const void *unsafe_ptr__ign, u32 nr_bits)
> > > > +{
> > > > +       struct bpf_iter_bits_kern *kit = (void *)it;
> > > > +       u32 size = BITS_TO_BYTES(nr_bits);
> > > > +       int err;
> > > > +
> > > > +       BUILD_BUG_ON(sizeof(struct bpf_iter_bits_kern) != sizeof(struct bpf_iter_bits));
> > > > +       BUILD_BUG_ON(__alignof__(struct bpf_iter_bits_kern) !=
> > > > +                    __alignof__(struct bpf_iter_bits));
> > > > +
> > > > +       if (!unsafe_ptr__ign || !nr_bits) {
> > > > +               kit->bits = NULL;
> > > > +               return -EINVAL;
> > > > +       }
> > > > +
> > > > +       kit->bits = bpf_mem_alloc(&bpf_global_ma, size);
> > > > +       if (!kit->bits)
> > > > +               return -ENOMEM;
> > >
> > > it's probably going to be a pretty common case to do bits iteration
> > > for nr_bits<=64, right?
> >
> > It's highly unlikely.
> > Consider the CPU count as an example; There are 256 CPUs on our AMD
> > EPYC servers.
>
> Also consider u64-based bit masks (like struct backtrack_state in
> verifier code, which has u32 reg_mask and u64 stack_mask). This
> iterator is a generic bits iterator, there are tons of cases of
> u64/u32 masks in practice.

Should we optimize it as follows?

    if (nr_bits <= 64) {
        // do the optimization
    } else {
        // fallback to memalloc
    }

>
> >
> > >  So as an optimization, instead of doing
> > > bpf_mem_alloc() for this case, you can just copy up to 8 bytes and
> > > store it in a union of `unsigned long *bits` and `unsigned long
> > > bits_copy`. As a performance optimization (and to reduce dependency on
> > > memory allocation). WDYT?
> > >
> >
> > --
> > Regards
> > Yafang



-- 
Regards
Yafang

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 bpf-next 1/2] bpf: Add bits iterator
  2024-02-29  2:15         ` Yafang Shao
@ 2024-02-29 17:26           ` Andrii Nakryiko
  2024-02-29 22:19             ` John Fastabend
  0 siblings, 1 reply; 11+ messages in thread
From: Andrii Nakryiko @ 2024-02-29 17:26 UTC (permalink / raw)
  To: Yafang Shao
  Cc: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, bpf

On Wed, Feb 28, 2024 at 6:16 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Wed, Feb 28, 2024 at 2:04 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Tue, Feb 27, 2024 at 6:25 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > >
> > > On Wed, Feb 28, 2024 at 9:24 AM Andrii Nakryiko
> > > <andrii.nakryiko@gmail.com> wrote:
> > > >
> > > > On Sun, Feb 25, 2024 at 2:07 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > > >
> > > > > Add three new kfuncs for the bits iterator:
> > > > > - bpf_iter_bits_new
> > > > >   Initialize a new bits iterator for a given memory area. Due to the
> > > > >   limitation of bpf memalloc, the max number of bits that can be iterated
> > > > >   over is limited to (4096 * 8).
> > > > > - bpf_iter_bits_next
> > > > >   Get the next bit in a bpf_iter_bits
> > > > > - bpf_iter_bits_destroy
> > > > >   Destroy a bpf_iter_bits
> > > > >
> > > > > The bits iterator facilitates the iteration of the bits of a memory area,
> > > > > such as cpumask. It can be used in any context and on any address.
> > > > >
> > > > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > > > > ---
> > > > >  kernel/bpf/helpers.c | 100 +++++++++++++++++++++++++++++++++++++++++++
> > > > >  1 file changed, 100 insertions(+)
> > > > >
> > > > > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > > > > index 93edf730d288..052f63891834 100644
> > > > > --- a/kernel/bpf/helpers.c
> > > > > +++ b/kernel/bpf/helpers.c
> > > > > @@ -2542,6 +2542,103 @@ __bpf_kfunc void bpf_throw(u64 cookie)
> > > > >         WARN(1, "A call to BPF exception callback should never return\n");
> > > > >  }
> > > > >
> > > > > +struct bpf_iter_bits {
> > > > > +       __u64 __opaque[2];
> > > > > +} __aligned(8);
> > > > > +
> > > > > +struct bpf_iter_bits_kern {
> > > > > +       unsigned long *bits;
> > > > > +       u32 nr_bits;
> > > > > +       int bit;
> > > > > +} __aligned(8);
> > > > > +
> > > > > +/**
> > > > > + * bpf_iter_bits_new() - Initialize a new bits iterator for a given memory area
> > > > > + * @it: The new bpf_iter_bits to be created
> > > > > + * @unsafe_ptr__ign: A ponter pointing to a memory area to be iterated over
> > > > > + * @nr_bits: The number of bits to be iterated over. Due to the limitation of
> > > > > + * memalloc, it can't greater than (4096 * 8).
> > > > > + *
> > > > > + * This function initializes a new bpf_iter_bits structure for iterating over
> > > > > + * a memory area which is specified by the @unsafe_ptr__ign and @nr_bits. It
> > > > > + * copy the data of the memory area to the newly created bpf_iter_bits @it for
> > > > > + * subsequent iteration operations.
> > > > > + *
> > > > > + * On success, 0 is returned. On failure, ERR is returned.
> > > > > + */
> > > > > +__bpf_kfunc int
> > > > > +bpf_iter_bits_new(struct bpf_iter_bits *it, const void *unsafe_ptr__ign, u32 nr_bits)
> > > > > +{
> > > > > +       struct bpf_iter_bits_kern *kit = (void *)it;
> > > > > +       u32 size = BITS_TO_BYTES(nr_bits);
> > > > > +       int err;
> > > > > +
> > > > > +       BUILD_BUG_ON(sizeof(struct bpf_iter_bits_kern) != sizeof(struct bpf_iter_bits));
> > > > > +       BUILD_BUG_ON(__alignof__(struct bpf_iter_bits_kern) !=
> > > > > +                    __alignof__(struct bpf_iter_bits));
> > > > > +
> > > > > +       if (!unsafe_ptr__ign || !nr_bits) {
> > > > > +               kit->bits = NULL;
> > > > > +               return -EINVAL;
> > > > > +       }
> > > > > +
> > > > > +       kit->bits = bpf_mem_alloc(&bpf_global_ma, size);
> > > > > +       if (!kit->bits)
> > > > > +               return -ENOMEM;
> > > >
> > > > it's probably going to be a pretty common case to do bits iteration
> > > > for nr_bits<=64, right?
> > >
> > > It's highly unlikely.
> > > Consider the CPU count as an example; There are 256 CPUs on our AMD
> > > EPYC servers.
> >
> > Also consider u64-based bit masks (like struct backtrack_state in
> > verifier code, which has u32 reg_mask and u64 stack_mask). This
> > iterator is a generic bits iterator, there are tons of cases of
> > u64/u32 masks in practice.
>
> Should we optimize it as follows?
>
>     if (nr_bits <= 64) {
>         // do the optimization
>     } else {
>         // fallback to memalloc
>     }
>

Yep, that's what I'm proposing


> >
> > >
> > > >  So as an optimization, instead of doing
> > > > bpf_mem_alloc() for this case, you can just copy up to 8 bytes and
> > > > store it in a union of `unsigned long *bits` and `unsigned long
> > > > bits_copy`. As a performance optimization (and to reduce dependency on
> > > > memory allocation). WDYT?
> > > >
> > >
> > > --
> > > Regards
> > > Yafang
>
>
>
> --
> Regards
> Yafang

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 bpf-next 1/2] bpf: Add bits iterator
  2024-02-29 17:26           ` Andrii Nakryiko
@ 2024-02-29 22:19             ` John Fastabend
  2024-03-01  6:40               ` Yafang Shao
  0 siblings, 1 reply; 11+ messages in thread
From: John Fastabend @ 2024-02-29 22:19 UTC (permalink / raw)
  To: Andrii Nakryiko, Yafang Shao
  Cc: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, bpf

Andrii Nakryiko wrote:
> On Wed, Feb 28, 2024 at 6:16 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > On Wed, Feb 28, 2024 at 2:04 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Tue, Feb 27, 2024 at 6:25 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > >
> > > > On Wed, Feb 28, 2024 at 9:24 AM Andrii Nakryiko
> > > > <andrii.nakryiko@gmail.com> wrote:
> > > > >
> > > > > On Sun, Feb 25, 2024 at 2:07 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > > > >
> > > > > > Add three new kfuncs for the bits iterator:
> > > > > > - bpf_iter_bits_new
> > > > > >   Initialize a new bits iterator for a given memory area. Due to the
> > > > > >   limitation of bpf memalloc, the max number of bits that can be iterated
> > > > > >   over is limited to (4096 * 8).
> > > > > > - bpf_iter_bits_next
> > > > > >   Get the next bit in a bpf_iter_bits
> > > > > > - bpf_iter_bits_destroy
> > > > > >   Destroy a bpf_iter_bits
> > > > > >
> > > > > > The bits iterator facilitates the iteration of the bits of a memory area,
> > > > > > such as cpumask. It can be used in any context and on any address.
> > > > > >
> > > > > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > > > > > ---
> > > > > >  kernel/bpf/helpers.c | 100 +++++++++++++++++++++++++++++++++++++++++++
> > > > > >  1 file changed, 100 insertions(+)
> > > > > >
> > > > > > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > > > > > index 93edf730d288..052f63891834 100644
> > > > > > --- a/kernel/bpf/helpers.c
> > > > > > +++ b/kernel/bpf/helpers.c
> > > > > > @@ -2542,6 +2542,103 @@ __bpf_kfunc void bpf_throw(u64 cookie)
> > > > > >         WARN(1, "A call to BPF exception callback should never return\n");
> > > > > >  }
> > > > > >
> > > > > > +struct bpf_iter_bits {
> > > > > > +       __u64 __opaque[2];
> > > > > > +} __aligned(8);
> > > > > > +
> > > > > > +struct bpf_iter_bits_kern {
> > > > > > +       unsigned long *bits;
> > > > > > +       u32 nr_bits;
> > > > > > +       int bit;
> > > > > > +} __aligned(8);
> > > > > > +
> > > > > > +/**
> > > > > > + * bpf_iter_bits_new() - Initialize a new bits iterator for a given memory area
> > > > > > + * @it: The new bpf_iter_bits to be created
> > > > > > + * @unsafe_ptr__ign: A ponter pointing to a memory area to be iterated over
> > > > > > + * @nr_bits: The number of bits to be iterated over. Due to the limitation of
> > > > > > + * memalloc, it can't greater than (4096 * 8).
> > > > > > + *
> > > > > > + * This function initializes a new bpf_iter_bits structure for iterating over
> > > > > > + * a memory area which is specified by the @unsafe_ptr__ign and @nr_bits. It
> > > > > > + * copy the data of the memory area to the newly created bpf_iter_bits @it for
> > > > > > + * subsequent iteration operations.
> > > > > > + *
> > > > > > + * On success, 0 is returned. On failure, ERR is returned.
> > > > > > + */
> > > > > > +__bpf_kfunc int
> > > > > > +bpf_iter_bits_new(struct bpf_iter_bits *it, const void *unsafe_ptr__ign, u32 nr_bits)
> > > > > > +{
> > > > > > +       struct bpf_iter_bits_kern *kit = (void *)it;
> > > > > > +       u32 size = BITS_TO_BYTES(nr_bits);
> > > > > > +       int err;
> > > > > > +
> > > > > > +       BUILD_BUG_ON(sizeof(struct bpf_iter_bits_kern) != sizeof(struct bpf_iter_bits));
> > > > > > +       BUILD_BUG_ON(__alignof__(struct bpf_iter_bits_kern) !=
> > > > > > +                    __alignof__(struct bpf_iter_bits));
> > > > > > +
> > > > > > +       if (!unsafe_ptr__ign || !nr_bits) {
> > > > > > +               kit->bits = NULL;
> > > > > > +               return -EINVAL;
> > > > > > +       }
> > > > > > +
> > > > > > +       kit->bits = bpf_mem_alloc(&bpf_global_ma, size);
> > > > > > +       if (!kit->bits)
> > > > > > +               return -ENOMEM;
> > > > >
> > > > > it's probably going to be a pretty common case to do bits iteration
> > > > > for nr_bits<=64, right?
> > > >
> > > > It's highly unlikely.
> > > > Consider the CPU count as an example; There are 256 CPUs on our AMD
> > > > EPYC servers.
> > >
> > > Also consider u64-based bit masks (like struct backtrack_state in
> > > verifier code, which has u32 reg_mask and u64 stack_mask). This
> > > iterator is a generic bits iterator, there are tons of cases of
> > > u64/u32 masks in practice.
> >
> > Should we optimize it as follows?
> >
> >     if (nr_bits <= 64) {
> >         // do the optimization
> >     } else {
> >         // fallback to memalloc
> >     }
> >
> 
> Yep, that's what I'm proposing

When I suggested why not just open code this in BPF earlier I was
mostly thinking of these u64 and u32 masks we have lots of them
in our code base as well.

I have something like this which might be even better than 3
calls depending on your use case,

 int find_next_bit(uint64_t bits, int last_bit)
 {
    int i = last_bit;
    for (i = 0; i < sizeof(uint64_t) * 8; i++) {
        if (bits & (1 << i))
           return i;
    }
    return -1;
  }

Verifier seems plenty happy with above.

Thanks,
John

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 bpf-next 1/2] bpf: Add bits iterator
  2024-02-29 22:19             ` John Fastabend
@ 2024-03-01  6:40               ` Yafang Shao
  2024-03-01 18:01                 ` John Fastabend
  0 siblings, 1 reply; 11+ messages in thread
From: Yafang Shao @ 2024-03-01  6:40 UTC (permalink / raw)
  To: John Fastabend
  Cc: Andrii Nakryiko, ast, daniel, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, bpf

On Fri, Mar 1, 2024 at 6:19 AM John Fastabend <john.fastabend@gmail.com> wrote:
>
> Andrii Nakryiko wrote:
> > On Wed, Feb 28, 2024 at 6:16 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > >
> > > On Wed, Feb 28, 2024 at 2:04 PM Andrii Nakryiko
> > > <andrii.nakryiko@gmail.com> wrote:
> > > >
> > > > On Tue, Feb 27, 2024 at 6:25 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > > >
> > > > > On Wed, Feb 28, 2024 at 9:24 AM Andrii Nakryiko
> > > > > <andrii.nakryiko@gmail.com> wrote:
> > > > > >
> > > > > > On Sun, Feb 25, 2024 at 2:07 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > > > > >
> > > > > > > Add three new kfuncs for the bits iterator:
> > > > > > > - bpf_iter_bits_new
> > > > > > >   Initialize a new bits iterator for a given memory area. Due to the
> > > > > > >   limitation of bpf memalloc, the max number of bits that can be iterated
> > > > > > >   over is limited to (4096 * 8).
> > > > > > > - bpf_iter_bits_next
> > > > > > >   Get the next bit in a bpf_iter_bits
> > > > > > > - bpf_iter_bits_destroy
> > > > > > >   Destroy a bpf_iter_bits
> > > > > > >
> > > > > > > The bits iterator facilitates the iteration of the bits of a memory area,
> > > > > > > such as cpumask. It can be used in any context and on any address.
> > > > > > >
> > > > > > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > > > > > > ---
> > > > > > >  kernel/bpf/helpers.c | 100 +++++++++++++++++++++++++++++++++++++++++++
> > > > > > >  1 file changed, 100 insertions(+)
> > > > > > >
> > > > > > > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > > > > > > index 93edf730d288..052f63891834 100644
> > > > > > > --- a/kernel/bpf/helpers.c
> > > > > > > +++ b/kernel/bpf/helpers.c
> > > > > > > @@ -2542,6 +2542,103 @@ __bpf_kfunc void bpf_throw(u64 cookie)
> > > > > > >         WARN(1, "A call to BPF exception callback should never return\n");
> > > > > > >  }
> > > > > > >
> > > > > > > +struct bpf_iter_bits {
> > > > > > > +       __u64 __opaque[2];
> > > > > > > +} __aligned(8);
> > > > > > > +
> > > > > > > +struct bpf_iter_bits_kern {
> > > > > > > +       unsigned long *bits;
> > > > > > > +       u32 nr_bits;
> > > > > > > +       int bit;
> > > > > > > +} __aligned(8);
> > > > > > > +
> > > > > > > +/**
> > > > > > > + * bpf_iter_bits_new() - Initialize a new bits iterator for a given memory area
> > > > > > > + * @it: The new bpf_iter_bits to be created
> > > > > > > + * @unsafe_ptr__ign: A ponter pointing to a memory area to be iterated over
> > > > > > > + * @nr_bits: The number of bits to be iterated over. Due to the limitation of
> > > > > > > + * memalloc, it can't greater than (4096 * 8).
> > > > > > > + *
> > > > > > > + * This function initializes a new bpf_iter_bits structure for iterating over
> > > > > > > + * a memory area which is specified by the @unsafe_ptr__ign and @nr_bits. It
> > > > > > > + * copy the data of the memory area to the newly created bpf_iter_bits @it for
> > > > > > > + * subsequent iteration operations.
> > > > > > > + *
> > > > > > > + * On success, 0 is returned. On failure, ERR is returned.
> > > > > > > + */
> > > > > > > +__bpf_kfunc int
> > > > > > > +bpf_iter_bits_new(struct bpf_iter_bits *it, const void *unsafe_ptr__ign, u32 nr_bits)
> > > > > > > +{
> > > > > > > +       struct bpf_iter_bits_kern *kit = (void *)it;
> > > > > > > +       u32 size = BITS_TO_BYTES(nr_bits);
> > > > > > > +       int err;
> > > > > > > +
> > > > > > > +       BUILD_BUG_ON(sizeof(struct bpf_iter_bits_kern) != sizeof(struct bpf_iter_bits));
> > > > > > > +       BUILD_BUG_ON(__alignof__(struct bpf_iter_bits_kern) !=
> > > > > > > +                    __alignof__(struct bpf_iter_bits));
> > > > > > > +
> > > > > > > +       if (!unsafe_ptr__ign || !nr_bits) {
> > > > > > > +               kit->bits = NULL;
> > > > > > > +               return -EINVAL;
> > > > > > > +       }
> > > > > > > +
> > > > > > > +       kit->bits = bpf_mem_alloc(&bpf_global_ma, size);
> > > > > > > +       if (!kit->bits)
> > > > > > > +               return -ENOMEM;
> > > > > >
> > > > > > it's probably going to be a pretty common case to do bits iteration
> > > > > > for nr_bits<=64, right?
> > > > >
> > > > > It's highly unlikely.
> > > > > Consider the CPU count as an example; There are 256 CPUs on our AMD
> > > > > EPYC servers.
> > > >
> > > > Also consider u64-based bit masks (like struct backtrack_state in
> > > > verifier code, which has u32 reg_mask and u64 stack_mask). This
> > > > iterator is a generic bits iterator, there are tons of cases of
> > > > u64/u32 masks in practice.
> > >
> > > Should we optimize it as follows?
> > >
> > >     if (nr_bits <= 64) {
> > >         // do the optimization
> > >     } else {
> > >         // fallback to memalloc
> > >     }
> > >
> >
> > Yep, that's what I'm proposing
>
> When I suggested why not just open code this in BPF earlier I was
> mostly thinking of these u64 and u32 masks we have lots of them
> in our code base as well.
>
> I have something like this which might be even better than 3
> calls depending on your use case,
>
>  int find_next_bit(uint64_t bits, int last_bit)
>  {
>     int i = last_bit;
>     for (i = 0; i < sizeof(uint64_t) * 8; i++) {
>         if (bits & (1 << i))
>            return i;
>     }
>     return -1;
>   }
>
> Verifier seems plenty happy with above.

I'm not quite following.
Regarding the find_next_bit() function you mentioned, it seems it only
retrieves one bit at a time, necessitating a for loop for execution,
correct? Consequently, the verifier will likely fail the for loop.

-- 
Regards
Yafang

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 bpf-next 1/2] bpf: Add bits iterator
  2024-03-01  6:40               ` Yafang Shao
@ 2024-03-01 18:01                 ` John Fastabend
  0 siblings, 0 replies; 11+ messages in thread
From: John Fastabend @ 2024-03-01 18:01 UTC (permalink / raw)
  To: Yafang Shao, John Fastabend
  Cc: Andrii Nakryiko, ast, daniel, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, bpf

Yafang Shao wrote:
> On Fri, Mar 1, 2024 at 6:19 AM John Fastabend <john.fastabend@gmail.com> wrote:
> >
> > Andrii Nakryiko wrote:
> > > On Wed, Feb 28, 2024 at 6:16 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > >
> > > > On Wed, Feb 28, 2024 at 2:04 PM Andrii Nakryiko
> > > > <andrii.nakryiko@gmail.com> wrote:
> > > > >
> > > > > On Tue, Feb 27, 2024 at 6:25 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > > > >
> > > > > > On Wed, Feb 28, 2024 at 9:24 AM Andrii Nakryiko
> > > > > > <andrii.nakryiko@gmail.com> wrote:
> > > > > > >
> > > > > > > On Sun, Feb 25, 2024 at 2:07 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Add three new kfuncs for the bits iterator:
> > > > > > > > - bpf_iter_bits_new
> > > > > > > >   Initialize a new bits iterator for a given memory area. Due to the
> > > > > > > >   limitation of bpf memalloc, the max number of bits that can be iterated
> > > > > > > >   over is limited to (4096 * 8).
> > > > > > > > - bpf_iter_bits_next
> > > > > > > >   Get the next bit in a bpf_iter_bits
> > > > > > > > - bpf_iter_bits_destroy
> > > > > > > >   Destroy a bpf_iter_bits
> > > > > > > >
> > > > > > > > The bits iterator facilitates the iteration of the bits of a memory area,
> > > > > > > > such as cpumask. It can be used in any context and on any address.
> > > > > > > >
> > > > > > > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > > > > > > > ---

[...]

> > > > > > > > +/**
> > > > > > > > + * bpf_iter_bits_new() - Initialize a new bits iterator for a given memory area
> > > > > > > > + * @it: The new bpf_iter_bits to be created
> > > > > > > > + * @unsafe_ptr__ign: A ponter pointing to a memory area to be iterated over
> > > > > > > > + * @nr_bits: The number of bits to be iterated over. Due to the limitation of
> > > > > > > > + * memalloc, it can't greater than (4096 * 8).
> > > > > > > > + *
> > > > > > > > + * This function initializes a new bpf_iter_bits structure for iterating over
> > > > > > > > + * a memory area which is specified by the @unsafe_ptr__ign and @nr_bits. It
> > > > > > > > + * copy the data of the memory area to the newly created bpf_iter_bits @it for
> > > > > > > > + * subsequent iteration operations.
> > > > > > > > + *
> > > > > > > > + * On success, 0 is returned. On failure, ERR is returned.
> > > > > > > > + */
> > > > > > > > +__bpf_kfunc int
> > > > > > > > +bpf_iter_bits_new(struct bpf_iter_bits *it, const void *unsafe_ptr__ign, u32 nr_bits)
> > > > > > > > +{
> > > > > > > > +       struct bpf_iter_bits_kern *kit = (void *)it;
> > > > > > > > +       u32 size = BITS_TO_BYTES(nr_bits);
> > > > > > > > +       int err;
> > > > > > > > +
> > > > > > > > +       BUILD_BUG_ON(sizeof(struct bpf_iter_bits_kern) != sizeof(struct bpf_iter_bits));
> > > > > > > > +       BUILD_BUG_ON(__alignof__(struct bpf_iter_bits_kern) !=
> > > > > > > > +                    __alignof__(struct bpf_iter_bits));
> > > > > > > > +
> > > > > > > > +       if (!unsafe_ptr__ign || !nr_bits) {
> > > > > > > > +               kit->bits = NULL;
> > > > > > > > +               return -EINVAL;
> > > > > > > > +       }
> > > > > > > > +
> > > > > > > > +       kit->bits = bpf_mem_alloc(&bpf_global_ma, size);
> > > > > > > > +       if (!kit->bits)
> > > > > > > > +               return -ENOMEM;
> > > > > > >
> > > > > > > it's probably going to be a pretty common case to do bits iteration
> > > > > > > for nr_bits<=64, right?
> > > > > >
> > > > > > It's highly unlikely.
> > > > > > Consider the CPU count as an example; There are 256 CPUs on our AMD
> > > > > > EPYC servers.
> > > > >
> > > > > Also consider u64-based bit masks (like struct backtrack_state in
> > > > > verifier code, which has u32 reg_mask and u64 stack_mask). This
> > > > > iterator is a generic bits iterator, there are tons of cases of
> > > > > u64/u32 masks in practice.
> > > >
> > > > Should we optimize it as follows?
> > > >
> > > >     if (nr_bits <= 64) {
> > > >         // do the optimization
> > > >     } else {
> > > >         // fallback to memalloc
> > > >     }
> > > >
> > >
> > > Yep, that's what I'm proposing
> >
> > When I suggested why not just open code this in BPF earlier I was
> > mostly thinking of these u64 and u32 masks we have lots of them
> > in our code base as well.
> >
> > I have something like this which might be even better than 3
> > calls depending on your use case,
> >
> >  int find_next_bit(uint64_t bits, int last_bit)
> >  {
> >     int i = last_bit;
> >     for (i = 0; i < sizeof(uint64_t) * 8; i++) {
> >         if (bits & (1 << i))
> >            return i;
> >     }
> >     return -1;
> >   }
> >
> > Verifier seems plenty happy with above.
> 
> I'm not quite following.
> Regarding the find_next_bit() function you mentioned, it seems it only
> retrieves one bit at a time, necessitating a for loop for execution,
> correct? Consequently, the verifier will likely fail the for loop.

In practice for small sizes uint64_t and uint32_t we don't see any
issue. Just a comment that this can be open coded without much
trouble in many cases. Not against the helper at all.

> 
> -- 
> Regards
> Yafang

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2024-03-01 18:01 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-25 10:06 [PATCH v2 bpf-next 0/2] bpf: Add a generic bits iterator Yafang Shao
2024-02-25 10:06 ` [PATCH v2 bpf-next 1/2] bpf: Add " Yafang Shao
2024-02-28  1:24   ` Andrii Nakryiko
2024-02-28  2:25     ` Yafang Shao
2024-02-28  6:04       ` Andrii Nakryiko
2024-02-29  2:15         ` Yafang Shao
2024-02-29 17:26           ` Andrii Nakryiko
2024-02-29 22:19             ` John Fastabend
2024-03-01  6:40               ` Yafang Shao
2024-03-01 18:01                 ` John Fastabend
2024-02-25 10:06 ` [PATCH v2 bpf-next 2/2] selftests/bpf: Add selftest for bits iter Yafang Shao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).