* [PATCH bpf-next v2 0/4] Sockmap iterator
@ 2020-09-01 10:32 Lorenz Bauer
  2020-09-01 10:32 ` [PATCH bpf-next v2 1/4] net: sockmap: Remove unnecessary sk_fullsock checks Lorenz Bauer
                   ` (5 more replies)
  0 siblings, 6 replies; 15+ messages in thread
From: Lorenz Bauer @ 2020-09-01 10:32 UTC (permalink / raw)
  To: ast, yhs, daniel, jakub, john.fastabend; +Cc: bpf, kernel-team, Lorenz Bauer

This addresses Jakub's feedback for the v1 [1]. Outstanding issues are:

* Can we use rcu_dereference instead of rcu_dereference_raw?
* Is it OK to not take the bucket lock?
* Can we teach the verifier that PTR_TO_BTF_ID can be the same as PTR_TO_SOCKET?

Changes in v2:
- Remove unnecessary sk_fullsock checks (Jakub)
- Nits for test output (Jakub)
- Increase number of sockets in tests to 64 (Jakub)
- Handle ENOENT in tests (Jakub)
- Actually test SOCKHASH iteration (myself)
- Fix SOCKHASH iterator initialization (myself)

1: https://lore.kernel.org/bpf/20200828094834.23290-1-lmb@cloudflare.com/

Lorenz Bauer (4):
  net: sockmap: Remove unnecessary sk_fullsock checks
  net: Allow iterating sockmap and sockhash
  selftests: bpf: Add helper to compare socket cookies
  selftests: bpf: Test copying a sockmap via bpf_iter

 net/core/sock_map.c                           | 287 +++++++++++++++++-
 .../selftests/bpf/prog_tests/sockmap_basic.c  | 141 ++++++++-
 tools/testing/selftests/bpf/progs/bpf_iter.h  |   9 +
 .../selftests/bpf/progs/bpf_iter_sockmap.c    |  58 ++++
 .../selftests/bpf/progs/bpf_iter_sockmap.h    |   3 +
 5 files changed, 482 insertions(+), 16 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_sockmap.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_sockmap.h

-- 
2.25.1



* [PATCH bpf-next v2 1/4] net: sockmap: Remove unnecessary sk_fullsock checks
  2020-09-01 10:32 [PATCH bpf-next v2 0/4] Sockmap iterator Lorenz Bauer
@ 2020-09-01 10:32 ` Lorenz Bauer
  2020-09-01 10:32 ` [PATCH bpf-next v2 2/4] net: Allow iterating sockmap and sockhash Lorenz Bauer
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Lorenz Bauer @ 2020-09-01 10:32 UTC (permalink / raw)
  To: ast, yhs, daniel, jakub, john.fastabend; +Cc: bpf, kernel-team, Lorenz Bauer

The lookup paths for sockmap and sockhash currently include a check
that returns NULL if the socket we just found is not a full socket.
However, this check is not necessary. On insertion we ensure that
we have a full socket (caveat around sock_ops), so request sockets
are not a problem. Time-wait sockets are allocated separately from
the original socket and then fed into the hashdance. They don't
affect the sockets already stored in the sockmap.

Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
---
 net/core/sock_map.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index d6c6e1e312fc..ffdf94a30c87 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -382,7 +382,7 @@ static void *sock_map_lookup(struct bpf_map *map, void *key)
 	struct sock *sk;
 
 	sk = __sock_map_lookup_elem(map, *(u32 *)key);
-	if (!sk || !sk_fullsock(sk))
+	if (!sk)
 		return NULL;
 	if (sk_is_refcounted(sk) && !refcount_inc_not_zero(&sk->sk_refcnt))
 		return NULL;
@@ -1109,7 +1109,7 @@ static void *sock_hash_lookup(struct bpf_map *map, void *key)
 	struct sock *sk;
 
 	sk = __sock_hash_lookup_elem(map, key);
-	if (!sk || !sk_fullsock(sk))
+	if (!sk)
 		return NULL;
 	if (sk_is_refcounted(sk) && !refcount_inc_not_zero(&sk->sk_refcnt))
 		return NULL;
-- 
2.25.1



* [PATCH bpf-next v2 2/4] net: Allow iterating sockmap and sockhash
  2020-09-01 10:32 [PATCH bpf-next v2 0/4] Sockmap iterator Lorenz Bauer
  2020-09-01 10:32 ` [PATCH bpf-next v2 1/4] net: sockmap: Remove unnecessary sk_fullsock checks Lorenz Bauer
@ 2020-09-01 10:32 ` Lorenz Bauer
  2020-09-02  5:08   ` Yonghong Song
  2020-09-03  1:31   ` Andrii Nakryiko
  2020-09-01 10:32 ` [PATCH bpf-next v2 3/4] selftests: bpf: Add helper to compare socket cookies Lorenz Bauer
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 15+ messages in thread
From: Lorenz Bauer @ 2020-09-01 10:32 UTC (permalink / raw)
  To: ast, yhs, daniel, jakub, john.fastabend; +Cc: bpf, kernel-team, Lorenz Bauer

Add bpf_iter support for sockmap / sockhash, based on the bpf_sk_storage and
hashtable implementation. sockmap and sockhash share the same iteration
context: a pointer to an arbitrary key and a pointer to a socket. Both
pointers may be NULL, and so BPF has to perform a NULL check before accessing
them. Technically it's not possible for sockhash iteration to yield a NULL
socket, but we ignore this to be able to use a single iteration point.

Iteration will visit all keys that remain unmodified during the lifetime of
the iterator. It may or may not visit newly added ones.
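
For illustration, a minimal iterator program for this context could look
like the sketch below. The program name is a placeholder, and it assumes
the bpf_iter__sockmap definition that the selftests gain later in this
series:

  #include "bpf_iter.h"
  #include <bpf/bpf_helpers.h>

  char _license[] SEC("license") = "GPL";

  SEC("iter/sockmap")
  int dump_socks(struct bpf_iter__sockmap *ctx)
  {
  	__u32 *key = ctx->key;
  	struct bpf_sock *sk = ctx->sk;

  	if (!key)	/* end of iteration */
  		return 0;

  	if (!sk)	/* empty slot, possible for sockmap */
  		return 0;

  	/* key and sk are now safe to use */
  	return 0;
  }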

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
---
 net/core/sock_map.c | 283 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 283 insertions(+)

diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index ffdf94a30c87..4767f9df2b8b 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -703,6 +703,114 @@ const struct bpf_func_proto bpf_msg_redirect_map_proto = {
 	.arg4_type      = ARG_ANYTHING,
 };
 
+struct sock_map_seq_info {
+	struct bpf_map *map;
+	struct sock *sk;
+	u32 index;
+};
+
+struct bpf_iter__sockmap {
+	__bpf_md_ptr(struct bpf_iter_meta *, meta);
+	__bpf_md_ptr(struct bpf_map *, map);
+	__bpf_md_ptr(void *, key);
+	__bpf_md_ptr(struct bpf_sock *, sk);
+};
+
+DEFINE_BPF_ITER_FUNC(sockmap, struct bpf_iter_meta *meta,
+		     struct bpf_map *map, void *key,
+		     struct sock *sk)
+
+static void *sock_map_seq_lookup_elem(struct sock_map_seq_info *info)
+{
+	if (unlikely(info->index >= info->map->max_entries))
+		return NULL;
+
+	info->sk = __sock_map_lookup_elem(info->map, info->index);
+
+	/* can't return sk directly, since that might be NULL */
+	return info;
+}
+
+static void *sock_map_seq_start(struct seq_file *seq, loff_t *pos)
+{
+	struct sock_map_seq_info *info = seq->private;
+
+	if (*pos == 0)
+		++*pos;
+
+	/* pairs with sock_map_seq_stop */
+	rcu_read_lock();
+	return sock_map_seq_lookup_elem(info);
+}
+
+static void *sock_map_seq_next(struct seq_file *seq, void *v, loff_t *pos)
+{
+	struct sock_map_seq_info *info = seq->private;
+
+	++*pos;
+	++info->index;
+
+	return sock_map_seq_lookup_elem(info);
+}
+
+static int __sock_map_seq_show(struct seq_file *seq, void *v)
+{
+	struct sock_map_seq_info *info = seq->private;
+	struct bpf_iter__sockmap ctx = {};
+	struct bpf_iter_meta meta;
+	struct bpf_prog *prog;
+
+	meta.seq = seq;
+	prog = bpf_iter_get_info(&meta, !v);
+	if (!prog)
+		return 0;
+
+	ctx.meta = &meta;
+	ctx.map = info->map;
+	if (v) {
+		ctx.key = &info->index;
+		ctx.sk = (struct bpf_sock *)info->sk;
+	}
+
+	return bpf_iter_run_prog(prog, &ctx);
+}
+
+static int sock_map_seq_show(struct seq_file *seq, void *v)
+{
+	return __sock_map_seq_show(seq, v);
+}
+
+static void sock_map_seq_stop(struct seq_file *seq, void *v)
+{
+	if (!v)
+		(void)__sock_map_seq_show(seq, NULL);
+
+	/* pairs with sock_map_seq_start */
+	rcu_read_unlock();
+}
+
+static const struct seq_operations sock_map_seq_ops = {
+	.start	= sock_map_seq_start,
+	.next	= sock_map_seq_next,
+	.stop	= sock_map_seq_stop,
+	.show	= sock_map_seq_show,
+};
+
+static int sock_map_init_seq_private(void *priv_data,
+				     struct bpf_iter_aux_info *aux)
+{
+	struct sock_map_seq_info *info = priv_data;
+
+	info->map = aux->map;
+	return 0;
+}
+
+static const struct bpf_iter_seq_info sock_map_iter_seq_info = {
+	.seq_ops		= &sock_map_seq_ops,
+	.init_seq_private	= sock_map_init_seq_private,
+	.seq_priv_size		= sizeof(struct sock_map_seq_info),
+};
+
 static int sock_map_btf_id;
 const struct bpf_map_ops sock_map_ops = {
 	.map_alloc		= sock_map_alloc,
@@ -716,6 +824,7 @@ const struct bpf_map_ops sock_map_ops = {
 	.map_check_btf		= map_check_no_btf,
 	.map_btf_name		= "bpf_stab",
 	.map_btf_id		= &sock_map_btf_id,
+	.iter_seq_info		= &sock_map_iter_seq_info,
 };
 
 struct bpf_shtab_elem {
@@ -1198,6 +1307,121 @@ const struct bpf_func_proto bpf_msg_redirect_hash_proto = {
 	.arg4_type      = ARG_ANYTHING,
 };
 
+struct sock_hash_seq_info {
+	struct bpf_map *map;
+	struct bpf_shtab *htab;
+	u32 bucket_id;
+};
+
+static void *sock_hash_seq_find_next(struct sock_hash_seq_info *info,
+				     struct bpf_shtab_elem *prev_elem)
+{
+	const struct bpf_shtab *htab = info->htab;
+	struct bpf_shtab_bucket *bucket;
+	struct bpf_shtab_elem *elem;
+	struct hlist_node *node;
+
+	/* try to find next elem in the same bucket */
+	if (prev_elem) {
+		node = rcu_dereference_raw(hlist_next_rcu(&prev_elem->node));
+		elem = hlist_entry_safe(node, struct bpf_shtab_elem, node);
+		if (elem)
+			return elem;
+
+		/* no more elements, continue in the next bucket */
+		info->bucket_id++;
+	}
+
+	for (; info->bucket_id < htab->buckets_num; info->bucket_id++) {
+		bucket = &htab->buckets[info->bucket_id];
+		node = rcu_dereference_raw(hlist_first_rcu(&bucket->head));
+		elem = hlist_entry_safe(node, struct bpf_shtab_elem, node);
+		if (elem)
+			return elem;
+	}
+
+	return NULL;
+}
+
+static void *sock_hash_seq_start(struct seq_file *seq, loff_t *pos)
+{
+	struct sock_hash_seq_info *info = seq->private;
+
+	if (*pos == 0)
+		++*pos;
+
+	/* pairs with sock_hash_seq_stop */
+	rcu_read_lock();
+	return sock_hash_seq_find_next(info, NULL);
+}
+
+static void *sock_hash_seq_next(struct seq_file *seq, void *v, loff_t *pos)
+{
+	struct sock_hash_seq_info *info = seq->private;
+
+	++*pos;
+	return sock_hash_seq_find_next(info, v);
+}
+
+static int __sock_hash_seq_show(struct seq_file *seq, struct bpf_shtab_elem *elem)
+{
+	struct sock_hash_seq_info *info = seq->private;
+	struct bpf_iter__sockmap ctx = {};
+	struct bpf_iter_meta meta;
+	struct bpf_prog *prog;
+
+	meta.seq = seq;
+	prog = bpf_iter_get_info(&meta, !elem);
+	if (!prog)
+		return 0;
+
+	ctx.meta = &meta;
+	ctx.map = info->map;
+	if (elem) {
+		ctx.key = elem->key;
+		ctx.sk = (struct bpf_sock *)elem->sk;
+	}
+
+	return bpf_iter_run_prog(prog, &ctx);
+}
+
+static int sock_hash_seq_show(struct seq_file *seq, void *v)
+{
+	return __sock_hash_seq_show(seq, v);
+}
+
+static void sock_hash_seq_stop(struct seq_file *seq, void *v)
+{
+	if (!v)
+		(void)__sock_hash_seq_show(seq, NULL);
+
+	/* pairs with sock_hash_seq_start */
+	rcu_read_unlock();
+}
+
+static const struct seq_operations sock_hash_seq_ops = {
+	.start	= sock_hash_seq_start,
+	.next	= sock_hash_seq_next,
+	.stop	= sock_hash_seq_stop,
+	.show	= sock_hash_seq_show,
+};
+
+static int sock_hash_init_seq_private(void *priv_data,
+				     struct bpf_iter_aux_info *aux)
+{
+	struct sock_hash_seq_info *info = priv_data;
+
+	info->map = aux->map;
+	info->htab = container_of(aux->map, struct bpf_shtab, map);
+	return 0;
+}
+
+static const struct bpf_iter_seq_info sock_hash_iter_seq_info = {
+	.seq_ops		= &sock_hash_seq_ops,
+	.init_seq_private	= sock_hash_init_seq_private,
+	.seq_priv_size		= sizeof(struct sock_hash_seq_info),
+};
+
 static int sock_hash_map_btf_id;
 const struct bpf_map_ops sock_hash_ops = {
 	.map_alloc		= sock_hash_alloc,
@@ -1211,6 +1435,7 @@ const struct bpf_map_ops sock_hash_ops = {
 	.map_check_btf		= map_check_no_btf,
 	.map_btf_name		= "bpf_shtab",
 	.map_btf_id		= &sock_hash_map_btf_id,
+	.iter_seq_info		= &sock_hash_iter_seq_info,
 };
 
 static struct sk_psock_progs *sock_map_progs(struct bpf_map *map)
@@ -1321,3 +1546,61 @@ void sock_map_close(struct sock *sk, long timeout)
 	release_sock(sk);
 	saved_close(sk, timeout);
 }
+
+static int sock_map_iter_attach_target(struct bpf_prog *prog,
+				       union bpf_iter_link_info *linfo,
+				       struct bpf_iter_aux_info *aux)
+{
+	struct bpf_map *map;
+	int err = -EINVAL;
+
+	if (!linfo->map.map_fd)
+		return -EBADF;
+
+	map = bpf_map_get_with_uref(linfo->map.map_fd);
+	if (IS_ERR(map))
+		return PTR_ERR(map);
+
+	if (map->map_type != BPF_MAP_TYPE_SOCKMAP &&
+	    map->map_type != BPF_MAP_TYPE_SOCKHASH)
+		goto put_map;
+
+	if (prog->aux->max_rdonly_access > map->key_size) {
+		err = -EACCES;
+		goto put_map;
+	}
+
+	aux->map = map;
+	return 0;
+
+put_map:
+	bpf_map_put_with_uref(map);
+	return err;
+}
+
+static void sock_map_iter_detach_target(struct bpf_iter_aux_info *aux)
+{
+	bpf_map_put_with_uref(aux->map);
+}
+
+static struct bpf_iter_reg sock_map_iter_reg = {
+	.target			= "sockmap",
+	.attach_target		= sock_map_iter_attach_target,
+	.detach_target		= sock_map_iter_detach_target,
+	.show_fdinfo		= bpf_iter_map_show_fdinfo,
+	.fill_link_info		= bpf_iter_map_fill_link_info,
+	.ctx_arg_info_size	= 2,
+	.ctx_arg_info		= {
+		{ offsetof(struct bpf_iter__sockmap, key),
+		  PTR_TO_RDONLY_BUF_OR_NULL },
+		{ offsetof(struct bpf_iter__sockmap, sk),
+		  PTR_TO_SOCKET_OR_NULL },
+	},
+	.seq_info		= &sock_map_iter_seq_info,
+};
+
+static int __init bpf_sockmap_iter_init(void)
+{
+	return bpf_iter_reg_target(&sock_map_iter_reg);
+}
+late_initcall(bpf_sockmap_iter_init);
-- 
2.25.1



* [PATCH bpf-next v2 3/4] selftests: bpf: Add helper to compare socket cookies
  2020-09-01 10:32 [PATCH bpf-next v2 0/4] Sockmap iterator Lorenz Bauer
  2020-09-01 10:32 ` [PATCH bpf-next v2 1/4] net: sockmap: Remove unnecessary sk_fullsock checks Lorenz Bauer
  2020-09-01 10:32 ` [PATCH bpf-next v2 2/4] net: Allow iterating sockmap and sockhash Lorenz Bauer
@ 2020-09-01 10:32 ` Lorenz Bauer
  2020-09-01 10:32 ` [PATCH bpf-next v2 4/4] selftests: bpf: Test copying a sockmap via bpf_iter Lorenz Bauer
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Lorenz Bauer @ 2020-09-01 10:32 UTC (permalink / raw)
  To: ast, yhs, daniel, jakub, john.fastabend; +Cc: bpf, kernel-team, Lorenz Bauer

We compare socket cookies to ensure that insertion into a sockmap worked.
Pull this out into a helper function for use in other tests.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
---
 .../selftests/bpf/prog_tests/sockmap_basic.c  | 53 ++++++++++++++-----
 1 file changed, 39 insertions(+), 14 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
index 0b79d78b98db..9569bbac7f6e 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
@@ -47,6 +47,40 @@ static int connected_socket_v4(void)
 	return -1;
 }
 
+static void compare_cookies(struct bpf_map *src, struct bpf_map *dst)
+{
+	__u32 i, max_entries = bpf_map__max_entries(src);
+	int err, duration, src_fd, dst_fd;
+
+	src_fd = bpf_map__fd(src);
+	dst_fd = bpf_map__fd(dst);
+
+	for (i = 0; i < max_entries; i++) {
+		__u64 src_cookie, dst_cookie;
+
+		err = bpf_map_lookup_elem(src_fd, &i, &src_cookie);
+		if (err && errno == ENOENT) {
+			err = bpf_map_lookup_elem(dst_fd, &i, &dst_cookie);
+			if (err)
+				CHECK(errno != ENOENT, "map_lookup_elem(dst)",
+				      "%s\n", strerror(errno));
+			else
+				PRINT_FAIL("map_lookup_elem(dst): element %u not deleted\n", i);
+
+			continue;
+		}
+		if (CHECK(err, "lookup_elem(src)", "%s\n", strerror(errno)))
+			continue;
+
+		err = bpf_map_lookup_elem(dst_fd, &i, &dst_cookie);
+		if (CHECK(err, "lookup_elem(dst)", "%s\n", strerror(errno)))
+			continue;
+
+		CHECK(dst_cookie != src_cookie, "cookie mismatch",
+		      "%llu != %llu (pos %u)\n", dst_cookie, src_cookie, i);
+	}
+}
+
 /* Create a map, populate it with one socket, and free the map. */
 static void test_sockmap_create_update_free(enum bpf_map_type map_type)
 {
@@ -106,9 +140,9 @@ static void test_skmsg_helpers(enum bpf_map_type map_type)
 static void test_sockmap_update(enum bpf_map_type map_type)
 {
 	struct bpf_prog_test_run_attr tattr;
-	int err, prog, src, dst, duration = 0;
+	int err, prog, src, duration = 0;
 	struct test_sockmap_update *skel;
-	__u64 src_cookie, dst_cookie;
+	struct bpf_map *dst_map;
 	const __u32 zero = 0;
 	char dummy[14] = {0};
 	__s64 sk;
@@ -124,18 +158,14 @@ static void test_sockmap_update(enum bpf_map_type map_type)
 	prog = bpf_program__fd(skel->progs.copy_sock_map);
 	src = bpf_map__fd(skel->maps.src);
 	if (map_type == BPF_MAP_TYPE_SOCKMAP)
-		dst = bpf_map__fd(skel->maps.dst_sock_map);
+		dst_map = skel->maps.dst_sock_map;
 	else
-		dst = bpf_map__fd(skel->maps.dst_sock_hash);
+		dst_map = skel->maps.dst_sock_hash;
 
 	err = bpf_map_update_elem(src, &zero, &sk, BPF_NOEXIST);
 	if (CHECK(err, "update_elem(src)", "errno=%u\n", errno))
 		goto out;
 
-	err = bpf_map_lookup_elem(src, &zero, &src_cookie);
-	if (CHECK(err, "lookup_elem(src, cookie)", "errno=%u\n", errno))
-		goto out;
-
 	tattr = (struct bpf_prog_test_run_attr){
 		.prog_fd = prog,
 		.repeat = 1,
@@ -148,12 +178,7 @@ static void test_sockmap_update(enum bpf_map_type map_type)
 		       "errno=%u retval=%u\n", errno, tattr.retval))
 		goto out;
 
-	err = bpf_map_lookup_elem(dst, &zero, &dst_cookie);
-	if (CHECK(err, "lookup_elem(dst, cookie)", "errno=%u\n", errno))
-		goto out;
-
-	CHECK(dst_cookie != src_cookie, "cookie mismatch", "%llu != %llu\n",
-	      dst_cookie, src_cookie);
+	compare_cookies(skel->maps.src, dst_map);
 
 out:
 	test_sockmap_update__destroy(skel);
-- 
2.25.1



* [PATCH bpf-next v2 4/4] selftests: bpf: Test copying a sockmap via bpf_iter
  2020-09-01 10:32 [PATCH bpf-next v2 0/4] Sockmap iterator Lorenz Bauer
                   ` (2 preceding siblings ...)
  2020-09-01 10:32 ` [PATCH bpf-next v2 3/4] selftests: bpf: Add helper to compare socket cookies Lorenz Bauer
@ 2020-09-01 10:32 ` Lorenz Bauer
  2020-09-03  5:35   ` Andrii Nakryiko
  2020-09-02  5:30 ` [PATCH bpf-next v2 0/4] Sockmap iterator Yonghong Song
  2020-09-03 20:08 ` Martin KaFai Lau
  5 siblings, 1 reply; 15+ messages in thread
From: Lorenz Bauer @ 2020-09-01 10:32 UTC (permalink / raw)
  To: ast, yhs, daniel, jakub, john.fastabend; +Cc: bpf, kernel-team, Lorenz Bauer

Add a test that exercises a basic sockmap / sockhash copy using bpf_iter.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
---
 .../selftests/bpf/prog_tests/sockmap_basic.c  | 88 +++++++++++++++++++
 tools/testing/selftests/bpf/progs/bpf_iter.h  |  9 ++
 .../selftests/bpf/progs/bpf_iter_sockmap.c    | 58 ++++++++++++
 .../selftests/bpf/progs/bpf_iter_sockmap.h    |  3 +
 4 files changed, 158 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_sockmap.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_sockmap.h

diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
index 9569bbac7f6e..f5b7b27f096f 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
@@ -6,6 +6,9 @@
 #include "test_skmsg_load_helpers.skel.h"
 #include "test_sockmap_update.skel.h"
 #include "test_sockmap_invalid_update.skel.h"
+#include "bpf_iter_sockmap.skel.h"
+
+#include "progs/bpf_iter_sockmap.h"
 
 #define TCP_REPAIR		19	/* TCP sock is under repair right now */
 
@@ -196,6 +199,87 @@ static void test_sockmap_invalid_update(void)
 		test_sockmap_invalid_update__destroy(skel);
 }
 
+static void test_sockmap_copy(enum bpf_map_type map_type)
+{
+	DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
+	int err, len, src_fd, iter_fd, duration;
+	union bpf_iter_link_info linfo = {0};
+	__s64 sock_fd[SOCKMAP_MAX_ENTRIES];
+	__u32 i, num_sockets, max_elems;
+	struct bpf_iter_sockmap *skel;
+	struct bpf_map *src, *dst;
+	struct bpf_link *link;
+	char buf[64];
+
+	skel = bpf_iter_sockmap__open_and_load();
+	if (CHECK(!skel, "bpf_iter_sockmap__open_and_load",
+		  "skeleton open_and_load failed\n"))
+		return;
+
+	memset(sock_fd, 0xff, sizeof(sock_fd));
+
+	/* Make sure we have at least one "empty" entry to test iteration of
+	 * an empty slot in an array.
+	 */
+	num_sockets = ARRAY_SIZE(sock_fd) - 1;
+
+	if (map_type == BPF_MAP_TYPE_SOCKMAP) {
+		src = skel->maps.sockmap;
+		max_elems = bpf_map__max_entries(src);
+	} else {
+		src = skel->maps.sockhash;
+		max_elems = num_sockets;
+	}
+
+	dst = skel->maps.dst;
+	src_fd = bpf_map__fd(src);
+
+	for (i = 0; i < num_sockets; i++) {
+		sock_fd[i] = connected_socket_v4();
+		if (CHECK(sock_fd[i] == -1, "connected_socket_v4", "cannot connect\n"))
+			goto out;
+
+		err = bpf_map_update_elem(src_fd, &i, &sock_fd[i], BPF_NOEXIST);
+		if (CHECK(err, "map_update", "failed: %s\n", strerror(errno)))
+			goto out;
+	}
+
+	linfo.map.map_fd = src_fd;
+	opts.link_info = &linfo;
+	opts.link_info_len = sizeof(linfo);
+	link = bpf_program__attach_iter(skel->progs.copy_sockmap, &opts);
+	if (CHECK(IS_ERR(link), "attach_iter", "attach_iter failed\n"))
+		goto out;
+
+	iter_fd = bpf_iter_create(bpf_link__fd(link));
+	if (CHECK(iter_fd < 0, "create_iter", "create_iter failed\n"))
+		goto free_link;
+
+	/* do some tests */
+	while ((len = read(iter_fd, buf, sizeof(buf))) > 0)
+		;
+	if (CHECK(len < 0, "read", "failed: %s\n", strerror(errno)))
+		goto close_iter;
+
+	/* test results */
+	if (CHECK(skel->bss->elems != max_elems, "elems", "got %u expected %u\n",
+		  skel->bss->elems, max_elems))
+		goto close_iter;
+
+	compare_cookies(src, dst);
+
+close_iter:
+	close(iter_fd);
+free_link:
+	bpf_link__destroy(link);
+out:
+	for (i = 0; i < num_sockets; i++) {
+		if (sock_fd[i] >= 0)
+			close(sock_fd[i]);
+	}
+	bpf_iter_sockmap__destroy(skel);
+}
+
 void test_sockmap_basic(void)
 {
 	if (test__start_subtest("sockmap create_update_free"))
@@ -212,4 +296,8 @@ void test_sockmap_basic(void)
 		test_sockmap_update(BPF_MAP_TYPE_SOCKHASH);
 	if (test__start_subtest("sockmap update in unsafe context"))
 		test_sockmap_invalid_update();
+	if (test__start_subtest("sockmap copy"))
+		test_sockmap_copy(BPF_MAP_TYPE_SOCKMAP);
+	if (test__start_subtest("sockhash copy"))
+		test_sockmap_copy(BPF_MAP_TYPE_SOCKHASH);
 }
diff --git a/tools/testing/selftests/bpf/progs/bpf_iter.h b/tools/testing/selftests/bpf/progs/bpf_iter.h
index c196280df90d..ac32a29f5153 100644
--- a/tools/testing/selftests/bpf/progs/bpf_iter.h
+++ b/tools/testing/selftests/bpf/progs/bpf_iter.h
@@ -13,6 +13,7 @@
 #define udp6_sock udp6_sock___not_used
 #define bpf_iter__bpf_map_elem bpf_iter__bpf_map_elem___not_used
 #define bpf_iter__bpf_sk_storage_map bpf_iter__bpf_sk_storage_map___not_used
+#define bpf_iter__sockmap bpf_iter__sockmap___not_used
 #include "vmlinux.h"
 #undef bpf_iter_meta
 #undef bpf_iter__bpf_map
@@ -26,6 +27,7 @@
 #undef udp6_sock
 #undef bpf_iter__bpf_map_elem
 #undef bpf_iter__bpf_sk_storage_map
+#undef bpf_iter__sockmap
 
 struct bpf_iter_meta {
 	struct seq_file *seq;
@@ -96,3 +98,10 @@ struct bpf_iter__bpf_sk_storage_map {
 	struct sock *sk;
 	void *value;
 };
+
+struct bpf_iter__sockmap {
+	struct bpf_iter_meta *meta;
+	struct bpf_map *map;
+	void *key;
+	struct bpf_sock *sk;
+};
diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_sockmap.c b/tools/testing/selftests/bpf/progs/bpf_iter_sockmap.c
new file mode 100644
index 000000000000..d236bc76cc06
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpf_iter_sockmap.c
@@ -0,0 +1,58 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2020 Cloudflare */
+#include "bpf_iter.h"
+#include "bpf_tracing_net.h"
+#include "bpf_iter_sockmap.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include <errno.h>
+
+char _license[] SEC("license") = "GPL";
+
+struct {
+	__uint(type, BPF_MAP_TYPE_SOCKMAP);
+	__uint(max_entries, SOCKMAP_MAX_ENTRIES);
+	__type(key, __u32);
+	__type(value, __u64);
+} sockmap SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_SOCKHASH);
+	__uint(max_entries, SOCKMAP_MAX_ENTRIES);
+	__type(key, __u32);
+	__type(value, __u64);
+} sockhash SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_SOCKHASH);
+	__uint(max_entries, SOCKMAP_MAX_ENTRIES);
+	__type(key, __u32);
+	__type(value, __u64);
+} dst SEC(".maps");
+
+__u32 elems = 0;
+
+SEC("iter/sockmap")
+int copy_sockmap(struct bpf_iter__sockmap *ctx)
+{
+	struct bpf_sock *sk = ctx->sk;
+	__u32 tmp, *key = ctx->key;
+	int ret;
+
+	if (key == (void *)0)
+		return 0;
+
+	elems++;
+
+	/* We need a temporary buffer on the stack, since the verifier doesn't
+	 * let us use the pointer from the context as an argument to the helper.
+	 */
+	tmp = *key;
+	bpf_printk("key: %u\n", tmp);
+
+	if (sk != (void *)0)
+		return bpf_map_update_elem(&dst, &tmp, sk, 0) != 0;
+
+	ret = bpf_map_delete_elem(&dst, &tmp);
+	return ret && ret != -ENOENT;
+}
diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_sockmap.h b/tools/testing/selftests/bpf/progs/bpf_iter_sockmap.h
new file mode 100644
index 000000000000..f98ad727ac06
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpf_iter_sockmap.h
@@ -0,0 +1,3 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define SOCKMAP_MAX_ENTRIES (64)
-- 
2.25.1



* Re: [PATCH bpf-next v2 2/4] net: Allow iterating sockmap and sockhash
  2020-09-01 10:32 ` [PATCH bpf-next v2 2/4] net: Allow iterating sockmap and sockhash Lorenz Bauer
@ 2020-09-02  5:08   ` Yonghong Song
  2020-09-02  8:58     ` Lorenz Bauer
  2020-09-03  1:31   ` Andrii Nakryiko
  1 sibling, 1 reply; 15+ messages in thread
From: Yonghong Song @ 2020-09-02  5:08 UTC (permalink / raw)
  To: Lorenz Bauer, ast, daniel, jakub, john.fastabend; +Cc: bpf, kernel-team



On 9/1/20 3:32 AM, Lorenz Bauer wrote:
> Add bpf_iter support for sockmap / sockhash, based on the bpf_sk_storage and
> hashtable implementation. sockmap and sockhash share the same iteration
> context: a pointer to an arbitrary key and a pointer to a socket. Both
> pointers may be NULL, and so BPF has to perform a NULL check before accessing
> them. Technically it's not possible for sockhash iteration to yield a NULL
> socket, but we ignore this to be able to use a single iteration point.

There seems to be some misunderstanding about when the NULL object may
be passed to the bpf program. The original design is to always pass a
valid object except on the last iteration. If you see a NULL object
(key/sock), that means it is the end of iteration, so you can finish
aggregating the data and may send it to user space.
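
For example, an aggregating program would only publish its result once
it sees that final NULL call. A rough sketch (assuming a
bpf_iter__sockmap context definition is available to the program):

  #include "bpf_iter.h"
  #include <bpf/bpf_helpers.h>

  char _license[] SEC("license") = "GPL";

  __u64 sockets_seen = 0;
  __u64 total = 0;

  SEC("iter/sockmap")
  int count_socks(struct bpf_iter__sockmap *ctx)
  {
  	if (!ctx->key) {
  		/* end of iteration: publish the aggregate */
  		total = sockets_seen;
  		return 0;
  	}

  	if (ctx->sk)
  		sockets_seen++;

  	return 0;
  }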

Sending a NULL object in the middle of iteration is bad, as the bpf
program will just check for NULL and return.

The current arraymap iterator returns a valid value for each index in
the range, since even if the user did not assign anything to an index,
the kernel still allocates storage for that index with a default value
of 0.

For sockmap, it looks like it is possible that some index may contain
a NULL socket pointer. I suggest skipping these array elements and
always returning a non-NULL object.

> 
> Iteration will visit all keys that remain unmodified during the lifetime of
> the iterator. It may or may not visit newly added ones.
> 
> Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
> ---
>   net/core/sock_map.c | 283 ++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 283 insertions(+)
> 
> diff --git a/net/core/sock_map.c b/net/core/sock_map.c
> index ffdf94a30c87..4767f9df2b8b 100644
> --- a/net/core/sock_map.c
> +++ b/net/core/sock_map.c
> @@ -703,6 +703,114 @@ const struct bpf_func_proto bpf_msg_redirect_map_proto = {
>   	.arg4_type      = ARG_ANYTHING,
>   };
>   
> +struct sock_map_seq_info {
> +	struct bpf_map *map;
> +	struct sock *sk;
> +	u32 index;
> +};
> +
> +struct bpf_iter__sockmap {
> +	__bpf_md_ptr(struct bpf_iter_meta *, meta);
> +	__bpf_md_ptr(struct bpf_map *, map);
> +	__bpf_md_ptr(void *, key);
> +	__bpf_md_ptr(struct bpf_sock *, sk);
> +};
> +
> +DEFINE_BPF_ITER_FUNC(sockmap, struct bpf_iter_meta *meta,
> +		     struct bpf_map *map, void *key,
> +		     struct sock *sk)
> +
> +static void *sock_map_seq_lookup_elem(struct sock_map_seq_info *info)
> +{
> +	if (unlikely(info->index >= info->map->max_entries))
> +		return NULL;
> +
> +	info->sk = __sock_map_lookup_elem(info->map, info->index);
> +
> +	/* can't return sk directly, since that might be NULL */

As said above, I suggest skipping the NULL socket and always returning
a valid non-NULL socket.

> +	return info;
> +}
> +
> +static void *sock_map_seq_start(struct seq_file *seq, loff_t *pos)
> +{
> +	struct sock_map_seq_info *info = seq->private;
> +
> +	if (*pos == 0)
> +		++*pos;
> +
> +	/* pairs with sock_map_seq_stop */
> +	rcu_read_lock();
> +	return sock_map_seq_lookup_elem(info);
> +}
> +
> +static void *sock_map_seq_next(struct seq_file *seq, void *v, loff_t *pos)
> +{
> +	struct sock_map_seq_info *info = seq->private;
> +
> +	++*pos;
> +	++info->index;
> +
> +	return sock_map_seq_lookup_elem(info);
> +}
> +
> +static int __sock_map_seq_show(struct seq_file *seq, void *v)
> +{
> +	struct sock_map_seq_info *info = seq->private;
> +	struct bpf_iter__sockmap ctx = {};
> +	struct bpf_iter_meta meta;
> +	struct bpf_prog *prog;
> +
> +	meta.seq = seq;
> +	prog = bpf_iter_get_info(&meta, !v);
> +	if (!prog)
> +		return 0;
> +
> +	ctx.meta = &meta;
> +	ctx.map = info->map;
> +	if (v) {
> +		ctx.key = &info->index;
> +		ctx.sk = (struct bpf_sock *)info->sk;
> +	}
> +
> +	return bpf_iter_run_prog(prog, &ctx);
> +}
> +
> +static int sock_map_seq_show(struct seq_file *seq, void *v)
> +{
> +	return __sock_map_seq_show(seq, v);
> +}
> +
> +static void sock_map_seq_stop(struct seq_file *seq, void *v)
> +{
> +	if (!v)
> +		(void)__sock_map_seq_show(seq, NULL);
> +
> +	/* pairs with sock_map_seq_start */
> +	rcu_read_unlock();
> +}
> +
> +static const struct seq_operations sock_map_seq_ops = {
> +	.start	= sock_map_seq_start,
> +	.next	= sock_map_seq_next,
> +	.stop	= sock_map_seq_stop,
> +	.show	= sock_map_seq_show,
> +};
> +
> +static int sock_map_init_seq_private(void *priv_data,
> +				     struct bpf_iter_aux_info *aux)
> +{
> +	struct sock_map_seq_info *info = priv_data;
> +
> +	info->map = aux->map;
> +	return 0;
> +}
> +
> +static const struct bpf_iter_seq_info sock_map_iter_seq_info = {
> +	.seq_ops		= &sock_map_seq_ops,
> +	.init_seq_private	= sock_map_init_seq_private,
> +	.seq_priv_size		= sizeof(struct sock_map_seq_info),
> +};
> +
>   static int sock_map_btf_id;
>   const struct bpf_map_ops sock_map_ops = {
>   	.map_alloc		= sock_map_alloc,
> @@ -716,6 +824,7 @@ const struct bpf_map_ops sock_map_ops = {
>   	.map_check_btf		= map_check_no_btf,
>   	.map_btf_name		= "bpf_stab",
>   	.map_btf_id		= &sock_map_btf_id,
> +	.iter_seq_info		= &sock_map_iter_seq_info,
>   };
>   
>   struct bpf_shtab_elem {
> @@ -1198,6 +1307,121 @@ const struct bpf_func_proto bpf_msg_redirect_hash_proto = {
>   	.arg4_type      = ARG_ANYTHING,
>   };
>   
> +struct sock_hash_seq_info {
> +	struct bpf_map *map;
> +	struct bpf_shtab *htab;
> +	u32 bucket_id;
> +};
> +
> +static void *sock_hash_seq_find_next(struct sock_hash_seq_info *info,
> +				     struct bpf_shtab_elem *prev_elem)
> +{
> +	const struct bpf_shtab *htab = info->htab;
> +	struct bpf_shtab_bucket *bucket;
> +	struct bpf_shtab_elem *elem;
> +	struct hlist_node *node;
> +
> +	/* try to find next elem in the same bucket */
> +	if (prev_elem) {
> +		node = rcu_dereference_raw(hlist_next_rcu(&prev_elem->node));
> +		elem = hlist_entry_safe(node, struct bpf_shtab_elem, node);
> +		if (elem)
> +			return elem;
> +
> +		/* no more elements, continue in the next bucket */
> +		info->bucket_id++;
> +	}

Looks like there was some earlier discussion on why a lock is not
needed here? It would be good to add a comment about that here.

> +	for (; info->bucket_id < htab->buckets_num; info->bucket_id++) {
> +		bucket = &htab->buckets[info->bucket_id];
> +		node = rcu_dereference_raw(hlist_first_rcu(&bucket->head));
> +		elem = hlist_entry_safe(node, struct bpf_shtab_elem, node);
> +		if (elem)
> +			return elem;
> +	}
> +
> +	return NULL;
> +}
> +
> +static void *sock_hash_seq_start(struct seq_file *seq, loff_t *pos)
> +{
> +	struct sock_hash_seq_info *info = seq->private;
> +
> +	if (*pos == 0)
> +		++*pos;
> +
> +	/* pairs with sock_hash_seq_stop */
> +	rcu_read_lock();
> +	return sock_hash_seq_find_next(info, NULL);
> +}
> +
> +static void *sock_hash_seq_next(struct seq_file *seq, void *v, loff_t *pos)
> +{
> +	struct sock_hash_seq_info *info = seq->private;
> +
> +	++*pos;
> +	return sock_hash_seq_find_next(info, v);
> +}
> +
> +static int __sock_hash_seq_show(struct seq_file *seq, struct bpf_shtab_elem *elem)
> +{
> +	struct sock_hash_seq_info *info = seq->private;
> +	struct bpf_iter__sockmap ctx = {};
> +	struct bpf_iter_meta meta;
> +	struct bpf_prog *prog;
> +
> +	meta.seq = seq;
> +	prog = bpf_iter_get_info(&meta, !elem);
> +	if (!prog)
> +		return 0;
> +
> +	ctx.meta = &meta;
> +	ctx.map = info->map;
> +	if (elem) {
> +		ctx.key = elem->key;
> +		ctx.sk = (struct bpf_sock *)elem->sk;
> +	}
> +
> +	return bpf_iter_run_prog(prog, &ctx);
> +}
> +
> +static int sock_hash_seq_show(struct seq_file *seq, void *v)
> +{
> +	return __sock_hash_seq_show(seq, v);
> +}
> +
> +static void sock_hash_seq_stop(struct seq_file *seq, void *v)
> +{
> +	if (!v)
> +		(void)__sock_hash_seq_show(seq, NULL);
> +
> +	/* pairs with sock_hash_seq_start */
> +	rcu_read_unlock();
> +}
> +
> +static const struct seq_operations sock_hash_seq_ops = {
> +	.start	= sock_hash_seq_start,
> +	.next	= sock_hash_seq_next,
> +	.stop	= sock_hash_seq_stop,
> +	.show	= sock_hash_seq_show,
> +};
> +
> +static int sock_hash_init_seq_private(void *priv_data,
> +				     struct bpf_iter_aux_info *aux)
> +{
> +	struct sock_hash_seq_info *info = priv_data;
> +
> +	info->map = aux->map;
> +	info->htab = container_of(aux->map, struct bpf_shtab, map);
> +	return 0;
> +}
> +
> +static const struct bpf_iter_seq_info sock_hash_iter_seq_info = {
> +	.seq_ops		= &sock_hash_seq_ops,
> +	.init_seq_private	= sock_hash_init_seq_private,
> +	.seq_priv_size		= sizeof(struct sock_hash_seq_info),
> +};
> +
>   static int sock_hash_map_btf_id;
>   const struct bpf_map_ops sock_hash_ops = {
>   	.map_alloc		= sock_hash_alloc,
> @@ -1211,6 +1435,7 @@ const struct bpf_map_ops sock_hash_ops = {
>   	.map_check_btf		= map_check_no_btf,
>   	.map_btf_name		= "bpf_shtab",
>   	.map_btf_id		= &sock_hash_map_btf_id,
> +	.iter_seq_info		= &sock_hash_iter_seq_info,
>   };
>   
>   static struct sk_psock_progs *sock_map_progs(struct bpf_map *map)
> @@ -1321,3 +1546,61 @@ void sock_map_close(struct sock *sk, long timeout)
>   	release_sock(sk);
>   	saved_close(sk, timeout);
>   }
> +
> +static int sock_map_iter_attach_target(struct bpf_prog *prog,
> +				       union bpf_iter_link_info *linfo,
> +				       struct bpf_iter_aux_info *aux)
> +{
> +	struct bpf_map *map;
> +	int err = -EINVAL;
> +
> +	if (!linfo->map.map_fd)
> +		return -EBADF;
> +
> +	map = bpf_map_get_with_uref(linfo->map.map_fd);
> +	if (IS_ERR(map))
> +		return PTR_ERR(map);
> +
> +	if (map->map_type != BPF_MAP_TYPE_SOCKMAP &&
> +	    map->map_type != BPF_MAP_TYPE_SOCKHASH)
> +		goto put_map;
> +
> +	if (prog->aux->max_rdonly_access > map->key_size) {
> +		err = -EACCES;
> +		goto put_map;
> +	}
> +
> +	aux->map = map;
> +	return 0;
> +
> +put_map:
> +	bpf_map_put_with_uref(map);
> +	return err;
> +}
> +
> +static void sock_map_iter_detach_target(struct bpf_iter_aux_info *aux)
> +{
> +	bpf_map_put_with_uref(aux->map);
> +}
> +
> +static struct bpf_iter_reg sock_map_iter_reg = {
> +	.target			= "sockmap",
> +	.attach_target		= sock_map_iter_attach_target,
> +	.detach_target		= sock_map_iter_detach_target,
> +	.show_fdinfo		= bpf_iter_map_show_fdinfo,
> +	.fill_link_info		= bpf_iter_map_fill_link_info,
> +	.ctx_arg_info_size	= 2,
> +	.ctx_arg_info		= {
> +		{ offsetof(struct bpf_iter__sockmap, key),
> +		  PTR_TO_RDONLY_BUF_OR_NULL },
> +		{ offsetof(struct bpf_iter__sockmap, sk),
> +		  PTR_TO_SOCKET_OR_NULL },
> +	},
> +	.seq_info		= &sock_map_iter_seq_info,

The .seq_info is not needed here. The sock_map_iter_seq_info or
sock_hash_iter_seq_info is already registered in the corresponding
map_ops.

> +};
> +
> +static int __init bpf_sockmap_iter_init(void)
> +{
> +	return bpf_iter_reg_target(&sock_map_iter_reg);
> +}
> +late_initcall(bpf_sockmap_iter_init);
> 


* Re: [PATCH bpf-next v2 0/4] Sockmap iterator
  2020-09-01 10:32 [PATCH bpf-next v2 0/4] Sockmap iterator Lorenz Bauer
                   ` (3 preceding siblings ...)
  2020-09-01 10:32 ` [PATCH bpf-next v2 4/4] selftests: bpf: Test copying a sockmap via bpf_iter Lorenz Bauer
@ 2020-09-02  5:30 ` Yonghong Song
  2020-09-03 20:08 ` Martin KaFai Lau
  5 siblings, 0 replies; 15+ messages in thread
From: Yonghong Song @ 2020-09-02  5:30 UTC (permalink / raw)
  To: Lorenz Bauer, ast, daniel, jakub, john.fastabend; +Cc: bpf, kernel-team



On 9/1/20 3:32 AM, Lorenz Bauer wrote:
> This addresses Jakub's feedback for the v1 [1]. Outstanding issues are:
> 
> * Can we use rcu_dereference instead of rcu_dereference_raw?
> * Is it OK to not take the bucket lock?
> * Can we teach the verifier that PTR_TO_BTF_ID can be the same as PTR_TO_SOCKET?

For the last question, I see the current implementation is:

+	.ctx_arg_info		= {
+		{ offsetof(struct bpf_iter__sockmap, key),
+		  PTR_TO_RDONLY_BUF_OR_NULL },
+		{ offsetof(struct bpf_iter__sockmap, sk),
+		  PTR_TO_SOCKET_OR_NULL },
+	},

The PTR_TO_BTF_ID might be better as it provides more flexibility and
more fields to access. If the PTR_TO_SOCKET reg type is desired in the
verifier, you could add a callback somewhere to check:
    if the type is PTR_TO_BTF_ID and the btf_id is 'struct sock',
    then you can convert the reg type to PTR_TO_SOCKET_OR_NULL.
PTR_TO_SOCKET_OR_NULL is used here since the PTR_TO_BTF_ID might be
a NULL pointer.

We could add a bit to the register state to indicate whether a
PTR_TO_BTF_ID may be NULL or not. For example, a conversion from
PTR_TO_BTF_ID_OR_NULL to PTR_TO_BTF_ID would yield a non-NULL btf_id,
while pointer tracing would yield a possibly-NULL btf_id. If we have
this, it becomes possible to convert a PTR_TO_BTF_ID to PTR_TO_SOCKET.
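
In other words, the registration might then look something like the
sketch below; the btf_id for 'sk' would come from the iterator's
function prototype via BTF, as other bpf_iter targets do:

  	.ctx_arg_info		= {
  		{ offsetof(struct bpf_iter__sockmap, key),
  		  PTR_TO_RDONLY_BUF_OR_NULL },
  		{ offsetof(struct bpf_iter__sockmap, sk),
  		  PTR_TO_BTF_ID_OR_NULL },
  	},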

> 
> Changes in v2:
> - Remove unnecessary sk_fullsock checks (Jakub)
> - Nits for test output (Jakub)
> - Increase number of sockets in tests to 64 (Jakub)
> - Handle ENOENT in tests (Jakub)
> - Actually test SOCKHASH iteration (myself)
> - Fix SOCKHASH iterator initialization (myself)
> 
> 1: https://lore.kernel.org/bpf/20200828094834.23290-1-lmb@cloudflare.com/
> 
> Lorenz Bauer (4):
>    net: sockmap: Remove unnecessary sk_fullsock checks
>    net: Allow iterating sockmap and sockhash
>    selftests: bpf: Add helper to compare socket cookies
>    selftests: bpf: Test copying a sockmap via bpf_iter
> 
>   net/core/sock_map.c                           | 287 +++++++++++++++++-
>   .../selftests/bpf/prog_tests/sockmap_basic.c  | 141 ++++++++-
>   tools/testing/selftests/bpf/progs/bpf_iter.h  |   9 +
>   .../selftests/bpf/progs/bpf_iter_sockmap.c    |  58 ++++
>   .../selftests/bpf/progs/bpf_iter_sockmap.h    |   3 +
>   5 files changed, 482 insertions(+), 16 deletions(-)
>   create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_sockmap.c
>   create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_sockmap.h
> 


* Re: [PATCH bpf-next v2 2/4] net: Allow iterating sockmap and sockhash
  2020-09-02  5:08   ` Yonghong Song
@ 2020-09-02  8:58     ` Lorenz Bauer
  2020-09-02 17:29       ` Yonghong Song
  0 siblings, 1 reply; 15+ messages in thread
From: Lorenz Bauer @ 2020-09-02  8:58 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Alexei Starovoitov, Daniel Borkmann, Jakub Sitnicki,
	John Fastabend, bpf, kernel-team

On Wed, 2 Sep 2020 at 06:08, Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 9/1/20 3:32 AM, Lorenz Bauer wrote:
> > Add bpf_iter support for sockmap / sockhash, based on the bpf_sk_storage and
> > hashtable implementation. sockmap and sockhash share the same iteration
> > context: a pointer to an arbitrary key and a pointer to a socket. Both
> > pointers may be NULL, and so BPF has to perform a NULL check before accessing
> > them. Technically it's not possible for sockhash iteration to yield a NULL
> > socket, but we ignore this to be able to use a single iteration point.
>
> There seems some misunderstanding about when the NULL object may be
> passed to the bpf program. The original design is to always pass valid
> object except the last iteration. If you see a NULL object (key/sock),
> that means it is the end of iteration, so you can finish aggregating
> the data and may send them to the user space.
>
> Sending NULL object in the middle of iterations is bad as bpf program
> will just check NULL and returns.
>
> The current arraymap iterator returns valid value for each index
> in the range. Since even if user did not assign anything for an index,
> the kernel still allocates storage for that index with default value 0.

I think that the current behaviour is actually in line with the NULL convention:
while sk may be NULL during iteration, key is guaranteed to be !NULL
except on the last iteration. This holds for sockmap and sockhash.

The wording of the commit message is a bit lacking. I'm trying to say
that for sockhash, ctx->sk will never be NULL (except during the last
iteration). So in theory, sockhash could use a distinct context which
uses PTR_TO_SOCKET instead of PTR_TO_SOCKET_OR_NULL. However, I think
a single context for both sockmap and sockhash is nicer.

>
> For sockmap, it looks it is possible that some index may contain NULL
> socket pointer. I suggest skip these array elements and already returns
> a non-NULL object.

I think this would make sockmap inconsistent with other array map
iteration. It also breaks a use case: as you can see in the test, I
use sk == NULL to trigger map_delete_elem in the target, instead of
map_update_elem.
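
For reference, the relevant part of the test program in patch 4:

  	if (sk != (void *)0)
  		return bpf_map_update_elem(&dst, &tmp, sk, 0) != 0;

  	ret = bpf_map_delete_elem(&dst, &tmp);
  	return ret && ret != -ENOENT;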

>
> >
> > Iteration will visit all keys that remain unmodified during the lifetime of
> > the iterator. It may or may not visit newly added ones.
> >
> > Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
> > ---
> >   net/core/sock_map.c | 283 ++++++++++++++++++++++++++++++++++++++++++++
> >   1 file changed, 283 insertions(+)
> >
> > diff --git a/net/core/sock_map.c b/net/core/sock_map.c
> > index ffdf94a30c87..4767f9df2b8b 100644
> > --- a/net/core/sock_map.c
> > +++ b/net/core/sock_map.c
> > @@ -703,6 +703,114 @@ const struct bpf_func_proto bpf_msg_redirect_map_proto = {
> >       .arg4_type      = ARG_ANYTHING,
> >   };
> >
> > +struct sock_map_seq_info {
> > +     struct bpf_map *map;
> > +     struct sock *sk;
> > +     u32 index;
> > +};
> > +
> > +struct bpf_iter__sockmap {
> > +     __bpf_md_ptr(struct bpf_iter_meta *, meta);
> > +     __bpf_md_ptr(struct bpf_map *, map);
> > +     __bpf_md_ptr(void *, key);
> > +     __bpf_md_ptr(struct bpf_sock *, sk);
> > +};
> > +
> > +DEFINE_BPF_ITER_FUNC(sockmap, struct bpf_iter_meta *meta,
> > +                  struct bpf_map *map, void *key,
> > +                  struct sock *sk)
> > +
> > +static void *sock_map_seq_lookup_elem(struct sock_map_seq_info *info)
> > +{
> > +     if (unlikely(info->index >= info->map->max_entries))
> > +             return NULL;
> > +
> > +     info->sk = __sock_map_lookup_elem(info->map, info->index);
> > +
> > +     /* can't return sk directly, since that might be NULL */
>
> As said in the above, suggest to skip NULL socket and always return
> valid non-NULL socket.
>
> > +     return info;
> > +}
> > +
> > +static void *sock_map_seq_start(struct seq_file *seq, loff_t *pos)
> > +{
> > +     struct sock_map_seq_info *info = seq->private;
> > +
> > +     if (*pos == 0)
> > +             ++*pos;
> > +
> > +     /* pairs with sock_map_seq_stop */
> > +     rcu_read_lock();
> > +     return sock_map_seq_lookup_elem(info);
> > +}
> > +
> > +static void *sock_map_seq_next(struct seq_file *seq, void *v, loff_t *pos)
> > +{
> > +     struct sock_map_seq_info *info = seq->private;
> > +
> > +     ++*pos;
> > +     ++info->index;
> > +
> > +     return sock_map_seq_lookup_elem(info);
> > +}
> > +
> > +static int __sock_map_seq_show(struct seq_file *seq, void *v)
> > +{
> > +     struct sock_map_seq_info *info = seq->private;
> > +     struct bpf_iter__sockmap ctx = {};
> > +     struct bpf_iter_meta meta;
> > +     struct bpf_prog *prog;
> > +
> > +     meta.seq = seq;
> > +     prog = bpf_iter_get_info(&meta, !v);
> > +     if (!prog)
> > +             return 0;
> > +
> > +     ctx.meta = &meta;
> > +     ctx.map = info->map;
> > +     if (v) {
> > +             ctx.key = &info->index;
> > +             ctx.sk = (struct bpf_sock *)info->sk;
> > +     }
> > +
> > +     return bpf_iter_run_prog(prog, &ctx);
> > +}
> > +
> > +static int sock_map_seq_show(struct seq_file *seq, void *v)
> > +{
> > +     return __sock_map_seq_show(seq, v);
> > +}
> > +
> > +static void sock_map_seq_stop(struct seq_file *seq, void *v)
> > +{
> > +     if (!v)
> > +             (void)__sock_map_seq_show(seq, NULL);
> > +
> > +     /* pairs with sock_map_seq_start */
> > +     rcu_read_unlock();
> > +}
> > +
> > +static const struct seq_operations sock_map_seq_ops = {
> > +     .start  = sock_map_seq_start,
> > +     .next   = sock_map_seq_next,
> > +     .stop   = sock_map_seq_stop,
> > +     .show   = sock_map_seq_show,
> > +};
> > +
> > +static int sock_map_init_seq_private(void *priv_data,
> > +                                  struct bpf_iter_aux_info *aux)
> > +{
> > +     struct sock_map_seq_info *info = priv_data;
> > +
> > +     info->map = aux->map;
> > +     return 0;
> > +}
> > +
> > +static const struct bpf_iter_seq_info sock_map_iter_seq_info = {
> > +     .seq_ops                = &sock_map_seq_ops,
> > +     .init_seq_private       = sock_map_init_seq_private,
> > +     .seq_priv_size          = sizeof(struct sock_map_seq_info),
> > +};
> > +
> >   static int sock_map_btf_id;
> >   const struct bpf_map_ops sock_map_ops = {
> >       .map_alloc              = sock_map_alloc,
> > @@ -716,6 +824,7 @@ const struct bpf_map_ops sock_map_ops = {
> >       .map_check_btf          = map_check_no_btf,
> >       .map_btf_name           = "bpf_stab",
> >       .map_btf_id             = &sock_map_btf_id,
> > +     .iter_seq_info          = &sock_map_iter_seq_info,
> >   };
> >
> >   struct bpf_shtab_elem {
> > @@ -1198,6 +1307,121 @@ const struct bpf_func_proto bpf_msg_redirect_hash_proto = {
> >       .arg4_type      = ARG_ANYTHING,
> >   };
> >
> > +struct sock_hash_seq_info {
> > +     struct bpf_map *map;
> > +     struct bpf_shtab *htab;
> > +     u32 bucket_id;
> > +};
> > +
> > +static void *sock_hash_seq_find_next(struct sock_hash_seq_info *info,
> > +                                  struct bpf_shtab_elem *prev_elem)
> > +{
> > +     const struct bpf_shtab *htab = info->htab;
> > +     struct bpf_shtab_bucket *bucket;
> > +     struct bpf_shtab_elem *elem;
> > +     struct hlist_node *node;
> > +
> > +     /* try to find next elem in the same bucket */
> > +     if (prev_elem) {
> > +             node = rcu_dereference_raw(hlist_next_rcu(&prev_elem->node));
> > +             elem = hlist_entry_safe(node, struct bpf_shtab_elem, node);
> > +             if (elem)
> > +                     return elem;
> > +
> > +             /* no more elements, continue in the next bucket */
> > +             info->bucket_id++;
> > +     }
>
> Looks like there are some earlier discussion on lock is not needed here?
> It would be good to add some comments here.

I've discussed it with Jakub off-list, but I was hoping that either
you or John could weigh in here. I think taking the bucket lock is
actually dangerous and could lead to a deadlock if the iterator program
triggers an update of the same bucket, since the update path takes the
same bucket lock.

Parts of sockhash were probably copied from the regular hashtable, so
maybe you can shed some light on why the hashtable iterator takes the
bucket lock?

>
> > +     for (; info->bucket_id < htab->buckets_num; info->bucket_id++) {
> > +             bucket = &htab->buckets[info->bucket_id];
> > +             node = rcu_dereference_raw(hlist_first_rcu(&bucket->head));
> > +             elem = hlist_entry_safe(node, struct bpf_shtab_elem, node);
> > +             if (elem)
> > +                     return elem;
> > +     }
> > +
> > +     return NULL;
> > +}
> > +
> > +static void *sock_hash_seq_start(struct seq_file *seq, loff_t *pos)
> > +{
> > +     struct sock_hash_seq_info *info = seq->private;
> > +
> > +     if (*pos == 0)
> > +             ++*pos;
> > +
> > +     /* pairs with sock_hash_seq_stop */
> > +     rcu_read_lock();
> > +     return sock_hash_seq_find_next(info, NULL);
> > +}
> > +
> > +static void *sock_hash_seq_next(struct seq_file *seq, void *v, loff_t *pos)
> > +{
> > +     struct sock_hash_seq_info *info = seq->private;
> > +
> > +     ++*pos;
> > +     return sock_hash_seq_find_next(info, v);
> > +}
> > +
> > +static int __sock_hash_seq_show(struct seq_file *seq, struct bpf_shtab_elem *elem)
> > +{
> > +     struct sock_hash_seq_info *info = seq->private;
> > +     struct bpf_iter__sockmap ctx = {};
> > +     struct bpf_iter_meta meta;
> > +     struct bpf_prog *prog;
> > +
> > +     meta.seq = seq;
> > +     prog = bpf_iter_get_info(&meta, !elem);
> > +     if (!prog)
> > +             return 0;
> > +
> > +     ctx.meta = &meta;
> > +     ctx.map = info->map;
> > +     if (elem) {
> > +             ctx.key = elem->key;
> > +             ctx.sk = (struct bpf_sock *)elem->sk;
> > +     }
> > +
> > +     return bpf_iter_run_prog(prog, &ctx);
> > +}
> > +
> > +static int sock_hash_seq_show(struct seq_file *seq, void *v)
> > +{
> > +     return __sock_hash_seq_show(seq, v);
> > +}
> > +
> > +static void sock_hash_seq_stop(struct seq_file *seq, void *v)
> > +{
> > +     if (!v)
> > +             (void)__sock_hash_seq_show(seq, NULL);
> > +
> > +     /* pairs with sock_hash_seq_start */
> > +     rcu_read_unlock();
> > +}
> > +
> > +static const struct seq_operations sock_hash_seq_ops = {
> > +     .start  = sock_hash_seq_start,
> > +     .next   = sock_hash_seq_next,
> > +     .stop   = sock_hash_seq_stop,
> > +     .show   = sock_hash_seq_show,
> > +};
> > +
> > +static int sock_hash_init_seq_private(void *priv_data,
> > +                                  struct bpf_iter_aux_info *aux)
> > +{
> > +     struct sock_hash_seq_info *info = priv_data;
> > +
> > +     info->map = aux->map;
> > +     info->htab = container_of(aux->map, struct bpf_shtab, map);
> > +     return 0;
> > +}
> > +
> > +static const struct bpf_iter_seq_info sock_hash_iter_seq_info = {
> > +     .seq_ops                = &sock_hash_seq_ops,
> > +     .init_seq_private       = sock_hash_init_seq_private,
> > +     .seq_priv_size          = sizeof(struct sock_hash_seq_info),
> > +};
> > +
> >   static int sock_hash_map_btf_id;
> >   const struct bpf_map_ops sock_hash_ops = {
> >       .map_alloc              = sock_hash_alloc,
> > @@ -1211,6 +1435,7 @@ const struct bpf_map_ops sock_hash_ops = {
> >       .map_check_btf          = map_check_no_btf,
> >       .map_btf_name           = "bpf_shtab",
> >       .map_btf_id             = &sock_hash_map_btf_id,
> > +     .iter_seq_info          = &sock_hash_iter_seq_info,
> >   };
> >
> >   static struct sk_psock_progs *sock_map_progs(struct bpf_map *map)
> > @@ -1321,3 +1546,61 @@ void sock_map_close(struct sock *sk, long timeout)
> >       release_sock(sk);
> >       saved_close(sk, timeout);
> >   }
> > +
> > +static int sock_map_iter_attach_target(struct bpf_prog *prog,
> > +                                    union bpf_iter_link_info *linfo,
> > +                                    struct bpf_iter_aux_info *aux)
> > +{
> > +     struct bpf_map *map;
> > +     int err = -EINVAL;
> > +
> > +     if (!linfo->map.map_fd)
> > +             return -EBADF;
> > +
> > +     map = bpf_map_get_with_uref(linfo->map.map_fd);
> > +     if (IS_ERR(map))
> > +             return PTR_ERR(map);
> > +
> > +     if (map->map_type != BPF_MAP_TYPE_SOCKMAP &&
> > +         map->map_type != BPF_MAP_TYPE_SOCKHASH)
> > +             goto put_map;
> > +
> > +     if (prog->aux->max_rdonly_access > map->key_size) {
> > +             err = -EACCES;
> > +             goto put_map;
> > +     }
> > +
> > +     aux->map = map;
> > +     return 0;
> > +
> > +put_map:
> > +     bpf_map_put_with_uref(map);
> > +     return err;
> > +}
> > +
> > +static void sock_map_iter_detach_target(struct bpf_iter_aux_info *aux)
> > +{
> > +     bpf_map_put_with_uref(aux->map);
> > +}
> > +
> > +static struct bpf_iter_reg sock_map_iter_reg = {
> > +     .target                 = "sockmap",
> > +     .attach_target          = sock_map_iter_attach_target,
> > +     .detach_target          = sock_map_iter_detach_target,
> > +     .show_fdinfo            = bpf_iter_map_show_fdinfo,
> > +     .fill_link_info         = bpf_iter_map_fill_link_info,
> > +     .ctx_arg_info_size      = 2,
> > +     .ctx_arg_info           = {
> > +             { offsetof(struct bpf_iter__sockmap, key),
> > +               PTR_TO_RDONLY_BUF_OR_NULL },
> > +             { offsetof(struct bpf_iter__sockmap, sk),
> > +               PTR_TO_SOCKET_OR_NULL },
> > +     },
> > +     .seq_info               = &sock_map_iter_seq_info,
>
> The .seq_info here is not needed here. The sock_map_iter_seq_info
> or sock_hash_iter_seq_info already registered in corresponding
> map_ops.

Ack.


>
> > +};
> > +
> > +static int __init bpf_sockmap_iter_init(void)
> > +{
> > +     return bpf_iter_reg_target(&sock_map_iter_reg);
> > +}
> > +late_initcall(bpf_sockmap_iter_init);
> >



--
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com


* Re: [PATCH bpf-next v2 2/4] net: Allow iterating sockmap and sockhash
  2020-09-02  8:58     ` Lorenz Bauer
@ 2020-09-02 17:29       ` Yonghong Song
  0 siblings, 0 replies; 15+ messages in thread
From: Yonghong Song @ 2020-09-02 17:29 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: Alexei Starovoitov, Daniel Borkmann, Jakub Sitnicki,
	John Fastabend, bpf, kernel-team



On 9/2/20 1:58 AM, Lorenz Bauer wrote:
> On Wed, 2 Sep 2020 at 06:08, Yonghong Song <yhs@fb.com> wrote:
>>
>>
>>
>> On 9/1/20 3:32 AM, Lorenz Bauer wrote:
>>> Add bpf_iter support for sockmap / sockhash, based on the bpf_sk_storage and
>>> hashtable implementation. sockmap and sockhash share the same iteration
>>> context: a pointer to an arbitrary key and a pointer to a socket. Both
>>> pointers may be NULL, and so BPF has to perform a NULL check before accessing
>>> them. Technically it's not possible for sockhash iteration to yield a NULL
>>> socket, but we ignore this to be able to use a single iteration point.
>>
>> There seems some misunderstanding about when the NULL object may be
>> passed to the bpf program. The original design is to always pass valid
>> object except the last iteration. If you see a NULL object (key/sock),
>> that means it is the end of iteration, so you can finish aggregating
>> the data and may send them to the user space.
>>
>> Sending NULL object in the middle of iterations is bad as bpf program
>> will just check NULL and returns.
>>
>> The current arraymap iterator returns valid value for each index
>> in the range. Since even if user did not assign anything for an index,
>> the kernel still allocates storage for that index with default value 0.
> 
> I think that the current behaviour is actually in line with the NULL:
> while sk may be NULL during iteration, key is guaranteed to be !NULL
> except on the last iteration. This holds for sockmap and sockhash.
> 
> The wording of the commit message is a bit lacking. I'm trying to say
> that for sockhash, ctx->sk will never be NULL (except during) the last
> iteration. So in theory, sockhash could use a distinct context which
> uses PTR_TO_SOCKET instead of PTR_TO_SOCKET_OR_NULL. However, I think
> a single context for both sockmap and sockhash is nicer.
> 
>>
>> For sockmap, it looks it is possible that some index may contain NULL
>> socket pointer. I suggest skip these array elements and already returns
>> a non-NULL object.
> 
> I think this would make sockmap inconsistent with other array map
> iteration. It also breaks a use case: as you can see in the test, I
> use sk == NULL to trigger map_delete_elem in the target, instead of
> map_update_elem.

Sorry, I missed this one (using sk == NULL to trigger map_delete_elem)
in the test case.
I thought that key != NULL && sk == NULL was simply skipped by the bpf
program. If you have a use case for key != NULL && sk == NULL, then
yes, you can pass it to the bpf program. So the current iteration of
sockmap looks okay to me.

> 
>>
>>>
>>> Iteration will visit all keys that remain unmodified during the lifetime of
>>> the iterator. It may or may not visit newly added ones.
>>>
>>> Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
>>> ---
>>>    net/core/sock_map.c | 283 ++++++++++++++++++++++++++++++++++++++++++++
>>>    1 file changed, 283 insertions(+)
>>>
>>> diff --git a/net/core/sock_map.c b/net/core/sock_map.c
>>> index ffdf94a30c87..4767f9df2b8b 100644
>>> --- a/net/core/sock_map.c
>>> +++ b/net/core/sock_map.c
>>> @@ -703,6 +703,114 @@ const struct bpf_func_proto bpf_msg_redirect_map_proto = {
>>>        .arg4_type      = ARG_ANYTHING,
>>>    };
>>>
>>> +struct sock_map_seq_info {
>>> +     struct bpf_map *map;
>>> +     struct sock *sk;
>>> +     u32 index;
>>> +};
>>> +
>>> +struct bpf_iter__sockmap {
>>> +     __bpf_md_ptr(struct bpf_iter_meta *, meta);
>>> +     __bpf_md_ptr(struct bpf_map *, map);
>>> +     __bpf_md_ptr(void *, key);
>>> +     __bpf_md_ptr(struct bpf_sock *, sk);
>>> +};
>>> +
>>> +DEFINE_BPF_ITER_FUNC(sockmap, struct bpf_iter_meta *meta,
>>> +                  struct bpf_map *map, void *key,
>>> +                  struct sock *sk)
>>> +
>>> +static void *sock_map_seq_lookup_elem(struct sock_map_seq_info *info)
>>> +{
>>> +     if (unlikely(info->index >= info->map->max_entries))
>>> +             return NULL;
>>> +
>>> +     info->sk = __sock_map_lookup_elem(info->map, info->index);
>>> +
>>> +     /* can't return sk directly, since that might be NULL */
>>
>> As said in the above, suggest to skip NULL socket and always return
>> valid non-NULL socket.
>>
>>> +     return info;
>>> +}
>>> +
>>> +static void *sock_map_seq_start(struct seq_file *seq, loff_t *pos)
>>> +{
>>> +     struct sock_map_seq_info *info = seq->private;
>>> +
>>> +     if (*pos == 0)
>>> +             ++*pos;
>>> +
>>> +     /* pairs with sock_map_seq_stop */
>>> +     rcu_read_lock();
>>> +     return sock_map_seq_lookup_elem(info);
>>> +}
>>> +
>>> +static void *sock_map_seq_next(struct seq_file *seq, void *v, loff_t *pos)
>>> +{
>>> +     struct sock_map_seq_info *info = seq->private;
>>> +
>>> +     ++*pos;
>>> +     ++info->index;
>>> +
>>> +     return sock_map_seq_lookup_elem(info);
>>> +}
>>> +
>>> +static int __sock_map_seq_show(struct seq_file *seq, void *v)
>>> +{
>>> +     struct sock_map_seq_info *info = seq->private;
>>> +     struct bpf_iter__sockmap ctx = {};
>>> +     struct bpf_iter_meta meta;
>>> +     struct bpf_prog *prog;
>>> +
>>> +     meta.seq = seq;
>>> +     prog = bpf_iter_get_info(&meta, !v);
>>> +     if (!prog)
>>> +             return 0;
>>> +
>>> +     ctx.meta = &meta;
>>> +     ctx.map = info->map;
>>> +     if (v) {
>>> +             ctx.key = &info->index;
>>> +             ctx.sk = (struct bpf_sock *)info->sk;
>>> +     }
>>> +
>>> +     return bpf_iter_run_prog(prog, &ctx);
>>> +}
>>> +
>>> +static int sock_map_seq_show(struct seq_file *seq, void *v)
>>> +{
>>> +     return __sock_map_seq_show(seq, v);
>>> +}
>>> +
>>> +static void sock_map_seq_stop(struct seq_file *seq, void *v)
>>> +{
>>> +     if (!v)
>>> +             (void)__sock_map_seq_show(seq, NULL);
>>> +
>>> +     /* pairs with sock_map_seq_start */
>>> +     rcu_read_unlock();
>>> +}
>>> +
>>> +static const struct seq_operations sock_map_seq_ops = {
>>> +     .start  = sock_map_seq_start,
>>> +     .next   = sock_map_seq_next,
>>> +     .stop   = sock_map_seq_stop,
>>> +     .show   = sock_map_seq_show,
>>> +};
>>> +
>>> +static int sock_map_init_seq_private(void *priv_data,
>>> +                                  struct bpf_iter_aux_info *aux)
>>> +{
>>> +     struct sock_map_seq_info *info = priv_data;
>>> +
>>> +     info->map = aux->map;
>>> +     return 0;
>>> +}
>>> +
>>> +static const struct bpf_iter_seq_info sock_map_iter_seq_info = {
>>> +     .seq_ops                = &sock_map_seq_ops,
>>> +     .init_seq_private       = sock_map_init_seq_private,
>>> +     .seq_priv_size          = sizeof(struct sock_map_seq_info),
>>> +};
>>> +
>>>    static int sock_map_btf_id;
>>>    const struct bpf_map_ops sock_map_ops = {
>>>        .map_alloc              = sock_map_alloc,
>>> @@ -716,6 +824,7 @@ const struct bpf_map_ops sock_map_ops = {
>>>        .map_check_btf          = map_check_no_btf,
>>>        .map_btf_name           = "bpf_stab",
>>>        .map_btf_id             = &sock_map_btf_id,
>>> +     .iter_seq_info          = &sock_map_iter_seq_info,
>>>    };
>>>
>>>    struct bpf_shtab_elem {
>>> @@ -1198,6 +1307,121 @@ const struct bpf_func_proto bpf_msg_redirect_hash_proto = {
>>>        .arg4_type      = ARG_ANYTHING,
>>>    };
>>>
>>> +struct sock_hash_seq_info {
>>> +     struct bpf_map *map;
>>> +     struct bpf_shtab *htab;
>>> +     u32 bucket_id;
>>> +};
>>> +
>>> +static void *sock_hash_seq_find_next(struct sock_hash_seq_info *info,
>>> +                                  struct bpf_shtab_elem *prev_elem)
>>> +{
>>> +     const struct bpf_shtab *htab = info->htab;
>>> +     struct bpf_shtab_bucket *bucket;
>>> +     struct bpf_shtab_elem *elem;
>>> +     struct hlist_node *node;
>>> +
>>> +     /* try to find next elem in the same bucket */
>>> +     if (prev_elem) {
>>> +             node = rcu_dereference_raw(hlist_next_rcu(&prev_elem->node));
>>> +             elem = hlist_entry_safe(node, struct bpf_shtab_elem, node);
>>> +             if (elem)
>>> +                     return elem;
>>> +
>>> +             /* no more elements, continue in the next bucket */
>>> +             info->bucket_id++;
>>> +     }
>>
>> Looks like there are some earlier discussion on lock is not needed here?
>> It would be good to add some comments here.
> 
> I've discussed it with Jakub off-list, but I was hoping that either
> you or John can weigh in here. I think taking the bucket lock is
> actually dangerous and could lead to a deadlock if the iterator
> triggers an update of the same bucket.
> 
> Parts of sockhash were probably copied from the regular hashtable, so
> maybe you can shed some light on why the hashtable iterator takes the
> bucket lock?

We need to take the bucket lock since another bpf program might be
updating/deleting elements in the hash table. In that case, the bpf
program may see garbage values. We did not use the rcu read lock, which
is certainly another option.

To support deletion during iteration, the original thinking was to
accumulate the keys to be deleted and delete them after the whole
bucket has been visited and the bucket lock has been released.
This is not implemented yet. The same might apply to updates as well.

You brought up a good point here. Currently, for the hashmap and
sk_storage_map iterators, the bucket lock is taken; this will have
issues if the same map is updated or deleted from within the bpf
program. The rcu read lock might be a good alternative here, unless
we want the verifier to restrict the visited map so that it cannot be
updated/deleted in the bpf program. Will look into it more.
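
To make the deadlock concern concrete: if the seq ops did take the bucket
lock around the program invocation, the call chain could roughly look like
this (hypothetical sketch based on the existing sock_hash_delete_elem()
locking; this series does not take the lock):

  sock_hash_seq_show()
    raw_spin_lock_bh(&bucket->lock)              /* iterator side */
    bpf_iter_run_prog(prog, &ctx)
      /* bpf prog deletes a key that hashes to the same bucket */
      bpf_map_delete_elem()
        sock_hash_delete_elem()
          raw_spin_lock_bh(&bucket->lock)        /* same lock -> deadlock */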

Another data point: the existing netlink iterator also uses
rcu_read_{lock,unlock}:
  seq_ops->start():
      rcu_read_lock();
  seq_ops->next():
      rcu_read_unlock();
      /* next element */
      rcu_read_lock();
  seq_ops->stop():
      rcu_read_unlock();

> 
>>
>>> +     for (; info->bucket_id < htab->buckets_num; info->bucket_id++) {
>>> +             bucket = &htab->buckets[info->bucket_id];
>>> +             node = rcu_dereference_raw(hlist_first_rcu(&bucket->head));
>>> +             elem = hlist_entry_safe(node, struct bpf_shtab_elem, node);
>>> +             if (elem)
>>> +                     return elem;
>>> +     }
>>> +
>>> +     return NULL;
>>> +}
>>> +
>>> +static void *sock_hash_seq_start(struct seq_file *seq, loff_t *pos)
>>> +{
>>> +     struct sock_hash_seq_info *info = seq->private;
>>> +
>>> +     if (*pos == 0)
>>> +             ++*pos;
>>> +
>>> +     /* pairs with sock_hash_seq_stop */
>>> +     rcu_read_lock();
>>> +     return sock_hash_seq_find_next(info, NULL);
>>> +}
>>> +
>>> +static void *sock_hash_seq_next(struct seq_file *seq, void *v, loff_t *pos)
>>> +{
>>> +     struct sock_hash_seq_info *info = seq->private;
>>> +
>>> +     ++*pos;
>>> +     return sock_hash_seq_find_next(info, v);
>>> +}
>>> +
>>> +static int __sock_hash_seq_show(struct seq_file *seq, struct bpf_shtab_elem *elem)
>>> +{
>>> +     struct sock_hash_seq_info *info = seq->private;
>>> +     struct bpf_iter__sockmap ctx = {};
>>> +     struct bpf_iter_meta meta;
>>> +     struct bpf_prog *prog;
>>> +
>>> +     meta.seq = seq;
>>> +     prog = bpf_iter_get_info(&meta, !elem);
>>> +     if (!prog)
>>> +             return 0;
>>> +
>>> +     ctx.meta = &meta;
>>> +     ctx.map = info->map;
>>> +     if (elem) {
>>> +             ctx.key = elem->key;
>>> +             ctx.sk = (struct bpf_sock *)elem->sk;
>>> +     }
>>> +
>>> +     return bpf_iter_run_prog(prog, &ctx);
>>> +}
>>> +
>>> +static int sock_hash_seq_show(struct seq_file *seq, void *v)
>>> +{
>>> +     return __sock_hash_seq_show(seq, v);
>>> +}
>>> +
>>> +static void sock_hash_seq_stop(struct seq_file *seq, void *v)
>>> +{
>>> +     if (!v)
>>> +             (void)__sock_hash_seq_show(seq, NULL);
>>> +
>>> +     /* pairs with sock_hash_seq_start */
>>> +     rcu_read_unlock();
>>> +}
>>> +
>>> +static const struct seq_operations sock_hash_seq_ops = {
>>> +     .start  = sock_hash_seq_start,
>>> +     .next   = sock_hash_seq_next,
>>> +     .stop   = sock_hash_seq_stop,
>>> +     .show   = sock_hash_seq_show,
>>> +};
>>> +
>>> +static int sock_hash_init_seq_private(void *priv_data,
>>> +                                  struct bpf_iter_aux_info *aux)
>>> +{
>>> +     struct sock_hash_seq_info *info = priv_data;
>>> +
>>> +     info->map = aux->map;
>>> +     info->htab = container_of(aux->map, struct bpf_shtab, map);
>>> +     return 0;
>>> +}
>>> +
>>> +static const struct bpf_iter_seq_info sock_hash_iter_seq_info = {
>>> +     .seq_ops                = &sock_hash_seq_ops,
>>> +     .init_seq_private       = sock_hash_init_seq_private,
>>> +     .seq_priv_size          = sizeof(struct sock_hash_seq_info),
>>> +};
>>> +
>>>    static int sock_hash_map_btf_id;
>>>    const struct bpf_map_ops sock_hash_ops = {
>>>        .map_alloc              = sock_hash_alloc,
>>> @@ -1211,6 +1435,7 @@ const struct bpf_map_ops sock_hash_ops = {
>>>        .map_check_btf          = map_check_no_btf,
>>>        .map_btf_name           = "bpf_shtab",
>>>        .map_btf_id             = &sock_hash_map_btf_id,
>>> +     .iter_seq_info          = &sock_hash_iter_seq_info,
>>>    };
>>>
>>>    static struct sk_psock_progs *sock_map_progs(struct bpf_map *map)
>>> @@ -1321,3 +1546,61 @@ void sock_map_close(struct sock *sk, long timeout)
>>>        release_sock(sk);
>>>        saved_close(sk, timeout);
>>>    }
>>> +
>>> +static int sock_map_iter_attach_target(struct bpf_prog *prog,
>>> +                                    union bpf_iter_link_info *linfo,
>>> +                                    struct bpf_iter_aux_info *aux)
>>> +{
>>> +     struct bpf_map *map;
>>> +     int err = -EINVAL;
>>> +
>>> +     if (!linfo->map.map_fd)
>>> +             return -EBADF;
>>> +
>>> +     map = bpf_map_get_with_uref(linfo->map.map_fd);
>>> +     if (IS_ERR(map))
>>> +             return PTR_ERR(map);
>>> +
>>> +     if (map->map_type != BPF_MAP_TYPE_SOCKMAP &&
>>> +         map->map_type != BPF_MAP_TYPE_SOCKHASH)
>>> +             goto put_map;
>>> +
>>> +     if (prog->aux->max_rdonly_access > map->key_size) {
>>> +             err = -EACCES;
>>> +             goto put_map;
>>> +     }
>>> +
>>> +     aux->map = map;
>>> +     return 0;
>>> +
>>> +put_map:
>>> +     bpf_map_put_with_uref(map);
>>> +     return err;
>>> +}
>>> +
>>> +static void sock_map_iter_detach_target(struct bpf_iter_aux_info *aux)
>>> +{
>>> +     bpf_map_put_with_uref(aux->map);
>>> +}
>>> +
>>> +static struct bpf_iter_reg sock_map_iter_reg = {
>>> +     .target                 = "sockmap",
>>> +     .attach_target          = sock_map_iter_attach_target,
>>> +     .detach_target          = sock_map_iter_detach_target,
>>> +     .show_fdinfo            = bpf_iter_map_show_fdinfo,
>>> +     .fill_link_info         = bpf_iter_map_fill_link_info,
>>> +     .ctx_arg_info_size      = 2,
>>> +     .ctx_arg_info           = {
>>> +             { offsetof(struct bpf_iter__sockmap, key),
>>> +               PTR_TO_RDONLY_BUF_OR_NULL },
>>> +             { offsetof(struct bpf_iter__sockmap, sk),
>>> +               PTR_TO_SOCKET_OR_NULL },
>>> +     },
>>> +     .seq_info               = &sock_map_iter_seq_info,
>>
>> The .seq_info here is not needed. The sock_map_iter_seq_info
>> or sock_hash_iter_seq_info is already registered in the corresponding
>> map_ops.
> 
> Ack.
> 
> 
>>
>>> +};
>>> +
>>> +static int __init bpf_sockmap_iter_init(void)
>>> +{
>>> +     return bpf_iter_reg_target(&sock_map_iter_reg);
>>> +}
>>> +late_initcall(bpf_sockmap_iter_init);
>>>
> 
> 
> 
> --
> Lorenz Bauer  |  Systems Engineer
> 6th Floor, County Hall/The Riverside Building, SE1 7PB, UK
> 
> www.cloudflare.com
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next v2 2/4] net: Allow iterating sockmap and sockhash
  2020-09-01 10:32 ` [PATCH bpf-next v2 2/4] net: Allow iterating sockmap and sockhash Lorenz Bauer
  2020-09-02  5:08   ` Yonghong Song
@ 2020-09-03  1:31   ` Andrii Nakryiko
  2020-09-03  2:09     ` Yonghong Song
  1 sibling, 1 reply; 15+ messages in thread
From: Andrii Nakryiko @ 2020-09-03  1:31 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: Alexei Starovoitov, Yonghong Song, Daniel Borkmann,
	Jakub Sitnicki, john fastabend, bpf, kernel-team

On Tue, Sep 1, 2020 at 3:33 AM Lorenz Bauer <lmb@cloudflare.com> wrote:
>
> Add bpf_iter support for sockmap / sockhash, based on the bpf_sk_storage and
> hashtable implementation. sockmap and sockhash share the same iteration
> context: a pointer to an arbitrary key and a pointer to a socket. Both
> pointers may be NULL, and so BPF has to perform a NULL check before accessing
> them. Technically it's not possible for sockhash iteration to yield a NULL
> socket, but we ignore this to be able to use a single iteration point.
>
> Iteration will visit all keys that remain unmodified during the lifetime of
> the iterator. It may or may not visit newly added ones.
>
> Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
> ---
>  net/core/sock_map.c | 283 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 283 insertions(+)
>
> diff --git a/net/core/sock_map.c b/net/core/sock_map.c
> index ffdf94a30c87..4767f9df2b8b 100644
> --- a/net/core/sock_map.c
> +++ b/net/core/sock_map.c
> @@ -703,6 +703,114 @@ const struct bpf_func_proto bpf_msg_redirect_map_proto = {
>         .arg4_type      = ARG_ANYTHING,
>  };
>
> +struct sock_map_seq_info {
> +       struct bpf_map *map;
> +       struct sock *sk;
> +       u32 index;
> +};
> +
> +struct bpf_iter__sockmap {
> +       __bpf_md_ptr(struct bpf_iter_meta *, meta);
> +       __bpf_md_ptr(struct bpf_map *, map);
> +       __bpf_md_ptr(void *, key);

For sockhash, the key can be of an arbitrary size, right? Should the
key_size be part of bpf_iter__sockmap then?

> +       __bpf_md_ptr(struct bpf_sock *, sk);
> +};
> +

[...]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next v2 2/4] net: Allow iterating sockmap and sockhash
  2020-09-03  1:31   ` Andrii Nakryiko
@ 2020-09-03  2:09     ` Yonghong Song
  0 siblings, 0 replies; 15+ messages in thread
From: Yonghong Song @ 2020-09-03  2:09 UTC (permalink / raw)
  To: Andrii Nakryiko, Lorenz Bauer
  Cc: Alexei Starovoitov, Daniel Borkmann, Jakub Sitnicki,
	john fastabend, bpf, kernel-team



On 9/2/20 6:31 PM, Andrii Nakryiko wrote:
> On Tue, Sep 1, 2020 at 3:33 AM Lorenz Bauer <lmb@cloudflare.com> wrote:
>>
>> Add bpf_iter support for sockmap / sockhash, based on the bpf_sk_storage and
>> hashtable implementation. sockmap and sockhash share the same iteration
>> context: a pointer to an arbitrary key and a pointer to a socket. Both
>> pointers may be NULL, and so BPF has to perform a NULL check before accessing
>> them. Technically it's not possible for sockhash iteration to yield a NULL
>> socket, but we ignore this to be able to use a single iteration point.
>>
>> Iteration will visit all keys that remain unmodified during the lifetime of
>> the iterator. It may or may not visit newly added ones.
>>
>> Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
>> ---
>>   net/core/sock_map.c | 283 ++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 283 insertions(+)
>>
>> diff --git a/net/core/sock_map.c b/net/core/sock_map.c
>> index ffdf94a30c87..4767f9df2b8b 100644
>> --- a/net/core/sock_map.c
>> +++ b/net/core/sock_map.c
>> @@ -703,6 +703,114 @@ const struct bpf_func_proto bpf_msg_redirect_map_proto = {
>>          .arg4_type      = ARG_ANYTHING,
>>   };
>>
>> +struct sock_map_seq_info {
>> +       struct bpf_map *map;
>> +       struct sock *sk;
>> +       u32 index;
>> +};
>> +
>> +struct bpf_iter__sockmap {
>> +       __bpf_md_ptr(struct bpf_iter_meta *, meta);
>> +       __bpf_md_ptr(struct bpf_map *, map);
>> +       __bpf_md_ptr(void *, key);
> 
> For sockhash, the key can be of an arbitrary size, right? Should the
> key_size be part of bpf_iter__sockmap then?

key_size is available from map->key_size; map->value_size similarly gives
the value size.
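
For example, roughly (only a sketch; it assumes the program can see a
struct bpf_map definition with key_size, e.g. via vmlinux.h, and that
direct BTF field access works here as it does for the existing bpf_map
iterator; includes and the ctx declaration are omitted, and the counter
is made up for illustration):

  __u64 big_keys = 0;

  SEC("iter/sockmap")
  int count_big_keys(struct bpf_iter__sockmap *ctx)
  {
          if (!ctx->key)
                  return 0;

          /* the key length is not part of the ctx, but can be read
           * from the map the iterator is attached to
           */
          if (ctx->map->key_size > sizeof(__u32))
                  big_keys++;

          return 0;
  }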

> 
>> +       __bpf_md_ptr(struct bpf_sock *, sk);
>> +};
>> +
> 
> [...]
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next v2 4/4] selftests: bpf: Test copying a sockmap via bpf_iter
  2020-09-01 10:32 ` [PATCH bpf-next v2 4/4] selftests: bpf: Test copying a sockmap via bpf_iter Lorenz Bauer
@ 2020-09-03  5:35   ` Andrii Nakryiko
  2020-09-03  8:38     ` Lorenz Bauer
  0 siblings, 1 reply; 15+ messages in thread
From: Andrii Nakryiko @ 2020-09-03  5:35 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: Alexei Starovoitov, Yonghong Song, Daniel Borkmann,
	Jakub Sitnicki, john fastabend, bpf, kernel-team

On Tue, Sep 1, 2020 at 3:33 AM Lorenz Bauer <lmb@cloudflare.com> wrote:
>
> Add a test that exercises a basic sockmap / sockhash copy using bpf_iter.
>
> Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
> ---

just a bunch of nits, as I was passing by :-P

>  .../selftests/bpf/prog_tests/sockmap_basic.c  | 88 +++++++++++++++++++
>  tools/testing/selftests/bpf/progs/bpf_iter.h  |  9 ++
>  .../selftests/bpf/progs/bpf_iter_sockmap.c    | 58 ++++++++++++
>  .../selftests/bpf/progs/bpf_iter_sockmap.h    |  3 +
>  4 files changed, 158 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_sockmap.c
>  create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_sockmap.h
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
> index 9569bbac7f6e..f5b7b27f096f 100644
> --- a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
> +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
> @@ -6,6 +6,9 @@
>  #include "test_skmsg_load_helpers.skel.h"
>  #include "test_sockmap_update.skel.h"
>  #include "test_sockmap_invalid_update.skel.h"
> +#include "bpf_iter_sockmap.skel.h"
> +
> +#include "progs/bpf_iter_sockmap.h"
>
>  #define TCP_REPAIR             19      /* TCP sock is under repair right now */
>
> @@ -196,6 +199,87 @@ static void test_sockmap_invalid_update(void)
>                 test_sockmap_invalid_update__destroy(skel);
>  }
>
> +static void test_sockmap_copy(enum bpf_map_type map_type)
> +{
> +       DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
> +       int err, len, src_fd, iter_fd, duration;
> +       union bpf_iter_link_info linfo = {0};

nit: misleading initialization, `= {}` is the same but doesn't imply
that you can fill union/struct with non-zeroes like this

> +       __s64 sock_fd[SOCKMAP_MAX_ENTRIES];
> +       __u32 i, num_sockets, max_elems;
> +       struct bpf_iter_sockmap *skel;
> +       struct bpf_map *src, *dst;
> +       struct bpf_link *link;
> +       char buf[64];
> +

[...]

> +SEC("iter/sockmap")
> +int copy_sockmap(struct bpf_iter__sockmap *ctx)
> +{
> +       struct bpf_sock *sk = ctx->sk;
> +       __u32 tmp, *key = ctx->key;
> +       int ret;
> +
> +       if (key == (void *)0)

nit: seems like a verbose way to just write `if (!key)`?

> +               return 0;
> +
> +       elems++;
> +
> +       /* We need a temporary buffer on the stack, since the verifier doesn't
> +        * let us use the pointer from the context as an argument to the helper.
> +        */
> +       tmp = *key;
> +       bpf_printk("key: %u\n", tmp);

is this intentional or a debugging leftover?

> +
> +       if (sk != (void *)0)
> +               return bpf_map_update_elem(&dst, &tmp, sk, 0) != 0;
> +
> +       ret = bpf_map_delete_elem(&dst, &tmp);
> +       return ret && ret != -ENOENT;
> +}
> diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_sockmap.h b/tools/testing/selftests/bpf/progs/bpf_iter_sockmap.h
> new file mode 100644
> index 000000000000..f98ad727ac06
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/bpf_iter_sockmap.h
> @@ -0,0 +1,3 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#define SOCKMAP_MAX_ENTRIES (64)
> --
> 2.25.1
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next v2 4/4] selftests: bpf: Test copying a sockmap via bpf_iter
  2020-09-03  5:35   ` Andrii Nakryiko
@ 2020-09-03  8:38     ` Lorenz Bauer
  0 siblings, 0 replies; 15+ messages in thread
From: Lorenz Bauer @ 2020-09-03  8:38 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, Yonghong Song, Daniel Borkmann,
	Jakub Sitnicki, john fastabend, bpf, kernel-team

On Thu, 3 Sep 2020 at 06:35, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> On Tue, Sep 1, 2020 at 3:33 AM Lorenz Bauer <lmb@cloudflare.com> wrote:
> >
> > Add a test that exercises a basic sockmap / sockhash copy using bpf_iter.
> >
> > Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
> > ---
>
> just a bunch of nits, as I was passing by :-P

Appreciated, as always :)

>
> >  .../selftests/bpf/prog_tests/sockmap_basic.c  | 88 +++++++++++++++++++
> >  tools/testing/selftests/bpf/progs/bpf_iter.h  |  9 ++
> >  .../selftests/bpf/progs/bpf_iter_sockmap.c    | 58 ++++++++++++
> >  .../selftests/bpf/progs/bpf_iter_sockmap.h    |  3 +
> >  4 files changed, 158 insertions(+)
> >  create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_sockmap.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_sockmap.h
> >
> > diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
> > index 9569bbac7f6e..f5b7b27f096f 100644
> > --- a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
> > +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
> > @@ -6,6 +6,9 @@
> >  #include "test_skmsg_load_helpers.skel.h"
> >  #include "test_sockmap_update.skel.h"
> >  #include "test_sockmap_invalid_update.skel.h"
> > +#include "bpf_iter_sockmap.skel.h"
> > +
> > +#include "progs/bpf_iter_sockmap.h"
> >
> >  #define TCP_REPAIR             19      /* TCP sock is under repair right now */
> >
> > @@ -196,6 +199,87 @@ static void test_sockmap_invalid_update(void)
> >                 test_sockmap_invalid_update__destroy(skel);
> >  }
> >
> > +static void test_sockmap_copy(enum bpf_map_type map_type)
> > +{
> > +       DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
> > +       int err, len, src_fd, iter_fd, duration;
> > +       union bpf_iter_link_info linfo = {0};
>
> nit: misleading initialization, `= {}` is the same but doesn't imply
> that you can fill union/struct with non-zeroes like this

Ack.
>
> > +       __s64 sock_fd[SOCKMAP_MAX_ENTRIES];
> > +       __u32 i, num_sockets, max_elems;
> > +       struct bpf_iter_sockmap *skel;
> > +       struct bpf_map *src, *dst;
> > +       struct bpf_link *link;
> > +       char buf[64];
> > +
>
> [...]
>
> > +SEC("iter/sockmap")
> > +int copy_sockmap(struct bpf_iter__sockmap *ctx)
> > +{
> > +       struct bpf_sock *sk = ctx->sk;
> > +       __u32 tmp, *key = ctx->key;
> > +       int ret;
> > +
> > +       if (key == (void *)0)
>
> nit: seems like a verbose way to just write `if (!key)`?

Yeah, this is copypasta from the other iterator test. I'll change this.

>
> > +               return 0;
> > +
> > +       elems++;
> > +
> > +       /* We need a temporary buffer on the stack, since the verifier doesn't
> > +        * let us use the pointer from the context as an argument to the helper.
> > +        */
> > +       tmp = *key;
> > +       bpf_printk("key: %u\n", tmp);
>
> is this intentional or a debugging leftover?

Oops!

>
> > +
> > +       if (sk != (void *)0)
> > +               return bpf_map_update_elem(&dst, &tmp, sk, 0) != 0;
> > +
> > +       ret = bpf_map_delete_elem(&dst, &tmp);
> > +       return ret && ret != -ENOENT;
> > +}
> > diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_sockmap.h b/tools/testing/selftests/bpf/progs/bpf_iter_sockmap.h
> > new file mode 100644
> > index 000000000000..f98ad727ac06
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/progs/bpf_iter_sockmap.h
> > @@ -0,0 +1,3 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +#define SOCKMAP_MAX_ENTRIES (64)
> > --
> > 2.25.1
> >



-- 
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next v2 0/4] Sockmap iterator
  2020-09-01 10:32 [PATCH bpf-next v2 0/4] Sockmap iterator Lorenz Bauer
                   ` (4 preceding siblings ...)
  2020-09-02  5:30 ` [PATCH bpf-next v2 0/4] Sockmap iterator Yonghong Song
@ 2020-09-03 20:08 ` Martin KaFai Lau
  2020-09-04  7:46   ` Lorenz Bauer
  5 siblings, 1 reply; 15+ messages in thread
From: Martin KaFai Lau @ 2020-09-03 20:08 UTC (permalink / raw)
  To: Lorenz Bauer; +Cc: ast, yhs, daniel, jakub, john.fastabend, bpf, kernel-team

On Tue, Sep 01, 2020 at 11:32:06AM +0100, Lorenz Bauer wrote:
> * Can we teach the verifier that PTR_TO_BTF_ID can be the same as PTR_TO_SOCKET?
I am working on a patch to teach the verifier to allow PTR_TO_SOCK* to be
used in the bpf_skc_to_*_sock() helpers.

The use case is, for example, using bpf_skc_to_tcp_sock() to cast the
tc's __sk_buff->sk to a kernel "struct tcp_sock" (PTR_TO_BTF_ID), so
that the bpf prog won't be limited to the fields in "struct bpf_tcp_sock"
if the user has the perfmon capability. Thus, in general, it should be
doable. Hopefully I will have something sharable next week.

For the sockmap iter (which is tracing), I think it is better
to begin with PTR_TO_BTF_ID_OR_NULL, so that the bpf iter prog
can access the tcp_sock (or udp_sock) without another cast or
bpf_probe_read_kernel().
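
Roughly what that would enable on the program side (only a sketch; it
assumes ctx->sk ends up typed as the kernel struct sock, that the
bpf_iter__sockmap ctx is visible to the program, e.g. from vmlinux.h once
the series lands, and the counter is purely for illustration):

  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>

  __u64 established = 0;

  SEC("iter/sockmap")
  int count_established(struct bpf_iter__sockmap *ctx)
  {
          struct sock *sk = ctx->sk;

          if (!sk)
                  return 0;

          /* direct, BTF-typed field access: no cast helper and no
           * bpf_probe_read_kernel() needed
           */
          if (sk->__sk_common.skc_state == TCP_ESTABLISHED)
                  established++;

          return 0;
  }

  char _license[] SEC("license") = "GPL";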

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next v2 0/4] Sockmap iterator
  2020-09-03 20:08 ` Martin KaFai Lau
@ 2020-09-04  7:46   ` Lorenz Bauer
  0 siblings, 0 replies; 15+ messages in thread
From: Lorenz Bauer @ 2020-09-04  7:46 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: Alexei Starovoitov, Yonghong Song, Daniel Borkmann,
	Jakub Sitnicki, John Fastabend, bpf, kernel-team

On Thu, 3 Sep 2020 at 21:08, Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Tue, Sep 01, 2020 at 11:32:06AM +0100, Lorenz Bauer wrote:
> > * Can we teach the verifier that PTR_TO_BTF_ID can be the same as PTR_TO_SOCKET?
> I am working on a patch to teach the verifier to allow PTR_TO_SOCK* being used
> in the bpf_skc_to_*_sock() helper.
>
> The use case is, for example, use bpf_skc_to_tcp_sock() to cast the
> tc's __sk_buff->sk to a kernel "struct tcp_sock" (PTR_TO_BTF_ID) such
> that the bpf prog won't be limited by the fields in "struct bpf_tcp_sock"
> if the user has perfmon cap.   Thus, in general, should be doable.
> Hopefully I have something sharable next week.

That's great! I got carried away with refactoring check_func_arg with
the same goal; it'll be interesting to see what you've come up with.
I've decided to make a smaller change for the sake of landing this
series, and then I'll follow up with my refactoring. Hopefully we
won't be stepping on each other's toes too much.

>
> For the sockmap iter  (which is tracing), I think it is better
> to begin with PTR_TO_BTF_ID_OR_NULL such that the bpf iter prog
> can access the tcp_sock (or udp_sock) without another casting or
> bpf_probe_read_kernel().

Yes, that's what I'm going with as well.

-- 
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2020-09-04  7:47 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-01 10:32 [PATCH bpf-next v2 0/4] Sockmap iterator Lorenz Bauer
2020-09-01 10:32 ` [PATCH bpf-next v2 1/4] net: sockmap: Remove unnecessary sk_fullsock checks Lorenz Bauer
2020-09-01 10:32 ` [PATCH bpf-next v2 2/4] net: Allow iterating sockmap and sockhash Lorenz Bauer
2020-09-02  5:08   ` Yonghong Song
2020-09-02  8:58     ` Lorenz Bauer
2020-09-02 17:29       ` Yonghong Song
2020-09-03  1:31   ` Andrii Nakryiko
2020-09-03  2:09     ` Yonghong Song
2020-09-01 10:32 ` [PATCH bpf-next v2 3/4] selftests: bpf: Add helper to compare socket cookies Lorenz Bauer
2020-09-01 10:32 ` [PATCH bpf-next v2 4/4] selftests: bpf: Test copying a sockmap via bpf_iter Lorenz Bauer
2020-09-03  5:35   ` Andrii Nakryiko
2020-09-03  8:38     ` Lorenz Bauer
2020-09-02  5:30 ` [PATCH bpf-next v2 0/4] Sockmap iterator Yonghong Song
2020-09-03 20:08 ` Martin KaFai Lau
2020-09-04  7:46   ` Lorenz Bauer
