* [PATCH v5 bpf-next 0/7] bpf: Add socket destroy capability
@ 2023-03-30 15:17 Aditi Ghag
  2023-03-30 15:17 ` [PATCH v5 bpf-next 1/7] bpf: tcp: Avoid taking fast sock lock in iterator Aditi Ghag
                   ` (6 more replies)
  0 siblings, 7 replies; 33+ messages in thread
From: Aditi Ghag @ 2023-03-30 15:17 UTC (permalink / raw)
  To: bpf; +Cc: kafai, sdf, edumazet, aditi.ghag

This patch series adds the capability to destroy sockets in BPF. We plan to
use the capability in Cilium to force client sockets to reconnect when their
remote load-balancing backends are deleted. The other use case is on-the-fly
policy enforcement, where existing socket connections disallowed by policy
need to be terminated.

The use cases and more details around the selected approach were presented
at LPC 2022 -
https://lpc.events/event/16/contributions/1358/.
RFC discussion -
https://lore.kernel.org/netdev/CABG=zsBEh-P4NXk23eBJw7eajB5YJeRS7oPXnTAzs=yob4EMoQ@mail.gmail.com/T/#u.
v4 patch series -
https://lore.kernel.org/bpf/20230323200633.3175753-1-aditi.ghag@isovalent.com/T/#t

v5 highlights:
Address review comments:
Martin:
- Fixed bugs in the resume operation for UDP iterator.
- Added clean-up preparatory commits prior to the batching commit.
Stan:
- Programmatically retrieve and share server port in the selftests.
- Acquire slow sock lock in UDP iterator to be consistent with TCP.
- Addressed formatting nits.

(The notes below are carried over from the v2 patch series and are still
relevant. Refer to the earlier patch series for other notes.)
- I hit a snag while writing the kfunc where the verifier complained about the
  `sock_common` type passed from the TCP iterator. With kfuncs, there don't
  seem to be any options available to pass BTF type hints to the verifier
  (equivalent of `ARG_PTR_TO_BTF_ID_SOCK_COMMON`, as was the case with the
  helper). As a result, I changed the argument type of the sock_destroy
  kfunc to `sock_common` (a minimal sketch follows below).
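
For reference, a minimal sketch of how a BPF iterator program is expected to
declare and invoke the kfunc with a `sock_common` argument. This mirrors the
selftests later in the series; the program and section names here are only
illustrative:

```
/* Sketch only; assumes vmlinux.h and bpf_helpers.h as in the selftests. */
int bpf_sock_destroy(struct sock_common *sk) __ksym;

SEC("iter/tcp")
int destroy_tcp6(struct bpf_iter__tcp *ctx)
{
	struct sock_common *sk_common = ctx->sk_common;

	/* AF_INET6 == 10; destroy only IPv6 sockets as an example filter. */
	if (sk_common && sk_common->skc_family == 10)
		bpf_sock_destroy(sk_common);
	return 0;
}
```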

Aditi Ghag (7):
  bpf: tcp: Avoid taking fast sock lock in iterator
  udp: seq_file: Remove bpf_seq_afinfo from udp_iter_state
  udp: seq_file: Helper function to match socket attributes
  bpf: udp: Implement batching for sockets iterator
  bpf: Add bpf_sock_destroy kfunc
  selftests/bpf: Add helper to get port using getsockname
  selftests/bpf: Test bpf_sock_destroy

 include/net/udp.h                             |   1 -
 net/core/filter.c                             |  54 ++++
 net/ipv4/tcp.c                                |  10 +-
 net/ipv4/tcp_ipv4.c                           |   5 +-
 net/ipv4/udp.c                                | 286 +++++++++++++++---
 tools/testing/selftests/bpf/network_helpers.c |  14 +
 tools/testing/selftests/bpf/network_helpers.h |   1 +
 .../selftests/bpf/prog_tests/sock_destroy.c   | 203 +++++++++++++
 .../selftests/bpf/progs/sock_destroy_prog.c   | 147 +++++++++
 9 files changed, 676 insertions(+), 45 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/sock_destroy.c
 create mode 100644 tools/testing/selftests/bpf/progs/sock_destroy_prog.c

-- 
2.34.1



* [PATCH v5 bpf-next 1/7] bpf: tcp: Avoid taking fast sock lock in iterator
  2023-03-30 15:17 [PATCH v5 bpf-next 0/7] bpf: Add socket destroy capability Aditi Ghag
@ 2023-03-30 15:17 ` Aditi Ghag
  2023-03-30 15:17 ` [PATCH v5 bpf-next 2/7] udp: seq_file: Remove bpf_seq_afinfo from udp_iter_state Aditi Ghag
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 33+ messages in thread
From: Aditi Ghag @ 2023-03-30 15:17 UTC (permalink / raw)
  To: bpf; +Cc: kafai, sdf, edumazet, aditi.ghag

Previously, the BPF TCP iterator was acquiring the fast version of the sock
lock, which disables BH. This introduced a circular dependency with code
paths that later acquire the sockets hash table bucket lock.
Replace the fast version of the sock lock with the slow one, which allows
BPF programs executed from the iterator to destroy TCP listening sockets
using the bpf_sock_destroy kfunc (implemented in follow-up commits).

Here is a stack trace that motivated this change:

```
1) sock_lock with BH disabled + bucket lock

lock_acquire+0xcd/0x330
_raw_spin_lock_bh+0x38/0x50
inet_unhash+0x96/0xd0
tcp_set_state+0x6a/0x210
tcp_abort+0x12b/0x230
bpf_prog_f4110fb1100e26b5_iter_tcp6_server+0xa3/0xaa
bpf_iter_run_prog+0x1ff/0x340
bpf_iter_tcp_seq_show+0xca/0x190
bpf_seq_read+0x177/0x450
vfs_read+0xc6/0x300
ksys_read+0x69/0xf0
do_syscall_64+0x3c/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc

2) sock lock with BH enabled

[    1.499968]   lock_acquire+0xcd/0x330
[    1.500316]   _raw_spin_lock+0x33/0x40
[    1.500670]   sk_clone_lock+0x146/0x520
[    1.501030]   inet_csk_clone_lock+0x1b/0x110
[    1.501433]   tcp_create_openreq_child+0x22/0x3f0
[    1.501873]   tcp_v6_syn_recv_sock+0x96/0x940
[    1.502284]   tcp_check_req+0x137/0x660
[    1.502646]   tcp_v6_rcv+0xa63/0xe80
[    1.502994]   ip6_protocol_deliver_rcu+0x78/0x590
[    1.503434]   ip6_input_finish+0x72/0x140
[    1.503818]   __netif_receive_skb_one_core+0x63/0xa0
[    1.504281]   process_backlog+0x79/0x260
[    1.504668]   __napi_poll.constprop.0+0x27/0x170
[    1.505104]   net_rx_action+0x14a/0x2a0
[    1.505469]   __do_softirq+0x165/0x510
[    1.505842]   do_softirq+0xcd/0x100
[    1.506172]   __local_bh_enable_ip+0xcc/0xf0
[    1.506588]   ip6_finish_output2+0x2a8/0xb00
[    1.506988]   ip6_finish_output+0x274/0x510
[    1.507377]   ip6_xmit+0x319/0x9b0
[    1.507726]   inet6_csk_xmit+0x12b/0x2b0
[    1.508096]   __tcp_transmit_skb+0x549/0xc40
[    1.508498]   tcp_rcv_state_process+0x362/0x1180

```

Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
---
 net/ipv4/tcp_ipv4.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index ea370afa70ed..f2d370a9450f 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2962,7 +2962,6 @@ static int bpf_iter_tcp_seq_show(struct seq_file *seq, void *v)
 	struct bpf_iter_meta meta;
 	struct bpf_prog *prog;
 	struct sock *sk = v;
-	bool slow;
 	uid_t uid;
 	int ret;
 
@@ -2970,7 +2969,7 @@ static int bpf_iter_tcp_seq_show(struct seq_file *seq, void *v)
 		return 0;
 
 	if (sk_fullsock(sk))
-		slow = lock_sock_fast(sk);
+		lock_sock(sk);
 
 	if (unlikely(sk_unhashed(sk))) {
 		ret = SEQ_SKIP;
@@ -2994,7 +2993,7 @@ static int bpf_iter_tcp_seq_show(struct seq_file *seq, void *v)
 
 unlock:
 	if (sk_fullsock(sk))
-		unlock_sock_fast(sk, slow);
+		release_sock(sk);
 	return ret;
 
 }
-- 
2.34.1



* [PATCH v5 bpf-next 2/7] udp: seq_file: Remove bpf_seq_afinfo from udp_iter_state
  2023-03-30 15:17 [PATCH v5 bpf-next 0/7] bpf: Add socket destroy capability Aditi Ghag
  2023-03-30 15:17 ` [PATCH v5 bpf-next 1/7] bpf: tcp: Avoid taking fast sock lock in iterator Aditi Ghag
@ 2023-03-30 15:17 ` Aditi Ghag
  2023-03-30 17:35   ` Martin KaFai Lau
  2023-03-30 15:17 ` [PATCH v5 bpf-next 3/7] udp: seq_file: Helper function to match socket attributes Aditi Ghag
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 33+ messages in thread
From: Aditi Ghag @ 2023-03-30 15:17 UTC (permalink / raw)
  To: bpf; +Cc: kafai, sdf, edumazet, aditi.ghag, Martin KaFai Lau

This is a preparatory commit to remove the field. The field was
previously shared between proc fs and BPF UDP socket iterators. As the
follow-up commits will decouple the implementation for the iterators,
remove the field. As for the BPF socket iterator, filtering of sockets is
expected to be done in BPF programs.
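
(Illustrative only, not part of this patch: with bpf_seq_afinfo gone, a BPF
UDP iterator program does any family filtering itself, e.g.:)

```
SEC("iter/udp")
int dump_udp6(struct bpf_iter__udp *ctx)
{
	struct udp_sock *udp_sk = ctx->udp_sk;

	if (!udp_sk)
		return 0;
	/* AF_INET6 == 10; skip everything that is not IPv6. */
	if (udp_sk->inet.sk.__sk_common.skc_family != 10)
		return 0;
	/* ... act on IPv6 UDP sockets only ... */
	return 0;
}
```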

Suggested-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
---
 include/net/udp.h |  1 -
 net/ipv4/udp.c    | 15 +++------------
 2 files changed, 3 insertions(+), 13 deletions(-)

diff --git a/include/net/udp.h b/include/net/udp.h
index de4b528522bb..5cad44318d71 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -437,7 +437,6 @@ struct udp_seq_afinfo {
 struct udp_iter_state {
 	struct seq_net_private  p;
 	int			bucket;
-	struct udp_seq_afinfo	*bpf_seq_afinfo;
 };
 
 void *udp_seq_start(struct seq_file *seq, loff_t *pos);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index c605d171eb2d..c574c8c17ec9 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2997,10 +2997,7 @@ static struct sock *udp_get_first(struct seq_file *seq, int start)
 	struct udp_table *udptable;
 	struct sock *sk;
 
-	if (state->bpf_seq_afinfo)
-		afinfo = state->bpf_seq_afinfo;
-	else
-		afinfo = pde_data(file_inode(seq->file));
+	afinfo = pde_data(file_inode(seq->file));
 
 	udptable = udp_get_table_afinfo(afinfo, net);
 
@@ -3033,10 +3030,7 @@ static struct sock *udp_get_next(struct seq_file *seq, struct sock *sk)
 	struct udp_seq_afinfo *afinfo;
 	struct udp_table *udptable;
 
-	if (state->bpf_seq_afinfo)
-		afinfo = state->bpf_seq_afinfo;
-	else
-		afinfo = pde_data(file_inode(seq->file));
+	afinfo = pde_data(file_inode(seq->file));
 
 	do {
 		sk = sk_next(sk);
@@ -3094,10 +3088,7 @@ void udp_seq_stop(struct seq_file *seq, void *v)
 	struct udp_seq_afinfo *afinfo;
 	struct udp_table *udptable;
 
-	if (state->bpf_seq_afinfo)
-		afinfo = state->bpf_seq_afinfo;
-	else
-		afinfo = pde_data(file_inode(seq->file));
+	afinfo = pde_data(file_inode(seq->file));
 
 	udptable = udp_get_table_afinfo(afinfo, seq_file_net(seq));
 
-- 
2.34.1



* [PATCH v5 bpf-next 3/7] udp: seq_file: Helper function to match socket attributes
  2023-03-30 15:17 [PATCH v5 bpf-next 0/7] bpf: Add socket destroy capability Aditi Ghag
  2023-03-30 15:17 ` [PATCH v5 bpf-next 1/7] bpf: tcp: Avoid taking fast sock lock in iterator Aditi Ghag
  2023-03-30 15:17 ` [PATCH v5 bpf-next 2/7] udp: seq_file: Remove bpf_seq_afinfo from udp_iter_state Aditi Ghag
@ 2023-03-30 15:17 ` Aditi Ghag
  2023-03-30 18:40   ` kernel test robot
                     ` (4 more replies)
  2023-03-30 15:17 ` [PATCH v5 bpf-next 4/7] bpf: udp: Implement batching for sockets iterator Aditi Ghag
                   ` (3 subsequent siblings)
  6 siblings, 5 replies; 33+ messages in thread
From: Aditi Ghag @ 2023-03-30 15:17 UTC (permalink / raw)
  To: bpf; +Cc: kafai, sdf, edumazet, aditi.ghag

This is a preparatory commit that refactors the code matching socket
attributes in the iterators into a helper function, and uses the helper in
the proc fs iterator.

Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
---
 net/ipv4/udp.c | 35 ++++++++++++++++++++++++++++-------
 1 file changed, 28 insertions(+), 7 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index c574c8c17ec9..cead4acb64c6 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2983,6 +2983,8 @@ EXPORT_SYMBOL(udp_prot);
 /* ------------------------------------------------------------------------ */
 #ifdef CONFIG_PROC_FS
 
+static inline bool seq_sk_match(struct seq_file *seq, const struct sock *sk);
+
 static struct udp_table *udp_get_table_afinfo(struct udp_seq_afinfo *afinfo,
 					      struct net *net)
 {
@@ -3010,10 +3012,7 @@ static struct sock *udp_get_first(struct seq_file *seq, int start)
 
 		spin_lock_bh(&hslot->lock);
 		sk_for_each(sk, &hslot->head) {
-			if (!net_eq(sock_net(sk), net))
-				continue;
-			if (afinfo->family == AF_UNSPEC ||
-			    sk->sk_family == afinfo->family)
+			if (seq_sk_match(seq, sk))
 				goto found;
 		}
 		spin_unlock_bh(&hslot->lock);
@@ -3034,9 +3033,7 @@ static struct sock *udp_get_next(struct seq_file *seq, struct sock *sk)
 
 	do {
 		sk = sk_next(sk);
-	} while (sk && (!net_eq(sock_net(sk), net) ||
-			(afinfo->family != AF_UNSPEC &&
-			 sk->sk_family != afinfo->family)));
+	} while (sk && !seq_sk_match(seq, sk));
 
 	if (!sk) {
 		udptable = udp_get_table_afinfo(afinfo, net);
@@ -3143,6 +3140,17 @@ struct bpf_iter__udp {
 	int bucket __aligned(8);
 };
 
+static unsigned short seq_file_family(const struct seq_file *seq);
+
+static inline bool seq_sk_match(struct seq_file *seq, const struct sock *sk)
+{
+	unsigned short family = seq_file_family(seq);
+
+	/* AF_UNSPEC is used as a match all */
+	return ((family == AF_UNSPEC || family == sk->sk_family) &&
+		net_eq(sock_net(sk), seq_file_net(seq)));
+}
+
 static int udp_prog_seq_show(struct bpf_prog *prog, struct bpf_iter_meta *meta,
 			     struct udp_sock *udp_sk, uid_t uid, int bucket)
 {
@@ -3194,6 +3202,19 @@ static const struct seq_operations bpf_iter_udp_seq_ops = {
 	.stop		= bpf_iter_udp_seq_stop,
 	.show		= bpf_iter_udp_seq_show,
 };
+
+static unsigned short seq_file_family(const struct seq_file *seq)
+{
+	const struct udp_seq_afinfo *afinfo;
+
+	/* BPF iterator: bpf programs to filter sockets. */
+	if (seq->op == &bpf_iter_udp_seq_ops)
+		return AF_UNSPEC;
+
+	/* Proc fs iterator */
+	afinfo = pde_data(file_inode(seq->file));
+	return afinfo->family;
+}
 #endif
 
 const struct seq_operations udp_seq_ops = {
-- 
2.34.1



* [PATCH v5 bpf-next 4/7] bpf: udp: Implement batching for sockets iterator
  2023-03-30 15:17 [PATCH v5 bpf-next 0/7] bpf: Add socket destroy capability Aditi Ghag
                   ` (2 preceding siblings ...)
  2023-03-30 15:17 ` [PATCH v5 bpf-next 3/7] udp: seq_file: Helper function to match socket attributes Aditi Ghag
@ 2023-03-30 15:17 ` Aditi Ghag
  2023-03-31 21:08   ` Martin KaFai Lau
  2023-03-30 15:17 ` [PATCH v5 bpf-next 5/7] bpf: Add bpf_sock_destroy kfunc Aditi Ghag
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 33+ messages in thread
From: Aditi Ghag @ 2023-03-30 15:17 UTC (permalink / raw)
  To: bpf; +Cc: kafai, sdf, edumazet, aditi.ghag, Martin KaFai Lau

Batch UDP sockets from the BPF iterator so that overlapping locking
semantics are allowed for BPF/kernel helpers executed in BPF programs. This
allows the BPF socket destroy kfunc (introduced by follow-up patches) to be
executed from BPF iterator programs.

Previously, BPF iterators acquired the sock lock and the sockets hash table
bucket lock while executing BPF programs. This prevented BPF helpers that
also acquire these locks from being executed from BPF iterators. With the
batching approach, we acquire a bucket lock, batch all the sockets in the
bucket, and then release the bucket lock. This enables BPF or kernel helpers
to skip sock locking when invoked in the supported BPF contexts.

The batching logic is similar to the logic implemented in the TCP iterator:
https://lore.kernel.org/bpf/20210701200613.1036157-1-kafai@fb.com/.
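
(Purely illustrative userspace analogy of the batching pattern described
above; the names are made up and this is not kernel code. References are
grabbed while the bucket lock is held, and the per-item callback then runs
with the lock released, so the callback itself may take that lock:)

```
#include <pthread.h>

struct item {
	struct item *next;
	int refcnt;
};

struct bucket {
	pthread_mutex_t lock;
	struct item *head;
};

static void process_bucket(struct bucket *b, void (*cb)(struct item *))
{
	struct item *batch[64];
	struct item *it;
	int n = 0, i;

	pthread_mutex_lock(&b->lock);
	for (it = b->head; it && n < 64; it = it->next) {
		it->refcnt++;			/* like sock_hold() */
		batch[n++] = it;
	}
	pthread_mutex_unlock(&b->lock);

	for (i = 0; i < n; i++) {
		cb(batch[i]);			/* may take b->lock itself */
		batch[i]->refcnt--;		/* like sock_put() */
	}
}
```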

Suggested-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
---
 net/ipv4/udp.c | 230 +++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 213 insertions(+), 17 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index cead4acb64c6..9af23d1c8d6b 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -3140,7 +3140,19 @@ struct bpf_iter__udp {
 	int bucket __aligned(8);
 };
 
+struct bpf_udp_iter_state {
+	struct udp_iter_state state;
+	unsigned int cur_sk;
+	unsigned int end_sk;
+	unsigned int max_sk;
+	int offset;
+	struct sock **batch;
+	bool st_bucket_done;
+};
+
 static unsigned short seq_file_family(const struct seq_file *seq);
+static int bpf_iter_udp_realloc_batch(struct bpf_udp_iter_state *iter,
+				      unsigned int new_batch_sz);
 
 static inline bool seq_sk_match(struct seq_file *seq, const struct sock *sk)
 {
@@ -3151,6 +3163,149 @@ static inline bool seq_sk_match(struct seq_file *seq, const struct sock *sk)
 		net_eq(sock_net(sk), seq_file_net(seq)));
 }
 
+static struct sock *bpf_iter_udp_batch(struct seq_file *seq)
+{
+	struct bpf_udp_iter_state *iter = seq->private;
+	struct udp_iter_state *state = &iter->state;
+	struct net *net = seq_file_net(seq);
+	struct sock *first_sk = NULL;
+	struct udp_seq_afinfo afinfo;
+	struct udp_table *udptable;
+	unsigned int batch_sks = 0;
+	bool resized = false;
+	struct sock *sk;
+	int offset = 0;
+	int new_offset;
+
+	/* The current batch is done, so advance the bucket. */
+	if (iter->st_bucket_done) {
+		state->bucket++;
+		iter->offset = 0;
+	}
+
+	afinfo.family = AF_UNSPEC;
+	afinfo.udp_table = NULL;
+	udptable = udp_get_table_afinfo(&afinfo, net);
+
+	if (state->bucket > udptable->mask) {
+		state->bucket = 0;
+		iter->offset = 0;
+		return NULL;
+	}
+
+again:
+	/* New batch for the next bucket.
+	 * Iterate over the hash table to find a bucket with sockets matching
+	 * the iterator attributes, and return the first matching socket from
+	 * the bucket. The remaining matched sockets from the bucket are batched
+	 * before releasing the bucket lock. This allows BPF programs that are
+	 * called in seq_show to acquire the bucket lock if needed.
+	 */
+	iter->cur_sk = 0;
+	iter->end_sk = 0;
+	iter->st_bucket_done = false;
+	first_sk = NULL;
+	batch_sks = 0;
+	offset = iter->offset;
+
+	for (; state->bucket <= udptable->mask; state->bucket++) {
+		struct udp_hslot *hslot2 = &udptable->hash2[state->bucket];
+
+		if (hlist_empty(&hslot2->head)) {
+			offset = 0;
+			continue;
+		}
+		new_offset = offset;
+
+		spin_lock_bh(&hslot2->lock);
+		udp_portaddr_for_each_entry(sk, &hslot2->head) {
+			if (seq_sk_match(seq, sk)) {
+				/* Resume from the last iterated socket at the
+				 * offset in the bucket before iterator was stopped.
+				 */
+				if (offset) {
+					--offset;
+					continue;
+				}
+				if (!first_sk)
+					first_sk = sk;
+				if (iter->end_sk < iter->max_sk) {
+					sock_hold(sk);
+					iter->batch[iter->end_sk++] = sk;
+				}
+				batch_sks++;
+				new_offset++;
+			}
+		}
+		spin_unlock_bh(&hslot2->lock);
+
+		if (first_sk)
+			break;
+
+		/* Reset the current bucket's offset before moving to the next bucket. */
+		offset = 0;
+	}
+
+	/* All done: no batch made. */
+	if (!first_sk)
+		goto ret;
+
+	if (iter->end_sk == batch_sks) {
+		/* Batching is done for the current bucket; return the first
+		 * socket to be iterated from the batch.
+		 */
+		iter->st_bucket_done = true;
+		goto ret;
+	}
+	if (!resized && !bpf_iter_udp_realloc_batch(iter, batch_sks * 3 / 2)) {
+		resized = true;
+		/* Go back to the previous bucket to resize its batch. */
+		state->bucket--;
+		goto again;
+	}
+ret:
+	iter->offset = new_offset;
+	return first_sk;
+}
+
+static void *bpf_iter_udp_seq_next(struct seq_file *seq, void *v, loff_t *pos)
+{
+	struct bpf_udp_iter_state *iter = seq->private;
+	struct sock *sk;
+
+	/* Whenever seq_next() is called, the iter->cur_sk is
+	 * done with seq_show(), so unref the iter->cur_sk.
+	 */
+	if (iter->cur_sk < iter->end_sk) {
+		sock_put(iter->batch[iter->cur_sk++]);
+		++iter->offset;
+	}
+
+	/* After updating iter->cur_sk, check if there are more sockets
+	 * available in the current bucket batch.
+	 */
+	if (iter->cur_sk < iter->end_sk) {
+		sk = iter->batch[iter->cur_sk];
+	} else {
+		// Prepare a new batch.
+		sk = bpf_iter_udp_batch(seq);
+	}
+
+	++*pos;
+	return sk;
+}
+
+static void *bpf_iter_udp_seq_start(struct seq_file *seq, loff_t *pos)
+{
+	/* bpf iter does not support lseek, so it always
+	 * continue from where it was stop()-ped.
+	 */
+	if (*pos)
+		return bpf_iter_udp_batch(seq);
+
+	return SEQ_START_TOKEN;
+}
+
 static int udp_prog_seq_show(struct bpf_prog *prog, struct bpf_iter_meta *meta,
 			     struct udp_sock *udp_sk, uid_t uid, int bucket)
 {
@@ -3171,18 +3326,37 @@ static int bpf_iter_udp_seq_show(struct seq_file *seq, void *v)
 	struct bpf_prog *prog;
 	struct sock *sk = v;
 	uid_t uid;
+	int rc;
 
 	if (v == SEQ_START_TOKEN)
 		return 0;
 
+	lock_sock(sk);
+
+	if (unlikely(sk_unhashed(sk))) {
+		rc = SEQ_SKIP;
+		goto unlock;
+	}
+
 	uid = from_kuid_munged(seq_user_ns(seq), sock_i_uid(sk));
 	meta.seq = seq;
 	prog = bpf_iter_get_info(&meta, false);
-	return udp_prog_seq_show(prog, &meta, v, uid, state->bucket);
+	rc = udp_prog_seq_show(prog, &meta, v, uid, state->bucket);
+
+unlock:
+	release_sock(sk);
+	return rc;
+}
+
+static void bpf_iter_udp_put_batch(struct bpf_udp_iter_state *iter)
+{
+	while (iter->cur_sk < iter->end_sk)
+		sock_put(iter->batch[iter->cur_sk++]);
 }
 
 static void bpf_iter_udp_seq_stop(struct seq_file *seq, void *v)
 {
+	struct bpf_udp_iter_state *iter = seq->private;
 	struct bpf_iter_meta meta;
 	struct bpf_prog *prog;
 
@@ -3193,12 +3367,15 @@ static void bpf_iter_udp_seq_stop(struct seq_file *seq, void *v)
 			(void)udp_prog_seq_show(prog, &meta, v, 0, 0);
 	}
 
-	udp_seq_stop(seq, v);
+	if (iter->cur_sk < iter->end_sk) {
+		bpf_iter_udp_put_batch(iter);
+		iter->st_bucket_done = false;
+	}
 }
 
 static const struct seq_operations bpf_iter_udp_seq_ops = {
-	.start		= udp_seq_start,
-	.next		= udp_seq_next,
+	.start		= bpf_iter_udp_seq_start,
+	.next		= bpf_iter_udp_seq_next,
 	.stop		= bpf_iter_udp_seq_stop,
 	.show		= bpf_iter_udp_seq_show,
 };
@@ -3425,38 +3602,57 @@ static struct pernet_operations __net_initdata udp_sysctl_ops = {
 DEFINE_BPF_ITER_FUNC(udp, struct bpf_iter_meta *meta,
 		     struct udp_sock *udp_sk, uid_t uid, int bucket)
 
-static int bpf_iter_init_udp(void *priv_data, struct bpf_iter_aux_info *aux)
+static int bpf_iter_udp_realloc_batch(struct bpf_udp_iter_state *iter,
+				      unsigned int new_batch_sz)
 {
-	struct udp_iter_state *st = priv_data;
-	struct udp_seq_afinfo *afinfo;
-	int ret;
+	struct sock **new_batch;
 
-	afinfo = kmalloc(sizeof(*afinfo), GFP_USER | __GFP_NOWARN);
-	if (!afinfo)
+	new_batch = kvmalloc_array(new_batch_sz, sizeof(*new_batch),
+				   GFP_USER | __GFP_NOWARN);
+	if (!new_batch)
 		return -ENOMEM;
 
-	afinfo->family = AF_UNSPEC;
-	afinfo->udp_table = NULL;
-	st->bpf_seq_afinfo = afinfo;
+	bpf_iter_udp_put_batch(iter);
+	kvfree(iter->batch);
+	iter->batch = new_batch;
+	iter->max_sk = new_batch_sz;
+
+	return 0;
+}
+
+#define INIT_BATCH_SZ 16
+
+static int bpf_iter_init_udp(void *priv_data, struct bpf_iter_aux_info *aux)
+{
+	struct bpf_udp_iter_state *iter = priv_data;
+	int ret;
+
 	ret = bpf_iter_init_seq_net(priv_data, aux);
 	if (ret)
-		kfree(afinfo);
+		return ret;
+
+	ret = bpf_iter_udp_realloc_batch(iter, INIT_BATCH_SZ);
+	if (ret) {
+		bpf_iter_fini_seq_net(priv_data);
+		return ret;
+	}
+
 	return ret;
 }
 
 static void bpf_iter_fini_udp(void *priv_data)
 {
-	struct udp_iter_state *st = priv_data;
+	struct bpf_udp_iter_state *iter = priv_data;
 
-	kfree(st->bpf_seq_afinfo);
 	bpf_iter_fini_seq_net(priv_data);
+	kvfree(iter->batch);
 }
 
 static const struct bpf_iter_seq_info udp_seq_info = {
 	.seq_ops		= &bpf_iter_udp_seq_ops,
 	.init_seq_private	= bpf_iter_init_udp,
 	.fini_seq_private	= bpf_iter_fini_udp,
-	.seq_priv_size		= sizeof(struct udp_iter_state),
+	.seq_priv_size		= sizeof(struct bpf_udp_iter_state),
 };
 
 static struct bpf_iter_reg udp_reg_info = {
-- 
2.34.1



* [PATCH v5 bpf-next 5/7] bpf: Add bpf_sock_destroy kfunc
  2023-03-30 15:17 [PATCH v5 bpf-next 0/7] bpf: Add socket destroy capability Aditi Ghag
                   ` (3 preceding siblings ...)
  2023-03-30 15:17 ` [PATCH v5 bpf-next 4/7] bpf: udp: Implement batching for sockets iterator Aditi Ghag
@ 2023-03-30 15:17 ` Aditi Ghag
  2023-03-31 22:24   ` Martin KaFai Lau
  2023-03-30 15:17 ` [PATCH v5 bpf-next 6/7] selftests/bpf: Add helper to get port using getsockname Aditi Ghag
  2023-03-30 15:17 ` [PATCH v5 bpf-next 7/7] selftests/bpf: Test bpf_sock_destroy Aditi Ghag
  6 siblings, 1 reply; 33+ messages in thread
From: Aditi Ghag @ 2023-03-30 15:17 UTC (permalink / raw)
  To: bpf; +Cc: kafai, sdf, edumazet, aditi.ghag

The socket destroy kfunc is used to forcefully terminate sockets from
certain BPF contexts. We plan to use the capability in Cilium to force
client sockets to reconnect when their remote load-balancing backends are
deleted. The other use case is on-the-fly policy enforcement, where existing
socket connections disallowed by policy need to be forcefully terminated.
The helper allows terminating sockets that may or may not be actively
sending traffic.

The helper is currently exposed to certain BPF iterators where users can
filter and terminate selected sockets. Additionally, the helper can only be
called from BPF contexts that ensure socket locking in order to allow
synchronous execution of destroy helpers that also acquire socket locks. The
previous commit, which batches UDP sockets during iteration, facilitates a
synchronous invocation of the destroy helper from BPF context by skipping
socket lock acquisition in the destroy handler. The TCP iterator already
supported batching.

The kfunc takes a `sock_common` type argument, even though it expects, and
internally casts it to, a `sock` pointer. This enables the verifier to allow
the sock_destroy kfunc to be called for TCP with a `sock_common` pointer and
for UDP with a `sock` pointer. As a comparison, BPF helpers enable this
behavior with the `ARG_PTR_TO_BTF_ID_SOCK_COMMON` argument type. However,
there is no such option available with the verifier logic that handles
kfuncs, where BTF types are inferred. Furthermore, as `sock_common` only has
a subset of the fields of `sock`, casting a pointer to the latter type might
not always be safe for certain sockets, such as request sockets, but these
have special handling in the diag_destroy handlers.
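
To illustrate the two calling conventions, a hedged sketch adapted from the
selftests added later in this series (the TCP iterator passes its
`sock_common` context pointer directly, while the UDP iterator casts its
`struct sock` pointer; program names are illustrative):

```
int bpf_sock_destroy(struct sock_common *sk) __ksym;

SEC("iter/tcp")
int tcp_destroy(struct bpf_iter__tcp *ctx)
{
	if (ctx->sk_common)
		bpf_sock_destroy(ctx->sk_common);	/* sock_common as-is */
	return 0;
}

SEC("iter/udp")
int udp_destroy(struct bpf_iter__udp *ctx)
{
	struct sock *sk = (struct sock *)ctx->udp_sk;

	if (sk)
		bpf_sock_destroy((struct sock_common *)sk);	/* explicit cast */
	return 0;
}
```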

Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
---
 net/core/filter.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++
 net/ipv4/tcp.c    | 10 ++++++---
 net/ipv4/udp.c    |  6 ++++--
 3 files changed, 65 insertions(+), 5 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 3370efad1dda..a70c7b9876fa 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -11724,3 +11724,57 @@ static int __init bpf_kfunc_init(void)
 	return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &bpf_kfunc_set_xdp);
 }
 late_initcall(bpf_kfunc_init);
+
+/* Disables missing prototype warnings */
+__diag_push();
+__diag_ignore_all("-Wmissing-prototypes",
+		  "Global functions as their definitions will be in vmlinux BTF");
+
+/* bpf_sock_destroy: Destroy the given socket with ECONNABORTED error code.
+ *
+ * The helper expects a non-NULL pointer to a socket. It invokes the
+ * protocol specific socket destroy handlers.
+ *
+ * The helper can only be called from BPF contexts that have acquired the socket
+ * locks.
+ *
+ * Parameters:
+ * @sock: Pointer to socket to be destroyed
+ *
+ * Return:
+ * On error, may return EPROTONOSUPPORT, EINVAL.
+ * EPROTONOSUPPORT if protocol specific destroy handler is not implemented.
+ * 0 otherwise
+ */
+__bpf_kfunc int bpf_sock_destroy(struct sock_common *sock)
+{
+	struct sock *sk = (struct sock *)sock;
+
+	if (!sk)
+		return -EINVAL;
+
+	/* The locking semantics that allow for synchronous execution of the
+	 * destroy handlers are only supported for TCP and UDP.
+	 */
+	if (!sk->sk_prot->diag_destroy || sk->sk_protocol == IPPROTO_RAW)
+		return -EOPNOTSUPP;
+
+	return sk->sk_prot->diag_destroy(sk, ECONNABORTED);
+}
+
+__diag_pop()
+
+BTF_SET8_START(sock_destroy_kfunc_set)
+BTF_ID_FLAGS(func, bpf_sock_destroy)
+BTF_SET8_END(sock_destroy_kfunc_set)
+
+static const struct btf_kfunc_id_set bpf_sock_destroy_kfunc_set = {
+	.owner = THIS_MODULE,
+	.set   = &sock_destroy_kfunc_set,
+};
+
+static int init_subsystem(void)
+{
+	return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &bpf_sock_destroy_kfunc_set);
+}
+late_initcall(init_subsystem);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 288693981b00..2259b4facc2f 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -4679,8 +4679,10 @@ int tcp_abort(struct sock *sk, int err)
 		return 0;
 	}
 
-	/* Don't race with userspace socket closes such as tcp_close. */
-	lock_sock(sk);
+	/* BPF context ensures sock locking. */
+	if (!has_current_bpf_ctx())
+		/* Don't race with userspace socket closes such as tcp_close. */
+		lock_sock(sk);
 
 	if (sk->sk_state == TCP_LISTEN) {
 		tcp_set_state(sk, TCP_CLOSE);
@@ -4702,9 +4704,11 @@ int tcp_abort(struct sock *sk, int err)
 	}
 
 	bh_unlock_sock(sk);
+
 	local_bh_enable();
 	tcp_write_queue_purge(sk);
-	release_sock(sk);
+	if (!has_current_bpf_ctx())
+		release_sock(sk);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(tcp_abort);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 9af23d1c8d6b..576a2ad272a7 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2925,7 +2925,8 @@ EXPORT_SYMBOL(udp_poll);
 
 int udp_abort(struct sock *sk, int err)
 {
-	lock_sock(sk);
+	if (!has_current_bpf_ctx())
+		lock_sock(sk);
 
 	/* udp{v6}_destroy_sock() sets it under the sk lock, avoid racing
 	 * with close()
@@ -2938,7 +2939,8 @@ int udp_abort(struct sock *sk, int err)
 	__udp_disconnect(sk, 0);
 
 out:
-	release_sock(sk);
+	if (!has_current_bpf_ctx())
+		release_sock(sk);
 
 	return 0;
 }
-- 
2.34.1



* [PATCH v5 bpf-next 6/7] selftests/bpf: Add helper to get port using getsockname
  2023-03-30 15:17 [PATCH v5 bpf-next 0/7] bpf: Add socket destroy capability Aditi Ghag
                   ` (4 preceding siblings ...)
  2023-03-30 15:17 ` [PATCH v5 bpf-next 5/7] bpf: Add bpf_sock_destroy kfunc Aditi Ghag
@ 2023-03-30 15:17 ` Aditi Ghag
  2023-03-30 18:41   ` Stanislav Fomichev
  2023-03-30 15:17 ` [PATCH v5 bpf-next 7/7] selftests/bpf: Test bpf_sock_destroy Aditi Ghag
  6 siblings, 1 reply; 33+ messages in thread
From: Aditi Ghag @ 2023-03-30 15:17 UTC (permalink / raw)
  To: bpf; +Cc: kafai, sdf, edumazet, aditi.ghag

The helper will be used to programmatically retrieve ports and pass them
between userspace and kernel selftest programs.

Suggested-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
---
 tools/testing/selftests/bpf/network_helpers.c | 14 ++++++++++++++
 tools/testing/selftests/bpf/network_helpers.h |  1 +
 2 files changed, 15 insertions(+)

diff --git a/tools/testing/selftests/bpf/network_helpers.c b/tools/testing/selftests/bpf/network_helpers.c
index 596caa176582..4c1dc7cf7390 100644
--- a/tools/testing/selftests/bpf/network_helpers.c
+++ b/tools/testing/selftests/bpf/network_helpers.c
@@ -427,3 +427,17 @@ void close_netns(struct nstoken *token)
 	close(token->orig_netns_fd);
 	free(token);
 }
+
+int get_sock_port6(int sock_fd, __u16 *out_port)
+{
+	struct sockaddr_in6 addr = {};
+	socklen_t addr_len = sizeof(addr);
+	int err;
+
+	err = getsockname(sock_fd, (struct sockaddr *)&addr, &addr_len);
+	if (err < 0)
+		return err;
+	*out_port = addr.sin6_port;
+
+	return err;
+}
diff --git a/tools/testing/selftests/bpf/network_helpers.h b/tools/testing/selftests/bpf/network_helpers.h
index f882c691b790..2ab3b50de0b7 100644
--- a/tools/testing/selftests/bpf/network_helpers.h
+++ b/tools/testing/selftests/bpf/network_helpers.h
@@ -56,6 +56,7 @@ int fastopen_connect(int server_fd, const char *data, unsigned int data_len,
 int make_sockaddr(int family, const char *addr_str, __u16 port,
 		  struct sockaddr_storage *addr, socklen_t *len);
 char *ping_command(int family);
+int get_sock_port6(int sock_fd, __u16 *out_port);
 
 struct nstoken;
 /**
-- 
2.34.1



* [PATCH v5 bpf-next 7/7] selftests/bpf: Test bpf_sock_destroy
  2023-03-30 15:17 [PATCH v5 bpf-next 0/7] bpf: Add socket destroy capability Aditi Ghag
                   ` (5 preceding siblings ...)
  2023-03-30 15:17 ` [PATCH v5 bpf-next 6/7] selftests/bpf: Add helper to get port using getsockname Aditi Ghag
@ 2023-03-30 15:17 ` Aditi Ghag
  2023-03-30 18:46   ` Stanislav Fomichev
  6 siblings, 1 reply; 33+ messages in thread
From: Aditi Ghag @ 2023-03-30 15:17 UTC (permalink / raw)
  To: bpf; +Cc: kafai, sdf, edumazet, aditi.ghag

The test cases for destroying sockets mirror the intended usages of the
bpf_sock_destroy kfunc using iterators.

The destroy helpers set the `ECONNABORTED` error code that we can validate
in the test code with client sockets. But UDP sockets have an overriding
error code from the disconnect called during abort, so the error code
validation is only done for TCP sockets.

Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
---
 .../selftests/bpf/prog_tests/sock_destroy.c   | 203 ++++++++++++++++++
 .../selftests/bpf/progs/sock_destroy_prog.c   | 147 +++++++++++++
 2 files changed, 350 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/sock_destroy.c
 create mode 100644 tools/testing/selftests/bpf/progs/sock_destroy_prog.c

diff --git a/tools/testing/selftests/bpf/prog_tests/sock_destroy.c b/tools/testing/selftests/bpf/prog_tests/sock_destroy.c
new file mode 100644
index 000000000000..d5d16fabac48
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/sock_destroy.c
@@ -0,0 +1,203 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+#include <bpf/bpf_endian.h>
+
+#include "sock_destroy_prog.skel.h"
+#include "network_helpers.h"
+
+static void start_iter_sockets(struct bpf_program *prog)
+{
+	struct bpf_link *link;
+	char buf[50] = {};
+	int iter_fd, len;
+
+	link = bpf_program__attach_iter(prog, NULL);
+	if (!ASSERT_OK_PTR(link, "attach_iter"))
+		return;
+
+	iter_fd = bpf_iter_create(bpf_link__fd(link));
+	if (!ASSERT_GE(iter_fd, 0, "create_iter"))
+		goto free_link;
+
+	while ((len = read(iter_fd, buf, sizeof(buf))) > 0)
+		;
+	ASSERT_GE(len, 0, "read");
+
+	close(iter_fd);
+
+free_link:
+	bpf_link__destroy(link);
+}
+
+static void test_tcp_client(struct sock_destroy_prog *skel)
+{
+	int serv = -1, clien = -1, n = 0;
+
+	serv = start_server(AF_INET6, SOCK_STREAM, NULL, 0, 0);
+	if (!ASSERT_GE(serv, 0, "start_server"))
+		goto cleanup_serv;
+
+	clien = connect_to_fd(serv, 0);
+	if (!ASSERT_GE(clien, 0, "connect_to_fd"))
+		goto cleanup_serv;
+
+	serv = accept(serv, NULL, NULL);
+	if (!ASSERT_GE(serv, 0, "serv accept"))
+		goto cleanup;
+
+	n = send(clien, "t", 1, 0);
+	if (!ASSERT_GE(n, 0, "client send"))
+		goto cleanup;
+
+	/* Run iterator program that destroys connected client sockets. */
+	start_iter_sockets(skel->progs.iter_tcp6_client);
+
+	n = send(clien, "t", 1, 0);
+	if (!ASSERT_LT(n, 0, "client_send on destroyed socket"))
+		goto cleanup;
+	ASSERT_EQ(errno, ECONNABORTED, "error code on destroyed socket");
+
+
+cleanup:
+	close(clien);
+cleanup_serv:
+	close(serv);
+}
+
+static void test_tcp_server(struct sock_destroy_prog *skel)
+{
+	int serv = -1, clien = -1, n = 0, err;
+	__u16 serv_port = 0;
+
+	serv = start_server(AF_INET6, SOCK_STREAM, NULL, 0, 0);
+	if (!ASSERT_GE(serv, 0, "start_server"))
+		goto cleanup_serv;
+	err = get_sock_port6(serv, &serv_port);
+	if (!ASSERT_EQ(err, 0, "get_sock_port6"))
+		goto cleanup;
+	skel->bss->serv_port = ntohs(serv_port);
+
+	clien = connect_to_fd(serv, 0);
+	if (!ASSERT_GE(clien, 0, "connect_to_fd"))
+		goto cleanup_serv;
+
+	serv = accept(serv, NULL, NULL);
+	if (!ASSERT_GE(serv, 0, "serv accept"))
+		goto cleanup;
+
+	n = send(clien, "t", 1, 0);
+	if (!ASSERT_GE(n, 0, "client send"))
+		goto cleanup;
+
+	/* Run iterator program that destroys server sockets. */
+	start_iter_sockets(skel->progs.iter_tcp6_server);
+
+	n = send(clien, "t", 1, 0);
+	if (!ASSERT_LT(n, 0, "client_send on destroyed socket"))
+		goto cleanup;
+	ASSERT_EQ(errno, ECONNRESET, "error code on destroyed socket");
+
+
+cleanup:
+	close(clien);
+cleanup_serv:
+	close(serv);
+}
+
+
+static void test_udp_client(struct sock_destroy_prog *skel)
+{
+	int serv = -1, clien = -1, n = 0;
+
+	serv = start_server(AF_INET6, SOCK_DGRAM, NULL, 0, 0);
+	if (!ASSERT_GE(serv, 0, "start_server"))
+		goto cleanup_serv;
+
+	clien = connect_to_fd(serv, 0);
+	if (!ASSERT_GE(clien, 0, "connect_to_fd"))
+		goto cleanup_serv;
+
+	n = send(clien, "t", 1, 0);
+	if (!ASSERT_GE(n, 0, "client send"))
+		goto cleanup;
+
+	/* Run iterator program that destroys sockets. */
+	start_iter_sockets(skel->progs.iter_udp6_client);
+
+	n = send(clien, "t", 1, 0);
+	if (!ASSERT_LT(n, 0, "client_send on destroyed socket"))
+		goto cleanup;
+	/* UDP sockets have an overriding error code after they are disconnected,
+	 * so we don't check for ECONNABORTED error code.
+	 */
+
+cleanup:
+	close(clien);
+cleanup_serv:
+	close(serv);
+}
+
+static void test_udp_server(struct sock_destroy_prog *skel)
+{
+	int *listen_fds = NULL, n, i, err;
+	unsigned int num_listens = 5;
+	char buf[1];
+	__u16 serv_port;
+
+	/* Start reuseport servers. */
+	listen_fds = start_reuseport_server(AF_INET6, SOCK_DGRAM,
+					    "::1", 0, 0, num_listens);
+	if (!ASSERT_OK_PTR(listen_fds, "start_reuseport_server"))
+		goto cleanup;
+	err = get_sock_port6(listen_fds[0], &serv_port);
+	if (!ASSERT_EQ(err, 0, "get_sock_port6"))
+		goto cleanup;
+	skel->bss->serv_port = ntohs(serv_port);
+
+	/* Run iterator program that destroys server sockets. */
+	start_iter_sockets(skel->progs.iter_udp6_server);
+
+	for (i = 0; i < num_listens; ++i) {
+		n = read(listen_fds[i], buf, sizeof(buf));
+		if (!ASSERT_EQ(n, -1, "read") ||
+		    !ASSERT_EQ(errno, ECONNABORTED, "error code on destroyed socket"))
+			break;
+	}
+	ASSERT_EQ(i, num_listens, "server socket");
+
+cleanup:
+	free_fds(listen_fds, num_listens);
+}
+
+void test_sock_destroy(void)
+{
+	struct sock_destroy_prog *skel;
+	int cgroup_fd = 0;
+
+	skel = sock_destroy_prog__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		return;
+
+	cgroup_fd = test__join_cgroup("/sock_destroy");
+	if (!ASSERT_GE(cgroup_fd, 0, "join_cgroup"))
+		goto close_cgroup_fd;
+
+	skel->links.sock_connect = bpf_program__attach_cgroup(
+		skel->progs.sock_connect, cgroup_fd);
+	if (!ASSERT_OK_PTR(skel->links.sock_connect, "prog_attach"))
+		goto close_cgroup_fd;
+
+	if (test__start_subtest("tcp_client"))
+		test_tcp_client(skel);
+	if (test__start_subtest("tcp_server"))
+		test_tcp_server(skel);
+	if (test__start_subtest("udp_client"))
+		test_udp_client(skel);
+	if (test__start_subtest("udp_server"))
+		test_udp_server(skel);
+
+
+close_cgroup_fd:
+	close(cgroup_fd);
+	sock_destroy_prog__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/sock_destroy_prog.c b/tools/testing/selftests/bpf/progs/sock_destroy_prog.c
new file mode 100644
index 000000000000..5c1e65d50598
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/sock_destroy_prog.c
@@ -0,0 +1,147 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+#include "bpf_tracing_net.h"
+
+#define AF_INET6 10
+
+__u16 serv_port = 0;
+
+int bpf_sock_destroy(struct sock_common *sk) __ksym;
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, 1);
+	__type(key, __u32);
+	__type(value, __u64);
+} tcp_conn_sockets SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, 1);
+	__type(key, __u32);
+	__type(value, __u64);
+} udp_conn_sockets SEC(".maps");
+
+SEC("cgroup/connect6")
+int sock_connect(struct bpf_sock_addr *ctx)
+{
+	int key = 0;
+	__u64 sock_cookie = 0;
+	__u32 keyc = 0;
+
+	if (ctx->family != AF_INET6 || ctx->user_family != AF_INET6)
+		return 1;
+
+	sock_cookie = bpf_get_socket_cookie(ctx);
+	if (ctx->protocol == IPPROTO_TCP)
+		bpf_map_update_elem(&tcp_conn_sockets, &key, &sock_cookie, 0);
+	else if (ctx->protocol == IPPROTO_UDP)
+		bpf_map_update_elem(&udp_conn_sockets, &keyc, &sock_cookie, 0);
+	else
+		return 1;
+
+	return 1;
+}
+
+SEC("iter/tcp")
+int iter_tcp6_client(struct bpf_iter__tcp *ctx)
+{
+	struct sock_common *sk_common = ctx->sk_common;
+	__u64 sock_cookie = 0;
+	__u64 *val;
+	int key = 0;
+
+	if (!sk_common)
+		return 0;
+
+	if (sk_common->skc_family != AF_INET6)
+		return 0;
+
+	sock_cookie  = bpf_get_socket_cookie(sk_common);
+	val = bpf_map_lookup_elem(&tcp_conn_sockets, &key);
+	if (!val)
+		return 0;
+	/* Destroy connected client sockets. */
+	if (sock_cookie == *val)
+		bpf_sock_destroy(sk_common);
+
+	return 0;
+}
+
+SEC("iter/tcp")
+int iter_tcp6_server(struct bpf_iter__tcp *ctx)
+{
+	struct sock_common *sk_common = ctx->sk_common;
+	struct tcp6_sock *tcp_sk;
+	const struct inet_connection_sock *icsk;
+	const struct inet_sock *inet;
+	__u16 srcp;
+
+	if (!sk_common)
+		return 0;
+
+	if (sk_common->skc_family != AF_INET6)
+		return 0;
+
+	tcp_sk = bpf_skc_to_tcp6_sock(sk_common);
+	if (!tcp_sk)
+		return 0;
+
+	icsk = &tcp_sk->tcp.inet_conn;
+	inet = &icsk->icsk_inet;
+	srcp = bpf_ntohs(inet->inet_sport);
+
+	/* Destroy server sockets. */
+	if (srcp == serv_port)
+		bpf_sock_destroy(sk_common);
+
+	return 0;
+}
+
+
+SEC("iter/udp")
+int iter_udp6_client(struct bpf_iter__udp *ctx)
+{
+	struct udp_sock *udp_sk = ctx->udp_sk;
+	struct sock *sk = (struct sock *) udp_sk;
+	__u64 sock_cookie = 0, *val;
+	int key = 0;
+
+	if (!sk)
+		return 0;
+
+	sock_cookie  = bpf_get_socket_cookie(sk);
+	val = bpf_map_lookup_elem(&udp_conn_sockets, &key);
+	if (!val)
+		return 0;
+	/* Destroy connected client sockets. */
+	if (sock_cookie == *val)
+		bpf_sock_destroy((struct sock_common *)sk);
+
+	return 0;
+}
+
+SEC("iter/udp")
+int iter_udp6_server(struct bpf_iter__udp *ctx)
+{
+	struct udp_sock *udp_sk = ctx->udp_sk;
+	struct sock *sk = (struct sock *) udp_sk;
+	__u16 srcp;
+	struct inet_sock *inet;
+
+	if (!sk)
+		return 0;
+
+	inet = &udp_sk->inet;
+	srcp = bpf_ntohs(inet->inet_sport);
+	if (srcp == serv_port)
+		bpf_sock_destroy((struct sock_common *)sk);
+
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.34.1



* Re: [PATCH v5 bpf-next 2/7] udp: seq_file: Remove bpf_seq_afinfo from udp_iter_state
  2023-03-30 15:17 ` [PATCH v5 bpf-next 2/7] udp: seq_file: Remove bpf_seq_afinfo from udp_iter_state Aditi Ghag
@ 2023-03-30 17:35   ` Martin KaFai Lau
  0 siblings, 0 replies; 33+ messages in thread
From: Martin KaFai Lau @ 2023-03-30 17:35 UTC (permalink / raw)
  To: Aditi Ghag; +Cc: sdf, edumazet, Martin KaFai Lau, bpf

On 3/30/23 8:17 AM, Aditi Ghag wrote:
> This is a preparatory commit to remove the field. The field was
> previously shared between proc fs and BPF UDP socket iterators. As the
> follow-up commits will decouple the implementation for the iterators,
> remove the field. As for the BPF socket iterator, filtering of sockets is
> expected to be done in BPF programs.
>
> Suggested-by: Martin KaFai Lau <martin.lau@kernel.org>
> Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
> ---
>   include/net/udp.h |  1 -
>   net/ipv4/udp.c    | 15 +++------------
>   2 files changed, 3 insertions(+), 13 deletions(-)
> 
> diff --git a/include/net/udp.h b/include/net/udp.h
> index de4b528522bb..5cad44318d71 100644
> --- a/include/net/udp.h
> +++ b/include/net/udp.h
> @@ -437,7 +437,6 @@ struct udp_seq_afinfo {
>   struct udp_iter_state {
>   	struct seq_net_private  p;
>   	int			bucket;
> -	struct udp_seq_afinfo	*bpf_seq_afinfo;

This patch does not compile because the remaining st->bpf_seq_afinfo usages
are only removed later in patch 4.
https://patchwork.hopto.org/static/nipa/735459/13194325/build_clang/stderr

Mostly a heads up. No need to re-spin now. Review can be continued on this revision.



* Re: [PATCH v5 bpf-next 3/7] udp: seq_file: Helper function to match socket attributes
  2023-03-30 15:17 ` [PATCH v5 bpf-next 3/7] udp: seq_file: Helper function to match socket attributes Aditi Ghag
@ 2023-03-30 18:40   ` kernel test robot
  2023-03-30 18:51   ` kernel test robot
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 33+ messages in thread
From: kernel test robot @ 2023-03-30 18:40 UTC (permalink / raw)
  To: Aditi Ghag, bpf; +Cc: oe-kbuild-all, kafai, sdf, edumazet, aditi.ghag

Hi Aditi,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on bpf-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Aditi-Ghag/bpf-tcp-Avoid-taking-fast-sock-lock-in-iterator/20230330-232137
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link:    https://lore.kernel.org/r/20230330151758.531170-4-aditi.ghag%40isovalent.com
patch subject: [PATCH v5 bpf-next 3/7] udp: seq_file: Helper function to match socket attributes
config: nios2-defconfig (https://download.01.org/0day-ci/archive/20230331/202303310212.8DXGg50J-lkp@intel.com/config)
compiler: nios2-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/66cc617bebf6cb3d2162587d6e248d86bc59c1c2
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Aditi-Ghag/bpf-tcp-Avoid-taking-fast-sock-lock-in-iterator/20230330-232137
        git checkout 66cc617bebf6cb3d2162587d6e248d86bc59c1c2
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=nios2 olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=nios2 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202303310212.8DXGg50J-lkp@intel.com/

All errors (new ones prefixed by >>):

   nios2-linux-ld: net/ipv4/udp.o: in function `udp_get_first':
>> net/ipv4/udp.c:3015: undefined reference to `seq_sk_match'
   net/ipv4/udp.c:3015:(.text+0x4a8): relocation truncated to fit: R_NIOS2_CALL26 against `seq_sk_match'
   nios2-linux-ld: net/ipv4/udp.o: in function `udp_get_next':
   net/ipv4/udp.c:3036: undefined reference to `seq_sk_match'
   net/ipv4/udp.c:3036:(.text+0x55c): relocation truncated to fit: R_NIOS2_CALL26 against `seq_sk_match'


vim +3015 net/ipv4/udp.c

  2993	
  2994	static struct sock *udp_get_first(struct seq_file *seq, int start)
  2995	{
  2996		struct udp_iter_state *state = seq->private;
  2997		struct net *net = seq_file_net(seq);
  2998		struct udp_seq_afinfo *afinfo;
  2999		struct udp_table *udptable;
  3000		struct sock *sk;
  3001	
  3002		afinfo = pde_data(file_inode(seq->file));
  3003	
  3004		udptable = udp_get_table_afinfo(afinfo, net);
  3005	
  3006		for (state->bucket = start; state->bucket <= udptable->mask;
  3007		     ++state->bucket) {
  3008			struct udp_hslot *hslot = &udptable->hash[state->bucket];
  3009	
  3010			if (hlist_empty(&hslot->head))
  3011				continue;
  3012	
  3013			spin_lock_bh(&hslot->lock);
  3014			sk_for_each(sk, &hslot->head) {
> 3015				if (seq_sk_match(seq, sk))
  3016					goto found;
  3017			}
  3018			spin_unlock_bh(&hslot->lock);
  3019		}
  3020		sk = NULL;
  3021	found:
  3022		return sk;
  3023	}
  3024	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests


* Re: [PATCH v5 bpf-next 6/7] selftests/bpf: Add helper to get port using getsockname
  2023-03-30 15:17 ` [PATCH v5 bpf-next 6/7] selftests/bpf: Add helper to get port using getsockname Aditi Ghag
@ 2023-03-30 18:41   ` Stanislav Fomichev
  2023-03-31 21:37     ` Martin KaFai Lau
  0 siblings, 1 reply; 33+ messages in thread
From: Stanislav Fomichev @ 2023-03-30 18:41 UTC (permalink / raw)
  To: Aditi Ghag; +Cc: bpf, kafai, edumazet

On 03/30, Aditi Ghag wrote:
> The helper will be used to programmatically retrieve ports and pass them
> between userspace and kernel selftest programs.

> Suggested-by: Stanislav Fomichev <sdf@google.com>
> Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
> ---
>   tools/testing/selftests/bpf/network_helpers.c | 14 ++++++++++++++
>   tools/testing/selftests/bpf/network_helpers.h |  1 +
>   2 files changed, 15 insertions(+)

> diff --git a/tools/testing/selftests/bpf/network_helpers.c  
> b/tools/testing/selftests/bpf/network_helpers.c
> index 596caa176582..4c1dc7cf7390 100644
> --- a/tools/testing/selftests/bpf/network_helpers.c
> +++ b/tools/testing/selftests/bpf/network_helpers.c
> @@ -427,3 +427,17 @@ void close_netns(struct nstoken *token)
>   	close(token->orig_netns_fd);
>   	free(token);
>   }
> +
> +int get_sock_port6(int sock_fd, __u16 *out_port)
> +{
> +	struct sockaddr_in6 addr = {};
> +	socklen_t addr_len = sizeof(addr);
> +	int err;
> +
> +	err = getsockname(sock_fd, (struct sockaddr *)&addr, &addr_len);
> +	if (err < 0)
> +		return err;
> +	*out_port = addr.sin6_port;

The rest of the helpers don't usually care about v4 vs v6.
Making it work for both v4 and v6 seems trivial, so maybe let's do it?
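
Something along these lines perhaps (untested sketch, made-up name; assumes
the headers already included by network_helpers.c):

```
int get_sock_port(int sock_fd, __u16 *out_port)
{
	struct sockaddr_storage addr = {};
	socklen_t addr_len = sizeof(addr);
	int err;

	err = getsockname(sock_fd, (struct sockaddr *)&addr, &addr_len);
	if (err < 0)
		return err;

	if (addr.ss_family == AF_INET)
		*out_port = ((struct sockaddr_in *)&addr)->sin_port;
	else
		*out_port = ((struct sockaddr_in6 *)&addr)->sin6_port;

	return 0;
}
```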

> +
> +	return err;
> +}
> diff --git a/tools/testing/selftests/bpf/network_helpers.h  
> b/tools/testing/selftests/bpf/network_helpers.h
> index f882c691b790..2ab3b50de0b7 100644
> --- a/tools/testing/selftests/bpf/network_helpers.h
> +++ b/tools/testing/selftests/bpf/network_helpers.h
> @@ -56,6 +56,7 @@ int fastopen_connect(int server_fd, const char *data,  
> unsigned int data_len,
>   int make_sockaddr(int family, const char *addr_str, __u16 port,
>   		  struct sockaddr_storage *addr, socklen_t *len);
>   char *ping_command(int family);
> +int get_sock_port6(int sock_fd, __u16 *out_port);

>   struct nstoken;
>   /**
> --
> 2.34.1



* Re: [PATCH v5 bpf-next 7/7] selftests/bpf: Test bpf_sock_destroy
  2023-03-30 15:17 ` [PATCH v5 bpf-next 7/7] selftests/bpf: Test bpf_sock_destroy Aditi Ghag
@ 2023-03-30 18:46   ` Stanislav Fomichev
  2023-03-31 22:32     ` Martin KaFai Lau
  0 siblings, 1 reply; 33+ messages in thread
From: Stanislav Fomichev @ 2023-03-30 18:46 UTC (permalink / raw)
  To: Aditi Ghag; +Cc: bpf, kafai, edumazet

On 03/30, Aditi Ghag wrote:
> The test cases for destroying sockets mirror the intended usages of the
> bpf_sock_destroy kfunc using iterators.

> The destroy helpers set the `ECONNABORTED` error code that we can validate
> in the test code with client sockets. But UDP sockets have an overriding
> error code from the disconnect called during abort, so the error code
> validation is only done for TCP sockets.

> Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
> ---
>   .../selftests/bpf/prog_tests/sock_destroy.c   | 203 ++++++++++++++++++
>   .../selftests/bpf/progs/sock_destroy_prog.c   | 147 +++++++++++++
>   2 files changed, 350 insertions(+)
>   create mode 100644 tools/testing/selftests/bpf/prog_tests/sock_destroy.c
>   create mode 100644 tools/testing/selftests/bpf/progs/sock_destroy_prog.c

> diff --git a/tools/testing/selftests/bpf/prog_tests/sock_destroy.c  
> b/tools/testing/selftests/bpf/prog_tests/sock_destroy.c
> new file mode 100644
> index 000000000000..d5d16fabac48
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/sock_destroy.c
> @@ -0,0 +1,203 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <test_progs.h>
> +#include <bpf/bpf_endian.h>
> +
> +#include "sock_destroy_prog.skel.h"
> +#include "network_helpers.h"
> +
> +static void start_iter_sockets(struct bpf_program *prog)
> +{
> +	struct bpf_link *link;
> +	char buf[50] = {};
> +	int iter_fd, len;
> +
> +	link = bpf_program__attach_iter(prog, NULL);
> +	if (!ASSERT_OK_PTR(link, "attach_iter"))
> +		return;
> +
> +	iter_fd = bpf_iter_create(bpf_link__fd(link));
> +	if (!ASSERT_GE(iter_fd, 0, "create_iter"))
> +		goto free_link;
> +
> +	while ((len = read(iter_fd, buf, sizeof(buf))) > 0)
> +		;
> +	ASSERT_GE(len, 0, "read");
> +
> +	close(iter_fd);
> +
> +free_link:
> +	bpf_link__destroy(link);
> +}
> +
> +static void test_tcp_client(struct sock_destroy_prog *skel)
> +{
> +	int serv = -1, clien = -1, n = 0;
> +
> +	serv = start_server(AF_INET6, SOCK_STREAM, NULL, 0, 0);
> +	if (!ASSERT_GE(serv, 0, "start_server"))
> +		goto cleanup_serv;
> +
> +	clien = connect_to_fd(serv, 0);
> +	if (!ASSERT_GE(clien, 0, "connect_to_fd"))
> +		goto cleanup_serv;
> +
> +	serv = accept(serv, NULL, NULL);
> +	if (!ASSERT_GE(serv, 0, "serv accept"))
> +		goto cleanup;
> +
> +	n = send(clien, "t", 1, 0);
> +	if (!ASSERT_GE(n, 0, "client send"))
> +		goto cleanup;
> +
> +	/* Run iterator program that destroys connected client sockets. */
> +	start_iter_sockets(skel->progs.iter_tcp6_client);
> +
> +	n = send(clien, "t", 1, 0);
> +	if (!ASSERT_LT(n, 0, "client_send on destroyed socket"))
> +		goto cleanup;
> +	ASSERT_EQ(errno, ECONNABORTED, "error code on destroyed socket");
> +
> +
> +cleanup:
> +	close(clien);
> +cleanup_serv:
> +	close(serv);
> +}
> +
> +static void test_tcp_server(struct sock_destroy_prog *skel)
> +{
> +	int serv = -1, clien = -1, n = 0, err;
> +	__u16 serv_port = 0;
> +
> +	serv = start_server(AF_INET6, SOCK_STREAM, NULL, 0, 0);
> +	if (!ASSERT_GE(serv, 0, "start_server"))
> +		goto cleanup_serv;
> +	err = get_sock_port6(serv, &serv_port);
> +	if (!ASSERT_EQ(err, 0, "get_sock_port6"))
> +		goto cleanup;
> +	skel->bss->serv_port = ntohs(serv_port);

This looks great, thank you for removing those hard-coded values!

I have one optional nit, feel free to ignore: you do ntohs here and in
the bpf program. Why not store the port in network byte order? Then
you can avoid all those ntohs and compare the ports directly...
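
For example (untested fragment, just to show the idea): keep `serv_port` in
network byte order on both sides and drop the byte swaps, roughly:

```
	/* userspace: getsockname() already returns network byte order */
	skel->bss->serv_port = serv_port;

	/* BPF program: inet_sport is network byte order too */
	if (inet->inet_sport == serv_port)
		bpf_sock_destroy((struct sock_common *)sk);
```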

> +
> +	clien = connect_to_fd(serv, 0);
> +	if (!ASSERT_GE(clien, 0, "connect_to_fd"))
> +		goto cleanup_serv;
> +
> +	serv = accept(serv, NULL, NULL);
> +	if (!ASSERT_GE(serv, 0, "serv accept"))
> +		goto cleanup;
> +
> +	n = send(clien, "t", 1, 0);
> +	if (!ASSERT_GE(n, 0, "client send"))
> +		goto cleanup;
> +
> +	/* Run iterator program that destroys server sockets. */
> +	start_iter_sockets(skel->progs.iter_tcp6_server);
> +
> +	n = send(clien, "t", 1, 0);
> +	if (!ASSERT_LT(n, 0, "client_send on destroyed socket"))
> +		goto cleanup;
> +	ASSERT_EQ(errno, ECONNRESET, "error code on destroyed socket");
> +
> +
> +cleanup:
> +	close(clien);
> +cleanup_serv:
> +	close(serv);
> +}
> +
> +
> +static void test_udp_client(struct sock_destroy_prog *skel)
> +{
> +	int serv = -1, clien = -1, n = 0;
> +
> +	serv = start_server(AF_INET6, SOCK_DGRAM, NULL, 0, 0);
> +	if (!ASSERT_GE(serv, 0, "start_server"))
> +		goto cleanup_serv;
> +
> +	clien = connect_to_fd(serv, 0);
> +	if (!ASSERT_GE(clien, 0, "connect_to_fd"))
> +		goto cleanup_serv;
> +
> +	n = send(clien, "t", 1, 0);
> +	if (!ASSERT_GE(n, 0, "client send"))
> +		goto cleanup;
> +
> +	/* Run iterator program that destroys sockets. */
> +	start_iter_sockets(skel->progs.iter_udp6_client);
> +
> +	n = send(clien, "t", 1, 0);
> +	if (!ASSERT_LT(n, 0, "client_send on destroyed socket"))
> +		goto cleanup;
> +	/* UDP sockets have an overriding error code after they are  
> disconnected,
> +	 * so we don't check for ECONNABORTED error code.
> +	 */
> +
> +cleanup:
> +	close(clien);
> +cleanup_serv:
> +	close(serv);
> +}
> +
> +static void test_udp_server(struct sock_destroy_prog *skel)
> +{
> +	int *listen_fds = NULL, n, i, err;
> +	unsigned int num_listens = 5;
> +	char buf[1];
> +	__u16 serv_port;
> +
> +	/* Start reuseport servers. */
> +	listen_fds = start_reuseport_server(AF_INET6, SOCK_DGRAM,
> +					    "::1", 0, 0, num_listens);
> +	if (!ASSERT_OK_PTR(listen_fds, "start_reuseport_server"))
> +		goto cleanup;
> +	err = get_sock_port6(listen_fds[0], &serv_port);
> +	if (!ASSERT_EQ(err, 0, "get_sock_port6"))
> +		goto cleanup;
> +	skel->bss->serv_port = ntohs(serv_port);
> +
> +	/* Run iterator program that destroys server sockets. */
> +	start_iter_sockets(skel->progs.iter_udp6_server);
> +
> +	for (i = 0; i < num_listens; ++i) {
> +		n = read(listen_fds[i], buf, sizeof(buf));
> +		if (!ASSERT_EQ(n, -1, "read") ||
> +		    !ASSERT_EQ(errno, ECONNABORTED, "error code on destroyed socket"))
> +			break;
> +	}
> +	ASSERT_EQ(i, num_listens, "server socket");
> +
> +cleanup:
> +	free_fds(listen_fds, num_listens);
> +}
> +
> +void test_sock_destroy(void)
> +{
> +	struct sock_destroy_prog *skel;
> +	int cgroup_fd = 0;
> +
> +	skel = sock_destroy_prog__open_and_load();
> +	if (!ASSERT_OK_PTR(skel, "skel_open"))
> +		return;
> +
> +	cgroup_fd = test__join_cgroup("/sock_destroy");
> +	if (!ASSERT_GE(cgroup_fd, 0, "join_cgroup"))
> +		goto close_cgroup_fd;
> +
> +	skel->links.sock_connect = bpf_program__attach_cgroup(
> +		skel->progs.sock_connect, cgroup_fd);
> +	if (!ASSERT_OK_PTR(skel->links.sock_connect, "prog_attach"))
> +		goto close_cgroup_fd;
> +
> +	if (test__start_subtest("tcp_client"))
> +		test_tcp_client(skel);
> +	if (test__start_subtest("tcp_server"))
> +		test_tcp_server(skel);
> +	if (test__start_subtest("udp_client"))
> +		test_udp_client(skel);
> +	if (test__start_subtest("udp_server"))
> +		test_udp_server(skel);
> +
> +
> +close_cgroup_fd:
> +	close(cgroup_fd);
> +	sock_destroy_prog__destroy(skel);
> +}
> diff --git a/tools/testing/selftests/bpf/progs/sock_destroy_prog.c  
> b/tools/testing/selftests/bpf/progs/sock_destroy_prog.c
> new file mode 100644
> index 000000000000..5c1e65d50598
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/sock_destroy_prog.c
> @@ -0,0 +1,147 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include "vmlinux.h"
> +#include <bpf/bpf_helpers.h>
> +#include <bpf/bpf_endian.h>
> +
> +#include "bpf_tracing_net.h"
> +
> +#define AF_INET6 10
> +
> +__u16 serv_port = 0;
> +
> +int bpf_sock_destroy(struct sock_common *sk) __ksym;
> +
> +struct {
> +	__uint(type, BPF_MAP_TYPE_ARRAY);
> +	__uint(max_entries, 1);
> +	__type(key, __u32);
> +	__type(value, __u64);
> +} tcp_conn_sockets SEC(".maps");
> +
> +struct {
> +	__uint(type, BPF_MAP_TYPE_ARRAY);
> +	__uint(max_entries, 1);
> +	__type(key, __u32);
> +	__type(value, __u64);
> +} udp_conn_sockets SEC(".maps");
> +
> +SEC("cgroup/connect6")
> +int sock_connect(struct bpf_sock_addr *ctx)
> +{
> +	int key = 0;
> +	__u64 sock_cookie = 0;
> +	__u32 keyc = 0;
> +
> +	if (ctx->family != AF_INET6 || ctx->user_family != AF_INET6)
> +		return 1;
> +
> +	sock_cookie = bpf_get_socket_cookie(ctx);
> +	if (ctx->protocol == IPPROTO_TCP)
> +		bpf_map_update_elem(&tcp_conn_sockets, &key, &sock_cookie, 0);
> +	else if (ctx->protocol == IPPROTO_UDP)
> +		bpf_map_update_elem(&udp_conn_sockets, &keyc, &sock_cookie, 0);
> +	else
> +		return 1;
> +
> +	return 1;
> +}
> +
> +SEC("iter/tcp")
> +int iter_tcp6_client(struct bpf_iter__tcp *ctx)
> +{
> +	struct sock_common *sk_common = ctx->sk_common;
> +	__u64 sock_cookie = 0;
> +	__u64 *val;
> +	int key = 0;
> +
> +	if (!sk_common)
> +		return 0;
> +
> +	if (sk_common->skc_family != AF_INET6)
> +		return 0;
> +
> +	sock_cookie  = bpf_get_socket_cookie(sk_common);
> +	val = bpf_map_lookup_elem(&tcp_conn_sockets, &key);
> +	if (!val)
> +		return 0;
> +	/* Destroy connected client sockets. */
> +	if (sock_cookie == *val)
> +		bpf_sock_destroy(sk_common);
> +
> +	return 0;
> +}
> +
> +SEC("iter/tcp")
> +int iter_tcp6_server(struct bpf_iter__tcp *ctx)
> +{
> +	struct sock_common *sk_common = ctx->sk_common;
> +	struct tcp6_sock *tcp_sk;
> +	const struct inet_connection_sock *icsk;
> +	const struct inet_sock *inet;
> +	__u16 srcp;
> +
> +	if (!sk_common)
> +		return 0;
> +
> +	if (sk_common->skc_family != AF_INET6)
> +		return 0;
> +
> +	tcp_sk = bpf_skc_to_tcp6_sock(sk_common);
> +	if (!tcp_sk)
> +		return 0;
> +
> +	icsk = &tcp_sk->tcp.inet_conn;
> +	inet = &icsk->icsk_inet;
> +	srcp = bpf_ntohs(inet->inet_sport);
> +
> +	/* Destroy server sockets. */
> +	if (srcp == serv_port)
> +		bpf_sock_destroy(sk_common);
> +
> +	return 0;
> +}
> +
> +
> +SEC("iter/udp")
> +int iter_udp6_client(struct bpf_iter__udp *ctx)
> +{
> +	struct udp_sock *udp_sk = ctx->udp_sk;
> +	struct sock *sk = (struct sock *) udp_sk;
> +	__u64 sock_cookie = 0, *val;
> +	int key = 0;
> +
> +	if (!sk)
> +		return 0;
> +
> +	sock_cookie  = bpf_get_socket_cookie(sk);
> +	val = bpf_map_lookup_elem(&udp_conn_sockets, &key);
> +	if (!val)
> +		return 0;
> +	/* Destroy connected client sockets. */
> +	if (sock_cookie == *val)
> +		bpf_sock_destroy((struct sock_common *)sk);
> +
> +	return 0;
> +}
> +
> +SEC("iter/udp")
> +int iter_udp6_server(struct bpf_iter__udp *ctx)
> +{
> +	struct udp_sock *udp_sk = ctx->udp_sk;
> +	struct sock *sk = (struct sock *) udp_sk;
> +	__u16 srcp;
> +	struct inet_sock *inet;
> +
> +	if (!sk)
> +		return 0;
> +
> +	inet = &udp_sk->inet;
> +	srcp = bpf_ntohs(inet->inet_sport);
> +	if (srcp == serv_port)
> +		bpf_sock_destroy((struct sock_common *)sk);
> +
> +	return 0;
> +}
> +
> +char _license[] SEC("license") = "GPL";
> --
> 2.34.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v5 bpf-next 3/7] udp: seq_file: Helper function to match socket attributes
  2023-03-30 15:17 ` [PATCH v5 bpf-next 3/7] udp: seq_file: Helper function to match socket attributes Aditi Ghag
  2023-03-30 18:40   ` kernel test robot
@ 2023-03-30 18:51   ` kernel test robot
  2023-03-31  2:52   ` kernel test robot
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 33+ messages in thread
From: kernel test robot @ 2023-03-30 18:51 UTC (permalink / raw)
  To: Aditi Ghag, bpf; +Cc: oe-kbuild-all, kafai, sdf, edumazet, aditi.ghag

Hi Aditi,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on bpf-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Aditi-Ghag/bpf-tcp-Avoid-taking-fast-sock-lock-in-iterator/20230330-232137
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link:    https://lore.kernel.org/r/20230330151758.531170-4-aditi.ghag%40isovalent.com
patch subject: [PATCH v5 bpf-next 3/7] udp: seq_file: Helper function to match socket attributes
config: parisc-defconfig (https://download.01.org/0day-ci/archive/20230331/202303310242.QYEDFSaE-lkp@intel.com/config)
compiler: hppa-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/66cc617bebf6cb3d2162587d6e248d86bc59c1c2
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Aditi-Ghag/bpf-tcp-Avoid-taking-fast-sock-lock-in-iterator/20230330-232137
        git checkout 66cc617bebf6cb3d2162587d6e248d86bc59c1c2
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=parisc olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=parisc SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202303310242.QYEDFSaE-lkp@intel.com/

All error/warnings (new ones prefixed by >>):

>> net/ipv4/udp.c:2986:20: warning: 'seq_sk_match' used but never defined
    2986 | static inline bool seq_sk_match(struct seq_file *seq, const struct sock *sk);
         |                    ^~~~~~~~~~~~
--
   hppa-linux-ld: net/ipv4/udp.o: in function `udp_get_first':
>> (.text+0x11b0): undefined reference to `seq_sk_match'
   hppa-linux-ld: net/ipv4/udp.o: in function `udp_get_next':
   (.text+0x123c): undefined reference to `seq_sk_match'

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v5 bpf-next 3/7] udp: seq_file: Helper function to match socket attributes
  2023-03-30 15:17 ` [PATCH v5 bpf-next 3/7] udp: seq_file: Helper function to match socket attributes Aditi Ghag
  2023-03-30 18:40   ` kernel test robot
  2023-03-30 18:51   ` kernel test robot
@ 2023-03-31  2:52   ` kernel test robot
  2023-03-31 20:09   ` Martin KaFai Lau
  2023-04-02  6:18   ` kernel test robot
  4 siblings, 0 replies; 33+ messages in thread
From: kernel test robot @ 2023-03-31  2:52 UTC (permalink / raw)
  To: Aditi Ghag, bpf; +Cc: llvm, oe-kbuild-all, kafai, sdf, edumazet, aditi.ghag

Hi Aditi,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on bpf-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Aditi-Ghag/bpf-tcp-Avoid-taking-fast-sock-lock-in-iterator/20230330-232137
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link:    https://lore.kernel.org/r/20230330151758.531170-4-aditi.ghag%40isovalent.com
patch subject: [PATCH v5 bpf-next 3/7] udp: seq_file: Helper function to match socket attributes
config: i386-randconfig-a006 (https://download.01.org/0day-ci/archive/20230331/202303311022.2GVz6yAt-lkp@intel.com/config)
compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project f28c006a5895fc0e329fe15fead81e37457cb1d1)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/66cc617bebf6cb3d2162587d6e248d86bc59c1c2
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Aditi-Ghag/bpf-tcp-Avoid-taking-fast-sock-lock-in-iterator/20230330-232137
        git checkout 66cc617bebf6cb3d2162587d6e248d86bc59c1c2
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash net/ipv4/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202303311022.2GVz6yAt-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> net/ipv4/udp.c:2986:20: warning: function 'seq_sk_match' has internal linkage but is not defined [-Wundefined-internal]
   static inline bool seq_sk_match(struct seq_file *seq, const struct sock *sk);
                      ^
   net/ipv4/udp.c:3015:8: note: used here
                           if (seq_sk_match(seq, sk))
                               ^
   1 warning generated.


vim +/seq_sk_match +2986 net/ipv4/udp.c

  2985	
> 2986	static inline bool seq_sk_match(struct seq_file *seq, const struct sock *sk);
  2987	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v5 bpf-next 3/7] udp: seq_file: Helper function to match socket attributes
  2023-03-30 15:17 ` [PATCH v5 bpf-next 3/7] udp: seq_file: Helper function to match socket attributes Aditi Ghag
                     ` (2 preceding siblings ...)
  2023-03-31  2:52   ` kernel test robot
@ 2023-03-31 20:09   ` Martin KaFai Lau
  2023-04-03 15:27     ` Aditi Ghag
  2023-04-02  6:18   ` kernel test robot
  4 siblings, 1 reply; 33+ messages in thread
From: Martin KaFai Lau @ 2023-03-31 20:09 UTC (permalink / raw)
  To: Aditi Ghag; +Cc: kafai, sdf, edumazet, bpf

On 3/30/23 8:17 AM, Aditi Ghag wrote:
> This is a preparatory commit to refactor code that matches socket
> attributes in iterators to a helper function, and use it in the
> proc fs iterator.
> 
> Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
> ---
>   net/ipv4/udp.c | 35 ++++++++++++++++++++++++++++-------
>   1 file changed, 28 insertions(+), 7 deletions(-)
> 
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index c574c8c17ec9..cead4acb64c6 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -2983,6 +2983,8 @@ EXPORT_SYMBOL(udp_prot);
>   /* ------------------------------------------------------------------------ */
>   #ifdef CONFIG_PROC_FS
>   
> +static inline bool seq_sk_match(struct seq_file *seq, const struct sock *sk);
> +
>   static struct udp_table *udp_get_table_afinfo(struct udp_seq_afinfo *afinfo,
>   					      struct net *net)
>   {
> @@ -3010,10 +3012,7 @@ static struct sock *udp_get_first(struct seq_file *seq, int start)
>   
>   		spin_lock_bh(&hslot->lock);
>   		sk_for_each(sk, &hslot->head) {
> -			if (!net_eq(sock_net(sk), net))
> -				continue;
> -			if (afinfo->family == AF_UNSPEC ||
> -			    sk->sk_family == afinfo->family)
> +			if (seq_sk_match(seq, sk))
>   				goto found;
>   		}
>   		spin_unlock_bh(&hslot->lock);
> @@ -3034,9 +3033,7 @@ static struct sock *udp_get_next(struct seq_file *seq, struct sock *sk)
>   
>   	do {
>   		sk = sk_next(sk);
> -	} while (sk && (!net_eq(sock_net(sk), net) ||
> -			(afinfo->family != AF_UNSPEC &&
> -			 sk->sk_family != afinfo->family)));
> +	} while (sk && !seq_sk_match(seq, sk));
>   
>   	if (!sk) {
>   		udptable = udp_get_table_afinfo(afinfo, net);
> @@ -3143,6 +3140,17 @@ struct bpf_iter__udp {
>   	int bucket __aligned(8);
>   };
>   
> +static unsigned short seq_file_family(const struct seq_file *seq);
> +
> +static inline bool seq_sk_match(struct seq_file *seq, const struct sock *sk)

This won't work under CONFIG_BPF_SYSCALL. The bot also complained.
It has to be under CONFIG_PROC_FS.

No need to use inline and leave it to compiler.

The earlier forward declaration is unnecessary. Define the function earlier instead.

> +{
> +	unsigned short family = seq_file_family(seq);
> +
> +	/* AF_UNSPEC is used as a match all */
> +	return ((family == AF_UNSPEC || family == sk->sk_family) &&
> +		net_eq(sock_net(sk), seq_file_net(seq)));
> +}
> +
>   static int udp_prog_seq_show(struct bpf_prog *prog, struct bpf_iter_meta *meta,
>   			     struct udp_sock *udp_sk, uid_t uid, int bucket)
>   {
> @@ -3194,6 +3202,19 @@ static const struct seq_operations bpf_iter_udp_seq_ops = {
>   	.stop		= bpf_iter_udp_seq_stop,
>   	.show		= bpf_iter_udp_seq_show,
>   };
> +
> +static unsigned short seq_file_family(const struct seq_file *seq)

The same here because seq_sk_match() uses seq_file_family().

> +{
> +	const struct udp_seq_afinfo *afinfo;
> +
> +	/* BPF iterator: bpf programs to filter sockets. */
> +	if (seq->op == &bpf_iter_udp_seq_ops)

Note that bpf_iter_udp_seq_ops is only defined under CONFIG_BPF_SYSCALL though.
See how the seq_file_family() is handling it in tcp_ipv4.c.

> +		return AF_UNSPEC;
> +
> +	/* Proc fs iterator */
> +	afinfo = pde_data(file_inode(seq->file));
> +	return afinfo->family;
> +}
>   #endif
>   
>   const struct seq_operations udp_seq_ops = {
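
For illustration, a rough sketch of the tcp_ipv4.c-style arrangement being suggested (not the actual fix; it assumes the same non-inline forward declaration of seq_file_family() that tcp_ipv4.c uses, and the exact placement in udp.c is for the next revision to decide):

/* Under CONFIG_PROC_FS: define seq_sk_match() up front (no inline, no
 * forward declaration of it), and keep the bpf_iter check inside
 * seq_file_family() behind CONFIG_BPF_SYSCALL so the file still builds
 * when BPF is disabled.
 */
static unsigned short seq_file_family(const struct seq_file *seq);

static bool seq_sk_match(struct seq_file *seq, const struct sock *sk)
{
	unsigned short family = seq_file_family(seq);

	/* AF_UNSPEC is used as a match all */
	return ((family == AF_UNSPEC || family == sk->sk_family) &&
		net_eq(sock_net(sk), seq_file_net(seq)));
}

/* ... udp_get_first()/udp_get_next() and, under CONFIG_BPF_SYSCALL,
 * bpf_iter_udp_seq_ops are defined in between ...
 */

static unsigned short seq_file_family(const struct seq_file *seq)
{
	const struct udp_seq_afinfo *afinfo;

#ifdef CONFIG_BPF_SYSCALL
	/* BPF iterator: bpf programs to filter sockets. */
	if (seq->op == &bpf_iter_udp_seq_ops)
		return AF_UNSPEC;
#endif

	/* Proc fs iterator */
	afinfo = pde_data(file_inode(seq->file));
	return afinfo->family;
}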


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v5 bpf-next 4/7] bpf: udp: Implement batching for sockets iterator
  2023-03-30 15:17 ` [PATCH v5 bpf-next 4/7] bpf: udp: Implement batching for sockets iterator Aditi Ghag
@ 2023-03-31 21:08   ` Martin KaFai Lau
  2023-04-03 15:54     ` Aditi Ghag
  0 siblings, 1 reply; 33+ messages in thread
From: Martin KaFai Lau @ 2023-03-31 21:08 UTC (permalink / raw)
  To: Aditi Ghag; +Cc: kafai, sdf, edumazet, Martin KaFai Lau, bpf

On 3/30/23 8:17 AM, Aditi Ghag wrote:
> +static int bpf_iter_udp_realloc_batch(struct bpf_udp_iter_state *iter,
> +				      unsigned int new_batch_sz);
>   
>   static inline bool seq_sk_match(struct seq_file *seq, const struct sock *sk)
>   {
> @@ -3151,6 +3163,149 @@ static inline bool seq_sk_match(struct seq_file *seq, const struct sock *sk)
>   		net_eq(sock_net(sk), seq_file_net(seq)));
>   }
>   
> +static struct sock *bpf_iter_udp_batch(struct seq_file *seq)
> +{
> +	struct bpf_udp_iter_state *iter = seq->private;
> +	struct udp_iter_state *state = &iter->state;
> +	struct net *net = seq_file_net(seq);
> +	struct sock *first_sk = NULL;
> +	struct udp_seq_afinfo afinfo;
> +	struct udp_table *udptable;
> +	unsigned int batch_sks = 0;
> +	bool resized = false;
> +	struct sock *sk;
> +	int offset = 0;
> +	int new_offset;
> +
> +	/* The current batch is done, so advance the bucket. */
> +	if (iter->st_bucket_done) {
> +		state->bucket++;
> +		iter->offset = 0;
> +	}
> +
> +	afinfo.family = AF_UNSPEC;
> +	afinfo.udp_table = NULL;
> +	udptable = udp_get_table_afinfo(&afinfo, net);
> +
> +	if (state->bucket > udptable->mask) {

This test looks unnecessary. The for-loop below should take care of this case?

> +		state->bucket = 0;

Resetting state->bucket here looks suspicious (or at least unnecessary) also. The
iterator cannot restart from the beginning, or am I missing something here? This
at least requires a comment if it is really needed.

> +		iter->offset = 0;
> +		return NULL;
> +	}
> +
> +again:
> +	/* New batch for the next bucket.
> +	 * Iterate over the hash table to find a bucket with sockets matching
> +	 * the iterator attributes, and return the first matching socket from
> +	 * the bucket. The remaining matched sockets from the bucket are batched
> +	 * before releasing the bucket lock. This allows BPF programs that are
> +	 * called in seq_show to acquire the bucket lock if needed.
> +	 */
> +	iter->cur_sk = 0;
> +	iter->end_sk = 0;
> +	iter->st_bucket_done = false;
> +	first_sk = NULL;
> +	batch_sks = 0;
> +	offset = iter->offset;
> +
> +	for (; state->bucket <= udptable->mask; state->bucket++) {
> +		struct udp_hslot *hslot2 = &udptable->hash2[state->bucket];
> +
> +		if (hlist_empty(&hslot2->head)) {
> +			offset = 0;
> +			continue;
> +		}
> +		new_offset = offset;
> +
> +		spin_lock_bh(&hslot2->lock);
> +		udp_portaddr_for_each_entry(sk, &hslot2->head) {
> +			if (seq_sk_match(seq, sk)) {
> +				/* Resume from the last iterated socket at the
> +				 * offset in the bucket before iterator was stopped.
> +				 */
> +				if (offset) {
> +					--offset;
> +					continue;
> +				}
> +				if (!first_sk)
> +					first_sk = sk;
> +				if (iter->end_sk < iter->max_sk) {
> +					sock_hold(sk);
> +					iter->batch[iter->end_sk++] = sk;
> +				}
> +				batch_sks++;
> +				new_offset++;
> +			}
> +		}
> +		spin_unlock_bh(&hslot2->lock);
> +
> +		if (first_sk)
> +			break;
> +
> +		/* Reset the current bucket's offset before moving to the next bucket. */
> +		offset = 0;
> +	}
> +
> +	/* All done: no batch made. */
> +	if (!first_sk)

Testing !iter->end_sk should be the same?

> +		goto ret;
> +
> +	if (iter->end_sk == batch_sks) {
> +		/* Batching is done for the current bucket; return the first
> +		 * socket to be iterated from the batch.
> +		 */
> +		iter->st_bucket_done = true;
> +		goto ret;
> +	}
> +	if (!resized && !bpf_iter_udp_realloc_batch(iter, batch_sks * 3 / 2)) {
> +		resized = true;
> +		/* Go back to the previous bucket to resize its batch. */
> +		state->bucket--;
> +		goto again;
> +	}
> +ret:
> +	iter->offset = new_offset;

hmm... updating iter->offset looks not right and,
does it need a new_offset?

afaict, either

a) it can resume at the old bucket. In this case, the iter->offset should not 
change.

or

b) it moved to the next bucket and iter->offset should be 0.

> +	return first_sk;

&iter->batch[0] is the first_sk. 'first_sk' variable is not needed then.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v5 bpf-next 6/7] selftests/bpf: Add helper to get port using getsockname
  2023-03-30 18:41   ` Stanislav Fomichev
@ 2023-03-31 21:37     ` Martin KaFai Lau
  0 siblings, 0 replies; 33+ messages in thread
From: Martin KaFai Lau @ 2023-03-31 21:37 UTC (permalink / raw)
  To: Stanislav Fomichev, Aditi Ghag; +Cc: bpf, kafai, edumazet

On 3/30/23 11:41 AM, Stanislav Fomichev wrote:
>> +int get_sock_port6(int sock_fd, __u16 *out_port)
>> +{
>> +    struct sockaddr_in6 addr = {};
>> +    socklen_t addr_len = sizeof(addr);
>> +    int err;
>> +
>> +    err = getsockname(sock_fd, (struct sockaddr *)&addr, &addr_len);
>> +    if (err < 0)
>> +        return err;
>> +    *out_port = addr.sin6_port;
> 
> The rest of the helpers don't usually care about v4 vs v6.
> Making it work for both v4 and v6 seems trivial, so maybe let's do it?

A nit on top of this. Rename it to 'int get_local_port(int sock_fd)' such that 
it is clear which port it is getting.

> 
>> +
>> +    return err;
>> +}
>> diff --git a/tools/testing/selftests/bpf/network_helpers.h 
>> b/tools/testing/selftests/bpf/network_helpers.h
>> index f882c691b790..2ab3b50de0b7 100644
>> --- a/tools/testing/selftests/bpf/network_helpers.h
>> +++ b/tools/testing/selftests/bpf/network_helpers.h
>> @@ -56,6 +56,7 @@ int fastopen_connect(int server_fd, const char *data, 
>> unsigned int data_len,
>>   int make_sockaddr(int family, const char *addr_str, __u16 port,
>>             struct sockaddr_storage *addr, socklen_t *len);
>>   char *ping_command(int family);
>> +int get_sock_port6(int sock_fd, __u16 *out_port);
> 
>>   struct nstoken;
>>   /**
>> -- 
>> 2.34.1
> 
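
For illustration, the combined suggestion might look roughly like the sketch below in network_helpers.c (the name get_local_port and the return convention are only assumptions here):

/* Return the local (bound) port of sock_fd in network byte order, or -1 on
 * error.  Works for both AF_INET and AF_INET6 sockets.
 */
int get_local_port(int sock_fd)
{
	struct sockaddr_storage addr = {};
	socklen_t addrlen = sizeof(addr);

	if (getsockname(sock_fd, (struct sockaddr *)&addr, &addrlen) < 0)
		return -1;

	if (addr.ss_family == AF_INET)
		return ((struct sockaddr_in *)&addr)->sin_port;
	if (addr.ss_family == AF_INET6)
		return ((struct sockaddr_in6 *)&addr)->sin6_port;

	return -1;
}

The selftest would then do something along the lines of skel->bss->serv_port = ntohs(get_local_port(listen_fds[0])); and drop the v6-only helper.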


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v5 bpf-next 5/7] bpf: Add bpf_sock_destroy kfunc
  2023-03-30 15:17 ` [PATCH v5 bpf-next 5/7] bpf: Add bpf_sock_destroy kfunc Aditi Ghag
@ 2023-03-31 22:24   ` Martin KaFai Lau
  2023-04-04  6:09     ` [RFC PATCH bpf-next] bpf: Add a kfunc filter function to 'struct btf_kfunc_id_set' Martin KaFai Lau
  0 siblings, 1 reply; 33+ messages in thread
From: Martin KaFai Lau @ 2023-03-31 22:24 UTC (permalink / raw)
  To: Aditi Ghag, bpf; +Cc: kafai, sdf, edumazet

On 3/30/23 8:17 AM, Aditi Ghag wrote:
> +/* bpf_sock_destroy: Destroy the given socket with ECONNABORTED error code.
> + *
> + * The helper expects a non-NULL pointer to a socket. It invokes the
> + * protocol specific socket destroy handlers.
> + *
> + * The helper can only be called from BPF contexts that have acquired the socket
> + * locks.
> + *
> + * Parameters:
> + * @sock: Pointer to socket to be destroyed
> + *
> + * Return:
> + * On error, may return EPROTONOSUPPORT, EINVAL.
> + * EPROTONOSUPPORT if protocol specific destroy handler is not implemented.
> + * 0 otherwise
> + */
> +__bpf_kfunc int bpf_sock_destroy(struct sock_common *sock)
> +{
> +	struct sock *sk = (struct sock *)sock;
> +
> +	if (!sk)
> +		return -EINVAL;
> +
> +	/* The locking semantics that allow for synchronous execution of the
> +	 * destroy handlers are only supported for TCP and UDP.
> +	 */
> +	if (!sk->sk_prot->diag_destroy || sk->sk_protocol == IPPROTO_RAW)
> +		return -EOPNOTSUPP;
> +
> +	return sk->sk_prot->diag_destroy(sk, ECONNABORTED);
> +}
> +
> +__diag_pop()
> +
> +BTF_SET8_START(sock_destroy_kfunc_set)
> +BTF_ID_FLAGS(func, bpf_sock_destroy)
> +BTF_SET8_END(sock_destroy_kfunc_set)
> +
> +static const struct btf_kfunc_id_set bpf_sock_destroy_kfunc_set = {
> +	.owner = THIS_MODULE,
> +	.set   = &sock_destroy_kfunc_set,
> +};
> +
> +static int init_subsystem(void)
> +{
> +	return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &bpf_sock_destroy_kfunc_set);

This still needs to be restricted to BPF_TRACE_ITER which has held the 
lock_sock. Maybe add a BTF_KFUNC_HOOK_TRACING_ITER. Then, 
register_btf_kfunc_id_set() and btf_kfunc_id_set_contains() need to consider the 
prog->expected_attach_type also?


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v5 bpf-next 7/7] selftests/bpf: Test bpf_sock_destroy
  2023-03-30 18:46   ` Stanislav Fomichev
@ 2023-03-31 22:32     ` Martin KaFai Lau
  2023-04-03 15:55       ` Aditi Ghag
  0 siblings, 1 reply; 33+ messages in thread
From: Martin KaFai Lau @ 2023-03-31 22:32 UTC (permalink / raw)
  To: Aditi Ghag; +Cc: bpf, kafai, edumazet, Stanislav Fomichev

On 3/30/23 11:46 AM, Stanislav Fomichev wrote:
>> +void test_sock_destroy(void)
>> +{
>> +    struct sock_destroy_prog *skel;
>> +    int cgroup_fd = 0;
>> +
>> +    skel = sock_destroy_prog__open_and_load();
>> +    if (!ASSERT_OK_PTR(skel, "skel_open"))
>> +        return;
>> +
>> +    cgroup_fd = test__join_cgroup("/sock_destroy");

Please run this test in its own netns also to avoid affecting other tests as 
much as possible.

>> +    if (!ASSERT_GE(cgroup_fd, 0, "join_cgroup"))
>> +        goto close_cgroup_fd;
>> +
>> +    skel->links.sock_connect = bpf_program__attach_cgroup(
>> +        skel->progs.sock_connect, cgroup_fd);
>> +    if (!ASSERT_OK_PTR(skel->links.sock_connect, "prog_attach"))
>> +        goto close_cgroup_fd;
>> +
>> +    if (test__start_subtest("tcp_client"))
>> +        test_tcp_client(skel);
>> +    if (test__start_subtest("tcp_server"))
>> +        test_tcp_server(skel);
>> +    if (test__start_subtest("udp_client"))
>> +        test_udp_client(skel);
>> +    if (test__start_subtest("udp_server"))
>> +        test_udp_server(skel);
>> +
>> +
>> +close_cgroup_fd:
>> +    close(cgroup_fd);
>> +    sock_destroy_prog__destroy(skel);
>> +}


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v5 bpf-next 3/7] udp: seq_file: Helper function to match socket attributes
  2023-03-30 15:17 ` [PATCH v5 bpf-next 3/7] udp: seq_file: Helper function to match socket attributes Aditi Ghag
                     ` (3 preceding siblings ...)
  2023-03-31 20:09   ` Martin KaFai Lau
@ 2023-04-02  6:18   ` kernel test robot
  4 siblings, 0 replies; 33+ messages in thread
From: kernel test robot @ 2023-04-02  6:18 UTC (permalink / raw)
  To: Aditi Ghag, bpf; +Cc: oe-kbuild-all, kafai, sdf, edumazet, aditi.ghag

Hi Aditi,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on bpf-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Aditi-Ghag/bpf-tcp-Avoid-taking-fast-sock-lock-in-iterator/20230330-232137
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link:    https://lore.kernel.org/r/20230330151758.531170-4-aditi.ghag%40isovalent.com
patch subject: [PATCH v5 bpf-next 3/7] udp: seq_file: Helper function to match socket attributes
config: i386-randconfig-s001 (https://download.01.org/0day-ci/archive/20230402/202304021454.snASElSi-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-8) 11.3.0
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.4-39-gce1a6720-dirty
        # https://github.com/intel-lab-lkp/linux/commit/66cc617bebf6cb3d2162587d6e248d86bc59c1c2
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Aditi-Ghag/bpf-tcp-Avoid-taking-fast-sock-lock-in-iterator/20230330-232137
        git checkout 66cc617bebf6cb3d2162587d6e248d86bc59c1c2
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=i386 olddefconfig
        make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=i386 SHELL=/bin/bash net/ipv4/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202304021454.snASElSi-lkp@intel.com/

sparse warnings: (new ones prefixed by >>)
   net/ipv4/udp.c:1487:28: sparse: sparse: context imbalance in 'udp_rmem_release' - unexpected unlock
   net/ipv4/udp.c:1519:19: sparse: sparse: context imbalance in 'busylock_acquire' - wrong count at exit
   net/ipv4/udp.c:1531:28: sparse: sparse: context imbalance in 'busylock_release' - unexpected unlock
>> net/ipv4/udp.c:2986:32: sparse: sparse: marked inline, but without a definition
>> net/ipv4/udp.c:2986:32: sparse: sparse: marked inline, but without a definition

vim +2986 net/ipv4/udp.c

  2985	
> 2986	static inline bool seq_sk_match(struct seq_file *seq, const struct sock *sk);
  2987	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v5 bpf-next 3/7] udp: seq_file: Helper function to match socket attributes
  2023-03-31 20:09   ` Martin KaFai Lau
@ 2023-04-03 15:27     ` Aditi Ghag
  0 siblings, 0 replies; 33+ messages in thread
From: Aditi Ghag @ 2023-04-03 15:27 UTC (permalink / raw)
  To: Martin KaFai Lau; +Cc: kafai, sdf, edumazet, bpf



> On Mar 31, 2023, at 1:09 PM, Martin KaFai Lau <martin.lau@linux.dev> wrote:
> 
> On 3/30/23 8:17 AM, Aditi Ghag wrote:
>> This is a preparatory commit to refactor code that matches socket
>> attributes in iterators to a helper function, and use it in the
>> proc fs iterator.
>> Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
>> ---
>>  net/ipv4/udp.c | 35 ++++++++++++++++++++++++++++-------
>>  1 file changed, 28 insertions(+), 7 deletions(-)
>> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
>> index c574c8c17ec9..cead4acb64c6 100644
>> --- a/net/ipv4/udp.c
>> +++ b/net/ipv4/udp.c
>> @@ -2983,6 +2983,8 @@ EXPORT_SYMBOL(udp_prot);
>>  /* ------------------------------------------------------------------------ */
>>  #ifdef CONFIG_PROC_FS
>>  +static inline bool seq_sk_match(struct seq_file *seq, const struct sock *sk);
>> +
>>  static struct udp_table *udp_get_table_afinfo(struct udp_seq_afinfo *afinfo,
>>  					      struct net *net)
>>  {
>> @@ -3010,10 +3012,7 @@ static struct sock *udp_get_first(struct seq_file *seq, int start)
>>    		spin_lock_bh(&hslot->lock);
>>  		sk_for_each(sk, &hslot->head) {
>> -			if (!net_eq(sock_net(sk), net))
>> -				continue;
>> -			if (afinfo->family == AF_UNSPEC ||
>> -			    sk->sk_family == afinfo->family)
>> +			if (seq_sk_match(seq, sk))
>>  				goto found;
>>  		}
>>  		spin_unlock_bh(&hslot->lock);
>> @@ -3034,9 +3033,7 @@ static struct sock *udp_get_next(struct seq_file *seq, struct sock *sk)
>>    	do {
>>  		sk = sk_next(sk);
>> -	} while (sk && (!net_eq(sock_net(sk), net) ||
>> -			(afinfo->family != AF_UNSPEC &&
>> -			 sk->sk_family != afinfo->family)));
>> +	} while (sk && !seq_sk_match(seq, sk));
>>    	if (!sk) {
>>  		udptable = udp_get_table_afinfo(afinfo, net);
>> @@ -3143,6 +3140,17 @@ struct bpf_iter__udp {
>>  	int bucket __aligned(8);
>>  };
>>  +static unsigned short seq_file_family(const struct seq_file *seq);
>> +
>> +static inline bool seq_sk_match(struct seq_file *seq, const struct sock *sk)
> 
> This won't work under CONFIG_BPF_SYSCALL. The bot also complained.
> It has to be under CONFIG_PROC_FS.
> 
> No need to use inline and leave it to compiler.
> 
> The earlier forward declaration is unnecessary. Define the function earlier instead.


Yes, I got notifications from the kernel test bot, so I've already made the necessary changes.

> 
>> +{
>> +	unsigned short family = seq_file_family(seq);
>> +
>> +	/* AF_UNSPEC is used as a match all */
>> +	return ((family == AF_UNSPEC || family == sk->sk_family) &&
>> +		net_eq(sock_net(sk), seq_file_net(seq)));
>> +}
>> +
>>  static int udp_prog_seq_show(struct bpf_prog *prog, struct bpf_iter_meta *meta,
>>  			     struct udp_sock *udp_sk, uid_t uid, int bucket)
>>  {
>> @@ -3194,6 +3202,19 @@ static const struct seq_operations bpf_iter_udp_seq_ops = {
>>  	.stop		= bpf_iter_udp_seq_stop,
>>  	.show		= bpf_iter_udp_seq_show,
>>  };
>> +
>> +static unsigned short seq_file_family(const struct seq_file *seq)
> 
> The same here because seq_sk_match() uses seq_file_family().
> 
>> +{
>> +	const struct udp_seq_afinfo *afinfo;
>> +
>> +	/* BPF iterator: bpf programs to filter sockets. */
>> +	if (seq->op == &bpf_iter_udp_seq_ops)
> 
> Note that bpf_iter_udp_seq_ops is only defined under CONFIG_BPF_SYSCALL though.
> See how the seq_file_family() is handling it in tcp_ipv4.c.

Same as above. 

> 
>> +		return AF_UNSPEC;
>> +
>> +	/* Proc fs iterator */
>> +	afinfo = pde_data(file_inode(seq->file));
>> +	return afinfo->family;
>> +}
>>  #endif
>>    const struct seq_operations udp_seq_ops = {


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v5 bpf-next 4/7] bpf: udp: Implement batching for sockets iterator
  2023-03-31 21:08   ` Martin KaFai Lau
@ 2023-04-03 15:54     ` Aditi Ghag
  2023-04-03 19:20       ` Martin KaFai Lau
  0 siblings, 1 reply; 33+ messages in thread
From: Aditi Ghag @ 2023-04-03 15:54 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: kafai, Stanislav Fomichev, edumazet, Martin KaFai Lau, bpf



> On Mar 31, 2023, at 2:08 PM, Martin KaFai Lau <martin.lau@linux.dev> wrote:
> 
> On 3/30/23 8:17 AM, Aditi Ghag wrote:
>> +static int bpf_iter_udp_realloc_batch(struct bpf_udp_iter_state *iter,
>> +				      unsigned int new_batch_sz);
>>    static inline bool seq_sk_match(struct seq_file *seq, const struct sock *sk)
>>  {
>> @@ -3151,6 +3163,149 @@ static inline bool seq_sk_match(struct seq_file *seq, const struct sock *sk)
>>  		net_eq(sock_net(sk), seq_file_net(seq)));
>>  }
>>  +static struct sock *bpf_iter_udp_batch(struct seq_file *seq)
>> +{
>> +	struct bpf_udp_iter_state *iter = seq->private;
>> +	struct udp_iter_state *state = &iter->state;
>> +	struct net *net = seq_file_net(seq);
>> +	struct sock *first_sk = NULL;
>> +	struct udp_seq_afinfo afinfo;
>> +	struct udp_table *udptable;
>> +	unsigned int batch_sks = 0;
>> +	bool resized = false;
>> +	struct sock *sk;
>> +	int offset = 0;
>> +	int new_offset;
>> +
>> +	/* The current batch is done, so advance the bucket. */
>> +	if (iter->st_bucket_done) {
>> +		state->bucket++;
>> +		iter->offset = 0;
>> +	}
>> +
>> +	afinfo.family = AF_UNSPEC;
>> +	afinfo.udp_table = NULL;
>> +	udptable = udp_get_table_afinfo(&afinfo, net);
>> +
>> +	if (state->bucket > udptable->mask) {
> 
> This test looks unnecessary. The for-loop below should take care of this case?

We could return early in case the iterator has reached the end of the hash table. I suppose the bucket should only be reset when the user stops and starts a new iterator round.

> 
>> +		state->bucket = 0;
> 
> Reset state->bucket here looks suspicious (or at least unnecessary) also. The iterator cannot restart from the beginning. or I am missing something here? This at least requires a comment if it is really needed. 
> 
>> +		iter->offset = 0;
>> +		return NULL;
>> +	}
>> +
>> +again:
>> +	/* New batch for the next bucket.
>> +	 * Iterate over the hash table to find a bucket with sockets matching
>> +	 * the iterator attributes, and return the first matching socket from
>> +	 * the bucket. The remaining matched sockets from the bucket are batched
>> +	 * before releasing the bucket lock. This allows BPF programs that are
>> +	 * called in seq_show to acquire the bucket lock if needed.
>> +	 */
>> +	iter->cur_sk = 0;
>> +	iter->end_sk = 0;
>> +	iter->st_bucket_done = false;
>> +	first_sk = NULL;
>> +	batch_sks = 0;
>> +	offset = iter->offset;
>> +
>> +	for (; state->bucket <= udptable->mask; state->bucket++) {
>> +		struct udp_hslot *hslot2 = &udptable->hash2[state->bucket];
>> +
>> +		if (hlist_empty(&hslot2->head)) {
>> +			offset = 0;
>> +			continue;
>> +		}
>> +		new_offset = offset;
>> +
>> +		spin_lock_bh(&hslot2->lock);
>> +		udp_portaddr_for_each_entry(sk, &hslot2->head) {
>> +			if (seq_sk_match(seq, sk)) {
>> +				/* Resume from the last iterated socket at the
>> +				 * offset in the bucket before iterator was stopped.
>> +				 */
>> +				if (offset) {
>> +					--offset;
>> +					continue;
>> +				}
>> +				if (!first_sk)
>> +					first_sk = sk;
>> +				if (iter->end_sk < iter->max_sk) {
>> +					sock_hold(sk);
>> +					iter->batch[iter->end_sk++] = sk;
>> +				}
>> +				batch_sks++;
>> +				new_offset++;
>> +			}
>> +		}
>> +		spin_unlock_bh(&hslot2->lock);
>> +
>> +		if (first_sk)
>> +			break;
>> +
>> +		/* Reset the current bucket's offset before moving to the next bucket. */
>> +		offset = 0;
>> +	}
>> +
>> +	/* All done: no batch made. */
>> +	if (!first_sk)
> 
> Testing !iter->end_sk should be the same?

Sure, that could work too.

> 
>> +		goto ret;
>> +
>> +	if (iter->end_sk == batch_sks) {
>> +		/* Batching is done for the current bucket; return the first
>> +		 * socket to be iterated from the batch.
>> +		 */
>> +		iter->st_bucket_done = true;
>> +		goto ret;
>> +	}
>> +	if (!resized && !bpf_iter_udp_realloc_batch(iter, batch_sks * 3 / 2)) {
>> +		resized = true;
>> +		/* Go back to the previous bucket to resize its batch. */
>> +		state->bucket--;
>> +		goto again;
>> +	}
>> +ret:
>> +	iter->offset = new_offset;
> 
> hmm... updating iter->offset looks not right and,
> does it need a new_offset?
> 

This is a leftover from earlier versions. :( Sorry, I didn't do my due diligence here. 

> afaict, either
> 
> a) it can resume at the old bucket. In this case, the iter->offset should not change.
> 
> or
> 
> b) it moved to the next bucket and iter->offset should be 0.

Yes, that's the behavior I had in mind as well. 

> 
>> +	return first_sk;
> 
> &iter->batch[0] is the first_sk. 'first_sk' variable is not needed then.

It's possible that we didn't find any socket to return, or the resized batch didn't go through, so we can't always rely on iter->batch[0]. As an alternative, we could return early when a socket is found. Anyway, either option seems fine.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v5 bpf-next 7/7] selftests/bpf: Test bpf_sock_destroy
  2023-03-31 22:32     ` Martin KaFai Lau
@ 2023-04-03 15:55       ` Aditi Ghag
  2023-04-03 17:35         ` Martin KaFai Lau
  0 siblings, 1 reply; 33+ messages in thread
From: Aditi Ghag @ 2023-04-03 15:55 UTC (permalink / raw)
  To: Martin KaFai Lau; +Cc: bpf, kafai, edumazet, Stanislav Fomichev



> On Mar 31, 2023, at 3:32 PM, Martin KaFai Lau <martin.lau@linux.dev> wrote:
> 
> On 3/30/23 11:46 AM, Stanislav Fomichev wrote:
>>> +void test_sock_destroy(void)
>>> +{
>>> +    struct sock_destroy_prog *skel;
>>> +    int cgroup_fd = 0;
>>> +
>>> +    skel = sock_destroy_prog__open_and_load();
>>> +    if (!ASSERT_OK_PTR(skel, "skel_open"))
>>> +        return;
>>> +
>>> +    cgroup_fd = test__join_cgroup("/sock_destroy");
> 
> Please run this test in its own netns also to avoid affecting other tests as much as possible.

Is it okay if I defer this nit to a follow-up patch? It's not conflicting with other tests at the moment. 

> 
>>> +    if (!ASSERT_GE(cgroup_fd, 0, "join_cgroup"))
>>> +        goto close_cgroup_fd;
>>> +
>>> +    skel->links.sock_connect = bpf_program__attach_cgroup(
>>> +        skel->progs.sock_connect, cgroup_fd);
>>> +    if (!ASSERT_OK_PTR(skel->links.sock_connect, "prog_attach"))
>>> +        goto close_cgroup_fd;
>>> +
>>> +    if (test__start_subtest("tcp_client"))
>>> +        test_tcp_client(skel);
>>> +    if (test__start_subtest("tcp_server"))
>>> +        test_tcp_server(skel);
>>> +    if (test__start_subtest("udp_client"))
>>> +        test_udp_client(skel);
>>> +    if (test__start_subtest("udp_server"))
>>> +        test_udp_server(skel);
>>> +
>>> +
>>> +close_cgroup_fd:
>>> +    close(cgroup_fd);
>>> +    sock_destroy_prog__destroy(skel);
>>> +}
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v5 bpf-next 7/7] selftests/bpf: Test bpf_sock_destroy
  2023-04-03 15:55       ` Aditi Ghag
@ 2023-04-03 17:35         ` Martin KaFai Lau
  2023-04-04  0:15           ` Aditi Ghag
  0 siblings, 1 reply; 33+ messages in thread
From: Martin KaFai Lau @ 2023-04-03 17:35 UTC (permalink / raw)
  To: Aditi Ghag; +Cc: bpf, kafai, edumazet, Stanislav Fomichev

On 4/3/23 8:55 AM, Aditi Ghag wrote:
> 
> 
>> On Mar 31, 2023, at 3:32 PM, Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>
>> On 3/30/23 11:46 AM, Stanislav Fomichev wrote:
>>>> +void test_sock_destroy(void)
>>>> +{
>>>> +    struct sock_destroy_prog *skel;
>>>> +    int cgroup_fd = 0;
>>>> +
>>>> +    skel = sock_destroy_prog__open_and_load();
>>>> +    if (!ASSERT_OK_PTR(skel, "skel_open"))
>>>> +        return;
>>>> +
>>>> +    cgroup_fd = test__join_cgroup("/sock_destroy");
>>
>> Please run this test in its own netns also to avoid affecting other tests as much as possible.
> 
> Is it okay if I defer this nit to a follow-up patch? It's not conflicting with other tests at the moment.

Are you sure it won't affect other tests when running in parallel (test -j)?
This test iterates over everything in the current netns and only checks the port before
aborting.

It is easy to add, e.g. ~10 lines of code can be borrowed from
prog_tests/mptcp.c, which recently did the same in commit 02d6a057c7be to
run the test in its own netns to set a sysctl.
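
For illustration, the scaffolding could look roughly like the sketch below (the netns name and helper names are made up for this example; error handling is kept minimal):

#define TEST_NS "sock_destroy_netns"

static struct nstoken *create_netns(void)
{
	/* Create a private netns and enter it so the iterator only ever
	 * sees sockets created by this test.
	 */
	if (!ASSERT_OK(system("ip netns add " TEST_NS), "add netns"))
		return NULL;
	if (!ASSERT_OK(system("ip -net " TEST_NS " link set dev lo up"),
		       "bring up lo"))
		goto cleanup;
	return open_netns(TEST_NS);
cleanup:
	system("ip netns del " TEST_NS);
	return NULL;
}

static void cleanup_netns(struct nstoken *nstoken)
{
	if (nstoken)
		close_netns(nstoken);
	system("ip netns del " TEST_NS);
}

test_sock_destroy() would then call create_netns() after attaching the cgroup program and cleanup_netns() before destroying the skeleton.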


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v5 bpf-next 4/7] bpf: udp: Implement batching for sockets iterator
  2023-04-03 15:54     ` Aditi Ghag
@ 2023-04-03 19:20       ` Martin KaFai Lau
  0 siblings, 0 replies; 33+ messages in thread
From: Martin KaFai Lau @ 2023-04-03 19:20 UTC (permalink / raw)
  To: Aditi Ghag; +Cc: Stanislav Fomichev, edumazet, Martin KaFai Lau, bpf

On 4/3/23 8:54 AM, Aditi Ghag wrote:
> 
> 
>> On Mar 31, 2023, at 2:08 PM, Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>
>> On 3/30/23 8:17 AM, Aditi Ghag wrote:
>>> +static int bpf_iter_udp_realloc_batch(struct bpf_udp_iter_state *iter,
>>> +				      unsigned int new_batch_sz);
>>>     static inline bool seq_sk_match(struct seq_file *seq, const struct sock *sk)
>>>   {
>>> @@ -3151,6 +3163,149 @@ static inline bool seq_sk_match(struct seq_file *seq, const struct sock *sk)
>>>   		net_eq(sock_net(sk), seq_file_net(seq)));
>>>   }
>>>   +static struct sock *bpf_iter_udp_batch(struct seq_file *seq)
>>> +{
>>> +	struct bpf_udp_iter_state *iter = seq->private;
>>> +	struct udp_iter_state *state = &iter->state;
>>> +	struct net *net = seq_file_net(seq);
>>> +	struct sock *first_sk = NULL;
>>> +	struct udp_seq_afinfo afinfo;
>>> +	struct udp_table *udptable;
>>> +	unsigned int batch_sks = 0;
>>> +	bool resized = false;
>>> +	struct sock *sk;
>>> +	int offset = 0;
>>> +	int new_offset;
>>> +
>>> +	/* The current batch is done, so advance the bucket. */
>>> +	if (iter->st_bucket_done) {
>>> +		state->bucket++;
>>> +		iter->offset = 0;
>>> +	}
>>> +
>>> +	afinfo.family = AF_UNSPEC;
>>> +	afinfo.udp_table = NULL;
>>> +	udptable = udp_get_table_afinfo(&afinfo, net);
>>> +
>>> +	if (state->bucket > udptable->mask) {
>>
>> This test looks unnecessary. The for-loop below should take care of this case?
> 
> We could return early in case the iterator has reached the end of the hash table. I suppose reset of the bucket should only happen when user stops, and starts a new iterator round.

The bucket should not go back because bpf-iter cannot lseek back. That is why I was
confused by the following bucket reset back to zero.

It can just fall through to the following for-loop and...

> 
>>
>>> +		state->bucket = 0;
>>
>> Reset state->bucket here looks suspicious (or at least unnecessary) also. The iterator cannot restart from the beginning. or I am missing something here? This at least requires a comment if it is really needed.
>>
>>> +		iter->offset = 0;
>>> +		return NULL;
>>> +	}
>>> +
>>> +again:
>>> +	/* New batch for the next bucket.
>>> +	 * Iterate over the hash table to find a bucket with sockets matching
>>> +	 * the iterator attributes, and return the first matching socket from
>>> +	 * the bucket. The remaining matched sockets from the bucket are batched
>>> +	 * before releasing the bucket lock. This allows BPF programs that are
>>> +	 * called in seq_show to acquire the bucket lock if needed.
>>> +	 */
>>> +	iter->cur_sk = 0;
>>> +	iter->end_sk = 0;
>>> +	iter->st_bucket_done = false;
>>> +	first_sk = NULL;
>>> +	batch_sks = 0;
>>> +	offset = iter->offset;
>>> +
>>> +	for (; state->bucket <= udptable->mask; state->bucket++) {
>>> +		struct udp_hslot *hslot2 = &udptable->hash2[state->bucket];
>>> +
>>> +		if (hlist_empty(&hslot2->head)) {
>>> +			offset = 0;
>>> +			continue;
>>> +		}
>>> +		new_offset = offset;
>>> +
>>> +		spin_lock_bh(&hslot2->lock);
>>> +		udp_portaddr_for_each_entry(sk, &hslot2->head) {
>>> +			if (seq_sk_match(seq, sk)) {
>>> +				/* Resume from the last iterated socket at the
>>> +				 * offset in the bucket before iterator was stopped.
>>> +				 */
>>> +				if (offset) {
>>> +					--offset;
>>> +					continue;
>>> +				}
>>> +				if (!first_sk)
>>> +					first_sk = sk;
>>> +				if (iter->end_sk < iter->max_sk) {
>>> +					sock_hold(sk);
>>> +					iter->batch[iter->end_sk++] = sk;
>>> +				}
>>> +				batch_sks++;
>>> +				new_offset++;
>>> +			}
>>> +		}
>>> +		spin_unlock_bh(&hslot2->lock);
>>> +
>>> +		if (first_sk)
>>> +			break;
>>> +
>>> +		/* Reset the current bucket's offset before moving to the next bucket. */
>>> +		offset = 0;
>>> +	}
>>> +
>>> +	/* All done: no batch made. */
>>> +	if (!first_sk)
>>
>> Testing !iter->end_sk should be the same?
> 
> Sure, that could work too.
> 
>>
>>> +		goto ret;

... return NULL here because new_offset is not needed.

>>> +
>>> +	if (iter->end_sk == batch_sks) {
>>> +		/* Batching is done for the current bucket; return the first
>>> +		 * socket to be iterated from the batch.
>>> +		 */
>>> +		iter->st_bucket_done = true;
>>> +		goto ret;
>>> +	}
>>> +	if (!resized && !bpf_iter_udp_realloc_batch(iter, batch_sks * 3 / 2)) {
>>> +		resized = true;
>>> +		/* Go back to the previous bucket to resize its batch. */
>>> +		state->bucket--;
>>> +		goto again;
>>> +	}
>>> +ret:
>>> +	iter->offset = new_offset;
>>
>> hmm... updating iter->offset looks not right and,
>> does it need a new_offset?
>>
> 
> This is a leftover from earlier versions. :( Sorry, I didn't do my due diligence here.
> 
>> afaict, either
>>
>> a) it can resume at the old bucket. In this case, the iter->offset should not change.
>>
>> or
>>
>> b) it moved to the next bucket and iter->offset should be 0.
> 
> Yes, that's the behavior I had in mind as well.
> 
>>
>>> +	return first_sk;
>>
>> &iter->batch[0] is the first_sk. 'first_sk' variable is not needed then.
> 
> It's possible that we didn't find any socket to return, or resized batch didn't go through. So we can't always rely on iter->batch[0]. As an alternative, we could return early when a socket is found. Anyway either option seems fine.

yeah, if it didn't find any socket, I was assuming the earlier !first_sk (or 
!iter->end_sk) check should just directly return NULL because there is no need 
to change the iter->offset.

If the resize fails, it will return whatever is in iter->batch[0] anyway.

I was asking to use iter->batch[0] instead of having a first_sk because, when
I was trying to understand why 'new_offset++' was needed at all, the 'if
(!first_sk) first_sk = sk;' in the above 'udp_portaddr_for_each_entry' loop kept
coming back to my head: why does it need to be done, since first_sk is just an
alias of iter->batch[0]?
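
Putting the points above together, the tail of bpf_iter_udp_batch() could end up looking roughly like this (a sketch only; it assumes the for-loop was left with state->bucket still pointing at the bucket being batched):

	/* All done across the whole table: no batch was made, so there is
	 * nothing to show and iter->offset does not need to change.
	 */
	if (!iter->end_sk)
		return NULL;

	if (iter->end_sk == batch_sks) {
		/* The whole bucket fit into the batch; resume case (a):
		 * same bucket, iter->offset left untouched.
		 */
		iter->st_bucket_done = true;
		goto done;
	}

	if (!resized && !bpf_iter_udp_realloc_batch(iter, batch_sks * 3 / 2)) {
		resized = true;
		/* Retry the same bucket with the larger batch. */
		goto again;
	}

done:
	/* first_sk was always an alias of the first batched socket. */
	return iter->batch[0];

Case (b) (moving on to the next bucket) would be handled inside the for-loop by resetting iter->offset to 0 before advancing state->bucket, so no new_offset is needed anywhere.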

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v5 bpf-next 7/7] selftests/bpf: Test bpf_sock_destroy
  2023-04-03 17:35         ` Martin KaFai Lau
@ 2023-04-04  0:15           ` Aditi Ghag
  2023-04-04  1:41             ` Martin KaFai Lau
  0 siblings, 1 reply; 33+ messages in thread
From: Aditi Ghag @ 2023-04-04  0:15 UTC (permalink / raw)
  To: Martin KaFai Lau; +Cc: bpf, kafai, edumazet, Stanislav Fomichev



> On Apr 3, 2023, at 10:35 AM, Martin KaFai Lau <martin.lau@linux.dev> wrote:
> 
> On 4/3/23 8:55 AM, Aditi Ghag wrote:
>>> On Mar 31, 2023, at 3:32 PM, Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>> 
>>> On 3/30/23 11:46 AM, Stanislav Fomichev wrote:
>>>>> +void test_sock_destroy(void)
>>>>> +{
>>>>> +    struct sock_destroy_prog *skel;
>>>>> +    int cgroup_fd = 0;
>>>>> +
>>>>> +    skel = sock_destroy_prog__open_and_load();
>>>>> +    if (!ASSERT_OK_PTR(skel, "skel_open"))
>>>>> +        return;
>>>>> +
>>>>> +    cgroup_fd = test__join_cgroup("/sock_destroy");
>>> 
>>> Please run this test in its own netns also to avoid affecting other tests as much as possible.
>> Is it okay if I defer this nit to a follow-up patch? It's not conflicting with other tests at the moment.
> 
> Is it sure it won't affect other tests when running in parallel (test -j)?
> This test is iterating all in the current netns and only checks for port before aborting.
> 
> It is easy to add. eg. ~10 lines of codes can be borrowed from prog_tests/mptcp.c which has recently done the same in commit 02d6a057c7be to run the test in its own netns to set a sysctl.

I haven't run the tests in parallel, but the tests are not using hard-coded ports anymore. Anyway I'm willing to run it in a separate netns as a follow-up for additional guards, but do you still think it's a blocker for this patch?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v5 bpf-next 7/7] selftests/bpf: Test bpf_sock_destroy
  2023-04-04  0:15           ` Aditi Ghag
@ 2023-04-04  1:41             ` Martin KaFai Lau
  2023-04-04 14:24               ` Aditi Ghag
  0 siblings, 1 reply; 33+ messages in thread
From: Martin KaFai Lau @ 2023-04-04  1:41 UTC (permalink / raw)
  To: Aditi Ghag; +Cc: bpf, kafai, edumazet, Stanislav Fomichev

On 4/3/23 5:15 PM, Aditi Ghag wrote:
> 
> 
>> On Apr 3, 2023, at 10:35 AM, Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>
>> On 4/3/23 8:55 AM, Aditi Ghag wrote:
>>>> On Mar 31, 2023, at 3:32 PM, Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>>>
>>>> On 3/30/23 11:46 AM, Stanislav Fomichev wrote:
>>>>>> +void test_sock_destroy(void)
>>>>>> +{
>>>>>> +    struct sock_destroy_prog *skel;
>>>>>> +    int cgroup_fd = 0;
>>>>>> +
>>>>>> +    skel = sock_destroy_prog__open_and_load();
>>>>>> +    if (!ASSERT_OK_PTR(skel, "skel_open"))
>>>>>> +        return;
>>>>>> +
>>>>>> +    cgroup_fd = test__join_cgroup("/sock_destroy");
>>>>
>>>> Please run this test in its own netns also to avoid affecting other tests as much as possible.
>>> Is it okay if I defer this nit to a follow-up patch? It's not conflicting with other tests at the moment.
>>
>> Is it sure it won't affect other tests when running in parallel (test -j)?
>> This test is iterating all in the current netns and only checks for port before aborting.
>>
>> It is easy to add. eg. ~10 lines of codes can be borrowed from prog_tests/mptcp.c which has recently done the same in commit 02d6a057c7be to run the test in its own netns to set a sysctl.
> 
> I haven't run the tests in parallel, but the tests are not using hard-coded ports anymore. Anyway I'm willing to run it in a separate netns as a follow-up for additional guards, but do you still think it's a blocker for this patch?

Testing the port is not good enough. It is only ~10 lines of code that can be
borrowed from the other existing tests that I mentioned earlier. What is the reason
to cut corners here? The time spent replying on this topic is more than
enough to add the netns support. I don't want to spend time figuring out why
other tests running in parallel become flaky while waiting for the follow-up,
so no.

Please run the test in its own netns. All new network tests must run in their own
netns.

btw, I haven't heard any comment on patch 5 regarding restricting the
destroy kfunc to BPF_TRACE_ITER only. It is the major piece missing. I am
putting together some pseudo code that is more flexible than adding the
BTF_KFUNC_HOOK_TRACING_ITER that I mentioned earlier, to show how it may
look. Will update that patch's thread soon.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [RFC PATCH bpf-next] bpf: Add a kfunc filter function to 'struct btf_kfunc_id_set'.
  2023-03-31 22:24   ` Martin KaFai Lau
@ 2023-04-04  6:09     ` Martin KaFai Lau
  2023-04-05 15:05       ` Aditi Ghag
  2023-04-10 23:05       ` Aditi Ghag
  0 siblings, 2 replies; 33+ messages in thread
From: Martin KaFai Lau @ 2023-04-04  6:09 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Aditi Ghag,
	David Vernet, kernel-team

From: Martin KaFai Lau <martin.lau@kernel.org>

This set (https://lore.kernel.org/bpf/500d452b-f9d5-d01f-d365-2949c4fd37ab@linux.dev/)
needs to limit bpf_sock_destroy kfunc to BPF_TRACE_ITER.
In the earlier reply, I thought of adding a BTF_KFUNC_HOOK_TRACING_ITER.

Instead of adding BTF_KFUNC_HOOK_TRACING_ITER, I quickly hacked something
that added a callback filter to 'struct btf_kfunc_id_set'. The filter has
access to the prog such that it can filter by other properties of a prog.
The prog->expected_attach_type is used in the tracing_iter_filter().
It is mostly compile-tested only, so it is still very rough but should
be good enough to show the idea.

I would like to hear what others think. It is pretty much the only
piece left for the above-mentioned set.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
---
 include/linux/btf.h                           | 18 +++---
 kernel/bpf/btf.c                              | 59 +++++++++++++++----
 kernel/bpf/verifier.c                         |  7 ++-
 net/core/filter.c                             |  9 +++
 .../selftests/bpf/progs/sock_destroy_prog.c   | 10 ++++
 5 files changed, 83 insertions(+), 20 deletions(-)

diff --git a/include/linux/btf.h b/include/linux/btf.h
index d53b10cc55f2..84c31b4f5785 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -99,10 +99,14 @@ struct btf_type;
 union bpf_attr;
 struct btf_show;
 struct btf_id_set;
+struct bpf_prog;
+
+typedef int (*btf_kfunc_filter_t)(const struct bpf_prog *prog, u32 kfunc_id);
 
 struct btf_kfunc_id_set {
 	struct module *owner;
 	struct btf_id_set8 *set;
+	btf_kfunc_filter_t filter;
 };
 
 struct btf_id_dtor_kfunc {
@@ -482,7 +486,6 @@ static inline void *btf_id_set8_contains(const struct btf_id_set8 *set, u32 id)
 	return bsearch(&id, set->pairs, set->cnt, sizeof(set->pairs[0]), btf_id_cmp_func);
 }
 
-struct bpf_prog;
 struct bpf_verifier_log;
 
 #ifdef CONFIG_BPF_SYSCALL
@@ -490,10 +493,10 @@ const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id);
 const char *btf_name_by_offset(const struct btf *btf, u32 offset);
 struct btf *btf_parse_vmlinux(void);
 struct btf *bpf_prog_get_target_btf(const struct bpf_prog *prog);
-u32 *btf_kfunc_id_set_contains(const struct btf *btf,
-			       enum bpf_prog_type prog_type,
-			       u32 kfunc_btf_id);
-u32 *btf_kfunc_is_modify_return(const struct btf *btf, u32 kfunc_btf_id);
+u32 *btf_kfunc_id_set_contains(const struct btf *btf, u32 kfunc_btf_id,
+			       const struct bpf_prog *prog);
+u32 *btf_kfunc_is_modify_return(const struct btf *btf, u32 kfunc_btf_id,
+				const struct bpf_prog *prog);
 int register_btf_kfunc_id_set(enum bpf_prog_type prog_type,
 			      const struct btf_kfunc_id_set *s);
 int register_btf_fmodret_id_set(const struct btf_kfunc_id_set *kset);
@@ -520,8 +523,9 @@ static inline const char *btf_name_by_offset(const struct btf *btf,
 	return NULL;
 }
 static inline u32 *btf_kfunc_id_set_contains(const struct btf *btf,
-					     enum bpf_prog_type prog_type,
-					     u32 kfunc_btf_id)
+					     u32 kfunc_btf_id,
+					     struct bpf_prog *prog)
+
 {
 	return NULL;
 }
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index b7e5a5510b91..7685af3ca9c0 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -218,10 +218,17 @@ enum btf_kfunc_hook {
 enum {
 	BTF_KFUNC_SET_MAX_CNT = 256,
 	BTF_DTOR_KFUNC_MAX_CNT = 256,
+	BTF_KFUNC_FILTER_MAX_CNT = 16,
+};
+
+struct btf_kfunc_hook_filter {
+	btf_kfunc_filter_t filters[BTF_KFUNC_FILTER_MAX_CNT];
+	u32 nr_filters;
 };
 
 struct btf_kfunc_set_tab {
 	struct btf_id_set8 *sets[BTF_KFUNC_HOOK_MAX];
+	struct btf_kfunc_hook_filter hook_filters[BTF_KFUNC_HOOK_MAX];
 };
 
 struct btf_id_dtor_kfunc_tab {
@@ -7712,9 +7719,12 @@ static int btf_check_kfunc_protos(struct btf *btf, u32 func_id, u32 func_flags)
 /* Kernel Function (kfunc) BTF ID set registration API */
 
 static int btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook,
-				  struct btf_id_set8 *add_set)
+				  const struct btf_kfunc_id_set *kset)
 {
+	struct btf_kfunc_hook_filter *hook_filter;
+	struct btf_id_set8 *add_set = kset->set;
 	bool vmlinux_set = !btf_is_module(btf);
+	bool add_filter = !!kset->filter;
 	struct btf_kfunc_set_tab *tab;
 	struct btf_id_set8 *set;
 	u32 set_cnt;
@@ -7729,6 +7739,20 @@ static int btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook,
 		return 0;
 
 	tab = btf->kfunc_set_tab;
+
+	if (tab && add_filter) {
+		int i;
+
+		hook_filter = &tab->hook_filters[hook];
+		for (i = 0; i < hook_filter->nr_filters; i++) {
+			if (hook_filter->filters[i] == kset->filter)
+				add_filter = false;
+		}
+
+		if (add_filter && hook_filter->nr_filters == BTF_KFUNC_FILTER_MAX_CNT)
+			return -E2BIG;
+	}
+
 	if (!tab) {
 		tab = kzalloc(sizeof(*tab), GFP_KERNEL | __GFP_NOWARN);
 		if (!tab)
@@ -7751,7 +7775,7 @@ static int btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook,
 	 */
 	if (!vmlinux_set) {
 		tab->sets[hook] = add_set;
-		return 0;
+		goto do_add_filter;
 	}
 
 	/* In case of vmlinux sets, there may be more than one set being
@@ -7793,6 +7817,11 @@ static int btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook,
 
 	sort(set->pairs, set->cnt, sizeof(set->pairs[0]), btf_id_cmp_func, NULL);
 
+do_add_filter:
+	if (add_filter) {
+		hook_filter = &tab->hook_filters[hook];
+		hook_filter->filters[hook_filter->nr_filters++] = kset->filter;
+	}
 	return 0;
 end:
 	btf_free_kfunc_set_tab(btf);
@@ -7801,15 +7830,22 @@ static int btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook,
 
 static u32 *__btf_kfunc_id_set_contains(const struct btf *btf,
 					enum btf_kfunc_hook hook,
+					const struct bpf_prog *prog,
 					u32 kfunc_btf_id)
 {
+	struct btf_kfunc_hook_filter *hook_filter;
 	struct btf_id_set8 *set;
-	u32 *id;
+	u32 *id, i;
 
 	if (hook >= BTF_KFUNC_HOOK_MAX)
 		return NULL;
 	if (!btf->kfunc_set_tab)
 		return NULL;
+	hook_filter = &btf->kfunc_set_tab->hook_filters[hook];
+	for (i = 0; i < hook_filter->nr_filters; i++) {
+		if (hook_filter->filters[i](prog, kfunc_btf_id))
+			return NULL;
+	}
 	set = btf->kfunc_set_tab->sets[hook];
 	if (!set)
 		return NULL;
@@ -7862,23 +7898,25 @@ static int bpf_prog_type_to_kfunc_hook(enum bpf_prog_type prog_type)
  * protection for looking up a well-formed btf->kfunc_set_tab.
  */
 u32 *btf_kfunc_id_set_contains(const struct btf *btf,
-			       enum bpf_prog_type prog_type,
-			       u32 kfunc_btf_id)
+			       u32 kfunc_btf_id,
+			       const struct bpf_prog *prog)
 {
+	enum bpf_prog_type prog_type = resolve_prog_type(prog);
 	enum btf_kfunc_hook hook;
 	u32 *kfunc_flags;
 
-	kfunc_flags = __btf_kfunc_id_set_contains(btf, BTF_KFUNC_HOOK_COMMON, kfunc_btf_id);
+	kfunc_flags = __btf_kfunc_id_set_contains(btf, BTF_KFUNC_HOOK_COMMON, prog, kfunc_btf_id);
 	if (kfunc_flags)
 		return kfunc_flags;
 
 	hook = bpf_prog_type_to_kfunc_hook(prog_type);
-	return __btf_kfunc_id_set_contains(btf, hook, kfunc_btf_id);
+	return __btf_kfunc_id_set_contains(btf, hook, prog, kfunc_btf_id);
 }
 
-u32 *btf_kfunc_is_modify_return(const struct btf *btf, u32 kfunc_btf_id)
+u32 *btf_kfunc_is_modify_return(const struct btf *btf, u32 kfunc_btf_id,
+				const struct bpf_prog *prog)
 {
-	return __btf_kfunc_id_set_contains(btf, BTF_KFUNC_HOOK_FMODRET, kfunc_btf_id);
+	return __btf_kfunc_id_set_contains(btf, BTF_KFUNC_HOOK_FMODRET, prog, kfunc_btf_id);
 }
 
 static int __register_btf_kfunc_id_set(enum btf_kfunc_hook hook,
@@ -7909,7 +7947,8 @@ static int __register_btf_kfunc_id_set(enum btf_kfunc_hook hook,
 			goto err_out;
 	}
 
-	ret = btf_populate_kfunc_set(btf, hook, kset->set);
+	ret = btf_populate_kfunc_set(btf, hook, kset);
+
 err_out:
 	btf_put(btf);
 	return ret;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 20eb2015842f..1a854cdb2566 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -10553,7 +10553,7 @@ static int fetch_kfunc_meta(struct bpf_verifier_env *env,
 		*kfunc_name = func_name;
 	func_proto = btf_type_by_id(desc_btf, func->type);
 
-	kfunc_flags = btf_kfunc_id_set_contains(desc_btf, resolve_prog_type(env->prog), func_id);
+	kfunc_flags = btf_kfunc_id_set_contains(desc_btf, func_id, env->prog);
 	if (!kfunc_flags) {
 		return -EACCES;
 	}
@@ -18521,7 +18521,8 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
 				 * in the fmodret id set with the KF_SLEEPABLE flag.
 				 */
 				else {
-					u32 *flags = btf_kfunc_is_modify_return(btf, btf_id);
+					u32 *flags = btf_kfunc_is_modify_return(btf, btf_id,
+										prog);
 
 					if (flags && (*flags & KF_SLEEPABLE))
 						ret = 0;
@@ -18549,7 +18550,7 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
 				return -EINVAL;
 			}
 			ret = -EINVAL;
-			if (btf_kfunc_is_modify_return(btf, btf_id) ||
+			if (btf_kfunc_is_modify_return(btf, btf_id, prog) ||
 			    !check_attach_modify_return(addr, tname))
 				ret = 0;
 			if (ret) {
diff --git a/net/core/filter.c b/net/core/filter.c
index a70c7b9876fa..5e5e6f9baccc 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -11768,9 +11768,18 @@ BTF_SET8_START(sock_destroy_kfunc_set)
 BTF_ID_FLAGS(func, bpf_sock_destroy)
 BTF_SET8_END(sock_destroy_kfunc_set)
 
+static int tracing_iter_filter(const struct bpf_prog *prog, u32 kfunc_id)
+{
+	if (btf_id_set8_contains(&sock_destroy_kfunc_set, kfunc_id) &&
+	    prog->expected_attach_type != BPF_TRACE_ITER)
+		return -EACCES;
+	return 0;
+}
+
 static const struct btf_kfunc_id_set bpf_sock_destroy_kfunc_set = {
 	.owner = THIS_MODULE,
 	.set   = &sock_destroy_kfunc_set,
+	.filter = tracing_iter_filter,
 };
 
 static int init_subsystem(void)
diff --git a/tools/testing/selftests/bpf/progs/sock_destroy_prog.c b/tools/testing/selftests/bpf/progs/sock_destroy_prog.c
index 5c1e65d50598..5204e44f6ae4 100644
--- a/tools/testing/selftests/bpf/progs/sock_destroy_prog.c
+++ b/tools/testing/selftests/bpf/progs/sock_destroy_prog.c
@@ -3,6 +3,7 @@
 #include "vmlinux.h"
 #include <bpf/bpf_helpers.h>
 #include <bpf/bpf_endian.h>
+#include <bpf/bpf_tracing.h>
 
 #include "bpf_tracing_net.h"
 
@@ -144,4 +145,13 @@ int iter_udp6_server(struct bpf_iter__udp *ctx)
 	return 0;
 }
 
+SEC("tp_btf/tcp_destroy_sock")
+int BPF_PROG(trace_tcp_destroy_sock, struct sock *sk)
+{
+	/* should not load */
+	bpf_sock_destroy((struct sock_common *)sk);
+
+	return 0;
+}
+
 char _license[] SEC("license") = "GPL";
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v5 bpf-next 7/7] selftests/bpf: Test bpf_sock_destroy
  2023-04-04  1:41             ` Martin KaFai Lau
@ 2023-04-04 14:24               ` Aditi Ghag
  0 siblings, 0 replies; 33+ messages in thread
From: Aditi Ghag @ 2023-04-04 14:24 UTC (permalink / raw)
  To: Martin KaFai Lau; +Cc: bpf, kafai, Eric Dumazet, Stanislav Fomichev



> On Apr 3, 2023, at 6:41 PM, Martin KaFai Lau <martin.lau@linux.dev> wrote:
> 
> On 4/3/23 5:15 PM, Aditi Ghag wrote:
>>> On Apr 3, 2023, at 10:35 AM, Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>> 
>>> On 4/3/23 8:55 AM, Aditi Ghag wrote:
>>>>> On Mar 31, 2023, at 3:32 PM, Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>>>> 
>>>>> On 3/30/23 11:46 AM, Stanislav Fomichev wrote:
>>>>>>> +void test_sock_destroy(void)
>>>>>>> +{
>>>>>>> +    struct sock_destroy_prog *skel;
>>>>>>> +    int cgroup_fd = 0;
>>>>>>> +
>>>>>>> +    skel = sock_destroy_prog__open_and_load();
>>>>>>> +    if (!ASSERT_OK_PTR(skel, "skel_open"))
>>>>>>> +        return;
>>>>>>> +
>>>>>>> +    cgroup_fd = test__join_cgroup("/sock_destroy");
>>>>> 
>>>>> Please also run this test in its own netns to avoid affecting other tests as much as possible.
>>>> Is it okay if I defer this nit to a follow-up patch? It's not conflicting with other tests at the moment.
>>> 
>>> Are you sure it won't affect other tests when running in parallel (test -j)?
>>> This test iterates all sockets in the current netns and only checks the port before aborting.
>>> 
>>> It is easy to add, e.g. ~10 lines of code can be borrowed from prog_tests/mptcp.c, which has recently done the same in commit 02d6a057c7be to run the test in its own netns to set a sysctl.
>> I haven't run the tests in parallel, but the tests are not using hard-coded ports anymore. Anyway I'm willing to run it in a separate netns as a follow-up for additional guards, but do you still think it's a blocker for this patch?
> 
> Testing the port is not good enough. It is only ~10 lines of code that can be borrowed from the other existing tests I mentioned earlier. What is the reason to cut corners here? The time spent replying on this topic is more than enough to add the netns support. I don't want to spend time figuring out why other tests running in parallel become flaky while waiting for the follow-up,
> so no.
> 
> Please run the test in its own netns. All new network tests must run in its own netns.

Got it. I'll take care of the test. 
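
Roughly, I'm planning something along these lines for the test setup (an untested sketch only; open_netns()/close_netns() are the helpers from network_helpers.h, and the netns name is made up):

#define TEST_NS "sock_destroy_netns"

static struct nstoken *setup_netns(void)
{
	/* Create a dedicated netns with loopback up, then switch
	 * the test process into it.
	 */
	if (system("ip netns add " TEST_NS))
		return NULL;
	if (system("ip -net " TEST_NS " link set dev lo up"))
		return NULL;
	return open_netns(TEST_NS);
}

static void cleanup_netns(struct nstoken *nstoken)
{
	/* Switch back to the original netns before deleting the
	 * test one.
	 */
	if (nstoken)
		close_netns(nstoken);
	system("ip netns del " TEST_NS);
}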

> 
> btw, I haven't heard any comments on patch 5 regarding restricting the destroy kfunc to BPF_TRACE_ITER only, and it is the major piece missing. I am putting together some pseudocode that is more flexible than the BTF_KFUNC_HOOK_TRACING_ITER approach I mentioned earlier, to show how it may look. Will update that patch's thread soon.
> 

Yes, this was the only open item for now. Thanks, I'll take a look at your RFC patch.




^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH bpf-next] bpf: Add a kfunc filter function to 'struct btf_kfunc_id_set'.
  2023-04-04  6:09     ` [RFC PATCH bpf-next] bpf: Add a kfunc filter function to 'struct btf_kfunc_id_set' Martin KaFai Lau
@ 2023-04-05 15:05       ` Aditi Ghag
  2023-04-05 17:26         ` Martin KaFai Lau
  2023-04-10 23:05       ` Aditi Ghag
  1 sibling, 1 reply; 33+ messages in thread
From: Aditi Ghag @ 2023-04-05 15:05 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	David Vernet, kernel-team

This looks quite promising for the sock_destroy use case, and also as a generic filtering mechanism, though I'm not aware of other use cases. I haven't had a chance to apply the patch locally yet, but I'm planning to do so soon. Thanks!

> On Apr 3, 2023, at 11:09 PM, Martin KaFai Lau <martin.lau@linux.dev> wrote:
> 
> From: Martin KaFai Lau <martin.lau@kernel.org>
> 
> This set (https://lore.kernel.org/bpf/500d452b-f9d5-d01f-d365-2949c4fd37ab@linux.dev/)
> needs to limit bpf_sock_destroy kfunc to BPF_TRACE_ITER.
> In the earlier reply, I thought of adding a BTF_KFUNC_HOOK_TRACING_ITER.
> 
> Instead of adding BTF_KFUNC_HOOK_TRACING_ITER, I quickly hacked something
> that added a callback filter to 'struct btf_kfunc_id_set'. The filter has
> access to the prog such that it can filter by other properties of a prog.
> The prog->expected_attach_type is used in the tracing_iter_filter().
> It is mostly compiler tested only, so it is still very rough but should
> be good enough to show the idea.
> 
> Would like to hear what others think. It is pretty much the only
> piece left for the above-mentioned set.
> 
> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
> ---
> include/linux/btf.h                           | 18 +++---
> kernel/bpf/btf.c                              | 59 +++++++++++++++----
> kernel/bpf/verifier.c                         |  7 ++-
> net/core/filter.c                             |  9 +++
> .../selftests/bpf/progs/sock_destroy_prog.c   | 10 ++++
> 5 files changed, 83 insertions(+), 20 deletions(-)
> 
> diff --git a/include/linux/btf.h b/include/linux/btf.h
> index d53b10cc55f2..84c31b4f5785 100644
> --- a/include/linux/btf.h
> +++ b/include/linux/btf.h
> @@ -99,10 +99,14 @@ struct btf_type;
> union bpf_attr;
> struct btf_show;
> struct btf_id_set;
> +struct bpf_prog;
> +
> +typedef int (*btf_kfunc_filter_t)(const struct bpf_prog *prog, u32 kfunc_id);
> 
> struct btf_kfunc_id_set {
> 	struct module *owner;
> 	struct btf_id_set8 *set;
> +	btf_kfunc_filter_t filter;
> };
> 
> struct btf_id_dtor_kfunc {
> @@ -482,7 +486,6 @@ static inline void *btf_id_set8_contains(const struct btf_id_set8 *set, u32 id)
> 	return bsearch(&id, set->pairs, set->cnt, sizeof(set->pairs[0]), btf_id_cmp_func);
> }
> 
> -struct bpf_prog;
> struct bpf_verifier_log;
> 
> #ifdef CONFIG_BPF_SYSCALL
> @@ -490,10 +493,10 @@ const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id);
> const char *btf_name_by_offset(const struct btf *btf, u32 offset);
> struct btf *btf_parse_vmlinux(void);
> struct btf *bpf_prog_get_target_btf(const struct bpf_prog *prog);
> -u32 *btf_kfunc_id_set_contains(const struct btf *btf,
> -			       enum bpf_prog_type prog_type,
> -			       u32 kfunc_btf_id);
> -u32 *btf_kfunc_is_modify_return(const struct btf *btf, u32 kfunc_btf_id);
> +u32 *btf_kfunc_id_set_contains(const struct btf *btf, u32 kfunc_btf_id,
> +			       const struct bpf_prog *prog);
> +u32 *btf_kfunc_is_modify_return(const struct btf *btf, u32 kfunc_btf_id,
> +				const struct bpf_prog *prog);
> int register_btf_kfunc_id_set(enum bpf_prog_type prog_type,
> 			      const struct btf_kfunc_id_set *s);
> int register_btf_fmodret_id_set(const struct btf_kfunc_id_set *kset);
> @@ -520,8 +523,9 @@ static inline const char *btf_name_by_offset(const struct btf *btf,
> 	return NULL;
> }
> static inline u32 *btf_kfunc_id_set_contains(const struct btf *btf,
> -					     enum bpf_prog_type prog_type,
> -					     u32 kfunc_btf_id)
> +					     u32 kfunc_btf_id,
> +					     struct bpf_prog *prog)
> +
> {
> 	return NULL;
> }
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index b7e5a5510b91..7685af3ca9c0 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -218,10 +218,17 @@ enum btf_kfunc_hook {
> enum {
> 	BTF_KFUNC_SET_MAX_CNT = 256,
> 	BTF_DTOR_KFUNC_MAX_CNT = 256,
> +	BTF_KFUNC_FILTER_MAX_CNT = 16,
> +};
> +
> +struct btf_kfunc_hook_filter {
> +	btf_kfunc_filter_t filters[BTF_KFUNC_FILTER_MAX_CNT];
> +	u32 nr_filters;
> };
> 
> struct btf_kfunc_set_tab {
> 	struct btf_id_set8 *sets[BTF_KFUNC_HOOK_MAX];
> +	struct btf_kfunc_hook_filter hook_filters[BTF_KFUNC_HOOK_MAX];
> };
> 
> struct btf_id_dtor_kfunc_tab {
> @@ -7712,9 +7719,12 @@ static int btf_check_kfunc_protos(struct btf *btf, u32 func_id, u32 func_flags)
> /* Kernel Function (kfunc) BTF ID set registration API */
> 
> static int btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook,
> -				  struct btf_id_set8 *add_set)
> +				  const struct btf_kfunc_id_set *kset)
> {
> +	struct btf_kfunc_hook_filter *hook_filter;
> +	struct btf_id_set8 *add_set = kset->set;
> 	bool vmlinux_set = !btf_is_module(btf);
> +	bool add_filter = !!kset->filter;
> 	struct btf_kfunc_set_tab *tab;
> 	struct btf_id_set8 *set;
> 	u32 set_cnt;
> @@ -7729,6 +7739,20 @@ static int btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook,
> 		return 0;
> 
> 	tab = btf->kfunc_set_tab;
> +
> +	if (tab && add_filter) {
> +		int i;
> +
> +		hook_filter = &tab->hook_filters[hook];
> +		for (i = 0; i < hook_filter->nr_filters; i++) {
> +			if (hook_filter->filters[i] == kset->filter)
> +				add_filter = false;
> +		}

I'm not intimately familiar with the internals of kfuncs, so mainly for my understanding: what's the use case for having multiple filters?

> +
> +		if (add_filter && hook_filter->nr_filters == BTF_KFUNC_FILTER_MAX_CNT)
> +			return -E2BIG;
> +	}
> +
> 	if (!tab) {
> 		tab = kzalloc(sizeof(*tab), GFP_KERNEL | __GFP_NOWARN);
> 		if (!tab)
> @@ -7751,7 +7775,7 @@ static int btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook,
> 	 */
> 	if (!vmlinux_set) {
> 		tab->sets[hook] = add_set;
> -		return 0;
> +		goto do_add_filter;
> 	}
> 
> 	/* In case of vmlinux sets, there may be more than one set being
> @@ -7793,6 +7817,11 @@ static int btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook,
> 
> 	sort(set->pairs, set->cnt, sizeof(set->pairs[0]), btf_id_cmp_func, NULL);
> 
> +do_add_filter:
> +	if (add_filter) {
> +		hook_filter = &tab->hook_filters[hook];
> +		hook_filter->filters[hook_filter->nr_filters++] = kset->filter;
> +	}
> 	return 0;
> end:
> 	btf_free_kfunc_set_tab(btf);
> @@ -7801,15 +7830,22 @@ static int btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook,
> 
> static u32 *__btf_kfunc_id_set_contains(const struct btf *btf,
> 					enum btf_kfunc_hook hook,
> +					const struct bpf_prog *prog,
> 					u32 kfunc_btf_id)
> {
> +	struct btf_kfunc_hook_filter *hook_filter;
> 	struct btf_id_set8 *set;
> -	u32 *id;
> +	u32 *id, i;
> 
> 	if (hook >= BTF_KFUNC_HOOK_MAX)
> 		return NULL;
> 	if (!btf->kfunc_set_tab)
> 		return NULL;
> +	hook_filter = &btf->kfunc_set_tab->hook_filters[hook];
> +	for (i = 0; i < hook_filter->nr_filters; i++) {
> +		if (hook_filter->filters[i](prog, kfunc_btf_id))
> +			return NULL;
> +	}
> 	set = btf->kfunc_set_tab->sets[hook];
> 	if (!set)
> 		return NULL;
> @@ -7862,23 +7898,25 @@ static int bpf_prog_type_to_kfunc_hook(enum bpf_prog_type prog_type)
>  * protection for looking up a well-formed btf->kfunc_set_tab.
>  */
> u32 *btf_kfunc_id_set_contains(const struct btf *btf,
> -			       enum bpf_prog_type prog_type,
> -			       u32 kfunc_btf_id)
> +			       u32 kfunc_btf_id,
> +			       const struct bpf_prog *prog)
> {
> +	enum bpf_prog_type prog_type = resolve_prog_type(prog);
> 	enum btf_kfunc_hook hook;
> 	u32 *kfunc_flags;
> 
> -	kfunc_flags = __btf_kfunc_id_set_contains(btf, BTF_KFUNC_HOOK_COMMON, kfunc_btf_id);
> +	kfunc_flags = __btf_kfunc_id_set_contains(btf, BTF_KFUNC_HOOK_COMMON, prog, kfunc_btf_id);
> 	if (kfunc_flags)
> 		return kfunc_flags;
> 
> 	hook = bpf_prog_type_to_kfunc_hook(prog_type);
> -	return __btf_kfunc_id_set_contains(btf, hook, kfunc_btf_id);
> +	return __btf_kfunc_id_set_contains(btf, hook, prog, kfunc_btf_id);
> }
> 
> -u32 *btf_kfunc_is_modify_return(const struct btf *btf, u32 kfunc_btf_id)
> +u32 *btf_kfunc_is_modify_return(const struct btf *btf, u32 kfunc_btf_id,
> +				const struct bpf_prog *prog)
> {
> -	return __btf_kfunc_id_set_contains(btf, BTF_KFUNC_HOOK_FMODRET, kfunc_btf_id);
> +	return __btf_kfunc_id_set_contains(btf, BTF_KFUNC_HOOK_FMODRET, prog, kfunc_btf_id);
> }
> 
> static int __register_btf_kfunc_id_set(enum btf_kfunc_hook hook,
> @@ -7909,7 +7947,8 @@ static int __register_btf_kfunc_id_set(enum btf_kfunc_hook hook,
> 			goto err_out;
> 	}
> 
> -	ret = btf_populate_kfunc_set(btf, hook, kset->set);
> +	ret = btf_populate_kfunc_set(btf, hook, kset);
> +
> err_out:
> 	btf_put(btf);
> 	return ret;
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 20eb2015842f..1a854cdb2566 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -10553,7 +10553,7 @@ static int fetch_kfunc_meta(struct bpf_verifier_env *env,
> 		*kfunc_name = func_name;
> 	func_proto = btf_type_by_id(desc_btf, func->type);
> 
> -	kfunc_flags = btf_kfunc_id_set_contains(desc_btf, resolve_prog_type(env->prog), func_id);
> +	kfunc_flags = btf_kfunc_id_set_contains(desc_btf, func_id, env->prog);
> 	if (!kfunc_flags) {
> 		return -EACCES;
> 	}
> @@ -18521,7 +18521,8 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
> 				 * in the fmodret id set with the KF_SLEEPABLE flag.
> 				 */
> 				else {
> -					u32 *flags = btf_kfunc_is_modify_return(btf, btf_id);
> +					u32 *flags = btf_kfunc_is_modify_return(btf, btf_id,
> +										prog);
> 
> 					if (flags && (*flags & KF_SLEEPABLE))
> 						ret = 0;
> @@ -18549,7 +18550,7 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
> 				return -EINVAL;
> 			}
> 			ret = -EINVAL;
> -			if (btf_kfunc_is_modify_return(btf, btf_id) ||
> +			if (btf_kfunc_is_modify_return(btf, btf_id, prog) ||
> 			    !check_attach_modify_return(addr, tname))
> 				ret = 0;
> 			if (ret) {
> diff --git a/net/core/filter.c b/net/core/filter.c
> index a70c7b9876fa..5e5e6f9baccc 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -11768,9 +11768,18 @@ BTF_SET8_START(sock_destroy_kfunc_set)
> BTF_ID_FLAGS(func, bpf_sock_destroy)
> BTF_SET8_END(sock_destroy_kfunc_set)
> 
> +static int tracing_iter_filter(const struct bpf_prog *prog, u32 kfunc_id)
> +{
> +	if (btf_id_set8_contains(&sock_destroy_kfunc_set, kfunc_id) &&
> +	    prog->expected_attach_type != BPF_TRACE_ITER)
> +		return -EACCES;
> +	return 0;
> +}
> +
> static const struct btf_kfunc_id_set bpf_sock_destroy_kfunc_set = {
> 	.owner = THIS_MODULE,
> 	.set   = &sock_destroy_kfunc_set,
> +	.filter = tracing_iter_filter,
> };
> 
> static int init_subsystem(void)
> diff --git a/tools/testing/selftests/bpf/progs/sock_destroy_prog.c b/tools/testing/selftests/bpf/progs/sock_destroy_prog.c
> index 5c1e65d50598..5204e44f6ae4 100644
> --- a/tools/testing/selftests/bpf/progs/sock_destroy_prog.c
> +++ b/tools/testing/selftests/bpf/progs/sock_destroy_prog.c
> @@ -3,6 +3,7 @@
> #include "vmlinux.h"
> #include <bpf/bpf_helpers.h>
> #include <bpf/bpf_endian.h>
> +#include <bpf/bpf_tracing.h>
> 
> #include "bpf_tracing_net.h"
> 
> @@ -144,4 +145,13 @@ int iter_udp6_server(struct bpf_iter__udp *ctx)
> 	return 0;
> }
> 
> +SEC("tp_btf/tcp_destroy_sock")
> +int BPF_PROG(trace_tcp_destroy_sock, struct sock *sk)
> +{
> +	/* should not load */
> +	bpf_sock_destroy((struct sock_common *)sk);
> +
> +	return 0;
> +}
> +
> char _license[] SEC("license") = "GPL";
> -- 
> 2.34.1
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH bpf-next] bpf: Add a kfunc filter function to 'struct btf_kfunc_id_set'.
  2023-04-05 15:05       ` Aditi Ghag
@ 2023-04-05 17:26         ` Martin KaFai Lau
  0 siblings, 0 replies; 33+ messages in thread
From: Martin KaFai Lau @ 2023-04-05 17:26 UTC (permalink / raw)
  To: Aditi Ghag
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, David Vernet

On 4/5/23 8:05 AM, Aditi Ghag wrote:
> Looks quite promising for the sock_destroy use case, and also as a generic filtering mechanism, but I'm not aware of other use cases. I haven't had a chance to apply this patch locally, but I'm planning to do it soon. Thanks!

Please don't top post.

Another use case is to allow different sets of kfuncs for struct_ops programs, from
David: https://lore.kernel.org/bpf/Y9KLHZ1TNXVHdVKm@maniforge/
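
For example (purely illustrative, every identifier below is made up), a struct_ops-only subsystem could register its own filter on the same hook, and both filters would then be consulted in __btf_kfunc_id_set_contains():

/* Hypothetical second consumer: restrict its kfuncs to
 * BPF_PROG_TYPE_STRUCT_OPS programs. Only the id set and the
 * filter callback differ; the registration path is unchanged.
 */
BTF_SET8_START(example_struct_ops_kfunc_set)
BTF_ID_FLAGS(func, bpf_example_kfunc)
BTF_SET8_END(example_struct_ops_kfunc_set)

static int struct_ops_only_filter(const struct bpf_prog *prog, u32 kfunc_id)
{
	if (btf_id_set8_contains(&example_struct_ops_kfunc_set, kfunc_id) &&
	    resolve_prog_type(prog) != BPF_PROG_TYPE_STRUCT_OPS)
		return -EACCES;
	return 0;
}

static const struct btf_kfunc_id_set example_kfunc_id_set = {
	.owner  = THIS_MODULE,
	.set    = &example_struct_ops_kfunc_set,
	.filter = struct_ops_only_filter,
};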

>> From: Martin KaFai Lau <martin.lau@kernel.org>
>>
>> This set (https://lore.kernel.org/bpf/500d452b-f9d5-d01f-d365-2949c4fd37ab@linux.dev/)
>> needs to limit bpf_sock_destroy kfunc to BPF_TRACE_ITER.
>> In the earlier reply, I thought of adding a BTF_KFUNC_HOOK_TRACING_ITER.
>>
>> Instead of adding BTF_KFUNC_HOOK_TRACING_ITER, I quickly hacked something
>> that added a callback filter to 'struct btf_kfunc_id_set'. The filter has
>> access to the prog such that it can filter by other properties of a prog.
>> The prog->expected_attached_type is used in the tracing_iter_filter().
>> It is mostly compiler tested only, so it is still very rough but should
>> be good enough to show the idea.
>>
>> Would like to hear what others think. It is pretty much the only
>> piece left for the above-mentioned set.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH bpf-next] bpf: Add a kfunc filter function to 'struct btf_kfunc_id_set'.
  2023-04-04  6:09     ` [RFC PATCH bpf-next] bpf: Add a kfunc filter function to 'struct btf_kfunc_id_set' Martin KaFai Lau
  2023-04-05 15:05       ` Aditi Ghag
@ 2023-04-10 23:05       ` Aditi Ghag
  2023-04-12 15:21         ` Aditi Ghag
  1 sibling, 1 reply; 33+ messages in thread
From: Aditi Ghag @ 2023-04-10 23:05 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	David Vernet, kernel-team



> On Apr 3, 2023, at 11:09 PM, Martin KaFai Lau <martin.lau@linux.dev> wrote:
> 
> From: Martin KaFai Lau <martin.lau@kernel.org>
> 
> This set (https://lore.kernel.org/bpf/500d452b-f9d5-d01f-d365-2949c4fd37ab@linux.dev/)
> needs to limit bpf_sock_destroy kfunc to BPF_TRACE_ITER.
> In the earlier reply, I thought of adding a BTF_KFUNC_HOOK_TRACING_ITER.
> 
> Instead of adding BTF_KFUNC_HOOK_TRACING_ITER, I quickly hacked something
> that added a callback filter to 'struct btf_kfunc_id_set'. The filter has
> access to the prog such that it can filter by other properties of a prog.
> The prog->expected_attach_type is used in the tracing_iter_filter().
> It is mostly compiler tested only, so it is still very rough but should
> be good enough to show the idea.
> 
> Would like to hear what others think. It is pretty much the only
> piece left for the above-mentioned set.
> 
> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
> ---
> include/linux/btf.h                           | 18 +++---
> kernel/bpf/btf.c                              | 59 +++++++++++++++----
> kernel/bpf/verifier.c                         |  7 ++-
> net/core/filter.c                             |  9 +++
> .../selftests/bpf/progs/sock_destroy_prog.c   | 10 ++++
> 5 files changed, 83 insertions(+), 20 deletions(-)
> 
> diff --git a/include/linux/btf.h b/include/linux/btf.h
> index d53b10cc55f2..84c31b4f5785 100644
> --- a/include/linux/btf.h
> +++ b/include/linux/btf.h
> @@ -99,10 +99,14 @@ struct btf_type;
> union bpf_attr;
> struct btf_show;
> struct btf_id_set;
> +struct bpf_prog;
> +
> +typedef int (*btf_kfunc_filter_t)(const struct bpf_prog *prog, u32 kfunc_id);
> 
> struct btf_kfunc_id_set {
> 	struct module *owner;
> 	struct btf_id_set8 *set;
> +	btf_kfunc_filter_t filter;
> };
> 
> struct btf_id_dtor_kfunc {
> @@ -482,7 +486,6 @@ static inline void *btf_id_set8_contains(const struct btf_id_set8 *set, u32 id)
> 	return bsearch(&id, set->pairs, set->cnt, sizeof(set->pairs[0]), btf_id_cmp_func);
> }
> 
> -struct bpf_prog;
> struct bpf_verifier_log;
> 
> #ifdef CONFIG_BPF_SYSCALL
> @@ -490,10 +493,10 @@ const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id);
> const char *btf_name_by_offset(const struct btf *btf, u32 offset);
> struct btf *btf_parse_vmlinux(void);
> struct btf *bpf_prog_get_target_btf(const struct bpf_prog *prog);
> -u32 *btf_kfunc_id_set_contains(const struct btf *btf,
> -			       enum bpf_prog_type prog_type,
> -			       u32 kfunc_btf_id);
> -u32 *btf_kfunc_is_modify_return(const struct btf *btf, u32 kfunc_btf_id);
> +u32 *btf_kfunc_id_set_contains(const struct btf *btf, u32 kfunc_btf_id,
> +			       const struct bpf_prog *prog);
> +u32 *btf_kfunc_is_modify_return(const struct btf *btf, u32 kfunc_btf_id,
> +				const struct bpf_prog *prog);
> int register_btf_kfunc_id_set(enum bpf_prog_type prog_type,
> 			      const struct btf_kfunc_id_set *s);
> int register_btf_fmodret_id_set(const struct btf_kfunc_id_set *kset);
> @@ -520,8 +523,9 @@ static inline const char *btf_name_by_offset(const struct btf *btf,
> 	return NULL;
> }
> static inline u32 *btf_kfunc_id_set_contains(const struct btf *btf,
> -					     enum bpf_prog_type prog_type,
> -					     u32 kfunc_btf_id)
> +					     u32 kfunc_btf_id,
> +					     struct bpf_prog *prog)
> +
> {
> 	return NULL;
> }
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index b7e5a5510b91..7685af3ca9c0 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -218,10 +218,17 @@ enum btf_kfunc_hook {
> enum {
> 	BTF_KFUNC_SET_MAX_CNT = 256,
> 	BTF_DTOR_KFUNC_MAX_CNT = 256,
> +	BTF_KFUNC_FILTER_MAX_CNT = 16,
> +};
> +
> +struct btf_kfunc_hook_filter {
> +	btf_kfunc_filter_t filters[BTF_KFUNC_FILTER_MAX_CNT];
> +	u32 nr_filters;
> };
> 
> struct btf_kfunc_set_tab {
> 	struct btf_id_set8 *sets[BTF_KFUNC_HOOK_MAX];
> +	struct btf_kfunc_hook_filter hook_filters[BTF_KFUNC_HOOK_MAX];
> };
> 
> struct btf_id_dtor_kfunc_tab {
> @@ -7712,9 +7719,12 @@ static int btf_check_kfunc_protos(struct btf *btf, u32 func_id, u32 func_flags)
> /* Kernel Function (kfunc) BTF ID set registration API */
> 
> static int btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook,
> -				  struct btf_id_set8 *add_set)
> +				  const struct btf_kfunc_id_set *kset)
> {
> +	struct btf_kfunc_hook_filter *hook_filter;
> +	struct btf_id_set8 *add_set = kset->set;
> 	bool vmlinux_set = !btf_is_module(btf);
> +	bool add_filter = !!kset->filter;
> 	struct btf_kfunc_set_tab *tab;
> 	struct btf_id_set8 *set;
> 	u32 set_cnt;
> @@ -7729,6 +7739,20 @@ static int btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook,
> 		return 0;
> 
> 	tab = btf->kfunc_set_tab;
> +
> +	if (tab && add_filter) {
> +		int i;
> +
> +		hook_filter = &tab->hook_filters[hook];
> +		for (i = 0; i < hook_filter->nr_filters; i++) {
> +			if (hook_filter->filters[i] == kset->filter)
> +				add_filter = false;
> +		}
> +
> +		if (add_filter && hook_filter->nr_filters == BTF_KFUNC_FILTER_MAX_CNT)
> +			return -E2BIG;
> +	}
> +
> 	if (!tab) {
> 		tab = kzalloc(sizeof(*tab), GFP_KERNEL | __GFP_NOWARN);
> 		if (!tab)
> @@ -7751,7 +7775,7 @@ static int btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook,
> 	 */
> 	if (!vmlinux_set) {
> 		tab->sets[hook] = add_set;
> -		return 0;
> +		goto do_add_filter;
> 	}
> 
> 	/* In case of vmlinux sets, there may be more than one set being
> @@ -7793,6 +7817,11 @@ static int btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook,
> 
> 	sort(set->pairs, set->cnt, sizeof(set->pairs[0]), btf_id_cmp_func, NULL);
> 
> +do_add_filter:
> +	if (add_filter) {
> +		hook_filter = &tab->hook_filters[hook];
> +		hook_filter->filters[hook_filter->nr_filters++] = kset->filter;
> +	}
> 	return 0;
> end:
> 	btf_free_kfunc_set_tab(btf);
> @@ -7801,15 +7830,22 @@ static int btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook,
> 
> static u32 *__btf_kfunc_id_set_contains(const struct btf *btf,
> 					enum btf_kfunc_hook hook,
> +					const struct bpf_prog *prog,
> 					u32 kfunc_btf_id)
> {
> +	struct btf_kfunc_hook_filter *hook_filter;
> 	struct btf_id_set8 *set;
> -	u32 *id;
> +	u32 *id, i;
> 
> 	if (hook >= BTF_KFUNC_HOOK_MAX)
> 		return NULL;
> 	if (!btf->kfunc_set_tab)
> 		return NULL;
> +	hook_filter = &btf->kfunc_set_tab->hook_filters[hook];
> +	for (i = 0; i < hook_filter->nr_filters; i++) {
> +		if (hook_filter->filters[i](prog, kfunc_btf_id))
> +			return NULL;
> +	}
> 	set = btf->kfunc_set_tab->sets[hook];
> 	if (!set)
> 		return NULL;
> @@ -7862,23 +7898,25 @@ static int bpf_prog_type_to_kfunc_hook(enum bpf_prog_type prog_type)
>  * protection for looking up a well-formed btf->kfunc_set_tab.
>  */
> u32 *btf_kfunc_id_set_contains(const struct btf *btf,
> -			       enum bpf_prog_type prog_type,
> -			       u32 kfunc_btf_id)
> +			       u32 kfunc_btf_id,
> +			       const struct bpf_prog *prog)
> {
> +	enum bpf_prog_type prog_type = resolve_prog_type(prog);
> 	enum btf_kfunc_hook hook;
> 	u32 *kfunc_flags;
> 
> -	kfunc_flags = __btf_kfunc_id_set_contains(btf, BTF_KFUNC_HOOK_COMMON, kfunc_btf_id);
> +	kfunc_flags = __btf_kfunc_id_set_contains(btf, BTF_KFUNC_HOOK_COMMON, prog, kfunc_btf_id);
> 	if (kfunc_flags)
> 		return kfunc_flags;
> 
> 	hook = bpf_prog_type_to_kfunc_hook(prog_type);
> -	return __btf_kfunc_id_set_contains(btf, hook, kfunc_btf_id);
> +	return __btf_kfunc_id_set_contains(btf, hook, prog, kfunc_btf_id);
> }
> 
> -u32 *btf_kfunc_is_modify_return(const struct btf *btf, u32 kfunc_btf_id)
> +u32 *btf_kfunc_is_modify_return(const struct btf *btf, u32 kfunc_btf_id,
> +				const struct bpf_prog *prog)
> {
> -	return __btf_kfunc_id_set_contains(btf, BTF_KFUNC_HOOK_FMODRET, kfunc_btf_id);
> +	return __btf_kfunc_id_set_contains(btf, BTF_KFUNC_HOOK_FMODRET, prog, kfunc_btf_id);
> }
> 
> static int __register_btf_kfunc_id_set(enum btf_kfunc_hook hook,
> @@ -7909,7 +7947,8 @@ static int __register_btf_kfunc_id_set(enum btf_kfunc_hook hook,
> 			goto err_out;
> 	}
> 
> -	ret = btf_populate_kfunc_set(btf, hook, kset->set);
> +	ret = btf_populate_kfunc_set(btf, hook, kset);
> +
> err_out:
> 	btf_put(btf);
> 	return ret;
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 20eb2015842f..1a854cdb2566 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -10553,7 +10553,7 @@ static int fetch_kfunc_meta(struct bpf_verifier_env *env,
> 		*kfunc_name = func_name;
> 	func_proto = btf_type_by_id(desc_btf, func->type);
> 
> -	kfunc_flags = btf_kfunc_id_set_contains(desc_btf, resolve_prog_type(env->prog), func_id);
> +	kfunc_flags = btf_kfunc_id_set_contains(desc_btf, func_id, env->prog);
> 	if (!kfunc_flags) {
> 		return -EACCES;
> 	}
> @@ -18521,7 +18521,8 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
> 				 * in the fmodret id set with the KF_SLEEPABLE flag.
> 				 */
> 				else {
> -					u32 *flags = btf_kfunc_is_modify_return(btf, btf_id);
> +					u32 *flags = btf_kfunc_is_modify_return(btf, btf_id,
> +										prog);
> 
> 					if (flags && (*flags & KF_SLEEPABLE))
> 						ret = 0;
> @@ -18549,7 +18550,7 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
> 				return -EINVAL;
> 			}
> 			ret = -EINVAL;
> -			if (btf_kfunc_is_modify_return(btf, btf_id) ||
> +			if (btf_kfunc_is_modify_return(btf, btf_id, prog) ||
> 			    !check_attach_modify_return(addr, tname))
> 				ret = 0;
> 			if (ret) {
> diff --git a/net/core/filter.c b/net/core/filter.c
> index a70c7b9876fa..5e5e6f9baccc 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -11768,9 +11768,18 @@ BTF_SET8_START(sock_destroy_kfunc_set)
> BTF_ID_FLAGS(func, bpf_sock_destroy)
> BTF_SET8_END(sock_destroy_kfunc_set)
> 
> +static int tracing_iter_filter(const struct bpf_prog *prog, u32 kfunc_id)
> +{
> +	if (btf_id_set8_contains(&sock_destroy_kfunc_set, kfunc_id) &&
> +	    prog->expected_attach_type != BPF_TRACE_ITER)
> +		return -EACCES;
> +	return 0;
> +}
> +
> static const struct btf_kfunc_id_set bpf_sock_destroy_kfunc_set = {
> 	.owner = THIS_MODULE,
> 	.set   = &sock_destroy_kfunc_set,
> +	.filter = tracing_iter_filter,
> };
> 
> static int init_subsystem(void)
> diff --git a/tools/testing/selftests/bpf/progs/sock_destroy_prog.c b/tools/testing/selftests/bpf/progs/sock_destroy_prog.c
> index 5c1e65d50598..5204e44f6ae4 100644
> --- a/tools/testing/selftests/bpf/progs/sock_destroy_prog.c
> +++ b/tools/testing/selftests/bpf/progs/sock_destroy_prog.c
> @@ -3,6 +3,7 @@
> #include "vmlinux.h"
> #include <bpf/bpf_helpers.h>
> #include <bpf/bpf_endian.h>
> +#include <bpf/bpf_tracing.h>
> 
> #include "bpf_tracing_net.h"
> 
> @@ -144,4 +145,13 @@ int iter_udp6_server(struct bpf_iter__udp *ctx)
> 	return 0;
> }
> 
> +SEC("tp_btf/tcp_destroy_sock")
> +int BPF_PROG(trace_tcp_destroy_sock, struct sock *sk)
> +{
> +	/* should not load */
> +	bpf_sock_destroy((struct sock_common *)sk);
> +
> +	return 0;
> +}

Applied and tested the patch locally. It works as expected: tp_btf/tcp_destroy_sock is rejected, while the other programs in sock_destroy_prog load successfully.

1: (85) call bpf_sock_destroy#75448
calling kernel function bpf_sock_destroy is not allowed
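
On the selftest side, I'm thinking of asserting the negative case roughly like this in prog_tests/sock_destroy.c (untested sketch; the skeleton accessors are assumptions based on the generated sock_destroy_prog.skel.h):

static void test_destroy_prog_rejected(void)
{
	struct sock_destroy_prog *skel;
	struct bpf_program *prog;
	int err;

	skel = sock_destroy_prog__open();
	if (!ASSERT_OK_PTR(skel, "open"))
		return;

	/* Load only the tp_btf program; the kfunc filter should
	 * reject it, so the load is expected to fail.
	 */
	bpf_object__for_each_program(prog, skel->obj)
		bpf_program__set_autoload(prog, false);
	bpf_program__set_autoload(skel->progs.trace_tcp_destroy_sock, true);

	err = sock_destroy_prog__load(skel);
	ASSERT_ERR(err, "load should fail for tp_btf/tcp_destroy_sock");

	sock_destroy_prog__destroy(skel);
}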


> +
> char _license[] SEC("license") = "GPL";
> -- 
> 2.34.1
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH bpf-next] bpf: Add a kfunc filter function to 'struct btf_kfunc_id_set'.
  2023-04-10 23:05       ` Aditi Ghag
@ 2023-04-12 15:21         ` Aditi Ghag
  0 siblings, 0 replies; 33+ messages in thread
From: Aditi Ghag @ 2023-04-12 15:21 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	David Vernet, kernel-team



> On Apr 10, 2023, at 4:05 PM, Aditi Ghag <aditi.ghag@isovalent.com> wrote:
> 
> 
> 
>> On Apr 3, 2023, at 11:09 PM, Martin KaFai Lau <martin.lau@linux.dev> wrote:
>> 
>> From: Martin KaFai Lau <martin.lau@kernel.org>
>> 
>> This set (https://lore.kernel.org/bpf/500d452b-f9d5-d01f-d365-2949c4fd37ab@linux.dev/)
>> needs to limit bpf_sock_destroy kfunc to BPF_TRACE_ITER.
>> In the earlier reply, I thought of adding a BTF_KFUNC_HOOK_TRACING_ITER.
>> 
>> Instead of adding BTF_KFUNC_HOOK_TRACING_ITER, I quickly hacked something
>> that added a callback filter to 'struct btf_kfunc_id_set'. The filter has
>> access to the prog such that it can filter by other properties of a prog.
>> The prog->expected_attach_type is used in the tracing_iter_filter().
>> It is mostly compiler tested only, so it is still very rough but should
>> be good enough to show the idea.
>> 
>> Would like to hear what others think. It is pretty much the only
>> piece left for the above-mentioned set.
>> 
>> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
>> ---
>> include/linux/btf.h                           | 18 +++---
>> kernel/bpf/btf.c                              | 59 +++++++++++++++----
>> kernel/bpf/verifier.c                         |  7 ++-
>> net/core/filter.c                             |  9 +++
>> .../selftests/bpf/progs/sock_destroy_prog.c   | 10 ++++
>> 5 files changed, 83 insertions(+), 20 deletions(-)
>> 
>> diff --git a/include/linux/btf.h b/include/linux/btf.h
>> index d53b10cc55f2..84c31b4f5785 100644
>> --- a/include/linux/btf.h
>> +++ b/include/linux/btf.h
>> @@ -99,10 +99,14 @@ struct btf_type;
>> union bpf_attr;
>> struct btf_show;
>> struct btf_id_set;
>> +struct bpf_prog;
>> +
>> +typedef int (*btf_kfunc_filter_t)(const struct bpf_prog *prog, u32 kfunc_id);
>> 
>> struct btf_kfunc_id_set {
>> 	struct module *owner;
>> 	struct btf_id_set8 *set;
>> +	btf_kfunc_filter_t filter;
>> };
>> 
>> struct btf_id_dtor_kfunc {
>> @@ -482,7 +486,6 @@ static inline void *btf_id_set8_contains(const struct btf_id_set8 *set, u32 id)
>> 	return bsearch(&id, set->pairs, set->cnt, sizeof(set->pairs[0]), btf_id_cmp_func);
>> }
>> 
>> -struct bpf_prog;
>> struct bpf_verifier_log;
>> 
>> #ifdef CONFIG_BPF_SYSCALL
>> @@ -490,10 +493,10 @@ const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id);
>> const char *btf_name_by_offset(const struct btf *btf, u32 offset);
>> struct btf *btf_parse_vmlinux(void);
>> struct btf *bpf_prog_get_target_btf(const struct bpf_prog *prog);
>> -u32 *btf_kfunc_id_set_contains(const struct btf *btf,
>> -			       enum bpf_prog_type prog_type,
>> -			       u32 kfunc_btf_id);
>> -u32 *btf_kfunc_is_modify_return(const struct btf *btf, u32 kfunc_btf_id);
>> +u32 *btf_kfunc_id_set_contains(const struct btf *btf, u32 kfunc_btf_id,
>> +			       const struct bpf_prog *prog);
>> +u32 *btf_kfunc_is_modify_return(const struct btf *btf, u32 kfunc_btf_id,
>> +				const struct bpf_prog *prog);
>> int register_btf_kfunc_id_set(enum bpf_prog_type prog_type,
>> 			      const struct btf_kfunc_id_set *s);
>> int register_btf_fmodret_id_set(const struct btf_kfunc_id_set *kset);
>> @@ -520,8 +523,9 @@ static inline const char *btf_name_by_offset(const struct btf *btf,
>> 	return NULL;
>> }
>> static inline u32 *btf_kfunc_id_set_contains(const struct btf *btf,
>> -					     enum bpf_prog_type prog_type,
>> -					     u32 kfunc_btf_id)
>> +					     u32 kfunc_btf_id,
>> +					     struct bpf_prog *prog)
>> +
>> {
>> 	return NULL;
>> }
>> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
>> index b7e5a5510b91..7685af3ca9c0 100644
>> --- a/kernel/bpf/btf.c
>> +++ b/kernel/bpf/btf.c
>> @@ -218,10 +218,17 @@ enum btf_kfunc_hook {
>> enum {
>> 	BTF_KFUNC_SET_MAX_CNT = 256,
>> 	BTF_DTOR_KFUNC_MAX_CNT = 256,
>> +	BTF_KFUNC_FILTER_MAX_CNT = 16,
>> +};
>> +
>> +struct btf_kfunc_hook_filter {
>> +	btf_kfunc_filter_t filters[BTF_KFUNC_FILTER_MAX_CNT];
>> +	u32 nr_filters;
>> };
>> 
>> struct btf_kfunc_set_tab {
>> 	struct btf_id_set8 *sets[BTF_KFUNC_HOOK_MAX];
>> +	struct btf_kfunc_hook_filter hook_filters[BTF_KFUNC_HOOK_MAX];
>> };
>> 
>> struct btf_id_dtor_kfunc_tab {
>> @@ -7712,9 +7719,12 @@ static int btf_check_kfunc_protos(struct btf *btf, u32 func_id, u32 func_flags)
>> /* Kernel Function (kfunc) BTF ID set registration API */
>> 
>> static int btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook,
>> -				  struct btf_id_set8 *add_set)
>> +				  const struct btf_kfunc_id_set *kset)
>> {
>> +	struct btf_kfunc_hook_filter *hook_filter;
>> +	struct btf_id_set8 *add_set = kset->set;
>> 	bool vmlinux_set = !btf_is_module(btf);
>> +	bool add_filter = !!kset->filter;
>> 	struct btf_kfunc_set_tab *tab;
>> 	struct btf_id_set8 *set;
>> 	u32 set_cnt;
>> @@ -7729,6 +7739,20 @@ static int btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook,
>> 		return 0;
>> 
>> 	tab = btf->kfunc_set_tab;
>> +
>> +	if (tab && add_filter) {
>> +		int i;
>> +
>> +		hook_filter = &tab->hook_filters[hook];
>> +		for (i = 0; i < hook_filter->nr_filters; i++) {
>> +			if (hook_filter->filters[i] == kset->filter)
>> +				add_filter = false;
>> +		}
>> +
>> +		if (add_filter && hook_filter->nr_filters == BTF_KFUNC_FILTER_MAX_CNT)
>> +			return -E2BIG;
>> +	}
>> +
>> 	if (!tab) {
>> 		tab = kzalloc(sizeof(*tab), GFP_KERNEL | __GFP_NOWARN);
>> 		if (!tab)
>> @@ -7751,7 +7775,7 @@ static int btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook,
>> 	 */
>> 	if (!vmlinux_set) {
>> 		tab->sets[hook] = add_set;
>> -		return 0;
>> +		goto do_add_filter;
>> 	}
>> 
>> 	/* In case of vmlinux sets, there may be more than one set being
>> @@ -7793,6 +7817,11 @@ static int btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook,
>> 
>> 	sort(set->pairs, set->cnt, sizeof(set->pairs[0]), btf_id_cmp_func, NULL);
>> 
>> +do_add_filter:
>> +	if (add_filter) {
>> +		hook_filter = &tab->hook_filters[hook];
>> +		hook_filter->filters[hook_filter->nr_filters++] = kset->filter;
>> +	}
>> 	return 0;
>> end:
>> 	btf_free_kfunc_set_tab(btf);
>> @@ -7801,15 +7830,22 @@ static int btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook,
>> 
>> static u32 *__btf_kfunc_id_set_contains(const struct btf *btf,
>> 					enum btf_kfunc_hook hook,
>> +					const struct bpf_prog *prog,
>> 					u32 kfunc_btf_id)
>> {
>> +	struct btf_kfunc_hook_filter *hook_filter;
>> 	struct btf_id_set8 *set;
>> -	u32 *id;
>> +	u32 *id, i;
>> 
>> 	if (hook >= BTF_KFUNC_HOOK_MAX)
>> 		return NULL;
>> 	if (!btf->kfunc_set_tab)
>> 		return NULL;
>> +	hook_filter = &btf->kfunc_set_tab->hook_filters[hook];
>> +	for (i = 0; i < hook_filter->nr_filters; i++) {
>> +		if (hook_filter->filters[i](prog, kfunc_btf_id))
>> +			return NULL;
>> +	}
>> 	set = btf->kfunc_set_tab->sets[hook];
>> 	if (!set)
>> 		return NULL;
>> @@ -7862,23 +7898,25 @@ static int bpf_prog_type_to_kfunc_hook(enum bpf_prog_type prog_type)
>> * protection for looking up a well-formed btf->kfunc_set_tab.
>> */
>> u32 *btf_kfunc_id_set_contains(const struct btf *btf,
>> -			       enum bpf_prog_type prog_type,
>> -			       u32 kfunc_btf_id)
>> +			       u32 kfunc_btf_id,
>> +			       const struct bpf_prog *prog)
>> {
>> +	enum bpf_prog_type prog_type = resolve_prog_type(prog);
>> 	enum btf_kfunc_hook hook;
>> 	u32 *kfunc_flags;
>> 
>> -	kfunc_flags = __btf_kfunc_id_set_contains(btf, BTF_KFUNC_HOOK_COMMON, kfunc_btf_id);
>> +	kfunc_flags = __btf_kfunc_id_set_contains(btf, BTF_KFUNC_HOOK_COMMON, prog, kfunc_btf_id);
>> 	if (kfunc_flags)
>> 		return kfunc_flags;
>> 
>> 	hook = bpf_prog_type_to_kfunc_hook(prog_type);
>> -	return __btf_kfunc_id_set_contains(btf, hook, kfunc_btf_id);
>> +	return __btf_kfunc_id_set_contains(btf, hook, prog, kfunc_btf_id);
>> }
>> 
>> -u32 *btf_kfunc_is_modify_return(const struct btf *btf, u32 kfunc_btf_id)
>> +u32 *btf_kfunc_is_modify_return(const struct btf *btf, u32 kfunc_btf_id,
>> +				const struct bpf_prog *prog)
>> {
>> -	return __btf_kfunc_id_set_contains(btf, BTF_KFUNC_HOOK_FMODRET, kfunc_btf_id);
>> +	return __btf_kfunc_id_set_contains(btf, BTF_KFUNC_HOOK_FMODRET, prog, kfunc_btf_id);
>> }
>> 
>> static int __register_btf_kfunc_id_set(enum btf_kfunc_hook hook,
>> @@ -7909,7 +7947,8 @@ static int __register_btf_kfunc_id_set(enum btf_kfunc_hook hook,
>> 			goto err_out;
>> 	}
>> 
>> -	ret = btf_populate_kfunc_set(btf, hook, kset->set);
>> +	ret = btf_populate_kfunc_set(btf, hook, kset);
>> +
>> err_out:
>> 	btf_put(btf);
>> 	return ret;
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index 20eb2015842f..1a854cdb2566 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -10553,7 +10553,7 @@ static int fetch_kfunc_meta(struct bpf_verifier_env *env,
>> 		*kfunc_name = func_name;
>> 	func_proto = btf_type_by_id(desc_btf, func->type);
>> 
>> -	kfunc_flags = btf_kfunc_id_set_contains(desc_btf, resolve_prog_type(env->prog), func_id);
>> +	kfunc_flags = btf_kfunc_id_set_contains(desc_btf, func_id, env->prog);
>> 	if (!kfunc_flags) {
>> 		return -EACCES;
>> 	}
>> @@ -18521,7 +18521,8 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
>> 				 * in the fmodret id set with the KF_SLEEPABLE flag.
>> 				 */
>> 				else {
>> -					u32 *flags = btf_kfunc_is_modify_return(btf, btf_id);
>> +					u32 *flags = btf_kfunc_is_modify_return(btf, btf_id,
>> +										prog);
>> 
>> 					if (flags && (*flags & KF_SLEEPABLE))
>> 						ret = 0;
>> @@ -18549,7 +18550,7 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
>> 				return -EINVAL;
>> 			}
>> 			ret = -EINVAL;
>> -			if (btf_kfunc_is_modify_return(btf, btf_id) ||
>> +			if (btf_kfunc_is_modify_return(btf, btf_id, prog) ||
>> 			    !check_attach_modify_return(addr, tname))
>> 				ret = 0;
>> 			if (ret) {
>> diff --git a/net/core/filter.c b/net/core/filter.c
>> index a70c7b9876fa..5e5e6f9baccc 100644
>> --- a/net/core/filter.c
>> +++ b/net/core/filter.c
>> @@ -11768,9 +11768,18 @@ BTF_SET8_START(sock_destroy_kfunc_set)
>> BTF_ID_FLAGS(func, bpf_sock_destroy)
>> BTF_SET8_END(sock_destroy_kfunc_set)
>> 
>> +static int tracing_iter_filter(const struct bpf_prog *prog, u32 kfunc_id)
>> +{
>> +	if (btf_id_set8_contains(&sock_destroy_kfunc_set, kfunc_id) &&
>> +	    prog->expected_attach_type != BPF_TRACE_ITER)
>> +		return -EACCES;
>> +	return 0;
>> +}
>> +
>> static const struct btf_kfunc_id_set bpf_sock_destroy_kfunc_set = {
>> 	.owner = THIS_MODULE,
>> 	.set   = &sock_destroy_kfunc_set,
>> +	.filter = tracing_iter_filter,
>> };
>> 
>> static int init_subsystem(void)
>> diff --git a/tools/testing/selftests/bpf/progs/sock_destroy_prog.c b/tools/testing/selftests/bpf/progs/sock_destroy_prog.c
>> index 5c1e65d50598..5204e44f6ae4 100644
>> --- a/tools/testing/selftests/bpf/progs/sock_destroy_prog.c
>> +++ b/tools/testing/selftests/bpf/progs/sock_destroy_prog.c
>> @@ -3,6 +3,7 @@
>> #include "vmlinux.h"
>> #include <bpf/bpf_helpers.h>
>> #include <bpf/bpf_endian.h>
>> +#include <bpf/bpf_tracing.h>
>> 
>> #include "bpf_tracing_net.h"
>> 
>> @@ -144,4 +145,13 @@ int iter_udp6_server(struct bpf_iter__udp *ctx)
>> 	return 0;
>> }
>> 
>> +SEC("tp_btf/tcp_destroy_sock")
>> +int BPF_PROG(trace_tcp_destroy_sock, struct sock *sk)
>> +{
>> +	/* should not load */
>> +	bpf_sock_destroy((struct sock_common *)sk);
>> +
>> +	return 0;
>> +}
> 
> Applied, and tested the patch locally. It works as expected: tp_btf/tcp_destroy_sock is rejected, while the other programs in sock_destroy_prog load successfully. 
> 
> 1: (85) call bpf_sock_destroy#75448
> calling kernel function bpf_sock_destroy is not allowed

Martin: I'm ready to push the next revision that addresses all the review comments on the current patch series. How do you want to coordinate it with this commit?

> 
> 
>> +
>> char _license[] SEC("license") = "GPL";
>> -- 
>> 2.34.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2023-04-12 15:21 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-30 15:17 [PATCH v5 bpf-next 0/7] bpf: Add socket destroy capability Aditi Ghag
2023-03-30 15:17 ` [PATCH v5 bpf-next 1/7] bpf: tcp: Avoid taking fast sock lock in iterator Aditi Ghag
2023-03-30 15:17 ` [PATCH v5 bpf-next 2/7] udp: seq_file: Remove bpf_seq_afinfo from udp_iter_state Aditi Ghag
2023-03-30 17:35   ` Martin KaFai Lau
2023-03-30 15:17 ` [PATCH v5 bpf-next 3/7] udp: seq_file: Helper function to match socket attributes Aditi Ghag
2023-03-30 18:40   ` kernel test robot
2023-03-30 18:51   ` kernel test robot
2023-03-31  2:52   ` kernel test robot
2023-03-31 20:09   ` Martin KaFai Lau
2023-04-03 15:27     ` Aditi Ghag
2023-04-02  6:18   ` kernel test robot
2023-03-30 15:17 ` [PATCH v5 bpf-next 4/7] bpf: udp: Implement batching for sockets iterator Aditi Ghag
2023-03-31 21:08   ` Martin KaFai Lau
2023-04-03 15:54     ` Aditi Ghag
2023-04-03 19:20       ` Martin KaFai Lau
2023-03-30 15:17 ` [PATCH v5 bpf-next 5/7] bpf: Add bpf_sock_destroy kfunc Aditi Ghag
2023-03-31 22:24   ` Martin KaFai Lau
2023-04-04  6:09     ` [RFC PATCH bpf-next] bpf: Add a kfunc filter function to 'struct btf_kfunc_id_set' Martin KaFai Lau
2023-04-05 15:05       ` Aditi Ghag
2023-04-05 17:26         ` Martin KaFai Lau
2023-04-10 23:05       ` Aditi Ghag
2023-04-12 15:21         ` Aditi Ghag
2023-03-30 15:17 ` [PATCH v5 bpf-next 6/7] selftests/bpf: Add helper to get port using getsockname Aditi Ghag
2023-03-30 18:41   ` Stanislav Fomichev
2023-03-31 21:37     ` Martin KaFai Lau
2023-03-30 15:17 ` [PATCH v5 bpf-next 7/7] selftests/bpf: Test bpf_sock_destroy Aditi Ghag
2023-03-30 18:46   ` Stanislav Fomichev
2023-03-31 22:32     ` Martin KaFai Lau
2023-04-03 15:55       ` Aditi Ghag
2023-04-03 17:35         ` Martin KaFai Lau
2023-04-04  0:15           ` Aditi Ghag
2023-04-04  1:41             ` Martin KaFai Lau
2023-04-04 14:24               ` Aditi Ghag
