All of lore.kernel.org
 help / color / mirror / Atom feed
* [bpf PATCH 0/3] BPF, a couple sockmap fixes
@ 2018-04-11 23:56 John Fastabend
  2018-04-11 23:56 ` [bpf PATCH 1/3] bpf: sockmap, map_release does not hold refcnt for pinned maps John Fastabend
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: John Fastabend @ 2018-04-11 23:56 UTC (permalink / raw)
  To: ast, daniel; +Cc: netdev

While testing sockmap with more programs (besides our test programs)
I found a couple issues.

The attached series fixes an issue where pinned maps were not
working correctly, blocking sockets returned zero, and an error
path that when the sock hit an out of memory case resulted in a
double page_put() while doing ingress redirects.

See individual patches for more details.

Thanks,
John

---

John Fastabend (3):
      bpf: sockmap, map_release does not hold refcnt for pinned maps
      bpf: sockmap, sk_wait_event needed to handle blocking cases
      bpf: sockmap, fix double put_page on ENOMEM error in redirect path


 include/linux/bpf.h  |    3 +++
 kernel/bpf/sockmap.c |   51 ++++++++++++++++++++++++++++++++++++++++++++++----
 kernel/bpf/syscall.c |   10 +++++++++-
 3 files changed, 59 insertions(+), 5 deletions(-)

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [bpf PATCH 1/3] bpf: sockmap, map_release does not hold refcnt for pinned maps
  2018-04-11 23:56 [bpf PATCH 0/3] BPF, a couple sockmap fixes John Fastabend
@ 2018-04-11 23:56 ` John Fastabend
  2018-04-11 23:56 ` [bpf PATCH 2/3] bpf: sockmap, sk_wait_event needed to handle blocking cases John Fastabend
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: John Fastabend @ 2018-04-11 23:56 UTC (permalink / raw)
  To: ast, daniel; +Cc: netdev

Relying on map_release hook to decrement the reference counts when a
map is removed only works if the map is not being pinned. In the
pinned case the ref is decremented immediately and the BPF programs
released. After this BPF programs may not be in-use which is not
what the user would expect.

This patch moves the release logic into bpf_map_put_uref() and brings
sockmap in-line with how a similar case is handled in prog array maps.

Fixes: 3d9e952697de ("bpf: sockmap, fix leaking maps with attached but not detached progs")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 include/linux/bpf.h  |    3 +++
 kernel/bpf/sockmap.c |    3 +--
 kernel/bpf/syscall.c |   10 +++++++++-
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 95a7abd..8a71058 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -646,6 +646,7 @@ static inline void bpf_map_offload_map_free(struct bpf_map *map)
 #if defined(CONFIG_STREAM_PARSER) && defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_INET)
 struct sock  *__sock_map_lookup_elem(struct bpf_map *map, u32 key);
 int sock_map_prog(struct bpf_map *map, struct bpf_prog *prog, u32 type);
+void sock_map_release(struct bpf_map *map);
 #else
 static inline struct sock  *__sock_map_lookup_elem(struct bpf_map *map, u32 key)
 {
@@ -658,6 +659,8 @@ static inline int sock_map_prog(struct bpf_map *map,
 {
 	return -EOPNOTSUPP;
 }
+
+void sock_map_release(struct bpf_map *map) {}
 #endif
 
 /* verifier prototypes for helper functions called from eBPF programs */
diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 8dd9210..cef187f 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -1834,7 +1834,7 @@ static int sock_map_update_elem(struct bpf_map *map,
 	return err;
 }
 
-static void sock_map_release(struct bpf_map *map, struct file *map_file)
+void sock_map_release(struct bpf_map *map)
 {
 	struct bpf_stab *stab = container_of(map, struct bpf_stab, map);
 	struct bpf_prog *orig;
@@ -1858,7 +1858,6 @@ static void sock_map_release(struct bpf_map *map, struct file *map_file)
 	.map_get_next_key = sock_map_get_next_key,
 	.map_update_elem = sock_map_update_elem,
 	.map_delete_elem = sock_map_delete_elem,
-	.map_release = sock_map_release,
 };
 
 BPF_CALL_4(bpf_sock_map_update, struct bpf_sock_ops_kern *, bpf_sock,
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 0244973..449a618 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -257,8 +257,16 @@ static void bpf_map_free_deferred(struct work_struct *work)
 static void bpf_map_put_uref(struct bpf_map *map)
 {
 	if (atomic_dec_and_test(&map->usercnt)) {
-		if (map->map_type == BPF_MAP_TYPE_PROG_ARRAY)
+		switch (map->map_type) {
+		case BPF_MAP_TYPE_PROG_ARRAY:
 			bpf_fd_array_map_clear(map);
+			break;
+		case BPF_MAP_TYPE_SOCKMAP:
+			sock_map_release(map);
+			break;
+		default:
+			break;
+		}
 	}
 }
 

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [bpf PATCH 2/3] bpf: sockmap, sk_wait_event needed to handle blocking cases
  2018-04-11 23:56 [bpf PATCH 0/3] BPF, a couple sockmap fixes John Fastabend
  2018-04-11 23:56 ` [bpf PATCH 1/3] bpf: sockmap, map_release does not hold refcnt for pinned maps John Fastabend
@ 2018-04-11 23:56 ` John Fastabend
  2018-04-11 23:56 ` [bpf PATCH 3/3] bpf: sockmap, fix double put_page on ENOMEM error in redirect path John Fastabend
  2018-04-13  1:11 ` [bpf PATCH 0/3] BPF, a couple sockmap fixes Daniel Borkmann
  3 siblings, 0 replies; 5+ messages in thread
From: John Fastabend @ 2018-04-11 23:56 UTC (permalink / raw)
  To: ast, daniel; +Cc: netdev

In the recvmsg handler we need to add a wait event to support the
blocking use cases. Without this we return zero and may confuse
user applications. In the wait event any data received on the
sk either via sk_receive_queue or the psock ingress list will
wake up the sock.

Fixes: fa246693a111 ("bpf: sockmap, BPF_F_INGRESS flag for BPF_SK_SKB_STREAM_VERDICT")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 kernel/bpf/sockmap.c |   45 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index cef187f..ffe266b 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -43,6 +43,7 @@
 #include <net/tcp.h>
 #include <linux/ptr_ring.h>
 #include <net/inet_common.h>
+#include <linux/sched/signal.h>
 
 #define SOCK_CREATE_FLAG_MASK \
 	(BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
@@ -732,6 +733,26 @@ static int bpf_exec_tx_verdict(struct smap_psock *psock,
 	return err;
 }
 
+static int bpf_wait_data(struct sock *sk,
+			 struct smap_psock *psk, int flags,
+			 long timeo, int *err)
+{
+	int rc;
+
+	DEFINE_WAIT_FUNC(wait, woken_wake_function);
+
+	add_wait_queue(sk_sleep(sk), &wait);
+	sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk);
+	rc = sk_wait_event(sk, &timeo,
+			   !list_empty(&psk->ingress) ||
+			   !skb_queue_empty(&sk->sk_receive_queue),
+			   &wait);
+	sk_clear_bit(SOCKWQ_ASYNC_WAITDATA, sk);
+	remove_wait_queue(sk_sleep(sk), &wait);
+
+	return rc;
+}
+
 static int bpf_tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 			   int nonblock, int flags, int *addr_len)
 {
@@ -755,6 +776,7 @@ static int bpf_tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 		return tcp_recvmsg(sk, msg, len, nonblock, flags, addr_len);
 
 	lock_sock(sk);
+bytes_ready:
 	while (copied != len) {
 		struct scatterlist *sg;
 		struct sk_msg_buff *md;
@@ -809,6 +831,29 @@ static int bpf_tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 		}
 	}
 
+	if (!copied) {
+		long timeo;
+		int data;
+		int err = 0;
+
+		timeo = sock_rcvtimeo(sk, nonblock);
+		data = bpf_wait_data(sk, psock, flags, timeo, &err);
+
+		if (data) {
+			if (!skb_queue_empty(&sk->sk_receive_queue)) {
+				release_sock(sk);
+				smap_release_sock(psock, sk);
+				copied = tcp_recvmsg(sk, msg, len,
+						     nonblock, flags, addr_len);
+				return copied;
+			}
+			goto bytes_ready;
+		}
+
+		if (err)
+			copied = err;
+	}
+
 	release_sock(sk);
 	smap_release_sock(psock, sk);
 	return copied;

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [bpf PATCH 3/3] bpf: sockmap, fix double put_page on ENOMEM error in redirect path
  2018-04-11 23:56 [bpf PATCH 0/3] BPF, a couple sockmap fixes John Fastabend
  2018-04-11 23:56 ` [bpf PATCH 1/3] bpf: sockmap, map_release does not hold refcnt for pinned maps John Fastabend
  2018-04-11 23:56 ` [bpf PATCH 2/3] bpf: sockmap, sk_wait_event needed to handle blocking cases John Fastabend
@ 2018-04-11 23:56 ` John Fastabend
  2018-04-13  1:11 ` [bpf PATCH 0/3] BPF, a couple sockmap fixes Daniel Borkmann
  3 siblings, 0 replies; 5+ messages in thread
From: John Fastabend @ 2018-04-11 23:56 UTC (permalink / raw)
  To: ast, daniel; +Cc: netdev

In the case where the socket memory boundary is hit the redirect
path returns an ENOMEM error. However, before checking for this
condition the redirect scatterlist buffer is setup with a valid
page and length. This is never unwound so when the buffers are
released latter in the error path we do a put_page() and clear
the scatterlist fields. But, because the initial error happens
before completing the scatterlist buffer we end up with both the
original buffer and the redirect buffer pointing to the same page
resulting in duplicate put_page() calls.

To fix this simply move the initial configuration of the redirect
scatterlist buffer below the sock memory check.

Found this while running TCP_STREAM test with netperf using Cilium.

Fixes: fa246693a111 ("bpf: sockmap, BPF_F_INGRESS flag for BPF_SK_SKB_STREAM_VERDICT")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 kernel/bpf/sockmap.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index ffe266b..0fefdb0 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -524,8 +524,6 @@ static int bpf_tcp_ingress(struct sock *sk, int apply_bytes,
 	i = md->sg_start;
 
 	do {
-		r->sg_data[i] = md->sg_data[i];
-
 		size = (apply && apply_bytes < md->sg_data[i].length) ?
 			apply_bytes : md->sg_data[i].length;
 
@@ -536,6 +534,7 @@ static int bpf_tcp_ingress(struct sock *sk, int apply_bytes,
 		}
 
 		sk_mem_charge(sk, size);
+		r->sg_data[i] = md->sg_data[i];
 		r->sg_data[i].length = size;
 		md->sg_data[i].length -= size;
 		md->sg_data[i].offset += size;

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [bpf PATCH 0/3] BPF, a couple sockmap fixes
  2018-04-11 23:56 [bpf PATCH 0/3] BPF, a couple sockmap fixes John Fastabend
                   ` (2 preceding siblings ...)
  2018-04-11 23:56 ` [bpf PATCH 3/3] bpf: sockmap, fix double put_page on ENOMEM error in redirect path John Fastabend
@ 2018-04-13  1:11 ` Daniel Borkmann
  3 siblings, 0 replies; 5+ messages in thread
From: Daniel Borkmann @ 2018-04-13  1:11 UTC (permalink / raw)
  To: John Fastabend, ast; +Cc: netdev

On 04/12/2018 01:56 AM, John Fastabend wrote:
> While testing sockmap with more programs (besides our test programs)
> I found a couple issues.
> 
> The attached series fixes an issue where pinned maps were not
> working correctly, blocking sockets returned zero, and an error
> path that when the sock hit an out of memory case resulted in a
> double page_put() while doing ingress redirects.
> 
> See individual patches for more details.

Applied to bpf tree, thanks John!

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2018-04-13  1:11 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-11 23:56 [bpf PATCH 0/3] BPF, a couple sockmap fixes John Fastabend
2018-04-11 23:56 ` [bpf PATCH 1/3] bpf: sockmap, map_release does not hold refcnt for pinned maps John Fastabend
2018-04-11 23:56 ` [bpf PATCH 2/3] bpf: sockmap, sk_wait_event needed to handle blocking cases John Fastabend
2018-04-11 23:56 ` [bpf PATCH 3/3] bpf: sockmap, fix double put_page on ENOMEM error in redirect path John Fastabend
2018-04-13  1:11 ` [bpf PATCH 0/3] BPF, a couple sockmap fixes Daniel Borkmann

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.