All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH bpf v3 0/2] sockmap fix for KASAN_VMALLOC and af_unix
@ 2023-11-28 15:55 John Fastabend
  2023-11-28 15:55 ` [PATCH bpf v3 1/2] bpf: sockmap, af_unix stream sockets need to hold ref for pair sock John Fastabend
  2023-11-28 15:55 ` [PATCH bpf v3 2/2] bpf: sockmap, add af_unix test with both sockets in map John Fastabend
  0 siblings, 2 replies; 4+ messages in thread
From: John Fastabend @ 2023-11-28 15:55 UTC (permalink / raw)
  To: martin.lau, jakub; +Cc: john.fastabend, bpf, netdev

The af_unix tests in sockmap_listen causes a splat from KASAN_VMALLOC.
Fix it here and include an extra test to catch case where both pairs
of the af_unix socket are included in a BPF sockmap.

Also it seems the test infra is not passing type through correctly when
testing unix_inet_redir_to_connected. Unfortunately, the simple fix
also caused some CI tests to fail so investigating that now.

v3: drop unnecessary assignment (Martin) and rebase on latest selftests.

v2: drop changes to dgram side its fine per Jakub's point it graps a
    reference on the peer socket from each sendmsg.

John Fastabend (2):
  bpf: sockmap, af_unix stream sockets need to hold ref for pair sock
  bpf: sockmap, add af_unix test with both sockets in map

 include/linux/skmsg.h                         |  1 +
 include/net/af_unix.h                         |  1 +
 net/core/skmsg.c                              |  2 +
 net/unix/af_unix.c                            |  2 -
 net/unix/unix_bpf.c                           |  5 ++
 .../selftests/bpf/prog_tests/sockmap_listen.c | 51 +++++++++++++++----
 .../selftests/bpf/progs/test_sockmap_listen.c |  7 +++
 7 files changed, 56 insertions(+), 13 deletions(-)

-- 
2.33.0


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH bpf v3 1/2] bpf: sockmap, af_unix stream sockets need to hold ref for pair sock
  2023-11-28 15:55 [PATCH bpf v3 0/2] sockmap fix for KASAN_VMALLOC and af_unix John Fastabend
@ 2023-11-28 15:55 ` John Fastabend
  2023-11-28 15:55 ` [PATCH bpf v3 2/2] bpf: sockmap, add af_unix test with both sockets in map John Fastabend
  1 sibling, 0 replies; 4+ messages in thread
From: John Fastabend @ 2023-11-28 15:55 UTC (permalink / raw)
  To: martin.lau, jakub; +Cc: john.fastabend, bpf, netdev

AF_UNIX stream sockets are a paired socket. So sending on one of the pairs
will lookup the paired socket as part of the send operation. It is possible
however to put just one of the pairs in a BPF map. This currently
increments the refcnt on the sock in the sockmap to ensure it is not
free'd by the stack before sockmap cleans up its state and stops any
skbs being sent/recv'd to that socket.

But we missed a case. If the peer socket is closed it will be
free'd by the stack. However, the paired socket can still be
referenced from BPF sockmap side because we hold a reference
there. Then if we are sending traffic through BPF sockmap to
that socket it will try to dereference the free'd pair in its
send logic creating a use after free.  And following splat,

   [59.900375] BUG: KASAN: slab-use-after-free in sk_wake_async+0x31/0x1b0
   [59.901211] Read of size 8 at addr ffff88811acbf060 by task kworker/1:2/954
   [...]
   [59.905468] Call Trace:
   [59.905787]  <TASK>
   [59.906066]  dump_stack_lvl+0x130/0x1d0
   [59.908877]  print_report+0x16f/0x740
   [59.910629]  kasan_report+0x118/0x160
   [59.912576]  sk_wake_async+0x31/0x1b0
   [59.913554]  sock_def_readable+0x156/0x2a0
   [59.914060]  unix_stream_sendmsg+0x3f9/0x12a0
   [59.916398]  sock_sendmsg+0x20e/0x250
   [59.916854]  skb_send_sock+0x236/0xac0
   [59.920527]  sk_psock_backlog+0x287/0xaa0

To fix let BPF sockmap hold a refcnt on both the socket in the
sockmap and its paired socket.  It wasn't obvious how to contain
the fix to bpf_unix logic. The primarily problem with keeping this
logic in bpf_unix was: In the sock close() we could handle the
deref by having a close handler. But, when we are destroying the
psock through a map delete operation we wouldn't have gotten any
signal thorugh the proto struct other than it being replaced.
If we do the deref from the proto replace its too early because
we need to deref the skpair after the backlog worker has been
stopped.

Given all this it seems best to just cache it at the end of the
psock and eat 8B for the af_unix and vsock users.

Notice dgram sockets are OK because they handle locking already.

Fixes: 94531cfcbe79 ("af_unix: Add unix_stream_proto for sockmap")
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 include/linux/skmsg.h | 1 +
 include/net/af_unix.h | 1 +
 net/core/skmsg.c      | 2 ++
 net/unix/af_unix.c    | 2 --
 net/unix/unix_bpf.c   | 5 +++++
 5 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index c1637515a8a4..fbe628961cf8 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -106,6 +106,7 @@ struct sk_psock {
 	struct mutex			work_mutex;
 	struct sk_psock_work_state	work_state;
 	struct delayed_work		work;
+	struct sock			*skpair;
 	struct rcu_work			rwork;
 };
 
diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index 824c258143a3..49c4640027d8 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -75,6 +75,7 @@ struct unix_sock {
 };
 
 #define unix_sk(ptr) container_of_const(ptr, struct unix_sock, sk)
+#define unix_peer(sk) (unix_sk(sk)->peer)
 
 #define peer_wait peer_wq.wait
 
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 6c31eefbd777..6236164b9bce 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -826,6 +826,8 @@ static void sk_psock_destroy(struct work_struct *work)
 
 	if (psock->sk_redir)
 		sock_put(psock->sk_redir);
+	if (psock->skpair)
+		sock_put(psock->skpair);
 	sock_put(psock->sk);
 	kfree(psock);
 }
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index a357dc5f2404..ac1f2bc18fc9 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -213,8 +213,6 @@ static inline bool unix_secdata_eq(struct scm_cookie *scm, struct sk_buff *skb)
 }
 #endif /* CONFIG_SECURITY_NETWORK */
 
-#define unix_peer(sk) (unix_sk(sk)->peer)
-
 static inline int unix_our_peer(struct sock *sk, struct sock *osk)
 {
 	return unix_peer(osk) == sk;
diff --git a/net/unix/unix_bpf.c b/net/unix/unix_bpf.c
index 2f9d8271c6ec..074ab91485f1 100644
--- a/net/unix/unix_bpf.c
+++ b/net/unix/unix_bpf.c
@@ -159,12 +159,17 @@ int unix_dgram_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool re
 
 int unix_stream_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore)
 {
+	struct sock *skpair;
+
 	if (restore) {
 		sk->sk_write_space = psock->saved_write_space;
 		sock_replace_proto(sk, psock->sk_proto);
 		return 0;
 	}
 
+	skpair = unix_peer(sk);
+	sock_hold(skpair);
+	psock->skpair = skpair;
 	unix_stream_bpf_check_needs_rebuild(psock->sk_proto);
 	sock_replace_proto(sk, &unix_stream_bpf_prot);
 	return 0;
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH bpf v3 2/2] bpf: sockmap, add af_unix test with both sockets in map
  2023-11-28 15:55 [PATCH bpf v3 0/2] sockmap fix for KASAN_VMALLOC and af_unix John Fastabend
  2023-11-28 15:55 ` [PATCH bpf v3 1/2] bpf: sockmap, af_unix stream sockets need to hold ref for pair sock John Fastabend
@ 2023-11-28 15:55 ` John Fastabend
  2023-11-29  2:40   ` kernel test robot
  1 sibling, 1 reply; 4+ messages in thread
From: John Fastabend @ 2023-11-28 15:55 UTC (permalink / raw)
  To: martin.lau, jakub; +Cc: john.fastabend, bpf, netdev

This adds a test where both pairs of a af_unix paired socket are put into
a BPF map. This ensures that when we tear down the af_unix pair we don't
have any issues on sockmap side with ordering and reference counting.

Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 .../selftests/bpf/prog_tests/sockmap_listen.c | 51 +++++++++++++++----
 .../selftests/bpf/progs/test_sockmap_listen.c |  7 +++
 2 files changed, 47 insertions(+), 11 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
index a934d430c20c..a07d4862eeba 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
@@ -1337,7 +1337,8 @@ static void test_redir(struct test_sockmap_listen *skel, struct bpf_map *map,
 }
 
 static void pairs_redir_to_connected(int cli0, int peer0, int cli1, int peer1,
-				     int sock_mapfd, int verd_mapfd, enum redir_mode mode)
+				     int sock_mapfd, int nop_mapfd,
+				     int verd_mapfd, enum redir_mode mode)
 {
 	const char *log_prefix = redir_mode_str(mode);
 	unsigned int pass;
@@ -1351,6 +1352,12 @@ static void pairs_redir_to_connected(int cli0, int peer0, int cli1, int peer1,
 	if (err)
 		return;
 
+	if (nop_madfd >= 0) {
+		err = add_to_sockmap(nop_mapfd, cli0, cli1);
+		if (err)
+			return;
+	}
+
 	n = write(cli1, "a", 1);
 	if (n < 0)
 		FAIL_ERRNO("%s: write", log_prefix);
@@ -1387,7 +1394,7 @@ static void unix_redir_to_connected(int sotype, int sock_mapfd,
 		goto close0;
 	c1 = sfd[0], p1 = sfd[1];
 
-	pairs_redir_to_connected(c0, p0, c1, p1, sock_mapfd, verd_mapfd, mode);
+	pairs_redir_to_connected(c0, p0, c1, p1, sock_mapfd, -1, verd_mapfd, mode);
 
 	xclose(c1);
 	xclose(p1);
@@ -1677,7 +1684,7 @@ static void udp_redir_to_connected(int family, int sock_mapfd, int verd_mapfd,
 	if (err)
 		goto close_cli0;
 
-	pairs_redir_to_connected(c0, p0, c1, p1, sock_mapfd, verd_mapfd, mode);
+	pairs_redir_to_connected(c0, p0, c1, p1, sock_mapfd, -1, verd_mapfd, mode);
 
 	xclose(c1);
 	xclose(p1);
@@ -1735,7 +1742,7 @@ static void inet_unix_redir_to_connected(int family, int type, int sock_mapfd,
 	if (err)
 		goto close;
 
-	pairs_redir_to_connected(c0, p0, c1, p1, sock_mapfd, verd_mapfd, mode);
+	pairs_redir_to_connected(c0, p0, c1, p1, sock_mapfd, -1, verd_mapfd, mode);
 
 	xclose(c1);
 	xclose(p1);
@@ -1770,8 +1777,10 @@ static void inet_unix_skb_redir_to_connected(struct test_sockmap_listen *skel,
 	xbpf_prog_detach2(verdict, sock_map, BPF_SK_SKB_VERDICT);
 }
 
-static void unix_inet_redir_to_connected(int family, int type, int sock_mapfd,
-					int verd_mapfd, enum redir_mode mode)
+static void unix_inet_redir_to_connected(int family, int type,
+					int sock_mapfd, int nop_mapfd,
+					int verd_mapfd,
+					enum redir_mode mode)
 {
 	int c0, c1, p0, p1;
 	int sfd[2];
@@ -1785,7 +1794,8 @@ static void unix_inet_redir_to_connected(int family, int type, int sock_mapfd,
 		goto close_cli0;
 	c1 = sfd[0], p1 = sfd[1];
 
-	pairs_redir_to_connected(c0, p0, c1, p1, sock_mapfd, verd_mapfd, mode);
+	pairs_redir_to_connected(c0, p0, c1, p1,
+				 sock_mapfd, nop_mapfd, verd_mapfd, mode);
 
 	xclose(c1);
 	xclose(p1);
@@ -1799,6 +1809,7 @@ static void unix_inet_skb_redir_to_connected(struct test_sockmap_listen *skel,
 					    struct bpf_map *inner_map, int family)
 {
 	int verdict = bpf_program__fd(skel->progs.prog_skb_verdict);
+	int nop_map = bpf_map__fd(skel->maps.nop_map);
 	int verdict_map = bpf_map__fd(skel->maps.verdict_map);
 	int sock_map = bpf_map__fd(inner_map);
 	int err;
@@ -1808,14 +1819,32 @@ static void unix_inet_skb_redir_to_connected(struct test_sockmap_listen *skel,
 		return;
 
 	skel->bss->test_ingress = false;
-	unix_inet_redir_to_connected(family, SOCK_DGRAM, sock_map, verdict_map,
+	unix_inet_redir_to_connected(family, SOCK_DGRAM,
+				     sock_map, -1, verdict_map,
 				     REDIR_EGRESS);
-	unix_inet_redir_to_connected(family, SOCK_STREAM, sock_map, verdict_map,
+	unix_inet_redir_to_connected(family, SOCK_DGRAM,
+				     sock_map, -1, verdict_map,
+				     REDIR_EGRESS);
+
+	unix_inet_redir_to_connected(family, SOCK_DGRAM,
+				     sock_map, nop_map, verdict_map,
+				     REDIR_EGRESS);
+	unix_inet_redir_to_connected(family, SOCK_STREAM,
+				     sock_map, nop_map, verdict_map,
 				     REDIR_EGRESS);
 	skel->bss->test_ingress = true;
-	unix_inet_redir_to_connected(family, SOCK_DGRAM, sock_map, verdict_map,
+	unix_inet_redir_to_connected(family, SOCK_DGRAM,
+				     sock_map, -1, verdict_map,
+				     REDIR_INGRESS);
+	unix_inet_redir_to_connected(family, SOCK_STREAM,
+				     sock_map, -1, verdict_map,
+				     REDIR_INGRESS);
+
+	unix_inet_redir_to_connected(family, SOCK_DGRAM,
+				     sock_map, nop_map, verdict_map,
 				     REDIR_INGRESS);
-	unix_inet_redir_to_connected(family, SOCK_STREAM, sock_map, verdict_map,
+	unix_inet_redir_to_connected(family, SOCK_STREAM,
+				     sock_map, nop_map, verdict_map,
 				     REDIR_INGRESS);
 
 	xbpf_prog_detach2(verdict, sock_map, BPF_SK_SKB_VERDICT);
diff --git a/tools/testing/selftests/bpf/progs/test_sockmap_listen.c b/tools/testing/selftests/bpf/progs/test_sockmap_listen.c
index 464d35bd57c7..b7250eb9c30c 100644
--- a/tools/testing/selftests/bpf/progs/test_sockmap_listen.c
+++ b/tools/testing/selftests/bpf/progs/test_sockmap_listen.c
@@ -14,6 +14,13 @@ struct {
 	__type(value, __u64);
 } sock_map SEC(".maps");
 
+struct {
+	__uint(type, BPF_MAP_TYPE_SOCKMAP);
+	__uint(max_entries, 2);
+	__type(key, __u32);
+	__type(value, __u64);
+} nop_map SEC(".maps");
+
 struct {
 	__uint(type, BPF_MAP_TYPE_SOCKHASH);
 	__uint(max_entries, 2);
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH bpf v3 2/2] bpf: sockmap, add af_unix test with both sockets in map
  2023-11-28 15:55 ` [PATCH bpf v3 2/2] bpf: sockmap, add af_unix test with both sockets in map John Fastabend
@ 2023-11-29  2:40   ` kernel test robot
  0 siblings, 0 replies; 4+ messages in thread
From: kernel test robot @ 2023-11-29  2:40 UTC (permalink / raw)
  To: John Fastabend, martin.lau, jakub
  Cc: oe-kbuild-all, john.fastabend, bpf, netdev

Hi John,

kernel test robot noticed the following build errors:

[auto build test ERROR on bpf/master]

url:    https://github.com/intel-lab-lkp/linux/commits/John-Fastabend/bpf-sockmap-af_unix-stream-sockets-need-to-hold-ref-for-pair-sock/20231128-235707
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git master
patch link:    https://lore.kernel.org/r/20231128155515.9302-3-john.fastabend%40gmail.com
patch subject: [PATCH bpf v3 2/2] bpf: sockmap, add af_unix test with both sockets in map
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231129/202311290745.tAZIyCyC-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202311290745.tAZIyCyC-lkp@intel.com/

All errors (new ones prefixed by >>):

   tools/testing/selftests/bpf/prog_tests/sockmap_listen.c: In function 'pairs_redir_to_connected':
>> tools/testing/selftests/bpf/prog_tests/sockmap_listen.c:1355:13: error: 'nop_madfd' undeclared (first use in this function); did you mean 'nop_mapfd'?
    1355 |         if (nop_madfd >= 0) {
         |             ^~~~~~~~~
         |             nop_mapfd
   tools/testing/selftests/bpf/prog_tests/sockmap_listen.c:1355:13: note: each undeclared identifier is reported only once for each function it appears in


vim +1355 tools/testing/selftests/bpf/prog_tests/sockmap_listen.c

  1338	
  1339	static void pairs_redir_to_connected(int cli0, int peer0, int cli1, int peer1,
  1340					     int sock_mapfd, int nop_mapfd,
  1341					     int verd_mapfd, enum redir_mode mode)
  1342	{
  1343		const char *log_prefix = redir_mode_str(mode);
  1344		unsigned int pass;
  1345		int err, n;
  1346		u32 key;
  1347		char b;
  1348	
  1349		zero_verdict_count(verd_mapfd);
  1350	
  1351		err = add_to_sockmap(sock_mapfd, peer0, peer1);
  1352		if (err)
  1353			return;
  1354	
> 1355		if (nop_madfd >= 0) {
  1356			err = add_to_sockmap(nop_mapfd, cli0, cli1);
  1357			if (err)
  1358				return;
  1359		}
  1360	
  1361		n = write(cli1, "a", 1);
  1362		if (n < 0)
  1363			FAIL_ERRNO("%s: write", log_prefix);
  1364		if (n == 0)
  1365			FAIL("%s: incomplete write", log_prefix);
  1366		if (n < 1)
  1367			return;
  1368	
  1369		key = SK_PASS;
  1370		err = xbpf_map_lookup_elem(verd_mapfd, &key, &pass);
  1371		if (err)
  1372			return;
  1373		if (pass != 1)
  1374			FAIL("%s: want pass count 1, have %d", log_prefix, pass);
  1375	
  1376		n = recv_timeout(mode == REDIR_INGRESS ? peer0 : cli0, &b, 1, 0, IO_TIMEOUT_SEC);
  1377		if (n < 0)
  1378			FAIL_ERRNO("%s: recv_timeout", log_prefix);
  1379		if (n == 0)
  1380			FAIL("%s: incomplete recv", log_prefix);
  1381	}
  1382	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-11-29  2:43 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-28 15:55 [PATCH bpf v3 0/2] sockmap fix for KASAN_VMALLOC and af_unix John Fastabend
2023-11-28 15:55 ` [PATCH bpf v3 1/2] bpf: sockmap, af_unix stream sockets need to hold ref for pair sock John Fastabend
2023-11-28 15:55 ` [PATCH bpf v3 2/2] bpf: sockmap, add af_unix test with both sockets in map John Fastabend
2023-11-29  2:40   ` kernel test robot

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.