linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next v4 0/4] Add SCM_PIDFD and SO_PEERPIDFD
@ 2023-04-13 13:33 Alexander Mikhalitsyn
  2023-04-13 13:33 ` [PATCH net-next v4 1/4] scm: add SO_PASSPIDFD and SCM_PIDFD Alexander Mikhalitsyn
                   ` (3 more replies)
  0 siblings, 4 replies; 27+ messages in thread
From: Alexander Mikhalitsyn @ 2023-04-13 13:33 UTC (permalink / raw)
  To: davem
  Cc: linux-kernel, netdev, daniel, Alexander Mikhalitsyn,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Leon Romanovsky,
	David Ahern, Arnd Bergmann, Kees Cook, Christian Brauner,
	Kuniyuki Iwashima, Lennart Poettering, Luca Boccassi

1. Implement SCM_PIDFD, a new type of CMSG type analogical to SCM_CREDENTIALS,
but it contains pidfd instead of plain pid, which allows programmers not
to care about PID reuse problem.

2. Add SO_PEERPIDFD which allows to get pidfd of peer socket holder pidfd.
This thing is direct analog of SO_PEERCRED which allows to get plain PID.

3. Add SCM_PIDFD / SO_PEERPIDFD kselftest

Idea comes from UAPI kernel group:
https://uapi-group.org/kernel-features/

Big thanks to Christian Brauner and Lennart Poettering for productive
discussions about this and Luca Boccassi for testing and reviewing this.

=== Motivation behind this patchset

Eric Dumazet raised a question:
> It seems that we already can use pidfd_open() (since linux-5.3), and
> pass the resulting fd in af_unix SCM_RIGHTS message ?

Yes, it's possible, but it means that from the receiver side we need
to trust the sent pidfd (in SCM_RIGHTS),
or always use combination of SCM_RIGHTS+SCM_CREDENTIALS, then we can
extract pidfd from SCM_RIGHTS,
then acquire plain pid from pidfd and after compare it with the pid
from SCM_CREDENTIALS.

A few comments from other folks regarding this.

Christian Brauner wrote:

>Let me try and provide some of the missing background.

>There are a range of use-cases where we would like to authenticate a
>client through sockets without being susceptible to PID recycling
>attacks. Currently, we can't do this as the race isn't fully fixable.
>We can only apply mitigations.

>What this patchset will allows us to do is to get a pidfd without the
>client having to send us an fd explicitly via SCM_RIGHTS. As that's
>already possibly as you correctly point out.

>But for protocols like polkit this is quite important. Every message is
>standalone and we would need to force a complete protocol change where
>we would need to require that every client allocate and send a pidfd via
>SCM_RIGHTS. That would also mean patching through all polkit users.

>For something like systemd-journald where we provide logging facilities
>and want to add metadata to the log we would also immensely benefit from
>being able to get a receiver-side controlled pidfd.

>With the message type we envisioned we don't need to change the sender
>at all and can be safe against pid recycling.

>Link: https://gitlab.freedesktop.org/polkit/polkit/-/merge_requests/154
>Link: https://uapi-group.org/kernel-features

Lennart Poettering wrote:

>So yes, this is of course possible, but it would mean the pidfd would
>have to be transported as part of the user protocol, explicitly sent
>by the sender. (Moreover, the receiver after receiving the pidfd would
>then still have to somehow be able to prove that the pidfd it just
>received actually refers to the peer's process and not some random
>process. – this part is actually solvable in userspace, but ugly)

>The big thing is simply that we want that the pidfd is associated
>*implicity* with each AF_UNIX connection, not explicitly. A lot of
>userspace already relies on this, both in the authentication area
>(polkit) as well as in the logging area (systemd-journald). Right now
>using the PID field from SO_PEERCREDS/SCM_CREDENTIALS is racy though
>and very hard to get right. Making this available as pidfd too, would
>solve this raciness, without otherwise changing semantics of it all:
>receivers can still enable the creds stuff as they wish, and the data
>is then implicitly appended to the connections/datagrams the sender
>initiates.

>Or to turn this around: things like polkit are typically used to
>authenticate arbitrary dbus methods calls: some service implements a
>dbus method call, and when an unprivileged client then issues that
>call, it will take the client's info, go to polkit and ask it if this
>is ok. If we wanted to send the pidfd as part of the protocol we
>basically would have to extend every single method call to contain the
>client's pidfd along with it as an additional argument, which would be
>a massive undertaking: it would change the prototypes of basically
>*all* methods a service defines… And that's just ugly.

>Note that Alex' patch set doesn't expose anything that wasn't exposed
>before, or attach, propagate what wasn't before. All it does, is make
>the field already available anyway (the struct ucred .pid field)
>available also in a better way (as a pidfd), to solve a variety of
>races, with no effect on the protocol actually spoken within the
>AF_UNIX transport. It's a seamless improvement of the status quo.

===

This patch series is on top of net-next tree with pidfd.file.api.v6.4
tag (from git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux.git) merged in.

Git tree:
https://github.com/mihalicyn/linux/tree/scm_pidfd

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: David Ahern <dsahern@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Lennart Poettering <mzxreary@0pointer.de>
Cc: Luca Boccassi <bluca@debian.org>

Tested-by: Luca Boccassi <bluca@debian.org>

Alexander Mikhalitsyn (4):
  scm: add SO_PASSPIDFD and SCM_PIDFD
  net: socket: add sockopts blacklist for BPF cgroup hook
  net: core: add getsockopt SO_PEERPIDFD
  selftests: net: add SCM_PIDFD / SO_PEERPIDFD test

 arch/alpha/include/uapi/asm/socket.h          |   3 +
 arch/mips/include/uapi/asm/socket.h           |   3 +
 arch/parisc/include/uapi/asm/socket.h         |   3 +
 arch/sparc/include/uapi/asm/socket.h          |   3 +
 include/linux/net.h                           |   1 +
 include/linux/socket.h                        |   1 +
 include/net/scm.h                             |  39 +-
 include/uapi/asm-generic/socket.h             |   3 +
 net/core/sock.c                               |  44 ++
 net/mptcp/sockopt.c                           |   1 +
 net/socket.c                                  |  45 +-
 net/unix/af_unix.c                            |  18 +-
 tools/include/uapi/asm-generic/socket.h       |   3 +
 tools/testing/selftests/net/.gitignore        |   1 +
 tools/testing/selftests/net/af_unix/Makefile  |   2 +-
 .../testing/selftests/net/af_unix/scm_pidfd.c | 430 ++++++++++++++++++
 16 files changed, 589 insertions(+), 11 deletions(-)
 create mode 100644 tools/testing/selftests/net/af_unix/scm_pidfd.c

-- 
2.34.1


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH net-next v4 1/4] scm: add SO_PASSPIDFD and SCM_PIDFD
  2023-04-13 13:33 [PATCH net-next v4 0/4] Add SCM_PIDFD and SO_PEERPIDFD Alexander Mikhalitsyn
@ 2023-04-13 13:33 ` Alexander Mikhalitsyn
  2023-04-17 15:18   ` Christian Brauner
  2023-04-18 13:07   ` Christian Brauner
  2023-04-13 13:33 ` [PATCH net-next v4 2/4] net: socket: add sockopts blacklist for BPF cgroup hook Alexander Mikhalitsyn
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 27+ messages in thread
From: Alexander Mikhalitsyn @ 2023-04-13 13:33 UTC (permalink / raw)
  To: davem
  Cc: linux-kernel, netdev, daniel, Alexander Mikhalitsyn,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Leon Romanovsky,
	David Ahern, Arnd Bergmann, Kees Cook, Christian Brauner,
	Kuniyuki Iwashima, Lennart Poettering, Luca Boccassi, linux-arch

Implement SCM_PIDFD, a new type of CMSG type analogical to SCM_CREDENTIALS,
but it contains pidfd instead of plain pid, which allows programmers not
to care about PID reuse problem.

Idea comes from UAPI kernel group:
https://uapi-group.org/kernel-features/

Big thanks to Christian Brauner and Lennart Poettering for productive
discussions about this.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: David Ahern <dsahern@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Lennart Poettering <mzxreary@0pointer.de>
Cc: Luca Boccassi <bluca@debian.org>
Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Tested-by: Luca Boccassi <bluca@debian.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
---
v4:
	- fixed silent fd_install if writting of CMSG to the userspace fails (pointed by Christian)
v2:
	According to review comments from Kuniyuki Iwashima and Christian Brauner:
	- use pidfd_create(..) retval as a result
	- whitespace change
---
 arch/alpha/include/uapi/asm/socket.h    |  2 ++
 arch/mips/include/uapi/asm/socket.h     |  2 ++
 arch/parisc/include/uapi/asm/socket.h   |  2 ++
 arch/sparc/include/uapi/asm/socket.h    |  2 ++
 include/linux/net.h                     |  1 +
 include/linux/socket.h                  |  1 +
 include/net/scm.h                       | 39 +++++++++++++++++++++++--
 include/uapi/asm-generic/socket.h       |  2 ++
 net/core/sock.c                         | 11 +++++++
 net/mptcp/sockopt.c                     |  1 +
 net/unix/af_unix.c                      | 18 ++++++++----
 tools/include/uapi/asm-generic/socket.h |  2 ++
 12 files changed, 76 insertions(+), 7 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
index 739891b94136..ff310613ae64 100644
--- a/arch/alpha/include/uapi/asm/socket.h
+++ b/arch/alpha/include/uapi/asm/socket.h
@@ -137,6 +137,8 @@
 
 #define SO_RCVMARK		75
 
+#define SO_PASSPIDFD		76
+
 #if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64
diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
index 18f3d95ecfec..762dcb80e4ec 100644
--- a/arch/mips/include/uapi/asm/socket.h
+++ b/arch/mips/include/uapi/asm/socket.h
@@ -148,6 +148,8 @@
 
 #define SO_RCVMARK		75
 
+#define SO_PASSPIDFD		76
+
 #if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64
diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
index f486d3dfb6bb..df16a3e16d64 100644
--- a/arch/parisc/include/uapi/asm/socket.h
+++ b/arch/parisc/include/uapi/asm/socket.h
@@ -129,6 +129,8 @@
 
 #define SO_RCVMARK		0x4049
 
+#define SO_PASSPIDFD		0x404A
+
 #if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64
diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
index 2fda57a3ea86..6e2847804fea 100644
--- a/arch/sparc/include/uapi/asm/socket.h
+++ b/arch/sparc/include/uapi/asm/socket.h
@@ -130,6 +130,8 @@
 
 #define SO_RCVMARK               0x0054
 
+#define SO_PASSPIDFD             0x0055
+
 #if !defined(__KERNEL__)
 
 
diff --git a/include/linux/net.h b/include/linux/net.h
index b73ad8e3c212..c234dfbe7a30 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -43,6 +43,7 @@ struct net;
 #define SOCK_PASSSEC		4
 #define SOCK_SUPPORT_ZC		5
 #define SOCK_CUSTOM_SOCKOPT	6
+#define SOCK_PASSPIDFD		7
 
 #ifndef ARCH_HAS_SOCKET_TYPES
 /**
diff --git a/include/linux/socket.h b/include/linux/socket.h
index 13c3a237b9c9..6bf90f251910 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -177,6 +177,7 @@ static inline size_t msg_data_left(struct msghdr *msg)
 #define	SCM_RIGHTS	0x01		/* rw: access rights (array of int) */
 #define SCM_CREDENTIALS 0x02		/* rw: struct ucred		*/
 #define SCM_SECURITY	0x03		/* rw: security label		*/
+#define SCM_PIDFD	0x04		/* ro: pidfd (int)		*/
 
 struct ucred {
 	__u32	pid;
diff --git a/include/net/scm.h b/include/net/scm.h
index 585adc1346bd..c67f765a165b 100644
--- a/include/net/scm.h
+++ b/include/net/scm.h
@@ -120,12 +120,44 @@ static inline bool scm_has_secdata(struct socket *sock)
 }
 #endif /* CONFIG_SECURITY_NETWORK */
 
+static __inline__ void scm_pidfd_recv(struct msghdr *msg, struct scm_cookie *scm)
+{
+	struct file *pidfd_file = NULL;
+	int pidfd;
+
+	/*
+	 * put_cmsg() doesn't return an error if CMSG is truncated,
+	 * that's why we need to opencode these checks here.
+	 */
+	if ((msg->msg_controllen <= sizeof(struct cmsghdr)) ||
+	    (msg->msg_controllen - sizeof(struct cmsghdr)) < sizeof(int)) {
+		msg->msg_flags |= MSG_CTRUNC;
+		return;
+	}
+
+	WARN_ON_ONCE(!scm->pid);
+	pidfd = pidfd_prepare(scm->pid, 0, &pidfd_file);
+
+	if (put_cmsg(msg, SOL_SOCKET, SCM_PIDFD, sizeof(int), &pidfd)) {
+		if (pidfd_file) {
+			put_unused_fd(pidfd);
+			fput(pidfd_file);
+		}
+
+		return;
+	}
+
+	if (pidfd_file)
+		fd_install(pidfd, pidfd_file);
+}
+
 static __inline__ void scm_recv(struct socket *sock, struct msghdr *msg,
 				struct scm_cookie *scm, int flags)
 {
 	if (!msg->msg_control) {
-		if (test_bit(SOCK_PASSCRED, &sock->flags) || scm->fp ||
-		    scm_has_secdata(sock))
+		if (test_bit(SOCK_PASSCRED, &sock->flags) ||
+		    test_bit(SOCK_PASSPIDFD, &sock->flags) ||
+		    scm->fp || scm_has_secdata(sock))
 			msg->msg_flags |= MSG_CTRUNC;
 		scm_destroy(scm);
 		return;
@@ -141,6 +173,9 @@ static __inline__ void scm_recv(struct socket *sock, struct msghdr *msg,
 		put_cmsg(msg, SOL_SOCKET, SCM_CREDENTIALS, sizeof(ucreds), &ucreds);
 	}
 
+	if (test_bit(SOCK_PASSPIDFD, &sock->flags))
+		scm_pidfd_recv(msg, scm);
+
 	scm_destroy_cred(scm);
 
 	scm_passec(sock, msg, scm);
diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
index 638230899e98..b76169fdb80b 100644
--- a/include/uapi/asm-generic/socket.h
+++ b/include/uapi/asm-generic/socket.h
@@ -132,6 +132,8 @@
 
 #define SO_RCVMARK		75
 
+#define SO_PASSPIDFD		76
+
 #if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64 || (defined(__x86_64__) && defined(__ILP32__))
diff --git a/net/core/sock.c b/net/core/sock.c
index c25888795390..3f974246ba3e 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1246,6 +1246,13 @@ int sk_setsockopt(struct sock *sk, int level, int optname,
 			clear_bit(SOCK_PASSCRED, &sock->flags);
 		break;
 
+	case SO_PASSPIDFD:
+		if (valbool)
+			set_bit(SOCK_PASSPIDFD, &sock->flags);
+		else
+			clear_bit(SOCK_PASSPIDFD, &sock->flags);
+		break;
+
 	case SO_TIMESTAMP_OLD:
 	case SO_TIMESTAMP_NEW:
 	case SO_TIMESTAMPNS_OLD:
@@ -1737,6 +1744,10 @@ int sk_getsockopt(struct sock *sk, int level, int optname,
 		v.val = !!test_bit(SOCK_PASSCRED, &sock->flags);
 		break;
 
+	case SO_PASSPIDFD:
+		v.val = !!test_bit(SOCK_PASSPIDFD, &sock->flags);
+		break;
+
 	case SO_PEERCRED:
 	{
 		struct ucred peercred;
diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index b655cebda0f3..67be0558862f 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -355,6 +355,7 @@ static int mptcp_setsockopt_sol_socket(struct mptcp_sock *msk, int optname,
 	case SO_BROADCAST:
 	case SO_BSDCOMPAT:
 	case SO_PASSCRED:
+	case SO_PASSPIDFD:
 	case SO_PASSSEC:
 	case SO_RXQ_OVFL:
 	case SO_WIFI_STATUS:
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index fb31e8a4409e..6d5dff4dfe83 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1361,7 +1361,8 @@ static int unix_dgram_connect(struct socket *sock, struct sockaddr *addr,
 		if (err)
 			goto out;
 
-		if (test_bit(SOCK_PASSCRED, &sock->flags) &&
+		if ((test_bit(SOCK_PASSCRED, &sock->flags) ||
+		     test_bit(SOCK_PASSPIDFD, &sock->flags)) &&
 		    !unix_sk(sk)->addr) {
 			err = unix_autobind(sk);
 			if (err)
@@ -1469,7 +1470,8 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
 	if (err)
 		goto out;
 
-	if (test_bit(SOCK_PASSCRED, &sock->flags) && !u->addr) {
+	if ((test_bit(SOCK_PASSCRED, &sock->flags) ||
+	     test_bit(SOCK_PASSPIDFD, &sock->flags)) && !u->addr) {
 		err = unix_autobind(sk);
 		if (err)
 			goto out;
@@ -1670,6 +1672,8 @@ static void unix_sock_inherit_flags(const struct socket *old,
 {
 	if (test_bit(SOCK_PASSCRED, &old->flags))
 		set_bit(SOCK_PASSCRED, &new->flags);
+	if (test_bit(SOCK_PASSPIDFD, &old->flags))
+		set_bit(SOCK_PASSPIDFD, &new->flags);
 	if (test_bit(SOCK_PASSSEC, &old->flags))
 		set_bit(SOCK_PASSSEC, &new->flags);
 }
@@ -1819,8 +1823,10 @@ static bool unix_passcred_enabled(const struct socket *sock,
 				  const struct sock *other)
 {
 	return test_bit(SOCK_PASSCRED, &sock->flags) ||
+	       test_bit(SOCK_PASSPIDFD, &sock->flags) ||
 	       !other->sk_socket ||
-	       test_bit(SOCK_PASSCRED, &other->sk_socket->flags);
+	       test_bit(SOCK_PASSCRED, &other->sk_socket->flags) ||
+	       test_bit(SOCK_PASSPIDFD, &other->sk_socket->flags);
 }
 
 /*
@@ -1922,7 +1928,8 @@ static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
 			goto out;
 	}
 
-	if (test_bit(SOCK_PASSCRED, &sock->flags) && !u->addr) {
+	if ((test_bit(SOCK_PASSCRED, &sock->flags) ||
+	     test_bit(SOCK_PASSPIDFD, &sock->flags)) && !u->addr) {
 		err = unix_autobind(sk);
 		if (err)
 			goto out;
@@ -2824,7 +2831,8 @@ static int unix_stream_read_generic(struct unix_stream_read_state *state,
 			/* Never glue messages from different writers */
 			if (!unix_skb_scm_eq(skb, &scm))
 				break;
-		} else if (test_bit(SOCK_PASSCRED, &sock->flags)) {
+		} else if (test_bit(SOCK_PASSCRED, &sock->flags) ||
+			   test_bit(SOCK_PASSPIDFD, &sock->flags)) {
 			/* Copy credentials */
 			scm_set_cred(&scm, UNIXCB(skb).pid, UNIXCB(skb).uid, UNIXCB(skb).gid);
 			unix_set_secdata(&scm, skb);
diff --git a/tools/include/uapi/asm-generic/socket.h b/tools/include/uapi/asm-generic/socket.h
index 8756df13be50..fbbc4bf53ee3 100644
--- a/tools/include/uapi/asm-generic/socket.h
+++ b/tools/include/uapi/asm-generic/socket.h
@@ -121,6 +121,8 @@
 
 #define SO_RCVMARK		75
 
+#define SO_PASSPIDFD		76
+
 #if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64 || (defined(__x86_64__) && defined(__ILP32__))
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next v4 2/4] net: socket: add sockopts blacklist for BPF cgroup hook
  2023-04-13 13:33 [PATCH net-next v4 0/4] Add SCM_PIDFD and SO_PEERPIDFD Alexander Mikhalitsyn
  2023-04-13 13:33 ` [PATCH net-next v4 1/4] scm: add SO_PASSPIDFD and SCM_PIDFD Alexander Mikhalitsyn
@ 2023-04-13 13:33 ` Alexander Mikhalitsyn
  2023-04-13 14:21   ` Eric Dumazet
  2023-04-13 13:33 ` [PATCH net-next v4 3/4] net: core: add getsockopt SO_PEERPIDFD Alexander Mikhalitsyn
  2023-04-13 13:33 ` [PATCH net-next v4 4/4] selftests: net: add SCM_PIDFD / SO_PEERPIDFD test Alexander Mikhalitsyn
  3 siblings, 1 reply; 27+ messages in thread
From: Alexander Mikhalitsyn @ 2023-04-13 13:33 UTC (permalink / raw)
  To: davem
  Cc: linux-kernel, netdev, daniel, Alexander Mikhalitsyn,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Leon Romanovsky,
	David Ahern, Arnd Bergmann, Kees Cook, Christian Brauner,
	Kuniyuki Iwashima, Lennart Poettering, linux-arch

During work on SO_PEERPIDFD, it was discovered (thanks to Christian),
that bpf cgroup hook can cause FD leaks when used with sockopts which
install FDs into the process fdtable.

After some offlist discussion it was proposed to add a blacklist of
socket options those can cause troubles when BPF cgroup hook is enabled.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: David Ahern <dsahern@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Lennart Poettering <mzxreary@0pointer.de>
Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Suggested-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
---
 net/socket.c | 38 +++++++++++++++++++++++++++++++++++---
 1 file changed, 35 insertions(+), 3 deletions(-)

diff --git a/net/socket.c b/net/socket.c
index 73e493da4589..9c1ef11de23f 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -108,6 +108,8 @@
 #include <linux/ptp_clock_kernel.h>
 #include <trace/events/sock.h>
 
+#include <linux/sctp.h>
+
 #ifdef CONFIG_NET_RX_BUSY_POLL
 unsigned int sysctl_net_busy_read __read_mostly;
 unsigned int sysctl_net_busy_poll __read_mostly;
@@ -2227,6 +2229,36 @@ static bool sock_use_custom_sol_socket(const struct socket *sock)
 	return test_bit(SOCK_CUSTOM_SOCKOPT, &sock->flags);
 }
 
+#ifdef CONFIG_CGROUP_BPF
+static bool sockopt_installs_fd(int level, int optname)
+{
+	/*
+	 * These options do fd_install(), and if BPF_CGROUP_RUN_PROG_GETSOCKOPT
+	 * hook returns an error after success of the original handler
+	 * sctp_getsockopt(...), userspace will receive an error from getsockopt
+	 * syscall and will be not aware that fd was successfully installed into fdtable.
+	 *
+	 * Let's prevent bpf cgroup hook from running on them.
+	 */
+	if (level == SOL_SCTP) {
+		switch (optname) {
+		case SCTP_SOCKOPT_PEELOFF:
+		case SCTP_SOCKOPT_PEELOFF_FLAGS:
+			return true;
+		default:
+			return false;
+		}
+	}
+
+	return false;
+}
+#else /* CONFIG_CGROUP_BPF */
+static inline bool sockopt_installs_fd(int level, int optname)
+{
+	return false;
+}
+#endif /* CONFIG_CGROUP_BPF */
+
 /*
  *	Set a socket option. Because we don't know the option lengths we have
  *	to pass the user mode parameter for the protocols to sort out.
@@ -2250,7 +2282,7 @@ int __sys_setsockopt(int fd, int level, int optname, char __user *user_optval,
 	if (err)
 		goto out_put;
 
-	if (!in_compat_syscall())
+	if (!in_compat_syscall() && !sockopt_installs_fd(level, optname))
 		err = BPF_CGROUP_RUN_PROG_SETSOCKOPT(sock->sk, &level, &optname,
 						     user_optval, &optlen,
 						     &kernel_optval);
@@ -2304,7 +2336,7 @@ int __sys_getsockopt(int fd, int level, int optname, char __user *optval,
 	if (err)
 		goto out_put;
 
-	if (!in_compat_syscall())
+	if (!in_compat_syscall() && !sockopt_installs_fd(level, optname))
 		max_optlen = BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen);
 
 	if (level == SOL_SOCKET)
@@ -2315,7 +2347,7 @@ int __sys_getsockopt(int fd, int level, int optname, char __user *optval,
 		err = sock->ops->getsockopt(sock, level, optname, optval,
 					    optlen);
 
-	if (!in_compat_syscall())
+	if (!in_compat_syscall() && !sockopt_installs_fd(level, optname))
 		err = BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock->sk, level, optname,
 						     optval, optlen, max_optlen,
 						     err);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next v4 3/4] net: core: add getsockopt SO_PEERPIDFD
  2023-04-13 13:33 [PATCH net-next v4 0/4] Add SCM_PIDFD and SO_PEERPIDFD Alexander Mikhalitsyn
  2023-04-13 13:33 ` [PATCH net-next v4 1/4] scm: add SO_PASSPIDFD and SCM_PIDFD Alexander Mikhalitsyn
  2023-04-13 13:33 ` [PATCH net-next v4 2/4] net: socket: add sockopts blacklist for BPF cgroup hook Alexander Mikhalitsyn
@ 2023-04-13 13:33 ` Alexander Mikhalitsyn
  2023-04-18 13:13   ` Christian Brauner
  2023-04-13 13:33 ` [PATCH net-next v4 4/4] selftests: net: add SCM_PIDFD / SO_PEERPIDFD test Alexander Mikhalitsyn
  3 siblings, 1 reply; 27+ messages in thread
From: Alexander Mikhalitsyn @ 2023-04-13 13:33 UTC (permalink / raw)
  To: davem
  Cc: linux-kernel, netdev, daniel, Alexander Mikhalitsyn,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Leon Romanovsky,
	David Ahern, Arnd Bergmann, Kees Cook, Christian Brauner,
	Kuniyuki Iwashima, Lennart Poettering, Luca Boccassi, linux-arch

Add SO_PEERPIDFD which allows to get pidfd of peer socket holder pidfd.
This thing is direct analog of SO_PEERCRED which allows to get plain PID.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: David Ahern <dsahern@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Lennart Poettering <mzxreary@0pointer.de>
Cc: Luca Boccassi <bluca@debian.org>
Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Tested-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
---
v4:
	- return -ESRCH if sk->sk_peer_pid is NULL from getsockopt() syscall
	- return errors from pidfd_prepare() as it is from getsockopt() syscall
v3:
	- fixed possible fd leak (thanks to Christian Brauner)
v2:
	According to review comments from Kuniyuki Iwashima and Christian Brauner:
	- use pidfd_create(..) retval as a result
	- whitespace change
---
 arch/alpha/include/uapi/asm/socket.h    |  1 +
 arch/mips/include/uapi/asm/socket.h     |  1 +
 arch/parisc/include/uapi/asm/socket.h   |  1 +
 arch/sparc/include/uapi/asm/socket.h    |  1 +
 include/uapi/asm-generic/socket.h       |  1 +
 net/core/sock.c                         | 33 +++++++++++++++++++++++++
 net/socket.c                            |  7 ++++++
 tools/include/uapi/asm-generic/socket.h |  1 +
 8 files changed, 46 insertions(+)

diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
index ff310613ae64..e94f621903fe 100644
--- a/arch/alpha/include/uapi/asm/socket.h
+++ b/arch/alpha/include/uapi/asm/socket.h
@@ -138,6 +138,7 @@
 #define SO_RCVMARK		75
 
 #define SO_PASSPIDFD		76
+#define SO_PEERPIDFD		77
 
 #if !defined(__KERNEL__)
 
diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
index 762dcb80e4ec..60ebaed28a4c 100644
--- a/arch/mips/include/uapi/asm/socket.h
+++ b/arch/mips/include/uapi/asm/socket.h
@@ -149,6 +149,7 @@
 #define SO_RCVMARK		75
 
 #define SO_PASSPIDFD		76
+#define SO_PEERPIDFD		77
 
 #if !defined(__KERNEL__)
 
diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
index df16a3e16d64..be264c2b1a11 100644
--- a/arch/parisc/include/uapi/asm/socket.h
+++ b/arch/parisc/include/uapi/asm/socket.h
@@ -130,6 +130,7 @@
 #define SO_RCVMARK		0x4049
 
 #define SO_PASSPIDFD		0x404A
+#define SO_PEERPIDFD		0x404B
 
 #if !defined(__KERNEL__)
 
diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
index 6e2847804fea..682da3714686 100644
--- a/arch/sparc/include/uapi/asm/socket.h
+++ b/arch/sparc/include/uapi/asm/socket.h
@@ -131,6 +131,7 @@
 #define SO_RCVMARK               0x0054
 
 #define SO_PASSPIDFD             0x0055
+#define SO_PEERPIDFD             0x0056
 
 #if !defined(__KERNEL__)
 
diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
index b76169fdb80b..8ce8a39a1e5f 100644
--- a/include/uapi/asm-generic/socket.h
+++ b/include/uapi/asm-generic/socket.h
@@ -133,6 +133,7 @@
 #define SO_RCVMARK		75
 
 #define SO_PASSPIDFD		76
+#define SO_PEERPIDFD		77
 
 #if !defined(__KERNEL__)
 
diff --git a/net/core/sock.c b/net/core/sock.c
index 3f974246ba3e..49531c0dc57f 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1763,6 +1763,39 @@ int sk_getsockopt(struct sock *sk, int level, int optname,
 		goto lenout;
 	}
 
+	case SO_PEERPIDFD:
+	{
+		struct pid *peer_pid;
+		struct file *pidfd_file = NULL;
+		int pidfd;
+
+		if (len > sizeof(pidfd))
+			len = sizeof(pidfd);
+
+		spin_lock(&sk->sk_peer_lock);
+		peer_pid = get_pid(sk->sk_peer_pid);
+		spin_unlock(&sk->sk_peer_lock);
+
+		if (!peer_pid)
+			return -ESRCH;
+
+		pidfd = pidfd_prepare(peer_pid, 0, &pidfd_file);
+		put_pid(peer_pid);
+		if (pidfd < 0)
+			return pidfd;
+
+		if (copy_to_sockptr(optval, &pidfd, len) ||
+		    copy_to_sockptr(optlen, &len, sizeof(int))) {
+			put_unused_fd(pidfd);
+			fput(pidfd_file);
+
+			return -EFAULT;
+		}
+
+		fd_install(pidfd, pidfd_file);
+		return 0;
+	}
+
 	case SO_PEERGROUPS:
 	{
 		const struct cred *cred;
diff --git a/net/socket.c b/net/socket.c
index 9c1ef11de23f..505b85489354 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2248,6 +2248,13 @@ static bool sockopt_installs_fd(int level, int optname)
 		default:
 			return false;
 		}
+	} else if (level == SOL_SOCKET) {
+		switch (optname) {
+		case SO_PEERPIDFD:
+			return true;
+		default:
+			return false;
+		}
 	}
 
 	return false;
diff --git a/tools/include/uapi/asm-generic/socket.h b/tools/include/uapi/asm-generic/socket.h
index fbbc4bf53ee3..54d9c8bf7c55 100644
--- a/tools/include/uapi/asm-generic/socket.h
+++ b/tools/include/uapi/asm-generic/socket.h
@@ -122,6 +122,7 @@
 #define SO_RCVMARK		75
 
 #define SO_PASSPIDFD		76
+#define SO_PEERPIDFD		77
 
 #if !defined(__KERNEL__)
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next v4 4/4] selftests: net: add SCM_PIDFD / SO_PEERPIDFD test
  2023-04-13 13:33 [PATCH net-next v4 0/4] Add SCM_PIDFD and SO_PEERPIDFD Alexander Mikhalitsyn
                   ` (2 preceding siblings ...)
  2023-04-13 13:33 ` [PATCH net-next v4 3/4] net: core: add getsockopt SO_PEERPIDFD Alexander Mikhalitsyn
@ 2023-04-13 13:33 ` Alexander Mikhalitsyn
  3 siblings, 0 replies; 27+ messages in thread
From: Alexander Mikhalitsyn @ 2023-04-13 13:33 UTC (permalink / raw)
  To: davem
  Cc: linux-kernel, netdev, daniel, Alexander Mikhalitsyn,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Leon Romanovsky,
	David Ahern, Arnd Bergmann, Kees Cook, Christian Brauner,
	Kuniyuki Iwashima, linux-arch, linux-kselftest

Basic test to check consistency between:
- SCM_CREDENTIALS and SCM_PIDFD
- SO_PEERCRED and SO_PEERPIDFD

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: David Ahern <dsahern@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
---
v3:
	- started using kselftest lib (thanks to Kuniyuki Iwashima for suggestion/review)
	- now test covers abstract sockets too and SOCK_DGRAM sockets
---
 tools/testing/selftests/net/.gitignore        |   1 +
 tools/testing/selftests/net/af_unix/Makefile  |   2 +-
 .../testing/selftests/net/af_unix/scm_pidfd.c | 430 ++++++++++++++++++
 3 files changed, 432 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/net/af_unix/scm_pidfd.c

diff --git a/tools/testing/selftests/net/.gitignore b/tools/testing/selftests/net/.gitignore
index 80f06aa62034..83fd1ebd34ec 100644
--- a/tools/testing/selftests/net/.gitignore
+++ b/tools/testing/selftests/net/.gitignore
@@ -26,6 +26,7 @@ reuseport_bpf_cpu
 reuseport_bpf_numa
 reuseport_dualstack
 rxtimestamp
+scm_pidfd
 sk_bind_sendto_listen
 sk_connect_zero_addr
 socket
diff --git a/tools/testing/selftests/net/af_unix/Makefile b/tools/testing/selftests/net/af_unix/Makefile
index 1e4b397cece6..f5ca9da8c4d5 100644
--- a/tools/testing/selftests/net/af_unix/Makefile
+++ b/tools/testing/selftests/net/af_unix/Makefile
@@ -1,3 +1,3 @@
-TEST_GEN_PROGS := diag_uid test_unix_oob unix_connect
+TEST_GEN_PROGS := diag_uid test_unix_oob unix_connect scm_pidfd
 
 include ../../lib.mk
diff --git a/tools/testing/selftests/net/af_unix/scm_pidfd.c b/tools/testing/selftests/net/af_unix/scm_pidfd.c
new file mode 100644
index 000000000000..a86222143d79
--- /dev/null
+++ b/tools/testing/selftests/net/af_unix/scm_pidfd.c
@@ -0,0 +1,430 @@
+// SPDX-License-Identifier: GPL-2.0 OR MIT
+#define _GNU_SOURCE
+#include <error.h>
+#include <limits.h>
+#include <stddef.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/socket.h>
+#include <linux/socket.h>
+#include <unistd.h>
+#include <string.h>
+#include <errno.h>
+#include <sys/un.h>
+#include <sys/signal.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+
+#include "../../kselftest_harness.h"
+
+#define clean_errno() (errno == 0 ? "None" : strerror(errno))
+#define log_err(MSG, ...)                                                   \
+	fprintf(stderr, "(%s:%d: errno: %s) " MSG "\n", __FILE__, __LINE__, \
+		clean_errno(), ##__VA_ARGS__)
+
+#ifndef SCM_PIDFD
+#define SCM_PIDFD 0x04
+#endif
+
+static void child_die()
+{
+	exit(1);
+}
+
+static int safe_int(const char *numstr, int *converted)
+{
+	char *err = NULL;
+	long sli;
+
+	errno = 0;
+	sli = strtol(numstr, &err, 0);
+	if (errno == ERANGE && (sli == LONG_MAX || sli == LONG_MIN))
+		return -ERANGE;
+
+	if (errno != 0 && sli == 0)
+		return -EINVAL;
+
+	if (err == numstr || *err != '\0')
+		return -EINVAL;
+
+	if (sli > INT_MAX || sli < INT_MIN)
+		return -ERANGE;
+
+	*converted = (int)sli;
+	return 0;
+}
+
+static int char_left_gc(const char *buffer, size_t len)
+{
+	size_t i;
+
+	for (i = 0; i < len; i++) {
+		if (buffer[i] == ' ' || buffer[i] == '\t')
+			continue;
+
+		return i;
+	}
+
+	return 0;
+}
+
+static int char_right_gc(const char *buffer, size_t len)
+{
+	int i;
+
+	for (i = len - 1; i >= 0; i--) {
+		if (buffer[i] == ' ' || buffer[i] == '\t' ||
+		    buffer[i] == '\n' || buffer[i] == '\0')
+			continue;
+
+		return i + 1;
+	}
+
+	return 0;
+}
+
+static char *trim_whitespace_in_place(char *buffer)
+{
+	buffer += char_left_gc(buffer, strlen(buffer));
+	buffer[char_right_gc(buffer, strlen(buffer))] = '\0';
+	return buffer;
+}
+
+/* borrowed (with all helpers) from pidfd/pidfd_open_test.c */
+static pid_t get_pid_from_fdinfo_file(int pidfd, const char *key, size_t keylen)
+{
+	int ret;
+	char path[512];
+	FILE *f;
+	size_t n = 0;
+	pid_t result = -1;
+	char *line = NULL;
+
+	snprintf(path, sizeof(path), "/proc/self/fdinfo/%d", pidfd);
+
+	f = fopen(path, "re");
+	if (!f)
+		return -1;
+
+	while (getline(&line, &n, f) != -1) {
+		char *numstr;
+
+		if (strncmp(line, key, keylen))
+			continue;
+
+		numstr = trim_whitespace_in_place(line + 4);
+		ret = safe_int(numstr, &result);
+		if (ret < 0)
+			goto out;
+
+		break;
+	}
+
+out:
+	free(line);
+	fclose(f);
+	return result;
+}
+
+static int cmsg_check(int fd)
+{
+	struct msghdr msg = { 0 };
+	struct cmsghdr *cmsg;
+	struct iovec iov;
+	struct ucred *ucred = NULL;
+	int data = 0;
+	char control[CMSG_SPACE(sizeof(struct ucred)) +
+		     CMSG_SPACE(sizeof(int))] = { 0 };
+	int *pidfd = NULL;
+	pid_t parent_pid;
+	int err;
+
+	iov.iov_base = &data;
+	iov.iov_len = sizeof(data);
+
+	msg.msg_iov = &iov;
+	msg.msg_iovlen = 1;
+	msg.msg_control = control;
+	msg.msg_controllen = sizeof(control);
+
+	err = recvmsg(fd, &msg, 0);
+	if (err < 0) {
+		log_err("recvmsg");
+		return 1;
+	}
+
+	if (msg.msg_flags & (MSG_TRUNC | MSG_CTRUNC)) {
+		log_err("recvmsg: truncated");
+		return 1;
+	}
+
+	for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL;
+	     cmsg = CMSG_NXTHDR(&msg, cmsg)) {
+		if (cmsg->cmsg_level == SOL_SOCKET &&
+		    cmsg->cmsg_type == SCM_PIDFD) {
+			if (cmsg->cmsg_len < sizeof(*pidfd)) {
+				log_err("CMSG parse: SCM_PIDFD wrong len");
+				return 1;
+			}
+
+			pidfd = (void *)CMSG_DATA(cmsg);
+		}
+
+		if (cmsg->cmsg_level == SOL_SOCKET &&
+		    cmsg->cmsg_type == SCM_CREDENTIALS) {
+			if (cmsg->cmsg_len < sizeof(*ucred)) {
+				log_err("CMSG parse: SCM_CREDENTIALS wrong len");
+				return 1;
+			}
+
+			ucred = (void *)CMSG_DATA(cmsg);
+		}
+	}
+
+	/* send(pfd, "x", sizeof(char), 0) */
+	if (data != 'x') {
+		log_err("recvmsg: data corruption");
+		return 1;
+	}
+
+	if (!pidfd) {
+		log_err("CMSG parse: SCM_PIDFD not found");
+		return 1;
+	}
+
+	if (!ucred) {
+		log_err("CMSG parse: SCM_CREDENTIALS not found");
+		return 1;
+	}
+
+	/* pidfd from SCM_PIDFD should point to the parent process PID */
+	parent_pid =
+		get_pid_from_fdinfo_file(*pidfd, "Pid:", sizeof("Pid:") - 1);
+	if (parent_pid != getppid()) {
+		log_err("wrong SCM_PIDFD %d != %d", parent_pid, getppid());
+		return 1;
+	}
+
+	return 0;
+}
+
+struct sock_addr {
+	char sock_name[32];
+	struct sockaddr_un listen_addr;
+	socklen_t addrlen;
+};
+
+FIXTURE(scm_pidfd)
+{
+	int server;
+	pid_t client_pid;
+	int startup_pipe[2];
+	struct sock_addr server_addr;
+	struct sock_addr *client_addr;
+};
+
+FIXTURE_VARIANT(scm_pidfd)
+{
+	int type;
+	bool abstract;
+};
+
+FIXTURE_VARIANT_ADD(scm_pidfd, stream_pathname)
+{
+	.type = SOCK_STREAM,
+	.abstract = 0,
+};
+
+FIXTURE_VARIANT_ADD(scm_pidfd, stream_abstract)
+{
+	.type = SOCK_STREAM,
+	.abstract = 1,
+};
+
+FIXTURE_VARIANT_ADD(scm_pidfd, dgram_pathname)
+{
+	.type = SOCK_DGRAM,
+	.abstract = 0,
+};
+
+FIXTURE_VARIANT_ADD(scm_pidfd, dgram_abstract)
+{
+	.type = SOCK_DGRAM,
+	.abstract = 1,
+};
+
+FIXTURE_SETUP(scm_pidfd)
+{
+	self->client_addr = mmap(NULL, sizeof(*self->client_addr), PROT_READ | PROT_WRITE,
+				 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
+	ASSERT_NE(MAP_FAILED, self->client_addr);
+}
+
+FIXTURE_TEARDOWN(scm_pidfd)
+{
+	close(self->server);
+
+	kill(self->client_pid, SIGKILL);
+	waitpid(self->client_pid, NULL, 0);
+
+	if (!variant->abstract) {
+		unlink(self->server_addr.sock_name);
+		unlink(self->client_addr->sock_name);
+	}
+}
+
+static void fill_sockaddr(struct sock_addr *addr, bool abstract)
+{
+	char *sun_path_buf = (char *)&addr->listen_addr.sun_path;
+
+	addr->listen_addr.sun_family = AF_UNIX;
+	addr->addrlen = offsetof(struct sockaddr_un, sun_path);
+	snprintf(addr->sock_name, sizeof(addr->sock_name), "scm_pidfd_%d", getpid());
+	addr->addrlen += strlen(addr->sock_name);
+	if (abstract) {
+		*sun_path_buf = '\0';
+		addr->addrlen++;
+		sun_path_buf++;
+	} else {
+		unlink(addr->sock_name);
+	}
+	memcpy(sun_path_buf, addr->sock_name, strlen(addr->sock_name));
+}
+
+static void client(FIXTURE_DATA(scm_pidfd) *self,
+		   const FIXTURE_VARIANT(scm_pidfd) *variant)
+{
+	int err;
+	int cfd;
+	socklen_t len;
+	struct ucred peer_cred;
+	int peer_pidfd;
+	pid_t peer_pid;
+	int on = 0;
+
+	cfd = socket(AF_UNIX, variant->type, 0);
+	if (cfd < 0) {
+		log_err("socket");
+		child_die();
+	}
+
+	if (variant->type == SOCK_DGRAM) {
+		fill_sockaddr(self->client_addr, variant->abstract);
+
+		if (bind(cfd, (struct sockaddr *)&self->client_addr->listen_addr, self->client_addr->addrlen)) {
+			log_err("bind");
+			child_die();
+		}
+	}
+
+	if (connect(cfd, (struct sockaddr *)&self->server_addr.listen_addr,
+		    self->server_addr.addrlen) != 0) {
+		log_err("connect");
+		child_die();
+	}
+
+	on = 1;
+	if (setsockopt(cfd, SOL_SOCKET, SO_PASSCRED, &on, sizeof(on))) {
+		log_err("Failed to set SO_PASSCRED");
+		child_die();
+	}
+
+	if (setsockopt(cfd, SOL_SOCKET, SO_PASSPIDFD, &on, sizeof(on))) {
+		log_err("Failed to set SO_PASSPIDFD");
+		child_die();
+	}
+
+	close(self->startup_pipe[1]);
+
+	if (cmsg_check(cfd)) {
+		log_err("cmsg_check failed");
+		child_die();
+	}
+
+	/* skip further for SOCK_DGRAM as it's not applicable */
+	if (variant->type == SOCK_DGRAM)
+		return;
+
+	len = sizeof(peer_cred);
+	if (getsockopt(cfd, SOL_SOCKET, SO_PEERCRED, &peer_cred, &len)) {
+		log_err("Failed to get SO_PEERCRED");
+		child_die();
+	}
+
+	len = sizeof(peer_pidfd);
+	if (getsockopt(cfd, SOL_SOCKET, SO_PEERPIDFD, &peer_pidfd, &len)) {
+		log_err("Failed to get SO_PEERPIDFD");
+		child_die();
+	}
+
+	/* pid from SO_PEERCRED should point to the parent process PID */
+	if (peer_cred.pid != getppid()) {
+		log_err("peer_cred.pid != getppid(): %d != %d", peer_cred.pid, getppid());
+		child_die();
+	}
+
+	peer_pid = get_pid_from_fdinfo_file(peer_pidfd,
+					    "Pid:", sizeof("Pid:") - 1);
+	if (peer_pid != peer_cred.pid) {
+		log_err("peer_pid != peer_cred.pid: %d != %d", peer_pid, peer_cred.pid);
+		child_die();
+	}
+}
+
+TEST_F(scm_pidfd, test)
+{
+	int err;
+	int pfd;
+	int child_status = 0;
+
+	self->server = socket(AF_UNIX, variant->type, 0);
+	ASSERT_NE(-1, self->server);
+
+	fill_sockaddr(&self->server_addr, variant->abstract);
+
+	err = bind(self->server, (struct sockaddr *)&self->server_addr.listen_addr, self->server_addr.addrlen);
+	ASSERT_EQ(0, err);
+
+	if (variant->type == SOCK_STREAM) {
+		err = listen(self->server, 1);
+		ASSERT_EQ(0, err);
+	}
+
+	err = pipe(self->startup_pipe);
+	ASSERT_NE(-1, err);
+
+	self->client_pid = fork();
+	ASSERT_NE(-1, self->client_pid);
+	if (self->client_pid == 0) {
+		close(self->server);
+		close(self->startup_pipe[0]);
+		client(self, variant);
+		exit(0);
+	}
+	close(self->startup_pipe[1]);
+
+	if (variant->type == SOCK_STREAM) {
+		pfd = accept(self->server, NULL, NULL);
+		ASSERT_NE(-1, pfd);
+	} else {
+		pfd = self->server;
+	}
+
+	/* wait until the child arrives at checkpoint */
+	read(self->startup_pipe[0], &err, sizeof(int));
+	close(self->startup_pipe[0]);
+
+	if (variant->type == SOCK_DGRAM) {
+		err = sendto(pfd, "x", sizeof(char), 0, (struct sockaddr *)&self->client_addr->listen_addr, self->client_addr->addrlen);
+		ASSERT_NE(-1, err);
+	} else {
+		err = send(pfd, "x", sizeof(char), 0);
+		ASSERT_NE(-1, err);
+	}
+
+	close(pfd);
+	waitpid(self->client_pid, &child_status, 0);
+	ASSERT_EQ(0, WIFEXITED(child_status) ? WEXITSTATUS(child_status) : 1);
+}
+
+TEST_HARNESS_MAIN
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v4 2/4] net: socket: add sockopts blacklist for BPF cgroup hook
  2023-04-13 13:33 ` [PATCH net-next v4 2/4] net: socket: add sockopts blacklist for BPF cgroup hook Alexander Mikhalitsyn
@ 2023-04-13 14:21   ` Eric Dumazet
  2023-04-13 14:38     ` Aleksandr Mikhalitsyn
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2023-04-13 14:21 UTC (permalink / raw)
  To: Alexander Mikhalitsyn
  Cc: davem, linux-kernel, netdev, daniel, Jakub Kicinski, Paolo Abeni,
	Leon Romanovsky, David Ahern, Arnd Bergmann, Kees Cook,
	Christian Brauner, Kuniyuki Iwashima, Lennart Poettering,
	linux-arch

On Thu, Apr 13, 2023 at 3:35 PM Alexander Mikhalitsyn
<aleksandr.mikhalitsyn@canonical.com> wrote:
>
> During work on SO_PEERPIDFD, it was discovered (thanks to Christian),
> that bpf cgroup hook can cause FD leaks when used with sockopts which
> install FDs into the process fdtable.
>
> After some offlist discussion it was proposed to add a blacklist of

We try to replace this word by either denylist or blocklist, even in changelogs.

> socket options those can cause troubles when BPF cgroup hook is enabled.
>

Can we find the appropriate Fixes: tag to help stable teams ?

> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Cc: Leon Romanovsky <leon@kernel.org>
> Cc: David Ahern <dsahern@kernel.org>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
> Cc: Lennart Poettering <mzxreary@0pointer.de>
> Cc: linux-kernel@vger.kernel.org
> Cc: netdev@vger.kernel.org
> Cc: linux-arch@vger.kernel.org
> Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
> Suggested-by: Christian Brauner <brauner@kernel.org>
> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>

Thanks.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v4 2/4] net: socket: add sockopts blacklist for BPF cgroup hook
  2023-04-13 14:21   ` Eric Dumazet
@ 2023-04-13 14:38     ` Aleksandr Mikhalitsyn
  2023-04-13 16:37       ` Stanislav Fomichev
  2023-04-15  0:51       ` [PATCH net-next v4 2/4] net: socket: add sockopts blacklist for BPF cgroup hook Jakub Kicinski
  0 siblings, 2 replies; 27+ messages in thread
From: Aleksandr Mikhalitsyn @ 2023-04-13 14:38 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: davem, linux-kernel, netdev, daniel, Jakub Kicinski, Paolo Abeni,
	Leon Romanovsky, David Ahern, Arnd Bergmann, Kees Cook,
	Christian Brauner, Kuniyuki Iwashima, Lennart Poettering,
	linux-arch, sdf

On Thu, Apr 13, 2023 at 4:22 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Thu, Apr 13, 2023 at 3:35 PM Alexander Mikhalitsyn
> <aleksandr.mikhalitsyn@canonical.com> wrote:
> >
> > During work on SO_PEERPIDFD, it was discovered (thanks to Christian),
> > that bpf cgroup hook can cause FD leaks when used with sockopts which
> > install FDs into the process fdtable.
> >
> > After some offlist discussion it was proposed to add a blacklist of
>
> We try to replace this word by either denylist or blocklist, even in changelogs.

Hi Eric,

Oh, I'm sorry about that. :( Sure.

>
> > socket options those can cause troubles when BPF cgroup hook is enabled.
> >
>
> Can we find the appropriate Fixes: tag to help stable teams ?

Sure, I will add next time.

Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")

I think it's better to add Stanislav Fomichev to CC.

Kind regards,
Alex

>
> > Cc: "David S. Miller" <davem@davemloft.net>
> > Cc: Eric Dumazet <edumazet@google.com>
> > Cc: Jakub Kicinski <kuba@kernel.org>
> > Cc: Paolo Abeni <pabeni@redhat.com>
> > Cc: Leon Romanovsky <leon@kernel.org>
> > Cc: David Ahern <dsahern@kernel.org>
> > Cc: Arnd Bergmann <arnd@arndb.de>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Christian Brauner <brauner@kernel.org>
> > Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
> > Cc: Lennart Poettering <mzxreary@0pointer.de>
> > Cc: linux-kernel@vger.kernel.org
> > Cc: netdev@vger.kernel.org
> > Cc: linux-arch@vger.kernel.org
> > Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
> > Suggested-by: Christian Brauner <brauner@kernel.org>
> > Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
>
> Thanks.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v4 2/4] net: socket: add sockopts blacklist for BPF cgroup hook
  2023-04-13 14:38     ` Aleksandr Mikhalitsyn
@ 2023-04-13 16:37       ` Stanislav Fomichev
  2023-04-15  1:55         ` Stanislav Fomichev
  2023-04-15  0:51       ` [PATCH net-next v4 2/4] net: socket: add sockopts blacklist for BPF cgroup hook Jakub Kicinski
  1 sibling, 1 reply; 27+ messages in thread
From: Stanislav Fomichev @ 2023-04-13 16:37 UTC (permalink / raw)
  To: Aleksandr Mikhalitsyn
  Cc: Eric Dumazet, davem, linux-kernel, netdev, daniel,
	Jakub Kicinski, Paolo Abeni, Leon Romanovsky, David Ahern,
	Arnd Bergmann, Kees Cook, Christian Brauner, Kuniyuki Iwashima,
	Lennart Poettering, linux-arch

On Thu, Apr 13, 2023 at 7:38 AM Aleksandr Mikhalitsyn
<aleksandr.mikhalitsyn@canonical.com> wrote:
>
> On Thu, Apr 13, 2023 at 4:22 PM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Thu, Apr 13, 2023 at 3:35 PM Alexander Mikhalitsyn
> > <aleksandr.mikhalitsyn@canonical.com> wrote:
> > >
> > > During work on SO_PEERPIDFD, it was discovered (thanks to Christian),
> > > that bpf cgroup hook can cause FD leaks when used with sockopts which
> > > install FDs into the process fdtable.
> > >
> > > After some offlist discussion it was proposed to add a blacklist of
> >
> > We try to replace this word by either denylist or blocklist, even in changelogs.
>
> Hi Eric,
>
> Oh, I'm sorry about that. :( Sure.
>
> >
> > > socket options those can cause troubles when BPF cgroup hook is enabled.
> > >
> >
> > Can we find the appropriate Fixes: tag to help stable teams ?
>
> Sure, I will add next time.
>
> Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
>
> I think it's better to add Stanislav Fomichev to CC.

Can we use 'struct proto' bpf_bypass_getsockopt instead? We already
use it for tcp zerocopy, I'm assuming it should work in this case as
well?

> Kind regards,
> Alex
>
> >
> > > Cc: "David S. Miller" <davem@davemloft.net>
> > > Cc: Eric Dumazet <edumazet@google.com>
> > > Cc: Jakub Kicinski <kuba@kernel.org>
> > > Cc: Paolo Abeni <pabeni@redhat.com>
> > > Cc: Leon Romanovsky <leon@kernel.org>
> > > Cc: David Ahern <dsahern@kernel.org>
> > > Cc: Arnd Bergmann <arnd@arndb.de>
> > > Cc: Kees Cook <keescook@chromium.org>
> > > Cc: Christian Brauner <brauner@kernel.org>
> > > Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
> > > Cc: Lennart Poettering <mzxreary@0pointer.de>
> > > Cc: linux-kernel@vger.kernel.org
> > > Cc: netdev@vger.kernel.org
> > > Cc: linux-arch@vger.kernel.org
> > > Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
> > > Suggested-by: Christian Brauner <brauner@kernel.org>
> > > Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> >
> > Thanks.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v4 2/4] net: socket: add sockopts blacklist for BPF cgroup hook
  2023-04-13 14:38     ` Aleksandr Mikhalitsyn
  2023-04-13 16:37       ` Stanislav Fomichev
@ 2023-04-15  0:51       ` Jakub Kicinski
  1 sibling, 0 replies; 27+ messages in thread
From: Jakub Kicinski @ 2023-04-15  0:51 UTC (permalink / raw)
  To: Aleksandr Mikhalitsyn
  Cc: Eric Dumazet, davem, linux-kernel, netdev, daniel, Paolo Abeni,
	Leon Romanovsky, David Ahern, Arnd Bergmann, Kees Cook,
	Christian Brauner, Kuniyuki Iwashima, Lennart Poettering,
	linux-arch, sdf

On Thu, 13 Apr 2023 16:38:35 +0200 Aleksandr Mikhalitsyn wrote:
> Sure, I will add next time.
> 
> Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
> 
> I think it's better to add Stanislav Fomichev to CC.

This should go in separately as a fix, right? Not in a -next series.

Also the whole set as is does not build on top of net-next :(

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v4 2/4] net: socket: add sockopts blacklist for BPF cgroup hook
  2023-04-13 16:37       ` Stanislav Fomichev
@ 2023-04-15  1:55         ` Stanislav Fomichev
  2023-04-17 14:42           ` Christian Brauner
  2023-04-18  1:03           ` handling unsupported optlen in cgroup bpf getsockopt: (was [PATCH net-next v4 2/4] net: socket: add sockopts blacklist for BPF cgroup hook) Martin KaFai Lau
  0 siblings, 2 replies; 27+ messages in thread
From: Stanislav Fomichev @ 2023-04-15  1:55 UTC (permalink / raw)
  To: Aleksandr Mikhalitsyn
  Cc: Eric Dumazet, davem, linux-kernel, netdev, daniel,
	Jakub Kicinski, Paolo Abeni, Leon Romanovsky, David Ahern,
	Arnd Bergmann, Kees Cook, Christian Brauner, Kuniyuki Iwashima,
	Lennart Poettering, linux-arch

On 04/13, Stanislav Fomichev wrote:
> On Thu, Apr 13, 2023 at 7:38 AM Aleksandr Mikhalitsyn
> <aleksandr.mikhalitsyn@canonical.com> wrote:
> >
> > On Thu, Apr 13, 2023 at 4:22 PM Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > On Thu, Apr 13, 2023 at 3:35 PM Alexander Mikhalitsyn
> > > <aleksandr.mikhalitsyn@canonical.com> wrote:
> > > >
> > > > During work on SO_PEERPIDFD, it was discovered (thanks to Christian),
> > > > that bpf cgroup hook can cause FD leaks when used with sockopts which
> > > > install FDs into the process fdtable.
> > > >
> > > > After some offlist discussion it was proposed to add a blacklist of
> > >
> > > We try to replace this word by either denylist or blocklist, even in changelogs.
> >
> > Hi Eric,
> >
> > Oh, I'm sorry about that. :( Sure.
> >
> > >
> > > > socket options those can cause troubles when BPF cgroup hook is enabled.
> > > >
> > >
> > > Can we find the appropriate Fixes: tag to help stable teams ?
> >
> > Sure, I will add next time.
> >
> > Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
> >
> > I think it's better to add Stanislav Fomichev to CC.
> 
> Can we use 'struct proto' bpf_bypass_getsockopt instead? We already
> use it for tcp zerocopy, I'm assuming it should work in this case as
> well?

Jakub reminded me of the other things I wanted to ask here bug forgot:

- setsockopt is probably not needed, right? setsockopt hook triggers
  before the kernel and shouldn't leak anything
- for getsockopt, instead of bypassing bpf completely, should we instead
  ignore the error from the bpf program? that would still preserve
  the observability aspect
- or maybe we can even have a per-proto bpf_getsockopt_cleanup call that
  gets called whenever bpf returns an error to make sure protocols have
  a chance to handle that condition (and free the fd)

> > Kind regards,
> > Alex
> >
> > >
> > > > Cc: "David S. Miller" <davem@davemloft.net>
> > > > Cc: Eric Dumazet <edumazet@google.com>
> > > > Cc: Jakub Kicinski <kuba@kernel.org>
> > > > Cc: Paolo Abeni <pabeni@redhat.com>
> > > > Cc: Leon Romanovsky <leon@kernel.org>
> > > > Cc: David Ahern <dsahern@kernel.org>
> > > > Cc: Arnd Bergmann <arnd@arndb.de>
> > > > Cc: Kees Cook <keescook@chromium.org>
> > > > Cc: Christian Brauner <brauner@kernel.org>
> > > > Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
> > > > Cc: Lennart Poettering <mzxreary@0pointer.de>
> > > > Cc: linux-kernel@vger.kernel.org
> > > > Cc: netdev@vger.kernel.org
> > > > Cc: linux-arch@vger.kernel.org
> > > > Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
> > > > Suggested-by: Christian Brauner <brauner@kernel.org>
> > > > Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> > >
> > > Thanks.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v4 2/4] net: socket: add sockopts blacklist for BPF cgroup hook
  2023-04-15  1:55         ` Stanislav Fomichev
@ 2023-04-17 14:42           ` Christian Brauner
  2023-04-17 18:10             ` Stanislav Fomichev
  2023-04-18  1:03           ` handling unsupported optlen in cgroup bpf getsockopt: (was [PATCH net-next v4 2/4] net: socket: add sockopts blacklist for BPF cgroup hook) Martin KaFai Lau
  1 sibling, 1 reply; 27+ messages in thread
From: Christian Brauner @ 2023-04-17 14:42 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Aleksandr Mikhalitsyn, Eric Dumazet, davem, linux-kernel, netdev,
	daniel, Jakub Kicinski, Paolo Abeni, Leon Romanovsky,
	David Ahern, Arnd Bergmann, Kees Cook, Kuniyuki Iwashima,
	Lennart Poettering, linux-arch

On Fri, Apr 14, 2023 at 06:55:39PM -0700, Stanislav Fomichev wrote:
> On 04/13, Stanislav Fomichev wrote:
> > On Thu, Apr 13, 2023 at 7:38 AM Aleksandr Mikhalitsyn
> > <aleksandr.mikhalitsyn@canonical.com> wrote:
> > >
> > > On Thu, Apr 13, 2023 at 4:22 PM Eric Dumazet <edumazet@google.com> wrote:
> > > >
> > > > On Thu, Apr 13, 2023 at 3:35 PM Alexander Mikhalitsyn
> > > > <aleksandr.mikhalitsyn@canonical.com> wrote:
> > > > >
> > > > > During work on SO_PEERPIDFD, it was discovered (thanks to Christian),
> > > > > that bpf cgroup hook can cause FD leaks when used with sockopts which
> > > > > install FDs into the process fdtable.
> > > > >
> > > > > After some offlist discussion it was proposed to add a blacklist of
> > > >
> > > > We try to replace this word by either denylist or blocklist, even in changelogs.
> > >
> > > Hi Eric,
> > >
> > > Oh, I'm sorry about that. :( Sure.
> > >
> > > >
> > > > > socket options those can cause troubles when BPF cgroup hook is enabled.
> > > > >
> > > >
> > > > Can we find the appropriate Fixes: tag to help stable teams ?
> > >
> > > Sure, I will add next time.
> > >
> > > Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
> > >
> > > I think it's better to add Stanislav Fomichev to CC.
> > 
> > Can we use 'struct proto' bpf_bypass_getsockopt instead? We already
> > use it for tcp zerocopy, I'm assuming it should work in this case as
> > well?
> 
> Jakub reminded me of the other things I wanted to ask here bug forgot:
> 
> - setsockopt is probably not needed, right? setsockopt hook triggers
>   before the kernel and shouldn't leak anything
> - for getsockopt, instead of bypassing bpf completely, should we instead
>   ignore the error from the bpf program? that would still preserve

That's fine by me as well.

It'd be great if the net folks could tell Alex how they would want this
handled.

>   the observability aspect

Please see for more details
https://lore.kernel.org/lkml/20230411-nudelsalat-spreu-3038458f25c4@brauner


> - or maybe we can even have a per-proto bpf_getsockopt_cleanup call that
>   gets called whenever bpf returns an error to make sure protocols have
>   a chance to handle that condition (and free the fd)

Installing an fd into an fdtable makes it visible to userspace at which
point calling close_fd() is doable but an absolute last resort and
generally a good indicator of misdesign. If the bpf hook wants to make
decisions based on the file then it should receive a struct
file, not an fd.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v4 1/4] scm: add SO_PASSPIDFD and SCM_PIDFD
  2023-04-13 13:33 ` [PATCH net-next v4 1/4] scm: add SO_PASSPIDFD and SCM_PIDFD Alexander Mikhalitsyn
@ 2023-04-17 15:18   ` Christian Brauner
  2023-04-17 16:01     ` Aleksandr Mikhalitsyn
  2023-04-18 13:07   ` Christian Brauner
  1 sibling, 1 reply; 27+ messages in thread
From: Christian Brauner @ 2023-04-17 15:18 UTC (permalink / raw)
  To: Alexander Mikhalitsyn
  Cc: davem, linux-kernel, netdev, daniel, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Leon Romanovsky, David Ahern,
	Arnd Bergmann, Kees Cook, Kuniyuki Iwashima, Lennart Poettering,
	Luca Boccassi, linux-arch

On Thu, Apr 13, 2023 at 03:33:52PM +0200, Alexander Mikhalitsyn wrote:
> Implement SCM_PIDFD, a new type of CMSG type analogical to SCM_CREDENTIALS,
> but it contains pidfd instead of plain pid, which allows programmers not
> to care about PID reuse problem.
> 
> Idea comes from UAPI kernel group:
> https://uapi-group.org/kernel-features/
> 
> Big thanks to Christian Brauner and Lennart Poettering for productive
> discussions about this.
> 
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Cc: Leon Romanovsky <leon@kernel.org>
> Cc: David Ahern <dsahern@kernel.org>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
> Cc: Lennart Poettering <mzxreary@0pointer.de>
> Cc: Luca Boccassi <bluca@debian.org>
> Cc: linux-kernel@vger.kernel.org
> Cc: netdev@vger.kernel.org
> Cc: linux-arch@vger.kernel.org
> Tested-by: Luca Boccassi <bluca@debian.org>
> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> ---
> v4:
> 	- fixed silent fd_install if writting of CMSG to the userspace fails (pointed by Christian)
> v2:
> 	According to review comments from Kuniyuki Iwashima and Christian Brauner:
> 	- use pidfd_create(..) retval as a result
> 	- whitespace change
> ---
>  arch/alpha/include/uapi/asm/socket.h    |  2 ++
>  arch/mips/include/uapi/asm/socket.h     |  2 ++
>  arch/parisc/include/uapi/asm/socket.h   |  2 ++
>  arch/sparc/include/uapi/asm/socket.h    |  2 ++
>  include/linux/net.h                     |  1 +
>  include/linux/socket.h                  |  1 +
>  include/net/scm.h                       | 39 +++++++++++++++++++++++--
>  include/uapi/asm-generic/socket.h       |  2 ++
>  net/core/sock.c                         | 11 +++++++
>  net/mptcp/sockopt.c                     |  1 +
>  net/unix/af_unix.c                      | 18 ++++++++----
>  tools/include/uapi/asm-generic/socket.h |  2 ++
>  12 files changed, 76 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
> index 739891b94136..ff310613ae64 100644
> --- a/arch/alpha/include/uapi/asm/socket.h
> +++ b/arch/alpha/include/uapi/asm/socket.h
> @@ -137,6 +137,8 @@
>  
>  #define SO_RCVMARK		75
>  
> +#define SO_PASSPIDFD		76
> +
>  #if !defined(__KERNEL__)
>  
>  #if __BITS_PER_LONG == 64
> diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
> index 18f3d95ecfec..762dcb80e4ec 100644
> --- a/arch/mips/include/uapi/asm/socket.h
> +++ b/arch/mips/include/uapi/asm/socket.h
> @@ -148,6 +148,8 @@
>  
>  #define SO_RCVMARK		75
>  
> +#define SO_PASSPIDFD		76
> +
>  #if !defined(__KERNEL__)
>  
>  #if __BITS_PER_LONG == 64
> diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
> index f486d3dfb6bb..df16a3e16d64 100644
> --- a/arch/parisc/include/uapi/asm/socket.h
> +++ b/arch/parisc/include/uapi/asm/socket.h
> @@ -129,6 +129,8 @@
>  
>  #define SO_RCVMARK		0x4049
>  
> +#define SO_PASSPIDFD		0x404A
> +
>  #if !defined(__KERNEL__)
>  
>  #if __BITS_PER_LONG == 64
> diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
> index 2fda57a3ea86..6e2847804fea 100644
> --- a/arch/sparc/include/uapi/asm/socket.h
> +++ b/arch/sparc/include/uapi/asm/socket.h
> @@ -130,6 +130,8 @@
>  
>  #define SO_RCVMARK               0x0054
>  
> +#define SO_PASSPIDFD             0x0055
> +
>  #if !defined(__KERNEL__)
>  
>  
> diff --git a/include/linux/net.h b/include/linux/net.h
> index b73ad8e3c212..c234dfbe7a30 100644
> --- a/include/linux/net.h
> +++ b/include/linux/net.h
> @@ -43,6 +43,7 @@ struct net;
>  #define SOCK_PASSSEC		4
>  #define SOCK_SUPPORT_ZC		5
>  #define SOCK_CUSTOM_SOCKOPT	6
> +#define SOCK_PASSPIDFD		7
>  
>  #ifndef ARCH_HAS_SOCKET_TYPES
>  /**
> diff --git a/include/linux/socket.h b/include/linux/socket.h
> index 13c3a237b9c9..6bf90f251910 100644
> --- a/include/linux/socket.h
> +++ b/include/linux/socket.h
> @@ -177,6 +177,7 @@ static inline size_t msg_data_left(struct msghdr *msg)
>  #define	SCM_RIGHTS	0x01		/* rw: access rights (array of int) */
>  #define SCM_CREDENTIALS 0x02		/* rw: struct ucred		*/
>  #define SCM_SECURITY	0x03		/* rw: security label		*/
> +#define SCM_PIDFD	0x04		/* ro: pidfd (int)		*/
>  
>  struct ucred {
>  	__u32	pid;
> diff --git a/include/net/scm.h b/include/net/scm.h
> index 585adc1346bd..c67f765a165b 100644
> --- a/include/net/scm.h
> +++ b/include/net/scm.h
> @@ -120,12 +120,44 @@ static inline bool scm_has_secdata(struct socket *sock)
>  }
>  #endif /* CONFIG_SECURITY_NETWORK */
>  
> +static __inline__ void scm_pidfd_recv(struct msghdr *msg, struct scm_cookie *scm)
> +{
> +	struct file *pidfd_file = NULL;
> +	int pidfd;
> +
> +	/*
> +	 * put_cmsg() doesn't return an error if CMSG is truncated,
> +	 * that's why we need to opencode these checks here.
> +	 */
> +	if ((msg->msg_controllen <= sizeof(struct cmsghdr)) ||
> +	    (msg->msg_controllen - sizeof(struct cmsghdr)) < sizeof(int)) {
> +		msg->msg_flags |= MSG_CTRUNC;
> +		return;

Hm, curious about this: We mark the message as truncated for SCM_PIDFD
but if the same conditions were to apply for SCM_PASSCRED we don't mark
the message as truncated. Am I reading this correct? And is so, you
please briefly explain this difference?

> +	}
> +
> +	WARN_ON_ONCE(!scm->pid);
> +	pidfd = pidfd_prepare(scm->pid, 0, &pidfd_file);
> +
> +	if (put_cmsg(msg, SOL_SOCKET, SCM_PIDFD, sizeof(int), &pidfd)) {

If the put_cmsg() of the pidfd fails userspace needs to be able to
detect this. Otherwise they can't distinguish between the SCM_PIDFD
value being zero because the put_cmsg() failed or put_cmsg() succeeded
and the allocated fd nr was 0.

Looking at put_cmsg() it looks to me that userspace will receive a
SCM_PIDFD message only if the put_cmsg() is completely successful. IIUC,
then this change is fine.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v4 1/4] scm: add SO_PASSPIDFD and SCM_PIDFD
  2023-04-17 15:18   ` Christian Brauner
@ 2023-04-17 16:01     ` Aleksandr Mikhalitsyn
  2023-04-17 17:16       ` Christian Brauner
  0 siblings, 1 reply; 27+ messages in thread
From: Aleksandr Mikhalitsyn @ 2023-04-17 16:01 UTC (permalink / raw)
  To: Christian Brauner
  Cc: davem, linux-kernel, netdev, daniel, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Leon Romanovsky, David Ahern,
	Arnd Bergmann, Kees Cook, Kuniyuki Iwashima, Lennart Poettering,
	Luca Boccassi, linux-arch

On Mon, Apr 17, 2023 at 5:18 PM Christian Brauner <brauner@kernel.org> wrote:
>
> On Thu, Apr 13, 2023 at 03:33:52PM +0200, Alexander Mikhalitsyn wrote:
> > Implement SCM_PIDFD, a new type of CMSG type analogical to SCM_CREDENTIALS,
> > but it contains pidfd instead of plain pid, which allows programmers not
> > to care about PID reuse problem.
> >
> > Idea comes from UAPI kernel group:
> > https://uapi-group.org/kernel-features/
> >
> > Big thanks to Christian Brauner and Lennart Poettering for productive
> > discussions about this.
> >
> > Cc: "David S. Miller" <davem@davemloft.net>
> > Cc: Eric Dumazet <edumazet@google.com>
> > Cc: Jakub Kicinski <kuba@kernel.org>
> > Cc: Paolo Abeni <pabeni@redhat.com>
> > Cc: Leon Romanovsky <leon@kernel.org>
> > Cc: David Ahern <dsahern@kernel.org>
> > Cc: Arnd Bergmann <arnd@arndb.de>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Christian Brauner <brauner@kernel.org>
> > Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
> > Cc: Lennart Poettering <mzxreary@0pointer.de>
> > Cc: Luca Boccassi <bluca@debian.org>
> > Cc: linux-kernel@vger.kernel.org
> > Cc: netdev@vger.kernel.org
> > Cc: linux-arch@vger.kernel.org
> > Tested-by: Luca Boccassi <bluca@debian.org>
> > Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> > Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> > ---
> > v4:
> >       - fixed silent fd_install if writting of CMSG to the userspace fails (pointed by Christian)
> > v2:
> >       According to review comments from Kuniyuki Iwashima and Christian Brauner:
> >       - use pidfd_create(..) retval as a result
> >       - whitespace change
> > ---
> >  arch/alpha/include/uapi/asm/socket.h    |  2 ++
> >  arch/mips/include/uapi/asm/socket.h     |  2 ++
> >  arch/parisc/include/uapi/asm/socket.h   |  2 ++
> >  arch/sparc/include/uapi/asm/socket.h    |  2 ++
> >  include/linux/net.h                     |  1 +
> >  include/linux/socket.h                  |  1 +
> >  include/net/scm.h                       | 39 +++++++++++++++++++++++--
> >  include/uapi/asm-generic/socket.h       |  2 ++
> >  net/core/sock.c                         | 11 +++++++
> >  net/mptcp/sockopt.c                     |  1 +
> >  net/unix/af_unix.c                      | 18 ++++++++----
> >  tools/include/uapi/asm-generic/socket.h |  2 ++
> >  12 files changed, 76 insertions(+), 7 deletions(-)
> >
> > diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
> > index 739891b94136..ff310613ae64 100644
> > --- a/arch/alpha/include/uapi/asm/socket.h
> > +++ b/arch/alpha/include/uapi/asm/socket.h
> > @@ -137,6 +137,8 @@
> >
> >  #define SO_RCVMARK           75
> >
> > +#define SO_PASSPIDFD         76
> > +
> >  #if !defined(__KERNEL__)
> >
> >  #if __BITS_PER_LONG == 64
> > diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
> > index 18f3d95ecfec..762dcb80e4ec 100644
> > --- a/arch/mips/include/uapi/asm/socket.h
> > +++ b/arch/mips/include/uapi/asm/socket.h
> > @@ -148,6 +148,8 @@
> >
> >  #define SO_RCVMARK           75
> >
> > +#define SO_PASSPIDFD         76
> > +
> >  #if !defined(__KERNEL__)
> >
> >  #if __BITS_PER_LONG == 64
> > diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
> > index f486d3dfb6bb..df16a3e16d64 100644
> > --- a/arch/parisc/include/uapi/asm/socket.h
> > +++ b/arch/parisc/include/uapi/asm/socket.h
> > @@ -129,6 +129,8 @@
> >
> >  #define SO_RCVMARK           0x4049
> >
> > +#define SO_PASSPIDFD         0x404A
> > +
> >  #if !defined(__KERNEL__)
> >
> >  #if __BITS_PER_LONG == 64
> > diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
> > index 2fda57a3ea86..6e2847804fea 100644
> > --- a/arch/sparc/include/uapi/asm/socket.h
> > +++ b/arch/sparc/include/uapi/asm/socket.h
> > @@ -130,6 +130,8 @@
> >
> >  #define SO_RCVMARK               0x0054
> >
> > +#define SO_PASSPIDFD             0x0055
> > +
> >  #if !defined(__KERNEL__)
> >
> >
> > diff --git a/include/linux/net.h b/include/linux/net.h
> > index b73ad8e3c212..c234dfbe7a30 100644
> > --- a/include/linux/net.h
> > +++ b/include/linux/net.h
> > @@ -43,6 +43,7 @@ struct net;
> >  #define SOCK_PASSSEC         4
> >  #define SOCK_SUPPORT_ZC              5
> >  #define SOCK_CUSTOM_SOCKOPT  6
> > +#define SOCK_PASSPIDFD               7
> >
> >  #ifndef ARCH_HAS_SOCKET_TYPES
> >  /**
> > diff --git a/include/linux/socket.h b/include/linux/socket.h
> > index 13c3a237b9c9..6bf90f251910 100644
> > --- a/include/linux/socket.h
> > +++ b/include/linux/socket.h
> > @@ -177,6 +177,7 @@ static inline size_t msg_data_left(struct msghdr *msg)
> >  #define      SCM_RIGHTS      0x01            /* rw: access rights (array of int) */
> >  #define SCM_CREDENTIALS 0x02         /* rw: struct ucred             */
> >  #define SCM_SECURITY 0x03            /* rw: security label           */
> > +#define SCM_PIDFD    0x04            /* ro: pidfd (int)              */
> >
> >  struct ucred {
> >       __u32   pid;
> > diff --git a/include/net/scm.h b/include/net/scm.h
> > index 585adc1346bd..c67f765a165b 100644
> > --- a/include/net/scm.h
> > +++ b/include/net/scm.h
> > @@ -120,12 +120,44 @@ static inline bool scm_has_secdata(struct socket *sock)
> >  }
> >  #endif /* CONFIG_SECURITY_NETWORK */
> >
> > +static __inline__ void scm_pidfd_recv(struct msghdr *msg, struct scm_cookie *scm)
> > +{
> > +     struct file *pidfd_file = NULL;
> > +     int pidfd;
> > +
> > +     /*
> > +      * put_cmsg() doesn't return an error if CMSG is truncated,
> > +      * that's why we need to opencode these checks here.
> > +      */
> > +     if ((msg->msg_controllen <= sizeof(struct cmsghdr)) ||
> > +         (msg->msg_controllen - sizeof(struct cmsghdr)) < sizeof(int)) {
> > +             msg->msg_flags |= MSG_CTRUNC;
> > +             return;
>
> Hm, curious about this: We mark the message as truncated for SCM_PIDFD
> but if the same conditions were to apply for SCM_PASSCRED we don't mark
> the message as truncated. Am I reading this correct? And is so, you
> please briefly explain this difference?

Hi, Christian!

For SCM_CREDENTIALS we mark it too. Inside the put_cmsg function:
https://github.com/torvalds/linux/blob/6a8f57ae2eb07ab39a6f0ccad60c760743051026/net/core/scm.c#L225

The reason why I'm open-coding these checks is that I want to know
that the message
doesn't fit into the userspace buffer before doing pidfd_prepare and
other stuff and because
put_cmsg is not returning an error when message doesn't fit in the
userspace buffer and
we won't be able to properly do pidfd cleanup (put struct pid and fd index).

>
> > +     }
> > +
> > +     WARN_ON_ONCE(!scm->pid);
> > +     pidfd = pidfd_prepare(scm->pid, 0, &pidfd_file);
> > +
> > +     if (put_cmsg(msg, SOL_SOCKET, SCM_PIDFD, sizeof(int), &pidfd)) {
>
> If the put_cmsg() of the pidfd fails userspace needs to be able to
> detect this. Otherwise they can't distinguish between the SCM_PIDFD
> value being zero because the put_cmsg() failed or put_cmsg() succeeded
> and the allocated fd nr was 0.

If pidfd_prepare fails then userspace will receive SCM_PIDFD message
with negative pidfd value.

>
> Looking at put_cmsg() it looks to me that userspace will receive a
> SCM_PIDFD message only if the put_cmsg() is completely successful. IIUC,
> then this change is fine.

Kind regards,
Alex

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v4 1/4] scm: add SO_PASSPIDFD and SCM_PIDFD
  2023-04-17 16:01     ` Aleksandr Mikhalitsyn
@ 2023-04-17 17:16       ` Christian Brauner
  2023-04-17 17:43         ` Eric Dumazet
  0 siblings, 1 reply; 27+ messages in thread
From: Christian Brauner @ 2023-04-17 17:16 UTC (permalink / raw)
  To: Aleksandr Mikhalitsyn
  Cc: davem, linux-kernel, netdev, daniel, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Leon Romanovsky, David Ahern,
	Arnd Bergmann, Kees Cook, Kuniyuki Iwashima, Lennart Poettering,
	Luca Boccassi, linux-arch

On Mon, Apr 17, 2023 at 06:01:16PM +0200, Aleksandr Mikhalitsyn wrote:
> On Mon, Apr 17, 2023 at 5:18 PM Christian Brauner <brauner@kernel.org> wrote:
> >
> > On Thu, Apr 13, 2023 at 03:33:52PM +0200, Alexander Mikhalitsyn wrote:
> > > Implement SCM_PIDFD, a new type of CMSG type analogical to SCM_CREDENTIALS,
> > > but it contains pidfd instead of plain pid, which allows programmers not
> > > to care about PID reuse problem.
> > >
> > > Idea comes from UAPI kernel group:
> > > https://uapi-group.org/kernel-features/
> > >
> > > Big thanks to Christian Brauner and Lennart Poettering for productive
> > > discussions about this.
> > >
> > > Cc: "David S. Miller" <davem@davemloft.net>
> > > Cc: Eric Dumazet <edumazet@google.com>
> > > Cc: Jakub Kicinski <kuba@kernel.org>
> > > Cc: Paolo Abeni <pabeni@redhat.com>
> > > Cc: Leon Romanovsky <leon@kernel.org>
> > > Cc: David Ahern <dsahern@kernel.org>
> > > Cc: Arnd Bergmann <arnd@arndb.de>
> > > Cc: Kees Cook <keescook@chromium.org>
> > > Cc: Christian Brauner <brauner@kernel.org>
> > > Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
> > > Cc: Lennart Poettering <mzxreary@0pointer.de>
> > > Cc: Luca Boccassi <bluca@debian.org>
> > > Cc: linux-kernel@vger.kernel.org
> > > Cc: netdev@vger.kernel.org
> > > Cc: linux-arch@vger.kernel.org
> > > Tested-by: Luca Boccassi <bluca@debian.org>
> > > Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> > > Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> > > ---
> > > v4:
> > >       - fixed silent fd_install if writting of CMSG to the userspace fails (pointed by Christian)
> > > v2:
> > >       According to review comments from Kuniyuki Iwashima and Christian Brauner:
> > >       - use pidfd_create(..) retval as a result
> > >       - whitespace change
> > > ---
> > >  arch/alpha/include/uapi/asm/socket.h    |  2 ++
> > >  arch/mips/include/uapi/asm/socket.h     |  2 ++
> > >  arch/parisc/include/uapi/asm/socket.h   |  2 ++
> > >  arch/sparc/include/uapi/asm/socket.h    |  2 ++
> > >  include/linux/net.h                     |  1 +
> > >  include/linux/socket.h                  |  1 +
> > >  include/net/scm.h                       | 39 +++++++++++++++++++++++--
> > >  include/uapi/asm-generic/socket.h       |  2 ++
> > >  net/core/sock.c                         | 11 +++++++
> > >  net/mptcp/sockopt.c                     |  1 +
> > >  net/unix/af_unix.c                      | 18 ++++++++----
> > >  tools/include/uapi/asm-generic/socket.h |  2 ++
> > >  12 files changed, 76 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
> > > index 739891b94136..ff310613ae64 100644
> > > --- a/arch/alpha/include/uapi/asm/socket.h
> > > +++ b/arch/alpha/include/uapi/asm/socket.h
> > > @@ -137,6 +137,8 @@
> > >
> > >  #define SO_RCVMARK           75
> > >
> > > +#define SO_PASSPIDFD         76
> > > +
> > >  #if !defined(__KERNEL__)
> > >
> > >  #if __BITS_PER_LONG == 64
> > > diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
> > > index 18f3d95ecfec..762dcb80e4ec 100644
> > > --- a/arch/mips/include/uapi/asm/socket.h
> > > +++ b/arch/mips/include/uapi/asm/socket.h
> > > @@ -148,6 +148,8 @@
> > >
> > >  #define SO_RCVMARK           75
> > >
> > > +#define SO_PASSPIDFD         76
> > > +
> > >  #if !defined(__KERNEL__)
> > >
> > >  #if __BITS_PER_LONG == 64
> > > diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
> > > index f486d3dfb6bb..df16a3e16d64 100644
> > > --- a/arch/parisc/include/uapi/asm/socket.h
> > > +++ b/arch/parisc/include/uapi/asm/socket.h
> > > @@ -129,6 +129,8 @@
> > >
> > >  #define SO_RCVMARK           0x4049
> > >
> > > +#define SO_PASSPIDFD         0x404A
> > > +
> > >  #if !defined(__KERNEL__)
> > >
> > >  #if __BITS_PER_LONG == 64
> > > diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
> > > index 2fda57a3ea86..6e2847804fea 100644
> > > --- a/arch/sparc/include/uapi/asm/socket.h
> > > +++ b/arch/sparc/include/uapi/asm/socket.h
> > > @@ -130,6 +130,8 @@
> > >
> > >  #define SO_RCVMARK               0x0054
> > >
> > > +#define SO_PASSPIDFD             0x0055
> > > +
> > >  #if !defined(__KERNEL__)
> > >
> > >
> > > diff --git a/include/linux/net.h b/include/linux/net.h
> > > index b73ad8e3c212..c234dfbe7a30 100644
> > > --- a/include/linux/net.h
> > > +++ b/include/linux/net.h
> > > @@ -43,6 +43,7 @@ struct net;
> > >  #define SOCK_PASSSEC         4
> > >  #define SOCK_SUPPORT_ZC              5
> > >  #define SOCK_CUSTOM_SOCKOPT  6
> > > +#define SOCK_PASSPIDFD               7
> > >
> > >  #ifndef ARCH_HAS_SOCKET_TYPES
> > >  /**
> > > diff --git a/include/linux/socket.h b/include/linux/socket.h
> > > index 13c3a237b9c9..6bf90f251910 100644
> > > --- a/include/linux/socket.h
> > > +++ b/include/linux/socket.h
> > > @@ -177,6 +177,7 @@ static inline size_t msg_data_left(struct msghdr *msg)
> > >  #define      SCM_RIGHTS      0x01            /* rw: access rights (array of int) */
> > >  #define SCM_CREDENTIALS 0x02         /* rw: struct ucred             */
> > >  #define SCM_SECURITY 0x03            /* rw: security label           */
> > > +#define SCM_PIDFD    0x04            /* ro: pidfd (int)              */
> > >
> > >  struct ucred {
> > >       __u32   pid;
> > > diff --git a/include/net/scm.h b/include/net/scm.h
> > > index 585adc1346bd..c67f765a165b 100644
> > > --- a/include/net/scm.h
> > > +++ b/include/net/scm.h
> > > @@ -120,12 +120,44 @@ static inline bool scm_has_secdata(struct socket *sock)
> > >  }
> > >  #endif /* CONFIG_SECURITY_NETWORK */
> > >
> > > +static __inline__ void scm_pidfd_recv(struct msghdr *msg, struct scm_cookie *scm)
> > > +{
> > > +     struct file *pidfd_file = NULL;
> > > +     int pidfd;
> > > +
> > > +     /*
> > > +      * put_cmsg() doesn't return an error if CMSG is truncated,
> > > +      * that's why we need to opencode these checks here.
> > > +      */
> > > +     if ((msg->msg_controllen <= sizeof(struct cmsghdr)) ||
> > > +         (msg->msg_controllen - sizeof(struct cmsghdr)) < sizeof(int)) {
> > > +             msg->msg_flags |= MSG_CTRUNC;
> > > +             return;
> >
> > Hm, curious about this: We mark the message as truncated for SCM_PIDFD
> > but if the same conditions were to apply for SCM_PASSCRED we don't mark
> > the message as truncated. Am I reading this correct? And is so, you
> > please briefly explain this difference?
> 
> Hi, Christian!
> 
> For SCM_CREDENTIALS we mark it too. Inside the put_cmsg function:
> https://github.com/torvalds/linux/blob/6a8f57ae2eb07ab39a6f0ccad60c760743051026/net/core/scm.c#L225
> 
> The reason why I'm open-coding these checks is that I want to know
> that the message
> doesn't fit into the userspace buffer before doing pidfd_prepare and
> other stuff and because
> put_cmsg is not returning an error when message doesn't fit in the
> userspace buffer and
> we won't be able to properly do pidfd cleanup (put struct pid and fd index).
> 
> >
> > > +     }
> > > +
> > > +     WARN_ON_ONCE(!scm->pid);
> > > +     pidfd = pidfd_prepare(scm->pid, 0, &pidfd_file);
> > > +
> > > +     if (put_cmsg(msg, SOL_SOCKET, SCM_PIDFD, sizeof(int), &pidfd)) {
> >
> > If the put_cmsg() of the pidfd fails userspace needs to be able to
> > detect this. Otherwise they can't distinguish between the SCM_PIDFD
> > value being zero because the put_cmsg() failed or put_cmsg() succeeded
> > and the allocated fd nr was 0.
> 
> If pidfd_prepare fails then userspace will receive SCM_PIDFD message
> with negative pidfd value.

So we discussed this a bit offline and I think there's still an issue.
If put_cmsg() fails

          if (msg->msg_control_is_user) {
                  struct cmsghdr __user *cm = msg->msg_control_user;

                  check_object_size(data, cmlen - sizeof(*cm), true);

                  if (!user_write_access_begin(cm, cmlen))
                          goto efault;

		  // This succeeds so cm->cmsg_len == sizeof(int)
                  unsafe_put_user(cmlen, &cm->cmsg_len, efault_end);

		  // This succeeds so cm->cmsg_level == SOL_SOCKET
                  unsafe_put_user(level, &cm->cmsg_level, efault_end);

		  // This succeeds so cm->cmsg_type == SCM_PIDFD
                  unsafe_put_user(type, &cm->cmsg_type, efault_end);

		  // This fails and leaves all bits set to 0
                  unsafe_copy_to_user(CMSG_USER_DATA(cm), data,
                                      cmlen - sizeof(*cm), efault_end);
                  user_write_access_end();

so now we hit

          if (put_cmsg(msg, SOL_SOCKET, SCM_PIDFD, sizeof(int), &pidfd)) {
                  if (pidfd_file) {
                          put_unused_fd(pidfd);
                          fput(pidfd_file);
                  }

                  return;
          }

and return early. Afaict, userspace would now receive:

	if (cmsg && cmsg->cmsg_len == CMSG_LEN(sizeof(int)) &&
	    cmsg->cmsg_level == SOL_SOCKET &&
	    cmsg->cmsg_type == SCM_PIDFD) {
		memcpy(&pidfd, CMSG_DATA(cmsg), sizeof(int));

		// pidfd is now 0 which is a valid fd number
		// it'll likely refer to /dev/stdin or whatever and so
		// will fail or, worst case, 0 refers to another pidfd :)
		pidfd_send_signal(pidfd, SIGKILL);

so we need to address this. So one way I think that would solve this is:

diff --git a/net/core/scm.c b/net/core/scm.c
index 3cd7dd377e53..d1f4cd135c5a 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -236,9 +236,9 @@ int put_cmsg(struct msghdr * msg, int level, int type, int len, void *data)

                unsafe_put_user(cmlen, &cm->cmsg_len, efault_end);
                unsafe_put_user(level, &cm->cmsg_level, efault_end);
-               unsafe_put_user(type, &cm->cmsg_type, efault_end);
                unsafe_copy_to_user(CMSG_USER_DATA(cm), data,
                                    cmlen - sizeof(*cm), efault_end);
+               unsafe_put_user(type, &cm->cmsg_type, efault_end);
                user_write_access_end();
        } else {
                struct cmsghdr *cm = msg->msg_control;

such that we only copy cm->cmsg_type after we transfered the data.

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v4 1/4] scm: add SO_PASSPIDFD and SCM_PIDFD
  2023-04-17 17:16       ` Christian Brauner
@ 2023-04-17 17:43         ` Eric Dumazet
  2023-04-18  8:16           ` Christian Brauner
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2023-04-17 17:43 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Aleksandr Mikhalitsyn, davem, linux-kernel, netdev, daniel,
	Jakub Kicinski, Paolo Abeni, Leon Romanovsky, David Ahern,
	Arnd Bergmann, Kees Cook, Kuniyuki Iwashima, Lennart Poettering,
	Luca Boccassi, linux-arch

On Mon, Apr 17, 2023 at 7:16 PM Christian Brauner <brauner@kernel.org> wrote:
>
> On Mon, Apr 17, 2023 at 06:01:16PM +0200, Aleksandr Mikhalitsyn wrote:
> > On Mon, Apr 17, 2023 at 5:18 PM Christian Brauner <brauner@kernel.org> wrote:
> > >
> > > On Thu, Apr 13, 2023 at 03:33:52PM +0200, Alexander Mikhalitsyn wrote:
> > > > Implement SCM_PIDFD, a new type of CMSG type analogical to SCM_CREDENTIALS,
> > > > but it contains pidfd instead of plain pid, which allows programmers not
> > > > to care about PID reuse problem.
> > > >
> > > > Idea comes from UAPI kernel group:
> > > > https://uapi-group.org/kernel-features/
> > > >
> > > > Big thanks to Christian Brauner and Lennart Poettering for productive
> > > > discussions about this.
> > > >
> > > > Cc: "David S. Miller" <davem@davemloft.net>
> > > > Cc: Eric Dumazet <edumazet@google.com>
> > > > Cc: Jakub Kicinski <kuba@kernel.org>
> > > > Cc: Paolo Abeni <pabeni@redhat.com>
> > > > Cc: Leon Romanovsky <leon@kernel.org>
> > > > Cc: David Ahern <dsahern@kernel.org>
> > > > Cc: Arnd Bergmann <arnd@arndb.de>
> > > > Cc: Kees Cook <keescook@chromium.org>
> > > > Cc: Christian Brauner <brauner@kernel.org>
> > > > Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
> > > > Cc: Lennart Poettering <mzxreary@0pointer.de>
> > > > Cc: Luca Boccassi <bluca@debian.org>
> > > > Cc: linux-kernel@vger.kernel.org
> > > > Cc: netdev@vger.kernel.org
> > > > Cc: linux-arch@vger.kernel.org
> > > > Tested-by: Luca Boccassi <bluca@debian.org>
> > > > Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> > > > Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> > > > ---
> > > > v4:
> > > >       - fixed silent fd_install if writting of CMSG to the userspace fails (pointed by Christian)
> > > > v2:
> > > >       According to review comments from Kuniyuki Iwashima and Christian Brauner:
> > > >       - use pidfd_create(..) retval as a result
> > > >       - whitespace change
> > > > ---
> > > >  arch/alpha/include/uapi/asm/socket.h    |  2 ++
> > > >  arch/mips/include/uapi/asm/socket.h     |  2 ++
> > > >  arch/parisc/include/uapi/asm/socket.h   |  2 ++
> > > >  arch/sparc/include/uapi/asm/socket.h    |  2 ++
> > > >  include/linux/net.h                     |  1 +
> > > >  include/linux/socket.h                  |  1 +
> > > >  include/net/scm.h                       | 39 +++++++++++++++++++++++--
> > > >  include/uapi/asm-generic/socket.h       |  2 ++
> > > >  net/core/sock.c                         | 11 +++++++
> > > >  net/mptcp/sockopt.c                     |  1 +
> > > >  net/unix/af_unix.c                      | 18 ++++++++----
> > > >  tools/include/uapi/asm-generic/socket.h |  2 ++
> > > >  12 files changed, 76 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
> > > > index 739891b94136..ff310613ae64 100644
> > > > --- a/arch/alpha/include/uapi/asm/socket.h
> > > > +++ b/arch/alpha/include/uapi/asm/socket.h
> > > > @@ -137,6 +137,8 @@
> > > >
> > > >  #define SO_RCVMARK           75
> > > >
> > > > +#define SO_PASSPIDFD         76
> > > > +
> > > >  #if !defined(__KERNEL__)
> > > >
> > > >  #if __BITS_PER_LONG == 64
> > > > diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
> > > > index 18f3d95ecfec..762dcb80e4ec 100644
> > > > --- a/arch/mips/include/uapi/asm/socket.h
> > > > +++ b/arch/mips/include/uapi/asm/socket.h
> > > > @@ -148,6 +148,8 @@
> > > >
> > > >  #define SO_RCVMARK           75
> > > >
> > > > +#define SO_PASSPIDFD         76
> > > > +
> > > >  #if !defined(__KERNEL__)
> > > >
> > > >  #if __BITS_PER_LONG == 64
> > > > diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
> > > > index f486d3dfb6bb..df16a3e16d64 100644
> > > > --- a/arch/parisc/include/uapi/asm/socket.h
> > > > +++ b/arch/parisc/include/uapi/asm/socket.h
> > > > @@ -129,6 +129,8 @@
> > > >
> > > >  #define SO_RCVMARK           0x4049
> > > >
> > > > +#define SO_PASSPIDFD         0x404A
> > > > +
> > > >  #if !defined(__KERNEL__)
> > > >
> > > >  #if __BITS_PER_LONG == 64
> > > > diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
> > > > index 2fda57a3ea86..6e2847804fea 100644
> > > > --- a/arch/sparc/include/uapi/asm/socket.h
> > > > +++ b/arch/sparc/include/uapi/asm/socket.h
> > > > @@ -130,6 +130,8 @@
> > > >
> > > >  #define SO_RCVMARK               0x0054
> > > >
> > > > +#define SO_PASSPIDFD             0x0055
> > > > +
> > > >  #if !defined(__KERNEL__)
> > > >
> > > >
> > > > diff --git a/include/linux/net.h b/include/linux/net.h
> > > > index b73ad8e3c212..c234dfbe7a30 100644
> > > > --- a/include/linux/net.h
> > > > +++ b/include/linux/net.h
> > > > @@ -43,6 +43,7 @@ struct net;
> > > >  #define SOCK_PASSSEC         4
> > > >  #define SOCK_SUPPORT_ZC              5
> > > >  #define SOCK_CUSTOM_SOCKOPT  6
> > > > +#define SOCK_PASSPIDFD               7
> > > >
> > > >  #ifndef ARCH_HAS_SOCKET_TYPES
> > > >  /**
> > > > diff --git a/include/linux/socket.h b/include/linux/socket.h
> > > > index 13c3a237b9c9..6bf90f251910 100644
> > > > --- a/include/linux/socket.h
> > > > +++ b/include/linux/socket.h
> > > > @@ -177,6 +177,7 @@ static inline size_t msg_data_left(struct msghdr *msg)
> > > >  #define      SCM_RIGHTS      0x01            /* rw: access rights (array of int) */
> > > >  #define SCM_CREDENTIALS 0x02         /* rw: struct ucred             */
> > > >  #define SCM_SECURITY 0x03            /* rw: security label           */
> > > > +#define SCM_PIDFD    0x04            /* ro: pidfd (int)              */
> > > >
> > > >  struct ucred {
> > > >       __u32   pid;
> > > > diff --git a/include/net/scm.h b/include/net/scm.h
> > > > index 585adc1346bd..c67f765a165b 100644
> > > > --- a/include/net/scm.h
> > > > +++ b/include/net/scm.h
> > > > @@ -120,12 +120,44 @@ static inline bool scm_has_secdata(struct socket *sock)
> > > >  }
> > > >  #endif /* CONFIG_SECURITY_NETWORK */
> > > >
> > > > +static __inline__ void scm_pidfd_recv(struct msghdr *msg, struct scm_cookie *scm)
> > > > +{
> > > > +     struct file *pidfd_file = NULL;
> > > > +     int pidfd;
> > > > +
> > > > +     /*
> > > > +      * put_cmsg() doesn't return an error if CMSG is truncated,
> > > > +      * that's why we need to opencode these checks here.
> > > > +      */
> > > > +     if ((msg->msg_controllen <= sizeof(struct cmsghdr)) ||
> > > > +         (msg->msg_controllen - sizeof(struct cmsghdr)) < sizeof(int)) {
> > > > +             msg->msg_flags |= MSG_CTRUNC;
> > > > +             return;
> > >
> > > Hm, curious about this: We mark the message as truncated for SCM_PIDFD
> > > but if the same conditions were to apply for SCM_PASSCRED we don't mark
> > > the message as truncated. Am I reading this correct? And is so, you
> > > please briefly explain this difference?
> >
> > Hi, Christian!
> >
> > For SCM_CREDENTIALS we mark it too. Inside the put_cmsg function:
> > https://github.com/torvalds/linux/blob/6a8f57ae2eb07ab39a6f0ccad60c760743051026/net/core/scm.c#L225
> >
> > The reason why I'm open-coding these checks is that I want to know
> > that the message
> > doesn't fit into the userspace buffer before doing pidfd_prepare and
> > other stuff and because
> > put_cmsg is not returning an error when message doesn't fit in the
> > userspace buffer and
> > we won't be able to properly do pidfd cleanup (put struct pid and fd index).
> >
> > >
> > > > +     }
> > > > +
> > > > +     WARN_ON_ONCE(!scm->pid);
> > > > +     pidfd = pidfd_prepare(scm->pid, 0, &pidfd_file);
> > > > +
> > > > +     if (put_cmsg(msg, SOL_SOCKET, SCM_PIDFD, sizeof(int), &pidfd)) {
> > >
> > > If the put_cmsg() of the pidfd fails userspace needs to be able to
> > > detect this. Otherwise they can't distinguish between the SCM_PIDFD
> > > value being zero because the put_cmsg() failed or put_cmsg() succeeded
> > > and the allocated fd nr was 0.
> >
> > If pidfd_prepare fails then userspace will receive SCM_PIDFD message
> > with negative pidfd value.
>
> So we discussed this a bit offline and I think there's still an issue.
> If put_cmsg() fails
>
>           if (msg->msg_control_is_user) {
>                   struct cmsghdr __user *cm = msg->msg_control_user;
>
>                   check_object_size(data, cmlen - sizeof(*cm), true);
>
>                   if (!user_write_access_begin(cm, cmlen))
>                           goto efault;
>
>                   // This succeeds so cm->cmsg_len == sizeof(int)
>                   unsafe_put_user(cmlen, &cm->cmsg_len, efault_end);
>
>                   // This succeeds so cm->cmsg_level == SOL_SOCKET
>                   unsafe_put_user(level, &cm->cmsg_level, efault_end);
>
>                   // This succeeds so cm->cmsg_type == SCM_PIDFD
>                   unsafe_put_user(type, &cm->cmsg_type, efault_end);
>
>                   // This fails and leaves all bits set to 0
>                   unsafe_copy_to_user(CMSG_USER_DATA(cm), data,
>                                       cmlen - sizeof(*cm), efault_end);
>                   user_write_access_end();
>
> so now we hit
>
>           if (put_cmsg(msg, SOL_SOCKET, SCM_PIDFD, sizeof(int), &pidfd)) {
>                   if (pidfd_file) {
>                           put_unused_fd(pidfd);
>                           fput(pidfd_file);
>                   }
>
>                   return;
>           }
>
> and return early. Afaict, userspace would now receive:
>
>         if (cmsg && cmsg->cmsg_len == CMSG_LEN(sizeof(int)) &&
>             cmsg->cmsg_level == SOL_SOCKET &&
>             cmsg->cmsg_type == SCM_PIDFD) {
>                 memcpy(&pidfd, CMSG_DATA(cmsg), sizeof(int));
>
>                 // pidfd is now 0 which is a valid fd number
>                 // it'll likely refer to /dev/stdin or whatever and so
>                 // will fail or, worst case, 0 refers to another pidfd :)
>                 pidfd_send_signal(pidfd, SIGKILL);
>
> so we need to address this. So one way I think that would solve this is:
>
> diff --git a/net/core/scm.c b/net/core/scm.c
> index 3cd7dd377e53..d1f4cd135c5a 100644
> --- a/net/core/scm.c
> +++ b/net/core/scm.c
> @@ -236,9 +236,9 @@ int put_cmsg(struct msghdr * msg, int level, int type, int len, void *data)
>
>                 unsafe_put_user(cmlen, &cm->cmsg_len, efault_end);
>                 unsafe_put_user(level, &cm->cmsg_level, efault_end);
> -               unsafe_put_user(type, &cm->cmsg_type, efault_end);
>                 unsafe_copy_to_user(CMSG_USER_DATA(cm), data,
>                                     cmlen - sizeof(*cm), efault_end);
> +               unsafe_put_user(type, &cm->cmsg_type, efault_end);
>                 user_write_access_end();
>         } else {
>                 struct cmsghdr *cm = msg->msg_control;
>
> such that we only copy cm->cmsg_type after we transfered the data.

This looks wrong to me.

if put_cmsg() returns -EFAULT, then msg->msg_control and
msg->msg_controllen were not changed.

So the user application should not attempt to read this part of the
control buffer, this could contain garbage.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v4 2/4] net: socket: add sockopts blacklist for BPF cgroup hook
  2023-04-17 14:42           ` Christian Brauner
@ 2023-04-17 18:10             ` Stanislav Fomichev
  0 siblings, 0 replies; 27+ messages in thread
From: Stanislav Fomichev @ 2023-04-17 18:10 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Aleksandr Mikhalitsyn, Eric Dumazet, davem, linux-kernel, netdev,
	daniel, Jakub Kicinski, Paolo Abeni, Leon Romanovsky,
	David Ahern, Arnd Bergmann, Kees Cook, Kuniyuki Iwashima,
	Lennart Poettering, linux-arch

On Mon, Apr 17, 2023 at 7:42 AM Christian Brauner <brauner@kernel.org> wrote:
>
> On Fri, Apr 14, 2023 at 06:55:39PM -0700, Stanislav Fomichev wrote:
> > On 04/13, Stanislav Fomichev wrote:
> > > On Thu, Apr 13, 2023 at 7:38 AM Aleksandr Mikhalitsyn
> > > <aleksandr.mikhalitsyn@canonical.com> wrote:
> > > >
> > > > On Thu, Apr 13, 2023 at 4:22 PM Eric Dumazet <edumazet@google.com> wrote:
> > > > >
> > > > > On Thu, Apr 13, 2023 at 3:35 PM Alexander Mikhalitsyn
> > > > > <aleksandr.mikhalitsyn@canonical.com> wrote:
> > > > > >
> > > > > > During work on SO_PEERPIDFD, it was discovered (thanks to Christian),
> > > > > > that bpf cgroup hook can cause FD leaks when used with sockopts which
> > > > > > install FDs into the process fdtable.
> > > > > >
> > > > > > After some offlist discussion it was proposed to add a blacklist of
> > > > >
> > > > > We try to replace this word by either denylist or blocklist, even in changelogs.
> > > >
> > > > Hi Eric,
> > > >
> > > > Oh, I'm sorry about that. :( Sure.
> > > >
> > > > >
> > > > > > socket options those can cause troubles when BPF cgroup hook is enabled.
> > > > > >
> > > > >
> > > > > Can we find the appropriate Fixes: tag to help stable teams ?
> > > >
> > > > Sure, I will add next time.
> > > >
> > > > Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
> > > >
> > > > I think it's better to add Stanislav Fomichev to CC.
> > >
> > > Can we use 'struct proto' bpf_bypass_getsockopt instead? We already
> > > use it for tcp zerocopy, I'm assuming it should work in this case as
> > > well?
> >
> > Jakub reminded me of the other things I wanted to ask here bug forgot:
> >
> > - setsockopt is probably not needed, right? setsockopt hook triggers
> >   before the kernel and shouldn't leak anything
> > - for getsockopt, instead of bypassing bpf completely, should we instead
> >   ignore the error from the bpf program? that would still preserve
>
> That's fine by me as well.
>
> It'd be great if the net folks could tell Alex how they would want this
> handled.

Doing the bypass seems fine with me for now. If we ever decide that
fd-based optvals are worth inspecting in bpf, we can lift that bypass.

> >   the observability aspect
>
> Please see for more details
> https://lore.kernel.org/lkml/20230411-nudelsalat-spreu-3038458f25c4@brauner

Thanks for the context. Yeah, sockopts are being used for a lot of
interesting things :-(

> > - or maybe we can even have a per-proto bpf_getsockopt_cleanup call that
> >   gets called whenever bpf returns an error to make sure protocols have
> >   a chance to handle that condition (and free the fd)
>
> Installing an fd into an fdtable makes it visible to userspace at which
> point calling close_fd() is doable but an absolute last resort and
> generally a good indicator of misdesign. If the bpf hook wants to make
> decisions based on the file then it should receive a struct
> file, not an fd.

SG! Then let's not over-complicate it for now and do a simple bypass.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* handling unsupported optlen in cgroup bpf getsockopt: (was [PATCH net-next v4 2/4] net: socket: add sockopts blacklist for BPF cgroup hook)
  2023-04-15  1:55         ` Stanislav Fomichev
  2023-04-17 14:42           ` Christian Brauner
@ 2023-04-18  1:03           ` Martin KaFai Lau
  2023-04-18 16:47             ` Stanislav Fomichev
  1 sibling, 1 reply; 27+ messages in thread
From: Martin KaFai Lau @ 2023-04-18  1:03 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Eric Dumazet, davem, linux-kernel, netdev, daniel,
	Jakub Kicinski, Paolo Abeni, Leon Romanovsky, David Ahern,
	Arnd Bergmann, Kees Cook, Christian Brauner, Kuniyuki Iwashima,
	Lennart Poettering, linux-arch, Aleksandr Mikhalitsyn, bpf

On 4/14/23 6:55 PM, Stanislav Fomichev wrote:
> On 04/13, Stanislav Fomichev wrote:
>> On Thu, Apr 13, 2023 at 7:38 AM Aleksandr Mikhalitsyn
>> <aleksandr.mikhalitsyn@canonical.com> wrote:
>>>
>>> On Thu, Apr 13, 2023 at 4:22 PM Eric Dumazet <edumazet@google.com> wrote:
>>>>
>>>> On Thu, Apr 13, 2023 at 3:35 PM Alexander Mikhalitsyn
>>>> <aleksandr.mikhalitsyn@canonical.com> wrote:
>>>>>
>>>>> During work on SO_PEERPIDFD, it was discovered (thanks to Christian),
>>>>> that bpf cgroup hook can cause FD leaks when used with sockopts which
>>>>> install FDs into the process fdtable.
>>>>>
>>>>> After some offlist discussion it was proposed to add a blacklist of
>>>>
>>>> We try to replace this word by either denylist or blocklist, even in changelogs.
>>>
>>> Hi Eric,
>>>
>>> Oh, I'm sorry about that. :( Sure.
>>>
>>>>
>>>>> socket options those can cause troubles when BPF cgroup hook is enabled.
>>>>>
>>>>
>>>> Can we find the appropriate Fixes: tag to help stable teams ?
>>>
>>> Sure, I will add next time.
>>>
>>> Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
>>>
>>> I think it's better to add Stanislav Fomichev to CC.
>>
>> Can we use 'struct proto' bpf_bypass_getsockopt instead? We already
>> use it for tcp zerocopy, I'm assuming it should work in this case as
>> well?
> 
> Jakub reminded me of the other things I wanted to ask here bug forgot:
> 
> - setsockopt is probably not needed, right? setsockopt hook triggers
>    before the kernel and shouldn't leak anything
> - for getsockopt, instead of bypassing bpf completely, should we instead
>    ignore the error from the bpf program? that would still preserve
>    the observability aspect

stealing this thread to discuss the optlen issue which may make sense to bypass 
also.

There has been issue with optlen. Other than this older post related to optlen > 
PAGE_SIZE: 
https://lore.kernel.org/bpf/5c8b7d59-1f28-2284-f7b9-49d946f2e982@linux.dev/, the 
recent one related to optlen that we have seen is NETLINK_LIST_MEMBERSHIPS. The 
userspace passed in optlen == 0 and the kernel put the expected optlen (> 0) and 
'return 0;' to userspace. The userspace intention is to learn the expected 
optlen. This makes 'ctx.optlen > max_optlen' and 
__cgroup_bpf_run_filter_getsockopt() ends up returning -EFAULT to the userspace 
even the bpf prog has not changed anything.

Does it make sense to also bypass the bpf prog when 'ctx.optlen > max_optlen' 
for now (and this can use a separate patch which as usual requires a bpf selftests)?

In the future, does it make sense to have a specific cgroup-bpf-prog (a specific 
attach type?) that only uses bpf_dynptr kfunc to access the optval such that it 
can enforce read-only for some optname and potentially also track if bpf-prog 
has written a new optval? The bpf-prog can only return 1 (OK) and only allows 
using bpf_set_retval() instead. Likely there is still holes but could be a seed 
of thought to continue polishing the idea.


> - or maybe we can even have a per-proto bpf_getsockopt_cleanup call that
>    gets called whenever bpf returns an error to make sure protocols have
>    a chance to handle that condition (and free the fd)
> 



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v4 1/4] scm: add SO_PASSPIDFD and SCM_PIDFD
  2023-04-17 17:43         ` Eric Dumazet
@ 2023-04-18  8:16           ` Christian Brauner
  0 siblings, 0 replies; 27+ messages in thread
From: Christian Brauner @ 2023-04-18  8:16 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Aleksandr Mikhalitsyn, davem, linux-kernel, netdev, daniel,
	Jakub Kicinski, Paolo Abeni, Leon Romanovsky, David Ahern,
	Arnd Bergmann, Kees Cook, Kuniyuki Iwashima, Lennart Poettering,
	Luca Boccassi, linux-arch

On Mon, Apr 17, 2023 at 07:43:19PM +0200, Eric Dumazet wrote:
> On Mon, Apr 17, 2023 at 7:16 PM Christian Brauner <brauner@kernel.org> wrote:
> >
> > On Mon, Apr 17, 2023 at 06:01:16PM +0200, Aleksandr Mikhalitsyn wrote:
> > > On Mon, Apr 17, 2023 at 5:18 PM Christian Brauner <brauner@kernel.org> wrote:
> > > >
> > > > On Thu, Apr 13, 2023 at 03:33:52PM +0200, Alexander Mikhalitsyn wrote:
> > > > > Implement SCM_PIDFD, a new type of CMSG type analogical to SCM_CREDENTIALS,
> > > > > but it contains pidfd instead of plain pid, which allows programmers not
> > > > > to care about PID reuse problem.
> > > > >
> > > > > Idea comes from UAPI kernel group:
> > > > > https://uapi-group.org/kernel-features/
> > > > >
> > > > > Big thanks to Christian Brauner and Lennart Poettering for productive
> > > > > discussions about this.
> > > > >
> > > > > Cc: "David S. Miller" <davem@davemloft.net>
> > > > > Cc: Eric Dumazet <edumazet@google.com>
> > > > > Cc: Jakub Kicinski <kuba@kernel.org>
> > > > > Cc: Paolo Abeni <pabeni@redhat.com>
> > > > > Cc: Leon Romanovsky <leon@kernel.org>
> > > > > Cc: David Ahern <dsahern@kernel.org>
> > > > > Cc: Arnd Bergmann <arnd@arndb.de>
> > > > > Cc: Kees Cook <keescook@chromium.org>
> > > > > Cc: Christian Brauner <brauner@kernel.org>
> > > > > Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
> > > > > Cc: Lennart Poettering <mzxreary@0pointer.de>
> > > > > Cc: Luca Boccassi <bluca@debian.org>
> > > > > Cc: linux-kernel@vger.kernel.org
> > > > > Cc: netdev@vger.kernel.org
> > > > > Cc: linux-arch@vger.kernel.org
> > > > > Tested-by: Luca Boccassi <bluca@debian.org>
> > > > > Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> > > > > Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> > > > > ---
> > > > > v4:
> > > > >       - fixed silent fd_install if writting of CMSG to the userspace fails (pointed by Christian)
> > > > > v2:
> > > > >       According to review comments from Kuniyuki Iwashima and Christian Brauner:
> > > > >       - use pidfd_create(..) retval as a result
> > > > >       - whitespace change
> > > > > ---
> > > > >  arch/alpha/include/uapi/asm/socket.h    |  2 ++
> > > > >  arch/mips/include/uapi/asm/socket.h     |  2 ++
> > > > >  arch/parisc/include/uapi/asm/socket.h   |  2 ++
> > > > >  arch/sparc/include/uapi/asm/socket.h    |  2 ++
> > > > >  include/linux/net.h                     |  1 +
> > > > >  include/linux/socket.h                  |  1 +
> > > > >  include/net/scm.h                       | 39 +++++++++++++++++++++++--
> > > > >  include/uapi/asm-generic/socket.h       |  2 ++
> > > > >  net/core/sock.c                         | 11 +++++++
> > > > >  net/mptcp/sockopt.c                     |  1 +
> > > > >  net/unix/af_unix.c                      | 18 ++++++++----
> > > > >  tools/include/uapi/asm-generic/socket.h |  2 ++
> > > > >  12 files changed, 76 insertions(+), 7 deletions(-)
> > > > >
> > > > > diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
> > > > > index 739891b94136..ff310613ae64 100644
> > > > > --- a/arch/alpha/include/uapi/asm/socket.h
> > > > > +++ b/arch/alpha/include/uapi/asm/socket.h
> > > > > @@ -137,6 +137,8 @@
> > > > >
> > > > >  #define SO_RCVMARK           75
> > > > >
> > > > > +#define SO_PASSPIDFD         76
> > > > > +
> > > > >  #if !defined(__KERNEL__)
> > > > >
> > > > >  #if __BITS_PER_LONG == 64
> > > > > diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
> > > > > index 18f3d95ecfec..762dcb80e4ec 100644
> > > > > --- a/arch/mips/include/uapi/asm/socket.h
> > > > > +++ b/arch/mips/include/uapi/asm/socket.h
> > > > > @@ -148,6 +148,8 @@
> > > > >
> > > > >  #define SO_RCVMARK           75
> > > > >
> > > > > +#define SO_PASSPIDFD         76
> > > > > +
> > > > >  #if !defined(__KERNEL__)
> > > > >
> > > > >  #if __BITS_PER_LONG == 64
> > > > > diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
> > > > > index f486d3dfb6bb..df16a3e16d64 100644
> > > > > --- a/arch/parisc/include/uapi/asm/socket.h
> > > > > +++ b/arch/parisc/include/uapi/asm/socket.h
> > > > > @@ -129,6 +129,8 @@
> > > > >
> > > > >  #define SO_RCVMARK           0x4049
> > > > >
> > > > > +#define SO_PASSPIDFD         0x404A
> > > > > +
> > > > >  #if !defined(__KERNEL__)
> > > > >
> > > > >  #if __BITS_PER_LONG == 64
> > > > > diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
> > > > > index 2fda57a3ea86..6e2847804fea 100644
> > > > > --- a/arch/sparc/include/uapi/asm/socket.h
> > > > > +++ b/arch/sparc/include/uapi/asm/socket.h
> > > > > @@ -130,6 +130,8 @@
> > > > >
> > > > >  #define SO_RCVMARK               0x0054
> > > > >
> > > > > +#define SO_PASSPIDFD             0x0055
> > > > > +
> > > > >  #if !defined(__KERNEL__)
> > > > >
> > > > >
> > > > > diff --git a/include/linux/net.h b/include/linux/net.h
> > > > > index b73ad8e3c212..c234dfbe7a30 100644
> > > > > --- a/include/linux/net.h
> > > > > +++ b/include/linux/net.h
> > > > > @@ -43,6 +43,7 @@ struct net;
> > > > >  #define SOCK_PASSSEC         4
> > > > >  #define SOCK_SUPPORT_ZC              5
> > > > >  #define SOCK_CUSTOM_SOCKOPT  6
> > > > > +#define SOCK_PASSPIDFD               7
> > > > >
> > > > >  #ifndef ARCH_HAS_SOCKET_TYPES
> > > > >  /**
> > > > > diff --git a/include/linux/socket.h b/include/linux/socket.h
> > > > > index 13c3a237b9c9..6bf90f251910 100644
> > > > > --- a/include/linux/socket.h
> > > > > +++ b/include/linux/socket.h
> > > > > @@ -177,6 +177,7 @@ static inline size_t msg_data_left(struct msghdr *msg)
> > > > >  #define      SCM_RIGHTS      0x01            /* rw: access rights (array of int) */
> > > > >  #define SCM_CREDENTIALS 0x02         /* rw: struct ucred             */
> > > > >  #define SCM_SECURITY 0x03            /* rw: security label           */
> > > > > +#define SCM_PIDFD    0x04            /* ro: pidfd (int)              */
> > > > >
> > > > >  struct ucred {
> > > > >       __u32   pid;
> > > > > diff --git a/include/net/scm.h b/include/net/scm.h
> > > > > index 585adc1346bd..c67f765a165b 100644
> > > > > --- a/include/net/scm.h
> > > > > +++ b/include/net/scm.h
> > > > > @@ -120,12 +120,44 @@ static inline bool scm_has_secdata(struct socket *sock)
> > > > >  }
> > > > >  #endif /* CONFIG_SECURITY_NETWORK */
> > > > >
> > > > > +static __inline__ void scm_pidfd_recv(struct msghdr *msg, struct scm_cookie *scm)
> > > > > +{
> > > > > +     struct file *pidfd_file = NULL;
> > > > > +     int pidfd;
> > > > > +
> > > > > +     /*
> > > > > +      * put_cmsg() doesn't return an error if CMSG is truncated,
> > > > > +      * that's why we need to opencode these checks here.
> > > > > +      */
> > > > > +     if ((msg->msg_controllen <= sizeof(struct cmsghdr)) ||
> > > > > +         (msg->msg_controllen - sizeof(struct cmsghdr)) < sizeof(int)) {
> > > > > +             msg->msg_flags |= MSG_CTRUNC;
> > > > > +             return;
> > > >
> > > > Hm, curious about this: We mark the message as truncated for SCM_PIDFD
> > > > but if the same conditions were to apply for SCM_PASSCRED we don't mark
> > > > the message as truncated. Am I reading this correct? And is so, you
> > > > please briefly explain this difference?
> > >
> > > Hi, Christian!
> > >
> > > For SCM_CREDENTIALS we mark it too. Inside the put_cmsg function:
> > > https://github.com/torvalds/linux/blob/6a8f57ae2eb07ab39a6f0ccad60c760743051026/net/core/scm.c#L225
> > >
> > > The reason why I'm open-coding these checks is that I want to know
> > > that the message
> > > doesn't fit into the userspace buffer before doing pidfd_prepare and
> > > other stuff and because
> > > put_cmsg is not returning an error when message doesn't fit in the
> > > userspace buffer and
> > > we won't be able to properly do pidfd cleanup (put struct pid and fd index).
> > >
> > > >
> > > > > +     }
> > > > > +
> > > > > +     WARN_ON_ONCE(!scm->pid);
> > > > > +     pidfd = pidfd_prepare(scm->pid, 0, &pidfd_file);
> > > > > +
> > > > > +     if (put_cmsg(msg, SOL_SOCKET, SCM_PIDFD, sizeof(int), &pidfd)) {
> > > >
> > > > If the put_cmsg() of the pidfd fails userspace needs to be able to
> > > > detect this. Otherwise they can't distinguish between the SCM_PIDFD
> > > > value being zero because the put_cmsg() failed or put_cmsg() succeeded
> > > > and the allocated fd nr was 0.
> > >
> > > If pidfd_prepare fails then userspace will receive SCM_PIDFD message
> > > with negative pidfd value.
> >
> > So we discussed this a bit offline and I think there's still an issue.
> > If put_cmsg() fails
> >
> >           if (msg->msg_control_is_user) {
> >                   struct cmsghdr __user *cm = msg->msg_control_user;
> >
> >                   check_object_size(data, cmlen - sizeof(*cm), true);
> >
> >                   if (!user_write_access_begin(cm, cmlen))
> >                           goto efault;
> >
> >                   // This succeeds so cm->cmsg_len == sizeof(int)
> >                   unsafe_put_user(cmlen, &cm->cmsg_len, efault_end);
> >
> >                   // This succeeds so cm->cmsg_level == SOL_SOCKET
> >                   unsafe_put_user(level, &cm->cmsg_level, efault_end);
> >
> >                   // This succeeds so cm->cmsg_type == SCM_PIDFD
> >                   unsafe_put_user(type, &cm->cmsg_type, efault_end);
> >
> >                   // This fails and leaves all bits set to 0
> >                   unsafe_copy_to_user(CMSG_USER_DATA(cm), data,
> >                                       cmlen - sizeof(*cm), efault_end);
> >                   user_write_access_end();
> >
> > so now we hit
> >
> >           if (put_cmsg(msg, SOL_SOCKET, SCM_PIDFD, sizeof(int), &pidfd)) {
> >                   if (pidfd_file) {
> >                           put_unused_fd(pidfd);
> >                           fput(pidfd_file);
> >                   }
> >
> >                   return;
> >           }
> >
> > and return early. Afaict, userspace would now receive:
> >
> >         if (cmsg && cmsg->cmsg_len == CMSG_LEN(sizeof(int)) &&
> >             cmsg->cmsg_level == SOL_SOCKET &&
> >             cmsg->cmsg_type == SCM_PIDFD) {
> >                 memcpy(&pidfd, CMSG_DATA(cmsg), sizeof(int));
> >
> >                 // pidfd is now 0 which is a valid fd number
> >                 // it'll likely refer to /dev/stdin or whatever and so
> >                 // will fail or, worst case, 0 refers to another pidfd :)
> >                 pidfd_send_signal(pidfd, SIGKILL);
> >
> > so we need to address this. So one way I think that would solve this is:
> >
> > diff --git a/net/core/scm.c b/net/core/scm.c
> > index 3cd7dd377e53..d1f4cd135c5a 100644
> > --- a/net/core/scm.c
> > +++ b/net/core/scm.c
> > @@ -236,9 +236,9 @@ int put_cmsg(struct msghdr * msg, int level, int type, int len, void *data)
> >
> >                 unsafe_put_user(cmlen, &cm->cmsg_len, efault_end);
> >                 unsafe_put_user(level, &cm->cmsg_level, efault_end);
> > -               unsafe_put_user(type, &cm->cmsg_type, efault_end);
> >                 unsafe_copy_to_user(CMSG_USER_DATA(cm), data,
> >                                     cmlen - sizeof(*cm), efault_end);
> > +               unsafe_put_user(type, &cm->cmsg_type, efault_end);
> >                 user_write_access_end();
> >         } else {
> >                 struct cmsghdr *cm = msg->msg_control;
> >
> > such that we only copy cm->cmsg_type after we transfered the data.
> 
> This looks wrong to me.
> 
> if put_cmsg() returns -EFAULT, then msg->msg_control and
> msg->msg_controllen were not changed.
> 
> So the user application should not attempt to read this part of the
> control buffer, this could contain garbage.

Thanks for the review, Eric. That's reassuring.

I've done a bit of container related networking before but I'm fumbling
my way through the reviews here. So any additional reviews here would be
very helpful.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v4 1/4] scm: add SO_PASSPIDFD and SCM_PIDFD
  2023-04-13 13:33 ` [PATCH net-next v4 1/4] scm: add SO_PASSPIDFD and SCM_PIDFD Alexander Mikhalitsyn
  2023-04-17 15:18   ` Christian Brauner
@ 2023-04-18 13:07   ` Christian Brauner
  1 sibling, 0 replies; 27+ messages in thread
From: Christian Brauner @ 2023-04-18 13:07 UTC (permalink / raw)
  To: Alexander Mikhalitsyn
  Cc: davem, linux-kernel, netdev, daniel, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Leon Romanovsky, David Ahern,
	Arnd Bergmann, Kees Cook, Kuniyuki Iwashima, Lennart Poettering,
	Luca Boccassi, linux-arch

On Thu, Apr 13, 2023 at 03:33:52PM +0200, Alexander Mikhalitsyn wrote:
> Implement SCM_PIDFD, a new type of CMSG type analogical to SCM_CREDENTIALS,
> but it contains pidfd instead of plain pid, which allows programmers not
> to care about PID reuse problem.
> 
> Idea comes from UAPI kernel group:
> https://uapi-group.org/kernel-features/
> 
> Big thanks to Christian Brauner and Lennart Poettering for productive
> discussions about this.
> 
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Cc: Leon Romanovsky <leon@kernel.org>
> Cc: David Ahern <dsahern@kernel.org>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
> Cc: Lennart Poettering <mzxreary@0pointer.de>
> Cc: Luca Boccassi <bluca@debian.org>
> Cc: linux-kernel@vger.kernel.org
> Cc: netdev@vger.kernel.org
> Cc: linux-arch@vger.kernel.org
> Tested-by: Luca Boccassi <bluca@debian.org>
> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> ---
> v4:
> 	- fixed silent fd_install if writting of CMSG to the userspace fails (pointed by Christian)

I don't have a lot more to add to this,
Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v4 3/4] net: core: add getsockopt SO_PEERPIDFD
  2023-04-13 13:33 ` [PATCH net-next v4 3/4] net: core: add getsockopt SO_PEERPIDFD Alexander Mikhalitsyn
@ 2023-04-18 13:13   ` Christian Brauner
  0 siblings, 0 replies; 27+ messages in thread
From: Christian Brauner @ 2023-04-18 13:13 UTC (permalink / raw)
  To: Alexander Mikhalitsyn
  Cc: davem, linux-kernel, netdev, daniel, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Leon Romanovsky, David Ahern,
	Arnd Bergmann, Kees Cook, Kuniyuki Iwashima, Lennart Poettering,
	Luca Boccassi, linux-arch

On Thu, Apr 13, 2023 at 03:33:54PM +0200, Alexander Mikhalitsyn wrote:
> Add SO_PEERPIDFD which allows to get pidfd of peer socket holder pidfd.
> This thing is direct analog of SO_PEERCRED which allows to get plain PID.
> 
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Cc: Leon Romanovsky <leon@kernel.org>
> Cc: David Ahern <dsahern@kernel.org>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
> Cc: Lennart Poettering <mzxreary@0pointer.de>
> Cc: Luca Boccassi <bluca@debian.org>
> Cc: linux-kernel@vger.kernel.org
> Cc: netdev@vger.kernel.org
> Cc: linux-arch@vger.kernel.org
> Tested-by: Luca Boccassi <bluca@debian.org>
> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> ---
> v4:
> 	- return -ESRCH if sk->sk_peer_pid is NULL from getsockopt() syscall
> 	- return errors from pidfd_prepare() as it is from getsockopt() syscall

I see no obvious issues anymore,
Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: handling unsupported optlen in cgroup bpf getsockopt: (was [PATCH net-next v4 2/4] net: socket: add sockopts blacklist for BPF cgroup hook)
  2023-04-18  1:03           ` handling unsupported optlen in cgroup bpf getsockopt: (was [PATCH net-next v4 2/4] net: socket: add sockopts blacklist for BPF cgroup hook) Martin KaFai Lau
@ 2023-04-18 16:47             ` Stanislav Fomichev
  2023-04-25 17:59               ` Kui-Feng Lee
  0 siblings, 1 reply; 27+ messages in thread
From: Stanislav Fomichev @ 2023-04-18 16:47 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: Eric Dumazet, davem, linux-kernel, netdev, daniel,
	Jakub Kicinski, Paolo Abeni, Leon Romanovsky, David Ahern,
	Arnd Bergmann, Kees Cook, Christian Brauner, Kuniyuki Iwashima,
	Lennart Poettering, linux-arch, Aleksandr Mikhalitsyn, bpf

On 04/17, Martin KaFai Lau wrote:
> On 4/14/23 6:55 PM, Stanislav Fomichev wrote:
> > On 04/13, Stanislav Fomichev wrote:
> > > On Thu, Apr 13, 2023 at 7:38 AM Aleksandr Mikhalitsyn
> > > <aleksandr.mikhalitsyn@canonical.com> wrote:
> > > > 
> > > > On Thu, Apr 13, 2023 at 4:22 PM Eric Dumazet <edumazet@google.com> wrote:
> > > > > 
> > > > > On Thu, Apr 13, 2023 at 3:35 PM Alexander Mikhalitsyn
> > > > > <aleksandr.mikhalitsyn@canonical.com> wrote:
> > > > > > 
> > > > > > During work on SO_PEERPIDFD, it was discovered (thanks to Christian),
> > > > > > that bpf cgroup hook can cause FD leaks when used with sockopts which
> > > > > > install FDs into the process fdtable.
> > > > > > 
> > > > > > After some offlist discussion it was proposed to add a blacklist of
> > > > > 
> > > > > We try to replace this word by either denylist or blocklist, even in changelogs.
> > > > 
> > > > Hi Eric,
> > > > 
> > > > Oh, I'm sorry about that. :( Sure.
> > > > 
> > > > > 
> > > > > > socket options those can cause troubles when BPF cgroup hook is enabled.
> > > > > > 
> > > > > 
> > > > > Can we find the appropriate Fixes: tag to help stable teams ?
> > > > 
> > > > Sure, I will add next time.
> > > > 
> > > > Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
> > > > 
> > > > I think it's better to add Stanislav Fomichev to CC.
> > > 
> > > Can we use 'struct proto' bpf_bypass_getsockopt instead? We already
> > > use it for tcp zerocopy, I'm assuming it should work in this case as
> > > well?
> > 
> > Jakub reminded me of the other things I wanted to ask here bug forgot:
> > 
> > - setsockopt is probably not needed, right? setsockopt hook triggers
> >    before the kernel and shouldn't leak anything
> > - for getsockopt, instead of bypassing bpf completely, should we instead
> >    ignore the error from the bpf program? that would still preserve
> >    the observability aspect
> 
> stealing this thread to discuss the optlen issue which may make sense to
> bypass also.
> 
> There has been issue with optlen. Other than this older post related to
> optlen > PAGE_SIZE:
> https://lore.kernel.org/bpf/5c8b7d59-1f28-2284-f7b9-49d946f2e982@linux.dev/,
> the recent one related to optlen that we have seen is
> NETLINK_LIST_MEMBERSHIPS. The userspace passed in optlen == 0 and the kernel
> put the expected optlen (> 0) and 'return 0;' to userspace. The userspace
> intention is to learn the expected optlen. This makes 'ctx.optlen >
> max_optlen' and __cgroup_bpf_run_filter_getsockopt() ends up returning
> -EFAULT to the userspace even the bpf prog has not changed anything.

(ignoring -EFAULT issue) this seems like it needs to be

	if (optval && (ctx.optlen > max_optlen || ctx.optlen < 0)) {
		/* error */
	}

?

> Does it make sense to also bypass the bpf prog when 'ctx.optlen >
> max_optlen' for now (and this can use a separate patch which as usual
> requires a bpf selftests)?

Yeah, makes sense. Replacing this -EFAULT with WARN_ON_ONCE or something
seems like the way to go. It caused too much trouble already :-(

Should I prepare a patch or do you want to take a stab at it?

> In the future, does it make sense to have a specific cgroup-bpf-prog (a
> specific attach type?) that only uses bpf_dynptr kfunc to access the optval
> such that it can enforce read-only for some optname and potentially also
> track if bpf-prog has written a new optval? The bpf-prog can only return 1
> (OK) and only allows using bpf_set_retval() instead. Likely there is still
> holes but could be a seed of thought to continue polishing the idea.

Ack, let's think about it.

Maybe we should re-evaluate 'getsockopt-happens-after-the-kernel' idea
as well? If we can have a sleepable hook that can copy_from_user/copy_to_user,
and we have a mostly working bpf_getsockopt (after your refactoring),
I don't see why we need to continue the current scheme of triggering
after the kernel?

> > - or maybe we can even have a per-proto bpf_getsockopt_cleanup call that
> >    gets called whenever bpf returns an error to make sure protocols have
> >    a chance to handle that condition (and free the fd)
> > 
> 
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: handling unsupported optlen in cgroup bpf getsockopt: (was [PATCH net-next v4 2/4] net: socket: add sockopts blacklist for BPF cgroup hook)
  2023-04-18 16:47             ` Stanislav Fomichev
@ 2023-04-25 17:59               ` Kui-Feng Lee
  2023-04-25 18:42                 ` Stanislav Fomichev
  0 siblings, 1 reply; 27+ messages in thread
From: Kui-Feng Lee @ 2023-04-25 17:59 UTC (permalink / raw)
  To: Stanislav Fomichev, Martin KaFai Lau
  Cc: Eric Dumazet, davem, linux-kernel, netdev, daniel,
	Jakub Kicinski, Paolo Abeni, Leon Romanovsky, David Ahern,
	Arnd Bergmann, Kees Cook, Christian Brauner, Kuniyuki Iwashima,
	Lennart Poettering, linux-arch, Aleksandr Mikhalitsyn, bpf



On 4/18/23 09:47, Stanislav Fomichev wrote:
> On 04/17, Martin KaFai Lau wrote:
>> On 4/14/23 6:55 PM, Stanislav Fomichev wrote:
>>> On 04/13, Stanislav Fomichev wrote:
>>>> On Thu, Apr 13, 2023 at 7:38 AM Aleksandr Mikhalitsyn
>>>> <aleksandr.mikhalitsyn@canonical.com> wrote:
>>>>>
>>>>> On Thu, Apr 13, 2023 at 4:22 PM Eric Dumazet <edumazet@google.com> wrote:
>>>>>>
>>>>>> On Thu, Apr 13, 2023 at 3:35 PM Alexander Mikhalitsyn
>>>>>> <aleksandr.mikhalitsyn@canonical.com> wrote:
>>>>>>>
>>>>>>> During work on SO_PEERPIDFD, it was discovered (thanks to Christian),
>>>>>>> that bpf cgroup hook can cause FD leaks when used with sockopts which
>>>>>>> install FDs into the process fdtable.
>>>>>>>
>>>>>>> After some offlist discussion it was proposed to add a blacklist of
>>>>>>
>>>>>> We try to replace this word by either denylist or blocklist, even in changelogs.
>>>>>
>>>>> Hi Eric,
>>>>>
>>>>> Oh, I'm sorry about that. :( Sure.
>>>>>
>>>>>>
>>>>>>> socket options those can cause troubles when BPF cgroup hook is enabled.
>>>>>>>
>>>>>>
>>>>>> Can we find the appropriate Fixes: tag to help stable teams ?
>>>>>
>>>>> Sure, I will add next time.
>>>>>
>>>>> Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
>>>>>
>>>>> I think it's better to add Stanislav Fomichev to CC.
>>>>
>>>> Can we use 'struct proto' bpf_bypass_getsockopt instead? We already
>>>> use it for tcp zerocopy, I'm assuming it should work in this case as
>>>> well?
>>>
>>> Jakub reminded me of the other things I wanted to ask here bug forgot:
>>>
>>> - setsockopt is probably not needed, right? setsockopt hook triggers
>>>     before the kernel and shouldn't leak anything
>>> - for getsockopt, instead of bypassing bpf completely, should we instead
>>>     ignore the error from the bpf program? that would still preserve
>>>     the observability aspect
>>
>> stealing this thread to discuss the optlen issue which may make sense to
>> bypass also.
>>
>> There has been issue with optlen. Other than this older post related to
>> optlen > PAGE_SIZE:
>> https://lore.kernel.org/bpf/5c8b7d59-1f28-2284-f7b9-49d946f2e982@linux.dev/,
>> the recent one related to optlen that we have seen is
>> NETLINK_LIST_MEMBERSHIPS. The userspace passed in optlen == 0 and the kernel
>> put the expected optlen (> 0) and 'return 0;' to userspace. The userspace
>> intention is to learn the expected optlen. This makes 'ctx.optlen >
>> max_optlen' and __cgroup_bpf_run_filter_getsockopt() ends up returning
>> -EFAULT to the userspace even the bpf prog has not changed anything.
> 
> (ignoring -EFAULT issue) this seems like it needs to be
> 
> 	if (optval && (ctx.optlen > max_optlen || ctx.optlen < 0)) {
> 		/* error */
> 	}
> 
> ?
> 
>> Does it make sense to also bypass the bpf prog when 'ctx.optlen >
>> max_optlen' for now (and this can use a separate patch which as usual
>> requires a bpf selftests)?
> 
> Yeah, makes sense. Replacing this -EFAULT with WARN_ON_ONCE or something
> seems like the way to go. It caused too much trouble already :-(
> 
> Should I prepare a patch or do you want to take a stab at it?
> 
>> In the future, does it make sense to have a specific cgroup-bpf-prog (a
>> specific attach type?) that only uses bpf_dynptr kfunc to access the optval
>> such that it can enforce read-only for some optname and potentially also
>> track if bpf-prog has written a new optval? The bpf-prog can only return 1
>> (OK) and only allows using bpf_set_retval() instead. Likely there is still
>> holes but could be a seed of thought to continue polishing the idea.
> 
> Ack, let's think about it.
> 
> Maybe we should re-evaluate 'getsockopt-happens-after-the-kernel' idea
> as well? If we can have a sleepable hook that can copy_from_user/copy_to_user,
> and we have a mostly working bpf_getsockopt (after your refactoring),
> I don't see why we need to continue the current scheme of triggering
> after the kernel?

Since a sleepable hook would cause some restrictions, perhaps, we could 
introduce something like the promise pattern.  In our case here, BPF 
program call an async version of copy_from_user()/copy_to_user() to 
return a promise.

> 
>>> - or maybe we can even have a per-proto bpf_getsockopt_cleanup call that
>>>     gets called whenever bpf returns an error to make sure protocols have
>>>     a chance to handle that condition (and free the fd)
>>>
>>
>>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: handling unsupported optlen in cgroup bpf getsockopt: (was [PATCH net-next v4 2/4] net: socket: add sockopts blacklist for BPF cgroup hook)
  2023-04-25 17:59               ` Kui-Feng Lee
@ 2023-04-25 18:42                 ` Stanislav Fomichev
  2023-04-25 21:11                   ` Kui-Feng Lee
  2023-04-25 21:28                   ` Kui-Feng Lee
  0 siblings, 2 replies; 27+ messages in thread
From: Stanislav Fomichev @ 2023-04-25 18:42 UTC (permalink / raw)
  To: Kui-Feng Lee
  Cc: Martin KaFai Lau, Eric Dumazet, davem, linux-kernel, netdev,
	daniel, Jakub Kicinski, Paolo Abeni, Leon Romanovsky,
	David Ahern, Arnd Bergmann, Kees Cook, Christian Brauner,
	Kuniyuki Iwashima, Lennart Poettering, linux-arch,
	Aleksandr Mikhalitsyn, bpf

On Tue, Apr 25, 2023 at 10:59 AM Kui-Feng Lee <sinquersw@gmail.com> wrote:
>
>
>
> On 4/18/23 09:47, Stanislav Fomichev wrote:
> > On 04/17, Martin KaFai Lau wrote:
> >> On 4/14/23 6:55 PM, Stanislav Fomichev wrote:
> >>> On 04/13, Stanislav Fomichev wrote:
> >>>> On Thu, Apr 13, 2023 at 7:38 AM Aleksandr Mikhalitsyn
> >>>> <aleksandr.mikhalitsyn@canonical.com> wrote:
> >>>>>
> >>>>> On Thu, Apr 13, 2023 at 4:22 PM Eric Dumazet <edumazet@google.com> wrote:
> >>>>>>
> >>>>>> On Thu, Apr 13, 2023 at 3:35 PM Alexander Mikhalitsyn
> >>>>>> <aleksandr.mikhalitsyn@canonical.com> wrote:
> >>>>>>>
> >>>>>>> During work on SO_PEERPIDFD, it was discovered (thanks to Christian),
> >>>>>>> that bpf cgroup hook can cause FD leaks when used with sockopts which
> >>>>>>> install FDs into the process fdtable.
> >>>>>>>
> >>>>>>> After some offlist discussion it was proposed to add a blacklist of
> >>>>>>
> >>>>>> We try to replace this word by either denylist or blocklist, even in changelogs.
> >>>>>
> >>>>> Hi Eric,
> >>>>>
> >>>>> Oh, I'm sorry about that. :( Sure.
> >>>>>
> >>>>>>
> >>>>>>> socket options those can cause troubles when BPF cgroup hook is enabled.
> >>>>>>>
> >>>>>>
> >>>>>> Can we find the appropriate Fixes: tag to help stable teams ?
> >>>>>
> >>>>> Sure, I will add next time.
> >>>>>
> >>>>> Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
> >>>>>
> >>>>> I think it's better to add Stanislav Fomichev to CC.
> >>>>
> >>>> Can we use 'struct proto' bpf_bypass_getsockopt instead? We already
> >>>> use it for tcp zerocopy, I'm assuming it should work in this case as
> >>>> well?
> >>>
> >>> Jakub reminded me of the other things I wanted to ask here bug forgot:
> >>>
> >>> - setsockopt is probably not needed, right? setsockopt hook triggers
> >>>     before the kernel and shouldn't leak anything
> >>> - for getsockopt, instead of bypassing bpf completely, should we instead
> >>>     ignore the error from the bpf program? that would still preserve
> >>>     the observability aspect
> >>
> >> stealing this thread to discuss the optlen issue which may make sense to
> >> bypass also.
> >>
> >> There has been issue with optlen. Other than this older post related to
> >> optlen > PAGE_SIZE:
> >> https://lore.kernel.org/bpf/5c8b7d59-1f28-2284-f7b9-49d946f2e982@linux.dev/,
> >> the recent one related to optlen that we have seen is
> >> NETLINK_LIST_MEMBERSHIPS. The userspace passed in optlen == 0 and the kernel
> >> put the expected optlen (> 0) and 'return 0;' to userspace. The userspace
> >> intention is to learn the expected optlen. This makes 'ctx.optlen >
> >> max_optlen' and __cgroup_bpf_run_filter_getsockopt() ends up returning
> >> -EFAULT to the userspace even the bpf prog has not changed anything.
> >
> > (ignoring -EFAULT issue) this seems like it needs to be
> >
> >       if (optval && (ctx.optlen > max_optlen || ctx.optlen < 0)) {
> >               /* error */
> >       }
> >
> > ?
> >
> >> Does it make sense to also bypass the bpf prog when 'ctx.optlen >
> >> max_optlen' for now (and this can use a separate patch which as usual
> >> requires a bpf selftests)?
> >
> > Yeah, makes sense. Replacing this -EFAULT with WARN_ON_ONCE or something
> > seems like the way to go. It caused too much trouble already :-(
> >
> > Should I prepare a patch or do you want to take a stab at it?
> >
> >> In the future, does it make sense to have a specific cgroup-bpf-prog (a
> >> specific attach type?) that only uses bpf_dynptr kfunc to access the optval
> >> such that it can enforce read-only for some optname and potentially also
> >> track if bpf-prog has written a new optval? The bpf-prog can only return 1
> >> (OK) and only allows using bpf_set_retval() instead. Likely there is still
> >> holes but could be a seed of thought to continue polishing the idea.
> >
> > Ack, let's think about it.
> >
> > Maybe we should re-evaluate 'getsockopt-happens-after-the-kernel' idea
> > as well? If we can have a sleepable hook that can copy_from_user/copy_to_user,
> > and we have a mostly working bpf_getsockopt (after your refactoring),
> > I don't see why we need to continue the current scheme of triggering
> > after the kernel?
>
> Since a sleepable hook would cause some restrictions, perhaps, we could
> introduce something like the promise pattern.  In our case here, BPF
> program call an async version of copy_from_user()/copy_to_user() to
> return a promise.

Having a promise might work. This is essentially what we already do
with sockets/etc with acquire/release pattern.

What are the sleepable restrictions you're hinting about? I feel like
with the sleepable bpf, we can also remove all the temporary buffer
management / extra copies which sounds like a win to me. (we have this
ugly heuristics with BPF_SOCKOPT_KERN_BUF_SIZE) The program can
allocate temporary buffers if needed..

> >>> - or maybe we can even have a per-proto bpf_getsockopt_cleanup call that
> >>>     gets called whenever bpf returns an error to make sure protocols have
> >>>     a chance to handle that condition (and free the fd)
> >>>
> >>
> >>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: handling unsupported optlen in cgroup bpf getsockopt: (was [PATCH net-next v4 2/4] net: socket: add sockopts blacklist for BPF cgroup hook)
  2023-04-25 18:42                 ` Stanislav Fomichev
@ 2023-04-25 21:11                   ` Kui-Feng Lee
  2023-04-26 15:50                     ` Stanislav Fomichev
  2023-04-25 21:28                   ` Kui-Feng Lee
  1 sibling, 1 reply; 27+ messages in thread
From: Kui-Feng Lee @ 2023-04-25 21:11 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Martin KaFai Lau, Eric Dumazet, davem, linux-kernel, netdev,
	daniel, Jakub Kicinski, Paolo Abeni, Leon Romanovsky,
	David Ahern, Arnd Bergmann, Kees Cook, Christian Brauner,
	Kuniyuki Iwashima, Lennart Poettering, linux-arch,
	Aleksandr Mikhalitsyn, bpf



On 4/25/23 11:42, Stanislav Fomichev wrote:
> On Tue, Apr 25, 2023 at 10:59 AM Kui-Feng Lee <sinquersw@gmail.com> wrote:
>>
>>
>>
>> On 4/18/23 09:47, Stanislav Fomichev wrote:
>>> On 04/17, Martin KaFai Lau wrote:
>>>> On 4/14/23 6:55 PM, Stanislav Fomichev wrote:
>>>>> On 04/13, Stanislav Fomichev wrote:
>>>>>> On Thu, Apr 13, 2023 at 7:38 AM Aleksandr Mikhalitsyn
>>>>>> <aleksandr.mikhalitsyn@canonical.com> wrote:
>>>>>>>
>>>>>>> On Thu, Apr 13, 2023 at 4:22 PM Eric Dumazet <edumazet@google.com> wrote:
>>>>>>>>
>>>>>>>> On Thu, Apr 13, 2023 at 3:35 PM Alexander Mikhalitsyn
>>>>>>>> <aleksandr.mikhalitsyn@canonical.com> wrote:
>>>>>>>>>
>>>>>>>>> During work on SO_PEERPIDFD, it was discovered (thanks to Christian),
>>>>>>>>> that bpf cgroup hook can cause FD leaks when used with sockopts which
>>>>>>>>> install FDs into the process fdtable.
>>>>>>>>>
>>>>>>>>> After some offlist discussion it was proposed to add a blacklist of
>>>>>>>>
>>>>>>>> We try to replace this word by either denylist or blocklist, even in changelogs.
>>>>>>>
>>>>>>> Hi Eric,
>>>>>>>
>>>>>>> Oh, I'm sorry about that. :( Sure.
>>>>>>>
>>>>>>>>
>>>>>>>>> socket options those can cause troubles when BPF cgroup hook is enabled.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Can we find the appropriate Fixes: tag to help stable teams ?
>>>>>>>
>>>>>>> Sure, I will add next time.
>>>>>>>
>>>>>>> Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
>>>>>>>
>>>>>>> I think it's better to add Stanislav Fomichev to CC.
>>>>>>
>>>>>> Can we use 'struct proto' bpf_bypass_getsockopt instead? We already
>>>>>> use it for tcp zerocopy, I'm assuming it should work in this case as
>>>>>> well?
>>>>>
>>>>> Jakub reminded me of the other things I wanted to ask here bug forgot:
>>>>>
>>>>> - setsockopt is probably not needed, right? setsockopt hook triggers
>>>>>      before the kernel and shouldn't leak anything
>>>>> - for getsockopt, instead of bypassing bpf completely, should we instead
>>>>>      ignore the error from the bpf program? that would still preserve
>>>>>      the observability aspect
>>>>
>>>> stealing this thread to discuss the optlen issue which may make sense to
>>>> bypass also.
>>>>
>>>> There has been issue with optlen. Other than this older post related to
>>>> optlen > PAGE_SIZE:
>>>> https://lore.kernel.org/bpf/5c8b7d59-1f28-2284-f7b9-49d946f2e982@linux.dev/,
>>>> the recent one related to optlen that we have seen is
>>>> NETLINK_LIST_MEMBERSHIPS. The userspace passed in optlen == 0 and the kernel
>>>> put the expected optlen (> 0) and 'return 0;' to userspace. The userspace
>>>> intention is to learn the expected optlen. This makes 'ctx.optlen >
>>>> max_optlen' and __cgroup_bpf_run_filter_getsockopt() ends up returning
>>>> -EFAULT to the userspace even the bpf prog has not changed anything.
>>>
>>> (ignoring -EFAULT issue) this seems like it needs to be
>>>
>>>        if (optval && (ctx.optlen > max_optlen || ctx.optlen < 0)) {
>>>                /* error */
>>>        }
>>>
>>> ?
>>>
>>>> Does it make sense to also bypass the bpf prog when 'ctx.optlen >
>>>> max_optlen' for now (and this can use a separate patch which as usual
>>>> requires a bpf selftests)?
>>>
>>> Yeah, makes sense. Replacing this -EFAULT with WARN_ON_ONCE or something
>>> seems like the way to go. It caused too much trouble already :-(
>>>
>>> Should I prepare a patch or do you want to take a stab at it?
>>>
>>>> In the future, does it make sense to have a specific cgroup-bpf-prog (a
>>>> specific attach type?) that only uses bpf_dynptr kfunc to access the optval
>>>> such that it can enforce read-only for some optname and potentially also
>>>> track if bpf-prog has written a new optval? The bpf-prog can only return 1
>>>> (OK) and only allows using bpf_set_retval() instead. Likely there is still
>>>> holes but could be a seed of thought to continue polishing the idea.
>>>
>>> Ack, let's think about it.
>>>
>>> Maybe we should re-evaluate 'getsockopt-happens-after-the-kernel' idea
>>> as well? If we can have a sleepable hook that can copy_from_user/copy_to_user,
>>> and we have a mostly working bpf_getsockopt (after your refactoring),
>>> I don't see why we need to continue the current scheme of triggering
>>> after the kernel?
>>
>> Since a sleepable hook would cause some restrictions, perhaps, we could
>> introduce something like the promise pattern.  In our case here, BPF
>> program call an async version of copy_from_user()/copy_to_user() to
>> return a promise.
> 
> Having a promise might work. This is essentially what we already do
> with sockets/etc with acquire/release pattern.
> 
> What are the sleepable restrictions you're hinting about? I feel like
AFAIK, a sleepable program can use only some types of maps; for example,
array, hash, ringbuf,... etc.  Other types of maps are unavailable to
sleepable programs; for example, sockmap, sockhash.

> with the sleepable bpf, we can also remove all the temporary buffer
> management / extra copies which sounds like a win to me. (we have this
> ugly heuristics with BPF_SOCKOPT_KERN_BUF_SIZE) The program can
> allocate temporary buffers if needed..
Agree!


> 
>>>>> - or maybe we can even have a per-proto bpf_getsockopt_cleanup call that
>>>>>      gets called whenever bpf returns an error to make sure protocols have
>>>>>      a chance to handle that condition (and free the fd)
>>>>>
>>>>
>>>>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: handling unsupported optlen in cgroup bpf getsockopt: (was [PATCH net-next v4 2/4] net: socket: add sockopts blacklist for BPF cgroup hook)
  2023-04-25 18:42                 ` Stanislav Fomichev
  2023-04-25 21:11                   ` Kui-Feng Lee
@ 2023-04-25 21:28                   ` Kui-Feng Lee
  2023-04-26 15:50                     ` Stanislav Fomichev
  1 sibling, 1 reply; 27+ messages in thread
From: Kui-Feng Lee @ 2023-04-25 21:28 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Martin KaFai Lau, Eric Dumazet, davem, linux-kernel, netdev,
	daniel, Jakub Kicinski, Paolo Abeni, Leon Romanovsky,
	David Ahern, Arnd Bergmann, Kees Cook, Christian Brauner,
	Kuniyuki Iwashima, Lennart Poettering, linux-arch,
	Aleksandr Mikhalitsyn, bpf



On 4/25/23 11:42, Stanislav Fomichev wrote:
> On Tue, Apr 25, 2023 at 10:59 AM Kui-Feng Lee <sinquersw@gmail.com> wrote:
>>
>>
>>
>> On 4/18/23 09:47, Stanislav Fomichev wrote:
>>> On 04/17, Martin KaFai Lau wrote:
>>>> On 4/14/23 6:55 PM, Stanislav Fomichev wrote:
>>>>> On 04/13, Stanislav Fomichev wrote:
>>>>>> On Thu, Apr 13, 2023 at 7:38 AM Aleksandr Mikhalitsyn
>>>>>> <aleksandr.mikhalitsyn@canonical.com> wrote:
>>>>>>>
>>>>>>> On Thu, Apr 13, 2023 at 4:22 PM Eric Dumazet <edumazet@google.com> wrote:
>>>>>>>>
>>>>>>>> On Thu, Apr 13, 2023 at 3:35 PM Alexander Mikhalitsyn
>>>>>>>> <aleksandr.mikhalitsyn@canonical.com> wrote:
>>>>>>>>>
>>>>>>>>> During work on SO_PEERPIDFD, it was discovered (thanks to Christian),
>>>>>>>>> that bpf cgroup hook can cause FD leaks when used with sockopts which
>>>>>>>>> install FDs into the process fdtable.
>>>>>>>>>
>>>>>>>>> After some offlist discussion it was proposed to add a blacklist of
>>>>>>>>
>>>>>>>> We try to replace this word by either denylist or blocklist, even in changelogs.
>>>>>>>
>>>>>>> Hi Eric,
>>>>>>>
>>>>>>> Oh, I'm sorry about that. :( Sure.
>>>>>>>
>>>>>>>>
>>>>>>>>> socket options those can cause troubles when BPF cgroup hook is enabled.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Can we find the appropriate Fixes: tag to help stable teams ?
>>>>>>>
>>>>>>> Sure, I will add next time.
>>>>>>>
>>>>>>> Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
>>>>>>>
>>>>>>> I think it's better to add Stanislav Fomichev to CC.
>>>>>>
>>>>>> Can we use 'struct proto' bpf_bypass_getsockopt instead? We already
>>>>>> use it for tcp zerocopy, I'm assuming it should work in this case as
>>>>>> well?
>>>>>
>>>>> Jakub reminded me of the other things I wanted to ask here bug forgot:
>>>>>
>>>>> - setsockopt is probably not needed, right? setsockopt hook triggers
>>>>>      before the kernel and shouldn't leak anything
>>>>> - for getsockopt, instead of bypassing bpf completely, should we instead
>>>>>      ignore the error from the bpf program? that would still preserve
>>>>>      the observability aspect
>>>>
>>>> stealing this thread to discuss the optlen issue which may make sense to
>>>> bypass also.
>>>>
>>>> There has been issue with optlen. Other than this older post related to
>>>> optlen > PAGE_SIZE:
>>>> https://lore.kernel.org/bpf/5c8b7d59-1f28-2284-f7b9-49d946f2e982@linux.dev/,
>>>> the recent one related to optlen that we have seen is
>>>> NETLINK_LIST_MEMBERSHIPS. The userspace passed in optlen == 0 and the kernel
>>>> put the expected optlen (> 0) and 'return 0;' to userspace. The userspace
>>>> intention is to learn the expected optlen. This makes 'ctx.optlen >
>>>> max_optlen' and __cgroup_bpf_run_filter_getsockopt() ends up returning
>>>> -EFAULT to the userspace even the bpf prog has not changed anything.
>>>
>>> (ignoring -EFAULT issue) this seems like it needs to be
>>>
>>>        if (optval && (ctx.optlen > max_optlen || ctx.optlen < 0)) {
>>>                /* error */
>>>        }
>>>
>>> ?
>>>
>>>> Does it make sense to also bypass the bpf prog when 'ctx.optlen >
>>>> max_optlen' for now (and this can use a separate patch which as usual
>>>> requires a bpf selftests)?
>>>
>>> Yeah, makes sense. Replacing this -EFAULT with WARN_ON_ONCE or something
>>> seems like the way to go. It caused too much trouble already :-(
>>>
>>> Should I prepare a patch or do you want to take a stab at it?
>>>
>>>> In the future, does it make sense to have a specific cgroup-bpf-prog (a
>>>> specific attach type?) that only uses bpf_dynptr kfunc to access the optval
>>>> such that it can enforce read-only for some optname and potentially also
>>>> track if bpf-prog has written a new optval? The bpf-prog can only return 1
>>>> (OK) and only allows using bpf_set_retval() instead. Likely there is still
>>>> holes but could be a seed of thought to continue polishing the idea.
>>>
>>> Ack, let's think about it.
>>>
>>> Maybe we should re-evaluate 'getsockopt-happens-after-the-kernel' idea
>>> as well? If we can have a sleepable hook that can copy_from_user/copy_to_user,
>>> and we have a mostly working bpf_getsockopt (after your refactoring),
>>> I don't see why we need to continue the current scheme of triggering
>>> after the kernel?
>>
>> Since a sleepable hook would cause some restrictions, perhaps, we could
>> introduce something like the promise pattern.  In our case here, BPF
>> program call an async version of copy_from_user()/copy_to_user() to
>> return a promise.
> 
> Having a promise might work. This is essentially what we already do
> with sockets/etc with acquire/release pattern.

Would you mind to give me some context of the socket things?

> 
> What are the sleepable restrictions you're hinting about? I feel like
> with the sleepable bpf, we can also remove all the temporary buffer
> management / extra copies which sounds like a win to me. (we have this
> ugly heuristics with BPF_SOCKOPT_KERN_BUF_SIZE) The program can
> allocate temporary buffers if needed..
> 
>>>>> - or maybe we can even have a per-proto bpf_getsockopt_cleanup call that
>>>>>      gets called whenever bpf returns an error to make sure protocols have
>>>>>      a chance to handle that condition (and free the fd)
>>>>>
>>>>
>>>>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: handling unsupported optlen in cgroup bpf getsockopt: (was [PATCH net-next v4 2/4] net: socket: add sockopts blacklist for BPF cgroup hook)
  2023-04-25 21:28                   ` Kui-Feng Lee
@ 2023-04-26 15:50                     ` Stanislav Fomichev
  0 siblings, 0 replies; 27+ messages in thread
From: Stanislav Fomichev @ 2023-04-26 15:50 UTC (permalink / raw)
  To: Kui-Feng Lee
  Cc: Martin KaFai Lau, Eric Dumazet, davem, linux-kernel, netdev,
	daniel, Jakub Kicinski, Paolo Abeni, Leon Romanovsky,
	David Ahern, Arnd Bergmann, Kees Cook, Christian Brauner,
	Kuniyuki Iwashima, Lennart Poettering, linux-arch,
	Aleksandr Mikhalitsyn, bpf

On Tue, Apr 25, 2023 at 2:28 PM Kui-Feng Lee <sinquersw@gmail.com> wrote:
>
>
>
> On 4/25/23 11:42, Stanislav Fomichev wrote:
> > On Tue, Apr 25, 2023 at 10:59 AM Kui-Feng Lee <sinquersw@gmail.com> wrote:
> >>
> >>
> >>
> >> On 4/18/23 09:47, Stanislav Fomichev wrote:
> >>> On 04/17, Martin KaFai Lau wrote:
> >>>> On 4/14/23 6:55 PM, Stanislav Fomichev wrote:
> >>>>> On 04/13, Stanislav Fomichev wrote:
> >>>>>> On Thu, Apr 13, 2023 at 7:38 AM Aleksandr Mikhalitsyn
> >>>>>> <aleksandr.mikhalitsyn@canonical.com> wrote:
> >>>>>>>
> >>>>>>> On Thu, Apr 13, 2023 at 4:22 PM Eric Dumazet <edumazet@google.com> wrote:
> >>>>>>>>
> >>>>>>>> On Thu, Apr 13, 2023 at 3:35 PM Alexander Mikhalitsyn
> >>>>>>>> <aleksandr.mikhalitsyn@canonical.com> wrote:
> >>>>>>>>>
> >>>>>>>>> During work on SO_PEERPIDFD, it was discovered (thanks to Christian),
> >>>>>>>>> that bpf cgroup hook can cause FD leaks when used with sockopts which
> >>>>>>>>> install FDs into the process fdtable.
> >>>>>>>>>
> >>>>>>>>> After some offlist discussion it was proposed to add a blacklist of
> >>>>>>>>
> >>>>>>>> We try to replace this word by either denylist or blocklist, even in changelogs.
> >>>>>>>
> >>>>>>> Hi Eric,
> >>>>>>>
> >>>>>>> Oh, I'm sorry about that. :( Sure.
> >>>>>>>
> >>>>>>>>
> >>>>>>>>> socket options those can cause troubles when BPF cgroup hook is enabled.
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> Can we find the appropriate Fixes: tag to help stable teams ?
> >>>>>>>
> >>>>>>> Sure, I will add next time.
> >>>>>>>
> >>>>>>> Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
> >>>>>>>
> >>>>>>> I think it's better to add Stanislav Fomichev to CC.
> >>>>>>
> >>>>>> Can we use 'struct proto' bpf_bypass_getsockopt instead? We already
> >>>>>> use it for tcp zerocopy, I'm assuming it should work in this case as
> >>>>>> well?
> >>>>>
> >>>>> Jakub reminded me of the other things I wanted to ask here bug forgot:
> >>>>>
> >>>>> - setsockopt is probably not needed, right? setsockopt hook triggers
> >>>>>      before the kernel and shouldn't leak anything
> >>>>> - for getsockopt, instead of bypassing bpf completely, should we instead
> >>>>>      ignore the error from the bpf program? that would still preserve
> >>>>>      the observability aspect
> >>>>
> >>>> stealing this thread to discuss the optlen issue which may make sense to
> >>>> bypass also.
> >>>>
> >>>> There has been issue with optlen. Other than this older post related to
> >>>> optlen > PAGE_SIZE:
> >>>> https://lore.kernel.org/bpf/5c8b7d59-1f28-2284-f7b9-49d946f2e982@linux.dev/,
> >>>> the recent one related to optlen that we have seen is
> >>>> NETLINK_LIST_MEMBERSHIPS. The userspace passed in optlen == 0 and the kernel
> >>>> put the expected optlen (> 0) and 'return 0;' to userspace. The userspace
> >>>> intention is to learn the expected optlen. This makes 'ctx.optlen >
> >>>> max_optlen' and __cgroup_bpf_run_filter_getsockopt() ends up returning
> >>>> -EFAULT to the userspace even the bpf prog has not changed anything.
> >>>
> >>> (ignoring -EFAULT issue) this seems like it needs to be
> >>>
> >>>        if (optval && (ctx.optlen > max_optlen || ctx.optlen < 0)) {
> >>>                /* error */
> >>>        }
> >>>
> >>> ?
> >>>
> >>>> Does it make sense to also bypass the bpf prog when 'ctx.optlen >
> >>>> max_optlen' for now (and this can use a separate patch which as usual
> >>>> requires a bpf selftests)?
> >>>
> >>> Yeah, makes sense. Replacing this -EFAULT with WARN_ON_ONCE or something
> >>> seems like the way to go. It caused too much trouble already :-(
> >>>
> >>> Should I prepare a patch or do you want to take a stab at it?
> >>>
> >>>> In the future, does it make sense to have a specific cgroup-bpf-prog (a
> >>>> specific attach type?) that only uses bpf_dynptr kfunc to access the optval
> >>>> such that it can enforce read-only for some optname and potentially also
> >>>> track if bpf-prog has written a new optval? The bpf-prog can only return 1
> >>>> (OK) and only allows using bpf_set_retval() instead. Likely there is still
> >>>> holes but could be a seed of thought to continue polishing the idea.
> >>>
> >>> Ack, let's think about it.
> >>>
> >>> Maybe we should re-evaluate 'getsockopt-happens-after-the-kernel' idea
> >>> as well? If we can have a sleepable hook that can copy_from_user/copy_to_user,
> >>> and we have a mostly working bpf_getsockopt (after your refactoring),
> >>> I don't see why we need to continue the current scheme of triggering
> >>> after the kernel?
> >>
> >> Since a sleepable hook would cause some restrictions, perhaps, we could
> >> introduce something like the promise pattern.  In our case here, BPF
> >> program call an async version of copy_from_user()/copy_to_user() to
> >> return a promise.
> >
> > Having a promise might work. This is essentially what we already do
> > with sockets/etc with acquire/release pattern.
>
> Would you mind to give me some context of the socket things?

I'm mostly referring to the infra around KF_ACQUIRE/KF_RELEASE where
the verifier already has some resource tracking functionality. We can
probably extend it to verify that the program does copy_to_user
equivalent at the end of the run (or somehow specifically marks that
it's not needed).

> >
> > What are the sleepable restrictions you're hinting about? I feel like
> > with the sleepable bpf, we can also remove all the temporary buffer
> > management / extra copies which sounds like a win to me. (we have this
> > ugly heuristics with BPF_SOCKOPT_KERN_BUF_SIZE) The program can
> > allocate temporary buffers if needed..
> >
> >>>>> - or maybe we can even have a per-proto bpf_getsockopt_cleanup call that
> >>>>>      gets called whenever bpf returns an error to make sure protocols have
> >>>>>      a chance to handle that condition (and free the fd)
> >>>>>
> >>>>
> >>>>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: handling unsupported optlen in cgroup bpf getsockopt: (was [PATCH net-next v4 2/4] net: socket: add sockopts blacklist for BPF cgroup hook)
  2023-04-25 21:11                   ` Kui-Feng Lee
@ 2023-04-26 15:50                     ` Stanislav Fomichev
  0 siblings, 0 replies; 27+ messages in thread
From: Stanislav Fomichev @ 2023-04-26 15:50 UTC (permalink / raw)
  To: Kui-Feng Lee
  Cc: Martin KaFai Lau, Eric Dumazet, davem, linux-kernel, netdev,
	daniel, Jakub Kicinski, Paolo Abeni, Leon Romanovsky,
	David Ahern, Arnd Bergmann, Kees Cook, Christian Brauner,
	Kuniyuki Iwashima, Lennart Poettering, linux-arch,
	Aleksandr Mikhalitsyn, bpf

On Tue, Apr 25, 2023 at 2:11 PM Kui-Feng Lee <sinquersw@gmail.com> wrote:
>
>
>
> On 4/25/23 11:42, Stanislav Fomichev wrote:
> > On Tue, Apr 25, 2023 at 10:59 AM Kui-Feng Lee <sinquersw@gmail.com> wrote:
> >>
> >>
> >>
> >> On 4/18/23 09:47, Stanislav Fomichev wrote:
> >>> On 04/17, Martin KaFai Lau wrote:
> >>>> On 4/14/23 6:55 PM, Stanislav Fomichev wrote:
> >>>>> On 04/13, Stanislav Fomichev wrote:
> >>>>>> On Thu, Apr 13, 2023 at 7:38 AM Aleksandr Mikhalitsyn
> >>>>>> <aleksandr.mikhalitsyn@canonical.com> wrote:
> >>>>>>>
> >>>>>>> On Thu, Apr 13, 2023 at 4:22 PM Eric Dumazet <edumazet@google.com> wrote:
> >>>>>>>>
> >>>>>>>> On Thu, Apr 13, 2023 at 3:35 PM Alexander Mikhalitsyn
> >>>>>>>> <aleksandr.mikhalitsyn@canonical.com> wrote:
> >>>>>>>>>
> >>>>>>>>> During work on SO_PEERPIDFD, it was discovered (thanks to Christian),
> >>>>>>>>> that bpf cgroup hook can cause FD leaks when used with sockopts which
> >>>>>>>>> install FDs into the process fdtable.
> >>>>>>>>>
> >>>>>>>>> After some offlist discussion it was proposed to add a blacklist of
> >>>>>>>>
> >>>>>>>> We try to replace this word by either denylist or blocklist, even in changelogs.
> >>>>>>>
> >>>>>>> Hi Eric,
> >>>>>>>
> >>>>>>> Oh, I'm sorry about that. :( Sure.
> >>>>>>>
> >>>>>>>>
> >>>>>>>>> socket options those can cause troubles when BPF cgroup hook is enabled.
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> Can we find the appropriate Fixes: tag to help stable teams ?
> >>>>>>>
> >>>>>>> Sure, I will add next time.
> >>>>>>>
> >>>>>>> Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
> >>>>>>>
> >>>>>>> I think it's better to add Stanislav Fomichev to CC.
> >>>>>>
> >>>>>> Can we use 'struct proto' bpf_bypass_getsockopt instead? We already
> >>>>>> use it for tcp zerocopy, I'm assuming it should work in this case as
> >>>>>> well?
> >>>>>
> >>>>> Jakub reminded me of the other things I wanted to ask here bug forgot:
> >>>>>
> >>>>> - setsockopt is probably not needed, right? setsockopt hook triggers
> >>>>>      before the kernel and shouldn't leak anything
> >>>>> - for getsockopt, instead of bypassing bpf completely, should we instead
> >>>>>      ignore the error from the bpf program? that would still preserve
> >>>>>      the observability aspect
> >>>>
> >>>> stealing this thread to discuss the optlen issue which may make sense to
> >>>> bypass also.
> >>>>
> >>>> There has been issue with optlen. Other than this older post related to
> >>>> optlen > PAGE_SIZE:
> >>>> https://lore.kernel.org/bpf/5c8b7d59-1f28-2284-f7b9-49d946f2e982@linux.dev/,
> >>>> the recent one related to optlen that we have seen is
> >>>> NETLINK_LIST_MEMBERSHIPS. The userspace passed in optlen == 0 and the kernel
> >>>> put the expected optlen (> 0) and 'return 0;' to userspace. The userspace
> >>>> intention is to learn the expected optlen. This makes 'ctx.optlen >
> >>>> max_optlen' and __cgroup_bpf_run_filter_getsockopt() ends up returning
> >>>> -EFAULT to the userspace even the bpf prog has not changed anything.
> >>>
> >>> (ignoring -EFAULT issue) this seems like it needs to be
> >>>
> >>>        if (optval && (ctx.optlen > max_optlen || ctx.optlen < 0)) {
> >>>                /* error */
> >>>        }
> >>>
> >>> ?
> >>>
> >>>> Does it make sense to also bypass the bpf prog when 'ctx.optlen >
> >>>> max_optlen' for now (and this can use a separate patch which as usual
> >>>> requires a bpf selftests)?
> >>>
> >>> Yeah, makes sense. Replacing this -EFAULT with WARN_ON_ONCE or something
> >>> seems like the way to go. It caused too much trouble already :-(
> >>>
> >>> Should I prepare a patch or do you want to take a stab at it?
> >>>
> >>>> In the future, does it make sense to have a specific cgroup-bpf-prog (a
> >>>> specific attach type?) that only uses bpf_dynptr kfunc to access the optval
> >>>> such that it can enforce read-only for some optname and potentially also
> >>>> track if bpf-prog has written a new optval? The bpf-prog can only return 1
> >>>> (OK) and only allows using bpf_set_retval() instead. Likely there is still
> >>>> holes but could be a seed of thought to continue polishing the idea.
> >>>
> >>> Ack, let's think about it.
> >>>
> >>> Maybe we should re-evaluate 'getsockopt-happens-after-the-kernel' idea
> >>> as well? If we can have a sleepable hook that can copy_from_user/copy_to_user,
> >>> and we have a mostly working bpf_getsockopt (after your refactoring),
> >>> I don't see why we need to continue the current scheme of triggering
> >>> after the kernel?
> >>
> >> Since a sleepable hook would cause some restrictions, perhaps, we could
> >> introduce something like the promise pattern.  In our case here, BPF
> >> program call an async version of copy_from_user()/copy_to_user() to
> >> return a promise.
> >
> > Having a promise might work. This is essentially what we already do
> > with sockets/etc with acquire/release pattern.
> >
> > What are the sleepable restrictions you're hinting about? I feel like
> AFAIK, a sleepable program can use only some types of maps; for example,
> array, hash, ringbuf,... etc.  Other types of maps are unavailable to
> sleepable programs; for example, sockmap, sockhash.

Sure, but it doesn't have to stay that way. (hypothetically) If we
think that sleepable makes sense, we can try to expand the scope.

> > with the sleepable bpf, we can also remove all the temporary buffer
> > management / extra copies which sounds like a win to me. (we have this
> > ugly heuristics with BPF_SOCKOPT_KERN_BUF_SIZE) The program can
> > allocate temporary buffers if needed..
> Agree!
>
>
> >
> >>>>> - or maybe we can even have a per-proto bpf_getsockopt_cleanup call that
> >>>>>      gets called whenever bpf returns an error to make sure protocols have
> >>>>>      a chance to handle that condition (and free the fd)
> >>>>>
> >>>>
> >>>>

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2023-04-26 15:51 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-13 13:33 [PATCH net-next v4 0/4] Add SCM_PIDFD and SO_PEERPIDFD Alexander Mikhalitsyn
2023-04-13 13:33 ` [PATCH net-next v4 1/4] scm: add SO_PASSPIDFD and SCM_PIDFD Alexander Mikhalitsyn
2023-04-17 15:18   ` Christian Brauner
2023-04-17 16:01     ` Aleksandr Mikhalitsyn
2023-04-17 17:16       ` Christian Brauner
2023-04-17 17:43         ` Eric Dumazet
2023-04-18  8:16           ` Christian Brauner
2023-04-18 13:07   ` Christian Brauner
2023-04-13 13:33 ` [PATCH net-next v4 2/4] net: socket: add sockopts blacklist for BPF cgroup hook Alexander Mikhalitsyn
2023-04-13 14:21   ` Eric Dumazet
2023-04-13 14:38     ` Aleksandr Mikhalitsyn
2023-04-13 16:37       ` Stanislav Fomichev
2023-04-15  1:55         ` Stanislav Fomichev
2023-04-17 14:42           ` Christian Brauner
2023-04-17 18:10             ` Stanislav Fomichev
2023-04-18  1:03           ` handling unsupported optlen in cgroup bpf getsockopt: (was [PATCH net-next v4 2/4] net: socket: add sockopts blacklist for BPF cgroup hook) Martin KaFai Lau
2023-04-18 16:47             ` Stanislav Fomichev
2023-04-25 17:59               ` Kui-Feng Lee
2023-04-25 18:42                 ` Stanislav Fomichev
2023-04-25 21:11                   ` Kui-Feng Lee
2023-04-26 15:50                     ` Stanislav Fomichev
2023-04-25 21:28                   ` Kui-Feng Lee
2023-04-26 15:50                     ` Stanislav Fomichev
2023-04-15  0:51       ` [PATCH net-next v4 2/4] net: socket: add sockopts blacklist for BPF cgroup hook Jakub Kicinski
2023-04-13 13:33 ` [PATCH net-next v4 3/4] net: core: add getsockopt SO_PEERPIDFD Alexander Mikhalitsyn
2023-04-18 13:13   ` Christian Brauner
2023-04-13 13:33 ` [PATCH net-next v4 4/4] selftests: net: add SCM_PIDFD / SO_PEERPIDFD test Alexander Mikhalitsyn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).