linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v6 0/7] Add seccomp notifier ioctl that enables adding fds
@ 2020-07-06 20:17 Kees Cook
  2020-07-06 20:17 ` [PATCH v6 1/7] net/scm: Regularize compat handling of scm_detach_fds() Kees Cook
                   ` (6 more replies)
  0 siblings, 7 replies; 26+ messages in thread
From: Kees Cook @ 2020-07-06 20:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, Sargun Dhillon, Christian Brauner, Tycho Andersen,
	David Laight, Christoph Hellwig, David S. Miller, Jakub Kicinski,
	Alexander Viro, Aleksa Sarai, Matt Denton, Jann Horn,
	Chris Palmer, Robert Sesek, Giuseppe Scrivano,
	Greg Kroah-Hartman, Andy Lutomirski, Will Drewry, Shuah Khan,
	netdev, containers, linux-api, linux-fsdevel, linux-kselftest

Hello!

v6:
- fix missing fput()
- API name change: s/fd_install_received/receive_fd/
v5: https://lore.kernel.org/lkml/20200617220327.3731559-1-keescook@chromium.org/

This continues the thread-merge between [1] and [2]. tl;dr: add a way for
a seccomp user_notif process manager to inject files into the managed
process in order to handle emulation of various fd-returning syscalls
across security boundaries. Containers folks and Chrome are in need
of the feature, and investigating this solution uncovered (and fixed)
implementation issues with existing file sending routines.

I intend to carry this in the for-next/seccomp tree, unless someone
has objections. :) Please review and test!

-Kees

[1] https://lore.kernel.org/lkml/20200603011044.7972-1-sargun@sargun.me/
[2] https://lore.kernel.org/lkml/20200610045214.1175600-1-keescook@chromium.org/


Kees Cook (5):
  net/scm: Regularize compat handling of scm_detach_fds()
  fs: Move __scm_install_fd() to __receive_fd()
  fs: Add receive_fd() wrapper for __receive_fd()
  pidfd: Replace open-coded partial receive_fd()
  fs: Expand __receive_fd() to accept existing fd

Sargun Dhillon (2):
  seccomp: Introduce addfd ioctl to seccomp user notifier
  selftests/seccomp: Test SECCOMP_IOCTL_NOTIF_ADDFD

 fs/file.c                                     |  67 +++++
 include/linux/file.h                          |  19 ++
 include/linux/net.h                           |   9 +
 include/uapi/linux/seccomp.h                  |  22 ++
 kernel/pid.c                                  |  13 +-
 kernel/seccomp.c                              | 172 ++++++++++++-
 net/compat.c                                  |  55 ++---
 net/core/scm.c                                |  50 +---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 229 ++++++++++++++++++
 9 files changed, 554 insertions(+), 82 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v6 1/7] net/scm: Regularize compat handling of scm_detach_fds()
  2020-07-06 20:17 [PATCH v6 0/7] Add seccomp notifier ioctl that enables adding fds Kees Cook
@ 2020-07-06 20:17 ` Kees Cook
  2020-07-07 11:41   ` Christian Brauner
  2020-07-06 20:17 ` [PATCH v6 2/7] fs: Move __scm_install_fd() to __receive_fd() Kees Cook
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 26+ messages in thread
From: Kees Cook @ 2020-07-06 20:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, Sargun Dhillon, Christian Brauner, Tycho Andersen,
	David Laight, Christoph Hellwig, David S. Miller, Jakub Kicinski,
	Alexander Viro, Aleksa Sarai, Matt Denton, Jann Horn,
	Chris Palmer, Robert Sesek, Giuseppe Scrivano,
	Greg Kroah-Hartman, Andy Lutomirski, Will Drewry, Shuah Khan,
	netdev, containers, linux-api, linux-fsdevel, linux-kselftest

Duplicate the cleanups from commit 2618d530dd8b ("net/scm: cleanup
scm_detach_fds") into the compat code.

Move the check added in commit 1f466e1f15cf ("net: cleanly handle kernel
vs user buffers for ->msg_control") to before the compat call, even
though it should be impossible for an in-kernel call to also be compat.

Correct the int "flags" argument to unsigned int to match fd_install()
and similar APIs.

Regularize any remaining differences, including a whitespace issue,
a checkpatch warning, and add the check from commit 6900317f5eff ("net,
scm: fix PaX detected msg_controllen overflow in scm_detach_fds") which
fixed an overflow unique to 64-bit. To avoid confusion when comparing
the compat handler to the native handler, just include the same check
in the compat handler.

Fixes: 48a87cc26c13 ("net: netprio: fd passed in SCM_RIGHTS datagram not set correctly")
Fixes: d84295067fc7 ("net: net_cls: fd passed in SCM_RIGHTS datagram not set correctly")
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 include/net/scm.h |  1 +
 net/compat.c      | 55 +++++++++++++++++++++--------------------------
 net/core/scm.c    | 18 ++++++++--------
 3 files changed, 35 insertions(+), 39 deletions(-)

diff --git a/include/net/scm.h b/include/net/scm.h
index 1ce365f4c256..581a94d6c613 100644
--- a/include/net/scm.h
+++ b/include/net/scm.h
@@ -37,6 +37,7 @@ struct scm_cookie {
 #endif
 };
 
+int __scm_install_fd(struct file *file, int __user *ufd, unsigned int o_flags);
 void scm_detach_fds(struct msghdr *msg, struct scm_cookie *scm);
 void scm_detach_fds_compat(struct msghdr *msg, struct scm_cookie *scm);
 int __scm_send(struct socket *sock, struct msghdr *msg, struct scm_cookie *scm);
diff --git a/net/compat.c b/net/compat.c
index 5e3041a2c37d..27d477fdcaa0 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -281,39 +281,31 @@ int put_cmsg_compat(struct msghdr *kmsg, int level, int type, int len, void *dat
 	return 0;
 }
 
-void scm_detach_fds_compat(struct msghdr *kmsg, struct scm_cookie *scm)
+static int scm_max_fds_compat(struct msghdr *msg)
 {
-	struct compat_cmsghdr __user *cm = (struct compat_cmsghdr __user *) kmsg->msg_control;
-	int fdmax = (kmsg->msg_controllen - sizeof(struct compat_cmsghdr)) / sizeof(int);
-	int fdnum = scm->fp->count;
-	struct file **fp = scm->fp->fp;
-	int __user *cmfptr;
-	int err = 0, i;
+	if (msg->msg_controllen <= sizeof(struct compat_cmsghdr))
+		return 0;
+	return (msg->msg_controllen - sizeof(struct compat_cmsghdr)) / sizeof(int);
+}
 
-	if (fdnum < fdmax)
-		fdmax = fdnum;
+void scm_detach_fds_compat(struct msghdr *msg, struct scm_cookie *scm)
+{
+	struct compat_cmsghdr __user *cm =
+		(struct compat_cmsghdr __user *)msg->msg_control;
+	unsigned int o_flags = (msg->msg_flags & MSG_CMSG_CLOEXEC) ? O_CLOEXEC : 0;
+	int fdmax = min_t(int, scm_max_fds_compat(msg), scm->fp->count);
+	int __user *cmsg_data = CMSG_USER_DATA(cm);
+	int err = 0, i;
 
-	for (i = 0, cmfptr = (int __user *) CMSG_COMPAT_DATA(cm); i < fdmax; i++, cmfptr++) {
-		int new_fd;
-		err = security_file_receive(fp[i]);
+	for (i = 0; i < fdmax; i++) {
+		err = __scm_install_fd(scm->fp->fp[i], cmsg_data + i, o_flags);
 		if (err)
 			break;
-		err = get_unused_fd_flags(MSG_CMSG_CLOEXEC & kmsg->msg_flags
-					  ? O_CLOEXEC : 0);
-		if (err < 0)
-			break;
-		new_fd = err;
-		err = put_user(new_fd, cmfptr);
-		if (err) {
-			put_unused_fd(new_fd);
-			break;
-		}
-		/* Bump the usage count and install the file. */
-		fd_install(new_fd, get_file(fp[i]));
 	}
 
 	if (i > 0) {
 		int cmlen = CMSG_COMPAT_LEN(i * sizeof(int));
+
 		err = put_user(SOL_SOCKET, &cm->cmsg_level);
 		if (!err)
 			err = put_user(SCM_RIGHTS, &cm->cmsg_type);
@@ -321,16 +313,19 @@ void scm_detach_fds_compat(struct msghdr *kmsg, struct scm_cookie *scm)
 			err = put_user(cmlen, &cm->cmsg_len);
 		if (!err) {
 			cmlen = CMSG_COMPAT_SPACE(i * sizeof(int));
-			kmsg->msg_control += cmlen;
-			kmsg->msg_controllen -= cmlen;
+			if (msg->msg_controllen < cmlen)
+				cmlen = msg->msg_controllen;
+			msg->msg_control += cmlen;
+			msg->msg_controllen -= cmlen;
 		}
 	}
-	if (i < fdnum)
-		kmsg->msg_flags |= MSG_CTRUNC;
+
+	if (i < scm->fp->count || (scm->fp->count && fdmax <= 0))
+		msg->msg_flags |= MSG_CTRUNC;
 
 	/*
-	 * All of the files that fit in the message have had their
-	 * usage counts incremented, so we just free the list.
+	 * All of the files that fit in the message have had their usage counts
+	 * incremented, so we just free the list.
 	 */
 	__scm_destroy(scm);
 }
diff --git a/net/core/scm.c b/net/core/scm.c
index 875df1c2989d..6151678c73ed 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -280,7 +280,7 @@ void put_cmsg_scm_timestamping(struct msghdr *msg, struct scm_timestamping_inter
 }
 EXPORT_SYMBOL(put_cmsg_scm_timestamping);
 
-static int __scm_install_fd(struct file *file, int __user *ufd, int o_flags)
+int __scm_install_fd(struct file *file, int __user *ufd, unsigned int o_flags)
 {
 	struct socket *sock;
 	int new_fd;
@@ -319,29 +319,29 @@ static int scm_max_fds(struct msghdr *msg)
 
 void scm_detach_fds(struct msghdr *msg, struct scm_cookie *scm)
 {
-	struct cmsghdr __user *cm
-		= (__force struct cmsghdr __user*)msg->msg_control;
-	int o_flags = (msg->msg_flags & MSG_CMSG_CLOEXEC) ? O_CLOEXEC : 0;
+	struct cmsghdr __user *cm =
+		(__force struct cmsghdr __user *)msg->msg_control;
+	unsigned int o_flags = (msg->msg_flags & MSG_CMSG_CLOEXEC) ? O_CLOEXEC : 0;
 	int fdmax = min_t(int, scm_max_fds(msg), scm->fp->count);
 	int __user *cmsg_data = CMSG_USER_DATA(cm);
 	int err = 0, i;
 
+	/* no use for FD passing from kernel space callers */
+	if (WARN_ON_ONCE(!msg->msg_control_is_user))
+		return;
+
 	if (msg->msg_flags & MSG_CMSG_COMPAT) {
 		scm_detach_fds_compat(msg, scm);
 		return;
 	}
 
-	/* no use for FD passing from kernel space callers */
-	if (WARN_ON_ONCE(!msg->msg_control_is_user))
-		return;
-
 	for (i = 0; i < fdmax; i++) {
 		err = __scm_install_fd(scm->fp->fp[i], cmsg_data + i, o_flags);
 		if (err)
 			break;
 	}
 
-	if (i > 0)  {
+	if (i > 0) {
 		int cmlen = CMSG_LEN(i * sizeof(int));
 
 		err = put_user(SOL_SOCKET, &cm->cmsg_level);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v6 2/7] fs: Move __scm_install_fd() to __receive_fd()
  2020-07-06 20:17 [PATCH v6 0/7] Add seccomp notifier ioctl that enables adding fds Kees Cook
  2020-07-06 20:17 ` [PATCH v6 1/7] net/scm: Regularize compat handling of scm_detach_fds() Kees Cook
@ 2020-07-06 20:17 ` Kees Cook
  2020-07-07  6:49   ` Christoph Hellwig
  2020-07-07 11:45   ` Christian Brauner
  2020-07-06 20:17 ` [PATCH v6 3/7] fs: Add receive_fd() wrapper for __receive_fd() Kees Cook
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 26+ messages in thread
From: Kees Cook @ 2020-07-06 20:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, Sargun Dhillon, Christian Brauner, Tycho Andersen,
	David Laight, Christoph Hellwig, David S. Miller, Jakub Kicinski,
	Alexander Viro, Aleksa Sarai, Matt Denton, Jann Horn,
	Chris Palmer, Robert Sesek, Giuseppe Scrivano,
	Greg Kroah-Hartman, Andy Lutomirski, Will Drewry, Shuah Khan,
	netdev, containers, linux-api, linux-fsdevel, linux-kselftest

In preparation for users of the "install a received file" logic outside
of net/ (pidfd and seccomp), relocate and rename __scm_install_fd() from
net/core/scm.c to __receive_fd() in fs/file.c, and provide a wrapper
named receive_fd_user(), as future patches will change the interface
to __receive_fd().

Reviewed-by: Sargun Dhillon <sargun@sargun.me>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 fs/file.c            | 48 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/file.h |  8 ++++++++
 include/linux/net.h  |  9 +++++++++
 include/net/scm.h    |  1 -
 net/compat.c         |  2 +-
 net/core/scm.c       | 32 +----------------------------
 6 files changed, 67 insertions(+), 33 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index abb8b7081d7a..3bd67233088b 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -11,6 +11,7 @@
 #include <linux/export.h>
 #include <linux/fs.h>
 #include <linux/mm.h>
+#include <linux/net.h>
 #include <linux/sched/signal.h>
 #include <linux/slab.h>
 #include <linux/file.h>
@@ -18,6 +19,8 @@
 #include <linux/bitops.h>
 #include <linux/spinlock.h>
 #include <linux/rcupdate.h>
+#include <net/cls_cgroup.h>
+#include <net/netprio_cgroup.h>
 
 unsigned int sysctl_nr_open __read_mostly = 1024*1024;
 unsigned int sysctl_nr_open_min = BITS_PER_LONG;
@@ -931,6 +934,51 @@ int replace_fd(unsigned fd, struct file *file, unsigned flags)
 	return err;
 }
 
+/**
+ * __receive_fd() - Install received file into file descriptor table
+ *
+ * @file: struct file that was received from another process
+ * @ufd: __user pointer to write new fd number to
+ * @o_flags: the O_* flags to apply to the new fd entry
+ *
+ * Installs a received file into the file descriptor table, with appropriate
+ * checks and count updates. Writes the fd number to userspace.
+ *
+ * This helper handles its own reference counting of the incoming
+ * struct file.
+ *
+ * Returns -ve on error.
+ */
+int __receive_fd(struct file *file, int __user *ufd, unsigned int o_flags)
+{
+	struct socket *sock;
+	int new_fd;
+	int error;
+
+	error = security_file_receive(file);
+	if (error)
+		return error;
+
+	new_fd = get_unused_fd_flags(o_flags);
+	if (new_fd < 0)
+		return new_fd;
+
+	error = put_user(new_fd, ufd);
+	if (error) {
+		put_unused_fd(new_fd);
+		return error;
+	}
+
+	/* Bump the usage count and install the file. */
+	sock = sock_from_file(file, &error);
+	if (sock) {
+		sock_update_netprioidx(&sock->sk->sk_cgrp_data);
+		sock_update_classid(&sock->sk->sk_cgrp_data);
+	}
+	fd_install(new_fd, get_file(file));
+	return 0;
+}
+
 static int ksys_dup3(unsigned int oldfd, unsigned int newfd, int flags)
 {
 	int err = -EBADF;
diff --git a/include/linux/file.h b/include/linux/file.h
index 122f80084a3e..b14ff2ffd0bd 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -91,6 +91,14 @@ extern void put_unused_fd(unsigned int fd);
 
 extern void fd_install(unsigned int fd, struct file *file);
 
+extern int __receive_fd(struct file *file, int __user *ufd,
+			unsigned int o_flags);
+static inline int receive_fd_user(struct file *file, int __user *ufd,
+				  unsigned int o_flags)
+{
+	return __receive_fd(file, ufd, o_flags);
+}
+
 extern void flush_delayed_fput(void);
 extern void __fput_sync(struct file *);
 
diff --git a/include/linux/net.h b/include/linux/net.h
index 016a9c5faa34..0ca550d0f6a6 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -240,7 +240,16 @@ int sock_sendmsg(struct socket *sock, struct msghdr *msg);
 int sock_recvmsg(struct socket *sock, struct msghdr *msg, int flags);
 struct file *sock_alloc_file(struct socket *sock, int flags, const char *dname);
 struct socket *sockfd_lookup(int fd, int *err);
+
+#ifdef CONFIG_NET
 struct socket *sock_from_file(struct file *file, int *err);
+#else
+static inline struct socket *sock_from_file(struct file *file, int *err)
+{
+	return NULL;
+}
+#endif
+
 #define		     sockfd_put(sock) fput(sock->file)
 int net_ratelimit(void);
 
diff --git a/include/net/scm.h b/include/net/scm.h
index 581a94d6c613..1ce365f4c256 100644
--- a/include/net/scm.h
+++ b/include/net/scm.h
@@ -37,7 +37,6 @@ struct scm_cookie {
 #endif
 };
 
-int __scm_install_fd(struct file *file, int __user *ufd, unsigned int o_flags);
 void scm_detach_fds(struct msghdr *msg, struct scm_cookie *scm);
 void scm_detach_fds_compat(struct msghdr *msg, struct scm_cookie *scm);
 int __scm_send(struct socket *sock, struct msghdr *msg, struct scm_cookie *scm);
diff --git a/net/compat.c b/net/compat.c
index 27d477fdcaa0..e74cd3dae8b0 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -298,7 +298,7 @@ void scm_detach_fds_compat(struct msghdr *msg, struct scm_cookie *scm)
 	int err = 0, i;
 
 	for (i = 0; i < fdmax; i++) {
-		err = __scm_install_fd(scm->fp->fp[i], cmsg_data + i, o_flags);
+		err = receive_fd_user(scm->fp->fp[i], cmsg_data + i, o_flags);
 		if (err)
 			break;
 	}
diff --git a/net/core/scm.c b/net/core/scm.c
index 6151678c73ed..67c166a7820d 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -280,36 +280,6 @@ void put_cmsg_scm_timestamping(struct msghdr *msg, struct scm_timestamping_inter
 }
 EXPORT_SYMBOL(put_cmsg_scm_timestamping);
 
-int __scm_install_fd(struct file *file, int __user *ufd, unsigned int o_flags)
-{
-	struct socket *sock;
-	int new_fd;
-	int error;
-
-	error = security_file_receive(file);
-	if (error)
-		return error;
-
-	new_fd = get_unused_fd_flags(o_flags);
-	if (new_fd < 0)
-		return new_fd;
-
-	error = put_user(new_fd, ufd);
-	if (error) {
-		put_unused_fd(new_fd);
-		return error;
-	}
-
-	/* Bump the usage count and install the file. */
-	sock = sock_from_file(file, &error);
-	if (sock) {
-		sock_update_netprioidx(&sock->sk->sk_cgrp_data);
-		sock_update_classid(&sock->sk->sk_cgrp_data);
-	}
-	fd_install(new_fd, get_file(file));
-	return 0;
-}
-
 static int scm_max_fds(struct msghdr *msg)
 {
 	if (msg->msg_controllen <= sizeof(struct cmsghdr))
@@ -336,7 +306,7 @@ void scm_detach_fds(struct msghdr *msg, struct scm_cookie *scm)
 	}
 
 	for (i = 0; i < fdmax; i++) {
-		err = __scm_install_fd(scm->fp->fp[i], cmsg_data + i, o_flags);
+		err = receive_fd_user(scm->fp->fp[i], cmsg_data + i, o_flags);
 		if (err)
 			break;
 	}
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v6 3/7] fs: Add receive_fd() wrapper for __receive_fd()
  2020-07-06 20:17 [PATCH v6 0/7] Add seccomp notifier ioctl that enables adding fds Kees Cook
  2020-07-06 20:17 ` [PATCH v6 1/7] net/scm: Regularize compat handling of scm_detach_fds() Kees Cook
  2020-07-06 20:17 ` [PATCH v6 2/7] fs: Move __scm_install_fd() to __receive_fd() Kees Cook
@ 2020-07-06 20:17 ` Kees Cook
  2020-07-07  6:49   ` Christoph Hellwig
  2020-07-07 11:49   ` Christian Brauner
  2020-07-06 20:17 ` [PATCH v6 4/7] pidfd: Replace open-coded partial receive_fd() Kees Cook
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 26+ messages in thread
From: Kees Cook @ 2020-07-06 20:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, Sargun Dhillon, Christian Brauner, Tycho Andersen,
	David Laight, Christoph Hellwig, David S. Miller, Jakub Kicinski,
	Alexander Viro, Aleksa Sarai, Matt Denton, Jann Horn,
	Chris Palmer, Robert Sesek, Giuseppe Scrivano,
	Greg Kroah-Hartman, Andy Lutomirski, Will Drewry, Shuah Khan,
	netdev, containers, linux-api, linux-fsdevel, linux-kselftest

For both pidfd and seccomp, the __user pointer is not used. Update
__receive_fd() to make writing to ufd optional via a NULL check. However,
for the receive_fd_user() wrapper, ufd is NULL checked so an -EFAULT
can be returned to avoid changing the SCM_RIGHTS interface behavior. Add
new wrapper receive_fd() for pidfd and seccomp that does not use the ufd
argument. For the new helper, the allocated fd needs to be returned on
success. Update the existing callers to handle it.

Reviewed-by: Sargun Dhillon <sargun@sargun.me>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 fs/file.c            | 23 +++++++++++++++--------
 include/linux/file.h |  7 +++++++
 net/compat.c         |  2 +-
 net/core/scm.c       |  2 +-
 4 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index 3bd67233088b..0efdcf413210 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -942,12 +942,13 @@ int replace_fd(unsigned fd, struct file *file, unsigned flags)
  * @o_flags: the O_* flags to apply to the new fd entry
  *
  * Installs a received file into the file descriptor table, with appropriate
- * checks and count updates. Writes the fd number to userspace.
+ * checks and count updates. Optionally writes the fd number to userspace, if
+ * @ufd is non-NULL.
  *
  * This helper handles its own reference counting of the incoming
  * struct file.
  *
- * Returns -ve on error.
+ * Returns newly install fd or -ve on error.
  */
 int __receive_fd(struct file *file, int __user *ufd, unsigned int o_flags)
 {
@@ -963,20 +964,26 @@ int __receive_fd(struct file *file, int __user *ufd, unsigned int o_flags)
 	if (new_fd < 0)
 		return new_fd;
 
-	error = put_user(new_fd, ufd);
-	if (error) {
-		put_unused_fd(new_fd);
-		return error;
+	if (ufd) {
+		error = put_user(new_fd, ufd);
+		if (error) {
+			put_unused_fd(new_fd);
+			return error;
+		}
 	}
 
-	/* Bump the usage count and install the file. */
+	/*
+	 * Bump the usage count and install the file. The resulting value of
+	 * "error" is ignored here since we only need to take action when
+	 * the file is a socket and testing "sock" for NULL is sufficient.
+	 */
 	sock = sock_from_file(file, &error);
 	if (sock) {
 		sock_update_netprioidx(&sock->sk->sk_cgrp_data);
 		sock_update_classid(&sock->sk->sk_cgrp_data);
 	}
 	fd_install(new_fd, get_file(file));
-	return 0;
+	return new_fd;
 }
 
 static int ksys_dup3(unsigned int oldfd, unsigned int newfd, int flags)
diff --git a/include/linux/file.h b/include/linux/file.h
index b14ff2ffd0bd..d9fee9f5c8da 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -9,6 +9,7 @@
 #include <linux/compiler.h>
 #include <linux/types.h>
 #include <linux/posix_types.h>
+#include <linux/errno.h>
 
 struct file;
 
@@ -96,8 +97,14 @@ extern int __receive_fd(struct file *file, int __user *ufd,
 static inline int receive_fd_user(struct file *file, int __user *ufd,
 				  unsigned int o_flags)
 {
+	if (ufd == NULL)
+		return -EFAULT;
 	return __receive_fd(file, ufd, o_flags);
 }
+static inline int receive_fd(struct file *file, unsigned int o_flags)
+{
+	return __receive_fd(file, NULL, o_flags);
+}
 
 extern void flush_delayed_fput(void);
 extern void __fput_sync(struct file *);
diff --git a/net/compat.c b/net/compat.c
index e74cd3dae8b0..dc7ddbc2b15e 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -299,7 +299,7 @@ void scm_detach_fds_compat(struct msghdr *msg, struct scm_cookie *scm)
 
 	for (i = 0; i < fdmax; i++) {
 		err = receive_fd_user(scm->fp->fp[i], cmsg_data + i, o_flags);
-		if (err)
+		if (err < 0)
 			break;
 	}
 
diff --git a/net/core/scm.c b/net/core/scm.c
index 67c166a7820d..8156d4fb8a39 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -307,7 +307,7 @@ void scm_detach_fds(struct msghdr *msg, struct scm_cookie *scm)
 
 	for (i = 0; i < fdmax; i++) {
 		err = receive_fd_user(scm->fp->fp[i], cmsg_data + i, o_flags);
-		if (err)
+		if (err < 0)
 			break;
 	}
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v6 4/7] pidfd: Replace open-coded partial receive_fd()
  2020-07-06 20:17 [PATCH v6 0/7] Add seccomp notifier ioctl that enables adding fds Kees Cook
                   ` (2 preceding siblings ...)
  2020-07-06 20:17 ` [PATCH v6 3/7] fs: Add receive_fd() wrapper for __receive_fd() Kees Cook
@ 2020-07-06 20:17 ` Kees Cook
  2020-07-07 12:22   ` Christian Brauner
  2020-07-06 20:17 ` [PATCH v6 5/7] fs: Expand __receive_fd() to accept existing fd Kees Cook
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 26+ messages in thread
From: Kees Cook @ 2020-07-06 20:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, Sargun Dhillon, Christian Brauner, Tycho Andersen,
	David Laight, Christoph Hellwig, David S. Miller, Jakub Kicinski,
	Alexander Viro, Aleksa Sarai, Matt Denton, Jann Horn,
	Chris Palmer, Robert Sesek, Giuseppe Scrivano,
	Greg Kroah-Hartman, Andy Lutomirski, Will Drewry, Shuah Khan,
	netdev, containers, linux-api, linux-fsdevel, linux-kselftest

The sock counting (sock_update_netprioidx() and sock_update_classid()) was
missing from pidfd's implementation of received fd installation. Replace
the open-coded version with a call to the new receive_fd()
helper.

Thanks to Vamshi K Sthambamkadi <vamshi.k.sthambamkadi@gmail.com> for
catching a missed fput() in an earlier version of this patch.

Fixes: 8649c322f75c ("pid: Implement pidfd_getfd syscall")
Reviewed-by: Sargun Dhillon <sargun@sargun.me>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 kernel/pid.c | 13 ++-----------
 1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/kernel/pid.c b/kernel/pid.c
index f1496b757162..a31c102f4c87 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -635,17 +635,8 @@ static int pidfd_getfd(struct pid *pid, int fd)
 	if (IS_ERR(file))
 		return PTR_ERR(file);
 
-	ret = security_file_receive(file);
-	if (ret) {
-		fput(file);
-		return ret;
-	}
-
-	ret = get_unused_fd_flags(O_CLOEXEC);
-	if (ret < 0)
-		fput(file);
-	else
-		fd_install(ret, file);
+	ret = receive_fd(file, O_CLOEXEC);
+	fput(file);
 
 	return ret;
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v6 5/7] fs: Expand __receive_fd() to accept existing fd
  2020-07-06 20:17 [PATCH v6 0/7] Add seccomp notifier ioctl that enables adding fds Kees Cook
                   ` (3 preceding siblings ...)
  2020-07-06 20:17 ` [PATCH v6 4/7] pidfd: Replace open-coded partial receive_fd() Kees Cook
@ 2020-07-06 20:17 ` Kees Cook
  2020-07-07 12:38   ` Christian Brauner
  2020-07-06 20:17 ` [PATCH v6 6/7] seccomp: Introduce addfd ioctl to seccomp user notifier Kees Cook
  2020-07-06 20:17 ` [PATCH v6 7/7] selftests/seccomp: Test SECCOMP_IOCTL_NOTIF_ADDFD Kees Cook
  6 siblings, 1 reply; 26+ messages in thread
From: Kees Cook @ 2020-07-06 20:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, Sargun Dhillon, Christian Brauner, Tycho Andersen,
	David Laight, Christoph Hellwig, David S. Miller, Jakub Kicinski,
	Alexander Viro, Aleksa Sarai, Matt Denton, Jann Horn,
	Chris Palmer, Robert Sesek, Giuseppe Scrivano,
	Greg Kroah-Hartman, Andy Lutomirski, Will Drewry, Shuah Khan,
	netdev, containers, linux-api, linux-fsdevel, linux-kselftest

Expand __receive_fd() with support for replace_fd() for the coming seccomp
"addfd" ioctl(). Add new wrapper receive_fd_replace() for the new behavior
and update existing wrappers to retain old behavior.

Thanks to Colin Ian King <colin.king@canonical.com> for pointing out an
uninitialized variable exposure in an earlier version of this patch.

Reviewed-by: Sargun Dhillon <sargun@sargun.me>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 fs/file.c            | 24 ++++++++++++++++++------
 include/linux/file.h | 10 +++++++---
 2 files changed, 25 insertions(+), 9 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index 0efdcf413210..11313ff36802 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -937,6 +937,7 @@ int replace_fd(unsigned fd, struct file *file, unsigned flags)
 /**
  * __receive_fd() - Install received file into file descriptor table
  *
+ * @fd: fd to install into (if negative, a new fd will be allocated)
  * @file: struct file that was received from another process
  * @ufd: __user pointer to write new fd number to
  * @o_flags: the O_* flags to apply to the new fd entry
@@ -950,7 +951,7 @@ int replace_fd(unsigned fd, struct file *file, unsigned flags)
  *
  * Returns newly install fd or -ve on error.
  */
-int __receive_fd(struct file *file, int __user *ufd, unsigned int o_flags)
+int __receive_fd(int fd, struct file *file, int __user *ufd, unsigned int o_flags)
 {
 	struct socket *sock;
 	int new_fd;
@@ -960,18 +961,30 @@ int __receive_fd(struct file *file, int __user *ufd, unsigned int o_flags)
 	if (error)
 		return error;
 
-	new_fd = get_unused_fd_flags(o_flags);
-	if (new_fd < 0)
-		return new_fd;
+	if (fd < 0) {
+		new_fd = get_unused_fd_flags(o_flags);
+		if (new_fd < 0)
+			return new_fd;
+	} else
+		new_fd = fd;
 
 	if (ufd) {
 		error = put_user(new_fd, ufd);
 		if (error) {
-			put_unused_fd(new_fd);
+			if (fd < 0)
+				put_unused_fd(new_fd);
 			return error;
 		}
 	}
 
+	if (fd < 0)
+		fd_install(new_fd, get_file(file));
+	else {
+		error = replace_fd(new_fd, file, o_flags);
+		if (error)
+			return error;
+	}
+
 	/*
 	 * Bump the usage count and install the file. The resulting value of
 	 * "error" is ignored here since we only need to take action when
@@ -982,7 +995,6 @@ int __receive_fd(struct file *file, int __user *ufd, unsigned int o_flags)
 		sock_update_netprioidx(&sock->sk->sk_cgrp_data);
 		sock_update_classid(&sock->sk->sk_cgrp_data);
 	}
-	fd_install(new_fd, get_file(file));
 	return new_fd;
 }
 
diff --git a/include/linux/file.h b/include/linux/file.h
index d9fee9f5c8da..225982792fa2 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -92,18 +92,22 @@ extern void put_unused_fd(unsigned int fd);
 
 extern void fd_install(unsigned int fd, struct file *file);
 
-extern int __receive_fd(struct file *file, int __user *ufd,
+extern int __receive_fd(int fd, struct file *file, int __user *ufd,
 			unsigned int o_flags);
 static inline int receive_fd_user(struct file *file, int __user *ufd,
 				  unsigned int o_flags)
 {
 	if (ufd == NULL)
 		return -EFAULT;
-	return __receive_fd(file, ufd, o_flags);
+	return __receive_fd(-1, file, ufd, o_flags);
 }
 static inline int receive_fd(struct file *file, unsigned int o_flags)
 {
-	return __receive_fd(file, NULL, o_flags);
+	return __receive_fd(-1, file, NULL, o_flags);
+}
+static inline int receive_fd_replace(int fd, struct file *file, unsigned int o_flags)
+{
+	return __receive_fd(fd, file, NULL, o_flags);
 }
 
 extern void flush_delayed_fput(void);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v6 6/7] seccomp: Introduce addfd ioctl to seccomp user notifier
  2020-07-06 20:17 [PATCH v6 0/7] Add seccomp notifier ioctl that enables adding fds Kees Cook
                   ` (4 preceding siblings ...)
  2020-07-06 20:17 ` [PATCH v6 5/7] fs: Expand __receive_fd() to accept existing fd Kees Cook
@ 2020-07-06 20:17 ` Kees Cook
  2020-07-07 13:30   ` Christian Brauner
  2020-07-06 20:17 ` [PATCH v6 7/7] selftests/seccomp: Test SECCOMP_IOCTL_NOTIF_ADDFD Kees Cook
  6 siblings, 1 reply; 26+ messages in thread
From: Kees Cook @ 2020-07-06 20:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, Sargun Dhillon, Matt Denton, Christian Brauner,
	Tycho Andersen, David Laight, Christoph Hellwig, David S. Miller,
	Jakub Kicinski, Alexander Viro, Aleksa Sarai, Jann Horn,
	Chris Palmer, Robert Sesek, Giuseppe Scrivano,
	Greg Kroah-Hartman, Andy Lutomirski, Will Drewry, Shuah Khan,
	netdev, containers, linux-api, linux-fsdevel, linux-kselftest

From: Sargun Dhillon <sargun@sargun.me>

This adds a seccomp notifier ioctl which allows for the listener to
"add" file descriptors to a process which originated a seccomp user
notification. This allows calls like mount, and mknod to be "implemented",
as the return value, and the arguments are data in memory. On the other
hand, calls like connect can be "implemented" using pidfd_getfd.

Unfortunately, there are calls which return file descriptors, like
open, which are vulnerable to ToCToU attacks, and require that the more
privileged supervisor can inspect the argument, and perform the syscall
on behalf of the process generating the notification. This allows the
file descriptor generated from that open call to be returned to the
calling process.

In addition, there is functionality to allow for replacement of specific
file descriptors, following dup2-like semantics.

The ioctl handling is based on the discussions[1] of how Extensible
Arguments should interact with ioctls. Instead of building size into
the addfd structure, make it a function of the ioctl command (which
is how sizes are normally passed to ioctls). To support forward and
backward compatibility, just mask out the direction and size, and match
everything. The size (and any future direction) checks are done along
with copy_struct_from_user() logic.

As a note, the seccomp_notif_addfd structure is laid out based on 8-byte
alignment without requiring packing as there have been packing issues
with uapi highlighted before[1][2]. Although we could overload the
newfd field and use -1 to indicate that it is not to be used, doing
so requires changing the size of the fd field, and introduces struct
packing complexity.

[1]: https://lore.kernel.org/lkml/87o8w9bcaf.fsf@mid.deneb.enyo.de/
[2]: https://lore.kernel.org/lkml/a328b91d-fd8f-4f27-b3c2-91a9c45f18c0@rasmusvillemoes.dk/
[3]: https://lore.kernel.org/lkml/20200612104629.GA15814@ircssh-2.c.rugged-nimbus-611.internal

Suggested-by: Matt Denton <mpdenton@google.com>
Link: https://lore.kernel.org/r/20200603011044.7972-4-sargun@sargun.me
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 include/uapi/linux/seccomp.h |  22 +++++
 kernel/seccomp.c             | 172 ++++++++++++++++++++++++++++++++++-
 2 files changed, 193 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
index 965290f7dcc2..6ba18b82a02e 100644
--- a/include/uapi/linux/seccomp.h
+++ b/include/uapi/linux/seccomp.h
@@ -113,6 +113,25 @@ struct seccomp_notif_resp {
 	__u32 flags;
 };
 
+/* valid flags for seccomp_notif_addfd */
+#define SECCOMP_ADDFD_FLAG_SETFD	(1UL << 0) /* Specify remote fd */
+
+/**
+ * struct seccomp_notif_addfd
+ * @id: The ID of the seccomp notification
+ * @flags: SECCOMP_ADDFD_FLAG_*
+ * @srcfd: The local fd number
+ * @newfd: Optional remote FD number if SETFD option is set, otherwise 0.
+ * @newfd_flags: The O_* flags the remote FD should have applied
+ */
+struct seccomp_notif_addfd {
+	__u64 id;
+	__u32 flags;
+	__u32 srcfd;
+	__u32 newfd;
+	__u32 newfd_flags;
+};
+
 #define SECCOMP_IOC_MAGIC		'!'
 #define SECCOMP_IO(nr)			_IO(SECCOMP_IOC_MAGIC, nr)
 #define SECCOMP_IOR(nr, type)		_IOR(SECCOMP_IOC_MAGIC, nr, type)
@@ -124,5 +143,8 @@ struct seccomp_notif_resp {
 #define SECCOMP_IOCTL_NOTIF_SEND	SECCOMP_IOWR(1,	\
 						struct seccomp_notif_resp)
 #define SECCOMP_IOCTL_NOTIF_ID_VALID	SECCOMP_IOW(2, __u64)
+/* On success, the return value is the remote process's added fd number */
+#define SECCOMP_IOCTL_NOTIF_ADDFD	SECCOMP_IOW(3, \
+						struct seccomp_notif_addfd)
 
 #endif /* _UAPI_LINUX_SECCOMP_H */
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 866a432cd746..ee314036e429 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -87,10 +87,42 @@ struct seccomp_knotif {
 	long val;
 	u32 flags;
 
-	/* Signals when this has entered SECCOMP_NOTIFY_REPLIED */
+	/*
+	 * Signals when this has changed states, such as the listener
+	 * dying, a new seccomp addfd message, or changing to REPLIED
+	 */
 	struct completion ready;
 
 	struct list_head list;
+
+	/* outstanding addfd requests */
+	struct list_head addfd;
+};
+
+/**
+ * struct seccomp_kaddfd - container for seccomp_addfd ioctl messages
+ *
+ * @file: A reference to the file to install in the other task
+ * @fd: The fd number to install it at. If the fd number is -1, it means the
+ *      installing process should allocate the fd as normal.
+ * @flags: The flags for the new file descriptor. At the moment, only O_CLOEXEC
+ *         is allowed.
+ * @ret: The return value of the installing process. It is set to the fd num
+ *       upon success (>= 0).
+ * @completion: Indicates that the installing process has completed fd
+ *              installation, or gone away (either due to successful
+ *              reply, or signal)
+ *
+ */
+struct seccomp_kaddfd {
+	struct file *file;
+	int fd;
+	unsigned int flags;
+
+	/* To only be set on reply */
+	int ret;
+	struct completion completion;
+	struct list_head list;
 };
 
 /**
@@ -793,6 +825,17 @@ static u64 seccomp_next_notify_id(struct seccomp_filter *filter)
 	return filter->notif->next_id++;
 }
 
+static void seccomp_handle_addfd(struct seccomp_kaddfd *addfd)
+{
+	/*
+	 * Remove the notification, and reset the list pointers, indicating
+	 * that it has been handled.
+	 */
+	list_del_init(&addfd->list);
+	addfd->ret = receive_fd_replace(addfd->fd, addfd->file, addfd->flags);
+	complete(&addfd->completion);
+}
+
 static int seccomp_do_user_notification(int this_syscall,
 					struct seccomp_filter *match,
 					const struct seccomp_data *sd)
@@ -801,6 +844,7 @@ static int seccomp_do_user_notification(int this_syscall,
 	u32 flags = 0;
 	long ret = 0;
 	struct seccomp_knotif n = {};
+	struct seccomp_kaddfd *addfd, *tmp;
 
 	mutex_lock(&match->notify_lock);
 	err = -ENOSYS;
@@ -813,6 +857,7 @@ static int seccomp_do_user_notification(int this_syscall,
 	n.id = seccomp_next_notify_id(match);
 	init_completion(&n.ready);
 	list_add(&n.list, &match->notif->notifications);
+	INIT_LIST_HEAD(&n.addfd);
 
 	up(&match->notif->request);
 	wake_up_poll(&match->wqh, EPOLLIN | EPOLLRDNORM);
@@ -821,14 +866,31 @@ static int seccomp_do_user_notification(int this_syscall,
 	/*
 	 * This is where we wait for a reply from userspace.
 	 */
+wait:
 	err = wait_for_completion_interruptible(&n.ready);
 	mutex_lock(&match->notify_lock);
 	if (err == 0) {
+		/* Check if we were woken up by a addfd message */
+		addfd = list_first_entry_or_null(&n.addfd,
+						 struct seccomp_kaddfd, list);
+		if (addfd && n.state != SECCOMP_NOTIFY_REPLIED) {
+			seccomp_handle_addfd(addfd);
+			mutex_unlock(&match->notify_lock);
+			goto wait;
+		}
 		ret = n.val;
 		err = n.error;
 		flags = n.flags;
 	}
 
+	/* If there were any pending addfd calls, clear them out */
+	list_for_each_entry_safe(addfd, tmp, &n.addfd, list) {
+		/* The process went away before we got a chance to handle it */
+		addfd->ret = -ESRCH;
+		list_del_init(&addfd->list);
+		complete(&addfd->completion);
+	}
+
 	/*
 	 * Note that it's possible the listener died in between the time when
 	 * we were notified of a respons (or a signal) and when we were able to
@@ -1069,6 +1131,11 @@ static int seccomp_notify_release(struct inode *inode, struct file *file)
 		knotif->error = -ENOSYS;
 		knotif->val = 0;
 
+		/*
+		 * We do not need to wake up any pending addfd messages, as
+		 * the notifier will do that for us, as this just looks
+		 * like a standard reply.
+		 */
 		complete(&knotif->ready);
 	}
 
@@ -1233,12 +1300,107 @@ static long seccomp_notify_id_valid(struct seccomp_filter *filter,
 	return ret;
 }
 
+static long seccomp_notify_addfd(struct seccomp_filter *filter,
+				 struct seccomp_notif_addfd __user *uaddfd,
+				 unsigned int size)
+{
+	struct seccomp_notif_addfd addfd;
+	struct seccomp_knotif *knotif;
+	struct seccomp_kaddfd kaddfd;
+	int ret;
+
+	/* 24 is original sizeof(struct seccomp_notif_addfd) */
+	if (size < 24 || size >= PAGE_SIZE)
+		return -EINVAL;
+
+	ret = copy_struct_from_user(&addfd, sizeof(addfd), uaddfd, size);
+	if (ret)
+		return ret;
+
+	if (addfd.newfd_flags & ~O_CLOEXEC)
+		return -EINVAL;
+
+	if (addfd.flags & ~SECCOMP_ADDFD_FLAG_SETFD)
+		return -EINVAL;
+
+	if (addfd.newfd && !(addfd.flags & SECCOMP_ADDFD_FLAG_SETFD))
+		return -EINVAL;
+
+	kaddfd.file = fget(addfd.srcfd);
+	if (!kaddfd.file)
+		return -EBADF;
+
+	kaddfd.flags = addfd.newfd_flags;
+	kaddfd.fd = (addfd.flags & SECCOMP_ADDFD_FLAG_SETFD) ?
+		    addfd.newfd : -1;
+	init_completion(&kaddfd.completion);
+
+	ret = mutex_lock_interruptible(&filter->notify_lock);
+	if (ret < 0)
+		goto out;
+
+	knotif = find_notification(filter, addfd.id);
+	if (!knotif) {
+		ret = -ENOENT;
+		goto out_unlock;
+	}
+
+	/*
+	 * We do not want to allow for FD injection to occur before the
+	 * notification has been picked up by a userspace handler, or after
+	 * the notification has been replied to.
+	 */
+	if (knotif->state != SECCOMP_NOTIFY_SENT) {
+		ret = -EINPROGRESS;
+		goto out_unlock;
+	}
+
+	list_add(&kaddfd.list, &knotif->addfd);
+	complete(&knotif->ready);
+	mutex_unlock(&filter->notify_lock);
+
+	/* Now we wait for it to be processed or be interrupted */
+	ret = wait_for_completion_interruptible(&kaddfd.completion);
+	if (ret == 0) {
+		/*
+		 * We had a successful completion. The other side has already
+		 * removed us from the addfd queue, and
+		 * wait_for_completion_interruptible has a memory barrier upon
+		 * success that lets us read this value directly without
+		 * locking.
+		 */
+		ret = kaddfd.ret;
+		goto out;
+	}
+
+	mutex_lock(&filter->notify_lock);
+	/*
+	 * Even though we were woken up by a signal and not a successful
+	 * completion, a completion may have happened in the mean time.
+	 *
+	 * We need to check again if the addfd request has been handled,
+	 * and if not, we will remove it from the queue.
+	 */
+	if (list_empty(&kaddfd.list))
+		ret = kaddfd.ret;
+	else
+		list_del(&kaddfd.list);
+
+out_unlock:
+	mutex_unlock(&filter->notify_lock);
+out:
+	fput(kaddfd.file);
+
+	return ret;
+}
+
 static long seccomp_notify_ioctl(struct file *file, unsigned int cmd,
 				 unsigned long arg)
 {
 	struct seccomp_filter *filter = file->private_data;
 	void __user *buf = (void __user *)arg;
 
+	/* Fixed-size ioctls */
 	switch (cmd) {
 	case SECCOMP_IOCTL_NOTIF_RECV:
 		return seccomp_notify_recv(filter, buf);
@@ -1247,9 +1409,17 @@ static long seccomp_notify_ioctl(struct file *file, unsigned int cmd,
 	case SECCOMP_IOCTL_NOTIF_ID_VALID_WRONG_DIR:
 	case SECCOMP_IOCTL_NOTIF_ID_VALID:
 		return seccomp_notify_id_valid(filter, buf);
+	}
+
+	/* Extensible Argument ioctls */
+#define EA_IOCTL(cmd)	((cmd) & ~(IOC_INOUT | IOCSIZE_MASK))
+	switch (EA_IOCTL(cmd)) {
+	case EA_IOCTL(SECCOMP_IOCTL_NOTIF_ADDFD):
+		return seccomp_notify_addfd(filter, buf, _IOC_SIZE(cmd));
 	default:
 		return -EINVAL;
 	}
+#undef EA_IOCTL
 }
 
 static __poll_t seccomp_notify_poll(struct file *file,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v6 7/7] selftests/seccomp: Test SECCOMP_IOCTL_NOTIF_ADDFD
  2020-07-06 20:17 [PATCH v6 0/7] Add seccomp notifier ioctl that enables adding fds Kees Cook
                   ` (5 preceding siblings ...)
  2020-07-06 20:17 ` [PATCH v6 6/7] seccomp: Introduce addfd ioctl to seccomp user notifier Kees Cook
@ 2020-07-06 20:17 ` Kees Cook
  6 siblings, 0 replies; 26+ messages in thread
From: Kees Cook @ 2020-07-06 20:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, Sargun Dhillon, Christian Brauner, Tycho Andersen,
	David Laight, Christoph Hellwig, David S. Miller, Jakub Kicinski,
	Alexander Viro, Aleksa Sarai, Matt Denton, Jann Horn,
	Chris Palmer, Robert Sesek, Giuseppe Scrivano,
	Greg Kroah-Hartman, Andy Lutomirski, Will Drewry, Shuah Khan,
	netdev, containers, linux-api, linux-fsdevel, linux-kselftest

From: Sargun Dhillon <sargun@sargun.me>

Test whether we can add file descriptors in response to notifications.
This injects the file descriptors via notifications, and then uses kcmp
to determine whether or not it has been successful.

It also includes some basic sanity checking for arguments.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Link: https://lore.kernel.org/r/20200603011044.7972-5-sargun@sargun.me
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 229 ++++++++++++++++++
 1 file changed, 229 insertions(+)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 2c77105e50e6..b854a6c5bf49 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -45,6 +45,7 @@
 #include <sys/socket.h>
 #include <sys/ioctl.h>
 #include <linux/kcmp.h>
+#include <sys/resource.h>
 
 #include <unistd.h>
 #include <sys/syscall.h>
@@ -168,7 +169,9 @@ struct seccomp_metadata {
 
 #ifndef SECCOMP_FILTER_FLAG_NEW_LISTENER
 #define SECCOMP_FILTER_FLAG_NEW_LISTENER	(1UL << 3)
+#endif
 
+#ifndef SECCOMP_RET_USER_NOTIF
 #define SECCOMP_RET_USER_NOTIF 0x7fc00000U
 
 #define SECCOMP_IOC_MAGIC		'!'
@@ -204,6 +207,39 @@ struct seccomp_notif_sizes {
 };
 #endif
 
+#ifndef SECCOMP_IOCTL_NOTIF_ADDFD
+/* On success, the return value is the remote process's added fd number */
+#define SECCOMP_IOCTL_NOTIF_ADDFD	SECCOMP_IOW(3,	\
+						struct seccomp_notif_addfd)
+
+/* valid flags for seccomp_notif_addfd */
+#define SECCOMP_ADDFD_FLAG_SETFD	(1UL << 0) /* Specify remote fd */
+
+struct seccomp_notif_addfd {
+	__u64 id;
+	__u32 flags;
+	__u32 srcfd;
+	__u32 newfd;
+	__u32 newfd_flags;
+};
+#endif
+
+struct seccomp_notif_addfd_small {
+	__u64 id;
+	char weird[4];
+};
+#define SECCOMP_IOCTL_NOTIF_ADDFD_SMALL	\
+	SECCOMP_IOW(3, struct seccomp_notif_addfd_small)
+
+struct seccomp_notif_addfd_big {
+	union {
+		struct seccomp_notif_addfd addfd;
+		char buf[sizeof(struct seccomp_notif_addfd) + 8];
+	};
+};
+#define SECCOMP_IOCTL_NOTIF_ADDFD_BIG	\
+	SECCOMP_IOWR(3, struct seccomp_notif_addfd_big)
+
 #ifndef PTRACE_EVENTMSG_SYSCALL_ENTRY
 #define PTRACE_EVENTMSG_SYSCALL_ENTRY	1
 #define PTRACE_EVENTMSG_SYSCALL_EXIT	2
@@ -3738,6 +3774,199 @@ TEST(user_notification_filter_empty_threaded)
 	EXPECT_GT((pollfd.revents & POLLHUP) ?: 0, 0);
 }
 
+TEST(user_notification_addfd)
+{
+	pid_t pid;
+	long ret;
+	int status, listener, memfd, fd;
+	struct seccomp_notif_addfd addfd = {};
+	struct seccomp_notif_addfd_small small = {};
+	struct seccomp_notif_addfd_big big = {};
+	struct seccomp_notif req = {};
+	struct seccomp_notif_resp resp = {};
+	/* 100 ms */
+	struct timespec delay = { .tv_nsec = 100000000 };
+
+	memfd = memfd_create("test", 0);
+	ASSERT_GE(memfd, 0);
+
+	ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+	ASSERT_EQ(0, ret) {
+		TH_LOG("Kernel does not support PR_SET_NO_NEW_PRIVS!");
+	}
+
+	/* Check that the basic notification machinery works */
+	listener = user_notif_syscall(__NR_getppid,
+				      SECCOMP_FILTER_FLAG_NEW_LISTENER);
+	ASSERT_GE(listener, 0);
+
+	pid = fork();
+	ASSERT_GE(pid, 0);
+
+	if (pid == 0) {
+		if (syscall(__NR_getppid) != USER_NOTIF_MAGIC)
+			exit(1);
+		exit(syscall(__NR_getppid) != USER_NOTIF_MAGIC);
+	}
+
+	ASSERT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req), 0);
+
+	addfd.srcfd = memfd;
+	addfd.newfd = 0;
+	addfd.id = req.id;
+	addfd.flags = 0x0;
+
+	/* Verify bad newfd_flags cannot be set */
+	addfd.newfd_flags = ~O_CLOEXEC;
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_ADDFD, &addfd), -1);
+	EXPECT_EQ(errno, EINVAL);
+	addfd.newfd_flags = O_CLOEXEC;
+
+	/* Verify bad flags cannot be set */
+	addfd.flags = 0xff;
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_ADDFD, &addfd), -1);
+	EXPECT_EQ(errno, EINVAL);
+	addfd.flags = 0;
+
+	/* Verify that remote_fd cannot be set without setting flags */
+	addfd.newfd = 1;
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_ADDFD, &addfd), -1);
+	EXPECT_EQ(errno, EINVAL);
+	addfd.newfd = 0;
+
+	/* Verify small size cannot be set */
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_ADDFD_SMALL, &small), -1);
+	EXPECT_EQ(errno, EINVAL);
+
+	/* Verify we can't send bits filled in unknown buffer area */
+	memset(&big, 0xAA, sizeof(big));
+	big.addfd = addfd;
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_ADDFD_BIG, &big), -1);
+	EXPECT_EQ(errno, E2BIG);
+
+
+	/* Verify we can set an arbitrary remote fd */
+	fd = ioctl(listener, SECCOMP_IOCTL_NOTIF_ADDFD, &addfd);
+	/*
+	 * The child has fds 0(stdin), 1(stdout), 2(stderr), 3(memfd),
+	 * 4(listener), so the newly allocated fd should be 5.
+	 */
+	EXPECT_EQ(fd, 5);
+	EXPECT_EQ(filecmp(getpid(), pid, memfd, fd), 0);
+
+	/* Verify we can set an arbitrary remote fd with large size */
+	memset(&big, 0x0, sizeof(big));
+	big.addfd = addfd;
+	fd = ioctl(listener, SECCOMP_IOCTL_NOTIF_ADDFD_BIG, &big);
+	EXPECT_EQ(fd, 6);
+
+	/* Verify we can set a specific remote fd */
+	addfd.newfd = 42;
+	addfd.flags = SECCOMP_ADDFD_FLAG_SETFD;
+	fd = ioctl(listener, SECCOMP_IOCTL_NOTIF_ADDFD, &addfd);
+	EXPECT_EQ(fd, 42);
+	EXPECT_EQ(filecmp(getpid(), pid, memfd, fd), 0);
+
+	/* Resume syscall */
+	resp.id = req.id;
+	resp.error = 0;
+	resp.val = USER_NOTIF_MAGIC;
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp), 0);
+
+	/*
+	 * This sets the ID of the ADD FD to the last request plus 1. The
+	 * notification ID increments 1 per notification.
+	 */
+	addfd.id = req.id + 1;
+
+	/* This spins until the underlying notification is generated */
+	while (ioctl(listener, SECCOMP_IOCTL_NOTIF_ADDFD, &addfd) != -1 &&
+	       errno != -EINPROGRESS)
+		nanosleep(&delay, NULL);
+
+	memset(&req, 0, sizeof(req));
+	ASSERT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req), 0);
+	ASSERT_EQ(addfd.id, req.id);
+
+	resp.id = req.id;
+	resp.error = 0;
+	resp.val = USER_NOTIF_MAGIC;
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp), 0);
+
+	/* Wait for child to finish. */
+	EXPECT_EQ(waitpid(pid, &status, 0), pid);
+	EXPECT_EQ(true, WIFEXITED(status));
+	EXPECT_EQ(0, WEXITSTATUS(status));
+
+	close(memfd);
+}
+
+TEST(user_notification_addfd_rlimit)
+{
+	pid_t pid;
+	long ret;
+	int status, listener, memfd;
+	struct seccomp_notif_addfd addfd = {};
+	struct seccomp_notif req = {};
+	struct seccomp_notif_resp resp = {};
+	const struct rlimit lim = {
+		.rlim_cur	= 0,
+		.rlim_max	= 0,
+	};
+
+	memfd = memfd_create("test", 0);
+	ASSERT_GE(memfd, 0);
+
+	ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+	ASSERT_EQ(0, ret) {
+		TH_LOG("Kernel does not support PR_SET_NO_NEW_PRIVS!");
+	}
+
+	/* Check that the basic notification machinery works */
+	listener = user_notif_syscall(__NR_getppid,
+				      SECCOMP_FILTER_FLAG_NEW_LISTENER);
+	ASSERT_GE(listener, 0);
+
+	pid = fork();
+	ASSERT_GE(pid, 0);
+
+	if (pid == 0)
+		exit(syscall(__NR_getppid) != USER_NOTIF_MAGIC);
+
+
+	ASSERT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req), 0);
+
+	ASSERT_EQ(prlimit(pid, RLIMIT_NOFILE, &lim, NULL), 0);
+
+	addfd.srcfd = memfd;
+	addfd.newfd_flags = O_CLOEXEC;
+	addfd.newfd = 0;
+	addfd.id = req.id;
+	addfd.flags = 0;
+
+	/* Should probably spot check /proc/sys/fs/file-nr */
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_ADDFD, &addfd), -1);
+	EXPECT_EQ(errno, EMFILE);
+
+	addfd.newfd = 100;
+	addfd.flags = SECCOMP_ADDFD_FLAG_SETFD;
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_ADDFD, &addfd), -1);
+	EXPECT_EQ(errno, EBADF);
+
+	resp.id = req.id;
+	resp.error = 0;
+	resp.val = USER_NOTIF_MAGIC;
+
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp), 0);
+
+	/* Wait for child to finish. */
+	EXPECT_EQ(waitpid(pid, &status, 0), pid);
+	EXPECT_EQ(true, WIFEXITED(status));
+	EXPECT_EQ(0, WEXITSTATUS(status));
+
+	close(memfd);
+}
+
 /*
  * TODO:
  * - expand NNP testing
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 2/7] fs: Move __scm_install_fd() to __receive_fd()
  2020-07-06 20:17 ` [PATCH v6 2/7] fs: Move __scm_install_fd() to __receive_fd() Kees Cook
@ 2020-07-07  6:49   ` Christoph Hellwig
  2020-07-07 11:45   ` Christian Brauner
  1 sibling, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2020-07-07  6:49 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, Sargun Dhillon, Christian Brauner, Tycho Andersen,
	David Laight, Christoph Hellwig, David S. Miller, Jakub Kicinski,
	Alexander Viro, Aleksa Sarai, Matt Denton, Jann Horn,
	Chris Palmer, Robert Sesek, Giuseppe Scrivano,
	Greg Kroah-Hartman, Andy Lutomirski, Will Drewry, Shuah Khan,
	netdev, containers, linux-api, linux-fsdevel, linux-kselftest

On Mon, Jul 06, 2020 at 01:17:15PM -0700, Kees Cook wrote:
> In preparation for users of the "install a received file" logic outside
> of net/ (pidfd and seccomp), relocate and rename __scm_install_fd() from
> net/core/scm.c to __receive_fd() in fs/file.c, and provide a wrapper
> named receive_fd_user(), as future patches will change the interface
> to __receive_fd().

Why the __ prefix if there is no receive_fd()?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 3/7] fs: Add receive_fd() wrapper for __receive_fd()
  2020-07-06 20:17 ` [PATCH v6 3/7] fs: Add receive_fd() wrapper for __receive_fd() Kees Cook
@ 2020-07-07  6:49   ` Christoph Hellwig
  2020-07-07 11:49   ` Christian Brauner
  1 sibling, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2020-07-07  6:49 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, Sargun Dhillon, Christian Brauner, Tycho Andersen,
	David Laight, Christoph Hellwig, David S. Miller, Jakub Kicinski,
	Alexander Viro, Aleksa Sarai, Matt Denton, Jann Horn,
	Chris Palmer, Robert Sesek, Giuseppe Scrivano,
	Greg Kroah-Hartman, Andy Lutomirski, Will Drewry, Shuah Khan,
	netdev, containers, linux-api, linux-fsdevel, linux-kselftest

On Mon, Jul 06, 2020 at 01:17:16PM -0700, Kees Cook wrote:
> For both pidfd and seccomp, the __user pointer is not used. Update
> __receive_fd() to make writing to ufd optional via a NULL check. However,
> for the receive_fd_user() wrapper, ufd is NULL checked so an -EFAULT
> can be returned to avoid changing the SCM_RIGHTS interface behavior. Add
> new wrapper receive_fd() for pidfd and seccomp that does not use the ufd
> argument. For the new helper, the allocated fd needs to be returned on
> success. Update the existing callers to handle it.

Oh here we go, I'll take the last comment back.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 1/7] net/scm: Regularize compat handling of scm_detach_fds()
  2020-07-06 20:17 ` [PATCH v6 1/7] net/scm: Regularize compat handling of scm_detach_fds() Kees Cook
@ 2020-07-07 11:41   ` Christian Brauner
  2020-07-08 23:46     ` Kees Cook
  0 siblings, 1 reply; 26+ messages in thread
From: Christian Brauner @ 2020-07-07 11:41 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, Sargun Dhillon, Christian Brauner, Tycho Andersen,
	David Laight, Christoph Hellwig, David S. Miller, Jakub Kicinski,
	Alexander Viro, Aleksa Sarai, Matt Denton, Jann Horn,
	Chris Palmer, Robert Sesek, Giuseppe Scrivano,
	Greg Kroah-Hartman, Andy Lutomirski, Will Drewry, Shuah Khan,
	netdev, containers, linux-api, linux-fsdevel, linux-kselftest

On Mon, Jul 06, 2020 at 01:17:14PM -0700, Kees Cook wrote:
> Duplicate the cleanups from commit 2618d530dd8b ("net/scm: cleanup
> scm_detach_fds") into the compat code.
> 
> Move the check added in commit 1f466e1f15cf ("net: cleanly handle kernel
> vs user buffers for ->msg_control") to before the compat call, even
> though it should be impossible for an in-kernel call to also be compat.
> 
> Correct the int "flags" argument to unsigned int to match fd_install()
> and similar APIs.
> 
> Regularize any remaining differences, including a whitespace issue,
> a checkpatch warning, and add the check from commit 6900317f5eff ("net,
> scm: fix PaX detected msg_controllen overflow in scm_detach_fds") which
> fixed an overflow unique to 64-bit. To avoid confusion when comparing
> the compat handler to the native handler, just include the same check
> in the compat handler.
> 
> Fixes: 48a87cc26c13 ("net: netprio: fd passed in SCM_RIGHTS datagram not set correctly")
> Fixes: d84295067fc7 ("net: net_cls: fd passed in SCM_RIGHTS datagram not set correctly")
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---

Thanks. Just a comment below.
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>

>  include/net/scm.h |  1 +
>  net/compat.c      | 55 +++++++++++++++++++++--------------------------
>  net/core/scm.c    | 18 ++++++++--------
>  3 files changed, 35 insertions(+), 39 deletions(-)
> 
> diff --git a/include/net/scm.h b/include/net/scm.h
> index 1ce365f4c256..581a94d6c613 100644
> --- a/include/net/scm.h
> +++ b/include/net/scm.h
> @@ -37,6 +37,7 @@ struct scm_cookie {
>  #endif
>  };
>  
> +int __scm_install_fd(struct file *file, int __user *ufd, unsigned int o_flags);
>  void scm_detach_fds(struct msghdr *msg, struct scm_cookie *scm);
>  void scm_detach_fds_compat(struct msghdr *msg, struct scm_cookie *scm);
>  int __scm_send(struct socket *sock, struct msghdr *msg, struct scm_cookie *scm);
> diff --git a/net/compat.c b/net/compat.c
> index 5e3041a2c37d..27d477fdcaa0 100644
> --- a/net/compat.c
> +++ b/net/compat.c
> @@ -281,39 +281,31 @@ int put_cmsg_compat(struct msghdr *kmsg, int level, int type, int len, void *dat
>  	return 0;
>  }
>  
> -void scm_detach_fds_compat(struct msghdr *kmsg, struct scm_cookie *scm)
> +static int scm_max_fds_compat(struct msghdr *msg)
>  {
> -	struct compat_cmsghdr __user *cm = (struct compat_cmsghdr __user *) kmsg->msg_control;
> -	int fdmax = (kmsg->msg_controllen - sizeof(struct compat_cmsghdr)) / sizeof(int);
> -	int fdnum = scm->fp->count;
> -	struct file **fp = scm->fp->fp;
> -	int __user *cmfptr;
> -	int err = 0, i;
> +	if (msg->msg_controllen <= sizeof(struct compat_cmsghdr))
> +		return 0;
> +	return (msg->msg_controllen - sizeof(struct compat_cmsghdr)) / sizeof(int);
> +}
>  
> -	if (fdnum < fdmax)
> -		fdmax = fdnum;
> +void scm_detach_fds_compat(struct msghdr *msg, struct scm_cookie *scm)
> +{
> +	struct compat_cmsghdr __user *cm =
> +		(struct compat_cmsghdr __user *)msg->msg_control;
> +	unsigned int o_flags = (msg->msg_flags & MSG_CMSG_CLOEXEC) ? O_CLOEXEC : 0;
> +	int fdmax = min_t(int, scm_max_fds_compat(msg), scm->fp->count);

Just a note that SCM_RIGHTS fd-sending is limited to 253 (SCM_MAX_FD)
fds so min_t should never ouput > SCM_MAX_FD here afaict.

> +	int __user *cmsg_data = CMSG_USER_DATA(cm);
> +	int err = 0, i;
>  
> -	for (i = 0, cmfptr = (int __user *) CMSG_COMPAT_DATA(cm); i < fdmax; i++, cmfptr++) {
> -		int new_fd;
> -		err = security_file_receive(fp[i]);
> +	for (i = 0; i < fdmax; i++) {
> +		err = __scm_install_fd(scm->fp->fp[i], cmsg_data + i, o_flags);
>  		if (err)
>  			break;
> -		err = get_unused_fd_flags(MSG_CMSG_CLOEXEC & kmsg->msg_flags
> -					  ? O_CLOEXEC : 0);
> -		if (err < 0)
> -			break;
> -		new_fd = err;
> -		err = put_user(new_fd, cmfptr);
> -		if (err) {
> -			put_unused_fd(new_fd);
> -			break;
> -		}
> -		/* Bump the usage count and install the file. */
> -		fd_install(new_fd, get_file(fp[i]));
>  	}
>  
>  	if (i > 0) {
>  		int cmlen = CMSG_COMPAT_LEN(i * sizeof(int));
> +
>  		err = put_user(SOL_SOCKET, &cm->cmsg_level);
>  		if (!err)
>  			err = put_user(SCM_RIGHTS, &cm->cmsg_type);
> @@ -321,16 +313,19 @@ void scm_detach_fds_compat(struct msghdr *kmsg, struct scm_cookie *scm)
>  			err = put_user(cmlen, &cm->cmsg_len);
>  		if (!err) {
>  			cmlen = CMSG_COMPAT_SPACE(i * sizeof(int));
> -			kmsg->msg_control += cmlen;
> -			kmsg->msg_controllen -= cmlen;
> +			if (msg->msg_controllen < cmlen)
> +				cmlen = msg->msg_controllen;
> +			msg->msg_control += cmlen;
> +			msg->msg_controllen -= cmlen;
>  		}
>  	}
> -	if (i < fdnum)
> -		kmsg->msg_flags |= MSG_CTRUNC;
> +
> +	if (i < scm->fp->count || (scm->fp->count && fdmax <= 0))

I think fdmax can't be < 0 after your changes? scm_max_fds() guarantees
that fdmax is always >= 0 and min_t() guarantees that fdmax <= scm->fp->count.
So the check should technically be :)

if (i < scm->fp->count || scm->fp->count && fdmax == 0)

> +		msg->msg_flags |= MSG_CTRUNC;
>  
>  	/*
> -	 * All of the files that fit in the message have had their
> -	 * usage counts incremented, so we just free the list.
> +	 * All of the files that fit in the message have had their usage counts
> +	 * incremented, so we just free the list.
>  	 */
>  	__scm_destroy(scm);
>  }
> diff --git a/net/core/scm.c b/net/core/scm.c
> index 875df1c2989d..6151678c73ed 100644
> --- a/net/core/scm.c
> +++ b/net/core/scm.c
> @@ -280,7 +280,7 @@ void put_cmsg_scm_timestamping(struct msghdr *msg, struct scm_timestamping_inter
>  }
>  EXPORT_SYMBOL(put_cmsg_scm_timestamping);
>  
> -static int __scm_install_fd(struct file *file, int __user *ufd, int o_flags)
> +int __scm_install_fd(struct file *file, int __user *ufd, unsigned int o_flags)
>  {
>  	struct socket *sock;
>  	int new_fd;
> @@ -319,29 +319,29 @@ static int scm_max_fds(struct msghdr *msg)
>  
>  void scm_detach_fds(struct msghdr *msg, struct scm_cookie *scm)
>  {
> -	struct cmsghdr __user *cm
> -		= (__force struct cmsghdr __user*)msg->msg_control;
> -	int o_flags = (msg->msg_flags & MSG_CMSG_CLOEXEC) ? O_CLOEXEC : 0;
> +	struct cmsghdr __user *cm =
> +		(__force struct cmsghdr __user *)msg->msg_control;
> +	unsigned int o_flags = (msg->msg_flags & MSG_CMSG_CLOEXEC) ? O_CLOEXEC : 0;
>  	int fdmax = min_t(int, scm_max_fds(msg), scm->fp->count);
>  	int __user *cmsg_data = CMSG_USER_DATA(cm);
>  	int err = 0, i;
>  
> +	/* no use for FD passing from kernel space callers */
> +	if (WARN_ON_ONCE(!msg->msg_control_is_user))
> +		return;
> +
>  	if (msg->msg_flags & MSG_CMSG_COMPAT) {
>  		scm_detach_fds_compat(msg, scm);
>  		return;
>  	}
>  
> -	/* no use for FD passing from kernel space callers */
> -	if (WARN_ON_ONCE(!msg->msg_control_is_user))
> -		return;
> -
>  	for (i = 0; i < fdmax; i++) {
>  		err = __scm_install_fd(scm->fp->fp[i], cmsg_data + i, o_flags);
>  		if (err)
>  			break;
>  	}
>  
> -	if (i > 0)  {
> +	if (i > 0) {
>  		int cmlen = CMSG_LEN(i * sizeof(int));
>  
>  		err = put_user(SOL_SOCKET, &cm->cmsg_level);
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 2/7] fs: Move __scm_install_fd() to __receive_fd()
  2020-07-06 20:17 ` [PATCH v6 2/7] fs: Move __scm_install_fd() to __receive_fd() Kees Cook
  2020-07-07  6:49   ` Christoph Hellwig
@ 2020-07-07 11:45   ` Christian Brauner
  1 sibling, 0 replies; 26+ messages in thread
From: Christian Brauner @ 2020-07-07 11:45 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, Sargun Dhillon, Christian Brauner, Tycho Andersen,
	David Laight, Christoph Hellwig, David S. Miller, Jakub Kicinski,
	Alexander Viro, Aleksa Sarai, Matt Denton, Jann Horn,
	Chris Palmer, Robert Sesek, Giuseppe Scrivano,
	Greg Kroah-Hartman, Andy Lutomirski, Will Drewry, Shuah Khan,
	netdev, containers, linux-api, linux-fsdevel, linux-kselftest

On Mon, Jul 06, 2020 at 01:17:15PM -0700, Kees Cook wrote:
> In preparation for users of the "install a received file" logic outside
> of net/ (pidfd and seccomp), relocate and rename __scm_install_fd() from
> net/core/scm.c to __receive_fd() in fs/file.c, and provide a wrapper
> named receive_fd_user(), as future patches will change the interface
> to __receive_fd().
> 
> Reviewed-by: Sargun Dhillon <sargun@sargun.me>
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---

Thanks!
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 3/7] fs: Add receive_fd() wrapper for __receive_fd()
  2020-07-06 20:17 ` [PATCH v6 3/7] fs: Add receive_fd() wrapper for __receive_fd() Kees Cook
  2020-07-07  6:49   ` Christoph Hellwig
@ 2020-07-07 11:49   ` Christian Brauner
  2020-07-08 23:48     ` Kees Cook
  1 sibling, 1 reply; 26+ messages in thread
From: Christian Brauner @ 2020-07-07 11:49 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, Sargun Dhillon, Christian Brauner, Tycho Andersen,
	David Laight, Christoph Hellwig, David S. Miller, Jakub Kicinski,
	Alexander Viro, Aleksa Sarai, Matt Denton, Jann Horn,
	Chris Palmer, Robert Sesek, Giuseppe Scrivano,
	Greg Kroah-Hartman, Andy Lutomirski, Will Drewry, Shuah Khan,
	netdev, containers, linux-api, linux-fsdevel, linux-kselftest

On Mon, Jul 06, 2020 at 01:17:16PM -0700, Kees Cook wrote:
> For both pidfd and seccomp, the __user pointer is not used. Update
> __receive_fd() to make writing to ufd optional via a NULL check. However,
> for the receive_fd_user() wrapper, ufd is NULL checked so an -EFAULT
> can be returned to avoid changing the SCM_RIGHTS interface behavior. Add
> new wrapper receive_fd() for pidfd and seccomp that does not use the ufd
> argument. For the new helper, the allocated fd needs to be returned on
> success. Update the existing callers to handle it.
> 
> Reviewed-by: Sargun Dhillon <sargun@sargun.me>
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---

Hm, I'm not sure why 2/7 and 3/7 aren't just one patch but ok. :)
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 4/7] pidfd: Replace open-coded partial receive_fd()
  2020-07-06 20:17 ` [PATCH v6 4/7] pidfd: Replace open-coded partial receive_fd() Kees Cook
@ 2020-07-07 12:22   ` Christian Brauner
  2020-07-08 23:49     ` Kees Cook
  2020-07-09  6:35     ` Kees Cook
  0 siblings, 2 replies; 26+ messages in thread
From: Christian Brauner @ 2020-07-07 12:22 UTC (permalink / raw)
  To: Kees Cook, Christoph Hellwig
  Cc: linux-kernel, Sargun Dhillon, Christian Brauner, Tycho Andersen,
	David Laight, David S. Miller, Jakub Kicinski, Alexander Viro,
	Aleksa Sarai, Matt Denton, Jann Horn, Chris Palmer, Robert Sesek,
	Giuseppe Scrivano, Greg Kroah-Hartman, Andy Lutomirski,
	Will Drewry, Shuah Khan, netdev, containers, linux-api,
	linux-fsdevel, linux-kselftest

On Mon, Jul 06, 2020 at 01:17:17PM -0700, Kees Cook wrote:
> The sock counting (sock_update_netprioidx() and sock_update_classid()) was
> missing from pidfd's implementation of received fd installation. Replace
> the open-coded version with a call to the new receive_fd()
> helper.
> 
> Thanks to Vamshi K Sthambamkadi <vamshi.k.sthambamkadi@gmail.com> for
> catching a missed fput() in an earlier version of this patch.
> 
> Fixes: 8649c322f75c ("pid: Implement pidfd_getfd syscall")
> Reviewed-by: Sargun Dhillon <sargun@sargun.me>
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---

Thanks!
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>

Christoph, Kees,

So while the patch is correct it leaves 5.6 and 5.7 with a bug in the
pidfd_getfd() implementation and that just doesn't seem right. I'm
wondering whether we should introduce:

void sock_update(struct file *file)
{
	struct socket *sock;
	int error;

	sock = sock_from_file(file, &error);
	if (sock) {
		sock_update_netprioidx(&sock->sk->sk_cgrp_data);
		sock_update_classid(&sock->sk->sk_cgrp_data);
	}
}

and switch pidfd_getfd() over to:

diff --git a/kernel/pid.c b/kernel/pid.c
index f1496b757162..c26bba822be3 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -642,10 +642,12 @@ static int pidfd_getfd(struct pid *pid, int fd)
        }

        ret = get_unused_fd_flags(O_CLOEXEC);
-       if (ret < 0)
+       if (ret < 0) {
                fput(file);
-       else
+       } else {
+               sock_update(file);
                fd_install(ret, file);
+       }

        return ret;
 }

first thing in the series and then all of the other patches on top of it
so that we can Cc stable for this and that can get it backported to 5.6,
5.7, and 5.8.

Alternatively, I can make this a separate bugfix patch series which I'll
send upstream soonish. Or we have specific patches just for 5.6, 5.7,
and 5.8. Thoughts?

Thanks!
Christian

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 5/7] fs: Expand __receive_fd() to accept existing fd
  2020-07-06 20:17 ` [PATCH v6 5/7] fs: Expand __receive_fd() to accept existing fd Kees Cook
@ 2020-07-07 12:38   ` Christian Brauner
  2020-07-08 23:52     ` Kees Cook
  0 siblings, 1 reply; 26+ messages in thread
From: Christian Brauner @ 2020-07-07 12:38 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, Sargun Dhillon, Christian Brauner, Tycho Andersen,
	David Laight, Christoph Hellwig, David S. Miller, Jakub Kicinski,
	Alexander Viro, Aleksa Sarai, Matt Denton, Jann Horn,
	Chris Palmer, Robert Sesek, Giuseppe Scrivano,
	Greg Kroah-Hartman, Andy Lutomirski, Will Drewry, Shuah Khan,
	netdev, containers, linux-api, linux-fsdevel, linux-kselftest

On Mon, Jul 06, 2020 at 01:17:18PM -0700, Kees Cook wrote:
> Expand __receive_fd() with support for replace_fd() for the coming seccomp
> "addfd" ioctl(). Add new wrapper receive_fd_replace() for the new behavior
> and update existing wrappers to retain old behavior.
> 
> Thanks to Colin Ian King <colin.king@canonical.com> for pointing out an
> uninitialized variable exposure in an earlier version of this patch.
> 
> Reviewed-by: Sargun Dhillon <sargun@sargun.me>
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---

Thanks!
(One tiny-nit below.)
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>

>  fs/file.c            | 24 ++++++++++++++++++------
>  include/linux/file.h | 10 +++++++---
>  2 files changed, 25 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/file.c b/fs/file.c
> index 0efdcf413210..11313ff36802 100644
> --- a/fs/file.c
> +++ b/fs/file.c
> @@ -937,6 +937,7 @@ int replace_fd(unsigned fd, struct file *file, unsigned flags)
>  /**
>   * __receive_fd() - Install received file into file descriptor table
>   *
> + * @fd: fd to install into (if negative, a new fd will be allocated)
>   * @file: struct file that was received from another process
>   * @ufd: __user pointer to write new fd number to
>   * @o_flags: the O_* flags to apply to the new fd entry
> @@ -950,7 +951,7 @@ int replace_fd(unsigned fd, struct file *file, unsigned flags)
>   *
>   * Returns newly install fd or -ve on error.
>   */
> -int __receive_fd(struct file *file, int __user *ufd, unsigned int o_flags)
> +int __receive_fd(int fd, struct file *file, int __user *ufd, unsigned int o_flags)
>  {
>  	struct socket *sock;
>  	int new_fd;
> @@ -960,18 +961,30 @@ int __receive_fd(struct file *file, int __user *ufd, unsigned int o_flags)
>  	if (error)
>  		return error;
>  
> -	new_fd = get_unused_fd_flags(o_flags);
> -	if (new_fd < 0)
> -		return new_fd;
> +	if (fd < 0) {
> +		new_fd = get_unused_fd_flags(o_flags);
> +		if (new_fd < 0)
> +			return new_fd;
> +	} else
> +		new_fd = fd;

This is nitpicky but coding style technically wants us to use braces
around both branches if one of them requires them. ;)

>  
>  	if (ufd) {
>  		error = put_user(new_fd, ufd);
>  		if (error) {
> -			put_unused_fd(new_fd);
> +			if (fd < 0)
> +				put_unused_fd(new_fd);
>  			return error;
>  		}
>  	}
>  
> +	if (fd < 0)
> +		fd_install(new_fd, get_file(file));
> +	else {
> +		error = replace_fd(new_fd, file, o_flags);
> +		if (error)
> +			return error;
> +	}
> +
>  	/*
>  	 * Bump the usage count and install the file. The resulting value of
>  	 * "error" is ignored here since we only need to take action when
> @@ -982,7 +995,6 @@ int __receive_fd(struct file *file, int __user *ufd, unsigned int o_flags)
>  		sock_update_netprioidx(&sock->sk->sk_cgrp_data);
>  		sock_update_classid(&sock->sk->sk_cgrp_data);
>  	}
> -	fd_install(new_fd, get_file(file));
>  	return new_fd;
>  }
>  
> diff --git a/include/linux/file.h b/include/linux/file.h
> index d9fee9f5c8da..225982792fa2 100644
> --- a/include/linux/file.h
> +++ b/include/linux/file.h
> @@ -92,18 +92,22 @@ extern void put_unused_fd(unsigned int fd);
>  
>  extern void fd_install(unsigned int fd, struct file *file);
>  
> -extern int __receive_fd(struct file *file, int __user *ufd,
> +extern int __receive_fd(int fd, struct file *file, int __user *ufd,
>  			unsigned int o_flags);
>  static inline int receive_fd_user(struct file *file, int __user *ufd,
>  				  unsigned int o_flags)
>  {
>  	if (ufd == NULL)
>  		return -EFAULT;
> -	return __receive_fd(file, ufd, o_flags);
> +	return __receive_fd(-1, file, ufd, o_flags);
>  }
>  static inline int receive_fd(struct file *file, unsigned int o_flags)
>  {
> -	return __receive_fd(file, NULL, o_flags);
> +	return __receive_fd(-1, file, NULL, o_flags);
> +}
> +static inline int receive_fd_replace(int fd, struct file *file, unsigned int o_flags)
> +{
> +	return __receive_fd(fd, file, NULL, o_flags);
>  }
>  
>  extern void flush_delayed_fput(void);
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 6/7] seccomp: Introduce addfd ioctl to seccomp user notifier
  2020-07-06 20:17 ` [PATCH v6 6/7] seccomp: Introduce addfd ioctl to seccomp user notifier Kees Cook
@ 2020-07-07 13:30   ` Christian Brauner
  2020-07-09  6:12     ` Kees Cook
  2020-07-09  6:17     ` [PATCH v6.1 " Kees Cook
  0 siblings, 2 replies; 26+ messages in thread
From: Christian Brauner @ 2020-07-07 13:30 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, Sargun Dhillon, Matt Denton, Christian Brauner,
	Tycho Andersen, David Laight, Christoph Hellwig, David S. Miller,
	Jakub Kicinski, Alexander Viro, Aleksa Sarai, Jann Horn,
	Chris Palmer, Robert Sesek, Giuseppe Scrivano,
	Greg Kroah-Hartman, Andy Lutomirski, Will Drewry, Shuah Khan,
	netdev, containers, linux-api, linux-fsdevel, linux-kselftest

On Mon, Jul 06, 2020 at 01:17:19PM -0700, Kees Cook wrote:
> From: Sargun Dhillon <sargun@sargun.me>
> 
> This adds a seccomp notifier ioctl which allows for the listener to
> "add" file descriptors to a process which originated a seccomp user
> notification. This allows calls like mount, and mknod to be "implemented",

This either seems like the wrong description as we're intercepting
mount() and mknod() already and as far as I can tell, addfd is not
needed in any way for them. addfd is targeted at system calls that
themselves would install a file descriptor in the intercepted task, e.g.
open() and accept().

> as the return value, and the arguments are data in memory. On the other
> hand, calls like connect can be "implemented" using pidfd_getfd.
> 
> Unfortunately, there are calls which return file descriptors, like
> open, which are vulnerable to ToCToU attacks, and require that the more

Hm, the race is not specific to open though but any syscall and
specifically its bad for those syscalls with pointer arguments where the
supervisor copies out the memory, inspects it, and bases its decision on
them.

> privileged supervisor can inspect the argument, and perform the syscall
> on behalf of the process generating the notification. This allows the
> file descriptor generated from that open call to be returned to the
> calling process.

Hm, maybe change that description to sm like:

The seccomp notifier which was introduced a while back allows for
syscall supervision. It is often used in settings where a supervisising
task emulates syscalls for a supervised task in userspace either to
further restrict it's syscall capabilities or to circumvent kernel
enforced restrictions the supervisor deems safe to lift.
While the seccomp notifier allows for the interception of any syscall
only a certain set of syscalls can be correctly emulated. Over the last
cycles we have done work to reduce the set of syscalls which can't be
emulated with the addition of pidfd_getfd(). With this syscall we are
now able to e.g. intercept syscalls that require the supervisor to
operate on file descriptors of the supervisee such as connect().

However, syscalls that cause new file descriptors to be installed in the
supervisor can currently not be correctly emulated since there is no way
for the supervisor to inject file descriptors into the supervisee. The
new addfd ioctl removes this restriction by allowing the supervisor to
install file descriptors into the intercepted task. By implementing this
feature via seccomp the supervisor effectively instructs the supervisee
to install a set of file descriptors into its own file descriptor table.
With the new addfd extension it is possible to e.g. intercept syscalls
such as open() or accept(). The addfd ioctl allows allows to replace
existing file descriptors in the supervisee. One use-case would be to
replace stdout and stderr of a supervisee with a log file descriptor to
record the supervisees output.

> 
> In addition, there is functionality to allow for replacement of specific
> file descriptors, following dup2-like semantics.
> 
> The ioctl handling is based on the discussions[1] of how Extensible
> Arguments should interact with ioctls. Instead of building size into
> the addfd structure, make it a function of the ioctl command (which
> is how sizes are normally passed to ioctls). To support forward and
> backward compatibility, just mask out the direction and size, and match
> everything. The size (and any future direction) checks are done along
> with copy_struct_from_user() logic.
> 
> As a note, the seccomp_notif_addfd structure is laid out based on 8-byte
> alignment without requiring packing as there have been packing issues
> with uapi highlighted before[1][2]. Although we could overload the
> newfd field and use -1 to indicate that it is not to be used, doing
> so requires changing the size of the fd field, and introduces struct
> packing complexity.
> 
> [1]: https://lore.kernel.org/lkml/87o8w9bcaf.fsf@mid.deneb.enyo.de/
> [2]: https://lore.kernel.org/lkml/a328b91d-fd8f-4f27-b3c2-91a9c45f18c0@rasmusvillemoes.dk/
> [3]: https://lore.kernel.org/lkml/20200612104629.GA15814@ircssh-2.c.rugged-nimbus-611.internal
> 
> Suggested-by: Matt Denton <mpdenton@google.com>
> Link: https://lore.kernel.org/r/20200603011044.7972-4-sargun@sargun.me
> Signed-off-by: Sargun Dhillon <sargun@sargun.me>
> Co-developed-by: Kees Cook <keescook@chromium.org>
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
>  include/uapi/linux/seccomp.h |  22 +++++
>  kernel/seccomp.c             | 172 ++++++++++++++++++++++++++++++++++-
>  2 files changed, 193 insertions(+), 1 deletion(-)
> 
> diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
> index 965290f7dcc2..6ba18b82a02e 100644
> --- a/include/uapi/linux/seccomp.h
> +++ b/include/uapi/linux/seccomp.h
> @@ -113,6 +113,25 @@ struct seccomp_notif_resp {
>  	__u32 flags;
>  };
>  
> +/* valid flags for seccomp_notif_addfd */
> +#define SECCOMP_ADDFD_FLAG_SETFD	(1UL << 0) /* Specify remote fd */
> +
> +/**
> + * struct seccomp_notif_addfd
> + * @id: The ID of the seccomp notification
> + * @flags: SECCOMP_ADDFD_FLAG_*
> + * @srcfd: The local fd number
> + * @newfd: Optional remote FD number if SETFD option is set, otherwise 0.
> + * @newfd_flags: The O_* flags the remote FD should have applied
> + */
> +struct seccomp_notif_addfd {
> +	__u64 id;
> +	__u32 flags;
> +	__u32 srcfd;
> +	__u32 newfd;
> +	__u32 newfd_flags;
> +};
> +
>  #define SECCOMP_IOC_MAGIC		'!'
>  #define SECCOMP_IO(nr)			_IO(SECCOMP_IOC_MAGIC, nr)
>  #define SECCOMP_IOR(nr, type)		_IOR(SECCOMP_IOC_MAGIC, nr, type)
> @@ -124,5 +143,8 @@ struct seccomp_notif_resp {
>  #define SECCOMP_IOCTL_NOTIF_SEND	SECCOMP_IOWR(1,	\
>  						struct seccomp_notif_resp)
>  #define SECCOMP_IOCTL_NOTIF_ID_VALID	SECCOMP_IOW(2, __u64)
> +/* On success, the return value is the remote process's added fd number */
> +#define SECCOMP_IOCTL_NOTIF_ADDFD	SECCOMP_IOW(3, \
> +						struct seccomp_notif_addfd)
>  
>  #endif /* _UAPI_LINUX_SECCOMP_H */
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index 866a432cd746..ee314036e429 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -87,10 +87,42 @@ struct seccomp_knotif {
>  	long val;
>  	u32 flags;
>  
> -	/* Signals when this has entered SECCOMP_NOTIFY_REPLIED */
> +	/*
> +	 * Signals when this has changed states, such as the listener
> +	 * dying, a new seccomp addfd message, or changing to REPLIED
> +	 */
>  	struct completion ready;
>  
>  	struct list_head list;
> +
> +	/* outstanding addfd requests */
> +	struct list_head addfd;
> +};
> +
> +/**
> + * struct seccomp_kaddfd - container for seccomp_addfd ioctl messages
> + *
> + * @file: A reference to the file to install in the other task
> + * @fd: The fd number to install it at. If the fd number is -1, it means the
> + *      installing process should allocate the fd as normal.
> + * @flags: The flags for the new file descriptor. At the moment, only O_CLOEXEC
> + *         is allowed.
> + * @ret: The return value of the installing process. It is set to the fd num
> + *       upon success (>= 0).
> + * @completion: Indicates that the installing process has completed fd
> + *              installation, or gone away (either due to successful
> + *              reply, or signal)
> + *
> + */
> +struct seccomp_kaddfd {
> +	struct file *file;
> +	int fd;
> +	unsigned int flags;
> +
> +	/* To only be set on reply */
> +	int ret;
> +	struct completion completion;
> +	struct list_head list;
>  };
>  
>  /**
> @@ -793,6 +825,17 @@ static u64 seccomp_next_notify_id(struct seccomp_filter *filter)
>  	return filter->notif->next_id++;
>  }
>  
> +static void seccomp_handle_addfd(struct seccomp_kaddfd *addfd)
> +{
> +	/*
> +	 * Remove the notification, and reset the list pointers, indicating
> +	 * that it has been handled.
> +	 */
> +	list_del_init(&addfd->list);
> +	addfd->ret = receive_fd_replace(addfd->fd, addfd->file, addfd->flags);
> +	complete(&addfd->completion);
> +}
> +
>  static int seccomp_do_user_notification(int this_syscall,
>  					struct seccomp_filter *match,
>  					const struct seccomp_data *sd)
> @@ -801,6 +844,7 @@ static int seccomp_do_user_notification(int this_syscall,
>  	u32 flags = 0;
>  	long ret = 0;
>  	struct seccomp_knotif n = {};
> +	struct seccomp_kaddfd *addfd, *tmp;
>  
>  	mutex_lock(&match->notify_lock);
>  	err = -ENOSYS;
> @@ -813,6 +857,7 @@ static int seccomp_do_user_notification(int this_syscall,
>  	n.id = seccomp_next_notify_id(match);
>  	init_completion(&n.ready);
>  	list_add(&n.list, &match->notif->notifications);
> +	INIT_LIST_HEAD(&n.addfd);
>  
>  	up(&match->notif->request);
>  	wake_up_poll(&match->wqh, EPOLLIN | EPOLLRDNORM);
> @@ -821,14 +866,31 @@ static int seccomp_do_user_notification(int this_syscall,
>  	/*
>  	 * This is where we wait for a reply from userspace.
>  	 */
> +wait:
>  	err = wait_for_completion_interruptible(&n.ready);
>  	mutex_lock(&match->notify_lock);
>  	if (err == 0) {
> +		/* Check if we were woken up by a addfd message */
> +		addfd = list_first_entry_or_null(&n.addfd,
> +						 struct seccomp_kaddfd, list);
> +		if (addfd && n.state != SECCOMP_NOTIFY_REPLIED) {
> +			seccomp_handle_addfd(addfd);
> +			mutex_unlock(&match->notify_lock);
> +			goto wait;
> +		}
>  		ret = n.val;
>  		err = n.error;
>  		flags = n.flags;
>  	}
>  
> +	/* If there were any pending addfd calls, clear them out */
> +	list_for_each_entry_safe(addfd, tmp, &n.addfd, list) {
> +		/* The process went away before we got a chance to handle it */
> +		addfd->ret = -ESRCH;
> +		list_del_init(&addfd->list);
> +		complete(&addfd->completion);
> +	}
> +
>  	/*
>  	 * Note that it's possible the listener died in between the time when
>  	 * we were notified of a respons (or a signal) and when we were able to
> @@ -1069,6 +1131,11 @@ static int seccomp_notify_release(struct inode *inode, struct file *file)
>  		knotif->error = -ENOSYS;
>  		knotif->val = 0;
>  
> +		/*
> +		 * We do not need to wake up any pending addfd messages, as
> +		 * the notifier will do that for us, as this just looks
> +		 * like a standard reply.
> +		 */
>  		complete(&knotif->ready);
>  	}
>  
> @@ -1233,12 +1300,107 @@ static long seccomp_notify_id_valid(struct seccomp_filter *filter,
>  	return ret;
>  }
>  
> +static long seccomp_notify_addfd(struct seccomp_filter *filter,
> +				 struct seccomp_notif_addfd __user *uaddfd,
> +				 unsigned int size)
> +{
> +	struct seccomp_notif_addfd addfd;
> +	struct seccomp_knotif *knotif;
> +	struct seccomp_kaddfd kaddfd;
> +	int ret;
> +
> +	/* 24 is original sizeof(struct seccomp_notif_addfd) */
> +	if (size < 24 || size >= PAGE_SIZE)
> +		return -EINVAL;

Hm, so maybe add the following:

#define SECCOMP_NOTIFY_ADDFD_VER0 24
#define SECCOMP_NOTIFY_ADDFD_LATEST SECCOMP_NOTIFY_ADDFD_VER0

and then place:

BUILD_BUG_ON(sizeof(struct seccomp_notify_addfd) < SECCOMP_NOTIFY_ADDFD_VER0);
BUILD_BUG_ON(sizeof(struct open_how) != SECCOMP_NOTIFY_ADDFD_LATEST);

somewhere which is what we do for clone3(), openat2() and others to
catch build-time nonsense.

include/uapi/linux/perf_event.h:#define PERF_ATTR_SIZE_VER0     64      /* sizeof first published struct */
include/uapi/linux/sched.h:#define CLONE_ARGS_SIZE_VER0 64 /* sizeof first published struct */
include/uapi/linux/sched/types.h:#define SCHED_ATTR_SIZE_VER0   48      /* sizeof first published struct */
include/linux/fcntl.h:#define OPEN_HOW_SIZE_VER0        24 /* sizeof first published struct */
include/linux/fcntl.h:#define OPEN_HOW_SIZE_LATEST      OPEN_HOW_SIZE_VER0

> +
> +	ret = copy_struct_from_user(&addfd, sizeof(addfd), uaddfd, size);
> +	if (ret)
> +		return ret;
> +
> +	if (addfd.newfd_flags & ~O_CLOEXEC)
> +		return -EINVAL;
> +
> +	if (addfd.flags & ~SECCOMP_ADDFD_FLAG_SETFD)
> +		return -EINVAL;
> +
> +	if (addfd.newfd && !(addfd.flags & SECCOMP_ADDFD_FLAG_SETFD))
> +		return -EINVAL;
> +
> +	kaddfd.file = fget(addfd.srcfd);
> +	if (!kaddfd.file)
> +		return -EBADF;
> +
> +	kaddfd.flags = addfd.newfd_flags;
> +	kaddfd.fd = (addfd.flags & SECCOMP_ADDFD_FLAG_SETFD) ?
> +		    addfd.newfd : -1;
> +	init_completion(&kaddfd.completion);
> +
> +	ret = mutex_lock_interruptible(&filter->notify_lock);
> +	if (ret < 0)
> +		goto out;
> +
> +	knotif = find_notification(filter, addfd.id);
> +	if (!knotif) {
> +		ret = -ENOENT;
> +		goto out_unlock;
> +	}
> +
> +	/*
> +	 * We do not want to allow for FD injection to occur before the
> +	 * notification has been picked up by a userspace handler, or after
> +	 * the notification has been replied to.
> +	 */
> +	if (knotif->state != SECCOMP_NOTIFY_SENT) {
> +		ret = -EINPROGRESS;
> +		goto out_unlock;
> +	}
> +
> +	list_add(&kaddfd.list, &knotif->addfd);
> +	complete(&knotif->ready);
> +	mutex_unlock(&filter->notify_lock);
> +
> +	/* Now we wait for it to be processed or be interrupted */
> +	ret = wait_for_completion_interruptible(&kaddfd.completion);
> +	if (ret == 0) {
> +		/*
> +		 * We had a successful completion. The other side has already
> +		 * removed us from the addfd queue, and
> +		 * wait_for_completion_interruptible has a memory barrier upon
> +		 * success that lets us read this value directly without
> +		 * locking.
> +		 */
> +		ret = kaddfd.ret;
> +		goto out;
> +	}
> +
> +	mutex_lock(&filter->notify_lock);
> +	/*
> +	 * Even though we were woken up by a signal and not a successful
> +	 * completion, a completion may have happened in the mean time.
> +	 *
> +	 * We need to check again if the addfd request has been handled,
> +	 * and if not, we will remove it from the queue.
> +	 */
> +	if (list_empty(&kaddfd.list))
> +		ret = kaddfd.ret;
> +	else
> +		list_del(&kaddfd.list);
> +
> +out_unlock:
> +	mutex_unlock(&filter->notify_lock);
> +out:
> +	fput(kaddfd.file);
> +
> +	return ret;
> +}
> +
>  static long seccomp_notify_ioctl(struct file *file, unsigned int cmd,
>  				 unsigned long arg)
>  {
>  	struct seccomp_filter *filter = file->private_data;
>  	void __user *buf = (void __user *)arg;
>  
> +	/* Fixed-size ioctls */
>  	switch (cmd) {
>  	case SECCOMP_IOCTL_NOTIF_RECV:
>  		return seccomp_notify_recv(filter, buf);
> @@ -1247,9 +1409,17 @@ static long seccomp_notify_ioctl(struct file *file, unsigned int cmd,
>  	case SECCOMP_IOCTL_NOTIF_ID_VALID_WRONG_DIR:
>  	case SECCOMP_IOCTL_NOTIF_ID_VALID:
>  		return seccomp_notify_id_valid(filter, buf);
> +	}
> +
> +	/* Extensible Argument ioctls */
> +#define EA_IOCTL(cmd)	((cmd) & ~(IOC_INOUT | IOCSIZE_MASK))
> +	switch (EA_IOCTL(cmd)) {
> +	case EA_IOCTL(SECCOMP_IOCTL_NOTIF_ADDFD):
> +		return seccomp_notify_addfd(filter, buf, _IOC_SIZE(cmd));
>  	default:
>  		return -EINVAL;
>  	}
> +#undef EA_IOCTL

Why is this undefed? :)

>  }
>  
>  static __poll_t seccomp_notify_poll(struct file *file,
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 1/7] net/scm: Regularize compat handling of scm_detach_fds()
  2020-07-07 11:41   ` Christian Brauner
@ 2020-07-08 23:46     ` Kees Cook
  0 siblings, 0 replies; 26+ messages in thread
From: Kees Cook @ 2020-07-08 23:46 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-kernel, Sargun Dhillon, Christian Brauner, Tycho Andersen,
	David Laight, Christoph Hellwig, David S. Miller, Jakub Kicinski,
	Alexander Viro, Aleksa Sarai, Matt Denton, Jann Horn,
	Chris Palmer, Robert Sesek, Giuseppe Scrivano,
	Greg Kroah-Hartman, Andy Lutomirski, Will Drewry, Shuah Khan,
	netdev, containers, linux-api, linux-fsdevel, linux-kselftest

On Tue, Jul 07, 2020 at 01:41:03PM +0200, Christian Brauner wrote:
> On Mon, Jul 06, 2020 at 01:17:14PM -0700, Kees Cook wrote:
> > Duplicate the cleanups from commit 2618d530dd8b ("net/scm: cleanup
> > scm_detach_fds") into the compat code.
> > 
> > Move the check added in commit 1f466e1f15cf ("net: cleanly handle kernel
> > vs user buffers for ->msg_control") to before the compat call, even
> > though it should be impossible for an in-kernel call to also be compat.
> > 
> > Correct the int "flags" argument to unsigned int to match fd_install()
> > and similar APIs.
> > 
> > Regularize any remaining differences, including a whitespace issue,
> > a checkpatch warning, and add the check from commit 6900317f5eff ("net,
> > scm: fix PaX detected msg_controllen overflow in scm_detach_fds") which
> > fixed an overflow unique to 64-bit. To avoid confusion when comparing
> > the compat handler to the native handler, just include the same check
> > in the compat handler.
> > 
> > Fixes: 48a87cc26c13 ("net: netprio: fd passed in SCM_RIGHTS datagram not set correctly")
> > Fixes: d84295067fc7 ("net: net_cls: fd passed in SCM_RIGHTS datagram not set correctly")
> > Signed-off-by: Kees Cook <keescook@chromium.org>
> > ---
> 
> Thanks. Just a comment below.
> Acked-by: Christian Brauner <christian.brauner@ubuntu.com>

Thanks!

> >  include/net/scm.h |  1 +
> >  net/compat.c      | 55 +++++++++++++++++++++--------------------------
> >  net/core/scm.c    | 18 ++++++++--------
> >  3 files changed, 35 insertions(+), 39 deletions(-)
> > 
> > diff --git a/include/net/scm.h b/include/net/scm.h
> > index 1ce365f4c256..581a94d6c613 100644
> > --- a/include/net/scm.h
> > +++ b/include/net/scm.h
> > @@ -37,6 +37,7 @@ struct scm_cookie {
> >  #endif
> >  };
> >  
> > +int __scm_install_fd(struct file *file, int __user *ufd, unsigned int o_flags);
> >  void scm_detach_fds(struct msghdr *msg, struct scm_cookie *scm);
> >  void scm_detach_fds_compat(struct msghdr *msg, struct scm_cookie *scm);
> >  int __scm_send(struct socket *sock, struct msghdr *msg, struct scm_cookie *scm);
> > diff --git a/net/compat.c b/net/compat.c
> > index 5e3041a2c37d..27d477fdcaa0 100644
> > --- a/net/compat.c
> > +++ b/net/compat.c
> > @@ -281,39 +281,31 @@ int put_cmsg_compat(struct msghdr *kmsg, int level, int type, int len, void *dat
> >  	return 0;
> >  }
> >  
> > -void scm_detach_fds_compat(struct msghdr *kmsg, struct scm_cookie *scm)
> > +static int scm_max_fds_compat(struct msghdr *msg)
> >  {
> > -	struct compat_cmsghdr __user *cm = (struct compat_cmsghdr __user *) kmsg->msg_control;
> > -	int fdmax = (kmsg->msg_controllen - sizeof(struct compat_cmsghdr)) / sizeof(int);
> > -	int fdnum = scm->fp->count;
> > -	struct file **fp = scm->fp->fp;
> > -	int __user *cmfptr;
> > -	int err = 0, i;
> > +	if (msg->msg_controllen <= sizeof(struct compat_cmsghdr))
> > +		return 0;
> > +	return (msg->msg_controllen - sizeof(struct compat_cmsghdr)) / sizeof(int);
> > +}
> >  
> > -	if (fdnum < fdmax)
> > -		fdmax = fdnum;
> > +void scm_detach_fds_compat(struct msghdr *msg, struct scm_cookie *scm)
> > +{
> > +	struct compat_cmsghdr __user *cm =
> > +		(struct compat_cmsghdr __user *)msg->msg_control;
> > +	unsigned int o_flags = (msg->msg_flags & MSG_CMSG_CLOEXEC) ? O_CLOEXEC : 0;
> > +	int fdmax = min_t(int, scm_max_fds_compat(msg), scm->fp->count);
> 
> Just a note that SCM_RIGHTS fd-sending is limited to 253 (SCM_MAX_FD)
> fds so min_t should never ouput > SCM_MAX_FD here afaict.
> 
> > +	int __user *cmsg_data = CMSG_USER_DATA(cm);
> > +	int err = 0, i;
> >  
> > -	for (i = 0, cmfptr = (int __user *) CMSG_COMPAT_DATA(cm); i < fdmax; i++, cmfptr++) {
> > -		int new_fd;
> > -		err = security_file_receive(fp[i]);
> > +	for (i = 0; i < fdmax; i++) {
> > +		err = __scm_install_fd(scm->fp->fp[i], cmsg_data + i, o_flags);
> >  		if (err)
> >  			break;
> > -		err = get_unused_fd_flags(MSG_CMSG_CLOEXEC & kmsg->msg_flags
> > -					  ? O_CLOEXEC : 0);
> > -		if (err < 0)
> > -			break;
> > -		new_fd = err;
> > -		err = put_user(new_fd, cmfptr);
> > -		if (err) {
> > -			put_unused_fd(new_fd);
> > -			break;
> > -		}
> > -		/* Bump the usage count and install the file. */
> > -		fd_install(new_fd, get_file(fp[i]));
> >  	}
> >  
> >  	if (i > 0) {
> >  		int cmlen = CMSG_COMPAT_LEN(i * sizeof(int));
> > +
> >  		err = put_user(SOL_SOCKET, &cm->cmsg_level);
> >  		if (!err)
> >  			err = put_user(SCM_RIGHTS, &cm->cmsg_type);
> > @@ -321,16 +313,19 @@ void scm_detach_fds_compat(struct msghdr *kmsg, struct scm_cookie *scm)
> >  			err = put_user(cmlen, &cm->cmsg_len);
> >  		if (!err) {
> >  			cmlen = CMSG_COMPAT_SPACE(i * sizeof(int));
> > -			kmsg->msg_control += cmlen;
> > -			kmsg->msg_controllen -= cmlen;
> > +			if (msg->msg_controllen < cmlen)
> > +				cmlen = msg->msg_controllen;
> > +			msg->msg_control += cmlen;
> > +			msg->msg_controllen -= cmlen;
> >  		}
> >  	}
> > -	if (i < fdnum)
> > -		kmsg->msg_flags |= MSG_CTRUNC;
> > +
> > +	if (i < scm->fp->count || (scm->fp->count && fdmax <= 0))
> 
> I think fdmax can't be < 0 after your changes? scm_max_fds() guarantees
> that fdmax is always >= 0 and min_t() guarantees that fdmax <= scm->fp->count.
> So the check should technically be :)

You left our your suggestion! :) But, I think you mean "== 0" ?

The check actually comes from the refactoring from commit 2618d530dd8b
("net/scm: cleanup scm_detach_fds") which I mostly copy/pasted into
compat. However, fdmax is an int, and scm->fp->count is signed so it's
possible fdmax is < 0 (but I don't think count can actually ever be <
0), but I don't want to refactor all the types just to fix this boundary
condition. :)

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 3/7] fs: Add receive_fd() wrapper for __receive_fd()
  2020-07-07 11:49   ` Christian Brauner
@ 2020-07-08 23:48     ` Kees Cook
  0 siblings, 0 replies; 26+ messages in thread
From: Kees Cook @ 2020-07-08 23:48 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-kernel, Sargun Dhillon, Christian Brauner, Tycho Andersen,
	David Laight, Christoph Hellwig, David S. Miller, Jakub Kicinski,
	Alexander Viro, Aleksa Sarai, Matt Denton, Jann Horn,
	Chris Palmer, Robert Sesek, Giuseppe Scrivano,
	Greg Kroah-Hartman, Andy Lutomirski, Will Drewry, Shuah Khan,
	netdev, containers, linux-api, linux-fsdevel, linux-kselftest

On Tue, Jul 07, 2020 at 01:49:23PM +0200, Christian Brauner wrote:
> On Mon, Jul 06, 2020 at 01:17:16PM -0700, Kees Cook wrote:
> > For both pidfd and seccomp, the __user pointer is not used. Update
> > __receive_fd() to make writing to ufd optional via a NULL check. However,
> > for the receive_fd_user() wrapper, ufd is NULL checked so an -EFAULT
> > can be returned to avoid changing the SCM_RIGHTS interface behavior. Add
> > new wrapper receive_fd() for pidfd and seccomp that does not use the ufd
> > argument. For the new helper, the allocated fd needs to be returned on
> > success. Update the existing callers to handle it.
> > 
> > Reviewed-by: Sargun Dhillon <sargun@sargun.me>
> > Signed-off-by: Kees Cook <keescook@chromium.org>
> > ---
> 
> Hm, I'm not sure why 2/7 and 3/7 aren't just one patch but ok. :)

I wanted to do a "clean" move from one source to another without any
behavioral changes first.

> Acked-by: Christian Brauner <christian.brauner@ubuntu.com>

Thanks!

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 4/7] pidfd: Replace open-coded partial receive_fd()
  2020-07-07 12:22   ` Christian Brauner
@ 2020-07-08 23:49     ` Kees Cook
  2020-07-09  6:35     ` Kees Cook
  1 sibling, 0 replies; 26+ messages in thread
From: Kees Cook @ 2020-07-08 23:49 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Christoph Hellwig, linux-kernel, Sargun Dhillon,
	Christian Brauner, Tycho Andersen, David Laight, David S. Miller,
	Jakub Kicinski, Alexander Viro, Aleksa Sarai, Matt Denton,
	Jann Horn, Chris Palmer, Robert Sesek, Giuseppe Scrivano,
	Greg Kroah-Hartman, Andy Lutomirski, Will Drewry, Shuah Khan,
	netdev, containers, linux-api, linux-fsdevel, linux-kselftest

On Tue, Jul 07, 2020 at 02:22:20PM +0200, Christian Brauner wrote:
> On Mon, Jul 06, 2020 at 01:17:17PM -0700, Kees Cook wrote:
> > The sock counting (sock_update_netprioidx() and sock_update_classid()) was
> > missing from pidfd's implementation of received fd installation. Replace
> > the open-coded version with a call to the new receive_fd()
> > helper.
> > 
> > Thanks to Vamshi K Sthambamkadi <vamshi.k.sthambamkadi@gmail.com> for
> > catching a missed fput() in an earlier version of this patch.
> > 
> > Fixes: 8649c322f75c ("pid: Implement pidfd_getfd syscall")
> > Reviewed-by: Sargun Dhillon <sargun@sargun.me>
> > Signed-off-by: Kees Cook <keescook@chromium.org>
> > ---
> 
> Thanks!
> Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
> 
> Christoph, Kees,
> 
> So while the patch is correct it leaves 5.6 and 5.7 with a bug in the
> pidfd_getfd() implementation and that just doesn't seem right. I'm
> wondering whether we should introduce:
> 
> void sock_update(struct file *file)
> {
> 	struct socket *sock;
> 	int error;
> 
> 	sock = sock_from_file(file, &error);
> 	if (sock) {
> 		sock_update_netprioidx(&sock->sk->sk_cgrp_data);
> 		sock_update_classid(&sock->sk->sk_cgrp_data);
> 	}
> }
> 
> and switch pidfd_getfd() over to:
> 
> diff --git a/kernel/pid.c b/kernel/pid.c
> index f1496b757162..c26bba822be3 100644
> --- a/kernel/pid.c
> +++ b/kernel/pid.c
> @@ -642,10 +642,12 @@ static int pidfd_getfd(struct pid *pid, int fd)
>         }
> 
>         ret = get_unused_fd_flags(O_CLOEXEC);
> -       if (ret < 0)
> +       if (ret < 0) {
>                 fput(file);
> -       else
> +       } else {
> +               sock_update(file);
>                 fd_install(ret, file);
> +       }
> 
>         return ret;
>  }
> 
> first thing in the series and then all of the other patches on top of it
> so that we can Cc stable for this and that can get it backported to 5.6,
> 5.7, and 5.8.
> 
> Alternatively, I can make this a separate bugfix patch series which I'll
> send upstream soonish. Or we have specific patches just for 5.6, 5.7,
> and 5.8. Thoughts?

I was thinking of just tossing the entire series (hch's and mine) at
-stable since it's relatively narrow. I'll look at what's needed for
backports...

> 
> Thanks!
> Christian

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 5/7] fs: Expand __receive_fd() to accept existing fd
  2020-07-07 12:38   ` Christian Brauner
@ 2020-07-08 23:52     ` Kees Cook
  0 siblings, 0 replies; 26+ messages in thread
From: Kees Cook @ 2020-07-08 23:52 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-kernel, Sargun Dhillon, Christian Brauner, Tycho Andersen,
	David Laight, Christoph Hellwig, David S. Miller, Jakub Kicinski,
	Alexander Viro, Aleksa Sarai, Matt Denton, Jann Horn,
	Chris Palmer, Robert Sesek, Giuseppe Scrivano,
	Greg Kroah-Hartman, Andy Lutomirski, Will Drewry, Shuah Khan,
	netdev, containers, linux-api, linux-fsdevel, linux-kselftest

On Tue, Jul 07, 2020 at 02:38:54PM +0200, Christian Brauner wrote:
> On Mon, Jul 06, 2020 at 01:17:18PM -0700, Kees Cook wrote:
> > Expand __receive_fd() with support for replace_fd() for the coming seccomp
> > "addfd" ioctl(). Add new wrapper receive_fd_replace() for the new behavior
> > and update existing wrappers to retain old behavior.
> > 
> > Thanks to Colin Ian King <colin.king@canonical.com> for pointing out an
> > uninitialized variable exposure in an earlier version of this patch.
> > 
> > Reviewed-by: Sargun Dhillon <sargun@sargun.me>
> > Signed-off-by: Kees Cook <keescook@chromium.org>
> > ---
> 
> Thanks!
> (One tiny-nit below.)
> Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
> 
> >  fs/file.c            | 24 ++++++++++++++++++------
> >  include/linux/file.h | 10 +++++++---
> >  2 files changed, 25 insertions(+), 9 deletions(-)
> > 
> > diff --git a/fs/file.c b/fs/file.c
> > index 0efdcf413210..11313ff36802 100644
> > --- a/fs/file.c
> > +++ b/fs/file.c
> > @@ -937,6 +937,7 @@ int replace_fd(unsigned fd, struct file *file, unsigned flags)
> >  /**
> >   * __receive_fd() - Install received file into file descriptor table
> >   *
> > + * @fd: fd to install into (if negative, a new fd will be allocated)
> >   * @file: struct file that was received from another process
> >   * @ufd: __user pointer to write new fd number to
> >   * @o_flags: the O_* flags to apply to the new fd entry
> > @@ -950,7 +951,7 @@ int replace_fd(unsigned fd, struct file *file, unsigned flags)
> >   *
> >   * Returns newly install fd or -ve on error.
> >   */
> > -int __receive_fd(struct file *file, int __user *ufd, unsigned int o_flags)
> > +int __receive_fd(int fd, struct file *file, int __user *ufd, unsigned int o_flags)
> >  {
> >  	struct socket *sock;
> >  	int new_fd;
> > @@ -960,18 +961,30 @@ int __receive_fd(struct file *file, int __user *ufd, unsigned int o_flags)
> >  	if (error)
> >  		return error;
> >  
> > -	new_fd = get_unused_fd_flags(o_flags);
> > -	if (new_fd < 0)
> > -		return new_fd;
> > +	if (fd < 0) {
> > +		new_fd = get_unused_fd_flags(o_flags);
> > +		if (new_fd < 0)
> > +			return new_fd;
> > +	} else
> > +		new_fd = fd;
> 
> This is nitpicky but coding style technically wants us to use braces
> around both branches if one of them requires them. ;)

Ah yeah, good point. Fixed. Thanks!

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 6/7] seccomp: Introduce addfd ioctl to seccomp user notifier
  2020-07-07 13:30   ` Christian Brauner
@ 2020-07-09  6:12     ` Kees Cook
  2020-07-09 13:08       ` Christian Brauner
  2020-07-09  6:17     ` [PATCH v6.1 " Kees Cook
  1 sibling, 1 reply; 26+ messages in thread
From: Kees Cook @ 2020-07-09  6:12 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-kernel, Sargun Dhillon, Matt Denton, Christian Brauner,
	Tycho Andersen, David Laight, Christoph Hellwig, David S. Miller,
	Jakub Kicinski, Alexander Viro, Aleksa Sarai, Jann Horn,
	Chris Palmer, Robert Sesek, Giuseppe Scrivano,
	Greg Kroah-Hartman, Andy Lutomirski, Will Drewry, Shuah Khan,
	netdev, containers, linux-api, linux-fsdevel, linux-kselftest

On Tue, Jul 07, 2020 at 03:30:49PM +0200, Christian Brauner wrote:
> Hm, maybe change that description to sm like:
> 
> [...]

Cool, yeah. Thanks! I've tweaked it a little more

> > +	/* 24 is original sizeof(struct seccomp_notif_addfd) */
> > +	if (size < 24 || size >= PAGE_SIZE)
> > +		return -EINVAL;
> 
> Hm, so maybe add the following:
> 
> #define SECCOMP_NOTIFY_ADDFD_VER0 24
> #define SECCOMP_NOTIFY_ADDFD_LATEST SECCOMP_NOTIFY_ADDFD_VER0
> 
> and then place:
> 
> BUILD_BUG_ON(sizeof(struct seccomp_notify_addfd) < SECCOMP_NOTIFY_ADDFD_VER0);
> BUILD_BUG_ON(sizeof(struct open_how) != SECCOMP_NOTIFY_ADDFD_LATEST);

Yes, good idea (BTW, did the EA syscall docs land?)

I've made these SECCOMP_NOTIFY_ADDFD_SIZE_* to match your examples below
(i.e.  I added "SIZE" to what you suggested above).

> somewhere which is what we do for clone3(), openat2() and others to
> catch build-time nonsense.
> 
> include/uapi/linux/perf_event.h:#define PERF_ATTR_SIZE_VER0     64      /* sizeof first published struct */
> include/uapi/linux/sched.h:#define CLONE_ARGS_SIZE_VER0 64 /* sizeof first published struct */
> include/uapi/linux/sched/types.h:#define SCHED_ATTR_SIZE_VER0   48      /* sizeof first published struct */
> include/linux/fcntl.h:#define OPEN_HOW_SIZE_VER0        24 /* sizeof first published struct */
> include/linux/fcntl.h:#define OPEN_HOW_SIZE_LATEST      OPEN_HOW_SIZE_VER0

The ..._SIZE_VER0 and ...LATEST stuff doesn't seem useful to export via
UAPI. Above, 2 of the 3 export to uapi. Is there a specific rationale
for which should and which shouldn't?

> > +#undef EA_IOCTL
> 
> Why is this undefed? :)

It was defined "in" a function, so I like to mimic function visibility.
But you're right; there's no reason to undef it.

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v6.1 6/7] seccomp: Introduce addfd ioctl to seccomp user notifier
  2020-07-07 13:30   ` Christian Brauner
  2020-07-09  6:12     ` Kees Cook
@ 2020-07-09  6:17     ` Kees Cook
  2020-07-09  6:22       ` Kees Cook
  1 sibling, 1 reply; 26+ messages in thread
From: Kees Cook @ 2020-07-09  6:17 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Sargun Dhillon, Will Drewry, Christoph Hellwig, Tycho Andersen,
	Jann Horn, Robert Sesek, Chris Palmer, Al Viro, linux-fsdevel,
	linux-api, Matt Denton, linux-kernel

From: Sargun Dhillon <sargun@sargun.me>

The current SECCOMP_RET_USER_NOTIF API allows for syscall supervision over
an fd. It is often used in settings where a supervising task emulates
syscalls on behalf of a supervised task in userspace, either to further
restrict the supervisee's syscall abilities or to circumvent kernel
enforced restrictions the supervisor deems safe to lift (e.g. actually
performing a mount(2) for an unprivileged container).

While SECCOMP_RET_USER_NOTIF allows for the interception of any syscall,
only a certain subset of syscalls could be correctly emulated. Over the
last few development cycles, the set of syscalls which can't be emulated
has been reduced due to the addition of pidfd_getfd(2). With this we are
now able to, for example, intercept syscalls that require the supervisor
to operate on file descriptors of the supervisee such as connect(2).

However, syscalls that cause new file descriptors to be installed can not
currently be correctly emulated since there is no way for the supervisor
to inject file descriptors into the supervisee. This patch adds a
new addfd ioctl to remove this restriction by allowing the supervisor to
install file descriptors into the intercepted task. By implementing this
feature via seccomp the supervisor effectively instructs the supervisee
to install a set of file descriptors into its own file descriptor table
during the intercepted syscall. This way it is possible to intercept
syscalls such as open() or accept(), and install (or replace, like
dup2(2)) the supervisor's resulting fd into the supervisee. One
replacement use-case would be to redirect the stdout and stderr of a
supervisee into log file descriptors opened by the supervisor.

The ioctl handling is based on the discussions[1] of how Extensible
Arguments should interact with ioctls. Instead of building size into
the addfd structure, make it a function of the ioctl command (which
is how sizes are normally passed to ioctls). To support forward and
backward compatibility, just mask out the direction and size, and match
everything. The size (and any future direction) checks are done along
with copy_struct_from_user() logic.

As a note, the seccomp_notif_addfd structure is laid out based on 8-byte
alignment without requiring packing as there have been packing issues
with uapi highlighted before[2][3]. Although we could overload the
newfd field and use -1 to indicate that it is not to be used, doing
so requires changing the size of the fd field, and introduces struct
packing complexity.

[1]: https://lore.kernel.org/lkml/87o8w9bcaf.fsf@mid.deneb.enyo.de/
[2]: https://lore.kernel.org/lkml/a328b91d-fd8f-4f27-b3c2-91a9c45f18c0@rasmusvillemoes.dk/
[3]: https://lore.kernel.org/lkml/20200612104629.GA15814@ircssh-2.c.rugged-nimbus-611.internal

Cc: Christoph Hellwig <hch@lst.de>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Tycho Andersen <tycho@tycho.ws>
Cc: Jann Horn <jannh@google.com>
Cc: Robert Sesek <rsesek@google.com>
Cc: Chris Palmer <palmer@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-api@vger.kernel.org
Suggested-by: Matt Denton <mpdenton@google.com>
Link: https://lore.kernel.org/r/20200603011044.7972-4-sargun@sargun.me
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
v6.1:
- clarify commit log (christian)
- add ..._SIZE_{VER0,LATEST} and BUILD_BUG_ON()s (christian)
- remove undef (christian)
- fix embedded URL reference numbers
v6: https://lore.kernel.org/lkml/20200707133049.nfxc6vz6vcs26m3b@wittgenstein
---
 include/linux/seccomp.h      |   4 +
 include/uapi/linux/seccomp.h |  22 +++++
 kernel/seccomp.c             | 173 ++++++++++++++++++++++++++++++++++-
 3 files changed, 198 insertions(+), 1 deletion(-)

diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index babcd6c02d09..881c90b6aa25 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -10,6 +10,10 @@
 					 SECCOMP_FILTER_FLAG_NEW_LISTENER | \
 					 SECCOMP_FILTER_FLAG_TSYNC_ESRCH)
 
+/* sizeof() the first published struct seccomp_notif_addfd */
+#define SECCOMP_NOTIFY_ADDFD_SIZE_VER0 24
+#define SECCOMP_NOTIFY_ADDFD_SIZE_LATEST SECCOMP_NOTIFY_ADDFD_SIZE_VER0
+
 #ifdef CONFIG_SECCOMP
 
 #include <linux/thread_info.h>
diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
index 965290f7dcc2..6ba18b82a02e 100644
--- a/include/uapi/linux/seccomp.h
+++ b/include/uapi/linux/seccomp.h
@@ -113,6 +113,25 @@ struct seccomp_notif_resp {
 	__u32 flags;
 };
 
+/* valid flags for seccomp_notif_addfd */
+#define SECCOMP_ADDFD_FLAG_SETFD	(1UL << 0) /* Specify remote fd */
+
+/**
+ * struct seccomp_notif_addfd
+ * @id: The ID of the seccomp notification
+ * @flags: SECCOMP_ADDFD_FLAG_*
+ * @srcfd: The local fd number
+ * @newfd: Optional remote FD number if SETFD option is set, otherwise 0.
+ * @newfd_flags: The O_* flags the remote FD should have applied
+ */
+struct seccomp_notif_addfd {
+	__u64 id;
+	__u32 flags;
+	__u32 srcfd;
+	__u32 newfd;
+	__u32 newfd_flags;
+};
+
 #define SECCOMP_IOC_MAGIC		'!'
 #define SECCOMP_IO(nr)			_IO(SECCOMP_IOC_MAGIC, nr)
 #define SECCOMP_IOR(nr, type)		_IOR(SECCOMP_IOC_MAGIC, nr, type)
@@ -124,5 +143,8 @@ struct seccomp_notif_resp {
 #define SECCOMP_IOCTL_NOTIF_SEND	SECCOMP_IOWR(1,	\
 						struct seccomp_notif_resp)
 #define SECCOMP_IOCTL_NOTIF_ID_VALID	SECCOMP_IOW(2, __u64)
+/* On success, the return value is the remote process's added fd number */
+#define SECCOMP_IOCTL_NOTIF_ADDFD	SECCOMP_IOW(3, \
+						struct seccomp_notif_addfd)
 
 #endif /* _UAPI_LINUX_SECCOMP_H */
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 866a432cd746..2305bedcad50 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -87,10 +87,42 @@ struct seccomp_knotif {
 	long val;
 	u32 flags;
 
-	/* Signals when this has entered SECCOMP_NOTIFY_REPLIED */
+	/*
+	 * Signals when this has changed states, such as the listener
+	 * dying, a new seccomp addfd message, or changing to REPLIED
+	 */
 	struct completion ready;
 
 	struct list_head list;
+
+	/* outstanding addfd requests */
+	struct list_head addfd;
+};
+
+/**
+ * struct seccomp_kaddfd - container for seccomp_addfd ioctl messages
+ *
+ * @file: A reference to the file to install in the other task
+ * @fd: The fd number to install it at. If the fd number is -1, it means the
+ *      installing process should allocate the fd as normal.
+ * @flags: The flags for the new file descriptor. At the moment, only O_CLOEXEC
+ *         is allowed.
+ * @ret: The return value of the installing process. It is set to the fd num
+ *       upon success (>= 0).
+ * @completion: Indicates that the installing process has completed fd
+ *              installation, or gone away (either due to successful
+ *              reply, or signal)
+ *
+ */
+struct seccomp_kaddfd {
+	struct file *file;
+	int fd;
+	unsigned int flags;
+
+	/* To only be set on reply */
+	int ret;
+	struct completion completion;
+	struct list_head list;
 };
 
 /**
@@ -793,6 +825,17 @@ static u64 seccomp_next_notify_id(struct seccomp_filter *filter)
 	return filter->notif->next_id++;
 }
 
+static void seccomp_handle_addfd(struct seccomp_kaddfd *addfd)
+{
+	/*
+	 * Remove the notification, and reset the list pointers, indicating
+	 * that it has been handled.
+	 */
+	list_del_init(&addfd->list);
+	addfd->ret = receive_fd_replace(addfd->fd, addfd->file, addfd->flags);
+	complete(&addfd->completion);
+}
+
 static int seccomp_do_user_notification(int this_syscall,
 					struct seccomp_filter *match,
 					const struct seccomp_data *sd)
@@ -801,6 +844,7 @@ static int seccomp_do_user_notification(int this_syscall,
 	u32 flags = 0;
 	long ret = 0;
 	struct seccomp_knotif n = {};
+	struct seccomp_kaddfd *addfd, *tmp;
 
 	mutex_lock(&match->notify_lock);
 	err = -ENOSYS;
@@ -813,6 +857,7 @@ static int seccomp_do_user_notification(int this_syscall,
 	n.id = seccomp_next_notify_id(match);
 	init_completion(&n.ready);
 	list_add(&n.list, &match->notif->notifications);
+	INIT_LIST_HEAD(&n.addfd);
 
 	up(&match->notif->request);
 	wake_up_poll(&match->wqh, EPOLLIN | EPOLLRDNORM);
@@ -821,14 +866,31 @@ static int seccomp_do_user_notification(int this_syscall,
 	/*
 	 * This is where we wait for a reply from userspace.
 	 */
+wait:
 	err = wait_for_completion_interruptible(&n.ready);
 	mutex_lock(&match->notify_lock);
 	if (err == 0) {
+		/* Check if we were woken up by a addfd message */
+		addfd = list_first_entry_or_null(&n.addfd,
+						 struct seccomp_kaddfd, list);
+		if (addfd && n.state != SECCOMP_NOTIFY_REPLIED) {
+			seccomp_handle_addfd(addfd);
+			mutex_unlock(&match->notify_lock);
+			goto wait;
+		}
 		ret = n.val;
 		err = n.error;
 		flags = n.flags;
 	}
 
+	/* If there were any pending addfd calls, clear them out */
+	list_for_each_entry_safe(addfd, tmp, &n.addfd, list) {
+		/* The process went away before we got a chance to handle it */
+		addfd->ret = -ESRCH;
+		list_del_init(&addfd->list);
+		complete(&addfd->completion);
+	}
+
 	/*
 	 * Note that it's possible the listener died in between the time when
 	 * we were notified of a respons (or a signal) and when we were able to
@@ -1069,6 +1131,11 @@ static int seccomp_notify_release(struct inode *inode, struct file *file)
 		knotif->error = -ENOSYS;
 		knotif->val = 0;
 
+		/*
+		 * We do not need to wake up any pending addfd messages, as
+		 * the notifier will do that for us, as this just looks
+		 * like a standard reply.
+		 */
 		complete(&knotif->ready);
 	}
 
@@ -1233,12 +1300,109 @@ static long seccomp_notify_id_valid(struct seccomp_filter *filter,
 	return ret;
 }
 
+static long seccomp_notify_addfd(struct seccomp_filter *filter,
+				 struct seccomp_notif_addfd __user *uaddfd,
+				 unsigned int size)
+{
+	struct seccomp_notif_addfd addfd;
+	struct seccomp_knotif *knotif;
+	struct seccomp_kaddfd kaddfd;
+	int ret;
+
+	BUILD_BUG_ON(sizeof(struct seccomp_notify_addfd) < SECCOMP_NOTIFY_ADDFD_SIZE_VER0);
+	BUILD_BUG_ON(sizeof(struct seccomp_notify_addfd) != SECCOMP_NOTIFY_ADDFD_SIZE_LATEST);
+
+	if (size < SECCOMP_NOTIFY_ADDFD_SIZE_VER0 || size >= PAGE_SIZE)
+		return -EINVAL;
+
+	ret = copy_struct_from_user(&addfd, sizeof(addfd), uaddfd, size);
+	if (ret)
+		return ret;
+
+	if (addfd.newfd_flags & ~O_CLOEXEC)
+		return -EINVAL;
+
+	if (addfd.flags & ~SECCOMP_ADDFD_FLAG_SETFD)
+		return -EINVAL;
+
+	if (addfd.newfd && !(addfd.flags & SECCOMP_ADDFD_FLAG_SETFD))
+		return -EINVAL;
+
+	kaddfd.file = fget(addfd.srcfd);
+	if (!kaddfd.file)
+		return -EBADF;
+
+	kaddfd.flags = addfd.newfd_flags;
+	kaddfd.fd = (addfd.flags & SECCOMP_ADDFD_FLAG_SETFD) ?
+		    addfd.newfd : -1;
+	init_completion(&kaddfd.completion);
+
+	ret = mutex_lock_interruptible(&filter->notify_lock);
+	if (ret < 0)
+		goto out;
+
+	knotif = find_notification(filter, addfd.id);
+	if (!knotif) {
+		ret = -ENOENT;
+		goto out_unlock;
+	}
+
+	/*
+	 * We do not want to allow for FD injection to occur before the
+	 * notification has been picked up by a userspace handler, or after
+	 * the notification has been replied to.
+	 */
+	if (knotif->state != SECCOMP_NOTIFY_SENT) {
+		ret = -EINPROGRESS;
+		goto out_unlock;
+	}
+
+	list_add(&kaddfd.list, &knotif->addfd);
+	complete(&knotif->ready);
+	mutex_unlock(&filter->notify_lock);
+
+	/* Now we wait for it to be processed or be interrupted */
+	ret = wait_for_completion_interruptible(&kaddfd.completion);
+	if (ret == 0) {
+		/*
+		 * We had a successful completion. The other side has already
+		 * removed us from the addfd queue, and
+		 * wait_for_completion_interruptible has a memory barrier upon
+		 * success that lets us read this value directly without
+		 * locking.
+		 */
+		ret = kaddfd.ret;
+		goto out;
+	}
+
+	mutex_lock(&filter->notify_lock);
+	/*
+	 * Even though we were woken up by a signal and not a successful
+	 * completion, a completion may have happened in the mean time.
+	 *
+	 * We need to check again if the addfd request has been handled,
+	 * and if not, we will remove it from the queue.
+	 */
+	if (list_empty(&kaddfd.list))
+		ret = kaddfd.ret;
+	else
+		list_del(&kaddfd.list);
+
+out_unlock:
+	mutex_unlock(&filter->notify_lock);
+out:
+	fput(kaddfd.file);
+
+	return ret;
+}
+
 static long seccomp_notify_ioctl(struct file *file, unsigned int cmd,
 				 unsigned long arg)
 {
 	struct seccomp_filter *filter = file->private_data;
 	void __user *buf = (void __user *)arg;
 
+	/* Fixed-size ioctls */
 	switch (cmd) {
 	case SECCOMP_IOCTL_NOTIF_RECV:
 		return seccomp_notify_recv(filter, buf);
@@ -1247,6 +1411,13 @@ static long seccomp_notify_ioctl(struct file *file, unsigned int cmd,
 	case SECCOMP_IOCTL_NOTIF_ID_VALID_WRONG_DIR:
 	case SECCOMP_IOCTL_NOTIF_ID_VALID:
 		return seccomp_notify_id_valid(filter, buf);
+	}
+
+	/* Extensible Argument ioctls */
+#define EA_IOCTL(cmd)	((cmd) & ~(IOC_INOUT | IOCSIZE_MASK))
+	switch (EA_IOCTL(cmd)) {
+	case EA_IOCTL(SECCOMP_IOCTL_NOTIF_ADDFD):
+		return seccomp_notify_addfd(filter, buf, _IOC_SIZE(cmd));
 	default:
 		return -EINVAL;
 	}
-- 
2.25.1


-- 
Kees Cook

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v6.1 6/7] seccomp: Introduce addfd ioctl to seccomp user notifier
  2020-07-09  6:17     ` [PATCH v6.1 " Kees Cook
@ 2020-07-09  6:22       ` Kees Cook
  0 siblings, 0 replies; 26+ messages in thread
From: Kees Cook @ 2020-07-09  6:22 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Sargun Dhillon, Will Drewry, Christoph Hellwig, Tycho Andersen,
	Jann Horn, Robert Sesek, Chris Palmer, Al Viro, linux-fsdevel,
	linux-api, Matt Denton, linux-kernel

On Wed, Jul 08, 2020 at 11:17:27PM -0700, Kees Cook wrote:
> +static long seccomp_notify_addfd(struct seccomp_filter *filter,
> +				 struct seccomp_notif_addfd __user *uaddfd,
> +				 unsigned int size)
> +{
> +	struct seccomp_notif_addfd addfd;
> +	struct seccomp_knotif *knotif;
> +	struct seccomp_kaddfd kaddfd;
> +	int ret;
> +
> +	BUILD_BUG_ON(sizeof(struct seccomp_notify_addfd) < SECCOMP_NOTIFY_ADDFD_SIZE_VER0);
> +	BUILD_BUG_ON(sizeof(struct seccomp_notify_addfd) != SECCOMP_NOTIFY_ADDFD_SIZE_LATEST);

*brown paper bag* I built the wrong tree! This is a typo:
seccomp_notify_addfd should be seccomp_notif_addfd (no "y").

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 4/7] pidfd: Replace open-coded partial receive_fd()
  2020-07-07 12:22   ` Christian Brauner
  2020-07-08 23:49     ` Kees Cook
@ 2020-07-09  6:35     ` Kees Cook
  2020-07-09 12:54       ` Christian Brauner
  1 sibling, 1 reply; 26+ messages in thread
From: Kees Cook @ 2020-07-09  6:35 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Christoph Hellwig, linux-kernel, Sargun Dhillon,
	Christian Brauner, Tycho Andersen, David Laight, David S. Miller,
	Jakub Kicinski, Alexander Viro, Aleksa Sarai, Matt Denton,
	Jann Horn, Chris Palmer, Robert Sesek, Giuseppe Scrivano,
	Greg Kroah-Hartman, Andy Lutomirski, Will Drewry, Shuah Khan,
	netdev, containers, linux-api, linux-fsdevel, linux-kselftest

On Tue, Jul 07, 2020 at 02:22:20PM +0200, Christian Brauner wrote:
> So while the patch is correct it leaves 5.6 and 5.7 with a bug in the
> pidfd_getfd() implementation and that just doesn't seem right. I'm
> wondering whether we should introduce:
> 
> void sock_update(struct file *file)
> {
> 	struct socket *sock;
> 	int error;
> 
> 	sock = sock_from_file(file, &error);
> 	if (sock) {
> 		sock_update_netprioidx(&sock->sk->sk_cgrp_data);
> 		sock_update_classid(&sock->sk->sk_cgrp_data);
> 	}
> }
> 
> and switch pidfd_getfd() over to:
> 
> diff --git a/kernel/pid.c b/kernel/pid.c
> index f1496b757162..c26bba822be3 100644
> --- a/kernel/pid.c
> +++ b/kernel/pid.c
> @@ -642,10 +642,12 @@ static int pidfd_getfd(struct pid *pid, int fd)
>         }
> 
>         ret = get_unused_fd_flags(O_CLOEXEC);
> -       if (ret < 0)
> +       if (ret < 0) {
>                 fput(file);
> -       else
> +       } else {
> +               sock_update(file);
>                 fd_install(ret, file);
> +       }
> 
>         return ret;
>  }
> 
> first thing in the series and then all of the other patches on top of it
> so that we can Cc stable for this and that can get it backported to 5.6,
> 5.7, and 5.8.
> 
> Alternatively, I can make this a separate bugfix patch series which I'll
> send upstream soonish. Or we have specific patches just for 5.6, 5.7,
> and 5.8. Thoughts?

Okay, I looked at hch's clean-ups again and I'm reminded why they
don't make great -stable material. :) The compat bug (also missing the
sock_update()) needs a similar fix (going back to 3.6...), so, yeah,
for ease of backport, probably an explicit sock_update() implementation
(with compat and native scm using it), and a second patch for pidfd.

Let me see what I looks best...

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 4/7] pidfd: Replace open-coded partial receive_fd()
  2020-07-09  6:35     ` Kees Cook
@ 2020-07-09 12:54       ` Christian Brauner
  0 siblings, 0 replies; 26+ messages in thread
From: Christian Brauner @ 2020-07-09 12:54 UTC (permalink / raw)
  To: Kees Cook
  Cc: Christoph Hellwig, linux-kernel, Sargun Dhillon,
	Christian Brauner, Tycho Andersen, David Laight, David S. Miller,
	Jakub Kicinski, Alexander Viro, Aleksa Sarai, Matt Denton,
	Jann Horn, Chris Palmer, Robert Sesek, Giuseppe Scrivano,
	Greg Kroah-Hartman, Andy Lutomirski, Will Drewry, Shuah Khan,
	netdev, containers, linux-api, linux-fsdevel, linux-kselftest

On Wed, Jul 08, 2020 at 11:35:39PM -0700, Kees Cook wrote:
> On Tue, Jul 07, 2020 at 02:22:20PM +0200, Christian Brauner wrote:
> > So while the patch is correct it leaves 5.6 and 5.7 with a bug in the
> > pidfd_getfd() implementation and that just doesn't seem right. I'm
> > wondering whether we should introduce:
> > 
> > void sock_update(struct file *file)
> > {
> > 	struct socket *sock;
> > 	int error;
> > 
> > 	sock = sock_from_file(file, &error);
> > 	if (sock) {
> > 		sock_update_netprioidx(&sock->sk->sk_cgrp_data);
> > 		sock_update_classid(&sock->sk->sk_cgrp_data);
> > 	}
> > }
> > 
> > and switch pidfd_getfd() over to:
> > 
> > diff --git a/kernel/pid.c b/kernel/pid.c
> > index f1496b757162..c26bba822be3 100644
> > --- a/kernel/pid.c
> > +++ b/kernel/pid.c
> > @@ -642,10 +642,12 @@ static int pidfd_getfd(struct pid *pid, int fd)
> >         }
> > 
> >         ret = get_unused_fd_flags(O_CLOEXEC);
> > -       if (ret < 0)
> > +       if (ret < 0) {
> >                 fput(file);
> > -       else
> > +       } else {
> > +               sock_update(file);
> >                 fd_install(ret, file);
> > +       }
> > 
> >         return ret;
> >  }
> > 
> > first thing in the series and then all of the other patches on top of it
> > so that we can Cc stable for this and that can get it backported to 5.6,
> > 5.7, and 5.8.
> > 
> > Alternatively, I can make this a separate bugfix patch series which I'll
> > send upstream soonish. Or we have specific patches just for 5.6, 5.7,
> > and 5.8. Thoughts?
> 
> Okay, I looked at hch's clean-ups again and I'm reminded why they
> don't make great -stable material. :) The compat bug (also missing the
> sock_update()) needs a similar fix (going back to 3.6...), so, yeah,
> for ease of backport, probably an explicit sock_update() implementation
> (with compat and native scm using it), and a second patch for pidfd.
> 
> Let me see what I looks best...

Yeah, it'd be quite some code. I've written some patches for this before
I sent this mail, just so you know. We likely need a 5.6 and 5.7 patch
and a 5.8 patch after Christoph's changes. The 5.8 fixes I'd like to get
in during this merge window. So either I can do this or you can send me
the patches for this?

Christian

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v6 6/7] seccomp: Introduce addfd ioctl to seccomp user notifier
  2020-07-09  6:12     ` Kees Cook
@ 2020-07-09 13:08       ` Christian Brauner
  0 siblings, 0 replies; 26+ messages in thread
From: Christian Brauner @ 2020-07-09 13:08 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, Sargun Dhillon, Matt Denton, Christian Brauner,
	Tycho Andersen, David Laight, Christoph Hellwig, David S. Miller,
	Jakub Kicinski, Alexander Viro, Aleksa Sarai, Jann Horn,
	Chris Palmer, Robert Sesek, Giuseppe Scrivano,
	Greg Kroah-Hartman, Andy Lutomirski, Will Drewry, Shuah Khan,
	netdev, containers, linux-api, linux-fsdevel, linux-kselftest

On Wed, Jul 08, 2020 at 11:12:02PM -0700, Kees Cook wrote:
> On Tue, Jul 07, 2020 at 03:30:49PM +0200, Christian Brauner wrote:
> > Hm, maybe change that description to sm like:
> > 
> > [...]
> 
> Cool, yeah. Thanks! I've tweaked it a little more
> 
> > > +	/* 24 is original sizeof(struct seccomp_notif_addfd) */
> > > +	if (size < 24 || size >= PAGE_SIZE)
> > > +		return -EINVAL;
> > 
> > Hm, so maybe add the following:
> > 
> > #define SECCOMP_NOTIFY_ADDFD_VER0 24
> > #define SECCOMP_NOTIFY_ADDFD_LATEST SECCOMP_NOTIFY_ADDFD_VER0
> > 
> > and then place:
> > 
> > BUILD_BUG_ON(sizeof(struct seccomp_notify_addfd) < SECCOMP_NOTIFY_ADDFD_VER0);
> > BUILD_BUG_ON(sizeof(struct open_how) != SECCOMP_NOTIFY_ADDFD_LATEST);
> 
> Yes, good idea (BTW, did the EA syscall docs land?)

I'll be giving a kernel summit talk about extensible syscalls to come to
some agreement on a few things. After this we'll update the doc patch
we have now and merge it. :)

> 
> I've made these SECCOMP_NOTIFY_ADDFD_SIZE_* to match your examples below
> (i.e.  I added "SIZE" to what you suggested above).

Yup, sounds good!

> 
> > somewhere which is what we do for clone3(), openat2() and others to
> > catch build-time nonsense.
> > 
> > include/uapi/linux/perf_event.h:#define PERF_ATTR_SIZE_VER0     64      /* sizeof first published struct */
> > include/uapi/linux/sched.h:#define CLONE_ARGS_SIZE_VER0 64 /* sizeof first published struct */
> > include/uapi/linux/sched/types.h:#define SCHED_ATTR_SIZE_VER0   48      /* sizeof first published struct */
> > include/linux/fcntl.h:#define OPEN_HOW_SIZE_VER0        24 /* sizeof first published struct */
> > include/linux/fcntl.h:#define OPEN_HOW_SIZE_LATEST      OPEN_HOW_SIZE_VER0
> 
> The ..._SIZE_VER0 and ...LATEST stuff doesn't seem useful to export via
> UAPI. Above, 2 of the 3 export to uapi. Is there a specific rationale
> for which should and which shouldn't?

I think openat2() just didn't think it was useful. I find them helpful
because I often update codebase to the newest struct I know about:

struct clone_args {
	__aligned_u64 flags;
	__aligned_u64 pidfd;
	__aligned_u64 child_tid;
	__aligned_u64 parent_tid;
	__aligned_u64 exit_signal;
	__aligned_u64 stack;
	__aligned_u64 stack_size;
	__aligned_u64 tls;
/* CLONE_ARGS_SIZE_VER0 64 */
	__aligned_u64 set_tid;
	__aligned_u64 set_tid_size;
/* CLONE_ARGS_SIZE_VER1 80 */
	__aligned_u64 cgroup;
/* CLONE_ARGS_SIZE_VER2 88 */
};

But bumping it means I can't use:

clone3(&clone_args, sizeof(clone));

everywhere in the codebase because I'm fscking over everyone on older
kernels now. :)

Soin various parts of the codebase I will just use:

clone3(&clone_args, CLONE_ARGS_SIZE_VER0);

because I don't care about any of the additional features and I don't
need the kernel to copy any of the other stuff. Then in other parts of
the codebase I want to set_tid so I use:

clone3(&clone_args, CLONE_ARGS_SIZE_VER1);

This way I can also set "templates", i.e.

struct clone_args clone_template1 = {
	.flags		|= CLONE_CLEAR_SIGHAND,
	.exit_signal	= SIGCHLD,
	.set_tid	= 1000,
	.set_tid_size	= 1,
};

and then use the same struct for:

clone3(&clone_template1, CLONE_ARGS_SIZE_VER0);
clone3(&clone_template1, CLONE_ARGS_SIZE_VER1);

Whereas sizeof(clone_template1) would always give me
CLONE_ARGS_SIZE_VER2.

Christian

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2020-07-09 13:08 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-06 20:17 [PATCH v6 0/7] Add seccomp notifier ioctl that enables adding fds Kees Cook
2020-07-06 20:17 ` [PATCH v6 1/7] net/scm: Regularize compat handling of scm_detach_fds() Kees Cook
2020-07-07 11:41   ` Christian Brauner
2020-07-08 23:46     ` Kees Cook
2020-07-06 20:17 ` [PATCH v6 2/7] fs: Move __scm_install_fd() to __receive_fd() Kees Cook
2020-07-07  6:49   ` Christoph Hellwig
2020-07-07 11:45   ` Christian Brauner
2020-07-06 20:17 ` [PATCH v6 3/7] fs: Add receive_fd() wrapper for __receive_fd() Kees Cook
2020-07-07  6:49   ` Christoph Hellwig
2020-07-07 11:49   ` Christian Brauner
2020-07-08 23:48     ` Kees Cook
2020-07-06 20:17 ` [PATCH v6 4/7] pidfd: Replace open-coded partial receive_fd() Kees Cook
2020-07-07 12:22   ` Christian Brauner
2020-07-08 23:49     ` Kees Cook
2020-07-09  6:35     ` Kees Cook
2020-07-09 12:54       ` Christian Brauner
2020-07-06 20:17 ` [PATCH v6 5/7] fs: Expand __receive_fd() to accept existing fd Kees Cook
2020-07-07 12:38   ` Christian Brauner
2020-07-08 23:52     ` Kees Cook
2020-07-06 20:17 ` [PATCH v6 6/7] seccomp: Introduce addfd ioctl to seccomp user notifier Kees Cook
2020-07-07 13:30   ` Christian Brauner
2020-07-09  6:12     ` Kees Cook
2020-07-09 13:08       ` Christian Brauner
2020-07-09  6:17     ` [PATCH v6.1 " Kees Cook
2020-07-09  6:22       ` Kees Cook
2020-07-06 20:17 ` [PATCH v6 7/7] selftests/seccomp: Test SECCOMP_IOCTL_NOTIF_ADDFD Kees Cook

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).