linux-api.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v6 0/3] implement OA2_CRED_INHERIT flag for openat2()
@ 2024-04-27 11:24 Stas Sergeev
  2024-04-27 11:24 ` [PATCH v6 1/3] fs: reorganize path_openat() Stas Sergeev
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Stas Sergeev @ 2024-04-27 11:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: Stas Sergeev, Stefan Metzmacher, Eric Biederman, Alexander Viro,
	Andy Lutomirski, Christian Brauner, Jan Kara, Jeff Layton,
	Chuck Lever, Alexander Aring, David Laight, linux-fsdevel,
	linux-api, Paolo Bonzini, Christian Göttsche

This patch-set implements the OA2_CRED_INHERIT flag for openat2() syscall.
It is needed to perform an open operation with the creds that were in
effect when the dir_fd was opened, if the dir was opened with O_CRED_ALLOW
flag. This allows the process to pre-open some dirs and switch eUID
(and other UIDs/GIDs) to the less-privileged user, while still retaining
the possibility to open/create files within the pre-opened directory set.

The sand-boxing is security-oriented: symlinks leading outside of a
sand-box are rejected. /proc magic links are rejected. fds opened with
O_CRED_ALLOW are always closed on exec() and cannot be passed via unix
socket.
The more detailed description (including security considerations)
is available in the log messages of individual patches.

Changes in v6:
- it appears open flags bit 23 is already taken on parisc, and bit 24
  is taken on alpha. Move O_CRED_ALLOW to bit 25.
- added selftests for both O_CRED_ALLOW and O_CRED_INHERIT additions

Changes in v5:
- rename OA2_INHERIT_CRED to OA2_CRED_INHERIT
- add an "opt-in" flag O_CRED_ALLOW as was suggested by many reviewers
- stop using 64bit types, as suggested by
  Christian Brauner <brauner@kernel.org>
- add BUILD_BUG_ON() for VALID_OPENAT2_FLAGS, based on Christian Brauner's
  comments
- fixed problems reported by patch-testing bot
- made O_CRED_ALLOW fds not passable via unix sockets and exec(),
  based on Christian Brauner's comments

Changes in v4:
- add optimizations suggested by David Laight <David.Laight@ACULAB.COM>
- move security checks to build_open_flags()
- force RESOLVE_NO_MAGICLINKS as suggested by Andy Lutomirski <luto@kernel.org>

Changes in v3:
- partially revert v2 changes to avoid overriding capabilities.
  Only the bare minimum is overridden: fsuid, fsgid and group_info.
  Document the fact the full cred override is unwanted, as it may
  represent an unneeded security risk.

Changes in v2:
- capture full struct cred instead of just fsuid/fsgid.
  Suggested by Stefan Metzmacher <metze@samba.org>

CC: Stefan Metzmacher <metze@samba.org>
CC: Eric Biederman <ebiederm@xmission.com>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: Andy Lutomirski <luto@kernel.org>
CC: Christian Brauner <brauner@kernel.org>
CC: Jan Kara <jack@suse.cz>
CC: Jeff Layton <jlayton@kernel.org>
CC: Chuck Lever <chuck.lever@oracle.com>
CC: Alexander Aring <alex.aring@gmail.com>
CC: David Laight <David.Laight@ACULAB.COM>
CC: linux-fsdevel@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: linux-api@vger.kernel.org
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Christian Göttsche <cgzones@googlemail.com>

-- 
2.44.0


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v6 1/3] fs: reorganize path_openat()
  2024-04-27 11:24 [PATCH v6 0/3] implement OA2_CRED_INHERIT flag for openat2() Stas Sergeev
@ 2024-04-27 11:24 ` Stas Sergeev
  2024-04-27 11:24 ` [PATCH v6 2/3] open: add O_CRED_ALLOW flag Stas Sergeev
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 10+ messages in thread
From: Stas Sergeev @ 2024-04-27 11:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: Stas Sergeev, Stefan Metzmacher, Eric Biederman, Alexander Viro,
	Andy Lutomirski, Christian Brauner, Jan Kara, Jeff Layton,
	Chuck Lever, Alexander Aring, David Laight, linux-fsdevel,
	linux-api, Paolo Bonzini, Christian Göttsche

This patch moves the call to alloc_empty_file() down the if branches.
That changes is needed for the next patch, which adds a cred override
for alloc_empty_file(). The cred override is only needed in one branch,
i.e. it is not needed for O_PATH and O_TMPFILE..

No functional changes are intended by that patch.

Signed-off-by: Stas Sergeev <stsp2@yandex.ru>

CC: Eric Biederman <ebiederm@xmission.com>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: Christian Brauner <brauner@kernel.org>
CC: Jan Kara <jack@suse.cz>
CC: Andy Lutomirski <luto@kernel.org>
CC: David Laight <David.Laight@ACULAB.COM>
CC: linux-fsdevel@vger.kernel.org
CC: linux-kernel@vger.kernel.org
---
 fs/namei.c | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index c5b2a25be7d0..dd50345f7260 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3781,17 +3781,24 @@ static struct file *path_openat(struct nameidata *nd,
 {
 	struct file *file;
 	int error;
+	int open_flags = op->open_flag;
 
-	file = alloc_empty_file(op->open_flag, current_cred());
-	if (IS_ERR(file))
-		return file;
-
-	if (unlikely(file->f_flags & __O_TMPFILE)) {
-		error = do_tmpfile(nd, flags, op, file);
-	} else if (unlikely(file->f_flags & O_PATH)) {
-		error = do_o_path(nd, flags, file);
+	if (unlikely(open_flags & (__O_TMPFILE | O_PATH))) {
+		file = alloc_empty_file(open_flags, current_cred());
+		if (IS_ERR(file))
+			return file;
+		if (open_flags & __O_TMPFILE)
+			error = do_tmpfile(nd, flags, op, file);
+		else
+			error = do_o_path(nd, flags, file);
 	} else {
-		const char *s = path_init(nd, flags);
+		const char *s;
+
+		file = alloc_empty_file(open_flags, current_cred());
+		if (IS_ERR(file))
+			return file;
+
+		s = path_init(nd, flags);
 		while (!(error = link_path_walk(s, nd)) &&
 		       (s = open_last_lookups(nd, file, op)) != NULL)
 			;
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v6 2/3] open: add O_CRED_ALLOW flag
  2024-04-27 11:24 [PATCH v6 0/3] implement OA2_CRED_INHERIT flag for openat2() Stas Sergeev
  2024-04-27 11:24 ` [PATCH v6 1/3] fs: reorganize path_openat() Stas Sergeev
@ 2024-04-27 11:24 ` Stas Sergeev
  2024-04-27 11:24 ` [PATCH v6 3/3] openat2: add OA2_CRED_INHERIT flag Stas Sergeev
  2024-05-07  7:50 ` [PATCH v6 0/3] implement OA2_CRED_INHERIT flag for openat2() Aleksa Sarai
  3 siblings, 0 replies; 10+ messages in thread
From: Stas Sergeev @ 2024-04-27 11:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: Stas Sergeev, Stefan Metzmacher, Eric Biederman, Alexander Viro,
	Andy Lutomirski, Christian Brauner, Jan Kara, Jeff Layton,
	Chuck Lever, Alexander Aring, David Laight, linux-fsdevel,
	linux-api, Paolo Bonzini, Christian Göttsche, Arnd Bergmann,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Jens Axboe, Kuniyuki Iwashima, Pavel Begunkov, linux-arch,
	netdev

This flag prevents an fd from being passed via unix socket, and
makes it to be always closed on exec().

Selftest is added to check for both properties.

It is needed for the subsequent OA2_CRED_INHERIT addition, to work
as an "opt-in" for the new cred-inherit functionality. Without using
O_CRED_ALLOW when opening dir fd, OA2_CRED_INHERIT is going to return
EPERM.

Signed-off-by: Stas Sergeev <stsp2@yandex.ru>

CC: Eric Biederman <ebiederm@xmission.com>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: Christian Brauner <brauner@kernel.org>
CC: Jan Kara <jack@suse.cz>
CC: Andy Lutomirski <luto@kernel.org>
CC: David Laight <David.Laight@ACULAB.COM>
CC: Arnd Bergmann <arnd@arndb.de>
CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jakub Kicinski <kuba@kernel.org>
CC: Paolo Abeni <pabeni@redhat.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: Kuniyuki Iwashima <kuniyu@amazon.com>
CC: Pavel Begunkov <asml.silence@gmail.com>
CC: linux-arch@vger.kernel.org
CC: netdev@vger.kernel.org
CC: linux-fsdevel@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: linux-api@vger.kernel.org
---
 fs/fcntl.c                                |   2 +-
 fs/file.c                                 |  15 +--
 include/linux/fcntl.h                     |   2 +-
 include/uapi/asm-generic/fcntl.h          |   5 +
 net/core/scm.c                            |   5 +
 tools/testing/selftests/core/Makefile     |   2 +-
 tools/testing/selftests/core/cred_allow.c | 139 ++++++++++++++++++++++
 7 files changed, 160 insertions(+), 10 deletions(-)
 create mode 100644 tools/testing/selftests/core/cred_allow.c

diff --git a/fs/fcntl.c b/fs/fcntl.c
index 54cc85d3338e..78c96b1293c2 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -1039,7 +1039,7 @@ static int __init fcntl_init(void)
 	 * Exceptions: O_NONBLOCK is a two bit define on parisc; O_NDELAY
 	 * is defined as O_NONBLOCK on some platforms and not on others.
 	 */
-	BUILD_BUG_ON(21 - 1 /* for O_RDONLY being 0 */ !=
+	BUILD_BUG_ON(22 - 1 /* for O_RDONLY being 0 */ !=
 		HWEIGHT32(
 			(VALID_OPEN_FLAGS & ~(O_NONBLOCK | O_NDELAY)) |
 			__FMODE_EXEC | __FMODE_NONOTIFY));
diff --git a/fs/file.c b/fs/file.c
index 3b683b9101d8..2a09d5276676 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -827,22 +827,23 @@ void do_close_on_exec(struct files_struct *files)
 	/* exec unshares first */
 	spin_lock(&files->file_lock);
 	for (i = 0; ; i++) {
+		int j;
 		unsigned long set;
 		unsigned fd = i * BITS_PER_LONG;
 		fdt = files_fdtable(files);
 		if (fd >= fdt->max_fds)
 			break;
 		set = fdt->close_on_exec[i];
-		if (!set)
-			continue;
 		fdt->close_on_exec[i] = 0;
-		for ( ; set ; fd++, set >>= 1) {
-			struct file *file;
-			if (!(set & 1))
-				continue;
-			file = fdt->fd[fd];
+		for (j = 0; j < BITS_PER_LONG; j++, fd++, set >>= 1) {
+			struct file *file = fdt->fd[fd];
 			if (!file)
 				continue;
+			/* Close all cred-allow files. */
+			if (file->f_flags & O_CRED_ALLOW)
+				set |= 1;
+			if (!(set & 1))
+				continue;
 			rcu_assign_pointer(fdt->fd[fd], NULL);
 			__put_unused_fd(files, fd);
 			spin_unlock(&files->file_lock);
diff --git a/include/linux/fcntl.h b/include/linux/fcntl.h
index a332e79b3207..e074ee9c1e36 100644
--- a/include/linux/fcntl.h
+++ b/include/linux/fcntl.h
@@ -10,7 +10,7 @@
 	(O_RDONLY | O_WRONLY | O_RDWR | O_CREAT | O_EXCL | O_NOCTTY | O_TRUNC | \
 	 O_APPEND | O_NDELAY | O_NONBLOCK | __O_SYNC | O_DSYNC | \
 	 FASYNC	| O_DIRECT | O_LARGEFILE | O_DIRECTORY | O_NOFOLLOW | \
-	 O_NOATIME | O_CLOEXEC | O_PATH | __O_TMPFILE)
+	 O_NOATIME | O_CLOEXEC | O_PATH | __O_TMPFILE | O_CRED_ALLOW)
 
 /* List of all valid flags for the how->resolve argument: */
 #define VALID_RESOLVE_FLAGS \
diff --git a/include/uapi/asm-generic/fcntl.h b/include/uapi/asm-generic/fcntl.h
index 80f37a0d40d7..9244c54bb933 100644
--- a/include/uapi/asm-generic/fcntl.h
+++ b/include/uapi/asm-generic/fcntl.h
@@ -89,6 +89,11 @@
 #define __O_TMPFILE	020000000
 #endif
 
+#ifndef O_CRED_ALLOW
+/* On parisc bit 23 is taken. On alpha bit 24 is also taken. Try bit 25. */
+#define O_CRED_ALLOW	0200000000
+#endif
+
 /* a horrid kludge trying to make sure that this will fail on old kernels */
 #define O_TMPFILE (__O_TMPFILE | O_DIRECTORY)
 
diff --git a/net/core/scm.c b/net/core/scm.c
index 9cd4b0a01cd6..f54fb0ee9727 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -111,6 +111,11 @@ static int scm_fp_copy(struct cmsghdr *cmsg, struct scm_fp_list **fplp)
 			fput(file);
 			return -EINVAL;
 		}
+		/* don't allow files with creds */
+		if (file->f_flags & O_CRED_ALLOW) {
+			fput(file);
+			return -EPERM;
+		}
 		if (unix_get_socket(file))
 			fpl->count_unix++;
 
diff --git a/tools/testing/selftests/core/Makefile b/tools/testing/selftests/core/Makefile
index ce262d097269..347a5a9d3f29 100644
--- a/tools/testing/selftests/core/Makefile
+++ b/tools/testing/selftests/core/Makefile
@@ -1,7 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0-only
 CFLAGS += -g $(KHDR_INCLUDES)
 
-TEST_GEN_PROGS := close_range_test
+TEST_GEN_PROGS := close_range_test cred_allow
 
 include ../lib.mk
 
diff --git a/tools/testing/selftests/core/cred_allow.c b/tools/testing/selftests/core/cred_allow.c
new file mode 100644
index 000000000000..07d533207a2c
--- /dev/null
+++ b/tools/testing/selftests/core/cred_allow.c
@@ -0,0 +1,139 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <errno.h>
+#include <fcntl.h>
+#include <sched.h>
+#include <stdio.h>
+#include <stdbool.h>
+#include <sys/stat.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <sys/socket.h>
+#include <time.h>
+#include <unistd.h>
+#include <string.h>
+
+#include "../kselftest.h"
+
+#ifndef O_CRED_ALLOW
+#define O_CRED_ALLOW 0x2000000
+#endif
+
+enum { FD_NORM, FD_CE, FD_CA, FD_MAX };
+
+static int is_opened(int n)
+{
+	char buf[256];
+
+	snprintf(buf, sizeof(buf), "/proc/self/fd/%d", n);
+	return (access(buf, F_OK) == 0);
+}
+
+/* Sends an FD on a UNIX socket. Returns 0 on success or -errno. */
+static int send_fd(int usock, int fd_tx)
+{
+	union {
+		/* Aligned ancillary data buffer. */
+		char buf[CMSG_SPACE(sizeof(fd_tx))];
+		struct cmsghdr _align;
+	} cmsg_tx = {};
+	char data_tx = '.';
+	struct iovec io = {
+		.iov_base = &data_tx,
+		.iov_len = sizeof(data_tx),
+	};
+	struct msghdr msg = {
+		.msg_iov = &io,
+		.msg_iovlen = 1,
+		.msg_control = &cmsg_tx.buf,
+		.msg_controllen = sizeof(cmsg_tx.buf),
+	};
+	struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
+
+	cmsg->cmsg_len = CMSG_LEN(sizeof(fd_tx));
+	cmsg->cmsg_level = SOL_SOCKET;
+	cmsg->cmsg_type = SCM_RIGHTS;
+	memcpy(CMSG_DATA(cmsg), &fd_tx, sizeof(fd_tx));
+
+	if (sendmsg(usock, &msg, 0) < 0)
+		return -errno;
+	return 0;
+}
+
+int main(int argc, char *argv[], char *env[])
+{
+	int status;
+	int err;
+	pid_t pid;
+	int socket_fds[2];
+	int fds[FD_MAX];
+#define NFD(n) ((n) + 3)
+#define FD_OK(n) (NFD(n) == fds[n] && is_opened(fds[n]))
+
+	if (argc > 1 && strcmp(argv[1], "--child") == 0) {
+		int nfd = 0;
+		ksft_print_msg("we are child\n");
+		ksft_set_plan(3);
+		nfd += is_opened(NFD(FD_NORM));
+		ksft_test_result(nfd == 1, "normal fd opened\n");
+		nfd += is_opened(NFD(FD_CE));
+		ksft_test_result(nfd == 1, "O_CLOEXEC fd closed\n");
+		nfd += is_opened(NFD(FD_CA));
+		ksft_test_result(nfd == 1, "O_CRED_ALLOW fd closed\n");
+		/* exit with non-zero status propagates to parent's failure */
+		ksft_finished();
+		return 0;
+	}
+
+	ksft_set_plan(7);
+
+	fds[FD_NORM] = open("/proc/self/exe", O_RDONLY);
+	fds[FD_CE] = open("/proc/self/exe", O_RDONLY | O_CLOEXEC);
+	fds[FD_CA] = open("/proc/self/exe", O_RDONLY | O_CRED_ALLOW);
+	ksft_test_result(FD_OK(FD_NORM), "regular open\n");
+	ksft_test_result(FD_OK(FD_CE), "O_CLOEXEC open\n");
+	ksft_test_result(FD_OK(FD_CA), "O_CRED_ALLOW open\n");
+
+	err = socketpair(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0, socket_fds);
+	if (err) {
+		ksft_perror("socketpair() failed");
+		ksft_exit_fail_msg("socketpair\n");
+		return 1;
+	}
+	err = send_fd(socket_fds[0], fds[FD_NORM]);
+	ksft_test_result(err == 0, "normal fd sent\n");
+	err = send_fd(socket_fds[0], fds[FD_CE]);
+	ksft_test_result(err == 0, "O_CLOEXEC fd sent\n");
+	err = send_fd(socket_fds[0], fds[FD_CA]);
+	ksft_test_result(err == -EPERM, "O_CRED_ALLOW fd not sent, EPERM\n");
+	close(socket_fds[0]);
+	close(socket_fds[1]);
+
+	pid = fork();
+	if (pid < 0) {
+		ksft_perror("fork() failed");
+		ksft_exit_fail_msg("fork\n");
+		return 1;
+	}
+
+	if (pid == 0) {
+		char *cargv[] = {"cred_allow", "--child", NULL};
+
+		execve("/proc/self/exe", cargv, env);
+		ksft_perror("execve() failed");
+		ksft_exit_fail_msg("execve\n");
+		return 1;
+	}
+
+	if (waitpid(pid, &status, 0) != pid) {
+		ksft_perror("waitpid() failed");
+		ksft_exit_fail_msg("waitpid\n");
+		return 1;
+	}
+	ksft_print_msg("back to parent\n");
+
+	ksft_test_result(status == 0, "child success\n");
+
+	ksft_finished();
+	return 0;
+}
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v6 3/3] openat2: add OA2_CRED_INHERIT flag
  2024-04-27 11:24 [PATCH v6 0/3] implement OA2_CRED_INHERIT flag for openat2() Stas Sergeev
  2024-04-27 11:24 ` [PATCH v6 1/3] fs: reorganize path_openat() Stas Sergeev
  2024-04-27 11:24 ` [PATCH v6 2/3] open: add O_CRED_ALLOW flag Stas Sergeev
@ 2024-04-27 11:24 ` Stas Sergeev
  2024-05-04 20:38   ` Donald Buczek
  2024-05-07  7:50 ` [PATCH v6 0/3] implement OA2_CRED_INHERIT flag for openat2() Aleksa Sarai
  3 siblings, 1 reply; 10+ messages in thread
From: Stas Sergeev @ 2024-04-27 11:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: Stas Sergeev, Stefan Metzmacher, Eric Biederman, Alexander Viro,
	Andy Lutomirski, Christian Brauner, Jan Kara, Jeff Layton,
	Chuck Lever, Alexander Aring, David Laight, linux-fsdevel,
	linux-api, Paolo Bonzini, Christian Göttsche

This flag performs the open operation with the fs credentials
(fsuid, fsgid, group_info) that were in effect when dir_fd was opened.
dir_fd must be opened with O_CRED_ALLOW, or EPERM is returned.

Selftests are added to check for these properties as well as for
the invalid flag combinations.

This allows the process to pre-open some directories and then
change eUID (and all other UIDs/GIDs) to a less-privileged user,
retaining the ability to open/create files within these directories.

Design goal:
The idea is to provide a very light-weight sandboxing, where the
process, without the use of any heavy-weight techniques like chroot
within namespaces, can restrict the access to the set of pre-opened
directories.
This patch is just a first step to such sandboxing. If things go
well, in the future the same extension can be added to more syscalls.
These should include at least unlinkat(), renameat2() and the
not-yet-upstreamed setxattrat().

Security considerations:
- Only the bare minimal set of credentials is overridden:
  fsuid, fsgid and group_info. The rest, for example capabilities,
  are not overridden to avoid unneeded security risks.
- To avoid sandboxing escape, this patch makes sure the restricted
  lookup modes are used. Namely, RESOLVE_BENEATH or RESOLVE_IN_ROOT.
- Magic /proc symlinks are discarded, as suggested by
  Andy Lutomirski <luto@kernel.org>
- O_CRED_ALLOW fds cannot be passed via unix socket and are always
  closed on exec() to prevent "unsuspecting userspace" from not being
  able to fully drop privs.

Use cases:
Virtual machines that deal with untrusted code, can use that
instead of a more heavy-weighted approaches.
Currently the approach is being tested on a dosemu2 VM.

Signed-off-by: Stas Sergeev <stsp2@yandex.ru>

CC: Stefan Metzmacher <metze@samba.org>
CC: Eric Biederman <ebiederm@xmission.com>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: Andy Lutomirski <luto@kernel.org>
CC: Christian Brauner <brauner@kernel.org>
CC: Jan Kara <jack@suse.cz>
CC: Jeff Layton <jlayton@kernel.org>
CC: Chuck Lever <chuck.lever@oracle.com>
CC: Alexander Aring <alex.aring@gmail.com>
CC: linux-fsdevel@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Christian Göttsche <cgzones@googlemail.com>
---
 fs/fcntl.c                                    |   2 +
 fs/namei.c                                    |  56 +++++++++-
 fs/open.c                                     |  10 +-
 include/linux/fcntl.h                         |   2 +
 include/uapi/linux/openat2.h                  |   2 +
 tools/testing/selftests/openat2/Makefile      |   2 +-
 .../testing/selftests/openat2/cred_inherit.c  | 105 ++++++++++++++++++
 .../testing/selftests/openat2/openat2_test.c  |  12 +-
 8 files changed, 186 insertions(+), 5 deletions(-)
 create mode 100644 tools/testing/selftests/openat2/cred_inherit.c

diff --git a/fs/fcntl.c b/fs/fcntl.c
index 78c96b1293c2..283c2e65fc2c 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -1043,6 +1043,8 @@ static int __init fcntl_init(void)
 		HWEIGHT32(
 			(VALID_OPEN_FLAGS & ~(O_NONBLOCK | O_NDELAY)) |
 			__FMODE_EXEC | __FMODE_NONOTIFY));
+	BUILD_BUG_ON(HWEIGHT32(VALID_OPENAT2_FLAGS) !=
+			HWEIGHT32(VALID_OPEN_FLAGS) + 1);
 
 	fasync_cache = kmem_cache_create("fasync_cache",
 					 sizeof(struct fasync_struct), 0,
diff --git a/fs/namei.c b/fs/namei.c
index dd50345f7260..aa5dcf57851b 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3776,6 +3776,43 @@ static int do_o_path(struct nameidata *nd, unsigned flags, struct file *file)
 	return error;
 }
 
+static const struct cred *openat2_init_creds(int dfd)
+{
+	struct cred *cred;
+	struct fd f;
+
+	if (dfd == AT_FDCWD)
+		return ERR_PTR(-EINVAL);
+
+	f = fdget_raw(dfd);
+	if (!f.file)
+		return ERR_PTR(-EBADF);
+
+	cred = ERR_PTR(-EPERM);
+	if (!(f.file->f_flags & O_CRED_ALLOW))
+		goto done;
+
+	cred = prepare_creds();
+	if (!cred) {
+		cred = ERR_PTR(-ENOMEM);
+		goto done;
+	}
+
+	cred->fsuid = f.file->f_cred->fsuid;
+	cred->fsgid = f.file->f_cred->fsgid;
+	cred->group_info = get_group_info(f.file->f_cred->group_info);
+
+done:
+	fdput(f);
+	return cred;
+}
+
+static void openat2_done_creds(const struct cred *cred)
+{
+	put_group_info(cred->group_info);
+	put_cred(cred);
+}
+
 static struct file *path_openat(struct nameidata *nd,
 			const struct open_flags *op, unsigned flags)
 {
@@ -3793,18 +3830,33 @@ static struct file *path_openat(struct nameidata *nd,
 			error = do_o_path(nd, flags, file);
 	} else {
 		const char *s;
+		const struct cred *old_cred = NULL, *cred = NULL;
 
-		file = alloc_empty_file(open_flags, current_cred());
-		if (IS_ERR(file))
+		if (open_flags & OA2_CRED_INHERIT) {
+			cred = openat2_init_creds(nd->dfd);
+			if (IS_ERR(cred))
+				return ERR_CAST(cred);
+		}
+		file = alloc_empty_file(open_flags, cred ?: current_cred());
+		if (IS_ERR(file)) {
+			if (cred)
+				openat2_done_creds(cred);
 			return file;
+		}
 
 		s = path_init(nd, flags);
+		if (cred)
+			old_cred = override_creds(cred);
 		while (!(error = link_path_walk(s, nd)) &&
 		       (s = open_last_lookups(nd, file, op)) != NULL)
 			;
 		if (!error)
 			error = do_open(nd, file, op);
+		if (old_cred)
+			revert_creds(old_cred);
 		terminate_walk(nd);
+		if (cred)
+			openat2_done_creds(cred);
 	}
 	if (likely(!error)) {
 		if (likely(file->f_mode & FMODE_OPENED))
diff --git a/fs/open.c b/fs/open.c
index ee8460c83c77..dd4fab536135 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1225,7 +1225,7 @@ inline int build_open_flags(const struct open_how *how, struct open_flags *op)
 	 * values before calling build_open_flags(), but openat2(2) checks all
 	 * of its arguments.
 	 */
-	if (flags & ~VALID_OPEN_FLAGS)
+	if (flags & ~VALID_OPENAT2_FLAGS)
 		return -EINVAL;
 	if (how->resolve & ~VALID_RESOLVE_FLAGS)
 		return -EINVAL;
@@ -1326,6 +1326,14 @@ inline int build_open_flags(const struct open_how *how, struct open_flags *op)
 		lookup_flags |= LOOKUP_CACHED;
 	}
 
+	if (flags & OA2_CRED_INHERIT) {
+		/* Inherit creds only with scoped look-up modes. */
+		if (!(lookup_flags & LOOKUP_IS_SCOPED))
+			return -EPERM;
+		/* Reject /proc "magic" links if inheriting creds. */
+		lookup_flags |= LOOKUP_NO_MAGICLINKS;
+	}
+
 	op->lookup_flags = lookup_flags;
 	return 0;
 }
diff --git a/include/linux/fcntl.h b/include/linux/fcntl.h
index e074ee9c1e36..33b9c7ad056b 100644
--- a/include/linux/fcntl.h
+++ b/include/linux/fcntl.h
@@ -12,6 +12,8 @@
 	 FASYNC	| O_DIRECT | O_LARGEFILE | O_DIRECTORY | O_NOFOLLOW | \
 	 O_NOATIME | O_CLOEXEC | O_PATH | __O_TMPFILE | O_CRED_ALLOW)
 
+#define VALID_OPENAT2_FLAGS (VALID_OPEN_FLAGS | OA2_CRED_INHERIT)
+
 /* List of all valid flags for the how->resolve argument: */
 #define VALID_RESOLVE_FLAGS \
 	(RESOLVE_NO_XDEV | RESOLVE_NO_MAGICLINKS | RESOLVE_NO_SYMLINKS | \
diff --git a/include/uapi/linux/openat2.h b/include/uapi/linux/openat2.h
index a5feb7604948..f803558ad62f 100644
--- a/include/uapi/linux/openat2.h
+++ b/include/uapi/linux/openat2.h
@@ -40,4 +40,6 @@ struct open_how {
 					return -EAGAIN if that's not
 					possible. */
 
+#define OA2_CRED_INHERIT		(1UL << 28)
+
 #endif /* _UAPI_LINUX_OPENAT2_H */
diff --git a/tools/testing/selftests/openat2/Makefile b/tools/testing/selftests/openat2/Makefile
index 254d676a2689..a1f4b5395f82 100644
--- a/tools/testing/selftests/openat2/Makefile
+++ b/tools/testing/selftests/openat2/Makefile
@@ -1,7 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0-or-later
 
 CFLAGS += -Wall -O2 -g -fsanitize=address -fsanitize=undefined -static-libasan
-TEST_GEN_PROGS := openat2_test resolve_test rename_attack_test
+TEST_GEN_PROGS := openat2_test resolve_test rename_attack_test cred_inherit
 
 include ../lib.mk
 
diff --git a/tools/testing/selftests/openat2/cred_inherit.c b/tools/testing/selftests/openat2/cred_inherit.c
new file mode 100644
index 000000000000..550a06763ac7
--- /dev/null
+++ b/tools/testing/selftests/openat2/cred_inherit.c
@@ -0,0 +1,105 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <errno.h>
+#include <fcntl.h>
+#include <sched.h>
+#include <stdio.h>
+#include <stdbool.h>
+#include <sys/stat.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <sys/socket.h>
+#include <time.h>
+#include <unistd.h>
+#include <string.h>
+
+#include "../kselftest.h"
+#include "helpers.h"
+
+#ifndef O_CRED_ALLOW
+#define O_CRED_ALLOW 0x2000000
+#endif
+
+#ifndef OA2_CRED_INHERIT
+#define OA2_CRED_INHERIT (1UL << 28)
+#endif
+
+enum { FD_NORM, FD_NCA, FD_DIR, FD_DCA, FD_MAX };
+
+int main(int argc, char *argv[], char *env[])
+{
+	struct open_how how1 = {
+				.flags = O_RDONLY,
+				.resolve = RESOLVE_BENEATH,
+			       };
+	struct open_how how2 = {
+				.flags = O_RDONLY | OA2_CRED_INHERIT,
+				.resolve = RESOLVE_BENEATH,
+			       };
+	int size = sizeof(struct open_how);
+	int i;
+	int fd;
+	int err;
+	int fds[FD_MAX];
+#define NFD(n) ((n) + 3)
+#define FD_OK(n) (NFD(n) == fds[n])
+
+	if (!openat2_supported) {
+		ksft_print_msg("openat2(2) unsupported\n");
+		return 0;
+	}
+
+	ksft_set_plan(14);
+
+	fds[FD_NORM] = open("/proc/self/maps", O_RDONLY);
+	fds[FD_NCA] = open("/proc/self/maps", O_RDONLY | O_CRED_ALLOW);
+	fds[FD_DIR] = open("/proc/self", O_RDONLY | O_DIRECTORY);
+	fds[FD_DCA] = open("/proc/self", O_RDONLY | O_DIRECTORY | O_CRED_ALLOW);
+	ksft_test_result(FD_OK(FD_NORM), "file open\n");
+	ksft_test_result(FD_OK(FD_NCA), "file open with O_CRED_ALLOW\n");
+	ksft_test_result(FD_OK(FD_DIR), "directory open\n");
+	ksft_test_result(FD_OK(FD_DCA), "directory open with O_CRED_ALLOW\n");
+
+	err = fchdir(fds[FD_DIR]);
+	if (err) {
+		ksft_perror("fchdir() failed");
+		ksft_exit_fail_msg("fchdir\n");
+		return 1;
+	}
+	fd = syscall(SYS_openat2, AT_FDCWD, "maps", &how1, size);
+	ksft_test_result(fd != -1, "AT_FDCWD success\n");
+	close(fd);
+	/* OA2_CRED_INHERIT fails with AT_FDCWD */
+	fd = syscall(SYS_openat2, AT_FDCWD, "maps", &how2, size);
+	ksft_test_result(fd == -1 && errno == EINVAL, "AT_FDCWD EINVAL\n");
+
+	fd = syscall(SYS_openat2, fds[FD_NORM], "maps", &how1, size);
+	ksft_test_result(fd == -1 && errno == ENOTDIR, "regilar file ENOTDIR\n");
+	/* No O_CRED_ALLOW -> EPERM */
+	fd = syscall(SYS_openat2, fds[FD_NORM], "maps", &how2, size);
+	ksft_test_result(fd == -1 && errno == EPERM, "regilar file EPERM\n");
+
+	fd = syscall(SYS_openat2, fds[FD_NCA], "maps", &how1, size);
+	ksft_test_result(fd == -1 && errno == ENOTDIR, "regilar file ENOTDIR\n");
+	fd = syscall(SYS_openat2, fds[FD_NCA], "maps", &how2, size);
+	ksft_test_result(fd == -1 && errno == ENOTDIR, "regilar file ENOTDIR\n");
+
+	fd = syscall(SYS_openat2, fds[FD_DIR], "maps", &how1, size);
+	ksft_test_result(fd != -1, "dir fd success\n");
+	close(fd);
+	/* No O_CRED_ALLOW -> EPERM */
+	fd = syscall(SYS_openat2, fds[FD_DIR], "maps", &how2, size);
+	ksft_test_result(fd == -1 && errno == EPERM, "dir fd EPERM\n");
+
+	fd = syscall(SYS_openat2, fds[FD_DCA], "maps", &how1, size);
+	ksft_test_result(fd != -1, "dir O_CRED_ALLOW fd success\n");
+	close(fd);
+	fd = syscall(SYS_openat2, fds[FD_DCA], "maps", &how2, size);
+	ksft_test_result(fd != -1, "dir O_CRED_ALLOW fd O_CRED_INHERIT success\n");
+	close(fd);
+
+	for (i = 0; i < FD_MAX; i++)
+		close(fds[i]);
+	ksft_finished();
+	return 0;
+}
diff --git a/tools/testing/selftests/openat2/openat2_test.c b/tools/testing/selftests/openat2/openat2_test.c
index 9024754530b2..5095288fe1ac 100644
--- a/tools/testing/selftests/openat2/openat2_test.c
+++ b/tools/testing/selftests/openat2/openat2_test.c
@@ -28,6 +28,10 @@
 #define	O_LARGEFILE 0x8000
 #endif
 
+#ifndef OA2_CRED_INHERIT
+#define OA2_CRED_INHERIT (1UL << 28)
+#endif
+
 struct open_how_ext {
 	struct open_how inner;
 	uint32_t extra1;
@@ -159,7 +163,7 @@ struct flag_test {
 	int err;
 };
 
-#define NUM_OPENAT2_FLAG_TESTS 25
+#define NUM_OPENAT2_FLAG_TESTS 27
 
 void test_openat2_flags(void)
 {
@@ -233,6 +237,12 @@ void test_openat2_flags(void)
 		{ .name = "invalid how.resolve and O_PATH",
 		  .how.flags = O_PATH,
 		  .how.resolve = 0x1337, .err = -EINVAL },
+		{ .name = "invalid how.resolve and OA2_CRED_INHERIT",
+		  .how.flags = OA2_CRED_INHERIT,
+		  .how.resolve = 0, .err = -EPERM },
+		{ .name = "invalid AT_FDCWD and OA2_CRED_INHERIT",
+		  .how.flags = OA2_CRED_INHERIT,
+		  .how.resolve = 0x08, .err = -EINVAL },
 
 		/* currently unknown upper 32 bit rejected. */
 		{ .name = "currently unknown bit (1 << 63)",
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v6 3/3] openat2: add OA2_CRED_INHERIT flag
  2024-04-27 11:24 ` [PATCH v6 3/3] openat2: add OA2_CRED_INHERIT flag Stas Sergeev
@ 2024-05-04 20:38   ` Donald Buczek
  2024-05-04 21:11     ` stsp
  0 siblings, 1 reply; 10+ messages in thread
From: Donald Buczek @ 2024-05-04 20:38 UTC (permalink / raw)
  To: Stas Sergeev, linux-kernel
  Cc: Stefan Metzmacher, Eric Biederman, Alexander Viro,
	Andy Lutomirski, Christian Brauner, Jan Kara, Jeff Layton,
	Chuck Lever, Alexander Aring, David Laight, linux-fsdevel,
	linux-api, Paolo Bonzini, Christian Göttsche

On 4/27/24 13:24, Stas Sergeev wrote:
> This flag performs the open operation with the fs credentials
> (fsuid, fsgid, group_info) that were in effect when dir_fd was opened.
> dir_fd must be opened with O_CRED_ALLOW, or EPERM is returned.
> 
> Selftests are added to check for these properties as well as for
> the invalid flag combinations.
> 
> This allows the process to pre-open some directories and then
> change eUID (and all other UIDs/GIDs) to a less-privileged user,
> retaining the ability to open/create files within these directories.
> 
> Design goal:
> The idea is to provide a very light-weight sandboxing, where the
> process, without the use of any heavy-weight techniques like chroot
> within namespaces, can restrict the access to the set of pre-opened
> directories.
> This patch is just a first step to such sandboxing. If things go
> well, in the future the same extension can be added to more syscalls.
> These should include at least unlinkat(), renameat2() and the
> not-yet-upstreamed setxattrat().
> 
> Security considerations:
> - Only the bare minimal set of credentials is overridden:
>    fsuid, fsgid and group_info. The rest, for example capabilities,
>    are not overridden to avoid unneeded security risks.
> - To avoid sandboxing escape, this patch makes sure the restricted
>    lookup modes are used. Namely, RESOLVE_BENEATH or RESOLVE_IN_ROOT.
> - Magic /proc symlinks are discarded, as suggested by
>    Andy Lutomirski <luto@kernel.org>> - O_CRED_ALLOW fds cannot be passed via unix socket and are always
>    closed on exec() to prevent "unsuspecting userspace" from not being
>    able to fully drop privs.

What about hard links?

== snip ==
#include <sys/types.h>
#include <sys/stat.h>
#include <stdio.h>
#include <errno.h>
#include <stdlib.h>
#include <stdarg.h>
#include <fcntl.h>
#include <unistd.h>
#include <linux/openat2.h>

#define O_CRED_ALLOW 0x2000000
#define OA2_CRED_INHERIT (1UL << 28)

#define SYS_openat2 437
long openat2(int dirfd, const char *pathname, struct open_how *how, size_t size) {
     return syscall(SYS_openat2, dirfd, pathname, how, size);
}


__attribute__ ((noreturn, format(printf, 1, 2)))
static void die(const char *restrict fmt, ...) {
     va_list ap;
     va_start(ap, fmt);
     vfprintf(stderr, fmt, ap);
     va_end(ap);
     _exit(1);
}

int main() {

     unlink("/tmp/d/test.dat");
     unlink("/tmp/d/hostname");
     if (rmdir("/tmp/d") != 0 && errno != ENOENT)
         die("/tmp/d: %m\n");
     
     umask(0);
     if (mkdir("/tmp/d", 0777) != 0)
         die("/tmp/d: %m\n");

     int dirfd = open("/tmp/d", O_RDONLY + O_CRED_ALLOW);
     if (dirfd == -1)
         die("/tmp/d: %m\n");

     if (setuid(1000) != 0)
         die("setuid: %m\n");

     if (link("/etc/hostname", "/tmp/d/hostname") == -1)
         die ("/etc/hostname: %m\n");

     if(openat(dirfd, "hostname", O_RDWR) != -1)
         die("/tmp/d/hostname could be opened by uid 1000");

     {   struct open_how how = { .flags = O_RDWR + OA2_CRED_INHERIT, .resolve = RESOLVE_BENEATH };
         if (openat2(dirfd, "hostname", &how, sizeof(how)) == -1)
             die("hostname: %m\n");
         printf("able to open /etc/hostname RDWR \n");
     }
}

== snip ==


buczek@dose:~$ gcc -O0 -Wall -Wextra -Werror -g -o test test.c
buczek@dose:~$ sudo ./test
able to open /etc/hostname RDWR
buczek@dose:~$


-- 
Donald Buczek
buczek@molgen.mpg.de
Tel: +49 30 8413 1433

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v6 3/3] openat2: add OA2_CRED_INHERIT flag
  2024-05-04 20:38   ` Donald Buczek
@ 2024-05-04 21:11     ` stsp
  0 siblings, 0 replies; 10+ messages in thread
From: stsp @ 2024-05-04 21:11 UTC (permalink / raw)
  To: Donald Buczek, linux-kernel
  Cc: Stefan Metzmacher, Eric Biederman, Alexander Viro,
	Andy Lutomirski, Christian Brauner, Jan Kara, Jeff Layton,
	Chuck Lever, Alexander Aring, David Laight, linux-fsdevel,
	linux-api, Paolo Bonzini, Christian Göttsche

04.05.2024 23:38, Donald Buczek пишет:
> On 4/27/24 13:24, Stas Sergeev wrote:
>> This flag performs the open operation with the fs credentials
>> (fsuid, fsgid, group_info) that were in effect when dir_fd was opened.
>> dir_fd must be opened with O_CRED_ALLOW, or EPERM is returned.
>>
>> Selftests are added to check for these properties as well as for
>> the invalid flag combinations.
>>
>> This allows the process to pre-open some directories and then
>> change eUID (and all other UIDs/GIDs) to a less-privileged user,
>> retaining the ability to open/create files within these directories.
>>
>> Design goal:
>> The idea is to provide a very light-weight sandboxing, where the
>> process, without the use of any heavy-weight techniques like chroot
>> within namespaces, can restrict the access to the set of pre-opened
>> directories.
>> This patch is just a first step to such sandboxing. If things go
>> well, in the future the same extension can be added to more syscalls.
>> These should include at least unlinkat(), renameat2() and the
>> not-yet-upstreamed setxattrat().
>>
>> Security considerations:
>> - Only the bare minimal set of credentials is overridden:
>>    fsuid, fsgid and group_info. The rest, for example capabilities,
>>    are not overridden to avoid unneeded security risks.
>> - To avoid sandboxing escape, this patch makes sure the restricted
>>    lookup modes are used. Namely, RESOLVE_BENEATH or RESOLVE_IN_ROOT.
>> - Magic /proc symlinks are discarded, as suggested by
>>    Andy Lutomirski <luto@kernel.org>> - O_CRED_ALLOW fds cannot be 
>> passed via unix socket and are always
>>    closed on exec() to prevent "unsuspecting userspace" from not being
>>    able to fully drop privs.
>
> What about hard links?
Well, you set umask to 0 in your example.
If you didn't do that, the dir wouldn't have
0777 perms, and the hard link would not
be created.
But yes, that demonstrates the unsafe
usage scenario, i.e. unsafe directory perms
immediately lead to a security hole.
Maybe O_CRED_ALLOW should check for
safe perms, or maybe it shouldn't... So far
there are no signs of this patch to ever be
accepted, so I am not sure if more complexity
needs to be added to it.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v6 0/3] implement OA2_CRED_INHERIT flag for openat2()
  2024-04-27 11:24 [PATCH v6 0/3] implement OA2_CRED_INHERIT flag for openat2() Stas Sergeev
                   ` (2 preceding siblings ...)
  2024-04-27 11:24 ` [PATCH v6 3/3] openat2: add OA2_CRED_INHERIT flag Stas Sergeev
@ 2024-05-07  7:50 ` Aleksa Sarai
  2024-05-07  9:02   ` stsp
  3 siblings, 1 reply; 10+ messages in thread
From: Aleksa Sarai @ 2024-05-07  7:50 UTC (permalink / raw)
  To: Stas Sergeev
  Cc: linux-kernel, Stefan Metzmacher, Eric Biederman, Alexander Viro,
	Andy Lutomirski, Christian Brauner, Jan Kara, Jeff Layton,
	Chuck Lever, Alexander Aring, David Laight, linux-fsdevel,
	linux-api, Paolo Bonzini, Christian Göttsche

[-- Attachment #1: Type: text/plain, Size: 2570 bytes --]

On 2024-04-27, Stas Sergeev <stsp2@yandex.ru> wrote:
> This patch-set implements the OA2_CRED_INHERIT flag for openat2() syscall.
> It is needed to perform an open operation with the creds that were in
> effect when the dir_fd was opened, if the dir was opened with O_CRED_ALLOW
> flag. This allows the process to pre-open some dirs and switch eUID
> (and other UIDs/GIDs) to the less-privileged user, while still retaining
> the possibility to open/create files within the pre-opened directory set.
> 
> The sand-boxing is security-oriented: symlinks leading outside of a
> sand-box are rejected. /proc magic links are rejected. fds opened with
> O_CRED_ALLOW are always closed on exec() and cannot be passed via unix
> socket.
> The more detailed description (including security considerations)
> is available in the log messages of individual patches.

(I meant to reply last week but I couldn't get my mail server to send
mail...)

It seems to me that this can already be implemented using
MOUNT_ATTR_IDMAP, without creating a new form of credential overriding
within the filesystem (and with such a deceptively simple
implementation...)

If you are a privileged process which plans to change users, you can
create a detached tree with a user mapping that gives that user access
to only that tree. This is far more effective at restricting possible
attacks because id-mapped mounts don't override credentials during VFS
operations (meaning that if you miss something, you have a big problem),
instead they only affect uid-related operations within the filesystem
for that mount. Since this implementation does no inherit
CAP_DAC_OVERRIDE, being able to rewrite uid/gids is all you need.

A new attack I just thought of while writing this mail is that because
there is no RESOLVE_NO_XDEV requirement, it should be possible for the
process to get an arbitrary write primitive by creating a new
userns+mountns and then bind-mounting / underneath the directory. Since
O_CRED_INHERIT uses override_creds, it doesn't care about whether
something about the O_CRED_ALLOW directory changed afterwards. Yes, you
can "just fix this" by adding a RESOLVE_NO_XDEV requirement too, but
given that there have been 2-3 security issues with this design found
already, it makes me feel really uneasy. Using id-mapped mounts avoids
this issue because the new mount will not have the id-mapping applied
and thus there is no security issue.

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v6 0/3] implement OA2_CRED_INHERIT flag for openat2()
  2024-05-07  7:50 ` [PATCH v6 0/3] implement OA2_CRED_INHERIT flag for openat2() Aleksa Sarai
@ 2024-05-07  9:02   ` stsp
  2024-05-07 11:58     ` Aleksa Sarai
  0 siblings, 1 reply; 10+ messages in thread
From: stsp @ 2024-05-07  9:02 UTC (permalink / raw)
  To: Aleksa Sarai
  Cc: linux-kernel, Stefan Metzmacher, Eric Biederman, Alexander Viro,
	Andy Lutomirski, Christian Brauner, Jan Kara, Jeff Layton,
	Chuck Lever, Alexander Aring, David Laight, linux-fsdevel,
	linux-api, Paolo Bonzini, Christian Göttsche

07.05.2024 10:50, Aleksa Sarai пишет:
> If you are a privileged process which plans to change users,

Not privileged at all.
But I think what you say is still possible
with userns?


> A new attack I just thought of while writing this mail is that because
> there is no RESOLVE_NO_XDEV requirement, it should be possible for the
> process to get an arbitrary write primitive by creating a new
> userns+mountns and then bind-mounting / underneath the directory.
Doesn't this need a write perm to a
directory? In his case this is not a threat,
because you are not supposed to have a
write perm to that dir. OA2_CRED_INHERIT
is the only way to write.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v6 0/3] implement OA2_CRED_INHERIT flag for openat2()
  2024-05-07  9:02   ` stsp
@ 2024-05-07 11:58     ` Aleksa Sarai
  2024-05-07 12:48       ` stsp
  0 siblings, 1 reply; 10+ messages in thread
From: Aleksa Sarai @ 2024-05-07 11:58 UTC (permalink / raw)
  To: stsp
  Cc: linux-kernel, Stefan Metzmacher, Eric Biederman, Alexander Viro,
	Andy Lutomirski, Christian Brauner, Jan Kara, Jeff Layton,
	Chuck Lever, Alexander Aring, David Laight, linux-fsdevel,
	linux-api, Paolo Bonzini, Christian Göttsche

[-- Attachment #1: Type: text/plain, Size: 1758 bytes --]

On 2024-05-07, stsp <stsp2@yandex.ru> wrote:
> 07.05.2024 10:50, Aleksa Sarai пишет:
> > If you are a privileged process which plans to change users,
> 
> Not privileged at all. But I think what you say is still possible with
> userns?

It is possible to configure MOUNT_ATTR_IDMAP in a user namespace but
there are some restrictions that I suspect will make this complicated.
If you try to do something with a regular filesystem you'll probably run
into issues because you won't have CAP_SYS_ADMIN in the super block's
userns. But you could probably do it with tmpfs.

> > A new attack I just thought of while writing this mail is that because
> > there is no RESOLVE_NO_XDEV requirement, it should be possible for the
> > process to get an arbitrary write primitive by creating a new
> > userns+mountns and then bind-mounting / underneath the directory.
> Doesn't this need a write perm to a
> directory? In his case this is not a threat,
> because you are not supposed to have a
> write perm to that dir. OA2_CRED_INHERIT
> is the only way to write.

No, bind-mounts don't require write permission. As long as you can
resolve the target path you can bind-mount on top of it, so if there's a
subdirectory you can bind-mount / underneath (and if there is only a
file you can bind-mount any file you want to access/overwrite instead).

There are restrictions on mounting through /proc/self/fd/... but they
don't apply here (all files opened by a process doing setns/unshare have
their vfsmounts updated to be from the new mount namespace, meaning you
can do mounts through them with /proc/self/fd/... without issue.)

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v6 0/3] implement OA2_CRED_INHERIT flag for openat2()
  2024-05-07 11:58     ` Aleksa Sarai
@ 2024-05-07 12:48       ` stsp
  0 siblings, 0 replies; 10+ messages in thread
From: stsp @ 2024-05-07 12:48 UTC (permalink / raw)
  To: Aleksa Sarai
  Cc: linux-kernel, Stefan Metzmacher, Eric Biederman, Alexander Viro,
	Andy Lutomirski, Christian Brauner, Jan Kara, Jeff Layton,
	Chuck Lever, Alexander Aring, David Laight, linux-fsdevel,
	linux-api, Paolo Bonzini, Christian Göttsche

07.05.2024 14:58, Aleksa Sarai пишет:
> On 2024-05-07, stsp <stsp2@yandex.ru> wrote:
>> 07.05.2024 10:50, Aleksa Sarai пишет:
>>> If you are a privileged process which plans to change users,
>> Not privileged at all. But I think what you say is still possible with
>> userns?
> It is possible to configure MOUNT_ATTR_IDMAP in a user namespace but
> there are some restrictions that I suspect will make this complicated.
> If you try to do something with a regular filesystem you'll probably run
> into issues because you won't have CAP_SYS_ADMIN in the super block's
> userns. But you could probably do it with tmpfs.

Then its likely not a replacement for
my proposal, as I really don't need that
on tmpfs.
Perhaps right now I can use the helper
process and an rpc as a replacement.
This is much more work and is slower,
but more or less can approximate my
original design decision quite precisely.
Another disadvantage of an rpc approach
is that the fds I get from the helper
process, can not be trusted, as in this
case kernel doesn't guarantee the fd
actually refers to the resource I requested.
I've seen a few OSes where rpc is checked
by a trusted entity to avoid such problem.

>>> A new attack I just thought of while writing this mail is that because
>>> there is no RESOLVE_NO_XDEV requirement, it should be possible for the
>>> process to get an arbitrary write primitive by creating a new
>>> userns+mountns and then bind-mounting / underneath the directory.
>> Doesn't this need a write perm to a
>> directory? In his case this is not a threat,
>> because you are not supposed to have a
>> write perm to that dir. OA2_CRED_INHERIT
>> is the only way to write.
> No, bind-mounts don't require write permission.

Oh, isn't this a problem by itself?
Yes, in this case my patch needs to
avoid RESOLVE_NO_XDEV, but I find this a harsh restriction. Maybe the 
bind mount was done before a priv drop? Then it is fully legitimate. 
Anyway, I don't know if I should work on it or not, as there seem to be 
no indication of a possible acceptance.


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2024-05-07 12:55 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-27 11:24 [PATCH v6 0/3] implement OA2_CRED_INHERIT flag for openat2() Stas Sergeev
2024-04-27 11:24 ` [PATCH v6 1/3] fs: reorganize path_openat() Stas Sergeev
2024-04-27 11:24 ` [PATCH v6 2/3] open: add O_CRED_ALLOW flag Stas Sergeev
2024-04-27 11:24 ` [PATCH v6 3/3] openat2: add OA2_CRED_INHERIT flag Stas Sergeev
2024-05-04 20:38   ` Donald Buczek
2024-05-04 21:11     ` stsp
2024-05-07  7:50 ` [PATCH v6 0/3] implement OA2_CRED_INHERIT flag for openat2() Aleksa Sarai
2024-05-07  9:02   ` stsp
2024-05-07 11:58     ` Aleksa Sarai
2024-05-07 12:48       ` stsp

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).