Linux-Fsdevel Archive on lore.kernel.org
 help / color / Atom feed
* [RESEND PATCH v2 0/3] NFS User Namespaces with new mount API
@ 2020-10-16 12:45 Sargun Dhillon
  2020-10-16 12:45 ` [RESEND PATCH v2 1/3] NFS: Use cred from fscontext during fsmount Sargun Dhillon
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Sargun Dhillon @ 2020-10-16 12:45 UTC (permalink / raw)
  To: J . Bruce Fields, Chuck Lever, Trond Myklebust, Anna Schumaker,
	David Howells
  Cc: Sargun Dhillon, linux-nfs, linux-fsdevel, linux-kernel

This patchset adds some functionality to allow NFS to be used from
NFS namespaces (containers).

Changes since v1:
  * Added samples

Sargun Dhillon (3):
  NFS: Use cred from fscontext during fsmount
  samples/vfs: Split out common code for new syscall APIs
  samples/vfs: Add example leveraging NFS with new APIs and user
    namespaces

 fs/nfs/client.c                        |   2 +-
 fs/nfs/flexfilelayout/flexfilelayout.c |   1 +
 fs/nfs/nfs4client.c                    |   2 +-
 samples/vfs/.gitignore                 |   2 +
 samples/vfs/Makefile                   |   5 +-
 samples/vfs/test-fsmount.c             |  86 +-----------
 samples/vfs/test-nfs-userns.c          | 181 +++++++++++++++++++++++++
 samples/vfs/vfs-helper.c               |  43 ++++++
 samples/vfs/vfs-helper.h               |  55 ++++++++
 9 files changed, 289 insertions(+), 88 deletions(-)
 create mode 100644 samples/vfs/test-nfs-userns.c
 create mode 100644 samples/vfs/vfs-helper.c
 create mode 100644 samples/vfs/vfs-helper.h

-- 
2.25.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [RESEND PATCH v2 1/3] NFS: Use cred from fscontext during fsmount
  2020-10-16 12:45 [RESEND PATCH v2 0/3] NFS User Namespaces with new mount API Sargun Dhillon
@ 2020-10-16 12:45 ` Sargun Dhillon
  2020-10-16 12:45 ` [RESEND PATCH v2 2/3] samples/vfs: Split out common code for new syscall APIs Sargun Dhillon
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Sargun Dhillon @ 2020-10-16 12:45 UTC (permalink / raw)
  To: J . Bruce Fields, Chuck Lever, Trond Myklebust, Anna Schumaker,
	David Howells
  Cc: Sargun Dhillon, linux-nfs, linux-fsdevel, linux-kernel, Al Viro,
	Kyle Anderson

In several patches, support was introduced to NFS for user namespaces:

ccfe51a5161c: SUNRPC: Fix the server AUTH_UNIX userspace mappings
e6667c73a27d: SUNRPC: rsi_parse() should use the current user namespace
1a58e8a0e5c1: NFS: Store the credential of the mount process in the nfs_server
283ebe3ec415: SUNRPC: Use the client user namespace when encoding creds
ac83228a7101: SUNRPC: Use namespace of listening daemon in the client AUTH_GSS upcall
264d948ce7d0: NFS: Convert NFSv3 to use the container user namespace
58002399da65: NFSv4: Convert the NFS client idmapper to use the container user namespace
c207db2f5da5: NFS: Convert NFSv2 to use the container user namespace
3b7eb5e35d0f: NFS: When mounting, don't share filesystems between different user namespaces

All of these commits are predicated on the NFS server being created with
credentials that are in the user namespace of interest. The new VFS
mount APIs help in this[1], in that the creation of the FSFD (fsopen)
captures a set of credentials at creation time.

Normally, the new file system API users automatically get their
super block's user_ns set to the fc->user_ns in sget_fc, but since
NFS has to do special manipulation of UIDs / GIDs on the wire,
it keeps track of credentials itself.

Unfortunately, the credentials that the NFS uses are the current_creds
at the time FSCONFIG_CMD_CREATE is called. When FSCONFIG_CMD_CREATE is
called, simultaneously, mount_capable is checked -- which checks if
the user has CAP_SYS_ADMIN in the init_user_ns because NFS does not
have FS_USERNS_MOUNT.

This makes a subtle change so that the struct cred from fsopen
is used instead. Since the fs_context is available at server
creation time, and it has the credentials, we can just use
those.

This roughly allows a privileged user to mount on behalf of an unprivileged
usernamespace, by forking off and calling fsopen in the unprivileged user
namespace. It can then pass back that fsfd to the privileged process which
can configure the NFS mount, and then it can call FSCONFIG_CMD_CREATE
before switching back into the mount namespace of the container, and finish
up the mounting process and call fsmount and move_mount.

This change makes a small user space change if the user performs this
elaborate process of passing around file descriptors, and switching
namespaces. There may be a better way to go about this, or even enable
FS_USERNS_MOUNT on NFS, but this seems like the safest and most
straightforward approach.

[1]: https://lore.kernel.org/linux-fsdevel/155059610368.17079.2220554006494174417.stgit@warthog.procyon.org.uk/

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Cc: J. Bruce Fields <bfields@fieldses.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kyle Anderson <kylea@netflix.com>
---
 fs/nfs/client.c     | 2 +-
 fs/nfs/nfs4client.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index f1ff3076e4a4..fdefcc649884 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -967,7 +967,7 @@ struct nfs_server *nfs_create_server(struct fs_context *fc)
 	if (!server)
 		return ERR_PTR(-ENOMEM);
 
-	server->cred = get_cred(current_cred());
+	server->cred = get_cred(fc->cred);
 
 	error = -ENOMEM;
 	fattr = nfs_alloc_fattr();
diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
index 0bd77cc1f639..92ff6fb8e324 100644
--- a/fs/nfs/nfs4client.c
+++ b/fs/nfs/nfs4client.c
@@ -1120,7 +1120,7 @@ struct nfs_server *nfs4_create_server(struct fs_context *fc)
 	if (!server)
 		return ERR_PTR(-ENOMEM);
 
-	server->cred = get_cred(current_cred());
+	server->cred = get_cred(fc->cred);
 
 	auth_probe = ctx->auth_info.flavor_len < 1;
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [RESEND PATCH v2 2/3] samples/vfs: Split out common code for new syscall APIs
  2020-10-16 12:45 [RESEND PATCH v2 0/3] NFS User Namespaces with new mount API Sargun Dhillon
  2020-10-16 12:45 ` [RESEND PATCH v2 1/3] NFS: Use cred from fscontext during fsmount Sargun Dhillon
@ 2020-10-16 12:45 ` Sargun Dhillon
  2020-10-16 12:45 ` [RESEND PATCH v2 3/3] samples/vfs: Add example leveraging NFS with new APIs and user namespaces Sargun Dhillon
  2020-10-17 23:59 ` [RESEND PATCH v2 0/3] NFS User Namespaces with new mount API Sargun Dhillon
  3 siblings, 0 replies; 5+ messages in thread
From: Sargun Dhillon @ 2020-10-16 12:45 UTC (permalink / raw)
  To: J . Bruce Fields, Chuck Lever, Trond Myklebust, Anna Schumaker,
	David Howells
  Cc: Sargun Dhillon, linux-nfs, linux-fsdevel, linux-kernel, Al Viro,
	Kyle Anderson

There are a bunch of helper functions which make using the new
mount APIs much easier. As we add examples of leveraging the
new APIs, it probably makes sense to promote code reuse.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kyle Anderson <kylea@netflix.com>
---
 samples/vfs/Makefile       |  2 +
 samples/vfs/test-fsmount.c | 86 +-------------------------------------
 samples/vfs/vfs-helper.c   | 43 +++++++++++++++++++
 samples/vfs/vfs-helper.h   | 55 ++++++++++++++++++++++++
 4 files changed, 101 insertions(+), 85 deletions(-)
 create mode 100644 samples/vfs/vfs-helper.c
 create mode 100644 samples/vfs/vfs-helper.h

diff --git a/samples/vfs/Makefile b/samples/vfs/Makefile
index 00b6824f9237..7f76875eaa70 100644
--- a/samples/vfs/Makefile
+++ b/samples/vfs/Makefile
@@ -1,5 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0-only
+test-fsmount-objs := test-fsmount.o vfs-helper.o
 userprogs := test-fsmount test-statx
+
 always-y := $(userprogs)
 
 userccflags += -I usr/include
diff --git a/samples/vfs/test-fsmount.c b/samples/vfs/test-fsmount.c
index 50f47b72e85f..36a4fa886200 100644
--- a/samples/vfs/test-fsmount.c
+++ b/samples/vfs/test-fsmount.c
@@ -14,91 +14,7 @@
 #include <sys/wait.h>
 #include <linux/mount.h>
 #include <linux/unistd.h>
-
-#define E(x) do { if ((x) == -1) { perror(#x); exit(1); } } while(0)
-
-static void check_messages(int fd)
-{
-	char buf[4096];
-	int err, n;
-
-	err = errno;
-
-	for (;;) {
-		n = read(fd, buf, sizeof(buf));
-		if (n < 0)
-			break;
-		n -= 2;
-
-		switch (buf[0]) {
-		case 'e':
-			fprintf(stderr, "Error: %*.*s\n", n, n, buf + 2);
-			break;
-		case 'w':
-			fprintf(stderr, "Warning: %*.*s\n", n, n, buf + 2);
-			break;
-		case 'i':
-			fprintf(stderr, "Info: %*.*s\n", n, n, buf + 2);
-			break;
-		}
-	}
-
-	errno = err;
-}
-
-static __attribute__((noreturn))
-void mount_error(int fd, const char *s)
-{
-	check_messages(fd);
-	fprintf(stderr, "%s: %m\n", s);
-	exit(1);
-}
-
-/* Hope -1 isn't a syscall */
-#ifndef __NR_fsopen
-#define __NR_fsopen -1
-#endif
-#ifndef __NR_fsmount
-#define __NR_fsmount -1
-#endif
-#ifndef __NR_fsconfig
-#define __NR_fsconfig -1
-#endif
-#ifndef __NR_move_mount
-#define __NR_move_mount -1
-#endif
-
-
-static inline int fsopen(const char *fs_name, unsigned int flags)
-{
-	return syscall(__NR_fsopen, fs_name, flags);
-}
-
-static inline int fsmount(int fsfd, unsigned int flags, unsigned int ms_flags)
-{
-	return syscall(__NR_fsmount, fsfd, flags, ms_flags);
-}
-
-static inline int fsconfig(int fsfd, unsigned int cmd,
-			   const char *key, const void *val, int aux)
-{
-	return syscall(__NR_fsconfig, fsfd, cmd, key, val, aux);
-}
-
-static inline int move_mount(int from_dfd, const char *from_pathname,
-			     int to_dfd, const char *to_pathname,
-			     unsigned int flags)
-{
-	return syscall(__NR_move_mount,
-		       from_dfd, from_pathname,
-		       to_dfd, to_pathname, flags);
-}
-
-#define E_fsconfig(fd, cmd, key, val, aux)				\
-	do {								\
-		if (fsconfig(fd, cmd, key, val, aux) == -1)		\
-			mount_error(fd, key ?: "create");		\
-	} while (0)
+#include "vfs-helper.h"
 
 int main(int argc, char *argv[])
 {
diff --git a/samples/vfs/vfs-helper.c b/samples/vfs/vfs-helper.c
new file mode 100644
index 000000000000..136c6cb81540
--- /dev/null
+++ b/samples/vfs/vfs-helper.c
@@ -0,0 +1,43 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <errno.h>
+#include "vfs-helper.h"
+
+void check_messages(int fd)
+{
+	char buf[4096];
+	int err, n;
+
+	err = errno;
+
+	for (;;) {
+		n = read(fd, buf, sizeof(buf));
+		if (n < 0)
+			break;
+		n -= 2;
+
+		switch (buf[0]) {
+		case 'e':
+			fprintf(stderr, "Error: %*.*s\n", n, n, buf + 2);
+			break;
+		case 'w':
+			fprintf(stderr, "Warning: %*.*s\n", n, n, buf + 2);
+			break;
+		case 'i':
+			fprintf(stderr, "Info: %*.*s\n", n, n, buf + 2);
+			break;
+		}
+	}
+
+	errno = err;
+}
+
+__attribute__((noreturn))
+void mount_error(int fd, const char *s)
+{
+	check_messages(fd);
+	fprintf(stderr, "%s: %m\n", s);
+	exit(1);
+}
diff --git a/samples/vfs/vfs-helper.h b/samples/vfs/vfs-helper.h
new file mode 100644
index 000000000000..28c441f2fcbf
--- /dev/null
+++ b/samples/vfs/vfs-helper.h
@@ -0,0 +1,55 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#include <linux/unistd.h>
+#include <linux/mount.h>
+#include <sys/syscall.h>
+
+#define E(x) do { if ((x) == -1) { perror(#x); exit(1); } } while(0)
+
+/* Hope -1 isn't a syscall */
+#ifndef __NR_fsopen
+#define __NR_fsopen -1
+#endif
+#ifndef __NR_fsmount
+#define __NR_fsmount -1
+#endif
+#ifndef __NR_fsconfig
+#define __NR_fsconfig -1
+#endif
+#ifndef __NR_move_mount
+#define __NR_move_mount -1
+#endif
+
+#define E_fsconfig(fd, cmd, key, val, aux)				\
+	do {								\
+		if (fsconfig(fd, cmd, key, val, aux) == -1)		\
+			mount_error(fd, key ?: "create");		\
+	} while (0)
+
+static inline int fsopen(const char *fs_name, unsigned int flags)
+{
+	return syscall(__NR_fsopen, fs_name, flags);
+}
+
+static inline int fsmount(int fsfd, unsigned int flags, unsigned int ms_flags)
+{
+	return syscall(__NR_fsmount, fsfd, flags, ms_flags);
+}
+
+static inline int fsconfig(int fsfd, unsigned int cmd,
+			   const char *key, const void *val, int aux)
+{
+	return syscall(__NR_fsconfig, fsfd, cmd, key, val, aux);
+}
+
+static inline int move_mount(int from_dfd, const char *from_pathname,
+			     int to_dfd, const char *to_pathname,
+			     unsigned int flags)
+{
+	return syscall(__NR_move_mount,
+		       from_dfd, from_pathname,
+		       to_dfd, to_pathname, flags);
+}
+
+__attribute__((noreturn))
+void mount_error(int fd, const char *s);
+void check_messages(int fd);
-- 
2.25.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [RESEND PATCH v2 3/3] samples/vfs: Add example leveraging NFS with new APIs and user namespaces
  2020-10-16 12:45 [RESEND PATCH v2 0/3] NFS User Namespaces with new mount API Sargun Dhillon
  2020-10-16 12:45 ` [RESEND PATCH v2 1/3] NFS: Use cred from fscontext during fsmount Sargun Dhillon
  2020-10-16 12:45 ` [RESEND PATCH v2 2/3] samples/vfs: Split out common code for new syscall APIs Sargun Dhillon
@ 2020-10-16 12:45 ` Sargun Dhillon
  2020-10-17 23:59 ` [RESEND PATCH v2 0/3] NFS User Namespaces with new mount API Sargun Dhillon
  3 siblings, 0 replies; 5+ messages in thread
From: Sargun Dhillon @ 2020-10-16 12:45 UTC (permalink / raw)
  To: J . Bruce Fields, Chuck Lever, Trond Myklebust, Anna Schumaker,
	David Howells
  Cc: Sargun Dhillon, linux-nfs, linux-fsdevel, linux-kernel, Al Viro,
	Kyle Anderson

This adds an example which assumes you already have an NFS server setup,
but does the work of creating a user namespace, and an NFS mount from
that user namespace which then exposes different UIDs than that of
the init user namespace.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Cc: J. Bruce Fields <bfields@fieldses.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kyle Anderson <kylea@netflix.com>
---
 fs/nfs/flexfilelayout/flexfilelayout.c |   1 +
 samples/vfs/.gitignore                 |   2 +
 samples/vfs/Makefile                   |   3 +-
 samples/vfs/test-nfs-userns.c          | 181 +++++++++++++++++++++++++
 4 files changed, 186 insertions(+), 1 deletion(-)
 create mode 100644 samples/vfs/test-nfs-userns.c

diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index f9348ed1bcda..ee45ff7d75ac 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -361,6 +361,7 @@ ff_layout_alloc_lseg(struct pnfs_layout_hdr *lh,
 		     struct nfs4_layoutget_res *lgr,
 		     gfp_t gfp_flags)
 {
+	struct user_namespace *user_ns = lh->plh_lc_cred->user_ns;
 	struct pnfs_layout_segment *ret;
 	struct nfs4_ff_layout_segment *fls = NULL;
 	struct xdr_stream stream;
diff --git a/samples/vfs/.gitignore b/samples/vfs/.gitignore
index 8fdabf7e5373..1d09826b31a6 100644
--- a/samples/vfs/.gitignore
+++ b/samples/vfs/.gitignore
@@ -1,3 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0-only
 test-fsmount
 test-statx
+test-nfs-userns
+
diff --git a/samples/vfs/Makefile b/samples/vfs/Makefile
index 7f76875eaa70..6a2926080c08 100644
--- a/samples/vfs/Makefile
+++ b/samples/vfs/Makefile
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0-only
 test-fsmount-objs := test-fsmount.o vfs-helper.o
-userprogs := test-fsmount test-statx
+test-nfs-userns-objs := test-nfs-userns.o vfs-helper.o
+userprogs := test-fsmount test-statx test-nfs-userns
 
 always-y := $(userprogs)
 
diff --git a/samples/vfs/test-nfs-userns.c b/samples/vfs/test-nfs-userns.c
new file mode 100644
index 000000000000..108af924cbdd
--- /dev/null
+++ b/samples/vfs/test-nfs-userns.c
@@ -0,0 +1,181 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <linux/unistd.h>
+#include <assert.h>
+#include <sys/types.h>
+#include <unistd.h>
+#include <errno.h>
+#include <sys/stat.h>
+#include <stdlib.h>
+#include <sys/socket.h>
+#include <string.h>
+#include <fcntl.h>
+#include <sched.h>
+#include <sys/prctl.h>
+#include <sys/wait.h>
+#include "vfs-helper.h"
+
+
+#define WELL_KNOWN_FD	100
+
+static inline int pidfd_open(pid_t pid, unsigned int flags)
+{
+	return syscall(__NR_pidfd_open, pid, flags);
+}
+
+static inline int pidfd_getfd(int pidfd, int fd, int flags)
+{
+	return syscall(__NR_pidfd_getfd, pidfd, fd, flags);
+}
+
+static void write_to_path(const char *path, const char *str)
+{
+	int fd, len = strlen(str);
+
+	fd = open(path, O_WRONLY);
+	if (fd < 0) {
+		fprintf(stderr, "Can't open %s: %s\n", path, strerror(errno));
+		exit(1);
+	}
+
+	if (write(fd, str, len) != len) {
+		fprintf(stderr, "Can't write string: %s\n", strerror(errno));
+		exit(1);
+	}
+
+	E(close(fd));
+}
+
+static int do_work(int sk)
+{
+	int fsfd;
+
+	E(unshare(CLONE_NEWNS|CLONE_NEWUSER));
+
+	fsfd = fsopen("nfs4", 0);
+	E(fsfd);
+
+	E(send(sk, &fsfd, sizeof(fsfd), 0));
+	// Wait for the other side to close / finish / wrap up
+	recv(sk, &fsfd, sizeof(fsfd), 0);
+	E(close(sk));
+
+	return 0;
+}
+
+int main(int argc, char *argv[])
+{
+	int pidfd, mntfd, fsfd, fsfdnum, status, sk_pair[2];
+	struct statx statxbuf;
+	char buf[1024];
+	pid_t pid;
+
+	if (mkdir("/mnt/share", 0777) && errno != EEXIST) {
+		perror("mkdir");
+		return 1;
+	}
+
+	E(chmod("/mnt/share", 0777));
+
+	if (mkdir("/mnt/nfs", 0755) && errno != EEXIST) {
+		perror("mkdir");
+		return 1;
+	}
+
+	if (unlink("/mnt/share/newfile") && errno != ENOENT) {
+		perror("unlink");
+		return 1;
+	}
+
+	E(creat("/mnt/share/testfile", 0644));
+	E(chown("/mnt/share/testfile", 1001, 1001));
+
+	/* exportfs is idempotent, but expects nfs-server to be running */
+	if (system("exportfs -o no_root_squash,no_subtree_check,rw 127.0.0.0/8:/mnt/share")) {
+		fprintf(stderr,
+			"Could not export /mnt/share. Is NFS the server running?\n");
+		return 1;
+	}
+
+	E(socketpair(PF_LOCAL, SOCK_SEQPACKET, 0, sk_pair));
+
+	pid = fork();
+	E(pid);
+	if (pid == 0) {
+		E(close(sk_pair[0]));
+		return do_work(sk_pair[1]);
+	}
+
+	E(close(sk_pair[1]));
+
+	pidfd = pidfd_open(pid, 0);
+	E(pidfd);
+
+	E(recv(sk_pair[0], &fsfdnum, sizeof(fsfdnum), 0));
+
+	fsfd = pidfd_getfd(pidfd, fsfdnum, 0);
+	if (fsfd == -1) {
+		perror("pidfd_getfd");
+		return 1;
+	}
+
+
+	snprintf(buf, sizeof(buf) - 1, "/proc/%d/uid_map", pid);
+	write_to_path(buf, "0 1000 2");
+	snprintf(buf, sizeof(buf) - 1, "/proc/%d/setgroups", pid);
+	write_to_path(buf, "deny");
+	snprintf(buf, sizeof(buf) - 1, "/proc/%d/gid_map", pid);
+	write_to_path(buf, "0 1000 2");
+
+	/* Now we can proceed to mount */
+	E_fsconfig(fsfd, FSCONFIG_SET_STRING, "vers", "4.1", 0);
+	E_fsconfig(fsfd, FSCONFIG_SET_STRING, "clientaddr", "127.0.0.1", 0);
+	E_fsconfig(fsfd, FSCONFIG_SET_STRING, "addr", "127.0.0.1", 0);
+	E_fsconfig(fsfd, FSCONFIG_SET_STRING, "source", "127.0.0.1:/mnt/share",
+		   0);
+	E_fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
+
+	/* Move into the namespace's of the worker */
+	E(setns(pidfd, CLONE_NEWNS|CLONE_NEWUSER));
+	E(close(pidfd));
+
+	/* Close our socket pair indicating the child should exit */
+	E(close(sk_pair[0]));
+	assert(waitpid(pid, &status, 0) == pid);
+	if (!WIFEXITED(status) || WEXITSTATUS(status)) {
+		fprintf(stderr, "worker exited nonzero\n");
+		return 1;
+	}
+
+	E(setuid(0));
+	E(setgid(0));
+
+	/* Now do all the work of moving doing the mount in the child ns */
+	E(syscall(__NR_mount, NULL, "/", NULL, MS_REC|MS_PRIVATE, NULL));
+
+	mntfd = fsmount(fsfd, 0, MS_NODEV);
+	if (mntfd < 0) {
+		E(close(fsfd));
+		mount_error(fsfd, "fsmount");
+	}
+
+	E(move_mount(mntfd, "", AT_FDCWD, "/mnt/nfs", MOVE_MOUNT_F_EMPTY_PATH));
+	E(close(mntfd));
+
+	/* Create the file through NFS */
+	E(creat("/mnt/nfs/newfile", 0644));
+	/* Check what the file's status is on the disk, accessed directly */
+	E(statx(AT_FDCWD, "/mnt/share/newfile", 0, STATX_UID|STATX_GID,
+		&statxbuf));
+	assert(statxbuf.stx_uid == 0);
+	assert(statxbuf.stx_gid == 0);
+
+	E(statx(AT_FDCWD, "/mnt/nfs/testfile", 0, STATX_UID|STATX_GID,
+		&statxbuf));
+	assert(statxbuf.stx_uid == 1);
+	assert(statxbuf.stx_gid == 1);
+
+
+	return 0;
+}
-- 
2.25.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RESEND PATCH v2 0/3] NFS User Namespaces with new mount API
  2020-10-16 12:45 [RESEND PATCH v2 0/3] NFS User Namespaces with new mount API Sargun Dhillon
                   ` (2 preceding siblings ...)
  2020-10-16 12:45 ` [RESEND PATCH v2 3/3] samples/vfs: Add example leveraging NFS with new APIs and user namespaces Sargun Dhillon
@ 2020-10-17 23:59 ` Sargun Dhillon
  3 siblings, 0 replies; 5+ messages in thread
From: Sargun Dhillon @ 2020-10-17 23:59 UTC (permalink / raw)
  To: J . Bruce Fields, Chuck Lever, Trond Myklebust, Anna Schumaker,
	David Howells
  Cc: linux-nfs, linux-fsdevel, linux-kernel

On Fri, Oct 16, 2020 at 05:45:47AM -0700, Sargun Dhillon wrote:
> This patchset adds some functionality to allow NFS to be used from
> NFS namespaces (containers).
> 
> Changes since v1:
>   * Added samples
> 
> Sargun Dhillon (3):
>   NFS: Use cred from fscontext during fsmount
>   samples/vfs: Split out common code for new syscall APIs
>   samples/vfs: Add example leveraging NFS with new APIs and user
>     namespaces
> 
>  fs/nfs/client.c                        |   2 +-
>  fs/nfs/flexfilelayout/flexfilelayout.c |   1 +
>  fs/nfs/nfs4client.c                    |   2 +-
>  samples/vfs/.gitignore                 |   2 +
>  samples/vfs/Makefile                   |   5 +-
>  samples/vfs/test-fsmount.c             |  86 +-----------
>  samples/vfs/test-nfs-userns.c          | 181 +++++++++++++++++++++++++
>  samples/vfs/vfs-helper.c               |  43 ++++++
>  samples/vfs/vfs-helper.h               |  55 ++++++++
>  9 files changed, 289 insertions(+), 88 deletions(-)
>  create mode 100644 samples/vfs/test-nfs-userns.c
>  create mode 100644 samples/vfs/vfs-helper.c
>  create mode 100644 samples/vfs/vfs-helper.h
> 
> -- 
> 2.25.1
> 

Digging deeper into this a little bit, I actually found that there is some 
problematic aspects of the current behaviour. Because nfs_get_tree_common calls 
sget_fc, and sget_fc sets the super block's s_user_ns (via alloc_super) to the 
fs_context's user namespace unless the global flag is set (which NFS does not 
set), there are a bunch of permissions checks that are done against the super 
block's user_ns.

It looks like this was introduced in:
f2aedb713c28: NFS: Add fs_context support[1]

It turns out that unmapped users in the "parent" user namespace just get an 
EOVERFLOW error when trying to perform a read, even if the UID sent to the NFS 
server to read a file is a valid uid (the uid in the init user ns), and 
inode_permission checks permissions against the mapped UID in the namespace, 
while the authentication credentials (UIDs, GIDs) sent to the server are
those from the init user ns.

[This is all under the assumption there's not upcalls doing ID mapping]

Although, I do not think this presents any security risk (because you have to 
have CAP_SYS_ADMIN in the init user ns to get this far), it definitely seems
like "incorrect" behaviour.

[1]: https://lore.kernel.org/linux-nfs/20191120152750.6880-26-smayhew@redhat.com/

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, back to index

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-16 12:45 [RESEND PATCH v2 0/3] NFS User Namespaces with new mount API Sargun Dhillon
2020-10-16 12:45 ` [RESEND PATCH v2 1/3] NFS: Use cred from fscontext during fsmount Sargun Dhillon
2020-10-16 12:45 ` [RESEND PATCH v2 2/3] samples/vfs: Split out common code for new syscall APIs Sargun Dhillon
2020-10-16 12:45 ` [RESEND PATCH v2 3/3] samples/vfs: Add example leveraging NFS with new APIs and user namespaces Sargun Dhillon
2020-10-17 23:59 ` [RESEND PATCH v2 0/3] NFS User Namespaces with new mount API Sargun Dhillon

Linux-Fsdevel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-fsdevel/0 linux-fsdevel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-fsdevel linux-fsdevel/ https://lore.kernel.org/linux-fsdevel \
		linux-fsdevel@vger.kernel.org
	public-inbox-index linux-fsdevel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-fsdevel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git