linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3 v3] devpts: handle bind-mounts
@ 2018-03-12 17:13 Christian Brauner
  2018-03-12 17:13 ` [PATCH 1/3 v3] devpts: hoist out check for DEVPTS_SUPER_MAGIC Christian Brauner
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Christian Brauner @ 2018-03-12 17:13 UTC (permalink / raw)
  To: viro, linux-kernel, ebiederm, torvalds; +Cc: Christian Brauner

Hey everyone,

This is the third iteration of this patch. Major changes are:
- rewritten commit message to thoroughly analyse the problem for
  posterity in Subject: [PATCH 2/3 v3] devpts: resolve devpts bind-mounts
  and in this cover letter.
- extended selftests to test for correct handling of /dev/pts/ptmx
  bind-mounts to /dev/ptmx and non-standard devpts mounts such as
  mount -t devpts devpts /mnt

Most libcs will still look at /dev/ptmx when opening the master fd of a pty
device. When /dev/ptmx is a bind-mount of /dev/pts/ptmx and the TIOCGPTPEER
ioctl() is used to safely retrieve a file descriptor for the slave side of
the pty based on the master fd, the /proc/self/fd/{0,1,2} symlinks will
point to /.

When the kernel tries to look up the root mount of the dentry for the slave
file descriptor it will detect that the dentry is escaping its bind-mount
since the root mount of the dentry is /dev/pts where the devpts is mounted
but the root mount of /dev/ptmx is /dev.

Having bind-mounts of /dev/pts/ptmx to /dev/ptmx not working correctly is a
regression. In addition, it is also a fairly common scenario in containers
employing user namespaces.

To handle bind-mounts of /dev/pts/ptmx to /dev/ptmx correctly we need to
walk up the bind-mounts for /dev/ptmx in devpts_mntget(). Since the
contents of /proc/<pid>/fd/<nr> symlinks attached to the slave side of a
file descriptor will always point to a path under the devpts mount we need
to try and ensure that the kernel doesn't falsely get the impression that a
pty slave file descriptor retrieved via TIOCGPTPEER based on a pty master
file descriptor opened via a bind-mount of the ptmx device escapes its
bind-mount. To clarify in pseudo code:

* bind-mount /dev/pts/ptmx to /dev/ptmx
* master = open("/dev/ptmx", O_RDWR | O_NOCTTY | O_CLOEXEC);
* slave = ioctl(master, TIOCGPTPEER, O_RDWR | O_NOCTTY | O_CLOEXEC);

would cause the kernel to think that slave is escaping its bind-mount. The
reason is that while the devpts mounted at /dev/pts has devtmpfs mounted at
/dev as its parent mount:

21 -- -- / /dev
-- 21 -- / /dev/pts

they are on different devices

-- -- 0:6  / /dev
-- -- 0:20 / /dev/pts

which has the consequence that the pathname of the directory which forms
the root of the /dev/pts mount is /. So if we bind-mount /dev/pts/ptmx to
/dev/ptmx we will end up on the same device as the devtmpfs mount at
/dev/pts

-- -- 0:20 /ptmx /dev/ptmx

Without the bind-mount resolution patch here the kernel will now perform
the bind-mount escape check directly on /dev/ptmx. When it hits
devpts_ptmx_path()  calls pts_path() which in turn calls
path_parent_directory(). While one would expect that
path_parent_directory() *should* yield /dev it will yield / since
/dev and /dev/pts are on different devices. This will cause path_pts() to
fail finding a "pts" directory since there is none under /. Thus, the
kernel detects that /dev/ptmx is escaping its bind-mount and will set
/proc/<pid>/fd/<nr> to /.

This patch changes the logic to do bind-mount resolution and after the
bind-mount has been resolved (i.e. we have traced it back to the devpts
mount) we can safely perform devpts_ptmx_path() and check whether we find a
"pts" directory in the parent directory of the devpts mount. Since
path_parent_directory() will now correctly yield /dev as parent directory
for the devpts mount at /dev/pts.

However, we can only perform devpts_ptmx_path() devpts_mntget() if we
either did resolve a bind-mount or did not find a suitable devpts
filesystem. The reason is that we want and need to support non-standard
mountpoints for the devpts filesystem. If we call devpts_ptmx_path()
although we did already find a devpts filesystem and did not resolve
bind-mounts we will fail on devpts mounts such as:

mount -t devpts devpts /mnt

where no "pts" directory will be under /. So change the logic to account
for this.

Here's a little reproducer that presupposes a libc that uses TIOCGPTPEER in
its openpty() implementation:

unshare --mount
mount --bind /dev/pts/ptmx /dev/ptmx
chmod 666 /dev/ptmx
script
ls -al /proc/self/fd/0

with output:

lrwx------ 1 chb chb 64 Mar  7 16:41 /proc/self/fd/0 -> /

Christian Brauner (3):
  devpts: hoist out check for DEVPTS_SUPER_MAGIC
  devpts: resolve devpts bind-mounts
  selftests: add devpts selftests

 fs/devpts/inode.c                                |  35 ++-
 tools/testing/selftests/Makefile                 |   1 +
 tools/testing/selftests/filesystems/.gitignore   |   1 +
 tools/testing/selftests/filesystems/Makefile     |   2 +-
 tools/testing/selftests/filesystems/devpts_pts.c | 281 +++++++++++++++++++++++
 5 files changed, 308 insertions(+), 12 deletions(-)
 create mode 100644 tools/testing/selftests/filesystems/devpts_pts.c

-- 
2.15.1

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/3 v3] devpts: hoist out check for DEVPTS_SUPER_MAGIC
  2018-03-12 17:13 [PATCH 0/3 v3] devpts: handle bind-mounts Christian Brauner
@ 2018-03-12 17:13 ` Christian Brauner
  2018-03-12 17:13 ` [PATCH 2/3 v3] devpts: resolve devpts bind-mounts Christian Brauner
  2018-03-12 17:13 ` [PATCH 3/3 v3] selftests: add devpts selftests Christian Brauner
  2 siblings, 0 replies; 8+ messages in thread
From: Christian Brauner @ 2018-03-12 17:13 UTC (permalink / raw)
  To: viro, linux-kernel, ebiederm, torvalds; +Cc: Christian Brauner

Hoist the check whether we have already found a suitable devpts filesystem
out of devpts_ptmx_path() in preparation for the devpts bind-mount
resolution patch. This is a non-functional change.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
ChangeLog v2->v3:
* patch unchanged
ChangeLog v1->v2:
* patch added
ChangeLog v0->v1:
* patch not present
---
 fs/devpts/inode.c | 35 ++++++++++++++++++++---------------
 1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/fs/devpts/inode.c b/fs/devpts/inode.c
index e31d6ed3ec32..d3ddbb888874 100644
--- a/fs/devpts/inode.c
+++ b/fs/devpts/inode.c
@@ -138,10 +138,6 @@ static int devpts_ptmx_path(struct path *path)
 	struct super_block *sb;
 	int err;
 
-	/* Has the devpts filesystem already been found? */
-	if (path->mnt->mnt_sb->s_magic == DEVPTS_SUPER_MAGIC)
-		return 0;
-
 	/* Is a devpts filesystem at "pts" in the same directory? */
 	err = path_pts(path);
 	if (err)
@@ -159,17 +155,22 @@ static int devpts_ptmx_path(struct path *path)
 struct vfsmount *devpts_mntget(struct file *filp, struct pts_fs_info *fsi)
 {
 	struct path path;
-	int err;
 
 	path = filp->f_path;
 	path_get(&path);
 
-	err = devpts_ptmx_path(&path);
-	dput(path.dentry);
-	if (err) {
-		mntput(path.mnt);
-		return ERR_PTR(err);
+	/* Has the devpts filesystem already been found? */
+	if (path.mnt->mnt_sb->s_magic != DEVPTS_SUPER_MAGIC) {
+		int err;
+
+		err = devpts_ptmx_path(&path);
+		dput(path.dentry);
+		if (err) {
+			mntput(path.mnt);
+			return ERR_PTR(err);
+		}
 	}
+
 	if (DEVPTS_SB(path.mnt->mnt_sb) != fsi) {
 		mntput(path.mnt);
 		return ERR_PTR(-ENODEV);
@@ -182,15 +183,19 @@ struct pts_fs_info *devpts_acquire(struct file *filp)
 	struct pts_fs_info *result;
 	struct path path;
 	struct super_block *sb;
-	int err;
 
 	path = filp->f_path;
 	path_get(&path);
 
-	err = devpts_ptmx_path(&path);
-	if (err) {
-		result = ERR_PTR(err);
-		goto out;
+	/* Has the devpts filesystem already been found? */
+	if (path.mnt->mnt_sb->s_magic != DEVPTS_SUPER_MAGIC) {
+		int err;
+
+		err = devpts_ptmx_path(&path);
+		if (err) {
+			result = ERR_PTR(err);
+			goto out;
+		}
 	}
 
 	/*
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/3 v3] devpts: resolve devpts bind-mounts
  2018-03-12 17:13 [PATCH 0/3 v3] devpts: handle bind-mounts Christian Brauner
  2018-03-12 17:13 ` [PATCH 1/3 v3] devpts: hoist out check for DEVPTS_SUPER_MAGIC Christian Brauner
@ 2018-03-12 17:13 ` Christian Brauner
  2018-03-12 19:52   ` Eric W. Biederman
  2018-03-12 17:13 ` [PATCH 3/3 v3] selftests: add devpts selftests Christian Brauner
  2 siblings, 1 reply; 8+ messages in thread
From: Christian Brauner @ 2018-03-12 17:13 UTC (permalink / raw)
  To: viro, linux-kernel, ebiederm, torvalds; +Cc: Christian Brauner

Most libcs will still look at /dev/ptmx when opening the master fd of a pty
device. When /dev/ptmx is a bind-mount of /dev/pts/ptmx and the TIOCGPTPEER
ioctl() is used to safely retrieve a file descriptor for the slave side of
the pty based on the master fd, the /proc/self/fd/{0,1,2} symlinks will
point to /.

When the kernel tries to look up the root mount of the dentry for the slave
file descriptor it will detect that the dentry is escaping its bind-mount
since the root mount of the dentry is /dev/pts where the devpts is mounted
but the root mount of /dev/ptmx is /dev.

Having bind-mounts of /dev/pts/ptmx to /dev/ptmx not working correctly is a
regression. In addition, it is also a fairly common scenario in containers
employing user namespaces.

To handle bind-mounts of /dev/pts/ptmx to /dev/ptmx correctly we need to
walk up the bind-mounts for /dev/ptmx in devpts_mntget(). Since the
contents of /proc/<pid>/fd/<nr> symlinks attached to the slave side of a
file descriptor will always point to a path under the devpts mount we need
to try and ensure that the kernel doesn't falsely get the impression that a
pty slave file descriptor retrieved via TIOCGPTPEER based on a pty master
file descriptor opened via a bind-mount of the ptmx device escapes its
bind-mount. To clarify in pseudo code:

* bind-mount /dev/pts/ptmx to /dev/ptmx
* master = open("/dev/ptmx", O_RDWR | O_NOCTTY | O_CLOEXEC);
* slave = ioctl(master, TIOCGPTPEER, O_RDWR | O_NOCTTY | O_CLOEXEC);

would cause the kernel to think that slave is escaping its bind-mount. The
reason is that while the devpts mounted at /dev/pts has devtmpfs mounted at
/dev as its parent mount:

21 -- -- / /dev
-- 21 -- / /dev/pts

they are on different devices

-- -- 0:6  / /dev
-- -- 0:20 / /dev/pts

which has the consequence that the pathname of the directory which forms
the root of the /dev/pts mount is /. So if we bind-mount /dev/pts/ptmx to
/dev/ptmx we will end up on the same device as the devtmpfs mount at
/dev/pts

-- -- 0:20 /ptmx /dev/ptmx

Without the bind-mount resolution patch here the kernel will now perform
the bind-mount escape check directly on /dev/ptmx. When it hits
devpts_ptmx_path()  calls pts_path() which in turn calls
path_parent_directory(). While one would expect that
path_parent_directory() *should* yield /dev it will yield / since
/dev and /dev/pts are on different devices. This will cause path_pts() to
fail finding a "pts" directory since there is none under /. Thus, the
kernel detects that /dev/ptmx is escaping its bind-mount and will set
/proc/<pid>/fd/<nr> to /.

This patch changes the logic to do bind-mount resolution and after the
bind-mount has been resolved (i.e. we have traced it back to the devpts
mount) we can safely perform devpts_ptmx_path() and check whether we find a
"pts" directory in the parent directory of the devpts mount. Since
path_parent_directory() will now correctly yield /dev as parent directory
for the devpts mount at /dev/pts.

However, we can only perform devpts_ptmx_path() devpts_mntget() if we
either did resolve a bind-mount or did not find a suitable devpts
filesystem. The reason is that we want and need to support non-standard
mountpoints for the devpts filesystem. If we call devpts_ptmx_path()
although we did already find a devpts filesystem and did not resolve
bind-mounts we will fail on devpts mounts such as:

mount -t devpts devpts /mnt

where no "pts" directory will be under /. So change the logic to account
for this.

Here's a little reproducer that presupposes a libc that uses TIOCGPTPEER in
its openpty() implementation:

unshare --mount
mount --bind /dev/pts/ptmx /dev/ptmx
chmod 666 /dev/ptmx
script
ls -al /proc/self/fd/0

with output:

lrwx------ 1 chb chb 64 Mar  7 16:41 /proc/self/fd/0 -> /

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Suggested-by: Eric Biederman <ebiederm@xmission.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
---
ChangeLog v2->v3:
* rework logic to account for non-standard devpts mounts such as
  mount -t devpts devpts /mnt
ChangeLog v1->v2:
* move removal of if (path->mnt->mnt_sb->s_magic == DEVPTS_SUPER_MAGIC)
  condition to separate patch with non-functional changes
ChangeLog v0->v1:
* remove
        /* Has the devpts filesystem already been found? */
        if (path->mnt->mnt_sb->s_magic == DEVPTS_SUPER_MAGIC)
	                return 0
  from devpts_ptmx_path()
* check superblock after devpts_ptmx_path() returned
---
 fs/devpts/inode.c | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/fs/devpts/inode.c b/fs/devpts/inode.c
index d3ddbb888874..b31362c6c548 100644
--- a/fs/devpts/inode.c
+++ b/fs/devpts/inode.c
@@ -154,27 +154,35 @@ static int devpts_ptmx_path(struct path *path)
 
 struct vfsmount *devpts_mntget(struct file *filp, struct pts_fs_info *fsi)
 {
+	bool unwind;
 	struct path path;
+	int err = 0;
 
 	path = filp->f_path;
 	path_get(&path);
 
-	/* Has the devpts filesystem already been found? */
-	if (path.mnt->mnt_sb->s_magic != DEVPTS_SUPER_MAGIC) {
-		int err;
+	unwind = (DEVPTS_SB(path.mnt->mnt_sb) == fsi) &&
+		 (path.mnt->mnt_root == fsi->ptmx_dentry);
+	/* Walk upward while the start point is a bind mount of
+	 * a single file.
+	 */
+	while (path.mnt->mnt_root == path.dentry && unwind)
+		if (follow_up(&path) == 0)
+			break;
 
+	if ((path.mnt->mnt_sb->s_magic != DEVPTS_SUPER_MAGIC) || unwind)
 		err = devpts_ptmx_path(&path);
-		dput(path.dentry);
-		if (err) {
-			mntput(path.mnt);
-			return ERR_PTR(err);
-		}
+	dput(path.dentry);
+	if (err) {
+		mntput(path.mnt);
+		return ERR_PTR(err);
 	}
 
 	if (DEVPTS_SB(path.mnt->mnt_sb) != fsi) {
 		mntput(path.mnt);
 		return ERR_PTR(-ENODEV);
 	}
+
 	return path.mnt;
 }
 
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 3/3 v3] selftests: add devpts selftests
  2018-03-12 17:13 [PATCH 0/3 v3] devpts: handle bind-mounts Christian Brauner
  2018-03-12 17:13 ` [PATCH 1/3 v3] devpts: hoist out check for DEVPTS_SUPER_MAGIC Christian Brauner
  2018-03-12 17:13 ` [PATCH 2/3 v3] devpts: resolve devpts bind-mounts Christian Brauner
@ 2018-03-12 17:13 ` Christian Brauner
  2 siblings, 0 replies; 8+ messages in thread
From: Christian Brauner @ 2018-03-12 17:13 UTC (permalink / raw)
  To: viro, linux-kernel, ebiederm, torvalds; +Cc: Christian Brauner

This adds a simple test to check whether /proc/<pid>/fd/<nr> symlinks are
correctly pointing to /dev/pts/<idx> devices when attached to a terminal.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
ChangeLog v2->v3:
* extend test for non-standard devpts mounts such as
  mount -t devpts e devpts /mnt
ChangeLog v1->v2:
* patch added
ChangeLog v0->v1:
* patch not present
---
 tools/testing/selftests/Makefile                 |   1 +
 tools/testing/selftests/filesystems/.gitignore   |   1 +
 tools/testing/selftests/filesystems/Makefile     |   2 +-
 tools/testing/selftests/filesystems/devpts_pts.c | 281 +++++++++++++++++++++++
 4 files changed, 284 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/filesystems/devpts_pts.c

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 7442dfb73b7f..dbda89c9d9b9 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -7,6 +7,7 @@ TARGETS += cpufreq
 TARGETS += cpu-hotplug
 TARGETS += efivarfs
 TARGETS += exec
+TARGETS += filesystems
 TARGETS += firmware
 TARGETS += ftrace
 TARGETS += futex
diff --git a/tools/testing/selftests/filesystems/.gitignore b/tools/testing/selftests/filesystems/.gitignore
index 31d6e426b6d4..8449cf6716ce 100644
--- a/tools/testing/selftests/filesystems/.gitignore
+++ b/tools/testing/selftests/filesystems/.gitignore
@@ -1 +1,2 @@
 dnotify_test
+devpts_pts
diff --git a/tools/testing/selftests/filesystems/Makefile b/tools/testing/selftests/filesystems/Makefile
index 13a73bf725b5..4e6d09fb166f 100644
--- a/tools/testing/selftests/filesystems/Makefile
+++ b/tools/testing/selftests/filesystems/Makefile
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0
-TEST_PROGS := dnotify_test
+TEST_PROGS := dnotify_test devpts_pts
 all: $(TEST_PROGS)
 
 include ../lib.mk
diff --git a/tools/testing/selftests/filesystems/devpts_pts.c b/tools/testing/selftests/filesystems/devpts_pts.c
new file mode 100644
index 000000000000..d71d2f1f3b69
--- /dev/null
+++ b/tools/testing/selftests/filesystems/devpts_pts.c
@@ -0,0 +1,281 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include <errno.h>
+#include <fcntl.h>
+#include <sched.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/ioctl.h>
+#include <sys/mount.h>
+#include <sys/wait.h>
+
+static bool terminal_dup2(int duplicate, int original)
+{
+	int ret;
+
+	ret = dup2(duplicate, original);
+	if (ret < 0)
+		return false;
+
+	return true;
+}
+
+static int terminal_set_stdfds(int fd)
+{
+	int i;
+
+	if (fd < 0)
+		return 0;
+
+	for (i = 0; i < 3; i++)
+		if (!terminal_dup2(fd, (int[]){STDIN_FILENO, STDOUT_FILENO,
+					       STDERR_FILENO}[i]))
+			return -1;
+
+	return 0;
+}
+
+static int login_pty(int fd)
+{
+	int ret;
+
+	setsid();
+
+	ret = ioctl(fd, TIOCSCTTY, NULL);
+	if (ret < 0)
+		return -1;
+
+	ret = terminal_set_stdfds(fd);
+	if (ret < 0)
+		return -1;
+
+	if (fd > STDERR_FILENO)
+		close(fd);
+
+	return 0;
+}
+
+static int wait_for_pid(pid_t pid)
+{
+	int status, ret;
+
+again:
+	ret = waitpid(pid, &status, 0);
+	if (ret == -1) {
+		if (errno == EINTR)
+			goto again;
+		return -1;
+	}
+	if (ret != pid)
+		goto again;
+
+	if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
+		return -1;
+
+	return 0;
+}
+
+static int resolve_procfd_symlink(int fd, char *buf, size_t buflen)
+{
+	int ret;
+	char procfd[4096];
+
+	ret = snprintf(procfd, 4096, "/proc/self/fd/%d", fd);
+	if (ret < 0 || ret >= 4096)
+		return -1;
+
+	ret = readlink(procfd, buf, buflen);
+	if (ret < 0 || (size_t)ret >= buflen)
+		return -1;
+
+	buf[ret] = '\0';
+
+	return 0;
+}
+
+static int do_tiocgptpeer(char *ptmx, char *expected_procfd_contents)
+{
+	int ret;
+	int master = -1, slave = -1, fret = -1;
+
+	master = open(ptmx, O_RDWR | O_NOCTTY | O_CLOEXEC);
+	if (master < 0) {
+		fprintf(stderr, "Failed to open \"%s\": %s\n", ptmx,
+			strerror(errno));
+		return -1;
+	}
+
+	/*
+	 * grantpt() makes assumptions about /dev/pts/ so ignore it. It's also
+	 * not really needed.
+	 */
+	ret = unlockpt(master);
+	if (ret < 0) {
+		fprintf(stderr, "Failed to unlock terminal\n");
+		goto do_cleanup;
+	}
+
+#ifdef TIOCGPTPEER
+	slave = ioctl(master, TIOCGPTPEER, O_RDWR | O_NOCTTY | O_CLOEXEC);
+#endif
+	if (slave < 0) {
+		if (errno == EINVAL) {
+			fprintf(stderr, "TIOCGPTPEER is not supported. "
+					"Skipping test.\n");
+			fret = EXIT_SUCCESS;
+		}
+
+		fprintf(stderr, "Failed to perform TIOCGPTPEER ioctl\n");
+		goto do_cleanup;
+	}
+
+	pid_t pid = fork();
+	if (pid < 0)
+		goto do_cleanup;
+
+	if (pid == 0) {
+		char buf[4096];
+
+		ret = login_pty(slave);
+		if (ret < 0) {
+			fprintf(stderr, "Failed to setup terminal\n");
+			_exit(EXIT_FAILURE);
+		}
+
+		ret = resolve_procfd_symlink(STDIN_FILENO, buf, sizeof(buf));
+		if (ret < 0) {
+			fprintf(stderr, "Failed to retrieve pathname of pts "
+					"slave file descriptor\n");
+			_exit(EXIT_FAILURE);
+		}
+
+		if (strncmp(expected_procfd_contents, buf,
+			    strlen(expected_procfd_contents)) != 0) {
+			fprintf(stderr, "Received invalid contents for "
+					"\"/proc/<pid>/fd/%d\" symlink: %s\n",
+					STDIN_FILENO, buf);
+			_exit(-1);
+		}
+
+		fprintf(stderr, "Contents of \"/proc/<pid>/fd/%d\" "
+				"symlink are valid: %s\n", STDIN_FILENO, buf);
+
+		_exit(EXIT_SUCCESS);
+	}
+
+	ret = wait_for_pid(pid);
+	if (ret < 0)
+		goto do_cleanup;
+
+	fret = EXIT_SUCCESS;
+
+do_cleanup:
+	if (master >= 0)
+		close(master);
+	if (slave >= 0)
+		close(slave);
+
+	return fret;
+}
+
+static int verify_non_standard_devpts_mount(void)
+{
+	char *mntpoint;
+	int ret = -1;
+	char devpts[] = P_tmpdir "/devpts_fs_XXXXXX";
+	char ptmx[] = P_tmpdir "/devpts_fs_XXXXXX/ptmx";
+
+	ret = umount("/dev/pts");
+	if (ret < 0) {
+		fprintf(stderr, "Failed to unmount \"/dev/pts\": %s\n",
+				strerror(errno));
+		return -1;
+	}
+
+	(void)umount("/dev/ptmx");
+
+	mntpoint = mkdtemp(devpts);
+	if (!mntpoint) {
+		fprintf(stderr, "Failed to create temporary mountpoint: %s\n",
+				 strerror(errno));
+		return -1;
+	}
+
+	ret = mount("devpts", mntpoint, "devpts", MS_NOSUID | MS_NOEXEC,
+		    "newinstance,ptmxmode=0666,mode=0620,gid=5");
+	if (ret < 0) {
+		fprintf(stderr, "Failed to mount devpts fs to \"%s\" in new "
+				"mount namespace: %s\n", mntpoint,
+				strerror(errno));
+		unlink(mntpoint);
+		return -1;
+	}
+
+	ret = snprintf(ptmx, sizeof(ptmx), "%s/ptmx", devpts);
+	if (ret < 0 || (size_t)ret >= sizeof(ptmx)) {
+		unlink(mntpoint);
+		return -1;
+	}
+
+	ret = do_tiocgptpeer(ptmx, mntpoint);
+	unlink(mntpoint);
+	if (ret < 0)
+		return -1;
+
+	return 0;
+}
+
+static int verify_ptmx_bind_mount(void)
+{
+	int ret;
+
+	ret = mount("/dev/pts/ptmx", "/dev/ptmx", NULL, MS_BIND, NULL);
+	if (ret < 0) {
+		fprintf(stderr, "Failed to bind mount \"/dev/pts/ptmx\" to "
+				"\"/dev/ptmx\" mount namespace\n");
+		return -1;
+	}
+
+	ret = do_tiocgptpeer("/dev/ptmx", "/dev/pts/");
+	if (ret < 0)
+		return -1;
+
+	return 0;
+}
+
+int main(int argc, char *argv[])
+{
+	int ret;
+
+	if (!isatty(STDIN_FILENO)) {
+		fprintf(stderr, "Standard input file desciptor is not attached "
+				"to a terminal. Skipping test\n");
+		exit(EXIT_FAILURE);
+	}
+
+	ret = unshare(CLONE_NEWNS);
+	if (ret < 0) {
+		fprintf(stderr, "Failed to unshare mount namespace\n");
+		exit(EXIT_FAILURE);
+	}
+
+	ret = mount("", "/", NULL, MS_PRIVATE | MS_REC, 0);
+	if (ret < 0) {
+		fprintf(stderr, "Failed to make \"/\" MS_PRIVATE in new mount "
+				"namespace\n");
+		exit(EXIT_FAILURE);
+	}
+
+	ret = verify_ptmx_bind_mount();
+	if (ret < 0)
+		exit(EXIT_FAILURE);
+
+	ret = verify_non_standard_devpts_mount();
+	if (ret < 0)
+		exit(EXIT_FAILURE);
+
+	exit(EXIT_SUCCESS);
+}
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/3 v3] devpts: resolve devpts bind-mounts
  2018-03-12 17:13 ` [PATCH 2/3 v3] devpts: resolve devpts bind-mounts Christian Brauner
@ 2018-03-12 19:52   ` Eric W. Biederman
  2018-03-12 20:10     ` Christian Brauner
  0 siblings, 1 reply; 8+ messages in thread
From: Eric W. Biederman @ 2018-03-12 19:52 UTC (permalink / raw)
  To: Christian Brauner; +Cc: viro, linux-kernel, torvalds

Christian Brauner <christian.brauner@ubuntu.com> writes:

> Most libcs will still look at /dev/ptmx when opening the master fd of a pty
> device. When /dev/ptmx is a bind-mount of /dev/pts/ptmx and the TIOCGPTPEER
> ioctl() is used to safely retrieve a file descriptor for the slave side of
> the pty based on the master fd, the /proc/self/fd/{0,1,2} symlinks will
> point to /.
>
> When the kernel tries to look up the root mount of the dentry for the slave
> file descriptor it will detect that the dentry is escaping its bind-mount
> since the root mount of the dentry is /dev/pts where the devpts is mounted
> but the root mount of /dev/ptmx is /dev.
>
> Having bind-mounts of /dev/pts/ptmx to /dev/ptmx not working correctly is a
> regression. In addition, it is also a fairly common scenario in containers
> employing user namespaces.
>
> To handle bind-mounts of /dev/pts/ptmx to /dev/ptmx correctly we need to
> walk up the bind-mounts for /dev/ptmx in devpts_mntget(). Since the
> contents of /proc/<pid>/fd/<nr> symlinks attached to the slave side of a
> file descriptor will always point to a path under the devpts mount we need
> to try and ensure that the kernel doesn't falsely get the impression that a
> pty slave file descriptor retrieved via TIOCGPTPEER based on a pty master
> file descriptor opened via a bind-mount of the ptmx device escapes its
> bind-mount. To clarify in pseudo code:
>
> * bind-mount /dev/pts/ptmx to /dev/ptmx
> * master = open("/dev/ptmx", O_RDWR | O_NOCTTY | O_CLOEXEC);
> * slave = ioctl(master, TIOCGPTPEER, O_RDWR | O_NOCTTY | O_CLOEXEC);
>
> would cause the kernel to think that slave is escaping its bind-mount. The
> reason is that while the devpts mounted at /dev/pts has devtmpfs mounted at
> /dev as its parent mount:
>
> 21 -- -- / /dev
> -- 21 -- / /dev/pts
>
> they are on different devices
>
> -- -- 0:6  / /dev
> -- -- 0:20 / /dev/pts
>
> which has the consequence that the pathname of the directory which forms
> the root of the /dev/pts mount is /. So if we bind-mount /dev/pts/ptmx to
> /dev/ptmx we will end up on the same device as the devtmpfs mount at
> /dev/pts
>
> -- -- 0:20 /ptmx /dev/ptmx
>
> Without the bind-mount resolution patch here the kernel will now perform
> the bind-mount escape check directly on /dev/ptmx. When it hits
> devpts_ptmx_path()  calls pts_path() which in turn calls
> path_parent_directory(). While one would expect that
> path_parent_directory() *should* yield /dev it will yield / since
> /dev and /dev/pts are on different devices. This will cause path_pts() to
> fail finding a "pts" directory since there is none under /. Thus, the
> kernel detects that /dev/ptmx is escaping its bind-mount and will set
> /proc/<pid>/fd/<nr> to /.
>
> This patch changes the logic to do bind-mount resolution and after the
> bind-mount has been resolved (i.e. we have traced it back to the devpts
> mount) we can safely perform devpts_ptmx_path() and check whether we find a
> "pts" directory in the parent directory of the devpts mount. Since
> path_parent_directory() will now correctly yield /dev as parent directory
> for the devpts mount at /dev/pts.
>
> However, we can only perform devpts_ptmx_path() devpts_mntget() if we
> either did resolve a bind-mount or did not find a suitable devpts
> filesystem. The reason is that we want and need to support non-standard
> mountpoints for the devpts filesystem. If we call devpts_ptmx_path()
> although we did already find a devpts filesystem and did not resolve
> bind-mounts we will fail on devpts mounts such as:
>
> mount -t devpts devpts /mnt
>
> where no "pts" directory will be under /. So change the logic to account
> for this.
>
> Here's a little reproducer that presupposes a libc that uses TIOCGPTPEER in
> its openpty() implementation:
>
> unshare --mount
> mount --bind /dev/pts/ptmx /dev/ptmx
> chmod 666 /dev/ptmx
> script
> ls -al /proc/self/fd/0
>
> with output:
>
> lrwx------ 1 chb chb 64 Mar  7 16:41 /proc/self/fd/0 -> /
>
> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
> Suggested-by: Eric Biederman <ebiederm@xmission.com>
> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> ---
> ChangeLog v2->v3:
> * rework logic to account for non-standard devpts mounts such as
>   mount -t devpts devpts /mnt
> ChangeLog v1->v2:
> * move removal of if (path->mnt->mnt_sb->s_magic == DEVPTS_SUPER_MAGIC)
>   condition to separate patch with non-functional changes
> ChangeLog v0->v1:
> * remove
>         /* Has the devpts filesystem already been found? */
>         if (path->mnt->mnt_sb->s_magic == DEVPTS_SUPER_MAGIC)
> 	                return 0
>   from devpts_ptmx_path()
> * check superblock after devpts_ptmx_path() returned
> ---
>  fs/devpts/inode.c | 24 ++++++++++++++++--------
>  1 file changed, 16 insertions(+), 8 deletions(-)
>
> diff --git a/fs/devpts/inode.c b/fs/devpts/inode.c
> index d3ddbb888874..b31362c6c548 100644
> --- a/fs/devpts/inode.c
> +++ b/fs/devpts/inode.c
> @@ -154,27 +154,35 @@ static int devpts_ptmx_path(struct path *path)

There is a question I asked in my original version and I haven't
seen it answered.

What do we want to do in the case where TIOCGPTPEER is called and
the ptmx file descriptor is not from a mount that has pty's under it.

a) We can continue the existing problematic behavior and return
   a pty fd whose proc path is '/'

b) We can return an error which changes the observable behavior.

My comments below are presuming we change the kernel to error.

>From my experience introducing the path following version of /dev/ptmx
no one in practice has a /dev/ptmx dentry without an accompoanying
/dev/pts/ptmx.  Therefore I do not expect changing the behavior to
error will introduce a regression in userspace.

So let's just change the behavior of devpts_mntget error if a mount with
the pty underneath it can not be found.

>  struct vfsmount *devpts_mntget(struct file *filp, struct pts_fs_info *fsi)
>  {
> +	bool unwind;
>  	struct path path;
> +	int err = 0;

The only use I see for unwind is to ensure we don't change path
when we would want to return the mount from filp->f_path even it
is not connected to it's mount.

>  
>  	path = filp->f_path;
>  	path_get(&path);

> -	/* Has the devpts filesystem already been found? */
> -	if (path.mnt->mnt_sb->s_magic != DEVPTS_SUPER_MAGIC) {
> -		int err;
> +	unwind = (DEVPTS_SB(path.mnt->mnt_sb) == fsi) &&
> +		 (path.mnt->mnt_root == fsi->ptmx_dentry);
> +	/* Walk upward while the start point is a bind mount of
> +	 * a single file.
> +	 */
> +	while (path.mnt->mnt_root == path.dentry && unwind)
> +		if (follow_up(&path) == 0)
> +			break;
This look can become simply:
	while (path.mnt->mnt_root == path.dentry)
		if (follow_up(&path) == 0)
			break;

> +	if ((path.mnt->mnt_sb->s_magic != DEVPTS_SUPER_MAGIC) || unwind)
This test can become simply:

	if ((path.mnt->mnt_sb->s_magic != DEVPTS_SUPER_MAGIC) ||
            (DEVPTS_SB(path->mnt.mnt_sb) != fsi))
>  		err = devpts_ptmx_path(&path);
> -		dput(path.dentry);
> -		if (err) {
> -			mntput(path.mnt);
> -			return ERR_PTR(err);
> -		}
> +	dput(path.dentry);
> +	if (err) {
> +		mntput(path.mnt);
> +		return ERR_PTR(err);
>  	}
>  
>  	if (DEVPTS_SB(path.mnt->mnt_sb) != fsi) {
>  		mntput(path.mnt);
>  		return ERR_PTR(-ENODEV);
>  	}
> +
>  	return path.mnt;
>  }

Eric

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/3 v3] devpts: resolve devpts bind-mounts
  2018-03-12 19:52   ` Eric W. Biederman
@ 2018-03-12 20:10     ` Christian Brauner
  2018-03-12 20:28       ` Eric W. Biederman
  2018-03-12 21:52       ` Linus Torvalds
  0 siblings, 2 replies; 8+ messages in thread
From: Christian Brauner @ 2018-03-12 20:10 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Christian Brauner, viro, linux-kernel, torvalds

On Mon, Mar 12, 2018 at 02:52:53PM -0500, Eric W. Biederman wrote:
> Christian Brauner <christian.brauner@ubuntu.com> writes:
> 
> > Most libcs will still look at /dev/ptmx when opening the master fd of a pty
> > device. When /dev/ptmx is a bind-mount of /dev/pts/ptmx and the TIOCGPTPEER
> > ioctl() is used to safely retrieve a file descriptor for the slave side of
> > the pty based on the master fd, the /proc/self/fd/{0,1,2} symlinks will
> > point to /.
> >
> > When the kernel tries to look up the root mount of the dentry for the slave
> > file descriptor it will detect that the dentry is escaping its bind-mount
> > since the root mount of the dentry is /dev/pts where the devpts is mounted
> > but the root mount of /dev/ptmx is /dev.
> >
> > Having bind-mounts of /dev/pts/ptmx to /dev/ptmx not working correctly is a
> > regression. In addition, it is also a fairly common scenario in containers
> > employing user namespaces.
> >
> > To handle bind-mounts of /dev/pts/ptmx to /dev/ptmx correctly we need to
> > walk up the bind-mounts for /dev/ptmx in devpts_mntget(). Since the
> > contents of /proc/<pid>/fd/<nr> symlinks attached to the slave side of a
> > file descriptor will always point to a path under the devpts mount we need
> > to try and ensure that the kernel doesn't falsely get the impression that a
> > pty slave file descriptor retrieved via TIOCGPTPEER based on a pty master
> > file descriptor opened via a bind-mount of the ptmx device escapes its
> > bind-mount. To clarify in pseudo code:
> >
> > * bind-mount /dev/pts/ptmx to /dev/ptmx
> > * master = open("/dev/ptmx", O_RDWR | O_NOCTTY | O_CLOEXEC);
> > * slave = ioctl(master, TIOCGPTPEER, O_RDWR | O_NOCTTY | O_CLOEXEC);
> >
> > would cause the kernel to think that slave is escaping its bind-mount. The
> > reason is that while the devpts mounted at /dev/pts has devtmpfs mounted at
> > /dev as its parent mount:
> >
> > 21 -- -- / /dev
> > -- 21 -- / /dev/pts
> >
> > they are on different devices
> >
> > -- -- 0:6  / /dev
> > -- -- 0:20 / /dev/pts
> >
> > which has the consequence that the pathname of the directory which forms
> > the root of the /dev/pts mount is /. So if we bind-mount /dev/pts/ptmx to
> > /dev/ptmx we will end up on the same device as the devtmpfs mount at
> > /dev/pts
> >
> > -- -- 0:20 /ptmx /dev/ptmx
> >
> > Without the bind-mount resolution patch here the kernel will now perform
> > the bind-mount escape check directly on /dev/ptmx. When it hits
> > devpts_ptmx_path()  calls pts_path() which in turn calls
> > path_parent_directory(). While one would expect that
> > path_parent_directory() *should* yield /dev it will yield / since
> > /dev and /dev/pts are on different devices. This will cause path_pts() to
> > fail finding a "pts" directory since there is none under /. Thus, the
> > kernel detects that /dev/ptmx is escaping its bind-mount and will set
> > /proc/<pid>/fd/<nr> to /.
> >
> > This patch changes the logic to do bind-mount resolution and after the
> > bind-mount has been resolved (i.e. we have traced it back to the devpts
> > mount) we can safely perform devpts_ptmx_path() and check whether we find a
> > "pts" directory in the parent directory of the devpts mount. Since
> > path_parent_directory() will now correctly yield /dev as parent directory
> > for the devpts mount at /dev/pts.
> >
> > However, we can only perform devpts_ptmx_path() devpts_mntget() if we
> > either did resolve a bind-mount or did not find a suitable devpts
> > filesystem. The reason is that we want and need to support non-standard
> > mountpoints for the devpts filesystem. If we call devpts_ptmx_path()
> > although we did already find a devpts filesystem and did not resolve
> > bind-mounts we will fail on devpts mounts such as:
> >
> > mount -t devpts devpts /mnt
> >
> > where no "pts" directory will be under /. So change the logic to account
> > for this.
> >
> > Here's a little reproducer that presupposes a libc that uses TIOCGPTPEER in
> > its openpty() implementation:
> >
> > unshare --mount
> > mount --bind /dev/pts/ptmx /dev/ptmx
> > chmod 666 /dev/ptmx
> > script
> > ls -al /proc/self/fd/0
> >
> > with output:
> >
> > lrwx------ 1 chb chb 64 Mar  7 16:41 /proc/self/fd/0 -> /
> >
> > Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
> > Suggested-by: Eric Biederman <ebiederm@xmission.com>
> > Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> > ---
> > ChangeLog v2->v3:
> > * rework logic to account for non-standard devpts mounts such as
> >   mount -t devpts devpts /mnt
> > ChangeLog v1->v2:
> > * move removal of if (path->mnt->mnt_sb->s_magic == DEVPTS_SUPER_MAGIC)
> >   condition to separate patch with non-functional changes
> > ChangeLog v0->v1:
> > * remove
> >         /* Has the devpts filesystem already been found? */
> >         if (path->mnt->mnt_sb->s_magic == DEVPTS_SUPER_MAGIC)
> > 	                return 0
> >   from devpts_ptmx_path()
> > * check superblock after devpts_ptmx_path() returned
> > ---
> >  fs/devpts/inode.c | 24 ++++++++++++++++--------
> >  1 file changed, 16 insertions(+), 8 deletions(-)
> >
> > diff --git a/fs/devpts/inode.c b/fs/devpts/inode.c
> > index d3ddbb888874..b31362c6c548 100644
> > --- a/fs/devpts/inode.c
> > +++ b/fs/devpts/inode.c
> > @@ -154,27 +154,35 @@ static int devpts_ptmx_path(struct path *path)
> 
> There is a question I asked in my original version and I haven't
> seen it answered.
> 
> What do we want to do in the case where TIOCGPTPEER is called and
> the ptmx file descriptor is not from a mount that has pty's under it.
> 
> a) We can continue the existing problematic behavior and return
>    a pty fd whose proc path is '/'
> 
> b) We can return an error which changes the observable behavior.
> 
> My comments below are presuming we change the kernel to error.
> 
> From my experience introducing the path following version of /dev/ptmx
> no one in practice has a /dev/ptmx dentry without an accompoanying
> /dev/pts/ptmx.  Therefore I do not expect changing the behavior to
> error will introduce a regression in userspace.
> 
> So let's just change the behavior of devpts_mntget error if a mount with
> the pty underneath it can not be found.

I'm sort of torn but I think I'd prefer changing the behavior too. The
/dev/ptmx, /dev/pts/ptmx schisma is really a special case and I would
like it to be the only one. I really don't want to support bind-mounting
the ptmx character device under the devpts mounts across the system.
That seems like unnecessary complexity that is also unused. So if Linus
is fine with this change as well I don't mind sending a v4.

There's only one thing here I'd like to throw into the ring. This error
condition only works with TIOCGPTPEER, right? If userspace does a
path-based retrieveal of the pty file descriptor by doing e.g. a
readlink on the slave file descriptor and calls open() on it they would
still hit the /proc/<pid>/fd/<nr> -> / problem? It doesn't hit the same
codepath, does it? Or am I missing someting?

> 
> >  struct vfsmount *devpts_mntget(struct file *filp, struct pts_fs_info *fsi)
> >  {
> > +	bool unwind;
> >  	struct path path;
> > +	int err = 0;
> 
> The only use I see for unwind is to ensure we don't change path
> when we would want to return the mount from filp->f_path even it
> is not connected to it's mount.

Yes.

> 
> >  
> >  	path = filp->f_path;
> >  	path_get(&path);
> 
> > -	/* Has the devpts filesystem already been found? */
> > -	if (path.mnt->mnt_sb->s_magic != DEVPTS_SUPER_MAGIC) {
> > -		int err;
> > +	unwind = (DEVPTS_SB(path.mnt->mnt_sb) == fsi) &&
> > +		 (path.mnt->mnt_root == fsi->ptmx_dentry);
> > +	/* Walk upward while the start point is a bind mount of
> > +	 * a single file.
> > +	 */
> > +	while (path.mnt->mnt_root == path.dentry && unwind)
> > +		if (follow_up(&path) == 0)
> > +			break;
> This look can become simply:
> 	while (path.mnt->mnt_root == path.dentry)
> 		if (follow_up(&path) == 0)
> 			break;
> 
> > +	if ((path.mnt->mnt_sb->s_magic != DEVPTS_SUPER_MAGIC) || unwind)
> This test can become simply:
> 
> 	if ((path.mnt->mnt_sb->s_magic != DEVPTS_SUPER_MAGIC) ||
>             (DEVPTS_SB(path->mnt.mnt_sb) != fsi))
> >  		err = devpts_ptmx_path(&path);
> > -		dput(path.dentry);
> > -		if (err) {
> > -			mntput(path.mnt);
> > -			return ERR_PTR(err);
> > -		}
> > +	dput(path.dentry);
> > +	if (err) {
> > +		mntput(path.mnt);
> > +		return ERR_PTR(err);
> >  	}
> >  
> >  	if (DEVPTS_SB(path.mnt->mnt_sb) != fsi) {
> >  		mntput(path.mnt);
> >  		return ERR_PTR(-ENODEV);
> >  	}
> > +
> >  	return path.mnt;
> >  }
> 
> Eric

Christian

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/3 v3] devpts: resolve devpts bind-mounts
  2018-03-12 20:10     ` Christian Brauner
@ 2018-03-12 20:28       ` Eric W. Biederman
  2018-03-12 21:52       ` Linus Torvalds
  1 sibling, 0 replies; 8+ messages in thread
From: Eric W. Biederman @ 2018-03-12 20:28 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Christian Brauner, viro, linux-kernel, torvalds

Christian Brauner <christian.brauner@canonical.com> writes:

> On Mon, Mar 12, 2018 at 02:52:53PM -0500, Eric W. Biederman wrote:
>> Christian Brauner <christian.brauner@ubuntu.com> writes:
>> 
>> > Most libcs will still look at /dev/ptmx when opening the master fd of a pty
>> > device. When /dev/ptmx is a bind-mount of /dev/pts/ptmx and the TIOCGPTPEER
>> > ioctl() is used to safely retrieve a file descriptor for the slave side of
>> > the pty based on the master fd, the /proc/self/fd/{0,1,2} symlinks will
>> > point to /.
>> >
>> > When the kernel tries to look up the root mount of the dentry for the slave
>> > file descriptor it will detect that the dentry is escaping its bind-mount
>> > since the root mount of the dentry is /dev/pts where the devpts is mounted
>> > but the root mount of /dev/ptmx is /dev.
>> >
>> > Having bind-mounts of /dev/pts/ptmx to /dev/ptmx not working correctly is a
>> > regression. In addition, it is also a fairly common scenario in containers
>> > employing user namespaces.
>> >
>> > To handle bind-mounts of /dev/pts/ptmx to /dev/ptmx correctly we need to
>> > walk up the bind-mounts for /dev/ptmx in devpts_mntget(). Since the
>> > contents of /proc/<pid>/fd/<nr> symlinks attached to the slave side of a
>> > file descriptor will always point to a path under the devpts mount we need
>> > to try and ensure that the kernel doesn't falsely get the impression that a
>> > pty slave file descriptor retrieved via TIOCGPTPEER based on a pty master
>> > file descriptor opened via a bind-mount of the ptmx device escapes its
>> > bind-mount. To clarify in pseudo code:
>> >
>> > * bind-mount /dev/pts/ptmx to /dev/ptmx
>> > * master = open("/dev/ptmx", O_RDWR | O_NOCTTY | O_CLOEXEC);
>> > * slave = ioctl(master, TIOCGPTPEER, O_RDWR | O_NOCTTY | O_CLOEXEC);
>> >
>> > would cause the kernel to think that slave is escaping its bind-mount. The
>> > reason is that while the devpts mounted at /dev/pts has devtmpfs mounted at
>> > /dev as its parent mount:
>> >
>> > 21 -- -- / /dev
>> > -- 21 -- / /dev/pts
>> >
>> > they are on different devices
>> >
>> > -- -- 0:6  / /dev
>> > -- -- 0:20 / /dev/pts
>> >
>> > which has the consequence that the pathname of the directory which forms
>> > the root of the /dev/pts mount is /. So if we bind-mount /dev/pts/ptmx to
>> > /dev/ptmx we will end up on the same device as the devtmpfs mount at
>> > /dev/pts
>> >
>> > -- -- 0:20 /ptmx /dev/ptmx
>> >
>> > Without the bind-mount resolution patch here the kernel will now perform
>> > the bind-mount escape check directly on /dev/ptmx. When it hits
>> > devpts_ptmx_path()  calls pts_path() which in turn calls
>> > path_parent_directory(). While one would expect that
>> > path_parent_directory() *should* yield /dev it will yield / since
>> > /dev and /dev/pts are on different devices. This will cause path_pts() to
>> > fail finding a "pts" directory since there is none under /. Thus, the
>> > kernel detects that /dev/ptmx is escaping its bind-mount and will set
>> > /proc/<pid>/fd/<nr> to /.
>> >
>> > This patch changes the logic to do bind-mount resolution and after the
>> > bind-mount has been resolved (i.e. we have traced it back to the devpts
>> > mount) we can safely perform devpts_ptmx_path() and check whether we find a
>> > "pts" directory in the parent directory of the devpts mount. Since
>> > path_parent_directory() will now correctly yield /dev as parent directory
>> > for the devpts mount at /dev/pts.
>> >
>> > However, we can only perform devpts_ptmx_path() devpts_mntget() if we
>> > either did resolve a bind-mount or did not find a suitable devpts
>> > filesystem. The reason is that we want and need to support non-standard
>> > mountpoints for the devpts filesystem. If we call devpts_ptmx_path()
>> > although we did already find a devpts filesystem and did not resolve
>> > bind-mounts we will fail on devpts mounts such as:
>> >
>> > mount -t devpts devpts /mnt
>> >
>> > where no "pts" directory will be under /. So change the logic to account
>> > for this.
>> >
>> > Here's a little reproducer that presupposes a libc that uses TIOCGPTPEER in
>> > its openpty() implementation:
>> >
>> > unshare --mount
>> > mount --bind /dev/pts/ptmx /dev/ptmx
>> > chmod 666 /dev/ptmx
>> > script
>> > ls -al /proc/self/fd/0
>> >
>> > with output:
>> >
>> > lrwx------ 1 chb chb 64 Mar  7 16:41 /proc/self/fd/0 -> /
>> >
>> > Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
>> > Suggested-by: Eric Biederman <ebiederm@xmission.com>
>> > Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
>> > ---
>> > ChangeLog v2->v3:
>> > * rework logic to account for non-standard devpts mounts such as
>> >   mount -t devpts devpts /mnt
>> > ChangeLog v1->v2:
>> > * move removal of if (path->mnt->mnt_sb->s_magic == DEVPTS_SUPER_MAGIC)
>> >   condition to separate patch with non-functional changes
>> > ChangeLog v0->v1:
>> > * remove
>> >         /* Has the devpts filesystem already been found? */
>> >         if (path->mnt->mnt_sb->s_magic == DEVPTS_SUPER_MAGIC)
>> > 	                return 0
>> >   from devpts_ptmx_path()
>> > * check superblock after devpts_ptmx_path() returned
>> > ---
>> >  fs/devpts/inode.c | 24 ++++++++++++++++--------
>> >  1 file changed, 16 insertions(+), 8 deletions(-)
>> >
>> > diff --git a/fs/devpts/inode.c b/fs/devpts/inode.c
>> > index d3ddbb888874..b31362c6c548 100644
>> > --- a/fs/devpts/inode.c
>> > +++ b/fs/devpts/inode.c
>> > @@ -154,27 +154,35 @@ static int devpts_ptmx_path(struct path *path)
>> 
>> There is a question I asked in my original version and I haven't
>> seen it answered.
>> 
>> What do we want to do in the case where TIOCGPTPEER is called and
>> the ptmx file descriptor is not from a mount that has pty's under it.
>> 
>> a) We can continue the existing problematic behavior and return
>>    a pty fd whose proc path is '/'
>> 
>> b) We can return an error which changes the observable behavior.
>> 
>> My comments below are presuming we change the kernel to error.
>> 
>> From my experience introducing the path following version of /dev/ptmx
>> no one in practice has a /dev/ptmx dentry without an accompoanying
>> /dev/pts/ptmx.  Therefore I do not expect changing the behavior to
>> error will introduce a regression in userspace.
>> 
>> So let's just change the behavior of devpts_mntget error if a mount with
>> the pty underneath it can not be found.
>
> I'm sort of torn but I think I'd prefer changing the behavior too. The
> /dev/ptmx, /dev/pts/ptmx schisma is really a special case and I would
> like it to be the only one. I really don't want to support bind-mounting
> the ptmx character device under the devpts mounts across the system.
> That seems like unnecessary complexity that is also unused. So if Linus
> is fine with this change as well I don't mind sending a v4.
>
> There's only one thing here I'd like to throw into the ring. This error
> condition only works with TIOCGPTPEER, right? If userspace does a
> path-based retrieveal of the pty file descriptor by doing e.g. a
> readlink on the slave file descriptor and calls open() on it they would
> still hit the /proc/<pid>/fd/<nr> -> / problem? It doesn't hit the same
> codepath, does it? Or am I missing someting?

The error is preventing TIOCGPTPEER from returning problem file
descriptors to userspace.  It is uniquely the source of such file
descriptors so correct the behavior when we can and returning an
error when we can't should prevent their existence.

Eric

>> >  struct vfsmount *devpts_mntget(struct file *filp, struct pts_fs_info *fsi)
>> >  {
>> > +	bool unwind;
>> >  	struct path path;
>> > +	int err = 0;
>> 
>> The only use I see for unwind is to ensure we don't change path
>> when we would want to return the mount from filp->f_path even it
>> is not connected to it's mount.
>
> Yes.
>
>> 
>> >  
>> >  	path = filp->f_path;
>> >  	path_get(&path);
>> 
>> > -	/* Has the devpts filesystem already been found? */
>> > -	if (path.mnt->mnt_sb->s_magic != DEVPTS_SUPER_MAGIC) {
>> > -		int err;
>> > +	unwind = (DEVPTS_SB(path.mnt->mnt_sb) == fsi) &&
>> > +		 (path.mnt->mnt_root == fsi->ptmx_dentry);
>> > +	/* Walk upward while the start point is a bind mount of
>> > +	 * a single file.
>> > +	 */
>> > +	while (path.mnt->mnt_root == path.dentry && unwind)
>> > +		if (follow_up(&path) == 0)
>> > +			break;
>> This look can become simply:
>> 	while (path.mnt->mnt_root == path.dentry)
>> 		if (follow_up(&path) == 0)
>> 			break;
>> 
>> > +	if ((path.mnt->mnt_sb->s_magic != DEVPTS_SUPER_MAGIC) || unwind)
>> This test can become simply:
>> 
>> 	if ((path.mnt->mnt_sb->s_magic != DEVPTS_SUPER_MAGIC) ||
>>             (DEVPTS_SB(path->mnt.mnt_sb) != fsi))
>> >  		err = devpts_ptmx_path(&path);
>> > -		dput(path.dentry);
>> > -		if (err) {
>> > -			mntput(path.mnt);
>> > -			return ERR_PTR(err);
>> > -		}
>> > +	dput(path.dentry);
>> > +	if (err) {
>> > +		mntput(path.mnt);
>> > +		return ERR_PTR(err);
>> >  	}
>> >  
>> >  	if (DEVPTS_SB(path.mnt->mnt_sb) != fsi) {
>> >  		mntput(path.mnt);
>> >  		return ERR_PTR(-ENODEV);
>> >  	}
>> > +
>> >  	return path.mnt;
>> >  }
>> 
>> Eric
>
> Christian

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/3 v3] devpts: resolve devpts bind-mounts
  2018-03-12 20:10     ` Christian Brauner
  2018-03-12 20:28       ` Eric W. Biederman
@ 2018-03-12 21:52       ` Linus Torvalds
  1 sibling, 0 replies; 8+ messages in thread
From: Linus Torvalds @ 2018-03-12 21:52 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Eric W. Biederman, Christian Brauner, Al Viro, Linux Kernel Mailing List

On Mon, Mar 12, 2018 at 1:10 PM, Christian Brauner
<christian.brauner@canonical.com> wrote:
>>
>> So let's just change the behavior of devpts_mntget error if a mount with
>> the pty underneath it can not be found.
>
> I'm sort of torn but I think I'd prefer changing the behavior too. The
> /dev/ptmx, /dev/pts/ptmx schisma is really a special case and I would
> like it to be the only one. I really don't want to support bind-mounting
> the ptmx character device under the devpts mounts across the system.
> That seems like unnecessary complexity that is also unused. So if Linus
> is fine with this change as well I don't mind sending a v4.

I'm fine with whatever error handling - I cannot imagine that it matters.

And if it does, and somebody finds somebody who depends on insane
behavior, we can revisit it then,

But fundamentally. I think there are only two valid situations:

 - people use devpts as-is (so the ptmx node is inherently in the same
subdirectory as the pts nodes, and is fundamentally the same
filesystem)

   NOTE! In this case, the ptmx node is *never* a bind-mount. It is
simply part of the pts directory as exported by devpts.

   This is the "people actually use /dev/pts/ptmx" case. It's fairly
rare, I think.

 - the parent directory ptmx node is an arbitrary ptmx node (symlink,
bind mount, or mknod)

   NOTE! In this case, the ptmx node is not necessarily associated
with the pts directory in any way, and the only thing that matters is
that it's the right special character device.

Honestly, I personally detest case #1, even if some people think it's
the "clean" case. It isn't particularly clean, since it fundamentally
is not how /dev/ptmx has traditionally ever worked. It would have been
clean had that been how ptmx worked traditionally, but that' ssimply
not the case.

But if something really odd happens, and we cannot find a pts
subdirectory, or there is not a pts node, or not a ptmx node in that
subdirectory, I think all bets are pretty much off path-wise.
Returning an error is probably better than returning a bogus path.

                 Linus

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-03-12 21:52 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-12 17:13 [PATCH 0/3 v3] devpts: handle bind-mounts Christian Brauner
2018-03-12 17:13 ` [PATCH 1/3 v3] devpts: hoist out check for DEVPTS_SUPER_MAGIC Christian Brauner
2018-03-12 17:13 ` [PATCH 2/3 v3] devpts: resolve devpts bind-mounts Christian Brauner
2018-03-12 19:52   ` Eric W. Biederman
2018-03-12 20:10     ` Christian Brauner
2018-03-12 20:28       ` Eric W. Biederman
2018-03-12 21:52       ` Linus Torvalds
2018-03-12 17:13 ` [PATCH 3/3 v3] selftests: add devpts selftests Christian Brauner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).