linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] fs/exec: Explicitly unshare fs_struct on exec
@ 2022-10-06  8:27 Kees Cook
  2022-10-06  8:27 ` [PATCH 1/2] " Kees Cook
  2022-10-06  8:27 ` [PATCH 2/2] exec: Remove LSM_UNSAFE_SHARE Kees Cook
  0 siblings, 2 replies; 15+ messages in thread
From: Kees Cook @ 2022-10-06  8:27 UTC (permalink / raw)
  To: Eric Biederman
  Cc: Kees Cook, Jorge Merlino, Al Viro, Christian Brauner (Microsoft),
	Thomas Gleixner, Andy Lutomirski, Sebastian Andrzej Siewior,
	Andrew Morton, John Johansen, Paul Moore, James Morris,
	Serge E. Hallyn, Stephen Smalley, Eric Paris, Richard Haines,
	Casey Schaufler, Xin Long, David S. Miller, Todd Kjos,
	Ondrej Mosnacek, Prashanth Prahlad, Micah Morton, Fenghua Yu,
	Andrei Vagin, linux-kernel, linux-mm, linux-fsdevel, apparmor,
	linux-security-module, selinux, linux-hardening

Hi,

These changes seek to address an issue reported[1] by Jorge Merlino where
high-thread-count processes would sometimes fail to setuid during a
setuid execve().

It looks to me like the solution is to explicitly do an unshare_fs(),
which should almost always be a no-op. Current testing seems to indicate
that only the swapper->init exec triggers this condition (and I'm unclear
on whether that's expected or undesirable). This has only received very
light testing so far, but I wanted to share it so other folks could look
it over.

Jorge, can you test with these patches? Your PoC triggered immediately
for me on an unpatched kernel, and did not trigger on a patched one.

I added this patch on top of the series to see if the code ever fired:

diff --git a/kernel/fork.c b/kernel/fork.c
index 53b7248f7a4b..3c197d9d8daa 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -3113,6 +3113,7 @@ int unshare_fs(void)
 	if (error || !new_fs)
 		return error;
 
+	pr_notice("UNSHARE of \"%s\" [%d]\n", current->comm, current->pid);
 	unshare_fs_finalize(&new_fs);
 
 	if (new_fs)

Thanks!

-Kees

[1] https://lore.kernel.org/lkml/20220910211215.140270-1-jorge.merlino@canonical.com/

Kees Cook (2):
  fs/exec: Explicitly unshare fs_struct on exec
  exec: Remove LSM_UNSAFE_SHARE

 fs/exec.c                  | 26 ++++------------
 fs/fs_struct.c             |  1 -
 include/linux/fdtable.h    |  1 +
 include/linux/fs_struct.h  |  1 -
 include/linux/security.h   |  5 ++-
 kernel/fork.c              | 62 ++++++++++++++++++++++++++------------
 security/apparmor/domain.c |  5 ---
 security/selinux/hooks.c   | 10 ------
 8 files changed, 51 insertions(+), 60 deletions(-)

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 1/2] fs/exec: Explicitly unshare fs_struct on exec
  2022-10-06  8:27 [PATCH 0/2] fs/exec: Explicitly unshare fs_struct on exec Kees Cook
@ 2022-10-06  8:27 ` Kees Cook
  2022-10-06  9:05   ` Christian Brauner
  2022-10-06  8:27 ` [PATCH 2/2] exec: Remove LSM_UNSAFE_SHARE Kees Cook
  1 sibling, 1 reply; 15+ messages in thread
From: Kees Cook @ 2022-10-06  8:27 UTC (permalink / raw)
  To: Eric Biederman
  Cc: Kees Cook, Jorge Merlino, Alexander Viro,
	Christian Brauner (Microsoft),
	Thomas Gleixner, Andy Lutomirski, Sebastian Andrzej Siewior,
	Andrew Morton, linux-mm, linux-fsdevel, John Johansen,
	Paul Moore, James Morris, Serge E. Hallyn, Stephen Smalley,
	Eric Paris, Richard Haines, Casey Schaufler, Xin Long,
	David S. Miller, Todd Kjos, Ondrej Mosnacek, Prashanth Prahlad,
	Micah Morton, Fenghua Yu, Andrei Vagin, linux-kernel, apparmor,
	linux-security-module, selinux, linux-hardening

The check_unsafe_exec() counting of n_fs would not add up under a heavily
threaded process trying to perform a suid exec, causing the suid portion
to fail. This counting error appears to be unneeded, but to catch any
possible conditions, explicitly unshare fs_struct on exec, if it ends up
needing to happen. This means fs->in_exec must be removed (since it would
never get cleared in the old copy), and is specifically no longer needed.

See also commit 498052bba55e ("New locking/refcounting for fs_struct").

This additionally allows the "in_exec" member to be dropped, showing an
8 bytes savings per fs_struct on 64-bit. Before:

struct fs_struct {
        int                        users;                /*     0     4 */
        spinlock_t                 lock;                 /*     4     4 */
        seqcount_spinlock_t        seq;                  /*     8     4 */
        int                        umask;                /*    12     4 */
        int                        in_exec;              /*    16     4 */

        /* XXX 4 bytes hole, try to pack */

        struct path                root;                 /*    24    16 */
        struct path                pwd;                  /*    40    16 */

        /* size: 56, cachelines: 1, members: 7 */
        /* sum members: 52, holes: 1, sum holes: 4 */
        /* last cacheline: 56 bytes */
};

After:

struct fs_struct {
        int                        users;                /*     0     4 */
        spinlock_t                 lock;                 /*     4     4 */
        seqcount_spinlock_t        seq;                  /*     8     4 */
        int                        umask;                /*    12     4 */
        struct path                root;                 /*    16    16 */
        struct path                pwd;                  /*    32    16 */

        /* size: 48, cachelines: 1, members: 6 */
        /* last cacheline: 48 bytes */
};

Reported-by: Jorge Merlino <jorge.merlino@canonical.com>
Link: https://lore.kernel.org/lkml/20220910211215.140270-1-jorge.merlino@canonical.com/
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "Christian Brauner (Microsoft)" <brauner@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 fs/exec.c                 |  9 +++---
 fs/fs_struct.c            |  1 -
 include/linux/fdtable.h   |  1 +
 include/linux/fs_struct.h |  1 -
 kernel/fork.c             | 62 ++++++++++++++++++++++++++-------------
 5 files changed, 48 insertions(+), 26 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 902bce45b116..7d5f63f03c58 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1272,6 +1272,11 @@ int begin_new_exec(struct linux_binprm * bprm)
 	if (retval)
 		goto out;
 
+	/* Ensure the fs_struct is not shared. */
+	retval = unshare_fs();
+	if (retval)
+		goto out;
+
 	/*
 	 * Must be called _before_ exec_mmap() as bprm->mm is
 	 * not visible until then. This also enables the update
@@ -1583,8 +1588,6 @@ static void check_unsafe_exec(struct linux_binprm *bprm)
 
 	if (p->fs->users > n_fs)
 		bprm->unsafe |= LSM_UNSAFE_SHARE;
-	else
-		p->fs->in_exec = 1;
 	spin_unlock(&p->fs->lock);
 }
 
@@ -1843,7 +1846,6 @@ static int bprm_execve(struct linux_binprm *bprm,
 		goto out;
 
 	/* execve succeeded */
-	current->fs->in_exec = 0;
 	current->in_execve = 0;
 	rseq_execve(current);
 	acct_update_integrals(current);
@@ -1861,7 +1863,6 @@ static int bprm_execve(struct linux_binprm *bprm,
 		force_fatal_sig(SIGSEGV);
 
 out_unmark:
-	current->fs->in_exec = 0;
 	current->in_execve = 0;
 
 	return retval;
diff --git a/fs/fs_struct.c b/fs/fs_struct.c
index 04b3f5b9c629..c27c67023d01 100644
--- a/fs/fs_struct.c
+++ b/fs/fs_struct.c
@@ -115,7 +115,6 @@ struct fs_struct *copy_fs_struct(struct fs_struct *old)
 	/* We don't need to lock fs - think why ;-) */
 	if (fs) {
 		fs->users = 1;
-		fs->in_exec = 0;
 		spin_lock_init(&fs->lock);
 		seqcount_spinlock_init(&fs->seq, &fs->lock);
 		fs->umask = old->umask;
diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h
index e066816f3519..fbfb89ee3bda 100644
--- a/include/linux/fdtable.h
+++ b/include/linux/fdtable.h
@@ -117,6 +117,7 @@ struct task_struct;
 
 void put_files_struct(struct files_struct *fs);
 int unshare_files(void);
+int unshare_fs(void);
 struct files_struct *dup_fd(struct files_struct *, unsigned, int *) __latent_entropy;
 void do_close_on_exec(struct files_struct *);
 int iterate_fd(struct files_struct *, unsigned,
diff --git a/include/linux/fs_struct.h b/include/linux/fs_struct.h
index 783b48dedb72..08b82a90ceae 100644
--- a/include/linux/fs_struct.h
+++ b/include/linux/fs_struct.h
@@ -11,7 +11,6 @@ struct fs_struct {
 	spinlock_t lock;
 	seqcount_spinlock_t seq;
 	int umask;
-	int in_exec;
 	struct path root, pwd;
 } __randomize_layout;
 
diff --git a/kernel/fork.c b/kernel/fork.c
index b4a799d9c50f..53b7248f7a4b 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1589,10 +1589,6 @@ static int copy_fs(unsigned long clone_flags, struct task_struct *tsk)
 	if (clone_flags & CLONE_FS) {
 		/* tsk->fs is already what we want */
 		spin_lock(&fs->lock);
-		if (fs->in_exec) {
-			spin_unlock(&fs->lock);
-			return -EAGAIN;
-		}
 		fs->users++;
 		spin_unlock(&fs->lock);
 		return 0;
@@ -3070,10 +3066,9 @@ static int check_unshare_flags(unsigned long unshare_flags)
 	return 0;
 }
 
-/*
- * Unshare the filesystem structure if it is being shared
- */
-static int unshare_fs(unsigned long unshare_flags, struct fs_struct **new_fsp)
+/* Allocate an fs_struct if it is currently shared and CLONE_FS requested. */
+static int unshare_fs_alloc(unsigned long unshare_flags,
+			    struct fs_struct **new_fsp)
 {
 	struct fs_struct *fs = current->fs;
 
@@ -3091,6 +3086,41 @@ static int unshare_fs(unsigned long unshare_flags, struct fs_struct **new_fsp)
 	return 0;
 }
 
+/* Swap out fs_struct, returning old fs if it needs freeing. */
+static void unshare_fs_finalize(struct fs_struct **new_fsp)
+{
+	struct task_struct *task = current;
+	struct fs_struct *fs = task->fs;
+
+	spin_lock(&fs->lock);
+	task->fs = *new_fsp;
+	if (--fs->users)
+		*new_fsp = NULL;
+	else
+		*new_fsp = fs;
+	spin_unlock(&fs->lock);
+}
+
+/*
+ * Unshare the filesystem structure if it is being shared
+ */
+int unshare_fs(void)
+{
+	struct fs_struct *new_fs = NULL;
+	int error;
+
+	error = unshare_fs_alloc(CLONE_FS, &new_fs);
+	if (error || !new_fs)
+		return error;
+
+	unshare_fs_finalize(&new_fs);
+
+	if (new_fs)
+		free_fs_struct(new_fs);
+
+	return 0;
+}
+
 /*
  * Unshare file descriptor table if it is being shared
  */
@@ -3120,7 +3150,7 @@ int unshare_fd(unsigned long unshare_flags, unsigned int max_fds,
  */
 int ksys_unshare(unsigned long unshare_flags)
 {
-	struct fs_struct *fs, *new_fs = NULL;
+	struct fs_struct *new_fs = NULL;
 	struct files_struct *new_fd = NULL;
 	struct cred *new_cred = NULL;
 	struct nsproxy *new_nsproxy = NULL;
@@ -3159,7 +3189,7 @@ int ksys_unshare(unsigned long unshare_flags)
 	 */
 	if (unshare_flags & (CLONE_NEWIPC|CLONE_SYSVSEM))
 		do_sysvsem = 1;
-	err = unshare_fs(unshare_flags, &new_fs);
+	err = unshare_fs_alloc(unshare_flags, &new_fs);
 	if (err)
 		goto bad_unshare_out;
 	err = unshare_fd(unshare_flags, NR_OPEN_MAX, &new_fd);
@@ -3197,16 +3227,8 @@ int ksys_unshare(unsigned long unshare_flags)
 
 		task_lock(current);
 
-		if (new_fs) {
-			fs = current->fs;
-			spin_lock(&fs->lock);
-			current->fs = new_fs;
-			if (--fs->users)
-				new_fs = NULL;
-			else
-				new_fs = fs;
-			spin_unlock(&fs->lock);
-		}
+		if (new_fs)
+			unshare_fs_finalize(&new_fs);
 
 		if (new_fd)
 			swap(current->files, new_fd);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/2] exec: Remove LSM_UNSAFE_SHARE
  2022-10-06  8:27 [PATCH 0/2] fs/exec: Explicitly unshare fs_struct on exec Kees Cook
  2022-10-06  8:27 ` [PATCH 1/2] " Kees Cook
@ 2022-10-06  8:27 ` Kees Cook
  1 sibling, 0 replies; 15+ messages in thread
From: Kees Cook @ 2022-10-06  8:27 UTC (permalink / raw)
  To: Eric Biederman
  Cc: Kees Cook, Alexander Viro, John Johansen, Paul Moore,
	James Morris, Serge E. Hallyn, Stephen Smalley, Eric Paris,
	Richard Haines, Casey Schaufler, Xin Long, David S. Miller,
	Todd Kjos, Ondrej Mosnacek, linux-fsdevel, linux-mm, apparmor,
	linux-security-module, selinux, Jorge Merlino,
	Christian Brauner (Microsoft),
	Thomas Gleixner, Andy Lutomirski, Sebastian Andrzej Siewior,
	Andrew Morton, Prashanth Prahlad, Micah Morton, Fenghua Yu,
	Andrei Vagin, linux-kernel, linux-hardening

With fs_struct explicitly unshared during exec, it is no longer possible
to have unexpected shared state, and LSM_UNSAFE_SHARE can be entirely
removed.

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: John Johansen <john.johansen@canonical.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Eric Paris <eparis@parisplace.org>
Cc: Richard Haines <richard_c_haines@btinternet.com>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Cc: Xin Long <lucien.xin@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Todd Kjos <tkjos@google.com>
Cc: Ondrej Mosnacek <omosnace@redhat.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: apparmor@lists.ubuntu.com
Cc: linux-security-module@vger.kernel.org
Cc: selinux@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 fs/exec.c                  | 17 +----------------
 include/linux/security.h   |  5 ++---
 security/apparmor/domain.c |  5 -----
 security/selinux/hooks.c   | 10 ----------
 4 files changed, 3 insertions(+), 34 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 7d5f63f03c58..3cd058711098 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1563,8 +1563,7 @@ EXPORT_SYMBOL(bprm_change_interp);
  */
 static void check_unsafe_exec(struct linux_binprm *bprm)
 {
-	struct task_struct *p = current, *t;
-	unsigned n_fs;
+	struct task_struct *p = current;
 
 	if (p->ptrace)
 		bprm->unsafe |= LSM_UNSAFE_PTRACE;
@@ -1575,20 +1574,6 @@ static void check_unsafe_exec(struct linux_binprm *bprm)
 	 */
 	if (task_no_new_privs(current))
 		bprm->unsafe |= LSM_UNSAFE_NO_NEW_PRIVS;
-
-	t = p;
-	n_fs = 1;
-	spin_lock(&p->fs->lock);
-	rcu_read_lock();
-	while_each_thread(p, t) {
-		if (t->fs == p->fs)
-			n_fs++;
-	}
-	rcu_read_unlock();
-
-	if (p->fs->users > n_fs)
-		bprm->unsafe |= LSM_UNSAFE_SHARE;
-	spin_unlock(&p->fs->lock);
 }
 
 static void bprm_fill_uid(struct linux_binprm *bprm, struct file *file)
diff --git a/include/linux/security.h b/include/linux/security.h
index 1bc362cb413f..db508a8c3cc7 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -215,9 +215,8 @@ struct sched_param;
 struct request_sock;
 
 /* bprm->unsafe reasons */
-#define LSM_UNSAFE_SHARE	1
-#define LSM_UNSAFE_PTRACE	2
-#define LSM_UNSAFE_NO_NEW_PRIVS	4
+#define LSM_UNSAFE_PTRACE	BIT(0)
+#define LSM_UNSAFE_NO_NEW_PRIVS	BIT(1)
 
 #ifdef CONFIG_MMU
 extern int mmap_min_addr_handler(struct ctl_table *table, int write,
diff --git a/security/apparmor/domain.c b/security/apparmor/domain.c
index 91689d34d281..1b2c0bb4d9ae 100644
--- a/security/apparmor/domain.c
+++ b/security/apparmor/domain.c
@@ -924,11 +924,6 @@ int apparmor_bprm_creds_for_exec(struct linux_binprm *bprm)
 		goto audit;
 	}
 
-	if (bprm->unsafe & LSM_UNSAFE_SHARE) {
-		/* FIXME: currently don't mediate shared state */
-		;
-	}
-
 	if (bprm->unsafe & (LSM_UNSAFE_PTRACE)) {
 		/* TODO: test needs to be profile of label to new */
 		error = may_change_ptraced_domain(new, &info);
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 79573504783b..3ec80cc8ad1c 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2349,16 +2349,6 @@ static int selinux_bprm_creds_for_exec(struct linux_binprm *bprm)
 		if (rc)
 			return rc;
 
-		/* Check for shared state */
-		if (bprm->unsafe & LSM_UNSAFE_SHARE) {
-			rc = avc_has_perm(&selinux_state,
-					  old_tsec->sid, new_tsec->sid,
-					  SECCLASS_PROCESS, PROCESS__SHARE,
-					  NULL);
-			if (rc)
-				return -EPERM;
-		}
-
 		/* Make sure that anyone attempting to ptrace over a task that
 		 * changes its SID has the appropriate permit */
 		if (bprm->unsafe & LSM_UNSAFE_PTRACE) {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] fs/exec: Explicitly unshare fs_struct on exec
  2022-10-06  8:27 ` [PATCH 1/2] " Kees Cook
@ 2022-10-06  9:05   ` Christian Brauner
  2022-10-06 10:48     ` David Laight
  2022-10-06 14:13     ` Jann Horn
  0 siblings, 2 replies; 15+ messages in thread
From: Christian Brauner @ 2022-10-06  9:05 UTC (permalink / raw)
  To: Kees Cook
  Cc: Eric Biederman, Jorge Merlino, Alexander Viro, Thomas Gleixner,
	Andy Lutomirski, Sebastian Andrzej Siewior, Andrew Morton,
	linux-mm, linux-fsdevel, John Johansen, Paul Moore, James Morris,
	Serge E. Hallyn, Stephen Smalley, Eric Paris, Richard Haines,
	Casey Schaufler, Xin Long, David S. Miller, Todd Kjos,
	Ondrej Mosnacek, Prashanth Prahlad, Micah Morton, Fenghua Yu,
	Andrei Vagin, linux-kernel, apparmor, linux-security-module,
	selinux, linux-hardening

On Thu, Oct 06, 2022 at 01:27:34AM -0700, Kees Cook wrote:
> The check_unsafe_exec() counting of n_fs would not add up under a heavily
> threaded process trying to perform a suid exec, causing the suid portion
> to fail. This counting error appears to be unneeded, but to catch any
> possible conditions, explicitly unshare fs_struct on exec, if it ends up

Isn't this a potential uapi break? Afaict, before this change a call to
clone{3}(CLONE_FS) followed by an exec in the child would have the
parent and child share fs information. So if the child e.g., changes the
working directory post exec it would also affect the parent. But after
this change here this would no longer be true. So a child changing a
workding directoro would not affect the parent anymore. IOW, an exec is
accompanied by an unshare(CLONE_FS). Might still be worth trying ofc but
it seems like a non-trivial uapi change but there might be few users
that do clone{3}(CLONE_FS) followed by an exec.

> +/*
> + * Unshare the filesystem structure if it is being shared
> + */
> +int unshare_fs(void)
> +{
> +	struct fs_struct *new_fs = NULL;
> +	int error;
> +
> +	error = unshare_fs_alloc(CLONE_FS, &new_fs);
> +	if (error || !new_fs)
> +		return error;

You should just check for error here, not new_fs.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH 1/2] fs/exec: Explicitly unshare fs_struct on exec
  2022-10-06  9:05   ` Christian Brauner
@ 2022-10-06 10:48     ` David Laight
  2022-10-06 14:13     ` Jann Horn
  1 sibling, 0 replies; 15+ messages in thread
From: David Laight @ 2022-10-06 10:48 UTC (permalink / raw)
  To: 'Christian Brauner', Kees Cook
  Cc: Eric Biederman, Jorge Merlino, Alexander Viro, Thomas Gleixner,
	Andy Lutomirski, Sebastian Andrzej Siewior, Andrew Morton,
	linux-mm, linux-fsdevel, John Johansen, Paul Moore, James Morris,
	Serge E. Hallyn, Stephen Smalley, Eric Paris, Richard Haines,
	Casey Schaufler, Xin Long, David S. Miller, Todd Kjos,
	Ondrej Mosnacek, Prashanth Prahlad, Micah Morton, Fenghua Yu,
	Andrei Vagin, linux-kernel, apparmor, linux-security-module,
	selinux, linux-hardening

From: Christian Brauner
> Sent: 06 October 2022 10:05
> 
> On Thu, Oct 06, 2022 at 01:27:34AM -0700, Kees Cook wrote:
> > The check_unsafe_exec() counting of n_fs would not add up under a heavily
> > threaded process trying to perform a suid exec, causing the suid portion
> > to fail. This counting error appears to be unneeded, but to catch any
> > possible conditions, explicitly unshare fs_struct on exec, if it ends up
> 
> Isn't this a potential uapi break? Afaict, before this change a call to
> clone{3}(CLONE_FS) followed by an exec in the child would have the
> parent and child share fs information. So if the child e.g., changes the
> working directory post exec it would also affect the parent. But after
> this change here this would no longer be true. So a child changing a
> workding directoro would not affect the parent anymore. IOW, an exec is
> accompanied by an unshare(CLONE_FS). Might still be worth trying ofc but
> it seems like a non-trivial uapi change but there might be few users
> that do clone{3}(CLONE_FS) followed by an exec.

The thought of that is entirely horrid...

I presume a suid exec will fail in that case?

If the old code is trying to compare the number of threads
with the number of users of the fs table isn't is just buggy?
If a thread unshares the fs table there can be another
reference somewhere else - which is what (I presume) is being
tested for.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] fs/exec: Explicitly unshare fs_struct on exec
  2022-10-06  9:05   ` Christian Brauner
  2022-10-06 10:48     ` David Laight
@ 2022-10-06 14:13     ` Jann Horn
  2022-10-06 15:25       ` Kees Cook
  2022-10-14  3:18       ` Andy Lutomirski
  1 sibling, 2 replies; 15+ messages in thread
From: Jann Horn @ 2022-10-06 14:13 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Kees Cook, Eric Biederman, Jorge Merlino, Alexander Viro,
	Thomas Gleixner, Andy Lutomirski, Sebastian Andrzej Siewior,
	Andrew Morton, linux-mm, linux-fsdevel, John Johansen,
	Paul Moore, James Morris, Serge E. Hallyn, Stephen Smalley,
	Eric Paris, Richard Haines, Casey Schaufler, Xin Long,
	David S. Miller, Todd Kjos, Ondrej Mosnacek, Prashanth Prahlad,
	Micah Morton, Fenghua Yu, Andrei Vagin, linux-kernel, apparmor,
	linux-security-module, selinux, linux-hardening

On Thu, Oct 6, 2022 at 11:05 AM Christian Brauner <brauner@kernel.org> wrote:
> On Thu, Oct 06, 2022 at 01:27:34AM -0700, Kees Cook wrote:
> > The check_unsafe_exec() counting of n_fs would not add up under a heavily
> > threaded process trying to perform a suid exec, causing the suid portion
> > to fail. This counting error appears to be unneeded, but to catch any
> > possible conditions, explicitly unshare fs_struct on exec, if it ends up
>
> Isn't this a potential uapi break? Afaict, before this change a call to
> clone{3}(CLONE_FS) followed by an exec in the child would have the
> parent and child share fs information. So if the child e.g., changes the
> working directory post exec it would also affect the parent. But after
> this change here this would no longer be true. So a child changing a
> workding directoro would not affect the parent anymore. IOW, an exec is
> accompanied by an unshare(CLONE_FS). Might still be worth trying ofc but
> it seems like a non-trivial uapi change but there might be few users
> that do clone{3}(CLONE_FS) followed by an exec.

I believe the following code in Chromium explicitly relies on this
behavior, but I'm not sure whether this code is in active use anymore:

https://source.chromium.org/chromium/chromium/src/+/main:sandbox/linux/suid/sandbox.c;l=101?q=CLONE_FS&sq=&ss=chromium

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] fs/exec: Explicitly unshare fs_struct on exec
  2022-10-06 14:13     ` Jann Horn
@ 2022-10-06 15:25       ` Kees Cook
  2022-10-06 15:35         ` Jann Horn
  2022-10-14  3:18       ` Andy Lutomirski
  1 sibling, 1 reply; 15+ messages in thread
From: Kees Cook @ 2022-10-06 15:25 UTC (permalink / raw)
  To: Jann Horn, Christian Brauner
  Cc: Eric Biederman, Jorge Merlino, Alexander Viro, Thomas Gleixner,
	Andy Lutomirski, Sebastian Andrzej Siewior, Andrew Morton,
	linux-mm, linux-fsdevel, John Johansen, Paul Moore, James Morris,
	Serge E. Hallyn, Stephen Smalley, Eric Paris, Richard Haines,
	Casey Schaufler, Xin Long, David S. Miller, Todd Kjos,
	Ondrej Mosnacek, Prashanth Prahlad, Micah Morton, Fenghua Yu,
	Andrei Vagin, linux-kernel, apparmor, linux-security-module,
	selinux, linux-hardening



On October 6, 2022 7:13:37 AM PDT, Jann Horn <jannh@google.com> wrote:
>On Thu, Oct 6, 2022 at 11:05 AM Christian Brauner <brauner@kernel.org> wrote:
>> On Thu, Oct 06, 2022 at 01:27:34AM -0700, Kees Cook wrote:
>> > The check_unsafe_exec() counting of n_fs would not add up under a heavily
>> > threaded process trying to perform a suid exec, causing the suid portion
>> > to fail. This counting error appears to be unneeded, but to catch any
>> > possible conditions, explicitly unshare fs_struct on exec, if it ends up
>>
>> Isn't this a potential uapi break? Afaict, before this change a call to
>> clone{3}(CLONE_FS) followed by an exec in the child would have the
>> parent and child share fs information. So if the child e.g., changes the
>> working directory post exec it would also affect the parent. But after
>> this change here this would no longer be true. So a child changing a
>> workding directoro would not affect the parent anymore. IOW, an exec is
>> accompanied by an unshare(CLONE_FS). Might still be worth trying ofc but
>> it seems like a non-trivial uapi change but there might be few users
>> that do clone{3}(CLONE_FS) followed by an exec.
>
>I believe the following code in Chromium explicitly relies on this
>behavior, but I'm not sure whether this code is in active use anymore:
>
>https://source.chromium.org/chromium/chromium/src/+/main:sandbox/linux/suid/sandbox.c;l=101?q=CLONE_FS&sq=&ss=chromium

Oh yes. I think I had tried to forget this existed. Ugh. Okay, so back to the drawing board, I guess. The counting will need to be fixed...

It's possible we can move the counting after dethread -- it seems the early count was just to avoid setting flags after the point of no return, but it's not an error condition...

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] fs/exec: Explicitly unshare fs_struct on exec
  2022-10-06 15:25       ` Kees Cook
@ 2022-10-06 15:35         ` Jann Horn
  0 siblings, 0 replies; 15+ messages in thread
From: Jann Horn @ 2022-10-06 15:35 UTC (permalink / raw)
  To: Kees Cook
  Cc: Christian Brauner, Eric Biederman, Jorge Merlino, Alexander Viro,
	Thomas Gleixner, Andy Lutomirski, Sebastian Andrzej Siewior,
	Andrew Morton, linux-mm, linux-fsdevel, John Johansen,
	Paul Moore, James Morris, Serge E. Hallyn, Stephen Smalley,
	Eric Paris, Richard Haines, Casey Schaufler, Xin Long,
	David S. Miller, Todd Kjos, Ondrej Mosnacek, Prashanth Prahlad,
	Micah Morton, Fenghua Yu, Andrei Vagin, linux-kernel, apparmor,
	linux-security-module, selinux, linux-hardening

On Thu, Oct 6, 2022 at 5:25 PM Kees Cook <keescook@chromium.org> wrote:
> On October 6, 2022 7:13:37 AM PDT, Jann Horn <jannh@google.com> wrote:
> >On Thu, Oct 6, 2022 at 11:05 AM Christian Brauner <brauner@kernel.org> wrote:
> >> On Thu, Oct 06, 2022 at 01:27:34AM -0700, Kees Cook wrote:
> >> > The check_unsafe_exec() counting of n_fs would not add up under a heavily
> >> > threaded process trying to perform a suid exec, causing the suid portion
> >> > to fail. This counting error appears to be unneeded, but to catch any
> >> > possible conditions, explicitly unshare fs_struct on exec, if it ends up
> >>
> >> Isn't this a potential uapi break? Afaict, before this change a call to
> >> clone{3}(CLONE_FS) followed by an exec in the child would have the
> >> parent and child share fs information. So if the child e.g., changes the
> >> working directory post exec it would also affect the parent. But after
> >> this change here this would no longer be true. So a child changing a
> >> workding directoro would not affect the parent anymore. IOW, an exec is
> >> accompanied by an unshare(CLONE_FS). Might still be worth trying ofc but
> >> it seems like a non-trivial uapi change but there might be few users
> >> that do clone{3}(CLONE_FS) followed by an exec.
> >
> >I believe the following code in Chromium explicitly relies on this
> >behavior, but I'm not sure whether this code is in active use anymore:
> >
> >https://source.chromium.org/chromium/chromium/src/+/main:sandbox/linux/suid/sandbox.c;l=101?q=CLONE_FS&sq=&ss=chromium
>
> Oh yes. I think I had tried to forget this existed. Ugh. Okay, so back to the drawing board, I guess. The counting will need to be fixed...
>
> It's possible we can move the counting after dethread -- it seems the early count was just to avoid setting flags after the point of no return, but it's not an error condition...

Random idea that I haven't thought about a lot:

One approach might be to not do it by counting, but instead have a
flag on the fs_struct that we set when someone does a clone() with
CLONE_FS but without CLONE_THREAD? Then we'd end up with the following
possible states for fs_struct:

 - single-process, normal
 - single-process, pending execve past check_unsafe_exec() (prevent
concurrent CLONE_FS)
 - shared between processes

The slight difference from the old semantics would be that once you've
used CLONE_FS without CLONE_THREAD, you can never do setuid execve()
from your current process again (without calling unshare()), even if
the child disappears in the meantime. I think that might be an
acceptably small UAPI break.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] fs/exec: Explicitly unshare fs_struct on exec
  2022-10-06 14:13     ` Jann Horn
  2022-10-06 15:25       ` Kees Cook
@ 2022-10-14  3:18       ` Andy Lutomirski
  2022-10-14  3:54         ` Kees Cook
                           ` (2 more replies)
  1 sibling, 3 replies; 15+ messages in thread
From: Andy Lutomirski @ 2022-10-14  3:18 UTC (permalink / raw)
  To: Jann Horn, Christian Brauner
  Cc: Kees Cook, Eric W. Biederman, Jorge Merlino, Al Viro,
	Thomas Gleixner, Sebastian Andrzej Siewior, Andrew Morton,
	linux-mm, linux-fsdevel, John Johansen, Paul Moore, James Morris,
	Serge E. Hallyn, Stephen Smalley, Eric Paris, Richard Haines,
	Casey Schaufler, Xin Long, David S. Miller, Todd Kjos,
	Ondrej Mosnacek, Prashanth Prahlad, Micah Morton, Fenghua Yu,
	Andrei Vagin, Linux Kernel Mailing List, apparmor,
	linux-security-module, selinux, linux-hardening



On Thu, Oct 6, 2022, at 7:13 AM, Jann Horn wrote:
> On Thu, Oct 6, 2022 at 11:05 AM Christian Brauner <brauner@kernel.org> wrote:
>> On Thu, Oct 06, 2022 at 01:27:34AM -0700, Kees Cook wrote:
>> > The check_unsafe_exec() counting of n_fs would not add up under a heavily
>> > threaded process trying to perform a suid exec, causing the suid portion
>> > to fail. This counting error appears to be unneeded, but to catch any
>> > possible conditions, explicitly unshare fs_struct on exec, if it ends up
>>
>> Isn't this a potential uapi break? Afaict, before this change a call to
>> clone{3}(CLONE_FS) followed by an exec in the child would have the
>> parent and child share fs information. So if the child e.g., changes the
>> working directory post exec it would also affect the parent. But after
>> this change here this would no longer be true. So a child changing a
>> workding directoro would not affect the parent anymore. IOW, an exec is
>> accompanied by an unshare(CLONE_FS). Might still be worth trying ofc but
>> it seems like a non-trivial uapi change but there might be few users
>> that do clone{3}(CLONE_FS) followed by an exec.
>
> I believe the following code in Chromium explicitly relies on this
> behavior, but I'm not sure whether this code is in active use anymore:
>
> https://source.chromium.org/chromium/chromium/src/+/main:sandbox/linux/suid/sandbox.c;l=101?q=CLONE_FS&sq=&ss=chromium

Wait, this is absolutely nucking futs.  On a very quick inspection, the sharable things like this are fs, files, sighand, and io.    files and sighand get unshared, which makes sense.  fs supposedly checks for extra refs and prevents gaining privilege.  io is... ignored!  At least it's not immediately obvious that io is a problem.

But seriously, this makes no sense at all.  It should not be possible to exec a program and then, without ptrace, change its cwd out from under it.  Do we really need to preserve this behavior?

--Andy

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] fs/exec: Explicitly unshare fs_struct on exec
  2022-10-14  3:18       ` Andy Lutomirski
@ 2022-10-14  3:54         ` Kees Cook
  2022-10-14 15:35         ` Jann Horn
  2022-10-14 22:03         ` David Laight
  2 siblings, 0 replies; 15+ messages in thread
From: Kees Cook @ 2022-10-14  3:54 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Jann Horn, Christian Brauner, Eric W. Biederman, Jorge Merlino,
	Al Viro, Thomas Gleixner, Sebastian Andrzej Siewior,
	Andrew Morton, linux-mm, linux-fsdevel, John Johansen,
	Paul Moore, James Morris, Serge E. Hallyn, Stephen Smalley,
	Eric Paris, Richard Haines, Casey Schaufler, Xin Long,
	David S. Miller, Todd Kjos, Ondrej Mosnacek, Prashanth Prahlad,
	Micah Morton, Fenghua Yu, Andrei Vagin,
	Linux Kernel Mailing List, apparmor, linux-security-module,
	selinux, linux-hardening

On Thu, Oct 13, 2022 at 08:18:04PM -0700, Andy Lutomirski wrote:
> But seriously, this makes no sense at all.  It should not be possible to exec a program and then, without ptrace, change its cwd out from under it.  Do we really need to preserve this behavior?

Yup, already abandoned:
https://lore.kernel.org/lkml/202210061301.207A20C8E5@keescook/

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] fs/exec: Explicitly unshare fs_struct on exec
  2022-10-14  3:18       ` Andy Lutomirski
  2022-10-14  3:54         ` Kees Cook
@ 2022-10-14 15:35         ` Jann Horn
  2022-10-18  7:09           ` Kees Cook
  2022-10-14 22:03         ` David Laight
  2 siblings, 1 reply; 15+ messages in thread
From: Jann Horn @ 2022-10-14 15:35 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Christian Brauner, Kees Cook, Eric W. Biederman, Jorge Merlino,
	Al Viro, Thomas Gleixner, Sebastian Andrzej Siewior,
	Andrew Morton, linux-mm, linux-fsdevel, John Johansen,
	Paul Moore, James Morris, Serge E. Hallyn, Stephen Smalley,
	Eric Paris, Richard Haines, Casey Schaufler, Xin Long,
	David S. Miller, Todd Kjos, Ondrej Mosnacek, Prashanth Prahlad,
	Micah Morton, Fenghua Yu, Andrei Vagin,
	Linux Kernel Mailing List, apparmor, linux-security-module,
	selinux, linux-hardening

On Fri, Oct 14, 2022 at 5:18 AM Andy Lutomirski <luto@kernel.org> wrote:
> On Thu, Oct 6, 2022, at 7:13 AM, Jann Horn wrote:
> > On Thu, Oct 6, 2022 at 11:05 AM Christian Brauner <brauner@kernel.org> wrote:
> >> On Thu, Oct 06, 2022 at 01:27:34AM -0700, Kees Cook wrote:
> >> > The check_unsafe_exec() counting of n_fs would not add up under a heavily
> >> > threaded process trying to perform a suid exec, causing the suid portion
> >> > to fail. This counting error appears to be unneeded, but to catch any
> >> > possible conditions, explicitly unshare fs_struct on exec, if it ends up
> >>
> >> Isn't this a potential uapi break? Afaict, before this change a call to
> >> clone{3}(CLONE_FS) followed by an exec in the child would have the
> >> parent and child share fs information. So if the child e.g., changes the
> >> working directory post exec it would also affect the parent. But after
> >> this change here this would no longer be true. So a child changing a
> >> workding directoro would not affect the parent anymore. IOW, an exec is
> >> accompanied by an unshare(CLONE_FS). Might still be worth trying ofc but
> >> it seems like a non-trivial uapi change but there might be few users
> >> that do clone{3}(CLONE_FS) followed by an exec.
> >
> > I believe the following code in Chromium explicitly relies on this
> > behavior, but I'm not sure whether this code is in active use anymore:
> >
> > https://source.chromium.org/chromium/chromium/src/+/main:sandbox/linux/suid/sandbox.c;l=101?q=CLONE_FS&sq=&ss=chromium
>
> Wait, this is absolutely nucking futs.  On a very quick inspection, the sharable things like this are fs, files, sighand, and io.    files and sighand get unshared, which makes sense.  fs supposedly checks for extra refs and prevents gaining privilege.  io is... ignored!  At least it's not immediately obvious that io is a problem.
>
> But seriously, this makes no sense at all.  It should not be possible to exec a program and then, without ptrace, change its cwd out from under it.  Do we really need to preserve this behavior?

I agree that this is pretty wild.

The single user I'm aware of is Chrome, and as far as I know, they use
it for establishing their sandbox on systems where unprivileged user
namespaces are disabled - see
<https://chromium.googlesource.com/chromium/src/+/main/docs/linux/suid_sandbox.md>.
They also have seccomp-based sandboxing, but IIRC there are some small
holes that mean it's still useful for them to be able to set up
namespaces, like how sendmsg() on a unix domain socket can specify a
file path as the destination address.

(By the way, I think maybe Chrome wouldn't need this wacky trick with
the shared fs_struct if the "NO_NEW_PRIVS permits chroot()" thing had
ever landed that you
(https://lore.kernel.org/lkml/0e2f0f54e19bff53a3739ecfddb4ffa9a6dbde4d.1327858005.git.luto@amacapital.net/)
and Mickaël Salaün proposed in the past... or alternatively, if there
was a way to properly filter all the syscalls that Chrome has to
permit for renderers.)

(But also, to be clear, I don't speak for Chrome, this is just my
understanding of how their stuff works.)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH 1/2] fs/exec: Explicitly unshare fs_struct on exec
  2022-10-14  3:18       ` Andy Lutomirski
  2022-10-14  3:54         ` Kees Cook
  2022-10-14 15:35         ` Jann Horn
@ 2022-10-14 22:03         ` David Laight
  2022-11-28 17:49           ` Eric W. Biederman
  2 siblings, 1 reply; 15+ messages in thread
From: David Laight @ 2022-10-14 22:03 UTC (permalink / raw)
  To: 'Andy Lutomirski', Jann Horn, Christian Brauner
  Cc: Kees Cook, Eric W. Biederman, Jorge Merlino, Al Viro,
	Thomas Gleixner, Sebastian Andrzej Siewior, Andrew Morton,
	linux-mm, linux-fsdevel, John Johansen, Paul Moore, James Morris,
	Serge E. Hallyn, Stephen Smalley, Eric Paris, Richard Haines,
	Casey Schaufler, Xin Long, David S. Miller, Todd Kjos,
	Ondrej Mosnacek, Prashanth Prahlad, Micah Morton, Fenghua Yu,
	Andrei Vagin, Linux Kernel Mailing List, apparmor,
	linux-security-module, selinux, linux-hardening

From: Andy Lutomirski
> Sent: 14 October 2022 04:18
...
> But seriously, this makes no sense at all.  It should not be possible to exec a program and then,
> without ptrace, change its cwd out from under it.  Do we really need to preserve this behavior?

it maybe ok if the exec'ed program also 'bought-in' to the
fact that its cwd and open files might get changed.
But imagine someone doing it to a login shell!

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] fs/exec: Explicitly unshare fs_struct on exec
  2022-10-14 15:35         ` Jann Horn
@ 2022-10-18  7:09           ` Kees Cook
  2022-10-18 11:19             ` Jann Horn
  0 siblings, 1 reply; 15+ messages in thread
From: Kees Cook @ 2022-10-18  7:09 UTC (permalink / raw)
  To: Jann Horn
  Cc: Andy Lutomirski, Christian Brauner, Eric W. Biederman,
	Jorge Merlino, Al Viro, Thomas Gleixner,
	Sebastian Andrzej Siewior, Andrew Morton, linux-mm,
	linux-fsdevel, John Johansen, Paul Moore, James Morris,
	Serge E. Hallyn, Stephen Smalley, Eric Paris, Richard Haines,
	Casey Schaufler, Xin Long, David S. Miller, Todd Kjos,
	Ondrej Mosnacek, Prashanth Prahlad, Micah Morton, Fenghua Yu,
	Andrei Vagin, Linux Kernel Mailing List, apparmor,
	linux-security-module, selinux, linux-hardening

On Fri, Oct 14, 2022 at 05:35:26PM +0200, Jann Horn wrote:
> On Fri, Oct 14, 2022 at 5:18 AM Andy Lutomirski <luto@kernel.org> wrote:
> > On Thu, Oct 6, 2022, at 7:13 AM, Jann Horn wrote:
> > > On Thu, Oct 6, 2022 at 11:05 AM Christian Brauner <brauner@kernel.org> wrote:
> > >> On Thu, Oct 06, 2022 at 01:27:34AM -0700, Kees Cook wrote:
> > >> > The check_unsafe_exec() counting of n_fs would not add up under a heavily
> > >> > threaded process trying to perform a suid exec, causing the suid portion
> > >> > to fail. This counting error appears to be unneeded, but to catch any
> > >> > possible conditions, explicitly unshare fs_struct on exec, if it ends up
> > >>
> > >> Isn't this a potential uapi break? Afaict, before this change a call to
> > >> clone{3}(CLONE_FS) followed by an exec in the child would have the
> > >> parent and child share fs information. So if the child e.g., changes the
> > >> working directory post exec it would also affect the parent. But after
> > >> this change here this would no longer be true. So a child changing a
> > >> workding directoro would not affect the parent anymore. IOW, an exec is
> > >> accompanied by an unshare(CLONE_FS). Might still be worth trying ofc but
> > >> it seems like a non-trivial uapi change but there might be few users
> > >> that do clone{3}(CLONE_FS) followed by an exec.
> > >
> > > I believe the following code in Chromium explicitly relies on this
> > > behavior, but I'm not sure whether this code is in active use anymore:
> > >
> > > https://source.chromium.org/chromium/chromium/src/+/main:sandbox/linux/suid/sandbox.c;l=101?q=CLONE_FS&sq=&ss=chromium
> >
> > Wait, this is absolutely nucking futs.  On a very quick inspection, the sharable things like this are fs, files, sighand, and io.    files and sighand get unshared, which makes sense.  fs supposedly checks for extra refs and prevents gaining privilege.  io is... ignored!  At least it's not immediately obvious that io is a problem.
> >
> > But seriously, this makes no sense at all.  It should not be possible to exec a program and then, without ptrace, change its cwd out from under it.  Do we really need to preserve this behavior?
> 
> I agree that this is pretty wild.
> 
> The single user I'm aware of is Chrome, and as far as I know, they use
> it for establishing their sandbox on systems where unprivileged user
> namespaces are disabled - see
> <https://chromium.googlesource.com/chromium/src/+/main/docs/linux/suid_sandbox.md>.
> They also have seccomp-based sandboxing, but IIRC there are some small
> holes that mean it's still useful for them to be able to set up
> namespaces, like how sendmsg() on a unix domain socket can specify a
> file path as the destination address.
> 
> (By the way, I think maybe Chrome wouldn't need this wacky trick with
> the shared fs_struct if the "NO_NEW_PRIVS permits chroot()" thing had
> ever landed that you
> (https://lore.kernel.org/lkml/0e2f0f54e19bff53a3739ecfddb4ffa9a6dbde4d.1327858005.git.luto@amacapital.net/)
> and Mickaël Salaün proposed in the past... or alternatively, if there
> was a way to properly filter all the syscalls that Chrome has to
> permit for renderers.)
> 
> (But also, to be clear, I don't speak for Chrome, this is just my
> understanding of how their stuff works.)

Chrome seems to just want a totally empty filesystem view, yes?
Let's land the nnp+chroot change. :P Only 10 years late! Then we can
have Chrome use this and we can unshare fs on exec...

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] fs/exec: Explicitly unshare fs_struct on exec
  2022-10-18  7:09           ` Kees Cook
@ 2022-10-18 11:19             ` Jann Horn
  0 siblings, 0 replies; 15+ messages in thread
From: Jann Horn @ 2022-10-18 11:19 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andy Lutomirski, Christian Brauner, Eric W. Biederman,
	Jorge Merlino, Al Viro, Thomas Gleixner,
	Sebastian Andrzej Siewior, Andrew Morton, linux-mm,
	linux-fsdevel, John Johansen, Paul Moore, James Morris,
	Serge E. Hallyn, Stephen Smalley, Eric Paris, Richard Haines,
	Casey Schaufler, Xin Long, David S. Miller, Todd Kjos,
	Ondrej Mosnacek, Prashanth Prahlad, Micah Morton, Fenghua Yu,
	Andrei Vagin, Linux Kernel Mailing List, apparmor,
	linux-security-module, selinux, linux-hardening

On Tue, Oct 18, 2022 at 9:09 AM Kees Cook <keescook@chromium.org> wrote:
> On Fri, Oct 14, 2022 at 05:35:26PM +0200, Jann Horn wrote:
> > On Fri, Oct 14, 2022 at 5:18 AM Andy Lutomirski <luto@kernel.org> wrote:
> > > On Thu, Oct 6, 2022, at 7:13 AM, Jann Horn wrote:
> > > > On Thu, Oct 6, 2022 at 11:05 AM Christian Brauner <brauner@kernel.org> wrote:
> > > >> On Thu, Oct 06, 2022 at 01:27:34AM -0700, Kees Cook wrote:
> > > >> > The check_unsafe_exec() counting of n_fs would not add up under a heavily
> > > >> > threaded process trying to perform a suid exec, causing the suid portion
> > > >> > to fail. This counting error appears to be unneeded, but to catch any
> > > >> > possible conditions, explicitly unshare fs_struct on exec, if it ends up
> > > >>
> > > >> Isn't this a potential uapi break? Afaict, before this change a call to
> > > >> clone{3}(CLONE_FS) followed by an exec in the child would have the
> > > >> parent and child share fs information. So if the child e.g., changes the
> > > >> working directory post exec it would also affect the parent. But after
> > > >> this change here this would no longer be true. So a child changing a
> > > >> workding directoro would not affect the parent anymore. IOW, an exec is
> > > >> accompanied by an unshare(CLONE_FS). Might still be worth trying ofc but
> > > >> it seems like a non-trivial uapi change but there might be few users
> > > >> that do clone{3}(CLONE_FS) followed by an exec.
> > > >
> > > > I believe the following code in Chromium explicitly relies on this
> > > > behavior, but I'm not sure whether this code is in active use anymore:
> > > >
> > > > https://source.chromium.org/chromium/chromium/src/+/main:sandbox/linux/suid/sandbox.c;l=101?q=CLONE_FS&sq=&ss=chromium
> > >
> > > Wait, this is absolutely nucking futs.  On a very quick inspection, the sharable things like this are fs, files, sighand, and io.    files and sighand get unshared, which makes sense.  fs supposedly checks for extra refs and prevents gaining privilege.  io is... ignored!  At least it's not immediately obvious that io is a problem.
> > >
> > > But seriously, this makes no sense at all.  It should not be possible to exec a program and then, without ptrace, change its cwd out from under it.  Do we really need to preserve this behavior?
> >
> > I agree that this is pretty wild.
> >
> > The single user I'm aware of is Chrome, and as far as I know, they use
> > it for establishing their sandbox on systems where unprivileged user
> > namespaces are disabled - see
> > <https://chromium.googlesource.com/chromium/src/+/main/docs/linux/suid_sandbox.md>.
> > They also have seccomp-based sandboxing, but IIRC there are some small
> > holes that mean it's still useful for them to be able to set up
> > namespaces, like how sendmsg() on a unix domain socket can specify a
> > file path as the destination address.
> >
> > (By the way, I think maybe Chrome wouldn't need this wacky trick with
> > the shared fs_struct if the "NO_NEW_PRIVS permits chroot()" thing had
> > ever landed that you
> > (https://lore.kernel.org/lkml/0e2f0f54e19bff53a3739ecfddb4ffa9a6dbde4d.1327858005.git.luto@amacapital.net/)
> > and Mickaël Salaün proposed in the past... or alternatively, if there
> > was a way to properly filter all the syscalls that Chrome has to
> > permit for renderers.)
> >
> > (But also, to be clear, I don't speak for Chrome, this is just my
> > understanding of how their stuff works.)
>
> Chrome seems to just want a totally empty filesystem view, yes?
> Let's land the nnp+chroot change. :P Only 10 years late! Then we can
> have Chrome use this and we can unshare fs on exec...

Someone should check with Chrome first though to make sure what I said
accurately represents what they think...

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] fs/exec: Explicitly unshare fs_struct on exec
  2022-10-14 22:03         ` David Laight
@ 2022-11-28 17:49           ` Eric W. Biederman
  0 siblings, 0 replies; 15+ messages in thread
From: Eric W. Biederman @ 2022-11-28 17:49 UTC (permalink / raw)
  To: David Laight
  Cc: 'Andy Lutomirski',
	Jann Horn, Christian Brauner, Kees Cook, Jorge Merlino, Al Viro,
	Thomas Gleixner, Sebastian Andrzej Siewior, Andrew Morton,
	linux-mm, linux-fsdevel, John Johansen, Paul Moore, James Morris,
	Serge E. Hallyn, Stephen Smalley, Eric Paris, Richard Haines,
	Casey Schaufler, Xin Long, David S. Miller, Todd Kjos,
	Ondrej Mosnacek, Prashanth Prahlad, Micah Morton, Fenghua Yu,
	Andrei Vagin, Linux Kernel Mailing List, apparmor,
	linux-security-module, selinux, linux-hardening

David Laight <David.Laight@ACULAB.COM> writes:

> From: Andy Lutomirski
>> Sent: 14 October 2022 04:18
> ...
>> But seriously, this makes no sense at all.  It should not be possible to exec a program and then,
>> without ptrace, change its cwd out from under it.  Do we really need to preserve this behavior?
>
> it maybe ok if the exec'ed program also 'bought-in' to the
> fact that its cwd and open files might get changed.
> But imagine someone doing it to a login shell!


I am slowly catching up on my email and I saw this conversation.

When I initially saw this thread I was confused and thought this
might run into an issue with fs/locks.c.  I was close but wrong.
fs/locks.c uses current->files as a sort of process identifier
and so is very sensitive to when it is unshared.  Making
unsharing current->files unconditionally a bug.  Not relevant to
this conversation.


There are several clone options that were only relevant for the old
LinuxThreads implementation including CLONE_FS and CLONE_SIGHAND.
The LinuxThreads implementation has not been needed since
the introduction of CLONE_THREAD in linux-2.6.0 in 17 Dec 2003.
Almost 20 years ago.

I suggest we introduce CONFIG_CLONE_FS and CONFIG_SIGHAND to allow
disabling support of these clone options.  No known user space will
care.  The are both getting in the way of kernel maintenance so there
is a reason to start pushing them out.

Further simply not worrying about UNSHARE_FS during exec fixes the
race so it essentially a bug fix by code removal.

I believe something like the patch below should get the job done.

diff --git a/fs/exec.c b/fs/exec.c
index a0b1f0337a62..7ff13c77ad04 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1186,7 +1186,8 @@ static int unshare_sighand(struct task_struct *me)
 {
 	struct sighand_struct *oldsighand = me->sighand;
 
-	if (refcount_read(&oldsighand->count) != 1) {
+	if (IS_ENABLED(CONFIG_CLONE_SIGHAND) &&
+	    refcount_read(&oldsighand->count) != 1) {
 		struct sighand_struct *newsighand;
 		/*
 		 * This ->sighand is shared with the CLONE_SIGHAND
@@ -1568,6 +1569,9 @@ static void check_unsafe_exec(struct linux_binprm *bprm)
 	if (task_no_new_privs(current))
 		bprm->unsafe |= LSM_UNSAFE_NO_NEW_PRIVS;
 
+	if (!IS_ENABLED(CONFIG_CLONE_FS))
+		return;
+
 	t = p;
 	n_fs = 1;
 	spin_lock(&p->fs->lock);
diff --git a/init/Kconfig b/init/Kconfig
index 94125d3b6893..8660a6bcc1cf 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1764,6 +1764,23 @@ config KALLSYMS_BASE_RELATIVE
 	  time constants, and no relocation pass is required at runtime to fix
 	  up the entries based on the runtime load address of the kernel.
 
+config CLONE_FS
+	bool
+	default y
+	help
+	  Support CLONE_FS being passed to clone.  The only known user
+	  is the old LinuxThreads package so it should be safe to disable
+	  this option.
+
+config CLONE_SIGHAND
+	bool
+	default y
+	help
+	  Support CLONE_SIGHAND being passed to clone.  The only known user
+	  is the old LinuxThreads package so it should be safe to disable
+	  this option.
+
+
 # end of the "standard kernel features (expert users)" menu
 
 # syscall, maps, verifier
diff --git a/kernel/fork.c b/kernel/fork.c
index 08969f5aa38d..da9017b51da4 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2023,6 +2023,16 @@ static __latent_entropy struct task_struct *copy_process(
 	if ((clone_flags & CLONE_SIGHAND) && !(clone_flags & CLONE_VM))
 		return ERR_PTR(-EINVAL);
 
+	/* Don't allow CLONE_FS if not enabled */
+	if (!IS_ENABLED(CONFIG_CLONE_FS) &&
+	    ((clone_flags & (CLONE_THREAD | CLONE_FS)) == CLONE_FS))
+		return ERR_PTR(-EINVAL);
+
+	/* Don't allow CLONE_SIGHAND if not enabled */
+	if (!IS_ENABLED(CONFIG_CLONE_SIGHAND) &&
+	    ((clone_flags & (CLONE_THREAD | CLONE_SIGHAND)) == CLONE_SIGHAND))
+		return ERR_PTR(-EINVAL);
+
 	/*
 	 * Siblings of global init remain as zombies on exit since they are
 	 * not reaped by their parent (swapper). To solve this and to avoid
@@ -3101,6 +3111,9 @@ static int unshare_fs(unsigned long unshare_flags, struct fs_struct **new_fsp)
 	if (fs->users == 1)
 		return 0;
 
+	if (!IS_ENABLED(CONFIG_CLONE_FS))
+		return -EINVAL;
+
 	*new_fsp = copy_fs_struct(fs);
 	if (!*new_fsp)
 		return -ENOMEM;

Eric

^ permalink raw reply related	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2022-11-28 18:31 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-06  8:27 [PATCH 0/2] fs/exec: Explicitly unshare fs_struct on exec Kees Cook
2022-10-06  8:27 ` [PATCH 1/2] " Kees Cook
2022-10-06  9:05   ` Christian Brauner
2022-10-06 10:48     ` David Laight
2022-10-06 14:13     ` Jann Horn
2022-10-06 15:25       ` Kees Cook
2022-10-06 15:35         ` Jann Horn
2022-10-14  3:18       ` Andy Lutomirski
2022-10-14  3:54         ` Kees Cook
2022-10-14 15:35         ` Jann Horn
2022-10-18  7:09           ` Kees Cook
2022-10-18 11:19             ` Jann Horn
2022-10-14 22:03         ` David Laight
2022-11-28 17:49           ` Eric W. Biederman
2022-10-06  8:27 ` [PATCH 2/2] exec: Remove LSM_UNSAFE_SHARE Kees Cook

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).