All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/4] compat_do_execve should unshare_files
@ 2009-03-28 23:16 Hugh Dickins
  2009-03-28 23:20 ` [PATCH 2/4] fix setuid sometimes doesn't Hugh Dickins
                   ` (2 more replies)
  0 siblings, 3 replies; 50+ messages in thread
From: Hugh Dickins @ 2009-03-28 23:16 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Andrew Morton, Joe Malicki, Michael Itz,
	Kenneth Baker, Chris Wright, David Howells, Alexey Dobriyan,
	Oleg Nesterov, Greg Kroah-Hartman, linux-fsdevel, linux-kernel

2.6.26's commit fd8328be874f4190a811c58cd4778ec2c74d2c05
"sanitize handling of shared descriptor tables in failing execve()"
moved the unshare_files() from flush_old_exec() and several binfmts
to the head of do_execve(); but forgot to make the same change to
compat_do_execve(), leaving a CLONE_FILES files_struct shared across
exec from a 32-bit process on a 64-bit kernel.

It's arguable whether the files_struct really ought to be unshared
across exec; but 2.6.1 made that so to stop the loading binary's fd
leaking into other threads, and a 32-bit process on a 64-bit kernel
ought to behave in the same way as 32 on 32 and 64 on 64.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: stable@kernel.org
---

 fs/compat.c |   12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

--- 2.6.29/fs/compat.c	2009-03-23 23:12:14.000000000 +0000
+++ linux/fs/compat.c	2009-03-28 18:06:02.000000000 +0000
@@ -1392,12 +1392,17 @@ int compat_do_execve(char * filename,
 {
 	struct linux_binprm *bprm;
 	struct file *file;
+	struct files_struct *displaced;
 	int retval;
 
+	retval = unshare_files(&displaced);
+	if (retval)
+		goto out_ret;
+
 	retval = -ENOMEM;
 	bprm = kzalloc(sizeof(*bprm), GFP_KERNEL);
 	if (!bprm)
-		goto out_ret;
+		goto out_files;
 
 	retval = mutex_lock_interruptible(&current->cred_exec_mutex);
 	if (retval < 0)
@@ -1457,6 +1462,8 @@ int compat_do_execve(char * filename,
 	mutex_unlock(&current->cred_exec_mutex);
 	acct_update_integrals(current);
 	free_bprm(bprm);
+	if (displaced)
+		put_files_struct(displaced);
 	return retval;
 
 out:
@@ -1475,6 +1482,9 @@ out_unlock:
 out_free:
 	free_bprm(bprm);
 
+out_files:
+	if (displaced)
+		reset_files_struct(displaced);
 out_ret:
 	return retval;
 }

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 2/4] fix setuid sometimes doesn't
  2009-03-28 23:16 [PATCH 1/4] compat_do_execve should unshare_files Hugh Dickins
@ 2009-03-28 23:20 ` Hugh Dickins
  2009-03-29  0:53   ` Q: check_unsafe_exec() races (Was: [PATCH 2/4] fix setuid sometimes doesn't) Oleg Nesterov
  2009-03-28 23:21 ` [PATCH 3/4] fix setuid sometimes wouldn't Hugh Dickins
  2009-03-28 23:23 ` [PATCH 4/4] Annotate struct fs_struct's usage count restriction Hugh Dickins
  2 siblings, 1 reply; 50+ messages in thread
From: Hugh Dickins @ 2009-03-28 23:20 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Andrew Morton, Joe Malicki, Michael Itz,
	Kenneth Baker, Chris Wright, David Howells, Alexey Dobriyan,
	Oleg Nesterov, Greg Kroah-Hartman, linux-fsdevel, linux-kernel

Joe Malicki reports that setuid sometimes doesn't: very rarely,
a setuid root program does not get root euid; and, by the way,
they have a health check running lsof every few minutes.

Right, check_unsafe_exec() notes whether the files_struct is being
shared by more threads than will get killed by the exec, and if so
sets LSM_UNSAFE_SHARE to make bprm_set_creds() careful about euid.
But /proc/<pid>/fd and /proc/<pid>/fdinfo lookups make transient
use of get_files_struct(), which also raises that sharing count.

There's a rather simple fix for this: exec's check on files->count
has been redundant ever since 2.6.1 made it unshare_files() (except
while compat_do_execve() omitted to do so) - just remove that check.

[Note to -stable: this patch will not apply before 2.6.29: earlier
releases should just remove the files->count line from unsafe_exec().]

Reported-by: Joe Malicki <jmalicki@metacarta.com>
Narrowed-down-by: Michael Itz <mitz@metacarta.com>
Tested-by: Joe Malicki <jmalicki@metacarta.com>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: stable@kernel.org
---

 fs/compat.c   |    2 +-
 fs/exec.c     |   10 +++-------
 fs/internal.h |    2 +-
 3 files changed, 5 insertions(+), 9 deletions(-)

--- 2.6.29/fs/compat.c	2009-03-23 23:12:14.000000000 +0000
+++ linux/fs/compat.c	2009-03-28 18:06:02.000000000 +0000
@@ -1412,7 +1412,7 @@ int compat_do_execve(char * filename,
 	bprm->cred = prepare_exec_creds();
 	if (!bprm->cred)
 		goto out_unlock;
-	check_unsafe_exec(bprm, current->files);
+	check_unsafe_exec(bprm);
 
 	file = open_exec(filename);
 	retval = PTR_ERR(file);
--- 2.6.29/fs/exec.c	2009-03-23 23:12:14.000000000 +0000
+++ linux/fs/exec.c	2009-03-28 18:06:02.000000000 +0000
@@ -1049,28 +1049,24 @@ EXPORT_SYMBOL(install_exec_creds);
  * - the caller must hold current->cred_exec_mutex to protect against
  *   PTRACE_ATTACH
  */
-void check_unsafe_exec(struct linux_binprm *bprm, struct files_struct *files)
+void check_unsafe_exec(struct linux_binprm *bprm)
 {
 	struct task_struct *p = current, *t;
 	unsigned long flags;
-	unsigned n_fs, n_files, n_sighand;
+	unsigned n_fs, n_sighand;
 
 	bprm->unsafe = tracehook_unsafe_exec(p);
 
 	n_fs = 1;
-	n_files = 1;
 	n_sighand = 1;
 	lock_task_sighand(p, &flags);
 	for (t = next_thread(p); t != p; t = next_thread(t)) {
 		if (t->fs == p->fs)
 			n_fs++;
-		if (t->files == files)
-			n_files++;
 		n_sighand++;
 	}
 
 	if (atomic_read(&p->fs->count) > n_fs ||
-	    atomic_read(&p->files->count) > n_files ||
 	    atomic_read(&p->sighand->count) > n_sighand)
 		bprm->unsafe |= LSM_UNSAFE_SHARE;
 
@@ -1289,7 +1285,7 @@ int do_execve(char * filename,
 	bprm->cred = prepare_exec_creds();
 	if (!bprm->cred)
 		goto out_unlock;
-	check_unsafe_exec(bprm, displaced);
+	check_unsafe_exec(bprm);
 
 	file = open_exec(filename);
 	retval = PTR_ERR(file);
--- 2.6.29/fs/internal.h	2009-03-23 23:12:14.000000000 +0000
+++ linux/fs/internal.h	2009-03-28 18:06:02.000000000 +0000
@@ -43,7 +43,7 @@ extern void __init chrdev_init(void);
 /*
  * exec.c
  */
-extern void check_unsafe_exec(struct linux_binprm *, struct files_struct *);
+extern void check_unsafe_exec(struct linux_binprm *);
 
 /*
  * namespace.c

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 3/4] fix setuid sometimes wouldn't
  2009-03-28 23:16 [PATCH 1/4] compat_do_execve should unshare_files Hugh Dickins
  2009-03-28 23:20 ` [PATCH 2/4] fix setuid sometimes doesn't Hugh Dickins
@ 2009-03-28 23:21 ` Hugh Dickins
  2009-03-29 11:19   ` Alexey Dobriyan
  2009-03-28 23:23 ` [PATCH 4/4] Annotate struct fs_struct's usage count restriction Hugh Dickins
  2 siblings, 1 reply; 50+ messages in thread
From: Hugh Dickins @ 2009-03-28 23:21 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Andrew Morton, Joe Malicki, Michael Itz,
	Kenneth Baker, Chris Wright, David Howells, Alexey Dobriyan,
	Oleg Nesterov, Greg Kroah-Hartman, linux-fsdevel, linux-kernel

check_unsafe_exec() also notes whether the fs_struct is being
shared by more threads than will get killed by the exec, and if so
sets LSM_UNSAFE_SHARE to make bprm_set_creds() careful about euid.
But /proc/<pid>/cwd and /proc/<pid>/root lookups make transient
use of get_fs_struct(), which also raises that sharing count.

This might occasionally cause a setuid program not to change euid,
in the same way as happened with files->count (check_unsafe_exec
also looks at sighand->count, but /proc doesn't raise that one).

We'd prefer exec not to unshare fs_struct: so fix this in procfs,
replacing get_fs_struct() by get_fs_path(), which does path_get
while still holding task_lock, instead of raising fs->count.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: stable@kernel.org
___

 fs/proc/base.c |   50 +++++++++++++++--------------------------------
 1 file changed, 16 insertions(+), 34 deletions(-)

--- 2.6.29/fs/proc/base.c	2009-03-23 23:12:14.000000000 +0000
+++ linux/fs/proc/base.c	2009-03-28 18:06:02.000000000 +0000
@@ -146,15 +146,22 @@ static unsigned int pid_entry_count_dirs
 	return count;
 }
 
-static struct fs_struct *get_fs_struct(struct task_struct *task)
+static int get_fs_path(struct task_struct *task, struct path *path, bool root)
 {
 	struct fs_struct *fs;
+	int result = -ENOENT;
+
 	task_lock(task);
 	fs = task->fs;
-	if(fs)
-		atomic_inc(&fs->count);
+	if (fs) {
+		read_lock(&fs->lock);
+		*path = root ? fs->root : fs->pwd;
+		path_get(path);
+		read_unlock(&fs->lock);
+		result = 0;
+	}
 	task_unlock(task);
-	return fs;
+	return result;
 }
 
 static int get_nr_threads(struct task_struct *tsk)
@@ -172,42 +179,24 @@ static int get_nr_threads(struct task_st
 static int proc_cwd_link(struct inode *inode, struct path *path)
 {
 	struct task_struct *task = get_proc_task(inode);
-	struct fs_struct *fs = NULL;
 	int result = -ENOENT;
 
 	if (task) {
-		fs = get_fs_struct(task);
+		result = get_fs_path(task, path, 0);
 		put_task_struct(task);
 	}
-	if (fs) {
-		read_lock(&fs->lock);
-		*path = fs->pwd;
-		path_get(&fs->pwd);
-		read_unlock(&fs->lock);
-		result = 0;
-		put_fs_struct(fs);
-	}
 	return result;
 }
 
 static int proc_root_link(struct inode *inode, struct path *path)
 {
 	struct task_struct *task = get_proc_task(inode);
-	struct fs_struct *fs = NULL;
 	int result = -ENOENT;
 
 	if (task) {
-		fs = get_fs_struct(task);
+		result = get_fs_path(task, path, 1);
 		put_task_struct(task);
 	}
-	if (fs) {
-		read_lock(&fs->lock);
-		*path = fs->root;
-		path_get(&fs->root);
-		read_unlock(&fs->lock);
-		result = 0;
-		put_fs_struct(fs);
-	}
 	return result;
 }
 
@@ -596,7 +585,6 @@ static int mounts_open_common(struct ino
 	struct task_struct *task = get_proc_task(inode);
 	struct nsproxy *nsp;
 	struct mnt_namespace *ns = NULL;
-	struct fs_struct *fs = NULL;
 	struct path root;
 	struct proc_mounts *p;
 	int ret = -EINVAL;
@@ -610,22 +598,16 @@ static int mounts_open_common(struct ino
 				get_mnt_ns(ns);
 		}
 		rcu_read_unlock();
-		if (ns)
-			fs = get_fs_struct(task);
+		if (ns && get_fs_path(task, &root, 1) == 0)
+			ret = 0;
 		put_task_struct(task);
 	}
 
 	if (!ns)
 		goto err;
-	if (!fs)
+	if (ret)
 		goto err_put_ns;
 
-	read_lock(&fs->lock);
-	root = fs->root;
-	path_get(&root);
-	read_unlock(&fs->lock);
-	put_fs_struct(fs);
-
 	ret = -ENOMEM;
 	p = kmalloc(sizeof(struct proc_mounts), GFP_KERNEL);
 	if (!p)

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 4/4] Annotate struct fs_struct's usage count restriction
  2009-03-28 23:16 [PATCH 1/4] compat_do_execve should unshare_files Hugh Dickins
  2009-03-28 23:20 ` [PATCH 2/4] fix setuid sometimes doesn't Hugh Dickins
  2009-03-28 23:21 ` [PATCH 3/4] fix setuid sometimes wouldn't Hugh Dickins
@ 2009-03-28 23:23 ` Hugh Dickins
  2 siblings, 0 replies; 50+ messages in thread
From: Hugh Dickins @ 2009-03-28 23:23 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Andrew Morton, Joe Malicki, Michael Itz,
	Kenneth Baker, Chris Wright, David Howells, Alexey Dobriyan,
	Oleg Nesterov, Greg Kroah-Hartman, linux-fsdevel, linux-kernel

From: David Howells <dhowells@redhat.com>

Annotate struct fs_struct's usage count to indicate the restrictions upon it.
It may not be incremented, except by clone(CLONE_FS), as this affects the
check in check_unsafe_exec() in fs/exec.c.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: stable@kernel.org
---

 include/linux/fs_struct.h |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

--- 2.6.29/include/linux/fs_struct.h	2009-03-23 23:12:14.000000000 +0000
+++ linux/include/linux/fs_struct.h	2009-03-28 18:06:02.000000000 +0000
@@ -4,7 +4,10 @@
 #include <linux/path.h>
 
 struct fs_struct {
-	atomic_t count;
+	atomic_t count;	/* This usage count is used by check_unsafe_exec() for
+			 * security checking purposes - therefore it may not be
+			 * incremented, except by clone(CLONE_FS).
+			 */
 	rwlock_t lock;
 	int umask;
 	struct path root, pwd;

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Q: check_unsafe_exec() races (Was: [PATCH 2/4] fix setuid sometimes doesn't)
  2009-03-28 23:20 ` [PATCH 2/4] fix setuid sometimes doesn't Hugh Dickins
@ 2009-03-29  0:53   ` Oleg Nesterov
  2009-03-29  4:10     ` Al Viro
  0 siblings, 1 reply; 50+ messages in thread
From: Oleg Nesterov @ 2009-03-29  0:53 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Al Viro, Linus Torvalds, Andrew Morton, Joe Malicki, Michael Itz,
	Kenneth Baker, Chris Wright, David Howells, Alexey Dobriyan,
	Greg Kroah-Hartman, linux-fsdevel, linux-kernel

> -void check_unsafe_exec(struct linux_binprm *bprm, struct files_struct *files)
> +void check_unsafe_exec(struct linux_binprm *bprm)
>  {
>  	struct task_struct *p = current, *t;
>  	unsigned long flags;
> -	unsigned n_fs, n_files, n_sighand;
> +	unsigned n_fs, n_sighand;
>
>  	bprm->unsafe = tracehook_unsafe_exec(p);
>
>  	n_fs = 1;
> -	n_files = 1;
>  	n_sighand = 1;
>  	lock_task_sighand(p, &flags);
>  	for (t = next_thread(p); t != p; t = next_thread(t)) {
>  		if (t->fs == p->fs)
>  			n_fs++;
> -		if (t->files == files)
> -			n_files++;
>  		n_sighand++;
>  	}
>
>  	if (atomic_read(&p->fs->count) > n_fs ||
> -	    atomic_read(&p->files->count) > n_files ||
>  	    atomic_read(&p->sighand->count) > n_sighand)
>  		bprm->unsafe |= LSM_UNSAFE_SHARE;

Can't find the patch which introduced check_unsafe_exec(), so
I am asking here.

How it is supposed to work?

Let's suppose we have two threads T1 and T2. T1 exits, and calls
exit_fs().

	exit_fs:

		tsk->fs = NULL;
		// WINDOW
		put_fs_struct(fs);

Now, if T2 does exec() and check_unsafe_exec() happens in the WINDOW
above, we set LSM_UNSAFE_SHARE.

Or we can race with sub-thread doing clone(CLONE_FS|CLONE_THREAD),
the new thread is not visible in ->thread_group, buy copy_fs()
can already increment fs->count.


Another question. Why do we check sighand->count? We always unshare
->sighand on exec, see de_thread().


Minor, but why lock_task_sighand() ? This helper is "__must_check".
If it can't fail (yes, it can't fail here), spin_lock_irq(siglock)
is enough. (and given that ->siglock can't help anyway to calculate
n_fs, we could use rcu_read_lock() instead).


(as for these patches, I think they are correct).

Oleg.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Q: check_unsafe_exec() races (Was: [PATCH 2/4] fix setuid sometimes doesn't)
  2009-03-29  0:53   ` Q: check_unsafe_exec() races (Was: [PATCH 2/4] fix setuid sometimes doesn't) Oleg Nesterov
@ 2009-03-29  4:10     ` Al Viro
  2009-03-29  4:14       ` Al Viro
  2009-03-29  4:52       ` Oleg Nesterov
  0 siblings, 2 replies; 50+ messages in thread
From: Al Viro @ 2009-03-29  4:10 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Hugh Dickins, Linus Torvalds, Andrew Morton, Joe Malicki,
	Michael Itz, Kenneth Baker, Chris Wright, David Howells,
	Alexey Dobriyan, Greg Kroah-Hartman, linux-fsdevel, linux-kernel

On Sun, Mar 29, 2009 at 01:53:43AM +0100, Oleg Nesterov wrote:

> Can't find the patch which introduced check_unsafe_exec(), so
> I am asking here.
> 
> How it is supposed to work?
> 
> Let's suppose we have two threads T1 and T2. T1 exits, and calls
> exit_fs().
> 
> 	exit_fs:
> 
> 		tsk->fs = NULL;
> 		// WINDOW
> 		put_fs_struct(fs);
> 
> Now, if T2 does exec() and check_unsafe_exec() happens in the WINDOW
> above, we set LSM_UNSAFE_SHARE.
> 
> Or we can race with sub-thread doing clone(CLONE_FS|CLONE_THREAD),
> the new thread is not visible in ->thread_group, buy copy_fs()
> can already increment fs->count.
 
Frankly, I don't think we really care.  Note that having several sub-threads
and doing execve() in one of them will kill the rest, so you really want
to do some kind of synchronization to get something similar to reasonable
behaviour anyway.

> Another question. Why do we check sighand->count? We always unshare
> ->sighand on exec, see de_thread().

Correct.  That check can and should go.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Q: check_unsafe_exec() races (Was: [PATCH 2/4] fix setuid sometimes doesn't)
  2009-03-29  4:10     ` Al Viro
@ 2009-03-29  4:14       ` Al Viro
  2009-03-29  4:52       ` Oleg Nesterov
  1 sibling, 0 replies; 50+ messages in thread
From: Al Viro @ 2009-03-29  4:14 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Hugh Dickins, Linus Torvalds, Andrew Morton, Joe Malicki,
	Michael Itz, Kenneth Baker, Chris Wright, David Howells,
	Alexey Dobriyan, Greg Kroah-Hartman, linux-fsdevel, linux-kernel

On Sun, Mar 29, 2009 at 05:10:22AM +0100, Al Viro wrote:

> Frankly, I don't think we really care.  Note that having several sub-threads
> and doing execve() in one of them will kill the rest, so you really want
> to do some kind of synchronization

in userland, that is

> to get something similar to reasonable
> behaviour anyway.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Q: check_unsafe_exec() races (Was: [PATCH 2/4] fix setuid sometimes doesn't)
  2009-03-29  4:10     ` Al Viro
  2009-03-29  4:14       ` Al Viro
@ 2009-03-29  4:52       ` Oleg Nesterov
  2009-03-29  5:55         ` Al Viro
  1 sibling, 1 reply; 50+ messages in thread
From: Oleg Nesterov @ 2009-03-29  4:52 UTC (permalink / raw)
  To: Al Viro
  Cc: Hugh Dickins, Linus Torvalds, Andrew Morton, Joe Malicki,
	Michael Itz, Kenneth Baker, Chris Wright, David Howells,
	Alexey Dobriyan, Greg Kroah-Hartman, linux-fsdevel, linux-kernel

On 03/29, Al Viro wrote:
>
> On Sun, Mar 29, 2009 at 01:53:43AM +0100, Oleg Nesterov wrote:
>
> > Let's suppose we have two threads T1 and T2. T1 exits, and calls
> > exit_fs().
> >
> > 	exit_fs:
> >
> > 		tsk->fs = NULL;
> > 		// WINDOW
> > 		put_fs_struct(fs);
> >
> > Now, if T2 does exec() and check_unsafe_exec() happens in the WINDOW
> > above, we set LSM_UNSAFE_SHARE.
> >
> > Or we can race with sub-thread doing clone(CLONE_FS|CLONE_THREAD),
> > the new thread is not visible in ->thread_group, buy copy_fs()
> > can already increment fs->count.
>
> Frankly, I don't think we really care.  Note that having several sub-threads
> and doing execve() in one of them will kill the rest, so you really want
> to do some kind of synchronization to get something similar to reasonable
> behaviour anyway.

OK.

Let's suppose that check_unsafe_exec() does not set LSM_UNSAFE_SHARE and
drops ->siglock. After that, another sub-thread does clone(CLONE_FS) without
CLONE_THREAD.

Unless we killed other threads, I can't see how we can check ->fs is not
shared with another process, we can fool ->bprm_set_creds() anyway.

Confused.

Oleg.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Q: check_unsafe_exec() races (Was: [PATCH 2/4] fix setuid sometimes doesn't)
  2009-03-29  4:52       ` Oleg Nesterov
@ 2009-03-29  5:55         ` Al Viro
  2009-03-29  6:01           ` Al Viro
  0 siblings, 1 reply; 50+ messages in thread
From: Al Viro @ 2009-03-29  5:55 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Hugh Dickins, Linus Torvalds, Andrew Morton, Joe Malicki,
	Michael Itz, Kenneth Baker, Chris Wright, David Howells,
	Alexey Dobriyan, Greg Kroah-Hartman, linux-fsdevel, linux-kernel

On Sun, Mar 29, 2009 at 06:52:06AM +0200, Oleg Nesterov wrote:

> Let's suppose that check_unsafe_exec() does not set LSM_UNSAFE_SHARE and
> drops ->siglock. After that, another sub-thread does clone(CLONE_FS) without
> CLONE_THREAD.

Lovely.  And yes, AFAICS that's a hole.

> Unless we killed other threads, I can't see how we can check ->fs is not
> shared with another process, we can fool ->bprm_set_creds() anyway.

We can't do that, until we are past the point of no return.  Charming...
In principle, we can mark these threads as "-EAGAIN on such clone()" and
clean that on exec failure.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Q: check_unsafe_exec() races (Was: [PATCH 2/4] fix setuid sometimes doesn't)
  2009-03-29  5:55         ` Al Viro
@ 2009-03-29  6:01           ` Al Viro
  2009-03-29 21:36             ` Oleg Nesterov
  0 siblings, 1 reply; 50+ messages in thread
From: Al Viro @ 2009-03-29  6:01 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Hugh Dickins, Linus Torvalds, Andrew Morton, Joe Malicki,
	Michael Itz, Kenneth Baker, Chris Wright, David Howells,
	Alexey Dobriyan, Greg Kroah-Hartman, linux-fsdevel, linux-kernel

On Sun, Mar 29, 2009 at 06:55:13AM +0100, Al Viro wrote:
> On Sun, Mar 29, 2009 at 06:52:06AM +0200, Oleg Nesterov wrote:
> 
> > Let's suppose that check_unsafe_exec() does not set LSM_UNSAFE_SHARE and
> > drops ->siglock. After that, another sub-thread does clone(CLONE_FS) without
> > CLONE_THREAD.
> 
> Lovely.  And yes, AFAICS that's a hole.
> 
> > Unless we killed other threads, I can't see how we can check ->fs is not
> > shared with another process, we can fool ->bprm_set_creds() anyway.
> 
> We can't do that, until we are past the point of no return.  Charming...
> In principle, we can mark these threads as "-EAGAIN on such clone()" and
> clean that on exec failure.

... or just do that to fs_struct.  After finding that there's no outside
users.  Commenst?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 3/4] fix setuid sometimes wouldn't
  2009-03-28 23:21 ` [PATCH 3/4] fix setuid sometimes wouldn't Hugh Dickins
@ 2009-03-29 11:19   ` Alexey Dobriyan
  2009-03-29 21:48     ` Oleg Nesterov
  0 siblings, 1 reply; 50+ messages in thread
From: Alexey Dobriyan @ 2009-03-29 11:19 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Al Viro, Linus Torvalds, Andrew Morton, Joe Malicki, Michael Itz,
	Kenneth Baker, Chris Wright, David Howells, Oleg Nesterov,
	Greg Kroah-Hartman, linux-fsdevel, linux-kernel

On Sat, Mar 28, 2009 at 11:21:27PM +0000, Hugh Dickins wrote:
> check_unsafe_exec() also notes whether the fs_struct is being
> shared by more threads than will get killed by the exec, and if so
> sets LSM_UNSAFE_SHARE to make bprm_set_creds() careful about euid.
> But /proc/<pid>/cwd and /proc/<pid>/root lookups make transient
> use of get_fs_struct(), which also raises that sharing count.
> 
> This might occasionally cause a setuid program not to change euid,
> in the same way as happened with files->count (check_unsafe_exec
> also looks at sighand->count, but /proc doesn't raise that one).
> 
> We'd prefer exec not to unshare fs_struct: so fix this in procfs,
> replacing get_fs_struct() by get_fs_path(), which does path_get
> while still holding task_lock, instead of raising fs->count.

> --- 2.6.29/fs/proc/base.c
> +++ linux/fs/proc/base.c
> @@ -146,15 +146,22 @@ static unsigned int pid_entry_count_dirs
>  	return count;
>  }
>  
> -static struct fs_struct *get_fs_struct(struct task_struct *task)
> +static int get_fs_path(struct task_struct *task, struct path *path, bool root)
>  {
>  	struct fs_struct *fs;
> +	int result = -ENOENT;
> +
>  	task_lock(task);
>  	fs = task->fs;
> -	if(fs)
> -		atomic_inc(&fs->count);
> +	if (fs) {
> +		read_lock(&fs->lock);
> +		*path = root ? fs->root : fs->pwd;
> +		path_get(path);
> +		read_unlock(&fs->lock);
> +		result = 0;
> +	}
>  	task_unlock(task);
> -	return fs;
> +	return result;
>  }

I think it's better to open-code. "root" parameter is unnatural.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Q: check_unsafe_exec() races (Was: [PATCH 2/4] fix setuid sometimes doesn't)
  2009-03-29  6:01           ` Al Viro
@ 2009-03-29 21:36             ` Oleg Nesterov
  2009-03-29 22:20               ` Al Viro
  0 siblings, 1 reply; 50+ messages in thread
From: Oleg Nesterov @ 2009-03-29 21:36 UTC (permalink / raw)
  To: Al Viro
  Cc: Hugh Dickins, Linus Torvalds, Andrew Morton, Joe Malicki,
	Michael Itz, Kenneth Baker, Chris Wright, David Howells,
	Alexey Dobriyan, Greg Kroah-Hartman, linux-fsdevel, linux-kernel

On 03/29, Al Viro wrote:
>
> > In principle, we can mark these threads as "-EAGAIN on such clone()" and
> > clean that on exec failure.

We can't. We can miss the new subthread if we race with clone(CLONE_THREAD).
Unless we add the additional locking, of course.

We can set current->signal->flags |= SIGNAL_DO_NOT_CLONE_FS. But this is
really nasty. For examlpe, what if this flag is already set when
check_unsafe_exec() takes ->siglock ? We should return -ESOMETHING, not
good. Or schedule_timeout_uninterruptible(1) until it is cleared?

This also means copy_process()->copy_fs() path should take ->siglock too,
otherwise we we don't have a barrier.

> ... or just do that to fs_struct.  After finding that there's no outside
> users.  Commenst?

This is even worse. Not only we race with our sub-threads, we race
with CLONE_FS processes.

We can't mark fs_struct after finding that there's no outside users
lockless. Because we can't know whether this is "after" or not, we
can't trust "atomic_read(fs->count) <= n_fs".

Unless we re-use fs_struct->lock. In this case copy_fs() should take
it too. But again, ->fs can be already marked when we enter
check_unsafe_exec().


And btw check_unsafe_exec() seem to have another hole. Another thread
(which shares ->fs with us) can do exit_fs() right before we read
fs->count. Since this thread was already accounted in n_fs, we can
miss the fact we share ->fs with another process.


Perhaps I missed something...


Not that I like this idea (actually I hate), but perhaps we can change
the meaning of LSM_UNSAFE_SHARE,

	selinux_bprm_set_creds:

		if (new_tsec->sid != old_tsec->sid) {
			...

			if (avc_has_perm(...))
				bprm->unsafe |= LSM_UNSAFE_SHARE;
		}


Then we modify de_thread(). It sends SIGKILL to all subthreads, this
means that another thread can't clone() after we drop ->siglock. So we
can add this code to the ->siglock protected section

	if (unlikely(bprm->unsafe & LSM_UNSAFE_SHARE)) {
		if (fs_struct_is_shared())
			return -EPERM;
	}

	...
	zap_other_threads();

Oh, ugly.

I'd better hope I missed something ;)

Oleg.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 3/4] fix setuid sometimes wouldn't
  2009-03-29 11:19   ` Alexey Dobriyan
@ 2009-03-29 21:48     ` Oleg Nesterov
  2009-03-29 22:37       ` Al Viro
  0 siblings, 1 reply; 50+ messages in thread
From: Oleg Nesterov @ 2009-03-29 21:48 UTC (permalink / raw)
  To: Alexey Dobriyan
  Cc: Hugh Dickins, Al Viro, Linus Torvalds, Andrew Morton,
	Joe Malicki, Michael Itz, Kenneth Baker, Chris Wright,
	David Howells, Greg Kroah-Hartman, linux-fsdevel, linux-kernel

On 03/29, Alexey Dobriyan wrote:
>
> On Sat, Mar 28, 2009 at 11:21:27PM +0000, Hugh Dickins wrote:
> >
> > -static struct fs_struct *get_fs_struct(struct task_struct *task)
> > +static int get_fs_path(struct task_struct *task, struct path *path, bool root)
> >  {
> >  	struct fs_struct *fs;
> > +	int result = -ENOENT;
> > +
> >  	task_lock(task);
> >  	fs = task->fs;
> > -	if(fs)
> > -		atomic_inc(&fs->count);
> > +	if (fs) {
> > +		read_lock(&fs->lock);
> > +		*path = root ? fs->root : fs->pwd;
> > +		path_get(path);
> > +		read_unlock(&fs->lock);
> > +		result = 0;
> > +	}
> >  	task_unlock(task);
> > -	return fs;
> > +	return result;
> >  }
>
> I think it's better to open-code. "root" parameter is unnatural.

IMHO, open-coding doesn't improve readability. I am not even talking
about code size (it doesn't matter of course, the kernel is so tiny ;).

Perhaps "enum what" is more natural compared to "bool root", but this
helper is simple enough.

Personally, I think this patch makes sense even if it didn't fix the
bug, it cleanups the code and makes it more readable.

Oleg.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Q: check_unsafe_exec() races (Was: [PATCH 2/4] fix setuid sometimes doesn't)
  2009-03-29 21:36             ` Oleg Nesterov
@ 2009-03-29 22:20               ` Al Viro
  2009-03-29 23:56                 ` Oleg Nesterov
  0 siblings, 1 reply; 50+ messages in thread
From: Al Viro @ 2009-03-29 22:20 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Hugh Dickins, Linus Torvalds, Andrew Morton, Joe Malicki,
	Michael Itz, Kenneth Baker, Chris Wright, David Howells,
	Alexey Dobriyan, Greg Kroah-Hartman, linux-fsdevel, linux-kernel

On Sun, Mar 29, 2009 at 11:36:35PM +0200, Oleg Nesterov wrote:
> > ... or just do that to fs_struct.  After finding that there's no outside
> > users.  Commenst?
> 
> This is even worse. Not only we race with our sub-threads, we race
> with CLONE_FS processes.
> 
> We can't mark fs_struct after finding that there's no outside users
> lockless. Because we can't know whether this is "after" or not, we
> can't trust "atomic_read(fs->count) <= n_fs".

We can lock fs_struct in question, go through the threads, then mark
or bail out.  With cloning a reference to fs_struct protected by the
same lock.

FWIW, I'm not at all sure that we want atomic_t for refcount in that
case...

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 3/4] fix setuid sometimes wouldn't
  2009-03-29 21:48     ` Oleg Nesterov
@ 2009-03-29 22:37       ` Al Viro
  0 siblings, 0 replies; 50+ messages in thread
From: Al Viro @ 2009-03-29 22:37 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Alexey Dobriyan, Hugh Dickins, Linus Torvalds, Andrew Morton,
	Joe Malicki, Michael Itz, Kenneth Baker, Chris Wright,
	David Howells, Greg Kroah-Hartman, linux-fsdevel, linux-kernel

On Sun, Mar 29, 2009 at 11:48:55PM +0200, Oleg Nesterov wrote:
> On 03/29, Alexey Dobriyan wrote:
> >
> > On Sat, Mar 28, 2009 at 11:21:27PM +0000, Hugh Dickins wrote:
> > >
> > > -static struct fs_struct *get_fs_struct(struct task_struct *task)
> > > +static int get_fs_path(struct task_struct *task, struct path *path, bool root)
> > >  {
> > >  	struct fs_struct *fs;
> > > +	int result = -ENOENT;
> > > +
> > >  	task_lock(task);
> > >  	fs = task->fs;
> > > -	if(fs)
> > > -		atomic_inc(&fs->count);
> > > +	if (fs) {
> > > +		read_lock(&fs->lock);
> > > +		*path = root ? fs->root : fs->pwd;
> > > +		path_get(path);
> > > +		read_unlock(&fs->lock);
> > > +		result = 0;
> > > +	}
> > >  	task_unlock(task);
> > > -	return fs;
> > > +	return result;
> > >  }
> >
> > I think it's better to open-code. "root" parameter is unnatural.
> 
> IMHO, open-coding doesn't improve readability. I am not even talking
> about code size (it doesn't matter of course, the kernel is so tiny ;).
> 
> Perhaps "enum what" is more natural compared to "bool root", but this
> helper is simple enough.
> 
> Personally, I think this patch makes sense even if it didn't fix the
> bug, it cleanups the code and makes it more readable.

Aye.  Applied, with enum, but an anonymous one; no point breeding
tags without a reason.

Another note: chroot_fs_refs() shouldn't bump refcount at all; it should
do replacements, count them and then do path_release(old_root) that many
times - after having gone through all tasks.

I'm taking fs_struct handling into a new file (fs/fs_struct.c, unless
somebody has a better name), will do that in the same patch.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Q: check_unsafe_exec() races (Was: [PATCH 2/4] fix setuid sometimes doesn't)
  2009-03-29 22:20               ` Al Viro
@ 2009-03-29 23:56                 ` Oleg Nesterov
  2009-03-30  0:03                   ` Oleg Nesterov
  0 siblings, 1 reply; 50+ messages in thread
From: Oleg Nesterov @ 2009-03-29 23:56 UTC (permalink / raw)
  To: Al Viro
  Cc: Hugh Dickins, Linus Torvalds, Andrew Morton, Joe Malicki,
	Michael Itz, Kenneth Baker, Chris Wright, David Howells,
	Alexey Dobriyan, Greg Kroah-Hartman, linux-fsdevel, linux-kernel

On 03/29, Al Viro wrote:
>
> On Sun, Mar 29, 2009 at 11:36:35PM +0200, Oleg Nesterov wrote:
> > > ... or just do that to fs_struct.  After finding that there's no outside
> > > users.  Commenst?
> >
> > This is even worse. Not only we race with our sub-threads, we race
> > with CLONE_FS processes.
> >
> > We can't mark fs_struct after finding that there's no outside users
> > lockless. Because we can't know whether this is "after" or not, we
> > can't trust "atomic_read(fs->count) <= n_fs".
>
> We can lock fs_struct in question, go through the threads, then mark
> or bail out.  With cloning a reference to fs_struct protected by the
> same lock.

Yes, this is what I meant, copy_fs() needs this lock too,

> FWIW, I'm not at all sure that we want atomic_t for refcount in that
> case...

I think you are right, because exit_fs() should take fs->lock as well.

But, again. What whould we do when check_unsafe_exec() takes fs->lock
and sees that this ->fs is already marked?

In that case -EWHATEVER is not very good, it could be another process
(not sub-thread) doing exec.

Oleg.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Q: check_unsafe_exec() races (Was: [PATCH 2/4] fix setuid sometimes doesn't)
  2009-03-29 23:56                 ` Oleg Nesterov
@ 2009-03-30  0:03                   ` Oleg Nesterov
  2009-03-30  1:08                     ` Al Viro
  0 siblings, 1 reply; 50+ messages in thread
From: Oleg Nesterov @ 2009-03-30  0:03 UTC (permalink / raw)
  To: Al Viro
  Cc: Hugh Dickins, Linus Torvalds, Andrew Morton, Joe Malicki,
	Michael Itz, Kenneth Baker, Chris Wright, David Howells,
	Alexey Dobriyan, Greg Kroah-Hartman, linux-fsdevel, linux-kernel

On 03/30, Oleg Nesterov wrote:
>
> On 03/29, Al Viro wrote:
> >
> > On Sun, Mar 29, 2009 at 11:36:35PM +0200, Oleg Nesterov wrote:
> > > > ... or just do that to fs_struct.  After finding that there's no outside
> > > > users.  Commenst?
> > >
> > > This is even worse. Not only we race with our sub-threads, we race
> > > with CLONE_FS processes.
> > >
> > > We can't mark fs_struct after finding that there's no outside users
> > > lockless. Because we can't know whether this is "after" or not, we
> > > can't trust "atomic_read(fs->count) <= n_fs".
> >
> > We can lock fs_struct in question, go through the threads, then mark
> > or bail out.  With cloning a reference to fs_struct protected by the
> > same lock.
>
> Yes, this is what I meant, copy_fs() needs this lock too,
>
> > FWIW, I'm not at all sure that we want atomic_t for refcount in that
> > case...
>
> I think you are right, because exit_fs() should take fs->lock as well.
>
> But, again. What whould we do when check_unsafe_exec() takes fs->lock
> and sees that this ->fs is already marked?

Ah, I am stupid. There is no another process if this flag is set.

Oleg.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Q: check_unsafe_exec() races (Was: [PATCH 2/4] fix setuid sometimes doesn't)
  2009-03-30  0:03                   ` Oleg Nesterov
@ 2009-03-30  1:08                     ` Al Viro
  2009-03-30  1:13                       ` Al Viro
  0 siblings, 1 reply; 50+ messages in thread
From: Al Viro @ 2009-03-30  1:08 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Hugh Dickins, Linus Torvalds, Andrew Morton, Joe Malicki,
	Michael Itz, Kenneth Baker, Chris Wright, David Howells,
	Alexey Dobriyan, Greg Kroah-Hartman, linux-fsdevel, linux-kernel

On Mon, Mar 30, 2009 at 02:03:38AM +0200, Oleg Nesterov wrote:
> On 03/30, Oleg Nesterov wrote:
> >
> > On 03/29, Al Viro wrote:
> > >
> > > On Sun, Mar 29, 2009 at 11:36:35PM +0200, Oleg Nesterov wrote:
> > > > > ... or just do that to fs_struct.  After finding that there's no outside
> > > > > users.  Commenst?
> > > >
> > > > This is even worse. Not only we race with our sub-threads, we race
> > > > with CLONE_FS processes.
> > > >
> > > > We can't mark fs_struct after finding that there's no outside users
> > > > lockless. Because we can't know whether this is "after" or not, we
> > > > can't trust "atomic_read(fs->count) <= n_fs".
> > >
> > > We can lock fs_struct in question, go through the threads, then mark
> > > or bail out.  With cloning a reference to fs_struct protected by the
> > > same lock.
> >
> > Yes, this is what I meant, copy_fs() needs this lock too,
> >
> > > FWIW, I'm not at all sure that we want atomic_t for refcount in that
> > > case...
> >
> > I think you are right, because exit_fs() should take fs->lock as well.
> >
> > But, again. What whould we do when check_unsafe_exec() takes fs->lock
> > and sees that this ->fs is already marked?
> 
> Ah, I am stupid. There is no another process if this flag is set.

So...
	* one can always dereference current->fs
	* nobody can change task->fs for seen-by-scheduler task other than
current (we can, of course, do that for a task we'd just allocated in clone
before anyone else has seen it)
	* all changes of current->fs happen under task_lock *and* excl ->lock
of old value of current->fs.
	* anybody can dereference task->fs while holding task_lock(task)
	* ->lock nests inside task_lock
	* freeing happens only when the last reference is gone *and* all
tasks that used to hold such references has already gone through task_unlock
	* all refcount changes happen under excl ->lock
	* changes of ->root and ->pwd happen under excl ->lock
	* read access to ->root and ->pwd should be done under shared ->lock;
to use the contents past the unlock you need to grab references to path in
question while holding lock
	* cloning a reference is done while holding ->lock shared, fails with
-EAGAIN if fs_struct is marked "under exec"
	* mark in question is set and cleared with ->lock excl.
	* check_unsafe_exec() locks current->fs shared, goes through all
threads comparing their ->fs with our, if the number doesn't match - bail
out.  Otherwise we mark it "under exec".
	* in the end of execve() (failure or success) we clear the mark.
	* all reassignments of task->fs are either to NULL or to new instance
(never seen by anybody) or to &init_fs.
	* task with ->fs == &init_fs may not call execve(); those are kernel
threads and they must get ->fs unshared before they can do anything of that
kind (otherwise we'd have no end of trouble with chdir() done by exec'ed
binary affecting hell knows what else).

Does anybody see any holes in the above?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Q: check_unsafe_exec() races (Was: [PATCH 2/4] fix setuid sometimes doesn't)
  2009-03-30  1:08                     ` Al Viro
@ 2009-03-30  1:13                       ` Al Viro
  2009-03-30  1:36                         ` Oleg Nesterov
  0 siblings, 1 reply; 50+ messages in thread
From: Al Viro @ 2009-03-30  1:13 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Hugh Dickins, Linus Torvalds, Andrew Morton, Joe Malicki,
	Michael Itz, Kenneth Baker, Chris Wright, David Howells,
	Alexey Dobriyan, Greg Kroah-Hartman, linux-fsdevel, linux-kernel

On Mon, Mar 30, 2009 at 02:08:43AM +0100, Al Viro wrote:

> So...
> 	* one can always dereference current->fs
> 	* nobody can change task->fs for seen-by-scheduler task other than
> current (we can, of course, do that for a task we'd just allocated in clone
> before anyone else has seen it)
> 	* all changes of current->fs happen under task_lock *and* excl ->lock
> of old value of current->fs.
> 	* anybody can dereference task->fs while holding task_lock(task)
> 	* ->lock nests inside task_lock
> 	* freeing happens only when the last reference is gone *and* all
> tasks that used to hold such references has already gone through task_unlock
> 	* all refcount changes happen under excl ->lock
> 	* changes of ->root and ->pwd happen under excl ->lock
> 	* read access to ->root and ->pwd should be done under shared ->lock;
> to use the contents past the unlock you need to grab references to path in
> question while holding lock
> 	* cloning a reference is done while holding ->lock shared, fails with
	                                                   ^^^^^^
	                                                   excl, of course

> -EAGAIN if fs_struct is marked "under exec"
> 	* mark in question is set and cleared with ->lock excl.
> 	* check_unsafe_exec() locks current->fs shared, goes through all
> threads comparing their ->fs with our, if the number doesn't match - bail
> out.  Otherwise we mark it "under exec".
> 	* in the end of execve() (failure or success) we clear the mark.
> 	* all reassignments of task->fs are either to NULL or to new instance
> (never seen by anybody) or to &init_fs.
> 	* task with ->fs == &init_fs may not call execve(); those are kernel
> threads and they must get ->fs unshared before they can do anything of that
> kind (otherwise we'd have no end of trouble with chdir() done by exec'ed
> binary affecting hell knows what else).
> 
> Does anybody see any holes in the above?
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Q: check_unsafe_exec() races (Was: [PATCH 2/4] fix setuid sometimes doesn't)
  2009-03-30  1:13                       ` Al Viro
@ 2009-03-30  1:36                         ` Oleg Nesterov
  2009-03-30  1:40                           ` Oleg Nesterov
  0 siblings, 1 reply; 50+ messages in thread
From: Oleg Nesterov @ 2009-03-30  1:36 UTC (permalink / raw)
  To: Al Viro
  Cc: Hugh Dickins, Linus Torvalds, Andrew Morton, Joe Malicki,
	Michael Itz, Kenneth Baker, Chris Wright, David Howells,
	Alexey Dobriyan, Greg Kroah-Hartman, linux-fsdevel, linux-kernel

On 03/30, Al Viro wrote:
>
> On Mon, Mar 30, 2009 at 02:08:43AM +0100, Al Viro wrote:
>
> > So...
> > 	* check_unsafe_exec() locks current->fs shared, goes through all
> > threads comparing their ->fs with our, if the number doesn't match - bail
> > out.  Otherwise we mark it "under exec".

Unless I missed something again, check_unsafe_exec() should check "under exec"
after it takes fs->lock. If set - we are racing with sub-thread, return -EAGAIN.

We can't proceed. If that another exec() fails, it will clear "under exec" at
the end of do_execve(), before we kill other threads.

Oleg.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Q: check_unsafe_exec() races (Was: [PATCH 2/4] fix setuid sometimes doesn't)
  2009-03-30  1:36                         ` Oleg Nesterov
@ 2009-03-30  1:40                           ` Oleg Nesterov
  2009-03-30 12:31                             ` Al Viro
  0 siblings, 1 reply; 50+ messages in thread
From: Oleg Nesterov @ 2009-03-30  1:40 UTC (permalink / raw)
  To: Al Viro
  Cc: Hugh Dickins, Linus Torvalds, Andrew Morton, Joe Malicki,
	Michael Itz, Kenneth Baker, Chris Wright, David Howells,
	Alexey Dobriyan, Greg Kroah-Hartman, linux-fsdevel, linux-kernel

On 03/30, Oleg Nesterov wrote:
>
> On 03/30, Al Viro wrote:
> >
> > On Mon, Mar 30, 2009 at 02:08:43AM +0100, Al Viro wrote:
> >
> > > So...
> > > 	* check_unsafe_exec() locks current->fs shared, goes through all
> > > threads comparing their ->fs with our, if the number doesn't match - bail
> > > out.  Otherwise we mark it "under exec".
>
> Unless I missed something again, check_unsafe_exec() should check "under exec"
> after it takes fs->lock. If set - we are racing with sub-thread, return -EAGAIN.
>
> We can't proceed. If that another exec() fails, it will clear "under exec" at
> the end of do_execve(), before we kill other threads.

Or we need a counter to mark/unmark.

Oleg.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Q: check_unsafe_exec() races (Was: [PATCH 2/4] fix setuid sometimes doesn't)
  2009-03-30  1:40                           ` Oleg Nesterov
@ 2009-03-30 12:31                             ` Al Viro
  2009-03-30 14:32                               ` Hugh Dickins
  2009-03-30 23:45                               ` Serge E. Hallyn
  0 siblings, 2 replies; 50+ messages in thread
From: Al Viro @ 2009-03-30 12:31 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Hugh Dickins, Linus Torvalds, Andrew Morton, Joe Malicki,
	Michael Itz, Kenneth Baker, Chris Wright, David Howells,
	Alexey Dobriyan, Greg Kroah-Hartman, linux-fsdevel, linux-kernel

On Mon, Mar 30, 2009 at 03:40:40AM +0200, Oleg Nesterov wrote:

> > We can't proceed. If that another exec() fails, it will clear "under exec" at
> > the end of do_execve(), before we kill other threads.
> 
> Or we need a counter to mark/unmark.

Nah, easier to have check_unsafe_exec() return -EAGAIN in cases we care
about.

Anyway, completely untested patchset is in
git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6.git/ execve-mess
(the last 9 changesets of it).

WARNING: that's *NOT* for merge at the moment; this is not a pull request.

Review (and testing) would be welcome.

Shortlog of execve-related part:
Al Viro (6):
      Take fs_struct handling to new file (fs/fs_struct.c), sanitize chroot_fs_refs()
      New helper - current_umask()
      Get rid of indirect include of fs_struct.h
      Kill unsharing fs_struct in __set_personality()
      New locking/refcounting for fs_struct
      check_unsafe_exec() doesn't care about signal handlers sharing

Hugh Dickins (3):
      Don't bump fs_struct refcount for procfs accesses
      compat_do_execve should unshare_files
      fix setuid sometimes doesn't - files_struct

Diffstat (again, of execve-related stuff)
 arch/cris/kernel/process.c                |    1 -
 arch/powerpc/platforms/cell/spufs/inode.c |    2 +-
 fs/Makefile                               |    2 +-
 fs/btrfs/acl.c                            |    2 +-
 fs/btrfs/ioctl.c                          |    2 +-
 fs/cifs/dir.c                             |    4 +-
 fs/cifs/inode.c                           |    4 +-
 fs/compat.c                               |   28 ++++-
 fs/dcache.c                               |    1 +
 fs/exec.c                                 |   39 +++++--
 fs/ext2/acl.c                             |    2 +-
 fs/ext3/acl.c                             |    2 +-
 fs/ext4/acl.c                             |    2 +-
 fs/fat/inode.c                            |    2 +-
 fs/fs_struct.c                            |  173 +++++++++++++++++++++++++++++
 fs/generic_acl.c                          |    2 +-
 fs/gfs2/acl.c                             |    2 +-
 fs/hfsplus/options.c                      |    2 +-
 fs/hpfs/super.c                           |    2 +-
 fs/internal.h                             |    8 +-
 fs/jffs2/acl.c                            |    2 +-
 fs/jfs/acl.c                              |    2 +-
 fs/namei.c                                |   14 +--
 fs/namespace.c                            |   61 +----------
 fs/nfs/nfs3proc.c                         |    6 +-
 fs/nfs/nfs4proc.c                         |    2 +-
 fs/nfsd/nfssvc.c                          |    7 +-
 fs/ocfs2/acl.c                            |    2 +-
 fs/omfs/inode.c                           |    2 +-
 fs/open.c                                 |    1 +
 fs/proc/base.c                            |   53 +++------
 fs/proc/task_nommu.c                      |    3 +-
 fs/reiserfs/xattr_acl.c                   |    2 +-
 fs/xfs/linux-2.6/xfs_iops.c               |    4 +-
 include/linux/fs.h                        |    2 +
 include/linux/fs_struct.h                 |    7 +-
 include/linux/mnt_namespace.h             |    2 +
 include/linux/nsproxy.h                   |    1 +
 include/linux/sched.h                     |    3 +-
 init/do_mounts.c                          |    1 +
 ipc/mqueue.c                              |    2 +-
 kernel/auditsc.c                          |    1 +
 kernel/exec_domain.c                      |   22 ----
 kernel/exit.c                             |   32 +-----
 kernel/fork.c                             |   63 +++++------
 kernel/sys.c                              |    1 +
 net/unix/af_unix.c                        |    2 +-
 security/tomoyo/realpath.c                |    1 +
 48 files changed, 337 insertions(+), 246 deletions(-)
 create mode 100644 fs/fs_struct.c


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Q: check_unsafe_exec() races (Was: [PATCH 2/4] fix setuid sometimes doesn't)
  2009-03-30 12:31                             ` Al Viro
@ 2009-03-30 14:32                               ` Hugh Dickins
  2009-03-31  6:16                                 ` Al Viro
  2009-03-30 23:45                               ` Serge E. Hallyn
  1 sibling, 1 reply; 50+ messages in thread
From: Hugh Dickins @ 2009-03-30 14:32 UTC (permalink / raw)
  To: Al Viro
  Cc: Oleg Nesterov, Linus Torvalds, Andrew Morton, Joe Malicki,
	Michael Itz, Kenneth Baker, Chris Wright, David Howells,
	Alexey Dobriyan, Greg Kroah-Hartman, linux-fsdevel, linux-kernel

On Mon, 30 Mar 2009, Al Viro wrote:
> On Mon, Mar 30, 2009 at 03:40:40AM +0200, Oleg Nesterov wrote:

Well done, Oleg, for finding this: I wish I'd Cc'ed you earlier.
Shortly before posting my patches, I got very worried by n_fs in
check_unsafe_exec; then fooled myself into thinking it was okay.

> 
> > > We can't proceed. If that another exec() fails, it will clear "under exec" at
> > > the end of do_execve(), before we kill other threads.
> > 
> > Or we need a counter to mark/unmark.
> 
> Nah, easier to have check_unsafe_exec() return -EAGAIN in cases we care
> about.
> 
> Anyway, completely untested patchset is in
> git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6.git/ execve-mess
> (the last 9 changesets of it).
> 
> WARNING: that's *NOT* for merge at the moment; this is not a pull request.
> 
> Review (and testing) would be welcome.
> 
> Shortlog of execve-related part:
> Al Viro (6):
>       Take fs_struct handling to new file (fs/fs_struct.c), sanitize chroot_fs_refs()
>       New helper - current_umask()
>       Get rid of indirect include of fs_struct.h
>       Kill unsharing fs_struct in __set_personality()
>       New locking/refcounting for fs_struct
>       check_unsafe_exec() doesn't care about signal handlers sharing
> 
> Hugh Dickins (3):
>       Don't bump fs_struct refcount for procfs accesses
>       compat_do_execve should unshare_files
>       fix setuid sometimes doesn't - files_struct

I'm looking now.  Your description of what you intended sounded good.

First impression is that there's lots of good cleanup in there, but
it's all too mixed up and misordered at present - though we have friends
who insist upon doing the cleanup first (and I usually like to work that
way too), it's going to be tiresome when backporting to 2.6.29.stable
(luckily not needed earlier, unless you've uncovered more on the way).

Note that Linus put the four patches I posted straight into his tree,
so I think you'll want to be working on top of those, whereas you've
currently got them mixed in and modified in your tree.

So, setting aside what's already in, and what's just cleanup, I think
it's just 8c6934e2603283d028ac2292fe8d2812f52f23d3 I should be looking
at, New locking/refcounting for fs_struct - it'd be good for stable if
you worked on just that one to go in on top of Linus's current tree.

One of the cleanups, the one which introduces fs/fs_struct.c,
includes a good-looking but significant change to chroot_fs_refs().
Better make that a separate patch, I couldn't quite tell if it was
something I'd seriously missed from the "setuid sometimes doesn't"
set or not (my belief: yes, I missed it, and it's a good addition;
but frankly, we don't much care what happens if setuid sometimes
gets turned off if a parallel thread about to be killed happens
to be chrooting at the time - that's a much less worrying issue
than what can happen via /proc).

I think your new chroot_fs_refs() should have a path_get(new_root)
just before each path_put(old_root)?

Hugh

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Q: check_unsafe_exec() races (Was: [PATCH 2/4] fix setuid sometimes doesn't)
  2009-03-30 12:31                             ` Al Viro
  2009-03-30 14:32                               ` Hugh Dickins
@ 2009-03-30 23:45                               ` Serge E. Hallyn
  2009-03-31  6:19                                 ` Al Viro
  1 sibling, 1 reply; 50+ messages in thread
From: Serge E. Hallyn @ 2009-03-30 23:45 UTC (permalink / raw)
  To: Al Viro
  Cc: Oleg Nesterov, Hugh Dickins, Linus Torvalds, Andrew Morton,
	Joe Malicki, Michael Itz, Kenneth Baker, Chris Wright,
	David Howells, Alexey Dobriyan, Greg Kroah-Hartman,
	linux-fsdevel, linux-kernel

Quoting Al Viro (viro@ZenIV.linux.org.uk):
> On Mon, Mar 30, 2009 at 03:40:40AM +0200, Oleg Nesterov wrote:
> 
> > > We can't proceed. If that another exec() fails, it will clear "under exec" at
> > > the end of do_execve(), before we kill other threads.
> > 
> > Or we need a counter to mark/unmark.
> 
> Nah, easier to have check_unsafe_exec() return -EAGAIN in cases we care
> about.
> 
> Anyway, completely untested patchset is in
> git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6.git/ execve-mess
> (the last 9 changesets of it).
> 
> WARNING: that's *NOT* for merge at the moment; this is not a pull request.
> 
> Review (and testing) would be welcome.

(note exactly *meaningful* review, but)

exit_fs() and daemonize_fs_struct() do:

                if (--fs->users)
                        fs = NULL;
                write_unlock(&fs->lock);

Moving the write_unlock up actually let's the kernel boot and
start running ltp.

-serge

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Q: check_unsafe_exec() races (Was: [PATCH 2/4] fix setuid sometimes doesn't)
  2009-03-30 14:32                               ` Hugh Dickins
@ 2009-03-31  6:16                                 ` Al Viro
  2009-04-01  0:28                                   ` Hugh Dickins
  0 siblings, 1 reply; 50+ messages in thread
From: Al Viro @ 2009-03-31  6:16 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Oleg Nesterov, Linus Torvalds, Andrew Morton, Joe Malicki,
	Michael Itz, Kenneth Baker, Chris Wright, David Howells,
	Alexey Dobriyan, Greg Kroah-Hartman, linux-fsdevel, linux-kernel

On Mon, Mar 30, 2009 at 03:32:35PM +0100, Hugh Dickins wrote:
> > Anyway, completely untested patchset is in
> > git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6.git/ execve-mess
> > (the last 9 changesets of it).
> > 
> > WARNING: that's *NOT* for merge at the moment; this is not a pull request.

> I'm looking now.  Your description of what you intended sounded good.
> 
> First impression is that there's lots of good cleanup in there, but
> it's all too mixed up and misordered at present - though we have friends
> who insist upon doing the cleanup first (and I usually like to work that
> way too), it's going to be tiresome when backporting to 2.6.29.stable
> (luckily not needed earlier, unless you've uncovered more on the way).

Yes, and worse than that; this was basically a code dump - the state at
the moment when I'd been going down.

> I think your new chroot_fs_refs() should have a path_get(new_root)
> just before each path_put(old_root)?

Not just before; at the time it replaces ->root or ->pwd.  And yes,
it's one of the brainos there.

Anyway, I've got that stuff to something reasonably-looking.  See the
same branch (rebased), just 5 changesets now.  Have fun.  Cleanups are
not part of that set.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Q: check_unsafe_exec() races (Was: [PATCH 2/4] fix setuid sometimes doesn't)
  2009-03-30 23:45                               ` Serge E. Hallyn
@ 2009-03-31  6:19                                 ` Al Viro
  0 siblings, 0 replies; 50+ messages in thread
From: Al Viro @ 2009-03-31  6:19 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Oleg Nesterov, Hugh Dickins, Linus Torvalds, Andrew Morton,
	Joe Malicki, Michael Itz, Kenneth Baker, Chris Wright,
	David Howells, Alexey Dobriyan, Greg Kroah-Hartman,
	linux-fsdevel, linux-kernel

On Mon, Mar 30, 2009 at 06:45:39PM -0500, Serge E. Hallyn wrote:

> (note exactly *meaningful* review, but)
> 
> exit_fs() and daemonize_fs_struct() do:
> 
>                 if (--fs->users)
>                         fs = NULL;
>                 write_unlock(&fs->lock);
> 
> Moving the write_unlock up actually let's the kernel boot and
> start running ltp.

Correct fix is
		kill = !--fs->users;
		write_unlock(&fs->lock);
		...
		if (kill)
			free_fs_struct(fs);
and similar in other places with the same idiocy (one of which forgets to
unlock, on top of everything else).

Anyway, hopefully much saner (== looked through after getting some sleep,
as opposed to "what I've got in that branch at ~26 hours of uptime")
variant is in the same repository, same branch.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Q: check_unsafe_exec() races (Was: [PATCH 2/4] fix setuid sometimes doesn't)
  2009-03-31  6:16                                 ` Al Viro
@ 2009-04-01  0:28                                   ` Hugh Dickins
  2009-04-01  2:38                                     ` Al Viro
  0 siblings, 1 reply; 50+ messages in thread
From: Hugh Dickins @ 2009-04-01  0:28 UTC (permalink / raw)
  To: Al Viro
  Cc: Oleg Nesterov, Linus Torvalds, Andrew Morton, Joe Malicki,
	Michael Itz, Kenneth Baker, Chris Wright, David Howells,
	Alexey Dobriyan, Greg Kroah-Hartman, linux-fsdevel, linux-kernel

On Tue, 31 Mar 2009, Al Viro wrote:
> 
> Anyway, I've got that stuff to something reasonably-looking.  See the
> same branch (rebased), just 5 changesets now.  Have fun.  Cleanups are
> not part of that set.

Just a couple of things on that:

Minor bisectability issue: the third patch, which introduces
int unshare_fs_struct(void), needs to return 0 when it succeeds:
that gets corrected in the fourth patch.

Lockdep objects to how check_unsafe_exec nests write_lock(&p->fs_lock)
inside lock_task_sighand(p, &flags).  It's right: we sometimes take
sighand->siglock in interrupt, so if such an interrupt occurred just
after you take fs_lock elsewhere, that could deadlock with this.  It
seems happy with taking fs_lock just outside the lock_task_sighand.

Otherwise it looks good to me, except I keep worrying about those
EAGAINs.  The more so once I noticed current->cred_exec_mutex is
already being used to handle a similar issue with ptrace.  What
do you think of this rather smaller patch?  which I'd much rather
send after having slept on it, since it may be embarrassingly and
obviously wrong, but tomorrow may be too late ...

 fs/compat.c               |    6 +++---
 fs/exec.c                 |    6 +++---
 fs/namei.c                |    1 +
 include/linux/fs_struct.h |    1 +
 include/linux/init_task.h |    2 --
 include/linux/sched.h     |    1 -
 kernel/cred.c             |    4 +---
 kernel/fork.c             |   13 +++++++++++--
 kernel/ptrace.c           |    4 ++--
 9 files changed, 22 insertions(+), 16 deletions(-)

--- 2.6.29-git8/fs/compat.c	2009-03-31 16:04:22.000000000 +0100
+++ linux/fs/compat.c	2009-04-01 00:52:26.000000000 +0100
@@ -1432,7 +1432,7 @@ int compat_do_execve(char * filename,
 	if (!bprm)
 		goto out_files;
 
-	retval = mutex_lock_interruptible(&current->cred_exec_mutex);
+	retval = mutex_lock_interruptible(&current->fs->cred_exec_mutex);
 	if (retval < 0)
 		goto out_free;
 	current->in_execve = 1;
@@ -1489,7 +1489,7 @@ int compat_do_execve(char * filename,
 
 	/* execve succeeded */
 	current->in_execve = 0;
-	mutex_unlock(&current->cred_exec_mutex);
+	mutex_unlock(&current->fs->cred_exec_mutex);
 	acct_update_integrals(current);
 	free_bprm(bprm);
 	if (displaced)
@@ -1508,7 +1508,7 @@ out_file:
 
 out_unlock:
 	current->in_execve = 0;
-	mutex_unlock(&current->cred_exec_mutex);
+	mutex_unlock(&current->fs->cred_exec_mutex);
 
 out_free:
 	free_bprm(bprm);
--- 2.6.29-git8/fs/exec.c	2009-03-31 16:04:22.000000000 +0100
+++ linux/fs/exec.c	2009-04-01 00:52:26.000000000 +0100
@@ -1287,7 +1287,7 @@ int do_execve(char * filename,
 	if (!bprm)
 		goto out_files;
 
-	retval = mutex_lock_interruptible(&current->cred_exec_mutex);
+	retval = mutex_lock_interruptible(&current->fs->cred_exec_mutex);
 	if (retval < 0)
 		goto out_free;
 	current->in_execve = 1;
@@ -1345,7 +1345,7 @@ int do_execve(char * filename,
 
 	/* execve succeeded */
 	current->in_execve = 0;
-	mutex_unlock(&current->cred_exec_mutex);
+	mutex_unlock(&current->fs->cred_exec_mutex);
 	acct_update_integrals(current);
 	free_bprm(bprm);
 	if (displaced)
@@ -1364,7 +1364,7 @@ out_file:
 
 out_unlock:
 	current->in_execve = 0;
-	mutex_unlock(&current->cred_exec_mutex);
+	mutex_unlock(&current->fs->cred_exec_mutex);
 
 out_free:
 	free_bprm(bprm);
--- 2.6.29-git8/fs/namei.c	2009-03-31 16:04:23.000000000 +0100
+++ linux/fs/namei.c	2009-04-01 00:52:26.000000000 +0100
@@ -2903,4 +2903,5 @@ struct fs_struct init_fs = {
 	.count		= ATOMIC_INIT(1),
 	.lock		= __RW_LOCK_UNLOCKED(init_fs.lock),
 	.umask		= 0022,
+	.cred_exec_mutex = __MUTEX_INITIALIZER(init_fs.cred_exec_mutex),
 };
--- 2.6.29-git8/include/linux/fs_struct.h	2009-03-31 16:04:24.000000000 +0100
+++ linux/include/linux/fs_struct.h	2009-04-01 00:52:26.000000000 +0100
@@ -11,6 +11,7 @@ struct fs_struct {
 	rwlock_t lock;
 	int umask;
 	struct path root, pwd;
+	struct mutex cred_exec_mutex;	/* execve cred calculation mutex */
 };
 
 extern struct kmem_cache *fs_cachep;
--- 2.6.29-git8/include/linux/init_task.h	2009-03-31 16:04:24.000000000 +0100
+++ linux/include/linux/init_task.h	2009-04-01 00:52:26.000000000 +0100
@@ -157,8 +157,6 @@ extern struct cred init_cred;
 	.group_leader	= &tsk,						\
 	.real_cred	= &init_cred,					\
 	.cred		= &init_cred,					\
-	.cred_exec_mutex =						\
-		 __MUTEX_INITIALIZER(tsk.cred_exec_mutex),		\
 	.comm		= "swapper",					\
 	.thread		= INIT_THREAD,					\
 	.fs		= &init_fs,					\
--- 2.6.29-git8/include/linux/sched.h	2009-03-31 16:04:25.000000000 +0100
+++ linux/include/linux/sched.h	2009-04-01 00:52:26.000000000 +0100
@@ -1250,7 +1250,6 @@ struct task_struct {
 					 * credentials (COW) */
 	const struct cred *cred;	/* effective (overridable) subjective task
 					 * credentials (COW) */
-	struct mutex cred_exec_mutex;	/* execve vs ptrace cred calculation mutex */
 
 	char comm[TASK_COMM_LEN]; /* executable name excluding path
 				     - access with [gs]et_task_comm (which lock
--- 2.6.29-git8/kernel/cred.c	2009-03-23 23:12:14.000000000 +0000
+++ linux/kernel/cred.c	2009-04-01 00:52:26.000000000 +0100
@@ -167,7 +167,7 @@ EXPORT_SYMBOL(prepare_creds);
 
 /*
  * Prepare credentials for current to perform an execve()
- * - The caller must hold current->cred_exec_mutex
+ * - The caller must hold current->fs->cred_exec_mutex
  */
 struct cred *prepare_exec_creds(void)
 {
@@ -276,8 +276,6 @@ int copy_creds(struct task_struct *p, un
 	struct cred *new;
 	int ret;
 
-	mutex_init(&p->cred_exec_mutex);
-
 	if (
 #ifdef CONFIG_KEYS
 		!p->cred->thread_keyring &&
--- 2.6.29-git8/kernel/fork.c	2009-03-31 16:04:26.000000000 +0100
+++ linux/kernel/fork.c	2009-04-01 00:52:26.000000000 +0100
@@ -695,6 +695,7 @@ static struct fs_struct *__copy_fs_struc
 		fs->pwd = old->pwd;
 		path_get(&old->pwd);
 		read_unlock(&old->lock);
+		mutex_init(&fs->cred_exec_mutex);
 	}
 	return fs;
 }
@@ -708,11 +709,19 @@ EXPORT_SYMBOL_GPL(copy_fs_struct);
 
 static int copy_fs(unsigned long clone_flags, struct task_struct *tsk)
 {
+	struct fs_struct *fs = current->fs;
+
 	if (clone_flags & CLONE_FS) {
-		atomic_inc(&current->fs->count);
+		int retval = mutex_lock_interruptible(&fs->cred_exec_mutex);
+		if (retval < 0)
+			return retval;
+		/* tsk->fs is already what we want */
+		atomic_inc(&fs->count);
+		mutex_unlock(&fs->cred_exec_mutex);
 		return 0;
 	}
-	tsk->fs = __copy_fs_struct(current->fs);
+
+	tsk->fs = __copy_fs_struct(fs);
 	if (!tsk->fs)
 		return -ENOMEM;
 	return 0;
--- 2.6.29-git8/kernel/ptrace.c	2009-03-23 23:12:14.000000000 +0000
+++ linux/kernel/ptrace.c	2009-04-01 00:52:26.000000000 +0100
@@ -186,7 +186,7 @@ int ptrace_attach(struct task_struct *ta
 	/* Protect exec's credential calculations against our interference;
 	 * SUID, SGID and LSM creds get determined differently under ptrace.
 	 */
-	retval = mutex_lock_interruptible(&current->cred_exec_mutex);
+	retval = mutex_lock_interruptible(&current->fs->cred_exec_mutex);
 	if (retval  < 0)
 		goto out;
 
@@ -230,7 +230,7 @@ repeat:
 bad:
 	write_unlock_irqrestore(&tasklist_lock, flags);
 	task_unlock(task);
-	mutex_unlock(&current->cred_exec_mutex);
+	mutex_unlock(&current->fs->cred_exec_mutex);
 out:
 	return retval;
 }

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Q: check_unsafe_exec() races (Was: [PATCH 2/4] fix setuid sometimes doesn't)
  2009-04-01  0:28                                   ` Hugh Dickins
@ 2009-04-01  2:38                                     ` Al Viro
  2009-04-01  3:03                                       ` Al Viro
                                                         ` (2 more replies)
  0 siblings, 3 replies; 50+ messages in thread
From: Al Viro @ 2009-04-01  2:38 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Oleg Nesterov, Linus Torvalds, Andrew Morton, Joe Malicki,
	Michael Itz, Kenneth Baker, Chris Wright, David Howells,
	Alexey Dobriyan, Greg Kroah-Hartman, linux-fsdevel, linux-kernel

On Wed, Apr 01, 2009 at 01:28:01AM +0100, Hugh Dickins wrote:
> Minor bisectability issue: the third patch, which introduces
> int unshare_fs_struct(void), needs to return 0 when it succeeds:
> that gets corrected in the fourth patch.

ACK.

> Lockdep objects to how check_unsafe_exec nests write_lock(&p->fs_lock)
> inside lock_task_sighand(p, &flags).  It's right: we sometimes take
> sighand->siglock in interrupt, so if such an interrupt occurred just
> after you take fs_lock elsewhere, that could deadlock with this.  It
> seems happy with taking fs_lock just outside the lock_task_sighand.

Right you are, check_unsafe_exec() reordered.  Will push in a few.
 
> Otherwise it looks good to me, except I keep worrying about those
> EAGAINs.  The more so once I noticed current->cred_exec_mutex is
> already being used to handle a similar issue with ptrace.  What
> do you think of this rather smaller patch?  which I'd much rather
> send after having slept on it, since it may be embarrassingly and
> obviously wrong, but tomorrow may be too late ...
 
Eh...  I'm not particulary happy with fork() growing heavier and heavier.
Besides, there's a subtle problem avoided by another variant - think what
happens if past the point of no return execve() will unshare fs_struct
(e.g. by explicit unshare() from dynamic linker).

Frankly, -EAGAIN in situation when we have userland race is fine.  And
we *do* have a userland race here - execve() will kill -9 those threads
in case of success, so if they'd been doing something useful, they are
about to be suddenly screwed.

So I stand by my variant.  Note that if we have *other* tasks sharing
fs_struct, your variant will block their clone() for the duration of
execve() while mine will simply leave them alone (and accept that we
have unsafe sharing).

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Q: check_unsafe_exec() races (Was: [PATCH 2/4] fix setuid sometimes doesn't)
  2009-04-01  2:38                                     ` Al Viro
@ 2009-04-01  3:03                                       ` Al Viro
  2009-04-01 11:25                                         ` Hugh Dickins
  2009-04-06 15:31                                         ` Oleg Nesterov
  2009-04-01 11:18                                       ` Q: check_unsafe_exec() races (Was: [PATCH 2/4] fix setuid sometimes doesn't) Hugh Dickins
  2009-04-06 15:51                                       ` Oleg Nesterov
  2 siblings, 2 replies; 50+ messages in thread
From: Al Viro @ 2009-04-01  3:03 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Oleg Nesterov, Linus Torvalds, Andrew Morton, Joe Malicki,
	Michael Itz, Kenneth Baker, Chris Wright, David Howells,
	Alexey Dobriyan, Greg Kroah-Hartman, linux-fsdevel, linux-kernel

On Wed, Apr 01, 2009 at 03:38:49AM +0100, Al Viro wrote:
> On Wed, Apr 01, 2009 at 01:28:01AM +0100, Hugh Dickins wrote:
> > Minor bisectability issue: the third patch, which introduces
> > int unshare_fs_struct(void), needs to return 0 when it succeeds:
> > that gets corrected in the fourth patch.
> 
> ACK.
> 
> > Lockdep objects to how check_unsafe_exec nests write_lock(&p->fs_lock)
> > inside lock_task_sighand(p, &flags).  It's right: we sometimes take
> > sighand->siglock in interrupt, so if such an interrupt occurred just
> > after you take fs_lock elsewhere, that could deadlock with this.  It
> > seems happy with taking fs_lock just outside the lock_task_sighand.
> 
> Right you are, check_unsafe_exec() reordered.  Will push in a few.

Rebased and pushed (same tree, same branch; included into for-next, along
with related cleanups).

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Q: check_unsafe_exec() races (Was: [PATCH 2/4] fix setuid sometimes doesn't)
  2009-04-01  2:38                                     ` Al Viro
  2009-04-01  3:03                                       ` Al Viro
@ 2009-04-01 11:18                                       ` Hugh Dickins
  2009-04-06 15:51                                       ` Oleg Nesterov
  2 siblings, 0 replies; 50+ messages in thread
From: Hugh Dickins @ 2009-04-01 11:18 UTC (permalink / raw)
  To: Al Viro
  Cc: Oleg Nesterov, Linus Torvalds, Andrew Morton, Joe Malicki,
	Michael Itz, Kenneth Baker, Chris Wright, David Howells,
	Alexey Dobriyan, Greg Kroah-Hartman, linux-fsdevel, linux-kernel

On Wed, 1 Apr 2009, Al Viro wrote:
> On Wed, Apr 01, 2009 at 01:28:01AM +0100, Hugh Dickins wrote:
>  
> > Otherwise it looks good to me, except I keep worrying about those
> > EAGAINs.  The more so once I noticed current->cred_exec_mutex is
> > already being used to handle a similar issue with ptrace.  What
> > do you think of this rather smaller patch?  which I'd much rather
> > send after having slept on it, since it may be embarrassingly and
> > obviously wrong, but tomorrow may be too late ...
>  
> Eh...  I'm not particulary happy with fork() growing heavier and heavier.

I don't see it as making fork() any heavier, but never mind.
The important thing is to get a fix out.

> Besides, there's a subtle problem avoided by another variant - think what
> happens if past the point of no return execve() will unshare fs_struct
> (e.g. by explicit unshare() from dynamic linker).

You're too far ahead of me there.

> 
> Frankly, -EAGAIN in situation when we have userland race is fine.  And
> we *do* have a userland race here - execve() will kill -9 those threads
> in case of success, so if they'd been doing something useful, they are
> about to be suddenly screwed.

Good point.  I found it quite odd the way the awkward case (shared
beyond the threadgroup) is allowed to go forward (with possibility
that setuid will be undone), but the easy case is -EAGAINed.  (And
I gave up on trying to find a better name for your "in_exec" flag,
which is rather more subtle than just that!)  But odd as it is,
there's good reason for doing it that way.

> 
> So I stand by my variant.

Fair enough.

> Note that if we have *other* tasks sharing
> fs_struct, your variant will block their clone() for the duration of
> execve() while mine will simply leave them alone (and accept that we
> have unsafe sharing).

Yes, intentional, consistent with the existing cred_exec_mutex technique.

Hugh

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Q: check_unsafe_exec() races (Was: [PATCH 2/4] fix setuid sometimes doesn't)
  2009-04-01  3:03                                       ` Al Viro
@ 2009-04-01 11:25                                         ` Hugh Dickins
  2009-04-06 15:31                                         ` Oleg Nesterov
  1 sibling, 0 replies; 50+ messages in thread
From: Hugh Dickins @ 2009-04-01 11:25 UTC (permalink / raw)
  To: Al Viro
  Cc: Oleg Nesterov, Linus Torvalds, Andrew Morton, Joe Malicki,
	Michael Itz, Kenneth Baker, Chris Wright, David Howells,
	Alexey Dobriyan, Greg Kroah-Hartman, linux-fsdevel, linux-kernel

On Wed, 1 Apr 2009, Al Viro wrote:
> 
> Rebased and pushed (same tree, same branch; included into for-next, along
> with related cleanups).

for-next?  But don't we need all this (or at least the first four of
your five, on top of the four Linus took from me) in -stable now?

I was wondering if we should ask Chris to do a 2.6.29.1-rc2 with
them; but perhaps you'd much rather expose them in -next for a
while, before sending Linus and 2.6.29.2.

Hugh

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Q: check_unsafe_exec() races (Was: [PATCH 2/4] fix setuid sometimes doesn't)
  2009-04-01  3:03                                       ` Al Viro
  2009-04-01 11:25                                         ` Hugh Dickins
@ 2009-04-06 15:31                                         ` Oleg Nesterov
  2009-04-19 16:30                                           ` Hugh Dickins
  1 sibling, 1 reply; 50+ messages in thread
From: Oleg Nesterov @ 2009-04-06 15:31 UTC (permalink / raw)
  To: Al Viro
  Cc: Hugh Dickins, Linus Torvalds, Andrew Morton, Joe Malicki,
	Michael Itz, Kenneth Baker, Chris Wright, David Howells,
	Alexey Dobriyan, Greg Kroah-Hartman, linux-fsdevel, linux-kernel

On 04/01, Al Viro wrote:
>
> Rebased and pushed (same tree, same branch; included into for-next, along
> with related cleanups).

Sorry for delay!

Afaics, the usage of fs->in_exec is not completely right. But firstly, a
couple of minor nits.


check_unsafe_exec() doesn't need ->siglock, we can iterate over sub-threads
under rcu_read_lock(). Note that with RCU or ->siglock we can set the "wrong"
LSM_UNSAFE_SHARE if we race with copy_process(CLONE_THREAD | CLONE_FS), but
as it was already discussed we don't care. This means it is OK to miss the
freshly cloned thread which has already passed copy_fs().


do_execve:

	/* execve succeeded */
	write_lock(&current->fs->lock);
	current->fs->in_exec = 0;
	write_unlock(&current->fs->lock);

afaics, fs->lock is not needed. If ->in_exec was set, it was set by this
thread-group and we do not share ->fs with another process. Since we are
the only thread now, we can clear ->in_exec lockless.


And now, what I think is wrong:

do_execve:

	out_unmark:
		write_lock(&current->fs->lock);
		current->fs->in_exec = 0;
		write_unlock(&current->fs->lock);

Two threads T1 and T2 and another process P, all share the same ->fs.

T1 starts do_execve(BAD_FILE). It calls check_unsafe_exec(), since ->fs is
shared, we set LSM_UNSAFE but not ->in_exec (actually, not very good name).

P exits and decrements fs->users.

T2 starts do_execve(), calls check_unsafe_exec(), now ->fs is not shared,
we set fs->in_exec.

T1 continues, open_exec(BAD_FILE) fails, we clear ->in_exec and return
to the user-space.

T1 does clone(CLONE_FS /* without CLONE_THREAD */).

T1 continues without LSM_UNSAFE_SHARE while ->fs is shared with another
process.


What do you think about the (uncompiled) patch below ? It doesn't change
compat_do_execve(), just for discussion.

But see also another message I am going to send...

Oleg.

do_execve() must not clear fs->in_exec if it was set by another thread,
and we don't need fs->lock to clear.

Also, s/lock_task_sighand/rcu_read_lock/ in check_unsafe_exec().

--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1060,7 +1060,6 @@ EXPORT_SYMBOL(install_exec_creds);
 int check_unsafe_exec(struct linux_binprm *bprm)
 {
 	struct task_struct *p = current, *t;
-	unsigned long flags;
 	unsigned n_fs;
 	int res = 0;
 
@@ -1068,11 +1067,12 @@ int check_unsafe_exec(struct linux_binpr
 
 	n_fs = 1;
 	write_lock(&p->fs->lock);
-	lock_task_sighand(p, &flags);
+	rcu_read_lock();
 	for (t = next_thread(p); t != p; t = next_thread(t)) {
 		if (t->fs == p->fs)
 			n_fs++;
 	}
+	rcu_read_unlock();
 
 	if (p->fs->users > n_fs) {
 		bprm->unsafe |= LSM_UNSAFE_SHARE;
@@ -1080,9 +1080,8 @@ int check_unsafe_exec(struct linux_binpr
 		if (p->fs->in_exec)
 			res = -EAGAIN;
 		p->fs->in_exec = 1;
+		res = 1;
 	}
-
-	unlock_task_sighand(p, &flags);
 	write_unlock(&p->fs->lock);
 
 	return res;
@@ -1284,6 +1283,7 @@ int do_execve(char * filename,
 	struct linux_binprm *bprm;
 	struct file *file;
 	struct files_struct *displaced;
+	bool clear_in_exec;
 	int retval;
 
 	retval = unshare_files(&displaced);
@@ -1306,8 +1306,9 @@ int do_execve(char * filename,
 		goto out_unlock;
 
 	retval = check_unsafe_exec(bprm);
-	if (retval)
+	if (retval < 0)
 		goto out_unlock;
+	clear_in_exec = retval;
 
 	file = open_exec(filename);
 	retval = PTR_ERR(file);
@@ -1355,9 +1356,7 @@ int do_execve(char * filename,
 		goto out;
 
 	/* execve succeeded */
-	write_lock(&current->fs->lock);
 	current->fs->in_exec = 0;
-	write_unlock(&current->fs->lock);
 	current->in_execve = 0;
 	mutex_unlock(&current->cred_exec_mutex);
 	acct_update_integrals(current);
@@ -1377,9 +1376,8 @@ out_file:
 	}
 
 out_unmark:
-	write_lock(&current->fs->lock);
-	current->fs->in_exec = 0;
-	write_unlock(&current->fs->lock);
+	if (clear_in_exec)
+		current->fs->in_exec = 0;
 
 out_unlock:
 	current->in_execve = 0;


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Q: check_unsafe_exec() races (Was: [PATCH 2/4] fix setuid sometimes doesn't)
  2009-04-01  2:38                                     ` Al Viro
  2009-04-01  3:03                                       ` Al Viro
  2009-04-01 11:18                                       ` Q: check_unsafe_exec() races (Was: [PATCH 2/4] fix setuid sometimes doesn't) Hugh Dickins
@ 2009-04-06 15:51                                       ` Oleg Nesterov
  2009-04-19 16:44                                         ` Hugh Dickins
  2 siblings, 1 reply; 50+ messages in thread
From: Oleg Nesterov @ 2009-04-06 15:51 UTC (permalink / raw)
  To: Al Viro
  Cc: Hugh Dickins, Linus Torvalds, Andrew Morton, Joe Malicki,
	Michael Itz, Kenneth Baker, Chris Wright, David Howells,
	Alexey Dobriyan, Greg Kroah-Hartman, linux-fsdevel, linux-kernel

On 04/01, Al Viro wrote:
>
> On Wed, Apr 01, 2009 at 01:28:01AM +0100, Hugh Dickins wrote:
>
> > Otherwise it looks good to me, except I keep worrying about those
> > EAGAINs.
>
> Frankly, -EAGAIN in situation when we have userland race is fine.  And
> we *do* have a userland race here - execve() will kill -9 those threads
> in case of success, so if they'd been doing something useful, they are
> about to be suddenly screwed.

Can't resist! I dislike the "in_exec && -EAGAIN" oddity too.

Yes sure, we can't break the "well written" applications. But imho this
looks strange. And a bit "assymetrical" wrt LSM_UNSAFE_SHARE, I mean
check_unsafe_exec() allows sub-threads to race or CLONE_FS but only if
LSM_UNSAFE_SHARE.

Another reason, we can have the "my test-case found something strange"
bug-reports.

So. Please feel free to nack or just ignore this message, but since I
personally dislike the current behaviour I should at least try to suggest
something else.

	- add "wait_queue_head_t in_exec_wait" to "struct linux_binprm".

	- kill fs->in_exec, add "wait_queue_head_t *in_exec_wait_ptr"
	  instead.

	- introduce the new helper,

		void fs_lock_check_exec(struct fs_struct *fs)
		{
			write_lock(&fs->lock);
			while (unlikely(fs->in_exec_wait_ptr)) {
				DECLARE_WAITQUEUE(wait, current);

				if (fatal_signal_pending(current))
					/*
					 * clone/exec can't succeed, and this
					 * thread won't return to the user-space
					 */
					break;

				__add_wait_queue(fs->in_exec_wait_ptr, &wait);
				__set_current_state(TASK_KILLABLE);
				write_unlock(&fs->lock);

				schedule();

				write_lock(&fs->lock);
				__remove_wait_queue(&wait);
			}
		}

	  Or we can return -EANYTHING when fatal_signal_pending(), this doesn't
	  matter.

	  Note that this helper can block only if we race with our sub-thread
	  in the middle of !LSM_UNSAFE_SHARE exec. Otherwise this is not slower
	  than write_lock(fs->lock) + if (fs->in_exec) we currently have.


	 - change copy_fs() to do

		if (clone_flags & CLONE_FS) {
			fs_lock_check_exec(fs);
			fs->users++;
			write_unlock(&fs->lock);
			return 0;
		}


	- change check_unsafe_exec:

		void check_unsafe_exec(struct linux_binprm *bprm)
		{
			struct task_struct *p = current, *t;
			unsigned n_fs;

			bprm->unsafe = tracehook_unsafe_exec(p);

			n_fs = 1;
			fs_lock_check_exec(&p->fs);
			if (p->fs->in_exec_wait_ptr)
				/* we are going to die */
				goto out;

			rcu_read_lock();
			for (t = next_thread(p); t != p; t = next_thread(t)) {
				if (t->fs == p->fs)
					n_fs++;
			}
			rcu_read_unlock();

			if (p->fs->users > n_fs) {
				bprm->unsafe |= LSM_UNSAFE_SHARE;
			} else {
				bprm->unsafe |= __LSM_EXEC_WAKE;
				init_waitqueue_head(&bprm->in_exec_wait);
				p->fs->in_exec_wait_ptr = &bprm->in_exec_wait;
			}
		out:
			write_unlock(&p->fs->lock);
		}



	 - and, finally, change do_execve()

			/* execve succeeded */
			current->fs->in_exec_wait_ptr = NULL;

			...

		out_unmark:
			if (bprm->unsafe & __LSM_EXEC_WAKE) {
				write_lock(&current->fs->lock);
				current->fs->in_exec_wait_ptr = NULL;
				wake_up_locked(&bprm->in_exec_wait);
				write_unlock(&current->fs->lock);
			}

Comments?

Oleg.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Q: check_unsafe_exec() races (Was: [PATCH 2/4] fix setuid sometimes doesn't)
  2009-04-06 15:31                                         ` Oleg Nesterov
@ 2009-04-19 16:30                                           ` Hugh Dickins
  2009-04-21 16:10                                             ` Oleg Nesterov
  0 siblings, 1 reply; 50+ messages in thread
From: Hugh Dickins @ 2009-04-19 16:30 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Al Viro, Linus Torvalds, Andrew Morton, Joe Malicki, Michael Itz,
	Kenneth Baker, Chris Wright, David Howells, Alexey Dobriyan,
	Greg Kroah-Hartman, linux-fsdevel, linux-kernel

On Mon, 6 Apr 2009, Oleg Nesterov wrote:
> On 04/01, Al Viro wrote:
> >
> > Rebased and pushed (same tree, same branch; included into for-next, along
> > with related cleanups).
> 
> Sorry for delay!

Please don't suppose that you can ever beat me at the slowness game!

> 
> Afaics, the usage of fs->in_exec is not completely right. But firstly, a
> couple of minor nits.
> 
> 
> check_unsafe_exec() doesn't need ->siglock, we can iterate over sub-threads
> under rcu_read_lock(). Note that with RCU or ->siglock we can set the "wrong"
> LSM_UNSAFE_SHARE if we race with copy_process(CLONE_THREAD | CLONE_FS), but
> as it was already discussed we don't care. This means it is OK to miss the
> freshly cloned thread which has already passed copy_fs().

Yes, I agree.
And preferable not to have IRQs disabled over that next_thread() loop.

> 
> 
> do_execve:
> 
> 	/* execve succeeded */
> 	write_lock(&current->fs->lock);
> 	current->fs->in_exec = 0;
> 	write_unlock(&current->fs->lock);
> 
> afaics, fs->lock is not needed. If ->in_exec was set, it was set by this
> thread-group and we do not share ->fs with another process. Since we are
> the only thread now, we can clear ->in_exec lockless.

Right, given your fix below.  I wondered for a moment if a barrier
would then be needed, but no, this is all racy (erring on the safe
side) if the userspace insists on being racy here.

> 
> 
> And now, what I think is wrong:
> 
> do_execve:
> 
> 	out_unmark:
> 		write_lock(&current->fs->lock);
> 		current->fs->in_exec = 0;
> 		write_unlock(&current->fs->lock);
> 
> Two threads T1 and T2 and another process P, all share the same ->fs.
> 
> T1 starts do_execve(BAD_FILE). It calls check_unsafe_exec(), since ->fs is
> shared, we set LSM_UNSAFE but not ->in_exec (actually, not very good name).
> 
> P exits and decrements fs->users.
> 
> T2 starts do_execve(), calls check_unsafe_exec(), now ->fs is not shared,
> we set fs->in_exec.
> 
> T1 continues, open_exec(BAD_FILE) fails, we clear ->in_exec and return
> to the user-space.
> 
> T1 does clone(CLONE_FS /* without CLONE_THREAD */).
> 
> T1 continues without LSM_UNSAFE_SHARE while ->fs is shared with another
> process.

If I follow you correctly, you meant to say T2 not T1 in the last step.

> 
> 
> What do you think about the (uncompiled) patch below ? It doesn't change
> compat_do_execve(), just for discussion.
> 
> But see also another message I am going to send...
> 
> Oleg.
> 
> do_execve() must not clear fs->in_exec if it was set by another thread,
> and we don't need fs->lock to clear.
> 
> Also, s/lock_task_sighand/rcu_read_lock/ in check_unsafe_exec().

Yes, I think your clear_in_exec change is a necessary one,
and your rcu_read_lock well worth while.

One tiny change (aside from extending to compat_do_execve):
Al originally had check_unsafe_exec()'s write_lock(&p->fs->lock)
after the lock_task_sighand(p, &flags), but was forced to invert
that by the IRQ issue lockdep flagged.  I think we'd all prefer
to think of fs->lock as an innermost lock, and would like it
now to go after your rcu_read_lock().

(You do rcu_read_unlock() earlier, but that's okay.)

Hugh

> 
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1060,7 +1060,6 @@ EXPORT_SYMBOL(install_exec_creds);
>  int check_unsafe_exec(struct linux_binprm *bprm)
>  {
>  	struct task_struct *p = current, *t;
> -	unsigned long flags;
>  	unsigned n_fs;
>  	int res = 0;
>  
> @@ -1068,11 +1067,12 @@ int check_unsafe_exec(struct linux_binpr
>  
>  	n_fs = 1;
>  	write_lock(&p->fs->lock);
> -	lock_task_sighand(p, &flags);
> +	rcu_read_lock();
>  	for (t = next_thread(p); t != p; t = next_thread(t)) {
>  		if (t->fs == p->fs)
>  			n_fs++;
>  	}
> +	rcu_read_unlock();
>  
>  	if (p->fs->users > n_fs) {
>  		bprm->unsafe |= LSM_UNSAFE_SHARE;
> @@ -1080,9 +1080,8 @@ int check_unsafe_exec(struct linux_binpr
>  		if (p->fs->in_exec)
>  			res = -EAGAIN;
>  		p->fs->in_exec = 1;
> +		res = 1;
>  	}
> -
> -	unlock_task_sighand(p, &flags);
>  	write_unlock(&p->fs->lock);
>  
>  	return res;
> @@ -1284,6 +1283,7 @@ int do_execve(char * filename,
>  	struct linux_binprm *bprm;
>  	struct file *file;
>  	struct files_struct *displaced;
> +	bool clear_in_exec;
>  	int retval;
>  
>  	retval = unshare_files(&displaced);
> @@ -1306,8 +1306,9 @@ int do_execve(char * filename,
>  		goto out_unlock;
>  
>  	retval = check_unsafe_exec(bprm);
> -	if (retval)
> +	if (retval < 0)
>  		goto out_unlock;
> +	clear_in_exec = retval;
>  
>  	file = open_exec(filename);
>  	retval = PTR_ERR(file);
> @@ -1355,9 +1356,7 @@ int do_execve(char * filename,
>  		goto out;
>  
>  	/* execve succeeded */
> -	write_lock(&current->fs->lock);
>  	current->fs->in_exec = 0;
> -	write_unlock(&current->fs->lock);
>  	current->in_execve = 0;
>  	mutex_unlock(&current->cred_exec_mutex);
>  	acct_update_integrals(current);
> @@ -1377,9 +1376,8 @@ out_file:
>  	}
>  
>  out_unmark:
> -	write_lock(&current->fs->lock);
> -	current->fs->in_exec = 0;
> -	write_unlock(&current->fs->lock);
> +	if (clear_in_exec)
> +		current->fs->in_exec = 0;
>  
>  out_unlock:
>  	current->in_execve = 0;

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Q: check_unsafe_exec() races (Was: [PATCH 2/4] fix setuid sometimes doesn't)
  2009-04-06 15:51                                       ` Oleg Nesterov
@ 2009-04-19 16:44                                         ` Hugh Dickins
  2009-04-21 16:39                                           ` Oleg Nesterov
  0 siblings, 1 reply; 50+ messages in thread
From: Hugh Dickins @ 2009-04-19 16:44 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Al Viro, Linus Torvalds, Andrew Morton, Joe Malicki, Michael Itz,
	Kenneth Baker, Chris Wright, David Howells, Alexey Dobriyan,
	Greg Kroah-Hartman, linux-fsdevel, linux-kernel

On Mon, 6 Apr 2009, Oleg Nesterov wrote:
> On 04/01, Al Viro wrote:
> >
> > On Wed, Apr 01, 2009 at 01:28:01AM +0100, Hugh Dickins wrote:
> >
> > > Otherwise it looks good to me, except I keep worrying about those
> > > EAGAINs.
> >
> > Frankly, -EAGAIN in situation when we have userland race is fine.  And
> > we *do* have a userland race here - execve() will kill -9 those threads
> > in case of success, so if they'd been doing something useful, they are
> > about to be suddenly screwed.
> 
> Can't resist! I dislike the "in_exec && -EAGAIN" oddity too.
> 
> Yes sure, we can't break the "well written" applications. But imho this
> looks strange. And a bit "assymetrical" wrt LSM_UNSAFE_SHARE, I mean
> check_unsafe_exec() allows sub-threads to race or CLONE_FS but only if
> LSM_UNSAFE_SHARE.
> 
> Another reason, we can have the "my test-case found something strange"
> bug-reports.
> 
> So. Please feel free to nack or just ignore this message, but since I
> personally dislike the current behaviour I should at least try to suggest
> something else.

I didn't spend very long on this: it looked rather equivalent to the
current->fs->cred_exec_mutex patch that I proposed, but spinning its
own infrastructure rather than relying on the existing mutex (and of
course based on top of Al's patches, now in the tree, which have
changed the path of least resistance).

I've probably missed subtleties, but I still prefer my own suggestion;
though Al hinted at a subtle problem with that which I never grasped.

Of course I agree with you sharing my unease at -EAGAIN and in_exec.
Maybe we just lie in wait preparing a "told you so" for when someone
reports "something strange"!  But I'd really like to see your fix
patch go in.

Hugh

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Q: check_unsafe_exec() races (Was: [PATCH 2/4] fix setuid sometimes doesn't)
  2009-04-19 16:30                                           ` Hugh Dickins
@ 2009-04-21 16:10                                             ` Oleg Nesterov
  2009-04-21 16:31                                               ` Linus Torvalds
  0 siblings, 1 reply; 50+ messages in thread
From: Oleg Nesterov @ 2009-04-21 16:10 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Al Viro, Linus Torvalds, Andrew Morton, Joe Malicki, Michael Itz,
	Kenneth Baker, Chris Wright, David Howells, Alexey Dobriyan,
	Greg Kroah-Hartman, linux-fsdevel, linux-kernel

On 04/19, Hugh Dickins wrote:
>
> On Mon, 6 Apr 2009, Oleg Nesterov wrote:
> >
> > Sorry for delay!
>
> Please don't suppose that you can ever beat me at the slowness game!

I am still trying to compete...

> > check_unsafe_exec() doesn't need ->siglock, we can iterate over sub-threads
> > under rcu_read_lock(). Note that with RCU or ->siglock we can set the "wrong"
> > LSM_UNSAFE_SHARE if we race with copy_process(CLONE_THREAD | CLONE_FS), but
> > as it was already discussed we don't care. This means it is OK to miss the
> > freshly cloned thread which has already passed copy_fs().
>
> Yes, I agree.
> And preferable not to have IRQs disabled over that next_thread() loop.

Yes sure, we don't need local_irq_disable(), only rcu_read_lock().

> > T1 does clone(CLONE_FS /* without CLONE_THREAD */).
> >
> > T1 continues without LSM_UNSAFE_SHARE while ->fs is shared with another
> > process.
>
> If I follow you correctly, you meant to say T2 not T1 in the last step.

Yes,

> Yes, I think your clear_in_exec change is a necessary one,
> and your rcu_read_lock well worth while.

OK, I'll send 2 simple patches, the first one kills lock_task_sighand(),
another adds clear_in_exec.

But,

> One tiny change (aside from extending to compat_do_execve):
> Al originally had check_unsafe_exec()'s write_lock(&p->fs->lock)
> after the lock_task_sighand(p, &flags), but was forced to invert
> that by the IRQ issue lockdep flagged.  I think we'd all prefer
> to think of fs->lock as an innermost lock, and would like it
> now to go after your rcu_read_lock().

Since we are not going to disable IRQs, perhaps the above does
not matter? It is always safe to take rcu_read_lock(), no matter
which locks we already hold.

> (You do rcu_read_unlock() earlier, but that's okay.)

Yes, but unless we have a "strong" reason, it is better to take
fs->lock first. rcu_read_lock() is free, but disables preemption.

Oleg.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Q: check_unsafe_exec() races (Was: [PATCH 2/4] fix setuid sometimes doesn't)
  2009-04-21 16:10                                             ` Oleg Nesterov
@ 2009-04-21 16:31                                               ` Linus Torvalds
  2009-04-21 17:15                                                 ` Oleg Nesterov
  0 siblings, 1 reply; 50+ messages in thread
From: Linus Torvalds @ 2009-04-21 16:31 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Hugh Dickins, Al Viro, Andrew Morton, Joe Malicki, Michael Itz,
	Kenneth Baker, Chris Wright, David Howells, Alexey Dobriyan,
	Greg Kroah-Hartman, linux-fsdevel, linux-kernel



On Tue, 21 Apr 2009, Oleg Nesterov wrote:
> 
> > (You do rcu_read_unlock() earlier, but that's okay.)
> 
> Yes, but unless we have a "strong" reason, it is better to take
> fs->lock first. rcu_read_lock() is free, but disables preemption.

.. but so does taking a spinlock. So it shouldn't matter.

We could play games with that (the same way I think we have some games for 
large-system irq latency with '__raw_spin_lock_flags()' on ia64), but that 
makes sense only when you have lots of CPU's and expect irq latency to 
suffer. 

And it doesn't tend to make sense for preemption latency, because if you 
have so many CPU's that you have lots of spinning on locks, you would 
normally not really care deeply about preemption (sure, in theory it's a 
real-time thing, in practice I doubt you'll find anybody who cares).

			Linus

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Q: check_unsafe_exec() races (Was: [PATCH 2/4] fix setuid sometimes doesn't)
  2009-04-19 16:44                                         ` Hugh Dickins
@ 2009-04-21 16:39                                           ` Oleg Nesterov
  0 siblings, 0 replies; 50+ messages in thread
From: Oleg Nesterov @ 2009-04-21 16:39 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Al Viro, Linus Torvalds, Andrew Morton, Joe Malicki, Michael Itz,
	Kenneth Baker, Chris Wright, David Howells, Alexey Dobriyan,
	Greg Kroah-Hartman, linux-fsdevel, linux-kernel

On 04/19, Hugh Dickins wrote:
>
> On Mon, 6 Apr 2009, Oleg Nesterov wrote:
> > On 04/01, Al Viro wrote:
> > >
> > > On Wed, Apr 01, 2009 at 01:28:01AM +0100, Hugh Dickins wrote:
> > >
> > > > Otherwise it looks good to me, except I keep worrying about those
> > > > EAGAINs.
> > >
> > > Frankly, -EAGAIN in situation when we have userland race is fine.  And
> > > we *do* have a userland race here - execve() will kill -9 those threads
> > > in case of success, so if they'd been doing something useful, they are
> > > about to be suddenly screwed.
> >
> > Can't resist! I dislike the "in_exec && -EAGAIN" oddity too.
> >
> > Yes sure, we can't break the "well written" applications. But imho this
> > looks strange. And a bit "assymetrical" wrt LSM_UNSAFE_SHARE, I mean
> > check_unsafe_exec() allows sub-threads to race or CLONE_FS but only if
> > LSM_UNSAFE_SHARE.
> >
> > Another reason, we can have the "my test-case found something strange"
> > bug-reports.
> >
> > So. Please feel free to nack or just ignore this message, but since I
> > personally dislike the current behaviour I should at least try to suggest
> > something else.
>
> I didn't spend very long on this: it looked rather equivalent to the
> current->fs->cred_exec_mutex patch that I proposed, but spinning its
> own infrastructure rather than relying on the existing mutex (and of
> course based on top of Al's patches, now in the tree, which have
> changed the path of least resistance).
>
> I've probably missed subtleties, but I still prefer my own suggestion;
> though Al hinted at a subtle problem with that which I never grasped.

I like your patch more too. Except it has problems with unshare_fs() as
Al pointed out.

I agree, this fs->in_exec_wait_ptr looks really ugly, so...

> Of course I agree with you sharing my unease at -EAGAIN and in_exec.
> Maybe we just lie in wait preparing a "told you so" for when someone
> reports "something strange"!  But I'd really like to see your fix
> patch go in.

Perhaps I'll try to make the patch later, but I am not sure I send it ;)
In any case it complicates the code.

Oleg.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Q: check_unsafe_exec() races (Was: [PATCH 2/4] fix setuid sometimes doesn't)
  2009-04-21 16:31                                               ` Linus Torvalds
@ 2009-04-21 17:15                                                 ` Oleg Nesterov
  2009-04-21 17:35                                                   ` Linus Torvalds
  0 siblings, 1 reply; 50+ messages in thread
From: Oleg Nesterov @ 2009-04-21 17:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Hugh Dickins, Al Viro, Andrew Morton, Joe Malicki, Michael Itz,
	Kenneth Baker, Chris Wright, David Howells, Alexey Dobriyan,
	Greg Kroah-Hartman, linux-fsdevel, linux-kernel

On 04/21, Linus Torvalds wrote:
>
> On Tue, 21 Apr 2009, Oleg Nesterov wrote:
> >
> > > (You do rcu_read_unlock() earlier, but that's okay.)
> >
> > Yes, but unless we have a "strong" reason, it is better to take
> > fs->lock first. rcu_read_lock() is free, but disables preemption.
>
> .. but so does taking a spinlock. So it shouldn't matter.
>
> We could play games with that (the same way I think we have some games for
> large-system irq latency with '__raw_spin_lock_flags()' on ia64), but that
> makes sense only when you have lots of CPU's and expect irq latency to
> suffer.
>
> And it doesn't tend to make sense for preemption latency, because if you
> have so many CPU's that you have lots of spinning on locks, you would
> normally not really care deeply about preemption (sure, in theory it's a
> real-time thing, in practice I doubt you'll find anybody who cares).

OK, I agree, it doesn't really matter from latency/etc pov.

But still I can't understand why it is better to take fs->lock under
RCU lock. I mean, "fs->lock is the innermost lock" should not apply
to rcu_read_lock(). Because the latter is a bit special, no?

Oleg.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Q: check_unsafe_exec() races (Was: [PATCH 2/4] fix setuid sometimes doesn't)
  2009-04-21 17:15                                                 ` Oleg Nesterov
@ 2009-04-21 17:35                                                   ` Linus Torvalds
  2009-04-21 19:39                                                     ` Hugh Dickins
  0 siblings, 1 reply; 50+ messages in thread
From: Linus Torvalds @ 2009-04-21 17:35 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Hugh Dickins, Al Viro, Andrew Morton, Joe Malicki, Michael Itz,
	Kenneth Baker, Chris Wright, David Howells, Alexey Dobriyan,
	Greg Kroah-Hartman, linux-fsdevel, linux-kernel



On Tue, 21 Apr 2009, Oleg Nesterov wrote:
> 
> OK, I agree, it doesn't really matter from latency/etc pov.
> 
> But still I can't understand why it is better to take fs->lock under
> RCU lock. I mean, "fs->lock is the innermost lock" should not apply
> to rcu_read_lock(). Because the latter is a bit special, no?

Oh, I don't think it matters. If you want to put the RCU read-lock 
innermost, that's fine by me. I just reacted to your latency argument as 
not being very strong :)

All I personally want is a patch that everybody can agree on, and that 
has sane semantics. 

		Linus

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Q: check_unsafe_exec() races (Was: [PATCH 2/4] fix setuid sometimes doesn't)
  2009-04-21 17:35                                                   ` Linus Torvalds
@ 2009-04-21 19:39                                                     ` Hugh Dickins
  2009-04-23 23:01                                                       ` [PATCH 1/2] do_execve() must not clear fs->in_exec if it was set by another thread Oleg Nesterov
  2009-04-23 23:02                                                       ` [PATCH 2/2] check_unsafe_exec: s/lock_task_sighand/rcu_read_lock/ Oleg Nesterov
  0 siblings, 2 replies; 50+ messages in thread
From: Hugh Dickins @ 2009-04-21 19:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Oleg Nesterov, Al Viro, Andrew Morton, Joe Malicki, Michael Itz,
	Kenneth Baker, Chris Wright, David Howells, Alexey Dobriyan,
	Greg Kroah-Hartman, linux-fsdevel, linux-kernel

On Tue, 21 Apr 2009, Linus Torvalds wrote:
> On Tue, 21 Apr 2009, Oleg Nesterov wrote:
> > 
> > OK, I agree, it doesn't really matter from latency/etc pov.
> > 
> > But still I can't understand why it is better to take fs->lock under
> > RCU lock. I mean, "fs->lock is the innermost lock" should not apply
> > to rcu_read_lock(). Because the latter is a bit special, no?
> 
> Oh, I don't think it matters. If you want to put the RCU read-lock 
> innermost, that's fine by me. I just reacted to your latency argument as 
> not being very strong :)
> 
> All I personally want is a patch that everybody can agree on, and that 
> has sane semantics. 

Right, that ordering scarcely matters, and can probably be argued
either way.  I should have been clearer when I suggested inverting
them to Oleg: I meant it merely as a suggestion, that we go back
to the ordering which came more naturally to Al in the first place.
And since Al hasn't spoken up (probably has more important things
to care about), please do go ahead with your two patches, Oleg,
with the rcu_read_lock() on whichever side takes your fancy!

Thanks,
Hugh

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 1/2] do_execve() must not clear fs->in_exec if it was set by another thread
  2009-04-21 19:39                                                     ` Hugh Dickins
@ 2009-04-23 23:01                                                       ` Oleg Nesterov
  2009-04-23 23:18                                                         ` Roland McGrath
                                                                           ` (2 more replies)
  2009-04-23 23:02                                                       ` [PATCH 2/2] check_unsafe_exec: s/lock_task_sighand/rcu_read_lock/ Oleg Nesterov
  1 sibling, 3 replies; 50+ messages in thread
From: Oleg Nesterov @ 2009-04-23 23:01 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Linus Torvalds, Al Viro, Andrew Morton, Joe Malicki, Michael Itz,
	Kenneth Baker, Chris Wright, David Howells, Alexey Dobriyan,
	Greg Kroah-Hartman, linux-fsdevel, linux-kernel, Roland McGrath

If do_execve() fails after check_unsafe_exec(), it clears fs->in_exec
unconditionally. This is wrong if we race with our sub-thread which
also does do_execve:

	Two threads T1 and T2 and another process P, all share the same
	->fs.

	T1 starts do_execve(BAD_FILE). It calls check_unsafe_exec(), since
	->fs is shared, we set LSM_UNSAFE but not ->in_exec.

	P exits and decrements fs->users.

	T2 starts do_execve(), calls check_unsafe_exec(), now ->fs is not
	shared, we set fs->in_exec.

	T1 continues, open_exec(BAD_FILE) fails, we clear ->in_exec and
	return to the user-space.

	T1 does clone(CLONE_FS /* without CLONE_THREAD */).

	T2 continues without LSM_UNSAFE_SHARE while ->fs is shared with
	another process.

Change check_unsafe_exec() to return res = 1 if we set ->in_exec, and change
do_execve() to clear ->in_exec depending on res.

When do_execve() suceeds, it is safe to clear ->in_exec unconditionally.
It can be set only if we don't share ->fs with another process, and since
we already killed all sub-threads either ->in_exec == 0 or we are the
only user of this ->fs.

Also, we do not need fs->lock to clear fs->in_exec.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>

 fs/exec.c   |   19 ++++++++++---------
 fs/compat.c |   11 +++++------
 2 files changed, 15 insertions(+), 15 deletions(-)

--- PTRACE/fs/exec.c~1_IN_EXEC	2009-04-06 00:03:41.000000000 +0200
+++ PTRACE/fs/exec.c	2009-04-24 00:01:53.000000000 +0200
@@ -1077,9 +1077,11 @@ int check_unsafe_exec(struct linux_binpr
 	if (p->fs->users > n_fs) {
 		bprm->unsafe |= LSM_UNSAFE_SHARE;
 	} else {
-		if (p->fs->in_exec)
-			res = -EAGAIN;
-		p->fs->in_exec = 1;
+		res = -EAGAIN;
+		if (!p->fs->in_exec) {
+			p->fs->in_exec = 1;
+			res = 1;
+		}
 	}
 
 	unlock_task_sighand(p, &flags);
@@ -1284,6 +1286,7 @@ int do_execve(char * filename,
 	struct linux_binprm *bprm;
 	struct file *file;
 	struct files_struct *displaced;
+	bool clear_in_exec;
 	int retval;
 
 	retval = unshare_files(&displaced);
@@ -1306,8 +1309,9 @@ int do_execve(char * filename,
 		goto out_unlock;
 
 	retval = check_unsafe_exec(bprm);
-	if (retval)
+	if (retval < 0)
 		goto out_unlock;
+	clear_in_exec = retval;
 
 	file = open_exec(filename);
 	retval = PTR_ERR(file);
@@ -1355,9 +1359,7 @@ int do_execve(char * filename,
 		goto out;
 
 	/* execve succeeded */
-	write_lock(&current->fs->lock);
 	current->fs->in_exec = 0;
-	write_unlock(&current->fs->lock);
 	current->in_execve = 0;
 	mutex_unlock(&current->cred_exec_mutex);
 	acct_update_integrals(current);
@@ -1377,9 +1379,8 @@ out_file:
 	}
 
 out_unmark:
-	write_lock(&current->fs->lock);
-	current->fs->in_exec = 0;
-	write_unlock(&current->fs->lock);
+	if (clear_in_exec)
+		current->fs->in_exec = 0;
 
 out_unlock:
 	current->in_execve = 0;
--- PTRACE/fs/compat.c~1_IN_EXEC	2009-04-22 20:49:07.000000000 +0200
+++ PTRACE/fs/compat.c	2009-04-24 00:09:36.000000000 +0200
@@ -1476,6 +1476,7 @@ int compat_do_execve(char * filename,
 	struct linux_binprm *bprm;
 	struct file *file;
 	struct files_struct *displaced;
+	bool clear_in_exec;
 	int retval;
 
 	retval = unshare_files(&displaced);
@@ -1498,8 +1499,9 @@ int compat_do_execve(char * filename,
 		goto out_unlock;
 
 	retval = check_unsafe_exec(bprm);
-	if (retval)
+	if (retval < 0)
 		goto out_unlock;
+	clear_in_exec = retval;
 
 	file = open_exec(filename);
 	retval = PTR_ERR(file);
@@ -1546,9 +1548,7 @@ int compat_do_execve(char * filename,
 		goto out;
 
 	/* execve succeeded */
-	write_lock(&current->fs->lock);
 	current->fs->in_exec = 0;
-	write_unlock(&current->fs->lock);
 	current->in_execve = 0;
 	mutex_unlock(&current->cred_exec_mutex);
 	acct_update_integrals(current);
@@ -1568,9 +1568,8 @@ out_file:
 	}
 
 out_unmark:
-	write_lock(&current->fs->lock);
-	current->fs->in_exec = 0;
-	write_unlock(&current->fs->lock);
+	if (clear_in_exec)
+		current->fs->in_exec = 0;
 
 out_unlock:
 	current->in_execve = 0;


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 2/2] check_unsafe_exec: s/lock_task_sighand/rcu_read_lock/
  2009-04-21 19:39                                                     ` Hugh Dickins
  2009-04-23 23:01                                                       ` [PATCH 1/2] do_execve() must not clear fs->in_exec if it was set by another thread Oleg Nesterov
@ 2009-04-23 23:02                                                       ` Oleg Nesterov
  2009-04-23 23:18                                                         ` Roland McGrath
  2009-04-24  4:29                                                         ` Hugh Dickins
  1 sibling, 2 replies; 50+ messages in thread
From: Oleg Nesterov @ 2009-04-23 23:02 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Linus Torvalds, Al Viro, Andrew Morton, Joe Malicki, Michael Itz,
	Kenneth Baker, Chris Wright, David Howells, Alexey Dobriyan,
	Greg Kroah-Hartman, linux-fsdevel, linux-kernel, Roland McGrath

write_lock(&current->fs->lock) guarantees we can't wrongly miss
LSM_UNSAFE_SHARE, this is what we care about. Use rcu_read_lock()
instead of ->siglock to iterate over the sub-threads. We must see
all CLONE_THREAD|CLONE_FS threads which didn't pass exit_fs(), it
takes fs->lock too.

With or without this patch we can miss the freshly cloned thread
and set LSM_UNSAFE_SHARE, we don't care.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>

--- PTRACE/fs/exec.c~2_NO_SIGLOCK	2009-04-24 00:01:53.000000000 +0200
+++ PTRACE/fs/exec.c	2009-04-24 00:36:22.000000000 +0200
@@ -1060,7 +1060,6 @@ EXPORT_SYMBOL(install_exec_creds);
 int check_unsafe_exec(struct linux_binprm *bprm)
 {
 	struct task_struct *p = current, *t;
-	unsigned long flags;
 	unsigned n_fs;
 	int res = 0;
 
@@ -1068,11 +1067,12 @@ int check_unsafe_exec(struct linux_binpr
 
 	n_fs = 1;
 	write_lock(&p->fs->lock);
-	lock_task_sighand(p, &flags);
+	rcu_read_lock();
 	for (t = next_thread(p); t != p; t = next_thread(t)) {
 		if (t->fs == p->fs)
 			n_fs++;
 	}
+	rcu_read_lock();
 
 	if (p->fs->users > n_fs) {
 		bprm->unsafe |= LSM_UNSAFE_SHARE;
@@ -1083,8 +1083,6 @@ int check_unsafe_exec(struct linux_binpr
 			res = 1;
 		}
 	}
-
-	unlock_task_sighand(p, &flags);
 	write_unlock(&p->fs->lock);
 
 	return res;


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] do_execve() must not clear fs->in_exec if it was set by another thread
  2009-04-23 23:01                                                       ` [PATCH 1/2] do_execve() must not clear fs->in_exec if it was set by another thread Oleg Nesterov
@ 2009-04-23 23:18                                                         ` Roland McGrath
  2009-04-23 23:31                                                         ` Al Viro
  2009-04-24  4:20                                                         ` [PATCH 1/2] do_execve() must not clear fs->in_exec if it was set by another thread Hugh Dickins
  2 siblings, 0 replies; 50+ messages in thread
From: Roland McGrath @ 2009-04-23 23:18 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Hugh Dickins, Linus Torvalds, Al Viro, Andrew Morton,
	Joe Malicki, Michael Itz, Kenneth Baker, Chris Wright,
	David Howells, Alexey Dobriyan, Greg Kroah-Hartman,
	linux-fsdevel, linux-kernel

Acked-by: Roland McGrath <roland@redhat.com>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 2/2] check_unsafe_exec: s/lock_task_sighand/rcu_read_lock/
  2009-04-23 23:02                                                       ` [PATCH 2/2] check_unsafe_exec: s/lock_task_sighand/rcu_read_lock/ Oleg Nesterov
@ 2009-04-23 23:18                                                         ` Roland McGrath
  2009-04-24  4:29                                                         ` Hugh Dickins
  1 sibling, 0 replies; 50+ messages in thread
From: Roland McGrath @ 2009-04-23 23:18 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Hugh Dickins, Linus Torvalds, Al Viro, Andrew Morton,
	Joe Malicki, Michael Itz, Kenneth Baker, Chris Wright,
	David Howells, Alexey Dobriyan, Greg Kroah-Hartman,
	linux-fsdevel, linux-kernel

Acked-by: Roland McGrath <roland@redhat.com>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] do_execve() must not clear fs->in_exec if it was set by another thread
  2009-04-23 23:01                                                       ` [PATCH 1/2] do_execve() must not clear fs->in_exec if it was set by another thread Oleg Nesterov
  2009-04-23 23:18                                                         ` Roland McGrath
@ 2009-04-23 23:31                                                         ` Al Viro
  2009-04-24 11:57                                                           ` [PATCH 3/2] check_unsafe_exec: rcu_read_unlock Hugh Dickins
  2009-04-24  4:20                                                         ` [PATCH 1/2] do_execve() must not clear fs->in_exec if it was set by another thread Hugh Dickins
  2 siblings, 1 reply; 50+ messages in thread
From: Al Viro @ 2009-04-23 23:31 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Hugh Dickins, Linus Torvalds, Andrew Morton, Joe Malicki,
	Michael Itz, Kenneth Baker, Chris Wright, David Howells,
	Alexey Dobriyan, Greg Kroah-Hartman, linux-fsdevel, linux-kernel,
	Roland McGrath

On Fri, Apr 24, 2009 at 01:01:56AM +0200, Oleg Nesterov wrote:
> If do_execve() fails after check_unsafe_exec(), it clears fs->in_exec
> unconditionally. This is wrong if we race with our sub-thread which
> also does do_execve:


[snip]

Applied (both of those).

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] do_execve() must not clear fs->in_exec if it was set by another thread
  2009-04-23 23:01                                                       ` [PATCH 1/2] do_execve() must not clear fs->in_exec if it was set by another thread Oleg Nesterov
  2009-04-23 23:18                                                         ` Roland McGrath
  2009-04-23 23:31                                                         ` Al Viro
@ 2009-04-24  4:20                                                         ` Hugh Dickins
  2 siblings, 0 replies; 50+ messages in thread
From: Hugh Dickins @ 2009-04-24  4:20 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Linus Torvalds, Al Viro, Andrew Morton, Joe Malicki, Michael Itz,
	Kenneth Baker, Chris Wright, David Howells, Alexey Dobriyan,
	Greg Kroah-Hartman, linux-fsdevel, linux-kernel, Roland McGrath

Acked-by: Hugh Dickins <hugh@veritas.com>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 2/2] check_unsafe_exec: s/lock_task_sighand/rcu_read_lock/
  2009-04-23 23:02                                                       ` [PATCH 2/2] check_unsafe_exec: s/lock_task_sighand/rcu_read_lock/ Oleg Nesterov
  2009-04-23 23:18                                                         ` Roland McGrath
@ 2009-04-24  4:29                                                         ` Hugh Dickins
  1 sibling, 0 replies; 50+ messages in thread
From: Hugh Dickins @ 2009-04-24  4:29 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Linus Torvalds, Al Viro, Andrew Morton, Joe Malicki, Michael Itz,
	Kenneth Baker, Chris Wright, David Howells, Alexey Dobriyan,
	Greg Kroah-Hartman, linux-fsdevel, linux-kernel, Roland McGrath

On Fri, 23 Apr 2009, Oleg Nesterov wrote:
>  	write_lock(&p->fs->lock);
> -	lock_task_sighand(p, &flags);
> +	rcu_read_lock();
>  	for (t = next_thread(p); t != p; t = next_thread(t)) {
>  		if (t->fs == p->fs)
>  			n_fs++;
>  	}
> +	rcu_read_lock();

Not quite! if second rcu_read_lock() changed to rcu_read_unlock(), then
Acked-by: Hugh Dickins <hugh@veritas.com>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 3/2] check_unsafe_exec: rcu_read_unlock
  2009-04-23 23:31                                                         ` Al Viro
@ 2009-04-24 11:57                                                           ` Hugh Dickins
  2009-04-24 14:34                                                             ` Oleg Nesterov
  0 siblings, 1 reply; 50+ messages in thread
From: Hugh Dickins @ 2009-04-24 11:57 UTC (permalink / raw)
  To: Al Viro
  Cc: Oleg Nesterov, Linus Torvalds, Andrew Morton, Joe Malicki,
	Michael Itz, Kenneth Baker, Chris Wright, David Howells,
	Alexey Dobriyan, Greg Kroah-Hartman, linux-fsdevel, linux-kernel,
	Roland McGrath

Fix typo in previous commit: second rcu_read_lock should be rcu_read_unlock.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
---

 fs/exec.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- 2.6.30-rc3-next-20090424/fs/exec.c	2009-04-24 12:23:43.000000000 +0100
+++ linux/fs/exec.c	2009-04-24 12:26:10.000000000 +0100
@@ -1043,7 +1043,7 @@ int check_unsafe_exec(struct linux_binpr
 		if (t->fs == p->fs)
 			n_fs++;
 	}
-	rcu_read_lock();
+	rcu_read_unlock();
 
 	if (p->fs->users > n_fs) {
 		bprm->unsafe |= LSM_UNSAFE_SHARE;

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 3/2] check_unsafe_exec: rcu_read_unlock
  2009-04-24 11:57                                                           ` [PATCH 3/2] check_unsafe_exec: rcu_read_unlock Hugh Dickins
@ 2009-04-24 14:34                                                             ` Oleg Nesterov
  0 siblings, 0 replies; 50+ messages in thread
From: Oleg Nesterov @ 2009-04-24 14:34 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Al Viro, Linus Torvalds, Andrew Morton, Joe Malicki, Michael Itz,
	Kenneth Baker, Chris Wright, David Howells, Alexey Dobriyan,
	Greg Kroah-Hartman, linux-fsdevel, linux-kernel, Roland McGrath

On 04/24, Hugh Dickins wrote:
>
> --- 2.6.30-rc3-next-20090424/fs/exec.c	2009-04-24 12:23:43.000000000 +0100
> +++ linux/fs/exec.c	2009-04-24 12:26:10.000000000 +0100
> @@ -1043,7 +1043,7 @@ int check_unsafe_exec(struct linux_binpr
>  		if (t->fs == p->fs)
>  			n_fs++;
>  	}
> -	rcu_read_lock();
> +	rcu_read_unlock();

I'd say this change is not bad. Except it discloses the fact I didn't
bother to test my trivial patch.

Thanks a lot Hugh!!!

And my apologies to all.

Oleg.


^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2009-04-24 14:41 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-03-28 23:16 [PATCH 1/4] compat_do_execve should unshare_files Hugh Dickins
2009-03-28 23:20 ` [PATCH 2/4] fix setuid sometimes doesn't Hugh Dickins
2009-03-29  0:53   ` Q: check_unsafe_exec() races (Was: [PATCH 2/4] fix setuid sometimes doesn't) Oleg Nesterov
2009-03-29  4:10     ` Al Viro
2009-03-29  4:14       ` Al Viro
2009-03-29  4:52       ` Oleg Nesterov
2009-03-29  5:55         ` Al Viro
2009-03-29  6:01           ` Al Viro
2009-03-29 21:36             ` Oleg Nesterov
2009-03-29 22:20               ` Al Viro
2009-03-29 23:56                 ` Oleg Nesterov
2009-03-30  0:03                   ` Oleg Nesterov
2009-03-30  1:08                     ` Al Viro
2009-03-30  1:13                       ` Al Viro
2009-03-30  1:36                         ` Oleg Nesterov
2009-03-30  1:40                           ` Oleg Nesterov
2009-03-30 12:31                             ` Al Viro
2009-03-30 14:32                               ` Hugh Dickins
2009-03-31  6:16                                 ` Al Viro
2009-04-01  0:28                                   ` Hugh Dickins
2009-04-01  2:38                                     ` Al Viro
2009-04-01  3:03                                       ` Al Viro
2009-04-01 11:25                                         ` Hugh Dickins
2009-04-06 15:31                                         ` Oleg Nesterov
2009-04-19 16:30                                           ` Hugh Dickins
2009-04-21 16:10                                             ` Oleg Nesterov
2009-04-21 16:31                                               ` Linus Torvalds
2009-04-21 17:15                                                 ` Oleg Nesterov
2009-04-21 17:35                                                   ` Linus Torvalds
2009-04-21 19:39                                                     ` Hugh Dickins
2009-04-23 23:01                                                       ` [PATCH 1/2] do_execve() must not clear fs->in_exec if it was set by another thread Oleg Nesterov
2009-04-23 23:18                                                         ` Roland McGrath
2009-04-23 23:31                                                         ` Al Viro
2009-04-24 11:57                                                           ` [PATCH 3/2] check_unsafe_exec: rcu_read_unlock Hugh Dickins
2009-04-24 14:34                                                             ` Oleg Nesterov
2009-04-24  4:20                                                         ` [PATCH 1/2] do_execve() must not clear fs->in_exec if it was set by another thread Hugh Dickins
2009-04-23 23:02                                                       ` [PATCH 2/2] check_unsafe_exec: s/lock_task_sighand/rcu_read_lock/ Oleg Nesterov
2009-04-23 23:18                                                         ` Roland McGrath
2009-04-24  4:29                                                         ` Hugh Dickins
2009-04-01 11:18                                       ` Q: check_unsafe_exec() races (Was: [PATCH 2/4] fix setuid sometimes doesn't) Hugh Dickins
2009-04-06 15:51                                       ` Oleg Nesterov
2009-04-19 16:44                                         ` Hugh Dickins
2009-04-21 16:39                                           ` Oleg Nesterov
2009-03-30 23:45                               ` Serge E. Hallyn
2009-03-31  6:19                                 ` Al Viro
2009-03-28 23:21 ` [PATCH 3/4] fix setuid sometimes wouldn't Hugh Dickins
2009-03-29 11:19   ` Alexey Dobriyan
2009-03-29 21:48     ` Oleg Nesterov
2009-03-29 22:37       ` Al Viro
2009-03-28 23:23 ` [PATCH 4/4] Annotate struct fs_struct's usage count restriction Hugh Dickins

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.