linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/4] proc: first_tid() fix/cleanup
@ 2013-06-03 19:06 Oleg Nesterov
  2013-06-03 19:06 ` [PATCH v2 1/4] proc: first_tid: fix the potential use-after-free Oleg Nesterov
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Oleg Nesterov @ 2013-06-03 19:06 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Eric W. Biederman, Michal Hocko, Sergey Dyasly, linux-kernel

Hello.

next_thread() should be avoided, probably next_tid() is the
only "valid" user.

But now we have another reason to avoid (and probably even kill)
it, we are going to replace or fix while_each_thread(), almost
every lockless usage is wrong.

Changes:

	1/4: Update the changelog, do not move the comment.

	2/4: No changes.

	3/4: Update the comment following the explanations from
	     Eric.

	     Eric pointed that get_proc_task() without rcu lock
	     can trigger the (bogus) warning. Extract the similar
	     check from pid_delete_dentry() into the new helper
	     and use it instead.

	     I didn't dare to preserve his ack, but the only change
	     is the new proc_inode_is_dead() helper and

		- if (pid_task(proc_pid(inode))
		+ if (proc_inode_is_dead(inode))

	     in proc_task_readdir().

	4/4: New.

Oleg.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 1/4] proc: first_tid: fix the potential use-after-free
  2013-06-03 19:06 [PATCH v2 0/4] proc: first_tid() fix/cleanup Oleg Nesterov
@ 2013-06-03 19:06 ` Oleg Nesterov
  2013-06-03 19:07 ` [PATCH v2 2/4] proc: change first_tid() to use while_each_thread() Oleg Nesterov
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Oleg Nesterov @ 2013-06-03 19:06 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Eric W. Biederman, Michal Hocko, Sergey Dyasly, linux-kernel

proc_task_readdir() verifies that the result of get_proc_task()
is pid_alive() and thus its ->group_leader is fine too. However
this is not necessarily true after rcu_read_unlock(), we need
to recheck this again after first_tid() does rcu_read_lock().
Otherwise leader->thread_group.next (used by next_thread()) can
be invalid if the rcu grace period expires in between.

The race is subtle and unlikely, but still it is possible afaics.
To simplify lets ignore the "likely" case when tid != 0, f_version
can be cleared by proc_task_operations->llseek().

Suppose we have a main thread M and its subthread T. Suppose that
f_pos == 3, iow first_tid() should return T. Now suppose that the
following happens between rcu_read_unlock() and rcu_read_lock():

	1. T execs and becomes the new leader. This removes M from
	    ->thread_group but next_thread(M) is still T.

	2. T creates another thread X which does exec as well, T
	   goes away.

	3. X creates another subthread, this increments nr_threads.

	4. first_tid() does next_thread(M) and returns the already
	   dead T.

Note also that we need 2. and 3. only because of get_nr_threads()
check, and this check was supposed to be optimization only.

Note: I think that proc_task_readdir/first_tid interaction can be
simplified, but this needs another patch. proc_task_readdir() should
not play with ->group_leader at all. See the next patches.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/proc/base.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index dd51e50..daf41dc 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -3190,6 +3190,9 @@ static struct task_struct *first_tid(struct task_struct *leader,
 	pos = NULL;
 	if (nr && nr >= get_nr_threads(leader))
 		goto out;
+	/* It could be unhashed before we take rcu lock */
+	if (!pid_alive(leader))
+		goto out;
 
 	/* If we haven't found our starting place yet start
 	 * with the leader and walk nr threads forward.
-- 
1.5.5.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 2/4] proc: change first_tid() to use while_each_thread()
  2013-06-03 19:06 [PATCH v2 0/4] proc: first_tid() fix/cleanup Oleg Nesterov
  2013-06-03 19:06 ` [PATCH v2 1/4] proc: first_tid: fix the potential use-after-free Oleg Nesterov
@ 2013-06-03 19:07 ` Oleg Nesterov
  2013-06-03 19:07 ` [PATCH v2 3/4] proc: simplify proc_task_readdir/first_tid paths Oleg Nesterov
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Oleg Nesterov @ 2013-06-03 19:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Eric W. Biederman, Michal Hocko, Sergey Dyasly, linux-kernel

Change first_tid() to use while_each_thread() instead of
next_thread().

This doesn't make the code simpler/better, but we are going
to fix or replace while_each_thread(), next_thread() should
be avoided whenever possible.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/proc/base.c |   20 ++++++++++----------
 1 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index daf41dc..bed1096 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -3187,23 +3187,23 @@ static struct task_struct *first_tid(struct task_struct *leader,
 	}
 
 	/* If nr exceeds the number of threads there is nothing todo */
-	pos = NULL;
 	if (nr && nr >= get_nr_threads(leader))
-		goto out;
+		goto fail;
 	/* It could be unhashed before we take rcu lock */
 	if (!pid_alive(leader))
-		goto out;
+		goto fail;
 
 	/* If we haven't found our starting place yet start
 	 * with the leader and walk nr threads forward.
 	 */
-	for (pos = leader; nr > 0; --nr) {
-		pos = next_thread(pos);
-		if (pos == leader) {
-			pos = NULL;
-			goto out;
-		}
-	}
+	pos = leader;
+	do {
+		if (nr-- <= 0)
+			goto found;
+	} while_each_thread(leader, pos);
+fail:
+	pos = NULL;
+	goto out;
 found:
 	get_task_struct(pos);
 out:
-- 
1.5.5.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 3/4] proc: simplify proc_task_readdir/first_tid paths
  2013-06-03 19:06 [PATCH v2 0/4] proc: first_tid() fix/cleanup Oleg Nesterov
  2013-06-03 19:06 ` [PATCH v2 1/4] proc: first_tid: fix the potential use-after-free Oleg Nesterov
  2013-06-03 19:07 ` [PATCH v2 2/4] proc: change first_tid() to use while_each_thread() Oleg Nesterov
@ 2013-06-03 19:07 ` Oleg Nesterov
  2013-06-03 22:06   ` Eric W. Biederman
  2013-06-03 19:07 ` [PATCH v2 4/4] proc: avoid ->f_pos overflows in proc_task_readdir() paths Oleg Nesterov
  2013-06-04 17:32 ` [PATCH v2 0/4] proc: first_tid() fix/cleanup Oleg Nesterov
  4 siblings, 1 reply; 14+ messages in thread
From: Oleg Nesterov @ 2013-06-03 19:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Eric W. Biederman, Michal Hocko, Sergey Dyasly, linux-kernel

proc_task_readdir() does not really need "leader", first_tid()
has to revalidate it anyway. Just pass proc_pid(inode) to
first_tid() instead, it can do pid_task(PIDTYPE_PID) itself
and read ->group_leader only if necessary.

The patch also extracts the "inode is dead" code
from pid_delete_dentry(dentry) into the new trivial helper,
proc_inode_is_dead(inode), proc_task_readdir() uses it to return
-ENOENT if this dir was removed. This is a bit racy, but the race
is very inlikely and the getdents() after openndir() can see the
empty "." + ".." dir only once.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/proc/base.c |   53 ++++++++++++++++++++++-------------------------------
 1 files changed, 22 insertions(+), 31 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index bed1096..5e0e02f 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1652,13 +1652,18 @@ int pid_revalidate(struct dentry *dentry, unsigned int flags)
 	return 0;
 }
 
+static inline bool proc_inode_is_dead(struct inode *inode)
+{
+	return !proc_pid(inode)->tasks[PIDTYPE_PID].first;
+}
+
 int pid_delete_dentry(const struct dentry *dentry)
 {
 	/* Is the task we represent dead?
 	 * If so, then don't put the dentry on the lru list,
 	 * kill it immediately.
 	 */
-	return !proc_pid(dentry->d_inode)->tasks[PIDTYPE_PID].first;
+	return proc_inode_is_dead(dentry->d_inode);
 }
 
 const struct dentry_operations pid_dentry_operations =
@@ -3173,34 +3178,35 @@ out_no_task:
  * In the case of a seek we start with the leader and walk nr
  * threads past it.
  */
-static struct task_struct *first_tid(struct task_struct *leader,
-		int tid, int nr, struct pid_namespace *ns)
+static struct task_struct *first_tid(struct pid *pid, int tid,
+					int nr, struct pid_namespace *ns)
 {
-	struct task_struct *pos;
+	struct task_struct *pos, *task;
 
 	rcu_read_lock();
-	/* Attempt to start with the pid of a thread */
+	task = pid_task(pid, PIDTYPE_PID);
+	if (!task)
+		goto fail;
+
+	/* Attempt to start with the tid of a thread */
 	if (tid && (nr > 0)) {
 		pos = find_task_by_pid_ns(tid, ns);
-		if (pos && (pos->group_leader == leader))
+		if (pos && same_thread_group(pos, task))
 			goto found;
 	}
 
 	/* If nr exceeds the number of threads there is nothing todo */
-	if (nr && nr >= get_nr_threads(leader))
-		goto fail;
-	/* It could be unhashed before we take rcu lock */
-	if (!pid_alive(leader))
+	if (nr && nr >= get_nr_threads(task))
 		goto fail;
 
 	/* If we haven't found our starting place yet start
 	 * with the leader and walk nr threads forward.
 	 */
-	pos = leader;
+	pos = task = task->group_leader;
 	do {
 		if (nr-- <= 0)
 			goto found;
-	} while_each_thread(leader, pos);
+	} while_each_thread(task, pos);
 fail:
 	pos = NULL;
 	goto out;
@@ -3247,26 +3253,13 @@ static int proc_task_readdir(struct file * filp, void * dirent, filldir_t filldi
 {
 	struct dentry *dentry = filp->f_path.dentry;
 	struct inode *inode = dentry->d_inode;
-	struct task_struct *leader = NULL;
 	struct task_struct *task;
-	int retval = -ENOENT;
 	ino_t ino;
 	int tid;
 	struct pid_namespace *ns;
 
-	task = get_proc_task(inode);
-	if (!task)
-		goto out_no_task;
-	rcu_read_lock();
-	if (pid_alive(task)) {
-		leader = task->group_leader;
-		get_task_struct(leader);
-	}
-	rcu_read_unlock();
-	put_task_struct(task);
-	if (!leader)
-		goto out_no_task;
-	retval = 0;
+	if (proc_inode_is_dead(inode))
+		return -ENOENT;
 
 	switch ((unsigned long)filp->f_pos) {
 	case 0:
@@ -3289,7 +3282,7 @@ static int proc_task_readdir(struct file * filp, void * dirent, filldir_t filldi
 	ns = filp->f_dentry->d_sb->s_fs_info;
 	tid = (int)filp->f_version;
 	filp->f_version = 0;
-	for (task = first_tid(leader, tid, filp->f_pos - 2, ns);
+	for (task = first_tid(proc_pid(inode), tid, filp->f_pos - 2, ns);
 	     task;
 	     task = next_tid(task), filp->f_pos++) {
 		tid = task_pid_nr_ns(task, ns);
@@ -3302,9 +3295,7 @@ static int proc_task_readdir(struct file * filp, void * dirent, filldir_t filldi
 		}
 	}
 out:
-	put_task_struct(leader);
-out_no_task:
-	return retval;
+	return 0;
 }
 
 static int proc_task_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
-- 
1.5.5.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 4/4] proc: avoid ->f_pos overflows in proc_task_readdir() paths
  2013-06-03 19:06 [PATCH v2 0/4] proc: first_tid() fix/cleanup Oleg Nesterov
                   ` (2 preceding siblings ...)
  2013-06-03 19:07 ` [PATCH v2 3/4] proc: simplify proc_task_readdir/first_tid paths Oleg Nesterov
@ 2013-06-03 19:07 ` Oleg Nesterov
  2013-06-03 22:18   ` Eric W. Biederman
  2013-06-04  0:58   ` Al Viro
  2013-06-04 17:32 ` [PATCH v2 0/4] proc: first_tid() fix/cleanup Oleg Nesterov
  4 siblings, 2 replies; 14+ messages in thread
From: Oleg Nesterov @ 2013-06-03 19:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Eric W. Biederman, Michal Hocko, Sergey Dyasly, linux-kernel

1. proc_task_readdir() truncates f_pos to long, this can lead
   to wrong result on 32bit.

2. first_tid() truncates f_pos to int, this is wrong even on
   64bit.

   We could check that f_pos < PID_MAX or even INT_MAX in
   proc_task_readdir(), but this patch simply checks the
   potential overflow in first_tid(), this check is nop on
   64bit. We do not care if it was negative and the new
   unsigned value is huge, all we need to ensure is that we
   never wrongly return !NULL.

3. Remove the 2nd "nr != 0" check before get_nr_threads(),
   nr_threads == 0 is not distinguishable from !pid_task()
   above.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/proc/base.c |   16 ++++++++++------
 1 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 5e0e02f..5598cfa 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -3178,10 +3178,14 @@ out_no_task:
  * In the case of a seek we start with the leader and walk nr
  * threads past it.
  */
-static struct task_struct *first_tid(struct pid *pid, int tid,
-					int nr, struct pid_namespace *ns)
+static struct task_struct *first_tid(struct pid *pid, int tid, loff_t f_pos,
+					struct pid_namespace *ns)
 {
 	struct task_struct *pos, *task;
+	unsigned long nr = f_pos;
+
+	if (nr != f_pos)	/* 32bit overflow? */
+		return NULL;
 
 	rcu_read_lock();
 	task = pid_task(pid, PIDTYPE_PID);
@@ -3189,14 +3193,14 @@ static struct task_struct *first_tid(struct pid *pid, int tid,
 		goto fail;
 
 	/* Attempt to start with the tid of a thread */
-	if (tid && (nr > 0)) {
+	if (tid && nr) {
 		pos = find_task_by_pid_ns(tid, ns);
 		if (pos && same_thread_group(pos, task))
 			goto found;
 	}
 
 	/* If nr exceeds the number of threads there is nothing todo */
-	if (nr && nr >= get_nr_threads(task))
+	if (nr >= get_nr_threads(task))
 		goto fail;
 
 	/* If we haven't found our starting place yet start
@@ -3204,7 +3208,7 @@ static struct task_struct *first_tid(struct pid *pid, int tid,
 	 */
 	pos = task = task->group_leader;
 	do {
-		if (nr-- <= 0)
+		if (!nr--)
 			goto found;
 	} while_each_thread(task, pos);
 fail:
@@ -3261,7 +3265,7 @@ static int proc_task_readdir(struct file * filp, void * dirent, filldir_t filldi
 	if (proc_inode_is_dead(inode))
 		return -ENOENT;
 
-	switch ((unsigned long)filp->f_pos) {
+	switch (filp->f_pos) {
 	case 0:
 		ino = inode->i_ino;
 		if (filldir(dirent, ".", 1, filp->f_pos, ino, DT_DIR) < 0)
-- 
1.5.5.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 3/4] proc: simplify proc_task_readdir/first_tid paths
  2013-06-03 19:07 ` [PATCH v2 3/4] proc: simplify proc_task_readdir/first_tid paths Oleg Nesterov
@ 2013-06-03 22:06   ` Eric W. Biederman
  0 siblings, 0 replies; 14+ messages in thread
From: Eric W. Biederman @ 2013-06-03 22:06 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Andrew Morton, Michal Hocko, Sergey Dyasly, linux-kernel

Oleg Nesterov <oleg@redhat.com> writes:

> proc_task_readdir() does not really need "leader", first_tid()
> has to revalidate it anyway. Just pass proc_pid(inode) to
> first_tid() instead, it can do pid_task(PIDTYPE_PID) itself
> and read ->group_leader only if necessary.
>
> The patch also extracts the "inode is dead" code
> from pid_delete_dentry(dentry) into the new trivial helper,
> proc_inode_is_dead(inode), proc_task_readdir() uses it to return
> -ENOENT if this dir was removed. This is a bit racy, but the race
> is very inlikely and the getdents() after openndir() can see the
> empty "." + ".." dir only once.

This version looks good.  I especially like the factoring out of
proc_inode_is_dead, that makes the purpose of that code much clearer.

Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
>  fs/proc/base.c |   53 ++++++++++++++++++++++-------------------------------
>  1 files changed, 22 insertions(+), 31 deletions(-)
>
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index bed1096..5e0e02f 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -1652,13 +1652,18 @@ int pid_revalidate(struct dentry *dentry, unsigned int flags)
>  	return 0;
>  }
>  
> +static inline bool proc_inode_is_dead(struct inode *inode)
> +{
> +	return !proc_pid(inode)->tasks[PIDTYPE_PID].first;
> +}
> +
>  int pid_delete_dentry(const struct dentry *dentry)
>  {
>  	/* Is the task we represent dead?
>  	 * If so, then don't put the dentry on the lru list,
>  	 * kill it immediately.
>  	 */
> -	return !proc_pid(dentry->d_inode)->tasks[PIDTYPE_PID].first;
> +	return proc_inode_is_dead(dentry->d_inode);
>  }
>  
>  const struct dentry_operations pid_dentry_operations =
> @@ -3173,34 +3178,35 @@ out_no_task:
>   * In the case of a seek we start with the leader and walk nr
>   * threads past it.
>   */
> -static struct task_struct *first_tid(struct task_struct *leader,
> -		int tid, int nr, struct pid_namespace *ns)
> +static struct task_struct *first_tid(struct pid *pid, int tid,
> +					int nr, struct pid_namespace *ns)
>  {
> -	struct task_struct *pos;
> +	struct task_struct *pos, *task;
>  
>  	rcu_read_lock();
> -	/* Attempt to start with the pid of a thread */
> +	task = pid_task(pid, PIDTYPE_PID);
> +	if (!task)
> +		goto fail;
> +
> +	/* Attempt to start with the tid of a thread */
>  	if (tid && (nr > 0)) {
>  		pos = find_task_by_pid_ns(tid, ns);
> -		if (pos && (pos->group_leader == leader))
> +		if (pos && same_thread_group(pos, task))
>  			goto found;
>  	}
>  
>  	/* If nr exceeds the number of threads there is nothing todo */
> -	if (nr && nr >= get_nr_threads(leader))
> -		goto fail;
> -	/* It could be unhashed before we take rcu lock */
> -	if (!pid_alive(leader))
> +	if (nr && nr >= get_nr_threads(task))
>  		goto fail;
>  
>  	/* If we haven't found our starting place yet start
>  	 * with the leader and walk nr threads forward.
>  	 */
> -	pos = leader;
> +	pos = task = task->group_leader;
>  	do {
>  		if (nr-- <= 0)
>  			goto found;
> -	} while_each_thread(leader, pos);
> +	} while_each_thread(task, pos);
>  fail:
>  	pos = NULL;
>  	goto out;
> @@ -3247,26 +3253,13 @@ static int proc_task_readdir(struct file * filp, void * dirent, filldir_t filldi
>  {
>  	struct dentry *dentry = filp->f_path.dentry;
>  	struct inode *inode = dentry->d_inode;
> -	struct task_struct *leader = NULL;
>  	struct task_struct *task;
> -	int retval = -ENOENT;
>  	ino_t ino;
>  	int tid;
>  	struct pid_namespace *ns;
>  
> -	task = get_proc_task(inode);
> -	if (!task)
> -		goto out_no_task;
> -	rcu_read_lock();
> -	if (pid_alive(task)) {
> -		leader = task->group_leader;
> -		get_task_struct(leader);
> -	}
> -	rcu_read_unlock();
> -	put_task_struct(task);
> -	if (!leader)
> -		goto out_no_task;
> -	retval = 0;
> +	if (proc_inode_is_dead(inode))
> +		return -ENOENT;
>  
>  	switch ((unsigned long)filp->f_pos) {
>  	case 0:
> @@ -3289,7 +3282,7 @@ static int proc_task_readdir(struct file * filp, void * dirent, filldir_t filldi
>  	ns = filp->f_dentry->d_sb->s_fs_info;
>  	tid = (int)filp->f_version;
>  	filp->f_version = 0;
> -	for (task = first_tid(leader, tid, filp->f_pos - 2, ns);
> +	for (task = first_tid(proc_pid(inode), tid, filp->f_pos - 2, ns);
>  	     task;
>  	     task = next_tid(task), filp->f_pos++) {
>  		tid = task_pid_nr_ns(task, ns);
> @@ -3302,9 +3295,7 @@ static int proc_task_readdir(struct file * filp, void * dirent, filldir_t filldi
>  		}
>  	}
>  out:
> -	put_task_struct(leader);
> -out_no_task:
> -	return retval;
> +	return 0;
>  }
>  
>  static int proc_task_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 4/4] proc: avoid ->f_pos overflows in proc_task_readdir() paths
  2013-06-03 19:07 ` [PATCH v2 4/4] proc: avoid ->f_pos overflows in proc_task_readdir() paths Oleg Nesterov
@ 2013-06-03 22:18   ` Eric W. Biederman
  2013-06-04 17:14     ` Oleg Nesterov
  2013-06-04  0:58   ` Al Viro
  1 sibling, 1 reply; 14+ messages in thread
From: Eric W. Biederman @ 2013-06-03 22:18 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Andrew Morton, Michal Hocko, Sergey Dyasly, linux-kernel

Oleg Nesterov <oleg@redhat.com> writes:

> 1. proc_task_readdir() truncates f_pos to long, this can lead
>    to wrong result on 32bit.
>
> 2. first_tid() truncates f_pos to int, this is wrong even on
>    64bit.
>
>    We could check that f_pos < PID_MAX or even INT_MAX in
>    proc_task_readdir(), but this patch simply checks the
>    potential overflow in first_tid(), this check is nop on
>    64bit. We do not care if it was negative and the new
>    unsigned value is huge, all we need to ensure is that we
>    never wrongly return !NULL.
>
> 3. Remove the 2nd "nr != 0" check before get_nr_threads(),
>    nr_threads == 0 is not distinguishable from !pid_task()
>    above.

This won't compile on some 32bit architectures like x86-32.

switch(unsigned long long) requires helpers that the kernel does not
included.  Or at least the kernel has not included because such code
is a problem.  In fact that is the reason Linus put the case to
unsigned long in there.

There is another bug in here as well that we may return really crazy
things in the case of seek simultaneous with readdir.

I do like your overflow check, but unfortunately I think it is
susceptible to races with lseek.

Simply to avoid lseek non-sense I think we really need to put f_pos
in a local variable.  If the code continues to evolve our heads
will like to explode trying to think about what happens when someone
modifies f_pos while we are reading it/modifying it.

> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
>  fs/proc/base.c |   16 ++++++++++------
>  1 files changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index 5e0e02f..5598cfa 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -3178,10 +3178,14 @@ out_no_task:
>   * In the case of a seek we start with the leader and walk nr
>   * threads past it.
>   */
> -static struct task_struct *first_tid(struct pid *pid, int tid,
> -					int nr, struct pid_namespace *ns)
> +static struct task_struct *first_tid(struct pid *pid, int tid, loff_t f_pos,
> +					struct pid_namespace *ns)
>  {
>  	struct task_struct *pos, *task;
> +	unsigned long nr = f_pos;
> +
> +	if (nr != f_pos)	/* 32bit overflow? */
> +		return NULL;
>  
>  	rcu_read_lock();
>  	task = pid_task(pid, PIDTYPE_PID);
> @@ -3189,14 +3193,14 @@ static struct task_struct *first_tid(struct pid *pid, int tid,
>  		goto fail;
>  
>  	/* Attempt to start with the tid of a thread */
> -	if (tid && (nr > 0)) {
> +	if (tid && nr) {
>  		pos = find_task_by_pid_ns(tid, ns);
>  		if (pos && same_thread_group(pos, task))
>  			goto found;
>  	}
>  
>  	/* If nr exceeds the number of threads there is nothing todo */
> -	if (nr && nr >= get_nr_threads(task))
> +	if (nr >= get_nr_threads(task))
>  		goto fail;
>  
>  	/* If we haven't found our starting place yet start
> @@ -3204,7 +3208,7 @@ static struct task_struct *first_tid(struct pid *pid, int tid,
>  	 */
>  	pos = task = task->group_leader;
>  	do {
> -		if (nr-- <= 0)
> +		if (!nr--)
>  			goto found;
>  	} while_each_thread(task, pos);
>  fail:
> @@ -3261,7 +3265,7 @@ static int proc_task_readdir(struct file * filp, void * dirent, filldir_t filldi
>  	if (proc_inode_is_dead(inode))
>  		return -ENOENT;
>  
> -	switch ((unsigned long)filp->f_pos) {
> +	switch (filp->f_pos) {
>  	case 0:
>  		ino = inode->i_ino;
>  		if (filldir(dirent, ".", 1, filp->f_pos, ino, DT_DIR) < 0)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 4/4] proc: avoid ->f_pos overflows in proc_task_readdir() paths
  2013-06-03 19:07 ` [PATCH v2 4/4] proc: avoid ->f_pos overflows in proc_task_readdir() paths Oleg Nesterov
  2013-06-03 22:18   ` Eric W. Biederman
@ 2013-06-04  0:58   ` Al Viro
  2013-06-04 17:35     ` Oleg Nesterov
  1 sibling, 1 reply; 14+ messages in thread
From: Al Viro @ 2013-06-04  0:58 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Eric W. Biederman, Michal Hocko, Sergey Dyasly,
	linux-kernel

On Mon, Jun 03, 2013 at 09:07:05PM +0200, Oleg Nesterov wrote:
> 1. proc_task_readdir() truncates f_pos to long, this can lead
>    to wrong result on 32bit.
> 
> 2. first_tid() truncates f_pos to int, this is wrong even on
>    64bit.
> 
>    We could check that f_pos < PID_MAX or even INT_MAX in
>    proc_task_readdir(), but this patch simply checks the
>    potential overflow in first_tid(), this check is nop on
>    64bit. We do not care if it was negative and the new
>    unsigned value is huge, all we need to ensure is that we
>    never wrongly return !NULL.
> 
> 3. Remove the 2nd "nr != 0" check before get_nr_threads(),
>    nr_threads == 0 is not distinguishable from !pid_task()
>    above.

Oleg, please take a look at the series in vfs.git#experimental; at the very
least, we don't want to access file->f_pos in any foo_readdir() - it's too
messy and race-prone.  It's pretty much independent from the issues you
are dealing with, but let's avoid creating pointless conflicts...

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 4/4] proc: avoid ->f_pos overflows in proc_task_readdir() paths
  2013-06-03 22:18   ` Eric W. Biederman
@ 2013-06-04 17:14     ` Oleg Nesterov
  2013-06-04 17:39       ` Al Viro
  0 siblings, 1 reply; 14+ messages in thread
From: Oleg Nesterov @ 2013-06-04 17:14 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Morton, Michal Hocko, Sergey Dyasly, linux-kernel

On 06/03, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg@redhat.com> writes:
>
> > 1. proc_task_readdir() truncates f_pos to long, this can lead
> >    to wrong result on 32bit.
> >
> This won't compile on some 32bit architectures like x86-32.
>
> switch(unsigned long long) requires helpers that the kernel does not
> included.  Or at least the kernel has not included because such code
> is a problem.  In fact that is the reason Linus put the case to
> unsigned long in there.

Hmm, ee568b25, suprise... I am wondering if the kernel still supports
the compilers which needs __cmpdi2 in this case...

But this doesn't matter. The patch should not blindly revert ee568b25,
thanks!

> There is another bug in here as well that we may return really crazy
> things in the case of seek simultaneous with readdir.

i_mutex? both vfs_readdir() and default_llseek() take it... Anyway, this
is another issue.

> I think we really need to put f_pos
> in a local variable.

Heh. I swear, this is what I did initially. But I was afraid you will
blame this change as "overcomplicated" ;)

OK, I'll redo/resend this one later.

Oleg.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 0/4] proc: first_tid() fix/cleanup
  2013-06-03 19:06 [PATCH v2 0/4] proc: first_tid() fix/cleanup Oleg Nesterov
                   ` (3 preceding siblings ...)
  2013-06-03 19:07 ` [PATCH v2 4/4] proc: avoid ->f_pos overflows in proc_task_readdir() paths Oleg Nesterov
@ 2013-06-04 17:32 ` Oleg Nesterov
  4 siblings, 0 replies; 14+ messages in thread
From: Oleg Nesterov @ 2013-06-04 17:32 UTC (permalink / raw)
  To: Andrew Morton, Al Viro
  Cc: Eric W. Biederman, Michal Hocko, Sergey Dyasly, linux-kernel

Andrew,

On 06/03, Oleg Nesterov wrote:
>
> next_thread() should be avoided, probably next_tid() is the
> only "valid" user.
>
> But now we have another reason to avoid (and probably even kill)
> it, we are going to replace or fix while_each_thread(), almost
> every lockless usage is wrong.
>
> Changes:
>
> 	1/4: Update the changelog, do not move the comment.
>
> 	2/4: No changes.

So these two are fine, but please ignore 3 and 4.

> 	3/4: Update the comment following the explanations from
> 	     Eric.
>
> 	     Eric pointed that get_proc_task() without rcu lock
> 	     can trigger the (bogus) warning. Extract the similar
> 	     check from pid_delete_dentry() into the new helper
> 	     and use it instead.
>
> 	     I didn't dare to preserve his ack, but the only change
> 	     is the new proc_inode_is_dead() helper and
>
> 		- if (pid_task(proc_pid(inode))
> 		+ if (proc_inode_is_dead(inode))
>
> 	     in proc_task_readdir().

Looks like a good cleanup and it was acked, but conflicts (textually)
with viro/vfs.git#experimental.

> 	4/4: New.

Should be updated and conflicts too.

Oleg.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 4/4] proc: avoid ->f_pos overflows in proc_task_readdir() paths
  2013-06-04  0:58   ` Al Viro
@ 2013-06-04 17:35     ` Oleg Nesterov
  0 siblings, 0 replies; 14+ messages in thread
From: Oleg Nesterov @ 2013-06-04 17:35 UTC (permalink / raw)
  To: Al Viro
  Cc: Andrew Morton, Eric W. Biederman, Michal Hocko, Sergey Dyasly,
	linux-kernel

On 06/04, Al Viro wrote:
>
> On Mon, Jun 03, 2013 at 09:07:05PM +0200, Oleg Nesterov wrote:
> > 1. proc_task_readdir() truncates f_pos to long, this can lead
> >    to wrong result on 32bit.
> >
> > 2. first_tid() truncates f_pos to int, this is wrong even on
> >    64bit.
> >
> >    We could check that f_pos < PID_MAX or even INT_MAX in
> >    proc_task_readdir(), but this patch simply checks the
> >    potential overflow in first_tid(), this check is nop on
> >    64bit. We do not care if it was negative and the new
> >    unsigned value is huge, all we need to ensure is that we
> >    never wrongly return !NULL.
> >
> > 3. Remove the 2nd "nr != 0" check before get_nr_threads(),
> >    nr_threads == 0 is not distinguishable from !pid_task()
> >    above.
>
> Oleg, please take a look at the series in vfs.git#experimental; at the very
> least, we don't want to access file->f_pos in any foo_readdir() - it's too
> messy and race-prone.  It's pretty much independent from the issues you
> are dealing with, but let's avoid creating pointless conflicts...

Yes, thanks.

Sadly, 3/4 conflicts with 08c35e10 too. I'll rediff/resend this
cleanup later then.

Oleg.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 4/4] proc: avoid ->f_pos overflows in proc_task_readdir() paths
  2013-06-04 17:14     ` Oleg Nesterov
@ 2013-06-04 17:39       ` Al Viro
  2013-06-04 19:57         ` Oleg Nesterov
  0 siblings, 1 reply; 14+ messages in thread
From: Al Viro @ 2013-06-04 17:39 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Eric W. Biederman, Andrew Morton, Michal Hocko, Sergey Dyasly,
	linux-kernel

On Tue, Jun 04, 2013 at 07:14:35PM +0200, Oleg Nesterov wrote:
> > There is another bug in here as well that we may return really crazy
> > things in the case of seek simultaneous with readdir.
> 
> i_mutex? both vfs_readdir() and default_llseek() take it... Anyway, this
> is another issue.

That's part of the reasons for the series I've mentioned (vfs.git#experimental);
seriously, check that up.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 4/4] proc: avoid ->f_pos overflows in proc_task_readdir() paths
  2013-06-04 17:39       ` Al Viro
@ 2013-06-04 19:57         ` Oleg Nesterov
  2013-06-04 21:06           ` Al Viro
  0 siblings, 1 reply; 14+ messages in thread
From: Oleg Nesterov @ 2013-06-04 19:57 UTC (permalink / raw)
  To: Al Viro
  Cc: Eric W. Biederman, Andrew Morton, Michal Hocko, Sergey Dyasly,
	linux-kernel

On 06/04, Al Viro wrote:
>
> On Tue, Jun 04, 2013 at 07:14:35PM +0200, Oleg Nesterov wrote:
> > > There is another bug in here as well that we may return really crazy
> > > things in the case of seek simultaneous with readdir.
> >
> > i_mutex? both vfs_readdir() and default_llseek() take it... Anyway, this
> > is another issue.
>
> That's part of the reasons for the series I've mentioned (vfs.git#experimental);
> seriously, check that up.

Will do... but so far I am confused.

I do not see how they could race (I mean /proc/pid/task only). OK, OK,
the usage of ->f_pos in sys_getdents() looks "obviously wrong", but this
is another story? And "put f_pos in a local variable" can't help.

But if you meant other problems with readdir in general, then I seem
to understand, thanks.

Oleg.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 4/4] proc: avoid ->f_pos overflows in proc_task_readdir() paths
  2013-06-04 19:57         ` Oleg Nesterov
@ 2013-06-04 21:06           ` Al Viro
  0 siblings, 0 replies; 14+ messages in thread
From: Al Viro @ 2013-06-04 21:06 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Eric W. Biederman, Andrew Morton, Michal Hocko, Sergey Dyasly,
	linux-kernel

On Tue, Jun 04, 2013 at 09:57:00PM +0200, Oleg Nesterov wrote:
> Will do... but so far I am confused.
> 
> I do not see how they could race (I mean /proc/pid/task only). OK, OK,
> the usage of ->f_pos in sys_getdents() looks "obviously wrong", but this
> is another story? And "put f_pos in a local variable" can't help.

For one thing, a bunch of directories use generic_file_llseek(), which
does *not* use ->i_mutex.  For another, there's a very unpleasant problem
with read(2) (failing) attempt racing with ->f_pos modifications in
->readdir().  Take a look at sys_read() and note that it is done with no
serialization at all (not in the top level, that is) and that it puts the
(unmodified by generic_read_dir()) value of pos back into file->f_pos as
soon as vfs_read() passes -EISDIR (returned by generic_read_dir()) back to
sys_read().

I.e. ->f_pos is silently reset back to the value it used to have on the
entry to read(2).  Despite foo_readdir() assumptions that it won't be
changed behind its back.

Reset itself wouldn't be a problem - if several threads mess with read()
on the same opened file in parallel, you are not promised anything good
about the resulting IO pointer position.  The same applies here.  However,
many ->readdir() instances use file->f_pos as a variable they can use for
internal needs and _that_ leads to very unpleasant races.

The sane solution is to do what ->read()/->write()/etc. are doing - pass
an address of local copy of ->f_pos, so they are able to use it without
worrying about concurrent modifications of that value.  That obviously
solves all problems with generic_file_lseek(), etc., as well as this
sys_read() shite.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2013-06-04 21:06 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-06-03 19:06 [PATCH v2 0/4] proc: first_tid() fix/cleanup Oleg Nesterov
2013-06-03 19:06 ` [PATCH v2 1/4] proc: first_tid: fix the potential use-after-free Oleg Nesterov
2013-06-03 19:07 ` [PATCH v2 2/4] proc: change first_tid() to use while_each_thread() Oleg Nesterov
2013-06-03 19:07 ` [PATCH v2 3/4] proc: simplify proc_task_readdir/first_tid paths Oleg Nesterov
2013-06-03 22:06   ` Eric W. Biederman
2013-06-03 19:07 ` [PATCH v2 4/4] proc: avoid ->f_pos overflows in proc_task_readdir() paths Oleg Nesterov
2013-06-03 22:18   ` Eric W. Biederman
2013-06-04 17:14     ` Oleg Nesterov
2013-06-04 17:39       ` Al Viro
2013-06-04 19:57         ` Oleg Nesterov
2013-06-04 21:06           ` Al Viro
2013-06-04  0:58   ` Al Viro
2013-06-04 17:35     ` Oleg Nesterov
2013-06-04 17:32 ` [PATCH v2 0/4] proc: first_tid() fix/cleanup Oleg Nesterov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).