From: Ben Blum <bblum@andrew.cmu.edu>
To: NeilBrown <neilb@suse.de>
Cc: paulmck@linux.vnet.ibm.com, Ben Blum <bblum@andrew.cmu.edu>,
	Paul Menage <menage@google.com>, Li Zefan <lizf@cn.fujitsu.com>,
	Oleg Nesterov <oleg@tv-sign.ru>,
	containers@lists.linux-foundation.org,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: Possible race between cgroup_attach_proc and de_thread, and questionable code in de_thread.
Date: Thu, 28 Jul 2011 02:26:16 -0400
Message-ID: <20110728062616.GC15204@unix33.andrew.cmu.edu>
In-Reply-To: <20110728110813.7ff84b13@notabene.brown>

On Thu, Jul 28, 2011 at 11:08:13AM +1000, NeilBrown wrote:
> On Wed, 27 Jul 2011 16:42:35 -0700 "Paul E. McKenney"
> <paulmck@linux.vnet.ibm.com> wrote:
> 
> > On Wed, Jul 27, 2011 at 11:07:10AM -0400, Ben Blum wrote:
> > > On Wed, Jul 27, 2011 at 05:11:01PM +1000, NeilBrown wrote:
> > 
> > [ . . . ]
> > 
> > > >  The race as I understand it is with this code:
> > > > 
> > > > 
> > > > 		list_replace_rcu(&leader->tasks, &tsk->tasks);
> > > > 		list_replace_init(&leader->sibling, &tsk->sibling);
> > > > 
> > > > 		tsk->group_leader = tsk;
> > > > 		leader->group_leader = tsk;
> > > > 
> > > > 
> > > >  which seems to be called with only tasklist_lock held, which doesn't seem to
> > > >  be held in the cgroup code.
> > > > 
> > > >  If the "thread_group_leader(leader)" call in cgroup_attach_proc() runs before
> > > >  this chunk is run with the same value for 'leader', but the
> > > >  while_each_thread is run after, then the while_each_thread() might loop
> > > >  forever.  rcu_read_lock doesn't prevent this from happening.
> > > 
> > > Somehow I was under the impression that holding tasklist_lock (for
> > > writing) provided exclusion from code that holds rcu_read_lock -
> > > probably because there are other points in the kernel which do
> > > while_each_thread with only RCU-read held (and not tasklist):
> > > 
> > > - kernel/hung_task.c, check_hung_uninterruptible_tasks()
> > 
> > This one looks OK to me.  The code is just referencing fields in each
> > of the task structures, and appears to be making proper use of
> > rcu_dereference().  All this code requires is that the task structures
> > remain in existence through the full lifetime of the RCU read-side
> > critical section, which is guaranteed because of the way the task_struct
> > is freed.
> 
> I disagree.  It also requires - by virtue of the use of while_each_thread() -
> that 'g' remains on the list that 't' is walking along.
> 
> Now for a normal list, the head always stays on the list and is accessible
> even from an rcu-removed entry.  But the thread_group list isn't a normal
> list.  It doesn't have a distinct head.  It is a loop of all of the
> 'task_structs' in a thread group.  One of them is designated the 'leader' but
> de_thread() can change the 'leader' - though it doesn't remove the old leader.
> 
> __unhash_process in kernel/exit.c looks like it could remove the leader from the
> list and definitely could remove a non-leader.
> 
> So if a non-leader calls 'exec' and the leader calls 'exit', then a
> task_struct that was the leader could become a non-leader and then be removed
> from the list that kernel/hung_task could be walking along.

That agrees with my understanding.
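
To make the failure mode concrete, here is a minimal userspace model. It is not kernel code: struct task, ring_init, ring_del_rcu and walk_from are names made up for this example, and ->next stands in for next_thread(). The thread group is modelled as a headless ring. The old leader g is unlinked the way list_del_rcu() does it, with its neighbours skipping it and its ->next left intact. A walker that started on g can then step back into the ring but never reaches g again.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of the thread_group ring: every task sits
 * on one circular list and there is no separate list head. */
struct task {
	int id;
	struct task *next;
	struct task *prev;
};

/* Link n tasks into a ring: t[0] -> t[1] -> ... -> t[n-1] -> t[0]. */
void ring_init(struct task *t, int n)
{
	assert(n > 0);
	for (int i = 0; i < n; i++) {
		t[i].id = i;
		t[i].next = &t[(i + 1) % n];
		t[i].prev = &t[(i + n - 1) % n];
	}
}

/* Unlink p roughly as list_del_rcu() does: its neighbours skip it, but
 * p->next is left intact so a reader standing on p can still advance. */
void ring_del_rcu(struct task *p)
{
	p->prev->next = p->next;
	p->next->prev = p->prev;
	p->prev = NULL;	/* the kernel poisons ->prev; ->next is kept */
}

/* Models "t = g; do { body } while_each_thread(g, t);".  Returns the
 * number of loop-body executions, or -1 once max_steps is exceeded,
 * meaning the walk never got back to g and would spin forever. */
int walk_from(struct task *g, int max_steps)
{
	struct task *t = g;
	int visited = 0;

	do {
		if (++visited > max_steps)
			return -1;
	} while ((t = t->next) != g);
	return visited;
}
```

With three tasks, walk_from(&t[0], 10) returns 3. After ring_del_rcu(&t[0]), which models the old leader losing leadership to an exec'ing thread and then exiting, the same walk returns -1. At that point t[1] and t[2] point only at each other.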

> 
> So I don't think that while_each_thread() is currently safe.  It depends on
> the thread leader not disappearing and I think it can.

I think that while_each_thread is perfectly safe; it just needs to be
protected properly while it is used. It reads the tasklist, and both
competing paths (__unhash_process and de_thread) run with tasklist_lock
write-locked, so holding it for reading ought to suffice. All it needs
is better documentation.
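
A rough sketch of that locking discipline, as a userspace model under stated assumptions: a pthread rwlock stands in for tasklist_lock, and count_group, become_leader and unhash are hypothetical functions loosely modelled on a while_each_thread walker, the leader swap in de_thread and __unhash_process. The key point is that the reader picks g and walks the ring inside a single read-side section. Neither writer can therefore run between those two steps.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

/* Stand-in for tasklist_lock (an assumption of this model): exec and exit
 * paths take it for writing, thread-group walkers take it for reading. */
pthread_rwlock_t tasklist_lock = PTHREAD_RWLOCK_INITIALIZER;

struct task {
	struct task *next, *prev;	/* circular thread_group ring */
	struct task *group_leader;
};

/* Reader: choose g and walk the whole ring inside ONE read-side section,
 * so no writer can swap or unhash the leader between the two steps. */
int count_group(struct task *p)
{
	int n = 0;

	pthread_rwlock_rdlock(&tasklist_lock);
	struct task *g = p->group_leader, *t = g;
	do {
		n++;
	} while ((t = t->next) != g);
	pthread_rwlock_unlock(&tasklist_lock);
	return n;
}

/* Writer, loosely modelled on the leader swap in de_thread(). */
void become_leader(struct task *tsk)
{
	pthread_rwlock_wrlock(&tasklist_lock);
	struct task *old = tsk->group_leader;
	assert(old != tsk);	/* only a non-leader takes this path */
	old->group_leader = tsk;
	tsk->group_leader = tsk;
	pthread_rwlock_unlock(&tasklist_lock);
}

/* Writer, loosely modelled on __unhash_process(): drop p from the ring. */
void unhash(struct task *p)
{
	pthread_rwlock_wrlock(&tasklist_lock);
	p->prev->next = p->next;
	p->next->prev = p->prev;
	pthread_rwlock_unlock(&tasklist_lock);
}
```

Consider a two-task group {a (leader), b} in which b execs. count_group(&b) gives 2. After become_leader(&b), count_group(&a) still gives 2, because a is still hashed. After unhash(&a), count_group(&b) gives 1. A reader sees one of these consistent states, never a mix.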

> [...]
> 
> +/* Thread group leader can change, so stop loop when we see one
> + * even if it isn't 'g' */
>  #define while_each_thread(g, t) \
> -	while ((t = next_thread(t)) != g)
> +	while ((t = next_thread(t)) != g && !thread_group_leader(t))

This is semantically wrong: it stops as soon as it finds a thread that
has newly become the leader, without running the loop body for that
thread. So the thread that just exec'd would be skipped, and in the
case of my code it would "escape" the cgroup migration.
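
Here is a small userspace model of that skip. struct task, visit_group and the two macro names are made up for this example, and ->next and ->is_leader stand in for next_thread() and thread_group_leader(). By the time de_thread swaps leaders, only the old leader a and the exec'ing thread b are left. The walker had already cached g = a, and b has since become the leader.

```c
#include <assert.h>
#include <stdbool.h>

struct task {
	int id;
	bool is_leader;		/* models thread_group_leader(t) */
	struct task *next;	/* circular thread_group ring */
};

/* Existing behaviour: stop only when the walk is back at g. */
#define while_each_thread_old(g, t) \
	while ((t = (t)->next) != (g))

/* Proposed behaviour from the quoted patch: also stop at any leader. */
#define while_each_thread_new(g, t) \
	while ((t = (t)->next) != (g) && !(t)->is_leader)

/* Run the loop body (here: mark the task as migrated) over the group
 * starting at g, using either loop condition. */
void visit_group(struct task *g, bool use_new, bool seen[])
{
	struct task *t = g;

	assert(g && seen);
	if (use_new) {
		do { seen[t->id] = true; } while_each_thread_new(g, t);
	} else {
		do { seen[t->id] = true; } while_each_thread_old(g, t);
	}
}
```

With the old condition both a and b are marked. With the proposed one the loop ends on reaching b, before b's body runs, so b is never marked.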

But I argue it is also organisationally wrong. The purpose of
while_each_thread is only to deal with the structure of the process
list, not to account for behavioural details of de_thread. This check
belongs outside the macro, and it should be protected by tasklist_lock
in the same critical section in which while_each_thread is used.
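
A minimal sketch of that shape, as a userspace model rather than a patch. A pthread rwlock stands in for tasklist_lock. attach_group and is_leader are hypothetical, with is_leader modelling thread_group_leader() as p->group_leader == p. Returning -EAGAIN so the caller can retry is one possible policy, assumed here. The leader check and the walk share one read-side section, so no exec or exit can change the leader between them.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

pthread_rwlock_t tasklist_lock = PTHREAD_RWLOCK_INITIALIZER;

struct task {
	int id;
	struct task *next;		/* circular thread_group ring */
	struct task *group_leader;
};

/* Model of thread_group_leader(): p leads its group iff it is its own leader. */
bool is_leader(const struct task *p)
{
	return p->group_leader == p;
}

/* Check and walk under the same read lock.  While the lock is held no
 * writer can swap or unhash the leader, so if 'leader' passes the check
 * the walk returns to it and the body runs for every thread. */
int attach_group(struct task *leader, bool seen[])
{
	assert(leader && seen);
	pthread_rwlock_rdlock(&tasklist_lock);
	if (!is_leader(leader)) {
		/* a sub-thread exec'd and took over; let the caller retry */
		pthread_rwlock_unlock(&tasklist_lock);
		return -EAGAIN;
	}
	struct task *t = leader;
	do {
		seen[t->id] = true;	/* stands in for migrating t */
	} while ((t = t->next) != leader);
	pthread_rwlock_unlock(&tasklist_lock);
	return 0;
}
```

The caller would look up the current leader again and retry on -EAGAIN. The macro itself stays purely about list structure.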

-- Ben

>  
>  static inline int get_nr_threads(struct task_struct *tsk)
>  {
> diff --git a/kernel/exit.c b/kernel/exit.c
> index f2b321b..d6cef25 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -70,8 +70,13 @@ static void __unhash_process(struct task_struct *p, bool group_dead)
>  		list_del_rcu(&p->tasks);
>  		list_del_init(&p->sibling);
>  		__this_cpu_dec(process_counts);
> -	}
> -	list_del_rcu(&p->thread_group);
> +	} else
> +		/* only remove members from the thread group.
> +		 * The thread group leader must stay so that
> +		 * while_each_thread() uses can see the end of
> +		 * the list and stop.
> +		 */
> +		list_del_rcu(&p->thread_group);
>  }
>  
>  /*
> 
> 

