linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Q: cgroup: Questions about possible issues in cgroup locking
@ 2011-12-21  3:43 Frederic Weisbecker
  2011-12-21 13:08 ` Oleg Nesterov
  0 siblings, 1 reply; 29+ messages in thread
From: Frederic Weisbecker @ 2011-12-21  3:43 UTC (permalink / raw)
  To: Li Zefan, Tejun Heo
  Cc: LKML, Mandeep Singh Baines, Containers, Cgroups,
	KAMEZAWA Hiroyuki, Oleg Nesterov, Paul Menage, Andrew Morton,
	Paul E. McKenney

Hi,

Starring at some parts of cgroups, I have a few questions:

- Is cgroup_enable_task_cg_list()'s while_each_thread() safe
against concurrent exec()? The leader may change in de_thread()
and invalidate the test done in while_each_thread().

- do_each_thread() also needs RCU and cgroup_enable_task_cg_list()
seems to remind it. But it seems there is at least one caller that
doesn't call rcu_read_lock(): update_cpu_mask() -> update_tasks_cpumask() -> cgroup_scan_tasks()

- By the time we call cgroup_post_fork(), it is ready to be woken up
and usable by the scheduler. I was thinking about a possible race
where somebody kills the child just before we call cgroup_post_fork()
and then it ends up calling cgroup_exit(). In this case should we perhaps
check for PF_EXITING. Of course I'm talking about a race that will certainly
never happen.
If I'm mistaken and the task can not exit concurrently, probably we don't need
the task_lock() there.

- Is the check for use_task_css_set_links in cgroup_post_fork() safe? given
it is checked outside css_set_lock?

Imagine this:

CPU 0                                                        CPU 1
----                                                         -----

cgroup_enable_task_cg() {                            
	uset_tasks_css_set_links = 1
	for_each_thread() {
		add tasks in the list
	}
}
                                                           do_fork() {
                                                               cgroup_post_fork() {
                                                                     use_tasks_css_set_links appears
                                                                     to be equal to 0 due to write/read
                                                                      not flushed. New task won't
                                                                      appear to the list.


- Is the read_lock() in cgroup_attach_proc() still necessary? It was added to protect
against exec() but now I think it is useless. rcu_read_lock() would be necessary to protect
against group_entry removal though I think.

Thanks.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Q: cgroup: Questions about possible issues in cgroup locking
  2011-12-21  3:43 Q: cgroup: Questions about possible issues in cgroup locking Frederic Weisbecker
@ 2011-12-21 13:08 ` Oleg Nesterov
  2011-12-21 17:56   ` Frederic Weisbecker
                     ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Oleg Nesterov @ 2011-12-21 13:08 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Li Zefan, Tejun Heo, LKML, Mandeep Singh Baines, Containers,
	Cgroups, KAMEZAWA Hiroyuki, Paul Menage, Andrew Morton,
	Paul E. McKenney

On 12/21, Frederic Weisbecker wrote:
> Hi,
>
> Starring at some parts of cgroups, I have a few questions:
>
> - Is cgroup_enable_task_cg_list()'s while_each_thread() safe
> against concurrent exec()? The leader may change in de_thread()
> and invalidate the test done in while_each_thread().

Yes. Oh, we need to do something with while_each_thread.

> - do_each_thread() also needs RCU and cgroup_enable_task_cg_list()
> seems to remind it. But it seems there is at least one caller that
> doesn't call rcu_read_lock(): update_cpu_mask() -> update_tasks_cpumask() -> cgroup_scan_tasks()

I don't see any caller which takes rcu_read_lock...

> - By the time we call cgroup_post_fork(), it is ready to be woken up
> and usable by the scheduler.

No, the new child can't run until do_fork()->wake_up_new_task().

> - Is the check for use_task_css_set_links in cgroup_post_fork() safe? given
> it is checked outside css_set_lock?
>
> Imagine this:
>
> CPU 0                                                        CPU 1
> ----                                                         -----
>
> cgroup_enable_task_cg() {
> 	uset_tasks_css_set_links = 1
> 	for_each_thread() {
> 		add tasks in the list
> 	}
> }
>                                                            do_fork() {
>                                                                cgroup_post_fork() {
>                                                                      use_tasks_css_set_links appears
>                                                                      to be equal to 0 due to write/read
>                                                                       not flushed. New task won't
>                                                                       appear to the list.

Yes, I was thinking about this too.

Or (I think) they can race "contrariwise". CPU_1 creates the new child,
then CPU_0 sets uset_tasks_css_set_links = 1. But afaics there is no any
guarantee that CPU_0 sees the result of list_add_tail_rcu().

Oleg.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Q: cgroup: Questions about possible issues in cgroup locking
  2011-12-21 13:08 ` Oleg Nesterov
@ 2011-12-21 17:56   ` Frederic Weisbecker
  2011-12-21 19:01     ` Mandeep Singh Baines
  2011-12-21 17:59   ` Frederic Weisbecker
  2012-02-01 16:28   ` Frederic Weisbecker
  2 siblings, 1 reply; 29+ messages in thread
From: Frederic Weisbecker @ 2011-12-21 17:56 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Li Zefan, Tejun Heo, LKML, Mandeep Singh Baines, Containers,
	Cgroups, KAMEZAWA Hiroyuki, Paul Menage, Andrew Morton,
	Paul E. McKenney

On Wed, Dec 21, 2011 at 02:08:48PM +0100, Oleg Nesterov wrote:
> On 12/21, Frederic Weisbecker wrote:
> > Hi,
> >
> > Starring at some parts of cgroups, I have a few questions:
> >
> > - Is cgroup_enable_task_cg_list()'s while_each_thread() safe
> > against concurrent exec()? The leader may change in de_thread()
> > and invalidate the test done in while_each_thread().
> 
> Yes. Oh, we need to do something with while_each_thread.

Would something like this work?

diff --git a/include/linux/sched.h b/include/linux/sched.h
index c0c5876..e002a00 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2293,8 +2293,12 @@ extern bool current_is_single_threaded(void);
 #define do_each_thread(g, t) \
 	for (g = t = &init_task ; (g = t = next_task(g)) != &init_task ; ) do
 
-#define while_each_thread(g, t) \
-	while ((t = next_thread(t)) != g)
+#define while_each_thread(g, t)					\
+	while (({						\
+		struct task_struct *__prev = t;			\
+		t = next_thread(t);				\
+		t != __prev && t != g;				\
+	}))
 
 static inline int get_nr_threads(struct task_struct *tsk)
 {

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: Q: cgroup: Questions about possible issues in cgroup locking
  2011-12-21 13:08 ` Oleg Nesterov
  2011-12-21 17:56   ` Frederic Weisbecker
@ 2011-12-21 17:59   ` Frederic Weisbecker
  2011-12-21 18:11     ` Oleg Nesterov
  2012-02-01 16:28   ` Frederic Weisbecker
  2 siblings, 1 reply; 29+ messages in thread
From: Frederic Weisbecker @ 2011-12-21 17:59 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Li Zefan, Tejun Heo, LKML, Mandeep Singh Baines, Containers,
	Cgroups, KAMEZAWA Hiroyuki, Paul Menage, Andrew Morton,
	Paul E. McKenney

On Wed, Dec 21, 2011 at 02:08:48PM +0100, Oleg Nesterov wrote:
> On 12/21, Frederic Weisbecker wrote:
> > - By the time we call cgroup_post_fork(), it is ready to be woken up
> > and usable by the scheduler.
> 
> No, the new child can't run until do_fork()->wake_up_new_task().

Out of curiosity, why is it not possible for a task to kill and wake up the child
before that happens?

Understanding that could potentially let us remove the task_lock() in cgroup_post_fork()
(although I need to examine that more deeply).

Thanks!

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Q: cgroup: Questions about possible issues in cgroup locking
  2011-12-21 17:59   ` Frederic Weisbecker
@ 2011-12-21 18:11     ` Oleg Nesterov
  2011-12-21 18:23       ` Frederic Weisbecker
  0 siblings, 1 reply; 29+ messages in thread
From: Oleg Nesterov @ 2011-12-21 18:11 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Li Zefan, Tejun Heo, LKML, Mandeep Singh Baines, Containers,
	Cgroups, KAMEZAWA Hiroyuki, Paul Menage, Andrew Morton,
	Paul E. McKenney

On 12/21, Frederic Weisbecker wrote:
>
> On Wed, Dec 21, 2011 at 02:08:48PM +0100, Oleg Nesterov wrote:
> > On 12/21, Frederic Weisbecker wrote:
> > > - By the time we call cgroup_post_fork(), it is ready to be woken up
> > > and usable by the scheduler.
> >
> > No, the new child can't run until do_fork()->wake_up_new_task().
>
> Out of curiosity, why is it not possible for a task to kill and wake up the child
> before that happens?

Because it is not possible to wake it up.

Please note that copy_process() creates the "deactivated" child, iow
it is not on rq.

But, at the same time its ->state == TASK_RUNNING. This "fools"
try_to_wake_up() or anything else which in theory could place it
on the runqueue.

Except, of course, wake_up_new_task() does activate_task(). And
note that it does this unconditionally, exactly because we know that
this task can't be woken.

Oleg.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Q: cgroup: Questions about possible issues in cgroup locking
  2011-12-21 18:11     ` Oleg Nesterov
@ 2011-12-21 18:23       ` Frederic Weisbecker
  0 siblings, 0 replies; 29+ messages in thread
From: Frederic Weisbecker @ 2011-12-21 18:23 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Li Zefan, Tejun Heo, LKML, Mandeep Singh Baines, Containers,
	Cgroups, KAMEZAWA Hiroyuki, Paul Menage, Andrew Morton,
	Paul E. McKenney

On Wed, Dec 21, 2011 at 07:11:01PM +0100, Oleg Nesterov wrote:
> On 12/21, Frederic Weisbecker wrote:
> >
> > On Wed, Dec 21, 2011 at 02:08:48PM +0100, Oleg Nesterov wrote:
> > > On 12/21, Frederic Weisbecker wrote:
> > > > - By the time we call cgroup_post_fork(), it is ready to be woken up
> > > > and usable by the scheduler.
> > >
> > > No, the new child can't run until do_fork()->wake_up_new_task().
> >
> > Out of curiosity, why is it not possible for a task to kill and wake up the child
> > before that happens?
> 
> Because it is not possible to wake it up.
> 
> Please note that copy_process() creates the "deactivated" child, iow
> it is not on rq.
> 
> But, at the same time its ->state == TASK_RUNNING. This "fools"
> try_to_wake_up() or anything else which in theory could place it
> on the runqueue.

Aaah I see.

> 
> Except, of course, wake_up_new_task() does activate_task(). And
> note that it does this unconditionally, exactly because we know that
> this task can't be woken.
>

ok, thanks for the explanation!

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Q: cgroup: Questions about possible issues in cgroup locking
  2011-12-21 17:56   ` Frederic Weisbecker
@ 2011-12-21 19:01     ` Mandeep Singh Baines
  2011-12-21 19:08       ` Frederic Weisbecker
  0 siblings, 1 reply; 29+ messages in thread
From: Mandeep Singh Baines @ 2011-12-21 19:01 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Oleg Nesterov, Li Zefan, Tejun Heo, LKML, Mandeep Singh Baines,
	Containers, Cgroups, KAMEZAWA Hiroyuki, Paul Menage,
	Andrew Morton, Paul E. McKenney

Frederic Weisbecker (fweisbec@gmail.com) wrote:
> On Wed, Dec 21, 2011 at 02:08:48PM +0100, Oleg Nesterov wrote:
> > On 12/21, Frederic Weisbecker wrote:
> > > Hi,
> > >
> > > Starring at some parts of cgroups, I have a few questions:
> > >
> > > - Is cgroup_enable_task_cg_list()'s while_each_thread() safe
> > > against concurrent exec()? The leader may change in de_thread()
> > > and invalidate the test done in while_each_thread().
> > 
> > Yes. Oh, we need to do something with while_each_thread.
> 
> Would something like this work?
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index c0c5876..e002a00 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -2293,8 +2293,12 @@ extern bool current_is_single_threaded(void);
>  #define do_each_thread(g, t) \
>  	for (g = t = &init_task ; (g = t = next_task(g)) != &init_task ; ) do
>  
> -#define while_each_thread(g, t) \
> -	while ((t = next_thread(t)) != g)
> +#define while_each_thread(g, t)					\
> +	while (({						\
> +		struct task_struct *__prev = t;			\
> +		t = next_thread(t);				\
> +		t != __prev && t != g;				\

Hi,

Don't you still have an (highly unlikely) race if you exec
and then pthread_create()?

Instead of:

t != __prev && t != g;

How about:

t != t->group_leader;

Regards,
Mandeep

> +	}))
>  
>  static inline int get_nr_threads(struct task_struct *tsk)
>  {

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Q: cgroup: Questions about possible issues in cgroup locking
  2011-12-21 19:01     ` Mandeep Singh Baines
@ 2011-12-21 19:08       ` Frederic Weisbecker
  2011-12-21 19:24         ` Mandeep Singh Baines
  0 siblings, 1 reply; 29+ messages in thread
From: Frederic Weisbecker @ 2011-12-21 19:08 UTC (permalink / raw)
  To: Mandeep Singh Baines
  Cc: Oleg Nesterov, Li Zefan, Tejun Heo, LKML, Containers, Cgroups,
	KAMEZAWA Hiroyuki, Paul Menage, Andrew Morton, Paul E. McKenney

On Wed, Dec 21, 2011 at 11:01:02AM -0800, Mandeep Singh Baines wrote:
> Frederic Weisbecker (fweisbec@gmail.com) wrote:
> > On Wed, Dec 21, 2011 at 02:08:48PM +0100, Oleg Nesterov wrote:
> > > On 12/21, Frederic Weisbecker wrote:
> > > > Hi,
> > > >
> > > > Starring at some parts of cgroups, I have a few questions:
> > > >
> > > > - Is cgroup_enable_task_cg_list()'s while_each_thread() safe
> > > > against concurrent exec()? The leader may change in de_thread()
> > > > and invalidate the test done in while_each_thread().
> > > 
> > > Yes. Oh, we need to do something with while_each_thread.
> > 
> > Would something like this work?
> > 
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index c0c5876..e002a00 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -2293,8 +2293,12 @@ extern bool current_is_single_threaded(void);
> >  #define do_each_thread(g, t) \
> >  	for (g = t = &init_task ; (g = t = next_task(g)) != &init_task ; ) do
> >  
> > -#define while_each_thread(g, t) \
> > -	while ((t = next_thread(t)) != g)
> > +#define while_each_thread(g, t)					\
> > +	while (({						\
> > +		struct task_struct *__prev = t;			\
> > +		t = next_thread(t);				\
> > +		t != __prev && t != g;				\
> 
> Hi,
> 
> Don't you still have an (highly unlikely) race if you exec
> and then pthread_create()?

I'm not sure what you mean.

> 
> Instead of:
> 
> t != __prev && t != g;
> 
> How about:
> 
> t != t->group_leader;

That might work too but we need a pair of memory barriers.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Q: cgroup: Questions about possible issues in cgroup locking
  2011-12-21 19:08       ` Frederic Weisbecker
@ 2011-12-21 19:24         ` Mandeep Singh Baines
  2011-12-21 20:04           ` Frederic Weisbecker
  0 siblings, 1 reply; 29+ messages in thread
From: Mandeep Singh Baines @ 2011-12-21 19:24 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Mandeep Singh Baines, Oleg Nesterov, Li Zefan, Tejun Heo, LKML,
	Containers, Cgroups, KAMEZAWA Hiroyuki, Paul Menage,
	Andrew Morton, Paul E. McKenney

Frederic Weisbecker (fweisbec@gmail.com) wrote:
> On Wed, Dec 21, 2011 at 11:01:02AM -0800, Mandeep Singh Baines wrote:
> > Frederic Weisbecker (fweisbec@gmail.com) wrote:
> > > On Wed, Dec 21, 2011 at 02:08:48PM +0100, Oleg Nesterov wrote:
> > > > On 12/21, Frederic Weisbecker wrote:
> > > > > Hi,
> > > > >
> > > > > Starring at some parts of cgroups, I have a few questions:
> > > > >
> > > > > - Is cgroup_enable_task_cg_list()'s while_each_thread() safe
> > > > > against concurrent exec()? The leader may change in de_thread()
> > > > > and invalidate the test done in while_each_thread().
> > > > 
> > > > Yes. Oh, we need to do something with while_each_thread.
> > > 
> > > Would something like this work?
> > > 
> > > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > > index c0c5876..e002a00 100644
> > > --- a/include/linux/sched.h
> > > +++ b/include/linux/sched.h
> > > @@ -2293,8 +2293,12 @@ extern bool current_is_single_threaded(void);
> > >  #define do_each_thread(g, t) \
> > >  	for (g = t = &init_task ; (g = t = next_task(g)) != &init_task ; ) do
> > >  
> > > -#define while_each_thread(g, t) \
> > > -	while ((t = next_thread(t)) != g)
> > > +#define while_each_thread(g, t)					\
> > > +	while (({						\
> > > +		struct task_struct *__prev = t;			\
> > > +		t = next_thread(t);				\
> > > +		t != __prev && t != g;				\
> > 
> > Hi,
> > 
> > Don't you still have an (highly unlikely) race if you exec
> > and then pthread_create()?
> 
> I'm not sure what you mean.

Here is what I'm thinking:

If you call exec from a thread other than g, g is now unlinked. So
"t != g" will always be true. If you then pthread_create, you now
have two threads so "t != __prev" will also always be true. So
you now have an infinite loop.

> 
> > 
> > Instead of:
> > 
> > t != __prev && t != g;
> > 
> > How about:
> > 
> > t != t->group_leader;
> 
> That might work too but we need a pair of memory barriers.

next_thread() calls list_entry_rcu. Shouldn't that protect against
a dereference? You don't need to synchronize group_leader since
you are only using it as a value. You don't dereference it.

Regards,
Mandeep

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Q: cgroup: Questions about possible issues in cgroup locking
  2011-12-21 19:24         ` Mandeep Singh Baines
@ 2011-12-21 20:04           ` Frederic Weisbecker
  2011-12-22 15:30             ` Oleg Nesterov
  0 siblings, 1 reply; 29+ messages in thread
From: Frederic Weisbecker @ 2011-12-21 20:04 UTC (permalink / raw)
  To: Mandeep Singh Baines
  Cc: Oleg Nesterov, Li Zefan, Tejun Heo, LKML, Containers, Cgroups,
	KAMEZAWA Hiroyuki, Paul Menage, Andrew Morton, Paul E. McKenney

On Wed, Dec 21, 2011 at 11:24:13AM -0800, Mandeep Singh Baines wrote:
> Frederic Weisbecker (fweisbec@gmail.com) wrote:
> > On Wed, Dec 21, 2011 at 11:01:02AM -0800, Mandeep Singh Baines wrote:
> > > Frederic Weisbecker (fweisbec@gmail.com) wrote:
> > > > On Wed, Dec 21, 2011 at 02:08:48PM +0100, Oleg Nesterov wrote:
> > > > > On 12/21, Frederic Weisbecker wrote:
> > > > > > Hi,
> > > > > >
> > > > > > Starring at some parts of cgroups, I have a few questions:
> > > > > >
> > > > > > - Is cgroup_enable_task_cg_list()'s while_each_thread() safe
> > > > > > against concurrent exec()? The leader may change in de_thread()
> > > > > > and invalidate the test done in while_each_thread().
> > > > > 
> > > > > Yes. Oh, we need to do something with while_each_thread.
> > > > 
> > > > Would something like this work?
> > > > 
> > > > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > > > index c0c5876..e002a00 100644
> > > > --- a/include/linux/sched.h
> > > > +++ b/include/linux/sched.h
> > > > @@ -2293,8 +2293,12 @@ extern bool current_is_single_threaded(void);
> > > >  #define do_each_thread(g, t) \
> > > >  	for (g = t = &init_task ; (g = t = next_task(g)) != &init_task ; ) do
> > > >  
> > > > -#define while_each_thread(g, t) \
> > > > -	while ((t = next_thread(t)) != g)
> > > > +#define while_each_thread(g, t)					\
> > > > +	while (({						\
> > > > +		struct task_struct *__prev = t;			\
> > > > +		t = next_thread(t);				\
> > > > +		t != __prev && t != g;				\
> > > 
> > > Hi,
> > > 
> > > Don't you still have an (highly unlikely) race if you exec
> > > and then pthread_create()?
> > 
> > I'm not sure what you mean.
> 
> Here is what I'm thinking:
> 
> If you call exec from a thread other than g, g is now unlinked. So
> "t != g" will always be true. If you then pthread_create, you now
> have two threads so "t != __prev" will also always be true. So
> you now have an infinite loop.

Oh you're right.

But then we can't use t != t->group_leader because that assumes while_each_thread()
started on the leader. Or may be we can take this assumption...

> 
> > 
> > > 
> > > Instead of:
> > > 
> > > t != __prev && t != g;
> > > 
> > > How about:
> > > 
> > > t != t->group_leader;
> > 
> > That might work too but we need a pair of memory barriers.
> 
> next_thread() calls list_entry_rcu. Shouldn't that protect against
> a dereference? You don't need to synchronize group_leader since
> you are only using it as a value. You don't dereference it.
> 
> Regards,
> Mandeep

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Q: cgroup: Questions about possible issues in cgroup locking
  2011-12-21 20:04           ` Frederic Weisbecker
@ 2011-12-22 15:30             ` Oleg Nesterov
  2012-01-04 19:36               ` Mandeep Singh Baines
  0 siblings, 1 reply; 29+ messages in thread
From: Oleg Nesterov @ 2011-12-22 15:30 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Mandeep Singh Baines, Li Zefan, Tejun Heo, LKML, Containers,
	Cgroups, KAMEZAWA Hiroyuki, Paul Menage, Andrew Morton,
	Paul E. McKenney

On 12/21, Frederic Weisbecker wrote:
>
> On Wed, Dec 21, 2011 at 11:24:13AM -0800, Mandeep Singh Baines wrote:
> >
> > If you call exec from a thread other than g, g is now unlinked. So
> > "t != g" will always be true. If you then pthread_create, you now
> > have two threads so "t != __prev" will also always be true. So
> > you now have an infinite loop.
>
> Oh you're right.
>
> But then we can't use t != t->group_leader because that assumes while_each_thread()
> started on the leader.

Yes, this can't work.

Besides, we need more burriers to rely on the ->group_leader check.

See http://marc.info/?t=127688987300002

in particular, http://marc.info/?l=linux-kernel&m=127714242731448
I think this should work, but then we should do something with the
users like zap_threads().

Oleg.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Q: cgroup: Questions about possible issues in cgroup locking
  2011-12-22 15:30             ` Oleg Nesterov
@ 2012-01-04 19:36               ` Mandeep Singh Baines
  2012-01-06 15:23                 ` Oleg Nesterov
  0 siblings, 1 reply; 29+ messages in thread
From: Mandeep Singh Baines @ 2012-01-04 19:36 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Frederic Weisbecker, Mandeep Singh Baines, Li Zefan, Tejun Heo,
	LKML, Containers, Cgroups, KAMEZAWA Hiroyuki, Paul Menage,
	Andrew Morton, Paul E. McKenney

Oleg Nesterov (oleg@redhat.com) wrote:
> On 12/21, Frederic Weisbecker wrote:
> >
> > On Wed, Dec 21, 2011 at 11:24:13AM -0800, Mandeep Singh Baines wrote:
> > >
> > > If you call exec from a thread other than g, g is now unlinked. So
> > > "t != g" will always be true. If you then pthread_create, you now
> > > have two threads so "t != __prev" will also always be true. So
> > > you now have an infinite loop.
> >
> > Oh you're right.
> >
> > But then we can't use t != t->group_leader because that assumes while_each_thread()
> > started on the leader.
> 
> Yes, this can't work.
> 
> Besides, we need more burriers to rely on the ->group_leader check.
> 
> See http://marc.info/?t=127688987300002
> 

I went through the thread. Were there any other concerns other than
requiring that you start with the group_leader and the barrier?

You could modify zap_other_threads to start with the group leader by
skipping p:

if (p == t)
   continue;

> in particular, http://marc.info/?l=linux-kernel&m=127714242731448
> I think this should work, but then we should do something with the
> users like zap_threads().
> 

With that patch, won't you potentially miss the exec thread if an exec
occurs while you're iterating over the list? Is that OK?

Regards,
Mandeep

> Oleg.
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Q: cgroup: Questions about possible issues in cgroup locking
  2012-01-04 19:36               ` Mandeep Singh Baines
@ 2012-01-06 15:23                 ` Oleg Nesterov
  2012-01-06 18:25                   ` Mandeep Singh Baines
  0 siblings, 1 reply; 29+ messages in thread
From: Oleg Nesterov @ 2012-01-06 15:23 UTC (permalink / raw)
  To: Mandeep Singh Baines
  Cc: Frederic Weisbecker, Li Zefan, Tejun Heo, LKML, Containers,
	Cgroups, KAMEZAWA Hiroyuki, Paul Menage, Andrew Morton,
	Paul E. McKenney

On 01/04, Mandeep Singh Baines wrote:
>
> Oleg Nesterov (oleg@redhat.com) wrote:
> > On 12/21, Frederic Weisbecker wrote:
> > >
> > > On Wed, Dec 21, 2011 at 11:24:13AM -0800, Mandeep Singh Baines wrote:
> > > >
> > > > If you call exec from a thread other than g, g is now unlinked. So
> > > > "t != g" will always be true. If you then pthread_create, you now
> > > > have two threads so "t != __prev" will also always be true. So
> > > > you now have an infinite loop.
> > >
> > > Oh you're right.
> > >
> > > But then we can't use t != t->group_leader because that assumes while_each_thread()
> > > started on the leader.
> >
> > Yes, this can't work.
> >
> > Besides, we need more burriers to rely on the ->group_leader check.
> >
> > See http://marc.info/?t=127688987300002
> >
>
> I went through the thread. Were there any other concerns other than
> requiring that you start with the group_leader and the barrier?
>
> You could modify zap_other_threads to start with the group leader by
> skipping p:
>
> if (p == t)
>    continue;

Yes, we can but there are other while_each_thread(nonleader) users.
Yes we can fix them too but this looks a bit ugly and we need to
change while_each_thread() anyway. And I do not see why this change
will be simpler if we restrict it to group_leader.

And note that zap_other_threads() is fine in any case, it is called
under ->siglock.

> > in particular, http://marc.info/?l=linux-kernel&m=127714242731448
> > I think this should work, but then we should do something with the
> > users like zap_threads().
> >
>
> With that patch, won't you potentially miss the exec thread if an exec
> occurs while you're iterating over the list? Is that OK?

Of course it is not OK ;) Note the "we should do something with" above.

Oleg.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Q: cgroup: Questions about possible issues in cgroup locking
  2012-01-06 15:23                 ` Oleg Nesterov
@ 2012-01-06 18:25                   ` Mandeep Singh Baines
  2012-01-11 16:07                     ` Oleg Nesterov
  0 siblings, 1 reply; 29+ messages in thread
From: Mandeep Singh Baines @ 2012-01-06 18:25 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Mandeep Singh Baines, Frederic Weisbecker, Li Zefan, Tejun Heo,
	LKML, Containers, Cgroups, KAMEZAWA Hiroyuki, Paul Menage,
	Andrew Morton, Paul E. McKenney

Oleg Nesterov (oleg@redhat.com) wrote:
> On 01/04, Mandeep Singh Baines wrote:
> >
> > Oleg Nesterov (oleg@redhat.com) wrote:
> > > On 12/21, Frederic Weisbecker wrote:
> > > >
> > > > On Wed, Dec 21, 2011 at 11:24:13AM -0800, Mandeep Singh Baines wrote:
> > > > >
> > > > > If you call exec from a thread other than g, g is now unlinked. So
> > > > > "t != g" will always be true. If you then pthread_create, you now
> > > > > have two threads so "t != __prev" will also always be true. So
> > > > > you now have an infinite loop.
> > > >
> > > > Oh you're right.
> > > >
> > > > But then we can't use t != t->group_leader because that assumes while_each_thread()
> > > > started on the leader.
> > >
> > > Yes, this can't work.
> > >
> > > Besides, we need more burriers to rely on the ->group_leader check.
> > >
> > > See http://marc.info/?t=127688987300002
> > >
> >
> > I went through the thread. Were there any other concerns other than
> > requiring that you start with the group_leader and the barrier?
> >
> > You could modify zap_other_threads to start with the group leader by
> > skipping p:
> >
> > if (p == t)
> >    continue;
> 
> Yes, we can but there are other while_each_thread(nonleader) users.
> Yes we can fix them too but this looks a bit ugly and we need to
> change while_each_thread() anyway. And I do not see why this change
> will be simpler if we restrict it to group_leader.
> 
> And note that zap_other_threads() is fine in any case, it is called
> under ->siglock.
> 
> > > in particular, http://marc.info/?l=linux-kernel&m=127714242731448
> > > I think this should work, but then we should do something with the
> > > users like zap_threads().
> > >
> >
> > With that patch, won't you potentially miss the exec thread if an exec
> > occurs while you're iterating over the list? Is that OK?
> 
> Of course it is not OK ;) Note the "we should do something with" above.
> 

So requirements should be something like this:

* Any task alive for the duration of the iteration MUST be visited
* No task should be visited more than once
* Any task born or exiting after starting the iteration MAY be skipped
* You can start at any task in the thread group

Would something like this work:

#define while_each_thread(g, t, o) \
	while (t->group_leader == o && (t = next_thread(t)) != g)

Where o should have the value of g->group_leader.

Regards,
Mandeep

> Oleg.
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Q: cgroup: Questions about possible issues in cgroup locking
  2012-01-06 18:25                   ` Mandeep Singh Baines
@ 2012-01-11 16:07                     ` Oleg Nesterov
  2012-01-12  0:31                       ` Mandeep Singh Baines
  0 siblings, 1 reply; 29+ messages in thread
From: Oleg Nesterov @ 2012-01-11 16:07 UTC (permalink / raw)
  To: Mandeep Singh Baines
  Cc: Frederic Weisbecker, Li Zefan, Tejun Heo, LKML, Containers,
	Cgroups, KAMEZAWA Hiroyuki, Paul Menage, Andrew Morton,
	Paul E. McKenney

On 01/06, Mandeep Singh Baines wrote:
>
> Oleg Nesterov (oleg@redhat.com) wrote:
> >
> > > > in particular, http://marc.info/?l=linux-kernel&m=127714242731448
> > > > I think this should work, but then we should do something with the
> > > > users like zap_threads().
> > > >
> > >
> > > With that patch, won't you potentially miss the exec thread if an exec
> > > occurs while you're iterating over the list? Is that OK?
> >
> > Of course it is not OK ;) Note the "we should do something with" above.
> >
>
> So requirements should be something like this:

(I assume, you mean the lockless case)

> * Any task alive for the duration of the iteration MUST be visited
> * No task should be visited more than once
> * Any task born or exiting after starting the iteration MAY be skipped
> * You can start at any task in the thread group

Well yes, but it is not easy to exactly define what after/before
means in this case.

> Would something like this work:
>
> #define while_each_thread(g, t, o) \
> 	while (t->group_leader == o && (t = next_thread(t)) != g)
>
> Where o should have the value of g->group_leader.

I don't understand how this helps... and how this can work even
ignoring the barriers.

OK, we have the main thream M and the sub-thread T, we are doing

	do {
		do_something(t);
	} while_each_thread(M, t, M);

why we can't miss T if it does exec?

Oleg.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Q: cgroup: Questions about possible issues in cgroup locking
  2012-01-11 16:07                     ` Oleg Nesterov
@ 2012-01-12  0:31                       ` Mandeep Singh Baines
  2012-01-12 17:07                         ` Oleg Nesterov
  0 siblings, 1 reply; 29+ messages in thread
From: Mandeep Singh Baines @ 2012-01-12  0:31 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Mandeep Singh Baines, Frederic Weisbecker, Li Zefan, Tejun Heo,
	LKML, Containers, Cgroups, KAMEZAWA Hiroyuki, Paul Menage,
	Andrew Morton, Paul E. McKenney

Hi Oleg,

Oleg Nesterov (oleg@redhat.com) wrote:
> On 01/06, Mandeep Singh Baines wrote:
> >
> > Oleg Nesterov (oleg@redhat.com) wrote:
> > >
> > > > > in particular, http://marc.info/?l=linux-kernel&m=127714242731448
> > > > > I think this should work, but then we should do something with the
> > > > > users like zap_threads().
> > > > >
> > > >
> > > > With that patch, won't you potentially miss the exec thread if an exec
> > > > occurs while you're iterating over the list? Is that OK?
> > >
> > > Of course it is not OK ;) Note the "we should do something with" above.
> > >
> >
> > So requirements should be something like this:
> 
> (I assume, you mean the lockless case)
> 

Correct.

> > * Any task alive for the duration of the iteration MUST be visited
> > * No task should be visited more than once
> > * Any task born or exiting after starting the iteration MAY be skipped
> > * You can start at any task in the thread group
> 
> Well yes, but it is not easy to exactly define what after/before
> means in this case.
> 
> > Would something like this work:
> >
> > #define while_each_thread(g, t, o) \
> > 	while (t->group_leader == o && (t = next_thread(t)) != g)
> >
> > Where o should have the value of g->group_leader.
> 
> I don't understand how this helps... and how this can work even
> ignoring the barriers.
> 
> OK, we have the main thream M and the sub-thread T, we are doing
> 
> 	do {
> 		do_something(t);
> 	} while_each_thread(M, t, M);
> 
> why we can't miss T if it does exec?
> 

So for:

struct task *M; /* assuming this is passed in to us */
struct task *L = M->group_leader;
struct task *I = M;

do {
	do_something(T);
} while_each_thread(M, T, L);

Here is my thinking.

If some thread K does exec, you won't miss it because:

1) Ignoring the group_leader check, you'll visit K just by following
   next_thread(). That's the case today and is what you except
   when iterating over an rcu_list.
2) (t->group_leader == o) will fail iff t is the exec thread.
   Since we test t->group_leader before re-assigning it (t=next_thread()),
   the test will fail only after visiting the exec thread. So you'll
   visit the exec thread and then terminate the loop.

I realize its a klutzy interface (requires 3 variables) but it seems
correct (ignoring barriers) and meets all the requirements. I'm hoping
it inspires a solution which is less klutzy and meet its all the
requirements.

Regards,
Mandeep

> Oleg.
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Q: cgroup: Questions about possible issues in cgroup locking
  2012-01-12  0:31                       ` Mandeep Singh Baines
@ 2012-01-12 17:07                         ` Oleg Nesterov
  2012-01-12 17:57                           ` Mandeep Singh Baines
  0 siblings, 1 reply; 29+ messages in thread
From: Oleg Nesterov @ 2012-01-12 17:07 UTC (permalink / raw)
  To: Mandeep Singh Baines
  Cc: Frederic Weisbecker, Li Zefan, Tejun Heo, LKML, Containers,
	Cgroups, KAMEZAWA Hiroyuki, Paul Menage, Andrew Morton,
	Paul E. McKenney

Hi Mandeep,

On 01/11, Mandeep Singh Baines wrote:
> > >
> > > #define while_each_thread(g, t, o) \
> > > 	while (t->group_leader == o && (t = next_thread(t)) != g)
> > >
> > > Where o should have the value of g->group_leader.
> >
> > I don't understand how this helps... and how this can work even
> > ignoring the barriers.
> >
> > OK, we have the main thream M and the sub-thread T, we are doing
> >
> > 	do {
> > 		do_something(t);
> > 	} while_each_thread(M, t, M);
> >
> > why we can't miss T if it does exec?
> >
>
> So for:
>
> struct task *M; /* assuming this is passed in to us */
> struct task *L = M->group_leader;

L == M

> do {
> 	do_something(T);
> } while_each_thread(M, T, L);
>
> Here is my thinking.
>
> If some thread K does exec, you won't miss it because:
>
> 1) Ignoring the group_leader check, you'll visit K just by following
>    next_thread(). That's the case today and is what you except
>    when iterating over an rcu_list.
> 2) (t->group_leader == o) will fail iff t is the exec thread.
>    Since we test t->group_leader before re-assigning it (t=next_thread()),
>    the test will fail only after visiting the exec thread. So you'll
>    visit the exec thread and then terminate the loop.

Still can't understand... Lets look at this trivial example again.

We start from the main thread M, it is ->group_leader. There is
another thread T in this thread group. We are doing

	OLD = M;

	t = M;
	do {
		do_smth(t);
	}
	while (t->group_leader == OLD && ((t = next_thread(t)) != M);

The first iteration does do_smth(M).

T calls de_thread() and, in particular, it does M->group_leader = T
(see "leader->group_leader = tsk" in de_thread).

after that t->group_leader == OLD fails. t == M, its group_leader == T.
do_smth(T) won't be called.

No?

Oleg.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Q: cgroup: Questions about possible issues in cgroup locking
  2012-01-12 17:07                         ` Oleg Nesterov
@ 2012-01-12 17:57                           ` Mandeep Singh Baines
  2012-01-13 15:20                             ` Oleg Nesterov
  0 siblings, 1 reply; 29+ messages in thread
From: Mandeep Singh Baines @ 2012-01-12 17:57 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Mandeep Singh Baines, Frederic Weisbecker, Li Zefan, Tejun Heo,
	LKML, Containers, Cgroups, KAMEZAWA Hiroyuki, Paul Menage,
	Andrew Morton, Paul E. McKenney

Hi Oleg,

Oleg Nesterov (oleg@redhat.com) wrote:
> Hi Mandeep,
> 
> On 01/11, Mandeep Singh Baines wrote:
> > > >
> > > > #define while_each_thread(g, t, o) \
> > > > 	while (t->group_leader == o && (t = next_thread(t)) != g)
> > > >
> > > > Where o should have the value of g->group_leader.
> > >
> > > I don't understand how this helps... and how this can work even
> > > ignoring the barriers.
> > >
> > > OK, we have the main thream M and the sub-thread T, we are doing
> > >
> > > 	do {
> > > 		do_something(t);
> > > 	} while_each_thread(M, t, M);
> > >
> > > why we can't miss T if it does exec?
> > >
> >
> > So for:
> >
> > struct task *M; /* assuming this is passed in to us */
> > struct task *L = M->group_leader;
> 
> L == M
> 
> > do {
> > 	do_something(T);
> > } while_each_thread(M, T, L);
> >
> > Here is my thinking.
> >
> > If some thread K does exec, you won't miss it because:
> >
> > 1) Ignoring the group_leader check, you'll visit K just by following
> >    next_thread(). That's the case today and is what you except
> >    when iterating over an rcu_list.
> > 2) (t->group_leader == o) will fail iff t is the exec thread.
> >    Since we test t->group_leader before re-assigning it (t=next_thread()),
> >    the test will fail only after visiting the exec thread. So you'll
> >    visit the exec thread and then terminate the loop.
> 
> Still can't understand... Lets look at this trivial example again.
> 
> We start from the main thread M, it is ->group_leader. There is
> another thread T in this thread group. We are doing
> 
> 	OLD = M;
> 
> 	t = M;
> 	do {
> 		do_smth(t);
> 	}
> 	while (t->group_leader == OLD && ((t = next_thread(t)) != M);
> 
> The first iteration does do_smth(M).
> 
> T calls de_thread() and, in particular, it does M->group_leader = T
> (see "leader->group_leader = tsk" in de_thread).
> 
> after that t->group_leader == OLD fails. t == M, its group_leader == T.
> do_smth(T) won't be called.
> 
> No?
> 

I think we can handle this by removing the assignment. So in de_thread():

-		leader->group_leader = tsk;

		tsk->exit_signal = SIGCHLD;
		leader->exit_signal = -1;

		BUG_ON(leader->exit_state != EXIT_ZOMBIE);
		leader->exit_state = EXIT_DEAD;

In the current d_thread(), four statements after reassigning
leader->group_leader, we mark the old leader as EXIT_DEAD. So what if
we leave leader->group_leader = leader. Since its EXIT_DEAD a few
statements later, I don't think anything should break.

What do you think?

Regards,
Mandeep

> Oleg.
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Q: cgroup: Questions about possible issues in cgroup locking
  2012-01-12 17:57                           ` Mandeep Singh Baines
@ 2012-01-13 15:20                             ` Oleg Nesterov
  2012-01-13 18:27                               ` Mandeep Singh Baines
  0 siblings, 1 reply; 29+ messages in thread
From: Oleg Nesterov @ 2012-01-13 15:20 UTC (permalink / raw)
  To: Mandeep Singh Baines
  Cc: Frederic Weisbecker, Li Zefan, Tejun Heo, LKML, Containers,
	Cgroups, KAMEZAWA Hiroyuki, Paul Menage, Andrew Morton,
	Paul E. McKenney

On 01/12, Mandeep Singh Baines wrote:
>
> Oleg Nesterov (oleg@redhat.com) wrote:
> >
> > Still can't understand... Lets look at this trivial example again.
> >
> > We start from the main thread M, it is ->group_leader. There is
> > another thread T in this thread group. We are doing
> >
> > 	OLD = M;
> >
> > 	t = M;
> > 	do {
> > 		do_smth(t);
> > 	}
> > 	while (t->group_leader == OLD && ((t = next_thread(t)) != M);
> >
> > The first iteration does do_smth(M).
> >
> > T calls de_thread() and, in particular, it does M->group_leader = T
> > (see "leader->group_leader = tsk" in de_thread).
> >
> > after that t->group_leader == OLD fails. t == M, its group_leader == T.
> > do_smth(T) won't be called.
> >
> > No?
> >
>
> I think we can handle this by removing the assignment. So in de_thread():
>
> -		leader->group_leader = tsk;

Ah, so that was you plan. I was confused by the 3rd argument, why
it is needed?

Yes, I thought about this too. Suppose we remove this assignment,
then we can simply do

	#define while_each_thread(g, t) \
		while (t->group_leader == g->group_leader && (t = next_thread(t)) != g)

with the same effect. (to remind, currently I ignore the barriers/etc).

But this can _only_ help if we start at the group leader!

May be we should enforce this rule (for the lockless case), I dunno...
In that case I'd prefer to add the new while_each_thread_rcu() helper.
But! in this case we do not need to change de_thread(), we can simply do

	#define while_each_thread_rcu(t) \
		while (({ t = next_thread(t); !thread_group_leader(t); }))

The definition above was one of the possibilities I considered, but
I wasn't able to convince myself this is the best option.

See? Or do you think I missed something?

Just in case... note that while_each_thread_rcu() doesn't use 'g'
at all. May be it makes sense to keep the old "t != g &&", but this
is minor.

Oleg.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Q: cgroup: Questions about possible issues in cgroup locking
  2012-01-13 15:20                             ` Oleg Nesterov
@ 2012-01-13 18:27                               ` Mandeep Singh Baines
  2012-01-14 17:36                                 ` Oleg Nesterov
  0 siblings, 1 reply; 29+ messages in thread
From: Mandeep Singh Baines @ 2012-01-13 18:27 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Mandeep Singh Baines, Frederic Weisbecker, Li Zefan, Tejun Heo,
	LKML, Containers, Cgroups, KAMEZAWA Hiroyuki, Paul Menage,
	Andrew Morton, Paul E. McKenney

Oleg Nesterov (oleg@redhat.com) wrote:
> On 01/12, Mandeep Singh Baines wrote:
> >
> > Oleg Nesterov (oleg@redhat.com) wrote:
> > >
> > > Still can't understand... Lets look at this trivial example again.
> > >
> > > We start from the main thread M, it is ->group_leader. There is
> > > another thread T in this thread group. We are doing
> > >
> > > 	OLD = M;
> > >
> > > 	t = M;
> > > 	do {
> > > 		do_smth(t);
> > > 	}
> > > 	while (t->group_leader == OLD && ((t = next_thread(t)) != M);
> > >
> > > The first iteration does do_smth(M).
> > >
> > > T calls de_thread() and, in particular, it does M->group_leader = T
> > > (see "leader->group_leader = tsk" in de_thread).
> > >
> > > after that t->group_leader == OLD fails. t == M, its group_leader == T.
> > > do_smth(T) won't be called.
> > >
> > > No?
> > >
> >
> > I think we can handle this by removing the assignment. So in de_thread():
> >
> > -		leader->group_leader = tsk;
> 
> Ah, so that was you plan. I was confused by the 3rd argument, why
> it is needed?
> 

Good question. On second thought, I don't think its needed as shown
in you're solution below.

> Yes, I thought about this too. Suppose we remove this assignment,
> then we can simply do
> 
> 	#define while_each_thread(g, t) \
> 		while (t->group_leader == g->group_leader && (t = next_thread(t)) != g)
> 
> with the same effect. (to remind, currently I ignore the barriers/etc).
> 

Nice! I think this works.

> But this can _only_ help if we start at the group leader!

But I don't think this solution requires we start at the group leader.

My thinking:

Case 1: g is the exec thread

The only condition under which g->group_leader would change is if g is the
exec thread. If you are the exec the only requirement is that you visit the
exec thread. Visiting any other threads is optional. Since g is the exec
thread, you've already visited it and can safely stop once
g->group_leader is re-assigned to g.

Case 2: g is the group leader

If g is the group leader and a subthread execs, you'll terminate just
after visiting the exec thread.

Case 3: g is some other thread

In this case, g MUST be current so you don't really need to worry
about de_thread() since current can't be de_threaded.

> 
> May be we should enforce this rule (for the lockless case), I dunno...
> In that case I'd prefer to add the new while_each_thread_rcu() helper.
> But! in this case we do not need to change de_thread(), we can simply do
> 
> 	#define while_each_thread_rcu(t) \
> 		while (({ t = next_thread(t); !thread_group_leader(t); }))
> 

Won't this terminate just before visiting the exec thread?

> The definition above was one of the possibilities I considered, but
> I wasn't able to convince myself this is the best option.
> 
> See? Or do you think I missed something?
> 
> Just in case... note that while_each_thread_rcu() doesn't use 'g'
> at all. May be it makes sense to keep the old "t != g &&", but this
> is minor.
> 
> Oleg.
> 

Regards,
Mandeep

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Q: cgroup: Questions about possible issues in cgroup locking
  2012-01-13 18:27                               ` Mandeep Singh Baines
@ 2012-01-14 17:36                                 ` Oleg Nesterov
  2012-01-18 23:17                                   ` Mandeep Singh Baines
  0 siblings, 1 reply; 29+ messages in thread
From: Oleg Nesterov @ 2012-01-14 17:36 UTC (permalink / raw)
  To: Mandeep Singh Baines
  Cc: Frederic Weisbecker, Li Zefan, Tejun Heo, LKML, Containers,
	Cgroups, KAMEZAWA Hiroyuki, Paul Menage, Andrew Morton,
	Paul E. McKenney

On 01/13, Mandeep Singh Baines wrote:
>
> Oleg Nesterov (oleg@redhat.com) wrote:
>
> > Yes, I thought about this too. Suppose we remove this assignment,
> > then we can simply do
> >
> > 	#define while_each_thread(g, t) \
> > 		while (t->group_leader == g->group_leader && (t = next_thread(t)) != g)
> >
> > with the same effect. (to remind, currently I ignore the barriers/etc).
> >
>
> Nice! I think this works.
>
> > But this can _only_ help if we start at the group leader!
>
> But I don't think this solution requires we start at the group leader.
>
> My thinking:
>
> Case 1: g is the exec thread
> ...
> Case 2: g is the group leader
> ...
> Case 3: g is some other thread
>
> In this case, g MUST be current

Why? This is not true. But I guess this doesn't matter, I think I am
starting to understand why our discussion was a bit confusing.

The problem is _not_ exec/de_thread by itself. The problem is that
while_each_thread(g, t) can race with removing 'g' from the list.
IOW, list_for_each_rcu-like things assume that the head of the list
is "stable" but this is not true.

And note that de_thread() does not matter in this sense, only
__unhash_process()->list_del_rcu(&p->thread_group) does matter.

Now. If 'g' is the group leader, we should only worry about exec,
otherwise it can't disappear before other threads.

But if it is not the group leader, it can simply exit and
while_each_thread(g, t) can never stop by the same reason.

And we can't detect this case. OK, OK, we can, let me remind
about http://marc.info/?l=linux-kernel&m=127714242731448 and the
whole discussion. But this makes while_each_thread() more complex
and this doesn't fork for zap_threads-like users which must not
miss the "interesing" threads.

Finally. If we restrict the lockless while_each_thread() so that
is starts at the group leader, then we can rely on de_thread()
which changes the leadership and try to detect this case.

See?

> > May be we should enforce this rule (for the lockless case), I dunno...
> > In that case I'd prefer to add the new while_each_thread_rcu() helper.
> > But! in this case we do not need to change de_thread(), we can simply do
> >
> > 	#define while_each_thread_rcu(t) \
> > 		while (({ t = next_thread(t); !thread_group_leader(t); }))
> >
>
> Won't this terminate just before visiting the exec thread?

Sure. But this is fine for zap_threads() or current_is_single_threaded(),
the execing thread already has another ->mm. Just we shouldn't hang
forever if we race with exec (we start at the group leader).

And I hope this is fine in general (to remind, in the lockless case).
If we race with exec we see either the old or the new leader with the
same pid/signal/etc (at least).

Hmm. On a second thought, perhaps I thought to much about zap_threads(),
perhaps it would be better to find all threads we can...

I dunno. But I am starting to like the ->group_leader more. Hmm, and
with thread_group_leader() check we should ensure we do not visit the
old leader twice.

Thanks Mandeep.

I'll try to recheck my thinking once again, what do you think? Anything
else we could miss?

Oleg.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Q: cgroup: Questions about possible issues in cgroup locking
  2012-01-14 17:36                                 ` Oleg Nesterov
@ 2012-01-18 23:17                                   ` Mandeep Singh Baines
  2012-01-19 15:45                                     ` Oleg Nesterov
  0 siblings, 1 reply; 29+ messages in thread
From: Mandeep Singh Baines @ 2012-01-18 23:17 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Mandeep Singh Baines, Frederic Weisbecker, Li Zefan, Tejun Heo,
	LKML, Containers, Cgroups, KAMEZAWA Hiroyuki, Paul Menage,
	Andrew Morton, Paul E. McKenney

Oleg Nesterov (oleg@redhat.com) wrote:
> On 01/13, Mandeep Singh Baines wrote:
> >
> > Oleg Nesterov (oleg@redhat.com) wrote:
> >
> > > Yes, I thought about this too. Suppose we remove this assignment,
> > > then we can simply do
> > >
> > > 	#define while_each_thread(g, t) \
> > > 		while (t->group_leader == g->group_leader && (t = next_thread(t)) != g)
> > >
> > > with the same effect. (to remind, currently I ignore the barriers/etc).
> > >
> >
> > Nice! I think this works.
> >
> > > But this can _only_ help if we start at the group leader!
> >
> > But I don't think this solution requires we start at the group leader.
> >
> > My thinking:
> >
> > Case 1: g is the exec thread
> > ...
> > Case 2: g is the group leader
> > ...
> > Case 3: g is some other thread
> >
> > In this case, g MUST be current
> 
> Why? This is not true.

Here is my thinking:

The terminating condition, t != g, assumes that you can get back to
g. If g is unhashed, there is no guarantee you'll ever get back to it.
Holding a reference does not prevent unhashing.

for_each_process avoids unhashing by starting and ending at init_task
(which can never be unhashed).

As you pointed out a while back, this doesn't work for:

do_each_thread(g, t){
  do_something(t);
} while_each_thread(g, t)

because g can be unhashed.

However, you should be able to use while_each_thread if you are current.
Being current would prevent 'g' from being unhashed. I can think of no
other way of preventing 'g' from being unhashed. So I suspect that,
other than, do_each_thread/while_each_thread, all other callers
of while_each_thread() are starting at current. Otherwise, how do
you guarantee that it terminates.

I see at least one example, coredump_wait() that uses while_each_thread
starting at current. I didn't find any cases where while_each_thread
starts anywhere other than current or group_leader.

> But I guess this doesn't matter, I think I am
> starting to understand why our discussion was a bit confusing.
> 
> The problem is _not_ exec/de_thread by itself. The problem is that
> while_each_thread(g, t) can race with removing 'g' from the list.
> IOW, list_for_each_rcu-like things assume that the head of the list
> is "stable" but this is not true.
> 
> And note that de_thread() does not matter in this sense, only
> __unhash_process()->list_del_rcu(&p->thread_group) does matter.
> 
> Now. If 'g' is the group leader, we should only worry about exec,
> otherwise it can't disappear before other threads.
> 
> But if it is not the group leader, it can simply exit and
> while_each_thread(g, t) can never stop by the same reason.
> 

I think we are on the same page. Your explanation is consistent with
my understanding.

Some other thoughts:

I suspect that other than do_each_thread()/while_each_thread() or
for_each_thread()/while_each_thread() where 'g' is the group_leader,
'g' MUST be current. So the only cases to consider are:

1) g is the group_leader: only exec is important for the reasons
   you specify (it can't disappear before other threads)
2) g is current: no worries. it can't disappear

> And we can't detect this case. OK, OK, we can, let me remind
> about http://marc.info/?l=linux-kernel&m=127714242731448 and the
> whole discussion. But this makes while_each_thread() more complex
> and this doesn't fork for zap_threads-like users which must not
> miss the "interesing" threads.
> 

This URL is blacked out today.

> Finally. If we restrict the lockless while_each_thread() so that
> is starts at the group leader, then we can rely on de_thread()
> which changes the leadership and try to detect this case.
> 
> See?
> 
> > > May be we should enforce this rule (for the lockless case), I dunno...
> > > In that case I'd prefer to add the new while_each_thread_rcu() helper.
> > > But! in this case we do not need to change de_thread(), we can simply do
> > >
> > > 	#define while_each_thread_rcu(t) \
> > > 		while (({ t = next_thread(t); !thread_group_leader(t); }))
> > >
> >
> > Won't this terminate just before visiting the exec thread?
> 
> Sure. But this is fine for zap_threads() or current_is_single_threaded(),
> the execing thread already has another ->mm. Just we shouldn't hang
> forever if we race with exec (we start at the group leader).
> 
> And I hope this is fine in general (to remind, in the lockless case).
> If we race with exec we see either the old or the new leader with the
> same pid/signal/etc (at least).
> 
> Hmm. On a second thought, perhaps I thought to much about zap_threads(),
> perhaps it would be better to find all threads we can...
> 
> I dunno. But I am starting to like the ->group_leader more. Hmm, and
> with thread_group_leader() check we should ensure we do not visit the
> old leader twice.
> 

Ah. Yes. You could potentially visit the old group_leader twice if you
have exec'ed and have already visited the exec thread. You'd visit the
old group_leader again  because thread_group_leader(t) would no longer be
true for the old group_leader. Worse yet, you'd visit it again and just
keep on going.

> Thanks Mandeep.
> 
> I'll try to recheck my thinking once again, what do you think? Anything
> else we could miss?
> 

Yeah, the ->group_leader solution seems the most promising. It seems
correct (ignoring barriers) and should work for all supported cases:

1) when g is group_leader
2) when g is current

Regards,
Mandeep

> Oleg.
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Q: cgroup: Questions about possible issues in cgroup locking
  2012-01-18 23:17                                   ` Mandeep Singh Baines
@ 2012-01-19 15:45                                     ` Oleg Nesterov
  2012-01-19 18:18                                       ` Mandeep Singh Baines
  2012-03-20 19:34                                       ` Oleg Nesterov
  0 siblings, 2 replies; 29+ messages in thread
From: Oleg Nesterov @ 2012-01-19 15:45 UTC (permalink / raw)
  To: Mandeep Singh Baines
  Cc: Frederic Weisbecker, Li Zefan, Tejun Heo, LKML, Containers,
	Cgroups, KAMEZAWA Hiroyuki, Paul Menage, Andrew Morton,
	Paul E. McKenney

On 01/18, Mandeep Singh Baines wrote:
>
> Oleg Nesterov (oleg@redhat.com) wrote:
> > On 01/13, Mandeep Singh Baines wrote:
> > >
> > > Case 3: g is some other thread
> > >
> > > In this case, g MUST be current
> >
> > Why? This is not true.
>
> Here is my thinking:
>
> The terminating condition, t != g, assumes that you can get back to
> g. If g is unhashed, there is no guarantee you'll ever get back to it.
> Holding a reference does not prevent unhashing.
>
> for_each_process avoids unhashing by starting and ending at init_task
> (which can never be unhashed).

Yes,

> As you pointed out a while back, this doesn't work for:
>
> do_each_thread(g, t){
>   do_something(t);
> } while_each_thread(g, t)
>
> because g can be unhashed.
>
> However, you should be able to use while_each_thread if you are current.
> Being current would prevent 'g' from being unhashed.

Ah, I misunderstood you. Yes, sure.

> other than, do_each_thread/while_each_thread, all other callers
> of while_each_thread() are starting at current. Otherwise, how do
> you guarantee that it terminates.

Hm, still can't understand...

> I see at least one example, coredump_wait() that uses while_each_thread
> starting at current. I didn't find any cases where while_each_thread
> starts anywhere other than current or group_leader.

Probably you meant zap_threads/zap_process, not coredump_wait?

zap_process() is fine, we hold ->siglock. But zap_threads does _not_
start at current, may be you misread the g == tsk->group_leader check
in the main for_each_process() loop ? But most probably we simply
misunderstand each other a bit, see below.

However it starts at ->group_leader, so it won't suffer if we restrict
the lockless while_each_thread_rcu().

> > But I guess this doesn't matter, I think I am
> > starting to understand why our discussion was a bit confusing.
> >
> > The problem is _not_ exec/de_thread by itself. The problem is that
> > while_each_thread(g, t) can race with removing 'g' from the list.
> > IOW, list_for_each_rcu-like things assume that the head of the list
> > is "stable" but this is not true.
> >
> > And note that de_thread() does not matter in this sense, only
> > __unhash_process()->list_del_rcu(&p->thread_group) does matter.
> >
> > Now. If 'g' is the group leader, we should only worry about exec,
> > otherwise it can't disappear before other threads.
> >
> > But if it is not the group leader, it can simply exit and
> > while_each_thread(g, t) can never stop by the same reason.
> >
>
> I think we are on the same page. Your explanation is consistent with
> my understanding.
>
> Some other thoughts:
>
> I suspect that other than do_each_thread()/while_each_thread() or
> for_each_thread()/while_each_thread() where 'g' is the group_leader,
> 'g' MUST be current. So the only cases to consider are:

I didn't try to grep, but I do not know any example of the lockless
while_each_thread() which starts at current.

I guess this is the source of confusion.

> > I'll try to recheck my thinking once again, what do you think? Anything
> > else we could miss?
>
> Yeah, the ->group_leader solution seems the most promising. It seems
> correct (ignoring barriers) and should work for all supported cases:
>
> 1) when g is group_leader
> 2) when g is current

OK, thanks.

I'll try to investigate if we can remove

	leader->group_leader = tsk;

from de_thread(). In fact I already thought about this change a long
ago without any connection to while_each_thread(). This assignment
looks "assymetrical" compared to other threads we kill. But we did
have a reason (reasons?). Hopefully, the only really important reason
was already removed by 087806b1.

Oleg.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Q: cgroup: Questions about possible issues in cgroup locking
  2012-01-19 15:45                                     ` Oleg Nesterov
@ 2012-01-19 18:18                                       ` Mandeep Singh Baines
  2012-01-20 15:06                                         ` Oleg Nesterov
  2012-03-20 19:34                                       ` Oleg Nesterov
  1 sibling, 1 reply; 29+ messages in thread
From: Mandeep Singh Baines @ 2012-01-19 18:18 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Mandeep Singh Baines, Frederic Weisbecker, Li Zefan, Tejun Heo,
	LKML, Containers, Cgroups, KAMEZAWA Hiroyuki, Paul Menage,
	Andrew Morton, Paul E. McKenney

Oleg Nesterov (oleg@redhat.com) wrote:
> On 01/18, Mandeep Singh Baines wrote:
> >
> > Oleg Nesterov (oleg@redhat.com) wrote:
> > > On 01/13, Mandeep Singh Baines wrote:
> > > >
> > > > Case 3: g is some other thread
> > > >
> > > > In this case, g MUST be current
> > >
> > > Why? This is not true.
> >
> > Here is my thinking:
> >
> > The terminating condition, t != g, assumes that you can get back to
> > g. If g is unhashed, there is no guarantee you'll ever get back to it.
> > Holding a reference does not prevent unhashing.
> >
> > for_each_process avoids unhashing by starting and ending at init_task
> > (which can never be unhashed).
> 
> Yes,
> 
> > As you pointed out a while back, this doesn't work for:
> >
> > do_each_thread(g, t){
> >   do_something(t);
> > } while_each_thread(g, t)
> >
> > because g can be unhashed.
> >
> > However, you should be able to use while_each_thread if you are current.
> > Being current would prevent 'g' from being unhashed.
> 
> Ah, I misunderstood you. Yes, sure.
> 
> > other than, do_each_thread/while_each_thread, all other callers
> > of while_each_thread() are starting at current. Otherwise, how do
> > you guarantee that it terminates.
> 
> Hm, still can't understand...
> 

On second thought. I think I've made some incorrect assumptions.

> > I see at least one example, coredump_wait() that uses while_each_thread
> > starting at current. I didn't find any cases where while_each_thread
> > starts anywhere other than current or group_leader.
> 
> Probably you meant zap_threads/zap_process, not coredump_wait?
> 
> zap_process() is fine, we hold ->siglock. But zap_threads does _not_

Ah. Missed the siglock.

> start at current, may be you misread the g == tsk->group_leader check
> in the main for_each_process() loop ? But most probably we simply
> misunderstand each other a bit, see below.
> 
> However it starts at ->group_leader, so it won't suffer if we restrict
> the lockless while_each_thread_rcu().
> 
> > > But I guess this doesn't matter, I think I am
> > > starting to understand why our discussion was a bit confusing.
> > >
> > > The problem is _not_ exec/de_thread by itself. The problem is that
> > > while_each_thread(g, t) can race with removing 'g' from the list.
> > > IOW, list_for_each_rcu-like things assume that the head of the list
> > > is "stable" but this is not true.
> > >
> > > And note that de_thread() does not matter in this sense, only
> > > __unhash_process()->list_del_rcu(&p->thread_group) does matter.
> > >
> > > Now. If 'g' is the group leader, we should only worry about exec,
> > > otherwise it can't disappear before other threads.
> > >
> > > But if it is not the group leader, it can simply exit and
> > > while_each_thread(g, t) can never stop by the same reason.
> > >
> >
> > I think we are on the same page. Your explanation is consistent with
> > my understanding.
> >
> > Some other thoughts:
> >
> > I suspect that other than do_each_thread()/while_each_thread() or
> > for_each_thread()/while_each_thread() where 'g' is the group_leader,
> > 'g' MUST be current. So the only cases to consider are:
> 
> I didn't try to grep, but I do not know any example of the lockless
> while_each_thread() which starts at current.
> 

Did a quick scan of all the while_each_thread()s under rcu_read_lock
and couldn't find one that starts at current.

> I guess this is the source of confusion.
> 
> > > I'll try to recheck my thinking once again, what do you think? Anything
> > > else we could miss?
> >
> > Yeah, the ->group_leader solution seems the most promising. It seems
> > correct (ignoring barriers) and should work for all supported cases:
> >
> > 1) when g is group_leader
> > 2) when g is current
> 
> OK, thanks.
> 
> I'll try to investigate if we can remove
> 
> 	leader->group_leader = tsk;
> 
> from de_thread(). In fact I already thought about this change a long
> ago without any connection to while_each_thread(). This assignment
> looks "assymetrical" compared to other threads we kill. But we did
> have a reason (reasons?). Hopefully, the only really important reason
> was already removed by 087806b1.
> 

Ah. So the leader->group_leader may have been necessary earlier in order
to prevent two tasks, old leader and new leader from both returning true
for thread_group_leader(tsk).

Regards,
Mandeep

> Oleg.
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Q: cgroup: Questions about possible issues in cgroup locking
  2012-01-19 18:18                                       ` Mandeep Singh Baines
@ 2012-01-20 15:06                                         ` Oleg Nesterov
  0 siblings, 0 replies; 29+ messages in thread
From: Oleg Nesterov @ 2012-01-20 15:06 UTC (permalink / raw)
  To: Mandeep Singh Baines
  Cc: Frederic Weisbecker, Li Zefan, Tejun Heo, LKML, Containers,
	Cgroups, KAMEZAWA Hiroyuki, Paul Menage, Andrew Morton,
	Paul E. McKenney

On 01/19, Mandeep Singh Baines wrote:
>
> Oleg Nesterov (oleg@redhat.com) wrote:
> > But we did
> > have a reason (reasons?). Hopefully, the only really important reason
> > was already removed by 087806b1.
>
> Ah. So the leader->group_leader may have been necessary earlier in order
> to prevent two tasks, old leader and new leader from both returning true
> for thread_group_leader(tsk).

Not really. de_thread() does release_task(old_leader). Please look, for
example, at __exit_signal(). thread_group_leader() must not be true in
this case.

Oleg.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Q: cgroup: Questions about possible issues in cgroup locking
  2011-12-21 13:08 ` Oleg Nesterov
  2011-12-21 17:56   ` Frederic Weisbecker
  2011-12-21 17:59   ` Frederic Weisbecker
@ 2012-02-01 16:28   ` Frederic Weisbecker
  2 siblings, 0 replies; 29+ messages in thread
From: Frederic Weisbecker @ 2012-02-01 16:28 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Li Zefan, Tejun Heo, LKML, Mandeep Singh Baines, Containers,
	Cgroups, KAMEZAWA Hiroyuki, Paul Menage, Andrew Morton,
	Paul E. McKenney

On Wed, Dec 21, 2011 at 02:08:48PM +0100, Oleg Nesterov wrote:
> On 12/21, Frederic Weisbecker wrote:
> > - Is the check for use_task_css_set_links in cgroup_post_fork() safe? given
> > it is checked outside css_set_lock?
> >
> > Imagine this:
> >
> > CPU 0                                                        CPU 1
> > ----                                                         -----
> >
> > cgroup_enable_task_cg() {
> > 	uset_tasks_css_set_links = 1
> > 	for_each_thread() {
> > 		add tasks in the list
> > 	}
> > }
> >                                                            do_fork() {
> >                                                                cgroup_post_fork() {
> >                                                                      use_tasks_css_set_links appears
> >                                                                      to be equal to 0 due to write/read
> >                                                                       not flushed. New task won't
> >                                                                       appear to the list.
> 
> Yes, I was thinking about this too.
> 
> Or (I think) they can race "contrariwise". CPU_1 creates the new child,
> then CPU_0 sets uset_tasks_css_set_links = 1. But afaics there is no any
> guarantee that CPU_0 sees the result of list_add_tail_rcu().

Exactly! In fact even if RCU was safe with while_each_thread() it wouldn't
be enough for us because of that.

I fear we need the read_lock(tasklist_lock) here, with a pair of smp
barriers to ensure use_task_css_set_links update is visible as
expected.

I'll try to cook something.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Q: cgroup: Questions about possible issues in cgroup locking
  2012-01-19 15:45                                     ` Oleg Nesterov
  2012-01-19 18:18                                       ` Mandeep Singh Baines
@ 2012-03-20 19:34                                       ` Oleg Nesterov
  2012-03-21 18:59                                         ` Mandeep Singh Baines
  1 sibling, 1 reply; 29+ messages in thread
From: Oleg Nesterov @ 2012-03-20 19:34 UTC (permalink / raw)
  To: Mandeep Singh Baines
  Cc: Frederic Weisbecker, Li Zefan, Tejun Heo, LKML, Containers,
	Cgroups, KAMEZAWA Hiroyuki, Paul Menage, Andrew Morton,
	Paul E. McKenney

OK, finally we should do something with this problem ;)

On 01/19, Oleg Nesterov wrote:
>
> I'll try to investigate if we can remove
>
> 	leader->group_leader = tsk;
>
> from de_thread(). In fact I already thought about this change a long
> ago without any connection to while_each_thread(). This assignment
> looks "assymetrical" compared to other threads we kill. But we did
> have a reason (reasons?). Hopefully, the only really important reason
> was already removed by 087806b1.

On the second thought, I think we should not do this.

For example, do_prlimit() assumes that tsk->group_leader is correct
under tasklist_lock.

OK, lets return to the thread_group_leader() check. To ensure we do
not visit the same thread twice we can check 'g', not 't'.

This is what I am going to send, after I re-check once again...

I have the problem with the changelog ;) Somehow it should explain
that while_each_thread_rcu(g, t) can't race with do_group_exit().
I think it can't, list_del_rcu(leader->thread_group) happens when
this entry is already "empty", it should be the last thread in group.
If the non-leader thread goes away from the least, we still have
the "path" to reach the leader. But this is not easy to explain.

As for the barrier. If de_thread() changes the leader it drops
and re-acquires tasklist_lock (this implies mb) after it changes
old_leader->exit_signal (used in thread_group_leader) and before
__unhash_process() which does list_del_rcu().

This means that if while_each_thread() sees the result of
list_del_rcu(old_leader) it must also see that
thread_group_leader(old_leader) != T.

What do you think? Do you see any problems?

Oleg.
---

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 7d379a6..f169bfd 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2323,9 +2323,24 @@ extern bool current_is_single_threaded(void);
 #define do_each_thread(g, t) \
 	for (g = t = &init_task ; (g = t = next_task(g)) != &init_task ; ) do
 
+/*
+ * needs tasklist_lock or ->siglock, or rcu if the caller ensures
+ * that 'g' can't exit or exec.
+ */
 #define while_each_thread(g, t) \
 	while ((t = next_thread(t)) != g)
 
+/*
+ * rcu-safe, but should start at ->group_leader.
+ * thread_group_leader(g) protects against the race with exec which
+ * removes the leader from list.
+ * smp_rmb() pairs with implicit mb() implied by unlock + lock in
+ * de_thread()->release_task() path.
+ */
+#define while_each_thread_rcu(g, t)				\
+	while ((t = next_thread(t)) != g &&			\
+		({ smp_rmb(); thread_group_leader(g); }))
+
 static inline int get_nr_threads(struct task_struct *tsk)
 {
 	return tsk->signal->nr_threads;


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: Q: cgroup: Questions about possible issues in cgroup locking
  2012-03-20 19:34                                       ` Oleg Nesterov
@ 2012-03-21 18:59                                         ` Mandeep Singh Baines
  2012-03-23 17:51                                           ` Oleg Nesterov
  0 siblings, 1 reply; 29+ messages in thread
From: Mandeep Singh Baines @ 2012-03-21 18:59 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Mandeep Singh Baines, Frederic Weisbecker, Li Zefan, Tejun Heo,
	LKML, Containers, Cgroups, KAMEZAWA Hiroyuki, Paul Menage,
	Andrew Morton, Paul E. McKenney

Oleg Nesterov (oleg@redhat.com) wrote:
> OK, finally we should do something with this problem ;)
> 
> On 01/19, Oleg Nesterov wrote:
> >
> > I'll try to investigate if we can remove
> >
> > 	leader->group_leader = tsk;
> >
> > from de_thread(). In fact I already thought about this change a long
> > ago without any connection to while_each_thread(). This assignment
> > looks "assymetrical" compared to other threads we kill. But we did
> > have a reason (reasons?). Hopefully, the only really important reason
> > was already removed by 087806b1.
> 
> On the second thought, I think we should not do this.
> 
> For example, do_prlimit() assumes that tsk->group_leader is correct
> under tasklist_lock.
> 
> OK, lets return to the thread_group_leader() check. To ensure we do
> not visit the same thread twice we can check 'g', not 't'.
> 
> This is what I am going to send, after I re-check once again...
> 
> I have the problem with the changelog ;) Somehow it should explain
> that while_each_thread_rcu(g, t) can't race with do_group_exit().
> I think it can't, list_del_rcu(leader->thread_group) happens when
> this entry is already "empty", it should be the last thread in group.
> If the non-leader thread goes away from the least, we still have
> the "path" to reach the leader. But this is not easy to explain.
> 
> As for the barrier. If de_thread() changes the leader it drops
> and re-acquires tasklist_lock (this implies mb) after it changes
> old_leader->exit_signal (used in thread_group_leader) and before
> __unhash_process() which does list_del_rcu().
> 
> This means that if while_each_thread() sees the result of
> list_del_rcu(old_leader) it must also see that
> thread_group_leader(old_leader) != T.
> 
> What do you think? Do you see any problems?
> 
> Oleg.
> ---
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 7d379a6..f169bfd 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -2323,9 +2323,24 @@ extern bool current_is_single_threaded(void);
>  #define do_each_thread(g, t) \
>  	for (g = t = &init_task ; (g = t = next_task(g)) != &init_task ; ) do
>  
> +/*
> + * needs tasklist_lock or ->siglock, or rcu if the caller ensures
> + * that 'g' can't exit or exec.
> + */
>  #define while_each_thread(g, t) \
>  	while ((t = next_thread(t)) != g)
>  
> +/*
> + * rcu-safe, but should start at ->group_leader.
> + * thread_group_leader(g) protects against the race with exec which
> + * removes the leader from list.
> + * smp_rmb() pairs with implicit mb() implied by unlock + lock in
> + * de_thread()->release_task() path.
> + */
> +#define while_each_thread_rcu(g, t)				\
> +	while ((t = next_thread(t)) != g &&			\
> +		({ smp_rmb(); thread_group_leader(g); }))
> +

Couldn't you miss the exec_thread if:

t = exec_thread && !thread_group_leader(g)

i.e. if you just passed (leader->group_leader = tsk;) in de_thread().

Could we change do_prlimit()? Especially since its slow path.

I really like you're earlier solution (ignoring barrier):

#define while_each_thread(g, t) \
	while (t->group_leader == g->group_leader && (t = next_thread(t)) != g)

Regards,
Mandeep

>  static inline int get_nr_threads(struct task_struct *tsk)
>  {
>  	return tsk->signal->nr_threads;
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Q: cgroup: Questions about possible issues in cgroup locking
  2012-03-21 18:59                                         ` Mandeep Singh Baines
@ 2012-03-23 17:51                                           ` Oleg Nesterov
  0 siblings, 0 replies; 29+ messages in thread
From: Oleg Nesterov @ 2012-03-23 17:51 UTC (permalink / raw)
  To: Mandeep Singh Baines
  Cc: Frederic Weisbecker, Li Zefan, Tejun Heo, LKML, Containers,
	Cgroups, KAMEZAWA Hiroyuki, Paul Menage, Andrew Morton,
	Paul E. McKenney

On 03/21, Mandeep Singh Baines wrote:
>
> Oleg Nesterov (oleg@redhat.com) wrote:
> > +/*
> > + * rcu-safe, but should start at ->group_leader.
> > + * thread_group_leader(g) protects against the race with exec which
> > + * removes the leader from list.
> > + * smp_rmb() pairs with implicit mb() implied by unlock + lock in
> > + * de_thread()->release_task() path.
> > + */
> > +#define while_each_thread_rcu(g, t)				\
> > +	while ((t = next_thread(t)) != g &&			\
> > +		({ smp_rmb(); thread_group_leader(g); }))
> > +
>
> Couldn't you miss the exec_thread if:
>
> t = exec_thread && !thread_group_leader(g)

Yes, we already discussed this, iirc.

I was going to write that this is fine, but then I changed my mind.
Indeed, it is not good while_each_thread_rcu() can miss the new leader.

> Could we change do_prlimit()? Especially since its slow path.

But do_prlimit() is correct. It sees the unhashed task under tasklist,
task->group_leader should be correct.

> I really like you're earlier solution (ignoring barrier):
>
> #define while_each_thread(g, t) \
> 	while (t->group_leader == g->group_leader && (t = next_thread(t)) != g)

The problem is "ignoring barrier and races".

OK, I'll try to think again.

Oleg.


^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2012-03-23 17:59 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-12-21  3:43 Q: cgroup: Questions about possible issues in cgroup locking Frederic Weisbecker
2011-12-21 13:08 ` Oleg Nesterov
2011-12-21 17:56   ` Frederic Weisbecker
2011-12-21 19:01     ` Mandeep Singh Baines
2011-12-21 19:08       ` Frederic Weisbecker
2011-12-21 19:24         ` Mandeep Singh Baines
2011-12-21 20:04           ` Frederic Weisbecker
2011-12-22 15:30             ` Oleg Nesterov
2012-01-04 19:36               ` Mandeep Singh Baines
2012-01-06 15:23                 ` Oleg Nesterov
2012-01-06 18:25                   ` Mandeep Singh Baines
2012-01-11 16:07                     ` Oleg Nesterov
2012-01-12  0:31                       ` Mandeep Singh Baines
2012-01-12 17:07                         ` Oleg Nesterov
2012-01-12 17:57                           ` Mandeep Singh Baines
2012-01-13 15:20                             ` Oleg Nesterov
2012-01-13 18:27                               ` Mandeep Singh Baines
2012-01-14 17:36                                 ` Oleg Nesterov
2012-01-18 23:17                                   ` Mandeep Singh Baines
2012-01-19 15:45                                     ` Oleg Nesterov
2012-01-19 18:18                                       ` Mandeep Singh Baines
2012-01-20 15:06                                         ` Oleg Nesterov
2012-03-20 19:34                                       ` Oleg Nesterov
2012-03-21 18:59                                         ` Mandeep Singh Baines
2012-03-23 17:51                                           ` Oleg Nesterov
2011-12-21 17:59   ` Frederic Weisbecker
2011-12-21 18:11     ` Oleg Nesterov
2011-12-21 18:23       ` Frederic Weisbecker
2012-02-01 16:28   ` Frederic Weisbecker

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).