* cgroup attach task - slogging cpu @ 2013-10-04 5:25 anjana vk [not found] ` <CALPf4Tz+Gf_Q7wKKBufCc1mtV1qVPVrOW0S1qhHxfOv6pJa2Kg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 23+ messages in thread From: anjana vk @ 2013-10-04 5:25 UTC (permalink / raw) To: oleg-H+wXaHxf7aLQT0dZR+AlfA, tj-DgEjT+Ai2ygdnm+yROfE0A Cc: cgroups-u79uwXL29TY76Z2rM5mHXA [-- Attachment #1: Type: text/plain, Size: 1651 bytes --] Hi Oleg, All I saw an issue of CPU slogging posted in https://lkml.org/lkml/2013/8/28/283, and require your valuable suggestion on a very similar issue I am facing. We are facing the issue of cpu slogging in the function cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk, bool threadgroup) in the do-while loop which iterates over the threadgroup. rcu_read_lock(); do { struct task_and_cgroup ent; /* @tsk either already exited or can't exit until the end */ if (tsk->flags & PF_EXITING) continue; /* as per above, nr_threads may decrease, but not increase. */ BUG_ON(i >= group_size); ent.task = tsk; ent.cgrp = task_cgroup_from_root(tsk, root); /* nothing to do if this task is already in the cgroup */ if (ent.cgrp == cgrp) continue; /* * saying GFP_ATOMIC has no effect here because we did prealloc * earlier, but it's good form to communicate our expectations. */ retval = flex_array_put(group, i, &ent, GFP_ATOMIC); BUG_ON(retval != 0); i++; if (!threadgroup) break; } while_each_thread(leader, tsk); Problem Observed: In this loop, in case of a single thread, and argument “bool threadgroup” as “false” and if(ent.cgrp == cgrp) is true we will continue to the next thread instead of breaking out of the loop. Possible Solution and Doubt: When a break condition was added as shown in the patch attached, the cpu sluggishness was not occurring. Can you please provide your suggestions, if this is the right way to fix the above mentioned issue. Also, if a fix is already in for this, can you please guide me to that. Thanks and Regards Anjana [-- Attachment #2: cgroup_singe_thread_attach.patch --] [-- Type: application/octet-stream, Size: 737 bytes --] diff --git a/kernel/cgroup.c b/kernel/cgroup.c index 1f53387..cae8416 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -2002,7 +2002,9 @@ static int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk, ent.task = tsk; ent.cgrp = task_cgroup_from_root(tsk, root); /* nothing to do if this task is already in the cgroup */ - if (ent.cgrp == cgrp) + if (ent.cgrp == cgrp && !threadgroup) + break; + else if(ent.cgrp == cgrp) continue; /* * saying GFP_ATOMIC has no effect here because we did prealloc @@ -2199,7 +2201,6 @@ retry_find_task: ret = cgroup_attach_task(cgrp, tsk, threadgroup); threadgroup_unlock(tsk); - put_task_struct(tsk); out_unlock_cgroup: mutex_unlock(&cgroup_mutex); ^ permalink raw reply related [flat|nested] 23+ messages in thread
[parent not found: <CALPf4Tz+Gf_Q7wKKBufCc1mtV1qVPVrOW0S1qhHxfOv6pJa2Kg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: cgroup attach task - slogging cpu [not found] ` <CALPf4Tz+Gf_Q7wKKBufCc1mtV1qVPVrOW0S1qhHxfOv6pJa2Kg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2013-10-04 13:02 ` Oleg Nesterov [not found] ` <20131004130207.GA9338-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 23+ messages in thread From: Oleg Nesterov @ 2013-10-04 13:02 UTC (permalink / raw) To: anjana vk; +Cc: tj-DgEjT+Ai2ygdnm+yROfE0A, cgroups-u79uwXL29TY76Z2rM5mHXA Hello Anjana, On 10/04, anjana vk wrote: > > I saw an issue of CPU slogging posted in > https://lkml.org/lkml/2013/8/28/283, and require your valuable > suggestion on a very similar issue I am facing. Not sure I understand, but just in case: yes the lockless while_each_thread() is buggy and should be fixed (it should actually die eventually). And a lot of current users of while_each_thread() are themselves buggy and need the fixes no matter what we do with while_each_thread. Oh. and just in case... I am (slowly) working on this, but didn't finish it yet, sorry. But I do not really understand cgroup_attach_task(), and I am not sure you observed the same problem. Add Tejun. > We are facing the issue of cpu slogging in the function > cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk, > bool threadgroup) > in the do-while loop which iterates over the threadgroup. > > rcu_read_lock(); > do { > struct task_and_cgroup ent; > > /* @tsk either already exited or can't exit until the end */ > if (tsk->flags & PF_EXITING) > continue; > > /* as per above, nr_threads may decrease, but not increase. */ > BUG_ON(i >= group_size); > ent.task = tsk; > ent.cgrp = task_cgroup_from_root(tsk, root); > /* nothing to do if this task is already in the cgroup */ > if (ent.cgrp == cgrp) > continue; > /* > * saying GFP_ATOMIC has no effect here because we did prealloc > * earlier, but it's good form to communicate our expectations. > */ > retval = flex_array_put(group, i, &ent, GFP_ATOMIC); > BUG_ON(retval != 0); > i++; > > if (!threadgroup) > break; > } while_each_thread(leader, tsk); > > Problem Observed: > In this loop, in case of a single thread, and argument “bool > threadgroup” as “false” and > if(ent.cgrp == cgrp) is true > we will continue to the next thread instead of breaking out of the loop. But in this case the loop should be terminated by while_each_thread's check? Again, again, while_each_thread() is wrong, and we need --- x/kernel/cgroup.c +++ x/kernel/cgroup.c @@ -2034,7 +2034,7 @@ static int cgroup_attach_task(struct cgr * take an rcu_read_lock. */ rcu_read_lock(); - do { + for_each_thread(leader->signal, task) { struct task_and_cgroup ent; /* @tsk either already exited or can't exit until the end */ @@ -2058,7 +2058,7 @@ static int cgroup_attach_task(struct cgr if (!threadgroup) break; - } while_each_thread(leader, tsk); + } rcu_read_unlock(); /* remember the number of threads in the array for later. */ group_size = i; after we add for_each_thread/etc. But I am not sure we need another "threadgroup" check. > Possible Solution and Doubt: > When a break condition was added as shown in the patch attached, the > cpu sluggishness was not occurring. > Can you please provide your suggestions, if this is the right way to > fix the above mentioned issue. > Also, if a fix is already in for this, can you please guide me to that. > > Thanks and Regards > Anjana > > > diff --git a/kernel/cgroup.c b/kernel/cgroup.c > index 1f53387..cae8416 100644 > --- a/kernel/cgroup.c > +++ b/kernel/cgroup.c > @@ -2002,7 +2002,9 @@ static int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk, > ent.task = tsk; > ent.cgrp = task_cgroup_from_root(tsk, root); > /* nothing to do if this task is already in the cgroup */ > - if (ent.cgrp == cgrp) > + if (ent.cgrp == cgrp && !threadgroup) > + break; > + else if(ent.cgrp == cgrp) > continue; > /* > * saying GFP_ATOMIC has no effect here because we did prealloc > @@ -2199,7 +2201,6 @@ retry_find_task: > ret = cgroup_attach_task(cgrp, tsk, threadgroup); > > threadgroup_unlock(tsk); > - > put_task_struct(tsk); > out_unlock_cgroup: > mutex_unlock(&cgroup_mutex); ^ permalink raw reply [flat|nested] 23+ messages in thread
[parent not found: <20131004130207.GA9338-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* Re: cgroup attach task - slogging cpu [not found] ` <20131004130207.GA9338-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> @ 2013-10-07 18:45 ` Tejun Heo [not found] ` <20131007184507.GD27396-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org> 0 siblings, 1 reply; 23+ messages in thread From: Tejun Heo @ 2013-10-07 18:45 UTC (permalink / raw) To: Oleg Nesterov; +Cc: anjana vk, cgroups-u79uwXL29TY76Z2rM5mHXA Hey, On Fri, Oct 04, 2013 at 03:02:07PM +0200, Oleg Nesterov wrote: > Not sure I understand, but just in case: yes the lockless > while_each_thread() is buggy and should be fixed (it should actually > die eventually). And a lot of current users of while_each_thread() > are themselves buggy and need the fixes no matter what we do with > while_each_thread. > > Oh. and just in case... I am (slowly) working on this, but didn't > finish it yet, sorry. Is there anything I can do to make the hang go away in the meantime? > But I do not really understand cgroup_attach_task(), and I am not > sure you observed the same problem. Add Tejun. Sounds like the same problem to me. > --- x/kernel/cgroup.c > +++ x/kernel/cgroup.c > @@ -2034,7 +2034,7 @@ static int cgroup_attach_task(struct cgr > * take an rcu_read_lock. > */ > rcu_read_lock(); > - do { > + for_each_thread(leader->signal, task) { > struct task_and_cgroup ent; > > /* @tsk either already exited or can't exit until the end */ > @@ -2058,7 +2058,7 @@ static int cgroup_attach_task(struct cgr > > if (!threadgroup) > break; > - } while_each_thread(leader, tsk); > + } > rcu_read_unlock(); > /* remember the number of threads in the array for later. */ > group_size = i; > > after we add for_each_thread/etc. But I am not sure we need another > "threadgroup" check. ... > > diff --git a/kernel/cgroup.c b/kernel/cgroup.c > > index 1f53387..cae8416 100644 > > --- a/kernel/cgroup.c > > +++ b/kernel/cgroup.c > > @@ -2002,7 +2002,9 @@ static int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk, > > ent.task = tsk; > > ent.cgrp = task_cgroup_from_root(tsk, root); > > /* nothing to do if this task is already in the cgroup */ > > - if (ent.cgrp == cgrp) > > + if (ent.cgrp == cgrp && !threadgroup) > > + break; > > + else if(ent.cgrp == cgrp) > > continue; I have no idea know how this would make any difference. Hmm... Thanks. -- tejun ^ permalink raw reply [flat|nested] 23+ messages in thread
[parent not found: <20131007184507.GD27396-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>]
* Re: cgroup attach task - slogging cpu [not found] ` <20131007184507.GD27396-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org> @ 2013-10-08 14:58 ` Oleg Nesterov [not found] ` <20131008145833.GA15600-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 23+ messages in thread From: Oleg Nesterov @ 2013-10-08 14:58 UTC (permalink / raw) To: Tejun Heo; +Cc: anjana vk, cgroups-u79uwXL29TY76Z2rM5mHXA On 10/07, Tejun Heo wrote: > > Hey, > > On Fri, Oct 04, 2013 at 03:02:07PM +0200, Oleg Nesterov wrote: > > Not sure I understand, but just in case: yes the lockless > > while_each_thread() is buggy and should be fixed (it should actually > > die eventually). And a lot of current users of while_each_thread() > > are themselves buggy and need the fixes no matter what we do with > > while_each_thread. > > > > Oh. and just in case... I am (slowly) working on this, but didn't > > finish it yet, sorry. > > Is there anything I can do to make the hang go away in the meantime? well, probably tasklist, or pid_alive() check... note that this code rcu/while_each_thread() logic looks wrong even if while_each_thread() was fine. Will try to makes a patch today. But! Anjana! I am so sorry!! I simply can't understand how I managed to misunderstand (and confuse!) you. Tejun, could you please look at the patch from Anjana again? > > > --- a/kernel/cgroup.c > > > +++ b/kernel/cgroup.c > > > @@ -2002,7 +2002,9 @@ static int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk, > > > ent.task = tsk; > > > ent.cgrp = task_cgroup_from_root(tsk, root); > > > /* nothing to do if this task is already in the cgroup */ > > > - if (ent.cgrp == cgrp) > > > + if (ent.cgrp == cgrp && !threadgroup) > > > + break; > > > + else if(ent.cgrp == cgrp) > > > continue; > > I have no idea know how this would make any difference. Hmm... I guess the patch needs a cleanup, but this code looks indeed wrong even if we forget about rcu/while_each_thread? Really, we should not "continue" if threadgroup == T, in this case we miss the last if (!threadgroup) break; check in the main loop. So Anjana was right (sorry again!), and we should probably do ent.cgrp = task_cgroup_from_root(...); if (ent.cgrp != cgrp) { retval = flex_array_put(...); ... } if (!threadgroup) break; Or I am wrong again? Oleg. ^ permalink raw reply [flat|nested] 23+ messages in thread
[parent not found: <20131008145833.GA15600-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* Re: cgroup attach task - slogging cpu [not found] ` <20131008145833.GA15600-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> @ 2013-10-09 5:35 ` Li Zefan [not found] ` <5254EB2A.7090803-hv44wF8Li93QT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 23+ messages in thread From: Li Zefan @ 2013-10-09 5:35 UTC (permalink / raw) To: Oleg Nesterov; +Cc: Tejun Heo, anjana vk, cgroups-u79uwXL29TY76Z2rM5mHXA >>>> --- a/kernel/cgroup.c >>>> +++ b/kernel/cgroup.c >>>> @@ -2002,7 +2002,9 @@ static int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk, >>>> ent.task = tsk; >>>> ent.cgrp = task_cgroup_from_root(tsk, root); >>>> /* nothing to do if this task is already in the cgroup */ >>>> - if (ent.cgrp == cgrp) >>>> + if (ent.cgrp == cgrp && !threadgroup) >>>> + break; >>>> + else if(ent.cgrp == cgrp) >>>> continue; >> >> I have no idea know how this would make any difference. Hmm... > > I guess the patch needs a cleanup, but this code looks indeed wrong > even if we forget about rcu/while_each_thread? Really, we should not > "continue" if threadgroup == T, in this case we miss the last > > if (!threadgroup) > break; > Yes, this is wrong. lxc34:/mnt/tmp # echo 5207 > tasks lxc34:/mnt/tmp # cat tasks 5207 lxc34:/mnt/tmp # echo 5207 > tasks lxc34:/mnt/tmp # cat tasks 5207 5215 lxc34:/mnt/tmp # echo 5207 > tasks lxc34:/mnt/tmp # cat tasks 5207 5215 5216 I wanted to attach 5207 to the cgroup, and in this case it should be a no-op because the thread was already in this cgroup, but the result was another thread got attached to it. Anjana, could you revise the patch and send it out with proper changelog and Signed-off-by? And please add "Cc: <stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org> # 3.9+" The cpu slogging issue is another bug that we're working on. > check in the main loop. So Anjana was right (sorry again!), and we > should probably do > > ent.cgrp = task_cgroup_from_root(...); > if (ent.cgrp != cgrp) { > retval = flex_array_put(...); > ... > } > > if (!threadgroup) > break; > Or do { ... if (ent.cgrp == cgrp) goto next; ... next: if (!threadgroup) break; } while (...); > Or I am wrong again? > No, you are not! :) ^ permalink raw reply [flat|nested] 23+ messages in thread
[parent not found: <5254EB2A.7090803-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>]
* Re: cgroup attach task - slogging cpu [not found] ` <5254EB2A.7090803-hv44wF8Li93QT0dZR+AlfA@public.gmane.org> @ 2013-10-09 13:30 ` Oleg Nesterov [not found] ` <20131009133047.GA12414-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 23+ messages in thread From: Oleg Nesterov @ 2013-10-09 13:30 UTC (permalink / raw) To: Li Zefan; +Cc: Tejun Heo, anjana vk, cgroups-u79uwXL29TY76Z2rM5mHXA On 10/09, Li Zefan wrote: > > Anjana, could you revise the patch and send it out with proper changelog > and Signed-off-by? And please add "Cc: <stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org> # 3.9+" Yes, Anjana, please! > > check in the main loop. So Anjana was right (sorry again!), and we > > should probably do > > > > ent.cgrp = task_cgroup_from_root(...); > > if (ent.cgrp != cgrp) { > > retval = flex_array_put(...); > > ... > > } > > > > if (!threadgroup) > > break; > > > > Or > > do { > ... > if (ent.cgrp == cgrp) > goto next; Or this, agreed. > > Or I am wrong again? > > No, you are not! :) Thanks ;) Oleg. ^ permalink raw reply [flat|nested] 23+ messages in thread
[parent not found: <20131009133047.GA12414-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* Re: cgroup attach task - slogging cpu [not found] ` <20131009133047.GA12414-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> @ 2013-10-09 14:05 ` Oleg Nesterov 2013-10-09 16:54 ` Oleg Nesterov [not found] ` <CAChhN7RerxpSadqyosUeSKFg+qcOpO4d-maEKBZ0rvOQGvN27g@mail.gmail.com> 0 siblings, 2 replies; 23+ messages in thread From: Oleg Nesterov @ 2013-10-09 14:05 UTC (permalink / raw) To: Li Zefan; +Cc: Tejun Heo, anjana vk, cgroups-u79uwXL29TY76Z2rM5mHXA On 10/09, Oleg Nesterov wrote: > > On 10/09, Li Zefan wrote: > > > > Anjana, could you revise the patch and send it out with proper changelog > > and Signed-off-by? And please add "Cc: <stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org> # 3.9+" > > Yes, Anjana, please! Please note also that the PF_EXITING check has the same problem, it also needs "goto next". > > > check in the main loop. So Anjana was right (sorry again!), and we > > > should probably do > > > > > > ent.cgrp = task_cgroup_from_root(...); > > > if (ent.cgrp != cgrp) { > > > retval = flex_array_put(...); > > > ... > > > } > > > > > > if (!threadgroup) > > > break; > > > > > > > Or > > > > do { > > ... > > if (ent.cgrp == cgrp) > > goto next; > > Or this, agreed. > > > > Or I am wrong again? > > > > No, you are not! :) > > Thanks ;) > > Oleg. ^ permalink raw reply [flat|nested] 23+ messages in thread
* cgroup_attach_task && while_each_thread (Was: cgroup attach task - slogging cpu) @ 2013-10-09 16:54 ` Oleg Nesterov 0 siblings, 0 replies; 23+ messages in thread From: Oleg Nesterov @ 2013-10-09 16:54 UTC (permalink / raw) To: Li Zefan; +Cc: Tejun Heo, anjana vk, cgroups, linux-kernel, eunki_kim And I am starting to think that this change should also fix the while_each_thread() problems in this particular case. In generak the code like rcu_read_lock(); task = find_get_task(...); rcu_read_unlock(); rcu_read_lock(); t = task; do { ... } while_each_thread (task, t); rcu_read_unlock(); is wrong even if while_each_thread() was correct (and we have a lot of examples of this pattern). A GP can pass before the 2nd rcu-lock, and we simply can't trust ->thread_group.next. But I didn't notice that cgroup_attach_task(tsk, threadgroup) can only be called with threadgroup == T when a) tsk is ->group_leader and b) we hold threadgroup_lock() which blocks de_thread(). IOW, in this case "tsk" can't be removed from ->thread_group list before other threads. If next_thread() sees thread_group.next != leader, we know that the that .next thread didn't do __unhash_process() yet, and since we know that in this case "leader" didn't do this too we are safe. In short: __unhash_process(leader) (in this) case can never change ->thread_group.next of another thread, because leader->thread_group should be already list_empty(). On 10/09, Oleg Nesterov wrote: > > On 10/09, Oleg Nesterov wrote: > > > > On 10/09, Li Zefan wrote: > > > > > > Anjana, could you revise the patch and send it out with proper changelog > > > and Signed-off-by? And please add "Cc: <stable@vger.kernel.org> # 3.9+" > > > > Yes, Anjana, please! > > Please note also that the PF_EXITING check has the same problem, it also > needs "goto next". > > > > > check in the main loop. So Anjana was right (sorry again!), and we > > > > should probably do > > > > > > > > ent.cgrp = task_cgroup_from_root(...); > > > > if (ent.cgrp != cgrp) { > > > > retval = flex_array_put(...); > > > > ... > > > > } > > > > > > > > if (!threadgroup) > > > > break; > > > > > > > > > > Or > > > > > > do { > > > ... > > > if (ent.cgrp == cgrp) > > > goto next; > > > > Or this, agreed. > > > > > > Or I am wrong again? > > > > > > No, you are not! :) > > > > Thanks ;) > > > > Oleg. ^ permalink raw reply [flat|nested] 23+ messages in thread
* cgroup_attach_task && while_each_thread (Was: cgroup attach task - slogging cpu) @ 2013-10-09 16:54 ` Oleg Nesterov 0 siblings, 0 replies; 23+ messages in thread From: Oleg Nesterov @ 2013-10-09 16:54 UTC (permalink / raw) To: Li Zefan Cc: Tejun Heo, anjana vk, cgroups-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, eunki_kim-Sze3O3UU22JBDgjK7y7TUQ And I am starting to think that this change should also fix the while_each_thread() problems in this particular case. In generak the code like rcu_read_lock(); task = find_get_task(...); rcu_read_unlock(); rcu_read_lock(); t = task; do { ... } while_each_thread (task, t); rcu_read_unlock(); is wrong even if while_each_thread() was correct (and we have a lot of examples of this pattern). A GP can pass before the 2nd rcu-lock, and we simply can't trust ->thread_group.next. But I didn't notice that cgroup_attach_task(tsk, threadgroup) can only be called with threadgroup == T when a) tsk is ->group_leader and b) we hold threadgroup_lock() which blocks de_thread(). IOW, in this case "tsk" can't be removed from ->thread_group list before other threads. If next_thread() sees thread_group.next != leader, we know that the that .next thread didn't do __unhash_process() yet, and since we know that in this case "leader" didn't do this too we are safe. In short: __unhash_process(leader) (in this) case can never change ->thread_group.next of another thread, because leader->thread_group should be already list_empty(). On 10/09, Oleg Nesterov wrote: > > On 10/09, Oleg Nesterov wrote: > > > > On 10/09, Li Zefan wrote: > > > > > > Anjana, could you revise the patch and send it out with proper changelog > > > and Signed-off-by? And please add "Cc: <stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org> # 3.9+" > > > > Yes, Anjana, please! > > Please note also that the PF_EXITING check has the same problem, it also > needs "goto next". > > > > > check in the main loop. So Anjana was right (sorry again!), and we > > > > should probably do > > > > > > > > ent.cgrp = task_cgroup_from_root(...); > > > > if (ent.cgrp != cgrp) { > > > > retval = flex_array_put(...); > > > > ... > > > > } > > > > > > > > if (!threadgroup) > > > > break; > > > > > > > > > > Or > > > > > > do { > > > ... > > > if (ent.cgrp == cgrp) > > > goto next; > > > > Or this, agreed. > > > > > > Or I am wrong again? > > > > > > No, you are not! :) > > > > Thanks ;) > > > > Oleg. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: cgroup_attach_task && while_each_thread (Was: cgroup attach task - slogging cpu) @ 2013-10-11 13:15 ` Li Zefan 0 siblings, 0 replies; 23+ messages in thread From: Li Zefan @ 2013-10-11 13:15 UTC (permalink / raw) To: Oleg Nesterov; +Cc: Tejun Heo, anjana vk, cgroups, linux-kernel, eunki_kim On 2013/10/10 0:54, Oleg Nesterov wrote: > And I am starting to think that this change should also fix the > while_each_thread() problems in this particular case. > > In generak the code like > > rcu_read_lock(); > task = find_get_task(...); > rcu_read_unlock(); > > rcu_read_lock(); > t = task; > do { > ... > } while_each_thread (task, t); > rcu_read_unlock(); > > is wrong even if while_each_thread() was correct (and we have a lot > of examples of this pattern). A GP can pass before the 2nd rcu-lock, > and we simply can't trust ->thread_group.next. > > But I didn't notice that cgroup_attach_task(tsk, threadgroup) can only > be called with threadgroup == T when a) tsk is ->group_leader and b) > we hold threadgroup_lock() which blocks de_thread(). IOW, in this case > "tsk" can't be removed from ->thread_group list before other threads. > > If next_thread() sees thread_group.next != leader, we know that the > that .next thread didn't do __unhash_process() yet, and since we > know that in this case "leader" didn't do this too we are safe. > > In short: __unhash_process(leader) (in this) case can never change > ->thread_group.next of another thread, because leader->thread_group > should be already list_empty(). > If threadgroup == false, and if the tsk is existing or is already in the targeted cgroup, we won't break the loop due to the bug but do this: while_each_thread(task, t) If @task isn't the leader, we might got stuck in the loop? ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: cgroup_attach_task && while_each_thread (Was: cgroup attach task - slogging cpu) @ 2013-10-11 13:15 ` Li Zefan 0 siblings, 0 replies; 23+ messages in thread From: Li Zefan @ 2013-10-11 13:15 UTC (permalink / raw) To: Oleg Nesterov Cc: Tejun Heo, anjana vk, cgroups-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, eunki_kim-Sze3O3UU22JBDgjK7y7TUQ On 2013/10/10 0:54, Oleg Nesterov wrote: > And I am starting to think that this change should also fix the > while_each_thread() problems in this particular case. > > In generak the code like > > rcu_read_lock(); > task = find_get_task(...); > rcu_read_unlock(); > > rcu_read_lock(); > t = task; > do { > ... > } while_each_thread (task, t); > rcu_read_unlock(); > > is wrong even if while_each_thread() was correct (and we have a lot > of examples of this pattern). A GP can pass before the 2nd rcu-lock, > and we simply can't trust ->thread_group.next. > > But I didn't notice that cgroup_attach_task(tsk, threadgroup) can only > be called with threadgroup == T when a) tsk is ->group_leader and b) > we hold threadgroup_lock() which blocks de_thread(). IOW, in this case > "tsk" can't be removed from ->thread_group list before other threads. > > If next_thread() sees thread_group.next != leader, we know that the > that .next thread didn't do __unhash_process() yet, and since we > know that in this case "leader" didn't do this too we are safe. > > In short: __unhash_process(leader) (in this) case can never change > ->thread_group.next of another thread, because leader->thread_group > should be already list_empty(). > If threadgroup == false, and if the tsk is existing or is already in the targeted cgroup, we won't break the loop due to the bug but do this: while_each_thread(task, t) If @task isn't the leader, we might got stuck in the loop? ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: cgroup_attach_task && while_each_thread (Was: cgroup attach task - slogging cpu) @ 2013-10-11 16:00 ` Oleg Nesterov 0 siblings, 0 replies; 23+ messages in thread From: Oleg Nesterov @ 2013-10-11 16:00 UTC (permalink / raw) To: Li Zefan; +Cc: Tejun Heo, anjana vk, cgroups, linux-kernel, eunki_kim On 10/11, Li Zefan wrote: > > On 2013/10/10 0:54, Oleg Nesterov wrote: > > > And I am starting to think that this change should also fix the > > while_each_thread() problems in this particular case. Please see below, > > In generak the code like > > > > rcu_read_lock(); > > task = find_get_task(...); > > rcu_read_unlock(); > > > > rcu_read_lock(); > > t = task; > > do { > > ... > > } while_each_thread (task, t); > > rcu_read_unlock(); > > > > is wrong even if while_each_thread() was correct (and we have a lot > > of examples of this pattern). A GP can pass before the 2nd rcu-lock, > > and we simply can't trust ->thread_group.next. > > > > But I didn't notice that cgroup_attach_task(tsk, threadgroup) can only > > be called with threadgroup == T when a) tsk is ->group_leader and b) > > we hold threadgroup_lock() which blocks de_thread(). IOW, in this case > > "tsk" can't be removed from ->thread_group list before other threads. > > > > If next_thread() sees thread_group.next != leader, we know that the > > that .next thread didn't do __unhash_process() yet, and since we > > know that in this case "leader" didn't do this too we are safe. > > > > In short: __unhash_process(leader) (in this) case can never change > > ->thread_group.next of another thread, because leader->thread_group > > should be already list_empty(). > > > > If threadgroup == false, and if the tsk is existing or is already in > the targeted cgroup, we won't break the loop due to the bug but do > this: > > while_each_thread(task, t) > > If @task isn't the leader, we might got stuck in the loop? Yes, yes, sure. We need to fix the wrong "continue" logic, hopefully I tried to say (see above) that after we do this while_each_thread() should be fine in this particular case. Oleg. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: cgroup_attach_task && while_each_thread (Was: cgroup attach task - slogging cpu) @ 2013-10-11 16:00 ` Oleg Nesterov 0 siblings, 0 replies; 23+ messages in thread From: Oleg Nesterov @ 2013-10-11 16:00 UTC (permalink / raw) To: Li Zefan Cc: Tejun Heo, anjana vk, cgroups-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, eunki_kim-Sze3O3UU22JBDgjK7y7TUQ On 10/11, Li Zefan wrote: > > On 2013/10/10 0:54, Oleg Nesterov wrote: > > > And I am starting to think that this change should also fix the > > while_each_thread() problems in this particular case. Please see below, > > In generak the code like > > > > rcu_read_lock(); > > task = find_get_task(...); > > rcu_read_unlock(); > > > > rcu_read_lock(); > > t = task; > > do { > > ... > > } while_each_thread (task, t); > > rcu_read_unlock(); > > > > is wrong even if while_each_thread() was correct (and we have a lot > > of examples of this pattern). A GP can pass before the 2nd rcu-lock, > > and we simply can't trust ->thread_group.next. > > > > But I didn't notice that cgroup_attach_task(tsk, threadgroup) can only > > be called with threadgroup == T when a) tsk is ->group_leader and b) > > we hold threadgroup_lock() which blocks de_thread(). IOW, in this case > > "tsk" can't be removed from ->thread_group list before other threads. > > > > If next_thread() sees thread_group.next != leader, we know that the > > that .next thread didn't do __unhash_process() yet, and since we > > know that in this case "leader" didn't do this too we are safe. > > > > In short: __unhash_process(leader) (in this) case can never change > > ->thread_group.next of another thread, because leader->thread_group > > should be already list_empty(). > > > > If threadgroup == false, and if the tsk is existing or is already in > the targeted cgroup, we won't break the loop due to the bug but do > this: > > while_each_thread(task, t) > > If @task isn't the leader, we might got stuck in the loop? Yes, yes, sure. We need to fix the wrong "continue" logic, hopefully I tried to say (see above) that after we do this while_each_thread() should be fine in this particular case. Oleg. ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH] cgroup: fix to break the while loop in cgroup_attach_task() correctly @ 2013-10-12 2:59 ` Li Zefan 0 siblings, 0 replies; 23+ messages in thread From: Li Zefan @ 2013-10-12 2:59 UTC (permalink / raw) To: Tejun Heo; +Cc: Oleg Nesterov, anjana vk, cgroups, linux-kernel, eunki_kim From: Anjana V Kumar <anjanavk12@gmail.com> Both Anjana and Eunki reported a stall in the while_each_thread loop in cgroup_attach_task(). It's because, when we attach a single thread to a cgroup, if the cgroup is exiting or is already in that cgroup, we won't break the loop. If the task is already in the cgroup, the bug can lead to another thread being attached to the cgroup unexpectedly: # echo 5207 > tasks # cat tasks 5207 # echo 5207 > tasks # cat tasks 5207 5215 What's worse, if the task to be attached isn't the leader of the thread group, we might never exit the loop, hence cpu stall. Thanks for Oleg's analysis. This bug was introduced by commit 081aa458c38ba576bdd4265fc807fa95b48b9e79 ("cgroup: consolidate cgroup_attach_task() and cgroup_attach_proc()") Cc: <stable@vger.kernel.org> # 3.9+ Reported-by: Eunki Kim <eunki_kim@samsung.com> Reported-by: Anjana V Kumar <anjanavk12@gmail.com> Signed-off-by: Anjana V Kumar <anjanavk12@gmail.com> [ lizf: - fixed the first continue, pointed out by Oleg, - rewrote changelog. ] Signed-off-by: Li Zefan <lizefan@huawei.com> --- kernel/cgroup.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/kernel/cgroup.c b/kernel/cgroup.c index a5629f1..3db1d2e 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -2002,7 +2002,7 @@ static int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk, /* @tsk either already exited or can't exit until the end */ if (tsk->flags & PF_EXITING) - continue; + goto next; /* as per above, nr_threads may decrease, but not increase. */ BUG_ON(i >= group_size); @@ -2010,7 +2010,7 @@ static int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk, ent.cgrp = task_cgroup_from_root(tsk, root); /* nothing to do if this task is already in the cgroup */ if (ent.cgrp == cgrp) - continue; + goto next; /* * saying GFP_ATOMIC has no effect here because we did prealloc * earlier, but it's good form to communicate our expectations. @@ -2018,7 +2018,7 @@ static int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk, retval = flex_array_put(group, i, &ent, GFP_ATOMIC); BUG_ON(retval != 0); i++; - +next: if (!threadgroup) break; } while_each_thread(leader, tsk); -- 1.8.0.2 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH] cgroup: fix to break the while loop in cgroup_attach_task() correctly @ 2013-10-12 2:59 ` Li Zefan 0 siblings, 0 replies; 23+ messages in thread From: Li Zefan @ 2013-10-12 2:59 UTC (permalink / raw) To: Tejun Heo Cc: Oleg Nesterov, anjana vk, cgroups-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, eunki_kim-Sze3O3UU22JBDgjK7y7TUQ From: Anjana V Kumar <anjanavk12-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> Both Anjana and Eunki reported a stall in the while_each_thread loop in cgroup_attach_task(). It's because, when we attach a single thread to a cgroup, if the cgroup is exiting or is already in that cgroup, we won't break the loop. If the task is already in the cgroup, the bug can lead to another thread being attached to the cgroup unexpectedly: # echo 5207 > tasks # cat tasks 5207 # echo 5207 > tasks # cat tasks 5207 5215 What's worse, if the task to be attached isn't the leader of the thread group, we might never exit the loop, hence cpu stall. Thanks for Oleg's analysis. This bug was introduced by commit 081aa458c38ba576bdd4265fc807fa95b48b9e79 ("cgroup: consolidate cgroup_attach_task() and cgroup_attach_proc()") Cc: <stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org> # 3.9+ Reported-by: Eunki Kim <eunki_kim-Sze3O3UU22JBDgjK7y7TUQ@public.gmane.org> Reported-by: Anjana V Kumar <anjanavk12-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> Signed-off-by: Anjana V Kumar <anjanavk12-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> [ lizf: - fixed the first continue, pointed out by Oleg, - rewrote changelog. ] Signed-off-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org> --- kernel/cgroup.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/kernel/cgroup.c b/kernel/cgroup.c index a5629f1..3db1d2e 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -2002,7 +2002,7 @@ static int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk, /* @tsk either already exited or can't exit until the end */ if (tsk->flags & PF_EXITING) - continue; + goto next; /* as per above, nr_threads may decrease, but not increase. */ BUG_ON(i >= group_size); @@ -2010,7 +2010,7 @@ static int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk, ent.cgrp = task_cgroup_from_root(tsk, root); /* nothing to do if this task is already in the cgroup */ if (ent.cgrp == cgrp) - continue; + goto next; /* * saying GFP_ATOMIC has no effect here because we did prealloc * earlier, but it's good form to communicate our expectations. @@ -2018,7 +2018,7 @@ static int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk, retval = flex_array_put(group, i, &ent, GFP_ATOMIC); BUG_ON(retval != 0); i++; - +next: if (!threadgroup) break; } while_each_thread(leader, tsk); -- 1.8.0.2 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH] cgroup: fix to break the while loop in cgroup_attach_task() correctly @ 2013-10-13 20:08 ` Tejun Heo 0 siblings, 0 replies; 23+ messages in thread From: Tejun Heo @ 2013-10-13 20:08 UTC (permalink / raw) To: Li Zefan; +Cc: Oleg Nesterov, anjana vk, cgroups, linux-kernel, eunki_kim On Sat, Oct 12, 2013 at 10:59:17AM +0800, Li Zefan wrote: > From: Anjana V Kumar <anjanavk12@gmail.com> > > Both Anjana and Eunki reported a stall in the while_each_thread loop > in cgroup_attach_task(). > > It's because, when we attach a single thread to a cgroup, if the cgroup > is exiting or is already in that cgroup, we won't break the loop. > > If the task is already in the cgroup, the bug can lead to another thread > being attached to the cgroup unexpectedly: > > # echo 5207 > tasks > # cat tasks > 5207 > # echo 5207 > tasks > # cat tasks > 5207 > 5215 > > What's worse, if the task to be attached isn't the leader of the thread > group, we might never exit the loop, hence cpu stall. Thanks for Oleg's > analysis. > > This bug was introduced by commit 081aa458c38ba576bdd4265fc807fa95b48b9e79 > ("cgroup: consolidate cgroup_attach_task() and cgroup_attach_proc()") > > Cc: <stable@vger.kernel.org> # 3.9+ > Reported-by: Eunki Kim <eunki_kim@samsung.com> > Reported-by: Anjana V Kumar <anjanavk12@gmail.com> > Signed-off-by: Anjana V Kumar <anjanavk12@gmail.com> > [ lizf: - fixed the first continue, pointed out by Oleg, > - rewrote changelog. ] > Signed-off-by: Li Zefan <lizefan@huawei.com> Applied to cgroup/for-3.12-fixes. Thanks. -- tejun ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] cgroup: fix to break the while loop in cgroup_attach_task() correctly @ 2013-10-13 20:08 ` Tejun Heo 0 siblings, 0 replies; 23+ messages in thread From: Tejun Heo @ 2013-10-13 20:08 UTC (permalink / raw) To: Li Zefan Cc: Oleg Nesterov, anjana vk, cgroups-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, eunki_kim-Sze3O3UU22JBDgjK7y7TUQ On Sat, Oct 12, 2013 at 10:59:17AM +0800, Li Zefan wrote: > From: Anjana V Kumar <anjanavk12-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> > > Both Anjana and Eunki reported a stall in the while_each_thread loop > in cgroup_attach_task(). > > It's because, when we attach a single thread to a cgroup, if the cgroup > is exiting or is already in that cgroup, we won't break the loop. > > If the task is already in the cgroup, the bug can lead to another thread > being attached to the cgroup unexpectedly: > > # echo 5207 > tasks > # cat tasks > 5207 > # echo 5207 > tasks > # cat tasks > 5207 > 5215 > > What's worse, if the task to be attached isn't the leader of the thread > group, we might never exit the loop, hence cpu stall. Thanks for Oleg's > analysis. > > This bug was introduced by commit 081aa458c38ba576bdd4265fc807fa95b48b9e79 > ("cgroup: consolidate cgroup_attach_task() and cgroup_attach_proc()") > > Cc: <stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org> # 3.9+ > Reported-by: Eunki Kim <eunki_kim-Sze3O3UU22JBDgjK7y7TUQ@public.gmane.org> > Reported-by: Anjana V Kumar <anjanavk12-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> > Signed-off-by: Anjana V Kumar <anjanavk12-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> > [ lizf: - fixed the first continue, pointed out by Oleg, > - rewrote changelog. ] > Signed-off-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org> Applied to cgroup/for-3.12-fixes. Thanks. -- tejun ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: cgroup_attach_task && while_each_thread (Was: cgroup attach task - slogging cpu) @ 2013-10-15 5:04 ` anjana vk 0 siblings, 0 replies; 23+ messages in thread From: anjana vk @ 2013-10-15 5:04 UTC (permalink / raw) To: Oleg Nesterov; +Cc: Li Zefan, Tejun Heo, cgroups, linux-kernel, eunki_kim Hi All, Thankyou for your suggestions. I have made the modifications as suggested. Please find the patch below. >From fd85f093f912a160c0896ea2784dfbdd64f142ca Mon Sep 17 00:00:00 2001 From: Anjana V Kumar <anjanavk12@gmail.com> Date: Wed, 9 Oct 2013 16:49:22 +0530 Subject: [PATCH] Single thread break missing in cgroup_attach_task Problem: Issue when attaching a single thread to a cgroup if the thread was alredy in the cgroup. The check if the thread is already in cgroup in the above case, continues to the next thread instead of exciting. Solution: Add a check if it is a single thread or a threadgroup and take decision accordingly. If it is a single thread break out of the do-while loop, and if it is a threadgroup goto the next thread. Change-Id: If6bbdef12ca5992823ac2a7bc15eaeb3dddb461a Signed-off-by: Anjana V Kumar <anjanavk12@gmail.com> --- kernel/cgroup.c | 6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) diff --git a/kernel/cgroup.c b/kernel/cgroup.c index 2418b6e..13eafdc 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -2039,7 +2039,7 @@ static int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk, /* @tsk either already exited or can't exit until the end */ if (tsk->flags & PF_EXITING) - continue; + goto next_thread; /* as per above, nr_threads may decrease, but not increase. */ BUG_ON(i >= group_size); @@ -2047,7 +2047,7 @@ static int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk, ent.cgrp = task_cgroup_from_root(tsk, root); /* nothing to do if this task is already in the cgroup */ if (ent.cgrp == cgrp) - continue; + goto next_thread; /* * saying GFP_ATOMIC has no effect here because we did prealloc * earlier, but it's good form to communicate our expectations. @@ -2055,7 +2055,7 @@ static int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk, retval = flex_array_put(group, i, &ent, GFP_ATOMIC); BUG_ON(retval != 0); i++; - +next_thread: if (!threadgroup) break; } while_each_thread(leader, tsk); -- 1.7.6 On 10/11/13, Oleg Nesterov <oleg@redhat.com> wrote: > On 10/11, Li Zefan wrote: >> >> On 2013/10/10 0:54, Oleg Nesterov wrote: >> >> > And I am starting to think that this change should also fix the >> > while_each_thread() problems in this particular case. > > Please see below, > >> > In generak the code like >> > >> > rcu_read_lock(); >> > task = find_get_task(...); >> > rcu_read_unlock(); >> > >> > rcu_read_lock(); >> > t = task; >> > do { >> > ... >> > } while_each_thread (task, t); >> > rcu_read_unlock(); >> > >> > is wrong even if while_each_thread() was correct (and we have a lot >> > of examples of this pattern). A GP can pass before the 2nd rcu-lock, >> > and we simply can't trust ->thread_group.next. >> > >> > But I didn't notice that cgroup_attach_task(tsk, threadgroup) can only >> > be called with threadgroup == T when a) tsk is ->group_leader and b) >> > we hold threadgroup_lock() which blocks de_thread(). IOW, in this case >> > "tsk" can't be removed from ->thread_group list before other threads. >> > >> > If next_thread() sees thread_group.next != leader, we know that the >> > that .next thread didn't do __unhash_process() yet, and since we >> > know that in this case "leader" didn't do this too we are safe. >> > >> > In short: __unhash_process(leader) (in this) case can never change >> > ->thread_group.next of another thread, because leader->thread_group >> > should be already list_empty(). >> > >> >> If threadgroup == false, and if the tsk is existing or is already in >> the targeted cgroup, we won't break the loop due to the bug but do >> this: >> >> while_each_thread(task, t) >> >> If @task isn't the leader, we might got stuck in the loop? > > Yes, yes, sure. We need to fix the wrong "continue" logic, hopefully > > I tried to say (see above) that after we do this while_each_thread() > should be fine in this particular case. > > Oleg. > > ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: cgroup_attach_task && while_each_thread (Was: cgroup attach task - slogging cpu) @ 2013-10-15 5:04 ` anjana vk 0 siblings, 0 replies; 23+ messages in thread From: anjana vk @ 2013-10-15 5:04 UTC (permalink / raw) To: Oleg Nesterov Cc: Li Zefan, Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, eunki_kim-Sze3O3UU22JBDgjK7y7TUQ Hi All, Thankyou for your suggestions. I have made the modifications as suggested. Please find the patch below. From fd85f093f912a160c0896ea2784dfbdd64f142ca Mon Sep 17 00:00:00 2001 From: Anjana V Kumar <anjanavk12-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> Date: Wed, 9 Oct 2013 16:49:22 +0530 Subject: [PATCH] Single thread break missing in cgroup_attach_task Problem: Issue when attaching a single thread to a cgroup if the thread was alredy in the cgroup. The check if the thread is already in cgroup in the above case, continues to the next thread instead of exciting. Solution: Add a check if it is a single thread or a threadgroup and take decision accordingly. If it is a single thread break out of the do-while loop, and if it is a threadgroup goto the next thread. Change-Id: If6bbdef12ca5992823ac2a7bc15eaeb3dddb461a Signed-off-by: Anjana V Kumar <anjanavk12-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> --- kernel/cgroup.c | 6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) diff --git a/kernel/cgroup.c b/kernel/cgroup.c index 2418b6e..13eafdc 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -2039,7 +2039,7 @@ static int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk, /* @tsk either already exited or can't exit until the end */ if (tsk->flags & PF_EXITING) - continue; + goto next_thread; /* as per above, nr_threads may decrease, but not increase. */ BUG_ON(i >= group_size); @@ -2047,7 +2047,7 @@ static int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk, ent.cgrp = task_cgroup_from_root(tsk, root); /* nothing to do if this task is already in the cgroup */ if (ent.cgrp == cgrp) - continue; + goto next_thread; /* * saying GFP_ATOMIC has no effect here because we did prealloc * earlier, but it's good form to communicate our expectations. @@ -2055,7 +2055,7 @@ static int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk, retval = flex_array_put(group, i, &ent, GFP_ATOMIC); BUG_ON(retval != 0); i++; - +next_thread: if (!threadgroup) break; } while_each_thread(leader, tsk); -- 1.7.6 On 10/11/13, Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote: > On 10/11, Li Zefan wrote: >> >> On 2013/10/10 0:54, Oleg Nesterov wrote: >> >> > And I am starting to think that this change should also fix the >> > while_each_thread() problems in this particular case. > > Please see below, > >> > In generak the code like >> > >> > rcu_read_lock(); >> > task = find_get_task(...); >> > rcu_read_unlock(); >> > >> > rcu_read_lock(); >> > t = task; >> > do { >> > ... >> > } while_each_thread (task, t); >> > rcu_read_unlock(); >> > >> > is wrong even if while_each_thread() was correct (and we have a lot >> > of examples of this pattern). A GP can pass before the 2nd rcu-lock, >> > and we simply can't trust ->thread_group.next. >> > >> > But I didn't notice that cgroup_attach_task(tsk, threadgroup) can only >> > be called with threadgroup == T when a) tsk is ->group_leader and b) >> > we hold threadgroup_lock() which blocks de_thread(). IOW, in this case >> > "tsk" can't be removed from ->thread_group list before other threads. >> > >> > If next_thread() sees thread_group.next != leader, we know that the >> > that .next thread didn't do __unhash_process() yet, and since we >> > know that in this case "leader" didn't do this too we are safe. >> > >> > In short: __unhash_process(leader) (in this) case can never change >> > ->thread_group.next of another thread, because leader->thread_group >> > should be already list_empty(). >> > >> >> If threadgroup == false, and if the tsk is existing or is already in >> the targeted cgroup, we won't break the loop due to the bug but do >> this: >> >> while_each_thread(task, t) >> >> If @task isn't the leader, we might got stuck in the loop? > > Yes, yes, sure. We need to fix the wrong "continue" logic, hopefully > > I tried to say (see above) that after we do this while_each_thread() > should be fine in this particular case. > > Oleg. > > ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: cgroup_attach_task && while_each_thread (Was: cgroup attach task - slogging cpu) @ 2013-10-15 13:34 ` Tejun Heo 0 siblings, 0 replies; 23+ messages in thread From: Tejun Heo @ 2013-10-15 13:34 UTC (permalink / raw) To: anjana vk; +Cc: Oleg Nesterov, Li Zefan, cgroups, linux-kernel, eunki_kim On Tue, Oct 15, 2013 at 10:34:04AM +0530, anjana vk wrote: > Hi All, > > Thankyou for your suggestions. > I have made the modifications as suggested. I already applied the version Li posted (as you the author, of course). Thanks a lot! -- tejun ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: cgroup_attach_task && while_each_thread (Was: cgroup attach task - slogging cpu) @ 2013-10-15 13:34 ` Tejun Heo 0 siblings, 0 replies; 23+ messages in thread From: Tejun Heo @ 2013-10-15 13:34 UTC (permalink / raw) To: anjana vk Cc: Oleg Nesterov, Li Zefan, cgroups-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, eunki_kim-Sze3O3UU22JBDgjK7y7TUQ On Tue, Oct 15, 2013 at 10:34:04AM +0530, anjana vk wrote: > Hi All, > > Thankyou for your suggestions. > I have made the modifications as suggested. I already applied the version Li posted (as you the author, of course). Thanks a lot! -- tejun ^ permalink raw reply [flat|nested] 23+ messages in thread
[parent not found: <CAChhN7RerxpSadqyosUeSKFg+qcOpO4d-maEKBZ0rvOQGvN27g@mail.gmail.com>]
[parent not found: <CAChhN7RerxpSadqyosUeSKFg+qcOpO4d-maEKBZ0rvOQGvN27g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: cgroup attach task - slogging cpu [not found] ` <CAChhN7RerxpSadqyosUeSKFg+qcOpO4d-maEKBZ0rvOQGvN27g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2013-10-10 4:22 ` anjana vk 2013-10-10 11:11 ` Oleg Nesterov 1 sibling, 0 replies; 23+ messages in thread From: anjana vk @ 2013-10-10 4:22 UTC (permalink / raw) To: Anjana V Kumar Cc: Oleg Nesterov, Li Zefan, Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA, stable-u79uwXL29TY76Z2rM5mHXA +cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org On 10/10/13, Anjana V Kumar <anjanavk12-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote: > Hi All, > > Please find the attached patch making the changes for the case where thread > is already in the required cgroup. > > Can you please let me know if this is the right way to fix this issue, if > any modifications are required. > > Thanks and Regards > Anjana > > > On Wed, Oct 9, 2013 at 7:35 PM, Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote: > >> On 10/09, Oleg Nesterov wrote: >> > >> > On 10/09, Li Zefan wrote: >> > > >> > > Anjana, could you revise the patch and send it out with proper >> changelog >> > > and Signed-off-by? And please add "Cc: <stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org> # >> 3.9+" >> > >> > Yes, Anjana, please! >> >> Please note also that the PF_EXITING check has the same problem, it also >> needs "goto next". >> >> > > > check in the main loop. So Anjana was right (sorry again!), and we >> > > > should probably do >> > > > >> > > > ent.cgrp = task_cgroup_from_root(...); >> > > > if (ent.cgrp != cgrp) { >> > > > retval = flex_array_put(...); >> > > > ... >> > > > } >> > > > >> > > > if (!threadgroup) >> > > > break; >> > > > >> > > >> > > Or >> > > >> > > do { >> > > ... >> > > if (ent.cgrp == cgrp) >> > > goto next; >> > >> > Or this, agreed. >> > >> > > > Or I am wrong again? >> > > >> > > No, you are not! :) >> > >> > Thanks ;) >> > >> > Oleg. >> >> > > > -- > Anjana > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: cgroup attach task - slogging cpu [not found] ` <CAChhN7RerxpSadqyosUeSKFg+qcOpO4d-maEKBZ0rvOQGvN27g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2013-10-10 4:22 ` cgroup attach task - slogging cpu anjana vk @ 2013-10-10 11:11 ` Oleg Nesterov 1 sibling, 0 replies; 23+ messages in thread From: Oleg Nesterov @ 2013-10-10 11:11 UTC (permalink / raw) To: Anjana V Kumar Cc: Li Zefan, Tejun Heo, anjana vk, cgroups-u79uwXL29TY76Z2rM5mHXA, stable-u79uwXL29TY76Z2rM5mHXA Hi Anjana, > On 10/10, Anjana V Kumar wrote: > > > > > Problem: > Issue when attaching a single thread to a cgroup if the thread was alredy in the > cgroup. The check if the thread is already in cgroup in the above case, > continues to the next thread instead of exciting. Yes. Thanks. > @@ -2047,7 +2047,7 @@ static int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk, > ent.cgrp = task_cgroup_from_root(tsk, root); > /* nothing to do if this task is already in the cgroup */ > if (ent.cgrp == cgrp) > - continue; > + goto next_thread; > /* > * saying GFP_ATOMIC has no effect here because we did prealloc > * earlier, but it's good form to communicate our expectations. > @@ -2055,7 +2055,7 @@ static int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk, > retval = flex_array_put(group, i, &ent, GFP_ATOMIC); > BUG_ON(retval != 0); > i++; > - > +next_thread: Yes, but you forgot to fix another "continue" after "ent.cgrp == cgrp" ;) Oleg. ^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2013-10-15 13:34 UTC | newest] Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2013-10-04 5:25 cgroup attach task - slogging cpu anjana vk [not found] ` <CALPf4Tz+Gf_Q7wKKBufCc1mtV1qVPVrOW0S1qhHxfOv6pJa2Kg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2013-10-04 13:02 ` Oleg Nesterov [not found] ` <20131004130207.GA9338-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 2013-10-07 18:45 ` Tejun Heo [not found] ` <20131007184507.GD27396-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org> 2013-10-08 14:58 ` Oleg Nesterov [not found] ` <20131008145833.GA15600-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 2013-10-09 5:35 ` Li Zefan [not found] ` <5254EB2A.7090803-hv44wF8Li93QT0dZR+AlfA@public.gmane.org> 2013-10-09 13:30 ` Oleg Nesterov [not found] ` <20131009133047.GA12414-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 2013-10-09 14:05 ` Oleg Nesterov 2013-10-09 16:54 ` cgroup_attach_task && while_each_thread (Was: cgroup attach task - slogging cpu) Oleg Nesterov 2013-10-09 16:54 ` Oleg Nesterov 2013-10-11 13:15 ` Li Zefan 2013-10-11 13:15 ` Li Zefan 2013-10-11 16:00 ` Oleg Nesterov 2013-10-11 16:00 ` Oleg Nesterov 2013-10-12 2:59 ` [PATCH] cgroup: fix to break the while loop in cgroup_attach_task() correctly Li Zefan 2013-10-12 2:59 ` Li Zefan 2013-10-13 20:08 ` Tejun Heo 2013-10-13 20:08 ` Tejun Heo 2013-10-15 5:04 ` cgroup_attach_task && while_each_thread (Was: cgroup attach task - slogging cpu) anjana vk 2013-10-15 5:04 ` anjana vk 2013-10-15 13:34 ` Tejun Heo 2013-10-15 13:34 ` Tejun Heo [not found] ` <CAChhN7RerxpSadqyosUeSKFg+qcOpO4d-maEKBZ0rvOQGvN27g@mail.gmail.com> [not found] ` <CAChhN7RerxpSadqyosUeSKFg+qcOpO4d-maEKBZ0rvOQGvN27g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2013-10-10 4:22 ` cgroup attach task - slogging cpu anjana vk 2013-10-10 11:11 ` Oleg Nesterov
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.