* [PATCH] Fix race between oom kill and task exit @ 2013-11-28 5:09 Ma, Xindong 2013-11-28 6:35 ` Johannes Weiner 2013-11-28 18:37 ` Oleg Nesterov 0 siblings, 2 replies; 17+ messages in thread From: Ma, Xindong @ 2013-11-28 5:09 UTC (permalink / raw) To: akpm, mhocko, rientjes, hannes, rusty, linux-mm, linux-kernel, 'Peter Zijlstra', 'gregkh@linuxfoundation.org' Cc: Ma, Xindong, Tu, Xiaobing From: Leon Ma <xindong.ma@intel.com> Date: Thu, 28 Nov 2013 12:46:09 +0800 Subject: [PATCH] Fix race between oom kill and task exit There is a race between oom kill and task exit. Scenario is: TASK A TASK B TASK B is selected to oom kill in oom_kill_process() check PF_EXITING of TASK B task call do_exit() task set PF_EXITING flag write_lock_irq(&tasklist_lock); remove TASK B from thread group in __unhash_process() write_unlock_irq(&tasklist_lock); read_lock(&tasklist_lock); traverse threads of TASK B read_unlock(&tasklist_lock); After that, the following traversal of threads in TASK B will not end because TASK B is not in the thread group: do { .... } while_each_thread(p, t); Signed-off-by: Leon Ma <xindong.ma@intel.com> Signed-off-by: xiaobing tu <xiaobing.tu@intel.com> --- mm/oom_kill.c | 20 ++++++++++---------- 1 files changed, 10 insertions(+), 10 deletions(-) diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 1e4a600..32ec88d 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -412,16 +412,6 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order, static DEFINE_RATELIMIT_STATE(oom_rs, DEFAULT_RATELIMIT_INTERVAL, DEFAULT_RATELIMIT_BURST); - /* - * If the task is already exiting, don't alarm the sysadmin or kill - * its children or threads, just set TIF_MEMDIE so it can die quickly - */ - if (p->flags & PF_EXITING) { - set_tsk_thread_flag(p, TIF_MEMDIE); - put_task_struct(p); - return; - } - if (__ratelimit(&oom_rs)) dump_header(p, gfp_mask, order, memcg, nodemask); @@ -437,6 +427,16 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order, * still freeing memory. */ read_lock(&tasklist_lock); + /* + * If the task is already exiting, don't alarm the sysadmin or kill + * its children or threads, just set TIF_MEMDIE so it can die quickly + */ + if (p->flags & PF_EXITING) { + set_tsk_thread_flag(p, TIF_MEMDIE); + put_task_struct(p); + read_unlock(&tasklist_lock); + return; + } do { list_for_each_entry(child, &t->children, sibling) { unsigned int child_points; -- 1.7.4.1 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH] Fix race between oom kill and task exit 2013-11-28 5:09 [PATCH] Fix race between oom kill and task exit Ma, Xindong @ 2013-11-28 6:35 ` Johannes Weiner 2013-11-28 8:41 ` William Dauchy 2013-11-28 9:18 ` azurIt 2013-11-28 18:37 ` Oleg Nesterov 1 sibling, 2 replies; 17+ messages in thread From: Johannes Weiner @ 2013-11-28 6:35 UTC (permalink / raw) To: Ma, Xindong Cc: akpm, mhocko, rientjes, rusty, linux-mm, linux-kernel, 'Peter Zijlstra', 'gregkh@linuxfoundation.org', Tu, Xiaobing, William Dauchy, azurIt Cc William and azur who might have encountered this problem. On Thu, Nov 28, 2013 at 05:09:16AM +0000, Ma, Xindong wrote: > From: Leon Ma <xindong.ma@intel.com> > Date: Thu, 28 Nov 2013 12:46:09 +0800 > Subject: [PATCH] Fix race between oom kill and task exit > > There is a race between oom kill and task exit. Scenario is: > TASK A TASK B > TASK B is selected to oom kill > in oom_kill_process() > check PF_EXITING of TASK B > task call do_exit() > task set PF_EXITING flag > write_lock_irq(&tasklist_lock); > remove TASK B from thread group in __unhash_process() > write_unlock_irq(&tasklist_lock); > read_lock(&tasklist_lock); > traverse threads of TASK B > read_unlock(&tasklist_lock); > > After that, the following traversal of threads in TASK B will not end because TASK B is not in the thread group: > do { > .... > } while_each_thread(p, t); > > Signed-off-by: Leon Ma <xindong.ma@intel.com> > Signed-off-by: xiaobing tu <xiaobing.tu@intel.com> > --- > mm/oom_kill.c | 20 ++++++++++---------- > 1 files changed, 10 insertions(+), 10 deletions(-) > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > index 1e4a600..32ec88d 100644 > --- a/mm/oom_kill.c > +++ b/mm/oom_kill.c > @@ -412,16 +412,6 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order, > static DEFINE_RATELIMIT_STATE(oom_rs, DEFAULT_RATELIMIT_INTERVAL, > DEFAULT_RATELIMIT_BURST); > > - /* > - * If the task is already exiting, don't alarm the sysadmin or kill > - * its children or threads, just set TIF_MEMDIE so it can die quickly > - */ > - if (p->flags & PF_EXITING) { > - set_tsk_thread_flag(p, TIF_MEMDIE); > - put_task_struct(p); > - return; > - } > - > if (__ratelimit(&oom_rs)) > dump_header(p, gfp_mask, order, memcg, nodemask); > > @@ -437,6 +427,16 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order, > * still freeing memory. > */ > read_lock(&tasklist_lock); > + /* > + * If the task is already exiting, don't alarm the sysadmin or kill > + * its children or threads, just set TIF_MEMDIE so it can die quickly > + */ > + if (p->flags & PF_EXITING) { > + set_tsk_thread_flag(p, TIF_MEMDIE); > + put_task_struct(p); > + read_unlock(&tasklist_lock); > + return; > + } > do { > list_for_each_entry(child, &t->children, sibling) { > unsigned int child_points; > -- > 1.7.4.1 > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Fix race between oom kill and task exit 2013-11-28 6:35 ` Johannes Weiner @ 2013-11-28 8:41 ` William Dauchy 2013-11-28 12:00 ` Michal Hocko 2013-11-28 9:18 ` azurIt 1 sibling, 1 reply; 17+ messages in thread From: William Dauchy @ 2013-11-28 8:41 UTC (permalink / raw) To: Johannes Weiner Cc: Ma, Xindong, akpm, mhocko, rientjes, rusty, linux-mm, linux-kernel, Peter Zijlstra, gregkh, Tu, Xiaobing, azurIt, Sameer Nanda Hi Johannes, On Thu, Nov 28, 2013 at 7:35 AM, Johannes Weiner <hannes@cmpxchg.org> wrote: > Cc William and azur who might have encountered this problem. Thank you for letting me know. Note that before this patch I saw the one from Sameer Nanda mm, oom: Fix race when selecting process to kill https://lkml.org/lkml/2013/11/13/336 After applying this later one, I still have the issue I sent you (oom killing a task outside cgroup i.e Task in / killed as a result of limit of /lxc/foo) *but* I don't have a crash any more. So I was in the process of reapplying your debug patch to send you a new report. However, I'm now wondering if this present patch is a replacement of Sameer Nanda's patch or if this a complementary patch. Thanks, -- William ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Fix race between oom kill and task exit 2013-11-28 8:41 ` William Dauchy @ 2013-11-28 12:00 ` Michal Hocko 2013-11-28 17:57 ` Vladimir Murzin 2013-11-28 18:38 ` Oleg Nesterov 0 siblings, 2 replies; 17+ messages in thread From: Michal Hocko @ 2013-11-28 12:00 UTC (permalink / raw) To: William Dauchy Cc: Johannes Weiner, Ma, Xindong, akpm, rientjes, rusty, linux-mm, linux-kernel, Peter Zijlstra, gregkh, Tu, Xiaobing, azurIt, Sameer Nanda, Oleg Nesterov [CCing Oleg - the thread started here: https://lkml.org/lkml/2013/11/28/2] On Thu 28-11-13 09:41:40, William Dauchy wrote: [...] > However, I'm now wondering if this present patch is a replacement of > Sameer Nanda's patch or if this a complementary patch. They are both trying to solve the same issue. Neither of them is optimal unfortunately. Oleg said he would look into this and I have seen some patches but didn't geto check them. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Fix race between oom kill and task exit 2013-11-28 12:00 ` Michal Hocko @ 2013-11-28 17:57 ` Vladimir Murzin 2013-11-28 18:38 ` Oleg Nesterov 1 sibling, 0 replies; 17+ messages in thread From: Vladimir Murzin @ 2013-11-28 17:57 UTC (permalink / raw) To: Michal Hocko Cc: William Dauchy, Johannes Weiner, Ma, Xindong, akpm, rientjes, rusty, linux-mm, linux-kernel, Peter Zijlstra, gregkh, Tu, Xiaobing, azurIt, Sameer Nanda, Oleg Nesterov, dserrg On Thu, Nov 28, 2013 at 01:00:18PM +0100, Michal Hocko wrote: > [CCing Oleg - the thread started here: > https://lkml.org/lkml/2013/11/28/2] > > On Thu 28-11-13 09:41:40, William Dauchy wrote: > [...] > > However, I'm now wondering if this present patch is a replacement of > > Sameer Nanda's patch or if this a complementary patch. > > They are both trying to solve the same issue. Neither of them is > optimal unfortunately. Oleg said he would look into this and I have seen > some patches but didn't geto check them. CCing Sergey - he reported and proposed the patch for the similar issue. Vladimir > > -- > Michal Hocko > SUSE Labs > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Fix race between oom kill and task exit 2013-11-28 12:00 ` Michal Hocko 2013-11-28 17:57 ` Vladimir Murzin @ 2013-11-28 18:38 ` Oleg Nesterov 2013-11-29 2:06 ` Ma, Xindong 2013-12-02 14:12 ` Oleg Nesterov 1 sibling, 2 replies; 17+ messages in thread From: Oleg Nesterov @ 2013-11-28 18:38 UTC (permalink / raw) To: Michal Hocko Cc: William Dauchy, Johannes Weiner, Ma, Xindong, akpm, rientjes, rusty, linux-mm, linux-kernel, Peter Zijlstra, gregkh, Tu, Xiaobing, azurIt, Sameer Nanda On 11/28, Michal Hocko wrote: > > They are both trying to solve the same issue. Neither of them is > optimal unfortunately. yes, but this one doesn't look right. > Oleg said he would look into this and I have seen > some patches but didn't geto check them. Only preparations so far. Oleg. ^ permalink raw reply [flat|nested] 17+ messages in thread
* RE: [PATCH] Fix race between oom kill and task exit 2013-11-28 18:38 ` Oleg Nesterov @ 2013-11-29 2:06 ` Ma, Xindong 2013-11-29 2:08 ` Tu, Xiaobing 2013-12-02 14:12 ` Oleg Nesterov 1 sibling, 1 reply; 17+ messages in thread From: Ma, Xindong @ 2013-11-29 2:06 UTC (permalink / raw) To: Oleg Nesterov, Michal Hocko Cc: William Dauchy, Johannes Weiner, akpm, rientjes, rusty, linux-mm, linux-kernel, Peter Zijlstra, gregkh, Tu, Xiaobing, azurIt, Sameer Nanda > From: Oleg Nesterov [mailto:oleg@redhat.com] > Sent: Friday, November 29, 2013 2:39 AM > To: Michal Hocko > Cc: William Dauchy; Johannes Weiner; Ma, Xindong; > akpm@linux-foundation.org; rientjes@google.com; rusty@rustcorp.com.au; > linux-mm@kvack.org; linux-kernel@vger.kernel.org; Peter Zijlstra; > gregkh@linuxfoundation.org; Tu, Xiaobing; azurIt; Sameer Nanda > Subject: Re: [PATCH] Fix race between oom kill and task exit > > On 11/28, Michal Hocko wrote: > > > > They are both trying to solve the same issue. Neither of them is > > optimal unfortunately. > > yes, but this one doesn't look right. > > > Oleg said he would look into this and I have seen some patches but > > didn't geto check them. > > Only preparations so far. > > Oleg. I was not aware there's a long story for this issue. I hit this issue a lot of times during stress test and root caused it. After applying my patch, I did extensive test on 5 machines for a long time, it does not reproduced anymore so I submitted the patch. I will do more research on this issue. ^ permalink raw reply [flat|nested] 17+ messages in thread
* RE: [PATCH] Fix race between oom kill and task exit 2013-11-29 2:06 ` Ma, Xindong @ 2013-11-29 2:08 ` Tu, Xiaobing 0 siblings, 0 replies; 17+ messages in thread From: Tu, Xiaobing @ 2013-11-29 2:08 UTC (permalink / raw) To: Ma, Xindong, Oleg Nesterov, Michal Hocko Cc: William Dauchy, Johannes Weiner, akpm, rientjes, rusty, linux-mm, linux-kernel, Peter Zijlstra, gregkh, azurIt, Sameer Nanda We will do more stress test in more machine at the same time -----Original Message----- From: Ma, Xindong Sent: Friday, November 29, 2013 10:06 AM To: Oleg Nesterov; Michal Hocko Cc: William Dauchy; Johannes Weiner; akpm@linux-foundation.org; rientjes@google.com; rusty@rustcorp.com.au; linux-mm@kvack.org; linux-kernel@vger.kernel.org; Peter Zijlstra; gregkh@linuxfoundation.org; Tu, Xiaobing; azurIt; Sameer Nanda Subject: RE: [PATCH] Fix race between oom kill and task exit > From: Oleg Nesterov [mailto:oleg@redhat.com] > Sent: Friday, November 29, 2013 2:39 AM > To: Michal Hocko > Cc: William Dauchy; Johannes Weiner; Ma, Xindong; > akpm@linux-foundation.org; rientjes@google.com; rusty@rustcorp.com.au; > linux-mm@kvack.org; linux-kernel@vger.kernel.org; Peter Zijlstra; > gregkh@linuxfoundation.org; Tu, Xiaobing; azurIt; Sameer Nanda > Subject: Re: [PATCH] Fix race between oom kill and task exit > > On 11/28, Michal Hocko wrote: > > > > They are both trying to solve the same issue. Neither of them is > > optimal unfortunately. > > yes, but this one doesn't look right. > > > Oleg said he would look into this and I have seen some patches but > > didn't geto check them. > > Only preparations so far. > > Oleg. I was not aware there's a long story for this issue. I hit this issue a lot of times during stress test and root caused it. After applying my patch, I did extensive test on 5 machines for a long time, it does not reproduced anymore so I submitted the patch. I will do more research on this issue. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Fix race between oom kill and task exit 2013-11-28 18:38 ` Oleg Nesterov 2013-11-29 2:06 ` Ma, Xindong @ 2013-12-02 14:12 ` Oleg Nesterov 2013-12-05 0:56 ` David Rientjes 1 sibling, 1 reply; 17+ messages in thread From: Oleg Nesterov @ 2013-12-02 14:12 UTC (permalink / raw) To: Michal Hocko Cc: William Dauchy, Johannes Weiner, Ma, Xindong, akpm, rientjes, rusty, linux-mm, linux-kernel, Peter Zijlstra, gregkh, Tu, Xiaobing, azurIt, Sameer Nanda On 11/28, Oleg Nesterov wrote: > > On 11/28, Michal Hocko wrote: > > > > They are both trying to solve the same issue. Neither of them is > > optimal unfortunately. > > yes, but this one doesn't look right. > > > Oleg said he would look into this and I have seen > > some patches but didn't geto check them. > > Only preparations so far. OK, I am going to send the initial fixes today. This means (I hope) that we do not need this or Sameer's "[PATCH] mm, oom: Fix race when selecting process to kill". Oleg. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Fix race between oom kill and task exit 2013-12-02 14:12 ` Oleg Nesterov @ 2013-12-05 0:56 ` David Rientjes 2013-12-05 17:29 ` Oleg Nesterov 0 siblings, 1 reply; 17+ messages in thread From: David Rientjes @ 2013-12-05 0:56 UTC (permalink / raw) To: Oleg Nesterov Cc: Michal Hocko, William Dauchy, Johannes Weiner, Ma, Xindong, akpm, rusty, linux-mm, linux-kernel, Peter Zijlstra, gregkh, Tu, Xiaobing, azurIt, Sameer Nanda On Mon, 2 Dec 2013, Oleg Nesterov wrote: > OK, I am going to send the initial fixes today. This means (I hope) > that we do not need this or Sameer's "[PATCH] mm, oom: Fix race when > selecting process to kill". > Your v2 series looks good and I suspect anybody trying them doesn't have additional reports of the infinite loop? Should they be marked for stable? ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Fix race between oom kill and task exit 2013-12-05 0:56 ` David Rientjes @ 2013-12-05 17:29 ` Oleg Nesterov 2013-12-05 23:35 ` David Rientjes 0 siblings, 1 reply; 17+ messages in thread From: Oleg Nesterov @ 2013-12-05 17:29 UTC (permalink / raw) To: David Rientjes Cc: Michal Hocko, William Dauchy, Johannes Weiner, Ma, Xindong, akpm, rusty, linux-mm, linux-kernel, Peter Zijlstra, gregkh, Tu, Xiaobing, azurIt, Sameer Nanda On 12/04, David Rientjes wrote: > > On Mon, 2 Dec 2013, Oleg Nesterov wrote: > > > OK, I am going to send the initial fixes today. This means (I hope) > > that we do not need this or Sameer's "[PATCH] mm, oom: Fix race when > > selecting process to kill". > > Your v2 series looks good and I suspect anybody trying them doesn't have > additional reports of the infinite loop? Should they be marked for > stable? Unlikely... I think the patch from Sameer makes more sense for stable as a temporary (and obviously incomplete) fix. Oleg. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Fix race between oom kill and task exit 2013-12-05 17:29 ` Oleg Nesterov @ 2013-12-05 23:35 ` David Rientjes 2013-12-06 15:19 ` Oleg Nesterov 0 siblings, 1 reply; 17+ messages in thread From: David Rientjes @ 2013-12-05 23:35 UTC (permalink / raw) To: Oleg Nesterov, Andrew Morton Cc: Michal Hocko, William Dauchy, Johannes Weiner, Ma, Xindong, rusty, linux-mm, linux-kernel, Peter Zijlstra, gregkh, Tu, Xiaobing, azurIt, Sameer Nanda On Thu, 5 Dec 2013, Oleg Nesterov wrote: > > > OK, I am going to send the initial fixes today. This means (I hope) > > > that we do not need this or Sameer's "[PATCH] mm, oom: Fix race when > > > selecting process to kill". > > > > Your v2 series looks good and I suspect anybody trying them doesn't have > > additional reports of the infinite loop? Should they be marked for > > stable? > > Unlikely... > > I think the patch from Sameer makes more sense for stable as a temporary > (and obviously incomplete) fix. > There's a problem because none of this is currently even in linux-next. I think we could make a case for getting Sameer's patch at http://marc.info/?l=linux-kernel&m=138436313021133 to be merged for stable, but then we'd have to revert it in linux-next before merging your series at http://marc.info/?l=linux-kernel&m=138616217925981. All of the issues you present in that series seem to be stable material, so why not just go ahead with your series and mark it for stable for 3.13? ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Fix race between oom kill and task exit 2013-12-05 23:35 ` David Rientjes @ 2013-12-06 15:19 ` Oleg Nesterov 2013-12-06 15:52 ` Oleg Nesterov 2013-12-06 17:54 ` Sameer Nanda 0 siblings, 2 replies; 17+ messages in thread From: Oleg Nesterov @ 2013-12-06 15:19 UTC (permalink / raw) To: David Rientjes Cc: Andrew Morton, Michal Hocko, William Dauchy, Johannes Weiner, Ma, Xindong, rusty, linux-mm, linux-kernel, Peter Zijlstra, gregkh, Tu, Xiaobing, azurIt, Sameer Nanda On 12/05, David Rientjes wrote: > > On Thu, 5 Dec 2013, Oleg Nesterov wrote: > > > > Your v2 series looks good and I suspect anybody trying them doesn't have > > > additional reports of the infinite loop? Should they be marked for > > > stable? > > > > Unlikely... > > > > I think the patch from Sameer makes more sense for stable as a temporary > > (and obviously incomplete) fix. > > There's a problem because none of this is currently even in linux-next. I > think we could make a case for getting Sameer's patch at > http://marc.info/?l=linux-kernel&m=138436313021133 to be merged for > stable, Probably. Ah, I just noticed that this change - if (p->flags & PF_EXITING) { + if (p->flags & PF_EXITING || !pid_alive(p)) { is not needed. !pid_alive(p) obviously implies PF_EXITING. > but then we'd have to revert it in linux-next Or perhaps Sameer can just send his fix to stable/gregkh. Just the changelog should clearly explain that this is the minimal workaround for stable. Once again it doesn't (and can't) fix all problems even in oom_kill_process() paths, but it helps anyway to avoid the easy-to-trigger hang. > before merging your > series at http://marc.info/?l=linux-kernel&m=138616217925981. Just in case, I won't mind to rediff my patches on top of Sameer's patch and then add git-revert patch. > All of the > issues you present in that series seem to be stable material, so why not > just go ahead with your series and mark it for stable for 3.13? OK... I can do this too. I do not really like this because it adds thread_head/node but doesn't remove the old ->thread_group. We will do this later, but obviously this is not the stable material. IOW, if we send this to stable, thread_head/node/for_each_thread will be only used by oom_kill.c. And this is risky. For example, 1/4 depends on (at least) another patch I sent in preparation for this change, commit 81907739851 "kernel/fork.c:copy_process(): don't add the uninitialized child to thread/task/pid lists", perhaps on something else. So personally I'd prefer to simply send the workaround for stable. Oleg. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Fix race between oom kill and task exit 2013-12-06 15:19 ` Oleg Nesterov @ 2013-12-06 15:52 ` Oleg Nesterov 2013-12-06 17:54 ` Sameer Nanda 1 sibling, 0 replies; 17+ messages in thread From: Oleg Nesterov @ 2013-12-06 15:52 UTC (permalink / raw) To: David Rientjes Cc: Andrew Morton, Michal Hocko, William Dauchy, Johannes Weiner, Ma, Xindong, rusty, linux-mm, linux-kernel, Peter Zijlstra, gregkh, Tu, Xiaobing, azurIt, Sameer Nanda On 12/06, Oleg Nesterov wrote: > > And this is risky. For example, 1/4 depends on (at least) another patch > I sent in preparation for this change, commit 81907739851 > "kernel/fork.c:copy_process(): don't add the uninitialized > child to thread/task/pid lists", perhaps on something else. Hmm. not too much actually, I re-checked v3.10:kernel/copy_process.c. Yes, list_add(thread_node)) in copy_process() can add the new thread with the wrong pids, but somehow I forgot that list_add(thread_group) in v3.10 has the same problem, so this probably doesn't matter and we can safely backport this change. > So personally I'd prefer to simply send the workaround for stable. Yes, anyway, bacause I will sleep better ;) But OK, if you think it would be better to mark 1-4 series I sent for stable - I won't argue. Oleg. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Fix race between oom kill and task exit 2013-12-06 15:19 ` Oleg Nesterov 2013-12-06 15:52 ` Oleg Nesterov @ 2013-12-06 17:54 ` Sameer Nanda 1 sibling, 0 replies; 17+ messages in thread From: Sameer Nanda @ 2013-12-06 17:54 UTC (permalink / raw) To: Oleg Nesterov Cc: David Rientjes, Andrew Morton, Michal Hocko, William Dauchy, Johannes Weiner, Ma, Xindong, rusty, linux-mm, linux-kernel, Peter Zijlstra, Greg KH, Tu, Xiaobing, azurIt On Fri, Dec 6, 2013 at 7:19 AM, Oleg Nesterov <oleg@redhat.com> wrote: > On 12/05, David Rientjes wrote: >> >> On Thu, 5 Dec 2013, Oleg Nesterov wrote: >> >> > > Your v2 series looks good and I suspect anybody trying them doesn't have >> > > additional reports of the infinite loop? Should they be marked for >> > > stable? >> > >> > Unlikely... >> > >> > I think the patch from Sameer makes more sense for stable as a temporary >> > (and obviously incomplete) fix. >> >> There's a problem because none of this is currently even in linux-next. I >> think we could make a case for getting Sameer's patch at >> http://marc.info/?l=linux-kernel&m=138436313021133 to be merged for >> stable, > > Probably. > > Ah, I just noticed that this change > > - if (p->flags & PF_EXITING) { > + if (p->flags & PF_EXITING || !pid_alive(p)) { > > is not needed. !pid_alive(p) obviously implies PF_EXITING. Ah right. > >> but then we'd have to revert it in linux-next > > Or perhaps Sameer can just send his fix to stable/gregkh. > > Just the changelog should clearly explain that this is the minimal > workaround for stable. Once again it doesn't (and can't) fix all > problems even in oom_kill_process() paths, but it helps anyway to > avoid the easy-to-trigger hang. I don't mind doing that if that seems to be the consensus. FWIW, I've already added my patch to the Chrome OS kernel repo. > >> before merging your >> series at http://marc.info/?l=linux-kernel&m=138616217925981. > > Just in case, I won't mind to rediff my patches on top of Sameer's > patch and then add git-revert patch. > >> All of the >> issues you present in that series seem to be stable material, so why not >> just go ahead with your series and mark it for stable for 3.13? > > OK... I can do this too. > > I do not really like this because it adds thread_head/node but doesn't > remove the old ->thread_group. We will do this later, but obviously > this is not the stable material. > > IOW, if we send this to stable, thread_head/node/for_each_thread will > be only used by oom_kill.c. > > And this is risky. For example, 1/4 depends on (at least) another patch > I sent in preparation for this change, commit 81907739851 > "kernel/fork.c:copy_process(): don't add the uninitialized > child to thread/task/pid lists", perhaps on something else. > > So personally I'd prefer to simply send the workaround for stable. > > Oleg. > -- Sameer ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Fix race between oom kill and task exit 2013-11-28 6:35 ` Johannes Weiner 2013-11-28 8:41 ` William Dauchy @ 2013-11-28 9:18 ` azurIt 1 sibling, 0 replies; 17+ messages in thread From: azurIt @ 2013-11-28 9:18 UTC (permalink / raw) To: Johannes Weiner, Ma, Xindong Cc: akpm, mhocko, rientjes, rusty, linux-mm, linux-kernel, 'Peter Zijlstra', 'gregkh@linuxfoundation.org', Tu, Xiaobing, William Dauchy > Od: Johannes Weiner <hannes@cmpxchg.org> > Komu: "Ma, Xindong" <xindong.ma@intel.com> > Dátum: 28.11.2013 07:54 > Predmet: Re: [PATCH] Fix race between oom kill and task exit > > CC: "akpm@linux-foundation.org" <akpm@linux-foundation.org>, "mhocko@suse.cz" <mhocko@suse.cz>, "rientjes@google.com" <rientjes@google.com>, "rusty@rustcorp.com.au" <rusty@rustcorp.com.au>, "linux-mm@kvack.org" <linux-mm@kvack.org>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>, "'Peter Zijlstra'" <peterz@infradead.org>, "'gregkh@linuxfoundation.org'" <gregkh@linuxfoundation.org>, "Tu, Xiaobing" <xiaobing.tu@intel.com>, "William Dauchy" <wdauchy@gmail.com> >Cc William and azur who might have encountered this problem. > >On Thu, Nov 28, 2013 at 05:09:16AM +0000, Ma, Xindong wrote: >> From: Leon Ma <xindong.ma@intel.com> >> Date: Thu, 28 Nov 2013 12:46:09 +0800 >> Subject: [PATCH] Fix race between oom kill and task exit >> >> There is a race between oom kill and task exit. Scenario is: >> TASK A TASK B >> TASK B is selected to oom kill >> in oom_kill_process() >> check PF_EXITING of TASK B >> task call do_exit() >> task set PF_EXITING flag >> write_lock_irq(&tasklist_lock); >> remove TASK B from thread group in __unhash_process() >> write_unlock_irq(&tasklist_lock); >> read_lock(&tasklist_lock); >> traverse threads of TASK B >> read_unlock(&tasklist_lock); >> >> After that, the following traversal of threads in TASK B will not end because TASK B is not in the thread group: >> do { >> .... >> } while_each_thread(p, t); >> >> Signed-off-by: Leon Ma <xindong.ma@intel.com> >> Signed-off-by: xiaobing tu <xiaobing.tu@intel.com> >> --- >> mm/oom_kill.c | 20 ++++++++++---------- >> 1 files changed, 10 insertions(+), 10 deletions(-) >> >> diff --git a/mm/oom_kill.c b/mm/oom_kill.c >> index 1e4a600..32ec88d 100644 >> --- a/mm/oom_kill.c >> +++ b/mm/oom_kill.c >> @@ -412,16 +412,6 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order, >> static DEFINE_RATELIMIT_STATE(oom_rs, DEFAULT_RATELIMIT_INTERVAL, >> DEFAULT_RATELIMIT_BURST); >> >> - /* >> - * If the task is already exiting, don't alarm the sysadmin or kill >> - * its children or threads, just set TIF_MEMDIE so it can die quickly >> - */ >> - if (p->flags & PF_EXITING) { >> - set_tsk_thread_flag(p, TIF_MEMDIE); >> - put_task_struct(p); >> - return; >> - } >> - >> if (__ratelimit(&oom_rs)) >> dump_header(p, gfp_mask, order, memcg, nodemask); >> >> @@ -437,6 +427,16 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order, >> * still freeing memory. >> */ >> read_lock(&tasklist_lock); >> + /* >> + * If the task is already exiting, don't alarm the sysadmin or kill >> + * its children or threads, just set TIF_MEMDIE so it can die quickly >> + */ >> + if (p->flags & PF_EXITING) { >> + set_tsk_thread_flag(p, TIF_MEMDIE); >> + put_task_struct(p); >> + read_unlock(&tasklist_lock); >> + return; >> + } >> do { >> list_for_each_entry(child, &t->children, sibling) { >> unsigned int child_points; >> -- >> 1.7.4.1 >> > Hi Johannes, thank you very much! Fortunately, i didn't notice anything like this yet. azur ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Fix race between oom kill and task exit 2013-11-28 5:09 [PATCH] Fix race between oom kill and task exit Ma, Xindong 2013-11-28 6:35 ` Johannes Weiner @ 2013-11-28 18:37 ` Oleg Nesterov 1 sibling, 0 replies; 17+ messages in thread From: Oleg Nesterov @ 2013-11-28 18:37 UTC (permalink / raw) To: Ma, Xindong, Sameer Nanda Cc: akpm, mhocko, rientjes, hannes, rusty, linux-mm, linux-kernel, 'Peter Zijlstra', 'gregkh@linuxfoundation.org', Tu, Xiaobing On 11/28, Ma, Xindong wrote: > > --- a/mm/oom_kill.c > +++ b/mm/oom_kill.c > @@ -412,16 +412,6 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order, > static DEFINE_RATELIMIT_STATE(oom_rs, DEFAULT_RATELIMIT_INTERVAL, > DEFAULT_RATELIMIT_BURST); > > - /* > - * If the task is already exiting, don't alarm the sysadmin or kill > - * its children or threads, just set TIF_MEMDIE so it can die quickly > - */ > - if (p->flags & PF_EXITING) { > - set_tsk_thread_flag(p, TIF_MEMDIE); > - put_task_struct(p); > - return; > - } > - > if (__ratelimit(&oom_rs)) > dump_header(p, gfp_mask, order, memcg, nodemask); > > @@ -437,6 +427,16 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order, > * still freeing memory. > */ > read_lock(&tasklist_lock); > + /* > + * If the task is already exiting, don't alarm the sysadmin or kill > + * its children or threads, just set TIF_MEMDIE so it can die quickly > + */ > + if (p->flags & PF_EXITING) { > + set_tsk_thread_flag(p, TIF_MEMDIE); > + put_task_struct(p); > + read_unlock(&tasklist_lock); > + return; > + } I got lost... didn't we recently discussed the similar patch from Sameer? This one doesn't look right. find_lock_task_mm() after unlock(tasklist) can hit the same problem. I belive the patch from Sameer was correct. Oleg. ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2013-12-06 17:55 UTC | newest] Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2013-11-28 5:09 [PATCH] Fix race between oom kill and task exit Ma, Xindong 2013-11-28 6:35 ` Johannes Weiner 2013-11-28 8:41 ` William Dauchy 2013-11-28 12:00 ` Michal Hocko 2013-11-28 17:57 ` Vladimir Murzin 2013-11-28 18:38 ` Oleg Nesterov 2013-11-29 2:06 ` Ma, Xindong 2013-11-29 2:08 ` Tu, Xiaobing 2013-12-02 14:12 ` Oleg Nesterov 2013-12-05 0:56 ` David Rientjes 2013-12-05 17:29 ` Oleg Nesterov 2013-12-05 23:35 ` David Rientjes 2013-12-06 15:19 ` Oleg Nesterov 2013-12-06 15:52 ` Oleg Nesterov 2013-12-06 17:54 ` Sameer Nanda 2013-11-28 9:18 ` azurIt 2013-11-28 18:37 ` Oleg Nesterov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).