linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Fix race between oom kill and task exit
@ 2013-11-28  5:09 Ma, Xindong
  2013-11-28  6:35 ` Johannes Weiner
  2013-11-28 18:37 ` Oleg Nesterov
  0 siblings, 2 replies; 17+ messages in thread
From: Ma, Xindong @ 2013-11-28  5:09 UTC (permalink / raw)
  To: akpm, mhocko, rientjes, hannes, rusty, linux-mm, linux-kernel,
	'Peter Zijlstra', 'gregkh@linuxfoundation.org'
  Cc: Ma, Xindong, Tu, Xiaobing

From: Leon Ma <xindong.ma@intel.com>
Date: Thu, 28 Nov 2013 12:46:09 +0800
Subject: [PATCH] Fix race between oom kill and task exit

There is a race between oom kill and task exit. Scenario is:
   TASK  A                      TASK  B
TASK B is selected to oom kill
in oom_kill_process()
check PF_EXITING of TASK B
                            task call do_exit()
                            task set PF_EXITING flag
                            write_lock_irq(&tasklist_lock);
                            remove TASK B from thread group in __unhash_process()
                            write_unlock_irq(&tasklist_lock);
read_lock(&tasklist_lock);
traverse threads of TASK B
read_unlock(&tasklist_lock);

After that, the following traversal of threads in TASK B will not end because TASK B is not in the thread group:
do {
....
} while_each_thread(p, t);

Signed-off-by: Leon Ma <xindong.ma@intel.com>
Signed-off-by: xiaobing tu <xiaobing.tu@intel.com>
---
 mm/oom_kill.c |   20 ++++++++++----------
 1 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 1e4a600..32ec88d 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -412,16 +412,6 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 	static DEFINE_RATELIMIT_STATE(oom_rs, DEFAULT_RATELIMIT_INTERVAL,
 					      DEFAULT_RATELIMIT_BURST);
 
-	/*
-	 * If the task is already exiting, don't alarm the sysadmin or kill
-	 * its children or threads, just set TIF_MEMDIE so it can die quickly
-	 */
-	if (p->flags & PF_EXITING) {
-		set_tsk_thread_flag(p, TIF_MEMDIE);
-		put_task_struct(p);
-		return;
-	}
-
 	if (__ratelimit(&oom_rs))
 		dump_header(p, gfp_mask, order, memcg, nodemask);
 
@@ -437,6 +427,16 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 	 * still freeing memory.
 	 */
 	read_lock(&tasklist_lock);
+	/*
+	 * If the task is already exiting, don't alarm the sysadmin or kill
+	 * its children or threads, just set TIF_MEMDIE so it can die quickly
+	 */
+	if (p->flags & PF_EXITING) {
+		set_tsk_thread_flag(p, TIF_MEMDIE);
+		put_task_struct(p);
+		read_unlock(&tasklist_lock);
+		return;
+	}
 	do {
 		list_for_each_entry(child, &t->children, sibling) {
 			unsigned int child_points;
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] Fix race between oom kill and task exit
  2013-11-28  5:09 [PATCH] Fix race between oom kill and task exit Ma, Xindong
@ 2013-11-28  6:35 ` Johannes Weiner
  2013-11-28  8:41   ` William Dauchy
  2013-11-28  9:18   ` azurIt
  2013-11-28 18:37 ` Oleg Nesterov
  1 sibling, 2 replies; 17+ messages in thread
From: Johannes Weiner @ 2013-11-28  6:35 UTC (permalink / raw)
  To: Ma, Xindong
  Cc: akpm, mhocko, rientjes, rusty, linux-mm, linux-kernel,
	'Peter Zijlstra', 'gregkh@linuxfoundation.org',
	Tu, Xiaobing, William Dauchy, azurIt

Cc William and azur who might have encountered this problem.

On Thu, Nov 28, 2013 at 05:09:16AM +0000, Ma, Xindong wrote:
> From: Leon Ma <xindong.ma@intel.com>
> Date: Thu, 28 Nov 2013 12:46:09 +0800
> Subject: [PATCH] Fix race between oom kill and task exit
> 
> There is a race between oom kill and task exit. Scenario is:
>    TASK  A                      TASK  B
> TASK B is selected to oom kill
> in oom_kill_process()
> check PF_EXITING of TASK B
>                             task call do_exit()
>                             task set PF_EXITING flag
>                             write_lock_irq(&tasklist_lock);
>                             remove TASK B from thread group in __unhash_process()
>                             write_unlock_irq(&tasklist_lock);
> read_lock(&tasklist_lock);
> traverse threads of TASK B
> read_unlock(&tasklist_lock);
> 
> After that, the following traversal of threads in TASK B will not end because TASK B is not in the thread group:
> do {
> ....
> } while_each_thread(p, t);
> 
> Signed-off-by: Leon Ma <xindong.ma@intel.com>
> Signed-off-by: xiaobing tu <xiaobing.tu@intel.com>
> ---
>  mm/oom_kill.c |   20 ++++++++++----------
>  1 files changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 1e4a600..32ec88d 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -412,16 +412,6 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
>  	static DEFINE_RATELIMIT_STATE(oom_rs, DEFAULT_RATELIMIT_INTERVAL,
>  					      DEFAULT_RATELIMIT_BURST);
>  
> -	/*
> -	 * If the task is already exiting, don't alarm the sysadmin or kill
> -	 * its children or threads, just set TIF_MEMDIE so it can die quickly
> -	 */
> -	if (p->flags & PF_EXITING) {
> -		set_tsk_thread_flag(p, TIF_MEMDIE);
> -		put_task_struct(p);
> -		return;
> -	}
> -
>  	if (__ratelimit(&oom_rs))
>  		dump_header(p, gfp_mask, order, memcg, nodemask);
>  
> @@ -437,6 +427,16 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
>  	 * still freeing memory.
>  	 */
>  	read_lock(&tasklist_lock);
> +	/*
> +	 * If the task is already exiting, don't alarm the sysadmin or kill
> +	 * its children or threads, just set TIF_MEMDIE so it can die quickly
> +	 */
> +	if (p->flags & PF_EXITING) {
> +		set_tsk_thread_flag(p, TIF_MEMDIE);
> +		put_task_struct(p);
> +		read_unlock(&tasklist_lock);
> +		return;
> +	}
>  	do {
>  		list_for_each_entry(child, &t->children, sibling) {
>  			unsigned int child_points;
> -- 
> 1.7.4.1
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Fix race between oom kill and task exit
  2013-11-28  6:35 ` Johannes Weiner
@ 2013-11-28  8:41   ` William Dauchy
  2013-11-28 12:00     ` Michal Hocko
  2013-11-28  9:18   ` azurIt
  1 sibling, 1 reply; 17+ messages in thread
From: William Dauchy @ 2013-11-28  8:41 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Ma, Xindong, akpm, mhocko, rientjes, rusty, linux-mm,
	linux-kernel, Peter Zijlstra, gregkh, Tu, Xiaobing, azurIt,
	Sameer Nanda

Hi Johannes,

On Thu, Nov 28, 2013 at 7:35 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> Cc William and azur who might have encountered this problem.

Thank you for letting me know.
Note that before this patch I saw the one from Sameer Nanda
mm, oom: Fix race when selecting process to kill
https://lkml.org/lkml/2013/11/13/336

After applying this later one, I still have the issue I sent you (oom
killing a task outside cgroup i.e Task in / killed as a result of
limit of /lxc/foo) *but* I don't have a crash any more. So I was in
the process of reapplying your debug patch to send you a new report.

However, I'm now wondering if this present patch is a replacement of
Sameer Nanda's patch or if this a complementary patch.

Thanks,
-- 
William

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Fix race between oom kill and task exit
  2013-11-28  6:35 ` Johannes Weiner
  2013-11-28  8:41   ` William Dauchy
@ 2013-11-28  9:18   ` azurIt
  1 sibling, 0 replies; 17+ messages in thread
From: azurIt @ 2013-11-28  9:18 UTC (permalink / raw)
  To: Johannes Weiner, Ma, Xindong
  Cc: akpm, mhocko, rientjes, rusty, linux-mm, linux-kernel,
	'Peter Zijlstra', 'gregkh@linuxfoundation.org',
	Tu, Xiaobing, William Dauchy

> Od: Johannes Weiner <hannes@cmpxchg.org>
> Komu: "Ma, Xindong" <xindong.ma@intel.com>
> Dátum: 28.11.2013 07:54
> Predmet: Re: [PATCH] Fix race between oom kill and task exit
>
> CC: "akpm@linux-foundation.org" <akpm@linux-foundation.org>, "mhocko@suse.cz" <mhocko@suse.cz>, "rientjes@google.com" <rientjes@google.com>, "rusty@rustcorp.com.au" <rusty@rustcorp.com.au>, "linux-mm@kvack.org" <linux-mm@kvack.org>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>, "'Peter Zijlstra'" <peterz@infradead.org>, "'gregkh@linuxfoundation.org'" <gregkh@linuxfoundation.org>, "Tu, Xiaobing" <xiaobing.tu@intel.com>, "William Dauchy" <wdauchy@gmail.com>
>Cc William and azur who might have encountered this problem.
>
>On Thu, Nov 28, 2013 at 05:09:16AM +0000, Ma, Xindong wrote:
>> From: Leon Ma <xindong.ma@intel.com>
>> Date: Thu, 28 Nov 2013 12:46:09 +0800
>> Subject: [PATCH] Fix race between oom kill and task exit
>> 
>> There is a race between oom kill and task exit. Scenario is:
>>    TASK  A                      TASK  B
>> TASK B is selected to oom kill
>> in oom_kill_process()
>> check PF_EXITING of TASK B
>>                             task call do_exit()
>>                             task set PF_EXITING flag
>>                             write_lock_irq(&tasklist_lock);
>>                             remove TASK B from thread group in __unhash_process()
>>                             write_unlock_irq(&tasklist_lock);
>> read_lock(&tasklist_lock);
>> traverse threads of TASK B
>> read_unlock(&tasklist_lock);
>> 
>> After that, the following traversal of threads in TASK B will not end because TASK B is not in the thread group:
>> do {
>> ....
>> } while_each_thread(p, t);
>> 
>> Signed-off-by: Leon Ma <xindong.ma@intel.com>
>> Signed-off-by: xiaobing tu <xiaobing.tu@intel.com>
>> ---
>>  mm/oom_kill.c |   20 ++++++++++----------
>>  1 files changed, 10 insertions(+), 10 deletions(-)
>> 
>> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
>> index 1e4a600..32ec88d 100644
>> --- a/mm/oom_kill.c
>> +++ b/mm/oom_kill.c
>> @@ -412,16 +412,6 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
>>  	static DEFINE_RATELIMIT_STATE(oom_rs, DEFAULT_RATELIMIT_INTERVAL,
>>  					      DEFAULT_RATELIMIT_BURST);
>>  
>> -	/*
>> -	 * If the task is already exiting, don't alarm the sysadmin or kill
>> -	 * its children or threads, just set TIF_MEMDIE so it can die quickly
>> -	 */
>> -	if (p->flags & PF_EXITING) {
>> -		set_tsk_thread_flag(p, TIF_MEMDIE);
>> -		put_task_struct(p);
>> -		return;
>> -	}
>> -
>>  	if (__ratelimit(&oom_rs))
>>  		dump_header(p, gfp_mask, order, memcg, nodemask);
>>  
>> @@ -437,6 +427,16 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
>>  	 * still freeing memory.
>>  	 */
>>  	read_lock(&tasklist_lock);
>> +	/*
>> +	 * If the task is already exiting, don't alarm the sysadmin or kill
>> +	 * its children or threads, just set TIF_MEMDIE so it can die quickly
>> +	 */
>> +	if (p->flags & PF_EXITING) {
>> +		set_tsk_thread_flag(p, TIF_MEMDIE);
>> +		put_task_struct(p);
>> +		read_unlock(&tasklist_lock);
>> +		return;
>> +	}
>>  	do {
>>  		list_for_each_entry(child, &t->children, sibling) {
>>  			unsigned int child_points;
>> -- 
>> 1.7.4.1
>> 
>




Hi Johannes,

thank you very much! Fortunately, i didn't notice anything like this yet.

azur

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Fix race between oom kill and task exit
  2013-11-28  8:41   ` William Dauchy
@ 2013-11-28 12:00     ` Michal Hocko
  2013-11-28 17:57       ` Vladimir Murzin
  2013-11-28 18:38       ` Oleg Nesterov
  0 siblings, 2 replies; 17+ messages in thread
From: Michal Hocko @ 2013-11-28 12:00 UTC (permalink / raw)
  To: William Dauchy
  Cc: Johannes Weiner, Ma, Xindong, akpm, rientjes, rusty, linux-mm,
	linux-kernel, Peter Zijlstra, gregkh, Tu, Xiaobing, azurIt,
	Sameer Nanda, Oleg Nesterov

[CCing Oleg - the thread started here:
https://lkml.org/lkml/2013/11/28/2]

On Thu 28-11-13 09:41:40, William Dauchy wrote:
[...]
> However, I'm now wondering if this present patch is a replacement of
> Sameer Nanda's patch or if this a complementary patch.

They are both trying to solve the same issue. Neither of them is
optimal unfortunately. Oleg said he would look into this and I have seen
some patches but didn't geto check them.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Fix race between oom kill and task exit
  2013-11-28 12:00     ` Michal Hocko
@ 2013-11-28 17:57       ` Vladimir Murzin
  2013-11-28 18:38       ` Oleg Nesterov
  1 sibling, 0 replies; 17+ messages in thread
From: Vladimir Murzin @ 2013-11-28 17:57 UTC (permalink / raw)
  To: Michal Hocko
  Cc: William Dauchy, Johannes Weiner, Ma, Xindong, akpm, rientjes,
	rusty, linux-mm, linux-kernel, Peter Zijlstra, gregkh, Tu,
	Xiaobing, azurIt, Sameer Nanda, Oleg Nesterov, dserrg

On Thu, Nov 28, 2013 at 01:00:18PM +0100, Michal Hocko wrote:
> [CCing Oleg - the thread started here:
> https://lkml.org/lkml/2013/11/28/2]
> 
> On Thu 28-11-13 09:41:40, William Dauchy wrote:
> [...]
> > However, I'm now wondering if this present patch is a replacement of
> > Sameer Nanda's patch or if this a complementary patch.
> 
> They are both trying to solve the same issue. Neither of them is
> optimal unfortunately. Oleg said he would look into this and I have seen
> some patches but didn't geto check them.

CCing Sergey - he reported and proposed the patch for the similar issue.

Vladimir

> 
> -- 
> Michal Hocko
> SUSE Labs
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Fix race between oom kill and task exit
  2013-11-28  5:09 [PATCH] Fix race between oom kill and task exit Ma, Xindong
  2013-11-28  6:35 ` Johannes Weiner
@ 2013-11-28 18:37 ` Oleg Nesterov
  1 sibling, 0 replies; 17+ messages in thread
From: Oleg Nesterov @ 2013-11-28 18:37 UTC (permalink / raw)
  To: Ma, Xindong, Sameer Nanda
  Cc: akpm, mhocko, rientjes, hannes, rusty, linux-mm, linux-kernel,
	'Peter Zijlstra', 'gregkh@linuxfoundation.org',
	Tu, Xiaobing

On 11/28, Ma, Xindong wrote:
>
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -412,16 +412,6 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
>  	static DEFINE_RATELIMIT_STATE(oom_rs, DEFAULT_RATELIMIT_INTERVAL,
>  					      DEFAULT_RATELIMIT_BURST);
>  
> -	/*
> -	 * If the task is already exiting, don't alarm the sysadmin or kill
> -	 * its children or threads, just set TIF_MEMDIE so it can die quickly
> -	 */
> -	if (p->flags & PF_EXITING) {
> -		set_tsk_thread_flag(p, TIF_MEMDIE);
> -		put_task_struct(p);
> -		return;
> -	}
> -
>  	if (__ratelimit(&oom_rs))
>  		dump_header(p, gfp_mask, order, memcg, nodemask);
>  
> @@ -437,6 +427,16 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
>  	 * still freeing memory.
>  	 */
>  	read_lock(&tasklist_lock);
> +	/*
> +	 * If the task is already exiting, don't alarm the sysadmin or kill
> +	 * its children or threads, just set TIF_MEMDIE so it can die quickly
> +	 */
> +	if (p->flags & PF_EXITING) {
> +		set_tsk_thread_flag(p, TIF_MEMDIE);
> +		put_task_struct(p);
> +		read_unlock(&tasklist_lock);
> +		return;
> +	}

I got lost... didn't we recently discussed the similar patch from Sameer?

This one doesn't look right. find_lock_task_mm() after unlock(tasklist)
can hit the same problem.

I belive the patch from Sameer was correct.

Oleg.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Fix race between oom kill and task exit
  2013-11-28 12:00     ` Michal Hocko
  2013-11-28 17:57       ` Vladimir Murzin
@ 2013-11-28 18:38       ` Oleg Nesterov
  2013-11-29  2:06         ` Ma, Xindong
  2013-12-02 14:12         ` Oleg Nesterov
  1 sibling, 2 replies; 17+ messages in thread
From: Oleg Nesterov @ 2013-11-28 18:38 UTC (permalink / raw)
  To: Michal Hocko
  Cc: William Dauchy, Johannes Weiner, Ma, Xindong, akpm, rientjes,
	rusty, linux-mm, linux-kernel, Peter Zijlstra, gregkh, Tu,
	Xiaobing, azurIt, Sameer Nanda

On 11/28, Michal Hocko wrote:
>
> They are both trying to solve the same issue. Neither of them is
> optimal unfortunately.

yes, but this one doesn't look right.

> Oleg said he would look into this and I have seen
> some patches but didn't geto check them.

Only preparations so far.

Oleg.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH] Fix race between oom kill and task exit
  2013-11-28 18:38       ` Oleg Nesterov
@ 2013-11-29  2:06         ` Ma, Xindong
  2013-11-29  2:08           ` Tu, Xiaobing
  2013-12-02 14:12         ` Oleg Nesterov
  1 sibling, 1 reply; 17+ messages in thread
From: Ma, Xindong @ 2013-11-29  2:06 UTC (permalink / raw)
  To: Oleg Nesterov, Michal Hocko
  Cc: William Dauchy, Johannes Weiner, akpm, rientjes, rusty, linux-mm,
	linux-kernel, Peter Zijlstra, gregkh, Tu, Xiaobing, azurIt,
	Sameer Nanda

> From: Oleg Nesterov [mailto:oleg@redhat.com]
> Sent: Friday, November 29, 2013 2:39 AM
> To: Michal Hocko
> Cc: William Dauchy; Johannes Weiner; Ma, Xindong;
> akpm@linux-foundation.org; rientjes@google.com; rusty@rustcorp.com.au;
> linux-mm@kvack.org; linux-kernel@vger.kernel.org; Peter Zijlstra;
> gregkh@linuxfoundation.org; Tu, Xiaobing; azurIt; Sameer Nanda
> Subject: Re: [PATCH] Fix race between oom kill and task exit
> 
> On 11/28, Michal Hocko wrote:
> >
> > They are both trying to solve the same issue. Neither of them is
> > optimal unfortunately.
> 
> yes, but this one doesn't look right.
> 
> > Oleg said he would look into this and I have seen some patches but
> > didn't geto check them.
> 
> Only preparations so far.
> 
> Oleg.

I was not aware there's a long story for this issue. I hit this issue a lot of times during stress test and root caused it. After applying my patch, I did extensive test on 5 machines for a long time, it does not reproduced anymore so I submitted the patch.

I will do more research on this issue.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH] Fix race between oom kill and task exit
  2013-11-29  2:06         ` Ma, Xindong
@ 2013-11-29  2:08           ` Tu, Xiaobing
  0 siblings, 0 replies; 17+ messages in thread
From: Tu, Xiaobing @ 2013-11-29  2:08 UTC (permalink / raw)
  To: Ma, Xindong, Oleg Nesterov, Michal Hocko
  Cc: William Dauchy, Johannes Weiner, akpm, rientjes, rusty, linux-mm,
	linux-kernel, Peter Zijlstra, gregkh, azurIt, Sameer Nanda

We will do more stress test in more machine at the same time

-----Original Message-----
From: Ma, Xindong 
Sent: Friday, November 29, 2013 10:06 AM
To: Oleg Nesterov; Michal Hocko
Cc: William Dauchy; Johannes Weiner; akpm@linux-foundation.org; rientjes@google.com; rusty@rustcorp.com.au; linux-mm@kvack.org; linux-kernel@vger.kernel.org; Peter Zijlstra; gregkh@linuxfoundation.org; Tu, Xiaobing; azurIt; Sameer Nanda
Subject: RE: [PATCH] Fix race between oom kill and task exit

> From: Oleg Nesterov [mailto:oleg@redhat.com]
> Sent: Friday, November 29, 2013 2:39 AM
> To: Michal Hocko
> Cc: William Dauchy; Johannes Weiner; Ma, Xindong; 
> akpm@linux-foundation.org; rientjes@google.com; rusty@rustcorp.com.au; 
> linux-mm@kvack.org; linux-kernel@vger.kernel.org; Peter Zijlstra; 
> gregkh@linuxfoundation.org; Tu, Xiaobing; azurIt; Sameer Nanda
> Subject: Re: [PATCH] Fix race between oom kill and task exit
> 
> On 11/28, Michal Hocko wrote:
> >
> > They are both trying to solve the same issue. Neither of them is 
> > optimal unfortunately.
> 
> yes, but this one doesn't look right.
> 
> > Oleg said he would look into this and I have seen some patches but 
> > didn't geto check them.
> 
> Only preparations so far.
> 
> Oleg.

I was not aware there's a long story for this issue. I hit this issue a lot of times during stress test and root caused it. After applying my patch, I did extensive test on 5 machines for a long time, it does not reproduced anymore so I submitted the patch.

I will do more research on this issue.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Fix race between oom kill and task exit
  2013-11-28 18:38       ` Oleg Nesterov
  2013-11-29  2:06         ` Ma, Xindong
@ 2013-12-02 14:12         ` Oleg Nesterov
  2013-12-05  0:56           ` David Rientjes
  1 sibling, 1 reply; 17+ messages in thread
From: Oleg Nesterov @ 2013-12-02 14:12 UTC (permalink / raw)
  To: Michal Hocko
  Cc: William Dauchy, Johannes Weiner, Ma, Xindong, akpm, rientjes,
	rusty, linux-mm, linux-kernel, Peter Zijlstra, gregkh, Tu,
	Xiaobing, azurIt, Sameer Nanda

On 11/28, Oleg Nesterov wrote:
>
> On 11/28, Michal Hocko wrote:
> >
> > They are both trying to solve the same issue. Neither of them is
> > optimal unfortunately.
>
> yes, but this one doesn't look right.
>
> > Oleg said he would look into this and I have seen
> > some patches but didn't geto check them.
>
> Only preparations so far.

OK, I am going to send the initial fixes today. This means (I hope)
that we do not need this or Sameer's "[PATCH] mm, oom: Fix race when
selecting process to kill".

Oleg.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Fix race between oom kill and task exit
  2013-12-02 14:12         ` Oleg Nesterov
@ 2013-12-05  0:56           ` David Rientjes
  2013-12-05 17:29             ` Oleg Nesterov
  0 siblings, 1 reply; 17+ messages in thread
From: David Rientjes @ 2013-12-05  0:56 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Michal Hocko, William Dauchy, Johannes Weiner, Ma, Xindong, akpm,
	rusty, linux-mm, linux-kernel, Peter Zijlstra, gregkh, Tu,
	Xiaobing, azurIt, Sameer Nanda

On Mon, 2 Dec 2013, Oleg Nesterov wrote:

> OK, I am going to send the initial fixes today. This means (I hope)
> that we do not need this or Sameer's "[PATCH] mm, oom: Fix race when
> selecting process to kill".
> 

Your v2 series looks good and I suspect anybody trying them doesn't have 
additional reports of the infinite loop?  Should they be marked for 
stable?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Fix race between oom kill and task exit
  2013-12-05  0:56           ` David Rientjes
@ 2013-12-05 17:29             ` Oleg Nesterov
  2013-12-05 23:35               ` David Rientjes
  0 siblings, 1 reply; 17+ messages in thread
From: Oleg Nesterov @ 2013-12-05 17:29 UTC (permalink / raw)
  To: David Rientjes
  Cc: Michal Hocko, William Dauchy, Johannes Weiner, Ma, Xindong, akpm,
	rusty, linux-mm, linux-kernel, Peter Zijlstra, gregkh, Tu,
	Xiaobing, azurIt, Sameer Nanda

On 12/04, David Rientjes wrote:
>
> On Mon, 2 Dec 2013, Oleg Nesterov wrote:
>
> > OK, I am going to send the initial fixes today. This means (I hope)
> > that we do not need this or Sameer's "[PATCH] mm, oom: Fix race when
> > selecting process to kill".
>
> Your v2 series looks good and I suspect anybody trying them doesn't have
> additional reports of the infinite loop?  Should they be marked for
> stable?

Unlikely...

I think the patch from Sameer makes more sense for stable as a temporary
(and obviously incomplete) fix.

Oleg.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Fix race between oom kill and task exit
  2013-12-05 17:29             ` Oleg Nesterov
@ 2013-12-05 23:35               ` David Rientjes
  2013-12-06 15:19                 ` Oleg Nesterov
  0 siblings, 1 reply; 17+ messages in thread
From: David Rientjes @ 2013-12-05 23:35 UTC (permalink / raw)
  To: Oleg Nesterov, Andrew Morton
  Cc: Michal Hocko, William Dauchy, Johannes Weiner, Ma, Xindong,
	rusty, linux-mm, linux-kernel, Peter Zijlstra, gregkh, Tu,
	Xiaobing, azurIt, Sameer Nanda

On Thu, 5 Dec 2013, Oleg Nesterov wrote:

> > > OK, I am going to send the initial fixes today. This means (I hope)
> > > that we do not need this or Sameer's "[PATCH] mm, oom: Fix race when
> > > selecting process to kill".
> >
> > Your v2 series looks good and I suspect anybody trying them doesn't have
> > additional reports of the infinite loop?  Should they be marked for
> > stable?
> 
> Unlikely...
> 
> I think the patch from Sameer makes more sense for stable as a temporary
> (and obviously incomplete) fix.
> 

There's a problem because none of this is currently even in linux-next.  I 
think we could make a case for getting Sameer's patch at 
http://marc.info/?l=linux-kernel&m=138436313021133 to be merged for 
stable, but then we'd have to revert it in linux-next before merging your 
series at http://marc.info/?l=linux-kernel&m=138616217925981.  All of the 
issues you present in that series seem to be stable material, so why not 
just go ahead with your series and mark it for stable for 3.13?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Fix race between oom kill and task exit
  2013-12-05 23:35               ` David Rientjes
@ 2013-12-06 15:19                 ` Oleg Nesterov
  2013-12-06 15:52                   ` Oleg Nesterov
  2013-12-06 17:54                   ` Sameer Nanda
  0 siblings, 2 replies; 17+ messages in thread
From: Oleg Nesterov @ 2013-12-06 15:19 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Michal Hocko, William Dauchy, Johannes Weiner, Ma,
	Xindong, rusty, linux-mm, linux-kernel, Peter Zijlstra, gregkh,
	Tu, Xiaobing, azurIt, Sameer Nanda

On 12/05, David Rientjes wrote:
>
> On Thu, 5 Dec 2013, Oleg Nesterov wrote:
>
> > > Your v2 series looks good and I suspect anybody trying them doesn't have
> > > additional reports of the infinite loop?  Should they be marked for
> > > stable?
> >
> > Unlikely...
> >
> > I think the patch from Sameer makes more sense for stable as a temporary
> > (and obviously incomplete) fix.
>
> There's a problem because none of this is currently even in linux-next.  I
> think we could make a case for getting Sameer's patch at
> http://marc.info/?l=linux-kernel&m=138436313021133 to be merged for
> stable,

Probably.

Ah, I just noticed that this change

	-	if (p->flags & PF_EXITING) {
	+	if (p->flags & PF_EXITING || !pid_alive(p)) {

is not needed. !pid_alive(p) obviously implies PF_EXITING.

> but then we'd have to revert it in linux-next

Or perhaps Sameer can just send his fix to stable/gregkh.

Just the changelog should clearly explain that this is the minimal
workaround for stable. Once again it doesn't (and can't) fix all
problems even in oom_kill_process() paths, but it helps anyway to
avoid the easy-to-trigger hang.

> before merging your
> series at http://marc.info/?l=linux-kernel&m=138616217925981.

Just in case, I won't mind to rediff my patches on top of Sameer's
patch and then add git-revert patch.

> All of the
> issues you present in that series seem to be stable material, so why not
> just go ahead with your series and mark it for stable for 3.13?

OK... I can do this too.

I do not really like this because it adds thread_head/node but doesn't
remove the old ->thread_group. We will do this later, but obviously
this is not the stable material.

IOW, if we send this to stable, thread_head/node/for_each_thread will
be only used by oom_kill.c.

And this is risky. For example, 1/4 depends on (at least) another patch
I sent in preparation for this change, commit 81907739851
"kernel/fork.c:copy_process(): don't add the uninitialized
child to thread/task/pid lists", perhaps on something else.

So personally I'd prefer to simply send the workaround for stable.

Oleg.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Fix race between oom kill and task exit
  2013-12-06 15:19                 ` Oleg Nesterov
@ 2013-12-06 15:52                   ` Oleg Nesterov
  2013-12-06 17:54                   ` Sameer Nanda
  1 sibling, 0 replies; 17+ messages in thread
From: Oleg Nesterov @ 2013-12-06 15:52 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Michal Hocko, William Dauchy, Johannes Weiner, Ma,
	Xindong, rusty, linux-mm, linux-kernel, Peter Zijlstra, gregkh,
	Tu, Xiaobing, azurIt, Sameer Nanda

On 12/06, Oleg Nesterov wrote:
>
> And this is risky. For example, 1/4 depends on (at least) another patch
> I sent in preparation for this change, commit 81907739851
> "kernel/fork.c:copy_process(): don't add the uninitialized
> child to thread/task/pid lists", perhaps on something else.

Hmm. not too much actually, I re-checked v3.10:kernel/copy_process.c.
Yes, list_add(thread_node)) in copy_process() can add the new thread
with the wrong pids, but somehow I forgot that list_add(thread_group)
in v3.10 has the same problem, so this probably doesn't matter and
we can safely backport this change.

> So personally I'd prefer to simply send the workaround for stable.

Yes, anyway, bacause I will sleep better ;)

But OK, if you think it would be better to mark 1-4 series I sent
for stable - I won't argue.

Oleg.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Fix race between oom kill and task exit
  2013-12-06 15:19                 ` Oleg Nesterov
  2013-12-06 15:52                   ` Oleg Nesterov
@ 2013-12-06 17:54                   ` Sameer Nanda
  1 sibling, 0 replies; 17+ messages in thread
From: Sameer Nanda @ 2013-12-06 17:54 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: David Rientjes, Andrew Morton, Michal Hocko, William Dauchy,
	Johannes Weiner, Ma, Xindong, rusty, linux-mm, linux-kernel,
	Peter Zijlstra, Greg KH, Tu, Xiaobing, azurIt

On Fri, Dec 6, 2013 at 7:19 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 12/05, David Rientjes wrote:
>>
>> On Thu, 5 Dec 2013, Oleg Nesterov wrote:
>>
>> > > Your v2 series looks good and I suspect anybody trying them doesn't have
>> > > additional reports of the infinite loop?  Should they be marked for
>> > > stable?
>> >
>> > Unlikely...
>> >
>> > I think the patch from Sameer makes more sense for stable as a temporary
>> > (and obviously incomplete) fix.
>>
>> There's a problem because none of this is currently even in linux-next.  I
>> think we could make a case for getting Sameer's patch at
>> http://marc.info/?l=linux-kernel&m=138436313021133 to be merged for
>> stable,
>
> Probably.
>
> Ah, I just noticed that this change
>
>         -       if (p->flags & PF_EXITING) {
>         +       if (p->flags & PF_EXITING || !pid_alive(p)) {
>
> is not needed. !pid_alive(p) obviously implies PF_EXITING.

Ah right.

>
>> but then we'd have to revert it in linux-next
>
> Or perhaps Sameer can just send his fix to stable/gregkh.
>
> Just the changelog should clearly explain that this is the minimal
> workaround for stable. Once again it doesn't (and can't) fix all
> problems even in oom_kill_process() paths, but it helps anyway to
> avoid the easy-to-trigger hang.

I don't mind doing that if that seems to be the consensus.  FWIW, I've
already added my patch to the Chrome OS kernel repo.

>
>> before merging your
>> series at http://marc.info/?l=linux-kernel&m=138616217925981.
>
> Just in case, I won't mind to rediff my patches on top of Sameer's
> patch and then add git-revert patch.
>
>> All of the
>> issues you present in that series seem to be stable material, so why not
>> just go ahead with your series and mark it for stable for 3.13?
>
> OK... I can do this too.
>
> I do not really like this because it adds thread_head/node but doesn't
> remove the old ->thread_group. We will do this later, but obviously
> this is not the stable material.
>
> IOW, if we send this to stable, thread_head/node/for_each_thread will
> be only used by oom_kill.c.
>
> And this is risky. For example, 1/4 depends on (at least) another patch
> I sent in preparation for this change, commit 81907739851
> "kernel/fork.c:copy_process(): don't add the uninitialized
> child to thread/task/pid lists", perhaps on something else.
>
> So personally I'd prefer to simply send the workaround for stable.
>
> Oleg.
>



-- 
Sameer

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2013-12-06 17:55 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-11-28  5:09 [PATCH] Fix race between oom kill and task exit Ma, Xindong
2013-11-28  6:35 ` Johannes Weiner
2013-11-28  8:41   ` William Dauchy
2013-11-28 12:00     ` Michal Hocko
2013-11-28 17:57       ` Vladimir Murzin
2013-11-28 18:38       ` Oleg Nesterov
2013-11-29  2:06         ` Ma, Xindong
2013-11-29  2:08           ` Tu, Xiaobing
2013-12-02 14:12         ` Oleg Nesterov
2013-12-05  0:56           ` David Rientjes
2013-12-05 17:29             ` Oleg Nesterov
2013-12-05 23:35               ` David Rientjes
2013-12-06 15:19                 ` Oleg Nesterov
2013-12-06 15:52                   ` Oleg Nesterov
2013-12-06 17:54                   ` Sameer Nanda
2013-11-28  9:18   ` azurIt
2013-11-28 18:37 ` Oleg Nesterov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).