* [PATCH] mm: fix oom work when memory is under pressure
@ 2016-09-06 14:47 zhongjiang
  2016-09-09 11:44 ` Michal Hocko
  0 siblings, 1 reply; 26+ messages in thread
From: zhongjiang @ 2016-09-06 14:47 UTC (permalink / raw)
  To: akpm; +Cc: vbabka, mhocko, rientjes, linux-mm

From: zhong jiang <zhongjiang@huawei.com>

Some hung tasks showed up while I was running trinity, and OOM occurs
frequently.
A task holds a lock while allocating memory. Because memory is low,
the allocation triggers the OOM killer. The task then retries, because
it sees an OOM kill is already in progress, but the allocation keeps
failing: the freed memory is snatched away by other tasks first.
Fix this by limiting the number of retries, so that hung tasks and
livelocks are avoided.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
---
 mm/page_alloc.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a178b1d..0dcf08b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3457,6 +3457,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	enum compact_result compact_result;
 	int compaction_retries = 0;
 	int no_progress_loops = 0;
+	int oom_failed = 0;
 
 	/*
 	 * In the slowpath, we sanity check order to avoid ever trying to
@@ -3645,8 +3646,13 @@ retry:
 	page = __alloc_pages_may_oom(gfp_mask, order, ac, &did_some_progress);
 	if (page)
 		goto got_pg;
+	else
+		oom_failed++;
+
+	/* more than limited times will drop out */
+	if (oom_failed > MAX_RECLAIM_RETRIES)
+		goto nopage;
 
-	/* Retry as long as the OOM killer is making progress */
 	if (did_some_progress) {
 		no_progress_loops = 0;
 		goto retry;
-- 
1.8.3.1
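For illustration only, the capped-retry logic the patch adds can be sketched as a userspace C simulation. The allocator stand-in and the loop structure are hypothetical simplifications, not kernel code; only the `oom_failed > MAX_RECLAIM_RETRIES` bail-out mirrors the patch:

```c
#include <stdbool.h>

#define MAX_RECLAIM_RETRIES 16  /* limit the patch reuses for its cap */

/* Hypothetical stand-in for __alloc_pages_may_oom(): under heavy
 * pressure the freed memory is always stolen, so it never succeeds. */
static bool oom_alloc_succeeds(void)
{
	return false;
}

/* Returns the number of failed OOM rounds before giving up. Without
 * the cap, this loop would spin forever -- the livelock the patch is
 * trying to break. */
static int alloc_slowpath_sim(void)
{
	int oom_failed = 0;

	for (;;) {
		if (oom_alloc_succeeds())
			return oom_failed;	/* got_pg */
		oom_failed++;
		if (oom_failed > MAX_RECLAIM_RETRIES)
			return oom_failed;	/* nopage: bail out */
		/* the did_some_progress path would retry here */
	}
}
```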

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mm: fix oom work when memory is under pressure
  2016-09-06 14:47 [PATCH] mm: fix oom work when memory is under pressure zhongjiang
@ 2016-09-09 11:44 ` Michal Hocko
  2016-09-12  9:51   ` zhong jiang
  0 siblings, 1 reply; 26+ messages in thread
From: Michal Hocko @ 2016-09-09 11:44 UTC (permalink / raw)
  To: zhongjiang; +Cc: akpm, vbabka, rientjes, linux-mm

On Tue 06-09-16 22:47:06, zhongjiang wrote:
> From: zhong jiang <zhongjiang@huawei.com>
> 
> Some hungtask come up when I run the trinity, and OOM occurs
> frequently.
> A task hold lock to allocate memory, due to the low memory,
> it will lead to oom. at the some time , it will retry because
> it find that oom is in progress. but it always allocate fails,
> the freed memory was taken away quickly.
> The patch fix it by limit times to avoid hungtask and livelock
> come up.

Which kernel has shown this issue? Since 4.6 IIRC we have oom reaper
responsible for the async memory reclaim from the oom victim and later
changes should help to reduce oom lockups even further.

That being said this is not a right approach. It is even incorrect
because it allows __GFP_NOFAIL to fail now. So NAK to this patch.
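The `__GFP_NOFAIL` objection can be stated as a small sketch: any capped bail-out must exempt `__GFP_NOFAIL` requests, which callers rely on never failing. The flag value and helper below are hypothetical simplifications of the real slowpath:

```c
#include <stdbool.h>

#define __GFP_NOFAIL_SIM	0x01u	/* illustrative flag bit */
#define MAX_RECLAIM_RETRIES	16

/* A retry cap must never apply to __GFP_NOFAIL requests; the first
 * version of the patch lacked this check, hence the NAK. */
static bool may_bail_out(unsigned int gfp_mask, int oom_failed)
{
	if (gfp_mask & __GFP_NOFAIL_SIM)
		return false;		/* promised to succeed eventually */
	return oom_failed > MAX_RECLAIM_RETRIES;
}
```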

> Signed-off-by: zhong jiang <zhongjiang@huawei.com>
> ---
>  mm/page_alloc.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index a178b1d..0dcf08b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3457,6 +3457,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  	enum compact_result compact_result;
>  	int compaction_retries = 0;
>  	int no_progress_loops = 0;
> +	int oom_failed = 0;
>  
>  	/*
>  	 * In the slowpath, we sanity check order to avoid ever trying to
> @@ -3645,8 +3646,13 @@ retry:
>  	page = __alloc_pages_may_oom(gfp_mask, order, ac, &did_some_progress);
>  	if (page)
>  		goto got_pg;
> +	else
> +		oom_failed++;
> +
> +	/* more than limited times will drop out */
> +	if (oom_failed > MAX_RECLAIM_RETRIES)
> +		goto nopage;
>  
> -	/* Retry as long as the OOM killer is making progress */
>  	if (did_some_progress) {
>  		no_progress_loops = 0;
>  		goto retry;
> -- 
> 1.8.3.1

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] mm: fix oom work when memory is under pressure
  2016-09-09 11:44 ` Michal Hocko
@ 2016-09-12  9:51   ` zhong jiang
  2016-09-12 11:13     ` Michal Hocko
  0 siblings, 1 reply; 26+ messages in thread
From: zhong jiang @ 2016-09-12  9:51 UTC (permalink / raw)
  To: Michal Hocko; +Cc: akpm, vbabka, rientjes, linux-mm, Xishi Qiu, Hanjun Guo

On 2016/9/9 19:44, Michal Hocko wrote:
> On Tue 06-09-16 22:47:06, zhongjiang wrote:
>> From: zhong jiang <zhongjiang@huawei.com>
>>
>> Some hungtask come up when I run the trinity, and OOM occurs
>> frequently.
>> A task hold lock to allocate memory, due to the low memory,
>> it will lead to oom. at the some time , it will retry because
>> it find that oom is in progress. but it always allocate fails,
>> the freed memory was taken away quickly.
>> The patch fix it by limit times to avoid hungtask and livelock
>> come up.
> Which kernel has shown this issue? Since 4.6 IIRC we have oom reaper
> responsible for the async memory reclaim from the oom victim and later
> changes should help to reduce oom lockups even further.
>
> That being said this is not a right approach. It is even incorrect
> because it allows __GFP_NOFAIL to fail now. So NAK to this patch.
>
>> Signed-off-by: zhong jiang <zhongjiang@huawei.com>
>> ---
>>  mm/page_alloc.c | 8 +++++++-
>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index a178b1d..0dcf08b 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -3457,6 +3457,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>>  	enum compact_result compact_result;
>>  	int compaction_retries = 0;
>>  	int no_progress_loops = 0;
>> +	int oom_failed = 0;
>>  
>>  	/*
>>  	 * In the slowpath, we sanity check order to avoid ever trying to
>> @@ -3645,8 +3646,13 @@ retry:
>>  	page = __alloc_pages_may_oom(gfp_mask, order, ac, &did_some_progress);
>>  	if (page)
>>  		goto got_pg;
>> +	else
>> +		oom_failed++;
>> +
>> +	/* more than limited times will drop out */
>> +	if (oom_failed > MAX_RECLAIM_RETRIES)
>> +		goto nopage;
>>  
>> -	/* Retry as long as the OOM killer is making progress */
>>  	if (did_some_progress) {
>>  		no_progress_loops = 0;
>>  		goto retry;
>> -- 
>> 1.8.3.1
 hi, Michal

 The oom reaper can indeed speed up memory recovery, but this patch addresses an extreme
 scenario that I hit by running trinity. I think the scenario can happen with or without the oom reaper.

__GFP_NOFAIL should indeed be considered; thank you for the reminder. The updated patch follows.

Thanks
zhongjiang

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a178b1d..47804c1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3457,6 +3457,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
        enum compact_result compact_result;
        int compaction_retries = 0;
        int no_progress_loops = 0;
+       int oom_failed = 0;

        /*
         * In the slowpath, we sanity check order to avoid ever trying to
@@ -3645,8 +3646,15 @@ retry:
        page = __alloc_pages_may_oom(gfp_mask, order, ac, &did_some_progress);
        if (page)
                goto got_pg;
+       else
+               oom_failed++;
+
+       /* more than limited times will drop out */
+       if (oom_failed > MAX_RECLAIM_RETRIES) {
+               WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL);
+               goto nopage;
+       }

-       /* Retry as long as the OOM killer is making progress */
        if (did_some_progress) {
                no_progress_loops = 0;
                goto retry;




* Re: [PATCH] mm: fix oom work when memory is under pressure
  2016-09-12  9:51   ` zhong jiang
@ 2016-09-12 11:13     ` Michal Hocko
  2016-09-12 13:42       ` zhong jiang
  0 siblings, 1 reply; 26+ messages in thread
From: Michal Hocko @ 2016-09-12 11:13 UTC (permalink / raw)
  To: zhong jiang; +Cc: akpm, vbabka, rientjes, linux-mm, Xishi Qiu, Hanjun Guo

On Mon 12-09-16 17:51:06, zhong jiang wrote:
[...]
> hi,  Michal
> oom reaper indeed can accelerate the recovery of memory, but the patch
> solve the extreme scenario, I hit it by runing trinity. I think the
> scenario can happen whether oom reaper or not.

could you be more specific about the case when the oom reaper and the
current oom code led to the oom deadlock?
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] mm: fix oom work when memory is under pressure
  2016-09-12 11:13     ` Michal Hocko
@ 2016-09-12 13:42       ` zhong jiang
  2016-09-12 17:44         ` Michal Hocko
  0 siblings, 1 reply; 26+ messages in thread
From: zhong jiang @ 2016-09-12 13:42 UTC (permalink / raw)
  To: Michal Hocko; +Cc: akpm, vbabka, rientjes, linux-mm, Xishi Qiu, Hanjun Guo

On 2016/9/12 19:13, Michal Hocko wrote:
> On Mon 12-09-16 17:51:06, zhong jiang wrote:
> [...]
>> hi,  Michal
>> oom reaper indeed can accelerate the recovery of memory, but the patch
>> solve the extreme scenario, I hit it by runing trinity. I think the
>> scenario can happen whether oom reaper or not.
> could you be more specific about the case when the oom reaper and the
> current oom code led to the oom deadlock?
  It is not an OOM deadlock; it leads to a hung task. The explanation is as follows.

  Process A holds a lock on a resource and then needs to allocate memory while memory is
  very low. The OOM path is entered but returns immediately, because it finds that another
  process is already freeing memory in the same zone. However, the freed memory is taken
  by yet another process, so A hits the OOM path again and again.

Process B waits for the resource held by A, so it cannot take the lock until A releases it.
Therefore, if A spends a long time obtaining memory, B triggers the hung-task detector.

Thanks
zhongjiang
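The A/B interaction described above can be modeled with a toy calculation (all names and numbers below are illustrative; only the 120s default of the hung-task detector is a real kernel value):

```c
#include <stdbool.h>

#define HUNG_TASK_TIMEOUT 120	/* seconds, kernel default for the detector */

/* Hypothetical model: each failed OOM round costs A some seconds while
 * other tasks steal the freed memory. */
static int seconds_a_spends_allocating(int failed_rounds, int secs_per_round)
{
	return failed_rounds * secs_per_round;
}

/* B blocks on the lock A holds for the whole allocation, so B's wait
 * equals A's allocation time; the detector fires once it passes 120s. */
static bool b_triggers_hung_task(int failed_rounds, int secs_per_round)
{
	return seconds_a_spends_allocating(failed_rounds, secs_per_round)
	       > HUNG_TASK_TIMEOUT;
}
```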


* Re: [PATCH] mm: fix oom work when memory is under pressure
  2016-09-12 13:42       ` zhong jiang
@ 2016-09-12 17:44         ` Michal Hocko
  2016-09-13 13:13           ` zhong jiang
  0 siblings, 1 reply; 26+ messages in thread
From: Michal Hocko @ 2016-09-12 17:44 UTC (permalink / raw)
  To: zhong jiang; +Cc: akpm, vbabka, rientjes, linux-mm, Xishi Qiu, Hanjun Guo

On Mon 12-09-16 21:42:28, zhong jiang wrote:
> On 2016/9/12 19:13, Michal Hocko wrote:
> > On Mon 12-09-16 17:51:06, zhong jiang wrote:
> > [...]
> >> hi,  Michal
> >> oom reaper indeed can accelerate the recovery of memory, but the patch
> >> solve the extreme scenario, I hit it by runing trinity. I think the
> >> scenario can happen whether oom reaper or not.
> > could you be more specific about the case when the oom reaper and the
> > current oom code led to the oom deadlock?
>
> It is not the oom deadlock.  It will lead to hungtask.  The explain is
> as follows.
> 
> process A occupy a resource and lock it. then A need to allocate
> memory when memory is very low. at the some time, oom will come up and
> return directly. because it find other process is freeing memory in
> same zone.
>
> however, the freed memory is taken away by another process.
> it will lead to A oom again and again.
> 
> process B still wait some resource holded by A. so B will obtain the
> lock until A release the resource. therefor, if A spend much time to
> obtain memory, B will hungtask.

OK, I see what you are aiming for. And indeed such a starvation and
resulting priority inversion is possible. It is a hard problem to solve
and your patch doesn't address it either. You can spend enough time
reclaiming and retrying without ever getting to the oom path to trigger
this hungtask warning.

If you want to solve this problem properly then you would have to give
tasks which are looping in the page allocator access to some portion of
memory reserves. This is quite tricky to do right, though.

Retry counters with the fail path have been proposed in the past and not
accepted.
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] mm: fix oom work when memory is under pressure
  2016-09-12 17:44         ` Michal Hocko
@ 2016-09-13 13:13           ` zhong jiang
  2016-09-13 13:28             ` Michal Hocko
  0 siblings, 1 reply; 26+ messages in thread
From: zhong jiang @ 2016-09-13 13:13 UTC (permalink / raw)
  To: Michal Hocko; +Cc: akpm, vbabka, rientjes, linux-mm, Xishi Qiu, Hanjun Guo

On 2016/9/13 1:44, Michal Hocko wrote:
> On Mon 12-09-16 21:42:28, zhong jiang wrote:
>> On 2016/9/12 19:13, Michal Hocko wrote:
>>> On Mon 12-09-16 17:51:06, zhong jiang wrote:
>>> [...]
>>>> hi,  Michal
>>>> oom reaper indeed can accelerate the recovery of memory, but the patch
>>>> solve the extreme scenario, I hit it by runing trinity. I think the
>>>> scenario can happen whether oom reaper or not.
>>> could you be more specific about the case when the oom reaper and the
>>> current oom code led to the oom deadlock?
>> It is not the oom deadlock.  It will lead to hungtask.  The explain is
>> as follows.
>>
>> process A occupy a resource and lock it. then A need to allocate
>> memory when memory is very low. at the some time, oom will come up and
>> return directly. because it find other process is freeing memory in
>> same zone.
>>
>> however, the freed memory is taken away by another process.
>> it will lead to A oom again and again.
>>
>> process B still wait some resource holded by A. so B will obtain the
>> lock until A release the resource. therefor, if A spend much time to
>> obtain memory, B will hungtask.
> OK, I see what you are aiming for. And indeed such a starvation and
> resulting priority inversion is possible. It is a hard problem to solve
> and your patch doesn't address it either. You can spend enough time
> reclaiming and retrying without ever getting to the oom path to trigger
> this hungtask warning.
  Yes.
> If you want to solve this problem properly then you would have to give
> tasks which are looping in the page allocator access to some portion of
> memory reserves. This is quite tricky to do right, though.
  Using a portion of the memory reserves has almost no effect in such a starvation scenario;
  I think the hung task will still occur. It does not address the root of the problem.
> Retry counters with the fail path have been proposed in the past and not
> accepted.
  The patch above has been tested by running trinity, and the problem is fixed.
  Is there a concrete reason to oppose the patch, or does it introduce any side effects?

 Thanks
zhongjiang



* Re: [PATCH] mm: fix oom work when memory is under pressure
  2016-09-13 13:13           ` zhong jiang
@ 2016-09-13 13:28             ` Michal Hocko
  2016-09-13 14:01               ` zhong jiang
  2016-09-14  7:13               ` zhong jiang
  0 siblings, 2 replies; 26+ messages in thread
From: Michal Hocko @ 2016-09-13 13:28 UTC (permalink / raw)
  To: zhong jiang; +Cc: akpm, vbabka, rientjes, linux-mm, Xishi Qiu, Hanjun Guo

On Tue 13-09-16 21:13:21, zhong jiang wrote:
> On 2016/9/13 1:44, Michal Hocko wrote:
[...]
> > If you want to solve this problem properly then you would have to give
> > tasks which are looping in the page allocator access to some portion of
> > memory reserves. This is quite tricky to do right, though.
>
> To use some portion of memory reserves is almost no effect in a so
> starvation scenario.  I think the hungtask still will occur. it can
> not solve the problem primarily.

Granting an access to memory reserves is of course no full solution but
it raises chances for a forward progress. Other solutions would have to
guarantee that the memory reclaimed on behalf of the requester will be
given to the requester. Not an easy task

> > Retry counters with the fail path have been proposed in the past and not
> > accepted.
>
> The above patch have been tested by runing the trinity.  The question
> is fixed.  Is there any reasonable reason oppose to the patch ? or it
> will bring in any side-effect.

Sure there is. Low order allocations have been traditionally non failing
and changing that behavior is a major obstacle because it opens up a
door to many bugs. I've tried to do something similar in the past and
there was a strong resistance against it. Believe me been there done
that...

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] mm: fix oom work when memory is under pressure
  2016-09-13 13:28             ` Michal Hocko
@ 2016-09-13 14:01               ` zhong jiang
  2016-09-14  7:13               ` zhong jiang
  1 sibling, 0 replies; 26+ messages in thread
From: zhong jiang @ 2016-09-13 14:01 UTC (permalink / raw)
  To: Michal Hocko; +Cc: akpm, vbabka, rientjes, linux-mm, Xishi Qiu, Hanjun Guo

On 2016/9/13 21:28, Michal Hocko wrote:
> On Tue 13-09-16 21:13:21, zhong jiang wrote:
>> On 2016/9/13 1:44, Michal Hocko wrote:
> [...]
>>> If you want to solve this problem properly then you would have to give
>>> tasks which are looping in the page allocator access to some portion of
>>> memory reserves. This is quite tricky to do right, though.
>> To use some portion of memory reserves is almost no effect in a so
>> starvation scenario.  I think the hungtask still will occur. it can
>> not solve the problem primarily.
> Granting an access to memory reserves is of course no full solution but
> it raises chances for a forward progress. Other solutions would have to
> guarantee that the memory reclaimed on behalf of the requester will be
> given to the requester. Not an easy task
>
>>> Retry counters with the fail path have been proposed in the past and not
>>> accepted.
>> The above patch have been tested by runing the trinity.  The question
>> is fixed.  Is there any reasonable reason oppose to the patch ? or it
>> will bring in any side-effect.
> Sure there is. Low order allocations have been traditionally non failing
> and changing that behavior is a major obstacle because it opens up a
> door to many bugs. I've tried to do something similar in the past and
> there was a strong resistance against it. Believe me been there done
> that...
>
  That sounds reasonable, but in such a starvation scenario failure seems unavoidable.
  Do you mean that in any case we must keep low-order allocations from failing?



* Re: [PATCH] mm: fix oom work when memory is under pressure
  2016-09-13 13:28             ` Michal Hocko
  2016-09-13 14:01               ` zhong jiang
@ 2016-09-14  7:13               ` zhong jiang
  2016-09-14  8:42                 ` Michal Hocko
  1 sibling, 1 reply; 26+ messages in thread
From: zhong jiang @ 2016-09-14  7:13 UTC (permalink / raw)
  To: Michal Hocko; +Cc: akpm, vbabka, rientjes, linux-mm, Xishi Qiu, Hanjun Guo

On 2016/9/13 21:28, Michal Hocko wrote:
> On Tue 13-09-16 21:13:21, zhong jiang wrote:
>> On 2016/9/13 1:44, Michal Hocko wrote:
> [...]
>>> If you want to solve this problem properly then you would have to give
>>> tasks which are looping in the page allocator access to some portion of
>>> memory reserves. This is quite tricky to do right, though.
>> To use some portion of memory reserves is almost no effect in a so
>> starvation scenario.  I think the hungtask still will occur. it can
>> not solve the problem primarily.
> Granting an access to memory reserves is of course no full solution but
> it raises chances for a forward progress. Other solutions would have to
> guarantee that the memory reclaimed on behalf of the requester will be
> given to the requester. Not an easy task
>
>>> Retry counters with the fail path have been proposed in the past and not
>>> accepted.
>> The above patch have been tested by runing the trinity.  The question
>> is fixed.  Is there any reasonable reason oppose to the patch ? or it
>> will bring in any side-effect.
> Sure there is. Low order allocations have been traditionally non failing
> and changing that behavior is a major obstacle because it opens up a
> door to many bugs. I've tried to do something similar in the past and
> there was a strong resistance against it. Believe me been there done
> that...
>
  hi, Michal

  Recently I hit the same issue when running an OOM test case from LTP with KSM enabled.
 
[  601.937145] Call trace:
[  601.939600] [<ffffffc000086a88>] __switch_to+0x74/0x8c
[  601.944760] [<ffffffc000a1bae0>] __schedule+0x23c/0x7bc
[  601.950007] [<ffffffc000a1c09c>] schedule+0x3c/0x94
[  601.954907] [<ffffffc000a1eb84>] rwsem_down_write_failed+0x214/0x350
[  601.961289] [<ffffffc000a1e32c>] down_write+0x64/0x80
[  601.966363] [<ffffffc00021f794>] __ksm_exit+0x90/0x19c
[  601.971523] [<ffffffc0000be650>] mmput+0x118/0x11c
[  601.976335] [<ffffffc0000c3ec4>] do_exit+0x2dc/0xa74
[  601.981321] [<ffffffc0000c46f8>] do_group_exit+0x4c/0xe4
[  601.986656] [<ffffffc0000d0f34>] get_signal+0x444/0x5e0
[  601.991904] [<ffffffc000089fcc>] do_signal+0x1d8/0x450
[  601.997065] [<ffffffc00008a35c>] do_notify_resume+0x70/0x78

The root cause is that ksmd holds the read lock (mmap_sem) and never releases it:
 scan_get_next_rmap_item
         down_read
                   get_next_rmap_item
                             alloc_rmap_item     #ksmd will loop permanently.

What do you think of this kind of situation? Or should we leave the issue alone?

Thanks
zhongjiang


* Re: [PATCH] mm: fix oom work when memory is under pressure
  2016-09-14  7:13               ` zhong jiang
@ 2016-09-14  8:42                 ` Michal Hocko
  2016-09-14  8:50                   ` zhong jiang
  2016-09-14  8:52                   ` Michal Hocko
  0 siblings, 2 replies; 26+ messages in thread
From: Michal Hocko @ 2016-09-14  8:42 UTC (permalink / raw)
  To: zhong jiang; +Cc: akpm, vbabka, rientjes, linux-mm, Xishi Qiu, Hanjun Guo

[Let's CC Hugh]

On Wed 14-09-16 15:13:50, zhong jiang wrote:
[...]
>   hi, Michal
> 
>   Recently, I hit the same issue when run a OOM case of the LTP and ksm enable.
>  
> [  601.937145] Call trace:
> [  601.939600] [<ffffffc000086a88>] __switch_to+0x74/0x8c
> [  601.944760] [<ffffffc000a1bae0>] __schedule+0x23c/0x7bc
> [  601.950007] [<ffffffc000a1c09c>] schedule+0x3c/0x94
> [  601.954907] [<ffffffc000a1eb84>] rwsem_down_write_failed+0x214/0x350
> [  601.961289] [<ffffffc000a1e32c>] down_write+0x64/0x80
> [  601.966363] [<ffffffc00021f794>] __ksm_exit+0x90/0x19c
> [  601.971523] [<ffffffc0000be650>] mmput+0x118/0x11c
> [  601.976335] [<ffffffc0000c3ec4>] do_exit+0x2dc/0xa74
> [  601.981321] [<ffffffc0000c46f8>] do_group_exit+0x4c/0xe4
> [  601.986656] [<ffffffc0000d0f34>] get_signal+0x444/0x5e0
> [  601.991904] [<ffffffc000089fcc>] do_signal+0x1d8/0x450
> [  601.997065] [<ffffffc00008a35c>] do_notify_resume+0x70/0x78

So this is a hung task triggering because the exiting task cannot get
the mmap sem for write because the ksmd holds it for read while
allocating memory which just takes ages to complete, right?

> 
> The root case is that ksmd hold the read lock. and the lock is not released.
>  scan_get_next_rmap_item
>          down_read
>                    get_next_rmap_item
>                              alloc_rmap_item     #ksmd will loop permanently.
> 
> How do you see this kind of situation ? or  let the issue alone.

I am not familiar with the ksmd code so it is hard for me to judge but
one thing to do would be __GFP_NORETRY which would force a bail out from
the allocation rather than looping for ever. A quick look tells me that
the allocation failure here is quite easy to handle. There might be
others...
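One way the `__GFP_NORETRY` suggestion could look for ksm's rmap-item allocation is sketched below. The kernel helpers are mocked so the sketch is self-contained; the real code lives in mm/ksm.c and its call signatures differ:

```c
#include <stdlib.h>

/* Mocked stand-ins for kernel flags (illustrative values only). */
#define GFP_KERNEL	0x01u
#define __GFP_NORETRY	0x02u

struct rmap_item { int dummy; };

/* Mock of a slab allocation: with __GFP_NORETRY set, an OOM situation
 * makes it return NULL instead of looping in the allocator forever. */
static struct rmap_item *kmem_cache_alloc_mock(unsigned int flags, int oom)
{
	if (oom && (flags & __GFP_NORETRY))
		return NULL;
	return calloc(1, sizeof(struct rmap_item));
}

/* A NULL return propagates to scan_get_next_rmap_item(), which can
 * then drop mmap_sem and retry later, unblocking the exiting task. */
static struct rmap_item *alloc_rmap_item(int oom)
{
	return kmem_cache_alloc_mock(GFP_KERNEL | __GFP_NORETRY, oom);
}
```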

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] mm: fix oom work when memory is under pressure
  2016-09-14  8:42                 ` Michal Hocko
@ 2016-09-14  8:50                   ` zhong jiang
  2016-09-14  9:05                     ` Michal Hocko
  2016-09-14  8:52                   ` Michal Hocko
  1 sibling, 1 reply; 26+ messages in thread
From: zhong jiang @ 2016-09-14  8:50 UTC (permalink / raw)
  To: Michal Hocko; +Cc: akpm, vbabka, rientjes, linux-mm, Xishi Qiu, Hanjun Guo

On 2016/9/14 16:42, Michal Hocko wrote:
> [Let's CC Hugh]
>
> On Wed 14-09-16 15:13:50, zhong jiang wrote:
> [...]
>>   hi, Michal
>>
>>   Recently, I hit the same issue when run a OOM case of the LTP and ksm enable.
>>  
>> [  601.937145] Call trace:
>> [  601.939600] [<ffffffc000086a88>] __switch_to+0x74/0x8c
>> [  601.944760] [<ffffffc000a1bae0>] __schedule+0x23c/0x7bc
>> [  601.950007] [<ffffffc000a1c09c>] schedule+0x3c/0x94
>> [  601.954907] [<ffffffc000a1eb84>] rwsem_down_write_failed+0x214/0x350
>> [  601.961289] [<ffffffc000a1e32c>] down_write+0x64/0x80
>> [  601.966363] [<ffffffc00021f794>] __ksm_exit+0x90/0x19c
>> [  601.971523] [<ffffffc0000be650>] mmput+0x118/0x11c
>> [  601.976335] [<ffffffc0000c3ec4>] do_exit+0x2dc/0xa74
>> [  601.981321] [<ffffffc0000c46f8>] do_group_exit+0x4c/0xe4
>> [  601.986656] [<ffffffc0000d0f34>] get_signal+0x444/0x5e0
>> [  601.991904] [<ffffffc000089fcc>] do_signal+0x1d8/0x450
>> [  601.997065] [<ffffffc00008a35c>] do_notify_resume+0x70/0x78
> So this is a hung task triggering because the exiting task cannot get
> the mmap sem for write because the ksmd holds it for read while
> allocating memory which just takes ages to complete, right?
  Yes
>> The root case is that ksmd hold the read lock. and the lock is not released.
>>  scan_get_next_rmap_item
>>          down_read
>>                    get_next_rmap_item
>>                              alloc_rmap_item     #ksmd will loop permanently.
>>
>> How do you see this kind of situation ? or  let the issue alone.
> I am not familiar with the ksmd code so it is hard for me to judge but
> one thing to do would be __GFP_NORETRY which would force a bail out from
> the allocation rather than looping for ever. A quick look tells me that
> the allocation failure here is quite easy to handle. There might be
> others...
>
  By adding my patch, the problem is fixed. They are the same issue.


* Re: [PATCH] mm: fix oom work when memory is under pressure
  2016-09-14  8:42                 ` Michal Hocko
  2016-09-14  8:50                   ` zhong jiang
@ 2016-09-14  8:52                   ` Michal Hocko
  2016-09-14  9:25                     ` zhong jiang
  2016-09-16 22:13                     ` Hugh Dickins
  1 sibling, 2 replies; 26+ messages in thread
From: Michal Hocko @ 2016-09-14  8:52 UTC (permalink / raw)
  To: zhong jiang
  Cc: akpm, vbabka, rientjes, linux-mm, Xishi Qiu, Hanjun Guo, Hugh Dickins

On Wed 14-09-16 10:42:19, Michal Hocko wrote:
> [Let's CC Hugh]

now for real...

> 
> On Wed 14-09-16 15:13:50, zhong jiang wrote:
> [...]
> >   hi, Michal
> > 
> >   Recently, I hit the same issue when run a OOM case of the LTP and ksm enable.
> >  
> > [  601.937145] Call trace:
> > [  601.939600] [<ffffffc000086a88>] __switch_to+0x74/0x8c
> > [  601.944760] [<ffffffc000a1bae0>] __schedule+0x23c/0x7bc
> > [  601.950007] [<ffffffc000a1c09c>] schedule+0x3c/0x94
> > [  601.954907] [<ffffffc000a1eb84>] rwsem_down_write_failed+0x214/0x350
> > [  601.961289] [<ffffffc000a1e32c>] down_write+0x64/0x80
> > [  601.966363] [<ffffffc00021f794>] __ksm_exit+0x90/0x19c
> > [  601.971523] [<ffffffc0000be650>] mmput+0x118/0x11c
> > [  601.976335] [<ffffffc0000c3ec4>] do_exit+0x2dc/0xa74
> > [  601.981321] [<ffffffc0000c46f8>] do_group_exit+0x4c/0xe4
> > [  601.986656] [<ffffffc0000d0f34>] get_signal+0x444/0x5e0
> > [  601.991904] [<ffffffc000089fcc>] do_signal+0x1d8/0x450
> > [  601.997065] [<ffffffc00008a35c>] do_notify_resume+0x70/0x78
> 
> So this is a hung task triggering because the exiting task cannot get
> the mmap sem for write because the ksmd holds it for read while
> allocating memory which just takes ages to complete, right?
> 
> > 
> > The root case is that ksmd hold the read lock. and the lock is not released.
> >  scan_get_next_rmap_item
> >          down_read
> >                    get_next_rmap_item
> >                              alloc_rmap_item     #ksmd will loop permanently.
> > 
> > How do you see this kind of situation ? or  let the issue alone.
> 
> I am not familiar with the ksmd code so it is hard for me to judge but
> one thing to do would be __GFP_NORETRY which would force a bail out from
> the allocation rather than looping for ever. A quick look tells me that
> the allocation failure here is quite easy to handle. There might be
> others...
> 
> -- 
> Michal Hocko
> SUSE Labs

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] mm: fix oom work when memory is under pressure
  2016-09-14  8:50                   ` zhong jiang
@ 2016-09-14  9:05                     ` Michal Hocko
  0 siblings, 0 replies; 26+ messages in thread
From: Michal Hocko @ 2016-09-14  9:05 UTC (permalink / raw)
  To: zhong jiang; +Cc: akpm, vbabka, rientjes, linux-mm, Xishi Qiu, Hanjun Guo

On Wed 14-09-16 16:50:48, zhong jiang wrote:
> On 2016/9/14 16:42, Michal Hocko wrote:
> > [Let's CC Hugh]
> >
> > On Wed 14-09-16 15:13:50, zhong jiang wrote:
> > [...]
> >>   hi, Michal
> >>
> >>   Recently, I hit the same issue when running an OOM case from LTP with KSM enabled.
> >>  
> >> [  601.937145] Call trace:
> >> [  601.939600] [<ffffffc000086a88>] __switch_to+0x74/0x8c
> >> [  601.944760] [<ffffffc000a1bae0>] __schedule+0x23c/0x7bc
> >> [  601.950007] [<ffffffc000a1c09c>] schedule+0x3c/0x94
> >> [  601.954907] [<ffffffc000a1eb84>] rwsem_down_write_failed+0x214/0x350
> >> [  601.961289] [<ffffffc000a1e32c>] down_write+0x64/0x80
> >> [  601.966363] [<ffffffc00021f794>] __ksm_exit+0x90/0x19c
> >> [  601.971523] [<ffffffc0000be650>] mmput+0x118/0x11c
> >> [  601.976335] [<ffffffc0000c3ec4>] do_exit+0x2dc/0xa74
> >> [  601.981321] [<ffffffc0000c46f8>] do_group_exit+0x4c/0xe4
> >> [  601.986656] [<ffffffc0000d0f34>] get_signal+0x444/0x5e0
> >> [  601.991904] [<ffffffc000089fcc>] do_signal+0x1d8/0x450
> >> [  601.997065] [<ffffffc00008a35c>] do_notify_resume+0x70/0x78
> > So this is a hung task triggering because the exiting task cannot get
> > the mmap sem for write because the ksmd holds it for read while
> > allocating memory which just takes ages to complete, right?
>   Yes
> >> The root cause is that ksmd holds the read lock and never releases it:
> >>  scan_get_next_rmap_item
> >>          down_read
> >>                    get_next_rmap_item
> >>                              alloc_rmap_item     #ksmd will loop permanently.
> >>
> >> How do you view this situation? Or should we leave the issue alone?
> > I am not familiar with the ksmd code so it is hard for me to judge but
> > one thing to do would be __GFP_NORETRY which would force a bail out from
> > the allocation rather than looping for ever. A quick look tells me that
> > the allocation failure here is quite easy to handle. There might be
> > others...
> >
>   By applying my patch, the problem is fixed. They are the same issue.

No, it's not, as I've already mentioned before.

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mm: fix oom work when memory is under pressure
  2016-09-14  8:52                   ` Michal Hocko
@ 2016-09-14  9:25                     ` zhong jiang
  2016-09-14 11:29                       ` Tetsuo Handa
  2016-09-16 22:13                     ` Hugh Dickins
  1 sibling, 1 reply; 26+ messages in thread
From: zhong jiang @ 2016-09-14  9:25 UTC (permalink / raw)
  To: Michal Hocko
  Cc: akpm, vbabka, rientjes, linux-mm, Xishi Qiu, Hanjun Guo, Hugh Dickins

On 2016/9/14 16:52, Michal Hocko wrote:
> On Wed 14-09-16 10:42:19, Michal Hocko wrote:
>> [Let's CC Hugh]
> now for real...
>
>> On Wed 14-09-16 15:13:50, zhong jiang wrote:
>> [...]
>>>   hi, Michal
>>>
>>>   Recently, I hit the same issue when running an OOM case from LTP with KSM enabled.
>>>  
>>> [  601.937145] Call trace:
>>> [  601.939600] [<ffffffc000086a88>] __switch_to+0x74/0x8c
>>> [  601.944760] [<ffffffc000a1bae0>] __schedule+0x23c/0x7bc
>>> [  601.950007] [<ffffffc000a1c09c>] schedule+0x3c/0x94
>>> [  601.954907] [<ffffffc000a1eb84>] rwsem_down_write_failed+0x214/0x350
>>> [  601.961289] [<ffffffc000a1e32c>] down_write+0x64/0x80
>>> [  601.966363] [<ffffffc00021f794>] __ksm_exit+0x90/0x19c
>>> [  601.971523] [<ffffffc0000be650>] mmput+0x118/0x11c
>>> [  601.976335] [<ffffffc0000c3ec4>] do_exit+0x2dc/0xa74
>>> [  601.981321] [<ffffffc0000c46f8>] do_group_exit+0x4c/0xe4
>>> [  601.986656] [<ffffffc0000d0f34>] get_signal+0x444/0x5e0
>>> [  601.991904] [<ffffffc000089fcc>] do_signal+0x1d8/0x450
>>> [  601.997065] [<ffffffc00008a35c>] do_notify_resume+0x70/0x78
>> So this is a hung task triggering because the exiting task cannot get
>> the mmap sem for write because the ksmd holds it for read while
>> allocating memory which just takes ages to complete, right?
>>
>>> The root cause is that ksmd holds the read lock and never releases it:
>>>  scan_get_next_rmap_item
>>>          down_read
>>>                    get_next_rmap_item
>>>                              alloc_rmap_item     #ksmd will loop permanently.
>>>
>>> How do you view this situation? Or should we leave the issue alone?
>> I am not familiar with the ksmd code so it is hard for me to judge but
>> one thing to do would be __GFP_NORETRY which would force a bail out from
>> the allocation rather than looping for ever. A quick look tells me that
>> the allocation failure here is quite easy to handle. There might be
>> others...
>>
>> -- 
>> Michal Hocko
>> SUSE Labs
Adding __GFP_NORETRY also fixes the issue.
Therefore, we can confirm that this LTP case leads to the endless looping.

index d45a0a1..03fb67b 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -283,7 +283,7 @@ static inline struct rmap_item *alloc_rmap_item(void)
 {
        struct rmap_item *rmap_item;

-       rmap_item = kmem_cache_zalloc(rmap_item_cache, GFP_KERNEL);
+       rmap_item = kmem_cache_zalloc(rmap_item_cache, GFP_KERNEL | __GFP_NORETRY);
        if (rmap_item)
                ksm_rmap_items++;
        return rmap_item;




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mm: fix oom work when memory is under pressure
  2016-09-14  9:25                     ` zhong jiang
@ 2016-09-14 11:29                       ` Tetsuo Handa
  2016-09-14 13:52                         ` zhong jiang
  0 siblings, 1 reply; 26+ messages in thread
From: Tetsuo Handa @ 2016-09-14 11:29 UTC (permalink / raw)
  To: zhong jiang, Michal Hocko
  Cc: akpm, vbabka, rientjes, linux-mm, Xishi Qiu, Hanjun Guo, Hugh Dickins

On 2016/09/14 18:25, zhong jiang wrote:
> On 2016/9/14 16:52, Michal Hocko wrote:
>> On Wed 14-09-16 10:42:19, Michal Hocko wrote:
>>> [Let's CC Hugh]
>> now for real...
>>
>>> On Wed 14-09-16 15:13:50, zhong jiang wrote:
>>> [...]
>>>>   hi, Michal
>>>>
>>>>   Recently, I hit the same issue when running an OOM case from LTP with KSM enabled.
>>>>  
>>>> [  601.937145] Call trace:
>>>> [  601.939600] [<ffffffc000086a88>] __switch_to+0x74/0x8c
>>>> [  601.944760] [<ffffffc000a1bae0>] __schedule+0x23c/0x7bc
>>>> [  601.950007] [<ffffffc000a1c09c>] schedule+0x3c/0x94
>>>> [  601.954907] [<ffffffc000a1eb84>] rwsem_down_write_failed+0x214/0x350
>>>> [  601.961289] [<ffffffc000a1e32c>] down_write+0x64/0x80
>>>> [  601.966363] [<ffffffc00021f794>] __ksm_exit+0x90/0x19c
>>>> [  601.971523] [<ffffffc0000be650>] mmput+0x118/0x11c
>>>> [  601.976335] [<ffffffc0000c3ec4>] do_exit+0x2dc/0xa74
>>>> [  601.981321] [<ffffffc0000c46f8>] do_group_exit+0x4c/0xe4
>>>> [  601.986656] [<ffffffc0000d0f34>] get_signal+0x444/0x5e0
>>>> [  601.991904] [<ffffffc000089fcc>] do_signal+0x1d8/0x450
>>>> [  601.997065] [<ffffffc00008a35c>] do_notify_resume+0x70/0x78

Please be sure to include the exact kernel version (e.g. "uname -r",
"cat /proc/version") when reporting.

You are reporting a bug in the 4.1-stable kernel, which was prone to
OOM livelock because the OOM reaper was not available.
( http://lkml.kernel.org/r/57D8012F.7080508@huawei.com )

I think we can no longer reproduce this bug using 4.8-rc6 (or linux-next),
but it would be nice to backport the __GFP_NORETRY patch to stable
kernels that do not have the OOM reaper.

> Adding __GFP_NORETRY also fixes the issue.
> Therefore, we can confirm that this LTP case leads to the endless looping.
> 
> index d45a0a1..03fb67b 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -283,7 +283,7 @@ static inline struct rmap_item *alloc_rmap_item(void)
>  {
>         struct rmap_item *rmap_item;
> 
> -       rmap_item = kmem_cache_zalloc(rmap_item_cache, GFP_KERNEL);
> +       rmap_item = kmem_cache_zalloc(rmap_item_cache, GFP_KERNEL | __GFP_NORETRY);
>         if (rmap_item)
>                 ksm_rmap_items++;
>         return rmap_item;


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mm: fix oom work when memory is under pressure
  2016-09-14 11:29                       ` Tetsuo Handa
@ 2016-09-14 13:52                         ` zhong jiang
  2016-09-18  6:00                           ` Tetsuo Handa
  0 siblings, 1 reply; 26+ messages in thread
From: zhong jiang @ 2016-09-14 13:52 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Michal Hocko, akpm, vbabka, rientjes, linux-mm, Xishi Qiu,
	Hanjun Guo, Hugh Dickins

On 2016/9/14 19:29, Tetsuo Handa wrote:
> On 2016/09/14 18:25, zhong jiang wrote:
>> On 2016/9/14 16:52, Michal Hocko wrote:
>>> On Wed 14-09-16 10:42:19, Michal Hocko wrote:
>>>> [Let's CC Hugh]
>>> now for real...
>>>
>>>> On Wed 14-09-16 15:13:50, zhong jiang wrote:
>>>> [...]
>>>>>   hi, Michal
>>>>>
>>>>>   Recently, I hit the same issue when running an OOM case from LTP with KSM enabled.
>>>>>  
>>>>> [  601.937145] Call trace:
>>>>> [  601.939600] [<ffffffc000086a88>] __switch_to+0x74/0x8c
>>>>> [  601.944760] [<ffffffc000a1bae0>] __schedule+0x23c/0x7bc
>>>>> [  601.950007] [<ffffffc000a1c09c>] schedule+0x3c/0x94
>>>>> [  601.954907] [<ffffffc000a1eb84>] rwsem_down_write_failed+0x214/0x350
>>>>> [  601.961289] [<ffffffc000a1e32c>] down_write+0x64/0x80
>>>>> [  601.966363] [<ffffffc00021f794>] __ksm_exit+0x90/0x19c
>>>>> [  601.971523] [<ffffffc0000be650>] mmput+0x118/0x11c
>>>>> [  601.976335] [<ffffffc0000c3ec4>] do_exit+0x2dc/0xa74
>>>>> [  601.981321] [<ffffffc0000c46f8>] do_group_exit+0x4c/0xe4
>>>>> [  601.986656] [<ffffffc0000d0f34>] get_signal+0x444/0x5e0
>>>>> [  601.991904] [<ffffffc000089fcc>] do_signal+0x1d8/0x450
>>>>> [  601.997065] [<ffffffc00008a35c>] do_notify_resume+0x70/0x78
> Please be sure to include exact kernel version (e.g. "uname -r",
> "cat /proc/version") when reporting.
>
> You are reporting a bug in the 4.1-stable kernel, which was prone to
> OOM livelock because the OOM reaper was not available.
> ( http://lkml.kernel.org/r/57D8012F.7080508@huawei.com )
>
> I think we can no longer reproduce this bug using 4.8-rc6 (or linux-next),
> but it would be nice to backport the __GFP_NORETRY patch to stable
> kernels that do not have the OOM reaper.
  No, the OOM reaper cannot solve the issue completely, as discussed with Michal.
  The conclusion is that we need to come up with a better method to fix it.

   Thanks
  zhongjiang
>> Adding __GFP_NORETRY also fixes the issue.
>> Therefore, we can confirm that this LTP case leads to the endless looping.
>>
>> index d45a0a1..03fb67b 100644
>> --- a/mm/ksm.c
>> +++ b/mm/ksm.c
>> @@ -283,7 +283,7 @@ static inline struct rmap_item *alloc_rmap_item(void)
>>  {
>>         struct rmap_item *rmap_item;
>>
>> -       rmap_item = kmem_cache_zalloc(rmap_item_cache, GFP_KERNEL);
>> +       rmap_item = kmem_cache_zalloc(rmap_item_cache, GFP_KERNEL | __GFP_NORETRY);
>>         if (rmap_item)
>>                 ksm_rmap_items++;
>>         return rmap_item;
>
> .
>



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mm: fix oom work when memory is under pressure
  2016-09-14  8:52                   ` Michal Hocko
  2016-09-14  9:25                     ` zhong jiang
@ 2016-09-16 22:13                     ` Hugh Dickins
  2016-09-17 15:56                       ` Michal Hocko
  2016-09-18  4:04                       ` zhong jiang
  1 sibling, 2 replies; 26+ messages in thread
From: Hugh Dickins @ 2016-09-16 22:13 UTC (permalink / raw)
  To: Michal Hocko
  Cc: zhong jiang, akpm, vbabka, rientjes, linux-mm, Xishi Qiu,
	Hanjun Guo, Hugh Dickins

On Wed, 14 Sep 2016, Michal Hocko wrote:
> On Wed 14-09-16 10:42:19, Michal Hocko wrote:
> > [Let's CC Hugh]
> 
> now for real...
> 
> > 
> > On Wed 14-09-16 15:13:50, zhong jiang wrote:
> > [...]
> > >   hi, Michal
> > > 
> > >   Recently, I hit the same issue when running an OOM case from LTP with KSM enabled.
> > >  
> > > [  601.937145] Call trace:
> > > [  601.939600] [<ffffffc000086a88>] __switch_to+0x74/0x8c
> > > [  601.944760] [<ffffffc000a1bae0>] __schedule+0x23c/0x7bc
> > > [  601.950007] [<ffffffc000a1c09c>] schedule+0x3c/0x94
> > > [  601.954907] [<ffffffc000a1eb84>] rwsem_down_write_failed+0x214/0x350
> > > [  601.961289] [<ffffffc000a1e32c>] down_write+0x64/0x80
> > > [  601.966363] [<ffffffc00021f794>] __ksm_exit+0x90/0x19c
> > > [  601.971523] [<ffffffc0000be650>] mmput+0x118/0x11c
> > > [  601.976335] [<ffffffc0000c3ec4>] do_exit+0x2dc/0xa74
> > > [  601.981321] [<ffffffc0000c46f8>] do_group_exit+0x4c/0xe4
> > > [  601.986656] [<ffffffc0000d0f34>] get_signal+0x444/0x5e0
> > > [  601.991904] [<ffffffc000089fcc>] do_signal+0x1d8/0x450
> > > [  601.997065] [<ffffffc00008a35c>] do_notify_resume+0x70/0x78
> > 
> > So this is a hung task triggering because the exiting task cannot get
> > the mmap sem for write because the ksmd holds it for read while
> > allocating memory which just takes ages to complete, right?
> > 
> > > 
> > > The root cause is that ksmd holds the read lock and never releases it:
> > >  scan_get_next_rmap_item
> > >          down_read
> > >                    get_next_rmap_item
> > >                              alloc_rmap_item     #ksmd will loop permanently.
> > > 
> > > How do you view this situation? Or should we leave the issue alone?
> > 
> > I am not familiar with the ksmd code so it is hard for me to judge but
> > one thing to do would be __GFP_NORETRY which would force a bail out from
> > the allocation rather than looping for ever. A quick look tells me that
> > the allocation failure here is quite easy to handle. There might be
> > others...

Yes, very good suggestion in this case: the ksmd code does exactly the
right thing when that allocation fails, but was too stupid to use an
allocation mode which might fail - and it can allocate rather a lot of
slots along that path, so it will be good to let it break out there.

Thank you, Zhongjiang, please send akpm a fully signed-off patch, tagged
for stable, with your explanation above (which was a lot more helpful
to me than what you wrote in your other mail of Sept 13th).  But please
make it GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN (and break that line
before 80 cols): the allocation will sometimes fail, and we're not at
all interested in hearing about that.

Michal, how would you feel about this or a separate patch adding
__GFP_HIGH to the allocation in ksm's alloc_stable_node()?  That
allocation could cause the same problem, but it is much less common
(so less important to do anything about it), and differs from the
rmap_item case in that if it succeeds, it will usually free a page;
whereas if it fails, the fallback (two break_cow()s) may want to
allocate a couple of pages.  So __GFP_HIGH makes more sense for it
than __GFP_NORETRY: but perhaps we prefer not to add __GFP_HIGHs?

Hugh


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mm: fix oom work when memory is under pressure
  2016-09-16 22:13                     ` Hugh Dickins
@ 2016-09-17 15:56                       ` Michal Hocko
  2016-09-18  4:04                       ` zhong jiang
  1 sibling, 0 replies; 26+ messages in thread
From: Michal Hocko @ 2016-09-17 15:56 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: zhong jiang, akpm, vbabka, rientjes, linux-mm, Xishi Qiu, Hanjun Guo

On Fri 16-09-16 15:13:56, Hugh Dickins wrote:
> On Wed, 14 Sep 2016, Michal Hocko wrote:
> > On Wed 14-09-16 10:42:19, Michal Hocko wrote:
> > > [Let's CC Hugh]
> > 
> > now for real...
> > 
> > > 
> > > On Wed 14-09-16 15:13:50, zhong jiang wrote:
> > > [...]
> > > >   hi, Michal
> > > > 
> > > >   Recently, I hit the same issue when running an OOM case from LTP with KSM enabled.
> > > >  
> > > > [  601.937145] Call trace:
> > > > [  601.939600] [<ffffffc000086a88>] __switch_to+0x74/0x8c
> > > > [  601.944760] [<ffffffc000a1bae0>] __schedule+0x23c/0x7bc
> > > > [  601.950007] [<ffffffc000a1c09c>] schedule+0x3c/0x94
> > > > [  601.954907] [<ffffffc000a1eb84>] rwsem_down_write_failed+0x214/0x350
> > > > [  601.961289] [<ffffffc000a1e32c>] down_write+0x64/0x80
> > > > [  601.966363] [<ffffffc00021f794>] __ksm_exit+0x90/0x19c
> > > > [  601.971523] [<ffffffc0000be650>] mmput+0x118/0x11c
> > > > [  601.976335] [<ffffffc0000c3ec4>] do_exit+0x2dc/0xa74
> > > > [  601.981321] [<ffffffc0000c46f8>] do_group_exit+0x4c/0xe4
> > > > [  601.986656] [<ffffffc0000d0f34>] get_signal+0x444/0x5e0
> > > > [  601.991904] [<ffffffc000089fcc>] do_signal+0x1d8/0x450
> > > > [  601.997065] [<ffffffc00008a35c>] do_notify_resume+0x70/0x78
> > > 
> > > So this is a hung task triggering because the exiting task cannot get
> > > the mmap sem for write because the ksmd holds it for read while
> > > allocating memory which just takes ages to complete, right?
> > > 
> > > > 
> > > > The root cause is that ksmd holds the read lock and never releases it:
> > > >  scan_get_next_rmap_item
> > > >          down_read
> > > >                    get_next_rmap_item
> > > >                              alloc_rmap_item     #ksmd will loop permanently.
> > > > 
> > > > How do you view this situation? Or should we leave the issue alone?
> > > 
> > > I am not familiar with the ksmd code so it is hard for me to judge but
> > > one thing to do would be __GFP_NORETRY which would force a bail out from
> > > the allocation rather than looping for ever. A quick look tells me that
> > > the allocation failure here is quite easy to handle. There might be
> > > others...
> 
> Yes, very good suggestion in this case: the ksmd code does exactly the
> right thing when that allocation fails, but was too stupid to use an
> allocation mode which might fail - and it can allocate rather a lot of
> slots along that path, so it will be good to let it break out there.
> 
> Thank you, Zhongjiang, please send akpm a fully signed-off patch, tagged
> for stable, with your explanation above (which was a lot more helpful
> to me than what you wrote in your other mail of Sept 13th).  But please
> make it GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN (and break that line

agreed

> before 80 cols): the allocation will sometimes fail, and we're not at
> all interested in hearing about that.
> 
> Michal, how would you feel about this or a separate patch adding
> __GFP_HIGH to the allocation in ksm's alloc_stable_node()?  That
> allocation could cause the same problem, but it is much less common
> (so less important to do anything about it), and differs from the
> rmap_item case in that if it succeeds, it will usually free a page;
> whereas if it fails, the fallback (two break_cow()s) may want to
> allocate a couple of pages.  So __GFP_HIGH makes more sense for it
> than __GFP_NORETRY: but perhaps we prefer not to add __GFP_HIGHs?

I am not familiar enough with the ksmd code to have a strong opinion
here. IMHO __GFP_HIGH should be used only when really necessary, but as
you point out, and as the comment in cmp_and_merge_page explains,
			/*
			 * If we fail to insert the page into the stable tree,
			 * we will have 2 virtual addresses that are pointing
			 * to a ksm page left outside the stable tree,
			 * in which case we need to break_cow on both.
			 */
this can actually save some memory if it succeeds. So I will leave the
decision to you. I have no sense of how much this path can actually
consume or whether the flag actually makes much difference.

Thanks!
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mm: fix oom work when memory is under pressure
  2016-09-16 22:13                     ` Hugh Dickins
  2016-09-17 15:56                       ` Michal Hocko
@ 2016-09-18  4:04                       ` zhong jiang
  2016-09-18 14:42                         ` Michal Hocko
  1 sibling, 1 reply; 26+ messages in thread
From: zhong jiang @ 2016-09-18  4:04 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Michal Hocko, akpm, vbabka, rientjes, linux-mm, Xishi Qiu, Hanjun Guo

On 2016/9/17 6:13, Hugh Dickins wrote:
> On Wed, 14 Sep 2016, Michal Hocko wrote:
>> On Wed 14-09-16 10:42:19, Michal Hocko wrote:
>>> [Let's CC Hugh]
>> now for real...
>>
>>> On Wed 14-09-16 15:13:50, zhong jiang wrote:
>>> [...]
>>>>   hi, Michal
>>>>
>>>>   Recently, I hit the same issue when running an OOM case from LTP with KSM enabled.
>>>>  
>>>> [  601.937145] Call trace:
>>>> [  601.939600] [<ffffffc000086a88>] __switch_to+0x74/0x8c
>>>> [  601.944760] [<ffffffc000a1bae0>] __schedule+0x23c/0x7bc
>>>> [  601.950007] [<ffffffc000a1c09c>] schedule+0x3c/0x94
>>>> [  601.954907] [<ffffffc000a1eb84>] rwsem_down_write_failed+0x214/0x350
>>>> [  601.961289] [<ffffffc000a1e32c>] down_write+0x64/0x80
>>>> [  601.966363] [<ffffffc00021f794>] __ksm_exit+0x90/0x19c
>>>> [  601.971523] [<ffffffc0000be650>] mmput+0x118/0x11c
>>>> [  601.976335] [<ffffffc0000c3ec4>] do_exit+0x2dc/0xa74
>>>> [  601.981321] [<ffffffc0000c46f8>] do_group_exit+0x4c/0xe4
>>>> [  601.986656] [<ffffffc0000d0f34>] get_signal+0x444/0x5e0
>>>> [  601.991904] [<ffffffc000089fcc>] do_signal+0x1d8/0x450
>>>> [  601.997065] [<ffffffc00008a35c>] do_notify_resume+0x70/0x78
>>> So this is a hung task triggering because the exiting task cannot get
>>> the mmap sem for write because the ksmd holds it for read while
>>> allocating memory which just takes ages to complete, right?
>>>
>>>> The root cause is that ksmd holds the read lock and never releases it:
>>>>  scan_get_next_rmap_item
>>>>          down_read
>>>>                    get_next_rmap_item
>>>>                              alloc_rmap_item     #ksmd will loop permanently.
>>>>
>>>> How do you view this situation? Or should we leave the issue alone?
>>> I am not familiar with the ksmd code so it is hard for me to judge but
>>> one thing to do would be __GFP_NORETRY which would force a bail out from
>>> the allocation rather than looping for ever. A quick look tells me that
>>> the allocation failure here is quite easy to handle. There might be
>>> others...
> Yes, very good suggestion in this case: the ksmd code does exactly the
> right thing when that allocation fails, but was too stupid to use an
> allocation mode which might fail - and it can allocate rather a lot of
> slots along that path, so it will be good to let it break out there.
>
> Thank you, Zhongjiang, please send akpm a fully signed-off patch, tagged
> for stable, with your explanation above (which was a lot more helpful
> to me than what you wrote in your other mail of Sept 13th).  But please
> make it GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN (and break that line
> before 80 cols): the allocation will sometimes fail, and we're not at
> all interested in hearing about that.
>
> Michal, how would you feel about this or a separate patch adding
> __GFP_HIGH to the allocation in ksm's alloc_stable_node()?  That
> allocation could cause the same problem, but it is much less common
> (so less important to do anything about it), and differs from the
> rmap_item case in that if it succeeds, it will usually free a page;
> whereas if it fails, the fallback (two break_cow()s) may want to
> allocate a couple of pages.  So __GFP_HIGH makes more sense for it
> than __GFP_NORETRY: but perhaps we prefer not to add __GFP_HIGHs?
>
> Hugh
>
> .
>
  I agree, it would indeed be progress. If alloc_stable_node fails to allocate
  memory, more memory then needs to be obtained from the kernel at the same
  time, so the pressure will suddenly increase.

 index 5048083..72dc475 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -299,7 +299,7 @@ static inline void free_rmap_item(struct rmap_item *rmap_item)

 static inline struct stable_node *alloc_stable_node(void)
 {
-       return kmem_cache_alloc(stable_node_cache, GFP_KERNEL);
+       return kmem_cache_alloc(stable_node_cache, __GFP_HIGH);
 }


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mm: fix oom work when memory is under pressure
  2016-09-14 13:52                         ` zhong jiang
@ 2016-09-18  6:00                           ` Tetsuo Handa
  2016-09-18  6:13                             ` Tetsuo Handa
  2016-09-19  7:15                             ` zhong jiang
  0 siblings, 2 replies; 26+ messages in thread
From: Tetsuo Handa @ 2016-09-18  6:00 UTC (permalink / raw)
  To: zhongjiang
  Cc: mhocko, akpm, vbabka, rientjes, linux-mm, qiuxishi, guohanjun, hughd

zhong jiang wrote:
> On 2016/9/14 19:29, Tetsuo Handa wrote:
> > On 2016/09/14 18:25, zhong jiang wrote:
> >> On 2016/9/14 16:52, Michal Hocko wrote:
> >>> On Wed 14-09-16 10:42:19, Michal Hocko wrote:
> >>>> [Let's CC Hugh]
> >>> now for real...
> >>>
> >>>> On Wed 14-09-16 15:13:50, zhong jiang wrote:
> >>>> [...]
> >>>>>   hi, Michal
> >>>>>
> >>>>>   Recently, I hit the same issue when running an OOM case from LTP with KSM enabled.
> >>>>>  
> >>>>> [  601.937145] Call trace:
> >>>>> [  601.939600] [<ffffffc000086a88>] __switch_to+0x74/0x8c
> >>>>> [  601.944760] [<ffffffc000a1bae0>] __schedule+0x23c/0x7bc
> >>>>> [  601.950007] [<ffffffc000a1c09c>] schedule+0x3c/0x94
> >>>>> [  601.954907] [<ffffffc000a1eb84>] rwsem_down_write_failed+0x214/0x350
> >>>>> [  601.961289] [<ffffffc000a1e32c>] down_write+0x64/0x80
> >>>>> [  601.966363] [<ffffffc00021f794>] __ksm_exit+0x90/0x19c
> >>>>> [  601.971523] [<ffffffc0000be650>] mmput+0x118/0x11c
> >>>>> [  601.976335] [<ffffffc0000c3ec4>] do_exit+0x2dc/0xa74
> >>>>> [  601.981321] [<ffffffc0000c46f8>] do_group_exit+0x4c/0xe4
> >>>>> [  601.986656] [<ffffffc0000d0f34>] get_signal+0x444/0x5e0
> >>>>> [  601.991904] [<ffffffc000089fcc>] do_signal+0x1d8/0x450
> >>>>> [  601.997065] [<ffffffc00008a35c>] do_notify_resume+0x70/0x78
> > Please be sure to include exact kernel version (e.g. "uname -r",
> > "cat /proc/version") when reporting.
> >
> > You are reporting a bug in the 4.1-stable kernel, which was prone to
> > OOM livelock because the OOM reaper was not available.
> > ( http://lkml.kernel.org/r/57D8012F.7080508@huawei.com )
> >
> > I think we can no longer reproduce this bug using 4.8-rc6 (or linux-next),
> > but it would be nice to backport the __GFP_NORETRY patch to stable
> > kernels that do not have the OOM reaper.
>   No, the OOM reaper cannot solve the issue completely, as discussed with Michal.
>   The conclusion is that we need to come up with a better method to fix it.
> 
>    Thanks
>   zhongjiang

I still think we can no longer reproduce this bug using 4.8-rc6 (or linux-next).

As of 4.1-stable, this bug caused an OOM livelock because TIF_MEMDIE was
cleared only after returning from mmput() from exit_mm() from do_exit(). Since
a TIF_MEMDIE thread waiting at mmput() exists, the OOM killer does not
select the next OOM victim: oom_scan_process_thread() returns OOM_SCAN_ABORT,
although ksmd is waking up the OOM killer via a __GFP_FS allocation request.

As of 4.8-rc6, the OOM reaper cannot take mmap_sem for read at __oom_reap_task()
because of the TIF_MEMDIE thread waiting at ksm_exit() from __mmput() from
mmput() from exit_mm() from do_exit(). Thus, __oom_reap_task() returns false and
oom_reap_task() will emit the "oom_reaper: unable to reap pid:%d (%s)\n" message.
Then, oom_reap_task() clears TIF_MEMDIE from that thread, which in turn
makes oom_scan_process_thread() no longer return OOM_SCAN_ABORT, because
atomic_read(&task->signal->oom_victims) becomes 0 due to exit_oom_victim()
called by the OOM reaper. Then, the OOM killer selects the next OOM victim
because ksmd is waking up the OOM killer via a __GFP_FS allocation request.

Thus, this bug will be completely solved (at the cost of selecting the
next OOM victim) as of 4.8-rc6.

> >> Adding __GFP_NORETRY also fixes the issue.
> >> Therefore, we can confirm that this LTP case leads to the endless looping.
> >>
> >> index d45a0a1..03fb67b 100644
> >> --- a/mm/ksm.c
> >> +++ b/mm/ksm.c
> >> @@ -283,7 +283,7 @@ static inline struct rmap_item *alloc_rmap_item(void)
> >>  {
> >>         struct rmap_item *rmap_item;
> >>
> >> -       rmap_item = kmem_cache_zalloc(rmap_item_cache, GFP_KERNEL);
> >> +       rmap_item = kmem_cache_zalloc(rmap_item_cache, GFP_KERNEL | __GFP_NORETRY);
> >>         if (rmap_item)
> >>                 ksm_rmap_items++;
> >>         return rmap_item;
> >

Your patch to add __GFP_NORETRY and __GFP_NOWARN is OK. But please explicitly
state that you hit this bug in 4.1-stable. Also, your trace shows not only
a hung task but also an OOM livelock; the 4.1-stable kernel is silent when
an OOM livelock occurs.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mm: fix oom work when memory is under pressure
  2016-09-18  6:00                           ` Tetsuo Handa
@ 2016-09-18  6:13                             ` Tetsuo Handa
  2016-09-19  4:44                               ` zhong jiang
  2016-09-19  7:15                             ` zhong jiang
  1 sibling, 1 reply; 26+ messages in thread
From: Tetsuo Handa @ 2016-09-18  6:13 UTC (permalink / raw)
  To: zhongjiang
  Cc: mhocko, akpm, vbabka, rientjes, linux-mm, qiuxishi, guohanjun, hughd

Tetsuo Handa wrote:
> As of 4.8-rc6, the OOM reaper cannot take mmap_sem for read at __oom_reap_task()
> because of the TIF_MEMDIE thread waiting at ksm_exit() from __mmput() from
> mmput() from exit_mm() from do_exit(). Thus, __oom_reap_task() returns false and
> oom_reap_task() will emit the "oom_reaper: unable to reap pid:%d (%s)\n" message.
> Then, oom_reap_task() clears TIF_MEMDIE from that thread, which in turn
> makes oom_scan_process_thread() no longer return OOM_SCAN_ABORT, because
> atomic_read(&task->signal->oom_victims) becomes 0 due to exit_oom_victim()
> called by the OOM reaper. Then, the OOM killer selects the next OOM victim
> because ksmd is waking up the OOM killer via a __GFP_FS allocation request.

Oops. As of 4.8-rc6, __oom_reap_task() returns true because of
find_lock_task_mm() returning NULL. Thus, oom_reap_task() clears TIF_MEMDIE
without emitting the "oom_reaper: unable to reap pid:%d (%s)\n" message.

> 
> Thus, this bug will be completely solved (at the cost of selecting next
> OOM victim) as of 4.8-rc6.

The conclusion is the same.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mm: fix oom work when memory is under pressure
  2016-09-18  4:04                       ` zhong jiang
@ 2016-09-18 14:42                         ` Michal Hocko
  2016-09-19 17:27                           ` Hugh Dickins
  0 siblings, 1 reply; 26+ messages in thread
From: Michal Hocko @ 2016-09-18 14:42 UTC (permalink / raw)
  To: zhong jiang
  Cc: Hugh Dickins, akpm, vbabka, rientjes, linux-mm, Xishi Qiu, Hanjun Guo

On Sun 18-09-16 12:04:47, zhong jiang wrote:
[...]
>  index 5048083..72dc475 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -299,7 +299,7 @@ static inline void free_rmap_item(struct rmap_item *rmap_item)
> 
>  static inline struct stable_node *alloc_stable_node(void)
>  {
> -       return kmem_cache_alloc(stable_node_cache, GFP_KERNEL);
> +       return kmem_cache_alloc(stable_node_cache, __GFP_HIGH);
>  }

I do not want to speak for Hugh but I believe he meant something
different. The above will grant access to memory reserves but it doesn't
wake kswapd nor the direct reclaim. I guess he meant GFP_KERNEL | __GFP_HIGH

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mm: fix oom work when memory is under pressure
  2016-09-18  6:13                             ` Tetsuo Handa
@ 2016-09-19  4:44                               ` zhong jiang
  0 siblings, 0 replies; 26+ messages in thread
From: zhong jiang @ 2016-09-19  4:44 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: mhocko, akpm, vbabka, rientjes, linux-mm, qiuxishi, guohanjun, hughd

On 2016/9/18 14:13, Tetsuo Handa wrote:
> Tetsuo Handa wrote:
>> As of 4.8-rc6, the OOM reaper cannot take mmap_sem for read at __oom_reap_task()
>> because of TIF_MEMDIE thread waiting at ksm_exit() from __mmput() from mmput()
>>  from exit_mm() from do_exit(). Thus, __oom_reap_task() returns false and
>> oom_reap_task() will emit "oom_reaper: unable to reap pid:%d (%s)\n" message.
>> Then, oom_reap_task() clears TIF_MEMDIE from that thread, which in turn
>> makes oom_scan_process_thread() not to return OOM_SCAN_ABORT because
>> atomic_read(&task->signal->oom_victims) becomes 0 due to exit_oom_victim()
>> by the OOM reaper. Then, the OOM killer selects next OOM victim because
>> ksmd is waking up the OOM killer via a __GFP_FS allocation request.
  hi, Tetsuo

  The OOM reaper does relieve the issue, as discussed with Michal, but it does not
  solve it completely. The OOM livelock itself was fixed by backporting Michal's patch.
 
  The key point is that ksmd enters the OOM path and bails out quickly, because other
  processes trigger OOM in the same zone; therefore ksmd cannot obtain the memory.

  Thanks
  zhongjiang
> Oops. As of 4.8-rc6, __oom_reap_task() returns true because of
> find_lock_task_mm() returning NULL. Thus, oom_reap_task() clears TIF_MEMDIE
> without emitting "oom_reaper: unable to reap pid:%d (%s)\n" message.
>
>> Thus, this bug will be completely solved (at the cost of selecting next
>> OOM victim) as of 4.8-rc6.
> The conclusion is same.
>
> .
>



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mm: fix oom work when memory is under pressure
  2016-09-18  6:00                           ` Tetsuo Handa
  2016-09-18  6:13                             ` Tetsuo Handa
@ 2016-09-19  7:15                             ` zhong jiang
  1 sibling, 0 replies; 26+ messages in thread
From: zhong jiang @ 2016-09-19  7:15 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: mhocko, akpm, vbabka, rientjes, linux-mm, qiuxishi, guohanjun, hughd

On 2016/9/18 14:00, Tetsuo Handa wrote:
> zhong jiang wrote:
>> On 2016/9/14 19:29, Tetsuo Handa wrote:
>>> On 2016/09/14 18:25, zhong jiang wrote:
>>>> On 2016/9/14 16:52, Michal Hocko wrote:
>>>>> On Wed 14-09-16 10:42:19, Michal Hocko wrote:
>>>>>> [Let's CC Hugh]
>>>>> now for real...
>>>>>
>>>>>> On Wed 14-09-16 15:13:50, zhong jiang wrote:
>>>>>> [...]
>>>>>>>   hi, Michal
>>>>>>>
>>>>>>>   Recently, I hit the same issue when run a OOM case of the LTP and ksm enable.
>>>>>>>  
>>>>>>> [  601.937145] Call trace:
>>>>>>> [  601.939600] [<ffffffc000086a88>] __switch_to+0x74/0x8c
>>>>>>> [  601.944760] [<ffffffc000a1bae0>] __schedule+0x23c/0x7bc
>>>>>>> [  601.950007] [<ffffffc000a1c09c>] schedule+0x3c/0x94
>>>>>>> [  601.954907] [<ffffffc000a1eb84>] rwsem_down_write_failed+0x214/0x350
>>>>>>> [  601.961289] [<ffffffc000a1e32c>] down_write+0x64/0x80
>>>>>>> [  601.966363] [<ffffffc00021f794>] __ksm_exit+0x90/0x19c
>>>>>>> [  601.971523] [<ffffffc0000be650>] mmput+0x118/0x11c
>>>>>>> [  601.976335] [<ffffffc0000c3ec4>] do_exit+0x2dc/0xa74
>>>>>>> [  601.981321] [<ffffffc0000c46f8>] do_group_exit+0x4c/0xe4
>>>>>>> [  601.986656] [<ffffffc0000d0f34>] get_signal+0x444/0x5e0
>>>>>>> [  601.991904] [<ffffffc000089fcc>] do_signal+0x1d8/0x450
>>>>>>> [  601.997065] [<ffffffc00008a35c>] do_notify_resume+0x70/0x78
>>> Please be sure to include exact kernel version (e.g. "uname -r",
>>> "cat /proc/version") when reporting.
>>>
>>> You are reporting a bug in 4.1-stable kernel, which was prone to
>>> OOM livelock because the OOM reaper is not available.
>>> ( http://lkml.kernel.org/r/57D8012F.7080508@huawei.com )
>>>
>>> I think we no longer can reproduce this bug using 4.8-rc6 (or linux-next),
>>> but it will be a nice thing to backport __GFP_NORETRY patch to stable
>>> kernels which do not have the OOM reaper.
>>   No, the OOM reaper cannot solve the issue completely, as discussed with Michal.
>>   The conclusion is that we need to come up with a better method to fix it.
>>
>>    Thanks
>>   zhongjiang
> I still think we no longer can reproduce this bug using 4.8-rc6 (or linux-next).
>
> As of 4.1-stable, this bug caused OOM livelock situation because TIF_MEMDIE was
> cleared only after returning from mmput() from exit_mm() from do_exit(). Since
> there is a TIF_MEMDIE thread waiting at mmput() exists, the OOM killer does not
> select next OOM victim because oom_scan_process_thread() returns OOM_SCAN_ABORT,
> although ksmd is waking up the OOM killer via a __GFP_FS allocation request.
>
> As of 4.8-rc6, the OOM reaper cannot take mmap_sem for read at __oom_reap_task()
> because of TIF_MEMDIE thread waiting at ksm_exit() from __mmput() from mmput()
>  from exit_mm() from do_exit(). Thus, __oom_reap_task() returns false and
> oom_reap_task() will emit "oom_reaper: unable to reap pid:%d (%s)\n" message.
> Then, oom_reap_task() clears TIF_MEMDIE from that thread, which in turn
> makes oom_scan_process_thread() not to return OOM_SCAN_ABORT because
> atomic_read(&task->signal->oom_victims) becomes 0 due to exit_oom_victim()
> by the OOM reaper. Then, the OOM killer selects next OOM victim because
> ksmd is waking up the OOM killer via a __GFP_FS allocation request.
  The issue comes up only on arm64, not on x86.
> Thus, this bug will be completely solved (at the cost of selecting next
> OOM victim) as of 4.8-rc6.
>
>>>> Adding __GFP_NORETRY also fixes the issue.
>>>> Therefore, we can confirm that the LTP case leads to the endless looping.
>>>>
>>>> index d45a0a1..03fb67b 100644
>>>> --- a/mm/ksm.c
>>>> +++ b/mm/ksm.c
>>>> @@ -283,7 +283,7 @@ static inline struct rmap_item *alloc_rmap_item(void)
>>>>  {
>>>>         struct rmap_item *rmap_item;
>>>>
>>>> -       rmap_item = kmem_cache_zalloc(rmap_item_cache, GFP_KERNEL);
>>>> +       rmap_item = kmem_cache_zalloc(rmap_item_cache, GFP_KERNEL | __GFP_NORETRY);
>>>>         if (rmap_item)
>>>>                 ksm_rmap_items++;
>>>>         return rmap_item;
> Your patch to add __GFP_NORETRY and __GFP_NOWARN is OK. But please explicitly state
> that you hit this bug in 4.1-stable. Also, your trace is not only a hung task but also
> an OOM livelock; the kernel as of 4.1-stable is silent when OOM livelock situation
> occurred.
>
> .
>
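
Concretely, the alloc_rmap_item() change Tetsuo endorses above would look roughly like this (a sketch against the 4.1-era mm/ksm.c quoted earlier, with __GFP_NOWARN added as he suggests; not the patch as finally submitted):

```c
static inline struct rmap_item *alloc_rmap_item(void)
{
	struct rmap_item *rmap_item;

	/* __GFP_NORETRY: fail the allocation instead of looping into the
	 * OOM killer; __GFP_NOWARN: stay silent when that bail-out happens. */
	rmap_item = kmem_cache_zalloc(rmap_item_cache,
				      GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
	if (rmap_item)
		ksm_rmap_items++;
	return rmap_item;
}
```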



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mm: fix oom work when memory is under pressure
  2016-09-18 14:42                         ` Michal Hocko
@ 2016-09-19 17:27                           ` Hugh Dickins
  0 siblings, 0 replies; 26+ messages in thread
From: Hugh Dickins @ 2016-09-19 17:27 UTC (permalink / raw)
  To: zhong jiang
  Cc: Michal Hocko, Hugh Dickins, akpm, vbabka, rientjes, linux-mm,
	Xishi Qiu, Hanjun Guo

On Sun, 18 Sep 2016, Michal Hocko wrote:
> On Sun 18-09-16 12:04:47, zhong jiang wrote:
> [...]
> >  index 5048083..72dc475 100644
> > --- a/mm/ksm.c
> > +++ b/mm/ksm.c
> > @@ -299,7 +299,7 @@ static inline void free_rmap_item(struct rmap_item *rmap_item)
> > 
> >  static inline struct stable_node *alloc_stable_node(void)
> >  {
> > -       return kmem_cache_alloc(stable_node_cache, GFP_KERNEL);
> > +       return kmem_cache_alloc(stable_node_cache, __GFP_HIGH);
> >  }
> 
> I do not want to speak for Hugh but I believe he meant something
> different. The above will grant access to memory reserves but it doesn't
> wake kswapd nor the direct reclaim. I guess he meant GFP_KERNEL | __GFP_HIGH

You speak for me correctly, Michal: sorry I wasn't clear, Zhongjiang, yes,
I meant __GFP_HIGH as a modifier for GFP_KERNEL: GFP_KERNEL | __GFP_HIGH

And after running it past Michal and thinking on it some more, I do
still think that it's the right thing to do for alloc_stable_node().
But please only include that change in your patch if you yourself are
comfortable with it: it is very definitely a much lower order issue
than the alloc_rmap_item() issue, and can always be added later.
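
A minimal sketch of the alloc_stable_node() change Hugh describes, with GFP_KERNEL as the base and __GFP_HIGH as a modifier (illustrative only, not a submitted patch):

```c
static inline struct stable_node *alloc_stable_node(void)
{
	/* GFP_KERNEL keeps kswapd wakeup and direct reclaim;
	 * __GFP_HIGH additionally grants access to memory reserves. */
	return kmem_cache_alloc(stable_node_cache, GFP_KERNEL | __GFP_HIGH);
}
```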

(I have no view on whether and how this problem still occurs with
OOM reaper: I leave the thinking on that to you and the experts.)

Hugh


^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2016-09-19 17:27 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-06 14:47 [PATCH] mm: fix oom work when memory is under pressure zhongjiang
2016-09-09 11:44 ` Michal Hocko
2016-09-12  9:51   ` zhong jiang
2016-09-12 11:13     ` Michal Hocko
2016-09-12 13:42       ` zhong jiang
2016-09-12 17:44         ` Michal Hocko
2016-09-13 13:13           ` zhong jiang
2016-09-13 13:28             ` Michal Hocko
2016-09-13 14:01               ` zhong jiang
2016-09-14  7:13               ` zhong jiang
2016-09-14  8:42                 ` Michal Hocko
2016-09-14  8:50                   ` zhong jiang
2016-09-14  9:05                     ` Michal Hocko
2016-09-14  8:52                   ` Michal Hocko
2016-09-14  9:25                     ` zhong jiang
2016-09-14 11:29                       ` Tetsuo Handa
2016-09-14 13:52                         ` zhong jiang
2016-09-18  6:00                           ` Tetsuo Handa
2016-09-18  6:13                             ` Tetsuo Handa
2016-09-19  4:44                               ` zhong jiang
2016-09-19  7:15                             ` zhong jiang
2016-09-16 22:13                     ` Hugh Dickins
2016-09-17 15:56                       ` Michal Hocko
2016-09-18  4:04                       ` zhong jiang
2016-09-18 14:42                         ` Michal Hocko
2016-09-19 17:27                           ` Hugh Dickins

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.