linux-mm.kvack.org archive mirror
* [PATCH] mm,page_alloc: softlockup on warn_alloc on
@ 2017-09-15  9:58 wang Yu
  2017-09-15 10:39 ` Michal Hocko
  2017-09-15 14:37 ` Johannes Weiner
  0 siblings, 2 replies; 20+ messages in thread
From: wang Yu @ 2017-09-15  9:58 UTC (permalink / raw)
  To: mhocko, penguin-kernel, linux-mm; +Cc: chenggang.qcg, yuwang.yuwang

From: "yuwang.yuwang" <yuwang.yuwang@alibaba-inc.com>

I found a softlockup when running some stress testcases on 4.9.x,
but I think mainline has the same problem.

call trace:
[365724.502896] NMI watchdog: BUG: soft lockup - CPU#31 stuck for 22s!
[jbd2/sda3-8:1164]
...
...
[365724.503258] Call Trace:
[365724.503260]  [<ffffffff811ace5f>] warn_alloc+0x13f/0x170
[365724.503264]  [<ffffffff811ad8c2>] __alloc_pages_slowpath+0x9b2/0xc10
[365724.503265]  [<ffffffff811add43>] __alloc_pages_nodemask+0x223/0x2a0
[365724.503268]  [<ffffffff811fe838>] alloc_pages_current+0x88/0x120
[365724.503270]  [<ffffffff811a3644>] __page_cache_alloc+0xb4/0xc0
[365724.503272]  [<ffffffff811a49e9>] pagecache_get_page+0x59/0x230
[365724.503275]  [<ffffffff8126b2db>] __getblk_gfp+0xfb/0x2f0
[365724.503281]  [<ffffffffa00f9cee>]
jbd2_journal_get_descriptor_buffer+0x5e/0xe0 [jbd2]
[365724.503286]  [<ffffffffa00f2a01>]
jbd2_journal_commit_transaction+0x901/0x1880 [jbd2]
[365724.503291]  [<ffffffff8102d6a5>] ? __switch_to+0x215/0x730
[365724.503294]  [<ffffffff810f962d>] ? lock_timer_base+0x7d/0xa0
[365724.503298]  [<ffffffffa00f7cda>] kjournald2+0xca/0x260 [jbd2]
[365724.503300]  [<ffffffff810cfb00>] ? prepare_to_wait_event+0xf0/0xf0
[365724.503304]  [<ffffffffa00f7c10>] ? commit_timeout+0x10/0x10 [jbd2]
[365724.503307]  [<ffffffff810a8d66>] kthread+0xe6/0x100
[365724.503309]  [<ffffffff810a8c80>] ? kthread_park+0x60/0x60
[365724.503313]  [<ffffffff816f3795>] ret_from_fork+0x25/0x30

We can limit concurrent warn_alloc() callers to work around it, so that
only one __alloc_pages_slowpath() caller invokes warn_alloc() at a time.

Signed-off-by: yuwang.yuwang <yuwang.yuwang@alibaba-inc.com>
Suggested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
---
 mm/page_alloc.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2abf8d5..8b86686 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3525,6 +3525,7 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
 	unsigned long alloc_start = jiffies;
 	unsigned int stall_timeout = 10 * HZ;
 	unsigned int cpuset_mems_cookie;
+	static unsigned long stall_warn_lock;
 
 	/*
 	 * In the slowpath, we sanity check order to avoid ever trying to
@@ -3698,11 +3699,13 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
 		goto nopage;
 
 	/* Make sure we know about allocations which stall for too long */
-	if (time_after(jiffies, alloc_start + stall_timeout)) {
+	if (time_after(jiffies, alloc_start + stall_timeout) &&
+		!test_and_set_bit_lock(0, &stall_warn_lock)) {
 		warn_alloc(gfp_mask,
 			"page allocation stalls for %ums, order:%u",
 			jiffies_to_msecs(jiffies-alloc_start), order);
-		stall_timeout += 10 * HZ;
+		stall_timeout = jiffies - alloc_start + 10 * HZ;
+		clear_bit_unlock(0, &stall_warn_lock);
 	}
 
 	if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
-- 
1.8.3.1


* Re: [PATCH] mm,page_alloc: softlockup on warn_alloc on
  2017-09-15  9:58 [PATCH] mm,page_alloc: softlockup on warn_alloc on wang Yu
@ 2017-09-15 10:39 ` Michal Hocko
  2017-09-15 11:38   ` Tetsuo Handa
  2017-09-15 14:37 ` Johannes Weiner
  1 sibling, 1 reply; 20+ messages in thread
From: Michal Hocko @ 2017-09-15 10:39 UTC (permalink / raw)
  To: wang Yu; +Cc: penguin-kernel, linux-mm, chenggang.qcg, yuwang.yuwang

On Fri 15-09-17 17:58:49, wang Yu wrote:
> From: "yuwang.yuwang" <yuwang.yuwang@alibaba-inc.com>
> 
> I found a softlockup when running some stress testcase in 4.9.x,
> but i think the mainline have the same problem.
> 
> call trace:
> [365724.502896] NMI watchdog: BUG: soft lockup - CPU#31 stuck for 22s!
> [jbd2/sda3-8:1164]
> ...
> ...
> [365724.503258] Call Trace:
> [365724.503260]  [<ffffffff811ace5f>] warn_alloc+0x13f/0x170
> [365724.503264]  [<ffffffff811ad8c2>] __alloc_pages_slowpath+0x9b2/0xc10
> [365724.503265]  [<ffffffff811add43>] __alloc_pages_nodemask+0x223/0x2a0
> [365724.503268]  [<ffffffff811fe838>] alloc_pages_current+0x88/0x120
> [365724.503270]  [<ffffffff811a3644>] __page_cache_alloc+0xb4/0xc0
> [365724.503272]  [<ffffffff811a49e9>] pagecache_get_page+0x59/0x230
> [365724.503275]  [<ffffffff8126b2db>] __getblk_gfp+0xfb/0x2f0
> [365724.503281]  [<ffffffffa00f9cee>]
> jbd2_journal_get_descriptor_buffer+0x5e/0xe0 [jbd2]
> [365724.503286]  [<ffffffffa00f2a01>]
> jbd2_journal_commit_transaction+0x901/0x1880 [jbd2]
> [365724.503291]  [<ffffffff8102d6a5>] ? __switch_to+0x215/0x730
> [365724.503294]  [<ffffffff810f962d>] ? lock_timer_base+0x7d/0xa0
> [365724.503298]  [<ffffffffa00f7cda>] kjournald2+0xca/0x260 [jbd2]
> [365724.503300]  [<ffffffff810cfb00>] ? prepare_to_wait_event+0xf0/0xf0
> [365724.503304]  [<ffffffffa00f7c10>] ? commit_timeout+0x10/0x10 [jbd2]
> [365724.503307]  [<ffffffff810a8d66>] kthread+0xe6/0x100
> [365724.503309]  [<ffffffff810a8c80>] ? kthread_park+0x60/0x60
> [365724.503313]  [<ffffffff816f3795>] ret_from_fork+0x25/0x30
> 
> we can limit the warn_alloc caller to workaround it.
> __alloc_pages_slowpath only call once warn_alloc each time.

Similar attempts to add a lock there have been tried in the past and refused.
Anyway, using a normal lock would be preferred over a bit lock. But the
most important part is to identify _why_ we see the lockup triggering in
the first place, and to fix that rather than work around it here.

> Signed-off-by: yuwang.yuwang <yuwang.yuwang@alibaba-inc.com>
> Suggested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
> ---
>  mm/page_alloc.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 2abf8d5..8b86686 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3525,6 +3525,7 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
>  	unsigned long alloc_start = jiffies;
>  	unsigned int stall_timeout = 10 * HZ;
>  	unsigned int cpuset_mems_cookie;
> +	static unsigned long stall_warn_lock;
>  
>  	/*
>  	 * In the slowpath, we sanity check order to avoid ever trying to
> @@ -3698,11 +3699,13 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
>  		goto nopage;
>  
>  	/* Make sure we know about allocations which stall for too long */
> -	if (time_after(jiffies, alloc_start + stall_timeout)) {
> +	if (time_after(jiffies, alloc_start + stall_timeout) &&
> +		!test_and_set_bit_lock(0, &stall_warn_lock)) {
>  		warn_alloc(gfp_mask,
>  			"page allocation stalls for %ums, order:%u",
>  			jiffies_to_msecs(jiffies-alloc_start), order);
> -		stall_timeout += 10 * HZ;
> +		stall_timeout = jiffies - alloc_start + 10 * HZ;
> +		clear_bit_unlock(0, &stall_warn_lock);
>  	}
>  
>  	if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
> -- 
> 1.8.3.1

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] mm,page_alloc: softlockup on warn_alloc on
  2017-09-15 10:39 ` Michal Hocko
@ 2017-09-15 11:38   ` Tetsuo Handa
  2017-09-15 12:00     ` Michal Hocko
  0 siblings, 1 reply; 20+ messages in thread
From: Tetsuo Handa @ 2017-09-15 11:38 UTC (permalink / raw)
  To: mhocko, yuwang668899
  Cc: vbabka, mpatocka, hannes, mgorman, dave.hansen, akpm, linux-mm,
	chenggang.qcg, yuwang.yuwang

Michal Hocko wrote:
> On Fri 15-09-17 17:58:49, wang Yu wrote:
> > From: "yuwang.yuwang" <yuwang.yuwang@alibaba-inc.com>
> > 
> > I found a softlockup when running some stress testcase in 4.9.x,
> > but i think the mainline have the same problem.
> > 
> > call trace:
> > [365724.502896] NMI watchdog: BUG: soft lockup - CPU#31 stuck for 22s!
> > [jbd2/sda3-8:1164]
> > ...
> > ...
> > [365724.503258] Call Trace:
> > [365724.503260]  [<ffffffff811ace5f>] warn_alloc+0x13f/0x170
> > [365724.503264]  [<ffffffff811ad8c2>] __alloc_pages_slowpath+0x9b2/0xc10
> > [365724.503265]  [<ffffffff811add43>] __alloc_pages_nodemask+0x223/0x2a0
> > [365724.503268]  [<ffffffff811fe838>] alloc_pages_current+0x88/0x120
> > [365724.503270]  [<ffffffff811a3644>] __page_cache_alloc+0xb4/0xc0
> > [365724.503272]  [<ffffffff811a49e9>] pagecache_get_page+0x59/0x230
> > [365724.503275]  [<ffffffff8126b2db>] __getblk_gfp+0xfb/0x2f0
> > [365724.503281]  [<ffffffffa00f9cee>]
> > jbd2_journal_get_descriptor_buffer+0x5e/0xe0 [jbd2]
> > [365724.503286]  [<ffffffffa00f2a01>]
> > jbd2_journal_commit_transaction+0x901/0x1880 [jbd2]
> > [365724.503291]  [<ffffffff8102d6a5>] ? __switch_to+0x215/0x730
> > [365724.503294]  [<ffffffff810f962d>] ? lock_timer_base+0x7d/0xa0
> > [365724.503298]  [<ffffffffa00f7cda>] kjournald2+0xca/0x260 [jbd2]
> > [365724.503300]  [<ffffffff810cfb00>] ? prepare_to_wait_event+0xf0/0xf0
> > [365724.503304]  [<ffffffffa00f7c10>] ? commit_timeout+0x10/0x10 [jbd2]
> > [365724.503307]  [<ffffffff810a8d66>] kthread+0xe6/0x100
> > [365724.503309]  [<ffffffff810a8c80>] ? kthread_park+0x60/0x60
> > [365724.503313]  [<ffffffff816f3795>] ret_from_fork+0x25/0x30
> > 
> > we can limit the warn_alloc caller to workaround it.
> > __alloc_pages_slowpath only call once warn_alloc each time.
> 
> similar attempts to add a lock there were tried in the past and refused.

Wang already read that thread before proposing this patch.

> Anyway using a normal lock would be preferred over a bit lock. But the
> most important part is to identify _why_ we see the lockup trigerring in
> the first place. And try to fix it rather than workaround it here.

The bitlock is what Andrew suggested at
http://lkml.kernel.org/r/20170601151022.b17716472adbf0e6d51fb011@linux-foundation.org .
I'm OK with using a normal lock.
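
For reference, a minimal sketch of what that "normal lock" variant could look
like, applied to the same stall-warning hunk as the patch above; the mutex
name and placement are illustrative only, not taken from any posted patch:

static DEFINE_MUTEX(stall_warn_mutex);

	/* Make sure we know about allocations which stall for too long */
	if (time_after(jiffies, alloc_start + stall_timeout) &&
	    mutex_trylock(&stall_warn_mutex)) {
		/*
		 * mutex_trylock() never sleeps, so warn_alloc() callers are
		 * still not blocked; losers simply skip this warning round.
		 */
		warn_alloc(gfp_mask,
			"page allocation stalls for %ums, order:%u",
			jiffies_to_msecs(jiffies - alloc_start), order);
		stall_timeout = jiffies - alloc_start + 10 * HZ;
		mutex_unlock(&stall_warn_mutex);
	}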

This patch does not make callers of warn_alloc() sleep. This is different from
past attempts.

You said "identify _why_ we see the lockup triggering in the first place" without
providing the means to identify it. Unless you provide such means (in a form
which can be immediately and easily backported to 4.9 kernels; that is, backporting
the not-yet-accepted printk() offloading patchset is not an option), this patch
cannot be refused.

> 
> > Signed-off-by: yuwang.yuwang <yuwang.yuwang@alibaba-inc.com>
> > Suggested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
> > ---
> >  mm/page_alloc.c | 7 +++++--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> > 
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 2abf8d5..8b86686 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -3525,6 +3525,7 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
> >  	unsigned long alloc_start = jiffies;
> >  	unsigned int stall_timeout = 10 * HZ;
> >  	unsigned int cpuset_mems_cookie;
> > +	static unsigned long stall_warn_lock;
> >  
> >  	/*
> >  	 * In the slowpath, we sanity check order to avoid ever trying to
> > @@ -3698,11 +3699,13 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
> >  		goto nopage;
> >  
> >  	/* Make sure we know about allocations which stall for too long */
> > -	if (time_after(jiffies, alloc_start + stall_timeout)) {
> > +	if (time_after(jiffies, alloc_start + stall_timeout) &&
> > +		!test_and_set_bit_lock(0, &stall_warn_lock)) {
> >  		warn_alloc(gfp_mask,
> >  			"page allocation stalls for %ums, order:%u",
> >  			jiffies_to_msecs(jiffies-alloc_start), order);
> > -		stall_timeout += 10 * HZ;
> > +		stall_timeout = jiffies - alloc_start + 10 * HZ;
> > +		clear_bit_unlock(0, &stall_warn_lock);
> >  	}
> >  
> >  	if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
> > -- 
> > 1.8.3.1
> 
> -- 
> Michal Hocko
> SUSE Labs


* Re: [PATCH] mm,page_alloc: softlockup on warn_alloc on
  2017-09-15 11:38   ` Tetsuo Handa
@ 2017-09-15 12:00     ` Michal Hocko
  2017-09-15 12:09       ` Tetsuo Handa
  0 siblings, 1 reply; 20+ messages in thread
From: Michal Hocko @ 2017-09-15 12:00 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: yuwang668899, vbabka, mpatocka, hannes, mgorman, dave.hansen,
	akpm, linux-mm, chenggang.qcg, yuwang.yuwang

On Fri 15-09-17 20:38:49, Tetsuo Handa wrote:
[...]
> You said "identify _why_ we see the lockup trigerring in the first
> place" without providing means to identify it. Unless you provide
> means to identify it (in a form which can be immediately and easily
> backported to 4.9 kernels; that is, backporting not-yet-accepted
> printk() offloading patchset is not a choice), this patch cannot be
> refused.

I fail to see why. It simply works around an existing problem elsewhere
in the kernel without a deeper understanding of where the problem is. You
can add your own instrumentation to debug and describe the problem. This
is no different from any other kernel bug...

If our printk implementation is so weak that it cannot cope with writers, then
that should be fixed without spreading hacks across different subsystems. If
the lockup is a real problem under normal workloads (rather than
artificial ones), then we should try to throttle more aggressively.
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] mm,page_alloc: softlockup on warn_alloc on
  2017-09-15 12:00     ` Michal Hocko
@ 2017-09-15 12:09       ` Tetsuo Handa
  2017-09-15 12:14         ` Michal Hocko
  0 siblings, 1 reply; 20+ messages in thread
From: Tetsuo Handa @ 2017-09-15 12:09 UTC (permalink / raw)
  To: mhocko
  Cc: yuwang668899, vbabka, mpatocka, hannes, mgorman, dave.hansen,
	akpm, linux-mm, chenggang.qcg, yuwang.yuwang

Michal Hocko wrote:
> On Fri 15-09-17 20:38:49, Tetsuo Handa wrote:
> [...]
> > You said "identify _why_ we see the lockup trigerring in the first
> > place" without providing means to identify it. Unless you provide
> > means to identify it (in a form which can be immediately and easily
> > backported to 4.9 kernels; that is, backporting not-yet-accepted
> > printk() offloading patchset is not a choice), this patch cannot be
> > refused.
> 
> I fail to see why. It simply workarounds an existing problem elsewhere
> in the kernel without deeper understanding on where the problem is. You
> can add your own instrumentation to debug and describe the problem. This
> is no different to any other kernel bugs...

Please do show us your patch for that. Normal users cannot afford to develop
such instrumentation to debug and describe the problem.

> 
> If our printk implementation is so weak it cannot cope with writers then
> that should be fixed without spreading hacks in different subsystems. If
> the lockup is a real problem under normal workloads (rather than
> artificial ones) then we should try to throttle more aggresively.

No throttling, please. Throttling makes warn_alloc() more and more useless.


* Re: [PATCH] mm,page_alloc: softlockup on warn_alloc on
  2017-09-15 12:09       ` Tetsuo Handa
@ 2017-09-15 12:14         ` Michal Hocko
  2017-09-15 14:12           ` Tetsuo Handa
  0 siblings, 1 reply; 20+ messages in thread
From: Michal Hocko @ 2017-09-15 12:14 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: yuwang668899, vbabka, mpatocka, hannes, mgorman, dave.hansen,
	akpm, linux-mm, chenggang.qcg, yuwang.yuwang

On Fri 15-09-17 21:09:29, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Fri 15-09-17 20:38:49, Tetsuo Handa wrote:
> > [...]
> > > You said "identify _why_ we see the lockup trigerring in the first
> > > place" without providing means to identify it. Unless you provide
> > > means to identify it (in a form which can be immediately and easily
> > > backported to 4.9 kernels; that is, backporting not-yet-accepted
> > > printk() offloading patchset is not a choice), this patch cannot be
> > > refused.
> > 
> > I fail to see why. It simply workarounds an existing problem elsewhere
> > in the kernel without deeper understanding on where the problem is. You
> > can add your own instrumentation to debug and describe the problem. This
> > is no different to any other kernel bugs...
> 
> Please do show us your patch for that. Normal users cannot afford developing
> such instrumentation to debug and describe the problem.

Stop this nonsense already! Any kernel bug/lockup needs debugging, which
might be non-trivial, and it is necessary to understand the real
culprit. We do not add random hacks to silence a problem. We aim at
fixing it!
 
> > If our printk implementation is so weak it cannot cope with writers then
> > that should be fixed without spreading hacks in different subsystems. If
> > the lockup is a real problem under normal workloads (rather than
> > artificial ones) then we should try to throttle more aggresively.
> 
> No throttle please. Throttling makes warn_alloc() more and more useless.

so does try_lock approach...

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] mm,page_alloc: softlockup on warn_alloc on
  2017-09-15 12:14         ` Michal Hocko
@ 2017-09-15 14:12           ` Tetsuo Handa
  2017-09-15 14:23             ` Michal Hocko
  2017-09-24  1:56             ` Tetsuo Handa
  0 siblings, 2 replies; 20+ messages in thread
From: Tetsuo Handa @ 2017-09-15 14:12 UTC (permalink / raw)
  To: mhocko
  Cc: yuwang668899, vbabka, mpatocka, hannes, mgorman, dave.hansen,
	akpm, linux-mm, chenggang.qcg, yuwang.yuwang

Michal Hocko wrote:
> On Fri 15-09-17 21:09:29, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > On Fri 15-09-17 20:38:49, Tetsuo Handa wrote:
> > > [...]
> > > > You said "identify _why_ we see the lockup trigerring in the first
> > > > place" without providing means to identify it. Unless you provide
> > > > means to identify it (in a form which can be immediately and easily
> > > > backported to 4.9 kernels; that is, backporting not-yet-accepted
> > > > printk() offloading patchset is not a choice), this patch cannot be
> > > > refused.
> > > 
> > > I fail to see why. It simply workarounds an existing problem elsewhere
> > > in the kernel without deeper understanding on where the problem is. You
> > > can add your own instrumentation to debug and describe the problem. This
> > > is no different to any other kernel bugs...
> > 
> > Please do show us your patch for that. Normal users cannot afford developing
> > such instrumentation to debug and describe the problem.
> 
> Stop this nonsense already! Any kernel bug/lockup needs a debugging
> which might be non-trivial and it is necessary to understand the real
> culprit. We do not add random hacks to silence a problem. We aim at
> fixing it!

Assuming that Wang Yu's trace has a

  RIP: 0010:[<...>]  [<...>] dump_stack+0x.../0x...

line in the omitted part (like Cong Wang's trace did), I suspect that a thread
which is holding dump_lock is unable to leave console_unlock() from printk() for
so long because many other threads are trying to call printk() from warn_alloc()
while consuming all CPU time.

Thus, not allowing other threads to consume CPU time / call printk() is a step
toward isolating it. If this problem still exists even after we make the other
threads sleep, the real cause is somewhere else. But unfortunately Cong Wang has
not yet succeeded in reproducing the problem. If Wang Yu is able to reproduce it,
we can try setting /proc/sys/kernel/softlockup_all_cpu_backtrace to 1 so that
we can see what the other CPUs are doing.
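
To make the dump_lock serialization mentioned above concrete, here is a
simplified sketch of the cross-CPU serialization in dump_stack() of that era
(a paraphrase for illustration, not an exact copy of lib/dump_stack.c): a CPU
that loses the race busy-waits with interrupts disabled until the current
owner finishes printing, which is how many concurrent warn_alloc() callers
can burn CPU time and trip the soft lockup detector.

static atomic_t dump_lock = ATOMIC_INIT(-1);

void dump_stack_sketch(void)
{
	unsigned long flags;
	int cpu, old;

retry:
	local_irq_save(flags);
	cpu = smp_processor_id();
	/* Try to become the CPU that owns the console output. */
	old = atomic_cmpxchg(&dump_lock, -1, cpu);
	if (old != -1 && old != cpu) {
		/* Another CPU is dumping: spin and retry. */
		local_irq_restore(flags);
		cpu_relax();
		goto retry;
	}

	__dump_stack();			/* the actual backtrace printk()s */

	if (old == -1)
		atomic_set(&dump_lock, -1);	/* release if we took it */
	local_irq_restore(flags);
}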

>  
> > > If our printk implementation is so weak it cannot cope with writers then
> > > that should be fixed without spreading hacks in different subsystems. If
> > > the lockup is a real problem under normal workloads (rather than
> > > artificial ones) then we should try to throttle more aggresively.
> > 
> > No throttle please. Throttling makes warn_alloc() more and more useless.
> 
> so does try_lock approach...

There is mutex_lock() approach, but you don't agree on using it.


* Re: [PATCH] mm,page_alloc: softlockup on warn_alloc on
  2017-09-15 14:12           ` Tetsuo Handa
@ 2017-09-15 14:23             ` Michal Hocko
  2017-09-24  1:56             ` Tetsuo Handa
  1 sibling, 0 replies; 20+ messages in thread
From: Michal Hocko @ 2017-09-15 14:23 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: yuwang668899, vbabka, mpatocka, hannes, mgorman, dave.hansen,
	akpm, linux-mm, chenggang.qcg, yuwang.yuwang

On Fri 15-09-17 23:12:24, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Fri 15-09-17 21:09:29, Tetsuo Handa wrote:
> > > Michal Hocko wrote:
> > > > On Fri 15-09-17 20:38:49, Tetsuo Handa wrote:
> > > > [...]
> > > > > You said "identify _why_ we see the lockup trigerring in the first
> > > > > place" without providing means to identify it. Unless you provide
> > > > > means to identify it (in a form which can be immediately and easily
> > > > > backported to 4.9 kernels; that is, backporting not-yet-accepted
> > > > > printk() offloading patchset is not a choice), this patch cannot be
> > > > > refused.
> > > > 
> > > > I fail to see why. It simply workarounds an existing problem elsewhere
> > > > in the kernel without deeper understanding on where the problem is. You
> > > > can add your own instrumentation to debug and describe the problem. This
> > > > is no different to any other kernel bugs...
> > > 
> > > Please do show us your patch for that. Normal users cannot afford developing
> > > such instrumentation to debug and describe the problem.
> > 
> > Stop this nonsense already! Any kernel bug/lockup needs a debugging
> > which might be non-trivial and it is necessary to understand the real
> > culprit. We do not add random hacks to silence a problem. We aim at
> > fixing it!
> 
> Assuming that Wang Yu's trace has
> 
>   RIP: 0010:[<...>]  [<...>] dump_stack+0x.../0x...
> 
> line in the omitted part (like Cong Wang's trace did), I suspect that a thread
> which is holding dump_lock is unable to leave console_unlock() from printk() for
> so long because many other threads are trying to call printk() from warn_alloc()
> while consuming all CPU time.

__dump_stack should be an atomic context AFAIR. But as we already
discussed some time ago this lock is not fair and one function might
bounce for too long.
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] mm,page_alloc: softlockup on warn_alloc on
  2017-09-15  9:58 [PATCH] mm,page_alloc: softlockup on warn_alloc on wang Yu
  2017-09-15 10:39 ` Michal Hocko
@ 2017-09-15 14:37 ` Johannes Weiner
  2017-09-15 15:23   ` Tetsuo Handa
                     ` (2 more replies)
  1 sibling, 3 replies; 20+ messages in thread
From: Johannes Weiner @ 2017-09-15 14:37 UTC (permalink / raw)
  To: wang Yu
  Cc: mhocko, penguin-kernel, linux-mm, chenggang.qcg, yuwang.yuwang,
	Andrew Morton

On Fri, Sep 15, 2017 at 05:58:49PM +0800, wang Yu wrote:
> From: "yuwang.yuwang" <yuwang.yuwang@alibaba-inc.com>
> 
> I found a softlockup when running some stress testcase in 4.9.x,
> but i think the mainline have the same problem.
> 
> call trace:
> [365724.502896] NMI watchdog: BUG: soft lockup - CPU#31 stuck for 22s!
> [jbd2/sda3-8:1164]

We've started seeing the same thing on 4.11. Tons and tons of
allocation stall warnings followed by the soft lock-ups.

These allocation stalls happen when the allocating task reclaims
successfully yet isn't able to allocate, meaning other threads are
stealing those pages.

Now, it *looks* like something changed recently to make this race
window wider, and there might well be a bug there. But regardless, we
have a real livelock or at least starvation window here, where
reclaimers have their bounty continuously stolen by concurrent allocs;
but instead of recognizing and handling the situation, we flood the
console which in many cases adds fuel to the fire.

When threads cannibalize each other to the point where one of them can
reclaim but not allocate for 10s, it's safe to say we are out of
memory. I think we need something like the below regardless of any
other investigations and fixes into the root cause here.

But Michal, this needs an answer. We don't want to paper over bugs,
but we also cannot continue to ship a kernel that has a known issue
and for which there are mitigation fixes, root-caused or not.

How can we figure out if there is a bug here? Can we time the calls to
__alloc_pages_direct_reclaim() and __alloc_pages_direct_compact() and
drill down from there? Print out the number of times we have retried?
We're counting no_progress_loops, but we are also very much interested
in progress_loops that didn't result in a successful allocation. Too
many of those and I think we want to OOM kill as per above.

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bec5e96f3b88..01736596389a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3830,6 +3830,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 			"page allocation stalls for %ums, order:%u",
 			jiffies_to_msecs(jiffies-alloc_start), order);
 		stall_timeout += 10 * HZ;
+		goto oom;
 	}
 
 	/* Avoid recursion of direct reclaim */
@@ -3882,6 +3883,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	if (read_mems_allowed_retry(cpuset_mems_cookie))
 		goto retry_cpuset;
 
+oom:
 	/* Reclaim has failed us, start killing things */
 	page = __alloc_pages_may_oom(gfp_mask, order, ac, &did_some_progress);
 	if (page)
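
Separately from the hunk above, a purely hypothetical instrumentation sketch
along the lines of the progress-loops accounting asked about earlier;
progress_loops and MAX_PROGRESS_LOOPS do not exist upstream and are made up
here:

	/*
	 * Besides no_progress_loops, count retry loops where direct reclaim
	 * made progress but the allocation still failed, i.e. the reclaimed
	 * pages were stolen by concurrent allocators.
	 */
	if (did_some_progress) {
		no_progress_loops = 0;
		progress_loops++;
	} else {
		no_progress_loops++;
		progress_loops = 0;
	}

	/* Too many "reclaimed but lost the race" loops: treat it as OOM. */
	if (progress_loops > MAX_PROGRESS_LOOPS)
		goto oom;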


* Re: [PATCH] mm,page_alloc: softlockup on warn_alloc on
  2017-09-15 14:37 ` Johannes Weiner
@ 2017-09-15 15:23   ` Tetsuo Handa
  2017-09-15 18:44     ` Johannes Weiner
  2017-09-16  4:12   ` Tetsuo Handa
  2017-09-18  6:03   ` Michal Hocko
  2 siblings, 1 reply; 20+ messages in thread
From: Tetsuo Handa @ 2017-09-15 15:23 UTC (permalink / raw)
  To: hannes, yuwang668899; +Cc: mhocko, linux-mm, chenggang.qcg, yuwang.yuwang, akpm

Johannes Weiner wrote:
> How can we figure out if there is a bug here? Can we time the calls to
> __alloc_pages_direct_reclaim() and __alloc_pages_direct_compact() and
> drill down from there? Print out the number of times we have retried?
> We're counting no_progress_loops, but we are also very much interested
> in progress_loops that didn't result in a successful allocation. Too
> many of those and I think we want to OOM kill as per above.
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index bec5e96f3b88..01736596389a 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3830,6 +3830,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  			"page allocation stalls for %ums, order:%u",
>  			jiffies_to_msecs(jiffies-alloc_start), order);
>  		stall_timeout += 10 * HZ;
> +		goto oom;
>  	}
>  
>  	/* Avoid recursion of direct reclaim */
> @@ -3882,6 +3883,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  	if (read_mems_allowed_retry(cpuset_mems_cookie))
>  		goto retry_cpuset;
>  
> +oom:
>  	/* Reclaim has failed us, start killing things */
>  	page = __alloc_pages_may_oom(gfp_mask, order, ac, &did_some_progress);
>  	if (page)
> 

According to my stress tests, it is mutex_trylock() in __alloc_pages_may_oom()
that causes warn_alloc() to be called so many times. The comment

	/*
	 * Acquire the oom lock.  If that fails, somebody else is
	 * making progress for us.
	 */

is true only if the owner of oom_lock can call out_of_memory(), that is, if it
is doing a __GFP_FS allocation. Consider a situation where 1 GFP_KERNEL
allocating thread and 99 GFP_NOFS/GFP_NOIO allocating threads are contending
for the oom_lock. How likely is it that the OOM killer is invoked? It is very
unlikely, because the GFP_KERNEL allocating thread will probably fail to grab
oom_lock while the GFP_NOFS/GFP_NOIO allocating threads are grabbing it, and
the GFP_KERNEL allocating thread yields CPU time for the GFP_NOFS/GFP_NOIO
allocating threads to waste pointlessly.
s/!mutex_trylock(&oom_lock)/mutex_lock_killable()/ significantly improves
this situation in my stress tests. How about your case?
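
For illustration, the substitution described above would look roughly like
this against the __alloc_pages_may_oom() entry sequence of that era (context
abbreviated; this is a sketch, not a posted patch):

	*did_some_progress = 0;

	/*
	 * Sleep on oom_lock instead of assuming that the current holder is
	 * making progress for us; a GFP_NOFS/GFP_NOIO holder frequently
	 * cannot invoke the OOM killer at all.
	 */
	if (mutex_lock_killable(&oom_lock)) {
		/* Fatal signal received while waiting: let the caller retry. */
		*did_some_progress = 1;
		return NULL;
	}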


* Re: [PATCH] mm,page_alloc: softlockup on warn_alloc on
  2017-09-15 15:23   ` Tetsuo Handa
@ 2017-09-15 18:44     ` Johannes Weiner
  2017-09-16  0:25       ` Tetsuo Handa
  0 siblings, 1 reply; 20+ messages in thread
From: Johannes Weiner @ 2017-09-15 18:44 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: yuwang668899, mhocko, linux-mm, chenggang.qcg, yuwang.yuwang, akpm

On Sat, Sep 16, 2017 at 12:23:53AM +0900, Tetsuo Handa wrote:
> Johannes Weiner wrote:
> > How can we figure out if there is a bug here? Can we time the calls to
> > __alloc_pages_direct_reclaim() and __alloc_pages_direct_compact() and
> > drill down from there? Print out the number of times we have retried?
> > We're counting no_progress_loops, but we are also very much interested
> > in progress_loops that didn't result in a successful allocation. Too
> > many of those and I think we want to OOM kill as per above.
> > 
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index bec5e96f3b88..01736596389a 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -3830,6 +3830,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> >  			"page allocation stalls for %ums, order:%u",
> >  			jiffies_to_msecs(jiffies-alloc_start), order);
> >  		stall_timeout += 10 * HZ;
> > +		goto oom;
> >  	}
> >  
> >  	/* Avoid recursion of direct reclaim */
> > @@ -3882,6 +3883,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> >  	if (read_mems_allowed_retry(cpuset_mems_cookie))
> >  		goto retry_cpuset;
> >  
> > +oom:
> >  	/* Reclaim has failed us, start killing things */
> >  	page = __alloc_pages_may_oom(gfp_mask, order, ac, &did_some_progress);
> >  	if (page)
> > 
> 
> According to my stress tests, it is mutex_trylock() in __alloc_pages_may_oom()
> that causes warn_alloc() to be called for so many times. The comment
> 
> 	/*
> 	 * Acquire the oom lock.  If that fails, somebody else is
> 	 * making progress for us.
> 	 */
> 
> is true only if the owner of oom_lock can call out_of_memory() and is __GFP_FS
> allocation. Consider a situation where there are 1 GFP_KERNEL allocating thread
> and 99 GFP_NOFS/GFP_NOIO allocating threads contending the oom_lock. How likely
> the OOM killer is invoked? It is very unlikely because GFP_KERNEL allocating thread
> likely fails to grab oom_lock because GFP_NOFS/GFP_NOIO allocating threads is
> grabing oom_lock. And GFP_KERNEL allocating thread yields CPU time for
> GFP_NOFS/GFP_NOIO allocating threads to waste pointlessly.
> s/!mutex_trylock(&oom_lock)/mutex_lock_killable()/ significantly improves
> this situation for my stress tests. How is your case?

Interesting analysis, that definitely sounds plausible.

It just started happening to us in production and I haven't isolated
it yet. If you already have a reproducer, that's excellent.

The synchronization has worked this way for a long time (trylock
failure assuming progress, but the order/NOFS/zone bailouts from
actually OOM-killing inside the locked section). We should really fix
*that* rather than serializing warn_alloc().

For GFP_NOFS, it seems to go back to 9879de7373fc ("mm: page_alloc:
embed OOM killing naturally into allocation slowpath"). Before that we
didn't use to call __alloc_pages_may_oom() for NOFS allocations. So I
still wonder why this only now appears to be causing problems.

In any case, converting that trylock to a sleeping lock in this case
makes sense to me. Nobody is blocking under this lock (except that one
schedule_timeout_killable(1) after dispatching a victim) and it's not
obvious to me why we'd need that level of concurrency under OOM.


* Re: [PATCH] mm,page_alloc: softlockup on warn_alloc on
  2017-09-15 18:44     ` Johannes Weiner
@ 2017-09-16  0:25       ` Tetsuo Handa
  2017-09-18  6:05         ` Michal Hocko
  0 siblings, 1 reply; 20+ messages in thread
From: Tetsuo Handa @ 2017-09-16  0:25 UTC (permalink / raw)
  To: hannes; +Cc: yuwang668899, mhocko, linux-mm, chenggang.qcg, yuwang.yuwang, akpm

Johannes Weiner wrote:
> On Sat, Sep 16, 2017 at 12:23:53AM +0900, Tetsuo Handa wrote:
> > Johannes Weiner wrote:
> > > How can we figure out if there is a bug here? Can we time the calls to
> > > __alloc_pages_direct_reclaim() and __alloc_pages_direct_compact() and
> > > drill down from there? Print out the number of times we have retried?
> > > We're counting no_progress_loops, but we are also very much interested
> > > in progress_loops that didn't result in a successful allocation. Too
> > > many of those and I think we want to OOM kill as per above.
> > > 
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index bec5e96f3b88..01736596389a 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -3830,6 +3830,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> > >  			"page allocation stalls for %ums, order:%u",
> > >  			jiffies_to_msecs(jiffies-alloc_start), order);
> > >  		stall_timeout += 10 * HZ;
> > > +		goto oom;
> > >  	}
> > >  
> > >  	/* Avoid recursion of direct reclaim */
> > > @@ -3882,6 +3883,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> > >  	if (read_mems_allowed_retry(cpuset_mems_cookie))
> > >  		goto retry_cpuset;
> > >  
> > > +oom:
> > >  	/* Reclaim has failed us, start killing things */
> > >  	page = __alloc_pages_may_oom(gfp_mask, order, ac, &did_some_progress);
> > >  	if (page)
> > > 
> > 
> > According to my stress tests, it is mutex_trylock() in __alloc_pages_may_oom()
> > that causes warn_alloc() to be called for so many times. The comment
> > 
> > 	/*
> > 	 * Acquire the oom lock.  If that fails, somebody else is
> > 	 * making progress for us.
> > 	 */
> > 
> > is true only if the owner of oom_lock can call out_of_memory() and is __GFP_FS
> > allocation. Consider a situation where there are 1 GFP_KERNEL allocating thread
> > and 99 GFP_NOFS/GFP_NOIO allocating threads contending the oom_lock. How likely
> > the OOM killer is invoked? It is very unlikely because GFP_KERNEL allocating thread
> > likely fails to grab oom_lock because GFP_NOFS/GFP_NOIO allocating threads is
> > grabing oom_lock. And GFP_KERNEL allocating thread yields CPU time for
> > GFP_NOFS/GFP_NOIO allocating threads to waste pointlessly.
> > s/!mutex_trylock(&oom_lock)/mutex_lock_killable()/ significantly improves
> > this situation for my stress tests. How is your case?
> 
> Interesting analysis, that definitely sounds plausible.
> 
> It just started happening to us in production and I haven't isolated
> it yet. If you already have a reproducer, that's excellent.

Well, my reproducer is an artificial stressor. I think you want to test
using natural programs used in your production environment.

> 
> The synchronization has worked this way for a long time (trylock
> failure assuming progress, but the order/NOFS/zone bailouts from
> actually OOM-killing inside the locked section). We should really fix
> *that* rather than serializing warn_alloc().
> 
> For GFP_NOFS, it seems to go back to 9879de7373fc ("mm: page_alloc:
> embed OOM killing naturally into allocation slowpath"). Before that we
> didn't use to call __alloc_pages_may_oom() for NOFS allocations. So I
> still wonder why this only now appears to be causing problems.
> 
> In any case, converting that trylock to a sleeping lock in this case
> makes sense to me. Nobody is blocking under this lock (except that one
> schedule_timeout_killable(1) after dispatching a victim) and it's not
> obvious to me why we'd need that level of concurrency under OOM.

You can try http://lkml.kernel.org/r/1500202791-5427-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
and http://lkml.kernel.org/r/1503577106-9196-2-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp together.
Then, we can remove mutex_lock(&oom_lock) serialization from __oom_reap_task_mm()
which still exists because Andrea's patch was accepted instead of Michal's patch.

By the way, your environment is not using virtio, is it?
At least virtballoon_oom_notify() is blocking (i.e. might wait for memory
allocation) under oom_lock.


* Re: [PATCH] mm,page_alloc: softlockup on warn_alloc on
  2017-09-15 14:37 ` Johannes Weiner
  2017-09-15 15:23   ` Tetsuo Handa
@ 2017-09-16  4:12   ` Tetsuo Handa
  2017-10-11 11:14     ` Tetsuo Handa
  2017-09-18  6:03   ` Michal Hocko
  2 siblings, 1 reply; 20+ messages in thread
From: Tetsuo Handa @ 2017-09-16  4:12 UTC (permalink / raw)
  To: hannes; +Cc: yuwang668899, mhocko, linux-mm, chenggang.qcg, yuwang.yuwang, akpm

Johannes Weiner wrote:
> On Fri, Sep 15, 2017 at 05:58:49PM +0800, wang Yu wrote:
> > From: "yuwang.yuwang" <yuwang.yuwang@alibaba-inc.com>
> > 
> > I found a softlockup when running some stress testcase in 4.9.x,
> > but i think the mainline have the same problem.
> > 
> > call trace:
> > [365724.502896] NMI watchdog: BUG: soft lockup - CPU#31 stuck for 22s!
> > [jbd2/sda3-8:1164]
> 
> We've started seeing the same thing on 4.11. Tons and tons of
> allocation stall warnings followed by the soft lock-ups.

Forgot to comment. Since you are able to reproduce the problem (aren't you?),
please try setting /proc/sys/kernel/softlockup_all_cpu_backtrace to 1 so that
we can see what the other CPUs are doing. That does not require patching the kernel.


* Re: [PATCH] mm,page_alloc: softlockup on warn_alloc on
  2017-09-15 14:37 ` Johannes Weiner
  2017-09-15 15:23   ` Tetsuo Handa
  2017-09-16  4:12   ` Tetsuo Handa
@ 2017-09-18  6:03   ` Michal Hocko
  2 siblings, 0 replies; 20+ messages in thread
From: Michal Hocko @ 2017-09-18  6:03 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: wang Yu, penguin-kernel, linux-mm, chenggang.qcg, yuwang.yuwang,
	Andrew Morton

On Fri 15-09-17 07:37:32, Johannes Weiner wrote:
> On Fri, Sep 15, 2017 at 05:58:49PM +0800, wang Yu wrote:
> > From: "yuwang.yuwang" <yuwang.yuwang@alibaba-inc.com>
> > 
> > I found a softlockup when running some stress testcase in 4.9.x,
> > but i think the mainline have the same problem.
> > 
> > call trace:
> > [365724.502896] NMI watchdog: BUG: soft lockup - CPU#31 stuck for 22s!
> > [jbd2/sda3-8:1164]
> 
> We've started seeing the same thing on 4.11. Tons and tons of
> allocation stall warnings followed by the soft lock-ups.

Is the RIP consistent? Where is it reported?

> These allocation stalls happen when the allocating task reclaims
> successfully yet isn't able to allocate, meaning other threads are
> stealing those pages.
> 
> Now, it *looks* like something changed recently to make this race
> window wider, and there might well be a bug there. But regardless, we
> have a real livelock or at least starvation window here, where
> reclaimers have their bounty continuously stolen by concurrent allocs;
> but instead of recognizing and handling the situation, we flood the
> console which in many cases adds fuel to the fire.
> 
> When threads cannibalize each other to the point where one of them can
> reclaim but not allocate for 10s, it's safe to say we are out of
> memory. I think we need something like the below regardless of any
> other investigations and fixes into the root cause here.
> 
> But Michal, this needs an answer. We don't want to paper over bugs,
> but we also cannot continue to ship a kernel that has a known issue
> and for which there are mitigation fixes, root-caused or not.

I am more than happy to ack a fix that actually makes sense. I haven't
seen any so far, to be honest. All of them were putting locks around this
stall warning path, which is imho not the right way forward. Seeing soft
lockups just because of the stall warning is really bad, and if it happens
during your normal workloads then I am perfectly happy to ditch the
whole stall warning thingy.
 
> How can we figure out if there is a bug here? Can we time the calls to
> __alloc_pages_direct_reclaim() and __alloc_pages_direct_compact() and
> drill down from there? Print out the number of times we have retried?
> We're counting no_progress_loops, but we are also very much interested
> in progress_loops that didn't result in a successful allocation. Too
> many of those and I think we want to OOM kill as per above.

I do not think so. A low priority thread not making progress would get us
into OOM easily, and we do not want to kill in that case.

> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index bec5e96f3b88..01736596389a 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3830,6 +3830,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  			"page allocation stalls for %ums, order:%u",
>  			jiffies_to_msecs(jiffies-alloc_start), order);
>  		stall_timeout += 10 * HZ;
> +		goto oom;
>  	}
>  
>  	/* Avoid recursion of direct reclaim */
> @@ -3882,6 +3883,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  	if (read_mems_allowed_retry(cpuset_mems_cookie))
>  		goto retry_cpuset;
>  
> +oom:
>  	/* Reclaim has failed us, start killing things */
>  	page = __alloc_pages_may_oom(gfp_mask, order, ac, &did_some_progress);
>  	if (page)
> 

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] mm,page_alloc: softlockup on warn_alloc on
  2017-09-16  0:25       ` Tetsuo Handa
@ 2017-09-18  6:05         ` Michal Hocko
  2017-09-18  6:31           ` Tetsuo Handa
  0 siblings, 1 reply; 20+ messages in thread
From: Michal Hocko @ 2017-09-18  6:05 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: hannes, yuwang668899, linux-mm, chenggang.qcg, yuwang.yuwang, akpm

On Sat 16-09-17 09:25:26, Tetsuo Handa wrote:
> Johannes Weiner wrote:
> > On Sat, Sep 16, 2017 at 12:23:53AM +0900, Tetsuo Handa wrote:
> > > Johannes Weiner wrote:
> > > > How can we figure out if there is a bug here? Can we time the calls to
> > > > __alloc_pages_direct_reclaim() and __alloc_pages_direct_compact() and
> > > > drill down from there? Print out the number of times we have retried?
> > > > We're counting no_progress_loops, but we are also very much interested
> > > > in progress_loops that didn't result in a successful allocation. Too
> > > > many of those and I think we want to OOM kill as per above.
> > > > 
> > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > > index bec5e96f3b88..01736596389a 100644
> > > > --- a/mm/page_alloc.c
> > > > +++ b/mm/page_alloc.c
> > > > @@ -3830,6 +3830,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> > > >  			"page allocation stalls for %ums, order:%u",
> > > >  			jiffies_to_msecs(jiffies-alloc_start), order);
> > > >  		stall_timeout += 10 * HZ;
> > > > +		goto oom;
> > > >  	}
> > > >  
> > > >  	/* Avoid recursion of direct reclaim */
> > > > @@ -3882,6 +3883,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> > > >  	if (read_mems_allowed_retry(cpuset_mems_cookie))
> > > >  		goto retry_cpuset;
> > > >  
> > > > +oom:
> > > >  	/* Reclaim has failed us, start killing things */
> > > >  	page = __alloc_pages_may_oom(gfp_mask, order, ac, &did_some_progress);
> > > >  	if (page)
> > > > 
> > > 
> > > According to my stress tests, it is mutex_trylock() in __alloc_pages_may_oom()
> > > that causes warn_alloc() to be called for so many times. The comment
> > > 
> > > 	/*
> > > 	 * Acquire the oom lock.  If that fails, somebody else is
> > > 	 * making progress for us.
> > > 	 */
> > > 
> > > is true only if the owner of oom_lock can call out_of_memory() and is __GFP_FS
> > > allocation. Consider a situation where there are 1 GFP_KERNEL allocating thread
> > > and 99 GFP_NOFS/GFP_NOIO allocating threads contending the oom_lock. How likely
> > > the OOM killer is invoked? It is very unlikely because GFP_KERNEL allocating thread
> > > likely fails to grab oom_lock because GFP_NOFS/GFP_NOIO allocating threads is
> > > grabing oom_lock. And GFP_KERNEL allocating thread yields CPU time for
> > > GFP_NOFS/GFP_NOIO allocating threads to waste pointlessly.
> > > s/!mutex_trylock(&oom_lock)/mutex_lock_killable()/ significantly improves
> > > this situation for my stress tests. How is your case?
> > 
> > Interesting analysis, that definitely sounds plausible.
> > 
> > It just started happening to us in production and I haven't isolated
> > it yet. If you already have a reproducer, that's excellent.
> 
> Well, my reproducer is an artificial stressor. I think you want to test
> using natural programs used in your production environment.
> 
> > 
> > The synchronization has worked this way for a long time (trylock
> > failure assuming progress, but the order/NOFS/zone bailouts from
> > actually OOM-killing inside the locked section). We should really fix
> > *that* rather than serializing warn_alloc().
> > 
> > For GFP_NOFS, it seems to go back to 9879de7373fc ("mm: page_alloc:
> > embed OOM killing naturally into allocation slowpath"). Before that we
> > didn't use to call __alloc_pages_may_oom() for NOFS allocations. So I
> > still wonder why this only now appears to be causing problems.
> > 
> > In any case, converting that trylock to a sleeping lock in this case
> > makes sense to me. Nobody is blocking under this lock (except that one
> > schedule_timeout_killable(1) after dispatching a victim) and it's not
> > obvious to me why we'd need that level of concurrency under OOM.
> 
> You can try http://lkml.kernel.org/r/1500202791-5427-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
> and http://lkml.kernel.org/r/1503577106-9196-2-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp together.
> Then, we can remove mutex_lock(&oom_lock) serialization from __oom_reap_task_mm()
> which still exists because Andrea's patch was accepted instead of Michal's patch.

We can safely drop the oom_lock from __oom_reap_task_mm now. Andrea
didn't want to do it in his patch because that is logically a separate
thing. But nothing should prevent the removal now, AFAICS.
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] mm,page_alloc: softlockup on warn_alloc on
  2017-09-18  6:05         ` Michal Hocko
@ 2017-09-18  6:31           ` Tetsuo Handa
  2017-09-18  6:43             ` Michal Hocko
  0 siblings, 1 reply; 20+ messages in thread
From: Tetsuo Handa @ 2017-09-18  6:31 UTC (permalink / raw)
  To: mhocko; +Cc: hannes, yuwang668899, linux-mm, chenggang.qcg, yuwang.yuwang, akpm

Michal Hocko wrote:
> > > The synchronization has worked this way for a long time (trylock
> > > failure assuming progress, but the order/NOFS/zone bailouts from
> > > actually OOM-killing inside the locked section). We should really fix
> > > *that* rather than serializing warn_alloc().
> > > 
> > > For GFP_NOFS, it seems to go back to 9879de7373fc ("mm: page_alloc:
> > > embed OOM killing naturally into allocation slowpath"). Before that we
> > > didn't use to call __alloc_pages_may_oom() for NOFS allocations. So I
> > > still wonder why this only now appears to be causing problems.
> > > 
> > > In any case, converting that trylock to a sleeping lock in this case
> > > makes sense to me. Nobody is blocking under this lock (except that one
> > > schedule_timeout_killable(1) after dispatching a victim) and it's not
> > > obvious to me why we'd need that level of concurrency under OOM.
> > 
> > You can try http://lkml.kernel.org/r/1500202791-5427-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
> > and http://lkml.kernel.org/r/1503577106-9196-2-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp together.
> > Then, we can remove mutex_lock(&oom_lock) serialization from __oom_reap_task_mm()
> > which still exists because Andrea's patch was accepted instead of Michal's patch.
> 
> We can safely drop the oom_lock from __oom_reap_task_mm now. Andrea
> didn't want to do it in his patch because that is a separate thing
> logically. But nothing should prefent the removal now that AFAICS.

No! The oom_lock in __oom_reap_task_mm() is still required due to the lack
of a really-last-second allocation attempt. If we add such an attempt, we
can remove the oom_lock from __oom_reap_task_mm().



Enter __alloc_pages_may_oom()              Enter __oom_reap_task_mm()

  Take oom_lock

  Try last get_page_from_freelist()

                                             No "take oom_lock" here

                                             Reap memory

                                             Set MMF_OOM_SKIP

                                             No "release oom_lock" here

                                           Leave __oom_reap_task_mm()

  Enter out_of_memory()

    Enter select_bad_process()

      Enter oom_evaluate_task()

        Check if MMF_OOM_SKIP is already set

      Leave oom_evaluate_task()

    Leave select_bad_process()

    No "really last get_page_from_freelist()" here

    Kill the next victim needlessly

  Leave out_of_memory()

  Release oom_lock

Leave __alloc_pages_may_oom()
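
To tie the diagram back to the code, an abbreviated, annotated sketch of the
__alloc_pages_may_oom() flow of that era (not a patch; details omitted),
marking where the missing retry sits:

	if (!mutex_trylock(&oom_lock)) {
		*did_some_progress = 1;
		schedule_timeout_uninterruptible(1);
		return NULL;
	}

	/* The last freelist attempt happens here, BEFORE out_of_memory()... */
	page = get_page_from_freelist(gfp_mask | __GFP_HARDWALL, order,
				      ALLOC_WMARK_HIGH | ALLOC_CPUSET, ac);
	if (page)
		goto out;

	/*
	 * ...so if the OOM reaper frees memory and sets MMF_OOM_SKIP while
	 * out_of_memory() runs, select_bad_process() simply moves on to the
	 * next victim; nothing retries get_page_from_freelist() between
	 * noticing MMF_OOM_SKIP and killing again.
	 */
	if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL))
		*did_some_progress = 1;
out:
	mutex_unlock(&oom_lock);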


* Re: [PATCH] mm,page_alloc: softlockup on warn_alloc on
  2017-09-18  6:31           ` Tetsuo Handa
@ 2017-09-18  6:43             ` Michal Hocko
  0 siblings, 0 replies; 20+ messages in thread
From: Michal Hocko @ 2017-09-18  6:43 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: hannes, yuwang668899, linux-mm, chenggang.qcg, yuwang.yuwang, akpm

On Mon 18-09-17 15:31:31, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > > > The synchronization has worked this way for a long time (trylock
> > > > failure assuming progress, but the order/NOFS/zone bailouts from
> > > > actually OOM-killing inside the locked section). We should really fix
> > > > *that* rather than serializing warn_alloc().
> > > > 
> > > > For GFP_NOFS, it seems to go back to 9879de7373fc ("mm: page_alloc:
> > > > embed OOM killing naturally into allocation slowpath"). Before that we
> > > > didn't use to call __alloc_pages_may_oom() for NOFS allocations. So I
> > > > still wonder why this only now appears to be causing problems.
> > > > 
> > > > In any case, converting that trylock to a sleeping lock in this case
> > > > makes sense to me. Nobody is blocking under this lock (except that one
> > > > schedule_timeout_killable(1) after dispatching a victim) and it's not
> > > > obvious to me why we'd need that level of concurrency under OOM.
> > > 
> > > You can try http://lkml.kernel.org/r/1500202791-5427-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
> > > and http://lkml.kernel.org/r/1503577106-9196-2-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp together.
> > > Then, we can remove mutex_lock(&oom_lock) serialization from __oom_reap_task_mm()
> > > which still exists because Andrea's patch was accepted instead of Michal's patch.
> > 
> > We can safely drop the oom_lock from __oom_reap_task_mm now. Andrea
> > didn't want to do it in his patch because that is a separate thing
> > logically. But nothing should prefent the removal now that AFAICS.
> 
> No! The oom_lock in __oom_reap_task_mm() is still required due to lack of
> really last second allocation attempt. If we do really last second
> allocation attempt, we can remove the oom_lock from __oom_reap_task_mm().

Yes, a race is possible, but this is not a _correctness_ issue. It is
a mere suboptimality. This can and should be addressed separately. I was
not really opposed to your patch moving the last allocation attempt
before oom_kill_process() once all the concerns are clarified.
 
> Enter __alloc_pages_may_oom()              Enter __oom_reap_task_mm()
> 
>   Take oom_lock
> 
>   Try last get_page_from_freelist()
> 
>                                              No "take oom_lock" here
> 
>                                              Reap memory
> 
>                                              Set MMF_OOM_SKIP
> 
>                                              No "release oom_lock" here
> 
>                                            Leave __oom_reap_task_mm()
> 
>   Enter out_of_memory()
> 
>     Enter select_bad_process()
> 
>       Enter oom_evaluate_task()
> 
>         Check if MMF_OOM_SKIP is already set
> 
>       Leave oom_evaluate_task()
> 
>     Leave select_bad_process()
> 
>     No "really last get_page_from_freelist()" here
> 
>     Kill the next victim needlessly
> 
>   Leave out_of_memory()
> 
>   Release oom_lock
> 
> Leave __alloc_pages_may_oom()

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] mm,page_alloc: softlockup on warn_alloc on
  2017-09-15 14:12           ` Tetsuo Handa
  2017-09-15 14:23             ` Michal Hocko
@ 2017-09-24  1:56             ` Tetsuo Handa
  1 sibling, 0 replies; 20+ messages in thread
From: Tetsuo Handa @ 2017-09-24  1:56 UTC (permalink / raw)
  To: yuwang668899
  Cc: mhocko, vbabka, mpatocka, hannes, mgorman, dave.hansen, akpm,
	linux-mm, chenggang.qcg, yuwang.yuwang

Tetsuo Handa wrote:
> Assuming that Wang Yu's trace has
> 
>   RIP: 0010:[<...>]  [<...>] dump_stack+0x.../0x...
> 
> line in the omitted part (like Cong Wang's trace did), I suspect that a thread
> which is holding dump_lock is unable to leave console_unlock() (called from
> printk()) for a long time because many other threads keep calling printk()
> from warn_alloc() while consuming all CPU time.
> 
> Thus, not allowing other threads to consume CPU time / call printk() is a step
> towards isolating it. If the problem still exists even after we make the other
> threads sleep, the real cause is somewhere else. Unfortunately, Cong Wang has
> not yet succeeded in reproducing the problem. If Wang Yu is able to reproduce
> it, we can try setting /proc/sys/kernel/softlockup_all_cpu_backtrace to 1 so
> that we can see what other CPUs are doing.

It seems that Johannes needs more time to get a test result from his production
environment. Meanwhile, for reference, Wang, would you have a chance to retry
your stress test with /proc/sys/kernel/softlockup_all_cpu_backtrace set to 1
(e.g. "echo 1 > /proc/sys/kernel/softlockup_all_cpu_backtrace" before the run)?
I don't have access to environments with many CPUs...

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] mm,page_alloc: softlockup on warn_alloc on
  2017-09-16  4:12   ` Tetsuo Handa
@ 2017-10-11 11:14     ` Tetsuo Handa
  2017-10-18 10:54       ` Tetsuo Handa
  0 siblings, 1 reply; 20+ messages in thread
From: Tetsuo Handa @ 2017-10-11 11:14 UTC (permalink / raw)
  To: hannes; +Cc: yuwang668899, mhocko, linux-mm, chenggang.qcg, yuwang.yuwang, akpm

Tetsuo Handa wrote:
> Johannes Weiner wrote:
> > On Fri, Sep 15, 2017 at 05:58:49PM +0800, wang Yu wrote:
> > > From: "yuwang.yuwang" <yuwang.yuwang@alibaba-inc.com>
> > > 
> > > I found a softlockup when running some stress testcase in 4.9.x,
> > > but i think the mainline have the same problem.
> > > 
> > > call trace:
> > > [365724.502896] NMI watchdog: BUG: soft lockup - CPU#31 stuck for 22s!
> > > [jbd2/sda3-8:1164]
> > 
> > We've started seeing the same thing on 4.11. Tons and tons of
> > allocation stall warnings followed by the soft lock-ups.
> 
> Forgot to comment. Since you are able to reproduce the problem (aren't you?),
> please try setting /proc/sys/kernel/softlockup_all_cpu_backtrace to 1 so that
> we can see what other CPUs are doing. This does not require patching the kernel.
> 
Johannes, were you able to reproduce the problem? I'd like to continue with
the warn_alloc() serialization patch if you can confirm that uncontrolled
flooding of allocation stall warnings can lead to soft lockups.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] mm,page_alloc: softlockup on warn_alloc on
  2017-10-11 11:14     ` Tetsuo Handa
@ 2017-10-18 10:54       ` Tetsuo Handa
  0 siblings, 0 replies; 20+ messages in thread
From: Tetsuo Handa @ 2017-10-18 10:54 UTC (permalink / raw)
  To: mhocko, akpm; +Cc: hannes, yuwang668899, linux-mm, chenggang.qcg, yuwang.yuwang

Tetsuo Handa wrote:
> Tetsuo Handa wrote:
> > Johannes Weiner wrote:
> > > On Fri, Sep 15, 2017 at 05:58:49PM +0800, wang Yu wrote:
> > > > From: "yuwang.yuwang" <yuwang.yuwang@alibaba-inc.com>
> > > > 
> > > > I found a softlockup when running some stress testcase in 4.9.x,
> > > > but i think the mainline have the same problem.
> > > > 
> > > > call trace:
> > > > [365724.502896] NMI watchdog: BUG: soft lockup - CPU#31 stuck for 22s!
> > > > [jbd2/sda3-8:1164]
> > > 
> > > We've started seeing the same thing on 4.11. Tons and tons of
> > > allocation stall warnings followed by the soft lock-ups.
> > 
> > Forgot to comment. Since you are able to reproduce the problem (aren't you?),
> > please try setting /proc/sys/kernel/softlockup_all_cpu_backtrace to 1 so that
> > we can see what other CPUs are doing. This does not require patching the kernel.
> > 
> Johannes, were you able to reproduce the problem? I'd like to continue with
> the warn_alloc() serialization patch if you can confirm that uncontrolled
> flooding of allocation stall warnings can lead to soft lockups.
> 

I hit a soft lockup while examining, with a debug patch (part of which is shown
below), why fork() fails when the virtio_balloon driver has a lot of pages to
free and VIRTIO_BALLOON_F_DEFLATE_ON_OOM is negotiated.

----------------------------------------
diff --git a/kernel/fork.c b/kernel/fork.c
index 07cc743..e806695 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1985,6 +1985,7 @@ static __latent_entropy struct task_struct *copy_process(
 	put_task_stack(p);
 	free_task(p);
 fork_out:
+	WARN_ON(1);
 	return ERR_PTR(retval);
 }
 
diff --git a/mm/util.c b/mm/util.c
index 34e57fae..ce458de 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -675,6 +675,7 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
 error:
 	vm_unacct_memory(pages);
 
+	WARN_ON(1);
 	return -ENOMEM;
 }
 
----------------------------------------

It turned out that fork() was failing because __vm_enough_memory() was failing,
since /proc/sys/vm/overcommit_memory was set to 0. Although the virtio_balloon
driver was ready to release pages when asked via virtballoon_oom_notify() from
out_of_memory(), __vm_enough_memory() was not able to take such pages into
account. As a result, operations which need fork() (e.g. login via ssh) and
operations which call __vm_enough_memory() (e.g. mmap()) were failing without
ever calling out_of_memory().
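
For reference, the rough shape of the OVERCOMMIT_GUESS heuristic in
__vm_enough_memory() (paraphrased from memory rather than copied verbatim from
mm/util.c, and wrapped in a made-up helper name): the estimate is built only
from globally visible counters, so pages sitting in the balloon, which
virtio_balloon could release on demand, appear in none of them.

----------------------------------------
static long guessed_available_pages(void)
{
	long free;

	free  = global_zone_page_state(NR_FREE_PAGES);	     /* truly free pages */
	free += global_node_page_state(NR_FILE_PAGES);	     /* droppable page cache... */
	free -= global_node_page_state(NR_SHMEM);	     /* ...minus shmem, which cannot be dropped */
	free += get_nr_swap_pages();			     /* free swap slots */
	free += global_node_page_state(NR_SLAB_RECLAIMABLE); /* reclaimable slab */

	return free;	/* balloon pages are counted nowhere above */
}
----------------------------------------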

But what I want to say here is that we should serialize warn_alloc() when it
reports allocation stalls, because the essence is the same for warn_alloc() and
for the WARN_ON(1) above. Although warn_alloc() uses ratelimiting, both cases
allow an unbounded number of threads to call printk() concurrently under memory
pressure.

Since offloading printk() to a kernel thread is not yet available, and since
writing to the printk() buffer faster than that thread could flush it to the
consoles (even if offloading became available) results in lost messages, we
should avoid appending to the printk() buffer when printk() is being called from
the same location, in order to reduce the chance of hitting a soft lockup and of
getting unreadably jumbled messages. It is ridiculous to keep depending on
printk() offloading that is not yet available. printk() needs careful
coordination from its callers in order to deliver important messages reliably;
calling printk() in an uncontrolled manner is not free.
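
As a minimal sketch of the kind of serialization I mean (illustrative only, with
a made-up function name, not the actual warn_alloc() patch): let one thread at a
time emit the expensive multi-line diagnostic, and make every other caller return
immediately instead of piling onto printk() and console_unlock().

----------------------------------------
static DEFINE_MUTEX(report_lock);

static void report_once_at_a_time(const char *reason)
{
	if (!mutex_trylock(&report_lock))
		return;			/* another CPU is already reporting */

	pr_warn("%s\n", reason);
	dump_stack();			/* the expensive, multi-line part */

	mutex_unlock(&report_lock);
}
----------------------------------------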

The complete log is at http://I-love.SAKURA.ne.jp/tmp/20171018-softlockup.log.xz .
----------------------------------------
[   63.721863] ------------[ cut here ]------------
[   63.722467] WARNING: CPU: 1 PID: 852 at mm/util.c:678 __vm_enough_memory+0x11f/0x130
[   63.723100] Modules linked in: netconsole ip_set nfnetlink snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep crct10dif_pclmul crc32_pclmul snd_hda_core snd_seq ghash_clmulni_intel snd_seq_device ppdev snd_pcm aesni_intel crypto_simd cryptd glue_helper snd_timer snd joydev sg virtio_balloon pcspkr parport_pc soundcore i2c_piix4 parport xfs libcrc32c sr_mod cdrom ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm virtio_blk drm virtio_net virtio_console ata_piix crc32c_intel serio_raw libata virtio_pci i2c_core virtio_ring virtio floppy
[   63.726200] CPU: 1 PID: 852 Comm: sshd Tainted: G        W       4.14.0-rc5+ #303
[   63.726831] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[   63.727498] task: ffff8e8cf07b8000 task.stack: ffff9ad48179c000
[   63.728282] RIP: 0010:__vm_enough_memory+0x11f/0x130
[   63.728964] RSP: 0018:ffff9ad48179fd58 EFLAGS: 00010257
[   63.729621] RAX: 000000000001fc2c RBX: fffffffffffffffc RCX: 00000000000007ca
[   63.730281] RDX: fffffffffffff836 RSI: fffffffffffffffc RDI: ffffffff914a4fc0
[   63.730924] RBP: ffff9ad48179fd70 R08: ffffffff9152f530 R09: ffff9ad48179fcfc
[   63.731582] R10: 0000000000000020 R11: ffff8e8cf61142a8 R12: 0000000000000001
[   63.732250] R13: ffff8e8cd6b47800 R14: ffff8e8cf6ebdc80 R15: 0000000000000004
[   63.732903] FS:  00007feb5d71f8c0(0000) GS:ffff8e8cffc80000(0000) knlGS:0000000000000000
[   63.733579] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   63.734258] CR2: 00007feb5aaf38c0 CR3: 0000000230ab9000 CR4: 00000000001406e0
[   63.734936] Call Trace:
[   63.735623]  security_vm_enough_memory_mm+0x53/0x60
[   63.736310]  copy_process.part.34+0xd7d/0x1d60
[   63.736976]  _do_fork+0xed/0x390
[   63.737653]  SyS_clone+0x19/0x20
[   63.738359]  do_syscall_64+0x67/0x1b0
[   63.739030]  entry_SYSCALL64_slow_path+0x25/0x25
[   63.739726] RIP: 0033:0x7feb5ab39291
[   63.740390] RSP: 002b:00007fff7ff715c0 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
(... 1000+ mostly "WARNING: CPU: 2 PID: 417 at mm/util.c:678 __vm_enough_memory+0x11f/0x130" lines snipped...)
[   88.005016] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [sshd:852]
[   88.005017] Modules linked in: netconsole ip_set nfnetlink snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep crct10dif_pclmul crc32_pclmul snd_hda_core snd_seq ghash_clmulni_intel snd_seq_device ppdev snd_pcm aesni_intel crypto_simd cryptd glue_helper snd_timer snd joydev sg virtio_balloon pcspkr parport_pc soundcore i2c_piix4 parport xfs libcrc32c sr_mod cdrom ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm virtio_blk drm virtio_net virtio_console ata_piix crc32c_intel serio_raw libata virtio_pci i2c_core virtio_ring virtio floppy
[   88.005036] CPU: 1 PID: 852 Comm: sshd Tainted: G        W       4.14.0-rc5+ #303
[   88.005036] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[   88.005037] task: ffff8e8cf07b8000 task.stack: ffff9ad48179c000
[   88.005041] RIP: 0010:console_unlock+0x24e/0x4c0
[   88.005042] RSP: 0018:ffff9ad48179f810 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
[   88.005043] RAX: 0000000000000001 RBX: 0000000000000051 RCX: ffff8e8cee406000
[   88.005043] RDX: 0000000000000051 RSI: 0000000000000087 RDI: 0000000000000246
[   88.005044] RBP: ffff9ad48179f850 R08: 0000000001080020 R09: 00007174c0000000
[   88.005044] R10: 0000000000000c06 R11: 000000000000000c R12: 0000000000000400
[   88.005045] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000051
[   88.005050] FS:  00007feb5d71f8c0(0000) GS:ffff8e8cffc80000(0000) knlGS:0000000000000000
[   88.005050] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   88.005051] CR2: 00007feb5aaf38c0 CR3: 0000000230ab9000 CR4: 00000000001406e0
[   88.005054] Call Trace:
[   88.005058]  vprintk_emit+0x2f5/0x3a0
[   88.005061]  ? entry_SYSCALL64_slow_path+0x25/0x25
[   88.005061]  vprintk_default+0x29/0x50
[   88.005063]  vprintk_func+0x27/0x60
[   88.005064]  printk+0x58/0x6f
[   88.005065]  ? entry_SYSCALL64_slow_path+0x25/0x25
[   88.005067]  __show_regs+0x7a/0x2d0
[   88.005068]  ? printk+0x58/0x6f
[   88.005069]  ? entry_SYSCALL64_slow_path+0x25/0x25
[   88.005070]  ? entry_SYSCALL64_slow_path+0x25/0x25
[   88.005071]  show_trace_log_lvl+0x2bf/0x410
[   88.005073]  show_regs+0x9f/0x1a0
[   88.005076]  ? __vm_enough_memory+0x11f/0x130
[   88.005077]  __warn+0x9b/0xeb
[   88.005078]  ? __vm_enough_memory+0x11f/0x130
[   88.005080]  report_bug+0x87/0x100
[   88.005081]  fixup_bug+0x2c/0x50
[   88.005083]  do_trap+0x12e/0x180
[   88.005084]  do_error_trap+0x89/0x110
[   88.005085]  ? __vm_enough_memory+0x11f/0x130
[   88.005087]  ? avc_has_perm_noaudit+0xca/0x140
[   88.005089]  do_invalid_op+0x20/0x30
[   88.005090]  invalid_op+0x18/0x20
[   88.005091] RIP: 0010:__vm_enough_memory+0x11f/0x130
[   88.005091] RSP: 0018:ffff9ad48179fd58 EFLAGS: 00010257
[   88.005092] RAX: 000000000001fc2c RBX: fffffffffffffffc RCX: 00000000000007ca
[   88.005092] RDX: fffffffffffff836 RSI: fffffffffffffffc RDI: ffffffff914a4fc0
[   88.005093] RBP: ffff9ad48179fd70 R08: ffffffff9152f530 R09: ffff9ad48179fcfc
[   88.005093] R10: 0000000000000020 R11: ffff8e8cf61142a8 R12: 0000000000000001
[   88.005094] R13: ffff8e8cd6b47800 R14: ffff8e8cf6ebdc80 R15: 0000000000000004
[   88.005096]  ? __vm_enough_memory+0x11f/0x130
[   88.005098]  security_vm_enough_memory_mm+0x53/0x60
[   88.005100]  copy_process.part.34+0xd7d/0x1d60
[   88.005102]  _do_fork+0xed/0x390
[   88.005104]  SyS_clone+0x19/0x20
[   88.005106]  do_syscall_64+0x67/0x1b0
[   88.005107]  entry_SYSCALL64_slow_path+0x25/0x25
[   88.005108] RIP: 0033:0x7feb5ab39291
[   88.005108] RSP: 002b:00007fff7ff715c0 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
[   88.005109] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007feb5ab39291
[   88.005109] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
[   88.005110] RBP: 00007fff7ff71600 R08: 00007feb5d71f8c0 R09: 0000000000000354
[   88.005110] R10: 00007feb5d71fb90 R11: 0000000000000246 R12: 00007fff7ff715c0
[   88.005111] R13: 0000000000000000 R14: 0000000000000000 R15: 000055e03ff11e08
[   88.005111] Code: a8 40 0f 84 ad 01 00 00 4c 89 f7 44 89 ea 48 c7 c6 c0 a3 2d 91 ff d1 4d 8b 76 50 4d 85 f6 75 a6 e8 88 17 00 00 48 8b 7d d0 57 9d <0f> 1f 44 00 00 8b 55 c8 85 d2 0f 84 26 fe ff ff e8 fd 29 68 00
[   88.161470] virtio_balloon virtio3: Released 256 pages. Remains 1977163 pages.
[   88.277947] virtio_balloon virtio3: Released 256 pages. Remains 1976907 pages.
[   88.279782] virtio_balloon virtio3: Released 256 pages. Remains 1976651 pages.
[   90.526651] virtio_balloon virtio3: Released 256 pages. Remains 1976395 pages.
[   91.714541] virtio_balloon virtio3: Released 256 pages. Remains 1976139 pages.
[   92.900111] virtio_balloon virtio3: Released 256 pages. Remains 1975883 pages.
[   93.379880] virtio_balloon virtio3: Released 256 pages. Remains 1975627 pages.
[   93.408485] virtio_balloon virtio3: Released 256 pages. Remains 1975371 pages.
[   94.567452] virtio_balloon virtio3: Released 256 pages. Remains 1975115 pages.
[   95.194615] virtio_balloon virtio3: Released 256 pages. Remains 1974859 pages.
[   95.458030] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007feb5ab39291
[   95.458453] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
[   95.458873] RBP: 00007fff7ff71600 R08: 00007feb5d71f8c0 R09: 0000000000000354
[   95.459307] R10: 00007feb5d71fb90 R11: 0000000000000246 R12: 00007fff7ff715c0
[   95.459732] R13: 0000000000000000 R14: 0000000000000000 R15: 000055e03ff11e08
[   95.460172] Code: 0d 9f 41 0d 01 31 d2 48 85 c9 48 0f 49 d1 48 39 d0 7f a5 8b 15 23 c5 b2 00 48 f7 db 48 c7 c7 c0 4f 4a 91 48 89 de e8 d1 06 1e 00 <0f> ff b8 f4 ff ff ff eb 86 0f 1f 84 00 00 00 00 00 0f 1f 44 00
[   95.461110] ---[ end trace 7b4eb70d4e6de603 ]---
[   95.461622] ------------[ cut here ]------------
[   95.462147] WARNING: CPU: 2 PID: 852 at kernel/fork.c:1988 copy_process.part.34+0x5ce/0x1d60
[   95.462675] Modules linked in: netconsole ip_set nfnetlink snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep crct10dif_pclmul crc32_pclmul snd_hda_core snd_seq ghash_clmulni_intel snd_seq_device ppdev snd_pcm aesni_intel crypto_simd cryptd glue_helper snd_timer snd joydev sg virtio_balloon pcspkr parport_pc soundcore i2c_piix4 parport xfs libcrc32c sr_mod cdrom ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm virtio_blk drm virtio_net virtio_console ata_piix crc32c_intel serio_raw libata virtio_pci i2c_core virtio_ring virtio floppy
[   95.465775] CPU: 2 PID: 852 Comm: sshd Tainted: G        W    L  4.14.0-rc5+ #303
[   95.466404] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[   95.467015] task: ffff8e8cf07b8000 task.stack: ffff9ad48179c000
[   95.467705] RIP: 0010:copy_process.part.34+0x5ce/0x1d60
[   95.468314] RSP: 0000:ffff9ad48179fda8 EFLAGS: 00010202
[   95.468902] RAX: 0000000000000001 RBX: ffff8e8cf07bae80 RCX: 0000000180050004
[   95.469520] RDX: 0000000180050005 RSI: fffff98608c1ee00 RDI: 0000000044040000
[   95.470139] RBP: ffff9ad48179fe90 R08: ffff8e8cf07bae80 R09: 0000000180050004
[   95.470724] R10: 0000000000000001 R11: ffff8e8cf07bae80 R12: fffffffffffffff4
[   95.471348] R13: 0000000000000000 R14: ffff8e8cf6ebdc80 R15: 0000000000000000
[   95.471964] FS:  00007feb5d71f8c0(0000) GS:ffff8e8cffd00000(0000) knlGS:0000000000000000
[   95.472607] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   95.473246] CR2: 00007f7d9da40690 CR3: 0000000230ab9000 CR4: 00000000001406e0
[   95.473870] Call Trace:
[   95.474593]  _do_fork+0xed/0x390
[   95.475267]  SyS_clone+0x19/0x20
[   95.475901]  do_syscall_64+0x67/0x1b0
[   95.476554]  entry_SYSCALL64_slow_path+0x25/0x25
[   95.477241] RIP: 0033:0x7feb5ab39291
[   95.477857] RSP: 002b:00007fff7ff715c0 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
[   95.478534] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007feb5ab39291
[   95.479187] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
[   95.479785] RBP: 00007fff7ff71600 R08: 00007feb5d71f8c0 R09: 0000000000000354
[   95.480391] R10: 00007feb5d71fb90 R11: 0000000000000246 R12: 00007fff7ff715c0
[   95.480988] R13: 0000000000000000 R14: 0000000000000000 R15: 000055e03ff11e08
[   95.481610] Code: 48 8b 80 80 00 00 00 f0 ff 48 04 48 89 df e8 ba 58 02 00 48 89 df 48 c7 43 08 80 00 00 00 e8 3a f3 ff ff 48 89 df e8 52 f1 ff ff <0f> ff 4c 89 e3 e9 ba fa ff ff 4c 63 e0 eb d9 65 48 8b 05 63 5b
[   95.482925] ---[ end trace 7b4eb70d4e6de604 ]---
----------------------------------------

^ permalink raw reply related	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2017-10-18 10:54 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-15  9:58 [PATCH] mm,page_alloc: softlockup on warn_alloc on wang Yu
2017-09-15 10:39 ` Michal Hocko
2017-09-15 11:38   ` Tetsuo Handa
2017-09-15 12:00     ` Michal Hocko
2017-09-15 12:09       ` Tetsuo Handa
2017-09-15 12:14         ` Michal Hocko
2017-09-15 14:12           ` Tetsuo Handa
2017-09-15 14:23             ` Michal Hocko
2017-09-24  1:56             ` Tetsuo Handa
2017-09-15 14:37 ` Johannes Weiner
2017-09-15 15:23   ` Tetsuo Handa
2017-09-15 18:44     ` Johannes Weiner
2017-09-16  0:25       ` Tetsuo Handa
2017-09-18  6:05         ` Michal Hocko
2017-09-18  6:31           ` Tetsuo Handa
2017-09-18  6:43             ` Michal Hocko
2017-09-16  4:12   ` Tetsuo Handa
2017-10-11 11:14     ` Tetsuo Handa
2017-10-18 10:54       ` Tetsuo Handa
2017-09-18  6:03   ` Michal Hocko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).