All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] mm: warn about allocations which stall for too long
@ 2016-09-23  8:15 ` Michal Hocko
  0 siblings, 0 replies; 42+ messages in thread
From: Michal Hocko @ 2016-09-23  8:15 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Johannes Weiner, Mel Gorman, Tetsuo Handa, LKML,
	Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Currently we do warn only about allocation failures but small
allocations are basically nofail and they might loop in the page
allocator for a long time.  Especially when the reclaim cannot make
any progress - e.g. GFP_NOFS cannot invoke the oom killer and rely on
a different context to make a forward progress in case there is a lot
memory used by filesystems.

Give us at least a clue when something like this happens and warn about
allocations which take more than 10s. Print the basic allocation context
information along with the cumulative time spent in the allocation as
well as the allocation stack. Repeat the warning after every 10 seconds so
that we know that the problem is permanent rather than ephemeral.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---

Hi,
I am sending this as an RFC because I am not really sure what is the reasonable
timeout when to warn. I went with 10s because that should be close to "for ever"
from the user perspective. But maybe a shorter would be helpful as well?
I didn't go with a tunable because I would rather not add a new one.

Thoughts? Ideas?

 mm/page_alloc.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5155485057cb..d5faab8aa94d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3485,6 +3485,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	enum compact_result compact_result;
 	int compaction_retries = 0;
 	int no_progress_loops = 0;
+	unsigned long alloc_start = jiffies;
+	unsigned int stall_timeout = 10 * HZ;
 
 	/*
 	 * In the slowpath, we sanity check order to avoid ever trying to
@@ -3659,6 +3661,15 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	else
 		no_progress_loops++;
 
+	/* Make sure we know about allocations which stall for too long */
+	if (!(gfp_mask & __GFP_NOWARN) && time_after(jiffies, alloc_start + stall_timeout)) {
+		pr_warn("%s: page alloction stalls for %ums: order:%u mode:%#x(%pGg)\n",
+				current->comm, jiffies_to_msecs(jiffies-alloc_start),
+				order, gfp_mask, &gfp_mask);
+		stall_timeout += 10 * HZ;
+		dump_stack();
+	}
+
 	if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
 				 did_some_progress > 0, no_progress_loops))
 		goto retry;
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH] mm: warn about allocations which stall for too long
@ 2016-09-23  8:15 ` Michal Hocko
  0 siblings, 0 replies; 42+ messages in thread
From: Michal Hocko @ 2016-09-23  8:15 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Johannes Weiner, Mel Gorman, Tetsuo Handa, LKML,
	Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Currently we do warn only about allocation failures but small
allocations are basically nofail and they might loop in the page
allocator for a long time.  Especially when the reclaim cannot make
any progress - e.g. GFP_NOFS cannot invoke the oom killer and rely on
a different context to make a forward progress in case there is a lot
memory used by filesystems.

Give us at least a clue when something like this happens and warn about
allocations which take more than 10s. Print the basic allocation context
information along with the cumulative time spent in the allocation as
well as the allocation stack. Repeat the warning after every 10 seconds so
that we know that the problem is permanent rather than ephemeral.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---

Hi,
I am sending this as an RFC because I am not really sure what is the reasonable
timeout when to warn. I went with 10s because that should be close to "for ever"
from the user perspective. But maybe a shorter would be helpful as well?
I didn't go with a tunable because I would rather not add a new one.

Thoughts? Ideas?

 mm/page_alloc.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5155485057cb..d5faab8aa94d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3485,6 +3485,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	enum compact_result compact_result;
 	int compaction_retries = 0;
 	int no_progress_loops = 0;
+	unsigned long alloc_start = jiffies;
+	unsigned int stall_timeout = 10 * HZ;
 
 	/*
 	 * In the slowpath, we sanity check order to avoid ever trying to
@@ -3659,6 +3661,15 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	else
 		no_progress_loops++;
 
+	/* Make sure we know about allocations which stall for too long */
+	if (!(gfp_mask & __GFP_NOWARN) && time_after(jiffies, alloc_start + stall_timeout)) {
+		pr_warn("%s: page alloction stalls for %ums: order:%u mode:%#x(%pGg)\n",
+				current->comm, jiffies_to_msecs(jiffies-alloc_start),
+				order, gfp_mask, &gfp_mask);
+		stall_timeout += 10 * HZ;
+		dump_stack();
+	}
+
 	if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
 				 did_some_progress > 0, no_progress_loops))
 		goto retry;
-- 
2.9.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH] mm: warn about allocations which stall for too long
  2016-09-23  8:15 ` Michal Hocko
@ 2016-09-23  8:29   ` Hillf Danton
  -1 siblings, 0 replies; 42+ messages in thread
From: Hillf Danton @ 2016-09-23  8:29 UTC (permalink / raw)
  To: 'Michal Hocko', linux-mm
  Cc: 'Andrew Morton', 'Johannes Weiner',
	'Mel Gorman', 'Tetsuo Handa', 'LKML',
	'Michal Hocko'

> 
> From: Michal Hocko <mhocko@suse.com>
> 
> Currently we do warn only about allocation failures but small
> allocations are basically nofail and they might loop in the page
> allocator for a long time.  Especially when the reclaim cannot make
> any progress - e.g. GFP_NOFS cannot invoke the oom killer and rely on
> a different context to make a forward progress in case there is a lot
> memory used by filesystems.
> 
> Give us at least a clue when something like this happens and warn about
> allocations which take more than 10s. Print the basic allocation context
> information along with the cumulative time spent in the allocation as
> well as the allocation stack. Repeat the warning after every 10 seconds so
> that we know that the problem is permanent rather than ephemeral.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
> 
> Hi,
> I am sending this as an RFC because I am not really sure what is the reasonable
> timeout when to warn. I went with 10s because that should be close to "for ever"
> from the user perspective. But maybe a shorter would be helpful as well?
> I didn't go with a tunable because I would rather not add a new one.
> 
> Thoughts? Ideas?
> 
>  mm/page_alloc.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 5155485057cb..d5faab8aa94d 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3485,6 +3485,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  	enum compact_result compact_result;
>  	int compaction_retries = 0;
>  	int no_progress_loops = 0;
> +	unsigned long alloc_start = jiffies;
> +	unsigned int stall_timeout = 10 * HZ;
> 
>  	/*
>  	 * In the slowpath, we sanity check order to avoid ever trying to
> @@ -3659,6 +3661,15 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  	else
>  		no_progress_loops++;
> 
> +	/* Make sure we know about allocations which stall for too long */
> +	if (!(gfp_mask & __GFP_NOWARN) && time_after(jiffies, alloc_start + stall_timeout)) {
> +		pr_warn("%s: page alloction stalls for %ums: order:%u mode:%#x(%pGg)\n",
> +				current->comm, jiffies_to_msecs(jiffies-alloc_start),

Better if pid is also printed.
> +				order, gfp_mask, &gfp_mask);
> +		stall_timeout += 10 * HZ;

Alternatively	 alloc_start = jiffies;

> +		dump_stack();
> +	}
> +
>  	if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
>  				 did_some_progress > 0, no_progress_loops))
>  		goto retry;
> --
> 2.9.3
> 
thanks
Hillf

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] mm: warn about allocations which stall for too long
@ 2016-09-23  8:29   ` Hillf Danton
  0 siblings, 0 replies; 42+ messages in thread
From: Hillf Danton @ 2016-09-23  8:29 UTC (permalink / raw)
  To: 'Michal Hocko', linux-mm
  Cc: 'Andrew Morton', 'Johannes Weiner',
	'Mel Gorman', 'Tetsuo Handa', 'LKML',
	'Michal Hocko'

> 
> From: Michal Hocko <mhocko@suse.com>
> 
> Currently we do warn only about allocation failures but small
> allocations are basically nofail and they might loop in the page
> allocator for a long time.  Especially when the reclaim cannot make
> any progress - e.g. GFP_NOFS cannot invoke the oom killer and rely on
> a different context to make a forward progress in case there is a lot
> memory used by filesystems.
> 
> Give us at least a clue when something like this happens and warn about
> allocations which take more than 10s. Print the basic allocation context
> information along with the cumulative time spent in the allocation as
> well as the allocation stack. Repeat the warning after every 10 seconds so
> that we know that the problem is permanent rather than ephemeral.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
> 
> Hi,
> I am sending this as an RFC because I am not really sure what is the reasonable
> timeout when to warn. I went with 10s because that should be close to "for ever"
> from the user perspective. But maybe a shorter would be helpful as well?
> I didn't go with a tunable because I would rather not add a new one.
> 
> Thoughts? Ideas?
> 
>  mm/page_alloc.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 5155485057cb..d5faab8aa94d 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3485,6 +3485,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  	enum compact_result compact_result;
>  	int compaction_retries = 0;
>  	int no_progress_loops = 0;
> +	unsigned long alloc_start = jiffies;
> +	unsigned int stall_timeout = 10 * HZ;
> 
>  	/*
>  	 * In the slowpath, we sanity check order to avoid ever trying to
> @@ -3659,6 +3661,15 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  	else
>  		no_progress_loops++;
> 
> +	/* Make sure we know about allocations which stall for too long */
> +	if (!(gfp_mask & __GFP_NOWARN) && time_after(jiffies, alloc_start + stall_timeout)) {
> +		pr_warn("%s: page alloction stalls for %ums: order:%u mode:%#x(%pGg)\n",
> +				current->comm, jiffies_to_msecs(jiffies-alloc_start),

Better if pid is also printed.
> +				order, gfp_mask, &gfp_mask);
> +		stall_timeout += 10 * HZ;

Alternatively	 alloc_start = jiffies;

> +		dump_stack();
> +	}
> +
>  	if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
>  				 did_some_progress > 0, no_progress_loops))
>  		goto retry;
> --
> 2.9.3
> 
thanks
Hillf

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] mm: warn about allocations which stall for too long
  2016-09-23  8:29   ` Hillf Danton
@ 2016-09-23  8:32     ` Michal Hocko
  -1 siblings, 0 replies; 42+ messages in thread
From: Michal Hocko @ 2016-09-23  8:32 UTC (permalink / raw)
  To: Hillf Danton
  Cc: linux-mm, 'Andrew Morton', 'Johannes Weiner',
	'Mel Gorman', 'Tetsuo Handa', 'LKML'

On Fri 23-09-16 16:29:36, Hillf Danton wrote:
[...]
> > @@ -3659,6 +3661,15 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> >  	else
> >  		no_progress_loops++;
> > 
> > +	/* Make sure we know about allocations which stall for too long */
> > +	if (!(gfp_mask & __GFP_NOWARN) && time_after(jiffies, alloc_start + stall_timeout)) {
> > +		pr_warn("%s: page alloction stalls for %ums: order:%u mode:%#x(%pGg)\n",
> > +				current->comm, jiffies_to_msecs(jiffies-alloc_start),
> 
> Better if pid is also printed.

I've tried to be consistent with warn_alloc_failed and that doesn't
print pid either. Maybe both of them should. Dunno

> > +				order, gfp_mask, &gfp_mask);
> > +		stall_timeout += 10 * HZ;
> 
> Alternatively	 alloc_start = jiffies;

Then we would lose the cumulative time in the output which is imho
helpful because you cannot tell whether the new warning is a new request
or the old one still looping.

> > +		dump_stack();
> > +	}
> > +
> >  	if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
> >  				 did_some_progress > 0, no_progress_loops))
> >  		goto retry;
> > --
> > 2.9.3
> > 
> thanks
> Hillf
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] mm: warn about allocations which stall for too long
@ 2016-09-23  8:32     ` Michal Hocko
  0 siblings, 0 replies; 42+ messages in thread
From: Michal Hocko @ 2016-09-23  8:32 UTC (permalink / raw)
  To: Hillf Danton
  Cc: linux-mm, 'Andrew Morton', 'Johannes Weiner',
	'Mel Gorman', 'Tetsuo Handa', 'LKML'

On Fri 23-09-16 16:29:36, Hillf Danton wrote:
[...]
> > @@ -3659,6 +3661,15 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> >  	else
> >  		no_progress_loops++;
> > 
> > +	/* Make sure we know about allocations which stall for too long */
> > +	if (!(gfp_mask & __GFP_NOWARN) && time_after(jiffies, alloc_start + stall_timeout)) {
> > +		pr_warn("%s: page alloction stalls for %ums: order:%u mode:%#x(%pGg)\n",
> > +				current->comm, jiffies_to_msecs(jiffies-alloc_start),
> 
> Better if pid is also printed.

I've tried to be consistent with warn_alloc_failed and that doesn't
print pid either. Maybe both of them should. Dunno

> > +				order, gfp_mask, &gfp_mask);
> > +		stall_timeout += 10 * HZ;
> 
> Alternatively	 alloc_start = jiffies;

Then we would lose the cumulative time in the output which is imho
helpful because you cannot tell whether the new warning is a new request
or the old one still looping.

> > +		dump_stack();
> > +	}
> > +
> >  	if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
> >  				 did_some_progress > 0, no_progress_loops))
> >  		goto retry;
> > --
> > 2.9.3
> > 
> thanks
> Hillf
> 

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] mm: warn about allocations which stall for too long
  2016-09-23  8:32     ` Michal Hocko
@ 2016-09-23  8:44       ` Hillf Danton
  -1 siblings, 0 replies; 42+ messages in thread
From: Hillf Danton @ 2016-09-23  8:44 UTC (permalink / raw)
  To: 'Michal Hocko'
  Cc: linux-mm, 'Andrew Morton', 'Johannes Weiner',
	'Mel Gorman', 'Tetsuo Handa', 'LKML'

On Friday, September 23, 2016 4:32 PM, Michal Hocko wrote
> On Fri 23-09-16 16:29:36, Hillf Danton wrote:
> [...]
> > > @@ -3659,6 +3661,15 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> > >  	else
> > >  		no_progress_loops++;
> > >
> > > +	/* Make sure we know about allocations which stall for too long */
> > > +	if (!(gfp_mask & __GFP_NOWARN) && time_after(jiffies, alloc_start + stall_timeout)) {
> > > +		pr_warn("%s: page alloction stalls for %ums: order:%u mode:%#x(%pGg)\n",
> > > +				current->comm, jiffies_to_msecs(jiffies-alloc_start),
> >
> > Better if pid is also printed.
> 
> I've tried to be consistent with warn_alloc_failed and that doesn't
> print pid either. Maybe both of them should. Dunno
> 
With pid imho we can distinguish two tasks with same name in a simpler way. 

> > > +				order, gfp_mask, &gfp_mask);
> > > +		stall_timeout += 10 * HZ;
> >
> > Alternatively	 alloc_start = jiffies;
> 
> Then we would lose the cumulative time in the output which is imho
> helpful because you cannot tell whether the new warning is a new request
> or the old one still looping.
> 
Fair.

thanks
Hillf

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] mm: warn about allocations which stall for too long
@ 2016-09-23  8:44       ` Hillf Danton
  0 siblings, 0 replies; 42+ messages in thread
From: Hillf Danton @ 2016-09-23  8:44 UTC (permalink / raw)
  To: 'Michal Hocko'
  Cc: linux-mm, 'Andrew Morton', 'Johannes Weiner',
	'Mel Gorman', 'Tetsuo Handa', 'LKML'

On Friday, September 23, 2016 4:32 PM, Michal Hocko wrote
> On Fri 23-09-16 16:29:36, Hillf Danton wrote:
> [...]
> > > @@ -3659,6 +3661,15 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> > >  	else
> > >  		no_progress_loops++;
> > >
> > > +	/* Make sure we know about allocations which stall for too long */
> > > +	if (!(gfp_mask & __GFP_NOWARN) && time_after(jiffies, alloc_start + stall_timeout)) {
> > > +		pr_warn("%s: page alloction stalls for %ums: order:%u mode:%#x(%pGg)\n",
> > > +				current->comm, jiffies_to_msecs(jiffies-alloc_start),
> >
> > Better if pid is also printed.
> 
> I've tried to be consistent with warn_alloc_failed and that doesn't
> print pid either. Maybe both of them should. Dunno
> 
With pid imho we can distinguish two tasks with same name in a simpler way. 

> > > +				order, gfp_mask, &gfp_mask);
> > > +		stall_timeout += 10 * HZ;
> >
> > Alternatively	 alloc_start = jiffies;
> 
> Then we would lose the cumulative time in the output which is imho
> helpful because you cannot tell whether the new warning is a new request
> or the old one still looping.
> 
Fair.

thanks
Hillf

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] mm: warn about allocations which stall for too long
  2016-09-23  8:44       ` Hillf Danton
@ 2016-09-23  9:15         ` Michal Hocko
  -1 siblings, 0 replies; 42+ messages in thread
From: Michal Hocko @ 2016-09-23  9:15 UTC (permalink / raw)
  To: Hillf Danton
  Cc: linux-mm, 'Andrew Morton', 'Johannes Weiner',
	'Mel Gorman', 'Tetsuo Handa', 'LKML'

On Fri 23-09-16 16:44:26, Hillf Danton wrote:
> On Friday, September 23, 2016 4:32 PM, Michal Hocko wrote
> > On Fri 23-09-16 16:29:36, Hillf Danton wrote:
> > [...]
> > > > @@ -3659,6 +3661,15 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> > > >  	else
> > > >  		no_progress_loops++;
> > > >
> > > > +	/* Make sure we know about allocations which stall for too long */
> > > > +	if (!(gfp_mask & __GFP_NOWARN) && time_after(jiffies, alloc_start + stall_timeout)) {
> > > > +		pr_warn("%s: page alloction stalls for %ums: order:%u mode:%#x(%pGg)\n",
> > > > +				current->comm, jiffies_to_msecs(jiffies-alloc_start),
> > >
> > > Better if pid is also printed.
> > 
> > I've tried to be consistent with warn_alloc_failed and that doesn't
> > print pid either. Maybe both of them should. Dunno
> > 
> With pid imho we can distinguish two tasks with same name in a simpler way. 

I've just checked dump_stack and dump_stack_print_info provides that
information already.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] mm: warn about allocations which stall for too long
@ 2016-09-23  9:15         ` Michal Hocko
  0 siblings, 0 replies; 42+ messages in thread
From: Michal Hocko @ 2016-09-23  9:15 UTC (permalink / raw)
  To: Hillf Danton
  Cc: linux-mm, 'Andrew Morton', 'Johannes Weiner',
	'Mel Gorman', 'Tetsuo Handa', 'LKML'

On Fri 23-09-16 16:44:26, Hillf Danton wrote:
> On Friday, September 23, 2016 4:32 PM, Michal Hocko wrote
> > On Fri 23-09-16 16:29:36, Hillf Danton wrote:
> > [...]
> > > > @@ -3659,6 +3661,15 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> > > >  	else
> > > >  		no_progress_loops++;
> > > >
> > > > +	/* Make sure we know about allocations which stall for too long */
> > > > +	if (!(gfp_mask & __GFP_NOWARN) && time_after(jiffies, alloc_start + stall_timeout)) {
> > > > +		pr_warn("%s: page alloction stalls for %ums: order:%u mode:%#x(%pGg)\n",
> > > > +				current->comm, jiffies_to_msecs(jiffies-alloc_start),
> > >
> > > Better if pid is also printed.
> > 
> > I've tried to be consistent with warn_alloc_failed and that doesn't
> > print pid either. Maybe both of them should. Dunno
> > 
> With pid imho we can distinguish two tasks with same name in a simpler way. 

I've just checked dump_stack and dump_stack_print_info provides that
information already.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] mm: warn about allocations which stall for too long
  2016-09-23  8:15 ` Michal Hocko
@ 2016-09-23 14:36   ` Tetsuo Handa
  -1 siblings, 0 replies; 42+ messages in thread
From: Tetsuo Handa @ 2016-09-23 14:36 UTC (permalink / raw)
  To: mhocko, linux-mm; +Cc: akpm, hannes, mgorman, linux-kernel, mhocko

Michal Hocko wrote:
> @@ -3659,6 +3661,15 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  	else
>  		no_progress_loops++;
>  
> +	/* Make sure we know about allocations which stall for too long */
> +	if (!(gfp_mask & __GFP_NOWARN) && time_after(jiffies, alloc_start + stall_timeout)) {

Should we check !__GFP_NOWARN ? I think __GFP_NOWARN is likely used with
__GFP_NORETRY, and __GFP_NORETRY is already checked by now.

I think printing warning regardless of __GFP_NOWARN is better because
this check is similar to hungtask warning.

> +		pr_warn("%s: page alloction stalls for %ums: order:%u mode:%#x(%pGg)\n",
> +				current->comm, jiffies_to_msecs(jiffies-alloc_start),
> +				order, gfp_mask, &gfp_mask);
> +		stall_timeout += 10 * HZ;
> +		dump_stack();

Can we move this pr_warn() + dump_stack() to a separate function like

static void __warn_memalloc_stall(unsigned int order, gfp_t gfp_mask, unsigned long alloc_start)
{
	pr_warn("%s: page alloction stalls for %ums: order:%u mode:%#x(%pGg)\n",
		current->comm, jiffies_to_msecs(jiffies-alloc_start),
		order, gfp_mask, &gfp_mask);
	dump_stack();
}

in order to allow SystemTap scripts to perform additional actions by name (e.g.

# stap -g -e 'probe kernel.function("__warn_memalloc_stall").return { panic(); }

) rather than by line number, and surround __warn_memalloc_stall() call with
mutex in order to serialize warning messages because it is possible that
multiple allocation requests are stalling?

> +	}
> +
>  	if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
>  				 did_some_progress > 0, no_progress_loops))
>  		goto retry;
> -- 
> 2.9.3

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] mm: warn about allocations which stall for too long
@ 2016-09-23 14:36   ` Tetsuo Handa
  0 siblings, 0 replies; 42+ messages in thread
From: Tetsuo Handa @ 2016-09-23 14:36 UTC (permalink / raw)
  To: mhocko, linux-mm; +Cc: akpm, hannes, mgorman, linux-kernel, mhocko

Michal Hocko wrote:
> @@ -3659,6 +3661,15 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  	else
>  		no_progress_loops++;
>  
> +	/* Make sure we know about allocations which stall for too long */
> +	if (!(gfp_mask & __GFP_NOWARN) && time_after(jiffies, alloc_start + stall_timeout)) {

Should we check !__GFP_NOWARN ? I think __GFP_NOWARN is likely used with
__GFP_NORETRY, and __GFP_NORETRY is already checked by now.

I think printing warning regardless of __GFP_NOWARN is better because
this check is similar to hungtask warning.

> +		pr_warn("%s: page alloction stalls for %ums: order:%u mode:%#x(%pGg)\n",
> +				current->comm, jiffies_to_msecs(jiffies-alloc_start),
> +				order, gfp_mask, &gfp_mask);
> +		stall_timeout += 10 * HZ;
> +		dump_stack();

Can we move this pr_warn() + dump_stack() to a separate function like

static void __warn_memalloc_stall(unsigned int order, gfp_t gfp_mask, unsigned long alloc_start)
{
	pr_warn("%s: page alloction stalls for %ums: order:%u mode:%#x(%pGg)\n",
		current->comm, jiffies_to_msecs(jiffies-alloc_start),
		order, gfp_mask, &gfp_mask);
	dump_stack();
}

in order to allow SystemTap scripts to perform additional actions by name (e.g.

# stap -g -e 'probe kernel.function("__warn_memalloc_stall").return { panic(); }

) rather than by line number, and surround __warn_memalloc_stall() call with
mutex in order to serialize warning messages because it is possible that
multiple allocation requests are stalling?

> +	}
> +
>  	if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
>  				 did_some_progress > 0, no_progress_loops))
>  		goto retry;
> -- 
> 2.9.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] mm: warn about allocations which stall for too long
  2016-09-23 14:36   ` Tetsuo Handa
@ 2016-09-23 15:02     ` Michal Hocko
  -1 siblings, 0 replies; 42+ messages in thread
From: Michal Hocko @ 2016-09-23 15:02 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: linux-mm, akpm, hannes, mgorman, linux-kernel

On Fri 23-09-16 23:36:22, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > @@ -3659,6 +3661,15 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> >  	else
> >  		no_progress_loops++;
> >  
> > +	/* Make sure we know about allocations which stall for too long */
> > +	if (!(gfp_mask & __GFP_NOWARN) && time_after(jiffies, alloc_start + stall_timeout)) {
> 
> Should we check !__GFP_NOWARN ? I think __GFP_NOWARN is likely used with
> __GFP_NORETRY, and __GFP_NORETRY is already checked by now.
> 
> I think printing warning regardless of __GFP_NOWARN is better because
> this check is similar to hungtask warning.

Well, if the user said to not warn we should really obey that. Why would
that matter?
 
> > +		pr_warn("%s: page alloction stalls for %ums: order:%u mode:%#x(%pGg)\n",
> > +				current->comm, jiffies_to_msecs(jiffies-alloc_start),
> > +				order, gfp_mask, &gfp_mask);
> > +		stall_timeout += 10 * HZ;
> > +		dump_stack();
> 
> Can we move this pr_warn() + dump_stack() to a separate function like
> 
> static void __warn_memalloc_stall(unsigned int order, gfp_t gfp_mask, unsigned long alloc_start)
> {
> 	pr_warn("%s: page alloction stalls for %ums: order:%u mode:%#x(%pGg)\n",
> 		current->comm, jiffies_to_msecs(jiffies-alloc_start),
> 		order, gfp_mask, &gfp_mask);
> 	dump_stack();
> }
> 
> in order to allow SystemTap scripts to perform additional actions by name (e.g.
> 
> # stap -g -e 'probe kernel.function("__warn_memalloc_stall").return { panic(); }

I find this reasoning and the use case really _absurd_, seriously! Pulling
the warning into a separate function might be reasonable regardless,
though. It matches warn_alloc_failed. Also if we find out we need some
rate limitting or more checks it might just turn out being easier to
follow rather than in the middle of an already complicated allocation
slow path. I just do not like that the stall_timeout would have to stay
in the original place or have it an in/out parameter.

> ) rather than by line number, and surround __warn_memalloc_stall() call with
> mutex in order to serialize warning messages because it is possible that
> multiple allocation requests are stalling?

we do not use any lock in warn_alloc_failed so why this should be any
different?

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] mm: warn about allocations which stall for too long
@ 2016-09-23 15:02     ` Michal Hocko
  0 siblings, 0 replies; 42+ messages in thread
From: Michal Hocko @ 2016-09-23 15:02 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: linux-mm, akpm, hannes, mgorman, linux-kernel

On Fri 23-09-16 23:36:22, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > @@ -3659,6 +3661,15 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> >  	else
> >  		no_progress_loops++;
> >  
> > +	/* Make sure we know about allocations which stall for too long */
> > +	if (!(gfp_mask & __GFP_NOWARN) && time_after(jiffies, alloc_start + stall_timeout)) {
> 
> Should we check !__GFP_NOWARN ? I think __GFP_NOWARN is likely used with
> __GFP_NORETRY, and __GFP_NORETRY is already checked by now.
> 
> I think printing warning regardless of __GFP_NOWARN is better because
> this check is similar to hungtask warning.

Well, if the user said to not warn we should really obey that. Why would
that matter?
 
> > +		pr_warn("%s: page alloction stalls for %ums: order:%u mode:%#x(%pGg)\n",
> > +				current->comm, jiffies_to_msecs(jiffies-alloc_start),
> > +				order, gfp_mask, &gfp_mask);
> > +		stall_timeout += 10 * HZ;
> > +		dump_stack();
> 
> Can we move this pr_warn() + dump_stack() to a separate function like
> 
> static void __warn_memalloc_stall(unsigned int order, gfp_t gfp_mask, unsigned long alloc_start)
> {
> 	pr_warn("%s: page alloction stalls for %ums: order:%u mode:%#x(%pGg)\n",
> 		current->comm, jiffies_to_msecs(jiffies-alloc_start),
> 		order, gfp_mask, &gfp_mask);
> 	dump_stack();
> }
> 
> in order to allow SystemTap scripts to perform additional actions by name (e.g.
> 
> # stap -g -e 'probe kernel.function("__warn_memalloc_stall").return { panic(); }

I find this reasoning and the use case really _absurd_, seriously! Pulling
the warning into a separate function might be reasonable regardless,
though. It matches warn_alloc_failed. Also if we find out we need some
rate limitting or more checks it might just turn out being easier to
follow rather than in the middle of an already complicated allocation
slow path. I just do not like that the stall_timeout would have to stay
in the original place or have it an in/out parameter.

> ) rather than by line number, and surround __warn_memalloc_stall() call with
> mutex in order to serialize warning messages because it is possible that
> multiple allocation requests are stalling?

we do not use any lock in warn_alloc_failed so why this should be any
different?

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] mm: warn about allocations which stall for too long
  2016-09-23  8:15 ` Michal Hocko
@ 2016-09-23 17:34   ` Dave Hansen
  -1 siblings, 0 replies; 42+ messages in thread
From: Dave Hansen @ 2016-09-23 17:34 UTC (permalink / raw)
  To: Michal Hocko, linux-mm
  Cc: Andrew Morton, Johannes Weiner, Mel Gorman, Tetsuo Handa, LKML,
	Michal Hocko

On 09/23/2016 01:15 AM, Michal Hocko wrote:
> +	/* Make sure we know about allocations which stall for too long */
> +	if (!(gfp_mask & __GFP_NOWARN) && time_after(jiffies, alloc_start + stall_timeout)) {
> +		pr_warn("%s: page alloction stalls for %ums: order:%u mode:%#x(%pGg)\n",
> +				current->comm, jiffies_to_msecs(jiffies-alloc_start),
> +				order, gfp_mask, &gfp_mask);
> +		stall_timeout += 10 * HZ;
> +		dump_stack();
> +	}

This would make an awesome tracepoint.  There's probably still plenty of
value to having it in dmesg, but the configurability of tracepoints is
hard to beat.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] mm: warn about allocations which stall for too long
@ 2016-09-23 17:34   ` Dave Hansen
  0 siblings, 0 replies; 42+ messages in thread
From: Dave Hansen @ 2016-09-23 17:34 UTC (permalink / raw)
  To: Michal Hocko, linux-mm
  Cc: Andrew Morton, Johannes Weiner, Mel Gorman, Tetsuo Handa, LKML,
	Michal Hocko

On 09/23/2016 01:15 AM, Michal Hocko wrote:
> +	/* Make sure we know about allocations which stall for too long */
> +	if (!(gfp_mask & __GFP_NOWARN) && time_after(jiffies, alloc_start + stall_timeout)) {
> +		pr_warn("%s: page alloction stalls for %ums: order:%u mode:%#x(%pGg)\n",
> +				current->comm, jiffies_to_msecs(jiffies-alloc_start),
> +				order, gfp_mask, &gfp_mask);
> +		stall_timeout += 10 * HZ;
> +		dump_stack();
> +	}

This would make an awesome tracepoint.  There's probably still plenty of
value to having it in dmesg, but the configurability of tracepoints is
hard to beat.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] mm: warn about allocations which stall for too long
  2016-09-23 15:02     ` Michal Hocko
@ 2016-09-24  3:00       ` Tetsuo Handa
  -1 siblings, 0 replies; 42+ messages in thread
From: Tetsuo Handa @ 2016-09-24  3:00 UTC (permalink / raw)
  To: mhocko; +Cc: linux-mm, akpm, hannes, mgorman, linux-kernel

Michal Hocko wrote:
> On Fri 23-09-16 23:36:22, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > @@ -3659,6 +3661,15 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> > >  	else
> > >  		no_progress_loops++;
> > >  
> > > +	/* Make sure we know about allocations which stall for too long */
> > > +	if (!(gfp_mask & __GFP_NOWARN) && time_after(jiffies, alloc_start + stall_timeout)) {
> > 
> > Should we check !__GFP_NOWARN ? I think __GFP_NOWARN is likely used with
> > __GFP_NORETRY, and __GFP_NORETRY is already checked by now.
> > 
> > I think printing warning regardless of __GFP_NOWARN is better because
> > this check is similar to hungtask warning.
> 
> Well, if the user said to not warn we should really obey that. Why would
> that matter?

__GFP_NOWARN is defined as "Do not print failure messages when memory
allocation failed". It is not defined as "Do not print OOM killer messages
when OOM killer is invoked". It is undefined that "Do not print stall
messages when memory allocation is stalling".

If memory allocating threads were blocked on locks instead of doing direct
reclaim, hungtask will be able to find stalling memory allocations without
this change. Since direct reclaim prevents allocating threads from sleeping
for long enough to be warned by hungtask, it is important that this change
shall find allocating threads which cannot be warned by hungtask. That is,
not printing warning messages for __GFP_NOWARN allocation requests looses
the value of this change.

>  
> > > +		pr_warn("%s: page alloction stalls for %ums: order:%u mode:%#x(%pGg)\n",
> > > +				current->comm, jiffies_to_msecs(jiffies-alloc_start),
> > > +				order, gfp_mask, &gfp_mask);
> > > +		stall_timeout += 10 * HZ;
> > > +		dump_stack();
> > 
> > Can we move this pr_warn() + dump_stack() to a separate function like
> > 
> > static void __warn_memalloc_stall(unsigned int order, gfp_t gfp_mask, unsigned long alloc_start)
> > {
> > 	pr_warn("%s: page alloction stalls for %ums: order:%u mode:%#x(%pGg)\n",
> > 		current->comm, jiffies_to_msecs(jiffies-alloc_start),
> > 		order, gfp_mask, &gfp_mask);
> > 	dump_stack();
> > }
> > 
> > in order to allow SystemTap scripts to perform additional actions by name (e.g.
> > 
> > # stap -g -e 'probe kernel.function("__warn_memalloc_stall").return { panic(); }
> 
> I find this reasoning and the use case really _absurd_, seriously! Pulling
> the warning into a separate function might be reasonable regardless,
> though. It matches warn_alloc_failed. Also if we find out we need some
> rate limitting or more checks it might just turn out being easier to
> follow rather than in the middle of an already complicated allocation
> slow path. I just do not like that the stall_timeout would have to stay
> in the original place or have it an in/out parameter.

SystemTap script shown above is just an example. What is nice is that
we can do whatever actions for examining what is going on by using a
function as if an interrupt handler. For example, when I was working at
support center, there was a support case where the customer's system always
reboots for unknown reason whenever specific action is taken. I inserted
SystemTap script shown above into a function which is called when a system
reboots, and I identified that the reason was SysRq-b which was triggered by
HA manager daemon due to misconfiguration. I used a reboot function as an
interrupt handler for examining why that handler was called. Likewise, we
can use __warn_memalloc_stall() as an interrupt handler for examining what
is going on. I think that there will be situations where existing printk()
does not provide enough information and thus examining a memory snapshot is
needed. Allowing tracing tools like SystemTap to insert a hook by function
name (instead of line number) is helpful anyway.

Going back to !(gfp_mask & __GFP_NOWARN) check, if you don't want to
print stall messages, you can move that check to inside
__warn_memalloc_stall(). Then, I can insert a SystemTap hook to print
stall messages for !(gfp_mask & __GFP_NOWARN) case. Even more, if you
are not sure what is best threshold, you can call a hook function every
second. The SystemTap script can check threshold and print warning if
necessary information are passed to that hook function. This resembles
LSM hooks. In the past, LSM hook was calling an empty function with
necessary information when LSM hook user (e.g. SELinux) is not registered.

> 
> > ) rather than by line number, and surround __warn_memalloc_stall() call with
> > mutex in order to serialize warning messages because it is possible that
> > multiple allocation requests are stalling?
> 
> we do not use any lock in warn_alloc_failed so why this should be any
> different?

warn_alloc_failed() is called for both __GFP_DIRECT_RECLAIM and
!__GFP_DIRECT_RECLAIM allocation requests, and it is not allowed
to sleep if !__GFP_DIRECT_RECLAIM. Thus, we have to tolerate that
concurrent memory allocation failure messages make dmesg output
unreadable. But __warn_memalloc_stall() is called for only
__GFP_DIRECT_RECLAIM allocation requests. Thus, we are allowed to
sleep in order to serialize concurrent memory allocation stall
messages.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] mm: warn about allocations which stall for too long
@ 2016-09-24  3:00       ` Tetsuo Handa
  0 siblings, 0 replies; 42+ messages in thread
From: Tetsuo Handa @ 2016-09-24  3:00 UTC (permalink / raw)
  To: mhocko; +Cc: linux-mm, akpm, hannes, mgorman, linux-kernel

Michal Hocko wrote:
> On Fri 23-09-16 23:36:22, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > @@ -3659,6 +3661,15 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> > >  	else
> > >  		no_progress_loops++;
> > >  
> > > +	/* Make sure we know about allocations which stall for too long */
> > > +	if (!(gfp_mask & __GFP_NOWARN) && time_after(jiffies, alloc_start + stall_timeout)) {
> > 
> > Should we check !__GFP_NOWARN ? I think __GFP_NOWARN is likely used with
> > __GFP_NORETRY, and __GFP_NORETRY is already checked by now.
> > 
> > I think printing warning regardless of __GFP_NOWARN is better because
> > this check is similar to hungtask warning.
> 
> Well, if the user said to not warn we should really obey that. Why would
> that matter?

__GFP_NOWARN is defined as "Do not print failure messages when memory
allocation failed". It is not defined as "Do not print OOM killer messages
when OOM killer is invoked". It is undefined that "Do not print stall
messages when memory allocation is stalling".

If memory allocating threads were blocked on locks instead of doing direct
reclaim, hungtask will be able to find stalling memory allocations without
this change. Since direct reclaim prevents allocating threads from sleeping
for long enough to be warned by hungtask, it is important that this change
shall find allocating threads which cannot be warned by hungtask. That is,
not printing warning messages for __GFP_NOWARN allocation requests looses
the value of this change.

>  
> > > +		pr_warn("%s: page alloction stalls for %ums: order:%u mode:%#x(%pGg)\n",
> > > +				current->comm, jiffies_to_msecs(jiffies-alloc_start),
> > > +				order, gfp_mask, &gfp_mask);
> > > +		stall_timeout += 10 * HZ;
> > > +		dump_stack();
> > 
> > Can we move this pr_warn() + dump_stack() to a separate function like
> > 
> > static void __warn_memalloc_stall(unsigned int order, gfp_t gfp_mask, unsigned long alloc_start)
> > {
> > 	pr_warn("%s: page alloction stalls for %ums: order:%u mode:%#x(%pGg)\n",
> > 		current->comm, jiffies_to_msecs(jiffies-alloc_start),
> > 		order, gfp_mask, &gfp_mask);
> > 	dump_stack();
> > }
> > 
> > in order to allow SystemTap scripts to perform additional actions by name (e.g.
> > 
> > # stap -g -e 'probe kernel.function("__warn_memalloc_stall").return { panic(); }
> 
> I find this reasoning and the use case really _absurd_, seriously! Pulling
> the warning into a separate function might be reasonable regardless,
> though. It matches warn_alloc_failed. Also if we find out we need some
> rate limitting or more checks it might just turn out being easier to
> follow rather than in the middle of an already complicated allocation
> slow path. I just do not like that the stall_timeout would have to stay
> in the original place or have it an in/out parameter.

SystemTap script shown above is just an example. What is nice is that
we can do whatever actions for examining what is going on by using a
function as if an interrupt handler. For example, when I was working at
support center, there was a support case where the customer's system always
reboots for unknown reason whenever specific action is taken. I inserted
SystemTap script shown above into a function which is called when a system
reboots, and I identified that the reason was SysRq-b which was triggered by
HA manager daemon due to misconfiguration. I used a reboot function as an
interrupt handler for examining why that handler was called. Likewise, we
can use __warn_memalloc_stall() as an interrupt handler for examining what
is going on. I think that there will be situations where existing printk()
does not provide enough information and thus examining a memory snapshot is
needed. Allowing tracing tools like SystemTap to insert a hook by function
name (instead of line number) is helpful anyway.

Going back to !(gfp_mask & __GFP_NOWARN) check, if you don't want to
print stall messages, you can move that check to inside
__warn_memalloc_stall(). Then, I can insert a SystemTap hook to print
stall messages for !(gfp_mask & __GFP_NOWARN) case. Even more, if you
are not sure what is best threshold, you can call a hook function every
second. The SystemTap script can check threshold and print warning if
necessary information are passed to that hook function. This resembles
LSM hooks. In the past, LSM hook was calling an empty function with
necessary information when LSM hook user (e.g. SELinux) is not registered.

> 
> > ) rather than by line number, and surround __warn_memalloc_stall() call with
> > mutex in order to serialize warning messages because it is possible that
> > multiple allocation requests are stalling?
> 
> we do not use any lock in warn_alloc_failed so why this should be any
> different?

warn_alloc_failed() is called for both __GFP_DIRECT_RECLAIM and
!__GFP_DIRECT_RECLAIM allocation requests, and it is not allowed
to sleep if !__GFP_DIRECT_RECLAIM. Thus, we have to tolerate that
concurrent memory allocation failure messages make dmesg output
unreadable. But __warn_memalloc_stall() is called for only
__GFP_DIRECT_RECLAIM allocation requests. Thus, we are allowed to
sleep in order to serialize concurrent memory allocation stall
messages.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] mm: warn about allocations which stall for too long
  2016-09-23 17:34   ` Dave Hansen
@ 2016-09-24 13:19     ` Balbir Singh
  -1 siblings, 0 replies; 42+ messages in thread
From: Balbir Singh @ 2016-09-24 13:19 UTC (permalink / raw)
  To: Dave Hansen, Michal Hocko, linux-mm
  Cc: Andrew Morton, Johannes Weiner, Mel Gorman, Tetsuo Handa, LKML,
	Michal Hocko



On 24/09/16 03:34, Dave Hansen wrote:
> On 09/23/2016 01:15 AM, Michal Hocko wrote:
>> +	/* Make sure we know about allocations which stall for too long */
>> +	if (!(gfp_mask & __GFP_NOWARN) && time_after(jiffies, alloc_start + stall_timeout)) {
>> +		pr_warn("%s: page alloction stalls for %ums: order:%u mode:%#x(%pGg)\n",
>> +				current->comm, jiffies_to_msecs(jiffies-alloc_start),
>> +				order, gfp_mask, &gfp_mask);
>> +		stall_timeout += 10 * HZ;
>> +		dump_stack();
>> +	}
> 
> This would make an awesome tracepoint.  There's probably still plenty of
> value to having it in dmesg, but the configurability of tracepoints is
> hard to beat.

An awesome tracepoint and a great place to trigger other tracepoints. With stall timeout
increasing every time, do we only care about the first instance when we exceeded stall_timeout?
Do we debug just that instance?

Balbir Singh.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] mm: warn about allocations which stall for too long
@ 2016-09-24 13:19     ` Balbir Singh
  0 siblings, 0 replies; 42+ messages in thread
From: Balbir Singh @ 2016-09-24 13:19 UTC (permalink / raw)
  To: Dave Hansen, Michal Hocko, linux-mm
  Cc: Andrew Morton, Johannes Weiner, Mel Gorman, Tetsuo Handa, LKML,
	Michal Hocko



On 24/09/16 03:34, Dave Hansen wrote:
> On 09/23/2016 01:15 AM, Michal Hocko wrote:
>> +	/* Make sure we know about allocations which stall for too long */
>> +	if (!(gfp_mask & __GFP_NOWARN) && time_after(jiffies, alloc_start + stall_timeout)) {
>> +		pr_warn("%s: page alloction stalls for %ums: order:%u mode:%#x(%pGg)\n",
>> +				current->comm, jiffies_to_msecs(jiffies-alloc_start),
>> +				order, gfp_mask, &gfp_mask);
>> +		stall_timeout += 10 * HZ;
>> +		dump_stack();
>> +	}
> 
> This would make an awesome tracepoint.  There's probably still plenty of
> value to having it in dmesg, but the configurability of tracepoints is
> hard to beat.

An awesome tracepoint and a great place to trigger other tracepoints. With stall timeout
increasing every time, do we only care about the first instance when we exceeded stall_timeout?
Do we debug just that instance?

Balbir Singh.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] mm: warn about allocations which stall for too long
  2016-09-23 17:34   ` Dave Hansen
@ 2016-09-26  8:12     ` Michal Hocko
  -1 siblings, 0 replies; 42+ messages in thread
From: Michal Hocko @ 2016-09-26  8:12 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-mm, Andrew Morton, Johannes Weiner, Mel Gorman, Tetsuo Handa, LKML

On Fri 23-09-16 10:34:01, Dave Hansen wrote:
> On 09/23/2016 01:15 AM, Michal Hocko wrote:
> > +	/* Make sure we know about allocations which stall for too long */
> > +	if (!(gfp_mask & __GFP_NOWARN) && time_after(jiffies, alloc_start + stall_timeout)) {
> > +		pr_warn("%s: page alloction stalls for %ums: order:%u mode:%#x(%pGg)\n",
> > +				current->comm, jiffies_to_msecs(jiffies-alloc_start),
> > +				order, gfp_mask, &gfp_mask);
> > +		stall_timeout += 10 * HZ;
> > +		dump_stack();
> > +	}
> 
> This would make an awesome tracepoint.  There's probably still plenty of
> value to having it in dmesg, but the configurability of tracepoints is
> hard to beat.

Currently we only have trace_mm_page_alloc in __alloc_pages_nodemask. I
think we want to add another one to mark the beginning of the allocation
so that we can track allocation latencies per allocation context and
ideally drop them down into sources - congestion waits, reclaim path,
slab reclaim etc. Janani Ravichandran is working on a script to do that
http://lkml.kernel.org/r/20160911222411.GA2854@janani-Inspiron-3521

But this sounds a bit orthogonal to my proposal here because I would
really like to warn unconditionally when an allocation stalls for
unreasonably long. Tracepoints are not an ideal tool for that because
you have to start collecting tracing output before this situations
happen. Moreover in my experience I often had to replace my local
debugging trace_printks by regular printks because the prior ones just
got lost under a heavy memory pressure.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] mm: warn about allocations which stall for too long
@ 2016-09-26  8:12     ` Michal Hocko
  0 siblings, 0 replies; 42+ messages in thread
From: Michal Hocko @ 2016-09-26  8:12 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-mm, Andrew Morton, Johannes Weiner, Mel Gorman, Tetsuo Handa, LKML

On Fri 23-09-16 10:34:01, Dave Hansen wrote:
> On 09/23/2016 01:15 AM, Michal Hocko wrote:
> > +	/* Make sure we know about allocations which stall for too long */
> > +	if (!(gfp_mask & __GFP_NOWARN) && time_after(jiffies, alloc_start + stall_timeout)) {
> > +		pr_warn("%s: page alloction stalls for %ums: order:%u mode:%#x(%pGg)\n",
> > +				current->comm, jiffies_to_msecs(jiffies-alloc_start),
> > +				order, gfp_mask, &gfp_mask);
> > +		stall_timeout += 10 * HZ;
> > +		dump_stack();
> > +	}
> 
> This would make an awesome tracepoint.  There's probably still plenty of
> value to having it in dmesg, but the configurability of tracepoints is
> hard to beat.

Currently we only have trace_mm_page_alloc in __alloc_pages_nodemask. I
think we want to add another one to mark the beginning of the allocation
so that we can track allocation latencies per allocation context and
ideally drop them down into sources - congestion waits, reclaim path,
slab reclaim etc. Janani Ravichandran is working on a script to do that
http://lkml.kernel.org/r/20160911222411.GA2854@janani-Inspiron-3521

But this sounds a bit orthogonal to my proposal here because I would
really like to warn unconditionally when an allocation stalls for
unreasonably long. Tracepoints are not an ideal tool for that because
you have to start collecting tracing output before this situations
happen. Moreover in my experience I often had to replace my local
debugging trace_printks by regular printks because the prior ones just
got lost under a heavy memory pressure.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] mm: warn about allocations which stall for too long
  2016-09-24 13:19     ` Balbir Singh
@ 2016-09-26  8:13       ` Michal Hocko
  -1 siblings, 0 replies; 42+ messages in thread
From: Michal Hocko @ 2016-09-26  8:13 UTC (permalink / raw)
  To: Balbir Singh
  Cc: Dave Hansen, linux-mm, Andrew Morton, Johannes Weiner,
	Mel Gorman, Tetsuo Handa, LKML

On Sat 24-09-16 23:19:04, Balbir Singh wrote:
> 
> 
> On 24/09/16 03:34, Dave Hansen wrote:
> > On 09/23/2016 01:15 AM, Michal Hocko wrote:
> >> +	/* Make sure we know about allocations which stall for too long */
> >> +	if (!(gfp_mask & __GFP_NOWARN) && time_after(jiffies, alloc_start + stall_timeout)) {
> >> +		pr_warn("%s: page alloction stalls for %ums: order:%u mode:%#x(%pGg)\n",
> >> +				current->comm, jiffies_to_msecs(jiffies-alloc_start),
> >> +				order, gfp_mask, &gfp_mask);
> >> +		stall_timeout += 10 * HZ;
> >> +		dump_stack();
> >> +	}
> > 
> > This would make an awesome tracepoint.  There's probably still plenty of
> > value to having it in dmesg, but the configurability of tracepoints is
> > hard to beat.
> 
> An awesome tracepoint and a great place to trigger other tracepoints. With stall timeout
> increasing every time, do we only care about the first instance when we exceeded stall_timeout?
> Do we debug just that instance?

I am not sure I understand you here. The stall_timeout is increased to
see whether the situation is permanent of ephemeral. This is similar to
RCU lockup reports.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] mm: warn about allocations which stall for too long
@ 2016-09-26  8:13       ` Michal Hocko
  0 siblings, 0 replies; 42+ messages in thread
From: Michal Hocko @ 2016-09-26  8:13 UTC (permalink / raw)
  To: Balbir Singh
  Cc: Dave Hansen, linux-mm, Andrew Morton, Johannes Weiner,
	Mel Gorman, Tetsuo Handa, LKML

On Sat 24-09-16 23:19:04, Balbir Singh wrote:
> 
> 
> On 24/09/16 03:34, Dave Hansen wrote:
> > On 09/23/2016 01:15 AM, Michal Hocko wrote:
> >> +	/* Make sure we know about allocations which stall for too long */
> >> +	if (!(gfp_mask & __GFP_NOWARN) && time_after(jiffies, alloc_start + stall_timeout)) {
> >> +		pr_warn("%s: page alloction stalls for %ums: order:%u mode:%#x(%pGg)\n",
> >> +				current->comm, jiffies_to_msecs(jiffies-alloc_start),
> >> +				order, gfp_mask, &gfp_mask);
> >> +		stall_timeout += 10 * HZ;
> >> +		dump_stack();
> >> +	}
> > 
> > This would make an awesome tracepoint.  There's probably still plenty of
> > value to having it in dmesg, but the configurability of tracepoints is
> > hard to beat.
> 
> An awesome tracepoint and a great place to trigger other tracepoints. With stall timeout
> increasing every time, do we only care about the first instance when we exceeded stall_timeout?
> Do we debug just that instance?

I am not sure I understand you here. The stall_timeout is increased to
see whether the situation is permanent of ephemeral. This is similar to
RCU lockup reports.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] mm: warn about allocations which stall for too long
  2016-09-24  3:00       ` Tetsuo Handa
@ 2016-09-26  8:17         ` Michal Hocko
  -1 siblings, 0 replies; 42+ messages in thread
From: Michal Hocko @ 2016-09-26  8:17 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: linux-mm, akpm, hannes, mgorman, linux-kernel

On Sat 24-09-16 12:00:07, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Fri 23-09-16 23:36:22, Tetsuo Handa wrote:
> > > Michal Hocko wrote:
> > > > @@ -3659,6 +3661,15 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> > > >  	else
> > > >  		no_progress_loops++;
> > > >  
> > > > +	/* Make sure we know about allocations which stall for too long */
> > > > +	if (!(gfp_mask & __GFP_NOWARN) && time_after(jiffies, alloc_start + stall_timeout)) {
> > > 
> > > Should we check !__GFP_NOWARN ? I think __GFP_NOWARN is likely used with
> > > __GFP_NORETRY, and __GFP_NORETRY is already checked by now.
> > > 
> > > I think printing warning regardless of __GFP_NOWARN is better because
> > > this check is similar to hungtask warning.
> > 
> > Well, if the user said to not warn we should really obey that. Why would
> > that matter?
> 
> __GFP_NOWARN is defined as "Do not print failure messages when memory
> allocation failed". It is not defined as "Do not print OOM killer messages
> when OOM killer is invoked". It is undefined that "Do not print stall
> messages when memory allocation is stalling".

Which is kind of expected as we warned only about allocation failures up
to now.

> If memory allocating threads were blocked on locks instead of doing direct
> reclaim, hungtask will be able to find stalling memory allocations without
> this change. Since direct reclaim prevents allocating threads from sleeping
> for long enough to be warned by hungtask, it is important that this change
> shall find allocating threads which cannot be warned by hungtask. That is,
> not printing warning messages for __GFP_NOWARN allocation requests looses
> the value of this change.

I dunno. If the user explicitly requests to not have allocation warning
then I think we should obey that. But this is not something I would be
really insisting hard. If others think that the check should be dropped
I can live with that.

[...]
> > > ) rather than by line number, and surround __warn_memalloc_stall() call with
> > > mutex in order to serialize warning messages because it is possible that
> > > multiple allocation requests are stalling?
> > 
> > we do not use any lock in warn_alloc_failed so why this should be any
> > different?
> 
> warn_alloc_failed() is called for both __GFP_DIRECT_RECLAIM and
> !__GFP_DIRECT_RECLAIM allocation requests, and it is not allowed
> to sleep if !__GFP_DIRECT_RECLAIM. Thus, we have to tolerate that
> concurrent memory allocation failure messages make dmesg output
> unreadable. But __warn_memalloc_stall() is called for only
> __GFP_DIRECT_RECLAIM allocation requests. Thus, we are allowed to
> sleep in order to serialize concurrent memory allocation stall
> messages.

I still do not see a point. A single line about the warning and locked
dump_stack sounds sufficient to me.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] mm: warn about allocations which stall for too long
@ 2016-09-26  8:17         ` Michal Hocko
  0 siblings, 0 replies; 42+ messages in thread
From: Michal Hocko @ 2016-09-26  8:17 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: linux-mm, akpm, hannes, mgorman, linux-kernel

On Sat 24-09-16 12:00:07, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Fri 23-09-16 23:36:22, Tetsuo Handa wrote:
> > > Michal Hocko wrote:
> > > > @@ -3659,6 +3661,15 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> > > >  	else
> > > >  		no_progress_loops++;
> > > >  
> > > > +	/* Make sure we know about allocations which stall for too long */
> > > > +	if (!(gfp_mask & __GFP_NOWARN) && time_after(jiffies, alloc_start + stall_timeout)) {
> > > 
> > > Should we check !__GFP_NOWARN ? I think __GFP_NOWARN is likely used with
> > > __GFP_NORETRY, and __GFP_NORETRY is already checked by now.
> > > 
> > > I think printing warning regardless of __GFP_NOWARN is better because
> > > this check is similar to hungtask warning.
> > 
> > Well, if the user said to not warn we should really obey that. Why would
> > that matter?
> 
> __GFP_NOWARN is defined as "Do not print failure messages when memory
> allocation failed". It is not defined as "Do not print OOM killer messages
> when OOM killer is invoked". It is undefined that "Do not print stall
> messages when memory allocation is stalling".

Which is kind of expected as we warned only about allocation failures up
to now.

> If memory allocating threads were blocked on locks instead of doing direct
> reclaim, hungtask will be able to find stalling memory allocations without
> this change. Since direct reclaim prevents allocating threads from sleeping
> for long enough to be warned by hungtask, it is important that this change
> shall find allocating threads which cannot be warned by hungtask. That is,
> not printing warning messages for __GFP_NOWARN allocation requests looses
> the value of this change.

I dunno. If the user explicitly requests to not have allocation warning
then I think we should obey that. But this is not something I would be
really insisting hard. If others think that the check should be dropped
I can live with that.

[...]
> > > ) rather than by line number, and surround __warn_memalloc_stall() call with
> > > mutex in order to serialize warning messages because it is possible that
> > > multiple allocation requests are stalling?
> > 
> > we do not use any lock in warn_alloc_failed so why this should be any
> > different?
> 
> warn_alloc_failed() is called for both __GFP_DIRECT_RECLAIM and
> !__GFP_DIRECT_RECLAIM allocation requests, and it is not allowed
> to sleep if !__GFP_DIRECT_RECLAIM. Thus, we have to tolerate that
> concurrent memory allocation failure messages make dmesg output
> unreadable. But __warn_memalloc_stall() is called for only
> __GFP_DIRECT_RECLAIM allocation requests. Thus, we are allowed to
> sleep in order to serialize concurrent memory allocation stall
> messages.

I still do not see a point. A single line about the warning and locked
dump_stack sounds sufficient to me.

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] mm: warn about allocations which stall for too long
  2016-09-26  8:17         ` Michal Hocko
@ 2016-09-27 12:57           ` Tetsuo Handa
  -1 siblings, 0 replies; 42+ messages in thread
From: Tetsuo Handa @ 2016-09-27 12:57 UTC (permalink / raw)
  To: mhocko; +Cc: linux-mm, akpm, hannes, mgorman, linux-kernel

Michal Hocko wrote:
> > > > ) rather than by line number, and surround __warn_memalloc_stall() call with
> > > > mutex in order to serialize warning messages because it is possible that
> > > > multiple allocation requests are stalling?
> > > 
> > > we do not use any lock in warn_alloc_failed so why this should be any
> > > different?
> > 
> > warn_alloc_failed() is called for both __GFP_DIRECT_RECLAIM and
> > !__GFP_DIRECT_RECLAIM allocation requests, and it is not allowed
> > to sleep if !__GFP_DIRECT_RECLAIM. Thus, we have to tolerate that
> > concurrent memory allocation failure messages make dmesg output
> > unreadable. But __warn_memalloc_stall() is called for only
> > __GFP_DIRECT_RECLAIM allocation requests. Thus, we are allowed to
> > sleep in order to serialize concurrent memory allocation stall
> > messages.
> 
> I still do not see a point. A single line about the warning and locked
> dump_stack sounds sufficient to me.

printk() is slow operation. It is possible that two allocation requests
start within time period needed for completing warn_alloc_failed().
It is possible that multiple concurrent allocations are stalling when
one of them cannot be satisfied. The consequence is multiple concurrent
timeouts corrupting dmesg.
http://I-love.SAKURA.ne.jp/tmp/serial-20160927-nolock.txt.xz
(Please ignore Oops at do_task_stat(); it is irrelevant to this topic.)

If we guard it with mutex_lock(&oom_lock)/mutex_unlock(&oom_lock),
no corruption.
http://I-love.SAKURA.ne.jp/tmp/serial-20160927-lock.txt.xz

Deferring it when trylock() failed will be also possible.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] mm: warn about allocations which stall for too long
@ 2016-09-27 12:57           ` Tetsuo Handa
  0 siblings, 0 replies; 42+ messages in thread
From: Tetsuo Handa @ 2016-09-27 12:57 UTC (permalink / raw)
  To: mhocko; +Cc: linux-mm, akpm, hannes, mgorman, linux-kernel

Michal Hocko wrote:
> > > > ) rather than by line number, and surround __warn_memalloc_stall() call with
> > > > mutex in order to serialize warning messages because it is possible that
> > > > multiple allocation requests are stalling?
> > > 
> > > we do not use any lock in warn_alloc_failed so why this should be any
> > > different?
> > 
> > warn_alloc_failed() is called for both __GFP_DIRECT_RECLAIM and
> > !__GFP_DIRECT_RECLAIM allocation requests, and it is not allowed
> > to sleep if !__GFP_DIRECT_RECLAIM. Thus, we have to tolerate that
> > concurrent memory allocation failure messages make dmesg output
> > unreadable. But __warn_memalloc_stall() is called for only
> > __GFP_DIRECT_RECLAIM allocation requests. Thus, we are allowed to
> > sleep in order to serialize concurrent memory allocation stall
> > messages.
> 
> I still do not see a point. A single line about the warning and locked
> dump_stack sounds sufficient to me.

printk() is slow operation. It is possible that two allocation requests
start within time period needed for completing warn_alloc_failed().
It is possible that multiple concurrent allocations are stalling when
one of them cannot be satisfied. The consequence is multiple concurrent
timeouts corrupting dmesg.
http://I-love.SAKURA.ne.jp/tmp/serial-20160927-nolock.txt.xz
(Please ignore Oops at do_task_stat(); it is irrelevant to this topic.)

If we guard it with mutex_lock(&oom_lock)/mutex_unlock(&oom_lock),
no corruption.
http://I-love.SAKURA.ne.jp/tmp/serial-20160927-lock.txt.xz

Deferring it when trylock() failed will be also possible.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH 0/2] warn about allocations which stall for too long
  2016-09-23  8:15 ` Michal Hocko
@ 2016-09-29  8:44   ` Michal Hocko
  -1 siblings, 0 replies; 42+ messages in thread
From: Michal Hocko @ 2016-09-29  8:44 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Mel Gorman, Tetsuo Handa, Dave Hansen, linux-mm, LKML

Hi,
it seems there was no fundamental opposition to my previous RFC [1]
so I am sending this again now to be considered for inclusion. I have
reworked the patch slightly and made it use the already existing
warn_alloc_failed which was updated and renamed to be more generic. This
is the patch 1. The patch 2 then simply uses it to warn about long
stall. Comparing to the previous patch it also does show_mem() which
might be really helpful to see why the allocation cannot make any
progress.

[1] http://lkml.kernel.org/r/20160923081555.14645-1-mhocko@kernel.org

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH 0/2] warn about allocations which stall for too long
@ 2016-09-29  8:44   ` Michal Hocko
  0 siblings, 0 replies; 42+ messages in thread
From: Michal Hocko @ 2016-09-29  8:44 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Mel Gorman, Tetsuo Handa, Dave Hansen, linux-mm, LKML

Hi,
it seems there was no fundamental opposition to my previous RFC [1]
so I am sending this again now to be considered for inclusion. I have
reworked the patch slightly and made it use the already existing
warn_alloc_failed which was updated and renamed to be more generic. This
is the patch 1. The patch 2 then simply uses it to warn about long
stall. Comparing to the previous patch it also does show_mem() which
might be really helpful to see why the allocation cannot make any
progress.

[1] http://lkml.kernel.org/r/20160923081555.14645-1-mhocko@kernel.org

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH 1/2] mm: consolidate warn_alloc_failed users
  2016-09-29  8:44   ` Michal Hocko
@ 2016-09-29  8:44     ` Michal Hocko
  -1 siblings, 0 replies; 42+ messages in thread
From: Michal Hocko @ 2016-09-29  8:44 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Mel Gorman, Tetsuo Handa, Dave Hansen, linux-mm,
	LKML, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

warn_alloc_failed is currently used from the page and vmalloc
allocators. This is a good reuse of the code except that vmalloc would
appreciate a slightly different warning message. This is already handled
by the fmt parameter except that

"%s: page allocation failure: order:%u, mode:%#x(%pGg)"

is printed anyway. This might be quite misleading because it might be
a vmalloc failure which leads to the warning while the page allocator is
not the culprit here. Fix this by always using the fmt string and only
print the context that makes sense for the particular context (e.g.
order makes only very little sense for the vmalloc context). Rename
the function to not miss any user and also because a later patch will
reuse it also for !failure cases.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 include/linux/mm.h |  5 ++---
 mm/page_alloc.c    | 27 ++++++++++++---------------
 mm/vmalloc.c       | 14 ++++++--------
 3 files changed, 20 insertions(+), 26 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 5db886915265..df4a6c02e421 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1916,9 +1916,8 @@ extern void si_meminfo_node(struct sysinfo *val, int nid);
 extern unsigned long arch_reserved_kernel_pages(void);
 #endif
 
-extern __printf(3, 4)
-void warn_alloc_failed(gfp_t gfp_mask, unsigned int order,
-		const char *fmt, ...);
+extern __printf(2, 3)
+void warn_alloc(gfp_t gfp_mask, const char *fmt, ...);
 
 extern void setup_per_cpu_pageset(void);
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fab8b6913179..969ffc97045b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2981,9 +2981,11 @@ static DEFINE_RATELIMIT_STATE(nopage_rs,
 		DEFAULT_RATELIMIT_INTERVAL,
 		DEFAULT_RATELIMIT_BURST);
 
-void warn_alloc_failed(gfp_t gfp_mask, unsigned int order, const char *fmt, ...)
+void warn_alloc(gfp_t gfp_mask, const char *fmt, ...)
 {
 	unsigned int filter = SHOW_MEM_FILTER_NODES;
+	struct va_format vaf;
+	va_list args;
 
 	if ((gfp_mask & __GFP_NOWARN) || !__ratelimit(&nopage_rs) ||
 	    debug_guardpage_minorder() > 0)
@@ -3001,22 +3003,16 @@ void warn_alloc_failed(gfp_t gfp_mask, unsigned int order, const char *fmt, ...)
 	if (in_interrupt() || !(gfp_mask & __GFP_DIRECT_RECLAIM))
 		filter &= ~SHOW_MEM_FILTER_NODES;
 
-	if (fmt) {
-		struct va_format vaf;
-		va_list args;
+	pr_warn("%s: ", current->comm);
 
-		va_start(args, fmt);
+	va_start(args, fmt);
+	vaf.fmt = fmt;
+	vaf.va = &args;
+	pr_cont("%pV", &vaf);
+	va_end(args);
 
-		vaf.fmt = fmt;
-		vaf.va = &args;
+	pr_cont(", mode:%#x(%pGg)\n", gfp_mask, &gfp_mask);
 
-		pr_warn("%pV", &vaf);
-
-		va_end(args);
-	}
-
-	pr_warn("%s: page allocation failure: order:%u, mode:%#x(%pGg)\n",
-		current->comm, order, gfp_mask, &gfp_mask);
 	dump_stack();
 	if (!should_suppress_show_mem())
 		show_mem(filter);
@@ -3682,7 +3678,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	}
 
 nopage:
-	warn_alloc_failed(gfp_mask, order, NULL);
+	warn_alloc(gfp_mask,
+			"page allocation failure: order:%u", order);
 got_pg:
 	return page;
 }
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 80660a0f989b..f2481cb4e6b2 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1601,7 +1601,6 @@ static void *__vmalloc_node(unsigned long size, unsigned long align,
 static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 				 pgprot_t prot, int node)
 {
-	const int order = 0;
 	struct page **pages;
 	unsigned int nr_pages, array_size, i;
 	const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
@@ -1629,9 +1628,9 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 		struct page *page;
 
 		if (node == NUMA_NO_NODE)
-			page = alloc_pages(alloc_mask, order);
+			page = alloc_page(alloc_mask);
 		else
-			page = alloc_pages_node(node, alloc_mask, order);
+			page = alloc_pages_node(node, alloc_mask, 0);
 
 		if (unlikely(!page)) {
 			/* Successfully allocated i pages, free them in __vunmap() */
@@ -1648,8 +1647,8 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 	return area->addr;
 
 fail:
-	warn_alloc_failed(gfp_mask, order,
-			  "vmalloc: allocation failure, allocated %ld of %ld bytes\n",
+	warn_alloc(gfp_mask,
+			  "vmalloc: allocation failure, allocated %ld of %ld bytes",
 			  (area->nr_pages*PAGE_SIZE), area->size);
 	vfree(area->addr);
 	return NULL;
@@ -1710,9 +1709,8 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
 	return addr;
 
 fail:
-	warn_alloc_failed(gfp_mask, 0,
-			  "vmalloc: allocation failure: %lu bytes\n",
-			  real_size);
+	warn_alloc(gfp_mask,
+			  "vmalloc: allocation failure: %lu bytes", real_size);
 	return NULL;
 }
 
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 1/2] mm: consolidate warn_alloc_failed users
@ 2016-09-29  8:44     ` Michal Hocko
  0 siblings, 0 replies; 42+ messages in thread
From: Michal Hocko @ 2016-09-29  8:44 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Mel Gorman, Tetsuo Handa, Dave Hansen, linux-mm,
	LKML, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

warn_alloc_failed is currently used from the page and vmalloc
allocators. This is a good reuse of the code except that vmalloc would
appreciate a slightly different warning message. This is already handled
by the fmt parameter except that

"%s: page allocation failure: order:%u, mode:%#x(%pGg)"

is printed anyway. This might be quite misleading because it might be
a vmalloc failure which leads to the warning while the page allocator is
not the culprit here. Fix this by always using the fmt string and only
print the context that makes sense for the particular context (e.g.
order makes only very little sense for the vmalloc context). Rename
the function to not miss any user and also because a later patch will
reuse it also for !failure cases.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 include/linux/mm.h |  5 ++---
 mm/page_alloc.c    | 27 ++++++++++++---------------
 mm/vmalloc.c       | 14 ++++++--------
 3 files changed, 20 insertions(+), 26 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 5db886915265..df4a6c02e421 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1916,9 +1916,8 @@ extern void si_meminfo_node(struct sysinfo *val, int nid);
 extern unsigned long arch_reserved_kernel_pages(void);
 #endif
 
-extern __printf(3, 4)
-void warn_alloc_failed(gfp_t gfp_mask, unsigned int order,
-		const char *fmt, ...);
+extern __printf(2, 3)
+void warn_alloc(gfp_t gfp_mask, const char *fmt, ...);
 
 extern void setup_per_cpu_pageset(void);
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fab8b6913179..969ffc97045b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2981,9 +2981,11 @@ static DEFINE_RATELIMIT_STATE(nopage_rs,
 		DEFAULT_RATELIMIT_INTERVAL,
 		DEFAULT_RATELIMIT_BURST);
 
-void warn_alloc_failed(gfp_t gfp_mask, unsigned int order, const char *fmt, ...)
+void warn_alloc(gfp_t gfp_mask, const char *fmt, ...)
 {
 	unsigned int filter = SHOW_MEM_FILTER_NODES;
+	struct va_format vaf;
+	va_list args;
 
 	if ((gfp_mask & __GFP_NOWARN) || !__ratelimit(&nopage_rs) ||
 	    debug_guardpage_minorder() > 0)
@@ -3001,22 +3003,16 @@ void warn_alloc_failed(gfp_t gfp_mask, unsigned int order, const char *fmt, ...)
 	if (in_interrupt() || !(gfp_mask & __GFP_DIRECT_RECLAIM))
 		filter &= ~SHOW_MEM_FILTER_NODES;
 
-	if (fmt) {
-		struct va_format vaf;
-		va_list args;
+	pr_warn("%s: ", current->comm);
 
-		va_start(args, fmt);
+	va_start(args, fmt);
+	vaf.fmt = fmt;
+	vaf.va = &args;
+	pr_cont("%pV", &vaf);
+	va_end(args);
 
-		vaf.fmt = fmt;
-		vaf.va = &args;
+	pr_cont(", mode:%#x(%pGg)\n", gfp_mask, &gfp_mask);
 
-		pr_warn("%pV", &vaf);
-
-		va_end(args);
-	}
-
-	pr_warn("%s: page allocation failure: order:%u, mode:%#x(%pGg)\n",
-		current->comm, order, gfp_mask, &gfp_mask);
 	dump_stack();
 	if (!should_suppress_show_mem())
 		show_mem(filter);
@@ -3682,7 +3678,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	}
 
 nopage:
-	warn_alloc_failed(gfp_mask, order, NULL);
+	warn_alloc(gfp_mask,
+			"page allocation failure: order:%u", order);
 got_pg:
 	return page;
 }
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 80660a0f989b..f2481cb4e6b2 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1601,7 +1601,6 @@ static void *__vmalloc_node(unsigned long size, unsigned long align,
 static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 				 pgprot_t prot, int node)
 {
-	const int order = 0;
 	struct page **pages;
 	unsigned int nr_pages, array_size, i;
 	const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
@@ -1629,9 +1628,9 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 		struct page *page;
 
 		if (node == NUMA_NO_NODE)
-			page = alloc_pages(alloc_mask, order);
+			page = alloc_page(alloc_mask);
 		else
-			page = alloc_pages_node(node, alloc_mask, order);
+			page = alloc_pages_node(node, alloc_mask, 0);
 
 		if (unlikely(!page)) {
 			/* Successfully allocated i pages, free them in __vunmap() */
@@ -1648,8 +1647,8 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 	return area->addr;
 
 fail:
-	warn_alloc_failed(gfp_mask, order,
-			  "vmalloc: allocation failure, allocated %ld of %ld bytes\n",
+	warn_alloc(gfp_mask,
+			  "vmalloc: allocation failure, allocated %ld of %ld bytes",
 			  (area->nr_pages*PAGE_SIZE), area->size);
 	vfree(area->addr);
 	return NULL;
@@ -1710,9 +1709,8 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
 	return addr;
 
 fail:
-	warn_alloc_failed(gfp_mask, 0,
-			  "vmalloc: allocation failure: %lu bytes\n",
-			  real_size);
+	warn_alloc(gfp_mask,
+			  "vmalloc: allocation failure: %lu bytes", real_size);
 	return NULL;
 }
 
-- 
2.9.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 2/2] mm: warn about allocations which stall for too long
  2016-09-29  8:44   ` Michal Hocko
@ 2016-09-29  8:44     ` Michal Hocko
  -1 siblings, 0 replies; 42+ messages in thread
From: Michal Hocko @ 2016-09-29  8:44 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Mel Gorman, Tetsuo Handa, Dave Hansen, linux-mm,
	LKML, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Currently we do warn only about allocation failures but small
allocations are basically nofail and they might loop in the page
allocator for a long time.  Especially when the reclaim cannot make
any progress - e.g. GFP_NOFS cannot invoke the oom killer and rely on
a different context to make a forward progress in case there is a lot
memory used by filesystems.

Give us at least a clue when something like this happens and warn about
allocations which take more than 10s. Print the basic allocation context
information along with the cumulative time spent in the allocation as
well as the allocation stack. Repeat the warning after every 10 seconds so
that we know that the problem is permanent rather than ephemeral.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/page_alloc.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 969ffc97045b..73f60ad6315f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3495,6 +3495,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	enum compact_result compact_result;
 	int compaction_retries = 0;
 	int no_progress_loops = 0;
+	unsigned long alloc_start = jiffies;
+	unsigned int stall_timeout = 10 * HZ;
 
 	/*
 	 * In the slowpath, we sanity check order to avoid ever trying to
@@ -3650,6 +3652,14 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & __GFP_REPEAT))
 		goto nopage;
 
+	/* Make sure we know about allocations which stall for too long */
+	if (time_after(jiffies, alloc_start + stall_timeout)) {
+		warn_alloc(gfp_mask,
+			"page alloction stalls for %ums, order:%u\n",
+			jiffies_to_msecs(jiffies-alloc_start), order);
+		stall_timeout += 10 * HZ;
+	}
+
 	if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
 				 did_some_progress > 0, &no_progress_loops))
 		goto retry;
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 2/2] mm: warn about allocations which stall for too long
@ 2016-09-29  8:44     ` Michal Hocko
  0 siblings, 0 replies; 42+ messages in thread
From: Michal Hocko @ 2016-09-29  8:44 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Mel Gorman, Tetsuo Handa, Dave Hansen, linux-mm,
	LKML, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Currently we do warn only about allocation failures but small
allocations are basically nofail and they might loop in the page
allocator for a long time.  Especially when the reclaim cannot make
any progress - e.g. GFP_NOFS cannot invoke the oom killer and rely on
a different context to make a forward progress in case there is a lot
memory used by filesystems.

Give us at least a clue when something like this happens and warn about
allocations which take more than 10s. Print the basic allocation context
information along with the cumulative time spent in the allocation as
well as the allocation stack. Repeat the warning after every 10 seconds so
that we know that the problem is permanent rather than ephemeral.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/page_alloc.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 969ffc97045b..73f60ad6315f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3495,6 +3495,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	enum compact_result compact_result;
 	int compaction_retries = 0;
 	int no_progress_loops = 0;
+	unsigned long alloc_start = jiffies;
+	unsigned int stall_timeout = 10 * HZ;
 
 	/*
 	 * In the slowpath, we sanity check order to avoid ever trying to
@@ -3650,6 +3652,14 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & __GFP_REPEAT))
 		goto nopage;
 
+	/* Make sure we know about allocations which stall for too long */
+	if (time_after(jiffies, alloc_start + stall_timeout)) {
+		warn_alloc(gfp_mask,
+			"page alloction stalls for %ums, order:%u\n",
+			jiffies_to_msecs(jiffies-alloc_start), order);
+		stall_timeout += 10 * HZ;
+	}
+
 	if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
 				 did_some_progress > 0, &no_progress_loops))
 		goto retry;
-- 
2.9.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH] mm: warn about allocations which stall for too long
  2016-09-27 12:57           ` Tetsuo Handa
@ 2016-09-29  8:48             ` Michal Hocko
  -1 siblings, 0 replies; 42+ messages in thread
From: Michal Hocko @ 2016-09-29  8:48 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: linux-mm, akpm, hannes, mgorman, linux-kernel

On Tue 27-09-16 21:57:26, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > > > > ) rather than by line number, and surround __warn_memalloc_stall() call with
> > > > > mutex in order to serialize warning messages because it is possible that
> > > > > multiple allocation requests are stalling?
> > > > 
> > > > we do not use any lock in warn_alloc_failed so why this should be any
> > > > different?
> > > 
> > > warn_alloc_failed() is called for both __GFP_DIRECT_RECLAIM and
> > > !__GFP_DIRECT_RECLAIM allocation requests, and it is not allowed
> > > to sleep if !__GFP_DIRECT_RECLAIM. Thus, we have to tolerate that
> > > concurrent memory allocation failure messages make dmesg output
> > > unreadable. But __warn_memalloc_stall() is called for only
> > > __GFP_DIRECT_RECLAIM allocation requests. Thus, we are allowed to
> > > sleep in order to serialize concurrent memory allocation stall
> > > messages.
> > 
> > I still do not see a point. A single line about the warning and locked
> > dump_stack sounds sufficient to me.
> 
> printk() is slow operation. It is possible that two allocation requests
> start within time period needed for completing warn_alloc_failed().
> It is possible that multiple concurrent allocations are stalling when
> one of them cannot be satisfied. The consequence is multiple concurrent
> timeouts corrupting dmesg.
> http://I-love.SAKURA.ne.jp/tmp/serial-20160927-nolock.txt.xz
> (Please ignore Oops at do_task_stat(); it is irrelevant to this topic.)
> 
> If we guard it with mutex_lock(&oom_lock)/mutex_unlock(&oom_lock),
> no corruption.
> http://I-love.SAKURA.ne.jp/tmp/serial-20160927-lock.txt.xz

I have just posted v2 which reuses warn_alloc_failed infrastructure. If
we want to have a lock there then it should be a separate patch imho.
Ideally with and example from your above kernel log.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] mm: warn about allocations which stall for too long
@ 2016-09-29  8:48             ` Michal Hocko
  0 siblings, 0 replies; 42+ messages in thread
From: Michal Hocko @ 2016-09-29  8:48 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: linux-mm, akpm, hannes, mgorman, linux-kernel

On Tue 27-09-16 21:57:26, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > > > > ) rather than by line number, and surround __warn_memalloc_stall() call with
> > > > > mutex in order to serialize warning messages because it is possible that
> > > > > multiple allocation requests are stalling?
> > > > 
> > > > we do not use any lock in warn_alloc_failed so why this should be any
> > > > different?
> > > 
> > > warn_alloc_failed() is called for both __GFP_DIRECT_RECLAIM and
> > > !__GFP_DIRECT_RECLAIM allocation requests, and it is not allowed
> > > to sleep if !__GFP_DIRECT_RECLAIM. Thus, we have to tolerate that
> > > concurrent memory allocation failure messages make dmesg output
> > > unreadable. But __warn_memalloc_stall() is called for only
> > > __GFP_DIRECT_RECLAIM allocation requests. Thus, we are allowed to
> > > sleep in order to serialize concurrent memory allocation stall
> > > messages.
> > 
> > I still do not see a point. A single line about the warning and locked
> > dump_stack sounds sufficient to me.
> 
> printk() is slow operation. It is possible that two allocation requests
> start within time period needed for completing warn_alloc_failed().
> It is possible that multiple concurrent allocations are stalling when
> one of them cannot be satisfied. The consequence is multiple concurrent
> timeouts corrupting dmesg.
> http://I-love.SAKURA.ne.jp/tmp/serial-20160927-nolock.txt.xz
> (Please ignore Oops at do_task_stat(); it is irrelevant to this topic.)
> 
> If we guard it with mutex_lock(&oom_lock)/mutex_unlock(&oom_lock),
> no corruption.
> http://I-love.SAKURA.ne.jp/tmp/serial-20160927-lock.txt.xz

I have just posted v2 which reuses warn_alloc_failed infrastructure. If
we want to have a lock there then it should be a separate patch imho.
Ideally with and example from your above kernel log.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] mm: warn about allocations which stall for too long
  2016-09-29  8:44     ` Michal Hocko
@ 2016-09-29  9:02       ` Tetsuo Handa
  -1 siblings, 0 replies; 42+ messages in thread
From: Tetsuo Handa @ 2016-09-29  9:02 UTC (permalink / raw)
  To: mhocko, akpm; +Cc: hannes, mgorman, dave.hansen, linux-mm, linux-kernel, mhocko

Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> Currently we do warn only about allocation failures but small
> allocations are basically nofail and they might loop in the page
> allocator for a long time.  Especially when the reclaim cannot make
> any progress - e.g. GFP_NOFS cannot invoke the oom killer and rely on
> a different context to make a forward progress in case there is a lot
> memory used by filesystems.
> 
> Give us at least a clue when something like this happens and warn about
> allocations which take more than 10s. Print the basic allocation context
> information along with the cumulative time spent in the allocation as
> well as the allocation stack. Repeat the warning after every 10 seconds so
> that we know that the problem is permanent rather than ephemeral.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/page_alloc.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 969ffc97045b..73f60ad6315f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3495,6 +3495,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  	enum compact_result compact_result;
>  	int compaction_retries = 0;
>  	int no_progress_loops = 0;
> +	unsigned long alloc_start = jiffies;
> +	unsigned int stall_timeout = 10 * HZ;
>  
>  	/*
>  	 * In the slowpath, we sanity check order to avoid ever trying to
> @@ -3650,6 +3652,14 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  	if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & __GFP_REPEAT))
>  		goto nopage;
>  
> +	/* Make sure we know about allocations which stall for too long */
> +	if (time_after(jiffies, alloc_start + stall_timeout)) {
> +		warn_alloc(gfp_mask,

I expect "gfp_mask & ~__GFP_NOWARN" rather than "gfp_mask" here.
Otherwise, we can't get a clue for __GFP_NOWARN allocations.

> +			"page alloction stalls for %ums, order:%u\n",
> +			jiffies_to_msecs(jiffies-alloc_start), order);
> +		stall_timeout += 10 * HZ;
> +	}
> +
>  	if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
>  				 did_some_progress > 0, &no_progress_loops))
>  		goto retry;
> -- 
> 2.9.3

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] mm: warn about allocations which stall for too long
@ 2016-09-29  9:02       ` Tetsuo Handa
  0 siblings, 0 replies; 42+ messages in thread
From: Tetsuo Handa @ 2016-09-29  9:02 UTC (permalink / raw)
  To: mhocko, akpm; +Cc: hannes, mgorman, dave.hansen, linux-mm, linux-kernel, mhocko

Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> Currently we do warn only about allocation failures but small
> allocations are basically nofail and they might loop in the page
> allocator for a long time.  Especially when the reclaim cannot make
> any progress - e.g. GFP_NOFS cannot invoke the oom killer and rely on
> a different context to make a forward progress in case there is a lot
> memory used by filesystems.
> 
> Give us at least a clue when something like this happens and warn about
> allocations which take more than 10s. Print the basic allocation context
> information along with the cumulative time spent in the allocation as
> well as the allocation stack. Repeat the warning after every 10 seconds so
> that we know that the problem is permanent rather than ephemeral.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/page_alloc.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 969ffc97045b..73f60ad6315f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3495,6 +3495,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  	enum compact_result compact_result;
>  	int compaction_retries = 0;
>  	int no_progress_loops = 0;
> +	unsigned long alloc_start = jiffies;
> +	unsigned int stall_timeout = 10 * HZ;
>  
>  	/*
>  	 * In the slowpath, we sanity check order to avoid ever trying to
> @@ -3650,6 +3652,14 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  	if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & __GFP_REPEAT))
>  		goto nopage;
>  
> +	/* Make sure we know about allocations which stall for too long */
> +	if (time_after(jiffies, alloc_start + stall_timeout)) {
> +		warn_alloc(gfp_mask,

I expect "gfp_mask & ~__GFP_NOWARN" rather than "gfp_mask" here.
Otherwise, we can't get a clue for __GFP_NOWARN allocations.

> +			"page alloction stalls for %ums, order:%u\n",
> +			jiffies_to_msecs(jiffies-alloc_start), order);
> +		stall_timeout += 10 * HZ;
> +	}
> +
>  	if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
>  				 did_some_progress > 0, &no_progress_loops))
>  		goto retry;
> -- 
> 2.9.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] mm: warn about allocations which stall for too long
  2016-09-29  9:02       ` Tetsuo Handa
@ 2016-09-29  9:10         ` Michal Hocko
  -1 siblings, 0 replies; 42+ messages in thread
From: Michal Hocko @ 2016-09-29  9:10 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: akpm, hannes, mgorman, dave.hansen, linux-mm, linux-kernel

On Thu 29-09-16 18:02:44, Tetsuo Handa wrote:
[...]
> > @@ -3650,6 +3652,14 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> >  	if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & __GFP_REPEAT))
> >  		goto nopage;
> >  
> > +	/* Make sure we know about allocations which stall for too long */
> > +	if (time_after(jiffies, alloc_start + stall_timeout)) {
> > +		warn_alloc(gfp_mask,
> 
> I expect "gfp_mask & ~__GFP_NOWARN" rather than "gfp_mask" here.
> Otherwise, we can't get a clue for __GFP_NOWARN allocations.

If there is an explicit __GFP_NOWARN then I believe we should obey it
same way we do for the allocation failure. If you believe this is not
the best way then feel free to send a patch with an example where a
__GFP_NOWARN user would really like to see about the stall.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] mm: warn about allocations which stall for too long
@ 2016-09-29  9:10         ` Michal Hocko
  0 siblings, 0 replies; 42+ messages in thread
From: Michal Hocko @ 2016-09-29  9:10 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: akpm, hannes, mgorman, dave.hansen, linux-mm, linux-kernel

On Thu 29-09-16 18:02:44, Tetsuo Handa wrote:
[...]
> > @@ -3650,6 +3652,14 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> >  	if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & __GFP_REPEAT))
> >  		goto nopage;
> >  
> > +	/* Make sure we know about allocations which stall for too long */
> > +	if (time_after(jiffies, alloc_start + stall_timeout)) {
> > +		warn_alloc(gfp_mask,
> 
> I expect "gfp_mask & ~__GFP_NOWARN" rather than "gfp_mask" here.
> Otherwise, we can't get a clue for __GFP_NOWARN allocations.

If there is an explicit __GFP_NOWARN then I believe we should obey it
same way we do for the allocation failure. If you believe this is not
the best way then feel free to send a patch with an example where a
__GFP_NOWARN user would really like to see about the stall.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] mm: consolidate warn_alloc_failed users
  2016-09-29  8:44     ` Michal Hocko
@ 2016-09-29  9:23       ` Vlastimil Babka
  -1 siblings, 0 replies; 42+ messages in thread
From: Vlastimil Babka @ 2016-09-29  9:23 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton
  Cc: Johannes Weiner, Mel Gorman, Tetsuo Handa, Dave Hansen, linux-mm,
	LKML, Michal Hocko

On 09/29/2016 10:44 AM, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
>
> warn_alloc_failed is currently used from the page and vmalloc
> allocators. This is a good reuse of the code except that vmalloc would
> appreciate a slightly different warning message. This is already handled
> by the fmt parameter except that
>
> "%s: page allocation failure: order:%u, mode:%#x(%pGg)"
>
> is printed anyway. This might be quite misleading because it might be
> a vmalloc failure which leads to the warning while the page allocator is
> not the culprit here. Fix this by always using the fmt string and only
> print the context that makes sense for the particular context (e.g.
> order makes only very little sense for the vmalloc context). Rename
> the function to not miss any user and also because a later patch will
> reuse it also for !failure cases.
>
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] mm: consolidate warn_alloc_failed users
@ 2016-09-29  9:23       ` Vlastimil Babka
  0 siblings, 0 replies; 42+ messages in thread
From: Vlastimil Babka @ 2016-09-29  9:23 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton
  Cc: Johannes Weiner, Mel Gorman, Tetsuo Handa, Dave Hansen, linux-mm,
	LKML, Michal Hocko

On 09/29/2016 10:44 AM, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
>
> warn_alloc_failed is currently used from the page and vmalloc
> allocators. This is a good reuse of the code except that vmalloc would
> appreciate a slightly different warning message. This is already handled
> by the fmt parameter except that
>
> "%s: page allocation failure: order:%u, mode:%#x(%pGg)"
>
> is printed anyway. This might be quite misleading because it might be
> a vmalloc failure which leads to the warning while the page allocator is
> not the culprit here. Fix this by always using the fmt string and only
> print the context that makes sense for the particular context (e.g.
> order makes only very little sense for the vmalloc context). Rename
> the function to not miss any user and also because a later patch will
> reuse it also for !failure cases.
>
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2016-09-29  9:23 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-23  8:15 [PATCH] mm: warn about allocations which stall for too long Michal Hocko
2016-09-23  8:15 ` Michal Hocko
2016-09-23  8:29 ` Hillf Danton
2016-09-23  8:29   ` Hillf Danton
2016-09-23  8:32   ` Michal Hocko
2016-09-23  8:32     ` Michal Hocko
2016-09-23  8:44     ` Hillf Danton
2016-09-23  8:44       ` Hillf Danton
2016-09-23  9:15       ` Michal Hocko
2016-09-23  9:15         ` Michal Hocko
2016-09-23 14:36 ` Tetsuo Handa
2016-09-23 14:36   ` Tetsuo Handa
2016-09-23 15:02   ` Michal Hocko
2016-09-23 15:02     ` Michal Hocko
2016-09-24  3:00     ` Tetsuo Handa
2016-09-24  3:00       ` Tetsuo Handa
2016-09-26  8:17       ` Michal Hocko
2016-09-26  8:17         ` Michal Hocko
2016-09-27 12:57         ` Tetsuo Handa
2016-09-27 12:57           ` Tetsuo Handa
2016-09-29  8:48           ` Michal Hocko
2016-09-29  8:48             ` Michal Hocko
2016-09-23 17:34 ` Dave Hansen
2016-09-23 17:34   ` Dave Hansen
2016-09-24 13:19   ` Balbir Singh
2016-09-24 13:19     ` Balbir Singh
2016-09-26  8:13     ` Michal Hocko
2016-09-26  8:13       ` Michal Hocko
2016-09-26  8:12   ` Michal Hocko
2016-09-26  8:12     ` Michal Hocko
2016-09-29  8:44 ` [PATCH 0/2] " Michal Hocko
2016-09-29  8:44   ` Michal Hocko
2016-09-29  8:44   ` [PATCH 1/2] mm: consolidate warn_alloc_failed users Michal Hocko
2016-09-29  8:44     ` Michal Hocko
2016-09-29  9:23     ` Vlastimil Babka
2016-09-29  9:23       ` Vlastimil Babka
2016-09-29  8:44   ` [PATCH 2/2] mm: warn about allocations which stall for too long Michal Hocko
2016-09-29  8:44     ` Michal Hocko
2016-09-29  9:02     ` Tetsuo Handa
2016-09-29  9:02       ` Tetsuo Handa
2016-09-29  9:10       ` Michal Hocko
2016-09-29  9:10         ` Michal Hocko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.