All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: mhocko@kernel.org
Cc: linux-mm@kvack.org, akpm@linux-foundation.org,
	hannes@cmpxchg.org, mgorman@suse.de,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: warn about allocations which stall for too long
Date: Sat, 24 Sep 2016 12:00:07 +0900	[thread overview]
Message-ID: <201609241200.AEE21807.OSOtQVOLHMFJFF@I-love.SAKURA.ne.jp> (raw)
In-Reply-To: <20160923150234.GV4478@dhcp22.suse.cz>

Michal Hocko wrote:
> On Fri 23-09-16 23:36:22, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > @@ -3659,6 +3661,15 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> > >  	else
> > >  		no_progress_loops++;
> > >  
> > > +	/* Make sure we know about allocations which stall for too long */
> > > +	if (!(gfp_mask & __GFP_NOWARN) && time_after(jiffies, alloc_start + stall_timeout)) {
> > 
> > Should we check !__GFP_NOWARN ? I think __GFP_NOWARN is likely used with
> > __GFP_NORETRY, and __GFP_NORETRY is already checked by now.
> > 
> > I think printing warning regardless of __GFP_NOWARN is better because
> > this check is similar to hungtask warning.
> 
> Well, if the user said to not warn we should really obey that. Why would
> that matter?

__GFP_NOWARN is defined as "Do not print failure messages when memory
allocation failed". It is not defined as "Do not print OOM killer messages
when OOM killer is invoked". It is undefined that "Do not print stall
messages when memory allocation is stalling".

If memory allocating threads were blocked on locks instead of doing direct
reclaim, hungtask will be able to find stalling memory allocations without
this change. Since direct reclaim prevents allocating threads from sleeping
for long enough to be warned by hungtask, it is important that this change
shall find allocating threads which cannot be warned by hungtask. That is,
not printing warning messages for __GFP_NOWARN allocation requests looses
the value of this change.

>  
> > > +		pr_warn("%s: page alloction stalls for %ums: order:%u mode:%#x(%pGg)\n",
> > > +				current->comm, jiffies_to_msecs(jiffies-alloc_start),
> > > +				order, gfp_mask, &gfp_mask);
> > > +		stall_timeout += 10 * HZ;
> > > +		dump_stack();
> > 
> > Can we move this pr_warn() + dump_stack() to a separate function like
> > 
> > static void __warn_memalloc_stall(unsigned int order, gfp_t gfp_mask, unsigned long alloc_start)
> > {
> > 	pr_warn("%s: page alloction stalls for %ums: order:%u mode:%#x(%pGg)\n",
> > 		current->comm, jiffies_to_msecs(jiffies-alloc_start),
> > 		order, gfp_mask, &gfp_mask);
> > 	dump_stack();
> > }
> > 
> > in order to allow SystemTap scripts to perform additional actions by name (e.g.
> > 
> > # stap -g -e 'probe kernel.function("__warn_memalloc_stall").return { panic(); }
> 
> I find this reasoning and the use case really _absurd_, seriously! Pulling
> the warning into a separate function might be reasonable regardless,
> though. It matches warn_alloc_failed. Also if we find out we need some
> rate limitting or more checks it might just turn out being easier to
> follow rather than in the middle of an already complicated allocation
> slow path. I just do not like that the stall_timeout would have to stay
> in the original place or have it an in/out parameter.

SystemTap script shown above is just an example. What is nice is that
we can do whatever actions for examining what is going on by using a
function as if an interrupt handler. For example, when I was working at
support center, there was a support case where the customer's system always
reboots for unknown reason whenever specific action is taken. I inserted
SystemTap script shown above into a function which is called when a system
reboots, and I identified that the reason was SysRq-b which was triggered by
HA manager daemon due to misconfiguration. I used a reboot function as an
interrupt handler for examining why that handler was called. Likewise, we
can use __warn_memalloc_stall() as an interrupt handler for examining what
is going on. I think that there will be situations where existing printk()
does not provide enough information and thus examining a memory snapshot is
needed. Allowing tracing tools like SystemTap to insert a hook by function
name (instead of line number) is helpful anyway.

Going back to !(gfp_mask & __GFP_NOWARN) check, if you don't want to
print stall messages, you can move that check to inside
__warn_memalloc_stall(). Then, I can insert a SystemTap hook to print
stall messages for !(gfp_mask & __GFP_NOWARN) case. Even more, if you
are not sure what is best threshold, you can call a hook function every
second. The SystemTap script can check threshold and print warning if
necessary information are passed to that hook function. This resembles
LSM hooks. In the past, LSM hook was calling an empty function with
necessary information when LSM hook user (e.g. SELinux) is not registered.

> 
> > ) rather than by line number, and surround __warn_memalloc_stall() call with
> > mutex in order to serialize warning messages because it is possible that
> > multiple allocation requests are stalling?
> 
> we do not use any lock in warn_alloc_failed so why this should be any
> different?

warn_alloc_failed() is called for both __GFP_DIRECT_RECLAIM and
!__GFP_DIRECT_RECLAIM allocation requests, and it is not allowed
to sleep if !__GFP_DIRECT_RECLAIM. Thus, we have to tolerate that
concurrent memory allocation failure messages make dmesg output
unreadable. But __warn_memalloc_stall() is called for only
__GFP_DIRECT_RECLAIM allocation requests. Thus, we are allowed to
sleep in order to serialize concurrent memory allocation stall
messages.

WARNING: multiple messages have this Message-ID (diff)
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: mhocko@kernel.org
Cc: linux-mm@kvack.org, akpm@linux-foundation.org,
	hannes@cmpxchg.org, mgorman@suse.de,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: warn about allocations which stall for too long
Date: Sat, 24 Sep 2016 12:00:07 +0900	[thread overview]
Message-ID: <201609241200.AEE21807.OSOtQVOLHMFJFF@I-love.SAKURA.ne.jp> (raw)
In-Reply-To: <20160923150234.GV4478@dhcp22.suse.cz>

Michal Hocko wrote:
> On Fri 23-09-16 23:36:22, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > @@ -3659,6 +3661,15 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> > >  	else
> > >  		no_progress_loops++;
> > >  
> > > +	/* Make sure we know about allocations which stall for too long */
> > > +	if (!(gfp_mask & __GFP_NOWARN) && time_after(jiffies, alloc_start + stall_timeout)) {
> > 
> > Should we check !__GFP_NOWARN ? I think __GFP_NOWARN is likely used with
> > __GFP_NORETRY, and __GFP_NORETRY is already checked by now.
> > 
> > I think printing warning regardless of __GFP_NOWARN is better because
> > this check is similar to hungtask warning.
> 
> Well, if the user said to not warn we should really obey that. Why would
> that matter?

__GFP_NOWARN is defined as "Do not print failure messages when memory
allocation failed". It is not defined as "Do not print OOM killer messages
when OOM killer is invoked". It is undefined that "Do not print stall
messages when memory allocation is stalling".

If memory allocating threads were blocked on locks instead of doing direct
reclaim, hungtask will be able to find stalling memory allocations without
this change. Since direct reclaim prevents allocating threads from sleeping
for long enough to be warned by hungtask, it is important that this change
shall find allocating threads which cannot be warned by hungtask. That is,
not printing warning messages for __GFP_NOWARN allocation requests looses
the value of this change.

>  
> > > +		pr_warn("%s: page alloction stalls for %ums: order:%u mode:%#x(%pGg)\n",
> > > +				current->comm, jiffies_to_msecs(jiffies-alloc_start),
> > > +				order, gfp_mask, &gfp_mask);
> > > +		stall_timeout += 10 * HZ;
> > > +		dump_stack();
> > 
> > Can we move this pr_warn() + dump_stack() to a separate function like
> > 
> > static void __warn_memalloc_stall(unsigned int order, gfp_t gfp_mask, unsigned long alloc_start)
> > {
> > 	pr_warn("%s: page alloction stalls for %ums: order:%u mode:%#x(%pGg)\n",
> > 		current->comm, jiffies_to_msecs(jiffies-alloc_start),
> > 		order, gfp_mask, &gfp_mask);
> > 	dump_stack();
> > }
> > 
> > in order to allow SystemTap scripts to perform additional actions by name (e.g.
> > 
> > # stap -g -e 'probe kernel.function("__warn_memalloc_stall").return { panic(); }
> 
> I find this reasoning and the use case really _absurd_, seriously! Pulling
> the warning into a separate function might be reasonable regardless,
> though. It matches warn_alloc_failed. Also if we find out we need some
> rate limitting or more checks it might just turn out being easier to
> follow rather than in the middle of an already complicated allocation
> slow path. I just do not like that the stall_timeout would have to stay
> in the original place or have it an in/out parameter.

SystemTap script shown above is just an example. What is nice is that
we can do whatever actions for examining what is going on by using a
function as if an interrupt handler. For example, when I was working at
support center, there was a support case where the customer's system always
reboots for unknown reason whenever specific action is taken. I inserted
SystemTap script shown above into a function which is called when a system
reboots, and I identified that the reason was SysRq-b which was triggered by
HA manager daemon due to misconfiguration. I used a reboot function as an
interrupt handler for examining why that handler was called. Likewise, we
can use __warn_memalloc_stall() as an interrupt handler for examining what
is going on. I think that there will be situations where existing printk()
does not provide enough information and thus examining a memory snapshot is
needed. Allowing tracing tools like SystemTap to insert a hook by function
name (instead of line number) is helpful anyway.

Going back to !(gfp_mask & __GFP_NOWARN) check, if you don't want to
print stall messages, you can move that check to inside
__warn_memalloc_stall(). Then, I can insert a SystemTap hook to print
stall messages for !(gfp_mask & __GFP_NOWARN) case. Even more, if you
are not sure what is best threshold, you can call a hook function every
second. The SystemTap script can check threshold and print warning if
necessary information are passed to that hook function. This resembles
LSM hooks. In the past, LSM hook was calling an empty function with
necessary information when LSM hook user (e.g. SELinux) is not registered.

> 
> > ) rather than by line number, and surround __warn_memalloc_stall() call with
> > mutex in order to serialize warning messages because it is possible that
> > multiple allocation requests are stalling?
> 
> we do not use any lock in warn_alloc_failed so why this should be any
> different?

warn_alloc_failed() is called for both __GFP_DIRECT_RECLAIM and
!__GFP_DIRECT_RECLAIM allocation requests, and it is not allowed
to sleep if !__GFP_DIRECT_RECLAIM. Thus, we have to tolerate that
concurrent memory allocation failure messages make dmesg output
unreadable. But __warn_memalloc_stall() is called for only
__GFP_DIRECT_RECLAIM allocation requests. Thus, we are allowed to
sleep in order to serialize concurrent memory allocation stall
messages.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2016-09-24  3:00 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-23  8:15 [PATCH] mm: warn about allocations which stall for too long Michal Hocko
2016-09-23  8:15 ` Michal Hocko
2016-09-23  8:29 ` Hillf Danton
2016-09-23  8:29   ` Hillf Danton
2016-09-23  8:32   ` Michal Hocko
2016-09-23  8:32     ` Michal Hocko
2016-09-23  8:44     ` Hillf Danton
2016-09-23  8:44       ` Hillf Danton
2016-09-23  9:15       ` Michal Hocko
2016-09-23  9:15         ` Michal Hocko
2016-09-23 14:36 ` Tetsuo Handa
2016-09-23 14:36   ` Tetsuo Handa
2016-09-23 15:02   ` Michal Hocko
2016-09-23 15:02     ` Michal Hocko
2016-09-24  3:00     ` Tetsuo Handa [this message]
2016-09-24  3:00       ` Tetsuo Handa
2016-09-26  8:17       ` Michal Hocko
2016-09-26  8:17         ` Michal Hocko
2016-09-27 12:57         ` Tetsuo Handa
2016-09-27 12:57           ` Tetsuo Handa
2016-09-29  8:48           ` Michal Hocko
2016-09-29  8:48             ` Michal Hocko
2016-09-23 17:34 ` Dave Hansen
2016-09-23 17:34   ` Dave Hansen
2016-09-24 13:19   ` Balbir Singh
2016-09-24 13:19     ` Balbir Singh
2016-09-26  8:13     ` Michal Hocko
2016-09-26  8:13       ` Michal Hocko
2016-09-26  8:12   ` Michal Hocko
2016-09-26  8:12     ` Michal Hocko
2016-09-29  8:44 ` [PATCH 0/2] " Michal Hocko
2016-09-29  8:44   ` Michal Hocko
2016-09-29  8:44   ` [PATCH 1/2] mm: consolidate warn_alloc_failed users Michal Hocko
2016-09-29  8:44     ` Michal Hocko
2016-09-29  9:23     ` Vlastimil Babka
2016-09-29  9:23       ` Vlastimil Babka
2016-09-29  8:44   ` [PATCH 2/2] mm: warn about allocations which stall for too long Michal Hocko
2016-09-29  8:44     ` Michal Hocko
2016-09-29  9:02     ` Tetsuo Handa
2016-09-29  9:02       ` Tetsuo Handa
2016-09-29  9:10       ` Michal Hocko
2016-09-29  9:10         ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=201609241200.AEE21807.OSOtQVOLHMFJFF@I-love.SAKURA.ne.jp \
    --to=penguin-kernel@i-love.sakura.ne.jp \
    --cc=akpm@linux-foundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.