All of lore.kernel.org
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@suse.cz>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dave Chinner <david@fromorbit.com>,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	Huang Ying <ying.huang@intel.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	"Theodore Ts'o" <tytso@mit.edu>
Subject: Re: [patch 00/12] mm: page_alloc: improve OOM mechanism and policy
Date: Tue, 14 Apr 2015 16:23:07 +0200	[thread overview]
Message-ID: <20150414142307.GI17160@dhcp22.suse.cz> (raw)
In-Reply-To: <20150414103625.GA26264@cmpxchg.org>

On Tue 14-04-15 06:36:25, Johannes Weiner wrote:
> On Mon, Apr 13, 2015 at 02:46:14PM +0200, Michal Hocko wrote:
[...]
> > AFAIU, David wasn't asking for the OOM killer as much as he was
> > interested in getting access to a small amount of reserves in order to
> > make a progress. __GFP_HIGH is there for this purpose.
> 
> That's not just any reserve pool available to the generic caller, it's
> the reserve pool for interrupts, which can not wait and replenish it.
> It relies on kswapd to run soon after the interrupt, or right away on
> SMP.  But locks held in the filesystem can hold up kswapd (the reason
> we even still perform direct reclaim) so NOFS allocs shouldn't use it.
> 
> [hannes@dexter linux]$ git grep '__GFP_HIGH\b' | wc -l
> 39
> [hannes@dexter linux]$ git grep GFP_ATOMIC | wc -l
> 4324
> 
> Interrupts have *no other option*. 

Atomic context in general can ALLOC_HARDER so it has an access to
additional reserves wrt. __GFP_HIGH|__GFP_WAIT.

> It's misguided to deplete their
> reserves, cause loss of network packets, loss of input events, from
> allocations that can actually perform reclaim and have perfectly
> acceptable fallback strategies in the caller.

OK, I thought that it was clear that the proposed __GFP_HIGH is a
fallback strategy for those paths which cannot do much better. Not a
random solution for "this shouldn't fail to eagerly".

> Generally, for any reserve system there must be a way to replenish it.
> For interrupts it's kswapd, for the OOM reserves I proposed it's the
> OOM victim exiting soon after the allocation, if not right away.

And my understanding was that the fallback mode would be used in the
context which would lead to release of the fs pressure thus releasing a
memory as well.

> __GFP_NOFAIL is the odd one out here because accessing the system's
> emergency reserves without any prospect of near-future replenishing is
> just slightly better than deadlocking right away.  Which is why this
> reserve access can not be separated out: if you can do *anything*
> better than hanging, do it.  If not, use __GFP_NOFAIL.

Agreed.
 
> > > My question here would be: are there any NOFS allocations that *don't*
> > > want this behavior?  Does it even make sense to require this separate
> > > annotation or should we just make it the default?
> > > 
> > > The argument here was always that NOFS allocations are very limited in
> > > their reclaim powers and will trigger OOM prematurely.  However, the
> > > way we limit dirty memory these days forces most cache to be clean at
> > > all times, and direct reclaim in general hasn't been allowed to issue
> > > page writeback for quite some time.  So these days, NOFS reclaim isn't
> > > really weaker than regular direct reclaim. 
> > 
> > What about [di]cache and some others fs specific shrinkers (and heavy
> > metadata loads)?
> 
> My bad, I forgot about those.  But it doesn't really change the basic
> question of whether we want to change the GFP_NOFS default or merely
> annotate individual sites that want to try harder.

My understanding was the later one. If you look at page cache allocations
which use mapping_gfp_mask (e.g. xfs is using GFP_NOFS for that context
all the time) then those do not really have to try harder.

> > > The only exception is that
> > > it might block writeback, so we'd go OOM if the only reclaimables left
> > > were dirty pages against that filesystem.  That should be acceptable.
> > 
> > OOM killer is hardly acceptable by most users I've heard from. OOM
> > killer is the _last_ resort and if the allocation is restricted then
> > we shouldn't use the big hammer.
> 
> We *are* talking about the last resort for these allocations!  There
> is nothing else we can do to avoid allocation failure at this point.
> Absent a reservation system, we have the choice between failing after
> reclaim - which Dave said was too fragile for XFS - or OOM killing.

As per other emails in this thread (e.g.
http://marc.info/?l=linux-mm&m=142897087230385&w=2), I understood that
the access to a small portion of emergency pool would be sufficient to
release the pressure and that sounds preferable to me over a destructive
reclaim attempts.

-- 
Michal Hocko
SUSE Labs

WARNING: multiple messages have this Message-ID (diff)
From: Michal Hocko <mhocko@suse.cz>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dave Chinner <david@fromorbit.com>,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	Huang Ying <ying.huang@intel.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Theodore Ts'o <tytso@mit.edu>
Subject: Re: [patch 00/12] mm: page_alloc: improve OOM mechanism and policy
Date: Tue, 14 Apr 2015 16:23:07 +0200	[thread overview]
Message-ID: <20150414142307.GI17160@dhcp22.suse.cz> (raw)
In-Reply-To: <20150414103625.GA26264@cmpxchg.org>

On Tue 14-04-15 06:36:25, Johannes Weiner wrote:
> On Mon, Apr 13, 2015 at 02:46:14PM +0200, Michal Hocko wrote:
[...]
> > AFAIU, David wasn't asking for the OOM killer as much as he was
> > interested in getting access to a small amount of reserves in order to
> > make a progress. __GFP_HIGH is there for this purpose.
> 
> That's not just any reserve pool available to the generic caller, it's
> the reserve pool for interrupts, which can not wait and replenish it.
> It relies on kswapd to run soon after the interrupt, or right away on
> SMP.  But locks held in the filesystem can hold up kswapd (the reason
> we even still perform direct reclaim) so NOFS allocs shouldn't use it.
> 
> [hannes@dexter linux]$ git grep '__GFP_HIGH\b' | wc -l
> 39
> [hannes@dexter linux]$ git grep GFP_ATOMIC | wc -l
> 4324
> 
> Interrupts have *no other option*. 

Atomic context in general can ALLOC_HARDER so it has an access to
additional reserves wrt. __GFP_HIGH|__GFP_WAIT.

> It's misguided to deplete their
> reserves, cause loss of network packets, loss of input events, from
> allocations that can actually perform reclaim and have perfectly
> acceptable fallback strategies in the caller.

OK, I thought that it was clear that the proposed __GFP_HIGH is a
fallback strategy for those paths which cannot do much better. Not a
random solution for "this shouldn't fail to eagerly".

> Generally, for any reserve system there must be a way to replenish it.
> For interrupts it's kswapd, for the OOM reserves I proposed it's the
> OOM victim exiting soon after the allocation, if not right away.

And my understanding was that the fallback mode would be used in the
context which would lead to release of the fs pressure thus releasing a
memory as well.

> __GFP_NOFAIL is the odd one out here because accessing the system's
> emergency reserves without any prospect of near-future replenishing is
> just slightly better than deadlocking right away.  Which is why this
> reserve access can not be separated out: if you can do *anything*
> better than hanging, do it.  If not, use __GFP_NOFAIL.

Agreed.
 
> > > My question here would be: are there any NOFS allocations that *don't*
> > > want this behavior?  Does it even make sense to require this separate
> > > annotation or should we just make it the default?
> > > 
> > > The argument here was always that NOFS allocations are very limited in
> > > their reclaim powers and will trigger OOM prematurely.  However, the
> > > way we limit dirty memory these days forces most cache to be clean at
> > > all times, and direct reclaim in general hasn't been allowed to issue
> > > page writeback for quite some time.  So these days, NOFS reclaim isn't
> > > really weaker than regular direct reclaim. 
> > 
> > What about [di]cache and some others fs specific shrinkers (and heavy
> > metadata loads)?
> 
> My bad, I forgot about those.  But it doesn't really change the basic
> question of whether we want to change the GFP_NOFS default or merely
> annotate individual sites that want to try harder.

My understanding was the later one. If you look at page cache allocations
which use mapping_gfp_mask (e.g. xfs is using GFP_NOFS for that context
all the time) then those do not really have to try harder.

> > > The only exception is that
> > > it might block writeback, so we'd go OOM if the only reclaimables left
> > > were dirty pages against that filesystem.  That should be acceptable.
> > 
> > OOM killer is hardly acceptable by most users I've heard from. OOM
> > killer is the _last_ resort and if the allocation is restricted then
> > we shouldn't use the big hammer.
> 
> We *are* talking about the last resort for these allocations!  There
> is nothing else we can do to avoid allocation failure at this point.
> Absent a reservation system, we have the choice between failing after
> reclaim - which Dave said was too fragile for XFS - or OOM killing.

As per other emails in this thread (e.g.
http://marc.info/?l=linux-mm&m=142897087230385&w=2), I understood that
the access to a small portion of emergency pool would be sufficient to
release the pressure and that sounds preferable to me over a destructive
reclaim attempts.

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2015-04-14 14:23 UTC|newest]

Thread overview: 138+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-03-25  6:17 [patch 00/12] mm: page_alloc: improve OOM mechanism and policy Johannes Weiner
2015-03-25  6:17 ` Johannes Weiner
2015-03-25  6:17 ` [patch 01/12] mm: oom_kill: remove unnecessary locking in oom_enable() Johannes Weiner
2015-03-25  6:17   ` Johannes Weiner
2015-03-26  0:51   ` David Rientjes
2015-03-26  0:51     ` David Rientjes
2015-03-26 11:51     ` Michal Hocko
2015-03-26 11:51       ` Michal Hocko
2015-03-26 13:18       ` Michal Hocko
2015-03-26 13:18         ` Michal Hocko
2015-03-26 19:30         ` David Rientjes
2015-03-26 19:30           ` David Rientjes
2015-03-26 11:43   ` Michal Hocko
2015-03-26 11:43     ` Michal Hocko
2015-03-26 20:05   ` David Rientjes
2015-03-26 20:05     ` David Rientjes
2015-03-25  6:17 ` [patch 02/12] mm: oom_kill: clean up victim marking and exiting interfaces Johannes Weiner
2015-03-25  6:17   ` Johannes Weiner
2015-03-26  3:34   ` David Rientjes
2015-03-26  3:34     ` David Rientjes
2015-03-26 11:54   ` Michal Hocko
2015-03-26 11:54     ` Michal Hocko
2015-03-25  6:17 ` [patch 03/12] mm: oom_kill: switch test-and-clear of known TIF_MEMDIE to clear Johannes Weiner
2015-03-25  6:17   ` Johannes Weiner
2015-03-26  3:31   ` David Rientjes
2015-03-26  3:31     ` David Rientjes
2015-03-26 11:05     ` Johannes Weiner
2015-03-26 11:05       ` Johannes Weiner
2015-03-26 19:50       ` David Rientjes
2015-03-26 19:50         ` David Rientjes
2015-03-30 14:48         ` Michal Hocko
2015-03-30 14:48           ` Michal Hocko
2015-04-02 23:01         ` [patch] android, lmk: avoid setting TIF_MEMDIE if process has already exited David Rientjes
2015-04-02 23:01           ` David Rientjes
2015-04-28 22:50           ` [patch resend] " David Rientjes
2015-04-28 22:50             ` David Rientjes
2015-03-26 11:57   ` [patch 03/12] mm: oom_kill: switch test-and-clear of known TIF_MEMDIE to clear Michal Hocko
2015-03-26 11:57     ` Michal Hocko
2015-03-25  6:17 ` [patch 04/12] mm: oom_kill: remove unnecessary locking in exit_oom_victim() Johannes Weiner
2015-03-25  6:17   ` Johannes Weiner
2015-03-26 12:53   ` Michal Hocko
2015-03-26 12:53     ` Michal Hocko
2015-03-26 13:01     ` Michal Hocko
2015-03-26 13:01       ` Michal Hocko
2015-03-26 15:10       ` Johannes Weiner
2015-03-26 15:10         ` Johannes Weiner
2015-03-26 15:04     ` Johannes Weiner
2015-03-26 15:04       ` Johannes Weiner
2015-03-25  6:17 ` [patch 05/12] mm: oom_kill: generalize OOM progress waitqueue Johannes Weiner
2015-03-25  6:17   ` Johannes Weiner
2015-03-26 13:03   ` Michal Hocko
2015-03-26 13:03     ` Michal Hocko
2015-03-25  6:17 ` [patch 06/12] mm: oom_kill: simplify OOM killer locking Johannes Weiner
2015-03-25  6:17   ` Johannes Weiner
2015-03-26 13:31   ` Michal Hocko
2015-03-26 13:31     ` Michal Hocko
2015-03-26 15:17     ` Johannes Weiner
2015-03-26 15:17       ` Johannes Weiner
2015-03-26 16:07       ` Michal Hocko
2015-03-26 16:07         ` Michal Hocko
2015-03-25  6:17 ` [patch 07/12] mm: page_alloc: inline should_alloc_retry() Johannes Weiner
2015-03-25  6:17   ` Johannes Weiner
2015-03-26 14:11   ` Michal Hocko
2015-03-26 14:11     ` Michal Hocko
2015-03-26 15:18     ` Johannes Weiner
2015-03-26 15:18       ` Johannes Weiner
2015-03-25  6:17 ` [patch 08/12] mm: page_alloc: wait for OOM killer progress before retrying Johannes Weiner
2015-03-25  6:17   ` Johannes Weiner
2015-03-25 14:15   ` Tetsuo Handa
2015-03-25 14:15     ` Tetsuo Handa
2015-03-25 17:01     ` Vlastimil Babka
2015-03-25 17:01       ` Vlastimil Babka
2015-03-26 11:28       ` Johannes Weiner
2015-03-26 11:28         ` Johannes Weiner
2015-03-26 11:24     ` Johannes Weiner
2015-03-26 11:24       ` Johannes Weiner
2015-03-26 14:32       ` Michal Hocko
2015-03-26 14:32         ` Michal Hocko
2015-03-26 15:23         ` Johannes Weiner
2015-03-26 15:23           ` Johannes Weiner
2015-03-26 15:38           ` Michal Hocko
2015-03-26 15:38             ` Michal Hocko
2015-03-26 18:17             ` Johannes Weiner
2015-03-26 18:17               ` Johannes Weiner
2015-03-27 14:01             ` [patch 08/12] mm: page_alloc: wait for OOM killer progressbefore retrying Tetsuo Handa
2015-03-27 14:01               ` Tetsuo Handa
2015-03-26 15:58   ` [patch 08/12] mm: page_alloc: wait for OOM killer progress before retrying Michal Hocko
2015-03-26 15:58     ` Michal Hocko
2015-03-26 18:23     ` Johannes Weiner
2015-03-26 18:23       ` Johannes Weiner
2015-03-25  6:17 ` [patch 09/12] mm: page_alloc: private memory reserves for OOM-killing allocations Johannes Weiner
2015-03-25  6:17   ` Johannes Weiner
2015-04-14 16:49   ` Michal Hocko
2015-04-14 16:49     ` Michal Hocko
2015-04-24 19:13     ` Johannes Weiner
2015-04-24 19:13       ` Johannes Weiner
2015-03-25  6:17 ` [patch 10/12] mm: page_alloc: emergency reserve access for __GFP_NOFAIL allocations Johannes Weiner
2015-03-25  6:17   ` Johannes Weiner
2015-04-14 16:55   ` Michal Hocko
2015-04-14 16:55     ` Michal Hocko
2015-03-25  6:17 ` [patch 11/12] mm: page_alloc: do not lock up GFP_NOFS allocations upon OOM Johannes Weiner
2015-03-25  6:17   ` Johannes Weiner
2015-03-26 14:50   ` Michal Hocko
2015-03-26 14:50     ` Michal Hocko
2015-03-25  6:17 ` [patch 12/12] mm: page_alloc: do not lock up low-order " Johannes Weiner
2015-03-25  6:17   ` Johannes Weiner
2015-03-26 15:32   ` Michal Hocko
2015-03-26 15:32     ` Michal Hocko
2015-03-26 19:58 ` [patch 00/12] mm: page_alloc: improve OOM mechanism and policy Dave Chinner
2015-03-26 19:58   ` Dave Chinner
2015-03-27 15:05   ` Johannes Weiner
2015-03-27 15:05     ` Johannes Weiner
2015-03-30  0:32     ` Dave Chinner
2015-03-30  0:32       ` Dave Chinner
2015-03-30 19:31       ` Johannes Weiner
2015-03-30 19:31         ` Johannes Weiner
2015-04-01 15:19       ` Michal Hocko
2015-04-01 15:19         ` Michal Hocko
2015-04-01 21:39         ` Dave Chinner
2015-04-01 21:39           ` Dave Chinner
2015-04-02  7:29           ` Michal Hocko
2015-04-02  7:29             ` Michal Hocko
2015-04-07 14:18         ` Johannes Weiner
2015-04-07 14:18           ` Johannes Weiner
2015-04-11  7:29           ` Tetsuo Handa
2015-04-11  7:29             ` Tetsuo Handa
2015-04-13 12:49             ` Michal Hocko
2015-04-13 12:49               ` Michal Hocko
2015-04-13 12:46           ` Michal Hocko
2015-04-13 12:46             ` Michal Hocko
2015-04-14  0:11             ` Dave Chinner
2015-04-14  0:11               ` Dave Chinner
2015-04-14  7:20               ` Michal Hocko
2015-04-14  7:20                 ` Michal Hocko
2015-04-14 10:36             ` Johannes Weiner
2015-04-14 10:36               ` Johannes Weiner
2015-04-14 14:23               ` Michal Hocko [this message]
2015-04-14 14:23                 ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150414142307.GI17160@dhcp22.suse.cz \
    --to=mhocko@suse.cz \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@fromorbit.com \
    --cc=hannes@cmpxchg.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=penguin-kernel@I-love.SAKURA.ne.jp \
    --cc=torvalds@linux-foundation.org \
    --cc=tytso@mit.edu \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.