All of lore.kernel.org
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@kernel.org>
To: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: linux-mm@kvack.org, vbabka@suse.cz, rientjes@google.com,
	hannes@cmpxchg.org, mgorman@suse.de, akpm@linux-foundation.org,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC 2/2] mm, oom: do not enfore OOM killer for __GFP_NOFAIL automatically
Date: Fri, 25 Nov 2016 14:18:07 +0100	[thread overview]
Message-ID: <20161125131806.GB24353@dhcp22.suse.cz> (raw)
In-Reply-To: <201611252100.ADG04225.MFOSOVtHJFFLQO@I-love.SAKURA.ne.jp>

On Fri 25-11-16 21:00:52, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Wed 23-11-16 23:35:10, Tetsuo Handa wrote:
> > > If __alloc_pages_nowmark() called by __GFP_NOFAIL could not find pages
> > > with requested order due to fragmentation, __GFP_NOFAIL should invoke
> > > the OOM killer. I believe that risking kill all processes and panic the
> > > system eventually is better than __GFP_NOFAIL livelock.
> >
> > I violently disagree. Just imagine a driver which asks for an order-9
> > page and cannot really continue without it so it uses GFP_NOFAIL. There
> > is absolutely no reason to disrupt or even put the whole system down
> > just because of this particular request. It might take for ever to
> > continue but that is to be expected when asking for such a hard
> > requirement.
> 
> Did we find such in-tree drivers? If any, we likely already know it via
> WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1)); in buffered_rmqueue().
> Even if there were such out-of-tree drivers, we don't need to take care of
> out-of-tree drivers.

We do not have any costly + GFP_NOFAIL users in the tree from my quick
check. The whole point of this excercise is to make such a potential
user not seeing unexpected side effects - e.g. when transforming open
coded endless lopps into GFP_NOFAIL which is imho preferable.

The bottom line is that GFP_NOFAIL is about never failing not to get the
memory as quickly as possible or for any price.

> > > Unfortunately, there seems to be cases where the
> > > caller needs to use GFP_NOFS rather than GFP_KERNEL due to unclear dependency
> > > between memory allocation by system calls and memory reclaim by filesystems.
> >
> > I do not understand your point here. Syscall is an entry point to the
> > kernel where we cannot recurse to the FS code so GFP_NOFS seems wrong
> > thing to ask.
> 
> Will you look at http://marc.info/?t=120716967100004&r=1&w=2 which lead to
> commit a02fe13297af26c1 ("selinux: prevent rentry into the FS") and commit
> 869ab5147e1eead8 ("SELinux: more GFP_NOFS fixups to prevent selinux from
> re-entering the fs code") ? My understanding is that mkdir() system call
> caused memory allocation for inode creation and that memory allocation
> caused memory reclaim which had to be !__GFP_FS.

I will have a look later, thanks for the points.
 
> And whether we need to use GFP_NOFS at specific point is very very unclear.

And that is exactly why I am pushing for a scoped GFP_NOFS usage where
the FS code marks those scopes which are dangerous from the reclaim
recursion or for other FS internal reasons and the stacking code
shouldn't care at all. Spreading GFP_NOFS randomly is not at all helpful
nor it makes the situation any better.

I am sorry but I would prefer not to discuss this part in this thread as
it is mostly off topic. The point I am trying to make here is to clean
up GFP_NOFAIL usage. And I argue that overriding the oom prevention
decisions just because of GFP_NOFAIL is wrong. So let's please stick
with this topic. I might be wrong and miss some legitimate case but then
I would like to hear about it.

-- 
Michal Hocko
SUSE Labs

WARNING: multiple messages have this Message-ID (diff)
From: Michal Hocko <mhocko@kernel.org>
To: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: linux-mm@kvack.org, vbabka@suse.cz, rientjes@google.com,
	hannes@cmpxchg.org, mgorman@suse.de, akpm@linux-foundation.org,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC 2/2] mm, oom: do not enfore OOM killer for __GFP_NOFAIL automatically
Date: Fri, 25 Nov 2016 14:18:07 +0100	[thread overview]
Message-ID: <20161125131806.GB24353@dhcp22.suse.cz> (raw)
In-Reply-To: <201611252100.ADG04225.MFOSOVtHJFFLQO@I-love.SAKURA.ne.jp>

On Fri 25-11-16 21:00:52, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Wed 23-11-16 23:35:10, Tetsuo Handa wrote:
> > > If __alloc_pages_nowmark() called by __GFP_NOFAIL could not find pages
> > > with requested order due to fragmentation, __GFP_NOFAIL should invoke
> > > the OOM killer. I believe that risking kill all processes and panic the
> > > system eventually is better than __GFP_NOFAIL livelock.
> >
> > I violently disagree. Just imagine a driver which asks for an order-9
> > page and cannot really continue without it so it uses GFP_NOFAIL. There
> > is absolutely no reason to disrupt or even put the whole system down
> > just because of this particular request. It might take for ever to
> > continue but that is to be expected when asking for such a hard
> > requirement.
> 
> Did we find such in-tree drivers? If any, we likely already know it via
> WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1)); in buffered_rmqueue().
> Even if there were such out-of-tree drivers, we don't need to take care of
> out-of-tree drivers.

We do not have any costly + GFP_NOFAIL users in the tree from my quick
check. The whole point of this excercise is to make such a potential
user not seeing unexpected side effects - e.g. when transforming open
coded endless lopps into GFP_NOFAIL which is imho preferable.

The bottom line is that GFP_NOFAIL is about never failing not to get the
memory as quickly as possible or for any price.

> > > Unfortunately, there seems to be cases where the
> > > caller needs to use GFP_NOFS rather than GFP_KERNEL due to unclear dependency
> > > between memory allocation by system calls and memory reclaim by filesystems.
> >
> > I do not understand your point here. Syscall is an entry point to the
> > kernel where we cannot recurse to the FS code so GFP_NOFS seems wrong
> > thing to ask.
> 
> Will you look at http://marc.info/?t=120716967100004&r=1&w=2 which lead to
> commit a02fe13297af26c1 ("selinux: prevent rentry into the FS") and commit
> 869ab5147e1eead8 ("SELinux: more GFP_NOFS fixups to prevent selinux from
> re-entering the fs code") ? My understanding is that mkdir() system call
> caused memory allocation for inode creation and that memory allocation
> caused memory reclaim which had to be !__GFP_FS.

I will have a look later, thanks for the points.
 
> And whether we need to use GFP_NOFS at specific point is very very unclear.

And that is exactly why I am pushing for a scoped GFP_NOFS usage where
the FS code marks those scopes which are dangerous from the reclaim
recursion or for other FS internal reasons and the stacking code
shouldn't care at all. Spreading GFP_NOFS randomly is not at all helpful
nor it makes the situation any better.

I am sorry but I would prefer not to discuss this part in this thread as
it is mostly off topic. The point I am trying to make here is to clean
up GFP_NOFAIL usage. And I argue that overriding the oom prevention
decisions just because of GFP_NOFAIL is wrong. So let's please stick
with this topic. I might be wrong and miss some legitimate case but then
I would like to hear about it.

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2016-11-25 13:18 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-23  6:49 [RFC 0/2] GFP_NOFAIL cleanups Michal Hocko
2016-11-23  6:49 ` Michal Hocko
2016-11-23  6:49 ` [RFC 1/2] mm: consolidate GFP_NOFAIL checks in the allocator slowpath Michal Hocko
2016-11-23  6:49   ` Michal Hocko
2016-11-23 10:43   ` Vlastimil Babka
2016-11-23 10:43     ` Vlastimil Babka
2016-11-23  6:49 ` [RFC 2/2] mm, oom: do not enfore OOM killer for __GFP_NOFAIL automatically Michal Hocko
2016-11-23  6:49   ` Michal Hocko
2016-11-23 12:19   ` Vlastimil Babka
2016-11-23 12:19     ` Vlastimil Babka
2016-11-23 12:35     ` Michal Hocko
2016-11-23 12:35       ` Michal Hocko
2016-11-24  7:41       ` Vlastimil Babka
2016-11-24  7:41         ` Vlastimil Babka
2016-11-24  7:51         ` Michal Hocko
2016-11-24  7:51           ` Michal Hocko
2016-11-23 14:35   ` Tetsuo Handa
2016-11-23 14:35     ` Tetsuo Handa
2016-11-23 15:35     ` Michal Hocko
2016-11-23 15:35       ` Michal Hocko
2016-11-25 12:00       ` Tetsuo Handa
2016-11-25 12:00         ` Tetsuo Handa
2016-11-25 13:18         ` Michal Hocko [this message]
2016-11-25 13:18           ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161125131806.GB24353@dhcp22.suse.cz \
    --to=mhocko@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=penguin-kernel@I-love.SAKURA.ne.jp \
    --cc=rientjes@google.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.