All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: mhocko@kernel.org
Cc: linux-mm@kvack.org, vbabka@suse.cz, rientjes@google.com,
	hannes@cmpxchg.org, mgorman@suse.de, akpm@linux-foundation.org,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC 2/2] mm, oom: do not enfore OOM killer for __GFP_NOFAIL automatically
Date: Fri, 25 Nov 2016 21:00:52 +0900	[thread overview]
Message-ID: <201611252100.ADG04225.MFOSOVtHJFFLQO@I-love.SAKURA.ne.jp> (raw)
In-Reply-To: <20161123153544.GN2864@dhcp22.suse.cz>

Michal Hocko wrote:
> On Wed 23-11-16 23:35:10, Tetsuo Handa wrote:
> > If __alloc_pages_nowmark() called by __GFP_NOFAIL could not find pages
> > with requested order due to fragmentation, __GFP_NOFAIL should invoke
> > the OOM killer. I believe that risking kill all processes and panic the
> > system eventually is better than __GFP_NOFAIL livelock.
>
> I violently disagree. Just imagine a driver which asks for an order-9
> page and cannot really continue without it so it uses GFP_NOFAIL. There
> is absolutely no reason to disrupt or even put the whole system down
> just because of this particular request. It might take for ever to
> continue but that is to be expected when asking for such a hard
> requirement.

Did we find such in-tree drivers? If any, we likely already know it via
WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1)); in buffered_rmqueue().
Even if there were such out-of-tree drivers, we don't need to take care of
out-of-tree drivers.

> > Unfortunately, there seems to be cases where the
> > caller needs to use GFP_NOFS rather than GFP_KERNEL due to unclear dependency
> > between memory allocation by system calls and memory reclaim by filesystems.
>
> I do not understand your point here. Syscall is an entry point to the
> kernel where we cannot recurse to the FS code so GFP_NOFS seems wrong
> thing to ask.

Will you look at http://marc.info/?t=120716967100004&r=1&w=2 which lead to
commit a02fe13297af26c1 ("selinux: prevent rentry into the FS") and commit
869ab5147e1eead8 ("SELinux: more GFP_NOFS fixups to prevent selinux from
re-entering the fs code") ? My understanding is that mkdir() system call
caused memory allocation for inode creation and that memory allocation
caused memory reclaim which had to be !__GFP_FS.

And whether we need to use GFP_NOFS at specific point is very very unclear.
For example, security_inode_init_security() calls call_int_hook() macro
which calls smack_inode_init_security() if Smack is active.
smack_inode_init_security() uses GFP_NOFS for memory allocation.
security_inode_init_security() also calls evm_inode_init_security(), and
evm_inode_init_security() uses GFP_NOFS for memory allocation.
Looks consistent? Yes.

But evm_inode_init_security() also calls evm_init_hmac() which in turn calls
init_desc() which uses GFP_KERNEL for memory allocation. This is not consistent.

And security_inode_init_security() also calls initxattrs() callback which is
provided by filesystem code. For example, btrfs_initxattrs() is called if
security_inode_init_security() is called by btrfs. And btrfs_initxattrs() is
using GFP_KERNEL for memory allocation. This is not consistent too.

Either we are needlessly using GFP_NOFS with risk of retry-forever loop or
we are wrongly using GFP_KERNEL with risk of memory reclaim deadlock.
Apart from we need to make these GFP_NOFS/GFP_KERNEL usages consistent
(although whether we need to use GFP_NOFS is very very unclear),
I do want to allow memory allocations from functions which are called by
system calls to invoke the OOM-killer (e.g. __GFP_MAY_OOMKILL) rather than
risk retry-forever loop (or fail that request) even if we need to use GFP_NOFS.
Also, I'm willing to give up memory allocations from functions which are
called by system calls if SIGKILL is pending (i.e. __GFP_KILLABLE).

Did you understand my point?

WARNING: multiple messages have this Message-ID (diff)
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: mhocko@kernel.org
Cc: linux-mm@kvack.org, vbabka@suse.cz, rientjes@google.com,
	hannes@cmpxchg.org, mgorman@suse.de, akpm@linux-foundation.org,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC 2/2] mm, oom: do not enfore OOM killer for __GFP_NOFAIL automatically
Date: Fri, 25 Nov 2016 21:00:52 +0900	[thread overview]
Message-ID: <201611252100.ADG04225.MFOSOVtHJFFLQO@I-love.SAKURA.ne.jp> (raw)
In-Reply-To: <20161123153544.GN2864@dhcp22.suse.cz>

Michal Hocko wrote:
> On Wed 23-11-16 23:35:10, Tetsuo Handa wrote:
> > If __alloc_pages_nowmark() called by __GFP_NOFAIL could not find pages
> > with requested order due to fragmentation, __GFP_NOFAIL should invoke
> > the OOM killer. I believe that risking kill all processes and panic the
> > system eventually is better than __GFP_NOFAIL livelock.
>
> I violently disagree. Just imagine a driver which asks for an order-9
> page and cannot really continue without it so it uses GFP_NOFAIL. There
> is absolutely no reason to disrupt or even put the whole system down
> just because of this particular request. It might take for ever to
> continue but that is to be expected when asking for such a hard
> requirement.

Did we find such in-tree drivers? If any, we likely already know it via
WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1)); in buffered_rmqueue().
Even if there were such out-of-tree drivers, we don't need to take care of
out-of-tree drivers.

> > Unfortunately, there seems to be cases where the
> > caller needs to use GFP_NOFS rather than GFP_KERNEL due to unclear dependency
> > between memory allocation by system calls and memory reclaim by filesystems.
>
> I do not understand your point here. Syscall is an entry point to the
> kernel where we cannot recurse to the FS code so GFP_NOFS seems wrong
> thing to ask.

Will you look at http://marc.info/?t=120716967100004&r=1&w=2 which lead to
commit a02fe13297af26c1 ("selinux: prevent rentry into the FS") and commit
869ab5147e1eead8 ("SELinux: more GFP_NOFS fixups to prevent selinux from
re-entering the fs code") ? My understanding is that mkdir() system call
caused memory allocation for inode creation and that memory allocation
caused memory reclaim which had to be !__GFP_FS.

And whether we need to use GFP_NOFS at specific point is very very unclear.
For example, security_inode_init_security() calls call_int_hook() macro
which calls smack_inode_init_security() if Smack is active.
smack_inode_init_security() uses GFP_NOFS for memory allocation.
security_inode_init_security() also calls evm_inode_init_security(), and
evm_inode_init_security() uses GFP_NOFS for memory allocation.
Looks consistent? Yes.

But evm_inode_init_security() also calls evm_init_hmac() which in turn calls
init_desc() which uses GFP_KERNEL for memory allocation. This is not consistent.

And security_inode_init_security() also calls initxattrs() callback which is
provided by filesystem code. For example, btrfs_initxattrs() is called if
security_inode_init_security() is called by btrfs. And btrfs_initxattrs() is
using GFP_KERNEL for memory allocation. This is not consistent too.

Either we are needlessly using GFP_NOFS with risk of retry-forever loop or
we are wrongly using GFP_KERNEL with risk of memory reclaim deadlock.
Apart from we need to make these GFP_NOFS/GFP_KERNEL usages consistent
(although whether we need to use GFP_NOFS is very very unclear),
I do want to allow memory allocations from functions which are called by
system calls to invoke the OOM-killer (e.g. __GFP_MAY_OOMKILL) rather than
risk retry-forever loop (or fail that request) even if we need to use GFP_NOFS.
Also, I'm willing to give up memory allocations from functions which are
called by system calls if SIGKILL is pending (i.e. __GFP_KILLABLE).

Did you understand my point?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2016-11-25 12:47 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-23  6:49 [RFC 0/2] GFP_NOFAIL cleanups Michal Hocko
2016-11-23  6:49 ` Michal Hocko
2016-11-23  6:49 ` [RFC 1/2] mm: consolidate GFP_NOFAIL checks in the allocator slowpath Michal Hocko
2016-11-23  6:49   ` Michal Hocko
2016-11-23 10:43   ` Vlastimil Babka
2016-11-23 10:43     ` Vlastimil Babka
2016-11-23  6:49 ` [RFC 2/2] mm, oom: do not enfore OOM killer for __GFP_NOFAIL automatically Michal Hocko
2016-11-23  6:49   ` Michal Hocko
2016-11-23 12:19   ` Vlastimil Babka
2016-11-23 12:19     ` Vlastimil Babka
2016-11-23 12:35     ` Michal Hocko
2016-11-23 12:35       ` Michal Hocko
2016-11-24  7:41       ` Vlastimil Babka
2016-11-24  7:41         ` Vlastimil Babka
2016-11-24  7:51         ` Michal Hocko
2016-11-24  7:51           ` Michal Hocko
2016-11-23 14:35   ` Tetsuo Handa
2016-11-23 14:35     ` Tetsuo Handa
2016-11-23 15:35     ` Michal Hocko
2016-11-23 15:35       ` Michal Hocko
2016-11-25 12:00       ` Tetsuo Handa [this message]
2016-11-25 12:00         ` Tetsuo Handa
2016-11-25 13:18         ` Michal Hocko
2016-11-25 13:18           ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=201611252100.ADG04225.MFOSOVtHJFFLQO@I-love.SAKURA.ne.jp \
    --to=penguin-kernel@i-love.sakura.ne.jp \
    --cc=akpm@linux-foundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@kernel.org \
    --cc=rientjes@google.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.