From: NeilBrown <neilb@suse.de>
To: Michal Hocko <mhocko@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Al Viro <viro@zeniv.linux.org.uk>,
Johannes Weiner <hannes@cmpxchg.org>,
Mel Gorman <mgorman@suse.de>, Rik van Riel <riel@redhat.com>,
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
Sage Weil <sage@inktank.com>, Mark Fasheh <mfasheh@suse.com>,
linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] mm: Use GFP_KERNEL allocation for the page cache in page_cache_read
Date: Thu, 19 Mar 2015 08:38:35 +1100
Message-ID: <20150319083835.2115ba11@notabene.brown>
In-Reply-To: <20150318154540.GN17241@dhcp22.suse.cz>
On Wed, 18 Mar 2015 16:45:40 +0100 Michal Hocko <mhocko@suse.cz> wrote:
> What do you think about this v2? I cannot say I would like it but I
> really dislike the whole mapping_gfp_mask API to be honest.
> ---
> From d88010d6f5f59d7eb87b691e27e201d12cab9141 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.cz>
> Date: Wed, 18 Mar 2015 16:06:40 +0100
> Subject: [PATCH] mm: Allow __GFP_FS for page_cache_read page cache allocation
>
> page_cache_read has been historically using page_cache_alloc_cold to
> allocate a new page. This means that mapping_gfp_mask is used as the
> base for the gfp_mask. Many filesystems are setting this mask to
> GFP_NOFS to prevent fs recursion issues. page_cache_read is,
> however, not called from the fs layer so it doesn't need this
> protection. Even ceph and ocfs2 which call filemap_fault from their
> fault handlers seem to be OK because they are not taking any fs lock
> before invoking generic implementation.
>
> The protection might be even harmful. There is a strong push to fail
> GFP_NOFS allocations rather than loop within allocator indefinitely with
> a very limited reclaim ability. Once we start failing those requests
> the OOM killer might be triggered prematurely because the page cache
> allocation failure is propagated up the page fault path and ends up in
> pagefault_out_of_memory.
>
> Add __GFP_FS and __GFP_IO to the gfp mask which is coming from the
> mapping to fix this issue.
>
> Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Signed-off-by: Michal Hocko <mhocko@suse.cz>
> ---
> mm/filemap.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 968cd8e03d2e..8b50d5eb52b2 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1752,7 +1752,15 @@ static int page_cache_read(struct file *file, pgoff_t offset)
> int ret;
>
> do {
> - page = page_cache_alloc_cold(mapping);
> + gfp_t page_cache_gfp = mapping_gfp_mask(mapping)|__GFP_COLD;
> +
> + /*
> + * This code is not called from the fs layer so we do not need
> + * reclaim recursion protection. !GFP_FS might fail too easily
> + * and trigger the OOM killer prematurely.
> + */
> + page_cache_gfp |= __GFP_FS | __GFP_IO;
> + page = __page_cache_alloc(page_cache_gfp);
> if (!page)
> return -ENOMEM;
>
Nearly half the places in the kernel which call mapping_gfp_mask() remove the
__GFP_FS bit.
That suggests to me that it might make sense to have
mapping_gfp_mask_fs()
and
mapping_gfp_mask_nofs()
and let the presence of __GFP_FS (and __GFP_IO) be determined by the
call-site rather than the filesystem.
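A rough userspace sketch of what such a pair of helpers might look like is given below. It is only an illustration: the `gfp_t` typedef, the flag values, and the one-field `struct address_space` are stand-ins so the fragment compiles outside the kernel, and the helper bodies are an assumption about the intended semantics, not code from any posted patch. In particular, whether the `_nofs` variant should also clear `__GFP_IO` is left open here.

```c
#include <assert.h>	/* static_assert (C11) */

/*
 * Stand-ins for kernel definitions so the sketch builds in userspace.
 * The flag values are illustrative and should not be relied upon.
 */
typedef unsigned int gfp_t;
#define __GFP_WAIT	0x10u
#define __GFP_IO	0x40u
#define __GFP_FS	0x80u

static_assert((__GFP_FS & __GFP_IO) == 0, "flag bits must be distinct");

/* Simplified mock: the real structure does not store the mask like this. */
struct address_space {
	gfp_t gfp_mask;
};

/* The mask as chosen by the filesystem. */
static inline gfp_t mapping_gfp_mask(const struct address_space *mapping)
{
	return mapping->gfp_mask;
}

/*
 * Hypothetical helper: for call sites that are known not to be invoked
 * from the fs layer, so reclaim may recurse into the fs and do IO.
 */
static inline gfp_t mapping_gfp_mask_fs(const struct address_space *mapping)
{
	return mapping_gfp_mask(mapping) | __GFP_FS | __GFP_IO;
}

/*
 * Hypothetical helper: for call sites that may hold fs locks. Mirrors the
 * existing "mapping_gfp_mask(mapping) & ~__GFP_FS" pattern.
 */
static inline gfp_t mapping_gfp_mask_nofs(const struct address_space *mapping)
{
	return mapping_gfp_mask(mapping) & ~__GFP_FS;
}
```

With helpers along these lines, page_cache_read() would ask for the `_fs` variant, and the callers that currently mask out __GFP_FS by hand would use the `_nofs` one.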
However, I am a bit concerned about drivers/block/loop.c.
Might a filesystem read on the loop block device wait for a page_cache_read()
on the loop-mounted file? In that case you really don't want __GFP_FS set
when allocating that page.
NeilBrown