linux-kernel.vger.kernel.org archive mirror
From: Michal Hocko <mhocko@suse.cz>
To: Dave Chinner <david@fromorbit.com>
Cc: Rik van Riel <riel@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Mel Gorman <mgorman@suse.de>, Neil Brown <neilb@suse.de>,
	Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	Sage Weil <sage@inktank.com>, Mark Fasheh <mfasheh@suse.com>,
	linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] mm: Use GFP_KERNEL allocation for the page cache in page_cache_read
Date: Fri, 20 Mar 2015 14:14:53 +0100
Message-ID: <20150320131453.GA4821@dhcp22.suse.cz>
In-Reply-To: <20150320034820.GH28621@dastard>

On Fri 20-03-15 14:48:20, Dave Chinner wrote:
> On Thu, Mar 19, 2015 at 01:44:41PM +0100, Michal Hocko wrote:
> > On Thu 19-03-15 18:14:39, Dave Chinner wrote:
> > > On Wed, Mar 18, 2015 at 03:55:28PM +0100, Michal Hocko wrote:
> > > > On Wed 18-03-15 10:44:11, Rik van Riel wrote:
> > > > > On 03/18/2015 10:09 AM, Michal Hocko wrote:
> > > > > > page_cache_read has been historically using page_cache_alloc_cold to
> > > > > > allocate a new page. This means that mapping_gfp_mask is used as the
> > > > > > base for the gfp_mask. Many filesystems are setting this mask to
> > > > > > > GFP_NOFS to prevent fs recursion issues. page_cache_read is,
> > > > > > however, not called from the fs layer 
> > > > > 
> > > > > Is that true for filesystems that have directories in
> > > > > the page cache?
> > > > 
> > > > I haven't found any explicit callers of filemap_fault except for ocfs2
> > > > and ceph and those seem OK to me. Which filesystems do you have in mind?
> > > 
> > > Just about every major filesystem calls filemap_fault through the
> > > .fault callout.
> > 
> > That is right but the callback is called from the VM layer where we
> > obviously do not take any fs locks (we are holding only mmap_sem
> > for reading).
> > Those who call filemap_fault directly (ocfs2 and ceph) and those
> > who call the callback directly: qxl_ttm_fault, radeon_ttm_fault,
> > kernfs_vma_fault, shm_fault seem to be safe from the reclaim recursion
> > POV. radeon_ttm_fault takes a lock for reading but that one doesn't seem
> > to be used from the reclaim context.
> > 
> > Or did I miss your point? Are you concerned about some fs overloading
> > filemap_fault and doing some locking before delegating to filemap_fault?
> 
> The latter:
> 
> https://git.kernel.org/cgit/linux/kernel/git/dgc/linux-xfs.git/commit/?h=xfs-mmap-lock&id=de0e8c20ba3a65b0f15040aabbefdc1999876e6b

I will have a look at the code to see what we can do about it.
 
> > > GFP_KERNEL allocation for mappings is simply wrong. All mapping
> > > allocations where the caller cannot pass a gfp_mask need to obey
> > > the mapping_gfp_mask that is set by the mapping owner....
> > 
> > Hmm, I thought this is true only when the function might be called from
> > the fs path.
> 
> How do you know in, say, mpage_readpages, you aren't being called
> from a fs path that holds locks? e.g. we can get there from ext4
> doing readdir, so it is holding an i_mutex lock at that point.
> 
> Many other paths into mpage_readpages don't hold locks, but there
> are some that do, and those that do need functions like this to
> obey the mapping_gfp_mask because it is set appropriately for the
> allocation context of the inode that owns the mapping....

What about the following?
---
From 5d905cb291138d61bbab056845d6e53bc4451ec8 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.cz>
Date: Thu, 19 Mar 2015 14:56:56 +0100
Subject: [PATCH 1/2] mm: do not ignore mapping_gfp_mask in page cache
 allocation paths

page_cache_read, do_generic_file_read, __generic_file_splice_read and
__ntfs_grab_cache_pages currently ignore mapping_gfp_mask when calling
add_to_page_cache_lru, which might cause recursion into the fs in the
direct reclaim path if the mapping really relies on GFP_NOFS semantics.

This doesn't seem to be the case now because page_cache_read (page fault
path) doesn't seem to suffer from reclaim recursion issues, and
do_generic_file_read and __generic_file_splice_read also shouldn't be
called under fs locks which would deadlock in the reclaim path. Anyway,
it is better to obey the mapping gfp mask and prevent later breakage.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 fs/ntfs/file.c | 2 +-
 fs/splice.c    | 2 +-
 mm/filemap.c   | 6 ++++--
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/ntfs/file.c b/fs/ntfs/file.c
index 1da9b2d184dc..568c9dbc7e61 100644
--- a/fs/ntfs/file.c
+++ b/fs/ntfs/file.c
@@ -422,7 +422,7 @@ static inline int __ntfs_grab_cache_pages(struct address_space *mapping,
 				}
 			}
 			err = add_to_page_cache_lru(*cached_page, mapping, index,
-					GFP_KERNEL);
+					GFP_KERNEL & mapping_gfp_mask(mapping));
 			if (unlikely(err)) {
 				if (err == -EEXIST)
 					continue;
diff --git a/fs/splice.c b/fs/splice.c
index 75c6058eabf2..71f6c51f019a 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -360,7 +360,7 @@ __generic_file_splice_read(struct file *in, loff_t *ppos,
 				break;
 
 			error = add_to_page_cache_lru(page, mapping, index,
-						GFP_KERNEL);
+						GFP_KERNEL & mapping_gfp_mask(mapping));
 			if (unlikely(error)) {
 				page_cache_release(page);
 				if (error == -EEXIST)
diff --git a/mm/filemap.c b/mm/filemap.c
index 968cd8e03d2e..4756cba51655 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1656,7 +1656,8 @@ no_cached_page:
 			goto out;
 		}
 		error = add_to_page_cache_lru(page, mapping,
-						index, GFP_KERNEL);
+						index,
+						GFP_KERNEL & mapping_gfp_mask(mapping));
 		if (error) {
 			page_cache_release(page);
 			if (error == -EEXIST) {
@@ -1756,7 +1757,8 @@ static int page_cache_read(struct file *file, pgoff_t offset)
 		if (!page)
 			return -ENOMEM;
 
-		ret = add_to_page_cache_lru(page, mapping, offset, GFP_KERNEL);
+		ret = add_to_page_cache_lru(page, mapping, offset,
+				GFP_KERNEL & mapping_gfp_mask(mapping));
 		if (ret == 0)
 			ret = mapping->a_ops->readpage(file, page);
 		else if (ret == -EEXIST)
-- 
2.1.4
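
For anybody following along, here is a minimal userspace sketch of what
the `GFP_KERNEL & mapping_gfp_mask(mapping)` expression in the hunks above
is meant to achieve. It is not kernel code. The flag bit values, the
GFP_BIT_* names (standing in for __GFP_WAIT, __GFP_IO and __GFP_FS), the
cut-down struct address_space and the page_cache_gfp()/may_enter_fs()
helpers are all assumptions made up for illustration; the real definitions
live in include/linux/gfp.h and include/linux/pagemap.h.

```c
#include <stdbool.h>
#include <stdio.h>

typedef unsigned int gfp_t;

/* Illustrative bit values only, NOT the kernel's real ones. */
#define GFP_BIT_WAIT 0x10u   /* stands in for __GFP_WAIT */
#define GFP_BIT_IO   0x40u   /* stands in for __GFP_IO */
#define GFP_BIT_FS   0x80u   /* stands in for __GFP_FS */

#define GFP_NOFS   (GFP_BIT_WAIT | GFP_BIT_IO)
#define GFP_KERNEL (GFP_BIT_WAIT | GFP_BIT_IO | GFP_BIT_FS)

/* Cut-down stand-in for the kernel structure: only the mask matters here. */
struct address_space {
	gfp_t gfp_mask;
};

gfp_t mapping_gfp_mask(const struct address_space *mapping)
{
	return mapping->gfp_mask;
}

/*
 * Hypothetical helper mirroring the expression used in the patch: start
 * from GFP_KERNEL and keep only the bits the mapping owner allows.
 */
gfp_t page_cache_gfp(const struct address_space *mapping)
{
	return GFP_KERNEL & mapping_gfp_mask(mapping);
}

/* Hypothetical helper: may reclaim recurse into the fs with this mask? */
bool may_enter_fs(gfp_t gfp)
{
	return (gfp & GFP_BIT_FS) != 0;
}

/* Print the effective mask for a mapping and whether fs recursion is allowed. */
void show_mask(const char *name, const struct address_space *mapping)
{
	gfp_t gfp = page_cache_gfp(mapping);

	printf("%s: effective mask 0x%x, may enter fs: %s\n",
	       name, gfp, may_enter_fs(gfp) ? "yes" : "no");
}
```

With a mapping whose mask is GFP_NOFS the result has the FS bit cleared,
so direct reclaim triggered by the allocation would not be allowed to
recurse into the filesystem. With a GFP_KERNEL mapping the expression is a
no-op. The AND can only remove bits from GFP_KERNEL, never add any.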


-- 
Michal Hocko
SUSE Labs
