All of lore.kernel.org
 help / color / mirror / Atom feed
From: Harsh Shandilya <harsh@prjkt.io>
To: Michal Hocko <mhocko@kernel.org>
Cc: stable@vger.kernel.org,
	Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	Mel Gorman <mgorman@suse.de>, Dave Chinner <david@fromorbit.com>,
	Mark Fasheh <mfasheh@suse.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH 3.18.y 3/5] mm: allow GFP_{FS,IO} for page_cache_read page cache allocation
Date: Mon, 23 Apr 2018 08:46:06 +0530	[thread overview]
Message-ID: <390EC9D9-32C8-42D2-8970-F4640B45C523@prjkt.io> (raw)
In-Reply-To: <20180422224636.GL17484@dhcp22.suse.cz>

On 23 April 2018 4:16:36 AM IST, Michal Hocko <mhocko@kernel.org> wrote:
>I am not reallu sure it is a good idea to blindly apply this patch to
>the older stable tree. Have you checked all filemap_fault handlers?
>This might have been quite different in 3.18 than for the kernel I was
>developing this against.

I did, but it's likely that I missed a few instances.

>If the sole purpose of this backport is to make other
>patch (abc1be13fd11 ("mm/filemap.c: fix NULL pointer in
>page_cache_tree_insert()")) apply easier then I've already suggested
>how
>to handle those rejects.

I'll look for the email after this exam and spin up a fixed series, thanks for the heads-up!

>On Mon 23-04-18 01:37:44, Harsh Shandilya wrote:
>> From: Michal Hocko <mhocko@suse.com>
>> 
>> Commit c20cd45eb01748f0fba77a504f956b000df4ea73 upstream.
>> 
>> page_cache_read has been historically using page_cache_alloc_cold to
>> allocate a new page.  This means that mapping_gfp_mask is used as the
>> base for the gfp_mask.  Many filesystems are setting this mask to
>> GFP_NOFS to prevent from fs recursion issues.  page_cache_read is
>called
>> from the vm_operations_struct::fault() context during the page fault.
>> This context doesn't need the reclaim protection normally.
>> 
>> ceph and ocfs2 which call filemap_fault from their fault handlers
>seem
>> to be OK because they are not taking any fs lock before invoking
>generic
>> implementation.  xfs which takes XFS_MMAPLOCK_SHARED is safe from the
>> reclaim recursion POV because this lock serializes truncate and punch
>> hole with the page faults and it doesn't get involved in the reclaim.
>> 
>> There is simply no reason to deliberately use a weaker allocation
>> context when a __GFP_FS | __GFP_IO can be used.  The GFP_NOFS
>protection
>> might be even harmful.  There is a push to fail GFP_NOFS allocations
>> rather than loop within allocator indefinitely with a very limited
>> reclaim ability.  Once we start failing those requests the OOM killer
>> might be triggered prematurely because the page cache allocation
>failure
>> is propagated up the page fault path and end up in
>> pagefault_out_of_memory.
>> 
>> We cannot play with mapping_gfp_mask directly because that would be
>racy
>> wrt.  parallel page faults and it might interfere with other users
>who
>> really rely on NOFS semantic from the stored gfp_mask.  The mask is
>also
>> inode proper so it would even be a layering violation.  What we can
>do
>> instead is to push the gfp_mask into struct vm_fault and allow fs
>layer
>> to overwrite it should the callback need to be called with a
>different
>> allocation context.
>> 
>> Initialize the default to (mapping_gfp_mask | __GFP_FS | __GFP_IO)
>> because this should be safe from the page fault path normally.  Why
>do
>> we care about mapping_gfp_mask at all then? Because this doesn't hold
>> only reclaim protection flags but it also might contain zone and
>> movability restrictions (GFP_DMA32, __GFP_MOVABLE and others) so we
>have
>> to respect those.
>> 
>> Signed-off-by: Michal Hocko <mhocko@suse.com>
>> Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
>> Acked-by: Jan Kara <jack@suse.com>
>> Acked-by: Vlastimil Babka <vbabka@suse.cz>
>> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
>> Cc: Mel Gorman <mgorman@suse.de>
>> Cc: Dave Chinner <david@fromorbit.com>
>> Cc: Mark Fasheh <mfasheh@suse.com>
>> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
>> Signed-off-by: Harsh Shandilya <harsh@prjkt.io>
>> ---
>>  include/linux/mm.h |  4 ++++
>>  mm/filemap.c       |  8 ++++----
>>  mm/memory.c        | 17 +++++++++++++++++
>>  3 files changed, 25 insertions(+), 4 deletions(-)
>> 
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 5adffb0a468f..9ac4697979e8 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -203,9 +203,13 @@ extern pgprot_t protection_map[16];
>>   *
>>   * pgoff should be used in favour of virtual_address, if possible.
>If pgoff
>>   * is used, one may implement ->remap_pages to get nonlinear mapping
>support.
>> + *
>> + * MM layer fills up gfp_mask for page allocations but fault handler
>might
>> + * alter it if its implementation requires a different allocation
>context.
>>   */
>>  struct vm_fault {
>>  	unsigned int flags;		/* FAULT_FLAG_xxx flags */
>> +	gfp_t gfp_mask;			/* gfp mask to be used for allocations */
>>  	pgoff_t pgoff;			/* Logical page offset based on vma */
>>  	void __user *virtual_address;	/* Faulting virtual address */
>>  
>> diff --git a/mm/filemap.c b/mm/filemap.c
>> index 7e6ab98d4d3c..aafeeefcb00d 100644
>> --- a/mm/filemap.c
>> +++ b/mm/filemap.c
>> @@ -1746,18 +1746,18 @@ EXPORT_SYMBOL(generic_file_read_iter);
>>   * This adds the requested page to the page cache if it isn't
>already there,
>>   * and schedules an I/O to read in its contents from disk.
>>   */
>> -static int page_cache_read(struct file *file, pgoff_t offset)
>> +static int page_cache_read(struct file *file, pgoff_t offset, gfp_t
>gfp_mask)
>>  {
>>  	struct address_space *mapping = file->f_mapping;
>>  	struct page *page;
>>  	int ret;
>>  
>>  	do {
>> -		page = page_cache_alloc_cold(mapping);
>> +		page = __page_cache_alloc(gfp_mask|__GFP_COLD);
>>  		if (!page)
>>  			return -ENOMEM;
>>  
>> -		ret = add_to_page_cache_lru(page, mapping, offset, GFP_KERNEL);
>> +		ret = add_to_page_cache_lru(page, mapping, offset, gfp_mask &
>GFP_KERNEL);
>>  		if (ret == 0)
>>  			ret = mapping->a_ops->readpage(file, page);
>>  		else if (ret == -EEXIST)
>> @@ -1940,7 +1940,7 @@ no_cached_page:
>>  	 * We're only likely to ever get here if MADV_RANDOM is in
>>  	 * effect.
>>  	 */
>> -	error = page_cache_read(file, offset);
>> +	error = page_cache_read(file, offset, vmf->gfp_mask);
>>  
>>  	/*
>>  	 * The page we want has now been added to the page cache.
>> diff --git a/mm/memory.c b/mm/memory.c
>> index 0c4f5e36b155..5a62c6a42143 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -1973,6 +1973,20 @@ static inline void cow_user_page(struct page
>*dst, struct page *src, unsigned lo
>>  		copy_user_highpage(dst, src, va, vma);
>>  }
>>  
>> +static gfp_t __get_fault_gfp_mask(struct vm_area_struct *vma)
>> +{
>> +	struct file *vm_file = vma->vm_file;
>> +
>> +	if (vm_file)
>> +		return mapping_gfp_mask(vm_file->f_mapping) | __GFP_FS | __GFP_IO;
>> +
>> +	/*
>> +	 * Special mappings (e.g. VDSO) do not have any file so fake
>> +	 * a default GFP_KERNEL for them.
>> +	 */
>> +	return GFP_KERNEL;
>> +}
>> +
>>  /*
>>   * Notify the address space that the page is about to become
>writable so that
>>   * it can prohibit this or wait for the page to get into an
>appropriate state.
>> @@ -1988,6 +2002,7 @@ static int do_page_mkwrite(struct
>vm_area_struct *vma, struct page *page,
>>  	vmf.virtual_address = (void __user *)(address & PAGE_MASK);
>>  	vmf.pgoff = page->index;
>>  	vmf.flags = FAULT_FLAG_WRITE|FAULT_FLAG_MKWRITE;
>> +	vmf.gfp_mask = __get_fault_gfp_mask(vma);
>>  	vmf.page = page;
>>  
>>  	ret = vma->vm_ops->page_mkwrite(vma, &vmf);
>> @@ -2670,6 +2685,7 @@ static int __do_fault(struct vm_area_struct
>*vma, unsigned long address,
>>  	vmf.pgoff = pgoff;
>>  	vmf.flags = flags;
>>  	vmf.page = NULL;
>> +	vmf.gfp_mask = __get_fault_gfp_mask(vma);
>>  
>>  	ret = vma->vm_ops->fault(vma, &vmf);
>>  	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE |
>VM_FAULT_RETRY)))
>> @@ -2834,6 +2850,7 @@ static void do_fault_around(struct
>vm_area_struct *vma, unsigned long address,
>>  	vmf.pgoff = pgoff;
>>  	vmf.max_pgoff = max_pgoff;
>>  	vmf.flags = flags;
>> +	vmf.gfp_mask = __get_fault_gfp_mask(vma);
>>  	vma->vm_ops->map_pages(vma, &vmf);
>>  }
>>  
>> -- 
>> 2.15.0.2308.g658a28aa74af
>> 


-- 
Harsh Shandilya, PRJKT Development LLC

  reply	other threads:[~2018-04-23  3:16 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-22 20:07 [PATCH 3.18.y 0/5] Backports for 3.18.y Harsh Shandilya
2018-04-22 20:07 ` [PATCH 3.18.y 1/5] ext4: fix deadlock between inline_data and ext4_expand_extra_isize_ea() Harsh Shandilya
2018-04-22 20:07 ` [PATCH 3.18.y 2/5] ext4: bugfix for mmaped pages in mpage_release_unused_pages() Harsh Shandilya
2018-04-22 20:07 ` [PATCH 3.18.y 3/5] mm: allow GFP_{FS,IO} for page_cache_read page cache allocation Harsh Shandilya
2018-04-22 22:46   ` Michal Hocko
2018-04-23  3:16     ` Harsh Shandilya [this message]
2018-04-22 20:07 ` [PATCH 3.18.y 4/5] mm/filemap.c: fix NULL pointer in page_cache_tree_insert() Harsh Shandilya
2018-04-22 20:07 ` [PATCH 3.18.y 5/5] ext4: don't update checksum of new initialized bitmaps Harsh Shandilya
2018-04-24 12:30 ` [PATCH 3.18.y 0/5] Backports for 3.18.y Greg KH
2018-04-24 12:59   ` Harsh Shandilya
2018-04-24 13:11   ` Harsh Shandilya
2018-04-24 14:03     ` Greg KH
2018-04-24 14:11       ` Harsh Shandilya
2018-04-24 14:24         ` Greg KH

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=390EC9D9-32C8-42D2-8970-F4640B45C523@prjkt.io \
    --to=harsh@prjkt.io \
    --cc=akpm@linux-foundation.org \
    --cc=david@fromorbit.com \
    --cc=mfasheh@suse.com \
    --cc=mgorman@suse.de \
    --cc=mhocko@kernel.org \
    --cc=penguin-kernel@I-love.SAKURA.ne.jp \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.