Date: Mon, 23 Apr 2018 08:46:06 +0530
In-Reply-To: <20180422224636.GL17484@dhcp22.suse.cz>
References: <20180422200746.29118-1-harsh@prjkt.io> <20180422200746.29118-4-harsh@prjkt.io> <20180422224636.GL17484@dhcp22.suse.cz>
Subject: Re: [PATCH 3.18.y 3/5] mm: allow GFP_{FS,IO} for page_cache_read page cache allocation
To: Michal Hocko
CC: stable@vger.kernel.org, Tetsuo Handa, Mel Gorman, Dave Chinner, Mark Fasheh, Andrew Morton, Linus Torvalds
From: Harsh Shandilya
Message-ID: <390EC9D9-32C8-42D2-8970-F4640B45C523@prjkt.io>

On 23 April 2018 4:16:36 AM IST, Michal Hocko wrote:
>I am not really sure it is a good idea to blindly apply this patch to
>the older stable tree. Have you checked all filemap_fault handlers?
>This might have been quite different in 3.18 than for the kernel I was
>developing this against.

I did, but it's likely that I missed a few instances.

>If the sole purpose of this backport is to make the other patch
>(abc1be13fd11 ("mm/filemap.c: fix NULL pointer in
>page_cache_tree_insert()")) apply more easily, then I've already
>suggested how to handle those rejects.

I'll look for the email after this exam and spin up a fixed series,
thanks for the heads-up!

>On Mon 23-04-18 01:37:44, Harsh Shandilya wrote:
>> From: Michal Hocko
>>
>> Commit c20cd45eb01748f0fba77a504f956b000df4ea73 upstream.
>>
>> page_cache_read has historically used page_cache_alloc_cold to
>> allocate a new page. This means that mapping_gfp_mask is used as the
>> base for the gfp_mask. Many filesystems set this mask to GFP_NOFS to
>> prevent fs recursion issues. page_cache_read is called from the
>> vm_operations_struct::fault() context during the page fault. This
>> context doesn't normally need the reclaim protection.
>>
>> ceph and ocfs2, which call filemap_fault from their fault handlers,
>> seem to be OK because they do not take any fs lock before invoking
>> the generic implementation. xfs, which takes XFS_MMAPLOCK_SHARED, is
>> safe from the reclaim recursion POV because this lock serializes
>> truncate and punch hole with the page faults and it doesn't get
>> involved in the reclaim.
>>
>> There is simply no reason to deliberately use a weaker allocation
>> context when __GFP_FS | __GFP_IO can be used. The GFP_NOFS
>> protection might even be harmful. There is a push to fail GFP_NOFS
>> allocations rather than loop within the allocator indefinitely with
>> a very limited reclaim ability. Once we start failing those requests
>> the OOM killer might be triggered prematurely because the page cache
>> allocation failure is propagated up the page fault path and ends up
>> in pagefault_out_of_memory.
>>
>> We cannot play with mapping_gfp_mask directly because that would be
>> racy wrt. parallel page faults and it might interfere with other
>> users who really rely on NOFS semantics from the stored gfp_mask.
>> The mask is also inode proper so it would even be a layering
>> violation. What we can do instead is to push the gfp_mask into
>> struct vm_fault and allow the fs layer to overwrite it should the
>> callback need to be called with a different allocation context.
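Not part of the patch, but to make the mechanism concrete for other
stable reviewers: once vm_fault carries the mask, a fault handler that
genuinely takes fs locks before calling into the generic code can
narrow the mask the MM layer seeded. A minimal sketch against the 3.18
fault() signature; "examplefs" is made up, not anything in this series:

#include <linux/mm.h>

/*
 * Hypothetical fault handler: drop __GFP_FS from vmf->gfp_mask so the
 * page cache allocation done inside filemap_fault cannot recurse into
 * this filesystem, then hand off to the generic implementation.
 */
static int examplefs_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
        vmf->gfp_mask &= ~__GFP_FS;
        return filemap_fault(vma, vmf);
}
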
>> Initialize the default to (mapping_gfp_mask | __GFP_FS | __GFP_IO)
>> because this should normally be safe from the page fault path. Why
>> do we care about mapping_gfp_mask at all then? Because it doesn't
>> hold only reclaim protection flags; it might also contain zone and
>> movability restrictions (GFP_DMA32, __GFP_MOVABLE and others), so we
>> have to respect those.
>>
>> Signed-off-by: Michal Hocko
>> Reported-by: Tetsuo Handa
>> Acked-by: Jan Kara
>> Acked-by: Vlastimil Babka
>> Cc: Tetsuo Handa
>> Cc: Mel Gorman
>> Cc: Dave Chinner
>> Cc: Mark Fasheh
>> Signed-off-by: Andrew Morton
>> Signed-off-by: Linus Torvalds
>> Signed-off-by: Harsh Shandilya
>> ---
>>  include/linux/mm.h |  4 ++++
>>  mm/filemap.c       |  8 ++++----
>>  mm/memory.c        | 17 +++++++++++++++++
>>  3 files changed, 25 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 5adffb0a468f..9ac4697979e8 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -203,9 +203,13 @@ extern pgprot_t protection_map[16];
>>   *
>>   * pgoff should be used in favour of virtual_address, if possible. If pgoff
>>   * is used, one may implement ->remap_pages to get nonlinear mapping support.
>> + *
>> + * MM layer fills up gfp_mask for page allocations but fault handler might
>> + * alter it if its implementation requires a different allocation context.
>>   */
>>  struct vm_fault {
>>  	unsigned int flags;		/* FAULT_FLAG_xxx flags */
>> +	gfp_t gfp_mask;			/* gfp mask to be used for allocations */
>>  	pgoff_t pgoff;			/* Logical page offset based on vma */
>>  	void __user *virtual_address;	/* Faulting virtual address */
>>
>> diff --git a/mm/filemap.c b/mm/filemap.c
>> index 7e6ab98d4d3c..aafeeefcb00d 100644
>> --- a/mm/filemap.c
>> +++ b/mm/filemap.c
>> @@ -1746,18 +1746,18 @@ EXPORT_SYMBOL(generic_file_read_iter);
>>   * This adds the requested page to the page cache if it isn't already there,
>>   * and schedules an I/O to read in its contents from disk.
>>   */
>> -static int page_cache_read(struct file *file, pgoff_t offset)
>> +static int page_cache_read(struct file *file, pgoff_t offset, gfp_t gfp_mask)
>>  {
>>  	struct address_space *mapping = file->f_mapping;
>>  	struct page *page;
>>  	int ret;
>>
>>  	do {
>> -		page = page_cache_alloc_cold(mapping);
>> +		page = __page_cache_alloc(gfp_mask|__GFP_COLD);
>>  		if (!page)
>>  			return -ENOMEM;
>>
>> -		ret = add_to_page_cache_lru(page, mapping, offset, GFP_KERNEL);
>> +		ret = add_to_page_cache_lru(page, mapping, offset, gfp_mask & GFP_KERNEL);
>>  		if (ret == 0)
>>  			ret = mapping->a_ops->readpage(file, page);
>>  		else if (ret == -EEXIST)
>> @@ -1940,7 +1940,7 @@ no_cached_page:
>>  	 * We're only likely to ever get here if MADV_RANDOM is in
>>  	 * effect.
>>  	 */
>> -	error = page_cache_read(file, offset);
>> +	error = page_cache_read(file, offset, vmf->gfp_mask);
>>
>>  	/*
>>  	 * The page we want has now been added to the page cache.
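One aside on the hunk above, again not part of the patch: the page
itself is allocated with the full mask while add_to_page_cache_lru
gets gfp_mask & GFP_KERNEL. With the 3.18-era flag definitions the
arithmetic works out as below. The NOFS-plus-highmem mapping mask is a
hypothetical example, and the rationale in the comment is my reading
(zone modifiers shouldn't reach the radix tree node allocation), not
wording from the patch:

#include <linux/gfp.h>

/*
 * Illustration only, using 3.18-era definitions:
 *   GFP_KERNEL == __GFP_WAIT | __GFP_IO | __GFP_FS
 *   GFP_NOFS   == __GFP_WAIT | __GFP_IO
 */
static void gfp_mask_illustration(void)
{
        /* Hypothetical fs: NOFS semantics, highmem-capable page cache. */
        gfp_t mapping_mask = GFP_NOFS | __GFP_HIGHMEM;

        /* What the MM layer now seeds vmf->gfp_mask with: */
        gfp_t fault_mask = mapping_mask | __GFP_FS | __GFP_IO;
        /* == __GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HIGHMEM */

        /*
         * Masking with GFP_KERNEL strips the zone modifier, so the
         * radix tree insertion runs with plain GFP_KERNEL while the
         * page allocation keeps __GFP_HIGHMEM.
         */
        gfp_t tree_mask = fault_mask & GFP_KERNEL;

        (void)tree_mask;
}
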
>> diff --git a/mm/memory.c b/mm/memory.c
>> index 0c4f5e36b155..5a62c6a42143 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -1973,6 +1973,20 @@ static inline void cow_user_page(struct page *dst, struct page *src, unsigned lo
>>  	copy_user_highpage(dst, src, va, vma);
>>  }
>>
>> +static gfp_t __get_fault_gfp_mask(struct vm_area_struct *vma)
>> +{
>> +	struct file *vm_file = vma->vm_file;
>> +
>> +	if (vm_file)
>> +		return mapping_gfp_mask(vm_file->f_mapping) | __GFP_FS | __GFP_IO;
>> +
>> +	/*
>> +	 * Special mappings (e.g. VDSO) do not have any file so fake
>> +	 * a default GFP_KERNEL for them.
>> +	 */
>> +	return GFP_KERNEL;
>> +}
>> +
>>  /*
>>   * Notify the address space that the page is about to become writable so that
>>   * it can prohibit this or wait for the page to get into an appropriate state.
>> @@ -1988,6 +2002,7 @@ static int do_page_mkwrite(struct vm_area_struct *vma, struct page *page,
>>  	vmf.virtual_address = (void __user *)(address & PAGE_MASK);
>>  	vmf.pgoff = page->index;
>>  	vmf.flags = FAULT_FLAG_WRITE|FAULT_FLAG_MKWRITE;
>> +	vmf.gfp_mask = __get_fault_gfp_mask(vma);
>>  	vmf.page = page;
>>
>>  	ret = vma->vm_ops->page_mkwrite(vma, &vmf);
>> @@ -2670,6 +2685,7 @@ static int __do_fault(struct vm_area_struct *vma, unsigned long address,
>>  	vmf.pgoff = pgoff;
>>  	vmf.flags = flags;
>>  	vmf.page = NULL;
>> +	vmf.gfp_mask = __get_fault_gfp_mask(vma);
>>
>>  	ret = vma->vm_ops->fault(vma, &vmf);
>>  	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
>> @@ -2834,6 +2850,7 @@ static void do_fault_around(struct vm_area_struct *vma, unsigned long address,
>>  	vmf.pgoff = pgoff;
>>  	vmf.max_pgoff = max_pgoff;
>>  	vmf.flags = flags;
>> +	vmf.gfp_mask = __get_fault_gfp_mask(vma);
>>  	vma->vm_ops->map_pages(vma, &vmf);
>>  }
>>
>> --
>> 2.15.0.2308.g658a28aa74af
>>

--
Harsh Shandilya, PRJKT Development LLC