Date: Mon, 22 Apr 2019 16:18:57 -0400
From: Jerome Glisse
To: Laurent Dufour
Cc: akpm@linux-foundation.org, mhocko@kernel.org, peterz@infradead.org,
	kirill@shutemov.name, ak@linux.intel.com, dave@stgolabs.net, jack@suse.cz,
	Matthew Wilcox, aneesh.kumar@linux.ibm.com, benh@kernel.crashing.org,
	mpe@ellerman.id.au, paulus@samba.org, Thomas Gleixner, Ingo Molnar,
	hpa@zytor.com, Will Deacon, Sergey Senozhatsky,
	sergey.senozhatsky.work@gmail.com, Andrea Arcangeli, Alexei Starovoitov,
	kemi.wang@intel.com, Daniel Jordan, David Rientjes, Ganesh Mahendran,
	Minchan Kim, Punit Agrawal, vinayak menon, Yang Shi, zhong jiang,
	Haiyan Song, Balbir Singh, sj38.park@gmail.com, Michel Lespinasse,
	Mike Rapoport, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	haren@linux.vnet.ibm.com, npiggin@gmail.com, paulmck@linux.vnet.ibm.com,
	Tim Chen, linuxppc-dev@lists.ozlabs.org, x86@kernel.org
Subject: Re: [PATCH v12 17/31] mm: introduce __page_add_new_anon_rmap()
Message-ID: <20190422201857.GH14666@redhat.com>
References: <20190416134522.17540-1-ldufour@linux.ibm.com>
 <20190416134522.17540-18-ldufour@linux.ibm.com>
In-Reply-To: <20190416134522.17540-18-ldufour@linux.ibm.com>

On Tue, Apr 16, 2019 at 03:45:08PM +0200, Laurent Dufour wrote:
> When dealing with the speculative page fault handler, we may race with a
> VMA being split or merged. In this case the vma->vm_start and vma->vm_end
> fields may not match the address at which the page fault is occurring.
>
> This can only happen when the VMA is split, but in that case the
> anon_vma pointer of the new VMA will be the same as the original one,
> because in __split_vma the new->anon_vma is set to src->anon_vma when
> *new = *vma.
>
> So even if the VMA boundaries are not correct, the anon_vma pointer is
> still valid.
>
> If the VMA has been merged, then the VMA in which it has been merged
> must have the same anon_vma pointer, otherwise the merge can't be done.
>
> So in all cases we know that the anon_vma is valid, since we have
> checked before starting the speculative page fault that the anon_vma
> pointer is valid for this VMA, and since there is an anon_vma this
> means that at one time a page has been backed by it and that before the
> VMA is cleaned, the page table lock would have to be grabbed to clean
> the PTE, and the anon_vma field is checked once the PTE is locked.
>
> This patch introduces a new __page_add_new_anon_rmap() service which
> doesn't check the VMA boundaries, and creates a new inline one
> which does the check.
>
> When called from a page fault handler, if this is not a speculative one,
> there is a guarantee that vm_start and vm_end match the faulting address,
> so this check is useless. In the context of the speculative page fault
> handler, this check may be wrong but anon_vma is still valid as explained
> above.
>
> Signed-off-by: Laurent Dufour

Reviewed-by: Jérôme Glisse

> ---
>  include/linux/rmap.h | 12 ++++++++++--
>  mm/memory.c          |  8 ++++----
>  mm/rmap.c            |  5 ++---
>  3 files changed, 16 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index 988d176472df..a5d282573093 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -174,8 +174,16 @@ void page_add_anon_rmap(struct page *, struct vm_area_struct *,
>  		unsigned long, bool);
>  void do_page_add_anon_rmap(struct page *, struct vm_area_struct *,
>  			   unsigned long, int);
> -void page_add_new_anon_rmap(struct page *, struct vm_area_struct *,
> -		unsigned long, bool);
> +void __page_add_new_anon_rmap(struct page *, struct vm_area_struct *,
> +		unsigned long, bool);
> +static inline void page_add_new_anon_rmap(struct page *page,
> +					   struct vm_area_struct *vma,
> +					   unsigned long address, bool compound)
> +{
> +	VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
> +	__page_add_new_anon_rmap(page, vma, address, compound);
> +}
> +
>  void page_add_file_rmap(struct page *, bool);
>  void page_remove_rmap(struct page *, bool);
>
> diff --git a/mm/memory.c b/mm/memory.c
> index be93f2c8ebe0..46f877b6abea 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2347,7 +2347,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
>  	 * thread doing COW.
>  	 */
>  	ptep_clear_flush_notify(vma, vmf->address, vmf->pte);
> -	page_add_new_anon_rmap(new_page, vma, vmf->address, false);
> +	__page_add_new_anon_rmap(new_page, vma, vmf->address, false);
>  	mem_cgroup_commit_charge(new_page, memcg, false, false);
>  	__lru_cache_add_active_or_unevictable(new_page, vmf->vma_flags);
>  	/*
> @@ -2897,7 +2897,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>
>  	/* ksm created a completely new copy */
>  	if (unlikely(page != swapcache && swapcache)) {
> -		page_add_new_anon_rmap(page, vma, vmf->address, false);
> +		__page_add_new_anon_rmap(page, vma, vmf->address, false);
>  		mem_cgroup_commit_charge(page, memcg, false, false);
>  		__lru_cache_add_active_or_unevictable(page, vmf->vma_flags);
>  	} else {
> @@ -3049,7 +3049,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
>  	}
>
>  	inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
> -	page_add_new_anon_rmap(page, vma, vmf->address, false);
> +	__page_add_new_anon_rmap(page, vma, vmf->address, false);
>  	mem_cgroup_commit_charge(page, memcg, false, false);
>  	__lru_cache_add_active_or_unevictable(page, vmf->vma_flags);
>  setpte:
> @@ -3328,7 +3328,7 @@ vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
>  	/* copy-on-write page */
>  	if (write && !(vmf->vma_flags & VM_SHARED)) {
>  		inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
> -		page_add_new_anon_rmap(page, vma, vmf->address, false);
> +		__page_add_new_anon_rmap(page, vma, vmf->address, false);
>  		mem_cgroup_commit_charge(page, memcg, false, false);
>  		__lru_cache_add_active_or_unevictable(page, vmf->vma_flags);
>  	} else {
> diff --git a/mm/rmap.c b/mm/rmap.c
> index e5dfe2ae6b0d..2148e8ce6e34 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1140,7 +1140,7 @@ void do_page_add_anon_rmap(struct page *page,
>  }
>
>  /**
> - * page_add_new_anon_rmap - add pte mapping to a new anonymous page
> + * __page_add_new_anon_rmap - add pte mapping to a new anonymous page
>   * @page:	the page to add the mapping to
>   * @vma:	the vm area in which the mapping is added
>   * @address:	the user virtual address mapped
> @@ -1150,12 +1150,11 @@ void do_page_add_anon_rmap(struct page *page,
>   * This means the inc-and-test can be bypassed.
>   * Page does not have to be locked.
>   */
> -void page_add_new_anon_rmap(struct page *page,
> +void __page_add_new_anon_rmap(struct page *page,
>  	struct vm_area_struct *vma, unsigned long address, bool compound)
>  {
>  	int nr = compound ? hpage_nr_pages(page) : 1;
>
> -	VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
>  	__SetPageSwapBacked(page);
>  	if (compound) {
>  		VM_BUG_ON_PAGE(!PageTransHuge(page), page);
> --
> 2.21.0
>
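
For what it is worth, the calling convention the patch ends up with is that
the page fault paths invoke __page_add_new_anon_rmap() directly (a regular
fault already guarantees the address lies inside the VMA, while a speculative
one may be looking at stale boundaries), and every other caller keeps going
through the inline page_add_new_anon_rmap() with its VM_BUG_ON_VMA check.
The sketch below is only a simplified user-space model of that split; the
struct and function names mirror the kernel ones but are stand-ins, not the
real kernel definitions.

/*
 * Minimal user-space sketch: why the speculative path must skip the
 * boundary check even though the anon_vma pointer stays usable.
 */
#include <assert.h>
#include <stdio.h>

struct anon_vma { int id; };

struct vm_area_struct {
	unsigned long vm_start;
	unsigned long vm_end;
	struct anon_vma *anon_vma;
};

/*
 * Unchecked variant: only relies on vma->anon_vma being valid, which the
 * commit message argues holds even after a concurrent split or merge.
 */
static void __page_add_new_anon_rmap(struct vm_area_struct *vma,
				     unsigned long address)
{
	printf("add rmap for %#lx via anon_vma %d\n",
	       address, vma->anon_vma->id);
}

/* Checked variant: what callers outside the fault paths keep using. */
static void page_add_new_anon_rmap(struct vm_area_struct *vma,
				   unsigned long address)
{
	assert(address >= vma->vm_start && address < vma->vm_end);
	__page_add_new_anon_rmap(vma, address);
}

int main(void)
{
	struct anon_vma av = { .id = 1 };
	struct vm_area_struct vma = {
		.vm_start = 0x1000, .vm_end = 0x3000, .anon_vma = &av,
	};

	/*
	 * A concurrent split shrinks the VMA while a speculative fault at
	 * 0x2800 is in flight: the boundaries no longer cover the faulting
	 * address, but anon_vma is unchanged.
	 */
	vma.vm_end = 0x2000;

	/*
	 * Speculative path: the checked wrapper would trip its assertion,
	 * so the unchecked variant is used; it only needs anon_vma.
	 */
	__page_add_new_anon_rmap(&vma, 0x2800);

	/*
	 * Ordinary fault path: the address is guaranteed to be inside the
	 * VMA, so the boundary check is redundant but harmless.
	 */
	page_add_new_anon_rmap(&vma, 0x1800);
	return 0;
}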