From: Laurent Dufour <ldufour@linux.vnet.ibm.com> To: Matthew Wilcox <willy@infradead.org> Cc: paulmck@linux.vnet.ibm.com, peterz@infradead.org, akpm@linux-foundation.org, kirill@shutemov.name, ak@linux.intel.com, mhocko@kernel.org, dave@stgolabs.net, jack@suse.cz, benh@kernel.crashing.org, mpe@ellerman.id.au, paulus@samba.org, Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>, hpa@zytor.com, Will Deacon <will.deacon@arm.com>, Sergey Senozhatsky <sergey.senozhatsky@gmail.com>, Andrea Arcangeli <aarcange@redhat.com>, Alexei Starovoitov <alexei.starovoitov@gmail.com>, kemi.wang@intel.com, sergey.senozhatsky.work@gmail.com, Daniel Jordan <daniel.m.jordan@oracle.com>, linux-kernel@vger.kernel.org, linux-mm@kvack.org, haren@linux.vnet.ibm.com, khandual@linux.vnet.ibm.com, npiggin@gmail.com, bsingharora@gmail.com, Tim Chen <tim.c.chen@linux.intel.com>, linuxppc-dev@lists.ozlabs.org, x86@kernel.org Subject: Re: [PATCH v7 04/24] mm: Dont assume page-table invariance during faults Date: Thu, 8 Feb 2018 15:35:58 +0100 [thread overview] Message-ID: <484242d8-e632-9e39-5c99-2e1b4b3b69a5@linux.vnet.ibm.com> (raw) In-Reply-To: <20180206202831.GB16511@bombadil.infradead.org> On 06/02/2018 21:28, Matthew Wilcox wrote: > On Tue, Feb 06, 2018 at 05:49:50PM +0100, Laurent Dufour wrote: >> From: Peter Zijlstra <peterz@infradead.org> >> >> One of the side effects of speculating on faults (without holding >> mmap_sem) is that we can race with free_pgtables() and therefore we >> cannot assume the page-tables will stick around. >> >> Remove the reliance on the pte pointer. >> >> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> >> >> In most of the case pte_unmap_same() was returning 1, which meaning that >> do_swap_page() should do its processing. So in most of the case there will >> be no impact. >> >> Now regarding the case where pte_unmap_safe() was returning 0, and thus >> do_swap_page return 0 too, this happens when the page has already been >> swapped back. This may happen before do_swap_page() get called or while in >> the call to do_swap_page(). In that later case, the check done when >> swapin_readahead() returns will detect that case. >> >> The worst case would be that a page fault is occuring on 2 threads at the >> same time on the same swapped out page. In that case one thread will take >> much time looping in __read_swap_cache_async(). But in the regular page >> fault path, this is even worse since the thread would wait for semaphore to >> be released before starting anything. >> >> [Remove only if !CONFIG_SPECULATIVE_PAGE_FAULT] >> Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> > > I have a great deal of trouble connecting all of the words above to the > contents of the patch. Thanks for pushing forward here, this raised some doubts on my side. I reviewed that part of code, and I think I could now change the way pte_unmap_safe() is checking for the pte's value. Since we now have all the needed details in the vm_fault structure, I will pass it to pte_unamp_same() and deal with the VMA checks when locking for the pte as it is done in the other part of the page fault handler by calling pte_spinlock(). This means that this patch will be dropped, and pte_unmap_same() will become : static inline int pte_unmap_same(struct vm_fault *vmf, int *same) { int ret = 0; *same = 1; #if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT) if (sizeof(pte_t) > sizeof(unsigned long)) { if (pte_spinlock(vmf)) { *same = pte_same(*vmf->pte, vmf->orig_pte); spin_unlock(vmf->ptl); } else ret = VM_FAULT_RETRY; } #endif pte_unmap(vmf->pte); return ret; } Laurent. > >> >> +#ifndef CONFIG_SPECULATIVE_PAGE_FAULT >> /* >> * handle_pte_fault chooses page fault handler according to an entry which was >> * read non-atomically. Before making any commitment, on those architectures >> @@ -2311,6 +2312,7 @@ static inline int pte_unmap_same(struct mm_struct *mm, pmd_t *pmd, >> pte_unmap(page_table); >> return same; >> } >> +#endif /* CONFIG_SPECULATIVE_PAGE_FAULT */ >> >> static inline void cow_user_page(struct page *dst, struct page *src, unsigned long va, struct vm_area_struct *vma) >> { >> @@ -2898,11 +2900,13 @@ int do_swap_page(struct vm_fault *vmf) >> swapcache = page; >> } >> >> +#ifndef CONFIG_SPECULATIVE_PAGE_FAULT >> if (!pte_unmap_same(vma->vm_mm, vmf->pmd, vmf->pte, vmf->orig_pte)) { >> if (page) >> put_page(page); >> goto out; >> } >> +#endif >> > > This feels to me like we want: > > #ifdef CONFIG_SPECULATIVE_PAGE_FAULT > [current code] > #else > /* > * Some words here which explains why we always want to return this > * value if we support speculative page faults. > */ > static inline int pte_unmap_same(struct mm_struct *mm, pmd_t *pmd, > pte_t *page_table, pte_t orig_pte) > { > return 1; > } > #endif > > instead of cluttering do_swap_page with an ifdef. >
WARNING: multiple messages have this Message-ID (diff)
From: Laurent Dufour <ldufour@linux.vnet.ibm.com> To: Matthew Wilcox <willy@infradead.org> Cc: paulmck@linux.vnet.ibm.com, peterz@infradead.org, akpm@linux-foundation.org, kirill@shutemov.name, ak@linux.intel.com, mhocko@kernel.org, dave@stgolabs.net, jack@suse.cz, benh@kernel.crashing.org, mpe@ellerman.id.au, paulus@samba.org, Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>, hpa@zytor.com, Will Deacon <will.deacon@arm.com>, Sergey Senozhatsky <sergey.senozhatsky@gmail.com>, Andrea Arcangeli <aarcange@redhat.com>, Alexei Starovoitov <alexei.starovoitov@gmail.com>, kemi.wang@intel.com, sergey.senozhatsky.work@gmail.com, Daniel Jordan <daniel.m.jordan@oracle.com>, linux-kernel@vger.kernel.org, linux-mm@kvack.org, haren@linux.vnet.ibm.com, khandual@linux.vnet.ibm.com, npiggin@gmail.com, bsingharora@gmail.com, Tim Chen <tim.c.chen@linux.intel.com>, linuxppc-dev@lists.ozlabs.org, x86@kernel.org Subject: Re: [PATCH v7 04/24] mm: Dont assume page-table invariance during faults Date: Thu, 8 Feb 2018 15:35:58 +0100 [thread overview] Message-ID: <484242d8-e632-9e39-5c99-2e1b4b3b69a5@linux.vnet.ibm.com> (raw) In-Reply-To: <20180206202831.GB16511@bombadil.infradead.org> On 06/02/2018 21:28, Matthew Wilcox wrote: > On Tue, Feb 06, 2018 at 05:49:50PM +0100, Laurent Dufour wrote: >> From: Peter Zijlstra <peterz@infradead.org> >> >> One of the side effects of speculating on faults (without holding >> mmap_sem) is that we can race with free_pgtables() and therefore we >> cannot assume the page-tables will stick around. >> >> Remove the reliance on the pte pointer. >> >> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> >> >> In most of the case pte_unmap_same() was returning 1, which meaning that >> do_swap_page() should do its processing. So in most of the case there will >> be no impact. >> >> Now regarding the case where pte_unmap_safe() was returning 0, and thus >> do_swap_page return 0 too, this happens when the page has already been >> swapped back. This may happen before do_swap_page() get called or while in >> the call to do_swap_page(). In that later case, the check done when >> swapin_readahead() returns will detect that case. >> >> The worst case would be that a page fault is occuring on 2 threads at the >> same time on the same swapped out page. In that case one thread will take >> much time looping in __read_swap_cache_async(). But in the regular page >> fault path, this is even worse since the thread would wait for semaphore to >> be released before starting anything. >> >> [Remove only if !CONFIG_SPECULATIVE_PAGE_FAULT] >> Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> > > I have a great deal of trouble connecting all of the words above to the > contents of the patch. Thanks for pushing forward here, this raised some doubts on my side. I reviewed that part of code, and I think I could now change the way pte_unmap_safe() is checking for the pte's value. Since we now have all the needed details in the vm_fault structure, I will pass it to pte_unamp_same() and deal with the VMA checks when locking for the pte as it is done in the other part of the page fault handler by calling pte_spinlock(). This means that this patch will be dropped, and pte_unmap_same() will become : static inline int pte_unmap_same(struct vm_fault *vmf, int *same) { int ret = 0; *same = 1; #if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT) if (sizeof(pte_t) > sizeof(unsigned long)) { if (pte_spinlock(vmf)) { *same = pte_same(*vmf->pte, vmf->orig_pte); spin_unlock(vmf->ptl); } else ret = VM_FAULT_RETRY; } #endif pte_unmap(vmf->pte); return ret; } Laurent. > >> >> +#ifndef CONFIG_SPECULATIVE_PAGE_FAULT >> /* >> * handle_pte_fault chooses page fault handler according to an entry which was >> * read non-atomically. Before making any commitment, on those architectures >> @@ -2311,6 +2312,7 @@ static inline int pte_unmap_same(struct mm_struct *mm, pmd_t *pmd, >> pte_unmap(page_table); >> return same; >> } >> +#endif /* CONFIG_SPECULATIVE_PAGE_FAULT */ >> >> static inline void cow_user_page(struct page *dst, struct page *src, unsigned long va, struct vm_area_struct *vma) >> { >> @@ -2898,11 +2900,13 @@ int do_swap_page(struct vm_fault *vmf) >> swapcache = page; >> } >> >> +#ifndef CONFIG_SPECULATIVE_PAGE_FAULT >> if (!pte_unmap_same(vma->vm_mm, vmf->pmd, vmf->pte, vmf->orig_pte)) { >> if (page) >> put_page(page); >> goto out; >> } >> +#endif >> > > This feels to me like we want: > > #ifdef CONFIG_SPECULATIVE_PAGE_FAULT > [current code] > #else > /* > * Some words here which explains why we always want to return this > * value if we support speculative page faults. > */ > static inline int pte_unmap_same(struct mm_struct *mm, pmd_t *pmd, > pte_t *page_table, pte_t orig_pte) > { > return 1; > } > #endif > > instead of cluttering do_swap_page with an ifdef. > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2018-02-08 14:36 UTC|newest] Thread overview: 62+ messages / expand[flat|nested] mbox.gz Atom feed top 2018-02-06 16:49 [PATCH v7 00/24] Speculative page faults Laurent Dufour 2018-02-06 16:49 ` Laurent Dufour 2018-02-06 16:49 ` [PATCH v7 01/24] mm: Introduce CONFIG_SPECULATIVE_PAGE_FAULT Laurent Dufour 2018-02-06 16:49 ` Laurent Dufour 2018-02-06 16:49 ` [PATCH v7 02/24] x86/mm: Define CONFIG_SPECULATIVE_PAGE_FAULT Laurent Dufour 2018-02-06 16:49 ` Laurent Dufour 2018-02-06 16:49 ` [PATCH v7 03/24] powerpc/mm: " Laurent Dufour 2018-02-06 16:49 ` Laurent Dufour 2018-02-06 16:49 ` [PATCH v7 04/24] mm: Dont assume page-table invariance during faults Laurent Dufour 2018-02-06 16:49 ` Laurent Dufour 2018-02-06 20:28 ` Matthew Wilcox 2018-02-06 20:28 ` Matthew Wilcox 2018-02-08 14:35 ` Laurent Dufour [this message] 2018-02-08 14:35 ` Laurent Dufour 2018-02-08 15:00 ` Matthew Wilcox 2018-02-08 15:00 ` Matthew Wilcox 2018-02-08 17:14 ` Laurent Dufour 2018-02-08 17:14 ` Laurent Dufour 2018-02-06 16:49 ` [PATCH v7 05/24] mm: Prepare for FAULT_FLAG_SPECULATIVE Laurent Dufour 2018-02-06 16:49 ` Laurent Dufour 2018-02-06 16:49 ` [PATCH v7 06/24] mm: Introduce pte_spinlock " Laurent Dufour 2018-02-06 16:49 ` Laurent Dufour 2018-02-06 16:49 ` [PATCH v7 07/24] mm: VMA sequence count Laurent Dufour 2018-02-06 16:49 ` Laurent Dufour 2018-02-06 16:49 ` [PATCH v7 08/24] mm: Protect VMA modifications using " Laurent Dufour 2018-02-06 16:49 ` Laurent Dufour 2018-02-06 16:49 ` [PATCH v7 09/24] mm: protect mremap() against SPF hanlder Laurent Dufour 2018-02-06 16:49 ` Laurent Dufour 2018-02-06 16:49 ` [PATCH v7 10/24] mm: Protect SPF handler against anon_vma changes Laurent Dufour 2018-02-06 16:49 ` Laurent Dufour 2018-02-06 16:49 ` [PATCH v7 11/24] mm: Cache some VMA fields in the vm_fault structure Laurent Dufour 2018-02-06 16:49 ` Laurent Dufour 2018-02-06 16:49 ` [PATCH v7 12/24] mm/migrate: Pass vm_fault pointer to migrate_misplaced_page() Laurent Dufour 2018-02-06 16:49 ` Laurent Dufour 2018-02-06 16:49 ` [PATCH v7 13/24] mm: Introduce __lru_cache_add_active_or_unevictable Laurent Dufour 2018-02-06 16:49 ` Laurent Dufour 2018-02-06 16:50 ` [PATCH v7 14/24] mm: Introduce __maybe_mkwrite() Laurent Dufour 2018-02-06 16:50 ` Laurent Dufour 2018-02-06 16:50 ` [PATCH v7 15/24] mm: Introduce __vm_normal_page() Laurent Dufour 2018-02-06 16:50 ` Laurent Dufour 2018-02-06 16:50 ` [PATCH v7 16/24] mm: Introduce __page_add_new_anon_rmap() Laurent Dufour 2018-02-06 16:50 ` Laurent Dufour 2018-02-06 16:50 ` [PATCH v7 17/24] mm: Protect mm_rb tree with a rwlock Laurent Dufour 2018-02-06 16:50 ` Laurent Dufour 2018-02-06 16:50 ` [PATCH v7 18/24] mm: Provide speculative fault infrastructure Laurent Dufour 2018-02-06 16:50 ` Laurent Dufour 2018-02-06 16:50 ` [PATCH v7 19/24] mm: Adding speculative page fault failure trace events Laurent Dufour 2018-02-06 16:50 ` Laurent Dufour 2018-02-06 16:50 ` [PATCH v7 20/24] perf: Add a speculative page fault sw event Laurent Dufour 2018-02-06 16:50 ` Laurent Dufour 2018-02-06 16:50 ` [PATCH v7 21/24] perf tools: Add support for the SPF perf event Laurent Dufour 2018-02-06 16:50 ` Laurent Dufour 2018-02-06 16:50 ` [PATCH v7 22/24] mm: Speculative page fault handler return VMA Laurent Dufour 2018-02-06 16:50 ` Laurent Dufour 2018-02-06 16:50 ` [PATCH v7 23/24] x86/mm: Add speculative pagefault handling Laurent Dufour 2018-02-06 16:50 ` Laurent Dufour 2018-02-06 16:50 ` [PATCH v7 24/24] powerpc/mm: Add speculative page fault Laurent Dufour 2018-02-06 16:50 ` Laurent Dufour 2018-02-08 20:53 ` [PATCH v7 00/24] Speculative page faults Andrew Morton 2018-02-08 20:53 ` Andrew Morton 2018-02-13 7:56 ` Laurent Dufour 2018-02-13 7:56 ` Laurent Dufour
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=484242d8-e632-9e39-5c99-2e1b4b3b69a5@linux.vnet.ibm.com \ --to=ldufour@linux.vnet.ibm.com \ --cc=aarcange@redhat.com \ --cc=ak@linux.intel.com \ --cc=akpm@linux-foundation.org \ --cc=alexei.starovoitov@gmail.com \ --cc=benh@kernel.crashing.org \ --cc=bsingharora@gmail.com \ --cc=daniel.m.jordan@oracle.com \ --cc=dave@stgolabs.net \ --cc=haren@linux.vnet.ibm.com \ --cc=hpa@zytor.com \ --cc=jack@suse.cz \ --cc=kemi.wang@intel.com \ --cc=khandual@linux.vnet.ibm.com \ --cc=kirill@shutemov.name \ --cc=linux-kernel@vger.kernel.org \ --cc=linux-mm@kvack.org \ --cc=linuxppc-dev@lists.ozlabs.org \ --cc=mhocko@kernel.org \ --cc=mingo@redhat.com \ --cc=mpe@ellerman.id.au \ --cc=npiggin@gmail.com \ --cc=paulmck@linux.vnet.ibm.com \ --cc=paulus@samba.org \ --cc=peterz@infradead.org \ --cc=sergey.senozhatsky.work@gmail.com \ --cc=sergey.senozhatsky@gmail.com \ --cc=tglx@linutronix.de \ --cc=tim.c.chen@linux.intel.com \ --cc=will.deacon@arm.com \ --cc=willy@infradead.org \ --cc=x86@kernel.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.