From: Laurent Dufour <ldufour@linux.vnet.ibm.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: paulmck@linux.vnet.ibm.com, peterz@infradead.org,
	akpm@linux-foundation.org, kirill@shutemov.name,
	ak@linux.intel.com, mhocko@kernel.org, dave@stgolabs.net,
	jack@suse.cz, benh@kernel.crashing.org, mpe@ellerman.id.au,
	paulus@samba.org, Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>,
	hpa@zytor.com, Will Deacon <will.deacon@arm.com>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	kemi.wang@intel.com, sergey.senozhatsky.work@gmail.com,
	Daniel Jordan <daniel.m.jordan@oracle.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	haren@linux.vnet.ibm.com, khandual@linux.vnet.ibm.com,
	npiggin@gmail.com, bsingharora@gmail.com,
	Tim Chen <tim.c.chen@linux.intel.com>,
	linuxppc-dev@lists.ozlabs.org, x86@kernel.org
Subject: Re: [PATCH v7 04/24] mm: Dont assume page-table invariance during faults
Date: Thu, 8 Feb 2018 15:35:58 +0100	[thread overview]
Message-ID: <484242d8-e632-9e39-5c99-2e1b4b3b69a5@linux.vnet.ibm.com> (raw)
In-Reply-To: <20180206202831.GB16511@bombadil.infradead.org>

On 06/02/2018 21:28, Matthew Wilcox wrote:
> On Tue, Feb 06, 2018 at 05:49:50PM +0100, Laurent Dufour wrote:
>> From: Peter Zijlstra <peterz@infradead.org>
>>
>> One of the side effects of speculating on faults (without holding
>> mmap_sem) is that we can race with free_pgtables() and therefore we
>> cannot assume the page-tables will stick around.
>>
>> Remove the reliance on the pte pointer.
>>
>> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>>
>> In most cases pte_unmap_same() was returning 1, meaning that
>> do_swap_page() should do its processing, so in most cases there will
>> be no impact.
>>
>> Now regarding the case where pte_unmap_same() was returning 0, and thus
>> do_swap_page() returned 0 too: this happens when the page has already been
>> swapped back in. This may happen before do_swap_page() gets called or during
>> the call to do_swap_page(). In the latter case, the check done when
>> swapin_readahead() returns will detect it.
>>
>> The worst case would be a page fault occurring on 2 threads at the
>> same time on the same swapped-out page. In that case one thread will spend
>> much time looping in __read_swap_cache_async(). But in the regular page
>> fault path this is even worse, since the thread would wait for the mmap_sem
>> to be released before doing anything.
>>
>> [Remove only if !CONFIG_SPECULATIVE_PAGE_FAULT]
>> Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
> 
> I have a great deal of trouble connecting all of the words above to the
> contents of the patch.

Thanks for pushing forward here; this raised some doubts on my side.

I reviewed that part of the code, and I think I can now change the way
pte_unmap_same() checks the pte's value. Since we now have all the
needed details in the vm_fault structure, I will pass it to
pte_unmap_same() and handle the VMA checks when taking the pte lock, as
is done in the other parts of the page fault handler, by calling
pte_spinlock().
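
For context, pte_spinlock() is the helper introduced in patch 06/24 of
this series. In the non-speculative case it looks roughly like this (a
reconstruction for illustration, not a quote from the series):

static inline bool pte_spinlock(struct vm_fault *vmf)
{
	/* Take the page-table lock covering this pte. */
	vmf->ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
	spin_lock(vmf->ptl);
	return true;
}

The speculative variant additionally re-validates the VMA before taking
the lock and returns false when the fault should be retried, which is
what the VM_FAULT_RETRY path below relies on.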

This means that this patch will be dropped, and pte_unmap_same() will become:

static inline int pte_unmap_same(struct vm_fault *vmf, int *same)
{
	int ret = 0;

	*same = 1;
#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT)
	if (sizeof(pte_t) > sizeof(unsigned long)) {
		if (pte_spinlock(vmf)) {
			/* Re-check the pte under its lock against the
			 * value sampled at fault time.
			 */
			*same = pte_same(*vmf->pte, vmf->orig_pte);
			spin_unlock(vmf->ptl);
		} else {
			/* The VMA checks done by pte_spinlock() failed:
			 * the fault must be retried.
			 */
			ret = VM_FAULT_RETRY;
		}
	}
#endif
	pte_unmap(vmf->pte);
	return ret;
}
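
A minimal sketch of how do_swap_page() could then consume this (my
illustration based on the signature above, not the final patch; the
page and out labels come from the existing code quoted below):

	int same;
	int ret;

	ret = pte_unmap_same(vmf, &same);
	if (ret) {
		/* VM_FAULT_RETRY: retry the fault with mmap_sem held. */
		if (page)
			put_page(page);
		return ret;
	}
	if (!same) {
		if (page)
			put_page(page);
		goto out;
	}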

Laurent.

> 
>>  
>> +#ifndef CONFIG_SPECULATIVE_PAGE_FAULT
>>  /*
>>   * handle_pte_fault chooses page fault handler according to an entry which was
>>   * read non-atomically.  Before making any commitment, on those architectures
>> @@ -2311,6 +2312,7 @@ static inline int pte_unmap_same(struct mm_struct *mm, pmd_t *pmd,
>>  	pte_unmap(page_table);
>>  	return same;
>>  }
>> +#endif /* CONFIG_SPECULATIVE_PAGE_FAULT */
>>  
>>  static inline void cow_user_page(struct page *dst, struct page *src, unsigned long va, struct vm_area_struct *vma)
>>  {
>> @@ -2898,11 +2900,13 @@ int do_swap_page(struct vm_fault *vmf)
>>  		swapcache = page;
>>  	}
>>  
>> +#ifndef CONFIG_SPECULATIVE_PAGE_FAULT
>>  	if (!pte_unmap_same(vma->vm_mm, vmf->pmd, vmf->pte, vmf->orig_pte)) {
>>  		if (page)
>>  			put_page(page);
>>  		goto out;
>>  	}
>> +#endif
>>  
> 
> This feels to me like we want:
> 
> #ifndef CONFIG_SPECULATIVE_PAGE_FAULT
> [current code]
> #else
> /*
>  * Some words here which explain why we always want to return this
>  * value if we support speculative page faults.
>  */
> static inline int pte_unmap_same(struct mm_struct *mm, pmd_t *pmd,
> 				pte_t *page_table, pte_t orig_pte)
> {
> 	return 1;
> }
> #endif
> 
> instead of cluttering do_swap_page with an ifdef.
> 
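
For reference, with such a stub in place the call site in do_swap_page()
would keep its current unconditional form, with no #ifdef needed:

	if (!pte_unmap_same(vma->vm_mm, vmf->pmd, vmf->pte, vmf->orig_pte)) {
		if (page)
			put_page(page);
		goto out;
	}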
