All of lore.kernel.org
 help / color / mirror / Atom feed
* [v2 PATCH] mm: avoid access flag update TLB flush for retried page fault
@ 2020-07-15 21:36 ` Yang Shi
  0 siblings, 0 replies; 4+ messages in thread
From: Yang Shi @ 2020-07-15 21:36 UTC (permalink / raw)
  To: hannes, catalin.marinas, will.deacon, akpm
  Cc: yang.shi, xuyu, linux-mm, linux-arm-kernel, linux-kernel

Recently we found regression when running will_it_scale/page_fault3 test
on ARM64.  Over 70% down for the multi processes cases and over 20% down
for the multi threads cases.  It turns out the regression is caused by commit
89b15332af7c0312a41e50846819ca6613b58b4c ("mm: drop mmap_sem before
calling balance_dirty_pages() in write fault").

The test mmaps a memory size file then write to the mapping, this would
make all memory dirty and trigger dirty pages throttle, that upstream
commit would release mmap_sem then retry the page fault.  The retried
page fault would see correct PTEs installed by the first try then update
dirty bit and clear read-only bit and flush TLBs for ARM.  The regression is
caused by the excessive TLB flush.  It is fine on x86 since x86 doesn't
clear read-only bit so there is no need to flush TLB for this case.

The page fault would be retried due to:
1. Waiting for page readahead
2. Waiting for page swapped in
3. Waiting for dirty pages throttling

The first two cases don't have PTEs set up at all, so the retried page
fault would install the PTEs, so they don't reach there.  But the #3
case usually has PTEs installed, the retried page fault would reach the
dirty bit and read-only bit update.  But it seems not necessary to
modify those bits again for #3 since they should be already set by the
first page fault try.

Of course the parallel page fault may set up PTEs, but we just need care
about write fault.  If the parallel page fault setup a writable and dirty
PTE then the retried fault doesn't need do anything extra.  If the
parallel page fault setup a clean read-only PTE, the retried fault should
just call do_wp_page() then return as the below code snippet shows:

if (vmf->flags & FAULT_FLAG_WRITE) {
        if (!pte_write(entry))
            return do_wp_page(vmf);
}

With this fix the test result get back to normal.

Fixes: 89b15332af7c ("mm: drop mmap_sem before calling balance_dirty_pages() in write fault")
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Xu Yu <xuyu@linux.alibaba.com>
Debugged-by: Xu Yu <xuyu@linux.alibaba.com>
Tested-by: Xu Yu <xuyu@linux.alibaba.com>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
---
v2: * Incorporated the comment from Will Deacon.
    * Updated the commit log per the discussion.

 mm/memory.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index 87ec87c..e93e1da 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4241,8 +4241,14 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
 	if (vmf->flags & FAULT_FLAG_WRITE) {
 		if (!pte_write(entry))
 			return do_wp_page(vmf);
-		entry = pte_mkdirty(entry);
 	}
+
+	if (vmf->flags & FAULT_FLAG_TRIED)
+		goto unlock;
+
+	if (vmf->flags & FAULT_FLAG_WRITE)
+		entry = pte_mkdirty(entry);
+
 	entry = pte_mkyoung(entry);
 	if (ptep_set_access_flags(vmf->vma, vmf->address, vmf->pte, entry,
 				vmf->flags & FAULT_FLAG_WRITE)) {
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [v2 PATCH] mm: avoid access flag update TLB flush for retried page fault
@ 2020-07-15 21:36 ` Yang Shi
  0 siblings, 0 replies; 4+ messages in thread
From: Yang Shi @ 2020-07-15 21:36 UTC (permalink / raw)
  To: hannes, catalin.marinas, will.deacon, akpm
  Cc: xuyu, yang.shi, linux-kernel, linux-arm-kernel, linux-mm

Recently we found regression when running will_it_scale/page_fault3 test
on ARM64.  Over 70% down for the multi processes cases and over 20% down
for the multi threads cases.  It turns out the regression is caused by commit
89b15332af7c0312a41e50846819ca6613b58b4c ("mm: drop mmap_sem before
calling balance_dirty_pages() in write fault").

The test mmaps a memory size file then write to the mapping, this would
make all memory dirty and trigger dirty pages throttle, that upstream
commit would release mmap_sem then retry the page fault.  The retried
page fault would see correct PTEs installed by the first try then update
dirty bit and clear read-only bit and flush TLBs for ARM.  The regression is
caused by the excessive TLB flush.  It is fine on x86 since x86 doesn't
clear read-only bit so there is no need to flush TLB for this case.

The page fault would be retried due to:
1. Waiting for page readahead
2. Waiting for page swapped in
3. Waiting for dirty pages throttling

The first two cases don't have PTEs set up at all, so the retried page
fault would install the PTEs, so they don't reach there.  But the #3
case usually has PTEs installed, the retried page fault would reach the
dirty bit and read-only bit update.  But it seems not necessary to
modify those bits again for #3 since they should be already set by the
first page fault try.

Of course the parallel page fault may set up PTEs, but we just need care
about write fault.  If the parallel page fault setup a writable and dirty
PTE then the retried fault doesn't need do anything extra.  If the
parallel page fault setup a clean read-only PTE, the retried fault should
just call do_wp_page() then return as the below code snippet shows:

if (vmf->flags & FAULT_FLAG_WRITE) {
        if (!pte_write(entry))
            return do_wp_page(vmf);
}

With this fix the test result get back to normal.

Fixes: 89b15332af7c ("mm: drop mmap_sem before calling balance_dirty_pages() in write fault")
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Xu Yu <xuyu@linux.alibaba.com>
Debugged-by: Xu Yu <xuyu@linux.alibaba.com>
Tested-by: Xu Yu <xuyu@linux.alibaba.com>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
---
v2: * Incorporated the comment from Will Deacon.
    * Updated the commit log per the discussion.

 mm/memory.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index 87ec87c..e93e1da 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4241,8 +4241,14 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
 	if (vmf->flags & FAULT_FLAG_WRITE) {
 		if (!pte_write(entry))
 			return do_wp_page(vmf);
-		entry = pte_mkdirty(entry);
 	}
+
+	if (vmf->flags & FAULT_FLAG_TRIED)
+		goto unlock;
+
+	if (vmf->flags & FAULT_FLAG_WRITE)
+		entry = pte_mkdirty(entry);
+
 	entry = pte_mkyoung(entry);
 	if (ptep_set_access_flags(vmf->vma, vmf->address, vmf->pte, entry,
 				vmf->flags & FAULT_FLAG_WRITE)) {
-- 
1.8.3.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [v2 PATCH] mm: avoid access flag update TLB flush for retried page fault
  2020-07-15 21:36 ` Yang Shi
@ 2020-07-17 10:08   ` Will Deacon
  -1 siblings, 0 replies; 4+ messages in thread
From: Will Deacon @ 2020-07-17 10:08 UTC (permalink / raw)
  To: Yang Shi
  Cc: hannes, catalin.marinas, will.deacon, akpm, xuyu, linux-kernel,
	linux-arm-kernel, linux-mm

On Thu, Jul 16, 2020 at 05:36:30AM +0800, Yang Shi wrote:
> Recently we found regression when running will_it_scale/page_fault3 test
> on ARM64.  Over 70% down for the multi processes cases and over 20% down
> for the multi threads cases.  It turns out the regression is caused by commit
> 89b15332af7c0312a41e50846819ca6613b58b4c ("mm: drop mmap_sem before
> calling balance_dirty_pages() in write fault").
> 
> The test mmaps a memory size file then write to the mapping, this would
> make all memory dirty and trigger dirty pages throttle, that upstream
> commit would release mmap_sem then retry the page fault.  The retried
> page fault would see correct PTEs installed by the first try then update
> dirty bit and clear read-only bit and flush TLBs for ARM.  The regression is
> caused by the excessive TLB flush.  It is fine on x86 since x86 doesn't
> clear read-only bit so there is no need to flush TLB for this case.
> 
> The page fault would be retried due to:
> 1. Waiting for page readahead
> 2. Waiting for page swapped in
> 3. Waiting for dirty pages throttling
> 
> The first two cases don't have PTEs set up at all, so the retried page
> fault would install the PTEs, so they don't reach there.  But the #3
> case usually has PTEs installed, the retried page fault would reach the
> dirty bit and read-only bit update.  But it seems not necessary to
> modify those bits again for #3 since they should be already set by the
> first page fault try.
> 
> Of course the parallel page fault may set up PTEs, but we just need care
> about write fault.  If the parallel page fault setup a writable and dirty
> PTE then the retried fault doesn't need do anything extra.  If the
> parallel page fault setup a clean read-only PTE, the retried fault should
> just call do_wp_page() then return as the below code snippet shows:
> 
> if (vmf->flags & FAULT_FLAG_WRITE) {
>         if (!pte_write(entry))
>             return do_wp_page(vmf);
> }
> 
> With this fix the test result get back to normal.
> 
> Fixes: 89b15332af7c ("mm: drop mmap_sem before calling balance_dirty_pages() in write fault")
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Reported-by: Xu Yu <xuyu@linux.alibaba.com>
> Debugged-by: Xu Yu <xuyu@linux.alibaba.com>
> Tested-by: Xu Yu <xuyu@linux.alibaba.com>
> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
> ---
> v2: * Incorporated the comment from Will Deacon.
>     * Updated the commit log per the discussion.
> 
>  mm/memory.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 87ec87c..e93e1da 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4241,8 +4241,14 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
>  	if (vmf->flags & FAULT_FLAG_WRITE) {
>  		if (!pte_write(entry))
>  			return do_wp_page(vmf);
> -		entry = pte_mkdirty(entry);
>  	}
> +
> +	if (vmf->flags & FAULT_FLAG_TRIED)
> +		goto unlock;
> +
> +	if (vmf->flags & FAULT_FLAG_WRITE)
> +		entry = pte_mkdirty(entry);
> +


Thanks, this looks better to me.

Andrew -- please can you update the version in your tree?

Cheers,

Will

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [v2 PATCH] mm: avoid access flag update TLB flush for retried page fault
@ 2020-07-17 10:08   ` Will Deacon
  0 siblings, 0 replies; 4+ messages in thread
From: Will Deacon @ 2020-07-17 10:08 UTC (permalink / raw)
  To: Yang Shi
  Cc: catalin.marinas, will.deacon, linux-kernel, linux-mm, hannes,
	xuyu, akpm, linux-arm-kernel

On Thu, Jul 16, 2020 at 05:36:30AM +0800, Yang Shi wrote:
> Recently we found regression when running will_it_scale/page_fault3 test
> on ARM64.  Over 70% down for the multi processes cases and over 20% down
> for the multi threads cases.  It turns out the regression is caused by commit
> 89b15332af7c0312a41e50846819ca6613b58b4c ("mm: drop mmap_sem before
> calling balance_dirty_pages() in write fault").
> 
> The test mmaps a memory size file then write to the mapping, this would
> make all memory dirty and trigger dirty pages throttle, that upstream
> commit would release mmap_sem then retry the page fault.  The retried
> page fault would see correct PTEs installed by the first try then update
> dirty bit and clear read-only bit and flush TLBs for ARM.  The regression is
> caused by the excessive TLB flush.  It is fine on x86 since x86 doesn't
> clear read-only bit so there is no need to flush TLB for this case.
> 
> The page fault would be retried due to:
> 1. Waiting for page readahead
> 2. Waiting for page swapped in
> 3. Waiting for dirty pages throttling
> 
> The first two cases don't have PTEs set up at all, so the retried page
> fault would install the PTEs, so they don't reach there.  But the #3
> case usually has PTEs installed, the retried page fault would reach the
> dirty bit and read-only bit update.  But it seems not necessary to
> modify those bits again for #3 since they should be already set by the
> first page fault try.
> 
> Of course the parallel page fault may set up PTEs, but we just need care
> about write fault.  If the parallel page fault setup a writable and dirty
> PTE then the retried fault doesn't need do anything extra.  If the
> parallel page fault setup a clean read-only PTE, the retried fault should
> just call do_wp_page() then return as the below code snippet shows:
> 
> if (vmf->flags & FAULT_FLAG_WRITE) {
>         if (!pte_write(entry))
>             return do_wp_page(vmf);
> }
> 
> With this fix the test result get back to normal.
> 
> Fixes: 89b15332af7c ("mm: drop mmap_sem before calling balance_dirty_pages() in write fault")
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Reported-by: Xu Yu <xuyu@linux.alibaba.com>
> Debugged-by: Xu Yu <xuyu@linux.alibaba.com>
> Tested-by: Xu Yu <xuyu@linux.alibaba.com>
> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
> ---
> v2: * Incorporated the comment from Will Deacon.
>     * Updated the commit log per the discussion.
> 
>  mm/memory.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 87ec87c..e93e1da 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4241,8 +4241,14 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
>  	if (vmf->flags & FAULT_FLAG_WRITE) {
>  		if (!pte_write(entry))
>  			return do_wp_page(vmf);
> -		entry = pte_mkdirty(entry);
>  	}
> +
> +	if (vmf->flags & FAULT_FLAG_TRIED)
> +		goto unlock;
> +
> +	if (vmf->flags & FAULT_FLAG_WRITE)
> +		entry = pte_mkdirty(entry);
> +


Thanks, this looks better to me.

Andrew -- please can you update the version in your tree?

Cheers,

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2020-07-17 10:09 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-15 21:36 [v2 PATCH] mm: avoid access flag update TLB flush for retried page fault Yang Shi
2020-07-15 21:36 ` Yang Shi
2020-07-17 10:08 ` Will Deacon
2020-07-17 10:08   ` Will Deacon

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.