From: Yang Shi <yang.shi@linux.alibaba.com>
To: hannes@cmpxchg.org, catalin.marinas@arm.com, will.deacon@arm.com,
	akpm@linux-foundation.org
Cc: yang.shi@linux.alibaba.com, xuyu@linux.alibaba.com,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org
Subject: [RFC PATCH] mm: avoid access flag update TLB flush for retried page fault
Date: Wed,  8 Jul 2020 02:54:32 +0800
Message-ID: <1594148072-91273-1-git-send-email-yang.shi@linux.alibaba.com>

We recently found a regression when running the will_it_scale/page_fault3
test on ARM64: throughput dropped by over 70% in the multi-process cases
and by over 20% in the multi-thread cases.  The regression is caused by commit
89b15332af7c0312a41e50846819ca6613b58b4c ("mm: drop mmap_sem before
calling balance_dirty_pages() in write fault").

The test mmaps a memory-sized file and then writes to the mapping.  This
dirties all of the memory and triggers dirty page throttling, at which
point that upstream commit releases mmap_sem and retries the page fault.
The retried page fault sees the correct PTEs installed by the first
attempt, then updates the access flags and flushes the TLBs.  The
regression is caused by these excessive TLB flushes.  x86 is not affected
because it does not need a TLB flush for an access flag update.
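
For readers who want to see the access pattern, the workload described
above can be sketched roughly as follows.  This is not the will_it_scale
source; the helper name touch_shared_file(), the /tmp path template and
the one-byte-per-page write are assumptions made for illustration, and
whether dirty throttling actually kicks in depends on the filesystem and
the mapping size.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of will_it_scale): map a temporary file
 * MAP_SHARED and write one byte to every page, so that each page takes a
 * write fault on a file-backed shared mapping.
 * Returns the number of pages written, or -1 on error.
 */
static long touch_shared_file(size_t len)
{
	char path[] = "/tmp/pf3-sketch-XXXXXX";	/* assumed location */
	long page = sysconf(_SC_PAGESIZE);
	long touched = 0;
	char *map;
	int fd;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);		/* the file lives until fd is closed */

	if (ftruncate(fd, (off_t)len) < 0) {
		close(fd);
		return -1;
	}

	map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED) {
		close(fd);
		return -1;
	}

	/* One store per page: every iteration dirties a new page. */
	for (size_t off = 0; off < len; off += (size_t)page) {
		map[off] = 1;
		touched++;
	}

	munmap(map, len);
	close(fd);
	return touched;
}
```

Calling touch_shared_file() in a loop from several processes or threads,
with a length close to the size of memory, approximates the pattern that
exposed the regression.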

A page fault may be retried because of:
1. Waiting for page readahead
2. Waiting for a page to be swapped in
3. Waiting for dirty page throttling

In the first two cases no PTEs have been set up yet, so the retried page
fault installs them and never reaches the access flag update.  In case #3
the PTEs are usually already installed, so the retried page fault does
reach the access flag update.  Updating the access flags does not seem
necessary for #3, because a retried page fault is not a real "second
access", so it seems safe to skip the access flag update for a retried
page fault.
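
The intended decision for a fault that finds a present PTE can be
modelled in user space as below.  This is only a toy model of the
control flow proposed in the patch, not kernel code: the SKETCH_* names
and flag values are invented for illustration and do not match the
kernel's FAULT_FLAG_* definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only; not the kernel's FAULT_FLAG_* constants. */
#define SKETCH_FLAG_WRITE	0x1u
#define SKETCH_FLAG_TRIED	0x2u

enum sketch_action {
	SKETCH_DO_WP_PAGE,	/* write to a read-only PTE: handled elsewhere */
	SKETCH_SKIP_UPDATE,	/* retried fault: leave the PTE alone, no flush */
	SKETCH_UPDATE_FLAGS,	/* first attempt: set young (and dirty on write) */
};

/*
 * Models what happens once the fault handler has found a present PTE.
 * pte_writable stands in for pte_write(entry).
 */
static enum sketch_action sketch_pte_fault_tail(unsigned int flags,
						bool pte_writable)
{
	if ((flags & SKETCH_FLAG_WRITE) && !pte_writable)
		return SKETCH_DO_WP_PAGE;

	/* A retried fault is not a new access, so skip the flag update. */
	if (flags & SKETCH_FLAG_TRIED)
		return SKETCH_SKIP_UPDATE;

	return SKETCH_UPDATE_FLAGS;
}
```

In this model the dirty-throttling retry (case #3) is a write fault with
the retry flag set on a writable PTE, which maps to SKETCH_SKIP_UPDATE.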

With this fix the test results return to normal.

Reported-by: Xu Yu <xuyu@linux.alibaba.com>
Debugged-by: Xu Yu <xuyu@linux.alibaba.com>
Tested-by: Xu Yu <xuyu@linux.alibaba.com>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
---
I'm not sure whether this is safe for non-x86 machines.  We ran some tests on
arm64, but there may still be corner cases that are not covered.

 mm/memory.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index 87ec87c..3d4e671 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4241,8 +4241,13 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
 	if (vmf->flags & FAULT_FLAG_WRITE) {
 		if (!pte_write(entry))
 			return do_wp_page(vmf);
-		entry = pte_mkdirty(entry);
 	}
+
+	if ((vmf->flags & FAULT_FLAG_WRITE) && !(vmf->flags & FAULT_FLAG_TRIED))
+		entry = pte_mkdirty(entry);
+	else if (vmf->flags & FAULT_FLAG_TRIED)
+		goto unlock;
+
 	entry = pte_mkyoung(entry);
 	if (ptep_set_access_flags(vmf->vma, vmf->address, vmf->pte, entry,
 				vmf->flags & FAULT_FLAG_WRITE)) {
-- 
1.8.3.1



Thread overview: 10+ messages
2020-07-07 18:54 Yang Shi [this message]
2020-07-07 18:54 ` [RFC PATCH] mm: avoid access flag update TLB flush for retried page fault Yang Shi
2020-07-08  8:00 ` Will Deacon
2020-07-08  8:00   ` Will Deacon
2020-07-08 16:40   ` Yang Shi
2020-07-08 16:40     ` Yang Shi
2020-07-08 17:29     ` Catalin Marinas
2020-07-08 17:29       ` Catalin Marinas
2020-07-08 18:13       ` Yang Shi
2020-07-08 18:13         ` Yang Shi
