From: Will Deacon <will@kernel.org>
To: Yang Shi <yang.shi@linux.alibaba.com>
Cc: hannes@cmpxchg.org, catalin.marinas@arm.com, will.deacon@arm.com,
akpm@linux-foundation.org, xuyu@linux.alibaba.com,
linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org
Subject: Re: [v2 PATCH] mm: avoid access flag update TLB flush for retried page fault
Date: Fri, 17 Jul 2020 11:08:21 +0100
Message-ID: <20200717100820.GB8673@willie-the-truck>
In-Reply-To: <1594848990-55657-1-git-send-email-yang.shi@linux.alibaba.com>
On Thu, Jul 16, 2020 at 05:36:30AM +0800, Yang Shi wrote:
> Recently we found a regression when running the will_it_scale/page_fault3
> test on ARM64: over 70% down for the multi-process cases and over 20% down
> for the multi-thread cases. It turns out the regression is caused by commit
> 89b15332af7c0312a41e50846819ca6613b58b4c ("mm: drop mmap_sem before
> calling balance_dirty_pages() in write fault").
>
> The test mmaps a memory-sized file and then writes to the mapping. This
> makes all memory dirty and triggers dirty page throttling, at which point
> that upstream commit releases mmap_sem and retries the page fault. The
> retried page fault sees the correct PTEs installed by the first try, then
> updates the dirty bit, clears the read-only bit and flushes TLBs on ARM.
> The regression is caused by the excessive TLB flushing. It is fine on x86
> since x86 doesn't clear the read-only bit, so there is no need to flush
> the TLB in this case.
>
> The page fault would be retried due to:
> 1. Waiting for page readahead
> 2. Waiting for a page to be swapped in
> 3. Waiting for dirty page throttling
>
> The first two cases don't have PTEs set up at all, so the retried page
> fault installs the PTEs and never reaches the flag update. But case #3
> usually has PTEs installed, so the retried page fault reaches the
> dirty bit and read-only bit update. It does not seem necessary to
> modify those bits again for #3, since they should already have been set
> by the first page fault try.
>
> Of course a parallel page fault may set up PTEs, but we only need to care
> about write faults. If the parallel page fault sets up a writable and dirty
> PTE, then the retried fault doesn't need to do anything extra. If the
> parallel page fault sets up a clean read-only PTE, the retried fault should
> just call do_wp_page() and return, as the code snippet below shows:
>
> 	if (vmf->flags & FAULT_FLAG_WRITE) {
> 		if (!pte_write(entry))
> 			return do_wp_page(vmf);
> 	}
>
> With this fix the test results get back to normal.
>
> Fixes: 89b15332af7c ("mm: drop mmap_sem before calling balance_dirty_pages() in write fault")
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Reported-by: Xu Yu <xuyu@linux.alibaba.com>
> Debugged-by: Xu Yu <xuyu@linux.alibaba.com>
> Tested-by: Xu Yu <xuyu@linux.alibaba.com>
> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
> ---
> v2: * Incorporated the comment from Will Deacon.
> * Updated the commit log per the discussion.
>
> mm/memory.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 87ec87c..e93e1da 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4241,8 +4241,14 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
>  	if (vmf->flags & FAULT_FLAG_WRITE) {
>  		if (!pte_write(entry))
>  			return do_wp_page(vmf);
> -		entry = pte_mkdirty(entry);
>  	}
> +
> +	if (vmf->flags & FAULT_FLAG_TRIED)
> +		goto unlock;
> +
> +	if (vmf->flags & FAULT_FLAG_WRITE)
> +		entry = pte_mkdirty(entry);
> +
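
For readers following the hunk above, the ordering of the three checks can be modelled in user space. This is only a rough sketch, not kernel code: the MODEL_* flag values, struct model_pte, enum model_action and both functions are hypothetical names made up for illustration, and the model ignores locking, the actual PTE update and the TLB flush itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for this model; not the kernel's definitions. */
#define MODEL_FLAG_WRITE 0x1u
#define MODEL_FLAG_TRIED 0x2u

/* What the modelled fault path decides to do with an already present PTE. */
enum model_action {
	MODEL_DO_WP_PAGE,	/* write fault hit a read-only PTE */
	MODEL_SKIP_UPDATE,	/* retried fault: leave the PTE alone */
	MODEL_UPDATE_FLAGS	/* first try: continue to the flag update */
};

/* Hypothetical stand-in for the two PTE bits discussed in the commit log. */
struct model_pte {
	bool write;
	bool dirty;
};

/* Mirrors the order of the checks in the patched hunk. */
static enum model_action model_pte_fault(unsigned int flags,
					 struct model_pte *pte)
{
	/* A write fault on a read-only PTE is handled before anything else. */
	if ((flags & MODEL_FLAG_WRITE) && !pte->write)
		return MODEL_DO_WP_PAGE;

	/* A retried fault stops here, before the dirty bit is touched. */
	if (flags & MODEL_FLAG_TRIED)
		return MODEL_SKIP_UPDATE;

	/* Only a first-try write fault marks the entry dirty. */
	if (flags & MODEL_FLAG_WRITE)
		pte->dirty = true;

	return MODEL_UPDATE_FLAGS;
}

/* Small usage example covering the three outcomes. */
static void model_self_check(void)
{
	struct model_pte pte = { .write = true, .dirty = false };

	assert(model_pte_fault(MODEL_FLAG_WRITE, &pte) == MODEL_UPDATE_FLAGS);
	assert(pte.dirty);

	assert(model_pte_fault(MODEL_FLAG_WRITE | MODEL_FLAG_TRIED, &pte) ==
	       MODEL_SKIP_UPDATE);

	pte.write = false;
	assert(model_pte_fault(MODEL_FLAG_WRITE | MODEL_FLAG_TRIED, &pte) ==
	       MODEL_DO_WP_PAGE);
}
```

In this model a retried write fault on a writable PTE returns before the dirty bit is set, which corresponds to the case the commit log says no longer needs the flag update, while a retried write fault on a read-only PTE still takes the do_wp_page() branch.
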

Thanks, this looks better to me.

Andrew -- please can you update the version in your tree?

Cheers,

Will