From: Will Deacon <will@kernel.org>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org,
	Will Deacon <will@kernel.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Jan Kara <jack@suse.cz>, Minchan Kim <minchan@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Vinayak Menon <vinmenon@codeaurora.org>,
	kernel-team@android.com
Subject: [PATCH 1/2] mm: Allow architectures to request 'old' entries when prefaulting
Date: Wed, 9 Dec 2020 16:39:49 +0000	[thread overview]
Message-ID: <20201209163950.8494-2-will@kernel.org> (raw)
In-Reply-To: <20201209163950.8494-1-will@kernel.org>

Commit 5c0a85fad949 ("mm: make faultaround produce old ptes") changed
the "faultaround" behaviour to initialise prefaulted PTEs as 'old',
since this avoids vmscan wrongly assuming that they are hot, despite
having never been explicitly accessed by userspace. The change has been
shown to benefit numerous arm64 micro-architectures (with hardware
access flag) running Android, where both application launch latency and
direct reclaim time are significantly reduced.

Unfortunately, commit 315d09bf30c2 ("Revert "mm: make faultaround
produce old ptes"") reverted the change due to it being identified as
the cause of a ~6% regression in unixbench on x86. Experiments on a
variety of recent arm64 micro-architectures indicate that unixbench is
not affected by the original commit, yielding a 0-1% performance
improvement.

Since one size does not fit all for the initial state of prefaulted
PTEs, introduce arch_wants_old_faultaround_pte(), which allows an
architecture to opt-in to 'old' prefaulted PTEs at runtime based on
whatever criteria it may have.

Cc: Jan Kara <jack@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Will Deacon <will@kernel.org>
---
 include/linux/mm.h |  5 ++++-
 mm/memory.c        | 31 ++++++++++++++++++++++++++++---
 2 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index db6ae4d3fb4e..932886554586 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -426,6 +426,7 @@ extern pgprot_t protection_map[16];
  * @FAULT_FLAG_REMOTE: The fault is not for current task/mm.
  * @FAULT_FLAG_INSTRUCTION: The fault was during an instruction fetch.
  * @FAULT_FLAG_INTERRUPTIBLE: The fault can be interrupted by non-fatal signals.
+ * @FAULT_FLAG_PREFAULT_OLD: Initialise pre-faulted PTEs in the 'old' state.
  *
  * About @FAULT_FLAG_ALLOW_RETRY and @FAULT_FLAG_TRIED: we can specify
  * whether we would allow page faults to retry by specifying these two
@@ -456,6 +457,7 @@ extern pgprot_t protection_map[16];
 #define FAULT_FLAG_REMOTE			0x80
 #define FAULT_FLAG_INSTRUCTION			0x100
 #define FAULT_FLAG_INTERRUPTIBLE		0x200
+#define FAULT_FLAG_PREFAULT_OLD			0x400
 
 /*
  * The default fault flags that should be used by most of the
@@ -493,7 +495,8 @@ static inline bool fault_flag_allow_retry_first(unsigned int flags)
 	{ FAULT_FLAG_USER,		"USER" }, \
 	{ FAULT_FLAG_REMOTE,		"REMOTE" }, \
 	{ FAULT_FLAG_INSTRUCTION,	"INSTRUCTION" }, \
-	{ FAULT_FLAG_INTERRUPTIBLE,	"INTERRUPTIBLE" }
+	{ FAULT_FLAG_INTERRUPTIBLE,	"INTERRUPTIBLE" }, \
+	{ FAULT_FLAG_PREFAULT_OLD,	"PREFAULT_OLD" }
 
 /*
  * vm_fault is filled by the pagefault handler and passed to the vma's
diff --git a/mm/memory.c b/mm/memory.c
index c48f8df6e502..6b30c15120e7 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -134,6 +134,18 @@ static inline bool arch_faults_on_old_pte(void)
 }
 #endif
 
+#ifndef arch_wants_old_faultaround_pte
+static inline bool arch_wants_old_faultaround_pte(void)
+{
+	/*
+	 * Transitioning a PTE from 'old' to 'young' can be expensive on
+	 * some architectures, even if it's performed in hardware. By
+	 * default, "false" means prefaulted entries will be 'young'.
+	 */
+	return false;
+}
+#endif
+
 static int __init disable_randmaps(char *s)
 {
 	randomize_va_space = 0;
@@ -3788,6 +3800,7 @@ vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct page *page)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	bool write = vmf->flags & FAULT_FLAG_WRITE;
+	bool old = vmf->flags & FAULT_FLAG_PREFAULT_OLD;
 	pte_t entry;
 	vm_fault_t ret;
 
@@ -3811,7 +3824,7 @@ vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct page *page)
 
 	flush_icache_page(vma, page);
 	entry = mk_pte(page, vma->vm_page_prot);
-	entry = pte_sw_mkyoung(entry);
+	entry = old ? pte_mkold(entry) : pte_sw_mkyoung(entry);
 	if (write)
 		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
 	/* copy-on-write page */
@@ -3964,6 +3977,9 @@ static vm_fault_t do_fault_around(struct vm_fault *vmf)
 		smp_wmb(); /* See comment in __pte_alloc() */
 	}
 
+	if (arch_wants_old_faultaround_pte())
+		vmf->flags |= FAULT_FLAG_PREFAULT_OLD;
+
 	vmf->vma->vm_ops->map_pages(vmf, start_pgoff, end_pgoff);
 
 	/* Huge page is mapped? Page fault is solved */
@@ -3978,8 +3994,17 @@ static vm_fault_t do_fault_around(struct vm_fault *vmf)
 
 	/* check if the page fault is solved */
 	vmf->pte -= (vmf->address >> PAGE_SHIFT) - (address >> PAGE_SHIFT);
-	if (!pte_none(*vmf->pte))
-		ret = VM_FAULT_NOPAGE;
+	if (pte_none(*vmf->pte))
+		goto out_unlock;
+
+	if (vmf->flags & FAULT_FLAG_PREFAULT_OLD) {
+		pte_t pte = pte_mkyoung(*vmf->pte);
+
+		if (ptep_set_access_flags(vmf->vma, address, vmf->pte, pte, 0))
+			update_mmu_cache(vmf->vma, address, vmf->pte);
+	}
+
+	ret = VM_FAULT_NOPAGE;
+out_unlock:
 	pte_unmap_unlock(vmf->pte, vmf->ptl);
 out:
 	vmf->address = address;
-- 
2.29.2.576.ga3fc446d84-goog