Linux-mm Archive on lore.kernel.org
* [PATCH v10 0/3] fix double page fault on arm64
@ 2019-09-30  1:57 Jia He
  2019-09-30  1:57 ` [PATCH v10 1/3] arm64: cpufeature: introduce helper cpu_has_hw_af() Jia He
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Jia He @ 2019-09-30  1:57 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland, James Morse,
	Marc Zyngier, Matthew Wilcox, Kirill A. Shutemov,
	linux-arm-kernel, linux-kernel, linux-mm
  Cc: Punit Agrawal, Thomas Gleixner, Andrew Morton, hejianet,
	Kaly Xin, Jia He

While running the pmdk unit test vmmalloc_fork TEST1 in an arm64 guest, we
hit a double page fault in __copy_from_user_inatomic(), called from
cow_user_page().

As told by Catalin: "On arm64 without hardware Access Flag, copying from
user will fail because the pte is old and cannot be marked young. So we
always end up with zeroed page after fork() + CoW for pfn mappings. we
don't always have a hardware-managed access flag on arm64."

Changes
v10:
    add r-b from Catalin and a-b from Kirill in PATCH 03
    remove Reported-by in PATCH 01
v9: refactor cow_user_page to reduce indentation (Catalin)
    hold the ptl longer (Catalin)
v8: change cow_user_page's return type (Matthew)
v7: s/pte_spinlock/pte_offset_map_lock (Kirill)
v6: fix error case of returning with spinlock taken (Catalin)
    move kmap_atomic to avoid handling kunmap_atomic
v5: handle the case correctly when !pte_same
    fix kbuild test failed
v4: introduce cpu_has_hw_af (Suzuki)
    bail out if !pte_same (Kirill)
v3: add vmf->ptl lock/unlock (Kirill A. Shutemov)
    add arch_faults_on_old_pte (Matthew, Catalin)
v2: remove FAULT_FLAG_WRITE when setting pte access flag (Catalin)

Jia He (3):
  arm64: cpufeature: introduce helper cpu_has_hw_af()
  arm64: mm: implement arch_faults_on_old_pte() on arm64
  mm: fix double page fault on arm64 if PTE_AF is cleared

 arch/arm64/include/asm/cpufeature.h | 10 +++
 arch/arm64/include/asm/pgtable.h    | 14 ++++
 mm/memory.c                         | 99 ++++++++++++++++++++++++-----
 3 files changed, 108 insertions(+), 15 deletions(-)

-- 
2.17.1
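
[Editor's sketch] The failure mode quoted from Catalin in the cover letter
above can be modelled in a few lines of userspace C. This is only a toy
under stated assumptions: struct toy_pte, TOY_PAGE_SIZE, toy_copy_inatomic()
and toy_cow_copy() are hypothetical names invented for illustration, not
kernel code. The model assumes that a non-faulting copy through an old PTE
copies nothing when the access flag is software-managed.

```c
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 16		/* tiny "page", for illustration only */
#define TOY_PTE_AF    (1u << 10)	/* illustrative access-flag bit */

/* Hypothetical stand-in for a PTE plus the memory it maps. */
struct toy_pte {
	unsigned int flags;
	const unsigned char *data;
};

/*
 * Model of a non-faulting ("inatomic") user copy. Returns the number of
 * bytes NOT copied. Without a hardware-managed access flag, an old PTE
 * makes the access fault, and an atomic copy cannot fix the PTE up.
 */
static size_t toy_copy_inatomic(unsigned char *dst, const struct toy_pte *pte,
				bool hw_af)
{
	if (!hw_af && !(pte->flags & TOY_PTE_AF))
		return TOY_PAGE_SIZE;

	memcpy(dst, pte->data, TOY_PAGE_SIZE);
	return 0;
}

/* Model of the CoW copy for a PFN mapping, without and with the fix. */
static void toy_cow_copy(unsigned char *dst, struct toy_pte *pte,
			 bool hw_af, bool fix)
{
	if (fix && !hw_af)
		pte->flags |= TOY_PTE_AF;	/* the pte_mkyoung() step */

	if (toy_copy_inatomic(dst, pte, hw_af))
		memset(dst, 0, TOY_PAGE_SIZE);	/* fall back to zeroes */
}
```

With fix == false and hw_af == false the destination ends up zero-filled,
which is the symptom reported here. With fix == true the copy succeeds
because the entry is marked young first.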




* [PATCH v10 1/3] arm64: cpufeature: introduce helper cpu_has_hw_af()
  2019-09-30  1:57 [PATCH v10 0/3] fix double page fault on arm64 Jia He
@ 2019-09-30  1:57 ` Jia He
  2019-10-01 12:54   ` Will Deacon
  2019-09-30  1:57 ` [PATCH v10 2/3] arm64: mm: implement arch_faults_on_old_pte() on arm64 Jia He
  2019-09-30  1:57 ` [PATCH v10 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared Jia He
  2 siblings, 1 reply; 21+ messages in thread
From: Jia He @ 2019-09-30  1:57 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland, James Morse,
	Marc Zyngier, Matthew Wilcox, Kirill A. Shutemov,
	linux-arm-kernel, linux-kernel, linux-mm
  Cc: Punit Agrawal, Thomas Gleixner, Andrew Morton, hejianet,
	Kaly Xin, Jia He

We unconditionally set the HW_AFDBM capability and only enable it on
CPUs which really have the feature. But sometimes we need to know
whether this cpu has the capability of HW AF. So decouple AF from
DBM by new helper cpu_has_hw_af().

Signed-off-by: Jia He <justin.he@arm.com>
Suggested-by: Suzuki Poulose <Suzuki.Poulose@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm64/include/asm/cpufeature.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 9cde5d2e768f..949bc7c85030 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -659,6 +659,16 @@ static inline u32 id_aa64mmfr0_parange_to_phys_shift(int parange)
 	default: return CONFIG_ARM64_PA_BITS;
 	}
 }
+
+/* Check whether hardware update of the Access flag is supported */
+static inline bool cpu_has_hw_af(void)
+{
+	if (IS_ENABLED(CONFIG_ARM64_HW_AFDBM))
+		return read_cpuid(ID_AA64MMFR1_EL1) & 0xf;
+
+	return false;
+}
+
 #endif /* __ASSEMBLY__ */
 
 #endif
-- 
2.17.1




* [PATCH v10 2/3] arm64: mm: implement arch_faults_on_old_pte() on arm64
  2019-09-30  1:57 [PATCH v10 0/3] fix double page fault on arm64 Jia He
  2019-09-30  1:57 ` [PATCH v10 1/3] arm64: cpufeature: introduce helper cpu_has_hw_af() Jia He
@ 2019-09-30  1:57 ` Jia He
  2019-10-01 12:50   ` Will Deacon
  2019-09-30  1:57 ` [PATCH v10 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared Jia He
  2 siblings, 1 reply; 21+ messages in thread
From: Jia He @ 2019-09-30  1:57 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland, James Morse,
	Marc Zyngier, Matthew Wilcox, Kirill A. Shutemov,
	linux-arm-kernel, linux-kernel, linux-mm
  Cc: Punit Agrawal, Thomas Gleixner, Andrew Morton, hejianet,
	Kaly Xin, Jia He

On arm64 without hardware Access Flag, copying from user will fail because
the pte is old and cannot be marked young. So we always end up with zeroed
page after fork() + CoW for pfn mappings. We don't always have a
hardware-managed access flag on arm64.

Hence implement arch_faults_on_old_pte() on arm64 to indicate that
accessing an old pte might cause a page fault.

Signed-off-by: Jia He <justin.he@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm64/include/asm/pgtable.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 7576df00eb50..e96fb82f62de 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -885,6 +885,20 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
 #define phys_to_ttbr(addr)	(addr)
 #endif
 
+/*
+ * On arm64 without hardware Access Flag, copying from user will fail because
+ * the pte is old and cannot be marked young. So we always end up with zeroed
+ * page after fork() + CoW for pfn mappings. We don't always have a
+ * hardware-managed access flag on arm64.
+ */
+static inline bool arch_faults_on_old_pte(void)
+{
+	WARN_ON(preemptible());
+
+	return !cpu_has_hw_af();
+}
+#define arch_faults_on_old_pte arch_faults_on_old_pte
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __ASM_PGTABLE_H */
-- 
2.17.1
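
[Editor's sketch] The patch above relies on a common kernel idiom: the
architecture header defines the function and then a macro of the same name,
so that generic code can supply a fallback under #ifndef (as mm/memory.c
does in patch 3/3). A minimal single-file illustration follows. The
toy_cpu_has_hw_af variable is a hypothetical stand-in for the real feature
probe, and the two halves would normally live in separate files.

```c
#include <stdbool.h>

/* ---- what an architecture header would provide (hypothetical arch) ---- */

static bool toy_cpu_has_hw_af;	/* stand-in for a CPU feature probe */

static inline bool arch_faults_on_old_pte(void)
{
	return !toy_cpu_has_hw_af;
}
/* A macro with the function's own name marks the hook as overridden. */
#define arch_faults_on_old_pte arch_faults_on_old_pte

/* ---- what generic code would provide ---- */

#ifndef arch_faults_on_old_pte
/* Fallback, compiled only when no architecture override exists. */
static inline bool arch_faults_on_old_pte(void)
{
	return false;
}
#endif
```

Because the macro expands to the same identifier, callers are unaffected.
The only effect is that the #ifndef block is skipped, so the two definitions
never collide.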




* [PATCH v10 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared
  2019-09-30  1:57 [PATCH v10 0/3] fix double page fault on arm64 Jia He
  2019-09-30  1:57 ` [PATCH v10 1/3] arm64: cpufeature: introduce helper cpu_has_hw_af() Jia He
  2019-09-30  1:57 ` [PATCH v10 2/3] arm64: mm: implement arch_faults_on_old_pte() on arm64 Jia He
@ 2019-09-30  1:57 ` Jia He
  2019-10-01 12:54   ` Will Deacon
  2 siblings, 1 reply; 21+ messages in thread
From: Jia He @ 2019-09-30  1:57 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland, James Morse,
	Marc Zyngier, Matthew Wilcox, Kirill A. Shutemov,
	linux-arm-kernel, linux-kernel, linux-mm
  Cc: Punit Agrawal, Thomas Gleixner, Andrew Morton, hejianet,
	Kaly Xin, Jia He

While running the pmdk unit test [1] vmmalloc_fork TEST1 in an arm64 guest,
we hit a double page fault in __copy_from_user_inatomic(), called from
cow_user_page().

The call trace below was captured from arm64 do_page_fault() for debugging
purposes:
[  110.016195] Call trace:
[  110.016826]  do_page_fault+0x5a4/0x690
[  110.017812]  do_mem_abort+0x50/0xb0
[  110.018726]  el1_da+0x20/0xc4
[  110.019492]  __arch_copy_from_user+0x180/0x280
[  110.020646]  do_wp_page+0xb0/0x860
[  110.021517]  __handle_mm_fault+0x994/0x1338
[  110.022606]  handle_mm_fault+0xe8/0x180
[  110.023584]  do_page_fault+0x240/0x690
[  110.024535]  do_mem_abort+0x50/0xb0
[  110.025423]  el0_da+0x20/0x24

The pte info before __copy_from_user_inatomic is (PTE_AF is cleared):
[ffff9b007000] pgd=000000023d4f8003, pud=000000023da9b003, pmd=000000023d4b3003, pte=360000298607bd3

As told by Catalin: "On arm64 without hardware Access Flag, copying from
user will fail because the pte is old and cannot be marked young. So we
always end up with zeroed page after fork() + CoW for pfn mappings. we
don't always have a hardware-managed access flag on arm64."

This patch fixes it by calling pte_mkyoung(). Also, the parameters are
changed because vmf should be passed to cow_user_page().

Add a WARN_ON_ONCE() when __copy_from_user_inatomic() returns an error,
in case there is some obscure use-case (suggested by Kirill).

[1] https://github.com/pmem/pmdk/tree/master/src/test/vmmalloc_fork

Signed-off-by: Jia He <justin.he@arm.com>
Reported-by: Yibo Cai <Yibo.Cai@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/memory.c | 99 +++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 84 insertions(+), 15 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index b1ca51a079f2..1f56b0118ef5 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -118,6 +118,13 @@ int randomize_va_space __read_mostly =
 					2;
 #endif
 
+#ifndef arch_faults_on_old_pte
+static inline bool arch_faults_on_old_pte(void)
+{
+	return false;
+}
+#endif
+
 static int __init disable_randmaps(char *s)
 {
 	randomize_va_space = 0;
@@ -2145,32 +2152,82 @@ static inline int pte_unmap_same(struct mm_struct *mm, pmd_t *pmd,
 	return same;
 }
 
-static inline void cow_user_page(struct page *dst, struct page *src, unsigned long va, struct vm_area_struct *vma)
+static inline bool cow_user_page(struct page *dst, struct page *src,
+				 struct vm_fault *vmf)
 {
+	bool ret;
+	void *kaddr;
+	void __user *uaddr;
+	bool force_mkyoung;
+	struct vm_area_struct *vma = vmf->vma;
+	struct mm_struct *mm = vma->vm_mm;
+	unsigned long addr = vmf->address;
+
 	debug_dma_assert_idle(src);
 
+	if (likely(src)) {
+		copy_user_highpage(dst, src, addr, vma);
+		return true;
+	}
+
 	/*
 	 * If the source page was a PFN mapping, we don't have
 	 * a "struct page" for it. We do a best-effort copy by
 	 * just copying from the original user address. If that
 	 * fails, we just zero-fill it. Live with it.
 	 */
-	if (unlikely(!src)) {
-		void *kaddr = kmap_atomic(dst);
-		void __user *uaddr = (void __user *)(va & PAGE_MASK);
+	kaddr = kmap_atomic(dst);
+	uaddr = (void __user *)(addr & PAGE_MASK);
+
+	/*
+	 * On architectures with software "accessed" bits, we would
+	 * take a double page fault, so mark it accessed here.
+	 */
+	force_mkyoung = arch_faults_on_old_pte() && !pte_young(vmf->orig_pte);
+	if (force_mkyoung) {
+		pte_t entry;
+
+		vmf->pte = pte_offset_map_lock(mm, vmf->pmd, addr, &vmf->ptl);
+		if (!likely(pte_same(*vmf->pte, vmf->orig_pte))) {
+			/*
+			 * Other thread has already handled the fault
+			 * and we don't need to do anything. If it's
+			 * not the case, the fault will be triggered
+			 * again on the same address.
+			 */
+			ret = false;
+			goto pte_unlock;
+		}
+
+		entry = pte_mkyoung(vmf->orig_pte);
+		if (ptep_set_access_flags(vma, addr, vmf->pte, entry, 0))
+			update_mmu_cache(vma, addr, vmf->pte);
+	}
 
+	/*
+	 * This really shouldn't fail, because the page is there
+	 * in the page tables. But it might just be unreadable,
+	 * in which case we just give up and fill the result with
+	 * zeroes.
+	 */
+	if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE)) {
 		/*
-		 * This really shouldn't fail, because the page is there
-		 * in the page tables. But it might just be unreadable,
-		 * in which case we just give up and fill the result with
-		 * zeroes.
+		 * Give a warn in case there can be some obscure
+		 * use-case
 		 */
-		if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE))
-			clear_page(kaddr);
-		kunmap_atomic(kaddr);
-		flush_dcache_page(dst);
-	} else
-		copy_user_highpage(dst, src, va, vma);
+		WARN_ON_ONCE(1);
+		clear_page(kaddr);
+	}
+
+	ret = true;
+
+pte_unlock:
+	if (force_mkyoung)
+		pte_unmap_unlock(vmf->pte, vmf->ptl);
+	kunmap_atomic(kaddr);
+	flush_dcache_page(dst);
+
+	return ret;
 }
 
 static gfp_t __get_fault_gfp_mask(struct vm_area_struct *vma)
@@ -2327,7 +2384,19 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
 				vmf->address);
 		if (!new_page)
 			goto oom;
-		cow_user_page(new_page, old_page, vmf->address, vma);
+
+		if (!cow_user_page(new_page, old_page, vmf)) {
+			/*
+			 * COW failed, if the fault was solved by other,
+			 * it's fine. If not, userspace would re-fault on
+			 * the same address and we will handle the fault
+			 * from the second attempt.
+			 */
+			put_page(new_page);
+			if (old_page)
+				put_page(old_page);
+			return 0;
+		}
 	}
 
 	if (mem_cgroup_try_charge_delay(new_page, mm, GFP_KERNEL, &memcg, false))
-- 
2.17.1
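
[Editor's sketch] The locking shape of the new cow_user_page() (take the
lock, re-check that the entry still matches the snapshot taken earlier, bail
out through a single unlock label if it does not) can be shown in isolation.
The following is a userspace toy and only an assumption-laden analogue:
struct toy_slot, toy_lock(), toy_unlock() and toy_mark_young() are
hypothetical names, and a C11 atomic_flag spinlock stands in for the page
table lock.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_YOUNG 0x1UL		/* illustrative "accessed" bit */

struct toy_slot {
	atomic_flag lock;	/* stands in for the page table lock */
	unsigned long value;	/* stands in for the live PTE */
};

static void toy_lock(struct toy_slot *slot)
{
	while (atomic_flag_test_and_set_explicit(&slot->lock,
						 memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void toy_unlock(struct toy_slot *slot)
{
	atomic_flag_clear_explicit(&slot->lock, memory_order_release);
}

/*
 * Returns true if the slot still matched the snapshot and was marked
 * young; false if it changed in the meantime, in which case the caller
 * should back out and let the access be retried.
 */
static bool toy_mark_young(struct toy_slot *slot, unsigned long snapshot)
{
	bool ret;

	toy_lock(slot);
	if (slot->value != snapshot) {
		ret = false;
		goto unlock;
	}

	slot->value = snapshot | TOY_YOUNG;
	ret = true;

unlock:
	toy_unlock(slot);
	return ret;
}
```

In the patch the false case makes wp_page_copy() drop its page references
and return 0, so that userspace simply re-faults on the same address.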




* Re: [PATCH v10 2/3] arm64: mm: implement arch_faults_on_old_pte() on arm64
  2019-09-30  1:57 ` [PATCH v10 2/3] arm64: mm: implement arch_faults_on_old_pte() on arm64 Jia He
@ 2019-10-01 12:50   ` Will Deacon
  2019-10-01 13:32     ` Marc Zyngier
  0 siblings, 1 reply; 21+ messages in thread
From: Will Deacon @ 2019-10-01 12:50 UTC (permalink / raw)
  To: Jia He
  Cc: Catalin Marinas, Mark Rutland, James Morse, Marc Zyngier,
	Matthew Wilcox, Kirill A. Shutemov, linux-arm-kernel,
	linux-kernel, linux-mm, Punit Agrawal, Thomas Gleixner,
	Andrew Morton, hejianet, Kaly Xin

On Mon, Sep 30, 2019 at 09:57:39AM +0800, Jia He wrote:
> On arm64 without hardware Access Flag, copying fromuser will fail because
> the pte is old and cannot be marked young. So we always end up with zeroed
> page after fork() + CoW for pfn mappings. we don't always have a
> hardware-managed access flag on arm64.
> 
> Hence implement arch_faults_on_old_pte on arm64 to indicate that it might
> cause page fault when accessing old pte.
> 
> Signed-off-by: Jia He <justin.he@arm.com>
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> ---
>  arch/arm64/include/asm/pgtable.h | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 7576df00eb50..e96fb82f62de 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -885,6 +885,20 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
>  #define phys_to_ttbr(addr)	(addr)
>  #endif
>  
> +/*
> + * On arm64 without hardware Access Flag, copying from user will fail because
> + * the pte is old and cannot be marked young. So we always end up with zeroed
> + * page after fork() + CoW for pfn mappings. We don't always have a
> + * hardware-managed access flag on arm64.
> + */
> +static inline bool arch_faults_on_old_pte(void)
> +{
> +	WARN_ON(preemptible());
> +
> +	return !cpu_has_hw_af();
> +}

Does this work correctly in a KVM guest? (i.e. is the MMFR sanitised in that
case, despite not being the case on the host?)

Will



* Re: [PATCH v10 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared
  2019-09-30  1:57 ` [PATCH v10 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared Jia He
@ 2019-10-01 12:54   ` Will Deacon
  2019-10-08  2:19     ` Justin He (Arm Technology China)
  0 siblings, 1 reply; 21+ messages in thread
From: Will Deacon @ 2019-10-01 12:54 UTC (permalink / raw)
  To: Jia He
  Cc: Catalin Marinas, Mark Rutland, James Morse, Marc Zyngier,
	Matthew Wilcox, Kirill A. Shutemov, linux-arm-kernel,
	linux-kernel, linux-mm, Punit Agrawal, Thomas Gleixner,
	Andrew Morton, hejianet, Kaly Xin

On Mon, Sep 30, 2019 at 09:57:40AM +0800, Jia He wrote:
> When we tested pmdk unit test [1] vmmalloc_fork TEST1 in arm64 guest, there
> will be a double page fault in __copy_from_user_inatomic of cow_user_page.
> 
> Below call trace is from arm64 do_page_fault for debugging purpose
> [  110.016195] Call trace:
> [  110.016826]  do_page_fault+0x5a4/0x690
> [  110.017812]  do_mem_abort+0x50/0xb0
> [  110.018726]  el1_da+0x20/0xc4
> [  110.019492]  __arch_copy_from_user+0x180/0x280
> [  110.020646]  do_wp_page+0xb0/0x860
> [  110.021517]  __handle_mm_fault+0x994/0x1338
> [  110.022606]  handle_mm_fault+0xe8/0x180
> [  110.023584]  do_page_fault+0x240/0x690
> [  110.024535]  do_mem_abort+0x50/0xb0
> [  110.025423]  el0_da+0x20/0x24
> 
> The pte info before __copy_from_user_inatomic is (PTE_AF is cleared):
> [ffff9b007000] pgd=000000023d4f8003, pud=000000023da9b003, pmd=000000023d4b3003, pte=360000298607bd3
> 
> As told by Catalin: "On arm64 without hardware Access Flag, copying from
> user will fail because the pte is old and cannot be marked young. So we
> always end up with zeroed page after fork() + CoW for pfn mappings. we
> don't always have a hardware-managed access flag on arm64."
> 
> This patch fix it by calling pte_mkyoung. Also, the parameter is
> changed because vmf should be passed to cow_user_page()
> 
> Add a WARN_ON_ONCE when __copy_from_user_inatomic() returns error
> in case there can be some obscure use-case.(by Kirill)
> 
> [1] https://github.com/pmem/pmdk/tree/master/src/test/vmmalloc_fork
> 
> Signed-off-by: Jia He <justin.he@arm.com>
> Reported-by: Yibo Cai <Yibo.Cai@arm.com>
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  mm/memory.c | 99 +++++++++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 84 insertions(+), 15 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index b1ca51a079f2..1f56b0118ef5 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -118,6 +118,13 @@ int randomize_va_space __read_mostly =
>  					2;
>  #endif
>  
> +#ifndef arch_faults_on_old_pte
> +static inline bool arch_faults_on_old_pte(void)
> +{
> +	return false;
> +}
> +#endif

Kirill has acked this, so I'm happy to take the patch as-is, however isn't
it the case that /most/ architectures will want to return true for
arch_faults_on_old_pte()? In which case, wouldn't it make more sense for
that to be the default, and have x86 and arm64 provide an override? For
example, aren't most architectures still going to hit the double fault
scenario even with your patch applied?

Will



* Re: [PATCH v10 1/3] arm64: cpufeature: introduce helper cpu_has_hw_af()
  2019-09-30  1:57 ` [PATCH v10 1/3] arm64: cpufeature: introduce helper cpu_has_hw_af() Jia He
@ 2019-10-01 12:54   ` Will Deacon
  2019-10-01 13:18     ` Marc Zyngier
  0 siblings, 1 reply; 21+ messages in thread
From: Will Deacon @ 2019-10-01 12:54 UTC (permalink / raw)
  To: Jia He
  Cc: Catalin Marinas, Mark Rutland, James Morse, Marc Zyngier,
	Matthew Wilcox, Kirill A. Shutemov, linux-arm-kernel,
	linux-kernel, linux-mm, Punit Agrawal, Thomas Gleixner,
	Andrew Morton, hejianet, Kaly Xin

On Mon, Sep 30, 2019 at 09:57:38AM +0800, Jia He wrote:
> We unconditionally set the HW_AFDBM capability and only enable it on
> CPUs which really have the feature. But sometimes we need to know
> whether this cpu has the capability of HW AF. So decouple AF from
> DBM by new helper cpu_has_hw_af().
> 
> Signed-off-by: Jia He <justin.he@arm.com>
> Suggested-by: Suzuki Poulose <Suzuki.Poulose@arm.com>
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> ---
>  arch/arm64/include/asm/cpufeature.h | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
> index 9cde5d2e768f..949bc7c85030 100644
> --- a/arch/arm64/include/asm/cpufeature.h
> +++ b/arch/arm64/include/asm/cpufeature.h
> @@ -659,6 +659,16 @@ static inline u32 id_aa64mmfr0_parange_to_phys_shift(int parange)
>  	default: return CONFIG_ARM64_PA_BITS;
>  	}
>  }
> +
> +/* Check whether hardware update of the Access flag is supported */
> +static inline bool cpu_has_hw_af(void)
> +{
> +	if (IS_ENABLED(CONFIG_ARM64_HW_AFDBM))
> +		return read_cpuid(ID_AA64MMFR1_EL1) & 0xf;

0xf? I think we should have a mask in sysreg.h for this constant.

Will



* Re: [PATCH v10 1/3] arm64: cpufeature: introduce helper cpu_has_hw_af()
  2019-10-01 12:54   ` Will Deacon
@ 2019-10-01 13:18     ` Marc Zyngier
  2019-10-08  1:12       ` Justin He (Arm Technology China)
  0 siblings, 1 reply; 21+ messages in thread
From: Marc Zyngier @ 2019-10-01 13:18 UTC (permalink / raw)
  To: Will Deacon
  Cc: Jia He, Catalin Marinas, Mark Rutland, James Morse,
	Matthew Wilcox, Kirill A. Shutemov, linux-arm-kernel,
	linux-kernel, linux-mm, Punit Agrawal, Thomas Gleixner,
	Andrew Morton, hejianet, Kaly Xin

On Tue, 1 Oct 2019 13:54:47 +0100
Will Deacon <will@kernel.org> wrote:

> On Mon, Sep 30, 2019 at 09:57:38AM +0800, Jia He wrote:
> > We unconditionally set the HW_AFDBM capability and only enable it on
> > CPUs which really have the feature. But sometimes we need to know
> > whether this cpu has the capability of HW AF. So decouple AF from
> > DBM by new helper cpu_has_hw_af().
> > 
> > Signed-off-by: Jia He <justin.he@arm.com>
> > Suggested-by: Suzuki Poulose <Suzuki.Poulose@arm.com>
> > Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> > ---
> >  arch/arm64/include/asm/cpufeature.h | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> > 
> > diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
> > index 9cde5d2e768f..949bc7c85030 100644
> > --- a/arch/arm64/include/asm/cpufeature.h
> > +++ b/arch/arm64/include/asm/cpufeature.h
> > @@ -659,6 +659,16 @@ static inline u32 id_aa64mmfr0_parange_to_phys_shift(int parange)
> >  	default: return CONFIG_ARM64_PA_BITS;
> >  	}
> >  }
> > +
> > +/* Check whether hardware update of the Access flag is supported */
> > +static inline bool cpu_has_hw_af(void)
> > +{
> > +	if (IS_ENABLED(CONFIG_ARM64_HW_AFDBM))
> > +		return read_cpuid(ID_AA64MMFR1_EL1) & 0xf;  
> 
> 0xf? I think we should have a mask in sysreg.h for this constant.

We don't have the mask, but we certainly have the shift.

GENMASK(ID_AA64MMFR1_HADBS_SHIFT + 3, ID_AA64MMFR1_HADBS_SHIFT) is a bit
of a mouthful though. Ideally, we'd have a helper for that.

	M.
-- 
Without deviation from the norm, progress is not possible.
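
[Editor's sketch] The helper Marc asks for above might look roughly like the
following. This is a userspace illustration, not the kernel's API: the
toy_* names are hypothetical, the 4-bit field width matches the mask width
in the GENMASK() expression above, and TOY_HADBS_SHIFT is assumed to equal
ID_AA64MMFR1_HADBS_SHIFT (taken to be 0 here, consistent with the bare 0xf
mask in the patch).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_FIELD_WIDTH	4	/* width of one ID register field */
#define TOY_HADBS_SHIFT	0	/* assumed ID_AA64MMFR1_HADBS_SHIFT */

/* Build the mask for the field that starts at bit `shift`. */
static inline uint64_t toy_field_mask(unsigned int shift)
{
	assert(shift + TOY_FIELD_WIDTH <= 64);
	return ((UINT64_C(1) << TOY_FIELD_WIDTH) - 1) << shift;
}

/* Extract the unsigned value of the field that starts at bit `shift`. */
static inline unsigned int toy_extract_field(uint64_t reg, unsigned int shift)
{
	return (unsigned int)((reg & toy_field_mask(shift)) >> shift);
}

/* Any non-zero field value is treated as "hardware AF updates supported". */
static inline bool toy_has_hw_af(uint64_t mmfr1)
{
	return toy_extract_field(mmfr1, TOY_HADBS_SHIFT) != 0;
}
```

Naming the shift keeps the magic constant out of the caller, which is the
point of Will's comment. The register value is passed in as a parameter here
so that the sketch does not depend on any system register access.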



* Re: [PATCH v10 2/3] arm64: mm: implement arch_faults_on_old_pte() on arm64
  2019-10-01 12:50   ` Will Deacon
@ 2019-10-01 13:32     ` Marc Zyngier
  2019-10-08  1:55       ` Justin He (Arm Technology China)
  0 siblings, 1 reply; 21+ messages in thread
From: Marc Zyngier @ 2019-10-01 13:32 UTC (permalink / raw)
  To: Will Deacon
  Cc: Jia He, Catalin Marinas, Mark Rutland, James Morse,
	Matthew Wilcox, Kirill A. Shutemov, linux-arm-kernel,
	linux-kernel, linux-mm, Punit Agrawal, Thomas Gleixner,
	Andrew Morton, hejianet, Kaly Xin

On Tue, 1 Oct 2019 13:50:32 +0100
Will Deacon <will@kernel.org> wrote:

> On Mon, Sep 30, 2019 at 09:57:39AM +0800, Jia He wrote:
> > On arm64 without hardware Access Flag, copying fromuser will fail because
> > the pte is old and cannot be marked young. So we always end up with zeroed
> > page after fork() + CoW for pfn mappings. we don't always have a
> > hardware-managed access flag on arm64.
> > 
> > Hence implement arch_faults_on_old_pte on arm64 to indicate that it might
> > cause page fault when accessing old pte.
> > 
> > Signed-off-by: Jia He <justin.he@arm.com>
> > Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> > ---
> >  arch/arm64/include/asm/pgtable.h | 14 ++++++++++++++
> >  1 file changed, 14 insertions(+)
> > 
> > diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> > index 7576df00eb50..e96fb82f62de 100644
> > --- a/arch/arm64/include/asm/pgtable.h
> > +++ b/arch/arm64/include/asm/pgtable.h
> > @@ -885,6 +885,20 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
> >  #define phys_to_ttbr(addr)	(addr)
> >  #endif
> >  
> > +/*
> > + * On arm64 without hardware Access Flag, copying from user will fail because
> > + * the pte is old and cannot be marked young. So we always end up with zeroed
> > + * page after fork() + CoW for pfn mappings. We don't always have a
> > + * hardware-managed access flag on arm64.
> > + */
> > +static inline bool arch_faults_on_old_pte(void)
> > +{
> > +	WARN_ON(preemptible());
> > +
> > +	return !cpu_has_hw_af();
> > +}  
> 
> Does this work correctly in a KVM guest? (i.e. is the MMFR sanitised in that
> case, despite not being the case on the host?)

Yup, all the 64bit MMFRs are trapped (HCR_EL2.TID3 is set for an
AArch64 guest), and we return the sanitised version.

But that's an interesting remark: we're now trading an extra fault on
CPUs that do not support HWAFDBS for a guaranteed trap for each and
every guest under the sun that will hit the COW path...

My gut feeling is that this is going to be pretty visible. Jia, do you
have any numbers for this kind of behaviour?

Thanks,

	M.
-- 
Without deviation from the norm, progress is not possible.



* RE: [PATCH v10 1/3] arm64: cpufeature: introduce helper cpu_has_hw_af()
  2019-10-01 13:18     ` Marc Zyngier
@ 2019-10-08  1:12       ` Justin He (Arm Technology China)
  2019-10-08 15:32         ` Suzuki K Poulose
  0 siblings, 1 reply; 21+ messages in thread
From: Justin He (Arm Technology China) @ 2019-10-08  1:12 UTC (permalink / raw)
  To: Marc Zyngier, Will Deacon
  Cc: Catalin Marinas, Mark Rutland, James Morse, Matthew Wilcox,
	Kirill A. Shutemov, linux-arm-kernel, linux-kernel, linux-mm,
	Punit Agrawal, Thomas Gleixner, Andrew Morton, hejianet,
	Kaly Xin (Arm Technology China)

Hi Will and Marc
Sorry for the late response, just came back from a vacation.

> -----Original Message-----
> From: Marc Zyngier <maz@kernel.org>
> Sent: October 1, 2019 21:19
> To: Will Deacon <will@kernel.org>
> Cc: Justin He (Arm Technology China) <Justin.He@arm.com>; Catalin
> Marinas <Catalin.Marinas@arm.com>; Mark Rutland
> <Mark.Rutland@arm.com>; James Morse <James.Morse@arm.com>;
> Matthew Wilcox <willy@infradead.org>; Kirill A. Shutemov
> <kirill.shutemov@linux.intel.com>; linux-arm-kernel@lists.infradead.org;
> linux-kernel@vger.kernel.org; linux-mm@kvack.org; Punit Agrawal
> <punitagrawal@gmail.com>; Thomas Gleixner <tglx@linutronix.de>;
> Andrew Morton <akpm@linux-foundation.org>; hejianet@gmail.com; Kaly
> Xin (Arm Technology China) <Kaly.Xin@arm.com>
> Subject: Re: [PATCH v10 1/3] arm64: cpufeature: introduce helper
> cpu_has_hw_af()
>
> On Tue, 1 Oct 2019 13:54:47 +0100
> Will Deacon <will@kernel.org> wrote:
>
> > On Mon, Sep 30, 2019 at 09:57:38AM +0800, Jia He wrote:
> > > We unconditionally set the HW_AFDBM capability and only enable it on
> > > CPUs which really have the feature. But sometimes we need to know
> > > whether this cpu has the capability of HW AF. So decouple AF from
> > > DBM by new helper cpu_has_hw_af().
> > >
> > > Signed-off-by: Jia He <justin.he@arm.com>
> > > Suggested-by: Suzuki Poulose <Suzuki.Poulose@arm.com>
> > > Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> > > ---
> > >  arch/arm64/include/asm/cpufeature.h | 10 ++++++++++
> > >  1 file changed, 10 insertions(+)
> > >
> > > diff --git a/arch/arm64/include/asm/cpufeature.h
> b/arch/arm64/include/asm/cpufeature.h
> > > index 9cde5d2e768f..949bc7c85030 100644
> > > --- a/arch/arm64/include/asm/cpufeature.h
> > > +++ b/arch/arm64/include/asm/cpufeature.h
> > > @@ -659,6 +659,16 @@ static inline u32
> id_aa64mmfr0_parange_to_phys_shift(int parange)
> > >   default: return CONFIG_ARM64_PA_BITS;
> > >   }
> > >  }
> > > +
> > > +/* Check whether hardware update of the Access flag is supported */
> > > +static inline bool cpu_has_hw_af(void)
> > > +{
> > > + if (IS_ENABLED(CONFIG_ARM64_HW_AFDBM))
> > > +         return read_cpuid(ID_AA64MMFR1_EL1) & 0xf;
> >
> > 0xf? I think we should have a mask in sysreg.h for this constant.
>
> We don't have the mask, but we certainly have the shift.
>
> GENMASK(ID_AA64MMFR1_HADBS_SHIFT + 3,
> ID_AA64MMFR1_HADBS_SHIFT) is a bit
> of a mouthful though. Ideally, we'd have a helper for that.
>
OK, I will implement the helper if one does not exist yet,
and then replace the 0xf with it.


--
Cheers,
Justin (Jia He)


>       M.
> --
> Without deviation from the norm, progress is not possible.


* RE: [PATCH v10 2/3] arm64: mm: implement arch_faults_on_old_pte() on arm64
  2019-10-01 13:32     ` Marc Zyngier
@ 2019-10-08  1:55       ` Justin He (Arm Technology China)
  2019-10-08  2:30         ` Justin He (Arm Technology China)
  2019-10-08  7:46         ` Marc Zyngier
  0 siblings, 2 replies; 21+ messages in thread
From: Justin He (Arm Technology China) @ 2019-10-08  1:55 UTC (permalink / raw)
  To: Marc Zyngier, Will Deacon
  Cc: Catalin Marinas, Mark Rutland, James Morse, Matthew Wilcox,
	Kirill A. Shutemov, linux-arm-kernel, linux-kernel, linux-mm,
	Punit Agrawal, Thomas Gleixner, Andrew Morton, hejianet,
	Kaly Xin (Arm Technology China),
	nd

Hi Will and Marc

> -----Original Message-----
> From: Marc Zyngier <maz@kernel.org>
> Sent: October 1, 2019 21:32
> To: Will Deacon <will@kernel.org>
> Cc: Justin He (Arm Technology China) <Justin.He@arm.com>; Catalin
> Marinas <Catalin.Marinas@arm.com>; Mark Rutland
> <Mark.Rutland@arm.com>; James Morse <James.Morse@arm.com>;
> Matthew Wilcox <willy@infradead.org>; Kirill A. Shutemov
> <kirill.shutemov@linux.intel.com>; linux-arm-kernel@lists.infradead.org;
> linux-kernel@vger.kernel.org; linux-mm@kvack.org; Punit Agrawal
> <punitagrawal@gmail.com>; Thomas Gleixner <tglx@linutronix.de>;
> Andrew Morton <akpm@linux-foundation.org>; hejianet@gmail.com; Kaly
> Xin (Arm Technology China) <Kaly.Xin@arm.com>
> Subject: Re: [PATCH v10 2/3] arm64: mm: implement
> arch_faults_on_old_pte() on arm64
> 
> On Tue, 1 Oct 2019 13:50:32 +0100
> Will Deacon <will@kernel.org> wrote:
> 
> > On Mon, Sep 30, 2019 at 09:57:39AM +0800, Jia He wrote:
> > > On arm64 without hardware Access Flag, copying fromuser will fail
> because
> > > the pte is old and cannot be marked young. So we always end up with
> zeroed
> > > page after fork() + CoW for pfn mappings. we don't always have a
> > > hardware-managed access flag on arm64.
> > >
> > > Hence implement arch_faults_on_old_pte on arm64 to indicate that it
> might
> > > cause page fault when accessing old pte.
> > >
> > > Signed-off-by: Jia He <justin.he@arm.com>
> > > Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> > > ---
> > >  arch/arm64/include/asm/pgtable.h | 14 ++++++++++++++
> > >  1 file changed, 14 insertions(+)
> > >
> > > diff --git a/arch/arm64/include/asm/pgtable.h
> b/arch/arm64/include/asm/pgtable.h
> > > index 7576df00eb50..e96fb82f62de 100644
> > > --- a/arch/arm64/include/asm/pgtable.h
> > > +++ b/arch/arm64/include/asm/pgtable.h
> > > @@ -885,6 +885,20 @@ static inline void update_mmu_cache(struct
> vm_area_struct *vma,
> > >  #define phys_to_ttbr(addr)	(addr)
> > >  #endif
> > >
> > > +/*
> > > + * On arm64 without hardware Access Flag, copying from user will fail
> because
> > > + * the pte is old and cannot be marked young. So we always end up
> with zeroed
> > > + * page after fork() + CoW for pfn mappings. We don't always have a
> > > + * hardware-managed access flag on arm64.
> > > + */
> > > +static inline bool arch_faults_on_old_pte(void)
> > > +{
> > > +	WARN_ON(preemptible());
> > > +
> > > +	return !cpu_has_hw_af();
> > > +}
> >
> > Does this work correctly in a KVM guest? (i.e. is the MMFR sanitised in
> that
> > case, despite not being the case on the host?)
> 
> Yup, all the 64bit MMFRs are trapped (HCR_EL2.TID3 is set for an
> AArch64 guest), and we return the sanitised version.
Thanks to Marc for the explanation. I verified the patch series on a KVM
guest (-M virt) with a simulated nvdimm device created by qemu. The host is
a ThunderX2 (aarch64).

> 
> But that's an interesting remark: we're now trading an extra fault on
> CPUs that do not support HWAFDBS for a guaranteed trap for each and
> every guest under the sun that will hit the COW path...
> 
> My gut feeling is that this is going to be pretty visible. Jia, do you
> have any numbers for this kind of behaviour?
This is not the common COW path; it is hit only for COW on PFN-mapped pages.
I added a g_counter just before pte_mkyoung in the force_mkyoung {} block when
testing vmmalloc_fork at [1].

The test case starts M forked processes and N pthreads. The default is
M=2, N=4. With the defaults, g_counter is about 241, i.e. the new code path is
hit about 241 times.
If I set M=20 and N=40 for TEST3, g_counter is about 1492.
  
[1] https://github.com/pmem/pmdk/tree/master/src/test/vmmalloc_fork


--
Cheers,
Justin (Jia He)




* RE: [PATCH v10 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared
  2019-10-01 12:54   ` Will Deacon
@ 2019-10-08  2:19     ` Justin He (Arm Technology China)
  2019-10-08 12:39       ` Will Deacon
  0 siblings, 1 reply; 21+ messages in thread
From: Justin He (Arm Technology China) @ 2019-10-08  2:19 UTC (permalink / raw)
  To: Will Deacon
  Cc: Catalin Marinas, Mark Rutland, James Morse, Marc Zyngier,
	Matthew Wilcox, Kirill A. Shutemov, linux-arm-kernel,
	linux-kernel, linux-mm, Punit Agrawal, Thomas Gleixner,
	Andrew Morton, hejianet, Kaly Xin (Arm Technology China),
	nd

Hi Will

> -----Original Message-----
> From: Will Deacon <will@kernel.org>
> Sent: 2019年10月1日 20:54
> To: Justin He (Arm Technology China) <Justin.He@arm.com>
> Cc: Catalin Marinas <Catalin.Marinas@arm.com>; Mark Rutland
> <Mark.Rutland@arm.com>; James Morse <James.Morse@arm.com>; Marc
> Zyngier <maz@kernel.org>; Matthew Wilcox <willy@infradead.org>; Kirill A.
> Shutemov <kirill.shutemov@linux.intel.com>; linux-arm-
> kernel@lists.infradead.org; linux-kernel@vger.kernel.org; linux-
> mm@kvack.org; Punit Agrawal <punitagrawal@gmail.com>; Thomas
> Gleixner <tglx@linutronix.de>; Andrew Morton <akpm@linux-
> foundation.org>; hejianet@gmail.com; Kaly Xin (Arm Technology China)
> <Kaly.Xin@arm.com>
> Subject: Re: [PATCH v10 3/3] mm: fix double page fault on arm64 if PTE_AF
> is cleared
> 
> On Mon, Sep 30, 2019 at 09:57:40AM +0800, Jia He wrote:
> > When we tested pmdk unit test [1] vmmalloc_fork TEST1 in arm64 guest,
> there
> > will be a double page fault in __copy_from_user_inatomic of
> cow_user_page.
> >
> > Below call trace is from arm64 do_page_fault for debugging purpose
> > [  110.016195] Call trace:
> > [  110.016826]  do_page_fault+0x5a4/0x690
> > [  110.017812]  do_mem_abort+0x50/0xb0
> > [  110.018726]  el1_da+0x20/0xc4
> > [  110.019492]  __arch_copy_from_user+0x180/0x280
> > [  110.020646]  do_wp_page+0xb0/0x860
> > [  110.021517]  __handle_mm_fault+0x994/0x1338
> > [  110.022606]  handle_mm_fault+0xe8/0x180
> > [  110.023584]  do_page_fault+0x240/0x690
> > [  110.024535]  do_mem_abort+0x50/0xb0
> > [  110.025423]  el0_da+0x20/0x24
> >
> > The pte info before __copy_from_user_inatomic is (PTE_AF is cleared):
> > [ffff9b007000] pgd=000000023d4f8003, pud=000000023da9b003,
> pmd=000000023d4b3003, pte=360000298607bd3
> >
> > As told by Catalin: "On arm64 without hardware Access Flag, copying
> from
> > user will fail because the pte is old and cannot be marked young. So we
> > always end up with zeroed page after fork() + CoW for pfn mappings. we
> > don't always have a hardware-managed access flag on arm64."
> >
> > This patch fix it by calling pte_mkyoung. Also, the parameter is
> > changed because vmf should be passed to cow_user_page()
> >
> > Add a WARN_ON_ONCE when __copy_from_user_inatomic() returns
> error
> > in case there can be some obscure use-case.(by Kirill)
> >
> > [1]
> https://github.com/pmem/pmdk/tree/master/src/test/vmmalloc_fork
> >
> > Signed-off-by: Jia He <justin.he@arm.com>
> > Reported-by: Yibo Cai <Yibo.Cai@arm.com>
> > Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> > Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > ---
> >  mm/memory.c | 99
> +++++++++++++++++++++++++++++++++++++++++++++--------
> >  1 file changed, 84 insertions(+), 15 deletions(-)
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index b1ca51a079f2..1f56b0118ef5 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -118,6 +118,13 @@ int randomize_va_space __read_mostly =
> >  					2;
> >  #endif
> >
> > +#ifndef arch_faults_on_old_pte
> > +static inline bool arch_faults_on_old_pte(void)
> > +{
> > +	return false;
> > +}
> > +#endif
> 
> Kirill has acked this, so I'm happy to take the patch as-is, however isn't
> it the case that /most/ architectures will want to return true for
> arch_faults_on_old_pte()? In which case, wouldn't it make more sense for
> that to be the default, and have x86 and arm64 provide an override? For
> example, aren't most architectures still going to hit the double fault
> scenario even with your patch applied?

No. With my patch series applied, only architectures that have no hardware-managed
access flag AND do not implement their own arch_faults_on_old_pte() will still hit
the double page fault.

A return value of true from arch_faults_on_old_pte() means "this arch has no hardware
mechanism for setting the access flag, so accessing an old pte might cause a page fault".
I don't want to change other architectures' default behavior here, so by default
arch_faults_on_old_pte() returns false.

Btw, so far I have only observed this double page fault on an arm64 guest (host is ThunderX2).
On an x86 guest (host is Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz) there is no such double
page fault. x86 has a similar hardware mechanism for setting the access flag.
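
For readers unfamiliar with the override mechanism being discussed, here is a minimal userspace sketch of the `#ifndef` pattern from the mm/memory.c hunk. It is not the kernel code. `DEMO_ARCH_OVERRIDE` and `must_mark_young_before_copy()` are hypothetical names made up for this illustration, and the self-referential `#define` is assumed to be how an arch header announces its own version, following the usual kernel idiom.

```c
#include <stdbool.h>

/*
 * Stand-in for an arch header such as asm/pgtable.h. DEMO_ARCH_OVERRIDE is
 * a made-up switch for this sketch: build with -DDEMO_ARCH_OVERRIDE to
 * simulate an architecture that supplies its own helper.
 */
#ifdef DEMO_ARCH_OVERRIDE
static inline bool arch_faults_on_old_pte(void)
{
	/* An arch with no hardware-managed Access Flag answers "yes". */
	return true;
}
/* Defining the name as a macro is what makes the #ifndef below skip. */
#define arch_faults_on_old_pte arch_faults_on_old_pte
#endif

/* Generic fallback, as in the quoted mm/memory.c hunk: default is false. */
#ifndef arch_faults_on_old_pte
static inline bool arch_faults_on_old_pte(void)
{
	return false;
}
#endif

/* Hypothetical caller: must the old pte be marked young before copying? */
static inline bool must_mark_young_before_copy(bool pte_is_young)
{
	return arch_faults_on_old_pte() && !pte_is_young;
}
```

Built without `-DDEMO_ARCH_OVERRIDE`, the fallback is selected and the caller never asks for the pte to be marked young, which is the "don't change other architectures" behavior described above.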


--
Cheers,
Justin (Jia He)




* RE: [PATCH v10 2/3] arm64: mm: implement arch_faults_on_old_pte() on arm64
  2019-10-08  1:55       ` Justin He (Arm Technology China)
@ 2019-10-08  2:30         ` Justin He (Arm Technology China)
  2019-10-08  7:46         ` Marc Zyngier
  1 sibling, 0 replies; 21+ messages in thread
From: Justin He (Arm Technology China) @ 2019-10-08  2:30 UTC (permalink / raw)
  To: Marc Zyngier, Will Deacon
  Cc: Catalin Marinas, Mark Rutland, James Morse, Matthew Wilcox,
	Kirill A. Shutemov, linux-arm-kernel, linux-kernel, linux-mm,
	Punit Agrawal, Thomas Gleixner, Andrew Morton, hejianet,
	Kaly Xin (Arm Technology China),
	nd



> -----Original Message-----
> From: Justin He (Arm Technology China)
> Sent: 2019年10月8日 9:55
> To: Marc Zyngier <maz@kernel.org>; Will Deacon <will@kernel.org>
> Cc: Catalin Marinas <Catalin.Marinas@arm.com>; Mark Rutland
> <Mark.Rutland@arm.com>; James Morse <James.Morse@arm.com>;
> Matthew Wilcox <willy@infradead.org>; Kirill A. Shutemov
> <kirill.shutemov@linux.intel.com>; linux-arm-kernel@lists.infradead.org;
> linux-kernel@vger.kernel.org; linux-mm@kvack.org; Punit Agrawal
> <punitagrawal@gmail.com>; Thomas Gleixner <tglx@linutronix.de>;
> Andrew Morton <akpm@linux-foundation.org>; hejianet@gmail.com; Kaly
> Xin (Arm Technology China) <Kaly.Xin@arm.com>; nd <nd@arm.com>
> Subject: RE: [PATCH v10 2/3] arm64: mm: implement
> arch_faults_on_old_pte() on arm64
> 
> Hi Will and Marc
> 
> > -----Original Message-----
> > From: Marc Zyngier <maz@kernel.org>
> > Sent: 2019年10月1日 21:32
> > To: Will Deacon <will@kernel.org>
> > Cc: Justin He (Arm Technology China) <Justin.He@arm.com>; Catalin
> > Marinas <Catalin.Marinas@arm.com>; Mark Rutland
> > <Mark.Rutland@arm.com>; James Morse <James.Morse@arm.com>;
> > Matthew Wilcox <willy@infradead.org>; Kirill A. Shutemov
> > <kirill.shutemov@linux.intel.com>; linux-arm-kernel@lists.infradead.org;
> > linux-kernel@vger.kernel.org; linux-mm@kvack.org; Punit Agrawal
> > <punitagrawal@gmail.com>; Thomas Gleixner <tglx@linutronix.de>;
> > Andrew Morton <akpm@linux-foundation.org>; hejianet@gmail.com;
> Kaly
> > Xin (Arm Technology China) <Kaly.Xin@arm.com>
> > Subject: Re: [PATCH v10 2/3] arm64: mm: implement
> > arch_faults_on_old_pte() on arm64
> >
> > On Tue, 1 Oct 2019 13:50:32 +0100
> > Will Deacon <will@kernel.org> wrote:
> >
> > > On Mon, Sep 30, 2019 at 09:57:39AM +0800, Jia He wrote:
> > > > On arm64 without hardware Access Flag, copying from user will fail
> > because
> > > > the pte is old and cannot be marked young. So we always end up with
> > zeroed
> > > > page after fork() + CoW for pfn mappings. we don't always have a
> > > > hardware-managed access flag on arm64.
> > > >
> > > > Hence implement arch_faults_on_old_pte on arm64 to indicate that
> it
> > might
> > > > cause page fault when accessing old pte.
> > > >
> > > > Signed-off-by: Jia He <justin.he@arm.com>
> > > > Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> > > > ---
> > > >  arch/arm64/include/asm/pgtable.h | 14 ++++++++++++++
> > > >  1 file changed, 14 insertions(+)
> > > >
> > > > diff --git a/arch/arm64/include/asm/pgtable.h
> > b/arch/arm64/include/asm/pgtable.h
> > > > index 7576df00eb50..e96fb82f62de 100644
> > > > --- a/arch/arm64/include/asm/pgtable.h
> > > > +++ b/arch/arm64/include/asm/pgtable.h
> > > > @@ -885,6 +885,20 @@ static inline void update_mmu_cache(struct
> > vm_area_struct *vma,
> > > >  #define phys_to_ttbr(addr)	(addr)
> > > >  #endif
> > > >
> > > > +/*
> > > > + * On arm64 without hardware Access Flag, copying from user will
> fail
> > because
> > > > + * the pte is old and cannot be marked young. So we always end up
> > with zeroed
> > > > + * page after fork() + CoW for pfn mappings. We don't always have a
> > > > + * hardware-managed access flag on arm64.
> > > > + */
> > > > +static inline bool arch_faults_on_old_pte(void)
> > > > +{
> > > > +	WARN_ON(preemptible());
> > > > +
> > > > +	return !cpu_has_hw_af();
> > > > +}
> > >
> > > Does this work correctly in a KVM guest? (i.e. is the MMFR sanitised in
> > that
> > > case, despite not being the case on the host?)
> >
> > Yup, all the 64bit MMFRs are trapped (HCR_EL2.TID3 is set for an
> > AArch64 guest), and we return the sanitised version.
> Thanks for Marc's explanation. I verified the patch series on a kvm guest (-
> M virt)
> with simulated nvdimm device created by qemu. The host is ThunderX2
> aarch64.
> 
> >
> > But that's an interesting remark: we're now trading an extra fault on
> > CPUs that do not support HWAFDBS for a guaranteed trap for each and
> > every guest under the sun that will hit the COW path...
> >
> > My gut feeling is that this is going to be pretty visible. Jia, do you
> > have any numbers for this kind of behaviour?
> It is not a common COW path, but a COW for PFN mapping pages only.
> I add a g_counter before pte_mkyoung in force_mkyoung{} when testing
> vmmalloc_fork at [1].
> 
> In this test case, it will start M fork processes and N pthreads. The default is
> M=2,N=4. the g_counter is about 241, that is it will hit my patch series for
> 241
> times.
> If I set M=20 and N=40 for TEST3, the g_counter is about 1492.

The time overhead of the vmmalloc_fork test is:
real    0m5.411s
user    0m4.206s
sys     0m2.699s

> 
> [1] https://github.com/pmem/pmdk/tree/master/src/test/vmmalloc_fork
> 
> 
> --
> Cheers,
> Justin (Jia He)
> 



* Re: [PATCH v10 2/3] arm64: mm: implement arch_faults_on_old_pte() on arm64
  2019-10-08  1:55       ` Justin He (Arm Technology China)
  2019-10-08  2:30         ` Justin He (Arm Technology China)
@ 2019-10-08  7:46         ` Marc Zyngier
  1 sibling, 0 replies; 21+ messages in thread
From: Marc Zyngier @ 2019-10-08  7:46 UTC (permalink / raw)
  To: Justin He (Arm Technology China)
  Cc: Will Deacon, Catalin Marinas, Mark Rutland, James Morse,
	Matthew Wilcox, Kirill A. Shutemov, linux-arm-kernel,
	linux-kernel, linux-mm, Punit Agrawal, Thomas Gleixner,
	Andrew Morton, hejianet, Kaly Xin (Arm Technology China),
	nd

On Tue, 8 Oct 2019 01:55:04 +0000
"Justin He (Arm Technology China)" <Justin.He@arm.com> wrote:

> Hi Will and Marc
> 
> > -----Original Message-----
> > From: Marc Zyngier <maz@kernel.org>
> > Sent: 2019年10月1日 21:32
> > To: Will Deacon <will@kernel.org>
> > Cc: Justin He (Arm Technology China) <Justin.He@arm.com>; Catalin
> > Marinas <Catalin.Marinas@arm.com>; Mark Rutland
> > <Mark.Rutland@arm.com>; James Morse <James.Morse@arm.com>;
> > Matthew Wilcox <willy@infradead.org>; Kirill A. Shutemov
> > <kirill.shutemov@linux.intel.com>; linux-arm-kernel@lists.infradead.org;
> > linux-kernel@vger.kernel.org; linux-mm@kvack.org; Punit Agrawal
> > <punitagrawal@gmail.com>; Thomas Gleixner <tglx@linutronix.de>;
> > Andrew Morton <akpm@linux-foundation.org>; hejianet@gmail.com; Kaly
> > Xin (Arm Technology China) <Kaly.Xin@arm.com>
> > Subject: Re: [PATCH v10 2/3] arm64: mm: implement
> > arch_faults_on_old_pte() on arm64
> > 
> > On Tue, 1 Oct 2019 13:50:32 +0100
> > Will Deacon <will@kernel.org> wrote:
> >   
> > > On Mon, Sep 30, 2019 at 09:57:39AM +0800, Jia He wrote:  
> > > > On arm64 without hardware Access Flag, copying from user will fail
> > because  
> > > > the pte is old and cannot be marked young. So we always end up with  
> > zeroed  
> > > > page after fork() + CoW for pfn mappings. we don't always have a
> > > > hardware-managed access flag on arm64.
> > > >
> > > > Hence implement arch_faults_on_old_pte on arm64 to indicate that it  
> > might  
> > > > cause page fault when accessing old pte.
> > > >
> > > > Signed-off-by: Jia He <justin.he@arm.com>
> > > > Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> > > > ---
> > > >  arch/arm64/include/asm/pgtable.h | 14 ++++++++++++++
> > > >  1 file changed, 14 insertions(+)
> > > >
> > > > diff --git a/arch/arm64/include/asm/pgtable.h  
> > b/arch/arm64/include/asm/pgtable.h  
> > > > index 7576df00eb50..e96fb82f62de 100644
> > > > --- a/arch/arm64/include/asm/pgtable.h
> > > > +++ b/arch/arm64/include/asm/pgtable.h
> > > > @@ -885,6 +885,20 @@ static inline void update_mmu_cache(struct  
> > vm_area_struct *vma,  
> > > >  #define phys_to_ttbr(addr)	(addr)
> > > >  #endif
> > > >
> > > > +/*
> > > > + * On arm64 without hardware Access Flag, copying from user will fail  
> > because  
> > > > + * the pte is old and cannot be marked young. So we always end up  
> > with zeroed  
> > > > + * page after fork() + CoW for pfn mappings. We don't always have a
> > > > + * hardware-managed access flag on arm64.
> > > > + */
> > > > +static inline bool arch_faults_on_old_pte(void)
> > > > +{
> > > > +	WARN_ON(preemptible());
> > > > +
> > > > +	return !cpu_has_hw_af();
> > > > +}  
> > >
> > > Does this work correctly in a KVM guest? (i.e. is the MMFR sanitised in  
> > that  
> > > case, despite not being the case on the host?)  
> > 
> > Yup, all the 64bit MMFRs are trapped (HCR_EL2.TID3 is set for an
> > AArch64 guest), and we return the sanitised version.  
> Thanks for Marc's explanation. I verified the patch series on a kvm guest (-M virt)
> with simulated nvdimm device created by qemu. The host is ThunderX2 aarch64.
> 
> > 
> > But that's an interesting remark: we're now trading an extra fault on
> > CPUs that do not support HWAFDBS for a guaranteed trap for each and
> > every guest under the sun that will hit the COW path...
> > 
> > My gut feeling is that this is going to be pretty visible. Jia, do you
> > have any numbers for this kind of behaviour?  
> It is not a common COW path, but a COW for PFN mapping pages only.
> I add a g_counter before pte_mkyoung in force_mkyoung{} when testing 
> vmmalloc_fork at [1].
> 
> In this test case, it will start M fork processes and N pthreads. The default is
> M=2,N=4. the g_counter is about 241, that is it will hit my patch series for 241
> times.
> If I set M=20 and N=40 for TEST3, the g_counter is about 1492.

I must confess I'm not so much interested in random microbenchmarks,
but more in actual applications that could potentially be impacted by
this. The numbers you're quoting here seem pretty small, which would
indicate a low overhead, but that's not indicative of what would happen
in real life.

I guess that we can leave it at that for now, and turn it into a CPU
feature (with the associated static key) if this shows anywhere.
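
As a rough illustration of the trade-off, the following userspace sketch contrasts re-reading an ID register on every call (each read would trap in a guest) with probing once and caching the answer. It is only a plain cached flag, not a kernel static key or cpufeature entry, and every `demo_*` name is hypothetical. It also ignores concurrency and the possibility that CPUs in one system differ.

```c
#include <stdbool.h>

/* Counts simulated ID register reads; in a guest each one would trap. */
static unsigned int demo_reg_reads;

/* Hypothetical stand-in for read_cpuid(ID_AA64MMFR1_EL1). */
static unsigned long long demo_read_mmfr1(void)
{
	demo_reg_reads++;
	return 0x2;	/* pretend the low 4-bit field reports hardware AF */
}

/* Uncached variant: one register read (one trap) per call. */
static bool demo_has_hw_af_uncached(void)
{
	return (demo_read_mmfr1() & 0xf) != 0;
}

/* Cached variant: read the register once, then reuse the stored answer. */
static bool demo_has_hw_af_cached(void)
{
	static bool probed, has_af;

	if (!probed) {
		has_af = (demo_read_mmfr1() & 0xf) != 0;
		probed = true;
	}
	return has_af;
}
```

Calling the cached variant repeatedly leaves `demo_reg_reads` at 1, whereas the uncached variant increments it on every call.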

Thanks,

	M.


>   
> [1] https://github.com/pmem/pmdk/tree/master/src/test/vmmalloc_fork
> 
> 
> --
> Cheers,
> Justin (Jia He)
> 
-- 
Jazz is not dead. It just smells funny...



* Re: [PATCH v10 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared
  2019-10-08  2:19     ` Justin He (Arm Technology China)
@ 2019-10-08 12:39       ` Will Deacon
  2019-10-08 12:58         ` Justin He (Arm Technology China)
  2019-10-16 23:21         ` Palmer Dabbelt
  0 siblings, 2 replies; 21+ messages in thread
From: Will Deacon @ 2019-10-08 12:39 UTC (permalink / raw)
  To: Justin He (Arm Technology China)
  Cc: Catalin Marinas, Mark Rutland, James Morse, Marc Zyngier,
	Matthew Wilcox, Kirill A. Shutemov, linux-arm-kernel,
	linux-kernel, linux-mm, Punit Agrawal, Thomas Gleixner,
	Andrew Morton, hejianet, Kaly Xin (Arm Technology China),
	nd

On Tue, Oct 08, 2019 at 02:19:05AM +0000, Justin He (Arm Technology China) wrote:
> > -----Original Message-----
> > From: Will Deacon <will@kernel.org>
> > Sent: 2019年10月1日 20:54
> > To: Justin He (Arm Technology China) <Justin.He@arm.com>
> > Cc: Catalin Marinas <Catalin.Marinas@arm.com>; Mark Rutland
> > <Mark.Rutland@arm.com>; James Morse <James.Morse@arm.com>; Marc
> > Zyngier <maz@kernel.org>; Matthew Wilcox <willy@infradead.org>; Kirill A.
> > Shutemov <kirill.shutemov@linux.intel.com>; linux-arm-
> > kernel@lists.infradead.org; linux-kernel@vger.kernel.org; linux-
> > mm@kvack.org; Punit Agrawal <punitagrawal@gmail.com>; Thomas
> > Gleixner <tglx@linutronix.de>; Andrew Morton <akpm@linux-
> > foundation.org>; hejianet@gmail.com; Kaly Xin (Arm Technology China)
> > <Kaly.Xin@arm.com>
> > Subject: Re: [PATCH v10 3/3] mm: fix double page fault on arm64 if PTE_AF
> > is cleared
> > 
> > On Mon, Sep 30, 2019 at 09:57:40AM +0800, Jia He wrote:
> > > diff --git a/mm/memory.c b/mm/memory.c
> > > index b1ca51a079f2..1f56b0118ef5 100644
> > > --- a/mm/memory.c
> > > +++ b/mm/memory.c
> > > @@ -118,6 +118,13 @@ int randomize_va_space __read_mostly =
> > >  					2;
> > >  #endif
> > >
> > > +#ifndef arch_faults_on_old_pte
> > > +static inline bool arch_faults_on_old_pte(void)
> > > +{
> > > +	return false;
> > > +}
> > > +#endif
> > 
> > Kirill has acked this, so I'm happy to take the patch as-is, however isn't
> > it the case that /most/ architectures will want to return true for
> > arch_faults_on_old_pte()? In which case, wouldn't it make more sense for
> > that to be the default, and have x86 and arm64 provide an override? For
> > example, aren't most architectures still going to hit the double fault
> > scenario even with your patch applied?
> 
> No, after applying my patch series, only those architectures which don't provide
> setting access flag by hardware AND don't implement their arch_faults_on_old_pte
> will hit the double page fault.
> 
> The meaning of true for arch_faults_on_old_pte() is "this arch doesn't have the hardware
> setting access flag way, it might cause page fault on an old pte"
> I don't want to change other architectures' default behavior here. So by default, 
> arch_faults_on_old_pte() is false.

...and my complaint is that this is the majority of supported architectures,
so you're fixing something for arm64 which also affects arm, powerpc,
alpha, mips, riscv, ...

Chances are, they won't even realise they need to implement
arch_faults_on_old_pte() until somebody runs into the double fault and
wastes lots of time debugging it before they spot your patch.

> Btw, currently I only observed this double pagefault on arm64's guest
> (host is ThunderX2).  On X86 guest (host is Intel(R) Core(TM) i7-4790 CPU
> @ 3.60GHz ), there is no such double pagefault. It has the similar setting
> access flag way by hardware.

Right, and that's why I'm not concerned about x86 for this problem.

Will



* RE: [PATCH v10 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared
  2019-10-08 12:39       ` Will Deacon
@ 2019-10-08 12:58         ` Justin He (Arm Technology China)
  2019-10-08 14:32           ` Kirill A. Shutemov
  2019-10-16 23:21         ` Palmer Dabbelt
  1 sibling, 1 reply; 21+ messages in thread
From: Justin He (Arm Technology China) @ 2019-10-08 12:58 UTC (permalink / raw)
  To: Will Deacon, Kirill A. Shutemov
  Cc: Catalin Marinas, Mark Rutland, James Morse, Marc Zyngier,
	Matthew Wilcox, linux-arm-kernel, linux-kernel, linux-mm,
	Punit Agrawal, Thomas Gleixner, Andrew Morton, hejianet,
	Kaly Xin (Arm Technology China),
	nd

Hi Will

> -----Original Message-----
> From: Will Deacon <will@kernel.org>
> Sent: 2019年10月8日 20:40
> To: Justin He (Arm Technology China) <Justin.He@arm.com>
> Cc: Catalin Marinas <Catalin.Marinas@arm.com>; Mark Rutland
> <Mark.Rutland@arm.com>; James Morse <James.Morse@arm.com>; Marc
> Zyngier <maz@kernel.org>; Matthew Wilcox <willy@infradead.org>; Kirill A.
> Shutemov <kirill.shutemov@linux.intel.com>; linux-arm-
> kernel@lists.infradead.org; linux-kernel@vger.kernel.org; linux-
> mm@kvack.org; Punit Agrawal <punitagrawal@gmail.com>; Thomas
> Gleixner <tglx@linutronix.de>; Andrew Morton <akpm@linux-
> foundation.org>; hejianet@gmail.com; Kaly Xin (Arm Technology China)
> <Kaly.Xin@arm.com>; nd <nd@arm.com>
> Subject: Re: [PATCH v10 3/3] mm: fix double page fault on arm64 if PTE_AF
> is cleared
> 
> On Tue, Oct 08, 2019 at 02:19:05AM +0000, Justin He (Arm Technology
> China) wrote:
> > > -----Original Message-----
> > > From: Will Deacon <will@kernel.org>
> > > Sent: 2019年10月1日 20:54
> > > To: Justin He (Arm Technology China) <Justin.He@arm.com>
> > > Cc: Catalin Marinas <Catalin.Marinas@arm.com>; Mark Rutland
> > > <Mark.Rutland@arm.com>; James Morse <James.Morse@arm.com>;
> Marc
> > > Zyngier <maz@kernel.org>; Matthew Wilcox <willy@infradead.org>;
> Kirill A.
> > > Shutemov <kirill.shutemov@linux.intel.com>; linux-arm-
> > > kernel@lists.infradead.org; linux-kernel@vger.kernel.org; linux-
> > > mm@kvack.org; Punit Agrawal <punitagrawal@gmail.com>; Thomas
> > > Gleixner <tglx@linutronix.de>; Andrew Morton <akpm@linux-
> > > foundation.org>; hejianet@gmail.com; Kaly Xin (Arm Technology China)
> > > <Kaly.Xin@arm.com>
> > > Subject: Re: [PATCH v10 3/3] mm: fix double page fault on arm64 if
> PTE_AF
> > > is cleared
> > >
> > > On Mon, Sep 30, 2019 at 09:57:40AM +0800, Jia He wrote:
> > > > diff --git a/mm/memory.c b/mm/memory.c
> > > > index b1ca51a079f2..1f56b0118ef5 100644
> > > > --- a/mm/memory.c
> > > > +++ b/mm/memory.c
> > > > @@ -118,6 +118,13 @@ int randomize_va_space __read_mostly =
> > > >  					2;
> > > >  #endif
> > > >
> > > > +#ifndef arch_faults_on_old_pte
> > > > +static inline bool arch_faults_on_old_pte(void)
> > > > +{
> > > > +	return false;
> > > > +}
> > > > +#endif
> > >
> > > Kirill has acked this, so I'm happy to take the patch as-is, however isn't
> > > it the case that /most/ architectures will want to return true for
> > > arch_faults_on_old_pte()? In which case, wouldn't it make more sense
> for
> > > that to be the default, and have x86 and arm64 provide an override?
> For
> > > example, aren't most architectures still going to hit the double fault
> > > scenario even with your patch applied?
> >
> > No, after applying my patch series, only those architectures which don't
> provide
> > setting access flag by hardware AND don't implement their
> arch_faults_on_old_pte
> > will hit the double page fault.
> >
> > The meaning of true for arch_faults_on_old_pte() is "this arch doesn't
> have the hardware
> > setting access flag way, it might cause page fault on an old pte"
> > I don't want to change other architectures' default behavior here. So by
> default,
> > arch_faults_on_old_pte() is false.
> 
> ...and my complaint is that this is the majority of supported architectures,
> so you're fixing something for arm64 which also affects arm, powerpc,
> alpha, mips, riscv, ...

So, IIUC, you are suggesting that:
1. by default, arch_faults_on_old_pte() returns true
2. on x86, arch_faults_on_old_pte() is overridden to return false
3. on arm64, it stays as in my patch set
4. other architectures decide their own behavior (but by default, the pte will
be marked young)

I am OK with that if there are no objections from others.
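
The four points above can be summarized in a small userspace sketch. In the kernel the choice would be made at compile time through per-arch headers; the runtime `switch` here, the `demo_arch` enum and `demo_faults_on_old_pte()` are hypothetical and only meant to show the proposed truth table.

```c
#include <stdbool.h>

/* Hypothetical architecture tags, only for this sketch. */
enum demo_arch { DEMO_ARCH_X86, DEMO_ARCH_ARM64, DEMO_ARCH_OTHER };

/*
 * Proposed policy:
 *  - x86 overrides the helper to return false,
 *  - arm64 keeps !cpu_has_hw_af() (passed in here as has_hw_af),
 *  - every other arch falls through to the new default, true.
 */
static bool demo_faults_on_old_pte(enum demo_arch arch, bool has_hw_af)
{
	switch (arch) {
	case DEMO_ARCH_X86:
		return false;
	case DEMO_ARCH_ARM64:
		return !has_hw_af;
	default:
		return true;
	}
}
```

With this table an architecture that does nothing gets the conservative answer, so the pte is marked young before the copy.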

@Kirill A. Shutemov Do you have any comments? Thanks
> 
> Chances are, they won't even realise they need to implement
> arch_faults_on_old_pte() until somebody runs into the double fault and
> wastes lots of time debugging it before they spot your patch.

As to this point, I added a WARN_ON in patch 03 to speed up the debugging
process.

--
Cheers,
Justin (Jia He)



> 
> > Btw, currently I only observed this double pagefault on arm64's guest
> > (host is ThunderX2).  On X86 guest (host is Intel(R) Core(TM) i7-4790 CPU
> > @ 3.60GHz ), there is no such double pagefault. It has the similar setting
> > access flag way by hardware.
> 
> Right, and that's why I'm not concerned about x86 for this problem.
> 
> Will


* Re: [PATCH v10 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared
  2019-10-08 12:58         ` Justin He (Arm Technology China)
@ 2019-10-08 14:32           ` Kirill A. Shutemov
  0 siblings, 0 replies; 21+ messages in thread
From: Kirill A. Shutemov @ 2019-10-08 14:32 UTC (permalink / raw)
  To: Justin He (Arm Technology China)
  Cc: Will Deacon, Kirill A. Shutemov, Catalin Marinas, Mark Rutland,
	James Morse, Marc Zyngier, Matthew Wilcox, linux-arm-kernel,
	linux-kernel, linux-mm, Punit Agrawal, Thomas Gleixner,
	Andrew Morton, hejianet, Kaly Xin (Arm Technology China),
	nd

On Tue, Oct 08, 2019 at 12:58:57PM +0000, Justin He (Arm Technology China) wrote:
> Hi Will
> 
> > -----Original Message-----
> > From: Will Deacon <will@kernel.org>
> > Sent: 2019年10月8日 20:40
> > To: Justin He (Arm Technology China) <Justin.He@arm.com>
> > Cc: Catalin Marinas <Catalin.Marinas@arm.com>; Mark Rutland
> > <Mark.Rutland@arm.com>; James Morse <James.Morse@arm.com>; Marc
> > Zyngier <maz@kernel.org>; Matthew Wilcox <willy@infradead.org>; Kirill A.
> > Shutemov <kirill.shutemov@linux.intel.com>; linux-arm-
> > kernel@lists.infradead.org; linux-kernel@vger.kernel.org; linux-
> > mm@kvack.org; Punit Agrawal <punitagrawal@gmail.com>; Thomas
> > Gleixner <tglx@linutronix.de>; Andrew Morton <akpm@linux-
> > foundation.org>; hejianet@gmail.com; Kaly Xin (Arm Technology China)
> > <Kaly.Xin@arm.com>; nd <nd@arm.com>
> > Subject: Re: [PATCH v10 3/3] mm: fix double page fault on arm64 if PTE_AF
> > is cleared
> > 
> > On Tue, Oct 08, 2019 at 02:19:05AM +0000, Justin He (Arm Technology
> > China) wrote:
> > > > -----Original Message-----
> > > > From: Will Deacon <will@kernel.org>
> > > > Sent: 2019年10月1日 20:54
> > > > To: Justin He (Arm Technology China) <Justin.He@arm.com>
> > > > Cc: Catalin Marinas <Catalin.Marinas@arm.com>; Mark Rutland
> > > > <Mark.Rutland@arm.com>; James Morse <James.Morse@arm.com>;
> > Marc
> > > > Zyngier <maz@kernel.org>; Matthew Wilcox <willy@infradead.org>;
> > Kirill A.
> > > > Shutemov <kirill.shutemov@linux.intel.com>; linux-arm-
> > > > kernel@lists.infradead.org; linux-kernel@vger.kernel.org; linux-
> > > > mm@kvack.org; Punit Agrawal <punitagrawal@gmail.com>; Thomas
> > > > Gleixner <tglx@linutronix.de>; Andrew Morton <akpm@linux-
> > > > foundation.org>; hejianet@gmail.com; Kaly Xin (Arm Technology China)
> > > > <Kaly.Xin@arm.com>
> > > > Subject: Re: [PATCH v10 3/3] mm: fix double page fault on arm64 if
> > PTE_AF
> > > > is cleared
> > > >
> > > > On Mon, Sep 30, 2019 at 09:57:40AM +0800, Jia He wrote:
> > > > > diff --git a/mm/memory.c b/mm/memory.c
> > > > > index b1ca51a079f2..1f56b0118ef5 100644
> > > > > --- a/mm/memory.c
> > > > > +++ b/mm/memory.c
> > > > > @@ -118,6 +118,13 @@ int randomize_va_space __read_mostly =
> > > > >  					2;
> > > > >  #endif
> > > > >
> > > > > +#ifndef arch_faults_on_old_pte
> > > > > +static inline bool arch_faults_on_old_pte(void)
> > > > > +{
> > > > > +	return false;
> > > > > +}
> > > > > +#endif
> > > >
> > > > Kirill has acked this, so I'm happy to take the patch as-is, however isn't
> > > > it the case that /most/ architectures will want to return true for
> > > > arch_faults_on_old_pte()? In which case, wouldn't it make more sense
> > for
> > > > that to be the default, and have x86 and arm64 provide an override?
> > For
> > > > example, aren't most architectures still going to hit the double fault
> > > > scenario even with your patch applied?
> > >
> > > No, after applying my patch series, only those architectures which don't
> > provide
> > > setting access flag by hardware AND don't implement their
> > arch_faults_on_old_pte
> > > will hit the double page fault.
> > >
> > > The meaning of true for arch_faults_on_old_pte() is "this arch doesn't
> > have the hardware
> > > setting access flag way, it might cause page fault on an old pte"
> > > I don't want to change other architectures' default behavior here. So by
> > default,
> > > arch_faults_on_old_pte() is false.
> > 
> > ...and my complaint is that this is the majority of supported architectures,
> > so you're fixing something for arm64 which also affects arm, powerpc,
> > alpha, mips, riscv, ...
> 
> So, IIUC, you suggested that:
> 1. by default, arch_faults_on_old_pte() return true
> 2. on X86, let arch_faults_on_old_pte() be overrided as returning false
> 3. on arm64, let it be as-is my patch set.
> 4. let other architectures decide the behavior. (But by default, it will set
> pte_young)
> 
> I am ok with that if no objections from others.
> 
> @Kirill A. Shutemov Do you have any comments? Thanks

Sounds sane to me.

-- 
 Kirill A. Shutemov



* Re: [PATCH v10 1/3] arm64: cpufeature: introduce helper cpu_has_hw_af()
  2019-10-08  1:12       ` Justin He (Arm Technology China)
@ 2019-10-08 15:32         ` Suzuki K Poulose
  2019-10-09  6:29           ` Jia He
  0 siblings, 1 reply; 21+ messages in thread
From: Suzuki K Poulose @ 2019-10-08 15:32 UTC (permalink / raw)
  To: Justin He (Arm Technology China), Marc Zyngier, Will Deacon
  Cc: Mark Rutland, Kaly Xin (Arm Technology China),
	Catalin Marinas, linux-kernel, Matthew Wilcox, linux-mm,
	James Morse, linux-arm-kernel, Punit Agrawal, hejianet,
	Thomas Gleixner, Andrew Morton, Kirill A. Shutemov



On 08/10/2019 02:12, Justin He (Arm Technology China) wrote:
> Hi Will and Marc
> Sorry for the late response, just came back from a vacation.
> 
>> -----Original Message-----
>> From: Marc Zyngier <maz@kernel.org>
>> Sent: 2019年10月1日 21:19
>> To: Will Deacon <will@kernel.org>
>> Cc: Justin He (Arm Technology China) <Justin.He@arm.com>; Catalin
>> Marinas <Catalin.Marinas@arm.com>; Mark Rutland
>> <Mark.Rutland@arm.com>; James Morse <James.Morse@arm.com>;
>> Matthew Wilcox <willy@infradead.org>; Kirill A. Shutemov
>> <kirill.shutemov@linux.intel.com>; linux-arm-kernel@lists.infradead.org;
>> linux-kernel@vger.kernel.org; linux-mm@kvack.org; Punit Agrawal
>> <punitagrawal@gmail.com>; Thomas Gleixner <tglx@linutronix.de>;
>> Andrew Morton <akpm@linux-foundation.org>; hejianet@gmail.com; Kaly
>> Xin (Arm Technology China) <Kaly.Xin@arm.com>
>> Subject: Re: [PATCH v10 1/3] arm64: cpufeature: introduce helper
>> cpu_has_hw_af()
>>
>> On Tue, 1 Oct 2019 13:54:47 +0100
>> Will Deacon <will@kernel.org> wrote:
>>
>>> On Mon, Sep 30, 2019 at 09:57:38AM +0800, Jia He wrote:
>>>> We unconditionally set the HW_AFDBM capability and only enable it on
>>>> CPUs which really have the feature. But sometimes we need to know
>>>> whether this cpu has the capability of HW AF. So decouple AF from
>>>> DBM by new helper cpu_has_hw_af().
>>>>
>>>> Signed-off-by: Jia He <justin.he@arm.com>
>>>> Suggested-by: Suzuki Poulose <Suzuki.Poulose@arm.com>
>>>> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
>>>> ---
>>>>   arch/arm64/include/asm/cpufeature.h | 10 ++++++++++
>>>>   1 file changed, 10 insertions(+)
>>>>
>>>> diff --git a/arch/arm64/include/asm/cpufeature.h
>> b/arch/arm64/include/asm/cpufeature.h
>>>> index 9cde5d2e768f..949bc7c85030 100644
>>>> --- a/arch/arm64/include/asm/cpufeature.h
>>>> +++ b/arch/arm64/include/asm/cpufeature.h
>>>> @@ -659,6 +659,16 @@ static inline u32
>> id_aa64mmfr0_parange_to_phys_shift(int parange)
>>>>    default: return CONFIG_ARM64_PA_BITS;
>>>>    }
>>>>   }
>>>> +
>>>> +/* Check whether hardware update of the Access flag is supported */
>>>> +static inline bool cpu_has_hw_af(void)
>>>> +{
>>>> + if (IS_ENABLED(CONFIG_ARM64_HW_AFDBM))
>>>> +         return read_cpuid(ID_AA64MMFR1_EL1) & 0xf;
>>>
>>> 0xf? I think we should have a mask in sysreg.h for this constant.
>>
>> We don't have the mask, but we certainly have the shift.
>>
>> GENMASK(ID_AA64MMFR1_HADBS_SHIFT + 3,
>> ID_AA64MMFR1_HADBS_SHIFT) is a bit
>> of a mouthful though. Ideally, we'd have a helper for that.
>>
> Ok, I will implement the helper if there isn't so far.
> And then replace the 0xf with it.

Or could we simply reuse the existing cpuid_feature_extract_unsigned_field()?

u64 mmfr1 = read_cpuid(ID_AA64MMFR1_EL1);

return cpuid_feature_extract_unsigned_field(mmfr1, ID_AA64MMFR1_HADBS_SHIFT) ?
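
A minimal user-space sketch of the suggestion above, for illustration only.
It assumes the HADBS field is the 4-bit unsigned field in bits [3:0] of
ID_AA64MMFR1_EL1 (which is why the original "& 0xf" worked). HADBS_SHIFT,
extract_unsigned_field() and has_hw_af() are hypothetical stand-ins for
ID_AA64MMFR1_HADBS_SHIFT, cpuid_feature_extract_unsigned_field() and
cpu_has_hw_af(); the register value is passed in as a parameter because
user space cannot call read_cpuid().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed stand-in for ID_AA64MMFR1_HADBS_SHIFT: field in bits [3:0]. */
#define HADBS_SHIFT 0

/*
 * Hypothetical user-space stand-in for
 * cpuid_feature_extract_unsigned_field(): return the 4-bit unsigned
 * ID register field that starts at bit 'shift'.
 */
static inline unsigned int extract_unsigned_field(uint64_t reg, int shift)
{
	assert(shift >= 0 && shift <= 60);
	return (unsigned int)((reg >> shift) & 0xf);
}

/*
 * Hypothetical variant of cpu_has_hw_af() that takes the register value
 * as an argument instead of reading ID_AA64MMFR1_EL1 itself.
 * Any non-zero field value means hardware can update the Access flag.
 */
static inline bool has_hw_af(uint64_t mmfr1)
{
	return extract_unsigned_field(mmfr1, HADBS_SHIFT) != 0;
}
```

With these stand-ins, has_hw_af(0x2) is true, while has_hw_af(0x10) is
false because only bits outside the field are set. Naming the shift also
keeps the magic 0xf out of the caller.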

Cheers
Suzuki



* Re: [PATCH v10 1/3] arm64: cpufeature: introduce helper cpu_has_hw_af()
  2019-10-08 15:32         ` Suzuki K Poulose
@ 2019-10-09  6:29           ` Jia He
  0 siblings, 0 replies; 21+ messages in thread
From: Jia He @ 2019-10-09  6:29 UTC (permalink / raw)
  To: Suzuki K Poulose, Justin He (Arm Technology China),
	Marc Zyngier, Will Deacon
  Cc: Mark Rutland, Kaly Xin (Arm Technology China),
	Catalin Marinas, linux-kernel, Matthew Wilcox, linux-mm,
	James Morse, linux-arm-kernel, Punit Agrawal, Thomas Gleixner,
	Andrew Morton, Kirill A. Shutemov

Hi Suzuki

On 2019/10/8 23:32, Suzuki K Poulose wrote:
>
>
> On 08/10/2019 02:12, Justin He (Arm Technology China) wrote:
>> Hi Will and Marc
>> Sorry for the late response, just came back from a vacation.
>>
>>> -----Original Message-----
>>> From: Marc Zyngier <maz@kernel.org>
>>> Sent: 1 October 2019 21:19
>>> To: Will Deacon <will@kernel.org>
>>> Cc: Justin He (Arm Technology China) <Justin.He@arm.com>; Catalin
>>> Marinas <Catalin.Marinas@arm.com>; Mark Rutland
>>> <Mark.Rutland@arm.com>; James Morse <James.Morse@arm.com>;
>>> Matthew Wilcox <willy@infradead.org>; Kirill A. Shutemov
>>> <kirill.shutemov@linux.intel.com>; linux-arm-kernel@lists.infradead.org;
>>> linux-kernel@vger.kernel.org; linux-mm@kvack.org; Punit Agrawal
>>> <punitagrawal@gmail.com>; Thomas Gleixner <tglx@linutronix.de>;
>>> Andrew Morton <akpm@linux-foundation.org>; hejianet@gmail.com; Kaly
>>> Xin (Arm Technology China) <Kaly.Xin@arm.com>
>>> Subject: Re: [PATCH v10 1/3] arm64: cpufeature: introduce helper
>>> cpu_has_hw_af()
>>>
>>> On Tue, 1 Oct 2019 13:54:47 +0100
>>> Will Deacon <will@kernel.org> wrote:
>>>
>>>> On Mon, Sep 30, 2019 at 09:57:38AM +0800, Jia He wrote:
>>>>> We unconditionally set the HW_AFDBM capability and only enable it on
>>>>> CPUs which really have the feature. But sometimes we need to know
>>>>> whether this cpu has the capability of HW AF. So decouple AF from
>>>>> DBM by new helper cpu_has_hw_af().
>>>>>
>>>>> Signed-off-by: Jia He <justin.he@arm.com>
>>>>> Suggested-by: Suzuki Poulose <Suzuki.Poulose@arm.com>
>>>>> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
>>>>> ---
>>>>>   arch/arm64/include/asm/cpufeature.h | 10 ++++++++++
>>>>>   1 file changed, 10 insertions(+)
>>>>>
>>>>> diff --git a/arch/arm64/include/asm/cpufeature.h
>>> b/arch/arm64/include/asm/cpufeature.h
>>>>> index 9cde5d2e768f..949bc7c85030 100644
>>>>> --- a/arch/arm64/include/asm/cpufeature.h
>>>>> +++ b/arch/arm64/include/asm/cpufeature.h
>>>>> @@ -659,6 +659,16 @@ static inline u32
>>> id_aa64mmfr0_parange_to_phys_shift(int parange)
>>>>>    default: return CONFIG_ARM64_PA_BITS;
>>>>>    }
>>>>>   }
>>>>> +
>>>>> +/* Check whether hardware update of the Access flag is supported */
>>>>> +static inline bool cpu_has_hw_af(void)
>>>>> +{
>>>>> + if (IS_ENABLED(CONFIG_ARM64_HW_AFDBM))
>>>>> +         return read_cpuid(ID_AA64MMFR1_EL1) & 0xf;
>>>>
>>>> 0xf? I think we should have a mask in sysreg.h for this constant.
>>>
>>> We don't have the mask, but we certainly have the shift.
>>>
>>> GENMASK(ID_AA64MMFR1_HADBS_SHIFT + 3,
>>> ID_AA64MMFR1_HADBS_SHIFT) is a bit
>>> of a mouthful though. Ideally, we'd have a helper for that.
>>>
>> Ok, I will implement the helper if there isn't so far.
>> And then replace the 0xf with it.
>
> Or could we simply reuse the existing cpuid_feature_extract_unsigned_field()?
>
> u64 mmfr1 = read_cpuid(ID_AA64MMFR1_EL1);
>
> return cpuid_feature_extract_unsigned_field(mmfr1, ID_AA64MMFR1_HADBS_SHIFT) ?
>
Yes, we can. I will send a new version.

---
Cheers,
Justin (Jia He)




* Re: [PATCH v10 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared
  2019-10-08 12:39       ` Will Deacon
  2019-10-08 12:58         ` Justin He (Arm Technology China)
@ 2019-10-16 23:21         ` Palmer Dabbelt
  2019-10-16 23:46           ` Will Deacon
  1 sibling, 1 reply; 21+ messages in thread
From: Palmer Dabbelt @ 2019-10-16 23:21 UTC (permalink / raw)
  To: will
  Cc: Justin.He, Catalin.Marinas, Mark.Rutland, James.Morse, maz,
	willy, kirill.shutemov, linux-arm-kernel, linux-kernel, linux-mm,
	punitagrawal, tglx, akpm, hejianet, Kaly.Xin, nd

On Tue, 08 Oct 2019 05:39:44 PDT (-0700), will@kernel.org wrote:
> On Tue, Oct 08, 2019 at 02:19:05AM +0000, Justin He (Arm Technology China) wrote:
>> > -----Original Message-----
>> > From: Will Deacon <will@kernel.org>
>> > Sent: 1 October 2019 20:54
>> > To: Justin He (Arm Technology China) <Justin.He@arm.com>
>> > Cc: Catalin Marinas <Catalin.Marinas@arm.com>; Mark Rutland
>> > <Mark.Rutland@arm.com>; James Morse <James.Morse@arm.com>; Marc
>> > Zyngier <maz@kernel.org>; Matthew Wilcox <willy@infradead.org>; Kirill A.
>> > Shutemov <kirill.shutemov@linux.intel.com>; linux-arm-
>> > kernel@lists.infradead.org; linux-kernel@vger.kernel.org; linux-
>> > mm@kvack.org; Punit Agrawal <punitagrawal@gmail.com>; Thomas
>> > Gleixner <tglx@linutronix.de>; Andrew Morton <akpm@linux-
>> > foundation.org>; hejianet@gmail.com; Kaly Xin (Arm Technology China)
>> > <Kaly.Xin@arm.com>
>> > Subject: Re: [PATCH v10 3/3] mm: fix double page fault on arm64 if PTE_AF
>> > is cleared
>> >
>> > On Mon, Sep 30, 2019 at 09:57:40AM +0800, Jia He wrote:
>> > > diff --git a/mm/memory.c b/mm/memory.c
>> > > index b1ca51a079f2..1f56b0118ef5 100644
>> > > --- a/mm/memory.c
>> > > +++ b/mm/memory.c
>> > > @@ -118,6 +118,13 @@ int randomize_va_space __read_mostly =
>> > >  					2;
>> > >  #endif
>> > >
>> > > +#ifndef arch_faults_on_old_pte
>> > > +static inline bool arch_faults_on_old_pte(void)
>> > > +{
>> > > +	return false;
>> > > +}
>> > > +#endif
>> >
>> > Kirill has acked this, so I'm happy to take the patch as-is, however isn't
>> > it the case that /most/ architectures will want to return true for
>> > arch_faults_on_old_pte()? In which case, wouldn't it make more sense for
>> > that to be the default, and have x86 and arm64 provide an override? For
>> > example, aren't most architectures still going to hit the double fault
>> > scenario even with your patch applied?
>>
>> No, after applying my patch series, only those architectures which don't provide
>> setting access flag by hardware AND don't implement their arch_faults_on_old_pte
>> will hit the double page fault.
>>
>> The meaning of true for arch_faults_on_old_pte() is "this arch doesn't have the hardware
>> setting access flag way, it might cause page fault on an old pte"
>> I don't want to change other architectures' default behavior here. So by default,
>> arch_faults_on_old_pte() is false.
>
> ...and my complaint is that this is the majority of supported architectures,
> so you're fixing something for arm64 which also affects arm, powerpc,
> alpha, mips, riscv, ...
>
> Chances are, they won't even realise they need to implement
> arch_faults_on_old_pte() until somebody runs into the double fault and
> wastes lots of time debugging it before they spot your patch.

If I understand the semantics correctly, we should have this set to true.  I 
don't have any context here, but we've got

                /*
                 * The kernel assumes that TLBs don't cache invalid
                 * entries, but in RISC-V, SFENCE.VMA specifies an
                 * ordering constraint, not a cache flush; it is
                 * necessary even after writing invalid entries.
                 */
                local_flush_tlb_page(addr);

in do_page_fault().

>> Btw, currently I only observed this double pagefault on arm64's guest
>> (host is ThunderX2).  On X86 guest (host is Intel(R) Core(TM) i7-4790 CPU
>> @ 3.60GHz ), there is no such double pagefault. It has the similar setting
>> access flag way by hardware.
>
> Right, and that's why I'm not concerned about x86 for this problem.
>
> Will



* Re: [PATCH v10 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared
  2019-10-16 23:21         ` Palmer Dabbelt
@ 2019-10-16 23:46           ` Will Deacon
  0 siblings, 0 replies; 21+ messages in thread
From: Will Deacon @ 2019-10-16 23:46 UTC (permalink / raw)
  To: Palmer Dabbelt
  Cc: Justin.He, Catalin.Marinas, Mark.Rutland, James.Morse, maz,
	willy, kirill.shutemov, linux-arm-kernel, linux-kernel, linux-mm,
	punitagrawal, tglx, akpm, hejianet, Kaly.Xin, nd

Hey Palmer,

On Wed, Oct 16, 2019 at 04:21:59PM -0700, Palmer Dabbelt wrote:
> On Tue, 08 Oct 2019 05:39:44 PDT (-0700), will@kernel.org wrote:
> > On Tue, Oct 08, 2019 at 02:19:05AM +0000, Justin He (Arm Technology China) wrote:
> > > > On Mon, Sep 30, 2019 at 09:57:40AM +0800, Jia He wrote:
> > > > > diff --git a/mm/memory.c b/mm/memory.c
> > > > > index b1ca51a079f2..1f56b0118ef5 100644
> > > > > --- a/mm/memory.c
> > > > > +++ b/mm/memory.c
> > > > > @@ -118,6 +118,13 @@ int randomize_va_space __read_mostly =
> > > > >  					2;
> > > > >  #endif
> > > > >
> > > > > +#ifndef arch_faults_on_old_pte
> > > > > +static inline bool arch_faults_on_old_pte(void)
> > > > > +{
> > > > > +	return false;
> > > > > +}
> > > > > +#endif
> > > >
> > > > Kirill has acked this, so I'm happy to take the patch as-is, however isn't
> > > > it the case that /most/ architectures will want to return true for
> > > > arch_faults_on_old_pte()? In which case, wouldn't it make more sense for
> > > > that to be the default, and have x86 and arm64 provide an override? For
> > > > example, aren't most architectures still going to hit the double fault
> > > > scenario even with your patch applied?
> > > 
> > > No, after applying my patch series, only those architectures which don't provide
> > > setting access flag by hardware AND don't implement their arch_faults_on_old_pte
> > > will hit the double page fault.
> > > 
> > > The meaning of true for arch_faults_on_old_pte() is "this arch doesn't have the hardware
> > > setting access flag way, it might cause page fault on an old pte"
> > > I don't want to change other architectures' default behavior here. So by default,
> > > arch_faults_on_old_pte() is false.
> > 
> > ...and my complaint is that this is the majority of supported architectures,
> > so you're fixing something for arm64 which also affects arm, powerpc,
> > alpha, mips, riscv, ...
> > 
> > Chances are, they won't even realise they need to implement
> > arch_faults_on_old_pte() until somebody runs into the double fault and
> > wastes lots of time debugging it before they spot your patch.
> 
> If I understand the semantics correctly, we should have this set to true.  I
> don't have any context here, but we've got
> 
>                /*
>                 * The kernel assumes that TLBs don't cache invalid
>                 * entries, but in RISC-V, SFENCE.VMA specifies an
>                 * ordering constraint, not a cache flush; it is
>                 * necessary even after writing invalid entries.
>                 */
>                local_flush_tlb_page(addr);
> 
> in do_page_fault().

Ok, although I think this is really about whether or not your hardware can
make a pte young when accessed, or whether you take a fault and do it
by updating the pte explicitly.

v12 of the patches did change the default, so you should be "safe" with
those either way:

http://lists.infradead.org/pipermail/linux-arm-kernel/2019-October/686030.html
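
As a rough user-space sketch of the override pattern being discussed (not
the actual v12 code): the generic fallback returns true, and an
architecture that can tell whether hardware sets the access flag defines
its own arch_faults_on_old_pte() plus a macro of the same name so the
fallback is skipped. demo_cpu_has_hw_af is a hypothetical stand-in for a
real CPU feature probe such as cpu_has_hw_af().

```c
#include <stdbool.h>

/* ---- what an architecture header might provide (hypothetical) ---- */

/* Stand-in for a real CPU feature probe; not a kernel symbol. */
static bool demo_cpu_has_hw_af = false;

/* The arch faults on an old pte only when hardware cannot set AF. */
static inline bool arch_faults_on_old_pte(void)
{
	return !demo_cpu_has_hw_af;
}
/* Defining the macro tells generic code that an override exists. */
#define arch_faults_on_old_pte arch_faults_on_old_pte

/* ---- generic fallback, compiled only without an override ---- */

#ifndef arch_faults_on_old_pte
static inline bool arch_faults_on_old_pte(void)
{
	/* Conservative default: assume an old pte can fault. */
	return true;
}
#endif
```

With a true default, an architecture that never looks at this helper gets
the safe behaviour, and only those with a hardware-managed access flag
need to opt out.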

Will


Thread overview: 21+ messages
2019-09-30  1:57 [PATCH v10 0/3] fix double page fault on arm64 Jia He
2019-09-30  1:57 ` [PATCH v10 1/3] arm64: cpufeature: introduce helper cpu_has_hw_af() Jia He
2019-10-01 12:54   ` Will Deacon
2019-10-01 13:18     ` Marc Zyngier
2019-10-08  1:12       ` Justin He (Arm Technology China)
2019-10-08 15:32         ` Suzuki K Poulose
2019-10-09  6:29           ` Jia He
2019-09-30  1:57 ` [PATCH v10 2/3] arm64: mm: implement arch_faults_on_old_pte() on arm64 Jia He
2019-10-01 12:50   ` Will Deacon
2019-10-01 13:32     ` Marc Zyngier
2019-10-08  1:55       ` Justin He (Arm Technology China)
2019-10-08  2:30         ` Justin He (Arm Technology China)
2019-10-08  7:46         ` Marc Zyngier
2019-09-30  1:57 ` [PATCH v10 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared Jia He
2019-10-01 12:54   ` Will Deacon
2019-10-08  2:19     ` Justin He (Arm Technology China)
2019-10-08 12:39       ` Will Deacon
2019-10-08 12:58         ` Justin He (Arm Technology China)
2019-10-08 14:32           ` Kirill A. Shutemov
2019-10-16 23:21         ` Palmer Dabbelt
2019-10-16 23:46           ` Will Deacon
