* [PATCH v4 0/3] fix double page fault on arm64
@ 2019-09-18 13:19 ` Jia He
From: Jia He @ 2019-09-18 13:19 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland, James Morse,
	Marc Zyngier, Matthew Wilcox, Kirill A. Shutemov,
	linux-arm-kernel, linux-kernel, linux-mm, Suzuki Poulose
  Cc: Punit Agrawal, Anshuman Khandual, Jun Yao, Alex Van Brunt,
	Robin Murphy, Thomas Gleixner, Andrew Morton,
	Jérôme Glisse, Ralph Campbell, hejianet, Kaly Xin,
	Jia He

When we tested the pmdk unit test vmmalloc_fork TEST1 in an arm64 guest, there
was a double page fault in __copy_from_user_inatomic of cow_user_page.

As told by Catalin: "On arm64 without hardware Access Flag, copying from
user will fail because the pte is old and cannot be marked young. So we
always end up with a zeroed page after fork() + CoW for pfn mappings. We
don't always have a hardware-managed access flag on arm64."

Changes
v4: introduce cpu_has_hw_af (Suzuki)
    bail out if !pte_same (Kirill)
v3: add vmf->ptl lock/unlock (by Kirill A. Shutemov)
    add arch_faults_on_old_pte (Matthew, Catalin)
v2: remove FAULT_FLAG_WRITE when setting pte access flag (by Catalin)

Jia He (3):
  arm64: cpufeature: introduce helper cpu_has_hw_af()
  arm64: mm: implement arch_faults_on_old_pte() on arm64
  mm: fix double page fault on arm64 if PTE_AF is cleared

 arch/arm64/include/asm/cpufeature.h |  1 +
 arch/arm64/include/asm/pgtable.h    | 12 ++++++++++
 arch/arm64/kernel/cpufeature.c      |  6 +++++
 mm/memory.c                         | 35 ++++++++++++++++++++++++-----
 4 files changed, 49 insertions(+), 5 deletions(-)

-- 
2.17.1


* [PATCH v4 1/3] arm64: cpufeature: introduce helper cpu_has_hw_af()
  2019-09-18 13:19 ` Jia He
@ 2019-09-18 13:19   ` Jia He
From: Jia He @ 2019-09-18 13:19 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland, James Morse,
	Marc Zyngier, Matthew Wilcox, Kirill A. Shutemov,
	linux-arm-kernel, linux-kernel, linux-mm, Suzuki Poulose
  Cc: Punit Agrawal, Anshuman Khandual, Jun Yao, Alex Van Brunt,
	Robin Murphy, Thomas Gleixner, Andrew Morton,
	Jérôme Glisse, Ralph Campbell, hejianet, Kaly Xin,
	Jia He

We unconditionally set the HW_AFDBM capability and only enable it on
CPUs which really have the feature. But sometimes we need to know
whether this CPU has hardware Access Flag (AF) support. So decouple
AF from DBM with a new helper, cpu_has_hw_af().

Signed-off-by: Jia He <justin.he@arm.com>
Suggested-by: Suzuki Poulose <Suzuki.Poulose@arm.com>
---
 arch/arm64/include/asm/cpufeature.h | 1 +
 arch/arm64/kernel/cpufeature.c      | 6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index c96ffa4722d3..206b6e3954cf 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -390,6 +390,7 @@ extern DECLARE_BITMAP(boot_capabilities, ARM64_NPATCHABLE);
 	for_each_set_bit(cap, cpu_hwcaps, ARM64_NCAPS)
 
 bool this_cpu_has_cap(unsigned int cap);
+bool cpu_has_hw_af(void);
 void cpu_set_feature(unsigned int num);
 bool cpu_have_feature(unsigned int num);
 unsigned long cpu_get_elf_hwcap(void);
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index b1fdc486aed8..c5097f58649d 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1141,6 +1141,12 @@ static bool has_hw_dbm(const struct arm64_cpu_capabilities *cap,
 	return true;
 }
 
+/* Decouple AF from AFDBM. */
+bool cpu_has_hw_af(void)
+{
+	return (read_cpuid(ID_AA64MMFR1_EL1) & 0xf);
+}
+
 #endif
 
 #ifdef CONFIG_ARM64_VHE
-- 
2.17.1


* [PATCH v4 2/3] arm64: mm: implement arch_faults_on_old_pte() on arm64
  2019-09-18 13:19 ` Jia He
@ 2019-09-18 13:19   ` Jia He
From: Jia He @ 2019-09-18 13:19 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland, James Morse,
	Marc Zyngier, Matthew Wilcox, Kirill A. Shutemov,
	linux-arm-kernel, linux-kernel, linux-mm, Suzuki Poulose
  Cc: Punit Agrawal, Anshuman Khandual, Jun Yao, Alex Van Brunt,
	Robin Murphy, Thomas Gleixner, Andrew Morton,
	Jérôme Glisse, Ralph Campbell, hejianet, Kaly Xin,
	Jia He

On arm64 without hardware Access Flag, copying from user will fail because
the pte is old and cannot be marked young. So we always end up with a zeroed
page after fork() + CoW for pfn mappings. We don't always have a
hardware-managed access flag on arm64.

Hence implement arch_faults_on_old_pte() on arm64 to indicate that accessing
an old pte may cause a page fault.

Signed-off-by: Jia He <justin.he@arm.com>
---
 arch/arm64/include/asm/pgtable.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index e09760ece844..4a9939615e41 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -868,6 +868,18 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
 #define phys_to_ttbr(addr)	(addr)
 #endif
 
+/*
 + * On arm64 without hardware Access Flag, copying from user will fail because
 + * the pte is old and cannot be marked young. So we always end up with a
 + * zeroed page after fork() + CoW for pfn mappings. We don't always have a
 + * hardware-managed access flag on arm64.
+ */
+static inline bool arch_faults_on_old_pte(void)
+{
+	return !cpu_has_hw_af();
+}
+#define arch_faults_on_old_pte arch_faults_on_old_pte
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __ASM_PGTABLE_H */
-- 
2.17.1


* [PATCH v4 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared
  2019-09-18 13:19 ` Jia He
@ 2019-09-18 13:19   ` Jia He
From: Jia He @ 2019-09-18 13:19 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland, James Morse,
	Marc Zyngier, Matthew Wilcox, Kirill A. Shutemov,
	linux-arm-kernel, linux-kernel, linux-mm, Suzuki Poulose
  Cc: Punit Agrawal, Anshuman Khandual, Jun Yao, Alex Van Brunt,
	Robin Murphy, Thomas Gleixner, Andrew Morton,
	Jérôme Glisse, Ralph Campbell, hejianet, Kaly Xin,
	Jia He

When we tested the pmdk unit test [1] vmmalloc_fork TEST1 in an arm64 guest,
there was a double page fault in __copy_from_user_inatomic of cow_user_page.

The call trace below is from arm64 do_page_fault, for debugging purposes:
[  110.016195] Call trace:
[  110.016826]  do_page_fault+0x5a4/0x690
[  110.017812]  do_mem_abort+0x50/0xb0
[  110.018726]  el1_da+0x20/0xc4
[  110.019492]  __arch_copy_from_user+0x180/0x280
[  110.020646]  do_wp_page+0xb0/0x860
[  110.021517]  __handle_mm_fault+0x994/0x1338
[  110.022606]  handle_mm_fault+0xe8/0x180
[  110.023584]  do_page_fault+0x240/0x690
[  110.024535]  do_mem_abort+0x50/0xb0
[  110.025423]  el0_da+0x20/0x24

The pte info before __copy_from_user_inatomic is (PTE_AF is cleared):
[ffff9b007000] pgd=000000023d4f8003, pud=000000023da9b003, pmd=000000023d4b3003, pte=360000298607bd3

As told by Catalin: "On arm64 without hardware Access Flag, copying from
user will fail because the pte is old and cannot be marked young. So we
always end up with a zeroed page after fork() + CoW for pfn mappings. We
don't always have a hardware-managed access flag on arm64."

This patch fixes it by calling pte_mkyoung. Also, the parameters are
changed because vmf should be passed to cow_user_page().

[1] https://github.com/pmem/pmdk/tree/master/src/test/vmmalloc_fork

Reported-by: Yibo Cai <Yibo.Cai@arm.com>
Signed-off-by: Jia He <justin.he@arm.com>
---
 mm/memory.c | 35 ++++++++++++++++++++++++++++++-----
 1 file changed, 30 insertions(+), 5 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index e2bb51b6242e..d2c130a5883b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -118,6 +118,13 @@ int randomize_va_space __read_mostly =
 					2;
 #endif
 
+#ifndef arch_faults_on_old_pte
+static inline bool arch_faults_on_old_pte(void)
+{
+	return false;
+}
+#endif
+
 static int __init disable_randmaps(char *s)
 {
 	randomize_va_space = 0;
@@ -2140,8 +2147,12 @@ static inline int pte_unmap_same(struct mm_struct *mm, pmd_t *pmd,
 	return same;
 }
 
-static inline void cow_user_page(struct page *dst, struct page *src, unsigned long va, struct vm_area_struct *vma)
+static inline void cow_user_page(struct page *dst, struct page *src,
+				 struct vm_fault *vmf)
 {
+	struct vm_area_struct *vma = vmf->vma;
+	unsigned long addr = vmf->address;
+
 	debug_dma_assert_idle(src);
 
 	/*
@@ -2152,20 +2163,34 @@ static inline void cow_user_page(struct page *dst, struct page *src, unsigned lo
 	 */
 	if (unlikely(!src)) {
 		void *kaddr = kmap_atomic(dst);
-		void __user *uaddr = (void __user *)(va & PAGE_MASK);
+		void __user *uaddr = (void __user *)(addr & PAGE_MASK);
+		pte_t entry;
 
 		/*
 		 * This really shouldn't fail, because the page is there
 		 * in the page tables. But it might just be unreadable,
 		 * in which case we just give up and fill the result with
-		 * zeroes.
+		 * zeroes. On architectures with software "accessed" bits,
+		 * we would take a double page fault here, so mark it
+		 * accessed here.
 		 */
+		if (arch_faults_on_old_pte() && !pte_young(vmf->orig_pte)) {
+			spin_lock(vmf->ptl);
+			if (likely(pte_same(*vmf->pte, vmf->orig_pte))) {
+				entry = pte_mkyoung(vmf->orig_pte);
+				if (ptep_set_access_flags(vma, addr,
+							  vmf->pte, entry, 0))
+					update_mmu_cache(vma, addr, vmf->pte);
+			}
+			spin_unlock(vmf->ptl);
+		}
+
 		if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE))
 			clear_page(kaddr);
 		kunmap_atomic(kaddr);
 		flush_dcache_page(dst);
 	} else
-		copy_user_highpage(dst, src, va, vma);
+		copy_user_highpage(dst, src, addr, vma);
 }
 
 static gfp_t __get_fault_gfp_mask(struct vm_area_struct *vma)
@@ -2318,7 +2343,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
 				vmf->address);
 		if (!new_page)
 			goto oom;
-		cow_user_page(new_page, old_page, vmf->address, vma);
+		cow_user_page(new_page, old_page, vmf);
 	}
 
 	if (mem_cgroup_try_charge_delay(new_page, mm, GFP_KERNEL, &memcg, false))
-- 
2.17.1


* Re: [PATCH v4 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared
  2019-09-18 13:19   ` Jia He
@ 2019-09-18 14:00     ` Kirill A. Shutemov
From: Kirill A. Shutemov @ 2019-09-18 14:00 UTC (permalink / raw)
  To: Jia He
  Cc: Catalin Marinas, Will Deacon, Mark Rutland, James Morse,
	Marc Zyngier, Matthew Wilcox, Kirill A. Shutemov,
	linux-arm-kernel, linux-kernel, linux-mm, Suzuki Poulose,
	Punit Agrawal, Anshuman Khandual, Jun Yao, Alex Van Brunt,
	Robin Murphy, Thomas Gleixner, Andrew Morton,
	Jérôme Glisse, Ralph Campbell, hejianet, Kaly Xin

On Wed, Sep 18, 2019 at 09:19:14PM +0800, Jia He wrote:
> When we tested pmdk unit test [1] vmmalloc_fork TEST1 in arm64 guest, there
> will be a double page fault in __copy_from_user_inatomic of cow_user_page.
> 
> Below call trace is from arm64 do_page_fault for debugging purpose
> [  110.016195] Call trace:
> [  110.016826]  do_page_fault+0x5a4/0x690
> [  110.017812]  do_mem_abort+0x50/0xb0
> [  110.018726]  el1_da+0x20/0xc4
> [  110.019492]  __arch_copy_from_user+0x180/0x280
> [  110.020646]  do_wp_page+0xb0/0x860
> [  110.021517]  __handle_mm_fault+0x994/0x1338
> [  110.022606]  handle_mm_fault+0xe8/0x180
> [  110.023584]  do_page_fault+0x240/0x690
> [  110.024535]  do_mem_abort+0x50/0xb0
> [  110.025423]  el0_da+0x20/0x24
> 
> The pte info before __copy_from_user_inatomic is (PTE_AF is cleared):
> [ffff9b007000] pgd=000000023d4f8003, pud=000000023da9b003, pmd=000000023d4b3003, pte=360000298607bd3
> 
> As told by Catalin: "On arm64 without hardware Access Flag, copying from
> user will fail because the pte is old and cannot be marked young. So we
> always end up with zeroed page after fork() + CoW for pfn mappings. we
> don't always have a hardware-managed access flag on arm64."
> 
> This patch fix it by calling pte_mkyoung. Also, the parameter is
> changed because vmf should be passed to cow_user_page()
> 
> [1] https://github.com/pmem/pmdk/tree/master/src/test/vmmalloc_fork
> 
> Reported-by: Yibo Cai <Yibo.Cai@arm.com>
> Signed-off-by: Jia He <justin.he@arm.com>
> ---
>  mm/memory.c | 35 ++++++++++++++++++++++++++++++-----
>  1 file changed, 30 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index e2bb51b6242e..d2c130a5883b 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -118,6 +118,13 @@ int randomize_va_space __read_mostly =
>  					2;
>  #endif
>  
> +#ifndef arch_faults_on_old_pte
> +static inline bool arch_faults_on_old_pte(void)
> +{
> +	return false;
> +}
> +#endif
> +
>  static int __init disable_randmaps(char *s)
>  {
>  	randomize_va_space = 0;
> @@ -2140,8 +2147,12 @@ static inline int pte_unmap_same(struct mm_struct *mm, pmd_t *pmd,
>  	return same;
>  }
>  
> -static inline void cow_user_page(struct page *dst, struct page *src, unsigned long va, struct vm_area_struct *vma)
> +static inline void cow_user_page(struct page *dst, struct page *src,
> +				 struct vm_fault *vmf)
>  {
> +	struct vm_area_struct *vma = vmf->vma;
> +	unsigned long addr = vmf->address;
> +
>  	debug_dma_assert_idle(src);
>  
>  	/*
> @@ -2152,20 +2163,34 @@ static inline void cow_user_page(struct page *dst, struct page *src, unsigned lo
>  	 */
>  	if (unlikely(!src)) {
>  		void *kaddr = kmap_atomic(dst);
> -		void __user *uaddr = (void __user *)(va & PAGE_MASK);
> +		void __user *uaddr = (void __user *)(addr & PAGE_MASK);
> +		pte_t entry;
>  
>  		/*
>  		 * This really shouldn't fail, because the page is there
>  		 * in the page tables. But it might just be unreadable,
>  		 * in which case we just give up and fill the result with
> -		 * zeroes.
> +		 * zeroes. On architectures with software "accessed" bits,
> +		 * we would take a double page fault here, so mark it
> +		 * accessed here.
>  		 */
> +		if (arch_faults_on_old_pte() && !pte_young(vmf->orig_pte)) {
> +			spin_lock(vmf->ptl);
> +			if (likely(pte_same(*vmf->pte, vmf->orig_pte))) {
> +				entry = pte_mkyoung(vmf->orig_pte);
> +				if (ptep_set_access_flags(vma, addr,
> +							  vmf->pte, entry, 0))
> +					update_mmu_cache(vma, addr, vmf->pte);
> +			}

I don't follow.

So if the pte has changed under you, you don't set the accessed bit, but
nevertheless copy from the user.

What makes you think it will not trigger the same problem?

I think we need to make cow_user_page() fail in this case and the caller --
wp_page_copy() -- return zero. If the fault was solved by another thread, we
are fine. If not, userspace would re-fault on the same address and we will
handle the fault from the second attempt.
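
Something like this, say (a rough sketch of the suggestion only, not
tested; unchanged code is elided):

	static inline bool cow_user_page(struct page *dst, struct page *src,
					 struct vm_fault *vmf)
	{
		/* ... */
		if (arch_faults_on_old_pte() && !pte_young(vmf->orig_pte)) {
			spin_lock(vmf->ptl);
			if (!pte_same(*vmf->pte, vmf->orig_pte)) {
				/* pte changed under us; let the caller bail out */
				spin_unlock(vmf->ptl);
				return false;
			}
			entry = pte_mkyoung(vmf->orig_pte);
			if (ptep_set_access_flags(vma, addr, vmf->pte, entry, 0))
				update_mmu_cache(vma, addr, vmf->pte);
			spin_unlock(vmf->ptl);
		}
		/* ... copy as before ... */
		return true;
	}

and in wp_page_copy():

		if (!cow_user_page(new_page, old_page, vmf)) {
			/*
			 * The copy raced with a parallel fault; drop the new
			 * page and return 0 so userspace re-faults.
			 */
			put_page(new_page);
			if (old_page)
				put_page(old_page);
			return 0;
		}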

> +			spin_unlock(vmf->ptl);
> +		}
> +
>  		if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE))
>  			clear_page(kaddr);
>  		kunmap_atomic(kaddr);
>  		flush_dcache_page(dst);
>  	} else
> -		copy_user_highpage(dst, src, va, vma);
> +		copy_user_highpage(dst, src, addr, vma);
>  }
>  
>  static gfp_t __get_fault_gfp_mask(struct vm_area_struct *vma)
> @@ -2318,7 +2343,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
>  				vmf->address);
>  		if (!new_page)
>  			goto oom;
> -		cow_user_page(new_page, old_page, vmf->address, vma);
> +		cow_user_page(new_page, old_page, vmf);
>  	}
>  
>  	if (mem_cgroup_try_charge_delay(new_page, mm, GFP_KERNEL, &memcg, false))
> -- 
> 2.17.1
> 
> 

-- 
 Kirill A. Shutemov

* Re: [PATCH v4 1/3] arm64: cpufeature: introduce helper cpu_has_hw_af()
  2019-09-18 13:19   ` Jia He
@ 2019-09-18 14:20     ` Matthew Wilcox
From: Matthew Wilcox @ 2019-09-18 14:20 UTC (permalink / raw)
  To: Jia He
  Cc: Catalin Marinas, Will Deacon, Mark Rutland, James Morse,
	Marc Zyngier, Kirill A. Shutemov, linux-arm-kernel, linux-kernel,
	linux-mm, Suzuki Poulose, Punit Agrawal, Anshuman Khandual,
	Jun Yao, Alex Van Brunt, Robin Murphy, Thomas Gleixner,
	Andrew Morton, Jérôme Glisse, Ralph Campbell, hejianet,
	Kaly Xin

On Wed, Sep 18, 2019 at 09:19:12PM +0800, Jia He wrote:
> +/* Decouple AF from AFDBM. */
> +bool cpu_has_hw_af(void)
> +{
> +	return (read_cpuid(ID_AA64MMFR1_EL1) & 0xf);
> +}
> +

Do you really want to call read_cpuid() every time?  I would have thought
you'd want to use the static branch mechanism to do the right thing at
boot time.  See Documentation/static-keys.txt.
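
Something along these lines, say (a sketch only -- the ARM64_HW_AF
capability below is hypothetical and would have to be registered at boot):

	static inline bool cpu_has_hw_af(void)
	{
		/* cpus_have_const_cap() resolves to a static branch. */
		return cpus_have_const_cap(ARM64_HW_AF);
	}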

* Re: [PATCH v4 1/3] arm64: cpufeature: introduce helper cpu_has_hw_af()
  2019-09-18 13:19   ` Jia He
@ 2019-09-18 14:20     ` Suzuki K Poulose
From: Suzuki K Poulose @ 2019-09-18 14:20 UTC (permalink / raw)
  To: Jia He, Catalin Marinas, Will Deacon, Mark Rutland, James Morse,
	Marc Zyngier, Matthew Wilcox, Kirill A. Shutemov,
	linux-arm-kernel, linux-kernel, linux-mm
  Cc: Punit Agrawal, Anshuman Khandual, Jun Yao, Alex Van Brunt,
	Robin Murphy, Thomas Gleixner, Andrew Morton,
	Jérôme Glisse, Ralph Campbell, hejianet, Kaly Xin

Hi Jia,

On 18/09/2019 14:19, Jia He wrote:
> We unconditionally set the HW_AFDBM capability and only enable it on
> CPUs which really have the feature. But sometimes we need to know
> whether this cpu has the capability of HW AF. So decouple AF from
> DBM by new helper cpu_has_hw_af().
> 
> Signed-off-by: Jia He <justin.he@arm.com>
> Suggested-by: Suzuki Poulose <Suzuki.Poulose@arm.com>
> ---
>   arch/arm64/include/asm/cpufeature.h | 1 +
>   arch/arm64/kernel/cpufeature.c      | 6 ++++++
>   2 files changed, 7 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
> index c96ffa4722d3..206b6e3954cf 100644
> --- a/arch/arm64/include/asm/cpufeature.h
> +++ b/arch/arm64/include/asm/cpufeature.h
> @@ -390,6 +390,7 @@ extern DECLARE_BITMAP(boot_capabilities, ARM64_NPATCHABLE);
>   	for_each_set_bit(cap, cpu_hwcaps, ARM64_NCAPS)
>   
>   bool this_cpu_has_cap(unsigned int cap);
> +bool cpu_has_hw_af(void);
>   void cpu_set_feature(unsigned int num);
>   bool cpu_have_feature(unsigned int num);
>   unsigned long cpu_get_elf_hwcap(void);
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index b1fdc486aed8..c5097f58649d 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -1141,6 +1141,12 @@ static bool has_hw_dbm(const struct arm64_cpu_capabilities *cap,
>   	return true;
>   }
>   
> +/* Decouple AF from AFDBM. */
> +bool cpu_has_hw_af(void)
> +{
Sorry for not having asked this earlier. Are we interested in,

"whether *this* CPU has AF support ?" or "whether *at least one*
CPU has the AF support" ? The following code does the former.

> +	return (read_cpuid(ID_AA64MMFR1_EL1) & 0xf);

Getting the latter is tricky, and I think it is what we are looking
for here. In which case we may need something more to report this.
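
For the per-CPU reading, a sketch using the existing field helper rather
than an open-coded 0xf mask (the AF support is reported in the HAFDBS
field, bits [3:0], hence ID_AA64MMFR1_HADBS_SHIFT below):

	bool cpu_has_hw_af(void)
	{
		u64 mmfr1 = read_cpuid(ID_AA64MMFR1_EL1);

		/* HAFDBS: 0b0001 = AF only, 0b0010 = AF + DBM. */
		return cpuid_feature_extract_unsigned_field(mmfr1,
						ID_AA64MMFR1_HADBS_SHIFT);
	}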

Kind regards
Suzuki

* Re: [PATCH v4 1/3] arm64: cpufeature: introduce helper cpu_has_hw_af()
  2019-09-18 14:20     ` Suzuki K Poulose
@ 2019-09-18 16:45       ` Catalin Marinas
From: Catalin Marinas @ 2019-09-18 16:45 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: Jia He, Will Deacon, Mark Rutland, James Morse, Marc Zyngier,
	Matthew Wilcox, Kirill A. Shutemov, linux-arm-kernel,
	linux-kernel, linux-mm, Punit Agrawal, Anshuman Khandual,
	Jun Yao, Alex Van Brunt, Robin Murphy, Thomas Gleixner,
	Andrew Morton, Jérôme Glisse, Ralph Campbell, hejianet,
	Kaly Xin

On Wed, Sep 18, 2019 at 03:20:41PM +0100, Suzuki K Poulose wrote:
> On 18/09/2019 14:19, Jia He wrote:
> > diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
> > index c96ffa4722d3..206b6e3954cf 100644
> > --- a/arch/arm64/include/asm/cpufeature.h
> > +++ b/arch/arm64/include/asm/cpufeature.h
> > @@ -390,6 +390,7 @@ extern DECLARE_BITMAP(boot_capabilities, ARM64_NPATCHABLE);
> >   	for_each_set_bit(cap, cpu_hwcaps, ARM64_NCAPS)
> >   bool this_cpu_has_cap(unsigned int cap);
> > +bool cpu_has_hw_af(void);
> >   void cpu_set_feature(unsigned int num);
> >   bool cpu_have_feature(unsigned int num);
> >   unsigned long cpu_get_elf_hwcap(void);
> > diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> > index b1fdc486aed8..c5097f58649d 100644
> > --- a/arch/arm64/kernel/cpufeature.c
> > +++ b/arch/arm64/kernel/cpufeature.c
> > @@ -1141,6 +1141,12 @@ static bool has_hw_dbm(const struct arm64_cpu_capabilities *cap,
> >   	return true;
> >   }
> > +/* Decouple AF from AFDBM. */
> > +bool cpu_has_hw_af(void)
> > +{
> Sorry for not having asked this earlier. Are we interested in,
> 
> "whether *this* CPU has AF support ?" or "whether *at least one*
> CPU has the AF support" ? The following code does the former.
> 
> > +	return (read_cpuid(ID_AA64MMFR1_EL1) & 0xf);

In a non-preemptible context, the former is ok (per-CPU).

-- 
Catalin

* Re: [PATCH v4 1/3] arm64: cpufeature: introduce helper cpu_has_hw_af()
  2019-09-18 14:20     ` Matthew Wilcox
@ 2019-09-18 16:49       ` Catalin Marinas
From: Catalin Marinas @ 2019-09-18 16:49 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Jia He, Will Deacon, Mark Rutland, James Morse, Marc Zyngier,
	Kirill A. Shutemov, linux-arm-kernel, linux-kernel, linux-mm,
	Suzuki Poulose, Punit Agrawal, Anshuman Khandual, Jun Yao,
	Alex Van Brunt, Robin Murphy, Thomas Gleixner, Andrew Morton,
	Jérôme Glisse, Ralph Campbell, hejianet, Kaly Xin

On Wed, Sep 18, 2019 at 07:20:17AM -0700, Matthew Wilcox wrote:
> On Wed, Sep 18, 2019 at 09:19:12PM +0800, Jia He wrote:
> > +/* Decouple AF from AFDBM. */
> > +bool cpu_has_hw_af(void)
> > +{
> > +	return (read_cpuid(ID_AA64MMFR1_EL1) & 0xf);
> > +}
> > +
> 
> Do you really want to call read_cpuid() every time?  I would have thought
> you'd want to use the static branch mechanism to do the right thing at
> boot time.  See Documentation/static-keys.txt.

We do have a static branch mechanism for other CPU features, but mostly
because on some big.LITTLE systems the CPUs may differ slightly, so we
only advertise the common features.

I guess the additional instructions for reading and checking the CPUID
here would be lost in the noise compared to the actual page copying.

-- 
Catalin

* Re: [PATCH v4 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared
  2019-09-18 14:00     ` Kirill A. Shutemov
@ 2019-09-18 18:00       ` Catalin Marinas
From: Catalin Marinas @ 2019-09-18 18:00 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Jia He, Will Deacon, Mark Rutland, James Morse, Marc Zyngier,
	Matthew Wilcox, Kirill A. Shutemov, linux-arm-kernel,
	linux-kernel, linux-mm, Suzuki Poulose, Punit Agrawal,
	Anshuman Khandual, Jun Yao, Alex Van Brunt, Robin Murphy,
	Thomas Gleixner, Andrew Morton, Jérôme Glisse,
	Ralph Campbell, hejianet, Kaly Xin

On Wed, Sep 18, 2019 at 05:00:27PM +0300, Kirill A. Shutemov wrote:
> On Wed, Sep 18, 2019 at 09:19:14PM +0800, Jia He wrote:
> > @@ -2152,20 +2163,34 @@ static inline void cow_user_page(struct page *dst, struct page *src, unsigned lo
> >  	 */
> >  	if (unlikely(!src)) {
> >  		void *kaddr = kmap_atomic(dst);
> > -		void __user *uaddr = (void __user *)(va & PAGE_MASK);
> > +		void __user *uaddr = (void __user *)(addr & PAGE_MASK);
> > +		pte_t entry;
> >  
> >  		/*
> >  		 * This really shouldn't fail, because the page is there
> >  		 * in the page tables. But it might just be unreadable,
> >  		 * in which case we just give up and fill the result with
> > -		 * zeroes.
> > +		 * zeroes. On architectures with software "accessed" bits,
> > +		 * we would take a double page fault here, so mark it
> > +		 * accessed here.
> >  		 */
> > +		if (arch_faults_on_old_pte() && !pte_young(vmf->orig_pte)) {
> > +			spin_lock(vmf->ptl);
> > +			if (likely(pte_same(*vmf->pte, vmf->orig_pte))) {
> > +				entry = pte_mkyoung(vmf->orig_pte);
> > +				if (ptep_set_access_flags(vma, addr,
> > +							  vmf->pte, entry, 0))
> > +					update_mmu_cache(vma, addr, vmf->pte);
> > +			}
> 
> I don't follow.
> 
> So if pte has changed under you, you don't set the accessed bit, but never
> the less copy from the user.
> 
> What makes you think it will not trigger the same problem?
> 
> I think we need to make cow_user_page() fail in this case and caller --
> wp_page_copy() -- return zero. If the fault was solved by other thread, we
> are fine. If not userspace would re-fault on the same address and we will
> handle the fault from the second attempt.

It would be nice to clarify the semantics of this function and do as
you suggest but the current comment is slightly confusing:

	/*
	 * If the source page was a PFN mapping, we don't have
	 * a "struct page" for it. We do a best-effort copy by
	 * just copying from the original user address. If that
	 * fails, we just zero-fill it. Live with it.
	 */

Would any user-space rely on getting a zero-filled page here instead of
a recursive fault?

-- 
Catalin

* Re: [PATCH v4 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared
  2019-09-18 13:19   ` Jia He
@ 2019-09-18 19:35     ` kbuild test robot
From: kbuild test robot @ 2019-09-18 19:35 UTC (permalink / raw)
  To: Jia He
  Cc: kbuild-all, Catalin Marinas, Will Deacon, Mark Rutland,
	James Morse, Marc Zyngier, Matthew Wilcox, Kirill A. Shutemov,
	linux-arm-kernel, linux-kernel, linux-mm, Suzuki Poulose,
	Punit Agrawal, Anshuman Khandual, Jun Yao, Alex Van Brunt,
	Robin Murphy, Thomas Gleixner, Andrew Morton,
	Jérôme Glisse, Ralph Campbell, hejianet,
	Kaly Xin, Jia He

Hi Jia,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[cannot apply to v5.3 next-20190917]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Jia-He/fix-double-page-fault-on-arm64/20190918-220036
config: arm64-allnoconfig (attached as .config)
compiler: aarch64-linux-gcc (GCC) 7.4.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.4.0 make.cross ARCH=arm64 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   mm/memory.o: In function `wp_page_copy':
>> memory.c:(.text+0x8fc): undefined reference to `cpu_has_hw_af'
   memory.c:(.text+0x8fc): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `cpu_has_hw_af'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


* RE: [PATCH v4 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared
  2019-09-18 19:35     ` kbuild test robot
@ 2019-09-19  1:46       ` Justin He (Arm Technology China)
  -1 siblings, 0 replies; 39+ messages in thread
From: Justin He (Arm Technology China) @ 2019-09-19  1:46 UTC (permalink / raw)
  To: kbuild test robot
  Cc: kbuild-all, Catalin Marinas, Will Deacon, Mark Rutland,
	James Morse, Marc Zyngier, Matthew Wilcox, Kirill A. Shutemov,
	linux-arm-kernel, linux-kernel, linux-mm, Suzuki Poulose,
	Punit Agrawal, Anshuman Khandual, Jun Yao, Alex Van Brunt,
	Robin Murphy, Thomas Gleixner, Andrew Morton, jglisse,
	Ralph Campbell, hejianet, Kaly Xin (Arm Technology China)



> -----Original Message-----
> From: kbuild test robot <lkp@intel.com>
> Sent: 19 September 2019 3:36
> To: Justin He (Arm Technology China) <Justin.He@arm.com>
> Cc: kbuild-all@01.org; Catalin Marinas <Catalin.Marinas@arm.com>; Will
> Deacon <will@kernel.org>; Mark Rutland <Mark.Rutland@arm.com>;
> James Morse <James.Morse@arm.com>; Marc Zyngier <maz@kernel.org>;
> Matthew Wilcox <willy@infradead.org>; Kirill A. Shutemov
> <kirill.shutemov@linux.intel.com>; linux-arm-kernel@lists.infradead.org;
> linux-kernel@vger.kernel.org; linux-mm@kvack.org; Suzuki Poulose
> <Suzuki.Poulose@arm.com>; Punit Agrawal <punitagrawal@gmail.com>;
> Anshuman Khandual <Anshuman.Khandual@arm.com>; Jun Yao
> <yaojun8558363@gmail.com>; Alex Van Brunt <avanbrunt@nvidia.com>;
> Robin Murphy <Robin.Murphy@arm.com>; Thomas Gleixner
> <tglx@linutronix.de>; Andrew Morton <akpm@linux-foundation.org>;
> jglisse@redhat.com; Ralph Campbell <rcampbell@nvidia.com>;
> hejianet@gmail.com; Kaly Xin (Arm Technology China)
> <Kaly.Xin@arm.com>; Justin He (Arm Technology China)
> <Justin.He@arm.com>
> Subject: Re: [PATCH v4 3/3] mm: fix double page fault on arm64 if PTE_AF
> is cleared
>
> Hi Jia,
>
> Thank you for the patch! Yet something to improve:
>
> [auto build test ERROR on linus/master]
> [cannot apply to v5.3 next-20190917]
> [if your patch is applied to the wrong git tree, please drop us a note to help
> improve the system]
>
> url:    https://github.com/0day-ci/linux/commits/Jia-He/fix-double-page-
> fault-on-arm64/20190918-220036
> config: arm64-allnoconfig (attached as .config)
> compiler: aarch64-linux-gcc (GCC) 7.4.0
> reproduce:
>         wget https://raw.githubusercontent.com/intel/lkp-
> tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # save the attached .config to linux build tree
>         GCC_VERSION=7.4.0 make.cross ARCH=arm64
>
> If you fix the issue, kindly add following tag
> Reported-by: kbuild test robot <lkp@intel.com>
>
> All errors (new ones prefixed by >>):
>
>    mm/memory.o: In function `wp_page_copy':
> >> memory.c:(.text+0x8fc): undefined reference to `cpu_has_hw_af'
>    memory.c:(.text+0x8fc): relocation truncated to fit: R_AARCH64_CALL26
> against undefined symbol `cpu_has_hw_af'
>
Ah, I should add a stub for the case where CONFIG_ARM64_HW_AFDBM is 'N' on arm64.
Will fix it ASAP.
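
For reference, a minimal sketch of such a stub (the placement and exact
shape are assumptions based on this thread, not the final patch). With
CONFIG_ARM64_HW_AFDBM disabled the kernel never enables hardware Access
Flag management, so cpu_has_hw_af() can degrade to a static inline that
reports it as unavailable, and mm/memory.c always links:

/* arch/arm64/include/asm/cpufeature.h -- sketch only */
#ifdef CONFIG_ARM64_HW_AFDBM
bool cpu_has_hw_af(void);
#else
static inline bool cpu_has_hw_af(void)
{
	/* Hardware AF is never enabled in this configuration */
	return false;
}
#endif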

--
Cheers,
Justin (Jia He)


> 0-DAY kernel test infrastructure                Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

^ permalink raw reply	[flat|nested] 39+ messages in thread

* RE: [PATCH v4 1/3] arm64: cpufeature: introduce helper cpu_has_hw_af()
  2019-09-18 16:45       ` Catalin Marinas
@ 2019-09-19  1:55         ` Justin He (Arm Technology China)
  -1 siblings, 0 replies; 39+ messages in thread
From: Justin He (Arm Technology China) @ 2019-09-19  1:55 UTC (permalink / raw)
  To: Catalin Marinas, Suzuki Poulose
  Cc: Will Deacon, Mark Rutland, James Morse, Marc Zyngier,
	Matthew Wilcox, Kirill A. Shutemov, linux-arm-kernel,
	linux-kernel, linux-mm, Punit Agrawal, Anshuman Khandual,
	Jun Yao, Alex Van Brunt, Robin Murphy, Thomas Gleixner,
	Andrew Morton, Jérôme Glisse, Ralph Campbell, hejianet,
	Kaly Xin (Arm Technology China)

Hi Suzuki

> -----Original Message-----
> From: Catalin Marinas <catalin.marinas@arm.com>
> Sent: 19 September 2019 0:46
> To: Suzuki Poulose <Suzuki.Poulose@arm.com>
> Cc: Justin He (Arm Technology China) <Justin.He@arm.com>; Will Deacon
> <will@kernel.org>; Mark Rutland <Mark.Rutland@arm.com>; James Morse
> <James.Morse@arm.com>; Marc Zyngier <maz@kernel.org>; Matthew
> Wilcox <willy@infradead.org>; Kirill A. Shutemov
> <kirill.shutemov@linux.intel.com>; linux-arm-kernel@lists.infradead.org;
> linux-kernel@vger.kernel.org; linux-mm@kvack.org; Punit Agrawal
> <punitagrawal@gmail.com>; Anshuman Khandual
> <Anshuman.Khandual@arm.com>; Jun Yao <yaojun8558363@gmail.com>;
> Alex Van Brunt <avanbrunt@nvidia.com>; Robin Murphy
> <Robin.Murphy@arm.com>; Thomas Gleixner <tglx@linutronix.de>;
> Andrew Morton <akpm@linux-foundation.org>; Jérôme Glisse
> <jglisse@redhat.com>; Ralph Campbell <rcampbell@nvidia.com>;
> hejianet@gmail.com; Kaly Xin (Arm Technology China) <Kaly.Xin@arm.com>
> Subject: Re: [PATCH v4 1/3] arm64: cpufeature: introduce helper
> cpu_has_hw_af()
>
> On Wed, Sep 18, 2019 at 03:20:41PM +0100, Suzuki K Poulose wrote:
> > On 18/09/2019 14:19, Jia He wrote:
> > > diff --git a/arch/arm64/include/asm/cpufeature.h
> b/arch/arm64/include/asm/cpufeature.h
> > > index c96ffa4722d3..206b6e3954cf 100644
> > > --- a/arch/arm64/include/asm/cpufeature.h
> > > +++ b/arch/arm64/include/asm/cpufeature.h
> > > @@ -390,6 +390,7 @@ extern DECLARE_BITMAP(boot_capabilities,
> ARM64_NPATCHABLE);
> > >           for_each_set_bit(cap, cpu_hwcaps, ARM64_NCAPS)
> > >   bool this_cpu_has_cap(unsigned int cap);
> > > +bool cpu_has_hw_af(void);
> > >   void cpu_set_feature(unsigned int num);
> > >   bool cpu_have_feature(unsigned int num);
> > >   unsigned long cpu_get_elf_hwcap(void);
> > > diff --git a/arch/arm64/kernel/cpufeature.c
> b/arch/arm64/kernel/cpufeature.c
> > > index b1fdc486aed8..c5097f58649d 100644
> > > --- a/arch/arm64/kernel/cpufeature.c
> > > +++ b/arch/arm64/kernel/cpufeature.c
> > > @@ -1141,6 +1141,12 @@ static bool has_hw_dbm(const struct
> arm64_cpu_capabilities *cap,
> > >           return true;
> > >   }
> > > +/* Decouple AF from AFDBM. */
> > > +bool cpu_has_hw_af(void)
> > > +{
> > Sorry for not having asked this earlier. Are we interested in,
> >
> > "whether *this* CPU has AF support ?" or "whether *at least one*
> > CPU has the AF support" ? The following code does the former.
> >
> > > + return (read_cpuid(ID_AA64MMFR1_EL1) & 0xf);
>
> In a non-preemptible context, the former is ok (per-CPU).

Yes, just as Catalin explained, we need the former because the page
fault can occur on any CPU.
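
To make the distinction concrete, here is a sketch of both flavours,
plus the way patch 2/3 is expected to consume the helper (the
system-wide variant and its name are purely illustrative assumptions,
not part of this series as posted):

/* Per-CPU check, as in this patch: bits [3:0] (HAFDBS) of this CPU's
 * ID_AA64MMFR1_EL1 are nonzero when it can set the Access Flag in
 * hardware.  read_cpuid() reflects only the current CPU, so this is
 * only meaningful in non-preemptible context -- which holds while
 * handling a page fault. */
bool cpu_has_hw_af(void)
{
	return !!(read_cpuid(ID_AA64MMFR1_EL1) & 0xf);
}

/* Hypothetical system-wide alternative: the sanitised register holds
 * the lowest common value across all CPUs and is safe anywhere. */
bool system_has_hw_af(void)
{
	u64 mmfr1 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);

	return !!cpuid_feature_extract_unsigned_field(mmfr1,
						ID_AA64MMFR1_HADBS_SHIFT);
}

/* Expected wiring in patch 2/3 (sketch): fault on old PTEs only when
 * this CPU lacks hardware AF. */
#define arch_faults_on_old_pte arch_faults_on_old_pte
static inline bool arch_faults_on_old_pte(void)
{
	WARN_ON(preemptible());
	return !cpu_has_hw_af();
}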

--
Cheers,
Justin (Jia He)


>
> --
> Catalin

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v4 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared
  2019-09-18 14:00     ` Kirill A. Shutemov
@ 2019-09-19  2:16       ` Jia He
  -1 siblings, 0 replies; 39+ messages in thread
From: Jia He @ 2019-09-19  2:16 UTC (permalink / raw)
  To: Kirill A. Shutemov, Jia He
  Cc: Catalin Marinas, Will Deacon, Mark Rutland, James Morse,
	Marc Zyngier, Matthew Wilcox, Kirill A. Shutemov,
	linux-arm-kernel, linux-kernel, linux-mm, Suzuki Poulose,
	Punit Agrawal, Anshuman Khandual, Jun Yao, Alex Van Brunt,
	Robin Murphy, Thomas Gleixner, Andrew Morton,
	Jérôme Glisse, Ralph Campbell, Kaly Xin

Hi Kirill

[On behalf of justin.he@arm.com because some mails are filtered...]

On 2019/9/18 22:00, Kirill A. Shutemov wrote:
> On Wed, Sep 18, 2019 at 09:19:14PM +0800, Jia He wrote:
>> When we tested pmdk unit test [1] vmmalloc_fork TEST1 in arm64 guest, there
>> will be a double page fault in __copy_from_user_inatomic of cow_user_page.
>>
>> Below call trace is from arm64 do_page_fault for debugging purpose
>> [  110.016195] Call trace:
>> [  110.016826]  do_page_fault+0x5a4/0x690
>> [  110.017812]  do_mem_abort+0x50/0xb0
>> [  110.018726]  el1_da+0x20/0xc4
>> [  110.019492]  __arch_copy_from_user+0x180/0x280
>> [  110.020646]  do_wp_page+0xb0/0x860
>> [  110.021517]  __handle_mm_fault+0x994/0x1338
>> [  110.022606]  handle_mm_fault+0xe8/0x180
>> [  110.023584]  do_page_fault+0x240/0x690
>> [  110.024535]  do_mem_abort+0x50/0xb0
>> [  110.025423]  el0_da+0x20/0x24
>>
>> The pte info before __copy_from_user_inatomic is (PTE_AF is cleared):
>> [ffff9b007000] pgd=000000023d4f8003, pud=000000023da9b003, pmd=000000023d4b3003, pte=360000298607bd3
>>
>> As told by Catalin: "On arm64 without hardware Access Flag, copying from
>> user will fail because the pte is old and cannot be marked young. So we
>> always end up with zeroed page after fork() + CoW for pfn mappings. we
>> don't always have a hardware-managed access flag on arm64."
>>
>> This patch fix it by calling pte_mkyoung. Also, the parameter is
>> changed because vmf should be passed to cow_user_page()
>>
>> [1] https://github.com/pmem/pmdk/tree/master/src/test/vmmalloc_fork
>>
>> Reported-by: Yibo Cai <Yibo.Cai@arm.com>
>> Signed-off-by: Jia He <justin.he@arm.com>
>> ---
>>   mm/memory.c | 35 ++++++++++++++++++++++++++++++-----
>>   1 file changed, 30 insertions(+), 5 deletions(-)
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index e2bb51b6242e..d2c130a5883b 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -118,6 +118,13 @@ int randomize_va_space __read_mostly =
>>   					2;
>>   #endif
>>   
>> +#ifndef arch_faults_on_old_pte
>> +static inline bool arch_faults_on_old_pte(void)
>> +{
>> +	return false;
>> +}
>> +#endif
>> +
>>   static int __init disable_randmaps(char *s)
>>   {
>>   	randomize_va_space = 0;
>> @@ -2140,8 +2147,12 @@ static inline int pte_unmap_same(struct mm_struct *mm, pmd_t *pmd,
>>   	return same;
>>   }
>>   
>> -static inline void cow_user_page(struct page *dst, struct page *src, unsigned long va, struct vm_area_struct *vma)
>> +static inline void cow_user_page(struct page *dst, struct page *src,
>> +				 struct vm_fault *vmf)
>>   {
>> +	struct vm_area_struct *vma = vmf->vma;
>> +	unsigned long addr = vmf->address;
>> +
>>   	debug_dma_assert_idle(src);
>>   
>>   	/*
>> @@ -2152,20 +2163,34 @@ static inline void cow_user_page(struct page *dst, struct page *src, unsigned lo
>>   	 */
>>   	if (unlikely(!src)) {
>>   		void *kaddr = kmap_atomic(dst);
>> -		void __user *uaddr = (void __user *)(va & PAGE_MASK);
>> +		void __user *uaddr = (void __user *)(addr & PAGE_MASK);
>> +		pte_t entry;
>>   
>>   		/*
>>   		 * This really shouldn't fail, because the page is there
>>   		 * in the page tables. But it might just be unreadable,
>>   		 * in which case we just give up and fill the result with
>> -		 * zeroes.
>> +		 * zeroes. On architectures with software "accessed" bits,
>> +		 * we would take a double page fault here, so mark it
>> +		 * accessed here.
>>   		 */
>> +		if (arch_faults_on_old_pte() && !pte_young(vmf->orig_pte)) {
>> +			spin_lock(vmf->ptl);
>> +			if (likely(pte_same(*vmf->pte, vmf->orig_pte))) {
>> +				entry = pte_mkyoung(vmf->orig_pte);
>> +				if (ptep_set_access_flags(vma, addr,
>> +							  vmf->pte, entry, 0))
>> +					update_mmu_cache(vma, addr, vmf->pte);
>> +			}
> I don't follow.
>
> So if pte has changed under you, you don't set the accessed bit, but never
> the less copy from the user.
>
> What makes you think it will not trigger the same problem?
>
> I think we need to make cow_user_page() fail in this case and caller --
> wp_page_copy() -- return zero. If the fault was solved by other thread, we
> are fine. If not userspace would re-fault on the same address and we will
> handle the fault from the second attempt.

Thanks for pointing that out. How about making cow_user_page() return
VM_FAULT_RETRY? Then do_page_fault() could retry the page fault?

---
Cheers,
Justin (Jia He)

>
>> +			spin_unlock(vmf->ptl);
>> +		}
>> +
>>   		if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE))
>>   			clear_page(kaddr);
>>   		kunmap_atomic(kaddr);
>>   		flush_dcache_page(dst);
>>   	} else
>> -		copy_user_highpage(dst, src, va, vma);
>> +		copy_user_highpage(dst, src, addr, vma);
>>   }
>>   
>>   static gfp_t __get_fault_gfp_mask(struct vm_area_struct *vma)
>> @@ -2318,7 +2343,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
>>   				vmf->address);
>>   		if (!new_page)
>>   			goto oom;
>> -		cow_user_page(new_page, old_page, vmf->address, vma);
>> +		cow_user_page(new_page, old_page, vmf);
>>   	}
>>   
>>   	if (mem_cgroup_try_charge_delay(new_page, mm, GFP_KERNEL, &memcg, false))
>> -- 
>> 2.17.1
>>
>>
-- 


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v4 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared
  2019-09-19  2:16       ` Jia He
@ 2019-09-19 14:57         ` Kirill A. Shutemov
  -1 siblings, 0 replies; 39+ messages in thread
From: Kirill A. Shutemov @ 2019-09-19 14:57 UTC (permalink / raw)
  To: Jia He
  Cc: Jia He, Catalin Marinas, Will Deacon, Mark Rutland, James Morse,
	Marc Zyngier, Matthew Wilcox, Kirill A. Shutemov,
	linux-arm-kernel, linux-kernel, linux-mm, Suzuki Poulose,
	Punit Agrawal, Anshuman Khandual, Jun Yao, Alex Van Brunt,
	Robin Murphy, Thomas Gleixner, Andrew Morton,
	Jérôme Glisse, Ralph Campbell, Kaly Xin

On Thu, Sep 19, 2019 at 10:16:34AM +0800, Jia He wrote:
> Hi Kirill
> 
> [On behalf of justin.he@arm.com because some mails are filtered...]
> 
> On 2019/9/18 22:00, Kirill A. Shutemov wrote:
> > On Wed, Sep 18, 2019 at 09:19:14PM +0800, Jia He wrote:
> > > When we tested pmdk unit test [1] vmmalloc_fork TEST1 in arm64 guest, there
> > > will be a double page fault in __copy_from_user_inatomic of cow_user_page.
> > > 
> > > Below call trace is from arm64 do_page_fault for debugging purpose
> > > [  110.016195] Call trace:
> > > [  110.016826]  do_page_fault+0x5a4/0x690
> > > [  110.017812]  do_mem_abort+0x50/0xb0
> > > [  110.018726]  el1_da+0x20/0xc4
> > > [  110.019492]  __arch_copy_from_user+0x180/0x280
> > > [  110.020646]  do_wp_page+0xb0/0x860
> > > [  110.021517]  __handle_mm_fault+0x994/0x1338
> > > [  110.022606]  handle_mm_fault+0xe8/0x180
> > > [  110.023584]  do_page_fault+0x240/0x690
> > > [  110.024535]  do_mem_abort+0x50/0xb0
> > > [  110.025423]  el0_da+0x20/0x24
> > > 
> > > The pte info before __copy_from_user_inatomic is (PTE_AF is cleared):
> > > [ffff9b007000] pgd=000000023d4f8003, pud=000000023da9b003, pmd=000000023d4b3003, pte=360000298607bd3
> > > 
> > > As told by Catalin: "On arm64 without hardware Access Flag, copying from
> > > user will fail because the pte is old and cannot be marked young. So we
> > > always end up with zeroed page after fork() + CoW for pfn mappings. we
> > > don't always have a hardware-managed access flag on arm64."
> > > 
> > > This patch fix it by calling pte_mkyoung. Also, the parameter is
> > > changed because vmf should be passed to cow_user_page()
> > > 
> > > [1] https://github.com/pmem/pmdk/tree/master/src/test/vmmalloc_fork
> > > 
> > > Reported-by: Yibo Cai <Yibo.Cai@arm.com>
> > > Signed-off-by: Jia He <justin.he@arm.com>
> > > ---
> > >   mm/memory.c | 35 ++++++++++++++++++++++++++++++-----
> > >   1 file changed, 30 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/mm/memory.c b/mm/memory.c
> > > index e2bb51b6242e..d2c130a5883b 100644
> > > --- a/mm/memory.c
> > > +++ b/mm/memory.c
> > > @@ -118,6 +118,13 @@ int randomize_va_space __read_mostly =
> > >   					2;
> > >   #endif
> > > +#ifndef arch_faults_on_old_pte
> > > +static inline bool arch_faults_on_old_pte(void)
> > > +{
> > > +	return false;
> > > +}
> > > +#endif
> > > +
> > >   static int __init disable_randmaps(char *s)
> > >   {
> > >   	randomize_va_space = 0;
> > > @@ -2140,8 +2147,12 @@ static inline int pte_unmap_same(struct mm_struct *mm, pmd_t *pmd,
> > >   	return same;
> > >   }
> > > -static inline void cow_user_page(struct page *dst, struct page *src, unsigned long va, struct vm_area_struct *vma)
> > > +static inline void cow_user_page(struct page *dst, struct page *src,
> > > +				 struct vm_fault *vmf)
> > >   {
> > > +	struct vm_area_struct *vma = vmf->vma;
> > > +	unsigned long addr = vmf->address;
> > > +
> > >   	debug_dma_assert_idle(src);
> > >   	/*
> > > @@ -2152,20 +2163,34 @@ static inline void cow_user_page(struct page *dst, struct page *src, unsigned lo
> > >   	 */
> > >   	if (unlikely(!src)) {
> > >   		void *kaddr = kmap_atomic(dst);
> > > -		void __user *uaddr = (void __user *)(va & PAGE_MASK);
> > > +		void __user *uaddr = (void __user *)(addr & PAGE_MASK);
> > > +		pte_t entry;
> > >   		/*
> > >   		 * This really shouldn't fail, because the page is there
> > >   		 * in the page tables. But it might just be unreadable,
> > >   		 * in which case we just give up and fill the result with
> > > -		 * zeroes.
> > > +		 * zeroes. On architectures with software "accessed" bits,
> > > +		 * we would take a double page fault here, so mark it
> > > +		 * accessed here.
> > >   		 */
> > > +		if (arch_faults_on_old_pte() && !pte_young(vmf->orig_pte)) {
> > > +			spin_lock(vmf->ptl);
> > > +			if (likely(pte_same(*vmf->pte, vmf->orig_pte))) {
> > > +				entry = pte_mkyoung(vmf->orig_pte);
> > > +				if (ptep_set_access_flags(vma, addr,
> > > +							  vmf->pte, entry, 0))
> > > +					update_mmu_cache(vma, addr, vmf->pte);
> > > +			}
> > I don't follow.
> > 
> > So if pte has changed under you, you don't set the accessed bit, but never
> > the less copy from the user.
> > 
> > What makes you think it will not trigger the same problem?
> > 
> > I think we need to make cow_user_page() fail in this case and caller --
> > wp_page_copy() -- return zero. If the fault was solved by other thread, we
> > are fine. If not userspace would re-fault on the same address and we will
> > handle the fault from the second attempt.
> 
> Thanks for pointing that out. How about making cow_user_page() return
> VM_FAULT_RETRY? Then do_page_fault() could retry the page fault?

No. VM_FAULT_RETRY has different semantics: we have to drop mmap_sem,
take it again and handle the fault from scratch. In this case the more
likely scenario is that the other thread has already handled the fault and we
don't need to do anything. If that's not the case, the fault will be
triggered again on the same address.
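
To make the suggested flow concrete, a rough sketch (the bool return
and the exact error path are illustrative assumptions, not the patch as
merged): cow_user_page() reports failure when the PTE has changed under
it, and wp_page_copy() backs out with 0 so that userspace simply
re-faults if the concurrent fault did not already resolve things.

static inline bool cow_user_page(struct page *dst, struct page *src,
				 struct vm_fault *vmf)
{
	struct vm_area_struct *vma = vmf->vma;
	unsigned long addr = vmf->address;
	bool ret = true;

	if (unlikely(!src)) {
		void *kaddr = kmap_atomic(dst);
		void __user *uaddr = (void __user *)(addr & PAGE_MASK);

		if (arch_faults_on_old_pte() && !pte_young(vmf->orig_pte)) {
			spin_lock(vmf->ptl);
			if (pte_same(*vmf->pte, vmf->orig_pte)) {
				pte_t entry = pte_mkyoung(vmf->orig_pte);

				if (ptep_set_access_flags(vma, addr, vmf->pte,
							  entry, 0))
					update_mmu_cache(vma, addr, vmf->pte);
			} else {
				/* PTE changed under us: don't copy, bail out */
				ret = false;
			}
			spin_unlock(vmf->ptl);
		}

		if (ret && __copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE))
			clear_page(kaddr);
		kunmap_atomic(kaddr);
		flush_dcache_page(dst);
	} else {
		copy_user_highpage(dst, src, addr, vma);
	}

	return ret;
}

Caller side in wp_page_copy(), also a sketch:

	if (!cow_user_page(new_page, old_page, vmf)) {
		/* Fault handled concurrently, or userspace will re-fault. */
		put_page(new_page);
		if (old_page)
			put_page(old_page);
		return 0;
	}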

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v4 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared
  2019-09-18 18:00       ` Catalin Marinas
@ 2019-09-19 15:00         ` Kirill A. Shutemov
  -1 siblings, 0 replies; 39+ messages in thread
From: Kirill A. Shutemov @ 2019-09-19 15:00 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Jia He, Will Deacon, Mark Rutland, James Morse, Marc Zyngier,
	Matthew Wilcox, Kirill A. Shutemov, linux-arm-kernel,
	linux-kernel, linux-mm, Suzuki Poulose, Punit Agrawal,
	Anshuman Khandual, Jun Yao, Alex Van Brunt, Robin Murphy,
	Thomas Gleixner, Andrew Morton, Jérôme Glisse,
	Ralph Campbell, hejianet, Kaly Xin

On Wed, Sep 18, 2019 at 07:00:30PM +0100, Catalin Marinas wrote:
> On Wed, Sep 18, 2019 at 05:00:27PM +0300, Kirill A. Shutemov wrote:
> > On Wed, Sep 18, 2019 at 09:19:14PM +0800, Jia He wrote:
> > > @@ -2152,20 +2163,34 @@ static inline void cow_user_page(struct page *dst, struct page *src, unsigned lo
> > >  	 */
> > >  	if (unlikely(!src)) {
> > >  		void *kaddr = kmap_atomic(dst);
> > > -		void __user *uaddr = (void __user *)(va & PAGE_MASK);
> > > +		void __user *uaddr = (void __user *)(addr & PAGE_MASK);
> > > +		pte_t entry;
> > >  
> > >  		/*
> > >  		 * This really shouldn't fail, because the page is there
> > >  		 * in the page tables. But it might just be unreadable,
> > >  		 * in which case we just give up and fill the result with
> > > -		 * zeroes.
> > > +		 * zeroes. On architectures with software "accessed" bits,
> > > +		 * we would take a double page fault here, so mark it
> > > +		 * accessed here.
> > >  		 */
> > > +		if (arch_faults_on_old_pte() && !pte_young(vmf->orig_pte)) {
> > > +			spin_lock(vmf->ptl);
> > > +			if (likely(pte_same(*vmf->pte, vmf->orig_pte))) {
> > > +				entry = pte_mkyoung(vmf->orig_pte);
> > > +				if (ptep_set_access_flags(vma, addr,
> > > +							  vmf->pte, entry, 0))
> > > +					update_mmu_cache(vma, addr, vmf->pte);
> > > +			}
> > 
> > I don't follow.
> > 
> > So if pte has changed under you, you don't set the accessed bit, but never
> > the less copy from the user.
> > 
> > What makes you think it will not trigger the same problem?
> > 
> > I think we need to make cow_user_page() fail in this case and caller --
> > wp_page_copy() -- return zero. If the fault was solved by other thread, we
> > are fine. If not userspace would re-fault on the same address and we will
> > handle the fault from the second attempt.
> 
> It would be nice to clarify the semantics of this function and do as
> you suggest but the current comment is slightly confusing:
> 
> 	/*
> 	 * If the source page was a PFN mapping, we don't have
> 	 * a "struct page" for it. We do a best-effort copy by
> 	 * just copying from the original user address. If that
> 	 * fails, we just zero-fill it. Live with it.
> 	 */
> 
> Would any user-space rely on getting a zero-filled page here instead of
> a recursive fault?

I don't see the point of a zero-filled page in this case. SIGBUS sounds like
a more appropriate response, no?
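
In code terms the SIGBUS option would be a small change to the
hypothetical error path sketched earlier in the thread (again
illustrative only; the thread did not settle on this):

	/* Inside cow_user_page(), instead of zero-filling on failure: */
	if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE)) {
		kunmap_atomic(kaddr);
		return false;		/* report the failure to the caller */
	}

	/* ...and in wp_page_copy(): */
	if (!cow_user_page(new_page, old_page, vmf)) {
		put_page(new_page);
		if (old_page)
			put_page(old_page);
		return VM_FAULT_SIGBUS;	/* user sees SIGBUS, not zeroes */
	}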

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 39+ messages in thread

* RE: [PATCH v4 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared
  2019-09-19 14:57         ` Kirill A. Shutemov
@ 2019-09-19 15:02           ` Justin He (Arm Technology China)
  -1 siblings, 0 replies; 39+ messages in thread
From: Justin He (Arm Technology China) @ 2019-09-19 15:02 UTC (permalink / raw)
  To: Kirill A. Shutemov, Jia He
  Cc: Catalin Marinas, Will Deacon, Mark Rutland, James Morse,
	Marc Zyngier, Matthew Wilcox, Kirill A. Shutemov,
	linux-arm-kernel, linux-kernel, linux-mm, Suzuki Poulose,
	Punit Agrawal, Anshuman Khandual, Jun Yao, Alex Van Brunt,
	Robin Murphy, Thomas Gleixner, Andrew Morton,
	Jérôme Glisse, Ralph Campbell,
	Kaly Xin (Arm Technology China)

Hi Kirill
Thanks for the detailed explanation.

--
Cheers,
Justin (Jia He)



> -----Original Message-----
> From: Kirill A. Shutemov <kirill@shutemov.name>
> Sent: 19 September 2019 22:58
> To: Jia He <hejianet@gmail.com>
> Cc: Justin He (Arm Technology China) <Justin.He@arm.com>; Catalin
> Marinas <Catalin.Marinas@arm.com>; Will Deacon <will@kernel.org>; Mark
> Rutland <Mark.Rutland@arm.com>; James Morse
> <James.Morse@arm.com>; Marc Zyngier <maz@kernel.org>; Matthew
> Wilcox <willy@infradead.org>; Kirill A. Shutemov
> <kirill.shutemov@linux.intel.com>; linux-arm-kernel@lists.infradead.org;
> linux-kernel@vger.kernel.org; linux-mm@kvack.org; Suzuki Poulose
> <Suzuki.Poulose@arm.com>; Punit Agrawal <punitagrawal@gmail.com>;
> Anshuman Khandual <Anshuman.Khandual@arm.com>; Jun Yao
> <yaojun8558363@gmail.com>; Alex Van Brunt <avanbrunt@nvidia.com>;
> Robin Murphy <Robin.Murphy@arm.com>; Thomas Gleixner
> <tglx@linutronix.de>; Andrew Morton <akpm@linux-foundation.org>;
> Jérôme Glisse <jglisse@redhat.com>; Ralph Campbell
> <rcampbell@nvidia.com>; Kaly Xin (Arm Technology China)
> <Kaly.Xin@arm.com>
> Subject: Re: [PATCH v4 3/3] mm: fix double page fault on arm64 if PTE_AF is
> cleared
>
> On Thu, Sep 19, 2019 at 10:16:34AM +0800, Jia He wrote:
> > Hi Kirill
> >
> > [On behalf of justin.he@arm.com because some mails are filtered...]
> >
> > On 2019/9/18 22:00, Kirill A. Shutemov wrote:
> > > On Wed, Sep 18, 2019 at 09:19:14PM +0800, Jia He wrote:
> > > > When we tested pmdk unit test [1] vmmalloc_fork TEST1 in arm64
> guest, there
> > > > will be a double page fault in __copy_from_user_inatomic of
> cow_user_page.
> > > >
> > > > Below call trace is from arm64 do_page_fault for debugging purpose
> > > > [  110.016195] Call trace:
> > > > [  110.016826]  do_page_fault+0x5a4/0x690
> > > > [  110.017812]  do_mem_abort+0x50/0xb0
> > > > [  110.018726]  el1_da+0x20/0xc4
> > > > [  110.019492]  __arch_copy_from_user+0x180/0x280
> > > > [  110.020646]  do_wp_page+0xb0/0x860
> > > > [  110.021517]  __handle_mm_fault+0x994/0x1338
> > > > [  110.022606]  handle_mm_fault+0xe8/0x180
> > > > [  110.023584]  do_page_fault+0x240/0x690
> > > > [  110.024535]  do_mem_abort+0x50/0xb0
> > > > [  110.025423]  el0_da+0x20/0x24
> > > >
> > > > The pte info before __copy_from_user_inatomic is (PTE_AF is cleared):
> > > > [ffff9b007000] pgd=000000023d4f8003, pud=000000023da9b003,
> pmd=000000023d4b3003, pte=360000298607bd3
> > > >
> > > > As told by Catalin: "On arm64 without hardware Access Flag, copying
> from
> > > > user will fail because the pte is old and cannot be marked young. So
> we
> > > > always end up with zeroed page after fork() + CoW for pfn mappings.
> we
> > > > don't always have a hardware-managed access flag on arm64."
> > > >
> > > > This patch fix it by calling pte_mkyoung. Also, the parameter is
> > > > changed because vmf should be passed to cow_user_page()
> > > >
> > > > [1]
> https://github.com/pmem/pmdk/tree/master/src/test/vmmalloc_fork
> > > >
> > > > Reported-by: Yibo Cai <Yibo.Cai@arm.com>
> > > > Signed-off-by: Jia He <justin.he@arm.com>
> > > > ---
> > > >   mm/memory.c | 35 ++++++++++++++++++++++++++++++-----
> > > >   1 file changed, 30 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/mm/memory.c b/mm/memory.c
> > > > index e2bb51b6242e..d2c130a5883b 100644
> > > > --- a/mm/memory.c
> > > > +++ b/mm/memory.c
> > > > @@ -118,6 +118,13 @@ int randomize_va_space __read_mostly =
> > > >                                         2;
> > > >   #endif
> > > > +#ifndef arch_faults_on_old_pte
> > > > +static inline bool arch_faults_on_old_pte(void)
> > > > +{
> > > > +       return false;
> > > > +}
> > > > +#endif
> > > > +
> > > >   static int __init disable_randmaps(char *s)
> > > >   {
> > > >         randomize_va_space = 0;
> > > > @@ -2140,8 +2147,12 @@ static inline int pte_unmap_same(struct
> mm_struct *mm, pmd_t *pmd,
> > > >         return same;
> > > >   }
> > > > -static inline void cow_user_page(struct page *dst, struct page *src,
> unsigned long va, struct vm_area_struct *vma)
> > > > +static inline void cow_user_page(struct page *dst, struct page *src,
> > > > +                                struct vm_fault *vmf)
> > > >   {
> > > > +       struct vm_area_struct *vma = vmf->vma;
> > > > +       unsigned long addr = vmf->address;
> > > > +
> > > >         debug_dma_assert_idle(src);
> > > >         /*
> > > > @@ -2152,20 +2163,34 @@ static inline void cow_user_page(struct
> page *dst, struct page *src, unsigned lo
> > > >          */
> > > >         if (unlikely(!src)) {
> > > >                 void *kaddr = kmap_atomic(dst);
> > > > -               void __user *uaddr = (void __user *)(va & PAGE_MASK);
> > > > +               void __user *uaddr = (void __user *)(addr & PAGE_MASK);
> > > > +               pte_t entry;
> > > >                 /*
> > > >                  * This really shouldn't fail, because the page is there
> > > >                  * in the page tables. But it might just be unreadable,
> > > >                  * in which case we just give up and fill the result with
> > > > -                * zeroes.
> > > > +                * zeroes. On architectures with software "accessed" bits,
> > > > +                * we would take a double page fault here, so mark it
> > > > +                * accessed here.
> > > >                  */
> > > > +               if (arch_faults_on_old_pte() && !pte_young(vmf->orig_pte))
> {
> > > > +                       spin_lock(vmf->ptl);
> > > > +                       if (likely(pte_same(*vmf->pte, vmf->orig_pte))) {
> > > > +                               entry = pte_mkyoung(vmf->orig_pte);
> > > > +                               if (ptep_set_access_flags(vma, addr,
> > > > +                                                         vmf->pte, entry, 0))
> > > > +                                       update_mmu_cache(vma, addr, vmf-
> >pte);
> > > > +                       }
> > > I don't follow.
> > >
> > > So if pte has changed under you, you don't set the accessed bit, but
> never
> > > the less copy from the user.
> > >
> > > What makes you think it will not trigger the same problem?
> > >
> > > I think we need to make cow_user_page() fail in this case and caller --
> > > wp_page_copy() -- return zero. If the fault was solved by other thread,
> we
> > > are fine. If not userspace would re-fault on the same address and we
> will
> > > handle the fault from the second attempt.
> >
> > Thanks for pointing that out. How about making cow_user_page() return
> > VM_FAULT_RETRY? Then do_page_fault() could retry the page fault?
>
> No. VM_FAULT_RETRY has different semantics: we have to drop
> mmap_sem(), so
> let's try to take it again and handle the fault. In this case the more
> likely scenario is that other thread has already handled the fault and we
> don't need to do anything. If it's not the case, the fault will be
> triggered again on the same address.
>
> --
>  Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v4 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared
  2019-09-19 15:00         ` Kirill A. Shutemov
@ 2019-09-19 15:41           ` Catalin Marinas
  -1 siblings, 0 replies; 39+ messages in thread
From: Catalin Marinas @ 2019-09-19 15:41 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Jia He, Will Deacon, Mark Rutland, James Morse, Marc Zyngier,
	Matthew Wilcox, Kirill A. Shutemov, linux-arm-kernel,
	linux-kernel, linux-mm, Suzuki Poulose, Punit Agrawal,
	Anshuman Khandual, Jun Yao, Alex Van Brunt, Robin Murphy,
	Thomas Gleixner, Andrew Morton, Jérôme Glisse,
	Ralph Campbell, hejianet, Kaly Xin

On Thu, Sep 19, 2019 at 06:00:07PM +0300, Kirill A. Shutemov wrote:
> On Wed, Sep 18, 2019 at 07:00:30PM +0100, Catalin Marinas wrote:
> > On Wed, Sep 18, 2019 at 05:00:27PM +0300, Kirill A. Shutemov wrote:
> > > On Wed, Sep 18, 2019 at 09:19:14PM +0800, Jia He wrote:
> > > > @@ -2152,20 +2163,34 @@ static inline void cow_user_page(struct page *dst, struct page *src, unsigned lo
> > > >  	 */
> > > >  	if (unlikely(!src)) {
> > > >  		void *kaddr = kmap_atomic(dst);
> > > > -		void __user *uaddr = (void __user *)(va & PAGE_MASK);
> > > > +		void __user *uaddr = (void __user *)(addr & PAGE_MASK);
> > > > +		pte_t entry;
> > > >  
> > > >  		/*
> > > >  		 * This really shouldn't fail, because the page is there
> > > >  		 * in the page tables. But it might just be unreadable,
> > > >  		 * in which case we just give up and fill the result with
> > > > -		 * zeroes.
> > > > +		 * zeroes. On architectures with software "accessed" bits,
> > > > +		 * we would take a double page fault here, so mark it
> > > > +		 * accessed here.
> > > >  		 */
> > > > +		if (arch_faults_on_old_pte() && !pte_young(vmf->orig_pte)) {
> > > > +			spin_lock(vmf->ptl);
> > > > +			if (likely(pte_same(*vmf->pte, vmf->orig_pte))) {
> > > > +				entry = pte_mkyoung(vmf->orig_pte);
> > > > +				if (ptep_set_access_flags(vma, addr,
> > > > +							  vmf->pte, entry, 0))
> > > > +					update_mmu_cache(vma, addr, vmf->pte);
> > > > +			}
> > > 
> > > I don't follow.
> > > 
> > > So if the pte has changed under you, you don't set the accessed bit, but
> > > nevertheless copy from the user.
> > > 
> > > What makes you think it will not trigger the same problem?
> > > 
> > > I think we need to make cow_user_page() fail in this case and the caller --
> > > wp_page_copy() -- return zero. If the fault was solved by another thread, we
> > > are fine. If not, userspace would re-fault on the same address and we will
> > > handle the fault on the second attempt.
> > 
> > It would be nice to clarify the semantics of this function and do as
> > you suggest, but the current comment is slightly confusing:
> > 
> > 	/*
> > 	 * If the source page was a PFN mapping, we don't have
> > 	 * a "struct page" for it. We do a best-effort copy by
> > 	 * just copying from the original user address. If that
> > 	 * fails, we just zero-fill it. Live with it.
> > 	 */
> > 
> > Would any user-space rely on getting a zero-filled page here instead of
> > a recursive fault?
> 
> I don't see the point of a zero-filled page in this case. SIGBUS sounds
> like a more appropriate response, no?

I think I misunderstood your comment. So, if !pte_same(), we should let
userspace re-fault. This wouldn't be a user ABI change, and it is
bounded: we can't end up in an infinite re-fault loop.

In case of a __copy_from_user_inatomic() error, SIGBUS would make more
sense, but it changes the current behaviour (zero-filling the page). This
can be left for a separate patch; it doesn't affect the arm64 case here.

-- 
Catalin
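
Concretely, the distinction drawn here separates two failure paths inside
cow_user_page(). A sketch (assumed context: kaddr/uaddr as in the patch
quoted above; this is not the final code):

	/* Path 1: the pte changed under us. Bail out and let userspace
	 * re-fault -- bounded, and no user-visible ABI change. */
	if (!pte_same(*vmf->pte, vmf->orig_pte)) {
		spin_unlock(vmf->ptl);
		return false;	/* wp_page_copy() then returns 0 */
	}

	/* Path 2: the copy itself fails. SIGBUS might be more appropriate,
	 * but zero-filling is the historical behaviour, so keep it and
	 * leave any SIGBUS change for a separate patch. */
	if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE))
		clear_page(kaddr);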

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v4 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared
  2019-09-19 15:41           ` Catalin Marinas
@ 2019-09-19 15:51             ` Kirill A. Shutemov
  -1 siblings, 0 replies; 39+ messages in thread
From: Kirill A. Shutemov @ 2019-09-19 15:51 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Kirill A. Shutemov, Jia He, Will Deacon, Mark Rutland,
	James Morse, Marc Zyngier, Matthew Wilcox, linux-arm-kernel,
	linux-kernel, linux-mm, Suzuki Poulose, Punit Agrawal,
	Anshuman Khandual, Jun Yao, Alex Van Brunt, Robin Murphy,
	Thomas Gleixner, Andrew Morton, Jérôme Glisse,
	Ralph Campbell, hejianet, Kaly Xin

On Thu, Sep 19, 2019 at 03:41:43PM +0000, Catalin Marinas wrote:
> On Thu, Sep 19, 2019 at 06:00:07PM +0300, Kirill A. Shutemov wrote:
> > On Wed, Sep 18, 2019 at 07:00:30PM +0100, Catalin Marinas wrote:
> > > On Wed, Sep 18, 2019 at 05:00:27PM +0300, Kirill A. Shutemov wrote:
> > > > On Wed, Sep 18, 2019 at 09:19:14PM +0800, Jia He wrote:
> > > > > @@ -2152,20 +2163,34 @@ static inline void cow_user_page(struct page *dst, struct page *src, unsigned lo
> > > > >  	 */
> > > > >  	if (unlikely(!src)) {
> > > > >  		void *kaddr = kmap_atomic(dst);
> > > > > -		void __user *uaddr = (void __user *)(va & PAGE_MASK);
> > > > > +		void __user *uaddr = (void __user *)(addr & PAGE_MASK);
> > > > > +		pte_t entry;
> > > > >  
> > > > >  		/*
> > > > >  		 * This really shouldn't fail, because the page is there
> > > > >  		 * in the page tables. But it might just be unreadable,
> > > > >  		 * in which case we just give up and fill the result with
> > > > > -		 * zeroes.
> > > > > +		 * zeroes. On architectures with software "accessed" bits,
> > > > > +		 * we would take a double page fault here, so mark it
> > > > > +		 * accessed here.
> > > > >  		 */
> > > > > +		if (arch_faults_on_old_pte() && !pte_young(vmf->orig_pte)) {
> > > > > +			spin_lock(vmf->ptl);
> > > > > +			if (likely(pte_same(*vmf->pte, vmf->orig_pte))) {
> > > > > +				entry = pte_mkyoung(vmf->orig_pte);
> > > > > +				if (ptep_set_access_flags(vma, addr,
> > > > > +							  vmf->pte, entry, 0))
> > > > > +					update_mmu_cache(vma, addr, vmf->pte);
> > > > > +			}
> > > > 
> > > > I don't follow.
> > > > 
> > > > So if the pte has changed under you, you don't set the accessed bit, but
> > > > nevertheless copy from the user.
> > > > 
> > > > What makes you think it will not trigger the same problem?
> > > > 
> > > > I think we need to make cow_user_page() fail in this case and the caller --
> > > > wp_page_copy() -- return zero. If the fault was solved by another thread, we
> > > > are fine. If not, userspace would re-fault on the same address and we will
> > > > handle the fault on the second attempt.
> > > 
> > > It would be nice to clarify the semantics of this function and do as
> > > you suggest, but the current comment is slightly confusing:
> > > 
> > > 	/*
> > > 	 * If the source page was a PFN mapping, we don't have
> > > 	 * a "struct page" for it. We do a best-effort copy by
> > > 	 * just copying from the original user address. If that
> > > 	 * fails, we just zero-fill it. Live with it.
> > > 	 */
> > > 
> > > Would any user-space rely on getting a zero-filled page here instead of
> > > a recursive fault?
> > 
> > I don't see the point of a zero-filled page in this case. SIGBUS sounds
> > like a more appropriate response, no?
> 
> I think I misunderstood your comment. So, if !pte_same(), we should let
> userspace re-fault. This wouldn't be a user ABI change, and it is
> bounded: we can't end up in an infinite re-fault loop.

Right. !pte_same() can only happen if we raced with somebody else.
I cannot imagine a situation where the race would happen more than a few
times in a row.

> In case of a __copy_from_user_inatomic() error, SIGBUS would make more
> sense, but it changes the current behaviour (zero-filling the page). This
> can be left for a separate patch; it doesn't affect the arm64 case here.

I think it's safer to leave it as is. Maybe add a WARN_ON_ONCE() or
something. There can be some obscure use case that I don't see.

-- 
 Kirill A. Shutemov
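
The "leave it as is, maybe warn" option Kirill mentions would look roughly
like this (illustrative only; whether to warn at all was left open here):

	if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE)) {
		/* Flag the unexpected copy failure once, but preserve the
		 * existing zero-fill behaviour for any obscure user
		 * relying on it. */
		WARN_ON_ONCE(1);
		clear_page(kaddr);
	}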

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread

Thread overview: 39+ messages
2019-09-18 13:19 [PATCH v4 0/3] fix double page fault on arm64 Jia He
2019-09-18 13:19 ` Jia He
2019-09-18 13:19 ` [PATCH v4 1/3] arm64: cpufeature: introduce helper cpu_has_hw_af() Jia He
2019-09-18 13:19   ` Jia He
2019-09-18 14:20   ` Matthew Wilcox
2019-09-18 14:20     ` Matthew Wilcox
2019-09-18 16:49     ` Catalin Marinas
2019-09-18 16:49       ` Catalin Marinas
2019-09-18 14:20   ` Suzuki K Poulose
2019-09-18 14:20     ` Suzuki K Poulose
2019-09-18 16:45     ` Catalin Marinas
2019-09-18 16:45       ` Catalin Marinas
2019-09-19  1:55       ` Justin He (Arm Technology China)
2019-09-19  1:55         ` Justin He (Arm Technology China)
2019-09-18 13:19 ` [PATCH v4 2/3] arm64: mm: implement arch_faults_on_old_pte() on arm64 Jia He
2019-09-18 13:19   ` Jia He
2019-09-18 13:19 ` [PATCH v4 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared Jia He
2019-09-18 13:19   ` Jia He
2019-09-18 14:00   ` Kirill A. Shutemov
2019-09-18 14:00     ` Kirill A. Shutemov
2019-09-18 18:00     ` Catalin Marinas
2019-09-18 18:00       ` Catalin Marinas
2019-09-19 15:00       ` Kirill A. Shutemov
2019-09-19 15:00         ` Kirill A. Shutemov
2019-09-19 15:41         ` Catalin Marinas
2019-09-19 15:41           ` Catalin Marinas
2019-09-19 15:51           ` Kirill A. Shutemov
2019-09-19 15:51             ` Kirill A. Shutemov
2019-09-19  2:16     ` Jia He
2019-09-19  2:16       ` Jia He
2019-09-19 14:57       ` Kirill A. Shutemov
2019-09-19 14:57         ` Kirill A. Shutemov
2019-09-19 15:02         ` Justin He (Arm Technology China)
2019-09-19 15:02           ` Justin He (Arm Technology China)
2019-09-18 19:35   ` kbuild test robot
2019-09-18 19:35     ` kbuild test robot
2019-09-19  1:46     ` Justin He (Arm Technology China)
2019-09-19  1:46       ` Justin He (Arm Technology China)
2019-09-19  1:46       ` Justin He (Arm Technology China)
