Linux-mm Archive on lore.kernel.org
* [PATCH v3 0/2] fix double page fault on arm64
@ 2019-09-13 16:32 Jia He
  2019-09-13 16:32 ` [PATCH v3 1/2] arm64: mm: implement arch_faults_on_old_pte() " Jia He
  2019-09-13 16:32 ` [PATCH v3 2/2] mm: fix double page fault on arm64 if PTE_AF is cleared Jia He
  0 siblings, 2 replies; 7+ messages in thread
From: Jia He @ 2019-09-13 16:32 UTC
  To: Catalin Marinas, Will Deacon, Mark Rutland, James Morse,
	Marc Zyngier, Matthew Wilcox, Kirill A. Shutemov,
	linux-arm-kernel, linux-kernel, linux-mm
  Cc: Punit Agrawal, Anshuman Khandual, Jun Yao, Alex Van Brunt,
	Robin Murphy, Thomas Gleixner, Andrew Morton,
	Jérôme Glisse, Ralph Campbell, hejianet, Jia He

When we ran the pmdk unit test vmmalloc_fork TEST1 in an arm64 guest, we hit
a double page fault in __copy_from_user_inatomic(), called from cow_user_page().

As Catalin explained: "On arm64 without hardware Access Flag, copying from
user will fail because the pte is old and cannot be marked young. So we
always end up with zeroed page after fork() + CoW for pfn mappings. We
don't always have a hardware-managed access flag on arm64."
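
To make the failure mode concrete, here is a minimal userspace sketch; it is
not kernel code. The names model_pte, model_copy_inatomic() and
model_cow_copy() are hypothetical stand-ins for the pte,
__copy_from_user_inatomic() and the !src branch of cow_user_page(). The model
assumes a CPU on which touching an old pte faults instead of setting the
access flag in hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_PAGE_SIZE 16	/* tiny "page" to keep the example short */

/* Hypothetical toy pte: a valid bit and an access flag ("young"). */
struct model_pte {
	bool valid;
	bool young;
};

/*
 * Models an access on a CPU without hardware access-flag management:
 * touching an old pte raises a fault instead of marking it young.
 */
static bool model_access_faults(const struct model_pte *pte)
{
	return !pte->valid || !pte->young;
}

/*
 * Models __copy_from_user_inatomic(): a fault cannot be handled in this
 * context, so a faulting access makes the whole copy fail.
 * Returns the number of bytes NOT copied.
 */
static size_t model_copy_inatomic(void *dst, const void *src, size_t n,
				  const struct model_pte *pte)
{
	if (model_access_faults(pte))
		return n;
	memcpy(dst, src, n);
	return 0;
}

/* Models the !src branch of cow_user_page() before this series. */
static void model_cow_copy(unsigned char *dst, const unsigned char *src,
			   const struct model_pte *pte)
{
	assert(dst && src && pte);
	if (model_copy_inatomic(dst, src, MODEL_PAGE_SIZE, pte))
		memset(dst, 0, MODEL_PAGE_SIZE);	/* clear_page() fallback */
}
```

With a valid but old pte, model_cow_copy() leaves dst zero-filled even though
the source data is present, which mirrors the zeroed page seen after fork() +
CoW.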

Changes
v3: add vmf->ptl lock/unlock (by Kirill A. Shutemov)
    add arch_faults_on_old_pte (Matthew, Catalin)
v2: remove FAULT_FLAG_WRITE when setting pte access flag (by Catalin)

Jia He (2):
  arm64: mm: implement arch_faults_on_old_pte() on arm64
  mm: fix double page fault on arm64 if PTE_AF is cleared

 arch/arm64/include/asm/pgtable.h | 11 +++++++++++
 mm/memory.c                      | 29 ++++++++++++++++++++++++-----
 2 files changed, 35 insertions(+), 5 deletions(-)

-- 
2.17.1




* [PATCH v3 1/2] arm64: mm: implement arch_faults_on_old_pte() on arm64
  2019-09-13 16:32 [PATCH v3 0/2] fix double page fault on arm64 Jia He
@ 2019-09-13 16:32 ` " Jia He
  2019-09-16  9:20   ` Kirill A. Shutemov
  2019-09-13 16:32 ` [PATCH v3 2/2] mm: fix double page fault on arm64 if PTE_AF is cleared Jia He
  1 sibling, 1 reply; 7+ messages in thread
From: Jia He @ 2019-09-13 16:32 UTC
  To: Catalin Marinas, Will Deacon, Mark Rutland, James Morse,
	Marc Zyngier, Matthew Wilcox, Kirill A. Shutemov,
	linux-arm-kernel, linux-kernel, linux-mm
  Cc: Punit Agrawal, Anshuman Khandual, Jun Yao, Alex Van Brunt,
	Robin Murphy, Thomas Gleixner, Andrew Morton,
	Jérôme Glisse, Ralph Campbell, hejianet, Jia He

On arm64 without a hardware-managed Access Flag, copying from user will fail
because the pte is old and cannot be marked young, so we always end up with
a zeroed page after fork() + CoW for pfn mappings. A hardware-managed access
flag is not always available on arm64.

Hence implement arch_faults_on_old_pte() on arm64 to indicate that accessing
an old pte might cause a page fault.
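
The override mechanism this series relies on can be sketched in isolation as
follows. This is a standalone illustration, not the kernel headers:
MODEL_ARCH_ARM64, model_needs_mkyoung() and model_check_hook() are
hypothetical names introduced only for the example, while
arch_faults_on_old_pte() is the hook from the patches.

```c
#include <assert.h>
#include <stdbool.h>

/* --- part that would live in an architecture header ------------------- */
#ifdef MODEL_ARCH_ARM64		/* hypothetical switch for this sketch */
static inline bool arch_faults_on_old_pte(void)
{
	return true;
}
/* A macro with the same name lets generic code detect the override. */
#define arch_faults_on_old_pte arch_faults_on_old_pte
#endif

/* --- part that would live in generic code ------------------------------ */
#ifndef arch_faults_on_old_pte
static inline bool arch_faults_on_old_pte(void)
{
	return false;	/* default: old ptes do not need special handling */
}
#endif

/* A generic caller only has to fix up ptes that are old. */
static bool model_needs_mkyoung(bool pte_is_young)
{
	return arch_faults_on_old_pte() && !pte_is_young;
}

/* Sanity check that the selected variant matches the build switch. */
static void model_check_hook(void)
{
#ifdef MODEL_ARCH_ARM64
	assert(arch_faults_on_old_pte());
#else
	assert(!arch_faults_on_old_pte());
#endif
}
```

Building with -DMODEL_ARCH_ARM64 selects the overriding definition; without
it, the generic fallback is compiled and model_needs_mkyoung() never asks for
a fix-up.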

Signed-off-by: Jia He <justin.he@arm.com>
---
 arch/arm64/include/asm/pgtable.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index e09760ece844..b41399d758df 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -868,6 +868,18 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
 #define phys_to_ttbr(addr)	(addr)
 #endif
 
+/*
+ * On arm64 without hardware Access Flag, copying from user will fail because
+ * the pte is old and cannot be marked young. So we always end up with zeroed
+ * page after fork() + CoW for pfn mappings. We don't always have a
+ * hardware-managed access flag on arm64.
+ */
+static inline bool arch_faults_on_old_pte(void)
+{
+	return true;
+}
+#define arch_faults_on_old_pte arch_faults_on_old_pte
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __ASM_PGTABLE_H */
-- 
2.17.1




* [PATCH v3 2/2] mm: fix double page fault on arm64 if PTE_AF is cleared
  2019-09-13 16:32 [PATCH v3 0/2] fix double page fault on arm64 Jia He
  2019-09-13 16:32 ` [PATCH v3 1/2] arm64: mm: implement arch_faults_on_old_pte() " Jia He
@ 2019-09-13 16:32 ` Jia He
  2019-09-16  9:16   ` Kirill A. Shutemov
  1 sibling, 1 reply; 7+ messages in thread
From: Jia He @ 2019-09-13 16:32 UTC
  To: Catalin Marinas, Will Deacon, Mark Rutland, James Morse,
	Marc Zyngier, Matthew Wilcox, Kirill A. Shutemov,
	linux-arm-kernel, linux-kernel, linux-mm
  Cc: Punit Agrawal, Anshuman Khandual, Jun Yao, Alex Van Brunt,
	Robin Murphy, Thomas Gleixner, Andrew Morton,
	Jérôme Glisse, Ralph Campbell, hejianet, Jia He

When we ran the pmdk unit test [1] vmmalloc_fork TEST1 in an arm64 guest, we
hit a double page fault in __copy_from_user_inatomic(), called from
cow_user_page().

The call trace below was captured from arm64 do_page_fault() for debugging purposes:
[  110.016195] Call trace:
[  110.016826]  do_page_fault+0x5a4/0x690
[  110.017812]  do_mem_abort+0x50/0xb0
[  110.018726]  el1_da+0x20/0xc4
[  110.019492]  __arch_copy_from_user+0x180/0x280
[  110.020646]  do_wp_page+0xb0/0x860
[  110.021517]  __handle_mm_fault+0x994/0x1338
[  110.022606]  handle_mm_fault+0xe8/0x180
[  110.023584]  do_page_fault+0x240/0x690
[  110.024535]  do_mem_abort+0x50/0xb0
[  110.025423]  el0_da+0x20/0x24

The pte info before __copy_from_user_inatomic is (PTE_AF is cleared):
[ffff9b007000] pgd=000000023d4f8003, pud=000000023da9b003, pmd=000000023d4b3003, pte=360000298607bd3
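
The pte value above can be decoded with a few lines of standalone C. This
sketch assumes the usual arm64 descriptor layout, with the valid bit at bit 0
and the access flag at bit 10; the MODEL_* macros and model_* helpers are
hypothetical names for this example, not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PTE_VALID	(UINT64_C(1) << 0)	/* descriptor is valid */
#define MODEL_PTE_AF	(UINT64_C(1) << 10)	/* access flag ("young") */

static bool model_pte_valid(uint64_t pte)
{
	return (pte & MODEL_PTE_VALID) != 0;
}

static bool model_pte_young(uint64_t pte)
{
	return (pte & MODEL_PTE_AF) != 0;
}

/* Userspace analogue of pte_mkyoung(): set the access flag. */
static uint64_t model_pte_mkyoung(uint64_t pte)
{
	assert(model_pte_valid(pte));	/* only meaningful for a valid entry */
	return pte | MODEL_PTE_AF;
}
```

For pte=360000298607bd3 the low twelve bits are 0xbd3: bit 0 is set and bit 10
is clear, so the entry is valid but old, which matches the "PTE_AF is cleared"
note.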

As Catalin explained: "On arm64 without hardware Access Flag, copying from
user will fail because the pte is old and cannot be marked young. So we
always end up with zeroed page after fork() + CoW for pfn mappings. We
don't always have a hardware-managed access flag on arm64."

This patch fixes it by calling pte_mkyoung() before the copy. The parameters
of cow_user_page() also change, because it now needs the whole vmf.
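
The effect of the change can be sketched in userspace as follows. This is a
simplified model rather than the kernel code: model_pte,
model_arch_faults_on_old_pte(), model_copy_inatomic() and
model_cow_copy_fixed() are hypothetical names, and the locking done by the
patch is reduced to a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_PAGE_SIZE 16	/* tiny "page" to keep the example short */

/* Hypothetical toy pte: a valid bit and an access flag ("young"). */
struct model_pte {
	bool valid;
	bool young;
};

/* Stand-in for arch_faults_on_old_pte(); true models arm64 without hardware AF. */
static bool model_arch_faults_on_old_pte(void)
{
	return true;
}

/* Models __copy_from_user_inatomic(): returns the number of bytes NOT copied. */
static size_t model_copy_inatomic(void *dst, const void *src, size_t n,
				  const struct model_pte *pte)
{
	if (!pte->valid || !pte->young)
		return n;	/* the access would fault and cannot be fixed up here */
	memcpy(dst, src, n);
	return 0;
}

/* Models the !src branch of cow_user_page() with the v3 change applied. */
static void model_cow_copy_fixed(unsigned char *dst, const unsigned char *src,
				 struct model_pte *pte)
{
	assert(dst && src && pte);

	/* In the patch this step runs under vmf->ptl via ptep_set_access_flags(). */
	if (model_arch_faults_on_old_pte() && !pte->young)
		pte->young = true;

	if (model_copy_inatomic(dst, src, MODEL_PAGE_SIZE, pte))
		memset(dst, 0, MODEL_PAGE_SIZE);	/* clear_page() fallback */
}
```

A valid but old pte now yields a real copy, while an unreadable mapping still
falls back to a zero-filled page, as the original comment in cow_user_page()
describes.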

[1] https://github.com/pmem/pmdk/tree/master/src/test/vmmalloc_fork

Reported-by: Yibo Cai <Yibo.Cai@arm.com>
Signed-off-by: Jia He <justin.he@arm.com>
---
 mm/memory.c | 30 +++++++++++++++++++++++++-----
 1 file changed, 25 insertions(+), 5 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index e2bb51b6242e..a64af6495f71 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -118,6 +118,13 @@ int randomize_va_space __read_mostly =
 					2;
 #endif
 
+#ifndef arch_faults_on_old_pte
+static inline bool arch_faults_on_old_pte(void)
+{
+	return false;
+}
+#endif
+
 static int __init disable_randmaps(char *s)
 {
 	randomize_va_space = 0;
@@ -2140,7 +2147,8 @@ static inline int pte_unmap_same(struct mm_struct *mm, pmd_t *pmd,
 	return same;
 }
 
-static inline void cow_user_page(struct page *dst, struct page *src, unsigned long va, struct vm_area_struct *vma)
+static inline void cow_user_page(struct page *dst, struct page *src,
+				struct vm_fault *vmf)
 {
 	debug_dma_assert_idle(src);
 
@@ -2152,20 +2160,32 @@ static inline void cow_user_page(struct page *dst, struct page *src, unsigned lo
 	 */
 	if (unlikely(!src)) {
 		void *kaddr = kmap_atomic(dst);
-		void __user *uaddr = (void __user *)(va & PAGE_MASK);
+		void __user *uaddr = (void __user *)(vmf->address & PAGE_MASK);
+		pte_t entry;
 
 		/*
 		 * This really shouldn't fail, because the page is there
 		 * in the page tables. But it might just be unreadable,
 		 * in which case we just give up and fill the result with
-		 * zeroes.
+		 * zeroes. If PTE_AF is cleared on arm64, the copy might
+		 * cause a double page fault, so make the pte young here.
 		 */
+		if (arch_faults_on_old_pte() && !pte_young(vmf->orig_pte)) {
+			spin_lock(vmf->ptl);
+			entry = pte_mkyoung(vmf->orig_pte);
+			if (ptep_set_access_flags(vmf->vma, vmf->address,
+						  vmf->pte, entry, 0))
+				update_mmu_cache(vmf->vma, vmf->address,
+						 vmf->pte);
+			spin_unlock(vmf->ptl);
+		}
+
 		if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE))
 			clear_page(kaddr);
 		kunmap_atomic(kaddr);
 		flush_dcache_page(dst);
 	} else
-		copy_user_highpage(dst, src, va, vma);
+		copy_user_highpage(dst, src, vmf->address, vmf->vma);
 }
 
 static gfp_t __get_fault_gfp_mask(struct vm_area_struct *vma)
@@ -2318,7 +2338,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
 				vmf->address);
 		if (!new_page)
 			goto oom;
-		cow_user_page(new_page, old_page, vmf->address, vma);
+		cow_user_page(new_page, old_page, vmf);
 	}
 
 	if (mem_cgroup_try_charge_delay(new_page, mm, GFP_KERNEL, &memcg, false))
-- 
2.17.1




* Re: [PATCH v3 2/2] mm: fix double page fault on arm64 if PTE_AF is cleared
  2019-09-13 16:32 ` [PATCH v3 2/2] mm: fix double page fault on arm64 if PTE_AF is cleared Jia He
@ 2019-09-16  9:16   ` Kirill A. Shutemov
  2019-09-16  9:35     ` Justin He (Arm Technology China)
  0 siblings, 1 reply; 7+ messages in thread
From: Kirill A. Shutemov @ 2019-09-16  9:16 UTC
  To: Jia He
  Cc: Catalin Marinas, Will Deacon, Mark Rutland, James Morse,
	Marc Zyngier, Matthew Wilcox, Kirill A. Shutemov,
	linux-arm-kernel, linux-kernel, linux-mm, Punit Agrawal,
	Anshuman Khandual, Jun Yao, Alex Van Brunt, Robin Murphy,
	Thomas Gleixner, Andrew Morton, Jérôme Glisse,
	Ralph Campbell, hejianet

On Sat, Sep 14, 2019 at 12:32:39AM +0800, Jia He wrote:
> When we tested pmdk unit test [1] vmmalloc_fork TEST1 in arm64 guest, there
> will be a double page fault in __copy_from_user_inatomic of cow_user_page.
> 
> Below call trace is from arm64 do_page_fault for debugging purpose
> [  110.016195] Call trace:
> [  110.016826]  do_page_fault+0x5a4/0x690
> [  110.017812]  do_mem_abort+0x50/0xb0
> [  110.018726]  el1_da+0x20/0xc4
> [  110.019492]  __arch_copy_from_user+0x180/0x280
> [  110.020646]  do_wp_page+0xb0/0x860
> [  110.021517]  __handle_mm_fault+0x994/0x1338
> [  110.022606]  handle_mm_fault+0xe8/0x180
> [  110.023584]  do_page_fault+0x240/0x690
> [  110.024535]  do_mem_abort+0x50/0xb0
> [  110.025423]  el0_da+0x20/0x24
> 
> The pte info before __copy_from_user_inatomic is (PTE_AF is cleared):
> [ffff9b007000] pgd=000000023d4f8003, pud=000000023da9b003, pmd=000000023d4b3003, pte=360000298607bd3
> 
> As told by Catalin: "On arm64 without hardware Access Flag, copying from
> user will fail because the pte is old and cannot be marked young. So we
> always end up with zeroed page after fork() + CoW for pfn mappings. we
> don't always have a hardware-managed access flag on arm64."
> 
> This patch fix it by calling pte_mkyoung. Also, the parameter is
> changed because vmf should be passed to cow_user_page()
> 
> [1] https://github.com/pmem/pmdk/tree/master/src/test/vmmalloc_fork
> 
> Reported-by: Yibo Cai <Yibo.Cai@arm.com>
> Signed-off-by: Jia He <justin.he@arm.com>
> ---
>  mm/memory.c | 30 +++++++++++++++++++++++++-----
>  1 file changed, 25 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index e2bb51b6242e..a64af6495f71 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -118,6 +118,13 @@ int randomize_va_space __read_mostly =
>  					2;
>  #endif
>  
> +#ifndef arch_faults_on_old_pte
> +static inline bool arch_faults_on_old_pte(void)
> +{
> +	return false;
> +}
> +#endif
> +
>  static int __init disable_randmaps(char *s)
>  {
>  	randomize_va_space = 0;
> @@ -2140,7 +2147,8 @@ static inline int pte_unmap_same(struct mm_struct *mm, pmd_t *pmd,
>  	return same;
>  }
>  
> -static inline void cow_user_page(struct page *dst, struct page *src, unsigned long va, struct vm_area_struct *vma)
> +static inline void cow_user_page(struct page *dst, struct page *src,
> +				struct vm_fault *vmf)
>  {
>  	debug_dma_assert_idle(src);
>  
> @@ -2152,20 +2160,32 @@ static inline void cow_user_page(struct page *dst, struct page *src, unsigned lo
>  	 */
>  	if (unlikely(!src)) {
>  		void *kaddr = kmap_atomic(dst);
> -		void __user *uaddr = (void __user *)(va & PAGE_MASK);
> +		void __user *uaddr = (void __user *)(vmf->address & PAGE_MASK);
> +		pte_t entry;
>  
>  		/*
>  		 * This really shouldn't fail, because the page is there
>  		 * in the page tables. But it might just be unreadable,
>  		 * in which case we just give up and fill the result with
> -		 * zeroes.
> +		 * zeroes. If PTE_AF is cleared on arm64, it might
> +		 * cause double page fault. So makes pte young here
>  		 */
> +		if (arch_faults_on_old_pte() && !pte_young(vmf->orig_pte)) {
> +			spin_lock(vmf->ptl);
> +			entry = pte_mkyoung(vmf->orig_pte);

Shouldn't you re-validate orig_pte after re-taking ptl? It can be
stale by now.

> +			if (ptep_set_access_flags(vmf->vma, vmf->address,
> +						  vmf->pte, entry, 0))
> +				update_mmu_cache(vmf->vma, vmf->address,
> +						 vmf->pte);
> +			spin_unlock(vmf->ptl);
> +		}
> +
>  		if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE))
>  			clear_page(kaddr);
>  		kunmap_atomic(kaddr);
>  		flush_dcache_page(dst);
>  	} else
> -		copy_user_highpage(dst, src, va, vma);
> +		copy_user_highpage(dst, src, vmf->address, vmf->vma);
>  }
>  
>  static gfp_t __get_fault_gfp_mask(struct vm_area_struct *vma)
> @@ -2318,7 +2338,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
>  				vmf->address);
>  		if (!new_page)
>  			goto oom;
> -		cow_user_page(new_page, old_page, vmf->address, vma);
> +		cow_user_page(new_page, old_page, vmf);
>  	}
>  
>  	if (mem_cgroup_try_charge_delay(new_page, mm, GFP_KERNEL, &memcg, false))
> -- 
> 2.17.1
> 
> 

-- 
 Kirill A. Shutemov



* Re: [PATCH v3 1/2] arm64: mm: implement arch_faults_on_old_pte() on arm64
  2019-09-13 16:32 ` [PATCH v3 1/2] arm64: mm: implement arch_faults_on_old_pte() " Jia He
@ 2019-09-16  9:20   ` Kirill A. Shutemov
  0 siblings, 0 replies; 7+ messages in thread
From: Kirill A. Shutemov @ 2019-09-16  9:20 UTC
  To: Jia He
  Cc: Catalin Marinas, Will Deacon, Mark Rutland, James Morse,
	Marc Zyngier, Matthew Wilcox, Kirill A. Shutemov,
	linux-arm-kernel, linux-kernel, linux-mm, Punit Agrawal,
	Anshuman Khandual, Jun Yao, Alex Van Brunt, Robin Murphy,
	Thomas Gleixner, Andrew Morton, Jérôme Glisse,
	Ralph Campbell, hejianet

On Sat, Sep 14, 2019 at 12:32:38AM +0800, Jia He wrote:
> On arm64 without hardware Access Flag, copying from user will fail because
> the pte is old and cannot be marked young. So we always end up with zeroed
> page after fork() + CoW for pfn mappings. we don't always have a
> hardware-managed access flag on arm64.
> 
> Hence implement arch_faults_on_old_pte on arm64 to indicate that it might
> cause page fault when accessing old pte.
> 
> Signed-off-by: Jia He <justin.he@arm.com>
> ---
>  arch/arm64/include/asm/pgtable.h | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index e09760ece844..b41399d758df 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -868,6 +868,18 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
>  #define phys_to_ttbr(addr)	(addr)
>  #endif
>  
> +/*
> + * On arm64 without hardware Access Flag, copying from user will fail because
> + * the pte is old and cannot be marked young. So we always end up with zeroed
> + * page after fork() + CoW for pfn mappings. we don't always have a
> + * hardware-managed access flag on arm64.
> + */
> +static inline bool arch_faults_on_old_pte(void)
> +{
> +	return true;

Shouldn't you check whether this particular machine supports the hardware
access bit?

> +}
> +#define arch_faults_on_old_pte arch_faults_on_old_pte
> +
>  #endif /* !__ASSEMBLY__ */
>  
>  #endif /* __ASM_PGTABLE_H */
> -- 
> 2.17.1
> 
> 

-- 
 Kirill A. Shutemov
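
One possible shape for the per-machine check raised in this reply is sketched
below. It is a userspace model, not a proposed patch: it assumes the
capability is read from the HAFDBS field (bits [3:0]) of ID_AA64MMFR1_EL1,
where 0 means no hardware access-flag update, and it takes the register value
as a parameter instead of reading it. All model_* and MODEL_* names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ID_AA64MMFR1_EL1.HAFDBS occupies bits [3:0]; 0 means no hardware AF update. */
#define MODEL_HAFDBS_MASK	UINT64_C(0xf)

/* True if the (modelled) CPU can set the access flag in hardware. */
static bool model_cpu_has_hw_af(uint64_t id_aa64mmfr1)
{
	return (id_aa64mmfr1 & MODEL_HAFDBS_MASK) != 0;
}

/* Variant of the hook that depends on the CPU instead of returning true. */
static bool model_arch_faults_on_old_pte(uint64_t id_aa64mmfr1)
{
	return !model_cpu_has_hw_af(id_aa64mmfr1);
}

/* Usage sketch: the hook tracks the capability of the modelled CPU. */
static void model_demo(void)
{
	assert(model_arch_faults_on_old_pte(UINT64_C(0x0)));	/* no hardware AF */
	assert(!model_arch_faults_on_old_pte(UINT64_C(0x2)));	/* AF and dirty state */
}
```

A real implementation would also have to consider whether the kernel actually
enabled the feature (TCR_EL1.HA), which this sketch leaves out.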



* RE: [PATCH v3 2/2] mm: fix double page fault on arm64 if PTE_AF is cleared
  2019-09-16  9:16   ` Kirill A. Shutemov
@ 2019-09-16  9:35     ` Justin He (Arm Technology China)
  2019-09-16 14:16       ` Kirill A. Shutemov
  0 siblings, 1 reply; 7+ messages in thread
From: Justin He (Arm Technology China) @ 2019-09-16  9:35 UTC
  To: Kirill A. Shutemov
  Cc: Catalin Marinas, Will Deacon, Mark Rutland, James Morse,
	Marc Zyngier, Matthew Wilcox, Kirill A. Shutemov,
	linux-arm-kernel, linux-kernel, linux-mm, Punit Agrawal,
	Anshuman Khandual, Jun Yao, Alex Van Brunt, Robin Murphy,
	Thomas Gleixner, Andrew Morton, Jérôme Glisse,
	Ralph Campbell, hejianet


Hi Kirill
> -----Original Message-----
> From: Kirill A. Shutemov <kirill@shutemov.name>
> Sent: September 16, 2019 17:16
> To: Justin He (Arm Technology China) <Justin.He@arm.com>
> Cc: Catalin Marinas <Catalin.Marinas@arm.com>; Will Deacon
> <will@kernel.org>; Mark Rutland <Mark.Rutland@arm.com>; James Morse
> <James.Morse@arm.com>; Marc Zyngier <maz@kernel.org>; Matthew
> Wilcox <willy@infradead.org>; Kirill A. Shutemov
> <kirill.shutemov@linux.intel.com>; linux-arm-kernel@lists.infradead.org;
> linux-kernel@vger.kernel.org; linux-mm@kvack.org; Punit Agrawal
> <punitagrawal@gmail.com>; Anshuman Khandual
> <Anshuman.Khandual@arm.com>; Jun Yao <yaojun8558363@gmail.com>;
> Alex Van Brunt <avanbrunt@nvidia.com>; Robin Murphy
> <Robin.Murphy@arm.com>; Thomas Gleixner <tglx@linutronix.de>;
> Andrew Morton <akpm@linux-foundation.org>; Jérôme Glisse
> <jglisse@redhat.com>; Ralph Campbell <rcampbell@nvidia.com>;
> hejianet@gmail.com
> Subject: Re: [PATCH v3 2/2] mm: fix double page fault on arm64 if PTE_AF
> is cleared
>
> On Sat, Sep 14, 2019 at 12:32:39AM +0800, Jia He wrote:
> > When we tested pmdk unit test [1] vmmalloc_fork TEST1 in arm64 guest, there
> > will be a double page fault in __copy_from_user_inatomic of cow_user_page.
> >
> > Below call trace is from arm64 do_page_fault for debugging purpose
> > [  110.016195] Call trace:
> > [  110.016826]  do_page_fault+0x5a4/0x690
> > [  110.017812]  do_mem_abort+0x50/0xb0
> > [  110.018726]  el1_da+0x20/0xc4
> > [  110.019492]  __arch_copy_from_user+0x180/0x280
> > [  110.020646]  do_wp_page+0xb0/0x860
> > [  110.021517]  __handle_mm_fault+0x994/0x1338
> > [  110.022606]  handle_mm_fault+0xe8/0x180
> > [  110.023584]  do_page_fault+0x240/0x690
> > [  110.024535]  do_mem_abort+0x50/0xb0
> > [  110.025423]  el0_da+0x20/0x24
> >
> > The pte info before __copy_from_user_inatomic is (PTE_AF is cleared):
> > [ffff9b007000] pgd=000000023d4f8003, pud=000000023da9b003, pmd=000000023d4b3003, pte=360000298607bd3
> >
> > As told by Catalin: "On arm64 without hardware Access Flag, copying from
> > user will fail because the pte is old and cannot be marked young. So we
> > always end up with zeroed page after fork() + CoW for pfn mappings. we
> > don't always have a hardware-managed access flag on arm64."
> >
> > This patch fix it by calling pte_mkyoung. Also, the parameter is
> > changed because vmf should be passed to cow_user_page()
> >
> > [1] https://github.com/pmem/pmdk/tree/master/src/test/vmmalloc_fork
> >
> > Reported-by: Yibo Cai <Yibo.Cai@arm.com>
> > Signed-off-by: Jia He <justin.he@arm.com>
> > ---
> >  mm/memory.c | 30 +++++++++++++++++++++++++-----
> >  1 file changed, 25 insertions(+), 5 deletions(-)
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index e2bb51b6242e..a64af6495f71 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -118,6 +118,13 @@ int randomize_va_space __read_mostly =
> >                                     2;
> >  #endif
> >
> > +#ifndef arch_faults_on_old_pte
> > +static inline bool arch_faults_on_old_pte(void)
> > +{
> > +   return false;
> > +}
> > +#endif
> > +
> >  static int __init disable_randmaps(char *s)
> >  {
> >     randomize_va_space = 0;
> > @@ -2140,7 +2147,8 @@ static inline int pte_unmap_same(struct mm_struct *mm, pmd_t *pmd,
> >     return same;
> >  }
> >
> > -static inline void cow_user_page(struct page *dst, struct page *src, unsigned long va, struct vm_area_struct *vma)
> > +static inline void cow_user_page(struct page *dst, struct page *src,
> > +                           struct vm_fault *vmf)
> >  {
> >     debug_dma_assert_idle(src);
> >
> > @@ -2152,20 +2160,32 @@ static inline void cow_user_page(struct page *dst, struct page *src, unsigned lo
> >      */
> >     if (unlikely(!src)) {
> >             void *kaddr = kmap_atomic(dst);
> > -           void __user *uaddr = (void __user *)(va & PAGE_MASK);
> > +           void __user *uaddr = (void __user *)(vmf->address & PAGE_MASK);
> > +           pte_t entry;
> >
> >             /*
> >              * This really shouldn't fail, because the page is there
> >              * in the page tables. But it might just be unreadable,
> >              * in which case we just give up and fill the result with
> > -            * zeroes.
> > +            * zeroes. If PTE_AF is cleared on arm64, it might
> > +            * cause double page fault. So makes pte young here
> >              */
> > +           if (arch_faults_on_old_pte() && !pte_young(vmf->orig_pte)) {
> > +                   spin_lock(vmf->ptl);
> > +                   entry = pte_mkyoung(vmf->orig_pte);
>
> Shouldn't you re-validate orig_pte after re-taking ptl? It can be
> stale by now.
Thanks. Do you mean calling flush_cache_page(vma, vmf->address,
pte_pfn(vmf->orig_pte)) before pte_mkyoung()?

--
Cheers,
Justin (Jia He)


>
> > +                   if (ptep_set_access_flags(vmf->vma, vmf->address,
> > +                                             vmf->pte, entry, 0))
> > +                           update_mmu_cache(vmf->vma, vmf->address,
> > +                                            vmf->pte);
> > +                   spin_unlock(vmf->ptl);
> > +           }
> > +
> >             if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE))
> >                     clear_page(kaddr);
> >             kunmap_atomic(kaddr);
> >             flush_dcache_page(dst);
> >     } else
> > -           copy_user_highpage(dst, src, va, vma);
> > +           copy_user_highpage(dst, src, vmf->address, vmf->vma);
> >  }
> >
> >  static gfp_t __get_fault_gfp_mask(struct vm_area_struct *vma)
> > @@ -2318,7 +2338,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
> >                             vmf->address);
> >             if (!new_page)
> >                     goto oom;
> > -           cow_user_page(new_page, old_page, vmf->address, vma);
> > +           cow_user_page(new_page, old_page, vmf);
> >     }
> >
> >     if (mem_cgroup_try_charge_delay(new_page, mm, GFP_KERNEL, &memcg, false))
> > --
> > 2.17.1
> >
> >
>
> --
>  Kirill A. Shutemov

* Re: [PATCH v3 2/2] mm: fix double page fault on arm64 if PTE_AF is cleared
  2019-09-16  9:35     ` Justin He (Arm Technology China)
@ 2019-09-16 14:16       ` Kirill A. Shutemov
  0 siblings, 0 replies; 7+ messages in thread
From: Kirill A. Shutemov @ 2019-09-16 14:16 UTC
  To: Justin He (Arm Technology China)
  Cc: Catalin Marinas, Will Deacon, Mark Rutland, James Morse,
	Marc Zyngier, Matthew Wilcox, Kirill A. Shutemov,
	linux-arm-kernel, linux-kernel, linux-mm, Punit Agrawal,
	Anshuman Khandual, Jun Yao, Alex Van Brunt, Robin Murphy,
	Thomas Gleixner, Andrew Morton, Jérôme Glisse,
	Ralph Campbell, hejianet

On Mon, Sep 16, 2019 at 09:35:21AM +0000, Justin He (Arm Technology China) wrote:
> 
> Hi Kirill
> > -----Original Message-----
> > From: Kirill A. Shutemov <kirill@shutemov.name>
> > Sent: September 16, 2019 17:16
> > To: Justin He (Arm Technology China) <Justin.He@arm.com>
> > Cc: Catalin Marinas <Catalin.Marinas@arm.com>; Will Deacon
> > <will@kernel.org>; Mark Rutland <Mark.Rutland@arm.com>; James Morse
> > <James.Morse@arm.com>; Marc Zyngier <maz@kernel.org>; Matthew
> > Wilcox <willy@infradead.org>; Kirill A. Shutemov
> > <kirill.shutemov@linux.intel.com>; linux-arm-kernel@lists.infradead.org;
> > linux-kernel@vger.kernel.org; linux-mm@kvack.org; Punit Agrawal
> > <punitagrawal@gmail.com>; Anshuman Khandual
> > <Anshuman.Khandual@arm.com>; Jun Yao <yaojun8558363@gmail.com>;
> > Alex Van Brunt <avanbrunt@nvidia.com>; Robin Murphy
> > <Robin.Murphy@arm.com>; Thomas Gleixner <tglx@linutronix.de>;
> > Andrew Morton <akpm@linux-foundation.org>; Jérôme Glisse
> > <jglisse@redhat.com>; Ralph Campbell <rcampbell@nvidia.com>;
> > hejianet@gmail.com
> > Subject: Re: [PATCH v3 2/2] mm: fix double page fault on arm64 if PTE_AF
> > is cleared
> >
> > On Sat, Sep 14, 2019 at 12:32:39AM +0800, Jia He wrote:
> > > When we tested pmdk unit test [1] vmmalloc_fork TEST1 in arm64 guest, there
> > > will be a double page fault in __copy_from_user_inatomic of cow_user_page.
> > >
> > > Below call trace is from arm64 do_page_fault for debugging purpose
> > > [  110.016195] Call trace:
> > > [  110.016826]  do_page_fault+0x5a4/0x690
> > > [  110.017812]  do_mem_abort+0x50/0xb0
> > > [  110.018726]  el1_da+0x20/0xc4
> > > [  110.019492]  __arch_copy_from_user+0x180/0x280
> > > [  110.020646]  do_wp_page+0xb0/0x860
> > > [  110.021517]  __handle_mm_fault+0x994/0x1338
> > > [  110.022606]  handle_mm_fault+0xe8/0x180
> > > [  110.023584]  do_page_fault+0x240/0x690
> > > [  110.024535]  do_mem_abort+0x50/0xb0
> > > [  110.025423]  el0_da+0x20/0x24
> > >
> > > The pte info before __copy_from_user_inatomic is (PTE_AF is cleared):
> > > [ffff9b007000] pgd=000000023d4f8003, pud=000000023da9b003, pmd=000000023d4b3003, pte=360000298607bd3
> > >
> > > As told by Catalin: "On arm64 without hardware Access Flag, copying from
> > > user will fail because the pte is old and cannot be marked young. So we
> > > always end up with zeroed page after fork() + CoW for pfn mappings. we
> > > don't always have a hardware-managed access flag on arm64."
> > >
> > > This patch fix it by calling pte_mkyoung. Also, the parameter is
> > > changed because vmf should be passed to cow_user_page()
> > >
> > > [1] https://github.com/pmem/pmdk/tree/master/src/test/vmmalloc_fork
> > >
> > > Reported-by: Yibo Cai <Yibo.Cai@arm.com>
> > > Signed-off-by: Jia He <justin.he@arm.com>
> > > ---
> > >  mm/memory.c | 30 +++++++++++++++++++++++++-----
> > >  1 file changed, 25 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/mm/memory.c b/mm/memory.c
> > > index e2bb51b6242e..a64af6495f71 100644
> > > --- a/mm/memory.c
> > > +++ b/mm/memory.c
> > > @@ -118,6 +118,13 @@ int randomize_va_space __read_mostly =
> > >                                     2;
> > >  #endif
> > >
> > > +#ifndef arch_faults_on_old_pte
> > > +static inline bool arch_faults_on_old_pte(void)
> > > +{
> > > +   return false;
> > > +}
> > > +#endif
> > > +
> > >  static int __init disable_randmaps(char *s)
> > >  {
> > >     randomize_va_space = 0;
> > > @@ -2140,7 +2147,8 @@ static inline int pte_unmap_same(struct mm_struct *mm, pmd_t *pmd,
> > >     return same;
> > >  }
> > >
> > > -static inline void cow_user_page(struct page *dst, struct page *src, unsigned long va, struct vm_area_struct *vma)
> > > +static inline void cow_user_page(struct page *dst, struct page *src,
> > > +                           struct vm_fault *vmf)
> > >  {
> > >     debug_dma_assert_idle(src);
> > >
> > > @@ -2152,20 +2160,32 @@ static inline void cow_user_page(struct page *dst, struct page *src, unsigned lo
> > >      */
> > >     if (unlikely(!src)) {
> > >             void *kaddr = kmap_atomic(dst);
> > > -           void __user *uaddr = (void __user *)(va & PAGE_MASK);
> > > +           void __user *uaddr = (void __user *)(vmf->address & PAGE_MASK);
> > > +           pte_t entry;
> > >
> > >             /*
> > >              * This really shouldn't fail, because the page is there
> > >              * in the page tables. But it might just be unreadable,
> > >              * in which case we just give up and fill the result with
> > > -            * zeroes.
> > > +            * zeroes. If PTE_AF is cleared on arm64, it might
> > > +            * cause double page fault. So makes pte young here
> > >              */
> > > +           if (arch_faults_on_old_pte() && !pte_young(vmf->orig_pte)) {
> > > +                   spin_lock(vmf->ptl);
> > > +                   entry = pte_mkyoung(vmf->orig_pte);
> >
> > Shouldn't you re-validate orig_pte after re-taking ptl? It can be
> > stale by now.
> Thanks. Do you mean calling flush_cache_page(vma, vmf->address,
> pte_pfn(vmf->orig_pte)) before pte_mkyoung()?

No. You need to check pte_same(*vmf->pte, vmf->orig_pte) before modifying
anything and bail out if *vmf->pte has changed under you.
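
The re-validation described above can be sketched in userspace like this. It
is a simplified model under stated assumptions: struct model_fault stands in
for the relevant vm_fault fields, a plain 64-bit comparison stands in for
pte_same(), the access flag is assumed to be bit 10, and the ptl is only
indicated by comments. All model_* and MODEL_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PTE_AF	(UINT64_C(1) << 10)	/* access flag ("young") */

/* Hypothetical stand-in for the vm_fault fields used here. */
struct model_fault {
	uint64_t *pte;		/* live page table slot (vmf->pte) */
	uint64_t orig_pte;	/* snapshot taken earlier in the fault (vmf->orig_pte) */
};

/*
 * Returns true if the live entry still matched the snapshot and was marked
 * young; returns false without modifying anything if it changed meanwhile,
 * in which case the caller should bail out.
 */
static bool model_mkyoung_if_unchanged(struct model_fault *mf)
{
	assert(mf && mf->pte);

	/* spin_lock(vmf->ptl) would be taken here in the kernel. */
	if (*mf->pte != mf->orig_pte)	/* analogue of !pte_same() */
		return false;		/* and the lock would be dropped first */

	*mf->pte |= MODEL_PTE_AF;	/* analogue of pte_mkyoung() + set flags */
	/* spin_unlock(vmf->ptl) would follow here. */
	return true;
}
```

The comparison has to happen after the lock is taken, because the snapshot in
orig_pte may have gone stale between the original fault and this point.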

-- 
 Kirill A. Shutemov

