* Re: [PATCH v4 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared
@ 2019-09-18 14:00 ` Kirill A. Shutemov
0 siblings, 0 replies; 39+ messages in thread
From: Kirill A. Shutemov @ 2019-09-18 14:00 UTC (permalink / raw)
To: Jia He
Cc: Mark Rutland, Catalin Marinas, linux-mm, Punit Agrawal,
Will Deacon, Alex Van Brunt, Marc Zyngier, Anshuman Khandual,
Matthew Wilcox, Jun Yao, Kaly Xin, hejianet, Ralph Campbell,
Suzuki Poulose, Jérôme Glisse, Thomas Gleixner,
linux-arm-kernel, linux-kernel, James Morse, Andrew Morton,
Robin Murphy, Kirill A. Shutemov
On Wed, Sep 18, 2019 at 09:19:14PM +0800, Jia He wrote:
> When we tested the pmdk unit test [1] vmmalloc_fork TEST1 in an arm64 guest,
> there was a double page fault in __copy_from_user_inatomic() of cow_user_page().
>
> The call trace below is from arm64 do_page_fault() for debugging purposes:
> [ 110.016195] Call trace:
> [ 110.016826] do_page_fault+0x5a4/0x690
> [ 110.017812] do_mem_abort+0x50/0xb0
> [ 110.018726] el1_da+0x20/0xc4
> [ 110.019492] __arch_copy_from_user+0x180/0x280
> [ 110.020646] do_wp_page+0xb0/0x860
> [ 110.021517] __handle_mm_fault+0x994/0x1338
> [ 110.022606] handle_mm_fault+0xe8/0x180
> [ 110.023584] do_page_fault+0x240/0x690
> [ 110.024535] do_mem_abort+0x50/0xb0
> [ 110.025423] el0_da+0x20/0x24
>
> The pte info before __copy_from_user_inatomic is (PTE_AF is cleared):
> [ffff9b007000] pgd=000000023d4f8003, pud=000000023da9b003, pmd=000000023d4b3003, pte=360000298607bd3
>
> As told by Catalin: "On arm64 without hardware Access Flag, copying from
> user will fail because the pte is old and cannot be marked young. So we
> always end up with a zeroed page after fork() + CoW for pfn mappings. We
> don't always have a hardware-managed access flag on arm64."
>
> This patch fixes it by calling pte_mkyoung(). Also, the parameters are
> changed so that vmf can be passed to cow_user_page().
>
> [1] https://github.com/pmem/pmdk/tree/master/src/test/vmmalloc_fork
>
> Reported-by: Yibo Cai <Yibo.Cai@arm.com>
> Signed-off-by: Jia He <justin.he@arm.com>
> ---
> mm/memory.c | 35 ++++++++++++++++++++++++++++++-----
> 1 file changed, 30 insertions(+), 5 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index e2bb51b6242e..d2c130a5883b 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -118,6 +118,13 @@ int randomize_va_space __read_mostly =
> 2;
> #endif
>
> +#ifndef arch_faults_on_old_pte
> +static inline bool arch_faults_on_old_pte(void)
> +{
> + return false;
> +}
> +#endif
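[Editorial note: the #ifndef above provides a generic fallback that an architecture can shadow with its own predicate; the arm64 override is supplied elsewhere in this series. The override mechanism itself can be modeled in a self-contained way — the `return true` below is a made-up stand-in, not the real arm64 predicate:]

```c
#include <stdbool.h>

/*
 * Model of the override pattern: an "arch header" defines the macro so
 * the generic fallback below is compiled out. Here the override simply
 * returns true, standing in for an arch that faults on old PTEs.
 */
#define arch_faults_on_old_pte arch_faults_on_old_pte
static inline bool arch_faults_on_old_pte(void)
{
	return true;
}

/* Generic fallback: compiled only when no arch override is present. */
#ifndef arch_faults_on_old_pte
static inline bool arch_faults_on_old_pte(void)
{
	return false;
}
#endif
```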
> +
> static int __init disable_randmaps(char *s)
> {
> randomize_va_space = 0;
> @@ -2140,8 +2147,12 @@ static inline int pte_unmap_same(struct mm_struct *mm, pmd_t *pmd,
> return same;
> }
>
> -static inline void cow_user_page(struct page *dst, struct page *src, unsigned long va, struct vm_area_struct *vma)
> +static inline void cow_user_page(struct page *dst, struct page *src,
> + struct vm_fault *vmf)
> {
> + struct vm_area_struct *vma = vmf->vma;
> + unsigned long addr = vmf->address;
> +
> debug_dma_assert_idle(src);
>
> /*
> @@ -2152,20 +2163,34 @@ static inline void cow_user_page(struct page *dst, struct page *src, unsigned lo
> */
> if (unlikely(!src)) {
> void *kaddr = kmap_atomic(dst);
> - void __user *uaddr = (void __user *)(va & PAGE_MASK);
> + void __user *uaddr = (void __user *)(addr & PAGE_MASK);
> + pte_t entry;
>
> /*
> * This really shouldn't fail, because the page is there
> * in the page tables. But it might just be unreadable,
> * in which case we just give up and fill the result with
> - * zeroes.
> + * zeroes. On architectures with software "accessed" bits,
> + * we would take a double page fault here, so mark it
> + * accessed here.
> */
> + if (arch_faults_on_old_pte() && !pte_young(vmf->orig_pte)) {
> + spin_lock(vmf->ptl);
> + if (likely(pte_same(*vmf->pte, vmf->orig_pte))) {
> + entry = pte_mkyoung(vmf->orig_pte);
> + if (ptep_set_access_flags(vma, addr,
> + vmf->pte, entry, 0))
> + update_mmu_cache(vma, addr, vmf->pte);
> + }
I don't follow.
So if the pte has changed under you, you don't set the accessed bit, but
nevertheless copy from the user.
What makes you think it will not trigger the same problem?
I think we need to make cow_user_page() fail in this case and the caller --
wp_page_copy() -- return zero. If the fault was solved by another thread, we
are fine. If not, userspace will re-fault on the same address and we will
handle the fault from the second attempt.
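[Editorial note: the fail-and-retry flow described here can be sketched as a user-space model. All types and helpers below are toy stand-ins for the kernel's, not the code that was merged; the real change would thread a success/failure result from cow_user_page() back through wp_page_copy().]

```c
#include <stdbool.h>
#include <string.h>

typedef unsigned long pte_t;	/* toy PTE: just the AF bit matters here */

struct vm_fault {
	pte_t *pte;	/* current PTE in the "page table" */
	pte_t orig_pte;	/* value snapshotted when the fault was taken */
};

#define PTE_AF 0x1UL	/* "accessed" bit, as on arm64 */

static bool pte_same(pte_t a, pte_t b) { return a == b; }
static bool pte_young(pte_t pte)       { return pte & PTE_AF; }
static pte_t pte_mkyoung(pte_t pte)    { return pte | PTE_AF; }

/*
 * Retry variant: if the PTE changed under us, report failure instead of
 * copying anyway, so the caller can drop the copy and let userspace
 * re-fault.
 */
static bool cow_user_page(char *dst, const char *src, struct vm_fault *vmf)
{
	if (!pte_young(vmf->orig_pte)) {
		if (!pte_same(*vmf->pte, vmf->orig_pte))
			return false;			/* raced: bail out */
		*vmf->pte = pte_mkyoung(*vmf->pte);	/* mark accessed */
	}
	memcpy(dst, src, 8);	/* stand-in for __copy_from_user_inatomic() */
	return true;
}

/* Caller: 0 means "fault handled; userspace may re-fault and retry". */
static int wp_page_copy(struct vm_fault *vmf, char *dst, const char *src)
{
	if (!cow_user_page(dst, src, vmf))
		return 0;
	return 1;	/* copy succeeded, continue with the CoW */
}
```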
> + spin_unlock(vmf->ptl);
> + }
> +
> if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE))
> clear_page(kaddr);
> kunmap_atomic(kaddr);
> flush_dcache_page(dst);
> } else
> - copy_user_highpage(dst, src, va, vma);
> + copy_user_highpage(dst, src, addr, vma);
> }
>
> static gfp_t __get_fault_gfp_mask(struct vm_area_struct *vma)
> @@ -2318,7 +2343,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
> vmf->address);
> if (!new_page)
> goto oom;
> - cow_user_page(new_page, old_page, vmf->address, vma);
> + cow_user_page(new_page, old_page, vmf);
> }
>
> if (mem_cgroup_try_charge_delay(new_page, mm, GFP_KERNEL, &memcg, false))
> --
> 2.17.1
>
>
--
Kirill A. Shutemov
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
* Re: [PATCH v4 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared
2019-09-18 14:00 ` Kirill A. Shutemov
@ 2019-09-18 18:00 ` Catalin Marinas
0 siblings, 0 replies; 39+ messages in thread
From: Catalin Marinas @ 2019-09-18 18:00 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Jia He, Will Deacon, Mark Rutland, James Morse, Marc Zyngier,
Matthew Wilcox, Kirill A. Shutemov, linux-arm-kernel,
linux-kernel, linux-mm, Suzuki Poulose, Punit Agrawal,
Anshuman Khandual, Jun Yao, Alex Van Brunt, Robin Murphy,
Thomas Gleixner, Andrew Morton, Jérôme Glisse,
Ralph Campbell, hejianet, Kaly Xin
On Wed, Sep 18, 2019 at 05:00:27PM +0300, Kirill A. Shutemov wrote:
> On Wed, Sep 18, 2019 at 09:19:14PM +0800, Jia He wrote:
> > @@ -2152,20 +2163,34 @@ static inline void cow_user_page(struct page *dst, struct page *src, unsigned lo
> > */
> > if (unlikely(!src)) {
> > void *kaddr = kmap_atomic(dst);
> > - void __user *uaddr = (void __user *)(va & PAGE_MASK);
> > + void __user *uaddr = (void __user *)(addr & PAGE_MASK);
> > + pte_t entry;
> >
> > /*
> > * This really shouldn't fail, because the page is there
> > * in the page tables. But it might just be unreadable,
> > * in which case we just give up and fill the result with
> > - * zeroes.
> > + * zeroes. On architectures with software "accessed" bits,
> > + * we would take a double page fault here, so mark it
> > + * accessed here.
> > */
> > + if (arch_faults_on_old_pte() && !pte_young(vmf->orig_pte)) {
> > + spin_lock(vmf->ptl);
> > + if (likely(pte_same(*vmf->pte, vmf->orig_pte))) {
> > + entry = pte_mkyoung(vmf->orig_pte);
> > + if (ptep_set_access_flags(vma, addr,
> > + vmf->pte, entry, 0))
> > + update_mmu_cache(vma, addr, vmf->pte);
> > + }
>
> I don't follow.
>
> So if the pte has changed under you, you don't set the accessed bit, but
> nevertheless copy from the user.
>
> What makes you think it will not trigger the same problem?
>
> I think we need to make cow_user_page() fail in this case and the caller --
> wp_page_copy() -- return zero. If the fault was solved by another thread, we
> are fine. If not, userspace will re-fault on the same address and we will
> handle the fault from the second attempt.
It would be nice to clarify the semantics of this function and do as
you suggest but the current comment is slightly confusing:
/*
* If the source page was a PFN mapping, we don't have
* a "struct page" for it. We do a best-effort copy by
* just copying from the original user address. If that
* fails, we just zero-fill it. Live with it.
*/
Would any user-space rely on getting a zero-filled page here instead of
a recursive fault?
--
Catalin
* Re: [PATCH v4 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared
2019-09-18 18:00 ` Catalin Marinas
@ 2019-09-19 15:00 ` Kirill A. Shutemov
0 siblings, 0 replies; 39+ messages in thread
From: Kirill A. Shutemov @ 2019-09-19 15:00 UTC (permalink / raw)
To: Catalin Marinas
Cc: Jia He, Will Deacon, Mark Rutland, James Morse, Marc Zyngier,
Matthew Wilcox, Kirill A. Shutemov, linux-arm-kernel,
linux-kernel, linux-mm, Suzuki Poulose, Punit Agrawal,
Anshuman Khandual, Jun Yao, Alex Van Brunt, Robin Murphy,
Thomas Gleixner, Andrew Morton, Jérôme Glisse,
Ralph Campbell, hejianet, Kaly Xin
On Wed, Sep 18, 2019 at 07:00:30PM +0100, Catalin Marinas wrote:
> On Wed, Sep 18, 2019 at 05:00:27PM +0300, Kirill A. Shutemov wrote:
> > On Wed, Sep 18, 2019 at 09:19:14PM +0800, Jia He wrote:
> > > @@ -2152,20 +2163,34 @@ static inline void cow_user_page(struct page *dst, struct page *src, unsigned lo
> > > */
> > > if (unlikely(!src)) {
> > > void *kaddr = kmap_atomic(dst);
> > > - void __user *uaddr = (void __user *)(va & PAGE_MASK);
> > > + void __user *uaddr = (void __user *)(addr & PAGE_MASK);
> > > + pte_t entry;
> > >
> > > /*
> > > * This really shouldn't fail, because the page is there
> > > * in the page tables. But it might just be unreadable,
> > > * in which case we just give up and fill the result with
> > > - * zeroes.
> > > + * zeroes. On architectures with software "accessed" bits,
> > > + * we would take a double page fault here, so mark it
> > > + * accessed here.
> > > */
> > > + if (arch_faults_on_old_pte() && !pte_young(vmf->orig_pte)) {
> > > + spin_lock(vmf->ptl);
> > > + if (likely(pte_same(*vmf->pte, vmf->orig_pte))) {
> > > + entry = pte_mkyoung(vmf->orig_pte);
> > > + if (ptep_set_access_flags(vma, addr,
> > > + vmf->pte, entry, 0))
> > > + update_mmu_cache(vma, addr, vmf->pte);
> > > + }
> >
> > I don't follow.
> >
> > So if the pte has changed under you, you don't set the accessed bit, but
> > nevertheless copy from the user.
> >
> > What makes you think it will not trigger the same problem?
> >
> > I think we need to make cow_user_page() fail in this case and the caller --
> > wp_page_copy() -- return zero. If the fault was solved by another thread, we
> > are fine. If not, userspace will re-fault on the same address and we will
> > handle the fault from the second attempt.
>
> It would be nice to clarify the semantics of this function and do as
> you suggest but the current comment is slightly confusing:
>
> /*
> * If the source page was a PFN mapping, we don't have
> * a "struct page" for it. We do a best-effort copy by
> * just copying from the original user address. If that
> * fails, we just zero-fill it. Live with it.
> */
>
> Would any user-space rely on getting a zero-filled page here instead of
> a recursive fault?
I don't see the point of a zero-filled page in this case. SIGBUS sounds like
a more appropriate response, no?
--
Kirill A. Shutemov
* Re: [PATCH v4 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared
2019-09-19 15:00 ` Kirill A. Shutemov
@ 2019-09-19 15:41 ` Catalin Marinas
0 siblings, 0 replies; 39+ messages in thread
From: Catalin Marinas @ 2019-09-19 15:41 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Jia He, Will Deacon, Mark Rutland, James Morse, Marc Zyngier,
Matthew Wilcox, Kirill A. Shutemov, linux-arm-kernel,
linux-kernel, linux-mm, Suzuki Poulose, Punit Agrawal,
Anshuman Khandual, Jun Yao, Alex Van Brunt, Robin Murphy,
Thomas Gleixner, Andrew Morton, Jérôme Glisse,
Ralph Campbell, hejianet, Kaly Xin
On Thu, Sep 19, 2019 at 06:00:07PM +0300, Kirill A. Shutemov wrote:
> On Wed, Sep 18, 2019 at 07:00:30PM +0100, Catalin Marinas wrote:
> > On Wed, Sep 18, 2019 at 05:00:27PM +0300, Kirill A. Shutemov wrote:
> > > On Wed, Sep 18, 2019 at 09:19:14PM +0800, Jia He wrote:
> > > > @@ -2152,20 +2163,34 @@ static inline void cow_user_page(struct page *dst, struct page *src, unsigned lo
> > > > */
> > > > if (unlikely(!src)) {
> > > > void *kaddr = kmap_atomic(dst);
> > > > - void __user *uaddr = (void __user *)(va & PAGE_MASK);
> > > > + void __user *uaddr = (void __user *)(addr & PAGE_MASK);
> > > > + pte_t entry;
> > > >
> > > > /*
> > > > * This really shouldn't fail, because the page is there
> > > > * in the page tables. But it might just be unreadable,
> > > > * in which case we just give up and fill the result with
> > > > - * zeroes.
> > > > + * zeroes. On architectures with software "accessed" bits,
> > > > + * we would take a double page fault here, so mark it
> > > > + * accessed here.
> > > > */
> > > > + if (arch_faults_on_old_pte() && !pte_young(vmf->orig_pte)) {
> > > > + spin_lock(vmf->ptl);
> > > > + if (likely(pte_same(*vmf->pte, vmf->orig_pte))) {
> > > > + entry = pte_mkyoung(vmf->orig_pte);
> > > > + if (ptep_set_access_flags(vma, addr,
> > > > + vmf->pte, entry, 0))
> > > > + update_mmu_cache(vma, addr, vmf->pte);
> > > > + }
> > >
> > > I don't follow.
> > >
> > > So if the pte has changed under you, you don't set the accessed bit, but
> > > nevertheless copy from the user.
> > >
> > > What makes you think it will not trigger the same problem?
> > >
> > > I think we need to make cow_user_page() fail in this case and the caller --
> > > wp_page_copy() -- return zero. If the fault was solved by another thread, we
> > > are fine. If not, userspace will re-fault on the same address and we will
> > > handle the fault from the second attempt.
> >
> > It would be nice to clarify the semantics of this function and do as
> > you suggest but the current comment is slightly confusing:
> >
> > /*
> > * If the source page was a PFN mapping, we don't have
> > * a "struct page" for it. We do a best-effort copy by
> > * just copying from the original user address. If that
> > * fails, we just zero-fill it. Live with it.
> > */
> >
> > Would any user-space rely on getting a zero-filled page here instead of
> > a recursive fault?
>
> I don't see the point of a zero-filled page in this case. SIGBUS sounds like
> a more appropriate response, no?
I think I misunderstood your comment. So, if !pte_same(), we should let
userspace re-fault. This wouldn't be a user ABI change, and it is
bounded: we can't end up in an infinite re-fault loop.
In case of a __copy_from_user_inatomic() error, SIGBUS would make more
sense, but it changes the current behaviour (zero-filling the page). This
can be left for a separate patch; it doesn't affect the arm64 case here.
--
Catalin
* Re: [PATCH v4 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared
2019-09-19 15:41 ` Catalin Marinas
@ 2019-09-19 15:51 ` Kirill A. Shutemov
0 siblings, 0 replies; 39+ messages in thread
From: Kirill A. Shutemov @ 2019-09-19 15:51 UTC (permalink / raw)
To: Catalin Marinas
Cc: Kirill A. Shutemov, Jia He, Will Deacon, Mark Rutland,
James Morse, Marc Zyngier, Matthew Wilcox, linux-arm-kernel,
linux-kernel, linux-mm, Suzuki Poulose, Punit Agrawal,
Anshuman Khandual, Jun Yao, Alex Van Brunt, Robin Murphy,
Thomas Gleixner, Andrew Morton, Jérôme Glisse,
Ralph Campbell, hejianet, Kaly Xin
On Thu, Sep 19, 2019 at 03:41:43PM +0000, Catalin Marinas wrote:
> On Thu, Sep 19, 2019 at 06:00:07PM +0300, Kirill A. Shutemov wrote:
> > On Wed, Sep 18, 2019 at 07:00:30PM +0100, Catalin Marinas wrote:
> > > On Wed, Sep 18, 2019 at 05:00:27PM +0300, Kirill A. Shutemov wrote:
> > > > On Wed, Sep 18, 2019 at 09:19:14PM +0800, Jia He wrote:
> > > > > @@ -2152,20 +2163,34 @@ static inline void cow_user_page(struct page *dst, struct page *src, unsigned lo
> > > > > */
> > > > > if (unlikely(!src)) {
> > > > > void *kaddr = kmap_atomic(dst);
> > > > > - void __user *uaddr = (void __user *)(va & PAGE_MASK);
> > > > > + void __user *uaddr = (void __user *)(addr & PAGE_MASK);
> > > > > + pte_t entry;
> > > > >
> > > > > /*
> > > > > * This really shouldn't fail, because the page is there
> > > > > * in the page tables. But it might just be unreadable,
> > > > > * in which case we just give up and fill the result with
> > > > > - * zeroes.
> > > > > + * zeroes. On architectures with software "accessed" bits,
> > > > > + * we would take a double page fault here, so mark it
> > > > > + * accessed here.
> > > > > */
> > > > > + if (arch_faults_on_old_pte() && !pte_young(vmf->orig_pte)) {
> > > > > + spin_lock(vmf->ptl);
> > > > > + if (likely(pte_same(*vmf->pte, vmf->orig_pte))) {
> > > > > + entry = pte_mkyoung(vmf->orig_pte);
> > > > > + if (ptep_set_access_flags(vma, addr,
> > > > > + vmf->pte, entry, 0))
> > > > > + update_mmu_cache(vma, addr, vmf->pte);
> > > > > + }
> > > >
> > > > I don't follow.
> > > >
> > > > So if the pte has changed under you, you don't set the accessed bit, but
> > > > nevertheless copy from the user.
> > > >
> > > > What makes you think it will not trigger the same problem?
> > > >
> > > > I think we need to make cow_user_page() fail in this case and the caller --
> > > > wp_page_copy() -- return zero. If the fault was solved by another thread, we
> > > > are fine. If not, userspace will re-fault on the same address and we will
> > > > handle the fault from the second attempt.
> > >
> > > It would be nice to clarify the semantics of this function and do as
> > > you suggest but the current comment is slightly confusing:
> > >
> > > /*
> > > * If the source page was a PFN mapping, we don't have
> > > * a "struct page" for it. We do a best-effort copy by
> > > * just copying from the original user address. If that
> > > * fails, we just zero-fill it. Live with it.
> > > */
> > >
> > > Would any user-space rely on getting a zero-filled page here instead of
> > > a recursive fault?
> >
> > I don't see the point of a zero-filled page in this case. SIGBUS sounds like
> > a more appropriate response, no?
>
> I think I misunderstood your comment. So, if !pte_same(), we should let
> userspace re-fault. This wouldn't be a user ABI change, and it is
> bounded: we can't end up in an infinite re-fault loop.
Right. !pte_same() can only happen if we raced with somebody else.
I cannot imagine a situation where the race would happen more than a few
times in a row.
> In case of a __copy_from_user_inatomic() error, SIGBUS would make more
> sense, but it changes the current behaviour (zero-filling the page). This
> can be left for a separate patch; it doesn't affect the arm64 case here.
I think it's safer to leave it as is. Maybe put WARN_ON_ONCE() or
something. There can be some obscure use-case that I don't see.
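[Editorial note: "leave the zero-fill as is, add a warning" could look like the sketch below. warn_on_once() is a user-space stand-in for the kernel's WARN_ON_ONCE(), and a NULL source merely simulates __copy_from_user_inatomic() failing.]

```c
#include <stdio.h>
#include <string.h>

#define PAGE_SIZE 4096

/* User-space stand-in for WARN_ON_ONCE(): complain only the first time. */
static void warn_on_once(const char *msg)
{
	static int warned;

	if (!warned) {
		warned = 1;
		fprintf(stderr, "WARNING: %s\n", msg);
	}
}

/*
 * Preserve the historical zero-fill on copy failure, but leave a
 * breadcrumb so any obscure use-case relying on it can be spotted.
 * A NULL uaddr simulates __copy_from_user_inatomic() failing.
 */
static int copy_or_zero_fill(char *kaddr, const char *uaddr)
{
	if (!uaddr) {
		warn_on_once("cow_user_page: copy failed, zero-filling");
		memset(kaddr, 0, PAGE_SIZE);	/* current behaviour kept */
		return -1;
	}
	memcpy(kaddr, uaddr, PAGE_SIZE);
	return 0;
}
```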
--
Kirill A. Shutemov
* Re: [PATCH v4 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared
@ 2019-09-19 15:51 ` Kirill A. Shutemov
0 siblings, 0 replies; 39+ messages in thread
From: Kirill A. Shutemov @ 2019-09-19 15:51 UTC (permalink / raw)
To: Catalin Marinas
Cc: Mark Rutland, linux-mm, Punit Agrawal, Will Deacon,
Alex Van Brunt, Jia He, Marc Zyngier, Anshuman Khandual,
Matthew Wilcox, Jun Yao, Kaly Xin, hejianet, Ralph Campbell,
Suzuki Poulose, Jérôme Glisse, Kirill A. Shutemov,
Thomas Gleixner, linux-arm-kernel, linux-kernel, James Morse,
Andrew Morton, Robin Murphy
On Thu, Sep 19, 2019 at 03:41:43PM +0000, Catalin Marinas wrote:
> On Thu, Sep 19, 2019 at 06:00:07PM +0300, Kirill A. Shutemov wrote:
> > On Wed, Sep 18, 2019 at 07:00:30PM +0100, Catalin Marinas wrote:
> > > On Wed, Sep 18, 2019 at 05:00:27PM +0300, Kirill A. Shutemov wrote:
> > > > On Wed, Sep 18, 2019 at 09:19:14PM +0800, Jia He wrote:
> > > > > @@ -2152,20 +2163,34 @@ static inline void cow_user_page(struct page *dst, struct page *src, unsigned lo
> > > > > */
> > > > > if (unlikely(!src)) {
> > > > > void *kaddr = kmap_atomic(dst);
> > > > > - void __user *uaddr = (void __user *)(va & PAGE_MASK);
> > > > > + void __user *uaddr = (void __user *)(addr & PAGE_MASK);
> > > > > + pte_t entry;
> > > > >
> > > > > /*
> > > > > * This really shouldn't fail, because the page is there
> > > > > * in the page tables. But it might just be unreadable,
> > > > > * in which case we just give up and fill the result with
> > > > > - * zeroes.
> > > > > + * zeroes. On architectures with software "accessed" bits,
> > > > > + * we would take a double page fault here, so mark it
> > > > > + * accessed here.
> > > > > */
> > > > > + if (arch_faults_on_old_pte() && !pte_young(vmf->orig_pte)) {
> > > > > + spin_lock(vmf->ptl);
> > > > > + if (likely(pte_same(*vmf->pte, vmf->orig_pte))) {
> > > > > + entry = pte_mkyoung(vmf->orig_pte);
> > > > > + if (ptep_set_access_flags(vma, addr,
> > > > > + vmf->pte, entry, 0))
> > > > > + update_mmu_cache(vma, addr, vmf->pte);
> > > > > + }
> > > >
> > > > I don't follow.
> > > >
> > > > So if pte has changed under you, you don't set the accessed bit, but never
> > > > the less copy from the user.
> > > >
> > > > What makes you think it will not trigger the same problem?
> > > >
> > > > I think we need to make cow_user_page() fail in this case and caller --
> > > > wp_page_copy() -- return zero. If the fault was solved by other thread, we
> > > > are fine. If not userspace would re-fault on the same address and we will
> > > > handle the fault from the second attempt.
> > >
> > > It would be nice to clarify the semantics of this function and do as
> > > you suggest but the current comment is slightly confusing:
> > >
> > > /*
> > > * If the source page was a PFN mapping, we don't have
> > > * a "struct page" for it. We do a best-effort copy by
> > > * just copying from the original user address. If that
> > > * fails, we just zero-fill it. Live with it.
> > > */
> > >
> > > Would any user-space rely on getting a zero-filled page here instead of
> > > a recursive fault?
> >
> > I don't see the point in zero-filled page in this case. SIGBUS sounds like
> > more appropriate response, no?
>
> I think misunderstood your comment. So, if !pte_same(), we should let
> userspace re-fault. This wouldn't be a user ABI change and it is
> bounded, can't end up in an infinite re-fault loop.
Right. !pte_same() can only happen if we raced with somebody else.
I cannot imagine situation when the race will happen more than few times
in a row.
> In case of a __copy_from_user_inatomic() error, SIGBUS would make more
> sense but it changes the current behaviour (zero-filling the page). This
> can be left for a separate patch, doesn't affect the arm64 case here.
I think it's safer to leave it as is. Maybe put a WARN_ON_ONCE() or
something. There could be some obscure use case that I don't see.
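A rough sketch of the control flow being suggested in this subthread (a simplified userspace mock for illustration only, not actual kernel code; `vm_fault_sim`, `cow_user_page_sim` and `wp_page_copy_sim` are invented names):

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel state: orig_pte is the snapshot
 * taken when the fault was first handled, pte is the live entry. */
struct vm_fault_sim {
	unsigned long pte;
	unsigned long orig_pte;
};

/* Sketch of the proposal: if the pte changed under us, report failure
 * instead of copying (or zero-filling) through a stale mapping. */
static bool cow_user_page_sim(struct vm_fault_sim *vmf)
{
	if (vmf->pte != vmf->orig_pte)	/* pte_same() check failed */
		return false;
	/* ... pte_mkyoung() + __copy_from_user_inatomic() would go here ... */
	return true;
}

/* The caller reports "handled" on failure, so the faulting instruction
 * restarts and userspace simply re-faults on the same address. */
static int wp_page_copy_sim(struct vm_fault_sim *vmf)
{
	if (!cow_user_page_sim(vmf))
		return 0;	/* raced: let the second attempt handle it */
	return 1;		/* placeholder for the normal success path */
}
```

With this shape, losing the race costs one extra fault from userspace rather than a copy through a pte that may still be old.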
--
Kirill A. Shutemov
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
* Re: [PATCH v4 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared
2019-09-18 14:00 ` Kirill A. Shutemov
@ 2019-09-19 2:16 ` Jia He
-1 siblings, 0 replies; 39+ messages in thread
From: Jia He @ 2019-09-19 2:16 UTC (permalink / raw)
To: Kirill A. Shutemov, Jia He
Cc: Catalin Marinas, Will Deacon, Mark Rutland, James Morse,
Marc Zyngier, Matthew Wilcox, Kirill A. Shutemov,
linux-arm-kernel, linux-kernel, linux-mm, Suzuki Poulose,
Punit Agrawal, Anshuman Khandual, Jun Yao, Alex Van Brunt,
Robin Murphy, Thomas Gleixner, Andrew Morton,
Jérôme Glisse, Ralph Campbell, Kaly Xin
Hi Kirill
[On behalf of justin.he@arm.com because some mails are filtered...]
On 2019/9/18 22:00, Kirill A. Shutemov wrote:
> On Wed, Sep 18, 2019 at 09:19:14PM +0800, Jia He wrote:
>> When we tested pmdk unit test [1] vmmalloc_fork TEST1 in arm64 guest, there
>> will be a double page fault in __copy_from_user_inatomic of cow_user_page.
>>
>> Below call trace is from arm64 do_page_fault for debugging purpose
>> [ 110.016195] Call trace:
>> [ 110.016826] do_page_fault+0x5a4/0x690
>> [ 110.017812] do_mem_abort+0x50/0xb0
>> [ 110.018726] el1_da+0x20/0xc4
>> [ 110.019492] __arch_copy_from_user+0x180/0x280
>> [ 110.020646] do_wp_page+0xb0/0x860
>> [ 110.021517] __handle_mm_fault+0x994/0x1338
>> [ 110.022606] handle_mm_fault+0xe8/0x180
>> [ 110.023584] do_page_fault+0x240/0x690
>> [ 110.024535] do_mem_abort+0x50/0xb0
>> [ 110.025423] el0_da+0x20/0x24
>>
>> The pte info before __copy_from_user_inatomic is (PTE_AF is cleared):
>> [ffff9b007000] pgd=000000023d4f8003, pud=000000023da9b003, pmd=000000023d4b3003, pte=360000298607bd3
>>
>> As told by Catalin: "On arm64 without hardware Access Flag, copying from
>> user will fail because the pte is old and cannot be marked young. So we
>> always end up with zeroed page after fork() + CoW for pfn mappings. we
>> don't always have a hardware-managed access flag on arm64."
>>
>> This patch fix it by calling pte_mkyoung. Also, the parameter is
>> changed because vmf should be passed to cow_user_page()
>>
>> [1] https://github.com/pmem/pmdk/tree/master/src/test/vmmalloc_fork
>>
>> Reported-by: Yibo Cai <Yibo.Cai@arm.com>
>> Signed-off-by: Jia He <justin.he@arm.com>
>> ---
>> mm/memory.c | 35 ++++++++++++++++++++++++++++++-----
>> 1 file changed, 30 insertions(+), 5 deletions(-)
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index e2bb51b6242e..d2c130a5883b 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -118,6 +118,13 @@ int randomize_va_space __read_mostly =
>> 2;
>> #endif
>>
>> +#ifndef arch_faults_on_old_pte
>> +static inline bool arch_faults_on_old_pte(void)
>> +{
>> + return false;
>> +}
>> +#endif
>> +
>> static int __init disable_randmaps(char *s)
>> {
>> randomize_va_space = 0;
>> @@ -2140,8 +2147,12 @@ static inline int pte_unmap_same(struct mm_struct *mm, pmd_t *pmd,
>> return same;
>> }
>>
>> -static inline void cow_user_page(struct page *dst, struct page *src, unsigned long va, struct vm_area_struct *vma)
>> +static inline void cow_user_page(struct page *dst, struct page *src,
>> + struct vm_fault *vmf)
>> {
>> + struct vm_area_struct *vma = vmf->vma;
>> + unsigned long addr = vmf->address;
>> +
>> debug_dma_assert_idle(src);
>>
>> /*
>> @@ -2152,20 +2163,34 @@ static inline void cow_user_page(struct page *dst, struct page *src, unsigned lo
>> */
>> if (unlikely(!src)) {
>> void *kaddr = kmap_atomic(dst);
>> - void __user *uaddr = (void __user *)(va & PAGE_MASK);
>> + void __user *uaddr = (void __user *)(addr & PAGE_MASK);
>> + pte_t entry;
>>
>> /*
>> * This really shouldn't fail, because the page is there
>> * in the page tables. But it might just be unreadable,
>> * in which case we just give up and fill the result with
>> - * zeroes.
>> + * zeroes. On architectures with software "accessed" bits,
>> + * we would take a double page fault here, so mark it
>> + * accessed here.
>> */
>> + if (arch_faults_on_old_pte() && !pte_young(vmf->orig_pte)) {
>> + spin_lock(vmf->ptl);
>> + if (likely(pte_same(*vmf->pte, vmf->orig_pte))) {
>> + entry = pte_mkyoung(vmf->orig_pte);
>> + if (ptep_set_access_flags(vma, addr,
>> + vmf->pte, entry, 0))
>> + update_mmu_cache(vma, addr, vmf->pte);
>> + }
> I don't follow.
>
> So if pte has changed under you, you don't set the accessed bit, but
> nevertheless copy from the user.
>
> What makes you think it will not trigger the same problem?
>
> I think we need to make cow_user_page() fail in this case and caller --
> wp_page_copy() -- return zero. If the fault was solved by another thread, we
> are fine. If not, userspace would re-fault on the same address and we will
> handle the fault from the second attempt.
Thanks for the pointer. How about making cow_user_page() return
VM_FAULT_RETRY? Then do_page_fault() could retry the page fault?
---
Cheers,
Justin (Jia He)
>
>> + spin_unlock(vmf->ptl);
>> + }
>> +
>> if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE))
>> clear_page(kaddr);
>> kunmap_atomic(kaddr);
>> flush_dcache_page(dst);
>> } else
>> - copy_user_highpage(dst, src, va, vma);
>> + copy_user_highpage(dst, src, addr, vma);
>> }
>>
>> static gfp_t __get_fault_gfp_mask(struct vm_area_struct *vma)
>> @@ -2318,7 +2343,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
>> vmf->address);
>> if (!new_page)
>> goto oom;
>> - cow_user_page(new_page, old_page, vmf->address, vma);
>> + cow_user_page(new_page, old_page, vmf);
>> }
>>
>> if (mem_cgroup_try_charge_delay(new_page, mm, GFP_KERNEL, &memcg, false))
>> --
>> 2.17.1
>>
>>
--
* Re: [PATCH v4 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared
2019-09-19 2:16 ` Jia He
@ 2019-09-19 14:57 ` Kirill A. Shutemov
-1 siblings, 0 replies; 39+ messages in thread
From: Kirill A. Shutemov @ 2019-09-19 14:57 UTC (permalink / raw)
To: Jia He
Cc: Jia He, Catalin Marinas, Will Deacon, Mark Rutland, James Morse,
Marc Zyngier, Matthew Wilcox, Kirill A. Shutemov,
linux-arm-kernel, linux-kernel, linux-mm, Suzuki Poulose,
Punit Agrawal, Anshuman Khandual, Jun Yao, Alex Van Brunt,
Robin Murphy, Thomas Gleixner, Andrew Morton,
Jérôme Glisse, Ralph Campbell, Kaly Xin
On Thu, Sep 19, 2019 at 10:16:34AM +0800, Jia He wrote:
> Hi Kirill
>
> [On behalf of justin.he@arm.com because some mails are filtered...]
>
> On 2019/9/18 22:00, Kirill A. Shutemov wrote:
> > On Wed, Sep 18, 2019 at 09:19:14PM +0800, Jia He wrote:
> > > When we tested pmdk unit test [1] vmmalloc_fork TEST1 in arm64 guest, there
> > > will be a double page fault in __copy_from_user_inatomic of cow_user_page.
> > >
> > > Below call trace is from arm64 do_page_fault for debugging purpose
> > > [ 110.016195] Call trace:
> > > [ 110.016826] do_page_fault+0x5a4/0x690
> > > [ 110.017812] do_mem_abort+0x50/0xb0
> > > [ 110.018726] el1_da+0x20/0xc4
> > > [ 110.019492] __arch_copy_from_user+0x180/0x280
> > > [ 110.020646] do_wp_page+0xb0/0x860
> > > [ 110.021517] __handle_mm_fault+0x994/0x1338
> > > [ 110.022606] handle_mm_fault+0xe8/0x180
> > > [ 110.023584] do_page_fault+0x240/0x690
> > > [ 110.024535] do_mem_abort+0x50/0xb0
> > > [ 110.025423] el0_da+0x20/0x24
> > >
> > > The pte info before __copy_from_user_inatomic is (PTE_AF is cleared):
> > > [ffff9b007000] pgd=000000023d4f8003, pud=000000023da9b003, pmd=000000023d4b3003, pte=360000298607bd3
> > >
> > > As told by Catalin: "On arm64 without hardware Access Flag, copying from
> > > user will fail because the pte is old and cannot be marked young. So we
> > > always end up with zeroed page after fork() + CoW for pfn mappings. we
> > > don't always have a hardware-managed access flag on arm64."
> > >
> > > This patch fix it by calling pte_mkyoung. Also, the parameter is
> > > changed because vmf should be passed to cow_user_page()
> > >
> > > [1] https://github.com/pmem/pmdk/tree/master/src/test/vmmalloc_fork
> > >
> > > Reported-by: Yibo Cai <Yibo.Cai@arm.com>
> > > Signed-off-by: Jia He <justin.he@arm.com>
> > > ---
> > > mm/memory.c | 35 ++++++++++++++++++++++++++++++-----
> > > 1 file changed, 30 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/mm/memory.c b/mm/memory.c
> > > index e2bb51b6242e..d2c130a5883b 100644
> > > --- a/mm/memory.c
> > > +++ b/mm/memory.c
> > > @@ -118,6 +118,13 @@ int randomize_va_space __read_mostly =
> > > 2;
> > > #endif
> > > +#ifndef arch_faults_on_old_pte
> > > +static inline bool arch_faults_on_old_pte(void)
> > > +{
> > > + return false;
> > > +}
> > > +#endif
> > > +
> > > static int __init disable_randmaps(char *s)
> > > {
> > > randomize_va_space = 0;
> > > @@ -2140,8 +2147,12 @@ static inline int pte_unmap_same(struct mm_struct *mm, pmd_t *pmd,
> > > return same;
> > > }
> > > -static inline void cow_user_page(struct page *dst, struct page *src, unsigned long va, struct vm_area_struct *vma)
> > > +static inline void cow_user_page(struct page *dst, struct page *src,
> > > + struct vm_fault *vmf)
> > > {
> > > + struct vm_area_struct *vma = vmf->vma;
> > > + unsigned long addr = vmf->address;
> > > +
> > > debug_dma_assert_idle(src);
> > > /*
> > > @@ -2152,20 +2163,34 @@ static inline void cow_user_page(struct page *dst, struct page *src, unsigned lo
> > > */
> > > if (unlikely(!src)) {
> > > void *kaddr = kmap_atomic(dst);
> > > - void __user *uaddr = (void __user *)(va & PAGE_MASK);
> > > + void __user *uaddr = (void __user *)(addr & PAGE_MASK);
> > > + pte_t entry;
> > > /*
> > > * This really shouldn't fail, because the page is there
> > > * in the page tables. But it might just be unreadable,
> > > * in which case we just give up and fill the result with
> > > - * zeroes.
> > > + * zeroes. On architectures with software "accessed" bits,
> > > + * we would take a double page fault here, so mark it
> > > + * accessed here.
> > > */
> > > + if (arch_faults_on_old_pte() && !pte_young(vmf->orig_pte)) {
> > > + spin_lock(vmf->ptl);
> > > + if (likely(pte_same(*vmf->pte, vmf->orig_pte))) {
> > > + entry = pte_mkyoung(vmf->orig_pte);
> > > + if (ptep_set_access_flags(vma, addr,
> > > + vmf->pte, entry, 0))
> > > + update_mmu_cache(vma, addr, vmf->pte);
> > > + }
> > I don't follow.
> >
> > So if pte has changed under you, you don't set the accessed bit, but
> > nevertheless copy from the user.
> >
> > What makes you think it will not trigger the same problem?
> >
> > I think we need to make cow_user_page() fail in this case and caller --
> > wp_page_copy() -- return zero. If the fault was solved by another thread, we
> > are fine. If not, userspace would re-fault on the same address and we will
> > handle the fault from the second attempt.
>
> Thanks for the pointer. How about making cow_user_page() return
> VM_FAULT_RETRY? Then do_page_fault() could retry the page fault?
No. VM_FAULT_RETRY has different semantics: we would have to drop
mmap_sem, take it again, and handle the fault once more. In this case the
more likely scenario is that another thread has already handled the fault
and we don't need to do anything. If that's not the case, the fault will
simply be triggered again on the same address.
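To illustrate the point (a toy userspace mock with invented names, not kernel code): reporting the fault as handled makes the faulting instruction restart, so the retry happens naturally from userspace, without the drop-and-retake of mmap_sem that VM_FAULT_RETRY implies.

```c
/* 0 = pte still stale (race window open), 1 = fixed. */
static int pte_fixed;

/* Mock fault handler: on a stale pte it bails out after the race has
 * been resolved (here, by side effect) and returns 0 -- "handled,
 * nothing more to do"; the access is simply retried. */
static int handle_fault_sim(void)
{
	if (!pte_fixed) {
		pte_fixed = 1;	/* stands in for "another thread fixed it" */
		return 0;
	}
	return 1;
}

/* Userspace side: the faulting access retries until it succeeds;
 * returns the total number of faults taken. */
static int run_access_sim(void)
{
	int attempts = 0;

	while (handle_fault_sim() == 0)
		attempts++;	/* instruction restarted, fault taken again */
	return attempts + 1;
}
```

In the common case the first re-fault already finds the pte fixed up, so the loop terminates after a small, bounded number of attempts.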
--
Kirill A. Shutemov
* RE: [PATCH v4 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared
2019-09-19 14:57 ` Kirill A. Shutemov
@ 2019-09-19 15:02 ` Justin He (Arm Technology China)
-1 siblings, 0 replies; 39+ messages in thread
From: Justin He (Arm Technology China) @ 2019-09-19 15:02 UTC (permalink / raw)
To: Kirill A. Shutemov, Jia He
Cc: Catalin Marinas, Will Deacon, Mark Rutland, James Morse,
Marc Zyngier, Matthew Wilcox, Kirill A. Shutemov,
linux-arm-kernel, linux-kernel, linux-mm, Suzuki Poulose,
Punit Agrawal, Anshuman Khandual, Jun Yao, Alex Van Brunt,
Robin Murphy, Thomas Gleixner, Andrew Morton,
Jérôme Glisse, Ralph Campbell,
Kaly Xin (Arm Technology China)
Hi Kirill
Thanks for the detailed explanation.
--
Cheers,
Justin (Jia He)
> -----Original Message-----
> From: Kirill A. Shutemov <kirill@shutemov.name>
> Sent: 2019年9月19日 22:58
> To: Jia He <hejianet@gmail.com>
> Cc: Justin He (Arm Technology China) <Justin.He@arm.com>; Catalin
> Marinas <Catalin.Marinas@arm.com>; Will Deacon <will@kernel.org>; Mark
> Rutland <Mark.Rutland@arm.com>; James Morse
> <James.Morse@arm.com>; Marc Zyngier <maz@kernel.org>; Matthew
> Wilcox <willy@infradead.org>; Kirill A. Shutemov
> <kirill.shutemov@linux.intel.com>; linux-arm-kernel@lists.infradead.org;
> linux-kernel@vger.kernel.org; linux-mm@kvack.org; Suzuki Poulose
> <Suzuki.Poulose@arm.com>; Punit Agrawal <punitagrawal@gmail.com>;
> Anshuman Khandual <Anshuman.Khandual@arm.com>; Jun Yao
> <yaojun8558363@gmail.com>; Alex Van Brunt <avanbrunt@nvidia.com>;
> Robin Murphy <Robin.Murphy@arm.com>; Thomas Gleixner
> <tglx@linutronix.de>; Andrew Morton <akpm@linux-foundation.org>;
> Jérôme Glisse <jglisse@redhat.com>; Ralph Campbell
> <rcampbell@nvidia.com>; Kaly Xin (Arm Technology China)
> <Kaly.Xin@arm.com>
> Subject: Re: [PATCH v4 3/3] mm: fix double page fault on arm64 if PTE_AF is
> cleared
>
> On Thu, Sep 19, 2019 at 10:16:34AM +0800, Jia He wrote:
> > Hi Kirill
> >
> > [On behalf of justin.he@arm.com because some mails are filtered...]
> >
> > On 2019/9/18 22:00, Kirill A. Shutemov wrote:
> > > On Wed, Sep 18, 2019 at 09:19:14PM +0800, Jia He wrote:
> > > > When we tested pmdk unit test [1] vmmalloc_fork TEST1 in arm64 guest, there
> > > > will be a double page fault in __copy_from_user_inatomic of cow_user_page.
> > > >
> > > > Below call trace is from arm64 do_page_fault for debugging purpose
> > > > [ 110.016195] Call trace:
> > > > [ 110.016826] do_page_fault+0x5a4/0x690
> > > > [ 110.017812] do_mem_abort+0x50/0xb0
> > > > [ 110.018726] el1_da+0x20/0xc4
> > > > [ 110.019492] __arch_copy_from_user+0x180/0x280
> > > > [ 110.020646] do_wp_page+0xb0/0x860
> > > > [ 110.021517] __handle_mm_fault+0x994/0x1338
> > > > [ 110.022606] handle_mm_fault+0xe8/0x180
> > > > [ 110.023584] do_page_fault+0x240/0x690
> > > > [ 110.024535] do_mem_abort+0x50/0xb0
> > > > [ 110.025423] el0_da+0x20/0x24
> > > >
> > > > The pte info before __copy_from_user_inatomic is (PTE_AF is cleared):
> > > > [ffff9b007000] pgd=000000023d4f8003, pud=000000023da9b003, pmd=000000023d4b3003, pte=360000298607bd3
> > > >
> > > > As told by Catalin: "On arm64 without hardware Access Flag, copying from
> > > > user will fail because the pte is old and cannot be marked young. So we
> > > > always end up with zeroed page after fork() + CoW for pfn mappings. we
> > > > don't always have a hardware-managed access flag on arm64."
> > > >
> > > > This patch fix it by calling pte_mkyoung. Also, the parameter is
> > > > changed because vmf should be passed to cow_user_page()
> > > >
> > > > [1] https://github.com/pmem/pmdk/tree/master/src/test/vmmalloc_fork
> > > >
> > > > Reported-by: Yibo Cai <Yibo.Cai@arm.com>
> > > > Signed-off-by: Jia He <justin.he@arm.com>
> > > > ---
> > > > mm/memory.c | 35 ++++++++++++++++++++++++++++++-----
> > > > 1 file changed, 30 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/mm/memory.c b/mm/memory.c
> > > > index e2bb51b6242e..d2c130a5883b 100644
> > > > --- a/mm/memory.c
> > > > +++ b/mm/memory.c
> > > > @@ -118,6 +118,13 @@ int randomize_va_space __read_mostly =
> > > > 2;
> > > > #endif
> > > > +#ifndef arch_faults_on_old_pte
> > > > +static inline bool arch_faults_on_old_pte(void)
> > > > +{
> > > > + return false;
> > > > +}
> > > > +#endif
> > > > +
> > > > static int __init disable_randmaps(char *s)
> > > > {
> > > > randomize_va_space = 0;
> > > > @@ -2140,8 +2147,12 @@ static inline int pte_unmap_same(struct mm_struct *mm, pmd_t *pmd,
> > > > return same;
> > > > }
> > > > -static inline void cow_user_page(struct page *dst, struct page *src, unsigned long va, struct vm_area_struct *vma)
> > > > +static inline void cow_user_page(struct page *dst, struct page *src,
> > > > + struct vm_fault *vmf)
> > > > {
> > > > + struct vm_area_struct *vma = vmf->vma;
> > > > + unsigned long addr = vmf->address;
> > > > +
> > > > debug_dma_assert_idle(src);
> > > > /*
> > > > @@ -2152,20 +2163,34 @@ static inline void cow_user_page(struct page *dst, struct page *src, unsigned lo
> > > > */
> > > > if (unlikely(!src)) {
> > > > void *kaddr = kmap_atomic(dst);
> > > > - void __user *uaddr = (void __user *)(va & PAGE_MASK);
> > > > + void __user *uaddr = (void __user *)(addr & PAGE_MASK);
> > > > + pte_t entry;
> > > > /*
> > > > * This really shouldn't fail, because the page is there
> > > > * in the page tables. But it might just be unreadable,
> > > > * in which case we just give up and fill the result with
> > > > - * zeroes.
> > > > + * zeroes. On architectures with software "accessed" bits,
> > > > + * we would take a double page fault here, so mark it
> > > > + * accessed here.
> > > > */
> > > > + if (arch_faults_on_old_pte() && !pte_young(vmf->orig_pte)) {
> > > > + spin_lock(vmf->ptl);
> > > > + if (likely(pte_same(*vmf->pte, vmf->orig_pte))) {
> > > > + entry = pte_mkyoung(vmf->orig_pte);
> > > > + if (ptep_set_access_flags(vma, addr,
> > > > + vmf->pte, entry, 0))
> > > > + update_mmu_cache(vma, addr, vmf->pte);
> > > > + }
> > > I don't follow.
> > >
> > > So if pte has changed under you, you don't set the accessed bit, but
> > > nevertheless copy from the user.
> > >
> > > What makes you think it will not trigger the same problem?
> > >
> > > I think we need to make cow_user_page() fail in this case and caller --
> > > wp_page_copy() -- return zero. If the fault was solved by another thread, we
> > > are fine. If not, userspace would re-fault on the same address and we will
> > > handle the fault from the second attempt.
> >
> > Thanks for the pointer. How about making cow_user_page() return
> > VM_FAULT_RETRY? Then do_page_fault() could retry the page fault?
>
> No. VM_FAULT_RETRY has different semantics: we would have to drop
> mmap_sem, take it again, and handle the fault once more. In this case the
> more likely scenario is that another thread has already handled the fault
> and we don't need to do anything. If that's not the case, the fault will
> simply be triggered again on the same address.
>
> --
> Kirill A. Shutemov
* RE: [PATCH v4 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared
@ 2019-09-19 15:02 ` Justin He (Arm Technology China)
0 siblings, 0 replies; 39+ messages in thread
From: Justin He (Arm Technology China) @ 2019-09-19 15:02 UTC (permalink / raw)
To: Kirill A. Shutemov, Jia He
Cc: Mark Rutland, Kaly Xin (Arm Technology China),
Ralph Campbell, Andrew Morton, Suzuki Poulose, Catalin Marinas,
Anshuman Khandual, linux-kernel, Matthew Wilcox, Jun Yao,
linux-mm, Jérôme Glisse, James Morse, linux-arm-kernel,
Punit Agrawal, Marc Zyngier, Thomas Gleixner, Will Deacon,
Alex Van Brunt, Kirill A. Shutemov, Robin Murphy
Hi Kirill
Thanks for the detailed explanation.
--
Cheers,
Justin (Jia He)
> -----Original Message-----
> From: Kirill A. Shutemov <kirill@shutemov.name>
> Sent: 2019年9月19日 22:58
> To: Jia He <hejianet@gmail.com>
> Cc: Justin He (Arm Technology China) <Justin.He@arm.com>; Catalin
> Marinas <Catalin.Marinas@arm.com>; Will Deacon <will@kernel.org>; Mark
> Rutland <Mark.Rutland@arm.com>; James Morse
> <James.Morse@arm.com>; Marc Zyngier <maz@kernel.org>; Matthew
> Wilcox <willy@infradead.org>; Kirill A. Shutemov
> <kirill.shutemov@linux.intel.com>; linux-arm-kernel@lists.infradead.org;
> linux-kernel@vger.kernel.org; linux-mm@kvack.org; Suzuki Poulose
> <Suzuki.Poulose@arm.com>; Punit Agrawal <punitagrawal@gmail.com>;
> Anshuman Khandual <Anshuman.Khandual@arm.com>; Jun Yao
> <yaojun8558363@gmail.com>; Alex Van Brunt <avanbrunt@nvidia.com>;
> Robin Murphy <Robin.Murphy@arm.com>; Thomas Gleixner
> <tglx@linutronix.de>; Andrew Morton <akpm@linux-foundation.org>;
> Jérôme Glisse <jglisse@redhat.com>; Ralph Campbell
> <rcampbell@nvidia.com>; Kaly Xin (Arm Technology China)
> <Kaly.Xin@arm.com>
> Subject: Re: [PATCH v4 3/3] mm: fix double page fault on arm64 if PTE_AF is
> cleared
>
> On Thu, Sep 19, 2019 at 10:16:34AM +0800, Jia He wrote:
> > Hi Kirill
> >
> > [On behalf of justin.he@arm.com because some mails are filtered...]
> >
> > On 2019/9/18 22:00, Kirill A. Shutemov wrote:
> > > On Wed, Sep 18, 2019 at 09:19:14PM +0800, Jia He wrote:
> > > > When we tested the pmdk unit test [1] vmmalloc_fork TEST1 in an
> > > > arm64 guest, there was a double page fault in
> > > > __copy_from_user_inatomic of cow_user_page.
> > > >
> > > > The call trace below is from arm64 do_page_fault, for debugging purposes:
> > > > [ 110.016195] Call trace:
> > > > [ 110.016826] do_page_fault+0x5a4/0x690
> > > > [ 110.017812] do_mem_abort+0x50/0xb0
> > > > [ 110.018726] el1_da+0x20/0xc4
> > > > [ 110.019492] __arch_copy_from_user+0x180/0x280
> > > > [ 110.020646] do_wp_page+0xb0/0x860
> > > > [ 110.021517] __handle_mm_fault+0x994/0x1338
> > > > [ 110.022606] handle_mm_fault+0xe8/0x180
> > > > [ 110.023584] do_page_fault+0x240/0x690
> > > > [ 110.024535] do_mem_abort+0x50/0xb0
> > > > [ 110.025423] el0_da+0x20/0x24
> > > >
> > > > The pte info before __copy_from_user_inatomic is (PTE_AF is cleared):
> > > > [ffff9b007000] pgd=000000023d4f8003, pud=000000023da9b003,
> > > > pmd=000000023d4b3003, pte=360000298607bd3
> > > >
> > > > As told by Catalin: "On arm64 without hardware Access Flag, copying
> > > > from user will fail because the pte is old and cannot be marked
> > > > young. So we always end up with a zeroed page after fork() + CoW for
> > > > pfn mappings. We don't always have a hardware-managed access flag on
> > > > arm64."
> > > > This patch fixes it by calling pte_mkyoung. Also, the parameters are
> > > > changed because vmf should be passed to cow_user_page().
> > > >
> > > > [1] https://github.com/pmem/pmdk/tree/master/src/test/vmmalloc_fork
> > > >
> > > > Reported-by: Yibo Cai <Yibo.Cai@arm.com>
> > > > Signed-off-by: Jia He <justin.he@arm.com>
> > > > ---
> > > > mm/memory.c | 35 ++++++++++++++++++++++++++++++-----
> > > > 1 file changed, 30 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/mm/memory.c b/mm/memory.c
> > > > index e2bb51b6242e..d2c130a5883b 100644
> > > > --- a/mm/memory.c
> > > > +++ b/mm/memory.c
> > > > @@ -118,6 +118,13 @@ int randomize_va_space __read_mostly =
> > > > 2;
> > > > #endif
> > > > +#ifndef arch_faults_on_old_pte
> > > > +static inline bool arch_faults_on_old_pte(void)
> > > > +{
> > > > + return false;
> > > > +}
> > > > +#endif
> > > > +
> > > > static int __init disable_randmaps(char *s)
> > > > {
> > > > randomize_va_space = 0;
> > > > @@ -2140,8 +2147,12 @@ static inline int pte_unmap_same(struct mm_struct *mm, pmd_t *pmd,
> > > > return same;
> > > > }
> > > > -static inline void cow_user_page(struct page *dst, struct page *src, unsigned long va, struct vm_area_struct *vma)
> > > > +static inline void cow_user_page(struct page *dst, struct page *src,
> > > > + struct vm_fault *vmf)
> > > > {
> > > > + struct vm_area_struct *vma = vmf->vma;
> > > > + unsigned long addr = vmf->address;
> > > > +
> > > > debug_dma_assert_idle(src);
> > > > /*
> > > > @@ -2152,20 +2163,34 @@ static inline void cow_user_page(struct page *dst, struct page *src, unsigned lo
> > > > */
> > > > if (unlikely(!src)) {
> > > > void *kaddr = kmap_atomic(dst);
> > > > - void __user *uaddr = (void __user *)(va & PAGE_MASK);
> > > > + void __user *uaddr = (void __user *)(addr & PAGE_MASK);
> > > > + pte_t entry;
> > > > /*
> > > > * This really shouldn't fail, because the page is there
> > > > * in the page tables. But it might just be unreadable,
> > > > * in which case we just give up and fill the result with
> > > > - * zeroes.
> > > > + * zeroes. On architectures with software "accessed" bits,
> > > > + * we would take a double page fault here, so mark it
> > > > + * accessed here.
> > > > */
> > > > + if (arch_faults_on_old_pte() && !pte_young(vmf->orig_pte)) {
> > > > + spin_lock(vmf->ptl);
> > > > + if (likely(pte_same(*vmf->pte, vmf->orig_pte))) {
> > > > + entry = pte_mkyoung(vmf->orig_pte);
> > > > + if (ptep_set_access_flags(vma, addr,
> > > > + vmf->pte, entry, 0))
> > > > + update_mmu_cache(vma, addr, vmf->pte);
> > > > + }
> > > I don't follow.
> > >
> > > So if the pte has changed under you, you don't set the accessed bit,
> > > but nevertheless copy from the user.
> > >
> > > What makes you think it will not trigger the same problem?
> > >
> > > I think we need to make cow_user_page() fail in this case and the
> > > caller -- wp_page_copy() -- return zero. If the fault was solved by
> > > another thread, we are fine. If not, userspace would re-fault on the
> > > same address and we would handle the fault on the second attempt.
> >
> > Thanks for pointing this out. How about making cow_user_page() return
> > VM_FAULT_RETRY? Then in do_page_fault(), it can retry the page fault?
>
> No. VM_FAULT_RETRY has different semantics: we have to drop mmap_sem(),
> so let's try to take it again and handle the fault. In this case the
> more likely scenario is that the other thread has already handled the
> fault and we don't need to do anything. If that's not the case, the
> fault will be triggered again on the same address.
>
> --
> Kirill A. Shutemov