linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Jia He <hejianet@gmail.com>
Cc: "Jia He" <justin.he@arm.com>,
	"Catalin Marinas" <catalin.marinas@arm.com>,
	"Will Deacon" <will@kernel.org>,
	"Mark Rutland" <mark.rutland@arm.com>,
	"James Morse" <james.morse@arm.com>,
	"Marc Zyngier" <maz@kernel.org>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	"Suzuki Poulose" <Suzuki.Poulose@arm.com>,
	"Punit Agrawal" <punitagrawal@gmail.com>,
	"Anshuman Khandual" <anshuman.khandual@arm.com>,
	"Jun Yao" <yaojun8558363@gmail.com>,
	"Alex Van Brunt" <avanbrunt@nvidia.com>,
	"Robin Murphy" <robin.murphy@arm.com>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Ralph Campbell" <rcampbell@nvidia.com>,
	"Kaly Xin" <Kaly.Xin@arm.com>
Subject: Re: [PATCH v4 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared
Date: Thu, 19 Sep 2019 17:57:42 +0300	[thread overview]
Message-ID: <20190919145742.ooamzlmcfqjobcx6@box> (raw)
In-Reply-To: <bebe97e1-b1fe-7f36-91c0-7d412f093560@gmail.com>

On Thu, Sep 19, 2019 at 10:16:34AM +0800, Jia He wrote:
> Hi Kirill
> 
> [On behalf of justin.he@arm.com because some mails are filted...]
> 
> On 2019/9/18 22:00, Kirill A. Shutemov wrote:
> > On Wed, Sep 18, 2019 at 09:19:14PM +0800, Jia He wrote:
> > > When we tested pmdk unit test [1] vmmalloc_fork TEST1 in arm64 guest, there
> > > will be a double page fault in __copy_from_user_inatomic of cow_user_page.
> > > 
> > > Below call trace is from arm64 do_page_fault for debugging purpose
> > > [  110.016195] Call trace:
> > > [  110.016826]  do_page_fault+0x5a4/0x690
> > > [  110.017812]  do_mem_abort+0x50/0xb0
> > > [  110.018726]  el1_da+0x20/0xc4
> > > [  110.019492]  __arch_copy_from_user+0x180/0x280
> > > [  110.020646]  do_wp_page+0xb0/0x860
> > > [  110.021517]  __handle_mm_fault+0x994/0x1338
> > > [  110.022606]  handle_mm_fault+0xe8/0x180
> > > [  110.023584]  do_page_fault+0x240/0x690
> > > [  110.024535]  do_mem_abort+0x50/0xb0
> > > [  110.025423]  el0_da+0x20/0x24
> > > 
> > > The pte info before __copy_from_user_inatomic is (PTE_AF is cleared):
> > > [ffff9b007000] pgd=000000023d4f8003, pud=000000023da9b003, pmd=000000023d4b3003, pte=360000298607bd3
> > > 
> > > As told by Catalin: "On arm64 without hardware Access Flag, copying from
> > > user will fail because the pte is old and cannot be marked young. So we
> > > always end up with zeroed page after fork() + CoW for pfn mappings. we
> > > don't always have a hardware-managed access flag on arm64."
> > > 
> > > This patch fix it by calling pte_mkyoung. Also, the parameter is
> > > changed because vmf should be passed to cow_user_page()
> > > 
> > > [1] https://github.com/pmem/pmdk/tree/master/src/test/vmmalloc_fork
> > > 
> > > Reported-by: Yibo Cai <Yibo.Cai@arm.com>
> > > Signed-off-by: Jia He <justin.he@arm.com>
> > > ---
> > >   mm/memory.c | 35 ++++++++++++++++++++++++++++++-----
> > >   1 file changed, 30 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/mm/memory.c b/mm/memory.c
> > > index e2bb51b6242e..d2c130a5883b 100644
> > > --- a/mm/memory.c
> > > +++ b/mm/memory.c
> > > @@ -118,6 +118,13 @@ int randomize_va_space __read_mostly =
> > >   					2;
> > >   #endif
> > > +#ifndef arch_faults_on_old_pte
> > > +static inline bool arch_faults_on_old_pte(void)
> > > +{
> > > +	return false;
> > > +}
> > > +#endif
> > > +
> > >   static int __init disable_randmaps(char *s)
> > >   {
> > >   	randomize_va_space = 0;
> > > @@ -2140,8 +2147,12 @@ static inline int pte_unmap_same(struct mm_struct *mm, pmd_t *pmd,
> > >   	return same;
> > >   }
> > > -static inline void cow_user_page(struct page *dst, struct page *src, unsigned long va, struct vm_area_struct *vma)
> > > +static inline void cow_user_page(struct page *dst, struct page *src,
> > > +				 struct vm_fault *vmf)
> > >   {
> > > +	struct vm_area_struct *vma = vmf->vma;
> > > +	unsigned long addr = vmf->address;
> > > +
> > >   	debug_dma_assert_idle(src);
> > >   	/*
> > > @@ -2152,20 +2163,34 @@ static inline void cow_user_page(struct page *dst, struct page *src, unsigned lo
> > >   	 */
> > >   	if (unlikely(!src)) {
> > >   		void *kaddr = kmap_atomic(dst);
> > > -		void __user *uaddr = (void __user *)(va & PAGE_MASK);
> > > +		void __user *uaddr = (void __user *)(addr & PAGE_MASK);
> > > +		pte_t entry;
> > >   		/*
> > >   		 * This really shouldn't fail, because the page is there
> > >   		 * in the page tables. But it might just be unreadable,
> > >   		 * in which case we just give up and fill the result with
> > > -		 * zeroes.
> > > +		 * zeroes. On architectures with software "accessed" bits,
> > > +		 * we would take a double page fault here, so mark it
> > > +		 * accessed here.
> > >   		 */
> > > +		if (arch_faults_on_old_pte() && !pte_young(vmf->orig_pte)) {
> > > +			spin_lock(vmf->ptl);
> > > +			if (likely(pte_same(*vmf->pte, vmf->orig_pte))) {
> > > +				entry = pte_mkyoung(vmf->orig_pte);
> > > +				if (ptep_set_access_flags(vma, addr,
> > > +							  vmf->pte, entry, 0))
> > > +					update_mmu_cache(vma, addr, vmf->pte);
> > > +			}
> > I don't follow.
> > 
> > So if pte has changed under you, you don't set the accessed bit, but never
> > the less copy from the user.
> > 
> > What makes you think it will not trigger the same problem?
> > 
> > I think we need to make cow_user_page() fail in this case and caller --
> > wp_page_copy() -- return zero. If the fault was solved by other thread, we
> > are fine. If not userspace would re-fault on the same address and we will
> > handle the fault from the second attempt.
> 
> Thanks for the pointing. How about make cow_user_page() be returned
> 
> VM_FAULT_RETRY? Then in do_page_fault(), it can retry the page fault?

No. VM_FAULT_RETRY has different semantics: we have to drop mmap_sem(), so
let's try to take it again and handle the fault. In this case the more
likely scenario is that other thread has already handled the fault and we
don't need to do anything. If it's not the case, the fault will be
triggered again on the same address.

-- 
 Kirill A. Shutemov

  reply	other threads:[~2019-09-19 14:57 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-18 13:19 [PATCH v4 0/3] fix double page fault on arm64 Jia He
2019-09-18 13:19 ` [PATCH v4 1/3] arm64: cpufeature: introduce helper cpu_has_hw_af() Jia He
2019-09-18 14:20   ` Matthew Wilcox
2019-09-18 16:49     ` Catalin Marinas
2019-09-18 14:20   ` Suzuki K Poulose
2019-09-18 16:45     ` Catalin Marinas
2019-09-19  1:55       ` Justin He (Arm Technology China)
2019-09-18 13:19 ` [PATCH v4 2/3] arm64: mm: implement arch_faults_on_old_pte() on arm64 Jia He
2019-09-18 13:19 ` [PATCH v4 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared Jia He
2019-09-18 14:00   ` Kirill A. Shutemov
2019-09-18 18:00     ` Catalin Marinas
2019-09-19 15:00       ` Kirill A. Shutemov
2019-09-19 15:41         ` Catalin Marinas
2019-09-19 15:51           ` Kirill A. Shutemov
2019-09-19  2:16     ` Jia He
2019-09-19 14:57       ` Kirill A. Shutemov [this message]
2019-09-19 15:02         ` Justin He (Arm Technology China)
2019-09-18 19:35   ` kbuild test robot
2019-09-19  1:46     ` Justin He (Arm Technology China)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190919145742.ooamzlmcfqjobcx6@box \
    --to=kirill@shutemov.name \
    --cc=Kaly.Xin@arm.com \
    --cc=Suzuki.Poulose@arm.com \
    --cc=akpm@linux-foundation.org \
    --cc=anshuman.khandual@arm.com \
    --cc=avanbrunt@nvidia.com \
    --cc=catalin.marinas@arm.com \
    --cc=hejianet@gmail.com \
    --cc=james.morse@arm.com \
    --cc=jglisse@redhat.com \
    --cc=justin.he@arm.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mark.rutland@arm.com \
    --cc=maz@kernel.org \
    --cc=punitagrawal@gmail.com \
    --cc=rcampbell@nvidia.com \
    --cc=robin.murphy@arm.com \
    --cc=tglx@linutronix.de \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    --cc=yaojun8558363@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).