From: Jia He <hejianet@gmail.com>
To: Catalin Marinas <catalin.marinas@arm.com>,
	"Justin He (Arm Technology China)" <Justin.He@arm.com>
Cc: "Will Deacon" <will@kernel.org>,
	"Mark Rutland" <Mark.Rutland@arm.com>,
	"James Morse" <James.Morse@arm.com>,
	"Marc Zyngier" <maz@kernel.org>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"Suzuki Poulose" <Suzuki.Poulose@arm.com>,
	"Punit Agrawal" <punitagrawal@gmail.com>,
	"Anshuman Khandual" <Anshuman.Khandual@arm.com>,
	"Alex Van Brunt" <avanbrunt@nvidia.com>,
	"Robin Murphy" <Robin.Murphy@arm.com>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Ralph Campbell" <rcampbell@nvidia.com>,
	"Kaly Xin (Arm Technology China)" <Kaly.Xin@arm.com>,
	nd <nd@arm.com>
Subject: Re: [PATCH v8 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared
Date: Tue, 24 Sep 2019 23:29:07 +0800	[thread overview]
Message-ID: <6267b685-5162-85ac-087f-112303bb7035@gmail.com> (raw)
In-Reply-To: <20190924103324.GB41214@arrakis.emea.arm.com>

Hi Catalin

On 2019/9/24 18:33, Catalin Marinas wrote:
> On Tue, Sep 24, 2019 at 06:43:06AM +0000, Justin He (Arm Technology China) wrote:
>> Catalin Marinas wrote:
>>> On Sat, Sep 21, 2019 at 09:50:54PM +0800, Jia He wrote:
>>>> @@ -2151,21 +2163,53 @@ static inline void cow_user_page(struct page *dst, struct page *src, unsigned lo
>>>>   	 * fails, we just zero-fill it. Live with it.
>>>>   	 */
>>>>   	if (unlikely(!src)) {
>>>> -		void *kaddr = kmap_atomic(dst);
>>>> -		void __user *uaddr = (void __user *)(va & PAGE_MASK);
>>>> +		void *kaddr;
>>>> +		pte_t entry;
>>>> +		void __user *uaddr = (void __user *)(addr & PAGE_MASK);
>>>>
>>>> +		/* On architectures with software "accessed" bits, we would
>>>> +		 * take a double page fault, so mark it accessed here.
>>>> +		 */
> [...]
>>>> +		if (arch_faults_on_old_pte() && !pte_young(vmf->orig_pte)) {
>>>> +			vmf->pte = pte_offset_map_lock(mm, vmf->pmd, addr,
>>>> +						       &vmf->ptl);
>>>> +			if (likely(pte_same(*vmf->pte, vmf->orig_pte))) {
>>>> +				entry = pte_mkyoung(vmf->orig_pte);
>>>> +				if (ptep_set_access_flags(vma, addr,
>>>> +							  vmf->pte, entry, 0))
>>>> +					update_mmu_cache(vma, addr, vmf->pte);
>>>> +			} else {
>>>> +				/* Other thread has already handled the fault
>>>> +				 * and we don't need to do anything. If it's
>>>> +				 * not the case, the fault will be triggered
>>>> +				 * again on the same address.
>>>> +				 */
>>>> +				pte_unmap_unlock(vmf->pte, vmf->ptl);
>>>> +				return false;
>>>> +			}
>>>> +			pte_unmap_unlock(vmf->pte, vmf->ptl);
>>>> +		}
> [...]
>>>> +
>>>> +		kaddr = kmap_atomic(dst);
>>> Since you moved the kmap_atomic() here, could the above
>>> arch_faults_on_old_pte() run in a preemptible context? I suggested to
>>> add a WARN_ON in patch 2 to be sure.
>> Should I move kmap_atomic back to the original line? Thus, we can make sure
>> that arch_faults_on_old_pte() is in the context of preempt_disabled?
>> Otherwise, arch_faults_on_old_pte() may cause plenty of warning if I add
>> a WARN_ON in arch_faults_on_old_pte.  I tested it when I enable the PREEMPT=y
>> on a ThunderX2 qemu guest.
> So we have two options here:
>
> 1. Change arch_faults_on_old_pte() scope to the whole system rather than
>     just the current CPU. You'd have to wire up a new arm64 capability
>     for the access flag but this way we don't care whether it's
>     preemptible or not.
>
> 2. Keep the arch_faults_on_old_pte() per-CPU but make sure we are not
>     preempted here. The kmap_atomic() move would do but you'd have to
>     kunmap_atomic() before the return.
>
> I think the answer to my question below also has some implication on
> which option to pick:
>
>>>>   		/*
>>>>   		 * This really shouldn't fail, because the page is there
>>>>   		 * in the page tables. But it might just be unreadable,
>>>>   		 * in which case we just give up and fill the result with
>>>>   		 * zeroes.
>>>>   		 */
>>>> -		if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE))
>>>> +		if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE)) {
>>>> +			/* Give a warn in case there can be some obscure
>>>> +			 * use-case
>>>> +			 */
>>>> +			WARN_ON_ONCE(1);
>>> That's more of a question for the mm guys: at this point we do the
>>> copying with the ptl released; is there anything else that could have
>>> made the pte old in the meantime? I think unuse_pte() is only called on
>>> anonymous vmas, so it shouldn't be the case here.
> If we need to hold the ptl here, you could as well have an enclosing
> kmap/kunmap_atomic (option 2) with some goto instead of "return false".
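
If we keep arch_faults_on_old_pte() per-CPU (option 2), my understanding is
that the helper itself only needs the WARN_ON we discussed for patch 2, and
the caller has to guarantee preemption is disabled (kmap_atomic() and the ptl
in the draft below do that). Roughly what I have in mind on the arm64 side --
only a sketch on top of the cpu_has_hw_af() helper from patch 1, not tested:

/* arch/arm64/include/asm/pgtable.h (sketch only) */
#define arch_faults_on_old_pte arch_faults_on_old_pte
static inline bool arch_faults_on_old_pte(void)
{
	/*
	 * The answer is per-CPU, so the caller must not be
	 * preemptible when asking.
	 */
	WARN_ON(preemptible());

	return !cpu_has_hw_af();
}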

As for holding the ptl, I am not 100% sure that I understand your suggestion
correctly, so I drafted the patch here:

Changes:
  - optimize the indentation
  - hold the ptl longer


-static inline void cow_user_page(struct page *dst, struct page *src, unsigned long va, struct vm_area_struct *vma)
+static inline bool cow_user_page(struct page *dst, struct page *src,
+                 struct vm_fault *vmf)
  {
+    struct vm_area_struct *vma = vmf->vma;
+    struct mm_struct *mm = vma->vm_mm;
+    unsigned long addr = vmf->address;
+    bool ret;
+    pte_t entry;
+    void *kaddr;
+    void __user *uaddr;
+
      debug_dma_assert_idle(src);

+    if (likely(src)) {
+        copy_user_highpage(dst, src, addr, vma);
+        return true;
+    }
+
      /*
       * If the source page was a PFN mapping, we don't have
       * a "struct page" for it. We do a best-effort copy by
       * just copying from the original user address. If that
       * fails, we just zero-fill it. Live with it.
       */
-    if (unlikely(!src)) {
-        void *kaddr = kmap_atomic(dst);
-        void __user *uaddr = (void __user *)(va & PAGE_MASK);
+    kaddr = kmap_atomic(dst);
+    uaddr = (void __user *)(addr & PAGE_MASK);
+
+    /*
+     * On architectures with software "accessed" bits, we would
+     * take a double page fault, so mark it accessed here.
+     */
+    vmf->pte = pte_offset_map_lock(mm, vmf->pmd, addr, &vmf->ptl);
+    if (arch_faults_on_old_pte() && !pte_young(vmf->orig_pte)) {
+        if (!likely(pte_same(*vmf->pte, vmf->orig_pte))) {
+            /*
+             * Other thread has already handled the fault
+             * and we don't need to do anything. If it's
+             * not the case, the fault will be triggered
+             * again on the same address.
+             */
+            ret = false;
+            goto pte_unlock;
+        }
+
+        entry = pte_mkyoung(vmf->orig_pte);
+        if (ptep_set_access_flags(vma, addr, vmf->pte, entry, 0))
+            update_mmu_cache(vma, addr, vmf->pte);
+    }

+    /*
+     * This really shouldn't fail, because the page is there
+     * in the page tables. But it might just be unreadable,
+     * in which case we just give up and fill the result with
+     * zeroes.
+     */
+    if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE)) {
          /*
-         * This really shouldn't fail, because the page is there
-         * in the page tables. But it might just be unreadable,
-         * in which case we just give up and fill the result with
-         * zeroes.
+         * Give a warn in case there can be some obscure
+         * use-case
           */
-        if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE))
-            clear_page(kaddr);
-        kunmap_atomic(kaddr);
-        flush_dcache_page(dst);
-    } else
-        copy_user_highpage(dst, src, va, vma);
+        WARN_ON_ONCE(1);
+        clear_page(kaddr);
+    }
+
+    ret = true;
+
+pte_unlock:
+    pte_unmap_unlock(vmf->pte, vmf->ptl);
+    kunmap_atomic(kaddr);
+    flush_dcache_page(dst);
+
+    return ret;
  }
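
The callers would then have to check the new return value. For wp_page_copy()
I am thinking of something along these lines -- again just a sketch, and the
new_page/old_page cleanup is what I assume the failure path needs there:

	if (!cow_user_page(new_page, old_page, vmf)) {
		/*
		 * The copy was not done. If another thread already
		 * fixed up the pte we are fine; otherwise userspace
		 * will simply re-fault on the same address and we
		 * handle it on the second attempt.
		 */
		put_page(new_page);
		if (old_page)
			put_page(old_page);
		return 0;
	}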


---
Cheers,
Justin (Jia He)


Thread overview:
2019-09-21 13:50 [PATCH v8 0/3] fix double page fault on arm64 Jia He
2019-09-21 13:50 ` [PATCH v8 1/3] arm64: cpufeature: introduce helper cpu_has_hw_af() Jia He
2019-09-23 16:07   ` Catalin Marinas
2019-09-24  1:50     ` Justin He (Arm Technology China)
2019-09-21 13:50 ` [PATCH v8 2/3] arm64: mm: implement arch_faults_on_old_pte() on arm64 Jia He
2019-09-23 16:18   ` Catalin Marinas
2019-09-24  2:17     ` Justin He (Arm Technology China)
2019-09-21 13:50 ` [PATCH v8 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared Jia He
2019-09-21 15:31   ` Matthew Wilcox
2019-09-23  8:28   ` Kirill A. Shutemov
2019-09-23 17:04   ` Catalin Marinas
2019-09-24  6:43     ` Justin He (Arm Technology China)
2019-09-24 10:33       ` Catalin Marinas
2019-09-24 11:59         ` Kirill A. Shutemov
2019-09-24 15:29         ` Jia He [this message]
2019-09-24 16:35           ` Catalin Marinas
