From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
To: linuxppc-dev@lists.ozlabs.org, mpe@ellerman.id.au,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	kvm-ppc@vger.kernel.org
Cc: npiggin@gmail.com, paulus@ozlabs.org, leonardo@linux.ibm.com,
	kirill@shutemov.name,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Subject: [PATCH v2 07/22] powerpc/perf/callchain: Use __get_user_pages_fast in read_user_stack_slow
Date: Thu, 19 Mar 2020 09:25:54 +0530
Message-ID: <20200319035609.158654-8-aneesh.kumar@linux.ibm.com>
In-Reply-To: <20200319035609.158654-1-aneesh.kumar@linux.ibm.com>

read_user_stack_slow is called with interrupts soft-disabled, and it copies the
contents of the page that is found mapped at the given userspace address. To
convert the userspace address to a pfn, the kernel currently uses a lockless
page table walk.

The kernel needs to make sure the pfn it reads stays stable, i.e. that the page
is not released and reused by another process while its contents are being
copied. That can only be guaranteed by holding a reference on the page.

One of the first approaches I tried was to re-check the pte value after the
kernel had copied the contents from the page. But, as the trace below shows, we
can still get it wrong:

CPU0                           CPU1
pte = READ_ONCE(*ptep);
                               pte_clear(pte);
                               put_page(page);
                               page = alloc_page();
                               memcpy(page_address(page), "secret password", nr);
memcpy(buf, kaddr + offset, nb);
                               put_page(page);
                               handle_mm_fault()
                               page = alloc_page();
                               set_pte(pte, page);
if (pte_val(pte) != pte_val(*ptep))
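
Expressed as code, the rejected approach is roughly the sketch below (not the
actual patch; ptep, offset, buf and nb are set up as in the existing function):

	pte = READ_ONCE(*ptep);
	if (!pte_present(pte) || !pte_user(pte))
		return -EFAULT;
	kaddr = pfn_to_kaddr(pte_pfn(pte));
	memcpy(buf, kaddr + offset, nb);
	/*
	 * Re-check the pte after the copy.  As the trace above shows, the
	 * page can be freed and reused for someone else's data during the
	 * memcpy and later faulted back in so that the pte value matches
	 * again; the check can then pass even though buf now holds data
	 * that never belonged to this address space.
	 */
	if (pte_val(pte) != pte_val(*ptep))
		return -EFAULT;
	return 0;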

Hence switch to __get_user_pages_fast.
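
__get_user_pages_fast() does the lockless walk itself, but it also takes a
reference on each page it returns, so the page cannot be freed and reused
while we copy from it; the caller drops the reference with put_page() when
done. Its prototype at this point in the tree is roughly:

	int __get_user_pages_fast(unsigned long start, int nr_pages,
				  int write, struct page **pages);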

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/perf/callchain.c | 53 ++++++++++++++---------------------
 1 file changed, 21 insertions(+), 32 deletions(-)

diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
index cbc251981209..7b7b3ff53180 100644
--- a/arch/powerpc/perf/callchain.c
+++ b/arch/powerpc/perf/callchain.c
@@ -107,46 +107,35 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *re
  * On 64-bit we don't want to invoke hash_page on user addresses from
  * interrupt context, so if the access faults, we read the page tables
  * to find which page (if any) is mapped and access it directly.
+ *
+ * This should get called only if the PMU interrupt occurred while
+ * interrupts are soft-disabled, and there is no MMU hash table entry
+ * for the page.  In this case we will get an -EFAULT return from
+ * __get_user_inatomic even if there is a valid Linux PTE for the page,
+ * since hash_page isn't reentrant.  Thus we have code here to read the
+ * Linux PTE and access the page via the kernel linear mapping.
  */
 static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
 {
-	int ret = -EFAULT;
-	pgd_t *pgdir;
-	pte_t *ptep, pte;
-	unsigned shift;
+
 	unsigned long addr = (unsigned long) ptr;
 	unsigned long offset;
-	unsigned long pfn, flags;
+	struct page *page;
+	int nrpages;
 	void *kaddr;
 
-	pgdir = current->mm->pgd;
-	if (!pgdir)
-		return -EFAULT;
+	nrpages = __get_user_pages_fast(addr, 1, 1, &page);
+	if (nrpages == 1) {
+		kaddr = page_address(page);
 
-	local_irq_save(flags);
-	ptep = find_current_mm_pte(pgdir, addr, NULL, &shift);
-	if (!ptep)
-		goto err_out;
-	if (!shift)
-		shift = PAGE_SHIFT;
-
-	/* align address to page boundary */
-	offset = addr & ((1UL << shift) - 1);
-
-	pte = READ_ONCE(*ptep);
-	if (!pte_present(pte) || !pte_user(pte))
-		goto err_out;
-	pfn = pte_pfn(pte);
-	if (!page_is_ram(pfn))
-		goto err_out;
-
-	/* no highmem to worry about here */
-	kaddr = pfn_to_kaddr(pfn);
-	memcpy(buf, kaddr + offset, nb);
-	ret = 0;
-err_out:
-	local_irq_restore(flags);
-	return ret;
+		/* offset of the address within the page */
+		offset = addr & ~PAGE_MASK;
+
+		memcpy(buf, kaddr + offset, nb);
+		put_page(page);
+		return 0;
+	}
+	return -EFAULT;
 }
 
 static int read_user_stack_64(unsigned long __user *ptr, unsigned long *ret)
-- 
2.24.1



Thread overview: 23+ messages
2020-03-19  3:55 [PATCH v2 00/22] Avoid IPI while updating page table entries Aneesh Kumar K.V
2020-03-19  3:55 ` [PATCH v2 01/22] powerpc/pkeys: Avoid using lockless page table walk Aneesh Kumar K.V
2020-04-03  0:28   ` Ram Pai
2020-04-05 13:37     ` Aneesh Kumar K.V
2020-03-19  3:55 ` [PATCH v2 02/22] powerpc/pkeys: Check vma before returning key fault error to the user Aneesh Kumar K.V
2020-03-19  3:55 ` [PATCH v2 03/22] powerpc/mm/hash64: use _PAGE_PTE when checking for pte_present Aneesh Kumar K.V
2020-03-19  3:55 ` [PATCH v2 04/22] powerpc/hash64: Restrict page table lookup using init_mm with __flush_hash_table_range Aneesh Kumar K.V
2020-03-19  3:55 ` [PATCH v2 05/22] powerpc/book3s64/hash: Use the pte_t address from the caller Aneesh Kumar K.V
2020-03-19  3:55 ` [PATCH v2 06/22] powerpc/mce: Don't reload pte val in addr_to_pfn Aneesh Kumar K.V
2020-03-19  3:55 ` Aneesh Kumar K.V [this message]
2020-03-19  3:55 ` [PATCH v2 08/22] powerpc/kvm/book3s: switch from raw_spin_*lock to arch_spin_lock Aneesh Kumar K.V
2020-03-19  3:55 ` [PATCH v2 09/22] powerpc/kvm/book3s: Add helper to walk partition scoped linux page table Aneesh Kumar K.V
2020-03-19  3:55 ` [PATCH v2 10/22] powerpc/kvm/nested: Add helper to walk nested shadow " Aneesh Kumar K.V
2020-03-19  3:55 ` [PATCH v2 11/22] powerpc/kvm/book3s: Use kvm helpers to walk shadow or secondary table Aneesh Kumar K.V
2020-03-19  3:55 ` [PATCH v2 12/22] powerpc/kvm/book3s: Add helper for host page table walk Aneesh Kumar K.V
2020-03-19  3:56 ` [PATCH v2 13/22] powerpc/kvm/book3s: Use find_kvm_host_pte in page fault handler Aneesh Kumar K.V
2020-03-19  3:56 ` [PATCH v2 14/22] powerpc/kvm/book3s: Use find_kvm_host_pte in h_enter Aneesh Kumar K.V
2020-03-19  3:56 ` [PATCH v2 17/22] powerpc/kvm/book3s: use find_kvm_host_pte in kvmppc_book3s_instantiate_page Aneesh Kumar K.V
2020-03-19  3:56 ` [PATCH v2 18/22] powerpc/kvm/book3s: Use find_kvm_host_pte in kvmppc_get_hpa Aneesh Kumar K.V
2020-03-19  3:56 ` [PATCH v2 19/22] powerpc/kvm/book3s: Use pte_present instead of opencoding _PAGE_PRESENT check Aneesh Kumar K.V
2020-03-19  3:56 ` [PATCH v2 20/22] powerpc/mm/book3s64: Avoid sending IPI on clearing PMD Aneesh Kumar K.V
2020-03-19  3:56 ` [PATCH v2 21/22] mm: change pmdp_huge_get_and_clear_full take vm_area_struct as arg Aneesh Kumar K.V
2020-03-19  3:56 ` [PATCH v2 22/22] powerpc/mm/book3s64: Fix MADV_DONTNEED and parallel page fault race Aneesh Kumar K.V
