* [PATCH V3 0/6] RCU get_user_pages_fast and __get_user_pages_fast
From: Steve Capper @ 2014-08-28 14:45 UTC
  To: linux-arm-kernel, catalin.marinas, linux, linux-arch, linux-mm, akpm
  Cc: will.deacon, gary.robertson, christoffer.dall, peterz,
	anders.roxell, dann.frazier, mark.rutland, mgorman, Steve Capper

Hello,
This series implements general forms of get_user_pages_fast and
__get_user_pages_fast and activates them for arm and arm64.

These are required for Transparent HugePages to function correctly, as
a futex on a THP tail page will otherwise result in an infinite loop
(the core fallback implementation of __get_user_pages_fast always
returns 0).

Unfortunately, a futex on a THP tail page can be quite common for
certain workloads; thus THP is unreliable without a working
__get_user_pages_fast implementation.
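
To illustrate the failure mode, here is a hedged sketch (not the actual
futex code; resolve_futex_page below is a made-up helper): with only the
weak core stub available, any caller that retries until the fast walk
succeeds can never make progress.

  /* Weak core fallback used when an architecture provides no
   * __get_user_pages_fast of its own: it never pins anything. */
  int __weak __get_user_pages_fast(unsigned long start, int nr_pages,
                                   int write, struct page **pages)
  {
          return 0;
  }

  /* Hypothetical caller, loosely modelled on the futex key lookup:
   * re-resolving a THP tail page spins forever when the stub above
   * always fails. */
  static int resolve_futex_page(unsigned long address, struct page **page)
  {
  again:
          if (__get_user_pages_fast(address, 1, 1, page) != 1)
                  goto again;     /* never succeeds -> infinite loop */
          return 0;
  }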

This series may also be beneficial for direct-IO heavy workloads and
certain KVM workloads.

Changes since PATCH V2 are:
 * spelt `PATCH' correctly in the subject prefix this time. :-(
 * Added acks, tested-bys and reviewed-bys.
 * Cleanup of patch #6 with pud_pte and pud_pmd helpers.
 * Switched config option from HAVE_RCU_GUP to HAVE_GENERIC_RCU_GUP.

Changes since PATCH V1 are:
 * Rebase to 3.17-rc1
 * Switched to kick_all_cpus_sync as suggested by Mark Rutland.

The main changes since RFC V5 are:
 * Rebased against 3.16-rc1.
 * pmd_present no longer tested for by gup_huge_pmd and gup_huge_pud,
   because the entry must be present for these leaf functions to be
   called. 
 * Rather than assume puds can be re-cast as pmds, a separate
   function pud_write is instead used by the core gup.
 * ARM activation logic changed; it will now only activate
   RCU_TABLE_FREE and RCU_GUP when running with LPAE.

The main changes since RFC V4 are:
 * corrected the arm64 logic so it now correctly rcu-frees page
   table backing pages.
 * rcu free logic relaxed for pre-ARMv7 ARM as we need an IPI to
   invalidate TLBs anyway.
 * rebased to 3.15-rc3 (some minor changes were needed to allow it to merge).
 * dropped Catalin's mmu_gather patch as that's been merged already.

This series has been tested with the LTP mm tests and with some custom
futex tests that exercise the futex-on-THP-tail case, on both an Arndale
board and a Juno board. Debug counters were also employed temporarily to
ensure that the RCU_TABLE_FREE logic was behaving as expected.

I would like to get this series into 3.18 as it fixes quite a big problem
with THP on arm and arm64. This series is split into a core mm part, an
arm part and an arm64 part.

Could somebody please take patch #1 (if it looks okay)?
Russell, would you be happy with patches #2, #3, #4? (if we get #1 merged)
Catalin, would you be happy taking patches #5, #6? (if we get #1 merged)

Cheers,
--
Steve

Steve Capper (6):
  mm: Introduce a general RCU get_user_pages_fast.
  arm: mm: Introduce special ptes for LPAE
  arm: mm: Enable HAVE_RCU_TABLE_FREE logic
  arm: mm: Enable RCU fast_gup
  arm64: mm: Enable HAVE_RCU_TABLE_FREE logic
  arm64: mm: Enable RCU fast_gup

 arch/arm/Kconfig                      |   5 +
 arch/arm/include/asm/pgtable-2level.h |   2 +
 arch/arm/include/asm/pgtable-3level.h |  15 ++
 arch/arm/include/asm/pgtable.h        |   6 +-
 arch/arm/include/asm/tlb.h            |  38 ++++-
 arch/arm/mm/flush.c                   |  15 ++
 arch/arm64/Kconfig                    |   4 +
 arch/arm64/include/asm/pgtable.h      |  21 ++-
 arch/arm64/include/asm/tlb.h          |  20 ++-
 arch/arm64/mm/flush.c                 |  15 ++
 mm/Kconfig                            |   3 +
 mm/gup.c                              | 278 ++++++++++++++++++++++++++++++++++
 12 files changed, 412 insertions(+), 10 deletions(-)

-- 
1.9.3


* [PATCH V3 1/6] mm: Introduce a general RCU get_user_pages_fast.
From: Steve Capper @ 2014-08-28 14:45 UTC
  To: linux-arm-kernel, catalin.marinas, linux, linux-arch, linux-mm, akpm
  Cc: will.deacon, gary.robertson, christoffer.dall, peterz,
	anders.roxell, dann.frazier, mark.rutland, mgorman, Steve Capper

get_user_pages_fast attempts to pin user pages by walking the page
tables directly and avoids taking locks. Thus the walker needs to be
protected from page table pages being freed from under it, and needs
to block any THP splits.

One way to achieve this is to have the walker disable interrupts, and
to rely on the TLB flushing code broadcasting IPIs (which cannot
complete until the walker re-enables interrupts) before the page table
pages are freed.

On some platforms we have hardware broadcast of TLB invalidations, so
the TLB flushing code doesn't necessarily need to broadcast IPIs;
spuriously broadcasting IPIs can hurt system performance if done too
often.

This problem has been solved on PowerPC and Sparc by batching up page
table pages belonging to address spaces with more than one user, then
scheduling an rcu_sched callback to free the pages. This RCU page table
free logic has been promoted to core code and is activated when one
enables HAVE_RCU_TABLE_FREE. Unfortunately, these architectures
implement their own get_user_pages_fast routines.

The RCU page table free logic, coupled with an IPI broadcast on THP
split (which is a rare event), allows one to protect a page table
walker by merely disabling interrupts during the walk.

This patch provides a general RCU implementation of get_user_pages_fast
that can be used by architectures that perform hardware broadcast of
TLB invalidations.

It is based heavily on the PowerPC implementation by Nick Piggin.
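
For reference, a typical caller of the interface looks roughly like the
sketch below (pin_user_buffer is a hypothetical caller, not part of this
patch, and error handling is trimmed):

  /* Pin nr user pages for writing, dropping any partial pin on failure. */
  static int pin_user_buffer(unsigned long uaddr, int nr, struct page **pages)
  {
          int pinned;

          /* Fast path: no mmap_sem, page tables walked with IRQs
           * disabled; falls back to get_user_pages() internally. */
          pinned = get_user_pages_fast(uaddr, nr, 1, pages);
          if (pinned == nr)
                  return 0;

          /* Short or negative count: release whatever was pinned. */
          while (pinned > 0)
                  put_page(pages[--pinned]);
          return -EFAULT;
  }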

Signed-off-by: Steve Capper <steve.capper@linaro.org>
Tested-by: Dann Frazier <dann.frazier@canonical.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
---
 mm/Kconfig |   3 +
 mm/gup.c   | 278 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 281 insertions(+)

diff --git a/mm/Kconfig b/mm/Kconfig
index 886db21..0ceb8a5 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -137,6 +137,9 @@ config HAVE_MEMBLOCK_NODE_MAP
 config HAVE_MEMBLOCK_PHYS_MAP
 	boolean
 
+config HAVE_GENERIC_RCU_GUP
+	boolean
+
 config ARCH_DISCARD_MEMBLOCK
 	boolean
 
diff --git a/mm/gup.c b/mm/gup.c
index 91d044b..5e6f6cb 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -10,6 +10,10 @@
 #include <linux/swap.h>
 #include <linux/swapops.h>
 
+#include <linux/sched.h>
+#include <linux/rwsem.h>
+#include <asm/pgtable.h>
+
 #include "internal.h"
 
 static struct page *no_page_table(struct vm_area_struct *vma,
@@ -672,3 +676,277 @@ struct page *get_dump_page(unsigned long addr)
 	return page;
 }
 #endif /* CONFIG_ELF_CORE */
+
+#ifdef CONFIG_HAVE_GENERIC_RCU_GUP
+
+#ifdef __HAVE_ARCH_PTE_SPECIAL
+static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
+			 int write, struct page **pages, int *nr)
+{
+	pte_t *ptep, *ptem;
+	int ret = 0;
+
+	ptem = ptep = pte_offset_map(&pmd, addr);
+	do {
+		pte_t pte = ACCESS_ONCE(*ptep);
+		struct page *page;
+
+		if (!pte_present(pte) || pte_special(pte)
+			|| (write && !pte_write(pte)))
+			goto pte_unmap;
+
+		VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
+		page = pte_page(pte);
+
+		if (!page_cache_get_speculative(page))
+			goto pte_unmap;
+
+		if (unlikely(pte_val(pte) != pte_val(*ptep))) {
+			put_page(page);
+			goto pte_unmap;
+		}
+
+		pages[*nr] = page;
+		(*nr)++;
+
+	} while (ptep++, addr += PAGE_SIZE, addr != end);
+
+	ret = 1;
+
+pte_unmap:
+	pte_unmap(ptem);
+	return ret;
+}
+#else
+
+/*
+ * If we can't determine whether or not a pte is special, then fail immediately
+ * for ptes. Note, we can still pin HugeTLB and THP as these are guaranteed not
+ * to be special.
+ */
+static inline int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
+			 int write, struct page **pages, int *nr)
+{
+	return 0;
+}
+#endif /* __HAVE_ARCH_PTE_SPECIAL */
+
+static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
+		unsigned long end, int write, struct page **pages, int *nr)
+{
+	struct page *head, *page, *tail;
+	int refs;
+
+	if (write && !pmd_write(orig))
+		return 0;
+
+	refs = 0;
+	head = pmd_page(orig);
+	page = head + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
+	tail = page;
+	do {
+		VM_BUG_ON(compound_head(page) != head);
+		pages[*nr] = page;
+		(*nr)++;
+		page++;
+		refs++;
+	} while (addr += PAGE_SIZE, addr != end);
+
+	if (!page_cache_add_speculative(head, refs)) {
+		*nr -= refs;
+		return 0;
+	}
+
+	if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) {
+		*nr -= refs;
+		while (refs--)
+			put_page(head);
+		return 0;
+	}
+
+	/*
+	 * Any tail pages need their mapcount reference taken before we
+	 * return. (This allows the THP code to bump their ref count when
+	 * they are split into base pages).
+	 */
+	while (refs--) {
+		if (PageTail(tail))
+			get_huge_page_tail(tail);
+		tail++;
+	}
+
+	return 1;
+}
+
+static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
+		unsigned long end, int write, struct page **pages, int *nr)
+{
+	struct page *head, *page, *tail;
+	int refs;
+
+	if (write && !pud_write(orig))
+		return 0;
+
+	refs = 0;
+	head = pud_page(orig);
+	page = head + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
+	tail = page;
+	do {
+		VM_BUG_ON(compound_head(page) != head);
+		pages[*nr] = page;
+		(*nr)++;
+		page++;
+		refs++;
+	} while (addr += PAGE_SIZE, addr != end);
+
+	if (!page_cache_add_speculative(head, refs)) {
+		*nr -= refs;
+		return 0;
+	}
+
+	if (unlikely(pud_val(orig) != pud_val(*pudp))) {
+		*nr -= refs;
+		while (refs--)
+			put_page(head);
+		return 0;
+	}
+
+	while (refs--) {
+		if (PageTail(tail))
+			get_huge_page_tail(tail);
+		tail++;
+	}
+
+	return 1;
+}
+
+static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
+		int write, struct page **pages, int *nr)
+{
+	unsigned long next;
+	pmd_t *pmdp;
+
+	pmdp = pmd_offset(&pud, addr);
+	do {
+		pmd_t pmd = ACCESS_ONCE(*pmdp);
+		next = pmd_addr_end(addr, end);
+		if (pmd_none(pmd) || pmd_trans_splitting(pmd))
+			return 0;
+
+		if (unlikely(pmd_trans_huge(pmd) || pmd_huge(pmd))) {
+			if (!gup_huge_pmd(pmd, pmdp, addr, next, write,
+				pages, nr))
+				return 0;
+		} else {
+			if (!gup_pte_range(pmd, addr, next, write, pages, nr))
+				return 0;
+		}
+	} while (pmdp++, addr = next, addr != end);
+
+	return 1;
+}
+
+static int gup_pud_range(pgd_t *pgdp, unsigned long addr, unsigned long end,
+		int write, struct page **pages, int *nr)
+{
+	unsigned long next;
+	pud_t *pudp;
+
+	pudp = pud_offset(pgdp, addr);
+	do {
+		pud_t pud = ACCESS_ONCE(*pudp);
+		next = pud_addr_end(addr, end);
+		if (pud_none(pud))
+			return 0;
+		if (pud_huge(pud)) {
+			if (!gup_huge_pud(pud, pudp, addr, next, write,
+					pages, nr))
+				return 0;
+		} else if (!gup_pmd_range(pud, addr, next, write, pages, nr))
+			return 0;
+	} while (pudp++, addr = next, addr != end);
+
+	return 1;
+}
+
+/*
+ * Like get_user_pages_fast() except its IRQ-safe in that it won't fall
+ * back to the regular GUP.
+ */
+int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
+			  struct page **pages)
+{
+	struct mm_struct *mm = current->mm;
+	unsigned long addr, len, end;
+	unsigned long next, flags;
+	pgd_t *pgdp;
+	int nr = 0;
+
+	start &= PAGE_MASK;
+	addr = start;
+	len = (unsigned long) nr_pages << PAGE_SHIFT;
+	end = start + len;
+
+	if (unlikely(!access_ok(write ? VERIFY_WRITE : VERIFY_READ,
+					start, len)))
+		return 0;
+
+	/*
+	 * Disable interrupts, we use the nested form as we can already
+	 * have interrupts disabled by get_futex_key.
+	 *
+	 * With interrupts disabled, we block page table pages from being
+	 * freed from under us. See mmu_gather_tlb in asm-generic/tlb.h
+	 * for more details.
+	 *
+	 * We do not adopt an rcu_read_lock(.) here as we also want to
+	 * block IPIs that come from THPs splitting.
+	 */
+
+	local_irq_save(flags);
+	pgdp = pgd_offset(mm, addr);
+	do {
+		next = pgd_addr_end(addr, end);
+		if (pgd_none(*pgdp))
+			break;
+		else if (!gup_pud_range(pgdp, addr, next, write, pages, &nr))
+			break;
+	} while (pgdp++, addr = next, addr != end);
+	local_irq_restore(flags);
+
+	return nr;
+}
+
+int get_user_pages_fast(unsigned long start, int nr_pages, int write,
+			struct page **pages)
+{
+	struct mm_struct *mm = current->mm;
+	int nr, ret;
+
+	start &= PAGE_MASK;
+	nr = __get_user_pages_fast(start, nr_pages, write, pages);
+	ret = nr;
+
+	if (nr < nr_pages) {
+		/* Try to get the remaining pages with get_user_pages */
+		start += nr << PAGE_SHIFT;
+		pages += nr;
+
+		down_read(&mm->mmap_sem);
+		ret = get_user_pages(current, mm, start,
+				     nr_pages - nr, write, 0, pages, NULL);
+		up_read(&mm->mmap_sem);
+
+		/* Have to be a bit careful with return values */
+		if (nr > 0) {
+			if (ret < 0)
+				ret = nr;
+			else
+				ret += nr;
+		}
+	}
+
+	return ret;
+}
+
+#endif /* CONFIG_HAVE_GENERIC_RCU_GUP */
-- 
1.9.3


* [PATCH V3 2/6] arm: mm: Introduce special ptes for LPAE
From: Steve Capper @ 2014-08-28 14:45 UTC
  To: linux-arm-kernel, catalin.marinas, linux, linux-arch, linux-mm, akpm
  Cc: will.deacon, gary.robertson, christoffer.dall, peterz,
	anders.roxell, dann.frazier, mark.rutland, mgorman, Steve Capper

We need a mechanism to tag ptes as being special; this indicates that
no attempt should be made to access the underlying struct page *
associated with the pte. This is used by the fast_gup when operating on
ptes, as it has no means of accessing the VMAs (which also carry this
information) locklessly.

The L_PTE_SPECIAL bit is already allocated for LPAE; this patch modifies
pte_special and pte_mkspecial to make use of it, and defines
__HAVE_ARCH_PTE_SPECIAL.

This patch also excludes special ptes from the icache/dcache sync logic.
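
As a rough sketch of how the bit is consumed (illustrative only; the
helper below is not part of this patch), the lockless walker simply
refuses anything marked special, since there may be no usable struct
page behind such a mapping:

  /* Mirrors the pte test in the generic fast GUP: special ptes (e.g.
   * raw PFN mappings) are left to the slow, VMA-aware path. */
  static inline int fast_gup_can_follow(pte_t pte, int write)
  {
          if (!pte_present(pte) || pte_special(pte))
                  return 0;
          if (write && !pte_write(pte))
                  return 0;
          return 1;
  }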

Signed-off-by: Steve Capper <steve.capper@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm/include/asm/pgtable-2level.h | 2 ++
 arch/arm/include/asm/pgtable-3level.h | 7 +++++++
 arch/arm/include/asm/pgtable.h        | 6 ++----
 3 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h
index 219ac88..f027941 100644
--- a/arch/arm/include/asm/pgtable-2level.h
+++ b/arch/arm/include/asm/pgtable-2level.h
@@ -182,6 +182,8 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
 #define pmd_addr_end(addr,end) (end)
 
 #define set_pte_ext(ptep,pte,ext) cpu_set_pte_ext(ptep,pte,ext)
+#define pte_special(pte)	(0)
+static inline pte_t pte_mkspecial(pte_t pte) { return pte; }
 
 /*
  * We don't have huge page support for short descriptors, for the moment
diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index 06e0bc0..16122d4 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -213,6 +213,13 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
 #define pmd_isclear(pmd, val)	(!(pmd_val(pmd) & (val)))
 
 #define pmd_young(pmd)		(pmd_isset((pmd), PMD_SECT_AF))
+#define pte_special(pte)	(pte_isset((pte), L_PTE_SPECIAL))
+static inline pte_t pte_mkspecial(pte_t pte)
+{
+	pte_val(pte) |= L_PTE_SPECIAL;
+	return pte;
+}
+#define	__HAVE_ARCH_PTE_SPECIAL
 
 #define __HAVE_ARCH_PMD_WRITE
 #define pmd_write(pmd)		(pmd_isclear((pmd), L_PMD_SECT_RDONLY))
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index 01baef0..90aa4583 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -226,7 +226,6 @@ static inline pte_t *pmd_page_vaddr(pmd_t pmd)
 #define pte_dirty(pte)		(pte_isset((pte), L_PTE_DIRTY))
 #define pte_young(pte)		(pte_isset((pte), L_PTE_YOUNG))
 #define pte_exec(pte)		(pte_isclear((pte), L_PTE_XN))
-#define pte_special(pte)	(0)
 
 #define pte_valid_user(pte)	\
 	(pte_valid(pte) && pte_isset((pte), L_PTE_USER) && pte_young(pte))
@@ -245,7 +244,8 @@ static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
 	unsigned long ext = 0;
 
 	if (addr < TASK_SIZE && pte_valid_user(pteval)) {
-		__sync_icache_dcache(pteval);
+		if (!pte_special(pteval))
+			__sync_icache_dcache(pteval);
 		ext |= PTE_EXT_NG;
 	}
 
@@ -264,8 +264,6 @@ PTE_BIT_FUNC(mkyoung,   |= L_PTE_YOUNG);
 PTE_BIT_FUNC(mkexec,   &= ~L_PTE_XN);
 PTE_BIT_FUNC(mknexec,   |= L_PTE_XN);
 
-static inline pte_t pte_mkspecial(pte_t pte) { return pte; }
-
 static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 {
 	const pteval_t mask = L_PTE_XN | L_PTE_RDONLY | L_PTE_USER |
-- 
1.9.3


* [PATCH V3 3/6] arm: mm: Enable HAVE_RCU_TABLE_FREE logic
From: Steve Capper @ 2014-08-28 14:45 UTC
  To: linux-arm-kernel, catalin.marinas, linux, linux-arch, linux-mm, akpm
  Cc: will.deacon, gary.robertson, christoffer.dall, peterz,
	anders.roxell, dann.frazier, mark.rutland, mgorman, Steve Capper

In order to implement fast_get_user_pages we need to ensure that the
page table walker is protected from page table pages being freed from
under it.

This patch enables HAVE_RCU_TABLE_FREE; any page table pages belonging
to address spaces with multiple users will be freed via call_rcu_sched.
This means that disabling interrupts will block the free and so protect
the fast GUP page walker.
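
A hedged sketch of why disabling interrupts is enough (illustrative
walker only, not code from this series):

  /* While IRQs are off, this CPU cannot pass through a quiescent state,
   * so the call_rcu_sched() callbacks that HAVE_RCU_TABLE_FREE uses to
   * free page table pages cannot complete, and the entries read inside
   * the critical section stay backed by live pages. */
  static int walk_without_locks(struct mm_struct *mm, unsigned long addr)
  {
          unsigned long flags;
          int found = 0;

          local_irq_save(flags);
          /* ... read pgd/pud/pmd/pte entries of mm for addr here ... */
          local_irq_restore(flags);

          return found;
  }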

Signed-off-by: Steve Capper <steve.capper@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm/Kconfig           |  1 +
 arch/arm/include/asm/tlb.h | 38 ++++++++++++++++++++++++++++++++++++--
 2 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index c49a775..cc740d2 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -60,6 +60,7 @@ config ARM
 	select HAVE_PERF_EVENTS
 	select HAVE_PERF_REGS
 	select HAVE_PERF_USER_STACK_DUMP
+	select HAVE_RCU_TABLE_FREE if (SMP && ARM_LPAE)
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_SYSCALL_TRACEPOINTS
 	select HAVE_UID16
diff --git a/arch/arm/include/asm/tlb.h b/arch/arm/include/asm/tlb.h
index f1a0dac..3cadb72 100644
--- a/arch/arm/include/asm/tlb.h
+++ b/arch/arm/include/asm/tlb.h
@@ -35,12 +35,39 @@
 
 #define MMU_GATHER_BUNDLE	8
 
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+static inline void __tlb_remove_table(void *_table)
+{
+	free_page_and_swap_cache((struct page *)_table);
+}
+
+struct mmu_table_batch {
+	struct rcu_head		rcu;
+	unsigned int		nr;
+	void			*tables[0];
+};
+
+#define MAX_TABLE_BATCH		\
+	((PAGE_SIZE - sizeof(struct mmu_table_batch)) / sizeof(void *))
+
+extern void tlb_table_flush(struct mmu_gather *tlb);
+extern void tlb_remove_table(struct mmu_gather *tlb, void *table);
+
+#define tlb_remove_entry(tlb, entry)	tlb_remove_table(tlb, entry)
+#else
+#define tlb_remove_entry(tlb, entry)	tlb_remove_page(tlb, entry)
+#endif /* CONFIG_HAVE_RCU_TABLE_FREE */
+
 /*
  * TLB handling.  This allows us to remove pages from the page
  * tables, and efficiently handle the TLB issues.
  */
 struct mmu_gather {
 	struct mm_struct	*mm;
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+	struct mmu_table_batch	*batch;
+	unsigned int		need_flush;
+#endif
 	unsigned int		fullmm;
 	struct vm_area_struct	*vma;
 	unsigned long		start, end;
@@ -101,6 +128,9 @@ static inline void __tlb_alloc_page(struct mmu_gather *tlb)
 static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
 {
 	tlb_flush(tlb);
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+	tlb_table_flush(tlb);
+#endif
 }
 
 static inline void tlb_flush_mmu_free(struct mmu_gather *tlb)
@@ -129,6 +159,10 @@ tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned long start
 	tlb->pages = tlb->local;
 	tlb->nr = 0;
 	__tlb_alloc_page(tlb);
+
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+	tlb->batch = NULL;
+#endif
 }
 
 static inline void
@@ -205,7 +239,7 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
 	tlb_add_flush(tlb, addr + SZ_1M);
 #endif
 
-	tlb_remove_page(tlb, pte);
+	tlb_remove_entry(tlb, pte);
 }
 
 static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
@@ -213,7 +247,7 @@ static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
 {
 #ifdef CONFIG_ARM_LPAE
 	tlb_add_flush(tlb, addr);
-	tlb_remove_page(tlb, virt_to_page(pmdp));
+	tlb_remove_entry(tlb, virt_to_page(pmdp));
 #endif
 }
 
-- 
1.9.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH V3 3/6] arm: mm: Enable HAVE_RCU_TABLE_FREE logic
@ 2014-08-28 14:45   ` Steve Capper
  0 siblings, 0 replies; 43+ messages in thread
From: Steve Capper @ 2014-08-28 14:45 UTC (permalink / raw)
  To: linux-arm-kernel, catalin.marinas, linux, linux-arch, linux-mm, akpm
  Cc: will.deacon, gary.robertson, christoffer.dall, peterz,
	anders.roxell, dann.frazier, mark.rutland, mgorman, Steve Capper

In order to implement fast_get_user_pages we need to ensure that the
page table walker is protected from page table pages being freed from
under it.

This patch enables HAVE_RCU_TABLE_FREE, any page table pages belonging
to address spaces with multiple users will be call_rcu_sched freed.
Meaning that disabling interrupts will block the free and protect
the fast gup page walker.

Signed-off-by: Steve Capper <steve.capper@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm/Kconfig           |  1 +
 arch/arm/include/asm/tlb.h | 38 ++++++++++++++++++++++++++++++++++++--
 2 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index c49a775..cc740d2 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -60,6 +60,7 @@ config ARM
 	select HAVE_PERF_EVENTS
 	select HAVE_PERF_REGS
 	select HAVE_PERF_USER_STACK_DUMP
+	select HAVE_RCU_TABLE_FREE if (SMP && ARM_LPAE)
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_SYSCALL_TRACEPOINTS
 	select HAVE_UID16
diff --git a/arch/arm/include/asm/tlb.h b/arch/arm/include/asm/tlb.h
index f1a0dac..3cadb72 100644
--- a/arch/arm/include/asm/tlb.h
+++ b/arch/arm/include/asm/tlb.h
@@ -35,12 +35,39 @@
 
 #define MMU_GATHER_BUNDLE	8
 
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+static inline void __tlb_remove_table(void *_table)
+{
+	free_page_and_swap_cache((struct page *)_table);
+}
+
+struct mmu_table_batch {
+	struct rcu_head		rcu;
+	unsigned int		nr;
+	void			*tables[0];
+};
+
+#define MAX_TABLE_BATCH		\
+	((PAGE_SIZE - sizeof(struct mmu_table_batch)) / sizeof(void *))
+
+extern void tlb_table_flush(struct mmu_gather *tlb);
+extern void tlb_remove_table(struct mmu_gather *tlb, void *table);
+
+#define tlb_remove_entry(tlb, entry)	tlb_remove_table(tlb, entry)
+#else
+#define tlb_remove_entry(tlb, entry)	tlb_remove_page(tlb, entry)
+#endif /* CONFIG_HAVE_RCU_TABLE_FREE */
+
 /*
  * TLB handling.  This allows us to remove pages from the page
  * tables, and efficiently handle the TLB issues.
  */
 struct mmu_gather {
 	struct mm_struct	*mm;
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+	struct mmu_table_batch	*batch;
+	unsigned int		need_flush;
+#endif
 	unsigned int		fullmm;
 	struct vm_area_struct	*vma;
 	unsigned long		start, end;
@@ -101,6 +128,9 @@ static inline void __tlb_alloc_page(struct mmu_gather *tlb)
 static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
 {
 	tlb_flush(tlb);
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+	tlb_table_flush(tlb);
+#endif
 }
 
 static inline void tlb_flush_mmu_free(struct mmu_gather *tlb)
@@ -129,6 +159,10 @@ tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned long start
 	tlb->pages = tlb->local;
 	tlb->nr = 0;
 	__tlb_alloc_page(tlb);
+
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+	tlb->batch = NULL;
+#endif
 }
 
 static inline void
@@ -205,7 +239,7 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
 	tlb_add_flush(tlb, addr + SZ_1M);
 #endif
 
-	tlb_remove_page(tlb, pte);
+	tlb_remove_entry(tlb, pte);
 }
 
 static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
@@ -213,7 +247,7 @@ static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
 {
 #ifdef CONFIG_ARM_LPAE
 	tlb_add_flush(tlb, addr);
-	tlb_remove_page(tlb, virt_to_page(pmdp));
+	tlb_remove_entry(tlb, virt_to_page(pmdp));
 #endif
 }
 
-- 
1.9.3


* [PATCH V3 4/6] arm: mm: Enable RCU fast_gup
@ 2014-08-28 14:45   ` Steve Capper
  0 siblings, 0 replies; 43+ messages in thread
From: Steve Capper @ 2014-08-28 14:45 UTC (permalink / raw)
  To: linux-arm-kernel, catalin.marinas, linux, linux-arch, linux-mm, akpm
  Cc: will.deacon, gary.robertson, christoffer.dall, peterz,
	anders.roxell, dann.frazier, mark.rutland, mgorman, Steve Capper

Activate the RCU fast_gup for ARM. We also need to force THP splits to
broadcast an IPI such that we block in the fast_gup page walker. As THP
splits are comparatively rare, this should not lead to a noticeable
performance degradation.

Some prerequisite functions, pud_write and pud_page, are also added.
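
For illustration only (not part of this patch), a rough sketch of how the
broadcast IPI and the walker interact.  gup_pmd_sketch() is a hypothetical
name; the real check lives in the generic fast_gup pmd loop.  Because the
walker runs with interrupts disabled, the IPI issued by
pmdp_splitting_flush() cannot be delivered until the walker finishes, and
the walker itself backs off as soon as it observes a splitting pmd:

	static int gup_pmd_sketch(pmd_t pmd)
	{
		/* splitting (or cleared) pmd: bail out to the slow path */
		if (pmd_none(pmd) || pmd_trans_splitting(pmd))
			return 0;

		/* otherwise the huge pmd or pte table can be walked safely */
		return 1;
	}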

Signed-off-by: Steve Capper <steve.capper@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm/Kconfig                      |  4 ++++
 arch/arm/include/asm/pgtable-3level.h |  8 ++++++++
 arch/arm/mm/flush.c                   | 15 +++++++++++++++
 3 files changed, 27 insertions(+)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index cc740d2..0e5b47f 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1645,6 +1645,10 @@ config ARCH_SELECT_MEMORY_MODEL
 config HAVE_ARCH_PFN_VALID
 	def_bool ARCH_HAS_HOLES_MEMORYMODEL || !SPARSEMEM
 
+config HAVE_GENERIC_RCU_GUP
+	def_bool y
+	depends on ARM_LPAE
+
 config HIGHMEM
 	bool "High Memory Support"
 	depends on MMU
diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index 16122d4..a31ecdad 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -224,6 +224,8 @@ static inline pte_t pte_mkspecial(pte_t pte)
 #define __HAVE_ARCH_PMD_WRITE
 #define pmd_write(pmd)		(pmd_isclear((pmd), L_PMD_SECT_RDONLY))
 #define pmd_dirty(pmd)		(pmd_isset((pmd), L_PMD_SECT_DIRTY))
+#define pud_page(pud)		pmd_page(__pmd(pud_val(pud)))
+#define pud_write(pud)		pmd_write(__pmd(pud_val(pud)))
 
 #define pmd_hugewillfault(pmd)	(!pmd_young(pmd) || !pmd_write(pmd))
 #define pmd_thp_or_huge(pmd)	(pmd_huge(pmd) || pmd_trans_huge(pmd))
@@ -231,6 +233,12 @@ static inline pte_t pte_mkspecial(pte_t pte)
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 #define pmd_trans_huge(pmd)	(pmd_val(pmd) && !pmd_table(pmd))
 #define pmd_trans_splitting(pmd) (pmd_isset((pmd), L_PMD_SECT_SPLITTING))
+
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+#define __HAVE_ARCH_PMDP_SPLITTING_FLUSH
+void pmdp_splitting_flush(struct vm_area_struct *vma, unsigned long address,
+			  pmd_t *pmdp);
+#endif
 #endif
 
 #define PMD_BIT_FUNC(fn,op) \
diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
index 43d54f5..265b836 100644
--- a/arch/arm/mm/flush.c
+++ b/arch/arm/mm/flush.c
@@ -400,3 +400,18 @@ void __flush_anon_page(struct vm_area_struct *vma, struct page *page, unsigned l
 	 */
 	__cpuc_flush_dcache_area(page_address(page), PAGE_SIZE);
 }
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+void pmdp_splitting_flush(struct vm_area_struct *vma, unsigned long address,
+			  pmd_t *pmdp)
+{
+	pmd_t pmd = pmd_mksplitting(*pmdp);
+	VM_BUG_ON(address & ~PMD_MASK);
+	set_pmd_at(vma->vm_mm, address, pmdp, pmd);
+
+	/* dummy IPI to serialise against fast_gup */
+	kick_all_cpus_sync();
+}
+#endif /* CONFIG_HAVE_RCU_TABLE_FREE */
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
-- 
1.9.3


* [PATCH V3 5/6] arm64: mm: Enable HAVE_RCU_TABLE_FREE logic
@ 2014-08-28 14:45   ` Steve Capper
  0 siblings, 0 replies; 43+ messages in thread
From: Steve Capper @ 2014-08-28 14:45 UTC (permalink / raw)
  To: linux-arm-kernel, catalin.marinas, linux, linux-arch, linux-mm, akpm
  Cc: will.deacon, gary.robertson, christoffer.dall, peterz,
	anders.roxell, dann.frazier, mark.rutland, mgorman, Steve Capper

In order to implement get_user_pages_fast we need to ensure that the
page table walker is protected from page table pages being freed from
under it.

This patch enables HAVE_RCU_TABLE_FREE: any page table pages belonging
to address spaces with multiple users will be freed via call_rcu_sched.
This means that disabling interrupts will block the free and protect the
fast gup page walker.
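
For illustration only (not part of this patch), a simplified sketch of the
deferred free performed by the generic CONFIG_HAVE_RCU_TABLE_FREE code in
mm/memory.c once an architecture hands a table over via tlb_remove_table().
The type and function names below are hypothetical (the real code batches
many tables per rcu_head and falls back to an IPI when no batch page can
be allocated); __tlb_remove_table() is the arch hook added here:

	struct table_free_sketch {
		struct rcu_head	rcu;
		void		*table;
	};

	static void table_free_rcu(struct rcu_head *head)
	{
		struct table_free_sketch *t =
			container_of(head, struct table_free_sketch, rcu);

		__tlb_remove_table(t->table);	/* frees the backing page */
		kfree(t);
	}

	/* The table is only freed after a grace period, i.e. after every
	   CPU has run with interrupts enabled again. */
	static void queue_table_free(struct table_free_sketch *t)
	{
		call_rcu_sched(&t->rcu, table_free_rcu);
	}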

Signed-off-by: Steve Capper <steve.capper@linaro.org>
Tested-by: Dann Frazier <dann.frazier@canonical.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm64/Kconfig           |  1 +
 arch/arm64/include/asm/tlb.h | 20 +++++++++++++++++---
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index fd4e81a..ce9062b 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -54,6 +54,7 @@ config ARM64
 	select HAVE_PERF_EVENTS
 	select HAVE_PERF_REGS
 	select HAVE_PERF_USER_STACK_DUMP
+	select HAVE_RCU_TABLE_FREE
 	select HAVE_SYSCALL_TRACEPOINTS
 	select IRQ_DOMAIN
 	select MODULES_USE_ELF_RELA
diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
index 62731ef..a82c0c5 100644
--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -23,6 +23,20 @@
 
 #include <asm-generic/tlb.h>
 
+#include <linux/pagemap.h>
+#include <linux/swap.h>
+
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+
+#define tlb_remove_entry(tlb, entry)	tlb_remove_table(tlb, entry)
+static inline void __tlb_remove_table(void *_table)
+{
+	free_page_and_swap_cache((struct page *)_table);
+}
+#else
+#define tlb_remove_entry(tlb, entry)	tlb_remove_page(tlb, entry)
+#endif /* CONFIG_HAVE_RCU_TABLE_FREE */
+
 /*
  * There's three ways the TLB shootdown code is used:
  *  1. Unmapping a range of vmas.  See zap_page_range(), unmap_region().
@@ -88,7 +102,7 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
 {
 	pgtable_page_dtor(pte);
 	tlb_add_flush(tlb, addr);
-	tlb_remove_page(tlb, pte);
+	tlb_remove_entry(tlb, pte);
 }
 
 #if CONFIG_ARM64_PGTABLE_LEVELS > 2
@@ -96,7 +110,7 @@ static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
 				  unsigned long addr)
 {
 	tlb_add_flush(tlb, addr);
-	tlb_remove_page(tlb, virt_to_page(pmdp));
+	tlb_remove_entry(tlb, virt_to_page(pmdp));
 }
 #endif
 
@@ -105,7 +119,7 @@ static inline void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pudp,
 				  unsigned long addr)
 {
 	tlb_add_flush(tlb, addr);
-	tlb_remove_page(tlb, virt_to_page(pudp));
+	tlb_remove_entry(tlb, virt_to_page(pudp));
 }
 #endif
 
-- 
1.9.3



* [PATCH V3 6/6] arm64: mm: Enable RCU fast_gup
@ 2014-08-28 14:45   ` Steve Capper
  0 siblings, 0 replies; 43+ messages in thread
From: Steve Capper @ 2014-08-28 14:45 UTC (permalink / raw)
  To: linux-arm-kernel, catalin.marinas, linux, linux-arch, linux-mm, akpm
  Cc: will.deacon, gary.robertson, christoffer.dall, peterz,
	anders.roxell, dann.frazier, mark.rutland, mgorman, Steve Capper

Activate the RCU fast_gup for ARM64. We also need to force THP splits
to broadcast an IPI such that we block in the fast_gup page walker. As THP
splits are comparatively rare, this should not lead to a noticeable
performance degradation.

Some prerequisite functions, pud_write and pud_page, are also added.
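
For illustration only (not part of this patch), a short sketch of why the
pud helpers are needed: the generic RCU gup walks huge puds with the same
pattern it uses for huge pmds, so it wants pud_write() for the permission
check and pud_page() to reach the head page.  gup_huge_pud_sketch() is a
hypothetical name and the body is heavily abridged (the real code pins
every subpage in the range and re-checks the pud afterwards):

	static int gup_huge_pud_sketch(pud_t orig, int write, struct page **pages)
	{
		if (write && !pud_write(orig))
			return 0;	/* caller falls back to the slow path */

		get_page(pud_page(orig));	/* pin the head page */
		*pages = pud_page(orig);
		return 1;
	}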

Signed-off-by: Steve Capper <steve.capper@linaro.org>
Tested-by: Dann Frazier <dann.frazier@canonical.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm64/Kconfig               |  3 +++
 arch/arm64/include/asm/pgtable.h | 21 ++++++++++++++++++++-
 arch/arm64/mm/flush.c            | 15 +++++++++++++++
 3 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index ce9062b..435305e 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -108,6 +108,9 @@ config GENERIC_CALIBRATE_DELAY
 config ZONE_DMA
 	def_bool y
 
+config HAVE_GENERIC_RCU_GUP
+	def_bool y
+
 config ARCH_DMA_ADDR_T_64BIT
 	def_bool y
 
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index ffe1ba0..6d81471 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -239,6 +239,16 @@ static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
 
 #define __HAVE_ARCH_PTE_SPECIAL
 
+static inline pte_t pud_pte(pud_t pud)
+{
+	return __pte(pud_val(pud));
+}
+
+static inline pmd_t pud_pmd(pud_t pud)
+{
+	return __pmd(pud_val(pud));
+}
+
 static inline pte_t pmd_pte(pmd_t pmd)
 {
 	return __pte(pmd_val(pmd));
@@ -256,7 +266,13 @@ static inline pmd_t pte_pmd(pte_t pte)
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 #define pmd_trans_huge(pmd)	(pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT))
 #define pmd_trans_splitting(pmd)	pte_special(pmd_pte(pmd))
-#endif
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+#define __HAVE_ARCH_PMDP_SPLITTING_FLUSH
+struct vm_area_struct;
+void pmdp_splitting_flush(struct vm_area_struct *vma, unsigned long address,
+			  pmd_t *pmdp);
+#endif /* CONFIG_HAVE_RCU_TABLE_FREE */
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 #define pmd_young(pmd)		pte_young(pmd_pte(pmd))
 #define pmd_wrprotect(pmd)	pte_pmd(pte_wrprotect(pmd_pte(pmd)))
@@ -277,6 +293,7 @@ static inline pmd_t pte_pmd(pte_t pte)
 #define mk_pmd(page,prot)	pfn_pmd(page_to_pfn(page),prot)
 
 #define pmd_page(pmd)           pfn_to_page(__phys_to_pfn(pmd_val(pmd) & PHYS_MASK))
+#define pud_write(pud)		pte_write(pud_pte(pud))
 #define pud_pfn(pud)		(((pud_val(pud) & PUD_MASK) & PHYS_MASK) >> PAGE_SHIFT)
 
 #define set_pmd_at(mm, addr, pmdp, pmd)	set_pte_at(mm, addr, (pte_t *)pmdp, pmd_pte(pmd))
@@ -376,6 +393,8 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
 	return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(addr);
 }
 
+#define pud_page(pud)           pmd_page(pud_pmd(pud))
+
 #endif	/* CONFIG_ARM64_PGTABLE_LEVELS > 2 */
 
 #if CONFIG_ARM64_PGTABLE_LEVELS > 3
diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
index 0d64089..2d5fd47 100644
--- a/arch/arm64/mm/flush.c
+++ b/arch/arm64/mm/flush.c
@@ -104,3 +104,18 @@ EXPORT_SYMBOL(flush_dcache_page);
  */
 EXPORT_SYMBOL(flush_cache_all);
 EXPORT_SYMBOL(flush_icache_range);
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+void pmdp_splitting_flush(struct vm_area_struct *vma, unsigned long address,
+			  pmd_t *pmdp)
+{
+	pmd_t pmd = pmd_mksplitting(*pmdp);
+	VM_BUG_ON(address & ~PMD_MASK);
+	set_pmd_at(vma->vm_mm, address, pmdp, pmd);
+
+	/* dummy IPI to serialise against fast_gup */
+	kick_all_cpus_sync();
+}
+#endif /* CONFIG_HAVE_RCU_TABLE_FREE */
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
-- 
1.9.3



* Re: [PATCH V3 0/6] RCU get_user_pages_fast and __get_user_pages_fast
@ 2014-08-28 15:23   ` Will Deacon
  0 siblings, 0 replies; 43+ messages in thread
From: Will Deacon @ 2014-08-28 15:23 UTC (permalink / raw)
  To: Steve Capper
  Cc: linux-arm-kernel, Catalin Marinas, linux, linux-arch, linux-mm,
	akpm, gary.robertson, christoffer.dall, peterz, anders.roxell,
	dann.frazier, Mark Rutland, mgorman

On Thu, Aug 28, 2014 at 03:45:01PM +0100, Steve Capper wrote:
> I would like to get this series into 3.18 as it fixes quite a big problem
> with THP on arm and arm64. This series is split into a core mm part, an
> arm part and an arm64 part.
> 
> Could somebody please take patch #1 (if it looks okay)?
> Russell, would you be happy with patches #2, #3, #4? (if we get #1 merged)
> Catalin, would you be happy taking patches #5, #6? (if we get #1 merged)

Pretty sure we're happy to take the arm64 bits once you've got the core
changes sorted out. Failing that, Catalin's acked them so they could go via
an mm tree if it's easier.

Will


* Re: [PATCH V3 0/6] RCU get_user_pages_fast and __get_user_pages_fast
@ 2014-09-01 11:43     ` Steve Capper
  0 siblings, 0 replies; 43+ messages in thread
From: Steve Capper @ 2014-09-01 11:43 UTC (permalink / raw)
  To: linux-mm, Will Deacon, akpm
  Cc: linux-arm-kernel, Catalin Marinas, linux, linux-arch,
	gary.robertson, christoffer.dall, peterz, anders.roxell,
	dann.frazier, Mark Rutland, mgorman

On 28 August 2014 16:23, Will Deacon <will.deacon@arm.com> wrote:
> On Thu, Aug 28, 2014 at 03:45:01PM +0100, Steve Capper wrote:
>> I would like to get this series into 3.18 as it fixes quite a big problem
>> with THP on arm and arm64. This series is split into a core mm part, an
>> arm part and an arm64 part.
>>
>> Could somebody please take patch #1 (if it looks okay)?
>> Russell, would you be happy with patches #2, #3, #4? (if we get #1 merged)
>> Catalin, would you be happy taking patches #5, #6? (if we get #1 merged)
>
> Pretty sure we're happy to take the arm64 bits once you've got the core
> changes sorted out. Failing that, Catalin's acked them so they could go via
> an mm tree if it's easier.
>

Hello,

Are any mm maintainers willing to take the first patch from this
series into their tree for merging into 3.18?
  mm: Introduce a general RCU get_user_pages_fast.

(or please let me know if there are any issues with the patch that
need addressing).

As Will has stated, Catalin's already acked the arm64 patches, and
these can also go in via an mm tree if that makes things easier:
  arm64: mm: Enable HAVE_RCU_TABLE_FREE logic
  arm64: mm: Enable RCU fast_gup

Thanks,
--
Steve



* Re: [PATCH V3 0/6] RCU get_user_pages_fast and __get_user_pages_fast
@ 2014-09-08  9:06       ` Steve Capper
  0 siblings, 0 replies; 43+ messages in thread
From: Steve Capper @ 2014-09-08  9:06 UTC (permalink / raw)
  To: linux-mm, Will Deacon, akpm
  Cc: linux-arm-kernel, Catalin Marinas, linux, linux-arch,
	gary.robertson, christoffer.dall, peterz, anders.roxell,
	dann.frazier, Mark Rutland, mgorman

On Mon, Sep 01, 2014 at 12:43:06PM +0100, Steve Capper wrote:
> On 28 August 2014 16:23, Will Deacon <will.deacon@arm.com> wrote:
> > On Thu, Aug 28, 2014 at 03:45:01PM +0100, Steve Capper wrote:
> >> I would like to get this series into 3.18 as it fixes quite a big problem
> >> with THP on arm and arm64. This series is split into a core mm part, an
> >> arm part and an arm64 part.
> >>
> >> Could somebody please take patch #1 (if it looks okay)?
> >> Russell, would you be happy with patches #2, #3, #4? (if we get #1 merged)
> >> Catalin, would you be happy taking patches #5, #6? (if we get #1 merged)
> >
> > Pretty sure we're happy to take the arm64 bits once you've got the core
> > changes sorted out. Failing that, Catalin's acked them so they could go via
> > an mm tree if it's easier.
> >
> 
> Hello,
> 
> Are any mm maintainers willing to take the first patch from this
> series into their tree for merging into 3.18?
>   mm: Introduce a general RCU get_user_pages_fast.
> 
> (or please let me know if there are any issues with the patch that
> need addressing).
> 
> As Will has stated, Catalin's already acked the arm64 patches, and
> these can also go in via an mm tree if that makes things easier:
>   arm64: mm: Enable HAVE_RCU_TABLE_FREE logic
>   arm64: mm: Enable RCU fast_gup
> 
> Thanks,
> --
> Steve

Hi,
Just a ping on this.

I was wondering if the first patch in this series:

[PATCH V3 1/6] mm: Introduce a general RCU get_user_pages_fast.
http://marc.info/?l=linux-mm&m=140923713202355&w=2

could be merged into 3.18 via an mm tree, or if there are any issues
with the patch that I should fix?

Acks or flames from the mm maintainers would be greatly appreciated!

Cheers,
-- 
Steve



* Re: [PATCH V3 0/6] RCU get_user_pages_fast and __get_user_pages_fast
@ 2014-09-19 18:28         ` Steve Capper
  0 siblings, 0 replies; 43+ messages in thread
From: Steve Capper @ 2014-09-19 18:28 UTC (permalink / raw)
  To: linux-mm, Will Deacon, akpm
  Cc: linux-arm-kernel, Catalin Marinas, linux, linux-arch,
	gary.robertson, christoffer.dall, peterz, anders.roxell,
	dann.frazier, Mark Rutland, mgorman, hughd

On Mon, Sep 08, 2014 at 10:06:27AM +0100, Steve Capper wrote:
> On Mon, Sep 01, 2014 at 12:43:06PM +0100, Steve Capper wrote:
> > On 28 August 2014 16:23, Will Deacon <will.deacon@arm.com> wrote:
> > > On Thu, Aug 28, 2014 at 03:45:01PM +0100, Steve Capper wrote:
> > >> I would like to get this series into 3.18 as it fixes quite a big problem
> > >> with THP on arm and arm64. This series is split into a core mm part, an
> > >> arm part and an arm64 part.
> > >>
> > >> Could somebody please take patch #1 (if it looks okay)?
> > >> Russell, would you be happy with patches #2, #3, #4? (if we get #1 merged)
> > >> Catalin, would you be happy taking patches #5, #6? (if we get #1 merged)
> > >
> > > Pretty sure we're happy to take the arm64 bits once you've got the core
> > > changes sorted out. Failing that, Catalin's acked them so they could go via
> > > an mm tree if it's easier.
> > >
> > 
> > Hello,
> > 
> > Are any mm maintainers willing to take the first patch from this
> > series into their tree for merging into 3.18?
> >   mm: Introduce a general RCU get_user_pages_fast.
> > 
> > (or please let me know if there are any issues with the patch that
> > need addressing).
> > 
> > As Will has stated, Catalin's already acked the arm64 patches, and
> > these can also go in via an mm tree if that makes things easier:
> >   arm64: mm: Enable HAVE_RCU_TABLE_FREE logic
> >   arm64: mm: Enable RCU fast_gup
> > 
> > Thanks,
> > --
> > Steve
> 
> Hi,
> Just a ping on this.
> 
> I was wondering if the first patch in this series:
> 
> [PATCH V3 1/6] mm: Introduce a general RCU get_user_pages_fast.
> http://marc.info/?l=linux-mm&m=140923713202355&w=2
> 
> could be merged into 3.18 via an mm tree, or if there are any issues
> with the patch that I should fix?
> 
> Acks or flames from the mm maintainers would be greatly appreciated!
> 
> Cheers,
> -- 
> Steve


Hello,
Apologies for being a pest, but we're really keen to get this into 3.18,
as it fixes a THP problem with arm/arm64.

I need mm folk to either ack or flame the first patch in the series in
order to proceed (all the patches in the series have been
acked/reviewed, but not by any mm folk):
 [PATCH V3 1/6] mm: Introduce a general RCU get_user_pages_fast.

If it puts people's minds at rest regarding the testing: on top of the
LTP tests and futex tests, we also ran these patches on the arm64
Debian buildds. With THP set to always, just under 8000 Debian packages
have been built (and unit tested) without any kernel issues for arm64.

Cheers,
-- 
Steve



* [PATCH V3 0/6] RCU get_user_pages_fast and __get_user_pages_fast
@ 2014-09-19 18:28         ` Steve Capper
  0 siblings, 0 replies; 43+ messages in thread
From: Steve Capper @ 2014-09-19 18:28 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Sep 08, 2014 at 10:06:27AM +0100, Steve Capper wrote:
> On Mon, Sep 01, 2014 at 12:43:06PM +0100, Steve Capper wrote:
> > On 28 August 2014 16:23, Will Deacon <will.deacon@arm.com> wrote:
> > > On Thu, Aug 28, 2014 at 03:45:01PM +0100, Steve Capper wrote:
> > >> I would like to get this series into 3.18 as it fixes quite a big problem
> > >> with THP on arm and arm64. This series is split into a core mm part, an
> > >> arm part and an arm64 part.
> > >>
> > >> Could somebody please take patch #1 (if it looks okay)?
> > >> Russell, would you be happy with patches #2, #3, #4? (if we get #1 merged)
> > >> Catalin, would you be happy taking patches #5, #6? (if we get #1 merged)
> > >
> > > Pretty sure we're happy to take the arm64 bits once you've got the core
> > > changes sorted out. Failing that, Catalin's acked them so they could go via
> > > an mm tree if it's easier.
> > >
> > 
> > Hello,
> > 
> > Are any mm maintainers willing to take the first patch from this
> > series into their tree for merging into 3.18?
> >   mm: Introduce a general RCU get_user_pages_fast.
> > 
> > (or please let me know if there are any issues with the patch that
> > need addressing).
> > 
> > As Will has stated, Catalin's already acked the arm64 patches, and
> > these can also go in via an mm tree if that makes things easier:
> >   arm64: mm: Enable HAVE_RCU_TABLE_FREE logic
> >   arm64: mm: Enable RCU fast_gup
> > 
> > Thanks,
> > --
> > Steve
> 
> Hi,
> Just a ping on this.
> 
> I was wondering if the first patch in this series:
> 
> [PATCH V3 1/6] mm: Introduce a general RCU get_user_pages_fast.
> http://marc.info/?l=linux-mm&m=140923713202355&w=2
> 
> could be merged into 3.18 via an mm tree, or if there are any issues
> with the patch that I should fix?
> 
> Acks or flames from the mm maintainers would be greatly appreciated!
> 
> Cheers,
> -- 
> Steve


Hello,
Apologies for being a pest, but we're really keen to get this into 3.18,
as it fixes a THP problem with arm/arm64.

I need mm folk to either ack or flame the first patch in the series in
order to proceed. (All the patches in the series have been
acked/reviewed, but not by any mm folk.):
 [PATCH V3 1/6] mm: Introduce a general RCU get_user_pages_fast.

If it puts people's minds at rest regarding the testing...
On top of the ltp tests, and futex tests, we also ran these patches on
the arm64 Debian buildd's. With THP set to always, just under 8000
Debian packages have been built (and unit tested) without any kernel
issues for arm64.

Cheers,
-- 
Steve

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH V3 0/6] RCU get_user_pages_fast and __get_user_pages_fast
  2014-09-19 18:28         ` Steve Capper
@ 2014-09-22  9:28           ` Will Deacon
  0 siblings, 0 replies; 43+ messages in thread
From: Will Deacon @ 2014-09-22  9:28 UTC (permalink / raw)
  To: Steve Capper
  Cc: linux-mm, akpm, linux-arm-kernel, Catalin Marinas, linux,
	linux-arch, gary.robertson, christoffer.dall, peterz,
	anders.roxell, dann.frazier, Mark Rutland, mgorman, hughd

On Fri, Sep 19, 2014 at 07:28:09PM +0100, Steve Capper wrote:
> Apologies for being a pest, but we're really keen to get this into 3.18,
> as it fixes a THP problem with arm/arm64.
> 
> I need mm folk to either ack or flame the first patch in the series in
> order to proceed. (All the patches in the series have been
> acked/reviewed, but not by any mm folk.):
>  [PATCH V3 1/6] mm: Introduce a general RCU get_user_pages_fast.
> 
> If it puts people's minds at rest regarding the testing...
> On top of the ltp tests, and futex tests, we also ran these patches on
> the arm64 Debian buildd's. With THP set to always, just under 8000
> Debian packages have been built (and unit tested) without any kernel
> issues for arm64.

Yes, please. It would be great if somebody can take these into an -mm tree
and/or provide an ack so that we can queue them via the arm64 tree instead.

Will

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH V3 1/6] mm: Introduce a general RCU get_user_pages_fast.
@ 2014-09-24 13:34     ` Hugh Dickins
  0 siblings, 0 replies; 43+ messages in thread
From: Hugh Dickins @ 2014-09-24 13:34 UTC (permalink / raw)
  To: Steve Capper
  Cc: linux-arm-kernel, catalin.marinas, linux, linux-arch, linux-mm,
	akpm, will.deacon, gary.robertson, christoffer.dall, peterz,
	anders.roxell, dann.frazier, mark.rutland, mgorman

On Thu, 28 Aug 2014, Steve Capper wrote:

> get_user_pages_fast attempts to pin user pages by walking the page
> tables directly and avoids taking locks. Thus the walker needs to be
> protected from page table pages being freed from under it, and needs
> to block any THP splits.
> 
> One way to achieve this is to have the walker disable interrupts, and
> rely on IPIs from the TLB flushing code blocking before the page table
> pages are freed.
> 
> On some platforms we have hardware broadcast of TLB invalidations, thus
> the TLB flushing code doesn't necessarily need to broadcast IPIs; and
> spuriously broadcasting IPIs can hurt system performance if done too
> often.
> 
> This problem has been solved on PowerPC and Sparc by batching up page
> table pages belonging to more than one mm_user, then scheduling an
> rcu_sched callback to free the pages. This RCU page table free logic
> has been promoted to core code and is activated when one enables
> HAVE_RCU_TABLE_FREE. Unfortunately, these architectures implement
> their own get_user_pages_fast routines.
> 
> The RCU page table free logic coupled with an IPI broadcast on THP
> split (which is a rare event), allows one to protect a page table
> walker by merely disabling the interrupts during the walk.
> 
> This patch provides a general RCU implementation of get_user_pages_fast
> that can be used by architectures that perform hardware broadcast of
> TLB invalidations.
> 
> It is based heavily on the PowerPC implementation by Nick Piggin.

That's a helpful description above, thank you; and the patch looks
mostly good to me.  I took a look because I see time is running out,
and you're having trouble getting review of this one: I was hoping
to give you a quick acked-by, but cannot do so as yet.

Most of my remarks below are trivial comments on where it
needs a little more work to be presented as a generic implementation in
mm/gup.c.  And most come from comparing against an up-to-date version
of arch/x86/mm/gup.c: please do the same, I may have missed some.

It would be a pity to mess up your arm schedule for lack of linkage
to this one: maybe this patch can go in as is, and be fixed up a
little later (that would be up to Andrew); or maybe you'll have
no trouble making the changes before the merge window; or maybe
this should just be kept with arm and arm64 for now (but thank
you for making the effort to give us a generic version).

Hugh

> 
> Signed-off-by: Steve Capper <steve.capper@linaro.org>
> Tested-by: Dann Frazier <dann.frazier@canonical.com>
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> ---
>  mm/Kconfig |   3 +
>  mm/gup.c   | 278 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 281 insertions(+)
> 
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 886db21..0ceb8a5 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -137,6 +137,9 @@ config HAVE_MEMBLOCK_NODE_MAP
>  config HAVE_MEMBLOCK_PHYS_MAP
>  	boolean
>  
> +config HAVE_GENERIC_RCU_GUP

I'm not wild about that name (fast GUP does require that page tables
cannot be freed beneath it, and RCU freeing of page tables is one way
in which that can be guaranteed for this implementation); but I cannot
suggest a better, so let's stick with it.

> +	boolean
> +
>  config ARCH_DISCARD_MEMBLOCK
>  	boolean
>  
> diff --git a/mm/gup.c b/mm/gup.c
> index 91d044b..5e6f6cb 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -10,6 +10,10 @@
>  #include <linux/swap.h>
>  #include <linux/swapops.h>
>  
> +#include <linux/sched.h>
> +#include <linux/rwsem.h>
> +#include <asm/pgtable.h>
> +
>  #include "internal.h"
>  
>  static struct page *no_page_table(struct vm_area_struct *vma,
> @@ -672,3 +676,277 @@ struct page *get_dump_page(unsigned long addr)
>  	return page;
>  }
>  #endif /* CONFIG_ELF_CORE */
> +
> +#ifdef CONFIG_HAVE_GENERIC_RCU_GUP

This desperately needs a long comment explaining the assumptions made,
and what an architecture must supply and guarantee to use this option.

Maybe your commit message already provides a good enough comment (I
have not now re-read it in that light) and can simply be inserted here.
I don't think it needs to spell everything out, but it does need to
direct a maintainer to thinking through the appropriate issues.
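
Purely as a sketch of the sort of thing I mean (my wording, not the
patch's), something along these lines above the #ifdef would do:

/*
 * Generic RCU fast GUP.  An architecture selecting HAVE_GENERIC_RCU_GUP
 * must guarantee, roughly, that:
 *  - page table pages are only freed after an RCU grace period or an
 *    IPI to all CPUs (e.g. via HAVE_RCU_TABLE_FREE), so disabling
 *    interrupts blocks the free;
 *  - THP splits broadcast an IPI, so they too are held off while
 *    interrupts are disabled;
 *  - a pte can be loaded atomically, or __HAVE_ARCH_PTE_SPECIAL is
 *    left undefined so the pte-level walker is stubbed out.
 */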

> +
> +#ifdef __HAVE_ARCH_PTE_SPECIAL
> +static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
> +			 int write, struct page **pages, int *nr)
> +{
> +	pte_t *ptep, *ptem;
> +	int ret = 0;
> +
> +	ptem = ptep = pte_offset_map(&pmd, addr);
> +	do {
> +		pte_t pte = ACCESS_ONCE(*ptep);

Here is my only substantive criticism.  I don't know the arm architecture,
but my guess is that your LPAE has a similar problem to x86's PAE: that
the pte entry is bigger than the natural word size of the architecture,
and so cannot be safely accessed in one operation on SMP or PREEMPT -
there's a danger that you get mismatched top and bottom halves here.
And how serious that is depends upon the layout of the pte bits.

See comments on gup_get_pte() in arch/x86/mm/gup.c,
and pte_unmap_same() in mm/memory.c.

And even if arm's LPAE is safe, this is unsafe to present in generic
code, at least not without a big comment that GENERIC_RCU_GUP should not
be used for such configs; or, better than a comment, a build-time error
based on sizeof(pte_t).

(It turns out not to be a problem at pmd, pud and pgd level: IIRC
that's because the transitions at those levels are much more restricted,
limited to setting, then clearing on pagetable teardown - except for
the THP transitions which the local_irq_disable() guards against.)

Ah, enlightenment: arm (unlike arm64) does not define __HAVE_ARCH_PTE_SPECIAL,
so this "dangerous" code won't be compiled in for it, it's only using
the stub below.  Well, you can see my point about needing more
comments, those would have saved me a LOT of time.
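
To make the build-time suggestion concrete, a minimal sketch (the
helper name is mine, purely illustrative):

static inline pte_t gup_get_pte(pte_t *ptep)
{
	/*
	 * Sketch only: insist that a pte fits in a machine word, so the
	 * single load below cannot tear.  Configs where it does not fit
	 * (PAE-style page tables) would instead need an explicit
	 * two-half read with a barrier, as x86 does in gup_get_pte().
	 */
	BUILD_BUG_ON(sizeof(pte_t) > sizeof(unsigned long));
	return ACCESS_ONCE(*ptep);
}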

> +		struct page *page;
> +
> +		if (!pte_present(pte) || pte_special(pte)
> +			|| (write && !pte_write(pte)))

The " ||" at end of line above please.  And, more importantly,
we need a pte_numa() test in here nowadays, for generic use.
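
Something like this is what I have in mind (sketch only, using the
generic pte_numa() helper):

		if (!pte_present(pte) || pte_special(pte) ||
		    pte_numa(pte) || (write && !pte_write(pte)))
			goto pte_unmap;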

> +			goto pte_unmap;
> +
> +		VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
> +		page = pte_page(pte);
> +
> +		if (!page_cache_get_speculative(page))
> +			goto pte_unmap;
> +
> +		if (unlikely(pte_val(pte) != pte_val(*ptep))) {
> +			put_page(page);
> +			goto pte_unmap;
> +		}
> +
> +		pages[*nr] = page;
> +		(*nr)++;
> +
> +	} while (ptep++, addr += PAGE_SIZE, addr != end);
> +
> +	ret = 1;
> +
> +pte_unmap:
> +	pte_unmap(ptem);
> +	return ret;
> +}
> +#else
> +
> +/*
> + * If we can't determine whether or not a pte is special, then fail immediately
> + * for ptes. Note, we can still pin HugeTLB and THP as these are guaranteed not
> + * to be special.

From that comment, I just thought it very weird that you were compiling
in any of this HAVE_GENERIC_RCU_GUP code in the !__HAVE_ARCH_PTE_SPECIAL
case.  But somewhere else, over in the 0/6, you have a very important
remark about futex on THP tail which makes sense of it: please add that
explanation here.
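
Something along these lines (my wording) would do:

/*
 * If we can't determine whether or not a pte is special, then fail
 * immediately for ptes.  Note, we can still pin HugeTLB and THP as
 * these are guaranteed not to be special, and we need the fast path
 * for them: without it, a futex on a THP tail page can loop forever
 * because __get_user_pages_fast() always returns 0.
 */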

> + */
> +static inline int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,

checkpatch.pl is noisy about that line over 80 characters, whereas
you understandably prefer to keep the stub declaration just like the
main declaration.  Simply omit the " inline"?  The compiler should be
able to work that out for itself, and it doesn't matter if it cannot.

> +			 int write, struct page **pages, int *nr)
> +{
> +	return 0;
> +}
> +#endif /* __HAVE_ARCH_PTE_SPECIAL */
> +
> +static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
> +		unsigned long end, int write, struct page **pages, int *nr)
> +{
> +	struct page *head, *page, *tail;
> +	int refs;
> +
> +	if (write && !pmd_write(orig))
> +		return 0;
> +
> +	refs = 0;
> +	head = pmd_page(orig);
> +	page = head + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
> +	tail = page;
> +	do {
> +		VM_BUG_ON(compound_head(page) != head);

VM_BUG_ON_PAGE() is the latest preference.

> +		pages[*nr] = page;
> +		(*nr)++;
> +		page++;
> +		refs++;
> +	} while (addr += PAGE_SIZE, addr != end);
> +
> +	if (!page_cache_add_speculative(head, refs)) {
> +		*nr -= refs;
> +		return 0;
> +	}
> +
> +	if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) {
> +		*nr -= refs;
> +		while (refs--)
> +			put_page(head);
> +		return 0;
> +	}
> +
> +	/*
> +	 * Any tail pages need their mapcount reference taken before we
> +	 * return. (This allows the THP code to bump their ref count when
> +	 * they are split into base pages).
> +	 */
> +	while (refs--) {
> +		if (PageTail(tail))
> +			get_huge_page_tail(tail);
> +		tail++;
> +	}
> +
> +	return 1;
> +}
> +
> +static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
> +		unsigned long end, int write, struct page **pages, int *nr)
> +{
> +	struct page *head, *page, *tail;
> +	int refs;
> +
> +	if (write && !pud_write(orig))
> +		return 0;
> +
> +	refs = 0;
> +	head = pud_page(orig);
> +	page = head + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
> +	tail = page;
> +	do {
> +		VM_BUG_ON(compound_head(page) != head);

VM_BUG_ON_PAGE() is the latest preference.

> +		pages[*nr] = page;
> +		(*nr)++;
> +		page++;
> +		refs++;
> +	} while (addr += PAGE_SIZE, addr != end);
> +
> +	if (!page_cache_add_speculative(head, refs)) {
> +		*nr -= refs;
> +		return 0;
> +	}
> +
> +	if (unlikely(pud_val(orig) != pud_val(*pudp))) {
> +		*nr -= refs;
> +		while (refs--)
> +			put_page(head);
> +		return 0;
> +	}
> +
> +	while (refs--) {
> +		if (PageTail(tail))
> +			get_huge_page_tail(tail);
> +		tail++;
> +	}
> +
> +	return 1;
> +}
> +
> +static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
> +		int write, struct page **pages, int *nr)
> +{
> +	unsigned long next;
> +	pmd_t *pmdp;
> +
> +	pmdp = pmd_offset(&pud, addr);
> +	do {
> +		pmd_t pmd = ACCESS_ONCE(*pmdp);

I like to do it this way too, but checkpatch.pl prefers a blank line.

> +		next = pmd_addr_end(addr, end);
> +		if (pmd_none(pmd) || pmd_trans_splitting(pmd))
> +			return 0;
> +
> +		if (unlikely(pmd_trans_huge(pmd) || pmd_huge(pmd))) {

I wonder if you spent any time pondering pmd_large() and whether to
use it here (and define it in arm): I have forgotten its relationship
to pmd_huge() and pmd_trans_huge(), and you are probably right to
steer clear of it.

A pmd_numa() test is needed here nowadays, for generic use.
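
i.e. roughly (again just a sketch):

		if (pmd_none(pmd) || pmd_trans_splitting(pmd) ||
		    pmd_numa(pmd))
			return 0;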

> +			if (!gup_huge_pmd(pmd, pmdp, addr, next, write,
> +				pages, nr))
> +				return 0;
> +		} else {
> +			if (!gup_pte_range(pmd, addr, next, write, pages, nr))
> +				return 0;
> +		}

You've chosen a different (indentation and else) style here from what
you use below in the very similar gup_pud_range(): it's easier to see
the differences if you keep the style the same; personally I prefer
how you did gup_pud_range().
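
i.e. keeping the pmd level in the same shape as the pud level
(sketch only):

		if (unlikely(pmd_trans_huge(pmd) || pmd_huge(pmd))) {
			if (!gup_huge_pmd(pmd, pmdp, addr, next, write,
					pages, nr))
				return 0;
		} else if (!gup_pte_range(pmd, addr, next, write, pages, nr))
			return 0;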

> +	} while (pmdp++, addr = next, addr != end);
> +
> +	return 1;
> +}
> +
> +static int gup_pud_range(pgd_t *pgdp, unsigned long addr, unsigned long end,
> +		int write, struct page **pages, int *nr)
> +{
> +	unsigned long next;
> +	pud_t *pudp;
> +
> +	pudp = pud_offset(pgdp, addr);
> +	do {
> +		pud_t pud = ACCESS_ONCE(*pudp);

I like to do it this way too, but checkpatch.pl prefers a blank line.

> +		next = pud_addr_end(addr, end);
> +		if (pud_none(pud))
> +			return 0;
> +		if (pud_huge(pud)) {

I wonder if you spent any time pondering pud_large() and whether to
use it here (and define it in arm): I have forgotten its relationship
to pud_huge(), and you are probably right to steer clear of it.

> +			if (!gup_huge_pud(pud, pudp, addr, next, write,
> +					pages, nr))
> +				return 0;
> +		} else if (!gup_pmd_range(pud, addr, next, write, pages, nr))
> +			return 0;
> +	} while (pudp++, addr = next, addr != end);
> +
> +	return 1;
> +}
> +
> +/*
> + * Like get_user_pages_fast() except its IRQ-safe in that it won't fall
> + * back to the regular GUP.
> + */
> +int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
> +			  struct page **pages)
> +{
> +	struct mm_struct *mm = current->mm;
> +	unsigned long addr, len, end;
> +	unsigned long next, flags;
> +	pgd_t *pgdp;
> +	int nr = 0;
> +
> +	start &= PAGE_MASK;
> +	addr = start;
> +	len = (unsigned long) nr_pages << PAGE_SHIFT;
> +	end = start + len;
> +
> +	if (unlikely(!access_ok(write ? VERIFY_WRITE : VERIFY_READ,
> +					start, len)))
> +		return 0;
> +
> +	/*
> +	 * Disable interrupts, we use the nested form as we can already
> +	 * have interrupts disabled by get_futex_key.
> +	 *
> +	 * With interrupts disabled, we block page table pages from being
> +	 * freed from under us. See mmu_gather_tlb in asm-generic/tlb.h
> +	 * for more details.
> +	 *
> +	 * We do not adopt an rcu_read_lock(.) here as we also want to
> +	 * block IPIs that come from THPs splitting.
> +	 */
> +
> +	local_irq_save(flags);
> +	pgdp = pgd_offset(mm, addr);
> +	do {
> +		next = pgd_addr_end(addr, end);
> +		if (pgd_none(*pgdp))
> +			break;
> +		else if (!gup_pud_range(pgdp, addr, next, write, pages, &nr))
> +			break;
> +	} while (pgdp++, addr = next, addr != end);
> +	local_irq_restore(flags);
> +
> +	return nr;
> +}
> +

The x86 version has a comment on this interface:
it would be helpful to copy that here.
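
A kerneldoc along these lines (my sketch, modelled on the x86 one)
would cover it:

/**
 * get_user_pages_fast() - pin user pages in memory
 * @start:	starting user address
 * @nr_pages:	number of pages from start to pin
 * @write:	whether pages will be written to
 * @pages:	array that receives pointers to the pages pinned,
 *		should be at least nr_pages long
 *
 * Attempts to pin user pages without taking mmap_sem, falling back
 * to the slow path for whatever it cannot pin that way.
 *
 * Returns number of pages pinned.  This may be fewer than the number
 * requested.  If nr_pages is 0 or negative, returns 0; if no pages
 * were pinned, returns -errno.
 */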

> +int get_user_pages_fast(unsigned long start, int nr_pages, int write,
> +			struct page **pages)
> +{
> +	struct mm_struct *mm = current->mm;
> +	int nr, ret;
> +
> +	start &= PAGE_MASK;
> +	nr = __get_user_pages_fast(start, nr_pages, write, pages);

The x86 version has a commit from Linus, avoiding the access_ok() check
in __get_user_pages_fast(): I confess I just did not spend long enough
trying to understand what that's about, and whether it would be
important to incorporate here.

> +	ret = nr;
> +
> +	if (nr < nr_pages) {
> +		/* Try to get the remaining pages with get_user_pages */
> +		start += nr << PAGE_SHIFT;
> +		pages += nr;
> +
> +		down_read(&mm->mmap_sem);
> +		ret = get_user_pages(current, mm, start,
> +				     nr_pages - nr, write, 0, pages, NULL);
> +		up_read(&mm->mmap_sem);
> +
> +		/* Have to be a bit careful with return values */
> +		if (nr > 0) {
> +			if (ret < 0)
> +				ret = nr;
> +			else
> +				ret += nr;
> +		}
> +	}
> +
> +	return ret;
> +}
> +
> +#endif /* CONFIG_HAVE_GENERIC_RCU_GUP */
> -- 
> 1.9.3

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH V3 1/6] mm: Introduce a general RCU get_user_pages_fast.
@ 2014-09-24 13:34     ` Hugh Dickins
  0 siblings, 0 replies; 43+ messages in thread
From: Hugh Dickins @ 2014-09-24 13:34 UTC (permalink / raw)
  To: Steve Capper
  Cc: linux-arm-kernel, catalin.marinas, linux, linux-arch, linux-mm,
	akpm, will.deacon, gary.robertson, christoffer.dall, peterz,
	anders.roxell, dann.frazier, mark.rutland, mgorman

On Thu, 28 Aug 2014, Steve Capper wrote:

> get_user_pages_fast attempts to pin user pages by walking the page
> tables directly and avoids taking locks. Thus the walker needs to be
> protected from page table pages being freed from under it, and needs
> to block any THP splits.
> 
> One way to achieve this is to have the walker disable interrupts, and
> rely on IPIs from the TLB flushing code blocking before the page table
> pages are freed.
> 
> On some platforms we have hardware broadcast of TLB invalidations, thus
> the TLB flushing code doesn't necessarily need to broadcast IPIs; and
> spuriously broadcasting IPIs can hurt system performance if done too
> often.
> 
> This problem has been solved on PowerPC and Sparc by batching up page
> table pages belonging to more than one mm_user, then scheduling an
> rcu_sched callback to free the pages. This RCU page table free logic
> has been promoted to core code and is activated when one enables
> HAVE_RCU_TABLE_FREE. Unfortunately, these architectures implement
> their own get_user_pages_fast routines.
> 
> The RCU page table free logic coupled with a an IPI broadcast on THP
> split (which is a rare event), allows one to protect a page table
> walker by merely disabling the interrupts during the walk.
> 
> This patch provides a general RCU implementation of get_user_pages_fast
> that can be used by architectures that perform hardware broadcast of
> TLB invalidations.
> 
> It is based heavily on the PowerPC implementation by Nick Piggin.

That's a helpful description above, thank you; and the patch looks
mostly good to me.  I took a look because I see time is running out,
and you're having trouble getting review of this one: I was hoping
to give you a quick acked-by, but cannot do so as yet.

Most of my remarks below are trivial comments on where it
needs a little more, to be presented as a generic implementation in
mm/gup.c.  And most come from comparing against an up-to-date version
of arch/x86/mm/gup.c: please do the same, I may have missed some.

It would be a pity to mess up your arm schedule for lack of linkage
to this one: maybe this patch can go in as is, and be fixed up a
litte later (that would be up to Andrew); or maybe you'll have
no trouble making the changes before the merge window; or maybe
this should just be kept with arm and arm64 for now (but thank
you for making the effort to give us a generic version).

Hugh

> 
> Signed-off-by: Steve Capper <steve.capper@linaro.org>
> Tested-by: Dann Frazier <dann.frazier@canonical.com>
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> ---
>  mm/Kconfig |   3 +
>  mm/gup.c   | 278 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 281 insertions(+)
> 
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 886db21..0ceb8a5 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -137,6 +137,9 @@ config HAVE_MEMBLOCK_NODE_MAP
>  config HAVE_MEMBLOCK_PHYS_MAP
>  	boolean
>  
> +config HAVE_GENERIC_RCU_GUP

I'm not wild about that name (fast GUP does require that page tables
cannot be freed beneath it, and RCU freeing of page tables is one way
in which that can be guaranteed for this implementation); but I cannot
suggest a better, so let's stick with it.

> +	boolean
> +
>  config ARCH_DISCARD_MEMBLOCK
>  	boolean
>  
> diff --git a/mm/gup.c b/mm/gup.c
> index 91d044b..5e6f6cb 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -10,6 +10,10 @@
>  #include <linux/swap.h>
>  #include <linux/swapops.h>
>  
> +#include <linux/sched.h>
> +#include <linux/rwsem.h>
> +#include <asm/pgtable.h>
> +
>  #include "internal.h"
>  
>  static struct page *no_page_table(struct vm_area_struct *vma,
> @@ -672,3 +676,277 @@ struct page *get_dump_page(unsigned long addr)
>  	return page;
>  }
>  #endif /* CONFIG_ELF_CORE */
> +
> +#ifdef CONFIG_HAVE_GENERIC_RCU_GUP

This desperately needs a long comment explaining the assumptions made,
and what an architecture must supply and guarantee to use this option.

Maybe your commit message already provides a good enough comment (I
have not now re-read it in that light) and can simply be inserted here.
I don't think it needs to spell everything out, but it does need to
direct a maintainer to thinking through the appropriate issues.

> +
> +#ifdef __HAVE_ARCH_PTE_SPECIAL
> +static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
> +			 int write, struct page **pages, int *nr)
> +{
> +	pte_t *ptep, *ptem;
> +	int ret = 0;
> +
> +	ptem = ptep = pte_offset_map(&pmd, addr);
> +	do {
> +		pte_t pte = ACCESS_ONCE(*ptep);

Here is my only substantive criticism.  I don't know the arm architecture,
but my guess is that your LPAE has a similar problem to x86's PAE: that
the pte entry is bigger than the natural word size of the architecture,
and so cannot be safely accessed in one operation on SMP or PREEMPT -
there's a danger that you get mismatched top and bottom halves here.
And how serious that is depends upon the layout of the pte bits.

See comments on gup_get_pte() in arch/x86/mm/gup.c,
and pte_unmap_same() in mm/memory.c.

And even if arm's LPAE is safe, this is unsafe to present in generic
code, or not without a big comment that GENERIC_RCU_GUP should not be
used for such configs; or, better than a comment, a build time error
according to sizeof(pte_t).

(It turns out not to be a problem at pmd, pud and pgd level: IIRC
that's because the transitions at those levels are much more restricted,
limited to setting, then clearing on pagetable teardown - except for
the THP transitions which the local_irq_disable() guards against.)

Ah, enlightenment: arm (unlike arm64) does not __HAVE_ARCH_PTE_SPECIAL,
so this "dangerous" code won't be compiled in for it, it's only using
the stub below.  Well, you can see my point about needing more
comments, those would have saved me a LOT of time.

> +		struct page *page;
> +
> +		if (!pte_present(pte) || pte_special(pte)
> +			|| (write && !pte_write(pte)))

The " ||" at end of line above please.  And, more importantly,
we need a pte_numa() test in here nowadays, for generic use.

> +			goto pte_unmap;
> +
> +		VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
> +		page = pte_page(pte);
> +
> +		if (!page_cache_get_speculative(page))
> +			goto pte_unmap;
> +
> +		if (unlikely(pte_val(pte) != pte_val(*ptep))) {
> +			put_page(page);
> +			goto pte_unmap;
> +		}
> +
> +		pages[*nr] = page;
> +		(*nr)++;
> +
> +	} while (ptep++, addr += PAGE_SIZE, addr != end);
> +
> +	ret = 1;
> +
> +pte_unmap:
> +	pte_unmap(ptem);
> +	return ret;
> +}
> +#else
> +
> +/*
> + * If we can't determine whether or not a pte is special, then fail immediately
> + * for ptes. Note, we can still pin HugeTLB and THP as these are guaranteed not
> + * to be special.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH V3 1/6] mm: Introduce a general RCU get_user_pages_fast.
@ 2014-09-24 13:34     ` Hugh Dickins
  0 siblings, 0 replies; 43+ messages in thread
From: Hugh Dickins @ 2014-09-24 13:34 UTC (permalink / raw)
  To: Steve Capper
  Cc: linux-arm-kernel, catalin.marinas, linux, linux-arch, linux-mm,
	akpm, will.deacon, gary.robertson, christoffer.dall, peterz,
	anders.roxell, dann.frazier, mark.rutland, mgorman

On Thu, 28 Aug 2014, Steve Capper wrote:

> get_user_pages_fast attempts to pin user pages by walking the page
> tables directly and avoids taking locks. Thus the walker needs to be
> protected from page table pages being freed from under it, and needs
> to block any THP splits.
> 
> One way to achieve this is to have the walker disable interrupts, and
> rely on IPIs from the TLB flushing code blocking before the page table
> pages are freed.
> 
> On some platforms we have hardware broadcast of TLB invalidations, thus
> the TLB flushing code doesn't necessarily need to broadcast IPIs; and
> spuriously broadcasting IPIs can hurt system performance if done too
> often.
> 
> This problem has been solved on PowerPC and Sparc by batching up page
> table pages belonging to more than one mm_user, then scheduling an
> rcu_sched callback to free the pages. This RCU page table free logic
> has been promoted to core code and is activated when one enables
> HAVE_RCU_TABLE_FREE. Unfortunately, these architectures implement
> their own get_user_pages_fast routines.
> 
> The RCU page table free logic coupled with a an IPI broadcast on THP
> split (which is a rare event), allows one to protect a page table
> walker by merely disabling the interrupts during the walk.
> 
> This patch provides a general RCU implementation of get_user_pages_fast
> that can be used by architectures that perform hardware broadcast of
> TLB invalidations.
> 
> It is based heavily on the PowerPC implementation by Nick Piggin.

That's a helpful description above, thank you; and the patch looks
mostly good to me.  I took a look because I see time is running out,
and you're having trouble getting review of this one: I was hoping
to give you a quick acked-by, but cannot do so as yet.

Most of my remarks below are trivial comments on where it
needs a little more, to be presented as a generic implementation in
mm/gup.c.  And most come from comparing against an up-to-date version
of arch/x86/mm/gup.c: please do the same, I may have missed some.

It would be a pity to mess up your arm schedule for lack of linkage
to this one: maybe this patch can go in as is, and be fixed up a
litte later (that would be up to Andrew); or maybe you'll have
no trouble making the changes before the merge window; or maybe
this should just be kept with arm and arm64 for now (but thank
you for making the effort to give us a generic version).

Hugh

> 
> Signed-off-by: Steve Capper <steve.capper@linaro.org>
> Tested-by: Dann Frazier <dann.frazier@canonical.com>
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> ---
>  mm/Kconfig |   3 +
>  mm/gup.c   | 278 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 281 insertions(+)
> 
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 886db21..0ceb8a5 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -137,6 +137,9 @@ config HAVE_MEMBLOCK_NODE_MAP
>  config HAVE_MEMBLOCK_PHYS_MAP
>  	boolean
>  
> +config HAVE_GENERIC_RCU_GUP

I'm not wild about that name (fast GUP does require that page tables
cannot be freed beneath it, and RCU freeing of page tables is one way
in which that can be guaranteed for this implementation); but I cannot
suggest a better, so let's stick with it.

> +	boolean
> +
>  config ARCH_DISCARD_MEMBLOCK
>  	boolean
>  
> diff --git a/mm/gup.c b/mm/gup.c
> index 91d044b..5e6f6cb 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -10,6 +10,10 @@
>  #include <linux/swap.h>
>  #include <linux/swapops.h>
>  
> +#include <linux/sched.h>
> +#include <linux/rwsem.h>
> +#include <asm/pgtable.h>
> +
>  #include "internal.h"
>  
>  static struct page *no_page_table(struct vm_area_struct *vma,
> @@ -672,3 +676,277 @@ struct page *get_dump_page(unsigned long addr)
>  	return page;
>  }
>  #endif /* CONFIG_ELF_CORE */
> +
> +#ifdef CONFIG_HAVE_GENERIC_RCU_GUP

This desperately needs a long comment explaining the assumptions made,
and what an architecture must supply and guarantee to use this option.

Maybe your commit message already provides a good enough comment (I
have not now re-read it in that light) and can simply be inserted here.
I don't think it needs to spell everything out, but it does need to
direct a maintainer to thinking through the appropriate issues.

> +
> +#ifdef __HAVE_ARCH_PTE_SPECIAL
> +static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
> +			 int write, struct page **pages, int *nr)
> +{
> +	pte_t *ptep, *ptem;
> +	int ret = 0;
> +
> +	ptem = ptep = pte_offset_map(&pmd, addr);
> +	do {
> +		pte_t pte = ACCESS_ONCE(*ptep);

Here is my only substantive criticism.  I don't know the arm architecture,
but my guess is that your LPAE has a similar problem to x86's PAE: that
the pte entry is bigger than the natural word size of the architecture,
and so cannot be safely accessed in one operation on SMP or PREEMPT -
there's a danger that you get mismatched top and bottom halves here.
And how serious that is depends upon the layout of the pte bits.

See comments on gup_get_pte() in arch/x86/mm/gup.c,
and pte_unmap_same() in mm/memory.c.

And even if arm's LPAE is safe, this is unsafe to present in generic
code, or not without a big comment that GENERIC_RCU_GUP should not be
used for such configs; or, better than a comment, a build time error
according to sizeof(pte_t).

(It turns out not to be a problem at pmd, pud and pgd level: IIRC
that's because the transitions at those levels are much more restricted,
limited to setting, then clearing on pagetable teardown - except for
the THP transitions which the local_irq_disable() guards against.)

Ah, enlightenment: arm (unlike arm64) does not __HAVE_ARCH_PTE_SPECIAL,
so this "dangerous" code won't be compiled in for it, it's only using
the stub below.  Well, you can see my point about needing more
comments, those would have saved me a LOT of time.

> +		struct page *page;
> +
> +		if (!pte_present(pte) || pte_special(pte)
> +			|| (write && !pte_write(pte)))

The " ||" at end of line above please.  And, more importantly,
we need a pte_numa() test in here nowadays, for generic use.

> +			goto pte_unmap;
> +
> +		VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
> +		page = pte_page(pte);
> +
> +		if (!page_cache_get_speculative(page))
> +			goto pte_unmap;
> +
> +		if (unlikely(pte_val(pte) != pte_val(*ptep))) {
> +			put_page(page);
> +			goto pte_unmap;
> +		}
> +
> +		pages[*nr] = page;
> +		(*nr)++;
> +
> +	} while (ptep++, addr += PAGE_SIZE, addr != end);
> +
> +	ret = 1;
> +
> +pte_unmap:
> +	pte_unmap(ptem);
> +	return ret;
> +}
> +#else
> +
> +/*
> + * If we can't determine whether or not a pte is special, then fail immediately
> + * for ptes. Note, we can still pin HugeTLB and THP as these are guaranteed not
> + * to be special.

>From that comment, I just thought it very weird that you were compiling
in any of this HAVE_GENERIC_RCU_GUP code in the !__HAVE_ARCH_PTE_SPECIAL
case.  But somewhere else, over in the 0/6, you have a very important
remark about futex on THP tail which makes sense of it: please add that
explanation here.

> + */
> +static inline int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,

checkpatch.pl is noisy about that line over 80 characters, whereas
you understandably prefer to keep the stub declaration just like the
main declaration.  Simply omit the " inline"?  The compiler should be
able to work that out for itself, and it doesn't matter if it cannot.

> +			 int write, struct page **pages, int *nr)
> +{
> +	return 0;
> +}
> +#endif /* __HAVE_ARCH_PTE_SPECIAL */
> +
> +static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
> +		unsigned long end, int write, struct page **pages, int *nr)
> +{
> +	struct page *head, *page, *tail;
> +	int refs;
> +
> +	if (write && !pmd_write(orig))
> +		return 0;
> +
> +	refs = 0;
> +	head = pmd_page(orig);
> +	page = head + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
> +	tail = page;
> +	do {
> +		VM_BUG_ON(compound_head(page) != head);

VM_BUG_ON_PAGE() is the latest preference.

> +		pages[*nr] = page;
> +		(*nr)++;
> +		page++;
> +		refs++;
> +	} while (addr += PAGE_SIZE, addr != end);
> +
> +	if (!page_cache_add_speculative(head, refs)) {
> +		*nr -= refs;
> +		return 0;
> +	}
> +
> +	if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) {
> +		*nr -= refs;
> +		while (refs--)
> +			put_page(head);
> +		return 0;
> +	}
> +
> +	/*
> +	 * Any tail pages need their mapcount reference taken before we
> +	 * return. (This allows the THP code to bump their ref count when
> +	 * they are split into base pages).
> +	 */
> +	while (refs--) {
> +		if (PageTail(tail))
> +			get_huge_page_tail(tail);
> +		tail++;
> +	}
> +
> +	return 1;
> +}
> +
> +static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
> +		unsigned long end, int write, struct page **pages, int *nr)
> +{
> +	struct page *head, *page, *tail;
> +	int refs;
> +
> +	if (write && !pud_write(orig))
> +		return 0;
> +
> +	refs = 0;
> +	head = pud_page(orig);
> +	page = head + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
> +	tail = page;
> +	do {
> +		VM_BUG_ON(compound_head(page) != head);

VM_BUG_ON_PAGE() is the latest preference.

> +		pages[*nr] = page;
> +		(*nr)++;
> +		page++;
> +		refs++;
> +	} while (addr += PAGE_SIZE, addr != end);
> +
> +	if (!page_cache_add_speculative(head, refs)) {
> +		*nr -= refs;
> +		return 0;
> +	}
> +
> +	if (unlikely(pud_val(orig) != pud_val(*pudp))) {
> +		*nr -= refs;
> +		while (refs--)
> +			put_page(head);
> +		return 0;
> +	}
> +
> +	while (refs--) {
> +		if (PageTail(tail))
> +			get_huge_page_tail(tail);
> +		tail++;
> +	}
> +
> +	return 1;
> +}
> +
> +static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
> +		int write, struct page **pages, int *nr)
> +{
> +	unsigned long next;
> +	pmd_t *pmdp;
> +
> +	pmdp = pmd_offset(&pud, addr);
> +	do {
> +		pmd_t pmd = ACCESS_ONCE(*pmdp);

I like to do it this way too, but checkpatch.pl prefers a blank line.

> +		next = pmd_addr_end(addr, end);
> +		if (pmd_none(pmd) || pmd_trans_splitting(pmd))
> +			return 0;
> +
> +		if (unlikely(pmd_trans_huge(pmd) || pmd_huge(pmd))) {

I wonder if you spent any time pondering pmd_large() and whether to
use it here (and define it in arm): I have forgotten its relationship
to pmd_huge() and pmd_trans_huge(), and you are probably right to
steer clear of it.

A pmd_numa() test is needed here nowadays, for generic use.

> +			if (!gup_huge_pmd(pmd, pmdp, addr, next, write,
> +				pages, nr))
> +				return 0;
> +		} else {
> +			if (!gup_pte_range(pmd, addr, next, write, pages, nr))
> +				return 0;
> +		}

You've chosen a different (indentation and else) style here from what
you use below in the very similar gup_pud_range(): it's easier to see
the differences if you keep the style the same, personally I prefer
how you did gup_pud_range().

> +	} while (pmdp++, addr = next, addr != end);
> +
> +	return 1;
> +}
> +
> +static int gup_pud_range(pgd_t *pgdp, unsigned long addr, unsigned long end,
> +		int write, struct page **pages, int *nr)
> +{
> +	unsigned long next;
> +	pud_t *pudp;
> +
> +	pudp = pud_offset(pgdp, addr);
> +	do {
> +		pud_t pud = ACCESS_ONCE(*pudp);

I like to do it this way too, but checkpatch.pl prefers a blank line.

> +		next = pud_addr_end(addr, end);
> +		if (pud_none(pud))
> +			return 0;
> +		if (pud_huge(pud)) {

I wonder if you spent any time pondering pud_large() and whether to
use it here (and define it in arm): I have forgotten its relationship
to pud_huge(), and you are probably right to steer clear of it.

> +			if (!gup_huge_pud(pud, pudp, addr, next, write,
> +					pages, nr))
> +				return 0;
> +		} else if (!gup_pmd_range(pud, addr, next, write, pages, nr))
> +			return 0;
> +	} while (pudp++, addr = next, addr != end);
> +
> +	return 1;
> +}
> +
> +/*
> + * Like get_user_pages_fast() except its IRQ-safe in that it won't fall
> + * back to the regular GUP.
> + */
> +int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
> +			  struct page **pages)
> +{
> +	struct mm_struct *mm = current->mm;
> +	unsigned long addr, len, end;
> +	unsigned long next, flags;
> +	pgd_t *pgdp;
> +	int nr = 0;
> +
> +	start &= PAGE_MASK;
> +	addr = start;
> +	len = (unsigned long) nr_pages << PAGE_SHIFT;
> +	end = start + len;
> +
> +	if (unlikely(!access_ok(write ? VERIFY_WRITE : VERIFY_READ,
> +					start, len)))
> +		return 0;
> +
> +	/*
> +	 * Disable interrupts, we use the nested form as we can already
> +	 * have interrupts disabled by get_futex_key.
> +	 *
> +	 * With interrupts disabled, we block page table pages from being
> +	 * freed from under us. See mmu_gather_tlb in asm-generic/tlb.h
> +	 * for more details.
> +	 *
> +	 * We do not adopt an rcu_read_lock(.) here as we also want to
> +	 * block IPIs that come from THPs splitting.
> +	 */
> +
> +	local_irq_save(flags);
> +	pgdp = pgd_offset(mm, addr);
> +	do {
> +		next = pgd_addr_end(addr, end);
> +		if (pgd_none(*pgdp))
> +			break;
> +		else if (!gup_pud_range(pgdp, addr, next, write, pages, &nr))
> +			break;
> +	} while (pgdp++, addr = next, addr != end);
> +	local_irq_restore(flags);
> +
> +	return nr;
> +}
> +

The x86 version has a comment on this interface:
it would be helpful to copy that here.

> +int get_user_pages_fast(unsigned long start, int nr_pages, int write,
> +			struct page **pages)
> +{
> +	struct mm_struct *mm = current->mm;
> +	int nr, ret;
> +
> +	start &= PAGE_MASK;
> +	nr = __get_user_pages_fast(start, nr_pages, write, pages);

The x86 version has a commit from Linus, avoiding the access_ok() check
in __get_user_pages_fast(): I confess I just did not spend long enough
trying to understand what that's about, and whether it would be
important to incorporate here.

> +	ret = nr;
> +
> +	if (nr < nr_pages) {
> +		/* Try to get the remaining pages with get_user_pages */
> +		start += nr << PAGE_SHIFT;
> +		pages += nr;
> +
> +		down_read(&mm->mmap_sem);
> +		ret = get_user_pages(current, mm, start,
> +				     nr_pages - nr, write, 0, pages, NULL);
> +		up_read(&mm->mmap_sem);
> +
> +		/* Have to be a bit careful with return values */
> +		if (nr > 0) {
> +			if (ret < 0)
> +				ret = nr;
> +			else
> +				ret += nr;
> +		}
> +	}
> +
> +	return ret;
> +}
> +
> +#endif /* CONFIG_HAVE_GENERIC_RCU_GUP */
> -- 
> 1.9.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH V3 1/6] mm: Introduce a general RCU get_user_pages_fast.
@ 2014-09-24 13:34     ` Hugh Dickins
  0 siblings, 0 replies; 43+ messages in thread
From: Hugh Dickins @ 2014-09-24 13:34 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 28 Aug 2014, Steve Capper wrote:

> get_user_pages_fast attempts to pin user pages by walking the page
> tables directly and avoids taking locks. Thus the walker needs to be
> protected from page table pages being freed from under it, and needs
> to block any THP splits.
> 
> One way to achieve this is to have the walker disable interrupts, and
> rely on IPIs from the TLB flushing code blocking before the page table
> pages are freed.
> 
> On some platforms we have hardware broadcast of TLB invalidations, thus
> the TLB flushing code doesn't necessarily need to broadcast IPIs; and
> spuriously broadcasting IPIs can hurt system performance if done too
> often.
> 
> This problem has been solved on PowerPC and Sparc by batching up page
> table pages belonging to more than one mm_user, then scheduling an
> rcu_sched callback to free the pages. This RCU page table free logic
> has been promoted to core code and is activated when one enables
> HAVE_RCU_TABLE_FREE. Unfortunately, these architectures implement
> their own get_user_pages_fast routines.
> 
> The RCU page table free logic coupled with a an IPI broadcast on THP
> split (which is a rare event), allows one to protect a page table
> walker by merely disabling the interrupts during the walk.
> 
> This patch provides a general RCU implementation of get_user_pages_fast
> that can be used by architectures that perform hardware broadcast of
> TLB invalidations.
> 
> It is based heavily on the PowerPC implementation by Nick Piggin.

That's a helpful description above, thank you; and the patch looks
mostly good to me.  I took a look because I see time is running out,
and you're having trouble getting review of this one: I was hoping
to give you a quick acked-by, but cannot do so as yet.

Most of my remarks below are trivial comments on where it
needs a little more, to be presented as a generic implementation in
mm/gup.c.  And most come from comparing against an up-to-date version
of arch/x86/mm/gup.c: please do the same, I may have missed some.

It would be a pity to mess up your arm schedule for lack of linkage
to this one: maybe this patch can go in as is, and be fixed up a
litte later (that would be up to Andrew); or maybe you'll have
no trouble making the changes before the merge window; or maybe
this should just be kept with arm and arm64 for now (but thank
you for making the effort to give us a generic version).

Hugh

> 
> Signed-off-by: Steve Capper <steve.capper@linaro.org>
> Tested-by: Dann Frazier <dann.frazier@canonical.com>
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> ---
>  mm/Kconfig |   3 +
>  mm/gup.c   | 278 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 281 insertions(+)
> 
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 886db21..0ceb8a5 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -137,6 +137,9 @@ config HAVE_MEMBLOCK_NODE_MAP
>  config HAVE_MEMBLOCK_PHYS_MAP
>  	boolean
>  
> +config HAVE_GENERIC_RCU_GUP

I'm not wild about that name (fast GUP does require that page tables
cannot be freed beneath it, and RCU freeing of page tables is one way
in which that can be guaranteed for this implementation); but I cannot
suggest a better, so let's stick with it.

> +	boolean
> +
>  config ARCH_DISCARD_MEMBLOCK
>  	boolean
>  
> diff --git a/mm/gup.c b/mm/gup.c
> index 91d044b..5e6f6cb 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -10,6 +10,10 @@
>  #include <linux/swap.h>
>  #include <linux/swapops.h>
>  
> +#include <linux/sched.h>
> +#include <linux/rwsem.h>
> +#include <asm/pgtable.h>
> +
>  #include "internal.h"
>  
>  static struct page *no_page_table(struct vm_area_struct *vma,
> @@ -672,3 +676,277 @@ struct page *get_dump_page(unsigned long addr)
>  	return page;
>  }
>  #endif /* CONFIG_ELF_CORE */
> +
> +#ifdef CONFIG_HAVE_GENERIC_RCU_GUP

This desperately needs a long comment explaining the assumptions made,
and what an architecture must supply and guarantee to use this option.

Maybe your commit message already provides a good enough comment (I
have not now re-read it in that light) and can simply be inserted here.
I don't think it needs to spell everything out, but it does need to
direct a maintainer to thinking through the appropriate issues.

> +
> +#ifdef __HAVE_ARCH_PTE_SPECIAL
> +static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
> +			 int write, struct page **pages, int *nr)
> +{
> +	pte_t *ptep, *ptem;
> +	int ret = 0;
> +
> +	ptem = ptep = pte_offset_map(&pmd, addr);
> +	do {
> +		pte_t pte = ACCESS_ONCE(*ptep);

Here is my only substantive criticism.  I don't know the arm architecture,
but my guess is that your LPAE has a similar problem to x86's PAE: that
the pte entry is bigger than the natural word size of the architecture,
and so cannot be safely accessed in one operation on SMP or PREEMPT -
there's a danger that you get mismatched top and bottom halves here.
And how serious that is depends upon the layout of the pte bits.

See comments on gup_get_pte() in arch/x86/mm/gup.c,
and pte_unmap_same() in mm/memory.c.

And even if arm's LPAE is safe, this is unsafe to present in generic
code, or not without a big comment that GENERIC_RCU_GUP should not be
used for such configs; or, better than a comment, a build time error
according to sizeof(pte_t).

(It turns out not to be a problem at pmd, pud and pgd level: IIRC
that's because the transitions at those levels are much more restricted,
limited to setting, then clearing on pagetable teardown - except for
the THP transitions which the local_irq_disable() guards against.)

Ah, enlightenment: arm (unlike arm64) does not __HAVE_ARCH_PTE_SPECIAL,
so this "dangerous" code won't be compiled in for it, it's only using
the stub below.  Well, you can see my point about needing more
comments, those would have saved me a LOT of time.

> +		struct page *page;
> +
> +		if (!pte_present(pte) || pte_special(pte)
> +			|| (write && !pte_write(pte)))

The " ||" at end of line above please.  And, more importantly,
we need a pte_numa() test in here nowadays, for generic use.

> +			goto pte_unmap;
> +
> +		VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
> +		page = pte_page(pte);
> +
> +		if (!page_cache_get_speculative(page))
> +			goto pte_unmap;
> +
> +		if (unlikely(pte_val(pte) != pte_val(*ptep))) {
> +			put_page(page);
> +			goto pte_unmap;
> +		}
> +
> +		pages[*nr] = page;
> +		(*nr)++;
> +
> +	} while (ptep++, addr += PAGE_SIZE, addr != end);
> +
> +	ret = 1;
> +
> +pte_unmap:
> +	pte_unmap(ptem);
> +	return ret;
> +}
> +#else
> +
> +/*
> + * If we can't determine whether or not a pte is special, then fail immediately
> + * for ptes. Note, we can still pin HugeTLB and THP as these are guaranteed not
> + * to be special.

>From that comment, I just thought it very weird that you were compiling
in any of this HAVE_GENERIC_RCU_GUP code in the !__HAVE_ARCH_PTE_SPECIAL
case.  But somewhere else, over in the 0/6, you have a very important
remark about futex on THP tail which makes sense of it: please add that
explanation here.

> + */
> +static inline int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,

checkpatch.pl is noisy about that line over 80 characters, whereas
you understandably prefer to keep the stub declaration just like the
main declaration.  Simply omit the " inline"?  The compiler should be
able to work that out for itself, and it doesn't matter if it cannot.

> +			 int write, struct page **pages, int *nr)
> +{
> +	return 0;
> +}
> +#endif /* __HAVE_ARCH_PTE_SPECIAL */
> +
> +static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
> +		unsigned long end, int write, struct page **pages, int *nr)
> +{
> +	struct page *head, *page, *tail;
> +	int refs;
> +
> +	if (write && !pmd_write(orig))
> +		return 0;
> +
> +	refs = 0;
> +	head = pmd_page(orig);
> +	page = head + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
> +	tail = page;
> +	do {
> +		VM_BUG_ON(compound_head(page) != head);

VM_BUG_ON_PAGE() is the latest preference.

> +		pages[*nr] = page;
> +		(*nr)++;
> +		page++;
> +		refs++;
> +	} while (addr += PAGE_SIZE, addr != end);
> +
> +	if (!page_cache_add_speculative(head, refs)) {
> +		*nr -= refs;
> +		return 0;
> +	}
> +
> +	if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) {
> +		*nr -= refs;
> +		while (refs--)
> +			put_page(head);
> +		return 0;
> +	}
> +
> +	/*
> +	 * Any tail pages need their mapcount reference taken before we
> +	 * return. (This allows the THP code to bump their ref count when
> +	 * they are split into base pages).
> +	 */
> +	while (refs--) {
> +		if (PageTail(tail))
> +			get_huge_page_tail(tail);
> +		tail++;
> +	}
> +
> +	return 1;
> +}
> +
> +static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
> +		unsigned long end, int write, struct page **pages, int *nr)
> +{
> +	struct page *head, *page, *tail;
> +	int refs;
> +
> +	if (write && !pud_write(orig))
> +		return 0;
> +
> +	refs = 0;
> +	head = pud_page(orig);
> +	page = head + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
> +	tail = page;
> +	do {
> +		VM_BUG_ON(compound_head(page) != head);

VM_BUG_ON_PAGE() is the latest preference.

> +		pages[*nr] = page;
> +		(*nr)++;
> +		page++;
> +		refs++;
> +	} while (addr += PAGE_SIZE, addr != end);
> +
> +	if (!page_cache_add_speculative(head, refs)) {
> +		*nr -= refs;
> +		return 0;
> +	}
> +
> +	if (unlikely(pud_val(orig) != pud_val(*pudp))) {
> +		*nr -= refs;
> +		while (refs--)
> +			put_page(head);
> +		return 0;
> +	}
> +
> +	while (refs--) {
> +		if (PageTail(tail))
> +			get_huge_page_tail(tail);
> +		tail++;
> +	}
> +
> +	return 1;
> +}
> +
> +static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
> +		int write, struct page **pages, int *nr)
> +{
> +	unsigned long next;
> +	pmd_t *pmdp;
> +
> +	pmdp = pmd_offset(&pud, addr);
> +	do {
> +		pmd_t pmd = ACCESS_ONCE(*pmdp);

I like to do it this way too, but checkpatch.pl prefers a blank line.

> +		next = pmd_addr_end(addr, end);
> +		if (pmd_none(pmd) || pmd_trans_splitting(pmd))
> +			return 0;
> +
> +		if (unlikely(pmd_trans_huge(pmd) || pmd_huge(pmd))) {

I wonder if you spent any time pondering pmd_large() and whether to
use it here (and define it in arm): I have forgotten its relationship
to pmd_huge() and pmd_trans_huge(), and you are probably right to
steer clear of it.

A pmd_numa() test is needed here nowadays, for generic use.

> +			if (!gup_huge_pmd(pmd, pmdp, addr, next, write,
> +				pages, nr))
> +				return 0;
> +		} else {
> +			if (!gup_pte_range(pmd, addr, next, write, pages, nr))
> +				return 0;
> +		}

You've chosen a different (indentation and else) style here from what
you use below in the very similar gup_pud_range(): it's easier to see
the differences if you keep the style the same, personally I prefer
how you did gup_pud_range().

> +	} while (pmdp++, addr = next, addr != end);
> +
> +	return 1;
> +}
> +
> +static int gup_pud_range(pgd_t *pgdp, unsigned long addr, unsigned long end,
> +		int write, struct page **pages, int *nr)
> +{
> +	unsigned long next;
> +	pud_t *pudp;
> +
> +	pudp = pud_offset(pgdp, addr);
> +	do {
> +		pud_t pud = ACCESS_ONCE(*pudp);

I like to do it this way too, but checkpatch.pl prefers a blank line.

> +		next = pud_addr_end(addr, end);
> +		if (pud_none(pud))
> +			return 0;
> +		if (pud_huge(pud)) {

I wonder if you spent any time pondering pud_large() and whether to
use it here (and define it in arm): I have forgotten its relationship
to pud_huge(), and you are probably right to steer clear of it.

> +			if (!gup_huge_pud(pud, pudp, addr, next, write,
> +					pages, nr))
> +				return 0;
> +		} else if (!gup_pmd_range(pud, addr, next, write, pages, nr))
> +			return 0;
> +	} while (pudp++, addr = next, addr != end);
> +
> +	return 1;
> +}
> +
> +/*
> + * Like get_user_pages_fast() except it's IRQ-safe in that it won't fall
> + * back to the regular GUP.
> + */
> +int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
> +			  struct page **pages)
> +{
> +	struct mm_struct *mm = current->mm;
> +	unsigned long addr, len, end;
> +	unsigned long next, flags;
> +	pgd_t *pgdp;
> +	int nr = 0;
> +
> +	start &= PAGE_MASK;
> +	addr = start;
> +	len = (unsigned long) nr_pages << PAGE_SHIFT;
> +	end = start + len;
> +
> +	if (unlikely(!access_ok(write ? VERIFY_WRITE : VERIFY_READ,
> +					start, len)))
> +		return 0;
> +
> +	/*
> +	 * Disable interrupts, we use the nested form as we can already
> +	 * have interrupts disabled by get_futex_key.
> +	 *
> +	 * With interrupts disabled, we block page table pages from being
> +	 * freed from under us. See mmu_gather_tlb in asm-generic/tlb.h
> +	 * for more details.
> +	 *
> +	 * We do not adopt an rcu_read_lock(.) here as we also want to
> +	 * block IPIs that come from THPs splitting.
> +	 */
> +
> +	local_irq_save(flags);
> +	pgdp = pgd_offset(mm, addr);
> +	do {
> +		next = pgd_addr_end(addr, end);
> +		if (pgd_none(*pgdp))
> +			break;
> +		else if (!gup_pud_range(pgdp, addr, next, write, pages, &nr))
> +			break;
> +	} while (pgdp++, addr = next, addr != end);
> +	local_irq_restore(flags);
> +
> +	return nr;
> +}
> +

The x86 version has a comment on this interface:
it would be helpful to copy that here.

> +int get_user_pages_fast(unsigned long start, int nr_pages, int write,
> +			struct page **pages)
> +{
> +	struct mm_struct *mm = current->mm;
> +	int nr, ret;
> +
> +	start &= PAGE_MASK;
> +	nr = __get_user_pages_fast(start, nr_pages, write, pages);

The x86 version has a commit from Linus, avoiding the access_ok() check
in __get_user_pages_fast(): I confess I just did not spend long enough
trying to understand what that's about, and whether it would be
important to incorporate here.

> +	ret = nr;
> +
> +	if (nr < nr_pages) {
> +		/* Try to get the remaining pages with get_user_pages */
> +		start += nr << PAGE_SHIFT;
> +		pages += nr;
> +
> +		down_read(&mm->mmap_sem);
> +		ret = get_user_pages(current, mm, start,
> +				     nr_pages - nr, write, 0, pages, NULL);
> +		up_read(&mm->mmap_sem);
> +
> +		/* Have to be a bit careful with return values */
> +		if (nr > 0) {
> +			if (ret < 0)
> +				ret = nr;
> +			else
> +				ret += nr;
> +		}
> +	}
> +
> +	return ret;
> +}
> +
> +#endif /* CONFIG_HAVE_GENERIC_RCU_GUP */
> -- 
> 1.9.3

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH V3 1/6] mm: Introduce a general RCU get_user_pages_fast.
@ 2014-09-24 15:57       ` Steve Capper
  0 siblings, 0 replies; 43+ messages in thread
From: Steve Capper @ 2014-09-24 15:57 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: linux-arm-kernel, catalin.marinas, linux, linux-arch, linux-mm,
	akpm, will.deacon, gary.robertson, christoffer.dall, peterz,
	anders.roxell, dann.frazier, mark.rutland, mgorman

On Wed, Sep 24, 2014 at 06:34:56AM -0700, Hugh Dickins wrote:
> On Thu, 28 Aug 2014, Steve Capper wrote:
> 
> > get_user_pages_fast attempts to pin user pages by walking the page
> > tables directly and avoids taking locks. Thus the walker needs to be
> > protected from page table pages being freed from under it, and needs
> > to block any THP splits.
> > 
> > One way to achieve this is to have the walker disable interrupts, and
> > rely on IPIs from the TLB flushing code blocking before the page table
> > pages are freed.
> > 
> > On some platforms we have hardware broadcast of TLB invalidations, thus
> > the TLB flushing code doesn't necessarily need to broadcast IPIs; and
> > spuriously broadcasting IPIs can hurt system performance if done too
> > often.
> > 
> > This problem has been solved on PowerPC and Sparc by batching up page
> > table pages belonging to more than one mm_user, then scheduling an
> > rcu_sched callback to free the pages. This RCU page table free logic
> > has been promoted to core code and is activated when one enables
> > HAVE_RCU_TABLE_FREE. Unfortunately, these architectures implement
> > their own get_user_pages_fast routines.
> > 
> > The RCU page table free logic coupled with an IPI broadcast on THP
> > split (which is a rare event), allows one to protect a page table
> > walker by merely disabling the interrupts during the walk.
> > 
> > This patch provides a general RCU implementation of get_user_pages_fast
> > that can be used by architectures that perform hardware broadcast of
> > TLB invalidations.
> > 
> > It is based heavily on the PowerPC implementation by Nick Piggin.
> 
> That's a helpful description above, thank you; and the patch looks
> mostly good to me.  I took a look because I see time is running out,
> and you're having trouble getting review of this one: I was hoping
> to give you a quick acked-by, but cannot do so as yet.
> 
> Most of my remarks below are trivial comments on where it
> needs a little more, to be presented as a generic implementation in
> mm/gup.c.  And most come from comparing against an up-to-date version
> of arch/x86/mm/gup.c: please do the same, I may have missed some.
> 
> It would be a pity to mess up your arm schedule for lack of linkage
> to this one: maybe this patch can go in as is, and be fixed up a
> little later (that would be up to Andrew); or maybe you'll have
> no trouble making the changes before the merge window; or maybe
> this should just be kept with arm and arm64 for now (but thank
> you for making the effort to give us a generic version).
> 
> Hugh

Hi Hugh,
A big thank you for taking a look at this.

> 
> > 
> > Signed-off-by: Steve Capper <steve.capper@linaro.org>
> > Tested-by: Dann Frazier <dann.frazier@canonical.com>
> > Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> > ---
> >  mm/Kconfig |   3 +
> >  mm/gup.c   | 278 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 281 insertions(+)
> > 
> > diff --git a/mm/Kconfig b/mm/Kconfig
> > index 886db21..0ceb8a5 100644
> > --- a/mm/Kconfig
> > +++ b/mm/Kconfig
> > @@ -137,6 +137,9 @@ config HAVE_MEMBLOCK_NODE_MAP
> >  config HAVE_MEMBLOCK_PHYS_MAP
> >  	boolean
> >  
> > +config HAVE_GENERIC_RCU_GUP
> 
> I'm not wild about that name (fast GUP does require that page tables
> cannot be freed beneath it, and RCU freeing of page tables is one way
> in which that can be guaranteed for this implementation); but I cannot
> suggest a better, so let's stick with it.
> 

Yeah, we couldn't think of a better one. :-(

> > +	boolean
> > +
> >  config ARCH_DISCARD_MEMBLOCK
> >  	boolean
> >  
> > diff --git a/mm/gup.c b/mm/gup.c
> > index 91d044b..5e6f6cb 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -10,6 +10,10 @@
> >  #include <linux/swap.h>
> >  #include <linux/swapops.h>
> >  
> > +#include <linux/sched.h>
> > +#include <linux/rwsem.h>
> > +#include <asm/pgtable.h>
> > +
> >  #include "internal.h"
> >  
> >  static struct page *no_page_table(struct vm_area_struct *vma,
> > @@ -672,3 +676,277 @@ struct page *get_dump_page(unsigned long addr)
> >  	return page;
> >  }
> >  #endif /* CONFIG_ELF_CORE */
> > +
> > +#ifdef CONFIG_HAVE_GENERIC_RCU_GUP
> 
> This desperately needs a long comment explaining the assumptions made,
> and what an architecture must supply and guarantee to use this option.
> 
> Maybe your commit message already provides a good enough comment (I
> have not now re-read it in that light) and can simply be inserted here.
> I don't think it needs to spell everything out, but it does need to
> direct a maintainer to thinking through the appropriate issues.

Agreed, I think a summary of the logic and the pre-requisites in a
comment block will make this a lot easier to adopt.

> 
> > +
> > +#ifdef __HAVE_ARCH_PTE_SPECIAL
> > +static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
> > +			 int write, struct page **pages, int *nr)
> > +{
> > +	pte_t *ptep, *ptem;
> > +	int ret = 0;
> > +
> > +	ptem = ptep = pte_offset_map(&pmd, addr);
> > +	do {
> > +		pte_t pte = ACCESS_ONCE(*ptep);
> 
> Here is my only substantive criticism.  I don't know the arm architecture,
> but my guess is that your LPAE has a similar problem to x86's PAE: that
> the pte entry is bigger than the natural word size of the architecture,
> and so cannot be safely accessed in one operation on SMP or PREEMPT -
> there's a danger that you get mismatched top and bottom halves here.
> And how serious that is depends upon the layout of the pte bits.
> 
> See comments on gup_get_pte() in arch/x86/mm/gup.c,
> and pte_unmap_same() in mm/memory.c.

Thanks, on ARM platforms with LPAE support this will be okay as 64-bit
single-copy atomicity is guaranteed for reading pagetable entries.

> 
> And even if arm's LPAE is safe, this is unsafe to present in generic
> code, or not without a big comment that GENERIC_RCU_GUP should not be
> used for such configs; or, better than a comment, a build time error
> according to sizeof(pte_t).
> 

I was thinking of introducing something like ARCH_HAS_ATOMIC64_PTE_READS
and then putting in some compiler logic, but it looked like overkill to me.

Then I thought of adding a comment to this line of code and explicitly
adding a pre-requisite to the comment block that I'm about to add before
#ifdef CONFIG_HAVE_GENERIC_RCU_GUP.
Hopefully that'll be okay.
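
(For illustration only, a sketch of the compile-time guard being weighed and
set aside here; ARCH_HAS_ATOMIC64_PTE_READS is the hypothetical symbol from
the paragraph above, not anything that exists in the tree. It could sit at
the top of gup_pte_range():

	BUILD_BUG_ON(sizeof(pte_t) > sizeof(unsigned long) &&
		     !IS_ENABLED(CONFIG_ARCH_HAS_ATOMIC64_PTE_READS));
)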

> (It turns out not to be a problem at pmd, pud and pgd level: IIRC
> that's because the transitions at those levels are much more restricted,
> limited to setting, then clearing on pagetable teardown - except for
> the THP transitions which the local_irq_disable() guards against.)
> 
> Ah, enlightenment: arm (unlike arm64) does not __HAVE_ARCH_PTE_SPECIAL,
> so this "dangerous" code won't be compiled in for it, it's only using
> the stub below.  Well, you can see my point about needing more
> comments, those would have saved me a LOT of time.
> 

This is so we can cover the futex on THP tail case without the need for
__HAVE_ARCH_PTE_SPECIAL.

> > +		struct page *page;
> > +
> > +		if (!pte_present(pte) || pte_special(pte)
> > +			|| (write && !pte_write(pte)))
> 
> The " ||" at end of line above please.  And, more importantly,
> we need a pte_numa() test in here nowadays, for generic use.
> 
Will do.
Ahh, okay, apologies, I didn't spot the pte_numa tests being introduced.
I will check for other changes.
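
(A sketch of the reworked test, assuming the pte_numa() helper present in
this kernel generation, and with the "||" moved to the end of the line as
requested:

		if (!pte_present(pte) || pte_special(pte) ||
		    pte_numa(pte) || (write && !pte_write(pte)))
			goto pte_unmap;
)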

> > +			goto pte_unmap;
> > +
> > +		VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
> > +		page = pte_page(pte);
> > +
> > +		if (!page_cache_get_speculative(page))
> > +			goto pte_unmap;
> > +
> > +		if (unlikely(pte_val(pte) != pte_val(*ptep))) {
> > +			put_page(page);
> > +			goto pte_unmap;
> > +		}
> > +
> > +		pages[*nr] = page;
> > +		(*nr)++;
> > +
> > +	} while (ptep++, addr += PAGE_SIZE, addr != end);
> > +
> > +	ret = 1;
> > +
> > +pte_unmap:
> > +	pte_unmap(ptem);
> > +	return ret;
> > +}
> > +#else
> > +
> > +/*
> > + * If we can't determine whether or not a pte is special, then fail immediately
> > + * for ptes. Note, we can still pin HugeTLB and THP as these are guaranteed not
> > + * to be special.
> 
> From that comment, I just thought it very weird that you were compiling
> in any of this HAVE_GENERIC_RCU_GUP code in the !__HAVE_ARCH_PTE_SPECIAL
> case.  But somewhere else, over in the 0/6, you have a very important
> remark about futex on THP tail which makes sense of it: please add that
> explanation here.

Sure thing, thanks.
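
(One possible wording for the expanded stub comment, folding in the 0/6
rationale:

/*
 * If we can't determine whether or not a pte is special, then fail
 * immediately for ptes. Note, we can still pin HugeTLB and THP as these
 * are guaranteed not to be special, and pinning THP here is what keeps
 * the futex-on-THP-tail case working without __HAVE_ARCH_PTE_SPECIAL.
 */
)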

> 
> > + */
> > +static inline int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
> 
> checkpatch.pl is noisy about that line over 80 characters, whereas
> you understandably prefer to keep the stub declaration just like the
> main declaration.  Simply omit the " inline"?  The compiler should be
> able to work that out for itself, and it doesn't matter if it cannot.
> 

Okay, thanks.
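
(With " inline" dropped, the stub fits in 80 columns as:

static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
			 int write, struct page **pages, int *nr)
{
	return 0;
}
)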

> > +			 int write, struct page **pages, int *nr)
> > +{
> > +	return 0;
> > +}
> > +#endif /* __HAVE_ARCH_PTE_SPECIAL */
> > +
> > +static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
> > +		unsigned long end, int write, struct page **pages, int *nr)
> > +{
> > +	struct page *head, *page, *tail;
> > +	int refs;
> > +
> > +	if (write && !pmd_write(orig))
> > +		return 0;
> > +
> > +	refs = 0;
> > +	head = pmd_page(orig);
> > +	page = head + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
> > +	tail = page;
> > +	do {
> > +		VM_BUG_ON(compound_head(page) != head);
> 
> VM_BUG_ON_PAGE() is the latest preference.

Cheers, I will update this...
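
(i.e. roughly, using the VM_BUG_ON_PAGE() form available in this era:

		VM_BUG_ON_PAGE(compound_head(page) != head, page);
)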

> 
> > +		pages[*nr] = page;
> > +		(*nr)++;
> > +		page++;
> > +		refs++;
> > +	} while (addr += PAGE_SIZE, addr != end);
> > +
> > +	if (!page_cache_add_speculative(head, refs)) {
> > +		*nr -= refs;
> > +		return 0;
> > +	}
> > +
> > +	if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) {
> > +		*nr -= refs;
> > +		while (refs--)
> > +			put_page(head);
> > +		return 0;
> > +	}
> > +
> > +	/*
> > +	 * Any tail pages need their mapcount reference taken before we
> > +	 * return. (This allows the THP code to bump their ref count when
> > +	 * they are split into base pages).
> > +	 */
> > +	while (refs--) {
> > +		if (PageTail(tail))
> > +			get_huge_page_tail(tail);
> > +		tail++;
> > +	}
> > +
> > +	return 1;
> > +}
> > +
> > +static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
> > +		unsigned long end, int write, struct page **pages, int *nr)
> > +{
> > +	struct page *head, *page, *tail;
> > +	int refs;
> > +
> > +	if (write && !pud_write(orig))
> > +		return 0;
> > +
> > +	refs = 0;
> > +	head = pud_page(orig);
> > +	page = head + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
> > +	tail = page;
> > +	do {
> > +		VM_BUG_ON(compound_head(page) != head);
> 
> VM_BUG_ON_PAGE() is the latest preference.
> 

... and that :-).

> > +		pages[*nr] = page;
> > +		(*nr)++;
> > +		page++;
> > +		refs++;
> > +	} while (addr += PAGE_SIZE, addr != end);
> > +
> > +	if (!page_cache_add_speculative(head, refs)) {
> > +		*nr -= refs;
> > +		return 0;
> > +	}
> > +
> > +	if (unlikely(pud_val(orig) != pud_val(*pudp))) {
> > +		*nr -= refs;
> > +		while (refs--)
> > +			put_page(head);
> > +		return 0;
> > +	}
> > +
> > +	while (refs--) {
> > +		if (PageTail(tail))
> > +			get_huge_page_tail(tail);
> > +		tail++;
> > +	}
> > +
> > +	return 1;
> > +}
> > +
> > +static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
> > +		int write, struct page **pages, int *nr)
> > +{
> > +	unsigned long next;
> > +	pmd_t *pmdp;
> > +
> > +	pmdp = pmd_offset(&pud, addr);
> > +	do {
> > +		pmd_t pmd = ACCESS_ONCE(*pmdp);
> 
> I like to do it this way too, but checkpatch.pl prefers a blank line.
> 

Okay, I will add a blank line.
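
(So the loop head becomes, per checkpatch, with the rest unchanged:

	do {
		pmd_t pmd = ACCESS_ONCE(*pmdp);

		next = pmd_addr_end(addr, end);
)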

> > +		next = pmd_addr_end(addr, end);
> > +		if (pmd_none(pmd) || pmd_trans_splitting(pmd))
> > +			return 0;
> > +
> > +		if (unlikely(pmd_trans_huge(pmd) || pmd_huge(pmd))) {
> 
> I wonder if you spent any time pondering pmd_large() and whether to
> use it here (and define it in arm): I have forgotten its relationship
> to pmd_huge() and pmd_trans_huge(), and you are probably right to
> steer clear of it.

pmd_large is only defined by a few architectures, so I opted for
generality and clarity.

> 
> A pmd_numa() test is needed here nowadays, for generic use.
> 

Thanks, I will add the logic.
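
(A sketch of the extended check, assuming the pmd_numa() helper of this
kernel generation; NUMA-hinting pmds fall back to the slow path:

		next = pmd_addr_end(addr, end);
		if (pmd_none(pmd) || pmd_trans_splitting(pmd) ||
		    pmd_numa(pmd))
			return 0;
)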

> > +			if (!gup_huge_pmd(pmd, pmdp, addr, next, write,
> > +				pages, nr))
> > +				return 0;
> > +		} else {
> > +			if (!gup_pte_range(pmd, addr, next, write, pages, nr))
> > +				return 0;
> > +		}
> 
> You've chosen a different (indentation and else) style here from what
> you use below in the very similar gup_pud_range(): it's easier to see
> the differences if you keep the style the same, personally I prefer
> how you did gup_pud_range().

Okay, I will re-structure.
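
(Restructured to mirror gup_pud_range(), the hunk might read:

		if (unlikely(pmd_trans_huge(pmd) || pmd_huge(pmd))) {
			if (!gup_huge_pmd(pmd, pmdp, addr, next, write,
					pages, nr))
				return 0;
		} else if (!gup_pte_range(pmd, addr, next, write, pages, nr))
			return 0;
)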

> 
> > +	} while (pmdp++, addr = next, addr != end);
> > +
> > +	return 1;
> > +}
> > +
> > +static int gup_pud_range(pgd_t *pgdp, unsigned long addr, unsigned long end,
> > +		int write, struct page **pages, int *nr)
> > +{
> > +	unsigned long next;
> > +	pud_t *pudp;
> > +
> > +	pudp = pud_offset(pgdp, addr);
> > +	do {
> > +		pud_t pud = ACCESS_ONCE(*pudp);
> 
> I like to do it this way too, but checkpatch.pl prefers a blank line.

I'll add a line.

> 
> > +		next = pud_addr_end(addr, end);
> > +		if (pud_none(pud))
> > +			return 0;
> > +		if (pud_huge(pud)) {
> 
> I wonder if you spent any time pondering pud_large() and whether to
> use it here (and define it in arm): I have forgotten its relationship
> to pud_huge(), and you are probably right to steer clear of it.

I preferred pud_huge, as it is better defined.

> 
> > +			if (!gup_huge_pud(pud, pudp, addr, next, write,
> > +					pages, nr))
> > +				return 0;
> > +		} else if (!gup_pmd_range(pud, addr, next, write, pages, nr))
> > +			return 0;
> > +	} while (pudp++, addr = next, addr != end);
> > +
> > +	return 1;
> > +}
> > +
> > +/*
> > + * Like get_user_pages_fast() except it's IRQ-safe in that it won't fall
> > + * back to the regular GUP.
> > + */
> > +int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
> > +			  struct page **pages)
> > +{
> > +	struct mm_struct *mm = current->mm;
> > +	unsigned long addr, len, end;
> > +	unsigned long next, flags;
> > +	pgd_t *pgdp;
> > +	int nr = 0;
> > +
> > +	start &= PAGE_MASK;
> > +	addr = start;
> > +	len = (unsigned long) nr_pages << PAGE_SHIFT;
> > +	end = start + len;
> > +
> > +	if (unlikely(!access_ok(write ? VERIFY_WRITE : VERIFY_READ,
> > +					start, len)))
> > +		return 0;
> > +
> > +	/*
> > +	 * Disable interrupts, we use the nested form as we can already
> > +	 * have interrupts disabled by get_futex_key.
> > +	 *
> > +	 * With interrupts disabled, we block page table pages from being
> > +	 * freed from under us. See mmu_gather_tlb in asm-generic/tlb.h
> > +	 * for more details.
> > +	 *
> > +	 * We do not adopt an rcu_read_lock(.) here as we also want to
> > +	 * block IPIs that come from THPs splitting.
> > +	 */
> > +
> > +	local_irq_save(flags);
> > +	pgdp = pgd_offset(mm, addr);
> > +	do {
> > +		next = pgd_addr_end(addr, end);
> > +		if (pgd_none(*pgdp))
> > +			break;
> > +		else if (!gup_pud_range(pgdp, addr, next, write, pages, &nr))
> > +			break;
> > +	} while (pgdp++, addr = next, addr != end);
> > +	local_irq_restore(flags);
> > +
> > +	return nr;
> > +}
> > +
> 
> The x86 version has a comment on this interface:
> it would be helpful to copy that here.
> 

Thanks, I'll copy it over.
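
(Not the x86 wording verbatim, but a kerneldoc stub of roughly the shape
being asked for, matching the parameters of the function below:

/**
 * get_user_pages_fast() - pin user pages in memory
 * @start:	starting user address
 * @nr_pages:	number of pages from start to pin
 * @write:	whether pages will be written to
 * @pages:	array that receives pointers to the pages pinned;
 *		should be at least nr_pages long
 *
 * Attempts to pin user pages without taking mmap_sem, falling back to
 * get_user_pages() for anything the fast walk could not pin. Returns
 * the number of pages pinned (possibly fewer than requested), or a
 * negative errno if no pages were pinned at all.
 */
)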

> > +int get_user_pages_fast(unsigned long start, int nr_pages, int write,
> > +			struct page **pages)
> > +{
> > +	struct mm_struct *mm = current->mm;
> > +	int nr, ret;
> > +
> > +	start &= PAGE_MASK;
> > +	nr = __get_user_pages_fast(start, nr_pages, write, pages);
> 
> The x86 version has a commit from Linus, avoiding the access_ok() check
> in __get_user_pages_fast(): I confess I just did not spend long enough
> trying to understand what that's about, and whether it would be
> important to incorporate here.
> 

Thanks, I see the commit, I will need to have a think about it as it's
not immediately obvious to me.

> > +	ret = nr;
> > +
> > +	if (nr < nr_pages) {
> > +		/* Try to get the remaining pages with get_user_pages */
> > +		start += nr << PAGE_SHIFT;
> > +		pages += nr;
> > +
> > +		down_read(&mm->mmap_sem);
> > +		ret = get_user_pages(current, mm, start,
> > +				     nr_pages - nr, write, 0, pages, NULL);
> > +		up_read(&mm->mmap_sem);
> > +
> > +		/* Have to be a bit careful with return values */
> > +		if (nr > 0) {
> > +			if (ret < 0)
> > +				ret = nr;
> > +			else
> > +				ret += nr;
> > +		}
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> > +#endif /* CONFIG_HAVE_GENERIC_RCU_GUP */
> > -- 
> > 1.9.3

Thanks again Hugh for the very useful comments.

Cheers,
-- 
Steve

^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2014-09-24 15:57 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-28 14:45 [PATCH V3 0/6] RCU get_user_pages_fast and __get_user_pages_fast Steve Capper
2014-08-28 14:45 ` Steve Capper
2014-08-28 14:45 ` Steve Capper
2014-08-28 14:45 ` [PATCH V3 1/6] mm: Introduce a general RCU get_user_pages_fast Steve Capper
2014-08-28 14:45   ` Steve Capper
2014-08-28 14:45   ` Steve Capper
2014-09-24 13:34   ` Hugh Dickins
2014-09-24 13:34     ` Hugh Dickins
2014-09-24 13:34     ` Hugh Dickins
2014-09-24 13:34     ` Hugh Dickins
2014-09-24 15:57     ` Steve Capper
2014-09-24 15:57       ` Steve Capper
2014-09-24 15:57       ` Steve Capper
2014-08-28 14:45 ` [PATCH V3 2/6] arm: mm: Introduce special ptes for LPAE Steve Capper
2014-08-28 14:45   ` Steve Capper
2014-08-28 14:45   ` Steve Capper
2014-08-28 14:45 ` [PATCH V3 3/6] arm: mm: Enable HAVE_RCU_TABLE_FREE logic Steve Capper
2014-08-28 14:45   ` Steve Capper
2014-08-28 14:45   ` Steve Capper
2014-08-28 14:45 ` [PATCH V3 4/6] arm: mm: Enable RCU fast_gup Steve Capper
2014-08-28 14:45   ` Steve Capper
2014-08-28 14:45   ` Steve Capper
2014-08-28 14:45 ` [PATCH V3 5/6] arm64: mm: Enable HAVE_RCU_TABLE_FREE logic Steve Capper
2014-08-28 14:45   ` Steve Capper
2014-08-28 14:45   ` Steve Capper
2014-08-28 14:45 ` [PATCH V3 6/6] arm64: mm: Enable RCU fast_gup Steve Capper
2014-08-28 14:45   ` Steve Capper
2014-08-28 14:45   ` Steve Capper
2014-08-28 15:23 ` [PATCH V3 0/6] RCU get_user_pages_fast and __get_user_pages_fast Will Deacon
2014-08-28 15:23   ` Will Deacon
2014-08-28 15:23   ` Will Deacon
2014-09-01 11:43   ` Steve Capper
2014-09-01 11:43     ` Steve Capper
2014-09-01 11:43     ` Steve Capper
2014-09-08  9:06     ` Steve Capper
2014-09-08  9:06       ` Steve Capper
2014-09-08  9:06       ` Steve Capper
2014-09-19 18:28       ` Steve Capper
2014-09-19 18:28         ` Steve Capper
2014-09-19 18:28         ` Steve Capper
2014-09-22  9:28         ` Will Deacon
2014-09-22  9:28           ` Will Deacon
2014-09-22  9:28           ` Will Deacon

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.