* [PATCH V2 0/6] RCU get_user_pages_fast and __get_user_pages_fast
@ 2014-08-21 15:43 ` Steve Capper
  0 siblings, 0 replies; 78+ messages in thread
From: Steve Capper @ 2014-08-21 15:43 UTC (permalink / raw)
  To: linux-arm-kernel, catalin.marinas, linux, linux-arch, linux-mm
  Cc: will.deacon, gary.robertson, christoffer.dall, peterz,
	anders.roxell, akpm, dann.frazier, mark.rutland, mgorman,
	Steve Capper

Hello,
This series implements general forms of get_user_pages_fast and
__get_user_pages_fast and activates them for arm and arm64.

These are required for Transparent HugePages to function correctly, as
a futex on a THP tail page will otherwise result in an infinite loop (due
to the core implementation of __get_user_pages_fast always returning 0).

Unfortunately, a futex on a THP tail page can be quite common for certain
workloads; THP is therefore unreliable without a __get_user_pages_fast
implementation.
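
As an illustration of the failure mode, the pre-existing core fallback is
roughly the following __weak stub (a sketch based on mm/util.c of this
era); get_futex_key() retries its THP-tail path via "goto again" whenever
this returns 0 for a present page, hence the infinite loop:

	/*
	 * Sketch: weak fallback used by any architecture that provides
	 * no real __get_user_pages_fast implementation.
	 */
	int __weak __get_user_pages_fast(unsigned long start, int nr_pages,
					 int write, struct page **pages)
	{
		return 0;	/* page never resolved; caller retries */
	}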

This series may also be beneficial for direct-IO heavy workloads and
certain KVM workloads.

Changes since PATCH V1 are:
 * Rebased to 3.17-rc1.
 * Switched to kick_all_cpus_sync as suggested by Mark Rutland.

The main changes since RFC V5 are:
 * Rebased against 3.16-rc1.
 * pmd_present no longer tested for by gup_huge_pmd and gup_huge_pud,
   because the entry must be present for these leaf functions to be
   called. 
 * Rather than assume puds can be re-cast as pmds, a separate
   function pud_write is instead used by the core gup.
 * ARM activation logic changed; RCU_TABLE_FREE and RCU_GUP are now
   activated only when running with LPAE.

The main changes since RFC V4 are:
 * corrected the arm64 logic so it now correctly rcu-frees page
   table backing pages.
 * rcu free logic relaxed for pre-ARMv7 ARM as we need an IPI to
   invalidate TLBs anyway.
 * rebased to 3.15-rc3 (some minor changes were needed to allow it to merge).
 * dropped Catalin's mmu_gather patch as that's been merged already.

This series has been tested with the LTP mm tests and some custom futex tests
that exercise the futex-on-THP-tail case, on both an Arndale board and
a Juno board. Debug counters were also temporarily employed to ensure that
the RCU_TABLE_FREE logic was behaving as expected.

I would really appreciate any comments (especially on the validity or
otherwise of the core fast_gup implementation) and would welcome testers.

Cheers,
--
Steve

Steve Capper (6):
  mm: Introduce a general RCU get_user_pages_fast.
  arm: mm: Introduce special ptes for LPAE
  arm: mm: Enable HAVE_RCU_TABLE_FREE logic
  arm: mm: Enable RCU fast_gup
  arm64: mm: Enable HAVE_RCU_TABLE_FREE logic
  arm64: mm: Enable RCU fast_gup

 arch/arm/Kconfig                      |   5 +
 arch/arm/include/asm/pgtable-2level.h |   2 +
 arch/arm/include/asm/pgtable-3level.h |  15 ++
 arch/arm/include/asm/pgtable.h        |   6 +-
 arch/arm/include/asm/tlb.h            |  38 ++++-
 arch/arm/mm/flush.c                   |  15 ++
 arch/arm64/Kconfig                    |   4 +
 arch/arm64/include/asm/pgtable.h      |  11 +-
 arch/arm64/include/asm/tlb.h          |  20 ++-
 arch/arm64/mm/flush.c                 |  15 ++
 mm/Kconfig                            |   3 +
 mm/gup.c                              | 278 ++++++++++++++++++++++++++++++++++
 12 files changed, 402 insertions(+), 10 deletions(-)

-- 
1.9.3

* [PATCH V2 1/6] mm: Introduce a general RCU get_user_pages_fast.
@ 2014-08-21 15:43   ` Steve Capper
  0 siblings, 0 replies; 78+ messages in thread
From: Steve Capper @ 2014-08-21 15:43 UTC (permalink / raw)
  To: linux-arm-kernel, catalin.marinas, linux, linux-arch, linux-mm
  Cc: will.deacon, gary.robertson, christoffer.dall, peterz,
	anders.roxell, akpm, dann.frazier, mark.rutland, mgorman,
	Steve Capper

get_user_pages_fast attempts to pin user pages by walking the page
tables directly and avoids taking locks. Thus the walker needs to be
protected from page table pages being freed from under it, and needs
to block any THP splits.

One way to achieve this is to have the walker disable interrupts and rely
on the TLB flushing code broadcasting IPIs before the page table pages
are freed; the IPIs cannot complete while the walker has interrupts
disabled, so the free is held off until the walk finishes.

On some platforms we have hardware broadcast of TLB invalidations, so
the TLB flushing code doesn't necessarily need to broadcast IPIs;
spuriously broadcasting IPIs can hurt system performance if done too
often.

This problem has been solved on PowerPC and Sparc by batching up page
table pages belonging to address spaces with more than one user, then
scheduling an rcu_sched callback to free the pages. This RCU page table
free logic
has been promoted to core code and is activated when one enables
HAVE_RCU_TABLE_FREE. Unfortunately, these architectures implement
their own get_user_pages_fast routines.

The RCU page table free logic, coupled with an IPI broadcast on THP
split (which is a rare event), allows one to protect a page table
walker by merely disabling interrupts during the walk.

This patch provides a general RCU implementation of get_user_pages_fast
that can be used by architectures that perform hardware broadcast of
TLB invalidations.
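
For illustration, an architecture opts in from its Kconfig roughly as
follows (a sketch; the arm patches later in this series select the
equivalent, gated on LPAE):

	config ARM
		select HAVE_RCU_TABLE_FREE if (SMP && ARM_LPAE)
		select HAVE_RCU_GUP if ARM_LPAE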

It is based heavily on the PowerPC implementation by Nick Piggin.

Signed-off-by: Steve Capper <steve.capper@linaro.org>
---
 mm/Kconfig |   3 +
 mm/gup.c   | 278 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 281 insertions(+)

diff --git a/mm/Kconfig b/mm/Kconfig
index 886db21..6a4d764 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -137,6 +137,9 @@ config HAVE_MEMBLOCK_NODE_MAP
 config HAVE_MEMBLOCK_PHYS_MAP
 	boolean
 
+config HAVE_RCU_GUP
+	boolean
+
 config ARCH_DISCARD_MEMBLOCK
 	boolean
 
diff --git a/mm/gup.c b/mm/gup.c
index 91d044b..2f684fa 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -10,6 +10,10 @@
 #include <linux/swap.h>
 #include <linux/swapops.h>
 
+#include <linux/sched.h>
+#include <linux/rwsem.h>
+#include <asm/pgtable.h>
+
 #include "internal.h"
 
 static struct page *no_page_table(struct vm_area_struct *vma,
@@ -672,3 +676,277 @@ struct page *get_dump_page(unsigned long addr)
 	return page;
 }
 #endif /* CONFIG_ELF_CORE */
+
+#ifdef CONFIG_HAVE_RCU_GUP
+
+#ifdef __HAVE_ARCH_PTE_SPECIAL
+static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
+			 int write, struct page **pages, int *nr)
+{
+	pte_t *ptep, *ptem;
+	int ret = 0;
+
+	ptem = ptep = pte_offset_map(&pmd, addr);
+	do {
+		pte_t pte = ACCESS_ONCE(*ptep);
+		struct page *page;
+
+		if (!pte_present(pte) || pte_special(pte)
+			|| (write && !pte_write(pte)))
+			goto pte_unmap;
+
+		VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
+		page = pte_page(pte);
+
+		if (!page_cache_get_speculative(page))
+			goto pte_unmap;
+
+		if (unlikely(pte_val(pte) != pte_val(*ptep))) {
+			put_page(page);
+			goto pte_unmap;
+		}
+
+		pages[*nr] = page;
+		(*nr)++;
+
+	} while (ptep++, addr += PAGE_SIZE, addr != end);
+
+	ret = 1;
+
+pte_unmap:
+	pte_unmap(ptem);
+	return ret;
+}
+#else
+
+/*
+ * If we can't determine whether or not a pte is special, then fail immediately
+ * for ptes. Note, we can still pin HugeTLB and THP as these are guaranteed not
+ * to be special.
+ */
+static inline int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
+			 int write, struct page **pages, int *nr)
+{
+	return 0;
+}
+#endif /* __HAVE_ARCH_PTE_SPECIAL */
+
+static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
+		unsigned long end, int write, struct page **pages, int *nr)
+{
+	struct page *head, *page, *tail;
+	int refs;
+
+	if (write && !pmd_write(orig))
+		return 0;
+
+	refs = 0;
+	head = pmd_page(orig);
+	page = head + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
+	tail = page;
+	do {
+		VM_BUG_ON(compound_head(page) != head);
+		pages[*nr] = page;
+		(*nr)++;
+		page++;
+		refs++;
+	} while (addr += PAGE_SIZE, addr != end);
+
+	if (!page_cache_add_speculative(head, refs)) {
+		*nr -= refs;
+		return 0;
+	}
+
+	if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) {
+		*nr -= refs;
+		while (refs--)
+			put_page(head);
+		return 0;
+	}
+
+	/*
+	 * Any tail pages need their mapcount reference taken before we
+	 * return. (This allows the THP code to bump their ref count when
+	 * they are split into base pages).
+	 */
+	while (refs--) {
+		if (PageTail(tail))
+			get_huge_page_tail(tail);
+		tail++;
+	}
+
+	return 1;
+}
+
+static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
+		unsigned long end, int write, struct page **pages, int *nr)
+{
+	struct page *head, *page, *tail;
+	int refs;
+
+	if (write && !pud_write(orig))
+		return 0;
+
+	refs = 0;
+	head = pud_page(orig);
+	page = head + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
+	tail = page;
+	do {
+		VM_BUG_ON(compound_head(page) != head);
+		pages[*nr] = page;
+		(*nr)++;
+		page++;
+		refs++;
+	} while (addr += PAGE_SIZE, addr != end);
+
+	if (!page_cache_add_speculative(head, refs)) {
+		*nr -= refs;
+		return 0;
+	}
+
+	if (unlikely(pud_val(orig) != pud_val(*pudp))) {
+		*nr -= refs;
+		while (refs--)
+			put_page(head);
+		return 0;
+	}
+
+	while (refs--) {
+		if (PageTail(tail))
+			get_huge_page_tail(tail);
+		tail++;
+	}
+
+	return 1;
+}
+
+static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
+		int write, struct page **pages, int *nr)
+{
+	unsigned long next;
+	pmd_t *pmdp;
+
+	pmdp = pmd_offset(&pud, addr);
+	do {
+		pmd_t pmd = ACCESS_ONCE(*pmdp);
+		next = pmd_addr_end(addr, end);
+		if (pmd_none(pmd) || pmd_trans_splitting(pmd))
+			return 0;
+
+		if (unlikely(pmd_trans_huge(pmd) || pmd_huge(pmd))) {
+			if (!gup_huge_pmd(pmd, pmdp, addr, next, write,
+				pages, nr))
+				return 0;
+		} else {
+			if (!gup_pte_range(pmd, addr, next, write, pages, nr))
+				return 0;
+		}
+	} while (pmdp++, addr = next, addr != end);
+
+	return 1;
+}
+
+static int gup_pud_range(pgd_t *pgdp, unsigned long addr, unsigned long end,
+		int write, struct page **pages, int *nr)
+{
+	unsigned long next;
+	pud_t *pudp;
+
+	pudp = pud_offset(pgdp, addr);
+	do {
+		pud_t pud = ACCESS_ONCE(*pudp);
+		next = pud_addr_end(addr, end);
+		if (pud_none(pud))
+			return 0;
+		if (pud_huge(pud)) {
+			if (!gup_huge_pud(pud, pudp, addr, next, write,
+					pages, nr))
+				return 0;
+		} else if (!gup_pmd_range(pud, addr, next, write, pages, nr))
+			return 0;
+	} while (pudp++, addr = next, addr != end);
+
+	return 1;
+}
+
+/*
+ * Like get_user_pages_fast() except it's IRQ-safe in that it won't fall
+ * back to the regular GUP.
+ */
+int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
+			  struct page **pages)
+{
+	struct mm_struct *mm = current->mm;
+	unsigned long addr, len, end;
+	unsigned long next, flags;
+	pgd_t *pgdp;
+	int nr = 0;
+
+	start &= PAGE_MASK;
+	addr = start;
+	len = (unsigned long) nr_pages << PAGE_SHIFT;
+	end = start + len;
+
+	if (unlikely(!access_ok(write ? VERIFY_WRITE : VERIFY_READ,
+					start, len)))
+		return 0;
+
+	/*
+	 * Disable interrupts, we use the nested form as we can already
+	 * have interrupts disabled by get_futex_key.
+	 *
+	 * With interrupts disabled, we block page table pages from being
+	 * freed from under us. See mmu_gather_tlb in asm-generic/tlb.h
+	 * for more details.
+	 *
+	 * We do not adopt an rcu_read_lock(.) here as we also want to
+	 * block IPIs that come from THPs splitting.
+	 */
+
+	local_irq_save(flags);
+	pgdp = pgd_offset(mm, addr);
+	do {
+		next = pgd_addr_end(addr, end);
+		if (pgd_none(*pgdp))
+			break;
+		else if (!gup_pud_range(pgdp, addr, next, write, pages, &nr))
+			break;
+	} while (pgdp++, addr = next, addr != end);
+	local_irq_restore(flags);
+
+	return nr;
+}
+
+int get_user_pages_fast(unsigned long start, int nr_pages, int write,
+			struct page **pages)
+{
+	struct mm_struct *mm = current->mm;
+	int nr, ret;
+
+	start &= PAGE_MASK;
+	nr = __get_user_pages_fast(start, nr_pages, write, pages);
+	ret = nr;
+
+	if (nr < nr_pages) {
+		/* Try to get the remaining pages with get_user_pages */
+		start += nr << PAGE_SHIFT;
+		pages += nr;
+
+		down_read(&mm->mmap_sem);
+		ret = get_user_pages(current, mm, start,
+				     nr_pages - nr, write, 0, pages, NULL);
+		up_read(&mm->mmap_sem);
+
+		/* Have to be a bit careful with return values */
+		if (nr > 0) {
+			if (ret < 0)
+				ret = nr;
+			else
+				ret += nr;
+		}
+	}
+
+	return ret;
+}
+
+#endif /* CONFIG_HAVE_RCU_GUP */
-- 
1.9.3

* [PATCH V2 2/6] arm: mm: Introduce special ptes for LPAE
@ 2014-08-21 15:43   ` Steve Capper
  0 siblings, 0 replies; 78+ messages in thread
From: Steve Capper @ 2014-08-21 15:43 UTC (permalink / raw)
  To: linux-arm-kernel, catalin.marinas, linux, linux-arch, linux-mm
  Cc: will.deacon, gary.robertson, christoffer.dall, peterz,
	anders.roxell, akpm, dann.frazier, mark.rutland, mgorman,
	Steve Capper

We need a mechanism to tag ptes as being special; this indicates that
no attempt should be made to access the underlying struct page *
associated with the pte. This is used by the fast_gup when operating on
ptes, as it has no means to access the VMAs (which also contain this
information) locklessly.

The L_PTE_SPECIAL bit is already allocated for LPAE, this patch modifies
pte_special and pte_mkspecial to make use of it, and defines
__HAVE_ARCH_PTE_SPECIAL.

This patch also excludes special ptes from the icache/dcache sync logic.
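
As a sketch of how the helpers tie into the walker (hypothetical call
site, for illustration only): a raw pfn mapping with no backing struct
page gets marked special when installed, and the gup_pte_range() walker
from patch 1/6 then bails out rather than dereferencing it:

	pte_t pte = pfn_pte(pfn, prot);		/* no struct page behind this */
	pte = pte_mkspecial(pte);
	set_pte_at(mm, addr, ptep, pte);

	/* ... later, in the fast_gup pte walker ... */
	if (!pte_present(pte) || pte_special(pte))
		goto pte_unmap;			/* refuse to pin the page */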

Signed-off-by: Steve Capper <steve.capper@linaro.org>
---
 arch/arm/include/asm/pgtable-2level.h | 2 ++
 arch/arm/include/asm/pgtable-3level.h | 7 +++++++
 arch/arm/include/asm/pgtable.h        | 6 ++----
 3 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h
index 219ac88..f027941 100644
--- a/arch/arm/include/asm/pgtable-2level.h
+++ b/arch/arm/include/asm/pgtable-2level.h
@@ -182,6 +182,8 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
 #define pmd_addr_end(addr,end) (end)
 
 #define set_pte_ext(ptep,pte,ext) cpu_set_pte_ext(ptep,pte,ext)
+#define pte_special(pte)	(0)
+static inline pte_t pte_mkspecial(pte_t pte) { return pte; }
 
 /*
  * We don't have huge page support for short descriptors, for the moment
diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index 06e0bc0..16122d4 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -213,6 +213,13 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
 #define pmd_isclear(pmd, val)	(!(pmd_val(pmd) & (val)))
 
 #define pmd_young(pmd)		(pmd_isset((pmd), PMD_SECT_AF))
+#define pte_special(pte)	(pte_isset((pte), L_PTE_SPECIAL))
+static inline pte_t pte_mkspecial(pte_t pte)
+{
+	pte_val(pte) |= L_PTE_SPECIAL;
+	return pte;
+}
+#define	__HAVE_ARCH_PTE_SPECIAL
 
 #define __HAVE_ARCH_PMD_WRITE
 #define pmd_write(pmd)		(pmd_isclear((pmd), L_PMD_SECT_RDONLY))
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index 01baef0..90aa4583 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -226,7 +226,6 @@ static inline pte_t *pmd_page_vaddr(pmd_t pmd)
 #define pte_dirty(pte)		(pte_isset((pte), L_PTE_DIRTY))
 #define pte_young(pte)		(pte_isset((pte), L_PTE_YOUNG))
 #define pte_exec(pte)		(pte_isclear((pte), L_PTE_XN))
-#define pte_special(pte)	(0)
 
 #define pte_valid_user(pte)	\
 	(pte_valid(pte) && pte_isset((pte), L_PTE_USER) && pte_young(pte))
@@ -245,7 +244,8 @@ static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
 	unsigned long ext = 0;
 
 	if (addr < TASK_SIZE && pte_valid_user(pteval)) {
-		__sync_icache_dcache(pteval);
+		if (!pte_special(pteval))
+			__sync_icache_dcache(pteval);
 		ext |= PTE_EXT_NG;
 	}
 
@@ -264,8 +264,6 @@ PTE_BIT_FUNC(mkyoung,   |= L_PTE_YOUNG);
 PTE_BIT_FUNC(mkexec,   &= ~L_PTE_XN);
 PTE_BIT_FUNC(mknexec,   |= L_PTE_XN);
 
-static inline pte_t pte_mkspecial(pte_t pte) { return pte; }
-
 static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 {
 	const pteval_t mask = L_PTE_XN | L_PTE_RDONLY | L_PTE_USER |
-- 
1.9.3

* [PATCH V2 3/6] arm: mm: Enable HAVE_RCU_TABLE_FREE logic
@ 2014-08-21 15:43   ` Steve Capper
  0 siblings, 0 replies; 78+ messages in thread
From: Steve Capper @ 2014-08-21 15:43 UTC (permalink / raw)
  To: linux-arm-kernel, catalin.marinas, linux, linux-arch, linux-mm
  Cc: will.deacon, gary.robertson, christoffer.dall, peterz,
	anders.roxell, akpm, dann.frazier, mark.rutland, mgorman,
	Steve Capper

In order to implement get_user_pages_fast we need to ensure that the
page table walker is protected from page table pages being freed from
under it.

This patch enables HAVE_RCU_TABLE_FREE; any page table pages belonging
to address spaces with multiple users will then be freed via
call_rcu_sched. This means that disabling interrupts will block the free
and so protect the fast gup page walker.
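
For reference, the core HAVE_RCU_TABLE_FREE logic in mm/memory.c frees a
batch of table pages from an rcu_sched callback, roughly as follows:

	static void tlb_remove_table_rcu(struct rcu_head *head)
	{
		struct mmu_table_batch *batch;
		int i;

		batch = container_of(head, struct mmu_table_batch, rcu);

		/* Really free each batched page table page. */
		for (i = 0; i < batch->nr; i++)
			__tlb_remove_table(batch->tables[i]);

		free_page((unsigned long)batch);
	}

	void tlb_table_flush(struct mmu_gather *tlb)
	{
		struct mmu_table_batch **batch = &tlb->batch;

		if (*batch) {
			call_rcu_sched(&(*batch)->rcu, tlb_remove_table_rcu);
			*batch = NULL;
		}
	}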

Signed-off-by: Steve Capper <steve.capper@linaro.org>
---
 arch/arm/Kconfig           |  1 +
 arch/arm/include/asm/tlb.h | 38 ++++++++++++++++++++++++++++++++++++--
 2 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index c49a775..cc740d2 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -60,6 +60,7 @@ config ARM
 	select HAVE_PERF_EVENTS
 	select HAVE_PERF_REGS
 	select HAVE_PERF_USER_STACK_DUMP
+	select HAVE_RCU_TABLE_FREE if (SMP && ARM_LPAE)
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_SYSCALL_TRACEPOINTS
 	select HAVE_UID16
diff --git a/arch/arm/include/asm/tlb.h b/arch/arm/include/asm/tlb.h
index f1a0dac..3cadb72 100644
--- a/arch/arm/include/asm/tlb.h
+++ b/arch/arm/include/asm/tlb.h
@@ -35,12 +35,39 @@
 
 #define MMU_GATHER_BUNDLE	8
 
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+static inline void __tlb_remove_table(void *_table)
+{
+	free_page_and_swap_cache((struct page *)_table);
+}
+
+struct mmu_table_batch {
+	struct rcu_head		rcu;
+	unsigned int		nr;
+	void			*tables[0];
+};
+
+#define MAX_TABLE_BATCH		\
+	((PAGE_SIZE - sizeof(struct mmu_table_batch)) / sizeof(void *))
+
+extern void tlb_table_flush(struct mmu_gather *tlb);
+extern void tlb_remove_table(struct mmu_gather *tlb, void *table);
+
+#define tlb_remove_entry(tlb, entry)	tlb_remove_table(tlb, entry)
+#else
+#define tlb_remove_entry(tlb, entry)	tlb_remove_page(tlb, entry)
+#endif /* CONFIG_HAVE_RCU_TABLE_FREE */
+
 /*
  * TLB handling.  This allows us to remove pages from the page
  * tables, and efficiently handle the TLB issues.
  */
 struct mmu_gather {
 	struct mm_struct	*mm;
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+	struct mmu_table_batch	*batch;
+	unsigned int		need_flush;
+#endif
 	unsigned int		fullmm;
 	struct vm_area_struct	*vma;
 	unsigned long		start, end;
@@ -101,6 +128,9 @@ static inline void __tlb_alloc_page(struct mmu_gather *tlb)
 static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
 {
 	tlb_flush(tlb);
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+	tlb_table_flush(tlb);
+#endif
 }
 
 static inline void tlb_flush_mmu_free(struct mmu_gather *tlb)
@@ -129,6 +159,10 @@ tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned long start
 	tlb->pages = tlb->local;
 	tlb->nr = 0;
 	__tlb_alloc_page(tlb);
+
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+	tlb->batch = NULL;
+#endif
 }
 
 static inline void
@@ -205,7 +239,7 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
 	tlb_add_flush(tlb, addr + SZ_1M);
 #endif
 
-	tlb_remove_page(tlb, pte);
+	tlb_remove_entry(tlb, pte);
 }
 
 static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
@@ -213,7 +247,7 @@ static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
 {
 #ifdef CONFIG_ARM_LPAE
 	tlb_add_flush(tlb, addr);
-	tlb_remove_page(tlb, virt_to_page(pmdp));
+	tlb_remove_entry(tlb, virt_to_page(pmdp));
 #endif
 }
 
-- 
1.9.3

* [PATH V2 4/6] arm: mm: Enable RCU fast_gup
@ 2014-08-21 15:43   ` Steve Capper
  0 siblings, 0 replies; 78+ messages in thread
From: Steve Capper @ 2014-08-21 15:43 UTC (permalink / raw)
  To: linux-arm-kernel, catalin.marinas, linux, linux-arch, linux-mm
  Cc: will.deacon, gary.robertson, christoffer.dall, peterz,
	anders.roxell, akpm, dann.frazier, mark.rutland, mgorman,
	Steve Capper

Activate the RCU fast_gup for ARM. We also need to force THP splits to
broadcast an IPI so that we block in the fast_gup page walker. As THP
splits are comparatively rare, this should not lead to a noticeable
performance degradation.
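
For illustration only, here is an abridged sketch (not part of this
patch) of how the core fast_gup pmd walker from patch 1/6 observes the
splitting bit; the function name and body below are assumed and
simplified:

	static int gup_pmd_range_sketch(pud_t pud, unsigned long addr,
					unsigned long end, int write,
					struct page **pages, int *nr)
	{
		unsigned long next;
		pmd_t *pmdp = pmd_offset(&pud, addr);

		do {
			/* Snapshot the entry; it can change under us. */
			pmd_t pmd = ACCESS_ONCE(*pmdp);

			next = pmd_addr_end(addr, end);
			/*
			 * A THP under split has the splitting bit set;
			 * bail so the slow path can wait out the split.
			 */
			if (pmd_none(pmd) || pmd_trans_splitting(pmd))
				return 0;
			/* huge and regular pte handling elided */
		} while (pmdp++, addr = next, addr != end);

		return 1;
	}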

Some pre-requisite functions pud_write and pud_page are also added.

Signed-off-by: Steve Capper <steve.capper@linaro.org>
---
 arch/arm/Kconfig                      |  4 ++++
 arch/arm/include/asm/pgtable-3level.h |  8 ++++++++
 arch/arm/mm/flush.c                   | 15 +++++++++++++++
 3 files changed, 27 insertions(+)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index cc740d2..21f12be 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1645,6 +1645,10 @@ config ARCH_SELECT_MEMORY_MODEL
 config HAVE_ARCH_PFN_VALID
 	def_bool ARCH_HAS_HOLES_MEMORYMODEL || !SPARSEMEM
 
+config HAVE_RCU_GUP
+	def_bool y
+	depends on ARM_LPAE
+
 config HIGHMEM
 	bool "High Memory Support"
 	depends on MMU
diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index 16122d4..a31ecdad 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -224,6 +224,8 @@ static inline pte_t pte_mkspecial(pte_t pte)
 #define __HAVE_ARCH_PMD_WRITE
 #define pmd_write(pmd)		(pmd_isclear((pmd), L_PMD_SECT_RDONLY))
 #define pmd_dirty(pmd)		(pmd_isset((pmd), L_PMD_SECT_DIRTY))
+#define pud_page(pud)		pmd_page(__pmd(pud_val(pud)))
+#define pud_write(pud)		pmd_write(__pmd(pud_val(pud)))
 
 #define pmd_hugewillfault(pmd)	(!pmd_young(pmd) || !pmd_write(pmd))
 #define pmd_thp_or_huge(pmd)	(pmd_huge(pmd) || pmd_trans_huge(pmd))
@@ -231,6 +233,12 @@ static inline pte_t pte_mkspecial(pte_t pte)
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 #define pmd_trans_huge(pmd)	(pmd_val(pmd) && !pmd_table(pmd))
 #define pmd_trans_splitting(pmd) (pmd_isset((pmd), L_PMD_SECT_SPLITTING))
+
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+#define __HAVE_ARCH_PMDP_SPLITTING_FLUSH
+void pmdp_splitting_flush(struct vm_area_struct *vma, unsigned long address,
+			  pmd_t *pmdp);
+#endif
 #endif
 
 #define PMD_BIT_FUNC(fn,op) \
diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
index 43d54f5..265b836 100644
--- a/arch/arm/mm/flush.c
+++ b/arch/arm/mm/flush.c
@@ -400,3 +400,18 @@ void __flush_anon_page(struct vm_area_struct *vma, struct page *page, unsigned l
 	 */
 	__cpuc_flush_dcache_area(page_address(page), PAGE_SIZE);
 }
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+void pmdp_splitting_flush(struct vm_area_struct *vma, unsigned long address,
+			  pmd_t *pmdp)
+{
+	pmd_t pmd = pmd_mksplitting(*pmdp);
+	VM_BUG_ON(address & ~PMD_MASK);
+	set_pmd_at(vma->vm_mm, address, pmdp, pmd);
+
+	/* dummy IPI to serialise against fast_gup */
+	kick_all_cpus_sync();
+}
+#endif /* CONFIG_HAVE_RCU_TABLE_FREE */
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATH V2 5/6] arm64: mm: Enable HAVE_RCU_TABLE_FREE logic
@ 2014-08-21 15:43   ` Steve Capper
  0 siblings, 0 replies; 78+ messages in thread
From: Steve Capper @ 2014-08-21 15:43 UTC (permalink / raw)
  To: linux-arm-kernel, catalin.marinas, linux, linux-arch, linux-mm
  Cc: will.deacon, gary.robertson, christoffer.dall, peterz,
	anders.roxell, akpm, dann.frazier, mark.rutland, mgorman,
	Steve Capper

In order to implement fast_get_user_pages we need to ensure that the
page table walker is protected from page table pages being freed from
under it.

This patch enables HAVE_RCU_TABLE_FREE; any page table pages belonging
to address spaces with multiple users will be freed via call_rcu_sched.
This means that disabling interrupts will block the free and protect
the fast gup page walker.
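
As a sketch of the mechanism this option enables (simplified from the
generic implementation in mm/memory.c; not arm64-specific code), the
batched tables are handed to call_rcu_sched, whose callback can only
run once every CPU has been seen with interrupts enabled:

	static void tlb_remove_table_rcu_sketch(struct rcu_head *head)
	{
		struct mmu_table_batch *batch;
		int i;

		batch = container_of(head, struct mmu_table_batch, rcu);

		/* No CPU can still be walking these tables by now. */
		for (i = 0; i < batch->nr; i++)
			__tlb_remove_table(batch->tables[i]);

		free_page((unsigned long)batch);
	}

	/* queued from tlb_table_flush(), roughly:
	 *   call_rcu_sched(&batch->rcu, tlb_remove_table_rcu_sketch);
	 */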

Signed-off-by: Steve Capper <steve.capper@linaro.org>
---
 arch/arm64/Kconfig           |  1 +
 arch/arm64/include/asm/tlb.h | 20 +++++++++++++++++---
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index fd4e81a..ce9062b 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -54,6 +54,7 @@ config ARM64
 	select HAVE_PERF_EVENTS
 	select HAVE_PERF_REGS
 	select HAVE_PERF_USER_STACK_DUMP
+	select HAVE_RCU_TABLE_FREE
 	select HAVE_SYSCALL_TRACEPOINTS
 	select IRQ_DOMAIN
 	select MODULES_USE_ELF_RELA
diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
index 62731ef..a82c0c5 100644
--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -23,6 +23,20 @@
 
 #include <asm-generic/tlb.h>
 
+#include <linux/pagemap.h>
+#include <linux/swap.h>
+
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+
+#define tlb_remove_entry(tlb, entry)	tlb_remove_table(tlb, entry)
+static inline void __tlb_remove_table(void *_table)
+{
+	free_page_and_swap_cache((struct page *)_table);
+}
+#else
+#define tlb_remove_entry(tlb, entry)	tlb_remove_page(tlb, entry)
+#endif /* CONFIG_HAVE_RCU_TABLE_FREE */
+
 /*
  * There's three ways the TLB shootdown code is used:
  *  1. Unmapping a range of vmas.  See zap_page_range(), unmap_region().
@@ -88,7 +102,7 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
 {
 	pgtable_page_dtor(pte);
 	tlb_add_flush(tlb, addr);
-	tlb_remove_page(tlb, pte);
+	tlb_remove_entry(tlb, pte);
 }
 
 #if CONFIG_ARM64_PGTABLE_LEVELS > 2
@@ -96,7 +110,7 @@ static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
 				  unsigned long addr)
 {
 	tlb_add_flush(tlb, addr);
-	tlb_remove_page(tlb, virt_to_page(pmdp));
+	tlb_remove_entry(tlb, virt_to_page(pmdp));
 }
 #endif
 
@@ -105,7 +119,7 @@ static inline void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pudp,
 				  unsigned long addr)
 {
 	tlb_add_flush(tlb, addr);
-	tlb_remove_page(tlb, virt_to_page(pudp));
+	tlb_remove_entry(tlb, virt_to_page(pudp));
 }
 #endif
 
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATH V2 6/6] arm64: mm: Enable RCU fast_gup
@ 2014-08-21 15:43   ` Steve Capper
  0 siblings, 0 replies; 78+ messages in thread
From: Steve Capper @ 2014-08-21 15:43 UTC (permalink / raw)
  To: linux-arm-kernel, catalin.marinas, linux, linux-arch, linux-mm
  Cc: will.deacon, gary.robertson, christoffer.dall, peterz,
	anders.roxell, akpm, dann.frazier, mark.rutland, mgorman,
	Steve Capper

Activate the RCU fast_gup for ARM64. We also need to force THP splits
to broadcast an IPI so that we block in the fast_gup page walker. As
THP splits are comparatively rare, this should not lead to a noticeable
performance degradation.

Some pre-requisite functions pud_write and pud_page are also added.
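
To illustrate the intended use (a hypothetical, abridged caller; the
real consumer is gup_huge_pud() in the core gup from patch 1/6):

	static int gup_huge_pud_sketch(pud_t pud, int write,
				       struct page **pages, int *nr)
	{
		/* Refuse a write pin of a read-only huge pud. */
		if (write && !pud_write(pud))
			return 0;

		/* pud_page() yields the head page of the huge mapping;
		 * pfn arithmetic and refcounting are elided here. */
		pages[(*nr)++] = pud_page(pud);
		return 1;
	}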

Signed-off-by: Steve Capper <steve.capper@linaro.org>
---
 arch/arm64/Kconfig               |  3 +++
 arch/arm64/include/asm/pgtable.h | 11 ++++++++++-
 arch/arm64/mm/flush.c            | 15 +++++++++++++++
 3 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index ce9062b..f03273d 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -108,6 +108,9 @@ config GENERIC_CALIBRATE_DELAY
 config ZONE_DMA
 	def_bool y
 
+config HAVE_RCU_GUP
+	def_bool y
+
 config ARCH_DMA_ADDR_T_64BIT
 	def_bool y
 
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index ffe1ba0..f2a48e9 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -256,7 +256,13 @@ static inline pmd_t pte_pmd(pte_t pte)
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 #define pmd_trans_huge(pmd)	(pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT))
 #define pmd_trans_splitting(pmd)	pte_special(pmd_pte(pmd))
-#endif
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+#define __HAVE_ARCH_PMDP_SPLITTING_FLUSH
+struct vm_area_struct;
+void pmdp_splitting_flush(struct vm_area_struct *vma, unsigned long address,
+			  pmd_t *pmdp);
+#endif /* CONFIG_HAVE_RCU_TABLE_FREE */
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 #define pmd_young(pmd)		pte_young(pmd_pte(pmd))
 #define pmd_wrprotect(pmd)	pte_pmd(pte_wrprotect(pmd_pte(pmd)))
@@ -277,6 +283,7 @@ static inline pmd_t pte_pmd(pte_t pte)
 #define mk_pmd(page,prot)	pfn_pmd(page_to_pfn(page),prot)
 
 #define pmd_page(pmd)           pfn_to_page(__phys_to_pfn(pmd_val(pmd) & PHYS_MASK))
+#define pud_write(pud)		pmd_write(__pmd(pud_val(pud)))
 #define pud_pfn(pud)		(((pud_val(pud) & PUD_MASK) & PHYS_MASK) >> PAGE_SHIFT)
 
 #define set_pmd_at(mm, addr, pmdp, pmd)	set_pte_at(mm, addr, (pte_t *)pmdp, pmd_pte(pmd))
@@ -376,6 +383,8 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
 	return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(addr);
 }
 
+#define pud_page(pud)           pmd_page(__pmd(pud_val(pud)))
+
 #endif	/* CONFIG_ARM64_PGTABLE_LEVELS > 2 */
 
 #if CONFIG_ARM64_PGTABLE_LEVELS > 3
diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
index 0d64089..2d5fd47 100644
--- a/arch/arm64/mm/flush.c
+++ b/arch/arm64/mm/flush.c
@@ -104,3 +104,18 @@ EXPORT_SYMBOL(flush_dcache_page);
  */
 EXPORT_SYMBOL(flush_cache_all);
 EXPORT_SYMBOL(flush_icache_range);
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+void pmdp_splitting_flush(struct vm_area_struct *vma, unsigned long address,
+			  pmd_t *pmdp)
+{
+	pmd_t pmd = pmd_mksplitting(*pmdp);
+	VM_BUG_ON(address & ~PMD_MASK);
+	set_pmd_at(vma->vm_mm, address, pmdp, pmd);
+
+	/* dummy IPI to serialise against fast_gup */
+	kick_all_cpus_sync();
+}
+#endif /* CONFIG_HAVE_RCU_TABLE_FREE */
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* Re: [PATH V2 0/6] RCU get_user_pages_fast and __get_user_pages_fast
@ 2014-08-21 20:42   ` Dann Frazier
  0 siblings, 0 replies; 78+ messages in thread
From: Dann Frazier @ 2014-08-21 20:42 UTC (permalink / raw)
  To: Steve Capper
  Cc: linux-arm-kernel, Catalin Marinas, linux, linux-arch, linux-mm,
	Will Deacon, gary.robertson, Christoffer Dall, peterz,
	anders.roxell, akpm, Mark Rutland, mgorman

On Thu, Aug 21, 2014 at 9:43 AM, Steve Capper <steve.capper@linaro.org> wrote:
> Hello,
> This series implements general forms of get_user_pages_fast and
> __get_user_pages_fast and activates them for arm and arm64.
>
> These are required for Transparent HugePages to function correctly, as
> a futex on a THP tail will otherwise result in an infinite loop (due to
> the core implementation of __get_user_pages_fast always returning 0).
>
> Unfortunately, a futex on THP tail can be quite common for certain
> workloads; thus THP is unreliable without a __get_user_pages_fast
> implementation.
>
> This series may also be beneficial for direct-IO heavy workloads and
> certain KVM workloads.
>
> Changes since PATCH V1 are:
>  * Rebase to 3.17-rc1
>  * Switched to kick_all_cpus_sync as suggested by Mark Rutland.
>
> The main changes since RFC V5 are:
>  * Rebased against 3.16-rc1.
>  * pmd_present no longer tested for by gup_huge_pmd and gup_huge_pud,
>    because the entry must be present for these leaf functions to be
>    called.
>  * Rather than assume puds can be re-cast as pmds, a separate
>    function pud_write is instead used by the core gup.
>  * ARM activation logic changed, now it will only activate
>    RCU_TABLE_FREE and RCU_GUP when running with LPAE.
>
> The main changes since RFC V4 are:
>  * corrected the arm64 logic so it now correctly rcu-frees page
>    table backing pages.
>  * rcu free logic relaxed for pre-ARMv7 ARM as we need an IPI to
>    invalidate TLBs anyway.
>  * rebased to 3.15-rc3 (some minor changes were needed to allow it to merge).
>  * dropped Catalin's mmu_gather patch as that's been merged already.
>
> This series has been tested with LTP mm tests and some custom futex tests
> that exacerbate the futex on THP tail case; on both an Arndale board and
> a Juno board. Also debug counters were temporarily employed to ensure that
> the RCU_TABLE_FREE logic was behaving as expected.
>
> I would really appreciate any comments (especially on the validity or
> otherwise of the core fast_gup implementation) and testers.

Continues to get rid of my gccgo hang issue w/ THP.

Tested-by: dann frazier <dann.frazier@canonical.com>

> Cheers,
> --
> Steve
>
> Steve Capper (6):
>   mm: Introduce a general RCU get_user_pages_fast.
>   arm: mm: Introduce special ptes for LPAE
>   arm: mm: Enable HAVE_RCU_TABLE_FREE logic
>   arm: mm: Enable RCU fast_gup
>   arm64: mm: Enable HAVE_RCU_TABLE_FREE logic
>   arm64: mm: Enable RCU fast_gup
>
>  arch/arm/Kconfig                      |   5 +
>  arch/arm/include/asm/pgtable-2level.h |   2 +
>  arch/arm/include/asm/pgtable-3level.h |  15 ++
>  arch/arm/include/asm/pgtable.h        |   6 +-
>  arch/arm/include/asm/tlb.h            |  38 ++++-
>  arch/arm/mm/flush.c                   |  15 ++
>  arch/arm64/Kconfig                    |   4 +
>  arch/arm64/include/asm/pgtable.h      |  11 +-
>  arch/arm64/include/asm/tlb.h          |  20 ++-
>  arch/arm64/mm/flush.c                 |  15 ++
>  mm/Kconfig                            |   3 +
>  mm/gup.c                              | 278 ++++++++++++++++++++++++++++++++++
>  12 files changed, 402 insertions(+), 10 deletions(-)
>
> --
> 1.9.3
>

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATH V2 0/6] RCU get_user_pages_fast and __get_user_pages_fast
@ 2014-08-22  8:11     ` Steve Capper
  0 siblings, 0 replies; 78+ messages in thread
From: Steve Capper @ 2014-08-22  8:11 UTC (permalink / raw)
  To: Dann Frazier
  Cc: linux-arm-kernel, Catalin Marinas, linux, linux-arch, linux-mm,
	Will Deacon, gary.robertson, Christoffer Dall, peterz,
	anders.roxell, akpm, Mark Rutland, mgorman

On Thu, Aug 21, 2014 at 02:42:29PM -0600, Dann Frazier wrote:
> On Thu, Aug 21, 2014 at 9:43 AM, Steve Capper <steve.capper@linaro.org> wrote:
> > Hello,
> > This series implements general forms of get_user_pages_fast and
> > __get_user_pages_fast and activates them for arm and arm64.
> >
> > These are required for Transparent HugePages to function correctly, as
> > a futex on a THP tail will otherwise result in an infinite loop (due to
> > the core implementation of __get_user_pages_fast always returning 0).
> >
> > Unfortunately, a futex on THP tail can be quite common for certain
> > workloads; thus THP is unreliable without a __get_user_pages_fast
> > implementation.
> >
> > This series may also be beneficial for direct-IO heavy workloads and
> > certain KVM workloads.
> >
> > Changes since PATCH V1 are:
> >  * Rebase to 3.17-rc1
> >  * Switched to kick_all_cpus_sync as suggested by Mark Rutland.
> >
> > The main changes since RFC V5 are:
> >  * Rebased against 3.16-rc1.
> >  * pmd_present no longer tested for by gup_huge_pmd and gup_huge_pud,
> >    because the entry must be present for these leaf functions to be
> >    called.
> >  * Rather than assume puds can be re-cast as pmds, a separate
> >    function pud_write is instead used by the core gup.
> >  * ARM activation logic changed, now it will only activate
> >    RCU_TABLE_FREE and RCU_GUP when running with LPAE.
> >
> > The main changes since RFC V4 are:
> >  * corrected the arm64 logic so it now correctly rcu-frees page
> >    table backing pages.
> >  * rcu free logic relaxed for pre-ARMv7 ARM as we need an IPI to
> >    invalidate TLBs anyway.
> >  * rebased to 3.15-rc3 (some minor changes were needed to allow it to merge).
> >  * dropped Catalin's mmu_gather patch as that's been merged already.
> >
> > This series has been tested with LTP mm tests and some custom futex tests
> > that exacerbate the futex on THP tail case; on both an Arndale board and
> > a Juno board. Also debug counters were temporarily employed to ensure that
> > the RCU_TABLE_FREE logic was behaving as expected.
> >
> > I would really appreciate any comments (especially on the validity or
> > otherwise of the core fast_gup implementation) and testers.
> 
> Continues to get rid of my gccgo hang issue w/ THP.
> 
> Tested-by: dann frazier <dann.frazier@canonical.com>
> 

Thanks Dann,
I've added your Tested-by to the mm and two arm64 patches.

Cheers,
-- 
Steve

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATH V2 1/6] mm: Introduce a general RCU get_user_pages_fast.
@ 2014-08-27  8:54     ` Will Deacon
  0 siblings, 0 replies; 78+ messages in thread
From: Will Deacon @ 2014-08-27  8:54 UTC (permalink / raw)
  To: Steve Capper
  Cc: linux-arm-kernel, Catalin Marinas, linux, linux-arch, linux-mm,
	gary.robertson, christoffer.dall, peterz, anders.roxell, akpm,
	dann.frazier, Mark Rutland, mgorman

Hi Steve,

A few minor comments (took me a while to understand how this works, so I
thought I'd make some noise :)

On Thu, Aug 21, 2014 at 04:43:27PM +0100, Steve Capper wrote:
> get_user_pages_fast attempts to pin user pages by walking the page
> tables directly and avoids taking locks. Thus the walker needs to be
> protected from page table pages being freed from under it, and needs
> to block any THP splits.
> 
> One way to achieve this is to have the walker disable interrupts, and
> rely on IPIs from the TLB flushing code blocking before the page table
> pages are freed.
> 
> On some platforms we have hardware broadcast of TLB invalidations, thus
> the TLB flushing code doesn't necessarily need to broadcast IPIs; and
> spuriously broadcasting IPIs can hurt system performance if done too
> often.
> 
> This problem has been solved on PowerPC and Sparc by batching up page
> table pages belonging to more than one mm_user, then scheduling an
> rcu_sched callback to free the pages. This RCU page table free logic
> has been promoted to core code and is activated when one enables
> HAVE_RCU_TABLE_FREE. Unfortunately, these architectures implement
> their own get_user_pages_fast routines.
> 
> The RCU page table free logic coupled with an IPI broadcast on THP
> split (which is a rare event), allows one to protect a page table
> walker by merely disabling the interrupts during the walk.

Disabling interrupts isn't completely free (it's a self-synchronising
operation on ARM). It would be interesting to see if your futex workload
performance is improved by my simple irq_save optimisation for ARM:

  https://git.kernel.org/cgit/linux/kernel/git/will/linux.git/commit/?h=misc-patches&id=312a70adfa6f22e9d62803dd21400f481253e58b

(I've been struggling to show anything other than tiny improvements from
that patch).
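
For reference, the critical section in question, abridged from the
generic __get_user_pages_fast (gup_pgd_range() is a stand-in name for
the open-coded walk loop):

	unsigned long flags;
	int nr = 0;

	local_irq_save(flags);	/* the entry/exit cost at issue */
	/* IRQs off: IPIs, and hence table frees, are held off */
	nr = gup_pgd_range(mm, start, end, write, pages);
	local_irq_restore(flags);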

> This patch provides a general RCU implementation of get_user_pages_fast
> that can be used by architectures that perform hardware broadcast of
> TLB invalidations.
> 
> It is based heavily on the PowerPC implementation by Nick Piggin.

[...]

> diff --git a/mm/gup.c b/mm/gup.c
> index 91d044b..2f684fa 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -10,6 +10,10 @@
>  #include <linux/swap.h>
>  #include <linux/swapops.h>
>  
> +#include <linux/sched.h>
> +#include <linux/rwsem.h>
> +#include <asm/pgtable.h>
> +
>  #include "internal.h"
>  
>  static struct page *no_page_table(struct vm_area_struct *vma,
> @@ -672,3 +676,277 @@ struct page *get_dump_page(unsigned long addr)
>  	return page;
>  }
>  #endif /* CONFIG_ELF_CORE */
> +
> +#ifdef CONFIG_HAVE_RCU_GUP
> +
> +#ifdef __HAVE_ARCH_PTE_SPECIAL

Do we actually require this (pte special) if hugepages are disabled or
not supported?
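
(For context, the pte-level check being discussed, abridged from the
gup_pte_range() this hunk guards -- a sketch, not verbatim:)

	pte_t pte = ACCESS_ONCE(*ptep);

	/*
	 * Special mappings have no struct page to pin, so without
	 * __HAVE_ARCH_PTE_SPECIAL they cannot be rejected locklessly.
	 */
	if (!pte_present(pte) || pte_special(pte) ||
	    (write && !pte_write(pte)))
		goto pte_unmap;	/* fall back to the slow gup path */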

Will

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATH V2 1/6] mm: Introduce a general RCU get_user_pages_fast.
@ 2014-08-27  8:54     ` Will Deacon
  0 siblings, 0 replies; 78+ messages in thread
From: Will Deacon @ 2014-08-27  8:54 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Steve,

A few minor comments (took me a while to understand how this works, so I
thought I'd make some noise :)

On Thu, Aug 21, 2014 at 04:43:27PM +0100, Steve Capper wrote:
> get_user_pages_fast attempts to pin user pages by walking the page
> tables directly and avoids taking locks. Thus the walker needs to be
> protected from page table pages being freed from under it, and needs
> to block any THP splits.
> 
> One way to achieve this is to have the walker disable interrupts, and
> rely on IPIs from the TLB flushing code blocking before the page table
> pages are freed.
> 
> On some platforms we have hardware broadcast of TLB invalidations, thus
> the TLB flushing code doesn't necessarily need to broadcast IPIs; and
> spuriously broadcasting IPIs can hurt system performance if done too
> often.
> 
> This problem has been solved on PowerPC and Sparc by batching up page
> table pages belonging to more than one mm_user, then scheduling an
> rcu_sched callback to free the pages. This RCU page table free logic
> has been promoted to core code and is activated when one enables
> HAVE_RCU_TABLE_FREE. Unfortunately, these architectures implement
> their own get_user_pages_fast routines.
> 
> The RCU page table free logic coupled with a an IPI broadcast on THP
> split (which is a rare event), allows one to protect a page table
> walker by merely disabling the interrupts during the walk.

Disabling interrupts isn't completely free (it's a self-synchronising
operation on ARM). It would be interesting to see if your futex workload
performance is improved by my simple irq_save optimisation for ARM:

  https://git.kernel.org/cgit/linux/kernel/git/will/linux.git/commit/?h=misc-patches&id=312a70adfa6f22e9d62803dd21400f481253e58b

(I've been struggling to show anything other than tiny improvements from
that patch).

> This patch provides a general RCU implementation of get_user_pages_fast
> that can be used by architectures that perform hardware broadcast of
> TLB invalidations.
> 
> It is based heavily on the PowerPC implementation by Nick Piggin.

[...]

> diff --git a/mm/gup.c b/mm/gup.c
> index 91d044b..2f684fa 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -10,6 +10,10 @@
>  #include <linux/swap.h>
>  #include <linux/swapops.h>
>  
> +#include <linux/sched.h>
> +#include <linux/rwsem.h>
> +#include <asm/pgtable.h>
> +
>  #include "internal.h"
>  
>  static struct page *no_page_table(struct vm_area_struct *vma,
> @@ -672,3 +676,277 @@ struct page *get_dump_page(unsigned long addr)
>  	return page;
>  }
>  #endif /* CONFIG_ELF_CORE */
> +
> +#ifdef CONFIG_HAVE_RCU_GUP
> +
> +#ifdef __HAVE_ARCH_PTE_SPECIAL

Do we actually require this (pte special) if hugepages are disabled or
not supported?

Will

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATH V2 2/6] arm: mm: Introduce special ptes for LPAE
  2014-08-21 15:43   ` Steve Capper
@ 2014-08-27 10:46     ` Catalin Marinas
  0 siblings, 0 replies; 78+ messages in thread
From: Catalin Marinas @ 2014-08-27 10:46 UTC (permalink / raw)
  To: Steve Capper
  Cc: linux-arm-kernel, linux, linux-arch, linux-mm, Will Deacon,
	gary.robertson, christoffer.dall, peterz, anders.roxell, akpm,
	dann.frazier, Mark Rutland, mgorman

On Thu, Aug 21, 2014 at 04:43:28PM +0100, Steve Capper wrote:
> We need a mechanism to tag ptes as being special, this indicates that
> no attempt should be made to access the underlying struct page *
> associated with the pte. This is used by the fast_gup when operating on
> ptes as it has no means to access VMAs (that also contain this
> information) locklessly.
> 
> The L_PTE_SPECIAL bit is already allocated for LPAE, this patch modifies
> pte_special and pte_mkspecial to make use of it, and defines
> __HAVE_ARCH_PTE_SPECIAL.
> 
> This patch also excludes special ptes from the icache/dcache sync logic.
> 
> Signed-off-by: Steve Capper <steve.capper@linaro.org>

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
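
For readers following along, the accessors end up looking roughly like
this on LPAE (a sketch; see the patch for the exact form):

  #define __HAVE_ARCH_PTE_SPECIAL

  /* L_PTE_SPECIAL is a software bit available only with LPAE page tables */
  #define pte_special(pte)        (!!(pte_val(pte) & L_PTE_SPECIAL))

  static inline pte_t pte_mkspecial(pte_t pte)
  {
          pte_val(pte) |= L_PTE_SPECIAL;
          return pte;
  }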

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATH V2 5/6] arm64: mm: Enable HAVE_RCU_TABLE_FREE logic
@ 2014-08-27 10:48     ` Catalin Marinas
  0 siblings, 0 replies; 78+ messages in thread
From: Catalin Marinas @ 2014-08-27 10:48 UTC (permalink / raw)
  To: Steve Capper
  Cc: linux-arm-kernel, linux, linux-arch, linux-mm, Will Deacon,
	gary.robertson, christoffer.dall, peterz, anders.roxell, akpm,
	dann.frazier, Mark Rutland, mgorman

On Thu, Aug 21, 2014 at 04:43:31PM +0100, Steve Capper wrote:
> In order to implement fast_get_user_pages we need to ensure that the
> page table walker is protected from page table pages being freed from
> under it.
> 
> This patch enables HAVE_RCU_TABLE_FREE, any page table pages belonging
> to address spaces with multiple users will be call_rcu_sched freed.
> Meaning that disabling interrupts will block the free and protect the
> fast gup page walker.
> 
> Signed-off-by: Steve Capper <steve.capper@linaro.org>

I'm happy to take this patch independently of this series. But if the
whole series goes in via some other tree (mm):

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
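
The enablement itself is essentially a Kconfig select plus the
__tlb_remove_table hook; a sketch of the shape (the arm patch later in
this thread shows the same pattern):

  config ARM64
          ...
          select HAVE_RCU_TABLE_FREE

  /* asm/tlb.h: invoked from the rcu_sched callback to free a table page */
  static inline void __tlb_remove_table(void *_table)
  {
          free_page_and_swap_cache((struct page *)_table);
  }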

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATH V2 6/6] arm64: mm: Enable RCU fast_gup
  2014-08-21 15:43   ` Steve Capper
@ 2014-08-27 11:09     ` Catalin Marinas
  0 siblings, 0 replies; 78+ messages in thread
From: Catalin Marinas @ 2014-08-27 11:09 UTC (permalink / raw)
  To: Steve Capper
  Cc: linux-arm-kernel, linux, linux-arch, linux-mm, Will Deacon,
	gary.robertson, christoffer.dall, peterz, anders.roxell, akpm,
	dann.frazier, Mark Rutland, mgorman

On Thu, Aug 21, 2014 at 04:43:32PM +0100, Steve Capper wrote:
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -256,7 +256,13 @@ static inline pmd_t pte_pmd(pte_t pte)
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  #define pmd_trans_huge(pmd)	(pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT))
>  #define pmd_trans_splitting(pmd)	pte_special(pmd_pte(pmd))
> -#endif
> +#ifdef CONFIG_HAVE_RCU_TABLE_FREE
> +#define __HAVE_ARCH_PMDP_SPLITTING_FLUSH
> +struct vm_area_struct;
> +void pmdp_splitting_flush(struct vm_area_struct *vma, unsigned long address,
> +			  pmd_t *pmdp);
> +#endif /* CONFIG_HAVE_RCU_TABLE_FREE */
> +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>  
>  #define pmd_young(pmd)		pte_young(pmd_pte(pmd))
>  #define pmd_wrprotect(pmd)	pte_pmd(pte_wrprotect(pmd_pte(pmd)))
> @@ -277,6 +283,7 @@ static inline pmd_t pte_pmd(pte_t pte)
>  #define mk_pmd(page,prot)	pfn_pmd(page_to_pfn(page),prot)
>  
>  #define pmd_page(pmd)           pfn_to_page(__phys_to_pfn(pmd_val(pmd) & PHYS_MASK))
> +#define pud_write(pud)		pmd_write(__pmd(pud_val(pud)))
>  #define pud_pfn(pud)		(((pud_val(pud) & PUD_MASK) & PHYS_MASK) >> PAGE_SHIFT)
>  
>  #define set_pmd_at(mm, addr, pmdp, pmd)	set_pte_at(mm, addr, (pte_t *)pmdp, pmd_pte(pmd))
> @@ -376,6 +383,8 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
>  	return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(addr);
>  }
>  
> +#define pud_page(pud)           pmd_page(__pmd(pud_val(pud)))

I think you could define a pud_pte as you've done for pmd. The
conversion would look slightly cleaner. Otherwise:

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
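
Something along these lines, presumably (pud_pte assumed by analogy
with the existing pte_pmd helper):

  static inline pte_t pud_pte(pud_t pud)
  {
          return __pte(pud_val(pud));
  }

  #define pud_write(pud)          pte_write(pud_pte(pud))
  #define pud_page(pud)           pmd_page(pte_pmd(pud_pte(pud)))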

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATH V2 3/6] arm: mm: Enable HAVE_RCU_TABLE_FREE logic
@ 2014-08-27 11:50     ` Catalin Marinas
  0 siblings, 0 replies; 78+ messages in thread
From: Catalin Marinas @ 2014-08-27 11:50 UTC (permalink / raw)
  To: Steve Capper
  Cc: linux-arm-kernel, linux, linux-arch, linux-mm, Will Deacon,
	gary.robertson, christoffer.dall, peterz, anders.roxell, akpm,
	dann.frazier, Mark Rutland, mgorman

On Thu, Aug 21, 2014 at 04:43:29PM +0100, Steve Capper wrote:
> --- a/arch/arm/include/asm/tlb.h
> +++ b/arch/arm/include/asm/tlb.h
> @@ -35,12 +35,39 @@
>  
>  #define MMU_GATHER_BUNDLE	8
>  
> +#ifdef CONFIG_HAVE_RCU_TABLE_FREE
> +static inline void __tlb_remove_table(void *_table)
> +{
> +	free_page_and_swap_cache((struct page *)_table);
> +}
> +
> +struct mmu_table_batch {
> +	struct rcu_head		rcu;
> +	unsigned int		nr;
> +	void			*tables[0];
> +};
> +
> +#define MAX_TABLE_BATCH		\
> +	((PAGE_SIZE - sizeof(struct mmu_table_batch)) / sizeof(void *))
> +
> +extern void tlb_table_flush(struct mmu_gather *tlb);
> +extern void tlb_remove_table(struct mmu_gather *tlb, void *table);
> +
> +#define tlb_remove_entry(tlb, entry)	tlb_remove_table(tlb, entry)
> +#else
> +#define tlb_remove_entry(tlb, entry)	tlb_remove_page(tlb, entry)
> +#endif /* CONFIG_HAVE_RCU_TABLE_FREE */
> +
>  /*
>   * TLB handling.  This allows us to remove pages from the page
>   * tables, and efficiently handle the TLB issues.
>   */
>  struct mmu_gather {
>  	struct mm_struct	*mm;
> +#ifdef CONFIG_HAVE_RCU_TABLE_FREE
> +	struct mmu_table_batch	*batch;
> +	unsigned int		need_flush;
> +#endif

We add need_flush here just because it is set by tlb_remove_table() but
it won't actually be checked by anything since arch/arm uses its own
version of tlb_flush_mmu(). But I wouldn't go for #ifdefs in the core
code either.
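
(For context, the core helper sets the flag unconditionally when it
batches a table page; roughly, from mm/memory.c of this era:

  void tlb_remove_table(struct mmu_gather *tlb, void *table)
  {
          struct mmu_table_batch **batch = &tlb->batch;

          tlb->need_flush = 1;
          /* ... allocate *batch on demand, stash table,
           *     tlb_table_flush() once the batch fills ... */
  }

so an arch that overrides tlb_flush_mmu() must still provide the field.)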

We should (as a separate patchset) convert arch/arm to generic
mmu_gather. I know Russell had objections in the past but mmu_gather has
evolved since and it's no longer inefficient (I think the only case is
shift_arg_pages but that's pretty much lost in the noise).

For this patch:

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATH V2 4/6] arm: mm: Enable RCU fast_gup
@ 2014-08-27 11:51     ` Catalin Marinas
  0 siblings, 0 replies; 78+ messages in thread
From: Catalin Marinas @ 2014-08-27 11:51 UTC (permalink / raw)
  To: Steve Capper
  Cc: linux-arm-kernel, linux, linux-arch, linux-mm, Will Deacon,
	gary.robertson, christoffer.dall, peterz, anders.roxell, akpm,
	dann.frazier, Mark Rutland, mgorman

On Thu, Aug 21, 2014 at 04:43:30PM +0100, Steve Capper wrote:
> Activate the RCU fast_gup for ARM. We also need to force THP splits to
> broadcast an IPI s.t. we block in the fast_gup page walker. As THP
> splits are comparatively rare, this should not lead to a noticeable
> performance degradation.
> 
> Some pre-requisite functions pud_write and pud_page are also added.
> 
> Signed-off-by: Steve Capper <steve.capper@linaro.org>

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
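
The IPI forcing boils down to something like the following sketch
(kick_all_cpus_sync() per the v2 changelog; exact code in the patch):

  void pmdp_splitting_flush(struct vm_area_struct *vma,
                            unsigned long address, pmd_t *pmdp)
  {
          pmd_t pmd = pmd_mksplitting(*pmdp);

          VM_BUG_ON(address & ~PMD_MASK);
          set_pmd_at(vma->vm_mm, address, pmdp, pmd);

          /* dummy IPI: returns only once fast_gup walkers re-enable IRQs */
          kick_all_cpus_sync();
  }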

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATH V2 1/6] mm: Introduce a general RCU get_user_pages_fast.
@ 2014-08-27 12:50       ` Steve Capper
  0 siblings, 0 replies; 78+ messages in thread
From: Steve Capper @ 2014-08-27 12:50 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, Catalin Marinas, linux, linux-arch, linux-mm,
	gary.robertson, christoffer.dall, peterz, anders.roxell, akpm,
	dann.frazier, Mark Rutland, mgorman

On Wed, Aug 27, 2014 at 09:54:42AM +0100, Will Deacon wrote:
> Hi Steve,
> 

Hey Will,

> A few minor comments (took me a while to understand how this works, so I
> thought I'd make some noise :)

A big thank you for reading through it :-).

> 
> On Thu, Aug 21, 2014 at 04:43:27PM +0100, Steve Capper wrote:
> > get_user_pages_fast attempts to pin user pages by walking the page
> > tables directly and avoids taking locks. Thus the walker needs to be
> > protected from page table pages being freed from under it, and needs
> > to block any THP splits.
> > 
> > One way to achieve this is to have the walker disable interrupts, and
> > rely on IPIs from the TLB flushing code blocking before the page table
> > pages are freed.
> > 
> > On some platforms we have hardware broadcast of TLB invalidations, thus
> > the TLB flushing code doesn't necessarily need to broadcast IPIs; and
> > spuriously broadcasting IPIs can hurt system performance if done too
> > often.
> > 
> > This problem has been solved on PowerPC and Sparc by batching up page
> > table pages belonging to more than one mm_user, then scheduling an
> > rcu_sched callback to free the pages. This RCU page table free logic
> > has been promoted to core code and is activated when one enables
> > HAVE_RCU_TABLE_FREE. Unfortunately, these architectures implement
> > their own get_user_pages_fast routines.
> > 
> > The RCU page table free logic coupled with an IPI broadcast on THP
> > split (which is a rare event), allows one to protect a page table
> > walker by merely disabling the interrupts during the walk.
> 
> Disabling interrupts isn't completely free (it's a self-synchronising
> operation on ARM). It would be interesting to see if your futex workload
> performance is improved by my simple irq_save optimisation for ARM:
> 
>   https://git.kernel.org/cgit/linux/kernel/git/will/linux.git/commit/?h=misc-patches&id=312a70adfa6f22e9d62803dd21400f481253e58b
> 
> (I've been struggling to show anything other than tiny improvements from
> that patch).
> 

This looks like a useful optimisation; I'll have a think about workloads that
fire many futexes on THP tails. (The test I used only fired off one futex).
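
For the curious, a hypothetical user-space sketch of such a test.
Assumptions: the mapping actually lands as a 2MiB THP, and the futex op
is non-private so that get_futex_key() takes the gup-fast path:

  #define _GNU_SOURCE
  #include <linux/futex.h>
  #include <stdint.h>
  #include <sys/mman.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  int main(void)
  {
          size_t sz = 4UL << 20;  /* spans at least one 2MiB THP */
          char *p = mmap(NULL, sz, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

          if (p == MAP_FAILED)
                  return 1;
          madvise(p, sz, MADV_HUGEPAGE);
          p[0] = 1;               /* fault in, hopefully as a THP */

          /* a futex word past the head page of the huge mapping */
          uint32_t *uaddr = (uint32_t *)(p + 2 * getpagesize());
          *uaddr = 0;

          /* shared FUTEX_WAIT; the mismatched value makes it return at
           * once, but only after the key lookup has exercised gup */
          syscall(SYS_futex, uaddr, FUTEX_WAIT, 1, NULL, NULL, 0);
          return 0;
  }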

> > This patch provides a general RCU implementation of get_user_pages_fast
> > that can be used by architectures that perform hardware broadcast of
> > TLB invalidations.
> > 
> > It is based heavily on the PowerPC implementation by Nick Piggin.
> 
> [...]
> 
> > diff --git a/mm/gup.c b/mm/gup.c
> > index 91d044b..2f684fa 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -10,6 +10,10 @@
> >  #include <linux/swap.h>
> >  #include <linux/swapops.h>
> >  
> > +#include <linux/sched.h>
> > +#include <linux/rwsem.h>
> > +#include <asm/pgtable.h>
> > +
> >  #include "internal.h"
> >  
> >  static struct page *no_page_table(struct vm_area_struct *vma,
> > @@ -672,3 +676,277 @@ struct page *get_dump_page(unsigned long addr)
> >  	return page;
> >  }
> >  #endif /* CONFIG_ELF_CORE */
> > +
> > +#ifdef CONFIG_HAVE_RCU_GUP
> > +
> > +#ifdef __HAVE_ARCH_PTE_SPECIAL
> 
> Do we actually require this (pte special) if hugepages are disabled or
> not supported?

We need this logic if we want to use fast_gup on normal pages safely. The special
bit indicates that we should not attempt to take a reference to the underlying
page.

Huge pages are guaranteed not to be special.
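
To make that concrete, the per-pte test in the patch's gup_pte_range
amounts to something like this (hypothetical helper, condensing the
actual loop):

  static inline int pte_fast_gup_ok(pte_t pte, int write)
  {
          if (!pte_present(pte) || pte_special(pte))
                  return 0;       /* special: no struct page to ref */
          if (write && !pte_write(pte))
                  return 0;
          return 1;               /* safe to try taking a page reference */
  }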

Cheers,
-- 
Steve

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATH V2 2/6] arm: mm: Introduce special ptes for LPAE
  2014-08-27 10:46     ` Catalin Marinas
@ 2014-08-27 12:52       ` Steve Capper
  0 siblings, 0 replies; 78+ messages in thread
From: Steve Capper @ 2014-08-27 12:52 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: linux-arm-kernel, linux, linux-arch, linux-mm, Will Deacon,
	gary.robertson, christoffer.dall, peterz, anders.roxell, akpm,
	dann.frazier, Mark Rutland, mgorman

On Wed, Aug 27, 2014 at 11:46:53AM +0100, Catalin Marinas wrote:
> On Thu, Aug 21, 2014 at 04:43:28PM +0100, Steve Capper wrote:
> > We need a mechanism to tag ptes as being special, this indicates that
> > no attempt should be made to access the underlying struct page *
> > associated with the pte. This is used by the fast_gup when operating on
> > ptes as it has no means to access VMAs (that also contain this
> > information) locklessly.
> > 
> > The L_PTE_SPECIAL bit is already allocated for LPAE, this patch modifies
> > pte_special and pte_mkspecial to make use of it, and defines
> > __HAVE_ARCH_PTE_SPECIAL.
> > 
> > This patch also excludes special ptes from the icache/dcache sync logic.
> > 
> > Signed-off-by: Steve Capper <steve.capper@linaro.org>
> 
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>

Thanks Catalin,
I've added this to the patch.
-- 
Steve
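
As an aside on the icache/dcache exclusion mentioned in the commit text
above: on ARM it amounts to roughly this in set_pte_at (a sketch; exact
form per the patch):

  static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
                                pte_t *ptep, pte_t pteval)
  {
          unsigned long ext = 0;

          if (addr < TASK_SIZE && pte_valid_user(pteval)) {
                  if (!pte_special(pteval))   /* no backing page to sync */
                          __sync_icache_dcache(pteval);
                  ext |= PTE_EXT_NG;
          }

          set_pte_ext(ptep, pteval, ext);
  }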

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATH V2 3/6] arm: mm: Enable HAVE_RCU_TABLE_FREE logic
@ 2014-08-27 12:59       ` Steve Capper
  0 siblings, 0 replies; 78+ messages in thread
From: Steve Capper @ 2014-08-27 12:59 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: linux-arm-kernel, linux, linux-arch, linux-mm, Will Deacon,
	gary.robertson, christoffer.dall, peterz, anders.roxell, akpm,
	dann.frazier, Mark Rutland, mgorman

On Wed, Aug 27, 2014 at 12:50:10PM +0100, Catalin Marinas wrote:
> On Thu, Aug 21, 2014 at 04:43:29PM +0100, Steve Capper wrote:
> > --- a/arch/arm/include/asm/tlb.h
> > +++ b/arch/arm/include/asm/tlb.h
> > @@ -35,12 +35,39 @@
> >  
> >  #define MMU_GATHER_BUNDLE	8
> >  
> > +#ifdef CONFIG_HAVE_RCU_TABLE_FREE
> > +static inline void __tlb_remove_table(void *_table)
> > +{
> > +	free_page_and_swap_cache((struct page *)_table);
> > +}
> > +
> > +struct mmu_table_batch {
> > +	struct rcu_head		rcu;
> > +	unsigned int		nr;
> > +	void			*tables[0];
> > +};
> > +
> > +#define MAX_TABLE_BATCH		\
> > +	((PAGE_SIZE - sizeof(struct mmu_table_batch)) / sizeof(void *))
> > +
> > +extern void tlb_table_flush(struct mmu_gather *tlb);
> > +extern void tlb_remove_table(struct mmu_gather *tlb, void *table);
> > +
> > +#define tlb_remove_entry(tlb, entry)	tlb_remove_table(tlb, entry)
> > +#else
> > +#define tlb_remove_entry(tlb, entry)	tlb_remove_page(tlb, entry)
> > +#endif /* CONFIG_HAVE_RCU_TABLE_FREE */
> > +
> >  /*
> >   * TLB handling.  This allows us to remove pages from the page
> >   * tables, and efficiently handle the TLB issues.
> >   */
> >  struct mmu_gather {
> >  	struct mm_struct	*mm;
> > +#ifdef CONFIG_HAVE_RCU_TABLE_FREE
> > +	struct mmu_table_batch	*batch;
> > +	unsigned int		need_flush;
> > +#endif
> 
> We add need_flush here just because it is set by tlb_remove_table() but
> it won't actually be checked by anything since arch/arm uses its own
> version of tlb_flush_mmu(). But I wouldn't go for #ifdefs in the core
> code either.
> 
> We should (as a separate patchset) convert arch/arm to generic
> mmu_gather. I know Russell had objections in the past but mmu_gather has
> evolved since and it's no longer inefficient (I think the only case is
> shift_arg_pages but that's pretty much lost in the noise).

I would be happy to help out with a conversion to generic mmu_gather if
it's wanted for arm.

> 
> For this patch:
> 
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>

Cheers.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATH V2 4/6] arm: mm: Enable RCU fast_gup
@ 2014-08-27 13:01       ` Steve Capper
  0 siblings, 0 replies; 78+ messages in thread
From: Steve Capper @ 2014-08-27 13:01 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: linux-arm-kernel, linux, linux-arch, linux-mm, Will Deacon,
	gary.robertson, christoffer.dall, peterz, anders.roxell, akpm,
	dann.frazier, Mark Rutland, mgorman

On Wed, Aug 27, 2014 at 12:51:37PM +0100, Catalin Marinas wrote:
> On Thu, Aug 21, 2014 at 04:43:30PM +0100, Steve Capper wrote:
> > Activate the RCU fast_gup for ARM. We also need to force THP splits to
> > broadcast an IPI s.t. we block in the fast_gup page walker. As THP
> > splits are comparatively rare, this should not lead to a noticeable
> > performance degradation.
> > 
> > Some pre-requisite functions pud_write and pud_page are also added.
> > 
> > Signed-off-by: Steve Capper <steve.capper@linaro.org>
> 
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>

Thanks.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATH V2 5/6] arm64: mm: Enable HAVE_RCU_TABLE_FREE logic
  2014-08-27 10:48     ` Catalin Marinas
@ 2014-08-27 13:08       ` Steve Capper
  0 siblings, 0 replies; 78+ messages in thread
From: Steve Capper @ 2014-08-27 13:08 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: linux-arm-kernel, linux, linux-arch, linux-mm, Will Deacon,
	gary.robertson, christoffer.dall, peterz, anders.roxell, akpm,
	dann.frazier, Mark Rutland, mgorman

On Wed, Aug 27, 2014 at 11:48:41AM +0100, Catalin Marinas wrote:
> On Thu, Aug 21, 2014 at 04:43:31PM +0100, Steve Capper wrote:
> > In order to implement fast_get_user_pages we need to ensure that the
> > page table walker is protected from page table pages being freed from
> > under it.
> > 
> > This patch enables HAVE_RCU_TABLE_FREE, any page table pages belonging
> > to address spaces with multiple users will be call_rcu_sched freed.
> > Meaning that disabling interrupts will block the free and protect the
> > fast gup page walker.
> > 
> > Signed-off-by: Steve Capper <steve.capper@linaro.org>
> 
> I'm happy to take this patch independently of this series. But if the
> whole series goes in via some other tree (mm):
> 
> Acked-by: Catalin Marinas <catalin.marinas@arm.com>

Thanks. If patch #1 looks okay to the mm folks, I'm hoping this patch
can be merged via the same tree.

* Re: [PATH V2 1/6] mm: Introduce a general RCU get_user_pages_fast.
  2014-08-27 12:50       ` Steve Capper
@ 2014-08-27 13:14         ` Will Deacon
  0 siblings, 0 replies; 78+ messages in thread
From: Will Deacon @ 2014-08-27 13:14 UTC (permalink / raw)
  To: Steve Capper
  Cc: linux-arm-kernel, Catalin Marinas, linux, linux-arch, linux-mm,
	gary.robertson, christoffer.dall, peterz, anders.roxell, akpm,
	dann.frazier, Mark Rutland, mgorman

On Wed, Aug 27, 2014 at 01:50:28PM +0100, Steve Capper wrote:
> On Wed, Aug 27, 2014 at 09:54:42AM +0100, Will Deacon wrote:
> > On Thu, Aug 21, 2014 at 04:43:27PM +0100, Steve Capper wrote:
> > > @@ -672,3 +676,277 @@ struct page *get_dump_page(unsigned long addr)
> > >  	return page;
> > >  }
> > >  #endif /* CONFIG_ELF_CORE */
> > > +
> > > +#ifdef CONFIG_HAVE_RCU_GUP
> > > +
> > > +#ifdef __HAVE_ARCH_PTE_SPECIAL
> > 
> > Do we actually require this (pte special) if hugepages are disabled or
> > not supported?
> 
> We need this logic if we want to use fast_gup on normal pages safely. The special
> bit indicates that we should not attempt to take a reference to the underlying
> page.
> 
> Huge pages are guaranteed not to be special.

Gah, I somehow mixed up sp-litting and sp-ecial. Step away from the
computer.

In which case, the patch looks fine. You might need to repost with '[PATCH]'
instead of '[PATH]', in case you confused people's filters.

Will
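
For context, the __HAVE_ARCH_PTE_SPECIAL gate shapes the pte-level walker
roughly like this (a sketch of the hunk under discussion; details may
differ from the posted patch):

#ifdef __HAVE_ARCH_PTE_SPECIAL
static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
			 int write, struct page **pages, int *nr)
{
	pte_t *ptep, *ptem;
	int ret = 0;

	ptem = ptep = pte_offset_map(&pmd, addr);
	do {
		pte_t pte = ACCESS_ONCE(*ptep);
		struct page *page;

		/* The special bit says: never take a page reference. */
		if (!pte_present(pte) || pte_special(pte) ||
		    (write && !pte_write(pte)))
			goto pte_unmap;

		page = pte_page(pte);
		if (!page_cache_get_speculative(page))
			goto pte_unmap;

		/* Re-check the pte now that we hold a reference. */
		if (unlikely(pte_val(pte) != pte_val(*ptep))) {
			put_page(page);
			goto pte_unmap;
		}

		pages[*nr] = page;
		(*nr)++;
	} while (ptep++, addr += PAGE_SIZE, addr != end);

	ret = 1;

pte_unmap:
	pte_unmap(ptem);
	return ret;
}
#else
/*
 * Without pte_special the walker cannot tell which entries are safe
 * to reference, so everything falls back to the slow path.
 */
static inline int gup_pte_range(pmd_t pmd, unsigned long addr,
				unsigned long end, int write,
				struct page **pages, int *nr)
{
	return 0;
}
#endif /* __HAVE_ARCH_PTE_SPECIAL */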

* Re: [PATH V2 6/6] arm64: mm: Enable RCU fast_gup
@ 2014-08-27 13:43       ` Steve Capper
  0 siblings, 0 replies; 78+ messages in thread
From: Steve Capper @ 2014-08-27 13:43 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: linux-arm-kernel, linux, linux-arch, linux-mm, Will Deacon,
	gary.robertson, christoffer.dall, peterz, anders.roxell, akpm,
	dann.frazier, Mark Rutland, mgorman

On Wed, Aug 27, 2014 at 12:09:48PM +0100, Catalin Marinas wrote:
> On Thu, Aug 21, 2014 at 04:43:32PM +0100, Steve Capper wrote:
> > --- a/arch/arm64/include/asm/pgtable.h
> > +++ b/arch/arm64/include/asm/pgtable.h
> > @@ -256,7 +256,13 @@ static inline pmd_t pte_pmd(pte_t pte)
> >  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> >  #define pmd_trans_huge(pmd)	(pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT))
> >  #define pmd_trans_splitting(pmd)	pte_special(pmd_pte(pmd))
> > -#endif
> > +#ifdef CONFIG_HAVE_RCU_TABLE_FREE
> > +#define __HAVE_ARCH_PMDP_SPLITTING_FLUSH
> > +struct vm_area_struct;
> > +void pmdp_splitting_flush(struct vm_area_struct *vma, unsigned long address,
> > +			  pmd_t *pmdp);
> > +#endif /* CONFIG_HAVE_RCU_TABLE_FREE */
> > +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
> >  
> >  #define pmd_young(pmd)		pte_young(pmd_pte(pmd))
> >  #define pmd_wrprotect(pmd)	pte_pmd(pte_wrprotect(pmd_pte(pmd)))
> > @@ -277,6 +283,7 @@ static inline pmd_t pte_pmd(pte_t pte)
> >  #define mk_pmd(page,prot)	pfn_pmd(page_to_pfn(page),prot)
> >  
> >  #define pmd_page(pmd)           pfn_to_page(__phys_to_pfn(pmd_val(pmd) & PHYS_MASK))
> > +#define pud_write(pud)		pmd_write(__pmd(pud_val(pud)))
> >  #define pud_pfn(pud)		(((pud_val(pud) & PUD_MASK) & PHYS_MASK) >> PAGE_SHIFT)
> >  
> >  #define set_pmd_at(mm, addr, pmdp, pmd)	set_pte_at(mm, addr, (pte_t *)pmdp, pmd_pte(pmd))
> > @@ -376,6 +383,8 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
> >  	return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(addr);
> >  }
> >  
> > +#define pud_page(pud)           pmd_page(__pmd(pud_val(pud)))
> 
> I think you could define a pud_pte as you've done for pmd. The
> conversion would look slightly cleaner. Otherwise:

Thanks Catalin,
I've added pud_pte and pud_pmd helpers and that now looks a lot
clearer.

> 
> Acked-by: Catalin Marinas <catalin.marinas@arm.com>

Thanks.
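
A sketch of what such helpers could look like (a hypothetical
reconstruction of the v3 change; names as agreed above):

static inline pte_t pud_pte(pud_t pud)
{
	return __pte(pud_val(pud));
}

static inline pmd_t pud_pmd(pud_t pud)
{
	return __pmd(pud_val(pud));
}

The pud accessors quoted above then become direct conversions:

#define pud_write(pud)		pmd_write(pud_pmd(pud))
#define pud_page(pud)		pmd_page(pud_pmd(pud))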

* Re: [PATH V2 1/6] mm: Introduce a general RCU get_user_pages_fast.
  2014-08-21 15:43   ` Steve Capper
@ 2014-08-27 14:28     ` Catalin Marinas
  0 siblings, 0 replies; 78+ messages in thread
From: Catalin Marinas @ 2014-08-27 14:28 UTC (permalink / raw)
  To: Steve Capper
  Cc: linux-arm-kernel, linux, linux-arch, linux-mm, Will Deacon,
	gary.robertson, christoffer.dall, peterz, anders.roxell, akpm,
	dann.frazier, Mark Rutland, mgorman

On Thu, Aug 21, 2014 at 04:43:27PM +0100, Steve Capper wrote:
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 886db21..6a4d764 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -137,6 +137,9 @@ config HAVE_MEMBLOCK_NODE_MAP
>  config HAVE_MEMBLOCK_PHYS_MAP
>  	boolean
>  
> +config HAVE_RCU_GUP
> +	boolean

Minor detail, maybe HAVE_GENERIC_RCU_GUP to avoid confusion.

Otherwise the patch looks fine to me.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>

* Re: [PATH V2 1/6] mm: Introduce a general RCU get_user_pages_fast.
  2014-08-27 14:28     ` Catalin Marinas
@ 2014-08-27 14:42       ` Steve Capper
  0 siblings, 0 replies; 78+ messages in thread
From: Steve Capper @ 2014-08-27 14:42 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: linux-arm-kernel, linux, linux-arch, linux-mm, Will Deacon,
	gary.robertson, christoffer.dall, peterz, anders.roxell, akpm,
	dann.frazier, Mark Rutland, mgorman

On Wed, Aug 27, 2014 at 03:28:01PM +0100, Catalin Marinas wrote:
> On Thu, Aug 21, 2014 at 04:43:27PM +0100, Steve Capper wrote:
> > diff --git a/mm/Kconfig b/mm/Kconfig
> > index 886db21..6a4d764 100644
> > --- a/mm/Kconfig
> > +++ b/mm/Kconfig
> > @@ -137,6 +137,9 @@ config HAVE_MEMBLOCK_NODE_MAP
> >  config HAVE_MEMBLOCK_PHYS_MAP
> >  	boolean
> >  
> > +config HAVE_RCU_GUP
> > +	boolean
> 
> Minor detail, maybe HAVE_GENERIC_RCU_GUP to avoid confusion.

Yeah, that does look better; I'll amend the series accordingly.

> 
> Otherwise the patch looks fine to me.
> 
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>

Thanks Catalin.
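
For illustration, the renamed symbol would then be selected from each
arch Kconfig rather than set directly (a sketch, assuming the suggested
v3 naming):

# mm/Kconfig: the generic implementation stays invisible and off by default
config HAVE_GENERIC_RCU_GUP
	boolean

# arch/*/Kconfig: an architecture opts in, e.g. for arm64
config ARM64
	select HAVE_GENERIC_RCU_GUP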

* Re: [PATH V2 1/6] mm: Introduce a general RCU get_user_pages_fast.
  2014-08-21 15:43   ` Steve Capper
@ 2014-08-27 15:01     ` Russell King - ARM Linux
  0 siblings, 0 replies; 78+ messages in thread
From: Russell King - ARM Linux @ 2014-08-27 15:01 UTC (permalink / raw)
  To: Steve Capper
  Cc: linux-arm-kernel, catalin.marinas, linux-arch, linux-mm,
	will.deacon, gary.robertson, christoffer.dall, peterz,
	anders.roxell, akpm, dann.frazier, mark.rutland, mgorman

On Thu, Aug 21, 2014 at 04:43:27PM +0100, Steve Capper wrote:
> +int get_user_pages_fast(unsigned long start, int nr_pages, int write,
> +			struct page **pages)
> +{
> +	struct mm_struct *mm = current->mm;
> +	int nr, ret;
> +
> +	start &= PAGE_MASK;
> +	nr = __get_user_pages_fast(start, nr_pages, write, pages);
> +	ret = nr;
> +
> +	if (nr < nr_pages) {
> +		/* Try to get the remaining pages with get_user_pages */
> +		start += nr << PAGE_SHIFT;
> +		pages += nr;

When I read this, my first reaction was... what if nr is negative?  In
that case, if nr_pages is positive, we fall through into this if, and
start to wind things backwards - which isn't what we want.

It looks like that can't happen... right?  __get_user_pages_fast() only
returns greater-or-equal to zero right now, but what about the future?

> +
> +		down_read(&mm->mmap_sem);
> +		ret = get_user_pages(current, mm, start,
> +				     nr_pages - nr, write, 0, pages, NULL);
> +		up_read(&mm->mmap_sem);
> +
> +		/* Have to be a bit careful with return values */
> +		if (nr > 0) {

This kind'a makes it look like nr could be negative.

Other than that, I don't see anything obviously wrong with it.

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.

* Re: [PATH V2 1/6] mm: Introduce a general RCU get_user_pages_fast.
@ 2014-08-28  8:59       ` Steve Capper
  0 siblings, 0 replies; 78+ messages in thread
From: Steve Capper @ 2014-08-28  8:59 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: linux-arm-kernel, catalin.marinas, linux-arch, linux-mm,
	will.deacon, gary.robertson, christoffer.dall, peterz,
	anders.roxell, akpm, dann.frazier, mark.rutland, mgorman

On Wed, Aug 27, 2014 at 04:01:39PM +0100, Russell King - ARM Linux wrote:

Hi Russell,

> On Thu, Aug 21, 2014 at 04:43:27PM +0100, Steve Capper wrote:
> > +int get_user_pages_fast(unsigned long start, int nr_pages, int write,
> > +			struct page **pages)
> > +{
> > +	struct mm_struct *mm = current->mm;
> > +	int nr, ret;
> > +
> > +	start &= PAGE_MASK;
> > +	nr = __get_user_pages_fast(start, nr_pages, write, pages);
> > +	ret = nr;
> > +
> > +	if (nr < nr_pages) {
> > +		/* Try to get the remaining pages with get_user_pages */
> > +		start += nr << PAGE_SHIFT;
> > +		pages += nr;
> 
> When I read this, my first reaction was... what if nr is negative?  In
> that case, if nr_pages is positive, we fall through into this if, and
> start to wind things backwards - which isn't what we want.
> 
> It looks like that can't happen... right?  __get_user_pages_fast() only
> returns greater-or-equal to zero right now, but what about the future?

__get_user_pages_fast is a strict fast path: it will grab as many page
references as it can, and if something gets in its way it backs off. As
it can't take locks, it can't inspect the VMA, so it really isn't in a
position to know if there's an error. It may be possible for the slow
path to take a write fault for a read-only pte, for instance. (We could
in theory return an error on pte_special and save a fallback to the
slow path, but I don't believe it's worth doing, as special ptes should
be encountered very rarely by the fast_gup.)

I think it's safe to assume that __get_user_pages_fast has non-negative
return values; also, it is logically contained in the same area as
get_user_pages_fast, so if this ever changes we can adjust the code
below it accordingly.

get_user_pages_fast attempts the fast path but is allowed to fall back
to the slow path, so it is in a position to return an error code and
thus can return negative values.

> 
> > +
> > +		down_read(&mm->mmap_sem);
> > +		ret = get_user_pages(current, mm, start,
> > +				     nr_pages - nr, write, 0, pages, NULL);
> > +		up_read(&mm->mmap_sem);
> > +
> > +		/* Have to be a bit careful with return values */
> > +		if (nr > 0) {
> 
> This kind'a makes it look like nr could be negative.

I read it as "did the fast path get at least one page?".

> 
> Other than that, I don't see anything obviously wrong with it.

Thank you for giving this a going over.

Cheers,
-- 
Steve
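
To make the return-value contract concrete, the tail of
get_user_pages_fast combines the two paths along the following lines
(a sketch; the surrounding code is quoted above):

		/* Have to be a bit careful with return values */
		if (nr > 0) {
			if (ret < 0)
				/* slow path errored: report the pages
				 * the fast path already pinned */
				ret = nr;
			else
				/* total pages pinned by both paths */
				ret += nr;
		}
	}

	return ret;

So a negative error code from get_user_pages is only propagated when the
fast path pinned nothing; otherwise the caller sees a short, positive
count and can retry the remainder itself.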

end of thread

Thread overview: 78+ messages
2014-08-21 15:43 [PATH V2 0/6] RCU get_user_pages_fast and __get_user_pages_fast Steve Capper
2014-08-21 15:43 ` [PATH V2 1/6] mm: Introduce a general RCU get_user_pages_fast Steve Capper
2014-08-27  8:54   ` Will Deacon
2014-08-27 12:50     ` Steve Capper
2014-08-27 13:14       ` Will Deacon
2014-08-27 14:28   ` Catalin Marinas
2014-08-27 14:42     ` Steve Capper
2014-08-27 15:01   ` Russell King - ARM Linux
2014-08-28  8:59     ` Steve Capper
2014-08-21 15:43 ` [PATH V2 2/6] arm: mm: Introduce special ptes for LPAE Steve Capper
2014-08-27 10:46   ` Catalin Marinas
2014-08-27 12:52     ` Steve Capper
2014-08-21 15:43 ` [PATH V2 3/6] arm: mm: Enable HAVE_RCU_TABLE_FREE logic Steve Capper
2014-08-27 11:50   ` Catalin Marinas
2014-08-27 12:59     ` Steve Capper
2014-08-21 15:43 ` [PATH V2 4/6] arm: mm: Enable RCU fast_gup Steve Capper
2014-08-27 11:51   ` Catalin Marinas
2014-08-27 13:01     ` Steve Capper
2014-08-21 15:43 ` [PATH V2 5/6] arm64: mm: Enable HAVE_RCU_TABLE_FREE logic Steve Capper
2014-08-27 10:48   ` Catalin Marinas
2014-08-27 13:08     ` Steve Capper
2014-08-21 15:43 ` [PATH V2 6/6] arm64: mm: Enable RCU fast_gup Steve Capper
2014-08-27 11:09   ` Catalin Marinas
2014-08-27 13:43     ` Steve Capper
2014-08-21 20:42 ` [PATH V2 0/6] RCU get_user_pages_fast and __get_user_pages_fast Dann Frazier
2014-08-22  8:11   ` Steve Capper
