* [PATCH 0/6] RCU get_user_pages_fast and __get_user_pages_fast
@ 2014-06-25 15:40 ` Steve Capper
From: Steve Capper @ 2014-06-25 15:40 UTC (permalink / raw)
  To: linux-arm-kernel, catalin.marinas, linux, linux-arch, linux-mm
  Cc: will.deacon, gary.robertson, christoffer.dall, peterz,
	anders.roxell, akpm, Steve Capper

Hello,
This series implements general forms of get_user_pages_fast and
__get_user_pages_fast and activates them for arm and arm64.

These are required for Transparent HugePages to function correctly, as
a futex on a THP tail will otherwise result in an infinite loop (due to
the core implementation of __get_user_pages_fast always returning 0).
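
To see where the loop arises: kernel/futex.c retries its fast walk when
it lands on a THP tail page. The following is a paraphrased sketch of
that retry logic (illustrative, not the verbatim futex code); with a
stub __get_user_pages_fast that always returns 0, the else branch is
taken on every pass and the loop never terminates:

	again:
		err = get_user_pages_fast(address, 1, 1, &page);
		...
		if (unlikely(PageTail(page))) {
			put_page(page);
			/* serialise against THP splitting */
			local_irq_disable();
			if (__get_user_pages_fast(address, 1, 1, &page) == 1) {
				/* proceed with the compound head */
			} else {
				local_irq_enable();
				goto again;	/* spins forever on a stub */
			}
		}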

This series may also be beneficial for direct-IO heavy workloads and
certain KVM workloads.

The main changes since RFC V5 are:
 * Rebased against 3.16-rc1.
 * pmd_present is no longer tested by gup_huge_pmd and gup_huge_pud,
   because the entry must be present for these leaf functions to be
   called.
 * Rather than assume puds can be re-cast as pmds, a separate
   pud_write function is used by the core gup.
 * ARM activation logic changed; RCU_TABLE_FREE and RCU_GUP are now
   only activated when running with LPAE.

The main changes since RFC V4 are:
 * corrected the arm64 logic so it now correctly rcu-frees page
   table backing pages.
 * rcu free logic relaxed for pre-ARMv7 ARM as we need an IPI to
   invalidate TLBs anyway.
 * rebased to 3.15-rc3 (some minor changes were needed to allow it to merge).
 * dropped Catalin's mmu_gather patch as that's been merged already.

This series has been tested with LTP and some custom futex tests that
exercise the futex-on-THP-tail case. Debug counters were also
temporarily employed to ensure that the RCU_TABLE_FREE logic was
behaving as expected.

I would really appreciate any testers or comments (especially on the
validity or otherwise of the core fast_gup implementation).

Cheers,
--
Steve

Steve Capper (6):
  mm: Introduce a general RCU get_user_pages_fast.
  arm: mm: Introduce special ptes for LPAE
  arm: mm: Enable HAVE_RCU_TABLE_FREE logic
  arm: mm: Enable RCU fast_gup
  arm64: mm: Enable HAVE_RCU_TABLE_FREE logic
  arm64: mm: Enable RCU fast_gup

 arch/arm/Kconfig                      |   5 +
 arch/arm/include/asm/pgtable-2level.h |   2 +
 arch/arm/include/asm/pgtable-3level.h |  16 ++
 arch/arm/include/asm/pgtable.h        |   6 +-
 arch/arm/include/asm/tlb.h            |  38 ++++-
 arch/arm/mm/flush.c                   |  19 +++
 arch/arm64/Kconfig                    |   4 +
 arch/arm64/include/asm/pgtable.h      |  11 +-
 arch/arm64/include/asm/tlb.h          |  18 ++-
 arch/arm64/mm/flush.c                 |  19 +++
 mm/Kconfig                            |   3 +
 mm/gup.c                              | 278 ++++++++++++++++++++++++++++++++++
 12 files changed, 410 insertions(+), 9 deletions(-)

-- 
1.9.3

* [PATCH 1/6] mm: Introduce a general RCU get_user_pages_fast.
@ 2014-06-25 15:40   ` Steve Capper
From: Steve Capper @ 2014-06-25 15:40 UTC (permalink / raw)
  To: linux-arm-kernel, catalin.marinas, linux, linux-arch, linux-mm
  Cc: will.deacon, gary.robertson, christoffer.dall, peterz,
	anders.roxell, akpm, Steve Capper

get_user_pages_fast attempts to pin user pages by walking the page
tables directly and avoids taking locks. Thus the walker needs to be
protected from page table pages being freed from under it, and needs
to block any THP splits.

One way to achieve this is to have the walker disable interrupts, and
rely on the TLB flushing code waiting for IPI acknowledgement before
the page table pages are freed; a walker with interrupts disabled
thereby holds off the free.

On some platforms we have hardware broadcast of TLB invalidations,
thus the TLB flushing code doesn't necessarily need to broadcast IPIs,
and spuriously broadcasting IPIs can hurt system performance if done
too often.

This problem has been solved on PowerPC and Sparc by batching up page
table pages belonging to address spaces with more than one user, then
scheduling an rcu_sched callback to free the pages. This RCU page
table free logic has been promoted to core code and is activated when
one enables HAVE_RCU_TABLE_FREE. Unfortunately, these architectures
implement their own get_user_pages_fast routines.

The RCU page table free logic, coupled with an IPI broadcast on THP
split (which is a rare event), allows one to protect a page table
walker by merely disabling interrupts during the walk.

This patch provides a general RCU implementation of get_user_pages_fast
that can be used by architectures that perform hardware broadcast of
TLB invalidations.

It is based heavily on the PowerPC implementation by Nick Piggin.
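
For reference, a typical caller pins a user buffer along these lines
(an illustrative usage sketch, not taken from this series; user_addr
stands in for any user virtual address):

	struct page *pages[16];
	int nr;

	/* try to pin up to 16 pages of a user buffer for writing;
	 * get_user_pages_fast falls back to the slow path internally
	 * for whatever the lockless walk could not pin */
	nr = get_user_pages_fast(user_addr, 16, 1, pages);
	if (nr <= 0)
		return -EFAULT;

	/* ... perform I/O on pages[0..nr-1] ... */

	while (nr--)
		put_page(pages[nr]);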

Signed-off-by: Steve Capper <steve.capper@linaro.org>
---
 mm/Kconfig |   3 +
 mm/gup.c   | 278 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 281 insertions(+)

diff --git a/mm/Kconfig b/mm/Kconfig
index 3e9977a..2dabf62 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -137,6 +137,9 @@ config HAVE_MEMBLOCK_NODE_MAP
 config HAVE_MEMBLOCK_PHYS_MAP
 	boolean
 
+config HAVE_RCU_GUP
+	boolean
+
 config ARCH_DISCARD_MEMBLOCK
 	boolean
 
diff --git a/mm/gup.c b/mm/gup.c
index cc5a9e7..4ecef68 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -10,6 +10,10 @@
 #include <linux/swap.h>
 #include <linux/swapops.h>
 
+#include <linux/sched.h>
+#include <linux/rwsem.h>
+#include <asm/pgtable.h>
+
 #include "internal.h"
 
 static struct page *no_page_table(struct vm_area_struct *vma,
@@ -660,3 +664,277 @@ struct page *get_dump_page(unsigned long addr)
 	return page;
 }
 #endif /* CONFIG_ELF_CORE */
+
+#ifdef CONFIG_HAVE_RCU_GUP
+
+#ifdef __HAVE_ARCH_PTE_SPECIAL
+static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
+			 int write, struct page **pages, int *nr)
+{
+	pte_t *ptep, *ptem;
+	int ret = 0;
+
+	ptem = ptep = pte_offset_map(&pmd, addr);
+	do {
+		pte_t pte = ACCESS_ONCE(*ptep);
+		struct page *page;
+
+		if (!pte_present(pte) || pte_special(pte)
+			|| (write && !pte_write(pte)))
+			goto pte_unmap;
+
+		VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
+		page = pte_page(pte);
+
+		if (!page_cache_get_speculative(page))
+			goto pte_unmap;
+
+		if (unlikely(pte_val(pte) != pte_val(*ptep))) {
+			put_page(page);
+			goto pte_unmap;
+		}
+
+		pages[*nr] = page;
+		(*nr)++;
+
+	} while (ptep++, addr += PAGE_SIZE, addr != end);
+
+	ret = 1;
+
+pte_unmap:
+	pte_unmap(ptem);
+	return ret;
+}
+#else
+
+/*
+ * If we can't determine whether or not a pte is special, then fail immediately
+ * for ptes. Note, we can still pin HugeTLB and THP as these are guaranteed not
+ * to be special.
+ */
+static inline int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
+			 int write, struct page **pages, int *nr)
+{
+	return 0;
+}
+#endif /* __HAVE_ARCH_PTE_SPECIAL */
+
+static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
+		unsigned long end, int write, struct page **pages, int *nr)
+{
+	struct page *head, *page, *tail;
+	int refs;
+
+	if (write && !pmd_write(orig))
+		return 0;
+
+	refs = 0;
+	head = pmd_page(orig);
+	page = head + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
+	tail = page;
+	do {
+		VM_BUG_ON(compound_head(page) != head);
+		pages[*nr] = page;
+		(*nr)++;
+		page++;
+		refs++;
+	} while (addr += PAGE_SIZE, addr != end);
+
+	if (!page_cache_add_speculative(head, refs)) {
+		*nr -= refs;
+		return 0;
+	}
+
+	if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) {
+		*nr -= refs;
+		while (refs--)
+			put_page(head);
+		return 0;
+	}
+
+	/*
+	 * Any tail pages need their mapcount reference taken before we
+	 * return. (This allows the THP code to bump their ref count when
+	 * they are split into base pages).
+	 */
+	while (refs--) {
+		if (PageTail(tail))
+			get_huge_page_tail(tail);
+		tail++;
+	}
+
+	return 1;
+}
+
+static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
+		unsigned long end, int write, struct page **pages, int *nr)
+{
+	struct page *head, *page, *tail;
+	int refs;
+
+	if (write && !pud_write(orig))
+		return 0;
+
+	refs = 0;
+	head = pud_page(orig);
+	page = head + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
+	tail = page;
+	do {
+		VM_BUG_ON(compound_head(page) != head);
+		pages[*nr] = page;
+		(*nr)++;
+		page++;
+		refs++;
+	} while (addr += PAGE_SIZE, addr != end);
+
+	if (!page_cache_add_speculative(head, refs)) {
+		*nr -= refs;
+		return 0;
+	}
+
+	if (unlikely(pud_val(orig) != pud_val(*pudp))) {
+		*nr -= refs;
+		while (refs--)
+			put_page(head);
+		return 0;
+	}
+
+	while (refs--) {
+		if (PageTail(tail))
+			get_huge_page_tail(tail);
+		tail++;
+	}
+
+	return 1;
+}
+
+static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
+		int write, struct page **pages, int *nr)
+{
+	unsigned long next;
+	pmd_t *pmdp;
+
+	pmdp = pmd_offset(&pud, addr);
+	do {
+		pmd_t pmd = ACCESS_ONCE(*pmdp);
+		next = pmd_addr_end(addr, end);
+		if (pmd_none(pmd) || pmd_trans_splitting(pmd))
+			return 0;
+
+		if (unlikely(pmd_trans_huge(pmd) || pmd_huge(pmd))) {
+			if (!gup_huge_pmd(pmd, pmdp, addr, next, write,
+				pages, nr))
+				return 0;
+		} else {
+			if (!gup_pte_range(pmd, addr, next, write, pages, nr))
+				return 0;
+		}
+	} while (pmdp++, addr = next, addr != end);
+
+	return 1;
+}
+
+static int gup_pud_range(pgd_t *pgdp, unsigned long addr, unsigned long end,
+		int write, struct page **pages, int *nr)
+{
+	unsigned long next;
+	pud_t *pudp;
+
+	pudp = pud_offset(pgdp, addr);
+	do {
+		pud_t pud = ACCESS_ONCE(*pudp);
+		next = pud_addr_end(addr, end);
+		if (pud_none(pud))
+			return 0;
+		if (pud_huge(pud)) {
+			if (!gup_huge_pud(pud, pudp, addr, next, write,
+					pages, nr))
+				return 0;
+		} else if (!gup_pmd_range(pud, addr, next, write, pages, nr))
+			return 0;
+	} while (pudp++, addr = next, addr != end);
+
+	return 1;
+}
+
+/*
 + * Like get_user_pages_fast() except it's IRQ-safe in that it won't fall
+ * back to the regular GUP.
+ */
+int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
+			  struct page **pages)
+{
+	struct mm_struct *mm = current->mm;
+	unsigned long addr, len, end;
+	unsigned long next, flags;
+	pgd_t *pgdp;
+	int nr = 0;
+
+	start &= PAGE_MASK;
+	addr = start;
+	len = (unsigned long) nr_pages << PAGE_SHIFT;
+	end = start + len;
+
+	if (unlikely(!access_ok(write ? VERIFY_WRITE : VERIFY_READ,
+					start, len)))
+		return 0;
+
+	/*
+	 * Disable interrupts, we use the nested form as we can already
+	 * have interrupts disabled by get_futex_key.
+	 *
+	 * With interrupts disabled, we block page table pages from being
+	 * freed from under us. See mmu_gather_tlb in asm-generic/tlb.h
+	 * for more details.
+	 *
+	 * We do not adopt an rcu_read_lock(.) here as we also want to
+	 * block IPIs that come from THPs splitting.
+	 */
+
+	local_irq_save(flags);
+	pgdp = pgd_offset(mm, addr);
+	do {
+		next = pgd_addr_end(addr, end);
+		if (pgd_none(*pgdp))
+			break;
+		else if (!gup_pud_range(pgdp, addr, next, write, pages, &nr))
+			break;
+	} while (pgdp++, addr = next, addr != end);
+	local_irq_restore(flags);
+
+	return nr;
+}
+
+int get_user_pages_fast(unsigned long start, int nr_pages, int write,
+			struct page **pages)
+{
+	struct mm_struct *mm = current->mm;
+	int nr, ret;
+
+	start &= PAGE_MASK;
+	nr = __get_user_pages_fast(start, nr_pages, write, pages);
+	ret = nr;
+
+	if (nr < nr_pages) {
+		/* Try to get the remaining pages with get_user_pages */
+		start += nr << PAGE_SHIFT;
+		pages += nr;
+
+		down_read(&mm->mmap_sem);
+		ret = get_user_pages(current, mm, start,
+				     nr_pages - nr, write, 0, pages, NULL);
+		up_read(&mm->mmap_sem);
+
+		/* Have to be a bit careful with return values */
+		if (nr > 0) {
+			if (ret < 0)
+				ret = nr;
+			else
+				ret += nr;
+		}
+	}
+
+	return ret;
+}
+
+#endif /* CONFIG_HAVE_RCU_GUP */
-- 
1.9.3

* [PATCH 2/6] arm: mm: Introduce special ptes for LPAE
@ 2014-06-25 15:40   ` Steve Capper
From: Steve Capper @ 2014-06-25 15:40 UTC (permalink / raw)
  To: linux-arm-kernel, catalin.marinas, linux, linux-arch, linux-mm
  Cc: will.deacon, gary.robertson, christoffer.dall, peterz,
	anders.roxell, akpm, Steve Capper

We need a mechanism to tag ptes as special; this indicates that no
attempt should be made to access the underlying struct page *
associated with the pte. This is used by fast_gup when operating on
ptes, as it has no means to access VMAs (which also contain this
information) locklessly.

The L_PTE_SPECIAL bit is already allocated for LPAE, this patch modifies
pte_special and pte_mkspecial to make use of it, and defines
__HAVE_ARCH_PTE_SPECIAL.

This patch also excludes special ptes from the icache/dcache sync logic.
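
For context, raw pfn mappings (e.g. those set up via remap_pfn_range
or vm_insert_pfn) are one source of special ptes. Simplified from
mm/memory.c:insert_pfn():

	/*
	 * There may be no struct page behind a raw pfn, so the pte is
	 * marked special and fast_gup will decline to take a
	 * reference on it.
	 */
	entry = pte_mkspecial(pfn_pte(pfn, prot));
	set_pte_at(mm, addr, ptep, entry);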

Signed-off-by: Steve Capper <steve.capper@linaro.org>
---
 arch/arm/include/asm/pgtable-2level.h | 2 ++
 arch/arm/include/asm/pgtable-3level.h | 8 ++++++++
 arch/arm/include/asm/pgtable.h        | 6 ++----
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h
index 219ac88..f027941 100644
--- a/arch/arm/include/asm/pgtable-2level.h
+++ b/arch/arm/include/asm/pgtable-2level.h
@@ -182,6 +182,8 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
 #define pmd_addr_end(addr,end) (end)
 
 #define set_pte_ext(ptep,pte,ext) cpu_set_pte_ext(ptep,pte,ext)
+#define pte_special(pte)	(0)
+static inline pte_t pte_mkspecial(pte_t pte) { return pte; }
 
 /*
  * We don't have huge page support for short descriptors, for the moment
diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index 85c60ad..b286ba9 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -207,6 +207,14 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
 #define pte_huge(pte)		(pte_val(pte) && !(pte_val(pte) & PTE_TABLE_BIT))
 #define pte_mkhuge(pte)		(__pte(pte_val(pte) & ~PTE_TABLE_BIT))
 
+#define pte_special(pte)	(!!(pte_val(pte) & L_PTE_SPECIAL))
+static inline pte_t pte_mkspecial(pte_t pte)
+{
+	pte_val(pte) |= L_PTE_SPECIAL;
+	return pte;
+}
+#define	__HAVE_ARCH_PTE_SPECIAL
+
 #define pmd_young(pmd)		(pmd_val(pmd) & PMD_SECT_AF)
 
 #define __HAVE_ARCH_PMD_WRITE
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index 5478e5d..63b1db2 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -222,7 +222,6 @@ static inline pte_t *pmd_page_vaddr(pmd_t pmd)
 #define pte_dirty(pte)		(pte_val(pte) & L_PTE_DIRTY)
 #define pte_young(pte)		(pte_val(pte) & L_PTE_YOUNG)
 #define pte_exec(pte)		(!(pte_val(pte) & L_PTE_XN))
-#define pte_special(pte)	(0)
 
 #define pte_valid_user(pte)	\
 	(pte_valid(pte) && (pte_val(pte) & L_PTE_USER) && pte_young(pte))
@@ -241,7 +240,8 @@ static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
 	unsigned long ext = 0;
 
 	if (addr < TASK_SIZE && pte_valid_user(pteval)) {
-		__sync_icache_dcache(pteval);
+		if (!pte_special(pteval))
+			__sync_icache_dcache(pteval);
 		ext |= PTE_EXT_NG;
 	}
 
@@ -260,8 +260,6 @@ PTE_BIT_FUNC(mkyoung,   |= L_PTE_YOUNG);
 PTE_BIT_FUNC(mkexec,   &= ~L_PTE_XN);
 PTE_BIT_FUNC(mknexec,   |= L_PTE_XN);
 
-static inline pte_t pte_mkspecial(pte_t pte) { return pte; }
-
 static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 {
 	const pteval_t mask = L_PTE_XN | L_PTE_RDONLY | L_PTE_USER |
-- 
1.9.3

* [PATCH 3/6] arm: mm: Enable HAVE_RCU_TABLE_FREE logic
@ 2014-06-25 15:40   ` Steve Capper
From: Steve Capper @ 2014-06-25 15:40 UTC (permalink / raw)
  To: linux-arm-kernel, catalin.marinas, linux, linux-arch, linux-mm
  Cc: will.deacon, gary.robertson, christoffer.dall, peterz,
	anders.roxell, akpm, Steve Capper

In order to implement fast_get_user_pages we need to ensure that the
page table walker is protected from page table pages being freed from
under it.

This patch enables HAVE_RCU_TABLE_FREE; any page table pages belonging
to address spaces with multiple users will be freed via
call_rcu_sched. This means that disabling interrupts will block the
free and protect the fast_gup page walker.
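
For reference, the core logic batches the table pages and frees them
only after an rcu_sched grace period, roughly as follows (a simplified
sketch of the HAVE_RCU_TABLE_FREE code in mm/memory.c):

	static void tlb_remove_table_rcu(struct rcu_head *head)
	{
		struct mmu_table_batch *batch =
			container_of(head, struct mmu_table_batch, rcu);
		int i;

		for (i = 0; i < batch->nr; i++)
			__tlb_remove_table(batch->tables[i]);

		free_page((unsigned long)batch);
	}

	void tlb_table_flush(struct mmu_gather *tlb)
	{
		struct mmu_table_batch **batch = &tlb->batch;

		if (*batch) {
			/* a walker running with interrupts disabled
			 * holds off this grace period, and hence the
			 * free of any table page it may be touching */
			call_rcu_sched(&(*batch)->rcu, tlb_remove_table_rcu);
			*batch = NULL;
		}
	}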

Signed-off-by: Steve Capper <steve.capper@linaro.org>
---
 arch/arm/Kconfig           |  1 +
 arch/arm/include/asm/tlb.h | 38 ++++++++++++++++++++++++++++++++++++--
 2 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 245058b..888bc8a 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -59,6 +59,7 @@ config ARM
 	select HAVE_PERF_EVENTS
 	select HAVE_PERF_REGS
 	select HAVE_PERF_USER_STACK_DUMP
+	select HAVE_RCU_TABLE_FREE if (SMP && ARM_LPAE)
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_SYSCALL_TRACEPOINTS
 	select HAVE_UID16
diff --git a/arch/arm/include/asm/tlb.h b/arch/arm/include/asm/tlb.h
index f1a0dac..3cadb72 100644
--- a/arch/arm/include/asm/tlb.h
+++ b/arch/arm/include/asm/tlb.h
@@ -35,12 +35,39 @@
 
 #define MMU_GATHER_BUNDLE	8
 
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+static inline void __tlb_remove_table(void *_table)
+{
+	free_page_and_swap_cache((struct page *)_table);
+}
+
+struct mmu_table_batch {
+	struct rcu_head		rcu;
+	unsigned int		nr;
+	void			*tables[0];
+};
+
+#define MAX_TABLE_BATCH		\
+	((PAGE_SIZE - sizeof(struct mmu_table_batch)) / sizeof(void *))
+
+extern void tlb_table_flush(struct mmu_gather *tlb);
+extern void tlb_remove_table(struct mmu_gather *tlb, void *table);
+
+#define tlb_remove_entry(tlb, entry)	tlb_remove_table(tlb, entry)
+#else
+#define tlb_remove_entry(tlb, entry)	tlb_remove_page(tlb, entry)
+#endif /* CONFIG_HAVE_RCU_TABLE_FREE */
+
 /*
  * TLB handling.  This allows us to remove pages from the page
  * tables, and efficiently handle the TLB issues.
  */
 struct mmu_gather {
 	struct mm_struct	*mm;
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+	struct mmu_table_batch	*batch;
+	unsigned int		need_flush;
+#endif
 	unsigned int		fullmm;
 	struct vm_area_struct	*vma;
 	unsigned long		start, end;
@@ -101,6 +128,9 @@ static inline void __tlb_alloc_page(struct mmu_gather *tlb)
 static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
 {
 	tlb_flush(tlb);
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+	tlb_table_flush(tlb);
+#endif
 }
 
 static inline void tlb_flush_mmu_free(struct mmu_gather *tlb)
@@ -129,6 +159,10 @@ tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned long start
 	tlb->pages = tlb->local;
 	tlb->nr = 0;
 	__tlb_alloc_page(tlb);
+
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+	tlb->batch = NULL;
+#endif
 }
 
 static inline void
@@ -205,7 +239,7 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
 	tlb_add_flush(tlb, addr + SZ_1M);
 #endif
 
-	tlb_remove_page(tlb, pte);
+	tlb_remove_entry(tlb, pte);
 }
 
 static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
@@ -213,7 +247,7 @@ static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
 {
 #ifdef CONFIG_ARM_LPAE
 	tlb_add_flush(tlb, addr);
-	tlb_remove_page(tlb, virt_to_page(pmdp));
+	tlb_remove_entry(tlb, virt_to_page(pmdp));
 #endif
 }
 
-- 
1.9.3

* [PATCH 4/6] arm: mm: Enable RCU fast_gup
@ 2014-06-25 15:40   ` Steve Capper
  -1 siblings, 0 replies; 49+ messages in thread
From: Steve Capper @ 2014-06-25 15:40 UTC (permalink / raw)
  To: linux-arm-kernel, catalin.marinas, linux, linux-arch, linux-mm
  Cc: will.deacon, gary.robertson, christoffer.dall, peterz,
	anders.roxell, akpm, Steve Capper

Activate the RCU fast_gup for ARM. We also need to force THP splits to
broadcast an IPI s.t. we block in the fast_gup page walker. As THP
splits are comparatively rare, this should not lead to a noticeable
performance degradation.

Some pre-requisite functions pud_write and pud_page are also added.
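
The interplay is easiest to see side by side. Below is a simplified
sketch (hypothetical function names; the real walker is the mm/gup.c
code from patch 1): the walker runs with interrupts disabled, so the
broadcast IPI issued by the splitter cannot complete while any CPU is
mid-walk:

	/* walker side: shape of the fast_gup critical section (sketch) */
	static int fast_gup_walk_sketch(unsigned long start, unsigned long end)
	{
		int nr = 0;

		local_irq_disable();	/* holds off the IPI below */
		/* ... walk pgd/pud/pmd/pte, bail out on splitting pmds ... */
		local_irq_enable();

		return nr;
	}

	/* splitter side: mark the pmd, then wait for walkers to drain */
	static void thp_split_sketch(struct vm_area_struct *vma,
				     unsigned long addr, pmd_t *pmdp)
	{
		set_pmd_at(vma->vm_mm, addr, pmdp, pmd_mksplitting(*pmdp));

		/* returns only once every other CPU has taken the IPI */
		smp_call_function(thp_splitting_flush_sync, NULL, 1);
	}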

Signed-off-by: Steve Capper <steve.capper@linaro.org>
---
 arch/arm/Kconfig                      |  4 ++++
 arch/arm/include/asm/pgtable-3level.h |  8 ++++++++
 arch/arm/mm/flush.c                   | 19 +++++++++++++++++++
 3 files changed, 31 insertions(+)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 888bc8a..6d86ff6 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1712,6 +1712,10 @@ config ARCH_SELECT_MEMORY_MODEL
 config HAVE_ARCH_PFN_VALID
 	def_bool ARCH_HAS_HOLES_MEMORYMODEL || !SPARSEMEM
 
+config HAVE_RCU_GUP
+	def_bool y
+	depends on ARM_LPAE
+
 config HIGHMEM
 	bool "High Memory Support"
 	depends on MMU
diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index b286ba9..fa8dcb2 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -219,6 +219,8 @@ static inline pte_t pte_mkspecial(pte_t pte)
 
 #define __HAVE_ARCH_PMD_WRITE
 #define pmd_write(pmd)		(!(pmd_val(pmd) & PMD_SECT_RDONLY))
+#define pud_write(pud)		(0)
+#define pud_page(pud)		pmd_page(__pmd(pud_val(pud)))
 
 #define pmd_hugewillfault(pmd)	(!pmd_young(pmd) || !pmd_write(pmd))
 #define pmd_thp_or_huge(pmd)	(pmd_huge(pmd) || pmd_trans_huge(pmd))
@@ -226,6 +228,12 @@ static inline pte_t pte_mkspecial(pte_t pte)
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 #define pmd_trans_huge(pmd)	(pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT))
 #define pmd_trans_splitting(pmd) (pmd_val(pmd) & PMD_SECT_SPLITTING)
+
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+#define __HAVE_ARCH_PMDP_SPLITTING_FLUSH
+void pmdp_splitting_flush(struct vm_area_struct *vma, unsigned long address,
+			  pmd_t *pmdp);
+#endif
 #endif
 
 #define PMD_BIT_FUNC(fn,op) \
diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
index 43d54f5..9422820 100644
--- a/arch/arm/mm/flush.c
+++ b/arch/arm/mm/flush.c
@@ -400,3 +400,22 @@ void __flush_anon_page(struct vm_area_struct *vma, struct page *page, unsigned l
 	 */
 	__cpuc_flush_dcache_area(page_address(page), PAGE_SIZE);
 }
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+static void thp_splitting_flush_sync(void *arg)
+{
+}
+
+void pmdp_splitting_flush(struct vm_area_struct *vma, unsigned long address,
+			  pmd_t *pmdp)
+{
+	pmd_t pmd = pmd_mksplitting(*pmdp);
+	VM_BUG_ON(address & ~PMD_MASK);
+	set_pmd_at(vma->vm_mm, address, pmdp, pmd);
+
+	/* dummy IPI to serialise against fast_gup */
+	smp_call_function(thp_splitting_flush_sync, NULL, 1);
+}
+#endif /* CONFIG_HAVE_RCU_TABLE_FREE */
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 5/6] arm64: mm: Enable HAVE_RCU_TABLE_FREE logic
@ 2014-06-25 15:40   ` Steve Capper
  0 siblings, 0 replies; 49+ messages in thread
From: Steve Capper @ 2014-06-25 15:40 UTC (permalink / raw)
  To: linux-arm-kernel, catalin.marinas, linux, linux-arch, linux-mm
  Cc: will.deacon, gary.robertson, christoffer.dall, peterz,
	anders.roxell, akpm, Steve Capper

In order to implement fast_get_user_pages we need to ensure that the
page table walker is protected from page table pages being freed from
under it.

This patch enables HAVE_RCU_TABLE_FREE; any page table pages belonging
to address spaces with multiple users will be freed via call_rcu_sched.
This means that disabling interrupts will block the free and protect
the fast gup page walker.
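
The deferral can be sketched as follows (hypothetical helper names for
illustration; the real batching sits behind tlb_remove_table() and
tlb_table_flush()):

	struct deferred_table {
		struct rcu_head	rcu;
		struct page	*page;
	};

	static void rcu_table_free_cb(struct rcu_head *head)
	{
		struct deferred_table *dt =
			container_of(head, struct deferred_table, rcu);

		__free_page(dt->page);	/* only after a grace period */
		kfree(dt);
	}

	static void table_free_deferred(struct deferred_table *dt)
	{
		/*
		 * A sched-RCU grace period cannot elapse while any CPU sits
		 * in an interrupts-off region, so a fast_gup walker that
		 * disabled IRQs before dereferencing the table is safe
		 * until it re-enables them.
		 */
		call_rcu_sched(&dt->rcu, rcu_table_free_cb);
	}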

Signed-off-by: Steve Capper <steve.capper@linaro.org>
---
 arch/arm64/Kconfig           |  1 +
 arch/arm64/include/asm/tlb.h | 18 ++++++++++++++++--
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index a474de34..e1d2eef 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -49,6 +49,7 @@ config ARM64
 	select HAVE_PERF_EVENTS
 	select HAVE_PERF_REGS
 	select HAVE_PERF_USER_STACK_DUMP
+	select HAVE_RCU_TABLE_FREE
 	select HAVE_SYSCALL_TRACEPOINTS
 	select IRQ_DOMAIN
 	select MODULES_USE_ELF_RELA
diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
index 80e2c08..8e4dde5 100644
--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -23,6 +23,20 @@
 
 #include <asm-generic/tlb.h>
 
+#include <linux/pagemap.h>
+#include <linux/swap.h>
+
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+
+#define tlb_remove_entry(tlb, entry)	tlb_remove_table(tlb, entry)
+static inline void __tlb_remove_table(void *_table)
+{
+	free_page_and_swap_cache((struct page *)_table);
+}
+#else
+#define tlb_remove_entry(tlb, entry)	tlb_remove_page(tlb, entry)
+#endif /* CONFIG_HAVE_RCU_TABLE_FREE */
+
 /*
  * There's three ways the TLB shootdown code is used:
  *  1. Unmapping a range of vmas.  See zap_page_range(), unmap_region().
@@ -88,7 +102,7 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
 {
 	pgtable_page_dtor(pte);
 	tlb_add_flush(tlb, addr);
-	tlb_remove_page(tlb, pte);
+	tlb_remove_entry(tlb, pte);
 }
 
 #ifndef CONFIG_ARM64_64K_PAGES
@@ -96,7 +110,7 @@ static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
 				  unsigned long addr)
 {
 	tlb_add_flush(tlb, addr);
-	tlb_remove_page(tlb, virt_to_page(pmdp));
+	tlb_remove_entry(tlb, virt_to_page(pmdp));
 }
 #endif
 
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 6/6] arm64: mm: Enable RCU fast_gup
@ 2014-06-25 15:40   ` Steve Capper
  0 siblings, 0 replies; 49+ messages in thread
From: Steve Capper @ 2014-06-25 15:40 UTC (permalink / raw)
  To: linux-arm-kernel, catalin.marinas, linux, linux-arch, linux-mm
  Cc: will.deacon, gary.robertson, christoffer.dall, peterz,
	anders.roxell, akpm, Steve Capper

Activate the RCU fast_gup for ARM64. We also need to force THP splits
to broadcast an IPI s.t. we block in the fast_gup page walker. As THP
splits are comparatively rare, this should not lead to a noticeable
performance degradation.

Some pre-requisite functions pud_write and pud_page are also added.
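
The core walker consumes the new accessors roughly as below (an assumed
shape for illustration only; the real gup_huge_pud() lives in the
mm/gup.c added by patch 1):

	static int gup_huge_pud_sketch(pud_t pud, unsigned long addr,
				       unsigned long end, int write,
				       struct page **pages, int *nr)
	{
		struct page *head;

		if (write && !pud_write(pud))	/* new accessor */
			return 0;

		head = pud_page(pud);		/* new accessor */
		/* ... pin the head and tail pages covering addr..end ... */
		return 1;
	}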

Signed-off-by: Steve Capper <steve.capper@linaro.org>
---
 arch/arm64/Kconfig               |  3 +++
 arch/arm64/include/asm/pgtable.h | 11 ++++++++++-
 arch/arm64/mm/flush.c            | 19 +++++++++++++++++++
 3 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index e1d2eef..d6fcb8e 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -102,6 +102,9 @@ config GENERIC_CALIBRATE_DELAY
 config ZONE_DMA
 	def_bool y
 
+config HAVE_RCU_GUP
+	def_bool y
+
 config ARCH_DMA_ADDR_T_64BIT
 	def_bool y
 
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index e0ccceb..62510f7 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -237,7 +237,13 @@ static inline pmd_t pte_pmd(pte_t pte)
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 #define pmd_trans_huge(pmd)	(pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT))
 #define pmd_trans_splitting(pmd)	pte_special(pmd_pte(pmd))
-#endif
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+#define __HAVE_ARCH_PMDP_SPLITTING_FLUSH
+struct vm_area_struct;
+void pmdp_splitting_flush(struct vm_area_struct *vma, unsigned long address,
+			  pmd_t *pmdp);
+#endif /* CONFIG_HAVE_RCU_TABLE_FREE */
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 #define pmd_young(pmd)		pte_young(pmd_pte(pmd))
 #define pmd_wrprotect(pmd)	pte_pmd(pte_wrprotect(pmd_pte(pmd)))
@@ -258,6 +264,7 @@ static inline pmd_t pte_pmd(pte_t pte)
 #define mk_pmd(page,prot)	pfn_pmd(page_to_pfn(page),prot)
 
 #define pmd_page(pmd)           pfn_to_page(__phys_to_pfn(pmd_val(pmd) & PHYS_MASK))
+#define pud_write(pud)		pmd_write(__pmd(pud_val(pud)))
 #define pud_pfn(pud)		(((pud_val(pud) & PUD_MASK) & PHYS_MASK) >> PAGE_SHIFT)
 
 #define set_pmd_at(mm, addr, pmdp, pmd)	set_pte_at(mm, addr, (pte_t *)pmdp, pmd_pte(pmd))
@@ -345,6 +352,8 @@ static inline pmd_t *pud_page_vaddr(pud_t pud)
 	return __va(pud_val(pud) & PHYS_MASK & (s32)PAGE_MASK);
 }
 
+#define pud_page(pud)           pmd_page(__pmd(pud_val(pud)))
+
 #endif	/* CONFIG_ARM64_64K_PAGES */
 
 /* to find an entry in a page-table-directory */
diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
index e4193e3..ddf96c1 100644
--- a/arch/arm64/mm/flush.c
+++ b/arch/arm64/mm/flush.c
@@ -103,3 +103,22 @@ EXPORT_SYMBOL(flush_dcache_page);
  */
 EXPORT_SYMBOL(flush_cache_all);
 EXPORT_SYMBOL(flush_icache_range);
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+static void thp_splitting_flush_sync(void *arg)
+{
+}
+
+void pmdp_splitting_flush(struct vm_area_struct *vma, unsigned long address,
+			  pmd_t *pmdp)
+{
+	pmd_t pmd = pmd_mksplitting(*pmdp);
+	VM_BUG_ON(address & ~PMD_MASK);
+	set_pmd_at(vma->vm_mm, address, pmdp, pmd);
+
+	/* dummy IPI to serialise against fast_gup */
+	smp_call_function(thp_splitting_flush_sync, NULL, 1);
+}
+#endif /* CONFIG_HAVE_RCU_TABLE_FREE */
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH 6/6] arm64: mm: Enable RCU fast_gup
@ 2014-06-25 16:50     ` Mark Rutland
  0 siblings, 0 replies; 49+ messages in thread
From: Mark Rutland @ 2014-06-25 16:50 UTC (permalink / raw)
  To: Steve Capper
  Cc: linux-arm-kernel, Catalin Marinas, linux, linux-arch, linux-mm,
	anders.roxell, peterz, gary.robertson, Will Deacon, akpm,
	christoffer.dall

Hi Steve,

On Wed, Jun 25, 2014 at 04:40:24PM +0100, Steve Capper wrote:
> Activate the RCU fast_gup for ARM64. We also need to force THP splits
> to broadcast an IPI s.t. we block in the fast_gup page walker. As THP
> splits are comparatively rare, this should not lead to a noticeable
> performance degradation.
> 
> Some pre-requisite functions pud_write and pud_page are also added.
> 
> Signed-off-by: Steve Capper <steve.capper@linaro.org>
> ---
>  arch/arm64/Kconfig               |  3 +++
>  arch/arm64/include/asm/pgtable.h | 11 ++++++++++-
>  arch/arm64/mm/flush.c            | 19 +++++++++++++++++++
>  3 files changed, 32 insertions(+), 1 deletion(-)

[...]

> diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
> index e4193e3..ddf96c1 100644
> --- a/arch/arm64/mm/flush.c
> +++ b/arch/arm64/mm/flush.c
> @@ -103,3 +103,22 @@ EXPORT_SYMBOL(flush_dcache_page);
>   */
>  EXPORT_SYMBOL(flush_cache_all);
>  EXPORT_SYMBOL(flush_icache_range);
> +
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +#ifdef CONFIG_HAVE_RCU_TABLE_FREE
> +static void thp_splitting_flush_sync(void *arg)
> +{
> +}
> +
> +void pmdp_splitting_flush(struct vm_area_struct *vma, unsigned long address,
> +			  pmd_t *pmdp)
> +{
> +	pmd_t pmd = pmd_mksplitting(*pmdp);
> +	VM_BUG_ON(address & ~PMD_MASK);
> +	set_pmd_at(vma->vm_mm, address, pmdp, pmd);
> +
> +	/* dummy IPI to serialise against fast_gup */
> +	smp_call_function(thp_splitting_flush_sync, NULL, 1);

Is there some reason we can't use kick_all_cpus_sync()?

From a glance it seems that powerpc does just that.
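
For reference, kick_all_cpus_sync() in kernel/smp.c is essentially:

	static void do_nothing(void *unused)
	{
	}

	void kick_all_cpus_sync(void)
	{
		/* Make sure the change is visible before we kick the cpus */
		smp_mb();
		smp_call_function(do_nothing, NULL, 1);
	}

i.e. the same dummy IPI with an extra barrier in front.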

Mark.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/6] RCU get_user_pages_fast and __get_user_pages_fast
@ 2014-06-25 20:42   ` Andrew Morton
  0 siblings, 0 replies; 49+ messages in thread
From: Andrew Morton @ 2014-06-25 20:42 UTC (permalink / raw)
  To: Steve Capper
  Cc: linux-arm-kernel, catalin.marinas, linux, linux-arch, linux-mm,
	will.deacon, gary.robertson, christoffer.dall, peterz,
	anders.roxell

On Wed, 25 Jun 2014 16:40:18 +0100 Steve Capper <steve.capper@linaro.org> wrote:

> This series implements general forms of get_user_pages_fast and
> __get_user_pages_fast and activates them for arm and arm64.

Why not x86?

I think I might have already asked this.  If so, it's your fault for
not updating the changelog ;)


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/6] RCU get_user_pages_fast and __get_user_pages_fast
@ 2014-06-26  7:53     ` Peter Zijlstra
  -1 siblings, 0 replies; 49+ messages in thread
From: Peter Zijlstra @ 2014-06-26  7:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Steve Capper, linux-arm-kernel, catalin.marinas, linux,
	linux-arch, linux-mm, will.deacon, gary.robertson,
	christoffer.dall, anders.roxell

On Wed, Jun 25, 2014 at 01:42:35PM -0700, Andrew Morton wrote:
> On Wed, 25 Jun 2014 16:40:18 +0100 Steve Capper <steve.capper@linaro.org> wrote:
> 
> > This series implements general forms of get_user_pages_fast and
> > __get_user_pages_fast and activates them for arm and arm64.
> 
> Why not x86?
> 
> I think I might have already asked this.  If so, it's your fault for
> not updating the changelog ;)

Because x86 doesn't do RCU freed page tables :-) Also because i386 PAE
has magic (although one might expect ARM PAE -- or whatever they called
it -- to need similar magic).

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 6/6] arm64: mm: Enable RCU fast_gup
@ 2014-06-26  7:56       ` Peter Zijlstra
  0 siblings, 0 replies; 49+ messages in thread
From: Peter Zijlstra @ 2014-06-26  7:56 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Steve Capper, linux-arm-kernel, Catalin Marinas, linux,
	linux-arch, linux-mm, anders.roxell, gary.robertson, Will Deacon,
	akpm, christoffer.dall, Thomas Gleixner

On Wed, Jun 25, 2014 at 05:50:03PM +0100, Mark Rutland wrote:
> Hi Steve,
> 
> On Wed, Jun 25, 2014 at 04:40:24PM +0100, Steve Capper wrote:
> > Activate the RCU fast_gup for ARM64. We also need to force THP splits
> > to broadcast an IPI s.t. we block in the fast_gup page walker. As THP
> > splits are comparatively rare, this should not lead to a noticeable
> > performance degradation.
> > 
> > Some pre-requisite functions pud_write and pud_page are also added.
> > 
> > Signed-off-by: Steve Capper <steve.capper@linaro.org>
> > ---
> >  arch/arm64/Kconfig               |  3 +++
> >  arch/arm64/include/asm/pgtable.h | 11 ++++++++++-
> >  arch/arm64/mm/flush.c            | 19 +++++++++++++++++++
> >  3 files changed, 32 insertions(+), 1 deletion(-)
> 
> [...]
> 
> > diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
> > index e4193e3..ddf96c1 100644
> > --- a/arch/arm64/mm/flush.c
> > +++ b/arch/arm64/mm/flush.c
> > @@ -103,3 +103,22 @@ EXPORT_SYMBOL(flush_dcache_page);
> >   */
> >  EXPORT_SYMBOL(flush_cache_all);
> >  EXPORT_SYMBOL(flush_icache_range);
> > +
> > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> > +#ifdef CONFIG_HAVE_RCU_TABLE_FREE
> > +static void thp_splitting_flush_sync(void *arg)
> > +{
> > +}
> > +
> > +void pmdp_splitting_flush(struct vm_area_struct *vma, unsigned long address,
> > +			  pmd_t *pmdp)
> > +{
> > +	pmd_t pmd = pmd_mksplitting(*pmdp);
> > +	VM_BUG_ON(address & ~PMD_MASK);
> > +	set_pmd_at(vma->vm_mm, address, pmdp, pmd);
> > +
> > +	/* dummy IPI to serialise against fast_gup */
> > +	smp_call_function(thp_splitting_flush_sync, NULL, 1);
> 
> Is there some reason we can't use kick_all_cpus_sync()?

Yes that would be equivalent. But looking at that, I worry about the
smp_mb(); archs are supposed to make sure IPIs are serializing.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 2/6] arm: mm: Introduce special ptes for LPAE
@ 2014-06-27 12:17     ` Will Deacon
  0 siblings, 0 replies; 49+ messages in thread
From: Will Deacon @ 2014-06-27 12:17 UTC (permalink / raw)
  To: Steve Capper
  Cc: linux-arm-kernel, Catalin Marinas, linux, linux-arch, linux-mm,
	gary.robertson, christoffer.dall, peterz, anders.roxell, akpm

On Wed, Jun 25, 2014 at 04:40:20PM +0100, Steve Capper wrote:
> We need a mechanism to tag ptes as being special, this indicates that
> no attempt should be made to access the underlying struct page *
> associated with the pte. This is used by the fast_gup when operating on
> ptes as it has no means to access VMAs (that also contain this
> information) locklessly.
> 
> The L_PTE_SPECIAL bit is already allocated for LPAE, this patch modifies
> pte_special and pte_mkspecial to make use of it, and defines
> __HAVE_ARCH_PTE_SPECIAL.
> 
> This patch also excludes special ptes from the icache/dcache sync logic.
> 
> Signed-off-by: Steve Capper <steve.capper@linaro.org>
> ---
>  arch/arm/include/asm/pgtable-2level.h | 2 ++
>  arch/arm/include/asm/pgtable-3level.h | 8 ++++++++
>  arch/arm/include/asm/pgtable.h        | 6 ++----
>  3 files changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h
> index 219ac88..f027941 100644
> --- a/arch/arm/include/asm/pgtable-2level.h
> +++ b/arch/arm/include/asm/pgtable-2level.h
> @@ -182,6 +182,8 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
>  #define pmd_addr_end(addr,end) (end)
>  
>  #define set_pte_ext(ptep,pte,ext) cpu_set_pte_ext(ptep,pte,ext)
> +#define pte_special(pte)	(0)
> +static inline pte_t pte_mkspecial(pte_t pte) { return pte; }
>  
>  /*
>   * We don't have huge page support for short descriptors, for the moment
> diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
> index 85c60ad..b286ba9 100644
> --- a/arch/arm/include/asm/pgtable-3level.h
> +++ b/arch/arm/include/asm/pgtable-3level.h
> @@ -207,6 +207,14 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
>  #define pte_huge(pte)		(pte_val(pte) && !(pte_val(pte) & PTE_TABLE_BIT))
>  #define pte_mkhuge(pte)		(__pte(pte_val(pte) & ~PTE_TABLE_BIT))
>  
> +#define pte_special(pte)	(!!(pte_val(pte) & L_PTE_SPECIAL))

Why the !!? Also, shouldn't this be rebased on your series adding the
pte_isset macro to ARM?

> +static inline pte_t pte_mkspecial(pte_t pte)
> +{
> +	pte_val(pte) |= L_PTE_SPECIAL;
> +	return pte;
> +}

If you put this in pgtable.h based on #ifdef __HAVE_ARCH_PTE_SPECIAL, then
you can use PTE_BIT_FUNC to avoid reinventing the wheel (or define
L_PTE_SPECIAL as 0 for 2-level and have one function).
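
i.e. something like (untested):

	PTE_BIT_FUNC(mkspecial, |= L_PTE_SPECIAL);

using the existing helper:

	#define PTE_BIT_FUNC(fn,op) \
	static inline pte_t pte_##fn(pte_t pte) { pte_val(pte) op; return pte; }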

Will


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 6/6] arm64: mm: Enable RCU fast_gup
@ 2014-06-27 12:20         ` Will Deacon
  0 siblings, 0 replies; 49+ messages in thread
From: Will Deacon @ 2014-06-27 12:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mark Rutland, Steve Capper, linux-arm-kernel, Catalin Marinas,
	linux, linux-arch, linux-mm, anders.roxell, gary.robertson, akpm,
	christoffer.dall, Thomas Gleixner

On Thu, Jun 26, 2014 at 08:56:05AM +0100, Peter Zijlstra wrote:
> On Wed, Jun 25, 2014 at 05:50:03PM +0100, Mark Rutland wrote:
> > Hi Steve,
> > 
> > On Wed, Jun 25, 2014 at 04:40:24PM +0100, Steve Capper wrote:
> > > Activate the RCU fast_gup for ARM64. We also need to force THP splits
> > > to broadcast an IPI s.t. we block in the fast_gup page walker. As THP
> > > splits are comparatively rare, this should not lead to a noticeable
> > > performance degradation.
> > > 
> > > Some pre-requisite functions pud_write and pud_page are also added.
> > > 
> > > Signed-off-by: Steve Capper <steve.capper@linaro.org>
> > > ---
> > >  arch/arm64/Kconfig               |  3 +++
> > >  arch/arm64/include/asm/pgtable.h | 11 ++++++++++-
> > >  arch/arm64/mm/flush.c            | 19 +++++++++++++++++++
> > >  3 files changed, 32 insertions(+), 1 deletion(-)
> > 
> > [...]
> > 
> > > diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
> > > index e4193e3..ddf96c1 100644
> > > --- a/arch/arm64/mm/flush.c
> > > +++ b/arch/arm64/mm/flush.c
> > > @@ -103,3 +103,22 @@ EXPORT_SYMBOL(flush_dcache_page);
> > >   */
> > >  EXPORT_SYMBOL(flush_cache_all);
> > >  EXPORT_SYMBOL(flush_icache_range);
> > > +
> > > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> > > +#ifdef CONFIG_HAVE_RCU_TABLE_FREE
> > > +static void thp_splitting_flush_sync(void *arg)
> > > +{
> > > +}
> > > +
> > > +void pmdp_splitting_flush(struct vm_area_struct *vma, unsigned long address,
> > > +			  pmd_t *pmdp)
> > > +{
> > > +	pmd_t pmd = pmd_mksplitting(*pmdp);
> > > +	VM_BUG_ON(address & ~PMD_MASK);
> > > +	set_pmd_at(vma->vm_mm, address, pmdp, pmd);
> > > +
> > > +	/* dummy IPI to serialise against fast_gup */
> > > +	smp_call_function(thp_splitting_flush_sync, NULL, 1);
> > 
> > Is there some reason we can't use kick_all_cpus_sync()?
> 
> Yes that would be equivalent. But looking at that, I worry about the
> smp_mb(); archs are supposed to make sure IPIs are serializing.

Agreed; smp_call_function would be hopelessly broken if that wasn't true
(at least, everywhere I've used it ;)
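
For context, kick_all_cpus_sync() in kernel/smp.c is essentially the
same dummy-IPI pattern with an explicit barrier in front; roughly (a
from-memory sketch of the current implementation, so check the tree):

static void do_nothing(void *unused)
{
}

void kick_all_cpus_sync(void)
{
	/* Make sure the change is visible before we kick the cpus */
	smp_mb();
	smp_call_function(do_nothing, NULL, 1);
}

The open question above is only whether that leading smp_mb() is
redundant, given that the IPI itself is supposed to serialise.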

Will

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 2/6] arm: mm: Introduce special ptes for LPAE
@ 2014-06-27 12:44       ` Steve Capper
  0 siblings, 0 replies; 49+ messages in thread
From: Steve Capper @ 2014-06-27 12:44 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, Catalin Marinas, linux, linux-arch, linux-mm,
	gary.robertson, christoffer.dall, peterz, anders.roxell, akpm

On Fri, Jun 27, 2014 at 01:17:21PM +0100, Will Deacon wrote:
> On Wed, Jun 25, 2014 at 04:40:20PM +0100, Steve Capper wrote:
> > We need a mechanism to tag ptes as being special, this indicates that
> > no attempt should be made to access the underlying struct page *
> > associated with the pte. This is used by the fast_gup when operating on
> > ptes as it has no means to access VMAs (that also contain this
> > information) locklessly.
> > 
> > The L_PTE_SPECIAL bit is already allocated for LPAE, this patch modifies
> > pte_special and pte_mkspecial to make use of it, and defines
> > __HAVE_ARCH_PTE_SPECIAL.
> > 
> > This patch also excludes special ptes from the icache/dcache sync logic.
> > 
> > Signed-off-by: Steve Capper <steve.capper@linaro.org>
> > ---
> >  arch/arm/include/asm/pgtable-2level.h | 2 ++
> >  arch/arm/include/asm/pgtable-3level.h | 8 ++++++++
> >  arch/arm/include/asm/pgtable.h        | 6 ++----
> >  3 files changed, 12 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h
> > index 219ac88..f027941 100644
> > --- a/arch/arm/include/asm/pgtable-2level.h
> > +++ b/arch/arm/include/asm/pgtable-2level.h
> > @@ -182,6 +182,8 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
> >  #define pmd_addr_end(addr,end) (end)
> >  
> >  #define set_pte_ext(ptep,pte,ext) cpu_set_pte_ext(ptep,pte,ext)
> > +#define pte_special(pte)	(0)
> > +static inline pte_t pte_mkspecial(pte_t pte) { return pte; }
> >  
> >  /*
> >   * We don't have huge page support for short descriptors, for the moment
> > diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
> > index 85c60ad..b286ba9 100644
> > --- a/arch/arm/include/asm/pgtable-3level.h
> > +++ b/arch/arm/include/asm/pgtable-3level.h
> > @@ -207,6 +207,14 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
> >  #define pte_huge(pte)		(pte_val(pte) && !(pte_val(pte) & PTE_TABLE_BIT))
> >  #define pte_mkhuge(pte)		(__pte(pte_val(pte) & ~PTE_TABLE_BIT))
> >  
> > +#define pte_special(pte)	(!!(pte_val(pte) & L_PTE_SPECIAL))
> 
> Why the !!? Also, shouldn't this be rebased on your series adding the
> pte_isset macro to ARM?

Yes it should; I had kept this series logically separate from the
pte_isset patch. I will make the pte_isset patch a prerequisite of the
ARM fast_gup activation logic.
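
For illustration, after that rebase the 3-level definition should
reduce to something like the following (assuming the pte_isset()
helper from that series; sketch only):

#define pte_special(pte)	(pte_isset((pte), L_PTE_SPECIAL))

with pte_isset() taking care of the !! normalisation for flags that
sit above bit 31 of a 64-bit LPAE pte value.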

> 
> > +static inline pte_t pte_mkspecial(pte_t pte)
> > +{
> > +	pte_val(pte) |= L_PTE_SPECIAL;
> > +	return pte;
> > +}
> 
> If you put this in pgtable.h based on #ifdef __HAVE_ARCH_PTE_SPECIAL, then
> you can use PTE_BIT_FUNC to avoid reinventing the wheel (or define
> L_PTE_SPECIAL as 0 for 2-level and have one function).

Thanks, I'll give this a go.

Cheers,
--
Steve

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/6] RCU get_user_pages_fast and __get_user_pages_fast
  2014-06-25 15:40 ` Steve Capper
@ 2014-08-20 14:56   ` Dann Frazier
  0 siblings, 0 replies; 49+ messages in thread
From: Dann Frazier @ 2014-08-20 14:56 UTC (permalink / raw)
  To: Steve Capper
  Cc: linux-arm-kernel, Catalin Marinas, linux, linux-arch, linux-mm,
	anders.roxell, peterz, gary.robertson, Will Deacon, akpm,
	Christoffer Dall

On Wed, Jun 25, 2014 at 9:40 AM, Steve Capper <steve.capper@linaro.org> wrote:
> Hello,
> This series implements general forms of get_user_pages_fast and
> __get_user_pages_fast and activates them for arm and arm64.
>
> These are required for Transparent HugePages to function correctly, as
> a futex on a THP tail will otherwise result in an infinite loop (due to
> the core implementation of __get_user_pages_fast always returning 0).
>
> This series may also be beneficial for direct-IO heavy workloads and
> certain KVM workloads.
>
> The main changes since RFC V5 are:
>  * Rebased against 3.16-rc1.
>  * pmd_present no longer tested for by gup_huge_pmd and gup_huge_pud,
>    because the entry must be present for these leaf functions to be
>    called.
>  * Rather than assume puds can be re-cast as pmds, a separate
>    function pud_write is instead used by the core gup.
>  * ARM activation logic changed, now it will only activate
>    RCU_TABLE_FREE and RCU_GUP when running with LPAE.
>
> The main changes since RFC V4 are:
>  * corrected the arm64 logic so it now correctly rcu-frees page
>    table backing pages.
>  * rcu free logic relaxed for pre-ARMv7 ARM as we need an IPI to
>    invalidate TLBs anyway.
>  * rebased to 3.15-rc3 (some minor changes were needed to allow it to merge).
>  * dropped Catalin's mmu_gather patch as that's been merged already.
>
> This series has been tested with LTP and some custom futex tests that
> exacerbate the futex on THP tail case. Also debug counters were
> temporarily employed to ensure that the RCU_TABLE_FREE logic was
> behaving as expected.
>
> I would really appreciate any testers or comments (especially on the
> validity or otherwise of the core fast_gup implementation).

I have a test case that reliably hits the THP issue on arm64, on both
3.16 and 3.17-rc1: I do a "juju bootstrap local" w/ THP disabled at
boot, then reboot with THP enabled. At this point you'll see jujud
spin at 200% CPU. gccgo binaries seem to have a knack for hitting it.

I validated that your patches resolve this issue on 3.16, so:

Tested-by: dann frazier <dann.frazier@canonical.com>

I haven't done the same for 3.17-rc1 because the patches no longer
apply cleanly, but I'm happy to test future submissions w/ hopefully a
shorter feedback loop (please add me to the CC). btw, should we
consider something like this until your patches go in?

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index fd4e81a..820e3d9 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -306,6 +306,7 @@ config ARCH_WANT_HUGE_PMD_SHARE

 config HAVE_ARCH_TRANSPARENT_HUGEPAGE
        def_bool y
+       depends on BROKEN

 config ARCH_HAS_CACHE_LINE_SIZE
        def_bool y
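
(For anyone unfamiliar with the idiom: CONFIG_BROKEN has no prompt and
is never set, so "depends on BROKEN" makes the option unselectable; it
is the conventional way to switch a feature off until a fix lands.)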

  -dann

> Cheers,
> --
> Steve
>
> Steve Capper (6):
>   mm: Introduce a general RCU get_user_pages_fast.
>   arm: mm: Introduce special ptes for LPAE
>   arm: mm: Enable HAVE_RCU_TABLE_FREE logic
>   arm: mm: Enable RCU fast_gup
>   arm64: mm: Enable HAVE_RCU_TABLE_FREE logic
>   arm64: mm: Enable RCU fast_gup
>
>  arch/arm/Kconfig                      |   5 +
>  arch/arm/include/asm/pgtable-2level.h |   2 +
>  arch/arm/include/asm/pgtable-3level.h |  16 ++
>  arch/arm/include/asm/pgtable.h        |   6 +-
>  arch/arm/include/asm/tlb.h            |  38 ++++-
>  arch/arm/mm/flush.c                   |  19 +++
>  arch/arm64/Kconfig                    |   4 +
>  arch/arm64/include/asm/pgtable.h      |  11 +-
>  arch/arm64/include/asm/tlb.h          |  18 ++-
>  arch/arm64/mm/flush.c                 |  19 +++
>  mm/Kconfig                            |   3 +
>  mm/gup.c                              | 278 ++++++++++++++++++++++++++++++++++
>  12 files changed, 410 insertions(+), 9 deletions(-)
>
> --
> 1.9.3
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/6] RCU get_user_pages_fast and __get_user_pages_fast
  2014-08-20 14:56   ` Dann Frazier
@ 2014-08-20 15:11     ` Steve Capper
  0 siblings, 0 replies; 49+ messages in thread
From: Steve Capper @ 2014-08-20 15:11 UTC (permalink / raw)
  To: Dann Frazier
  Cc: linux-arm-kernel, Catalin Marinas, linux, linux-arch, linux-mm,
	anders.roxell, peterz, gary.robertson, Will Deacon, akpm,
	Christoffer Dall

On Wed, Aug 20, 2014 at 08:56:09AM -0600, Dann Frazier wrote:
> On Wed, Jun 25, 2014 at 9:40 AM, Steve Capper <steve.capper@linaro.org> wrote:
> > Hello,
> > This series implements general forms of get_user_pages_fast and
> > __get_user_pages_fast and activates them for arm and arm64.
> >
> > These are required for Transparent HugePages to function correctly, as
> > a futex on a THP tail will otherwise result in an infinite loop (due to
> > the core implementation of __get_user_pages_fast always returning 0).
> >
> > This series may also be beneficial for direct-IO heavy workloads and
> > certain KVM workloads.
> >
> > The main changes since RFC V5 are:
> >  * Rebased against 3.16-rc1.
> >  * pmd_present no longer tested for by gup_huge_pmd and gup_huge_pud,
> >    because the entry must be present for these leaf functions to be
> >    called.
> >  * Rather than assume puds can be re-cast as pmds, a separate
> >    function pud_write is instead used by the core gup.
> >  * ARM activation logic changed, now it will only activate
> >    RCU_TABLE_FREE and RCU_GUP when running with LPAE.
> >
> > The main changes since RFC V4 are:
> >  * corrected the arm64 logic so it now correctly rcu-frees page
> >    table backing pages.
> >  * rcu free logic relaxed for pre-ARMv7 ARM as we need an IPI to
> >    invalidate TLBs anyway.
> >  * rebased to 3.15-rc3 (some minor changes were needed to allow it to merge).
> >  * dropped Catalin's mmu_gather patch as that's been merged already.
> >
> > This series has been tested with LTP and some custom futex tests that
> > exacerbate the futex on THP tail case. Also debug counters were
> > temporarily employed to ensure that the RCU_TABLE_FREE logic was
> > behaving as expected.
> >
> > I would really appreciate any testers or comments (especially on the
> > validity or otherwise of the core fast_gup implementation).
> 
> I have a test case that can reliably hit the THP issue on arm64, which
> hits it on both 3.16 and 3.17-rc1. I do a "juju bootstrap local" w/
> THP disabled at boot. Then I reboot with THP enabled. At this point
> you'll see jujud spin at 200% CPU. gccgo binaries seem to have a nack
> for hitting it.
> 
> I validated that your patches resolve this issue on 3.16, so:
> 
> Tested-by: dann frazier <dann.frazier@canonical.com>

Thanks Dann!

> 
> I haven't done the same for 3.17-rc1 because they no longer apply
> cleanly, but I'm happy to test future submissions w/ hopefully a
> shorter feedback loop (please add me to the CC). btw, should we
> consider something like this until your patches go in?

I am about to post the following series and will CC you:
git://git.linaro.org/people/steve.capper/linux.git fast_gup/3.17-rc1
(I've just been giving it a workout on 3.17-rc1.)

I would much prefer the RCU fast_gup to go into 3.18 rather than
marking THP as BROKEN. I am not sure what to do about earlier versions.

Cheers,
-- 
Steve

^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread, other threads:[~2014-08-20 15:11 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-25 15:40 [PATCH 0/6] RCU get_user_pages_fast and __get_user_pages_fast Steve Capper
2014-06-25 15:40 ` [PATCH 1/6] mm: Introduce a general RCU get_user_pages_fast Steve Capper
2014-06-25 15:40 ` [PATCH 2/6] arm: mm: Introduce special ptes for LPAE Steve Capper
2014-06-27 12:17   ` Will Deacon
2014-06-27 12:44     ` Steve Capper
2014-06-25 15:40 ` [PATCH 3/6] arm: mm: Enable HAVE_RCU_TABLE_FREE logic Steve Capper
2014-06-25 15:40 ` [PATCH 4/6] arm: mm: Enable RCU fast_gup Steve Capper
2014-06-25 15:40 ` [PATCH 5/6] arm64: mm: Enable HAVE_RCU_TABLE_FREE logic Steve Capper
2014-06-25 15:40 ` [PATCH 6/6] arm64: mm: Enable RCU fast_gup Steve Capper
2014-06-25 16:50   ` Mark Rutland
2014-06-26  7:56     ` Peter Zijlstra
2014-06-27 12:20       ` Will Deacon
2014-06-25 20:42 ` [PATCH 0/6] RCU get_user_pages_fast and __get_user_pages_fast Andrew Morton
2014-06-26  7:53   ` Peter Zijlstra
2014-08-20 14:56 ` Dann Frazier
2014-08-20 15:11   ` Steve Capper
