* [PATCH, RFC 0/6] Avoid cache trashing on clearing huge/gigantic page
From: Kirill A. Shutemov @ 2012-07-20 12:50 UTC
  To: linux-mm
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Andi Kleen,
	Kirill A. Shutemov, Tim Chen, Alex Shi, Jan Beulich,
	Robert Richter, Andy Lutomirski, Andrew Morton, Andrea Arcangeli,
	Johannes Weiner, Hugh Dickins, KAMEZAWA Hiroyuki, Mel Gorman,
	linux-kernel

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

Clearing a 2MB huge page will typically blow away several levels of CPU
caches.  To avoid this, use a cached clear only for the 4K page around
the fault address, and cache-avoiding (non-temporal) clears for the rest
of the 2MB area.
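
In outline, clear_huge_page() after this series looks roughly like the
sketch below (simplified from patch 5; on architectures that do not
provide a nocache clear, clear_user_highpage_nocache falls back to
clear_user_highpage):

	void clear_huge_page(struct page *page, unsigned long addr,
			     unsigned int pages_per_huge_page)
	{
		unsigned long haddr = addr & HPAGE_PMD_MASK;
		/* index of the 4K page the fault actually hit */
		int target = (addr - haddr) >> PAGE_SHIFT;
		int i;

		might_sleep();
		for (i = 0; i < pages_per_huge_page; i++) {
			cond_resched();
			if (i == target)	/* keep the faulting 4K page hot */
				clear_user_highpage(page + i,
						haddr + i * PAGE_SIZE);
			else			/* bypass the caches elsewhere */
				clear_user_highpage_nocache(page + i,
						haddr + i * PAGE_SIZE);
		}
	}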

It would be nice to test the patchset with more workloads, especially if
you see a performance regression with THP.

Any feedback is appreciated.

Andi Kleen (6):
  THP: Use real address for NUMA policy
  mm: make clear_huge_page tolerate non aligned address
  THP: Pass real, not rounded, address to clear_huge_page
  x86: Add clear_page_nocache
  mm: make clear_huge_page cache clear only around the fault address
  x86: switch the 64bit uncached page clear to SSE/AVX v2

 arch/x86/include/asm/page.h          |    2 +
 arch/x86/include/asm/string_32.h     |    5 ++
 arch/x86/include/asm/string_64.h     |    5 ++
 arch/x86/lib/Makefile                |    1 +
 arch/x86/lib/clear_page_nocache_32.S |   30 +++++++++++
 arch/x86/lib/clear_page_nocache_64.S |   92 ++++++++++++++++++++++++++++++++++
 arch/x86/mm/fault.c                  |    7 +++
 mm/huge_memory.c                     |   17 +++---
 mm/memory.c                          |   29 ++++++++++-
 9 files changed, 178 insertions(+), 10 deletions(-)
 create mode 100644 arch/x86/lib/clear_page_nocache_32.S
 create mode 100644 arch/x86/lib/clear_page_nocache_64.S

-- 
1.7.7.6


* [PATCH, RFC 1/6] THP: Use real address for NUMA policy
From: Kirill A. Shutemov @ 2012-07-20 12:50 UTC
  To: linux-mm
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Andi Kleen,
	Kirill A. Shutemov, Tim Chen, Alex Shi, Jan Beulich,
	Robert Richter, Andy Lutomirski, Andrew Morton, Andrea Arcangeli,
	Johannes Weiner, Hugh Dickins, KAMEZAWA Hiroyuki, Mel Gorman,
	linux-kernel

From: Andi Kleen <ak@linux.intel.com>

Use the fault address, not the rounded-down hpage address, for NUMA
policy purposes. In some circumstances this gives a more exact
NUMA policy.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/huge_memory.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 57c4b93..70737ec 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -681,11 +681,11 @@ static inline gfp_t alloc_hugepage_gfpmask(int defrag, gfp_t extra_gfp)
 
 static inline struct page *alloc_hugepage_vma(int defrag,
 					      struct vm_area_struct *vma,
-					      unsigned long haddr, int nd,
+					      unsigned long address, int nd,
 					      gfp_t extra_gfp)
 {
 	return alloc_pages_vma(alloc_hugepage_gfpmask(defrag, extra_gfp),
-			       HPAGE_PMD_ORDER, vma, haddr, nd);
+			       HPAGE_PMD_ORDER, vma, address, nd);
 }
 
 #ifndef CONFIG_NUMA
@@ -710,7 +710,7 @@ int do_huge_pmd_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		if (unlikely(khugepaged_enter(vma)))
 			return VM_FAULT_OOM;
 		page = alloc_hugepage_vma(transparent_hugepage_defrag(vma),
-					  vma, haddr, numa_node_id(), 0);
+					  vma, address, numa_node_id(), 0);
 		if (unlikely(!page)) {
 			count_vm_event(THP_FAULT_FALLBACK);
 			goto out;
@@ -944,7 +944,7 @@ int do_huge_pmd_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	if (transparent_hugepage_enabled(vma) &&
 	    !transparent_hugepage_debug_cow())
 		new_page = alloc_hugepage_vma(transparent_hugepage_defrag(vma),
-					      vma, haddr, numa_node_id(), 0);
+					      vma, address, numa_node_id(), 0);
 	else
 		new_page = NULL;
 
-- 
1.7.7.6


* [PATCH, RFC 2/6] mm: make clear_huge_page tolerate non aligned address
From: Kirill A. Shutemov @ 2012-07-20 12:50 UTC
  To: linux-mm
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Andi Kleen,
	Kirill A. Shutemov, Tim Chen, Alex Shi, Jan Beulich,
	Robert Richter, Andy Lutomirski, Andrew Morton, Andrea Arcangeli,
	Johannes Weiner, Hugh Dickins, KAMEZAWA Hiroyuki, Mel Gorman,
	linux-kernel

From: Andi Kleen <ak@linux.intel.com>

hugetlb does not necessarily pass in an address that is aligned to the
huge page, so the low-level per-4K address computation is wrong.

This fixes the few architectures (like xtensa/sparc/...?) that actually
use the address to flush the cleared page.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/memory.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 2466d12..c356ead 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3975,16 +3975,17 @@ void clear_huge_page(struct page *page,
 		     unsigned long addr, unsigned int pages_per_huge_page)
 {
 	int i;
+	unsigned long haddr = addr & HPAGE_PMD_MASK;
 
 	if (unlikely(pages_per_huge_page > MAX_ORDER_NR_PAGES)) {
-		clear_gigantic_page(page, addr, pages_per_huge_page);
+		clear_gigantic_page(page, haddr, pages_per_huge_page);
 		return;
 	}
 
 	might_sleep();
 	for (i = 0; i < pages_per_huge_page; i++) {
 		cond_resched();
-		clear_user_highpage(page + i, addr + i * PAGE_SIZE);
+		clear_user_highpage(page + i, haddr + i * PAGE_SIZE);
 	}
 }
 
-- 
1.7.7.6


* [PATCH, RFC 3/6] THP: Pass real, not rounded, address to clear_huge_page
From: Kirill A. Shutemov @ 2012-07-20 12:50 UTC
  To: linux-mm
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Andi Kleen,
	Kirill A. Shutemov, Tim Chen, Alex Shi, Jan Beulich,
	Robert Richter, Andy Lutomirski, Andrew Morton, Andrea Arcangeli,
	Johannes Weiner, Hugh Dickins, KAMEZAWA Hiroyuki, Mel Gorman,
	linux-kernel

From: Andi Kleen <ak@linux.intel.com>

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/huge_memory.c |    9 +++++----
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 70737ec..ecd93f8 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -633,7 +633,8 @@ static inline pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
 
 static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
 					struct vm_area_struct *vma,
-					unsigned long haddr, pmd_t *pmd,
+					unsigned long haddr,
+					unsigned long address, pmd_t *pmd,
 					struct page *page)
 {
 	pgtable_t pgtable;
@@ -643,7 +644,7 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
 	if (unlikely(!pgtable))
 		return VM_FAULT_OOM;
 
-	clear_huge_page(page, haddr, HPAGE_PMD_NR);
+	clear_huge_page(page, address, HPAGE_PMD_NR);
 	__SetPageUptodate(page);
 
 	spin_lock(&mm->page_table_lock);
@@ -720,8 +721,8 @@ int do_huge_pmd_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			put_page(page);
 			goto out;
 		}
-		if (unlikely(__do_huge_pmd_anonymous_page(mm, vma, haddr, pmd,
-							  page))) {
+		if (unlikely(__do_huge_pmd_anonymous_page(mm, vma, haddr,
+						address, pmd, page))) {
 			mem_cgroup_uncharge_page(page);
 			put_page(page);
 			goto out;
-- 
1.7.7.6


* [PATCH, RFC 4/6] x86: Add clear_page_nocache
From: Kirill A. Shutemov @ 2012-07-20 12:50 UTC
  To: linux-mm
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Andi Kleen,
	Kirill A. Shutemov, Tim Chen, Alex Shi, Jan Beulich,
	Robert Richter, Andy Lutomirski, Andrew Morton, Andrea Arcangeli,
	Johannes Weiner, Hugh Dickins, KAMEZAWA Hiroyuki, Mel Gorman,
	linux-kernel

From: Andi Kleen <ak@linux.intel.com>

Add a cache-avoiding version of clear_page: a straightforward integer
variant of the existing 64bit clear_page, for both 32bit and 64bit.

Also add the necessary highmem glue, including a layer that
non-cache-coherent architectures which use the virtual address for
flushing can hook into. This is not needed on x86, of course.
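
As a point of reference, movnti corresponds to the SSE2 streaming-store
intrinsic; the sketch below shows the same cache-bypassing clear in
plain C (illustration only, not part of the patch):

	#include <emmintrin.h>	/* _mm_stream_si32 (SSE2), _mm_sfence (SSE) */
	#include <stddef.h>

	/* Zero one 4K page with non-temporal stores, as the movnti
	 * loops below do. */
	static void clear_page_nocache_sketch(void *page)
	{
		int *p = page;
		size_t i;

		for (i = 0; i < 4096 / sizeof(int); i++)
			_mm_stream_si32(&p[i], 0);
		/* Non-temporal stores are weakly ordered; fence before the
		 * zeroed page may be relied upon by other CPUs. */
		_mm_sfence();
	}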

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/page.h          |    2 ++
 arch/x86/include/asm/string_32.h     |    5 +++++
 arch/x86/include/asm/string_64.h     |    5 +++++
 arch/x86/lib/Makefile                |    1 +
 arch/x86/lib/clear_page_nocache_32.S |   30 ++++++++++++++++++++++++++++++
 arch/x86/lib/clear_page_nocache_64.S |   29 +++++++++++++++++++++++++++++
 arch/x86/mm/fault.c                  |    7 +++++++
 7 files changed, 79 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/lib/clear_page_nocache_32.S
 create mode 100644 arch/x86/lib/clear_page_nocache_64.S

diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h
index 8ca8283..aa83a1b 100644
--- a/arch/x86/include/asm/page.h
+++ b/arch/x86/include/asm/page.h
@@ -29,6 +29,8 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
 	copy_page(to, from);
 }
 
+void clear_user_highpage_nocache(struct page *page, unsigned long vaddr);
+
 #define __alloc_zeroed_user_highpage(movableflags, vma, vaddr) \
 	alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vaddr)
 #define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
diff --git a/arch/x86/include/asm/string_32.h b/arch/x86/include/asm/string_32.h
index 3d3e835..3f2fbcf 100644
--- a/arch/x86/include/asm/string_32.h
+++ b/arch/x86/include/asm/string_32.h
@@ -3,6 +3,8 @@
 
 #ifdef __KERNEL__
 
+#include <linux/linkage.h>
+
 /* Let gcc decide whether to inline or use the out of line functions */
 
 #define __HAVE_ARCH_STRCPY
@@ -337,6 +339,9 @@ void *__constant_c_and_count_memset(void *s, unsigned long pattern,
 #define __HAVE_ARCH_MEMSCAN
 extern void *memscan(void *addr, int c, size_t size);
 
+#define ARCH_HAS_USER_NOCACHE 1
+asmlinkage void clear_page_nocache(void *page);
+
 #endif /* __KERNEL__ */
 
 #endif /* _ASM_X86_STRING_32_H */
diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
index 19e2c46..ca23d1d 100644
--- a/arch/x86/include/asm/string_64.h
+++ b/arch/x86/include/asm/string_64.h
@@ -3,6 +3,8 @@
 
 #ifdef __KERNEL__
 
+#include <linux/linkage.h>
+
 /* Written 2002 by Andi Kleen */
 
 /* Only used for special circumstances. Stolen from i386/string.h */
@@ -63,6 +65,9 @@ char *strcpy(char *dest, const char *src);
 char *strcat(char *dest, const char *src);
 int strcmp(const char *cs, const char *ct);
 
+#define ARCH_HAS_USER_NOCACHE 1
+asmlinkage void clear_page_nocache(void *page);
+
 #endif /* __KERNEL__ */
 
 #endif /* _ASM_X86_STRING_64_H */
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index b00f678..a8ad6dd 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -23,6 +23,7 @@ lib-y += memcpy_$(BITS).o
 lib-$(CONFIG_SMP) += rwlock.o
 lib-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem.o
 lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o
+lib-y += clear_page_nocache_$(BITS).o
 
 obj-y += msr.o msr-reg.o msr-reg-export.o
 
diff --git a/arch/x86/lib/clear_page_nocache_32.S b/arch/x86/lib/clear_page_nocache_32.S
new file mode 100644
index 0000000..2394e0c
--- /dev/null
+++ b/arch/x86/lib/clear_page_nocache_32.S
@@ -0,0 +1,30 @@
+#include <linux/linkage.h>
+#include <asm/dwarf2.h>
+
+/*
+ * Zero a page avoiding the caches
+ * 4(%esp)	page (32bit: argument is passed on the stack)
+ */
+ENTRY(clear_page_nocache)
+	CFI_STARTPROC
+	movl   4(%esp),%edx		/* page to clear */
+	xorl   %eax,%eax
+	movl   $4096/64,%ecx
+	.p2align 4
+.Lloop:
+	decl	%ecx
+#define PUT(x) movnti %eax,x*8(%edx) ; movnti %eax,x*8+4(%edx)
+	PUT(0)
+	PUT(1)
+	PUT(2)
+	PUT(3)
+	PUT(4)
+	PUT(5)
+	PUT(6)
+	PUT(7)
+	lea	64(%edx),%edx
+	jnz	.Lloop
+	nop
+	ret
+	CFI_ENDPROC
+ENDPROC(clear_page_nocache)
diff --git a/arch/x86/lib/clear_page_nocache_64.S b/arch/x86/lib/clear_page_nocache_64.S
new file mode 100644
index 0000000..ee16d15
--- /dev/null
+++ b/arch/x86/lib/clear_page_nocache_64.S
@@ -0,0 +1,29 @@
+#include <linux/linkage.h>
+#include <asm/dwarf2.h>
+
+/*
+ * Zero a page avoiding the caches
+ * rdi	page
+ */
+ENTRY(clear_page_nocache)
+	CFI_STARTPROC
+	xorl   %eax,%eax
+	movl   $4096/64,%ecx
+	.p2align 4
+.Lloop:
+	decl	%ecx
+#define PUT(x) movnti %rax,x*8(%rdi)
+	movnti %rax,(%rdi)
+	PUT(1)
+	PUT(2)
+	PUT(3)
+	PUT(4)
+	PUT(5)
+	PUT(6)
+	PUT(7)
+	leaq	64(%rdi),%rdi
+	jnz	.Lloop
+	nop
+	ret
+	CFI_ENDPROC
+ENDPROC(clear_page_nocache)
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 76dcd9d..20888b4 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1209,3 +1209,10 @@ good_area:
 
 	up_read(&mm->mmap_sem);
 }
+
+void clear_user_highpage_nocache(struct page *page, unsigned long vaddr)
+{
+	void *p = kmap_atomic(page);
+	clear_page_nocache(p);
+	kunmap_atomic(p);
+}
-- 
1.7.7.6


* [PATCH, RFC 5/6] mm: make clear_huge_page cache clear only around the fault address
From: Kirill A. Shutemov @ 2012-07-20 12:50 UTC
  To: linux-mm
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Andi Kleen,
	Kirill A. Shutemov, Tim Chen, Alex Shi, Jan Beulich,
	Robert Richter, Andy Lutomirski, Andrew Morton, Andrea Arcangeli,
	Johannes Weiner, Hugh Dickins, KAMEZAWA Hiroyuki, Mel Gorman,
	linux-kernel

From: Andi Kleen <ak@linux.intel.com>

Clearing a 2MB huge page will typically blow away several levels
of CPU caches. To avoid this, use a cached clear only for the 4K
page around the fault address, and cache-avoiding (non-temporal)
clears for the rest of the 2MB area.

TBD add numbers

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/memory.c |   30 +++++++++++++++++++++++++++---
 1 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index c356ead..b4740cf 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3957,18 +3957,35 @@ EXPORT_SYMBOL(might_fault);
 #endif
 
 #if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS)
+
+#ifndef ARCH_HAS_USER_NOCACHE
+#define ARCH_HAS_USER_NOCACHE 0
+#endif
+
+#if ARCH_HAS_USER_NOCACHE == 0
+#define clear_user_highpage_nocache clear_user_highpage
+#endif
+
 static void clear_gigantic_page(struct page *page,
 				unsigned long addr,
 				unsigned int pages_per_huge_page)
 {
 	int i;
 	struct page *p = page;
+	unsigned long vaddr;
+	unsigned long haddr = addr & HPAGE_PMD_MASK;
+	int target = (addr - haddr) >> PAGE_SHIFT;
 
 	might_sleep();
+	vaddr = haddr;
 	for (i = 0; i < pages_per_huge_page;
 	     i++, p = mem_map_next(p, page, i)) {
 		cond_resched();
-		clear_user_highpage(p, addr + i * PAGE_SIZE);
+		vaddr = haddr + i*PAGE_SIZE;
+		if (!ARCH_HAS_USER_NOCACHE  || i == target)
+			clear_user_highpage(p, vaddr);
+		else
+			clear_user_highpage_nocache(p, vaddr);
 	}
 }
 void clear_huge_page(struct page *page,
@@ -3976,16 +3993,23 @@ void clear_huge_page(struct page *page,
 {
 	int i;
 	unsigned long haddr = addr & HPAGE_PMD_MASK;
+	unsigned long vaddr;
+	int target = (addr - haddr) >> PAGE_SHIFT;
 
 	if (unlikely(pages_per_huge_page > MAX_ORDER_NR_PAGES)) {
-		clear_gigantic_page(page, haddr, pages_per_huge_page);
+		clear_gigantic_page(page, addr, pages_per_huge_page);
 		return;
 	}
 
 	might_sleep();
+	vaddr = haddr;
 	for (i = 0; i < pages_per_huge_page; i++) {
 		cond_resched();
-		clear_user_highpage(page + i, haddr + i * PAGE_SIZE);
+		vaddr = haddr + i*PAGE_SIZE;
+		if (!ARCH_HAS_USER_NOCACHE || i == target)
+			clear_user_highpage(page + i, vaddr);
+		else
+			clear_user_highpage_nocache(page + i, vaddr);
 	}
 }
 
-- 
1.7.7.6


* [PATCH, RFC 6/6] x86: switch the 64bit uncached page clear to SSE/AVX v2
From: Kirill A. Shutemov @ 2012-07-20 12:50 UTC
  To: linux-mm
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Andi Kleen,
	Kirill A. Shutemov, Tim Chen, Alex Shi, Jan Beulich,
	Robert Richter, Andy Lutomirski, Andrew Morton, Andrea Arcangeli,
	Johannes Weiner, Hugh Dickins, KAMEZAWA Hiroyuki, Mel Gorman,
	linux-kernel

From: Andi Kleen <ak@linux.intel.com>

With multiple threads, vector stores are more efficient, so use them.
This makes the page clear run non-preemptible and adds some overhead.
However, on 32bit it was already non-preemptible (due to kmap_atomic),
and there is a preemption opportunity every 4K unit.
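
The constraint comes from the kernel FPU API: SSE/AVX registers may only
be used between kernel_fpu_begin() and kernel_fpu_end(), and that pair
also disables preemption. A sketch of the calling pattern (the assembly
below does the equivalent itself):

	kernel_fpu_begin();		/* save FPU state, disable preemption */
	clear_page_nocache(kaddr);	/* now free to clobber %xmm/%ymm */
	kernel_fpu_end();		/* restore state, reenable preemption */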

On an NPB (NASA Parallel Benchmark) 128GB run on a Westmere this improves
the performance regression of enabling transparent huge pages by ~2%
(from 2.81% to 0.81%), which is now within the run-to-run variability.
On a system with AVX support more is expected.

TBD more tests on other workloads

Signed-off-by: Andi Kleen <ak@linux.intel.com>
[kirill.shutemov@linux.intel.com: Properly save/restore arguments]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/lib/clear_page_nocache_64.S |   91 ++++++++++++++++++++++++++++-----
 1 files changed, 77 insertions(+), 14 deletions(-)

diff --git a/arch/x86/lib/clear_page_nocache_64.S b/arch/x86/lib/clear_page_nocache_64.S
index ee16d15..c092919 100644
--- a/arch/x86/lib/clear_page_nocache_64.S
+++ b/arch/x86/lib/clear_page_nocache_64.S
@@ -1,29 +1,92 @@
+/*
+ * Clear pages with cache bypass.
+ * 
+ * Copyright (C) 2011, 2012 Intel Corporation
+ * Author: Andi Kleen
+ *
+ * This software may be redistributed and/or modified under the terms of
+ * the GNU General Public License ("GPL") version 2 only as published by the
+ * Free Software Foundation.
+ */
+
 #include <linux/linkage.h>
+#include <asm/alternative-asm.h>
+#include <asm/cpufeature.h>
 #include <asm/dwarf2.h>
 
+#define SSE_UNROLL 128
+
 /*
  * Zero a page avoiding the caches
  * rdi	page
  */
 ENTRY(clear_page_nocache)
 	CFI_STARTPROC
-	xorl   %eax,%eax
-	movl   $4096/64,%ecx
+	push   %rdi
+	call   kernel_fpu_begin
+	pop    %rdi
+	sub    $16,%rsp
+	CFI_ADJUST_CFA_OFFSET 16
+	movdqu %xmm0,(%rsp)
+	xorpd  %xmm0,%xmm0
+	movl   $4096/SSE_UNROLL,%ecx
 	.p2align 4
 .Lloop:
 	decl	%ecx
-#define PUT(x) movnti %rax,x*8(%rdi)
-	movnti %rax,(%rdi)
-	PUT(1)
-	PUT(2)
-	PUT(3)
-	PUT(4)
-	PUT(5)
-	PUT(6)
-	PUT(7)
-	leaq	64(%rdi),%rdi
+	.set x,0
+	.rept SSE_UNROLL/16
+	movntdq %xmm0,x(%rdi)
+	.set x,x+16
+	.endr
+	leaq	SSE_UNROLL(%rdi),%rdi
 	jnz	.Lloop
-	nop
-	ret
+	movdqu (%rsp),%xmm0
+	addq   $16,%rsp
+	CFI_ADJUST_CFA_OFFSET -16
+	jmp   kernel_fpu_end
 	CFI_ENDPROC
 ENDPROC(clear_page_nocache)
+
+#ifdef CONFIG_AS_AVX
+
+	.section .altinstr_replacement,"ax"
+1:	.byte 0xeb					/* jmp <disp8> */
+	.byte (clear_page_nocache_avx - clear_page_nocache) - (2f - 1b)
+	/* offset */
+2:
+	.previous
+	.section .altinstructions,"a"
+	altinstruction_entry clear_page_nocache,1b,X86_FEATURE_AVX,\
+	                     16, 2b-1b
+	.previous
+
+#define AVX_UNROLL 256 /* TUNE ME */
+
+ENTRY(clear_page_nocache_avx)
+	CFI_STARTPROC
+	push   %rdi
+	call   kernel_fpu_begin
+	pop    %rdi
+	sub    $32,%rsp
+	CFI_ADJUST_CFA_OFFSET 32
+	vmovdqu %ymm0,(%rsp)
+	vxorpd  %ymm0,%ymm0,%ymm0
+	movl   $4096/AVX_UNROLL,%ecx
+	.p2align 4
+.Lloop_avx:
+	decl	%ecx
+	.set x,0
+	.rept AVX_UNROLL/32
+	vmovntdq %ymm0,x(%rdi)
+	.set x,x+32
+	.endr
+	leaq	AVX_UNROLL(%rdi),%rdi
+	jnz	.Lloop_avx
+	vmovdqu (%rsp),%ymm0
+	addq   $32,%rsp
+	CFI_ADJUST_CFA_OFFSET -32
+	jmp   kernel_fpu_end
+	CFI_ENDPROC
+ENDPROC(clear_page_nocache_avx)
+
+#endif
-- 
1.7.7.6


* Re: [PATCH, RFC 0/6] Avoid cache trashing on clearing huge/gigantic page
From: Andrew Morton @ 2012-07-23 23:30 UTC
  To: Kirill A. Shutemov
  Cc: linux-mm, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Andi Kleen, Tim Chen, Alex Shi, Jan Beulich, Robert Richter,
	Andy Lutomirski, Andrea Arcangeli, Johannes Weiner, Hugh Dickins,
	KAMEZAWA Hiroyuki, Mel Gorman, linux-kernel

On Fri, 20 Jul 2012 15:50:16 +0300
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:

> Clearing a 2MB huge page will typically blow away several levels of CPU
> caches.  To avoid this only cache clear the 4K area around the fault
> address and use a cache avoiding clears for the rest of the 2MB area.
> 
> It would be nice to test the patchset with more workloads. Especially if
> you see performance regression with THP.
> 
> Any feedback is appreciated.

This all looks pretty sane to me.  Some detail-poking from the x86 guys
would be nice.

What do other architectures need to do?  Simply implement
clear_page_nocache()?  I believe that powerpc is one, not sure about
others.  Please update the changelogs to let arch maintainers know
what they should do and cc those people on future versions?


* Re: [PATCH, RFC 0/6] Avoid cache trashing on clearing huge/gigantic page
From: Kirill A. Shutemov @ 2012-07-24 14:09 UTC
  To: Andrew Morton
  Cc: linux-mm, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Andi Kleen, Tim Chen, Alex Shi, Jan Beulich, Robert Richter,
	Andy Lutomirski, Andrea Arcangeli, Johannes Weiner, Hugh Dickins,
	KAMEZAWA Hiroyuki, Mel Gorman, linux-kernel

On Mon, Jul 23, 2012 at 04:30:20PM -0700, Andrew Morton wrote:
> On Fri, 20 Jul 2012 15:50:16 +0300
> "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
> 
> > Clearing a 2MB huge page will typically blow away several levels of CPU
> > caches.  To avoid this only cache clear the 4K area around the fault
> > address and use a cache avoiding clears for the rest of the 2MB area.
> > 
> > It would be nice to test the patchset with more workloads. Especially if
> > you see performance regression with THP.
> > 
> > Any feedback is appreciated.
> 
> This all looks pretty sane to me.  Some detail-poking from the x86 guys
> would be nice.
> 
> What do other architectures need to do?  Simply implement
> clear_page_nocache()?

And define ARCH_HAS_USER_NOCACHE to 1.

> I believe that powerpc is one, not sure about
> others.  Please update the changelogs to let arch maintainers know
> what they should do and cc those people on future versions?

Okay.
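
Putting this answer together with patch 4, the per-architecture opt-in
amounts to roughly the sketch below (names as in patch 4; the highmem
glue mirrors the x86 version):

	/* arch/<arch>/include/asm/string.h */
	#define ARCH_HAS_USER_NOCACHE 1
	asmlinkage void clear_page_nocache(void *page);

	/* arch code: highmem glue, as x86 adds in arch/x86/mm/fault.c */
	void clear_user_highpage_nocache(struct page *page, unsigned long vaddr)
	{
		void *p = kmap_atomic(page);

		clear_page_nocache(p);
		kunmap_atomic(p);
	}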

-- 
 Kirill A. Shutemov

* Re: [PATCH, RFC 0/6] Avoid cache trashing on clearing huge/gigantic page
From: Christoph Lameter @ 2012-07-25 18:51 UTC
  To: Kirill A. Shutemov
  Cc: linux-mm, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Andi Kleen, Tim Chen, Alex Shi, Jan Beulich, Robert Richter,
	Andy Lutomirski, Andrew Morton, Andrea Arcangeli,
	Johannes Weiner, Hugh Dickins, KAMEZAWA Hiroyuki, Mel Gorman,
	linux-kernel

On Fri, 20 Jul 2012, Kirill A. Shutemov wrote:

> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>
> Clearing a 2MB huge page will typically blow away several levels of CPU
> caches.  To avoid this only cache clear the 4K area around the fault
> address and use a cache avoiding clears for the rest of the 2MB area.

Why exempt the 4K around the fault address? Is there a regression if that
is not exempted?

I guess for anonymous huge pages one may assume that there will be at
least one write to one cache line in the 4k page. Is it useful to get all
the cachelines in the page into the cache?

Also note that if we get later into hugepage use for the page cache we
would want the cache to be cold because the contents have to come in from
a storage medium.



* Re: [PATCH, RFC 0/6] Avoid cache trashing on clearing huge/gigantic page
From: Andi Kleen @ 2012-07-25 19:28 UTC
  To: Christoph Lameter
  Cc: Kirill A. Shutemov, linux-mm, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Tim Chen, Alex Shi, Jan Beulich,
	Robert Richter, Andy Lutomirski, Andrew Morton, Andrea Arcangeli,
	Johannes Weiner, Hugh Dickins, KAMEZAWA Hiroyuki, Mel Gorman,
	linux-kernel

On Wed, Jul 25, 2012 at 01:51:01PM -0500, Christoph Lameter wrote:
> On Fri, 20 Jul 2012, Kirill A. Shutemov wrote:
> 
> > From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> >
> > Clearing a 2MB huge page will typically blow away several levels of CPU
> > caches.  To avoid this only cache clear the 4K area around the fault
> > address and use a cache avoiding clears for the rest of the 2MB area.
> 
> why exempt the 4K around the fault address? Is there a regression if that
> is not exempted?

You would get an immediate cache miss when the faulting instruction
is reexecuted.

> 
> I guess for anonymous huge pages one may assume that there will be at
> least one write to one cache line in the 4k page. Is it useful to get all
> the cachelines in the page in the cache.

We did some measurements -- comparing 4K and 2MB with some tracing 
of fault patterns -- and a lot of apps don't use the full 2MB area. 
The apps with THP regressions usually used less than others.
The patchkit significantly reduced some of the regressions.

> 
> Also note that if we get later into hugepage use for the page cache we
> would want the cache to be cold because the contents have to come in from
> a storage medium.

The page cache is not cleared, so it never runs this code.


-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

* Re: [PATCH, RFC 0/6] Avoid cache trashing on clearing huge/gigantic page
From: Christoph Lameter @ 2012-07-25 19:38 UTC
  To: Andi Kleen
  Cc: Kirill A. Shutemov, linux-mm, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Tim Chen, Alex Shi, Jan Beulich,
	Robert Richter, Andy Lutomirski, Andrew Morton, Andrea Arcangeli,
	Johannes Weiner, Hugh Dickins, KAMEZAWA Hiroyuki, Mel Gorman,
	linux-kernel

On Wed, 25 Jul 2012, Andi Kleen wrote:

> > why exempt the 4K around the fault address? Is there a regression if that
> > is not exempted?
>
> You would get an immediate cache miss when the faulting instruction
> is reexecuted.

Nope. You would not get cache misses for all cachelines in the 4k range.
Only one.

> > I guess for anonymous huge pages one may assume that there will be at
> > least one write to one cache line in the 4k page. Is it useful to get all
> > the cachelines in the page in the cache.
>
> We did some measurements -- comparing 4K and 2MB with some tracing
> of fault patterns -- and a lot of apps don't use the full 2MB area.
> The apps with THP regressions usually used less than others.
> The patchkit significantly reduced some of the regressions.

Yup, they won't use the full 2MB area. But are they using all the cache
lines of the 4k page that we are making hot?



* Re: [PATCH, RFC 0/6] Avoid cache trashing on clearing huge/gigantic page
From: H. Peter Anvin @ 2012-08-09 15:18 UTC
  To: Kirill A. Shutemov
  Cc: linux-mm, Thomas Gleixner, Ingo Molnar, x86, Andi Kleen,
	Tim Chen, Alex Shi, Jan Beulich, Robert Richter, Andy Lutomirski,
	Andrew Morton, Andrea Arcangeli, Johannes Weiner, Hugh Dickins,
	KAMEZAWA Hiroyuki, Mel Gorman, linux-kernel

On 07/20/2012 05:50 AM, Kirill A. Shutemov wrote:
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>
> Clearing a 2MB huge page will typically blow away several levels of CPU
> caches.  To avoid this only cache clear the 4K area around the fault
> address and use a cache avoiding clears for the rest of the 2MB area.
>
> It would be nice to test the patchset with more workloads. Especially if
> you see performance regression with THP.
>
> Any feedback is appreciated.
>
> Andi Kleen (6):
>    THP: Use real address for NUMA policy
>    mm: make clear_huge_page tolerate non aligned address
>    THP: Pass real, not rounded, address to clear_huge_page
>    x86: Add clear_page_nocache
>    mm: make clear_huge_page cache clear only around the fault address
>    x86: switch the 64bit uncached page clear to SSE/AVX v2
>

This is a mix of x86-specific and generic changes... does anyone mind if 
I put this into the -tip tree?

	-hpa


-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

