* [PATCH v3 0/7] Avoid cache trashing on clearing huge/gigantic page
@ 2012-08-16 15:15 ` Kirill A. Shutemov
From: Kirill A. Shutemov @ 2012-08-16 15:15 UTC (permalink / raw)
  To: linux-mm
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Andi Kleen,
	Kirill A. Shutemov, Tim Chen, Alex Shi, Jan Beulich,
	Robert Richter, Andy Lutomirski, Andrew Morton, Andrea Arcangeli,
	Johannes Weiner, Hugh Dickins, KAMEZAWA Hiroyuki, Mel Gorman,
	linux-kernel, linuxppc-dev, linux-mips, linux-sh, sparclinux

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

Clearing a 2MB huge page will typically blow away several levels of CPU
caches.  To avoid this, clear only the 4K page around the fault address
through the cache, and use cache-avoiding clears for the rest of the 2MB
area.

This patchset implements a cache-avoiding version of clear_page for x86
only. If an architecture wants to provide a cache-avoiding version of
clear_page, it should define ARCH_HAS_USER_NOCACHE to 1 and implement
clear_page_nocache() and clear_user_highpage_nocache().
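
For reference, cache-avoiding clears are conventionally built on x86
non-temporal stores (movnti and friends), which write to memory without
allocating cache lines. A minimal C-level sketch of the idea (purely
illustrative, not the patch's actual implementation in
arch/x86/lib/clear_page_{32,64}.S) could look like this:

/*
 * Sketch only: clear one 4K page with SSE2 non-temporal stores,
 * bypassing the CPU caches. Assumes x86-64 and a page-aligned buffer.
 */
static void clear_page_nocache_sketch(void *page)
{
	unsigned long *p = page;
	unsigned long i;

	for (i = 0; i < 4096 / sizeof(unsigned long); i += 4) {
		/* movnti stores without allocating a cache line */
		asm volatile("movnti %1, %0" : "=m" (p[i])     : "r" (0UL));
		asm volatile("movnti %1, %0" : "=m" (p[i + 1]) : "r" (0UL));
		asm volatile("movnti %1, %0" : "=m" (p[i + 2]) : "r" (0UL));
		asm volatile("movnti %1, %0" : "=m" (p[i + 3]) : "r" (0UL));
	}
	/* non-temporal stores are weakly ordered; fence before the
	 * page is handed out */
	asm volatile("sfence" ::: "memory");
}

The series also provides a non-SSE2 fallback for 32-bit (see the v3
notes below).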

v3:
  - Rebased to current Linus' tree. The kmap_atomic() build issue is fixed;
  - Pass fault address to clear_huge_page(). v2 had a problem with clearing
    for sizes other than HPAGE_SIZE;
  - x86: fix 32bit variant. Fallback version of clear_page_nocache() has
    been added for non-SSE2 systems;
  - x86: clear_page_nocache() moved to clear_page_{32,64}.S;
  - x86: use pushq_cfi/popq_cfi instead of push/pop;
v2:
  - No code change. Only commit messages are updated.
  - RFC mark is dropped.

Andi Kleen (5):
  THP: Use real address for NUMA policy
  THP: Pass fault address to __do_huge_pmd_anonymous_page()
  x86: Add clear_page_nocache
  mm: make clear_huge_page cache clear only around the fault address
  x86: switch the 64bit uncached page clear to SSE/AVX v2

Kirill A. Shutemov (2):
  hugetlb: pass fault address to hugetlb_no_page()
  mm: pass fault address to clear_huge_page()

 arch/x86/include/asm/page.h      |    2 +
 arch/x86/include/asm/string_32.h |    5 ++
 arch/x86/include/asm/string_64.h |    5 ++
 arch/x86/lib/Makefile            |    3 +-
 arch/x86/lib/clear_page_32.S     |   72 +++++++++++++++++++++++++++++++++++
 arch/x86/lib/clear_page_64.S     |   78 ++++++++++++++++++++++++++++++++++++++
 arch/x86/mm/fault.c              |    7 +++
 include/linux/mm.h               |    2 +-
 mm/huge_memory.c                 |   17 ++++----
 mm/hugetlb.c                     |   39 ++++++++++---------
 mm/memory.c                      |   37 +++++++++++++++---
 11 files changed, 232 insertions(+), 35 deletions(-)
 create mode 100644 arch/x86/lib/clear_page_32.S

-- 
1.7.7.6


* [PATCH v3 1/7] THP: Use real address for NUMA policy
@ 2012-08-16 15:15   ` Kirill A. Shutemov
From: Kirill A. Shutemov @ 2012-08-16 15:15 UTC (permalink / raw)
  To: linux-mm
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Andi Kleen,
	Kirill A. Shutemov, Tim Chen, Alex Shi, Jan Beulich,
	Robert Richter, Andy Lutomirski, Andrew Morton, Andrea Arcangeli,
	Johannes Weiner, Hugh Dickins, KAMEZAWA Hiroyuki, Mel Gorman,
	linux-kernel, linuxppc-dev, linux-mips, linux-sh, sparclinux

From: Andi Kleen <ak@linux.intel.com>

Use the fault address, not the rounded-down huge page address, for NUMA
policy purposes. In some circumstances this can give a more exact
NUMA policy.
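
To illustrate the relationship between the two addresses (the values
below are made up):

	/*
	 * 'address' is the exact faulting virtual address; 'haddr' is
	 * the same address rounded down to the huge page boundary:
	 */
	unsigned long haddr = address & HPAGE_PMD_MASK;
	/* e.g. address == 0x7f3a2c5e1234 -> haddr == 0x7f3a2c400000 */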

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/huge_memory.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 57c4b93..70737ec 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -681,11 +681,11 @@ static inline gfp_t alloc_hugepage_gfpmask(int defrag, gfp_t extra_gfp)
 
 static inline struct page *alloc_hugepage_vma(int defrag,
 					      struct vm_area_struct *vma,
-					      unsigned long haddr, int nd,
+					      unsigned long address, int nd,
 					      gfp_t extra_gfp)
 {
 	return alloc_pages_vma(alloc_hugepage_gfpmask(defrag, extra_gfp),
-			       HPAGE_PMD_ORDER, vma, haddr, nd);
+			       HPAGE_PMD_ORDER, vma, address, nd);
 }
 
 #ifndef CONFIG_NUMA
@@ -710,7 +710,7 @@ int do_huge_pmd_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		if (unlikely(khugepaged_enter(vma)))
 			return VM_FAULT_OOM;
 		page = alloc_hugepage_vma(transparent_hugepage_defrag(vma),
-					  vma, haddr, numa_node_id(), 0);
+					  vma, address, numa_node_id(), 0);
 		if (unlikely(!page)) {
 			count_vm_event(THP_FAULT_FALLBACK);
 			goto out;
@@ -944,7 +944,7 @@ int do_huge_pmd_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	if (transparent_hugepage_enabled(vma) &&
 	    !transparent_hugepage_debug_cow())
 		new_page = alloc_hugepage_vma(transparent_hugepage_defrag(vma),
-					      vma, haddr, numa_node_id(), 0);
+					      vma, address, numa_node_id(), 0);
 	else
 		new_page = NULL;
 
-- 
1.7.7.6


* [PATCH v3 2/7] THP: Pass fault address to __do_huge_pmd_anonymous_page()
@ 2012-08-16 15:15   ` Kirill A. Shutemov
From: Kirill A. Shutemov @ 2012-08-16 15:15 UTC (permalink / raw)
  To: linux-mm
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Andi Kleen,
	Kirill A. Shutemov, Tim Chen, Alex Shi, Jan Beulich,
	Robert Richter, Andy Lutomirski, Andrew Morton, Andrea Arcangeli,
	Johannes Weiner, Hugh Dickins, KAMEZAWA Hiroyuki, Mel Gorman,
	linux-kernel, linuxppc-dev, linux-mips, linux-sh, sparclinux

From: Andi Kleen <ak@linux.intel.com>

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/huge_memory.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 70737ec..6f0825b611 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -633,7 +633,8 @@ static inline pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
 
 static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
 					struct vm_area_struct *vma,
-					unsigned long haddr, pmd_t *pmd,
+					unsigned long haddr,
+					unsigned long address, pmd_t *pmd,
 					struct page *page)
 {
 	pgtable_t pgtable;
@@ -720,8 +721,8 @@ int do_huge_pmd_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			put_page(page);
 			goto out;
 		}
-		if (unlikely(__do_huge_pmd_anonymous_page(mm, vma, haddr, pmd,
-							  page))) {
+		if (unlikely(__do_huge_pmd_anonymous_page(mm, vma, haddr,
+						address, pmd, page))) {
 			mem_cgroup_uncharge_page(page);
 			put_page(page);
 			goto out;
-- 
1.7.7.6


* [PATCH v3 3/7] hugetlb: pass fault address to hugetlb_no_page()
@ 2012-08-16 15:15   ` Kirill A. Shutemov
From: Kirill A. Shutemov @ 2012-08-16 15:15 UTC (permalink / raw)
  To: linux-mm
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Andi Kleen,
	Kirill A. Shutemov, Tim Chen, Alex Shi, Jan Beulich,
	Robert Richter, Andy Lutomirski, Andrew Morton, Andrea Arcangeli,
	Johannes Weiner, Hugh Dickins, KAMEZAWA Hiroyuki, Mel Gorman,
	linux-kernel, linuxppc-dev, linux-mips, linux-sh, sparclinux

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/hugetlb.c |   38 +++++++++++++++++++-------------------
 1 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index bc72712..3c86d3d 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2672,7 +2672,8 @@ static bool hugetlbfs_pagecache_present(struct hstate *h,
 }
 
 static int hugetlb_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
-			unsigned long address, pte_t *ptep, unsigned int flags)
+			unsigned long haddr, unsigned long fault_address,
+			pte_t *ptep, unsigned int flags)
 {
 	struct hstate *h = hstate_vma(vma);
 	int ret = VM_FAULT_SIGBUS;
@@ -2696,7 +2697,7 @@ static int hugetlb_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	}
 
 	mapping = vma->vm_file->f_mapping;
-	idx = vma_hugecache_offset(h, vma, address);
+	idx = vma_hugecache_offset(h, vma, haddr);
 
 	/*
 	 * Use page lock to guard against racing truncation
@@ -2708,7 +2709,7 @@ retry:
 		size = i_size_read(mapping->host) >> huge_page_shift(h);
 		if (idx >= size)
 			goto out;
-		page = alloc_huge_page(vma, address, 0);
+		page = alloc_huge_page(vma, haddr, 0);
 		if (IS_ERR(page)) {
 			ret = PTR_ERR(page);
 			if (ret == -ENOMEM)
@@ -2717,7 +2718,7 @@ retry:
 				ret = VM_FAULT_SIGBUS;
 			goto out;
 		}
-		clear_huge_page(page, address, pages_per_huge_page(h));
+		clear_huge_page(page, haddr, pages_per_huge_page(h));
 		__SetPageUptodate(page);
 
 		if (vma->vm_flags & VM_MAYSHARE) {
@@ -2763,7 +2764,7 @@ retry:
 	 * the spinlock.
 	 */
 	if ((flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED))
-		if (vma_needs_reservation(h, vma, address) < 0) {
+		if (vma_needs_reservation(h, vma, haddr) < 0) {
 			ret = VM_FAULT_OOM;
 			goto backout_unlocked;
 		}
@@ -2778,16 +2779,16 @@ retry:
 		goto backout;
 
 	if (anon_rmap)
-		hugepage_add_new_anon_rmap(page, vma, address);
+		hugepage_add_new_anon_rmap(page, vma, haddr);
 	else
 		page_dup_rmap(page);
 	new_pte = make_huge_pte(vma, page, ((vma->vm_flags & VM_WRITE)
 				&& (vma->vm_flags & VM_SHARED)));
-	set_huge_pte_at(mm, address, ptep, new_pte);
+	set_huge_pte_at(mm, haddr, ptep, new_pte);
 
 	if ((flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED)) {
 		/* Optimization, do the COW without a second fault */
-		ret = hugetlb_cow(mm, vma, address, ptep, new_pte, page);
+		ret = hugetlb_cow(mm, vma, haddr, ptep, new_pte, page);
 	}
 
 	spin_unlock(&mm->page_table_lock);
@@ -2813,21 +2814,20 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	struct page *pagecache_page = NULL;
 	static DEFINE_MUTEX(hugetlb_instantiation_mutex);
 	struct hstate *h = hstate_vma(vma);
+	unsigned long haddr = address & huge_page_mask(h);
 
-	address &= huge_page_mask(h);
-
-	ptep = huge_pte_offset(mm, address);
+	ptep = huge_pte_offset(mm, haddr);
 	if (ptep) {
 		entry = huge_ptep_get(ptep);
 		if (unlikely(is_hugetlb_entry_migration(entry))) {
-			migration_entry_wait(mm, (pmd_t *)ptep, address);
+			migration_entry_wait(mm, (pmd_t *)ptep, haddr);
 			return 0;
 		} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
 			return VM_FAULT_HWPOISON_LARGE |
 				VM_FAULT_SET_HINDEX(hstate_index(h));
 	}
 
-	ptep = huge_pte_alloc(mm, address, huge_page_size(h));
+	ptep = huge_pte_alloc(mm, haddr, huge_page_size(h));
 	if (!ptep)
 		return VM_FAULT_OOM;
 
@@ -2839,7 +2839,7 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	mutex_lock(&hugetlb_instantiation_mutex);
 	entry = huge_ptep_get(ptep);
 	if (huge_pte_none(entry)) {
-		ret = hugetlb_no_page(mm, vma, address, ptep, flags);
+		ret = hugetlb_no_page(mm, vma, haddr, address, ptep, flags);
 		goto out_mutex;
 	}
 
@@ -2854,14 +2854,14 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * consumed.
 	 */
 	if ((flags & FAULT_FLAG_WRITE) && !pte_write(entry)) {
-		if (vma_needs_reservation(h, vma, address) < 0) {
+		if (vma_needs_reservation(h, vma, haddr) < 0) {
 			ret = VM_FAULT_OOM;
 			goto out_mutex;
 		}
 
 		if (!(vma->vm_flags & VM_MAYSHARE))
 			pagecache_page = hugetlbfs_pagecache_page(h,
-								vma, address);
+								vma, haddr);
 	}
 
 	/*
@@ -2884,16 +2884,16 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 
 	if (flags & FAULT_FLAG_WRITE) {
 		if (!pte_write(entry)) {
-			ret = hugetlb_cow(mm, vma, address, ptep, entry,
+			ret = hugetlb_cow(mm, vma, haddr, ptep, entry,
 							pagecache_page);
 			goto out_page_table_lock;
 		}
 		entry = pte_mkdirty(entry);
 	}
 	entry = pte_mkyoung(entry);
-	if (huge_ptep_set_access_flags(vma, address, ptep, entry,
+	if (huge_ptep_set_access_flags(vma, haddr, ptep, entry,
 						flags & FAULT_FLAG_WRITE))
-		update_mmu_cache(vma, address, ptep);
+		update_mmu_cache(vma, haddr, ptep);
 
 out_page_table_lock:
 	spin_unlock(&mm->page_table_lock);
-- 
1.7.7.6


* [PATCH v3 4/7] mm: pass fault address to clear_huge_page()
@ 2012-08-16 15:15   ` Kirill A. Shutemov
From: Kirill A. Shutemov @ 2012-08-16 15:15 UTC (permalink / raw)
  To: linux-mm
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Andi Kleen,
	Kirill A. Shutemov, Tim Chen, Alex Shi, Jan Beulich,
	Robert Richter, Andy Lutomirski, Andrew Morton, Andrea Arcangeli,
	Johannes Weiner, Hugh Dickins, KAMEZAWA Hiroyuki, Mel Gorman,
	linux-kernel, linuxppc-dev, linux-mips, linux-sh, sparclinux

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/mm.h |    2 +-
 mm/huge_memory.c   |    2 +-
 mm/hugetlb.c       |    3 ++-
 mm/memory.c        |    7 ++++---
 4 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 311be90..2858723 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1638,7 +1638,7 @@ extern void dump_page(struct page *page);
 
 #if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS)
 extern void clear_huge_page(struct page *page,
-			    unsigned long addr,
+			    unsigned long haddr, unsigned long fault_address,
 			    unsigned int pages_per_huge_page);
 extern void copy_user_huge_page(struct page *dst, struct page *src,
 				unsigned long addr, struct vm_area_struct *vma,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 6f0825b611..070bf89 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -644,7 +644,7 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
 	if (unlikely(!pgtable))
 		return VM_FAULT_OOM;
 
-	clear_huge_page(page, haddr, HPAGE_PMD_NR);
+	clear_huge_page(page, haddr, address, HPAGE_PMD_NR);
 	__SetPageUptodate(page);
 
 	spin_lock(&mm->page_table_lock);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3c86d3d..5182192 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2718,7 +2718,8 @@ retry:
 				ret = VM_FAULT_SIGBUS;
 			goto out;
 		}
-		clear_huge_page(page, haddr, pages_per_huge_page(h));
+		clear_huge_page(page, haddr, fault_address,
+				pages_per_huge_page(h));
 		__SetPageUptodate(page);
 
 		if (vma->vm_flags & VM_MAYSHARE) {
diff --git a/mm/memory.c b/mm/memory.c
index 5736170..dfc179b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3984,19 +3984,20 @@ static void clear_gigantic_page(struct page *page,
 	}
 }
 void clear_huge_page(struct page *page,
-		     unsigned long addr, unsigned int pages_per_huge_page)
+		     unsigned long haddr, unsigned long fault_address,
+		     unsigned int pages_per_huge_page)
 {
 	int i;
 
 	if (unlikely(pages_per_huge_page > MAX_ORDER_NR_PAGES)) {
-		clear_gigantic_page(page, addr, pages_per_huge_page);
+		clear_gigantic_page(page, haddr, pages_per_huge_page);
 		return;
 	}
 
 	might_sleep();
 	for (i = 0; i < pages_per_huge_page; i++) {
 		cond_resched();
-		clear_user_highpage(page + i, addr + i * PAGE_SIZE);
+		clear_user_highpage(page + i, haddr + i * PAGE_SIZE);
 	}
 }
 
-- 
1.7.7.6
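
For context, the fault address added here is consumed by a later patch
in the series ("mm: make clear_huge_page cache clear only around the
fault address"). A rough sketch of that policy, as an illustration under
the cover letter's assumptions rather than that patch's exact code:

/*
 * Sketch: clear every 4K page of the huge page with the cache-avoiding
 * primitive, except the page containing the fault address, which is
 * cleared through the cache because the faulting thread is about to
 * touch it.
 */
static void clear_huge_page_sketch(struct page *page, unsigned long haddr,
				   unsigned long fault_address,
				   unsigned int pages_per_huge_page)
{
	unsigned int target = (fault_address - haddr) >> PAGE_SHIFT;
	unsigned int i;

	might_sleep();
	for (i = 0; i < pages_per_huge_page; i++) {
		cond_resched();
		if (i == target)
			clear_user_highpage(page + i, haddr + i * PAGE_SIZE);
		else
			clear_user_highpage_nocache(page + i,
						    haddr + i * PAGE_SIZE);
	}
}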


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 5/7] x86: Add clear_page_nocache
@ 2012-08-16 15:15   ` Kirill A. Shutemov
  0 siblings, 0 replies; 52+ messages in thread
From: Kirill A. Shutemov @ 2012-08-16 15:15 UTC (permalink / raw)
  To: linux-mm
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Andi Kleen,
	Kirill A. Shutemov, Tim Chen, Alex Shi, Jan Beulich,
	Robert Richter, Andy Lutomirski, Andrew Morton, Andrea Arcangeli,
	Johannes Weiner, Hugh Dickins, KAMEZAWA Hiroyuki, Mel Gorman,
	linux-kernel, linuxppc-dev, linux-mips, linux-sh, sparclinux

From: Andi Kleen <ak@linux.intel.com>

Add a cache-avoiding version of clear_page: a straightforward integer
variant of the existing 64bit clear_page, for both 32bit and 64bit.

Also add the necessary glue for highmem, including a layer that
non-cache-coherent architectures which use the virtual address for
flushing can hook into. This is not needed on x86, of course.

If an architecture wants to provide a cache-avoiding version of
clear_page, it should define ARCH_HAS_USER_NOCACHE to 1 and implement
clear_page_nocache() and clear_user_highpage_nocache().
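
For reference, a minimal C sketch (not part of the patch) of what an
opting-in architecture ends up providing; the names mirror the x86
implementation in this patch, and the kmap_atomic() pairing is what
makes the wrapper safe for highmem pages:

	#include <linux/highmem.h>	/* kmap_atomic()/kunmap_atomic() */
	#include <linux/linkage.h>	/* asmlinkage */

	#define ARCH_HAS_USER_NOCACHE 1	/* opt in to nocache clears */

	/* arch-provided: zero one 4K page using non-temporal stores */
	asmlinkage void clear_page_nocache(void *page);

	/*
	 * vaddr exists for architectures that flush by virtual
	 * address; it is unused on cache-coherent x86.
	 */
	void clear_user_highpage_nocache(struct page *page, unsigned long vaddr)
	{
		void *p = kmap_atomic(page);
		clear_page_nocache(p);
		kunmap_atomic(p);
	}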

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/page.h      |    2 +
 arch/x86/include/asm/string_32.h |    5 +++
 arch/x86/include/asm/string_64.h |    5 +++
 arch/x86/lib/Makefile            |    3 +-
 arch/x86/lib/clear_page_32.S     |   72 ++++++++++++++++++++++++++++++++++++++
 arch/x86/lib/clear_page_64.S     |   29 +++++++++++++++
 arch/x86/mm/fault.c              |    7 ++++
 7 files changed, 122 insertions(+), 1 deletions(-)
 create mode 100644 arch/x86/lib/clear_page_32.S

diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h
index 8ca8283..aa83a1b 100644
--- a/arch/x86/include/asm/page.h
+++ b/arch/x86/include/asm/page.h
@@ -29,6 +29,8 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
 	copy_page(to, from);
 }
 
+void clear_user_highpage_nocache(struct page *page, unsigned long vaddr);
+
 #define __alloc_zeroed_user_highpage(movableflags, vma, vaddr) \
 	alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vaddr)
 #define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
diff --git a/arch/x86/include/asm/string_32.h b/arch/x86/include/asm/string_32.h
index 3d3e835..3f2fbcf 100644
--- a/arch/x86/include/asm/string_32.h
+++ b/arch/x86/include/asm/string_32.h
@@ -3,6 +3,8 @@
 
 #ifdef __KERNEL__
 
+#include <linux/linkage.h>
+
 /* Let gcc decide whether to inline or use the out of line functions */
 
 #define __HAVE_ARCH_STRCPY
@@ -337,6 +339,9 @@ void *__constant_c_and_count_memset(void *s, unsigned long pattern,
 #define __HAVE_ARCH_MEMSCAN
 extern void *memscan(void *addr, int c, size_t size);
 
+#define ARCH_HAS_USER_NOCACHE 1
+asmlinkage void clear_page_nocache(void *page);
+
 #endif /* __KERNEL__ */
 
 #endif /* _ASM_X86_STRING_32_H */
diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
index 19e2c46..ca23d1d 100644
--- a/arch/x86/include/asm/string_64.h
+++ b/arch/x86/include/asm/string_64.h
@@ -3,6 +3,8 @@
 
 #ifdef __KERNEL__
 
+#include <linux/linkage.h>
+
 /* Written 2002 by Andi Kleen */
 
 /* Only used for special circumstances. Stolen from i386/string.h */
@@ -63,6 +65,9 @@ char *strcpy(char *dest, const char *src);
 char *strcat(char *dest, const char *src);
 int strcmp(const char *cs, const char *ct);
 
+#define ARCH_HAS_USER_NOCACHE 1
+asmlinkage void clear_page_nocache(void *page);
+
 #endif /* __KERNEL__ */
 
 #endif /* _ASM_X86_STRING_64_H */
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index b00f678..14e47a2 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -23,6 +23,7 @@ lib-y += memcpy_$(BITS).o
 lib-$(CONFIG_SMP) += rwlock.o
 lib-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem.o
 lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o
+lib-y += clear_page_$(BITS).o
 
 obj-y += msr.o msr-reg.o msr-reg-export.o
 
@@ -40,7 +41,7 @@ endif
 else
         obj-y += iomap_copy_64.o
         lib-y += csum-partial_64.o csum-copy_64.o csum-wrappers_64.o
-        lib-y += thunk_64.o clear_page_64.o copy_page_64.o
+        lib-y += thunk_64.o copy_page_64.o
         lib-y += memmove_64.o memset_64.o
         lib-y += copy_user_64.o copy_user_nocache_64.o
 	lib-y += cmpxchg16b_emu.o
diff --git a/arch/x86/lib/clear_page_32.S b/arch/x86/lib/clear_page_32.S
new file mode 100644
index 0000000..9592161
--- /dev/null
+++ b/arch/x86/lib/clear_page_32.S
@@ -0,0 +1,72 @@
+#include <linux/linkage.h>
+#include <asm/alternative-asm.h>
+#include <asm/cpufeature.h>
+#include <asm/dwarf2.h>
+
+/*
+ * Fallback version if SSE2 is not available.
+ */
+ENTRY(clear_page_nocache)
+	CFI_STARTPROC
+	mov    %eax,%edx
+	xorl   %eax,%eax
+	movl   $4096/32,%ecx
+	.p2align 4
+.Lloop:
+	decl	%ecx
+#define PUT(x) mov %eax,x*4(%edx)
+	PUT(0)
+	PUT(1)
+	PUT(2)
+	PUT(3)
+	PUT(4)
+	PUT(5)
+	PUT(6)
+	PUT(7)
+#undef PUT
+	lea	32(%edx),%edx
+	jnz	.Lloop
+	nop
+	ret
+	CFI_ENDPROC
+ENDPROC(clear_page_nocache)
+
+	.section .altinstr_replacement,"ax"
+1:      .byte 0xeb /* jmp <disp8> */
+	.byte (clear_page_nocache_sse2 - clear_page_nocache) - (2f - 1b)
+	/* offset */
+2:
+	.previous
+	.section .altinstructions,"a"
+	altinstruction_entry clear_page_nocache,1b,X86_FEATURE_XMM2,\
+				16, 2b-1b
+	.previous
+
+/*
+ * Zero a page avoiding the caches
+ * eax	page
+ */
+ENTRY(clear_page_nocache_sse2)
+	CFI_STARTPROC
+	mov    %eax,%edx
+	xorl   %eax,%eax
+	movl   $4096/32,%ecx
+	.p2align 4
+.Lloop_sse2:
+	decl	%ecx
+#define PUT(x) movnti %eax,x*4(%edx)
+	PUT(0)
+	PUT(1)
+	PUT(2)
+	PUT(3)
+	PUT(4)
+	PUT(5)
+	PUT(6)
+	PUT(7)
+#undef PUT
+	lea	32(%edx),%edx
+	jnz	.Lloop_sse2
+	nop
+	ret
+	CFI_ENDPROC
+ENDPROC(clear_page_nocache_sse2)
diff --git a/arch/x86/lib/clear_page_64.S b/arch/x86/lib/clear_page_64.S
index f2145cf..9d2f3c2 100644
--- a/arch/x86/lib/clear_page_64.S
+++ b/arch/x86/lib/clear_page_64.S
@@ -40,6 +40,7 @@ ENTRY(clear_page)
 	PUT(5)
 	PUT(6)
 	PUT(7)
+#undef PUT
 	leaq	64(%rdi),%rdi
 	jnz	.Lloop
 	nop
@@ -71,3 +72,31 @@ ENDPROC(clear_page)
 	altinstruction_entry clear_page,2b,X86_FEATURE_ERMS,   \
 			     .Lclear_page_end-clear_page,3b-2b
 	.previous
+
+/*
+ * Zero a page avoiding the caches
+ * rdi	page
+ */
+ENTRY(clear_page_nocache)
+	CFI_STARTPROC
+	xorl   %eax,%eax
+	movl   $4096/64,%ecx
+	.p2align 4
+.Lloop_nocache:
+	decl	%ecx
+#define PUT(x) movnti %rax,x*8(%rdi)
+	movnti %rax,(%rdi)
+	PUT(1)
+	PUT(2)
+	PUT(3)
+	PUT(4)
+	PUT(5)
+	PUT(6)
+	PUT(7)
+#undef PUT
+	leaq	64(%rdi),%rdi
+	jnz	.Lloop_nocache
+	nop
+	ret
+	CFI_ENDPROC
+ENDPROC(clear_page_nocache)
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 76dcd9d..d8cf231 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1209,3 +1209,10 @@ good_area:
 
 	up_read(&mm->mmap_sem);
 }
+
+void clear_user_highpage_nocache(struct page *page, unsigned long vaddr)
+{
+	void *p = kmap_atomic(page);
+	clear_page_nocache(p);
+	kunmap_atomic(p);
+}
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 6/7] mm: make clear_huge_page cache clear only around the fault address
@ 2012-08-16 15:15   ` Kirill A. Shutemov
  0 siblings, 0 replies; 52+ messages in thread
From: Kirill A. Shutemov @ 2012-08-16 15:15 UTC (permalink / raw)
  To: linux-mm
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Andi Kleen,
	Kirill A. Shutemov, Tim Chen, Alex Shi, Jan Beulich,
	Robert Richter, Andy Lutomirski, Andrew Morton, Andrea Arcangeli,
	Johannes Weiner, Hugh Dickins, KAMEZAWA Hiroyuki, Mel Gorman,
	linux-kernel, linuxppc-dev, linux-mips, linux-sh, sparclinux

From: Andi Kleen <ak@linux.intel.com>

Clearing a 2MB huge page will typically blow away several levels
of CPU caches. To avoid this, use a cached clear only for the 4K area
around the fault address and cache-avoiding clears for the rest of
the 2MB area.
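
The decision itself is small; a sketch that mirrors the mm/memory.c
hunk below (clear_huge_page_sketch is a hypothetical name, the real
function is clear_huge_page):

	static void clear_huge_page_sketch(struct page *page,
			unsigned long haddr, unsigned long fault_address,
			unsigned int pages_per_huge_page)
	{
		/* index of the 4K subpage that actually faulted */
		int target = (fault_address - haddr) >> PAGE_SHIFT;
		int i;

		for (i = 0; i < pages_per_huge_page; i++) {
			cond_resched();
			if (!ARCH_HAS_USER_NOCACHE || i == target)
				clear_user_highpage(page + i,
						haddr + i * PAGE_SIZE);
			else
				clear_user_highpage_nocache(page + i,
						haddr + i * PAGE_SIZE);
		}
	}

For a fault at haddr + 0x42000 in a 2MB page, target is 0x42, so
subpage 66 stays cache-hot while the other 511 subpages are cleared
with non-temporal stores.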

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/memory.c |   34 +++++++++++++++++++++++++++++-----
 1 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index dfc179b..d4626b9 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3969,18 +3969,34 @@ EXPORT_SYMBOL(might_fault);
 #endif
 
 #if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS)
+
+#ifndef ARCH_HAS_USER_NOCACHE
+#define ARCH_HAS_USER_NOCACHE 0
+#endif
+
+#if ARCH_HAS_USER_NOCACHE == 0
+#define clear_user_highpage_nocache clear_user_highpage
+#endif
+
 static void clear_gigantic_page(struct page *page,
-				unsigned long addr,
-				unsigned int pages_per_huge_page)
+		unsigned long haddr, unsigned long fault_address,
+		unsigned int pages_per_huge_page)
 {
 	int i;
 	struct page *p = page;
+	unsigned long vaddr;
+	int target = (fault_address - haddr) >> PAGE_SHIFT;
 
 	might_sleep();
+	vaddr = haddr;
 	for (i = 0; i < pages_per_huge_page;
 	     i++, p = mem_map_next(p, page, i)) {
 		cond_resched();
-		clear_user_highpage(p, addr + i * PAGE_SIZE);
+		vaddr = haddr + i*PAGE_SIZE;
+		if (!ARCH_HAS_USER_NOCACHE  || i == target)
+			clear_user_highpage(p, vaddr);
+		else
+			clear_user_highpage_nocache(p, vaddr);
 	}
 }
 void clear_huge_page(struct page *page,
@@ -3988,16 +4004,24 @@ void clear_huge_page(struct page *page,
 		     unsigned int pages_per_huge_page)
 {
 	int i;
+	unsigned long vaddr;
+	int target = (fault_address - haddr) >> PAGE_SHIFT;
 
 	if (unlikely(pages_per_huge_page > MAX_ORDER_NR_PAGES)) {
-		clear_gigantic_page(page, haddr, pages_per_huge_page);
+		clear_gigantic_page(page, haddr, fault_address,
+				pages_per_huge_page);
 		return;
 	}
 
 	might_sleep();
+	vaddr = haddr;
 	for (i = 0; i < pages_per_huge_page; i++) {
 		cond_resched();
-		clear_user_highpage(page + i, haddr + i * PAGE_SIZE);
+		vaddr = haddr + i*PAGE_SIZE;
+		if (!ARCH_HAS_USER_NOCACHE || i == target)
+			clear_user_highpage(page + i, vaddr);
+		else
+			clear_user_highpage_nocache(page + i, vaddr);
 	}
 }
 
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 7/7] x86: switch the 64bit uncached page clear to SSE/AVX v2
@ 2012-08-16 15:15   ` Kirill A. Shutemov
  0 siblings, 0 replies; 52+ messages in thread
From: Kirill A. Shutemov @ 2012-08-16 15:15 UTC (permalink / raw)
  To: linux-mm
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Andi Kleen,
	Kirill A. Shutemov, Tim Chen, Alex Shi, Jan Beulich,
	Robert Richter, Andy Lutomirski, Andrew Morton, Andrea Arcangeli,
	Johannes Weiner, Hugh Dickins, KAMEZAWA Hiroyuki, Mel Gorman,
	linux-kernel, linuxppc-dev, linux-mips, linux-sh, sparclinux

From: Andi Kleen <ak@linux.intel.com>

With multiple threads, vector stores are more efficient, so use them.
This causes the page clear to run non-preemptible and adds some
overhead. However, on 32bit it was already non-preemptible (due to
kmap_atomic) and there is a preemption opportunity every 4K unit.

On an NPB (NASA Parallel Benchmark) 128GB run on a Westmere this
shrinks the performance regression of enabling transparent huge pages
by ~2 percentage points (from 2.81% to 0.81%), which is now within the
runtime variability. On a system with AVX support more is expected.
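
The asm below brackets the vector loop with kernel_fpu_begin() and
kernel_fpu_end(), which is what makes the section non-preemptible. In
C the shape of the constraint looks roughly like this (a sketch only;
the real implementation is in assembly and additionally spills
%xmm0/%ymm0 to the stack, and the asm/i387.h location of the FPU API
is an assumption for kernels of this era):

	#include <asm/i387.h>	/* kernel_fpu_begin()/kernel_fpu_end() */

	static void clear_page_nocache_vector(void *page)
	{
		kernel_fpu_begin();	/* save FPU state, disable preemption */
		/*
		 * ... unrolled non-temporal vector stores
		 * (movntdq/vmovntdq) covering all 4096 bytes ...
		 */
		kernel_fpu_end();	/* restore state, re-enable preemption */
	}

With SSE_UNROLL at 128 the inner loop runs 4096/128 = 32 times, each
iteration issuing eight 16-byte movntdq stores; the AVX variant uses
32-byte ymm stores and so halves the number of stores.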

Signed-off-by: Andi Kleen <ak@linux.intel.com>
[kirill.shutemov@linux.intel.com: Properly save/restore arguments]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/lib/clear_page_64.S |   79 ++++++++++++++++++++++++++++++++++--------
 1 files changed, 64 insertions(+), 15 deletions(-)

diff --git a/arch/x86/lib/clear_page_64.S b/arch/x86/lib/clear_page_64.S
index 9d2f3c2..b302cff 100644
--- a/arch/x86/lib/clear_page_64.S
+++ b/arch/x86/lib/clear_page_64.S
@@ -73,30 +73,79 @@ ENDPROC(clear_page)
 			     .Lclear_page_end-clear_page,3b-2b
 	.previous
 
+#define SSE_UNROLL 128
+
 /*
  * Zero a page avoiding the caches
  * rdi	page
  */
 ENTRY(clear_page_nocache)
 	CFI_STARTPROC
-	xorl   %eax,%eax
-	movl   $4096/64,%ecx
+	pushq_cfi %rdi
+	call   kernel_fpu_begin
+	popq_cfi  %rdi
+	sub    $16,%rsp
+	CFI_ADJUST_CFA_OFFSET 16
+	movdqu %xmm0,(%rsp)
+	xorpd  %xmm0,%xmm0
+	movl   $4096/SSE_UNROLL,%ecx
 	.p2align 4
 .Lloop_nocache:
 	decl	%ecx
-#define PUT(x) movnti %rax,x*8(%rdi)
-	movnti %rax,(%rdi)
-	PUT(1)
-	PUT(2)
-	PUT(3)
-	PUT(4)
-	PUT(5)
-	PUT(6)
-	PUT(7)
-#undef PUT
-	leaq	64(%rdi),%rdi
+	.set x,0
+	.rept SSE_UNROLL/16
+	movntdq %xmm0,x(%rdi)
+	.set x,x+16
+	.endr
+	leaq	SSE_UNROLL(%rdi),%rdi
 	jnz	.Lloop_nocache
-	nop
-	ret
+	movdqu (%rsp),%xmm0
+	addq   $16,%rsp
+	CFI_ADJUST_CFA_OFFSET -16
+	jmp   kernel_fpu_end
 	CFI_ENDPROC
 ENDPROC(clear_page_nocache)
+
+#ifdef CONFIG_AS_AVX
+
+	.section .altinstr_replacement,"ax"
+1:	.byte 0xeb					/* jmp <disp8> */
+	.byte (clear_page_nocache_avx - clear_page_nocache) - (2f - 1b)
+	/* offset */
+2:
+	.previous
+	.section .altinstructions,"a"
+	altinstruction_entry clear_page_nocache,1b,X86_FEATURE_AVX,\
+	                     16, 2b-1b
+	.previous
+
+#define AVX_UNROLL 256 /* TUNE ME */
+
+ENTRY(clear_page_nocache_avx)
+	CFI_STARTPROC
+	pushq_cfi %rdi
+	call   kernel_fpu_begin
+	popq_cfi  %rdi
+	sub    $32,%rsp
+	CFI_ADJUST_CFA_OFFSET 32
+	vmovdqu %ymm0,(%rsp)
+	vxorpd  %ymm0,%ymm0,%ymm0
+	movl   $4096/AVX_UNROLL,%ecx
+	.p2align 4
+.Lloop_avx:
+	decl	%ecx
+	.set x,0
+	.rept AVX_UNROLL/32
+	vmovntdq %ymm0,x(%rdi)
+	.set x,x+32
+	.endr
+	leaq	AVX_UNROLL(%rdi),%rdi
+	jnz	.Lloop_avx
+	vmovdqu (%rsp),%ymm0
+	addq   $32,%rsp
+	CFI_ADJUST_CFA_OFFSET -32
+	jmp   kernel_fpu_end
+	CFI_ENDPROC
+ENDPROC(clear_page_nocache_avx)
+
+#endif
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 7/7] x86: switch the 64bit uncached page clear to SSE/AVX v2
@ 2012-08-16 15:15   ` Kirill A. Shutemov
  0 siblings, 0 replies; 52+ messages in thread
From: Kirill A. Shutemov @ 2012-08-16 15:15 UTC (permalink / raw)
  To: linux-mm
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Andi Kleen,
	Kirill A. Shutemov, Tim Chen, Alex Shi, Jan Beulich,
	Robert Richter, Andy Lutomirski, Andrew Morton, Andrea Arcangeli,
	Johannes Weiner, Hugh Dickins, KAMEZAWA Hiroyuki, Mel Gorman,
	linux-kernel, linuxppc-dev, linux-mips, linux-sh, sparclinux

From: Andi Kleen <ak@linux.intel.com>

With multiple threads vector stores are more efficient, so use them.
This will cause the page clear to run non preemptable and add some
overhead. However on 32bit it was already non preempable (due to
kmap_atomic) and there is an preemption opportunity every 4K unit.

On a NPB (Nasa Parallel Benchmark) 128GB run on a Westmere this improves
the performance regression of enabling transparent huge pages
by ~2% (2.81% to 0.81%), near the runtime variability now.
On a system with AVX support more is expected.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
[kirill.shutemov@linux.intel.com: Properly save/restore arguments]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/lib/clear_page_64.S |   79 ++++++++++++++++++++++++++++++++++--------
 1 files changed, 64 insertions(+), 15 deletions(-)

diff --git a/arch/x86/lib/clear_page_64.S b/arch/x86/lib/clear_page_64.S
index 9d2f3c2..b302cff 100644
--- a/arch/x86/lib/clear_page_64.S
+++ b/arch/x86/lib/clear_page_64.S
@@ -73,30 +73,79 @@ ENDPROC(clear_page)
 			     .Lclear_page_end-clear_page,3b-2b
 	.previous
 
+#define SSE_UNROLL 128
+
 /*
  * Zero a page avoiding the caches
  * rdi	page
  */
 ENTRY(clear_page_nocache)
 	CFI_STARTPROC
-	xorl   %eax,%eax
-	movl   $4096/64,%ecx
+	pushq_cfi %rdi
+	call   kernel_fpu_begin
+	popq_cfi  %rdi
+	sub    $16,%rsp
+	CFI_ADJUST_CFA_OFFSET 16
+	movdqu %xmm0,(%rsp)
+	xorpd  %xmm0,%xmm0
+	movl   $4096/SSE_UNROLL,%ecx
 	.p2align 4
 .Lloop_nocache:
 	decl	%ecx
-#define PUT(x) movnti %rax,x*8(%rdi)
-	movnti %rax,(%rdi)
-	PUT(1)
-	PUT(2)
-	PUT(3)
-	PUT(4)
-	PUT(5)
-	PUT(6)
-	PUT(7)
-#undef PUT
-	leaq	64(%rdi),%rdi
+	.set x,0
+	.rept SSE_UNROLL/16
+	movntdq %xmm0,x(%rdi)
+	.set x,x+16
+	.endr
+	leaq	SSE_UNROLL(%rdi),%rdi
 	jnz	.Lloop_nocache
-	nop
-	ret
+	movdqu (%rsp),%xmm0
+	addq   $16,%rsp
+	CFI_ADJUST_CFA_OFFSET -16
+	jmp   kernel_fpu_end
 	CFI_ENDPROC
 ENDPROC(clear_page_nocache)
+
+#ifdef CONFIG_AS_AVX
+
+	.section .altinstr_replacement,"ax"
+1:	.byte 0xeb					/* jmp <disp8> */
+	.byte (clear_page_nocache_avx - clear_page_nocache) - (2f - 1b)
+	/* offset */
+2:
+	.previous
+	.section .altinstructions,"a"
+	altinstruction_entry clear_page_nocache,1b,X86_FEATURE_AVX,\
+	                     16, 2b-1b
+	.previous
+
+#define AVX_UNROLL 256 /* TUNE ME */
+
+ENTRY(clear_page_nocache_avx)
+	CFI_STARTPROC
+	pushq_cfi %rdi
+	call   kernel_fpu_begin
+	popq_cfi  %rdi
+	sub    $32,%rsp
+	CFI_ADJUST_CFA_OFFSET 32
+	vmovdqu %ymm0,(%rsp)
+	vxorpd  %ymm0,%ymm0,%ymm0
+	movl   $4096/AVX_UNROLL,%ecx
+	.p2align 4
+.Lloop_avx:
+	decl	%ecx
+	.set x,0
+	.rept AVX_UNROLL/32
+	vmovntdq %ymm0,x(%rdi)
+	.set x,x+32
+	.endr
+	leaq	AVX_UNROLL(%rdi),%rdi
+	jnz	.Lloop_avx
+	vmovdqu (%rsp),%ymm0
+	addq   $32,%rsp
+	CFI_ADJUST_CFA_OFFSET -32
+	jmp   kernel_fpu_end
+	CFI_ENDPROC
+ENDPROC(clear_page_nocache_avx)
+
+#endif
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 6/7] mm: make clear_huge_page cache clear only around the fault address
  2012-08-16 15:15   ` Kirill A. Shutemov
@ 2012-08-16 16:16     ` Andrea Arcangeli
  -1 siblings, 0 replies; 52+ messages in thread
From: Andrea Arcangeli @ 2012-08-16 16:16 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: linux-mm, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Andi Kleen, Tim Chen, Alex Shi, Jan Beulich, Robert Richter,
	Andy Lutomirski, Andrew Morton, Johannes Weiner, Hugh Dickins,
	KAMEZAWA Hiroyuki, Mel Gorman, linux-kernel, linuxppc-dev,
	linux-mips, linux-sh, sparclinux

Hi Kirill,

On Thu, Aug 16, 2012 at 06:15:53PM +0300, Kirill A. Shutemov wrote:
>  	for (i = 0; i < pages_per_huge_page;
>  	     i++, p = mem_map_next(p, page, i)) {

It may be more optimal to avoid a multiplication/shift-left before the
add, and to do:

  	for (i = 0, vaddr = haddr; i < pages_per_huge_page;
  	     i++, p = mem_map_next(p, page, i), vaddr += PAGE_SIZE) {

>  		cond_resched();
> -		clear_user_highpage(p, addr + i * PAGE_SIZE);
> +		vaddr = haddr + i*PAGE_SIZE;

Not sure if gcc can optimize it away because of the external calls.

> +		if (!ARCH_HAS_USER_NOCACHE || i == target)
> +			clear_user_highpage(page + i, vaddr);
> +		else
> +			clear_user_highpage_nocache(page + i, vaddr);
>  	}


My only worry overall is whether there can be some workload where this
may actually slow down userland, if the CPU cache is very large and
userland accesses most of the faulted-in memory after the first fault.

So I wouldn't mind adding one more check, in addition to
!ARCH_HAS_USER_NOCACHE above, for a runtime sysctl variable. It'll
waste a cacheline, yes, but I doubt that's measurable compared to the
time it takes to do a >=2M hugepage copy.

Furthermore, it would allow people to benchmark its effect without
having to rebuild the kernel themselves.
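
A minimal sketch of what that check could look like (the knob name is
hypothetical, just to illustrate):

	/* hypothetical knob, e.g. /proc/sys/vm/clear_huge_page_nocache */
	extern int sysctl_clear_huge_page_nocache;

	if (!ARCH_HAS_USER_NOCACHE || !sysctl_clear_huge_page_nocache ||
	    i == target)
		clear_user_highpage(page + i, vaddr);
	else
		clear_user_highpage_nocache(page + i, vaddr);

The load of sysctl_clear_huge_page_nocache is the extra cacheline cost
mentioned above.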

All other patches look fine to me.

Thanks!
Andrea

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 6/7] mm: make clear_huge_page cache clear only around the fault address
  2012-08-16 16:16     ` Andrea Arcangeli
@ 2012-08-16 16:43       ` Kirill A. Shutemov
  -1 siblings, 0 replies; 52+ messages in thread
From: Kirill A. Shutemov @ 2012-08-16 16:43 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Kirill A. Shutemov, linux-mm, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Andi Kleen, Tim Chen, Alex Shi, Jan Beulich,
	Robert Richter, Andy Lutomirski, Andrew Morton, Johannes Weiner,
	Hugh Dickins, KAMEZAWA Hiroyuki, Mel Gorman, linux-kernel,
	linuxppc-dev, linux-mips, linux-sh, sparclinux

On Thu, Aug 16, 2012 at 06:16:47PM +0200, Andrea Arcangeli wrote:
> Hi Kirill,
> 
> On Thu, Aug 16, 2012 at 06:15:53PM +0300, Kirill A. Shutemov wrote:
> >  	for (i = 0; i < pages_per_huge_page;
> >  	     i++, p = mem_map_next(p, page, i)) {
> 
> It may be more optimal to avoid a multiplication/shiftleft before the
> add, and to do:
> 
>   	for (i = 0, vaddr = haddr; i < pages_per_huge_page;
>   	     i++, p = mem_map_next(p, page, i), vaddr += PAGE_SIZE) {
> 

Makes sense. I'll update it.

> >  		cond_resched();
> > -		clear_user_highpage(p, addr + i * PAGE_SIZE);
> > +		vaddr = haddr + i*PAGE_SIZE;
> 
> Not sure if gcc can optimize it away because of the external calls.
> 
> > +		if (!ARCH_HAS_USER_NOCACHE || i == target)
> > +			clear_user_highpage(page + i, vaddr);
> > +		else
> > +			clear_user_highpage_nocache(page + i, vaddr);
> >  	}
> 
> 
> My only worry overall is if there can be some workload where this may
> actually slow down userland if the CPU cache is very large and
> userland would access most of the faulted in memory after the first
> fault.
> 
> So I wouldn't mind to add one more check in addition of
> !ARCH_HAS_USER_NOCACHE above to check a runtime sysctl variable. It'll
> waste a cacheline yes but I doubt it's measurable compared to the time
> it takes to do a >=2M hugepage copy.

Hm.. I think with a static_key we can avoid the cache overhead here. I'll try.
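
As a rough sketch, reusing the hunk quoted above with the jump-label
API of this era (the key name is made up for illustration):

	#include <linux/jump_label.h>

	/* false by default: the check below compiles to a no-op */
	static struct static_key clear_nocache_key = STATIC_KEY_INIT_FALSE;

	if (!static_key_false(&clear_nocache_key) || i == target)
		clear_user_highpage(page + i, vaddr);
	else
		clear_user_highpage_nocache(page + i, vaddr);

No data load on the hot path: enabling the key patches the no-op into
a jump, so the extra cacheline mentioned above is never touched.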
 
> Furthermore it would allow people to benchmark its effect without
> having to rebuild the kernel themself.
> 
> All other patches looks fine to me.

Thanks for the review. Could you take a look at the huge zero page patchset? ;)

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 6/7] mm: make clear_huge_page cache clear only around the fault address
  2012-08-16 16:43       ` Kirill A. Shutemov
@ 2012-08-16 18:29         ` Andrea Arcangeli
  -1 siblings, 0 replies; 52+ messages in thread
From: Andrea Arcangeli @ 2012-08-16 18:29 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: linux-mm, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Andi Kleen, Tim Chen, Alex Shi, Jan Beulich, Robert Richter,
	Andy Lutomirski, Andrew Morton, Johannes Weiner, Hugh Dickins,
	KAMEZAWA Hiroyuki, Mel Gorman, linux-kernel, linuxppc-dev,
	linux-mips, linux-sh, sparclinux

On Thu, Aug 16, 2012 at 07:43:56PM +0300, Kirill A. Shutemov wrote:
> Hm.. I think with static_key we can avoid cache overhead here. I'll try.

Could you elaborate on the static_key? Is it some sort of self-modifying
code?

> Thanks, for review. Could you take a look at huge zero page patchset? ;)

I've noticed that too, nice :). I'm checking some details of the
wrprotect fault behavior, but I'll comment there.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 6/7] mm: make clear_huge_page cache clear only around the fault address
  2012-08-16 18:29         ` Andrea Arcangeli
@ 2012-08-16 18:37           ` Kirill A. Shutemov
  -1 siblings, 0 replies; 52+ messages in thread
From: Kirill A. Shutemov @ 2012-08-16 18:37 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Kirill A. Shutemov, linux-mm, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Andi Kleen, Tim Chen, Alex Shi, Jan Beulich,
	Robert Richter, Andy Lutomirski, Andrew Morton, Johannes Weiner,
	Hugh Dickins, KAMEZAWA Hiroyuki, Mel Gorman, linux-kernel,
	linuxppc-dev, linux-mips, linux-sh, sparclinux

On Thu, Aug 16, 2012 at 08:29:44PM +0200, Andrea Arcangeli wrote:
> On Thu, Aug 16, 2012 at 07:43:56PM +0300, Kirill A. Shutemov wrote:
> > Hm.. I think with static_key we can avoid cache overhead here. I'll try.
> 
> Could you elaborate on the static_key? Is it some sort of self
> modifying code?

Runtime code patching. See Documentation/static-keys.txt. We can patch it
from the sysctl handler.
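
Roughly, with hypothetical names, toggling the key from the earlier
sketch in the usual proc_dointvec handler shape:

	static int sysctl_clear_nocache;

	static int clear_nocache_sysctl(struct ctl_table *table, int write,
			void __user *buffer, size_t *lenp, loff_t *ppos)
	{
		int old = sysctl_clear_nocache;
		int ret = proc_dointvec(table, write, buffer, lenp, ppos);

		if (ret || !write || old == sysctl_clear_nocache)
			return ret;
		/* patch the branch in clear_huge_page() */
		if (sysctl_clear_nocache)
			static_key_slow_inc(&clear_nocache_key);
		else
			static_key_slow_dec(&clear_nocache_key);
		return ret;
	}

The fault path then never reads the variable; only a write to the
sysctl pays the patching cost.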

> 
> > Thanks, for review. Could you take a look at huge zero page patchset? ;)
> 
> I've noticed that too, nice :). I'm checking some detail on the
> wrprotect fault behavior but I'll comment there.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 6/7] mm: make clear_huge_page cache clear only around the fault address
  2012-08-16 18:37           ` Kirill A. Shutemov
@ 2012-08-16 19:42             ` Andrea Arcangeli
  -1 siblings, 0 replies; 52+ messages in thread
From: Andrea Arcangeli @ 2012-08-16 19:42 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Kirill A. Shutemov, linux-mm, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Andi Kleen, Tim Chen, Alex Shi, Jan Beulich,
	Robert Richter, Andy Lutomirski, Andrew Morton, Johannes Weiner,
	Hugh Dickins, KAMEZAWA Hiroyuki, Mel Gorman, linux-kernel,
	linuxppc-dev, linux-mips, linux-sh, sparclinux

On Thu, Aug 16, 2012 at 09:37:25PM +0300, Kirill A. Shutemov wrote:
> On Thu, Aug 16, 2012 at 08:29:44PM +0200, Andrea Arcangeli wrote:
> > On Thu, Aug 16, 2012 at 07:43:56PM +0300, Kirill A. Shutemov wrote:
> > > Hm.. I think with static_key we can avoid cache overhead here. I'll try.
> > 
> > Could you elaborate on the static_key? Is it some sort of self
> > modifying code?
> 
> Runtime code patching. See Documentation/static-keys.txt. We can patch it
> on sysctl.

I guessed it had to be patching the code, thanks for the pointer. It
looks like a perfect fit for this one, agreed.

^ permalink raw reply	[flat|nested] 52+ messages in thread

end of thread, other threads:[~2012-08-16 19:42 UTC | newest]

Thread overview: 52+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-08-16 15:15 [PATCH v3 0/7] Avoid cache trashing on clearing huge/gigantic page Kirill A. Shutemov
2012-08-16 15:15 ` [PATCH v3 1/7] THP: Use real address for NUMA policy Kirill A. Shutemov
2012-08-16 15:15 ` [PATCH v3 2/7] THP: Pass fault address to __do_huge_pmd_anonymous_page() Kirill A. Shutemov
2012-08-16 15:15 ` [PATCH v3 3/7] hugetlb: pass fault address to hugetlb_no_page() Kirill A. Shutemov
2012-08-16 15:15 ` [PATCH v3 4/7] mm: pass fault address to clear_huge_page() Kirill A. Shutemov
2012-08-16 15:15 ` [PATCH v3 5/7] x86: Add clear_page_nocache Kirill A. Shutemov
2012-08-16 15:15 ` [PATCH v3 6/7] mm: make clear_huge_page cache clear only around the fault address Kirill A. Shutemov
2012-08-16 16:16   ` Andrea Arcangeli
2012-08-16 16:43     ` Kirill A. Shutemov
2012-08-16 18:29       ` Andrea Arcangeli
2012-08-16 18:37         ` Kirill A. Shutemov
2012-08-16 19:42           ` Andrea Arcangeli
2012-08-16 15:15 ` [PATCH v3 7/7] x86: switch the 64bit uncached page clear to SSE/AVX v2 Kirill A. Shutemov
