* [PATCH v2 0/2] mm, thp: two THP splitting performance fixes
@ 2014-06-17 22:37 ` Waiman Long
  0 siblings, 0 replies; 12+ messages in thread
From: Waiman Long @ 2014-06-17 22:37 UTC (permalink / raw)
  To: Andrew Morton, Mel Gorman, Rik van Riel, Ingo Molnar,
	Peter Zijlstra, Kirill A. Shutemov
  Cc: linux-kernel, linux-mm, Scott J Norton, Waiman Long

v1->v2:
 - Add a second patch to replace smp_mb() by smp_mb__after_atomic().
 - Add performance data to the first patch

This mini-series contains two minor changes to the transparent huge
page splitting code to improve its performance, particularly on the
x86 architecture.

Waiman Long (2):
  mm, thp: move invariant bug check out of loop in
    __split_huge_page_map
  mm, thp: replace smp_mb after atomic_add by smp_mb__after_atomic

 mm/huge_memory.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2 1/2] mm, thp: move invariant bug check out of loop in __split_huge_page_map
  2014-06-17 22:37 ` Waiman Long
@ 2014-06-17 22:37   ` Waiman Long
  -1 siblings, 0 replies; 12+ messages in thread
From: Waiman Long @ 2014-06-17 22:37 UTC (permalink / raw)
  To: Andrew Morton, Mel Gorman, Rik van Riel, Ingo Molnar,
	Peter Zijlstra, Kirill A. Shutemov
  Cc: linux-kernel, linux-mm, Scott J Norton, Waiman Long

In the __split_huge_page_map() function, the check on
page_mapcount(page) is invariant within the for loop. Because the
macro is implemented with atomic_read(), the compiler cannot optimize
the redundant check away, which leads to an unnecessary read of the
page structure on every iteration.

This patch moves the invariant bug check out of the loop so that it
is done only once. On a 3.16-rc1 based kernel, a microbenchmark that
broke up 1000 transparent huge pages using munmap() ran in 38,245us
with the patch and 38,548us without it, a gain of about 1%.
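
For illustration, here is a minimal userspace sketch of the hoisted
check (stand-in names and C11 atomics, not the kernel code): the load
behind the mapcount check is atomic, so hoisting it out of the loop
is something the programmer has to do rather than the compiler.

#include <assert.h>
#include <stdatomic.h>

struct fake_page {
        atomic_int mapcount;            /* stand-in for the page mapcount */
};

int fake_page_mapcount(struct fake_page *p)
{
        /* like page_mapcount(): an atomic load re-issued on every call */
        return atomic_load(&p->mapcount);
}

void split_map(struct fake_page *p, int nr_subpages, int writable)
{
        int i;

        /* invariant check hoisted out of the loop: one load in total */
        if (writable)
                assert(fake_page_mapcount(p) == 1);

        for (i = 0; i < nr_subpages; i++) {
                /* per-subpage work goes here; no repeated mapcount read */
        }
}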

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
---
 mm/huge_memory.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index e60837d..be84c71 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1744,6 +1744,8 @@ static int __split_huge_page_map(struct page *page,
 	if (pmd) {
 		pgtable = pgtable_trans_huge_withdraw(mm, pmd);
 		pmd_populate(mm, &_pmd, pgtable);
+		if (pmd_write(*pmd))
+			BUG_ON(page_mapcount(page) != 1);
 
 		haddr = address;
 		for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
@@ -1753,8 +1755,6 @@ static int __split_huge_page_map(struct page *page,
 			entry = maybe_mkwrite(pte_mkdirty(entry), vma);
 			if (!pmd_write(*pmd))
 				entry = pte_wrprotect(entry);
-			else
-				BUG_ON(page_mapcount(page) != 1);
 			if (!pmd_young(*pmd))
 				entry = pte_mkold(entry);
 			if (pmd_numa(*pmd))
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 2/2] mm, thp: replace smp_mb after atomic_add by smp_mb__after_atomic
  2014-06-17 22:37 ` Waiman Long
@ 2014-06-17 22:37   ` Waiman Long
  -1 siblings, 0 replies; 12+ messages in thread
From: Waiman Long @ 2014-06-17 22:37 UTC (permalink / raw)
  To: Andrew Morton, Mel Gorman, Rik van Riel, Ingo Molnar,
	Peter Zijlstra, Kirill A. Shutemov
  Cc: linux-kernel, linux-mm, Scott J Norton, Waiman Long

On some architectures, such as x86, atomic_add() acts as a full
memory barrier. In that case, an additional smp_mb() is just wasted
time. This patch replaces that smp_mb() with smp_mb__after_atomic(),
which avoids the redundant memory barrier on those architectures.

With a 3.16-rc1 based kernel, this patch reduced the execution time
of breaking up 1000 transparent huge pages from 38,245us to 30,964us,
a sizeable reduction of about 19%. It also reduces the %cpu time of
the __split_huge_page_refcount() function in the perf profile from
2.18% to 1.15%.
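
As a rough userspace analogue of the idea (a sketch with made-up
names, not the kernel implementation), the barrier after the atomic
add can degrade to a compiler-only barrier on x86, where the locked
add already orders memory, while other architectures keep a real
fence:

#include <stdatomic.h>

atomic_int tail_refcount;
atomic_int tail_flags;

/* stand-in for smp_mb__after_atomic(): on x86 a locked RMW such as the
 * add below already orders memory, so only a compiler barrier is needed;
 * other architectures still get a real fence. */
static inline void mb_after_atomic(void)
{
#if defined(__x86_64__) || defined(__i386__)
        atomic_signal_fence(memory_order_seq_cst);  /* compiler barrier only */
#else
        atomic_thread_fence(memory_order_seq_cst);  /* real memory barrier */
#endif
}

void release_tail_refs(int nr)
{
        /* the atomic add itself (relaxed, like the kernel's atomic_add()) */
        atomic_fetch_add_explicit(&tail_refcount, nr, memory_order_relaxed);

        /* previously a full fence would sit here, even on x86 */
        mb_after_atomic();

        /* later stores must be seen after the add by other CPUs */
        atomic_store_explicit(&tail_flags, 0, memory_order_relaxed);
}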

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
---
 mm/huge_memory.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index be84c71..e2ee131 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1650,7 +1650,7 @@ static void __split_huge_page_refcount(struct page *page,
 			   &page_tail->_count);
 
 		/* after clearing PageTail the gup refcount can be released */
-		smp_mb();
+		smp_mb__after_atomic();
 
 		/*
 		 * retain hwpoison flag of the poisoned tail page:
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/2] mm, thp: replace smp_mb after atomic_add by smp_mb__after_atomic
  2014-06-17 22:37   ` Waiman Long
@ 2014-06-18 12:17     ` Kirill A. Shutemov
  -1 siblings, 0 replies; 12+ messages in thread
From: Kirill A. Shutemov @ 2014-06-18 12:17 UTC (permalink / raw)
  To: Waiman Long
  Cc: Andrew Morton, Mel Gorman, Rik van Riel, Ingo Molnar,
	Peter Zijlstra, linux-kernel, linux-mm, Scott J Norton

On Tue, Jun 17, 2014 at 06:37:59PM -0400, Waiman Long wrote:
> On some architectures, such as x86, atomic_add() acts as a full
> memory barrier. In that case, an additional smp_mb() is just wasted
> time. This patch replaces that smp_mb() with smp_mb__after_atomic(),
> which avoids the redundant memory barrier on those architectures.
>
> With a 3.16-rc1 based kernel, this patch reduced the execution time
> of breaking up 1000 transparent huge pages from 38,245us to 30,964us,
> a sizeable reduction of about 19%. It also reduces the %cpu time of
> the __split_huge_page_refcount() function in the perf profile from
> 2.18% to 1.15%.
> 
> Signed-off-by: Waiman Long <Waiman.Long@hp.com>

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/2] mm, thp: move invariant bug check out of loop in __split_huge_page_map
  2014-06-17 22:37   ` Waiman Long
@ 2014-06-18 12:24     ` Kirill A. Shutemov
  -1 siblings, 0 replies; 12+ messages in thread
From: Kirill A. Shutemov @ 2014-06-18 12:24 UTC (permalink / raw)
  To: Waiman Long
  Cc: Andrew Morton, Mel Gorman, Rik van Riel, Ingo Molnar,
	Peter Zijlstra, linux-kernel, linux-mm, Scott J Norton

On Tue, Jun 17, 2014 at 06:37:58PM -0400, Waiman Long wrote:
> In the __split_huge_page_map() function, the check on
> page_mapcount(page) is invariant within the for loop. Because the
> macro is implemented with atomic_read(), the compiler cannot optimize
> the redundant check away, which leads to an unnecessary read of the
> page structure on every iteration.
>
> This patch moves the invariant bug check out of the loop so that it
> is done only once. On a 3.16-rc1 based kernel, a microbenchmark that
> broke up 1000 transparent huge pages using munmap() ran in 38,245us
> with the patch and 38,548us without it, a gain of about 1%.

For such a small difference it would be nice to average over a few
runs and report the stddev. It can easily be noise.
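
A throwaway helper along these lines would be enough (hypothetical
code, and the run times below are made up, not measured):

#include <math.h>
#include <stdio.h>

static void summarize_runs(const double *us, int n)
{
        double sum = 0.0, var = 0.0, mean;
        int i;

        for (i = 0; i < n; i++)
                sum += us[i];
        mean = sum / n;

        for (i = 0; i < n; i++)
                var += (us[i] - mean) * (us[i] - mean);
        var /= n - 1;                   /* sample variance */

        printf("mean %.0f us, stddev %.0f us over %d runs\n",
               mean, sqrt(var), n);
}

int main(void)
{
        /* made-up run times in microseconds, not measured data */
        double runs[] = { 38245, 38410, 38102, 38390, 38290 };

        summarize_runs(runs, 5);
        return 0;
}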

> Signed-off-by: Waiman Long <Waiman.Long@hp.com>

But okay:

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/2] mm, thp: move invariant bug check out of loop in __split_huge_page_map
  2014-06-18 12:24     ` Kirill A. Shutemov
@ 2014-06-18 15:31       ` Waiman Long
  -1 siblings, 0 replies; 12+ messages in thread
From: Waiman Long @ 2014-06-18 15:31 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, Mel Gorman, Rik van Riel, Ingo Molnar,
	Peter Zijlstra, linux-kernel, linux-mm, Scott J Norton

On 06/18/2014 08:24 AM, Kirill A. Shutemov wrote:
> On Tue, Jun 17, 2014 at 06:37:58PM -0400, Waiman Long wrote:
>> In the __split_huge_page_map() function, the check on
>> page_mapcount(page) is invariant within the for loop. Because the
>> macro is implemented with atomic_read(), the compiler cannot optimize
>> the redundant check away, which leads to an unnecessary read of the
>> page structure on every iteration.
>>
>> This patch moves the invariant bug check out of the loop so that it
>> is done only once. On a 3.16-rc1 based kernel, a microbenchmark that
>> broke up 1000 transparent huge pages using munmap() ran in 38,245us
>> with the patch and 38,548us without it, a gain of about 1%.
> For such a small difference it would be nice to average over a few
> runs and report the stddev. It can easily be noise.

The timing data was the average of 5 runs, with an SD of 100-200us.

>> Signed-off-by: Waiman Long <Waiman.Long@hp.com>
> But okay:
>
> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>

Thanks for the review.

-Longman

^ permalink raw reply	[flat|nested] 12+ messages in thread
