* [PATCH 0/6] Optimise page alloc/free fast paths followup v2
@ 2016-04-27 14:57 ` Mel Gorman
  0 siblings, 0 replies; 20+ messages in thread
From: Mel Gorman @ 2016-04-27 14:57 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, Jesper Dangaard Brouer, Linux-MM, LKML, Mel Gorman

This is a follow-up series based on Vlastimil Babka's review feedback.
The first change is that the second patch in the previous series was dropped
as the patch "mm, page_alloc: inline the fast path of the zonelist iterator"
is fine. The nodemask pointer is the same between cpuset retries. If the
zonelist changes due to ALLOC_NO_WATERMARKS *and* it races with a cpuset
change then there is a second harmless pass through the page allocator.
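
As a rough userspace sketch of that reasoning (the struct and helper names
below are illustrative stand-ins, not the real allocator code): the nodemask
pointer captured in the allocation context is unchanged across the retry, so
the worst case is one extra, harmless pass.

#include <stddef.h>

struct ac { const void *nodemask; };

static void *attempt(struct ac *ac) { (void)ac; return NULL; }  /* stand-in */

void *alloc(const void *nodemask)
{
        struct ac ac = { .nodemask = nodemask };
        void *page = attempt(&ac);

        if (!page) {
                /*
                 * A cpuset change raced with the first attempt. ac.nodemask
                 * is still the same pointer, so the retry is just a second,
                 * harmless walk through the allocator.
                 */
                page = attempt(&ac);
        }
        return page;
}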

Patches 1-3 are fixes for patches in mmotm. They should take into account
the changes already made in mmotm, although that involved some guesswork.
It should be relatively easy to merge them into the correct places.
However, if there are major conflicts then let me know and I'll respin the
entire series.

Patches 4-6 are from Vlastimil with only minor modifications.

The series has a marginal performance impact, but it's within the noise and
the changes are necessary to address the problems.

pagealloc
                                           4.6.0-rc4                  4.6.0-rc4
                                      mmotm-20150422              followup-v1r1
Min      alloc-odr0-1               317.00 (  0.00%)           319.00 ( -0.63%)
Min      alloc-odr0-2               232.00 (  0.00%)           231.00 (  0.43%)
Min      alloc-odr0-4               192.00 (  0.00%)           193.00 ( -0.52%)
Min      alloc-odr0-8               167.00 (  0.00%)           168.00 ( -0.60%)
Min      alloc-odr0-16              154.00 (  0.00%)           155.00 ( -0.65%)
Min      alloc-odr0-32              148.00 (  0.00%)           148.00 (  0.00%)
Min      alloc-odr0-64              145.00 (  0.00%)           145.00 (  0.00%)
Min      alloc-odr0-128             143.00 (  0.00%)           144.00 ( -0.70%)
Min      alloc-odr0-256             152.00 (  0.00%)           156.00 ( -2.63%)
Min      alloc-odr0-512             164.00 (  0.00%)           165.00 ( -0.61%)
Min      alloc-odr0-1024            172.00 (  0.00%)           175.00 ( -1.74%)
Min      alloc-odr0-2048            178.00 (  0.00%)           180.00 ( -1.12%)
Min      alloc-odr0-4096            184.00 (  0.00%)           186.00 ( -1.09%)
Min      alloc-odr0-8192            187.00 (  0.00%)           189.00 ( -1.07%)
Min      alloc-odr0-16384           188.00 (  0.00%)           189.00 ( -0.53%)
Min      free-odr0-1                178.00 (  0.00%)           177.00 (  0.56%)
Min      free-odr0-2                125.00 (  0.00%)           125.00 (  0.00%)
Min      free-odr0-4                 98.00 (  0.00%)            97.00 (  1.02%)
Min      free-odr0-8                 84.00 (  0.00%)            84.00 (  0.00%)
Min      free-odr0-16                79.00 (  0.00%)            80.00 ( -1.27%)
Min      free-odr0-32                75.00 (  0.00%)            75.00 (  0.00%)
Min      free-odr0-64                73.00 (  0.00%)            73.00 (  0.00%)
Min      free-odr0-128               72.00 (  0.00%)            72.00 (  0.00%)
Min      free-odr0-256               88.00 (  0.00%)            93.00 ( -5.68%)
Min      free-odr0-512              108.00 (  0.00%)           107.00 (  0.93%)
Min      free-odr0-1024             117.00 (  0.00%)           116.00 (  0.85%)
Min      free-odr0-2048             125.00 (  0.00%)           124.00 (  0.80%)
Min      free-odr0-4096             131.00 (  0.00%)           128.00 (  2.29%)
Min      free-odr0-8192             131.00 (  0.00%)           129.00 (  1.53%)
Min      free-odr0-16384            131.00 (  0.00%)           129.00 (  1.53%)
Min      total-odr0-1               495.00 (  0.00%)           496.00 ( -0.20%)
Min      total-odr0-2               357.00 (  0.00%)           356.00 (  0.28%)
Min      total-odr0-4               290.00 (  0.00%)           290.00 (  0.00%)
Min      total-odr0-8               251.00 (  0.00%)           252.00 ( -0.40%)
Min      total-odr0-16              233.00 (  0.00%)           235.00 ( -0.86%)
Min      total-odr0-32              223.00 (  0.00%)           223.00 (  0.00%)
Min      total-odr0-64              218.00 (  0.00%)           218.00 (  0.00%)
Min      total-odr0-128             215.00 (  0.00%)           216.00 ( -0.47%)
Min      total-odr0-256             240.00 (  0.00%)           249.00 ( -3.75%)
Min      total-odr0-512             272.00 (  0.00%)           272.00 (  0.00%)
Min      total-odr0-1024            289.00 (  0.00%)           291.00 ( -0.69%)
Min      total-odr0-2048            303.00 (  0.00%)           304.00 ( -0.33%)
Min      total-odr0-4096            315.00 (  0.00%)           314.00 (  0.32%)
Min      total-odr0-8192            318.00 (  0.00%)           318.00 (  0.00%)
Min      total-odr0-16384           319.00 (  0.00%)           318.00 (  0.31%)

 mm/page_alloc.c | 69 +++++++++++++++++++++++----------------------------------
 1 file changed, 28 insertions(+), 41 deletions(-)

-- 
2.6.4

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/6] mm, page_alloc: Only check PageCompound for high-order pages -fix
  2016-04-27 14:57 ` Mel Gorman
@ 2016-04-27 14:57   ` Mel Gorman
  -1 siblings, 0 replies; 20+ messages in thread
From: Mel Gorman @ 2016-04-27 14:57 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, Jesper Dangaard Brouer, Linux-MM, LKML, Mel Gorman

Vlastimil Babka pointed out that an unlikely annotation in free_pages_prepare
shrinks stack usage by moving compound handling to the end of the function.

add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-30 (-30)
function                                     old     new   delta
free_pages_prepare                           771     741     -30

It's also consistent with the buffered_rmqueue path.
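
For reference, unlikely() wraps __builtin_expect() so the compiler treats the
branch as cold and can lay the compound handling out away from the hot
straight-line path; a minimal standalone illustration (my_unlikely and the
function below are invented for this sketch, not kernel code):

#define my_unlikely(x)  __builtin_expect(!!(x), 0)

int order0_fast_path(unsigned int order)
{
        if (my_unlikely(order)) {
                /* cold: compound/high-order handling ends up after the
                 * common return below */
                return 1;
        }
        /* hot: order-0 frees fall straight through */
        return 0;
}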

This is a fix to the mmotm patch
mm-page_alloc-only-check-pagecompound-for-high-order-pages.patch.

Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1da56779f8fa..d8383750bd43 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1003,7 +1003,7 @@ static bool free_pages_prepare(struct page *page, unsigned int order)
 	 * Check tail pages before head page information is cleared to
 	 * avoid checking PageCompound for order-0 pages.
 	 */
-	if (order) {
+	if (unlikely(order)) {
 		bool compound = PageCompound(page);
 		int i;
 
-- 
2.6.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 2/6] mm, page_alloc: move might_sleep_if check to the allocator slowpath -revert
  2016-04-27 14:57 ` Mel Gorman
@ 2016-04-27 14:57   ` Mel Gorman
  -1 siblings, 0 replies; 20+ messages in thread
From: Mel Gorman @ 2016-04-27 14:57 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, Jesper Dangaard Brouer, Linux-MM, LKML, Mel Gorman

Vlastimil Babka pointed out that the patch weakens a zone_reclaim test
which, while "safe", defeats the purpose of the debugging check. As most
configurations eliminate this check anyway, I thought it better to simply
revert the patch instead of adding a second check in zone_reclaim.
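
A hedged userspace model of why the placement matters (might_sleep_if and the
names here are stand-ins, not the kernel macros): keeping the check at the
common entry point covers every caller, including those that would otherwise
be satisfied by the fast path.

#include <assert.h>
#include <stdbool.h>

#define MY_GFP_DIRECT_RECLAIM 0x1u

static bool in_atomic_ctx;      /* stand-in for "must not sleep" state */

static void might_sleep_if_model(bool cond)
{
        /* debug-only: complain if a possibly-sleeping operation is
         * attempted from a context that must not sleep */
        if (cond)
                assert(!in_atomic_ctx);
}

void *alloc_pages_model(unsigned int gfp_mask)
{
        /* checking here, at the entry point, flags bad callers even when
         * the allocation would succeed without reaching the slowpath */
        might_sleep_if_model(gfp_mask & MY_GFP_DIRECT_RECLAIM);
        return 0;       /* the allocation itself is elided in this sketch */
}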

This is a revert of the mmotm patch
mm-page_alloc-move-might_sleep_if-check-to-the-allocator-slowpath.patch .

Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/page_alloc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d8383750bd43..9ad4e68486e9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3606,8 +3606,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		return NULL;
 	}
 
-	might_sleep_if(gfp_mask & __GFP_DIRECT_RECLAIM);
-
 	/*
 	 * We also sanity check to catch abuse of atomic reserves being used by
 	 * callers that are not in atomic context.
@@ -3806,6 +3804,8 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
 
 	lockdep_trace_alloc(gfp_mask);
 
+	might_sleep_if(gfp_mask & __GFP_DIRECT_RECLAIM);
+
 	if (should_fail_alloc_page(gfp_mask, order))
 		return NULL;
 
-- 
2.6.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 3/6] mm, page_alloc: Check once if a zone has isolated pageblocks -fix
  2016-04-27 14:57 ` Mel Gorman
@ 2016-04-27 14:57   ` Mel Gorman
  -1 siblings, 0 replies; 20+ messages in thread
From: Mel Gorman @ 2016-04-27 14:57 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, Jesper Dangaard Brouer, Linux-MM, LKML, Mel Gorman

Vlastimil Babka pointed out that the original code was protected by
the zone lock and provided a fix.
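
The pattern the fix restores, as a small pthread sketch (the lock and flag
are illustrative stand-ins, not the zone structures): sample the zone state
only after taking the lock that protects it.

#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t zone_lock = PTHREAD_MUTEX_INITIALIZER;
static bool zone_has_isolated;  /* stand-in for has_isolate_pageblock() */

void free_pcppages_bulk_model(void)
{
        bool isolated_pageblocks;

        pthread_mutex_lock(&zone_lock);
        /* read the flag under the lock; sampling it before the lock, as
         * the unfixed patch did, lets a concurrent update race with the
         * bulk free */
        isolated_pageblocks = zone_has_isolated;

        /* ... free pages, consulting isolated_pageblocks ... */
        (void)isolated_pageblocks;
        pthread_mutex_unlock(&zone_lock);
}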

This is a fix to the mmotm patch
mm-page_alloc-check-once-if-a-zone-has-isolated-pageblocks.patch . Once
applied, the following line should be removed from the changelog:
"Technically this is race-prone but so is the existing code."

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/page_alloc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9ad4e68486e9..c63e5e7e4864 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1098,9 +1098,10 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 	int migratetype = 0;
 	int batch_free = 0;
 	unsigned long nr_scanned;
-	bool isolated_pageblocks = has_isolate_pageblock(zone);
+	bool isolated_pageblocks;
 
 	spin_lock(&zone->lock);
+	isolated_pageblocks = has_isolate_pageblock(zone);
 	nr_scanned = zone_page_state(zone, NR_PAGES_SCANNED);
 	if (nr_scanned)
 		__mod_zone_page_state(zone, NR_PAGES_SCANNED, -nr_scanned);
-- 
2.6.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 4/6] mm, page_alloc: un-inline the bad part of free_pages_check
  2016-04-27 14:57 ` Mel Gorman
@ 2016-04-27 14:57   ` Mel Gorman
  -1 siblings, 0 replies; 20+ messages in thread
From: Mel Gorman @ 2016-04-27 14:57 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, Jesper Dangaard Brouer, Linux-MM, LKML, Mel Gorman

From: Vlastimil Babka <vbabka@suse.cz>

!DEBUG_VM size and bloat-o-meter:

add/remove: 1/0 grow/shrink: 0/2 up/down: 124/-370 (-246)
function                                     old     new   delta
free_pages_check_bad                           -     124    +124
free_pcppages_bulk                          1288    1171    -117
__free_pages_ok                              948     695    -253

DEBUG_VM:

add/remove: 1/0 grow/shrink: 0/1 up/down: 124/-214 (-90)
function                                     old     new   delta
free_pages_check_bad                           -     124    +124
free_pages_prepare                          1112     898    -214
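
The shape of the change, as a hedged standalone sketch rather than the kernel
code: the expected-state test stays inline at every free site while the cold
reporting work moves out of line.

static void report_bad_page(int page);          /* out of line, cold */

static inline int check_free(int page)
{
        if (__builtin_expect(page >= 0, 1))     /* expected state */
                return 0;
        report_bad_page(page);                  /* rare, stays uninlined */
        return 1;
}

static void report_bad_page(int page)
{
        /* the expensive diagnostics (bad_reason/bad_flags work) live here */
        (void)page;
}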

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/page_alloc.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c63e5e7e4864..97894cbe2fa3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -906,18 +906,11 @@ static inline bool page_expected_state(struct page *page,
 	return true;
 }
 
-static inline int free_pages_check(struct page *page)
+static void free_pages_check_bad(struct page *page)
 {
 	const char *bad_reason;
 	unsigned long bad_flags;
 
-	if (page_expected_state(page, PAGE_FLAGS_CHECK_AT_FREE)) {
-		page_cpupid_reset_last(page);
-		page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
-		return 0;
-	}
-
-	/* Something has gone sideways, find it */
 	bad_reason = NULL;
 	bad_flags = 0;
 
@@ -936,6 +929,17 @@ static inline int free_pages_check(struct page *page)
 		bad_reason = "page still charged to cgroup";
 #endif
 	bad_page(page, bad_reason, bad_flags);
+}
+static inline int free_pages_check(struct page *page)
+{
+	if (likely(page_expected_state(page, PAGE_FLAGS_CHECK_AT_FREE))) {
+		page_cpupid_reset_last(page);
+		page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
+		return 0;
+	}
+
+	/* Something has gone sideways, find it */
+	free_pages_check_bad(page);
 	return 1;
 }
 
-- 
2.6.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 5/6] mm, page_alloc: pull out side effects from free_pages_check
  2016-04-27 14:57 ` Mel Gorman
@ 2016-04-27 14:57   ` Mel Gorman
  -1 siblings, 0 replies; 20+ messages in thread
From: Mel Gorman @ 2016-04-27 14:57 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, Jesper Dangaard Brouer, Linux-MM, LKML, Mel Gorman

A check without side effects should be easier to maintain. It also removes
the duplicated cpupid and flags reset done in the !DEBUG_VM variant of both
free_pcp_prepare() and then bulkfree_pcp_prepare(). Finally, it enables
the next patch.

It shouldn't result in new branches, thanks to inlining of the check.
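
A minimal sketch of the idea, with simplified stand-in types and masks (not
the kernel structures): the check becomes a pure predicate and the caller
applies the flag/cpupid reset only where it is actually needed.

struct page_model { unsigned long flags; int cpupid; };

#define CHECK_AT_FREE_MASK 0xf0UL  /* stand-in for PAGE_FLAGS_CHECK_AT_FREE */
#define CHECK_AT_PREP_MASK 0x0fUL  /* stand-in for PAGE_FLAGS_CHECK_AT_PREP */

/* pure predicate: no writes, so head and tail pages can share it */
static inline int page_is_bad(const struct page_model *p)
{
        return (p->flags & CHECK_AT_FREE_MASK) != 0;
}

int prepare_model(struct page_model *p)
{
        if (page_is_bad(p))
                return 0;
        /* side effects applied once by the caller, only on success */
        p->cpupid = -1;
        p->flags &= ~CHECK_AT_PREP_MASK;
        return 1;
}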

!DEBUG_VM bloat-o-meter:

add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-27 (-27)
function                                     old     new   delta
__free_pages_ok                              748     739      -9
free_pcppages_bulk                          1403    1385     -18

DEBUG_VM:

add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-28 (-28)
function                                     old     new   delta
free_pages_prepare                           806     778     -28

This is also slightly faster because cpupid information is not set on tail
pages so we can avoid resets there.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/page_alloc.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 97894cbe2fa3..b823f00c275b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -932,11 +932,8 @@ static void free_pages_check_bad(struct page *page)
 }
 static inline int free_pages_check(struct page *page)
 {
-	if (likely(page_expected_state(page, PAGE_FLAGS_CHECK_AT_FREE))) {
-		page_cpupid_reset_last(page);
-		page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
+	if (likely(page_expected_state(page, PAGE_FLAGS_CHECK_AT_FREE)))
 		return 0;
-	}
 
 	/* Something has gone sideways, find it */
 	free_pages_check_bad(page);
@@ -1016,7 +1013,11 @@ static bool free_pages_prepare(struct page *page, unsigned int order)
 		for (i = 1; i < (1 << order); i++) {
 			if (compound)
 				bad += free_tail_pages_check(page, page + i);
-			bad += free_pages_check(page + i);
+			if (unlikely(free_pages_check(page + i))) {
+				bad++;
+				continue;
+			}
+			(page + i)->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
 		}
 	}
 	if (PageAnonHead(page))
@@ -1025,6 +1026,8 @@ static bool free_pages_prepare(struct page *page, unsigned int order)
 	if (bad)
 		return false;
 
+	page_cpupid_reset_last(page);
+	page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
 	reset_page_owner(page, order);
 
 	if (!PageHighMem(page)) {
-- 
2.6.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 6/6] mm, page_alloc: don't duplicate code in free_pcp_prepare
  2016-04-27 14:57 ` Mel Gorman
@ 2016-04-27 14:57   ` Mel Gorman
  -1 siblings, 0 replies; 20+ messages in thread
From: Mel Gorman @ 2016-04-27 14:57 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, Jesper Dangaard Brouer, Linux-MM, LKML, Mel Gorman

The new free_pcp_prepare() function shares a lot of code with
free_pages_prepare(), which makes it a maintenance risk when some future
patch modifies only one of them. We should be able to achieve the same
effect (skipping free_pages_check() in !DEBUG_VM configs) by adding a
parameter to free_pages_prepare() and making it inline, so the checks (and
the order != 0 parts) are eliminated from the free_pcp_prepare() call site.
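
The mechanism being relied on, as a hedged sketch (the names are invented for
illustration): with always-inlining and a compile-time-constant flag, the
compiler folds the disabled branch away at each call site.

static inline __attribute__((always_inline))
int prepare_model(int page, int check_free)
{
        if (check_free && page < 0)     /* folded out when check_free == 0 */
                return 0;
        /* common freeing work shared by both call sites */
        return 1;
}

int free_pcp_prepare_model(int page)
{
        return prepare_model(page, 0);  /* no check code emitted here */
}

int free_pages_prepare_model(int page)
{
        return prepare_model(page, 1);  /* check retained here */
}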

!DEBUG_VM: bloat-o-meter reports no difference, as my gcc was already inlining
free_pages_prepare() and the elimination seems to work as expected.

DEBUG_VM bloat-o-meter:

add/remove: 0/1 grow/shrink: 2/0 up/down: 1035/-778 (257)
function                                     old     new   delta
__free_pages_ok                              297    1060    +763
free_hot_cold_page                           480     752    +272
free_pages_prepare                           778       -    -778

Here, inlining didn't occur before, so forcing it adds some code, but that's
acceptable for a debug option.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/page_alloc.c | 35 +++++++----------------------------
 1 file changed, 7 insertions(+), 28 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b823f00c275b..bc4160bfb36b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -990,7 +990,8 @@ static int free_tail_pages_check(struct page *head_page, struct page *page)
 	return ret;
 }
 
-static bool free_pages_prepare(struct page *page, unsigned int order)
+static __always_inline bool free_pages_prepare(struct page *page, unsigned int order,
+						bool check_free)
 {
 	int bad = 0;
 
@@ -1022,7 +1023,8 @@ static bool free_pages_prepare(struct page *page, unsigned int order)
 	}
 	if (PageAnonHead(page))
 		page->mapping = NULL;
-	bad += free_pages_check(page);
+	if (check_free)
+		bad += free_pages_check(page);
 	if (bad)
 		return false;
 
@@ -1046,7 +1048,7 @@ static bool free_pages_prepare(struct page *page, unsigned int order)
 #ifdef CONFIG_DEBUG_VM
 static inline bool free_pcp_prepare(struct page *page)
 {
-	return free_pages_prepare(page, 0);
+	return free_pages_prepare(page, 0, true);
 }
 
 static inline bool bulkfree_pcp_prepare(struct page *page)
@@ -1056,30 +1058,7 @@ static inline bool bulkfree_pcp_prepare(struct page *page)
 #else
 static bool free_pcp_prepare(struct page *page)
 {
-	VM_BUG_ON_PAGE(PageTail(page), page);
-
-	trace_mm_page_free(page, 0);
-	kmemcheck_free_shadow(page, 0);
-	kasan_poison_free_pages(page, 0);
-
-	if (PageAnonHead(page))
-		page->mapping = NULL;
-
-	reset_page_owner(page, 0);
-
-	if (!PageHighMem(page)) {
-		debug_check_no_locks_freed(page_address(page),
-					   PAGE_SIZE);
-		debug_check_no_obj_freed(page_address(page),
-					   PAGE_SIZE);
-	}
-	arch_free_page(page, 0);
-	kernel_poison_pages(page, 0, 0);
-	kernel_map_pages(page, 0, 0);
-
-	page_cpupid_reset_last(page);
-	page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
-	return true;
+	return free_pages_prepare(page, 0, false);
 }
 
 static bool bulkfree_pcp_prepare(struct page *page)
@@ -1257,7 +1236,7 @@ static void __free_pages_ok(struct page *page, unsigned int order)
 	int migratetype;
 	unsigned long pfn = page_to_pfn(page);
 
-	if (!free_pages_prepare(page, order))
+	if (!free_pages_prepare(page, order, true))
 		return;
 
 	migratetype = get_pfnblock_migratetype(page, pfn);
-- 
2.6.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/6] Optimise page alloc/free fast paths followup v2
  2016-04-27 14:57 ` Mel Gorman
@ 2016-05-02  8:54   ` Vlastimil Babka
  -1 siblings, 0 replies; 20+ messages in thread
From: Vlastimil Babka @ 2016-05-02  8:54 UTC (permalink / raw)
  To: Mel Gorman, Andrew Morton; +Cc: Jesper Dangaard Brouer, Linux-MM, LKML

On 04/27/2016 04:57 PM, Mel Gorman wrote:
> as the patch "mm, page_alloc: inline the fast path of the zonelist iterator"
> is fine. The nodemask pointer is the same between cpuset retries. If the
> zonelist changes due to ALLOC_NO_WATERMARKS *and* it races with a cpuset
> change then there is a second harmless pass through the page allocator.

True. But I just realized (while working on direct compaction priorities)
that there's another subtle issue with the ALLOC_NO_WATERMARKS part.
According to the comment it should be ignoring mempolicies, but it still
honours ac.nodemask, and your patch is replacing NULL ac.nodemask with the
mempolicy one.

I think it can easily be fixed outside the fast path, like this. If
you agree, consider it as having my s-o-b:

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3f052bbca41d..7ccaa6e023f3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3834,6 +3834,11 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
 	alloc_mask = memalloc_noio_flags(gfp_mask);
 	ac.spread_dirty_pages = false;
 
+	/*
+	 * Restore the original nodemask, which might have been replaced with
+	 * &cpuset_current_mems_allowed to optimize the fast-path attempt.
+	 */
+	ac.nodemask = nodemask;
 	page = __alloc_pages_slowpath(alloc_mask, order, &ac);
 
 no_zone:

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/6] Optimise page alloc/free fast paths followup v2
  2016-05-02  8:54   ` Vlastimil Babka
@ 2016-05-03  8:50     ` Mel Gorman
  -1 siblings, 0 replies; 20+ messages in thread
From: Mel Gorman @ 2016-05-03  8:50 UTC (permalink / raw)
  To: Vlastimil Babka; +Cc: Andrew Morton, Jesper Dangaard Brouer, Linux-MM, LKML

On Mon, May 02, 2016 at 10:54:23AM +0200, Vlastimil Babka wrote:
> On 04/27/2016 04:57 PM, Mel Gorman wrote:
> > as the patch "mm, page_alloc: inline the fast path of the zonelist iterator"
> > is fine. The nodemask pointer is the same between cpuset retries. If the
> > zonelist changes due to ALLOC_NO_WATERMARKS *and* it races with a cpuset
> > change then there is a second harmless pass through the page allocator.
> 
> True. But I just realized (while working on direct compaction priorities)
> that there's another subtle issue with the ALLOC_NO_WATERMARKS part.
> According to the comment it should be ignoring mempolicies, but it still
> honours ac.nodemask, and your patch is replacing NULL ac.nodemask with the
> mempolicy one.
> 
> I think it's possibly easily fixed outside the fast path like this. If
> you agree, consider it has my s-o-b:
> 

While I see your point, I don't necessarily see why this fixes it as the
original nodemask may also be a restricted set that ALLOC_NO_WATERMARKS
should ignore. How about this?

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 79100583b9de..dbb08d102d41 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3432,9 +3432,13 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		/*
 		 * Ignore mempolicies if ALLOC_NO_WATERMARKS on the grounds
 		 * the allocation is high priority and these type of
-		 * allocations are system rather than user orientated
+		 * allocations are system rather than user orientated. If a
+		 * cpuset retry occurs then these values persist across the
+		 * retry but that's ok for a context ignoring watermarks.
 		 */
 		ac->zonelist = node_zonelist(numa_node_id(), gfp_mask);
+		ac->high_zoneidx = MAX_NR_ZONES - 1;
+		ac->nodemask = NULL;
 		page = get_page_from_freelist(gfp_mask, order,
 						ALLOC_NO_WATERMARKS, ac);
 		if (page)

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/6] Optimise page alloc/free fast paths followup v2
  2016-05-03  8:50     ` Mel Gorman
@ 2016-05-03 14:33       ` Vlastimil Babka
  -1 siblings, 0 replies; 20+ messages in thread
From: Vlastimil Babka @ 2016-05-03 14:33 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Andrew Morton, Jesper Dangaard Brouer, Linux-MM, LKML

On 05/03/2016 10:50 AM, Mel Gorman wrote:
> On Mon, May 02, 2016 at 10:54:23AM +0200, Vlastimil Babka wrote:
>> On 04/27/2016 04:57 PM, Mel Gorman wrote:
>>> as the patch "mm, page_alloc: inline the fast path of the zonelist iterator"
>>> is fine. The nodemask pointer is the same between cpuset retries. If the
>>> zonelist changes due to ALLOC_NO_WATERMARKS *and* it races with a cpuset
>>> change then there is a second harmless pass through the page allocator.
>>
>> True. But I just realized (while working on direct compaction priorities)
>> that there's another subtle issue with the ALLOC_NO_WATERMARKS part.
>> According to the comment it should be ignoring mempolicies, but it still
>> honours ac.nodemask, and your patch is replacing NULL ac.nodemask with the
>> mempolicy one.
>>
>> I think it's possibly easily fixed outside the fast path like this. If
>> you agree, consider it has my s-o-b:
>>
>
> While I see your point, I don't necessarily see why this fixes it as the
> original nodemask may also be a restricted set that ALLOC_NO_WATERMARKS
> should ignore.

I wasn't so sure about that, so I defensively went with just restoring
the pre-patch behavior. I expect that it's safe to ignore mempolicies
imposed on the process via e.g. taskset/numactl, for a critical kernel
allocation that happens to be done within the process context. But
ignoring a nodemask that's imposed by the kernel allocation itself might
be breaking some expectations. I guess vma policies can also result in a
restricted nodemask, but those should be for userspace allocations and
never ALLOC_NO_WATERMARKS?

> How about this?
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 79100583b9de..dbb08d102d41 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3432,9 +3432,13 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>   		/*
>   		 * Ignore mempolicies if ALLOC_NO_WATERMARKS on the grounds
>   		 * the allocation is high priority and these type of
> -		 * allocations are system rather than user orientated
> +		 * allocations are system rather than user orientated. If a
> +		 * cpuset retry occurs then these values persist across the
> +		 * retry but that's ok for a context ignoring watermarks.
>   		 */
>   		ac->zonelist = node_zonelist(numa_node_id(), gfp_mask);
> +		ac->high_zoneidx = MAX_NR_ZONES - 1;

Wouldn't altering high_zoneidx like this result in e.g. DMA allocation 
ending up in a NORMAL zone? Also high_zoneidx doesn't seem to get 
restricted by mempolicy/nodemask anyway. Maybe you wanted to re-set 
preferred_zoneref? That would make sense, yeah.

> +		ac->nodemask = NULL;
>   		page = get_page_from_freelist(gfp_mask, order,
>   						ALLOC_NO_WATERMARKS, ac);
>   		if (page)
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

Thread overview:
2016-04-27 14:57 [PATCH 0/6] Optimise page alloc/free fast paths followup v2 Mel Gorman
2016-04-27 14:57 ` [PATCH 1/6] mm, page_alloc: Only check PageCompound for high-order pages -fix Mel Gorman
2016-04-27 14:57 ` [PATCH 2/6] mm, page_alloc: move might_sleep_if check to the allocator slowpath -revert Mel Gorman
2016-04-27 14:57 ` [PATCH 3/6] mm, page_alloc: Check once if a zone has isolated pageblocks -fix Mel Gorman
2016-04-27 14:57 ` [PATCH 4/6] mm, page_alloc: un-inline the bad part of free_pages_check Mel Gorman
2016-04-27 14:57 ` [PATCH 5/6] mm, page_alloc: pull out side effects from free_pages_check Mel Gorman
2016-04-27 14:57 ` [PATCH 6/6] mm, page_alloc: don't duplicate code in free_pcp_prepare Mel Gorman
2016-05-02  8:54 ` [PATCH 0/6] Optimise page alloc/free fast paths followup v2 Vlastimil Babka
2016-05-03  8:50   ` Mel Gorman
2016-05-03 14:33     ` Vlastimil Babka
