* [PATCH v2 00/10] try to reduce fragmenting fallbacks
From: Vlastimil Babka @ 2017-02-10 17:23 UTC
  To: linux-mm, Johannes Weiner
  Cc: Joonsoo Kim, David Rientjes, Mel Gorman, linux-kernel,
	kernel-team, Vlastimil Babka

Hi,

this is a v2 of [1] from last year, which was a response to Johannes' worries
about mobility grouping regressions. There are some new patches, and the order
goes from cleanups through "obvious wins" to "just RFC" (the last two patches).
It's all theoretical for now; I'm trying to run some tests, with the usual
problem of not having good workloads and metrics :) But I'd like to hear some
feedback anyway. For now this is based on v4.9.

I think the only substantial new patch is 08/10; the rest are cleanups,
small tweaks and bugfixes.

[1] https://www.spinics.net/lists/linux-mm/msg114380.html

Vlastimil Babka (10):
  mm, compaction: reorder fields in struct compact_control
  mm, compaction: remove redundant watermark check in compact_finished()
  mm, page_alloc: split smallest stolen page in fallback
  mm, page_alloc: count movable pages when stealing from pageblock
  mm, compaction: change migrate_async_suitable() to
    suitable_migration_source()
  mm, compaction: add migratetype to compact_control
  mm, compaction: restrict async compaction to pageblocks of same
    migratetype
  mm, compaction: finish whole pageblock to reduce fragmentation
  mm, page_alloc: disallow migratetype fallback in fastpath
  mm, page_alloc: introduce MIGRATE_MIXED migratetype

 include/linux/mmzone.h         |   6 ++
 include/linux/page-isolation.h |   5 +-
 mm/compaction.c                | 116 +++++++++++++++++-------
 mm/internal.h                  |  14 +--
 mm/page_alloc.c                | 196 +++++++++++++++++++++++++++++------------
 mm/page_isolation.c            |   5 +-
 6 files changed, 246 insertions(+), 96 deletions(-)

-- 
2.11.0

* [PATCH v2 01/10] mm, compaction: reorder fields in struct compact_control
From: Vlastimil Babka @ 2017-02-10 17:23 UTC
  To: linux-mm, Johannes Weiner
  Cc: Joonsoo Kim, David Rientjes, Mel Gorman, linux-kernel,
	kernel-team, Vlastimil Babka

While there are currently (mostly by accident) no holes in struct
compact_control (on x86_64), we are going to add more bool flags, so place
them all together at the end of the structure. While at it, order all
fields from largest to smallest.
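
For illustration, a minimal sketch of why the grouping matters (hypothetical
fields, not taken from this patch; assuming x86_64 with 8-byte longs/pointers
and 1-byte bools):

	/* Hypothetical layouts only, to show alignment padding */
	struct scattered {
		unsigned long a;	/* offset 0,  size 8 */
		bool flag1;		/* offset 8,  size 1, then 7 bytes padding */
		unsigned long b;	/* offset 16, size 8 */
		bool flag2;		/* offset 24, size 1, then 7 bytes padding */
	};				/* sizeof == 32 */

	struct grouped {
		unsigned long a;	/* offset 0,  size 8 */
		unsigned long b;	/* offset 8,  size 8 */
		bool flag1;		/* offset 16, size 1 */
		bool flag2;		/* offset 17, size 1, then 6 bytes padding */
	};				/* sizeof == 24 */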

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/internal.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 537ac9951f5f..da37ddd3db40 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -171,21 +171,21 @@ extern int user_min_free_kbytes;
 struct compact_control {
 	struct list_head freepages;	/* List of free pages to migrate to */
 	struct list_head migratepages;	/* List of pages being migrated */
+	struct zone *zone;
 	unsigned long nr_freepages;	/* Number of isolated free pages */
 	unsigned long nr_migratepages;	/* Number of pages to migrate */
 	unsigned long free_pfn;		/* isolate_freepages search base */
 	unsigned long migrate_pfn;	/* isolate_migratepages search base */
 	unsigned long last_migrated_pfn;/* Not yet flushed page being freed */
+	const gfp_t gfp_mask;		/* gfp mask of a direct compactor */
+	int order;			/* order a direct compactor needs */
+	const unsigned int alloc_flags;	/* alloc flags of a direct compactor */
+	const int classzone_idx;	/* zone index of a direct compactor */
 	enum migrate_mode mode;		/* Async or sync migration mode */
 	bool ignore_skip_hint;		/* Scan blocks even if marked skip */
 	bool ignore_block_suitable;	/* Scan blocks considered unsuitable */
 	bool direct_compaction;		/* False from kcompactd or /proc/... */
 	bool whole_zone;		/* Whole zone should/has been scanned */
-	int order;			/* order a direct compactor needs */
-	const gfp_t gfp_mask;		/* gfp mask of a direct compactor */
-	const unsigned int alloc_flags;	/* alloc flags of a direct compactor */
-	const int classzone_idx;	/* zone index of a direct compactor */
-	struct zone *zone;
 	bool contended;			/* Signal lock or sched contention */
 };
 
-- 
2.11.0

* [PATCH v2 02/10] mm, compaction: remove redundant watermark check in compact_finished()
From: Vlastimil Babka @ 2017-02-10 17:23 UTC
  To: linux-mm, Johannes Weiner
  Cc: Joonsoo Kim, David Rientjes, Mel Gorman, linux-kernel,
	kernel-team, Vlastimil Babka

When detecting whether compaction has succeeded in forming a high-order page,
__compact_finished() employs a watermark check, followed by its own search for
a suitable page in the freelists. This is not ideal for two reasons:

- The watermark check also searches high-order freelists, but has less strict
  criteria wrt fallbacks. It's therefore redundant and a waste of cycles. This
  was different in the past, when the high-order watermark check attempted to
  apply reserves to high-order pages.

- The watermark check might actually fail due to lack of order-0 pages.
  Compaction can't help with that, so there's no point in continuing for that
  reason alone. It's possible that a suitable high-order page already exists,
  in which case the freelist search will find it and terminate compaction.

This patch therefore removes the watermark check. This should save some cycles
and terminate compaction sooner in some cases.
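
With the check gone, the success test reduces to the freelist scan alone;
roughly (a simplified sketch of the remaining logic, with the fallback and
highatomic details omitted):

	/* Direct compactor: Is a suitable page free? */
	for (order = cc->order; order < MAX_ORDER; order++) {
		struct free_area *area = &zone->free_area[order];

		/* A free page of the requested migratetype is enough */
		if (!list_empty(&area->free_list[migratetype]))
			return COMPACT_SUCCESS;

		/* ... otherwise see if a fallback list could be stolen ... */
	}

	return COMPACT_NO_SUITABLE_PAGE;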

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/compaction.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 0409a4ad6ea1..fc88e7b6fe37 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1291,7 +1291,6 @@ static enum compact_result __compact_finished(struct zone *zone, struct compact_
 			    const int migratetype)
 {
 	unsigned int order;
-	unsigned long watermark;
 
 	if (cc->contended || fatal_signal_pending(current))
 		return COMPACT_CONTENDED;
@@ -1319,13 +1318,6 @@ static enum compact_result __compact_finished(struct zone *zone, struct compact_
 	if (is_via_compact_memory(cc->order))
 		return COMPACT_CONTINUE;
 
-	/* Compaction run is not finished if the watermark is not met */
-	watermark = zone->watermark[cc->alloc_flags & ALLOC_WMARK_MASK];
-
-	if (!zone_watermark_ok(zone, cc->order, watermark, cc->classzone_idx,
-							cc->alloc_flags))
-		return COMPACT_CONTINUE;
-
 	/* Direct compactor: Is a suitable page free? */
 	for (order = cc->order; order < MAX_ORDER; order++) {
 		struct free_area *area = &zone->free_area[order];
-- 
2.11.0

* [PATCH v2 03/10] mm, page_alloc: split smallest stolen page in fallback
From: Vlastimil Babka @ 2017-02-10 17:23 UTC
  To: linux-mm, Johannes Weiner
  Cc: Joonsoo Kim, David Rientjes, Mel Gorman, linux-kernel,
	kernel-team, Vlastimil Babka

The __rmqueue_fallback() function is called when there's no free page of the
requested migratetype, and we need to steal from a different one. There are
various heuristics to make this event infrequent and reduce permanent
fragmentation. The main one is to try stealing from a pageblock that has the
most free pages, and possibly steal them all at once and convert the whole
pageblock. Precisely searching for such a pageblock would be expensive, so
instead the heuristic walks the free lists from MAX_ORDER down to the requested
order and assumes that the block with the highest-order free page is likely to
also have the most free pages in total.

Chances are that together with the highest-order page, we also steal pages of
lower orders from the same block. But then we still split the highest-order
page. This is wasteful and can contribute to fragmentation instead of avoiding
it.

This patch thus changes __rmqueue_fallback() to just steal the page(s) and put
them on the freelist of the requested migratetype, and only report whether it
was successful. Then we pick (and possibly split) the smallest page with
__rmqueue_smallest(). This all happens under the zone lock, so nobody can steal
it from us in the process. This should reduce fragmentation due to fallbacks.
At worst we are only stealing a single highest-order page and wasting some
cycles by moving it between lists and then removing it, but fallback is not
exactly a hot path, so that should not be a concern. As a side benefit, the
patch removes some duplicate code by reusing __rmqueue_smallest().
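
The resulting allocation flow, condensed from the __rmqueue() hunk in the diff
below (sketch only; the tracepoint is omitted):

	static struct page *__rmqueue(struct zone *zone, unsigned int order,
							int migratetype)
	{
		struct page *page;

	retry:
		/* Always pick (and split) the smallest suitable free page */
		page = __rmqueue_smallest(zone, order, migratetype);
		if (unlikely(!page)) {
			if (migratetype == MIGRATE_MOVABLE)
				page = __rmqueue_cma_fallback(zone, order);

			/*
			 * Fallback no longer returns a page; it only moves the
			 * stolen page(s) to the freelist of 'migratetype', so
			 * retry the smallest-page search under the same lock.
			 */
			if (!page && __rmqueue_fallback(zone, order, migratetype))
				goto retry;
		}

		return page;
	}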

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/page_alloc.c | 48 ++++++++++++++++++++++++------------------------
 1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6de9440e3ae2..314e6b9ddbc4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1960,14 +1960,24 @@ static bool can_steal_fallback(unsigned int order, int start_mt)
  * use it's pages as requested migratetype in the future.
  */
 static void steal_suitable_fallback(struct zone *zone, struct page *page,
-							  int start_type)
+					 int start_type, bool whole_block)
 {
 	unsigned int current_order = page_order(page);
+	struct free_area *area;
 	int pages;
 
 	/* Take ownership for orders >= pageblock_order */
 	if (current_order >= pageblock_order) {
 		change_pageblock_range(page, current_order, start_type);
+		area = &zone->free_area[current_order];
+		list_move(&page->lru, &area->free_list[start_type]);
+		return;
+	}
+
+	/* We are not allowed to try stealing from the whole block */
+	if (!whole_block) {
+		area = &zone->free_area[current_order];
+		list_move(&page->lru, &area->free_list[start_type]);
 		return;
 	}
 
@@ -2111,8 +2121,13 @@ static void unreserve_highatomic_pageblock(const struct alloc_context *ac)
 	}
 }
 
-/* Remove an element from the buddy allocator from the fallback list */
-static inline struct page *
+/*
+ * Try finding a free buddy page on the fallback list and put it on the free
+ * list of requested migratetype, possibly along with other pages from the same
+ * block, depending on fragmentation avoidance heuristics. Returns true if
+ * fallback was found so that __rmqueue_smallest() can grab it.
+ */
+static inline bool
 __rmqueue_fallback(struct zone *zone, unsigned int order, int start_migratetype)
 {
 	struct free_area *area;
@@ -2133,32 +2148,16 @@ __rmqueue_fallback(struct zone *zone, unsigned int order, int start_migratetype)
 
 		page = list_first_entry(&area->free_list[fallback_mt],
 						struct page, lru);
-		if (can_steal)
-			steal_suitable_fallback(zone, page, start_migratetype);
 
-		/* Remove the page from the freelists */
-		area->nr_free--;
-		list_del(&page->lru);
-		rmv_page_order(page);
-
-		expand(zone, page, order, current_order, area,
-					start_migratetype);
-		/*
-		 * The pcppage_migratetype may differ from pageblock's
-		 * migratetype depending on the decisions in
-		 * find_suitable_fallback(). This is OK as long as it does not
-		 * differ for MIGRATE_CMA pageblocks. Those can be used as
-		 * fallback only via special __rmqueue_cma_fallback() function
-		 */
-		set_pcppage_migratetype(page, start_migratetype);
+		steal_suitable_fallback(zone, page, start_migratetype, can_steal);
 
 		trace_mm_page_alloc_extfrag(page, order, current_order,
 			start_migratetype, fallback_mt);
 
-		return page;
+		return true;
 	}
 
-	return NULL;
+	return false;
 }
 
 /*
@@ -2170,13 +2169,14 @@ static struct page *__rmqueue(struct zone *zone, unsigned int order,
 {
 	struct page *page;
 
+retry:
 	page = __rmqueue_smallest(zone, order, migratetype);
 	if (unlikely(!page)) {
 		if (migratetype == MIGRATE_MOVABLE)
 			page = __rmqueue_cma_fallback(zone, order);
 
-		if (!page)
-			page = __rmqueue_fallback(zone, order, migratetype);
+		if (!page && __rmqueue_fallback(zone, order, migratetype))
+			goto retry;
 	}
 
 	trace_mm_page_alloc_zone_locked(page, order, migratetype);
-- 
2.11.0

* [PATCH v2 04/10] mm, page_alloc: count movable pages when stealing from pageblock
From: Vlastimil Babka @ 2017-02-10 17:23 UTC
  To: linux-mm, Johannes Weiner
  Cc: Joonsoo Kim, David Rientjes, Mel Gorman, linux-kernel,
	kernel-team, Vlastimil Babka

When stealing pages from a pageblock of a different migratetype, we count how
many free pages were stolen, and change the pageblock's migratetype if more
than half of the pageblock was free. This might be too conservative, as there
might be other pages that are not free, but were allocated with the same
migratetype as our allocation requested.

While we cannot determine the migratetype of allocated pages precisely (at
least without the page_owner functionality enabled), we can count pages that
compaction would try to isolate for migration - those are either on LRU or
__PageMovable(). The rest can be assumed to be MIGRATE_RECLAIMABLE or
MIGRATE_UNMOVABLE, which we cannot easily distinguish. This counting can be
done as part of free page stealing with little additional overhead.

The page stealing code is changed so that it considers free pages plus pages
of the "good" migratetype for the decision whether to change the pageblock's
migratetype.

The result should be a more accurate migratetype of pageblocks wrt the actual
pages in the pageblocks when stealing from semi-occupied pageblocks. This
should help the efficiency of page grouping by mobility.
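
A worked example of the new claim rule (hypothetical numbers; assuming
pageblock_order == 9, i.e. 512 pages per pageblock and a threshold of half
the block, 256 pages):

	/*
	 * A MIGRATE_MOVABLE allocation falls back to an UNMOVABLE pageblock:
	 *
	 *   free_pages = 180   (free pages moved to the MOVABLE free list)
	 *   good_pages =  90   (allocated, but on LRU or __PageMovable())
	 *
	 * Old rule:  180            <  256  -> pageblock migratetype kept
	 * New rule:  180 + 90 = 270 >= 256  -> pageblock marked MIGRATE_MOVABLE
	 */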

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/page-isolation.h |  5 +---
 mm/page_alloc.c                | 54 +++++++++++++++++++++++++++++++++---------
 mm/page_isolation.c            |  5 ++--
 3 files changed, 47 insertions(+), 17 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 047d64706f2a..d4cd2014fa6f 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -33,10 +33,7 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 			 bool skip_hwpoisoned_pages);
 void set_pageblock_migratetype(struct page *page, int migratetype);
 int move_freepages_block(struct zone *zone, struct page *page,
-				int migratetype);
-int move_freepages(struct zone *zone,
-			  struct page *start_page, struct page *end_page,
-			  int migratetype);
+				int migratetype, int *num_movable);
 
 /*
  * Changes migrate type in [start_pfn, end_pfn) to be MIGRATE_ISOLATE.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 314e6b9ddbc4..a7d33818610f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1844,9 +1844,9 @@ static inline struct page *__rmqueue_cma_fallback(struct zone *zone,
  * Note that start_page and end_pages are not aligned on a pageblock
  * boundary. If alignment is required, use move_freepages_block()
  */
-int move_freepages(struct zone *zone,
+static int move_freepages(struct zone *zone,
 			  struct page *start_page, struct page *end_page,
-			  int migratetype)
+			  int migratetype, int *num_movable)
 {
 	struct page *page;
 	unsigned int order;
@@ -1863,6 +1863,9 @@ int move_freepages(struct zone *zone,
 	VM_BUG_ON(page_zone(start_page) != page_zone(end_page));
 #endif
 
+	if (num_movable)
+		*num_movable = 0;
+
 	for (page = start_page; page <= end_page;) {
 		/* Make sure we are not inadvertently changing nodes */
 		VM_BUG_ON_PAGE(page_to_nid(page) != zone_to_nid(zone), page);
@@ -1873,6 +1876,14 @@ int move_freepages(struct zone *zone,
 		}
 
 		if (!PageBuddy(page)) {
+			/*
+			 * We assume that pages that could be isolated for
+			 * migration are movable. But we don't actually try
+			 * isolating, as that would be expensive.
+			 */
+			if (num_movable && (PageLRU(page) || __PageMovable(page)))
+				(*num_movable)++;
+
 			page++;
 			continue;
 		}
@@ -1888,7 +1899,7 @@ int move_freepages(struct zone *zone,
 }
 
 int move_freepages_block(struct zone *zone, struct page *page,
-				int migratetype)
+				int migratetype, int *num_movable)
 {
 	unsigned long start_pfn, end_pfn;
 	struct page *start_page, *end_page;
@@ -1905,7 +1916,8 @@ int move_freepages_block(struct zone *zone, struct page *page,
 	if (!zone_spans_pfn(zone, end_pfn))
 		return 0;
 
-	return move_freepages(zone, start_page, end_page, migratetype);
+	return move_freepages(zone, start_page, end_page, migratetype,
+								num_movable);
 }
 
 static void change_pageblock_range(struct page *pageblock_page,
@@ -1960,11 +1972,12 @@ static bool can_steal_fallback(unsigned int order, int start_mt)
  * use it's pages as requested migratetype in the future.
  */
 static void steal_suitable_fallback(struct zone *zone, struct page *page,
-					 int start_type, bool whole_block)
+					int start_type, bool whole_block)
 {
 	unsigned int current_order = page_order(page);
 	struct free_area *area;
-	int pages;
+	int free_pages, good_pages;
+	int old_block_type;
 
 	/* Take ownership for orders >= pageblock_order */
 	if (current_order >= pageblock_order) {
@@ -1981,10 +1994,29 @@ static void steal_suitable_fallback(struct zone *zone, struct page *page,
 		return;
 	}
 
-	pages = move_freepages_block(zone, page, start_type);
+	free_pages = move_freepages_block(zone, page, start_type,
+						&good_pages);
+	/*
+	 * good_pages is now the number of movable pages, but if we
+	 * want UNMOVABLE or RECLAIMABLE allocation, it's more tricky
+	 */
+	if (start_type != MIGRATE_MOVABLE) {
+		/*
+		 * If we are falling back to MIGRATE_MOVABLE pageblock,
+		 * treat all non-movable pages as good. If it's UNMOVABLE
+		 * falling back to RECLAIMABLE or vice versa, be conservative
+		 * as we can't distinguish the exact migratetype.
+		 */
+		old_block_type = get_pageblock_migratetype(page);
+		if (old_block_type == MIGRATE_MOVABLE)
+			good_pages = pageblock_nr_pages
+						- free_pages - good_pages;
+		else
+			good_pages = 0;
+	}
 
-	/* Claim the whole block if over half of it is free */
-	if (pages >= (1 << (pageblock_order-1)) ||
+	/* Claim the whole block if over half of it is free or good type */
+	if (free_pages + good_pages >= (1 << (pageblock_order-1)) ||
 			page_group_by_mobility_disabled)
 		set_pageblock_migratetype(page, start_type);
 }
@@ -2056,7 +2088,7 @@ static void reserve_highatomic_pageblock(struct page *page, struct zone *zone,
 			!is_migrate_isolate(mt) && !is_migrate_cma(mt)) {
 		zone->nr_reserved_highatomic += pageblock_nr_pages;
 		set_pageblock_migratetype(page, MIGRATE_HIGHATOMIC);
-		move_freepages_block(zone, page, MIGRATE_HIGHATOMIC);
+		move_freepages_block(zone, page, MIGRATE_HIGHATOMIC, NULL);
 	}
 
 out_unlock:
@@ -2113,7 +2145,7 @@ static void unreserve_highatomic_pageblock(const struct alloc_context *ac)
 			 * may increase.
 			 */
 			set_pageblock_migratetype(page, ac->migratetype);
-			move_freepages_block(zone, page, ac->migratetype);
+			move_freepages_block(zone, page, ac->migratetype, NULL);
 			spin_unlock_irqrestore(&zone->lock, flags);
 			return;
 		}
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index a5594bfcc5ed..29c2f9b9aba7 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -66,7 +66,8 @@ static int set_migratetype_isolate(struct page *page,
 
 		set_pageblock_migratetype(page, MIGRATE_ISOLATE);
 		zone->nr_isolate_pageblock++;
-		nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE);
+		nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE,
+									NULL);
 
 		__mod_zone_freepage_state(zone, -nr_pages, migratetype);
 	}
@@ -120,7 +121,7 @@ static void unset_migratetype_isolate(struct page *page, unsigned migratetype)
 	 * pageblock scanning for freepage moving.
 	 */
 	if (!isolated_page) {
-		nr_pages = move_freepages_block(zone, page, migratetype);
+		nr_pages = move_freepages_block(zone, page, migratetype, NULL);
 		__mod_zone_freepage_state(zone, nr_pages, migratetype);
 	}
 	set_pageblock_migratetype(page, migratetype);
-- 
2.11.0

* [PATCH v2 05/10] mm, compaction: change migrate_async_suitable() to suitable_migration_source()
From: Vlastimil Babka @ 2017-02-10 17:23 UTC
  To: linux-mm, Johannes Weiner
  Cc: Joonsoo Kim, David Rientjes, Mel Gorman, linux-kernel,
	kernel-team, Vlastimil Babka

Preparation for making the decisions more complex and dependent on
compact_control flags. No functional change.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/mmzone.h |  5 +++++
 mm/compaction.c        | 19 +++++++++++--------
 2 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 0f088f3a2fed..fd60a2b2d25d 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -74,6 +74,11 @@ extern char * const migratetype_names[MIGRATE_TYPES];
 #  define is_migrate_cma_page(_page) false
 #endif
 
+static inline bool is_migrate_movable(int mt)
+{
+	return is_migrate_cma(mt) || mt == MIGRATE_MOVABLE;
+}
+
 #define for_each_migratetype_order(order, type) \
 	for (order = 0; order < MAX_ORDER; order++) \
 		for (type = 0; type < MIGRATE_TYPES; type++)
diff --git a/mm/compaction.c b/mm/compaction.c
index fc88e7b6fe37..6c477025c3da 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -88,11 +88,6 @@ static void map_pages(struct list_head *list)
 	list_splice(&tmp_list, list);
 }
 
-static inline bool migrate_async_suitable(int migratetype)
-{
-	return is_migrate_cma(migratetype) || migratetype == MIGRATE_MOVABLE;
-}
-
 #ifdef CONFIG_COMPACTION
 
 int PageMovable(struct page *page)
@@ -996,6 +991,15 @@ isolate_migratepages_range(struct compact_control *cc, unsigned long start_pfn,
 #endif /* CONFIG_COMPACTION || CONFIG_CMA */
 #ifdef CONFIG_COMPACTION
 
+static bool suitable_migration_source(struct compact_control *cc,
+							struct page *page)
+{
+	if (cc->mode != MIGRATE_ASYNC)
+		return true;
+
+	return is_migrate_movable(get_pageblock_migratetype(page));
+}
+
 /* Returns true if the page is within a block suitable for migration to */
 static bool suitable_migration_target(struct compact_control *cc,
 							struct page *page)
@@ -1015,7 +1019,7 @@ static bool suitable_migration_target(struct compact_control *cc,
 	}
 
 	/* If the block is MIGRATE_MOVABLE or MIGRATE_CMA, allow migration */
-	if (migrate_async_suitable(get_pageblock_migratetype(page)))
+	if (is_migrate_movable(get_pageblock_migratetype(page)))
 		return true;
 
 	/* Otherwise skip the block */
@@ -1250,8 +1254,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 		 * Async compaction is optimistic to see if the minimum amount
 		 * of work satisfies the allocation.
 		 */
-		if (cc->mode == MIGRATE_ASYNC &&
-		    !migrate_async_suitable(get_pageblock_migratetype(page)))
+		if (!suitable_migration_source(cc, page))
 			continue;
 
 		/* Perform the isolation */
-- 
2.11.0

* [PATCH v2 06/10] mm, compaction: add migratetype to compact_control
From: Vlastimil Babka @ 2017-02-10 17:23 UTC
  To: linux-mm, Johannes Weiner
  Cc: Joonsoo Kim, David Rientjes, Mel Gorman, linux-kernel,
	kernel-team, Vlastimil Babka

Preparation patch. We are going to need the migratetype at lower layers than
compact_zone() and compact_finished().

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/compaction.c | 15 +++++++--------
 mm/internal.h   |  1 +
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 6c477025c3da..b7094700712b 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1290,10 +1290,11 @@ static inline bool is_via_compact_memory(int order)
 	return order == -1;
 }
 
-static enum compact_result __compact_finished(struct zone *zone, struct compact_control *cc,
-			    const int migratetype)
+static enum compact_result __compact_finished(struct zone *zone,
+						struct compact_control *cc)
 {
 	unsigned int order;
+	const int migratetype = cc->migratetype;
 
 	if (cc->contended || fatal_signal_pending(current))
 		return COMPACT_CONTENDED;
@@ -1349,12 +1350,11 @@ static enum compact_result __compact_finished(struct zone *zone, struct compact_
 }
 
 static enum compact_result compact_finished(struct zone *zone,
-			struct compact_control *cc,
-			const int migratetype)
+			struct compact_control *cc)
 {
 	int ret;
 
-	ret = __compact_finished(zone, cc, migratetype);
+	ret = __compact_finished(zone, cc);
 	trace_mm_compaction_finished(zone, cc->order, ret);
 	if (ret == COMPACT_NO_SUITABLE_PAGE)
 		ret = COMPACT_CONTINUE;
@@ -1487,9 +1487,9 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 	enum compact_result ret;
 	unsigned long start_pfn = zone->zone_start_pfn;
 	unsigned long end_pfn = zone_end_pfn(zone);
-	const int migratetype = gfpflags_to_migratetype(cc->gfp_mask);
 	const bool sync = cc->mode != MIGRATE_ASYNC;
 
+	cc->migratetype = gfpflags_to_migratetype(cc->gfp_mask);
 	ret = compaction_suitable(zone, cc->order, cc->alloc_flags,
 							cc->classzone_idx);
 	/* Compaction is likely to fail */
@@ -1539,8 +1539,7 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 
 	migrate_prep_local();
 
-	while ((ret = compact_finished(zone, cc, migratetype)) ==
-						COMPACT_CONTINUE) {
+	while ((ret = compact_finished(zone, cc)) == COMPACT_CONTINUE) {
 		int err;
 
 		switch (isolate_migratepages(zone, cc)) {
diff --git a/mm/internal.h b/mm/internal.h
index da37ddd3db40..888f33cc7641 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -179,6 +179,7 @@ struct compact_control {
 	unsigned long last_migrated_pfn;/* Not yet flushed page being freed */
 	const gfp_t gfp_mask;		/* gfp mask of a direct compactor */
 	int order;			/* order a direct compactor needs */
+	int migratetype;		/* migratetype of direct compactor */
 	const unsigned int alloc_flags;	/* alloc flags of a direct compactor */
 	const int classzone_idx;	/* zone index of a direct compactor */
 	enum migrate_mode mode;		/* Async or sync migration mode */
-- 
2.11.0

* [PATCH v2 07/10] mm, compaction: restrict async compaction to pageblocks of same migratetype
From: Vlastimil Babka @ 2017-02-10 17:23 UTC
  To: linux-mm, Johannes Weiner
  Cc: Joonsoo Kim, David Rientjes, Mel Gorman, linux-kernel,
	kernel-team, Vlastimil Babka

The migrate scanner in async compaction is currently limited to MIGRATE_MOVABLE
pageblocks. This is a heuristic intended to reduce latency, based on the
assumption that non-MOVABLE pageblocks are unlikely to contain movable pages.

However, with the exception of THPs, most high-order allocations are not
movable. Should the async compaction succeed, this increases the chance that
the non-MOVABLE allocations will fall back to a MOVABLE pageblock, making the
long-term fragmentation worse.

This patch attempts to help the situation by changing async direct compaction
so that the migrate scanner only scans the pageblocks of the requested
migratetype. If it's a non-MOVABLE type and there are such pageblocks that do
contain movable pages, chances are that the allocation can succeed within one
of those pageblocks, removing the need for a fallback. If that fails, the
subsequent sync attempt will ignore this restriction.
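
To summarize the new heuristic (restating the suitable_migration_source()
change from the diff below as a sketch):

	/*
	 * Pageblocks visited by the migrate scanner in *async direct*
	 * compaction after this patch (sync compaction, and non-direct
	 * compaction such as kcompactd, remain unrestricted):
	 *
	 *   requested migratetype    pageblocks scanned
	 *   ---------------------    ---------------------------
	 *   MIGRATE_MOVABLE          MIGRATE_MOVABLE, MIGRATE_CMA
	 *   MIGRATE_UNMOVABLE        MIGRATE_UNMOVABLE only
	 *   MIGRATE_RECLAIMABLE      MIGRATE_RECLAIMABLE only
	 */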

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/compaction.c | 11 +++++++++--
 mm/page_alloc.c | 20 +++++++++++++-------
 2 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index b7094700712b..84ef44c3b1c9 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -994,10 +994,17 @@ isolate_migratepages_range(struct compact_control *cc, unsigned long start_pfn,
 static bool suitable_migration_source(struct compact_control *cc,
 							struct page *page)
 {
-	if (cc->mode != MIGRATE_ASYNC)
+	int block_mt;
+
+	if ((cc->mode != MIGRATE_ASYNC) || !cc->direct_compaction)
 		return true;
 
-	return is_migrate_movable(get_pageblock_migratetype(page));
+	block_mt = get_pageblock_migratetype(page);
+
+	if (cc->migratetype == MIGRATE_MOVABLE)
+		return is_migrate_movable(block_mt);
+	else
+		return block_mt == cc->migratetype;
 }
 
 /* Returns true if the page is within a block suitable for migration to */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a7d33818610f..6d9ba640a12d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3523,6 +3523,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 						struct alloc_context *ac)
 {
 	bool can_direct_reclaim = gfp_mask & __GFP_DIRECT_RECLAIM;
+	const bool costly_order = order > PAGE_ALLOC_COSTLY_ORDER;
 	struct page *page = NULL;
 	unsigned int alloc_flags;
 	unsigned long did_some_progress;
@@ -3572,12 +3573,17 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 
 	/*
 	 * For costly allocations, try direct compaction first, as it's likely
-	 * that we have enough base pages and don't need to reclaim. Don't try
-	 * that for allocations that are allowed to ignore watermarks, as the
-	 * ALLOC_NO_WATERMARKS attempt didn't yet happen.
+	 * that we have enough base pages and don't need to reclaim. For non-
+	 * movable high-order allocations, do that as well, as compaction will
+	 * try prevent permanent fragmentation by migrating from blocks of the
+	 * same migratetype.
+	 * Don't try this for allocations that are allowed to ignore
+	 * watermarks, as the ALLOC_NO_WATERMARKS attempt didn't yet happen.
 	 */
-	if (can_direct_reclaim && order > PAGE_ALLOC_COSTLY_ORDER &&
-		!gfp_pfmemalloc_allowed(gfp_mask)) {
+	if (can_direct_reclaim &&
+			(costly_order ||
+			   (order > 0 && ac->migratetype != MIGRATE_MOVABLE))
+			&& !gfp_pfmemalloc_allowed(gfp_mask)) {
 		page = __alloc_pages_direct_compact(gfp_mask, order,
 						alloc_flags, ac,
 						INIT_COMPACT_PRIORITY,
@@ -3589,7 +3595,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		 * Checks for costly allocations with __GFP_NORETRY, which
 		 * includes THP page fault allocations
 		 */
-		if (gfp_mask & __GFP_NORETRY) {
+		if (costly_order && (gfp_mask & __GFP_NORETRY)) {
 			/*
 			 * If compaction is deferred for high-order allocations,
 			 * it is because sync compaction recently failed. If
@@ -3684,7 +3690,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	 * Do not retry costly high order allocations unless they are
 	 * __GFP_REPEAT
 	 */
-	if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & __GFP_REPEAT))
+	if (costly_order && !(gfp_mask & __GFP_REPEAT))
 		goto nopage;
 
 	/* Make sure we know about allocations which stall for too long */
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH v2 08/10] mm, compaction: finish whole pageblock to reduce fragmentation
  2017-02-10 17:23 ` Vlastimil Babka
@ 2017-02-10 17:23   ` Vlastimil Babka
  -1 siblings, 0 replies; 92+ messages in thread
From: Vlastimil Babka @ 2017-02-10 17:23 UTC (permalink / raw)
  To: linux-mm, Johannes Weiner
  Cc: Joonsoo Kim, David Rientjes, Mel Gorman, linux-kernel,
	kernel-team, Vlastimil Babka

The main goal of direct compaction is to form a high-order page for allocation,
but it should also help against long-term fragmentation when possible. Most
lower-than-pageblock-order compactions are for non-movable allocations, which
means that if we compact in a movable pageblock and terminate as soon as we
create the high-order page, it's unlikely that the fallback heuristics will
claim the whole block. Instead, the result might be a single unmovable page in
a pageblock otherwise full of movable pages, and the next unmovable allocation
might pick yet another pageblock, increasing long-term fragmentation.

To help against such scenarios, this patch changes the termination criteria for
compaction so that the current pageblock is finished even when the high-order
page already exists. Note that the high-order page might have formed elsewhere
in the zone due to parallel activity; this patch doesn't try to detect that.

This is only done with sync compaction, because async compaction is limited to
pageblocks of the same migratetype, where it cannot result in a migratetype
fallback. (Async compaction also eagerly skips order-aligned blocks where
isolation fails, which is against the goal of migrating away as much of the
pageblock as possible.)

As a result of this patch, long-term memory fragmentation should be reduced.
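
Condensed, the new logic inside __compact_finished(), once a stealable page of
another migratetype's freelist has been found, looks roughly like the sketch
below; this is a simplified restatement of the hunk that follows, not a literal
drop-in:

  /* sketch of the added termination rule in __compact_finished() */
  if (migratetype == MIGRATE_MOVABLE)
          return COMPACT_SUCCESS;         /* any pageblock is fine */

  if (cc->mode == MIGRATE_ASYNC ||
      IS_ALIGNED(cc->migrate_pfn, pageblock_nr_pages))
          return COMPACT_SUCCESS;         /* async, or block boundary reached */

  /* sync compaction for a non-movable request: finish the pageblock first */
  cc->finishing_block = true;
  return COMPACT_CONTINUE;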

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/compaction.c | 35 +++++++++++++++++++++++++++++++++--
 mm/internal.h   |  1 +
 2 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 84ef44c3b1c9..cef77a5fffea 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1329,6 +1329,17 @@ static enum compact_result __compact_finished(struct zone *zone,
 	if (is_via_compact_memory(cc->order))
 		return COMPACT_CONTINUE;
 
+	if (cc->finishing_block) {
+		/*
+		 * We have finished the pageblock, but better check again that
+		 * we really succeeded.
+		 */
+		if (IS_ALIGNED(cc->migrate_pfn, pageblock_nr_pages))
+			cc->finishing_block = false;
+		else
+			return COMPACT_CONTINUE;
+	}
+
 	/* Direct compactor: Is a suitable page free? */
 	for (order = cc->order; order < MAX_ORDER; order++) {
 		struct free_area *area = &zone->free_area[order];
@@ -1349,8 +1360,28 @@ static enum compact_result __compact_finished(struct zone *zone,
 		 * other migratetype buddy lists.
 		 */
 		if (find_suitable_fallback(area, order, migratetype,
-						true, &can_steal) != -1)
-			return COMPACT_SUCCESS;
+						true, &can_steal) != -1) {
+
+			/* movable pages are OK in any pageblock */
+			if (migratetype == MIGRATE_MOVABLE)
+				return COMPACT_SUCCESS;
+
+			/*
+			 * We are stealing for a non-movable allocation. Make
+			 * sure we finish compacting the current pageblock
+			 * first so it is as free as possible and we won't
+			 * have to steal another one soon. This only applies
+			 * to sync compaction, as async compaction operates
+			 * on pageblocks of the same migratetype.
+			 */
+			if (cc->mode == MIGRATE_ASYNC ||
+				IS_ALIGNED(cc->migrate_pfn, pageblock_nr_pages)) {
+				return COMPACT_SUCCESS;
+			} else {
+				cc->finishing_block = true;
+				return COMPACT_CONTINUE;
+			}
+		}
 	}
 
 	return COMPACT_NO_SUITABLE_PAGE;
diff --git a/mm/internal.h b/mm/internal.h
index 888f33cc7641..cdb33c957906 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -188,6 +188,7 @@ struct compact_control {
 	bool direct_compaction;		/* False from kcompactd or /proc/... */
 	bool whole_zone;		/* Whole zone should/has been scanned */
 	bool contended;			/* Signal lock or sched contention */
+	bool finishing_block;		/* Finishing current pageblock */
 };
 
 unsigned long
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [RFC v2 09/10] mm, page_alloc: disallow migratetype fallback in fastpath
  2017-02-10 17:23 ` Vlastimil Babka
@ 2017-02-10 17:23   ` Vlastimil Babka
  -1 siblings, 0 replies; 92+ messages in thread
From: Vlastimil Babka @ 2017-02-10 17:23 UTC (permalink / raw)
  To: linux-mm, Johannes Weiner
  Cc: Joonsoo Kim, David Rientjes, Mel Gorman, linux-kernel,
	kernel-team, Vlastimil Babka

The previous patch adjusted async compaction so that it helps against long-term
fragmentation when compacting for a non-MOVABLE high-order allocation.
The goal of this patch is to force such allocations to go through compaction
once before being allowed to fall back to a pageblock of a different migratetype
(e.g. MOVABLE). In contexts where compaction is not allowed (and for order-0
allocations), this delayed fallback can still help by trying a different zone
where fallback might not be needed, and by potentially waking up kswapd
earlier.
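
The resulting ordering can be sketched as follows (simplified; the hunks below
are authoritative). The fast path never sets ALLOC_FALLBACK, so __rmqueue()
cannot steal from another migratetype there (ALLOC_NO_WATERMARKS also permits
fallback, and rmqueue_bulk() for the per-cpu lists still always allows it); the
slow path only adds the flag after the initial async compaction attempt:

  /* __rmqueue() sketch: migratetype fallback is now gated by the caller */
  page = __rmqueue_smallest(zone, order, migratetype);
  if (!page && migratetype == MIGRATE_MOVABLE)
          page = __rmqueue_cma_fallback(zone, order);
  if (!page && allow_fallback && __rmqueue_fallback(zone, order, migratetype))
          page = __rmqueue_smallest(zone, order, migratetype);    /* retry */

  /* __alloc_pages_slowpath() sketch */
  /* ... async direct compaction attempt from the previous patch ... */
  alloc_flags |= ALLOC_FALLBACK;  /* only now may __rmqueue() fall back */
  /* retry: reclaim and further compaction as before */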

Not-yet-signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/compaction.c | 22 +++++++++++++++++-----
 mm/internal.h   |  2 ++
 mm/page_alloc.c | 15 +++++++++++----
 3 files changed, 30 insertions(+), 9 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index cef77a5fffea..bb18d21c6a56 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1357,9 +1357,11 @@ static enum compact_result __compact_finished(struct zone *zone,
 #endif
 		/*
 		 * Job done if allocation would steal freepages from
-		 * other migratetype buddy lists.
+		 * other migratetype buddy lists. This is not allowed
+		 * for async direct compaction.
 		 */
-		if (find_suitable_fallback(area, order, migratetype,
+		if (!cc->prevent_fallback &&
+			find_suitable_fallback(area, order, migratetype,
 						true, &can_steal) != -1) {
 
 			/* movable pages are OK in any pageblock */
@@ -1530,8 +1532,17 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 	cc->migratetype = gfpflags_to_migratetype(cc->gfp_mask);
 	ret = compaction_suitable(zone, cc->order, cc->alloc_flags,
 							cc->classzone_idx);
-	/* Compaction is likely to fail */
-	if (ret == COMPACT_SUCCESS || ret == COMPACT_SKIPPED)
+	/*
+	 * Compaction should not be needed. If we don't allow stealing from
+	 * pageblocks of different migratetype, the watermark checks cannot
+	 * distinguish that, so assume we would need to steal, and leave the
+	 * thorough check to compact_finished().
+	 */
+	if (ret == COMPACT_SUCCESS && !cc->prevent_fallback)
+		return ret;
+
+	/* Compaction is likely to fail due to insufficient free pages */
+	if (ret == COMPACT_SKIPPED)
 		return ret;
 
 	/* huh, compaction_suitable is returning something unexpected */
@@ -1699,7 +1710,8 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
 		.direct_compaction = true,
 		.whole_zone = (prio == MIN_COMPACT_PRIORITY),
 		.ignore_skip_hint = (prio == MIN_COMPACT_PRIORITY),
-		.ignore_block_suitable = (prio == MIN_COMPACT_PRIORITY)
+		.ignore_block_suitable = (prio == MIN_COMPACT_PRIORITY),
+		.prevent_fallback = (prio == COMPACT_PRIO_ASYNC)
 	};
 	INIT_LIST_HEAD(&cc.freepages);
 	INIT_LIST_HEAD(&cc.migratepages);
diff --git a/mm/internal.h b/mm/internal.h
index cdb33c957906..1b7a89a9a9d7 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -189,6 +189,7 @@ struct compact_control {
 	bool whole_zone;		/* Whole zone should/has been scanned */
 	bool contended;			/* Signal lock or sched contention */
 	bool finishing_block;		/* Finishing current pageblock */
+	bool prevent_fallback;		/* Stealing migratetypes not allowed */
 };
 
 unsigned long
@@ -467,6 +468,7 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
 #define ALLOC_HIGH		0x20 /* __GFP_HIGH set */
 #define ALLOC_CPUSET		0x40 /* check for correct cpuset */
 #define ALLOC_CMA		0x80 /* allow allocations from CMA areas */
+#define ALLOC_FALLBACK		0x100 /* allow fallback of migratetype */
 
 enum ttu_flags;
 struct tlbflush_unmap_batch;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6d9ba640a12d..5270be8325fd 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2197,7 +2197,7 @@ __rmqueue_fallback(struct zone *zone, unsigned int order, int start_migratetype)
  * Call me with the zone->lock already held.
  */
 static struct page *__rmqueue(struct zone *zone, unsigned int order,
-				int migratetype)
+				int migratetype, bool allow_fallback)
 {
 	struct page *page;
 
@@ -2207,7 +2207,8 @@ static struct page *__rmqueue(struct zone *zone, unsigned int order,
 		if (migratetype == MIGRATE_MOVABLE)
 			page = __rmqueue_cma_fallback(zone, order);
 
-		if (!page && __rmqueue_fallback(zone, order, migratetype))
+		if (!page && allow_fallback &&
+				__rmqueue_fallback(zone, order, migratetype))
 			goto retry;
 	}
 
@@ -2228,7 +2229,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 
 	spin_lock(&zone->lock);
 	for (i = 0; i < count; ++i) {
-		struct page *page = __rmqueue(zone, order, migratetype);
+		struct page *page = __rmqueue(zone, order, migratetype, true);
 		if (unlikely(page == NULL))
 			break;
 
@@ -2661,7 +2662,10 @@ struct page *buffered_rmqueue(struct zone *preferred_zone,
 					trace_mm_page_alloc_zone_locked(page, order, migratetype);
 			}
 			if (!page)
-				page = __rmqueue(zone, order, migratetype);
+				page = __rmqueue(zone, order, migratetype,
+						alloc_flags &
+						(ALLOC_FALLBACK |
+						 ALLOC_NO_WATERMARKS));
 		} while (page && check_new_pages(page, order));
 		spin_unlock(&zone->lock);
 		if (!page)
@@ -3616,6 +3620,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		}
 	}
 
+	/* async direct compaction didn't help, now allow fallback */
+	alloc_flags |= ALLOC_FALLBACK;
+
 retry:
 	/* Ensure kswapd doesn't accidentally go to sleep as long as we loop */
 	if (gfp_mask & __GFP_KSWAPD_RECLAIM)
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [RFC v2 10/10] mm, page_alloc: introduce MIGRATE_MIXED migratetype
  2017-02-10 17:23 ` Vlastimil Babka
@ 2017-02-10 17:23   ` Vlastimil Babka
  -1 siblings, 0 replies; 92+ messages in thread
From: Vlastimil Babka @ 2017-02-10 17:23 UTC (permalink / raw)
  To: linux-mm, Johannes Weiner
  Cc: Joonsoo Kim, David Rientjes, Mel Gorman, linux-kernel,
	kernel-team, Vlastimil Babka

Page mobility grouping tries to minimize the number of pageblocks that contain
non-migratable pages by distinguishing MOVABLE, UNMOVABLE and RECLAIMABLE
pageblock migratetypes. Changing a pageblock's migratetype is allowed if an
allocation of a different migratetype steals more than half of the pages from it.

That means it's possible to have pageblocks that contain some UNMOVABLE and
RECLAIMABLE pages, yet they are marked as MOVABLE, and the next time stealing
happens, another MOVABLE pageblock might get polluted. On the other hand, if we
duly marked all polluted pageblocks (even those polluted by just a single page) as UNMOVABLE or
RECLAIMABLE, further allocations and freeing of pages would tend to spread over
all of them, and there would be little pressure for them to eventually become
fully free and MOVABLE.

This patch thus introduces a new migratetype MIGRATE_MIXED, which is intended
to mark pageblocks that contain some UNMOVABLE or RECLAIMABLE pages, but not
enough of them to mark the whole pageblock as such. These pageblocks become the
preferred fallback before an UNMOVABLE/RECLAIMABLE allocation steals from a
MOVABLE pageblock, or vice versa. This should help page mobility grouping
(a worked example follows the list below):

- UNMOVABLE and RECLAIMABLE allocations will try to be satisfied from their
  respective pageblocks. If these are full, polluting other pageblocks is
  limited to MIGRATE_MIXED pageblocks. MIGRATE_MOVABLE pageblocks remain pure.
  If a temporary pressure for UNMOVABLE and RECLAIMABLE allocations disappears
  and they can again be satisfied without fallback, the MIXED pageblocks might
  eventually fully recover from the polluted pages.

- MOVABLE allocations will exhaust MOVABLE pageblocks first, then fall back to
  MIXED pageblocks second. This leaves free pages in UNMOVABLE and RECLAIMABLE
  pageblocks as a last resort, so that UNMOVABLE and RECLAIMABLE allocations
  don't have to fall back as much.
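
As a worked example, assume 4KB pages and pageblock_order == 9, so
pageblock_nr_pages == 512 and the half-block threshold is 256 pages (the
numbers below are purely illustrative):

- An UNMOVABLE allocation falls back into a MOVABLE pageblock with 100 free
  pages and 380 movable pages in use. The "good" count is 512 - 100 - 380 = 32,
  and 100 + 32 = 132 < 256, so the block is not claimed as UNMOVABLE; since a
  MOVABLE block has just been polluted, it is marked MIGRATE_MIXED.

- A MOVABLE allocation falls back into an UNMOVABLE pageblock with 300 free
  pages and 150 movable pages in use. 300 + 150 = 450 >= 256, but the block is
  not fully free (450 < 512), so instead of becoming MOVABLE it is marked
  MIGRATE_MIXED.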

Not-yet-signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/mmzone.h |  1 +
 mm/compaction.c        | 14 ++++++++--
 mm/page_alloc.c        | 73 +++++++++++++++++++++++++++++++++++++++-----------
 3 files changed, 71 insertions(+), 17 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index fd60a2b2d25d..d9417f5171d8 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -41,6 +41,7 @@ enum {
 	MIGRATE_RECLAIMABLE,
 	MIGRATE_PCPTYPES,	/* the number of types on the pcp lists */
 	MIGRATE_HIGHATOMIC = MIGRATE_PCPTYPES,
+	MIGRATE_MIXED,
 #ifdef CONFIG_CMA
 	/*
 	 * MIGRATE_CMA migration type is designed to mimic the way
diff --git a/mm/compaction.c b/mm/compaction.c
index bb18d21c6a56..d2d7bfeffe7e 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1001,6 +1001,9 @@ static bool suitable_migration_source(struct compact_control *cc,
 
 	block_mt = get_pageblock_migratetype(page);
 
+	if (block_mt == MIGRATE_MIXED)
+		return true;
+
 	if (cc->migratetype == MIGRATE_MOVABLE)
 		return is_migrate_movable(block_mt);
 	else
@@ -1011,6 +1014,8 @@ static bool suitable_migration_source(struct compact_control *cc,
 static bool suitable_migration_target(struct compact_control *cc,
 							struct page *page)
 {
+	int block_mt;
+
 	if (cc->ignore_block_suitable)
 		return true;
 
@@ -1025,8 +1030,13 @@ static bool suitable_migration_target(struct compact_control *cc,
 			return false;
 	}
 
-	/* If the block is MIGRATE_MOVABLE or MIGRATE_CMA, allow migration */
-	if (is_migrate_movable(get_pageblock_migratetype(page)))
+	block_mt = get_pageblock_migratetype(page);
+
+	/*
+	 * If the block is MIGRATE_MOVABLE or MIGRATE_CMA, allow migration.
+	 * Allow also mixed pageblocks so we are not so restrictive.
+	 */
+	if (is_migrate_movable(block_mt) || block_mt == MIGRATE_MIXED)
 		return true;
 
 	/* Otherwise skip the block */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5270be8325fd..1a93813e7962 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -234,6 +234,7 @@ char * const migratetype_names[MIGRATE_TYPES] = {
 	"Movable",
 	"Reclaimable",
 	"HighAtomic",
+	"Mixed",
 #ifdef CONFIG_CMA
 	"CMA",
 #endif
@@ -1817,9 +1818,9 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
  * the free lists for the desirable migrate type are depleted
  */
 static int fallbacks[MIGRATE_TYPES][4] = {
-	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,   MIGRATE_TYPES },
-	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,   MIGRATE_TYPES },
-	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_TYPES },
+	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MIXED, MIGRATE_MOVABLE,   MIGRATE_TYPES },
+	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MIXED, MIGRATE_MOVABLE,   MIGRATE_TYPES },
+	[MIGRATE_MOVABLE]     = { MIGRATE_MIXED, MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_TYPES },
 #ifdef CONFIG_CMA
 	[MIGRATE_CMA]         = { MIGRATE_TYPES }, /* Never used */
 #endif
@@ -1977,7 +1978,7 @@ static void steal_suitable_fallback(struct zone *zone, struct page *page,
 	unsigned int current_order = page_order(page);
 	struct free_area *area;
 	int free_pages, good_pages;
-	int old_block_type;
+	int old_block_type, new_block_type;
 
 	/* Take ownership for orders >= pageblock_order */
 	if (current_order >= pageblock_order) {
@@ -1991,11 +1992,27 @@ static void steal_suitable_fallback(struct zone *zone, struct page *page,
 	if (!whole_block) {
 		area = &zone->free_area[current_order];
 		list_move(&page->lru, &area->free_list[start_type]);
-		return;
+		free_pages = 1 << current_order;
+		/* TODO: We didn't scan the block, so be pessimistic */
+		good_pages = 0;
+	} else {
+		free_pages = move_freepages_block(zone, page, start_type,
+							&good_pages);
+		/*
+		 * good_pages is now the number of movable pages, but if we
+		 * want UNMOVABLE or RECLAIMABLE, we consider all non-movable
+		 * as good (but we can't fully distinguish them)
+		 */
+		if (start_type != MIGRATE_MOVABLE)
+			good_pages = pageblock_nr_pages - free_pages -
+								good_pages;
 	}
 
 	free_pages = move_freepages_block(zone, page, start_type,
 						&good_pages);
+
+	new_block_type = old_block_type = get_pageblock_migratetype(page);
+
 	/*
 	 * good_pages is now the number of movable pages, but if we
 	 * want UNMOVABLE or RECLAIMABLE allocation, it's more tricky
@@ -2007,7 +2024,6 @@ static void steal_suitable_fallback(struct zone *zone, struct page *page,
 		 * falling back to RECLAIMABLE or vice versa, be conservative
 		 * as we can't distinguish the exact migratetype.
 		 */
-		old_block_type = get_pageblock_migratetype(page);
 		if (old_block_type == MIGRATE_MOVABLE)
 			good_pages = pageblock_nr_pages
 						- free_pages - good_pages;
@@ -2015,10 +2031,34 @@ static void steal_suitable_fallback(struct zone *zone, struct page *page,
 			good_pages = 0;
 	}
 
-	/* Claim the whole block if over half of it is free or good type */
-	if (free_pages + good_pages >= (1 << (pageblock_order-1)) ||
-			page_group_by_mobility_disabled)
-		set_pageblock_migratetype(page, start_type);
+	if (page_group_by_mobility_disabled) {
+		new_block_type = start_type;
+	} else if (free_pages + good_pages >= (1 << (pageblock_order-1))) {
+		/*
+		 * Claim the whole block if over half of it is free or good
+		 * type. The exception is the transition to MIGRATE_MOVABLE
+		 * where we require it to be fully free so that MIGRATE_MOVABLE
+		 * pageblocks consist of purely movable pages. So if we steal
+		 * less than whole pageblock, mark it as MIGRATE_MIXED.
+		 */
+		if ((start_type == MIGRATE_MOVABLE) &&
+				free_pages + good_pages < pageblock_nr_pages)
+			new_block_type = MIGRATE_MIXED;
+		else
+			new_block_type = start_type;
+	} else {
+		/*
+		 * We didn't steal enough to change the block's migratetype.
+		 * But if we are stealing from a MOVABLE block for a
+		 * non-MOVABLE allocation, mark the block as MIXED.
+		 */
+		if (old_block_type == MIGRATE_MOVABLE
+					&& start_type != MIGRATE_MOVABLE)
+			new_block_type = MIGRATE_MIXED;
+	}
+
+	if (new_block_type != old_block_type)
+		set_pageblock_migratetype(page, new_block_type);
 }
 
 /*
@@ -2560,16 +2600,18 @@ int __isolate_free_page(struct page *page, unsigned int order)
 	rmv_page_order(page);
 
 	/*
-	 * Set the pageblock if the isolated page is at least half of a
-	 * pageblock
+	 * Set the pageblock's migratetype to MIXED if the isolated page is
+	 * at least half of a pageblock, MOVABLE if at least whole pageblock
 	 */
 	if (order >= pageblock_order - 1) {
 		struct page *endpage = page + (1 << order) - 1;
+		int new_mt = (order >= pageblock_order) ?
+					MIGRATE_MOVABLE : MIGRATE_MIXED;
 		for (; page < endpage; page += pageblock_nr_pages) {
 			int mt = get_pageblock_migratetype(page);
-			if (!is_migrate_isolate(mt) && !is_migrate_cma(mt))
-				set_pageblock_migratetype(page,
-							  MIGRATE_MOVABLE);
+
+			if (!is_migrate_isolate(mt) && !is_migrate_movable(mt))
+				set_pageblock_migratetype(page, new_mt);
 		}
 	}
 
@@ -4252,6 +4294,7 @@ static void show_migration_types(unsigned char type)
 		[MIGRATE_MOVABLE]	= 'M',
 		[MIGRATE_RECLAIMABLE]	= 'E',
 		[MIGRATE_HIGHATOMIC]	= 'H',
+		[MIGRATE_MIXED]		= 'M',
 #ifdef CONFIG_CMA
 		[MIGRATE_CMA]		= 'C',
 #endif
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* Re: [PATCH v2 01/10] mm, compaction: reorder fields in struct compact_control
  2017-02-10 17:23   ` Vlastimil Babka
@ 2017-02-13 10:49     ` Mel Gorman
  -1 siblings, 0 replies; 92+ messages in thread
From: Mel Gorman @ 2017-02-13 10:49 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Johannes Weiner, Joonsoo Kim, David Rientjes,
	linux-kernel, kernel-team

On Fri, Feb 10, 2017 at 06:23:34PM +0100, Vlastimil Babka wrote:
> While there are currently (mostly by accident) no holes in struct
> compact_control (on x86_64), we are going to add more bool flags, so place
> them all together at the end of the structure. While at it, just order all
> fields from largest to smallest.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Acked-by: Mel Gorman <mgorman@techsingularity.net>

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v2 02/10] mm, compaction: remove redundant watermark check in compact_finished()
  2017-02-10 17:23   ` Vlastimil Babka
@ 2017-02-13 10:49     ` Mel Gorman
  -1 siblings, 0 replies; 92+ messages in thread
From: Mel Gorman @ 2017-02-13 10:49 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Johannes Weiner, Joonsoo Kim, David Rientjes,
	linux-kernel, kernel-team

On Fri, Feb 10, 2017 at 06:23:35PM +0100, Vlastimil Babka wrote:
> When detecting whether compaction has succeeded in forming a high-order page,
> __compact_finished() employs a watermark check, followed by an own search for
> a suitable page in the freelists. This is not ideal for two reasons:
> 
> - The watermark check also searches high-order freelists, but has a less strict
>   criteria wrt fallback. It's therefore redundant and waste of cycles. This was
>   different in the past when high-order watermark check attempted to apply
>   reserves to high-order pages.
> 
> - The watermark check might actually fail due to lack of order-0 pages.
>   Compaction can't help with that, so there's no point in continuing because of
>   that. It's possible that high-order page still exists and it terminates.
> 
> This patch therefore removes the watermark check. This should save some cycles
> and terminate compaction sooner in some cases.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Acked-by: Mel Gorman <mgorman@techsingularity.net>

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v2 03/10] mm, page_alloc: split smallest stolen page in fallback
  2017-02-10 17:23   ` Vlastimil Babka
@ 2017-02-13 10:51     ` Mel Gorman
  -1 siblings, 0 replies; 92+ messages in thread
From: Mel Gorman @ 2017-02-13 10:51 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Johannes Weiner, Joonsoo Kim, David Rientjes,
	linux-kernel, kernel-team

On Fri, Feb 10, 2017 at 06:23:36PM +0100, Vlastimil Babka wrote:
> The __rmqueue_fallback() function is called when there's no free page of the
> requested migratetype, and we need to steal from a different one. There are
> various heuristics to make this event infrequent and reduce permanent
> fragmentation. The main one is to try stealing from a pageblock that has the
> most free pages, and possibly steal them all at once and convert the whole
> pageblock. Precisely searching for such a pageblock would be expensive, so
> instead the heuristic walks the free lists from MAX_ORDER down to the requested
> order and assumes that the block with the highest-order free page is likely to
> also have the most free pages in total.
> 
> Chances are that together with the highest-order page, we also steal pages of
> lower orders from the same block. But then we still split the highest-order
> page. This is wasteful and can contribute to fragmentation instead of avoiding
> it.
> 

The original intent was that if an allocation request was stealing from a
pageblock, taking the largest free page would reduce the likelihood of
another steal by the same type in the near future.

> This patch thus changes __rmqueue_fallback() to just steal the page(s) and put
> them on the freelist of the requested migratetype, and only report whether it
> was successful. Then we pick (and eventually split) the smallest page with
> __rmqueue_smallest().  This all happens under the zone lock, so nobody can
> steal it from us in the process. This should reduce fragmentation due to
> fallbacks. At worst we only steal a single highest-order page and waste some
> cycles by moving it between lists and then removing it, but fallback is not
> exactly a hot path, so that should not be a concern. As a side benefit, the
> patch removes some duplicate code by reusing __rmqueue_smallest().
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

But conceptually this is better so

Acked-by: Mel Gorman <mgorman@techsingularity.net>

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v2 04/10] mm, page_alloc: count movable pages when stealing from pageblock
  2017-02-10 17:23   ` Vlastimil Babka
@ 2017-02-13 10:53     ` Mel Gorman
  -1 siblings, 0 replies; 92+ messages in thread
From: Mel Gorman @ 2017-02-13 10:53 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Johannes Weiner, Joonsoo Kim, David Rientjes,
	linux-kernel, kernel-team

On Fri, Feb 10, 2017 at 06:23:37PM +0100, Vlastimil Babka wrote:
> When stealing pages from pageblock of a different migratetype, we count how
> many free pages were stolen, and change the pageblock's migratetype if more
> than half of the pageblock was free. This might be too conservative, as there
> might be other pages that are not free, but were allocated with the same
> migratetype as our allocation requested.
> 
> While we cannot determine the migratetype of allocated pages precisely (at
> least without the page_owner functionality enabled), we can count pages that
> compaction would try to isolate for migration - those are either on LRU or
> __PageMovable(). The rest can be assumed to be MIGRATE_RECLAIMABLE or
> MIGRATE_UNMOVABLE, which we cannot easily distinguish. This counting can be
> done as part of free page stealing with little additional overhead.
> 
> The page stealing code is changed so that it considers free pages plus pages
> of the "good" migratetype for the decision whether to change pageblock's
> migratetype.
> 
> The result should be more accurate migratetype of pageblocks wrt the actual
> pages in the pageblocks, when stealing from semi-occupied pageblocks. This
> should help the efficiency of page grouping by mobility.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

While it's fine now, it may be necessary to unify the checks that
compaction and the page allocator use for determining if the page can
move. In general, this is still a better idea for a modest amount of
overhead in a path that is considered slow anyway so

Acked-by: Mel Gorman <mgorman@techsingularity.net>

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v2 05/10] mm, compaction: change migrate_async_suitable() to suitable_migration_source()
  2017-02-10 17:23   ` Vlastimil Babka
@ 2017-02-13 10:53     ` Mel Gorman
  -1 siblings, 0 replies; 92+ messages in thread
From: Mel Gorman @ 2017-02-13 10:53 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Johannes Weiner, Joonsoo Kim, David Rientjes,
	linux-kernel, kernel-team

On Fri, Feb 10, 2017 at 06:23:38PM +0100, Vlastimil Babka wrote:
> Preparation for making the decisions more complex and depending on
> compact_control flags. No functional change.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Acked-by: Mel Gorman <mgorman@techsingularity.net>

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v2 06/10] mm, compaction: add migratetype to compact_control
  2017-02-10 17:23   ` Vlastimil Babka
@ 2017-02-13 10:53     ` Mel Gorman
  -1 siblings, 0 replies; 92+ messages in thread
From: Mel Gorman @ 2017-02-13 10:53 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Johannes Weiner, Joonsoo Kim, David Rientjes,
	linux-kernel, kernel-team

On Fri, Feb 10, 2017 at 06:23:39PM +0100, Vlastimil Babka wrote:
> Preparation patch. We are going to need migratetype at lower layers than
> compact_zone() and compact_finished().
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Acked-by: Mel Gorman <mgorman@techsingularity.net>

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v2 03/10] mm, page_alloc: split smallest stolen page in fallback
  2017-02-13 10:51     ` Mel Gorman
@ 2017-02-13 10:54       ` Vlastimil Babka
  -1 siblings, 0 replies; 92+ messages in thread
From: Vlastimil Babka @ 2017-02-13 10:54 UTC (permalink / raw)
  To: Mel Gorman
  Cc: linux-mm, Johannes Weiner, Joonsoo Kim, David Rientjes,
	linux-kernel, kernel-team

On 02/13/2017 11:51 AM, Mel Gorman wrote:
> On Fri, Feb 10, 2017 at 06:23:36PM +0100, Vlastimil Babka wrote:
>> The __rmqueue_fallback() function is called when there's no free page of
>> requested migratetype, and we need to steal from a different one. There are
>> various heuristics to make this event infrequent and reduce permanent
>> fragmentation. The main one is to try stealing from a pageblock that has the
>> most free pages, and possibly steal them all at once and convert the whole
>> pageblock. Precise searching for such pageblock would be expensive, so instead
>> the heuristics walks the free lists from MAX_ORDER down to requested order and
>> assumes that the block with highest-order free page is likely to also have the
>> most free pages in total.
>>
>> Chances are that together with the highest-order page, we steal also pages of
>> lower orders from the same block. But then we still split the highest order
>> page. This is wasteful and can contribute to fragmentation instead of avoiding
>> it.
>>
> 
> The original intent was that if an allocation request was stealing from a
> pageblock, taking the largest free page would reduce the likelihood of
> another steal by the same type in the near future.

I understand the intent and tried to explain that in the first
paragraph. This patch doesn't change that, we still select the pageblock
for stealing based on the largest free page we find. But if we also
manage to steal some smaller pages from the same pageblock, we will
split the smallest one instead of the largest one.
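
As a toy illustration of the new flow (plain userspace C, not the actual
mm/page_alloc.c code), with the steal modeled as just moving per-order
free counts between migratetypes:

#include <stdio.h>

#define MAX_ORDER	11
#define MT_UNMOVABLE	0
#define MT_MOVABLE	1

/* free_count[mt][order]: free pages of that order per migratetype */
static int free_count[2][MAX_ORDER];

/* fallback: move all free pages of the stolen block to start_type */
static void steal_block(int stolen_type, int start_type)
{
	for (int o = 0; o < MAX_ORDER; o++) {
		free_count[start_type][o] += free_count[stolen_type][o];
		free_count[stolen_type][o] = 0;
	}
}

/* __rmqueue_smallest()-like: split the smallest free page of at least
 * the requested order */
static int rmqueue_smallest(int mt, int order)
{
	for (int o = order; o < MAX_ORDER; o++) {
		if (!free_count[mt][o])
			continue;
		free_count[mt][o]--;
		/* expand(): return the buddies created by the split */
		while (o > order)
			free_count[mt][--o]++;
		return 0;
	}
	return -1;
}

int main(void)
{
	/* the stolen MOVABLE block had an order-2 and an order-9 free page */
	free_count[MT_MOVABLE][2] = 1;
	free_count[MT_MOVABLE][9] = 1;

	steal_block(MT_MOVABLE, MT_UNMOVABLE);
	rmqueue_smallest(MT_UNMOVABLE, 1);	/* order-1 allocation */

	/* the order-9 page survives intact; only the order-2 page was split */
	printf("order-9 pages left: %d, order-1 pages left: %d\n",
	       free_count[MT_UNMOVABLE][9], free_count[MT_UNMOVABLE][1]);
	return 0;
}

Before this patch, the order-9 page would have been the one split.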

>> This patch thus changes __rmqueue_fallback() to just steal the page(s) and put
>> them on the freelist of the requested migratetype, and only report whether it
>> was successful. Then we pick (and eventually split) the smallest page with
>> __rmqueue_smallest().  This all happens under zone lock, so nobody can steal it
>> from us in the process. This should reduce fragmentation due to fallbacks. At
>> worst we are only stealing a single highest-order page and waste some cycles by
>> moving it between lists and then removing it, but fallback is not exactly hot
>> path so that should not be a concern. As a side benefit the patch removes some
>> duplicate code by reusing __rmqueue_smallest().
>>
>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> 
> But conceptually this is better so
> 
> Acked-by: Mel Gorman <mgorman@techsingularity.net>

Thanks!

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v2 07/10] mm, compaction: restrict async compaction to pageblocks of same migratetype
  2017-02-10 17:23   ` Vlastimil Babka
@ 2017-02-13 10:56     ` Mel Gorman
  -1 siblings, 0 replies; 92+ messages in thread
From: Mel Gorman @ 2017-02-13 10:56 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Johannes Weiner, Joonsoo Kim, David Rientjes,
	linux-kernel, kernel-team

On Fri, Feb 10, 2017 at 06:23:40PM +0100, Vlastimil Babka wrote:
> The migrate scanner in async compaction is currently limited to MIGRATE_MOVABLE
> pageblocks. This is a heuristic intended to reduce latency, based on the
> assumption that non-MOVABLE pageblocks are unlikely to contain movable pages.
> 
> However, with the exception of THP's, most high-order allocations are not
> movable. Should the async compaction succeed, this increases the chance that
> the non-MOVABLE allocations will fallback to a MOVABLE pageblock, making the
> long-term fragmentation worse.
> 
> This patch attempts to help the situation by changing async direct compaction
> so that the migrate scanner only scans the pageblocks of the requested
> migratetype. If it's a non-MOVABLE type and there are such pageblocks that do
> contain movable pages, chances are that the allocation can succeed within one
> of such pageblocks, removing the need for a fallback. If that fails, the
> subsequent sync attempt will ignore this restriction.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Ok, I really like this idea. The original thinking behind async
compaction was to reduce latency, but that was at a time when THP
allocations were stalling for long periods. Now that the default has
changed, this idea makes a lot of sense. A few months ago I would have
thought that this would increase the chance of a high-order allocation
for the stack failing, but with VMAP_STACK this is much less of a
concern. It would be very nice to know whether this patch reduces or
increases the number of times the extfrag tracepoint is triggered. Do
you have that data?

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v2 08/10] mm, compaction: finish whole pageblock to reduce fragmentation
  2017-02-10 17:23   ` Vlastimil Babka
@ 2017-02-13 10:57     ` Mel Gorman
  -1 siblings, 0 replies; 92+ messages in thread
From: Mel Gorman @ 2017-02-13 10:57 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Johannes Weiner, Joonsoo Kim, David Rientjes,
	linux-kernel, kernel-team

On Fri, Feb 10, 2017 at 06:23:41PM +0100, Vlastimil Babka wrote:
> The main goal of direct compaction is to form a high-order page for allocation,
> but it should also help against long-term fragmentation when possible. Most
> lower-than-pageblock-order compactions are for non-movable allocations, which
> means that if we compact in a movable pageblock and terminate as soon as we
> create the high-order page, it's unlikely that the fallback heuristics will
> claim the whole block. Instead there might be a single unmovable page in a
> pageblock full of movable pages, and the next unmovable allocation might pick
> another pageblock and increase long-term fragmentation.
> 
> To help against such scenarios, this patch changes the termination criteria for
> compaction so that the current pageblock is finished even though the high-order
> page already exists. Note that it might be possible that the high-order page
> formed elsewhere in the zone due to parallel activity, but this patch doesn't
> try to detect that.
> 
> This is only done with sync compaction, because async compaction is limited to
> pageblock of the same migratetype, where it cannot result in a migratetype
> fallback. (Async compaction also eagerly skips order-aligned blocks where
> isolation fails, which is against the goal of migrating away as much of the
> pageblock as possible.)
> 
> As a result of this patch, long-term memory fragmentation should be reduced.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Acked-by: Mel Gorman <mgorman@techsingularity.net>

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v2 00/10] try to reduce fragmenting fallbacks
  2017-02-10 17:23 ` Vlastimil Babka
@ 2017-02-13 11:07   ` Mel Gorman
  -1 siblings, 0 replies; 92+ messages in thread
From: Mel Gorman @ 2017-02-13 11:07 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Johannes Weiner, Joonsoo Kim, David Rientjes,
	linux-kernel, kernel-team

On Fri, Feb 10, 2017 at 06:23:33PM +0100, Vlastimil Babka wrote:
> Hi,
> 
> this is a v2 of [1] from last year, which was a response to Johanes' worries
> about mobility grouping regressions. There are some new patches and the order
> goes from cleanups to "obvious wins" towards "just RFC" (last two patches).
> But it's all theoretical for now, I'm trying to run some tests with the usual
> problem of not having good workloads and metrics :) But I'd like to hear some
> feedback anyway. For now this is based on v4.9.
> 
> I think the only substantial new patch is 08/10, the rest is some cleanups,
> small tweaks and bugfixes.
> 

By and large, I like the series, particularly patches 7 and 8. I cannot
make up my mind about the RFC patches 9 and 10 yet. Conceptually they
seem sound but they are much more far reaching than the rest of the
series.

It would be nice if patches 1-8 could be treated in isolation with data
on the number of extfrag events triggered, time spent in compaction and
the success rate. Patches 9 and 10 are tricy enough that they would need
data per patch where as patches 1-8 should be ok with data gathered for
the whole series.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v2 04/10] mm, page_alloc: count movable pages when stealing from pageblock
  2017-02-10 17:23   ` Vlastimil Babka
@ 2017-02-14 10:07     ` Xishi Qiu
  -1 siblings, 0 replies; 92+ messages in thread
From: Xishi Qiu @ 2017-02-14 10:07 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Johannes Weiner, Joonsoo Kim, David Rientjes,
	Mel Gorman, linux-kernel, kernel-team

On 2017/2/11 1:23, Vlastimil Babka wrote:

> When stealing pages from pageblock of a different migratetype, we count how
> many free pages were stolen, and change the pageblock's migratetype if more
> than half of the pageblock was free. This might be too conservative, as there
> might be other pages that are not free, but were allocated with the same
> migratetype as our allocation requested.
> 
> While we cannot determine the migratetype of allocated pages precisely (at
> least without the page_owner functionality enabled), we can count pages that
> compaction would try to isolate for migration - those are either on LRU or
> __PageMovable(). The rest can be assumed to be MIGRATE_RECLAIMABLE or
> MIGRATE_UNMOVABLE, which we cannot easily distinguish. This counting can be
> done as part of free page stealing with little additional overhead.
> 
> The page stealing code is changed so that it considers free pages plus pages
> of the "good" migratetype for the decision whether to change pageblock's
> migratetype.
> 
> The result should be more accurate migratetype of pageblocks wrt the actual
> pages in the pageblocks, when stealing from semi-occupied pageblocks. This
> should help the efficiency of page grouping by mobility.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Hi Vlastimil,

How about these two changes?

1. If we steal some free pages, we will add these pages at the head of the
start_migratetype list, so they will be allocated more easily and more of them
end up fixed with the new migratetype. So how about using list_move_tail
instead of list_move?

__rmqueue_fallback
	steal_suitable_fallback
		move_freepages_block
			move_freepages
				list_move

2. When doing expand() - list_add(), usually the list is empty, but in the
following case, the list is not empty, because we did move_freepages_block()
before.

__rmqueue_fallback
	steal_suitable_fallback
		move_freepages_block  // move to the list of start_migratetype
	expand  // split the largest order
		list_add  // add to the list of start_migratetype

So how about using list_add_tail instead of list_add? Then we can merge the
large block again as soon as the pages are freed.
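
A tiny userspace sketch of why the head/tail choice in point 1 matters,
assuming allocations take pages from the head of the free list (as the
buddy allocator does):

#include <stdio.h>

#define NPAGES 4

/* model a free list as a small array; index 0 is the head, which is
 * where allocations are taken from */
static int list[NPAGES];
static int nr;

static void add_head(int page)
{
	for (int i = nr; i > 0; i--)
		list[i] = list[i - 1];
	list[0] = page;
	nr++;
}

static void add_tail(int page)
{
	list[nr++] = page;
}

static int alloc_from_head(void)
{
	int page = list[0];

	for (int i = 0; i < nr - 1; i++)
		list[i] = list[i + 1];
	nr--;
	return page;
}

int main(void)
{
	add_tail(100);		/* page that was already on the list */
	add_head(200);		/* stolen page, list_move()-style */
	printf("head insert: next allocation gets page %d\n",
	       alloc_from_head());

	nr = 0;
	add_tail(100);
	add_tail(200);		/* stolen page, list_move_tail()-style */
	printf("tail insert: next allocation gets page %d\n",
	       alloc_from_head());
	return 0;
}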

Thanks,
Xishi Qiu

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v2 01/10] mm, compaction: reorder fields in struct compact_control
  2017-02-10 17:23   ` Vlastimil Babka
@ 2017-02-14 16:33     ` Johannes Weiner
  -1 siblings, 0 replies; 92+ messages in thread
From: Johannes Weiner @ 2017-02-14 16:33 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Joonsoo Kim, David Rientjes, Mel Gorman, linux-kernel,
	kernel-team

On Fri, Feb 10, 2017 at 06:23:34PM +0100, Vlastimil Babka wrote:
> While currently there are (mostly by accident) no holes in struct
> compact_control (on x86_64), but we are going to add more bool flags, so place
> them all together to the end of the structure. While at it, just order all
> fields from largest to smallest.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v2 02/10] mm, compaction: remove redundant watermark check in compact_finished()
  2017-02-10 17:23   ` Vlastimil Babka
@ 2017-02-14 16:34     ` Johannes Weiner
  -1 siblings, 0 replies; 92+ messages in thread
From: Johannes Weiner @ 2017-02-14 16:34 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Joonsoo Kim, David Rientjes, Mel Gorman, linux-kernel,
	kernel-team

On Fri, Feb 10, 2017 at 06:23:35PM +0100, Vlastimil Babka wrote:
> When detecting whether compaction has succeeded in forming a high-order page,
> __compact_finished() employs a watermark check, followed by an own search for
> a suitable page in the freelists. This is not ideal for two reasons:
> 
> - The watermark check also searches high-order freelists, but has a less strict
>   criteria wrt fallback. It's therefore redundant and waste of cycles. This was
>   different in the past when high-order watermark check attempted to apply
>   reserves to high-order pages.
> 
> - The watermark check might actually fail due to lack of order-0 pages.
>   Compaction can't help with that, so there's no point in continuing because of
>   that. It's possible that high-order page still exists and it terminates.
> 
> This patch therefore removes the watermark check. This should save some cycles
> and terminate compaction sooner in some cases.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v2 03/10] mm, page_alloc: split smallest stolen page in fallback
  2017-02-10 17:23   ` Vlastimil Babka
@ 2017-02-14 16:59     ` Johannes Weiner
  -1 siblings, 0 replies; 92+ messages in thread
From: Johannes Weiner @ 2017-02-14 16:59 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Joonsoo Kim, David Rientjes, Mel Gorman, linux-kernel,
	kernel-team

On Fri, Feb 10, 2017 at 06:23:36PM +0100, Vlastimil Babka wrote:
> The __rmqueue_fallback() function is called when there's no free page of
> requested migratetype, and we need to steal from a different one. There are
> various heuristics to make this event infrequent and reduce permanent
> fragmentation. The main one is to try stealing from a pageblock that has the
> most free pages, and possibly steal them all at once and convert the whole
> pageblock. Precise searching for such pageblock would be expensive, so instead
> the heuristics walks the free lists from MAX_ORDER down to requested order and
> assumes that the block with highest-order free page is likely to also have the
> most free pages in total.
> 
> Chances are that together with the highest-order page, we steal also pages of
> lower orders from the same block. But then we still split the highest order
> page. This is wasteful and can contribute to fragmentation instead of avoiding
> it.
> 
> This patch thus changes __rmqueue_fallback() to just steal the page(s) and put
> them on the freelist of the requested migratetype, and only report whether it
> was successful. Then we pick (and eventually split) the smallest page with
> __rmqueue_smallest().  This all happens under zone lock, so nobody can steal it
> from us in the process. This should reduce fragmentation due to fallbacks. At
> worst we are only stealing a single highest-order page and waste some cycles by
> moving it between lists and then removing it, but fallback is not exactly hot
> path so that should not be a concern. As a side benefit the patch removes some
> duplicate code by reusing __rmqueue_smallest().
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

It took me a second to understand what you're doing here, but this is
clever. Finding a suitable fallback still goes by biggest block to
make future stealing less probable, but when we do steal the entire
block and move_freepages_block() has migrated all the free chunks of
that block over to the new migratetype list, we might as well then try
to allocate from the smallest chunk available in the stolen block.

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v2 04/10] mm, page_alloc: count movable pages when stealing from pageblock
  2017-02-10 17:23   ` Vlastimil Babka
@ 2017-02-14 18:10     ` Johannes Weiner
  -1 siblings, 0 replies; 92+ messages in thread
From: Johannes Weiner @ 2017-02-14 18:10 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Joonsoo Kim, David Rientjes, Mel Gorman, linux-kernel,
	kernel-team

On Fri, Feb 10, 2017 at 06:23:37PM +0100, Vlastimil Babka wrote:
> When stealing pages from pageblock of a different migratetype, we count how
> many free pages were stolen, and change the pageblock's migratetype if more
> than half of the pageblock was free. This might be too conservative, as there
> might be other pages that are not free, but were allocated with the same
> migratetype as our allocation requested.
> 
> While we cannot determine the migratetype of allocated pages precisely (at
> least without the page_owner functionality enabled), we can count pages that
> compaction would try to isolate for migration - those are either on LRU or
> __PageMovable(). The rest can be assumed to be MIGRATE_RECLAIMABLE or
> MIGRATE_UNMOVABLE, which we cannot easily distinguish. This counting can be
> done as part of free page stealing with little additional overhead.
> 
> The page stealing code is changed so that it considers free pages plus pages
> of the "good" migratetype for the decision whether to change pageblock's
> migratetype.
> 
> The result should be more accurate migratetype of pageblocks wrt the actual
> pages in the pageblocks, when stealing from semi-occupied pageblocks. This
> should help the efficiency of page grouping by mobility.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

That makes sense to me. I have just one nit about the patch:

> @@ -1981,10 +1994,29 @@ static void steal_suitable_fallback(struct zone *zone, struct page *page,
>  		return;
>  	}
>  
> -	pages = move_freepages_block(zone, page, start_type);
> +	free_pages = move_freepages_block(zone, page, start_type,
> +						&good_pages);
> +	/*
> +	 * good_pages is now the number of movable pages, but if we
> +	 * want UNMOVABLE or RECLAIMABLE allocation, it's more tricky
> +	 */
> +	if (start_type != MIGRATE_MOVABLE) {
> +		/*
> +		 * If we are falling back to MIGRATE_MOVABLE pageblock,
> +		 * treat all non-movable pages as good. If it's UNMOVABLE
> +		 * falling back to RECLAIMABLE or vice versa, be conservative
> +		 * as we can't distinguish the exact migratetype.
> +		 */
> +		old_block_type = get_pageblock_migratetype(page);
> +		if (old_block_type == MIGRATE_MOVABLE)
> +			good_pages = pageblock_nr_pages
> +						- free_pages - good_pages;

This line had me scratch my head for a while, and I think it's mostly
because of the variable naming and the way the comments are phrased.

Could you use a variable called movable_pages to pass to and be filled
in by move_freepages_block?

And instead of good_pages something like starttype_pages or
alike_pages or st_pages or mt_pages or something, to indicate the
number of pages that are comparable to the allocation's migratetype?

> -	/* Claim the whole block if over half of it is free */
> -	if (pages >= (1 << (pageblock_order-1)) ||
> +	/* Claim the whole block if over half of it is free or good type */
> +	if (free_pages + good_pages >= (1 << (pageblock_order-1)) ||
>  			page_group_by_mobility_disabled)
>  		set_pageblock_migratetype(page, start_type);

This would then read

	if (free_pages + alike_pages ...)

which I think would be more descriptive.

The comment leading the entire section following move_freepages_block
could then say something like "If a sufficient number of pages in the
block are either free or of comparable migratability to our
allocation, claim the whole block." Followed by the caveats of how we
determine this migratability.

Or maybe even the function. The comment above the function seems out
of date after this patch.
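
To illustrate what I mean with the naming, a toy calculation with
made-up numbers (not the actual patch):

#include <stdbool.h>
#include <stdio.h>

#define pageblock_order		9
#define pageblock_nr_pages	(1 << pageblock_order)

/* claim the block if at least half of it is free or of a like type */
static bool should_claim_block(int free_pages, int alike_pages)
{
	return free_pages + alike_pages >= (1 << (pageblock_order - 1));
}

int main(void)
{
	/* an UNMOVABLE request stealing from a MOVABLE block: every
	 * allocated page that is not movable counts as "alike" */
	int free_pages = 100;
	int movable_pages = 200;	/* as counted by move_freepages_block() */
	int alike_pages = pageblock_nr_pages - free_pages - movable_pages;

	printf("free=%d alike=%d -> claim whole block: %s\n",
	       free_pages, alike_pages,
	       should_claim_block(free_pages, alike_pages) ? "yes" : "no");
	return 0;
}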

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v2 05/10] mm, compaction: change migrate_async_suitable() to suitable_migration_source()
  2017-02-10 17:23   ` Vlastimil Babka
@ 2017-02-14 18:12     ` Johannes Weiner
  -1 siblings, 0 replies; 92+ messages in thread
From: Johannes Weiner @ 2017-02-14 18:12 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Joonsoo Kim, David Rientjes, Mel Gorman, linux-kernel,
	kernel-team

On Fri, Feb 10, 2017 at 06:23:38PM +0100, Vlastimil Babka wrote:
> Preparation for making the decisions more complex and depending on
> compact_control flags. No functional change.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v2 06/10] mm, compaction: add migratetype to compact_control
  2017-02-10 17:23   ` Vlastimil Babka
@ 2017-02-14 18:15     ` Johannes Weiner
  -1 siblings, 0 replies; 92+ messages in thread
From: Johannes Weiner @ 2017-02-14 18:15 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Joonsoo Kim, David Rientjes, Mel Gorman, linux-kernel,
	kernel-team

On Fri, Feb 10, 2017 at 06:23:39PM +0100, Vlastimil Babka wrote:
> Preparation patch. We are going to need migratetype at lower layers than
> compact_zone() and compact_finished().
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v2 07/10] mm, compaction: restrict async compaction to pageblocks of same migratetype
  2017-02-10 17:23   ` Vlastimil Babka
@ 2017-02-14 20:10     ` Johannes Weiner
  -1 siblings, 0 replies; 92+ messages in thread
From: Johannes Weiner @ 2017-02-14 20:10 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Joonsoo Kim, David Rientjes, Mel Gorman, linux-kernel,
	kernel-team

On Fri, Feb 10, 2017 at 06:23:40PM +0100, Vlastimil Babka wrote:
> The migrate scanner in async compaction is currently limited to MIGRATE_MOVABLE
> pageblocks. This is a heuristic intended to reduce latency, based on the
> assumption that non-MOVABLE pageblocks are unlikely to contain movable pages.
> 
> However, with the exception of THP's, most high-order allocations are not
> movable. Should the async compaction succeed, this increases the chance that
> the non-MOVABLE allocations will fallback to a MOVABLE pageblock, making the
> long-term fragmentation worse.
> 
> This patch attempts to help the situation by changing async direct compaction
> so that the migrate scanner only scans the pageblocks of the requested
> migratetype. If it's a non-MOVABLE type and there are such pageblocks that do
> contain movable pages, chances are that the allocation can succeed within one
> of such pageblocks, removing the need for a fallback. If that fails, the
> subsequent sync attempt will ignore this restriction.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Yes, IMO we should make the async compaction scanner decontaminate
unmovable blocks. This is because we fall back to other-typed blocks
before we reclaim, so any unmovable blocks that aren't perfectly
occupied will fill with greedy page cache (and order-0 doesn't steal
blocks back to make them compactable again). Subsequent unmovable
higher-order allocations in turn are more likely to fall back and
steal more movable blocks.

As long as we have vastly more movable blocks than unmovable blocks,
continuous page cache turnover will counteract this negative trend -
pages are reclaimed mostly from movable blocks and some unmovable
blocks, while new cache allocations are placed into the freed movable
blocks - slowly moving cache out from unmovable blocks into movable
ones. But that effect is independent of the rate of higher-order
allocations and can be overwhelmed, so I think it makes sense to
involve compaction directly in decontamination.

The thing I'm not entirely certain about is the aggressiveness of this
patch. Instead of restricting the async scanner to blocks of the same
migratetype, wouldn't it be better (in terms of allocation latency) to
simply let it compact *all* block types? Maybe changing it to look at
unmovable blocks is enough to curb cross-contamination. Sure there
will still be some, but now we're matching the decontamination rate to
the rate of !movable higher-order allocations and don't just rely on
the independent cache turnover rate, which during higher-order bursts
might not be high enough to prevent an expansion of unmovable blocks.

Does that make sense?
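
For illustration, a toy user-space model of the restriction being discussed
(names and the exact policy are assumptions based on the changelog, not the
actual patch):

#include <stdbool.h>
#include <stdio.h>

enum migratetype { MT_UNMOVABLE, MT_MOVABLE, MT_RECLAIMABLE };
enum compact_mode { MODE_ASYNC, MODE_SYNC };

/* Should the migrate scanner look at this pageblock at all? */
static bool scan_this_pageblock(enum compact_mode mode,
				enum migratetype request,
				enum migratetype block)
{
	if (mode != MODE_ASYNC)
		return true;		/* sync compaction: no restriction */
	return block == request;	/* async: same-type pageblocks only */
}

int main(void)
{
	enum migratetype blocks[] = { MT_MOVABLE, MT_UNMOVABLE, MT_MOVABLE };

	for (int i = 0; i < 3; i++)
		printf("async unmovable request scans pageblock %d: %d\n", i,
		       scan_this_pageblock(MODE_ASYNC, MT_UNMOVABLE, blocks[i]));
	return 0;
}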

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v2 04/10] mm, page_alloc: count movable pages when stealing from pageblock
  2017-02-14 10:07     ` Xishi Qiu
@ 2017-02-15 10:47       ` Vlastimil Babka
  -1 siblings, 0 replies; 92+ messages in thread
From: Vlastimil Babka @ 2017-02-15 10:47 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: linux-mm, Johannes Weiner, Joonsoo Kim, David Rientjes,
	Mel Gorman, linux-kernel, kernel-team

On 02/14/2017 11:07 AM, Xishi Qiu wrote:
> On 2017/2/11 1:23, Vlastimil Babka wrote:
> 
>> When stealing pages from pageblock of a different migratetype, we count how
>> many free pages were stolen, and change the pageblock's migratetype if more
>> than half of the pageblock was free. This might be too conservative, as there
>> might be other pages that are not free, but were allocated with the same
>> migratetype as our allocation requested.
>>
>> While we cannot determine the migratetype of allocated pages precisely (at
>> least without the page_owner functionality enabled), we can count pages that
>> compaction would try to isolate for migration - those are either on LRU or
>> __PageMovable(). The rest can be assumed to be MIGRATE_RECLAIMABLE or
>> MIGRATE_UNMOVABLE, which we cannot easily distinguish. This counting can be
>> done as part of free page stealing with little additional overhead.
>>
>> The page stealing code is changed so that it considers free pages plus pages
>> of the "good" migratetype for the decision whether to change pageblock's
>> migratetype.
>>
>> The result should be more accurate migratetype of pageblocks wrt the actual
>> pages in the pageblocks, when stealing from semi-occupied pageblocks. This
>> should help the efficiency of page grouping by mobility.
>>
>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> 
> Hi Vlastimil,
> 
> How about these two changes?
> 
> 1. If we steal some free pages, we will add these page at the head of start_migratetype
> list, it will cause more fixed, because these pages will be allocated more easily.

What do you mean by "more fixed" here?

> So how about use list_move_tail instead of list_move?

Hmm, not sure if it can make any difference. We steal because the lists
are currently empty (at least for the order we want), so it shouldn't
matter if we add to head or tail.

> __rmqueue_fallback
> 	steal_suitable_fallback
> 		move_freepages_block
> 			move_freepages
> 				list_move
> 
> 2. When doing expand() - list_add(), usually the list is empty, but in the
> following case, the list is not empty, because we did move_freepages_block()
> before.
> 
> __rmqueue_fallback
> 	steal_suitable_fallback
> 		move_freepages_block  // move to the list of start_migratetype
> 	expand  // split the largest order
> 		list_add  // add to the list of start_migratetype
> 
> So how about use list_add_tail instead of list_add? Then we can merge the large
> block again as soon as the page freed.

Same here. The lists are not empty, but contain probably just the pages
from our stolen pageblock. It shouldn't matter how we order them within
the same block.

So maybe it could make some difference for higher-order allocations, but
it's unclear to me. Making e.g. expand() more complex with a flag to
tell it the head vs tail add could mean extra overhead in allocator fast
path that would offset any gains.
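
For anyone following the head-vs-tail question, a minimal user-space stand-in
for list_add()/list_add_tail() (reimplemented here, not the real <linux/list.h>)
shows how the insertion point changes which entry a later take-from-the-head
"allocation" sees first:

#include <stdio.h>

struct node { int id; struct node *prev, *next; };

static void list_init(struct node *head) { head->prev = head->next = head; }

static void __list_add(struct node *n, struct node *prev, struct node *next)
{
	next->prev = n;
	n->next = next;
	n->prev = prev;
	prev->next = n;
}

static void list_add(struct node *n, struct node *head)		/* at head */
{
	__list_add(n, head, head->next);
}

static void list_add_tail(struct node *n, struct node *head)	/* at tail */
{
	__list_add(n, head->prev, head);
}

int main(void)
{
	struct node head, a = { 1 }, b = { 2 }, c = { 3 };

	list_init(&head);
	list_add(&a, &head);		/* pre-existing free page            */
	list_add(&b, &head);		/* stolen page added at the head ... */
	list_add_tail(&c, &head);	/* ... vs stolen page added at tail  */

	/* an "allocation" takes from the head, so it sees 2, then 1, then 3 */
	for (struct node *p = head.next; p != &head; p = p->next)
		printf("%d ", p->id);
	printf("\n");
	return 0;
}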

> Thanks,
> Xishi Qiu
> 

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v2 04/10] mm, page_alloc: count movable pages when stealing from pageblock
  2017-02-15 10:47       ` Vlastimil Babka
@ 2017-02-15 11:56         ` Xishi Qiu
  -1 siblings, 0 replies; 92+ messages in thread
From: Xishi Qiu @ 2017-02-15 11:56 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Johannes Weiner, Joonsoo Kim, David Rientjes,
	Mel Gorman, linux-kernel, kernel-team

On 2017/2/15 18:47, Vlastimil Babka wrote:

> On 02/14/2017 11:07 AM, Xishi Qiu wrote:
>> On 2017/2/11 1:23, Vlastimil Babka wrote:
>>
>>> When stealing pages from pageblock of a different migratetype, we count how
>>> many free pages were stolen, and change the pageblock's migratetype if more
>>> than half of the pageblock was free. This might be too conservative, as there
>>> might be other pages that are not free, but were allocated with the same
>>> migratetype as our allocation requested.
>>>
>>> While we cannot determine the migratetype of allocated pages precisely (at
>>> least without the page_owner functionality enabled), we can count pages that
>>> compaction would try to isolate for migration - those are either on LRU or
>>> __PageMovable(). The rest can be assumed to be MIGRATE_RECLAIMABLE or
>>> MIGRATE_UNMOVABLE, which we cannot easily distinguish. This counting can be
>>> done as part of free page stealing with little additional overhead.
>>>
>>> The page stealing code is changed so that it considers free pages plus pages
>>> of the "good" migratetype for the decision whether to change pageblock's
>>> migratetype.
>>>
>>> The result should be more accurate migratetype of pageblocks wrt the actual
>>> pages in the pageblocks, when stealing from semi-occupied pageblocks. This
>>> should help the efficiency of page grouping by mobility.
>>>
>>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
>>
>> Hi Vlastimil,
>>
>> How about these two changes?
>>
>> 1. If we steal some free pages, we will add these page at the head of start_migratetype
>> list, it will cause more fixed, because these pages will be allocated more easily.
> 
> What do you mean by "more fixed" here?
> 
>> So how about use list_move_tail instead of list_move?
> 
> Hmm, not sure if it can make any difference. We steal because the lists
> are currently empty (at least for the order we want), so it shouldn't
> matter if we add to head or tail.
> 

Hi Vlastimil,

Please see the following case; I am not sure if it is right.

MIGRATE_MOVABLE
order:    0 1 2 3 4 5 6 7 8 9 10
free num: 1 1 1 1 1 1 1 1 1 1 0  // one page(e.g. page A) was allocated before

MIGRATE_UNMOVABLE
order:    0 1 2 3 4 5 6 7 8 9 10
free num: x x x x 0 0 0 0 0 0 0 // we want order=4, so steal from MIGRATE_MOVABLE

We alloc order=4 in MIGRATE_UNMOVABLE, then it will fall back to steal pages
from MIGRATE_MOVABLE, and we will move the free pages from the MIGRATE_MOVABLE
list to the MIGRATE_UNMOVABLE list.

The lists of order 4-9 in MIGRATE_UNMOVABLE are empty, so adding to head or
tail is the same. But order 0-3 is not empty, so if we add to the head, we
will later allocate the pages stolen from MIGRATE_MOVABLE first. So we will
have less chance to make a large block (order=10) when the one page (page A)
is freed again.

Also, we will split the order=9 block from MIGRATE_MOVABLE to allocate
order=4 in expand(), so if we add to the head, we will later allocate the
pages split from order=9 first. So we will have less chance to make a large
block (order=9) when the order=4 page is freed again.
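
To make the splitting step concrete, here is a rough user-space model of how
expand() carves an order-9 block down to an order-4 page, leaving the unused
halves on the free lists of orders 8..4 (illustrative only, not the code from
mm/page_alloc.c):

#include <stdio.h>

int main(void)
{
	int high = 9;	/* a free order-9 block was found ...  */
	int low = 4;	/* ... for an order-4 request          */
	int remainder_pages = 0;

	for (int order = high; order > low; order--) {
		/* split off half; it stays free at order-1 */
		printf("queued one free order-%d remainder\n", order - 1);
		remainder_pages += 1 << (order - 1);
	}
	printf("order-%d page handed out, %d pages left on the free lists\n",
	       low, remainder_pages);
	return 0;
}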

>> __rmqueue_fallback
>> 	steal_suitable_fallback
>> 		move_freepages_block
>> 			move_freepages
>> 				list_move
>>
>> 2. When doing expand() - list_add(), usually the list is empty, but in the
>> following case, the list is not empty, because we did move_freepages_block()
>> before.
>>
>> __rmqueue_fallback
>> 	steal_suitable_fallback
>> 		move_freepages_block  // move to the list of start_migratetype
>> 	expand  // split the largest order
>> 		list_add  // add to the list of start_migratetype
>>
>> So how about use list_add_tail instead of list_add? Then we can merge the large
>> block again as soon as the page freed.
> 
> Same here. The lists are not empty, but contain probably just the pages
> from our stolen pageblock. It shouldn't matter how we order them within
> the same block.
> 
> So maybe it could make some difference for higher-order allocations, but
> it's unclear to me. Making e.g. expand() more complex with a flag to
> tell it the head vs tail add could mean extra overhead in allocator fast
> path that would offset any gains.
> 
>> Thanks,
>> Xishi Qiu
>>
> 
> 
> .
> 

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v2 00/10] try to reduce fragmenting fallbacks
  2017-02-13 11:07   ` Mel Gorman
@ 2017-02-15 14:29     ` Vlastimil Babka
  -1 siblings, 0 replies; 92+ messages in thread
From: Vlastimil Babka @ 2017-02-15 14:29 UTC (permalink / raw)
  To: Mel Gorman
  Cc: linux-mm, Johannes Weiner, Joonsoo Kim, David Rientjes,
	linux-kernel, kernel-team

On 02/13/2017 12:07 PM, Mel Gorman wrote:
> On Fri, Feb 10, 2017 at 06:23:33PM +0100, Vlastimil Babka wrote:
> 
> By and large, I like the series, particularly patches 7 and 8. I cannot
> make up my mind about the RFC patches 9 and 10 yet. Conceptually they
> seem sound but they are much more far reaching than the rest of the
> series.
> 
> It would be nice if patches 1-8 could be treated in isolation with data
> on the number of extfrag events triggered, time spent in compaction and
> the success rate. Patches 9 and 10 are tricy enough that they would need
> data per patch where as patches 1-8 should be ok with data gathered for
> the whole series.

I've got the results with mmtests stress-highalloc modified to do
GFP_KERNEL order-4 allocations, on 4.9 with "mm, vmscan: fix zone
balance check in prepare_kswapd_sleep" (without that, kcompactd indeed
wasn't woken up) on UMA machine with 4GB memory. There were 5 repeats of
each run, as the extfrag stats are quite volatile (note the stats below
are sums, not averages, as it was less perl hacking for me).

Success rates are the same, already high due to the low order. THP and
compaction stats also roughly the same. The extfrag stats (a bit
modified/expanded wrt. vanilla mmtests):

(the patches are stacked, and I haven't measured the non-functional-changes
patches separately)
							   base     patch 2     patch 3     patch 4     patch 7     patch 8
Page alloc extfrag event                               11734984    11769620    11485185    13029676    13312786    13939417
Extfrag fragmenting                                    11729231    11763921    11479301    13024101    13307281    13933978
Extfrag fragmenting for unmovable                         87848       84906       76328       78613       66025       59261
Extfrag fragmenting unmovable placed with movable          8298        7367        5865        8479        6440        5928
Extfrag fragmenting for reclaimable                    11636074    11673657    11397642    12940253    13236444    13869509
Extfrag fragmenting reclaimable placed with movable      389283      362396      330855      374292      390700      415478
Extfrag fragmenting for movable                            5309        5358        5331        5235        4812        5208

Going in order, patch 3 might be some improvement wrt polluting
(movable) pageblocks with unmovable, hopefully not noise.

Results for patch 4 ("count movable pages when stealing from pageblock")
are really puzzling me, as it increases the number of fragmenting events
for reclaimable allocations, implicating "reclaimable placed with (i.e.
falling back to) unmovable" (which is not listed separately above, but
follows logically from "reclaimable placed with movable" not changing
that much). I really wonder why that is. The patch effectively only
changes the decision to change migratetype of a pageblock, it doesn't
affect the actual stealing decision (which is always true for
RECLAIMABLE anyway, see can_steal_fallback()). Moreover, since we can't
distinguish UNMOVABLE from RECLAIMABLE when counting, good_pages is 0
and thus even the decision to change pageblock migratetype shouldn't be
changed by the patch for this case. I must recheck the implementation...
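
(For concreteness, the decision being talked about has roughly this shape - a
sketch with illustrative names and numbers, not the patch itself:)

#include <stdbool.h>
#include <stdio.h>

#define PAGEBLOCK_ORDER		9
#define PAGEBLOCK_NR_PAGES	(1UL << PAGEBLOCK_ORDER)

/* Claim the whole pageblock when free pages plus "good" (request-compatible)
 * pages cover at least half of it. */
static bool claim_pageblock(unsigned long free_pages, unsigned long good_pages)
{
	return free_pages + good_pages >= PAGEBLOCK_NR_PAGES / 2;
}

int main(void)
{
	/* movable request: 200 free + 120 movable-looking pages -> claim    */
	printf("%d\n", claim_pageblock(200, 120));
	/* unmovable/reclaimable request: good_pages stays 0 -> do not claim */
	printf("%d\n", claim_pageblock(200, 0));
	return 0;
}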

Patch 7 could be cautiously labeled as an improvement for the reduction of
"Fragmenting for unmovable" events, which would be perfect as that was
the intention. For reclaimable it looks worse, but probably just within
noise. Same goes for Patch 8, although the apparent regression for
reclaimable looks even worse there.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v2 00/10] try to reduce fragmenting fallbacks
  2017-02-15 14:29     ` Vlastimil Babka
@ 2017-02-15 16:11       ` Vlastimil Babka
  -1 siblings, 0 replies; 92+ messages in thread
From: Vlastimil Babka @ 2017-02-15 16:11 UTC (permalink / raw)
  To: Mel Gorman
  Cc: linux-mm, Johannes Weiner, Joonsoo Kim, David Rientjes,
	linux-kernel, kernel-team

On 02/15/2017 03:29 PM, Vlastimil Babka wrote:
> Results for patch 4 ("count movable pages when stealing from pageblock")
> are really puzzling me, as it increases the number of fragmenting events
> for reclaimable allocations, implicating "reclaimable placed with (i.e.
> falling back to) unmovable" (which is not listed separately above, but
> follows logically from "reclaimable placed with movable" not changing
> that much). I really wonder why that is. The patch effectively only
> changes the decision to change migratetype of a pageblock, it doesn't
> affect the actual stealing decision (which is always true for
> RECLAIMABLE anyway, see can_steal_fallback()). Moreover, since we can't
> distinguish UNMOVABLE from RECLAIMABLE when counting, good_pages is 0
> and thus even the decision to change pageblock migratetype shouldn't be
> changed by the patch for this case. I must recheck the implementation...

Ah, there it is... not enough LISP

-       if (pages >= (1 << (pageblock_order-1)) ||
+       /* Claim the whole block if over half of it is free or good type */
+       if (free_pages + good_pages >= (1 << (pageblock_order-1)) ||

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v2 00/10] try to reduce fragmenting fallbacks
  2017-02-15 16:11       ` Vlastimil Babka
@ 2017-02-15 20:11         ` Vlastimil Babka
  -1 siblings, 0 replies; 92+ messages in thread
From: Vlastimil Babka @ 2017-02-15 20:11 UTC (permalink / raw)
  To: Mel Gorman
  Cc: linux-mm, Johannes Weiner, Joonsoo Kim, David Rientjes,
	linux-kernel, kernel-team

On 15.2.2017 17:11, Vlastimil Babka wrote:
> On 02/15/2017 03:29 PM, Vlastimil Babka wrote:
>> Results for patch 4 ("count movable pages when stealing from pageblock")
>> are really puzzling me, as it increases the number of fragmenting events
>> for reclaimable allocations, implicating "reclaimable placed with (i.e.
>> falling back to) unmovable" (which is not listed separately above, but
>> follows logically from "reclaimable placed with movable" not changing
>> that much). I really wonder why that is. The patch effectively only
>> changes the decision to change migratetype of a pageblock, it doesn't
>> affect the actual stealing decision (which is always true for
>> RECLAIMABLE anyway, see can_steal_fallback()). Moreover, since we can't
>> distinguish UNMOVABLE from RECLAIMABLE when counting, good_pages is 0
>> and thus even the decision to change pageblock migratetype shouldn't be
>> changed by the patch for this case. I must recheck the implementation...
> 
> Ah, there it is... not enough LISP
> 
> -       if (pages >= (1 << (pageblock_order-1)) ||
> +       /* Claim the whole block if over half of it is free or good type */
> +       if (free_pages + good_pages >= (1 << (pageblock_order-1)) ||

Nope, I was blind and thought that this needs "(free_pages + good_pages)"
because of operator priority wrt shifting, but >= is not shift... bah.
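
A quick stand-alone check of the precedence point (pageblock_order is assumed
to be 9 here, purely for the demo):

#include <stdio.h>

int main(void)
{
	unsigned long free_pages = 200, good_pages = 100;
	int pageblock_order = 9;

	/* '+' binds tighter than '>=', so both forms parse identically */
	printf("%d %d\n",
	       free_pages + good_pages >= (1UL << (pageblock_order - 1)),
	       (free_pages + good_pages) >= (1UL << (pageblock_order - 1)));

	/* shifts do differ: '<<' binds looser than '-', so the inner
	 * parentheses are for readability, not correctness */
	printf("%lu %lu\n", 1UL << pageblock_order - 1,
	       (1UL << pageblock_order) - 1);	/* prints 256 511 */
	return 0;
}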

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v2 08/10] mm, compaction: finish whole pageblock to reduce fragmentation
  2017-02-10 17:23   ` Vlastimil Babka
@ 2017-02-16 11:44     ` Johannes Weiner
  -1 siblings, 0 replies; 92+ messages in thread
From: Johannes Weiner @ 2017-02-16 11:44 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Joonsoo Kim, David Rientjes, Mel Gorman, linux-kernel,
	kernel-team

On Fri, Feb 10, 2017 at 06:23:41PM +0100, Vlastimil Babka wrote:
> The main goal of direct compaction is to form a high-order page for allocation,
> but it should also help against long-term fragmentation when possible. Most
> lower-than-pageblock-order compactions are for non-movable allocations, which
> means that if we compact in a movable pageblock and terminate as soon as we
> create the high-order page, it's unlikely that the fallback heuristics will
> claim the whole block. Instead there might be a single unmovable page in a
> pageblock full of movable pages, and the next unmovable allocation might pick
> another pageblock and increase long-term fragmentation.
> 
> To help against such scenarios, this patch changes the termination criteria for
> compaction so that the current pageblock is finished even though the high-order
> page already exists. Note that it might be possible that the high-order page
> formed elsewhere in the zone due to parallel activity, but this patch doesn't
> try to detect that.
> 
> This is only done with sync compaction, because async compaction is limited to
> pageblock of the same migratetype, where it cannot result in a migratetype
> fallback. (Async compaction also eagerly skips order-aligned blocks where
> isolation fails, which is against the goal of migrating away as much of the
> pageblock as possible.)
> 
> As a result of this patch, long-term memory fragmentation should be reduced.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
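
As a toy illustration of the changed termination rule (field and function names
below are made up for the sketch, not the ones in mm/compaction.c):

#include <stdbool.h>
#include <stdio.h>

struct cc {				/* trimmed stand-in for compact_control */
	bool sync;
	unsigned long migrate_pfn;	/* current migrate scanner position     */
	unsigned long block_end_pfn;	/* end of the pageblock being worked on */
};

static bool compaction_done(const struct cc *cc, bool high_order_page_exists)
{
	if (!high_order_page_exists)
		return false;
	if (!cc->sync)			/* async: stop as soon as the page exists */
		return true;
	/* sync: additionally require the current pageblock to be finished */
	return cc->migrate_pfn >= cc->block_end_pfn;
}

int main(void)
{
	struct cc cc = { .sync = true, .migrate_pfn = 1000, .block_end_pfn = 1536 };

	printf("%d\n", compaction_done(&cc, true));	/* 0: keep compacting */
	cc.migrate_pfn = 1536;
	printf("%d\n", compaction_done(&cc, true));	/* 1: block finished  */
	return 0;
}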

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v2 00/10] try to reduce fragmenting fallbacks
  2017-02-15 14:29     ` Vlastimil Babka
@ 2017-02-16 15:12       ` Vlastimil Babka
  -1 siblings, 0 replies; 92+ messages in thread
From: Vlastimil Babka @ 2017-02-16 15:12 UTC (permalink / raw)
  To: Mel Gorman
  Cc: linux-mm, Johannes Weiner, Joonsoo Kim, David Rientjes,
	linux-kernel, kernel-team

On 02/15/2017 03:29 PM, Vlastimil Babka wrote:
> On 02/13/2017 12:07 PM, Mel Gorman wrote:
>> On Fri, Feb 10, 2017 at 06:23:33PM +0100, Vlastimil Babka wrote:
>>
>> By and large, I like the series, particularly patches 7 and 8. I cannot
>> make up my mind about the RFC patches 9 and 10 yet. Conceptually they
>> seem sound but they are much more far reaching than the rest of the
>> series.
>>
>> It would be nice if patches 1-8 could be treated in isolation with data
>> on the number of extfrag events triggered, time spent in compaction and
>> the success rate. Patches 9 and 10 are tricy enough that they would need
>> data per patch where as patches 1-8 should be ok with data gathered for
>> the whole series.
> 
> I've got the results with mmtests stress-highalloc modified to do
> GFP_KERNEL order-4 allocations, on 4.9 with "mm, vmscan: fix zone
> balance check in prepare_kswapd_sleep" (without that, kcompactd indeed
> wasn't woken up) on UMA machine with 4GB memory. There were 5 repeats of
> each run, as the extfrag stats are quite volatile (note the stats below
> are sums, not averages, as it was less perl hacking for me).
> 
> Success rates are the same, already high due to the low order. THP and
> compaction stats also roughly the same. The extfrag stats (a bit
> modified/expanded wrt. vanilla mmtests):
> 
> (the patches are stacked, and I haven't measured the non-functional-changes
> patches separately)
> 							   base     patch 2     patch 3     patch 4     patch 7     patch 8
> Page alloc extfrag event                               11734984    11769620    11485185    13029676    13312786    13939417
> Extfrag fragmenting                                    11729231    11763921    11479301    13024101    13307281    13933978
> Extfrag fragmenting for unmovable                         87848       84906       76328       78613       66025       59261
> Extfrag fragmenting unmovable placed with movable          8298        7367        5865        8479        6440        5928
> Extfrag fragmenting for reclaimable                    11636074    11673657    11397642    12940253    13236444    13869509
> Extfrag fragmenting reclaimable placed with movable      389283      362396      330855      374292      390700      415478
> Extfrag fragmenting for movable                            5309        5358        5331        5235        4812        5208

OK, so turns out the trace postprocessing script had mixed up movable
and reclaimable, because the tracepoint prints only the numeric value
from the enum. Commit 016c13daa5c9 ("mm, page_alloc: use masks and
shifts when converting GFP flags to migrate types") swapped movable and
reclaimable in the enum, and the script wasn't updated.
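
As a toy illustration of that mixup (the exact enum orders here are assumed for
the example, not copied from the kernel):

#include <stdio.h>

/* enum order assumed for the "before" kernels */
enum old_mt { OLD_UNMOVABLE, OLD_RECLAIMABLE, OLD_MOVABLE };
/* enum order assumed for the "after" kernels */
enum new_mt { NEW_UNMOVABLE, NEW_MOVABLE, NEW_RECLAIMABLE };

/* decode table the postprocessing script was still using */
static const char *old_name[] = { "unmovable", "reclaimable", "movable" };

int main(void)
{
	int traced = NEW_RECLAIMABLE;	/* the tracepoint emits only the number: 2 */

	/* the stale table turns "reclaimable" into "movable" */
	printf("raw=%d decoded as \"%s\"\n", traced, old_name[traced]);
	return 0;
}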

Here are the results again, after fixing the script:

 							   base     patch 2     patch 3     patch 4     patch 7     patch 8
Page alloc extfrag event                               11734984    11769620    11485185    13029676    13312786    13939417
Extfrag fragmenting                                    11729231    11763921    11479301    13024101    13307281    13933978
Extfrag fragmenting for unmovable                         87848       84906       76328       78613       66025       59261
Extfrag fragmenting unmovable placed with movable         79550       77539       70463       70134       59585       53333
Extfrag fragmenting unmovable placed with reclaim.         8298        7367        5865        8479        6440        5928
Extfrag fragmenting for reclaimable                        5309        5358        5331        5235        4812        5208
Extfrag fragmenting reclaimable placed with movable        1757        1728        1703        1750        1647        1715
Extfrag fragmenting reclaimable placed with unmov.         3552        3630        3628        3485        3165        3493
Extfrag fragmenting for movable                        11636074    11673657    11397642    12940253    13236444    13869509

Most of the original evaluation is still applicable, and it's nice to
see an even stronger trend of "unmovable placed with movable"
decreasing throughout the series.
The mystery of patch 4 increasing fragmenting events actually applies to
movable allocations (and not reclaimable), which is not permanent
fragmentation. But it's still significant, so I'll investigate.
It's unfortunately possible that the optimistic stats are just a result
of having more pageblocks on average marked as UNMOVABLE. That would be
fine if they were really occupied by such allocations, but not so great
otherwise. I do hope that the extra insight about existing pages coming
from Patch 4 is improving things here, not making them worse. But the
extfrag events themselves won't tell us that...

> Going in order, patch 3 might be some improvement wrt polluting
> (movable) pageblocks with unmovable, hopefully not noise.
> 
> Results for patch 4 ("count movable pages when stealing from pageblock")
> are really puzzling me, as it increases the number of fragmenting events
> for reclaimable allocations, implicating "reclaimable placed with (i.e.
> falling back to) unmovable" (which is not listed separately above, but
> follows logically from "reclaimable placed with movable" not changing
> that much). I really wonder why that is. The patch effectively only
> changes the decision to change migratetype of a pageblock, it doesn't
> affect the actual stealing decision (which is always true for
> RECLAIMABLE anyway, see can_steal_fallback()). Moreover, since we can't
> distinguish UNMOVABLE from RECLAIMABLE when counting, good_pages is 0
> and thus even the decision to change pageblock migratetype shouldn't be
> changed by the patch for this case. I must recheck the implementation...
> 
> Patch 7 could be cautiously labeled as an improvement for the reduction of
> "Fragmenting for unmovable" events, which would be perfect as that was
> the intention. For reclaimable it looks worse, but probably just within
> noise. Same goes for Patch 8, although the apparent regression for
> reclaimable looks even worse there.
> 

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v2 00/10] try to reduce fragmenting fallbacks
  2017-02-16 15:12       ` Vlastimil Babka
@ 2017-02-17 15:24         ` Vlastimil Babka
  -1 siblings, 0 replies; 92+ messages in thread
From: Vlastimil Babka @ 2017-02-17 15:24 UTC (permalink / raw)
  To: Mel Gorman
  Cc: linux-mm, Johannes Weiner, Joonsoo Kim, David Rientjes,
	linux-kernel, kernel-team

On 02/16/2017 04:12 PM, Vlastimil Babka wrote:
> On 02/15/2017 03:29 PM, Vlastimil Babka wrote:
>> On 02/13/2017 12:07 PM, Mel Gorman wrote:
>>> On Fri, Feb 10, 2017 at 06:23:33PM +0100, Vlastimil Babka wrote:
>>>
>>> By and large, I like the series, particularly patches 7 and 8. I cannot
>>> make up my mind about the RFC patches 9 and 10 yet. Conceptually they
>>> seem sound but they are much more far reaching than the rest of the
>>> series.
>>>
>>> It would be nice if patches 1-8 could be treated in isolation with data
>>> on the number of extfrag events triggered, time spent in compaction and
>>> the success rate. Patches 9 and 10 are tricy enough that they would need
>>> data per patch where as patches 1-8 should be ok with data gathered for
>>> the whole series.
>> 
>> I've got the results with mmtests stress-highalloc modified to do
>> GFP_KERNEL order-4 allocations, on 4.9 with "mm, vmscan: fix zone
>> balance check in prepare_kswapd_sleep" (without that, kcompactd indeed
>> wasn't woken up) on UMA machine with 4GB memory. There were 5 repeats of
>> each run, as the extfrag stats are quite volatile (note the stats below
>> are sums, not averages, as it was less perl hacking for me).
>> 
>> Success rates are the same, already high due to the low order. THP and
>> compaction stats also roughly the same. The extfrag stats (a bit
>> modified/expanded wrt. vanilla mmtests):
>> 
>> (the patches are stacked, and I haven't measured the non-functional-changes
>> patches separately)
>> 							   base     patch 2     patch 3     patch 4     patch 7     patch 8
>> Page alloc extfrag event                               11734984    11769620    11485185    13029676    13312786    13939417
>> Extfrag fragmenting                                    11729231    11763921    11479301    13024101    13307281    13933978
>> Extfrag fragmenting for unmovable                         87848       84906       76328       78613       66025       59261
>> Extfrag fragmenting unmovable placed with movable          8298        7367        5865        8479        6440        5928
>> Extfrag fragmenting for reclaimable                    11636074    11673657    11397642    12940253    13236444    13869509
>> Extfrag fragmenting reclaimable placed with movable      389283      362396      330855      374292      390700      415478
>> Extfrag fragmenting for movable                            5309        5358        5331        5235        4812        5208
> 
> OK, so turns out the trace postprocessing script had mixed up movable
> and reclaimable, because the tracepoint prints only the numeric value
> from the enum. Commit 016c13daa5c9 ("mm, page_alloc: use masks and
> shifts when converting GFP flags to migrate types") swapped movable and
> reclaimable in the enum, and the script wasn't updated.
> 
> Here are the results again, after fixing the script:
> 
>  							   base     patch 2     patch 3     patch 4     patch 7     patch 8
> Page alloc extfrag event                               11734984    11769620    11485185    13029676    13312786    13939417
> Extfrag fragmenting                                    11729231    11763921    11479301    13024101    13307281    13933978
> Extfrag fragmenting for unmovable                         87848       84906       76328       78613       66025       59261
> Extfrag fragmenting unmovable placed with movable         79550       77539       70463       70134       59585       53333
> Extfrag fragmenting unmovable placed with reclaim.         8298        7367        5865        8479        6440        5928
> Extfrag fragmenting for reclaimable                        5309        5358        5331        5235        4812        5208
> Extfrag fragmenting reclaimable placed with movable        1757        1728        1703        1750        1647        1715
> Extfrag fragmenting reclaimable placed with unmov.         3552        3630        3628        3485        3165        3493
> Extfrag fragmenting for movable                        11636074    11673657    11397642    12940253    13236444    13869509


And the disaster of evaluation continues. I have now realised that my automation
got broken by grub2 changes, and long story short, iterations 2+ of each kernel
actually used the "patch 8" kernel, which made all the differences relatively
smaller. So only the first iteration is usable, with results below for
illustration. I'll hopefully collect the proper data with 5 iterations over the
weekend - the series should have more impact than it looked like.

    	                                                      base     patch 2     patch 3     patch 4     patch 7     patch 8
Page alloc extfrag event                                   1528823     1444798     1514653     2702564     2643290     3024168
Extfrag fragmenting                                        1527537     1443567     1513410     2701466     2642117     3023164
Extfrag fragmenting for unmovable                            39908       37186       32646       23214       13942       13994
Extfrag fragmenting unmovable placed with movable            36703       36093       31344       21312       12628       12267
Extfrag fragmenting unmovable placed with reclaim.            3205        1093        1302        1902        1314        1727
Extfrag fragmenting for reclaimable                           1038        1025        1048        1039        1023        1132
Extfrag fragmenting reclaimable placed with movable            370         319         326         373         317         320
Extfrag fragmenting reclaimable placed with unmov.             668         706         722         666         706         812
Extfrag fragmenting for movable                            1486591     1405356     1479716     2677213     2627152     3008038

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v2 04/10] mm, page_alloc: count movable pages when stealing from pageblock
  2017-02-14 18:10     ` Johannes Weiner
@ 2017-02-17 16:09       ` Vlastimil Babka
  -1 siblings, 0 replies; 92+ messages in thread
From: Vlastimil Babka @ 2017-02-17 16:09 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: linux-mm, Joonsoo Kim, David Rientjes, Mel Gorman, linux-kernel,
	kernel-team

On 02/14/2017 07:10 PM, Johannes Weiner wrote:
> 
> That makes sense to me. I have just one nit about the patch:
> 
>> @@ -1981,10 +1994,29 @@ static void steal_suitable_fallback(struct zone *zone, struct page *page,
>>  		return;
>>  	}
>>  
>> -	pages = move_freepages_block(zone, page, start_type);
>> +	free_pages = move_freepages_block(zone, page, start_type,
>> +						&good_pages);
>> +	/*
>> +	 * good_pages is now the number of movable pages, but if we
>> +	 * want UNMOVABLE or RECLAIMABLE allocation, it's more tricky
>> +	 */
>> +	if (start_type != MIGRATE_MOVABLE) {
>> +		/*
>> +		 * If we are falling back to MIGRATE_MOVABLE pageblock,
>> +		 * treat all non-movable pages as good. If it's UNMOVABLE
>> +		 * falling back to RECLAIMABLE or vice versa, be conservative
>> +		 * as we can't distinguish the exact migratetype.
>> +		 */
>> +		old_block_type = get_pageblock_migratetype(page);
>> +		if (old_block_type == MIGRATE_MOVABLE)
>> +			good_pages = pageblock_nr_pages
>> +						- free_pages - good_pages;
> 
> This line had me scratch my head for a while, and I think it's mostly
> because of the variable naming and the way the comments are phrased.
> 
> Could you use a variable called movable_pages to pass to and be filled
> in by move_freepages_block?
> 
> And instead of good_pages something like starttype_pages or
> alike_pages or st_pages or mt_pages or something, to indicate the
> number of pages that are comparable to the allocation's migratetype?
> 
>> -	/* Claim the whole block if over half of it is free */
>> -	if (pages >= (1 << (pageblock_order-1)) ||
>> +	/* Claim the whole block if over half of it is free or good type */
>> +	if (free_pages + good_pages >= (1 << (pageblock_order-1)) ||
>>  			page_group_by_mobility_disabled)
>>  		set_pageblock_migratetype(page, start_type);
> 
> This would then read
> 
> 	if (free_pages + alike_pages ...)
> 
> which I think would be more descriptive.
> 
> The comment leading the entire section following move_freepages_block
> could then say something like "If a sufficient number of pages in the
> block are either free or of comparable migratability as our
> allocation, claim the whole block." Followed by the caveats of how we
> determine this migratibility.
> 
> Or maybe even the function. The comment above the function seems out
> of date after this patch.

I'll incorporate this for the next posting, thanks for the feedback!
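
(For illustration, with the suggested renaming the hunk might end up reading
roughly like the sketch below. This is a paraphrase of the quoted diff, not the
final patch; in particular the else branch for the UNMOVABLE<->RECLAIMABLE case
is an assumption about how the conservative fallback would be expressed:)

	free_pages = move_freepages_block(zone, page, start_type,
						&movable_pages);
	/*
	 * Determine how many pages in the block are compatible with our
	 * allocation ("alike"). For movable allocations, every movable page
	 * counts, since it can be migrated away if needed.
	 */
	if (start_type == MIGRATE_MOVABLE) {
		alike_pages = movable_pages;
	} else {
		/*
		 * When stealing from a MOVABLE pageblock, anything that is
		 * neither free nor movable is treated as compatible with an
		 * UNMOVABLE/RECLAIMABLE allocation. For other fallbacks we
		 * cannot tell UNMOVABLE and RECLAIMABLE apart, so be
		 * conservative and count nothing extra.
		 */
		old_block_type = get_pageblock_migratetype(page);
		if (old_block_type == MIGRATE_MOVABLE)
			alike_pages = pageblock_nr_pages
						- (free_pages + movable_pages);
		else
			alike_pages = 0;
	}

	/*
	 * If a sufficient number of pages in the block are either free or of
	 * comparable migratability to our allocation, claim the whole block.
	 */
	if (free_pages + alike_pages >= (1 << (pageblock_order - 1)) ||
			page_group_by_mobility_disabled)
		set_pageblock_migratetype(page, start_type);
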

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v2 04/10] mm, page_alloc: count movable pages when stealing from pageblock
  2017-02-15 11:56         ` Xishi Qiu
@ 2017-02-17 16:21           ` Vlastimil Babka
  -1 siblings, 0 replies; 92+ messages in thread
From: Vlastimil Babka @ 2017-02-17 16:21 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: linux-mm, Johannes Weiner, Joonsoo Kim, David Rientjes,
	Mel Gorman, linux-kernel, kernel-team

On 02/15/2017 12:56 PM, Xishi Qiu wrote:
> On 2017/2/15 18:47, Vlastimil Babka wrote:
> 
>> On 02/14/2017 11:07 AM, Xishi Qiu wrote:
>>> On 2017/2/11 1:23, Vlastimil Babka wrote:
>>>
>>>> When stealing pages from pageblock of a different migratetype, we count how
>>>> many free pages were stolen, and change the pageblock's migratetype if more
>>>> than half of the pageblock was free. This might be too conservative, as there
>>>> might be other pages that are not free, but were allocated with the same
>>>> migratetype as our allocation requested.
>>>>
>>>> While we cannot determine the migratetype of allocated pages precisely (at
>>>> least without the page_owner functionality enabled), we can count pages that
>>>> compaction would try to isolate for migration - those are either on LRU or
>>>> __PageMovable(). The rest can be assumed to be MIGRATE_RECLAIMABLE or
>>>> MIGRATE_UNMOVABLE, which we cannot easily distinguish. This counting can be
>>>> done as part of free page stealing with little additional overhead.
>>>>
>>>> The page stealing code is changed so that it considers free pages plus pages
>>>> of the "good" migratetype for the decision whether to change pageblock's
>>>> migratetype.
>>>>
>>>> The result should be more accurate migratetype of pageblocks wrt the actual
>>>> pages in the pageblocks, when stealing from semi-occupied pageblocks. This
>>>> should help the efficiency of page grouping by mobility.
>>>>
>>>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
>>>
>>> Hi Vlastimil,
>>>
>>> How about these two changes?
>>>
>>> 1. If we steal some free pages, we will add these pages at the head of the start_migratetype
>>> list, it will cause more fixed, because these pages will be allocated more easily.
>> 
>> What do you mean by "more fixed" here?
>> 
>>> So how about using list_move_tail instead of list_move?
>> 
>> Hmm, not sure if it can make any difference. We steal because the lists
>> are currently empty (at least for the order we want), so it shouldn't
>> matter if we add to head or tail.
>> 
> 
> Hi Vlastimil,
> 
> Please see the following case, I am not sure if it is right.
> 
> MIGRATE_MOVABLE
> order:    0 1 2 3 4 5 6 7 8 9 10
> free num: 1 1 1 1 1 1 1 1 1 1 0  // one page(e.g. page A) was allocated before
>
> MIGRATE_UNMOVABLE
> order:    0 1 2 3 4 5 6 7 8 9 10
> free num: x x x x 0 0 0 0 0 0 0 // we want order=4, so steal from MIGRATE_MOVABLE
> 
> We alloc order=4 in MIGRATE_UNMOVABLE, then it will fall back to steal pages from
> MIGRATE_MOVABLE, and we will move free pages from the MIGRATE_MOVABLE list to
> the MIGRATE_UNMOVABLE list.
> 
> The lists of order 4-9 in MIGRATE_UNMOVABLE are empty, so adding to head or tail is the same.
> But order 0-3 is not empty, so if we add to the head, we will later allocate the pages
> stolen from MIGRATE_MOVABLE first. So we will have less chance to make a large
> block (order=10) when the one page (page A) is freed again.

I see. But do we know that page A, and the order-4 page we just allocated, are
both going to be freed soon? It's not a clear win to me, so maybe you can try
implementing it and see if it makes any difference?
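
(For reference, the change being discussed here is essentially a one-liner in
move_freepages() - a sketch against 4.9, untested:)

	/*
	 * mm/page_alloc.c, move_freepages(): queue the stolen free pages at
	 * the tail of the target freelist instead of the head, so they are
	 * handed out last and keep a better chance of merging back into a
	 * large block once the remaining allocated page is freed.
	 */
	list_move_tail(&page->lru,
		       &zone->free_area[order].free_list[migratetype]);
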

> Also we will split the order=9 page from MIGRATE_MOVABLE to alloc order=4 in expand(),

Yes, for pageblock order == 9.

> so if we add to the head, we will later allocate the pages split from order=9 first.
> So we will have less chance to make a large block (order=9) when the order=4 page
> is freed again.

Again that assumes our order-4 allocation is temporary. Is there a significant
chance of this?

>>> __rmqueue_fallback
>>> 	steal_suitable_fallback
>>> 		move_freepages_block
>>> 			move_freepages
>>> 				list_move
>>>
>>> 2. When doing expand() - list_add(), usually the list is empty, but in the
>>> following case, the list is not empty, because we did move_freepages_block()
>>> before.
>>>
>>> __rmqueue_fallback
>>> 	steal_suitable_fallback
>>> 		move_freepages_block  // move to the list of start_migratetype
>>> 	expand  // split the largest order
>>> 		list_add  // add to the list of start_migratetype
>>>
>>> So how about using list_add_tail instead of list_add? Then we can merge the large
>>> block again as soon as the page is freed.
>> 
>> Same here. The lists are not empty, but contain probably just the pages
>> from our stolen pageblock. It shouldn't matter how we order them within
>> the same block.
>> 
>> So maybe it could make some difference for higher-order allocations, but
>> it's unclear to me. Making e.g. expand() more complex with a flag to
>> tell it the head vs tail add could mean extra overhead in allocator fast
>> path that would offset any gains.
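
(The second proposal would similarly touch the list_add() in expand() - again
just a sketch; as noted above, making this conditional on the fallback path
would need an extra flag:)

	/*
	 * mm/page_alloc.c, expand(): put the split-off buddies at the tail of
	 * the freelist, so pages already on the list are allocated first and
	 * the split halves can merge back sooner when the allocation is freed.
	 */
	list_add_tail(&page[size].lru, &area->free_list[migratetype]);
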
>> 
>>> Thanks,
>>> Xishi Qiu
>>>
>> 
>> 
>> .
>> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v2 07/10] mm, compaction: restrict async compaction to pageblocks of same migratetype
  2017-02-14 20:10     ` Johannes Weiner
@ 2017-02-17 16:32       ` Vlastimil Babka
  -1 siblings, 0 replies; 92+ messages in thread
From: Vlastimil Babka @ 2017-02-17 16:32 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: linux-mm, Joonsoo Kim, David Rientjes, Mel Gorman, linux-kernel,
	kernel-team

On 02/14/2017 09:10 PM, Johannes Weiner wrote:
> On Fri, Feb 10, 2017 at 06:23:40PM +0100, Vlastimil Babka wrote:
>> The migrate scanner in async compaction is currently limited to MIGRATE_MOVABLE
>> pageblocks. This is a heuristic intended to reduce latency, based on the
>> assumption that non-MOVABLE pageblocks are unlikely to contain movable pages.
>> 
>> However, with the exception of THP's, most high-order allocations are not
>> movable. Should the async compaction succeed, this increases the chance that
>> the non-MOVABLE allocations will fallback to a MOVABLE pageblock, making the
>> long-term fragmentation worse.
>> 
>> This patch attempts to help the situation by changing async direct compaction
>> so that the migrate scanner only scans the pageblocks of the requested
>> migratetype. If it's a non-MOVABLE type and there are such pageblocks that do
>> contain movable pages, chances are that the allocation can succeed within one
>> of such pageblocks, removing the need for a fallback. If that fails, the
>> subsequent sync attempt will ignore this restriction.
>> 
>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> 
> Yes, IMO we should make the async compaction scanner decontaminate
> unmovable blocks. This is because we fall back to other-typed blocks
> before we reclaim,

Which we could change too, patch 9 is a step in that direction.

> so any unmovable blocks that aren't perfectly
> occupied will fill with greedy page cache (and order-0 doesn't steal
> blocks back to make them compactable again).

An order-0 allocation can actually steal the block back; the decision to steal is
based on the order of the free pages in the fallback block, not on the
allocation order. But maybe I'm just not sure what exactly you meant here.
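
(For context, the decision referred to here is made in can_steal_fallback(),
which in 4.9 reads roughly as follows - comments are paraphrased; "order" is
the order of the free page found in the fallback list, not the requested
allocation order:)

	static bool can_steal_fallback(unsigned int order, int start_mt)
	{
		/* A fully free pageblock can always be claimed wholesale. */
		if (order >= pageblock_order)
			return true;

		/*
		 * Steal, and possibly retype the pageblock, for large free
		 * pages or whenever the allocation itself is unmovable or
		 * reclaimable, to limit future pollution of movable blocks.
		 */
		if (order >= pageblock_order / 2 ||
				start_mt == MIGRATE_RECLAIMABLE ||
				start_mt == MIGRATE_UNMOVABLE ||
				page_group_by_mobility_disabled)
			return true;

		return false;
	}
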

> Subsequent unmovable
> higher-order allocations in turn are more likely to fall back and
> steal more movable blocks.

Yes.

> As long as we have vastly more movable blocks than unmovable blocks,
> continuous page cache turnover will counteract this negative trend -
> pages are reclaimed mostly from movable blocks and some unmovable
> blocks, while new cache allocations are placed into the freed movable
> blocks - slowly moving cache out from unmovable blocks into movable
> ones. But that effect is independent of the rate of higher-order
> allocations and can be overwhelmed, so I think it makes sense to
> involve compaction directly in decontamination.

Interesting observation, I agree.

> The thing I'm not entirely certain about is the aggressiveness of this
> patch. Instead of restricting the async scanner to blocks of the same
> migratetype, wouldn't it be better (in terms of allocation latency) to
> simply let it compact *all* block types?

Yes it would help allocation latency, but I'm afraid it will remove most of the
decontamination effect.

> Maybe changing it to look at
> unmovable blocks is enough to curb cross-contamination. Sure there
> will still be some, but now we're matching the decontamination rate to
> the rate of !movable higher-order allocations and don't just rely on
> the independent cache turnover rate, which during higher-order bursts
> might not be high enough to prevent an expansion of unmovable blocks.

The rate of compaction attempts is matched with allocations, but the probability
of the compaction scanner being in an unmovable block is low when the majority of
blocks are movable. So the decontamination rate is proportional, but much smaller.

> Does that make sense?

I guess I can try and look at the stats, but I have doubts.

Thanks for the feedback!
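
(For concreteness, the restriction under discussion boils down to a check along
these lines when the migrate scanner picks a pageblock - a sketch, not the exact
patch; the helper name and the handling of CMA blocks are assumptions:)

	/* May this pageblock serve as a migration source for this compaction? */
	static bool suitable_migration_source(struct compact_control *cc,
						struct page *page)
	{
		int block_mt;

		/*
		 * Sync compaction, and compaction not done on behalf of a
		 * specific allocation, may scan any pageblock.
		 */
		if (!cc->direct_compaction || cc->mode != MIGRATE_ASYNC)
			return true;

		block_mt = get_pageblock_migratetype(page);

		/*
		 * Async direct compaction: for MOVABLE allocations keep the
		 * old movable-only heuristic; for others, only scan blocks of
		 * the requested migratetype, so that a successful compaction
		 * avoids a fragmenting fallback.
		 */
		if (cc->migratetype == MIGRATE_MOVABLE)
			return block_mt == MIGRATE_MOVABLE ||
			       is_migrate_cma(block_mt);

		return block_mt == cc->migratetype;
	}
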

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v2 07/10] mm, compaction: restrict async compaction to pageblocks of same migratetype
  2017-02-17 16:32       ` Vlastimil Babka
@ 2017-02-17 17:39         ` Johannes Weiner
  -1 siblings, 0 replies; 92+ messages in thread
From: Johannes Weiner @ 2017-02-17 17:39 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Joonsoo Kim, David Rientjes, Mel Gorman, linux-kernel,
	kernel-team

On Fri, Feb 17, 2017 at 05:32:00PM +0100, Vlastimil Babka wrote:
> On 02/14/2017 09:10 PM, Johannes Weiner wrote:
> > On Fri, Feb 10, 2017 at 06:23:40PM +0100, Vlastimil Babka wrote:
> >> The migrate scanner in async compaction is currently limited to MIGRATE_MOVABLE
> >> pageblocks. This is a heuristic intended to reduce latency, based on the
> >> assumption that non-MOVABLE pageblocks are unlikely to contain movable pages.
> >> 
> >> However, with the exception of THP's, most high-order allocations are not
> >> movable. Should the async compaction succeed, this increases the chance that
> >> the non-MOVABLE allocations will fallback to a MOVABLE pageblock, making the
> >> long-term fragmentation worse.
> >> 
> >> This patch attempts to help the situation by changing async direct compaction
> >> so that the migrate scanner only scans the pageblocks of the requested
> >> migratetype. If it's a non-MOVABLE type and there are such pageblocks that do
> >> contain movable pages, chances are that the allocation can succeed within one
> >> of such pageblocks, removing the need for a fallback. If that fails, the
> >> subsequent sync attempt will ignore this restriction.
> >> 
> >> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> > 
> > Yes, IMO we should make the async compaction scanner decontaminate
> > unmovable blocks. This is because we fall back to other-typed blocks
> > before we reclaim,
> 
> Which we could change too, patch 9 is a step in that direction.

Yep, patch 9 looks good to me too, pending data that confirms it.

> > so any unmovable blocks that aren't perfectly
> > occupied will fill with greedy page cache (and order-0 doesn't steal
> > blocks back to make them compactable again).
> 
> An order-0 allocation can actually steal the block back; the decision to steal is
> based on the order of the free pages in the fallback block, not on the
> allocation order. But maybe I'm just not sure what exactly you meant here.

No, that was me misreading the code. Scratch what's in parentheses.

> > The thing I'm not entirely certain about is the aggressiveness of this
> > patch. Instead of restricting the async scanner to blocks of the same
> > migratetype, wouldn't it be better (in terms of allocation latency) to
> > simply let it compact *all* block types?
> 
> Yes it would help allocation latency, but I'm afraid it will remove most of the
> decontamination effect.
> 
> > Maybe changing it to look at
> > unmovable blocks is enough to curb cross-contamination. Sure there
> > will still be some, but now we're matching the decontamination rate to
> > the rate of !movable higher-order allocations and don't just rely on
> > the independent cache turnover rate, which during higher-order bursts
> > might not be high enough to prevent an expansion of unmovable blocks.
> 
> The rate of compaction attempts is matched with allocations, but the probability
> of the compaction scanner being in an unmovable block is low when the majority of
> blocks are movable. So the decontamination rate is proportional, but much smaller.

Yeah, you're right. The unmovable blocks would still expand; we'd just
turn it into a logarithmic curve.

> > Does that make sense?
> 
> I guess I can try and look at the stats, but I have doubts.

I don't insist. Your patch is implementing a good thing; we can just
keep an eye out for a change in allocation latencies before spending
time trying to mitigate a potential non-issue.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v2 00/10] try to reduce fragmenting fallbacks
@ 2017-02-20 12:30     ` Vlastimil Babka
  0 siblings, 0 replies; 92+ messages in thread
From: Vlastimil Babka @ 2017-02-20 12:30 UTC (permalink / raw)
  To: Mel Gorman
  Cc: linux-mm, Johannes Weiner, Joonsoo Kim, David Rientjes,
	linux-kernel, kernel-team

On 02/13/2017 12:07 PM, Mel Gorman wrote:
> On Fri, Feb 10, 2017 at 06:23:33PM +0100, Vlastimil Babka wrote:
> 
> By and large, I like the series, particularly patches 7 and 8. I cannot
> make up my mind about the RFC patches 9 and 10 yet. Conceptually they
> seem sound but they are much more far reaching than the rest of the
> series.
> 
> It would be nice if patches 1-8 could be treated in isolation with data
> on the number of extfrag events triggered, time spent in compaction and
> the success rate. Patches 9 and 10 are tricky enough that they would need
> data per patch, whereas patches 1-8 should be ok with data gathered for
> the whole series.
 
Ok let's try again with a fresh subthread after fixing automation and
postprocessing...

I've got the results with mmtests stress-highalloc modified to do
GFP_KERNEL order-4 allocations, on 4.9 with "mm, vmscan: fix zone
balance check in prepare_kswapd_sleep" (without that, kcompactd indeed
wasn't woken up) on UMA machine with 4GB memory. There were 5 repeats of
each run, as the extfrag stats are quite volatile (note the stats below
are sums, not averages, as it was less perl hacking for me).

Success rates are the same, already high due to the low allocation order used.

                                   patch 1     patch 2     patch 3     patch 4     patch 7     patch 8     patch 9    patch 10
Compaction stalls                    22449       24680       24846       19765       22059       17480       29499       58284
Compaction success                   12971       14836       14608       10475       11632        8757       16697       12544
Compaction failures                   9477        9843       10238        9290       10426        8722       12801       45739
Page migrate success               3109022     3370438     3312164     1695105     1608435     2111379     2445824     3288822
Page migrate failure                911588     1149065     1028264     1112675     1077251     1026367     1014035      398158
Compaction pages isolated          7242983     8015530     7782467     4629063     4402787     5377665     6062703     7180216
Compaction migrate scanned       980838938   987367943   957690188   917647238   947155598  1018922197  1041367620   209082744
Compaction free scanned          557926893   598946443   602236894   594024490   541169699   763651731   827822984   396678647
Compaction cost                      10243       10578       10304        8286        8398        9440        9957        5019

Compaction stats are mostly within noise until patch 4, which decreases the
number of compactions and migrations. Part of that could be due to more
pageblocks being marked as unmovable, and async compaction skipping those. This
changes a bit with patch 7, but not so much. Patch 8 increases free scanner
stats and migrations, which comes from the changed termination criteria.
Interestingly, the number of compactions decreases - probably the fully compacted
pageblock satisfies multiple subsequent allocations, so it amortizes.
Patch 9 increases compaction attempts as we force them before fallbacks. The success
vs failure ratio increases, so it might be worth it.
Patch 10 looks quite bad for compaction - lots of attempts and failures, but
the scanner stats went down. I probably need to check whether the new migratetype
is treated optimally by the compaction suitability checks.

Next comes the extfrag tracepoint, where "fragmenting" means that an allocation
had to fall back to a pageblock of another migratetype which wasn't fully free
(which is almost all of the fallbacks). I have locally added another tracepoint
for "Pages steal" into steal_suitable_fallback(), which triggers in situations
where we are allowed to do move_freepages_block(). If we decide to also do
set_pageblock_migratetype(), it's "Pages steal with pageblock", with a breakdown
of which allocation migratetype we are stealing for and which fallback
migratetype we are stealing from. The last part, "due to counting", comes from
patch 4 and counts the events where the counting of movable pages allowed us to
change the pageblock's migratetype, while the number of free pages alone wouldn't
have been enough to cross the threshold.
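
(The "Pages steal" instrumentation is not part of the posted series; it
presumably amounts to hooks in steal_suitable_fallback() along these lines -
a sketch only, the format strings and exact placement are made up:)

	/* After move_freepages_block() has counted free and "good" pages: */
	trace_printk("page_steal: alloc_mt=%d fallback_mt=%d free=%d good=%d\n",
		     start_type, old_block_type, free_pages, good_pages);

	/* And when the threshold is crossed and the pageblock is retyped: */
	if (free_pages + good_pages >= (1 << (pageblock_order - 1)) ||
			page_group_by_mobility_disabled) {
		trace_printk("page_steal: pageblock claimed%s\n",
			     free_pages >= (1 << (pageblock_order - 1)) ?
				"" : " due to counting");
		set_pageblock_migratetype(page, start_type);
	}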

                                                     patch 1     patch 2     patch 3     patch 4     patch 7     patch 8     patch 9    patch 10
Page alloc extfrag event                            10155066     8522968    10164959    15622080    13727068    13140319     6584820     2030419
Extfrag fragmenting                                 10149231     8517025    10159040    15616925    13721391    13134792     6579315     2024038
Extfrag fragmenting for unmovable                     159504      168500      184177       97835       70625       56948       50413      166200
Extfrag fragmenting unmovable placed with movable     153613      163549      172693       91740       64099       50917       44845       20256
Extfrag fragmenting unmovable placed with reclaim.      5891        4951       11484        6095        6526        6031        5568       26540
Extfrag fragmenting for reclaimable                     4738        4829        6345        4822        5640        5378        4213        6599
Extfrag fragmenting reclaimable placed with movable     1836        1902        1851        1579        1739        1760        1918         965
Extfrag fragmenting reclaimable placed with unmov.      2902        2927        4494        3243        3901        3618        2295        3867
Extfrag fragmenting for movable                      9984989     8343696     9968518    15514268    13645126    13072466     6524689     1851239
Pages steal                                           179954      192291      210880      123254       94545       81486       72717     2024038
Pages steal with pageblock                             22153       18943       20154       33562       29969       33444       32871     1572912
Pages steal with pageblock for unmovable               14350       12858       13256       20660       19003       20852       20265       21010
Pages steal with pageblock for unmovable from mov.     12812       11402       11683       19072       17467       19298       18791        5271
Pages steal with pageblock for unmovable from recl.     1538        1456        1573        1588        1536        1554        1474        1421
Pages steal with pageblock for movable                  7114        5489        5965       11787       10012       11493       11586     1550723
Pages steal with pageblock for movable from unmov.      6885        5291        5541       11179        9525       10885       10874       29787
Pages steal with pageblock for movable from recl.        229         198         424         608         487         608         712        1190
Pages steal with pageblock for reclaimable               689         596         933        1115         954        1099        1020        1179
Pages steal with pageblock for reclaimable from unmov.   273         219         537         658         547         667         584         629
Pages steal with pageblock for reclaimable from mov.     416         377         396         457         407         432         436         324
Pages steal with pageblock due to counting                                                 11834       10075        7530        6927     1381357
... for unmovable                                                                           8993        7381        4616        3863         344
... for movable                                                                             2792        2653        2851        2972     1380981
... for reclaimable                                                                           49          41          63          92          32


What we can see is that "Extfrag fragmenting for unmovable" and "... placed with
movable" drop with almost every patch, which is good as we are polluting movable
pageblocks with unmovable pages less.

The most significant change is patch 4 with the movable page counting. On the
other hand it increases "Extfrag fragmenting for movable" by 50%. "Pages steal"
drops though, so these movable allocation fallbacks find only small free pages
and are not allowed to steal whole pageblocks back. "Pages steal with pageblock"
rises, because the patch increases the chances of pageblock migratetype changes
happening. This affects all migratetypes.

The summary is that patch 4 is not a clear win wrt these stats, but I believe
the tradeoff it makes is a good one. There's less pollution of movable
pageblocks by unmovable allocations. There's less stealing between pageblocks,
and the steals that remain have a higher chance of also changing the migratetype
of the pageblock itself, so it should more faithfully reflect the migratetype of
the pages within the pageblock. The increase of movable allocations falling back
to unmovable pageblocks might look dramatic, but those allocations can be
migrated by compaction when needed, and other patches in the series (7-9)
improve that aspect.

Patches 7 and 8 continue the trend of reduced unmovable fallbacks and also
reduce the impact on movable fallbacks from patch 4. The same goes for patch 9,
which also cuts the movable fallbacks in half. It's not completely clear to me
why. Perhaps the more aggressive compaction of unmovable blocks results in
unmovable allocations (such as the GFP_KERNEL ones from the workload) fitting
within fewer blocks, so reclaim has a higher chance of freeing the LRU pages
within movable blocks, and new movable allocations don't have to fall back as
much.

Patch 10 kills all the improvements to "Pages steal with pageblock", so I'll
have to investigate.
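
As a side note, the "due to counting" rows are easier to reason about with a
tiny standalone model of the half-pageblock claiming threshold (the real
decision lives in steal_suitable_fallback(); the function names and the
512-page pageblock below are only illustrative):

#include <stdbool.h>
#include <stdio.h>

/* Illustrative pageblock size: 2MB pageblock with 4kB pages on x86. */
#define PAGEBLOCK_NR_PAGES 512

/*
 * Model of the claiming heuristic: steal the whole pageblock when free
 * pages plus "good" (compatible in-use) pages cover at least half of it.
 * Before patch 4, only the free pages were considered.
 */
static bool claim_whole_block(int free_pages, int good_pages)
{
	return free_pages + good_pages >= PAGEBLOCK_NR_PAGES / 2;
}

/* The "due to counting" case: good_pages made the difference. */
static bool claimed_due_to_counting(int free_pages, int good_pages)
{
	return claim_whole_block(free_pages, good_pages) &&
	       free_pages < PAGEBLOCK_NR_PAGES / 2;
}

int main(void)
{
	int free_pages = 100, good_pages = 200;

	printf("claim whole block: %d\n",
	       claim_whole_block(free_pages, good_pages));
	printf("due to counting:   %d\n",
	       claimed_due_to_counting(free_pages, good_pages));
	return 0;
}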

To sum up, patches 1-8 look OK to me. Patch 9 also looks very promising, but
there's a danger of increased allocation latencies due to the forced compaction.
Patch 10 either has implementation bugs or there's some unforeseen consequence
of its design.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v2 00/10] try to reduce fragmenting fallbacks
  2017-02-20 12:30     ` Vlastimil Babka
@ 2017-02-23 16:01       ` Mel Gorman
  -1 siblings, 0 replies; 92+ messages in thread
From: Mel Gorman @ 2017-02-23 16:01 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Johannes Weiner, Joonsoo Kim, David Rientjes,
	linux-kernel, kernel-team

On Mon, Feb 20, 2017 at 01:30:33PM +0100, Vlastimil Babka wrote:
> On 02/13/2017 12:07 PM, Mel Gorman wrote:
> > On Fri, Feb 10, 2017 at 06:23:33PM +0100, Vlastimil Babka wrote:
> > 
> > By and large, I like the series, particularly patches 7 and 8. I cannot
> > make up my mind about the RFC patches 9 and 10 yet. Conceptually they
> > seem sound but they are much more far-reaching than the rest of the
> > series.
> > 
> > It would be nice if patches 1-8 could be treated in isolation with data
> > on the number of extfrag events triggered, time spent in compaction and
> > the success rate. Patches 9 and 10 are tricky enough that they would need
> > data per patch, whereas patches 1-8 should be ok with data gathered for
> > the whole series.
>  
> Ok let's try again with a fresh subthread after fixing automation and
> postprocessing...
> 
> <SNIP>
> 
> To sum up, patches 1-8 look OK to me. Patch 9 also looks very promising, but
> there's a danger of increased allocation latencies due to the forced compaction.
> Patch 10 either has implementation bugs or there's some unforeseen consequence
> of its design.
> 

I don't have anything useful to add other than that the figures for patches
1-8 look good and that the reduction in fragmenting events that misplace
unmovable allocations is welcome.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [RFC v2 10/10] mm, page_alloc: introduce MIGRATE_MIXED migratetype
  2017-02-10 17:23   ` Vlastimil Babka
@ 2017-03-08  2:16     ` Yisheng Xie
  -1 siblings, 0 replies; 92+ messages in thread
From: Yisheng Xie @ 2017-03-08  2:16 UTC (permalink / raw)
  To: Vlastimil Babka, linux-mm, Johannes Weiner
  Cc: Joonsoo Kim, David Rientjes, Mel Gorman, linux-kernel,
	kernel-team, Hanjun Guo

Hi Vlastimil ,

On 2017/2/11 1:23, Vlastimil Babka wrote:
> @@ -1977,7 +1978,7 @@ static void steal_suitable_fallback(struct zone *zone, struct page *page,
>  	unsigned int current_order = page_order(page);
>  	struct free_area *area;
>  	int free_pages, good_pages;
> -	int old_block_type;
> +	int old_block_type, new_block_type;
>  
>  	/* Take ownership for orders >= pageblock_order */
>  	if (current_order >= pageblock_order) {
> @@ -1991,11 +1992,27 @@ static void steal_suitable_fallback(struct zone *zone, struct page *page,
>  	if (!whole_block) {
>  		area = &zone->free_area[current_order];
>  		list_move(&page->lru, &area->free_list[start_type]);
> -		return;
> +		free_pages = 1 << current_order;
> +		/* TODO: We didn't scan the block, so be pessimistic */
> +		good_pages = 0;
> +	} else {
> +		free_pages = move_freepages_block(zone, page, start_type,
> +							&good_pages);
> +		/*
> +		 * good_pages is now the number of movable pages, but if we
> +		 * want UNMOVABLE or RECLAIMABLE, we consider all non-movable
> +		 * as good (but we can't fully distinguish them)
> +		 */
> +		if (start_type != MIGRATE_MOVABLE)
> +			good_pages = pageblock_nr_pages - free_pages -
> +								good_pages;
>  	}
>  
>  	free_pages = move_freepages_block(zone, page, start_type,
>  						&good_pages);
It seems this move_freepages_block() should be removed: if we can steal the
whole block then we just do it; if not, we can check whether we can set it as
mixed mt, right? Please let me know if I missed something.

Thanks
Yisheng Xie

> +
> +	new_block_type = old_block_type = get_pageblock_migratetype(page);
> +
>  	/*
>  	 * good_pages is now the number of movable pages, but if we
>  	 * want UNMOVABLE or RECLAIMABLE allocation, it's more tricky
> @@ -2007,7 +2024,6 @@ static void steal_suitable_fallback(struct zone *zone, struct page *page,
>  		 * falling back to RECLAIMABLE or vice versa, be conservative
>  		 * as we can't distinguish the exact migratetype.
>  		 */
> -		old_block_type = get_pageblock_migratetype(page);
>  		if (old_block_type == MIGRATE_MOVABLE)
>  			good_pages = pageblock_nr_pages
>  						- free_pages - good_pages;
> @@ -2015,10 +2031,34 @@ static void steal_suitable_fallback(struct zone *zone, struct page *page,
>  			good_pages = 0;
>  	}
>  
> -	/* Claim the whole block if over half of it is free or good type */
> -	if (free_pages + good_pages >= (1 << (pageblock_order-1)) ||
> -			page_group_by_mobility_disabled)
> -		set_pageblock_migratetype(page, start_type);
> +	if (page_group_by_mobility_disabled) {
> +		new_block_type = start_type;
> +	} else if (free_pages + good_pages >= (1 << (pageblock_order-1))) {
> +		/*
> +		 * Claim the whole block if over half of it is free or good
> +		 * type. The exception is the transition to MIGRATE_MOVABLE
> +		 * where we require it to be fully free so that MIGRATE_MOVABLE
> +		 * pageblocks consist of purely movable pages. So if we steal
> +		 * less than whole pageblock, mark it as MIGRATE_MIXED.
> +		 */
> +		if ((start_type == MIGRATE_MOVABLE) &&
> +				free_pages + good_pages < pageblock_nr_pages)
> +			new_block_type = MIGRATE_MIXED;
> +		else
> +			new_block_type = start_type;
> +	} else {
> +		/*
> +		 * We didn't steal enough to change the block's migratetype.
> +		 * But if we are stealing from a MOVABLE block for a
> +		 * non-MOVABLE allocation, mark the block as MIXED.
> +		 */
> +		if (old_block_type == MIGRATE_MOVABLE
> +					&& start_type != MIGRATE_MOVABLE)
> +			new_block_type = MIGRATE_MIXED;
> +	}
> +
> +	if (new_block_type != old_block_type)
> +		set_pageblock_migratetype(page, new_block_type);
>  }
>  
>  /*
> @@ -2560,16 +2600,18 @@ int __isolate_free_page(struct page *page, unsigned int order)
>  	rmv_page_order(page);
>  
>  	/*
> -	 * Set the pageblock if the isolated page is at least half of a
> -	 * pageblock
> +	 * Set the pageblock's migratetype to MIXED if the isolated page is
> +	 * at least half of a pageblock, MOVABLE if at least whole pageblock
>  	 */
>  	if (order >= pageblock_order - 1) {
>  		struct page *endpage = page + (1 << order) - 1;
> +		int new_mt = (order >= pageblock_order) ?
> +					MIGRATE_MOVABLE : MIGRATE_MIXED;
>  		for (; page < endpage; page += pageblock_nr_pages) {
>  			int mt = get_pageblock_migratetype(page);
> -			if (!is_migrate_isolate(mt) && !is_migrate_cma(mt))
> -				set_pageblock_migratetype(page,
> -							  MIGRATE_MOVABLE);
> +
> +			if (!is_migrate_isolate(mt) && !is_migrate_movable(mt))
> +				set_pageblock_migratetype(page, new_mt);
>  		}
>  	}
>  
> @@ -4252,6 +4294,7 @@ static void show_migration_types(unsigned char type)
>  		[MIGRATE_MOVABLE]	= 'M',
>  		[MIGRATE_RECLAIMABLE]	= 'E',
>  		[MIGRATE_HIGHATOMIC]	= 'H',
> +		[MIGRATE_MIXED]		= 'M',
>  #ifdef CONFIG_CMA
>  		[MIGRATE_CMA]		= 'C',
>  #endif
> 

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [RFC v2 10/10] mm, page_alloc: introduce MIGRATE_MIXED migratetype
  2017-03-08  2:16     ` Yisheng Xie
@ 2017-03-08  7:07       ` Vlastimil Babka
  -1 siblings, 0 replies; 92+ messages in thread
From: Vlastimil Babka @ 2017-03-08  7:07 UTC (permalink / raw)
  To: Yisheng Xie, linux-mm, Johannes Weiner
  Cc: Joonsoo Kim, David Rientjes, Mel Gorman, linux-kernel,
	kernel-team, Hanjun Guo

On 03/08/2017 03:16 AM, Yisheng Xie wrote:
> Hi Vlastimil ,
> 
> On 2017/2/11 1:23, Vlastimil Babka wrote:
>> @@ -1977,7 +1978,7 @@ static void steal_suitable_fallback(struct zone *zone, struct page *page,
>>  	unsigned int current_order = page_order(page);
>>  	struct free_area *area;
>>  	int free_pages, good_pages;
>> -	int old_block_type;
>> +	int old_block_type, new_block_type;
>>  
>>  	/* Take ownership for orders >= pageblock_order */
>>  	if (current_order >= pageblock_order) {
>> @@ -1991,11 +1992,27 @@ static void steal_suitable_fallback(struct zone *zone, struct page *page,
>>  	if (!whole_block) {
>>  		area = &zone->free_area[current_order];
>>  		list_move(&page->lru, &area->free_list[start_type]);
>> -		return;
>> +		free_pages = 1 << current_order;
>> +		/* TODO: We didn't scan the block, so be pessimistic */
>> +		good_pages = 0;
>> +	} else {
>> +		free_pages = move_freepages_block(zone, page, start_type,
>> +							&good_pages);
>> +		/*
>> +		 * good_pages is now the number of movable pages, but if we
>> +		 * want UNMOVABLE or RECLAIMABLE, we consider all non-movable
>> +		 * as good (but we can't fully distinguish them)
>> +		 */
>> +		if (start_type != MIGRATE_MOVABLE)
>> +			good_pages = pageblock_nr_pages - free_pages -
>> +								good_pages;
>>  	}
>>  
>>  	free_pages = move_freepages_block(zone, page, start_type,
>>  						&good_pages);
> It seems this move_freepages_block() should be removed: if we can steal the
> whole block then we just do it; if not, we can check whether we can set it as
> mixed mt, right? Please let me know if I missed something.

Right. My results suggested this patch was buggy, so this might be the
bug (or one of the bugs), thanks for pointing it out. I've reposted v3
without the RFC patches 9 and 10 and will return to them later.
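
If that's indeed the leftover, the fix is presumably just dropping the
duplicated call, roughly like this (untested sketch on top of the RFC patch,
not a submitted change):

 	}
 
-	free_pages = move_freepages_block(zone, page, start_type,
-						&good_pages);
-
 	new_block_type = old_block_type = get_pageblock_migratetype(page);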

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [RFC v2 10/10] mm, page_alloc: introduce MIGRATE_MIXED migratetype
  2017-03-08  7:07       ` Vlastimil Babka
@ 2017-03-13  2:16         ` Yisheng Xie
  -1 siblings, 0 replies; 92+ messages in thread
From: Yisheng Xie @ 2017-03-13  2:16 UTC (permalink / raw)
  To: Vlastimil Babka, linux-mm, Johannes Weiner
  Cc: Joonsoo Kim, David Rientjes, Mel Gorman, linux-kernel,
	kernel-team, Hanjun Guo

Hi, Vlastimil,

On 2017/3/8 15:07, Vlastimil Babka wrote:
> On 03/08/2017 03:16 AM, Yisheng Xie wrote:
>> Hi Vlastimil ,
>>
>> On 2017/2/11 1:23, Vlastimil Babka wrote:
>>> @@ -1977,7 +1978,7 @@ static void steal_suitable_fallback(struct zone *zone, struct page *page,
>>>  	unsigned int current_order = page_order(page);
>>>  	struct free_area *area;
>>>  	int free_pages, good_pages;
>>> -	int old_block_type;
>>> +	int old_block_type, new_block_type;
>>>  
>>>  	/* Take ownership for orders >= pageblock_order */
>>>  	if (current_order >= pageblock_order) {
>>> @@ -1991,11 +1992,27 @@ static void steal_suitable_fallback(struct zone *zone, struct page *page,
>>>  	if (!whole_block) {
>>>  		area = &zone->free_area[current_order];
>>>  		list_move(&page->lru, &area->free_list[start_type]);
>>> -		return;
>>> +		free_pages = 1 << current_order;
>>> +		/* TODO: We didn't scan the block, so be pessimistic */
>>> +		good_pages = 0;
>>> +	} else {
>>> +		free_pages = move_freepages_block(zone, page, start_type,
>>> +							&good_pages);
>>> +		/*
>>> +		 * good_pages is now the number of movable pages, but if we
>>> +		 * want UNMOVABLE or RECLAIMABLE, we consider all non-movable
>>> +		 * as good (but we can't fully distinguish them)
>>> +		 */
>>> +		if (start_type != MIGRATE_MOVABLE)
>>> +			good_pages = pageblock_nr_pages - free_pages -
>>> +								good_pages;
>>>  	}
>>>  
>>>  	free_pages = move_freepages_block(zone, page, start_type,
>>>  						&good_pages);
>> It seems this move_freepages_block() should be removed: if we can steal the
>> whole block then we just do it; if not, we can check whether we can set it as
>> mixed mt, right? Please let me know if I missed something.
> 
> Right. My results suggested this patch was buggy, so this might be the
> bug (or one of the bugs), thanks for pointing it out. I've reposted v3
> without the RFC patches 9 and 10 and will return to them later.
Yes, I have also tested this patch on v4.1, but could not get better performance.
And it would be much appreciated if you could Cc me when you send the patches
about 9 and 10 later.

Thanks
Yisheng Xie.

^ permalink raw reply	[flat|nested] 92+ messages in thread

end of thread, other threads:[~2017-03-13  2:20 UTC | newest]

Thread overview: 92+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-10 17:23 [PATCH v2 00/10] try to reduce fragmenting fallbacks Vlastimil Babka
2017-02-10 17:23 ` Vlastimil Babka
2017-02-10 17:23 ` [PATCH v2 01/10] mm, compaction: reorder fields in struct compact_control Vlastimil Babka
2017-02-10 17:23   ` Vlastimil Babka
2017-02-13 10:49   ` Mel Gorman
2017-02-13 10:49     ` Mel Gorman
2017-02-14 16:33   ` Johannes Weiner
2017-02-14 16:33     ` Johannes Weiner
2017-02-10 17:23 ` [PATCH v2 02/10] mm, compaction: remove redundant watermark check in compact_finished() Vlastimil Babka
2017-02-10 17:23   ` Vlastimil Babka
2017-02-13 10:49   ` Mel Gorman
2017-02-13 10:49     ` Mel Gorman
2017-02-14 16:34   ` Johannes Weiner
2017-02-14 16:34     ` Johannes Weiner
2017-02-10 17:23 ` [PATCH v2 03/10] mm, page_alloc: split smallest stolen page in fallback Vlastimil Babka
2017-02-10 17:23   ` Vlastimil Babka
2017-02-13 10:51   ` Mel Gorman
2017-02-13 10:51     ` Mel Gorman
2017-02-13 10:54     ` Vlastimil Babka
2017-02-13 10:54       ` Vlastimil Babka
2017-02-14 16:59   ` Johannes Weiner
2017-02-14 16:59     ` Johannes Weiner
2017-02-10 17:23 ` [PATCH v2 04/10] mm, page_alloc: count movable pages when stealing from pageblock Vlastimil Babka
2017-02-10 17:23   ` Vlastimil Babka
2017-02-13 10:53   ` Mel Gorman
2017-02-13 10:53     ` Mel Gorman
2017-02-14 10:07   ` Xishi Qiu
2017-02-14 10:07     ` Xishi Qiu
2017-02-15 10:47     ` Vlastimil Babka
2017-02-15 10:47       ` Vlastimil Babka
2017-02-15 11:56       ` Xishi Qiu
2017-02-15 11:56         ` Xishi Qiu
2017-02-17 16:21         ` Vlastimil Babka
2017-02-17 16:21           ` Vlastimil Babka
2017-02-14 18:10   ` Johannes Weiner
2017-02-14 18:10     ` Johannes Weiner
2017-02-17 16:09     ` Vlastimil Babka
2017-02-17 16:09       ` Vlastimil Babka
2017-02-10 17:23 ` [PATCH v2 05/10] mm, compaction: change migrate_async_suitable() to suitable_migration_source() Vlastimil Babka
2017-02-10 17:23   ` Vlastimil Babka
2017-02-13 10:53   ` Mel Gorman
2017-02-13 10:53     ` Mel Gorman
2017-02-14 18:12   ` Johannes Weiner
2017-02-14 18:12     ` Johannes Weiner
2017-02-10 17:23 ` [PATCH v2 06/10] mm, compaction: add migratetype to compact_control Vlastimil Babka
2017-02-10 17:23   ` Vlastimil Babka
2017-02-13 10:53   ` Mel Gorman
2017-02-13 10:53     ` Mel Gorman
2017-02-14 18:15   ` Johannes Weiner
2017-02-14 18:15     ` Johannes Weiner
2017-02-10 17:23 ` [PATCH v2 07/10] mm, compaction: restrict async compaction to pageblocks of same migratetype Vlastimil Babka
2017-02-10 17:23   ` Vlastimil Babka
2017-02-13 10:56   ` Mel Gorman
2017-02-13 10:56     ` Mel Gorman
2017-02-14 20:10   ` Johannes Weiner
2017-02-14 20:10     ` Johannes Weiner
2017-02-17 16:32     ` Vlastimil Babka
2017-02-17 16:32       ` Vlastimil Babka
2017-02-17 17:39       ` Johannes Weiner
2017-02-17 17:39         ` Johannes Weiner
2017-02-10 17:23 ` [PATCH v2 08/10] mm, compaction: finish whole pageblock to reduce fragmentation Vlastimil Babka
2017-02-10 17:23   ` Vlastimil Babka
2017-02-13 10:57   ` Mel Gorman
2017-02-13 10:57     ` Mel Gorman
2017-02-16 11:44   ` Johannes Weiner
2017-02-16 11:44     ` Johannes Weiner
2017-02-10 17:23 ` [RFC v2 09/10] mm, page_alloc: disallow migratetype fallback in fastpath Vlastimil Babka
2017-02-10 17:23   ` Vlastimil Babka
2017-02-10 17:23 ` [RFC v2 10/10] mm, page_alloc: introduce MIGRATE_MIXED migratetype Vlastimil Babka
2017-02-10 17:23   ` Vlastimil Babka
2017-03-08  2:16   ` Yisheng Xie
2017-03-08  2:16     ` Yisheng Xie
2017-03-08  7:07     ` Vlastimil Babka
2017-03-08  7:07       ` Vlastimil Babka
2017-03-13  2:16       ` Yisheng Xie
2017-03-13  2:16         ` Yisheng Xie
2017-02-13 11:07 ` [PATCH v2 00/10] try to reduce fragmenting fallbacks Mel Gorman
2017-02-13 11:07   ` Mel Gorman
2017-02-15 14:29   ` Vlastimil Babka
2017-02-15 14:29     ` Vlastimil Babka
2017-02-15 16:11     ` Vlastimil Babka
2017-02-15 16:11       ` Vlastimil Babka
2017-02-15 20:11       ` Vlastimil Babka
2017-02-15 20:11         ` Vlastimil Babka
2017-02-16 15:12     ` Vlastimil Babka
2017-02-16 15:12       ` Vlastimil Babka
2017-02-17 15:24       ` Vlastimil Babka
2017-02-17 15:24         ` Vlastimil Babka
2017-02-20 12:30   ` Vlastimil Babka
2017-02-20 12:30     ` Vlastimil Babka
2017-02-23 16:01     ` Mel Gorman
2017-02-23 16:01       ` Mel Gorman
