* [RFC PATCH 00/14] Increase success rates and reduce latency of compaction v1
@ 2018-12-14 23:02 Mel Gorman
  2018-12-14 23:02 ` [PATCH 01/14] mm, compaction: Shrink compact_control Mel Gorman
                   ` (13 more replies)
  0 siblings, 14 replies; 33+ messages in thread
From: Mel Gorman @ 2018-12-14 23:02 UTC (permalink / raw)
  To: Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Linus Torvalds, Michal Hocko,
	ying.huang, kirill, Andrew Morton, Linux List Kernel Mailing,
	Mel Gorman

This is a very preliminary RFC. I'm posting it early because the
__GFP_THISNODE discussion is ongoing and has started to look at the
compaction implementation, so it is worth looking at this series first.
The cc list is based on that discussion, just to make the participants
aware the series exists. A v2 will have a significantly trimmed cc.

This series reduces scan rates and increases success rates of compaction,
primarily by using the free lists to shorten scans and by better controlling
skip information and whether multiple scanners can target the same block.
There is still much room for improvement but these changes should probably
go in first as they are prerequisites for anything smarter.

Test data is still incomplete but, given the type of series, I'm not
expecting major differences between 2-socket and 1-socket machines. That
assumption might be wrong.

Primarily I'm using thpscale to measure the impact of the series. The
benchmark creates a large file, maps it, faults it, punches holes in the
mapping so that the virtual address space is fragmented and then tries
to allocate THP. It re-executes for different numbers of threads. From a
fragmentation perspective, the workload is relatively benign but it does
stress compaction.
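
To make the workload concrete, below is a rough sketch of the shape of
the benchmark described above. This is not the actual thpscale source
from mmtests: the file name, region sizes, hole geometry and the use of
a separate anonymous region for the THP allocations are invented for
illustration, and the real benchmark additionally re-executes with
different thread counts.

#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define FILE_SIZE	(1UL << 30)	/* 1G file mapping */
#define HOLE_STRIDE	(2UL << 20)	/* punch a hole every 2M */
#define HOLE_SIZE	4096UL		/* one base page per hole */
#define THP_SIZE	(512UL << 20)	/* region that should use THP */

int main(void)
{
	int fd = open("thpscale.dat", O_RDWR | O_CREAT | O_TRUNC, 0600);
	char *filemap, *thp;
	unsigned long off;

	if (fd < 0 || ftruncate(fd, FILE_SIZE))
		return 1;

	filemap = mmap(NULL, FILE_SIZE, PROT_READ | PROT_WRITE,
		       MAP_SHARED, fd, 0);
	if (filemap == MAP_FAILED)
		return 1;

	/* Fault the file-backed mapping so memory fills with page cache. */
	memset(filemap, 1, FILE_SIZE);

	/* Punch holes so the remaining pages leave memory fragmented. */
	for (off = 0; off < FILE_SIZE; off += HOLE_STRIDE)
		(void)fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
				off, HOLE_SIZE);

	/* Faults of this region are what the fault-* figures below time. */
	thp = mmap(NULL, THP_SIZE, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (thp == MAP_FAILED)
		return 1;
	madvise(thp, THP_SIZE, MADV_HUGEPAGE);
	memset(thp, 2, THP_SIZE);

	munmap(thp, THP_SIZE);
	munmap(filemap, FILE_SIZE);
	close(fd);
	return 0;
}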

The overall impact on latencies for a 1-socket machine is as follows:

                                    4.20.0-rc6             4.20.0-rc6
                                mmotm-20181210           capture-v1r8
Amean     fault-both-3      3842.11 (   0.00%)     2898.64 *  24.56%*
Amean     fault-both-5      5201.92 (   0.00%)     4296.58 (  17.40%)
Amean     fault-both-7      7086.15 (   0.00%)     6203.55 (  12.46%)
Amean     fault-both-12    11383.58 (   0.00%)     9309.13 *  18.22%*
Amean     fault-both-18    16616.53 (   0.00%)     6245.27 *  62.42%*
Amean     fault-both-24    18617.05 (   0.00%)    15083.42 *  18.98%*
Amean     fault-both-30    24372.88 (   0.00%)    11498.60 *  52.82%*
Amean     fault-both-32    22621.58 (   0.00%)     9684.82 *  57.19%*

Fault latency (be it base or huge page) is reduced by roughly 12-62%
depending on thread count.

The allocation success rates when using MADV_HUGEPAGE are as follows:

                               4.20.0-rc6             4.20.0-rc6
                           mmotm-20181210           capture-v1r8
Percentage huge-3        85.74 (   0.00%)       98.12 (  14.44%)
Percentage huge-5        89.16 (   0.00%)       98.83 (  10.85%)
Percentage huge-7        85.98 (   0.00%)       97.99 (  13.97%)
Percentage huge-12       84.19 (   0.00%)       99.00 (  17.59%)
Percentage huge-18       81.20 (   0.00%)       98.92 (  21.83%)
Percentage huge-24       82.60 (   0.00%)       99.08 (  19.95%)
Percentage huge-30       82.87 (   0.00%)       99.22 (  19.74%)
Percentage huge-32       81.98 (   0.00%)       98.97 (  20.72%)

This shows that the series is not far short of allocating 100% of the
requested pages as huge pages. Finally, the overall scan rates are as
follows:

Compaction migrate scanned   677936161     4756927
Compaction free scanned      352045485   256525622
Kcompactd migrate scanned       751732      751080
Kcompactd free scanned       113784579    93099786

These are still pretty crazy scan rates, but direct compaction migration
scanning is reduced by 99% and free scanning by 27%, so it's a step in
the right direction.

 include/linux/compaction.h |   3 +-
 include/linux/gfp.h        |   7 +-
 include/linux/sched.h      |   4 +
 kernel/sched/core.c        |   3 +
 mm/compaction.c            | 638 ++++++++++++++++++++++++++++++++++++++-------
 mm/internal.h              |  21 +-
 mm/migrate.c               |   2 +-
 mm/page_alloc.c            |  70 ++++-
 8 files changed, 637 insertions(+), 111 deletions(-)

-- 
2.16.4



* [PATCH 01/14] mm, compaction: Shrink compact_control
  2018-12-14 23:02 [RFC PATCH 00/14] Increase success rates and reduce latency of compaction v1 Mel Gorman
@ 2018-12-14 23:02 ` Mel Gorman
  2018-12-17 12:27   ` Vlastimil Babka
  2018-12-14 23:02 ` [PATCH 02/14] mm, compaction: Rearrange compact_control Mel Gorman
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 33+ messages in thread
From: Mel Gorman @ 2018-12-14 23:02 UTC (permalink / raw)
  To: Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Linus Torvalds, Michal Hocko,
	ying.huang, kirill, Andrew Morton, Linux List Kernel Mailing,
	Mel Gorman

The free and migration scanners should never isolate more than a pageblock
of pages, so unsigned int is sufficient for nr_freepages and
nr_migratepages, saving 8 bytes on a 64-bit build (two fields shrink from
8 bytes to 4 each).

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/internal.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 536bc2a839b9..5564841fce36 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -185,8 +185,8 @@ struct compact_control {
 	struct list_head freepages;	/* List of free pages to migrate to */
 	struct list_head migratepages;	/* List of pages being migrated */
 	struct zone *zone;
-	unsigned long nr_freepages;	/* Number of isolated free pages */
-	unsigned long nr_migratepages;	/* Number of pages to migrate */
+	unsigned int nr_freepages;	/* Number of isolated free pages */
+	unsigned int nr_migratepages;	/* Number of pages to migrate */
 	unsigned long total_migrate_scanned;
 	unsigned long total_free_scanned;
 	unsigned long free_pfn;		/* isolate_freepages search base */
-- 
2.16.4



* [PATCH 02/14] mm, compaction: Rearrange compact_control
  2018-12-14 23:02 [RFC PATCH 00/14] Increase success rates and reduce latency of compaction v1 Mel Gorman
  2018-12-14 23:02 ` [PATCH 01/14] mm, compaction: Shrink compact_control Mel Gorman
@ 2018-12-14 23:02 ` Mel Gorman
  2018-12-17 13:20   ` Vlastimil Babka
  2018-12-14 23:02 ` [PATCH 03/14] mm, compaction: Remove last_migrated_pfn from compact_control Mel Gorman
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 33+ messages in thread
From: Mel Gorman @ 2018-12-14 23:02 UTC (permalink / raw)
  To: Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Linus Torvalds, Michal Hocko,
	ying.huang, kirill, Andrew Morton, Linux List Kernel Mailing,
	Mel Gorman

compact_control spans two cache lines with write-intensive fields on
both. Rearrange the structure so that the most write-intensive fields
share the same cache line. This has a negligible impact on the overall
performance of compaction and is more a tidying exercise than anything.
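
As a standalone illustration of the claim (this is not kernel code; it
mirrors the rearranged field order from the hunk below with a stand-in
list_head, and assumes a 64-bit build with 64-byte cache lines), the
write-intensive fields now fill exactly the first cache line:

#include <stddef.h>

struct list_head { void *next, *prev; };	/* 16 bytes on 64-bit */

/* Field order copied from the rearranged compact_control below. */
struct cc_layout {
	struct list_head freepages;		/* offset  0 */
	struct list_head migratepages;		/* offset 16 */
	unsigned int nr_freepages;		/* offset 32 */
	unsigned int nr_migratepages;		/* offset 36 */
	unsigned long free_pfn;			/* offset 40 */
	unsigned long migrate_pfn;		/* offset 48 */
	unsigned long last_migrated_pfn;	/* offset 56 */
	struct zone *zone;			/* offset 64: second line */
	unsigned long total_migrate_scanned;
	unsigned long total_free_scanned;
};

_Static_assert(offsetof(struct cc_layout, zone) == 64,
	       "write-intensive fields should end at the cache line boundary");

int main(void) { return 0; }

With the previous ordering, zone and the two total_*_scanned counters sat
between the hot fields, which pushed free_pfn, migrate_pfn and
last_migrated_pfn onto the second cache line.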

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/internal.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 5564841fce36..867af5425432 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -184,14 +184,14 @@ extern int user_min_free_kbytes;
 struct compact_control {
 	struct list_head freepages;	/* List of free pages to migrate to */
 	struct list_head migratepages;	/* List of pages being migrated */
-	struct zone *zone;
 	unsigned int nr_freepages;	/* Number of isolated free pages */
 	unsigned int nr_migratepages;	/* Number of pages to migrate */
-	unsigned long total_migrate_scanned;
-	unsigned long total_free_scanned;
 	unsigned long free_pfn;		/* isolate_freepages search base */
 	unsigned long migrate_pfn;	/* isolate_migratepages search base */
 	unsigned long last_migrated_pfn;/* Not yet flushed page being freed */
+	struct zone *zone;
+	unsigned long total_migrate_scanned;
+	unsigned long total_free_scanned;
 	const gfp_t gfp_mask;		/* gfp mask of a direct compactor */
 	int order;			/* order a direct compactor needs */
 	int migratetype;		/* migratetype of direct compactor */
-- 
2.16.4



* [PATCH 03/14] mm, compaction: Remove last_migrated_pfn from compact_control
  2018-12-14 23:02 [RFC PATCH 00/14] Increase success rates and reduce latency of compaction v1 Mel Gorman
  2018-12-14 23:02 ` [PATCH 01/14] mm, compaction: Shrink compact_control Mel Gorman
  2018-12-14 23:02 ` [PATCH 02/14] mm, compaction: Rearrange compact_control Mel Gorman
@ 2018-12-14 23:02 ` Mel Gorman
  2018-12-17 13:50   ` Vlastimil Babka
  2018-12-14 23:03 ` [PATCH 04/14] mm, compaction: Rename map_pages to split_map_pages Mel Gorman
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 33+ messages in thread
From: Mel Gorman @ 2018-12-14 23:02 UTC (permalink / raw)
  To: Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Linus Torvalds, Michal Hocko,
	ying.huang, kirill, Andrew Morton, Linux List Kernel Mailing,
	Mel Gorman

It is dubious whether the last_migrated_pfn field really helps but, either
way, the same information can be tracked in a local variable in
compact_zone() without increasing the size of compact_control, so remove
the field.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/compaction.c | 25 +++++++++----------------
 mm/internal.h   |  1 -
 2 files changed, 9 insertions(+), 17 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index ef29490b0f46..fb4d9f52ed56 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -886,15 +886,6 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 		cc->nr_migratepages++;
 		nr_isolated++;
 
-		/*
-		 * Record where we could have freed pages by migration and not
-		 * yet flushed them to buddy allocator.
-		 * - this is the lowest page that was isolated and likely be
-		 * then freed by migration.
-		 */
-		if (!cc->last_migrated_pfn)
-			cc->last_migrated_pfn = low_pfn;
-
 		/* Avoid isolating too much */
 		if (cc->nr_migratepages == COMPACT_CLUSTER_MAX) {
 			++low_pfn;
@@ -918,7 +909,6 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 			}
 			putback_movable_pages(&cc->migratepages);
 			cc->nr_migratepages = 0;
-			cc->last_migrated_pfn = 0;
 			nr_isolated = 0;
 		}
 
@@ -1539,6 +1529,7 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 	enum compact_result ret;
 	unsigned long start_pfn = zone->zone_start_pfn;
 	unsigned long end_pfn = zone_end_pfn(zone);
+	unsigned long last_migrated_pfn;
 	const bool sync = cc->mode != MIGRATE_ASYNC;
 
 	cc->migratetype = gfpflags_to_migratetype(cc->gfp_mask);
@@ -1584,7 +1575,7 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 			cc->whole_zone = true;
 	}
 
-	cc->last_migrated_pfn = 0;
+	last_migrated_pfn = 0;
 
 	trace_mm_compaction_begin(start_pfn, cc->migrate_pfn,
 				cc->free_pfn, end_pfn, sync);
@@ -1593,12 +1584,14 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 
 	while ((ret = compact_finished(zone, cc)) == COMPACT_CONTINUE) {
 		int err;
+		unsigned long start_pfn = cc->migrate_pfn;
 
 		switch (isolate_migratepages(zone, cc)) {
 		case ISOLATE_ABORT:
 			ret = COMPACT_CONTENDED;
 			putback_movable_pages(&cc->migratepages);
 			cc->nr_migratepages = 0;
+			last_migrated_pfn = 0;
 			goto out;
 		case ISOLATE_NONE:
 			/*
@@ -1608,6 +1601,7 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 			 */
 			goto check_drain;
 		case ISOLATE_SUCCESS:
+			last_migrated_pfn = start_pfn;
 			;
 		}
 
@@ -1639,8 +1633,7 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 				cc->migrate_pfn = block_end_pfn(
 						cc->migrate_pfn - 1, cc->order);
 				/* Draining pcplists is useless in this case */
-				cc->last_migrated_pfn = 0;
-
+				last_migrated_pfn = 0;
 			}
 		}
 
@@ -1652,18 +1645,18 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 		 * compact_finished() can detect immediately if allocation
 		 * would succeed.
 		 */
-		if (cc->order > 0 && cc->last_migrated_pfn) {
+		if (cc->order > 0 && last_migrated_pfn) {
 			int cpu;
 			unsigned long current_block_start =
 				block_start_pfn(cc->migrate_pfn, cc->order);
 
-			if (cc->last_migrated_pfn < current_block_start) {
+			if (last_migrated_pfn < current_block_start) {
 				cpu = get_cpu();
 				lru_add_drain_cpu(cpu);
 				drain_local_pages(zone);
 				put_cpu();
 				/* No more flushing until we migrate again */
-				cc->last_migrated_pfn = 0;
+				last_migrated_pfn = 0;
 			}
 		}
 
diff --git a/mm/internal.h b/mm/internal.h
index 867af5425432..f40d06d70683 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -188,7 +188,6 @@ struct compact_control {
 	unsigned int nr_migratepages;	/* Number of pages to migrate */
 	unsigned long free_pfn;		/* isolate_freepages search base */
 	unsigned long migrate_pfn;	/* isolate_migratepages search base */
-	unsigned long last_migrated_pfn;/* Not yet flushed page being freed */
 	struct zone *zone;
 	unsigned long total_migrate_scanned;
 	unsigned long total_free_scanned;
-- 
2.16.4



* [PATCH 04/14] mm, compaction: Rename map_pages to split_map_pages
  2018-12-14 23:02 [RFC PATCH 00/14] Increase success rates and reduce latency of compaction v1 Mel Gorman
                   ` (2 preceding siblings ...)
  2018-12-14 23:02 ` [PATCH 03/14] mm, compaction: Remove last_migrated_pfn from compact_control Mel Gorman
@ 2018-12-14 23:03 ` Mel Gorman
  2018-12-17 14:06   ` Vlastimil Babka
  2018-12-14 23:03 ` [PATCH 05/14] mm, compaction: Skip pageblocks with reserved pages Mel Gorman
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 33+ messages in thread
From: Mel Gorman @ 2018-12-14 23:03 UTC (permalink / raw)
  To: Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Linus Torvalds, Michal Hocko,
	ying.huang, kirill, Andrew Morton, Linux List Kernel Mailing,
	Mel Gorman

From the function name it is not obvious that high-order free pages are
split into order-0 pages. Rename map_pages() to split_map_pages() to make
this clear.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/compaction.c | 60 ++++++++++++++++++++++++++++-----------------------------
 1 file changed, 29 insertions(+), 31 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index fb4d9f52ed56..3afa4e9188b6 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -66,7 +66,7 @@ static unsigned long release_freepages(struct list_head *freelist)
 	return high_pfn;
 }
 
-static void map_pages(struct list_head *list)
+static void split_map_pages(struct list_head *list)
 {
 	unsigned int i, order, nr_pages;
 	struct page *page, *next;
@@ -644,7 +644,7 @@ isolate_freepages_range(struct compact_control *cc,
 	}
 
 	/* __isolate_free_page() does not map the pages */
-	map_pages(&freelist);
+	split_map_pages(&freelist);
 
 	if (pfn < end_pfn) {
 		/* Loop terminated early, cleanup. */
@@ -1141,7 +1141,7 @@ static void isolate_freepages(struct compact_control *cc)
 	}
 
 	/* __isolate_free_page() does not map the pages */
-	map_pages(freelist);
+	split_map_pages(freelist);
 
 	/*
 	 * Record where the free scanner will restart next time. Either we
@@ -1300,8 +1300,7 @@ static inline bool is_via_compact_memory(int order)
 	return order == -1;
 }
 
-static enum compact_result __compact_finished(struct zone *zone,
-						struct compact_control *cc)
+static enum compact_result __compact_finished(struct compact_control *cc)
 {
 	unsigned int order;
 	const int migratetype = cc->migratetype;
@@ -1312,7 +1311,7 @@ static enum compact_result __compact_finished(struct zone *zone,
 	/* Compaction run completes if the migrate and free scanner meet */
 	if (compact_scanners_met(cc)) {
 		/* Let the next compaction start anew. */
-		reset_cached_positions(zone);
+		reset_cached_positions(cc->zone);
 
 		/*
 		 * Mark that the PG_migrate_skip information should be cleared
@@ -1321,7 +1320,7 @@ static enum compact_result __compact_finished(struct zone *zone,
 		 * based on an allocation request.
 		 */
 		if (cc->direct_compaction)
-			zone->compact_blockskip_flush = true;
+			cc->zone->compact_blockskip_flush = true;
 
 		if (cc->whole_zone)
 			return COMPACT_COMPLETE;
@@ -1345,7 +1344,7 @@ static enum compact_result __compact_finished(struct zone *zone,
 
 	/* Direct compactor: Is a suitable page free? */
 	for (order = cc->order; order < MAX_ORDER; order++) {
-		struct free_area *area = &zone->free_area[order];
+		struct free_area *area = &cc->zone->free_area[order];
 		bool can_steal;
 
 		/* Job done if page is free of the right migratetype */
@@ -1391,13 +1390,12 @@ static enum compact_result __compact_finished(struct zone *zone,
 	return COMPACT_NO_SUITABLE_PAGE;
 }
 
-static enum compact_result compact_finished(struct zone *zone,
-			struct compact_control *cc)
+static enum compact_result compact_finished(struct compact_control *cc)
 {
 	int ret;
 
-	ret = __compact_finished(zone, cc);
-	trace_mm_compaction_finished(zone, cc->order, ret);
+	ret = __compact_finished(cc);
+	trace_mm_compaction_finished(cc->zone, cc->order, ret);
 	if (ret == COMPACT_NO_SUITABLE_PAGE)
 		ret = COMPACT_CONTINUE;
 
@@ -1524,16 +1522,16 @@ bool compaction_zonelist_suitable(struct alloc_context *ac, int order,
 	return false;
 }
 
-static enum compact_result compact_zone(struct zone *zone, struct compact_control *cc)
+static enum compact_result compact_zone(struct compact_control *cc)
 {
 	enum compact_result ret;
-	unsigned long start_pfn = zone->zone_start_pfn;
-	unsigned long end_pfn = zone_end_pfn(zone);
+	unsigned long start_pfn = cc->zone->zone_start_pfn;
+	unsigned long end_pfn = zone_end_pfn(cc->zone);
 	unsigned long last_migrated_pfn;
 	const bool sync = cc->mode != MIGRATE_ASYNC;
 
 	cc->migratetype = gfpflags_to_migratetype(cc->gfp_mask);
-	ret = compaction_suitable(zone, cc->order, cc->alloc_flags,
+	ret = compaction_suitable(cc->zone, cc->order, cc->alloc_flags,
 							cc->classzone_idx);
 	/* Compaction is likely to fail */
 	if (ret == COMPACT_SUCCESS || ret == COMPACT_SKIPPED)
@@ -1546,8 +1544,8 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 	 * Clear pageblock skip if there were failures recently and compaction
 	 * is about to be retried after being deferred.
 	 */
-	if (compaction_restarting(zone, cc->order))
-		__reset_isolation_suitable(zone);
+	if (compaction_restarting(cc->zone, cc->order))
+		__reset_isolation_suitable(cc->zone);
 
 	/*
 	 * Setup to move all movable pages to the end of the zone. Used cached
@@ -1559,16 +1557,16 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 		cc->migrate_pfn = start_pfn;
 		cc->free_pfn = pageblock_start_pfn(end_pfn - 1);
 	} else {
-		cc->migrate_pfn = zone->compact_cached_migrate_pfn[sync];
-		cc->free_pfn = zone->compact_cached_free_pfn;
+		cc->migrate_pfn = cc->zone->compact_cached_migrate_pfn[sync];
+		cc->free_pfn = cc->zone->compact_cached_free_pfn;
 		if (cc->free_pfn < start_pfn || cc->free_pfn >= end_pfn) {
 			cc->free_pfn = pageblock_start_pfn(end_pfn - 1);
-			zone->compact_cached_free_pfn = cc->free_pfn;
+			cc->zone->compact_cached_free_pfn = cc->free_pfn;
 		}
 		if (cc->migrate_pfn < start_pfn || cc->migrate_pfn >= end_pfn) {
 			cc->migrate_pfn = start_pfn;
-			zone->compact_cached_migrate_pfn[0] = cc->migrate_pfn;
-			zone->compact_cached_migrate_pfn[1] = cc->migrate_pfn;
+			cc->zone->compact_cached_migrate_pfn[0] = cc->migrate_pfn;
+			cc->zone->compact_cached_migrate_pfn[1] = cc->migrate_pfn;
 		}
 
 		if (cc->migrate_pfn == start_pfn)
@@ -1582,11 +1580,11 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 
 	migrate_prep_local();
 
-	while ((ret = compact_finished(zone, cc)) == COMPACT_CONTINUE) {
+	while ((ret = compact_finished(cc)) == COMPACT_CONTINUE) {
 		int err;
 		unsigned long start_pfn = cc->migrate_pfn;
 
-		switch (isolate_migratepages(zone, cc)) {
+		switch (isolate_migratepages(cc->zone, cc)) {
 		case ISOLATE_ABORT:
 			ret = COMPACT_CONTENDED;
 			putback_movable_pages(&cc->migratepages);
@@ -1653,7 +1651,7 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 			if (last_migrated_pfn < current_block_start) {
 				cpu = get_cpu();
 				lru_add_drain_cpu(cpu);
-				drain_local_pages(zone);
+				drain_local_pages(cc->zone);
 				put_cpu();
 				/* No more flushing until we migrate again */
 				last_migrated_pfn = 0;
@@ -1678,8 +1676,8 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 		 * Only go back, not forward. The cached pfn might have been
 		 * already reset to zone end in compact_finished()
 		 */
-		if (free_pfn > zone->compact_cached_free_pfn)
-			zone->compact_cached_free_pfn = free_pfn;
+		if (free_pfn > cc->zone->compact_cached_free_pfn)
+			cc->zone->compact_cached_free_pfn = free_pfn;
 	}
 
 	count_compact_events(COMPACTMIGRATE_SCANNED, cc->total_migrate_scanned);
@@ -1716,7 +1714,7 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
 	INIT_LIST_HEAD(&cc.freepages);
 	INIT_LIST_HEAD(&cc.migratepages);
 
-	ret = compact_zone(zone, &cc);
+	ret = compact_zone(&cc);
 
 	VM_BUG_ON(!list_empty(&cc.freepages));
 	VM_BUG_ON(!list_empty(&cc.migratepages));
@@ -1834,7 +1832,7 @@ static void compact_node(int nid)
 		INIT_LIST_HEAD(&cc.freepages);
 		INIT_LIST_HEAD(&cc.migratepages);
 
-		compact_zone(zone, &cc);
+		compact_zone(&cc);
 
 		VM_BUG_ON(!list_empty(&cc.freepages));
 		VM_BUG_ON(!list_empty(&cc.migratepages));
@@ -1976,7 +1974,7 @@ static void kcompactd_do_work(pg_data_t *pgdat)
 
 		if (kthread_should_stop())
 			return;
-		status = compact_zone(zone, &cc);
+		status = compact_zone(&cc);
 
 		if (status == COMPACT_SUCCESS) {
 			compaction_defer_reset(zone, cc.order, false);
-- 
2.16.4



* [PATCH 05/14] mm, compaction: Skip pageblocks with reserved pages
  2018-12-14 23:02 [RFC PATCH 00/14] Increase success rates and reduce latency of compaction v1 Mel Gorman
                   ` (3 preceding siblings ...)
  2018-12-14 23:03 ` [PATCH 04/14] mm, compaction: Rename map_pages to split_map_pages Mel Gorman
@ 2018-12-14 23:03 ` Mel Gorman
  2018-12-18  8:08   ` Vlastimil Babka
  2018-12-14 23:03 ` [PATCH 06/14] mm, migrate: Immediately fail migration of a page with no migration handler Mel Gorman
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 33+ messages in thread
From: Mel Gorman @ 2018-12-14 23:03 UTC (permalink / raw)
  To: Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Linus Torvalds, Michal Hocko,
	ying.huang, kirill, Andrew Morton, Linux List Kernel Mailing,
	Mel Gorman

Reserved pages are set at boot time, tend to be clustered and almost
never become unreserved. When isolating pages for migration, skip the
entire pageblock if one PageReserved page is encountered, on the grounds
that it is highly probable the entire pageblock is reserved.

The impact depends on the machine and timing but both thpscale and
thpfioscale when using MADV_HUGEPAGE show a reduction of scanning and
fault latency on a 1-socket machine. The 2-socket results were too
noisy to draw any meaningful conclusion but it's safe to assume less
scanning is useful.

1-socket thpfioscale
                                   4.20.0-rc6             4.20.0-rc6
                               mmotm-20181210        noreserved-v1r4
Amean     fault-base-1     1481.32 (   0.00%)     1443.63 (   2.54%)
Amean     fault-huge-1     1118.17 (   0.00%)      981.30 *  12.24%*
Amean     fault-both-1     1176.43 (   0.00%)     1052.64 *  10.52%*

Compaction migrate scanned     3860713     3294284
Compaction free scanned      613786341   433423502
Kcompactd migrate scanned       408711      291915
Kcompactd free scanned       242509759   217164988

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/compaction.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/compaction.c b/mm/compaction.c
index 3afa4e9188b6..8134dba47584 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -827,6 +827,13 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 					goto isolate_success;
 			}
 
+			/*
+			 * A reserved page is never freed and tends to be
+			 * clustered in the same pageblocks. Skip the block.
+			 */
+			if (PageReserved(page))
+				low_pfn = end_pfn;
+
 			goto isolate_fail;
 		}
 
-- 
2.16.4



* [PATCH 06/14] mm, migrate: Immediately fail migration of a page with no migration handler
  2018-12-14 23:02 [RFC PATCH 00/14] Increase success rates and reduce latency of compaction v1 Mel Gorman
                   ` (4 preceding siblings ...)
  2018-12-14 23:03 ` [PATCH 05/14] mm, compaction: Skip pageblocks with reserved pages Mel Gorman
@ 2018-12-14 23:03 ` Mel Gorman
  2018-12-18  9:06   ` Vlastimil Babka
  2018-12-20 19:44   ` Yang Shi
  2018-12-14 23:03 ` [PATCH 07/14] mm, compaction: Always finish scanning of a full pageblock Mel Gorman
                   ` (7 subsequent siblings)
  13 siblings, 2 replies; 33+ messages in thread
From: Mel Gorman @ 2018-12-14 23:03 UTC (permalink / raw)
  To: Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Linus Torvalds, Michal Hocko,
	ying.huang, kirill, Andrew Morton, Linux List Kernel Mailing,
	Mel Gorman

Pages with no migration handler use a fallback handler which sometimes
works and sometimes fails persistently, such as for blockdev pages.
Migration will retry these persistent failures a number of times, which
is wasteful during compaction. This patch fails the migration immediately
unless the caller is in MIGRATE_SYNC mode, which indicates the caller is
willing to wait and retry persistently.

This is not expected to help THP allocation success rates but it does
reduce latencies slightly.

1-socket thpfioscale
                                    4.20.0-rc6             4.20.0-rc6
                               noreserved-v1r4          failfast-v1r4
Amean     fault-both-1         0.00 (   0.00%)        0.00 *   0.00%*
Amean     fault-both-3      2276.15 (   0.00%)     3867.54 * -69.92%*
Amean     fault-both-5      4992.20 (   0.00%)     5313.20 (  -6.43%)
Amean     fault-both-7      7373.30 (   0.00%)     7039.11 (   4.53%)
Amean     fault-both-12    11911.52 (   0.00%)    11328.29 (   4.90%)
Amean     fault-both-18    17209.42 (   0.00%)    16455.34 (   4.38%)
Amean     fault-both-24    20943.71 (   0.00%)    20448.94 (   2.36%)
Amean     fault-both-30    22703.00 (   0.00%)    21655.07 (   4.62%)
Amean     fault-both-32    22461.41 (   0.00%)    21415.35 (   4.66%)

The 2-socket results are not materially different. Scan rates are
similar as expected.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/migrate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index df17a710e2c7..0e27a10429e2 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -885,7 +885,7 @@ static int fallback_migrate_page(struct address_space *mapping,
 	 */
 	if (page_has_private(page) &&
 	    !try_to_release_page(page, GFP_KERNEL))
-		return -EAGAIN;
+		return mode == MIGRATE_SYNC ? -EAGAIN : -EBUSY;
 
 	return migrate_page(mapping, newpage, page, mode);
 }
-- 
2.16.4



* [PATCH 07/14] mm, compaction: Always finish scanning of a full pageblock
  2018-12-14 23:02 [RFC PATCH 00/14] Increase success rates and reduce latency of compaction v1 Mel Gorman
                   ` (5 preceding siblings ...)
  2018-12-14 23:03 ` [PATCH 06/14] mm, migrate: Immediately fail migration of a page with no migration handler Mel Gorman
@ 2018-12-14 23:03 ` Mel Gorman
  2018-12-18  9:23   ` Vlastimil Babka
  2018-12-14 23:03 ` [PATCH 08/14] mm, compaction: Use the page allocator bulk-free helper for lists of pages Mel Gorman
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 33+ messages in thread
From: Mel Gorman @ 2018-12-14 23:03 UTC (permalink / raw)
  To: Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Linus Torvalds, Michal Hocko,
	ying.huang, kirill, Andrew Morton, Linux List Kernel Mailing,
	Mel Gorman

When compaction is finishing, it uses the finishing_block flag to ensure
the current pageblock is fully scanned before the run completes. However,
in general it makes sense to always complete migration of a pageblock.
Minimally, skip information is based on a pageblock and partially scanned
pageblocks may incur more scanning in the future. The pageblock skip
handling also becomes more strict later in the series and the hint is
more useful if a complete pageblock was always scanned.

The impact here is potentially on latencies as more scanning is done,
but it is not a consistent win or loss: the extra scanning is not always
a high percentage of the pageblock and is sometimes offset by future
reductions in scanning. Hence, the results are not presented this time
as they are a mix of gains and losses without any clear pattern. However,
completing the scan of a pageblock is important for later patches.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/compaction.c | 19 ++++++++-----------
 mm/internal.h   |  1 -
 2 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 8134dba47584..4f51435c645a 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1338,16 +1338,14 @@ static enum compact_result __compact_finished(struct compact_control *cc)
 	if (is_via_compact_memory(cc->order))
 		return COMPACT_CONTINUE;
 
-	if (cc->finishing_block) {
-		/*
-		 * We have finished the pageblock, but better check again that
-		 * we really succeeded.
-		 */
-		if (IS_ALIGNED(cc->migrate_pfn, pageblock_nr_pages))
-			cc->finishing_block = false;
-		else
-			return COMPACT_CONTINUE;
-	}
+	/*
+	 * Always finish scanning a pageblock to reduce the possibility of
+	 * fallbacks in the future. This is particularly important when
+	 * migration source is unmovable/reclaimable but it's not worth
+	 * special casing.
+	 */
+	if (!IS_ALIGNED(cc->migrate_pfn, pageblock_nr_pages))
+		return COMPACT_CONTINUE;
 
 	/* Direct compactor: Is a suitable page free? */
 	for (order = cc->order; order < MAX_ORDER; order++) {
@@ -1389,7 +1387,6 @@ static enum compact_result __compact_finished(struct compact_control *cc)
 				return COMPACT_SUCCESS;
 			}
 
-			cc->finishing_block = true;
 			return COMPACT_CONTINUE;
 		}
 	}
diff --git a/mm/internal.h b/mm/internal.h
index f40d06d70683..9b32f4cab0ae 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -203,7 +203,6 @@ struct compact_control {
 	bool direct_compaction;		/* False from kcompactd or /proc/... */
 	bool whole_zone;		/* Whole zone should/has been scanned */
 	bool contended;			/* Signal lock or sched contention */
-	bool finishing_block;		/* Finishing current pageblock */
 };
 
 unsigned long
-- 
2.16.4



* [PATCH 08/14] mm, compaction: Use the page allocator bulk-free helper for lists of pages
  2018-12-14 23:02 [RFC PATCH 00/14] Increase success rates and reduce latency of compaction v1 Mel Gorman
                   ` (6 preceding siblings ...)
  2018-12-14 23:03 ` [PATCH 07/14] mm, compaction: Always finish scanning of a full pageblock Mel Gorman
@ 2018-12-14 23:03 ` Mel Gorman
  2018-12-18  9:55   ` Vlastimil Babka
  2018-12-14 23:03 ` [PATCH 09/14] mm, compaction: Ignore the fragmentation avoidance boost for isolation and compaction Mel Gorman
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 33+ messages in thread
From: Mel Gorman @ 2018-12-14 23:03 UTC (permalink / raw)
  To: Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Linus Torvalds, Michal Hocko,
	ying.huang, kirill, Andrew Morton, Linux List Kernel Mailing,
	Mel Gorman

release_freepages() is a simpler version of free_unref_page_list() except
that it also tracks the highest PFN for caching the restart point of the
compaction free scanner. This patch optionally tracks the highest PFN in
the core helper and converts compaction to use it.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 include/linux/gfp.h |  7 ++++++-
 mm/compaction.c     | 12 +++---------
 mm/page_alloc.c     | 10 +++++++++-
 3 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 0705164f928c..ed9a95b63374 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -543,7 +543,12 @@ void * __meminit alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask);
 extern void __free_pages(struct page *page, unsigned int order);
 extern void free_pages(unsigned long addr, unsigned int order);
 extern void free_unref_page(struct page *page);
-extern void free_unref_page_list(struct list_head *list);
+extern void __free_page_list(struct list_head *list, bool dropref, unsigned long *highest_pfn);
+
+static inline void free_unref_page_list(struct list_head *list)
+{
+	return __free_page_list(list, false, NULL);
+}
 
 struct page_frag_cache;
 extern void __page_frag_cache_drain(struct page *page, unsigned int count);
diff --git a/mm/compaction.c b/mm/compaction.c
index 4f51435c645a..8ba9b3b479e3 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -52,16 +52,10 @@ static inline void count_compact_events(enum vm_event_item item, long delta)
 
 static unsigned long release_freepages(struct list_head *freelist)
 {
-	struct page *page, *next;
-	unsigned long high_pfn = 0;
+	unsigned long high_pfn;
 
-	list_for_each_entry_safe(page, next, freelist, lru) {
-		unsigned long pfn = page_to_pfn(page);
-		list_del(&page->lru);
-		__free_page(page);
-		if (pfn > high_pfn)
-			high_pfn = pfn;
-	}
+	__free_page_list(freelist, true, &high_pfn);
+	INIT_LIST_HEAD(freelist);
 
 	return high_pfn;
 }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a6e7bfd18cde..80535cd55a92 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2961,18 +2961,26 @@ void free_unref_page(struct page *page)
 /*
  * Free a list of 0-order pages
  */
-void free_unref_page_list(struct list_head *list)
+void __free_page_list(struct list_head *list, bool dropref,
+				unsigned long *highest_pfn)
 {
 	struct page *page, *next;
 	unsigned long flags, pfn;
 	int batch_count = 0;
 
+	if (highest_pfn)
+		*highest_pfn = 0;
+
 	/* Prepare pages for freeing */
 	list_for_each_entry_safe(page, next, list, lru) {
+		if (dropref)
+			WARN_ON_ONCE(!put_page_testzero(page));
 		pfn = page_to_pfn(page);
 		if (!free_unref_page_prepare(page, pfn))
 			list_del(&page->lru);
 		set_page_private(page, pfn);
+		if (highest_pfn && pfn > *highest_pfn)
+			*highest_pfn = pfn;
 	}
 
 	local_irq_save(flags);
-- 
2.16.4



* [PATCH 09/14] mm, compaction: Ignore the fragmentation avoidance boost for isolation and compaction
  2018-12-14 23:02 [RFC PATCH 00/14] Increase success rates and reduce latency of compaction v1 Mel Gorman
                   ` (7 preceding siblings ...)
  2018-12-14 23:03 ` [PATCH 08/14] mm, compaction: Use the page allocator bulk-free helper for lists of pages Mel Gorman
@ 2018-12-14 23:03 ` Mel Gorman
  2018-12-18 12:36   ` Vlastimil Babka
  2018-12-14 23:04 ` [PATCH 10/14] mm, compaction: Use free lists to quickly locate a migration source Mel Gorman
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 33+ messages in thread
From: Mel Gorman @ 2018-12-14 23:03 UTC (permalink / raw)
  To: Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Linus Torvalds, Michal Hocko,
	ying.huang, kirill, Andrew Morton, Linux List Kernel Mailing,
	Mel Gorman

When pageblocks get fragmented, watermarks are artificially boosted so
that pages are reclaimed to avoid further fragmentation events. However,
compaction is often either fragmentation-neutral or moving movable pages
away from unmovable/reclaimable pages. As the actual watermarks are
preserved, allow compaction to ignore the boost factor by checking
zone->_watermark[WMARK_MIN] directly, which, unlike min_wmark_pages(),
does not include the boost.

1-socket thpscale
                                    4.20.0-rc6             4.20.0-rc6
                               finishscan-v1r4           noboost-v1r4
Amean     fault-both-1         0.00 (   0.00%)        0.00 *   0.00%*
Amean     fault-both-3      3849.90 (   0.00%)     3753.53 (   2.50%)
Amean     fault-both-5      5054.13 (   0.00%)     5396.32 (  -6.77%)
Amean     fault-both-7      7061.77 (   0.00%)     7393.46 (  -4.70%)
Amean     fault-both-12    11560.59 (   0.00%)    12155.50 (  -5.15%)
Amean     fault-both-18    16120.15 (   0.00%)    16445.96 (  -2.02%)
Amean     fault-both-24    19804.31 (   0.00%)    20465.03 (  -3.34%)
Amean     fault-both-30    25018.73 (   0.00%)    20813.54 *  16.81%*
Amean     fault-both-32    24380.19 (   0.00%)    22384.02 (   8.19%)

The impact on the scan rates is a mixed bag because this patch is very
sensitive to timing and to whether the boost was active or not. However,
detailed tracing indicated that migration failures due to a premature
ENOMEM triggered by the watermark checks were eliminated.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 80535cd55a92..c7b80e62bfd9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3043,7 +3043,7 @@ int __isolate_free_page(struct page *page, unsigned int order)
 		 * watermark, because we already know our high-order page
 		 * exists.
 		 */
-		watermark = min_wmark_pages(zone) + (1UL << order);
+		watermark = zone->_watermark[WMARK_MIN] + (1UL << order);
 		if (!zone_watermark_ok(zone, 0, watermark, 0, ALLOC_CMA))
 			return 0;
 
-- 
2.16.4



* [PATCH 10/14] mm, compaction: Use free lists to quickly locate a migration source
  2018-12-14 23:02 [RFC PATCH 00/14] Increase success rates and reduce latency of compaction v1 Mel Gorman
                   ` (8 preceding siblings ...)
  2018-12-14 23:03 ` [PATCH 09/14] mm, compaction: Ignore the fragmentation avoidance boost for isolation and compaction Mel Gorman
@ 2018-12-14 23:04 ` Mel Gorman
  2018-12-14 23:05 ` [PATCH 11/14] mm, compaction: Keep migration source private to a single compaction instance Mel Gorman
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 33+ messages in thread
From: Mel Gorman @ 2018-12-14 23:04 UTC (permalink / raw)
  To: Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Linus Torvalds, Michal Hocko,
	ying.huang, kirill, Andrew Morton, Linux List Kernel Mailing

The migration scanner is a linear scan of a zone which is a potentially
very large search space. Furthermore, many pageblocks are unusable such
as those filled with reserved pages or partially filled with pages that
cannot migrate. These still get scanned in the common case of allocating
a THP and the cost accumulates.

The patch uses a partial search of the free lists to locate a migration
source candidate that is marked MOVABLE when allocating a THP. It prefers
picking a block that already has a larger number of free pages, on the
basis that fewer pages then need to migrate to free the entire block.
The lowest PFN found during the searches is tracked as the starting point
for a linear search after the first search of the free list fails. After
each search, the free list is shuffled so that the next search will not
encounter the same page. If a search fails then subsequent searches are
shorter and the linear scanner is used.

If this search fails, or if the request is for a small or
unmovable/reclaimable allocation, then the linear scanner is still used.
The list search is of little use in these cases: small free pages would
have to be used for the search and there is no guarantee that contiguous
movable pages are located within such a block.

                                    4.20.0-rc6             4.20.0-rc6
                                  noboost-v1r4           findmig-v1r4
Amean     fault-both-3      3753.53 (   0.00%)     3545.40 (   5.54%)
Amean     fault-both-5      5396.32 (   0.00%)     5431.98 (  -0.66%)
Amean     fault-both-7      7393.46 (   0.00%)     7185.11 (   2.82%)
Amean     fault-both-12    12155.50 (   0.00%)    11424.68 (   6.01%)
Amean     fault-both-18    16445.96 (   0.00%)    14170.10 *  13.84%*
Amean     fault-both-24    20465.03 (   0.00%)    16143.57 *  21.12%*
Amean     fault-both-30    20813.54 (   0.00%)    19207.96 (   7.71%)
Amean     fault-both-32    22384.02 (   0.00%)    20051.01 *  10.42%*

Compaction migrate scanned    60836989    51005450
Compaction free scanned      890084421   780359464

This shows a 16% reduction in migration scanning with some mild
improvements in latency. A 2-socket machine showed similar percentage
reductions in scan rates.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/compaction.c | 179 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 mm/internal.h   |   2 +
 2 files changed, 179 insertions(+), 2 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 8ba9b3b479e3..65c7ab1847a0 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1041,6 +1041,12 @@ static bool suitable_migration_target(struct compact_control *cc,
 	return false;
 }
 
+static inline unsigned int
+freelist_scan_limit(struct compact_control *cc)
+{
+	return (COMPACT_CLUSTER_MAX >> cc->fast_search_fail) + 1;
+}
+
 /*
  * Test whether the free scanner has reached the same or lower pageblock than
  * the migration scanner, and compaction should thus terminate.
@@ -1051,6 +1057,19 @@ static inline bool compact_scanners_met(struct compact_control *cc)
 		<= (cc->migrate_pfn >> pageblock_order);
 }
 
+/* Reorder the free list to reduce repeated future searches */
+static void
+move_freelist_tail(struct list_head *freelist, struct page *freepage)
+{
+	LIST_HEAD(sublist);
+
+	if (!list_is_last(freelist, &freepage->lru)) {
+		list_cut_position(&sublist, freelist, &freepage->lru);
+		if (!list_empty(&sublist))
+			list_splice_tail(&sublist, freelist);
+	}
+}
+
 /*
  * Based on information in the current compact_control, find blocks
  * suitable for isolating free pages from and then isolate them.
@@ -1208,6 +1227,160 @@ typedef enum {
  */
 int sysctl_compact_unevictable_allowed __read_mostly = 1;
 
+static inline void
+update_fast_start_pfn(struct compact_control *cc, unsigned long pfn)
+{
+	if (cc->fast_start_pfn == ULONG_MAX)
+		return;
+
+	if (!cc->fast_start_pfn)
+		cc->fast_start_pfn = pfn;
+
+	cc->fast_start_pfn = min(cc->fast_start_pfn, pfn);
+}
+
+static inline void
+reinit_migrate_pfn(struct compact_control *cc)
+{
+	if (!cc->fast_start_pfn || cc->fast_start_pfn == ULONG_MAX)
+		return;
+
+	cc->migrate_pfn = cc->fast_start_pfn;
+	cc->fast_start_pfn = ULONG_MAX;
+}
+
+/*
+ * Briefly search the free lists for a migration source that already has
+ * some free pages to reduce the number of pages that need migration
+ * before a pageblock is free.
+ */
+static unsigned long fast_find_migrateblock(struct compact_control *cc)
+{
+	unsigned int limit = freelist_scan_limit(cc);
+	unsigned int nr_scanned = 0;
+	unsigned long distance;
+	unsigned long pfn = cc->migrate_pfn;
+	unsigned long high_pfn;
+	int order;
+
+	/* Skip hints are relied on to avoid repeats on the fast search */
+	if (cc->ignore_skip_hint)
+		return pfn;
+
+	/*
+	 * If the migrate_pfn is not at the start of a zone or the start
+	 * of a pageblock then assume this is a continuation of a previous
+	 * scan restarted due to COMPACT_CLUSTER_MAX.
+	 */
+	if (pfn != cc->zone->zone_start_pfn && pfn != pageblock_start_pfn(pfn))
+		return pfn;
+
+	/*
+	 * For smaller orders, just linearly scan as the number of pages
+	 * to migrate should be relatively small and does not necessarily
+	 * justify freeing up a large block for a small allocation.
+	 */
+	if (cc->order <= PAGE_ALLOC_COSTLY_ORDER)
+		return pfn;
+
+	/*
+	 * Only allow kcompactd and direct requests for movable pages to
+	 * quickly clear out a MOVABLE pageblock for allocation. This
+	 * reduces the risk that a large movable pageblock is freed for
+	 * an unmovable/reclaimable small allocation.
+	 */
+	if (cc->direct_compaction && cc->migratetype != MIGRATE_MOVABLE)
+		return pfn;
+
+	/*
+	 * When starting the migration scanner, pick any pageblock within the
+	 * first half of the search space. Otherwise try and pick a pageblock
+	 * within the first eighth to reduce the chances that a migration
+	 * target later becomes a source.
+	 */
+	distance = (cc->free_pfn - cc->migrate_pfn) >> 1;
+	if (cc->migrate_pfn != cc->zone->zone_start_pfn)
+		distance >>= 2;
+	high_pfn = pageblock_start_pfn(cc->migrate_pfn + distance);
+
+	for (order = cc->order - 1;
+	     order >= PAGE_ALLOC_COSTLY_ORDER && pfn == cc->migrate_pfn && nr_scanned < limit;
+	     order--) {
+		struct free_area *area = &cc->zone->free_area[order];
+		struct list_head *freelist;
+		unsigned long nr_skipped = 0;
+		unsigned long flags;
+		struct page *freepage;
+
+		if (!area->nr_free)
+			continue;
+
+		spin_lock_irqsave(&cc->zone->lock, flags);
+		freelist = &area->free_list[MIGRATE_MOVABLE];
+		list_for_each_entry(freepage, freelist, lru) {
+			unsigned long free_pfn;
+
+			nr_scanned++;
+			free_pfn = page_to_pfn(freepage);
+			if (free_pfn < high_pfn) {
+				update_fast_start_pfn(cc, free_pfn);
+
+				/*
+				 * Avoid if skipped recently. Move to the tail
+				 * of the list so it will not be found again
+				 * soon
+				 */
+				if (get_pageblock_skip(freepage)) {
+
+					if (list_is_last(freelist, &freepage->lru))
+						break;
+
+					nr_skipped++;
+					list_del(&freepage->lru);
+					list_add_tail(&freepage->lru, freelist);
+					if (nr_skipped > 2)
+						break;
+					continue;
+				}
+
+				/* Reorder to so a future search skips recent pages */
+				move_freelist_tail(freelist, freepage);
+
+				pfn = pageblock_start_pfn(free_pfn);
+				cc->fast_search_fail = 0;
+				set_pageblock_skip(freepage);
+				break;
+			}
+
+			/*
+			 * If low PFNs are being found and discarded then
+			 * limit the scan as fast searching is finding
+			 * poor candidates.
+			 */
+			if (free_pfn < cc->migrate_pfn)
+				limit >>= 1;
+
+			if (nr_scanned >= limit) {
+				cc->fast_search_fail++;
+				move_freelist_tail(freelist, freepage);
+				break;
+			}
+		}
+		spin_unlock_irqrestore(&cc->zone->lock, flags);
+	}
+
+	cc->total_migrate_scanned += nr_scanned;
+
+	/*
+	 * If fast scanning failed then use a cached entry for a page block
+	 * that had free pages as the basis for starting a linear scan.
+	 */
+	if (pfn == cc->migrate_pfn)
+		reinit_migrate_pfn(cc);
+
+	return pfn;
+}
+
 /*
  * Isolate all pages that can be migrated from the first suitable block,
  * starting at the block pointed to by the migrate scanner pfn within
@@ -1226,9 +1399,10 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 
 	/*
 	 * Start at where we last stopped, or beginning of the zone as
-	 * initialized by compact_zone()
+	 * initialized by compact_zone(). The first failure will use
+	 * the lowest PFN as the starting point for linear scanning.
 	 */
-	low_pfn = cc->migrate_pfn;
+	low_pfn = fast_find_migrateblock(cc);
 	block_start_pfn = pageblock_start_pfn(low_pfn);
 	if (block_start_pfn < zone->zone_start_pfn)
 		block_start_pfn = zone->zone_start_pfn;
@@ -1551,6 +1725,7 @@ static enum compact_result compact_zone(struct compact_control *cc)
 	 * want to compact the whole zone), but check that it is initialised
 	 * by ensuring the values are within zone boundaries.
 	 */
+	cc->fast_start_pfn = 0;
 	if (cc->whole_zone) {
 		cc->migrate_pfn = start_pfn;
 		cc->free_pfn = pageblock_start_pfn(end_pfn - 1);
diff --git a/mm/internal.h b/mm/internal.h
index 9b32f4cab0ae..983cb975545f 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -188,9 +188,11 @@ struct compact_control {
 	unsigned int nr_migratepages;	/* Number of pages to migrate */
 	unsigned long free_pfn;		/* isolate_freepages search base */
 	unsigned long migrate_pfn;	/* isolate_migratepages search base */
+	unsigned long fast_start_pfn;	/* a pfn to start linear scan from */
 	struct zone *zone;
 	unsigned long total_migrate_scanned;
 	unsigned long total_free_scanned;
+	unsigned int fast_search_fail;	/* failures to use free list searches */
 	const gfp_t gfp_mask;		/* gfp mask of a direct compactor */
 	int order;			/* order a direct compactor needs */
 	int migratetype;		/* migratetype of direct compactor */
-- 
2.16.4



* [PATCH 11/14] mm, compaction: Keep migration source private to a single compaction instance
  2018-12-14 23:02 [RFC PATCH 00/14] Increase success rates and reduce latency of compaction v1 Mel Gorman
                   ` (9 preceding siblings ...)
  2018-12-14 23:04 ` [PATCH 10/14] mm, compaction: Use free lists to quickly locate a migration source Mel Gorman
@ 2018-12-14 23:05 ` Mel Gorman
  2018-12-14 23:05 ` [PATCH 12/14] mm, compaction: Use free lists to quickly locate a migration target Mel Gorman
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 33+ messages in thread
From: Mel Gorman @ 2018-12-14 23:05 UTC (permalink / raw)
  To: Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Linus Torvalds, Michal Hocko,
	ying.huang, kirill, Andrew Morton, Linux List Kernel Mailing

Due to either a fast search of the free lists or a linear scan, it is
possible for multiple compaction instances to pick the same pageblock
for migration. That is lucky for one scanner and means increased scanning
for all the others. It also opens a race over who allocates the resulting
free block.

This patch tests and updates the pageblock skip information for the
migration scanner more carefully. When about to isolate a block, the skip
bit is checked and the block is skipped if it is already set. Once the
LRU lock is acquired, the bit is rechecked so that only one scanner can
set the pageblock skip and claim the block for exclusive use; any scanner
that loses the race continues with a linear scan. The skip bit is still
set if no pages can be isolated in a range. While this may result in
redundant scanning, it avoids unnecessarily taking locks when there are
no suitable migration sources.

1-socket thpscale
                                    4.20.0-rc6             4.20.0-rc6
                                  findmig-v1r4           isolmig-v1r4
Amean     fault-both-3      3545.40 (   0.00%)     2980.25 *  15.94%*
Amean     fault-both-5      5431.98 (   0.00%)     4393.04 *  19.13%*
Amean     fault-both-7      7185.11 (   0.00%)     5797.16 *  19.32%*
Amean     fault-both-12    11424.68 (   0.00%)     9849.61 (  13.79%)
Amean     fault-both-18    14170.10 (   0.00%)    13816.96 (   2.49%)
Amean     fault-both-24    16143.57 (   0.00%)    16255.20 (  -0.69%)
Amean     fault-both-30    19207.96 (   0.00%)    15741.25 *  18.05%*
Amean     fault-both-32    20051.01 (   0.00%)    16624.73 *  17.09%*

This is the first patch that shows a significant reduction in latency as
multiple compaction scanners do not operate on the same blocks. There is
also a small increase in the allocation success rate:

                               4.20.0-rc6             4.20.0-rc6
                             findmig-v1r4           isolmig-v1r4
Percentage huge-3        91.95 (   0.00%)       95.97 (   4.37%)
Percentage huge-5        91.40 (   0.00%)       94.78 (   3.70%)
Percentage huge-7        92.94 (   0.00%)       93.94 (   1.07%)
Percentage huge-12       92.13 (   0.00%)       93.77 (   1.78%)
Percentage huge-18       91.01 (   0.00%)       94.57 (   3.91%)
Percentage huge-24       89.56 (   0.00%)       93.71 (   4.63%)
Percentage huge-30       90.26 (   0.00%)       94.14 (   4.30%)
Percentage huge-32       90.70 (   0.00%)       94.44 (   4.12%)

Compaction migrate scanned    51005450    25587453
Compaction free scanned      780359464    87735894

Migration scan rates are reduced by 49%. At the time of writing, the
2-socket results are not yet available.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/compaction.c | 112 +++++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 87 insertions(+), 25 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 65c7ab1847a0..b0309bf409b3 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -279,13 +279,51 @@ void reset_isolation_suitable(pg_data_t *pgdat)
 	}
 }
 
+/*
+ * Sets the pageblock skip bit if it was clear. Note that this is a hint as
+ * locks are not required for read/writers. Returns true if it was already set.
+ */
+static bool test_and_set_skip(struct compact_control *cc, struct page *page,
+							unsigned long pfn)
+{
+	bool skip;
+
+	/* Do no update if skip hint is being ignored */
+	if (cc->ignore_skip_hint)
+		return false;
+
+	if (!IS_ALIGNED(pfn, pageblock_nr_pages))
+		return false;
+
+	skip = get_pageblock_skip(page);
+	if (!skip && !cc->no_set_skip_hint)
+		set_pageblock_skip(page);
+
+	return skip;
+}
+
+static void update_cached_migrate(struct compact_control *cc, unsigned long pfn)
+{
+	struct zone *zone = cc->zone;
+	pfn = pageblock_end_pfn(pfn);
+
+	/* Set for isolation rather than compaction */
+	if (cc->no_set_skip_hint)
+		return;
+
+	if (pfn > zone->compact_cached_migrate_pfn[0])
+		zone->compact_cached_migrate_pfn[0] = pfn;
+	if (cc->mode != MIGRATE_ASYNC &&
+	    pfn > zone->compact_cached_migrate_pfn[1])
+		zone->compact_cached_migrate_pfn[1] = pfn;
+}
+
 /*
  * If no pages were isolated then mark this pageblock to be skipped in the
  * future. The information is later cleared by __reset_isolation_suitable().
  */
 static void update_pageblock_skip(struct compact_control *cc,
-			struct page *page, unsigned long nr_isolated,
-			bool migrate_scanner)
+			struct page *page, unsigned long nr_isolated)
 {
 	struct zone *zone = cc->zone;
 	unsigned long pfn;
@@ -304,16 +342,8 @@ static void update_pageblock_skip(struct compact_control *cc,
 	pfn = page_to_pfn(page);
 
 	/* Update where async and sync compaction should restart */
-	if (migrate_scanner) {
-		if (pfn > zone->compact_cached_migrate_pfn[0])
-			zone->compact_cached_migrate_pfn[0] = pfn;
-		if (cc->mode != MIGRATE_ASYNC &&
-		    pfn > zone->compact_cached_migrate_pfn[1])
-			zone->compact_cached_migrate_pfn[1] = pfn;
-	} else {
-		if (pfn < zone->compact_cached_free_pfn)
-			zone->compact_cached_free_pfn = pfn;
-	}
+	if (pfn < zone->compact_cached_free_pfn)
+		zone->compact_cached_free_pfn = pfn;
 }
 #else
 static inline bool isolation_suitable(struct compact_control *cc,
@@ -561,7 +591,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
 
 	/* Update the pageblock-skip if the whole pageblock was scanned */
 	if (blockpfn == end_pfn)
-		update_pageblock_skip(cc, valid_page, total_isolated, false);
+		update_pageblock_skip(cc, valid_page, total_isolated);
 
 	cc->total_free_scanned += nr_scanned;
 	if (total_isolated)
@@ -696,6 +726,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 	unsigned long start_pfn = low_pfn;
 	bool skip_on_failure = false;
 	unsigned long next_skip_pfn = 0;
+	bool skip_updated = false;
 
 	/*
 	 * Ensure that there are not too many pages isolated from the LRU
@@ -762,8 +793,19 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 
 		page = pfn_to_page(low_pfn);
 
-		if (!valid_page)
+		/*
+		 * Check if the pageblock has already been marked skipped.
+		 * Only the aligned PFN is checked as the caller isolates
+		 * COMPACT_CLUSTER_MAX at a time so the second call must
+		 * not falsely conclude that the block should be skipped.
+		 */
+		if (!valid_page && IS_ALIGNED(low_pfn, pageblock_nr_pages)) {
+			if (!cc->ignore_skip_hint && get_pageblock_skip(page)) {
+				low_pfn = end_pfn;
+				goto isolate_abort;
+			}
 			valid_page = page;
+		}
 
 		/*
 		 * Skip if free. We read page order here without zone lock
@@ -851,8 +893,19 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 		if (!locked) {
 			locked = compact_trylock_irqsave(zone_lru_lock(zone),
 								&flags, cc);
-			if (!locked)
+
+			/* Allow future scanning if the lock is contended */
+			if (!locked) {
+				clear_pageblock_skip(page);
 				break;
+			}
+
+			/* Try get exclusive access under lock */
+			if (!skip_updated) {
+				skip_updated = true;
+				if (test_and_set_skip(cc, page, low_pfn))
+					goto isolate_abort;
+			}
 
 			/* Recheck PageLRU and PageCompound under lock */
 			if (!PageLRU(page))
@@ -930,15 +983,20 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 	if (unlikely(low_pfn > end_pfn))
 		low_pfn = end_pfn;
 
+isolate_abort:
 	if (locked)
 		spin_unlock_irqrestore(zone_lru_lock(zone), flags);
 
 	/*
-	 * Update the pageblock-skip information and cached scanner pfn,
-	 * if the whole pageblock was scanned without isolating any page.
+	 * Update the cached scanner pfn if the pageblock was scanned
+	 * without isolating a page. The pageblock may not be marked
+	 * skipped already if there were no LRU pages in the block.
 	 */
-	if (low_pfn == end_pfn)
-		update_pageblock_skip(cc, valid_page, nr_isolated, true);
+	if (low_pfn == end_pfn && !nr_isolated) {
+		if (valid_page && !skip_updated)
+			set_pageblock_skip(valid_page);
+		update_cached_migrate(cc, low_pfn);
+	}
 
 	trace_mm_compaction_isolate_migratepages(start_pfn, low_pfn,
 						nr_scanned, nr_isolated);
@@ -1323,8 +1381,6 @@ static unsigned long fast_find_migrateblock(struct compact_control *cc)
 			nr_scanned++;
 			free_pfn = page_to_pfn(freepage);
 			if (free_pfn < high_pfn) {
-				update_fast_start_pfn(cc, free_pfn);
-
 				/*
 				 * Avoid if skipped recently. Move to the tail
 				 * of the list so it will not be found again
@@ -1346,9 +1402,9 @@ static unsigned long fast_find_migrateblock(struct compact_control *cc)
 				/* Reorder so that a future search skips recent pages */
 				move_freelist_tail(freelist, freepage);
 
+				update_fast_start_pfn(cc, free_pfn);
 				pfn = pageblock_start_pfn(free_pfn);
 				cc->fast_search_fail = 0;
-				set_pageblock_skip(freepage);
 				break;
 			}
 
@@ -1418,7 +1474,6 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 			low_pfn = block_end_pfn,
 			block_start_pfn = block_end_pfn,
 			block_end_pfn += pageblock_nr_pages) {
-
 		/*
 		 * This can potentially iterate a massively long zone with
 		 * many pageblocks unsuitable, so periodically check if we
@@ -1433,8 +1488,15 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 		if (!page)
 			continue;
 
-		/* If isolation recently failed, do not retry */
-		if (!isolation_suitable(cc, page))
+		/*
+		 * If isolation recently failed, do not retry. Only check the
+		 * pageblock once as COMPACT_CLUSTER_MAX causes a pageblock
+		 * to be visited multiple times. Assume the skip hint was
+		 * already checked before the block was marked "skip" so that
+		 * other compaction instances do not scan the same block.
+		 */
+		if (IS_ALIGNED(low_pfn, pageblock_nr_pages) &&
+		    !isolation_suitable(cc, page))
 			continue;
 
 		/*
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 12/14] mm, compaction: Use free lists to quickly locate a migration target
  2018-12-14 23:02 [RFC PATCH 00/14] Increase success rates and reduce latency of compaction v1 Mel Gorman
                   ` (10 preceding siblings ...)
  2018-12-14 23:05 ` [PATCH 11/14] mm, compaction: Keep migration source private to a single compaction instance Mel Gorman
@ 2018-12-14 23:05 ` Mel Gorman
  2018-12-14 23:05 ` [PATCH 13/14] mm, compaction: Capture a page under direct compaction Mel Gorman
  2018-12-14 23:06 ` [PATCH 14/14] mm, compaction: Do not direct compact remote memory Mel Gorman
  13 siblings, 0 replies; 33+ messages in thread
From: Mel Gorman @ 2018-12-14 23:05 UTC (permalink / raw)
  To: Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Linus Torvalds, Michal Hocko,
	ying.huang, kirill, Andrew Morton, Linux List Kernel Mailing

Similar to the migration scanner, this uses the free lists to quickly
locate a migration target. The search is different in that lower orders
will be searched for a suitable high PFN if necessary but the search
is still bound. This is justified on the grounds that the free scanner
typically scans linearly much more than the migration scanner.

If a free page is found, it is isolated and the full pageblock is scanned
for any remaining free pages. This is done so that it's possible to mark
the pageblock for skipping in the near future.
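
To illustrate the idea, here is a rough standalone sketch of the bounded
per-order list walk (a self-contained userspace model with invented
structures and values, not the kernel code in the patch below):

/*
 * Standalone model of a bounded free-list search. Structures and numbers
 * are invented for illustration and are not the kernel's.
 */
#include <stdio.h>
#include <stddef.h>

struct freepage {
	unsigned long pfn;
	struct freepage *next;
};

/*
 * Walk at most 'limit' entries. Return the first page at or above the
 * preferred low_pfn, otherwise the best candidate at or above min_pfn.
 */
static struct freepage *bounded_search(struct freepage *list,
				       unsigned long low_pfn,
				       unsigned long min_pfn,
				       unsigned int limit)
{
	struct freepage *best = NULL;
	struct freepage *p;

	for (p = list; p && limit; p = p->next, limit--) {
		if (p->pfn >= low_pfn)
			return p;		/* preferred target */
		if (p->pfn >= min_pfn && (!best || p->pfn > best->pfn))
			best = p;		/* acceptable fallback */
	}
	return best;
}

int main(void)
{
	struct freepage c = { 900,  NULL };
	struct freepage b = { 1500, &c };
	struct freepage a = { 1200, &b };
	struct freepage *hit = bounded_search(&a, 1400, 1000, 8);

	printf("candidate pfn: %lu\n", hit ? hit->pfn : 0UL);
	return 0;
}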

1-socket thpfioscale
                                    4.20.0-rc6             4.20.0-rc6
                                  isolmig-v1r4          findfree-v1r8
Amean     fault-both-3      2980.25 (   0.00%)     2911.07 (   2.32%)
Amean     fault-both-5      4393.04 (   0.00%)     4692.96 (  -6.83%)
Amean     fault-both-7      5797.16 (   0.00%)     6449.17 ( -11.25%)
Amean     fault-both-12     9849.61 (   0.00%)     9778.40 (   0.72%)
Amean     fault-both-18    13816.96 (   0.00%)    11756.92 (  14.91%)
Amean     fault-both-24    16255.20 (   0.00%)    13675.93 *  15.87%*
Amean     fault-both-30    15741.25 (   0.00%)    17195.41 (  -9.24%)
Amean     fault-both-32    16624.73 (   0.00%)    18150.08 (  -9.18%)

The impact on latency is variable as the search is optimistic and
sensitive to the exact system state. Success rates are similar but
the major impact is on the rate of scanning:

                            4.20.0-rc6  4.20.0-rc6
                         isolmig-v1r4 findfree-v1r8
Compaction migrate scanned    25587453    27634284
Compaction free scanned       87735894    55279519

The free scan rates are reduced by 37%.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/compaction.c | 201 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 197 insertions(+), 4 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index b0309bf409b3..ba3035dcc548 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1117,7 +1117,7 @@ static inline bool compact_scanners_met(struct compact_control *cc)
 
 /* Reorder the free list to reduce repeated future searches */
 static void
-move_freelist_tail(struct list_head *freelist, struct page *freepage)
+move_freelist_head(struct list_head *freelist, struct page *freepage)
 {
 	LIST_HEAD(sublist);
 
@@ -1128,6 +1128,193 @@ move_freelist_tail(struct list_head *freelist, struct page *freepage)
 	}
 }
 
+static void
+move_freelist_tail(struct list_head *freelist, struct page *freepage)
+{
+	LIST_HEAD(sublist);
+
+	if (!list_is_last(freelist, &freepage->lru)) {
+		list_cut_before(&sublist, freelist, &freepage->lru);
+		if (!list_empty(&sublist))
+			list_splice_tail(&sublist, freelist);
+	}
+}
+
+static void
+fast_isolate_around(struct compact_control *cc, unsigned long pfn, unsigned long nr_isolated)
+{
+	unsigned long start_pfn, end_pfn;
+	struct page *page = pfn_to_page(pfn);
+
+	/* Do not search around if there are enough pages already */
+	if (cc->nr_freepages >= cc->nr_migratepages)
+		return;
+
+	/* Minimise scanning during async compaction */
+	if (cc->direct_compaction && cc->mode == MIGRATE_ASYNC)
+		return;
+
+	/* Pageblock boundaries */
+	start_pfn = pageblock_start_pfn(pfn);
+	end_pfn = min(start_pfn + pageblock_nr_pages, zone_end_pfn(cc->zone));
+
+	/* Scan before */
+	if (start_pfn != pfn) {
+		isolate_freepages_block(cc, &start_pfn, pfn, &cc->freepages, false);
+		if (cc->nr_freepages >= cc->nr_migratepages)
+			return;
+	}
+
+	/* Scan after */
+	start_pfn = pfn + nr_isolated;
+	if (start_pfn != end_pfn)
+		isolate_freepages_block(cc, &start_pfn, end_pfn, &cc->freepages, false);
+
+	/* Skip this pageblock in the future as it's full or nearly full */
+	if (cc->nr_freepages < cc->nr_migratepages)
+		set_pageblock_skip(page);
+}
+
+static unsigned long
+fast_isolate_freepages(struct compact_control *cc)
+{
+	unsigned int limit = max(1U, freelist_scan_limit(cc) >> 1);
+	unsigned int order_scanned = 0, nr_scanned = 0;
+	unsigned long low_pfn, min_pfn, high_pfn = 0, highest = 0;
+	unsigned long nr_isolated = 0;
+	unsigned long distance;
+	struct page *page = NULL;
+	bool scan_start = false;
+	int order;
+
+	/*
+	 * If starting the scan, use a deeper search and use the highest
+	 * PFN found if a suitable one is not found.
+	 */
+	if (cc->free_pfn == pageblock_start_pfn(zone_end_pfn(cc->zone) - 1)) {
+		limit = pageblock_nr_pages >> 1;
+		scan_start = true;
+	}
+
+	/*
+	 * Preferred point is in the top quarter of the scan space but take
+	 * a pfn from the top half if the search is problematic.
+	 */
+	distance = (cc->free_pfn - cc->migrate_pfn);
+	low_pfn = pageblock_start_pfn(cc->free_pfn - (distance >> 2));
+	min_pfn = pageblock_start_pfn(cc->free_pfn - (distance >> 1));
+
+	if (WARN_ON_ONCE(min_pfn > low_pfn))
+		low_pfn = min_pfn;
+
+	for (order = cc->order - 1;
+	     order >= 0 && !page;
+	     order--) {
+		struct free_area *area = &cc->zone->free_area[order];
+		struct list_head *freelist;
+		struct page *freepage;
+		unsigned long flags;
+
+		if (!area->nr_free)
+			continue;
+
+		spin_lock_irqsave(&cc->zone->lock, flags);
+		freelist = &area->free_list[MIGRATE_MOVABLE];
+		list_for_each_entry_reverse(freepage, freelist, lru) {
+			unsigned long pfn;
+
+			order_scanned++;
+			nr_scanned++;
+			pfn = page_to_pfn(freepage);
+
+			if (pfn >= highest)
+				highest = pageblock_start_pfn(pfn);
+
+			if (pfn >= low_pfn) {
+				cc->fast_search_fail = 0;
+				page = freepage;
+				break;
+			}
+
+			if (pfn >= min_pfn && pfn > high_pfn) {
+				high_pfn = pfn;
+
+				/* Shorten the scan if a candidate is found */
+				limit >>= 1;
+			}
+
+			if (order_scanned >= limit)
+				break;
+		}
+
+		/* Use a minimum pfn if a preferred one was not found */
+		if (!page && high_pfn) {
+			page = pfn_to_page(high_pfn);
+
+			/* Update freepage for the list reorder below */
+			freepage = page;
+		}
+
+		/* Reorder so that a future search skips recent pages */
+		move_freelist_head(freelist, freepage);
+
+		/* Isolate the page if available */
+		if (page) {
+			if (__isolate_free_page(page, order)) {
+				set_page_private(page, order);
+				nr_isolated = 1 << order;
+				cc->nr_freepages += nr_isolated;
+				list_add_tail(&page->lru, &cc->freepages);
+				count_compact_events(COMPACTISOLATED, nr_isolated);
+			} else {
+				/* If isolation fails, abort the search */
+				order = -1;
+				page = NULL;
+			}
+		}
+
+		spin_unlock_irqrestore(&cc->zone->lock, flags);
+
+		/*
+	 * Smaller scan on next order so the total scan is related
+		 * to freelist_scan_limit.
+		 */
+		if (order_scanned >= limit)
+			limit = max(1U, limit >> 1);
+	}
+
+	if (!page) {
+		cc->fast_search_fail++;
+		if (scan_start) {
+			/*
+			 * Use the highest PFN found above min. If one was
+			 * not found, be pessimistic for direct compaction
+			 * and use the min mark.
+			 */
+			if (highest) {
+				page = pfn_to_page(highest);
+				cc->free_pfn = highest;
+			} else {
+				if (cc->direct_compaction) {
+					page = pfn_to_page(min_pfn);
+					cc->free_pfn = min_pfn;
+				}
+			}
+		}
+	}
+
+	if (highest && highest > cc->zone->compact_cached_free_pfn)
+		cc->zone->compact_cached_free_pfn = highest;
+
+	cc->total_free_scanned += nr_scanned;
+	if (!page)
+		return 0;
+
+	low_pfn = page_to_pfn(page);
+	fast_isolate_around(cc, low_pfn, nr_isolated);
+	return low_pfn;
+}
+
 /*
  * Based on information in the current compact_control, find blocks
  * suitable for isolating free pages from and then isolate them.
@@ -1142,6 +1329,11 @@ static void isolate_freepages(struct compact_control *cc)
 	unsigned long low_pfn;	     /* lowest pfn scanner is able to scan */
 	struct list_head *freelist = &cc->freepages;
 
+	/* Try a small search of the free lists for a candidate */
+	isolate_start_pfn = fast_isolate_freepages(cc);
+	if (isolate_start_pfn)
+		goto splitmap;
+
 	/*
 	 * Initialise the free scanner. The starting point is where we last
 	 * successfully isolated from, zone-cached value, or the end of the
@@ -1218,9 +1410,6 @@ static void isolate_freepages(struct compact_control *cc)
 		}
 	}
 
-	/* __isolate_free_page() does not map the pages */
-	split_map_pages(freelist);
-
 	/*
 	 * Record where the free scanner will restart next time. Either we
 	 * broke from the loop and set isolate_start_pfn based on the last
@@ -1228,6 +1417,10 @@ static void isolate_freepages(struct compact_control *cc)
 	 * and the loop terminated due to isolate_start_pfn < low_pfn
 	 */
 	cc->free_pfn = isolate_start_pfn;
+
+splitmap:
+	/* __isolate_free_page() does not map the pages */
+	split_map_pages(freelist);
 }
 
 /*
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 13/14] mm, compaction: Capture a page under direct compaction
  2018-12-14 23:02 [RFC PATCH 00/14] Increase success rates and reduce latency of compaction v1 Mel Gorman
                   ` (11 preceding siblings ...)
  2018-12-14 23:05 ` [PATCH 12/14] mm, compaction: Use free lists to quickly locate a migration target Mel Gorman
@ 2018-12-14 23:05 ` Mel Gorman
  2018-12-14 23:06 ` [PATCH 14/14] mm, compaction: Do not direct compact remote memory Mel Gorman
  13 siblings, 0 replies; 33+ messages in thread
From: Mel Gorman @ 2018-12-14 23:05 UTC (permalink / raw)
  To: Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Linus Torvalds, Michal Hocko,
	ying.huang, kirill, Andrew Morton, Linux List Kernel Mailing

Compaction is inherently race-prone as a suitable page freed during compaction
can be allocated by any parallel task. This patch uses a capture_control
structure to isolate a page immediately when it is freed by a direct compactor
in the slow path of the page allocator.
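
To illustrate the hand-off at a glance, here is a heavily reduced
standalone sketch (the names mirror the patch below but the types,
checks and locking are omitted, so it is illustrative only):

/*
 * Standalone sketch of the capture hand-off. The structures are heavily
 * simplified and invented for illustration; this is not the kernel code.
 */
#include <stdio.h>
#include <stddef.h>

struct page {
	unsigned long pfn;
};

struct capture_control {
	int order;		/* order the direct compactor wants */
	struct page *page;	/* filled in by the free path if captured */
};

/* Stand-in for current->capture_control in this sketch. */
static struct capture_control *current_capc;

/*
 * Free path: if a direct compactor of the matching order is waiting,
 * hand the page over instead of returning it to the free lists.
 */
static int free_one_page(struct page *page, int order)
{
	struct capture_control *capc = current_capc;

	if (capc && !capc->page && order == capc->order) {
		capc->page = page;
		return 1;		/* captured */
	}
	return 0;			/* normal buddy freeing would follow */
}

int main(void)
{
	struct capture_control capc = { .order = 9, .page = NULL };
	struct page huge = { .pfn = 4096 };

	current_capc = &capc;		/* compactor announces itself */
	free_one_page(&huge, 9);	/* compaction frees a suitable page */
	current_capc = NULL;

	printf("captured pfn: %lu\n", capc.page ? capc.page->pfn : 0UL);
	return 0;
}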

                                    4.20.0-rc6             4.20.0-rc6
                                 findfree-v1r8           capture-v1r8
Amean     fault-both-3      2911.07 (   0.00%)     2898.64 (   0.43%)
Amean     fault-both-5      4692.96 (   0.00%)     4296.58 (   8.45%)
Amean     fault-both-7      6449.17 (   0.00%)     6203.55 (   3.81%)
Amean     fault-both-12     9778.40 (   0.00%)     9309.13 (   4.80%)
Amean     fault-both-18    11756.92 (   0.00%)     6245.27 *  46.88%*
Amean     fault-both-24    13675.93 (   0.00%)    15083.42 ( -10.29%)
Amean     fault-both-30    17195.41 (   0.00%)    11498.60 *  33.13%*
Amean     fault-both-32    18150.08 (   0.00%)     9684.82 *  46.64%*

As expected, the biggest reduction in latency is when there are multiple
compaction instances that would previously compete for the same blocks.
THP allocation rates are also slightly higher.

                               4.20.0-rc6             4.20.0-rc6
                            findfree-v1r8           capture-v1r8
Percentage huge-1         0.00 (   0.00%)        0.00 (   0.00%)
Percentage huge-3        97.63 (   0.00%)       98.12 (   0.49%)
Percentage huge-5        96.11 (   0.00%)       98.83 (   2.84%)
Percentage huge-7        95.44 (   0.00%)       97.99 (   2.68%)
Percentage huge-12       95.36 (   0.00%)       99.00 (   3.82%)
Percentage huge-18       95.32 (   0.00%)       98.92 (   3.78%)
Percentage huge-24       95.13 (   0.00%)       99.08 (   4.15%)
Percentage huge-30       95.53 (   0.00%)       99.22 (   3.86%)
Percentage huge-32       94.94 (   0.00%)       98.97 (   4.25%)

And scan rates are reduced

Compaction migrate scanned    27634284    19002941
Compaction free scanned       55279519    46395714

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 include/linux/compaction.h |  3 ++-
 include/linux/sched.h      |  4 ++++
 kernel/sched/core.c        |  3 +++
 mm/compaction.c            | 31 +++++++++++++++++++------
 mm/internal.h              |  9 +++++++
 mm/page_alloc.c            | 58 ++++++++++++++++++++++++++++++++++++++++++----
 6 files changed, 96 insertions(+), 12 deletions(-)

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index 68250a57aace..b0d530cf46d1 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -95,7 +95,8 @@ extern int sysctl_compact_unevictable_allowed;
 extern int fragmentation_index(struct zone *zone, unsigned int order);
 extern enum compact_result try_to_compact_pages(gfp_t gfp_mask,
 		unsigned int order, unsigned int alloc_flags,
-		const struct alloc_context *ac, enum compact_priority prio);
+		const struct alloc_context *ac, enum compact_priority prio,
+		struct page **page);
 extern void reset_isolation_suitable(pg_data_t *pgdat);
 extern enum compact_result compaction_suitable(struct zone *zone, int order,
 		unsigned int alloc_flags, int classzone_idx);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8681905589f0..f1758ef4d1e2 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -47,6 +47,7 @@ struct pid_namespace;
 struct pipe_inode_info;
 struct rcu_node;
 struct reclaim_state;
+struct capture_control;
 struct robust_list_head;
 struct sched_attr;
 struct sched_param;
@@ -964,6 +965,9 @@ struct task_struct {
 
 	struct io_context		*io_context;
 
+#ifdef CONFIG_COMPACTION
+	struct capture_control		*capture_control;
+#endif
 	/* Ptrace state: */
 	unsigned long			ptrace_message;
 	kernel_siginfo_t		*last_siginfo;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 5f41fd2e0b6b..cd6d816aa40b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2177,6 +2177,9 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
 	INIT_HLIST_HEAD(&p->preempt_notifiers);
 #endif
 
+#ifdef CONFIG_COMPACTION
+	p->capture_control = NULL;
+#endif
 	init_numa_balancing(clone_flags, p);
 }
 
diff --git a/mm/compaction.c b/mm/compaction.c
index ba3035dcc548..39d33b6d1172 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1949,7 +1949,8 @@ bool compaction_zonelist_suitable(struct alloc_context *ac, int order,
 	return false;
 }
 
-static enum compact_result compact_zone(struct compact_control *cc)
+static enum compact_result
+compact_zone(struct compact_control *cc, struct capture_control *capc)
 {
 	enum compact_result ret;
 	unsigned long start_pfn = cc->zone->zone_start_pfn;
@@ -2086,6 +2087,11 @@ static enum compact_result compact_zone(struct compact_control *cc)
 			}
 		}
 
+		/* Stop if a page has been captured */
+		if (capc && capc->page) {
+			ret = COMPACT_SUCCESS;
+			break;
+		}
 	}
 
 out:
@@ -2119,7 +2125,8 @@ static enum compact_result compact_zone(struct compact_control *cc)
 
 static enum compact_result compact_zone_order(struct zone *zone, int order,
 		gfp_t gfp_mask, enum compact_priority prio,
-		unsigned int alloc_flags, int classzone_idx)
+		unsigned int alloc_flags, int classzone_idx,
+		struct page **capture)
 {
 	enum compact_result ret;
 	struct compact_control cc = {
@@ -2139,14 +2146,24 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
 		.ignore_skip_hint = (prio == MIN_COMPACT_PRIORITY),
 		.ignore_block_suitable = (prio == MIN_COMPACT_PRIORITY)
 	};
+	struct capture_control capc = {
+		.cc = &cc,
+		.page = NULL,
+	};
+
+	if (capture)
+		current->capture_control = &capc;
 	INIT_LIST_HEAD(&cc.freepages);
 	INIT_LIST_HEAD(&cc.migratepages);
 
-	ret = compact_zone(&cc);
+	ret = compact_zone(&cc, &capc);
 
 	VM_BUG_ON(!list_empty(&cc.freepages));
 	VM_BUG_ON(!list_empty(&cc.migratepages));
 
+	*capture = capc.page;
+	current->capture_control = NULL;
+
 	return ret;
 }
 
@@ -2164,7 +2181,7 @@ int sysctl_extfrag_threshold = 500;
  */
 enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
 		unsigned int alloc_flags, const struct alloc_context *ac,
-		enum compact_priority prio)
+		enum compact_priority prio, struct page **capture)
 {
 	int may_perform_io = gfp_mask & __GFP_IO;
 	struct zoneref *z;
@@ -2192,7 +2209,7 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
 		}
 
 		status = compact_zone_order(zone, order, gfp_mask, prio,
-					alloc_flags, ac_classzone_idx(ac));
+				alloc_flags, ac_classzone_idx(ac), capture);
 		rc = max(status, rc);
 
 		/* The allocation should succeed, stop compacting */
@@ -2260,7 +2277,7 @@ static void compact_node(int nid)
 		INIT_LIST_HEAD(&cc.freepages);
 		INIT_LIST_HEAD(&cc.migratepages);
 
-		compact_zone(&cc);
+		compact_zone(&cc, NULL);
 
 		VM_BUG_ON(!list_empty(&cc.freepages));
 		VM_BUG_ON(!list_empty(&cc.migratepages));
@@ -2402,7 +2419,7 @@ static void kcompactd_do_work(pg_data_t *pgdat)
 
 		if (kthread_should_stop())
 			return;
-		status = compact_zone(&cc);
+		status = compact_zone(&cc, NULL);
 
 		if (status == COMPACT_SUCCESS) {
 			compaction_defer_reset(zone, cc.order, false);
diff --git a/mm/internal.h b/mm/internal.h
index 983cb975545f..08fbb9d157c0 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -207,6 +207,15 @@ struct compact_control {
 	bool contended;			/* Signal lock or sched contention */
 };
 
+/*
+ * Used in direct compaction when a page should be taken from the freelists
+ * immediately when one is created during the free path.
+ */
+struct capture_control {
+	struct compact_control *cc;
+	struct page *page;
+};
+
 unsigned long
 isolate_freepages_range(struct compact_control *cc,
 			unsigned long start_pfn, unsigned long end_pfn);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c7b80e62bfd9..4e0cf4dbda5b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -753,6 +753,41 @@ static inline int page_is_buddy(struct page *page, struct page *buddy,
 	return 0;
 }
 
+#ifdef CONFIG_COMPACTION
+static inline struct capture_control *task_capc(struct zone *zone)
+{
+	struct capture_control *capc = current->capture_control;
+
+	return capc &&
+		!(current->flags & PF_KTHREAD) &&
+		!capc->page &&
+		capc->cc->zone == zone &&
+		capc->cc->direct_compaction ? capc : NULL;
+}
+
+static inline bool
+compaction_capture(struct capture_control *capc, struct page *page, int order)
+{
+	if (!capc || order != capc->cc->order)
+		return false;
+
+	capc->page = page;
+	return true;
+}
+
+#else
+static inline struct capture_control *task_capc(struct zone *zone)
+{
+	return NULL;
+}
+
+static inline bool
+compaction_capture(struct capture_control *capc, struct page *page, int order)
+{
+	return false;
+}
+#endif /* CONFIG_COMPACTION */
+
 /*
  * Freeing function for a buddy system allocator.
  *
@@ -786,6 +821,7 @@ static inline void __free_one_page(struct page *page,
 	unsigned long uninitialized_var(buddy_pfn);
 	struct page *buddy;
 	unsigned int max_order;
+	struct capture_control *capc = task_capc(zone);
 
 	max_order = min_t(unsigned int, MAX_ORDER, pageblock_order + 1);
 
@@ -801,6 +837,12 @@ static inline void __free_one_page(struct page *page,
 
 continue_merging:
 	while (order < max_order - 1) {
+		if (compaction_capture(capc, page, order)) {
+			if (likely(!is_migrate_isolate(migratetype)))
+				__mod_zone_freepage_state(zone, -(1 << order),
+								migratetype);
+			return;
+		}
 		buddy_pfn = __find_buddy_pfn(pfn, order);
 		buddy = page + (buddy_pfn - pfn);
 
@@ -3779,7 +3821,7 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 		unsigned int alloc_flags, const struct alloc_context *ac,
 		enum compact_priority prio, enum compact_result *compact_result)
 {
-	struct page *page;
+	struct page *page = NULL;
 	unsigned long pflags;
 	unsigned int noreclaim_flag;
 
@@ -3790,13 +3832,15 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 	noreclaim_flag = memalloc_noreclaim_save();
 
 	*compact_result = try_to_compact_pages(gfp_mask, order, alloc_flags, ac,
-									prio);
+								prio, &page);
 
 	memalloc_noreclaim_restore(noreclaim_flag);
 	psi_memstall_leave(&pflags);
 
-	if (*compact_result <= COMPACT_INACTIVE)
+	if (*compact_result <= COMPACT_INACTIVE) {
+		WARN_ON_ONCE(page);
 		return NULL;
+	}
 
 	/*
 	 * At least in one zone compaction wasn't deferred or skipped, so let's
@@ -3804,7 +3848,13 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 	 */
 	count_vm_event(COMPACTSTALL);
 
-	page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
+	/* Prep a captured page if available */
+	if (page)
+		prep_new_page(page, order, gfp_mask, alloc_flags);
+
+	/* Try to get a page from the freelist if available */
+	if (!page)
+		page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
 
 	if (page) {
 		struct zone *zone = page_zone(page);
-- 
2.16.4


-- 
Mel Gorman
SUSE Labs

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 14/14] mm, compaction: Do not direct compact remote memory
  2018-12-14 23:02 [RFC PATCH 00/14] Increase success rates and reduce latency of compaction v1 Mel Gorman
                   ` (12 preceding siblings ...)
  2018-12-14 23:05 ` [PATCH 13/14] mm, compaction: Capture a page under direct compaction Mel Gorman
@ 2018-12-14 23:06 ` Mel Gorman
  13 siblings, 0 replies; 33+ messages in thread
From: Mel Gorman @ 2018-12-14 23:06 UTC (permalink / raw)
  To: Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Linus Torvalds, Michal Hocko,
	ying.huang, kirill, Andrew Morton, Linux List Kernel Mailing

Remote compaction is expensive and possibly counter-productive. Local base
pages are often expected to have better performance characteristics than
remote high-order pages. For small allocations, it's expected that locality
is generally required or that fallbacks are possible. Larger allocations
such as THP are forbidden from using remote nodes at the time of writing
but, if __GFP_THISNODE is ever removed, it would still be preferable to
fall back to small local base pages over remote THP in the general case.
kcompactd is still woken via kswapd so compaction happens eventually.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/compaction.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/mm/compaction.c b/mm/compaction.c
index 39d33b6d1172..05fecd7227e4 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2208,6 +2208,16 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
 			continue;
 		}
 
+		/*
+		 * Do not compact remote memory. It's expensive and small
+		 * high-order allocations are expected to prefer or require
+		 * local memory. Similarly, larger requests such as THP can
+		 * fall back to base pages in preference to remote huge pages
+		 * if __GFP_THISNODE is not specified.
+		 */
+		if (zone_to_nid(zone) != zone_to_nid(ac->preferred_zoneref->zone))
+			continue;
+
 		status = compact_zone_order(zone, order, gfp_mask, prio,
 				alloc_flags, ac_classzone_idx(ac), capture);
 		rc = max(status, rc);
-- 
2.16.4


-- 
Mel Gorman
SUSE Labs

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 01/14] mm, compaction: Shrink compact_control
  2018-12-14 23:02 ` [PATCH 01/14] mm, compaction: Shrink compact_control Mel Gorman
@ 2018-12-17 12:27   ` Vlastimil Babka
  0 siblings, 0 replies; 33+ messages in thread
From: Vlastimil Babka @ 2018-12-17 12:27 UTC (permalink / raw)
  To: Mel Gorman, Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Linus Torvalds, Michal Hocko,
	ying.huang, kirill, Andrew Morton, Linux List Kernel Mailing

On 12/15/18 12:02 AM, Mel Gorman wrote:
> The isolate and migrate scanners should never isolate more than a pageblock
> of pages so unsigned int is sufficient saving 8 bytes on a 64-bit build.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/internal.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/internal.h b/mm/internal.h
> index 536bc2a839b9..5564841fce36 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -185,8 +185,8 @@ struct compact_control {
>  	struct list_head freepages;	/* List of free pages to migrate to */
>  	struct list_head migratepages;	/* List of pages being migrated */
>  	struct zone *zone;
> -	unsigned long nr_freepages;	/* Number of isolated free pages */
> -	unsigned long nr_migratepages;	/* Number of pages to migrate */
> +	unsigned int nr_freepages;	/* Number of isolated free pages */
> +	unsigned int nr_migratepages;	/* Number of pages to migrate */
>  	unsigned long total_migrate_scanned;
>  	unsigned long total_free_scanned;
>  	unsigned long free_pfn;		/* isolate_freepages search base */
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 02/14] mm, compaction: Rearrange compact_control
  2018-12-14 23:02 ` [PATCH 02/14] mm, compaction: Rearrange compact_control Mel Gorman
@ 2018-12-17 13:20   ` Vlastimil Babka
  0 siblings, 0 replies; 33+ messages in thread
From: Vlastimil Babka @ 2018-12-17 13:20 UTC (permalink / raw)
  To: Mel Gorman, Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Linus Torvalds, Michal Hocko,
	ying.huang, kirill, Andrew Morton, Linux List Kernel Mailing

On 12/15/18 12:02 AM, Mel Gorman wrote:
> compact_control spans two cache lines with write-intensive lines on
> both. Rearrange so the most write-intensive fields are in the same
> cache line. This has a negligible impact on the overall performance of
> compaction and is more a tidying exercise than anything.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 03/14] mm, compaction: Remove last_migrated_pfn from compact_control
  2018-12-14 23:02 ` [PATCH 03/14] mm, compaction: Remove last_migrated_pfn from compact_control Mel Gorman
@ 2018-12-17 13:50   ` Vlastimil Babka
  0 siblings, 0 replies; 33+ messages in thread
From: Vlastimil Babka @ 2018-12-17 13:50 UTC (permalink / raw)
  To: Mel Gorman, Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Linus Torvalds, Michal Hocko,
	ying.huang, kirill, Andrew Morton, Linux List Kernel Mailing

On 12/15/18 12:02 AM, Mel Gorman wrote:
> The last_migrated_pfn field is a bit dubious as to whether it really helps
> but either way, the information from it can be inferred without increasing
> the size of compact_control so remove the field.

Yeah looks like this won't cause more frequent drains.

> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

Acked-by: Vlastimil Babka <vbabka@suse.cz>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 04/14] mm, compaction: Rename map_pages to split_map_pages
  2018-12-14 23:03 ` [PATCH 04/14] mm, compaction: Rename map_pages to split_map_pages Mel Gorman
@ 2018-12-17 14:06   ` Vlastimil Babka
  2018-12-17 14:30     ` Mel Gorman
  0 siblings, 1 reply; 33+ messages in thread
From: Vlastimil Babka @ 2018-12-17 14:06 UTC (permalink / raw)
  To: Mel Gorman, Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Linus Torvalds, Michal Hocko,
	ying.huang, kirill, Andrew Morton, Linux List Kernel Mailing

On 12/15/18 12:03 AM, Mel Gorman wrote:
> It's non-obvious that high-order free pages are split into order-0
> pages from the function name. Fix it.

That's fine, but it looks like the patch has another change squashed into
it that removes the zone parameter from several functions and uses cc->zone
instead.

> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> ---
>  mm/compaction.c | 60 ++++++++++++++++++++++++++++-----------------------------
>  1 file changed, 29 insertions(+), 31 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index fb4d9f52ed56..3afa4e9188b6 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -66,7 +66,7 @@ static unsigned long release_freepages(struct list_head *freelist)
>  	return high_pfn;
>  }
>  
> -static void map_pages(struct list_head *list)
> +static void split_map_pages(struct list_head *list)
>  {
>  	unsigned int i, order, nr_pages;
>  	struct page *page, *next;
> @@ -644,7 +644,7 @@ isolate_freepages_range(struct compact_control *cc,
>  	}
>  
>  	/* __isolate_free_page() does not map the pages */
> -	map_pages(&freelist);
> +	split_map_pages(&freelist);
>  
>  	if (pfn < end_pfn) {
>  		/* Loop terminated early, cleanup. */
> @@ -1141,7 +1141,7 @@ static void isolate_freepages(struct compact_control *cc)
>  	}
>  
>  	/* __isolate_free_page() does not map the pages */
> -	map_pages(freelist);
> +	split_map_pages(freelist);
>  
>  	/*
>  	 * Record where the free scanner will restart next time. Either we
> @@ -1300,8 +1300,7 @@ static inline bool is_via_compact_memory(int order)
>  	return order == -1;
>  }
>  
> -static enum compact_result __compact_finished(struct zone *zone,
> -						struct compact_control *cc)
> +static enum compact_result __compact_finished(struct compact_control *cc)
>  {
>  	unsigned int order;
>  	const int migratetype = cc->migratetype;
> @@ -1312,7 +1311,7 @@ static enum compact_result __compact_finished(struct zone *zone,
>  	/* Compaction run completes if the migrate and free scanner meet */
>  	if (compact_scanners_met(cc)) {
>  		/* Let the next compaction start anew. */
> -		reset_cached_positions(zone);
> +		reset_cached_positions(cc->zone);
>  
>  		/*
>  		 * Mark that the PG_migrate_skip information should be cleared
> @@ -1321,7 +1320,7 @@ static enum compact_result __compact_finished(struct zone *zone,
>  		 * based on an allocation request.
>  		 */
>  		if (cc->direct_compaction)
> -			zone->compact_blockskip_flush = true;
> +			cc->zone->compact_blockskip_flush = true;
>  
>  		if (cc->whole_zone)
>  			return COMPACT_COMPLETE;
> @@ -1345,7 +1344,7 @@ static enum compact_result __compact_finished(struct zone *zone,
>  
>  	/* Direct compactor: Is a suitable page free? */
>  	for (order = cc->order; order < MAX_ORDER; order++) {
> -		struct free_area *area = &zone->free_area[order];
> +		struct free_area *area = &cc->zone->free_area[order];
>  		bool can_steal;
>  
>  		/* Job done if page is free of the right migratetype */
> @@ -1391,13 +1390,12 @@ static enum compact_result __compact_finished(struct zone *zone,
>  	return COMPACT_NO_SUITABLE_PAGE;
>  }
>  
> -static enum compact_result compact_finished(struct zone *zone,
> -			struct compact_control *cc)
> +static enum compact_result compact_finished(struct compact_control *cc)
>  {
>  	int ret;
>  
> -	ret = __compact_finished(zone, cc);
> -	trace_mm_compaction_finished(zone, cc->order, ret);
> +	ret = __compact_finished(cc);
> +	trace_mm_compaction_finished(cc->zone, cc->order, ret);
>  	if (ret == COMPACT_NO_SUITABLE_PAGE)
>  		ret = COMPACT_CONTINUE;
>  
> @@ -1524,16 +1522,16 @@ bool compaction_zonelist_suitable(struct alloc_context *ac, int order,
>  	return false;
>  }
>  
> -static enum compact_result compact_zone(struct zone *zone, struct compact_control *cc)
> +static enum compact_result compact_zone(struct compact_control *cc)
>  {
>  	enum compact_result ret;
> -	unsigned long start_pfn = zone->zone_start_pfn;
> -	unsigned long end_pfn = zone_end_pfn(zone);
> +	unsigned long start_pfn = cc->zone->zone_start_pfn;
> +	unsigned long end_pfn = zone_end_pfn(cc->zone);
>  	unsigned long last_migrated_pfn;
>  	const bool sync = cc->mode != MIGRATE_ASYNC;
>  
>  	cc->migratetype = gfpflags_to_migratetype(cc->gfp_mask);
> -	ret = compaction_suitable(zone, cc->order, cc->alloc_flags,
> +	ret = compaction_suitable(cc->zone, cc->order, cc->alloc_flags,
>  							cc->classzone_idx);
>  	/* Compaction is likely to fail */
>  	if (ret == COMPACT_SUCCESS || ret == COMPACT_SKIPPED)
> @@ -1546,8 +1544,8 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
>  	 * Clear pageblock skip if there were failures recently and compaction
>  	 * is about to be retried after being deferred.
>  	 */
> -	if (compaction_restarting(zone, cc->order))
> -		__reset_isolation_suitable(zone);
> +	if (compaction_restarting(cc->zone, cc->order))
> +		__reset_isolation_suitable(cc->zone);
>  
>  	/*
>  	 * Setup to move all movable pages to the end of the zone. Used cached
> @@ -1559,16 +1557,16 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
>  		cc->migrate_pfn = start_pfn;
>  		cc->free_pfn = pageblock_start_pfn(end_pfn - 1);
>  	} else {
> -		cc->migrate_pfn = zone->compact_cached_migrate_pfn[sync];
> -		cc->free_pfn = zone->compact_cached_free_pfn;
> +		cc->migrate_pfn = cc->zone->compact_cached_migrate_pfn[sync];
> +		cc->free_pfn = cc->zone->compact_cached_free_pfn;
>  		if (cc->free_pfn < start_pfn || cc->free_pfn >= end_pfn) {
>  			cc->free_pfn = pageblock_start_pfn(end_pfn - 1);
> -			zone->compact_cached_free_pfn = cc->free_pfn;
> +			cc->zone->compact_cached_free_pfn = cc->free_pfn;
>  		}
>  		if (cc->migrate_pfn < start_pfn || cc->migrate_pfn >= end_pfn) {
>  			cc->migrate_pfn = start_pfn;
> -			zone->compact_cached_migrate_pfn[0] = cc->migrate_pfn;
> -			zone->compact_cached_migrate_pfn[1] = cc->migrate_pfn;
> +			cc->zone->compact_cached_migrate_pfn[0] = cc->migrate_pfn;
> +			cc->zone->compact_cached_migrate_pfn[1] = cc->migrate_pfn;
>  		}
>  
>  		if (cc->migrate_pfn == start_pfn)
> @@ -1582,11 +1580,11 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
>  
>  	migrate_prep_local();
>  
> -	while ((ret = compact_finished(zone, cc)) == COMPACT_CONTINUE) {
> +	while ((ret = compact_finished(cc)) == COMPACT_CONTINUE) {
>  		int err;
>  		unsigned long start_pfn = cc->migrate_pfn;
>  
> -		switch (isolate_migratepages(zone, cc)) {
> +		switch (isolate_migratepages(cc->zone, cc)) {
>  		case ISOLATE_ABORT:
>  			ret = COMPACT_CONTENDED;
>  			putback_movable_pages(&cc->migratepages);
> @@ -1653,7 +1651,7 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
>  			if (last_migrated_pfn < current_block_start) {
>  				cpu = get_cpu();
>  				lru_add_drain_cpu(cpu);
> -				drain_local_pages(zone);
> +				drain_local_pages(cc->zone);
>  				put_cpu();
>  				/* No more flushing until we migrate again */
>  				last_migrated_pfn = 0;
> @@ -1678,8 +1676,8 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
>  		 * Only go back, not forward. The cached pfn might have been
>  		 * already reset to zone end in compact_finished()
>  		 */
> -		if (free_pfn > zone->compact_cached_free_pfn)
> -			zone->compact_cached_free_pfn = free_pfn;
> +		if (free_pfn > cc->zone->compact_cached_free_pfn)
> +			cc->zone->compact_cached_free_pfn = free_pfn;
>  	}
>  
>  	count_compact_events(COMPACTMIGRATE_SCANNED, cc->total_migrate_scanned);
> @@ -1716,7 +1714,7 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
>  	INIT_LIST_HEAD(&cc.freepages);
>  	INIT_LIST_HEAD(&cc.migratepages);
>  
> -	ret = compact_zone(zone, &cc);
> +	ret = compact_zone(&cc);
>  
>  	VM_BUG_ON(!list_empty(&cc.freepages));
>  	VM_BUG_ON(!list_empty(&cc.migratepages));
> @@ -1834,7 +1832,7 @@ static void compact_node(int nid)
>  		INIT_LIST_HEAD(&cc.freepages);
>  		INIT_LIST_HEAD(&cc.migratepages);
>  
> -		compact_zone(zone, &cc);
> +		compact_zone(&cc);
>  
>  		VM_BUG_ON(!list_empty(&cc.freepages));
>  		VM_BUG_ON(!list_empty(&cc.migratepages));
> @@ -1976,7 +1974,7 @@ static void kcompactd_do_work(pg_data_t *pgdat)
>  
>  		if (kthread_should_stop())
>  			return;
> -		status = compact_zone(zone, &cc);
> +		status = compact_zone(&cc);
>  
>  		if (status == COMPACT_SUCCESS) {
>  			compaction_defer_reset(zone, cc.order, false);
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 04/14] mm, compaction: Rename map_pages to split_map_pages
  2018-12-17 14:06   ` Vlastimil Babka
@ 2018-12-17 14:30     ` Mel Gorman
  0 siblings, 0 replies; 33+ messages in thread
From: Mel Gorman @ 2018-12-17 14:30 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Linux-MM, David Rientjes, Andrea Arcangeli, Linus Torvalds,
	Michal Hocko, ying.huang, kirill, Andrew Morton,
	Linux List Kernel Mailing

On Mon, Dec 17, 2018 at 03:06:59PM +0100, Vlastimil Babka wrote:
> On 12/15/18 12:03 AM, Mel Gorman wrote:
> > It's non-obvious that high-order free pages are split into order-0
> > pages from the function name. Fix it.
> 
> That's fine, but looks like the patch has another change squashed into
> it that removes zone parameter from several functions and uses cc->zone
> instead.
> 

Bah, it's a rebase mishap. It didn't flag when rereading the patches
before sending because "yep, I did that on purpose". I'll split it out,
the changelog will be uninspiring.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 05/14] mm, compaction: Skip pageblocks with reserved pages
  2018-12-14 23:03 ` [PATCH 05/14] mm, compaction: Skip pageblocks with reserved pages Mel Gorman
@ 2018-12-18  8:08   ` Vlastimil Babka
  2018-12-18  8:38     ` Mel Gorman
  0 siblings, 1 reply; 33+ messages in thread
From: Vlastimil Babka @ 2018-12-18  8:08 UTC (permalink / raw)
  To: Mel Gorman, Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Linus Torvalds, Michal Hocko,
	ying.huang, kirill, Andrew Morton, Linux List Kernel Mailing

On 12/15/18 12:03 AM, Mel Gorman wrote:
> Reserved pages are set at boot time, tend to be clustered and almost
> never become unreserved. When isolating pages for migrating, skip
> the entire pageblock if one PageReserved page is encountered on the
> grounds that it is highly probable the entire pageblock is reserved.

Agreed, but maybe since it's highly probable and not certain, this
skipping should not be done on the highest compaction priority?

> The impact depends on the machine and timing but both thpscale and
> thpfioscale when using MADV_HUGEPAGE show a reduction of scanning and
> fault latency on a 1-socket machine. The 2-socket results were too
> noisy to draw any meaningful conclusion but it's safe to assume less
> scanning is useful.
> 
> 1-socket thpfioscale
>                                    4.20.0-rc6             4.20.0-rc6
>                                mmotm-20181210        noreserved-v1r4
> Amean     fault-base-1     1481.32 (   0.00%)     1443.63 (   2.54%)
> Amean     fault-huge-1     1118.17 (   0.00%)      981.30 *  12.24%*
> Amean     fault-both-1     1176.43 (   0.00%)     1052.64 *  10.52%*
> 
> Compaction migrate scanned     3860713     3294284
> Compaction free scanned      613786341   433423502
> Kcompactd migrate scanned       408711      291915
> Kcompactd free scanned       242509759   217164988
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> ---
>  mm/compaction.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 3afa4e9188b6..8134dba47584 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -827,6 +827,13 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>  					goto isolate_success;
>  			}
>  
> +			/*
> +			 * A reserved page is never freed and tend to be
> +			 * clustered in the same pageblocks. Skip the block.
> +			 */
> +			if (PageReserved(page))
> +				low_pfn = end_pfn;
> +
>  			goto isolate_fail;
>  		}
>  
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 05/14] mm, compaction: Skip pageblocks with reserved pages
  2018-12-18  8:08   ` Vlastimil Babka
@ 2018-12-18  8:38     ` Mel Gorman
  0 siblings, 0 replies; 33+ messages in thread
From: Mel Gorman @ 2018-12-18  8:38 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Linux-MM, David Rientjes, Andrea Arcangeli, Linus Torvalds,
	Michal Hocko, ying.huang, kirill, Andrew Morton,
	Linux List Kernel Mailing

On Tue, Dec 18, 2018 at 09:08:02AM +0100, Vlastimil Babka wrote:
> On 12/15/18 12:03 AM, Mel Gorman wrote:
> > Reserved pages are set at boot time, tend to be clustered and almost
> > never become unreserved. When isolating pages for migrating, skip
> > the entire pageblock if one PageReserved page is encountered on the
> > grounds that it is highly probable the entire pageblock is reserved.
> 
> Agreed, but maybe since it's highly probable and not certain, this
> skipping should not be done on the highest compaction priority?
> 

I don't think that's necessary at this time. For the most part, you are
talking about one partial pageblock at best given how the early memory
allocator works so it would only ever be useful for a high-order kernel
allocation. Second, one of compaction's primary problems is inefficient
scanning where viable pageblocks are easily skipped over or only partially
scanned, which is something I'm still looking at. Lastly, maximum priority
compaction is rarely hit in practice as far as I can tell.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 06/14] mm, migrate: Immediately fail migration of a page with no migration handler
  2018-12-14 23:03 ` [PATCH 06/14] mm, migrate: Immediately fail migration of a page with no migration handler Mel Gorman
@ 2018-12-18  9:06   ` Vlastimil Babka
  2018-12-18  9:55     ` Mel Gorman
  2018-12-20 19:44   ` Yang Shi
  1 sibling, 1 reply; 33+ messages in thread
From: Vlastimil Babka @ 2018-12-18  9:06 UTC (permalink / raw)
  To: Mel Gorman, Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Linus Torvalds, Michal Hocko,
	ying.huang, kirill, Andrew Morton, Linux List Kernel Mailing

On 12/15/18 12:03 AM, Mel Gorman wrote:
> Pages with no migration handler use a fallback handler which sometimes
> works and sometimes persistently fails such as blockdev pages. Migration
> will retry a number of times on these persistent pages which is wasteful
> during compaction. This patch will fail migration immediately unless the
> caller is in MIGRATE_SYNC mode which indicates the caller is willing to
> wait while being persistent.

Right.

> This is not expected to help THP allocation success rates but it does
> reduce latencies slightly.
> 
> 1-socket thpfioscale
>                                     4.20.0-rc6             4.20.0-rc6
>                                noreserved-v1r4          failfast-v1r4
> Amean     fault-both-1         0.00 (   0.00%)        0.00 *   0.00%*
> Amean     fault-both-3      2276.15 (   0.00%)     3867.54 * -69.92%*

This is rather weird.

> Amean     fault-both-5      4992.20 (   0.00%)     5313.20 (  -6.43%)
> Amean     fault-both-7      7373.30 (   0.00%)     7039.11 (   4.53%)
> Amean     fault-both-12    11911.52 (   0.00%)    11328.29 (   4.90%)
> Amean     fault-both-18    17209.42 (   0.00%)    16455.34 (   4.38%)
> Amean     fault-both-24    20943.71 (   0.00%)    20448.94 (   2.36%)
> Amean     fault-both-30    22703.00 (   0.00%)    21655.07 (   4.62%)
> Amean     fault-both-32    22461.41 (   0.00%)    21415.35 (   4.66%)
> 
> The 2-socket results are not materially different. Scan rates are
> similar as expected.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/migrate.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index df17a710e2c7..0e27a10429e2 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -885,7 +885,7 @@ static int fallback_migrate_page(struct address_space *mapping,
>  	 */
>  	if (page_has_private(page) &&
>  	    !try_to_release_page(page, GFP_KERNEL))
> -		return -EAGAIN;
> +		return mode == MIGRATE_SYNC ? -EAGAIN : -EBUSY;
>  
>  	return migrate_page(mapping, newpage, page, mode);
>  }
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 07/14] mm, compaction: Always finish scanning of a full pageblock
  2018-12-14 23:03 ` [PATCH 07/14] mm, compaction: Always finish scanning of a full pageblock Mel Gorman
@ 2018-12-18  9:23   ` Vlastimil Babka
  0 siblings, 0 replies; 33+ messages in thread
From: Vlastimil Babka @ 2018-12-18  9:23 UTC (permalink / raw)
  To: Mel Gorman, Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Linus Torvalds, Michal Hocko,
	ying.huang, kirill, Andrew Morton, Linux List Kernel Mailing

On 12/15/18 12:03 AM, Mel Gorman wrote:
> When compaction is finishing, it uses a flag to ensure the pageblock is
> complete.  However, in general it makes sense to always complete migration
> of a pageblock. Minimally, skip information is based on a pageblock and
> partially scanned pageblocks may incur more scanning in the future. The
> pageblock skip handling also becomes more strict later in the series and
> the hint is more useful if a complete pageblock was always scanned.
> 
> The impact here is potentially on latencies as more scanning is done
> but it's not a consistent win or loss as the scanning is not always a
> high percentage of the pageblock and sometimes it is offset by future
> reductions in scanning. Hence, the results are not presented this time as
> it's a mix of gains/losses without any clear pattern. However, completing
> scanning of the pageblock is important for later patches.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 06/14] mm, migrate: Immediately fail migration of a page with no migration handler
  2018-12-18  9:06   ` Vlastimil Babka
@ 2018-12-18  9:55     ` Mel Gorman
  0 siblings, 0 replies; 33+ messages in thread
From: Mel Gorman @ 2018-12-18  9:55 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Linux-MM, David Rientjes, Andrea Arcangeli, Linus Torvalds,
	Michal Hocko, ying.huang, kirill, Andrew Morton,
	Linux List Kernel Mailing

On Tue, Dec 18, 2018 at 10:06:31AM +0100, Vlastimil Babka wrote:
> On 12/15/18 12:03 AM, Mel Gorman wrote:
> > Pages with no migration handler use a fallback handler which sometimes
> > works and sometimes persistently fails such as blockdev pages. Migration
> > will retry a number of times on these persistent pages which is wasteful
> > during compaction. This patch will fail migration immediately unless the
> > caller is in MIGRATE_SYNC mode which indicates the caller is willing to
> > wait while being persistent.
> 
> Right.
> 
> > This is not expected to help THP allocation success rates but it does
> > reduce latencies slightly.
> > 
> > 1-socket thpfioscale
> >                                     4.20.0-rc6             4.20.0-rc6
> >                                noreserved-v1r4          failfast-v1r4
> > Amean     fault-both-1         0.00 (   0.00%)        0.00 *   0.00%*
> > Amean     fault-both-3      2276.15 (   0.00%)     3867.54 * -69.92%*
> 
> This is rather weird.
> 

Fault latency is extremely variable and there can be very large outliers
that skew the mean (the full report includes quartiles but it makes for an
excessive changelog). It can be down to luck about how often the migrate
scanner advances and how often it gets reset. For this series, it'll
not be unusual to see jitter in the latencies for individual patches
that will not get nailed down reliably until later in the series. The
alternative is massive patches that do multiple things which will look
nice in changelogs and be horrible to review.

> > Amean     fault-both-5      4992.20 (   0.00%)     5313.20 (  -6.43%)
> > Amean     fault-both-7      7373.30 (   0.00%)     7039.11 (   4.53%)
> > Amean     fault-both-12    11911.52 (   0.00%)    11328.29 (   4.90%)
> > Amean     fault-both-18    17209.42 (   0.00%)    16455.34 (   4.38%)
> > Amean     fault-both-24    20943.71 (   0.00%)    20448.94 (   2.36%)
> > Amean     fault-both-30    22703.00 (   0.00%)    21655.07 (   4.62%)
> > Amean     fault-both-32    22461.41 (   0.00%)    21415.35 (   4.66%)
> > 
> > The 2-socket results are not materially different. Scan rates are
> > similar as expected.
> > 
> > Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> 
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> 

Thanks.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 08/14] mm, compaction: Use the page allocator bulk-free helper for lists of pages
  2018-12-14 23:03 ` [PATCH 08/14] mm, compaction: Use the page allocator bulk-free helper for lists of pages Mel Gorman
@ 2018-12-18  9:55   ` Vlastimil Babka
  2018-12-19 16:04     ` Mel Gorman
  0 siblings, 1 reply; 33+ messages in thread
From: Vlastimil Babka @ 2018-12-18  9:55 UTC (permalink / raw)
  To: Mel Gorman, Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Linus Torvalds, Michal Hocko,
	ying.huang, kirill, Andrew Morton, Linux List Kernel Mailing

On 12/15/18 12:03 AM, Mel Gorman wrote:
> release_pages() is a simpler version of free_unref_page_list() but it
> tracks the highest PFN for caching the restart point of the compaction
> free scanner. This patch optionally tracks the highest PFN in the core
> helper and converts compaction to use it.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

Nit below:

> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2961,18 +2961,26 @@ void free_unref_page(struct page *page)
>  /*
>   * Free a list of 0-order pages
>   */
> -void free_unref_page_list(struct list_head *list)
> +void __free_page_list(struct list_head *list, bool dropref,
> +				unsigned long *highest_pfn)
>  {
>  	struct page *page, *next;
>  	unsigned long flags, pfn;
>  	int batch_count = 0;
>  
> +	if (highest_pfn)
> +		*highest_pfn = 0;
> +
>  	/* Prepare pages for freeing */
>  	list_for_each_entry_safe(page, next, list, lru) {
> +		if (dropref)
> +			WARN_ON_ONCE(!put_page_testzero(page));

That will warn just once, but then the page will remain with an elevated
count and free_unref_page_prepare() will warn either immediately or later,
depending on DEBUG_VM, for each page.
Also IIRC it's legal for basically anyone to do get_page_unless_zero()
and later put_page(), and this would now cause a warning. Maybe just test
for the put_page_testzero() result without warning, and continue? Hm but
then we should still do a list_del() and that becomes racy after
dropping our ref...
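
For reference, the speculative get/put pattern in question looks roughly
like the following standalone sketch (a plain atomic counter stands in
for the page refcount; illustrative only, not the kernel implementation):

/*
 * Standalone sketch of the speculative get/put pattern. A plain atomic
 * counter stands in for the struct page refcount.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_int refcount = 1;		/* page starts with one reference */

/* Take a reference only if the object is not already on its way to free. */
static bool get_unless_zero(void)
{
	int old = atomic_load(&refcount);

	while (old != 0) {
		if (atomic_compare_exchange_weak(&refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; the last one would free the object. */
static void put(void)
{
	if (atomic_fetch_sub(&refcount, 1) == 1)
		printf("last reference dropped, object freed\n");
}

int main(void)
{
	/*
	 * A third party takes a temporary reference and drops it later,
	 * so a freeing path cannot assume it holds the only reference.
	 */
	if (get_unless_zero())
		put();
	return 0;
}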

>  		pfn = page_to_pfn(page);
>  		if (!free_unref_page_prepare(page, pfn))
>  			list_del(&page->lru);
>  		set_page_private(page, pfn);
> +		if (highest_pfn && pfn > *highest_pfn)
> +			*highest_pfn = pfn;
>  	}
>  
>  	local_irq_save(flags);
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 09/14] mm, compaction: Ignore the fragmentation avoidance boost for isolation and compaction
  2018-12-14 23:03 ` [PATCH 09/14] mm, compaction: Ignore the fragmentation avoidance boost for isolation and compaction Mel Gorman
@ 2018-12-18 12:36   ` Vlastimil Babka
  2018-12-18 13:51     ` Mel Gorman
  0 siblings, 1 reply; 33+ messages in thread
From: Vlastimil Babka @ 2018-12-18 12:36 UTC (permalink / raw)
  To: Mel Gorman, Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Linus Torvalds, Michal Hocko,
	ying.huang, kirill, Andrew Morton, Linux List Kernel Mailing

On 12/15/18 12:03 AM, Mel Gorman wrote:
> When pageblocks get fragmented, watermarks are artificially boosted so that
> pages are reclaimed to avoid further fragmentation events. However, compaction
> is often either fragmentation-neutral or moving movable pages away from
> unmovable/reclaimable pages. As the actual watermarks are preserved,
> allow compaction to ignore the boost factor.

Right, I should have realized that when reviewing the boost patch. I
think it would be useful to do the same change in
__compaction_suitable() as well. Compaction has its own "gap".

> 1-socket thpscale
>                                     4.20.0-rc6             4.20.0-rc6
>                                finishscan-v1r4           noboost-v1r4
> Amean     fault-both-1         0.00 (   0.00%)        0.00 *   0.00%*
> Amean     fault-both-3      3849.90 (   0.00%)     3753.53 (   2.50%)
> Amean     fault-both-5      5054.13 (   0.00%)     5396.32 (  -6.77%)
> Amean     fault-both-7      7061.77 (   0.00%)     7393.46 (  -4.70%)
> Amean     fault-both-12    11560.59 (   0.00%)    12155.50 (  -5.15%)
> Amean     fault-both-18    16120.15 (   0.00%)    16445.96 (  -2.02%)
> Amean     fault-both-24    19804.31 (   0.00%)    20465.03 (  -3.34%)
> Amean     fault-both-30    25018.73 (   0.00%)    20813.54 *  16.81%*
> Amean     fault-both-32    24380.19 (   0.00%)    22384.02 (   8.19%)
> 
> The impact on the scan rates is a mixed bag because this patch is very
> sensitive to timing and whether the boost was active or not. However,
> detailed tracing indicated that failures of migration due to a premature
> ENOMEM triggered by watermark checks were eliminated.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> ---
>  mm/page_alloc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 80535cd55a92..c7b80e62bfd9 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3043,7 +3043,7 @@ int __isolate_free_page(struct page *page, unsigned int order)
>  		 * watermark, because we already know our high-order page
>  		 * exists.
>  		 */
> -		watermark = min_wmark_pages(zone) + (1UL << order);
> +		watermark = zone->_watermark[WMARK_MIN] + (1UL << order);
>  		if (!zone_watermark_ok(zone, 0, watermark, 0, ALLOC_CMA))
>  			return 0;
>  
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 09/14] mm, compaction: Ignore the fragmentation avoidance boost for isolation and compaction
  2018-12-18 12:36   ` Vlastimil Babka
@ 2018-12-18 13:51     ` Mel Gorman
  2018-12-18 13:58       ` Vlastimil Babka
  0 siblings, 1 reply; 33+ messages in thread
From: Mel Gorman @ 2018-12-18 13:51 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Linux-MM, David Rientjes, Andrea Arcangeli, Linus Torvalds,
	Michal Hocko, ying.huang, kirill, Andrew Morton,
	Linux List Kernel Mailing

On Tue, Dec 18, 2018 at 01:36:42PM +0100, Vlastimil Babka wrote:
> On 12/15/18 12:03 AM, Mel Gorman wrote:
> > When pageblocks get fragmented, watermarks are artificially boosted so that
> > pages are reclaimed to avoid further fragmentation events. However, compaction
> > is often either fragmentation-neutral or moving movable pages away from
> > unmovable/reclaimable pages. As the actual watermarks are preserved,
> > allow compaction to ignore the boost factor.
> 
> Right, I should have realized that when reviewing the boost patch. I
> think it would be useful to do the same change in
> __compaction_suitable() as well. Compaction has its own "gap".
> 

That gap is somewhat static though so I'm a bit more wary of it. However,
the check in __isolate_free_page looks too aggressive. We isolate in
units of COMPACT_CLUSTER_MAX yet the watermark check there is based on
the allocation request. That means for THP we check whether 512 pages
can be allocated when only somewhere between 1 and 32 are needed for that
compaction cycle to complete. Adjusting that might be more appropriate?
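
As an untested sketch of that adjustment (assuming COMPACT_CLUSTER_MAX,
which is SWAP_CLUSTER_MAX i.e. 32 pages, is reachable from page_alloc.c):

                /*
                 * Sketch: only require headroom for the pages this
                 * compaction cycle will actually isolate (at most
                 * COMPACT_CLUSTER_MAX), not the full 1UL << order of
                 * the allocation request.
                 */
                watermark = zone->_watermark[WMARK_MIN] + COMPACT_CLUSTER_MAX;
                if (!zone_watermark_ok(zone, 0, watermark, 0, ALLOC_CMA))
                        return 0;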

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 09/14] mm, compaction: Ignore the fragmentation avoidance boost for isolation and compaction
  2018-12-18 13:51     ` Mel Gorman
@ 2018-12-18 13:58       ` Vlastimil Babka
  2018-12-18 14:29         ` Mel Gorman
  0 siblings, 1 reply; 33+ messages in thread
From: Vlastimil Babka @ 2018-12-18 13:58 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, David Rientjes, Andrea Arcangeli, Linus Torvalds,
	Michal Hocko, ying.huang, kirill, Andrew Morton,
	Linux List Kernel Mailing

On 12/18/18 2:51 PM, Mel Gorman wrote:
> On Tue, Dec 18, 2018 at 01:36:42PM +0100, Vlastimil Babka wrote:
>> On 12/15/18 12:03 AM, Mel Gorman wrote:
>>> When pageblocks get fragmented, watermarks are artificially boosted so that
>>> pages are reclaimed to avoid further fragmentation events. However, compaction
>>> is often either fragmentation-neutral or moving movable pages away from
>>> unmovable/reclaimable pages. As the actual watermarks are preserved,
>>> allow compaction to ignore the boost factor.
>>
>> Right, I should have realized that when reviewing the boost patch. I
>> think it would be useful to do the same change in
>> __compaction_suitable() as well. Compaction has its own "gap".
>>
> 
> That gap is somewhat static though so I'm a bit more wary of it. However,

Well, the watermark boost is dynamic, but it is based on allocations
stealing from other migratetypes; it does not reflect compaction's
chances of success.

> the check in __isolate_free_page looks too aggressive. We isolate in
> units of COMPACT_CLUSTER_MAX yet the watermark check there is based on
> the allocation request. That means for THP we check whether 512 pages
> can be allocated when only somewhere between 1 and 32 are needed for that
> compaction cycle to complete. Adjusting that might be more appropriate?

AFAIU the code in __isolate_free_page() reflects that if there's less
than a 512-page gap of free pages above the watermark, we might form a
high-order page for THP but won't be able to allocate it afterwards due
to the watermark check.
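
Spelled out for a THP on x86-64 (order 9 with 4K base pages), the check in
__isolate_free_page() is effectively:

        /* must hold with 512 extra free pages before isolating anything */
        watermark = zone->_watermark[WMARK_MIN] + (1UL << 9);
        if (!zone_watermark_ok(zone, 0, watermark, 0, ALLOC_CMA))
                return 0;

i.e. the zone must already be able to cover the eventual order-9 allocation
on top of the min watermark before compaction isolates even one batch of
free pages.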

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 09/14] mm, compaction: Ignore the fragmentation avoidance boost for isolation and compaction
  2018-12-18 13:58       ` Vlastimil Babka
@ 2018-12-18 14:29         ` Mel Gorman
  0 siblings, 0 replies; 33+ messages in thread
From: Mel Gorman @ 2018-12-18 14:29 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Linux-MM, David Rientjes, Andrea Arcangeli, Linus Torvalds,
	Michal Hocko, ying.huang, kirill, Andrew Morton,
	Linux List Kernel Mailing

On Tue, Dec 18, 2018 at 02:58:33PM +0100, Vlastimil Babka wrote:
> On 12/18/18 2:51 PM, Mel Gorman wrote:
> > On Tue, Dec 18, 2018 at 01:36:42PM +0100, Vlastimil Babka wrote:
> >> On 12/15/18 12:03 AM, Mel Gorman wrote:
> >>> When pageblocks get fragmented, watermarks are artificially boosted so that
> >>> pages are reclaimed to avoid further fragmentation events. However, compaction
> >>> is often either fragmentation-neutral or moving movable pages away from
> >>> unmovable/reclaimable pages. As the actual watermarks are preserved,
> >>> allow compaction to ignore the boost factor.
> >>
> >> Right, I should have realized that when reviewing the boost patch. I
> >> think it would be useful to do the same change in
> >> __compaction_suitable() as well. Compaction has its own "gap".
> >>
> > 
> > That gap is somewhat static though so I'm a bit more wary of it. However,
> 
> Well, the watermark boost is dynamic, but it is based on allocations
> stealing from other migratetypes; it does not reflect compaction's
> chances of success.
> 

True.

> > the check in __isolate_free_page looks too aggressive. We isolate in
> > units of COMPACT_CLUSTER_MAX yet the watermark check there is based on
> > the allocation request. That means for THP we check whether 512 pages
> > can be allocated when only somewhere between 1 and 32 are needed for that
> > compaction cycle to complete. Adjusting that might be more appropriate?
> 
> AFAIU the code in __isolate_free_page() reflects that if there's less
> than a 512-page gap of free pages above the watermark, we might form a
> high-order page for THP but won't be able to allocate it afterwards due
> to the watermark check.

Yeah, but it used to be a lot more important when watermark checking for
high orders was very different. Now, if the order-0 watermark is met and a
large enough free page is available, the allocation succeeds, so it's a lot
less relevant than it used to be. kswapd will still run in the background
for order-0 if necessary, so a heavy watermark check there doesn't really
help.
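
For reference, the high-order side of the watermark check now works along
these lines (a condensed paraphrase of __zone_watermark_ok(), not the
literal code):

        /* discount the request size so the order-0 check accounts for it */
        free_pages -= (1UL << order) - 1;
        if (free_pages <= min + z->lowmem_reserve[classzone_idx])
                return false;

        /* order-0 requests are done at this point */
        if (!order)
                return true;

        /* high-order: pass if any free page of order >= order exists */
        for (o = order; o < MAX_ORDER; o++) {
                struct free_area *area = &z->free_area[o];
                int mt;

                if (!area->nr_free)
                        continue;
                for (mt = 0; mt < MIGRATE_PCPTYPES; mt++)
                        if (!list_empty(&area->free_list[mt]))
                                return true;
        }
        return false;

So once compaction has assembled a single suitably sized page and order-0
is healthy, there is no separate heavyweight high-order watermark left to
clear.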

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 08/14] mm, compaction: Use the page allocator bulk-free helper for lists of pages
  2018-12-18  9:55   ` Vlastimil Babka
@ 2018-12-19 16:04     ` Mel Gorman
  0 siblings, 0 replies; 33+ messages in thread
From: Mel Gorman @ 2018-12-19 16:04 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Linux-MM, David Rientjes, Andrea Arcangeli, Linus Torvalds,
	Michal Hocko, ying.huang, kirill, Andrew Morton,
	Linux List Kernel Mailing

On Tue, Dec 18, 2018 at 10:55:31AM +0100, Vlastimil Babka wrote:
> On 12/15/18 12:03 AM, Mel Gorman wrote:
> > release_freepages() is a simpler version of free_unref_page_list() but it
> > tracks the highest PFN for caching the restart point of the compaction
> > free scanner. This patch optionally tracks the highest PFN in the core
> > helper and converts compaction to use it.
> > 
> > Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> 
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> 
> Nit below:
> 
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -2961,18 +2961,26 @@ void free_unref_page(struct page *page)
> >  /*
> >   * Free a list of 0-order pages
> >   */
> > -void free_unref_page_list(struct list_head *list)
> > +void __free_page_list(struct list_head *list, bool dropref,
> > +				unsigned long *highest_pfn)
> >  {
> >  	struct page *page, *next;
> >  	unsigned long flags, pfn;
> >  	int batch_count = 0;
> >  
> > +	if (highest_pfn)
> > +		*highest_pfn = 0;
> > +
> >  	/* Prepare pages for freeing */
> >  	list_for_each_entry_safe(page, next, list, lru) {
> > +		if (dropref)
> > +			WARN_ON_ONCE(!put_page_testzero(page));
> 
> That will warn just once, but then the page will remain with an elevated
> count and free_unref_page_prepare() will warn either immediately or later,
> depending on DEBUG_VM, for each such page.
> Also IIRC it's legal for basically anyone to do get_page_unless_zero()
> and later put_page(), and this would now cause a warning. Maybe just test
> the put_page_testzero() result without warning, and continue? Hm, but
> then we should still do a list_del() and that becomes racy after
> dropping our ref...
> 

While there are cases where such a pattern is legal, this function
simply does not expect it and the callers do not violate the rule. If it
ever gets a new user that makes mistakes, they'll get the warning. Sure,
the page leaks but it'll be in a state where it's unsafe to do anything
else with it.
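
For reference, the legal-but-unexpected pattern being discussed is roughly
the following (illustrative only, not taken from any particular caller):

        /*
         * Someone else, e.g. a PFN-based scanner, takes a speculative
         * reference while the list owner believes it holds the last one:
         */
        if (get_page_unless_zero(page)) {
                /* ... inspect the page ... */
                put_page(page);         /* may now be the final put */
        }

If that raced with the WARN_ON_ONCE(!put_page_testzero(page)) above, the
warning would fire even though nobody misbehaved - which is the scenario
being pointed at, and one the current callers are expected not to trigger.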

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 06/14] mm, migrate: Immediately fail migration of a page with no migration handler
  2018-12-14 23:03 ` [PATCH 06/14] mm, migrate: Immediately fail migration of a page with no migration handler Mel Gorman
  2018-12-18  9:06   ` Vlastimil Babka
@ 2018-12-20 19:44   ` Yang Shi
  2018-12-20 20:31     ` Mel Gorman
  1 sibling, 1 reply; 33+ messages in thread
From: Yang Shi @ 2018-12-20 19:44 UTC (permalink / raw)
  To: mgorman
  Cc: Linux MM, David Rientjes, Andrea Arcangeli, torvalds,
	Michal Hocko, Huang Ying, Kirill A. Shutemov, Andrew Morton,
	Linux Kernel Mailing List

On Fri, Dec 14, 2018 at 3:03 PM Mel Gorman <mgorman@techsingularity.net> wrote:
>
> Pages with no migration handler use a fallback handler which sometimes
> works and sometimes fails persistently, as with blockdev pages. Migration

A minor correction: the above statement is no longer accurate since Jan
Kara's patch series (blkdev: avoid migration stalls for blkdev pages) has
blockdev use its own migration handler.

Thanks,
Yang

> will retry a number of times on these persistently failing pages, which is
> wasteful during compaction. This patch will fail migration immediately unless
> the caller is in MIGRATE_SYNC mode, which indicates the caller is willing to
> wait and retry persistently.
>
> This is not expected to help THP allocation success rates but it does
> reduce latencies slightly.
>
> 1-socket thpfioscale
>                                     4.20.0-rc6             4.20.0-rc6
>                                noreserved-v1r4          failfast-v1r4
> Amean     fault-both-1         0.00 (   0.00%)        0.00 *   0.00%*
> Amean     fault-both-3      2276.15 (   0.00%)     3867.54 * -69.92%*
> Amean     fault-both-5      4992.20 (   0.00%)     5313.20 (  -6.43%)
> Amean     fault-both-7      7373.30 (   0.00%)     7039.11 (   4.53%)
> Amean     fault-both-12    11911.52 (   0.00%)    11328.29 (   4.90%)
> Amean     fault-both-18    17209.42 (   0.00%)    16455.34 (   4.38%)
> Amean     fault-both-24    20943.71 (   0.00%)    20448.94 (   2.36%)
> Amean     fault-both-30    22703.00 (   0.00%)    21655.07 (   4.62%)
> Amean     fault-both-32    22461.41 (   0.00%)    21415.35 (   4.66%)
>
> The 2-socket results are not materially different. Scan rates are
> similar as expected.
>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> ---
>  mm/migrate.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index df17a710e2c7..0e27a10429e2 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -885,7 +885,7 @@ static int fallback_migrate_page(struct address_space *mapping,
>          */
>         if (page_has_private(page) &&
>             !try_to_release_page(page, GFP_KERNEL))
> -               return -EAGAIN;
> +               return mode == MIGRATE_SYNC ? -EAGAIN : -EBUSY;
>
>         return migrate_page(mapping, newpage, page, mode);
>  }
> --
> 2.16.4
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 06/14] mm, migrate: Immediately fail migration of a page with no migration handler
  2018-12-20 19:44   ` Yang Shi
@ 2018-12-20 20:31     ` Mel Gorman
  0 siblings, 0 replies; 33+ messages in thread
From: Mel Gorman @ 2018-12-20 20:31 UTC (permalink / raw)
  To: Yang Shi
  Cc: Linux MM, David Rientjes, Andrea Arcangeli, torvalds,
	Michal Hocko, Huang Ying, Kirill A. Shutemov, Andrew Morton,
	Linux Kernel Mailing List

On Thu, Dec 20, 2018 at 11:44:57AM -0800, Yang Shi wrote:
> On Fri, Dec 14, 2018 at 3:03 PM Mel Gorman <mgorman@techsingularity.net> wrote:
> >
> > Pages with no migration handler use a fallback handler which sometimes
> > works and sometimes fails persistently, as with blockdev pages. Migration
> 
> A minor correction: the above statement is no longer accurate since Jan
> Kara's patch series (blkdev: avoid migration stalls for blkdev pages) has
> blockdev use its own migration handler.
> 

I'm aware, given that I reviewed that series. The statement was correct
at the time of writing. I'll alter the example when the series is rebased
on top of Jan's work.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2018-12-20 20:31 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-14 23:02 [RFC PATCH 00/14] Increase success rates and reduce latency of compaction v1 Mel Gorman
2018-12-14 23:02 ` [PATCH 01/14] mm, compaction: Shrink compact_control Mel Gorman
2018-12-17 12:27   ` Vlastimil Babka
2018-12-14 23:02 ` [PATCH 02/14] mm, compaction: Rearrange compact_control Mel Gorman
2018-12-17 13:20   ` Vlastimil Babka
2018-12-14 23:02 ` [PATCH 03/14] mm, compaction: Remove last_migrated_pfn from compact_control Mel Gorman
2018-12-17 13:50   ` Vlastimil Babka
2018-12-14 23:03 ` [PATCH 04/14] mm, compaction: Rename map_pages to split_map_pages Mel Gorman
2018-12-17 14:06   ` Vlastimil Babka
2018-12-17 14:30     ` Mel Gorman
2018-12-14 23:03 ` [PATCH 05/14] mm, compaction: Skip pageblocks with reserved pages Mel Gorman
2018-12-18  8:08   ` Vlastimil Babka
2018-12-18  8:38     ` Mel Gorman
2018-12-14 23:03 ` [PATCH 06/14] mm, migrate: Immediately fail migration of a page with no migration handler Mel Gorman
2018-12-18  9:06   ` Vlastimil Babka
2018-12-18  9:55     ` Mel Gorman
2018-12-20 19:44   ` Yang Shi
2018-12-20 20:31     ` Mel Gorman
2018-12-14 23:03 ` [PATCH 07/14] mm, compaction: Always finish scanning of a full pageblock Mel Gorman
2018-12-18  9:23   ` Vlastimil Babka
2018-12-14 23:03 ` [PATCH 08/14] mm, compaction: Use the page allocator bulk-free helper for lists of pages Mel Gorman
2018-12-18  9:55   ` Vlastimil Babka
2018-12-19 16:04     ` Mel Gorman
2018-12-14 23:03 ` [PATCH 09/14] mm, compaction: Ignore the fragmentation avoidance boost for isolation and compaction Mel Gorman
2018-12-18 12:36   ` Vlastimil Babka
2018-12-18 13:51     ` Mel Gorman
2018-12-18 13:58       ` Vlastimil Babka
2018-12-18 14:29         ` Mel Gorman
2018-12-14 23:04 ` [PATCH 10/14] mm, compaction: Use free lists to quickly locate a migration source Mel Gorman
2018-12-14 23:05 ` [PATCH 11/14] mm, compaction: Keep migration source private to a single compaction instance Mel Gorman
2018-12-14 23:05 ` [PATCH 12/14] mm, compaction: Use free lists to quickly locate a migration target Mel Gorman
2018-12-14 23:05 ` [PATCH 13/14] mm, compaction: Capture a page under direct compaction Mel Gorman
2018-12-14 23:06 ` [PATCH 14/14] mm, compaction: Do not direct compact remote memory Mel Gorman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).