* [PATCH 00/25] Increase success rates and reduce latency of compaction v2
@ 2019-01-04 12:49 Mel Gorman
  2019-01-04 12:49 ` [PATCH 01/25] mm, compaction: Shrink compact_control Mel Gorman
                   ` (28 more replies)
  0 siblings, 29 replies; 75+ messages in thread
From: Mel Gorman @ 2019-01-04 12:49 UTC (permalink / raw)
  To: Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Vlastimil Babka, ying.huang,
	kirill, Andrew Morton, Linux List Kernel Mailing, Mel Gorman

This series reduces scan rates and increases success rates of compaction,
primarily by using the free lists to shorten scans, better controlling skip
information and whether multiple scanners can target the same block, and
capturing pageblocks before they are stolen by parallel requests. The series
is based on the 4.21/5.0 merge window after Andrew's tree had been merged.
It's known to rebase cleanly.

Primarily I'm using thpscale to measure the impact of the series. The
benchmark creates a large file, maps it, faults it, punches holes in the
mapping so that the virtual address space is fragmented and then tries
to allocate THP. It re-executes for different numbers of threads. From a
fragmentation perspective, the workload is relatively benign but it does
stress compaction.
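
For readers unfamiliar with the benchmark, the following is a minimal
standalone sketch of that access pattern. It is not the actual thpscale
source from mmtests; the file name, sizes and hole layout are purely
illustrative:

#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define FILE_SZ	(1UL << 30)	/* file mapping used to fragment memory */
#define HOLE_SZ	(64UL << 10)	/* punch every second 64K extent */
#define THP_SZ	(256UL << 20)	/* anonymous region faulted as THP */

int main(void)
{
	char *map, *anon;
	unsigned long off;
	int fd = open("thpscale.dat", O_CREAT | O_RDWR | O_TRUNC, 0600);

	if (fd < 0 || ftruncate(fd, FILE_SZ))
		return 1;

	/* Create and fault a large file-backed mapping */
	map = mmap(NULL, FILE_SZ, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		return 1;
	memset(map, 1, FILE_SZ);

	/* Punch alternating holes so the backing memory is fragmented */
	for (off = 0; off < FILE_SZ; off += 2 * HOLE_SZ)
		fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			  off, HOLE_SZ);

	/*
	 * Fault an anonymous region hinted for THP; these are the faults
	 * whose latency and success rate are reported above.
	 */
	anon = mmap(NULL, THP_SZ, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (anon == MAP_FAILED)
		return 1;
	madvise(anon, THP_SZ, MADV_HUGEPAGE);
	memset(anon, 1, THP_SZ);

	munmap(anon, THP_SZ);
	munmap(map, FILE_SZ);
	close(fd);
	unlink("thpscale.dat");
	return 0;
}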

The overall impact on latencies for a 1-socket machine is

				      baseline		      patches
Amean     fault-both-3      5362.80 (   0.00%)     4446.89 *  17.08%*
Amean     fault-both-5      9488.75 (   0.00%)     5660.86 *  40.34%*
Amean     fault-both-7     11909.86 (   0.00%)     8549.63 *  28.21%*
Amean     fault-both-12    16185.09 (   0.00%)    11508.36 *  28.90%*
Amean     fault-both-18    12057.72 (   0.00%)    19013.48 * -57.69%*
Amean     fault-both-24    23939.95 (   0.00%)    19676.16 *  17.81%*
Amean     fault-both-30    26606.14 (   0.00%)    27363.23 (  -2.85%)
Amean     fault-both-32    31677.12 (   0.00%)    23154.09 *  26.91%*

While there is a glitch at the 18-thread mark, it's known that the base
page allocation latency was much lower and huge pages were taking
longer -- partially due to a high allocation success rate.

The allocation success rates are much improved

			 	 baseline		 patches
Percentage huge-3        70.93 (   0.00%)       98.30 (  38.60%)
Percentage huge-5        56.02 (   0.00%)       83.36 (  48.81%)
Percentage huge-7        60.98 (   0.00%)       89.04 (  46.01%)
Percentage huge-12       73.02 (   0.00%)       94.36 (  29.23%)
Percentage huge-18       94.37 (   0.00%)       95.87 (   1.58%)
Percentage huge-24       84.95 (   0.00%)       97.41 (  14.67%)
Percentage huge-30       83.63 (   0.00%)       96.69 (  15.62%)
Percentage huge-32       81.69 (   0.00%)       96.10 (  17.65%)

That is close to a perfect allocation success rate at most thread counts.

The biggest impact is on the scan rates

Compaction migrate scanned   106520811    26934599
Compaction free scanned     4180735040    26584944

The number of pages scanned for migration was reduced by 74% and the
number scanned by the free scanner was reduced by 99.36%. That is much
less work in exchange for lower latency and better success rates.

The series was also evaluated using a workload that heavily fragments
memory; the benefits there are also significant, albeit not presented here.

It was commented that we should be rethinking scanning entirely, and to a
large extent I agree. However, to achieve that you need a lot of this
series in place first, so it's best to make the linear scanners as good
as possible before ripping them out.

 include/linux/compaction.h |    3 +-
 include/linux/gfp.h        |    7 +-
 include/linux/mmzone.h     |    2 +
 include/linux/sched.h      |    4 +
 kernel/sched/core.c        |    3 +
 mm/compaction.c            | 1031 ++++++++++++++++++++++++++++++++++----------
 mm/internal.h              |   23 +-
 mm/migrate.c               |    2 +-
 mm/page_alloc.c            |   70 ++-
 9 files changed, 908 insertions(+), 237 deletions(-)

-- 
2.16.4


* [PATCH 01/25] mm, compaction: Shrink compact_control
  2019-01-04 12:49 [PATCH 00/25] Increase success rates and reduce latency of compaction v2 Mel Gorman
@ 2019-01-04 12:49 ` Mel Gorman
  2019-01-04 12:49 ` [PATCH 02/25] mm, compaction: Rearrange compact_control Mel Gorman
                   ` (27 subsequent siblings)
  28 siblings, 0 replies; 75+ messages in thread
From: Mel Gorman @ 2019-01-04 12:49 UTC (permalink / raw)
  To: Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Vlastimil Babka, ying.huang,
	kirill, Andrew Morton, Linux List Kernel Mailing, Mel Gorman

The isolate and migrate scanners should never isolate more than a pageblock
of pages, so unsigned int is sufficient. Shrinking the two counters from 8
bytes to 4 bytes each saves 8 bytes in compact_control on a 64-bit build.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/internal.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index f4a7bb02decf..5ddf5d3771a0 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -184,8 +184,8 @@ struct compact_control {
 	struct list_head freepages;	/* List of free pages to migrate to */
 	struct list_head migratepages;	/* List of pages being migrated */
 	struct zone *zone;
-	unsigned long nr_freepages;	/* Number of isolated free pages */
-	unsigned long nr_migratepages;	/* Number of pages to migrate */
+	unsigned int nr_freepages;	/* Number of isolated free pages */
+	unsigned int nr_migratepages;	/* Number of pages to migrate */
 	unsigned long total_migrate_scanned;
 	unsigned long total_free_scanned;
 	unsigned long free_pfn;		/* isolate_freepages search base */
-- 
2.16.4


* [PATCH 02/25] mm, compaction: Rearrange compact_control
  2019-01-04 12:49 [PATCH 00/25] Increase success rates and reduce latency of compaction v2 Mel Gorman
  2019-01-04 12:49 ` [PATCH 01/25] mm, compaction: Shrink compact_control Mel Gorman
@ 2019-01-04 12:49 ` Mel Gorman
  2019-01-04 12:49 ` [PATCH 03/25] mm, compaction: Remove last_migrated_pfn from compact_control Mel Gorman
                   ` (26 subsequent siblings)
  28 siblings, 0 replies; 75+ messages in thread
From: Mel Gorman @ 2019-01-04 12:49 UTC (permalink / raw)
  To: Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Vlastimil Babka, ying.huang,
	kirill, Andrew Morton, Linux List Kernel Mailing, Mel Gorman

compact_control spans two cache lines with write-intensive fields on
both. Rearrange it so that the most write-intensive fields share a cache
line. This has a negligible impact on the overall performance of
compaction and is more a tidying exercise than anything else.
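
As a rough sanity check of the new layout (assuming a 64-bit build,
64-byte cache lines and that the structure happens to start on a cache
line boundary), the write-intensive scanner state now adds up to exactly
one cache line:

	/*
	 * 2 x struct list_head  (freepages, migratepages)       = 32 bytes
	 * 2 x unsigned int      (nr_freepages, nr_migratepages) =  8 bytes
	 * 3 x unsigned long     (free_pfn, migrate_pfn,
	 *                        last_migrated_pfn)              = 24 bytes
	 *                                                  total   64 bytes
	 */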

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/internal.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 5ddf5d3771a0..9437ba5791db 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -183,14 +183,14 @@ extern int user_min_free_kbytes;
 struct compact_control {
 	struct list_head freepages;	/* List of free pages to migrate to */
 	struct list_head migratepages;	/* List of pages being migrated */
-	struct zone *zone;
 	unsigned int nr_freepages;	/* Number of isolated free pages */
 	unsigned int nr_migratepages;	/* Number of pages to migrate */
-	unsigned long total_migrate_scanned;
-	unsigned long total_free_scanned;
 	unsigned long free_pfn;		/* isolate_freepages search base */
 	unsigned long migrate_pfn;	/* isolate_migratepages search base */
 	unsigned long last_migrated_pfn;/* Not yet flushed page being freed */
+	struct zone *zone;
+	unsigned long total_migrate_scanned;
+	unsigned long total_free_scanned;
 	const gfp_t gfp_mask;		/* gfp mask of a direct compactor */
 	int order;			/* order a direct compactor needs */
 	int migratetype;		/* migratetype of direct compactor */
-- 
2.16.4


* [PATCH 03/25] mm, compaction: Remove last_migrated_pfn from compact_control
  2019-01-04 12:49 [PATCH 00/25] Increase success rates and reduce latency of compaction v2 Mel Gorman
  2019-01-04 12:49 ` [PATCH 01/25] mm, compaction: Shrink compact_control Mel Gorman
  2019-01-04 12:49 ` [PATCH 02/25] mm, compaction: Rearrange compact_control Mel Gorman
@ 2019-01-04 12:49 ` Mel Gorman
  2019-01-04 12:49 ` [PATCH 04/25] mm, compaction: Remove unnecessary zone parameter in some instances Mel Gorman
                   ` (25 subsequent siblings)
  28 siblings, 0 replies; 75+ messages in thread
From: Mel Gorman @ 2019-01-04 12:49 UTC (permalink / raw)
  To: Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Vlastimil Babka, ying.huang,
	kirill, Andrew Morton, Linux List Kernel Mailing, Mel Gorman

It is dubious whether the last_migrated_pfn field really helps but, either
way, the information it carries can be tracked in a local variable without
increasing the size of compact_control, so remove the field.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/compaction.c | 25 +++++++++----------------
 mm/internal.h   |  1 -
 2 files changed, 9 insertions(+), 17 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index ef29490b0f46..fb4d9f52ed56 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -886,15 +886,6 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 		cc->nr_migratepages++;
 		nr_isolated++;
 
-		/*
-		 * Record where we could have freed pages by migration and not
-		 * yet flushed them to buddy allocator.
-		 * - this is the lowest page that was isolated and likely be
-		 * then freed by migration.
-		 */
-		if (!cc->last_migrated_pfn)
-			cc->last_migrated_pfn = low_pfn;
-
 		/* Avoid isolating too much */
 		if (cc->nr_migratepages == COMPACT_CLUSTER_MAX) {
 			++low_pfn;
@@ -918,7 +909,6 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 			}
 			putback_movable_pages(&cc->migratepages);
 			cc->nr_migratepages = 0;
-			cc->last_migrated_pfn = 0;
 			nr_isolated = 0;
 		}
 
@@ -1539,6 +1529,7 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 	enum compact_result ret;
 	unsigned long start_pfn = zone->zone_start_pfn;
 	unsigned long end_pfn = zone_end_pfn(zone);
+	unsigned long last_migrated_pfn;
 	const bool sync = cc->mode != MIGRATE_ASYNC;
 
 	cc->migratetype = gfpflags_to_migratetype(cc->gfp_mask);
@@ -1584,7 +1575,7 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 			cc->whole_zone = true;
 	}
 
-	cc->last_migrated_pfn = 0;
+	last_migrated_pfn = 0;
 
 	trace_mm_compaction_begin(start_pfn, cc->migrate_pfn,
 				cc->free_pfn, end_pfn, sync);
@@ -1593,12 +1584,14 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 
 	while ((ret = compact_finished(zone, cc)) == COMPACT_CONTINUE) {
 		int err;
+		unsigned long start_pfn = cc->migrate_pfn;
 
 		switch (isolate_migratepages(zone, cc)) {
 		case ISOLATE_ABORT:
 			ret = COMPACT_CONTENDED;
 			putback_movable_pages(&cc->migratepages);
 			cc->nr_migratepages = 0;
+			last_migrated_pfn = 0;
 			goto out;
 		case ISOLATE_NONE:
 			/*
@@ -1608,6 +1601,7 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 			 */
 			goto check_drain;
 		case ISOLATE_SUCCESS:
+			last_migrated_pfn = start_pfn;
 			;
 		}
 
@@ -1639,8 +1633,7 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 				cc->migrate_pfn = block_end_pfn(
 						cc->migrate_pfn - 1, cc->order);
 				/* Draining pcplists is useless in this case */
-				cc->last_migrated_pfn = 0;
-
+				last_migrated_pfn = 0;
 			}
 		}
 
@@ -1652,18 +1645,18 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 		 * compact_finished() can detect immediately if allocation
 		 * would succeed.
 		 */
-		if (cc->order > 0 && cc->last_migrated_pfn) {
+		if (cc->order > 0 && last_migrated_pfn) {
 			int cpu;
 			unsigned long current_block_start =
 				block_start_pfn(cc->migrate_pfn, cc->order);
 
-			if (cc->last_migrated_pfn < current_block_start) {
+			if (last_migrated_pfn < current_block_start) {
 				cpu = get_cpu();
 				lru_add_drain_cpu(cpu);
 				drain_local_pages(zone);
 				put_cpu();
 				/* No more flushing until we migrate again */
-				cc->last_migrated_pfn = 0;
+				last_migrated_pfn = 0;
 			}
 		}
 
diff --git a/mm/internal.h b/mm/internal.h
index 9437ba5791db..c6f794ad21a9 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -187,7 +187,6 @@ struct compact_control {
 	unsigned int nr_migratepages;	/* Number of pages to migrate */
 	unsigned long free_pfn;		/* isolate_freepages search base */
 	unsigned long migrate_pfn;	/* isolate_migratepages search base */
-	unsigned long last_migrated_pfn;/* Not yet flushed page being freed */
 	struct zone *zone;
 	unsigned long total_migrate_scanned;
 	unsigned long total_free_scanned;
-- 
2.16.4


* [PATCH 04/25] mm, compaction: Remove unnecessary zone parameter in some instances
  2019-01-04 12:49 [PATCH 00/25] Increase success rates and reduce latency of compaction v2 Mel Gorman
                   ` (2 preceding siblings ...)
  2019-01-04 12:49 ` [PATCH 03/25] mm, compaction: Remove last_migrated_pfn from compact_control Mel Gorman
@ 2019-01-04 12:49 ` Mel Gorman
  2019-01-15 11:43   ` Vlastimil Babka
  2019-01-04 12:49 ` [PATCH 05/25] mm, compaction: Rename map_pages to split_map_pages Mel Gorman
                   ` (24 subsequent siblings)
  28 siblings, 1 reply; 75+ messages in thread
From: Mel Gorman @ 2019-01-04 12:49 UTC (permalink / raw)
  To: Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Vlastimil Babka, ying.huang,
	kirill, Andrew Morton, Linux List Kernel Mailing, Mel Gorman

A zone parameter is passed into a number of top-level compaction functions
despite the fact that it's already available in compact_control. This is
harmless but it did need an audit to check whether the zone ever actually
changes meaningfully. This patch removes the parameter from a number of
top-level functions. The change could go much deeper but this was enough
to clarify the flow.

No functional change.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/compaction.c | 54 ++++++++++++++++++++++++++----------------------------
 1 file changed, 26 insertions(+), 28 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index fb4d9f52ed56..7acb43f07303 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1300,8 +1300,7 @@ static inline bool is_via_compact_memory(int order)
 	return order == -1;
 }
 
-static enum compact_result __compact_finished(struct zone *zone,
-						struct compact_control *cc)
+static enum compact_result __compact_finished(struct compact_control *cc)
 {
 	unsigned int order;
 	const int migratetype = cc->migratetype;
@@ -1312,7 +1311,7 @@ static enum compact_result __compact_finished(struct zone *zone,
 	/* Compaction run completes if the migrate and free scanner meet */
 	if (compact_scanners_met(cc)) {
 		/* Let the next compaction start anew. */
-		reset_cached_positions(zone);
+		reset_cached_positions(cc->zone);
 
 		/*
 		 * Mark that the PG_migrate_skip information should be cleared
@@ -1321,7 +1320,7 @@ static enum compact_result __compact_finished(struct zone *zone,
 		 * based on an allocation request.
 		 */
 		if (cc->direct_compaction)
-			zone->compact_blockskip_flush = true;
+			cc->zone->compact_blockskip_flush = true;
 
 		if (cc->whole_zone)
 			return COMPACT_COMPLETE;
@@ -1345,7 +1344,7 @@ static enum compact_result __compact_finished(struct zone *zone,
 
 	/* Direct compactor: Is a suitable page free? */
 	for (order = cc->order; order < MAX_ORDER; order++) {
-		struct free_area *area = &zone->free_area[order];
+		struct free_area *area = &cc->zone->free_area[order];
 		bool can_steal;
 
 		/* Job done if page is free of the right migratetype */
@@ -1391,13 +1390,12 @@ static enum compact_result __compact_finished(struct zone *zone,
 	return COMPACT_NO_SUITABLE_PAGE;
 }
 
-static enum compact_result compact_finished(struct zone *zone,
-			struct compact_control *cc)
+static enum compact_result compact_finished(struct compact_control *cc)
 {
 	int ret;
 
-	ret = __compact_finished(zone, cc);
-	trace_mm_compaction_finished(zone, cc->order, ret);
+	ret = __compact_finished(cc);
+	trace_mm_compaction_finished(cc->zone, cc->order, ret);
 	if (ret == COMPACT_NO_SUITABLE_PAGE)
 		ret = COMPACT_CONTINUE;
 
@@ -1524,16 +1522,16 @@ bool compaction_zonelist_suitable(struct alloc_context *ac, int order,
 	return false;
 }
 
-static enum compact_result compact_zone(struct zone *zone, struct compact_control *cc)
+static enum compact_result compact_zone(struct compact_control *cc)
 {
 	enum compact_result ret;
-	unsigned long start_pfn = zone->zone_start_pfn;
-	unsigned long end_pfn = zone_end_pfn(zone);
+	unsigned long start_pfn = cc->zone->zone_start_pfn;
+	unsigned long end_pfn = zone_end_pfn(cc->zone);
 	unsigned long last_migrated_pfn;
 	const bool sync = cc->mode != MIGRATE_ASYNC;
 
 	cc->migratetype = gfpflags_to_migratetype(cc->gfp_mask);
-	ret = compaction_suitable(zone, cc->order, cc->alloc_flags,
+	ret = compaction_suitable(cc->zone, cc->order, cc->alloc_flags,
 							cc->classzone_idx);
 	/* Compaction is likely to fail */
 	if (ret == COMPACT_SUCCESS || ret == COMPACT_SKIPPED)
@@ -1546,8 +1544,8 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 	 * Clear pageblock skip if there were failures recently and compaction
 	 * is about to be retried after being deferred.
 	 */
-	if (compaction_restarting(zone, cc->order))
-		__reset_isolation_suitable(zone);
+	if (compaction_restarting(cc->zone, cc->order))
+		__reset_isolation_suitable(cc->zone);
 
 	/*
 	 * Setup to move all movable pages to the end of the zone. Used cached
@@ -1559,16 +1557,16 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 		cc->migrate_pfn = start_pfn;
 		cc->free_pfn = pageblock_start_pfn(end_pfn - 1);
 	} else {
-		cc->migrate_pfn = zone->compact_cached_migrate_pfn[sync];
-		cc->free_pfn = zone->compact_cached_free_pfn;
+		cc->migrate_pfn = cc->zone->compact_cached_migrate_pfn[sync];
+		cc->free_pfn = cc->zone->compact_cached_free_pfn;
 		if (cc->free_pfn < start_pfn || cc->free_pfn >= end_pfn) {
 			cc->free_pfn = pageblock_start_pfn(end_pfn - 1);
-			zone->compact_cached_free_pfn = cc->free_pfn;
+			cc->zone->compact_cached_free_pfn = cc->free_pfn;
 		}
 		if (cc->migrate_pfn < start_pfn || cc->migrate_pfn >= end_pfn) {
 			cc->migrate_pfn = start_pfn;
-			zone->compact_cached_migrate_pfn[0] = cc->migrate_pfn;
-			zone->compact_cached_migrate_pfn[1] = cc->migrate_pfn;
+			cc->zone->compact_cached_migrate_pfn[0] = cc->migrate_pfn;
+			cc->zone->compact_cached_migrate_pfn[1] = cc->migrate_pfn;
 		}
 
 		if (cc->migrate_pfn == start_pfn)
@@ -1582,11 +1580,11 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 
 	migrate_prep_local();
 
-	while ((ret = compact_finished(zone, cc)) == COMPACT_CONTINUE) {
+	while ((ret = compact_finished(cc)) == COMPACT_CONTINUE) {
 		int err;
 		unsigned long start_pfn = cc->migrate_pfn;
 
-		switch (isolate_migratepages(zone, cc)) {
+		switch (isolate_migratepages(cc->zone, cc)) {
 		case ISOLATE_ABORT:
 			ret = COMPACT_CONTENDED;
 			putback_movable_pages(&cc->migratepages);
@@ -1653,7 +1651,7 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 			if (last_migrated_pfn < current_block_start) {
 				cpu = get_cpu();
 				lru_add_drain_cpu(cpu);
-				drain_local_pages(zone);
+				drain_local_pages(cc->zone);
 				put_cpu();
 				/* No more flushing until we migrate again */
 				last_migrated_pfn = 0;
@@ -1678,8 +1676,8 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 		 * Only go back, not forward. The cached pfn might have been
 		 * already reset to zone end in compact_finished()
 		 */
-		if (free_pfn > zone->compact_cached_free_pfn)
-			zone->compact_cached_free_pfn = free_pfn;
+		if (free_pfn > cc->zone->compact_cached_free_pfn)
+			cc->zone->compact_cached_free_pfn = free_pfn;
 	}
 
 	count_compact_events(COMPACTMIGRATE_SCANNED, cc->total_migrate_scanned);
@@ -1716,7 +1714,7 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
 	INIT_LIST_HEAD(&cc.freepages);
 	INIT_LIST_HEAD(&cc.migratepages);
 
-	ret = compact_zone(zone, &cc);
+	ret = compact_zone(&cc);
 
 	VM_BUG_ON(!list_empty(&cc.freepages));
 	VM_BUG_ON(!list_empty(&cc.migratepages));
@@ -1834,7 +1832,7 @@ static void compact_node(int nid)
 		INIT_LIST_HEAD(&cc.freepages);
 		INIT_LIST_HEAD(&cc.migratepages);
 
-		compact_zone(zone, &cc);
+		compact_zone(&cc);
 
 		VM_BUG_ON(!list_empty(&cc.freepages));
 		VM_BUG_ON(!list_empty(&cc.migratepages));
@@ -1976,7 +1974,7 @@ static void kcompactd_do_work(pg_data_t *pgdat)
 
 		if (kthread_should_stop())
 			return;
-		status = compact_zone(zone, &cc);
+		status = compact_zone(&cc);
 
 		if (status == COMPACT_SUCCESS) {
 			compaction_defer_reset(zone, cc.order, false);
-- 
2.16.4


* [PATCH 05/25] mm, compaction: Rename map_pages to split_map_pages
  2019-01-04 12:49 [PATCH 00/25] Increase success rates and reduce latency of compaction v2 Mel Gorman
                   ` (3 preceding siblings ...)
  2019-01-04 12:49 ` [PATCH 04/25] mm, compaction: Remove unnecessary zone parameter in some instances Mel Gorman
@ 2019-01-04 12:49 ` Mel Gorman
  2019-01-15 11:59   ` Vlastimil Babka
  2019-01-04 12:49 ` [PATCH 06/25] mm, compaction: Skip pageblocks with reserved pages Mel Gorman
                   ` (23 subsequent siblings)
  28 siblings, 1 reply; 75+ messages in thread
From: Mel Gorman @ 2019-01-04 12:49 UTC (permalink / raw)
  To: Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Vlastimil Babka, ying.huang,
	kirill, Andrew Morton, Linux List Kernel Mailing, Mel Gorman

It's not obvious from the function name that high-order free pages are
split into order-0 pages. Rename map_pages() to split_map_pages() to make
this clear.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/compaction.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 7acb43f07303..3afa4e9188b6 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -66,7 +66,7 @@ static unsigned long release_freepages(struct list_head *freelist)
 	return high_pfn;
 }
 
-static void map_pages(struct list_head *list)
+static void split_map_pages(struct list_head *list)
 {
 	unsigned int i, order, nr_pages;
 	struct page *page, *next;
@@ -644,7 +644,7 @@ isolate_freepages_range(struct compact_control *cc,
 	}
 
 	/* __isolate_free_page() does not map the pages */
-	map_pages(&freelist);
+	split_map_pages(&freelist);
 
 	if (pfn < end_pfn) {
 		/* Loop terminated early, cleanup. */
@@ -1141,7 +1141,7 @@ static void isolate_freepages(struct compact_control *cc)
 	}
 
 	/* __isolate_free_page() does not map the pages */
-	map_pages(freelist);
+	split_map_pages(freelist);
 
 	/*
 	 * Record where the free scanner will restart next time. Either we
-- 
2.16.4


* [PATCH 06/25] mm, compaction: Skip pageblocks with reserved pages
  2019-01-04 12:49 [PATCH 00/25] Increase success rates and reduce latency of compaction v2 Mel Gorman
                   ` (4 preceding siblings ...)
  2019-01-04 12:49 ` [PATCH 05/25] mm, compaction: Rename map_pages to split_map_pages Mel Gorman
@ 2019-01-04 12:49 ` Mel Gorman
  2019-01-15 12:10   ` Vlastimil Babka
  2019-01-04 12:49 ` [PATCH 07/25] mm, migrate: Immediately fail migration of a page with no migration handler Mel Gorman
                   ` (22 subsequent siblings)
  28 siblings, 1 reply; 75+ messages in thread
From: Mel Gorman @ 2019-01-04 12:49 UTC (permalink / raw)
  To: Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Vlastimil Babka, ying.huang,
	kirill, Andrew Morton, Linux List Kernel Mailing, Mel Gorman

Reserved pages are set at boot time, tend to be clustered and almost never
become unreserved. When isolating pages for either migration sources or
targets, skip the entire pageblock if one PageReserved page is encountered
on the grounds that it is highly probable the entire pageblock is reserved.

The performance impact depends on the number of reserved pages in the
system and their location so it will be variable, but intuitively the
heuristic makes sense. If the memblock allocator were ever changed to
spread reserved pages throughout the address space then this patch would
be impaired, but such a change would also be considered a bug given that
it would badly worsen fragmentation.

On both 1-socket and 2-socket machines, scan rates are reduced slightly
on workloads that intensively allocate THP while the system is fragmented.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/compaction.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/mm/compaction.c b/mm/compaction.c
index 3afa4e9188b6..94d1e5b062ea 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -484,6 +484,15 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
 			goto isolate_fail;
 		}
 
+		/*
+		 * A reserved page is never freed and tend to be clustered in
+		 * the same pageblock. Skip the block.
+		 */
+		if (PageReserved(page)) {
+			blockpfn = end_pfn;
+			break;
+		}
+
 		if (!PageBuddy(page))
 			goto isolate_fail;
 
@@ -827,6 +836,13 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 					goto isolate_success;
 			}
 
+			/*
+			 * A reserved page is never freed and tend to be
+			 * clustered in the same pageblocks. Skip the block.
+			 */
+			if (PageReserved(page))
+				low_pfn = end_pfn;
+
 			goto isolate_fail;
 		}
 
-- 
2.16.4


* [PATCH 07/25] mm, migrate: Immediately fail migration of a page with no migration handler
  2019-01-04 12:49 [PATCH 00/25] Increase success rates and reduce latency of compaction v2 Mel Gorman
                   ` (5 preceding siblings ...)
  2019-01-04 12:49 ` [PATCH 06/25] mm, compaction: Skip pageblocks with reserved pages Mel Gorman
@ 2019-01-04 12:49 ` Mel Gorman
  2019-01-04 12:49 ` [PATCH 08/25] mm, compaction: Always finish scanning of a full pageblock Mel Gorman
                   ` (21 subsequent siblings)
  28 siblings, 0 replies; 75+ messages in thread
From: Mel Gorman @ 2019-01-04 12:49 UTC (permalink / raw)
  To: Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Vlastimil Babka, ying.huang,
	kirill, Andrew Morton, Linux List Kernel Mailing, Mel Gorman

Pages with no migration handler use a fallback handler which sometimes
works and sometimes persistently retries. A historical example was blockdev
pages but there are others, such as odd refcounting when page->private
is used. These are retried multiple times, which is wasteful during
compaction, so this patch fails the migration faster unless the caller
specifies MIGRATE_SYNC.

This is not expected to help THP allocation success rates but it did
reduce latencies very slightly in some cases.

1-socket thpfioscale
                                        4.20.0                 4.20.0
                              noreserved-v2r15         failfast-v2r15
Amean     fault-both-1         0.00 (   0.00%)        0.00 *   0.00%*
Amean     fault-both-3      3839.67 (   0.00%)     3833.72 (   0.15%)
Amean     fault-both-5      5177.47 (   0.00%)     4967.15 (   4.06%)
Amean     fault-both-7      7245.03 (   0.00%)     7139.19 (   1.46%)
Amean     fault-both-12    11534.89 (   0.00%)    11326.30 (   1.81%)
Amean     fault-both-18    16241.10 (   0.00%)    16270.70 (  -0.18%)
Amean     fault-both-24    19075.91 (   0.00%)    19839.65 (  -4.00%)
Amean     fault-both-30    22712.11 (   0.00%)    21707.05 (   4.43%)
Amean     fault-both-32    21692.92 (   0.00%)    21968.16 (  -1.27%)

The 2-socket results are not materially different. Scan rates are similar,
as expected.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/migrate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 5d1839a9148d..547cc1f3f3bb 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -916,7 +916,7 @@ static int fallback_migrate_page(struct address_space *mapping,
 	 */
 	if (page_has_private(page) &&
 	    !try_to_release_page(page, GFP_KERNEL))
-		return -EAGAIN;
+		return mode == MIGRATE_SYNC ? -EAGAIN : -EBUSY;
 
 	return migrate_page(mapping, newpage, page, mode);
 }
-- 
2.16.4


* [PATCH 08/25] mm, compaction: Always finish scanning of a full pageblock
  2019-01-04 12:49 [PATCH 00/25] Increase success rates and reduce latency of compaction v2 Mel Gorman
                   ` (6 preceding siblings ...)
  2019-01-04 12:49 ` [PATCH 07/25] mm, migrate: Immediately fail migration of a page with no migration handler Mel Gorman
@ 2019-01-04 12:49 ` Mel Gorman
  2019-01-04 12:49 ` [PATCH 09/25] mm, compaction: Use the page allocator bulk-free helper for lists of pages Mel Gorman
                   ` (20 subsequent siblings)
  28 siblings, 0 replies; 75+ messages in thread
From: Mel Gorman @ 2019-01-04 12:49 UTC (permalink / raw)
  To: Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Vlastimil Babka, ying.huang,
	kirill, Andrew Morton, Linux List Kernel Mailing, Mel Gorman

When compaction is finishing, it uses a flag to ensure the pageblock is
completely scanned, but it makes sense to always complete migration of a
pageblock. Minimally, skip information is tracked per pageblock and
partially scanned pageblocks may incur more scanning in the future. The
pageblock skip handling also becomes stricter later in the series and
the hint is more useful if a complete pageblock is always scanned.

This potentially impacts latency as more scanning is done, but it's not a
consistent win or loss because the extra scanning is not always a high
percentage of the pageblock and is sometimes offset by future reductions
in scanning. Hence, the results are not presented this time as they are a
misleading mix of gains/losses without any clear pattern. However, full
scanning of the pageblock is important for later patches.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/compaction.c | 19 ++++++++-----------
 mm/internal.h   |  1 -
 2 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 94d1e5b062ea..8bf2090231a3 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1347,16 +1347,14 @@ static enum compact_result __compact_finished(struct compact_control *cc)
 	if (is_via_compact_memory(cc->order))
 		return COMPACT_CONTINUE;
 
-	if (cc->finishing_block) {
-		/*
-		 * We have finished the pageblock, but better check again that
-		 * we really succeeded.
-		 */
-		if (IS_ALIGNED(cc->migrate_pfn, pageblock_nr_pages))
-			cc->finishing_block = false;
-		else
-			return COMPACT_CONTINUE;
-	}
+	/*
+	 * Always finish scanning a pageblock to reduce the possibility of
+	 * fallbacks in the future. This is particularly important when
+	 * migration source is unmovable/reclaimable but it's not worth
+	 * special casing.
+	 */
+	if (!IS_ALIGNED(cc->migrate_pfn, pageblock_nr_pages))
+		return COMPACT_CONTINUE;
 
 	/* Direct compactor: Is a suitable page free? */
 	for (order = cc->order; order < MAX_ORDER; order++) {
@@ -1398,7 +1396,6 @@ static enum compact_result __compact_finished(struct compact_control *cc)
 				return COMPACT_SUCCESS;
 			}
 
-			cc->finishing_block = true;
 			return COMPACT_CONTINUE;
 		}
 	}
diff --git a/mm/internal.h b/mm/internal.h
index c6f794ad21a9..edb4029f64c8 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -202,7 +202,6 @@ struct compact_control {
 	bool direct_compaction;		/* False from kcompactd or /proc/... */
 	bool whole_zone;		/* Whole zone should/has been scanned */
 	bool contended;			/* Signal lock or sched contention */
-	bool finishing_block;		/* Finishing current pageblock */
 };
 
 unsigned long
-- 
2.16.4


* [PATCH 09/25] mm, compaction: Use the page allocator bulk-free helper for lists of pages
  2019-01-04 12:49 [PATCH 00/25] Increase success rates and reduce latency of compaction v2 Mel Gorman
                   ` (7 preceding siblings ...)
  2019-01-04 12:49 ` [PATCH 08/25] mm, compaction: Always finish scanning of a full pageblock Mel Gorman
@ 2019-01-04 12:49 ` Mel Gorman
  2019-01-15 12:39   ` Vlastimil Babka
  2019-01-04 12:49 ` [PATCH 10/25] mm, compaction: Ignore the fragmentation avoidance boost for isolation and compaction Mel Gorman
                   ` (19 subsequent siblings)
  28 siblings, 1 reply; 75+ messages in thread
From: Mel Gorman @ 2019-01-04 12:49 UTC (permalink / raw)
  To: Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Vlastimil Babka, ying.huang,
	kirill, Andrew Morton, Linux List Kernel Mailing, Mel Gorman

Compaction's release_freepages() is a simpler version of
free_unref_page_list() except that it also tracks the highest PFN for
caching the restart point of the compaction free scanner. This patch
optionally tracks the highest PFN in the core helper and converts
compaction to use it. The performance impact is limited but it should
reduce lock contention slightly in some cases. The main benefit is
removing some partially duplicated code.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 include/linux/gfp.h |  7 ++++++-
 mm/compaction.c     | 12 +++---------
 mm/page_alloc.c     | 10 +++++++++-
 3 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 5f5e25fd6149..9e58799b730f 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -543,7 +543,12 @@ void * __meminit alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask);
 extern void __free_pages(struct page *page, unsigned int order);
 extern void free_pages(unsigned long addr, unsigned int order);
 extern void free_unref_page(struct page *page);
-extern void free_unref_page_list(struct list_head *list);
+extern void __free_page_list(struct list_head *list, bool dropref, unsigned long *highest_pfn);
+
+static inline void free_unref_page_list(struct list_head *list)
+{
+	return __free_page_list(list, false, NULL);
+}
 
 struct page_frag_cache;
 extern void __page_frag_cache_drain(struct page *page, unsigned int count);
diff --git a/mm/compaction.c b/mm/compaction.c
index 8bf2090231a3..8f0ce44dba41 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -52,16 +52,10 @@ static inline void count_compact_events(enum vm_event_item item, long delta)
 
 static unsigned long release_freepages(struct list_head *freelist)
 {
-	struct page *page, *next;
-	unsigned long high_pfn = 0;
+	unsigned long high_pfn;
 
-	list_for_each_entry_safe(page, next, freelist, lru) {
-		unsigned long pfn = page_to_pfn(page);
-		list_del(&page->lru);
-		__free_page(page);
-		if (pfn > high_pfn)
-			high_pfn = pfn;
-	}
+	__free_page_list(freelist, true, &high_pfn);
+	INIT_LIST_HEAD(freelist);
 
 	return high_pfn;
 }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cde5dac6229a..57ba9d1da519 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2876,18 +2876,26 @@ void free_unref_page(struct page *page)
 /*
  * Free a list of 0-order pages
  */
-void free_unref_page_list(struct list_head *list)
+void __free_page_list(struct list_head *list, bool dropref,
+				unsigned long *highest_pfn)
 {
 	struct page *page, *next;
 	unsigned long flags, pfn;
 	int batch_count = 0;
 
+	if (highest_pfn)
+		*highest_pfn = 0;
+
 	/* Prepare pages for freeing */
 	list_for_each_entry_safe(page, next, list, lru) {
+		if (dropref)
+			WARN_ON_ONCE(!put_page_testzero(page));
 		pfn = page_to_pfn(page);
 		if (!free_unref_page_prepare(page, pfn))
 			list_del(&page->lru);
 		set_page_private(page, pfn);
+		if (highest_pfn && pfn > *highest_pfn)
+			*highest_pfn = pfn;
 	}
 
 	local_irq_save(flags);
-- 
2.16.4


* [PATCH 10/25] mm, compaction: Ignore the fragmentation avoidance boost for isolation and compaction
  2019-01-04 12:49 [PATCH 00/25] Increase success rates and reduce latency of compaction v2 Mel Gorman
                   ` (8 preceding siblings ...)
  2019-01-04 12:49 ` [PATCH 09/25] mm, compaction: Use the page allocator bulk-free helper for lists of pages Mel Gorman
@ 2019-01-04 12:49 ` Mel Gorman
  2019-01-15 13:18   ` Vlastimil Babka
  2019-01-04 12:49 ` [PATCH 11/25] mm, compaction: Use free lists to quickly locate a migration source Mel Gorman
                   ` (18 subsequent siblings)
  28 siblings, 1 reply; 75+ messages in thread
From: Mel Gorman @ 2019-01-04 12:49 UTC (permalink / raw)
  To: Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Vlastimil Babka, ying.huang,
	kirill, Andrew Morton, Linux List Kernel Mailing, Mel Gorman

When pageblocks get fragmented, watermarks are artificially boosted to
reclaim pages and avoid further fragmentation events. However, compaction
is often either fragmentation-neutral or moving movable pages away from
unmovable/reclaimable pages. As the true watermarks are preserved, allow
compaction to ignore the boost factor.

The expected impact is very slight as the main benefit is that compaction
is slightly more likely to succeed when the system has been fragmented
very recently. On both 1-socket and 2-socket machines for THP-intensive
allocation during fragmentation, the success rate was increased by less
than 1% which is marginal. However, detailed tracing indicated that
failures of migration due to a premature ENOMEM triggered by the watermark
checks were eliminated.
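
For context, the distinction relied on here is roughly the following,
based on the 5.0-era watermark definitions (check include/linux/mmzone.h
in the tree for the authoritative versions):

	/* The regular allocator-facing watermark includes the boost... */
	#define min_wmark_pages(z) (z->_watermark[WMARK_MIN] + z->watermark_boost)

	/*
	 * ...while zone->_watermark[WMARK_MIN] on its own is the unboosted
	 * value, so checking against it lets __isolate_free_page() proceed
	 * even while a recent fragmentation event has the watermarks boosted.
	 */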

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 57ba9d1da519..05c9a81d54ed 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2958,7 +2958,7 @@ int __isolate_free_page(struct page *page, unsigned int order)
 		 * watermark, because we already know our high-order page
 		 * exists.
 		 */
-		watermark = min_wmark_pages(zone) + (1UL << order);
+		watermark = zone->_watermark[WMARK_MIN] + (1UL << order);
 		if (!zone_watermark_ok(zone, 0, watermark, 0, ALLOC_CMA))
 			return 0;
 
-- 
2.16.4


* [PATCH 11/25] mm, compaction: Use free lists to quickly locate a migration source
  2019-01-04 12:49 [PATCH 00/25] Increase success rates and reduce latency of compaction v2 Mel Gorman
                   ` (9 preceding siblings ...)
  2019-01-04 12:49 ` [PATCH 10/25] mm, compaction: Ignore the fragmentation avoidance boost for isolation and compaction Mel Gorman
@ 2019-01-04 12:49 ` Mel Gorman
  2019-01-16 13:15   ` Vlastimil Babka
  2019-01-04 12:49 ` [PATCH 12/25] mm, compaction: Keep migration source private to a single compaction instance Mel Gorman
                   ` (17 subsequent siblings)
  28 siblings, 1 reply; 75+ messages in thread
From: Mel Gorman @ 2019-01-04 12:49 UTC (permalink / raw)
  To: Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Vlastimil Babka, ying.huang,
	kirill, Andrew Morton, Linux List Kernel Mailing, Mel Gorman

The migration scanner is a linear scan of a zone with a potentially large
search space. Furthermore, many pageblocks are unusable, such as those
filled with reserved pages or partially filled with pages that cannot
migrate. These still get scanned in the common case of allocating a THP
and the cost accumulates.

The patch uses a partial search of the free lists to locate a migration
source candidate that is marked as MOVABLE when allocating a THP. It
prefers picking a block that already has a larger number of free pages on
the basis that fewer pages then need to be migrated to free the entire
block. The lowest PFN found during the searches is tracked as the starting
point for the linear search after the first search of the free lists fails.
After a search, the free list is shuffled so that the next search will
not encounter the same page. If the search fails then the subsequent
searches will be shorter and the linear scanner is used.

If this search fails, or if the request is for a small or
unmovable/reclaimable allocation, then the linear scanner is still used.
It is somewhat pointless to use the list search in those cases: small free
pages must be used for the search, and there is no guarantee that
contiguous movable pages are located within that block.
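
As a worked example of how the fast search shortens on failure (assuming
COMPACT_CLUSTER_MAX is still SWAP_CLUSTER_MAX, i.e. 32), the scan limit
from freelist_scan_limit() decays quickly with consecutive failures:

	/*
	 * limit = (COMPACT_CLUSTER_MAX >> fast_search_fail) + 1
	 *
	 * fast_search_fail:   0   1   2   3   4   5   6+
	 * entries examined:  33  17   9   5   3   2   1
	 */

so after a handful of failed fast searches the list search costs almost
nothing and the linear scanner effectively takes over.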

                                        4.20.0                 4.20.0
                                failfast-v2r15          findmig-v2r15
Amean     fault-both-1         0.00 (   0.00%)        0.00 *   0.00%*
Amean     fault-both-3      3833.72 (   0.00%)     3505.69 (   8.56%)
Amean     fault-both-5      4967.15 (   0.00%)     5794.13 * -16.65%*
Amean     fault-both-7      7139.19 (   0.00%)     7663.09 (  -7.34%)
Amean     fault-both-12    11326.30 (   0.00%)    10983.36 (   3.03%)
Amean     fault-both-18    16270.70 (   0.00%)    13602.71 *  16.40%*
Amean     fault-both-24    19839.65 (   0.00%)    16145.77 *  18.62%*
Amean     fault-both-30    21707.05 (   0.00%)    19753.82 (   9.00%)
Amean     fault-both-32    21968.16 (   0.00%)    20616.16 (   6.15%)

                                   4.20.0                 4.20.0
                           failfast-v2r15          findmig-v2r15
Percentage huge-1         0.00 (   0.00%)        0.00 (   0.00%)
Percentage huge-3        84.62 (   0.00%)       90.58 (   7.05%)
Percentage huge-5        88.43 (   0.00%)       91.34 (   3.29%)
Percentage huge-7        88.33 (   0.00%)       92.21 (   4.39%)
Percentage huge-12       88.74 (   0.00%)       92.48 (   4.21%)
Percentage huge-18       86.52 (   0.00%)       91.65 (   5.93%)
Percentage huge-24       86.42 (   0.00%)       90.23 (   4.41%)
Percentage huge-30       86.67 (   0.00%)       90.17 (   4.04%)
Percentage huge-32       86.00 (   0.00%)       89.72 (   4.32%)

This shows an improvement in allocation latencies and a slight increase
in allocation success rates. While not presented, there was a 13% reduction
in migration scanning and a 10% reduction in system CPU usage. A 2-socket
machine showed similar benefits.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/compaction.c | 179 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 mm/internal.h   |   2 +
 2 files changed, 179 insertions(+), 2 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 8f0ce44dba41..137e32e8a2f5 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1050,6 +1050,12 @@ static bool suitable_migration_target(struct compact_control *cc,
 	return false;
 }
 
+static inline unsigned int
+freelist_scan_limit(struct compact_control *cc)
+{
+	return (COMPACT_CLUSTER_MAX >> cc->fast_search_fail) + 1;
+}
+
 /*
  * Test whether the free scanner has reached the same or lower pageblock than
  * the migration scanner, and compaction should thus terminate.
@@ -1060,6 +1066,19 @@ static inline bool compact_scanners_met(struct compact_control *cc)
 		<= (cc->migrate_pfn >> pageblock_order);
 }
 
+/* Reorder the free list to reduce repeated future searches */
+static void
+move_freelist_tail(struct list_head *freelist, struct page *freepage)
+{
+	LIST_HEAD(sublist);
+
+	if (!list_is_last(freelist, &freepage->lru)) {
+		list_cut_position(&sublist, freelist, &freepage->lru);
+		if (!list_empty(&sublist))
+			list_splice_tail(&sublist, freelist);
+	}
+}
+
 /*
  * Based on information in the current compact_control, find blocks
  * suitable for isolating free pages from and then isolate them.
@@ -1217,6 +1236,160 @@ typedef enum {
  */
 int sysctl_compact_unevictable_allowed __read_mostly = 1;
 
+static inline void
+update_fast_start_pfn(struct compact_control *cc, unsigned long pfn)
+{
+	if (cc->fast_start_pfn == ULONG_MAX)
+		return;
+
+	if (!cc->fast_start_pfn)
+		cc->fast_start_pfn = pfn;
+
+	cc->fast_start_pfn = min(cc->fast_start_pfn, pfn);
+}
+
+static inline void
+reinit_migrate_pfn(struct compact_control *cc)
+{
+	if (!cc->fast_start_pfn || cc->fast_start_pfn == ULONG_MAX)
+		return;
+
+	cc->migrate_pfn = cc->fast_start_pfn;
+	cc->fast_start_pfn = ULONG_MAX;
+}
+
+/*
+ * Briefly search the free lists for a migration source that already has
+ * some free pages to reduce the number of pages that need migration
+ * before a pageblock is free.
+ */
+static unsigned long fast_find_migrateblock(struct compact_control *cc)
+{
+	unsigned int limit = freelist_scan_limit(cc);
+	unsigned int nr_scanned = 0;
+	unsigned long distance;
+	unsigned long pfn = cc->migrate_pfn;
+	unsigned long high_pfn;
+	int order;
+
+	/* Skip hints are relied on to avoid repeats on the fast search */
+	if (cc->ignore_skip_hint)
+		return pfn;
+
+	/*
+	 * If the migrate_pfn is not at the start of a zone or the start
+	 * of a pageblock then assume this is a continuation of a previous
+	 * scan restarted due to COMPACT_CLUSTER_MAX.
+	 */
+	if (pfn != cc->zone->zone_start_pfn && pfn != pageblock_start_pfn(pfn))
+		return pfn;
+
+	/*
+	 * For smaller orders, just linearly scan as the number of pages
+	 * to migrate should be relatively small and does not necessarily
+	 * justify freeing up a large block for a small allocation.
+	 */
+	if (cc->order <= PAGE_ALLOC_COSTLY_ORDER)
+		return pfn;
+
+	/*
+	 * Only allow kcompactd and direct requests for movable pages to
+	 * quickly clear out a MOVABLE pageblock for allocation. This
+	 * reduces the risk that a large movable pageblock is freed for
+	 * an unmovable/reclaimable small allocation.
+	 */
+	if (cc->direct_compaction && cc->migratetype != MIGRATE_MOVABLE)
+		return pfn;
+
+	/*
+	 * When starting the migration scanner, pick any pageblock within the
+	 * first half of the search space. Otherwise try and pick a pageblock
+	 * within the first eighth to reduce the chances that a migration
+	 * target later becomes a source.
+	 */
+	distance = (cc->free_pfn - cc->migrate_pfn) >> 1;
+	if (cc->migrate_pfn != cc->zone->zone_start_pfn)
+		distance >>= 2;
+	high_pfn = pageblock_start_pfn(cc->migrate_pfn + distance);
+
+	for (order = cc->order - 1;
+	     order >= PAGE_ALLOC_COSTLY_ORDER && pfn == cc->migrate_pfn && nr_scanned < limit;
+	     order--) {
+		struct free_area *area = &cc->zone->free_area[order];
+		struct list_head *freelist;
+		unsigned long nr_skipped = 0;
+		unsigned long flags;
+		struct page *freepage;
+
+		if (!area->nr_free)
+			continue;
+
+		spin_lock_irqsave(&cc->zone->lock, flags);
+		freelist = &area->free_list[MIGRATE_MOVABLE];
+		list_for_each_entry(freepage, freelist, lru) {
+			unsigned long free_pfn;
+
+			nr_scanned++;
+			free_pfn = page_to_pfn(freepage);
+			if (free_pfn < high_pfn) {
+				update_fast_start_pfn(cc, free_pfn);
+
+				/*
+				 * Avoid if skipped recently. Move to the tail
+				 * of the list so it will not be found again
+				 * soon
+				 */
+				if (get_pageblock_skip(freepage)) {
+
+					if (list_is_last(freelist, &freepage->lru))
+						break;
+
+					nr_skipped++;
+					list_del(&freepage->lru);
+					list_add_tail(&freepage->lru, freelist);
+					if (nr_skipped > 2)
+						break;
+					continue;
+				}
+
+				/* Reorder to so a future search skips recent pages */
+				move_freelist_tail(freelist, freepage);
+
+				pfn = pageblock_start_pfn(free_pfn);
+				cc->fast_search_fail = 0;
+				set_pageblock_skip(freepage);
+				break;
+			}
+
+			/*
+			 * If low PFNs are being found and discarded then
+			 * limit the scan as fast searching is finding
+			 * poor candidates.
+			 */
+			if (free_pfn < cc->migrate_pfn)
+				limit >>= 1;
+
+			if (nr_scanned >= limit) {
+				cc->fast_search_fail++;
+				move_freelist_tail(freelist, freepage);
+				break;
+			}
+		}
+		spin_unlock_irqrestore(&cc->zone->lock, flags);
+	}
+
+	cc->total_migrate_scanned += nr_scanned;
+
+	/*
+	 * If fast scanning failed then use a cached entry for a page block
+	 * that had free pages as the basis for starting a linear scan.
+	 */
+	if (pfn == cc->migrate_pfn)
+		reinit_migrate_pfn(cc);
+
+	return pfn;
+}
+
 /*
  * Isolate all pages that can be migrated from the first suitable block,
  * starting at the block pointed to by the migrate scanner pfn within
@@ -1235,9 +1408,10 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 
 	/*
 	 * Start at where we last stopped, or beginning of the zone as
-	 * initialized by compact_zone()
+	 * initialized by compact_zone(). The first failure will use
+	 * the lowest PFN as the starting point for linear scanning.
 	 */
-	low_pfn = cc->migrate_pfn;
+	low_pfn = fast_find_migrateblock(cc);
 	block_start_pfn = pageblock_start_pfn(low_pfn);
 	if (block_start_pfn < zone->zone_start_pfn)
 		block_start_pfn = zone->zone_start_pfn;
@@ -1560,6 +1734,7 @@ static enum compact_result compact_zone(struct compact_control *cc)
 	 * want to compact the whole zone), but check that it is initialised
 	 * by ensuring the values are within zone boundaries.
 	 */
+	cc->fast_start_pfn = 0;
 	if (cc->whole_zone) {
 		cc->migrate_pfn = start_pfn;
 		cc->free_pfn = pageblock_start_pfn(end_pfn - 1);
diff --git a/mm/internal.h b/mm/internal.h
index edb4029f64c8..b25b33c5dd80 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -187,9 +187,11 @@ struct compact_control {
 	unsigned int nr_migratepages;	/* Number of pages to migrate */
 	unsigned long free_pfn;		/* isolate_freepages search base */
 	unsigned long migrate_pfn;	/* isolate_migratepages search base */
+	unsigned long fast_start_pfn;	/* a pfn to start linear scan from */
 	struct zone *zone;
 	unsigned long total_migrate_scanned;
 	unsigned long total_free_scanned;
+	unsigned int fast_search_fail;	/* failures to use free list searches */
 	const gfp_t gfp_mask;		/* gfp mask of a direct compactor */
 	int order;			/* order a direct compactor needs */
 	int migratetype;		/* migratetype of direct compactor */
-- 
2.16.4


* [PATCH 12/25] mm, compaction: Keep migration source private to a single compaction instance
  2019-01-04 12:49 [PATCH 00/25] Increase success rates and reduce latency of compaction v2 Mel Gorman
                   ` (10 preceding siblings ...)
  2019-01-04 12:49 ` [PATCH 11/25] mm, compaction: Use free lists to quickly locate a migration source Mel Gorman
@ 2019-01-04 12:49 ` Mel Gorman
  2019-01-16 15:45   ` Vlastimil Babka
  2019-01-17  9:40   ` Vlastimil Babka
  2019-01-04 12:49 ` [PATCH 13/25] mm, compaction: Use free lists to quickly locate a migration target Mel Gorman
                   ` (16 subsequent siblings)
  28 siblings, 2 replies; 75+ messages in thread
From: Mel Gorman @ 2019-01-04 12:49 UTC (permalink / raw)
  To: Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Vlastimil Babka, ying.huang,
	kirill, Andrew Morton, Linux List Kernel Mailing, Mel Gorman

Due to either a fast search of the free list or a linear scan, it is
possible for multiple compaction instances to pick the same pageblock
for migration. This is lucky for one scanner and means increased scanning
for all the others. It also allows a race between requests over which one
first allocates the resulting free block.

This patch tests and updates the pageblock skip information for the
migration scanner carefully. When isolating a block, it checks whether
the block is already in use and skips it if so. Once the zone lock is
acquired, the skip bit is rechecked so that only one scanner can set the
pageblock skip for exclusive use. Any scanner that loses the race
continues with a linear scan. The skip bit is still set if no pages can
be isolated in a range. While this may result in redundant scanning, it
avoids unnecessarily acquiring the zone lock when there are no suitable
migration sources.

1-socket thpscale
                                        4.20.0                 4.20.0
                                 findmig-v2r15          isolmig-v2r15
Amean     fault-both-1         0.00 (   0.00%)        0.00 *   0.00%*
Amean     fault-both-3      3505.69 (   0.00%)     3066.68 *  12.52%*
Amean     fault-both-5      5794.13 (   0.00%)     4298.49 *  25.81%*
Amean     fault-both-7      7663.09 (   0.00%)     5986.99 *  21.87%*
Amean     fault-both-12    10983.36 (   0.00%)     9324.85 (  15.10%)
Amean     fault-both-18    13602.71 (   0.00%)    13350.05 (   1.86%)
Amean     fault-both-24    16145.77 (   0.00%)    13491.77 *  16.44%*
Amean     fault-both-30    19753.82 (   0.00%)    15630.86 *  20.87%*
Amean     fault-both-32    20616.16 (   0.00%)    17428.50 *  15.46%*

This is the first patch that shows a significant reduction in latency as
multiple compaction scanners do not operate on the same blocks. There is
a small increase in the success rate

                               4.20.0-rc6             4.20.0-rc6
                             findmig-v1r4           isolmig-v1r4
Percentage huge-3        90.58 (   0.00%)       95.84 (   5.81%)
Percentage huge-5        91.34 (   0.00%)       94.19 (   3.12%)
Percentage huge-7        92.21 (   0.00%)       93.78 (   1.71%)
Percentage huge-12       92.48 (   0.00%)       94.33 (   2.00%)
Percentage huge-18       91.65 (   0.00%)       94.15 (   2.72%)
Percentage huge-24       90.23 (   0.00%)       94.23 (   4.43%)
Percentage huge-30       90.17 (   0.00%)       95.17 (   5.54%)
Percentage huge-32       89.72 (   0.00%)       93.59 (   4.32%)

Compaction migrate scanned    54168306    25516488
Compaction free scanned      800530954    87603321

Migration scan rates are reduced by 52%.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/compaction.c | 126 ++++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 99 insertions(+), 27 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 137e32e8a2f5..24e3a9db4b70 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -279,13 +279,52 @@ void reset_isolation_suitable(pg_data_t *pgdat)
 	}
 }
 
+/*
+ * Sets the pageblock skip bit if it was clear. Note that this is a hint as
+ * locks are not required for read/writers. Returns true if it was already set.
+ */
+static bool test_and_set_skip(struct compact_control *cc, struct page *page,
+							unsigned long pfn)
+{
+	bool skip;
+
+	/* Do no update if skip hint is being ignored */
+	if (cc->ignore_skip_hint)
+		return false;
+
+	if (!IS_ALIGNED(pfn, pageblock_nr_pages))
+		return false;
+
+	skip = get_pageblock_skip(page);
+	if (!skip && !cc->no_set_skip_hint)
+		set_pageblock_skip(page);
+
+	return skip;
+}
+
+static void update_cached_migrate(struct compact_control *cc, unsigned long pfn)
+{
+	struct zone *zone = cc->zone;
+
+	pfn = pageblock_end_pfn(pfn);
+
+	/* Set for isolation rather than compaction */
+	if (cc->no_set_skip_hint)
+		return;
+
+	if (pfn > zone->compact_cached_migrate_pfn[0])
+		zone->compact_cached_migrate_pfn[0] = pfn;
+	if (cc->mode != MIGRATE_ASYNC &&
+	    pfn > zone->compact_cached_migrate_pfn[1])
+		zone->compact_cached_migrate_pfn[1] = pfn;
+}
+
 /*
  * If no pages were isolated then mark this pageblock to be skipped in the
  * future. The information is later cleared by __reset_isolation_suitable().
  */
 static void update_pageblock_skip(struct compact_control *cc,
-			struct page *page, unsigned long nr_isolated,
-			bool migrate_scanner)
+			struct page *page, unsigned long nr_isolated)
 {
 	struct zone *zone = cc->zone;
 	unsigned long pfn;
@@ -304,16 +343,8 @@ static void update_pageblock_skip(struct compact_control *cc,
 	pfn = page_to_pfn(page);
 
 	/* Update where async and sync compaction should restart */
-	if (migrate_scanner) {
-		if (pfn > zone->compact_cached_migrate_pfn[0])
-			zone->compact_cached_migrate_pfn[0] = pfn;
-		if (cc->mode != MIGRATE_ASYNC &&
-		    pfn > zone->compact_cached_migrate_pfn[1])
-			zone->compact_cached_migrate_pfn[1] = pfn;
-	} else {
-		if (pfn < zone->compact_cached_free_pfn)
-			zone->compact_cached_free_pfn = pfn;
-	}
+	if (pfn < zone->compact_cached_free_pfn)
+		zone->compact_cached_free_pfn = pfn;
 }
 #else
 static inline bool isolation_suitable(struct compact_control *cc,
@@ -328,10 +359,19 @@ static inline bool pageblock_skip_persistent(struct page *page)
 }
 
 static inline void update_pageblock_skip(struct compact_control *cc,
-			struct page *page, unsigned long nr_isolated,
-			bool migrate_scanner)
+			struct page *page, unsigned long nr_isolated)
+{
+}
+
+static void update_cached_migrate(struct compact_control *cc, unsigned long pfn)
 {
 }
+
+static bool test_and_set_skip(struct compact_control *cc, struct page *page,
+							unsigned long pfn)
+{
+	return false;
+}
 #endif /* CONFIG_COMPACTION */
 
 /*
@@ -570,7 +610,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
 
 	/* Update the pageblock-skip if the whole pageblock was scanned */
 	if (blockpfn == end_pfn)
-		update_pageblock_skip(cc, valid_page, total_isolated, false);
+		update_pageblock_skip(cc, valid_page, total_isolated);
 
 	cc->total_free_scanned += nr_scanned;
 	if (total_isolated)
@@ -705,6 +745,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 	unsigned long start_pfn = low_pfn;
 	bool skip_on_failure = false;
 	unsigned long next_skip_pfn = 0;
+	bool skip_updated = false;
 
 	/*
 	 * Ensure that there are not too many pages isolated from the LRU
@@ -771,8 +812,19 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 
 		page = pfn_to_page(low_pfn);
 
-		if (!valid_page)
+		/*
+		 * Check if the pageblock has already been marked skipped.
+		 * Only the aligned PFN is checked as the caller isolates
+		 * COMPACT_CLUSTER_MAX at a time so the second call must
+		 * not falsely conclude that the block should be skipped.
+		 */
+		if (!valid_page && IS_ALIGNED(low_pfn, pageblock_nr_pages)) {
+			if (!cc->ignore_skip_hint && get_pageblock_skip(page)) {
+				low_pfn = end_pfn;
+				goto isolate_abort;
+			}
 			valid_page = page;
+		}
 
 		/*
 		 * Skip if free. We read page order here without zone lock
@@ -860,8 +912,19 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 		if (!locked) {
 			locked = compact_trylock_irqsave(zone_lru_lock(zone),
 								&flags, cc);
-			if (!locked)
+
+			/* Allow future scanning if the lock is contended */
+			if (!locked) {
+				clear_pageblock_skip(page);
 				break;
+			}
+
+			/* Try get exclusive access under lock */
+			if (!skip_updated) {
+				skip_updated = true;
+				if (test_and_set_skip(cc, page, low_pfn))
+					goto isolate_abort;
+			}
 
 			/* Recheck PageLRU and PageCompound under lock */
 			if (!PageLRU(page))
@@ -939,15 +1002,20 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 	if (unlikely(low_pfn > end_pfn))
 		low_pfn = end_pfn;
 
+isolate_abort:
 	if (locked)
 		spin_unlock_irqrestore(zone_lru_lock(zone), flags);
 
 	/*
-	 * Update the pageblock-skip information and cached scanner pfn,
-	 * if the whole pageblock was scanned without isolating any page.
+	 * Update the cached scanner pfn if the pageblock was scanned
+	 * without isolating a page. The pageblock may not be marked
+	 * skipped already if there were no LRU pages in the block.
 	 */
-	if (low_pfn == end_pfn)
-		update_pageblock_skip(cc, valid_page, nr_isolated, true);
+	if (low_pfn == end_pfn && !nr_isolated) {
+		if (valid_page && !skip_updated)
+			set_pageblock_skip(valid_page);
+		update_cached_migrate(cc, low_pfn);
+	}
 
 	trace_mm_compaction_isolate_migratepages(start_pfn, low_pfn,
 						nr_scanned, nr_isolated);
@@ -1332,8 +1400,6 @@ static unsigned long fast_find_migrateblock(struct compact_control *cc)
 			nr_scanned++;
 			free_pfn = page_to_pfn(freepage);
 			if (free_pfn < high_pfn) {
-				update_fast_start_pfn(cc, free_pfn);
-
 				/*
 				 * Avoid if skipped recently. Move to the tail
 				 * of the list so it will not be found again
@@ -1355,9 +1421,9 @@ static unsigned long fast_find_migrateblock(struct compact_control *cc)
 				/* Reorder so a future search skips recent pages */
 				move_freelist_tail(freelist, freepage);
 
+				update_fast_start_pfn(cc, free_pfn);
 				pfn = pageblock_start_pfn(free_pfn);
 				cc->fast_search_fail = 0;
-				set_pageblock_skip(freepage);
 				break;
 			}
 
@@ -1427,7 +1493,6 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 			low_pfn = block_end_pfn,
 			block_start_pfn = block_end_pfn,
 			block_end_pfn += pageblock_nr_pages) {
-
 		/*
 		 * This can potentially iterate a massively long zone with
 		 * many pageblocks unsuitable, so periodically check if we
@@ -1442,8 +1507,15 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 		if (!page)
 			continue;
 
-		/* If isolation recently failed, do not retry */
-		if (!isolation_suitable(cc, page))
+		/*
+		 * If isolation recently failed, do not retry. Only check the
+		 * pageblock once. COMPACT_CLUSTER_MAX causes a pageblock
+		 * to be visited multiple times. Assume skip was checked
+		 * before making it "skip" so other compaction instances do
+		 * not scan the same block.
+		 */
+		if (IS_ALIGNED(low_pfn, pageblock_nr_pages) &&
+		    !isolation_suitable(cc, page))
 			continue;
 
 		/*
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH 13/25] mm, compaction: Use free lists to quickly locate a migration target
  2019-01-04 12:49 [PATCH 00/25] Increase success rates and reduce latency of compaction v2 Mel Gorman
                   ` (11 preceding siblings ...)
  2019-01-04 12:49 ` [PATCH 12/25] mm, compaction: Keep migration source private to a single compaction instance Mel Gorman
@ 2019-01-04 12:49 ` Mel Gorman
  2019-01-17 14:36   ` Vlastimil Babka
  2019-01-04 12:50 ` [PATCH 14/25] mm, compaction: Avoid rescanning the same pageblock multiple times Mel Gorman
                   ` (15 subsequent siblings)
  28 siblings, 1 reply; 75+ messages in thread
From: Mel Gorman @ 2019-01-04 12:49 UTC (permalink / raw)
  To: Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Vlastimil Babka, ying.huang,
	kirill, Andrew Morton, Linux List Kernel Mailing, Mel Gorman

Similar to the migration scanner, this patch uses the free lists to quickly
locate a migration target. The search is different in that lower orders
will be searched for a suitable high PFN if necessary but the search
is still bound. This is justified on the grounds that the free scanner
typically scans linearly much more than the migration scanner.

If a free page is found, it is isolated and compaction continues if enough
pages were isolated. For SYNC* scanning, the full pageblock is scanned
for any remaining free pages so that it can be marked for skipping in
the near future.
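
In outline, the search walks the free lists from the highest useful order
downwards, bounded by a scan limit, preferring a page in the top quarter
of the space between the scanners and falling back to the top half. A
condensed sketch of the search loop (locking, list reordering and the
isolation itself are omitted; fast_isolate_freepages() in the diff below
is the real code):

	for (order = cc->order - 1; order >= 0 && !page; order--) {
		struct free_area *area = &cc->zone->free_area[order];
		struct list_head *freelist = &area->free_list[MIGRATE_MOVABLE];
		struct page *freepage;

		list_for_each_entry_reverse(freepage, freelist, lru) {
			unsigned long pfn = page_to_pfn(freepage);

			/* Preferred candidate: top quarter of the scan space */
			if (pfn >= low_pfn) {
				page = freepage;
				break;
			}

			/* Fallback candidate: highest PFN in the top half */
			if (pfn >= min_pfn && pfn > high_pfn)
				high_pfn = pfn;

			/* Keep the search bounded */
			if (++order_scanned >= limit)
				break;
		}

		/* Settle for the best fallback found at this order */
		if (!page && high_pfn)
			page = pfn_to_page(high_pfn);
	}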

1-socket thpfioscale
                                        4.20.0                 4.20.0
                                 isolmig-v2r15         findfree-v2r15
Amean     fault-both-1         0.00 (   0.00%)        0.00 *   0.00%*
Amean     fault-both-3      3066.68 (   0.00%)     2884.51 (   5.94%)
Amean     fault-both-5      4298.49 (   0.00%)     4419.70 (  -2.82%)
Amean     fault-both-7      5986.99 (   0.00%)     6039.04 (  -0.87%)
Amean     fault-both-12     9324.85 (   0.00%)     9992.34 (  -7.16%)
Amean     fault-both-18    13350.05 (   0.00%)    12690.05 (   4.94%)
Amean     fault-both-24    13491.77 (   0.00%)    14393.93 (  -6.69%)
Amean     fault-both-30    15630.86 (   0.00%)    16894.08 (  -8.08%)
Amean     fault-both-32    17428.50 (   0.00%)    17813.68 (  -2.21%)

The impact on latency is variable but the search is optimistic and
sensitive to the exact system state. Success rates are similar but
the major impact is to the rate of scanning

                            4.20.0-rc6  4.20.0-rc6
                         isolmig-v1r4  findfree-v1r8
Compaction migrate scanned    25516488    28324352
Compaction free scanned       87603321    56131065

The free scan rates are reduced by 35%. The 2-socket reductions for the
free scanner are more dramatic which is a likely reflection that the
machine has more memory.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/compaction.c | 203 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 198 insertions(+), 5 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 24e3a9db4b70..9438f0564ed5 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1136,7 +1136,7 @@ static inline bool compact_scanners_met(struct compact_control *cc)
 
 /* Reorder the free list to reduce repeated future searches */
 static void
-move_freelist_tail(struct list_head *freelist, struct page *freepage)
+move_freelist_head(struct list_head *freelist, struct page *freepage)
 {
 	LIST_HEAD(sublist);
 
@@ -1147,6 +1147,193 @@ move_freelist_tail(struct list_head *freelist, struct page *freepage)
 	}
 }
 
+static void
+move_freelist_tail(struct list_head *freelist, struct page *freepage)
+{
+	LIST_HEAD(sublist);
+
+	if (!list_is_last(freelist, &freepage->lru)) {
+		list_cut_before(&sublist, freelist, &freepage->lru);
+		if (!list_empty(&sublist))
+			list_splice_tail(&sublist, freelist);
+	}
+}
+
+static void
+fast_isolate_around(struct compact_control *cc, unsigned long pfn, unsigned long nr_isolated)
+{
+	unsigned long start_pfn, end_pfn;
+	struct page *page = pfn_to_page(pfn);
+
+	/* Do not search around if there are enough pages already */
+	if (cc->nr_freepages >= cc->nr_migratepages)
+		return;
+
+	/* Minimise scanning during async compaction */
+	if (cc->direct_compaction && cc->mode == MIGRATE_ASYNC)
+		return;
+
+	/* Pageblock boundaries */
+	start_pfn = pageblock_start_pfn(pfn);
+	end_pfn = min(start_pfn + pageblock_nr_pages, zone_end_pfn(cc->zone));
+
+	/* Scan before */
+	if (start_pfn != pfn) {
+		isolate_freepages_block(cc, &start_pfn, pfn, &cc->freepages, false);
+		if (cc->nr_freepages >= cc->nr_migratepages)
+			return;
+	}
+
+	/* Scan after */
+	start_pfn = pfn + nr_isolated;
+	if (start_pfn != end_pfn)
+		isolate_freepages_block(cc, &start_pfn, end_pfn, &cc->freepages, false);
+
+	/* Skip this pageblock in the future as it's full or nearly full */
+	if (cc->nr_freepages < cc->nr_migratepages)
+		set_pageblock_skip(page);
+}
+
+static unsigned long
+fast_isolate_freepages(struct compact_control *cc)
+{
+	unsigned int limit = max(1U, freelist_scan_limit(cc) >> 1);
+	unsigned int order_scanned = 0, nr_scanned = 0;
+	unsigned long low_pfn, min_pfn, high_pfn = 0, highest = 0;
+	unsigned long nr_isolated = 0;
+	unsigned long distance;
+	struct page *page = NULL;
+	bool scan_start = false;
+	int order;
+
+	/*
+	 * If starting the scan, use a deeper search and use the highest
+	 * PFN found if a suitable one is not found.
+	 */
+	if (cc->free_pfn == pageblock_start_pfn(zone_end_pfn(cc->zone) - 1)) {
+		limit = pageblock_nr_pages >> 1;
+		scan_start = true;
+	}
+
+	/*
+	 * Preferred point is in the top quarter of the scan space but take
+	 * a pfn from the top half if the search is problematic.
+	 */
+	distance = (cc->free_pfn - cc->migrate_pfn);
+	low_pfn = pageblock_start_pfn(cc->free_pfn - (distance >> 2));
+	min_pfn = pageblock_start_pfn(cc->free_pfn - (distance >> 1));
+
+	if (WARN_ON_ONCE(min_pfn > low_pfn))
+		low_pfn = min_pfn;
+
+	for (order = cc->order - 1;
+	     order >= 0 && !page;
+	     order--) {
+		struct free_area *area = &cc->zone->free_area[order];
+		struct list_head *freelist;
+		struct page *freepage;
+		unsigned long flags;
+
+		if (!area->nr_free)
+			continue;
+
+		spin_lock_irqsave(&cc->zone->lock, flags);
+		freelist = &area->free_list[MIGRATE_MOVABLE];
+		list_for_each_entry_reverse(freepage, freelist, lru) {
+			unsigned long pfn;
+
+			order_scanned++;
+			nr_scanned++;
+			pfn = page_to_pfn(freepage);
+
+			if (pfn >= highest)
+				highest = pageblock_start_pfn(pfn);
+
+			if (pfn >= low_pfn) {
+				cc->fast_search_fail = 0;
+				page = freepage;
+				break;
+			}
+
+			if (pfn >= min_pfn && pfn > high_pfn) {
+				high_pfn = pfn;
+
+				/* Shorten the scan if a candidate is found */
+				limit >>= 1;
+			}
+
+			if (order_scanned >= limit)
+				break;
+		}
+
+		/* Use a minimum pfn if a preferred one was not found */
+		if (!page && high_pfn) {
+			page = pfn_to_page(high_pfn);
+
+			/* Update freepage for the list reorder below */
+			freepage = page;
+		}
+
+		/* Reorder to so a future search skips recent pages */
+		move_freelist_head(freelist, freepage);
+
+		/* Isolate the page if available */
+		if (page) {
+			if (__isolate_free_page(page, order)) {
+				set_page_private(page, order);
+				nr_isolated = 1 << order;
+				cc->nr_freepages += nr_isolated;
+				list_add_tail(&page->lru, &cc->freepages);
+				count_compact_events(COMPACTISOLATED, nr_isolated);
+			} else {
+				/* If isolation fails, abort the search */
+				order = -1;
+				page = NULL;
+			}
+		}
+
+		spin_unlock_irqrestore(&cc->zone->lock, flags);
+
+		/*
+	 * Smaller scan on next order so the total scan is related
+		 * to freelist_scan_limit.
+		 */
+		if (order_scanned >= limit)
+			limit = max(1U, limit >> 1);
+	}
+
+	if (!page) {
+		cc->fast_search_fail++;
+		if (scan_start) {
+			/*
+			 * Use the highest PFN found above min. If one was
+			 * not found, be pessimistic for direct compaction
+			 * and use the min mark.
+			 */
+			if (highest) {
+				page = pfn_to_page(highest);
+				cc->free_pfn = highest;
+			} else {
+				if (cc->direct_compaction) {
+					page = pfn_to_page(min_pfn);
+					cc->free_pfn = min_pfn;
+				}
+			}
+		}
+	}
+
+	if (highest && highest > cc->zone->compact_cached_free_pfn)
+		cc->zone->compact_cached_free_pfn = highest;
+
+	cc->total_free_scanned += nr_scanned;
+	if (!page)
+		return cc->free_pfn;
+
+	low_pfn = page_to_pfn(page);
+	fast_isolate_around(cc, low_pfn, nr_isolated);
+	return low_pfn;
+}
+
 /*
  * Based on information in the current compact_control, find blocks
  * suitable for isolating free pages from and then isolate them.
@@ -1161,6 +1348,11 @@ static void isolate_freepages(struct compact_control *cc)
 	unsigned long low_pfn;	     /* lowest pfn scanner is able to scan */
 	struct list_head *freelist = &cc->freepages;
 
+	/* Try a small search of the free lists for a candidate */
+	isolate_start_pfn = fast_isolate_freepages(cc);
+	if (cc->nr_freepages)
+		goto splitmap;
+
 	/*
 	 * Initialise the free scanner. The starting point is where we last
 	 * successfully isolated from, zone-cached value, or the end of the
@@ -1173,7 +1365,7 @@ static void isolate_freepages(struct compact_control *cc)
 	 * is using.
 	 */
 	isolate_start_pfn = cc->free_pfn;
-	block_start_pfn = pageblock_start_pfn(cc->free_pfn);
+	block_start_pfn = pageblock_start_pfn(isolate_start_pfn);
 	block_end_pfn = min(block_start_pfn + pageblock_nr_pages,
 						zone_end_pfn(zone));
 	low_pfn = pageblock_end_pfn(cc->migrate_pfn);
@@ -1237,9 +1429,6 @@ static void isolate_freepages(struct compact_control *cc)
 		}
 	}
 
-	/* __isolate_free_page() does not map the pages */
-	split_map_pages(freelist);
-
 	/*
 	 * Record where the free scanner will restart next time. Either we
 	 * broke from the loop and set isolate_start_pfn based on the last
@@ -1247,6 +1436,10 @@ static void isolate_freepages(struct compact_control *cc)
 	 * and the loop terminated due to isolate_start_pfn < low_pfn
 	 */
 	cc->free_pfn = isolate_start_pfn;
+
+splitmap:
+	/* __isolate_free_page() does not map the pages */
+	split_map_pages(freelist);
 }
 
 /*
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH 14/25] mm, compaction: Avoid rescanning the same pageblock multiple times
  2019-01-04 12:49 [PATCH 00/25] Increase success rates and reduce latency of compaction v2 Mel Gorman
                   ` (12 preceding siblings ...)
  2019-01-04 12:49 ` [PATCH 13/25] mm, compaction: Use free lists to quickly locate a migration target Mel Gorman
@ 2019-01-04 12:50 ` Mel Gorman
  2019-01-17 15:16   ` Vlastimil Babka
  2019-01-04 12:50 ` [PATCH 15/25] mm, compaction: Finish pageblock scanning on contention Mel Gorman
                   ` (14 subsequent siblings)
  28 siblings, 1 reply; 75+ messages in thread
From: Mel Gorman @ 2019-01-04 12:50 UTC (permalink / raw)
  To: Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Vlastimil Babka, ying.huang,
	kirill, Andrew Morton, Linux List Kernel Mailing, Mel Gorman

Pageblocks are marked for skip when no pages are isolated after a scan.
However, it's possible to hit corner cases where the migration scanner
gets stuck near the boundary between the source and target scanner. Due
to pages being migrated in blocks of COMPACT_CLUSTER_MAX, pages that
are migrated can be reallocated before the pageblock is complete. The
pageblock is not necessarily skipped so it can be rescanned multiple
times. Similarly, a pageblock with some dirty/writeback pages may fail
to isolate and be rescanned until writeback completes which is wasteful.

This patch tracks if a pageblock is being rescanned. If so, then the entire
pageblock will be migrated as one operation. This narrows the race window
during which pages can be reallocated during migration. Secondly, if there
are pages that cannot be isolated then the pageblock will still be fully
scanned and marked for skipping. On the second rescan, the pageblock skip
is set and the migration scanner makes progress.
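
Condensed, the detection and the capture work as follows (sketch only;
the real code is in the diff below):

	/* compact_zone(): the migration scanner is revisiting a pageblock */
	cc->rescan = (pageblock_start_pfn(last_migrated_pfn) ==
		      pageblock_start_pfn(cc->migrate_pfn));

	/* isolate_migratepages_block(): capture the whole block on a rescan */
	if (cc->nr_migratepages == COMPACT_CLUSTER_MAX && !cc->rescan) {
		++low_pfn;
		break;
	}

	/* Mark for skipping if nothing was isolated or the block was rescanned */
	if (low_pfn == end_pfn && (!nr_isolated || cc->rescan))
		set_pageblock_skip(valid_page);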

                                        4.20.0                 4.20.0
                              finishscan-v2r15         norescan-v2r15
Amean     fault-both-3      3729.80 (   0.00%)     2872.13 *  23.00%*
Amean     fault-both-5      5148.49 (   0.00%)     4330.56 *  15.89%*
Amean     fault-both-7      7393.24 (   0.00%)     6496.63 (  12.13%)
Amean     fault-both-12    11709.32 (   0.00%)    10280.59 (  12.20%)
Amean     fault-both-18    16626.82 (   0.00%)    11079.19 *  33.37%*
Amean     fault-both-24    19944.34 (   0.00%)    17207.80 *  13.72%*
Amean     fault-both-30    23435.53 (   0.00%)    17736.13 *  24.32%*
Amean     fault-both-32    23948.70 (   0.00%)    18509.41 *  22.71%*

                                   4.20.0                 4.20.0
                         finishscan-v2r15         norescan-v2r15
Percentage huge-1         0.00 (   0.00%)        0.00 (   0.00%)
Percentage huge-3        88.39 (   0.00%)       96.87 (   9.60%)
Percentage huge-5        92.07 (   0.00%)       94.63 (   2.77%)
Percentage huge-7        91.96 (   0.00%)       93.83 (   2.03%)
Percentage huge-12       93.38 (   0.00%)       92.65 (  -0.78%)
Percentage huge-18       91.89 (   0.00%)       93.66 (   1.94%)
Percentage huge-24       91.37 (   0.00%)       93.15 (   1.95%)
Percentage huge-30       92.77 (   0.00%)       93.16 (   0.42%)
Percentage huge-32       87.97 (   0.00%)       92.58 (   5.24%)

The fault latency reduction is large and while the THP allocation
success rate is only slightly higher, it's already high at this
point of the series.

Compaction migrate scanned    60718343.00    31772603.00
Compaction free scanned      933061894.00    63267928.00

Migration scan rates are reduced by 48% and free scan rates are
also reduced as the same migration source block is not being selected
multiple times. The corner case where migration scan rates go through the
roof due to a dirty/writeback pageblock located at the boundary of the
migration/free scanner did not happen in this case. When it does happen,
the scan rates multiply by factors measured in the hundreds and would be
misleading to present.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/compaction.c | 32 ++++++++++++++++++++++++++------
 mm/internal.h   |  1 +
 2 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 9438f0564ed5..9c2cc7955446 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -959,8 +959,11 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 		cc->nr_migratepages++;
 		nr_isolated++;
 
-		/* Avoid isolating too much */
-		if (cc->nr_migratepages == COMPACT_CLUSTER_MAX) {
+		/*
+		 * Avoid isolating too much unless this block is being
+		 * rescanned (e.g. dirty/writeback pages, parallel allocation).
+		 */
+		if (cc->nr_migratepages == COMPACT_CLUSTER_MAX && !cc->rescan) {
 			++low_pfn;
 			break;
 		}
@@ -1007,11 +1010,14 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 		spin_unlock_irqrestore(zone_lru_lock(zone), flags);
 
 	/*
-	 * Update the cached scanner pfn if the pageblock was scanned
-	 * without isolating a page. The pageblock may not be marked
-	 * skipped already if there were no LRU pages in the block.
+	 * Update the cached scanner pfn once the pageblock has been scanned.
+	 * Pages will either be migrated in which case there is no point
+	 * scanning in the near future or migration failed in which case the
+	 * failure reason may persist. The block is marked for skipping if
+	 * there were no pages isolated in the block or if the block is
+	 * rescanned twice in a row.
 	 */
-	if (low_pfn == end_pfn && !nr_isolated) {
+	if (low_pfn == end_pfn && (!nr_isolated || cc->rescan)) {
 		if (valid_page && !skip_updated)
 			set_pageblock_skip(valid_page);
 		update_cached_migrate(cc, low_pfn);
@@ -2031,6 +2037,20 @@ static enum compact_result compact_zone(struct compact_control *cc)
 		int err;
 		unsigned long start_pfn = cc->migrate_pfn;
 
+		/*
+		 * Avoid multiple rescans which can happen if a page cannot be
+		 * isolated (dirty/writeback in async mode) or if the migrated
+		 * pages are being allocated before the pageblock is cleared.
+		 * The first rescan will capture the entire pageblock for
+		 * migration. If it fails, it'll be marked skip and scanning
+		 * will proceed as normal.
+		 */
+		cc->rescan = false;
+		if (pageblock_start_pfn(last_migrated_pfn) ==
+		    pageblock_start_pfn(start_pfn)) {
+			cc->rescan = true;
+		}
+
 		switch (isolate_migratepages(cc->zone, cc)) {
 		case ISOLATE_ABORT:
 			ret = COMPACT_CONTENDED;
diff --git a/mm/internal.h b/mm/internal.h
index b25b33c5dd80..e5ca2a10b8ad 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -204,6 +204,7 @@ struct compact_control {
 	bool direct_compaction;		/* False from kcompactd or /proc/... */
 	bool whole_zone;		/* Whole zone should/has been scanned */
 	bool contended;			/* Signal lock or sched contention */
+	bool rescan;			/* Rescanning the same pageblock */
 };
 
 unsigned long
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH 15/25] mm, compaction: Finish pageblock scanning on contention
  2019-01-04 12:49 [PATCH 00/25] Increase success rates and reduce latency of compaction v2 Mel Gorman
                   ` (13 preceding siblings ...)
  2019-01-04 12:50 ` [PATCH 14/25] mm, compaction: Avoid rescanning the same pageblock multiple times Mel Gorman
@ 2019-01-04 12:50 ` Mel Gorman
  2019-01-17 16:38   ` Vlastimil Babka
  2019-01-04 12:50 ` [PATCH 16/25] mm, compaction: Check early for huge pages encountered by the migration scanner Mel Gorman
                   ` (13 subsequent siblings)
  28 siblings, 1 reply; 75+ messages in thread
From: Mel Gorman @ 2019-01-04 12:50 UTC (permalink / raw)
  To: Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Vlastimil Babka, ying.huang,
	kirill, Andrew Morton, Linux List Kernel Mailing, Mel Gorman

Async migration aborts on spinlock contention but contention can be high
when there are multiple compaction attempts and kswapd is active. The
consequence is that the migration scanners move forward uselessly while
still contending on locks for longer while leaving suitable migration
sources behind.

This patch will acquire the lock but track when contention occurs. When
it does, the current pageblock will finish as compaction may succeed for
that block and then abort. This will have a variable impact on latency as
in some cases useless scanning is avoided (reduces latency) but a lock
will be contended (increases latency) or a single contended pageblock is
scanned that would otherwise have been skipped (increases latency).
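
Condensed, the pieces fit together as follows (sketch only; the real
code is in the diff below):

	/* The LRU lock is now always acquired, contention is only recorded */
	locked = compact_lock_irqsave(zone_lru_lock(zone), &flags, cc);

	/* On contention or rescan, finish the pageblock in a single pass */
	if (cc->nr_migratepages == COMPACT_CLUSTER_MAX &&
	    !cc->rescan && !cc->contended) {
		++low_pfn;
		break;
	}

	/* Contention only terminates compaction once the block is complete */
	if (cc->contended || fatal_signal_pending(current))
		ret = COMPACT_CONTENDED;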

                                        4.20.0                 4.20.0
                                norescan-v2r15    finishcontend-v2r15
Amean     fault-both-1         0.00 (   0.00%)        0.00 *   0.00%*
Amean     fault-both-3      2872.13 (   0.00%)     2973.08 (  -3.51%)
Amean     fault-both-5      4330.56 (   0.00%)     3870.19 (  10.63%)
Amean     fault-both-7      6496.63 (   0.00%)     6580.50 (  -1.29%)
Amean     fault-both-12    10280.59 (   0.00%)     9527.40 (   7.33%)
Amean     fault-both-18    11079.19 (   0.00%)    13395.86 * -20.91%*
Amean     fault-both-24    17207.80 (   0.00%)    14936.94 *  13.20%*
Amean     fault-both-30    17736.13 (   0.00%)    16748.46 (   5.57%)
Amean     fault-both-32    18509.41 (   0.00%)    18521.30 (  -0.06%)

                                   4.20.0                 4.20.0
                           norescan-v2r15    finishcontend-v2r15
Percentage huge-1         0.00 (   0.00%)        0.00 (   0.00%)
Percentage huge-3        96.87 (   0.00%)       97.57 (   0.72%)
Percentage huge-5        94.63 (   0.00%)       96.88 (   2.39%)
Percentage huge-7        93.83 (   0.00%)       95.47 (   1.74%)
Percentage huge-12       92.65 (   0.00%)       98.64 (   6.47%)
Percentage huge-18       93.66 (   0.00%)       98.33 (   4.98%)
Percentage huge-24       93.15 (   0.00%)       98.88 (   6.15%)
Percentage huge-30       93.16 (   0.00%)       97.09 (   4.21%)
Percentage huge-32       92.58 (   0.00%)       96.20 (   3.92%)

As expected, a variable impact on latency while allocation success
rates are slightly higher. System CPU usage is reduced by about 10%
but scan rate impact is mixed

Compaction migrate scanned    31772603    19980216
Compaction free scanned       63267928   120381828

Migration scan rates are reduced by 37% which is expected as a pageblock
is used by the async scanner instead of skipped but the free scanning is
increased. This can be partially accounted for by the increased success
rate but also by the fact that the scanners do not meet for longer when
pageblocks are actually used. Overall this is justified and completing
a pageblock scan is very important for later patches.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/compaction.c | 95 +++++++++++++++++++++++----------------------------------
 1 file changed, 39 insertions(+), 56 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 9c2cc7955446..608d274f9880 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -376,24 +376,25 @@ static bool test_and_set_skip(struct compact_control *cc, struct page *page,
 
 /*
  * Compaction requires the taking of some coarse locks that are potentially
- * very heavily contended. For async compaction, back out if the lock cannot
- * be taken immediately. For sync compaction, spin on the lock if needed.
+ * very heavily contended. For async compaction, trylock and record if the
+ * lock is contended. The lock will still be acquired but compaction will
+ * abort when the current block is finished regardless of success rate.
+ * Sync compaction acquires the lock.
  *
- * Returns true if the lock is held
- * Returns false if the lock is not held and compaction should abort
+ * Always returns true which makes it easier to track lock state in callers.
  */
-static bool compact_trylock_irqsave(spinlock_t *lock, unsigned long *flags,
+static bool compact_lock_irqsave(spinlock_t *lock, unsigned long *flags,
 						struct compact_control *cc)
 {
-	if (cc->mode == MIGRATE_ASYNC) {
-		if (!spin_trylock_irqsave(lock, *flags)) {
-			cc->contended = true;
-			return false;
-		}
-	} else {
-		spin_lock_irqsave(lock, *flags);
+	/* Track if the lock is contended in async mode */
+	if (cc->mode == MIGRATE_ASYNC && !cc->contended) {
+		if (spin_trylock_irqsave(lock, *flags))
+			return true;
+
+		cc->contended = true;
 	}
 
+	spin_lock_irqsave(lock, *flags);
 	return true;
 }
 
@@ -426,10 +427,8 @@ static bool compact_unlock_should_abort(spinlock_t *lock,
 	}
 
 	if (need_resched()) {
-		if (cc->mode == MIGRATE_ASYNC) {
+		if (cc->mode == MIGRATE_ASYNC)
 			cc->contended = true;
-			return true;
-		}
 		cond_resched();
 	}
 
@@ -449,10 +448,8 @@ static inline bool compact_should_abort(struct compact_control *cc)
 {
 	/* async compaction aborts if contended */
 	if (need_resched()) {
-		if (cc->mode == MIGRATE_ASYNC) {
+		if (cc->mode == MIGRATE_ASYNC)
 			cc->contended = true;
-			return true;
-		}
 
 		cond_resched();
 	}
@@ -538,18 +535,8 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
 		 * recheck as well.
 		 */
 		if (!locked) {
-			/*
-			 * The zone lock must be held to isolate freepages.
-			 * Unfortunately this is a very coarse lock and can be
-			 * heavily contended if there are parallel allocations
-			 * or parallel compactions. For async compaction do not
-			 * spin on the lock and we acquire the lock as late as
-			 * possible.
-			 */
-			locked = compact_trylock_irqsave(&cc->zone->lock,
+			locked = compact_lock_irqsave(&cc->zone->lock,
 								&flags, cc);
-			if (!locked)
-				break;
 
 			/* Recheck this is a buddy page under lock */
 			if (!PageBuddy(page))
@@ -910,15 +897,9 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 
 		/* If we already hold the lock, we can skip some rechecking */
 		if (!locked) {
-			locked = compact_trylock_irqsave(zone_lru_lock(zone),
+			locked = compact_lock_irqsave(zone_lru_lock(zone),
 								&flags, cc);
 
-			/* Allow future scanning if the lock is contended */
-			if (!locked) {
-				clear_pageblock_skip(page);
-				break;
-			}
-
 			/* Try get exclusive access under lock */
 			if (!skip_updated) {
 				skip_updated = true;
@@ -961,9 +942,12 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 
 		/*
 		 * Avoid isolating too much unless this block is being
-		 * rescanned (e.g. dirty/writeback pages, parallel allocation).
+		 * rescanned (e.g. dirty/writeback pages, parallel allocation)
+		 * or a lock is contended. For contention, isolate quickly to
+		 * potentially remove one source of contention.
 		 */
-		if (cc->nr_migratepages == COMPACT_CLUSTER_MAX && !cc->rescan) {
+		if (cc->nr_migratepages == COMPACT_CLUSTER_MAX &&
+		    !cc->rescan && !cc->contended) {
 			++low_pfn;
 			break;
 		}
@@ -1411,12 +1395,8 @@ static void isolate_freepages(struct compact_control *cc)
 		isolate_freepages_block(cc, &isolate_start_pfn, block_end_pfn,
 					freelist, false);
 
-		/*
-		 * If we isolated enough freepages, or aborted due to lock
-		 * contention, terminate.
-		 */
-		if ((cc->nr_freepages >= cc->nr_migratepages)
-							|| cc->contended) {
+		/* Are enough freepages isolated? */
+		if (cc->nr_freepages >= cc->nr_migratepages) {
 			if (isolate_start_pfn >= block_end_pfn) {
 				/*
 				 * Restart at previous pageblock if more
@@ -1458,13 +1438,8 @@ static struct page *compaction_alloc(struct page *migratepage,
 	struct compact_control *cc = (struct compact_control *)data;
 	struct page *freepage;
 
-	/*
-	 * Isolate free pages if necessary, and if we are not aborting due to
-	 * contention.
-	 */
 	if (list_empty(&cc->freepages)) {
-		if (!cc->contended)
-			isolate_freepages(cc);
+		isolate_freepages(cc);
 
 		if (list_empty(&cc->freepages))
 			return NULL;
@@ -1729,7 +1704,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 		low_pfn = isolate_migratepages_block(cc, low_pfn,
 						block_end_pfn, isolate_mode);
 
-		if (!low_pfn || cc->contended)
+		if (!low_pfn)
 			return ISOLATE_ABORT;
 
 		/*
@@ -1759,9 +1734,7 @@ static enum compact_result __compact_finished(struct compact_control *cc)
 {
 	unsigned int order;
 	const int migratetype = cc->migratetype;
-
-	if (cc->contended || fatal_signal_pending(current))
-		return COMPACT_CONTENDED;
+	int ret;
 
 	/* Compaction run completes if the migrate and free scanner meet */
 	if (compact_scanners_met(cc)) {
@@ -1796,6 +1769,7 @@ static enum compact_result __compact_finished(struct compact_control *cc)
 		return COMPACT_CONTINUE;
 
 	/* Direct compactor: Is a suitable page free? */
+	ret = COMPACT_NO_SUITABLE_PAGE;
 	for (order = cc->order; order < MAX_ORDER; order++) {
 		struct free_area *area = &cc->zone->free_area[order];
 		bool can_steal;
@@ -1835,11 +1809,15 @@ static enum compact_result __compact_finished(struct compact_control *cc)
 				return COMPACT_SUCCESS;
 			}
 
-			return COMPACT_CONTINUE;
+			ret = COMPACT_CONTINUE;
+			break;
 		}
 	}
 
-	return COMPACT_NO_SUITABLE_PAGE;
+	if (cc->contended || fatal_signal_pending(current))
+		ret = COMPACT_CONTENDED;
+
+	return ret;
 }
 
 static enum compact_result compact_finished(struct compact_control *cc)
@@ -1981,6 +1959,7 @@ static enum compact_result compact_zone(struct compact_control *cc)
 	unsigned long end_pfn = zone_end_pfn(cc->zone);
 	unsigned long last_migrated_pfn;
 	const bool sync = cc->mode != MIGRATE_ASYNC;
+	unsigned long a, b, c;
 
 	cc->migratetype = gfpflags_to_migratetype(cc->gfp_mask);
 	ret = compaction_suitable(cc->zone, cc->order, cc->alloc_flags,
@@ -2026,6 +2005,10 @@ static enum compact_result compact_zone(struct compact_control *cc)
 			cc->whole_zone = true;
 	}
 
+	a = cc->migrate_pfn;
+	b = cc->free_pfn;
+	c = (cc->free_pfn - cc->migrate_pfn) / pageblock_nr_pages;
+
 	last_migrated_pfn = 0;
 
 	trace_mm_compaction_begin(start_pfn, cc->migrate_pfn,
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH 16/25] mm, compaction: Check early for huge pages encountered by the migration scanner
  2019-01-04 12:49 [PATCH 00/25] Increase success rates and reduce latency of compaction v2 Mel Gorman
                   ` (14 preceding siblings ...)
  2019-01-04 12:50 ` [PATCH 15/25] mm, compaction: Finish pageblock scanning on contention Mel Gorman
@ 2019-01-04 12:50 ` Mel Gorman
  2019-01-17 17:01   ` Vlastimil Babka
  2019-01-04 12:50 ` [PATCH 17/25] mm, compaction: Keep cached migration PFNs synced for unusable pageblocks Mel Gorman
                   ` (12 subsequent siblings)
  28 siblings, 1 reply; 75+ messages in thread
From: Mel Gorman @ 2019-01-04 12:50 UTC (permalink / raw)
  To: Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Vlastimil Babka, ying.huang,
	kirill, Andrew Morton, Linux List Kernel Mailing, Mel Gorman

When scanning for sources or targets, PageCompound is checked for huge
pages as they can be skipped quickly but it happens relatively late after
a lot of setup and checking. This patch short-cuts the check to make it
earlier. It might still change when the lock is acquired but this has
less overhead overall. The free scanner advances but the migration scanner
does not. Typically the free scanner encounters more movable blocks that
change state over the lifetime of the system and also tends to scan more
aggressively as it's actively filling its portion of the physical address
space with data. This could change in the future but for the moment,
this worked better in practice and incurred fewer scan restarts.
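
Condensed, the migration scanner now rejects such blocks before doing any
per-page work and keeps the cached PFN moving past them (sketch only; the
real code is in the diff below):

	static bool suitable_migration_source(struct compact_control *cc,
					      struct page *page)
	{
		/* A pageblock-order compound page is never a migration source */
		if (pageblock_skip_persistent(page))
			return false;

		/* ... existing MOVABLE/async checks follow ... */
	}

	/* isolate_migratepages(): do not leave the cached PFN behind */
	if (!suitable_migration_source(cc, page)) {
		update_cached_migrate(cc, block_end_pfn);
		continue;
	}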

The impact on latency and allocation success rates is marginal but the
free scan rates are reduced by 32% and system CPU usage is reduced by
2.6%. The 2-socket results are not materially different.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/compaction.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 608d274f9880..921720f7a416 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1071,6 +1071,9 @@ static bool suitable_migration_source(struct compact_control *cc,
 {
 	int block_mt;
 
+	if (pageblock_skip_persistent(page))
+		return false;
+
 	if ((cc->mode != MIGRATE_ASYNC) || !cc->direct_compaction)
 		return true;
 
@@ -1693,12 +1696,17 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 			continue;
 
 		/*
-		 * For async compaction, also only scan in MOVABLE blocks.
-		 * Async compaction is optimistic to see if the minimum amount
-		 * of work satisfies the allocation.
+		 * For async compaction, also only scan in MOVABLE blocks
+		 * without huge pages. Async compaction is optimistic to see
+		 * if the minimum amount of work satisfies the allocation.
+		 * The cached PFN is updated as it's possible that all
+		 * remaining blocks between source and target are suitable
+		 * and the compaction scanners fail to meet.
 		 */
-		if (!suitable_migration_source(cc, page))
+		if (!suitable_migration_source(cc, page)) {
+			update_cached_migrate(cc, block_end_pfn);
 			continue;
+		}
 
 		/* Perform the isolation */
 		low_pfn = isolate_migratepages_block(cc, low_pfn,
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH 17/25] mm, compaction: Keep cached migration PFNs synced for unusable pageblocks
  2019-01-04 12:49 [PATCH 00/25] Increase success rates and reduce latency of compaction v2 Mel Gorman
                   ` (15 preceding siblings ...)
  2019-01-04 12:50 ` [PATCH 16/25] mm, compaction: Check early for huge pages encountered by the migration scanner Mel Gorman
@ 2019-01-04 12:50 ` Mel Gorman
  2019-01-17 17:17   ` Vlastimil Babka
  2019-01-04 12:50 ` [PATCH 18/25] mm, compaction: Rework compact_should_abort as compact_check_resched Mel Gorman
                   ` (11 subsequent siblings)
  28 siblings, 1 reply; 75+ messages in thread
From: Mel Gorman @ 2019-01-04 12:50 UTC (permalink / raw)
  To: Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Vlastimil Babka, ying.huang,
	kirill, Andrew Morton, Linux List Kernel Mailing, Mel Gorman

Migrate has separate cached PFNs for ASYNC and SYNC* migration on the
basis that some migrations will fail in ASYNC mode. However, if the cached
PFNs match at the start of scanning and pageblocks are skipped due to
having no isolation candidates, then the sync state does not matter.
This patch keeps matching cached PFNs in sync until a pageblock with
isolation candidates is found.
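
Condensed, the tracking looks like this (sketch only; the error handling
and gotos in the real code are omitted):

	/* Before the migration loop: only for async, and only if already equal */
	update_cached = !sync &&
		cc->zone->compact_cached_migrate_pfn[0] ==
		cc->zone->compact_cached_migrate_pfn[1];

	/* Inside the per-pageblock loop */
	switch (isolate_migratepages(cc->zone, cc)) {
	case ISOLATE_NONE:
		/* No candidates: let the sync PFN track the async one */
		if (update_cached)
			cc->zone->compact_cached_migrate_pfn[1] =
				cc->zone->compact_cached_migrate_pfn[0];
		break;
	case ISOLATE_SUCCESS:
		/* A block with candidates was found, stop keeping them in sync */
		update_cached = false;
		break;
	}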

The actual benefit is marginal given that the sync scanner following the
async scanner will often skip a number of pageblocks but it's useless
work. Any benefit depends heavily on whether the scanners restarted
recently so overall the reduction in scan rates is a mere 2.8% which
is borderline noise.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/compaction.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/mm/compaction.c b/mm/compaction.c
index 921720f7a416..be27e4fa1b40 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1967,6 +1967,7 @@ static enum compact_result compact_zone(struct compact_control *cc)
 	unsigned long end_pfn = zone_end_pfn(cc->zone);
 	unsigned long last_migrated_pfn;
 	const bool sync = cc->mode != MIGRATE_ASYNC;
+	bool update_cached;
 	unsigned long a, b, c;
 
 	cc->migratetype = gfpflags_to_migratetype(cc->gfp_mask);
@@ -2019,6 +2020,17 @@ static enum compact_result compact_zone(struct compact_control *cc)
 
 	last_migrated_pfn = 0;
 
+	/*
+	 * Migrate has separate cached PFNs for ASYNC and SYNC* migration on
+	 * the basis that some migrations will fail in ASYNC mode. However,
+	 * if the cached PFNs match and pageblocks are skipped due to having
+	 * no isolation candidates, then the sync state does not matter.
+	 * Until a pageblock with isolation candidates is found, keep the
+	 * cached PFNs in sync to avoid revisiting the same blocks.
+	 */
+	update_cached = !sync &&
+		cc->zone->compact_cached_migrate_pfn[0] == cc->zone->compact_cached_migrate_pfn[1];
+
 	trace_mm_compaction_begin(start_pfn, cc->migrate_pfn,
 				cc->free_pfn, end_pfn, sync);
 
@@ -2050,6 +2062,11 @@ static enum compact_result compact_zone(struct compact_control *cc)
 			last_migrated_pfn = 0;
 			goto out;
 		case ISOLATE_NONE:
+			if (update_cached) {
+				cc->zone->compact_cached_migrate_pfn[1] =
+					cc->zone->compact_cached_migrate_pfn[0];
+			}
+
 			/*
 			 * We haven't isolated and migrated anything, but
 			 * there might still be unflushed migrations from
@@ -2057,6 +2074,7 @@ static enum compact_result compact_zone(struct compact_control *cc)
 			 */
 			goto check_drain;
 		case ISOLATE_SUCCESS:
+			update_cached = false;
 			last_migrated_pfn = start_pfn;
 			;
 		}
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH 18/25] mm, compaction: Rework compact_should_abort as compact_check_resched
  2019-01-04 12:49 [PATCH 00/25] Increase success rates and reduce latency of compaction v2 Mel Gorman
                   ` (16 preceding siblings ...)
  2019-01-04 12:50 ` [PATCH 17/25] mm, compaction: Keep cached migration PFNs synced for unusable pageblocks Mel Gorman
@ 2019-01-04 12:50 ` Mel Gorman
  2019-01-17 17:27   ` Vlastimil Babka
  2019-01-04 12:50 ` [PATCH 19/25] mm, compaction: Do not consider a need to reschedule as contention Mel Gorman
                   ` (10 subsequent siblings)
  28 siblings, 1 reply; 75+ messages in thread
From: Mel Gorman @ 2019-01-04 12:50 UTC (permalink / raw)
  To: Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Vlastimil Babka, ying.huang,
	kirill, Andrew Morton, Linux List Kernel Mailing, Mel Gorman

With incremental changes, compact_should_abort no longer makes
any documented sense. Rename to compact_check_resched and update the
associated comments.  There is no benefit other than reducing redundant
code and making the intent slightly clearer. It could potentially be
merged with earlier patches but it just makes the review slightly
harder.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/compaction.c | 61 ++++++++++++++++++++++-----------------------------------
 1 file changed, 23 insertions(+), 38 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index be27e4fa1b40..1a41a2dbff24 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -398,6 +398,21 @@ static bool compact_lock_irqsave(spinlock_t *lock, unsigned long *flags,
 	return true;
 }
 
+/*
+ * Aside from avoiding lock contention, compaction also periodically checks
+ * need_resched() and records async compaction as contended if necessary.
+ */
+static inline void compact_check_resched(struct compact_control *cc)
+{
+	/* async compaction aborts if contended */
+	if (need_resched()) {
+		if (cc->mode == MIGRATE_ASYNC)
+			cc->contended = true;
+
+		cond_resched();
+	}
+}
+
 /*
  * Compaction requires the taking of some coarse locks that are potentially
  * very heavily contended. The lock should be periodically unlocked to avoid
@@ -426,33 +441,7 @@ static bool compact_unlock_should_abort(spinlock_t *lock,
 		return true;
 	}
 
-	if (need_resched()) {
-		if (cc->mode == MIGRATE_ASYNC)
-			cc->contended = true;
-		cond_resched();
-	}
-
-	return false;
-}
-
-/*
- * Aside from avoiding lock contention, compaction also periodically checks
- * need_resched() and either schedules in sync compaction or aborts async
- * compaction. This is similar to what compact_unlock_should_abort() does, but
- * is used where no lock is concerned.
- *
- * Returns false when no scheduling was needed, or sync compaction scheduled.
- * Returns true when async compaction should abort.
- */
-static inline bool compact_should_abort(struct compact_control *cc)
-{
-	/* async compaction aborts if contended */
-	if (need_resched()) {
-		if (cc->mode == MIGRATE_ASYNC)
-			cc->contended = true;
-
-		cond_resched();
-	}
+	compact_check_resched(cc);
 
 	return false;
 }
@@ -750,8 +739,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 			return 0;
 	}
 
-	if (compact_should_abort(cc))
-		return 0;
+	compact_check_resched(cc);
 
 	if (cc->direct_compaction && (cc->mode == MIGRATE_ASYNC)) {
 		skip_on_failure = true;
@@ -1374,12 +1362,10 @@ static void isolate_freepages(struct compact_control *cc)
 				isolate_start_pfn = block_start_pfn) {
 		/*
 		 * This can iterate a massively long zone without finding any
-		 * suitable migration targets, so periodically check if we need
-		 * to schedule, or even abort async compaction.
+		 * suitable migration targets, so periodically check resched.
 		 */
-		if (!(block_start_pfn % (SWAP_CLUSTER_MAX * pageblock_nr_pages))
-						&& compact_should_abort(cc))
-			break;
+		if (!(block_start_pfn % (SWAP_CLUSTER_MAX * pageblock_nr_pages)))
+			compact_check_resched(cc);
 
 		page = pageblock_pfn_to_page(block_start_pfn, block_end_pfn,
 									zone);
@@ -1673,11 +1659,10 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 		/*
 		 * This can potentially iterate a massively long zone with
 		 * many pageblocks unsuitable, so periodically check if we
-		 * need to schedule, or even abort async compaction.
+		 * need to schedule.
 		 */
-		if (!(low_pfn % (SWAP_CLUSTER_MAX * pageblock_nr_pages))
-						&& compact_should_abort(cc))
-			break;
+		if (!(low_pfn % (SWAP_CLUSTER_MAX * pageblock_nr_pages)))
+			compact_check_resched(cc);
 
 		page = pageblock_pfn_to_page(block_start_pfn, block_end_pfn,
 									zone);
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH 19/25] mm, compaction: Do not consider a need to reschedule as contention
  2019-01-04 12:49 [PATCH 00/25] Increase success rates and reduce latency of compaction v2 Mel Gorman
                   ` (17 preceding siblings ...)
  2019-01-04 12:50 ` [PATCH 18/25] mm, compaction: Rework compact_should_abort as compact_check_resched Mel Gorman
@ 2019-01-04 12:50 ` Mel Gorman
  2019-01-17 17:33   ` Vlastimil Babka
  2019-01-04 12:50 ` [PATCH 20/25] mm, compaction: Reduce unnecessary skipping of migration target scanner Mel Gorman
                   ` (9 subsequent siblings)
  28 siblings, 1 reply; 75+ messages in thread
From: Mel Gorman @ 2019-01-04 12:50 UTC (permalink / raw)
  To: Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Vlastimil Babka, ying.huang,
	kirill, Andrew Morton, Linux List Kernel Mailing, Mel Gorman

Scanning on large machines can take a considerable length of time and
eventually need to be rescheduled. This is treated as an abort event but
that's not appropriate as the attempt is likely to be retried after making
numerous checks and taking another cycle through the page allocator.
This patch will check the need to reschedule if necessary but continue
the scanning.

The main benefit is reduced scanning when compaction is taking a long time
or the machine is over-saturated. It also avoids an unnecessary exit of
compaction that ends up being retried by the page allocator in the outer
loop.

                                        4.20.0                 4.20.0
                              synccached-v2r15        noresched-v2r15
Amean     fault-both-3      2655.55 (   0.00%)     2736.50 (  -3.05%)
Amean     fault-both-5      4580.67 (   0.00%)     4133.70 (   9.76%)
Amean     fault-both-7      5740.50 (   0.00%)     5738.61 (   0.03%)
Amean     fault-both-12     9237.55 (   0.00%)     9392.82 (  -1.68%)
Amean     fault-both-18    12899.51 (   0.00%)    13257.15 (  -2.77%)
Amean     fault-both-24    16342.47 (   0.00%)    16859.44 (  -3.16%)
Amean     fault-both-30    20394.26 (   0.00%)    16249.30 *  20.32%*
Amean     fault-both-32    17450.76 (   0.00%)    14904.71 *  14.59%*

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/compaction.c | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 1a41a2dbff24..75eb0d40d4d7 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -398,19 +398,11 @@ static bool compact_lock_irqsave(spinlock_t *lock, unsigned long *flags,
 	return true;
 }
 
-/*
- * Aside from avoiding lock contention, compaction also periodically checks
- * need_resched() and records async compaction as contended if necessary.
- */
+/* Avoid soft-lockups due to long scan times */
 static inline void compact_check_resched(struct compact_control *cc)
 {
-	/* async compaction aborts if contended */
-	if (need_resched()) {
-		if (cc->mode == MIGRATE_ASYNC)
-			cc->contended = true;
-
+	if (need_resched())
 		cond_resched();
-	}
 }
 
 /*
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH 20/25] mm, compaction: Reduce unnecessary skipping of migration target scanner
  2019-01-04 12:49 [PATCH 00/25] Increase success rates and reduce latency of compaction v2 Mel Gorman
                   ` (18 preceding siblings ...)
  2019-01-04 12:50 ` [PATCH 19/25] mm, compaction: Do not consider a need to reschedule as contention Mel Gorman
@ 2019-01-04 12:50 ` Mel Gorman
  2019-01-17 17:58   ` Vlastimil Babka
  2019-01-04 12:50 ` [PATCH 21/25] mm, compaction: Round-robin the order while searching the free lists for a target Mel Gorman
                   ` (8 subsequent siblings)
  28 siblings, 1 reply; 75+ messages in thread
From: Mel Gorman @ 2019-01-04 12:50 UTC (permalink / raw)
  To: Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Vlastimil Babka, ying.huang,
	kirill, Andrew Morton, Linux List Kernel Mailing, Mel Gorman

The fast isolation of pages can move the scanner faster than is necessary
depending on the contents of the free list. This patch will only allow
the fast isolation to initialise the scanner and advance it slowly. The
primary means of moving the scanner forward is via the linear scanner
to reduce the likelihood the migration source/target scanners meet
prematurely triggering a rescan.
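
The two key changes, condensed (sketch only; the real code is in the diff
below):

	/*
	 * fast_isolate_freepages(): cache one pageblock back so the linear
	 * scanner revisits the block the fast search stopped in.
	 */
	if (highest && highest >= cc->zone->compact_cached_free_pfn) {
		highest -= pageblock_nr_pages;
		cc->zone->compact_cached_free_pfn = highest;
	}

	/*
	 * isolate_freepages(): skip hints are now set by the linear scanner
	 * once it has completed a full pageblock.
	 */
	if (isolate_start_pfn == block_end_pfn)
		update_pageblock_skip(cc, page, block_start_pfn);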

                                        4.20.0                 4.20.0
                               noresched-v2r15         slowfree-v2r15
Amean     fault-both-1         0.00 (   0.00%)        0.00 *   0.00%*
Amean     fault-both-3      2736.50 (   0.00%)     2512.53 (   8.18%)
Amean     fault-both-5      4133.70 (   0.00%)     4159.43 (  -0.62%)
Amean     fault-both-7      5738.61 (   0.00%)     5950.15 (  -3.69%)
Amean     fault-both-12     9392.82 (   0.00%)     8674.38 (   7.65%)
Amean     fault-both-18    13257.15 (   0.00%)    12850.79 (   3.07%)
Amean     fault-both-24    16859.44 (   0.00%)    17242.86 (  -2.27%)
Amean     fault-both-30    16249.30 (   0.00%)    19404.18 * -19.42%*
Amean     fault-both-32    14904.71 (   0.00%)    16200.79 (  -8.70%)

The impact to latency, success rates and scan rates is marginal but
avoiding unnecessary restarts is important. It helps later patches that
are more careful about how pageblocks are treated as earlier iterations
of those patches hit corner cases where the restarts were punishing and
very visible.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/compaction.c | 27 ++++++++++-----------------
 1 file changed, 10 insertions(+), 17 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 75eb0d40d4d7..6c5552c6d8f9 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -324,10 +324,9 @@ static void update_cached_migrate(struct compact_control *cc, unsigned long pfn)
  * future. The information is later cleared by __reset_isolation_suitable().
  */
 static void update_pageblock_skip(struct compact_control *cc,
-			struct page *page, unsigned long nr_isolated)
+			struct page *page, unsigned long pfn)
 {
 	struct zone *zone = cc->zone;
-	unsigned long pfn;
 
 	if (cc->no_set_skip_hint)
 		return;
@@ -335,13 +334,8 @@ static void update_pageblock_skip(struct compact_control *cc,
 	if (!page)
 		return;
 
-	if (nr_isolated)
-		return;
-
 	set_pageblock_skip(page);
 
-	pfn = page_to_pfn(page);
-
 	/* Update where async and sync compaction should restart */
 	if (pfn < zone->compact_cached_free_pfn)
 		zone->compact_cached_free_pfn = pfn;
@@ -359,7 +353,7 @@ static inline bool pageblock_skip_persistent(struct page *page)
 }
 
 static inline void update_pageblock_skip(struct compact_control *cc,
-			struct page *page, unsigned long nr_isolated)
+			struct page *page, unsigned long pfn)
 {
 }
 
@@ -450,7 +444,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
 				bool strict)
 {
 	int nr_scanned = 0, total_isolated = 0;
-	struct page *cursor, *valid_page = NULL;
+	struct page *cursor;
 	unsigned long flags = 0;
 	bool locked = false;
 	unsigned long blockpfn = *start_pfn;
@@ -477,9 +471,6 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
 		if (!pfn_valid_within(blockpfn))
 			goto isolate_fail;
 
-		if (!valid_page)
-			valid_page = page;
-
 		/*
 		 * For compound pages such as THP and hugetlbfs, we can save
 		 * potentially a lot of iterations if we skip them at once.
@@ -576,10 +567,6 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
 	if (strict && blockpfn < end_pfn)
 		total_isolated = 0;
 
-	/* Update the pageblock-skip if the whole pageblock was scanned */
-	if (blockpfn == end_pfn)
-		update_pageblock_skip(cc, valid_page, total_isolated);
-
 	cc->total_free_scanned += nr_scanned;
 	if (total_isolated)
 		count_compact_events(COMPACTISOLATED, total_isolated);
@@ -1295,8 +1282,10 @@ fast_isolate_freepages(struct compact_control *cc)
 		}
 	}
 
-	if (highest && highest > cc->zone->compact_cached_free_pfn)
+	if (highest && highest >= cc->zone->compact_cached_free_pfn) {
+		highest -= pageblock_nr_pages;
 		cc->zone->compact_cached_free_pfn = highest;
+	}
 
 	cc->total_free_scanned += nr_scanned;
 	if (!page)
@@ -1376,6 +1365,10 @@ static void isolate_freepages(struct compact_control *cc)
 		isolate_freepages_block(cc, &isolate_start_pfn, block_end_pfn,
 					freelist, false);
 
+		/* Update the skip hint if the full pageblock was scanned */
+		if (isolate_start_pfn == block_end_pfn)
+			update_pageblock_skip(cc, page, block_start_pfn);
+
 		/* Are enough freepages isolated? */
 		if (cc->nr_freepages >= cc->nr_migratepages) {
 			if (isolate_start_pfn >= block_end_pfn) {
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH 21/25] mm, compaction: Round-robin the order while searching the free lists for a target
  2019-01-04 12:49 [PATCH 00/25] Increase success rates and reduce latency of compaction v2 Mel Gorman
                   ` (19 preceding siblings ...)
  2019-01-04 12:50 ` [PATCH 20/25] mm, compaction: Reduce unnecessary skipping of migration target scanner Mel Gorman
@ 2019-01-04 12:50 ` Mel Gorman
  2019-01-18  9:17   ` Vlastimil Babka
  2019-01-04 12:50 ` [PATCH 22/25] mm, compaction: Sample pageblocks for free pages Mel Gorman
                   ` (7 subsequent siblings)
  28 siblings, 1 reply; 75+ messages in thread
From: Mel Gorman @ 2019-01-04 12:50 UTC (permalink / raw)
  To: Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Vlastimil Babka, ying.huang,
	kirill, Andrew Morton, Linux List Kernel Mailing, Mel Gorman

As compaction proceeds and creates high-order blocks, the free list
search gets less efficient as the larger blocks are used as compaction
targets. Eventually, the larger blocks will be behind the migration
scanner for partially migrated pageblocks and the search fails. This
patch round-robins what orders are searched so that larger blocks can be
ignored and smaller blocks found that can be used as migration targets.
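
For clarity, this is the round-robin helper from the diff below with the
wrap-around behaviour annotated (the code is unchanged, only comments are
added):

	static int next_search_order(struct compact_control *cc, int order)
	{
		order--;
		if (order < 0)
			order = cc->order - 1;	/* wrap around to the highest order */

		/* A full lap from the starting order means the search failed */
		if (order == cc->search_order) {
			cc->search_order--;	/* begin lower on the next attempt */
			if (cc->search_order < 0)
				cc->search_order = cc->order - 1;
			return -1;		/* terminates the caller's loop */
		}

		return order;
	}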

The overall impact was small on 1-socket but it avoids corner cases where
the migration/free scanners meet prematurely or situations where many of
the pageblocks encountered by the free scanner are almost full instead of
being properly packed. Previous testing indicated that without this patch
there were occasional large spikes in the free scanner. By coincidence,
the 2-socket results showed a 54% reduction in the free scanner but this
will not be universally true.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/compaction.c | 33 ++++++++++++++++++++++++++++++---
 mm/internal.h   |  3 ++-
 2 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 6c5552c6d8f9..652e249168b1 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1154,6 +1154,24 @@ fast_isolate_around(struct compact_control *cc, unsigned long pfn, unsigned long
 		set_pageblock_skip(page);
 }
 
+/* Search orders in round-robin fashion */
+static int next_search_order(struct compact_control *cc, int order)
+{
+	order--;
+	if (order < 0)
+		order = cc->order - 1;
+
+	/* Search wrapped around? */
+	if (order == cc->search_order) {
+		cc->search_order--;
+		if (cc->search_order < 0)
+			cc->search_order = cc->order - 1;
+		return -1;
+	}
+
+	return order;
+}
+
 static unsigned long
 fast_isolate_freepages(struct compact_control *cc)
 {
@@ -1186,9 +1204,15 @@ fast_isolate_freepages(struct compact_control *cc)
 	if (WARN_ON_ONCE(min_pfn > low_pfn))
 		low_pfn = min_pfn;
 
-	for (order = cc->order - 1;
-	     order >= 0 && !page;
-	     order--) {
+	/*
+	 * Search starts from the last successful isolation order or the next
+	 * order to search after a previous failure
+	 */
+	cc->search_order = min_t(unsigned int, cc->order - 1, cc->search_order);
+
+	for (order = cc->search_order;
+	     !page && order >= 0;
+	     order = next_search_order(cc, order)) {
 		struct free_area *area = &cc->zone->free_area[order];
 		struct list_head *freelist;
 		struct page *freepage;
@@ -1211,6 +1235,7 @@ fast_isolate_freepages(struct compact_control *cc)
 
 			if (pfn >= low_pfn) {
 				cc->fast_search_fail = 0;
+				cc->search_order = order;
 				page = freepage;
 				break;
 			}
@@ -2146,6 +2171,7 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
 		.total_migrate_scanned = 0,
 		.total_free_scanned = 0,
 		.order = order,
+		.search_order = order,
 		.gfp_mask = gfp_mask,
 		.zone = zone,
 		.mode = (prio == COMPACT_PRIO_ASYNC) ?
@@ -2385,6 +2411,7 @@ static void kcompactd_do_work(pg_data_t *pgdat)
 	struct zone *zone;
 	struct compact_control cc = {
 		.order = pgdat->kcompactd_max_order,
+		.search_order = pgdat->kcompactd_max_order,
 		.total_migrate_scanned = 0,
 		.total_free_scanned = 0,
 		.classzone_idx = pgdat->kcompactd_classzone_idx,
diff --git a/mm/internal.h b/mm/internal.h
index e5ca2a10b8ad..d028abd8a8f3 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -191,7 +191,8 @@ struct compact_control {
 	struct zone *zone;
 	unsigned long total_migrate_scanned;
 	unsigned long total_free_scanned;
-	unsigned int fast_search_fail;	/* failures to use free list searches */
+	unsigned short fast_search_fail;/* failures to use free list searches */
+	unsigned short search_order;	/* order to start a fast search at */
 	const gfp_t gfp_mask;		/* gfp mask of a direct compactor */
 	int order;			/* order a direct compactor needs */
 	int migratetype;		/* migratetype of direct compactor */
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH 22/25] mm, compaction: Sample pageblocks for free pages
  2019-01-04 12:49 [PATCH 00/25] Increase success rates and reduce latency of compaction v2 Mel Gorman
                   ` (20 preceding siblings ...)
  2019-01-04 12:50 ` [PATCH 21/25] mm, compaction: Round-robin the order while searching the free lists for a target Mel Gorman
@ 2019-01-04 12:50 ` Mel Gorman
  2019-01-18 10:38   ` Vlastimil Babka
  2019-01-04 12:50 ` [PATCH 23/25] mm, compaction: Be selective about what pageblocks to clear skip hints Mel Gorman
                   ` (6 subsequent siblings)
  28 siblings, 1 reply; 75+ messages in thread
From: Mel Gorman @ 2019-01-04 12:50 UTC (permalink / raw)
  To: Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Vlastimil Babka, ying.huang,
	kirill, Andrew Morton, Linux List Kernel Mailing, Mel Gorman

Once fast searching finishes, there is a possibility that the linear
scanner is scanning full blocks found by the fast scanner earlier. This
patch uses an adaptive stride to sample pageblocks for free pages. The
more consecutive full pageblocks encountered, the larger the stride until
a pageblock with free pages is found. The scanners might meet slightly
sooner but it is an acceptable risk given that the search of the free
lists may still encounter the pages and adjust the cached PFN of the free
scanner accordingly.

In terms of latency and success rates, the impact is not obvious but the
free scan rate is reduced by 87% on a 1-socket machine and 92% on a
2-socket machine. It's also the first time in the series where the number
of pages scanned by the migration scanner is greater than the free scanner
due to the increased search efficiency.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/compaction.c | 27 +++++++++++++++++++++------
 1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 652e249168b1..cc532e81a7b7 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -441,6 +441,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
 				unsigned long *start_pfn,
 				unsigned long end_pfn,
 				struct list_head *freelist,
+				unsigned int stride,
 				bool strict)
 {
 	int nr_scanned = 0, total_isolated = 0;
@@ -450,10 +451,14 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
 	unsigned long blockpfn = *start_pfn;
 	unsigned int order;
 
+	/* Strict mode is for isolation, speed is secondary */
+	if (strict)
+		stride = 1;
+
 	cursor = pfn_to_page(blockpfn);
 
 	/* Isolate free pages. */
-	for (; blockpfn < end_pfn; blockpfn++, cursor++) {
+	for (; blockpfn < end_pfn; blockpfn += stride, cursor += stride) {
 		int isolated;
 		struct page *page = cursor;
 
@@ -624,7 +629,7 @@ isolate_freepages_range(struct compact_control *cc,
 			break;
 
 		isolated = isolate_freepages_block(cc, &isolate_start_pfn,
-						block_end_pfn, &freelist, true);
+					block_end_pfn, &freelist, 0, true);
 
 		/*
 		 * In strict mode, isolate_freepages_block() returns 0 if
@@ -1139,7 +1144,7 @@ fast_isolate_around(struct compact_control *cc, unsigned long pfn, unsigned long
 
 	/* Scan before */
 	if (start_pfn != pfn) {
-		isolate_freepages_block(cc, &start_pfn, pfn, &cc->freepages, false);
+		isolate_freepages_block(cc, &start_pfn, pfn, &cc->freepages, 1, false);
 		if (cc->nr_freepages >= cc->nr_migratepages)
 			return;
 	}
@@ -1147,7 +1152,7 @@ fast_isolate_around(struct compact_control *cc, unsigned long pfn, unsigned long
 	/* Scan after */
 	start_pfn = pfn + nr_isolated;
 	if (start_pfn != end_pfn)
-		isolate_freepages_block(cc, &start_pfn, end_pfn, &cc->freepages, false);
+		isolate_freepages_block(cc, &start_pfn, end_pfn, &cc->freepages, 1, false);
 
 	/* Skip this pageblock in the future as it's full or nearly full */
 	if (cc->nr_freepages < cc->nr_migratepages)
@@ -1333,7 +1338,9 @@ static void isolate_freepages(struct compact_control *cc)
 	unsigned long isolate_start_pfn; /* exact pfn we start at */
 	unsigned long block_end_pfn;	/* end of current pageblock */
 	unsigned long low_pfn;	     /* lowest pfn scanner is able to scan */
+	unsigned long nr_isolated;
 	struct list_head *freelist = &cc->freepages;
+	unsigned int stride;
 
 	/* Try a small search of the free lists for a candidate */
 	isolate_start_pfn = fast_isolate_freepages(cc);
@@ -1356,6 +1363,7 @@ static void isolate_freepages(struct compact_control *cc)
 	block_end_pfn = min(block_start_pfn + pageblock_nr_pages,
 						zone_end_pfn(zone));
 	low_pfn = pageblock_end_pfn(cc->migrate_pfn);
+	stride = cc->mode == MIGRATE_ASYNC ? COMPACT_CLUSTER_MAX : 1;
 
 	/*
 	 * Isolate free pages until enough are available to migrate the
@@ -1387,8 +1395,8 @@ static void isolate_freepages(struct compact_control *cc)
 			continue;
 
 		/* Found a block suitable for isolating free pages from. */
-		isolate_freepages_block(cc, &isolate_start_pfn, block_end_pfn,
-					freelist, false);
+		nr_isolated = isolate_freepages_block(cc, &isolate_start_pfn,
+					block_end_pfn, freelist, stride, false);
 
 		/* Update the skip hint if the full pageblock was scanned */
 		if (isolate_start_pfn == block_end_pfn)
@@ -1412,6 +1420,13 @@ static void isolate_freepages(struct compact_control *cc)
 			 */
 			break;
 		}
+
+		/* Adjust stride depending on isolation */
+		if (nr_isolated) {
+			stride = 1;
+			continue;
+		}
+		stride = min_t(unsigned int, COMPACT_CLUSTER_MAX, stride << 1);
 	}
 
 	/*
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH 23/25] mm, compaction: Be selective about what pageblocks to clear skip hints
  2019-01-04 12:49 [PATCH 00/25] Increase success rates and reduce latency of compaction v2 Mel Gorman
                   ` (21 preceding siblings ...)
  2019-01-04 12:50 ` [PATCH 22/25] mm, compaction: Sample pageblocks for free pages Mel Gorman
@ 2019-01-04 12:50 ` Mel Gorman
  2019-01-18 12:55   ` Vlastimil Babka
  2019-01-04 12:50 ` [PATCH 24/25] mm, compaction: Capture a page under direct compaction Mel Gorman
                   ` (5 subsequent siblings)
  28 siblings, 1 reply; 75+ messages in thread
From: Mel Gorman @ 2019-01-04 12:50 UTC (permalink / raw)
  To: Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Vlastimil Babka, ying.huang,
	kirill, Andrew Morton, Linux List Kernel Mailing, Mel Gorman

Pageblock hints are cleared when compaction restarts or kswapd makes enough
progress that it can sleep but it's over-eager in that the bit is cleared
for migration sources with no LRU pages and migration targets with no free
pages. As pageblock skip hint flushes are relatively rare and out-of-band
with respect to kswapd, this patch makes a few more expensive checks to
see if it's appropriate to even clear the bit. Every pageblock that is
not cleared will avoid 512 pages being scanned unnecessarily on x86-64.
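(For reference, the 512 comes from the pageblock size: with 4K base pages
and 2M huge pages, pageblock_order is 9, so one pageblock covers
1 << 9 = 512 base pages.)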

The impact is variable with different workloads showing small differences
in latency, success rates and scan rates. This is expected as clearing
the hints is not that common but doing a small amount of work out-of-band
to avoid a large amount of work in-band later is generally a good thing.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 include/linux/mmzone.h |   2 +
 mm/compaction.c        | 119 +++++++++++++++++++++++++++++++++++++++++--------
 2 files changed, 102 insertions(+), 19 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index cc4a507d7ca4..faa1e6523f49 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -480,6 +480,8 @@ struct zone {
 	unsigned long		compact_cached_free_pfn;
 	/* pfn where async and sync compaction migration scanner should start */
 	unsigned long		compact_cached_migrate_pfn[2];
+	unsigned long		compact_init_migrate_pfn;
+	unsigned long		compact_init_free_pfn;
 #endif
 
 #ifdef CONFIG_COMPACTION
diff --git a/mm/compaction.c b/mm/compaction.c
index cc532e81a7b7..7f316e1a7275 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -231,6 +231,62 @@ static bool pageblock_skip_persistent(struct page *page)
 	return false;
 }
 
+static bool
+__reset_isolation_pfn(struct zone *zone, unsigned long pfn, bool check_source,
+							bool check_target)
+{
+	struct page *page = pfn_to_online_page(pfn);
+	struct page *end_page;
+
+	if (!page)
+		return false;
+	if (zone != page_zone(page))
+		return false;
+	if (pageblock_skip_persistent(page))
+		return false;
+
+	/*
+	 * If skip is already cleared do no further checking once the
+	 * restart points have been set.
+	 */
+	if (check_source && check_target && !get_pageblock_skip(page))
+		return true;
+
+	/*
+	 * If clearing skip for the target scanner, do not select a
+	 * non-movable pageblock as the starting point.
+	 */
+	if (!check_source && check_target &&
+	    get_pageblock_migratetype(page) != MIGRATE_MOVABLE)
+		return false;
+
+	/*
+	 * Only clear the hint if a sample indicates there is either a
+	 * free page or an LRU page in the block. One or other condition
+	 * is necessary for the block to be a migration source/target.
+	 */
+	page = pfn_to_page(pageblock_start_pfn(pfn));
+	if (zone != page_zone(page))
+		return false;
+	end_page = page + pageblock_nr_pages;
+
+	do {
+		if (check_source && PageLRU(page)) {
+			clear_pageblock_skip(page);
+			return true;
+		}
+
+		if (check_target && PageBuddy(page)) {
+			clear_pageblock_skip(page);
+			return true;
+		}
+
+		page += (1 << PAGE_ALLOC_COSTLY_ORDER);
+	} while (page < end_page);
+
+	return false;
+}
+
 /*
  * This function is called to clear all cached information on pageblocks that
  * should be skipped for page isolation when the migrate and free page scanner
@@ -238,30 +294,54 @@ static bool pageblock_skip_persistent(struct page *page)
  */
 static void __reset_isolation_suitable(struct zone *zone)
 {
-	unsigned long start_pfn = zone->zone_start_pfn;
-	unsigned long end_pfn = zone_end_pfn(zone);
-	unsigned long pfn;
+	unsigned long migrate_pfn = zone->zone_start_pfn;
+	unsigned long free_pfn = zone_end_pfn(zone);
+	unsigned long reset_migrate = free_pfn;
+	unsigned long reset_free = migrate_pfn;
+	bool source_set = false;
+	bool free_set = false;
 
-	zone->compact_blockskip_flush = false;
+	if (!zone->compact_blockskip_flush)
+		return;
 
-	/* Walk the zone and mark every pageblock as suitable for isolation */
-	for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
-		struct page *page;
+	zone->compact_blockskip_flush = false;
 
+	/*
+	 * Walk the zone and update pageblock skip information. Source looks
+	 * for PageLRU while target looks for PageBuddy. When the scanner
+	 * is found, both PageBuddy and PageLRU are checked as the pageblock
+	 * is suitable as both source and target.
+	 */
+	for (; migrate_pfn < free_pfn; migrate_pfn += pageblock_nr_pages,
+					free_pfn -= pageblock_nr_pages) {
 		cond_resched();
 
-		page = pfn_to_online_page(pfn);
-		if (!page)
-			continue;
-		if (zone != page_zone(page))
-			continue;
-		if (pageblock_skip_persistent(page))
-			continue;
+		/* Update the migrate PFN */
+		if (__reset_isolation_pfn(zone, migrate_pfn, true, source_set) &&
+		    migrate_pfn < reset_migrate) {
+			source_set = true;
+			reset_migrate = migrate_pfn;
+			zone->compact_init_migrate_pfn = reset_migrate;
+			zone->compact_cached_migrate_pfn[0] = reset_migrate;
+			zone->compact_cached_migrate_pfn[1] = reset_migrate;
+		}
 
-		clear_pageblock_skip(page);
+		/* Update the free PFN */
+		if (__reset_isolation_pfn(zone, free_pfn, free_set, true) &&
+		    free_pfn > reset_free) {
+			free_set = true;
+			reset_free = free_pfn;
+			zone->compact_init_free_pfn = reset_free;
+			zone->compact_cached_free_pfn = reset_free;
+		}
 	}
 
-	reset_cached_positions(zone);
+	/* Leave no distance if no suitable block was reset */
+	if (reset_migrate >= reset_free) {
+		zone->compact_cached_migrate_pfn[0] = migrate_pfn;
+		zone->compact_cached_migrate_pfn[1] = migrate_pfn;
+		zone->compact_cached_free_pfn = free_pfn;
+	}
 }
 
 void reset_isolation_suitable(pg_data_t *pgdat)
@@ -1193,7 +1273,7 @@ fast_isolate_freepages(struct compact_control *cc)
 	 * If starting the scan, use a deeper search and use the highest
 	 * PFN found if a suitable one is not found.
 	 */
-	if (cc->free_pfn == pageblock_start_pfn(zone_end_pfn(cc->zone) - 1)) {
+	if (cc->free_pfn >= cc->zone->compact_init_free_pfn) {
 		limit = pageblock_nr_pages >> 1;
 		scan_start = true;
 	}
@@ -1338,7 +1418,6 @@ static void isolate_freepages(struct compact_control *cc)
 	unsigned long isolate_start_pfn; /* exact pfn we start at */
 	unsigned long block_end_pfn;	/* end of current pageblock */
 	unsigned long low_pfn;	     /* lowest pfn scanner is able to scan */
-	unsigned long nr_isolated;
 	struct list_head *freelist = &cc->freepages;
 	unsigned int stride;
 
@@ -1374,6 +1453,8 @@ static void isolate_freepages(struct compact_control *cc)
 				block_end_pfn = block_start_pfn,
 				block_start_pfn -= pageblock_nr_pages,
 				isolate_start_pfn = block_start_pfn) {
+		unsigned long nr_isolated;
+
 		/*
 		 * This can iterate a massively long zone without finding any
 		 * suitable migration targets, so periodically check resched.
@@ -2020,7 +2101,7 @@ static enum compact_result compact_zone(struct compact_control *cc)
 			cc->zone->compact_cached_migrate_pfn[1] = cc->migrate_pfn;
 		}
 
-		if (cc->migrate_pfn == start_pfn)
+		if (cc->migrate_pfn <= cc->zone->compact_init_migrate_pfn)
 			cc->whole_zone = true;
 	}
 
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH 24/25] mm, compaction: Capture a page under direct compaction
  2019-01-04 12:49 [PATCH 00/25] Increase success rates and reduce latency of compaction v2 Mel Gorman
                   ` (22 preceding siblings ...)
  2019-01-04 12:50 ` [PATCH 23/25] mm, compaction: Be selective about what pageblocks to clear skip hints Mel Gorman
@ 2019-01-04 12:50 ` Mel Gorman
  2019-01-18 13:40   ` Vlastimil Babka
  2019-01-04 12:50 ` [PATCH 25/25] mm, compaction: Do not direct compact remote memory Mel Gorman
                   ` (4 subsequent siblings)
  28 siblings, 1 reply; 75+ messages in thread
From: Mel Gorman @ 2019-01-04 12:50 UTC (permalink / raw)
  To: Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Vlastimil Babka, ying.huang,
	kirill, Andrew Morton, Linux List Kernel Mailing, Mel Gorman

Compaction is inherently race-prone as a suitable page freed during
compaction can be allocated by any parallel task. This patch uses a
capture_control structure to isolate a page immediately when it is freed
by a direct compactor in the slow path of the page allocator. The intent
is to avoid redundant scanning.

                                        4.20.0                 4.20.0
                               selective-v2r15          capture-v2r15
Amean     fault-both-1         0.00 (   0.00%)        0.00 *   0.00%*
Amean     fault-both-3      2624.85 (   0.00%)     2594.49 (   1.16%)
Amean     fault-both-5      3842.66 (   0.00%)     4088.32 (  -6.39%)
Amean     fault-both-7      5459.47 (   0.00%)     5936.54 (  -8.74%)
Amean     fault-both-12     9276.60 (   0.00%)    10160.85 (  -9.53%)
Amean     fault-both-18    14030.73 (   0.00%)    13908.92 (   0.87%)
Amean     fault-both-24    13298.10 (   0.00%)    16819.86 * -26.48%*
Amean     fault-both-30    17648.62 (   0.00%)    17901.74 (  -1.43%)
Amean     fault-both-32    19161.67 (   0.00%)    18621.32 (   2.82%)

Latency is only moderately affected but the devil is in the details.
A closer examination indicates that base page fault latency is much
reduced but the latency of huge pages is increased as it takes greater
care to succeed. Part of the "problem" is that allocation success rates
are close to 100% even when under pressure and compaction gets harder.

                                   4.20.0                 4.20.0
                          selective-v2r15          capture-v2r15
Percentage huge-1         0.00 (   0.00%)        0.00 (   0.00%)
Percentage huge-3        99.95 (   0.00%)       99.98 (   0.03%)
Percentage huge-5        98.83 (   0.00%)       98.01 (  -0.84%)
Percentage huge-7        96.78 (   0.00%)       98.30 (   1.58%)
Percentage huge-12       98.85 (   0.00%)       97.76 (  -1.10%)
Percentage huge-18       97.52 (   0.00%)       99.05 (   1.57%)
Percentage huge-24       97.07 (   0.00%)       99.34 (   2.35%)
Percentage huge-30       96.59 (   0.00%)       99.08 (   2.58%)
Percentage huge-32       95.94 (   0.00%)       99.03 (   3.22%)

And scan rates are reduced, as expected, by 10% for the migration
scanner and 37% for the free scanner, indicating that there is
less redundant work.

Compaction migrate scanned    20338945.00    18133661.00
Compaction free scanned       12590377.00     7986174.00

The impact on 2-socket is much larger albeit not presented. Under
a different workload that fragments heavily, the allocation latency
is reduced by 26% while the success rate goes from 63% to 80%.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 include/linux/compaction.h |  3 ++-
 include/linux/sched.h      |  4 ++++
 kernel/sched/core.c        |  3 +++
 mm/compaction.c            | 31 +++++++++++++++++++------
 mm/internal.h              |  9 +++++++
 mm/page_alloc.c            | 58 ++++++++++++++++++++++++++++++++++++++++++----
 6 files changed, 96 insertions(+), 12 deletions(-)

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index 68250a57aace..b0d530cf46d1 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -95,7 +95,8 @@ extern int sysctl_compact_unevictable_allowed;
 extern int fragmentation_index(struct zone *zone, unsigned int order);
 extern enum compact_result try_to_compact_pages(gfp_t gfp_mask,
 		unsigned int order, unsigned int alloc_flags,
-		const struct alloc_context *ac, enum compact_priority prio);
+		const struct alloc_context *ac, enum compact_priority prio,
+		struct page **page);
 extern void reset_isolation_suitable(pg_data_t *pgdat);
 extern enum compact_result compaction_suitable(struct zone *zone, int order,
 		unsigned int alloc_flags, int classzone_idx);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 89541d248893..f5ac0cf9cc32 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -47,6 +47,7 @@ struct pid_namespace;
 struct pipe_inode_info;
 struct rcu_node;
 struct reclaim_state;
+struct capture_control;
 struct robust_list_head;
 struct sched_attr;
 struct sched_param;
@@ -964,6 +965,9 @@ struct task_struct {
 
 	struct io_context		*io_context;
 
+#ifdef CONFIG_COMPACTION
+	struct capture_control		*capture_control;
+#endif
 	/* Ptrace state: */
 	unsigned long			ptrace_message;
 	kernel_siginfo_t		*last_siginfo;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f66920173370..ef478b0daa45 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2177,6 +2177,9 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
 	INIT_HLIST_HEAD(&p->preempt_notifiers);
 #endif
 
+#ifdef CONFIG_COMPACTION
+	p->capture_control = NULL;
+#endif
 	init_numa_balancing(clone_flags, p);
 }
 
diff --git a/mm/compaction.c b/mm/compaction.c
index 7f316e1a7275..ae70be023b21 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2051,7 +2051,8 @@ bool compaction_zonelist_suitable(struct alloc_context *ac, int order,
 	return false;
 }
 
-static enum compact_result compact_zone(struct compact_control *cc)
+static enum compact_result
+compact_zone(struct compact_control *cc, struct capture_control *capc)
 {
 	enum compact_result ret;
 	unsigned long start_pfn = cc->zone->zone_start_pfn;
@@ -2225,6 +2226,11 @@ static enum compact_result compact_zone(struct compact_control *cc)
 			}
 		}
 
+		/* Stop if a page has been captured */
+		if (capc && capc->page) {
+			ret = COMPACT_SUCCESS;
+			break;
+		}
 	}
 
 out:
@@ -2258,7 +2264,8 @@ static enum compact_result compact_zone(struct compact_control *cc)
 
 static enum compact_result compact_zone_order(struct zone *zone, int order,
 		gfp_t gfp_mask, enum compact_priority prio,
-		unsigned int alloc_flags, int classzone_idx)
+		unsigned int alloc_flags, int classzone_idx,
+		struct page **capture)
 {
 	enum compact_result ret;
 	struct compact_control cc = {
@@ -2279,14 +2286,24 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
 		.ignore_skip_hint = (prio == MIN_COMPACT_PRIORITY),
 		.ignore_block_suitable = (prio == MIN_COMPACT_PRIORITY)
 	};
+	struct capture_control capc = {
+		.cc = &cc,
+		.page = NULL,
+	};
+
+	if (capture)
+		current->capture_control = &capc;
 	INIT_LIST_HEAD(&cc.freepages);
 	INIT_LIST_HEAD(&cc.migratepages);
 
-	ret = compact_zone(&cc);
+	ret = compact_zone(&cc, &capc);
 
 	VM_BUG_ON(!list_empty(&cc.freepages));
 	VM_BUG_ON(!list_empty(&cc.migratepages));
 
+	*capture = capc.page;
+	current->capture_control = NULL;
+
 	return ret;
 }
 
@@ -2304,7 +2321,7 @@ int sysctl_extfrag_threshold = 500;
  */
 enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
 		unsigned int alloc_flags, const struct alloc_context *ac,
-		enum compact_priority prio)
+		enum compact_priority prio, struct page **capture)
 {
 	int may_perform_io = gfp_mask & __GFP_IO;
 	struct zoneref *z;
@@ -2332,7 +2349,7 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
 		}
 
 		status = compact_zone_order(zone, order, gfp_mask, prio,
-					alloc_flags, ac_classzone_idx(ac));
+				alloc_flags, ac_classzone_idx(ac), capture);
 		rc = max(status, rc);
 
 		/* The allocation should succeed, stop compacting */
@@ -2400,7 +2417,7 @@ static void compact_node(int nid)
 		INIT_LIST_HEAD(&cc.freepages);
 		INIT_LIST_HEAD(&cc.migratepages);
 
-		compact_zone(&cc);
+		compact_zone(&cc, NULL);
 
 		VM_BUG_ON(!list_empty(&cc.freepages));
 		VM_BUG_ON(!list_empty(&cc.migratepages));
@@ -2543,7 +2560,7 @@ static void kcompactd_do_work(pg_data_t *pgdat)
 
 		if (kthread_should_stop())
 			return;
-		status = compact_zone(&cc);
+		status = compact_zone(&cc, NULL);
 
 		if (status == COMPACT_SUCCESS) {
 			compaction_defer_reset(zone, cc.order, false);
diff --git a/mm/internal.h b/mm/internal.h
index d028abd8a8f3..6b1e5e313855 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -208,6 +208,15 @@ struct compact_control {
 	bool rescan;			/* Rescanning the same pageblock */
 };
 
+/*
+ * Used in direct compaction when a page should be taken from the freelists
+ * immediately when one is created during the free path.
+ */
+struct capture_control {
+	struct compact_control *cc;
+	struct page *page;
+};
+
 unsigned long
 isolate_freepages_range(struct compact_control *cc,
 			unsigned long start_pfn, unsigned long end_pfn);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 05c9a81d54ed..83ea34d8dbe2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -789,6 +789,41 @@ static inline int page_is_buddy(struct page *page, struct page *buddy,
 	return 0;
 }
 
+#ifdef CONFIG_COMPACTION
+static inline struct capture_control *task_capc(struct zone *zone)
+{
+	struct capture_control *capc = current->capture_control;
+
+	return capc &&
+		!(current->flags & PF_KTHREAD) &&
+		!capc->page &&
+		capc->cc->zone == zone &&
+		capc->cc->direct_compaction ? capc : NULL;
+}
+
+static inline bool
+compaction_capture(struct capture_control *capc, struct page *page, int order)
+{
+	if (!capc || order != capc->cc->order)
+		return false;
+
+	capc->page = page;
+	return true;
+}
+
+#else
+static inline struct capture_control *task_capc(struct zone *zone)
+{
+	return NULL;
+}
+
+static inline bool
+compaction_capture(struct capture_control *capc, struct page *page, int order)
+{
+	return false;
+}
+#endif /* CONFIG_COMPACTION */
+
 /*
  * Freeing function for a buddy system allocator.
  *
@@ -822,6 +857,7 @@ static inline void __free_one_page(struct page *page,
 	unsigned long uninitialized_var(buddy_pfn);
 	struct page *buddy;
 	unsigned int max_order;
+	struct capture_control *capc = task_capc(zone);
 
 	max_order = min_t(unsigned int, MAX_ORDER, pageblock_order + 1);
 
@@ -837,6 +873,12 @@ static inline void __free_one_page(struct page *page,
 
 continue_merging:
 	while (order < max_order - 1) {
+		if (compaction_capture(capc, page, order)) {
+			if (likely(!is_migrate_isolate(migratetype)))
+				__mod_zone_freepage_state(zone, -(1 << order),
+								migratetype);
+			return;
+		}
 		buddy_pfn = __find_buddy_pfn(pfn, order);
 		buddy = page + (buddy_pfn - pfn);
 
@@ -3700,7 +3742,7 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 		unsigned int alloc_flags, const struct alloc_context *ac,
 		enum compact_priority prio, enum compact_result *compact_result)
 {
-	struct page *page;
+	struct page *page = NULL;
 	unsigned long pflags;
 	unsigned int noreclaim_flag;
 
@@ -3711,13 +3753,15 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 	noreclaim_flag = memalloc_noreclaim_save();
 
 	*compact_result = try_to_compact_pages(gfp_mask, order, alloc_flags, ac,
-									prio);
+								prio, &page);
 
 	memalloc_noreclaim_restore(noreclaim_flag);
 	psi_memstall_leave(&pflags);
 
-	if (*compact_result <= COMPACT_INACTIVE)
+	if (*compact_result <= COMPACT_INACTIVE) {
+		WARN_ON_ONCE(page);
 		return NULL;
+	}
 
 	/*
 	 * At least in one zone compaction wasn't deferred or skipped, so let's
@@ -3725,7 +3769,13 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 	 */
 	count_vm_event(COMPACTSTALL);
 
-	page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
+	/* Prep a captured page if available */
+	if (page)
+		prep_new_page(page, order, gfp_mask, alloc_flags);
+
+	/* Try get a page from the freelist if available */
+	if (!page)
+		page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
 
 	if (page) {
 		struct zone *zone = page_zone(page);
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH 25/25] mm, compaction: Do not direct compact remote memory
  2019-01-04 12:49 [PATCH 00/25] Increase success rates and reduce latency of compaction v2 Mel Gorman
                   ` (23 preceding siblings ...)
  2019-01-04 12:50 ` [PATCH 24/25] mm, compaction: Capture a page under direct compaction Mel Gorman
@ 2019-01-04 12:50 ` Mel Gorman
  2019-01-18 13:51   ` Vlastimil Babka
  2019-01-07 23:43 ` [PATCH 00/25] Increase success rates and reduce latency of compaction v2 Andrew Morton
                   ` (3 subsequent siblings)
  28 siblings, 1 reply; 75+ messages in thread
From: Mel Gorman @ 2019-01-04 12:50 UTC (permalink / raw)
  To: Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, Vlastimil Babka, ying.huang,
	kirill, Andrew Morton, Linux List Kernel Mailing, Mel Gorman

Remote compaction is expensive and possibly counter-productive. Local
base pages are often expected to have better performance characteristics
than remote high-order pages. For small allocations, it's expected that
locality is generally required or that fallbacks are possible. For larger
allocations such as THP, remote allocations are forbidden at the time of
writing but, if __GFP_THISNODE is ever removed, it would still be
preferable to fall back to small local base pages over remote THP in the
general case. kcompactd is still woken via kswapd so compaction happens
eventually.

While this patch potentially has both positive and negative effects,
it is best to avoid the possibility of remote compaction given the cost
relative to any potential benefit.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/compaction.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/mm/compaction.c b/mm/compaction.c
index ae70be023b21..cc17f0c01811 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2348,6 +2348,16 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
 			continue;
 		}
 
+		/*
+		 * Do not compact remote memory. It's expensive and high-order
+		 * small allocations are expected to prefer or require local
+		 * memory. Similarly, larger requests such as THP can fallback
+		 * to base pages in preference to remote huge pages if
+		 * __GFP_THISNODE is not specified
+		 */
+		if (zone_to_nid(zone) != zone_to_nid(ac->preferred_zoneref->zone))
+			continue;
+
 		status = compact_zone_order(zone, order, gfp_mask, prio,
 				alloc_flags, ac_classzone_idx(ac), capture);
 		rc = max(status, rc);
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* Re: [PATCH 00/25] Increase success rates and reduce latency of compaction v2
  2019-01-04 12:49 [PATCH 00/25] Increase success rates and reduce latency of compaction v2 Mel Gorman
                   ` (24 preceding siblings ...)
  2019-01-04 12:50 ` [PATCH 25/25] mm, compaction: Do not direct compact remote memory Mel Gorman
@ 2019-01-07 23:43 ` Andrew Morton
  2019-01-08  9:12   ` Mel Gorman
  2019-01-09 11:13 ` [PATCH] mm, compaction: Use free lists to quickly locate a migration target -fix Mel Gorman
                   ` (2 subsequent siblings)
  28 siblings, 1 reply; 75+ messages in thread
From: Andrew Morton @ 2019-01-07 23:43 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, David Rientjes, Andrea Arcangeli, Vlastimil Babka,
	ying.huang, kirill, Linux List Kernel Mailing

On Fri,  4 Jan 2019 12:49:46 +0000 Mel Gorman <mgorman@techsingularity.net> wrote:

> This series reduces scan rates and success rates of compaction, primarily
> by using the free lists to shorten scans, better controlling of skip
> information and whether multiple scanners can target the same block and
> capturing pageblocks before being stolen by parallel requests. The series
> is based on the 4.21/5.0 merge window after Andrew's tree had been merged.
> It's known to rebase cleanly.
> 
> ...
>
>  include/linux/compaction.h |    3 +-
>  include/linux/gfp.h        |    7 +-
>  include/linux/mmzone.h     |    2 +
>  include/linux/sched.h      |    4 +
>  kernel/sched/core.c        |    3 +
>  mm/compaction.c            | 1031 ++++++++++++++++++++++++++++++++++----------
>  mm/internal.h              |   23 +-
>  mm/migrate.c               |    2 +-
>  mm/page_alloc.c            |   70 ++-
>  9 files changed, 908 insertions(+), 237 deletions(-)

Boy that's a lot of material.  I just tossed it in there unread for
now.  Do you have any suggestions as to how we can move ahead with
getting this appropriately reviewed and tested?


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 00/25] Increase success rates and reduce latency of compaction v2
  2019-01-07 23:43 ` [PATCH 00/25] Increase success rates and reduce latency of compaction v2 Andrew Morton
@ 2019-01-08  9:12   ` Mel Gorman
  0 siblings, 0 replies; 75+ messages in thread
From: Mel Gorman @ 2019-01-08  9:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux-MM, David Rientjes, Andrea Arcangeli, Vlastimil Babka,
	ying.huang, kirill, Linux List Kernel Mailing

On Mon, Jan 07, 2019 at 03:43:54PM -0800, Andrew Morton wrote:
> On Fri,  4 Jan 2019 12:49:46 +0000 Mel Gorman <mgorman@techsingularity.net> wrote:
> 
> > This series reduces scan rates and success rates of compaction, primarily
> > by using the free lists to shorten scans, better controlling of skip
> > information and whether multiple scanners can target the same block and
> > capturing pageblocks before being stolen by parallel requests. The series
> > is based on the 4.21/5.0 merge window after Andrew's tree had been merged.
> > It's known to rebase cleanly.
> > 
> > ...
> >
> >  include/linux/compaction.h |    3 +-
> >  include/linux/gfp.h        |    7 +-
> >  include/linux/mmzone.h     |    2 +
> >  include/linux/sched.h      |    4 +
> >  kernel/sched/core.c        |    3 +
> >  mm/compaction.c            | 1031 ++++++++++++++++++++++++++++++++++----------
> >  mm/internal.h              |   23 +-
> >  mm/migrate.c               |    2 +-
> >  mm/page_alloc.c            |   70 ++-
> >  9 files changed, 908 insertions(+), 237 deletions(-)
> 
> Boy that's a lot of material. 

It's unfortunate I know. It just turned out that there is a lot that had
to change to make the most important patches in the series work without
obvious side-effects.

> I just tossed it in there unread for
> now.  Do you have any suggestions as to how we can move ahead with
> getting this appropriately reviewed and tested?
> 

The main workloads that should see a difference are those that use
MADV_HUGEPAGE or change /sys/kernel/mm/transparent_hugepage/defrag. I'm
expecting MADV_HUGEPAGE is more common in practice. By default, there
should be little change as direct compaction is not used heavily for THP.
Although SLUB workloads might see a difference given a long enough uptime,
it will be relatively difficult to detect.
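
If anyone wants a trivial way to poke that path, a minimal sketch looks
something like the following. It is illustrative only, not one of the
benchmarks referenced above, and the 64MB mapping size is arbitrary:

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 64UL << 20;	/* arbitrary 64MB anonymous region */
	void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* Hint that this range should be backed by huge pages */
	if (madvise(buf, len, MADV_HUGEPAGE))
		perror("madvise(MADV_HUGEPAGE)");

	/* Fault the range so THP allocation (and compaction) is exercised */
	memset(buf, 1, len);

	munmap(buf, len);
	return 0;
}

How hard the fault path tries for such a mapping depends on the
transparent_hugepage/defrag setting ("always", "defer", "madvise" and so
on), which is why those two knobs are the interesting ones to vary.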

As this was partially motivated by the __GFP_THISNODE discussion, I
would like to hear from David if this series makes an impact, if any,
when starting Google workloads on a fragmented system.

Similarly, I would be interested in hearing if Andrea's KVM startup times
see any benefit. I'm expecting less here as I expect that workload is
still bound by reclaim thrashing the local node in reclaim. Still, a
confirmation would be nice and if there is any benefit then it's a plus
even if the workload gets reclaimed excessively.

Local tests didn't show anything interesting *other* than what is
already in the changelogs as those workloads are specifically targeting
those paths. Intel LKP has not reported any regressions (functional or
performance) despite being on git.kernel.org for a few weeks. However,
as they are using default configurations, this is not much of a surprise.

Review is harder. Vlastimil would normally be the best fit as he has
worked on compaction but for him or for anyone else, I'm expecting they're
dealing with a backlog after the holidays.  I know I still have to get
to Vlastimil's recent series on THP allocations so I'm guilty of the same
crime with respect to review.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH] mm, compaction: Use free lists to quickly locate a migration target -fix
  2019-01-04 12:49 [PATCH 00/25] Increase success rates and reduce latency of compaction v2 Mel Gorman
                   ` (25 preceding siblings ...)
  2019-01-07 23:43 ` [PATCH 00/25] Increase success rates and reduce latency of compaction v2 Andrew Morton
@ 2019-01-09 11:13 ` Mel Gorman
  2019-01-09 19:27   ` Andrew Morton
  2019-01-09 11:15 ` [PATCH] mm, compaction: Finish pageblock scanning on contention -fix Mel Gorman
  2019-01-09 11:16 ` [PATCH] mm, compaction: Round-robin the order while searching the free lists for a target -fix Mel Gorman
  28 siblings, 1 reply; 75+ messages in thread
From: Mel Gorman @ 2019-01-09 11:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Rientjes, Andrea Arcangeli, Vlastimil Babka, ying.huang,
	kirill, Linux-MM, Linux List Kernel Mailing

Full compaction of a node passes in negative orders which can lead to array
boundary issues. While it could be addressed in the control flow of the
primary loop, it would be fragile so explicitly check for the condition.
This is a fix for the mmotm patch
broken-out/mm-compaction-use-free-lists-to-quickly-locate-a-migration-target.patch

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/compaction.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/mm/compaction.c b/mm/compaction.c
index 9438f0564ed5..167ad0f5c2fe 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1206,6 +1206,10 @@ fast_isolate_freepages(struct compact_control *cc)
 	bool scan_start = false;
 	int order;
 
+	/* Full compaction passes in a negative order */
+	if (order <= 0)
+		return cc->free_pfn;
+
 	/*
 	 * If starting the scan, use a deeper search and use the highest
 	 * PFN found if a suitable one is not found.

^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH] mm, compaction: Finish pageblock scanning on contention -fix
  2019-01-04 12:49 [PATCH 00/25] Increase success rates and reduce latency of compaction v2 Mel Gorman
                   ` (26 preceding siblings ...)
  2019-01-09 11:13 ` [PATCH] mm, compaction: Use free lists to quickly locate a migration target -fix Mel Gorman
@ 2019-01-09 11:15 ` Mel Gorman
  2019-01-09 11:16 ` [PATCH] mm, compaction: Round-robin the order while searching the free lists for a target -fix Mel Gorman
  28 siblings, 0 replies; 75+ messages in thread
From: Mel Gorman @ 2019-01-09 11:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Rientjes, Andrea Arcangeli, Vlastimil Babka, ying.huang,
	kirill, YueHaibing, Linux-MM, Linux List Kernel Mailing

From: YueHaibing <yuehaibing@huawei.com>

Fixes gcc '-Wunused-but-set-variable' warning:

mm/compaction.c: In function 'compact_zone':
mm/compaction.c:2063:22: warning:
 variable 'c' set but not used [-Wunused-but-set-variable]
mm/compaction.c:2063:19: warning:
 variable 'b' set but not used [-Wunused-but-set-variable]
mm/compaction.c:2063:16: warning:
 variable 'a' set but not used [-Wunused-but-set-variable]

These have never been used since 94d5992baaa5 ("mm, compaction: finish
pageblock scanning on contention"). This is a fix to the mmotm patch
broken-out/mm-compaction-finish-pageblock-scanning-on-contention.patch

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/compaction.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 51da4691092b..ca8da58ce1cd 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1963,7 +1963,6 @@ static enum compact_result compact_zone(struct compact_control *cc)
 	unsigned long end_pfn = zone_end_pfn(cc->zone);
 	unsigned long last_migrated_pfn;
 	const bool sync = cc->mode != MIGRATE_ASYNC;
-	unsigned long a, b, c;
 
 	cc->migratetype = gfpflags_to_migratetype(cc->gfp_mask);
 	ret = compaction_suitable(cc->zone, cc->order, cc->alloc_flags,
@@ -2009,10 +2008,6 @@ static enum compact_result compact_zone(struct compact_control *cc)
 			cc->whole_zone = true;
 	}
 
-	a = cc->migrate_pfn;
-	b = cc->free_pfn;
-	c = (cc->free_pfn - cc->migrate_pfn) / pageblock_nr_pages;
-
 	last_migrated_pfn = 0;
 
 	trace_mm_compaction_begin(start_pfn, cc->migrate_pfn,

^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH] mm, compaction: Round-robin the order while searching the free lists for a target -fix
  2019-01-04 12:49 [PATCH 00/25] Increase success rates and reduce latency of compaction v2 Mel Gorman
                   ` (27 preceding siblings ...)
  2019-01-09 11:15 ` [PATCH] mm, compaction: Finish pageblock scanning on contention -fix Mel Gorman
@ 2019-01-09 11:16 ` Mel Gorman
  28 siblings, 0 replies; 75+ messages in thread
From: Mel Gorman @ 2019-01-09 11:16 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Rientjes, Andrea Arcangeli, Vlastimil Babka, ying.huang,
	Dan Carpenter, kirill, Linux-MM, Linux List Kernel Mailing

Dan Carpenter reported the following static checker warning:

        mm/compaction.c:1252 next_search_order()
        warn: impossible condition '(cc->search_order < 0) => (0-u16max < 0)'

While a negative order never makes sense, the control flow is
easier if search_order is signed. This is a fix to the mmotm patch
broken-out/mm-compaction-round-robin-the-order-while-searching-the-free-lists-for-a-target.patch

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/internal.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/internal.h b/mm/internal.h
index d028abd8a8f3..e74dbc257550 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -192,7 +192,7 @@ struct compact_control {
 	unsigned long total_migrate_scanned;
 	unsigned long total_free_scanned;
 	unsigned short fast_search_fail;/* failures to use free list searches */
-	unsigned short search_order;	/* order to start a fast search at */
+	short search_order;		/* order to start a fast search at */
 	const gfp_t gfp_mask;		/* gfp mask of a direct compactor */
 	int order;			/* order a direct compactor needs */
 	int migratetype;		/* migratetype of direct compactor */

^ permalink raw reply related	[flat|nested] 75+ messages in thread

* Re: [PATCH] mm, compaction: Use free lists to quickly locate a migration target -fix
  2019-01-09 11:13 ` [PATCH] mm, compaction: Use free lists to quickly locate a migration target -fix Mel Gorman
@ 2019-01-09 19:27   ` Andrew Morton
  2019-01-09 21:26     ` Mel Gorman
  0 siblings, 1 reply; 75+ messages in thread
From: Andrew Morton @ 2019-01-09 19:27 UTC (permalink / raw)
  To: Mel Gorman
  Cc: David Rientjes, Andrea Arcangeli, Vlastimil Babka, ying.huang,
	kirill, Linux-MM, Linux List Kernel Mailing

On Wed, 9 Jan 2019 11:13:44 +0000 Mel Gorman <mgorman@techsingularity.net> wrote:

> Full compaction of a node passes in negative orders which can lead to array
> boundary issues. While it could be addressed in the control flow of the
> primary loop, it would be fragile so explicitly check for the condition.
> This is a fix for the mmotm patch
> broken-out/mm-compaction-use-free-lists-to-quickly-locate-a-migration-target.patch
> 
> ...
>
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1206,6 +1206,10 @@ fast_isolate_freepages(struct compact_control *cc)
>  	bool scan_start = false;
>  	int order;
>  
> +	/* Full compaction passes in a negative order */
> +	if (order <= 0)
> +		return cc->free_pfn;
> +
>  	/*
>  	 * If starting the scan, use a deeper search and use the highest
>  	 * PFN found if a suitable one is not found.

`order' is uninitialized.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH] mm, compaction: Use free lists to quickly locate a migration target -fix
  2019-01-09 19:27   ` Andrew Morton
@ 2019-01-09 21:26     ` Mel Gorman
  0 siblings, 0 replies; 75+ messages in thread
From: Mel Gorman @ 2019-01-09 21:26 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Rientjes, Andrea Arcangeli, Vlastimil Babka, ying.huang,
	kirill, Linux-MM, Linux List Kernel Mailing

On Wed, Jan 09, 2019 at 11:27:31AM -0800, Andrew Morton wrote:
> On Wed, 9 Jan 2019 11:13:44 +0000 Mel Gorman <mgorman@techsingularity.net> wrote:
> 
> > Full compaction of a node passes in negative orders which can lead to array
> > boundary issues. While it could be addressed in the control flow of the
> > primary loop, it would be fragile so explicitly check for the condition.
> > This is a fix for the mmotm patch
> > broken-out/mm-compaction-use-free-lists-to-quickly-locate-a-migration-target.patch
> > 
> > ...
> >
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -1206,6 +1206,10 @@ fast_isolate_freepages(struct compact_control *cc)
> >  	bool scan_start = false;
> >  	int order;
> >  
> > +	/* Full compaction passes in a negative order */
> > +	if (order <= 0)
> > +		return cc->free_pfn;
> > +
> >  	/*
> >  	 * If starting the scan, use a deeper search and use the highest
> >  	 * PFN found if a suitable one is not found.
> 
> `order' is uninitialized.

Twice I managed to send out the wrong one :(

---8<---
mm, compaction: Use free lists to quickly locate a migration target -fix

Full compaction of a node passes in negative orders which can lead to array
boundary issues. While it could be addressed in the control flow of the
primary loop, it would be fragile so explicitly check for the condition.
This is a fix for the mmotm patch
broken-out/mm-compaction-use-free-lists-to-quickly-locate-a-migration-target.patch

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/compaction.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/mm/compaction.c b/mm/compaction.c
index 9438f0564ed5..4b46ae96cc1b 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1206,6 +1206,10 @@ fast_isolate_freepages(struct compact_control *cc)
 	bool scan_start = false;
 	int order;
 
+	/* Full compaction passes in a negative order */
+	if (cc->order <= 0)
+		return cc->free_pfn;
+
 	/*
 	 * If starting the scan, use a deeper search and use the highest
 	 * PFN found if a suitable one is not found.

^ permalink raw reply related	[flat|nested] 75+ messages in thread

* Re: [PATCH 04/25] mm, compaction: Remove unnecessary zone parameter in some instances
  2019-01-04 12:49 ` [PATCH 04/25] mm, compaction: Remove unnecessary zone parameter in some instances Mel Gorman
@ 2019-01-15 11:43   ` Vlastimil Babka
  0 siblings, 0 replies; 75+ messages in thread
From: Vlastimil Babka @ 2019-01-15 11:43 UTC (permalink / raw)
  To: Mel Gorman, Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, ying.huang, kirill,
	Andrew Morton, Linux List Kernel Mailing

On 1/4/19 1:49 PM, Mel Gorman wrote:
> A zone parameter is passed into a number of top-level compaction functions
> despite the fact that it's already in cache_control. This is harmless but

                                        ^ compact_control

> it did need an audit to check if zone actually ever changes meaningfully.

Tried changing the field to "struct zone * const zone;" and it only
flagged compact_node() and kcompactd_do_work() which look ok.

> This patches removes the parameter in a number of top-level functions. The
> change could be much deeper but this was enough to briefly clarify the
> flow.
> 
> No functional change.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 05/25] mm, compaction: Rename map_pages to split_map_pages
  2019-01-04 12:49 ` [PATCH 05/25] mm, compaction: Rename map_pages to split_map_pages Mel Gorman
@ 2019-01-15 11:59   ` Vlastimil Babka
  0 siblings, 0 replies; 75+ messages in thread
From: Vlastimil Babka @ 2019-01-15 11:59 UTC (permalink / raw)
  To: Mel Gorman, Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, ying.huang, kirill,
	Andrew Morton, Linux List Kernel Mailing

On 1/4/19 1:49 PM, Mel Gorman wrote:
> It's non-obvious that high-order free pages are split into order-0 pages
> from the function name. Fix it.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 06/25] mm, compaction: Skip pageblocks with reserved pages
  2019-01-04 12:49 ` [PATCH 06/25] mm, compaction: Skip pageblocks with reserved pages Mel Gorman
@ 2019-01-15 12:10   ` Vlastimil Babka
  2019-01-15 12:50     ` Mel Gorman
  0 siblings, 1 reply; 75+ messages in thread
From: Vlastimil Babka @ 2019-01-15 12:10 UTC (permalink / raw)
  To: Mel Gorman, Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, ying.huang, kirill,
	Andrew Morton, Linux List Kernel Mailing

On 1/4/19 1:49 PM, Mel Gorman wrote:
> Reserved pages are set at boot time, tend to be clustered and almost never
> become unreserved. When isolating pages for either migration sources or
> targets, skip the entire pageblock if one PageReserved page is encountered
> on the grounds that it is highly probable the entire pageblock is reserved.
> 
> The performance impact is relative to the number of reserved pages in
> the system and their location so it'll be variable but intuitively it
> should make sense. If the memblock allocator was ever changed to spread
> reserved pages throughout the address space then this patch would be
> impaired but it would also be considered a bug given that such a change
> would ruin fragmentation.
> 
> On both 1-socket and 2-socket machines, scan rates are reduced slightly
> on workloads that intensively allocate THP while the system is fragmented.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> ---
>  mm/compaction.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 3afa4e9188b6..94d1e5b062ea 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -484,6 +484,15 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
>  			goto isolate_fail;
>  		}
>  
> +		/*
> +		 * A reserved page is never freed and tend to be clustered in
> +		 * the same pageblock. Skip the block.
> +		 */
> +		if (PageReserved(page)) {
> +			blockpfn = end_pfn;
> +			break;
> +		}
> +
>  		if (!PageBuddy(page))
>  			goto isolate_fail;
>  
> @@ -827,6 +836,13 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>  					goto isolate_success;
>  			}
>  
> +			/*
> +			 * A reserved page is never freed and tend to be
> +			 * clustered in the same pageblocks. Skip the block.

AFAICS memory allocator is not the only user of PageReserved. There
seems to be some drivers as well, notably the DRM subsystem via
drm_pci_alloc(). There's an effort to clean those up [1] but until then,
there might be some false positives here.

[1] https://marc.info/?l=linux-mm&m=154747078617898&w=2

> +			 */
> +			if (PageReserved(page))
> +				low_pfn = end_pfn;
> +
>  			goto isolate_fail;
>  		}
>  
> 


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 09/25] mm, compaction: Use the page allocator bulk-free helper for lists of pages
  2019-01-04 12:49 ` [PATCH 09/25] mm, compaction: Use the page allocator bulk-free helper for lists of pages Mel Gorman
@ 2019-01-15 12:39   ` Vlastimil Babka
  2019-01-16  9:46     ` Mel Gorman
  0 siblings, 1 reply; 75+ messages in thread
From: Vlastimil Babka @ 2019-01-15 12:39 UTC (permalink / raw)
  To: Mel Gorman, Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, ying.huang, kirill,
	Andrew Morton, Linux List Kernel Mailing

On 1/4/19 1:49 PM, Mel Gorman wrote:
> release_pages() is a simpler version of free_unref_page_list() but it
> tracks the highest PFN for caching the restart point of the compaction
> free scanner. This patch optionally tracks the highest PFN in the core
> helper and converts compaction to use it. The performance impact is
> limited but it should reduce lock contention slightly in some cases.
> The main benefit is removing some partially duplicated code.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

...

> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2876,18 +2876,26 @@ void free_unref_page(struct page *page)
>  /*
>   * Free a list of 0-order pages
>   */
> -void free_unref_page_list(struct list_head *list)
> +void __free_page_list(struct list_head *list, bool dropref,
> +				unsigned long *highest_pfn)
>  {
>  	struct page *page, *next;
>  	unsigned long flags, pfn;
>  	int batch_count = 0;
>  
> +	if (highest_pfn)
> +		*highest_pfn = 0;
> +
>  	/* Prepare pages for freeing */
>  	list_for_each_entry_safe(page, next, list, lru) {
> +		if (dropref)
> +			WARN_ON_ONCE(!put_page_testzero(page));

I've thought about it again and still think it can cause spurious
warnings. We enter this function with one page pin, which means somebody
else might be doing pfn scanning and get_page_unless_zero() with
success, so there are two pins. Then we do the put_page_testzero() above
and go back to one pin, and warn. You said "this function simply does
not expect it and the callers do not violate the rule", but this is
rather about potential parallel pfn scanning activity and not about this
function's callers. Maybe there really is no parallel pfn scanner that
would try to pin a page with a state the page has when it's processed by
this function, but I wouldn't bet on it (any state checks preceding the
pin might also be racy etc.).
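
To spell the interleaving out (an illustrative sketch of the scenario
above, not code from the patch):

/*
 * compaction (holds one pin)          parallel pfn scanner
 * --------------------------          -------------------------------
 *                                     page = pfn_to_page(pfn);
 *                                     get_page_unless_zero(page)
 *                                         succeeds: refcount 1 -> 2
 * WARN_ON_ONCE(!put_page_testzero(page))
 *     put_page_testzero() drops the refcount 2 -> 1 and returns
 *     false, so the warning fires although nothing is wrong
 */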

>  		pfn = page_to_pfn(page);
>  		if (!free_unref_page_prepare(page, pfn))
>  			list_del(&page->lru);
>  		set_page_private(page, pfn);
> +		if (highest_pfn && pfn > *highest_pfn)
> +			*highest_pfn = pfn;
>  	}
>  
>  	local_irq_save(flags);
> 


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 06/25] mm, compaction: Skip pageblocks with reserved pages
  2019-01-15 12:10   ` Vlastimil Babka
@ 2019-01-15 12:50     ` Mel Gorman
  2019-01-16  9:42       ` Mel Gorman
  0 siblings, 1 reply; 75+ messages in thread
From: Mel Gorman @ 2019-01-15 12:50 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Linux-MM, David Rientjes, Andrea Arcangeli, ying.huang, kirill,
	Andrew Morton, Linux List Kernel Mailing

On Tue, Jan 15, 2019 at 01:10:57PM +0100, Vlastimil Babka wrote:
> On 1/4/19 1:49 PM, Mel Gorman wrote:
> > Reserved pages are set at boot time, tend to be clustered and almost never
> > become unreserved. When isolating pages for either migration sources or
> > targets, skip the entire pageblock if one PageReserved page is encountered
> > on the grounds that it is highly probable the entire pageblock is reserved.
> > 
> > The performance impact is relative to the number of reserved pages in
> > the system and their location so it'll be variable but intuitively it
> > should make sense. If the memblock allocator was ever changed to spread
> > reserved pages throughout the address space then this patch would be
> > impaired but it would also be considered a bug given that such a change
> > would ruin fragmentation.
> > 
> > On both 1-socket and 2-socket machines, scan rates are reduced slightly
> > on workloads that intensively allocate THP while the system is fragmented.
> > 
> > Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> > ---
> >  mm/compaction.c | 16 ++++++++++++++++
> >  1 file changed, 16 insertions(+)
> > 
> > diff --git a/mm/compaction.c b/mm/compaction.c
> > index 3afa4e9188b6..94d1e5b062ea 100644
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -484,6 +484,15 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
> >  			goto isolate_fail;
> >  		}
> >  
> > +		/*
> > +		 * A reserved page is never freed and tend to be clustered in
> > +		 * the same pageblock. Skip the block.
> > +		 */
> > +		if (PageReserved(page)) {
> > +			blockpfn = end_pfn;
> > +			break;
> > +		}
> > +
> >  		if (!PageBuddy(page))
> >  			goto isolate_fail;
> >  
> > @@ -827,6 +836,13 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
> >  					goto isolate_success;
> >  			}
> >  
> > +			/*
> > +			 * A reserved page is never freed and tend to be
> > +			 * clustered in the same pageblocks. Skip the block.
> 
> AFAICS memory allocator is not the only user of PageReserved. There
> seems to be some drivers as well, notably the DRM subsystem via
> drm_pci_alloc(). There's an effort to clean those up [1] but until then,
> there might be some false positives here.
> 
> [1] https://marc.info/?l=linux-mm&m=154747078617898&w=2
> 

Hmm, I'm tempted to leave this anyway. The reservations for PCI space are
likely to be persistent and I also do not expect them to grow much. While
I consider it to be partially abuse to use PageReserved like this, it
should get cleaned up slowly over time. If this turns out to be wrong,
I'll attempt to fix the responsible driver that is scattering
PageReserved around the place and at worst, revert this if it turns out
to be a major problem in practice. Any objections?

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 10/25] mm, compaction: Ignore the fragmentation avoidance boost for isolation and compaction
  2019-01-04 12:49 ` [PATCH 10/25] mm, compaction: Ignore the fragmentation avoidance boost for isolation and compaction Mel Gorman
@ 2019-01-15 13:18   ` Vlastimil Babka
  0 siblings, 0 replies; 75+ messages in thread
From: Vlastimil Babka @ 2019-01-15 13:18 UTC (permalink / raw)
  To: Mel Gorman, Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, ying.huang, kirill,
	Andrew Morton, Linux List Kernel Mailing

On 1/4/19 1:49 PM, Mel Gorman wrote:
> When pageblocks get fragmented, watermarks are artifically boosted to
> reclaim pages to avoid further fragmentation events. However, compaction
> is often either fragmentation-neutral or moving movable pages away from
> unmovable/reclaimable pages. As the true watermarks are preserved, allow
> compaction to ignore the boost factor.
> 
> The expected impact is very slight as the main benefit is that compaction
> is slightly more likely to succeed when the system has been fragmented
> very recently. On both 1-socket and 2-socket machines for THP-intensive
> allocation during fragmentation the success rate was increased by less
> than 1%, which is marginal. However, detailed tracing indicated that
> failures of migration due to a premature ENOMEM triggered by watermark
> checks were eliminated.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/page_alloc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 57ba9d1da519..05c9a81d54ed 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2958,7 +2958,7 @@ int __isolate_free_page(struct page *page, unsigned int order)
>  		 * watermark, because we already know our high-order page
>  		 * exists.
>  		 */
> -		watermark = min_wmark_pages(zone) + (1UL << order);
> +		watermark = zone->_watermark[WMARK_MIN] + (1UL << order);
>  		if (!zone_watermark_ok(zone, 0, watermark, 0, ALLOC_CMA))
>  			return 0;
>  
> 
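
As an aside, a minimal sketch of why reading the array directly sidesteps the
boost (assuming the 5.0-era watermark helpers in mmzone.h; illustration only,
not part of the patch):

	/*
	 * min_wmark_pages() folds in the temporary watermark boost applied
	 * after a fragmentation event, so a check against it can fail even
	 * though the true minimum watermark is met:
	 */
	#define min_wmark_pages(z) ((z)->_watermark[WMARK_MIN] + (z)->watermark_boost)

	watermark = min_wmark_pages(zone) + (1UL << order);		/* boosted */
	watermark = zone->_watermark[WMARK_MIN] + (1UL << order);	/* boost ignored */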


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 06/25] mm, compaction: Skip pageblocks with reserved pages
  2019-01-15 12:50     ` Mel Gorman
@ 2019-01-16  9:42       ` Mel Gorman
  0 siblings, 0 replies; 75+ messages in thread
From: Mel Gorman @ 2019-01-16  9:42 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Linux-MM, David Rientjes, Andrea Arcangeli, ying.huang, kirill,
	Andrew Morton, Linux List Kernel Mailing

On Tue, Jan 15, 2019 at 12:50:45PM +0000, Mel Gorman wrote:
> > AFAICS memory allocator is not the only user of PageReserved. There
> > seems to be some drivers as well, notably the DRM subsystem via
> > drm_pci_alloc(). There's an effort to clean those up [1] but until then,
> > there might be some false positives here.
> > 
> > [1] https://marc.info/?l=linux-mm&m=154747078617898&w=2
> > 
> 
> Hmm, I'm tempted to leave this anyway. The reservations for PCI space are
> likely to be persistent and I also do not expect them to grow much. While
> I consider it to be partially abuse to use PageReserved like this, it
> should get cleaned up slowly over time. If this turns out to be wrong,
> I'll attempt to fix the responsible driver that is scattering
> PageReserved around the place and at worst, revert this if it turns out
> to be a major problem in practice. Any objections?
> 

I decided to drop this anyway as the series does not hinge on it, it's a
relatively minor improvement overall and I don't want to halt the entire
series over it. I maintain that the system would recover even if the
driver released the pages, as the check would eventually fail and then be
cleared after a reset. The only downside of the patch that I can see
is a small maintenance overhead due to an apparently duplicated
check. The CPU overhead of compaction will be slightly higher
due to the revert but there are other options on the horizon that would
bring down that overhead again.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 09/25] mm, compaction: Use the page allocator bulk-free helper for lists of pages
  2019-01-15 12:39   ` Vlastimil Babka
@ 2019-01-16  9:46     ` Mel Gorman
  0 siblings, 0 replies; 75+ messages in thread
From: Mel Gorman @ 2019-01-16  9:46 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Linux-MM, David Rientjes, Andrea Arcangeli, ying.huang, kirill,
	Andrew Morton, Linux List Kernel Mailing

On Tue, Jan 15, 2019 at 01:39:28PM +0100, Vlastimil Babka wrote:
> On 1/4/19 1:49 PM, Mel Gorman wrote:
> > release_pages() is a simpler version of free_unref_page_list() but it
> > tracks the highest PFN for caching the restart point of the compaction
> > free scanner. This patch optionally tracks the highest PFN in the core
> > helper and converts compaction to use it. The performance impact is
> > limited but it should reduce lock contention slightly in some cases.
> > The main benefit is removing some partially duplicated code.
> > 
> > Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> 
> ...
> 
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -2876,18 +2876,26 @@ void free_unref_page(struct page *page)
> >  /*
> >   * Free a list of 0-order pages
> >   */
> > -void free_unref_page_list(struct list_head *list)
> > +void __free_page_list(struct list_head *list, bool dropref,
> > +				unsigned long *highest_pfn)
> >  {
> >  	struct page *page, *next;
> >  	unsigned long flags, pfn;
> >  	int batch_count = 0;
> >  
> > +	if (highest_pfn)
> > +		*highest_pfn = 0;
> > +
> >  	/* Prepare pages for freeing */
> >  	list_for_each_entry_safe(page, next, list, lru) {
> > +		if (dropref)
> > +			WARN_ON_ONCE(!put_page_testzero(page));
> 
> I've thought about it again and still think it can cause spurious
> warnings. We enter this function with one page pin, which means somebody
> else might be doing pfn scanning and get_page_unless_zero() with
> success, so there are two pins. Then we do the put_page_testzero() above
> and go back to one pin, and warn. You said "this function simply does
> not expect it and the callers do not violate the rule", but this is
> rather about potential parallel pfn scanning activity and not about this
> function's callers. Maybe there really is no parallel pfn scanner that
> would try to pin a page with a state the page has when it's processed by
> this function, but I wouldn't bet on it (any state checks preceding the
> pin might also be racy etc.).
> 

Ok, I'll drop this patch because in theory you're right. I wouldn't think
that parallel PFN scanning is likely to trigger it but gup is a potential
issue. While this also will increase CPU usage slightly again, it'll be
no worse than it was before and again, I don't want to stall the entire
series over a relatively small optimisation.
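
For anyone following along, a sketch of the interleaving being described
(hypothetical timeline, not a captured trace):

	/* CPU B: a parallel pfn scanner speculatively pins the page */
	if (get_page_unless_zero(page)) {	/* refcount 1 -> 2 */
		/* ... inspects the page, drops the pin later ... */
	}

	/* CPU A: the bulk-free path drops what it thinks is the last pin */
	WARN_ON_ONCE(!put_page_testzero(page));	/* refcount 2 -> 1, warning fires */

	/* CPU B: releases its pin and the page is finally freed */
	put_page(page);				/* refcount 1 -> 0 */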

Thanks Vlastimil!

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 11/25] mm, compaction: Use free lists to quickly locate a migration source
  2019-01-04 12:49 ` [PATCH 11/25] mm, compaction: Use free lists to quickly locate a migration source Mel Gorman
@ 2019-01-16 13:15   ` Vlastimil Babka
  2019-01-16 14:33     ` Mel Gorman
  0 siblings, 1 reply; 75+ messages in thread
From: Vlastimil Babka @ 2019-01-16 13:15 UTC (permalink / raw)
  To: Mel Gorman, Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, ying.huang, kirill,
	Andrew Morton, Linux List Kernel Mailing

On 1/4/19 1:49 PM, Mel Gorman wrote:
> The migration scanner is a linear scan of a zone with a potentially large
> search space.  Furthermore, many pageblocks are unusable such as those
> filled with reserved pages or partially filled with pages that cannot
> migrate. These still get scanned in the common case of allocating a THP
> and the cost accumulates.
> 
> The patch uses a partial search of the free lists to locate a migration
> source candidate that is marked as MOVABLE when allocating a THP. It
> prefers picking a block with a larger number of free pages already on
> the basis that there are fewer pages to migrate to free the entire block.
> The lowest PFN found during searches is tracked as the basis of the start
> for the linear search after the first search of the free list fails.
> After the search, the free list is shuffled so that the next search will
> not encounter the same page. If the search fails then the subsequent
> searches will be shorter and the linear scanner is used.
> 
> If this search fails, or if the request is for a small or
> unmovable/reclaimable allocation then the linear scanner is still used. It
> is somewhat pointless to use the list search in those cases. Small free
> pages must be used for the search and there is no guarantee that
> contiguous movable pages are located within that block.
> 
>                                         4.20.0                 4.20.0
>                                 failfast-v2r15          findmig-v2r15
> Amean     fault-both-1         0.00 (   0.00%)        0.00 *   0.00%*
> Amean     fault-both-3      3833.72 (   0.00%)     3505.69 (   8.56%)
> Amean     fault-both-5      4967.15 (   0.00%)     5794.13 * -16.65%*
> Amean     fault-both-7      7139.19 (   0.00%)     7663.09 (  -7.34%)
> Amean     fault-both-12    11326.30 (   0.00%)    10983.36 (   3.03%)
> Amean     fault-both-18    16270.70 (   0.00%)    13602.71 *  16.40%*
> Amean     fault-both-24    19839.65 (   0.00%)    16145.77 *  18.62%*
> Amean     fault-both-30    21707.05 (   0.00%)    19753.82 (   9.00%)
> Amean     fault-both-32    21968.16 (   0.00%)    20616.16 (   6.15%)
> 
>                                    4.20.0                 4.20.0
>                            failfast-v2r15          findmig-v2r15
> Percentage huge-1         0.00 (   0.00%)        0.00 (   0.00%)
> Percentage huge-3        84.62 (   0.00%)       90.58 (   7.05%)
> Percentage huge-5        88.43 (   0.00%)       91.34 (   3.29%)
> Percentage huge-7        88.33 (   0.00%)       92.21 (   4.39%)
> Percentage huge-12       88.74 (   0.00%)       92.48 (   4.21%)
> Percentage huge-18       86.52 (   0.00%)       91.65 (   5.93%)
> Percentage huge-24       86.42 (   0.00%)       90.23 (   4.41%)
> Percentage huge-30       86.67 (   0.00%)       90.17 (   4.04%)
> Percentage huge-32       86.00 (   0.00%)       89.72 (   4.32%)
> 
> This shows an improvement in allocation latencies and a slight increase
> in allocation success rates. While not presented, there was a 13% reduction
> in migration scanning and a 10% reduction on system CPU usage. A 2-socket
> machine showed similar benefits.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> ---
>  mm/compaction.c | 179 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  mm/internal.h   |   2 +
>  2 files changed, 179 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 8f0ce44dba41..137e32e8a2f5 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1050,6 +1050,12 @@ static bool suitable_migration_target(struct compact_control *cc,
>  	return false;
>  }
>  
> +static inline unsigned int
> +freelist_scan_limit(struct compact_control *cc)
> +{
> +	return (COMPACT_CLUSTER_MAX >> cc->fast_search_fail) + 1;
> +}
> +
>  /*
>   * Test whether the free scanner has reached the same or lower pageblock than
>   * the migration scanner, and compaction should thus terminate.
> @@ -1060,6 +1066,19 @@ static inline bool compact_scanners_met(struct compact_control *cc)
>  		<= (cc->migrate_pfn >> pageblock_order);
>  }
>  
> +/* Reorder the free list to reduce repeated future searches */
> +static void
> +move_freelist_tail(struct list_head *freelist, struct page *freepage)
> +{
> +	LIST_HEAD(sublist);
> +
> +	if (!list_is_last(freelist, &freepage->lru)) {
> +		list_cut_position(&sublist, freelist, &freepage->lru);
> +		if (!list_empty(&sublist))
> +			list_splice_tail(&sublist, freelist);
> +	}
> +}
> +
>  /*
>   * Based on information in the current compact_control, find blocks
>   * suitable for isolating free pages from and then isolate them.
> @@ -1217,6 +1236,160 @@ typedef enum {
>   */
>  int sysctl_compact_unevictable_allowed __read_mostly = 1;
>  
> +static inline void
> +update_fast_start_pfn(struct compact_control *cc, unsigned long pfn)
> +{
> +	if (cc->fast_start_pfn == ULONG_MAX)
> +		return;
> +
> +	if (!cc->fast_start_pfn)
> +		cc->fast_start_pfn = pfn;
> +
> +	cc->fast_start_pfn = min(cc->fast_start_pfn, pfn);
> +}
> +
> +static inline void
> +reinit_migrate_pfn(struct compact_control *cc)
> +{
> +	if (!cc->fast_start_pfn || cc->fast_start_pfn == ULONG_MAX)
> +		return;
> +
> +	cc->migrate_pfn = cc->fast_start_pfn;
> +	cc->fast_start_pfn = ULONG_MAX;
> +}
> +
> +/*
> + * Briefly search the free lists for a migration source that already has
> + * some free pages to reduce the number of pages that need migration
> + * before a pageblock is free.
> + */
> +static unsigned long fast_find_migrateblock(struct compact_control *cc)
> +{
> +	unsigned int limit = freelist_scan_limit(cc);
> +	unsigned int nr_scanned = 0;
> +	unsigned long distance;
> +	unsigned long pfn = cc->migrate_pfn;
> +	unsigned long high_pfn;
> +	int order;
> +
> +	/* Skip hints are relied on to avoid repeats on the fast search */
> +	if (cc->ignore_skip_hint)
> +		return pfn;
> +
> +	/*
> +	 * If the migrate_pfn is not at the start of a zone or the start
> +	 * of a pageblock then assume this is a continuation of a previous
> +	 * scan restarted due to COMPACT_CLUSTER_MAX.
> +	 */
> +	if (pfn != cc->zone->zone_start_pfn && pfn != pageblock_start_pfn(pfn))
> +		return pfn;
> +
> +	/*
> +	 * For smaller orders, just linearly scan as the number of pages
> +	 * to migrate should be relatively small and does not necessarily
> +	 * justify freeing up a large block for a small allocation.
> +	 */
> +	if (cc->order <= PAGE_ALLOC_COSTLY_ORDER)
> +		return pfn;
> +
> +	/*
> +	 * Only allow kcompactd and direct requests for movable pages to
> +	 * quickly clear out a MOVABLE pageblock for allocation. This
> +	 * reduces the risk that a large movable pageblock is freed for
> +	 * an unmovable/reclaimable small allocation.
> +	 */
> +	if (cc->direct_compaction && cc->migratetype != MIGRATE_MOVABLE)
> +		return pfn;
> +
> +	/*
> +	 * When starting the migration scanner, pick any pageblock within the
> +	 * first half of the search space. Otherwise try and pick a pageblock
> +	 * within the first eighth to reduce the chances that a migration
> +	 * target later becomes a source.
> +	 */
> +	distance = (cc->free_pfn - cc->migrate_pfn) >> 1;
> +	if (cc->migrate_pfn != cc->zone->zone_start_pfn)
> +		distance >>= 2;
> +	high_pfn = pageblock_start_pfn(cc->migrate_pfn + distance);
> +
> +	for (order = cc->order - 1;
> +	     order >= PAGE_ALLOC_COSTLY_ORDER && pfn == cc->migrate_pfn && nr_scanned < limit;
> +	     order--) {
> +		struct free_area *area = &cc->zone->free_area[order];
> +		struct list_head *freelist;
> +		unsigned long nr_skipped = 0;
> +		unsigned long flags;
> +		struct page *freepage;
> +
> +		if (!area->nr_free)
> +			continue;
> +
> +		spin_lock_irqsave(&cc->zone->lock, flags);
> +		freelist = &area->free_list[MIGRATE_MOVABLE];
> +		list_for_each_entry(freepage, freelist, lru) {
> +			unsigned long free_pfn;
> +
> +			nr_scanned++;
> +			free_pfn = page_to_pfn(freepage);
> +			if (free_pfn < high_pfn) {
> +				update_fast_start_pfn(cc, free_pfn);
> +
> +				/*
> +				 * Avoid if skipped recently. Move to the tail
> +				 * of the list so it will not be found again
> +				 * soon
> +				 */
> +				if (get_pageblock_skip(freepage)) {
> +
> +					if (list_is_last(freelist, &freepage->lru))
> +						break;
> +
> +					nr_skipped++;
> +					list_del(&freepage->lru);
> +					list_add_tail(&freepage->lru, freelist);

Use list_move_tail() instead of del+add ? Also is this even safe inside
list_for_each_entry() and not list_for_each_entry_safe()? I guess
without the extra safe iterator, we moved freepage, which is our
iterator, to the tail, so the for cycle will immediately end?
Also is this moving of one page needed when you also have
move_freelist_tail() to move everything we scanned at once?
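
To make the hazard concrete, a standalone sketch (reusing the variable names
from this function, not a drop-in replacement):

	/*
	 * list_for_each_entry() reads the next position from freepage->lru.next
	 * after the body runs. Once freepage has been moved to the tail, its
	 * ->next is the list head, so the walk stops and the remaining entries
	 * are never visited.
	 */
	list_for_each_entry(freepage, freelist, lru) {
		if (get_pageblock_skip(freepage))
			list_move_tail(&freepage->lru, freelist); /* iterator is now last */
	}

	/*
	 * The safe variant samples the successor before the body, so the walk
	 * continues past the moved entry. The moved entries now sit at the
	 * tail, so a separate bound (the scan limit here) is still needed to
	 * guarantee termination.
	 */
	struct page *tmp;
	list_for_each_entry_safe(freepage, tmp, freelist, lru) {
		if (get_pageblock_skip(freepage))
			list_move_tail(&freepage->lru, freelist);
		if (++nr_scanned >= limit)
			break;
	}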


> +					if (nr_skipped > 2)
> +						break;

Counting number of skips per order seems weird. What's the intention, is
it not to encounter again a page that we already moved to tail? That
could be solved differently, e.g. using only move_freelist_tail()?

> +					continue;
> +				}
> +
> +				/* Reorder to so a future search skips recent pages */
> +				move_freelist_tail(freelist, freepage);
> +
> +				pfn = pageblock_start_pfn(free_pfn);
> +				cc->fast_search_fail = 0;
> +				set_pageblock_skip(freepage);

Hmm with pageblock skip bit set, we return to isolate_migratepages(),
and there's isolation_suitable() check which tests the skip bit, so
AFAICS in the end we skip the pageblock we found here?

> +				break;
> +			}
> +
> +			/*
> +			 * If low PFNs are being found and discarded then
> +			 * limit the scan as fast searching is finding
> +			 * poor candidates.
> +			 */

I wonder about the "low PFNs are being found and discarded" part. Maybe
I'm missing it, but I don't see them being discarded above, this seems
to be the first check against cc->migrate_pfn. With the min() part in
update_fast_start_pfn(), does it mean we can actually go back and rescan
(or skip thanks to skip bits, anyway) again pageblocks that we already
scanned?

> +			if (free_pfn < cc->migrate_pfn)
> +				limit >>= 1;
> +
> +			if (nr_scanned >= limit) {
> +				cc->fast_search_fail++;
> +				move_freelist_tail(freelist, freepage);
> +				break;
> +			}
> +		}
> +		spin_unlock_irqrestore(&cc->zone->lock, flags);
> +	}
> +
> +	cc->total_migrate_scanned += nr_scanned;
> +
> +	/*
> +	 * If fast scanning failed then use a cached entry for a page block
> +	 * that had free pages as the basis for starting a linear scan.
> +	 */
> +	if (pfn == cc->migrate_pfn)
> +		reinit_migrate_pfn(cc);
> +
> +	return pfn;
> +}
> +
>  /*
>   * Isolate all pages that can be migrated from the first suitable block,
>   * starting at the block pointed to by the migrate scanner pfn within
> @@ -1235,9 +1408,10 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>  
>  	/*
>  	 * Start at where we last stopped, or beginning of the zone as
> -	 * initialized by compact_zone()
> +	 * initialized by compact_zone(). The first failure will use
> +	 * the lowest PFN as the starting point for linear scanning.
>  	 */
> -	low_pfn = cc->migrate_pfn;
> +	low_pfn = fast_find_migrateblock(cc);
>  	block_start_pfn = pageblock_start_pfn(low_pfn);
>  	if (block_start_pfn < zone->zone_start_pfn)
>  		block_start_pfn = zone->zone_start_pfn;
> @@ -1560,6 +1734,7 @@ static enum compact_result compact_zone(struct compact_control *cc)
>  	 * want to compact the whole zone), but check that it is initialised
>  	 * by ensuring the values are within zone boundaries.
>  	 */
> +	cc->fast_start_pfn = 0;
>  	if (cc->whole_zone) {
>  		cc->migrate_pfn = start_pfn;
>  		cc->free_pfn = pageblock_start_pfn(end_pfn - 1);
> diff --git a/mm/internal.h b/mm/internal.h
> index edb4029f64c8..b25b33c5dd80 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -187,9 +187,11 @@ struct compact_control {
>  	unsigned int nr_migratepages;	/* Number of pages to migrate */
>  	unsigned long free_pfn;		/* isolate_freepages search base */
>  	unsigned long migrate_pfn;	/* isolate_migratepages search base */
> +	unsigned long fast_start_pfn;	/* a pfn to start linear scan from */
>  	struct zone *zone;
>  	unsigned long total_migrate_scanned;
>  	unsigned long total_free_scanned;
> +	unsigned int fast_search_fail;	/* failures to use free list searches */
>  	const gfp_t gfp_mask;		/* gfp mask of a direct compactor */
>  	int order;			/* order a direct compactor needs */
>  	int migratetype;		/* migratetype of direct compactor */
> 


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 11/25] mm, compaction: Use free lists to quickly locate a migration source
  2019-01-16 13:15   ` Vlastimil Babka
@ 2019-01-16 14:33     ` Mel Gorman
  2019-01-16 15:00       ` Vlastimil Babka
  0 siblings, 1 reply; 75+ messages in thread
From: Mel Gorman @ 2019-01-16 14:33 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Linux-MM, David Rientjes, Andrea Arcangeli, ying.huang, kirill,
	Andrew Morton, Linux List Kernel Mailing

On Wed, Jan 16, 2019 at 02:15:10PM +0100, Vlastimil Babka wrote:
> > <SNIP>
> > +			if (free_pfn < high_pfn) {
> > +				update_fast_start_pfn(cc, free_pfn);
> > +
> > +				/*
> > +				 * Avoid if skipped recently. Move to the tail
> > +				 * of the list so it will not be found again
> > +				 * soon
> > +				 */
> > +				if (get_pageblock_skip(freepage)) {
> > +
> > +					if (list_is_last(freelist, &freepage->lru))
> > +						break;
> > +
> > +					nr_skipped++;
> > +					list_del(&freepage->lru);
> > +					list_add_tail(&freepage->lru, freelist);
> 
> Use list_move_tail() instead of del+add ?

Yep, that will work fine.

> Also is this even safe inside
> list_for_each_entry() and not list_for_each_entry_safe()? I guess
> without the extra safe iterator, we moved freepage, which is our
> iterator, to the tail, so the for cycle will immediately end?

This is an oversight. In an earlier iteration, it always terminated
after a list adjustment. This was abandoned and I should have used the
safe variant after that to avoid an early termination. Will fix.

> Also is this moving of one page needed when you also have
> move_freelist_tail() to move everything we scanned at once?
> 
> 
> > +					if (nr_skipped > 2)
> > +						break;
> 
> Counting number of skips per order seems weird. What's the intention, is
> it not to encounter again a page that we already moved to tail? That
> could be solved differently, e.g. using only move_freelist_tail()?
> 

The intention was to avoid searching a full list of pages that are
marked for skipping and to instead bail early. However, the benefit may
be marginal when the full series is taken into account so rather than
defend it, I'll remove it and readd in isolation iff I have better
supporting data for it.

I'm not going to rely on move_freelist_tail() alone because that is used
after a candidate is found and the list walk terminates, whereas I want to
continue searching the freelist if the block has been marked for skip.

> > +					continue;
> > +				}
> > +
> > +				/* Reorder so a future search skips recent pages */
> > +				move_freelist_tail(freelist, freepage);
> > +
> > +				pfn = pageblock_start_pfn(free_pfn);
> > +				cc->fast_search_fail = 0;
> > +				set_pageblock_skip(freepage);
> 
> Hmm with pageblock skip bit set, we return to isolate_migratepages(),
> and there's isolation_suitable() check which tests the skip bit, so
> AFAICS in the end we skip the pageblock we found here?
> 

Hmm, yes. This partially "worked" because the linear scan would continue
afterwards, but it starts in the wrong place so the search cost was still
reduced.

> > +				break;
> > +			}
> > +
> > +			/*
> > +			 * If low PFNs are being found and discarded then
> > +			 * limit the scan as fast searching is finding
> > +			 * poor candidates.
> > +			 */
> 
> I wonder about the "low PFNs are being found and discarded" part. Maybe
> I'm missing it, but I don't see them being discarded above, this seems
> to be the first check against cc->migrate_pfn. With the min() part in
> update_fast_start_pfn(), does it mean we can actually go back and rescan
> (or skip thanks to skip bits, anyway) again pageblocks that we already
> scanned?
> 

Extremely poor phrasing. My mind was thinking in terms of discarding
unsuitable candidates as they were below the migration scanner and it
did not translate properly.

Based on your feedback, how does the following untested diff look?

diff --git a/mm/compaction.c b/mm/compaction.c
index 6f649a3bd256..ccc38af323ab 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1572,16 +1572,15 @@ static unsigned long fast_find_migrateblock(struct compact_control *cc)
 	     order--) {
 		struct free_area *area = &cc->zone->free_area[order];
 		struct list_head *freelist;
-		unsigned long nr_skipped = 0;
 		unsigned long flags;
-		struct page *freepage;
+		struct page *freepage, *tmp;
 
 		if (!area->nr_free)
 			continue;
 
 		spin_lock_irqsave(&cc->zone->lock, flags);
 		freelist = &area->free_list[MIGRATE_MOVABLE];
-		list_for_each_entry(freepage, freelist, lru) {
+		list_for_each_entry_safe(freepage, tmp, freelist, lru) {
 			unsigned long free_pfn;
 
 			nr_scanned++;
@@ -1593,15 +1592,10 @@ static unsigned long fast_find_migrateblock(struct compact_control *cc)
 				 * soon
 				 */
 				if (get_pageblock_skip(freepage)) {
-
 					if (list_is_last(freelist, &freepage->lru))
 						break;
 
-					nr_skipped++;
-					list_del(&freepage->lru);
-					list_add_tail(&freepage->lru, freelist);
-					if (nr_skipped > 2)
-						break;
+					list_move_tail(&freepage->lru, freelist);
 					continue;
 				}
 
@@ -1616,9 +1610,11 @@ static unsigned long fast_find_migrateblock(struct compact_control *cc)
 			}
 
 			/*
-			 * If low PFNs are being found and discarded then
-			 * limit the scan as fast searching is finding
-			 * poor candidates.
+			 * PFNs below the migrate scanner are not suitable as
+			 * it may result in pageblocks being rescanned that
+			 * are not necessarily marked for skipping. Limit the
+			 * search if unsuitable candidates are being found
+			 * on the freelist.
 			 */
 			if (free_pfn < cc->migrate_pfn)
 				limit >>= 1;
@@ -1659,6 +1655,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 	const isolate_mode_t isolate_mode =
 		(sysctl_compact_unevictable_allowed ? ISOLATE_UNEVICTABLE : 0) |
 		(cc->mode != MIGRATE_SYNC ? ISOLATE_ASYNC_MIGRATE : 0);
+	bool fast_find_block;
 
 	/*
 	 * Start at where we last stopped, or beginning of the zone as
@@ -1670,6 +1667,13 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 	if (block_start_pfn < zone->zone_start_pfn)
 		block_start_pfn = zone->zone_start_pfn;
 
+	/*
+	 * fast_find_migrateblock marks a pageblock skipped so to avoid
+	 * the isolation_suitable check below, check whether the fast
+	 * search was successful.
+	 */
+	fast_find_block = low_pfn != cc->migrate_pfn && !cc->fast_search_fail;
+
 	/* Only scan within a pageblock boundary */
 	block_end_pfn = pageblock_end_pfn(low_pfn);
 
@@ -1678,6 +1682,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 	 * Do not cross the free scanner.
 	 */
 	for (; block_end_pfn <= cc->free_pfn;
+			fast_find_block = false,
 			low_pfn = block_end_pfn,
 			block_start_pfn = block_end_pfn,
 			block_end_pfn += pageblock_nr_pages) {
@@ -1703,7 +1708,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 		 * not scan the same block.
 		 */
 		if (IS_ALIGNED(low_pfn, pageblock_nr_pages) &&
-		    !isolation_suitable(cc, page))
+		    !fast_find_block && !isolation_suitable(cc, page))
 			continue;
 
 		/*

^ permalink raw reply related	[flat|nested] 75+ messages in thread

* Re: [PATCH 11/25] mm, compaction: Use free lists to quickly locate a migration source
  2019-01-16 14:33     ` Mel Gorman
@ 2019-01-16 15:00       ` Vlastimil Babka
  2019-01-16 15:43         ` Mel Gorman
  0 siblings, 1 reply; 75+ messages in thread
From: Vlastimil Babka @ 2019-01-16 15:00 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, David Rientjes, Andrea Arcangeli, ying.huang, kirill,
	Andrew Morton, Linux List Kernel Mailing

On 1/16/19 3:33 PM, Mel Gorman wrote:
>>> +				break;
>>> +			}
>>> +
>>> +			/*
>>> +			 * If low PFNs are being found and discarded then
>>> +			 * limit the scan as fast searching is finding
>>> +			 * poor candidates.
>>> +			 */
>>
>> I wonder about the "low PFNs are being found and discarded" part. Maybe
>> I'm missing it, but I don't see them being discarded above, this seems
>> to be the first check against cc->migrate_pfn. With the min() part in
>> update_fast_start_pfn(), does it mean we can actually go back and rescan
>> (or skip thanks to skip bits, anyway) again pageblocks that we already
>> scanned?
>>
> 
> Extremely poor phrasing. My mind was thinking in terms of discarding
> unsuitable candidates as they were below the migration scanner and it
> did not translate properly.
> 
> Based on your feedback, how does the following untested diff look?

IMHO better. Meanwhile I noticed that the next patch removes the
set_pageblock_skip() so maybe it's needless churn to introduce the
fast_find_block, but I'll check more closely.

The new comment about pfns below cc->migrate_pfn is better but I still
wonder if it would be better to really skip over those candidates (they
are still called unsuitable) and not go backwards with cc->migrate_pfn.
But if you think the pageblock skip bits and halving of limit minimizes
pointless rescan sufficiently, then fine.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 11/25] mm, compaction: Use free lists to quickly locate a migration source
  2019-01-16 15:00       ` Vlastimil Babka
@ 2019-01-16 15:43         ` Mel Gorman
  0 siblings, 0 replies; 75+ messages in thread
From: Mel Gorman @ 2019-01-16 15:43 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Linux-MM, David Rientjes, Andrea Arcangeli, ying.huang, kirill,
	Andrew Morton, Linux List Kernel Mailing

On Wed, Jan 16, 2019 at 04:00:22PM +0100, Vlastimil Babka wrote:
> On 1/16/19 3:33 PM, Mel Gorman wrote:
> >>> +				break;
> >>> +			}
> >>> +
> >>> +			/*
> >>> +			 * If low PFNs are being found and discarded then
> >>> +			 * limit the scan as fast searching is finding
> >>> +			 * poor candidates.
> >>> +			 */
> >>
> >> I wonder about the "low PFNs are being found and discarded" part. Maybe
> >> I'm missing it, but I don't see them being discarded above, this seems
> >> to be the first check against cc->migrate_pfn. With the min() part in
> >> update_fast_start_pfn(), does it mean we can actually go back and rescan
> >> (or skip thanks to skip bits, anyway) again pageblocks that we already
> >> scanned?
> >>
> > 
> > Extremely poor phrasing. My mind was thinking in terms of discarding
> > unsuitable candidates as they were below the migration scanner and it
> > did not translate properly.
> > 
> > Based on your feedback, how does the following untested diff look?
> 
> IMHO better. Meanwhile I noticed that the next patch removes the
> set_pageblock_skip() so maybe it's needless churn to introduce the
> fast_find_block, but I'll check more closely.
> 

Indeed, but the patches should stand alone and preserve bisection as best
as possible, so while it's weird looking, I'll add the logic and just take
it back out again in the next patch. Merging the patches together would
lead to a tricky review!

> The new comment about pfns below cc->migrate_pfn is better but I still
> wonder if it would be better to really skip over those candidates (they
> are still called unsuitable) and not go backwards with cc->migrate_pfn.
> But if you think the pageblock skip bits and halving of limit minimizes
> pointless rescan sufficiently, then fine.

I'll check if it works out better to ensure they are really skipped.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 12/25] mm, compaction: Keep migration source private to a single compaction instance
  2019-01-04 12:49 ` [PATCH 12/25] mm, compaction: Keep migration source private to a single compaction instance Mel Gorman
@ 2019-01-16 15:45   ` Vlastimil Babka
  2019-01-16 16:15     ` Mel Gorman
  2019-01-17  9:40   ` Vlastimil Babka
  1 sibling, 1 reply; 75+ messages in thread
From: Vlastimil Babka @ 2019-01-16 15:45 UTC (permalink / raw)
  To: Mel Gorman, Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, ying.huang, kirill,
	Andrew Morton, Linux List Kernel Mailing

On 1/4/19 1:49 PM, Mel Gorman wrote:
> Due to either a fast search of the free list or a linear scan, it is
> possible for multiple compaction instances to pick the same pageblock
> for migration.  This is lucky for one scanner and increased scanning for
> all the others. It also allows a race between requests on which first
> allocates the resulting free block.
> 
> This patch tests and updates the pageblock skip for the migration scanner
> carefully. When isolating a block, it will check and skip if the block is
> already in use. Once the zone lock is acquired, it will be rechecked so
> that only one scanner can set the pageblock skip for exclusive use. Any
> scanner contending will continue with a linear scan. The skip bit is
> still set if no pages can be isolated in a range.

Also the skip bit will remain set even if pages *could* be isolated,
AFAICS there's no clearing after a block was finished with
nr_isolated>0. Is it intended? Note that even previously it wasn't ideal,
because when a pageblock was visited multiple times due to
COMPACT_CLUSTER_MAX, it would be marked with the skip bit if the last visit
failed to isolate, even if the previous visits didn't.

> While this may result
> in redundant scanning, it avoids unnecessarily acquiring the zone lock
> when there are no suitable migration sources.



> 1-socket thpscale
>                                         4.20.0                 4.20.0
>                                  findmig-v2r15          isolmig-v2r15
> Amean     fault-both-1         0.00 (   0.00%)        0.00 *   0.00%*
> Amean     fault-both-3      3505.69 (   0.00%)     3066.68 *  12.52%*
> Amean     fault-both-5      5794.13 (   0.00%)     4298.49 *  25.81%*
> Amean     fault-both-7      7663.09 (   0.00%)     5986.99 *  21.87%*
> Amean     fault-both-12    10983.36 (   0.00%)     9324.85 (  15.10%)
> Amean     fault-both-18    13602.71 (   0.00%)    13350.05 (   1.86%)
> Amean     fault-both-24    16145.77 (   0.00%)    13491.77 *  16.44%*
> Amean     fault-both-30    19753.82 (   0.00%)    15630.86 *  20.87%*
> Amean     fault-both-32    20616.16 (   0.00%)    17428.50 *  15.46%*
> 
> This is the first patch that shows a significant reduction in latency as
> multiple compaction scanners do not operate on the same blocks. There is
> a small increase in the success rate
> 
>                                4.20.0-rc6             4.20.0-rc6
>                              findmig-v1r4           isolmig-v1r4
> Percentage huge-3        90.58 (   0.00%)       95.84 (   5.81%)
> Percentage huge-5        91.34 (   0.00%)       94.19 (   3.12%)
> Percentage huge-7        92.21 (   0.00%)       93.78 (   1.71%)
> Percentage huge-12       92.48 (   0.00%)       94.33 (   2.00%)
> Percentage huge-18       91.65 (   0.00%)       94.15 (   2.72%)
> Percentage huge-24       90.23 (   0.00%)       94.23 (   4.43%)
> Percentage huge-30       90.17 (   0.00%)       95.17 (   5.54%)
> Percentage huge-32       89.72 (   0.00%)       93.59 (   4.32%)
> 
> Compaction migrate scanned    54168306    25516488
> Compaction free scanned      800530954    87603321
> 
> Migration scan rates are reduced by 52%.

Wonder how much of that is due to not clearing as pointed out above.
Also interesting how free scanned was reduced so disproportionally.

> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 12/25] mm, compaction: Keep migration source private to a single compaction instance
  2019-01-16 15:45   ` Vlastimil Babka
@ 2019-01-16 16:15     ` Mel Gorman
  2019-01-17  9:29       ` Vlastimil Babka
  0 siblings, 1 reply; 75+ messages in thread
From: Mel Gorman @ 2019-01-16 16:15 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Linux-MM, David Rientjes, Andrea Arcangeli, ying.huang, kirill,
	Andrew Morton, Linux List Kernel Mailing

On Wed, Jan 16, 2019 at 04:45:59PM +0100, Vlastimil Babka wrote:
> On 1/4/19 1:49 PM, Mel Gorman wrote:
> > Due to either a fast search of the free list or a linear scan, it is
> > possible for multiple compaction instances to pick the same pageblock
> > for migration.  This is lucky for one scanner and increased scanning for
> > all the others. It also allows a race between requests on which first
> > allocates the resulting free block.
> > 
> > This patch tests and updates the pageblock skip for the migration scanner
> > carefully. When isolating a block, it will check and skip if the block is
> > already in use. Once the zone lock is acquired, it will be rechecked so
> > that only one scanner can set the pageblock skip for exclusive use. Any
> > scanner contending will continue with a linear scan. The skip bit is
> > still set if no pages can be isolated in a range.
> 
> Also the skip bit will remain set even if pages *could* be isolated,

That's the point -- the pageblock is scanned by one compaction instance
and skipped by others.

> AFAICS there's no clearing after a block was finished with
> nr_isolated>0. Is it intended?

Yes, defer to a full reset later when the compaction scanners meet.
Tracing really indicated we spent a stupid amount of time scanning,
rescanning and competing for pageblocks within short intervals.

> > Migration scan rates are reduced by 52%.
> 
> Wonder how much of that is due to not clearing as pointed out above.
> Also interesting how free scanned was reduced so disproportionally.
> 

The amount of free scanning is related to the amount of migration
scanning. If migration sources are scanning, rescanning and competing
for the same pageblocks, it can result in unnecessary free scanning too.
It doesn't fully explain the drop but I didn't specifically try to quantify
it either as the free scanner is altered further in later patches.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 12/25] mm, compaction: Keep migration source private to a single compaction instance
  2019-01-16 16:15     ` Mel Gorman
@ 2019-01-17  9:29       ` Vlastimil Babka
  0 siblings, 0 replies; 75+ messages in thread
From: Vlastimil Babka @ 2019-01-17  9:29 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, David Rientjes, Andrea Arcangeli, ying.huang, kirill,
	Andrew Morton, Linux List Kernel Mailing

On 1/16/19 5:15 PM, Mel Gorman wrote:
> On Wed, Jan 16, 2019 at 04:45:59PM +0100, Vlastimil Babka wrote:
>> On 1/4/19 1:49 PM, Mel Gorman wrote:
>> > Due to either a fast search of the free list or a linear scan, it is
>> > possible for multiple compaction instances to pick the same pageblock
>> > for migration.  This is lucky for one scanner and increased scanning for
>> > all the others. It also allows a race between requests on which first
>> > allocates the resulting free block.
>> > 
>> > This patch tests and updates the pageblock skip for the migration scanner
>> > carefully. When isolating a block, it will check and skip if the block is
>> > already in use. Once the zone lock is acquired, it will be rechecked so
>> > that only one scanner can set the pageblock skip for exclusive use. Any
>> > scanner contending will continue with a linear scan. The skip bit is
>> > still set if no pages can be isolated in a range.
>> 
>> Also the skip bit will remain set even if pages *could* be isolated,
> 
> That's the point -- the pageblock is scanned by one compaction instance
> and skipped by others.

OK, I understood wrongly that this is meant just to avoid races.

>> AFAICS there's no clearing after a block was finished with
>> nr_isolated>0. Is it intended?
> 
> Yes, defer to a full reset later when the compaction scanners meet.
> Tracing really indicated we spent a stupid amount of time scanning,
> rescanning and competing for pageblocks within short intervals.

Right.

>> > Migration scan rates are reduced by 52%.
>> 
>> Wonder how much of that is due to not clearing as pointed out above.
>> Also interesting how free scanned was reduced so disproportionally.
>> 
> 
> The amount of free scanning is related to the amount of migration
> scanning. If migration sources are scanning, rescanning and competing
> for the same pageblocks, it can result in unnecessary free scanning too.
> It doesn't fully explain the drop but I didn't specifically try to quantify
> it either as the free scanner is altered further in later patches.

Perhaps lots of skipping in the migration scanners means that they progress faster
into the parts of the zone that would otherwise be scanned by the free scanner, so
the free scanner has less work to do. But I agree that it's moot to investigate
too much if there are further changes later.


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 12/25] mm, compaction: Keep migration source private to a single compaction instance
  2019-01-04 12:49 ` [PATCH 12/25] mm, compaction: Keep migration source private to a single compaction instance Mel Gorman
  2019-01-16 15:45   ` Vlastimil Babka
@ 2019-01-17  9:40   ` Vlastimil Babka
  1 sibling, 0 replies; 75+ messages in thread
From: Vlastimil Babka @ 2019-01-17  9:40 UTC (permalink / raw)
  To: Mel Gorman, Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, ying.huang, kirill,
	Andrew Morton, Linux List Kernel Mailing

On 1/4/19 1:49 PM, Mel Gorman wrote:
> Due to either a fast search of the free list or a linear scan, it is
> possible for multiple compaction instances to pick the same pageblock
> for migration.  This is lucky for one scanner and increased scanning for
> all the others. It also allows a race between requests on which first
> allocates the resulting free block.
> 
> This patch tests and updates the pageblock skip for the migration scanner
> carefully. When isolating a block, it will check and skip if the block is
> already in use. Once the zone lock is acquired, it will be rechecked so
> that only one scanner can set the pageblock skip for exclusive use. Any
> scanner contending will continue with a linear scan. The skip bit is
> still set if no pages can be isolated in a range. While this may result
> in redundant scanning, it avoids unnecessarily acquiring the zone lock
> when there are no suitable migration sources.
> 
> 1-socket thpscale
>                                         4.20.0                 4.20.0
>                                  findmig-v2r15          isolmig-v2r15
> Amean     fault-both-1         0.00 (   0.00%)        0.00 *   0.00%*
> Amean     fault-both-3      3505.69 (   0.00%)     3066.68 *  12.52%*
> Amean     fault-both-5      5794.13 (   0.00%)     4298.49 *  25.81%*
> Amean     fault-both-7      7663.09 (   0.00%)     5986.99 *  21.87%*
> Amean     fault-both-12    10983.36 (   0.00%)     9324.85 (  15.10%)
> Amean     fault-both-18    13602.71 (   0.00%)    13350.05 (   1.86%)
> Amean     fault-both-24    16145.77 (   0.00%)    13491.77 *  16.44%*
> Amean     fault-both-30    19753.82 (   0.00%)    15630.86 *  20.87%*
> Amean     fault-both-32    20616.16 (   0.00%)    17428.50 *  15.46%*
> 
> This is the first patch that shows a significant reduction in latency as
> multiple compaction scanners do not operate on the same blocks. There is
> a small increase in the success rate
> 
>                                4.20.0-rc6             4.20.0-rc6
>                              findmig-v1r4           isolmig-v1r4
> Percentage huge-3        90.58 (   0.00%)       95.84 (   5.81%)
> Percentage huge-5        91.34 (   0.00%)       94.19 (   3.12%)
> Percentage huge-7        92.21 (   0.00%)       93.78 (   1.71%)
> Percentage huge-12       92.48 (   0.00%)       94.33 (   2.00%)
> Percentage huge-18       91.65 (   0.00%)       94.15 (   2.72%)
> Percentage huge-24       90.23 (   0.00%)       94.23 (   4.43%)
> Percentage huge-30       90.17 (   0.00%)       95.17 (   5.54%)
> Percentage huge-32       89.72 (   0.00%)       93.59 (   4.32%)
> 
> Compaction migrate scanned    54168306    25516488
> Compaction free scanned      800530954    87603321
> 
> Migration scan rates are reduced by 52%.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 13/25] mm, compaction: Use free lists to quickly locate a migration target
  2019-01-04 12:49 ` [PATCH 13/25] mm, compaction: Use free lists to quickly locate a migration target Mel Gorman
@ 2019-01-17 14:36   ` Vlastimil Babka
  2019-01-17 15:51     ` Mel Gorman
  0 siblings, 1 reply; 75+ messages in thread
From: Vlastimil Babka @ 2019-01-17 14:36 UTC (permalink / raw)
  To: Mel Gorman, Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, ying.huang, kirill,
	Andrew Morton, Linux List Kernel Mailing

On 1/4/19 1:49 PM, Mel Gorman wrote:
> Similar to the migration scanner, this patch uses the free lists to quickly
> locate a migration target. The search is different in that lower orders
> will be searched for a suitable high PFN if necessary but the search
> is still bound. This is justified on the grounds that the free scanner
> typically scans linearly much more than the migration scanner.
> 
> If a free page is found, it is isolated and compaction continues if enough
> pages were isolated. For SYNC* scanning, the full pageblock is scanned
> for any remaining free pages so that it can be marked for skipping in
> the near future.
> 
> 1-socket thpfioscale
>                                         4.20.0                 4.20.0
>                                  isolmig-v2r15         findfree-v2r15
> Amean     fault-both-1         0.00 (   0.00%)        0.00 *   0.00%*
> Amean     fault-both-3      3066.68 (   0.00%)     2884.51 (   5.94%)
> Amean     fault-both-5      4298.49 (   0.00%)     4419.70 (  -2.82%)
> Amean     fault-both-7      5986.99 (   0.00%)     6039.04 (  -0.87%)
> Amean     fault-both-12     9324.85 (   0.00%)     9992.34 (  -7.16%)
> Amean     fault-both-18    13350.05 (   0.00%)    12690.05 (   4.94%)
> Amean     fault-both-24    13491.77 (   0.00%)    14393.93 (  -6.69%)
> Amean     fault-both-30    15630.86 (   0.00%)    16894.08 (  -8.08%)
> Amean     fault-both-32    17428.50 (   0.00%)    17813.68 (  -2.21%)
> 
> The impact on latency is variable but the search is optimistic and
> sensitive to the exact system state. Success rates are similar but
> the major impact is to the rate of scanning
> 
>                             4.20.0-rc6  4.20.0-rc6
>                           isolmig-v1r4findfree-v1r8
> Compaction migrate scanned    25516488    28324352
> Compaction free scanned       87603321    56131065
> 
> The free scan rates are reduced by 35%. The 2-socket reductions for the
> free scanner are more dramatic which is a likely reflection that the
> machine has more memory.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> ---
>  mm/compaction.c | 203 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 198 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 24e3a9db4b70..9438f0564ed5 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1136,7 +1136,7 @@ static inline bool compact_scanners_met(struct compact_control *cc)
>  
>  /* Reorder the free list to reduce repeated future searches */
>  static void
> -move_freelist_tail(struct list_head *freelist, struct page *freepage)
> +move_freelist_head(struct list_head *freelist, struct page *freepage)
>  {
>  	LIST_HEAD(sublist);
>  
> @@ -1147,6 +1147,193 @@ move_freelist_tail(struct list_head *freelist, struct page *freepage)
>  	}
>  }

Hmm this hunk appears to simply rename move_freelist_tail() to
move_freelist_head(), but fast_find_migrateblock() is unchanged, so it now calls
the new version below.

> +static void
> +move_freelist_tail(struct list_head *freelist, struct page *freepage)
> +{
> +	LIST_HEAD(sublist);
> +
> +	if (!list_is_last(freelist, &freepage->lru)) {
> +		list_cut_before(&sublist, freelist, &freepage->lru);
> +		if (!list_empty(&sublist))
> +			list_splice_tail(&sublist, freelist);
> +	}
> +}

And this differs in using list_cut_before() instead of list_cut_position(). I'm
assuming that move_freelist_tail() was supposed to be unchanged, and
move_freelist_head() different, but it's the opposite. BTW it would be nice to
document what both of the functions are doing at a high level :) The one
above was a bit tricky for me to decode, as it seems to be moving the initial
part of the list to the tail, to effectively move the latter part of the list
(including freepage) to the head.
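
For reference, a worked example of the two list_cut helpers (plain
<linux/list.h> semantics, not the patch itself), with freelist = [A, B, C, D]
and freepage = C:

	/*
	 * list_cut_position(&sublist, freelist, &C->lru):
	 *	sublist  = [A, B, C]		(up to and including C)
	 *	freelist = [D]
	 *	list_splice_tail(&sublist, freelist) -> [D, A, B, C]
	 *	i.e. C and everything before it move to the tail.
	 *
	 * list_cut_before(&sublist, freelist, &C->lru):
	 *	sublist  = [A, B]		(up to but not including C)
	 *	freelist = [C, D]
	 *	list_splice_tail(&sublist, freelist) -> [C, D, A, B]
	 *	i.e. C effectively moves to the head.
	 */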

> +static void
> +fast_isolate_around(struct compact_control *cc, unsigned long pfn, unsigned long nr_isolated)
> +{
> +	unsigned long start_pfn, end_pfn;
> +	struct page *page = pfn_to_page(pfn);
> +
> +	/* Do not search around if there are enough pages already */
> +	if (cc->nr_freepages >= cc->nr_migratepages)
> +		return;
> +
> +	/* Minimise scanning during async compaction */
> +	if (cc->direct_compaction && cc->mode == MIGRATE_ASYNC)
> +		return;
> +
> +	/* Pageblock boundaries */
> +	start_pfn = pageblock_start_pfn(pfn);
> +	end_pfn = min(start_pfn + pageblock_nr_pages, zone_end_pfn(cc->zone));
> +
> +	/* Scan before */
> +	if (start_pfn != pfn) {
> +		isolate_freepages_block(cc, &start_pfn, pfn, &cc->freepages, false);
> +		if (cc->nr_freepages >= cc->nr_migratepages)
> +			return;
> +	}
> +
> +	/* Scan after */
> +	start_pfn = pfn + nr_isolated;
> +	if (start_pfn != end_pfn)
> +		isolate_freepages_block(cc, &start_pfn, end_pfn, &cc->freepages, false);
> +
> +	/* Skip this pageblock in the future as it's full or nearly full */
> +	if (cc->nr_freepages < cc->nr_migratepages)
> +		set_pageblock_skip(page);
> +}
> +
> +static unsigned long
> +fast_isolate_freepages(struct compact_control *cc)
> +{
> +	unsigned int limit = min(1U, freelist_scan_limit(cc) >> 1);
> +	unsigned int order_scanned = 0, nr_scanned = 0;
> +	unsigned long low_pfn, min_pfn, high_pfn = 0, highest = 0;
> +	unsigned long nr_isolated = 0;
> +	unsigned long distance;
> +	struct page *page = NULL;
> +	bool scan_start = false;
> +	int order;
> +
> +	/*
> +	 * If starting the scan, use a deeper search and use the highest
> +	 * PFN found if a suitable one is not found.
> +	 */
> +	if (cc->free_pfn == pageblock_start_pfn(zone_end_pfn(cc->zone) - 1)) {
> +		limit = pageblock_nr_pages >> 1;
> +		scan_start = true;
> +	}
> +
> +	/*
> +	 * Preferred point is in the top quarter of the scan space but take
> +	 * a pfn from the top half if the search is problematic.
> +	 */
> +	distance = (cc->free_pfn - cc->migrate_pfn);
> +	low_pfn = pageblock_start_pfn(cc->free_pfn - (distance >> 2));
> +	min_pfn = pageblock_start_pfn(cc->free_pfn - (distance >> 1));
> +
> +	if (WARN_ON_ONCE(min_pfn > low_pfn))
> +		low_pfn = min_pfn;
> +
> +	for (order = cc->order - 1;
> +	     order >= 0 && !page;
> +	     order--) {
> +		struct free_area *area = &cc->zone->free_area[order];
> +		struct list_head *freelist;
> +		struct page *freepage;
> +		unsigned long flags;
> +
> +		if (!area->nr_free)
> +			continue;
> +
> +		spin_lock_irqsave(&cc->zone->lock, flags);
> +		freelist = &area->free_list[MIGRATE_MOVABLE];
> +		list_for_each_entry_reverse(freepage, freelist, lru) {
> +			unsigned long pfn;
> +
> +			order_scanned++;
> +			nr_scanned++;

Seems order_scanned is supposed to be reset to 0 for each new order? Otherwise
it's equivalent to nr_scanned...
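
Presumably the intent was something along these lines (a sketch of the reset,
not the actual fix):

	for (order = cc->order - 1;
	     order >= 0 && !page;
	     order--) {
		order_scanned = 0;	/* reset the per-order scan budget */

		/* ... rest of the loop body unchanged ... */
	}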

> +			pfn = page_to_pfn(freepage);
> +
> +			if (pfn >= highest)
> +				highest = pageblock_start_pfn(pfn);
> +
> +			if (pfn >= low_pfn) {
> +				cc->fast_search_fail = 0;
> +				page = freepage;
> +				break;
> +			}
> +
> +			if (pfn >= min_pfn && pfn > high_pfn) {
> +				high_pfn = pfn;
> +
> +				/* Shorten the scan if a candidate is found */
> +				limit >>= 1;
> +			}
> +
> +			if (order_scanned >= limit)
> +				break;
> +		}
> +
> +		/* Use a minimum pfn if a preferred one was not found */
> +		if (!page && high_pfn) {
> +			page = pfn_to_page(high_pfn);
> +
> +			/* Update freepage for the list reorder below */
> +			freepage = page;
> +		}
> +
> +		/* Reorder to so a future search skips recent pages */
> +		move_freelist_head(freelist, freepage);
> +
> +		/* Isolate the page if available */
> +		if (page) {
> +			if (__isolate_free_page(page, order)) {
> +				set_page_private(page, order);
> +				nr_isolated = 1 << order;
> +				cc->nr_freepages += nr_isolated;
> +				list_add_tail(&page->lru, &cc->freepages);
> +				count_compact_events(COMPACTISOLATED, nr_isolated);
> +			} else {
> +				/* If isolation fails, abort the search */
> +				order = -1;
> +				page = NULL;
> +			}
> +		}
> +
> +		spin_unlock_irqrestore(&cc->zone->lock, flags);
> +
> +		/*
> +		 * Smaller scan on next order so the total scan ig related
> +		 * to freelist_scan_limit.
> +		 */
> +		if (order_scanned >= limit)

... and this also indicates order_scanned is supposed to be reset.

> +			limit = min(1U, limit >> 1);
> +	}
> +
> +	if (!page) {
> +		cc->fast_search_fail++;
> +		if (scan_start) {
> +			/*
> +			 * Use the highest PFN found above min. If one was
> +			 * not found, be pessemistic for direct compaction
> +			 * and use the min mark.
> +			 */
> +			if (highest) {
> +				page = pfn_to_page(highest);
> +				cc->free_pfn = highest;
> +			} else {
> +				if (cc->direct_compaction) {
> +					page = pfn_to_page(min_pfn);
> +					cc->free_pfn = min_pfn;
> +				}
> +			}
> +		}
> +	}
> +
> +	if (highest && highest > cc->zone->compact_cached_free_pfn)
> +		cc->zone->compact_cached_free_pfn = highest;
> +
> +	cc->total_free_scanned += nr_scanned;
> +	if (!page)
> +		return cc->free_pfn;
> +
> +	low_pfn = page_to_pfn(page);
> +	fast_isolate_around(cc, low_pfn, nr_isolated);
> +	return low_pfn;
> +}
> +
>  /*
>   * Based on information in the current compact_control, find blocks
>   * suitable for isolating free pages from and then isolate them.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 14/25] mm, compaction: Avoid rescanning the same pageblock multiple times
  2019-01-04 12:50 ` [PATCH 14/25] mm, compaction: Avoid rescanning the same pageblock multiple times Mel Gorman
@ 2019-01-17 15:16   ` Vlastimil Babka
  2019-01-17 16:00     ` Mel Gorman
  0 siblings, 1 reply; 75+ messages in thread
From: Vlastimil Babka @ 2019-01-17 15:16 UTC (permalink / raw)
  To: Mel Gorman, Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, ying.huang, kirill,
	Andrew Morton, Linux List Kernel Mailing

On 1/4/19 1:50 PM, Mel Gorman wrote:
> Pageblocks are marked for skip when no pages are isolated after a scan.
> However, it's possible to hit corner cases where the migration scanner
> gets stuck near the boundary between the source and target scanner. Due
> to pages being migrated in blocks of COMPACT_CLUSTER_MAX, pages that
> are migrated can be reallocated before the pageblock is complete. The
> pageblock is not necessarily skipped so it can be rescanned multiple
> times. Similarly, a pageblock with some dirty/writeback pages may fail
> to isolate and be rescanned until writeback completes which is wasteful.

     ^ migrate? If we failed to isolate, then it wouldn't bump nr_isolated.
Wonder if we could do better checks and not isolate pages that cannot be
migrated at the moment anyway.

> 
> This patch tracks if a pageblock is being rescanned. If so, then the entire
> pageblock will be migrated as one operation. This narrows the race window
> during which pages can be reallocated during migration. Secondly, if there
> are pages that cannot be isolated then the pageblock will still be fully
> scanned and marked for skipping. On the second rescan, the pageblock skip
> is set and the migration scanner makes progress.
> 
>                                         4.20.0                 4.20.0
>                               finishscan-v2r15         norescan-v2r15
> Amean     fault-both-3      3729.80 (   0.00%)     2872.13 *  23.00%*
> Amean     fault-both-5      5148.49 (   0.00%)     4330.56 *  15.89%*
> Amean     fault-both-7      7393.24 (   0.00%)     6496.63 (  12.13%)
> Amean     fault-both-12    11709.32 (   0.00%)    10280.59 (  12.20%)
> Amean     fault-both-18    16626.82 (   0.00%)    11079.19 *  33.37%*
> Amean     fault-both-24    19944.34 (   0.00%)    17207.80 *  13.72%*
> Amean     fault-both-30    23435.53 (   0.00%)    17736.13 *  24.32%*
> Amean     fault-both-32    23948.70 (   0.00%)    18509.41 *  22.71%*
> 
>                                    4.20.0                 4.20.0
>                          finishscan-v2r15         norescan-v2r15
> Percentage huge-1         0.00 (   0.00%)        0.00 (   0.00%)
> Percentage huge-3        88.39 (   0.00%)       96.87 (   9.60%)
> Percentage huge-5        92.07 (   0.00%)       94.63 (   2.77%)
> Percentage huge-7        91.96 (   0.00%)       93.83 (   2.03%)
> Percentage huge-12       93.38 (   0.00%)       92.65 (  -0.78%)
> Percentage huge-18       91.89 (   0.00%)       93.66 (   1.94%)
> Percentage huge-24       91.37 (   0.00%)       93.15 (   1.95%)
> Percentage huge-30       92.77 (   0.00%)       93.16 (   0.42%)
> Percentage huge-32       87.97 (   0.00%)       92.58 (   5.24%)
> 
> The fault latency reduction is large and while the THP allocation
> success rate is only slightly higher, it's already high at this
> point of the series.
> 
> Compaction migrate scanned    60718343.00    31772603.00
> Compaction free scanned      933061894.00    63267928.00

Hm, I thought the order of magnitude difference between migrate and free scanned
was already gone at this point, as reported in the previous 2 patches. Or is this
from a different system/configuration? Anyway, an encouraging result. I would expect
that after "Keep migration source private to a single compaction instance" sets
the skip bits much earlier and more aggressively, the rescans would not happen
anymore thanks to those, even if cached pfns were not updated.

> Migration scan rates are reduced by 48% and free scan rates are
> also reduced as the same migration source block is not being selected
> multiple times. The corner case where migration scan rates go through the
> roof due to a dirty/writeback pageblock located at the boundary of the
> migration/free scanner did not happen in this case. When it does happen,
> the scan rates multiply by factors measured in the hundreds and would be
> misleading to present.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

Acked-by: Vlastimil Babka <vbabka@suse.cz>


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 13/25] mm, compaction: Use free lists to quickly locate a migration target
  2019-01-17 14:36   ` Vlastimil Babka
@ 2019-01-17 15:51     ` Mel Gorman
  0 siblings, 0 replies; 75+ messages in thread
From: Mel Gorman @ 2019-01-17 15:51 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Linux-MM, David Rientjes, Andrea Arcangeli, ying.huang, kirill,
	Andrew Morton, Linux List Kernel Mailing

On Thu, Jan 17, 2019 at 03:36:08PM +0100, Vlastimil Babka wrote:
> >  /* Reorder the free list to reduce repeated future searches */
> >  static void
> > -move_freelist_tail(struct list_head *freelist, struct page *freepage)
> > +move_freelist_head(struct list_head *freelist, struct page *freepage)
> >  {
> >  	LIST_HEAD(sublist);
> >  
> > @@ -1147,6 +1147,193 @@ move_freelist_tail(struct list_head *freelist, struct page *freepage)
> >  	}
> >  }
> 
> Hmm this hunk appears to simply rename move_freelist_tail() to
> move_freelist_head(), but fast_find_migrateblock() is unchanged, so it now calls
> the new version below.
> 

Rebase screwup. I'll fix it up and retest

> <SNIP>
> BTW it would be nice to
> document both of the functions what they are doing on the high level :) The one
> above was a bit tricky to decode to me, as it seems to be moving the initial
> part of list to the tail, to effectively move the latter part of the list
> (including freepage) to the head.
> 

I'll include a blurb.

> > +	/*
> > +	 * If starting the scan, use a deeper search and use the highest
> > +	 * PFN found if a suitable one is not found.
> > +	 */
> > +	if (cc->free_pfn == pageblock_start_pfn(zone_end_pfn(cc->zone) - 1)) {
> > +		limit = pageblock_nr_pages >> 1;
> > +		scan_start = true;
> > +	}
> > +
> > +	/*
> > +	 * Preferred point is in the top quarter of the scan space but take
> > +	 * a pfn from the top half if the search is problematic.
> > +	 */
> > +	distance = (cc->free_pfn - cc->migrate_pfn);
> > +	low_pfn = pageblock_start_pfn(cc->free_pfn - (distance >> 2));
> > +	min_pfn = pageblock_start_pfn(cc->free_pfn - (distance >> 1));
> > +
> > +	if (WARN_ON_ONCE(min_pfn > low_pfn))
> > +		low_pfn = min_pfn;
> > +
> > +	for (order = cc->order - 1;
> > +	     order >= 0 && !page;
> > +	     order--) {
> > +		struct free_area *area = &cc->zone->free_area[order];
> > +		struct list_head *freelist;
> > +		struct page *freepage;
> > +		unsigned long flags;
> > +
> > +		if (!area->nr_free)
> > +			continue;
> > +
> > +		spin_lock_irqsave(&cc->zone->lock, flags);
> > +		freelist = &area->free_list[MIGRATE_MOVABLE];
> > +		list_for_each_entry_reverse(freepage, freelist, lru) {
> > +			unsigned long pfn;
> > +
> > +			order_scanned++;
> > +			nr_scanned++;
> 
> Seems order_scanned is supposed to be reset to 0 for each new order? Otherwise
> it's equivalent to nr_scanned...
> 

Yes, it was meant to be. Not sure at what point I broke that and failed
to spot it afterwards. As you note elsewhere, the code structure doesn't
make sense if it wasn't being set to 0. Instead of doing a shorter search
at each order, it would simply check one page for each lower order.

Thanks!

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 14/25] mm, compaction: Avoid rescanning the same pageblock multiple times
  2019-01-17 15:16   ` Vlastimil Babka
@ 2019-01-17 16:00     ` Mel Gorman
  0 siblings, 0 replies; 75+ messages in thread
From: Mel Gorman @ 2019-01-17 16:00 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Linux-MM, David Rientjes, Andrea Arcangeli, ying.huang, kirill,
	Andrew Morton, Linux List Kernel Mailing

On Thu, Jan 17, 2019 at 04:16:54PM +0100, Vlastimil Babka wrote:
> On 1/4/19 1:50 PM, Mel Gorman wrote:
> > Pageblocks are marked for skip when no pages are isolated after a scan.
> > However, it's possible to hit corner cases where the migration scanner
> > gets stuck near the boundary between the source and target scanner. Due
> > to pages being migrated in blocks of COMPACT_CLUSTER_MAX, pages that
> > are migrated can be reallocated before the pageblock is complete. The
> > pageblock is not necessarily skipped so it can be rescanned multiple
> > times. Similarly, a pageblock with some dirty/writeback pages may fail
> > to isolate and be rescanned until writeback completes which is wasteful.
> 
>      ^ migrate? If we failed to isolate, then it wouldn't bump nr_isolated.
> Wonder if we could do better checks and not isolate pages that cannot be
> migrated at the moment anyway.
> 

Potentially but it would be considered a layering violation. There may be
per-fs reasons why a page cannot migrate and no matter how well we check,
there will be race conditions.
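
For illustration only (not something this series does), an extra early check
in isolate_migratepages_block() might look like the sketch below; it would
still be racy and, as above, blind to per-fs constraints:

    /*
     * Hypothetical sketch: skip pages that async migration is very
     * unlikely to move. Still racy, and per-fs reasons for failure
     * remain invisible at this layer.
     */
    if (cc->mode == MIGRATE_ASYNC &&
        (PageWriteback(page) || PageDirty(page)))
        goto isolate_fail;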

> > The fault latency reduction is large and while the THP allocation
> > success rate is only slightly higher, it's already high at this
> > point of the series.
> > 
> > Compaction migrate scanned    60718343.00    31772603.00
> > Compaction free scanned      933061894.00    63267928.00
> 
> Hm I thought the order of magnitude difference between migrate and free scanned
> was already gone at this point as reported in the previous 2 patches.

There are corner cases that mean there can be large differences for a
single run. In some cases it doesn't matter but this one might have been
unlucky. It's something that occurs less as the series progresses.

> Or is this
> from a different system/configuration?

I don't *think* so. While I had multiple machines running tests, I'm
pretty sure I wrote the changelogs based on one machine and only checked
the others had nothing strange.

> Anyway, encouraging result. I would expect
> that after "Keep migration source private to a single compaction instance" sets
> the skip bits much earlier and more aggressively, the rescans would not happen
> anymore thanks to those, even if cached pfns were not updated.
> 

Yes and no. The corner case where the scanner gets stuck rescanning one
pageblock can happen when the fast search fails. In that case, the
linear scanner needs to get to the end of a pageblock and if it fails,
it'll simply rescan like a lunatic. This happened specifically for pages
under writeback for me.
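
To make the fix concrete, its shape is roughly this (paraphrasing the hunks,
not quoting them exactly): compact_zone() notices that the next iteration
starts in the pageblock we just migrated from and sets a flag, and
isolate_migratepages_block() then ignores the COMPACT_CLUSTER_MAX cap so the
whole block is captured in one go:

    /* compact_zone(), paraphrased */
    cc->rescan = false;
    if (pageblock_start_pfn(last_migrated_pfn) ==
        pageblock_start_pfn(cc->migrate_pfn))
        cc->rescan = true;

    /* isolate_migratepages_block(), paraphrased */
    if (cc->nr_migratepages == COMPACT_CLUSTER_MAX && !cc->rescan) {
        ++low_pfn;
        break;
    }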

> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> 

Thanks

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 15/25] mm, compaction: Finish pageblock scanning on contention
  2019-01-04 12:50 ` [PATCH 15/25] mm, compaction: Finish pageblock scanning on contention Mel Gorman
@ 2019-01-17 16:38   ` Vlastimil Babka
  2019-01-17 17:11     ` Mel Gorman
  0 siblings, 1 reply; 75+ messages in thread
From: Vlastimil Babka @ 2019-01-17 16:38 UTC (permalink / raw)
  To: Mel Gorman, Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, ying.huang, kirill,
	Andrew Morton, Linux List Kernel Mailing

On 1/4/19 1:50 PM, Mel Gorman wrote:
> Async migration aborts on spinlock contention but contention can be high
> when there are multiple compaction attempts and kswapd is active. The
> consequence is that the migration scanners move forward uselessly while
> still contending on locks for longer while leaving suitable migration
> sources behind.
> 
> This patch will acquire the lock but track when contention occurs. When
> it does, the current pageblock will finish as compaction may succeed for
> that block and then abort. This will have a variable impact on latency as
> in some cases useless scanning is avoided (reduces latency) but a lock
> will be contended (increase latency) or a single contended pageblock is
> scanned that would otherwise have been skipped (increase latency).
> 
>                                         4.20.0                 4.20.0
>                                 norescan-v2r15    finishcontend-v2r15
> Amean     fault-both-1         0.00 (   0.00%)        0.00 *   0.00%*
> Amean     fault-both-3      2872.13 (   0.00%)     2973.08 (  -3.51%)
> Amean     fault-both-5      4330.56 (   0.00%)     3870.19 (  10.63%)
> Amean     fault-both-7      6496.63 (   0.00%)     6580.50 (  -1.29%)
> Amean     fault-both-12    10280.59 (   0.00%)     9527.40 (   7.33%)
> Amean     fault-both-18    11079.19 (   0.00%)    13395.86 * -20.91%*
> Amean     fault-both-24    17207.80 (   0.00%)    14936.94 *  13.20%*
> Amean     fault-both-30    17736.13 (   0.00%)    16748.46 (   5.57%)
> Amean     fault-both-32    18509.41 (   0.00%)    18521.30 (  -0.06%)
> 
>                                    4.20.0                 4.20.0
>                            norescan-v2r15    finishcontend-v2r15
> Percentage huge-1         0.00 (   0.00%)        0.00 (   0.00%)
> Percentage huge-3        96.87 (   0.00%)       97.57 (   0.72%)
> Percentage huge-5        94.63 (   0.00%)       96.88 (   2.39%)
> Percentage huge-7        93.83 (   0.00%)       95.47 (   1.74%)
> Percentage huge-12       92.65 (   0.00%)       98.64 (   6.47%)
> Percentage huge-18       93.66 (   0.00%)       98.33 (   4.98%)
> Percentage huge-24       93.15 (   0.00%)       98.88 (   6.15%)
> Percentage huge-30       93.16 (   0.00%)       97.09 (   4.21%)
> Percentage huge-32       92.58 (   0.00%)       96.20 (   3.92%)
> 
> As expected, a variable impact on latency while allocation success
> rates are slightly higher. System CPU usage is reduced by about 10%
> but scan rate impact is mixed
> 
> Compaction migrate scanned    31772603    19980216
> Compaction free scanned       63267928   120381828
> 
> Migration scan rates are reduced 37% which is expected as a pageblock
> is used by the async scanner instead of skipped but the free scanning is
> increased. This can be partially accounted for by the increased success
> rate but also by the fact that the scanners do not meet for longer when
> pageblocks are actually used. Overall this is justified and completing
> a pageblock scan is very important for later patches.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

Some comments below.

> @@ -538,18 +535,8 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
>  		 * recheck as well.
>  		 */
>  		if (!locked) {
> -			/*
> -			 * The zone lock must be held to isolate freepages.
> -			 * Unfortunately this is a very coarse lock and can be
> -			 * heavily contended if there are parallel allocations
> -			 * or parallel compactions. For async compaction do not
> -			 * spin on the lock and we acquire the lock as late as
> -			 * possible.
> -			 */
> -			locked = compact_trylock_irqsave(&cc->zone->lock,
> +			locked = compact_lock_irqsave(&cc->zone->lock,
>  								&flags, cc);
> -			if (!locked)
> -				break;

Seems a bit dangerous to keep compact_lock_irqsave() returning a bool that
is now always true, and to remove the safety checks that test the
result. Easy for somebody in the future to reintroduce some 'return false'
condition (even though the name now says lock and not trylock) and start
crashing. I would either change it to return void, or leave the checks in place.

>  
>  			/* Recheck this is a buddy page under lock */
>  			if (!PageBuddy(page))
> @@ -910,15 +897,9 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>  
>  		/* If we already hold the lock, we can skip some rechecking */
>  		if (!locked) {
> -			locked = compact_trylock_irqsave(zone_lru_lock(zone),
> +			locked = compact_lock_irqsave(zone_lru_lock(zone),
>  								&flags, cc);
>  
> -			/* Allow future scanning if the lock is contended */
> -			if (!locked) {
> -				clear_pageblock_skip(page);
> -				break;
> -			}

Ditto.

> -
>  			/* Try get exclusive access under lock */
>  			if (!skip_updated) {
>  				skip_updated = true;
> @@ -961,9 +942,12 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>  
>  		/*
>  		 * Avoid isolating too much unless this block is being
> -		 * rescanned (e.g. dirty/writeback pages, parallel allocation).
> +		 * rescanned (e.g. dirty/writeback pages, parallel allocation)
> +		 * or a lock is contended. For contention, isolate quickly to
> +		 * potentially remove one source of contention.
>  		 */
> -		if (cc->nr_migratepages == COMPACT_CLUSTER_MAX && !cc->rescan) {
> +		if (cc->nr_migratepages == COMPACT_CLUSTER_MAX &&
> +		    !cc->rescan && !cc->contended) {
>  			++low_pfn;
>  			break;
>  		}
> @@ -1411,12 +1395,8 @@ static void isolate_freepages(struct compact_control *cc)
>  		isolate_freepages_block(cc, &isolate_start_pfn, block_end_pfn,
>  					freelist, false);
>  
> -		/*
> -		 * If we isolated enough freepages, or aborted due to lock
> -		 * contention, terminate.
> -		 */
> -		if ((cc->nr_freepages >= cc->nr_migratepages)
> -							|| cc->contended) {

Does it really make sense to continue in the case of free scanner, when we know
we will just return back the extra pages in the end? release_freepages() will
update the cached pfns, but the pageblock skip bit will stay, so we just leave
those pages behind. Unless finishing the block is important for the later
patches (as changelog mentions) even in the case of free scanner, but then we
can just skip the rest of it, as truly scanning it can't really help anything?

> +		/* Are enough freepages isolated? */
> +		if (cc->nr_freepages >= cc->nr_migratepages) {
>  			if (isolate_start_pfn >= block_end_pfn) {
>  				/*
>  				 * Restart at previous pageblock if more

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 16/25] mm, compaction: Check early for huge pages encountered by the migration scanner
  2019-01-04 12:50 ` [PATCH 16/25] mm, compaction: Check early for huge pages encountered by the migration scanner Mel Gorman
@ 2019-01-17 17:01   ` Vlastimil Babka
  2019-01-17 17:35     ` Mel Gorman
  0 siblings, 1 reply; 75+ messages in thread
From: Vlastimil Babka @ 2019-01-17 17:01 UTC (permalink / raw)
  To: Mel Gorman, Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, ying.huang, kirill,
	Andrew Morton, Linux List Kernel Mailing

On 1/4/19 1:50 PM, Mel Gorman wrote:
> When scanning for sources or targets, PageCompound is checked for huge
> pages as they can be skipped quickly but it happens relatively late after
> a lot of setup and checking. This patch short-cuts the check to make it
> earlier. It might still change when the lock is acquired but this has
> less overhead overall. The free scanner advances but the migration scanner
> does not. Typically the free scanner encounters more movable blocks that
> change state over the lifetime of the system and also tends to scan more
> aggressively as it's actively filling its portion of the physical address
> space with data. This could change in the future but for the moment,
> this worked better in practice and incurred fewer scan restarts.
> 
> The impact on latency and allocation success rates is marginal but the
> free scan rates are reduced by 32% and system CPU usage is reduced by
> 2.6%. The 2-socket results are not materially different.

Hmm, interesting that adjusting migrate scanner affected free scanner. Oh well.

> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

Nit below.

> ---
>  mm/compaction.c | 16 ++++++++++++----
>  1 file changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 608d274f9880..921720f7a416 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1071,6 +1071,9 @@ static bool suitable_migration_source(struct compact_control *cc,
>  {
>  	int block_mt;
>  
> +	if (pageblock_skip_persistent(page))
> +		return false;
> +
>  	if ((cc->mode != MIGRATE_ASYNC) || !cc->direct_compaction)
>  		return true;
>  
> @@ -1693,12 +1696,17 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>  			continue;
>  
>  		/*
> -		 * For async compaction, also only scan in MOVABLE blocks.
> -		 * Async compaction is optimistic to see if the minimum amount
> -		 * of work satisfies the allocation.
> +		 * For async compaction, also only scan in MOVABLE blocks
> +		 * without huge pages. Async compaction is optimistic to see
> +		 * if the minimum amount of work satisfies the allocation.
> +		 * The cached PFN is updated as it's possible that all
> +		 * remaining blocks between source and target are suitable

								  ^ unsuitable?

> +		 * and the compaction scanners fail to meet.
>  		 */
> -		if (!suitable_migration_source(cc, page))
> +		if (!suitable_migration_source(cc, page)) {
> +			update_cached_migrate(cc, block_end_pfn);
>  			continue;
> +		}
>  
>  		/* Perform the isolation */
>  		low_pfn = isolate_migratepages_block(cc, low_pfn,
> 


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 15/25] mm, compaction: Finish pageblock scanning on contention
  2019-01-17 16:38   ` Vlastimil Babka
@ 2019-01-17 17:11     ` Mel Gorman
  2019-01-18  8:57       ` Vlastimil Babka
  0 siblings, 1 reply; 75+ messages in thread
From: Mel Gorman @ 2019-01-17 17:11 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Linux-MM, David Rientjes, Andrea Arcangeli, ying.huang, kirill,
	Andrew Morton, Linux List Kernel Mailing

On Thu, Jan 17, 2019 at 05:38:36PM +0100, Vlastimil Babka wrote:
> > rate but also by the fact that the scanners do not meet for longer when
> > pageblocks are actually used. Overall this is justified and completing
> > a pageblock scan is very important for later patches.
> > 
> > Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> 
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> 
> Some comments below.
> 

Thanks

> > @@ -538,18 +535,8 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
> >  		 * recheck as well.
> >  		 */
> >  		if (!locked) {
> > -			/*
> > -			 * The zone lock must be held to isolate freepages.
> > -			 * Unfortunately this is a very coarse lock and can be
> > -			 * heavily contended if there are parallel allocations
> > -			 * or parallel compactions. For async compaction do not
> > -			 * spin on the lock and we acquire the lock as late as
> > -			 * possible.
> > -			 */
> > -			locked = compact_trylock_irqsave(&cc->zone->lock,
> > +			locked = compact_lock_irqsave(&cc->zone->lock,
> >  								&flags, cc);
> > -			if (!locked)
> > -				break;
> 
> Seems a bit dangerous to keep compact_lock_irqsave() returning a bool that
> is now always true, and to remove the safety checks that test the
> result. Easy for somebody in the future to reintroduce some 'return false'
> condition (even though the name now says lock and not trylock) and start
> crashing. I would either change it to return void, or leave the checks in place.
> 

I considered changing it from bool at the same time as "Rework
compact_should_abort as compact_check_resched". It turned out to be a
bit clumsy because the locked state must be explicitly updated in the
caller then. e.g.

locked = compact_lock_irqsave(...)

becomes

compact_lock_irqsave(...)
locked = true

I didn't think the result looked that great to be honest but maybe it's
worth revisiting as a cleanup patch like "Rework compact_should_abort as
compact_check_resched" on top.
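
For reference, the void variant would be something like this (sketch only,
not a tested patch):

    /* Sketch: same behaviour, but callers must set 'locked' themselves */
    static void compact_lock_irqsave(spinlock_t *lock, unsigned long *flags,
                                     struct compact_control *cc)
    {
        /* Track contention in async mode but never abort */
        if (cc->mode == MIGRATE_ASYNC && !cc->contended) {
            if (spin_trylock_irqsave(lock, *flags))
                return;
            cc->contended = true;
        }

        spin_lock_irqsave(lock, *flags);
    }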

> > 
> > @@ -1411,12 +1395,8 @@ static void isolate_freepages(struct compact_control *cc)
> >  		isolate_freepages_block(cc, &isolate_start_pfn, block_end_pfn,
> >  					freelist, false);
> >  
> > -		/*
> > -		 * If we isolated enough freepages, or aborted due to lock
> > -		 * contention, terminate.
> > -		 */
> > -		if ((cc->nr_freepages >= cc->nr_migratepages)
> > -							|| cc->contended) {
> 
> Does it really make sense to continue in the case of free scanner, when we know
> we will just return back the extra pages in the end? release_freepages() will
> update the cached pfns, but the pageblock skip bit will stay, so we just leave
> those pages behind. Unless finishing the block is important for the later
> patches (as changelog mentions) even in the case of free scanner, but then we
> can just skip the rest of it, as truly scanning it can't really help anything?
> 

Finishing is important for later patches is one factor but not the only
factor. While we eventually return all pages, we do not know at this
point in time how many free pages are needed. Remember the migration
source isolates COMPACT_CLUSTER_MAX pages and then looks for migration
targets.  If the source isolates 32 pages, free might isolate more from
one pageblock but that's ok as the migration source may need more free
pages in the immediate future. It's less wasteful than it looks at first
glance (or second or even third glance).

However, if we isolated exactly enough targets, and the pageblock gets
marked skipped, then each COMPACT_CLUSTER_MAX isolation from the target
could potentially mark one new pageblock unnecessarily and increase
scanning+resets overall. That would be bad.

There still can be waste because we do not know in advance exactly how
many migration sources there will be -- sure, we could calculate it but
that involves scanning the source pageblock twice which is wasteful.
I did try estimating it based on the remaining number of pages in the
pageblock but the additional complexity did not appear to help.

Does that make sense?

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 17/25] mm, compaction: Keep cached migration PFNs synced for unusable pageblocks
  2019-01-04 12:50 ` [PATCH 17/25] mm, compaction: Keep cached migration PFNs synced for unusable pageblocks Mel Gorman
@ 2019-01-17 17:17   ` Vlastimil Babka
  2019-01-17 17:37     ` Mel Gorman
  0 siblings, 1 reply; 75+ messages in thread
From: Vlastimil Babka @ 2019-01-17 17:17 UTC (permalink / raw)
  To: Mel Gorman, Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, ying.huang, kirill,
	Andrew Morton, Linux List Kernel Mailing

On 1/4/19 1:50 PM, Mel Gorman wrote:
> Migrate has separate cached PFNs for ASYNC and SYNC* migration on the
> basis that some migrations will fail in ASYNC mode. However, if the cached
> PFNs match at the start of scanning and pageblocks are skipped due to
> having no isolation candidates, then the sync state does not matter.
> This patch keeps matching cached PFNs in sync until a pageblock with
> isolation candidates is found.
> 
> The actual benefit is marginal given that the sync scanner following the
> async scanner will often skip a number of pageblocks but it's useless
> work. Any benefit depends heavily on whether the scanners restarted
> recently so overall the reduction in scan rates is a mere 2.8% which
> is borderline noise.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

My earlier suggestion to check more thoroughly if pages can be migrated (which
depends on the mode) before isolating them wouldn't play nice with this :)



^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 18/25] mm, compaction: Rework compact_should_abort as compact_check_resched
  2019-01-04 12:50 ` [PATCH 18/25] mm, compaction: Rework compact_should_abort as compact_check_resched Mel Gorman
@ 2019-01-17 17:27   ` Vlastimil Babka
  0 siblings, 0 replies; 75+ messages in thread
From: Vlastimil Babka @ 2019-01-17 17:27 UTC (permalink / raw)
  To: Mel Gorman, Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, ying.huang, kirill,
	Andrew Morton, Linux List Kernel Mailing

On 1/4/19 1:50 PM, Mel Gorman wrote:
> With incremental changes, compact_should_abort no longer makes
> any documented sense. Rename to compact_check_resched and update the
> associated comments.  There is no benefit other than reducing redundant
> code and making the intent slightly clearer. It could potentially be
> merged with earlier patches but it just makes the review slightly
> harder.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 19/25] mm, compaction: Do not consider a need to reschedule as contention
  2019-01-04 12:50 ` [PATCH 19/25] mm, compaction: Do not consider a need to reschedule as contention Mel Gorman
@ 2019-01-17 17:33   ` Vlastimil Babka
  2019-01-17 18:05     ` Mel Gorman
  0 siblings, 1 reply; 75+ messages in thread
From: Vlastimil Babka @ 2019-01-17 17:33 UTC (permalink / raw)
  To: Mel Gorman, Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, ying.huang, kirill,
	Andrew Morton, Linux List Kernel Mailing

On 1/4/19 1:50 PM, Mel Gorman wrote:
> Scanning on large machines can take a considerable length of time and
> eventually need to be rescheduled. This is treated as an abort event but
> that's not appropriate as the attempt is likely to be retried after making
> numerous checks and taking another cycle through the page allocator.
> This patch will check the need to reschedule if necessary but continue
> the scanning.
> 
> The main benefit is reduced scanning when compaction is taking a long time
> or the machine is over-saturated. It also avoids an unnecessary exit of
> compaction that ends up being retried by the page allocator in the outer
> loop.
> 
>                                         4.20.0                 4.20.0
>                               synccached-v2r15        noresched-v2r15
> Amean     fault-both-3      2655.55 (   0.00%)     2736.50 (  -3.05%)
> Amean     fault-both-5      4580.67 (   0.00%)     4133.70 (   9.76%)
> Amean     fault-both-7      5740.50 (   0.00%)     5738.61 (   0.03%)
> Amean     fault-both-12     9237.55 (   0.00%)     9392.82 (  -1.68%)
> Amean     fault-both-18    12899.51 (   0.00%)    13257.15 (  -2.77%)
> Amean     fault-both-24    16342.47 (   0.00%)    16859.44 (  -3.16%)
> Amean     fault-both-30    20394.26 (   0.00%)    16249.30 *  20.32%*
> Amean     fault-both-32    17450.76 (   0.00%)    14904.71 *  14.59%*

I always assumed that this was the main factor that (clumsily) limited THP fault
latencies. Seems like it's (no longer?) the case, or the lock contention
detection alone works as well.

> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/compaction.c | 12 ++----------
>  1 file changed, 2 insertions(+), 10 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 1a41a2dbff24..75eb0d40d4d7 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -398,19 +398,11 @@ static bool compact_lock_irqsave(spinlock_t *lock, unsigned long *flags,
>  	return true;
>  }
>  
> -/*
> - * Aside from avoiding lock contention, compaction also periodically checks
> - * need_resched() and records async compaction as contended if necessary.
> - */
> +/* Avoid soft-lockups due to long scan times */
>  static inline void compact_check_resched(struct compact_control *cc)
>  {
> -	/* async compaction aborts if contended */
> -	if (need_resched()) {
> -		if (cc->mode == MIGRATE_ASYNC)
> -			cc->contended = true;
> -
> +	if (need_resched())
>  		cond_resched();

Seems like plain "cond_resched()" is sufficient at this point, and probably
doesn't need a wrapper anymore.

> -	}
>  }
>  
>  /*
> 


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 16/25] mm, compaction: Check early for huge pages encountered by the migration scanner
  2019-01-17 17:01   ` Vlastimil Babka
@ 2019-01-17 17:35     ` Mel Gorman
  0 siblings, 0 replies; 75+ messages in thread
From: Mel Gorman @ 2019-01-17 17:35 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Linux-MM, David Rientjes, Andrea Arcangeli, ying.huang, kirill,
	Andrew Morton, Linux List Kernel Mailing

On Thu, Jan 17, 2019 at 06:01:18PM +0100, Vlastimil Babka wrote:
> On 1/4/19 1:50 PM, Mel Gorman wrote:
> > When scanning for sources or targets, PageCompound is checked for huge
> > pages as they can be skipped quickly but it happens relatively late after
> > a lot of setup and checking. This patch short-cuts the check to make it
> > earlier. It might still change when the lock is acquired but this has
> > less overhead overall. The free scanner advances but the migration scanner
> > does not. Typically the free scanner encounters more movable blocks that
> > change state over the lifetime of the system and also tends to scan more
> > aggressively as it's actively filling its portion of the physical address
> > space with data. This could change in the future but for the moment,
> > this worked better in practice and incurred fewer scan restarts.
> > 
> > The impact on latency and allocation success rates is marginal but the
> > free scan rates are reduced by 32% and system CPU usage is reduced by
> > 2.6%. The 2-socket results are not materially different.
> 
> Hmm, interesting that adjusting migrate scanner affected free scanner. Oh well.
> 

Russian Roulette again. The exact scan rates depend on the system state,
which is non-deterministic. It's not until very late in the series that
they stabilise somewhat. In fact, during the development of the series,
I had to reorder patches multiple times when a corner case was dealt with
to avoid 1 in every 3-6 runs having crazy insane scan rates. The final
ordering was based on *relative* stability.

> > Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> 
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> 
> Nit below.
> 

Nit fixed.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 17/25] mm, compaction: Keep cached migration PFNs synced for unusable pageblocks
  2019-01-17 17:17   ` Vlastimil Babka
@ 2019-01-17 17:37     ` Mel Gorman
  0 siblings, 0 replies; 75+ messages in thread
From: Mel Gorman @ 2019-01-17 17:37 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Linux-MM, David Rientjes, Andrea Arcangeli, ying.huang, kirill,
	Andrew Morton, Linux List Kernel Mailing

On Thu, Jan 17, 2019 at 06:17:28PM +0100, Vlastimil Babka wrote:
> On 1/4/19 1:50 PM, Mel Gorman wrote:
> > Migrate has separate cached PFNs for ASYNC and SYNC* migration on the
> > basis that some migrations will fail in ASYNC mode. However, if the cached
> > PFNs match at the start of scanning and pageblocks are skipped due to
> > having no isolation candidates, then the sync state does not matter.
> > This patch keeps matching cached PFNs in sync until a pageblock with
> > isolation candidates is found.
> > 
> > The actual benefit is marginal given that the sync scanner following the
> > async scanner will often skip a number of pageblocks but it's useless
> > work. Any benefit depends heavily on whether the scanners restarted
> > recently so overall the reduction in scan rates is a mere 2.8% which
> > is borderline noise.
> > 
> > Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> 
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> 
> My easlier suggestion to check more thoroughly if pages can be migrated (which
> depends on the mode) before isolating them wouldn't play nice with this :)
> 

No, unfortunately it wouldn't. I did find though that sync_light often
ran very quickly after async when compaction was having trouble
succeeding. The time window was short enough that states like
Dirty/Writeback were highly unlikely to be cleared. It might have played
nice when fragmentation was very low but any benefit then would be very
difficult to detect.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 20/25] mm, compaction: Reduce unnecessary skipping of migration target scanner
  2019-01-04 12:50 ` [PATCH 20/25] mm, compaction: Reduce unnecessary skipping of migration target scanner Mel Gorman
@ 2019-01-17 17:58   ` Vlastimil Babka
  2019-01-17 19:39     ` Mel Gorman
  0 siblings, 1 reply; 75+ messages in thread
From: Vlastimil Babka @ 2019-01-17 17:58 UTC (permalink / raw)
  To: Mel Gorman, Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, ying.huang, kirill,
	Andrew Morton, Linux List Kernel Mailing

On 1/4/19 1:50 PM, Mel Gorman wrote:
> The fast isolation of pages can move the scanner faster than is necessary
> depending on the contents of the free list. This patch will only allow
> the fast isolation to initialise the scanner and advance it slowly. The
> primary means of moving the scanner forward is via the linear scanner
> to reduce the likelihood the migration source/target scanners meet
> prematurely triggering a rescan.

Maybe I've seen enough code today and need to stop, but AFAICS the description
here doesn't match the actual code changes? What I see are some cleanups, and a
change in free scanner that will set pageblock skip bit after a pageblock has
been scanned, even if there were pages isolated, while previously it would set
the skip bit only if nothing was isolated.

>                                         4.20.0                 4.20.0
>                                noresched-v2r15         slowfree-v2r15
> Amean     fault-both-1         0.00 (   0.00%)        0.00 *   0.00%*
> Amean     fault-both-3      2736.50 (   0.00%)     2512.53 (   8.18%)
> Amean     fault-both-5      4133.70 (   0.00%)     4159.43 (  -0.62%)
> Amean     fault-both-7      5738.61 (   0.00%)     5950.15 (  -3.69%)
> Amean     fault-both-12     9392.82 (   0.00%)     8674.38 (   7.65%)
> Amean     fault-both-18    13257.15 (   0.00%)    12850.79 (   3.07%)
> Amean     fault-both-24    16859.44 (   0.00%)    17242.86 (  -2.27%)
> Amean     fault-both-30    16249.30 (   0.00%)    19404.18 * -19.42%*
> Amean     fault-both-32    14904.71 (   0.00%)    16200.79 (  -8.70%)
> 
> The impact to latency, success rates and scan rates is marginal but
> avoiding unnecessary restarts is important. It helps later patches that
> are more careful about how pageblocks are treated as earlier iterations
> of those patches hit corner cases where the restarts were punishing and
> very visible.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> ---
>  mm/compaction.c | 27 ++++++++++-----------------
>  1 file changed, 10 insertions(+), 17 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 75eb0d40d4d7..6c5552c6d8f9 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -324,10 +324,9 @@ static void update_cached_migrate(struct compact_control *cc, unsigned long pfn)
>   * future. The information is later cleared by __reset_isolation_suitable().
>   */
>  static void update_pageblock_skip(struct compact_control *cc,
> -			struct page *page, unsigned long nr_isolated)
> +			struct page *page, unsigned long pfn)
>  {
>  	struct zone *zone = cc->zone;
> -	unsigned long pfn;
>  
>  	if (cc->no_set_skip_hint)
>  		return;
> @@ -335,13 +334,8 @@ static void update_pageblock_skip(struct compact_control *cc,
>  	if (!page)
>  		return;
>  
> -	if (nr_isolated)
> -		return;
> -
>  	set_pageblock_skip(page);
>  
> -	pfn = page_to_pfn(page);
> -
>  	/* Update where async and sync compaction should restart */
>  	if (pfn < zone->compact_cached_free_pfn)
>  		zone->compact_cached_free_pfn = pfn;
> @@ -359,7 +353,7 @@ static inline bool pageblock_skip_persistent(struct page *page)
>  }
>  
>  static inline void update_pageblock_skip(struct compact_control *cc,
> -			struct page *page, unsigned long nr_isolated)
> +			struct page *page, unsigned long pfn)
>  {
>  }
>  
> @@ -450,7 +444,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
>  				bool strict)
>  {
>  	int nr_scanned = 0, total_isolated = 0;
> -	struct page *cursor, *valid_page = NULL;
> +	struct page *cursor;
>  	unsigned long flags = 0;
>  	bool locked = false;
>  	unsigned long blockpfn = *start_pfn;
> @@ -477,9 +471,6 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
>  		if (!pfn_valid_within(blockpfn))
>  			goto isolate_fail;
>  
> -		if (!valid_page)
> -			valid_page = page;
> -
>  		/*
>  		 * For compound pages such as THP and hugetlbfs, we can save
>  		 * potentially a lot of iterations if we skip them at once.
> @@ -576,10 +567,6 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
>  	if (strict && blockpfn < end_pfn)
>  		total_isolated = 0;
>  
> -	/* Update the pageblock-skip if the whole pageblock was scanned */
> -	if (blockpfn == end_pfn)
> -		update_pageblock_skip(cc, valid_page, total_isolated);
> -
>  	cc->total_free_scanned += nr_scanned;
>  	if (total_isolated)
>  		count_compact_events(COMPACTISOLATED, total_isolated);
> @@ -1295,8 +1282,10 @@ fast_isolate_freepages(struct compact_control *cc)
>  		}
>  	}
>  
> -	if (highest && highest > cc->zone->compact_cached_free_pfn)
> +	if (highest && highest >= cc->zone->compact_cached_free_pfn) {
> +		highest -= pageblock_nr_pages;
>  		cc->zone->compact_cached_free_pfn = highest;
> +	}
>  
>  	cc->total_free_scanned += nr_scanned;
>  	if (!page)
> @@ -1376,6 +1365,10 @@ static void isolate_freepages(struct compact_control *cc)
>  		isolate_freepages_block(cc, &isolate_start_pfn, block_end_pfn,
>  					freelist, false);
>  
> +		/* Update the skip hint if the full pageblock was scanned */
> +		if (isolate_start_pfn == block_end_pfn)
> +			update_pageblock_skip(cc, page, block_start_pfn);
> +
>  		/* Are enough freepages isolated? */
>  		if (cc->nr_freepages >= cc->nr_migratepages) {
>  			if (isolate_start_pfn >= block_end_pfn) {
> 


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 19/25] mm, compaction: Do not consider a need to reschedule as contention
  2019-01-17 17:33   ` Vlastimil Babka
@ 2019-01-17 18:05     ` Mel Gorman
  0 siblings, 0 replies; 75+ messages in thread
From: Mel Gorman @ 2019-01-17 18:05 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Linux-MM, David Rientjes, Andrea Arcangeli, ying.huang, kirill,
	Andrew Morton, Linux List Kernel Mailing

On Thu, Jan 17, 2019 at 06:33:37PM +0100, Vlastimil Babka wrote:
> On 1/4/19 1:50 PM, Mel Gorman wrote:
> > Scanning on large machines can take a considerable length of time and
> > eventually need to be rescheduled. This is treated as an abort event but
> > that's not appropriate as the attempt is likely to be retried after making
> > numerous checks and taking another cycle through the page allocator.
> > This patch will check the need to reschedule if necessary but continue
> > the scanning.
> > 
> > The main benefit is reduced scanning when compaction is taking a long time
> > or the machine is over-saturated. It also avoids an unnecessary exit of
> > compaction that ends up being retried by the page allocator in the outer
> > loop.
> > 
> >                                         4.20.0                 4.20.0
> >                               synccached-v2r15        noresched-v2r15
> > Amean     fault-both-3      2655.55 (   0.00%)     2736.50 (  -3.05%)
> > Amean     fault-both-5      4580.67 (   0.00%)     4133.70 (   9.76%)
> > Amean     fault-both-7      5740.50 (   0.00%)     5738.61 (   0.03%)
> > Amean     fault-both-12     9237.55 (   0.00%)     9392.82 (  -1.68%)
> > Amean     fault-both-18    12899.51 (   0.00%)    13257.15 (  -2.77%)
> > Amean     fault-both-24    16342.47 (   0.00%)    16859.44 (  -3.16%)
> > Amean     fault-both-30    20394.26 (   0.00%)    16249.30 *  20.32%*
> > Amean     fault-both-32    17450.76 (   0.00%)    14904.71 *  14.59%*
> 
> I always assumed that this was the main factor that (clumsily) limited THP fault
> latencies. Seems like it's (no longer?) the case, or the lock contention
> detection alone works as well.
> 

I didn't dig into the history but one motivating factor around all the
logic would have been reducing the time IRQs were disabled. With changes
like scanning COMPACT_CLUSTER_MAX and dropping locks, it's less of a
factor. Then again, the retry loops in the page allocator would
also have changed the problem. Things just changed enough that the
original motivation no longer applies.

> > Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> 
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> 
> > ---
> >  mm/compaction.c | 12 ++----------
> >  1 file changed, 2 insertions(+), 10 deletions(-)
> > 
> > diff --git a/mm/compaction.c b/mm/compaction.c
> > index 1a41a2dbff24..75eb0d40d4d7 100644
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -398,19 +398,11 @@ static bool compact_lock_irqsave(spinlock_t *lock, unsigned long *flags,
> >  	return true;
> >  }
> >  
> > -/*
> > - * Aside from avoiding lock contention, compaction also periodically checks
> > - * need_resched() and records async compaction as contended if necessary.
> > - */
> > +/* Avoid soft-lockups due to long scan times */
> >  static inline void compact_check_resched(struct compact_control *cc)
> >  {
> > -	/* async compaction aborts if contended */
> > -	if (need_resched()) {
> > -		if (cc->mode == MIGRATE_ASYNC)
> > -			cc->contended = true;
> > -
> > +	if (need_resched())
> >  		cond_resched();
> 
> Seems like plain "cond_resched()" is sufficient at this point, and probably
> doesn't need a wrapper anymore.
> 

I guess so. I liked having the helper as a reminder that the contention
points mattered at some point and weren't just a random sprinkling of
cond_resched, but I'll remove it.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 20/25] mm, compaction: Reduce unnecessary skipping of migration target scanner
  2019-01-17 17:58   ` Vlastimil Babka
@ 2019-01-17 19:39     ` Mel Gorman
  2019-01-18  9:09       ` Vlastimil Babka
  0 siblings, 1 reply; 75+ messages in thread
From: Mel Gorman @ 2019-01-17 19:39 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Linux-MM, David Rientjes, Andrea Arcangeli, ying.huang, kirill,
	Andrew Morton, Linux List Kernel Mailing

On Thu, Jan 17, 2019 at 06:58:30PM +0100, Vlastimil Babka wrote:
> On 1/4/19 1:50 PM, Mel Gorman wrote:
> > The fast isolation of pages can move the scanner faster than is necessary
> > depending on the contents of the free list. This patch will only allow
> > the fast isolation to initialise the scanner and advance it slowly. The
> > primary means of moving the scanner forward is via the linear scanner
> > to reduce the likelihood the migration source/target scanners meet
> > prematurely triggering a rescan.
> 
> Maybe I've seen enough code today and need to stop, but AFAICS the description
> here doesn't match the actual code changes? What I see are some cleanups, and a
> change in free scanner that will set pageblock skip bit after a pageblock has
> been scanned, even if there were pages isolated, while previously it would set
> the skip bit only if nothing was isolated.
> 

The first three hunks could have been split out but it wouldn't help
overall. Maybe a changelog rewrite will help;

mm, compaction: Reduce premature advancement of the migration target scanner

The fast isolation of free pages allows the cached PFN of the free
scanner to advance faster than necessary depending on the contents
of the free list. The key is that fast_isolate_freepages() can update
zone->compact_cached_free_pfn via isolate_freepages_block().  When the
fast search fails, the linear scan can start from a point that has skipped
valid migration targets, particularly pageblocks with just low-order
free pages. This can cause the migration source/target scanners to meet
prematurely causing a reset.

This patch starts by avoiding an update of the pageblock skip information
and cached PFN from isolate_freepages_block() and puts the responsibility
of updating that information in the callers. The fast scanner will update
the cached PFN if and only if it finds a block that is higher than the
existing cached PFN and sets the skip if the pageblock is full or nearly
full. The linear scanner will update skipped information and the cached
PFN only when a block is completely scanned. The total impact is that
the free scanner advances more slowly as it is primarily driven by the
linear scanner instead of the fast search.
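
For reference, the two hunks in the patch above that now carry that
responsibility are:

    /* fast_isolate_freepages(): only ever move the cached PFN forward */
    if (highest && highest >= cc->zone->compact_cached_free_pfn) {
        highest -= pageblock_nr_pages;
        cc->zone->compact_cached_free_pfn = highest;
    }

    /* isolate_freepages(): update the hint only after a full block scan */
    if (isolate_start_pfn == block_end_pfn)
        update_pageblock_skip(cc, page, block_start_pfn);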

Does that help?

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 15/25] mm, compaction: Finish pageblock scanning on contention
  2019-01-17 17:11     ` Mel Gorman
@ 2019-01-18  8:57       ` Vlastimil Babka
  0 siblings, 0 replies; 75+ messages in thread
From: Vlastimil Babka @ 2019-01-18  8:57 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, David Rientjes, Andrea Arcangeli, ying.huang, kirill,
	Andrew Morton, Linux List Kernel Mailing

On 1/17/19 6:11 PM, Mel Gorman wrote:
> On Thu, Jan 17, 2019 at 05:38:36PM +0100, Vlastimil Babka wrote:
>> > rate but also by the fact that the scanners do not meet for longer when
>> > pageblocks are actually used. Overall this is justified and completing
>> > a pageblock scan is very important for later patches.
>> > 
>> > Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
>> 
>> Acked-by: Vlastimil Babka <vbabka@suse.cz>
>> 
>> Some comments below.
>> 
> 
> Thanks
> 
>> > @@ -538,18 +535,8 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
>> >  		 * recheck as well.
>> >  		 */
>> >  		if (!locked) {
>> > -			/*
>> > -			 * The zone lock must be held to isolate freepages.
>> > -			 * Unfortunately this is a very coarse lock and can be
>> > -			 * heavily contended if there are parallel allocations
>> > -			 * or parallel compactions. For async compaction do not
>> > -			 * spin on the lock and we acquire the lock as late as
>> > -			 * possible.
>> > -			 */
>> > -			locked = compact_trylock_irqsave(&cc->zone->lock,
>> > +			locked = compact_lock_irqsave(&cc->zone->lock,
>> >  								&flags, cc);
>> > -			if (!locked)
>> > -				break;
>> 
>> Seems a bit dangerous to keep compact_lock_irqsave() returning a bool that
>> is now always true, and to remove the safety checks that test the
>> result. Easy for somebody in the future to reintroduce some 'return false'
>> condition (even though the name now says lock and not trylock) and start
>> crashing. I would either change it to return void, or leave the checks in place.
>> 
> 
> I considered changing it from bool at the same time as "Rework
> compact_should_abort as compact_check_resched". It turned out to be a
> bit clumsy because the locked state must be explicitly updated in the
> caller then. e.g.
> 
> locked = compact_lock_irqsave(...)
> 
> becomes
> 
> compact_lock_irqsave(...)
> locked = true
> 
> I didn't think the result looked that great to be honest but maybe it's
> worth revisiting as a cleanup patch like "Rework compact_should_abort as
> compact_check_resched" on top.
> 
>> > 
>> > @@ -1411,12 +1395,8 @@ static void isolate_freepages(struct compact_control *cc)
>> >  		isolate_freepages_block(cc, &isolate_start_pfn, block_end_pfn,
>> >  					freelist, false);
>> >  
>> > -		/*
>> > -		 * If we isolated enough freepages, or aborted due to lock
>> > -		 * contention, terminate.
>> > -		 */
>> > -		if ((cc->nr_freepages >= cc->nr_migratepages)
>> > -							|| cc->contended) {
>> 
>> Does it really make sense to continue in the case of free scanner, when we know
>> we will just return back the extra pages in the end? release_freepages() will
>> update the cached pfns, but the pageblock skip bit will stay, so we just leave
>> those pages behind. Unless finishing the block is important for the later
>> patches (as changelog mentions) even in the case of free scanner, but then we
>> can just skip the rest of it, as truly scanning it can't really help anything?
>> 
> 
> Finishing is important for later patches is one factor but not the only
> factor. While we eventually return all pages, we do not know at this
> point in time how many free pages are needed. Remember the migration
> source isolates COMPACT_CLUSTER_MAX pages and then looks for migration
> targets.  If the source isolates 32 pages, free might isolate more from
> one pageblock but that's ok as the migration source may need more free
> pages in the immediate future. It's less wasteful than it looks at first
> glance (or second or even third glance).
> 
> However, if we isolated exactly enough targets, and the pageblock gets
> marked skipped, then each COMPACT_CLUSTER_MAX isolation from the target
> could potentially mark one new pageblock unnecessarily and increase
> scanning+resets overall. That would be bad.
> 
> There still can be waste because we do not know in advance exactly how
> many migration sources there will be -- sure, we could calculate it but
> that involves scanning the source pageblock twice which is wasteful.
> I did try estimating it based on the remaining number of pages in the
> pageblock but the additional complexity did not appear to help.
> 
> Does that make sense?

OK, thanks.


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 20/25] mm, compaction: Reduce unnecessary skipping of migration target scanner
  2019-01-17 19:39     ` Mel Gorman
@ 2019-01-18  9:09       ` Vlastimil Babka
  0 siblings, 0 replies; 75+ messages in thread
From: Vlastimil Babka @ 2019-01-18  9:09 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, David Rientjes, Andrea Arcangeli, ying.huang, kirill,
	Andrew Morton, Linux List Kernel Mailing

On 1/17/19 8:39 PM, Mel Gorman wrote:
> On Thu, Jan 17, 2019 at 06:58:30PM +0100, Vlastimil Babka wrote:
>> On 1/4/19 1:50 PM, Mel Gorman wrote:
>> > The fast isolation of pages can move the scanner faster than is necessary
>> > depending on the contents of the free list. This patch will only allow
>> > the fast isolation to initialise the scanner and advance it slowly. The
>> > primary means of moving the scanner forward is via the linear scanner
>> > to reduce the likelihood the migration source/target scanners meet
>> > prematurely triggering a rescan.
>> 
>> Maybe I've seen enough code today and need to stop, but AFAICS the description
>> here doesn't match the actual code changes? What I see are some cleanups, and a
>> change in free scanner that will set pageblock skip bit after a pageblock has
>> been scanned, even if there were pages isolated, while previously it would set
>> the skip bit only if nothing was isolated.
>> 
> 
> The first three hunks could have been split out but it wouldn't help
> overall. Maybe a changelog rewrite will help;
> 
> mm, compaction: Reduce premature advancement of the migration target scanner
> 
> The fast isolation of free pages allows the cached PFN of the free
> scanner to advance faster than necessary depending on the contents
> of the free list. The key is that fast_isolate_freepages() can update
> zone->compact_cached_free_pfn via isolate_freepages_block().  When the
> fast search fails, the linear scan can start from a point that has skipped
> valid migration targets, particularly pageblocks with just low-order
> free pages. This can cause the migration source/target scanners to meet
> prematurely causing a reset.
> 
> This patch starts by avoiding an update of the pageblock skip information
> and cached PFN from isolate_freepages_block() and puts the responsibility
> of updating that information in the callers. The fast scanner will update
> the cached PFN if and only if it finds a block that is higher than the
> existing cached PFN and sets the skip if the pageblock is full or nearly
> full. The linear scanner will update skipped information and the cached
> PFN only when a block is completely scanned. The total impact is that
> the free scanner advances more slowly as it is primarily driven by the
> linear scanner instead of the fast search.
> 
> Does that help?

Yeah, now I get it, thanks!

Acked-by: Vlastimil Babka <vbabka@suse.cz>



^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 21/25] mm, compaction: Round-robin the order while searching the free lists for a target
  2019-01-04 12:50 ` [PATCH 21/25] mm, compaction: Round-robin the order while searching the free lists for a target Mel Gorman
@ 2019-01-18  9:17   ` Vlastimil Babka
  0 siblings, 0 replies; 75+ messages in thread
From: Vlastimil Babka @ 2019-01-18  9:17 UTC (permalink / raw)
  To: Mel Gorman, Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, ying.huang, kirill,
	Andrew Morton, Linux List Kernel Mailing

On 1/4/19 1:50 PM, Mel Gorman wrote:
> As compaction proceeds and creates high-order blocks, the free list
> search gets less efficient as the larger blocks are used as compaction
> targets. Eventually, the larger blocks will be behind the migration
> scanner for partially migrated pageblocks and the search fails. This
> patch round-robins what orders are searched so that larger blocks can be
> ignored and find smaller blocks that can be used as migration targets.
> 
> The overall impact was small on 1-socket but it avoids corner cases where
> the migration/free scanners meet prematurely or situations where many of
> the pageblocks encountered by the free scanner are almost full instead of
> being properly packed. Previous testing had indicated that without this
> patch there were occasional large spikes in the free scanner without this
> patch. By co-incidence, the 2-socket results showed a 54% reduction in
> the free scanner but will not be universally true.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 22/25] mm, compaction: Sample pageblocks for free pages
  2019-01-04 12:50 ` [PATCH 22/25] mm, compaction: Sample pageblocks for free pages Mel Gorman
@ 2019-01-18 10:38   ` Vlastimil Babka
  2019-01-18 13:44     ` Mel Gorman
  0 siblings, 1 reply; 75+ messages in thread
From: Vlastimil Babka @ 2019-01-18 10:38 UTC (permalink / raw)
  To: Mel Gorman, Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, ying.huang, kirill,
	Andrew Morton, Linux List Kernel Mailing

On 1/4/19 1:50 PM, Mel Gorman wrote:
> Once fast searching finishes, there is a possibility that the linear
> scanner is scanning full blocks found by the fast scanner earlier. This
> patch uses an adaptive stride to sample pageblocks for free pages. The
> more consecutive full pageblocks encountered, the larger the stride until
> a pageblock with free pages is found. The scanners might meet slightly
> sooner but it is an acceptable risk given that the search of the free
> lists may still encounter the pages and adjust the cached PFN of the free
> scanner accordingly.
> 
> In terms of latency and success rates, the impact is not obvious but the
> free scan rate is reduced by 87% on a 1-socket machine and 92% on a
> 2-socket machine. It's also the first time in the series where the number
> of pages scanned by the migration scanner is greater than the free scanner
> due to the increased search efficiency.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

OK, I admit this is quite counterintuitive to me. I would have expected
this change to result in the scanners meeting much sooner, while
missing many free pages (especially when starting with stride 32 for
async compaction). I would have expected that pageblocks we already
depleted are marked for skipping, while pages freed by reclaim are
scattered randomly across the remaining ones, so the sampling would then
miss many of them. But you have benchmarking data so I won't object :)

> ---
>  mm/compaction.c | 27 +++++++++++++++++++++------
>  1 file changed, 21 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 652e249168b1..cc532e81a7b7 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -441,6 +441,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
>  				unsigned long *start_pfn,
>  				unsigned long end_pfn,
>  				struct list_head *freelist,
> +				unsigned int stride,
>  				bool strict)
>  {
>  	int nr_scanned = 0, total_isolated = 0;
> @@ -450,10 +451,14 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
>  	unsigned long blockpfn = *start_pfn;
>  	unsigned int order;
>  
> +	/* Strict mode is for isolation, speed is secondary */
> +	if (strict)
> +		stride = 1;

Why not just call this from strict context with stride 1, instead of
passing 0 and then changing it to 1.
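
i.e. (sketch only) the strict caller in isolate_freepages_range() would
simply do:

    isolated = isolate_freepages_block(cc, &isolate_start_pfn,
                    block_end_pfn, &freelist, 1, true);

and the 'if (strict)' special case above could be dropped.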

> +
>  	cursor = pfn_to_page(blockpfn);
>  
>  	/* Isolate free pages. */
> -	for (; blockpfn < end_pfn; blockpfn++, cursor++) {
> +	for (; blockpfn < end_pfn; blockpfn += stride, cursor += stride) {
>  		int isolated;
>  		struct page *page = cursor;
>  
> @@ -624,7 +629,7 @@ isolate_freepages_range(struct compact_control *cc,
>  			break;
>  
>  		isolated = isolate_freepages_block(cc, &isolate_start_pfn,
> -						block_end_pfn, &freelist, true);
> +					block_end_pfn, &freelist, 0, true);
>  
>  		/*
>  		 * In strict mode, isolate_freepages_block() returns 0 if
> @@ -1139,7 +1144,7 @@ fast_isolate_around(struct compact_control *cc, unsigned long pfn, unsigned long
>  
>  	/* Scan before */
>  	if (start_pfn != pfn) {
> -		isolate_freepages_block(cc, &start_pfn, pfn, &cc->freepages, false);
> +		isolate_freepages_block(cc, &start_pfn, pfn, &cc->freepages, 1, false);
>  		if (cc->nr_freepages >= cc->nr_migratepages)
>  			return;
>  	}
> @@ -1147,7 +1152,7 @@ fast_isolate_around(struct compact_control *cc, unsigned long pfn, unsigned long
>  	/* Scan after */
>  	start_pfn = pfn + nr_isolated;
>  	if (start_pfn != end_pfn)
> -		isolate_freepages_block(cc, &start_pfn, end_pfn, &cc->freepages, false);
> +		isolate_freepages_block(cc, &start_pfn, end_pfn, &cc->freepages, 1, false);
>  
>  	/* Skip this pageblock in the future as it's full or nearly full */
>  	if (cc->nr_freepages < cc->nr_migratepages)
> @@ -1333,7 +1338,9 @@ static void isolate_freepages(struct compact_control *cc)
>  	unsigned long isolate_start_pfn; /* exact pfn we start at */
>  	unsigned long block_end_pfn;	/* end of current pageblock */
>  	unsigned long low_pfn;	     /* lowest pfn scanner is able to scan */
> +	unsigned long nr_isolated;
>  	struct list_head *freelist = &cc->freepages;
> +	unsigned int stride;
>  
>  	/* Try a small search of the free lists for a candidate */
>  	isolate_start_pfn = fast_isolate_freepages(cc);
> @@ -1356,6 +1363,7 @@ static void isolate_freepages(struct compact_control *cc)
>  	block_end_pfn = min(block_start_pfn + pageblock_nr_pages,
>  						zone_end_pfn(zone));
>  	low_pfn = pageblock_end_pfn(cc->migrate_pfn);
> +	stride = cc->mode == MIGRATE_ASYNC ? COMPACT_CLUSTER_MAX : 1;
>  
>  	/*
>  	 * Isolate free pages until enough are available to migrate the
> @@ -1387,8 +1395,8 @@ static void isolate_freepages(struct compact_control *cc)
>  			continue;
>  
>  		/* Found a block suitable for isolating free pages from. */
> -		isolate_freepages_block(cc, &isolate_start_pfn, block_end_pfn,
> -					freelist, false);
> +		nr_isolated = isolate_freepages_block(cc, &isolate_start_pfn,
> +					block_end_pfn, freelist, stride, false);
>  
>  		/* Update the skip hint if the full pageblock was scanned */
>  		if (isolate_start_pfn == block_end_pfn)
> @@ -1412,6 +1420,13 @@ static void isolate_freepages(struct compact_control *cc)
>  			 */
>  			break;
>  		}
> +
> +		/* Adjust stride depending on isolation */
> +		if (nr_isolated) {
> +			stride = 1;
> +			continue;
> +		}

If we hit a free page with a large stride, wouldn't it make sense to
reset it to 1 immediately in the same pageblock, and possibly also start
over from its beginning, if the assumption is that free pages appear
close together?

> +		stride = min_t(unsigned int, COMPACT_CLUSTER_MAX, stride << 1);
>  	}
>  
>  	/*
> 


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 23/25] mm, compaction: Be selective about what pageblocks to clear skip hints
  2019-01-04 12:50 ` [PATCH 23/25] mm, compaction: Be selective about what pageblocks to clear skip hints Mel Gorman
@ 2019-01-18 12:55   ` Vlastimil Babka
  2019-01-18 14:10     ` Mel Gorman
  0 siblings, 1 reply; 75+ messages in thread
From: Vlastimil Babka @ 2019-01-18 12:55 UTC (permalink / raw)
  To: Mel Gorman, Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, ying.huang, kirill,
	Andrew Morton, Linux List Kernel Mailing

On 1/4/19 1:50 PM, Mel Gorman wrote:
> Pageblock hints are cleared when compaction restarts or kswapd makes enough
> progress that it can sleep but it's over-eager in that the bit is cleared
> for migration sources with no LRU pages and migration targets with no free
> pages. As pageblock skip hint flushes are relatively rare and out-of-band
> with respect to kswapd, this patch makes a few more expensive checks to
> see if it's appropriate to even clear the bit. Every pageblock that is
> not cleared will avoid 512 pages being scanned unnecessarily on x86-64.
> 
> The impact is variable with different workloads showing small differences
> in latency, success rates and scan rates. This is expected as clearing
> the hints is not that common but doing a small amount of work out-of-band
> to avoid a large amount of work in-band later is generally a good thing.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

Similar doubts to the previous patch wrt sampling. But if it works, ok.

> ---
>  include/linux/mmzone.h |   2 +
>  mm/compaction.c        | 119 +++++++++++++++++++++++++++++++++++++++++--------
>  2 files changed, 102 insertions(+), 19 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index cc4a507d7ca4..faa1e6523f49 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -480,6 +480,8 @@ struct zone {
>  	unsigned long		compact_cached_free_pfn;
>  	/* pfn where async and sync compaction migration scanner should start */
>  	unsigned long		compact_cached_migrate_pfn[2];
> +	unsigned long		compact_init_migrate_pfn;
> +	unsigned long		compact_init_free_pfn;
>  #endif
>  
>  #ifdef CONFIG_COMPACTION
> diff --git a/mm/compaction.c b/mm/compaction.c
> index cc532e81a7b7..7f316e1a7275 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -231,6 +231,62 @@ static bool pageblock_skip_persistent(struct page *page)
>  	return false;
>  }
>  
> +static bool
> +__reset_isolation_pfn(struct zone *zone, unsigned long pfn, bool check_source,
> +							bool check_target)
> +{
> +	struct page *page = pfn_to_online_page(pfn);
> +	struct page *end_page;
> +
> +	if (!page)
> +		return false;
> +	if (zone != page_zone(page))
> +		return false;
> +	if (pageblock_skip_persistent(page))
> +		return false;
> +
> +	/*
> +	 * If skip is already cleared do no further checking once the
> +	 * restart points have been set.
> +	 */
> +	if (check_source && check_target && !get_pageblock_skip(page))
> +		return true;
> +
> +	/*
> +	 * If clearing skip for the target scanner, do not select a
> +	 * non-movable pageblock as the starting point.
> +	 */
> +	if (!check_source && check_target &&
> +	    get_pageblock_migratetype(page) != MIGRATE_MOVABLE)
> +		return false;
> +
> +	/*
> +	 * Only clear the hint if a sample indicates there is either a
> +	 * free page or an LRU page in the block. One or other condition
> +	 * is necessary for the block to be a migration source/target.
> +	 */
> +	page = pfn_to_page(pageblock_start_pfn(pfn));
> +	if (zone != page_zone(page))
> +		return false;
> +	end_page = page + pageblock_nr_pages;

Watch out for start pfn being invalid, and end_page being invalid or after zone end?

> +
> +	do {
> +		if (check_source && PageLRU(page)) {
> +			clear_pageblock_skip(page);
> +			return true;
> +		}
> +
> +		if (check_target && PageBuddy(page)) {
> +			clear_pageblock_skip(page);
> +			return true;
> +		}
> +
> +		page += (1 << PAGE_ALLOC_COSTLY_ORDER);

Also probably check pfn_valid_within() and page_zone?

> +	} while (page < end_page);
> +
> +	return false;
> +}
> +
>  /*
>   * This function is called to clear all cached information on pageblocks that
>   * should be skipped for page isolation when the migrate and free page scanner

...

> @@ -1193,7 +1273,7 @@ fast_isolate_freepages(struct compact_control *cc)
>  	 * If starting the scan, use a deeper search and use the highest
>  	 * PFN found if a suitable one is not found.
>  	 */
> -	if (cc->free_pfn == pageblock_start_pfn(zone_end_pfn(cc->zone) - 1)) {
> +	if (cc->free_pfn >= cc->zone->compact_init_free_pfn) {
>  		limit = pageblock_nr_pages >> 1;
>  		scan_start = true;
>  	}
> @@ -1338,7 +1418,6 @@ static void isolate_freepages(struct compact_control *cc)
>  	unsigned long isolate_start_pfn; /* exact pfn we start at */
>  	unsigned long block_end_pfn;	/* end of current pageblock */
>  	unsigned long low_pfn;	     /* lowest pfn scanner is able to scan */
> -	unsigned long nr_isolated;
>  	struct list_head *freelist = &cc->freepages;
>  	unsigned int stride;
>  
> @@ -1374,6 +1453,8 @@ static void isolate_freepages(struct compact_control *cc)
>  				block_end_pfn = block_start_pfn,
>  				block_start_pfn -= pageblock_nr_pages,
>  				isolate_start_pfn = block_start_pfn) {
> +		unsigned long nr_isolated;

Unrelated cleanup? Nevermind.

>  		/*
>  		 * This can iterate a massively long zone without finding any
>  		 * suitable migration targets, so periodically check resched.
> @@ -2020,7 +2101,7 @@ static enum compact_result compact_zone(struct compact_control *cc)
>  			cc->zone->compact_cached_migrate_pfn[1] = cc->migrate_pfn;
>  		}
>  
> -		if (cc->migrate_pfn == start_pfn)
> +		if (cc->migrate_pfn <= cc->zone->compact_init_migrate_pfn)
>  			cc->whole_zone = true;
>  	}
>  
> 


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 24/25] mm, compaction: Capture a page under direct compaction
  2019-01-04 12:50 ` [PATCH 24/25] mm, compaction: Capture a page under direct compaction Mel Gorman
@ 2019-01-18 13:40   ` Vlastimil Babka
  2019-01-18 14:39     ` Mel Gorman
  0 siblings, 1 reply; 75+ messages in thread
From: Vlastimil Babka @ 2019-01-18 13:40 UTC (permalink / raw)
  To: Mel Gorman, Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, ying.huang, kirill,
	Andrew Morton, Linux List Kernel Mailing

On 1/4/19 1:50 PM, Mel Gorman wrote:
> Compaction is inherently race-prone as a suitable page freed during
> compaction can be allocated by any parallel task. This patch uses a
> capture_control structure to isolate a page immediately when it is freed
> by a direct compactor in the slow path of the page allocator. The intent
> is to avoid redundant scanning.
> 
>                                         4.20.0                 4.20.0
>                                selective-v2r15          capture-v2r15
> Amean     fault-both-1         0.00 (   0.00%)        0.00 *   0.00%*
> Amean     fault-both-3      2624.85 (   0.00%)     2594.49 (   1.16%)
> Amean     fault-both-5      3842.66 (   0.00%)     4088.32 (  -6.39%)
> Amean     fault-both-7      5459.47 (   0.00%)     5936.54 (  -8.74%)
> Amean     fault-both-12     9276.60 (   0.00%)    10160.85 (  -9.53%)
> Amean     fault-both-18    14030.73 (   0.00%)    13908.92 (   0.87%)
> Amean     fault-both-24    13298.10 (   0.00%)    16819.86 * -26.48%*
> Amean     fault-both-30    17648.62 (   0.00%)    17901.74 (  -1.43%)
> Amean     fault-both-32    19161.67 (   0.00%)    18621.32 (   2.82%)
> 
> Latency is only moderately affected but the devil is in the details.
> A closer examination indicates that base page fault latency is much
> reduced but latency of huge pages is increased as it takes greater care
> to succeed. Part of the "problem" is that allocation success rates
> are close to 100% even when under pressure and compaction gets harder.
> 
>                                    4.20.0                 4.20.0
>                           selective-v2r15          capture-v2r15
> Percentage huge-1         0.00 (   0.00%)        0.00 (   0.00%)
> Percentage huge-3        99.95 (   0.00%)       99.98 (   0.03%)
> Percentage huge-5        98.83 (   0.00%)       98.01 (  -0.84%)
> Percentage huge-7        96.78 (   0.00%)       98.30 (   1.58%)
> Percentage huge-12       98.85 (   0.00%)       97.76 (  -1.10%)
> Percentage huge-18       97.52 (   0.00%)       99.05 (   1.57%)
> Percentage huge-24       97.07 (   0.00%)       99.34 (   2.35%)
> Percentage huge-30       96.59 (   0.00%)       99.08 (   2.58%)
> Percentage huge-32       95.94 (   0.00%)       99.03 (   3.22%)
> 
> And scan rates are reduced as expected by 10% for the migration
> scanner and 37% for the free scanner indicating that there is
> less redundant work.
> 
> Compaction migrate scanned    20338945.00    18133661.00
> Compaction free scanned       12590377.00     7986174.00
> 
> The impact on 2-socket is much larger albeit not presented. Under
> a different workload that fragments heavily, the allocation latency
> is reduced by 26% while the success rate goes from 63% to 80%.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

Great, you crossed off this old TODO item, and didn't need pageblock isolation
to do that :D

I have just one worry...

> @@ -837,6 +873,12 @@ static inline void __free_one_page(struct page *page,
>  
>  continue_merging:
>  	while (order < max_order - 1) {
> +		if (compaction_capture(capc, page, order)) {
> +			if (likely(!is_migrate_isolate(migratetype)))
> +				__mod_zone_freepage_state(zone, -(1 << order),
> +								migratetype);
> +			return;

What about MIGRATE_CMA pageblocks and compaction for non-movable allocation,
> won't that violate CMA expectations?
And less critically, this will avoid the migratetype stealing decisions and
actions, potentially resulting in worse fragmentation avoidance?

> +		}
>  		buddy_pfn = __find_buddy_pfn(pfn, order);
>  		buddy = page + (buddy_pfn - pfn);
>  

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 22/25] mm, compaction: Sample pageblocks for free pages
  2019-01-18 10:38   ` Vlastimil Babka
@ 2019-01-18 13:44     ` Mel Gorman
  0 siblings, 0 replies; 75+ messages in thread
From: Mel Gorman @ 2019-01-18 13:44 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Linux-MM, David Rientjes, Andrea Arcangeli, ying.huang, kirill,
	Andrew Morton, Linux List Kernel Mailing

On Fri, Jan 18, 2019 at 11:38:38AM +0100, Vlastimil Babka wrote:
> On 1/4/19 1:50 PM, Mel Gorman wrote:
> > Once fast searching finishes, there is a possibility that the linear
> > scanner is scanning full blocks found by the fast scanner earlier. This
> > patch uses an adaptive stride to sample pageblocks for free pages. The
> > more consecutive full pageblocks encountered, the larger the stride until
> > a pageblock with free pages is found. The scanners might meet slightly
> > sooner but it is an acceptable risk given that the search of the free
> > lists may still encounter the pages and adjust the cached PFN of the free
> > scanner accordingly.
> > 
> > In terms of latency and success rates, the impact is not obvious but the
> > free scan rate is reduced by 87% on a 1-socket machine and 92% on a
> > 2-socket machine. It's also the first time in the series where the number
> > of pages scanned by the migration scanner is greater than the free scanner
> > due to the increased search efficiency.
> > 
> > Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> 
> OK, I admit this is quite counterintuitive to me. I would have expected
> this change to result in the scanners meeting much sooner, while
> missing many free pages (especially when starting with stride 32 for
> async compaction). I would have expected that pageblocks that we already
> depleted are marked for skipping, while freeing pages by reclaim
> scatters them randomly in the remaining ones, and the sampling will then
> miss many of them. But you have benchmarking data so I won't object :)
> 

So, it comes down to probabilities to some extent, which we cannot
really calculate because they are a function of the reference string
for allocations and frees in combination with compaction activity, both
of which depend on the workload.

Fundamentally, the key is that compaction typically moves data from lower
addresses to higher addresses. The longer compaction is running, the more
packed the higher addresses become until there are no free pages. When
the fast search fails and the linear search starts, it has to proceed
through a large number of pageblocks that have been tightly packed one
page at a time. However, the location of the free pages doesn't change
very much, so the location where compaction finds a target and the point
where the scanners meet don't change by very much at all.

Now, with sampling, some candidates might be missed depending on the
size of the stride and the scanners meet fractionally sooner, but the
difference is very marginal. What does change is that we skip over
heavily packed pageblocks much more quickly.
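
If it helps, here is a rough userspace sketch of just the stride
adaptation -- not the kernel code; the pageblock layout and free page
detection are faked:

#include <stdbool.h>

#define PAGES_PER_BLOCK		512	/* pageblock_nr_pages on x86-64 */
#define CLUSTER_MAX		32	/* stands in for COMPACT_CLUSTER_MAX */

/* Check every stride'th page, like the blockpfn += stride step */
static bool sample_block(const bool *page_free, unsigned int stride)
{
	unsigned int i;

	for (i = 0; i < PAGES_PER_BLOCK; i += stride)
		if (page_free[i])
			return true;
	return false;
}

static void free_scan(bool blocks[][PAGES_PER_BLOCK], int nr_blocks)
{
	unsigned int stride = CLUSTER_MAX;	/* async starting stride */
	int b;

	/* The free scanner walks from the highest block downwards */
	for (b = nr_blocks - 1; b >= 0; b--) {
		if (sample_block(blocks[b], stride)) {
			stride = 1;	/* free pages nearby, scan densely */
			continue;
		}

		/* Consecutive full blocks: back off geometrically, capped */
		stride <<= 1;
		if (stride > CLUSTER_MAX)
			stride = CLUSTER_MAX;
	}
}

A fully packed block then costs at most 512/32 = 16 checks instead of
512, while any block where free pages are found resets the scan to be
dense for the blocks after it.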

Does that help the counterintuitive nature of the patch?

> > ---
> >  mm/compaction.c | 27 +++++++++++++++++++++------
> >  1 file changed, 21 insertions(+), 6 deletions(-)
> > 
> > diff --git a/mm/compaction.c b/mm/compaction.c
> > index 652e249168b1..cc532e81a7b7 100644
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -441,6 +441,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
> >  				unsigned long *start_pfn,
> >  				unsigned long end_pfn,
> >  				struct list_head *freelist,
> > +				unsigned int stride,
> >  				bool strict)
> >  {
> >  	int nr_scanned = 0, total_isolated = 0;
> > @@ -450,10 +451,14 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
> >  	unsigned long blockpfn = *start_pfn;
> >  	unsigned int order;
> >  
> > +	/* Strict mode is for isolation, speed is secondary */
> > +	if (strict)
> > +		stride = 1;
> 
> Why not just call this from strict context with stride 1, instead of
> passing 0 and then changing it to 1?

No particular reason other than I wanted to make it clear that strict
mode shouldn't play games with stride. I can change it if you prefer.

> > @@ -1412,6 +1420,13 @@ static void isolate_freepages(struct compact_control *cc)
> >  			 */
> >  			break;
> >  		}
> > +
> > +		/* Adjust stride depending on isolation */
> > +		if (nr_isolated) {
> > +			stride = 1;
> > +			continue;
> > +		}
> 
> If we hit a free page with a large stride, wouldn't it make sense to
> reset it to 1 immediately in the same pageblock, and possibly also start
> over from its beginning, if the assumption is that free pages appear
> close together?
> 

I felt that the likely benefit would be marginal and that, without
additional complexity, we would end up scanning the same pageblock twice.
I didn't think the marginal upside was worth it.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 25/25] mm, compaction: Do not direct compact remote memory
  2019-01-04 12:50 ` [PATCH 25/25] mm, compaction: Do not direct compact remote memory Mel Gorman
@ 2019-01-18 13:51   ` Vlastimil Babka
  2019-01-18 14:46     ` Mel Gorman
  0 siblings, 1 reply; 75+ messages in thread
From: Vlastimil Babka @ 2019-01-18 13:51 UTC (permalink / raw)
  To: Mel Gorman, Linux-MM
  Cc: David Rientjes, Andrea Arcangeli, ying.huang, kirill,
	Andrew Morton, Linux List Kernel Mailing

On 1/4/19 1:50 PM, Mel Gorman wrote:
> Remote compaction is expensive and possibly counter-productive. Locality
> is expected to often have better performance characteristics than remote
> high-order pages. For small allocations, it's expected that locality is
> generally required or fallbacks are possible. Larger allocations such
> as THP are forbidden from using remote nodes at the time of writing but
> if __GFP_THISNODE is ever removed, then it would still be preferable to
> fall back to small local base pages over remote THP in the general case.
> kcompactd is still woken via kswapd so compaction happens eventually.
> 
> While this patch potentially has both positive and negative effects,
> it is best to avoid the possibility of remote compaction given the cost
> relative to any potential benefit.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

Generally agree with the intent, but what if there's e.g. a high-order (but
not costly) kernel allocation on behalf of a user process on a cpu belonging
to a movable node, where the only non-movable node is node 0? It will have to
keep reclaiming until a large enough page is formed, or wait for kcompactd?
So maybe do this only for costly orders?
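
E.g. something like this on top of the hunk below, untested and only to
illustrate the costly-order idea:

	if (order > PAGE_ALLOC_COSTLY_ORDER &&
	    zone_to_nid(zone) != zone_to_nid(ac->preferred_zoneref->zone))
		continue;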

Also I think compaction_zonelist_suitable() should also be updated, or we might
be promising the reclaim-compact loop e.g. that we will compact after enough
reclaim, but then we won't.

> ---
>  mm/compaction.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index ae70be023b21..cc17f0c01811 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -2348,6 +2348,16 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
>  			continue;
>  		}
>  
> +		/*
> +		 * Do not compact remote memory. It's expensive and high-order
> +		 * small allocations are expected to prefer or require local
> +		 * memory. Similarly, larger requests such as THP can fallback
> +		 * to base pages in preference to remote huge pages if
> +		 * __GFP_THISNODE is not specified
> +		 */
> +		if (zone_to_nid(zone) != zone_to_nid(ac->preferred_zoneref->zone))
> +			continue;
> +
>  		status = compact_zone_order(zone, order, gfp_mask, prio,
>  				alloc_flags, ac_classzone_idx(ac), capture);
>  		rc = max(status, rc);
> 


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 23/25] mm, compaction: Be selective about what pageblocks to clear skip hints
  2019-01-18 12:55   ` Vlastimil Babka
@ 2019-01-18 14:10     ` Mel Gorman
  0 siblings, 0 replies; 75+ messages in thread
From: Mel Gorman @ 2019-01-18 14:10 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Linux-MM, David Rientjes, Andrea Arcangeli, ying.huang, kirill,
	Andrew Morton, Linux List Kernel Mailing

On Fri, Jan 18, 2019 at 01:55:24PM +0100, Vlastimil Babka wrote:
> > +static bool
> > +__reset_isolation_pfn(struct zone *zone, unsigned long pfn, bool check_source,
> > +							bool check_target)
> > +{
> > +	struct page *page = pfn_to_online_page(pfn);
> > +	struct page *end_page;
> > +
> > +	if (!page)
> > +		return false;
> > +	if (zone != page_zone(page))
> > +		return false;
> > +	if (pageblock_skip_persistent(page))
> > +		return false;
> > +
> > +	/*
> > +	 * If skip is already cleared do no further checking once the
> > +	 * restart points have been set.
> > +	 */
> > +	if (check_source && check_target && !get_pageblock_skip(page))
> > +		return true;
> > +
> > +	/*
> > +	 * If clearing skip for the target scanner, do not select a
> > +	 * non-movable pageblock as the starting point.
> > +	 */
> > +	if (!check_source && check_target &&
> > +	    get_pageblock_migratetype(page) != MIGRATE_MOVABLE)
> > +		return false;
> > +
> > +	/*
> > +	 * Only clear the hint if a sample indicates there is either a
> > +	 * free page or an LRU page in the block. One or other condition
> > +	 * is necessary for the block to be a migration source/target.
> > +	 */
> > +	page = pfn_to_page(pageblock_start_pfn(pfn));
> > +	if (zone != page_zone(page))
> > +		return false;
> > +	end_page = page + pageblock_nr_pages;
> 
> Watch out for start pfn being invalid, and end_page being invalid or after zone end?
> 

Yeah, it is possible there is no alignment on pageblock_nr_pages.

> > +
> > +	do {
> > +		if (check_source && PageLRU(page)) {
> > +			clear_pageblock_skip(page);
> > +			return true;
> > +		}
> > +
> > +		if (check_target && PageBuddy(page)) {
> > +			clear_pageblock_skip(page);
> > +			return true;
> > +		}
> > +
> > +		page += (1 << PAGE_ALLOC_COSTLY_ORDER);
> 
> Also probably check pfn_valid_within() and page_zone?
> 

Again yes. Holes could have been punched.

I've an updated version but I'll shove it through tests just to be sure.
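
For reference, the shape of it is along these lines -- illustrative only
and not necessarily the version that survives testing:

	unsigned long block_pfn = pageblock_start_pfn(pfn);
	unsigned long block_end_pfn = min(block_pfn + pageblock_nr_pages,
					  zone_end_pfn(zone));
	struct page *page, *end_page;

	/* Ensure both ends of the pageblock are valid and in the zone */
	if (!pfn_valid(block_pfn) || !pfn_valid(block_end_pfn - 1))
		return false;

	page = pfn_to_page(block_pfn);
	end_page = pfn_to_page(block_end_pfn - 1);
	if (zone != page_zone(page) || zone != page_zone(end_page))
		return false;

	do {
		/* Guard against holes punched within the pageblock */
		if (pfn_valid_within(page_to_pfn(page))) {
			/* PageLRU/PageBuddy sampling as in the patch */
		}
		page += (1 << PAGE_ALLOC_COSTLY_ORDER);
	} while (page <= end_page);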

> > +	} while (page < end_page);
> > +
> > +	return false;
> > +}
> > +
> >  /*
> >   * This function is called to clear all cached information on pageblocks that
> >   * should be skipped for page isolation when the migrate and free page scanner
> 
> ...
> 
> > @@ -1193,7 +1273,7 @@ fast_isolate_freepages(struct compact_control *cc)
> >  	 * If starting the scan, use a deeper search and use the highest
> >  	 * PFN found if a suitable one is not found.
> >  	 */
> > -	if (cc->free_pfn == pageblock_start_pfn(zone_end_pfn(cc->zone) - 1)) {
> > +	if (cc->free_pfn >= cc->zone->compact_init_free_pfn) {
> >  		limit = pageblock_nr_pages >> 1;
> >  		scan_start = true;
> >  	}
> > @@ -1338,7 +1418,6 @@ static void isolate_freepages(struct compact_control *cc)
> >  	unsigned long isolate_start_pfn; /* exact pfn we start at */
> >  	unsigned long block_end_pfn;	/* end of current pageblock */
> >  	unsigned long low_pfn;	     /* lowest pfn scanner is able to scan */
> > -	unsigned long nr_isolated;
> >  	struct list_head *freelist = &cc->freepages;
> >  	unsigned int stride;
> >  
> > @@ -1374,6 +1453,8 @@ static void isolate_freepages(struct compact_control *cc)
> >  				block_end_pfn = block_start_pfn,
> >  				block_start_pfn -= pageblock_nr_pages,
> >  				isolate_start_pfn = block_start_pfn) {
> > +		unsigned long nr_isolated;
> 
> Unrelated cleanup? Nevermind.
> 

I'll move the hunks to "mm, compaction: Sample pageblocks for free
pages" where they belong

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 24/25] mm, compaction: Capture a page under direct compaction
  2019-01-18 13:40   ` Vlastimil Babka
@ 2019-01-18 14:39     ` Mel Gorman
  0 siblings, 0 replies; 75+ messages in thread
From: Mel Gorman @ 2019-01-18 14:39 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Linux-MM, David Rientjes, Andrea Arcangeli, ying.huang, kirill,
	Andrew Morton, Linux List Kernel Mailing

On Fri, Jan 18, 2019 at 02:40:00PM +0100, Vlastimil Babka wrote:
> > Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> 
> Great, you crossed off this old TODO item, and didn't need pageblock isolation
> to do that :D
> 

The TODO is not just old, it's ancient! The idea of capture was first
floated in 2008! A version was proposed at https://lwn.net/Articles/301246/
against 2.6.27-rc1-mm1.
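
For anyone not following it that long, the wiring is roughly as below --
a simplified sketch of the compact_zone_order() side only, not the actual
code, and the details differ in the patch itself:

	struct capture_control capc = {
		.cc = &cc,
		.page = NULL,
	};

	/* Register with the allocator before compacting */
	current->capture_control = &capc;
	ret = compact_zone(&cc, &capc);
	current->capture_control = NULL;

	/*
	 * If __free_one_page() saw a page of the requested order being
	 * freed while compaction ran, it was handed straight back here
	 * instead of going onto the free lists.
	 */
	*capture = capc.page;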

> I have just one worry...
> 
> > @@ -837,6 +873,12 @@ static inline void __free_one_page(struct page *page,
> >  
> >  continue_merging:
> >  	while (order < max_order - 1) {
> > +		if (compaction_capture(capc, page, order)) {
> > +			if (likely(!is_migrate_isolate(migratetype)))
> > +				__mod_zone_freepage_state(zone, -(1 << order),
> > +								migratetype);
> > +			return;
> 
> What about MIGRATE_CMA pageblocks and compaction for non-movable allocation,
> won't that violate CMA expectations?
> And less critically, this will avoid the migratetype stealing decisions and
> actions, potentially resulting in worse fragmentation avoidance?
> 

Both might be issues. How about this (untested)?

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fe089ac8a207..d61174bb0333 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -799,11 +799,26 @@ static inline struct capture_control *task_capc(struct zone *zone)
 }
 
 static inline bool
-compaction_capture(struct capture_control *capc, struct page *page, int order)
+compaction_capture(struct capture_control *capc, struct page *page,
+		   int order, int migratetype)
 {
 	if (!capc || order != capc->cc->order)
 		return false;
 
+	/* Do not accidentally pollute CMA or isolated regions */
+	if (is_migrate_cma(migratetype) ||
+	    is_migrate_isolate(migratetype))
+		return false;
+
+	/*
+	 * Do not let lower order allocations pollute a movable pageblock.
+	 * This might let an unmovable request use a reclaimable pageblock
+	 * and vice-versa but no more than normal fallback logic which can
+	 * have trouble finding a high-order free page.
+	 */
+	if (order < pageblock_order && migratetype == MIGRATE_MOVABLE)
+		return false;
+
 	capc->page = page;
 	return true;
 }
@@ -815,7 +830,8 @@ static inline struct capture_control *task_capc(struct zone *zone)
 }
 
 static inline bool
-compaction_capture(struct capture_control *capc, struct page *page, int order)
+compaction_capture(struct capture_control *capc, struct page *page,
+		   int order, int migratetype)
 {
 	return false;
 }
@@ -870,7 +886,7 @@ static inline void __free_one_page(struct page *page,
 
 continue_merging:
 	while (order < max_order - 1) {
-		if (compaction_capture(capc, page, order)) {
+		if (compaction_capture(capc, page, order, migratetype)) {
 			if (likely(!is_migrate_isolate(migratetype)))
 				__mod_zone_freepage_state(zone, -(1 << order),
 								migratetype);

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply related	[flat|nested] 75+ messages in thread

* Re: [PATCH 25/25] mm, compaction: Do not direct compact remote memory
  2019-01-18 13:51   ` Vlastimil Babka
@ 2019-01-18 14:46     ` Mel Gorman
  0 siblings, 0 replies; 75+ messages in thread
From: Mel Gorman @ 2019-01-18 14:46 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Linux-MM, David Rientjes, Andrea Arcangeli, ying.huang, kirill,
	Andrew Morton, Linux List Kernel Mailing

On Fri, Jan 18, 2019 at 02:51:00PM +0100, Vlastimil Babka wrote:
> On 1/4/19 1:50 PM, Mel Gorman wrote:
> > Remote compaction is expensive and possibly counter-productive. Locality
> > is expected to often have better performance characteristics than remote
> > high-order pages. For small allocations, it's expected that locality is
> > generally required or fallbacks are possible. Larger allocations such
> > as THP are forbidden from using remote nodes at the time of writing but
> > if __GFP_THISNODE is ever removed, then it would still be preferable to
> > fall back to small local base pages over remote THP in the general case.
> > kcompactd is still woken via kswapd so compaction happens eventually.
> > 
> > While this patch potentially has both positive and negative effects,
> > it is best to avoid the possibility of remote compaction given the cost
> > relative to any potential benefit.
> > 
> > Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> 
> Generally agree with the intent, but what if there's e.g. a high-order (but
> not costly) kernel allocation on behalf of a user process on a cpu belonging
> to a movable node, where the only non-movable node is node 0? It will have to
> keep reclaiming until a large enough page is formed, or wait for kcompactd?

Nnnggghhh, movable nodes. Yes, in such a case it would have to wait for
reclaim or kcompactd which could be problematic. This would have to be
special cased further.

> So maybe do this only for costly orders?
> 

This was written on the basis of the __GFP_THISNODE discussion, which is
THP-specific, so costly orders didn't come into my thinking. If that ever gets
resurrected properly, this patch can be revisited. It would be trivial to
check if the preferred node is a movable node and allow remote compaction
in such cases but I'm not aiming at any specific problem with this patch
so it's too hand-wavy.

> Also I think compaction_zonelist_suitable() should also be updated, or we might
> be promising the reclaim-compact loop e.g. that we will compact after enough
> reclaim, but then we won't.
> 

True. I think I'll kill this patch as __GFP_THISNODE is now used again
for THP (regardless of how one feels about the subject) and we don't have
good examples where remote compaction for lower-order kernel allocations
is a problem.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 75+ messages in thread

end of thread, other threads:[~2019-01-18 14:46 UTC | newest]

Thread overview: 75+ messages
2019-01-04 12:49 [PATCH 00/25] Increase success rates and reduce latency of compaction v2 Mel Gorman
2019-01-04 12:49 ` [PATCH 01/25] mm, compaction: Shrink compact_control Mel Gorman
2019-01-04 12:49 ` [PATCH 02/25] mm, compaction: Rearrange compact_control Mel Gorman
2019-01-04 12:49 ` [PATCH 03/25] mm, compaction: Remove last_migrated_pfn from compact_control Mel Gorman
2019-01-04 12:49 ` [PATCH 04/25] mm, compaction: Remove unnecessary zone parameter in some instances Mel Gorman
2019-01-15 11:43   ` Vlastimil Babka
2019-01-04 12:49 ` [PATCH 05/25] mm, compaction: Rename map_pages to split_map_pages Mel Gorman
2019-01-15 11:59   ` Vlastimil Babka
2019-01-04 12:49 ` [PATCH 06/25] mm, compaction: Skip pageblocks with reserved pages Mel Gorman
2019-01-15 12:10   ` Vlastimil Babka
2019-01-15 12:50     ` Mel Gorman
2019-01-16  9:42       ` Mel Gorman
2019-01-04 12:49 ` [PATCH 07/25] mm, migrate: Immediately fail migration of a page with no migration handler Mel Gorman
2019-01-04 12:49 ` [PATCH 08/25] mm, compaction: Always finish scanning of a full pageblock Mel Gorman
2019-01-04 12:49 ` [PATCH 09/25] mm, compaction: Use the page allocator bulk-free helper for lists of pages Mel Gorman
2019-01-15 12:39   ` Vlastimil Babka
2019-01-16  9:46     ` Mel Gorman
2019-01-04 12:49 ` [PATCH 10/25] mm, compaction: Ignore the fragmentation avoidance boost for isolation and compaction Mel Gorman
2019-01-15 13:18   ` Vlastimil Babka
2019-01-04 12:49 ` [PATCH 11/25] mm, compaction: Use free lists to quickly locate a migration source Mel Gorman
2019-01-16 13:15   ` Vlastimil Babka
2019-01-16 14:33     ` Mel Gorman
2019-01-16 15:00       ` Vlastimil Babka
2019-01-16 15:43         ` Mel Gorman
2019-01-04 12:49 ` [PATCH 12/25] mm, compaction: Keep migration source private to a single compaction instance Mel Gorman
2019-01-16 15:45   ` Vlastimil Babka
2019-01-16 16:15     ` Mel Gorman
2019-01-17  9:29       ` Vlastimil Babka
2019-01-17  9:40   ` Vlastimil Babka
2019-01-04 12:49 ` [PATCH 13/25] mm, compaction: Use free lists to quickly locate a migration target Mel Gorman
2019-01-17 14:36   ` Vlastimil Babka
2019-01-17 15:51     ` Mel Gorman
2019-01-04 12:50 ` [PATCH 14/25] mm, compaction: Avoid rescanning the same pageblock multiple times Mel Gorman
2019-01-17 15:16   ` Vlastimil Babka
2019-01-17 16:00     ` Mel Gorman
2019-01-04 12:50 ` [PATCH 15/25] mm, compaction: Finish pageblock scanning on contention Mel Gorman
2019-01-17 16:38   ` Vlastimil Babka
2019-01-17 17:11     ` Mel Gorman
2019-01-18  8:57       ` Vlastimil Babka
2019-01-04 12:50 ` [PATCH 16/25] mm, compaction: Check early for huge pages encountered by the migration scanner Mel Gorman
2019-01-17 17:01   ` Vlastimil Babka
2019-01-17 17:35     ` Mel Gorman
2019-01-04 12:50 ` [PATCH 17/25] mm, compaction: Keep cached migration PFNs synced for unusable pageblocks Mel Gorman
2019-01-17 17:17   ` Vlastimil Babka
2019-01-17 17:37     ` Mel Gorman
2019-01-04 12:50 ` [PATCH 18/25] mm, compaction: Rework compact_should_abort as compact_check_resched Mel Gorman
2019-01-17 17:27   ` Vlastimil Babka
2019-01-04 12:50 ` [PATCH 19/25] mm, compaction: Do not consider a need to reschedule as contention Mel Gorman
2019-01-17 17:33   ` Vlastimil Babka
2019-01-17 18:05     ` Mel Gorman
2019-01-04 12:50 ` [PATCH 20/25] mm, compaction: Reduce unnecessary skipping of migration target scanner Mel Gorman
2019-01-17 17:58   ` Vlastimil Babka
2019-01-17 19:39     ` Mel Gorman
2019-01-18  9:09       ` Vlastimil Babka
2019-01-04 12:50 ` [PATCH 21/25] mm, compaction: Round-robin the order while searching the free lists for a target Mel Gorman
2019-01-18  9:17   ` Vlastimil Babka
2019-01-04 12:50 ` [PATCH 22/25] mm, compaction: Sample pageblocks for free pages Mel Gorman
2019-01-18 10:38   ` Vlastimil Babka
2019-01-18 13:44     ` Mel Gorman
2019-01-04 12:50 ` [PATCH 23/25] mm, compaction: Be selective about what pageblocks to clear skip hints Mel Gorman
2019-01-18 12:55   ` Vlastimil Babka
2019-01-18 14:10     ` Mel Gorman
2019-01-04 12:50 ` [PATCH 24/25] mm, compaction: Capture a page under direct compaction Mel Gorman
2019-01-18 13:40   ` Vlastimil Babka
2019-01-18 14:39     ` Mel Gorman
2019-01-04 12:50 ` [PATCH 25/25] mm, compaction: Do not direct compact remote memory Mel Gorman
2019-01-18 13:51   ` Vlastimil Babka
2019-01-18 14:46     ` Mel Gorman
2019-01-07 23:43 ` [PATCH 00/25] Increase success rates and reduce latency of compaction v2 Andrew Morton
2019-01-08  9:12   ` Mel Gorman
2019-01-09 11:13 ` [PATCH] mm, compaction: Use free lists to quickly locate a migration target -fix Mel Gorman
2019-01-09 19:27   ` Andrew Morton
2019-01-09 21:26     ` Mel Gorman
2019-01-09 11:15 ` [PATCH] mm, compaction: Finish pageblock scanning on contention -fix Mel Gorman
2019-01-09 11:16 ` [PATCH] mm, compaction: Round-robin the order while searching the free lists for a target -fix Mel Gorman
