All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/8] try to reduce fragmenting fallbacks
@ 2017-03-07 13:15 ` Vlastimil Babka
  0 siblings, 0 replies; 48+ messages in thread
From: Vlastimil Babka @ 2017-03-07 13:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Johannes Weiner, Mel Gorman, Joonsoo Kim,
	David Rientjes, kernel-team, Vlastimil Babka

Last year, Johannes Weiner has reported a regression in page mobility
grouping [1] and while the exact cause was not found, I've come up with some
ways to improve it by reducing the number of allocations falling back to
different migratetype and causing permanent fragmentation.

Changes since v2:
- incorporated feedback (nothing major)
- updated to current mmotm
- dropped the more intrusive RFC patches 9-10

The series was tested with mmtests stress-highalloc modified to do GFP_KERNEL
order-4 allocations, on 4.9 with "mm, vmscan: fix zone balance check in
prepare_kswapd_sleep" (without that, kcompactd indeed wasn't woken up) on UMA
machine with 4GB memory. There were 5 repeats of each run, as the extfrag stats
are quite volatile (note the stats below are sums, not averages, as it was less
perl hacking for me).

Success rate are the same, already high due to the low allocation order used,
so I'm not including them.

Compaction stats:
(the patches are stacked, and I haven't measured the non-functional-changes
patches separately)

                                   patch 1     patch 2     patch 3     patch 4     patch 7     patch 8
Compaction stalls                    22449       24680       24846       19765       22059       17480
Compaction success                   12971       14836       14608       10475       11632        8757
Compaction failures                   9477        9843       10238        9290       10426        8722
Page migrate success               3109022     3370438     3312164     1695105     1608435     2111379
Page migrate failure                911588     1149065     1028264     1112675     1077251     1026367
Compaction pages isolated          7242983     8015530     7782467     4629063     4402787     5377665
Compaction migrate scanned       980838938   987367943   957690188   917647238   947155598  1018922197
Compaction free scanned          557926893   598946443   602236894   594024490   541169699   763651731
Compaction cost                      10243       10578       10304        8286        8398        9440

Compaction stats are mostly within noise until patch 4, which decreases the
number of compactions, and migrations. Part of that could be due to more
pageblocks marked as unmovable, and async compaction skipping those. This
changes a bit with patch 7, but not so much. Patch 8 increases free scanner
stats and migrations, which comes from the changed termination criteria.
Interestingly number of compactions decreases - probably the fully compacted
pageblock satisfies multiple subsequent allocations, so it amortizes.

Next comes the extfrag tracepoint, where "fragmenting" means that an allocation
had to fallback to a pageblock of another migratetype which wasn't fully free
(which is almost all of the fallbacks). I have locally added another tracepoint
for "Page steal" into steal_suitable_fallback() which triggers in situations
where we are allowed to do move_freepages_block(). If we decide to also do
set_pageblock_migratetype(), it's "Pages steal with pageblock" with break down
for which allocation migratetype we are stealing and from which fallback
migratetype. The last part "due to counting" comes from patch 4 and counts the
events where the counting of movable pages allowed us to change pageblock's
migratetype, while the number of free pages alone wouldn't be enough to cross
the threshold.

                                                     patch 1     patch 2     patch 3     patch 4     patch 7     patch 8
Page alloc extfrag event                            10155066     8522968    10164959    15622080    13727068    13140319
Extfrag fragmenting                                 10149231     8517025    10159040    15616925    13721391    13134792
Extfrag fragmenting for unmovable                     159504      168500      184177       97835       70625       56948
Extfrag fragmenting unmovable placed with movable     153613      163549      172693       91740       64099       50917
Extfrag fragmenting unmovable placed with reclaim.      5891        4951       11484        6095        6526        6031
Extfrag fragmenting for reclaimable                     4738        4829        6345        4822        5640        5378
Extfrag fragmenting reclaimable placed with movable     1836        1902        1851        1579        1739        1760
Extfrag fragmenting reclaimable placed with unmov.      2902        2927        4494        3243        3901        3618
Extfrag fragmenting for movable                      9984989     8343696     9968518    15514268    13645126    13072466
Pages steal                                           179954      192291      210880      123254       94545       81486
Pages steal with pageblock                             22153       18943       20154       33562       29969       33444
Pages steal with pageblock for unmovable               14350       12858       13256       20660       19003       20852
Pages steal with pageblock for unmovable from mov.     12812       11402       11683       19072       17467       19298
Pages steal with pageblock for unmovable from recl.     1538        1456        1573        1588        1536        1554
Pages steal with pageblock for movable                  7114        5489        5965       11787       10012       11493
Pages steal with pageblock for movable from unmov.      6885        5291        5541       11179        9525       10885
Pages steal with pageblock for movable from recl.        229         198         424         608         487         608
Pages steal with pageblock for reclaimable               689         596         933        1115         954        1099
Pages steal with pageblock for reclaimable from unmov.   273         219         537         658         547         667
Pages steal with pageblock for reclaimable from mov.     416         377         396         457         407         432
Pages steal with pageblock due to counting                                                 11834       10075        7530
... for unmovable                                                                           8993        7381        4616
... for movable                                                                             2792        2653        2851
... for reclaimable                                                                           49          41          63

What we can see is that "Extfrag fragmenting for unmovable" and "... placed with
movable" drops with almost each patch, which is good as we are polluting less
movable pageblocks with unmovable pages.
The most significant change is patch 4 with movable page counting. On the other
hand it increases "Extfrag fragmenting for movable" by 50%. "Pages steal" drops
though, so these movable allocation fallbacks find only small free pages and are
not allowed to steal whole pageblocks back. "Pages steal with pageblock" raises,
because the patch increases the chances of pageblock migratetype changes to
happen. This affects all migratetypes.
The summary is that patch 4 is not a clear win wrt these stats, but I believe
that the tradeoff it makes is a good one. There's less pollution of movable
pageblocks by unmovable allocations. There's less stealing between pageblock,
and those that remain have higher chance of changing migratetype also the
pageblock itself, so it should more faithfully reflect the migratetype of the
pages within the pageblock. The increase of movable allocations falling back to
unmovable pageblock might look dramatic, but those allocations can be migrated
by compaction when needed, and other patches in the series (7-9) improve that
aspect.
Patches 7 and 8 continue the trend of reduced unmovable fallbacks and also
reduce the impact on movable fallbacks from patch 4.

[1] https://www.spinics.net/lists/linux-mm/msg114237.html

Vlastimil Babka (8):
  mm, compaction: reorder fields in struct compact_control
  mm, compaction: remove redundant watermark check in compact_finished()
  mm, page_alloc: split smallest stolen page in fallback
  mm, page_alloc: count movable pages when stealing from pageblock
  mm, compaction: change migrate_async_suitable() to
    suitable_migration_source()
  mm, compaction: add migratetype to compact_control
  mm, compaction: restrict async compaction to pageblocks of same
    migratetype
  mm, compaction: finish whole pageblock to reduce fragmentation

 include/linux/mmzone.h         |   5 ++
 include/linux/page-isolation.h |   5 +-
 mm/compaction.c                |  83 ++++++++++++++++-------
 mm/internal.h                  |  12 ++--
 mm/page_alloc.c                | 148 ++++++++++++++++++++++++++++-------------
 mm/page_isolation.c            |   5 +-
 6 files changed, 177 insertions(+), 81 deletions(-)

-- 
2.12.0

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v3 0/8] try to reduce fragmenting fallbacks
@ 2017-03-07 13:15 ` Vlastimil Babka
  0 siblings, 0 replies; 48+ messages in thread
From: Vlastimil Babka @ 2017-03-07 13:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Johannes Weiner, Mel Gorman, Joonsoo Kim,
	David Rientjes, kernel-team, Vlastimil Babka

Last year, Johannes Weiner has reported a regression in page mobility
grouping [1] and while the exact cause was not found, I've come up with some
ways to improve it by reducing the number of allocations falling back to
different migratetype and causing permanent fragmentation.

Changes since v2:
- incorporated feedback (nothing major)
- updated to current mmotm
- dropped the more intrusive RFC patches 9-10

The series was tested with mmtests stress-highalloc modified to do GFP_KERNEL
order-4 allocations, on 4.9 with "mm, vmscan: fix zone balance check in
prepare_kswapd_sleep" (without that, kcompactd indeed wasn't woken up) on UMA
machine with 4GB memory. There were 5 repeats of each run, as the extfrag stats
are quite volatile (note the stats below are sums, not averages, as it was less
perl hacking for me).

Success rate are the same, already high due to the low allocation order used,
so I'm not including them.

Compaction stats:
(the patches are stacked, and I haven't measured the non-functional-changes
patches separately)

                                   patch 1     patch 2     patch 3     patch 4     patch 7     patch 8
Compaction stalls                    22449       24680       24846       19765       22059       17480
Compaction success                   12971       14836       14608       10475       11632        8757
Compaction failures                   9477        9843       10238        9290       10426        8722
Page migrate success               3109022     3370438     3312164     1695105     1608435     2111379
Page migrate failure                911588     1149065     1028264     1112675     1077251     1026367
Compaction pages isolated          7242983     8015530     7782467     4629063     4402787     5377665
Compaction migrate scanned       980838938   987367943   957690188   917647238   947155598  1018922197
Compaction free scanned          557926893   598946443   602236894   594024490   541169699   763651731
Compaction cost                      10243       10578       10304        8286        8398        9440

Compaction stats are mostly within noise until patch 4, which decreases the
number of compactions, and migrations. Part of that could be due to more
pageblocks marked as unmovable, and async compaction skipping those. This
changes a bit with patch 7, but not so much. Patch 8 increases free scanner
stats and migrations, which comes from the changed termination criteria.
Interestingly number of compactions decreases - probably the fully compacted
pageblock satisfies multiple subsequent allocations, so it amortizes.

Next comes the extfrag tracepoint, where "fragmenting" means that an allocation
had to fallback to a pageblock of another migratetype which wasn't fully free
(which is almost all of the fallbacks). I have locally added another tracepoint
for "Page steal" into steal_suitable_fallback() which triggers in situations
where we are allowed to do move_freepages_block(). If we decide to also do
set_pageblock_migratetype(), it's "Pages steal with pageblock" with break down
for which allocation migratetype we are stealing and from which fallback
migratetype. The last part "due to counting" comes from patch 4 and counts the
events where the counting of movable pages allowed us to change pageblock's
migratetype, while the number of free pages alone wouldn't be enough to cross
the threshold.

                                                     patch 1     patch 2     patch 3     patch 4     patch 7     patch 8
Page alloc extfrag event                            10155066     8522968    10164959    15622080    13727068    13140319
Extfrag fragmenting                                 10149231     8517025    10159040    15616925    13721391    13134792
Extfrag fragmenting for unmovable                     159504      168500      184177       97835       70625       56948
Extfrag fragmenting unmovable placed with movable     153613      163549      172693       91740       64099       50917
Extfrag fragmenting unmovable placed with reclaim.      5891        4951       11484        6095        6526        6031
Extfrag fragmenting for reclaimable                     4738        4829        6345        4822        5640        5378
Extfrag fragmenting reclaimable placed with movable     1836        1902        1851        1579        1739        1760
Extfrag fragmenting reclaimable placed with unmov.      2902        2927        4494        3243        3901        3618
Extfrag fragmenting for movable                      9984989     8343696     9968518    15514268    13645126    13072466
Pages steal                                           179954      192291      210880      123254       94545       81486
Pages steal with pageblock                             22153       18943       20154       33562       29969       33444
Pages steal with pageblock for unmovable               14350       12858       13256       20660       19003       20852
Pages steal with pageblock for unmovable from mov.     12812       11402       11683       19072       17467       19298
Pages steal with pageblock for unmovable from recl.     1538        1456        1573        1588        1536        1554
Pages steal with pageblock for movable                  7114        5489        5965       11787       10012       11493
Pages steal with pageblock for movable from unmov.      6885        5291        5541       11179        9525       10885
Pages steal with pageblock for movable from recl.        229         198         424         608         487         608
Pages steal with pageblock for reclaimable               689         596         933        1115         954        1099
Pages steal with pageblock for reclaimable from unmov.   273         219         537         658         547         667
Pages steal with pageblock for reclaimable from mov.     416         377         396         457         407         432
Pages steal with pageblock due to counting                                                 11834       10075        7530
... for unmovable                                                                           8993        7381        4616
... for movable                                                                             2792        2653        2851
... for reclaimable                                                                           49          41          63

What we can see is that "Extfrag fragmenting for unmovable" and "... placed with
movable" drops with almost each patch, which is good as we are polluting less
movable pageblocks with unmovable pages.
The most significant change is patch 4 with movable page counting. On the other
hand it increases "Extfrag fragmenting for movable" by 50%. "Pages steal" drops
though, so these movable allocation fallbacks find only small free pages and are
not allowed to steal whole pageblocks back. "Pages steal with pageblock" raises,
because the patch increases the chances of pageblock migratetype changes to
happen. This affects all migratetypes.
The summary is that patch 4 is not a clear win wrt these stats, but I believe
that the tradeoff it makes is a good one. There's less pollution of movable
pageblocks by unmovable allocations. There's less stealing between pageblock,
and those that remain have higher chance of changing migratetype also the
pageblock itself, so it should more faithfully reflect the migratetype of the
pages within the pageblock. The increase of movable allocations falling back to
unmovable pageblock might look dramatic, but those allocations can be migrated
by compaction when needed, and other patches in the series (7-9) improve that
aspect.
Patches 7 and 8 continue the trend of reduced unmovable fallbacks and also
reduce the impact on movable fallbacks from patch 4.

[1] https://www.spinics.net/lists/linux-mm/msg114237.html

Vlastimil Babka (8):
  mm, compaction: reorder fields in struct compact_control
  mm, compaction: remove redundant watermark check in compact_finished()
  mm, page_alloc: split smallest stolen page in fallback
  mm, page_alloc: count movable pages when stealing from pageblock
  mm, compaction: change migrate_async_suitable() to
    suitable_migration_source()
  mm, compaction: add migratetype to compact_control
  mm, compaction: restrict async compaction to pageblocks of same
    migratetype
  mm, compaction: finish whole pageblock to reduce fragmentation

 include/linux/mmzone.h         |   5 ++
 include/linux/page-isolation.h |   5 +-
 mm/compaction.c                |  83 ++++++++++++++++-------
 mm/internal.h                  |  12 ++--
 mm/page_alloc.c                | 148 ++++++++++++++++++++++++++++-------------
 mm/page_isolation.c            |   5 +-
 6 files changed, 177 insertions(+), 81 deletions(-)

-- 
2.12.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v3 1/8] mm, compaction: reorder fields in struct compact_control
  2017-03-07 13:15 ` Vlastimil Babka
@ 2017-03-07 13:15   ` Vlastimil Babka
  -1 siblings, 0 replies; 48+ messages in thread
From: Vlastimil Babka @ 2017-03-07 13:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Johannes Weiner, Mel Gorman, Joonsoo Kim,
	David Rientjes, kernel-team, Vlastimil Babka

While currently there are (mostly by accident) no holes in struct
compact_control (on x86_64), but we are going to add more bool flags, so place
them all together to the end of the structure. While at it, just order all
fields from largest to smallest.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/internal.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 823a7a89099b..05c48a95a20a 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -183,6 +183,7 @@ extern int user_min_free_kbytes;
 struct compact_control {
 	struct list_head freepages;	/* List of free pages to migrate to */
 	struct list_head migratepages;	/* List of pages being migrated */
+	struct zone *zone;
 	unsigned long nr_freepages;	/* Number of isolated free pages */
 	unsigned long nr_migratepages;	/* Number of pages to migrate */
 	unsigned long total_migrate_scanned;
@@ -190,16 +191,15 @@ struct compact_control {
 	unsigned long free_pfn;		/* isolate_freepages search base */
 	unsigned long migrate_pfn;	/* isolate_migratepages search base */
 	unsigned long last_migrated_pfn;/* Not yet flushed page being freed */
+	const gfp_t gfp_mask;		/* gfp mask of a direct compactor */
+	int order;			/* order a direct compactor needs */
+	const unsigned int alloc_flags;	/* alloc flags of a direct compactor */
+	const int classzone_idx;	/* zone index of a direct compactor */
 	enum migrate_mode mode;		/* Async or sync migration mode */
 	bool ignore_skip_hint;		/* Scan blocks even if marked skip */
 	bool ignore_block_suitable;	/* Scan blocks considered unsuitable */
 	bool direct_compaction;		/* False from kcompactd or /proc/... */
 	bool whole_zone;		/* Whole zone should/has been scanned */
-	int order;			/* order a direct compactor needs */
-	const gfp_t gfp_mask;		/* gfp mask of a direct compactor */
-	const unsigned int alloc_flags;	/* alloc flags of a direct compactor */
-	const int classzone_idx;	/* zone index of a direct compactor */
-	struct zone *zone;
 	bool contended;			/* Signal lock or sched contention */
 };
 
-- 
2.12.0

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v3 1/8] mm, compaction: reorder fields in struct compact_control
@ 2017-03-07 13:15   ` Vlastimil Babka
  0 siblings, 0 replies; 48+ messages in thread
From: Vlastimil Babka @ 2017-03-07 13:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Johannes Weiner, Mel Gorman, Joonsoo Kim,
	David Rientjes, kernel-team, Vlastimil Babka

While currently there are (mostly by accident) no holes in struct
compact_control (on x86_64), but we are going to add more bool flags, so place
them all together to the end of the structure. While at it, just order all
fields from largest to smallest.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/internal.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 823a7a89099b..05c48a95a20a 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -183,6 +183,7 @@ extern int user_min_free_kbytes;
 struct compact_control {
 	struct list_head freepages;	/* List of free pages to migrate to */
 	struct list_head migratepages;	/* List of pages being migrated */
+	struct zone *zone;
 	unsigned long nr_freepages;	/* Number of isolated free pages */
 	unsigned long nr_migratepages;	/* Number of pages to migrate */
 	unsigned long total_migrate_scanned;
@@ -190,16 +191,15 @@ struct compact_control {
 	unsigned long free_pfn;		/* isolate_freepages search base */
 	unsigned long migrate_pfn;	/* isolate_migratepages search base */
 	unsigned long last_migrated_pfn;/* Not yet flushed page being freed */
+	const gfp_t gfp_mask;		/* gfp mask of a direct compactor */
+	int order;			/* order a direct compactor needs */
+	const unsigned int alloc_flags;	/* alloc flags of a direct compactor */
+	const int classzone_idx;	/* zone index of a direct compactor */
 	enum migrate_mode mode;		/* Async or sync migration mode */
 	bool ignore_skip_hint;		/* Scan blocks even if marked skip */
 	bool ignore_block_suitable;	/* Scan blocks considered unsuitable */
 	bool direct_compaction;		/* False from kcompactd or /proc/... */
 	bool whole_zone;		/* Whole zone should/has been scanned */
-	int order;			/* order a direct compactor needs */
-	const gfp_t gfp_mask;		/* gfp mask of a direct compactor */
-	const unsigned int alloc_flags;	/* alloc flags of a direct compactor */
-	const int classzone_idx;	/* zone index of a direct compactor */
-	struct zone *zone;
 	bool contended;			/* Signal lock or sched contention */
 };
 
-- 
2.12.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v3 2/8] mm, compaction: remove redundant watermark check in compact_finished()
  2017-03-07 13:15 ` Vlastimil Babka
@ 2017-03-07 13:15   ` Vlastimil Babka
  -1 siblings, 0 replies; 48+ messages in thread
From: Vlastimil Babka @ 2017-03-07 13:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Johannes Weiner, Mel Gorman, Joonsoo Kim,
	David Rientjes, kernel-team, Vlastimil Babka

When detecting whether compaction has succeeded in forming a high-order page,
__compact_finished() employs a watermark check, followed by an own search for
a suitable page in the freelists. This is not ideal for two reasons:

- The watermark check also searches high-order freelists, but has a less strict
  criteria wrt fallback. It's therefore redundant and waste of cycles. This was
  different in the past when high-order watermark check attempted to apply
  reserves to high-order pages.

- The watermark check might actually fail due to lack of order-0 pages.
  Compaction can't help with that, so there's no point in continuing because of
  that. It's possible that high-order page still exists and it terminates.

This patch therefore removes the watermark check. This should save some cycles
and terminate compaction sooner in some cases.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/compaction.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 81e1eaa2a2cf..9222ff362f33 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1280,7 +1280,6 @@ static enum compact_result __compact_finished(struct zone *zone, struct compact_
 			    const int migratetype)
 {
 	unsigned int order;
-	unsigned long watermark;
 
 	if (cc->contended || fatal_signal_pending(current))
 		return COMPACT_CONTENDED;
@@ -1308,13 +1307,6 @@ static enum compact_result __compact_finished(struct zone *zone, struct compact_
 	if (is_via_compact_memory(cc->order))
 		return COMPACT_CONTINUE;
 
-	/* Compaction run is not finished if the watermark is not met */
-	watermark = zone->watermark[cc->alloc_flags & ALLOC_WMARK_MASK];
-
-	if (!zone_watermark_ok(zone, cc->order, watermark, cc->classzone_idx,
-							cc->alloc_flags))
-		return COMPACT_CONTINUE;
-
 	/* Direct compactor: Is a suitable page free? */
 	for (order = cc->order; order < MAX_ORDER; order++) {
 		struct free_area *area = &zone->free_area[order];
-- 
2.12.0

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v3 2/8] mm, compaction: remove redundant watermark check in compact_finished()
@ 2017-03-07 13:15   ` Vlastimil Babka
  0 siblings, 0 replies; 48+ messages in thread
From: Vlastimil Babka @ 2017-03-07 13:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Johannes Weiner, Mel Gorman, Joonsoo Kim,
	David Rientjes, kernel-team, Vlastimil Babka

When detecting whether compaction has succeeded in forming a high-order page,
__compact_finished() employs a watermark check, followed by an own search for
a suitable page in the freelists. This is not ideal for two reasons:

- The watermark check also searches high-order freelists, but has a less strict
  criteria wrt fallback. It's therefore redundant and waste of cycles. This was
  different in the past when high-order watermark check attempted to apply
  reserves to high-order pages.

- The watermark check might actually fail due to lack of order-0 pages.
  Compaction can't help with that, so there's no point in continuing because of
  that. It's possible that high-order page still exists and it terminates.

This patch therefore removes the watermark check. This should save some cycles
and terminate compaction sooner in some cases.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/compaction.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 81e1eaa2a2cf..9222ff362f33 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1280,7 +1280,6 @@ static enum compact_result __compact_finished(struct zone *zone, struct compact_
 			    const int migratetype)
 {
 	unsigned int order;
-	unsigned long watermark;
 
 	if (cc->contended || fatal_signal_pending(current))
 		return COMPACT_CONTENDED;
@@ -1308,13 +1307,6 @@ static enum compact_result __compact_finished(struct zone *zone, struct compact_
 	if (is_via_compact_memory(cc->order))
 		return COMPACT_CONTINUE;
 
-	/* Compaction run is not finished if the watermark is not met */
-	watermark = zone->watermark[cc->alloc_flags & ALLOC_WMARK_MASK];
-
-	if (!zone_watermark_ok(zone, cc->order, watermark, cc->classzone_idx,
-							cc->alloc_flags))
-		return COMPACT_CONTINUE;
-
 	/* Direct compactor: Is a suitable page free? */
 	for (order = cc->order; order < MAX_ORDER; order++) {
 		struct free_area *area = &zone->free_area[order];
-- 
2.12.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v3 3/8] mm, page_alloc: split smallest stolen page in fallback
  2017-03-07 13:15 ` Vlastimil Babka
@ 2017-03-07 13:15   ` Vlastimil Babka
  -1 siblings, 0 replies; 48+ messages in thread
From: Vlastimil Babka @ 2017-03-07 13:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Johannes Weiner, Mel Gorman, Joonsoo Kim,
	David Rientjes, kernel-team, Vlastimil Babka

The __rmqueue_fallback() function is called when there's no free page of
requested migratetype, and we need to steal from a different one. There are
various heuristics to make this event infrequent and reduce permanent
fragmentation. The main one is to try stealing from a pageblock that has the
most free pages, and possibly steal them all at once and convert the whole
pageblock. Precise searching for such pageblock would be expensive, so instead
the heuristics walks the free lists from MAX_ORDER down to requested order and
assumes that the block with highest-order free page is likely to also have the
most free pages in total.

Chances are that together with the highest-order page, we steal also pages of
lower orders from the same block. But then we still split the highest order
page. This is wasteful and can contribute to fragmentation instead of avoiding
it.

This patch thus changes __rmqueue_fallback() to just steal the page(s) and put
them on the freelist of the requested migratetype, and only report whether it
was successful. Then we pick (and eventually split) the smallest page with
__rmqueue_smallest().  This all happens under zone lock, so nobody can steal it
from us in the process. This should reduce fragmentation due to fallbacks. At
worst we are only stealing a single highest-order page and waste some cycles by
moving it between lists and then removing it, but fallback is not exactly hot
path so that should not be a concern. As a side benefit the patch removes some
duplicate code by reusing __rmqueue_smallest().

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/page_alloc.c | 59 +++++++++++++++++++++++++++++++++------------------------
 1 file changed, 34 insertions(+), 25 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5238b87aec91..eda7fedf6378 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1952,23 +1952,41 @@ static bool can_steal_fallback(unsigned int order, int start_mt)
  * use it's pages as requested migratetype in the future.
  */
 static void steal_suitable_fallback(struct zone *zone, struct page *page,
-							  int start_type)
+					int start_type, bool whole_block)
 {
 	unsigned int current_order = page_order(page);
+	struct free_area *area;
 	int pages;
 
+	/*
+	 * This can happen due to races and we want to prevent broken
+	 * highatomic accounting.
+	 */
+	if (is_migrate_highatomic_page(page))
+		goto single_page;
+
 	/* Take ownership for orders >= pageblock_order */
 	if (current_order >= pageblock_order) {
 		change_pageblock_range(page, current_order, start_type);
-		return;
+		goto single_page;
 	}
 
+	/* We are not allowed to try stealing from the whole block */
+	if (!whole_block)
+		goto single_page;
+
 	pages = move_freepages_block(zone, page, start_type);
 
 	/* Claim the whole block if over half of it is free */
 	if (pages >= (1 << (pageblock_order-1)) ||
 			page_group_by_mobility_disabled)
 		set_pageblock_migratetype(page, start_type);
+
+	return;
+
+single_page:
+	area = &zone->free_area[current_order];
+	list_move(&page->lru, &area->free_list[start_type]);
 }
 
 /*
@@ -2127,8 +2145,13 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac,
 	return false;
 }
 
-/* Remove an element from the buddy allocator from the fallback list */
-static inline struct page *
+/*
+ * Try finding a free buddy page on the fallback list and put it on the free
+ * list of requested migratetype, possibly along with other pages from the same
+ * block, depending on fragmentation avoidance heuristics. Returns true if
+ * fallback was found so that __rmqueue_smallest() can grab it.
+ */
+static inline bool
 __rmqueue_fallback(struct zone *zone, unsigned int order, int start_migratetype)
 {
 	struct free_area *area;
@@ -2149,32 +2172,17 @@ __rmqueue_fallback(struct zone *zone, unsigned int order, int start_migratetype)
 
 		page = list_first_entry(&area->free_list[fallback_mt],
 						struct page, lru);
-		if (can_steal && !is_migrate_highatomic_page(page))
-			steal_suitable_fallback(zone, page, start_migratetype);
 
-		/* Remove the page from the freelists */
-		area->nr_free--;
-		list_del(&page->lru);
-		rmv_page_order(page);
-
-		expand(zone, page, order, current_order, area,
-					start_migratetype);
-		/*
-		 * The pcppage_migratetype may differ from pageblock's
-		 * migratetype depending on the decisions in
-		 * find_suitable_fallback(). This is OK as long as it does not
-		 * differ for MIGRATE_CMA pageblocks. Those can be used as
-		 * fallback only via special __rmqueue_cma_fallback() function
-		 */
-		set_pcppage_migratetype(page, start_migratetype);
+		steal_suitable_fallback(zone, page, start_migratetype,
+								can_steal);
 
 		trace_mm_page_alloc_extfrag(page, order, current_order,
 			start_migratetype, fallback_mt);
 
-		return page;
+		return true;
 	}
 
-	return NULL;
+	return false;
 }
 
 /*
@@ -2186,13 +2194,14 @@ static struct page *__rmqueue(struct zone *zone, unsigned int order,
 {
 	struct page *page;
 
+retry:
 	page = __rmqueue_smallest(zone, order, migratetype);
 	if (unlikely(!page)) {
 		if (migratetype == MIGRATE_MOVABLE)
 			page = __rmqueue_cma_fallback(zone, order);
 
-		if (!page)
-			page = __rmqueue_fallback(zone, order, migratetype);
+		if (!page && __rmqueue_fallback(zone, order, migratetype))
+			goto retry;
 	}
 
 	trace_mm_page_alloc_zone_locked(page, order, migratetype);
-- 
2.12.0

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v3 3/8] mm, page_alloc: split smallest stolen page in fallback
@ 2017-03-07 13:15   ` Vlastimil Babka
  0 siblings, 0 replies; 48+ messages in thread
From: Vlastimil Babka @ 2017-03-07 13:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Johannes Weiner, Mel Gorman, Joonsoo Kim,
	David Rientjes, kernel-team, Vlastimil Babka

The __rmqueue_fallback() function is called when there's no free page of
requested migratetype, and we need to steal from a different one. There are
various heuristics to make this event infrequent and reduce permanent
fragmentation. The main one is to try stealing from a pageblock that has the
most free pages, and possibly steal them all at once and convert the whole
pageblock. Precise searching for such pageblock would be expensive, so instead
the heuristics walks the free lists from MAX_ORDER down to requested order and
assumes that the block with highest-order free page is likely to also have the
most free pages in total.

Chances are that together with the highest-order page, we steal also pages of
lower orders from the same block. But then we still split the highest order
page. This is wasteful and can contribute to fragmentation instead of avoiding
it.

This patch thus changes __rmqueue_fallback() to just steal the page(s) and put
them on the freelist of the requested migratetype, and only report whether it
was successful. Then we pick (and eventually split) the smallest page with
__rmqueue_smallest().  This all happens under zone lock, so nobody can steal it
from us in the process. This should reduce fragmentation due to fallbacks. At
worst we are only stealing a single highest-order page and waste some cycles by
moving it between lists and then removing it, but fallback is not exactly hot
path so that should not be a concern. As a side benefit the patch removes some
duplicate code by reusing __rmqueue_smallest().

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/page_alloc.c | 59 +++++++++++++++++++++++++++++++++------------------------
 1 file changed, 34 insertions(+), 25 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5238b87aec91..eda7fedf6378 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1952,23 +1952,41 @@ static bool can_steal_fallback(unsigned int order, int start_mt)
  * use it's pages as requested migratetype in the future.
  */
 static void steal_suitable_fallback(struct zone *zone, struct page *page,
-							  int start_type)
+					int start_type, bool whole_block)
 {
 	unsigned int current_order = page_order(page);
+	struct free_area *area;
 	int pages;
 
+	/*
+	 * This can happen due to races and we want to prevent broken
+	 * highatomic accounting.
+	 */
+	if (is_migrate_highatomic_page(page))
+		goto single_page;
+
 	/* Take ownership for orders >= pageblock_order */
 	if (current_order >= pageblock_order) {
 		change_pageblock_range(page, current_order, start_type);
-		return;
+		goto single_page;
 	}
 
+	/* We are not allowed to try stealing from the whole block */
+	if (!whole_block)
+		goto single_page;
+
 	pages = move_freepages_block(zone, page, start_type);
 
 	/* Claim the whole block if over half of it is free */
 	if (pages >= (1 << (pageblock_order-1)) ||
 			page_group_by_mobility_disabled)
 		set_pageblock_migratetype(page, start_type);
+
+	return;
+
+single_page:
+	area = &zone->free_area[current_order];
+	list_move(&page->lru, &area->free_list[start_type]);
 }
 
 /*
@@ -2127,8 +2145,13 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac,
 	return false;
 }
 
-/* Remove an element from the buddy allocator from the fallback list */
-static inline struct page *
+/*
+ * Try finding a free buddy page on the fallback list and put it on the free
+ * list of requested migratetype, possibly along with other pages from the same
+ * block, depending on fragmentation avoidance heuristics. Returns true if
+ * fallback was found so that __rmqueue_smallest() can grab it.
+ */
+static inline bool
 __rmqueue_fallback(struct zone *zone, unsigned int order, int start_migratetype)
 {
 	struct free_area *area;
@@ -2149,32 +2172,17 @@ __rmqueue_fallback(struct zone *zone, unsigned int order, int start_migratetype)
 
 		page = list_first_entry(&area->free_list[fallback_mt],
 						struct page, lru);
-		if (can_steal && !is_migrate_highatomic_page(page))
-			steal_suitable_fallback(zone, page, start_migratetype);
 
-		/* Remove the page from the freelists */
-		area->nr_free--;
-		list_del(&page->lru);
-		rmv_page_order(page);
-
-		expand(zone, page, order, current_order, area,
-					start_migratetype);
-		/*
-		 * The pcppage_migratetype may differ from pageblock's
-		 * migratetype depending on the decisions in
-		 * find_suitable_fallback(). This is OK as long as it does not
-		 * differ for MIGRATE_CMA pageblocks. Those can be used as
-		 * fallback only via special __rmqueue_cma_fallback() function
-		 */
-		set_pcppage_migratetype(page, start_migratetype);
+		steal_suitable_fallback(zone, page, start_migratetype,
+								can_steal);
 
 		trace_mm_page_alloc_extfrag(page, order, current_order,
 			start_migratetype, fallback_mt);
 
-		return page;
+		return true;
 	}
 
-	return NULL;
+	return false;
 }
 
 /*
@@ -2186,13 +2194,14 @@ static struct page *__rmqueue(struct zone *zone, unsigned int order,
 {
 	struct page *page;
 
+retry:
 	page = __rmqueue_smallest(zone, order, migratetype);
 	if (unlikely(!page)) {
 		if (migratetype == MIGRATE_MOVABLE)
 			page = __rmqueue_cma_fallback(zone, order);
 
-		if (!page)
-			page = __rmqueue_fallback(zone, order, migratetype);
+		if (!page && __rmqueue_fallback(zone, order, migratetype))
+			goto retry;
 	}
 
 	trace_mm_page_alloc_zone_locked(page, order, migratetype);
-- 
2.12.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v3 4/8] mm, page_alloc: count movable pages when stealing from pageblock
  2017-03-07 13:15 ` Vlastimil Babka
@ 2017-03-07 13:15   ` Vlastimil Babka
  -1 siblings, 0 replies; 48+ messages in thread
From: Vlastimil Babka @ 2017-03-07 13:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Johannes Weiner, Mel Gorman, Joonsoo Kim,
	David Rientjes, kernel-team, Vlastimil Babka

When stealing pages from pageblock of a different migratetype, we count how
many free pages were stolen, and change the pageblock's migratetype if more
than half of the pageblock was free. This might be too conservative, as there
might be other pages that are not free, but were allocated with the same
migratetype as our allocation requested.

While we cannot determine the migratetype of allocated pages precisely (at
least without the page_owner functionality enabled), we can count pages that
compaction would try to isolate for migration - those are either on LRU or
__PageMovable(). The rest can be assumed to be MIGRATE_RECLAIMABLE or
MIGRATE_UNMOVABLE, which we cannot easily distinguish. This counting can be
done as part of free page stealing with little additional overhead.

The page stealing code is changed so that it considers free pages plus pages
of the "good" migratetype for the decision whether to change pageblock's
migratetype.

The result should be more accurate migratetype of pageblocks wrt the actual
pages in the pageblocks, when stealing from semi-occupied pageblocks. This
should help the efficiency of page grouping by mobility.

In testing based on 4.9 kernel with stress-highalloc from mmtests configured
for order-4 GFP_KERNEL allocations, this patch has reduced the number of
unmovable allocations falling back to movable pageblocks by 47%. The number
of movable allocations falling back to other pageblocks are increased by 55%,
but these events don't cause permanent fragmentation, so the tradeoff should
be positive. Later patches also offset the movable fallback increase to some
extent.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
---
 include/linux/page-isolation.h |  5 +--
 mm/page_alloc.c                | 71 +++++++++++++++++++++++++++++++++---------
 mm/page_isolation.c            |  5 +--
 3 files changed, 61 insertions(+), 20 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 047d64706f2a..d4cd2014fa6f 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -33,10 +33,7 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 			 bool skip_hwpoisoned_pages);
 void set_pageblock_migratetype(struct page *page, int migratetype);
 int move_freepages_block(struct zone *zone, struct page *page,
-				int migratetype);
-int move_freepages(struct zone *zone,
-			  struct page *start_page, struct page *end_page,
-			  int migratetype);
+				int migratetype, int *num_movable);
 
 /*
  * Changes migrate type in [start_pfn, end_pfn) to be MIGRATE_ISOLATE.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index eda7fedf6378..db96d1ebbed8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1836,9 +1836,9 @@ static inline struct page *__rmqueue_cma_fallback(struct zone *zone,
  * Note that start_page and end_pages are not aligned on a pageblock
  * boundary. If alignment is required, use move_freepages_block()
  */
-int move_freepages(struct zone *zone,
+static int move_freepages(struct zone *zone,
 			  struct page *start_page, struct page *end_page,
-			  int migratetype)
+			  int migratetype, int *num_movable)
 {
 	struct page *page;
 	unsigned int order;
@@ -1855,6 +1855,9 @@ int move_freepages(struct zone *zone,
 	VM_BUG_ON(page_zone(start_page) != page_zone(end_page));
 #endif
 
+	if (num_movable)
+		*num_movable = 0;
+
 	for (page = start_page; page <= end_page;) {
 		if (!pfn_valid_within(page_to_pfn(page))) {
 			page++;
@@ -1865,6 +1868,15 @@ int move_freepages(struct zone *zone,
 		VM_BUG_ON_PAGE(page_to_nid(page) != zone_to_nid(zone), page);
 
 		if (!PageBuddy(page)) {
+			/*
+			 * We assume that pages that could be isolated for
+			 * migration are movable. But we don't actually try
+			 * isolating, as that would be expensive.
+			 */
+			if (num_movable &&
+					(PageLRU(page) || __PageMovable(page)))
+				(*num_movable)++;
+
 			page++;
 			continue;
 		}
@@ -1880,7 +1892,7 @@ int move_freepages(struct zone *zone,
 }
 
 int move_freepages_block(struct zone *zone, struct page *page,
-				int migratetype)
+				int migratetype, int *num_movable)
 {
 	unsigned long start_pfn, end_pfn;
 	struct page *start_page, *end_page;
@@ -1897,7 +1909,8 @@ int move_freepages_block(struct zone *zone, struct page *page,
 	if (!zone_spans_pfn(zone, end_pfn))
 		return 0;
 
-	return move_freepages(zone, start_page, end_page, migratetype);
+	return move_freepages(zone, start_page, end_page, migratetype,
+								num_movable);
 }
 
 static void change_pageblock_range(struct page *pageblock_page,
@@ -1947,22 +1960,26 @@ static bool can_steal_fallback(unsigned int order, int start_mt)
 /*
  * This function implements actual steal behaviour. If order is large enough,
  * we can steal whole pageblock. If not, we first move freepages in this
- * pageblock and check whether half of pages are moved or not. If half of
- * pages are moved, we can change migratetype of pageblock and permanently
- * use it's pages as requested migratetype in the future.
+ * pageblock to our migratetype and determine how many already-allocated pages
+ * are there in the pageblock with a compatible migratetype. If at least half
+ * of pages are free or compatible, we can change migratetype of the pageblock
+ * itself, so pages freed in the future will be put on the correct free list.
  */
 static void steal_suitable_fallback(struct zone *zone, struct page *page,
 					int start_type, bool whole_block)
 {
 	unsigned int current_order = page_order(page);
 	struct free_area *area;
-	int pages;
+	int free_pages, movable_pages, alike_pages;
+	int old_block_type;
+
+	old_block_type = get_pageblock_migratetype(page);
 
 	/*
 	 * This can happen due to races and we want to prevent broken
 	 * highatomic accounting.
 	 */
-	if (is_migrate_highatomic_page(page))
+	if (is_migrate_highatomic(old_block_type))
 		goto single_page;
 
 	/* Take ownership for orders >= pageblock_order */
@@ -1975,10 +1992,35 @@ static void steal_suitable_fallback(struct zone *zone, struct page *page,
 	if (!whole_block)
 		goto single_page;
 
-	pages = move_freepages_block(zone, page, start_type);
+	free_pages = move_freepages_block(zone, page, start_type,
+						&movable_pages);
+	/*
+	 * Determine how many pages are compatible with our allocation.
+	 * For movable allocation, it's the number of movable pages which
+	 * we just obtained. For other types it's a bit more tricky.
+	 */
+	if (start_type == MIGRATE_MOVABLE) {
+		alike_pages = movable_pages;
+	} else {
+		/*
+		 * If we are falling back a RECLAIMABLE or UNMOVABLE allocation
+		 * to MOVABLE pageblock, consider all non-movable pages as
+		 * compatible. If it's UNMOVABLE falling back to RECLAIMABLE or
+		 * vice versa, be conservative since we can't distinguish the
+		 * exact migratetype of non-movable pages.
+		 */
+		if (old_block_type == MIGRATE_MOVABLE)
+			alike_pages = pageblock_nr_pages
+						- (free_pages + movable_pages);
+		else
+			alike_pages = 0;
+	}
 
-	/* Claim the whole block if over half of it is free */
-	if (pages >= (1 << (pageblock_order-1)) ||
+	/*
+	 * If a sufficient number of pages in the block are either free or of
+	 * comparable migratability as our allocation, claim the whole block.
+	 */
+	if (free_pages + alike_pages >= (1 << (pageblock_order-1)) ||
 			page_group_by_mobility_disabled)
 		set_pageblock_migratetype(page, start_type);
 
@@ -2056,7 +2098,7 @@ static void reserve_highatomic_pageblock(struct page *page, struct zone *zone,
 	    && !is_migrate_cma(mt)) {
 		zone->nr_reserved_highatomic += pageblock_nr_pages;
 		set_pageblock_migratetype(page, MIGRATE_HIGHATOMIC);
-		move_freepages_block(zone, page, MIGRATE_HIGHATOMIC);
+		move_freepages_block(zone, page, MIGRATE_HIGHATOMIC, NULL);
 	}
 
 out_unlock:
@@ -2133,7 +2175,8 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac,
 			 * may increase.
 			 */
 			set_pageblock_migratetype(page, ac->migratetype);
-			ret = move_freepages_block(zone, page, ac->migratetype);
+			ret = move_freepages_block(zone, page, ac->migratetype,
+									NULL);
 			if (ret) {
 				spin_unlock_irqrestore(&zone->lock, flags);
 				return ret;
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 7927bbb54a4e..5092e4ef00c8 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -66,7 +66,8 @@ static int set_migratetype_isolate(struct page *page,
 
 		set_pageblock_migratetype(page, MIGRATE_ISOLATE);
 		zone->nr_isolate_pageblock++;
-		nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE);
+		nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE,
+									NULL);
 
 		__mod_zone_freepage_state(zone, -nr_pages, migratetype);
 	}
@@ -120,7 +121,7 @@ static void unset_migratetype_isolate(struct page *page, unsigned migratetype)
 	 * pageblock scanning for freepage moving.
 	 */
 	if (!isolated_page) {
-		nr_pages = move_freepages_block(zone, page, migratetype);
+		nr_pages = move_freepages_block(zone, page, migratetype, NULL);
 		__mod_zone_freepage_state(zone, nr_pages, migratetype);
 	}
 	set_pageblock_migratetype(page, migratetype);
-- 
2.12.0

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v3 4/8] mm, page_alloc: count movable pages when stealing from pageblock
@ 2017-03-07 13:15   ` Vlastimil Babka
  0 siblings, 0 replies; 48+ messages in thread
From: Vlastimil Babka @ 2017-03-07 13:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Johannes Weiner, Mel Gorman, Joonsoo Kim,
	David Rientjes, kernel-team, Vlastimil Babka

When stealing pages from pageblock of a different migratetype, we count how
many free pages were stolen, and change the pageblock's migratetype if more
than half of the pageblock was free. This might be too conservative, as there
might be other pages that are not free, but were allocated with the same
migratetype as our allocation requested.

While we cannot determine the migratetype of allocated pages precisely (at
least without the page_owner functionality enabled), we can count pages that
compaction would try to isolate for migration - those are either on LRU or
__PageMovable(). The rest can be assumed to be MIGRATE_RECLAIMABLE or
MIGRATE_UNMOVABLE, which we cannot easily distinguish. This counting can be
done as part of free page stealing with little additional overhead.

The page stealing code is changed so that it considers free pages plus pages
of the "good" migratetype for the decision whether to change pageblock's
migratetype.

The result should be more accurate migratetype of pageblocks wrt the actual
pages in the pageblocks, when stealing from semi-occupied pageblocks. This
should help the efficiency of page grouping by mobility.

In testing based on 4.9 kernel with stress-highalloc from mmtests configured
for order-4 GFP_KERNEL allocations, this patch has reduced the number of
unmovable allocations falling back to movable pageblocks by 47%. The number
of movable allocations falling back to other pageblocks are increased by 55%,
but these events don't cause permanent fragmentation, so the tradeoff should
be positive. Later patches also offset the movable fallback increase to some
extent.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
---
 include/linux/page-isolation.h |  5 +--
 mm/page_alloc.c                | 71 +++++++++++++++++++++++++++++++++---------
 mm/page_isolation.c            |  5 +--
 3 files changed, 61 insertions(+), 20 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 047d64706f2a..d4cd2014fa6f 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -33,10 +33,7 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 			 bool skip_hwpoisoned_pages);
 void set_pageblock_migratetype(struct page *page, int migratetype);
 int move_freepages_block(struct zone *zone, struct page *page,
-				int migratetype);
-int move_freepages(struct zone *zone,
-			  struct page *start_page, struct page *end_page,
-			  int migratetype);
+				int migratetype, int *num_movable);
 
 /*
  * Changes migrate type in [start_pfn, end_pfn) to be MIGRATE_ISOLATE.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index eda7fedf6378..db96d1ebbed8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1836,9 +1836,9 @@ static inline struct page *__rmqueue_cma_fallback(struct zone *zone,
  * Note that start_page and end_pages are not aligned on a pageblock
  * boundary. If alignment is required, use move_freepages_block()
  */
-int move_freepages(struct zone *zone,
+static int move_freepages(struct zone *zone,
 			  struct page *start_page, struct page *end_page,
-			  int migratetype)
+			  int migratetype, int *num_movable)
 {
 	struct page *page;
 	unsigned int order;
@@ -1855,6 +1855,9 @@ int move_freepages(struct zone *zone,
 	VM_BUG_ON(page_zone(start_page) != page_zone(end_page));
 #endif
 
+	if (num_movable)
+		*num_movable = 0;
+
 	for (page = start_page; page <= end_page;) {
 		if (!pfn_valid_within(page_to_pfn(page))) {
 			page++;
@@ -1865,6 +1868,15 @@ int move_freepages(struct zone *zone,
 		VM_BUG_ON_PAGE(page_to_nid(page) != zone_to_nid(zone), page);
 
 		if (!PageBuddy(page)) {
+			/*
+			 * We assume that pages that could be isolated for
+			 * migration are movable. But we don't actually try
+			 * isolating, as that would be expensive.
+			 */
+			if (num_movable &&
+					(PageLRU(page) || __PageMovable(page)))
+				(*num_movable)++;
+
 			page++;
 			continue;
 		}
@@ -1880,7 +1892,7 @@ int move_freepages(struct zone *zone,
 }
 
 int move_freepages_block(struct zone *zone, struct page *page,
-				int migratetype)
+				int migratetype, int *num_movable)
 {
 	unsigned long start_pfn, end_pfn;
 	struct page *start_page, *end_page;
@@ -1897,7 +1909,8 @@ int move_freepages_block(struct zone *zone, struct page *page,
 	if (!zone_spans_pfn(zone, end_pfn))
 		return 0;
 
-	return move_freepages(zone, start_page, end_page, migratetype);
+	return move_freepages(zone, start_page, end_page, migratetype,
+								num_movable);
 }
 
 static void change_pageblock_range(struct page *pageblock_page,
@@ -1947,22 +1960,26 @@ static bool can_steal_fallback(unsigned int order, int start_mt)
 /*
  * This function implements actual steal behaviour. If order is large enough,
  * we can steal whole pageblock. If not, we first move freepages in this
- * pageblock and check whether half of pages are moved or not. If half of
- * pages are moved, we can change migratetype of pageblock and permanently
- * use it's pages as requested migratetype in the future.
+ * pageblock to our migratetype and determine how many already-allocated pages
+ * are there in the pageblock with a compatible migratetype. If at least half
+ * of pages are free or compatible, we can change migratetype of the pageblock
+ * itself, so pages freed in the future will be put on the correct free list.
  */
 static void steal_suitable_fallback(struct zone *zone, struct page *page,
 					int start_type, bool whole_block)
 {
 	unsigned int current_order = page_order(page);
 	struct free_area *area;
-	int pages;
+	int free_pages, movable_pages, alike_pages;
+	int old_block_type;
+
+	old_block_type = get_pageblock_migratetype(page);
 
 	/*
 	 * This can happen due to races and we want to prevent broken
 	 * highatomic accounting.
 	 */
-	if (is_migrate_highatomic_page(page))
+	if (is_migrate_highatomic(old_block_type))
 		goto single_page;
 
 	/* Take ownership for orders >= pageblock_order */
@@ -1975,10 +1992,35 @@ static void steal_suitable_fallback(struct zone *zone, struct page *page,
 	if (!whole_block)
 		goto single_page;
 
-	pages = move_freepages_block(zone, page, start_type);
+	free_pages = move_freepages_block(zone, page, start_type,
+						&movable_pages);
+	/*
+	 * Determine how many pages are compatible with our allocation.
+	 * For movable allocation, it's the number of movable pages which
+	 * we just obtained. For other types it's a bit more tricky.
+	 */
+	if (start_type == MIGRATE_MOVABLE) {
+		alike_pages = movable_pages;
+	} else {
+		/*
+		 * If we are falling back a RECLAIMABLE or UNMOVABLE allocation
+		 * to MOVABLE pageblock, consider all non-movable pages as
+		 * compatible. If it's UNMOVABLE falling back to RECLAIMABLE or
+		 * vice versa, be conservative since we can't distinguish the
+		 * exact migratetype of non-movable pages.
+		 */
+		if (old_block_type == MIGRATE_MOVABLE)
+			alike_pages = pageblock_nr_pages
+						- (free_pages + movable_pages);
+		else
+			alike_pages = 0;
+	}
 
-	/* Claim the whole block if over half of it is free */
-	if (pages >= (1 << (pageblock_order-1)) ||
+	/*
+	 * If a sufficient number of pages in the block are either free or of
+	 * comparable migratability as our allocation, claim the whole block.
+	 */
+	if (free_pages + alike_pages >= (1 << (pageblock_order-1)) ||
 			page_group_by_mobility_disabled)
 		set_pageblock_migratetype(page, start_type);
 
@@ -2056,7 +2098,7 @@ static void reserve_highatomic_pageblock(struct page *page, struct zone *zone,
 	    && !is_migrate_cma(mt)) {
 		zone->nr_reserved_highatomic += pageblock_nr_pages;
 		set_pageblock_migratetype(page, MIGRATE_HIGHATOMIC);
-		move_freepages_block(zone, page, MIGRATE_HIGHATOMIC);
+		move_freepages_block(zone, page, MIGRATE_HIGHATOMIC, NULL);
 	}
 
 out_unlock:
@@ -2133,7 +2175,8 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac,
 			 * may increase.
 			 */
 			set_pageblock_migratetype(page, ac->migratetype);
-			ret = move_freepages_block(zone, page, ac->migratetype);
+			ret = move_freepages_block(zone, page, ac->migratetype,
+									NULL);
 			if (ret) {
 				spin_unlock_irqrestore(&zone->lock, flags);
 				return ret;
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 7927bbb54a4e..5092e4ef00c8 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -66,7 +66,8 @@ static int set_migratetype_isolate(struct page *page,
 
 		set_pageblock_migratetype(page, MIGRATE_ISOLATE);
 		zone->nr_isolate_pageblock++;
-		nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE);
+		nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE,
+									NULL);
 
 		__mod_zone_freepage_state(zone, -nr_pages, migratetype);
 	}
@@ -120,7 +121,7 @@ static void unset_migratetype_isolate(struct page *page, unsigned migratetype)
 	 * pageblock scanning for freepage moving.
 	 */
 	if (!isolated_page) {
-		nr_pages = move_freepages_block(zone, page, migratetype);
+		nr_pages = move_freepages_block(zone, page, migratetype, NULL);
 		__mod_zone_freepage_state(zone, nr_pages, migratetype);
 	}
 	set_pageblock_migratetype(page, migratetype);
-- 
2.12.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v3 5/8] mm, compaction: change migrate_async_suitable() to suitable_migration_source()
  2017-03-07 13:15 ` Vlastimil Babka
@ 2017-03-07 13:15   ` Vlastimil Babka
  -1 siblings, 0 replies; 48+ messages in thread
From: Vlastimil Babka @ 2017-03-07 13:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Johannes Weiner, Mel Gorman, Joonsoo Kim,
	David Rientjes, kernel-team, Vlastimil Babka

Preparation for making the decisions more complex and depending on
compact_control flags. No functional change.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/mmzone.h |  5 +++++
 mm/compaction.c        | 19 +++++++++++--------
 2 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 446cf68c1c09..618499159a7c 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -74,6 +74,11 @@ extern char * const migratetype_names[MIGRATE_TYPES];
 #  define is_migrate_cma_page(_page) false
 #endif
 
+static inline bool is_migrate_movable(int mt)
+{
+	return is_migrate_cma(mt) || mt == MIGRATE_MOVABLE;
+}
+
 #define for_each_migratetype_order(order, type) \
 	for (order = 0; order < MAX_ORDER; order++) \
 		for (type = 0; type < MIGRATE_TYPES; type++)
diff --git a/mm/compaction.c b/mm/compaction.c
index 9222ff362f33..a7c2f0da7228 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -89,11 +89,6 @@ static void map_pages(struct list_head *list)
 	list_splice(&tmp_list, list);
 }
 
-static inline bool migrate_async_suitable(int migratetype)
-{
-	return is_migrate_cma(migratetype) || migratetype == MIGRATE_MOVABLE;
-}
-
 #ifdef CONFIG_COMPACTION
 
 int PageMovable(struct page *page)
@@ -988,6 +983,15 @@ isolate_migratepages_range(struct compact_control *cc, unsigned long start_pfn,
 #endif /* CONFIG_COMPACTION || CONFIG_CMA */
 #ifdef CONFIG_COMPACTION
 
+static bool suitable_migration_source(struct compact_control *cc,
+							struct page *page)
+{
+	if (cc->mode != MIGRATE_ASYNC)
+		return true;
+
+	return is_migrate_movable(get_pageblock_migratetype(page));
+}
+
 /* Returns true if the page is within a block suitable for migration to */
 static bool suitable_migration_target(struct compact_control *cc,
 							struct page *page)
@@ -1007,7 +1011,7 @@ static bool suitable_migration_target(struct compact_control *cc,
 	}
 
 	/* If the block is MIGRATE_MOVABLE or MIGRATE_CMA, allow migration */
-	if (migrate_async_suitable(get_pageblock_migratetype(page)))
+	if (is_migrate_movable(get_pageblock_migratetype(page)))
 		return true;
 
 	/* Otherwise skip the block */
@@ -1242,8 +1246,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 		 * Async compaction is optimistic to see if the minimum amount
 		 * of work satisfies the allocation.
 		 */
-		if (cc->mode == MIGRATE_ASYNC &&
-		    !migrate_async_suitable(get_pageblock_migratetype(page)))
+		if (!suitable_migration_source(cc, page))
 			continue;
 
 		/* Perform the isolation */
-- 
2.12.0

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v3 5/8] mm, compaction: change migrate_async_suitable() to suitable_migration_source()
@ 2017-03-07 13:15   ` Vlastimil Babka
  0 siblings, 0 replies; 48+ messages in thread
From: Vlastimil Babka @ 2017-03-07 13:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Johannes Weiner, Mel Gorman, Joonsoo Kim,
	David Rientjes, kernel-team, Vlastimil Babka

Preparation for making the decisions more complex and depending on
compact_control flags. No functional change.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/mmzone.h |  5 +++++
 mm/compaction.c        | 19 +++++++++++--------
 2 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 446cf68c1c09..618499159a7c 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -74,6 +74,11 @@ extern char * const migratetype_names[MIGRATE_TYPES];
 #  define is_migrate_cma_page(_page) false
 #endif
 
+static inline bool is_migrate_movable(int mt)
+{
+	return is_migrate_cma(mt) || mt == MIGRATE_MOVABLE;
+}
+
 #define for_each_migratetype_order(order, type) \
 	for (order = 0; order < MAX_ORDER; order++) \
 		for (type = 0; type < MIGRATE_TYPES; type++)
diff --git a/mm/compaction.c b/mm/compaction.c
index 9222ff362f33..a7c2f0da7228 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -89,11 +89,6 @@ static void map_pages(struct list_head *list)
 	list_splice(&tmp_list, list);
 }
 
-static inline bool migrate_async_suitable(int migratetype)
-{
-	return is_migrate_cma(migratetype) || migratetype == MIGRATE_MOVABLE;
-}
-
 #ifdef CONFIG_COMPACTION
 
 int PageMovable(struct page *page)
@@ -988,6 +983,15 @@ isolate_migratepages_range(struct compact_control *cc, unsigned long start_pfn,
 #endif /* CONFIG_COMPACTION || CONFIG_CMA */
 #ifdef CONFIG_COMPACTION
 
+static bool suitable_migration_source(struct compact_control *cc,
+							struct page *page)
+{
+	if (cc->mode != MIGRATE_ASYNC)
+		return true;
+
+	return is_migrate_movable(get_pageblock_migratetype(page));
+}
+
 /* Returns true if the page is within a block suitable for migration to */
 static bool suitable_migration_target(struct compact_control *cc,
 							struct page *page)
@@ -1007,7 +1011,7 @@ static bool suitable_migration_target(struct compact_control *cc,
 	}
 
 	/* If the block is MIGRATE_MOVABLE or MIGRATE_CMA, allow migration */
-	if (migrate_async_suitable(get_pageblock_migratetype(page)))
+	if (is_migrate_movable(get_pageblock_migratetype(page)))
 		return true;
 
 	/* Otherwise skip the block */
@@ -1242,8 +1246,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 		 * Async compaction is optimistic to see if the minimum amount
 		 * of work satisfies the allocation.
 		 */
-		if (cc->mode == MIGRATE_ASYNC &&
-		    !migrate_async_suitable(get_pageblock_migratetype(page)))
+		if (!suitable_migration_source(cc, page))
 			continue;
 
 		/* Perform the isolation */
-- 
2.12.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v3 6/8] mm, compaction: add migratetype to compact_control
  2017-03-07 13:15 ` Vlastimil Babka
@ 2017-03-07 13:15   ` Vlastimil Babka
  -1 siblings, 0 replies; 48+ messages in thread
From: Vlastimil Babka @ 2017-03-07 13:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Johannes Weiner, Mel Gorman, Joonsoo Kim,
	David Rientjes, kernel-team, Vlastimil Babka

Preparation patch. We are going to need migratetype at lower layers than
compact_zone() and compact_finished().

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/compaction.c | 15 +++++++--------
 mm/internal.h   |  1 +
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index a7c2f0da7228..c48da73e30a5 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1279,10 +1279,11 @@ static inline bool is_via_compact_memory(int order)
 	return order == -1;
 }
 
-static enum compact_result __compact_finished(struct zone *zone, struct compact_control *cc,
-			    const int migratetype)
+static enum compact_result __compact_finished(struct zone *zone,
+						struct compact_control *cc)
 {
 	unsigned int order;
+	const int migratetype = cc->migratetype;
 
 	if (cc->contended || fatal_signal_pending(current))
 		return COMPACT_CONTENDED;
@@ -1338,12 +1339,11 @@ static enum compact_result __compact_finished(struct zone *zone, struct compact_
 }
 
 static enum compact_result compact_finished(struct zone *zone,
-			struct compact_control *cc,
-			const int migratetype)
+			struct compact_control *cc)
 {
 	int ret;
 
-	ret = __compact_finished(zone, cc, migratetype);
+	ret = __compact_finished(zone, cc);
 	trace_mm_compaction_finished(zone, cc->order, ret);
 	if (ret == COMPACT_NO_SUITABLE_PAGE)
 		ret = COMPACT_CONTINUE;
@@ -1476,9 +1476,9 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 	enum compact_result ret;
 	unsigned long start_pfn = zone->zone_start_pfn;
 	unsigned long end_pfn = zone_end_pfn(zone);
-	const int migratetype = gfpflags_to_migratetype(cc->gfp_mask);
 	const bool sync = cc->mode != MIGRATE_ASYNC;
 
+	cc->migratetype = gfpflags_to_migratetype(cc->gfp_mask);
 	ret = compaction_suitable(zone, cc->order, cc->alloc_flags,
 							cc->classzone_idx);
 	/* Compaction is likely to fail */
@@ -1528,8 +1528,7 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 
 	migrate_prep_local();
 
-	while ((ret = compact_finished(zone, cc, migratetype)) ==
-						COMPACT_CONTINUE) {
+	while ((ret = compact_finished(zone, cc)) == COMPACT_CONTINUE) {
 		int err;
 
 		switch (isolate_migratepages(zone, cc)) {
diff --git a/mm/internal.h b/mm/internal.h
index 05c48a95a20a..3985656ac261 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -193,6 +193,7 @@ struct compact_control {
 	unsigned long last_migrated_pfn;/* Not yet flushed page being freed */
 	const gfp_t gfp_mask;		/* gfp mask of a direct compactor */
 	int order;			/* order a direct compactor needs */
+	int migratetype;		/* migratetype of direct compactor */
 	const unsigned int alloc_flags;	/* alloc flags of a direct compactor */
 	const int classzone_idx;	/* zone index of a direct compactor */
 	enum migrate_mode mode;		/* Async or sync migration mode */
-- 
2.12.0

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v3 6/8] mm, compaction: add migratetype to compact_control
@ 2017-03-07 13:15   ` Vlastimil Babka
  0 siblings, 0 replies; 48+ messages in thread
From: Vlastimil Babka @ 2017-03-07 13:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Johannes Weiner, Mel Gorman, Joonsoo Kim,
	David Rientjes, kernel-team, Vlastimil Babka

Preparation patch. We are going to need migratetype at lower layers than
compact_zone() and compact_finished().

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/compaction.c | 15 +++++++--------
 mm/internal.h   |  1 +
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index a7c2f0da7228..c48da73e30a5 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1279,10 +1279,11 @@ static inline bool is_via_compact_memory(int order)
 	return order == -1;
 }
 
-static enum compact_result __compact_finished(struct zone *zone, struct compact_control *cc,
-			    const int migratetype)
+static enum compact_result __compact_finished(struct zone *zone,
+						struct compact_control *cc)
 {
 	unsigned int order;
+	const int migratetype = cc->migratetype;
 
 	if (cc->contended || fatal_signal_pending(current))
 		return COMPACT_CONTENDED;
@@ -1338,12 +1339,11 @@ static enum compact_result __compact_finished(struct zone *zone, struct compact_
 }
 
 static enum compact_result compact_finished(struct zone *zone,
-			struct compact_control *cc,
-			const int migratetype)
+			struct compact_control *cc)
 {
 	int ret;
 
-	ret = __compact_finished(zone, cc, migratetype);
+	ret = __compact_finished(zone, cc);
 	trace_mm_compaction_finished(zone, cc->order, ret);
 	if (ret == COMPACT_NO_SUITABLE_PAGE)
 		ret = COMPACT_CONTINUE;
@@ -1476,9 +1476,9 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 	enum compact_result ret;
 	unsigned long start_pfn = zone->zone_start_pfn;
 	unsigned long end_pfn = zone_end_pfn(zone);
-	const int migratetype = gfpflags_to_migratetype(cc->gfp_mask);
 	const bool sync = cc->mode != MIGRATE_ASYNC;
 
+	cc->migratetype = gfpflags_to_migratetype(cc->gfp_mask);
 	ret = compaction_suitable(zone, cc->order, cc->alloc_flags,
 							cc->classzone_idx);
 	/* Compaction is likely to fail */
@@ -1528,8 +1528,7 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 
 	migrate_prep_local();
 
-	while ((ret = compact_finished(zone, cc, migratetype)) ==
-						COMPACT_CONTINUE) {
+	while ((ret = compact_finished(zone, cc)) == COMPACT_CONTINUE) {
 		int err;
 
 		switch (isolate_migratepages(zone, cc)) {
diff --git a/mm/internal.h b/mm/internal.h
index 05c48a95a20a..3985656ac261 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -193,6 +193,7 @@ struct compact_control {
 	unsigned long last_migrated_pfn;/* Not yet flushed page being freed */
 	const gfp_t gfp_mask;		/* gfp mask of a direct compactor */
 	int order;			/* order a direct compactor needs */
+	int migratetype;		/* migratetype of direct compactor */
 	const unsigned int alloc_flags;	/* alloc flags of a direct compactor */
 	const int classzone_idx;	/* zone index of a direct compactor */
 	enum migrate_mode mode;		/* Async or sync migration mode */
-- 
2.12.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v3 7/8] mm, compaction: restrict async compaction to pageblocks of same migratetype
  2017-03-07 13:15 ` Vlastimil Babka
@ 2017-03-07 13:15   ` Vlastimil Babka
  -1 siblings, 0 replies; 48+ messages in thread
From: Vlastimil Babka @ 2017-03-07 13:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Johannes Weiner, Mel Gorman, Joonsoo Kim,
	David Rientjes, kernel-team, Vlastimil Babka

The migrate scanner in async compaction is currently limited to MIGRATE_MOVABLE
pageblocks. This is a heuristic intended to reduce latency, based on the
assumption that non-MOVABLE pageblocks are unlikely to contain movable pages.

However, with the exception of THP's, most high-order allocations are not
movable. Should the async compaction succeed, this increases the chance that
the non-MOVABLE allocations will fallback to a MOVABLE pageblock, making the
long-term fragmentation worse.

This patch attempts to help the situation by changing async direct compaction
so that the migrate scanner only scans the pageblocks of the requested
migratetype. If it's a non-MOVABLE type and there are such pageblocks that do
contain movable pages, chances are that the allocation can succeed within one
of such pageblocks, removing the need for a fallback. If that fails, the
subsequent sync attempt will ignore this restriction.

In testing based on 4.9 kernel with stress-highalloc from mmtests configured
for order-4 GFP_KERNEL allocations, this patch has reduced the number of
unmovable allocations falling back to movable pageblocks by 30%. The number
of movable allocations falling back is reduced by 12%.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/compaction.c | 11 +++++++++--
 mm/page_alloc.c | 20 +++++++++++++-------
 2 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index c48da73e30a5..2c288e75840d 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -986,10 +986,17 @@ isolate_migratepages_range(struct compact_control *cc, unsigned long start_pfn,
 static bool suitable_migration_source(struct compact_control *cc,
 							struct page *page)
 {
-	if (cc->mode != MIGRATE_ASYNC)
+	int block_mt;
+
+	if ((cc->mode != MIGRATE_ASYNC) || !cc->direct_compaction)
 		return true;
 
-	return is_migrate_movable(get_pageblock_migratetype(page));
+	block_mt = get_pageblock_migratetype(page);
+
+	if (cc->migratetype == MIGRATE_MOVABLE)
+		return is_migrate_movable(block_mt);
+	else
+		return block_mt == cc->migratetype;
 }
 
 /* Returns true if the page is within a block suitable for migration to */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index db96d1ebbed8..01ce6d41cb20 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3665,6 +3665,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 						struct alloc_context *ac)
 {
 	bool can_direct_reclaim = gfp_mask & __GFP_DIRECT_RECLAIM;
+	const bool costly_order = order > PAGE_ALLOC_COSTLY_ORDER;
 	struct page *page = NULL;
 	unsigned int alloc_flags;
 	unsigned long did_some_progress;
@@ -3732,12 +3733,17 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 
 	/*
 	 * For costly allocations, try direct compaction first, as it's likely
-	 * that we have enough base pages and don't need to reclaim. Don't try
-	 * that for allocations that are allowed to ignore watermarks, as the
-	 * ALLOC_NO_WATERMARKS attempt didn't yet happen.
+	 * that we have enough base pages and don't need to reclaim. For non-
+	 * movable high-order allocations, do that as well, as compaction will
+	 * try prevent permanent fragmentation by migrating from blocks of the
+	 * same migratetype.
+	 * Don't try this for allocations that are allowed to ignore
+	 * watermarks, as the ALLOC_NO_WATERMARKS attempt didn't yet happen.
 	 */
-	if (can_direct_reclaim && order > PAGE_ALLOC_COSTLY_ORDER &&
-		!gfp_pfmemalloc_allowed(gfp_mask)) {
+	if (can_direct_reclaim &&
+			(costly_order ||
+			   (order > 0 && ac->migratetype != MIGRATE_MOVABLE))
+			&& !gfp_pfmemalloc_allowed(gfp_mask)) {
 		page = __alloc_pages_direct_compact(gfp_mask, order,
 						alloc_flags, ac,
 						INIT_COMPACT_PRIORITY,
@@ -3749,7 +3755,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		 * Checks for costly allocations with __GFP_NORETRY, which
 		 * includes THP page fault allocations
 		 */
-		if (gfp_mask & __GFP_NORETRY) {
+		if (costly_order && (gfp_mask & __GFP_NORETRY)) {
 			/*
 			 * If compaction is deferred for high-order allocations,
 			 * it is because sync compaction recently failed. If
@@ -3830,7 +3836,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	 * Do not retry costly high order allocations unless they are
 	 * __GFP_REPEAT
 	 */
-	if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & __GFP_REPEAT))
+	if (costly_order && !(gfp_mask & __GFP_REPEAT))
 		goto nopage;
 
 	if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
-- 
2.12.0

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v3 7/8] mm, compaction: restrict async compaction to pageblocks of same migratetype
@ 2017-03-07 13:15   ` Vlastimil Babka
  0 siblings, 0 replies; 48+ messages in thread
From: Vlastimil Babka @ 2017-03-07 13:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Johannes Weiner, Mel Gorman, Joonsoo Kim,
	David Rientjes, kernel-team, Vlastimil Babka

The migrate scanner in async compaction is currently limited to MIGRATE_MOVABLE
pageblocks. This is a heuristic intended to reduce latency, based on the
assumption that non-MOVABLE pageblocks are unlikely to contain movable pages.

However, with the exception of THP's, most high-order allocations are not
movable. Should the async compaction succeed, this increases the chance that
the non-MOVABLE allocations will fallback to a MOVABLE pageblock, making the
long-term fragmentation worse.

This patch attempts to help the situation by changing async direct compaction
so that the migrate scanner only scans the pageblocks of the requested
migratetype. If it's a non-MOVABLE type and there are such pageblocks that do
contain movable pages, chances are that the allocation can succeed within one
of such pageblocks, removing the need for a fallback. If that fails, the
subsequent sync attempt will ignore this restriction.

In testing based on 4.9 kernel with stress-highalloc from mmtests configured
for order-4 GFP_KERNEL allocations, this patch has reduced the number of
unmovable allocations falling back to movable pageblocks by 30%. The number
of movable allocations falling back is reduced by 12%.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/compaction.c | 11 +++++++++--
 mm/page_alloc.c | 20 +++++++++++++-------
 2 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index c48da73e30a5..2c288e75840d 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -986,10 +986,17 @@ isolate_migratepages_range(struct compact_control *cc, unsigned long start_pfn,
 static bool suitable_migration_source(struct compact_control *cc,
 							struct page *page)
 {
-	if (cc->mode != MIGRATE_ASYNC)
+	int block_mt;
+
+	if ((cc->mode != MIGRATE_ASYNC) || !cc->direct_compaction)
 		return true;
 
-	return is_migrate_movable(get_pageblock_migratetype(page));
+	block_mt = get_pageblock_migratetype(page);
+
+	if (cc->migratetype == MIGRATE_MOVABLE)
+		return is_migrate_movable(block_mt);
+	else
+		return block_mt == cc->migratetype;
 }
 
 /* Returns true if the page is within a block suitable for migration to */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index db96d1ebbed8..01ce6d41cb20 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3665,6 +3665,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 						struct alloc_context *ac)
 {
 	bool can_direct_reclaim = gfp_mask & __GFP_DIRECT_RECLAIM;
+	const bool costly_order = order > PAGE_ALLOC_COSTLY_ORDER;
 	struct page *page = NULL;
 	unsigned int alloc_flags;
 	unsigned long did_some_progress;
@@ -3732,12 +3733,17 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 
 	/*
 	 * For costly allocations, try direct compaction first, as it's likely
-	 * that we have enough base pages and don't need to reclaim. Don't try
-	 * that for allocations that are allowed to ignore watermarks, as the
-	 * ALLOC_NO_WATERMARKS attempt didn't yet happen.
+	 * that we have enough base pages and don't need to reclaim. For non-
+	 * movable high-order allocations, do that as well, as compaction will
+	 * try prevent permanent fragmentation by migrating from blocks of the
+	 * same migratetype.
+	 * Don't try this for allocations that are allowed to ignore
+	 * watermarks, as the ALLOC_NO_WATERMARKS attempt didn't yet happen.
 	 */
-	if (can_direct_reclaim && order > PAGE_ALLOC_COSTLY_ORDER &&
-		!gfp_pfmemalloc_allowed(gfp_mask)) {
+	if (can_direct_reclaim &&
+			(costly_order ||
+			   (order > 0 && ac->migratetype != MIGRATE_MOVABLE))
+			&& !gfp_pfmemalloc_allowed(gfp_mask)) {
 		page = __alloc_pages_direct_compact(gfp_mask, order,
 						alloc_flags, ac,
 						INIT_COMPACT_PRIORITY,
@@ -3749,7 +3755,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		 * Checks for costly allocations with __GFP_NORETRY, which
 		 * includes THP page fault allocations
 		 */
-		if (gfp_mask & __GFP_NORETRY) {
+		if (costly_order && (gfp_mask & __GFP_NORETRY)) {
 			/*
 			 * If compaction is deferred for high-order allocations,
 			 * it is because sync compaction recently failed. If
@@ -3830,7 +3836,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	 * Do not retry costly high order allocations unless they are
 	 * __GFP_REPEAT
 	 */
-	if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & __GFP_REPEAT))
+	if (costly_order && !(gfp_mask & __GFP_REPEAT))
 		goto nopage;
 
 	if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
-- 
2.12.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v3 8/8] mm, compaction: finish whole pageblock to reduce fragmentation
  2017-03-07 13:15 ` Vlastimil Babka
@ 2017-03-07 13:15   ` Vlastimil Babka
  -1 siblings, 0 replies; 48+ messages in thread
From: Vlastimil Babka @ 2017-03-07 13:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Johannes Weiner, Mel Gorman, Joonsoo Kim,
	David Rientjes, kernel-team, Vlastimil Babka

The main goal of direct compaction is to form a high-order page for allocation,
but it should also help against long-term fragmentation when possible. Most
lower-than-pageblock-order compactions are for non-movable allocations, which
means that if we compact in a movable pageblock and terminate as soon as we
create the high-order page, it's unlikely that the fallback heuristics will
claim the whole block. Instead there might be a single unmovable page in a
pageblock full of movable pages, and the next unmovable allocation might pick
another pageblock and increase long-term fragmentation.

To help against such scenarios, this patch changes the termination criteria for
compaction so that the current pageblock is finished even though the high-order
page already exists. Note that it might be possible that the high-order page
formed elsewhere in the zone due to parallel activity, but this patch doesn't
try to detect that.

This is only done with sync compaction, because async compaction is limited to
pageblock of the same migratetype, where it cannot result in a migratetype
fallback. (Async compaction also eagerly skips order-aligned blocks where
isolation fails, which is against the goal of migrating away as much of the
pageblock as possible.)

As a result of this patch, long-term memory fragmentation should be reduced.

In testing based on 4.9 kernel with stress-highalloc from mmtests configured
for order-4 GFP_KERNEL allocations, this patch has reduced the number of
unmovable allocations falling back to movable pageblocks by 20%. The number

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/compaction.c | 36 ++++++++++++++++++++++++++++++++++--
 mm/internal.h   |  1 +
 2 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 2c288e75840d..bc7903130501 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1318,6 +1318,17 @@ static enum compact_result __compact_finished(struct zone *zone,
 	if (is_via_compact_memory(cc->order))
 		return COMPACT_CONTINUE;
 
+	if (cc->finishing_block) {
+		/*
+		 * We have finished the pageblock, but better check again that
+		 * we really succeeded.
+		 */
+		if (IS_ALIGNED(cc->migrate_pfn, pageblock_nr_pages))
+			cc->finishing_block = false;
+		else
+			return COMPACT_CONTINUE;
+	}
+
 	/* Direct compactor: Is a suitable page free? */
 	for (order = cc->order; order < MAX_ORDER; order++) {
 		struct free_area *area = &zone->free_area[order];
@@ -1338,8 +1349,29 @@ static enum compact_result __compact_finished(struct zone *zone,
 		 * other migratetype buddy lists.
 		 */
 		if (find_suitable_fallback(area, order, migratetype,
-						true, &can_steal) != -1)
-			return COMPACT_SUCCESS;
+						true, &can_steal) != -1) {
+
+			/* movable pages are OK in any pageblock */
+			if (migratetype == MIGRATE_MOVABLE)
+				return COMPACT_SUCCESS;
+
+			/*
+			 * We are stealing for a non-movable allocation. Make
+			 * sure we finish compacting the current pageblock
+			 * first so it is as free as possible and we won't
+			 * have to steal another one soon. This only applies
+			 * to sync compaction, as async compaction operates
+			 * on pageblocks of the same migratetype.
+			 */
+			if (cc->mode == MIGRATE_ASYNC ||
+					IS_ALIGNED(cc->migrate_pfn,
+							pageblock_nr_pages)) {
+				return COMPACT_SUCCESS;
+			}
+
+			cc->finishing_block = true;
+			return COMPACT_CONTINUE;
+		}
 	}
 
 	return COMPACT_NO_SUITABLE_PAGE;
diff --git a/mm/internal.h b/mm/internal.h
index 3985656ac261..e417084c2fb1 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -202,6 +202,7 @@ struct compact_control {
 	bool direct_compaction;		/* False from kcompactd or /proc/... */
 	bool whole_zone;		/* Whole zone should/has been scanned */
 	bool contended;			/* Signal lock or sched contention */
+	bool finishing_block;		/* Finishing current pageblock */
 };
 
 unsigned long
-- 
2.12.0

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v3 8/8] mm, compaction: finish whole pageblock to reduce fragmentation
@ 2017-03-07 13:15   ` Vlastimil Babka
  0 siblings, 0 replies; 48+ messages in thread
From: Vlastimil Babka @ 2017-03-07 13:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Johannes Weiner, Mel Gorman, Joonsoo Kim,
	David Rientjes, kernel-team, Vlastimil Babka

The main goal of direct compaction is to form a high-order page for allocation,
but it should also help against long-term fragmentation when possible. Most
lower-than-pageblock-order compactions are for non-movable allocations, which
means that if we compact in a movable pageblock and terminate as soon as we
create the high-order page, it's unlikely that the fallback heuristics will
claim the whole block. Instead there might be a single unmovable page in a
pageblock full of movable pages, and the next unmovable allocation might pick
another pageblock and increase long-term fragmentation.

To help against such scenarios, this patch changes the termination criteria for
compaction so that the current pageblock is finished even though the high-order
page already exists. Note that it might be possible that the high-order page
formed elsewhere in the zone due to parallel activity, but this patch doesn't
try to detect that.

This is only done with sync compaction, because async compaction is limited to
pageblock of the same migratetype, where it cannot result in a migratetype
fallback. (Async compaction also eagerly skips order-aligned blocks where
isolation fails, which is against the goal of migrating away as much of the
pageblock as possible.)

As a result of this patch, long-term memory fragmentation should be reduced.

In testing based on 4.9 kernel with stress-highalloc from mmtests configured
for order-4 GFP_KERNEL allocations, this patch has reduced the number of
unmovable allocations falling back to movable pageblocks by 20%. The number

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/compaction.c | 36 ++++++++++++++++++++++++++++++++++--
 mm/internal.h   |  1 +
 2 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 2c288e75840d..bc7903130501 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1318,6 +1318,17 @@ static enum compact_result __compact_finished(struct zone *zone,
 	if (is_via_compact_memory(cc->order))
 		return COMPACT_CONTINUE;
 
+	if (cc->finishing_block) {
+		/*
+		 * We have finished the pageblock, but better check again that
+		 * we really succeeded.
+		 */
+		if (IS_ALIGNED(cc->migrate_pfn, pageblock_nr_pages))
+			cc->finishing_block = false;
+		else
+			return COMPACT_CONTINUE;
+	}
+
 	/* Direct compactor: Is a suitable page free? */
 	for (order = cc->order; order < MAX_ORDER; order++) {
 		struct free_area *area = &zone->free_area[order];
@@ -1338,8 +1349,29 @@ static enum compact_result __compact_finished(struct zone *zone,
 		 * other migratetype buddy lists.
 		 */
 		if (find_suitable_fallback(area, order, migratetype,
-						true, &can_steal) != -1)
-			return COMPACT_SUCCESS;
+						true, &can_steal) != -1) {
+
+			/* movable pages are OK in any pageblock */
+			if (migratetype == MIGRATE_MOVABLE)
+				return COMPACT_SUCCESS;
+
+			/*
+			 * We are stealing for a non-movable allocation. Make
+			 * sure we finish compacting the current pageblock
+			 * first so it is as free as possible and we won't
+			 * have to steal another one soon. This only applies
+			 * to sync compaction, as async compaction operates
+			 * on pageblocks of the same migratetype.
+			 */
+			if (cc->mode == MIGRATE_ASYNC ||
+					IS_ALIGNED(cc->migrate_pfn,
+							pageblock_nr_pages)) {
+				return COMPACT_SUCCESS;
+			}
+
+			cc->finishing_block = true;
+			return COMPACT_CONTINUE;
+		}
 	}
 
 	return COMPACT_NO_SUITABLE_PAGE;
diff --git a/mm/internal.h b/mm/internal.h
index 3985656ac261..e417084c2fb1 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -202,6 +202,7 @@ struct compact_control {
 	bool direct_compaction;		/* False from kcompactd or /proc/... */
 	bool whole_zone;		/* Whole zone should/has been scanned */
 	bool contended;			/* Signal lock or sched contention */
+	bool finishing_block;		/* Finishing current pageblock */
 };
 
 unsigned long
-- 
2.12.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 0/8] try to reduce fragmenting fallbacks
  2017-03-07 13:15 ` Vlastimil Babka
@ 2017-03-08 16:46   ` Johannes Weiner
  -1 siblings, 0 replies; 48+ messages in thread
From: Johannes Weiner @ 2017-03-08 16:46 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Joonsoo Kim,
	David Rientjes, kernel-team

On Tue, Mar 07, 2017 at 02:15:37PM +0100, Vlastimil Babka wrote:
> Last year, Johannes Weiner has reported a regression in page mobility
> grouping [1] and while the exact cause was not found, I've come up with some
> ways to improve it by reducing the number of allocations falling back to
> different migratetype and causing permanent fragmentation.

I finally managed to get a handful of our machines on 4.10 with these
patches applied and a 4.10 vanilla control group.

The sampling period is over twelve hours, which is on the short side
for evaluating that load, so take the results with a grain of salt.

The allocstall rate (events per second) is down on average, but there
are occasionally fairly high spikes that exceed the peaks in 4.10:

http://cmpxchg.org/antifrag/allocstallrate.png

Activity from the compaction free scanner is down, while the migration
scanner does more work. I would assume most of this is coming from the
same-migratetype restriction on the source blocks:

http://cmpxchg.org/antifrag/compactfreescannedrate.png
http://cmpxchg.org/antifrag/compactmigratescannedrate.png

Unfortunately, the average compaction stall rate is consistently much
higher with the patches. The 1h rate averages are 2-3x higher:

http://cmpxchg.org/antifrag/compactstallrate.png

An increase in direct compaction is a bit worrisome, but the task
completion rates - the bottom line metric for this workload - are
still too chaotic to say whether the increased allocation latency
affects us meaningfully here. I'll give it a few more days.

Is there any other data you would like me to gather?

Thanks!

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 0/8] try to reduce fragmenting fallbacks
@ 2017-03-08 16:46   ` Johannes Weiner
  0 siblings, 0 replies; 48+ messages in thread
From: Johannes Weiner @ 2017-03-08 16:46 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Joonsoo Kim,
	David Rientjes, kernel-team

On Tue, Mar 07, 2017 at 02:15:37PM +0100, Vlastimil Babka wrote:
> Last year, Johannes Weiner has reported a regression in page mobility
> grouping [1] and while the exact cause was not found, I've come up with some
> ways to improve it by reducing the number of allocations falling back to
> different migratetype and causing permanent fragmentation.

I finally managed to get a handful of our machines on 4.10 with these
patches applied and a 4.10 vanilla control group.

The sampling period is over twelve hours, which is on the short side
for evaluating that load, so take the results with a grain of salt.

The allocstall rate (events per second) is down on average, but there
are occasionally fairly high spikes that exceed the peaks in 4.10:

http://cmpxchg.org/antifrag/allocstallrate.png

Activity from the compaction free scanner is down, while the migration
scanner does more work. I would assume most of this is coming from the
same-migratetype restriction on the source blocks:

http://cmpxchg.org/antifrag/compactfreescannedrate.png
http://cmpxchg.org/antifrag/compactmigratescannedrate.png

Unfortunately, the average compaction stall rate is consistently much
higher with the patches. The 1h rate averages are 2-3x higher:

http://cmpxchg.org/antifrag/compactstallrate.png

An increase in direct compaction is a bit worrisome, but the task
completion rates - the bottom line metric for this workload - are
still too chaotic to say whether the increased allocation latency
affects us meaningfully here. I'll give it a few more days.

Is there any other data you would like me to gather?

Thanks!

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 0/8] try to reduce fragmenting fallbacks
  2017-03-08 16:46   ` Johannes Weiner
@ 2017-03-08 19:17     ` Vlastimil Babka
  -1 siblings, 0 replies; 48+ messages in thread
From: Vlastimil Babka @ 2017-03-08 19:17 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Joonsoo Kim,
	David Rientjes, kernel-team

On 8.3.2017 17:46, Johannes Weiner wrote:
> Is there any other data you would like me to gather?

If you can enable the extfrag tracepoint, it would be nice to have graphs of how
unmovable allocations falling back to movable pageblocks, etc.

Possibly also /proc/pagetypeinfo for numbers of pageblock types.

Thanks!

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 0/8] try to reduce fragmenting fallbacks
@ 2017-03-08 19:17     ` Vlastimil Babka
  0 siblings, 0 replies; 48+ messages in thread
From: Vlastimil Babka @ 2017-03-08 19:17 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Joonsoo Kim,
	David Rientjes, kernel-team

On 8.3.2017 17:46, Johannes Weiner wrote:
> Is there any other data you would like me to gather?

If you can enable the extfrag tracepoint, it would be nice to have graphs of how
unmovable allocations falling back to movable pageblocks, etc.

Possibly also /proc/pagetypeinfo for numbers of pageblock types.

Thanks!

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 2/8] mm, compaction: remove redundant watermark check in compact_finished()
  2017-03-07 13:15   ` Vlastimil Babka
@ 2017-03-16  1:30     ` Joonsoo Kim
  -1 siblings, 0 replies; 48+ messages in thread
From: Joonsoo Kim @ 2017-03-16  1:30 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-kernel, linux-mm, Johannes Weiner,
	Mel Gorman, David Rientjes, kernel-team, kernel-team

Hello,

On Tue, Mar 07, 2017 at 02:15:39PM +0100, Vlastimil Babka wrote:
> When detecting whether compaction has succeeded in forming a high-order page,
> __compact_finished() employs a watermark check, followed by an own search for
> a suitable page in the freelists. This is not ideal for two reasons:
> 
> - The watermark check also searches high-order freelists, but has a less strict
>   criteria wrt fallback. It's therefore redundant and waste of cycles. This was
>   different in the past when high-order watermark check attempted to apply
>   reserves to high-order pages.

Although it looks redundant now, I don't like removal of the watermark
check here. Criteria in watermark check would be changed to more strict
later and we would easily miss to apply it on compaction side if the
watermark check is removed.

> 
> - The watermark check might actually fail due to lack of order-0 pages.
>   Compaction can't help with that, so there's no point in continuing because of
>   that. It's possible that high-order page still exists and it terminates.

If lack of order-0 pages is the reason for stopping compaction, we
need to insert the watermark check for order-0 to break the compaction
instead of removing it. Am I missing something?

Thanks.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 2/8] mm, compaction: remove redundant watermark check in compact_finished()
@ 2017-03-16  1:30     ` Joonsoo Kim
  0 siblings, 0 replies; 48+ messages in thread
From: Joonsoo Kim @ 2017-03-16  1:30 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-kernel, linux-mm, Johannes Weiner,
	Mel Gorman, David Rientjes, kernel-team, kernel-team

Hello,

On Tue, Mar 07, 2017 at 02:15:39PM +0100, Vlastimil Babka wrote:
> When detecting whether compaction has succeeded in forming a high-order page,
> __compact_finished() employs a watermark check, followed by an own search for
> a suitable page in the freelists. This is not ideal for two reasons:
> 
> - The watermark check also searches high-order freelists, but has a less strict
>   criteria wrt fallback. It's therefore redundant and waste of cycles. This was
>   different in the past when high-order watermark check attempted to apply
>   reserves to high-order pages.

Although it looks redundant now, I don't like removal of the watermark
check here. Criteria in watermark check would be changed to more strict
later and we would easily miss to apply it on compaction side if the
watermark check is removed.

> 
> - The watermark check might actually fail due to lack of order-0 pages.
>   Compaction can't help with that, so there's no point in continuing because of
>   that. It's possible that high-order page still exists and it terminates.

If lack of order-0 pages is the reason for stopping compaction, we
need to insert the watermark check for order-0 to break the compaction
instead of removing it. Am I missing something?

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 4/8] mm, page_alloc: count movable pages when stealing from pageblock
  2017-03-07 13:15   ` Vlastimil Babka
@ 2017-03-16  1:53     ` Joonsoo Kim
  -1 siblings, 0 replies; 48+ messages in thread
From: Joonsoo Kim @ 2017-03-16  1:53 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-kernel, linux-mm, Johannes Weiner,
	Mel Gorman, David Rientjes, kernel-team, kernel-team

On Tue, Mar 07, 2017 at 02:15:41PM +0100, Vlastimil Babka wrote:
> When stealing pages from pageblock of a different migratetype, we count how
> many free pages were stolen, and change the pageblock's migratetype if more
> than half of the pageblock was free. This might be too conservative, as there
> might be other pages that are not free, but were allocated with the same
> migratetype as our allocation requested.

I think that too conservative is good for movable case. In my experiments,
fragmentation spreads out when unmovable/reclaimable pageblock is
changed to movable pageblock prematurely ('prematurely' means that
allocated unmovable pages remains). As you said below, movable allocations
falling back to other pageblocks don't causes permanent fragmentation.
Therefore, we don't need to be less conservative for movable
allocation. So, how about following change to keep the criteria for
movable allocation conservative even with this counting improvement?

threshold = (1 << (pageblock_order - 1));
if (start_type == MIGRATE_MOVABLE)
        threshold += (1 << (pageblock_order - 2));

if (free_pages + alike_pages >= threshold)
        ...

Thanks.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 4/8] mm, page_alloc: count movable pages when stealing from pageblock
@ 2017-03-16  1:53     ` Joonsoo Kim
  0 siblings, 0 replies; 48+ messages in thread
From: Joonsoo Kim @ 2017-03-16  1:53 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-kernel, linux-mm, Johannes Weiner,
	Mel Gorman, David Rientjes, kernel-team, kernel-team

On Tue, Mar 07, 2017 at 02:15:41PM +0100, Vlastimil Babka wrote:
> When stealing pages from pageblock of a different migratetype, we count how
> many free pages were stolen, and change the pageblock's migratetype if more
> than half of the pageblock was free. This might be too conservative, as there
> might be other pages that are not free, but were allocated with the same
> migratetype as our allocation requested.

I think that too conservative is good for movable case. In my experiments,
fragmentation spreads out when unmovable/reclaimable pageblock is
changed to movable pageblock prematurely ('prematurely' means that
allocated unmovable pages remains). As you said below, movable allocations
falling back to other pageblocks don't causes permanent fragmentation.
Therefore, we don't need to be less conservative for movable
allocation. So, how about following change to keep the criteria for
movable allocation conservative even with this counting improvement?

threshold = (1 << (pageblock_order - 1));
if (start_type == MIGRATE_MOVABLE)
        threshold += (1 << (pageblock_order - 2));

if (free_pages + alike_pages >= threshold)
        ...

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 7/8] mm, compaction: restrict async compaction to pageblocks of same migratetype
  2017-03-07 13:15   ` Vlastimil Babka
@ 2017-03-16  2:14     ` Joonsoo Kim
  -1 siblings, 0 replies; 48+ messages in thread
From: Joonsoo Kim @ 2017-03-16  2:14 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-kernel, linux-mm, Johannes Weiner,
	Mel Gorman, David Rientjes, kernel-team, kernel-team

On Tue, Mar 07, 2017 at 02:15:44PM +0100, Vlastimil Babka wrote:
> The migrate scanner in async compaction is currently limited to MIGRATE_MOVABLE
> pageblocks. This is a heuristic intended to reduce latency, based on the
> assumption that non-MOVABLE pageblocks are unlikely to contain movable pages.
> 
> However, with the exception of THP's, most high-order allocations are not
> movable. Should the async compaction succeed, this increases the chance that
> the non-MOVABLE allocations will fallback to a MOVABLE pageblock, making the
> long-term fragmentation worse.

I agree with this idea but have some concerns on this change.

*ASYNC* compaction is designed for reducing latency and this change
doesn't fit it. If everything works fine, there is a few movable pages
in non-MOVABLE pageblocks as you noted above. Moreover, there is quite
less the number of non-MOVABLE pageblock than MOVABLE one so finding
non-MOVABLE pageblock takes long time. These two factors will increase
the latency of *ASYNC* compaction.

And, there is a concern in implementaion side. With this change, there
is much possibilty that compaction scanner's met by ASYNC compaction.
It resets the scanner position and SYNC compaction would start the
scan at the beginning of the zone every time. It would make cached
position useless and inefficient.

Thanks.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 7/8] mm, compaction: restrict async compaction to pageblocks of same migratetype
@ 2017-03-16  2:14     ` Joonsoo Kim
  0 siblings, 0 replies; 48+ messages in thread
From: Joonsoo Kim @ 2017-03-16  2:14 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-kernel, linux-mm, Johannes Weiner,
	Mel Gorman, David Rientjes, kernel-team, kernel-team

On Tue, Mar 07, 2017 at 02:15:44PM +0100, Vlastimil Babka wrote:
> The migrate scanner in async compaction is currently limited to MIGRATE_MOVABLE
> pageblocks. This is a heuristic intended to reduce latency, based on the
> assumption that non-MOVABLE pageblocks are unlikely to contain movable pages.
> 
> However, with the exception of THP's, most high-order allocations are not
> movable. Should the async compaction succeed, this increases the chance that
> the non-MOVABLE allocations will fallback to a MOVABLE pageblock, making the
> long-term fragmentation worse.

I agree with this idea but have some concerns on this change.

*ASYNC* compaction is designed for reducing latency and this change
doesn't fit it. If everything works fine, there is a few movable pages
in non-MOVABLE pageblocks as you noted above. Moreover, there is quite
less the number of non-MOVABLE pageblock than MOVABLE one so finding
non-MOVABLE pageblock takes long time. These two factors will increase
the latency of *ASYNC* compaction.

And, there is a concern in implementaion side. With this change, there
is much possibilty that compaction scanner's met by ASYNC compaction.
It resets the scanner position and SYNC compaction would start the
scan at the beginning of the zone every time. It would make cached
position useless and inefficient.

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 8/8] mm, compaction: finish whole pageblock to reduce fragmentation
  2017-03-07 13:15   ` Vlastimil Babka
@ 2017-03-16  2:18     ` Joonsoo Kim
  -1 siblings, 0 replies; 48+ messages in thread
From: Joonsoo Kim @ 2017-03-16  2:18 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-kernel, linux-mm, Johannes Weiner,
	Mel Gorman, David Rientjes, kernel-team, kernel-team

On Tue, Mar 07, 2017 at 02:15:45PM +0100, Vlastimil Babka wrote:
> The main goal of direct compaction is to form a high-order page for allocation,
> but it should also help against long-term fragmentation when possible. Most
> lower-than-pageblock-order compactions are for non-movable allocations, which
> means that if we compact in a movable pageblock and terminate as soon as we
> create the high-order page, it's unlikely that the fallback heuristics will
> claim the whole block. Instead there might be a single unmovable page in a
> pageblock full of movable pages, and the next unmovable allocation might pick
> another pageblock and increase long-term fragmentation.
> 
> To help against such scenarios, this patch changes the termination criteria for
> compaction so that the current pageblock is finished even though the high-order
> page already exists. Note that it might be possible that the high-order page
> formed elsewhere in the zone due to parallel activity, but this patch doesn't
> try to detect that.
> 
> This is only done with sync compaction, because async compaction is limited to
> pageblock of the same migratetype, where it cannot result in a migratetype
> fallback. (Async compaction also eagerly skips order-aligned blocks where
> isolation fails, which is against the goal of migrating away as much of the
> pageblock as possible.)
> 
> As a result of this patch, long-term memory fragmentation should be reduced.
> 
> In testing based on 4.9 kernel with stress-highalloc from mmtests configured
> for order-4 GFP_KERNEL allocations, this patch has reduced the number of
> unmovable allocations falling back to movable pageblocks by 20%. The number
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Acked-by: Mel Gorman <mgorman@techsingularity.net>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  mm/compaction.c | 36 ++++++++++++++++++++++++++++++++++--
>  mm/internal.h   |  1 +
>  2 files changed, 35 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 2c288e75840d..bc7903130501 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1318,6 +1318,17 @@ static enum compact_result __compact_finished(struct zone *zone,
>  	if (is_via_compact_memory(cc->order))
>  		return COMPACT_CONTINUE;
>  
> +	if (cc->finishing_block) {
> +		/*
> +		 * We have finished the pageblock, but better check again that
> +		 * we really succeeded.
> +		 */
> +		if (IS_ALIGNED(cc->migrate_pfn, pageblock_nr_pages))
> +			cc->finishing_block = false;
> +		else
> +			return COMPACT_CONTINUE;
> +	}
> +
>  	/* Direct compactor: Is a suitable page free? */
>  	for (order = cc->order; order < MAX_ORDER; order++) {
>  		struct free_area *area = &zone->free_area[order];
> @@ -1338,8 +1349,29 @@ static enum compact_result __compact_finished(struct zone *zone,
>  		 * other migratetype buddy lists.
>  		 */
>  		if (find_suitable_fallback(area, order, migratetype,
> -						true, &can_steal) != -1)
> -			return COMPACT_SUCCESS;
> +						true, &can_steal) != -1) {
> +
> +			/* movable pages are OK in any pageblock */
> +			if (migratetype == MIGRATE_MOVABLE)
> +				return COMPACT_SUCCESS;
> +
> +			/*
> +			 * We are stealing for a non-movable allocation. Make
> +			 * sure we finish compacting the current pageblock
> +			 * first so it is as free as possible and we won't
> +			 * have to steal another one soon. This only applies
> +			 * to sync compaction, as async compaction operates
> +			 * on pageblocks of the same migratetype.
> +			 */
> +			if (cc->mode == MIGRATE_ASYNC ||
> +					IS_ALIGNED(cc->migrate_pfn,
> +							pageblock_nr_pages)) {
> +				return COMPACT_SUCCESS;
> +			}

If cc->migratetype and cc->migrate_pfn's migratetype is the same, stopping
the compaction here doesn't cause any fragmentation. Do we need to
compact full pageblock in this case?

Thanks.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 8/8] mm, compaction: finish whole pageblock to reduce fragmentation
@ 2017-03-16  2:18     ` Joonsoo Kim
  0 siblings, 0 replies; 48+ messages in thread
From: Joonsoo Kim @ 2017-03-16  2:18 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-kernel, linux-mm, Johannes Weiner,
	Mel Gorman, David Rientjes, kernel-team, kernel-team

On Tue, Mar 07, 2017 at 02:15:45PM +0100, Vlastimil Babka wrote:
> The main goal of direct compaction is to form a high-order page for allocation,
> but it should also help against long-term fragmentation when possible. Most
> lower-than-pageblock-order compactions are for non-movable allocations, which
> means that if we compact in a movable pageblock and terminate as soon as we
> create the high-order page, it's unlikely that the fallback heuristics will
> claim the whole block. Instead there might be a single unmovable page in a
> pageblock full of movable pages, and the next unmovable allocation might pick
> another pageblock and increase long-term fragmentation.
> 
> To help against such scenarios, this patch changes the termination criteria for
> compaction so that the current pageblock is finished even though the high-order
> page already exists. Note that it might be possible that the high-order page
> formed elsewhere in the zone due to parallel activity, but this patch doesn't
> try to detect that.
> 
> This is only done with sync compaction, because async compaction is limited to
> pageblock of the same migratetype, where it cannot result in a migratetype
> fallback. (Async compaction also eagerly skips order-aligned blocks where
> isolation fails, which is against the goal of migrating away as much of the
> pageblock as possible.)
> 
> As a result of this patch, long-term memory fragmentation should be reduced.
> 
> In testing based on 4.9 kernel with stress-highalloc from mmtests configured
> for order-4 GFP_KERNEL allocations, this patch has reduced the number of
> unmovable allocations falling back to movable pageblocks by 20%. The number
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Acked-by: Mel Gorman <mgorman@techsingularity.net>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  mm/compaction.c | 36 ++++++++++++++++++++++++++++++++++--
>  mm/internal.h   |  1 +
>  2 files changed, 35 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 2c288e75840d..bc7903130501 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1318,6 +1318,17 @@ static enum compact_result __compact_finished(struct zone *zone,
>  	if (is_via_compact_memory(cc->order))
>  		return COMPACT_CONTINUE;
>  
> +	if (cc->finishing_block) {
> +		/*
> +		 * We have finished the pageblock, but better check again that
> +		 * we really succeeded.
> +		 */
> +		if (IS_ALIGNED(cc->migrate_pfn, pageblock_nr_pages))
> +			cc->finishing_block = false;
> +		else
> +			return COMPACT_CONTINUE;
> +	}
> +
>  	/* Direct compactor: Is a suitable page free? */
>  	for (order = cc->order; order < MAX_ORDER; order++) {
>  		struct free_area *area = &zone->free_area[order];
> @@ -1338,8 +1349,29 @@ static enum compact_result __compact_finished(struct zone *zone,
>  		 * other migratetype buddy lists.
>  		 */
>  		if (find_suitable_fallback(area, order, migratetype,
> -						true, &can_steal) != -1)
> -			return COMPACT_SUCCESS;
> +						true, &can_steal) != -1) {
> +
> +			/* movable pages are OK in any pageblock */
> +			if (migratetype == MIGRATE_MOVABLE)
> +				return COMPACT_SUCCESS;
> +
> +			/*
> +			 * We are stealing for a non-movable allocation. Make
> +			 * sure we finish compacting the current pageblock
> +			 * first so it is as free as possible and we won't
> +			 * have to steal another one soon. This only applies
> +			 * to sync compaction, as async compaction operates
> +			 * on pageblocks of the same migratetype.
> +			 */
> +			if (cc->mode == MIGRATE_ASYNC ||
> +					IS_ALIGNED(cc->migrate_pfn,
> +							pageblock_nr_pages)) {
> +				return COMPACT_SUCCESS;
> +			}

If cc->migratetype and cc->migrate_pfn's migratetype is the same, stopping
the compaction here doesn't cause any fragmentation. Do we need to
compact full pageblock in this case?

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 0/8] try to reduce fragmenting fallbacks
  2017-03-08 19:17     ` Vlastimil Babka
@ 2017-03-16 18:34       ` Johannes Weiner
  -1 siblings, 0 replies; 48+ messages in thread
From: Johannes Weiner @ 2017-03-16 18:34 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Joonsoo Kim,
	David Rientjes, kernel-team

On Wed, Mar 08, 2017 at 08:17:39PM +0100, Vlastimil Babka wrote:
> On 8.3.2017 17:46, Johannes Weiner wrote:
> > Is there any other data you would like me to gather?
> 
> If you can enable the extfrag tracepoint, it would be nice to have graphs of how
> unmovable allocations falling back to movable pageblocks, etc.

Okay, here we go. I recorded 24 hours worth of the extfrag tracepoint,
filtered to fallbacks from unmovable requests to movable blocks. I've
uploaded the plot here:

http://cmpxchg.org/antifrag/fallbackrate.png

but this already speaks for itself:

11G     alloc-mtfallback.trace
3.3G    alloc-mtfallback-patched.trace

;)

> Possibly also /proc/pagetypeinfo for numbers of pageblock types.

After a week of uptime, the patched (b) kernel has more movable blocks
than vanilla 4.10-rc8 (a):

   Number of blocks type     Unmovable      Movable  Reclaimable   HighAtomic          CMA      Isolate

a: Node 1, zone   Normal         2017        29763          987            1            0            0
b: Node 1, zone   Normal         1264        30850          653            1            0            0

I sampled this somewhat sporadically over the week and it's been
reading reliably this way.

The patched kernel also consistently beats vanilla in terms of peak
job throughput.

Overall very cool!

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 0/8] try to reduce fragmenting fallbacks
@ 2017-03-16 18:34       ` Johannes Weiner
  0 siblings, 0 replies; 48+ messages in thread
From: Johannes Weiner @ 2017-03-16 18:34 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Joonsoo Kim,
	David Rientjes, kernel-team

On Wed, Mar 08, 2017 at 08:17:39PM +0100, Vlastimil Babka wrote:
> On 8.3.2017 17:46, Johannes Weiner wrote:
> > Is there any other data you would like me to gather?
> 
> If you can enable the extfrag tracepoint, it would be nice to have graphs of how
> unmovable allocations falling back to movable pageblocks, etc.

Okay, here we go. I recorded 24 hours worth of the extfrag tracepoint,
filtered to fallbacks from unmovable requests to movable blocks. I've
uploaded the plot here:

http://cmpxchg.org/antifrag/fallbackrate.png

but this already speaks for itself:

11G     alloc-mtfallback.trace
3.3G    alloc-mtfallback-patched.trace

;)

> Possibly also /proc/pagetypeinfo for numbers of pageblock types.

After a week of uptime, the patched (b) kernel has more movable blocks
than vanilla 4.10-rc8 (a):

   Number of blocks type     Unmovable      Movable  Reclaimable   HighAtomic          CMA      Isolate

a: Node 1, zone   Normal         2017        29763          987            1            0            0
b: Node 1, zone   Normal         1264        30850          653            1            0            0

I sampled this somewhat sporadically over the week and it's been
reading reliably this way.

The patched kernel also consistently beats vanilla in terms of peak
job throughput.

Overall very cool!

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 0/8] try to reduce fragmenting fallbacks
  2017-03-16 18:34       ` Johannes Weiner
@ 2017-03-17 18:29         ` Vlastimil Babka
  -1 siblings, 0 replies; 48+ messages in thread
From: Vlastimil Babka @ 2017-03-17 18:29 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Joonsoo Kim,
	David Rientjes, kernel-team

On 03/16/2017 07:34 PM, Johannes Weiner wrote:
> On Wed, Mar 08, 2017 at 08:17:39PM +0100, Vlastimil Babka wrote:
>> On 8.3.2017 17:46, Johannes Weiner wrote:
>>> Is there any other data you would like me to gather?
>>
>> If you can enable the extfrag tracepoint, it would be nice to have graphs of how
>> unmovable allocations falling back to movable pageblocks, etc.
> 
> Okay, here we go. I recorded 24 hours worth of the extfrag tracepoint,
> filtered to fallbacks from unmovable requests to movable blocks. I've
> uploaded the plot here:
> 
> http://cmpxchg.org/antifrag/fallbackrate.png
> 
> but this already speaks for itself:
> 
> 11G     alloc-mtfallback.trace
> 3.3G    alloc-mtfallback-patched.trace
> 
> ;)

Great!

>> Possibly also /proc/pagetypeinfo for numbers of pageblock types.

> After a week of uptime, the patched (b) kernel has more movable blocks
> than vanilla 4.10-rc8 (a):
> 
>    Number of blocks type     Unmovable      Movable  Reclaimable   HighAtomic          CMA      Isolate
> 
> a: Node 1, zone   Normal         2017        29763          987            1            0            0
> b: Node 1, zone   Normal         1264        30850          653            1            0            0

That's better than I expected. I wouldn't be surprised if the number of
unmovable pageblocks actually got *higher* due to the series because
previously many unmovable pages would be scattered around movable blocks.

> I sampled this somewhat sporadically over the week and it's been
> reading reliably this way.
> 
> The patched kernel also consistently beats vanilla in terms of peak
> job throughput.
> 
> Overall very cool!

Thanks a lot! So that means it's worth the increased compaction stats
you reported earlier?

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 0/8] try to reduce fragmenting fallbacks
@ 2017-03-17 18:29         ` Vlastimil Babka
  0 siblings, 0 replies; 48+ messages in thread
From: Vlastimil Babka @ 2017-03-17 18:29 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Joonsoo Kim,
	David Rientjes, kernel-team

On 03/16/2017 07:34 PM, Johannes Weiner wrote:
> On Wed, Mar 08, 2017 at 08:17:39PM +0100, Vlastimil Babka wrote:
>> On 8.3.2017 17:46, Johannes Weiner wrote:
>>> Is there any other data you would like me to gather?
>>
>> If you can enable the extfrag tracepoint, it would be nice to have graphs of how
>> unmovable allocations falling back to movable pageblocks, etc.
> 
> Okay, here we go. I recorded 24 hours worth of the extfrag tracepoint,
> filtered to fallbacks from unmovable requests to movable blocks. I've
> uploaded the plot here:
> 
> http://cmpxchg.org/antifrag/fallbackrate.png
> 
> but this already speaks for itself:
> 
> 11G     alloc-mtfallback.trace
> 3.3G    alloc-mtfallback-patched.trace
> 
> ;)

Great!

>> Possibly also /proc/pagetypeinfo for numbers of pageblock types.

> After a week of uptime, the patched (b) kernel has more movable blocks
> than vanilla 4.10-rc8 (a):
> 
>    Number of blocks type     Unmovable      Movable  Reclaimable   HighAtomic          CMA      Isolate
> 
> a: Node 1, zone   Normal         2017        29763          987            1            0            0
> b: Node 1, zone   Normal         1264        30850          653            1            0            0

That's better than I expected. I wouldn't be surprised if the number of
unmovable pageblocks actually got *higher* due to the series because
previously many unmovable pages would be scattered around movable blocks.

> I sampled this somewhat sporadically over the week and it's been
> reading reliably this way.
> 
> The patched kernel also consistently beats vanilla in terms of peak
> job throughput.
> 
> Overall very cool!

Thanks a lot! So that means it's worth the increased compaction stats
you reported earlier?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 0/8] try to reduce fragmenting fallbacks
  2017-03-17 18:29         ` Vlastimil Babka
@ 2017-03-19 21:23           ` Johannes Weiner
  -1 siblings, 0 replies; 48+ messages in thread
From: Johannes Weiner @ 2017-03-19 21:23 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Joonsoo Kim,
	David Rientjes, kernel-team

On Fri, Mar 17, 2017 at 07:29:54PM +0100, Vlastimil Babka wrote:
> On 03/16/2017 07:34 PM, Johannes Weiner wrote:
> > The patched kernel also consistently beats vanilla in terms of peak
> > job throughput.
> > 
> > Overall very cool!
> 
> Thanks a lot! So that means it's worth the increased compaction stats
> you reported earlier?

Yes, from the impact this patchset has on the workload overall, I'm
assuming that the increased work pays off.

So maybe something to keep an eye out for, but IMO not a dealbreaker.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 0/8] try to reduce fragmenting fallbacks
@ 2017-03-19 21:23           ` Johannes Weiner
  0 siblings, 0 replies; 48+ messages in thread
From: Johannes Weiner @ 2017-03-19 21:23 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Joonsoo Kim,
	David Rientjes, kernel-team

On Fri, Mar 17, 2017 at 07:29:54PM +0100, Vlastimil Babka wrote:
> On 03/16/2017 07:34 PM, Johannes Weiner wrote:
> > The patched kernel also consistently beats vanilla in terms of peak
> > job throughput.
> > 
> > Overall very cool!
> 
> Thanks a lot! So that means it's worth the increased compaction stats
> you reported earlier?

Yes, from the impact this patchset has on the workload overall, I'm
assuming that the increased work pays off.

So maybe something to keep an eye out for, but IMO not a dealbreaker.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 2/8] mm, compaction: remove redundant watermark check in compact_finished()
  2017-03-16  1:30     ` Joonsoo Kim
@ 2017-03-29 15:30       ` Vlastimil Babka
  -1 siblings, 0 replies; 48+ messages in thread
From: Vlastimil Babka @ 2017-03-29 15:30 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, linux-kernel, linux-mm, Johannes Weiner,
	Mel Gorman, David Rientjes, kernel-team, kernel-team

On 03/16/2017 02:30 AM, Joonsoo Kim wrote:
> Hello,

Hi, sorry for the late replies.

> On Tue, Mar 07, 2017 at 02:15:39PM +0100, Vlastimil Babka wrote:
>> When detecting whether compaction has succeeded in forming a high-order page,
>> __compact_finished() employs a watermark check, followed by an own search for
>> a suitable page in the freelists. This is not ideal for two reasons:
>> 
>> - The watermark check also searches high-order freelists, but has a less strict
>>   criteria wrt fallback. It's therefore redundant and waste of cycles. This was
>>   different in the past when high-order watermark check attempted to apply
>>   reserves to high-order pages.
> 
> Although it looks redundant now, I don't like removal of the watermark
> check here. Criteria in watermark check would be changed to more strict
> later and we would easily miss to apply it on compaction side if the
> watermark check is removed.

I see, but compaction is already full of various watermark(-like) checks that
have to be considered/updated if watermark checking changes significantly, or
things will go subtly wrong. I doubt this extra check can really help much in
such cases.

>> 
>> - The watermark check might actually fail due to lack of order-0 pages.
>>   Compaction can't help with that, so there's no point in continuing because of
>>   that. It's possible that high-order page still exists and it terminates.
> 
> If lack of order-0 pages is the reason for stopping compaction, we
> need to insert the watermark check for order-0 to break the compaction
> instead of removing it. Am I missing something?

You proposed that once IIRC, but didn't follow up? Currently we learn about
insufficient order-0 watermark in __isolate_free_page() from the free scanner.
We could potentially stop compacting earlier by checking it also in
compact_finished(), but maybe it doesn't happen that often and it's just extra
checking overhead.

So I wouldn't be terribly opposed by converting the current check to an order-0
fail-compaction check (instead of removing it), but I really wouldn't like to
insert the order-0 one and also keep the current one.

> Thanks.
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 2/8] mm, compaction: remove redundant watermark check in compact_finished()
@ 2017-03-29 15:30       ` Vlastimil Babka
  0 siblings, 0 replies; 48+ messages in thread
From: Vlastimil Babka @ 2017-03-29 15:30 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, linux-kernel, linux-mm, Johannes Weiner,
	Mel Gorman, David Rientjes, kernel-team, kernel-team

On 03/16/2017 02:30 AM, Joonsoo Kim wrote:
> Hello,

Hi, sorry for the late replies.

> On Tue, Mar 07, 2017 at 02:15:39PM +0100, Vlastimil Babka wrote:
>> When detecting whether compaction has succeeded in forming a high-order page,
>> __compact_finished() employs a watermark check, followed by an own search for
>> a suitable page in the freelists. This is not ideal for two reasons:
>> 
>> - The watermark check also searches high-order freelists, but has a less strict
>>   criteria wrt fallback. It's therefore redundant and waste of cycles. This was
>>   different in the past when high-order watermark check attempted to apply
>>   reserves to high-order pages.
> 
> Although it looks redundant now, I don't like removal of the watermark
> check here. Criteria in watermark check would be changed to more strict
> later and we would easily miss to apply it on compaction side if the
> watermark check is removed.

I see, but compaction is already full of various watermark(-like) checks that
have to be considered/updated if watermark checking changes significantly, or
things will go subtly wrong. I doubt this extra check can really help much in
such cases.

>> 
>> - The watermark check might actually fail due to lack of order-0 pages.
>>   Compaction can't help with that, so there's no point in continuing because of
>>   that. It's possible that high-order page still exists and it terminates.
> 
> If lack of order-0 pages is the reason for stopping compaction, we
> need to insert the watermark check for order-0 to break the compaction
> instead of removing it. Am I missing something?

You proposed that once IIRC, but didn't follow up? Currently we learn about
insufficient order-0 watermark in __isolate_free_page() from the free scanner.
We could potentially stop compacting earlier by checking it also in
compact_finished(), but maybe it doesn't happen that often and it's just extra
checking overhead.

So I wouldn't be terribly opposed by converting the current check to an order-0
fail-compaction check (instead of removing it), but I really wouldn't like to
insert the order-0 one and also keep the current one.

> Thanks.
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 4/8] mm, page_alloc: count movable pages when stealing from pageblock
  2017-03-16  1:53     ` Joonsoo Kim
@ 2017-03-29 15:49       ` Vlastimil Babka
  -1 siblings, 0 replies; 48+ messages in thread
From: Vlastimil Babka @ 2017-03-29 15:49 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, linux-kernel, linux-mm, Johannes Weiner,
	Mel Gorman, David Rientjes, kernel-team, kernel-team

On 03/16/2017 02:53 AM, Joonsoo Kim wrote:
> On Tue, Mar 07, 2017 at 02:15:41PM +0100, Vlastimil Babka wrote:
>> When stealing pages from pageblock of a different migratetype, we count how
>> many free pages were stolen, and change the pageblock's migratetype if more
>> than half of the pageblock was free. This might be too conservative, as there
>> might be other pages that are not free, but were allocated with the same
>> migratetype as our allocation requested.
> 
> I think that too conservative is good for movable case. In my experiments,
> fragmentation spreads out when unmovable/reclaimable pageblock is
> changed to movable pageblock prematurely ('prematurely' means that
> allocated unmovable pages remains). As you said below, movable allocations
> falling back to other pageblocks don't causes permanent fragmentation.
> Therefore, we don't need to be less conservative for movable
> allocation. So, how about following change to keep the criteria for
> movable allocation conservative even with this counting improvement?
> 
> threshold = (1 << (pageblock_order - 1));
> if (start_type == MIGRATE_MOVABLE)
>         threshold += (1 << (pageblock_order - 2));
> 
> if (free_pages + alike_pages >= threshold)
>         ...

That could help, or also not. Keeping more pageblocks marked as unmovable also
means that more unmovable allocations will spread out to them all, even if they
would fit within less pageblocks. MIGRATE_MIXED was an idea to help in this
case, as truly unmovable pageblocks would be preferred to the mixed ones.

Can't decide about such change without testing :/

> Thanks.
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 4/8] mm, page_alloc: count movable pages when stealing from pageblock
@ 2017-03-29 15:49       ` Vlastimil Babka
  0 siblings, 0 replies; 48+ messages in thread
From: Vlastimil Babka @ 2017-03-29 15:49 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, linux-kernel, linux-mm, Johannes Weiner,
	Mel Gorman, David Rientjes, kernel-team, kernel-team

On 03/16/2017 02:53 AM, Joonsoo Kim wrote:
> On Tue, Mar 07, 2017 at 02:15:41PM +0100, Vlastimil Babka wrote:
>> When stealing pages from pageblock of a different migratetype, we count how
>> many free pages were stolen, and change the pageblock's migratetype if more
>> than half of the pageblock was free. This might be too conservative, as there
>> might be other pages that are not free, but were allocated with the same
>> migratetype as our allocation requested.
> 
> I think that too conservative is good for movable case. In my experiments,
> fragmentation spreads out when unmovable/reclaimable pageblock is
> changed to movable pageblock prematurely ('prematurely' means that
> allocated unmovable pages remains). As you said below, movable allocations
> falling back to other pageblocks don't causes permanent fragmentation.
> Therefore, we don't need to be less conservative for movable
> allocation. So, how about following change to keep the criteria for
> movable allocation conservative even with this counting improvement?
> 
> threshold = (1 << (pageblock_order - 1));
> if (start_type == MIGRATE_MOVABLE)
>         threshold += (1 << (pageblock_order - 2));
> 
> if (free_pages + alike_pages >= threshold)
>         ...

That could help, or also not. Keeping more pageblocks marked as unmovable also
means that more unmovable allocations will spread out to them all, even if they
would fit within less pageblocks. MIGRATE_MIXED was an idea to help in this
case, as truly unmovable pageblocks would be preferred to the mixed ones.

Can't decide about such change without testing :/

> Thanks.
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 7/8] mm, compaction: restrict async compaction to pageblocks of same migratetype
  2017-03-16  2:14     ` Joonsoo Kim
@ 2017-03-29 16:06       ` Vlastimil Babka
  -1 siblings, 0 replies; 48+ messages in thread
From: Vlastimil Babka @ 2017-03-29 16:06 UTC (permalink / raw)
  To: Joonsoo Kim, Johannes Weiner
  Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman,
	David Rientjes, kernel-team, kernel-team

On 03/16/2017 03:14 AM, Joonsoo Kim wrote:
> On Tue, Mar 07, 2017 at 02:15:44PM +0100, Vlastimil Babka wrote:
>> The migrate scanner in async compaction is currently limited to MIGRATE_MOVABLE
>> pageblocks. This is a heuristic intended to reduce latency, based on the
>> assumption that non-MOVABLE pageblocks are unlikely to contain movable pages.
>> 
>> However, with the exception of THP's, most high-order allocations are not
>> movable. Should the async compaction succeed, this increases the chance that
>> the non-MOVABLE allocations will fallback to a MOVABLE pageblock, making the
>> long-term fragmentation worse.
> 
> I agree with this idea but have some concerns on this change.
> 
> *ASYNC* compaction is designed for reducing latency and this change
> doesn't fit it. If everything works fine, there is a few movable pages
> in non-MOVABLE pageblocks as you noted above. Moreover, there is quite
> less the number of non-MOVABLE pageblock than MOVABLE one so finding
> non-MOVABLE pageblock takes long time. These two factors will increase
> the latency of *ASYNC* compaction.

Right. I lately started to doubt the whole idea of async compaction (for
non-movable allocations). Seems it's one of the compaction heuristics tuned
towards the THP usecase. But for non-movable allocations, we just can't have
both the low latency and long-term fragmentation avoidance. I see now even my
own skip_on_failure mode in isolate_migratepages_block() as a mistake for
non-movable allocations.

Ideally I'd like to make async compaction redundant by kcompactd, and direct
compaction would mean a serious situation which should warrant sync compaction.
Meanwhile I see several options to modify this patch
- async compaction for non-movable allocations will stop doing the
skip_on_failure mode, and won't restrict the pageblock at all. patch 8/8 will
make sure that also this kind of compaction finishes the whole pageblock
- non-movable allocations will skip async compaction completely and go for sync
compaction immediately

Both options mean we won't clean the unmovable/reclaimable pageblocks as
aggressively, but perhaps the tradeoff won't be bad. What do you think?
Johannes, would you be able/willing to test such modification?

Thanks

> And, there is a concern in implementaion side. With this change, there
> is much possibilty that compaction scanner's met by ASYNC compaction.
> It resets the scanner position and SYNC compaction would start the
> scan at the beginning of the zone every time. It would make cached
> position useless and inefficient.
> 
> Thanks.
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 7/8] mm, compaction: restrict async compaction to pageblocks of same migratetype
@ 2017-03-29 16:06       ` Vlastimil Babka
  0 siblings, 0 replies; 48+ messages in thread
From: Vlastimil Babka @ 2017-03-29 16:06 UTC (permalink / raw)
  To: Joonsoo Kim, Johannes Weiner
  Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman,
	David Rientjes, kernel-team, kernel-team

On 03/16/2017 03:14 AM, Joonsoo Kim wrote:
> On Tue, Mar 07, 2017 at 02:15:44PM +0100, Vlastimil Babka wrote:
>> The migrate scanner in async compaction is currently limited to MIGRATE_MOVABLE
>> pageblocks. This is a heuristic intended to reduce latency, based on the
>> assumption that non-MOVABLE pageblocks are unlikely to contain movable pages.
>> 
>> However, with the exception of THP's, most high-order allocations are not
>> movable. Should the async compaction succeed, this increases the chance that
>> the non-MOVABLE allocations will fallback to a MOVABLE pageblock, making the
>> long-term fragmentation worse.
> 
> I agree with this idea but have some concerns on this change.
> 
> *ASYNC* compaction is designed for reducing latency and this change
> doesn't fit it. If everything works fine, there is a few movable pages
> in non-MOVABLE pageblocks as you noted above. Moreover, there is quite
> less the number of non-MOVABLE pageblock than MOVABLE one so finding
> non-MOVABLE pageblock takes long time. These two factors will increase
> the latency of *ASYNC* compaction.

Right. I lately started to doubt the whole idea of async compaction (for
non-movable allocations). Seems it's one of the compaction heuristics tuned
towards the THP usecase. But for non-movable allocations, we just can't have
both the low latency and long-term fragmentation avoidance. I see now even my
own skip_on_failure mode in isolate_migratepages_block() as a mistake for
non-movable allocations.

Ideally I'd like to make async compaction redundant by kcompactd, and direct
compaction would mean a serious situation which should warrant sync compaction.
Meanwhile I see several options to modify this patch
- async compaction for non-movable allocations will stop doing the
skip_on_failure mode, and won't restrict the pageblock at all. patch 8/8 will
make sure that also this kind of compaction finishes the whole pageblock
- non-movable allocations will skip async compaction completely and go for sync
compaction immediately

Both options mean we won't clean the unmovable/reclaimable pageblocks as
aggressively, but perhaps the tradeoff won't be bad. What do you think?
Johannes, would you be able/willing to test such modification?

Thanks

> And, there is a concern in implementaion side. With this change, there
> is much possibilty that compaction scanner's met by ASYNC compaction.
> It resets the scanner position and SYNC compaction would start the
> scan at the beginning of the zone every time. It would make cached
> position useless and inefficient.
> 
> Thanks.
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 8/8] mm, compaction: finish whole pageblock to reduce fragmentation
  2017-03-16  2:18     ` Joonsoo Kim
@ 2017-03-29 16:13       ` Vlastimil Babka
  -1 siblings, 0 replies; 48+ messages in thread
From: Vlastimil Babka @ 2017-03-29 16:13 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, linux-kernel, linux-mm, Johannes Weiner,
	Mel Gorman, David Rientjes, kernel-team, kernel-team

On 03/16/2017 03:18 AM, Joonsoo Kim wrote:
> On Tue, Mar 07, 2017 at 02:15:45PM +0100, Vlastimil Babka wrote:
>> The main goal of direct compaction is to form a high-order page for allocation,
>> but it should also help against long-term fragmentation when possible. Most
>> lower-than-pageblock-order compactions are for non-movable allocations, which
>> means that if we compact in a movable pageblock and terminate as soon as we
>> create the high-order page, it's unlikely that the fallback heuristics will
>> claim the whole block. Instead there might be a single unmovable page in a
>> pageblock full of movable pages, and the next unmovable allocation might pick
>> another pageblock and increase long-term fragmentation.
>> 
>> To help against such scenarios, this patch changes the termination criteria for
>> compaction so that the current pageblock is finished even though the high-order
>> page already exists. Note that it might be possible that the high-order page
>> formed elsewhere in the zone due to parallel activity, but this patch doesn't
>> try to detect that.
>> 
>> This is only done with sync compaction, because async compaction is limited to
>> pageblock of the same migratetype, where it cannot result in a migratetype
>> fallback. (Async compaction also eagerly skips order-aligned blocks where
>> isolation fails, which is against the goal of migrating away as much of the
>> pageblock as possible.)
>> 
>> As a result of this patch, long-term memory fragmentation should be reduced.
>> 
>> In testing based on 4.9 kernel with stress-highalloc from mmtests configured
>> for order-4 GFP_KERNEL allocations, this patch has reduced the number of
>> unmovable allocations falling back to movable pageblocks by 20%. The number
>> 
>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
>> Acked-by: Mel Gorman <mgorman@techsingularity.net>
>> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
>> ---
>>  mm/compaction.c | 36 ++++++++++++++++++++++++++++++++++--
>>  mm/internal.h   |  1 +
>>  2 files changed, 35 insertions(+), 2 deletions(-)
>> 
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 2c288e75840d..bc7903130501 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -1318,6 +1318,17 @@ static enum compact_result __compact_finished(struct zone *zone,
>>  	if (is_via_compact_memory(cc->order))
>>  		return COMPACT_CONTINUE;
>>  
>> +	if (cc->finishing_block) {
>> +		/*
>> +		 * We have finished the pageblock, but better check again that
>> +		 * we really succeeded.
>> +		 */
>> +		if (IS_ALIGNED(cc->migrate_pfn, pageblock_nr_pages))
>> +			cc->finishing_block = false;
>> +		else
>> +			return COMPACT_CONTINUE;
>> +	}
>> +
>>  	/* Direct compactor: Is a suitable page free? */
>>  	for (order = cc->order; order < MAX_ORDER; order++) {
>>  		struct free_area *area = &zone->free_area[order];
>> @@ -1338,8 +1349,29 @@ static enum compact_result __compact_finished(struct zone *zone,
>>  		 * other migratetype buddy lists.
>>  		 */
>>  		if (find_suitable_fallback(area, order, migratetype,
>> -						true, &can_steal) != -1)
>> -			return COMPACT_SUCCESS;
>> +						true, &can_steal) != -1) {
>> +
>> +			/* movable pages are OK in any pageblock */
>> +			if (migratetype == MIGRATE_MOVABLE)
>> +				return COMPACT_SUCCESS;
>> +
>> +			/*
>> +			 * We are stealing for a non-movable allocation. Make
>> +			 * sure we finish compacting the current pageblock
>> +			 * first so it is as free as possible and we won't
>> +			 * have to steal another one soon. This only applies
>> +			 * to sync compaction, as async compaction operates
>> +			 * on pageblocks of the same migratetype.
>> +			 */
>> +			if (cc->mode == MIGRATE_ASYNC ||
>> +					IS_ALIGNED(cc->migrate_pfn,
>> +							pageblock_nr_pages)) {
>> +				return COMPACT_SUCCESS;
>> +			}
> 
> If cc->migratetype and cc->migrate_pfn's migratetype is the same, stopping
> the compaction here doesn't cause any fragmentation. Do we need to
> compact full pageblock in this case?

Probably not, but if we make patch 7/8 less aggressive, then I'd rather keep
this chance of clearing a whole unmovable pageblock of movable pages.

I also realized that the finishing_block flag is just an unnecessary
complication. Just keep going until migrate_pfn hits the end of pageblock, and
only then check the termination criteria.

> Thanks.
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 8/8] mm, compaction: finish whole pageblock to reduce fragmentation
@ 2017-03-29 16:13       ` Vlastimil Babka
  0 siblings, 0 replies; 48+ messages in thread
From: Vlastimil Babka @ 2017-03-29 16:13 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, linux-kernel, linux-mm, Johannes Weiner,
	Mel Gorman, David Rientjes, kernel-team, kernel-team

On 03/16/2017 03:18 AM, Joonsoo Kim wrote:
> On Tue, Mar 07, 2017 at 02:15:45PM +0100, Vlastimil Babka wrote:
>> The main goal of direct compaction is to form a high-order page for allocation,
>> but it should also help against long-term fragmentation when possible. Most
>> lower-than-pageblock-order compactions are for non-movable allocations, which
>> means that if we compact in a movable pageblock and terminate as soon as we
>> create the high-order page, it's unlikely that the fallback heuristics will
>> claim the whole block. Instead there might be a single unmovable page in a
>> pageblock full of movable pages, and the next unmovable allocation might pick
>> another pageblock and increase long-term fragmentation.
>> 
>> To help against such scenarios, this patch changes the termination criteria for
>> compaction so that the current pageblock is finished even though the high-order
>> page already exists. Note that it might be possible that the high-order page
>> formed elsewhere in the zone due to parallel activity, but this patch doesn't
>> try to detect that.
>> 
>> This is only done with sync compaction, because async compaction is limited to
>> pageblock of the same migratetype, where it cannot result in a migratetype
>> fallback. (Async compaction also eagerly skips order-aligned blocks where
>> isolation fails, which is against the goal of migrating away as much of the
>> pageblock as possible.)
>> 
>> As a result of this patch, long-term memory fragmentation should be reduced.
>> 
>> In testing based on 4.9 kernel with stress-highalloc from mmtests configured
>> for order-4 GFP_KERNEL allocations, this patch has reduced the number of
>> unmovable allocations falling back to movable pageblocks by 20%. The number
>> 
>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
>> Acked-by: Mel Gorman <mgorman@techsingularity.net>
>> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
>> ---
>>  mm/compaction.c | 36 ++++++++++++++++++++++++++++++++++--
>>  mm/internal.h   |  1 +
>>  2 files changed, 35 insertions(+), 2 deletions(-)
>> 
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 2c288e75840d..bc7903130501 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -1318,6 +1318,17 @@ static enum compact_result __compact_finished(struct zone *zone,
>>  	if (is_via_compact_memory(cc->order))
>>  		return COMPACT_CONTINUE;
>>  
>> +	if (cc->finishing_block) {
>> +		/*
>> +		 * We have finished the pageblock, but better check again that
>> +		 * we really succeeded.
>> +		 */
>> +		if (IS_ALIGNED(cc->migrate_pfn, pageblock_nr_pages))
>> +			cc->finishing_block = false;
>> +		else
>> +			return COMPACT_CONTINUE;
>> +	}
>> +
>>  	/* Direct compactor: Is a suitable page free? */
>>  	for (order = cc->order; order < MAX_ORDER; order++) {
>>  		struct free_area *area = &zone->free_area[order];
>> @@ -1338,8 +1349,29 @@ static enum compact_result __compact_finished(struct zone *zone,
>>  		 * other migratetype buddy lists.
>>  		 */
>>  		if (find_suitable_fallback(area, order, migratetype,
>> -						true, &can_steal) != -1)
>> -			return COMPACT_SUCCESS;
>> +						true, &can_steal) != -1) {
>> +
>> +			/* movable pages are OK in any pageblock */
>> +			if (migratetype == MIGRATE_MOVABLE)
>> +				return COMPACT_SUCCESS;
>> +
>> +			/*
>> +			 * We are stealing for a non-movable allocation. Make
>> +			 * sure we finish compacting the current pageblock
>> +			 * first so it is as free as possible and we won't
>> +			 * have to steal another one soon. This only applies
>> +			 * to sync compaction, as async compaction operates
>> +			 * on pageblocks of the same migratetype.
>> +			 */
>> +			if (cc->mode == MIGRATE_ASYNC ||
>> +					IS_ALIGNED(cc->migrate_pfn,
>> +							pageblock_nr_pages)) {
>> +				return COMPACT_SUCCESS;
>> +			}
> 
> If cc->migratetype and cc->migrate_pfn's migratetype is the same, stopping
> the compaction here doesn't cause any fragmentation. Do we need to
> compact full pageblock in this case?

Probably not, but if we make patch 7/8 less aggressive, then I'd rather keep
this chance of clearing a whole unmovable pageblock of movable pages.

I also realized that the finishing_block flag is just an unnecessary
complication. Just keep going until migrate_pfn hits the end of pageblock, and
only then check the termination criteria.

> Thanks.
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 7/8] mm, compaction: restrict async compaction to pageblocks of same migratetype
  2017-03-29 16:06       ` Vlastimil Babka
@ 2017-04-07  0:38         ` Joonsoo Kim
  -1 siblings, 0 replies; 48+ messages in thread
From: Joonsoo Kim @ 2017-04-07  0:38 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Johannes Weiner, Andrew Morton, linux-kernel, linux-mm,
	Mel Gorman, David Rientjes, kernel-team, kernel-team

On Wed, Mar 29, 2017 at 06:06:41PM +0200, Vlastimil Babka wrote:
> On 03/16/2017 03:14 AM, Joonsoo Kim wrote:
> > On Tue, Mar 07, 2017 at 02:15:44PM +0100, Vlastimil Babka wrote:
> >> The migrate scanner in async compaction is currently limited to MIGRATE_MOVABLE
> >> pageblocks. This is a heuristic intended to reduce latency, based on the
> >> assumption that non-MOVABLE pageblocks are unlikely to contain movable pages.
> >> 
> >> However, with the exception of THP's, most high-order allocations are not
> >> movable. Should the async compaction succeed, this increases the chance that
> >> the non-MOVABLE allocations will fallback to a MOVABLE pageblock, making the
> >> long-term fragmentation worse.
> > 
> > I agree with this idea but have some concerns on this change.
> > 
> > *ASYNC* compaction is designed for reducing latency and this change
> > doesn't fit it. If everything works fine, there is a few movable pages
> > in non-MOVABLE pageblocks as you noted above. Moreover, there is quite
> > less the number of non-MOVABLE pageblock than MOVABLE one so finding
> > non-MOVABLE pageblock takes long time. These two factors will increase
> > the latency of *ASYNC* compaction.
> 
> Right. I lately started to doubt the whole idea of async compaction (for
> non-movable allocations). Seems it's one of the compaction heuristics tuned
> towards the THP usecase. But for non-movable allocations, we just can't have
> both the low latency and long-term fragmentation avoidance. I see now even my
> own skip_on_failure mode in isolate_migratepages_block() as a mistake for
> non-movable allocations.

Why do you think that skip_on_failure mode is a mistake? I think that
it would lead to reduce the latency and it fits the goal of async
compaction.

> 
> Ideally I'd like to make async compaction redundant by kcompactd, and direct
> compaction would mean a serious situation which should warrant sync compaction.
> Meanwhile I see several options to modify this patch
> - async compaction for non-movable allocations will stop doing the
> skip_on_failure mode, and won't restrict the pageblock at all. patch 8/8 will
> make sure that also this kind of compaction finishes the whole pageblock
> - non-movable allocations will skip async compaction completely and go for sync
> compaction immediately

IMO, concept of async compaction is also important for non-movable allocation.
Non-movable allocation is essential for some workload and they hope
the low latency.

Thanks.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 7/8] mm, compaction: restrict async compaction to pageblocks of same migratetype
@ 2017-04-07  0:38         ` Joonsoo Kim
  0 siblings, 0 replies; 48+ messages in thread
From: Joonsoo Kim @ 2017-04-07  0:38 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Johannes Weiner, Andrew Morton, linux-kernel, linux-mm,
	Mel Gorman, David Rientjes, kernel-team, kernel-team

On Wed, Mar 29, 2017 at 06:06:41PM +0200, Vlastimil Babka wrote:
> On 03/16/2017 03:14 AM, Joonsoo Kim wrote:
> > On Tue, Mar 07, 2017 at 02:15:44PM +0100, Vlastimil Babka wrote:
> >> The migrate scanner in async compaction is currently limited to MIGRATE_MOVABLE
> >> pageblocks. This is a heuristic intended to reduce latency, based on the
> >> assumption that non-MOVABLE pageblocks are unlikely to contain movable pages.
> >> 
> >> However, with the exception of THP's, most high-order allocations are not
> >> movable. Should the async compaction succeed, this increases the chance that
> >> the non-MOVABLE allocations will fallback to a MOVABLE pageblock, making the
> >> long-term fragmentation worse.
> > 
> > I agree with this idea but have some concerns on this change.
> > 
> > *ASYNC* compaction is designed for reducing latency and this change
> > doesn't fit it. If everything works fine, there is a few movable pages
> > in non-MOVABLE pageblocks as you noted above. Moreover, there is quite
> > less the number of non-MOVABLE pageblock than MOVABLE one so finding
> > non-MOVABLE pageblock takes long time. These two factors will increase
> > the latency of *ASYNC* compaction.
> 
> Right. I lately started to doubt the whole idea of async compaction (for
> non-movable allocations). Seems it's one of the compaction heuristics tuned
> towards the THP usecase. But for non-movable allocations, we just can't have
> both the low latency and long-term fragmentation avoidance. I see now even my
> own skip_on_failure mode in isolate_migratepages_block() as a mistake for
> non-movable allocations.

Why do you think that skip_on_failure mode is a mistake? I think that
it would lead to reduce the latency and it fits the goal of async
compaction.

> 
> Ideally I'd like to make async compaction redundant by kcompactd, and direct
> compaction would mean a serious situation which should warrant sync compaction.
> Meanwhile I see several options to modify this patch
> - async compaction for non-movable allocations will stop doing the
> skip_on_failure mode, and won't restrict the pageblock at all. patch 8/8 will
> make sure that also this kind of compaction finishes the whole pageblock
> - non-movable allocations will skip async compaction completely and go for sync
> compaction immediately

IMO, concept of async compaction is also important for non-movable allocation.
Non-movable allocation is essential for some workload and they hope
the low latency.

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 7/8] mm, compaction: restrict async compaction to pageblocks of same migratetype
  2017-04-07  0:38         ` Joonsoo Kim
@ 2017-05-04  6:12           ` Vlastimil Babka
  -1 siblings, 0 replies; 48+ messages in thread
From: Vlastimil Babka @ 2017-05-04  6:12 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Johannes Weiner, Andrew Morton, linux-kernel, linux-mm,
	Mel Gorman, David Rientjes, kernel-team, kernel-team

On 04/07/2017 02:38 AM, Joonsoo Kim wrote:
> On Wed, Mar 29, 2017 at 06:06:41PM +0200, Vlastimil Babka wrote:
>> On 03/16/2017 03:14 AM, Joonsoo Kim wrote:
>>> On Tue, Mar 07, 2017 at 02:15:44PM +0100, Vlastimil Babka wrote:
>>>> The migrate scanner in async compaction is currently limited to MIGRATE_MOVABLE
>>>> pageblocks. This is a heuristic intended to reduce latency, based on the
>>>> assumption that non-MOVABLE pageblocks are unlikely to contain movable pages.
>>>>
>>>> However, with the exception of THP's, most high-order allocations are not
>>>> movable. Should the async compaction succeed, this increases the chance that
>>>> the non-MOVABLE allocations will fallback to a MOVABLE pageblock, making the
>>>> long-term fragmentation worse.
>>>
>>> I agree with this idea but have some concerns on this change.
>>>
>>> *ASYNC* compaction is designed for reducing latency and this change
>>> doesn't fit it. If everything works fine, there is a few movable pages
>>> in non-MOVABLE pageblocks as you noted above. Moreover, there is quite
>>> less the number of non-MOVABLE pageblock than MOVABLE one so finding
>>> non-MOVABLE pageblock takes long time. These two factors will increase
>>> the latency of *ASYNC* compaction.
>>
>> Right. I lately started to doubt the whole idea of async compaction (for
>> non-movable allocations). Seems it's one of the compaction heuristics tuned
>> towards the THP usecase. But for non-movable allocations, we just can't have
>> both the low latency and long-term fragmentation avoidance. I see now even my
>> own skip_on_failure mode in isolate_migratepages_block() as a mistake for
>> non-movable allocations.
> 
> Why do you think that skip_on_failure mode is a mistake? I think that
> it would lead to reduce the latency and it fits the goal of async
> compaction.

Yes, but the downside is that compaction will create just the single
high-order page that is requested, while previously it would also
migrate away some more lower-order pages. When compacting for
MIGRATE_UNMOVABLE allocation, we then can't steal extra pages, so next
allocation might pollute a different pageblock. It's not a good tradeoff.

>>
>> Ideally I'd like to make async compaction redundant by kcompactd, and direct
>> compaction would mean a serious situation which should warrant sync compaction.
>> Meanwhile I see several options to modify this patch
>> - async compaction for non-movable allocations will stop doing the
>> skip_on_failure mode, and won't restrict the pageblock at all. patch 8/8 will
>> make sure that also this kind of compaction finishes the whole pageblock
>> - non-movable allocations will skip async compaction completely and go for sync
>> compaction immediately
> 
> IMO, concept of async compaction is also important for non-movable allocation.
> Non-movable allocation is essential for some workload and they hope
> the low latency.

The low latency should not be at the expense of making long-term
fragmentation worse.

> Thanks.
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 7/8] mm, compaction: restrict async compaction to pageblocks of same migratetype
@ 2017-05-04  6:12           ` Vlastimil Babka
  0 siblings, 0 replies; 48+ messages in thread
From: Vlastimil Babka @ 2017-05-04  6:12 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Johannes Weiner, Andrew Morton, linux-kernel, linux-mm,
	Mel Gorman, David Rientjes, kernel-team, kernel-team

On 04/07/2017 02:38 AM, Joonsoo Kim wrote:
> On Wed, Mar 29, 2017 at 06:06:41PM +0200, Vlastimil Babka wrote:
>> On 03/16/2017 03:14 AM, Joonsoo Kim wrote:
>>> On Tue, Mar 07, 2017 at 02:15:44PM +0100, Vlastimil Babka wrote:
>>>> The migrate scanner in async compaction is currently limited to MIGRATE_MOVABLE
>>>> pageblocks. This is a heuristic intended to reduce latency, based on the
>>>> assumption that non-MOVABLE pageblocks are unlikely to contain movable pages.
>>>>
>>>> However, with the exception of THP's, most high-order allocations are not
>>>> movable. Should the async compaction succeed, this increases the chance that
>>>> the non-MOVABLE allocations will fallback to a MOVABLE pageblock, making the
>>>> long-term fragmentation worse.
>>>
>>> I agree with this idea but have some concerns on this change.
>>>
>>> *ASYNC* compaction is designed for reducing latency and this change
>>> doesn't fit it. If everything works fine, there is a few movable pages
>>> in non-MOVABLE pageblocks as you noted above. Moreover, there is quite
>>> less the number of non-MOVABLE pageblock than MOVABLE one so finding
>>> non-MOVABLE pageblock takes long time. These two factors will increase
>>> the latency of *ASYNC* compaction.
>>
>> Right. I lately started to doubt the whole idea of async compaction (for
>> non-movable allocations). Seems it's one of the compaction heuristics tuned
>> towards the THP usecase. But for non-movable allocations, we just can't have
>> both the low latency and long-term fragmentation avoidance. I see now even my
>> own skip_on_failure mode in isolate_migratepages_block() as a mistake for
>> non-movable allocations.
> 
> Why do you think that skip_on_failure mode is a mistake? I think that
> it would lead to reduce the latency and it fits the goal of async
> compaction.

Yes, but the downside is that compaction will create just the single
high-order page that is requested, while previously it would also
migrate away some more lower-order pages. When compacting for
MIGRATE_UNMOVABLE allocation, we then can't steal extra pages, so next
allocation might pollute a different pageblock. It's not a good tradeoff.

>>
>> Ideally I'd like to make async compaction redundant by kcompactd, and direct
>> compaction would mean a serious situation which should warrant sync compaction.
>> Meanwhile I see several options to modify this patch
>> - async compaction for non-movable allocations will stop doing the
>> skip_on_failure mode, and won't restrict the pageblock at all. patch 8/8 will
>> make sure that also this kind of compaction finishes the whole pageblock
>> - non-movable allocations will skip async compaction completely and go for sync
>> compaction immediately
> 
> IMO, concept of async compaction is also important for non-movable allocation.
> Non-movable allocation is essential for some workload and they hope
> the low latency.

The low latency should not be at the expense of making long-term
fragmentation worse.

> Thanks.
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2017-05-04  6:13 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-07 13:15 [PATCH v3 0/8] try to reduce fragmenting fallbacks Vlastimil Babka
2017-03-07 13:15 ` Vlastimil Babka
2017-03-07 13:15 ` [PATCH v3 1/8] mm, compaction: reorder fields in struct compact_control Vlastimil Babka
2017-03-07 13:15   ` Vlastimil Babka
2017-03-07 13:15 ` [PATCH v3 2/8] mm, compaction: remove redundant watermark check in compact_finished() Vlastimil Babka
2017-03-07 13:15   ` Vlastimil Babka
2017-03-16  1:30   ` Joonsoo Kim
2017-03-16  1:30     ` Joonsoo Kim
2017-03-29 15:30     ` Vlastimil Babka
2017-03-29 15:30       ` Vlastimil Babka
2017-03-07 13:15 ` [PATCH v3 3/8] mm, page_alloc: split smallest stolen page in fallback Vlastimil Babka
2017-03-07 13:15   ` Vlastimil Babka
2017-03-07 13:15 ` [PATCH v3 4/8] mm, page_alloc: count movable pages when stealing from pageblock Vlastimil Babka
2017-03-07 13:15   ` Vlastimil Babka
2017-03-16  1:53   ` Joonsoo Kim
2017-03-16  1:53     ` Joonsoo Kim
2017-03-29 15:49     ` Vlastimil Babka
2017-03-29 15:49       ` Vlastimil Babka
2017-03-07 13:15 ` [PATCH v3 5/8] mm, compaction: change migrate_async_suitable() to suitable_migration_source() Vlastimil Babka
2017-03-07 13:15   ` Vlastimil Babka
2017-03-07 13:15 ` [PATCH v3 6/8] mm, compaction: add migratetype to compact_control Vlastimil Babka
2017-03-07 13:15   ` Vlastimil Babka
2017-03-07 13:15 ` [PATCH v3 7/8] mm, compaction: restrict async compaction to pageblocks of same migratetype Vlastimil Babka
2017-03-07 13:15   ` Vlastimil Babka
2017-03-16  2:14   ` Joonsoo Kim
2017-03-16  2:14     ` Joonsoo Kim
2017-03-29 16:06     ` Vlastimil Babka
2017-03-29 16:06       ` Vlastimil Babka
2017-04-07  0:38       ` Joonsoo Kim
2017-04-07  0:38         ` Joonsoo Kim
2017-05-04  6:12         ` Vlastimil Babka
2017-05-04  6:12           ` Vlastimil Babka
2017-03-07 13:15 ` [PATCH v3 8/8] mm, compaction: finish whole pageblock to reduce fragmentation Vlastimil Babka
2017-03-07 13:15   ` Vlastimil Babka
2017-03-16  2:18   ` Joonsoo Kim
2017-03-16  2:18     ` Joonsoo Kim
2017-03-29 16:13     ` Vlastimil Babka
2017-03-29 16:13       ` Vlastimil Babka
2017-03-08 16:46 ` [PATCH v3 0/8] try to reduce fragmenting fallbacks Johannes Weiner
2017-03-08 16:46   ` Johannes Weiner
2017-03-08 19:17   ` Vlastimil Babka
2017-03-08 19:17     ` Vlastimil Babka
2017-03-16 18:34     ` Johannes Weiner
2017-03-16 18:34       ` Johannes Weiner
2017-03-17 18:29       ` Vlastimil Babka
2017-03-17 18:29         ` Vlastimil Babka
2017-03-19 21:23         ` Johannes Weiner
2017-03-19 21:23           ` Johannes Weiner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.