* [RFC PATCH 0/5] compaction: changing initial position of scanners
@ 2015-01-19 10:05 Vlastimil Babka
2015-01-19 10:05 ` [PATCH 1/5] mm, compaction: more robust check for scanners meeting Vlastimil Babka
` (7 more replies)
0 siblings, 8 replies; 22+ messages in thread
From: Vlastimil Babka @ 2015-01-19 10:05 UTC (permalink / raw)
To: linux-mm
Cc: linux-kernel, Andrew Morton, Vlastimil Babka, David Rientjes,
Christoph Lameter, Joonsoo Kim, Mel Gorman, Michal Nazarewicz,
Minchan Kim, Naoya Horiguchi, Rik van Riel
Even after all the patches compaction received in the last several versions, it
turns out that its effectiveness degrades considerably as the system ages after
reboot. For example, see how the success rates of stress-highalloc from mmtests
degrade when we re-execute it several times, the first run being after a fresh
reboot:
3.19-rc4 3.19-rc4 3.19-rc4
4-nothp-1 4-nothp-2 4-nothp-3
Success 1 Min 25.00 ( 0.00%) 13.00 ( 48.00%) 9.00 ( 64.00%)
Success 1 Mean 36.20 ( 0.00%) 23.40 ( 35.36%) 16.40 ( 54.70%)
Success 1 Max 41.00 ( 0.00%) 34.00 ( 17.07%) 25.00 ( 39.02%)
Success 2 Min 25.00 ( 0.00%) 15.00 ( 40.00%) 10.00 ( 60.00%)
Success 2 Mean 37.20 ( 0.00%) 25.00 ( 32.80%) 17.20 ( 53.76%)
Success 2 Max 44.00 ( 0.00%) 36.00 ( 18.18%) 25.00 ( 43.18%)
Success 3 Min 84.00 ( 0.00%) 81.00 ( 3.57%) 78.00 ( 7.14%)
Success 3 Mean 85.80 ( 0.00%) 82.80 ( 3.50%) 80.40 ( 6.29%)
Success 3 Max 87.00 ( 0.00%) 84.00 ( 3.45%) 82.00 ( 5.75%)
Wouldn't it be much better if it looked like this?

3.19-rc4 3.19-rc4 3.19-rc4
5-nothp-1 5-nothp-2 5-nothp-3
Success 1 Min 49.00 ( 0.00%) 42.00 ( 14.29%) 41.00 ( 16.33%)
Success 1 Mean 51.00 ( 0.00%) 45.00 ( 11.76%) 42.60 ( 16.47%)
Success 1 Max 55.00 ( 0.00%) 51.00 ( 7.27%) 46.00 ( 16.36%)
Success 2 Min 53.00 ( 0.00%) 47.00 ( 11.32%) 44.00 ( 16.98%)
Success 2 Mean 59.60 ( 0.00%) 50.80 ( 14.77%) 48.20 ( 19.13%)
Success 2 Max 64.00 ( 0.00%) 56.00 ( 12.50%) 52.00 ( 18.75%)
Success 3 Min 84.00 ( 0.00%) 82.00 ( 2.38%) 78.00 ( 7.14%)
Success 3 Mean 85.60 ( 0.00%) 82.80 ( 3.27%) 79.40 ( 7.24%)
Success 3 Max 86.00 ( 0.00%) 83.00 ( 3.49%) 80.00 ( 6.98%)
In my humble opinion, it would :) Much lower degradation, and a nice
improvement in the first iteration as a bonus.
So what sorcery is this? Nothing much, just a fundamental change of the
compaction scanners' operation...

As everyone knows [1], the migration scanner starts at the first pageblock of a
zone and goes towards the end, while the free scanner starts at the last
pageblock and goes towards the beginning. Somewhere in the middle of the zone,
the scanners meet:
zone_start                                                zone_end
    |                                                         |
    -----------------------------------------------------------
    MMMMMMMMMMMMM|  =>                       <=  |FFFFFFFFFFFF
            migrate_pfn                         free_pfn
In my tests, the scanners meet around the middle of the zone on the first
iteration, and around the first third of the zone on subsequent iterations.
That means the migration scanner doesn't see the larger part of the zone at
all. For more details on why that's bad, see the description of Patch 4.
To make sure the migration scanner eventually scans the whole zone, we could
e.g. reverse the directions after each run. But that would still be biased, and
with 1/3 of the zone reachable from each side, we would still omit the middle
1/3 of the zone.

Or we could stop terminating compaction when the scanners meet, and let them
continue to scan the whole zone. Mel told me this used to be the case long ago,
but that approach would result in migrating pages back and forth during a
single compaction run, which wouldn't be cool.
So the approach taken by this patchset is to let the scanners start at an
arbitrary pfn within the zone, the so-called pivot pfn, and keep their
directions:
zone_start              pivot                             zone_end
    |                     |                                   |
    -----------------------------------------------------------
          <= |FFFFFFFMMMMMM|  =>
         free_pfn     migrate_pfn
Eventually, one of the scanners will reach the zone boundary and wrap around,
e.g. in the case of the free scanner:
zone_start              pivot                             zone_end
    |                     |                                   |
    -----------------------------------------------------------
    FFFFFFFFFFFFFFFFFMMMMMMMMMMMM|  =>                 <=   |F
                            migrate_pfn              free_pfn
Compaction will again terminate when the scanners meet.
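To make the wrap-aware termination rule concrete, here is a minimal user-space
C sketch of the test (names and signatures simplified for illustration; the
actual kernel implementation is compact_scanners_met() in Patch 4):

#include <stdbool.h>

/*
 * The migration scanner moves towards higher pfns, so it has wrapped
 * back to the zone start iff it is now below the pivot.
 */
static bool migrate_wrapped(unsigned long migrate_pfn, unsigned long pivot)
{
	return migrate_pfn < pivot;
}

/*
 * The free scanner moves towards lower pfns, so it has wrapped back to
 * the zone end iff it is now at or above the pivot.
 */
static bool free_wrapped(unsigned long free_pfn, unsigned long pivot)
{
	return free_pfn >= pivot;
}

static bool scanners_met(unsigned long free_pfn, unsigned long migrate_pfn,
			 unsigned long pivot, unsigned int pageblock_order)
{
	/* Free scanner is in the same or a lower pageblock than migration */
	bool free_below = (free_pfn >> pageblock_order)
			<= (migrate_pfn >> pageblock_order);

	/* Exactly one scanner has wrapped: terminate once they meet */
	if (free_wrapped(free_pfn, pivot) != migrate_wrapped(migrate_pfn, pivot))
		return free_below;
	/*
	 * Neither has wrapped: free < pivot <= migrate, so not met yet.
	 * Both wrapped (shouldn't happen): migrate < pivot <= free, so met.
	 */
	return !free_below;
}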
As you can imagine, the required code changes made the termination detection
and the scanners themselves quite hairy. There are lots of corner cases and the
code is often hard to wrap one's head around [puns intended]. The scanner
functions isolate_migratepages() and isolate_freepages() were recently cleaned
up a lot, and this makes them messy again, as they can no longer rely on the
fact that they will meet the other scanner and not the zone boundary.

But the improvements seem to make these complications worth it, and I hope
somebody can suggest more elegant solutions for the various parts of the code.

So here it is as an RFC. Patches 1-3 are cleanups that could be applied in any
case. Patch 4 implements the main changes, but leaves the pivot at the zone's
first pfn, so that the free scanner wraps immediately and there's no actual
change in behavior. Patch 5 updates the pivot in a conservative way, as
explained in its changelog.

Even with the conservative approach, stress-highalloc results are improved,
mainly in the second and later iterations after a fresh reboot. Here are again
the results before the patches, this time including compaction stats:
3.19-rc4 3.19-rc4 3.19-rc4
4-nothp-1 4-nothp-2 4-nothp-3
Success 1 Min 25.00 ( 0.00%) 13.00 ( 48.00%) 9.00 ( 64.00%)
Success 1 Mean 36.20 ( 0.00%) 23.40 ( 35.36%) 16.40 ( 54.70%)
Success 1 Max 41.00 ( 0.00%) 34.00 ( 17.07%) 25.00 ( 39.02%)
Success 2 Min 25.00 ( 0.00%) 15.00 ( 40.00%) 10.00 ( 60.00%)
Success 2 Mean 37.20 ( 0.00%) 25.00 ( 32.80%) 17.20 ( 53.76%)
Success 2 Max 44.00 ( 0.00%) 36.00 ( 18.18%) 25.00 ( 43.18%)
Success 3 Min 84.00 ( 0.00%) 81.00 ( 3.57%) 78.00 ( 7.14%)
Success 3 Mean 85.80 ( 0.00%) 82.80 ( 3.50%) 80.40 ( 6.29%)
Success 3 Max 87.00 ( 0.00%) 84.00 ( 3.45%) 82.00 ( 5.75%)
3.19-rc4 3.19-rc4 3.19-rc4
4-nothp-1 4-nothp-2 4-nothp-3
User 6781.16 6931.44 6905.44
System 1073.97 1071.20 1067.92
Elapsed 2349.71 2290.40 2255.59
3.19-rc4 3.19-rc4 3.19-rc4
4-nothp-1 4-nothp-2 4-nothp-3
Minor Faults 198270153 197020929 197146656
Major Faults 498 470 527
Swap Ins 60 34 105
Swap Outs 2780 1011 425
Allocation stalls 8325 4615 4769
DMA allocs 161 177 141
DMA32 allocs 124477754 123412713 123385291
Normal allocs 59366406 59040066 58909035
Movable allocs 0 0 0
Direct pages scanned 413858 280997 267341
Kswapd pages scanned 2204723 2184556 2147546
Kswapd pages reclaimed 2199430 2181305 2144395
Direct pages reclaimed 413062 280323 266557
Kswapd efficiency 99% 99% 99%
Kswapd velocity 914.497 977.868 943.877
Direct efficiency 99% 99% 99%
Direct velocity 171.664 125.782 117.500
Percentage direct scans 15% 11% 11%
Zone normal velocity 359.993 356.888 339.577
Zone dma32 velocity 726.152 746.745 721.784
Zone dma velocity 0.016 0.017 0.016
Page writes by reclaim 2893.000 1119.800 507.600
Page writes file 112 108 81
Page writes anon 2780 1011 425
Page reclaim immediate 1134 1197 1412
Sector Reads 4747006 4606605 4524025
Sector Writes 12759301 12562316 12629243
Page rescued immediate 0 0 0
Slabs scanned 1807821 1528751 1498929
Direct inode steals 24428 12649 13346
Kswapd inode steals 33213 29804 30194
Kswapd skipped wait 0 0 0
THP fault alloc 217 207 222
THP collapse alloc 500 539 373
THP splits 13 14 13
THP fault fallback 50 9 16
THP collapse fail 16 17 22
Compaction stalls 3136 2310 2072
Compaction success 1123 897 828
Compaction failures 2012 1413 1244
Page migrate success 4319697 3682666 3133699
Page migrate failure 19012 9964 7488
Compaction pages isolated 8974417 7634911 6528981
Compaction migrate scanned 182447073 122423031 127292737
Compaction free scanned 389193883 291257587 269516820
Compaction cost 5931 4824 4267
As the allocation success rates degrade, so do the compaction success rates,
and so does the scanner activity.
Now after the series:
3.19-rc4 3.19-rc4 3.19-rc4
5-nothp-1 5-nothp-2 5-nothp-3
Success 1 Min 49.00 ( 0.00%) 42.00 ( 14.29%) 41.00 ( 16.33%)
Success 1 Mean 51.00 ( 0.00%) 45.00 ( 11.76%) 42.60 ( 16.47%)
Success 1 Max 55.00 ( 0.00%) 51.00 ( 7.27%) 46.00 ( 16.36%)
Success 2 Min 53.00 ( 0.00%) 47.00 ( 11.32%) 44.00 ( 16.98%)
Success 2 Mean 59.60 ( 0.00%) 50.80 ( 14.77%) 48.20 ( 19.13%)
Success 2 Max 64.00 ( 0.00%) 56.00 ( 12.50%) 52.00 ( 18.75%)
Success 3 Min 84.00 ( 0.00%) 82.00 ( 2.38%) 78.00 ( 7.14%)
Success 3 Mean 85.60 ( 0.00%) 82.80 ( 3.27%) 79.40 ( 7.24%)
Success 3 Max 86.00 ( 0.00%) 83.00 ( 3.49%) 80.00 ( 6.98%)
3.19-rc4 3.19-rc4 3.19-rc4
5-nothp-1 5-nothp-2 5-nothp-3
User 6675.75 6742.26 6707.92
System 1069.78 1070.77 1070.25
Elapsed 2450.31 2363.98 2442.21
3.19-rc4 3.19-rc4 3.19-rc4
5-nothp-1 5-nothp-2 5-nothp-3
Minor Faults 197652452 197164571 197946824
Major Faults 882 743 934
Swap Ins 113 96 144
Swap Outs 2144 1340 1799
Allocation stalls 10304 9060 10261
DMA allocs 142 227 181
DMA32 allocs 124497911 123947671 124010151
Normal allocs 59233600 59160910 59669421
Movable allocs 0 0 0
Direct pages scanned 591368 481119 525158
Kswapd pages scanned 2217381 2164036 2170388
Kswapd pages reclaimed 2212285 2160162 2166794
Direct pages reclaimed 590064 480297 524013
Kswapd efficiency 99% 99% 99%
Kswapd velocity 921.119 937.864 936.558
Direct efficiency 99% 99% 99%
Direct velocity 245.659 208.511 226.615
Percentage direct scans 21% 18% 19%
Zone normal velocity 378.957 376.819 383.604
Zone dma32 velocity 787.805 769.530 779.552
Zone dma velocity 0.015 0.025 0.016
Page writes by reclaim 2284.600 1432.400 1972.400
Page writes file 140 92 173
Page writes anon 2144 1340 1799
Page reclaim immediate 1689 1315 1440
Sector Reads 4920699 4830343 4830263
Sector Writes 12643658 12588410 12713518
Page rescued immediate 0 0 0
Slabs scanned 2084358 1922421 1962867
Direct inode steals 35881 37170 45512
Kswapd inode steals 35973 35741 30339
Kswapd skipped wait 0 0 0
THP fault alloc 191 224 170
THP collapse alloc 554 533 516
THP splits 15 15 11
THP fault fallback 45 0 76
THP collapse fail 16 18 18
Compaction stalls 4069 3746 3951
Compaction success 1507 1388 1366
Compaction failures 2562 2357 2585
Page migrate success 4533617 4912832 5278583
Page migrate failure 20127 15273 17862
Compaction pages isolated 9495626 10235697 10993942
Compaction migrate scanned 93585708 99204656 92052138
Compaction free scanned 510786018 503529022 541298357
Compaction cost 5541 5988 6332
Less degradation in success rates and no degradation in activity.
Let's look again at the compaction stats without the series:
Compaction stalls 3136 2310 2072
Compaction success 1123 897 828
Compaction failures 2012 1413 1244
Page migrate success 4319697 3682666 3133699
Page migrate failure 19012 9964 7488
Compaction pages isolated 8974417 7634911 6528981
Compaction migrate scanned 182447073 122423031 127292737
Compaction free scanned 389193883 291257587 269516820
Compaction cost 5931 4824 4267
Interestingly, the number of migrate scanned pages was higher before the
series, almost twice as high in the first iteration. That suggests the scanner
was indeed scanning a limited number of pageblocks over and over, despite their
compaction potential being "depleted". After the patches, it achieves better
results with fewer scanned blocks, as it has a better chance of encountering
non-depleted blocks. The free scanner activity has increased after the series,
which just means that higher success rates lead to less deferred compaction.
There is also some increase in page migrations, but it is likely a consequence
of the better success rates as well, and not of useless back-and-forth
migrations caused by pivot changes.
Preliminary testing with THP-like allocations has shown similar improvements,
which is somewhat surprising, because THP allocations do not use sync
compaction and thus do not defer compaction; but changing the pivot is
currently tied to restarting from deferred compaction. Possibly this is due to
allocation activity other than the stress-highalloc test itself. I will post
these results when full measurements are done.
[1] http://lwn.net/Articles/368869/
Vlastimil Babka (5):
mm, compaction: more robust check for scanners meeting
mm, compaction: simplify handling restart position in free pages
scanner
mm, compaction: encapsulate resetting cached scanner positions
mm, compaction: allow scanners to start at any pfn within the zone
mm, compaction: set pivot pfn to the pfn when scanners met last time
include/linux/mmzone.h | 4 +
mm/compaction.c | 276 ++++++++++++++++++++++++++++++++++++++++---------
mm/internal.h | 1 +
3 files changed, 234 insertions(+), 47 deletions(-)
--
2.1.2
* [PATCH 1/5] mm, compaction: more robust check for scanners meeting
2015-01-19 10:05 [RFC PATCH 0/5] compaction: changing initial position of scanners Vlastimil Babka
@ 2015-01-19 10:05 ` Vlastimil Babka
2015-01-20 13:26 ` Zhang Yanfei
2015-01-19 10:05 ` [PATCH 2/5] mm, compaction: simplify handling restart position in free pages scanner Vlastimil Babka
` (6 subsequent siblings)
7 siblings, 1 reply; 22+ messages in thread
From: Vlastimil Babka @ 2015-01-19 10:05 UTC (permalink / raw)
To: linux-mm
Cc: linux-kernel, Andrew Morton, Vlastimil Babka, Minchan Kim,
Mel Gorman, Joonsoo Kim, Michal Nazarewicz, Naoya Horiguchi,
Christoph Lameter, Rik van Riel, David Rientjes
Compaction should finish when the migration and free scanners meet, i.e. when
they reach the same pageblock. Currently, however, the test in
compact_finished() simply compares the exact pfns, which may yield a false
negative when the free scanner position is in the middle of a pageblock and the
migration scanner reaches the beginning of the same pageblock.

This hasn't been a problem until commit e14c720efdd7 ("mm, compaction: remember
position within pageblock in free pages scanner") allowed the free scanner
position to be in the middle of a pageblock between invocations. The hotfix
commit 1d5bfe1ffb5b ("mm, compaction: prevent infinite loop in compact_zone")
prevented the issue by adding a special check in the migration scanner to
satisfy the current detection of scanners meeting.

However, the proper fix is to make the detection more robust. This patch
introduces the compact_scanners_met() function, which returns true when the
free scanner position is in the same or a lower pageblock than the migration
scanner. The special case in isolate_migratepages() introduced by commit
1d5bfe1ffb5b is removed.
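For a hypothetical example (the pfn values are made up for illustration): with
pageblock_order == 9, i.e. 512-page pageblocks, free_pfn == 65600 and
migrate_pfn == 65537 both lie in pageblock 128 (65600 >> 9 == 65537 >> 9 ==
128), so the scanners have met; the plain check cc->free_pfn <= cc->migrate_pfn
nevertheless evaluates to false, because 65600 > 65537.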
Suggested-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
---
mm/compaction.c | 22 ++++++++++++++--------
1 file changed, 14 insertions(+), 8 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 546e571..5fdbdb8 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -803,6 +803,16 @@ isolate_migratepages_range(struct compact_control *cc, unsigned long start_pfn,
#endif /* CONFIG_COMPACTION || CONFIG_CMA */
#ifdef CONFIG_COMPACTION
/*
+ * Test whether the free scanner has reached the same or lower pageblock than
+ * the migration scanner, and compaction should thus terminate.
+ */
+static inline bool compact_scanners_met(struct compact_control *cc)
+{
+ return (cc->free_pfn >> pageblock_order)
+ <= (cc->migrate_pfn >> pageblock_order);
+}
+
+/*
* Based on information in the current compact_control, find blocks
* suitable for isolating free pages from and then isolate them.
*/
@@ -1027,12 +1037,8 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
}
acct_isolated(zone, cc);
- /*
- * Record where migration scanner will be restarted. If we end up in
- * the same pageblock as the free scanner, make the scanners fully
- * meet so that compact_finished() terminates compaction.
- */
- cc->migrate_pfn = (end_pfn <= cc->free_pfn) ? low_pfn : cc->free_pfn;
+ /* Record where migration scanner will be restarted. */
+ cc->migrate_pfn = low_pfn;
return cc->nr_migratepages ? ISOLATE_SUCCESS : ISOLATE_NONE;
}
@@ -1047,7 +1053,7 @@ static int compact_finished(struct zone *zone, struct compact_control *cc,
return COMPACT_PARTIAL;
/* Compaction run completes if the migrate and free scanner meet */
- if (cc->free_pfn <= cc->migrate_pfn) {
+ if (compact_scanners_met(cc)) {
/* Let the next compaction start anew. */
zone->compact_cached_migrate_pfn[0] = zone->zone_start_pfn;
zone->compact_cached_migrate_pfn[1] = zone->zone_start_pfn;
@@ -1238,7 +1244,7 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
* migrate_pages() may return -ENOMEM when scanners meet
* and we want compact_finished() to detect it
*/
- if (err == -ENOMEM && cc->free_pfn > cc->migrate_pfn) {
+ if (err == -ENOMEM && !compact_scanners_met(cc)) {
ret = COMPACT_PARTIAL;
goto out;
}
--
2.1.2
* [PATCH 2/5] mm, compaction: simplify handling restart position in free pages scanner
2015-01-19 10:05 [RFC PATCH 0/5] compaction: changing initial position of scanners Vlastimil Babka
2015-01-19 10:05 ` [PATCH 1/5] mm, compaction: more robust check for scanners meeting Vlastimil Babka
@ 2015-01-19 10:05 ` Vlastimil Babka
2015-01-20 13:27 ` Zhang Yanfei
2015-01-19 10:05 ` [PATCH 3/5] mm, compaction: encapsulate resetting cached scanner positions Vlastimil Babka
` (5 subsequent siblings)
7 siblings, 1 reply; 22+ messages in thread
From: Vlastimil Babka @ 2015-01-19 10:05 UTC (permalink / raw)
To: linux-mm
Cc: linux-kernel, Andrew Morton, Vlastimil Babka, Minchan Kim,
Mel Gorman, Joonsoo Kim, Michal Nazarewicz, Naoya Horiguchi,
Christoph Lameter, Rik van Riel, David Rientjes
Handling of the position where the compaction free scanner should restart
(stored in cc->free_pfn) became more complex with commit e14c720efdd7 ("mm,
compaction: remember position within pageblock in free pages scanner").
Currently the position is updated in each loop iteration of
isolate_freepages(), although it's enough to update it only when exiting the
loop, either because we have found enough free pages, or because contention was
detected in async compaction. An extra check outside the loop then updates the
position in case we have met the migration scanner.

This can be simplified if we move the test for having isolated enough pages
from the for-loop header next to the test for contention, and determine the
restart position only in these cases. We can reuse the isolate_start_pfn
variable for this instead of setting cc->free_pfn directly. Outside the loop,
we can then simply set cc->free_pfn to the value of isolate_start_pfn without
an extra check.

We also add a VM_BUG_ON to future-proof the code, in case somebody adds a new
condition that terminates isolate_freepages_block() prematurely without it also
being considered in isolate_freepages().
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
---
mm/compaction.c | 34 +++++++++++++++++++---------------
1 file changed, 19 insertions(+), 15 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 5fdbdb8..45799a4 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -849,7 +849,7 @@ static void isolate_freepages(struct compact_control *cc)
* pages on cc->migratepages. We stop searching if the migrate
* and free page scanners meet or enough free pages are isolated.
*/
- for (; block_start_pfn >= low_pfn && cc->nr_migratepages > nr_freepages;
+ for (; block_start_pfn >= low_pfn;
block_end_pfn = block_start_pfn,
block_start_pfn -= pageblock_nr_pages,
isolate_start_pfn = block_start_pfn) {
@@ -883,6 +883,8 @@ static void isolate_freepages(struct compact_control *cc)
nr_freepages += isolated;
/*
+ * If we isolated enough freepages, or aborted due to async
+ * compaction being contended, terminate the loop.
* Remember where the free scanner should restart next time,
* which is where isolate_freepages_block() left off.
* But if it scanned the whole pageblock, isolate_start_pfn
@@ -891,28 +893,30 @@ static void isolate_freepages(struct compact_control *cc)
* In that case we will however want to restart at the start
* of the previous pageblock.
*/
- cc->free_pfn = (isolate_start_pfn < block_end_pfn) ?
- isolate_start_pfn :
- block_start_pfn - pageblock_nr_pages;
-
- /*
- * isolate_freepages_block() might have aborted due to async
- * compaction being contended
- */
- if (cc->contended)
+ if ((nr_freepages > cc->nr_migratepages) || cc->contended) {
+ if (isolate_start_pfn >= block_end_pfn)
+ isolate_start_pfn =
+ block_start_pfn - pageblock_nr_pages;
break;
+ } else {
+ /*
+ * isolate_freepages_block() should not terminate
+ * prematurely unless contended, or isolated enough
+ */
+ VM_BUG_ON(isolate_start_pfn < block_end_pfn);
+ }
}
/* split_free_page does not map the pages */
map_pages(freelist);
/*
- * If we crossed the migrate scanner, we want to keep it that way
- * so that compact_finished() may detect this
+ * Record where the free scanner will restart next time. Either we
+ * broke from the loop and set isolate_start_pfn based on the last
+ * call to isolate_freepages_block(), or we met the migration scanner
+ * and the loop terminated due to isolate_start_pfn < low_pfn
*/
- if (block_start_pfn < low_pfn)
- cc->free_pfn = cc->migrate_pfn;
-
+ cc->free_pfn = isolate_start_pfn;
cc->nr_freepages = nr_freepages;
}
--
2.1.2
* [PATCH 3/5] mm, compaction: encapsulate resetting cached scanner positions
2015-01-19 10:05 [RFC PATCH 0/5] compaction: changing initial position of scanners Vlastimil Babka
2015-01-19 10:05 ` [PATCH 1/5] mm, compaction: more robust check for scanners meeting Vlastimil Babka
2015-01-19 10:05 ` [PATCH 2/5] mm, compaction: simplify handling restart position in free pages scanner Vlastimil Babka
@ 2015-01-19 10:05 ` Vlastimil Babka
2015-01-20 13:30 ` Zhang Yanfei
2015-01-19 10:05 ` [RFC PATCH 4/5] mm, compaction: allow scanners to start at any pfn within the zone Vlastimil Babka
` (4 subsequent siblings)
7 siblings, 1 reply; 22+ messages in thread
From: Vlastimil Babka @ 2015-01-19 10:05 UTC (permalink / raw)
To: linux-mm
Cc: linux-kernel, Andrew Morton, Vlastimil Babka, Minchan Kim,
Mel Gorman, Joonsoo Kim, Michal Nazarewicz, Naoya Horiguchi,
Christoph Lameter, Rik van Riel, David Rientjes
Resetting the cached compaction scanner positions is currently done implicitly
in __reset_isolation_suitable() and compact_finished(). Encapsulate the
functionality in a new function, reset_cached_positions(), and call it
explicitly where needed.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
---
mm/compaction.c | 22 ++++++++++++++--------
1 file changed, 14 insertions(+), 8 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 45799a4..5626220 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -123,6 +123,13 @@ static inline bool isolation_suitable(struct compact_control *cc,
return !get_pageblock_skip(page);
}
+static void reset_cached_positions(struct zone *zone)
+{
+ zone->compact_cached_migrate_pfn[0] = zone->zone_start_pfn;
+ zone->compact_cached_migrate_pfn[1] = zone->zone_start_pfn;
+ zone->compact_cached_free_pfn = zone_end_pfn(zone);
+}
+
/*
* This function is called to clear all cached information on pageblocks that
* should be skipped for page isolation when the migrate and free page scanner
@@ -134,9 +141,6 @@ static void __reset_isolation_suitable(struct zone *zone)
unsigned long end_pfn = zone_end_pfn(zone);
unsigned long pfn;
- zone->compact_cached_migrate_pfn[0] = start_pfn;
- zone->compact_cached_migrate_pfn[1] = start_pfn;
- zone->compact_cached_free_pfn = end_pfn;
zone->compact_blockskip_flush = false;
/* Walk the zone and mark every pageblock as suitable for isolation */
@@ -166,8 +170,10 @@ void reset_isolation_suitable(pg_data_t *pgdat)
continue;
/* Only flush if a full compaction finished recently */
- if (zone->compact_blockskip_flush)
+ if (zone->compact_blockskip_flush) {
__reset_isolation_suitable(zone);
+ reset_cached_positions(zone);
+ }
}
}
@@ -1059,9 +1065,7 @@ static int compact_finished(struct zone *zone, struct compact_control *cc,
/* Compaction run completes if the migrate and free scanner meet */
if (compact_scanners_met(cc)) {
/* Let the next compaction start anew. */
- zone->compact_cached_migrate_pfn[0] = zone->zone_start_pfn;
- zone->compact_cached_migrate_pfn[1] = zone->zone_start_pfn;
- zone->compact_cached_free_pfn = zone_end_pfn(zone);
+ reset_cached_positions(zone);
/*
* Mark that the PG_migrate_skip information should be cleared
@@ -1187,8 +1191,10 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
* is about to be retried after being deferred. kswapd does not do
* this reset as it'll reset the cached information when going to sleep.
*/
- if (compaction_restarting(zone, cc->order) && !current_is_kswapd())
+ if (compaction_restarting(zone, cc->order) && !current_is_kswapd()) {
__reset_isolation_suitable(zone);
+ reset_cached_positions(zone);
+ }
/*
* Setup to move all movable pages to the end of the zone. Used cached
--
2.1.2
* [RFC PATCH 4/5] mm, compaction: allow scanners to start at any pfn within the zone
2015-01-19 10:05 [RFC PATCH 0/5] compaction: changing initial position of scanners Vlastimil Babka
` (2 preceding siblings ...)
2015-01-19 10:05 ` [PATCH 3/5] mm, compaction: encapsulate resetting cached scanner positions Vlastimil Babka
@ 2015-01-19 10:05 ` Vlastimil Babka
2015-01-20 15:18 ` Zhang Yanfei
2015-01-19 10:05 ` [RFC PATCH 5/5] mm, compaction: set pivot pfn to the pfn when scanners met last time Vlastimil Babka
` (3 subsequent siblings)
7 siblings, 1 reply; 22+ messages in thread
From: Vlastimil Babka @ 2015-01-19 10:05 UTC (permalink / raw)
To: linux-mm
Cc: linux-kernel, Andrew Morton, Vlastimil Babka, Minchan Kim,
Mel Gorman, Joonsoo Kim, Michal Nazarewicz, Naoya Horiguchi,
Christoph Lameter, Rik van Riel, David Rientjes
Compaction employs two page scanners - the migration scanner isolates pages to
be the source of migration, and the free page scanner isolates pages to be the
target of migration. Currently, the migration scanner starts at the zone's
first pageblock and progresses towards the last one, while the free scanner
starts at the last pageblock and progresses towards the first. Within a
pageblock, each scanner scans pages from the first to the last one. When the
scanners meet within the same pageblock, compaction terminates.

One consequence of the current scheme, which turns out to be unfortunate, is
that the migration scanner never encounters the pageblocks that were scanned by
the free scanner. In a test with stress-highalloc from mmtests, the scanners
were observed to meet around the middle of the zone in the first two phases
(with background memory pressure) of the test when executed after a fresh
reboot. On further executions without reboot, the meeting point shifts to
roughly a third of the zone, and both compaction activity and allocation
success rates deteriorate compared to the run after a fresh reboot.

It turns out that the deterioration is indeed due to the migration scanner
processing only a small part of the zone. Compaction also keeps making this
bias worse through its own activity: by moving all migratable pages towards the
end of the zone, it forces the free scanner to scan a lot of full pageblocks to
find more free pages. The beginning of the zone contains pageblocks that have
been compacted as much as possible, but the free pages there cannot be merged
into larger orders due to unmovable pages. The rest of the zone might contain
more suitable pageblocks, but the migration scanner will not reach them. It
also isn't able to move movable pages out of unmovable pageblocks there, which
affects fragmentation.

This patch is the first step towards removing this bias. It allows the
compaction scanners to start at an arbitrary pfn (aligned to a pageblock
boundary for practical purposes), called the pivot, within the zone. The
migration scanner starts at the exact pivot pfn, while the free scanner starts
at the pageblock preceding the pivot. The directions of scanning are
unaffected, but when the migration scanner reaches the last pageblock of the
zone, or the free scanner reaches the first pageblock, it wraps around and
continues with the first or the last pageblock, respectively. Compaction
terminates when either scanner has wrapped and both meet within the same
pageblock.

For easier bisection of potential regressions, this patch always uses the
zone's first pfn as the pivot. That means the free scanner immediately wraps to
the last pageblock and the operation of the scanners is thus unchanged. The
actual changing of the pivot is done by the next patch.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
---
include/linux/mmzone.h | 2 +
mm/compaction.c | 204 +++++++++++++++++++++++++++++++++++++++++++------
mm/internal.h | 1 +
3 files changed, 182 insertions(+), 25 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 2f0856d..47aa181 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -503,6 +503,8 @@ struct zone {
unsigned long percpu_drift_mark;
#if defined CONFIG_COMPACTION || defined CONFIG_CMA
+ /* pfn where compaction scanners have initially started last time */
+ unsigned long compact_cached_pivot_pfn;
/* pfn where compaction free scanner should start */
unsigned long compact_cached_free_pfn;
/* pfn where async and sync compaction migration scanner should start */
diff --git a/mm/compaction.c b/mm/compaction.c
index 5626220..abae89a 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -123,11 +123,16 @@ static inline bool isolation_suitable(struct compact_control *cc,
return !get_pageblock_skip(page);
}
+/*
+ * Invalidate cached compaction scanner positions, so that compact_zone()
+ * will reinitialize them on the next compaction.
+ */
static void reset_cached_positions(struct zone *zone)
{
- zone->compact_cached_migrate_pfn[0] = zone->zone_start_pfn;
- zone->compact_cached_migrate_pfn[1] = zone->zone_start_pfn;
- zone->compact_cached_free_pfn = zone_end_pfn(zone);
+ /* Invalid values are re-initialized in compact_zone */
+ zone->compact_cached_migrate_pfn[0] = 0;
+ zone->compact_cached_migrate_pfn[1] = 0;
+ zone->compact_cached_free_pfn = 0;
}
/*
@@ -172,11 +177,35 @@ void reset_isolation_suitable(pg_data_t *pgdat)
/* Only flush if a full compaction finished recently */
if (zone->compact_blockskip_flush) {
__reset_isolation_suitable(zone);
reset_cached_positions(zone);
}
}
}
+static void update_cached_migrate_pfn(unsigned long pfn,
+ unsigned long pivot_pfn, unsigned long *old_pfn)
+{
+ /* Both old and new pfn either wrapped or not, and new is higher */
+ if (((*old_pfn >= pivot_pfn) == (pfn >= pivot_pfn))
+ && (pfn > *old_pfn))
+ *old_pfn = pfn;
+ /* New pfn has wrapped and the old didn't yet */
+ else if ((*old_pfn >= pivot_pfn) && (pfn < pivot_pfn))
+ *old_pfn = pfn;
+}
+
+static void update_cached_free_pfn(unsigned long pfn,
+ unsigned long pivot_pfn, unsigned long *old_pfn)
+{
+ /* Both old and new pfn either wrapped or not, and new is lower */
+ if (((*old_pfn < pivot_pfn) == (pfn < pivot_pfn))
+ && (pfn < *old_pfn))
+ *old_pfn = pfn;
+ /* New pfn has wrapped and the old didn't yet */
+ else if ((*old_pfn < pivot_pfn) && (pfn >= pivot_pfn))
+ *old_pfn = pfn;
+}
+
/*
* If no pages were isolated then mark this pageblock to be skipped in the
* future. The information is later cleared by __reset_isolation_suitable().
@@ -186,6 +215,7 @@ static void update_pageblock_skip(struct compact_control *cc,
bool migrate_scanner)
{
struct zone *zone = cc->zone;
+ unsigned long pivot_pfn = cc->pivot_pfn;
unsigned long pfn;
if (cc->ignore_skip_hint)
@@ -203,14 +233,14 @@ static void update_pageblock_skip(struct compact_control *cc,
/* Update where async and sync compaction should restart */
if (migrate_scanner) {
- if (pfn > zone->compact_cached_migrate_pfn[0])
- zone->compact_cached_migrate_pfn[0] = pfn;
- if (cc->mode != MIGRATE_ASYNC &&
- pfn > zone->compact_cached_migrate_pfn[1])
- zone->compact_cached_migrate_pfn[1] = pfn;
+ update_cached_migrate_pfn(pfn, pivot_pfn,
+ &zone->compact_cached_migrate_pfn[0]);
+ if (cc->mode != MIGRATE_ASYNC)
+ update_cached_migrate_pfn(pfn, pivot_pfn,
+ &zone->compact_cached_migrate_pfn[1]);
} else {
- if (pfn < zone->compact_cached_free_pfn)
- zone->compact_cached_free_pfn = pfn;
+ update_cached_free_pfn(pfn, pivot_pfn,
+ &zone->compact_cached_free_pfn);
}
}
#else
@@ -808,14 +838,41 @@ isolate_migratepages_range(struct compact_control *cc, unsigned long start_pfn,
#endif /* CONFIG_COMPACTION || CONFIG_CMA */
#ifdef CONFIG_COMPACTION
+static inline bool migrate_scanner_wrapped(struct compact_control *cc)
+{
+ return cc->migrate_pfn < cc->pivot_pfn;
+}
+
+static inline bool free_scanner_wrapped(struct compact_control *cc)
+{
+ return cc->free_pfn >= cc->pivot_pfn;
+}
+
/*
* Test whether the free scanner has reached the same or lower pageblock than
* the migration scanner, and compaction should thus terminate.
*/
static inline bool compact_scanners_met(struct compact_control *cc)
{
- return (cc->free_pfn >> pageblock_order)
- <= (cc->migrate_pfn >> pageblock_order);
+ bool free_below_migrate = (cc->free_pfn >> pageblock_order)
+ <= (cc->migrate_pfn >> pageblock_order);
+
+ if (migrate_scanner_wrapped(cc) != free_scanner_wrapped(cc))
+ /*
+ * Only one of the scanners have wrapped. Terminate if free
+ * scanner is in the same or lower pageblock than migration
+ * scanner.
+ */
+ return free_below_migrate;
+ else
+ /*
+ * If neither scanner has wrapped, then free < start <=
+ * migration and we return false by definition.
+ * It shouldn't happen that both have wrapped, but even if it
+ * does due to e.g. reading mismatched zone cached pfn's, then
+ * migration < start <= free, so we return true and terminate.
+ */
+ return !free_below_migrate;
}
/*
@@ -832,7 +889,10 @@ static void isolate_freepages(struct compact_control *cc)
unsigned long low_pfn; /* lowest pfn scanner is able to scan */
int nr_freepages = cc->nr_freepages;
struct list_head *freelist = &cc->freepages;
+ bool wrapping; /* set to true in the first pageblock of the zone */
+ bool wrapped; /* set to true when either scanner has wrapped */
+wrap:
/*
* Initialise the free scanner. The starting point is where we last
* successfully isolated from, zone-cached value, or the end of the
@@ -848,14 +908,25 @@ static void isolate_freepages(struct compact_control *cc)
block_start_pfn = cc->free_pfn & ~(pageblock_nr_pages-1);
block_end_pfn = min(block_start_pfn + pageblock_nr_pages,
zone_end_pfn(zone));
- low_pfn = ALIGN(cc->migrate_pfn + 1, pageblock_nr_pages);
+
+ wrapping = false;
+ wrapped = free_scanner_wrapped(cc) || migrate_scanner_wrapped(cc);
+ if (!wrapped)
+ /*
+ * If neither scanner wrapped yet, we are limited by zone's
+ * beginning. Here we pretend that the zone starts pageblock
+ * aligned to make the for-loop condition simpler.
+ */
+ low_pfn = zone->zone_start_pfn & ~(pageblock_nr_pages-1);
+ else
+ low_pfn = ALIGN(cc->migrate_pfn + 1, pageblock_nr_pages);
/*
* Isolate free pages until enough are available to migrate the
* pages on cc->migratepages. We stop searching if the migrate
* and free page scanners meet or enough free pages are isolated.
*/
- for (; block_start_pfn >= low_pfn;
+ for (; !wrapping && block_start_pfn >= low_pfn;
block_end_pfn = block_start_pfn,
block_start_pfn -= pageblock_nr_pages,
isolate_start_pfn = block_start_pfn) {
@@ -870,6 +941,24 @@ static void isolate_freepages(struct compact_control *cc)
&& compact_should_abort(cc))
break;
+ /*
+ * When we are limited by zone boundary, this means we have
+ * reached its first pageblock.
+ */
+ if (!wrapped && block_start_pfn <= zone->zone_start_pfn) {
+ /* The zone might start in the middle of the pageblock */
+ block_start_pfn = zone->zone_start_pfn;
+ if (isolate_start_pfn <= zone->zone_start_pfn)
+ isolate_start_pfn = zone->zone_start_pfn;
+ /*
+ * For e.g. DMA zone with zone_start_pfn == 1, we will
+ * underflow block_start_pfn in the next loop
+ * iteration. We have to terminate the loop with other
+ * means.
+ */
+ wrapping = true;
+ }
+
page = pageblock_pfn_to_page(block_start_pfn, block_end_pfn,
zone);
if (!page)
@@ -903,6 +992,12 @@ static void isolate_freepages(struct compact_control *cc)
if (isolate_start_pfn >= block_end_pfn)
isolate_start_pfn =
block_start_pfn - pageblock_nr_pages;
+ else if (wrapping)
+ /*
+ * We have been in the first pageblock of the
+ * zone, but have not finished it yet.
+ */
+ wrapping = false;
break;
} else {
/*
@@ -913,6 +1008,20 @@ static void isolate_freepages(struct compact_control *cc)
}
}
+ /* Did we reach the beginning of the zone? Wrap to the end. */
+ if (!wrapped && wrapping) {
+ isolate_start_pfn = (zone_end_pfn(zone)-1) &
+ ~(pageblock_nr_pages-1);
+ /*
+ * If we haven't isolated anything, we have to continue
+ * immediately, otherwise page migration will fail.
+ */
+ if (!nr_freepages && !cc->contended) {
+ cc->free_pfn = isolate_start_pfn;
+ goto wrap;
+ }
+ }
+
/* split_free_page does not map the pages */
map_pages(freelist);
@@ -984,10 +1093,11 @@ typedef enum {
static isolate_migrate_t isolate_migratepages(struct zone *zone,
struct compact_control *cc)
{
- unsigned long low_pfn, end_pfn;
+ unsigned long low_pfn, end_pfn, max_pfn;
struct page *page;
const isolate_mode_t isolate_mode =
(cc->mode == MIGRATE_ASYNC ? ISOLATE_ASYNC_MIGRATE : 0);
+ bool wrapped = migrate_scanner_wrapped(cc) || free_scanner_wrapped(cc);
/*
* Start at where we last stopped, or beginning of the zone as
@@ -998,13 +1108,27 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
/* Only scan within a pageblock boundary */
end_pfn = ALIGN(low_pfn + 1, pageblock_nr_pages);
+ if (!wrapped) {
+ /*
+ * Neither of the scanners has wrapped yet, we are limited by
+ * zone end. Here we pretend it's aligned to pageblock
+ * boundary to make the for-loop condition simpler
+ */
+ max_pfn = ALIGN(zone_end_pfn(zone), pageblock_nr_pages);
+ } else {
+ /* If any of the scanners wrapped, we will meet free scanner */
+ max_pfn = cc->free_pfn;
+ }
+
/*
* Iterate over whole pageblocks until we find the first suitable.
- * Do not cross the free scanner.
+ * Do not cross the free scanner or the end of the zone.
*/
- for (; end_pfn <= cc->free_pfn;
+ for (; end_pfn <= max_pfn;
low_pfn = end_pfn, end_pfn += pageblock_nr_pages) {
+ if (!wrapped && end_pfn > zone_end_pfn(zone))
+ end_pfn = zone_end_pfn(zone);
/*
* This can potentially iterate a massively long zone with
* many pageblocks unsuitable, so periodically check if we
@@ -1047,6 +1171,10 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
}
acct_isolated(zone, cc);
+ /* Did we reach the end of the zone? Wrap to the beginning */
+ if (!wrapped && low_pfn >= zone_end_pfn(zone))
+ low_pfn = zone->zone_start_pfn;
+
/* Record where migration scanner will be restarted. */
cc->migrate_pfn = low_pfn;
@@ -1197,22 +1325,48 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
}
/*
- * Setup to move all movable pages to the end of the zone. Used cached
+ * Setup the scanner positions according to pivot pfn. Use cached
* information on where the scanners should start but check that it
* is initialised by ensuring the values are within zone boundaries.
*/
- cc->migrate_pfn = zone->compact_cached_migrate_pfn[sync];
- cc->free_pfn = zone->compact_cached_free_pfn;
- if (cc->free_pfn < start_pfn || cc->free_pfn > end_pfn) {
- cc->free_pfn = end_pfn & ~(pageblock_nr_pages-1);
- zone->compact_cached_free_pfn = cc->free_pfn;
+ cc->pivot_pfn = zone->compact_cached_pivot_pfn;
+ if (cc->pivot_pfn < start_pfn || cc->pivot_pfn > end_pfn) {
+ cc->pivot_pfn = start_pfn;
+ zone->compact_cached_pivot_pfn = cc->pivot_pfn;
+ /* When starting position was invalid, reset the rest */
+ reset_cached_positions(zone);
}
+
+ cc->migrate_pfn = zone->compact_cached_migrate_pfn[sync];
if (cc->migrate_pfn < start_pfn || cc->migrate_pfn > end_pfn) {
- cc->migrate_pfn = start_pfn;
+ cc->migrate_pfn = cc->pivot_pfn;
zone->compact_cached_migrate_pfn[0] = cc->migrate_pfn;
zone->compact_cached_migrate_pfn[1] = cc->migrate_pfn;
}
+ cc->free_pfn = zone->compact_cached_free_pfn;
+ if (cc->free_pfn < start_pfn || cc->free_pfn > end_pfn)
+ cc->free_pfn = cc->pivot_pfn;
+
+ /*
+ * Free scanner should start on the beginning of the pageblock below
+ * the cc->pivot_pfn. If that's below the zone boundary, wrap to the
+ * last pageblock of the zone.
+ */
+ if (cc->free_pfn == cc->pivot_pfn) {
+ /* Don't underflow in zones starting with e.g. pfn 1 */
+ if (cc->pivot_pfn < pageblock_nr_pages) {
+ cc->free_pfn = (end_pfn-1) & ~(pageblock_nr_pages-1);
+ } else {
+ cc->free_pfn = (cc->pivot_pfn - pageblock_nr_pages);
+ cc->free_pfn &= ~(pageblock_nr_pages-1);
+ if (cc->free_pfn < start_pfn)
+ cc->free_pfn = (end_pfn-1) &
+ ~(pageblock_nr_pages-1);
+ }
+ zone->compact_cached_free_pfn = cc->free_pfn;
+ }
+
trace_mm_compaction_begin(start_pfn, cc->migrate_pfn, cc->free_pfn, end_pfn);
migrate_prep_local();
diff --git a/mm/internal.h b/mm/internal.h
index efad241..cb7b297 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -157,6 +157,7 @@ struct compact_control {
struct list_head migratepages; /* List of pages being migrated */
unsigned long nr_freepages; /* Number of isolated free pages */
unsigned long nr_migratepages; /* Number of pages to migrate */
+ unsigned long pivot_pfn; /* Where the scanners initially start */
unsigned long free_pfn; /* isolate_freepages search base */
unsigned long migrate_pfn; /* isolate_migratepages search base */
enum migrate_mode mode; /* Async or sync migration mode */
--
2.1.2
* [RFC PATCH 5/5] mm, compaction: set pivot pfn to the pfn when scanners met last time
2015-01-19 10:05 [RFC PATCH 0/5] compaction: changing initial position of scanners Vlastimil Babka
` (3 preceding siblings ...)
2015-01-19 10:05 ` [RFC PATCH 4/5] mm, compaction: allow scanners to start at any pfn within the zone Vlastimil Babka
@ 2015-01-19 10:05 ` Vlastimil Babka
2015-01-19 10:10 ` [RFC PATCH 0/5] compaction: changing initial position of scanners Vlastimil Babka
` (2 subsequent siblings)
7 siblings, 0 replies; 22+ messages in thread
From: Vlastimil Babka @ 2015-01-19 10:05 UTC (permalink / raw)
To: linux-mm
Cc: linux-kernel, Andrew Morton, Vlastimil Babka, Minchan Kim,
Mel Gorman, Joonsoo Kim, Michal Nazarewicz, Naoya Horiguchi,
Christoph Lameter, Rik van Riel, David Rientjes
The previous patch has prepared the compaction scanners to start at an
arbitrary pivot pfn within the zone, but left the pivot at the first pfn of the
zone. This patch introduces the actual changing of the pivot pfn.

Our goal is to remove the bias in compaction under memory pressure, where the
migration scanner scans only the first half (or less) of the zone, in which it
cannot succeed anymore. At the same time we want to avoid frequent changes of
the pivot, which would result in migrating pages back and forth without much
benefit. So the question is how often to change the pivot, and to which pfn it
should be set.

Another thing to consider is that the scanners mark pageblocks as unsuitable
for scanning via update_pageblock_skip(), which is a single bit per pageblock.
However, a pageblock being unsuitable as a source of free pages is a completely
different condition from a pageblock being unsuitable as a source of migratable
pages. Thus, changing the pivot should be accompanied by resetting the skip
bits. The resetting is currently done either when kswapd goes to sleep, or when
compaction is being restarted after the longest possible deferred compaction
period.

Thus, as a conservative first step, this patch does not increase the frequency
of skip bit resetting, and ties changing the pivot only to the situations where
compaction is restarted from being deferred. This happens when compaction has
failed a lot with the previous pivot, and most pageblocks were already marked
as unsuitable. Thus, most migrations occurred relatively long ago and we are
not going to frequently migrate pages back and forth.

The pivot position is simply set to the pageblock where the scanners have met
during the last finished compaction. This means the migration scanner will
immediately scan pageblocks that it couldn't reach with the previous pivot.
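For illustration (this picture is not part of the patch itself), if the
scanners met at pfn M during the last finished compaction, the next restart
from deferred compaction begins like this:

zone_start          M = new pivot                         zone_end
    |                     |                                   |
    -----------------------------------------------------------
          <= |FFFFFFFMMMMMM|  =>
         free_pfn     migrate_pfn

The migration scanner thus proceeds through the part of the zone above M that
was unreachable with the previous pivot, before wrapping to the zone start.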
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
---
include/linux/mmzone.h | 2 ++
mm/compaction.c | 22 +++++++++++++++++-----
2 files changed, 19 insertions(+), 5 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 47aa181..7801886 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -505,6 +505,8 @@ struct zone {
#if defined CONFIG_COMPACTION || defined CONFIG_CMA
/* pfn where compaction scanners have initially started last time */
unsigned long compact_cached_pivot_pfn;
+ /* pfn where compaction scanners have met last time */
+ unsigned long compact_cached_last_met_pfn;
/* pfn where compaction free scanner should start */
unsigned long compact_cached_free_pfn;
/* pfn where async and sync compaction migration scanner should start */
diff --git a/mm/compaction.c b/mm/compaction.c
index abae89a..70792c5 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -125,10 +125,16 @@ static inline bool isolation_suitable(struct compact_control *cc,
/*
* Invalidate cached compaction scanner positions, so that compact_zone()
- * will reinitialize them on the next compaction.
+ * will reinitialize them on the next compaction. Optionally reset the
+ * initial pivot position for the scanners to the position where the scanners
+ * have met the last time.
*/
-static void reset_cached_positions(struct zone *zone)
+static void reset_cached_positions(struct zone *zone, bool update_pivot)
{
+ if (update_pivot)
+ zone->compact_cached_pivot_pfn =
+ zone->compact_cached_last_met_pfn;
+
/* Invalid values are re-initialized in compact_zone */
zone->compact_cached_migrate_pfn[0] = 0;
zone->compact_cached_migrate_pfn[1] = 0;
@@ -180,7 +186,7 @@ void reset_isolation_suitable(pg_data_t *pgdat)
/* Only flush if a full compaction finished recently */
if (zone->compact_blockskip_flush) {
__reset_isolation_suitable(zone);
- reset_cached_positions(zone);
+ reset_cached_positions(zone, false);
}
}
}
@@ -1193,7 +1199,13 @@ static int compact_finished(struct zone *zone, struct compact_control *cc,
/* Compaction run completes if the migrate and free scanner meet */
if (compact_scanners_met(cc)) {
/* Let the next compaction start anew. */
- reset_cached_positions(zone);
+ reset_cached_positions(zone, false);
+ /*
+ * Remember where compaction scanners met for the next time
+ * the pivot pfn is changed.
+ */
+ zone->compact_cached_last_met_pfn =
+ cc->migrate_pfn & ~(pageblock_nr_pages-1);
/*
* Mark that the PG_migrate_skip information should be cleared
@@ -1321,7 +1333,7 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
*/
if (compaction_restarting(zone, cc->order) && !current_is_kswapd()) {
__reset_isolation_suitable(zone);
- reset_cached_positions(zone);
+ reset_cached_positions(zone, true);
}
/*
@@ -1334,7 +1346,7 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
cc->pivot_pfn = start_pfn;
zone->compact_cached_pivot_pfn = cc->pivot_pfn;
/* When starting position was invalid, reset the rest */
- reset_cached_positions(zone);
+ reset_cached_positions(zone, false);
}
cc->migrate_pfn = zone->compact_cached_migrate_pfn[sync];
--
2.1.2
* Re: [RFC PATCH 0/5] compaction: changing initial position of scanners
2015-01-19 10:05 [RFC PATCH 0/5] compaction: changing initial position of scanners Vlastimil Babka
` (4 preceding siblings ...)
2015-01-19 10:05 ` [RFC PATCH 5/5] mm, compaction: set pivot pfn to the pfn when scanners met last time Vlastimil Babka
@ 2015-01-19 10:10 ` Vlastimil Babka
2015-01-20 9:02 ` Vlastimil Babka
2015-02-03 6:49 ` Joonsoo Kim
7 siblings, 0 replies; 22+ messages in thread
From: Vlastimil Babka @ 2015-01-19 10:10 UTC (permalink / raw)
To: linux-mm
Cc: linux-kernel, Andrew Morton, David Rientjes, Christoph Lameter,
Joonsoo Kim, Mel Gorman, Michal Nazarewicz, Minchan Kim,
Naoya Horiguchi, Rik van Riel
Looks like wrapping got busted in the results part. Let's try again:
-----
Even after all the patches compaction received in last several versions, it
turns out that its effectivneess degrades considerably as the system ages
after reboot. For example, see how success rates of stress-highalloc from
mmtests degrades when we re-execute it several times, first time being after
fresh reboot:
3.19-rc4 3.19-rc4 3.19-rc4
4-nothp-1 4-nothp-2 4-nothp-3
Success 1 Min 25.00 ( 0.00%) 13.00 ( 48.00%) 9.00 ( 64.00%)
Success 1 Mean 36.20 ( 0.00%) 23.40 ( 35.36%) 16.40 ( 54.70%)
Success 1 Max 41.00 ( 0.00%) 34.00 ( 17.07%) 25.00 ( 39.02%)
Success 2 Min 25.00 ( 0.00%) 15.00 ( 40.00%) 10.00 ( 60.00%)
Success 2 Mean 37.20 ( 0.00%) 25.00 ( 32.80%) 17.20 ( 53.76%)
Success 2 Max 44.00 ( 0.00%) 36.00 ( 18.18%) 25.00 ( 43.18%)
Success 3 Min 84.00 ( 0.00%) 81.00 ( 3.57%) 78.00 ( 7.14%)
Success 3 Mean 85.80 ( 0.00%) 82.80 ( 3.50%) 80.40 ( 6.29%)
Success 3 Max 87.00 ( 0.00%) 84.00 ( 3.45%) 82.00 ( 5.75%)
Wouldn't it be much better, if it looked like this?
3.18 3.18 3.18
3.19-rc4 3.19-rc4 3.19-rc4
5-nothp-1 5-nothp-2 5-nothp-3
Success 1 Min 49.00 ( 0.00%) 42.00 ( 14.29%) 41.00 ( 16.33%)
Success 1 Mean 51.00 ( 0.00%) 45.00 ( 11.76%) 42.60 ( 16.47%)
Success 1 Max 55.00 ( 0.00%) 51.00 ( 7.27%) 46.00 ( 16.36%)
Success 2 Min 53.00 ( 0.00%) 47.00 ( 11.32%) 44.00 ( 16.98%)
Success 2 Mean 59.60 ( 0.00%) 50.80 ( 14.77%) 48.20 ( 19.13%)
Success 2 Max 64.00 ( 0.00%) 56.00 ( 12.50%) 52.00 ( 18.75%)
Success 3 Min 84.00 ( 0.00%) 82.00 ( 2.38%) 78.00 ( 7.14%)
Success 3 Mean 85.60 ( 0.00%) 82.80 ( 3.27%) 79.40 ( 7.24%)
Success 3 Max 86.00 ( 0.00%) 83.00 ( 3.49%) 80.00 ( 6.98%)
In my humble opinion, it would :) Much lower degradation, and a nice
improvement in the first iteration as a bonus.
So what sorcery is this? Nothing much, just a fundamental change of the
compaction scanners operation...
As everyone knows [1] the migration scanner starts at the first pageblock
of a zone, and goes towards the end, and the free scanner starts at the
last pageblock and goes towards the beginning. Somewhere in the middle of the
zone, the scanners meet:
zone_start zone_end
| |
-------------------------------------------------------------
MMMMMMMMMMMMM| => <= |FFFFFFFFFFFF
migrate_pfn free_pfn
In my tests, the scanners meet around the middle of the pageblock on the first
iteration, and around the 1/3 on subsequent iterations. Which means the
migration scanner doesn't see the larger part of the zone at all. For more
details why it's bad, see Patch 4 description.
To make sure we eventually scan the whole zone with the migration scanner, we
could e.g. reverse the directions after each run. But that would still be
biased, and with 1/3 of zone reachable from each side, we would still omit the
middle 1/3 of a zone.
Or we could stop terminating compaction when the scanners meet, and let them
continue to scan the whole zone. Mel told me it used to be the case long ago,
but that approach would result in migrating pages back and forth during single
compaction run, which wouldn't be cool.
So the approach taken by this patchset is to let scanners start at any
so-called pivot pfn within the zone, and keep their direction:
zone_start pivot zone_end
| | |
-------------------------------------------------------------
<= |FFFFFFFMMMMMM| =>
free_pfn migrate_pfn
Eventually, one of the scanners will reach the zone boundary and wrap around,
e.g. the in the case of the free scanner:
zone_start pivot zone_end
| | |
-------------------------------------------------------------
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFMMMMMMMMMMMM| => <= |F
migrate_pfn free_pfn
Compaction will again terminate when the scanners meet.
As you can imagine, the required code changes made the termination detection
and the scanners themselves quite hairy. There are lots of corner cases and
the code is often hard to wrap one's head around [puns intended]. The scanner
functions isolate_migratepages() and isolate_freepages() were recently cleaned
up a lot, and this makes them messy again, as they can no longer rely on the
fact that they will meet the other scanner and not the zone boundary.
But the improvements seem to make these complications worth, and I hope
somebody can suggest more elegant solutions to the various parts of the code.
So here it is as a RFC. Patches 1-3 are cleanups that could be applied in any
case. Patch 4 implements the main changes, but leaves the pivot to be the
first zone's pfn, so that free scanner wraps immediately and there's no
actual change. Patch 5 updates the pivot in a conservative way, as explained
in the changelog.
Even with the conservative approach, stress-highalloc results are improved,
mainly on the 2+ iterations after fresh reboot. Here are again the results
before the patches, including compaction stats:
3.19-rc4 3.19-rc4 3.19-rc4
4-nothp-1 4-nothp-2 4-nothp-3
Success 1 Min 25.00 ( 0.00%) 13.00 ( 48.00%) 9.00 ( 64.00%)
Success 1 Mean 36.20 ( 0.00%) 23.40 ( 35.36%) 16.40 ( 54.70%)
Success 1 Max 41.00 ( 0.00%) 34.00 ( 17.07%) 25.00 ( 39.02%)
Success 2 Min 25.00 ( 0.00%) 15.00 ( 40.00%) 10.00 ( 60.00%)
Success 2 Mean 37.20 ( 0.00%) 25.00 ( 32.80%) 17.20 ( 53.76%)
Success 2 Max 44.00 ( 0.00%) 36.00 ( 18.18%) 25.00 ( 43.18%)
Success 3 Min 84.00 ( 0.00%) 81.00 ( 3.57%) 78.00 ( 7.14%)
Success 3 Mean 85.80 ( 0.00%) 82.80 ( 3.50%) 80.40 ( 6.29%)
Success 3 Max 87.00 ( 0.00%) 84.00 ( 3.45%) 82.00 ( 5.75%)
3.19-rc4 3.19-rc4 3.19-rc4
4-nothp-1 4-nothp-2 4-nothp-3
User 6781.16 6931.44 6905.44
System 1073.97 1071.20 1067.92
Elapsed 2349.71 2290.40 2255.59
3.19-rc4 3.19-rc4 3.19-rc4
4-nothp-1 4-nothp-2 4-nothp-3
Minor Faults 198270153 197020929 197146656
Major Faults 498 470 527
Swap Ins 60 34 105
Swap Outs 2780 1011 425
Allocation stalls 8325 4615 4769
DMA allocs 161 177 141
DMA32 allocs 124477754 123412713 123385291
Normal allocs 59366406 59040066 58909035
Movable allocs 0 0 0
Direct pages scanned 413858 280997 267341
Kswapd pages scanned 2204723 2184556 2147546
Kswapd pages reclaimed 2199430 2181305 2144395
Direct pages reclaimed 413062 280323 266557
Kswapd efficiency 99% 99% 99%
Kswapd velocity 914.497 977.868 943.877
Direct efficiency 99% 99% 99%
Direct velocity 171.664 125.782 117.500
Percentage direct scans 15% 11% 11%
Zone normal velocity 359.993 356.888 339.577
Zone dma32 velocity 726.152 746.745 721.784
Zone dma velocity 0.016 0.017 0.016
Page writes by reclaim 2893.000 1119.800 507.600
Page writes file 112 108 81
Page writes anon 2780 1011 425
Page reclaim immediate 1134 1197 1412
Sector Reads 4747006 4606605 4524025
Sector Writes 12759301 12562316 12629243
Page rescued immediate 0 0 0
Slabs scanned 1807821 1528751 1498929
Direct inode steals 24428 12649 13346
Kswapd inode steals 33213 29804 30194
Kswapd skipped wait 0 0 0
THP fault alloc 217 207 222
THP collapse alloc 500 539 373
THP splits 13 14 13
THP fault fallback 50 9 16
THP collapse fail 16 17 22
Compaction stalls 3136 2310 2072
Compaction success 1123 897 828
Compaction failures 2012 1413 1244
Page migrate success 4319697 3682666 3133699
Page migrate failure 19012 9964 7488
Compaction pages isolated 8974417 7634911 6528981
Compaction migrate scanned 182447073 122423031 127292737
Compaction free scanned 389193883 291257587 269516820
Compaction cost 5931 4824 4267
As the allocation success rates degrade, so do the compaction success rates,
and so does the scanner activity.
Now after the series:
3.19-rc4 3.19-rc4 3.19-rc4
5-nothp-1 5-nothp-2 5-nothp-3
Success 1 Min 49.00 ( 0.00%) 42.00 ( 14.29%) 41.00 ( 16.33%)
Success 1 Mean 51.00 ( 0.00%) 45.00 ( 11.76%) 42.60 ( 16.47%)
Success 1 Max 55.00 ( 0.00%) 51.00 ( 7.27%) 46.00 ( 16.36%)
Success 2 Min 53.00 ( 0.00%) 47.00 ( 11.32%) 44.00 ( 16.98%)
Success 2 Mean 59.60 ( 0.00%) 50.80 ( 14.77%) 48.20 ( 19.13%)
Success 2 Max 64.00 ( 0.00%) 56.00 ( 12.50%) 52.00 ( 18.75%)
Success 3 Min 84.00 ( 0.00%) 82.00 ( 2.38%) 78.00 ( 7.14%)
Success 3 Mean 85.60 ( 0.00%) 82.80 ( 3.27%) 79.40 ( 7.24%)
Success 3 Max 86.00 ( 0.00%) 83.00 ( 3.49%) 80.00 ( 6.98%)
3.19-rc4 3.19-rc4 3.19-rc4
5-nothp-1 5-nothp-2 5-nothp-3
User 6675.75 6742.26 6707.92
System 1069.78 1070.77 1070.25
Elapsed 2450.31 2363.98 2442.21
3.19-rc4 3.19-rc4 3.19-rc4
5-nothp-1 5-nothp-2 5-nothp-3
Minor Faults 197652452 197164571 197946824
Major Faults 882 743 934
Swap Ins 113 96 144
Swap Outs 2144 1340 1799
Allocation stalls 10304 9060 10261
DMA allocs 142 227 181
DMA32 allocs 124497911 123947671 124010151
Normal allocs 59233600 59160910 59669421
Movable allocs 0 0 0
Direct pages scanned 591368 481119 525158
Kswapd pages scanned 2217381 2164036 2170388
Kswapd pages reclaimed 2212285 2160162 2166794
Direct pages reclaimed 590064 480297 524013
Kswapd efficiency 99% 99% 99%
Kswapd velocity 921.119 937.864 936.558
Direct efficiency 99% 99% 99%
Direct velocity 245.659 208.511 226.615
Percentage direct scans 21% 18% 19%
Zone normal velocity 378.957 376.819 383.604
Zone dma32 velocity 787.805 769.530 779.552
Zone dma velocity 0.015 0.025 0.016
Page writes by reclaim 2284.600 1432.400 1972.400
Page writes file 140 92 173
Page writes anon 2144 1340 1799
Page reclaim immediate 1689 1315 1440
Sector Reads 4920699 4830343 4830263
Sector Writes 12643658 12588410 12713518
Page rescued immediate 0 0 0
Slabs scanned 2084358 1922421 1962867
Direct inode steals 35881 37170 45512
Kswapd inode steals 35973 35741 30339
Kswapd skipped wait 0 0 0
THP fault alloc 191 224 170
THP collapse alloc 554 533 516
THP splits 15 15 11
THP fault fallback 45 0 76
THP collapse fail 16 18 18
Compaction stalls 4069 3746 3951
Compaction success 1507 1388 1366
Compaction failures 2562 2357 2585
Page migrate success 4533617 4912832 5278583
Page migrate failure 20127 15273 17862
Compaction pages isolated 9495626 10235697 10993942
Compaction migrate scanned 93585708 99204656 92052138
Compaction free scanned 510786018 503529022 541298357
Compaction cost 5541 5988 6332
Less degradation in success rates and no degradation in activity.
Let's look again at the compaction stats without the series:
Compaction stalls 3136 2310 2072
Compaction success 1123 897 828
Compaction failures 2012 1413 1244
Page migrate success 4319697 3682666 3133699
Page migrate failure 19012 9964 7488
Compaction pages isolated 8974417 7634911 6528981
Compaction migrate scanned 182447073 122423031 127292737
Compaction free scanned 389193883 291257587 269516820
Compaction cost 5931 4824 4267
Interestingly, the number of migrate scanned pages was higher before the
series, even twice as high in the first iteration. That suggests the scanner
was indeed scanning a limited number of pageblocks over and over, despite
their compaction potential being "depleted". After the patches, it achieves
better results with fewer scanned blocks, as it has a better chance of
encountering non-depleted blocks. The free scanner activity has increased
after the series, which just means that higher success rates lead to less
deferred compaction. There is also some increase in page migrations, but that
is likely also a consequence of the better success rates, and not of useless
back-and-forth migrations caused by pivot changes.
Preliminary testing with THP-like allocations has shown similar improvements,
which is somewhat surprising, because THP allocations do not use sync
compaction and thus do not defer it; yet changing the pivot is currently tied
to restarting after deferred compaction. Possibly this is due to allocation
activity other than the stress-highalloc test itself. I will post these
results when the full measurements are done.
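To illustrate how the tie-in works, a rough sketch of the restart path,
following patch 3's hunk in compact_zone(); the update_pivot() call is
hypothetical shorthand for the step patch 5 adds there:

	/* restarting after defer: reset skip bits and cached positions */
	if (compaction_restarting(zone, cc->order) && !current_is_kswapd()) {
		__reset_isolation_suitable(zone);
		reset_cached_positions(zone);
		update_pivot(zone);	/* hypothetical: what patch 5 does */
	}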
[1] http://lwn.net/Articles/368869/
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC PATCH 0/5] compaction: changing initial position of scanners
2015-01-19 10:05 [RFC PATCH 0/5] compaction: changing initial position of scanners Vlastimil Babka
` (5 preceding siblings ...)
2015-01-19 10:10 ` [RFC PATCH 0/5] compaction: changing initial position of scanners Vlastimil Babka
@ 2015-01-20 9:02 ` Vlastimil Babka
2015-02-03 6:49 ` Joonsoo Kim
7 siblings, 0 replies; 22+ messages in thread
From: Vlastimil Babka @ 2015-01-20 9:02 UTC (permalink / raw)
To: linux-mm
Cc: linux-kernel, Andrew Morton, David Rientjes, Christoph Lameter,
Joonsoo Kim, Mel Gorman, Michal Nazarewicz, Minchan Kim,
Naoya Horiguchi, Rik van Riel
On 01/19/2015 11:05 AM, Vlastimil Babka wrote:
> Preliminary testing with THP-like allocations has shown similar improvements,
> which is somewhat surprising, because THP allocations do not use sync
> compaction and thus do not defer it; yet changing the pivot is currently tied
> to restarting after deferred compaction. Possibly this is due to allocation
> activity other than the stress-highalloc test itself. I will post these
> results when the full measurements are done.
So here are the full results, before patch 5:
3.19-rc4 3.19-rc4 3.19-rc4
4-thp-1 4-thp-2 4-thp-3
Success 1 Min 24.00 ( 0.00%) 14.00 ( 41.67%) 16.00 ( 33.33%)
Success 1 Mean 29.80 ( 0.00%) 17.00 ( 42.95%) 19.80 ( 33.56%)
Success 1 Max 33.00 ( 0.00%) 25.00 ( 24.24%) 29.00 ( 12.12%)
Success 2 Min 26.00 ( 0.00%) 15.00 ( 42.31%) 17.00 ( 34.62%)
Success 2 Mean 32.60 ( 0.00%) 19.20 ( 41.10%) 21.20 ( 34.97%)
Success 2 Max 37.00 ( 0.00%) 28.00 ( 24.32%) 30.00 ( 18.92%)
Success 3 Min 85.00 ( 0.00%) 82.00 ( 3.53%) 80.00 ( 5.88%)
Success 3 Mean 86.20 ( 0.00%) 83.20 ( 3.48%) 81.20 ( 5.80%)
Success 3 Max 87.00 ( 0.00%) 84.00 ( 3.45%) 82.00 ( 5.75%)
3.19-rc4 3.19-rc4 3.19-rc4
4-thp-1 4-thp-2 4-thp-3
User 6798.70 6905.43 6941.09
System 1064.04 1062.94 1062.76
Elapsed 2108.98 2026.84 2039.40
3.19-rc4 3.19-rc4 3.19-rc4
4-thp-1 4-thp-2 4-thp-3
Minor Faults 198099852 197531505 197503750
Major Faults 483 422 490
Swap Ins 86 55 113
Swap Outs 3138 887 399
Allocation stalls 6523 4619 4816
DMA allocs 20 21 29
DMA32 allocs 124645618 123937328 123927500
Normal allocs 58904757 58753982 58818313
Movable allocs 0 0 0
Direct pages scanned 428919 361735 402238
Kswapd pages scanned 2110062 2070576 2042000
Kswapd pages reclaimed 2104207 2067640 2026147
Direct pages reclaimed 427965 361045 401398
Kswapd efficiency 99% 99% 99%
Kswapd velocity 1019.443 1032.145 1015.507
Direct efficiency 99% 99% 99%
Direct velocity 207.225 180.319 200.037
Percentage direct scans 16% 14% 16%
Zone normal velocity 408.313 391.138 385.933
Zone dma32 velocity 818.355 821.320 829.605
Zone dma velocity 0.000 0.006 0.006
Page writes by reclaim 3352.000 963.000 517.000
Page writes file 213 75 117
Page writes anon 3138 887 399
Page reclaim immediate 1144 1066 14181
Sector Reads 4628816 4522758 4532221
Sector Writes 12767080 12671744 12651826
Page rescued immediate 0 0 0
Slabs scanned 1697125 1504150 1497662
Direct inode steals 18505 12670 12166
Kswapd inode steals 34595 31472 30069
Kswapd skipped wait 0 0 0
THP fault alloc 239 220 235
THP collapse alloc 507 406 478
THP splits 14 13 12
THP fault fallback 28 25 6
THP collapse fail 16 21 19
Compaction stalls 2661 1953 2062
Compaction success 1005 738 832
Compaction failures 1656 1215 1229
Page migrate success 2137343 1941288 1984697
Page migrate failure 9345 5923 7109
Compaction pages isolated 4630988 4169422 4271015
Compaction migrate scanned 42109445 35090957 37926766
Compaction free scanned 150533970 131933800 126682354
Compaction cost 2601 2339 2406
After patch 5:
3.19-rc4 3.19-rc4 3.19-rc4
5-thp-1 5-thp-2 5-thp-3
Success 1 Min 43.00 ( 0.00%) 35.00 ( 18.60%) 33.00 ( 23.26%)
Success 1 Mean 44.80 ( 0.00%) 37.20 ( 16.96%) 36.40 ( 18.75%)
Success 1 Max 46.00 ( 0.00%) 38.00 ( 17.39%) 41.00 ( 10.87%)
Success 2 Min 53.00 ( 0.00%) 44.00 ( 16.98%) 40.00 ( 24.53%)
Success 2 Mean 55.60 ( 0.00%) 47.60 ( 14.39%) 43.60 ( 21.58%)
Success 2 Max 58.00 ( 0.00%) 50.00 ( 13.79%) 46.00 ( 20.69%)
Success 3 Min 85.00 ( 0.00%) 81.00 ( 4.71%) 79.00 ( 7.06%)
Success 3 Mean 86.40 ( 0.00%) 83.40 ( 3.47%) 80.20 ( 7.18%)
Success 3 Max 88.00 ( 0.00%) 85.00 ( 3.41%) 82.00 ( 6.82%)
3.19-rc4 3.19-rc4 3.19-rc4
5-thp-1 5-thp-2 5-thp-3
User 6690.38 6734.92 6772.31
System 1065.54 1063.26 1064.48
Elapsed 2172.84 2153.99 2162.77
3.19-rc4 3.19-rc4 3.19-rc4
5-thp-1 5-thp-2 5-thp-3
Minor Faults 196642124 197072025 196458816
Major Faults 712 629 663
Swap Ins 112 89 119
Swap Outs 2425 1615 955
Allocation stalls 10987 8689 8072
DMA allocs 13 13 12
DMA32 allocs 124048790 124176073 123636557
Normal allocs 58641381 58758990 58548760
Movable allocs 0 0 0
Direct pages scanned 607943 519167 491800
Kswapd pages scanned 2091239 2093530 2080230
Kswapd pages reclaimed 2085935 2090069 2076784
Direct pages reclaimed 607103 518178 490870
Kswapd efficiency 99% 99% 99%
Kswapd velocity 953.698 980.452 978.692
Direct efficiency 99% 99% 99%
Direct velocity 277.249 243.139 231.379
Percentage direct scans 22% 19% 19%
Zone normal velocity 411.137 400.519 393.191
Zone dma32 velocity 819.809 823.068 816.877
Zone dma velocity 0.000 0.004 0.004
Page writes by reclaim 2568.800 1735.400 1112.200
Page writes file 143 120 156
Page writes anon 2425 1615 955
Page reclaim immediate 1316 1240 1460
Sector Reads 4801026 4810421 4766392
Sector Writes 12521168 12599312 12490746
Page rescued immediate 0 0 0
Slabs scanned 1998998 1873533 1826614
Direct inode steals 39894 44540 29447
Kswapd inode steals 34313 24112 32393
Kswapd skipped wait 0 0 0
THP fault alloc 204 227 190
THP collapse alloc 554 555 463
THP splits 14 16 11
THP fault fallback 3 3 0
THP collapse fail 15 16 20
Compaction stalls 3742 3350 3257
Compaction success 1427 1298 1262
Compaction failures 2315 2051 1995
Page migrate success 2557969 2615986 2766293
Page migrate failure 11532 7297 8779
Compaction pages isolated 5601557 5698249 6044777
Compaction migrate scanned 16776380 20331554 22910553
Compaction free scanned 213184847 172781034 169740956
Compaction cost 2879 2966 3146
Success rates are still worse on the second iteration compared to the first,
but much better overall.
Let's repeat the compaction stats before patch 5, for easier comparison:
Compaction stalls 2661 1953 2062
Compaction success 1005 738 832
Compaction failures 1656 1215 1229
Page migrate success 2137343 1941288 1984697
Page migrate failure 9345 5923 7109
Compaction pages isolated 4630988 4169422 4271015
Compaction migrate scanned 42109445 35090957 37926766
Compaction free scanned 150533970 131933800 126682354
Compaction cost 2601 2339 2406
Again, migrate scanned actually went down after patch 5; the rest went up as
compaction was more successful and less deferred.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/5] mm, compaction: more robust check for scanners meeting
2015-01-19 10:05 ` [PATCH 1/5] mm, compaction: more robust check for scanners meeting Vlastimil Babka
@ 2015-01-20 13:26 ` Zhang Yanfei
0 siblings, 0 replies; 22+ messages in thread
From: Zhang Yanfei @ 2015-01-20 13:26 UTC (permalink / raw)
To: Vlastimil Babka, linux-mm
Cc: linux-kernel, Andrew Morton, Minchan Kim, Mel Gorman,
Joonsoo Kim, Michal Nazarewicz, Naoya Horiguchi,
Christoph Lameter, Rik van Riel, David Rientjes
On 2015/1/19 18:05, Vlastimil Babka wrote:
> Compaction should finish when the migration and free scanners meet, i.e. when
> they reach the same pageblock. Currently however, the test in
> compact_finished() simply compares the exact pfns, which may yield a false
> negative when the free scanner position is in the middle of a pageblock and
> the migration scanner reaches the beginning of the same pageblock.
>
> This hasn't been a problem until commit e14c720efdd7 ("mm, compaction:
> remember position within pageblock in free pages scanner") allowed the free
> scanner position to be in the middle of a pageblock between invocations.
> The hot-fix 1d5bfe1ffb5b ("mm, compaction: prevent infinite loop in
> compact_zone") prevented the issue by adding a special check in the migration
> scanner to satisfy the current detection of scanners meeting.
>
> However, the proper fix is to make the detection more robust. This patch
> introduces the compact_scanners_met() function that returns true when the free
> scanner position is in the same or lower pageblock than the migration scanner.
> The special case in isolate_migratepages() introduced by 1d5bfe1ffb5b is
> removed.
>
> Suggested-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Michal Nazarewicz <mina86@mina86.com>
> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: David Rientjes <rientjes@google.com>
> ---
> mm/compaction.c | 22 ++++++++++++++--------
> 1 file changed, 14 insertions(+), 8 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 546e571..5fdbdb8 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -803,6 +803,16 @@ isolate_migratepages_range(struct compact_control *cc, unsigned long start_pfn,
> #endif /* CONFIG_COMPACTION || CONFIG_CMA */
> #ifdef CONFIG_COMPACTION
> /*
> + * Test whether the free scanner has reached the same or lower pageblock than
> + * the migration scanner, and compaction should thus terminate.
> + */
> +static inline bool compact_scanners_met(struct compact_control *cc)
> +{
> + return (cc->free_pfn >> pageblock_order)
> + <= (cc->migrate_pfn >> pageblock_order);
> +}
> +
> +/*
> * Based on information in the current compact_control, find blocks
> * suitable for isolating free pages from and then isolate them.
> */
> @@ -1027,12 +1037,8 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
> }
>
> acct_isolated(zone, cc);
> - /*
> - * Record where migration scanner will be restarted. If we end up in
> - * the same pageblock as the free scanner, make the scanners fully
> - * meet so that compact_finished() terminates compaction.
> - */
> - cc->migrate_pfn = (end_pfn <= cc->free_pfn) ? low_pfn : cc->free_pfn;
> + /* Record where migration scanner will be restarted. */
> + cc->migrate_pfn = low_pfn;
>
> return cc->nr_migratepages ? ISOLATE_SUCCESS : ISOLATE_NONE;
> }
> @@ -1047,7 +1053,7 @@ static int compact_finished(struct zone *zone, struct compact_control *cc,
> return COMPACT_PARTIAL;
>
> /* Compaction run completes if the migrate and free scanner meet */
> - if (cc->free_pfn <= cc->migrate_pfn) {
> + if (compact_scanners_met(cc)) {
> /* Let the next compaction start anew. */
> zone->compact_cached_migrate_pfn[0] = zone->zone_start_pfn;
> zone->compact_cached_migrate_pfn[1] = zone->zone_start_pfn;
> @@ -1238,7 +1244,7 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
> * migrate_pages() may return -ENOMEM when scanners meet
> * and we want compact_finished() to detect it
> */
> - if (err == -ENOMEM && cc->free_pfn > cc->migrate_pfn) {
> + if (err == -ENOMEM && !compact_scanners_met(cc)) {
> ret = COMPACT_PARTIAL;
> goto out;
> }
>
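To see the false negative concretely, here is a standalone user-space sketch
of both checks, assuming pageblock_order == 9 (512-page pageblocks); the pfn
values are made up for illustration:

	#include <stdio.h>
	#include <stdbool.h>

	#define PAGEBLOCK_ORDER	9	/* assumed pageblock_order */

	int main(void)
	{
		/* free scanner mid-pageblock, migration scanner at its start */
		unsigned long migrate_pfn = 0x40000;
		unsigned long free_pfn = 0x40000 + 256;

		/* old check: exact pfn comparison, misses the meeting */
		bool old_check = free_pfn <= migrate_pfn;
		/* new check: compare pageblock numbers, detects it */
		bool new_check = (free_pfn >> PAGEBLOCK_ORDER) <=
				 (migrate_pfn >> PAGEBLOCK_ORDER);

		printf("old: %d, new: %d\n", old_check, new_check); /* 0, 1 */
		return 0;
	}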
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/5] mm, compaction: simplify handling restart position in free pages scanner
2015-01-19 10:05 ` [PATCH 2/5] mm, compaction: simplify handling restart position in free pages scanner Vlastimil Babka
@ 2015-01-20 13:27 ` Zhang Yanfei
2015-01-20 13:31 ` Vlastimil Babka
0 siblings, 1 reply; 22+ messages in thread
From: Zhang Yanfei @ 2015-01-20 13:27 UTC (permalink / raw)
To: Vlastimil Babka, linux-mm
Cc: linux-kernel, Andrew Morton, Minchan Kim, Mel Gorman,
Joonsoo Kim, Michal Nazarewicz, Naoya Horiguchi,
Christoph Lameter, Rik van Riel, David Rientjes
Hello,
On 2015/1/19 18:05, Vlastimil Babka wrote:
> Handling the position where compaction free scanner should restart (stored in
> cc->free_pfn) got more complex with commit e14c720efdd7 ("mm, compaction:
> remember position within pageblock in free pages scanner"). Currently the
> position is updated in each loop iteration of isolate_freepages(), although it's
> enough to update it only when exiting the loop, after we have found enough free
> pages, or detected contention in async compaction. Then an extra check outside
> the loop updates the position in case we have met the migration scanner.
>
> This can be simplified if we move the test for having isolated enough from
> for loop header next to the test for contention, and determining the restart
> position only in these cases. We can reuse the isolate_start_pfn variable for
> this instead of setting cc->free_pfn directly. Outside the loop, we can simply
> set cc->free_pfn to value of isolate_start_pfn without extra check.
>
> We also add a VM_BUG_ON to future-proof the code, in case somebody adds a new
> condition that terminates isolate_freepages_block() prematurely without it
> also being considered in isolate_freepages().
>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Michal Nazarewicz <mina86@mina86.com>
> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: David Rientjes <rientjes@google.com>
> ---
> mm/compaction.c | 34 +++++++++++++++++++---------------
> 1 file changed, 19 insertions(+), 15 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 5fdbdb8..45799a4 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -849,7 +849,7 @@ static void isolate_freepages(struct compact_control *cc)
> * pages on cc->migratepages. We stop searching if the migrate
> * and free page scanners meet or enough free pages are isolated.
> */
> - for (; block_start_pfn >= low_pfn && cc->nr_migratepages > nr_freepages;
> + for (; block_start_pfn >= low_pfn;
> block_end_pfn = block_start_pfn,
> block_start_pfn -= pageblock_nr_pages,
> isolate_start_pfn = block_start_pfn) {
> @@ -883,6 +883,8 @@ static void isolate_freepages(struct compact_control *cc)
> nr_freepages += isolated;
>
> /*
> + * If we isolated enough freepages, or aborted due to async
> + * compaction being contended, terminate the loop.
> * Remember where the free scanner should restart next time,
> * which is where isolate_freepages_block() left off.
> * But if it scanned the whole pageblock, isolate_start_pfn
> @@ -891,28 +893,30 @@ static void isolate_freepages(struct compact_control *cc)
> * In that case we will however want to restart at the start
> * of the previous pageblock.
> */
> - cc->free_pfn = (isolate_start_pfn < block_end_pfn) ?
> - isolate_start_pfn :
> - block_start_pfn - pageblock_nr_pages;
> -
> - /*
> - * isolate_freepages_block() might have aborted due to async
> - * compaction being contended
> - */
> - if (cc->contended)
> + if ((nr_freepages > cc->nr_migratepages) || cc->contended) {
Shouldn't this be nr_freepages >= cc->nr_migratepages?
Thanks
> + if (isolate_start_pfn >= block_end_pfn)
> + isolate_start_pfn =
> + block_start_pfn - pageblock_nr_pages;
> break;
> + } else {
> + /*
> + * isolate_freepages_block() should not terminate
> + * prematurely unless contended, or isolated enough
> + */
> + VM_BUG_ON(isolate_start_pfn < block_end_pfn);
> + }
> }
>
> /* split_free_page does not map the pages */
> map_pages(freelist);
>
> /*
> - * If we crossed the migrate scanner, we want to keep it that way
> - * so that compact_finished() may detect this
> + * Record where the free scanner will restart next time. Either we
> + * broke from the loop and set isolate_start_pfn based on the last
> + * call to isolate_freepages_block(), or we met the migration scanner
> + * and the loop terminated due to isolate_start_pfn < low_pfn
> */
> - if (block_start_pfn < low_pfn)
> - cc->free_pfn = cc->migrate_pfn;
> -
> + cc->free_pfn = isolate_start_pfn;
> cc->nr_freepages = nr_freepages;
> }
>
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 3/5] mm, compaction: encapsulate resetting cached scanner positions
2015-01-19 10:05 ` [PATCH 3/5] mm, compaction: encapsulate resetting cached scanner positions Vlastimil Babka
@ 2015-01-20 13:30 ` Zhang Yanfei
2015-01-22 15:22 ` Vlastimil Babka
0 siblings, 1 reply; 22+ messages in thread
From: Zhang Yanfei @ 2015-01-20 13:30 UTC (permalink / raw)
To: Vlastimil Babka, linux-mm
Cc: linux-kernel, Andrew Morton, Minchan Kim, Mel Gorman,
Joonsoo Kim, Michal Nazarewicz, Naoya Horiguchi,
Christoph Lameter, Rik van Riel, David Rientjes
On 2015/1/19 18:05, Vlastimil Babka wrote:
> Resetting the cached compaction scanner positions is now done implicitly in
> __reset_isolation_suitable() and compact_finished(). Encapsulate the
> functionality in a new function reset_cached_positions() and call it
> explicitly where needed.
>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Should the new function be inline?
Thanks.
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Michal Nazarewicz <mina86@mina86.com>
> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: David Rientjes <rientjes@google.com>
> ---
> mm/compaction.c | 22 ++++++++++++++--------
> 1 file changed, 14 insertions(+), 8 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 45799a4..5626220 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -123,6 +123,13 @@ static inline bool isolation_suitable(struct compact_control *cc,
> return !get_pageblock_skip(page);
> }
>
> +static void reset_cached_positions(struct zone *zone)
> +{
> + zone->compact_cached_migrate_pfn[0] = zone->zone_start_pfn;
> + zone->compact_cached_migrate_pfn[1] = zone->zone_start_pfn;
> + zone->compact_cached_free_pfn = zone_end_pfn(zone);
> +}
> +
> /*
> * This function is called to clear all cached information on pageblocks that
> * should be skipped for page isolation when the migrate and free page scanner
> @@ -134,9 +141,6 @@ static void __reset_isolation_suitable(struct zone *zone)
> unsigned long end_pfn = zone_end_pfn(zone);
> unsigned long pfn;
>
> - zone->compact_cached_migrate_pfn[0] = start_pfn;
> - zone->compact_cached_migrate_pfn[1] = start_pfn;
> - zone->compact_cached_free_pfn = end_pfn;
> zone->compact_blockskip_flush = false;
>
> /* Walk the zone and mark every pageblock as suitable for isolation */
> @@ -166,8 +170,10 @@ void reset_isolation_suitable(pg_data_t *pgdat)
> continue;
>
> /* Only flush if a full compaction finished recently */
> - if (zone->compact_blockskip_flush)
> + if (zone->compact_blockskip_flush) {
> __reset_isolation_suitable(zone);
> + reset_cached_positions(zone);
> + }
> }
> }
>
> @@ -1059,9 +1065,7 @@ static int compact_finished(struct zone *zone, struct compact_control *cc,
> /* Compaction run completes if the migrate and free scanner meet */
> if (compact_scanners_met(cc)) {
> /* Let the next compaction start anew. */
> - zone->compact_cached_migrate_pfn[0] = zone->zone_start_pfn;
> - zone->compact_cached_migrate_pfn[1] = zone->zone_start_pfn;
> - zone->compact_cached_free_pfn = zone_end_pfn(zone);
> + reset_cached_positions(zone);
>
> /*
> * Mark that the PG_migrate_skip information should be cleared
> @@ -1187,8 +1191,10 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
> * is about to be retried after being deferred. kswapd does not do
> * this reset as it'll reset the cached information when going to sleep.
> */
> - if (compaction_restarting(zone, cc->order) && !current_is_kswapd())
> + if (compaction_restarting(zone, cc->order) && !current_is_kswapd()) {
> __reset_isolation_suitable(zone);
> + reset_cached_positions(zone);
> + }
>
> /*
> * Setup to move all movable pages to the end of the zone. Used cached
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/5] mm, compaction: simplify handling restart position in free pages scanner
2015-01-20 13:27 ` Zhang Yanfei
@ 2015-01-20 13:31 ` Vlastimil Babka
0 siblings, 0 replies; 22+ messages in thread
From: Vlastimil Babka @ 2015-01-20 13:31 UTC (permalink / raw)
To: Zhang Yanfei, linux-mm
Cc: linux-kernel, Andrew Morton, Minchan Kim, Mel Gorman,
Joonsoo Kim, Michal Nazarewicz, Naoya Horiguchi,
Christoph Lameter, Rik van Riel, David Rientjes
On 01/20/2015 02:27 PM, Zhang Yanfei wrote:
> Hello,
>
> On 2015/1/19 18:05, Vlastimil Babka wrote:
>> @@ -883,6 +883,8 @@ static void isolate_freepages(struct compact_control *cc)
>> nr_freepages += isolated;
>>
>> /*
>> + * If we isolated enough freepages, or aborted due to async
>> + * compaction being contended, terminate the loop.
>> * Remember where the free scanner should restart next time,
>> * which is where isolate_freepages_block() left off.
>> * But if it scanned the whole pageblock, isolate_start_pfn
>> @@ -891,28 +893,30 @@ static void isolate_freepages(struct compact_control *cc)
>> * In that case we will however want to restart at the start
>> * of the previous pageblock.
>> */
>> - cc->free_pfn = (isolate_start_pfn < block_end_pfn) ?
>> - isolate_start_pfn :
>> - block_start_pfn - pageblock_nr_pages;
>> -
>> - /*
>> - * isolate_freepages_block() might have aborted due to async
>> - * compaction being contended
>> - */
>> - if (cc->contended)
>> + if ((nr_freepages > cc->nr_migratepages) || cc->contended) {
>
> Shouldn't this be nr_freepages >= cc->nr_migratepages?
Ah, of course. Thanks for catching that!
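For the record, a sketch of how the hunk would read with the fix folded in
(a sketch of the next revision, not an actual commit):

	if ((nr_freepages >= cc->nr_migratepages) || cc->contended) {
		if (isolate_start_pfn >= block_end_pfn)
			isolate_start_pfn =
				block_start_pfn - pageblock_nr_pages;
		break;
	}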
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC PATCH 4/5] mm, compaction: allow scanners to start at any pfn within the zone
2015-01-19 10:05 ` [RFC PATCH 4/5] mm, compaction: allow scanners to start at any pfn within the zone Vlastimil Babka
@ 2015-01-20 15:18 ` Zhang Yanfei
2015-01-22 15:24 ` Vlastimil Babka
0 siblings, 1 reply; 22+ messages in thread
From: Zhang Yanfei @ 2015-01-20 15:18 UTC (permalink / raw)
To: Vlastimil Babka, linux-mm
Cc: linux-kernel, Andrew Morton, Minchan Kim, Mel Gorman,
Joonsoo Kim, Michal Nazarewicz, Naoya Horiguchi,
Christoph Lameter, Rik van Riel, David Rientjes
Hello Vlastimil
On 2015/1/19 18:05, Vlastimil Babka wrote:
> Compaction employs two page scanners - migration scanner isolates pages to be
> the source of migration, free page scanner isolates pages to be the target of
> migration. Currently, migration scanner starts at the zone's first pageblock
> and progresses towards the last one. Free scanner starts at the last pageblock
> and progresses towards the first one. Within a pageblock, each scanner scans
> pages from the first to the last one. When the scanners meet within the same
> pageblock, compaction terminates.
>
> One consequence of the current scheme, that turns out to be unfortunate, is
> that the migration scanner does not encounter the pageblocks which were
> scanned by the free scanner. In a test with stress-highalloc from mmtests,
> the scanners were observed to meet around the middle of the zone in the first
> two phases (with background memory pressure) of the test when executed after a
> fresh reboot. On further executions without reboot, the meeting point shifts
> to roughly a third of the zone, and compaction activity as well as allocation
> success rates deteriorate compared to the run after a fresh reboot.
>
> It turns out that the deterioration is indeed due to the migration scanner
> processing only a small part of the zone. Compaction also keeps making this
> bias worse by its activity - by moving all migratable pages towards end of the
> zone, the free scanner has to scan a lot of full pageblocks to find more free
> pages. The beginning of the zone contains pageblocks that have been compacted
> as much as possible, but the free pages there cannot be further merged into
> larger orders due to unmovable pages. The rest of the zone might contain more
> suitable pageblocks, but the migration scanner will not reach them. It also
> isn't able to move movable pages out of unmovable pageblocks there, which
> affects fragmentation.
>
> This patch is the first step to remove this bias. It allows the compaction
> scanners to start at arbitrary pfn (aligned to pageblock for practical
> purposes), called pivot, within the zone. The migration scanner starts at the
> exact pfn, the free scanner starts at the pageblock preceding the pivot. The
> direction of scanning is unaffected, but when the migration scanner reaches
> the last pageblock of the zone, or the free scanner reaches the first
> pageblock, they wrap and continue with the first or last pageblock,
> respectively. Compaction terminates when any of the scanners wrap and both
> meet within the same pageblock.
>
> For easier bisection of potential regressions, this patch always uses the
> zone's first pfn as the pivot. That means the free scanner immediately wraps
> to the last pageblock and the operation of scanners is thus unchanged. The
> actual pivot changing is done by the next patch.
>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
I read through the whole patch, and you can feel free to add:
Acked-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
I agree with you and with the approach to improving the current scheme. One
thing I think should be treated carefully is how to avoid migrating back and
forth, since the pivot pfn can change. I see patch 5 has introduced a policy
for changing the pivot, so we can observe it carefully.
(The changes in the patch make the code more difficult to understand now...
and I just found a tiny mistake, please see below)
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Michal Nazarewicz <mina86@mina86.com>
> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: David Rientjes <rientjes@google.com>
> ---
> include/linux/mmzone.h | 2 +
> mm/compaction.c | 204 +++++++++++++++++++++++++++++++++++++++++++------
> mm/internal.h | 1 +
> 3 files changed, 182 insertions(+), 25 deletions(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 2f0856d..47aa181 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -503,6 +503,8 @@ struct zone {
> unsigned long percpu_drift_mark;
>
> #if defined CONFIG_COMPACTION || defined CONFIG_CMA
> + /* pfn where compaction scanners have initially started last time */
> + unsigned long compact_cached_pivot_pfn;
> /* pfn where compaction free scanner should start */
> unsigned long compact_cached_free_pfn;
> /* pfn where async and sync compaction migration scanner should start */
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 5626220..abae89a 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -123,11 +123,16 @@ static inline bool isolation_suitable(struct compact_control *cc,
> return !get_pageblock_skip(page);
> }
>
> +/*
> + * Invalidate cached compaction scanner positions, so that compact_zone()
> + * will reinitialize them on the next compaction.
> + */
> static void reset_cached_positions(struct zone *zone)
> {
> - zone->compact_cached_migrate_pfn[0] = zone->zone_start_pfn;
> - zone->compact_cached_migrate_pfn[1] = zone->zone_start_pfn;
> - zone->compact_cached_free_pfn = zone_end_pfn(zone);
> + /* Invalid values are re-initialized in compact_zone */
> + zone->compact_cached_migrate_pfn[0] = 0;
> + zone->compact_cached_migrate_pfn[1] = 0;
> + zone->compact_cached_free_pfn = 0;
> }
>
> /*
> @@ -172,11 +177,35 @@ void reset_isolation_suitable(pg_data_t *pgdat)
> /* Only flush if a full compaction finished recently */
> if (zone->compact_blockskip_flush) {
> __reset_isolation_suitable(zone);
> - reset_cached_positions(zone);
> + reset_cached_positions(zone, false);
The second argument should be in patch 5 instead of here, right? ^-^
Thanks.
> }
> }
> }
>
> +static void update_cached_migrate_pfn(unsigned long pfn,
> + unsigned long pivot_pfn, unsigned long *old_pfn)
> +{
> + /* Both old and new pfn either wrapped or not, and new is higher */
> + if (((*old_pfn >= pivot_pfn) == (pfn >= pivot_pfn))
> + && (pfn > *old_pfn))
> + *old_pfn = pfn;
> + /* New pfn has wrapped and the old didn't yet */
> + else if ((*old_pfn >= pivot_pfn) && (pfn < pivot_pfn))
> + *old_pfn = pfn;
> +}
> +
> +static void update_cached_free_pfn(unsigned long pfn,
> + unsigned long pivot_pfn, unsigned long *old_pfn)
> +{
> + /* Both old and new pfn either wrapped or not, and new is lower */
> + if (((*old_pfn < pivot_pfn) == (pfn < pivot_pfn))
> + && (pfn < *old_pfn))
> + *old_pfn = pfn;
> + /* New pfn has wrapped and the old didn't yet */
> + else if ((*old_pfn < pivot_pfn) && (pfn >= pivot_pfn))
> + *old_pfn = pfn;
> +}
> +
> /*
> * If no pages were isolated then mark this pageblock to be skipped in the
> * future. The information is later cleared by __reset_isolation_suitable().
> @@ -186,6 +215,7 @@ static void update_pageblock_skip(struct compact_control *cc,
> bool migrate_scanner)
> {
> struct zone *zone = cc->zone;
> + unsigned long pivot_pfn = cc->pivot_pfn;
> unsigned long pfn;
>
> if (cc->ignore_skip_hint)
> @@ -203,14 +233,14 @@ static void update_pageblock_skip(struct compact_control *cc,
>
> /* Update where async and sync compaction should restart */
> if (migrate_scanner) {
> - if (pfn > zone->compact_cached_migrate_pfn[0])
> - zone->compact_cached_migrate_pfn[0] = pfn;
> - if (cc->mode != MIGRATE_ASYNC &&
> - pfn > zone->compact_cached_migrate_pfn[1])
> - zone->compact_cached_migrate_pfn[1] = pfn;
> + update_cached_migrate_pfn(pfn, pivot_pfn,
> + &zone->compact_cached_migrate_pfn[0]);
> + if (cc->mode != MIGRATE_ASYNC)
> + update_cached_migrate_pfn(pfn, pivot_pfn,
> + &zone->compact_cached_migrate_pfn[1]);
> } else {
> - if (pfn < zone->compact_cached_free_pfn)
> - zone->compact_cached_free_pfn = pfn;
> + update_cached_free_pfn(pfn, pivot_pfn,
> + &zone->compact_cached_free_pfn);
> }
> }
> #else
> @@ -808,14 +838,41 @@ isolate_migratepages_range(struct compact_control *cc, unsigned long start_pfn,
>
> #endif /* CONFIG_COMPACTION || CONFIG_CMA */
> #ifdef CONFIG_COMPACTION
> +static inline bool migrate_scanner_wrapped(struct compact_control *cc)
> +{
> + return cc->migrate_pfn < cc->pivot_pfn;
> +}
> +
> +static inline bool free_scanner_wrapped(struct compact_control *cc)
> +{
> + return cc->free_pfn >= cc->pivot_pfn;
> +}
> +
> /*
> * Test whether the free scanner has reached the same or lower pageblock than
> * the migration scanner, and compaction should thus terminate.
> */
> static inline bool compact_scanners_met(struct compact_control *cc)
> {
> - return (cc->free_pfn >> pageblock_order)
> - <= (cc->migrate_pfn >> pageblock_order);
> + bool free_below_migrate = (cc->free_pfn >> pageblock_order)
> + <= (cc->migrate_pfn >> pageblock_order);
> +
> + if (migrate_scanner_wrapped(cc) != free_scanner_wrapped(cc))
> + /*
> + * Only one of the scanners have wrapped. Terminate if free
> + * scanner is in the same or lower pageblock than migration
> + * scanner.
> + */
> + return free_below_migrate;
> + else
> + /*
> + * If neither scanner has wrapped, then free < start <=
> + * migration and we return false by definition.
> + * It shouldn't happen that both have wrapped, but even if it
> + * does due to e.g. reading mismatched zone cached pfn's, then
> + * migration < start <= free, so we return true and terminate.
> + */
> + return !free_below_migrate;
> }
>
> /*
> @@ -832,7 +889,10 @@ static void isolate_freepages(struct compact_control *cc)
> unsigned long low_pfn; /* lowest pfn scanner is able to scan */
> int nr_freepages = cc->nr_freepages;
> struct list_head *freelist = &cc->freepages;
> + bool wrapping; /* set to true in the first pageblock of the zone */
> + bool wrapped; /* set to true when either scanner has wrapped */
>
> +wrap:
> /*
> * Initialise the free scanner. The starting point is where we last
> * successfully isolated from, zone-cached value, or the end of the
> @@ -848,14 +908,25 @@ static void isolate_freepages(struct compact_control *cc)
> block_start_pfn = cc->free_pfn & ~(pageblock_nr_pages-1);
> block_end_pfn = min(block_start_pfn + pageblock_nr_pages,
> zone_end_pfn(zone));
> - low_pfn = ALIGN(cc->migrate_pfn + 1, pageblock_nr_pages);
> +
> + wrapping = false;
> + wrapped = free_scanner_wrapped(cc) || migrate_scanner_wrapped(cc);
> + if (!wrapped)
> + /*
> + * If neither scanner wrapped yet, we are limited by zone's
> + * beginning. Here we pretend that the zone starts pageblock
> + * aligned to make the for-loop condition simpler.
> + */
> + low_pfn = zone->zone_start_pfn & ~(pageblock_nr_pages-1);
> + else
> + low_pfn = ALIGN(cc->migrate_pfn + 1, pageblock_nr_pages);
>
> /*
> * Isolate free pages until enough are available to migrate the
> * pages on cc->migratepages. We stop searching if the migrate
> * and free page scanners meet or enough free pages are isolated.
> */
> - for (; block_start_pfn >= low_pfn;
> + for (; !wrapping && block_start_pfn >= low_pfn;
> block_end_pfn = block_start_pfn,
> block_start_pfn -= pageblock_nr_pages,
> isolate_start_pfn = block_start_pfn) {
> @@ -870,6 +941,24 @@ static void isolate_freepages(struct compact_control *cc)
> && compact_should_abort(cc))
> break;
>
> + /*
> + * When we are limited by zone boundary, this means we have
> + * reached its first pageblock.
> + */
> + if (!wrapped && block_start_pfn <= zone->zone_start_pfn) {
> + /* The zone might start in the middle of the pageblock */
> + block_start_pfn = zone->zone_start_pfn;
> + if (isolate_start_pfn <= zone->zone_start_pfn)
> + isolate_start_pfn = zone->zone_start_pfn;
> + /*
> + * For e.g. DMA zone with zone_start_pfn == 1, we will
> + * underflow block_start_pfn in the next loop
> + * iteration. We have to terminate the loop with other
> + * means.
> + */
> + wrapping = true;
> + }
> +
> page = pageblock_pfn_to_page(block_start_pfn, block_end_pfn,
> zone);
> if (!page)
> @@ -903,6 +992,12 @@ static void isolate_freepages(struct compact_control *cc)
> if (isolate_start_pfn >= block_end_pfn)
> isolate_start_pfn =
> block_start_pfn - pageblock_nr_pages;
> + else if (wrapping)
> + /*
> + * We have been in the first pageblock of the
> + * zone, but have not finished it yet.
> + */
> + wrapping = false;
> break;
> } else {
> /*
> @@ -913,6 +1008,20 @@ static void isolate_freepages(struct compact_control *cc)
> }
> }
>
> + /* Did we reach the beginning of the zone? Wrap to the end. */
> + if (!wrapped && wrapping) {
> + isolate_start_pfn = (zone_end_pfn(zone)-1) &
> + ~(pageblock_nr_pages-1);
> + /*
> + * If we haven't isolated anything, we have to continue
> + * immediately, otherwise page migration will fail.
> + */
> + if (!nr_freepages && !cc->contended) {
> + cc->free_pfn = isolate_start_pfn;
> + goto wrap;
> + }
> + }
> +
> /* split_free_page does not map the pages */
> map_pages(freelist);
>
> @@ -984,10 +1093,11 @@ typedef enum {
> static isolate_migrate_t isolate_migratepages(struct zone *zone,
> struct compact_control *cc)
> {
> - unsigned long low_pfn, end_pfn;
> + unsigned long low_pfn, end_pfn, max_pfn;
> struct page *page;
> const isolate_mode_t isolate_mode =
> (cc->mode == MIGRATE_ASYNC ? ISOLATE_ASYNC_MIGRATE : 0);
> + bool wrapped = migrate_scanner_wrapped(cc) || free_scanner_wrapped(cc);
>
> /*
> * Start at where we last stopped, or beginning of the zone as
> @@ -998,13 +1108,27 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
> /* Only scan within a pageblock boundary */
> end_pfn = ALIGN(low_pfn + 1, pageblock_nr_pages);
>
> + if (!wrapped) {
> + /*
> + * Neither of the scanners has wrapped yet, we are limited by
> + * zone end. Here we pretend it's aligned to pageblock
> + * boundary to make the for-loop condition simpler
> + */
> + max_pfn = ALIGN(zone_end_pfn(zone), pageblock_nr_pages);
> + } else {
> + /* If any of the scanners wrapped, we will meet free scanner */
> + max_pfn = cc->free_pfn;
> + }
> +
> /*
> * Iterate over whole pageblocks until we find the first suitable.
> - * Do not cross the free scanner.
> + * Do not cross the free scanner or the end of the zone.
> */
> - for (; end_pfn <= cc->free_pfn;
> + for (; end_pfn <= max_pfn;
> low_pfn = end_pfn, end_pfn += pageblock_nr_pages) {
>
> + if (!wrapped && end_pfn > zone_end_pfn(zone))
> + end_pfn = zone_end_pfn(zone);
> /*
> * This can potentially iterate a massively long zone with
> * many pageblocks unsuitable, so periodically check if we
> @@ -1047,6 +1171,10 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
> }
>
> acct_isolated(zone, cc);
> + /* Did we reach the end of the zone? Wrap to the beginning */
> + if (!wrapped && low_pfn >= zone_end_pfn(zone))
> + low_pfn = zone->zone_start_pfn;
> +
> /* Record where migration scanner will be restarted. */
> cc->migrate_pfn = low_pfn;
>
> @@ -1197,22 +1325,48 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
> }
>
> /*
> - * Setup to move all movable pages to the end of the zone. Used cached
> + * Setup the scanner positions according to pivot pfn. Use cached
> * information on where the scanners should start but check that it
> * is initialised by ensuring the values are within zone boundaries.
> */
> - cc->migrate_pfn = zone->compact_cached_migrate_pfn[sync];
> - cc->free_pfn = zone->compact_cached_free_pfn;
> - if (cc->free_pfn < start_pfn || cc->free_pfn > end_pfn) {
> - cc->free_pfn = end_pfn & ~(pageblock_nr_pages-1);
> - zone->compact_cached_free_pfn = cc->free_pfn;
> + cc->pivot_pfn = zone->compact_cached_pivot_pfn;
> + if (cc->pivot_pfn < start_pfn || cc->pivot_pfn > end_pfn) {
> + cc->pivot_pfn = start_pfn;
> + zone->compact_cached_pivot_pfn = cc->pivot_pfn;
> + /* When starting position was invalid, reset the rest */
> + reset_cached_positions(zone);
> }
> +
> + cc->migrate_pfn = zone->compact_cached_migrate_pfn[sync];
> if (cc->migrate_pfn < start_pfn || cc->migrate_pfn > end_pfn) {
> - cc->migrate_pfn = start_pfn;
> + cc->migrate_pfn = cc->pivot_pfn;
> zone->compact_cached_migrate_pfn[0] = cc->migrate_pfn;
> zone->compact_cached_migrate_pfn[1] = cc->migrate_pfn;
> }
>
> + cc->free_pfn = zone->compact_cached_free_pfn;
> + if (cc->free_pfn < start_pfn || cc->free_pfn > end_pfn)
> + cc->free_pfn = cc->pivot_pfn;
> +
> + /*
> + * Free scanner should start on the beginning of the pageblock below
> + * the cc->pivot_pfn. If that's below the zone boundary, wrap to the
> + * last pageblock of the zone.
> + */
> + if (cc->free_pfn == cc->pivot_pfn) {
> + /* Don't underflow in zones starting with e.g. pfn 1 */
> + if (cc->pivot_pfn < pageblock_nr_pages) {
> + cc->free_pfn = (end_pfn-1) & ~(pageblock_nr_pages-1);
> + } else {
> + cc->free_pfn = (cc->pivot_pfn - pageblock_nr_pages);
> + cc->free_pfn &= ~(pageblock_nr_pages-1);
> + if (cc->free_pfn < start_pfn)
> + cc->free_pfn = (end_pfn-1) &
> + ~(pageblock_nr_pages-1);
> + }
> + zone->compact_cached_free_pfn = cc->free_pfn;
> + }
> +
> trace_mm_compaction_begin(start_pfn, cc->migrate_pfn, cc->free_pfn, end_pfn);
>
> migrate_prep_local();
> diff --git a/mm/internal.h b/mm/internal.h
> index efad241..cb7b297 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -157,6 +157,7 @@ struct compact_control {
> struct list_head migratepages; /* List of pages being migrated */
> unsigned long nr_freepages; /* Number of isolated free pages */
> unsigned long nr_migratepages; /* Number of pages to migrate */
> + unsigned long pivot_pfn; /* Where the scanners initially start */
> unsigned long free_pfn; /* isolate_freepages search base */
> unsigned long migrate_pfn; /* isolate_migratepages search base */
> enum migrate_mode mode; /* Async or sync migration mode */
>
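The wrapped ordering above is easier to follow in isolation, so here is the
same update_cached_migrate_pfn() logic in a standalone user-space harness;
the pivot and pfn values are made up for illustration:

	#include <stdio.h>

	/* as in the patch: a pfn below the pivot has wrapped around the zone
	 * end and thus counts as further along than any non-wrapped pfn */
	static void update_cached_migrate_pfn(unsigned long pfn,
			unsigned long pivot_pfn, unsigned long *old_pfn)
	{
		/* Both old and new pfn either wrapped or not, and new is higher */
		if (((*old_pfn >= pivot_pfn) == (pfn >= pivot_pfn))
				&& (pfn > *old_pfn))
			*old_pfn = pfn;
		/* New pfn has wrapped and the old didn't yet */
		else if ((*old_pfn >= pivot_pfn) && (pfn < pivot_pfn))
			*old_pfn = pfn;
	}

	int main(void)
	{
		unsigned long pivot = 1000, cached = 1500;

		update_cached_migrate_pfn(1600, pivot, &cached);
		printf("%lu\n", cached);	/* 1600: advanced, no wrap */
		update_cached_migrate_pfn(200, pivot, &cached);
		printf("%lu\n", cached);	/* 200: wrapped, ahead of 1600 */
		update_cached_migrate_pfn(1700, pivot, &cached);
		printf("%lu\n", cached);	/* 200: 1700 is behind the wrap */
		return 0;
	}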
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 3/5] mm, compaction: encapsulate resetting cached scanner positions
2015-01-20 13:30 ` Zhang Yanfei
@ 2015-01-22 15:22 ` Vlastimil Babka
0 siblings, 0 replies; 22+ messages in thread
From: Vlastimil Babka @ 2015-01-22 15:22 UTC (permalink / raw)
To: Zhang Yanfei, linux-mm
Cc: linux-kernel, Andrew Morton, Minchan Kim, Mel Gorman,
Joonsoo Kim, Michal Nazarewicz, Naoya Horiguchi,
Christoph Lameter, Rik van Riel, David Rientjes
On 01/20/2015 02:30 PM, Zhang Yanfei wrote:
> On 2015/1/19 18:05, Vlastimil Babka wrote:
>> Resetting the cached compaction scanner positions is now done implicitly in
>> __reset_isolation_suitable() and compact_finished(). Encapsulate the
>> functionality in a new function reset_cached_positions() and call it
>> explicitly where needed.
>>
>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
>
> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Thanks.
> Should the new function be inline?
I'll try comparing with bloat-o-meter before the next submission, to see if
there's any difference.
> Thanks.
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC PATCH 4/5] mm, compaction: allow scanners to start at any pfn within the zone
2015-01-20 15:18 ` Zhang Yanfei
@ 2015-01-22 15:24 ` Vlastimil Babka
0 siblings, 0 replies; 22+ messages in thread
From: Vlastimil Babka @ 2015-01-22 15:24 UTC (permalink / raw)
To: Zhang Yanfei, linux-mm
Cc: linux-kernel, Andrew Morton, Minchan Kim, Mel Gorman,
Joonsoo Kim, Michal Nazarewicz, Naoya Horiguchi,
Christoph Lameter, Rik van Riel, David Rientjes
On 01/20/2015 04:18 PM, Zhang Yanfei wrote:
> Hello Vlastimil
>
> On 2015/1/19 18:05, Vlastimil Babka wrote:
>> Compaction employs two page scanners - migration scanner isolates pages to be
>> the source of migration, free page scanner isolates pages to be the target of
>> migration. Currently, migration scanner starts at the zone's first pageblock
>> and progresses towards the last one. Free scanner starts at the last pageblock
>> and progresses towards the first one. Within a pageblock, each scanner scans
>> pages from the first to the last one. When the scanners meet within the same
>> pageblock, compaction terminates.
>>
>> One consequence of the current scheme, that turns out to be unfortunate, is
>> that the migration scanner does not encounter the pageblocks which were
>> scanned by the free scanner. In a test with stress-highalloc from mmtests,
>> the scanners were observed to meet around the middle of the zone in the first
>> two phases (with background memory pressure) of the test when executed after a
>> fresh reboot. On further executions without reboot, the meeting point shifts
>> to roughly a third of the zone, and compaction activity as well as allocation
>> success rates deteriorate compared to the run after a fresh reboot.
>>
>> It turns out that the deterioration is indeed due to the migration scanner
>> processing only a small part of the zone. Compaction also keeps making this
>> bias worse by its activity - by moving all migratable pages towards end of the
>> zone, the free scanner has to scan a lot of full pageblocks to find more free
>> pages. The beginning of the zone contains pageblocks that have been compacted
>> as much as possible, but the free pages there cannot be further merged into
>> larger orders due to unmovable pages. The rest of the zone might contain more
>> suitable pageblocks, but the migration scanner will not reach them. It also
>> isn't able to move movable pages out of unmovable pageblocks there, which
>> affects fragmentation.
>>
>> This patch is the first step to remove this bias. It allows the compaction
>> scanners to start at arbitrary pfn (aligned to pageblock for practical
>> purposes), called pivot, within the zone. The migration scanner starts at the
>> exact pfn, the free scanner starts at the pageblock preceding the pivot. The
>> direction of scanning is unaffected, but when the migration scanner reaches
>> the last pageblock of the zone, or the free scanner reaches the first
>> pageblock, they wrap and continue with the first or last pageblock,
>> respectively. Compaction terminates when any of the scanners wrap and both
>> meet within the same pageblock.
>>
>> For easier bisection of potential regressions, this patch always uses the
>> zone's first pfn as the pivot. That means the free scanner immediately wraps
>> to the last pageblock and the operation of scanners is thus unchanged. The
>> actual pivot changing is done by the next patch.
>>
>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
>
> I read through the whole patch, and you can feel free to add:
>
> Acked-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Thanks.
> I agree with you and with the approach to improving the current scheme. One
> thing I think should be treated carefully is how to avoid migrating back and
> forth, since the pivot pfn can change. I see patch 5 has introduced a policy
> for changing the pivot, so we can observe it carefully.
>
> (The changes in the patch make the code more difficult to understand now...
> and I just found a tiny mistake, please see below)
>
>> Cc: Minchan Kim <minchan@kernel.org>
>> Cc: Mel Gorman <mgorman@suse.de>
>> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>> Cc: Michal Nazarewicz <mina86@mina86.com>
>> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>> Cc: Christoph Lameter <cl@linux.com>
>> Cc: Rik van Riel <riel@redhat.com>
>> Cc: David Rientjes <rientjes@google.com>
>> ---
>> include/linux/mmzone.h | 2 +
>> mm/compaction.c | 204 +++++++++++++++++++++++++++++++++++++++++++------
>> mm/internal.h | 1 +
>> 3 files changed, 182 insertions(+), 25 deletions(-)
>>
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index 2f0856d..47aa181 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -503,6 +503,8 @@ struct zone {
>> unsigned long percpu_drift_mark;
>>
>> #if defined CONFIG_COMPACTION || defined CONFIG_CMA
>> + /* pfn where compaction scanners have initially started last time */
>> + unsigned long compact_cached_pivot_pfn;
>> /* pfn where compaction free scanner should start */
>> unsigned long compact_cached_free_pfn;
>> /* pfn where async and sync compaction migration scanner should start */
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 5626220..abae89a 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -123,11 +123,16 @@ static inline bool isolation_suitable(struct compact_control *cc,
>> return !get_pageblock_skip(page);
>> }
>>
>> +/*
>> + * Invalidate cached compaction scanner positions, so that compact_zone()
>> + * will reinitialize them on the next compaction.
>> + */
>> static void reset_cached_positions(struct zone *zone)
>> {
>> - zone->compact_cached_migrate_pfn[0] = zone->zone_start_pfn;
>> - zone->compact_cached_migrate_pfn[1] = zone->zone_start_pfn;
>> - zone->compact_cached_free_pfn = zone_end_pfn(zone);
>> + /* Invalid values are re-initialized in compact_zone */
>> + zone->compact_cached_migrate_pfn[0] = 0;
>> + zone->compact_cached_migrate_pfn[1] = 0;
>> + zone->compact_cached_free_pfn = 0;
>> }
>>
>> /*
>> @@ -172,11 +177,35 @@ void reset_isolation_suitable(pg_data_t *pgdat)
>> /* Only flush if a full compaction finished recently */
>> if (zone->compact_blockskip_flush) {
>> __reset_isolation_suitable(zone);
>> - reset_cached_positions(zone);
>> + reset_cached_positions(zone, false);
>
> The second argument should be in patch 5 instead of here, right? ^-^
Yes, a mistake with some last-minute rebasing :)
> Thanks.
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [RFC PATCH 0/5] compaction: changing initial position of scanners
2015-01-19 10:05 [RFC PATCH 0/5] compaction: changing initial position of scanners Vlastimil Babka
` (6 preceding siblings ...)
2015-01-20 9:02 ` Vlastimil Babka
@ 2015-02-03 6:49 ` Joonsoo Kim
2015-02-03 9:05 ` Vlastimil Babka
7 siblings, 1 reply; 22+ messages in thread
From: Joonsoo Kim @ 2015-02-03 6:49 UTC (permalink / raw)
To: Vlastimil Babka
Cc: linux-mm, linux-kernel, Andrew Morton, David Rientjes,
Christoph Lameter, Mel Gorman, Michal Nazarewicz, Minchan Kim,
Naoya Horiguchi, Rik van Riel
On Mon, Jan 19, 2015 at 11:05:15AM +0100, Vlastimil Babka wrote:
> Even after all the patches compaction received in the last several versions, it
> turns out that its effectiveness degrades considerably as the system ages
> after reboot. For example, see how success rates of stress-highalloc from
> mmtests degrades when we re-execute it several times, first time being after
> fresh reboot:
> 3.19-rc4 3.19-rc4 3.19-rc4
> 4-nothp-1 4-nothp-2 4-nothp-3
> Success 1 Min 25.00 ( 0.00%) 13.00 ( 48.00%) 9.00 ( 64.00%)
> Success 1 Mean 36.20 ( 0.00%) 23.40 ( 35.36%) 16.40 ( 54.70%)
> Success 1 Max 41.00 ( 0.00%) 34.00 ( 17.07%) 25.00 ( 39.02%)
> Success 2 Min 25.00 ( 0.00%) 15.00 ( 40.00%) 10.00 ( 60.00%)
> Success 2 Mean 37.20 ( 0.00%) 25.00 ( 32.80%) 17.20 ( 53.76%)
> Success 2 Max 44.00 ( 0.00%) 36.00 ( 18.18%) 25.00 ( 43.18%)
> Success 3 Min 84.00 ( 0.00%) 81.00 ( 3.57%) 78.00 ( 7.14%)
> Success 3 Mean 85.80 ( 0.00%) 82.80 ( 3.50%) 80.40 ( 6.29%)
> Success 3 Max 87.00 ( 0.00%) 84.00 ( 3.45%) 82.00 ( 5.75%)
>
> Wouldn't it be much better, if it looked like this?
>
> 3.19-rc4 3.19-rc4 3.19-rc4
> 5-nothp-1 5-nothp-2 5-nothp-3
> Success 1 Min 49.00 ( 0.00%) 42.00 ( 14.29%) 41.00 ( 16.33%)
> Success 1 Mean 51.00 ( 0.00%) 45.00 ( 11.76%) 42.60 ( 16.47%)
> Success 1 Max 55.00 ( 0.00%) 51.00 ( 7.27%) 46.00 ( 16.36%)
> Success 2 Min 53.00 ( 0.00%) 47.00 ( 11.32%) 44.00 ( 16.98%)
> Success 2 Mean 59.60 ( 0.00%) 50.80 ( 14.77%) 48.20 ( 19.13%)
> Success 2 Max 64.00 ( 0.00%) 56.00 ( 12.50%) 52.00 ( 18.75%)
> Success 3 Min 84.00 ( 0.00%) 82.00 ( 2.38%) 78.00 ( 7.14%)
> Success 3 Mean 85.60 ( 0.00%) 82.80 ( 3.27%) 79.40 ( 7.24%)
> Success 3 Max 86.00 ( 0.00%) 83.00 ( 3.49%) 80.00 ( 6.98%)
>
> In my humble opinion, it would :) Much lower degradation, and a nice
> improvement in the first iteration as a bonus.
>
> So what sorcery is this? Nothing much, just a fundamental change of the
> compaction scanners operation...
>
> As everyone knows [1] the migration scanner starts at the first pageblock
> of a zone, and goes towards the end, and the free scanner starts at the
> last pageblock and goes towards the beginning. Somewhere in the middle of the
> zone, the scanners meet:
>
> zone_start zone_end
> | |
> -------------------------------------------------------------
> MMMMMMMMMMMMM| => <= |FFFFFFFFFFFF
> migrate_pfn free_pfn
>
> In my tests, the scanners meet around the middle of the zone on the first
> iteration, and around the 1/3 mark on subsequent iterations. Which means the
> migration scanner doesn't see the larger part of the zone at all. For more
> details why it's bad, see Patch 4 description.
>
> To make sure we eventually scan the whole zone with the migration scanner, we
> could e.g. reverse the directions after each run. But that would still be
> biased, and with 1/3 of zone reachable from each side, we would still omit the
> middle 1/3 of a zone.
>
> Or we could stop terminating compaction when the scanners meet, and let them
> continue to scan the whole zone. Mel told me it used to be the case long ago,
> but that approach would result in migrating pages back and forth during single
> compaction run, which wouldn't be cool.
>
> So the approach taken by this patchset is to let scanners start at any
> so-called pivot pfn within the zone, and keep their direction:
>
> zone_start pivot zone_end
> | | |
> -------------------------------------------------------------
> <= |FFFFFFFMMMMMM| =>
> free_pfn migrate_pfn
>
> Eventually, one of the scanners will reach the zone boundary and wrap around,
> e.g. the in the case of the free scanner:
>
> zone_start pivot zone_end
> | | |
> -------------------------------------------------------------
> FFFFFFFFFFFFFFFFFFFFFFFFFFFFFMMMMMMMMMMMM| => <= |F
> migrate_pfn free_pfn
>
> Compaction will again terminate when the scanners meet.
>
>
> As you can imagine, the required code changes made the termination detection
> and the scanners themselves quite hairy. There are lots of corner cases and
> the code is often hard to wrap one's head around [puns intended]. The scanner
> functions isolate_migratepages() and isolate_freepages() were recently cleaned
> up a lot, and this makes them messy again, as they can no longer rely on the
> fact that they will meet the other scanner and not the zone boundary.
>
> But the improvements seem to make these complications worth it, and I hope
> somebody can suggest more elegant solutions for the various parts of the code.
> So here it is as an RFC. Patches 1-3 are cleanups that could be applied in any
> case. Patch 4 implements the main changes, but leaves the pivot at the zone's
> first pfn, so that the free scanner wraps immediately and there's no actual
> change in behavior. Patch 5 updates the pivot in a conservative way, as
> explained in its changelog.
Hello,
I don't have any elegant idea, but I have some humble opinions.
The point is that the migrate scanner should scan the whole zone.
Although your pivot approach makes some sense and it can scan the whole zone,
it could cause back-and-forth migration over a very short term whenever
the scanners approach and pass each other. I think that if we permit the
scanners to overlap, we don't need to adhere to reverse linear scanning
in the freepage scanner, since the reverse linear scan no longer prevents
back-and-forth migration anyway.
There are two possible solutions to this problem.
One is that the free scanner scans pfns in the same direction as the migration
scanner, keeping a proper interval between them:
|=========================|
MMM==> <Interval> FFF==>
A large enough interval guarantees that back-and-forth migration is prevented,
at least over a very short period.
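As a rough sketch of that invariant in code (all names here are made up, just
for illustration):

	/*
	 * With both scanners moving toward higher pfns, the free scanner
	 * must stay at least 'interval' pfns ahead of the migration
	 * scanner, so just-migrated pages are not handed straight back
	 * as migration targets.
	 */
	static bool interval_held(unsigned long migrate_pfn,
				  unsigned long free_pfn,
				  unsigned long interval)
	{
		return free_pfn >= migrate_pfn + interval;
	}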
Or, we could make the free scanner something totally different from a linear
scan. Linear scanning for free pages wastes much time if system memory
is really big and most of it is used. If we take free pages from the
buddy allocator, we can eliminate this scanning overhead. With additional
logic, that is, comparing the position of a free page with the migration
scanner and taking it selectively, we can avoid back-and-forth migration
over a very short period. A rough sketch of this idea follows.
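Something like this, as a sketch only (single migratetype, no locking, no
buddy splitting; not a tested implementation):

	/*
	 * Walk one buddy free list and take the first page the migration
	 * scanner has already passed, so we never pick a target that lies
	 * in its future scanning window.
	 */
	static struct page *pick_buddy_target(struct zone *zone, int order,
					      unsigned long migrate_pfn)
	{
		struct page *page;

		list_for_each_entry(page,
			&zone->free_area[order].free_list[MIGRATE_MOVABLE],
			lru) {
			if (page_to_pfn(page) < migrate_pfn)
				return page;
		}
		return NULL;
	}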
Thanks.
* Re: [RFC PATCH 0/5] compaction: changing initial position of scanners
2015-02-03 6:49 ` Joonsoo Kim
@ 2015-02-03 9:05 ` Vlastimil Babka
2015-02-03 15:00 ` Joonsoo Kim
0 siblings, 1 reply; 22+ messages in thread
From: Vlastimil Babka @ 2015-02-03 9:05 UTC (permalink / raw)
To: Joonsoo Kim
Cc: linux-mm, linux-kernel, Andrew Morton, David Rientjes,
Christoph Lameter, Mel Gorman, Michal Nazarewicz, Minchan Kim,
Naoya Horiguchi, Rik van Riel
On 02/03/2015 07:49 AM, Joonsoo Kim wrote:
> On Mon, Jan 19, 2015 at 11:05:15AM +0100, Vlastimil Babka wrote:
>
> Hello,
>
> I don't have any elegant idea, but I have some humble opinions.
>
> The point is that the migration scanner should scan the whole zone.
> Although your pivot approach makes some sense and can scan the whole zone,
> it could cause back-and-forth migration over a very short term whenever
> the scanners approach and pass each other.
I don't understand the scenario you suggest. The scanners don't overlap in any
single run; that doesn't change. If they meet, compaction terminates. They can
"overlap" only if you compare the current run with the previous run, after a
pivot change. Then it's true that e.g. the migration scanner will operate on
pageblocks where the free scanner operated previously. But pivot changes are
only done after the full defer cycle, which is not short term.
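To illustrate the termination rule with wrap-around, a minimal sketch (not the
actual patch code; the 'wrapped' flag is a made-up stand-in for the state the
series really tracks in struct compact_control):

	/*
	 * Sketch only: before either scanner has wrapped, the scanners sit
	 * on opposite sides of the pivot and cannot have met. After a wrap,
	 * the usual pageblock-granularity comparison applies again.
	 */
	static bool scanners_met(struct compact_control *cc)
	{
		if (!cc->scanner_wrapped)	/* hypothetical flag */
			return false;
		return (cc->free_pfn >> pageblock_order) <=
		       (cc->migrate_pfn >> pageblock_order);
	}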
> I think that if we permit
> the scanners to overlap, we don't need to adhere to reverse linear scanning
> in the freepage scanner, since the reverse linear scan would no longer
> prevent back-and-forth migration anyway.
I believe that we still don't permit overlap, but anyway...
> There are two possible solutions to this problem.
> One is that the free scanner scans pfns in the same direction as the migration
> scanner, keeping a proper interval between them:
>
> |=========================|
> MMM==> <Interval> FFF==>
>
> A large enough interval guarantees that back-and-forth migration is prevented,
> at least over a very short period.
That would depend on the termination criteria and what to do after a restart.
You would have to terminate as soon as one scanner approaches the position where
the other started. Otherwise you overlap and migrate back in a single run. So
you terminate, and that will typically mean one of the scanners did not finish
its part fully, so there are pageblocks scanned by neither one. You could adjust
the interval to find the optimal one. But you shouldn't do that immediately in the
next run, as that would overlap the previous run too soon. Or maybe adjust it only
a little... I don't know if that's simpler than my approach; it seems more quirky.
> Or, we could make the free scanner something totally different from a linear
> scan. Linear scanning for free pages wastes much time if system memory
> is really big and most of it is used. If we take free pages from the
> buddy allocator, we can eliminate this scanning overhead. With additional
> logic, that is, comparing the position of a free page with the migration
> scanner and taking it selectively, we can avoid back-and-forth migration
> over a very short period.
I think the metric we should be looking at is the ratio between free pages scanned
and migrate pages scanned. It's true that after this series, in the
allocate-as-thp scenario, it was more than 10 in the first run after reboot.
So maybe it could be more efficient to search the buddy lists. But then again,
when to terminate in this case? The free lists are changing continuously. And to
compare positions, you also need to predict how much the migration scanner will
progress in a single run, because you don't want to take pages from there.
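(For reference, both counters are exported in /proc/vmstat as
compact_migrate_scanned and compact_free_scanned, so the ratio is easy to
watch from userspace; a quick standalone sketch:)

	/* Print the free-scanned / migrate-scanned ratio from /proc/vmstat. */
	#include <stdio.h>
	#include <string.h>

	int main(void)
	{
		char name[64];
		unsigned long long val, migrate = 0, free_scanned = 0;
		FILE *f = fopen("/proc/vmstat", "r");

		if (!f)
			return 1;
		while (fscanf(f, "%63s %llu", name, &val) == 2) {
			if (!strcmp(name, "compact_migrate_scanned"))
				migrate = val;
			else if (!strcmp(name, "compact_free_scanned"))
				free_scanned = val;
		}
		fclose(f);
		if (migrate)
			printf("free/migrate scan ratio: %.2f\n",
			       (double)free_scanned / migrate);
		return 0;
	}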
> Thanks.
>
* Re: [RFC PATCH 0/5] compaction: changing initial position of scanners
2015-02-03 9:05 ` Vlastimil Babka
@ 2015-02-03 15:00 ` Joonsoo Kim
2015-02-03 15:51 ` Vlastimil Babka
0 siblings, 1 reply; 22+ messages in thread
From: Joonsoo Kim @ 2015-02-03 15:00 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Joonsoo Kim, Linux Memory Management List, LKML, Andrew Morton,
David Rientjes, Christoph Lameter, Mel Gorman, Michal Nazarewicz,
Minchan Kim, Naoya Horiguchi, Rik van Riel
2015-02-03 18:05 GMT+09:00 Vlastimil Babka <vbabka@suse.cz>:
> On 02/03/2015 07:49 AM, Joonsoo Kim wrote:
>> On Mon, Jan 19, 2015 at 11:05:15AM +0100, Vlastimil Babka wrote:
>>
>> Hello,
>>
>> I don't have any elegant idea, but I have some humble opinions.
>>
>> The point is that the migration scanner should scan the whole zone.
>> Although your pivot approach makes some sense and can scan the whole zone,
>> it could cause back-and-forth migration over a very short term whenever
>> the scanners approach and pass each other.
>
> I don't understand the scenario you suggest. The scanners don't overlap in any
> single run; that doesn't change. If they meet, compaction terminates. They can
> "overlap" only if you compare the current run with the previous run, after a
> pivot change.
Yeah, I mean this case.

I think that we should regard a single run as a whole-zone scan, rather than
just a termination criterion we have artificially defined, and try to avoid the
back-and-forth problem as much as possible at this scale. The absence of
overlap within a single run that you mentioned doesn't solve the problem at
this scale.
> Then it's true that e.g. the migration scanner will operate on pageblocks where
> the free scanner operated previously. But pivot changes are only done after
> the full defer cycle, which is not short term.
I don't agree that it can't be short term. After a successful run, if the next
high-order request comes immediately, the migration scanner will immediately
restart at the position where the previous free scanner operated.
>
>> I think that if we permit
>> the scanners to overlap, we don't need to adhere to reverse linear scanning
>> in the freepage scanner, since the reverse linear scan would no longer
>> prevent back-and-forth migration anyway.
>
> I believe that we still don't permit overlap, but anyway...
>
>> There are two possible solutions to this problem.
>> One is that the free scanner scans pfns in the same direction as the migration
>> scanner, keeping a proper interval between them:
>>
>> |=========================|
>> MMM==> <Interval> FFF==>
>>
>> A large enough interval guarantees that back-and-forth migration is prevented,
>> at least over a very short period.
>
> That would depend on the termination criteria and what to do after a restart.
> You would have to terminate as soon as one scanner approaches the position where
> the other started. Otherwise you overlap and migrate back in a single run. So
> you terminate, and that will typically mean one of the scanners did not finish
> its part fully, so there are pageblocks scanned by neither one. You could adjust
> the interval to find the optimal one. But you shouldn't do that immediately in the
> next run, as that would overlap the previous run too soon. Or maybe adjust it only
> a little... I don't know if that's simpler than my approach; it seems more quirky.
Yeah, the idea came from a quick thought, so it's not perfect.
In fact, if we regard a single run as a whole-zone scan, the back-and-forth
problem is inevitable. The best we can do is reduce the bad effects of that
problem. With an interval, we don't try to migrate a page which we would
immediately reuse as a free page a very short time later.

I think that we can break the relationship between the free scanner and the
migration scanner, such as the termination criteria. It's not necessary that
the scanned ranges of both scanners sum to the whole zone. The point is that
the migration scanner should scan the whole zone. The free scanner would
adjust its scanning position based on where the migration scanner is,
whenever necessary.
>> Or, we could make the free scanner something totally different from a linear
>> scan. Linear scanning for free pages wastes much time if system memory
>> is really big and most of it is used. If we take free pages from the
>> buddy allocator, we can eliminate this scanning overhead. With additional
>> logic, that is, comparing the position of a free page with the migration
>> scanner and taking it selectively, we can avoid back-and-forth migration
>> over a very short period.
>
> I think the metric we should be looking at is the ratio between free pages scanned
> and migrate pages scanned. It's true that after this series, in the
> allocate-as-thp scenario, it was more than 10 in the first run after reboot.
> So maybe it could be more efficient to search the buddy lists. But then again,
> when to terminate in this case? The free lists are changing continuously. And to
> compare positions, you also need to predict how much the migration scanner will
> progress in a single run, because you don't want to take pages from there.
We can terminate when the whole zone is scanned. As I mentioned above, we can't
avoid the back-and-forth problem when we scan the whole zone range, so I'd like
us to do the best we can: avoid back-and-forth over a very short term. Maybe
taking a free page positioned zone_range/2 away from the migration scanner at
that time would prevent the problem from occurring in a very short term. A
rough sketch of that heuristic follows.
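(Just a sketch of that heuristic, assuming positions wrap within the zone
span; the variable names are made up:)

	/*
	 * Pick a target pfn roughly half the zone span beyond the current
	 * migration scanner position, wrapping within the zone.
	 */
	unsigned long span = zone->spanned_pages;
	unsigned long target_pfn = zone->zone_start_pfn +
		(migrate_pfn - zone->zone_start_pfn + span / 2) % span;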
Thanks.
* Re: [RFC PATCH 0/5] compaction: changing initial position of scanners
2015-02-03 15:00 ` Joonsoo Kim
@ 2015-02-03 15:51 ` Vlastimil Babka
2015-02-03 17:07 ` Joonsoo Kim
0 siblings, 1 reply; 22+ messages in thread
From: Vlastimil Babka @ 2015-02-03 15:51 UTC (permalink / raw)
To: Joonsoo Kim
Cc: Joonsoo Kim, Linux Memory Management List, LKML, Andrew Morton,
David Rientjes, Christoph Lameter, Mel Gorman, Michal Nazarewicz,
Minchan Kim, Naoya Horiguchi, Rik van Riel
On 02/03/2015 04:00 PM, Joonsoo Kim wrote:
> 2015-02-03 18:05 GMT+09:00 Vlastimil Babka <vbabka@suse.cz>:
>> On 02/03/2015 07:49 AM, Joonsoo Kim wrote:
>>> On Mon, Jan 19, 2015 at 11:05:15AM +0100, Vlastimil Babka wrote:
>>>
>>> Hello,
>>>
>>> I don't have any elegant idea, but I have some humble opinions.
>>>
>>> The point is that the migration scanner should scan the whole zone.
>>> Although your pivot approach makes some sense and can scan the whole zone,
>>> it could cause back-and-forth migration over a very short term whenever
>>> the scanners approach and pass each other.
>>
>> I don't understand the scenario you suggest. The scanners don't overlap in any
>> single run; that doesn't change. If they meet, compaction terminates. They can
>> "overlap" only if you compare the current run with the previous run, after a
>> pivot change.
>
> Yeah, I mean this case.
>
> I think that we should regard a single run as a whole-zone scan, rather than
> just a termination criterion we have artificially defined, and try to avoid the
> back-and-forth problem as much as possible at this scale. The absence of
> overlap within a single run that you mentioned doesn't solve the problem at
> this scale.
>
>> Then it's true that e.g. the migration scanner will operate on pageblocks where
>> the free scanner operated previously. But pivot changes are only done after
>> the full defer cycle, which is not short term.
>
> I don't agree that it can't be short term. After a successful run, if the next
> high-order request comes immediately, the migration scanner will immediately
> restart at the position where the previous free scanner operated.
Ah, I think I see where the misunderstanding comes from now. So to clarify,
let's consider:

1. single compaction run - a single invocation of compact_zone(). It can start
from cached pfns from the previous run, or at the zone boundaries (or pivot,
after this series), and terminate with the scanners meeting or not.

2. full zone compaction - consists of one or more compaction runs, where the
first run starts at the boundaries (pivot). It ends when the scanners meet and
compact_finished() returns COMPACT_COMPLETE.

3. compaction after full defer cycle - a full zone compaction where
compaction_restarting() returns true in its first run.

My understanding is that you think pivot changing occurs after each full zone
compaction (definition 2), but in fact it occurs only once per defer cycle
(definition 3). See patch 5 for the detailed reasoning. I don't think that's
short term: it means full zone compactions (def 2) have already failed many
times, and compaction was then deferred for a further period, all using the
same unchanged pivot.
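(To make definition 3 concrete, a sketch of where the pivot change hooks in;
update_pivot() is a made-up stand-in for what patch 5 actually does:)

	/* Only a run restarting after a full defer cycle moves the pivot. */
	if (compaction_restarting(zone, cc->order))
		update_pivot(zone);	/* hypothetical helper */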
I think any of the alternatives you suggested below, where the migration
scanner processes the whole zone during a full zone compaction (2), would
necessarily result in shorter-term back-and-forth migration than this scheme.
On the other hand, the pivot changing proposed here might be too long-term.
But it's a first attempt, and the frequency can be tuned further.
* Re: [RFC PATCH 0/5] compaction: changing initial position of scanners
2015-02-03 15:51 ` Vlastimil Babka
@ 2015-02-03 17:07 ` Joonsoo Kim
2015-02-04 14:39 ` Vlastimil Babka
0 siblings, 1 reply; 22+ messages in thread
From: Joonsoo Kim @ 2015-02-03 17:07 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Joonsoo Kim, Linux Memory Management List, LKML, Andrew Morton,
David Rientjes, Christoph Lameter, Mel Gorman, Michal Nazarewicz,
Minchan Kim, Naoya Horiguchi, Rik van Riel
2015-02-04 0:51 GMT+09:00 Vlastimil Babka <vbabka@suse.cz>:
> On 02/03/2015 04:00 PM, Joonsoo Kim wrote:
>> 2015-02-03 18:05 GMT+09:00 Vlastimil Babka <vbabka@suse.cz>:
>>> On 02/03/2015 07:49 AM, Joonsoo Kim wrote:
>>>> On Mon, Jan 19, 2015 at 11:05:15AM +0100, Vlastimil Babka wrote:
>>>>
>>>> Hello,
>>>>
>>>> I don't have any elegant idea, but I have some humble opinions.
>>>>
>>>> The point is that the migration scanner should scan the whole zone.
>>>> Although your pivot approach makes some sense and can scan the whole zone,
>>>> it could cause back-and-forth migration over a very short term whenever
>>>> the scanners approach and pass each other.
>>>
>>> I don't understand the scenario you suggest. The scanners don't overlap in any
>>> single run; that doesn't change. If they meet, compaction terminates. They can
>>> "overlap" only if you compare the current run with the previous run, after a
>>> pivot change.
>>
>> Yeah, I mean this case.
>>
>> I think that we should regard a single run as a whole-zone scan, rather than
>> just a termination criterion we have artificially defined, and try to avoid the
>> back-and-forth problem as much as possible at this scale. The absence of
>> overlap within a single run that you mentioned doesn't solve the problem at
>> this scale.
>>
>>> Then it's true that e.g. the migration scanner will operate on pageblocks where
>>> the free scanner operated previously. But pivot changes are only done after
>>> the full defer cycle, which is not short term.
>>
>> I don't agree that it can't be short term. After a successful run, if the next
>> high-order request comes immediately, the migration scanner will immediately
>> restart at the position where the previous free scanner operated.
>
> Ah, I think I see where the misunderstanding comes from now. So to clarify,
> let's consider:
>
> 1. single compaction run - a single invocation of compact_zone(). It can start
> from cached pfns from the previous run, or at the zone boundaries (or pivot,
> after this series), and terminate with the scanners meeting or not.
>
> 2. full zone compaction - consists of one or more compaction runs, where the
> first run starts at the boundaries (pivot). It ends when the scanners meet and
> compact_finished() returns COMPACT_COMPLETE.
>
> 3. compaction after full defer cycle - a full zone compaction where
> compaction_restarting() returns true in its first run.
>
> My understanding is that you think pivot changing occurs after each full zone
> compaction (definition 2), but in fact it occurs only once per defer cycle
> (definition 3). See patch 5 for the detailed reasoning. I don't think that's
> short term: it means full zone compactions (def 2) have already failed many
> times, and compaction was then deferred for a further period, all using the
> same unchanged pivot.
Ah... thanks for clarifying. I had indeed thought pivot changing occurs at
definition 2, as you guessed. :)
> I think any of the alternatives you suggested below, where the migration
> scanner processes the whole zone during a full zone compaction (2), would
> necessarily result in shorter-term back-and-forth migration than this scheme.
> On the other hand, the pivot changing proposed here might be too long-term.
> But it's a first attempt, and the frequency can be tuned further.
Yes, your proposal would be less problematic regarding the back-and-forth
problem than my suggestion.

Hmm... nevertheless, I can't completely agree with the pivot approach.

I'd like to take this chance to remove dependencies between the migration
scanner and the free scanner, such as the termination criteria. The meeting
position of the scanners is roughly determined by the amount of free memory in
the zone. If 200 MB is free in the zone, the migration scanner can scan at
most 200 MB from the pivot's start pfn. Without changing the pivot quickly, we
can scan only this region, regardless of zone size, so it has a bad effect on
high-order allocation for a long time.
In the stress-highalloc test, it doesn't matter, since we attempt a lot of
allocations and this bad effect would not appear easily. Although allocation
attempts in the middle fail, later attempts would succeed, since the pivot
would be changed in the middle of the attempts.

But in a real-world scenario, every allocation attempt is precious, and it'd be
better for the first-coming high-order allocation request to succeed; this is a
different problem from the allocation success rate in the stress-highalloc
test. To accomplish it, we need to change the pivot as soon as possible.
Without that, we could miss some precious allocation attempts until the pivot
is changed. For this purpose, we should remove the defer logic or make it
looser, and then resetting the pivot would occur sooner, so we could encounter
the back-and-forth problem frequently.
Therefore, it'd be better to change the compaction logic more fundamentally.
Thanks.
* Re: [RFC PATCH 0/5] compaction: changing initial position of scanners
2015-02-03 17:07 ` Joonsoo Kim
@ 2015-02-04 14:39 ` Vlastimil Babka
2015-02-10 8:54 ` Joonsoo Kim
0 siblings, 1 reply; 22+ messages in thread
From: Vlastimil Babka @ 2015-02-04 14:39 UTC (permalink / raw)
To: Joonsoo Kim
Cc: Joonsoo Kim, Linux Memory Management List, LKML, Andrew Morton,
David Rientjes, Christoph Lameter, Mel Gorman, Michal Nazarewicz,
Minchan Kim, Naoya Horiguchi, Rik van Riel
On 02/03/2015 06:07 PM, Joonsoo Kim wrote:
> 2015-02-04 0:51 GMT+09:00 Vlastimil Babka <vbabka@suse.cz>:
>> Ah, I think I see where the misunderstanding comes from now. So to clarify,
>> let's consider:
>>
>> 1. single compaction run - a single invocation of compact_zone(). It can start
>> from cached pfns from the previous run, or at the zone boundaries (or pivot,
>> after this series), and terminate with the scanners meeting or not.
>>
>> 2. full zone compaction - consists of one or more compaction runs, where the
>> first run starts at the boundaries (pivot). It ends when the scanners meet and
>> compact_finished() returns COMPACT_COMPLETE.
>>
>> 3. compaction after full defer cycle - a full zone compaction where
>> compaction_restarting() returns true in its first run.
>>
>> My understanding is that you think pivot changing occurs after each full zone
>> compaction (definition 2), but in fact it occurs only once per defer cycle
>> (definition 3). See patch 5 for the detailed reasoning. I don't think that's
>> short term: it means full zone compactions (def 2) have already failed many
>> times, and compaction was then deferred for a further period, all using the
>> same unchanged pivot.
>
> Ah... thanks for clarifying. I had indeed thought pivot changing occurs at
> definition 2, as you guessed. :)
Great it's clarified :)
>> I think any of the alternatives you suggested below, where the migration
>> scanner processes the whole zone during a full zone compaction (2), would
>> necessarily result in shorter-term back-and-forth migration than this scheme.
>> On the other hand, the pivot changing proposed here might be too long-term.
>> But it's a first attempt, and the frequency can be tuned further.
>
> Yes, your proposal would be less problematic regarding the back-and-forth
> problem than my suggestion.
>
> Hmm... nevertheless, I can't completely agree with the pivot approach.
>
> I'd like to take this chance to remove dependencies between the migration
> scanner and the free scanner, such as the termination criteria. The meeting
> position of the scanners is roughly
Well at some point compaction should terminate if it scanned the whole
zone, and failed. How else to check that than using the scanner positions?
> determined by the amount of free memory in the zone. If 200 MB is free in
> the zone, the migration scanner can scan at most 200 MB from the pivot's
> start pfn. Without changing the pivot quickly, we can scan only this region,
> regardless of zone size, so it has a bad effect on high-order allocation for
> a long time.
>
> In the stress-highalloc test, it doesn't matter, since we attempt a lot of
> allocations and this bad effect would not appear easily. Although allocation
> attempts in the middle fail, later attempts would succeed, since the pivot
> would be changed in the middle of the attempts.
OK, that might be true. It's not a perfect benchmark.
> But in a real-world scenario, every allocation attempt is precious, and it'd be
> better for the first-coming high-order allocation request to succeed; this is a
> different problem from the allocation success rate in the stress-highalloc
> test. To accomplish it, we need to change the pivot as soon as possible.
> Without that, we could miss some precious allocation attempts until the pivot
> is changed. For this purpose, we should remove the defer logic or make it
> looser, and then resetting the pivot would occur sooner, so we could encounter
> the back-and-forth problem frequently.
It seems to me that you can't have both the "migration scanner should
try scanning the whole zone during a single compaction (or during
relatively few attempts)" and "we shouldn't migrate pages that we have
just (relatively recently) migrated", in any scheme, including the two
you proposed in the previous mail. These goals just go against each other.

In any scheme you should divide the zone between a part that's scanned for
pages to migrate from, and a part that's scanned for free pages as migration
targets. If you don't divide, then you end up migrating back and forth
instantly, which would be bad.
So what happens when you run out of free pages in the part that was
reserved for the free scanner (this is what happens in the current scheme
when the scanners meet)? If you wanted to continue with the migration
scanner, the only free pages left are in the part which the migration
scanner has just processed. And funnily enough, the pivot changing scheme
will put the free scanner in just the position to scan this part. But
doing that immediately could mean excessive migration.
Your alternative scheme, where the free scanner follows the migration
scanner at some distance, is not very different in this sense. If you
manage to tune the distance properly, you will also scan for free pages in
the part that was just processed by the migration scanner. It might be
more efficient in that you don't rescan the part that the migration
scanner didn't reach, both before and after a pivot change. But
fundamentally, it means migrating pages that were recently migrated.
> Therefore, it'd be better to change the compaction logic more fundamentally.
Maybe it's indeed better to excessively migrate than to keep rescanning the
same pageblocks and then defer compaction. But we shouldn't forget that the
immediate success rate is not the only criterion. We should also keep the
overhead sane. That's why there's the pageblock suitability bitfield,
deferred compaction etc., and I'm not sure how those would fit into the
"continuously progressing migration scanner" scheme.
So here's what I think should precede such an increase in compaction
aggressiveness:
- on direct compaction, only migrate when we have successfully isolated all
the pages needed for merging a page of the desired order. I had such a patch
in one series last year already, but it affected the anti-fragmentation
effects of compaction.
- no more THP page faults (also for other good reasons); leave
collapsing to khugepaged, or rather task_work, leaving only the
expensive sync compaction to dedicated per-node daemons. These should
hopefully solve the anti-fragmentation issue as well.
> Thanks.
>
* Re: [RFC PATCH 0/5] compaction: changing initial position of scanners
2015-02-04 14:39 ` Vlastimil Babka
@ 2015-02-10 8:54 ` Joonsoo Kim
0 siblings, 0 replies; 22+ messages in thread
From: Joonsoo Kim @ 2015-02-10 8:54 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Linux Memory Management List, LKML, Andrew Morton,
David Rientjes, Christoph Lameter, Mel Gorman, Michal Nazarewicz,
Minchan Kim, Naoya Horiguchi, Rik van Riel
On Wed, Feb 04, 2015 at 03:39:25PM +0100, Vlastimil Babka wrote:
> On 02/03/2015 06:07 PM, Joonsoo Kim wrote:
> >2015-02-04 0:51 GMT+09:00 Vlastimil Babka <vbabka@suse.cz>:
> >>Ah, I think I see where the misunderstanding comes from now. So to clarify,
> >>let's consider:
> >>
> >>1. single compaction run - a single invocation of compact_zone(). It can start
> >>from cached pfns from the previous run, or at the zone boundaries (or pivot,
> >>after this series), and terminate with the scanners meeting or not.
> >>
> >>2. full zone compaction - consists of one or more compaction runs, where the
> >>first run starts at the boundaries (pivot). It ends when the scanners meet and
> >>compact_finished() returns COMPACT_COMPLETE.
> >>
> >>3. compaction after full defer cycle - a full zone compaction where
> >>compaction_restarting() returns true in its first run.
> >>
> >>My understanding is that you think pivot changing occurs after each full zone
> >>compaction (definition 2), but in fact it occurs only once per defer cycle
> >>(definition 3). See patch 5 for the detailed reasoning. I don't think that's
> >>short term: it means full zone compactions (def 2) have already failed many
> >>times, and compaction was then deferred for a further period, all using the
> >>same unchanged pivot.
> >
> >Ah... thanks for clarifying. I had indeed thought pivot changing occurs at
> >definition 2, as you guessed. :)
>
> Great it's clarified :)
>
> >>I think any of the alternatives you suggested below, where the migration
> >>scanner processes the whole zone during a full zone compaction (2), would
> >>necessarily result in shorter-term back-and-forth migration than this scheme.
> >>On the other hand, the pivot changing proposed here might be too long-term.
> >>But it's a first attempt, and the frequency can be tuned further.
> >
> >Yes, your proposal would be less problematic regarding the back-and-forth
> >problem than my suggestion.
> >
> >Hmm... nevertheless, I can't completely agree with the pivot approach.
> >
> >I'd like to take this chance to remove dependencies between the migration
> >scanner and the free scanner, such as the termination criteria. The meeting
> >position of the scanners is roughly
>
> Well at some point compaction should terminate if it scanned the
> whole zone, and failed. How else to check that than using the
> scanner positions?
We can count the number of pageblocks we have scanned, and if it matches
the zone span, we can terminate. Roughly, as sketched below.
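(A sketch with a made-up counter field, not actual kernel code:)

	/*
	 * Bump once per pageblock the migration scanner visits; declare the
	 * zone fully scanned once the count reaches the number of pageblocks
	 * the zone spans, wherever the scan started.
	 */
	cc->pageblocks_scanned++;	/* hypothetical field */
	if (cc->pageblocks_scanned >= (zone->spanned_pages >> pageblock_order))
		return COMPACT_COMPLETE;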
> >determined by the amount of free memory in the zone. If 200 MB is free in
> >the zone, the migration scanner can scan at most 200 MB from the pivot's
> >start pfn. Without changing the pivot quickly, we can scan only this region,
> >regardless of zone size, so it has a bad effect on high-order allocation
> >for a long time.
> >
> >In the stress-highalloc test, it doesn't matter, since we attempt a lot of
> >allocations and this bad effect would not appear easily. Although allocation
> >attempts in the middle fail, later attempts would succeed, since the pivot
> >would be changed in the middle of the attempts.
>
> OK, that might be true. It's not a perfect benchmark.
>
> >But in a real-world scenario, every allocation attempt is precious, and it'd
> >be better for the first-coming high-order allocation request to succeed; this
> >is a different problem from the allocation success rate in the
> >stress-highalloc test. To accomplish it, we need to change the pivot as soon
> >as possible. Without that, we could miss some precious allocation attempts
> >until the pivot is changed. For this purpose, we should remove the defer
> >logic or make it looser, and then resetting the pivot would occur sooner, so
> >we could encounter the back-and-forth problem frequently.
>
> It seems to me that you can't have both the "migration scanner
> should try scanning the whole zone during a single compaction (or
> during relatively few attempts)" and "we shouldn't migrate pages
> that we have just (relatively recently) migrated", in any scheme,
> including the two you proposed in the previous mail. These goals
> just go against each other.
>
> In any scheme you should divide the zone between a part that's
> scanned for pages to migrate from, and a part that's scanned for
> free pages as migration targets. If you don't divide, then you end
> up migrating back and forth instantly, which would be bad.
Okay. My proposal isn't perfect; it was just a quick thought. :)
I hope that it is a seed for a better idea, not the solution itself.
It may be true that we need to divide the region between the scanners.
>
> So what happens when you run out of free pages in the part that was
> reserved for the free scanner (this is what happens in the current
> scheme when the scanners meet)? If you wanted to continue with the
> migration scanner, the only free pages left are in the part which
> the migration scanner has just processed. And funnily enough, the
> pivot changing scheme will put the free scanner in just the position
> to scan this part. But doing that immediately could mean excessive
> migration.
>
> Your alternative scheme, where the free scanner follows the migration
> scanner at some distance, is not very different in this sense. If you
> manage to tune the distance properly, you will also scan for free
> pages in the part that was just processed by the migration scanner.
> It might be more efficient in that you don't rescan the part that the
> migration scanner didn't reach, both before and after a pivot change.
> But fundamentally, it means migrating pages that were recently
> migrated.
What I'm worried about in the pivot approach is that if the pivot is changed
frequently, due to a burst of failed high-order allocations or because we
decide to make compaction more aggressive, immediate back-and-forth migration
could happen. Isn't that the problem we need to consider here?

One of the main differences in my approach is the direction of the free
scanner: if we scan in the same direction as the migration scanner, we can
guarantee that migrated pages are migrated again as late as possible. But
with the reverse direction, it's not possible to keep this order. See the
following example.
PIVOT (REVERSE DIRECTION)
|---1---2---3-----|------------------|
=> after compaction
|-----------------|---3---2---1------|
=> after the pivot changes, page #3, which was migrated most recently, is
migrated first.

SAME DIRECTION
|---1---2---3-----|------------------|
=> after compaction
|-----------------|---1---2---3------|
=> we keep the order of the migratable pages.
I know that if we want to scan the whole zone range, we can't avoid
back-and-forth migration. The best thing we can do is something like this,
which preserves the order of migration, and that is what my proposal aims at.
Thanks.
> >Therefore, it'd be better to change the compaction logic more fundamentally.
>
> Maybe it's indeed better to excessively migrate than to keep rescanning
> the same pageblocks and then defer compaction. But we shouldn't
> forget that the immediate success rate is not the only criterion. We
> should also keep the overhead sane. That's why there's the pageblock
> suitability bitfield, deferred compaction etc., and I'm not sure how
> those would fit into the "continuously progressing migration
> scanner" scheme.
> So here's what I think should precede such an increase in compaction
> aggressiveness:
> - on direct compaction, only migrate when we have successfully
> isolated all the pages needed for merging a page of the desired
> order. I had such a patch in one series last year already, but it
> affected the anti-fragmentation effects of compaction.
> - no more THP page faults (also for other good reasons); leave
> collapsing to khugepaged, or rather task_work, leaving only the
> expensive sync compaction to dedicated per-node daemons. These
> should hopefully solve the anti-fragmentation issue as well.
>
> >Thanks.
> >
>
Thread overview: 22+ messages
2015-01-19 10:05 [RFC PATCH 0/5] compaction: changing initial position of scanners Vlastimil Babka
2015-01-19 10:05 ` [PATCH 1/5] mm, compaction: more robust check for scanners meeting Vlastimil Babka
2015-01-20 13:26 ` Zhang Yanfei
2015-01-19 10:05 ` [PATCH 2/5] mm, compaction: simplify handling restart position in free pages scanner Vlastimil Babka
2015-01-20 13:27 ` Zhang Yanfei
2015-01-20 13:31 ` Vlastimil Babka
2015-01-19 10:05 ` [PATCH 3/5] mm, compaction: encapsulate resetting cached scanner positions Vlastimil Babka
2015-01-20 13:30 ` Zhang Yanfei
2015-01-22 15:22 ` Vlastimil Babka
2015-01-19 10:05 ` [RFC PATCH 4/5] mm, compaction: allow scanners to start at any pfn within the zone Vlastimil Babka
2015-01-20 15:18 ` Zhang Yanfei
2015-01-22 15:24 ` Vlastimil Babka
2015-01-19 10:05 ` [RFC PATCH 5/5] mm, compaction: set pivot pfn to the pfn when scanners met last time Vlastimil Babka
2015-01-19 10:10 ` [RFC PATCH 0/5] compaction: changing initial position of scanners Vlastimil Babka
2015-01-20 9:02 ` Vlastimil Babka
2015-02-03 6:49 ` Joonsoo Kim
2015-02-03 9:05 ` Vlastimil Babka
2015-02-03 15:00 ` Joonsoo Kim
2015-02-03 15:51 ` Vlastimil Babka
2015-02-03 17:07 ` Joonsoo Kim
2015-02-04 14:39 ` Vlastimil Babka
2015-02-10 8:54 ` Joonsoo Kim