* [RFC 0/5] Consider higher small zone and mmaped-pages stream
From: Minchan Kim @ 2012-08-22 7:15 UTC
To: Andrew Morton
Cc: Mel Gorman, Rik van Riel, Johannes Weiner, linux-mm,
linux-kernel, Minchan Kim
This patchset solves two problems:
1. higher small memory zone - [2] and [3]
2. mmapped-pages stream reclaim efficiency - [5]
[1] and [4] are minor fixes that aren't related to
this series, so they could be applied separately.
Each problem is described in its patch description,
so please look at the individual patches.
The test environment is as follows:
1. Intel(R) Core(TM)2 Duo CPU
2. 2G RAM with a 400M movable zone
3. Test program:
   Hannes's mapped-file-stream.c with 78 processes per 1G,
   executed 10 times.
Thanks.
Minchan Kim (5):
[1] vmscan: Fix obsolete comment of balance_pgdat
[2] vmscan: sleep only if backingdev is congested
[3] vmscan: prevent excessive pageout of kswapd
[4] vmscan: get rid of unnecessary nr_dirty ret variable
[5] vmscan: accelerate to reclaim mapped-pages stream
include/linux/mmzone.h | 23 +++++++++++++++
mm/vmscan.c | 77 ++++++++++++++++++++++++++++++++++++++----------
2 files changed, 85 insertions(+), 15 deletions(-)
--
1.7.9.5
* [PATCH 1/5] vmscan: Fix obsolete comment of balance_pgdat
From: Minchan Kim @ 2012-08-22 7:15 UTC
To: Andrew Morton
Cc: Mel Gorman, Rik van Riel, Johannes Weiner, linux-mm,
linux-kernel, Minchan Kim, Nick Piggin
This patch corrects comments of balance_pgdat() made obsolete by [1] and [2].
[1] 7ac6218, kswapd lockup fix
[2] 32a4330, mm: prevent kswapd from freeing excessive amounts of lowmem
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
mm/vmscan.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 8d01243..f015d92 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2472,16 +2472,17 @@ static bool prepare_kswapd_sleep(pg_data_t *pgdat, int order, long remaining,
* This can happen if the pages are all mlocked, or if they are all used by
* device drivers (say, ZONE_DMA). Or if they are all in use by hugetlb.
* What we do is to detect the case where all pages in the zone have been
- * scanned twice and there has been zero successful reclaim. Mark the zone as
- * dead and from now on, only perform a short scan. Basically we're polling
- * the zone for when the problem goes away.
+ * scanned more than six times the number of reclaimable pages and there
+ * has been zero successful reclaim. Mark the zone as dead and from now
+ * on, only perform a short scan. Basically we're polling the zone for
+ * when the problem goes away.
*
* kswapd scans the zones in the highmem->normal->dma direction. It skips
* zones which have free_pages > high_wmark_pages(zone), but once a zone is
- * found to have free_pages <= high_wmark_pages(zone), we scan that zone and the
- * lower zones regardless of the number of free pages in the lower zones. This
- * interoperates with the page allocator fallback scheme to ensure that aging
- * of pages is balanced across the zones.
+ * found to have free_pages <= high_wmark_pages(zone), we scan that zone and
+ * lower zones which don't have too many pages free. This interoperates with
+ * the page allocator fallback scheme to ensure that aging of pages is balanced
+ * across the zones.
*/
static unsigned long balance_pgdat(pg_data_t *pgdat, int order,
int *classzone_idx)
--
1.7.9.5
* [PATCH 2/5] vmscan: sleep only if backingdev is congested
From: Minchan Kim @ 2012-08-22 7:15 UTC
To: Andrew Morton
Cc: Mel Gorman, Rik van Riel, Johannes Weiner, linux-mm,
linux-kernel, Minchan Kim
In a small high zone (e.g., a 40M movable zone), the reclaim priority
can be raised easily, so the congestion_wait() in balance_pgdat() can
make kswapd sleep unnecessarily and processes end up entering the
direct reclaim path. That means longer latency for those processes.
This patch replaces congestion_wait() with wait_iff_congested() so
kswapd will sleep only if the backing device really is congested.
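
To make the difference concrete, here is a toy userspace model of the
two back-off primitives. It is illustrative only; old_backoff() and
new_backoff() are made-up names standing in for congestion_wait() and
wait_iff_congested(), and the real change is the diff below.

  #include <stdbool.h>

  /* Toy model: returns how many jiffies the caller would sleep. */
  static long old_backoff(long timeout, bool congested)
  {
          (void)congested;        /* congestion_wait() ignores the state */
          return timeout;         /* always naps for up to the timeout */
  }

  static long new_backoff(long timeout, bool congested)
  {
          /* wait_iff_congested() skips the nap when nothing is congested */
          return congested ? timeout : 0;
  }

  int main(void)
  {
          long hz_10 = 100 / 10;  /* HZ/10, assuming HZ == 100 */

          /* idle backing device: kswapd no longer stalls for HZ/10 */
          return (old_backoff(hz_10, false) == hz_10 &&
                  new_backoff(hz_10, false) == 0) ? 0 : 1;
  }

With an uncongested device the old code still slept, pushing allocators
into direct reclaim; the new code returns immediately so kswapd keeps
making progress.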
== mapped-file-stream              before        after      (diff, diff%)
Elapsed 676 663 (-13.00, -1.92%)
nr_vmscan_write 91 1341 (1250.00, 1373.63%)
nr_vmscan_immediate_reclaim 0 0 (0.00, 0.00%)
pgpgin 29932 21668 (-8264.00,-27.61%)
pgpgout 3652 8392 (4740.00, 129.79%)
pswpin 0 22 (22.00, 0.00%)
pswpout 91 1341 (1250.00, 1373.63%)
pgactivate 15686 16217 (531.00, 3.39%)
pgdeactivate 14171 15431 (1260.00, 8.89%)
pgfault 204523237 204524355 (1118.00, 0.00%)
pgmajfault 204472586 204472528 (-58.00, -0.00%)
pgsteal_kswapd_dma 149066 466676 (317610.00,213.07%)
pgsteal_kswapd_normal 56219654 49663877 (-6555777.00,-11.66%)
pgsteal_kswapd_high 92860817 138182330 (45321513.00,48.81%)
pgsteal_kswapd_movable 1211389 4236726 (3025337.00,249.74%)
pgsteal_direct_dma 35808 9306 (-26502.00,-74.01%)
pgsteal_direct_normal 21270282 123835 (-21146447.00,-99.42%)
pgsteal_direct_high 21051650 274887 (-20776763.00,-98.69%)
pgsteal_direct_movable 250572 38011 (-212561.00,-84.83%)
pgscan_kswapd_dma 325126 947813 (622687.00,191.52%)
pgscan_kswapd_normal 111171753 97902722 (-13269031.00,-11.94%)
pgscan_kswapd_high 178149789 274337809 (96188020.00,53.99%)
pgscan_kswapd_movable 2537926 8496474 (5958548.00,234.78%)
pgscan_direct_dma 56919 22855 (-34064.00,-59.85%)
pgscan_direct_normal 45698152 3604954 (-42093198.00,-92.11%)
pgscan_direct_high 51326549 4504909 (-46821640.00,-91.22%)
pgscan_direct_movable 433830 105418 (-328412.00,-75.70%)
pgscan_direct_throttle 0 0 (0.00, 0.00%)
pginodesteal 6721 11111 (4390.00, 65.32%)
slabs_scanned 57344 56320 (-1024.00,-1.79%)
kswapd_inodesteal 36327 31121 (-5206.00,-14.33%)
kswapd_low_wmark_hit_quickly 533 4607 (4074.00, 764.35%)
kswapd_high_wmark_hit_quickly 39 432 (393.00, 1007.69%)
kswapd_skip_congestion_wait 71505 10254 (-61251.00,-85.66%)
pageoutrun 2406110 2879697 (473587.00,19.68%)
allocstall 696424 8222 (-688202.00,-98.82%)
pgrotated 91 1341 (1250.00, 1373.63%)
kswapd_totalscan 292184594 381684818 (89500224.00,30.63%)
kswapd_totalsteal 150440926 192549609 (42108683.00,27.99%)
Kswapd_efficiency 51.00 50.00 (-1.00, -1.96%)
direct_totalscan 97515450 8238136 (-89277314.00,-91.55%)
direct_totalsteal 42608312 446039 (-42162273.00,-98.95%)
direct_efficiency 43.00 5.00 (-38.00, -88.37%)
reclaim_velocity 576479.35 588119.08 (11639.73,2.02%)
Elapsed time of the test program is reduced by 13 seconds.
As I expected, kswapd's scanning/reclaim volume increased by about 30%,
but kswapd's efficiency is still good. allocstall was reduced by about
98%, which I think is the most important factor in the reduced elapsed
time of the test program.
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
mm/vmscan.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index f015d92..d1ebe69 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2705,8 +2705,16 @@ loop_again:
if (total_scanned && (sc.priority < DEF_PRIORITY - 2)) {
if (has_under_min_watermark_zone)
count_vm_event(KSWAPD_SKIP_CONGESTION_WAIT);
- else
- congestion_wait(BLK_RW_ASYNC, HZ/10);
+ else {
+ for (i = 0; i <= end_zone; i++) {
+ struct zone *zone = pgdat->node_zones
+ + i;
+ if (!populated_zone(zone))
+ continue;
+ wait_iff_congested(zone, BLK_RW_ASYNC,
+ HZ/10);
+ }
+ }
}
/*
--
1.7.9.5
* [PATCH 3/5] vmscan: prevent excessive pageout of kswapd
From: Minchan Kim @ 2012-08-22 7:15 UTC
To: Andrew Morton
Cc: Mel Gorman, Rik van Riel, Johannes Weiner, linux-mm,
linux-kernel, Minchan Kim
If a higher zone is very small, the reclaim priority can be raised
easily while the lower zones still have enough free pages. When one
of the lower zones then fails to meet its high watermark, it is
scanned with the high priority driven up by the small higher zone.
That ends up reclaiming excessive pages: I saw 8~16M paged out in my
KVM test although we needed just a few KB.
This patch temporarily damps the priority to the average of the
current and previous reclaim priorities for the zone; only if we
can't reclaim enough pages at that damped priority do we keep using
the big-jump high priority on later passes.
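
To make the damping rule concrete, here is a small self-contained
sketch of the calculation (illustrative only; damped_priority() is a
made-up helper name, and the real change is the diff below):

  #include <stdio.h>

  #define DEF_PRIORITY 12         /* lower number == more aggressive scan */

  /*
   * If the scan priority has jumped by more than one step since the
   * last visit to this zone, scan first at the midpoint between the
   * old and new priorities; otherwise use the new priority as-is.
   */
  static int damped_priority(int prev_priority, int sc_priority)
  {
          if (prev_priority - sc_priority > 1)
                  return (prev_priority + sc_priority) >> 1;
          return sc_priority;
  }

  int main(void)
  {
          /* a tiny high zone drove sc.priority from 12 down to 6 */
          printf("%d\n", damped_priority(DEF_PRIORITY, 6));      /* 9 */
          /* a one-step change is left alone */
          printf("%d\n", damped_priority(7, 6));                 /* 6 */
          return 0;
  }

Only if the midpoint priority fails to reclaim enough pages does the
fully raised priority get used on the next pass.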
== mapped-file-stream              before        after      (diff, diff%)
Elapsed 663 665 (2.00, 0.30%)
nr_vmscan_write 1341 849 (-492.00, -36.69%)
nr_vmscan_immediate_reclaim 0 8 (8.00, 0.00%)
pgpgin 21668 30280 (8612.00, 39.75%)
pgpgout 8392 6396 (-1996.00,-23.78%)
pswpin 22 8 (-14.00, -63.64%)
pswpout 1341 849 (-492.00, -36.69%)
pgactivate 16217 15959 (-258.00, -1.59%)
pgdeactivate 15431 15303 (-128.00, -0.83%)
pgfault 204524355 204524410 (55.00, 0.00%)
pgmajfault 204472528 204472602 (74.00, 0.00%)
pgsteal_kswapd_dma 466676 475265 (8589.00, 1.84%)
pgsteal_kswapd_normal 49663877 51289479 (1625602.00,3.27%)
pgsteal_kswapd_high 138182330 135817904 (-2364426.00,-1.71%)
pgsteal_kswapd_movable 4236726 4380123 (143397.00,3.38%)
pgsteal_direct_dma 9306 11910 (2604.00, 27.98%)
pgsteal_direct_normal 123835 165012 (41177.00,33.25%)
pgsteal_direct_high 274887 309271 (34384.00,12.51%)
pgsteal_direct_movable 38011 45638 (7627.00, 20.07%)
pgscan_kswapd_dma 947813 972089 (24276.00,2.56%)
pgscan_kswapd_normal 97902722 100850050 (2947328.00,3.01%)
pgscan_kswapd_high 274337809 269039236 (-5298573.00,-1.93%)
pgscan_kswapd_movable 8496474 8774392 (277918.00,3.27%)
pgscan_direct_dma 22855 26410 (3555.00, 15.55%)
pgscan_direct_normal 3604954 4186439 (581485.00,16.13%)
pgscan_direct_high 4504909 5132110 (627201.00,13.92%)
pgscan_direct_movable 105418 122790 (17372.00,16.48%)
pgscan_direct_throttle 0 0 (0.00, 0.00%)
pginodesteal 11111 6836 (-4275.00,-38.48%)
slabs_scanned 56320 56320 (0.00, 0.00%)
kswapd_inodesteal 31121 35904 (4783.00, 15.37%)
kswapd_low_wmark_hit_quickly 4607 5193 (586.00, 12.72%)
kswapd_high_wmark_hit_quickly 432 421 (-11.00, -2.55%)
kswapd_skip_congestion_wait 10254 12375 (2121.00, 20.68%)
pageoutrun 2879697 3071912 (192215.00,6.67%)
allocstall 8222 9727 (1505.00, 18.30%)
pgrotated 1341 850 (-491.00, -36.61%)
kswapd_totalscan 381684818 379635767 (-2049051.00,-0.54%)
kswapd_totalsteal 192549609 191962771 (-586838.00,-0.30%)
Kswapd_efficiency 50.00 50.00 (0.00, 0.00%)
direct_totalscan 8238136 9467749 (1229613.00,14.93%)
direct_totalsteal 446039 531831 (85792.00,19.23%)
direct_efficiency 5.00 5.00 (0.00, 0.00%)
reclaim_velocity 588119.08 585118.06 (-3001.02,-0.51%)
Elapsed time of the test program increases slightly compared to the
previous patch [2/5], but the number of reclaimed pages is much lower:
before-patch: 192995648 after-patch: 192494602 diff: 501046 (about 2G)
Since kswapd reclaims fewer pages per run than the old behavior,
kswapd's pageoutrun goes up and allocstall also increases, by about
18%. Yes, that's not good in this workload, but the old behavior only
worked well by *luck*: it reclaimed many more pages than necessary, so
we could avoid entering the reclaim path frequently.
The downside of that over-reclaim is that it might evict part of the
working set; this patch prevents that problem without a big cost, I
believe.
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
mm/vmscan.c | 24 +++++++++++++++++++++++-
1 file changed, 23 insertions(+), 1 deletion(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index d1ebe69..0e2550c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2492,6 +2492,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order,
int i;
int end_zone = 0; /* Inclusive. 0 = ZONE_DMA */
unsigned long total_scanned;
+ int prev_priority[MAX_NR_ZONES];
struct reclaim_state *reclaim_state = current->reclaim_state;
unsigned long nr_soft_reclaimed;
unsigned long nr_soft_scanned;
@@ -2513,6 +2514,8 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order,
loop_again:
total_scanned = 0;
sc.priority = DEF_PRIORITY;
+ for (i = 0; i < MAX_NR_ZONES; i++)
+ prev_priority[i] = DEF_PRIORITY;
sc.nr_reclaimed = 0;
sc.may_writepage = !laptop_mode;
count_vm_event(PAGEOUTRUN);
@@ -2635,6 +2638,21 @@ loop_again:
!zone_watermark_ok_safe(zone, testorder,
high_wmark_pages(zone) + balance_gap,
end_zone, 0)) {
+				/*
+				 * If a higher zone is very small, the
+				 * priority can be raised easily while
+				 * lower zones have enough free pages.
+				 * When a lower zone misses its high
+				 * watermark, it is scanned with the
+				 * high priority driven up by the small
+				 * higher zone, reclaiming excessive
+				 * pages. Let's lower it temporarily.
+				 */
+ int tmp_priority = sc.priority;
+ if ((prev_priority[i] - sc.priority) > 1)
+ sc.priority = (prev_priority[i] +
+ sc.priority) >> 1;
+
shrink_zone(zone, &sc);
reclaim_state->reclaimed_slab = 0;
@@ -2644,7 +2662,11 @@ loop_again:
if (nr_slab == 0 && !zone_reclaimable(zone))
zone->all_unreclaimable = 1;
- }
+
+ prev_priority[i] = tmp_priority;
+ sc.priority = tmp_priority;
+ } else
+ prev_priority[i] = DEF_PRIORITY;
/*
* If we've done a decent amount of scanning and
--
1.7.9.5
* [PATCH 4/5] vmscan: get rid of unnecessary nr_dirty ret variable
From: Minchan Kim @ 2012-08-22 7:15 UTC
To: Andrew Morton
Cc: Mel Gorman, Rik van Riel, Johannes Weiner, linux-mm,
linux-kernel, Minchan Kim
Nothing uses nr_dirty anymore, so remove it.
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
mm/vmscan.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 0e2550c..1a66680 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -674,7 +674,6 @@ static enum page_references page_check_references(struct page *page,
static unsigned long shrink_page_list(struct list_head *page_list,
struct zone *zone,
struct scan_control *sc,
- unsigned long *ret_nr_dirty,
unsigned long *ret_nr_writeback)
{
LIST_HEAD(ret_pages);
@@ -955,7 +954,6 @@ keep:
list_splice(&ret_pages, page_list);
count_vm_events(PGACTIVATE, pgactivate);
mem_cgroup_uncharge_end();
- *ret_nr_dirty += nr_dirty;
*ret_nr_writeback += nr_writeback;
return nr_reclaimed;
}
@@ -1236,7 +1234,6 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
unsigned long nr_scanned;
unsigned long nr_reclaimed = 0;
unsigned long nr_taken;
- unsigned long nr_dirty = 0;
unsigned long nr_writeback = 0;
isolate_mode_t isolate_mode = 0;
int file = is_file_lru(lru);
@@ -1278,8 +1275,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
if (nr_taken == 0)
return 0;
- nr_reclaimed = shrink_page_list(&page_list, zone, sc,
- &nr_dirty, &nr_writeback);
+ nr_reclaimed = shrink_page_list(&page_list, zone, sc, &nr_writeback);
spin_lock_irq(&zone->lru_lock);
--
1.7.9.5
* [PATCH 5/5] vmscan: accelerate to reclaim mapped-pages stream
From: Minchan Kim @ 2012-08-22 7:15 UTC
To: Andrew Morton
Cc: Mel Gorman, Rik van Riel, Johannes Weiner, linux-mm,
linux-kernel, Minchan Kim
Normally, mmapped data pages get a chance to stay in the LRU one trip
longer than other pages because they are born with a referenced pte,
so we can keep working-set mapped pages in memory.
But that can become a problem when there is a huge stream of mmapped
pages: the VM burns CPU rotating them in the LRU, kswapd's efficiency
drops, and processes start to enter the direct reclaim path. That is
not desirable.
This patch tries to detect an mmapped-pages stream.
If the VM sees more than 80% mmapped pages in a reclaim chunk (32
pages), it considers that a symptom of an mmapped-pages stream and
monitors the following reclaim chunks. If the VM finds 1M of mmapped
pages over consecutive reclaim chunks, it concludes this is an
mmapped-pages stream and starts to reclaim the pages without rotation.
If the VM sees less than 80% mmapped pages in a reclaim chunk during
this monitoring, it backs off instantly.
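
The detector described above can be summarized with this small
self-contained sketch (illustrative only; mp_detect() is a made-up
name, the 80% check is written in cross-multiplied form, and the real
change is the diff below):

  #include <assert.h>
  #include <stdbool.h>

  #define SWAP_CLUSTER_MAX        32      /* pages per reclaim chunk */
  #define MP_DETECT_MAX_SHIFT     8       /* 8 * 32 pages * 4K = 1M */

  struct mp_detector {
          bool force_reclaim;
          int stream_detect_shift;
  };

  /* Called once per reclaim chunk; >= 80% mapped counts toward a stream. */
  static void mp_detect(struct mp_detector *mp,
                        unsigned long nr_referenced, unsigned long nr_taken)
  {
          if (nr_referenced * 5 >= nr_taken * 4) {
                  if (mp->stream_detect_shift < MP_DETECT_MAX_SHIFT)
                          mp->stream_detect_shift++;
                  if (mp->stream_detect_shift == MP_DETECT_MAX_SHIFT)
                          mp->force_reclaim = true;       /* stream confirmed */
          } else {
                  mp->stream_detect_shift = 0;            /* instant back-off */
                  mp->force_reclaim = false;
          }
  }

  int main(void)
  {
          struct mp_detector mp = { false, 0 };
          int i;

          for (i = 0; i < MP_DETECT_MAX_SHIFT; i++)       /* 8 heavy chunks */
                  mp_detect(&mp, 30, SWAP_CLUSTER_MAX);   /* 30/32 > 80% */
          assert(mp.force_reclaim);

          mp_detect(&mp, 10, SWAP_CLUSTER_MAX);           /* one light chunk */
          assert(!mp.force_reclaim);
          return 0;
  }

Once force_reclaim is set, PAGEREF_KEEP pages are reclaimed instead of
rotated; a single chunk under the threshold resets the detector.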
== mapped-file-stream              before        after      (diff, diff%)
Elapsed 665 615 (-50.00, -7.52%)
nr_vmscan_write 849 62 (-787.00, -92.70%)
nr_vmscan_immediate_reclaim 8 5 (-3.00, -37.50%)
pgpgin 30280 27096 (-3184.00,-10.52%)
pgpgout 6396 2680 (-3716.00,-58.10%)
pswpin 8 0 (-8.00, -100.00%)
pswpout 849 18 (-831.00, -97.88%)
pgactivate 15959 15585 (-374.00, -2.34%)
pgdeactivate 15303 13896 (-1407.00,-9.19%)
pgfault 204524410 204524092 (-318.00, -0.00%)
pgmajfault 204472602 204472572 (-30.00, -0.00%)
pgsteal_kswapd_dma 475265 892600 (417335.00,87.81%)
pgsteal_kswapd_normal 51289479 44560409 (-6729070.00,-13.12%)
pgsteal_kswapd_high 135817904 142316673 (6498769.00,4.78%)
pgsteal_kswapd_movable 4380123 4793399 (413276.00,9.44%)
pgsteal_direct_dma 11910 0 (-11910.00,-100.00%)
pgsteal_direct_normal 165012 1322 (-163690.00,-99.20%)
pgsteal_direct_high 309271 40 (-309231.00,-99.99%)
pgsteal_direct_movable 45638 0 (-45638.00,-100.00%)
pgscan_kswapd_dma 972089 893162 (-78927.00,-8.12%)
pgscan_kswapd_normal 100850050 44609130 (-56240920.00,-55.77%)
pgscan_kswapd_high 269039236 142394025 (-126645211.00,-47.07%)
pgscan_kswapd_movable 8774392 4798082 (-3976310.00,-45.32%)
pgscan_direct_dma 26410 0 (-26410.00,-100.00%)
pgscan_direct_normal 4186439 1322 (-4185117.00,-99.97%)
pgscan_direct_high 5132110 1161 (-5130949.00,-99.98%)
pgscan_direct_movable 122790 0 (-122790.00,-100.00%)
pgscan_direct_throttle 0 0 (0.00, 0.00%)
pginodesteal 6836 0 (-6836.00,-100.00%)
slabs_scanned 56320 52224 (-4096.00,-7.27%)
kswapd_inodesteal 35904 41679 (5775.00, 16.08%)
kswapd_low_wmark_hit_quickly 5193 7587 (2394.00, 46.10%)
kswapd_high_wmark_hit_quickly 421 463 (42.00, 9.98%)
kswapd_skip_congestion_wait 12375 23 (-12352.00,-99.81%)
pageoutrun 3071912 3202200 (130288.00,4.24%)
allocstall 9727 32 (-9695.00,-99.67%)
pgrotated 850 18 (-832.00, -97.88%)
kswapd_totalscan 379635767 192694399 (-186941368.00,-49.24%)
kswapd_totalsteal 191962771 192563081 (600310.00,0.31%)
Kswapd_efficiency 50.00 99.00 (49.00, 98.00%)
direct_totalscan 9467749 2483 (-9465266.00,-99.97%)
direct_totalsteal 531831 1362 (-530469.00,-99.74%)
direct_efficiency 5.00 54.00 (49.00, 980.00%)
reclaim_velocity 585118.06 313328.26 (-271789.80,-46.45%)
Elapsed time of the test program is reduced by 50 seconds. Of course,
the amount of scanning is decreased hugely, so the efficiency of both
kswapd and direct reclaim is greatly enhanced.
I think this patch can help very much with mmapped-file streams while
not causing problems for other workloads, thanks to the instant
back-off.
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
include/linux/mmzone.h | 23 +++++++++++++++++++++++
mm/vmscan.c | 24 ++++++++++++++++++++++--
2 files changed, 45 insertions(+), 2 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 2daa54f..190376e 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -325,6 +325,28 @@ enum zone_type {
#error ZONES_SHIFT -- too many zones configured adjust calculation
#endif
+/*
+ * The VM tries to detect an mp (mapped-pages) stream so that it can
+ * be reclaimed without rotation. That reduces CPU burning and
+ * enhances kswapd efficiency.
+ */
+struct mp_detector {
+ bool force_reclaim;
+ int stream_detect_shift;
+};
+
+/*
+ * If we detect SWAP_CLUSTER_MAX * MP_DETECT_MAX_SHIFT (i.e., 1M with
+ * 4K pages) of mapped pages during consecutive reclaim chunks, we
+ * consider it a mapped-pages stream.
+ */
+#define MP_DETECT_MAX_SHIFT 8	/* 1 unit is SWAP_CLUSTER_MAX pages */
+/*
+ * If more than 80% of a reclaim chunk is mapped pages, we consider it
+ * a symptom of a mapped-pages stream.
+ */
+#define MP_STREAM_RATIO(pages)	((pages) * 4 / 5)	/* avoids (4 / 5) == 0 */
+
struct zone {
/* Fields commonly accessed by the page allocator */
@@ -422,6 +444,7 @@ struct zone {
*/
unsigned int inactive_ratio;
+ struct mp_detector mp;
ZONE_PADDING(_pad2_)
/* Rarely used or read-mostly fields */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 1a66680..e215e98 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -674,12 +674,14 @@ static enum page_references page_check_references(struct page *page,
static unsigned long shrink_page_list(struct list_head *page_list,
struct zone *zone,
struct scan_control *sc,
+ unsigned long *ret_nr_referenced_ptes,
unsigned long *ret_nr_writeback)
{
LIST_HEAD(ret_pages);
LIST_HEAD(free_pages);
int pgactivate = 0;
unsigned long nr_dirty = 0;
+ unsigned long nr_referenced_ptes = 0;
unsigned long nr_congested = 0;
unsigned long nr_reclaimed = 0;
unsigned long nr_writeback = 0;
@@ -762,12 +764,15 @@ static unsigned long shrink_page_list(struct list_head *page_list,
case PAGEREF_ACTIVATE:
goto activate_locked;
case PAGEREF_KEEP:
+ nr_referenced_ptes++;
+ if (zone->mp.force_reclaim)
+ goto free_mapped_page;
goto keep_locked;
case PAGEREF_RECLAIM:
case PAGEREF_RECLAIM_CLEAN:
; /* try to reclaim the page below */
}
-
+free_mapped_page:
/*
* Anonymous process memory has backing store?
* Try to allocate it some swap space here.
@@ -954,6 +959,7 @@ keep:
list_splice(&ret_pages, page_list);
count_vm_events(PGACTIVATE, pgactivate);
mem_cgroup_uncharge_end();
+ *ret_nr_referenced_ptes = nr_referenced_ptes;
*ret_nr_writeback += nr_writeback;
return nr_reclaimed;
}
@@ -1234,6 +1240,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
unsigned long nr_scanned;
unsigned long nr_reclaimed = 0;
unsigned long nr_taken;
+ unsigned long nr_referenced_ptes = 0;
unsigned long nr_writeback = 0;
isolate_mode_t isolate_mode = 0;
int file = is_file_lru(lru);
@@ -1275,7 +1282,8 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
if (nr_taken == 0)
return 0;
- nr_reclaimed = shrink_page_list(&page_list, zone, sc, &nr_writeback);
+ nr_reclaimed = shrink_page_list(&page_list, zone, sc,
+ &nr_referenced_ptes, &nr_writeback);
spin_lock_irq(&zone->lru_lock);
@@ -1325,6 +1333,18 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
(nr_taken >> (DEF_PRIORITY - sc->priority)))
wait_iff_congested(zone, BLK_RW_ASYNC, HZ/10);
+
+	if (nr_referenced_ptes >= MP_STREAM_RATIO(nr_taken)) {
+ int shift = zone->mp.stream_detect_shift;
+ shift = min(++shift, MP_DETECT_MAX_SHIFT);
+ if (shift == MP_DETECT_MAX_SHIFT)
+ zone->mp.force_reclaim = true;
+ zone->mp.stream_detect_shift = shift;
+ } else {
+ zone->mp.stream_detect_shift = 0;
+ zone->mp.force_reclaim = false;
+ }
+
trace_mm_vmscan_lru_shrink_inactive(zone->zone_pgdat->node_id,
zone_idx(zone),
nr_scanned, nr_reclaimed,
--
1.7.9.5
* Re: [PATCH 1/5] vmscan: Fix obsolete comment of balance_pgdat
From: Rik van Riel @ 2012-08-23 17:37 UTC
To: Minchan Kim
Cc: Andrew Morton, Mel Gorman, Johannes Weiner, linux-mm,
linux-kernel, Nick Piggin
On 08/22/2012 03:15 AM, Minchan Kim wrote:
> This patch correct obsolete comment caused by [1] and [2].
>
> [1] 7ac6218, kswapd lockup fix
> [2] 32a4330, mm: prevent kswapd from freeing excessive amounts of lowmem
>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Nick Piggin <npiggin@kernel.dk>
> Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Rik van Riel <riel@redhat.com>
* Re: [PATCH 2/5] vmscan: sleep only if backingdev is congested
From: Rik van Riel @ 2012-08-25 23:02 UTC
To: Minchan Kim
Cc: Andrew Morton, Mel Gorman, Johannes Weiner, linux-mm, linux-kernel
On 08/22/2012 03:15 AM, Minchan Kim wrote:
> +++ b/mm/vmscan.c
> @@ -2705,8 +2705,16 @@ loop_again:
> if (total_scanned && (sc.priority < DEF_PRIORITY - 2)) {
> if (has_under_min_watermark_zone)
> count_vm_event(KSWAPD_SKIP_CONGESTION_WAIT);
> - else
> - congestion_wait(BLK_RW_ASYNC, HZ/10);
> + else {
> + for (i = 0; i <= end_zone; i++) {
> + struct zone *zone = pgdat->node_zones
> + + i;
> + if (!populated_zone(zone))
> + continue;
> + wait_iff_congested(zone, BLK_RW_ASYNC,
> + HZ/10);
> + }
> + }
> }
Do we really want to wait on every zone?
That could increase the sleep time by a factor of 3.
--
All rights reversed
Thread overview: 8 messages
2012-08-22 7:15 [RFC 0/5] Consider higher small zone and mmaped-pages stream Minchan Kim
2012-08-22 7:15 ` [PATCH 1/5] vmscan: Fix obsolete comment of balance_pgdat Minchan Kim
2012-08-23 17:37 ` Rik van Riel
2012-08-22 7:15 ` [PATCH 2/5] vmscan: sleep only if backingdev is congested Minchan Kim
2012-08-25 23:02 ` Rik van Riel
2012-08-22 7:15 ` [PATCH 3/5] vmscan: prevent excessive pageout of kswapd Minchan Kim
2012-08-22 7:15 ` [PATCH 4/5] vmscan: get rid of unnecessary nr_dirty ret variable Minchan Kim
2012-08-22 7:15 ` [PATCH 5/5] vmscan: accelerate to reclaim mapped-pages stream Minchan Kim