* [RFC PATCH 0/8] Reduce system disruption due to kswapd
@ 2013-03-17 13:04 ` Mel Gorman
  0 siblings, 0 replies; 268+ messages in thread
From: Mel Gorman @ 2013-03-17 13:04 UTC (permalink / raw)
  To: Linux-MM
  Cc: Jiri Slaby, Valdis Kletnieks, Rik van Riel, Zlatko Calusic,
	Johannes Weiner, dormando, Satoru Moriya, Michal Hocko, LKML,
	Mel Gorman

Kswapd and page reclaim behaviour has been screwy in one way or another
for a long time. Very broadly speaking, it worked in the far past because
machines were limited in memory so it did not have that many pages to scan
and it stalled in congestion_wait() frequently to prevent it going
completely nuts. In recent times it has behaved very unsatisfactorily,
with some of the problems compounded by the removal of the stall logic and
the introduction of transparent hugepage support with high-order reclaims.

There are many variations of bugs that are rooted in this area. One example
is reports of a large copy operation or backup causing the machine to
grind to a halt or applications being pushed to swap. Sometimes in low
memory situations a large percentage of memory suddenly gets reclaimed. In
other cases an application starts and kswapd hits 100% CPU usage for
prolonged periods of time and so on. There is now talk of introducing
features like an extra free kbytes tunable to work around aspects of the
problem instead of trying to deal with it directly. It is compounded by
the fact that the behaviour can be very workload and machine specific.

This RFC is aimed at investigating whether kswapd can address these various
problems in a relatively straightforward fashion without a fundamental
rewrite.

Patches 1-2 limit the number of pages kswapd reclaims while still obeying
	the anon/file proportion of the LRUs it should be scanning. A rough
	user-space illustration of this idea follows the patch summaries
	below.

Patches 3-4 control how and when kswapd raises its scanning priority and
	delete the scanning restart logic, which is tricky to follow.

Patch 5 notes that it is too easy for kswapd to reach priority 0 when
	scanning and then reclaim the world. Down with that sort of thing.

Patch 6 notes that kswapd starts writeback based on scanning priority which
	is not necessarily related to dirty pages. It will have kswapd
	writeback pages if a number of unqueued dirty pages have been
	recently encountered at the tail of the LRU.

Patch 7 notes that sometimes kswapd should stall waiting on IO to complete
	to reduce LRU churn and the likelihood that it'll reclaim young
	clean pages or push applications to swap. It will cause kswapd
	to block on IO if it detects that pages being reclaimed under
	writeback are recycling through the LRU before the IO completes.

Patch 8 shrinks slab just once per priority scanned or if a zone is otherwise
	unreclaimable to avoid hammering slab when kswapd has to skip a
	large number of pages.

Patches 9-10 are cosmetic but should make balance_pgdat() easier to follow.
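
The following is a rough, user-space-only sketch of the idea behind
patches 1-2. It is not the mm/vmscan.c change itself; the LRU sizes, the
one-in-four reclaim rate and the watermark stand-in are invented purely
for illustration. The point is that kswapd stops once a modest target is
met instead of reclaiming the world, while each LRU list is still scanned
in proportion to its size:

/*
 * Sketch only, not the actual mm/vmscan.c change.  All numbers are
 * invented.  It demonstrates capping how much kswapd reclaims in one
 * balancing pass while scanning the anon/file lists proportionally.
 */
#include <stdio.h>

#define SWAP_CLUSTER_MAX 32

struct lru {
        const char *name;
        long size;              /* pages on the list */
        long scanned;
        long reclaimed;
};

/* Pretend one page in four scanned turns out to be reclaimable. */
static long shrink_list(struct lru *lru, long nr_to_scan)
{
        long reclaimed = nr_to_scan / 4;

        lru->scanned += nr_to_scan;
        lru->reclaimed += reclaimed;
        lru->size -= nr_to_scan;
        return reclaimed;
}

int main(void)
{
        struct lru lrus[2] = {
                { "anon", 200000, 0, 0 },
                { "file", 600000, 0, 0 },
        };
        long nr_to_reclaim = 4096;      /* cap, standing in for a high watermark */
        long nr_reclaimed = 0;
        int priority, i;

        /* Walk down the priorities but stop once the modest target is met. */
        for (priority = 12; priority >= 1 && nr_reclaimed < nr_to_reclaim; priority--) {
                for (i = 0; i < 2; i++) {
                        /* each list is scanned in proportion to its size */
                        long nr = lrus[i].size >> priority;

                        if (nr < SWAP_CLUSTER_MAX)
                                nr = SWAP_CLUSTER_MAX;
                        nr_reclaimed += shrink_list(&lrus[i], nr);
                }
        }

        for (i = 0; i < 2; i++)
                printf("%s: scanned %ld reclaimed %ld\n",
                       lrus[i].name, lrus[i].scanned, lrus[i].reclaimed);
        printf("total reclaimed %ld (target %ld)\n", nr_reclaimed, nr_to_reclaim);
        return 0;
}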

This was tested using memcached+memcachetest while some background IO
was in progress, as implemented by the parallel IO tests in MM Tests.
memcachetest benchmarks how many operations/second memcached can
service and it is run multiple times. It starts with no background IO and
then re-runs the test with larger amounts of IO in the background to roughly
simulate a large copy in progress.  The expectation is that the IO should
have little or no impact on memcachetest which is running entirely in memory.

Ordinarily this test is run a number of times for each amount of IO and
the worst result reported, but these results are based on just one run as
a quick test. ftrace was also running so there were additional sources of
interference and the results would be more variable than normal. More
comprehensive tests are queued but they'll take quite some time to
complete. The kernel baseline is v3.9-rc2 and the following kernels were
tested:

vanilla			3.9-rc2
flatten-v1r8		Patches 1-4
limitprio-v1r8		Patches 1-5
write-v1r8		Patches 1-6
block-v1r8		Patches 1-7
tidy-v1r8		Patches 1-10

                                         3.9.0-rc2                   3.9.0-rc2                   3.9.0-rc2                   3.9.0-rc2                   3.9.0-rc2
                                           vanilla                flatten-v1r8              limitprio-v1r8                  block-v1r8                   tidy-v1r8
Ops memcachetest-0M             10932.00 (  0.00%)          10898.00 ( -0.31%)          10903.00 ( -0.27%)          10911.00 ( -0.19%)          10916.00 ( -0.15%)
Ops memcachetest-749M            7816.00 (  0.00%)          10715.00 ( 37.09%)          11006.00 ( 40.81%)          10903.00 ( 39.50%)          10856.00 ( 38.89%)
Ops memcachetest-2498M           3974.00 (  0.00%)           3190.00 (-19.73%)          11623.00 (192.48%)          11142.00 (180.37%)          10930.00 (175.04%)
Ops memcachetest-4246M           2355.00 (  0.00%)           2915.00 ( 23.78%)          12619.00 (435.84%)          11212.00 (376.09%)          10904.00 (363.01%)
Ops io-duration-0M                  0.00 (  0.00%)              0.00 (  0.00%)              0.00 (  0.00%)              0.00 (  0.00%)              0.00 (  0.00%)
Ops io-duration-749M               31.00 (  0.00%)             16.00 ( 48.39%)              9.00 ( 70.97%)              9.00 ( 70.97%)              8.00 ( 74.19%)
Ops io-duration-2498M              89.00 (  0.00%)            111.00 (-24.72%)             27.00 ( 69.66%)             28.00 ( 68.54%)             27.00 ( 69.66%)
Ops io-duration-4246M             182.00 (  0.00%)            165.00 (  9.34%)             49.00 ( 73.08%)             46.00 ( 74.73%)             45.00 ( 75.27%)
Ops swaptotal-0M                    0.00 (  0.00%)              0.00 (  0.00%)              0.00 (  0.00%)              0.00 (  0.00%)              0.00 (  0.00%)
Ops swaptotal-749M             219394.00 (  0.00%)         162045.00 ( 26.14%)              0.00 (  0.00%)              0.00 (  0.00%)             16.00 ( 99.99%)
Ops swaptotal-2498M            312904.00 (  0.00%)         389809.00 (-24.58%)            334.00 ( 99.89%)           1233.00 ( 99.61%)              8.00 (100.00%)
Ops swaptotal-4246M            471517.00 (  0.00%)         395170.00 ( 16.19%)              0.00 (  0.00%)           1117.00 ( 99.76%)             29.00 ( 99.99%)
Ops swapin-0M                       0.00 (  0.00%)              0.00 (  0.00%)              0.00 (  0.00%)              0.00 (  0.00%)              0.00 (  0.00%)
Ops swapin-749M                 62057.00 (  0.00%)           5954.00 ( 90.41%)              0.00 (  0.00%)              0.00 (  0.00%)              0.00 (  0.00%)
Ops swapin-2498M               143617.00 (  0.00%)         154592.00 ( -7.64%)              0.00 (  0.00%)              0.00 (  0.00%)              0.00 (  0.00%)
Ops swapin-4246M               160417.00 (  0.00%)         125904.00 ( 21.51%)              0.00 (  0.00%)             13.00 ( 99.99%)              0.00 (  0.00%)
Ops minorfaults-0M            1683549.00 (  0.00%)        1685771.00 ( -0.13%)        1675398.00 (  0.48%)        1723245.00 ( -2.36%)        1683717.00 ( -0.01%)
Ops minorfaults-749M          1788977.00 (  0.00%)        1871737.00 ( -4.63%)        1617193.00 (  9.60%)        1610892.00 (  9.95%)        1682760.00 (  5.94%)
Ops minorfaults-2498M         1836894.00 (  0.00%)        1796566.00 (  2.20%)        1677878.00 (  8.66%)        1685741.00 (  8.23%)        1609514.00 ( 12.38%)
Ops minorfaults-4246M         1797685.00 (  0.00%)        1819832.00 ( -1.23%)        1689258.00 (  6.03%)        1690695.00 (  5.95%)        1684430.00 (  6.30%)
Ops majorfaults-0M                  5.00 (  0.00%)              7.00 (-40.00%)              5.00 (  0.00%)             24.00 (-380.00%)              9.00 (-80.00%)
Ops majorfaults-749M            10310.00 (  0.00%)            876.00 ( 91.50%)             73.00 ( 99.29%)             63.00 ( 99.39%)             90.00 ( 99.13%)
Ops majorfaults-2498M           20809.00 (  0.00%)          22377.00 ( -7.54%)            102.00 ( 99.51%)            110.00 ( 99.47%)             55.00 ( 99.74%)
Ops majorfaults-4246M           23228.00 (  0.00%)          20270.00 ( 12.73%)            196.00 ( 99.16%)            222.00 ( 99.04%)            102.00 ( 99.56%)

Note how the vanilla kernel's performance is ruined by the parallel IO,
dropping from 10932 ops/sec to 2355 ops/sec. This is likely due to the
swap activity and major faults as memcached is pushed to swap prematurely.

flatten-v1r8 overall reduces the amount of reclaim but it's a minor
improvement.

limitprio-v1r8 almost eliminates the impact the parallel IO has on the
memcachetest workload. The ops/sec remain above 10K ops/sec and there is
no swapin activity.

The remainder of the series has very little impact on the memcachetest
workload but the impact on kswapd is visible in the vmstat figures.

                             3.9.0-rc2   3.9.0-rc2   3.9.0-rc2   3.9.0-rc2   3.9.0-rc2
                               vanilla flatten-v1r8 limitprio-v1r8 block-v1r8 tidy-v1r8
Page Ins                       1567012     1238608       90388      103832       75684
Page Outs                     12837552    15223512    12726464    13613400    12668604
Swap Ins                        366362      286798           0          13           0
Swap Outs                       637724      660574         334        2337          53
Direct pages scanned                 0           0           0      196955      292532
Kswapd pages scanned          11763732     4389473   207629411    22337712     3885443
Kswapd pages reclaimed         1262812     1186228     1228379      971375      685338
Direct pages reclaimed               0           0           0      186053      267255
Kswapd efficiency                  10%         27%          0%          4%         17%
Kswapd velocity               9111.544    3407.923  161226.742   17342.002    3009.265
Direct efficiency                 100%        100%        100%         94%         91%
Direct velocity                  0.000       0.000       0.000     152.907     226.565
Percentage direct scans             0%          0%          0%          0%          7%
Page writes by reclaim         2858699     1159073    42498573    21198413     3018972
Page writes file               2220975      498499    42498239    21196076     3018919
Page writes anon                637724      660574         334        2337          53
Page reclaim immediate            6243         125       69598        1056        4370
Page rescued immediate               0           0           0           0           0
Slabs scanned                    35328       39296       32000       62080       25600
Direct inode steals                  0           0           0           0           0
Kswapd inode steals              16899        5491        6375       19957         907
Kswapd skipped wait                  0           0           0           0           0
THP fault alloc                     14           7          10          50           7
THP collapse alloc                 491         465         637         709         629
THP splits                          10          12           5           7           5
THP fault fallback                   0           0           0           0           0
THP collapse fail                    0           0           0           0           0
Compaction stalls                    0           0           0          81           3
Compaction success                   0           0           0          74           0
Compaction failures                  0           0           0           7           3
Page migrate success                 0           0           0       43855           0
Page migrate failure                 0           0           0           0           0
Compaction pages isolated            0           0           0       97582           0
Compaction migrate scanned           0           0           0      111419           0
Compaction free scanned              0           0           0      324617           0
Compaction cost                      0           0           0          48           0

While limitprio-v1r8 improves the performance of memcachetest, note what
it does to kswapd activity, which apparently scanned on average 162K
pages/second. In reality what happened was that there were spikes in
reclaim activity, but it is severe nevertheless.
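
For anyone unfamiliar with the summary figures: assuming "Kswapd
efficiency" is simply pages reclaimed divided by pages scanned (truncated
to a whole percentage) and "Kswapd velocity" is pages scanned divided by
the wall-clock duration of the test in seconds, the efficiency row above
can be reproduced directly from the scanned/reclaimed counters. A
throwaway check, with the kernel names shortened:

#include <stdio.h>

int main(void)
{
        /* "Kswapd pages scanned" and "Kswapd pages reclaimed" from above */
        const char *kernel[] = { "vanilla", "flatten", "limitprio", "block", "tidy" };
        double scanned[]     = { 11763732, 4389473, 207629411, 22337712, 3885443 };
        double reclaimed[]   = {  1262812, 1186228,   1228379,   971375,  685338 };
        int i;

        for (i = 0; i < 5; i++)
                printf("%-10s efficiency %d%%\n", kernel[i],
                       (int)(100.0 * reclaimed[i] / scanned[i]));
        return 0;
}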

The patch that blocks kswapd when it encounters too many pages under
writeback severely reduces the amount of scanning activity. Note that the
full series also reduces the amount of slab shrinking, which heavily
reduces the number of inodes reclaimed by kswapd.

Comments?

 include/linux/mmzone.h |  16 ++
 mm/vmscan.c            | 387 +++++++++++++++++++++++++++++--------------------
 2 files changed, 245 insertions(+), 158 deletions(-)

-- 
1.8.1.4


* [PATCH 0/10] Reduce system disruption due to kswapd V2
@ 2013-04-09 11:06 Mel Gorman
  2013-04-09 11:06   ` Mel Gorman
  0 siblings, 1 reply; 268+ messages in thread
From: Mel Gorman @ 2013-04-09 11:06 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jiri Slaby, Valdis Kletnieks, Rik van Riel, Zlatko Calusic,
	Johannes Weiner, dormando, Satoru Moriya, Michal Hocko, Linux-MM,
	LKML, Mel Gorman

Posting V2 of this series got delayed due to trying to pin down an
unrelated regression in 3.9-rc where interactive performance is shot to
hell. That problem still has not been identified as it is resisting
attempts to reproduce it with a script for the purposes of bisection.

For those that looked at V1, the most important difference in this version
is how patch 2 preserves the proportional scanning of anon/file LRUs.

The series is against 3.9-rc6.

Changelog since V1
o Rename ZONE_DIRTY to ZONE_TAIL_LRU_DIRTY			(andi)
o Reformat comment in shrink_page_list				(andi)
o Clarify some comments						(dhillf)
o Rework how the proportional scanning is preserved
o Add PageReclaim check before kswapd starts writeback
o Reset sc.nr_reclaimed on every full zone scan

Kswapd and page reclaim behaviour has been screwy in one way or another
for a long time. Very broadly speaking, it worked in the far past because
machines were limited in memory so it did not have that many pages to scan
and it stalled in congestion_wait() frequently to prevent it going
completely nuts. In recent times it has behaved very unsatisfactorily,
with some of the problems compounded by the removal of the stall logic and
the introduction of transparent hugepage support with high-order reclaims.

There are many variations of bugs that are rooted in this area. One example
is reports of a large copy operation or backup causing the machine to
grind to a halt or applications being pushed to swap. Sometimes in low
memory situations a large percentage of memory suddenly gets reclaimed. In
other cases an application starts and kswapd hits 100% CPU usage for
prolonged periods of time and so on. There is now talk of introducing
features like an extra free kbytes tunable to work around aspects of the
problem instead of trying to deal with it directly. It is compounded by
the fact that the behaviour can be very workload and machine specific.

This series aims at addressing some of the worst of these problems without
attempting to fundamentally alter how page reclaim works.

Patches 1-2 limit the number of pages kswapd reclaims while still obeying
	the anon/file proportion of the LRUs it should be scanning.

Patches 3-4 control how and when kswapd raises its scanning priority and
	delete the scanning restart logic, which is tricky to follow.

Patch 5 notes that it is too easy for kswapd to reach priority 0 when
	scanning and then reclaim the world. Down with that sort of thing.

Patch 6 notes that kswapd starts writeback based on scanning priority which
	is not necessarily related to dirty pages. It will have kswapd
	writeback pages if a number of unqueued dirty pages have been
	recently encountered at the tail of the LRU.

Patch 7 notes that sometimes kswapd should stall waiting on IO to complete
	to reduce LRU churn and the likelihood that it'll reclaim young
	clean pages or push applications to swap. It will cause kswapd
	to block on IO if it detects that pages being reclaimed under
	writeback are recycling through the LRU before the IO completes.
	A rough sketch of this heuristic follows the patch summaries below.

Patch 8 shrinks slab just once per priority scanned or if a zone is otherwise
	unreclaimable to avoid hammering slab when kswapd has to skip a
	large number of pages.

Patches 9-10 are cosmetic but should make balance_pgdat() easier to follow.
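
To make the heuristic in patch 7 concrete, here is a minimal user-space
sketch. It is not the kernel code and the struct and flag names are only
stand-ins for the real page flags involved. The point is that a page still
under writeback the second time it reaches the tail of the LRU has cycled
around faster than its IO could complete, which is the signal for kswapd
to stall rather than keep churning the LRU:

#include <stdbool.h>
#include <stdio.h>

struct page {
        bool writeback;         /* stand-in for "IO still in flight" */
        bool reclaim;           /* stand-in for "seen at the LRU tail before" */
};

/* Returns true if the scanner should stall and wait for IO to complete. */
static bool should_throttle(struct page *page)
{
        if (!page->writeback)
                return false;
        if (page->reclaim)
                return true;    /* recycled through the LRU while IO was pending */
        page->reclaim = true;   /* tag it and check again next time around */
        return false;
}

int main(void)
{
        struct page slow_io = { .writeback = true, .reclaim = false };

        printf("first pass:  throttle=%d\n", should_throttle(&slow_io));
        printf("second pass: throttle=%d\n", should_throttle(&slow_io));
        return 0;
}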

This was tested using memcached+memcachetest while some background IO
was in progress, as implemented by the parallel IO tests in MM Tests.
memcachetest benchmarks how many operations/second memcached can
service and it is run multiple times. It starts with no background IO and
then re-runs the test with larger amounts of IO in the background to roughly
simulate a large copy in progress.  The expectation is that the IO should
have little or no impact on memcachetest which is running entirely in memory.

                                         3.9.0-rc6                   3.9.0-rc6
                                           vanilla           lessdisrupt-v2r11
Ops memcachetest-0M             11106.00 (  0.00%)          10997.00 ( -0.98%)
Ops memcachetest-749M           10960.00 (  0.00%)          11032.00 (  0.66%)
Ops memcachetest-2498M           2588.00 (  0.00%)          10948.00 (323.03%)
Ops memcachetest-4246M           2401.00 (  0.00%)          10960.00 (356.48%)
Ops io-duration-0M                  0.00 (  0.00%)              0.00 (  0.00%)
Ops io-duration-749M               15.00 (  0.00%)              8.00 ( 46.67%)
Ops io-duration-2498M             112.00 (  0.00%)             25.00 ( 77.68%)
Ops io-duration-4246M             170.00 (  0.00%)             45.00 ( 73.53%)
Ops swaptotal-0M                    0.00 (  0.00%)              0.00 (  0.00%)
Ops swaptotal-749M             161678.00 (  0.00%)             16.00 ( 99.99%)
Ops swaptotal-2498M            471903.00 (  0.00%)            192.00 ( 99.96%)
Ops swaptotal-4246M            444010.00 (  0.00%)           1323.00 ( 99.70%)
Ops swapin-0M                       0.00 (  0.00%)              0.00 (  0.00%)
Ops swapin-749M                   789.00 (  0.00%)              0.00 (  0.00%)
Ops swapin-2498M               196496.00 (  0.00%)            192.00 ( 99.90%)
Ops swapin-4246M               168269.00 (  0.00%)            154.00 ( 99.91%)
Ops minorfaults-0M            1596126.00 (  0.00%)        1521332.00 (  4.69%)
Ops minorfaults-749M          1766556.00 (  0.00%)        1596350.00 (  9.63%)
Ops minorfaults-2498M         1661445.00 (  0.00%)        1598762.00 (  3.77%)
Ops minorfaults-4246M         1628375.00 (  0.00%)        1597624.00 (  1.89%)
Ops majorfaults-0M                  9.00 (  0.00%)              0.00 (  0.00%)
Ops majorfaults-749M              154.00 (  0.00%)            101.00 ( 34.42%)
Ops majorfaults-2498M           27214.00 (  0.00%)            165.00 ( 99.39%)
Ops majorfaults-4246M           23229.00 (  0.00%)            114.00 ( 99.51%)

Note how the vanilla kernel's performance collapses when there is enough
IO taking place in the background. This drop in performance is part of
what users complain of when they start backups. Note how the swapin and
major fault figures indicate that processes were being pushed to swap
prematurely. With the series applied, there is no noticeable performance
drop and while there is still some swap activity, it's tiny.

                             3.9.0-rc6   3.9.0-rc6
                               vanilla lessdisrupt-v2r11
Page Ins                       9094288      346092
Page Outs                     62897388    47599884
Swap Ins                       2243749       19389
Swap Outs                      3953966      142258
Direct pages scanned                 0     2262897
Kswapd pages scanned          55530838    75725437
Kswapd pages reclaimed         6682620     1814689
Direct pages reclaimed               0     2187167
Kswapd efficiency                  12%          2%
Kswapd velocity              10537.501   14377.501
Direct efficiency                 100%         96%
Direct velocity                  0.000     429.642
Percentage direct scans             0%          2%
Page writes by reclaim        10835163    72419297
Page writes file               6881197    72277039
Page writes anon               3953966      142258
Page reclaim immediate           11463        8199
Page rescued immediate               0           0
Slabs scanned                    38144       30592
Direct inode steals                  0           0
Kswapd inode steals              11383         791
Kswapd skipped wait                  0           0
THP fault alloc                     10         111
THP collapse alloc                2782        1779
THP splits                          10          27
THP fault fallback                   0           5
THP collapse fail                    0          21
Compaction stalls                    0          89
Compaction success                   0          53
Compaction failures                  0          36
Page migrate success                 0       37062
Page migrate failure                 0           0
Compaction pages isolated            0       83481
Compaction migrate scanned           0       80830
Compaction free scanned              0     2660824
Compaction cost                      0          40
NUMA PTE updates                     0           0
NUMA hint faults                     0           0
NUMA hint local faults               0           0
NUMA pages migrated                  0           0
AutoNUMA cost                        0           0

Note that while there is no noticeable performance drop and swap activity
is massively reduced, there are processes that enter direct reclaim as a
consequence of the series because kswapd no longer reclaims the world.
ftrace was not enabled for this particular test to avoid disruption but on
a similar test with ftrace I found that the vast bulk of the direct
reclaims were in the dd processes. The top direct reclaimers were:

     11 ps-13204
     12 top-13198
     15 memcachetest-11712
     20 gzip-3126
     67 tclsh-3124
     80 memcachetest-12924
    191 flush-8:0-292
    338 tee-3125
   2184 dd-12135
  10751 dd-13124

While processes did stall, it was mostly the "correct" processes that
stalled.

There is also still a risk that kswapd not reclaiming the world may mean
that it stays awake balancing zones, does not stall on the appropriate
events and continually scans pages it cannot reclaim, consuming CPU. This
will be visible as continued high CPU usage but in my own tests I only
saw a single spike lasting less than a second and I did not observe any
problems related to reclaim while running the series on my desktop.

 include/linux/mmzone.h |  17 ++
 mm/vmscan.c            | 449 ++++++++++++++++++++++++++++++-------------------
 2 files changed, 293 insertions(+), 173 deletions(-)

-- 
1.8.1.4


* [PATCH 0/10] Reduce system disruption due to kswapd V3
@ 2013-04-11 19:57 Mel Gorman
  2013-04-11 19:57   ` Mel Gorman
  0 siblings, 1 reply; 268+ messages in thread
From: Mel Gorman @ 2013-04-11 19:57 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jiri Slaby, Valdis Kletnieks, Rik van Riel, Zlatko Calusic,
	Johannes Weiner, dormando, Michal Hocko, Kamezawa Hiroyuki,
	Linux-MM, LKML, Mel Gorman

The big change is again related to proportional reclaim; a rough
illustration of what preserving the scan ratio means follows the patch
summaries below.

Changelog since V2
o Preserve ratio properly for proportional scanning		(kamezawa)

Changelog since V1
o Rename ZONE_DIRTY to ZONE_TAIL_LRU_DIRTY			(andi)
o Reformat comment in shrink_page_list				(andi)
o Clarify some comments						(dhillf)
o Rework how the proportional scanning is preserved
o Add PageReclaim check before kswapd starts writeback
o Reset sc.nr_reclaimed on every full zone scan

Kswapd and page reclaim behaviour has been screwy in one way or another
for a long time. Very broadly speaking, it worked in the far past because
machines were limited in memory so it did not have that many pages to scan
and it stalled in congestion_wait() frequently to prevent it going
completely nuts. In recent times it has behaved very unsatisfactorily,
with some of the problems compounded by the removal of the stall logic and
the introduction of transparent hugepage support with high-order reclaims.

There are many variations of bugs that are rooted in this area. One example
is reports of a large copy operation or backup causing the machine to
grind to a halt or applications being pushed to swap. Sometimes in low
memory situations a large percentage of memory suddenly gets reclaimed. In
other cases an application starts and kswapd hits 100% CPU usage for
prolonged periods of time and so on. There is now talk of introducing
features like an extra free kbytes tunable to work around aspects of the
problem instead of trying to deal with it directly. It is compounded by
the fact that the behaviour can be very workload and machine specific.

This series aims at addressing some of the worst of these problems without
attempting to fundamentally alter how page reclaim works.

Patches 1-2 limit the number of pages kswapd reclaims while still obeying
	the anon/file proportion of the LRUs it should be scanning.

Patches 3-4 control how and when kswapd raises its scanning priority and
	delete the scanning restart logic, which is tricky to follow.

Patch 5 notes that it is too easy for kswapd to reach priority 0 when
	scanning and then reclaim the world. Down with that sort of thing.

Patch 6 notes that kswapd starts writeback based on scanning priority which
	is not necessarily related to dirty pages. It will have kswapd
	writeback pages if a number of unqueued dirty pages have been
	recently encountered at the tail of the LRU.

Patch 7 notes that sometimes kswapd should stall waiting on IO to complete
	to reduce LRU churn and the likelihood that it'll reclaim young
	clean pages or push applications to swap. It will cause kswapd
	to block on IO if it detects that pages being reclaimed under
	writeback are recycling through the LRU before the IO completes.

Patch 8 shrinks slab just once per priority scanned or if a zone is otherwise
	unreclaimable to avoid hammering slab when kswapd has to skip a
	large number of pages.

Patches 9-10 are cosmetic but should make balance_pgdat() easier to follow.
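
As promised above, here is a rough illustration of what "preserving the
ratio" for proportional scanning means. This is not the mm/vmscan.c
implementation and the numbers are invented: once the reclaim target is
met part-way through a pass, the less-scanned list is topped up so both
lists finish having been scanned by the same fraction of their intended
targets, rather than one list being aged much harder than the other:

#include <stdio.h>

int main(void)
{
        long target[2]  = { 1000, 3000 };  /* intended anon/file scan counts */
        long scanned[2] = {  400,  600 };  /* progress when the reclaim goal was met */
        double done = 0;
        int i;

        /* Fraction of its target that the most-scanned list has completed. */
        for (i = 0; i < 2; i++) {
                double f = (double)scanned[i] / target[i];

                if (f > done)
                        done = f;
        }

        /* Scan the other list up to the same fraction, then stop. */
        for (i = 0; i < 2; i++) {
                long remaining = (long)(target[i] * done) - scanned[i];

                if (remaining < 0)
                        remaining = 0;
                printf("list %d: scan %ld more pages\n", i, remaining);
        }
        return 0;
}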

This was tested using memcached+memcachetest while some background IO
was in progress, as implemented by the parallel IO tests in MM Tests.
memcachetest benchmarks how many operations/second memcached can
service and it is run multiple times. It starts with no background IO and
then re-runs the test with larger amounts of IO in the background to roughly
simulate a large copy in progress.  The expectation is that the IO should
have little or no impact on memcachetest which is running entirely in memory.

                                         3.9.0-rc6                   3.9.0-rc6
                                           vanilla            lessdisrupt-v3r6
Ops memcachetest-0M             10868.00 (  0.00%)          10932.00 (  0.59%)
Ops memcachetest-749M           10976.00 (  0.00%)          10986.00 (  0.09%)
Ops memcachetest-2498M           3406.00 (  0.00%)          10871.00 (219.17%)
Ops memcachetest-4246M           2402.00 (  0.00%)          10936.00 (355.29%)
Ops io-duration-0M                  0.00 (  0.00%)              0.00 (  0.00%)
Ops io-duration-749M               15.00 (  0.00%)              9.00 ( 40.00%)
Ops io-duration-2498M             107.00 (  0.00%)             27.00 ( 74.77%)
Ops io-duration-4246M             193.00 (  0.00%)             47.00 ( 75.65%)
Ops swaptotal-0M                    0.00 (  0.00%)              0.00 (  0.00%)
Ops swaptotal-749M             155965.00 (  0.00%)             25.00 ( 99.98%)
Ops swaptotal-2498M            335917.00 (  0.00%)            287.00 ( 99.91%)
Ops swaptotal-4246M            463021.00 (  0.00%)              0.00 (  0.00%)
Ops swapin-0M                       0.00 (  0.00%)              0.00 (  0.00%)
Ops swapin-749M                     0.00 (  0.00%)              0.00 (  0.00%)
Ops swapin-2498M               139128.00 (  0.00%)              0.00 (  0.00%)
Ops swapin-4246M               156276.00 (  0.00%)              0.00 (  0.00%)
Ops minorfaults-0M            1677257.00 (  0.00%)        1642376.00 (  2.08%)
Ops minorfaults-749M          1819566.00 (  0.00%)        1572243.00 ( 13.59%)
Ops minorfaults-2498M         1842140.00 (  0.00%)        1652508.00 ( 10.29%)
Ops minorfaults-4246M         1796116.00 (  0.00%)        1651464.00 (  8.05%)
Ops majorfaults-0M                  6.00 (  0.00%)              6.00 (  0.00%)
Ops majorfaults-749M               55.00 (  0.00%)             49.00 ( 10.91%)
Ops majorfaults-2498M           20936.00 (  0.00%)            110.00 ( 99.47%)
Ops majorfaults-4246M           22487.00 (  0.00%)            185.00 ( 99.18%)

Note how the vanilla kernel's performance collapses when there is enough
IO taking place in the background. This drop in performance is part of
what users complain of when they start backups. Note how the swapin and
major fault figures indicate that processes were being pushed to swap
prematurely. With the series applied, there is no noticeable performance
drop and while there is still some swap activity, it's tiny.

                             3.9.0-rc6   3.9.0-rc6
                               vanilla lessdisrupt-v3r6
Page Ins                       1281068       89224
Page Outs                     15697620    11478616
Swap Ins                        295654           0
Swap Outs                       659499         312
Direct pages scanned                 0       78668
Kswapd pages scanned           7166977     4416457
Kswapd pages reclaimed         1185518     1051751
Direct pages reclaimed               0       72993
Kswapd efficiency                  16%         23%
Kswapd velocity               5558.640    3420.614
Direct efficiency                 100%         92%
Direct velocity                  0.000      60.930
Percentage direct scans             0%          1%
Page writes by reclaim         2044715     2922251
Page writes file               1385216     2921939
Page writes anon                659499         312
Page reclaim immediate            4040         218
Page rescued immediate               0           0
Slabs scanned                    35456       26624
Direct inode steals                  0           0
Kswapd inode steals              19898        1420
Kswapd skipped wait                  0           0
THP fault alloc                     11          51
THP collapse alloc                 574         609
THP splits                           9           6
THP fault fallback                   0           0
THP collapse fail                    0           0
Compaction stalls                    0           0
Compaction success                   0           0
Compaction failures                  0           0
Page migrate success                 0           0
Page migrate failure                 0           0
Compaction pages isolated            0           0
Compaction migrate scanned           0           0
Compaction free scanned              0           0
Compaction cost                      0           0
NUMA PTE updates                     0           0
NUMA hint faults                     0           0
NUMA hint local faults               0           0
NUMA pages migrated                  0           0
AutoNUMA cost                        0           0

Note that kswapd efficiency is slightly improved. Unfortunately, also note
that there is a small amount of direct reclaim due to kswapd no longer
reclaiming the world. Using ftrace it would appear that the direct reclaim
stalls are mostly harmless, with the vast bulk of the stalls incurred by dd:

      2 gzip-3111
      5 memcachetest-12607
     26 tclsh-3109
     67 tee-3110
     89 flush-8:0-286
   2055 dd-12795

There is a risk that kswapd not reclaiming the world may mean that it
stays awake balancing zones, does not stall on the appropriate events
and continually scans pages it cannot reclaim, consuming CPU. This will
be visible as continued high CPU usage but in my own tests I only saw a
single spike lasting less than a second and I did not observe any problems
related to reclaim while running the series on my desktop.

 include/linux/mmzone.h |  17 ++
 mm/vmscan.c            | 461 ++++++++++++++++++++++++++++++-------------------
 2 files changed, 305 insertions(+), 173 deletions(-)

-- 
1.8.1.4



Thread overview: 268+ messages
2013-03-17 13:04 [RFC PATCH 0/8] Reduce system disruption due to kswapd Mel Gorman
2013-03-17 13:04 ` Mel Gorman
2013-03-17 13:04 ` [PATCH 01/10] mm: vmscan: Limit the number of pages kswapd reclaims at each priority Mel Gorman
2013-03-17 13:04   ` Mel Gorman
2013-03-18 23:53   ` Simon Jeons
2013-03-18 23:53     ` Simon Jeons
2013-03-19  9:55     ` Mel Gorman
2013-03-19  9:55       ` Mel Gorman
2013-03-19 10:16       ` Simon Jeons
2013-03-19 10:16         ` Simon Jeons
2013-03-19 10:59         ` Mel Gorman
2013-03-19 10:59           ` Mel Gorman
2013-03-20 16:18   ` Michal Hocko
2013-03-20 16:18     ` Michal Hocko
2013-03-21  0:52     ` Rik van Riel
2013-03-21  0:52       ` Rik van Riel
2013-03-22  0:08       ` Will Huck
2013-03-22  0:08         ` Will Huck
2013-03-21  9:47     ` Mel Gorman
2013-03-21  9:47       ` Mel Gorman
2013-03-21 12:59       ` Michal Hocko
2013-03-21 12:59         ` Michal Hocko
2013-03-21  0:51   ` Rik van Riel
2013-03-21  0:51     ` Rik van Riel
2013-03-21 15:57   ` Johannes Weiner
2013-03-21 15:57     ` Johannes Weiner
2013-03-21 16:47     ` Mel Gorman
2013-03-21 16:47       ` Mel Gorman
2013-03-22  0:05     ` Will Huck
2013-03-22  0:05       ` Will Huck
2013-03-22  3:52       ` Rik van Riel
2013-03-22  3:52         ` Rik van Riel
2013-03-22  3:56         ` Will Huck
2013-03-22  3:56           ` Will Huck
2013-03-22  4:59           ` Will Huck
2013-03-22  4:59             ` Will Huck
2013-03-22 13:01             ` Rik van Riel
2013-03-22 13:01               ` Rik van Riel
2013-04-05  0:05               ` Will Huck
2013-04-05  0:05                 ` Will Huck
2013-04-07  7:32                 ` Will Huck
2013-04-07  7:32                   ` Will Huck
2013-04-07  7:35                 ` Will Huck
2013-04-07  7:35                   ` Will Huck
2013-04-11  5:54         ` Will Huck
2013-04-11  5:54           ` Will Huck
2013-04-11  5:58         ` Will Huck
2013-04-11  5:58           ` Will Huck
2013-04-12  5:46           ` Ric Mason
2013-04-12  5:46             ` Ric Mason
2013-04-12  9:34             ` Mel Gorman
2013-04-12  9:34               ` Mel Gorman
2013-04-12 13:40               ` Rik van Riel
2013-04-12 13:40                 ` Rik van Riel
2013-03-25  9:07   ` Michal Hocko
2013-03-25  9:07     ` Michal Hocko
2013-03-25  9:13     ` Jiri Slaby
2013-03-25  9:13       ` Jiri Slaby
2013-03-28 22:31       ` Jiri Slaby
2013-03-28 22:31         ` Jiri Slaby
2013-03-29  8:22         ` Michal Hocko
2013-03-29  8:22           ` Michal Hocko
2013-03-30 22:07           ` Jiri Slaby
2013-03-30 22:07             ` Jiri Slaby
2013-04-02 11:15             ` Mel Gorman
2013-04-02 11:15               ` Mel Gorman
2013-03-17 13:04 ` [PATCH 02/10] mm: vmscan: Obey proportional scanning requirements for kswapd Mel Gorman
2013-03-17 13:04   ` Mel Gorman
2013-03-17 14:39   ` Andi Kleen
2013-03-17 14:39     ` Andi Kleen
2013-03-17 15:08     ` Mel Gorman
2013-03-17 15:08       ` Mel Gorman
2013-03-21  1:10   ` Rik van Riel
2013-03-21  1:10     ` Rik van Riel
2013-03-21  9:54     ` Mel Gorman
2013-03-21  9:54       ` Mel Gorman
2013-03-21 14:01   ` Michal Hocko
2013-03-21 14:01     ` Michal Hocko
2013-03-21 14:31     ` Mel Gorman
2013-03-21 14:31       ` Mel Gorman
2013-03-21 15:07       ` Michal Hocko
2013-03-21 15:07         ` Michal Hocko
2013-03-21 15:34         ` Mel Gorman
2013-03-21 15:34           ` Mel Gorman
2013-03-22  7:54           ` Michal Hocko
2013-03-22  7:54             ` Michal Hocko
2013-03-22  8:37             ` Mel Gorman
2013-03-22  8:37               ` Mel Gorman
2013-03-22 10:04               ` Michal Hocko
2013-03-22 10:04                 ` Michal Hocko
2013-03-22 10:47                 ` Michal Hocko
2013-03-22 10:47                   ` Michal Hocko
2013-03-21 16:25   ` Johannes Weiner
2013-03-21 16:25     ` Johannes Weiner
2013-03-21 18:02     ` Mel Gorman
2013-03-21 18:02       ` Mel Gorman
2013-03-22 16:53       ` Johannes Weiner
2013-03-22 16:53         ` Johannes Weiner
2013-03-22 18:25         ` Mel Gorman
2013-03-22 18:25           ` Mel Gorman
2013-03-22 19:09           ` Johannes Weiner
2013-03-22 19:09             ` Johannes Weiner
2013-03-22 19:46             ` Mel Gorman
2013-03-22 19:46               ` Mel Gorman
2013-03-17 13:04 ` [PATCH 03/10] mm: vmscan: Flatten kswapd priority loop Mel Gorman
2013-03-17 13:04   ` Mel Gorman
2013-03-17 14:36   ` Andi Kleen
2013-03-17 14:36     ` Andi Kleen
2013-03-17 15:09     ` Mel Gorman
2013-03-17 15:09       ` Mel Gorman
2013-03-18  7:02   ` Hillf Danton
2013-03-19 10:01     ` Mel Gorman
2013-03-18 23:58   ` Simon Jeons
2013-03-18 23:58     ` Simon Jeons
2013-03-19 10:12     ` Mel Gorman
2013-03-19 10:12       ` Mel Gorman
2013-03-19  3:08   ` Simon Jeons
2013-03-19  3:08     ` Simon Jeons
2013-03-19  8:23     ` Michal Hocko
2013-03-19  8:23       ` Michal Hocko
2013-03-19 10:14     ` Mel Gorman
2013-03-19 10:14       ` Mel Gorman
2013-03-19 10:26       ` Simon Jeons
2013-03-19 10:26         ` Simon Jeons
2013-03-19 11:01         ` Mel Gorman
2013-03-19 11:01           ` Mel Gorman
2013-03-21 14:54   ` Michal Hocko
2013-03-21 14:54     ` Michal Hocko
2013-03-21 15:26     ` Mel Gorman
2013-03-21 15:26       ` Mel Gorman
2013-03-21 15:38       ` Michal Hocko
2013-03-21 15:38         ` Michal Hocko
2013-03-17 13:04 ` [PATCH 04/10] mm: vmscan: Decide whether to compact the pgdat based on reclaim progress Mel Gorman
2013-03-17 13:04   ` Mel Gorman
2013-03-18 11:11   ` Wanpeng Li
2013-03-18 11:11   ` Wanpeng Li
2013-03-19 10:19     ` Mel Gorman
2013-03-19 10:19       ` Mel Gorman
2013-03-18 11:35   ` Hillf Danton
2013-03-18 11:35     ` Hillf Danton
2013-03-19 10:27     ` Mel Gorman
2013-03-19 10:27       ` Mel Gorman
2013-03-21 15:32   ` Michal Hocko
2013-03-21 15:32     ` Michal Hocko
2013-03-21 15:47     ` Mel Gorman
2013-03-21 15:47       ` Mel Gorman
2013-03-21 15:50       ` Michal Hocko
2013-03-21 15:50         ` Michal Hocko
2013-03-17 13:04 ` [PATCH 05/10] mm: vmscan: Do not allow kswapd to scan at maximum priority Mel Gorman
2013-03-17 13:04   ` Mel Gorman
2013-03-21  1:20   ` Rik van Riel
2013-03-21  1:20     ` Rik van Riel
2013-03-21 10:12     ` Mel Gorman
2013-03-21 10:12       ` Mel Gorman
2013-03-21 12:30       ` Rik van Riel
2013-03-21 12:30         ` Rik van Riel
2013-03-21 15:48   ` Michal Hocko
2013-03-21 15:48     ` Michal Hocko
2013-03-17 13:04 ` [PATCH 06/10] mm: vmscan: Have kswapd writeback pages based on dirty pages encountered, not priority Mel Gorman
2013-03-17 13:04   ` Mel Gorman
2013-03-17 14:42   ` Andi Kleen
2013-03-17 14:42     ` Andi Kleen
2013-03-17 15:11     ` Mel Gorman
2013-03-17 15:11       ` Mel Gorman
2013-03-21 17:53       ` Rik van Riel
2013-03-21 17:53         ` Rik van Riel
2013-03-21 18:15         ` Mel Gorman
2013-03-21 18:15           ` Mel Gorman
2013-03-21 18:21           ` Rik van Riel
2013-03-21 18:21             ` Rik van Riel
2013-03-18 11:08   ` Wanpeng Li
2013-03-18 11:08   ` Wanpeng Li
2013-03-19 10:35     ` Mel Gorman
2013-03-19 10:35       ` Mel Gorman
2013-03-17 13:04 ` [PATCH 07/10] mm: vmscan: Block kswapd if it is encountering pages under writeback Mel Gorman
2013-03-17 13:04   ` Mel Gorman
2013-03-17 14:49   ` Andi Kleen
2013-03-17 14:49     ` Andi Kleen
2013-03-17 15:19     ` Mel Gorman
2013-03-17 15:19       ` Mel Gorman
2013-03-17 15:40       ` Andi Kleen
2013-03-17 15:40         ` Andi Kleen
2013-03-19 11:06         ` Mel Gorman
2013-03-19 11:06           ` Mel Gorman
2013-03-18 11:37   ` Simon Jeons
2013-03-18 11:37     ` Simon Jeons
2013-03-19 10:57     ` Mel Gorman
2013-03-19 10:57       ` Mel Gorman
2013-03-18 11:58   ` Wanpeng Li
2013-03-19 10:58     ` Mel Gorman
2013-03-19 10:58       ` Mel Gorman
2013-03-18 11:58   ` Wanpeng Li
2013-03-21 16:32   ` [PATCH 07/10 -v2r1] " Michal Hocko
2013-03-21 16:32     ` Michal Hocko
2013-03-21 18:42   ` [PATCH 07/10] " Rik van Riel
2013-03-21 18:42     ` Rik van Riel
2013-03-22  8:27     ` Mel Gorman
2013-03-22  8:27       ` Mel Gorman
2013-03-17 13:04 ` [PATCH 08/10] mm: vmscan: Have kswapd shrink slab only once per priority Mel Gorman
2013-03-17 13:04   ` Mel Gorman
2013-03-17 14:53   ` Andi Kleen
2013-03-17 14:53     ` Andi Kleen
2013-03-21 16:47   ` Michal Hocko
2013-03-21 16:47     ` Michal Hocko
2013-03-21 19:47   ` Rik van Riel
2013-03-21 19:47     ` Rik van Riel
2013-04-09  6:53   ` Joonsoo Kim
2013-04-09  6:53     ` Joonsoo Kim
2013-04-09  8:41     ` Simon Jeons
2013-04-09  8:41       ` Simon Jeons
2013-04-09 11:13     ` Mel Gorman
2013-04-09 11:13       ` Mel Gorman
2013-04-10  1:07       ` Dave Chinner
2013-04-10  1:07         ` Dave Chinner
2013-04-10  5:23         ` Joonsoo Kim
2013-04-10  5:23           ` Joonsoo Kim
2013-04-11  9:53         ` Mel Gorman
2013-04-11  9:53           ` Mel Gorman
2013-04-10  5:21       ` Joonsoo Kim
2013-04-10  5:21         ` Joonsoo Kim
2013-04-11 10:01         ` Mel Gorman
2013-04-11 10:01           ` Mel Gorman
2013-04-11 10:29           ` Ric Mason
2013-04-11 10:29             ` Ric Mason
2013-03-17 13:04 ` [PATCH 09/10] mm: vmscan: Check if kswapd should writepage " Mel Gorman
2013-03-17 13:04   ` Mel Gorman
2013-03-21 16:58   ` Michal Hocko
2013-03-21 16:58     ` Michal Hocko
2013-03-21 18:07     ` Mel Gorman
2013-03-21 18:07       ` Mel Gorman
2013-03-21 19:52   ` Rik van Riel
2013-03-21 19:52     ` Rik van Riel
2013-03-17 13:04 ` [PATCH 10/10] mm: vmscan: Move logic from balance_pgdat() to kswapd_shrink_zone() Mel Gorman
2013-03-17 13:04   ` Mel Gorman
2013-03-17 14:55   ` Andi Kleen
2013-03-17 14:55     ` Andi Kleen
2013-03-17 15:25     ` Mel Gorman
2013-03-17 15:25       ` Mel Gorman
2013-03-21 17:18   ` Michal Hocko
2013-03-21 17:18     ` Michal Hocko
2013-03-21 18:13     ` Mel Gorman
2013-03-21 18:13       ` Mel Gorman
2013-03-21 10:44 ` [RFC PATCH 0/8] Reduce system disruption due to kswapd Damien Wyart
2013-03-21 10:54   ` Zlatko Calusic
2013-03-21 11:48     ` Mel Gorman
2013-03-21 11:20   ` Mel Gorman
2013-03-22 14:37 ` Mel Gorman
2013-03-22 14:37   ` Mel Gorman
2013-03-24 19:00 ` Jiri Slaby
2013-03-24 19:00   ` Jiri Slaby
2013-03-25  8:17   ` Michal Hocko
2013-03-25  8:17     ` Michal Hocko
2013-04-09 11:06 [PATCH 0/10] Reduce system disruption due to kswapd V2 Mel Gorman
2013-04-09 11:06 ` [PATCH 02/10] mm: vmscan: Obey proportional scanning requirements for kswapd Mel Gorman
2013-04-09 11:06   ` Mel Gorman
2013-04-10  7:16   ` Kamezawa Hiroyuki
2013-04-10  7:16     ` Kamezawa Hiroyuki
2013-04-10 14:08     ` Mel Gorman
2013-04-10 14:08       ` Mel Gorman
2013-04-11  0:14       ` Kamezawa Hiroyuki
2013-04-11  0:14         ` Kamezawa Hiroyuki
2013-04-11  9:09         ` Mel Gorman
2013-04-11  9:09           ` Mel Gorman
2013-04-11 19:57 [PATCH 0/10] Reduce system disruption due to kswapd V3 Mel Gorman
2013-04-11 19:57 ` [PATCH 02/10] mm: vmscan: Obey proportional scanning requirements for kswapd Mel Gorman
2013-04-11 19:57   ` Mel Gorman
2013-04-18 15:01   ` Johannes Weiner
2013-04-18 15:01     ` Johannes Weiner
2013-04-18 15:58     ` Mel Gorman
2013-04-18 15:58       ` Mel Gorman
