[PATCH 0/3] Removal of lumpy reclaim V2

* [PATCH 0/3] Removal of lumpy reclaim V2
@ 2012-04-11 16:38 ` Mel Gorman
  0 siblings, 0 replies; 36+ messages in thread
From: Mel Gorman @ 2012-04-11 16:38 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Konstantin Khlebnikov, Hugh Dickins, Ying Han,
	Mel Gorman, Linux-MM, LKML

Andrew, these three patches should replace the two lumpy reclaim patches
you already have. When applied, there is no functional difference (slightly
changes in layout) but the changelogs are better.

Changelog since V1
o Ying pointed out that compaction was waiting on page writeback and the
  description of the patches in V1 was broken. This version is the same
  except that it is structured differently to explain that waiting on
  page writeback is removed.
o Rebased to v3.4-rc2

This series removes lumpy reclaim and some stalling logic that was
unintentionally being used by memory compaction. The end result
is that stalling on dirty pages during page reclaim now depends on
wait_iff_congested().

Four kernels were compared

3.3.0     vanilla
3.4.0-rc2 vanilla
3.4.0-rc2 lumpyremove-v2 is patch one from this series
3.4.0-rc2 nosync-v2r3 is the full series

Removing lumpy reclaim saves almost 900K of text where as the full series
removes 1200K of text.

   text	   data	    bss	    dec	    hex	filename
6740375	1927944	2260992	10929311	 a6c49f	vmlinux-3.4.0-rc2-vanilla
6739479	1927944	2260992	10928415	 a6c11f	vmlinux-3.4.0-rc2-lumpyremove-v2
6739159	1927944	2260992	10928095	 a6bfdf	vmlinux-3.4.0-rc2-nosync-v2

There are behaviour changes in the series and so tests were run with
monitoring of ftrace events. This disrupts results so the performance
results are distorted but the new behaviour should be clearer.

fs-mark running in a threaded configuration showed little of interest as
it did not push reclaim aggressively

FS-Mark Multi Threaded
                        3.3.0-vanilla       rc2-vanilla       lumpyremove-v2r3       nosync-v2r3
Files/s  min           3.20 ( 0.00%)        3.20 ( 0.00%)        3.20 ( 0.00%)        3.20 ( 0.00%)
Files/s  mean          3.20 ( 0.00%)        3.20 ( 0.00%)        3.20 ( 0.00%)        3.20 ( 0.00%)
Files/s  stddev        0.00 ( 0.00%)        0.00 ( 0.00%)        0.00 ( 0.00%)        0.00 ( 0.00%)
Files/s  max           3.20 ( 0.00%)        3.20 ( 0.00%)        3.20 ( 0.00%)        3.20 ( 0.00%)
Overhead min      508667.00 ( 0.00%)   521350.00 (-2.49%)   544292.00 (-7.00%)   547168.00 (-7.57%)
Overhead mean     551185.00 ( 0.00%)   652690.73 (-18.42%)   991208.40 (-79.83%)   570130.53 (-3.44%)
Overhead stddev    18200.69 ( 0.00%)   331958.29 (-1723.88%)  1579579.43 (-8578.68%)     9576.81 (47.38%)
Overhead max      576775.00 ( 0.00%)  1846634.00 (-220.17%)  6901055.00 (-1096.49%)   585675.00 (-1.54%)
MMTests Statistics: duration
Sys Time Running Test (seconds)             309.90    300.95    307.33    298.95
User+Sys Time Running Test (seconds)        319.32    309.67    315.69    307.51
Total Elapsed Time (seconds)               1187.85   1193.09   1191.98   1193.73

MMTests Statistics: vmstat
Page Ins                                       80532       82212       81420       79480
Page Outs                                  111434984   111456240   111437376   111582628
Swap Ins                                           0           0           0           0
Swap Outs                                          0           0           0           0
Direct pages scanned                           44881       27889       27453       34843
Kswapd pages scanned                        25841428    25860774    25861233    25843212
Kswapd pages reclaimed                      25841393    25860741    25861199    25843179
Direct pages reclaimed                         44881       27889       27453       34843
Kswapd efficiency                                99%         99%         99%         99%
Kswapd velocity                            21754.791   21675.460   21696.029   21649.127
Direct efficiency                               100%        100%        100%        100%
Direct velocity                               37.783      23.375      23.031      29.188
Percentage direct scans                           0%          0%          0%          0%

ftrace showed that there was no stalling on writeback or pages submitted
for IO from reclaim context.

postmark was similar and while it was more interesting, it also did not
push reclaim heavily.

POSTMARK
                                     3.3.0-vanilla       rc2-vanilla  lumpyremove-v2r3       nosync-v2r3
Transactions per second:               16.00 ( 0.00%)    20.00 (25.00%)    18.00 (12.50%)    17.00 ( 6.25%)
Data megabytes read per second:        18.80 ( 0.00%)    24.27 (29.10%)    22.26 (18.40%)    20.54 ( 9.26%)
Data megabytes written per second:     35.83 ( 0.00%)    46.25 (29.08%)    42.42 (18.39%)    39.14 ( 9.24%)
Files created alone per second:        28.00 ( 0.00%)    38.00 (35.71%)    34.00 (21.43%)    30.00 ( 7.14%)
Files create/transact per second:       8.00 ( 0.00%)    10.00 (25.00%)     9.00 (12.50%)     8.00 ( 0.00%)
Files deleted alone per second:       556.00 ( 0.00%)  1224.00 (120.14%)  3062.00 (450.72%)  6124.00 (1001.44%)
Files delete/transact per second:       8.00 ( 0.00%)    10.00 (25.00%)     9.00 (12.50%)     8.00 ( 0.00%)

MMTests Statistics: duration
Sys Time Running Test (seconds)             113.34    107.99    109.73    108.72
User+Sys Time Running Test (seconds)        145.51    139.81    143.32    143.55
Total Elapsed Time (seconds)               1159.16    899.23    980.17   1062.27

MMTests Statistics: vmstat
Page Ins                                    13710192    13729032    13727944    13760136
Page Outs                                   43071140    42987228    42733684    42931624
Swap Ins                                           0           0           0           0
Swap Outs                                          0           0           0           0
Direct pages scanned                               0           0           0           0
Kswapd pages scanned                         9941613     9937443     9939085     9929154
Kswapd pages reclaimed                       9940926     9936751     9938397     9928465
Direct pages reclaimed                             0           0           0           0
Kswapd efficiency                                99%         99%         99%         99%
Kswapd velocity                             8576.567   11051.058   10140.164    9347.109
Direct efficiency                               100%        100%        100%        100%
Direct velocity                                0.000       0.000       0.000       0.000

It looks like here that the full series regresses performance but as ftrace
showed no usage of wait_iff_congested() or sync reclaim I am assuming it's
a disruption due to monitoring. Other data such as memory usage, page IO,
swap IO all looked similar.

Running a benchmark with a plain DD showed nothing very interesting. The
full series stalled in wait_iff_congested() slightly less but stall times
on vanilla kernels were marginal.

Running a benchmark that hammered on file-backed mappings showed stalls
due to congestion but not in sync writebacks

MICRO
                                     3.3.0-vanilla       rc2-vanilla  lumpyremove-v2r3       nosync-v2r3
MMTests Statistics: duration
Sys Time Running Test (seconds)             308.13    294.50    298.75    299.53
User+Sys Time Running Test (seconds)        330.45    316.28    318.93    320.79
Total Elapsed Time (seconds)               1814.90   1833.88   1821.14   1832.91

MMTests Statistics: vmstat
Page Ins                                      108712      120708       97224      110344
Page Outs                                  155514576   156017404   155813676   156193256
Swap Ins                                           0           0           0           0
Swap Outs                                          0           0           0           0
Direct pages scanned                         2599253     1550480     2512822     2414760
Kswapd pages scanned                        69742364    71150694    68839041    69692533
Kswapd pages reclaimed                      34824488    34773341    34796602    34799396
Direct pages reclaimed                         53693       94750       61792       75205
Kswapd efficiency                                49%         48%         50%         49%
Kswapd velocity                            38427.662   38797.901   37799.972   38022.889
Direct efficiency                                 2%          6%          2%          3%
Direct velocity                             1432.174     845.464    1379.807    1317.446
Percentage direct scans                           3%          2%          3%          3%
Page writes by reclaim                             0           0           0           0
Page writes file                                   0           0           0           0
Page writes anon                                   0           0           0           0
Page reclaim immediate                             0           0           0        1218
Page rescued immediate                             0           0           0           0
Slabs scanned                                  15360       16384       13312       16384
Direct inode steals                                0           0           0           0
Kswapd inode steals                             4340        4327        1630        4323

FTrace Reclaim Statistics: congestion_wait
Direct number congest     waited                 0          0          0          0 
Direct time   congest     waited               0ms        0ms        0ms        0ms 
Direct full   congest     waited                 0          0          0          0 
Direct number conditional waited               900        870        754        789 
Direct time   conditional waited               0ms        0ms        0ms       20ms 
Direct full   conditional waited                 0          0          0          0 
KSwapd number congest     waited              2106       2308       2116       1915 
KSwapd time   congest     waited          139924ms   157832ms   125652ms   132516ms 
KSwapd full   congest     waited              1346       1530       1202       1278 
KSwapd number conditional waited             12922      16320      10943      14670 
KSwapd time   conditional waited               0ms        0ms        0ms        0ms 
KSwapd full   conditional waited                 0          0          0          0 

Reclaim statistics are not radically changed. The stall times in kswapd
are massive but it is clear that it is due to calls to congestion_wait()
and that is almost certainly the call in balance_pgdat(). Otherwise stalls
due to dirty pages are non-existant.

I ran a benchmark that stressed high-order allocation. This is very
artifical load but was used in the past to evaluate lumpy reclaim and
compaction. Generally I look at allocation success rates and latency figures.

STRESS-HIGHALLOC
                 3.3.0-vanilla       rc2-vanilla  lumpyremove-v2r3       nosync-v2r3
Pass 1          81.00 ( 0.00%)    28.00 (-53.00%)    24.00 (-57.00%)    28.00 (-53.00%)
Pass 2          82.00 ( 0.00%)    39.00 (-43.00%)    38.00 (-44.00%)    43.00 (-39.00%)
while Rested    88.00 ( 0.00%)    87.00 (-1.00%)    88.00 ( 0.00%)    88.00 ( 0.00%)

MMTests Statistics: duration
Sys Time Running Test (seconds)             740.93    681.42    685.14    684.87
User+Sys Time Running Test (seconds)       2922.65   3269.52   3281.35   3279.44
Total Elapsed Time (seconds)               1161.73   1152.49   1159.55   1161.44

MMTests Statistics: vmstat
Page Ins                                     4486020     2807256     2855944     2876244
Page Outs                                    7261600     7973688     7975320     7986120
Swap Ins                                       31694           0           0           0
Swap Outs                                      98179           0           0           0
Direct pages scanned                           53494       57731       34406      113015
Kswapd pages scanned                         6271173     1287481     1278174     1219095
Kswapd pages reclaimed                       2029240     1281025     1260708     1201583
Direct pages reclaimed                          1468       14564       16649       92456
Kswapd efficiency                                32%         99%         98%         98%
Kswapd velocity                             5398.133    1117.130    1102.302    1049.641
Direct efficiency                                 2%         25%         48%         81%
Direct velocity                               46.047      50.092      29.672      97.306
Percentage direct scans                           0%          4%          2%          8%
Page writes by reclaim                       1616049           0           0           0
Page writes file                             1517870           0           0           0
Page writes anon                               98179           0           0           0
Page reclaim immediate                        103778       27339        9796       17831
Page rescued immediate                             0           0           0           0
Slabs scanned                                1096704      986112      980992      998400
Direct inode steals                              223      215040      216736      247881
Kswapd inode steals                           175331       61548       68444       63066
Kswapd skipped wait                            21991           0           1           0
THP fault alloc                                    1         135         125         134
THP collapse alloc                               393         311         228         236
THP splits                                        25          13           7           8
THP fault fallback                                 0           0           0           0
THP collapse fail                                  3           5           7           7
Compaction stalls                                865        1270        1422        1518
Compaction success                               370         401         353         383
Compaction failures                              495         869        1069        1135
Compaction pages moved                        870155     3828868     4036106     4423626
Compaction move failure                        26429       23865       29742       27514

Success rates are completely hosed for 3.4-rc2 which is almost certainly
due to [fe2c2a10: vmscan: reclaim at order 0 when compaction is enabled]. I
expected this would happen for kswapd and impair allocation success rates
(https://lkml.org/lkml/2012/1/25/166) but I did not anticipate this much
a difference: 80% less scanning, 37% less reclaim by kswapd

In comparison, reclaim/compaction is not aggressive and gives up easily
which is the intended behaviour. hugetlbfs uses __GFP_REPEAT and would be
much more aggressive about reclaim/compaction than THP allocations are. The
stress test above is allocating like neither THP or hugetlbfs but is much
closer to THP.

Mainline is now impaired in terms of high order allocation under heavy load
although I do not know to what degree as I did not test with __GFP_REPEAT.
Keep this in mind for bugs related to hugepage pool resizing, THP allocation
and high order atomic allocation failures from network devices.

In terms of congestion throttling, I see the following for this test

FTrace Reclaim Statistics: congestion_wait
Direct number congest     waited                 3          0          0          0 
Direct time   congest     waited               0ms        0ms        0ms        0ms 
Direct full   congest     waited                 0          0          0          0 
Direct number conditional waited               957        512       1081       1075 
Direct time   conditional waited               0ms        0ms        0ms        0ms 
Direct full   conditional waited                 0          0          0          0 
KSwapd number congest     waited                36          4          3          5 
KSwapd time   congest     waited            3148ms      400ms      300ms      500ms 
KSwapd full   congest     waited                30          4          3          5 
KSwapd number conditional waited             88514        197        332        542 
KSwapd time   conditional waited            4980ms        0ms        0ms        0ms 
KSwapd full   conditional waited                49          0          0          0 

The "conditional waited" times are the most interesting as this is directly
impacted by the number of dirty pages encountered during scan. As lumpy
reclaim is no longer scanning contiguous ranges, it is finding fewer dirty
pages. This brings wait times from about 5 seconds to 0. kswapd itself is
still calling congestion_wait() so it'll still stall but it's a lot less.

In terms of the type of IO we were doing, I see this

FTrace Reclaim Statistics: mm_vmscan_writepage
Direct writes anon  sync                         0          0          0          0 
Direct writes anon  async                        0          0          0          0 
Direct writes file  sync                         0          0          0          0 
Direct writes file  async                        0          0          0          0 
Direct writes mixed sync                         0          0          0          0 
Direct writes mixed async                        0          0          0          0 
KSwapd writes anon  sync                         0          0          0          0 
KSwapd writes anon  async                    91682          0          0          0 
KSwapd writes file  sync                         0          0          0          0 
KSwapd writes file  async                   822629          0          0          0 
KSwapd writes mixed sync                         0          0          0          0 
KSwapd writes mixed async                        0          0          0          0 

In 3.2, kswapd was doing a bunch of async writes of pages but
reclaim/compaction was never reaching a point where it was doing sync
IO. This does not guarantee that reclaim/compaction was not calling
wait_on_page_writeback() but I would consider it unlikely. It indicates
that merging patches 2 and 3 to stop reclaim/compaction calling
wait_on_page_writeback() should be safe.

 include/trace/events/vmscan.h |   40 ++-----
 mm/vmscan.c                   |  263 ++++-------------------------------------
 2 files changed, 37 insertions(+), 266 deletions(-)

-- 
1.7.9.2

^ permalink raw reply	[flat|nested] 36+ messages in thread