* [PATCH 0/3] Removal of lumpy reclaim V2
From: Mel Gorman @ 2012-04-11 16:38 UTC
  To: Andrew Morton
  Cc: Rik van Riel, Konstantin Khlebnikov, Hugh Dickins, Ying Han,
	Mel Gorman, Linux-MM, LKML

Andrew, these three patches should replace the two lumpy reclaim patches
you already have. When applied, there is no functional difference (only
slight changes in layout) but the changelogs are better.

Changelog since V1
o Ying pointed out that compaction was waiting on page writeback and the
  description of the patches in V1 was broken. This version is the same
  except that it is structured differently to explain that waiting on
  page writeback is removed.
o Rebased to v3.4-rc2

This series removes lumpy reclaim and some stalling logic that was
unintentionally being used by memory compaction. The end result
is that stalling on dirty pages during page reclaim now depends on
wait_iff_congested().
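
As a reference point, here is a minimal sketch of what the throttling looks
like after the series. The trigger condition is invented for illustration;
see the patches themselves for the real placement and heuristic.

  /*
   * Illustrative sketch only: after this series, reclaim backs off on
   * zone congestion via wait_iff_congested() instead of calling
   * wait_on_page_writeback() on individual dirty pages. The condition
   * below is made up for illustration.
   */
  static void reclaim_throttle_sketch(struct zone *zone, bool many_dirty)
  {
          /*
           * wait_iff_congested() returns immediately if nothing is
           * congested, otherwise it sleeps up to a tenth of a second.
           */
          if (many_dirty)
                  wait_iff_congested(zone, BLK_RW_ASYNC, HZ/10);
  }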

Four kernels were compared

3.3.0     vanilla
3.4.0-rc2 vanilla
3.4.0-rc2 lumpyremove-v2 is patch one from this series
3.4.0-rc2 nosync-v2r3 is the full series

Removing lumpy reclaim saves almost 900 bytes of text, whereas the full
series removes about 1200 bytes of text.

   text	   data	    bss	    dec	    hex	filename
6740375	1927944	2260992	10929311	 a6c49f	vmlinux-3.4.0-rc2-vanilla
6739479	1927944	2260992	10928415	 a6c11f	vmlinux-3.4.0-rc2-lumpyremove-v2
6739159	1927944	2260992	10928095	 a6bfdf	vmlinux-3.4.0-rc2-nosync-v2

The series changes behaviour, so the tests were run with ftrace event
monitoring enabled. The monitoring distorts the performance results, but it
should make the new behaviour clearer.

fs-mark running in a threaded configuration showed little of interest as it
did not push reclaim aggressively.

FS-Mark Multi Threaded
                        3.3.0-vanilla       rc2-vanilla       lumpyremove-v2r3       nosync-v2r3
Files/s  min           3.20 ( 0.00%)        3.20 ( 0.00%)        3.20 ( 0.00%)        3.20 ( 0.00%)
Files/s  mean          3.20 ( 0.00%)        3.20 ( 0.00%)        3.20 ( 0.00%)        3.20 ( 0.00%)
Files/s  stddev        0.00 ( 0.00%)        0.00 ( 0.00%)        0.00 ( 0.00%)        0.00 ( 0.00%)
Files/s  max           3.20 ( 0.00%)        3.20 ( 0.00%)        3.20 ( 0.00%)        3.20 ( 0.00%)
Overhead min      508667.00 ( 0.00%)   521350.00 (-2.49%)   544292.00 (-7.00%)   547168.00 (-7.57%)
Overhead mean     551185.00 ( 0.00%)   652690.73 (-18.42%)   991208.40 (-79.83%)   570130.53 (-3.44%)
Overhead stddev    18200.69 ( 0.00%)   331958.29 (-1723.88%)  1579579.43 (-8578.68%)     9576.81 (47.38%)
Overhead max      576775.00 ( 0.00%)  1846634.00 (-220.17%)  6901055.00 (-1096.49%)   585675.00 (-1.54%)
MMTests Statistics: duration
Sys Time Running Test (seconds)             309.90    300.95    307.33    298.95
User+Sys Time Running Test (seconds)        319.32    309.67    315.69    307.51
Total Elapsed Time (seconds)               1187.85   1193.09   1191.98   1193.73

MMTests Statistics: vmstat
Page Ins                                       80532       82212       81420       79480
Page Outs                                  111434984   111456240   111437376   111582628
Swap Ins                                           0           0           0           0
Swap Outs                                          0           0           0           0
Direct pages scanned                           44881       27889       27453       34843
Kswapd pages scanned                        25841428    25860774    25861233    25843212
Kswapd pages reclaimed                      25841393    25860741    25861199    25843179
Direct pages reclaimed                         44881       27889       27453       34843
Kswapd efficiency                                99%         99%         99%         99%
Kswapd velocity                            21754.791   21675.460   21696.029   21649.127
Direct efficiency                               100%        100%        100%        100%
Direct velocity                               37.783      23.375      23.031      29.188
Percentage direct scans                           0%          0%          0%          0%
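
For anyone reading the derived rows: they appear to be straightforward ratios
of the raw counters above. A quick standalone check of my arithmetic (this is
not MMTests code; the integer truncation of the percentage is an assumption):

  #include <stdio.h>

  /* Reproduces the derived rows from the 3.3.0-vanilla column above */
  int main(void)
  {
          double scanned = 25841428, reclaimed = 25841393;
          double elapsed = 1187.85;       /* total elapsed seconds */

          /* efficiency: share of scanned pages actually reclaimed */
          printf("Kswapd efficiency %ld%%\n",
                  (long)(100.0 * reclaimed / scanned));
          /* velocity: pages scanned per second of wallclock time */
          printf("Kswapd velocity   %.3f\n", scanned / elapsed);
          return 0;
  }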

ftrace showed that there was no stalling on writeback or pages submitted
for IO from reclaim context.


postmark was similar; while it was more interesting, it also did not push
reclaim heavily.

POSTMARK
                                     3.3.0-vanilla       rc2-vanilla  lumpyremove-v2r3       nosync-v2r3
Transactions per second:               16.00 ( 0.00%)    20.00 (25.00%)    18.00 (12.50%)    17.00 ( 6.25%)
Data megabytes read per second:        18.80 ( 0.00%)    24.27 (29.10%)    22.26 (18.40%)    20.54 ( 9.26%)
Data megabytes written per second:     35.83 ( 0.00%)    46.25 (29.08%)    42.42 (18.39%)    39.14 ( 9.24%)
Files created alone per second:        28.00 ( 0.00%)    38.00 (35.71%)    34.00 (21.43%)    30.00 ( 7.14%)
Files create/transact per second:       8.00 ( 0.00%)    10.00 (25.00%)     9.00 (12.50%)     8.00 ( 0.00%)
Files deleted alone per second:       556.00 ( 0.00%)  1224.00 (120.14%)  3062.00 (450.72%)  6124.00 (1001.44%)
Files delete/transact per second:       8.00 ( 0.00%)    10.00 (25.00%)     9.00 (12.50%)     8.00 ( 0.00%)

MMTests Statistics: duration
Sys Time Running Test (seconds)             113.34    107.99    109.73    108.72
User+Sys Time Running Test (seconds)        145.51    139.81    143.32    143.55
Total Elapsed Time (seconds)               1159.16    899.23    980.17   1062.27

MMTests Statistics: vmstat
Page Ins                                    13710192    13729032    13727944    13760136
Page Outs                                   43071140    42987228    42733684    42931624
Swap Ins                                           0           0           0           0
Swap Outs                                          0           0           0           0
Direct pages scanned                               0           0           0           0
Kswapd pages scanned                         9941613     9937443     9939085     9929154
Kswapd pages reclaimed                       9940926     9936751     9938397     9928465
Direct pages reclaimed                             0           0           0           0
Kswapd efficiency                                99%         99%         99%         99%
Kswapd velocity                             8576.567   11051.058   10140.164    9347.109
Direct efficiency                               100%        100%        100%        100%
Direct velocity                                0.000       0.000       0.000       0.000

It looks here as if the full series regresses performance, but as ftrace
showed no use of wait_iff_congested() or sync reclaim, I am assuming the
difference is disruption due to monitoring. Other data such as memory usage,
page IO and swap IO all looked similar.

Running a benchmark with a plain dd showed nothing very interesting. The
full series stalled in wait_iff_congested() slightly less, but stall times
on the vanilla kernels were marginal.

Running a benchmark that hammered on file-backed mappings showed stalls due
to congestion but no stalls in sync writeback.

MICRO
                                     3.3.0-vanilla       rc2-vanilla  lumpyremove-v2r3       nosync-v2r3
MMTests Statistics: duration
Sys Time Running Test (seconds)             308.13    294.50    298.75    299.53
User+Sys Time Running Test (seconds)        330.45    316.28    318.93    320.79
Total Elapsed Time (seconds)               1814.90   1833.88   1821.14   1832.91

MMTests Statistics: vmstat
Page Ins                                      108712      120708       97224      110344
Page Outs                                  155514576   156017404   155813676   156193256
Swap Ins                                           0           0           0           0
Swap Outs                                          0           0           0           0
Direct pages scanned                         2599253     1550480     2512822     2414760
Kswapd pages scanned                        69742364    71150694    68839041    69692533
Kswapd pages reclaimed                      34824488    34773341    34796602    34799396
Direct pages reclaimed                         53693       94750       61792       75205
Kswapd efficiency                                49%         48%         50%         49%
Kswapd velocity                            38427.662   38797.901   37799.972   38022.889
Direct efficiency                                 2%          6%          2%          3%
Direct velocity                             1432.174     845.464    1379.807    1317.446
Percentage direct scans                           3%          2%          3%          3%
Page writes by reclaim                             0           0           0           0
Page writes file                                   0           0           0           0
Page writes anon                                   0           0           0           0
Page reclaim immediate                             0           0           0        1218
Page rescued immediate                             0           0           0           0
Slabs scanned                                  15360       16384       13312       16384
Direct inode steals                                0           0           0           0
Kswapd inode steals                             4340        4327        1630        4323

FTrace Reclaim Statistics: congestion_wait
Direct number congest     waited                 0          0          0          0 
Direct time   congest     waited               0ms        0ms        0ms        0ms 
Direct full   congest     waited                 0          0          0          0 
Direct number conditional waited               900        870        754        789 
Direct time   conditional waited               0ms        0ms        0ms       20ms 
Direct full   conditional waited                 0          0          0          0 
KSwapd number congest     waited              2106       2308       2116       1915 
KSwapd time   congest     waited          139924ms   157832ms   125652ms   132516ms 
KSwapd full   congest     waited              1346       1530       1202       1278 
KSwapd number conditional waited             12922      16320      10943      14670 
KSwapd time   conditional waited               0ms        0ms        0ms        0ms 
KSwapd full   conditional waited                 0          0          0          0 


Reclaim statistics are not radically changed. The stall times in kswapd are
massive, but they are clearly due to calls to congestion_wait(), almost
certainly the call in balance_pgdat(). Otherwise, stalls due to dirty pages
are non-existent.
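
For context, the call I am referring to is the unconditional back-off in
kswapd's balance loop. From memory of the 3.4-era code it is roughly the
following (paraphrased, not an exact quote):

  /*
   * Roughly what balance_pgdat() does when priority drops and progress
   * is poor: sleep a tenth of a second whether or not any BDI is
   * actually congested. These sleeps account for the large
   * "KSwapd time congest waited" figures above.
   */
  if (total_scanned && priority < DEF_PRIORITY - 2)
          congestion_wait(BLK_RW_ASYNC, HZ/10);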

I ran a benchmark that stressed high-order allocation. This is a very
artificial load, but it was used in the past to evaluate lumpy reclaim and
compaction. Generally I look at allocation success rates and latency figures.

STRESS-HIGHALLOC
                 3.3.0-vanilla       rc2-vanilla  lumpyremove-v2r3       nosync-v2r3
Pass 1          81.00 ( 0.00%)    28.00 (-53.00%)    24.00 (-57.00%)    28.00 (-53.00%)
Pass 2          82.00 ( 0.00%)    39.00 (-43.00%)    38.00 (-44.00%)    43.00 (-39.00%)
while Rested    88.00 ( 0.00%)    87.00 (-1.00%)    88.00 ( 0.00%)    88.00 ( 0.00%)

MMTests Statistics: duration
Sys Time Running Test (seconds)             740.93    681.42    685.14    684.87
User+Sys Time Running Test (seconds)       2922.65   3269.52   3281.35   3279.44
Total Elapsed Time (seconds)               1161.73   1152.49   1159.55   1161.44

MMTests Statistics: vmstat
Page Ins                                     4486020     2807256     2855944     2876244
Page Outs                                    7261600     7973688     7975320     7986120
Swap Ins                                       31694           0           0           0
Swap Outs                                      98179           0           0           0
Direct pages scanned                           53494       57731       34406      113015
Kswapd pages scanned                         6271173     1287481     1278174     1219095
Kswapd pages reclaimed                       2029240     1281025     1260708     1201583
Direct pages reclaimed                          1468       14564       16649       92456
Kswapd efficiency                                32%         99%         98%         98%
Kswapd velocity                             5398.133    1117.130    1102.302    1049.641
Direct efficiency                                 2%         25%         48%         81%
Direct velocity                               46.047      50.092      29.672      97.306
Percentage direct scans                           0%          4%          2%          8%
Page writes by reclaim                       1616049           0           0           0
Page writes file                             1517870           0           0           0
Page writes anon                               98179           0           0           0
Page reclaim immediate                        103778       27339        9796       17831
Page rescued immediate                             0           0           0           0
Slabs scanned                                1096704      986112      980992      998400
Direct inode steals                              223      215040      216736      247881
Kswapd inode steals                           175331       61548       68444       63066
Kswapd skipped wait                            21991           0           1           0
THP fault alloc                                    1         135         125         134
THP collapse alloc                               393         311         228         236
THP splits                                        25          13           7           8
THP fault fallback                                 0           0           0           0
THP collapse fail                                  3           5           7           7
Compaction stalls                                865        1270        1422        1518
Compaction success                               370         401         353         383
Compaction failures                              495         869        1069        1135
Compaction pages moved                        870155     3828868     4036106     4423626
Compaction move failure                        26429       23865       29742       27514

Success rates are completely hosed for 3.4-rc2, which is almost certainly
due to [fe2c2a10: vmscan: reclaim at order 0 when compaction is enabled]. I
expected this to happen for kswapd and to impair allocation success rates
(https://lkml.org/lkml/2012/1/25/166) but I did not anticipate this large a
difference: 80% less scanning and 37% less reclaim by kswapd.

In comparison, reclaim/compaction is not aggressive and gives up easily,
which is the intended behaviour. hugetlbfs uses __GFP_REPEAT and would be
much more aggressive about reclaim/compaction than THP allocations are. The
stress test above allocates like neither THP nor hugetlbfs, but is much
closer to THP.
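
To illustrate the distinction (a sketch only; the exact GFP masks used by
hugetlbfs and THP in 3.4 differ from these):

  struct page *page;

  /* hugetlbfs pool growth: __GFP_REPEAT keeps retrying
   * reclaim/compaction as long as progress is being made */
  page = alloc_pages(GFP_HIGHUSER_MOVABLE | __GFP_COMP |
                     __GFP_REPEAT | __GFP_NOWARN, HUGETLB_PAGE_ORDER);

  /* THP fault: opportunistic, no __GFP_REPEAT, fails fast and
   * falls back to base pages */
  page = alloc_pages(GFP_HIGHUSER_MOVABLE | __GFP_COMP |
                     __GFP_NOWARN, HPAGE_PMD_ORDER);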

Mainline is now impaired in terms of high-order allocation under heavy load,
although I do not know to what degree as I did not test with __GFP_REPEAT.
Keep this in mind for bugs related to hugepage pool resizing, THP allocation
and high-order atomic allocation failures from network devices.

In terms of congestion throttling, I see the following for this test

FTrace Reclaim Statistics: congestion_wait
Direct number congest     waited                 3          0          0          0 
Direct time   congest     waited               0ms        0ms        0ms        0ms 
Direct full   congest     waited                 0          0          0          0 
Direct number conditional waited               957        512       1081       1075 
Direct time   conditional waited               0ms        0ms        0ms        0ms 
Direct full   conditional waited                 0          0          0          0 
KSwapd number congest     waited                36          4          3          5 
KSwapd time   congest     waited            3148ms      400ms      300ms      500ms 
KSwapd full   congest     waited                30          4          3          5 
KSwapd number conditional waited             88514        197        332        542 
KSwapd time   conditional waited            4980ms        0ms        0ms        0ms 
KSwapd full   conditional waited                49          0          0          0 

The "conditional waited" times are the most interesting, as they are directly
impacted by the number of dirty pages encountered during the scan. As lumpy
reclaim no longer scans contiguous ranges, fewer dirty pages are found, which
brings the wait times from about 5 seconds down to 0. kswapd itself still
calls congestion_wait(), so it will still stall, but far less.
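
The reason the conditional waits can collapse to zero is the semantics of
wait_iff_congested(): unlike congestion_wait(), it only sleeps when there is
actually congestion to wait out. Paraphrased (not the exact source):

  /* Paraphrase of wait_iff_congested(), not the exact source */
  long wait_iff_congested_sketch(struct zone *zone, int sync, long timeout)
  {
          /* Nothing congested: yield politely and return at once */
          if (!zone_is_reclaim_congested(zone)) {
                  cond_resched();
                  return 0;
          }
          /* Otherwise behave like congestion_wait() */
          return congestion_wait(sync, timeout);
  }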

In terms of the type of IO we were doing, I see this

FTrace Reclaim Statistics: mm_vmscan_writepage
Direct writes anon  sync                         0          0          0          0 
Direct writes anon  async                        0          0          0          0 
Direct writes file  sync                         0          0          0          0 
Direct writes file  async                        0          0          0          0 
Direct writes mixed sync                         0          0          0          0 
Direct writes mixed async                        0          0          0          0 
KSwapd writes anon  sync                         0          0          0          0 
KSwapd writes anon  async                    91682          0          0          0 
KSwapd writes file  sync                         0          0          0          0 
KSwapd writes file  async                   822629          0          0          0 
KSwapd writes mixed sync                         0          0          0          0 
KSwapd writes mixed async                        0          0          0          0 

In 3.3, kswapd was doing a large number of async page writes, but
reclaim/compaction never reached the point of doing sync IO. This does not
guarantee that reclaim/compaction was not calling wait_on_page_writeback(),
but I would consider it unlikely. It indicates that merging patches 2 and 3,
which stop reclaim/compaction calling wait_on_page_writeback(), should be
safe.
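
For anyone cross-checking the table, the sync/async and file/anon split
comes from flags attached to the mm_vmscan_writepage event. From memory of
the 3.4-era trace header, the emitting side looks roughly like this
(paraphrased, not an exact quote):

  /* Paraphrased from include/trace/events/vmscan.h circa 3.4 */
  #define trace_reclaim_flags(page, sync) ( \
          (page_is_file_cache(page) ? RECLAIM_WB_FILE : RECLAIM_WB_ANON) | \
          ((sync) & RECLAIM_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC))

  /* emitted from pageout() just before ->writepage() is called */
  trace_mm_vmscan_writepage(page,
                  trace_reclaim_flags(page, sc->reclaim_mode));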

 include/trace/events/vmscan.h |   40 ++-----
 mm/vmscan.c                   |  263 ++++-------------------------------------
 2 files changed, 37 insertions(+), 266 deletions(-)

-- 
1.7.9.2



* [PATCH 1/3] mm: vmscan: Remove lumpy reclaim
From: Mel Gorman @ 2012-04-11 16:38 UTC
  To: Andrew Morton
  Cc: Rik van Riel, Konstantin Khlebnikov, Hugh Dickins, Ying Han,
	Mel Gorman, Linux-MM, LKML

Lumpy reclaim had a purpose, but in the minds of some it was to kick the
system so hard it thrashed. For others, the purpose was to complicate
vmscan.c. Over time it was given softer shoes and a nicer attitude, but
memory compaction needs to step up and replace it, so this patch sends lumpy
reclaim to the farm.

With this patch applied, the tracepoint format for isolating LRU pages
changes. Furthermore, reclaim/compaction can no longer queue dirty pages in
pageout() if the underlying BDI is congested; lumpy reclaim needed this
logic and reclaim/compaction was using it in error.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 include/trace/events/vmscan.h |   26 ++------
 mm/vmscan.c                   |  144 +++++------------------------------------
 2 files changed, 19 insertions(+), 151 deletions(-)

diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index f64560e..1c20a1f 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -263,22 +263,16 @@ DECLARE_EVENT_CLASS(mm_vmscan_lru_isolate_template,
 		unsigned long nr_requested,
 		unsigned long nr_scanned,
 		unsigned long nr_taken,
-		unsigned long nr_lumpy_taken,
-		unsigned long nr_lumpy_dirty,
-		unsigned long nr_lumpy_failed,
 		isolate_mode_t isolate_mode,
 		int file),
 
-	TP_ARGS(order, nr_requested, nr_scanned, nr_taken, nr_lumpy_taken, nr_lumpy_dirty, nr_lumpy_failed, isolate_mode, file),
+	TP_ARGS(order, nr_requested, nr_scanned, nr_taken, isolate_mode, file),
 
 	TP_STRUCT__entry(
 		__field(int, order)
 		__field(unsigned long, nr_requested)
 		__field(unsigned long, nr_scanned)
 		__field(unsigned long, nr_taken)
-		__field(unsigned long, nr_lumpy_taken)
-		__field(unsigned long, nr_lumpy_dirty)
-		__field(unsigned long, nr_lumpy_failed)
 		__field(isolate_mode_t, isolate_mode)
 		__field(int, file)
 	),
@@ -288,22 +282,16 @@ DECLARE_EVENT_CLASS(mm_vmscan_lru_isolate_template,
 		__entry->nr_requested = nr_requested;
 		__entry->nr_scanned = nr_scanned;
 		__entry->nr_taken = nr_taken;
-		__entry->nr_lumpy_taken = nr_lumpy_taken;
-		__entry->nr_lumpy_dirty = nr_lumpy_dirty;
-		__entry->nr_lumpy_failed = nr_lumpy_failed;
 		__entry->isolate_mode = isolate_mode;
 		__entry->file = file;
 	),
 
-	TP_printk("isolate_mode=%d order=%d nr_requested=%lu nr_scanned=%lu nr_taken=%lu contig_taken=%lu contig_dirty=%lu contig_failed=%lu file=%d",
+	TP_printk("isolate_mode=%d order=%d nr_requested=%lu nr_scanned=%lu nr_taken=%lu file=%d",
 		__entry->isolate_mode,
 		__entry->order,
 		__entry->nr_requested,
 		__entry->nr_scanned,
 		__entry->nr_taken,
-		__entry->nr_lumpy_taken,
-		__entry->nr_lumpy_dirty,
-		__entry->nr_lumpy_failed,
 		__entry->file)
 );
 
@@ -313,13 +301,10 @@ DEFINE_EVENT(mm_vmscan_lru_isolate_template, mm_vmscan_lru_isolate,
 		unsigned long nr_requested,
 		unsigned long nr_scanned,
 		unsigned long nr_taken,
-		unsigned long nr_lumpy_taken,
-		unsigned long nr_lumpy_dirty,
-		unsigned long nr_lumpy_failed,
 		isolate_mode_t isolate_mode,
 		int file),
 
-	TP_ARGS(order, nr_requested, nr_scanned, nr_taken, nr_lumpy_taken, nr_lumpy_dirty, nr_lumpy_failed, isolate_mode, file)
+	TP_ARGS(order, nr_requested, nr_scanned, nr_taken, isolate_mode, file)
 
 );
 
@@ -329,13 +314,10 @@ DEFINE_EVENT(mm_vmscan_lru_isolate_template, mm_vmscan_memcg_isolate,
 		unsigned long nr_requested,
 		unsigned long nr_scanned,
 		unsigned long nr_taken,
-		unsigned long nr_lumpy_taken,
-		unsigned long nr_lumpy_dirty,
-		unsigned long nr_lumpy_failed,
 		isolate_mode_t isolate_mode,
 		int file),
 
-	TP_ARGS(order, nr_requested, nr_scanned, nr_taken, nr_lumpy_taken, nr_lumpy_dirty, nr_lumpy_failed, isolate_mode, file)
+	TP_ARGS(order, nr_requested, nr_scanned, nr_taken, isolate_mode, file)
 
 );
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 33c332b..a4b86bd 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -58,9 +58,6 @@
  * RECLAIM_MODE_SINGLE: Reclaim only order-0 pages
  * RECLAIM_MODE_ASYNC:  Do not block
  * RECLAIM_MODE_SYNC:   Allow blocking e.g. call wait_on_page_writeback
- * RECLAIM_MODE_LUMPYRECLAIM: For high-order allocations, take a reference
- *			page from the LRU and reclaim all pages within a
- *			naturally aligned range
  * RECLAIM_MODE_COMPACTION: For high-order allocations, reclaim a number of
  *			order-0 pages and then compact the zone
  */
@@ -68,7 +65,6 @@ typedef unsigned __bitwise__ reclaim_mode_t;
 #define RECLAIM_MODE_SINGLE		((__force reclaim_mode_t)0x01u)
 #define RECLAIM_MODE_ASYNC		((__force reclaim_mode_t)0x02u)
 #define RECLAIM_MODE_SYNC		((__force reclaim_mode_t)0x04u)
-#define RECLAIM_MODE_LUMPYRECLAIM	((__force reclaim_mode_t)0x08u)
 #define RECLAIM_MODE_COMPACTION		((__force reclaim_mode_t)0x10u)
 
 struct scan_control {
@@ -367,27 +363,17 @@ out:
 static void set_reclaim_mode(int priority, struct scan_control *sc,
 				   bool sync)
 {
+	/* Sync reclaim used only for compaction */
 	reclaim_mode_t syncmode = sync ? RECLAIM_MODE_SYNC : RECLAIM_MODE_ASYNC;
 
 	/*
-	 * Initially assume we are entering either lumpy reclaim or
-	 * reclaim/compaction.Depending on the order, we will either set the
-	 * sync mode or just reclaim order-0 pages later.
-	 */
-	if (COMPACTION_BUILD)
-		sc->reclaim_mode = RECLAIM_MODE_COMPACTION;
-	else
-		sc->reclaim_mode = RECLAIM_MODE_LUMPYRECLAIM;
-
-	/*
-	 * Avoid using lumpy reclaim or reclaim/compaction if possible by
-	 * restricting when its set to either costly allocations or when
+	 * Restrict reclaim/compaction to costly allocations or when
 	 * under memory pressure
 	 */
-	if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
-		sc->reclaim_mode |= syncmode;
-	else if (sc->order && priority < DEF_PRIORITY - 2)
-		sc->reclaim_mode |= syncmode;
+	if (COMPACTION_BUILD && sc->order &&
+			(sc->order > PAGE_ALLOC_COSTLY_ORDER ||
+			 priority < DEF_PRIORITY - 2))
+		sc->reclaim_mode = RECLAIM_MODE_COMPACTION | syncmode;
 	else
 		sc->reclaim_mode = RECLAIM_MODE_SINGLE | RECLAIM_MODE_ASYNC;
 }
@@ -416,10 +402,6 @@ static int may_write_to_queue(struct backing_dev_info *bdi,
 		return 1;
 	if (bdi == current->backing_dev_info)
 		return 1;
-
-	/* lumpy reclaim for hugepage often need a lot of write */
-	if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
-		return 1;
 	return 0;
 }
 
@@ -710,10 +692,6 @@ static enum page_references page_check_references(struct page *page,
 	referenced_ptes = page_referenced(page, 1, mz->mem_cgroup, &vm_flags);
 	referenced_page = TestClearPageReferenced(page);
 
-	/* Lumpy reclaim - ignore references */
-	if (sc->reclaim_mode & RECLAIM_MODE_LUMPYRECLAIM)
-		return PAGEREF_RECLAIM;
-
 	/*
 	 * Mlock lost the isolation race with us.  Let try_to_unmap()
 	 * move the page to the unevictable list.
@@ -824,7 +802,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 				wait_on_page_writeback(page);
 			else {
 				unlock_page(page);
-				goto keep_lumpy;
+				goto keep_reclaim_mode;
 			}
 		}
 
@@ -908,7 +886,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 				goto activate_locked;
 			case PAGE_SUCCESS:
 				if (PageWriteback(page))
-					goto keep_lumpy;
+					goto keep_reclaim_mode;
 				if (PageDirty(page))
 					goto keep;
 
@@ -1008,7 +986,7 @@ keep_locked:
 		unlock_page(page);
 keep:
 		reset_reclaim_mode(sc);
-keep_lumpy:
+keep_reclaim_mode:
 		list_add(&page->lru, &ret_pages);
 		VM_BUG_ON(PageLRU(page) || PageUnevictable(page));
 	}
@@ -1064,11 +1042,7 @@ int __isolate_lru_page(struct page *page, isolate_mode_t mode, int file)
 	if (!all_lru_mode && !!page_is_file_cache(page) != file)
 		return ret;
 
-	/*
-	 * When this function is being called for lumpy reclaim, we
-	 * initially look into all LRU pages, active, inactive and
-	 * unevictable; only give shrink_page_list evictable pages.
-	 */
+	/* Do not give back unevictable pages for compaction */
 	if (PageUnevictable(page))
 		return ret;
 
@@ -1153,9 +1127,6 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 	struct lruvec *lruvec;
 	struct list_head *src;
 	unsigned long nr_taken = 0;
-	unsigned long nr_lumpy_taken = 0;
-	unsigned long nr_lumpy_dirty = 0;
-	unsigned long nr_lumpy_failed = 0;
 	unsigned long scan;
 	int lru = LRU_BASE;
 
@@ -1168,10 +1139,6 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 
 	for (scan = 0; scan < nr_to_scan && !list_empty(src); scan++) {
 		struct page *page;
-		unsigned long pfn;
-		unsigned long end_pfn;
-		unsigned long page_pfn;
-		int zone_id;
 
 		page = lru_to_page(src);
 		prefetchw_prev_lru_page(page, src, flags);
@@ -1193,84 +1160,6 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 		default:
 			BUG();
 		}
-
-		if (!sc->order || !(sc->reclaim_mode & RECLAIM_MODE_LUMPYRECLAIM))
-			continue;
-
-		/*
-		 * Attempt to take all pages in the order aligned region
-		 * surrounding the tag page.  Only take those pages of
-		 * the same active state as that tag page.  We may safely
-		 * round the target page pfn down to the requested order
-		 * as the mem_map is guaranteed valid out to MAX_ORDER,
-		 * where that page is in a different zone we will detect
-		 * it from its zone id and abort this block scan.
-		 */
-		zone_id = page_zone_id(page);
-		page_pfn = page_to_pfn(page);
-		pfn = page_pfn & ~((1 << sc->order) - 1);
-		end_pfn = pfn + (1 << sc->order);
-		for (; pfn < end_pfn; pfn++) {
-			struct page *cursor_page;
-
-			/* The target page is in the block, ignore it. */
-			if (unlikely(pfn == page_pfn))
-				continue;
-
-			/* Avoid holes within the zone. */
-			if (unlikely(!pfn_valid_within(pfn)))
-				break;
-
-			cursor_page = pfn_to_page(pfn);
-
-			/* Check that we have not crossed a zone boundary. */
-			if (unlikely(page_zone_id(cursor_page) != zone_id))
-				break;
-
-			/*
-			 * If we don't have enough swap space, reclaiming of
-			 * anon page which don't already have a swap slot is
-			 * pointless.
-			 */
-			if (nr_swap_pages <= 0 && PageSwapBacked(cursor_page) &&
-			    !PageSwapCache(cursor_page))
-				break;
-
-			if (__isolate_lru_page(cursor_page, mode, file) == 0) {
-				unsigned int isolated_pages;
-
-				mem_cgroup_lru_del(cursor_page);
-				list_move(&cursor_page->lru, dst);
-				isolated_pages = hpage_nr_pages(cursor_page);
-				nr_taken += isolated_pages;
-				nr_lumpy_taken += isolated_pages;
-				if (PageDirty(cursor_page))
-					nr_lumpy_dirty += isolated_pages;
-				scan++;
-				pfn += isolated_pages - 1;
-			} else {
-				/*
-				 * Check if the page is freed already.
-				 *
-				 * We can't use page_count() as that
-				 * requires compound_head and we don't
-				 * have a pin on the page here. If a
-				 * page is tail, we may or may not
-				 * have isolated the head, so assume
-				 * it's not free, it'd be tricky to
-				 * track the head status without a
-				 * page pin.
-				 */
-				if (!PageTail(cursor_page) &&
-				    !atomic_read(&cursor_page->_count))
-					continue;
-				break;
-			}
-		}
-
-		/* If we break out of the loop above, lumpy reclaim failed */
-		if (pfn < end_pfn)
-			nr_lumpy_failed++;
 	}
 
 	*nr_scanned = scan;
@@ -1278,7 +1167,6 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 	trace_mm_vmscan_lru_isolate(sc->order,
 			nr_to_scan, scan,
 			nr_taken,
-			nr_lumpy_taken, nr_lumpy_dirty, nr_lumpy_failed,
 			mode, file);
 	return nr_taken;
 }
@@ -1466,13 +1354,13 @@ static inline bool should_reclaim_stall(unsigned long nr_taken,
 					int priority,
 					struct scan_control *sc)
 {
-	int lumpy_stall_priority;
+	int stall_priority;
 
 	/* kswapd should not stall on sync IO */
 	if (current_is_kswapd())
 		return false;
 
-	/* Only stall on lumpy reclaim */
+	/* Only stall for memory compaction */
 	if (sc->reclaim_mode & RECLAIM_MODE_SINGLE)
 		return false;
 
@@ -1487,11 +1375,11 @@ static inline bool should_reclaim_stall(unsigned long nr_taken,
 	 * priority to be much higher before stalling.
 	 */
 	if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
-		lumpy_stall_priority = DEF_PRIORITY;
+		stall_priority = DEF_PRIORITY;
 	else
-		lumpy_stall_priority = DEF_PRIORITY / 3;
+		stall_priority = DEF_PRIORITY / 3;
 
-	return priority <= lumpy_stall_priority;
+	return priority <= stall_priority;
 }
 
 /*
@@ -1523,8 +1411,6 @@ shrink_inactive_list(unsigned long nr_to_scan, struct mem_cgroup_zone *mz,
 	}
 
 	set_reclaim_mode(priority, sc, false);
-	if (sc->reclaim_mode & RECLAIM_MODE_LUMPYRECLAIM)
-		isolate_mode |= ISOLATE_ACTIVE;
 
 	lru_add_drain();
 
-- 
1.7.9.2


-			if (unlikely(pfn == page_pfn))
-				continue;
-
-			/* Avoid holes within the zone. */
-			if (unlikely(!pfn_valid_within(pfn)))
-				break;
-
-			cursor_page = pfn_to_page(pfn);
-
-			/* Check that we have not crossed a zone boundary. */
-			if (unlikely(page_zone_id(cursor_page) != zone_id))
-				break;
-
-			/*
-			 * If we don't have enough swap space, reclaiming of
-			 * anon page which don't already have a swap slot is
-			 * pointless.
-			 */
-			if (nr_swap_pages <= 0 && PageSwapBacked(cursor_page) &&
-			    !PageSwapCache(cursor_page))
-				break;
-
-			if (__isolate_lru_page(cursor_page, mode, file) == 0) {
-				unsigned int isolated_pages;
-
-				mem_cgroup_lru_del(cursor_page);
-				list_move(&cursor_page->lru, dst);
-				isolated_pages = hpage_nr_pages(cursor_page);
-				nr_taken += isolated_pages;
-				nr_lumpy_taken += isolated_pages;
-				if (PageDirty(cursor_page))
-					nr_lumpy_dirty += isolated_pages;
-				scan++;
-				pfn += isolated_pages - 1;
-			} else {
-				/*
-				 * Check if the page is freed already.
-				 *
-				 * We can't use page_count() as that
-				 * requires compound_head and we don't
-				 * have a pin on the page here. If a
-				 * page is tail, we may or may not
-				 * have isolated the head, so assume
-				 * it's not free, it'd be tricky to
-				 * track the head status without a
-				 * page pin.
-				 */
-				if (!PageTail(cursor_page) &&
-				    !atomic_read(&cursor_page->_count))
-					continue;
-				break;
-			}
-		}
-
-		/* If we break out of the loop above, lumpy reclaim failed */
-		if (pfn < end_pfn)
-			nr_lumpy_failed++;
 	}
 
 	*nr_scanned = scan;
@@ -1278,7 +1167,6 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 	trace_mm_vmscan_lru_isolate(sc->order,
 			nr_to_scan, scan,
 			nr_taken,
-			nr_lumpy_taken, nr_lumpy_dirty, nr_lumpy_failed,
 			mode, file);
 	return nr_taken;
 }
@@ -1466,13 +1354,13 @@ static inline bool should_reclaim_stall(unsigned long nr_taken,
 					int priority,
 					struct scan_control *sc)
 {
-	int lumpy_stall_priority;
+	int stall_priority;
 
 	/* kswapd should not stall on sync IO */
 	if (current_is_kswapd())
 		return false;
 
-	/* Only stall on lumpy reclaim */
+	/* Only stall for memory compaction */
 	if (sc->reclaim_mode & RECLAIM_MODE_SINGLE)
 		return false;
 
@@ -1487,11 +1375,11 @@ static inline bool should_reclaim_stall(unsigned long nr_taken,
 	 * priority to be much higher before stalling.
 	 */
 	if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
-		lumpy_stall_priority = DEF_PRIORITY;
+		stall_priority = DEF_PRIORITY;
 	else
-		lumpy_stall_priority = DEF_PRIORITY / 3;
+		stall_priority = DEF_PRIORITY / 3;
 
-	return priority <= lumpy_stall_priority;
+	return priority <= stall_priority;
 }
 
 /*
@@ -1523,8 +1411,6 @@ shrink_inactive_list(unsigned long nr_to_scan, struct mem_cgroup_zone *mz,
 	}
 
 	set_reclaim_mode(priority, sc, false);
-	if (sc->reclaim_mode & RECLAIM_MODE_LUMPYRECLAIM)
-		isolate_mode |= ISOLATE_ACTIVE;
 
 	lru_add_drain();
 
-- 
1.7.9.2

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 2/3] mm: vmscan: Do not stall on writeback during memory compaction
  2012-04-11 16:38 ` Mel Gorman
@ 2012-04-11 16:38   ` Mel Gorman
  -1 siblings, 0 replies; 36+ messages in thread
From: Mel Gorman @ 2012-04-11 16:38 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Konstantin Khlebnikov, Hugh Dickins, Ying Han,
	Mel Gorman, Linux-MM, LKML

This patch stops reclaim/compaction entering sync reclaim, which was only
ever intended for lumpy reclaim; its use by reclaim/compaction was an
oversight. Page migration has its own
logic for stalling on writeback pages if necessary and memory compaction
is already using it.

Waiting on page writeback is bad for a number of reasons but the primary
one is that waiting on writeback to a slow device like USB can take a
considerable length of time. Page reclaim instead uses wait_iff_congested()
to throttle if too many dirty pages are being scanned.
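
For reference, the throttling referred to above lives in
shrink_inactive_list() as of 3.4-rc2. A condensed sketch (paraphrased from
that source rather than part of this patch; the threshold arithmetic is
from memory and may differ in detail):

	/*
	 * If a large share of the pages just scanned were under writeback,
	 * back off briefly instead of sleeping on any single page.
	 */
	if (nr_writeback && nr_writeback >=
			(nr_taken >> (DEF_PRIORITY - priority)))
		wait_iff_congested(zone, BLK_RW_ASYNC, HZ/10);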

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 include/trace/events/vmscan.h |   10 ++---
 mm/vmscan.c                   |   85 ++++-------------------------------------
 2 files changed, 13 insertions(+), 82 deletions(-)

diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index 1c20a1f..044e8ba 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -13,7 +13,7 @@
 #define RECLAIM_WB_ANON		0x0001u
 #define RECLAIM_WB_FILE		0x0002u
 #define RECLAIM_WB_MIXED	0x0010u
-#define RECLAIM_WB_SYNC		0x0004u
+#define RECLAIM_WB_SYNC		0x0004u /* Unused, all reclaim async */
 #define RECLAIM_WB_ASYNC	0x0008u
 
 #define show_reclaim_flags(flags)				\
@@ -27,13 +27,13 @@
 
 #define trace_reclaim_flags(page, sync) ( \
 	(page_is_file_cache(page) ? RECLAIM_WB_FILE : RECLAIM_WB_ANON) | \
-	(sync & RECLAIM_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC)   \
+	(RECLAIM_WB_ASYNC) \
 	)
 
 #define trace_shrink_flags(file, sync) ( \
-	(sync & RECLAIM_MODE_SYNC ? RECLAIM_WB_MIXED : \
-			(file ? RECLAIM_WB_FILE : RECLAIM_WB_ANON)) |  \
-	(sync & RECLAIM_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC) \
+	( \
+		(file ? RECLAIM_WB_FILE : RECLAIM_WB_ANON) | \
+		(RECLAIM_WB_ASYNC) \
 	)
 
 TRACE_EVENT(mm_vmscan_kswapd_sleep,
diff --git a/mm/vmscan.c b/mm/vmscan.c
index a4b86bd..68319e4 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -56,15 +56,11 @@
 /*
  * reclaim_mode determines how the inactive list is shrunk
  * RECLAIM_MODE_SINGLE: Reclaim only order-0 pages
- * RECLAIM_MODE_ASYNC:  Do not block
- * RECLAIM_MODE_SYNC:   Allow blocking e.g. call wait_on_page_writeback
  * RECLAIM_MODE_COMPACTION: For high-order allocations, reclaim a number of
  *			order-0 pages and then compact the zone
  */
 typedef unsigned __bitwise__ reclaim_mode_t;
 #define RECLAIM_MODE_SINGLE		((__force reclaim_mode_t)0x01u)
-#define RECLAIM_MODE_ASYNC		((__force reclaim_mode_t)0x02u)
-#define RECLAIM_MODE_SYNC		((__force reclaim_mode_t)0x04u)
 #define RECLAIM_MODE_COMPACTION		((__force reclaim_mode_t)0x10u)
 
 struct scan_control {
@@ -360,12 +356,8 @@ out:
 	return ret;
 }
 
-static void set_reclaim_mode(int priority, struct scan_control *sc,
-				   bool sync)
+static void set_reclaim_mode(int priority, struct scan_control *sc)
 {
-	/* Sync reclaim used only for compaction */
-	reclaim_mode_t syncmode = sync ? RECLAIM_MODE_SYNC : RECLAIM_MODE_ASYNC;
-
 	/*
 	 * Restrict reclaim/compaction to costly allocations or when
 	 * under memory pressure
@@ -373,14 +365,14 @@ static void set_reclaim_mode(int priority, struct scan_control *sc,
 	if (COMPACTION_BUILD && sc->order &&
 			(sc->order > PAGE_ALLOC_COSTLY_ORDER ||
 			 priority < DEF_PRIORITY - 2))
-		sc->reclaim_mode = RECLAIM_MODE_COMPACTION | syncmode;
+		sc->reclaim_mode = RECLAIM_MODE_COMPACTION;
 	else
-		sc->reclaim_mode = RECLAIM_MODE_SINGLE | RECLAIM_MODE_ASYNC;
+		sc->reclaim_mode = RECLAIM_MODE_SINGLE;
 }
 
 static void reset_reclaim_mode(struct scan_control *sc)
 {
-	sc->reclaim_mode = RECLAIM_MODE_SINGLE | RECLAIM_MODE_ASYNC;
+	sc->reclaim_mode = RECLAIM_MODE_SINGLE;
 }
 
 static inline int is_page_cache_freeable(struct page *page)
@@ -791,19 +783,8 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 
 		if (PageWriteback(page)) {
 			nr_writeback++;
-			/*
-			 * Synchronous reclaim cannot queue pages for
-			 * writeback due to the possibility of stack overflow
-			 * but if it encounters a page under writeback, wait
-			 * for the IO to complete.
-			 */
-			if ((sc->reclaim_mode & RECLAIM_MODE_SYNC) &&
-			    may_enter_fs)
-				wait_on_page_writeback(page);
-			else {
-				unlock_page(page);
-				goto keep_reclaim_mode;
-			}
+			unlock_page(page);
+			goto keep;
 		}
 
 		references = page_check_references(page, mz, sc);
@@ -886,7 +867,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 				goto activate_locked;
 			case PAGE_SUCCESS:
 				if (PageWriteback(page))
-					goto keep_reclaim_mode;
+					goto keep;
 				if (PageDirty(page))
 					goto keep;
 
@@ -985,8 +966,6 @@ activate_locked:
 keep_locked:
 		unlock_page(page);
 keep:
-		reset_reclaim_mode(sc);
-keep_reclaim_mode:
 		list_add(&page->lru, &ret_pages);
 		VM_BUG_ON(PageLRU(page) || PageUnevictable(page));
 	}
@@ -1342,47 +1321,6 @@ update_isolated_counts(struct mem_cgroup_zone *mz,
 }
 
 /*
- * Returns true if a direct reclaim should wait on pages under writeback.
- *
- * If we are direct reclaiming for contiguous pages and we do not reclaim
- * everything in the list, try again and wait for writeback IO to complete.
- * This will stall high-order allocations noticeably. Only do that when really
- * need to free the pages under high memory pressure.
- */
-static inline bool should_reclaim_stall(unsigned long nr_taken,
-					unsigned long nr_freed,
-					int priority,
-					struct scan_control *sc)
-{
-	int stall_priority;
-
-	/* kswapd should not stall on sync IO */
-	if (current_is_kswapd())
-		return false;
-
-	/* Only stall for memory compaction */
-	if (sc->reclaim_mode & RECLAIM_MODE_SINGLE)
-		return false;
-
-	/* If we have reclaimed everything on the isolated list, no stall */
-	if (nr_freed == nr_taken)
-		return false;
-
-	/*
-	 * For high-order allocations, there are two stall thresholds.
-	 * High-cost allocations stall immediately where as lower
-	 * order allocations such as stacks require the scanning
-	 * priority to be much higher before stalling.
-	 */
-	if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
-		stall_priority = DEF_PRIORITY;
-	else
-		stall_priority = DEF_PRIORITY / 3;
-
-	return priority <= stall_priority;
-}
-
-/*
  * shrink_inactive_list() is a helper for shrink_zone().  It returns the number
  * of reclaimed pages
  */
@@ -1410,7 +1348,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct mem_cgroup_zone *mz,
 			return SWAP_CLUSTER_MAX;
 	}
 
-	set_reclaim_mode(priority, sc, false);
+	set_reclaim_mode(priority, sc);
 
 	lru_add_drain();
 
@@ -1442,13 +1380,6 @@ shrink_inactive_list(unsigned long nr_to_scan, struct mem_cgroup_zone *mz,
 	nr_reclaimed = shrink_page_list(&page_list, mz, sc, priority,
 						&nr_dirty, &nr_writeback);
 
-	/* Check if we should syncronously wait for writeback */
-	if (should_reclaim_stall(nr_taken, nr_reclaimed, priority, sc)) {
-		set_reclaim_mode(priority, sc, true);
-		nr_reclaimed += shrink_page_list(&page_list, mz, sc,
-					priority, &nr_dirty, &nr_writeback);
-	}
-
 	spin_lock_irq(&zone->lru_lock);
 
 	reclaim_stat->recent_scanned[0] += nr_anon;
-- 
1.7.9.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 3/3] mm: vmscan: Remove reclaim_mode_t
  2012-04-11 16:38 ` Mel Gorman
@ 2012-04-11 16:38   ` Mel Gorman
  -1 siblings, 0 replies; 36+ messages in thread
From: Mel Gorman @ 2012-04-11 16:38 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Konstantin Khlebnikov, Hugh Dickins, Ying Han,
	Mel Gorman, Linux-MM, LKML

There is little motivation for reclaim_mode_t once RECLAIM_MODE_[A]SYNC
and lumpy reclaim have been removed. This patch gets rid of reclaim_mode_t
as well and improves the documentation about what reclaim/compaction is
and when it is triggered.
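
For quick reference, the condition that now decides when reclaim/compaction
is used is the helper introduced by the hunk below; in condensed form (same
logic, folded into a single return):

	/* Reclaim/compaction: costly allocations, or any high-order
	 * allocation once pressure has raised the scanning priority */
	static bool in_reclaim_compaction(int priority, struct scan_control *sc)
	{
		return COMPACTION_BUILD && sc->order &&
		       (sc->order > PAGE_ALLOC_COSTLY_ORDER ||
			priority < DEF_PRIORITY - 2);
	}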

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 include/trace/events/vmscan.h |    4 +--
 mm/vmscan.c                   |   72 +++++++++++++----------------------------
 2 files changed, 24 insertions(+), 52 deletions(-)

diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index 044e8ba..0794aa2 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -25,12 +25,12 @@
 		{RECLAIM_WB_ASYNC,	"RECLAIM_WB_ASYNC"}	\
 		) : "RECLAIM_WB_NONE"
 
-#define trace_reclaim_flags(page, sync) ( \
+#define trace_reclaim_flags(page) ( \
 	(page_is_file_cache(page) ? RECLAIM_WB_FILE : RECLAIM_WB_ANON) | \
 	(RECLAIM_WB_ASYNC) \
 	)
 
-#define trace_shrink_flags(file, sync) ( \
+#define trace_shrink_flags(file) \
 	( \
 		(file ? RECLAIM_WB_FILE : RECLAIM_WB_ANON) | \
 		(RECLAIM_WB_ASYNC) \
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 68319e4..36c6ad2 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -53,16 +53,6 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/vmscan.h>
 
-/*
- * reclaim_mode determines how the inactive list is shrunk
- * RECLAIM_MODE_SINGLE: Reclaim only order-0 pages
- * RECLAIM_MODE_COMPACTION: For high-order allocations, reclaim a number of
- *			order-0 pages and then compact the zone
- */
-typedef unsigned __bitwise__ reclaim_mode_t;
-#define RECLAIM_MODE_SINGLE		((__force reclaim_mode_t)0x01u)
-#define RECLAIM_MODE_COMPACTION		((__force reclaim_mode_t)0x10u)
-
 struct scan_control {
 	/* Incremented by the number of inactive pages that were scanned */
 	unsigned long nr_scanned;
@@ -89,12 +79,6 @@ struct scan_control {
 	int order;
 
 	/*
-	 * Intend to reclaim enough continuous memory rather than reclaim
-	 * enough amount of memory. i.e, mode for high order allocation.
-	 */
-	reclaim_mode_t reclaim_mode;
-
-	/*
 	 * The memory cgroup that hit its limit and as a result is the
 	 * primary target of this reclaim invocation.
 	 */
@@ -356,25 +340,6 @@ out:
 	return ret;
 }
 
-static void set_reclaim_mode(int priority, struct scan_control *sc)
-{
-	/*
-	 * Restrict reclaim/compaction to costly allocations or when
-	 * under memory pressure
-	 */
-	if (COMPACTION_BUILD && sc->order &&
-			(sc->order > PAGE_ALLOC_COSTLY_ORDER ||
-			 priority < DEF_PRIORITY - 2))
-		sc->reclaim_mode = RECLAIM_MODE_COMPACTION;
-	else
-		sc->reclaim_mode = RECLAIM_MODE_SINGLE;
-}
-
-static void reset_reclaim_mode(struct scan_control *sc)
-{
-	sc->reclaim_mode = RECLAIM_MODE_SINGLE;
-}
-
 static inline int is_page_cache_freeable(struct page *page)
 {
 	/*
@@ -497,8 +462,7 @@ static pageout_t pageout(struct page *page, struct address_space *mapping,
 			/* synchronous write or broken a_ops? */
 			ClearPageReclaim(page);
 		}
-		trace_mm_vmscan_writepage(page,
-			trace_reclaim_flags(page, sc->reclaim_mode));
+		trace_mm_vmscan_writepage(page, trace_reclaim_flags(page));
 		inc_zone_page_state(page, NR_VMSCAN_WRITE);
 		return PAGE_SUCCESS;
 	}
@@ -953,7 +917,6 @@ cull_mlocked:
 			try_to_free_swap(page);
 		unlock_page(page);
 		putback_lru_page(page);
-		reset_reclaim_mode(sc);
 		continue;
 
 activate_locked:
@@ -1348,8 +1311,6 @@ shrink_inactive_list(unsigned long nr_to_scan, struct mem_cgroup_zone *mz,
 			return SWAP_CLUSTER_MAX;
 	}
 
-	set_reclaim_mode(priority, sc);
-
 	lru_add_drain();
 
 	if (!sc->may_unmap)
@@ -1428,7 +1389,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct mem_cgroup_zone *mz,
 		zone_idx(zone),
 		nr_scanned, nr_reclaimed,
 		priority,
-		trace_shrink_flags(file, sc->reclaim_mode));
+		trace_shrink_flags(file));
 	return nr_reclaimed;
 }
 
@@ -1507,8 +1468,6 @@ static void shrink_active_list(unsigned long nr_to_scan,
 
 	lru_add_drain();
 
-	reset_reclaim_mode(sc);
-
 	if (!sc->may_unmap)
 		isolate_mode |= ISOLATE_UNMAPPED;
 	if (!sc->may_writepage)
@@ -1821,23 +1780,35 @@ out:
 	}
 }
 
+/* Use reclaim/compaction for costly allocs or under memory pressure */
+static bool in_reclaim_compaction(int priority, struct scan_control *sc)
+{
+	if (COMPACTION_BUILD && sc->order &&
+			(sc->order > PAGE_ALLOC_COSTLY_ORDER ||
+			 priority < DEF_PRIORITY - 2))
+		return true;
+
+	return false;
+}
+
 /*
- * Reclaim/compaction depends on a number of pages being freed. To avoid
- * disruption to the system, a small number of order-0 pages continue to be
- * rotated and reclaimed in the normal fashion. However, by the time we get
- * back to the allocator and call try_to_compact_zone(), we ensure that
- * there are enough free pages for it to be likely successful
+ * Reclaim/compaction is used for high-order allocation requests. It reclaims
+ * order-0 pages before compacting the zone. should_continue_reclaim() returns
+ * true if more pages should be reclaimed such that when the page allocator
+ * calls try_to_compact_zone() that it will have enough free pages to succeed.
+ * It will give up earlier than that if there is difficulty reclaiming pages.
  */
 static inline bool should_continue_reclaim(struct mem_cgroup_zone *mz,
 					unsigned long nr_reclaimed,
 					unsigned long nr_scanned,
+					int priority,
 					struct scan_control *sc)
 {
 	unsigned long pages_for_compaction;
 	unsigned long inactive_lru_pages;
 
 	/* If not in reclaim/compaction mode, stop */
-	if (!(sc->reclaim_mode & RECLAIM_MODE_COMPACTION))
+	if (!in_reclaim_compaction(priority, sc))
 		return false;
 
 	/* Consider stopping depending on scan and reclaim activity */
@@ -1944,7 +1915,8 @@ restart:
 
 	/* reclaim/compaction might need reclaim to continue */
 	if (should_continue_reclaim(mz, nr_reclaimed,
-					sc->nr_scanned - nr_scanned, sc))
+					sc->nr_scanned - nr_scanned,
+					priority, sc))
 		goto restart;
 
 	throttle_vm_writeout(sc->gfp_mask);
-- 
1.7.9.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/3] Removal of lumpy reclaim V2
  2012-04-11 16:38 ` Mel Gorman
@ 2012-04-11 17:17   ` Rik van Riel
  -1 siblings, 0 replies; 36+ messages in thread
From: Rik van Riel @ 2012-04-11 17:17 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Konstantin Khlebnikov, Hugh Dickins, Ying Han,
	Linux-MM, LKML

On 04/11/2012 12:38 PM, Mel Gorman wrote:

> Success rates are completely hosed for 3.4-rc2 which is almost certainly
> due to [fe2c2a10: vmscan: reclaim at order 0 when compaction is enabled]. I
> expected this would happen for kswapd and impair allocation success rates
> (https://lkml.org/lkml/2012/1/25/166) but I did not anticipate this much
> of a difference: 80% less scanning, 37% less reclaim by kswapd

Also, no gratuitous pageouts of anonymous memory.
That was what really made a difference on a somewhat
heavily loaded desktop + kvm workload.

> In comparison, reclaim/compaction is not aggressive and gives up easily
> which is the intended behaviour. hugetlbfs uses __GFP_REPEAT and would be
> much more aggressive about reclaim/compaction than THP allocations are. The
> stress test above is allocating like neither THP or hugetlbfs but is much
> closer to THP.

Next step: get rid of __GFP_NO_KSWAPD for THP, first
in the -mm kernel.

> Mainline is now impaired in terms of high order allocation under heavy load
> although I do not know to what degree as I did not test with __GFP_REPEAT.
> Keep this in mind for bugs related to hugepage pool resizing, THP allocation
> and high order atomic allocation failures from network devices.

This might be due to smaller allocations not bumping
the compaction deferring code, when we have deferred
compaction for a higher order allocation.

I wonder if the compaction deferring code is simply
too defer-happy, now that we ignore compaction at
lower orders than where compaction failed?
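
For anyone following along, the order-aware test in question is
compaction_deferred(). A sketch of the 3.4-era version (quoted from memory,
so the exact field handling may be off):

	static inline bool compaction_deferred(struct zone *zone, int order)
	{
		unsigned long defer_limit = 1UL << zone->compact_defer_shift;

		/* Orders below the previously failed order skip deferral */
		if (order < zone->compact_order_failed)
			return false;

		/* Avoid possible overflow */
		if (++zone->compact_considered > defer_limit)
			zone->compact_considered = defer_limit;

		return zone->compact_considered < defer_limit;
	}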

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/3] mm: vmscan: Remove lumpy reclaim
  2012-04-11 16:38   ` Mel Gorman
@ 2012-04-11 17:25     ` Rik van Riel
  -1 siblings, 0 replies; 36+ messages in thread
From: Rik van Riel @ 2012-04-11 17:25 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Konstantin Khlebnikov, Hugh Dickins, Ying Han,
	Linux-MM, LKML

On 04/11/2012 12:38 PM, Mel Gorman wrote:
> Lumpy reclaim had a purpose but in the mind of some, it was to kick
> the system so hard it thrashed. For others the purpose was to complicate
> vmscan.c. Over time it was given softer shoes and a nicer attitude but
> memory compaction needs to step up and replace it so this patch sends
> lumpy reclaim to the farm.
>
> The tracepoint format changes for isolating LRU pages with this patch
> applied. Furthermore reclaim/compaction can no longer queue dirty pages in
> pageout() if the underlying BDI is congested. Lumpy reclaim used this logic
> and reclaim/compaction was using it in error.
>
> Signed-off-by: Mel Gorman<mgorman@suse.de>

Acked-by: Rik van Riel <riel@redhat.com>


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/3] mm: vmscan: Do not stall on writeback during memory compaction
  2012-04-11 16:38   ` Mel Gorman
@ 2012-04-11 17:26     ` Rik van Riel
  -1 siblings, 0 replies; 36+ messages in thread
From: Rik van Riel @ 2012-04-11 17:26 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Konstantin Khlebnikov, Hugh Dickins, Ying Han,
	Linux-MM, LKML

On 04/11/2012 12:38 PM, Mel Gorman wrote:
> This patch stops reclaim/compaction entering sync reclaim as this was only
> intended for lumpy reclaim and an oversight. Page migration has its own
> logic for stalling on writeback pages if necessary and memory compaction
> is already using it.
>
> Waiting on page writeback is bad for a number of reasons but the primary
> one is that waiting on writeback to a slow device like USB can take a
> considerable length of time. Page reclaim instead uses wait_iff_congested()
> to throttle if too many dirty pages are being scanned.
>
> Signed-off-by: Mel Gorman<mgorman@suse.de>

Acked-by: Rik van Riel <riel@redhat.com>


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 3/3] mm: vmscan: Remove reclaim_mode_t
  2012-04-11 16:38   ` Mel Gorman
@ 2012-04-11 17:26     ` Rik van Riel
  -1 siblings, 0 replies; 36+ messages in thread
From: Rik van Riel @ 2012-04-11 17:26 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Konstantin Khlebnikov, Hugh Dickins, Ying Han,
	Linux-MM, LKML

On 04/11/2012 12:38 PM, Mel Gorman wrote:
> There is little motivation for reclaim_mode_t once RECLAIM_MODE_[A]SYNC
> and lumpy reclaim have been removed. This patch gets rid of reclaim_mode_t
> as well and improves the documentation about what reclaim/compaction is
> and when it is triggered.
>
> Signed-off-by: Mel Gorman<mgorman@suse.de>

Acked-by: Rik van Riel <riel@redhat.com>


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/3] Removal of lumpy reclaim V2
  2012-04-11 17:17   ` Rik van Riel
@ 2012-04-11 17:52     ` Mel Gorman
  -1 siblings, 0 replies; 36+ messages in thread
From: Mel Gorman @ 2012-04-11 17:52 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrew Morton, Konstantin Khlebnikov, Hugh Dickins, Ying Han,
	Linux-MM, LKML

On Wed, Apr 11, 2012 at 01:17:02PM -0400, Rik van Riel wrote:
> On 04/11/2012 12:38 PM, Mel Gorman wrote:
> 
> >Success rates are completely hosed for 3.4-rc2 which is almost certainly
> >due to [fe2c2a10: vmscan: reclaim at order 0 when compaction is enabled]. I
> >expected this would happen for kswapd and impair allocation success rates
> >(https://lkml.org/lkml/2012/1/25/166) but I did not anticipate this much
> >of a difference: 80% less scanning, 37% less reclaim by kswapd
> 
> Also, no gratuitous pageouts of anonymous memory.
> That was what really made a difference on a somewhat
> heavily loaded desktop + kvm workload.
> 

Indeed.

> >In comparison, reclaim/compaction is not aggressive and gives up easily
> >which is the intended behaviour. hugetlbfs uses __GFP_REPEAT and would be
> >much more aggressive about reclaim/compaction than THP allocations are. The
> >stress test above is allocating like neither THP or hugetlbfs but is much
> >closer to THP.
> 
> Next step: get rid of __GFP_NO_KSWAPD for THP, first
> in the -mm kernel
> 

Initially the flag was introduced because kswapd reclaimed too
aggressively. One would like to believe that it would be less of a problem
now but we must avoid a situation where the CPU and reclaim cost of kswapd
exceeds the benefit of allocating a THP.

> >Mainline is now impaired in terms of high order allocation under heavy load
> >although I do not know to what degree as I did not test with __GFP_REPEAT.
> >Keep this in mind for bugs related to hugepage pool resizing, THP allocation
> >and high order atomic allocation failures from network devices.
> 
> This might be due to smaller allocations not bumping
> the compaction deferring code, when we have deferred
> compaction for a higher order allocation.
> 

It's one possibility, but in this case I am not inclined to blame memory
compaction as such, although there is some indication of a bug in the free
scanner that would make compaction less effective than it should be.

> I wonder if the compaction deferring code is simply
> too defer-happy, now that we ignore compaction at
> lower orders than where compaction failed?

I do not think it's a compaction deferral problem. We do not record
statistics on how often we defer compaction but if you look at the compaction
statistics you'll see that "Compaction stalls" and "Compaction pages moved"
figures are much higher. This implies that we are using compaction more
aggressively in 3.4-rc2 instead of deferring more.

You may also note that "Compaction success" figures are more or less the
same as 3.3 but that "Compaction failures" are higher. This indicates that
in 3.2 the high success rate was partially due to lumpy reclaim freeing
up the contiguous page before memory compaction was needed in memory
pressure situations.  If that is accurate then adjusting the logic in
should_continue_reclaim() for reclaim/compaction may partially address
the issue but not 100% of the way as reclaim/compaction will still be
racing with other allocation requests. This race is likely to be tighter
now because an accidental side-effect of lumpy reclaim was to throttle
parallel allocation requests in swap. It may not be very
straightforward to fix :)
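
For context, the "give up" test in should_continue_reclaim() is roughly the
following (paraphrased from 3.4-rc2 with the LRU accounting elided; treat
the details as approximate):

	/*
	 * Keep reclaiming while fewer pages have been reclaimed than
	 * compaction is estimated to need and the inactive lists (file,
	 * plus anon when swap is available) could still supply them.
	 */
	pages_for_compaction = (2UL << sc->order);
	if (sc->nr_reclaimed < pages_for_compaction &&
	    inactive_lru_pages > pages_for_compaction)
		return true;

	return false;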

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/3] Removal of lumpy reclaim V2
  2012-04-11 17:52     ` Mel Gorman
@ 2012-04-11 18:06       ` Rik van Riel
  -1 siblings, 0 replies; 36+ messages in thread
From: Rik van Riel @ 2012-04-11 18:06 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Konstantin Khlebnikov, Hugh Dickins, Ying Han,
	Linux-MM, LKML

On 04/11/2012 01:52 PM, Mel Gorman wrote:
> On Wed, Apr 11, 2012 at 01:17:02PM -0400, Rik van Riel wrote:

>> Next step: get rid of __GFP_NO_KSWAPD for THP, first
>> in the -mm kernel
>>
>
> Initially the flag was introduced because kswapd reclaimed too
> aggressively. One would like to believe that it would be less of a problem
> now but we must avoid a situation where the CPU and reclaim cost of kswapd
> exceeds the benefit of allocating a THP.

Since kswapd and the direct reclaim code now use
the same conditionals for calling compaction,
the cost ought to be identical.

I agree this is something we should shake out
in -mm for a while though, before considering a
mainline merge.

Andrew, would you be willing to take a removal
of __GFP_NO_KSWAPD in -mm, and push it to Linus
for the 3.6 kernel if no ill effects are seen
in -mm and -next?

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/3] mm: vmscan: Do not stall on writeback during memory compaction
  2012-04-11 17:26     ` Rik van Riel
@ 2012-04-11 18:51       ` KOSAKI Motohiro
  -1 siblings, 0 replies; 36+ messages in thread
From: KOSAKI Motohiro @ 2012-04-11 18:51 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Mel Gorman, Andrew Morton, Konstantin Khlebnikov, Hugh Dickins,
	Ying Han, Linux-MM, LKML

On Wed, Apr 11, 2012 at 1:26 PM, Rik van Riel <riel@redhat.com> wrote:
> On 04/11/2012 12:38 PM, Mel Gorman wrote:
>>
>> This patch stops reclaim/compaction entering sync reclaim, as this was
>> only intended for lumpy reclaim and its use here was an oversight. Page
>> migration has its own logic for stalling on writeback pages if necessary
>> and memory compaction is already using it.
>>
>> Waiting on page writeback is bad for a number of reasons but the primary
>> one is that waiting on writeback to a slow device like USB can take a
>> considerable length of time. Page reclaim instead uses
>> wait_iff_congested() to throttle if too many dirty pages are being
>> scanned.
>>
>> Signed-off-by: Mel Gorman<mgorman@suse.de>
>
>
> Acked-by: Rik van Riel <riel@redhat.com>

Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
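
To make the throttling distinction concrete, here is a standalone model of
the wait_iff_congested()-style backoff described in the changelog above.
The old sync lumpy path could block on every page under writeback; the new
behaviour backs off at most once per scan, and only when the device is
congested. Every name and threshold below is invented for illustration and
the real helper has a different signature.

#include <stdbool.h>
#include <stdio.h>

struct scan_stats {
    int nr_scanned;
    int nr_dirty;
};

/* Stand-in for the congestion test on the backing device. */
static bool bdi_congested_model(bool congested)
{
    return congested;
}

/* Throttle once per scan, and only when the scan was dominated by
 * dirty pages AND the device is congested.  The one-half threshold
 * is made up for the model. */
static void throttle_model(const struct scan_stats *st, bool congested)
{
    if (2 * st->nr_dirty > st->nr_scanned && bdi_congested_model(congested))
        puts("sleep up to one timeout, then continue scanning");
    else
        puts("no stall; keep scanning");
}

int main(void)
{
    struct scan_stats mostly_dirty = { .nr_scanned = 32, .nr_dirty = 24 };
    struct scan_stats mostly_clean = { .nr_scanned = 32, .nr_dirty = 2 };

    throttle_model(&mostly_dirty, true);  /* throttles once */
    throttle_model(&mostly_clean, true);  /* proceeds immediately */
    return 0;
}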

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/3] mm: vmscan: Remove lumpy reclaim
  2012-04-11 17:25     ` Rik van Riel
@ 2012-04-11 18:54       ` KOSAKI Motohiro
  0 siblings, 0 replies; 36+ messages in thread
From: KOSAKI Motohiro @ 2012-04-11 18:54 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Mel Gorman, Andrew Morton, Konstantin Khlebnikov, Hugh Dickins,
	Ying Han, Linux-MM, LKML

On Wed, Apr 11, 2012 at 1:25 PM, Rik van Riel <riel@redhat.com> wrote:
> On 04/11/2012 12:38 PM, Mel Gorman wrote:
>>
>> Lumpy reclaim had a purpose but in the mind of some, it was to kick
>> the system so hard it thrashed. For others the purpose was to complicate
>> vmscan.c. Over time it was given softer shoes and a nicer attitude but
>> memory compaction needs to step up and replace it, so this patch sends
>> lumpy reclaim to the farm.
>>
>> The tracepoint format changes for isolating LRU pages with this patch
>> applied. Furthermore, reclaim/compaction can no longer queue dirty pages
>> in pageout() if the underlying BDI is congested. Lumpy reclaim used this
>> logic and reclaim/compaction was using it in error.
>>
>> Signed-off-by: Mel Gorman<mgorman@suse.de>
>
> Acked-by: Rik van Riel <riel@redhat.com>

Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
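
A minimal standalone model of the pageout() gating mentioned in the
changelog above. The decision shape is the point; the names and the exact
conditions are invented rather than lifted from mm/vmscan.c.

#include <stdbool.h>
#include <stdio.h>

/* Dirty pages are only queued for async writeback when the backing
 * device is not already congested; otherwise they stay on the LRU to
 * be revisited later. */
static void shrink_page_model(bool dirty, bool bdi_congested)
{
    if (!dirty) {
        puts("clean page: reclaim immediately");
        return;
    }
    if (bdi_congested)
        puts("dirty page, congested BDI: skip IO, revisit later");
    else
        puts("dirty page, uncongested BDI: queue async writeback");
}

int main(void)
{
    shrink_page_model(false, false);
    shrink_page_model(true, true);
    shrink_page_model(true, false);
    return 0;
}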

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 3/3] mm: vmscan: Remove reclaim_mode_t
  2012-04-11 17:26     ` Rik van Riel
@ 2012-04-11 19:48       ` KOSAKI Motohiro
  0 siblings, 0 replies; 36+ messages in thread
From: KOSAKI Motohiro @ 2012-04-11 19:48 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Mel Gorman, Andrew Morton, Konstantin Khlebnikov, Hugh Dickins,
	Ying Han, Linux-MM, LKML

On Wed, Apr 11, 2012 at 1:26 PM, Rik van Riel <riel@redhat.com> wrote:
> On 04/11/2012 12:38 PM, Mel Gorman wrote:
>>
>> There is little motivation for reclaim_mode_t once RECLAIM_MODE_[A]SYNC
>> and lumpy reclaim have been removed. This patch gets rid of reclaim_mode_t
>> as well and improves the documentation about what reclaim/compaction is
>> and when it is triggered.
>>
>> Signed-off-by: Mel Gorman<mgorman@suse.de>
>
> Acked-by: Rik van Riel <riel@redhat.com>

Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
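
For readers without a 3.3 tree to hand, the type being removed is a small
bitmask along the following lines. This is a reconstruction from the flag
names in the changelog, not a verbatim copy; the real definition also
carried sparse __bitwise annotations.

typedef unsigned int reclaim_mode_t;

#define RECLAIM_MODE_SINGLE       ((reclaim_mode_t)0x01u) /* one page at a time */
#define RECLAIM_MODE_ASYNC        ((reclaim_mode_t)0x02u) /* do not wait on IO */
#define RECLAIM_MODE_SYNC         ((reclaim_mode_t)0x04u) /* may wait on writeback */
#define RECLAIM_MODE_LUMPYRECLAIM ((reclaim_mode_t)0x08u) /* take neighbouring pages */
#define RECLAIM_MODE_COMPACTION   ((reclaim_mode_t)0x10u) /* reclaim order-0, compact */

With lumpy reclaim and the sync/async distinction gone, the only remaining
question is whether reclaim/compaction is active, and a plain boolean
covers that, so the type can go.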

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/3] Removal of lumpy reclaim V2
  2012-04-11 16:38 ` Mel Gorman
@ 2012-04-11 23:37   ` Ying Han
  0 siblings, 0 replies; 36+ messages in thread
From: Ying Han @ 2012-04-11 23:37 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Rik van Riel, Konstantin Khlebnikov, Hugh Dickins,
	Linux-MM, LKML

On Wed, Apr 11, 2012 at 9:38 AM, Mel Gorman <mgorman@suse.de> wrote:
> Andrew, these three patches should replace the two lumpy reclaim patches
> you already have. When applied, there is no functional difference (slight
> changes in layout) but the changelogs are better.
>
> Changelog since V1
> o Ying pointed out that compaction was waiting on page writeback and the
>  description of the patches in V1 was broken. This version is the same
>  except that it is structured differently to explain that waiting on
>  page writeback is removed.
> o Rebased to v3.4-rc2
>
> This series removes lumpy reclaim and some stalling logic that was
> unintentionally being used by memory compaction. The end result
> is that stalling on dirty pages during page reclaim now depends on
> wait_iff_congested().
>
> Four kernels were compared
>
> 3.3.0     vanilla
> 3.4.0-rc2 vanilla
> 3.4.0-rc2 lumpyremove-v2 is patch one from this series
> 3.4.0-rc2 nosync-v2r3 is the full series
>
> Removing lumpy reclaim saves almost 900K of text whereas the full series
> removes 1200K of text.
>
>   text    data     bss     dec     hex filename
> 6740375 1927944 2260992 10929311         a6c49f vmlinux-3.4.0-rc2-vanilla
> 6739479 1927944 2260992 10928415         a6c11f vmlinux-3.4.0-rc2-lumpyremove-v2
> 6739159 1927944 2260992 10928095         a6bfdf vmlinux-3.4.0-rc2-nosync-v2
>
> There are behaviour changes in the series and so tests were run with
> monitoring of ftrace events. This disrupts results so the performance
> results are distorted but the new behaviour should be clearer.
>
> fs-mark running in a threaded configuration showed little of interest as
> it did not push reclaim aggressively
>
> FS-Mark Multi Threaded
>                        3.3.0-vanilla       rc2-vanilla       lumpyremove-v2r3       nosync-v2r3
> Files/s  min           3.20 ( 0.00%)        3.20 ( 0.00%)        3.20 ( 0.00%)        3.20 ( 0.00%)
> Files/s  mean          3.20 ( 0.00%)        3.20 ( 0.00%)        3.20 ( 0.00%)        3.20 ( 0.00%)
> Files/s  stddev        0.00 ( 0.00%)        0.00 ( 0.00%)        0.00 ( 0.00%)        0.00 ( 0.00%)
> Files/s  max           3.20 ( 0.00%)        3.20 ( 0.00%)        3.20 ( 0.00%)        3.20 ( 0.00%)
> Overhead min      508667.00 ( 0.00%)   521350.00 (-2.49%)   544292.00 (-7.00%)   547168.00 (-7.57%)
> Overhead mean     551185.00 ( 0.00%)   652690.73 (-18.42%)   991208.40 (-79.83%)   570130.53 (-3.44%)
> Overhead stddev    18200.69 ( 0.00%)   331958.29 (-1723.88%)  1579579.43 (-8578.68%)     9576.81 (47.38%)
> Overhead max      576775.00 ( 0.00%)  1846634.00 (-220.17%)  6901055.00 (-1096.49%)   585675.00 (-1.54%)
> MMTests Statistics: duration
> Sys Time Running Test (seconds)             309.90    300.95    307.33    298.95
> User+Sys Time Running Test (seconds)        319.32    309.67    315.69    307.51
> Total Elapsed Time (seconds)               1187.85   1193.09   1191.98   1193.73
>
> MMTests Statistics: vmstat
> Page Ins                                       80532       82212       81420       79480
> Page Outs                                  111434984   111456240   111437376   111582628
> Swap Ins                                           0           0           0           0
> Swap Outs                                          0           0           0           0
> Direct pages scanned                           44881       27889       27453       34843
> Kswapd pages scanned                        25841428    25860774    25861233    25843212
> Kswapd pages reclaimed                      25841393    25860741    25861199    25843179
> Direct pages reclaimed                         44881       27889       27453       34843
> Kswapd efficiency                                99%         99%         99%         99%
> Kswapd velocity                            21754.791   21675.460   21696.029   21649.127
> Direct efficiency                               100%        100%        100%        100%
> Direct velocity                               37.783      23.375      23.031      29.188
> Percentage direct scans                           0%          0%          0%          0%
>
> ftrace showed that there was no stalling on writeback or pages submitted
> for IO from reclaim context.
>
>
> postmark was similar and while it was more interesting, it also did not
> push reclaim heavily.
>
> POSTMARK
>                                     3.3.0-vanilla       rc2-vanilla  lumpyremove-v2r3       nosync-v2r3
> Transactions per second:               16.00 ( 0.00%)    20.00 (25.00%)    18.00 (12.50%)    17.00 ( 6.25%)
> Data megabytes read per second:        18.80 ( 0.00%)    24.27 (29.10%)    22.26 (18.40%)    20.54 ( 9.26%)
> Data megabytes written per second:     35.83 ( 0.00%)    46.25 (29.08%)    42.42 (18.39%)    39.14 ( 9.24%)
> Files created alone per second:        28.00 ( 0.00%)    38.00 (35.71%)    34.00 (21.43%)    30.00 ( 7.14%)
> Files create/transact per second:       8.00 ( 0.00%)    10.00 (25.00%)     9.00 (12.50%)     8.00 ( 0.00%)
> Files deleted alone per second:       556.00 ( 0.00%)  1224.00 (120.14%)  3062.00 (450.72%)  6124.00 (1001.44%)
> Files delete/transact per second:       8.00 ( 0.00%)    10.00 (25.00%)     9.00 (12.50%)     8.00 ( 0.00%)
>
> MMTests Statistics: duration
> Sys Time Running Test (seconds)             113.34    107.99    109.73    108.72
> User+Sys Time Running Test (seconds)        145.51    139.81    143.32    143.55
> Total Elapsed Time (seconds)               1159.16    899.23    980.17   1062.27
>
> MMTests Statistics: vmstat
> Page Ins                                    13710192    13729032    13727944    13760136
> Page Outs                                   43071140    42987228    42733684    42931624
> Swap Ins                                           0           0           0           0
> Swap Outs                                          0           0           0           0
> Direct pages scanned                               0           0           0           0
> Kswapd pages scanned                         9941613     9937443     9939085     9929154
> Kswapd pages reclaimed                       9940926     9936751     9938397     9928465
> Direct pages reclaimed                             0           0           0           0
> Kswapd efficiency                                99%         99%         99%         99%
> Kswapd velocity                             8576.567   11051.058   10140.164    9347.109
> Direct efficiency                               100%        100%        100%        100%
> Direct velocity                                0.000       0.000       0.000       0.000
>
> It looks here as though the full series regresses performance but as ftrace
> showed no usage of wait_iff_congested() or sync reclaim I am assuming it's
> a disruption due to monitoring. Other data such as memory usage, page IO,
> swap IO all looked similar.
>
> Running a benchmark with a plain DD showed nothing very interesting. The
> full series stalled in wait_iff_congested() slightly less but stall times
> on vanilla kernels were marginal.
>
> Running a benchmark that hammered on file-backed mappings showed stalls
> due to congestion but not in sync writebacks
>
> MICRO
>                                     3.3.0-vanilla       rc2-vanilla  lumpyremove-v2r3       nosync-v2r3
> MMTests Statistics: duration
> Sys Time Running Test (seconds)             308.13    294.50    298.75    299.53
> User+Sys Time Running Test (seconds)        330.45    316.28    318.93    320.79
> Total Elapsed Time (seconds)               1814.90   1833.88   1821.14   1832.91
>
> MMTests Statistics: vmstat
> Page Ins                                      108712      120708       97224      110344
> Page Outs                                  155514576   156017404   155813676   156193256
> Swap Ins                                           0           0           0           0
> Swap Outs                                          0           0           0           0
> Direct pages scanned                         2599253     1550480     2512822     2414760
> Kswapd pages scanned                        69742364    71150694    68839041    69692533
> Kswapd pages reclaimed                      34824488    34773341    34796602    34799396
> Direct pages reclaimed                         53693       94750       61792       75205
> Kswapd efficiency                                49%         48%         50%         49%
> Kswapd velocity                            38427.662   38797.901   37799.972   38022.889
> Direct efficiency                                 2%          6%          2%          3%
> Direct velocity                             1432.174     845.464    1379.807    1317.446
> Percentage direct scans                           3%          2%          3%          3%
> Page writes by reclaim                             0           0           0           0
> Page writes file                                   0           0           0           0
> Page writes anon                                   0           0           0           0
> Page reclaim immediate                             0           0           0        1218
> Page rescued immediate                             0           0           0           0
> Slabs scanned                                  15360       16384       13312       16384
> Direct inode steals                                0           0           0           0
> Kswapd inode steals                             4340        4327        1630        4323
>
> FTrace Reclaim Statistics: congestion_wait
> Direct number congest     waited                 0          0          0          0
> Direct time   congest     waited               0ms        0ms        0ms        0ms
> Direct full   congest     waited                 0          0          0          0
> Direct number conditional waited               900        870        754        789
> Direct time   conditional waited               0ms        0ms        0ms       20ms
> Direct full   conditional waited                 0          0          0          0
> KSwapd number congest     waited              2106       2308       2116       1915
> KSwapd time   congest     waited          139924ms   157832ms   125652ms   132516ms
> KSwapd full   congest     waited              1346       1530       1202       1278
> KSwapd number conditional waited             12922      16320      10943      14670
> KSwapd time   conditional waited               0ms        0ms        0ms        0ms
> KSwapd full   conditional waited                 0          0          0          0
>
>
> Reclaim statistics are not radically changed. The stall times in kswapd
> are massive but it is clear that it is due to calls to congestion_wait()
> and that is almost certainly the call in balance_pgdat(). Otherwise stalls
> due to dirty pages are non-existent.
>
> I ran a benchmark that stressed high-order allocation. This is a very
> artificial load but was used in the past to evaluate lumpy reclaim and
> compaction. Generally I look at allocation success rates and latency figures.
>
> STRESS-HIGHALLOC
>                 3.3.0-vanilla       rc2-vanilla  lumpyremove-v2r3       nosync-v2r3
> Pass 1          81.00 ( 0.00%)    28.00 (-53.00%)    24.00 (-57.00%)    28.00 (-53.00%)
> Pass 2          82.00 ( 0.00%)    39.00 (-43.00%)    38.00 (-44.00%)    43.00 (-39.00%)
> while Rested    88.00 ( 0.00%)    87.00 (-1.00%)    88.00 ( 0.00%)    88.00 ( 0.00%)
>
> MMTests Statistics: duration
> Sys Time Running Test (seconds)             740.93    681.42    685.14    684.87
> User+Sys Time Running Test (seconds)       2922.65   3269.52   3281.35   3279.44
> Total Elapsed Time (seconds)               1161.73   1152.49   1159.55   1161.44
>
> MMTests Statistics: vmstat
> Page Ins                                     4486020     2807256     2855944     2876244
> Page Outs                                    7261600     7973688     7975320     7986120
> Swap Ins                                       31694           0           0           0
> Swap Outs                                      98179           0           0           0
> Direct pages scanned                           53494       57731       34406      113015
> Kswapd pages scanned                         6271173     1287481     1278174     1219095
> Kswapd pages reclaimed                       2029240     1281025     1260708     1201583
> Direct pages reclaimed                          1468       14564       16649       92456
> Kswapd efficiency                                32%         99%         98%         98%
> Kswapd velocity                             5398.133    1117.130    1102.302    1049.641
> Direct efficiency                                 2%         25%         48%         81%
> Direct velocity                               46.047      50.092      29.672      97.306
> Percentage direct scans                           0%          4%          2%          8%
> Page writes by reclaim                       1616049           0           0           0
> Page writes file                             1517870           0           0           0
> Page writes anon                               98179           0           0           0
> Page reclaim immediate                        103778       27339        9796       17831
> Page rescued immediate                             0           0           0           0
> Slabs scanned                                1096704      986112      980992      998400
> Direct inode steals                              223      215040      216736      247881
> Kswapd inode steals                           175331       61548       68444       63066
> Kswapd skipped wait                            21991           0           1           0
> THP fault alloc                                    1         135         125         134
> THP collapse alloc                               393         311         228         236
> THP splits                                        25          13           7           8
> THP fault fallback                                 0           0           0           0
> THP collapse fail                                  3           5           7           7
> Compaction stalls                                865        1270        1422        1518
> Compaction success                               370         401         353         383
> Compaction failures                              495         869        1069        1135
> Compaction pages moved                        870155     3828868     4036106     4423626
> Compaction move failure                        26429       23865       29742       27514
>
> Success rates are completely hosed for 3.4-rc2 which is almost certainly
> due to [fe2c2a10: vmscan: reclaim at order 0 when compaction is enabled]. I
> expected this would happen for kswapd and impair allocation success rates
> (https://lkml.org/lkml/2012/1/25/166) but I did not anticipate this much
> of a difference: 80% less scanning, 37% less reclaim by kswapd.
>
> In comparison, reclaim/compaction is not aggressive and gives up easily
> which is the intended behaviour. hugetlbfs uses __GFP_REPEAT and would be
> much more aggressive about reclaim/compaction than THP allocations are. The
> stress test above is allocating like neither THP nor hugetlbfs but is much
> closer to THP.
>
> Mainline is now impaired in terms of high order allocation under heavy load
> although I do not know to what degree as I did not test with __GFP_REPEAT.
> Keep this in mind for bugs related to hugepage pool resizing, THP allocation
> and high order atomic allocation failures from network devices.
>
> In terms of congestion throttling, I see the following for this test
>
> FTrace Reclaim Statistics: congestion_wait
> Direct number congest     waited                 3          0          0          0
> Direct time   congest     waited               0ms        0ms        0ms        0ms
> Direct full   congest     waited                 0          0          0          0
> Direct number conditional waited               957        512       1081       1075
> Direct time   conditional waited               0ms        0ms        0ms        0ms
> Direct full   conditional waited                 0          0          0          0
> KSwapd number congest     waited                36          4          3          5
> KSwapd time   congest     waited            3148ms      400ms      300ms      500ms
> KSwapd full   congest     waited                30          4          3          5
> KSwapd number conditional waited             88514        197        332        542
> KSwapd time   conditional waited            4980ms        0ms        0ms        0ms
> KSwapd full   conditional waited                49          0          0          0
>
> The "conditional waited" times are the most interesting as this is directly
> impacted by the number of dirty pages encountered during scan. As lumpy
> reclaim is no longer scanning contiguous ranges, it is finding fewer dirty
> pages. This brings wait times from about 5 seconds to 0. kswapd itself is
> still calling congestion_wait() so it'll still stall but it's a lot less.
>
> In terms of the type of IO we were doing, I see this
>
> FTrace Reclaim Statistics: mm_vmscan_writepage
> Direct writes anon  sync                         0          0          0          0
> Direct writes anon  async                        0          0          0          0
> Direct writes file  sync                         0          0          0          0
> Direct writes file  async                        0          0          0          0
> Direct writes mixed sync                         0          0          0          0
> Direct writes mixed async                        0          0          0          0
> KSwapd writes anon  sync                         0          0          0          0
> KSwapd writes anon  async                    91682          0          0          0
> KSwapd writes file  sync                         0          0          0          0
> KSwapd writes file  async                   822629          0          0          0
> KSwapd writes mixed sync                         0          0          0          0
> KSwapd writes mixed async                        0          0          0          0
>
> In 3.2, kswapd was doing a bunch of async writes of pages but
> reclaim/compaction was never reaching a point where it was doing sync
> IO. This does not guarantee that reclaim/compaction was not calling
> wait_on_page_writeback() but I would consider it unlikely. It indicates
> that merging patches 2 and 3 to stop reclaim/compaction calling
> wait_on_page_writeback() should be safe.
>
>  include/trace/events/vmscan.h |   40 ++-----
>  mm/vmscan.c                   |  263 ++++-------------------------------------
>  2 files changed, 37 insertions(+), 266 deletions(-)
>
> --
> 1.7.9.2
>

It might be a naive question, but what do we do with users who have the
following in their .config file?

# CONFIG_COMPACTION is not set

--Ying

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/3] Removal of lumpy reclaim V2
  2012-04-11 16:38 ` Mel Gorman
@ 2012-04-11 23:54   ` Hugh Dickins
  0 siblings, 0 replies; 36+ messages in thread
From: Hugh Dickins @ 2012-04-11 23:54 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Rik van Riel, Konstantin Khlebnikov, Ying Han,
	Linux-MM, LKML

On Wed, 11 Apr 2012, Mel Gorman wrote:
> 
> Removing lumpy reclaim saves almost 900K of text whereas the full series
> removes 1200K of text.

Impressive...

> 
>    text	   data	    bss	    dec	    hex	filename
> 6740375	1927944	2260992	10929311	 a6c49f	vmlinux-3.4.0-rc2-vanilla
> 6739479	1927944	2260992	10928415	 a6c11f	vmlinux-3.4.0-rc2-lumpyremove-v2
> 6739159	1927944	2260992	10928095	 a6bfdf	vmlinux-3.4.0-rc2-nosync-v2

... but I fear you meant " bytes" instead of "K" ;)
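
For the record, the arithmetic from the table: 6740375 - 6739479 = 896
bytes of text saved by lumpyremove-v2 and 6740375 - 6739159 = 1216 bytes
saved by the full series, i.e. just under 0.9K and roughly 1.2K.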

Hugh

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/3] Removal of lumpy reclaim V2
  2012-04-11 23:54   ` Hugh Dickins
@ 2012-04-12  5:44     ` Mel Gorman
  0 siblings, 0 replies; 36+ messages in thread
From: Mel Gorman @ 2012-04-12  5:44 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Rik van Riel, Konstantin Khlebnikov, Ying Han,
	Linux-MM, LKML

On Wed, Apr 11, 2012 at 04:54:18PM -0700, Hugh Dickins wrote:
> On Wed, 11 Apr 2012, Mel Gorman wrote:
> > 
> > Removing lumpy reclaim saves almost 900K of text whereas the full series
> > removes 1200K of text.
> 
> Impressive...
> 
> > 
> >    text	   data	    bss	    dec	    hex	filename
> > 6740375	1927944	2260992	10929311	 a6c49f	vmlinux-3.4.0-rc2-vanilla
> > 6739479	1927944	2260992	10928415	 a6c11f	vmlinux-3.4.0-rc2-lumpyremove-v2
> > 6739159	1927944	2260992	10928095	 a6bfdf	vmlinux-3.4.0-rc2-nosync-v2
> 
> ... but I fear you meant " bytes" instead of "K" ;)
> 

Whoops, I do :)

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/3] Removal of lumpy reclaim V2
  2012-04-11 23:37   ` Ying Han
@ 2012-04-12  5:49     ` Mel Gorman
  0 siblings, 0 replies; 36+ messages in thread
From: Mel Gorman @ 2012-04-12  5:49 UTC (permalink / raw)
  To: Ying Han
  Cc: Andrew Morton, Rik van Riel, Konstantin Khlebnikov, Hugh Dickins,
	Linux-MM, LKML

On Wed, Apr 11, 2012 at 04:37:00PM -0700, Ying Han wrote:
> > In 3.2, kswapd was doing a bunch of async writes of pages but
> > reclaim/compaction was never reaching a point where it was doing sync
> > IO. This does not guarantee that reclaim/compaction was not calling
> > wait_on_page_writeback() but I would consider it unlikely. It indicates
> > that merging patches 2 and 3 to stop reclaim/compaction calling
> > wait_on_page_writeback() should be safe.
> >
> >  include/trace/events/vmscan.h |   40 ++-----
> >  mm/vmscan.c                   |  263 ++++-------------------------------------
> >  2 files changed, 37 insertions(+), 266 deletions(-)
> >
> > --
> > 1.7.9.2
> >
> 
> It might be a naive question, but what do we do with users who have the
> following in their .config file?
> 
> # CONFIG_COMPACTION is not set
> 

After lumpy reclaim is removed, page reclaim will be reclaiming at order-0
and hoping that this happens to free up a high-order page. It remains to
be seen how many users really depended on lumpy reclaim like this and
why they were not using compaction. Two configurations that may care are
NOMMU and SLUB. NOMMU may not notice as it was already unable to handle
anonymous pages in lumpy reclaim. SLUB will fall back to using order-0 pages.
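
As a rough standalone sketch of what that order-0 fallback looks like
(everything here is invented for illustration, with random chance standing
in for the buddy allocator happening to coalesce a large enough block):

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Without compaction or lumpy reclaim, the kernel can only free order-0
 * pages, retry, and hope a contiguous block appears. */
static bool try_high_order_alloc_model(int order, int max_retries)
{
    for (int attempt = 0; attempt < max_retries; attempt++) {
        /* ... free some order-0 pages ... */
        if (rand() % (1 << order) == 0) /* luck stands in for coalescing */
            return true;
    }
    return false; /* caller falls back, e.g. SLUB drops to order-0 */
}

int main(void)
{
    printf("order-3 allocation %s\n",
           try_high_order_alloc_model(3, 4) ? "succeeded" : "failed");
    return 0;
}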

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/3] Removal of lumpy reclaim V2
  2012-04-11 18:06       ` Rik van Riel
@ 2012-04-12  9:32         ` Mel Gorman
  0 siblings, 0 replies; 36+ messages in thread
From: Mel Gorman @ 2012-04-12  9:32 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrew Morton, Konstantin Khlebnikov, Hugh Dickins, Ying Han,
	Linux-MM, LKML

On Wed, Apr 11, 2012 at 02:06:11PM -0400, Rik van Riel wrote:
> On 04/11/2012 01:52 PM, Mel Gorman wrote:
> >On Wed, Apr 11, 2012 at 01:17:02PM -0400, Rik van Riel wrote:
> 
> >>Next step: get rid of __GFP_NO_KSWAPD for THP, first
> >>in the -mm kernel
> >>
> >
> >Initially the flag was introduced because kswapd reclaimed too
> >aggressively. One would like to believe that it would be less of a problem
> >now but we must avoid a situation where the CPU and reclaim cost of kswapd
> >exceeds the benefit of allocating a THP.
> 
> Since kswapd and the direct reclaim code now use
> the same conditionals for calling compaction,
> the cost ought to be identical.
> 

kswapd has different retry logic for reclaim and can stay awake if there
are continual calls to wakeup_kswapd() setting pgdat->kswapd_max_order
and kswapd makes forward progress. It's not identical enough that I would
express 100% confidence that it will be free of problems.
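
As a toy standalone model of that asymmetry (names are borrowed loosely
from the kernel but the logic is simplified and illustrative, not
balance_pgdat()):

#include <stdio.h>

static int kswapd_max_order; /* raised by wakeup_kswapd() callers */

/* Allocators record the order they need; repeated wakeups keep
 * raising the target while kswapd is running. */
static void wakeup_kswapd_model(int order)
{
    if (order > kswapd_max_order)
        kswapd_max_order = order;
}

/* kswapd keeps balancing while new, larger requests arrive, whereas
 * direct reclaim would give up after a bounded number of tries. */
static void kswapd_model(void)
{
    int balanced_order = 0;

    while (kswapd_max_order > balanced_order) {
        balanced_order = kswapd_max_order;
        printf("kswapd balancing for order %d\n", balanced_order);
        if (balanced_order == 2)
            wakeup_kswapd_model(4); /* simulate a wakeup arriving mid-run */
    }
}

int main(void)
{
    wakeup_kswapd_model(2);
    kswapd_model(); /* balances order 2, then order 4 after the new wakeup */
    return 0;
}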

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/3] Removal of lumpy reclaim V2
@ 2012-04-12  9:32         ` Mel Gorman
  0 siblings, 0 replies; 36+ messages in thread
From: Mel Gorman @ 2012-04-12  9:32 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrew Morton, Konstantin Khlebnikov, Hugh Dickins, Ying Han,
	Linux-MM, LKML

On Wed, Apr 11, 2012 at 02:06:11PM -0400, Rik van Riel wrote:
> On 04/11/2012 01:52 PM, Mel Gorman wrote:
> >On Wed, Apr 11, 2012 at 01:17:02PM -0400, Rik van Riel wrote:
> 
> >>Next step: get rid of __GFP_NO_KSWAPD for THP, first
> >>in the -mm kernel
> >>
> >
> >Initially the flag was introduced because kswapd reclaimed too
> >aggressively. One would like to believe that it would be less of a problem
> >now but we must avoid a situation where the CPU and reclaim cost of kswapd
> >exceeds the benefit of allocating a THP.
> 
> Since kswapd and the direct reclaim code now use
> the same conditionals for calling compaction,
> the cost ought to be identical.
> 

kswapd has different retry logic for reclaim and can stay awake if there
are continual calls to wakeup_kswapd() setting pgdat->kswapd_max_order
and kswapd makes forward progress. It's not identical enough that I would
express 100% confidence that it will be free of problems.

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2012-04-12  9:32 UTC | newest]

Thread overview: 18 messages (download: mbox.gz / follow: Atom feed)
2012-04-11 16:38 [PATCH 0/3] Removal of lumpy reclaim V2 Mel Gorman
2012-04-11 16:38 ` [PATCH 1/3] mm: vmscan: Remove lumpy reclaim Mel Gorman
2012-04-11 17:25   ` Rik van Riel
2012-04-11 18:54     ` KOSAKI Motohiro
2012-04-11 16:38 ` [PATCH 2/3] mm: vmscan: Do not stall on writeback during memory compaction Mel Gorman
2012-04-11 17:26   ` Rik van Riel
2012-04-11 18:51     ` KOSAKI Motohiro
2012-04-11 16:38 ` [PATCH 3/3] mm: vmscan: Remove reclaim_mode_t Mel Gorman
2012-04-11 17:26   ` Rik van Riel
2012-04-11 19:48     ` KOSAKI Motohiro
2012-04-11 17:17 ` [PATCH 0/3] Removal of lumpy reclaim V2 Rik van Riel
2012-04-11 17:52   ` Mel Gorman
2012-04-11 18:06     ` Rik van Riel
2012-04-12  9:32       ` Mel Gorman
2012-04-11 23:37 ` Ying Han
2012-04-12  5:49   ` Mel Gorman
2012-04-11 23:54 ` Hugh Dickins
2012-04-12  5:44   ` Mel Gorman
