* [PATCH 0/3] page stealing tweaks
@ 2014-12-04 17:12 ` Vlastimil Babka
  0 siblings, 0 replies; 38+ messages in thread
From: Vlastimil Babka @ 2014-12-04 17:12 UTC (permalink / raw)
  To: linux-mm, Joonsoo Kim
  Cc: linux-kernel, Minchan Kim, Mel Gorman, Rik van Riel,
	David Rientjes, Vlastimil Babka

While studying page stealing, I noticed some weird-looking decisions in
try_to_steal_freepages(). The first one I assume to be a bug (Patch 1); the
following two patches were driven by evaluation.

Testing was done with stress-highalloc of mmtests, using the
mm_page_alloc_extfrag tracepoint and postprocessing to get counts of how often
page stealing occurs for individual migratetypes, and what migratetypes are
used for fallbacks. Arguably, the worst case of page stealing is when an
UNMOVABLE allocation steals from a MOVABLE pageblock. A RECLAIMABLE allocation
stealing from a MOVABLE pageblock is also not ideal, so the goal is to minimize
these two cases.
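
For context, fallback migratetypes are tried in a fixed per-type order. A
simplified sketch of the fallbacks table in mm/page_alloc.c of that era
(CMA, RESERVE and ISOLATE entries omitted; an approximation, not verbatim
source) shows why an UNMOVABLE request ends up in a MOVABLE pageblock only as
a last resort:

	/*
	 * Simplified sketch of the fallback ordering; the real table also
	 * covers MIGRATE_CMA, MIGRATE_RESERVE and MIGRATE_ISOLATE.
	 */
	static int fallbacks[MIGRATE_TYPES][4] = {
		[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE },
		[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE },
		[MIGRATE_MOVABLE]     = { MIGRATE_UNMOVABLE,   MIGRATE_RECLAIMABLE },
	};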

For some reason, the first patch increased the number of page stealing events
for MOVABLE allocations, and I am still not sure why. In theory these events
are not as bad, and the third patch does more than just correct this.

Here are the results; the baseline (column 26) is 3.17-rc7 with compaction
patches from -mm. First, the results with the benchmark set to mimic
non-THP-like whole-pageblock allocations. Discussion below:

stress-highalloc
                             3.17-rc7              3.17-rc7              3.17-rc7              3.17-rc7
                             26-nothp              27-nothp              28-nothp              29-nothp
Success 1 Min         20.00 (  0.00%)       31.00 (-55.00%)       33.00 (-65.00%)       23.00 (-15.00%)
Success 1 Mean        32.70 (  0.00%)       39.00 (-19.27%)       39.10 (-19.57%)       35.80 ( -9.48%)
Success 1 Max         42.00 (  0.00%)       44.00 ( -4.76%)       46.00 ( -9.52%)       45.00 ( -7.14%)
Success 2 Min         20.00 (  0.00%)       33.00 (-65.00%)       36.00 (-80.00%)       24.00 (-20.00%)
Success 2 Mean        33.90 (  0.00%)       41.30 (-21.83%)       41.70 (-23.01%)       36.80 ( -8.55%)
Success 2 Max         44.00 (  0.00%)       49.00 (-11.36%)       49.00 (-11.36%)       45.00 ( -2.27%)
Success 3 Min         84.00 (  0.00%)       86.00 ( -2.38%)       86.00 ( -2.38%)       85.00 ( -1.19%)
Success 3 Mean        86.40 (  0.00%)       87.20 ( -0.93%)       87.20 ( -0.93%)       86.80 ( -0.46%)
Success 3 Max         88.00 (  0.00%)       89.00 ( -1.14%)       89.00 ( -1.14%)       88.00 (  0.00%)

            3.17-rc7    3.17-rc7    3.17-rc7    3.17-rc7
            26-nothp    27-nothp    28-nothp    29-nothp
User         6818.93     6775.23     6759.60     6783.81
System       1055.97     1056.31     1055.37     1057.36
Elapsed      2150.18     2211.63     2196.91     2201.93

                              3.17-rc7    3.17-rc7    3.17-rc7    3.17-rc7
                              26-nothp    27-nothp    28-nothp    29-nothp
Minor Faults                 198162003   197936707   197750617   198414323
Major Faults                       462         511         533         490
Swap Ins                            29          31          42          21
Swap Outs                         2142        2225        2616        2276
Allocation stalls                 6030        7716        6856        6175
DMA allocs                         112         102         128          73
DMA32 allocs                 124578777   124503016   124372538   124840569
Normal allocs                 59157970    59165895    59160083    59154005
Movable allocs                       0           0           0           0
Direct pages scanned            353190      424846      395619      359421
Kswapd pages scanned           2201775     2221571     2223699     2254336
Kswapd pages reclaimed         2196630     2216042     2218175     2242737
Direct pages reclaimed          352402      423989      394801      358321
Kswapd efficiency                  99%         99%         99%         99%
Kswapd velocity               1011.483    1019.369    1016.418    1010.895
Direct efficiency                  99%         99%         99%         99%
Direct velocity                162.253     194.941     180.832     161.173
Percentage direct scans            13%         16%         15%         13%
Zone normal velocity           381.505     402.030     393.093     376.382
Zone dma32 velocity            792.218     812.269     804.143     795.679
Zone dma velocity                0.012       0.011       0.014       0.007
Page writes by reclaim        2316.900    2366.600    2791.300    2492.700
Page writes file                   174         141         174         216
Page writes anon                  2142        2225        2616        2276
Page reclaim immediate            1381        1586        1314        8126
Sector Reads                   4703932     4775640     4750501     4747452
Sector Writes                 12758092    12720075    12695676    12790100
Page rescued immediate               0           0           0           0
Slabs scanned                  1750170     1871811     1847197     1822608
Direct inode steals              14468       14838       14872       14241
Kswapd inode steals              38766       40510       40353       40442
Kswapd skipped wait                  0           0           0           0
THP fault alloc                    262         221         239         239
THP collapse alloc                 506         494         535         491
THP splits                          14          12          14          14
THP fault fallback                   7          33          10          39
THP collapse fail                   17          18          16          18
Compaction stalls                 2746        3359        3185        2981
Compaction success                1025        1188        1153        1097
Compaction failures               1721        2170        2032        1884
Page migrate success           3889927     4512417     4340044     4128768
Page migrate failure             14551       17660       17096       14686
Compaction pages isolated      8058458     9337143     8974871     8554984
Compaction migrate scanned   156216179   187390755   178241572   163503245
Compaction free scanned      317797413   388387641   361523988   341521402
Compaction cost                   5284        6173        5923        5592
NUMA alloc hit               181314344   181142494   180975258   181531369
NUMA alloc miss                      0           0           0           0
NUMA interleave hit                  0           0           0           0
NUMA alloc local             181314344   181142494   180975258   181531369
NUMA base PTE updates                0           0           0           0
NUMA huge PMD updates                0           0           0           0
NUMA page range updates              0           0           0           0
NUMA hint faults                     0           0           0           0
NUMA hint local faults               0           0           0           0
NUMA hint local percent            100         100         100         100
NUMA pages migrated                  0           0           0           0
AutoNUMA cost                       0%          0%          0%          0%

                                                       3.17-rc7    3.17-rc7    3.17-rc7    3.17-rc7
                                                       26-nothp    27-nothp    28-nothp    29-nothp
Page alloc extfrag event                                7223461    10651213    10274135     3785074
Extfrag fragmenting                                     7221775    10648719    10272431     3782605
Extfrag fragmenting for unmovable                         20264       16784        2668        2768
Extfrag fragmenting unmovable stealing from movable       10814        7531        2231        2091
Extfrag fragmenting for reclaimable                        1937        1114        1138        1268
Extfrag fragmenting reclaimable stealing from movable      1731         882         914         973
Extfrag fragmenting for movable                         7199574    10630821    10268625     3778569

As can be seen, the success rates are not affected much; if anything, the
first patch improves them slightly. But the reduction in extfrag events is
quite prominent, especially for unmovable allocations polluting (potentially
permanently) movable pageblocks.

For completeness, the results with the benchmark set to mimic THP allocations
are below. They are not much different, so there is no extra discussion.

stress-highalloc
                             3.17-rc7              3.17-rc7              3.17-rc7              3.17-rc7
                               26-thp                27-thp                28-thp                29-thp
Success 1 Min         20.00 (  0.00%)       27.00 (-35.00%)       26.00 (-30.00%)       22.00 (-10.00%)
Success 1 Mean        28.90 (  0.00%)       33.00 (-14.19%)       31.90 (-10.38%)       29.60 ( -2.42%)
Success 1 Max         36.00 (  0.00%)       40.00 (-11.11%)       39.00 ( -8.33%)       35.00 (  2.78%)
Success 2 Min         20.00 (  0.00%)       28.00 (-40.00%)       30.00 (-50.00%)       23.00 (-15.00%)
Success 2 Mean        31.20 (  0.00%)       36.70 (-17.63%)       35.20 (-12.82%)       32.50 ( -4.17%)
Success 2 Max         39.00 (  0.00%)       43.00 (-10.26%)       42.00 ( -7.69%)       43.00 (-10.26%)
Success 3 Min         85.00 (  0.00%)       86.00 ( -1.18%)       87.00 ( -2.35%)       86.00 ( -1.18%)
Success 3 Mean        86.90 (  0.00%)       87.30 ( -0.46%)       87.70 ( -0.92%)       87.20 ( -0.35%)
Success 3 Max         88.00 (  0.00%)       88.00 (  0.00%)       90.00 ( -2.27%)       89.00 ( -1.14%)

            3.17-rc7    3.17-rc7    3.17-rc7    3.17-rc7
              26-thp      27-thp      28-thp      29-thp
User         6819.54     6791.98     6817.78     6780.39
System       1060.01     1061.72     1059.55     1060.22
Elapsed      2143.61     2169.23     2151.94     2164.37

                              3.17-rc7    3.17-rc7    3.17-rc7    3.17-rc7
                                26-thp      27-thp      28-thp      29-thp
Minor Faults                 197991650   197731531   197676212   198108344
Major Faults                       467         517         485         463
Swap Ins                            55          42          55          37
Swap Outs                         2743        2628        2848        2423
Allocation stalls                 5674        6859        5830        5430
DMA allocs                          21          19          18          20
DMA32 allocs                 124822788   124717762   124599426   124998427
Normal allocs                 58689613    58661322    58715465    58613337
Movable allocs                       0           0           0           0
Direct pages scanned            425873      497589      437964      440959
Kswapd pages scanned           2106472     2092938     2123314     2137886
Kswapd pages reclaimed         2100750     2087313     2117523     2124031
Direct pages reclaimed          424875      496616      437006      439572
Kswapd efficiency                  99%         99%         99%         99%
Kswapd velocity                986.439     999.617    1016.928     984.321
Direct efficiency                  99%         99%         99%         99%
Direct velocity                199.432     237.656     209.756     203.025
Percentage direct scans            16%         19%         17%         17%
Zone normal velocity           396.728     411.978     412.730     391.261
Zone dma32 velocity            789.143     825.294     813.954     796.086
Zone dma velocity                0.000       0.000       0.000       0.000
Page writes by reclaim        2963.000    2735.600    2981.900    2640.500
Page writes file                   219         107         133         217
Page writes anon                  2743        2628        2848        2423
Page reclaim immediate            1504        1609        1622        9672
Sector Reads                   4638068     4700778     4687436     4690935
Sector Writes                 12744701    12689336    12685726    12742547
Page rescued immediate               0           0           0           0
Slabs scanned                  1612929     1704964     1659159     1670590
Direct inode steals              15564       17989       16063       17179
Kswapd inode steals              31322       31013       31563       31266
Kswapd skipped wait                  0           0           0           0
THP fault alloc                    250         227         246         223
THP collapse alloc                 517         515         504         487
THP splits                          15          13          14          11
THP fault fallback                  10          24           5          38
THP collapse fail                   17          18          16          18
Compaction stalls                 2482        2794        2687        2608
Compaction success                 894        1016         995         972
Compaction failures               1588        1778        1692        1636
Page migrate success           2306759     2283240     2298373     2228802
Page migrate failure             10645       12648       10681       10023
Compaction pages isolated      4906442     4878707     4907827     4768580
Compaction migrate scanned    40396525    46362656    44372629    42315303
Compaction free scanned      134008519   146858466   131814222   132434783
Compaction cost                   2770        2787        2789        2700
NUMA alloc hit               181150856   180941682   180895401   181254771
NUMA alloc miss                      0           0           0           0
NUMA interleave hit                  0           0           0           0
NUMA alloc local             181150856   180941682   180895401   181254771
NUMA base PTE updates                0           0           0           0
NUMA huge PMD updates                0           0           0           0
NUMA page range updates              0           0           0           0
NUMA hint faults                     0           0           0           0
NUMA hint local faults               0           0           0           0
NUMA hint local percent            100         100         100         100
NUMA pages migrated                  0           0           0           0
AutoNUMA cost                       0%          0%          0%          0%

                                                       3.17-rc7    3.17-rc7    3.17-rc7    3.17-rc7
                                                         26-thp      27-thp      28-thp      29-thp
Page alloc extfrag event                                4270316     5661910     5018754     2062787
Extfrag fragmenting                                     4268643     5660158     5016977     2061077
Extfrag fragmenting for unmovable                         21632       17627        1985        1984
Extfrag fragmenting unmovable placed with movable         12428        9011        1663        1506
Extfrag fragmenting for reclaimable                        1682        1106        1290        1401
Extfrag fragmenting reclaimable placed with movable        1480         917        1072        1132
Extfrag fragmenting for movable                         4245329     5641425     5013702     2057692


Vlastimil Babka (3):
  mm: when stealing freepages, also take pages created by splitting
    buddy page
  mm: more aggressive page stealing for UNMOVABLE allocations
  mm: always steal split buddies in fallback allocations

 mm/page_alloc.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

-- 
2.1.2


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [RFC PATCH 1/3] mm: when stealing freepages, also take pages created by splitting buddy page
  2014-12-04 17:12 ` Vlastimil Babka
@ 2014-12-04 17:12   ` Vlastimil Babka
  -1 siblings, 0 replies; 38+ messages in thread
From: Vlastimil Babka @ 2014-12-04 17:12 UTC (permalink / raw)
  To: linux-mm, Joonsoo Kim
  Cc: linux-kernel, Minchan Kim, Mel Gorman, Rik van Riel,
	David Rientjes, Vlastimil Babka

When __rmqueue_fallback() is called to allocate a page of order X, it will
find a page of order Y >= X of a fallback migratetype, which is different from
the desired migratetype. With the help of try_to_steal_freepages(), it may
also change the migratetype (to the desired one) of:

1) all currently free pages in the pageblock containing the fallback page
2) the fallback pageblock itself
3) buddy pages created by splitting the fallback page (when Y > X)

These decisions take the order Y into account, as well as the desired
migratetype, with the goal of preventing multiple fallback allocations that
could e.g. distribute UNMOVABLE allocations among multiple pageblocks.

Originally, the decision for 1) implied the decision for 3). Commit
47118af076f6 ("mm: mmzone: MIGRATE_CMA migration type added") changed that
(probably unintentionally) so that the buddy pages in case 3) are always
changed to the desired migratetype, except for CMA pageblocks.

Commit fef903efcf0c ("mm/page_allo.c: restructure free-page stealing code and
fix a bug") did some refactoring and added a comment that the case of 3) is
intended. Commit 0cbef29a7821 ("mm: __rmqueue_fallback() should respect
pageblock type") removed the comment and tried to restore the original behavior
where 1) implies 3), but due to the previous refactoring, the result is instead
that only 2) implies 3) - and the conditions for 2) are less frequently met
than conditions for 1). This may increase fragmentation in situations where the
code decides to steal all free pages from the pageblock (case 1)), but then
gives back the buddy pages produced by splitting.

This patch restores the originally intended logic where 1) implies 3). During
testing with stress-highalloc from mmtests, this has been shown to decrease the
number of events where UNMOVABLE and RECLAIMABLE allocations steal from MOVABLE
pageblocks, which can lead to permanent fragmentation. It has increased the
number of events when MOVABLE allocations steal from UNMOVABLE or RECLAIMABLE
pageblocks, but these are fixable by sync compaction and thus less harmful.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/page_alloc.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 616a2c9..548b072 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1105,12 +1105,10 @@ static int try_to_steal_freepages(struct zone *zone, struct page *page,
 
 		/* Claim the whole block if over half of it is free */
 		if (pages >= (1 << (pageblock_order-1)) ||
-				page_group_by_mobility_disabled) {
-
+				page_group_by_mobility_disabled)
 			set_pageblock_migratetype(page, start_type);
-			return start_type;
-		}
 
+		return start_type;
 	}
 
 	return fallback_type;
-- 
2.1.2
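
For readers without the tree at hand, this is roughly the shape of the whole
function with the patch applied (a reconstruction from the 3.17-era
mm/page_alloc.c, so treat names and details as approximate rather than
authoritative):

	static int try_to_steal_freepages(struct zone *zone, struct page *page,
					  int start_type, int fallback_type)
	{
		int current_order = page_order(page);

		/* Take ownership for orders >= pageblock_order */
		if (current_order >= pageblock_order) {
			change_pageblock_range(page, current_order, start_type);
			return start_type;
		}

		if (current_order >= pageblock_order / 2 ||
		    start_type == MIGRATE_RECLAIMABLE ||
		    page_group_by_mobility_disabled) {
			int pages;

			/* Case 1): steal all free pages in the pageblock */
			pages = move_freepages_block(zone, page, start_type);

			/* Case 2): claim the whole block if over half of it is free */
			if (pages >= (1 << (pageblock_order-1)) ||
					page_group_by_mobility_disabled)
				set_pageblock_migratetype(page, start_type);

			/*
			 * Case 3): returning start_type makes the caller give
			 * the desired migratetype also to the buddy pages
			 * created by splitting, so 1) again implies 3).
			 */
			return start_type;
		}

		return fallback_type;
	}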


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC PATCH 2/3] mm: more aggressive page stealing for UNMOVABLE allocations
  2014-12-04 17:12 ` Vlastimil Babka
@ 2014-12-04 17:12   ` Vlastimil Babka
  -1 siblings, 0 replies; 38+ messages in thread
From: Vlastimil Babka @ 2014-12-04 17:12 UTC (permalink / raw)
  To: linux-mm, Joonsoo Kim
  Cc: linux-kernel, Minchan Kim, Mel Gorman, Rik van Riel,
	David Rientjes, Vlastimil Babka

When allocation falls back to stealing free pages of another migratetype,
it can decide to steal extra pages, or even the whole pageblock, in order to
reduce the fragmentation that could happen if further allocation fallbacks
picked a different pageblock. In try_to_steal_freepages(), one of the
situations where extra pages are stolen is when we are trying to allocate a
MIGRATE_RECLAIMABLE page.

However, MIGRATE_UNMOVABLE allocations are not treated the same way, although
spreading such allocations over multiple fallback pageblocks is arguably even
worse than it is for RECLAIMABLE allocations. To minimize fragmentation, we
should minimize the number of such fallbacks, and thus steal as much as
possible from each fallback pageblock.

This patch thus adds a check for MIGRATE_UNMOVABLE to the decision to steal
extra free pages. When evaluating with stress-highalloc from mmtests, this
reduced the number of MIGRATE_UNMOVABLE fallbacks to roughly 1/6, and the
number of those fallbacks stealing from a MIGRATE_MOVABLE pageblock to roughly
1/3.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/page_alloc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 548b072..a14249c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1098,6 +1098,7 @@ static int try_to_steal_freepages(struct zone *zone, struct page *page,
 
 	if (current_order >= pageblock_order / 2 ||
 	    start_type == MIGRATE_RECLAIMABLE ||
+	    start_type == MIGRATE_UNMOVABLE ||
 	    page_group_by_mobility_disabled) {
 		int pages;
 
-- 
2.1.2
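
With this patch, the guard deciding whether to steal extra pages reads roughly
as follows; the lines around the one-line change are reconstructed context, so
take them as approximate:

	if (current_order >= pageblock_order / 2 ||	/* large fallback page */
	    start_type == MIGRATE_RECLAIMABLE ||	/* reclaimable request */
	    start_type == MIGRATE_UNMOVABLE ||		/* new: unmovable request */
	    page_group_by_mobility_disabled) {
		int pages;

		/* steal all currently free pages in the pageblock */
		pages = move_freepages_block(zone, page, start_type);
		...
	}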


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC PATCH 3/3] mm: always steal split buddies in fallback allocations
  2014-12-04 17:12 ` Vlastimil Babka
@ 2014-12-04 17:12   ` Vlastimil Babka
  -1 siblings, 0 replies; 38+ messages in thread
From: Vlastimil Babka @ 2014-12-04 17:12 UTC (permalink / raw)
  To: linux-mm, Joonsoo Kim
  Cc: linux-kernel, Minchan Kim, Mel Gorman, Rik van Riel,
	David Rientjes, Vlastimil Babka

When allocation falls back to another migratetype, it will steal a page of the
highest available order and, depending on this order and the desired
migratetype, it might also steal the rest of the free pages from the same
pageblock.

Given the preference for the highest available order, it is likely that the
stolen page's order will be higher than the desired order, resulting in the
stolen buddy page being split. The remaining pages after the split are
currently stolen only when the rest of the free pages are also stolen. This
can however lead to situations where for MOVABLE allocations we split e.g. an
order-4 fallback UNMOVABLE page, but steal only an order-0 page. Then on the
next MOVABLE allocation (which may be batched to fill the pcplists) we split
another order-3 or higher page, and so on. By stealing all pages that we have
split, we can avoid this further stealing.

This patch therefore adjusts the page stealing so that buddy pages created by
the split are always stolen. This affects only MOVABLE allocations, as
RECLAIMABLE and UNMOVABLE allocations already always steal the split buddies
in addition to the rest of the free pages in the pageblock.

Note that commit 47118af076f6 ("mm: mmzone: MIGRATE_CMA migration type added")
already made this change (unintentionally), but it was reverted by commit
0cbef29a7821 ("mm: __rmqueue_fallback() should respect pageblock type").
Neither included an evaluation. My evaluation with stress-highalloc from
mmtests shows about a 2.5x reduction of page stealing events for MOVABLE
allocations, without affecting the page stealing events for other allocation
migratetypes.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/page_alloc.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a14249c..82096a6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1108,11 +1108,9 @@ static int try_to_steal_freepages(struct zone *zone, struct page *page,
 		if (pages >= (1 << (pageblock_order-1)) ||
 				page_group_by_mobility_disabled)
 			set_pageblock_migratetype(page, start_type);
-
-		return start_type;
 	}
 
-	return fallback_type;
+	return start_type;
 }
 
 /* Remove an element from the buddy allocator from the fallback list */
-- 
2.1.2
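
With all three patches applied, the function collapses to the following shape
(again a reconstruction, not verbatim source): the fallback page and its split
buddies are always stolen, and only the extra stealing and the change of
pageblock ownership remain conditional:

	static int try_to_steal_freepages(struct zone *zone, struct page *page,
					  int start_type, int fallback_type)
	{
		int current_order = page_order(page);

		if (current_order >= pageblock_order) {
			change_pageblock_range(page, current_order, start_type);
			return start_type;
		}

		if (current_order >= pageblock_order / 2 ||
		    start_type == MIGRATE_RECLAIMABLE ||
		    start_type == MIGRATE_UNMOVABLE ||
		    page_group_by_mobility_disabled) {
			int pages = move_freepages_block(zone, page, start_type);

			/* Claim the whole block if over half of it is free */
			if (pages >= (1 << (pageblock_order-1)) ||
					page_group_by_mobility_disabled)
				set_pageblock_migratetype(page, start_type);
		}

		/* Buddy pages created by splitting are now always stolen */
		return start_type;
	}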


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [RFC PATCH 1/3] mm: when stealing freepages, also take pages created by splitting buddy page
  2014-12-04 17:12   ` Vlastimil Babka
@ 2014-12-08  6:54     ` Joonsoo Kim
  -1 siblings, 0 replies; 38+ messages in thread
From: Joonsoo Kim @ 2014-12-08  6:54 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, linux-kernel, Minchan Kim, Mel Gorman, Rik van Riel,
	David Rientjes

On Thu, Dec 04, 2014 at 06:12:56PM +0100, Vlastimil Babka wrote:
> When __rmqueue_fallback() is called to allocate a page of order X, it will
> find a page of order Y >= X of a fallback migratetype, which is different from
> the desired migratetype. With the help of try_to_steal_freepages(), it may
> change the migratetype (to the desired one) also of:
> 
> 1) all currently free pages in the pageblock containing the fallback page
> 2) the fallback pageblock itself
> 3) buddy pages created by splitting the fallback page (when Y > X)
> 
> These decisions take the order Y into account, as well as the desired
> migratetype, with the goal of preventing multiple fallback allocations that
> could e.g. distribute UNMOVABLE allocations among multiple pageblocks.
> 
> Originally, decision for 1) has implied the decision for 3). Commit
> 47118af076f6 ("mm: mmzone: MIGRATE_CMA migration type added") changed that
> (probably unintentionally) so that the buddy pages in case 3) are always
> changed to the desired migratetype, except for CMA pageblocks.
> 
> Commit fef903efcf0c ("mm/page_allo.c: restructure free-page stealing code and
> fix a bug") did some refactoring and added a comment that the case of 3) is
> intended. Commit 0cbef29a7821 ("mm: __rmqueue_fallback() should respect
> pageblock type") removed the comment and tried to restore the original behavior
> where 1) implies 3), but due to the previous refactoring, the result is instead
> that only 2) implies 3) - and the conditions for 2) are less frequently met
> than conditions for 1). This may increase fragmentation in situations where the
> code decides to steal all free pages from the pageblock (case 1)), but then
> gives back the buddy pages produced by splitting.
> 
> This patch restores the original intended logic where 1) implies 3). During
> testing with stress-highalloc from mmtests, this has shown to decrease the
> number of events where UNMOVABLE and RECLAIMABLE allocations steal from MOVABLE
> pageblocks, which can lead to permanent fragmentation. It has increased the
> number of events when MOVABLE allocations steal from UNMOVABLE or RECLAIMABLE
> pageblocks, but these are fixable by sync compaction and thus less harmful.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  mm/page_alloc.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 616a2c9..548b072 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1105,12 +1105,10 @@ static int try_to_steal_freepages(struct zone *zone, struct page *page,
>  
>  		/* Claim the whole block if over half of it is free */
>  		if (pages >= (1 << (pageblock_order-1)) ||
> -				page_group_by_mobility_disabled) {
> -
> +				page_group_by_mobility_disabled)
>  			set_pageblock_migratetype(page, start_type);
> -			return start_type;
> -		}
>  
> +		return start_type;
>  	}
>  
>  	return fallback_type;

The change_ownership field of the tracepoint will be wrong with this change.

Thanks.
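
(For context: __rmqueue_fallback() feeds the return value into the tracepoint
roughly as below; this is reconstructed and the exact call may differ. Once
the function returns start_type whenever any stealing happened, a
change_ownership value derived from new_type would report pageblock ownership
changes that never took place.)

	new_type = try_to_steal_freepages(zone, page, start_migratetype,
					  migratetype);
	...
	trace_mm_page_alloc_extfrag(page, order, current_order,
				    start_migratetype, migratetype, new_type);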

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC PATCH 2/3] mm: more aggressive page stealing for UNMOVABLE allocations
  2014-12-04 17:12   ` Vlastimil Babka
@ 2014-12-08  7:11     ` Joonsoo Kim
  -1 siblings, 0 replies; 38+ messages in thread
From: Joonsoo Kim @ 2014-12-08  7:11 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, linux-kernel, Minchan Kim, Mel Gorman, Rik van Riel,
	David Rientjes

On Thu, Dec 04, 2014 at 06:12:57PM +0100, Vlastimil Babka wrote:
> When allocation falls back to stealing free pages of another migratetype,
> it can decide to steal extra pages, or even the whole pageblock in order to
> reduce fragmentation, which could happen if further allocation fallbacks
> pick a different pageblock. In try_to_steal_freepages(), one of the situations
> where extra pages are stolen happens when we are trying to allocate a
> MIGRATE_RECLAIMABLE page.
> 
> However, MIGRATE_UNMOVABLE allocations are not treated the same way, although
> spreading such allocation over multiple fallback pageblocks is arguably even
> worse than it is for RECLAIMABLE allocations. To minimize fragmentation, we
> should minimize the number of such fallbacks, and thus steal as much as is
> possible from each fallback pageblock.

I'm not sure that this change is good. If we steal order-0 pages, this may be
fine. But sometimes we try to steal a high-order page, and in that case there
would be many order-0 freepages, and blindly stealing the freepages in that
pageblock makes the system more fragmented.

MIGRATE_RECLAIMABLE is a different case than MIGRATE_UNMOVABLE, because
it can be reclaimed, so excessive migratetype movement doesn't result
in permanent fragmentation.

What I'd like to do to prevent fragmentation is
1) check whether we can steal all or almost all freepages, and change
the migratetype of the pageblock.
2) If the above condition isn't met, deny the allocation and invoke
compaction.

Maybe a knob to control the behaviour would be needed.
How about it?
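
Something like the following rough sketch, perhaps. Note that
can_steal_all_freepages() is a made-up name for the "all or almost all"
check here, not an existing function:

static int try_to_steal_freepages(struct zone *zone, struct page *page,
				  int start_type, int fallback_type)
{
	/* 1) steal only if (nearly) the whole pageblock can follow */
	if (can_steal_all_freepages(zone, page, start_type)) {
		move_freepages_block(zone, page, start_type);
		set_pageblock_migratetype(page, start_type);
		return start_type;
	}

	/*
	 * 2) otherwise deny the fallback; the caller would fail the
	 * allocation and let compaction run instead
	 */
	return -1;
}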

Thanks.

> 
> This patch thus adds a check for MIGRATE_UNMOVABLE to the decision to steal
> extra free pages. When evaluating with stress-highalloc from mmtests, this has
> reduced the number of MIGRATE_UNMOVABLE fallbacks to roughly 1/6. The number
> of these fallbacks stealing from MIGRATE_MOVABLE block is reduced to 1/3.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  mm/page_alloc.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 548b072..a14249c 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1098,6 +1098,7 @@ static int try_to_steal_freepages(struct zone *zone, struct page *page,
>  
>  	if (current_order >= pageblock_order / 2 ||
>  	    start_type == MIGRATE_RECLAIMABLE ||
> +	    start_type == MIGRATE_UNMOVABLE ||
>  	    page_group_by_mobility_disabled) {
>  		int pages;
>  
> -- 
> 2.1.2
> 

* Re: [RFC PATCH 3/3] mm: always steal split buddies in fallback allocations
  2014-12-04 17:12   ` Vlastimil Babka
@ 2014-12-08  7:36     ` Joonsoo Kim
  -1 siblings, 0 replies; 38+ messages in thread
From: Joonsoo Kim @ 2014-12-08  7:36 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, linux-kernel, Minchan Kim, Mel Gorman, Rik van Riel,
	David Rientjes

On Thu, Dec 04, 2014 at 06:12:58PM +0100, Vlastimil Babka wrote:
> When allocation falls back to another migratetype, it will steal a page with
> highest available order, and (depending on this order and desired migratetype),
> it might also steal the rest of free pages from the same pageblock.
> 
> Given the preference of highest available order, it is likely that it will be
> higher than the desired order, and result in the stolen buddy page being split.
> The remaining pages after split are currently stolen only when the rest of the
> free pages are stolen. This can however lead to situations where for MOVABLE
> allocations we split e.g. order-4 fallback UNMOVABLE page, but steal only
> order-0 page. Then on the next MOVABLE allocation (which may be batched to
> fill the pcplists) we split another order-3 or higher page, etc. By stealing
> all pages that we have split, we can avoid further stealing.
> 
> This patch therefore adjusts the page stealing so that buddy pages created by
> split are always stolen. This has effect only on MOVABLE allocations, as
> RECLAIMABLE and UNMOVABLE allocations already always do that in addition to
> stealing the rest of free pages from the pageblock.

In fact, CMA also has the same problem and this patch doesn't fix it.
If a movable allocation steals a page on a CMA reserved area, the
remaining split freepages are always linked to the original CMA buddy
list. Then the next fallback allocation repeatedly selects the
highest-order freepage on the CMA area and splits it.

IMO, it'd be better to re-consider the whole fragmentation avoidance
logic.

Thanks.

> 
> Note that commit 47118af076f6 ("mm: mmzone: MIGRATE_CMA migration type added")
> has already performed this change (unintentionally), but was reverted by commit
> 0cbef29a7821 ("mm: __rmqueue_fallback() should respect pageblock type").
> Neither included evaluation. My evaluation with stress-highalloc from mmtests
> shows about 2.5x reduction of page stealing events for MOVABLE allocations,
> without affecting the page stealing events for other allocation migratetypes.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  mm/page_alloc.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index a14249c..82096a6 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1108,11 +1108,9 @@ static int try_to_steal_freepages(struct zone *zone, struct page *page,
>  		if (pages >= (1 << (pageblock_order-1)) ||
>  				page_group_by_mobility_disabled)
>  			set_pageblock_migratetype(page, start_type);
> -
> -		return start_type;
>  	}
>  
> -	return fallback_type;
> +	return start_type;
>  }
>  
>  /* Remove an element from the buddy allocator from the fallback list */
> -- 
> 2.1.2
> 

* Re: [RFC PATCH 2/3] mm: more aggressive page stealing for UNMOVABLE allocations
  2014-12-08  7:11     ` Joonsoo Kim
@ 2014-12-08 10:27       ` Vlastimil Babka
  -1 siblings, 0 replies; 38+ messages in thread
From: Vlastimil Babka @ 2014-12-08 10:27 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: linux-mm, linux-kernel, Minchan Kim, Mel Gorman, Rik van Riel,
	David Rientjes

On 12/08/2014 08:11 AM, Joonsoo Kim wrote:
> On Thu, Dec 04, 2014 at 06:12:57PM +0100, Vlastimil Babka wrote:
>> When allocation falls back to stealing free pages of another migratetype,
>> it can decide to steal extra pages, or even the whole pageblock in order to
>> reduce fragmentation, which could happen if further allocation fallbacks
>> pick a different pageblock. In try_to_steal_freepages(), one of the situations
>> where extra pages are stolen happens when we are trying to allocate a
>> MIGRATE_RECLAIMABLE page.
>>
>> However, MIGRATE_UNMOVABLE allocations are not treated the same way, although
>> spreading such allocation over multiple fallback pageblocks is arguably even
>> worse than it is for RECLAIMABLE allocations. To minimize fragmentation, we
>> should minimize the number of such fallbacks, and thus steal as much as is
>> possible from each fallback pageblock.
>
> I'm not sure that this change is good. If we steal order-0 pages,
> this may be good. But sometimes we try to steal a high-order page,
> and in this case there would be many order-0 freepages left, and
> blindly stealing the freepages in that pageblock makes the system
> more fragmented.

I don't understand. If we try to steal a high-order page (current_order >=
pageblock_order / 2), then nothing changes; the condition for extra
stealing is the same.

> MIGRATE_RECLAIMABLE is a different case than MIGRATE_UNMOVABLE, because
> it can be reclaimed, so excessive migratetype movement doesn't result
> in permanent fragmentation.

There are two kinds of "fragmentation" IMHO. First, inside a pageblock,
unmovable allocations can prevent merging of lower orders. This can get
worse if we steal multiple pages from a single pageblock but the
pageblock itself is not marked as unmovable.

The second kind of fragmentation is when unmovable allocations spread
over multiple pageblocks. Lower order allocations within each such
pageblock might still be possible, but fewer pageblocks can be
compacted into a fully free pageblock.

I think the second kind is worse, so when we do have to pollute a
movable pageblock with an unmovable allocation, we had better take as
much as possible, so we prevent polluting other pageblocks.


> What I'd like to do to prevent fragmentation is
> 1) check whether we can steal all or almost all freepages, and change
> the migratetype of the pageblock.
> 2) If the above condition isn't met, deny the allocation and invoke
> compaction.

Could work to some extent, but we also need to prevent excessive compaction.

We could also introduce a new pageblock migratetype, something like
MIGRATE_MIXED. The idea is that once a pageblock isn't used purely by
MOVABLE allocations, it's marked as MIXED, until it either becomes
marked UNMOVABLE or RECLAIMABLE by the existing mechanisms, or is fully
freed. In more detail:

- MIXED is preferred for fallback before any other migratetypes
- if a RECLAIMABLE/UNMOVABLE page allocation is stealing from a MOVABLE
pageblock and cannot mark the pageblock as RECLAIMABLE/UNMOVABLE (by
current rules), it marks it as MIXED instead.
- if a MOVABLE allocation is stealing from UNMOVABLE/RECLAIMABLE
pageblocks, it will only mark one as MOVABLE if it was fully free.
Otherwise, if current rules would result in marking it as MOVABLE (i.e.
most of it was stolen, but not all), it will mark it as MIXED instead.

This could in theory leave more MOVABLE pageblocks unspoiled by 
UNMOVABLE allocations.
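
A rough sketch of the marking rules above - MIGRATE_MIXED and
pageblock_fully_free() are of course hypothetical names here:

static void mark_pageblock_after_steal(struct page *page, int start_type,
				       int pages_moved)
{
	bool mostly_stolen = pages_moved >= (1 << (pageblock_order - 1));

	if (start_type != MIGRATE_MOVABLE) {
		/* UNMOVABLE/RECLAIMABLE stealing from a MOVABLE block */
		if (mostly_stolen)
			set_pageblock_migratetype(page, start_type);
		else
			set_pageblock_migratetype(page, MIGRATE_MIXED);
	} else {
		/* MOVABLE stealing from an UNMOVABLE/RECLAIMABLE block */
		if (pageblock_fully_free(page))
			set_pageblock_migratetype(page, MIGRATE_MOVABLE);
		else if (mostly_stolen)
			set_pageblock_migratetype(page, MIGRATE_MIXED);
	}
}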

> Maybe a knob to control the behaviour would be needed.
> How about it?

Adding new knobs is not a good solution.

> Thanks.
>
>>
>> This patch thus adds a check for MIGRATE_UNMOVABLE to the decision to steal
>> extra free pages. When evaluating with stress-highalloc from mmtests, this has
>> reduced the number of MIGRATE_UNMOVABLE fallbacks to roughly 1/6. The number
>> of these fallbacks stealing from MIGRATE_MOVABLE block is reduced to 1/3.
>>
>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
>> ---
>>   mm/page_alloc.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 548b072..a14249c 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -1098,6 +1098,7 @@ static int try_to_steal_freepages(struct zone *zone, struct page *page,
>>
>>   	if (current_order >= pageblock_order / 2 ||
>>   	    start_type == MIGRATE_RECLAIMABLE ||
>> +	    start_type == MIGRATE_UNMOVABLE ||
>>   	    page_group_by_mobility_disabled) {
>>   		int pages;
>>
>> --
>> 2.1.2
>>


* Re: [RFC PATCH 3/3] mm: always steal split buddies in fallback allocations
  2014-12-08  7:36     ` Joonsoo Kim
@ 2014-12-08 10:30       ` Vlastimil Babka
  -1 siblings, 0 replies; 38+ messages in thread
From: Vlastimil Babka @ 2014-12-08 10:30 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: linux-mm, linux-kernel, Minchan Kim, Mel Gorman, Rik van Riel,
	David Rientjes

On 12/08/2014 08:36 AM, Joonsoo Kim wrote:
> On Thu, Dec 04, 2014 at 06:12:58PM +0100, Vlastimil Babka wrote:
>> When allocation falls back to another migratetype, it will steal a page with
>> highest available order, and (depending on this order and desired migratetype),
>> it might also steal the rest of free pages from the same pageblock.
>>
>> Given the preference of highest available order, it is likely that it will be
>> higher than the desired order, and result in the stolen buddy page being split.
>> The remaining pages after split are currently stolen only when the rest of the
>> free pages are stolen. This can however lead to situations where for MOVABLE
>> allocations we split e.g. order-4 fallback UNMOVABLE page, but steal only
>> order-0 page. Then on the next MOVABLE allocation (which may be batched to
>> fill the pcplists) we split another order-3 or higher page, etc. By stealing
>> all pages that we have split, we can avoid further stealing.
>>
>> This patch therefore adjusts the page stealing so that buddy pages created by
>> split are always stolen. This has effect only on MOVABLE allocations, as
>> RECLAIMABLE and UNMOVABLE allocations already always do that in addition to
>> stealing the rest of free pages from the pageblock.
>
> In fact, CMA also has the same problem and this patch doesn't fix it.
> If a movable allocation steals a page on a CMA reserved area, the
> remaining split freepages are always linked to the original CMA buddy
> list. Then the next fallback allocation repeatedly selects the
> highest-order freepage on the CMA area and splits it.

Hm yeah, for CMA it would make more sense to steal a page of the lowest
available order, not the highest.
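
I.e. the CMA fallback could scan the free lists upwards instead of
downwards, roughly like this (illustration only, not a tested patch):

	/*
	 * Take the smallest sufficient CMA page so that high-order
	 * CMA pages are not split needlessly.
	 */
	for (current_order = order; current_order < MAX_ORDER;
	     current_order++) {
		area = &(zone->free_area[current_order]);
		if (list_empty(&area->free_list[MIGRATE_CMA]))
			continue;
		page = list_entry(area->free_list[MIGRATE_CMA].next,
				  struct page, lru);
		break;
	}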

> IMO, it'd be better to re-consider the whole fragmentation avoidance logic.
>
> Thanks.
>
>>
>> Note that commit 47118af076f6 ("mm: mmzone: MIGRATE_CMA migration type added")
>> has already performed this change (unintentionally), but was reverted by commit
>> 0cbef29a7821 ("mm: __rmqueue_fallback() should respect pageblock type").
>> Neither included evaluation. My evaluation with stress-highalloc from mmtests
>> shows about 2.5x reduction of page stealing events for MOVABLE allocations,
>> without affecting the page stealing events for other allocation migratetypes.
>>
>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
>> ---
>>   mm/page_alloc.c | 4 +---
>>   1 file changed, 1 insertion(+), 3 deletions(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index a14249c..82096a6 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -1108,11 +1108,9 @@ static int try_to_steal_freepages(struct zone *zone, struct page *page,
>>   		if (pages >= (1 << (pageblock_order-1)) ||
>>   				page_group_by_mobility_disabled)
>>   			set_pageblock_migratetype(page, start_type);
>> -
>> -		return start_type;
>>   	}
>>
>> -	return fallback_type;
>> +	return start_type;
>>   }
>>
>>   /* Remove an element from the buddy allocator from the fallback list */
>> --
>> 2.1.2
>>


* Re: [RFC PATCH 1/3] mm: when stealing freepages, also take pages created by splitting buddy page
  2014-12-04 17:12   ` Vlastimil Babka
@ 2014-12-08 11:07     ` Mel Gorman
  -1 siblings, 0 replies; 38+ messages in thread
From: Mel Gorman @ 2014-12-08 11:07 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Joonsoo Kim, linux-kernel, Minchan Kim, Rik van Riel,
	David Rientjes

On Thu, Dec 04, 2014 at 06:12:56PM +0100, Vlastimil Babka wrote:
> When __rmqueue_fallback() is called to allocate a page of order X, it will
> find a page of order Y >= X of a fallback migratetype, which is different from
> the desired migratetype. With the help of try_to_steal_freepages(), it may
> change the migratetype (to the desired one) also of:
> 
> 1) all currently free pages in the pageblock containing the fallback page
> 2) the fallback pageblock itself
> 3) buddy pages created by splitting the fallback page (when Y > X)
> 
> These decisions take the order Y into account, as well as the desired
> migratetype, with the goal of preventing multiple fallback allocations that
> could e.g. distribute UNMOVABLE allocations among multiple pageblocks.
> 
> Originally, decision for 1) has implied the decision for 3). Commit
> 47118af076f6 ("mm: mmzone: MIGRATE_CMA migration type added") changed that
> (probably unintentionally) so that the buddy pages in case 3) are always
> changed to the desired migratetype, except for CMA pageblocks.
> 
> Commit fef903efcf0c ("mm/page_allo.c: restructure free-page stealing code and
> fix a bug") did some refactoring and added a comment that the case of 3) is
> intended. Commit 0cbef29a7821 ("mm: __rmqueue_fallback() should respect
> pageblock type") removed the comment and tried to restore the original behavior
> where 1) implies 3), but due to the previous refactoring, the result is instead
> that only 2) implies 3) - and the conditions for 2) are less frequently met
> than conditions for 1). This may increase fragmentation in situations where the
> code decides to steal all free pages from the pageblock (case 1)), but then
> gives back the buddy pages produced by splitting.
> 
> This patch restores the original intended logic where 1) implies 3). During
> testing with stress-highalloc from mmtests, this has been shown to decrease the
> number of events where UNMOVABLE and RECLAIMABLE allocations steal from MOVABLE
> pageblocks, which can lead to permanent fragmentation. It has increased the
> number of events when MOVABLE allocations steal from UNMOVABLE or RECLAIMABLE
> pageblocks, but these are fixable by sync compaction and thus less harmful.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Assuming the tracepoint issue Joonsoo pointed out gets corrected;

Acked-by: Mel Gorman <mgorman@suse.de>

I'm kicking myself that I missed the effect of 47118af076f6 when I was
reviewing it. I knew allocation success rates were worse than they used
to be but had been blaming changes in aggression of reclaim and
compaction.

-- 
Mel Gorman
SUSE Labs

* Re: [RFC PATCH 2/3] mm: more aggressive page stealing for UNMOVABLE allocations
  2014-12-04 17:12   ` Vlastimil Babka
@ 2014-12-08 11:16     ` Mel Gorman
  -1 siblings, 0 replies; 38+ messages in thread
From: Mel Gorman @ 2014-12-08 11:16 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Joonsoo Kim, linux-kernel, Minchan Kim, Rik van Riel,
	David Rientjes

On Thu, Dec 04, 2014 at 06:12:57PM +0100, Vlastimil Babka wrote:
> When allocation falls back to stealing free pages of another migratetype,
> it can decide to steal extra pages, or even the whole pageblock in order to
> reduce fragmentation, which could happen if further allocation fallbacks
> pick a different pageblock. In try_to_steal_freepages(), one of the situations
> where extra pages are stolen happens when we are trying to allocate a
> MIGRATE_RECLAIMABLE page.
> 
> However, MIGRATE_UNMOVABLE allocations are not treated the same way, although
> spreading such allocation over multiple fallback pageblocks is arguably even
> worse than it is for RECLAIMABLE allocations. To minimize fragmentation, we
> should minimize the number of such fallbacks, and thus steal as much as is
> possible from each fallback pageblock.
> 
> This patch thus adds a check for MIGRATE_UNMOVABLE to the decision to steal
> extra free pages. When evaluating with stress-highalloc from mmtests, this has
> reduced the number of MIGRATE_UNMOVABLE fallbacks to roughly 1/6. The number
> of these fallbacks stealing from MIGRATE_MOVABLE block is reduced to 1/3.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Acked-by: Mel Gorman <mgorman@suse.de>

Note that this is a slightly tricky tradeoff. UNMOVABLE allocations will now
be stealing more of a pageblock during fallback events. This will reduce the
probability that unmovable fallbacks will happen in the future. However,
it also increases the probability that a movable allocation will fallback
in the future. This is particularly true for kernel-build stress workloads
as the likelihood is that unmovable allocations are stealing from movable
pageblocks.  The reason this happens is that the movable free lists are
smaller after an unmovable fallback event so a movable fallback event
happens sooner than it would have otherwise.

Movable fallback events are less severe than unmovable fallback events as
they can be moved or freed later so the patch heads the right direction. The
side-effect is simply interesting to note.

-- 
Mel Gorman
SUSE Labs

* Re: [RFC PATCH 3/3] mm: always steal split buddies in fallback allocations
  2014-12-04 17:12   ` Vlastimil Babka
@ 2014-12-08 11:26     ` Mel Gorman
  -1 siblings, 0 replies; 38+ messages in thread
From: Mel Gorman @ 2014-12-08 11:26 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Joonsoo Kim, linux-kernel, Minchan Kim, Rik van Riel,
	David Rientjes

On Thu, Dec 04, 2014 at 06:12:58PM +0100, Vlastimil Babka wrote:
> When allocation falls back to another migratetype, it will steal a page with
> highest available order, and (depending on this order and desired migratetype),
> it might also steal the rest of free pages from the same pageblock.
> 
> Given the preference of highest available order, it is likely that it will be
> higher than the desired order, and result in the stolen buddy page being split.
> The remaining pages after split are currently stolen only when the rest of the
> free pages are stolen.

The original intent was that the stolen fallback buddy page would be
added to the requested migratetype freelists. This was independent of
whether all other free pages in the pageblock were moved or whether the
pageblock migratetype was updated.

> This can however lead to situations where for MOVABLE
> allocations we split e.g. order-4 fallback UNMOVABLE page, but steal only
> order-0 page. Then on the next MOVABLE allocation (which may be batched to
> fill the pcplists) we split another order-3 or higher page, etc. By stealing
> all pages that we have split, we can avoid further stealing.
> 
> This patch therefore adjusts the page stealing so that buddy pages created by
> split are always stolen. This has effect only on MOVABLE allocations, as
> RECLAIMABLE and UNMOVABLE allocations already always do that in addition to
> stealing the rest of free pages from the pageblock.
> 

This restores the intended behaviour.

> Note that commit 47118af076f6 ("mm: mmzone: MIGRATE_CMA migration type added")
> has already performed this change (unintentionally), but was reverted by commit
> 0cbef29a7821 ("mm: __rmqueue_fallback() should respect pageblock type").
> Neither included evaluation. My evaluation with stress-highalloc from mmtests
> shows about 2.5x reduction of page stealing events for MOVABLE allocations,
> without affecting the page stealing events for other allocation migratetypes.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Acked-by: Mel Gorman <mgorman@suse.de>

-- 
Mel Gorman
SUSE Labs

* Re: [RFC PATCH 1/3] mm: when stealing freepages, also take pages created by splitting buddy page
  2014-12-04 17:12   ` Vlastimil Babka
@ 2014-12-09  3:02     ` Minchan Kim
  -1 siblings, 0 replies; 38+ messages in thread
From: Minchan Kim @ 2014-12-09  3:02 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Joonsoo Kim, linux-kernel, Mel Gorman, Rik van Riel,
	David Rientjes

On Thu, Dec 04, 2014 at 06:12:56PM +0100, Vlastimil Babka wrote:
> When __rmqueue_fallback() is called to allocate a page of order X, it will
> find a page of order Y >= X of a fallback migratetype, which is different from
> the desired migratetype. With the help of try_to_steal_freepages(), it may
> change the migratetype (to the desired one) also of:
> 
> 1) all currently free pages in the pageblock containing the fallback page
> 2) the fallback pageblock itself
> 3) buddy pages created by splitting the fallback page (when Y > X)
> 
> These decisions take the order Y into account, as well as the desired
> migratetype, with the goal of preventing multiple fallback allocations that
> could e.g. distribute UNMOVABLE allocations among multiple pageblocks.
> 
> Originally, decision for 1) has implied the decision for 3). Commit
> 47118af076f6 ("mm: mmzone: MIGRATE_CMA migration type added") changed that
> (probably unintentionally) so that the buddy pages in case 3) are always
> changed to the desired migratetype, except for CMA pageblocks.
> 
> Commit fef903efcf0c ("mm/page_allo.c: restructure free-page stealing code and
> fix a bug") did some refactoring and added a comment that the case of 3) is
> intended. Commit 0cbef29a7821 ("mm: __rmqueue_fallback() should respect
> pageblock type") removed the comment and tried to restore the original behavior
> where 1) implies 3), but due to the previous refactoring, the result is instead
> that only 2) implies 3) - and the conditions for 2) are less frequently met
> than conditions for 1). This may increase fragmentation in situations where the
> code decides to steal all free pages from the pageblock (case 1)), but then
> gives back the buddy pages produced by splitting.
> 
> This patch restores the original intended logic where 1) implies 3). During
> testing with stress-highalloc from mmtests, this has been shown to decrease the
> number of events where UNMOVABLE and RECLAIMABLE allocations steal from MOVABLE
> pageblocks, which can lead to permanent fragmentation. It has increased the
> number of events when MOVABLE allocations steal from UNMOVABLE or RECLAIMABLE
> pageblocks, but these are fixable by sync compaction and thus less harmful.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Minchan Kim <minchan@kernel.org>

I expect you will Cc -stable when you respin with the fix pointed out
by Joonsoo.

-- 
Kind regards,
Minchan Kim

* Re: [RFC PATCH 2/3] mm: more aggressive page stealing for UNMOVABLE allocations
  2014-12-04 17:12   ` Vlastimil Babka
@ 2014-12-09  3:09     ` Minchan Kim
  -1 siblings, 0 replies; 38+ messages in thread
From: Minchan Kim @ 2014-12-09  3:09 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Joonsoo Kim, linux-kernel, Mel Gorman, Rik van Riel,
	David Rientjes

On Thu, Dec 04, 2014 at 06:12:57PM +0100, Vlastimil Babka wrote:
> When allocation falls back to stealing free pages of another migratetype,
> it can decide to steal extra pages, or even the whole pageblock in order to
> reduce fragmentation, which could happen if further allocation fallbacks
> pick a different pageblock. In try_to_steal_freepages(), one of the situations
> where extra pages are stolen happens when we are trying to allocate a
> MIGRATE_RECLAIMABLE page.
> 
> However, MIGRATE_UNMOVABLE allocations are not treated the same way, although
> spreading such allocation over multiple fallback pageblocks is arguably even
> worse than it is for RECLAIMABLE allocations. To minimize fragmentation, we
> should minimize the number of such fallbacks, and thus steal as much as is
> possible from each fallback pageblock.

Fair enough.

> 
> This patch thus adds a check for MIGRATE_UNMOVABLE to the decision to steal
> extra free pages. When evaluating with stress-highalloc from mmtests, this has
> reduced the number of MIGRATE_UNMOVABLE fallbacks to roughly 1/6. The number
> of these fallbacks stealing from MIGRATE_MOVABLE block is reduced to 1/3.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Minchan Kim <minchan@kernel.org>

Nit:

Please fix the comment on try_to_steal_freepages().
We don't bias MIGRATE_RECLAIMABLE any more, so remove that part.
Instead, put some words about the policy and why.
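
For example, something along these lines (only a suggested wording):

/*
 * When an allocation falls back to another migratetype, try to steal
 * extra free pages, or even the whole pageblock, so that further
 * allocations of this migratetype are satisfied from the same
 * pageblock instead of polluting additional ones. Stealing is most
 * aggressive for UNMOVABLE and RECLAIMABLE allocations, because
 * spreading those over many MOVABLE pageblocks causes the worst
 * long-term fragmentation, while MOVABLE pages stolen into other
 * pageblocks can later be moved or freed by compaction.
 */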

Thanks.

> ---
>  mm/page_alloc.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 548b072..a14249c 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1098,6 +1098,7 @@ static int try_to_steal_freepages(struct zone *zone, struct page *page,
>  
>  	if (current_order >= pageblock_order / 2 ||
>  	    start_type == MIGRATE_RECLAIMABLE ||
> +	    start_type == MIGRATE_UNMOVABLE ||
>  	    page_group_by_mobility_disabled) {
>  		int pages;
>  
> -- 
> 2.1.2
> 

-- 
Kind regards,
Minchan Kim

* Re: [RFC PATCH 3/3] mm: always steal split buddies in fallback allocations
  2014-12-04 17:12   ` Vlastimil Babka
@ 2014-12-09  3:17     ` Minchan Kim
  -1 siblings, 0 replies; 38+ messages in thread
From: Minchan Kim @ 2014-12-09  3:17 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Joonsoo Kim, linux-kernel, Mel Gorman, Rik van Riel,
	David Rientjes

On Thu, Dec 04, 2014 at 06:12:58PM +0100, Vlastimil Babka wrote:
> When allocation falls back to another migratetype, it will steal a page of
> the highest available order, and (depending on this order and the desired
> migratetype) it might also steal the rest of the free pages from the same
> pageblock.
> 
> Given the preference for the highest available order, it is likely to be
> higher than the desired order, and result in the stolen buddy page being
> split. The remaining pages after the split are currently stolen only when
> the rest of the free pages are stolen. This can however lead to situations
> where for MOVABLE allocations we split e.g. an order-4 fallback UNMOVABLE
> page, but steal only an order-0 page. Then on the next MOVABLE allocation
> (which may be batched to fill the pcplists) we split another order-3 or
> higher page, etc. By stealing all pages that we have split, we can avoid
> further stealing.
> 
> This patch therefore adjusts the page stealing so that buddy pages created
> by a split are always stolen. This has an effect only on MOVABLE
> allocations, as RECLAIMABLE and UNMOVABLE allocations already always do that
> in addition to stealing the rest of the free pages from the pageblock.
> 
> Note that commit 7118af076f6 ("mm: mmzone: MIGRATE_CMA migration type added")
> had already performed this change (unintentionally), but it was reverted by
> commit 0cbef29a7821 ("mm: __rmqueue_fallback() should respect pageblock type").
> Neither included an evaluation. My evaluation with stress-highalloc from mmtests
> shows about 2.5x reduction of page stealing events for MOVABLE allocations,
> without affecting the page stealing events for other allocation migratetypes.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Minchan Kim <minchan@kernel.org>

Nit:

From this patch on, try_to_steal_freepages always returns start_type except in
the CMA case, so we could factor the CMA case out of try_to_steal_freepages
and put the check right before calling try_to_steal_freepages.

The benefit is that we could make try_to_steal_freepages's return type void
and remove the fallback_type argument (i.e., make the function simpler).
Additionally, we could move set_freepage_migratetype into
try_to_steal_freepages so that we could remove the new_type variable
in __rmqueue_fallback.

trace_mm_page_alloc_extfrag could work without new_type by using
get_pageblock_migratetype.
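
If I read the suggestion right, the result might look roughly like this (an
untested sketch on top of this series, not the actual follow-up patch):

	static void try_to_steal_freepages(struct zone *zone, struct page *page,
					   int start_type)
	{
		int current_order = page_order(page);

		/* Take ownership for orders >= pageblock_order */
		if (current_order >= pageblock_order) {
			change_pageblock_range(page, current_order, start_type);
			return;
		}

		if (current_order >= pageblock_order / 2 ||
		    start_type == MIGRATE_RECLAIMABLE ||
		    start_type == MIGRATE_UNMOVABLE ||
		    page_group_by_mobility_disabled) {
			int pages = move_freepages_block(zone, page, start_type);

			/* Claim the whole block if over half of it is free */
			if (pages >= (1 << (pageblock_order-1)) ||
					page_group_by_mobility_disabled)
				set_pageblock_migratetype(page, start_type);
		}
	}

__rmqueue_fallback() would then do the is_migrate_cma() check itself before
calling it, use start_migratetype for expand() and set_freepage_migratetype(),
and the tracepoint could use get_pageblock_migratetype() as suggested.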

Thanks.

> ---
>  mm/page_alloc.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index a14249c..82096a6 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1108,11 +1108,9 @@ static int try_to_steal_freepages(struct zone *zone, struct page *page,
>  		if (pages >= (1 << (pageblock_order-1)) ||
>  				page_group_by_mobility_disabled)
>  			set_pageblock_migratetype(page, start_type);
> -
> -		return start_type;
>  	}
>  
> -	return fallback_type;
> +	return start_type;
>  }
>  
>  /* Remove an element from the buddy allocator from the fallback list */
> -- 
> 2.1.2
> 

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC PATCH 2/3] mm: more aggressive page stealing for UNMOVABLE allocations
  2014-12-08 10:27       ` Vlastimil Babka
@ 2014-12-09  8:28         ` Joonsoo Kim
  -1 siblings, 0 replies; 38+ messages in thread
From: Joonsoo Kim @ 2014-12-09  8:28 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, linux-kernel, Minchan Kim, Mel Gorman, Rik van Riel,
	David Rientjes

On Mon, Dec 08, 2014 at 11:27:27AM +0100, Vlastimil Babka wrote:
> On 12/08/2014 08:11 AM, Joonsoo Kim wrote:
> >On Thu, Dec 04, 2014 at 06:12:57PM +0100, Vlastimil Babka wrote:
> >>When allocation falls back to stealing free pages of another migratetype,
> >>it can decide to steal extra pages, or even the whole pageblock in order to
> >>reduce fragmentation, which could happen if further allocation fallbacks
> >>pick a different pageblock. In try_to_steal_freepages(), one of the situations
> >>where extra pages are stolen happens when we are trying to allocate a
> >>MIGRATE_RECLAIMABLE page.
> >>
> >>However, MIGRATE_UNMOVABLE allocations are not treated the same way, although
> >>spreading such allocations over multiple fallback pageblocks is arguably even
> >>worse than it is for RECLAIMABLE allocations. To minimize fragmentation, we
> >>should minimize the number of such fallbacks, and thus steal as much as
> >>possible from each fallback pageblock.
> >
> >I'm not sure that this change is good. If we steal order 0 pages,
> >this may be good. But, sometimes, we try to steal a high order page
> >and, in this case, there would be many order 0 freepages, and blindly
> >stealing freepages in that pageblock makes the system more fragmented.
> 
> I don't understand. If we try to steal a high order page
> (current_order >= pageblock_order / 2), then nothing changes, the
> condition for extra stealing is the same.

More accurately, I mean mid order pages (current_order <
pageblock_order / 2), but not order 0, such as order 2, 3, 4(?).
In this case, perhaps, the system has enough unmovable order 0 freepages,
so we don't need to worry about the second kind of fragmentation you
mentioned below. Stealing one mid order freepage is enough to satisfy the
request.

> 
> >MIGRATE_RECLAIMABLE is a different case than MIGRATE_UNMOVABLE, because
> >it can be reclaimed so excessive migratetype movement doesn't result
> >in permanent fragmentation.
> 
> There's two kinds of "fragmentation" IMHO. First, inside a
> pageblock, unmovable allocations can prevent merging of lower
> orders. This can get worse if we steal multiple pages from a single
> pageblock, but the pageblock itself is not marked as unmovable.

So, what's the intention of not marking the pageblock itself as unmovable?
I guess that if many pages are moved to unmovable, they can't easily be
moved back, and this pageblock is already highly fragmented. So, serving
more unmovable requests from this pageblock by changing the pageblock
migratetype makes more sense to me.

> The second kind of fragmentation is when unmovable allocations spread
> over multiple pageblocks. Lower order allocations within each such
> pageblock might still be possible, but fewer pageblocks are able to
> compact to have the whole pageblock free.
> 
> I think the second kind is worse, so when we do have to pollute a
> movable pageblock with an unmovable allocation, we'd better take as much
> as possible, so we prevent polluting other pageblocks.

I agree.

> 
> >What I'd like to do to prevent fragmentation is
> >1) check whether we can steal all or almost freepages and change
> >migratetype of pageblock.
> >2) If above condition isn't met, deny allocation and invoke compaction.
> 
> Could work to some extent, but we also need to prevent excessive compaction.

So, I suggest a knob to control the behaviour. In small memory systems,
fragmentation occurs frequently, so the system can't handle even an order 2
request. In such a system, excessive compaction is acceptable because
it is better than the system going down.
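
As a rough sketch of the 1)/2) policy quoted above (illustrative only;
free_pages_in_block() and the steal threshold are assumptions standing in
for whatever the knob would tune, not existing kernel code):

	static bool try_steal_whole_block(struct zone *zone, struct page *page,
					  int start_type)
	{
		/* free_pages_in_block() is an assumed helper that counts
		 * the free pages currently sitting in this pageblock */
		int free = free_pages_in_block(zone, page);

		/* 1) steal only if (almost) the whole pageblock is free;
		 * the threshold is the tunable being discussed */
		if (free >= (1 << (pageblock_order - 1))) {
			move_freepages_block(zone, page, start_type);
			set_pageblock_migratetype(page, start_type);
			return true;
		}

		/* 2) otherwise deny the fallback; the caller would invoke
		 * compaction instead */
		return false;
	}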

> 
> We could also introduce a new pageblock migratetype, something like
> MIGRATE_MIXED. The idea is that once pageblock isn't used purely by
> MOVABLE allocations, it's marked as MIXED, until it either becomes
> marked UNMOVABLE or RECLAIMABLE by the existing mechanisms, or is
> fully freed. In more detail:
> 
> - MIXED is preferred for fallback before any other migratetypes
> - if RECLAIMABLE/UNMOVABLE page allocation is stealing from MOVABLE
> pageblock and cannot mark pageblock as RECLAIMABLE/UNMOVABLE (by
> current rules), it marks it as MIXED instead.
> - if MOVABLE allocation is stealing from UNMOVABLE/RECLAIMABLE
> pageblocks, it will only mark it as MOVABLE if it was fully free.
> Otherwise, if current rules would result in marking it as MOVABLE
> (i.e. most of it was stolen, but not all) it will mark it as MIXED
> instead.
> 
> This could in theory leave more MOVABLE pageblocks unspoiled by
> UNMOVABLE allocations.
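
To make the quoted idea concrete, a minimal sketch of those marking rules,
assuming a hypothetical new MIGRATE_MIXED type (illustrative, not existing
kernel code):

	static void mark_block_after_steal(struct page *page, int start_type,
					   int pages_moved, bool was_fully_free)
	{
		/* same "most of the block" test the current code uses */
		bool claim = pages_moved >= (1 << (pageblock_order - 1));

		if (start_type == MIGRATE_MOVABLE) {
			/* MOVABLE reclaims a block only if it was fully
			 * free, otherwise at most downgrade it to MIXED */
			if (was_fully_free)
				set_pageblock_migratetype(page, MIGRATE_MOVABLE);
			else if (claim)
				set_pageblock_migratetype(page, MIGRATE_MIXED);
		} else if (claim) {
			/* UNMOVABLE/RECLAIMABLE stole most of the block:
			 * claim it outright, as the current rules do */
			set_pageblock_migratetype(page, start_type);
		} else {
			/* stole from a MOVABLE block but couldn't claim it:
			 * mark it MIXED instead of leaving it MOVABLE */
			set_pageblock_migratetype(page, MIGRATE_MIXED);
		}
	}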

I guess that we can do it without introducing a new pageblock migratetype.
Just always marking it as RECLAIMABLE/UNMOVABLE when a
RECLAIMABLE/UNMOVABLE page allocation is stealing from MOVABLE would
have the same effect.

Thanks.

> >Maybe knob to control behaviour would be needed.
> >How about it?
> 
> Adding new knobs is not a good solution.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC PATCH 2/3] mm: more aggressive page stealing for UNMOVABLE allocations
  2014-12-09  8:28         ` Joonsoo Kim
@ 2014-12-09  9:12           ` Vlastimil Babka
  -1 siblings, 0 replies; 38+ messages in thread
From: Vlastimil Babka @ 2014-12-09  9:12 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: linux-mm, linux-kernel, Minchan Kim, Mel Gorman, Rik van Riel,
	David Rientjes

On 12/09/2014 09:28 AM, Joonsoo Kim wrote:
> On Mon, Dec 08, 2014 at 11:27:27AM +0100, Vlastimil Babka wrote:
>> On 12/08/2014 08:11 AM, Joonsoo Kim wrote:
>>>
>>> I'm not sure that this change is good. If we steal order 0 pages,
>>> this may be good. But, sometimes, we try to steal a high order page
>>> and, in this case, there would be many order 0 freepages, and blindly
>>> stealing freepages in that pageblock makes the system more fragmented.
>>
>> I don't understand. If we try to steal a high order page
>> (current_order >= pageblock_order / 2), then nothing changes, the
>> condition for extra stealing is the same.
>
> More accurately, I mean mid order pages (current_order <
> pageblock_order / 2), but not order 0, such as order 2, 3, 4(?).
> In this case, perhaps, the system has enough unmovable order 0 freepages,
> so we don't need to worry about the second kind of fragmentation you
> mentioned below. Stealing one mid order freepage is enough to satisfy the
> request.

OK.

>>
>>> MIGRATE_RECLAIMABLE is a different case than MIGRATE_UNMOVABLE, because
>>> it can be reclaimed so excessive migratetype movement doesn't result
>>> in permanent fragmentation.
>>
>> There's two kinds of "fragmentation" IMHO. First, inside a
>> pageblock, unmovable allocations can prevent merging of lower
>> orders. This can get worse if we steal multiple pages from a single
>> pageblock, but the pageblock itself is not marked as unmovable.
>
> So, what's the intention of not marking the pageblock itself as unmovable?
> I guess that if many pages are moved to unmovable, they can't easily be
> moved back, and this pageblock is already highly fragmented. So, serving
> more unmovable requests from this pageblock by changing the pageblock
> migratetype makes more sense to me.

There's the danger that we mark too many pageblocks as unmovable during some
unmovable allocation spike, and even if the number of unmovable allocated
pages later decreases, they will keep being allocated from many
unmovable-marked pageblocks, and none will become empty enough to be
remarked back. If we don't mark pageblocks unmovable as aggressively,
it's possible that the unmovable allocations in a partially-stolen
pageblock will eventually be freed, and no more unmovable allocations
will occur in that pageblock if it's not marked as unmovable.

>> The second kind of fragmentation is when unmovable allocations spread
>> over multiple pageblocks. Lower order allocations within each such
>> pageblock might still be possible, but fewer pageblocks are able to
>> compact to have the whole pageblock free.
>>
>> I think the second kind is worse, so when we do have to pollute a
>> movable pageblock with an unmovable allocation, we'd better take as much
>> as possible, so we prevent polluting other pageblocks.
>
> I agree.
>
>>
>>> What I'd like to do to prevent fragmentation is
>>> 1) check whether we can steal all or almost freepages and change
>>> migratetype of pageblock.
>>> 2) If above condition isn't met, deny allocation and invoke compaction.
>>
>> Could work to some extent, but we also need to prevent excessive compaction.
>
> So, I suggest a knob to control the behaviour. In small memory systems,
> fragmentation occurs frequently, so the system can't handle even an order 2
> request. In such a system, excessive compaction is acceptable because
> it is better than the system going down.

So you say that in these systems, order 2 requests fail because of page 
stealing?

>>
>> We could also introduce a new pageblock migratetype, something like
>> MIGRATE_MIXED. The idea is that once pageblock isn't used purely by
>> MOVABLE allocations, it's marked as MIXED, until it either becomes
>> marked UNMOVABLE or RECLAIMABLE by the existing mechanisms, or is
>> fully freed. In more detail:
>>
>> - MIXED is preferred for fallback before any other migratetypes
>> - if RECLAIMABLE/UNMOVABLE page allocation is stealing from MOVABLE
>> pageblock and cannot mark pageblock as RECLAIMABLE/UNMOVABLE (by
>> current rules), it marks it as MIXED instead.
>> - if MOVABLE allocation is stealing from UNMOVABLE/RECLAIMABLE
>> pageblocks, it will only mark it as MOVABLE if it was fully free.
>> Otherwise, if current rules would result in marking it as MOVABLE
>> (i.e. most of it was stolen, but not all) it will mark it as MIXED
>> instead.
>>
>> This could in theory leave more MOVABLE pageblocks unspoiled by
>> UNMOVABLE allocations.
>
> I guess that we can do it without introducing a new pageblock migratetype.
> Just always marking it as RECLAIMABLE/UNMOVABLE when a
> RECLAIMABLE/UNMOVABLE page allocation is stealing from MOVABLE would
> have the same effect.

See the argument above. The difference with MIXED marking is that new 
unmovable allocations would take from these pageblocks only as a 
fallback. Primarily it would try to reuse a more limited number of 
unmovable-marked pageblocks.
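
For illustration, the existing fallbacks[] array in mm/page_alloc.c could
hypothetically grow a MIXED column so that MIXED blocks are raided before
pristine ones (again just a sketch; there is no MIGRATE_MIXED in the kernel):

	static int fallbacks[MIGRATE_TYPES][4] = {
		[MIGRATE_UNMOVABLE]   = { MIGRATE_MIXED, MIGRATE_RECLAIMABLE,
					  MIGRATE_MOVABLE,   MIGRATE_RESERVE },
		[MIGRATE_RECLAIMABLE] = { MIGRATE_MIXED, MIGRATE_UNMOVABLE,
					  MIGRATE_MOVABLE,   MIGRATE_RESERVE },
		[MIGRATE_MOVABLE]     = { MIGRATE_MIXED, MIGRATE_RECLAIMABLE,
					  MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
	};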

But this is just an idea not related to the series at hand. Yes, it 
could be better, these are all heuristics and any change is a potential 
tradeoff.

Also we need to keep in mind that ultimately, anything we devise cannot 
prevent fragmentation 100%. We cannot predict the future, so we don't 
know which unmovable allocations will be freed soon, and which will stay
for a longer time. To minimize fragmentation, we would need to recognize
those longer-lived unmovable allocations, so we could put them together 
in as few pageblocks as possible.

> Thanks.
>
>>> Maybe knob to control behaviour would be needed.
>>> How about it?
>>
>> Adding new knobs is not a good solution.
>


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC PATCH 2/3] mm: more aggressive page stealing for UNMOVABLE allocations
  2014-12-09  3:09     ` Minchan Kim
@ 2014-12-09  9:47       ` Mel Gorman
  -1 siblings, 0 replies; 38+ messages in thread
From: Mel Gorman @ 2014-12-09  9:47 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Vlastimil Babka, linux-mm, Joonsoo Kim, linux-kernel,
	Rik van Riel, David Rientjes

On Tue, Dec 09, 2014 at 12:09:40PM +0900, Minchan Kim wrote:
> On Thu, Dec 04, 2014 at 06:12:57PM +0100, Vlastimil Babka wrote:
> > When allocation falls back to stealing free pages of another migratetype,
> > it can decide to steal extra pages, or even the whole pageblock in order to
> > reduce fragmentation, which could happen if further allocation fallbacks
> > pick a different pageblock. In try_to_steal_freepages(), one of the situations
> > where extra pages are stolen happens when we are trying to allocate a
> > MIGRATE_RECLAIMABLE page.
> > 
> > However, MIGRATE_UNMOVABLE allocations are not treated the same way, although
> > spreading such allocations over multiple fallback pageblocks is arguably even
> > worse than it is for RECLAIMABLE allocations. To minimize fragmentation, we
> > should minimize the number of such fallbacks, and thus steal as much as
> > possible from each fallback pageblock.
> 
> Fair enough.
> 

Just to be absolutely sure, check that data and see what the number of
MIGRATE_UNMOVABLE blocks looks like over time. Make sure it's not just
continually growing. MIGRATE_RECLAIMABLE and MIGRATE_MOVABLE blocks were
expected to be freed if the system was aggressively reclaimed, but the same
is not true of MIGRATE_UNMOVABLE. Even if all processes are
aggressively reclaimed, for example, the page tables are still there.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC PATCH 2/3] mm: more aggressive page stealing for UNMOVABLE allocations
  2014-12-09  9:12           ` Vlastimil Babka
@ 2014-12-10  6:32             ` Joonsoo Kim
  -1 siblings, 0 replies; 38+ messages in thread
From: Joonsoo Kim @ 2014-12-10  6:32 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, linux-kernel, Minchan Kim, Mel Gorman, Rik van Riel,
	David Rientjes

On Tue, Dec 09, 2014 at 10:12:15AM +0100, Vlastimil Babka wrote:
> On 12/09/2014 09:28 AM, Joonsoo Kim wrote:
> >On Mon, Dec 08, 2014 at 11:27:27AM +0100, Vlastimil Babka wrote:
> >>On 12/08/2014 08:11 AM, Joonsoo Kim wrote:
> >>>
> >>>I'm not sure that this change is good. If we steal order 0 pages,
> >>>this may be good. But, sometimes, we try to steal a high order page
> >>>and, in this case, there would be many order 0 freepages, and blindly
> >>>stealing freepages in that pageblock makes the system more fragmented.
> >>
> >>I don't understand. If we try to steal a high order page
> >>(current_order >= pageblock_order / 2), then nothing changes, the
> >>condition for extra stealing is the same.
> >
> >More accurately, I mean mid order pages (current_order <
> >pageblock_order / 2), but not order 0, such as order 2, 3, 4(?).
> >In this case, perhaps, the system has enough unmovable order 0 freepages,
> >so we don't need to worry about the second kind of fragmentation you
> >mentioned below. Stealing one mid order freepage is enough to satisfy the
> >request.
> 
> OK.
> 
> >>
> >>>MIGRATE_RECLAIMABLE is a different case than MIGRATE_UNMOVABLE, because
> >>>it can be reclaimed so excessive migratetype movement doesn't result
> >>>in permanent fragmentation.
> >>
> >>There's two kinds of "fragmentation" IMHO. First, inside a
> >>pageblock, unmovable allocations can prevent merging of lower
> >>orders. This can get worse if we steal multiple pages from a single
> >>pageblock, but the pageblock itself is not marked as unmovable.
> >
> >So, what's the intention of not marking the pageblock itself as unmovable?
> >I guess that if many pages are moved to unmovable, they can't easily be
> >moved back, and this pageblock is already highly fragmented. So, serving
> >more unmovable requests from this pageblock by changing the pageblock
> >migratetype makes more sense to me.
> 
> There's the danger that we mark too many pageblocks as unmovable during
> some unmovable allocation spike, and even if the number of unmovable
> allocated pages later decreases, they will keep being allocated from
> many unmovable-marked pageblocks, and none will become empty
> enough to be remarked back. If we don't mark pageblocks unmovable as
> aggressively, it's possible that the unmovable allocations in a
> partially-stolen pageblock will eventually be freed, and no more
> unmovable allocations will occur in that pageblock if it's not
> marked as unmovable.

Hmm... Yes, but it seems to be really workload dependent. I'll check
the effect of changing the pageblock migratetype aggressively on my test bed.

> 
> >>The second kind of fragmentation is when unmovable allocations spread
> >>over multiple pageblocks. Lower order allocations within each such
> >>pageblock might still be possible, but fewer pageblocks are able to
> >>compact to have the whole pageblock free.
> >>
> >>I think the second kind is worse, so when we do have to pollute a
> >>movable pageblock with an unmovable allocation, we'd better take as much
> >>as possible, so we prevent polluting other pageblocks.
> >
> >I agree.
> >
> >>
> >>>What I'd like to do to prevent fragmentation is
> >>>1) check whether we can steal all or almost freepages and change
> >>>migratetype of pageblock.
> >>>2) If above condition isn't met, deny allocation and invoke compaction.
> >>
> >>Could work to some extent, but we also need to prevent excessive compaction.
> >
> >So, I suggest a knob to control the behaviour. In small memory systems,
> >fragmentation occurs frequently, so the system can't handle even an order 2
> >request. In such a system, excessive compaction is acceptable because
> >it is better than the system going down.
> 
> So you say that in these systems, order 2 requests fail because of
> page stealing?

Yes. At some point, system memory becomes highly fragmented and order 2
requests fail. It is probably caused by page stealing, but I didn't analyze it.

> >>
> >>We could also introduce a new pageblock migratetype, something like
> >>MIGRATE_MIXED. The idea is that once pageblock isn't used purely by
> >>MOVABLE allocations, it's marked as MIXED, until it either becomes
> >>marked UNMOVABLE or RECLAIMABLE by the existing mechanisms, or is
> >>fully freed. In more detail:
> >>
> >>- MIXED is preferred for fallback before any other migratetypes
> >>- if RECLAIMABLE/UNMOVABLE page allocation is stealing from MOVABLE
> >>pageblock and cannot mark pageblock as RECLAIMABLE/UNMOVABLE (by
> >>current rules), it marks it as MIXED instead.
> >>- if MOVABLE allocation is stealing from UNMOVABLE/RECLAIMABLE
> >>pageblocks, it will only mark it as MOVABLE if it was fully free.
> >>Otherwise, if current rules would result in marking it as MOVABLE
> >>(i.e. most of it was stolen, but not all) it will mark it as MIXED
> >>instead.
> >>
> >>This could in theory leave more MOVABLE pageblocks unspoiled by
> >>UNMOVABLE allocations.
> >
> >I guess that we can do it without introducing a new pageblock migratetype.
> >Just always marking it as RECLAIMABLE/UNMOVABLE when a
> >RECLAIMABLE/UNMOVABLE page allocation is stealing from MOVABLE would
> >have the same effect.
> 
> See the argument above. The difference with MIXED marking is that
> new unmovable allocations would take from these pageblocks only as a
> fallback. Primarily it would try to reuse a more limited number of
> unmovable-marked pageblocks.

Ah, I understand now. Looks like a good idea.

> But this is just an idea not related to the series at hand. Yes, it
> could be better, these are all heuristics and any change is a
> potential tradeoff.
> 
> Also we need to keep in mind that ultimately, anything we devise
> cannot prevent fragmentation 100%. We cannot predict the future, so
> we don't know which unmovable allocations will be freed soon, and
> which will stay for longer time. To minimize fragmentation, we would
> need to recognize those longer-lived unmovable allocations, so we
> could put them together in as few pageblocks as possible.
> 
> >Thanks.
> >
> >>>Maybe knob to control behaviour would be needed.
> >>>How about it?
> >>
> >>Adding new knobs is not a good solution.
> >
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread

Thread overview: 38+ messages
2014-12-04 17:12 [PATCH 0/3] page stealing tweaks Vlastimil Babka
2014-12-04 17:12 ` Vlastimil Babka
2014-12-04 17:12 ` [RFC PATCH 1/3] mm: when stealing freepages, also take pages created by splitting buddy page Vlastimil Babka
2014-12-04 17:12   ` Vlastimil Babka
2014-12-08  6:54   ` Joonsoo Kim
2014-12-08  6:54     ` Joonsoo Kim
2014-12-08 11:07   ` Mel Gorman
2014-12-08 11:07     ` Mel Gorman
2014-12-09  3:02   ` Minchan Kim
2014-12-09  3:02     ` Minchan Kim
2014-12-04 17:12 ` [RFC PATCH 2/3] mm: more aggressive page stealing for UNMOVABLE allocations Vlastimil Babka
2014-12-04 17:12   ` Vlastimil Babka
2014-12-08  7:11   ` Joonsoo Kim
2014-12-08  7:11     ` Joonsoo Kim
2014-12-08 10:27     ` Vlastimil Babka
2014-12-08 10:27       ` Vlastimil Babka
2014-12-09  8:28       ` Joonsoo Kim
2014-12-09  8:28         ` Joonsoo Kim
2014-12-09  9:12         ` Vlastimil Babka
2014-12-09  9:12           ` Vlastimil Babka
2014-12-10  6:32           ` Joonsoo Kim
2014-12-10  6:32             ` Joonsoo Kim
2014-12-08 11:16   ` Mel Gorman
2014-12-08 11:16     ` Mel Gorman
2014-12-09  3:09   ` Minchan Kim
2014-12-09  3:09     ` Minchan Kim
2014-12-09  9:47     ` Mel Gorman
2014-12-09  9:47       ` Mel Gorman
2014-12-04 17:12 ` [RFC PATCH 3/3] mm: always steal split buddies in fallback allocations Vlastimil Babka
2014-12-04 17:12   ` Vlastimil Babka
2014-12-08  7:36   ` Joonsoo Kim
2014-12-08  7:36     ` Joonsoo Kim
2014-12-08 10:30     ` Vlastimil Babka
2014-12-08 10:30       ` Vlastimil Babka
2014-12-08 11:26   ` Mel Gorman
2014-12-08 11:26     ` Mel Gorman
2014-12-09  3:17   ` Minchan Kim
2014-12-09  3:17     ` Minchan Kim
