linux-kernel.vger.kernel.org archive mirror
* mm: kswapd struggles reclaiming the pages on 64GB server
@ 2016-08-12 20:52 Andriy Tkachuk
  2016-08-17 11:43 ` Michal Hocko
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Andriy Tkachuk @ 2016-08-12 20:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: Mel Gorman

Hi,

our user-space application uses a large amount of anon pages (a private
mapping of a large file, bigger than the 64GB of RAM available in the
system) which are rarely accessed and are supposed to be swapped out.
Instead, we see that most of these pages are kept in memory while the
system suffers from a lack of free memory and degraded overall
performance (especially disk I/O; vm.swappiness=100 does not help).
kswapd scans millions of pages per second but reclaims only hundreds per second.
Here are snapshots of some counters at 5-second intervals:

$ egrep 'Cached|nr_.*active_anon|pgsteal_.*_normal|pgscan_kswapd_normal|pgrefill_normal|nr_vmscan_write|nr_swap|pgact'
proc-*-0616-1605[345]* | sed 's/:/ /' | sort -sk 2,2
proc-meminfo-0616-160539.txt Cached:           347936 kB
proc-meminfo-0616-160549.txt Cached:           316316 kB
proc-meminfo-0616-160559.txt Cached:           322264 kB
proc-meminfo-0616-160539.txt SwapCached:      2853064 kB
proc-meminfo-0616-160549.txt SwapCached:      2853168 kB
proc-meminfo-0616-160559.txt SwapCached:      2853280 kB
proc-vmstat-0616-160535.txt nr_active_anon 14508616
proc-vmstat-0616-160545.txt nr_active_anon 14513725
proc-vmstat-0616-160555.txt nr_active_anon 14515197
proc-vmstat-0616-160535.txt nr_inactive_anon 747407
proc-vmstat-0616-160545.txt nr_inactive_anon 744846
proc-vmstat-0616-160555.txt nr_inactive_anon 744509
proc-vmstat-0616-160535.txt nr_vmscan_write 5589095
proc-vmstat-0616-160545.txt nr_vmscan_write 5589097
proc-vmstat-0616-160555.txt nr_vmscan_write 5589097
proc-vmstat-0616-160535.txt pgactivate 246016824
proc-vmstat-0616-160545.txt pgactivate 246033242
proc-vmstat-0616-160555.txt pgactivate 246042064
proc-vmstat-0616-160535.txt pgrefill_normal 22763262
proc-vmstat-0616-160545.txt pgrefill_normal 22768020
proc-vmstat-0616-160555.txt pgrefill_normal 22768178
proc-vmstat-0616-160535.txt pgscan_kswapd_normal 111985367420
proc-vmstat-0616-160545.txt pgscan_kswapd_normal 111996845554
proc-vmstat-0616-160555.txt pgscan_kswapd_normal 112028276639
proc-vmstat-0616-160535.txt pgsteal_direct_normal 344064
proc-vmstat-0616-160545.txt pgsteal_direct_normal 344064
proc-vmstat-0616-160555.txt pgsteal_direct_normal 344064
proc-vmstat-0616-160535.txt pgsteal_kswapd_normal 53817848
proc-vmstat-0616-160545.txt pgsteal_kswapd_normal 53818626
proc-vmstat-0616-160555.txt pgsteal_kswapd_normal 53818637

The pgrefill_normal and pgactivate counters show that only a few
hundred pages per second move from the active to the inactive list and
vice versa - which is comparable with what was reclaimed. So it looks
like kswapd mostly scans the pages from the inactive list in a kind of
loop and does not even get a chance to look at the pages on the active
list (where most of the application's anon pages are located).
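
In absolute numbers, the deltas between the first and last snapshot
above work out to (back-of-the-envelope, straight from the dump):

  pgscan_kswapd_normal:  112028276639 - 111985367420 = 42909219  pages scanned
  pgsteal_kswapd_normal:     53818637 -     53817848 =      789  pages reclaimed
  pgrefill_normal:           22768178 -     22763262 =     4916  active-list pages scanned
  pgactivate:               246042064 -    246016824 =    25240  pages activated

i.e. roughly 54,000 pages scanned for every page actually reclaimed,
while the active list is barely touched.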

The kernel version: linux-3.10.0-229.14.1.el7.

Any ideas? Would it be useful to change inactive_ratio dynamically in
such cases so that more pages could be moved from the active to the
inactive list and get a chance to be reclaimed? (Note: when the
application is restarted, the problem disappears for a while (days)
until the corresponding number of privately mapped pages are dirtied
again.)

Thank you,
   Andriy

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: mm: kswapd struggles reclaiming the pages on 64GB server
  2016-08-12 20:52 mm: kswapd struggles reclaiming the pages on 64GB server Andriy Tkachuk
@ 2016-08-17 11:43 ` Michal Hocko
  2016-08-22 18:16   ` Andriy Tkachuk
       [not found] ` <20160818140513.GA2453@blaptop>
  2016-08-22 18:37 ` Andriy Tkachuk
  2 siblings, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2016-08-17 11:43 UTC (permalink / raw)
  To: Andriy Tkachuk; +Cc: linux-kernel, Mel Gorman, linux-mm, Johannes Weiner

[CCing linux-mm and Johannes]

On Fri 12-08-16 21:52:20, Andriy Tkachuk wrote:
> Hi,
> 
> our user-space application uses a large amount of anon pages (a private
> mapping of a large file, bigger than the 64GB of RAM available in the
> system) which are rarely accessed and are supposed to be swapped out.
> Instead, we see that most of these pages are kept in memory while the
> system suffers from a lack of free memory and degraded overall
> performance (especially disk I/O; vm.swappiness=100 does not help).
> kswapd scans millions of pages per second but reclaims only hundreds per second.

I haven't looked at your numbers deeply but this smells like the
long-standing problem/limitation we have. We try really hard not to
swap out and rather to reclaim the page cache, because a swap refault
tends to be more disruptive in many cases. Not all, though, and the
thrashing-like behavior you see is certainly undesirable.

Johannes has been looking into that area recently. Have a look at
http://lkml.kernel.org/r/20160606194836.3624-1-hannes@cmpxchg.org

> Here are snapshots of some counters at 5-second intervals:
> 
> $ egrep 'Cached|nr_.*active_anon|pgsteal_.*_normal|pgscan_kswapd_normal|pgrefill_normal|nr_vmscan_write|nr_swap|pgact'
> proc-*-0616-1605[345]* | sed 's/:/ /' | sort -sk 2,2
> proc-meminfo-0616-160539.txt Cached:           347936 kB
> proc-meminfo-0616-160549.txt Cached:           316316 kB
> proc-meminfo-0616-160559.txt Cached:           322264 kB
> proc-meminfo-0616-160539.txt SwapCached:      2853064 kB
> proc-meminfo-0616-160549.txt SwapCached:      2853168 kB
> proc-meminfo-0616-160559.txt SwapCached:      2853280 kB
> proc-vmstat-0616-160535.txt nr_active_anon 14508616
> proc-vmstat-0616-160545.txt nr_active_anon 14513725
> proc-vmstat-0616-160555.txt nr_active_anon 14515197
> proc-vmstat-0616-160535.txt nr_inactive_anon 747407
> proc-vmstat-0616-160545.txt nr_inactive_anon 744846
> proc-vmstat-0616-160555.txt nr_inactive_anon 744509
> proc-vmstat-0616-160535.txt nr_vmscan_write 5589095
> proc-vmstat-0616-160545.txt nr_vmscan_write 5589097
> proc-vmstat-0616-160555.txt nr_vmscan_write 5589097
> proc-vmstat-0616-160535.txt pgactivate 246016824
> proc-vmstat-0616-160545.txt pgactivate 246033242
> proc-vmstat-0616-160555.txt pgactivate 246042064
> proc-vmstat-0616-160535.txt pgrefill_normal 22763262
> proc-vmstat-0616-160545.txt pgrefill_normal 22768020
> proc-vmstat-0616-160555.txt pgrefill_normal 22768178
> proc-vmstat-0616-160535.txt pgscan_kswapd_normal 111985367420
> proc-vmstat-0616-160545.txt pgscan_kswapd_normal 111996845554
> proc-vmstat-0616-160555.txt pgscan_kswapd_normal 112028276639
> proc-vmstat-0616-160535.txt pgsteal_direct_normal 344064
> proc-vmstat-0616-160545.txt pgsteal_direct_normal 344064
> proc-vmstat-0616-160555.txt pgsteal_direct_normal 344064
> proc-vmstat-0616-160535.txt pgsteal_kswapd_normal 53817848
> proc-vmstat-0616-160545.txt pgsteal_kswapd_normal 53818626
> proc-vmstat-0616-160555.txt pgsteal_kswapd_normal 53818637
> 
> The pgrefill_normal and pgactivate counters show that only a few
> hundred pages per second move from the active to the inactive list and
> vice versa - which is comparable with what was reclaimed. So it looks
> like kswapd mostly scans the pages from the inactive list in a kind of
> loop and does not even get a chance to look at the pages on the active
> list (where most of the application's anon pages are located).
> 
> The kernel version: linux-3.10.0-229.14.1.el7.
> 
> Any ideas? Would it be useful to change inactive_ratio dynamically in
> such cases so that more pages could be moved from the active to the
> inactive list and get a chance to be reclaimed? (Note: when the
> application is restarted, the problem disappears for a while (days)
> until the corresponding number of privately mapped pages are dirtied
> again.)
> 
> Thank you,
>    Andriy

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: mm: kswapd struggles reclaiming the pages on 64GB server
       [not found] ` <20160818140513.GA2453@blaptop>
@ 2016-08-18 17:43   ` Andriy Tkachuk
  2016-08-22 18:10   ` Andriy Tkachuk
  1 sibling, 0 replies; 9+ messages in thread
From: Andriy Tkachuk @ 2016-08-18 17:43 UTC (permalink / raw)
  To: Minchan Kim; +Cc: linux-kernel, Mel Gorman

Hi Minchan,

That's an interesting idea, although the counters contradict the theory:

proc-vmstat-0616-160535.txt nr_active_file 44616
proc-vmstat-0616-160545.txt nr_active_file 43471
proc-vmstat-0616-160555.txt nr_active_file 44477
proc-vmstat-0616-160535.txt nr_inactive_file 55534
proc-vmstat-0616-160545.txt nr_inactive_file 35602
proc-vmstat-0616-160555.txt nr_inactive_file 36108
proc-vmstat-0616-160535.txt pgscan_kswapd_normal 111985367420
proc-vmstat-0616-160545.txt pgscan_kswapd_normal 111996845554
proc-vmstat-0616-160555.txt pgscan_kswapd_normal 112028276639
proc-vmstat-0616-160535.txt pgsteal_kswapd_normal 53817848
proc-vmstat-0616-160545.txt pgsteal_kswapd_normal 53818626
proc-vmstat-0616-160555.txt pgsteal_kswapd_normal 53818637

Anyway, I will give it a try if inactive_ratio=1 (which I'm currently
testing) does not help.

Thank you,
  Andriy

On Thu, Aug 18, 2016 at 3:05 PM, Minchan Kim <minchan@kernel.org> wrote:
> Currently, the VM never reclaims anonymous pages if the size of the
> inactive file LRU is greater than that of the active file LRU. It even
> makes the user's vm.swappiness tuning void.
>
> Could you rebuild your kernel? I want to see what happens if we
> remove the part below from get_scan_count().
>
> http://lxr.free-electrons.com/source/mm/vmscan.c?v=3.10#L1728
>
> 1727         /*
> 1728          * There is enough inactive page cache, do not reclaim
> 1729          * anything from the anonymous working set right now.
> 1730          */
> 1731         if (!inactive_file_is_low(lruvec)) {
> 1732                 scan_balance = SCAN_FILE;
> 1733                 goto out;
> 1734         }
>
> Thanks.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: mm: kswapd struggles reclaiming the pages on 64GB server
       [not found] ` <20160818140513.GA2453@blaptop>
  2016-08-18 17:43   ` Andriy Tkachuk
@ 2016-08-22 18:10   ` Andriy Tkachuk
  1 sibling, 0 replies; 9+ messages in thread
From: Andriy Tkachuk @ 2016-08-22 18:10 UTC (permalink / raw)
  To: Minchan Kim; +Cc: linux-kernel, Mel Gorman

Hi Minchan,

That's an interesting idea, although the counters contradict the theory:

proc-vmstat-0616-160535.txt nr_active_file 44616
proc-vmstat-0616-160545.txt nr_active_file 43471
proc-vmstat-0616-160555.txt nr_active_file 44477
proc-vmstat-0616-160535.txt nr_inactive_file 55534
proc-vmstat-0616-160545.txt nr_inactive_file 35602
proc-vmstat-0616-160555.txt nr_inactive_file 36108
proc-vmstat-0616-160535.txt pgscan_kswapd_normal 111985367420
proc-vmstat-0616-160545.txt pgscan_kswapd_normal 111996845554
proc-vmstat-0616-160555.txt pgscan_kswapd_normal 112028276639
proc-vmstat-0616-160535.txt pgsteal_kswapd_normal 53817848
proc-vmstat-0616-160545.txt pgsteal_kswapd_normal 53818626
proc-vmstat-0616-160555.txt pgsteal_kswapd_normal 53818637

Anyway, I will give it a try if inactive_ratio=1 (which I'm currently
testing) does not help.

Thank you,
  Andriy

On Thu, Aug 18, 2016 at 3:05 PM, Minchan Kim <minchan@kernel.org> wrote:
>
> Currently, the VM never reclaims anonymous pages if the size of the
> inactive file LRU is greater than that of the active file LRU. It even
> makes the user's vm.swappiness tuning void.
>
> Could you rebuild your kernel? I want to see what happens if we
> remove the part below from get_scan_count().
>
> http://lxr.free-electrons.com/source/mm/vmscan.c?v=3.10#L1728
>
> 1727         /*
> 1728          * There is enough inactive page cache, do not reclaim
> 1729          * anything from the anonymous working set right now.
> 1730          */
> 1731         if (!inactive_file_is_low(lruvec)) {
> 1732                 scan_balance = SCAN_FILE;
> 1733                 goto out;
> 1734         }
>
> Thanks.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: mm: kswapd struggles reclaiming the pages on 64GB server
  2016-08-17 11:43 ` Michal Hocko
@ 2016-08-22 18:16   ` Andriy Tkachuk
  0 siblings, 0 replies; 9+ messages in thread
From: Andriy Tkachuk @ 2016-08-22 18:16 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-kernel, Mel Gorman, linux-mm, Johannes Weiner

Hi Michal.

Thank you for the reply.

It looks like the root cause of the problem we are facing is a bit
different, although the ultimate effect is similar - poor swapping
effectiveness.

As far as I understand, Johannes is trying to fix the balancing between
the anon and file lists. But in my case it looks like the anon pages
which are idle for a long time and could be swapped out are all just
sitting on the active list and never move to the inactive list, so they
get no chance to be scanned and eventually swapped out. (See the
/proc/vmstat samples and explanations in my previous mail. BTW, the
sample interval there is 10 seconds, not 5 - my typo.)

It looks like in my case the system load enters a steady state where
all the scanned pages from the inactive list become referenced very
soon. So kswapd scans aggressively, but mostly the inactive list, where
it can hardly find anything to reclaim. The inactive list is therefore
not shortened and, as a result, is not refilled from the active one.
That's why the anon pages on the active list do not even get a chance
to be scanned. Note: the zone's inactive_ratio is more than 10 on 64GB
RAM systems, so the inactive list is much smaller than the active one
in my case.
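
For illustration, the zone ratio is computed as int_sqrt(10 * zone size
in GB) in calculate_zone_inactive_ratio(), so assuming most of the 64GB
sits in a single Normal zone (rough numbers):

  gb    ~= 62
  ratio  = int_sqrt(10 * 62) = int_sqrt(620) ~= 24

and aging of the active anon list only kicks in when
active > ratio * inactive. With nr_active_anon ~= 14.5M and
nr_inactive_anon ~= 0.75M the observed ratio is only about 19:1, below
that ~24:1 threshold, so the active anon list is never aged at all.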

  Andriy

On Wed, Aug 17, 2016 at 12:43 PM, Michal Hocko <mhocko@kernel.org> wrote:
> [CCing linux-mm and Johannes]
>
>
> I haven't looked at your numbers deeply but this smells like the
> long-standing problem/limitation we have. We try really hard not to
> swap out and rather to reclaim the page cache, because a swap refault
> tends to be more disruptive in many cases. Not all, though, and the
> thrashing-like behavior you see is certainly undesirable.
>
> Johannes has been looking into that area recently. Have a look at
> http://lkml.kernel.org/r/20160606194836.3624-1-hannes@cmpxchg.org
>
> --
> Michal Hocko
> SUSE Labs

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: mm: kswapd struggles reclaiming the pages on 64GB server
  2016-08-12 20:52 mm: kswapd struggles reclaiming the pages on 64GB server Andriy Tkachuk
  2016-08-17 11:43 ` Michal Hocko
       [not found] ` <20160818140513.GA2453@blaptop>
@ 2016-08-22 18:37 ` Andriy Tkachuk
  2016-08-22 22:46   ` Andriy Tkachuk
  2 siblings, 1 reply; 9+ messages in thread
From: Andriy Tkachuk @ 2016-08-22 18:37 UTC (permalink / raw)
  To: linux-kernel; +Cc: Mel Gorman

The following patch resolved the problem:

--- linux-3.10.0-229.20.1.el7.x86_64.orig/mm/page_alloc.c    2015-09-24 15:47:25.000000000 +0000
+++ linux-3.10.0-229.20.1.el7.x86_64/mm/page_alloc.c    2016-08-15 09:49:46.922240569 +0000
@@ -5592,16 +5592,7 @@
  */
 static void __meminit calculate_zone_inactive_ratio(struct zone *zone)
 {
-       unsigned int gb, ratio;
-
-       /* Zone size in gigabytes */
-       gb = zone->managed_pages >> (30 - PAGE_SHIFT);
-       if (gb)
-               ratio = int_sqrt(10 * gb);
-       else
-               ratio = 1;
-
-       zone->inactive_ratio = ratio;
+       zone->inactive_ratio = 1;
 }

 static void __meminit setup_per_zone_inactive_ratio(void)

Now, kswapd works Ok:

# sar -rB -s 12:40:00
Linux 3.10.0-229.20.1.el7.mmir.x86_64
(castor-dev2-1-ssu-1-1.xy01.xyratex.com) 22/08/16 _x86_64_ (20 CPU)

12:40:01     pgpgin/s pgpgout/s   fault/s  majflt/s  pgfree/s
pgscank/s pgscand/s pgsteal/s    %vmeff
12:50:01        12.93 421699.50   1768.24      0.10 115379.86
59.40      0.16     58.61     98.40
13:00:01        15.12 1486930.99   3068.52      0.24 404668.48
145.59     10.86    123.66     79.04
13:10:01         2.93 1491112.20   3468.10      0.12 405929.73
131.39      2.27    132.57     99.18
13:20:01         1.54 1472652.82   3484.32      0.10 400732.43
142.20      0.85    139.14     97.26
13:30:01         3.96 1463064.37   3039.36      0.12 398160.98
147.01      2.37    143.24     95.89
13:40:01         4.10 1446720.90   3451.41      0.12 394217.53
148.88      0.63    145.94     97.61
13:50:01         5.97 1463028.44   3520.28      0.14 399012.98
139.21      0.69    133.79     95.63
14:00:01        13.85 1445805.09   3071.23      0.30 393987.10
131.68      8.85    137.08     97.55
14:10:01        25.07 1421646.44   3738.69      0.39 387701.97
164.25     20.15    166.93     90.53
14:20:01        11.89 1379723.10   4270.80      0.23 375872.58
146.70     14.93    125.28     77.51
14:30:01         8.31 1375180.31   5472.30      0.17 373244.28
134.46      3.91    119.24     86.18
14:40:01        10.11 1334036.49   8850.42      0.15 360701.23
159.57      4.10    154.27     94.25
14:50:01        82.80 1140616.24  12696.72      0.46 306044.83
159.30      3.28    125.51     77.20
15:00:01        19.21 1088806.11  14202.19      0.26 291642.70
161.19      4.74    111.28     67.06
15:10:01        27.08 1046627.43  17154.44      0.34 281435.13
192.43      7.53    102.38     51.20
15:20:01        68.72 1028855.37  14164.93      0.66 275401.73
215.65      4.13    135.70     61.74
15:30:01        16.75 992275.54  12239.61      0.23 265369.64
182.36     10.87     99.25     51.36
15:40:01       150.85 940090.67  12283.44      1.11 251461.55
299.20     23.75    142.78     44.21
15:50:01       132.14 977937.08  12619.63      1.00 261673.06
220.62     64.57    135.61     47.55
16:00:01        17.95 1024651.95  12030.00      0.28 274089.74
184.80     18.48    104.86     51.59
16:10:01        41.26 1023482.82  12726.94      0.38 274037.38
234.26     35.27    109.40     40.59
16:20:01        76.12 1026993.21  12549.89      0.47 274872.77
206.96     19.29    124.67     55.11
16:30:03        17.41 1012199.77  12135.95      0.27 270494.82
161.94     10.27    108.83     63.19
16:40:01        69.82 1016362.58  12766.07      0.64 271505.17
119.43      9.42    112.50     87.31
16:50:01        17.72 1026519.75  12774.39      0.18 274760.91
179.48     20.01    138.00     69.18
17:00:01         6.88 1033279.20  12364.86      0.13 276506.10
126.42      6.44     90.20     67.89
17:10:01        33.26 1018956.92  12867.66      0.49 272069.13
202.53      7.44    115.20     54.87
17:20:01        28.07 1040438.90  13080.03      0.35 278356.95
188.47      1.71    116.80     61.41
17:30:01        13.69 1052387.04  12603.56      0.16 281346.41
149.10     16.91    102.65     61.84
17:40:01        17.55 1223806.22   8258.37      0.30 330923.67
214.27      7.30    146.13     65.95
17:50:01        20.46 1424863.99   3607.95      0.33 388637.98
217.85     17.75    136.05     57.75
18:00:01         9.47 1435175.96   3076.28      0.21 391356.34
185.68      3.27    146.21     77.38
18:10:01        39.88 1408818.72   3579.11      0.53 384292.74
174.22      6.73    148.55     82.09
18:20:01        78.72 1408298.79   3394.73      0.54 384043.36
204.96     10.88    148.06     68.60
18:30:01        51.71 1408936.46   3101.97      0.31 384366.70
211.51     19.30    148.59     64.38
Average:        32.95 1200040.29   8385.19      0.34 324404.13
172.66     11.40    126.54     68.75

12:40:01    kbmemfree kbmemused  %memused kbbuffers  kbcached
kbcommit   %commit  kbactive   kbinact   kbdirty
12:50:01       451392  64439956     99.30     47944   2263748
17990440      0.60  56481888   3822316     87220
13:00:01       395572  64495776     99.39     47060   2156628
18012880      0.60  56586092   3776320    100392
13:10:01       375588  64515760     99.42     46372   2015544
18023380      0.60  56666464   3724456     82656
13:20:01       371580  64519768     99.43     42960   1862684
18038108      0.60  56753352   3646544     74712
13:30:01       387664  64503684     99.40     38288   1699400
17997108      0.60  56815292   3579896    100368
13:40:01       407416  64483932     99.37     37756   1530260
18050920      0.60  56885756   3495732     96496
13:50:01       384380  64506968     99.41     37492   1395668
18007024      0.60  56961776   3447744     69308
14:00:01       370912  64520436     99.43     37084   1249500
18042408      0.60  57049020   3374072     85240
14:10:01       430832  64460516     99.34     36476   1051056
18056976      0.60  57099724   3275024     72736
14:20:01       409092  64482256     99.37     36064    930640
18045188      0.60  57138080   3265412     90468
14:30:01       383788  64507560     99.41     36132    815988
18005000      0.60  57241368   3199932     73492
14:40:01       454772  64436576     99.30     35832    620456
18031016      0.60  57300680   3079968     62212
14:50:01       389056  64502292     99.40     35844    570248
18188116      0.61  57347356   3100696     50184
15:00:01       376236  64515112     99.42     35624    471872
18120144      0.60  57433844   3023324     31188
15:10:01       351076  64540272     99.46     36192    484056
18144660      0.61  57441692   3040976     43268
15:20:01       369376  64521972     99.43     35892    396844
18194364      0.61  57479424   2993376     24244
15:30:01       350816  64540532     99.46     16988    403452
18201032      0.61  57462032   3021052      5084
15:40:01       361708  64529640     99.44     35780    395412
18215232      0.61  57460724   3012920     48356
15:50:01       367740  64523608     99.43     35904    432400
18237484      0.61  57475064   2987216     52160
16:00:01       364612  64526736     99.44     30772    435952
18243788      0.61  57445236   3029688      6156
16:10:01       371876  64519472     99.43     36216    476668
18233160      0.61  57443644   3024052     31988
16:20:01       369688  64521660     99.43     35896    455088
18224132      0.61  57437980   3027880     28900
16:30:03       369128  64522220     99.43     36060    452908
18258964      0.61  57419672   3036052     38896
16:40:01       369420  64521928     99.43     36452    521608
18239212      0.61  57447508   3017172     15288
16:50:01       412064  64479284     99.36     36644    536480
18319720      0.61  57414720   3004872     31308
17:00:01       362868  64528480     99.44     36220    514224
18371112      0.61  57454484   3012064     33012
17:10:01       381992  64509356     99.41     36128    507728
18305896      0.61  57445048   3006644     35220
17:20:01       348128  64543220     99.46     36396    527712
18319784      0.61  57457988   2999732     36024
17:30:01       365352  64525996     99.44     36316    519996
18359776      0.61  57444096   3013648     44488
17:40:01       401872  64489476     99.38     36040    506764
18365400      0.61  57417724   3002256     98176
17:50:01       385680  64505668     99.41     35936    539116
18340608      0.61  57440976   3000548    101128
18:00:01       388136  64503212     99.40     35948    530852
18339400      0.61  57416176   3016880     82168
18:10:01       399332  64492016     99.38     35904    542400
18359172      0.61  57400064   3023684     73284
18:20:01       369256  64522092     99.43     35752    544800
18386912      0.61  57439340   3016916     91680
18:30:01       364256  64527092     99.44     30092    448544
18448408      0.62  57481544   2977968     66912
Average:       383219  64508129     99.41     36527    823048
18191912      0.61  57259595   3173629     58983

You can compare with how it was before:

# sar -rB -11 -e 10:00:00
Linux 3.10.0-229.20.1.el7.x86_64
(castor-dev2-1-ssu-1-1.xy01.xyratex.com) 11/08/16 _x86_64_ (20 CPU)

00:00:01     pgpgin/s pgpgout/s   fault/s  majflt/s  pgfree/s
pgscank/s pgscand/s pgsteal/s    %vmeff
00:10:01        16.15 1404641.02   3294.83      0.30 384068.82
232.61      2.51    130.05     55.31
00:20:01         4.74 1386234.27   3264.20      0.17 378183.96
131.42     46.21    150.36     84.65
00:30:01        16.81 1357082.95   2856.56      0.44 370191.72
171.20      7.29    131.70     73.78
00:40:01        10.56 1380877.77   3330.86      0.21 376939.96
137.89      0.13    129.86     94.09
00:50:01        13.36 1389993.97   3514.01      0.24 379609.43
129.31      2.11    130.64     99.41
01:00:01         4.81 1384278.89   2788.69      0.18 378195.35
97.28     34.40    126.75     96.26
01:10:01        19.40 1397710.30   3254.28      0.48 382363.61
159.35     16.09    144.62     82.43
01:20:01        20.13 1373687.69   3373.39      0.40 375604.74
183.20      8.65    142.67     74.37
01:30:01         8.07 1384095.36   2937.47      0.22 377786.50
140.68      2.34    105.38     73.68
01:40:01        29.87 1356288.95   4059.10      0.66 369618.89
180.17     22.32    139.33     68.81
01:50:01        12.11 1360606.12   6005.05      0.30 369811.12
162.28     12.94    133.57     76.23
02:00:01        15.43 1310682.12   8775.43      0.38 354317.43
140.11      2.45    124.60     87.40
02:10:01       160.55 1183661.76  12517.79      1.19 317775.22
245.93      7.94    159.73     62.92
02:20:01        13.92 1115703.40  12939.00      0.34 299746.54
139.83     25.68    111.89     67.60
02:30:01        27.84 1082827.92  12784.33      0.51 289803.97
166.12     12.25    116.07     65.07
02:40:01       107.16 1042108.13  13104.45      1.66 280838.24
536.21     52.80    198.50     33.70
02:50:01       200.74 1013103.67  12483.72      1.91 272091.10
180.10     40.44    117.32     53.20
03:00:01        45.46 951048.89  11588.68      0.70 253985.85
152.91     17.33    107.89     63.37
03:10:01       274.33 914538.01  11580.86      1.84 244733.49
276.00     23.02    157.40     52.64
03:20:01       230.53 946778.38  11850.88      1.68 252944.34
255.13     25.37    153.57     54.75
03:30:01        33.31 951669.66  11559.20      0.49 254101.69
231.74      8.91    105.05     43.65
03:40:01       193.92 948298.99  11803.87      1.27 253376.52
205.85     40.59    135.43     54.96
03:50:01       209.60 944570.03  11719.10      1.68 253621.15
393.70     84.65    155.59     32.53
04:00:04        42.38 963933.85  11575.27      0.72 258323.08
184.07     23.46    109.61     52.82
04:10:01       184.68 935221.76  11612.56      1.55 249601.47
253.38     62.29    160.08     50.71
04:20:01       170.13 957115.71  11844.09      1.47 255755.14
189.18     17.96    132.43     63.93
04:30:01        22.73 962524.12  11491.38      0.49 256996.19
243.93     16.37    121.87     46.82
04:40:01       163.59 1016805.90  12686.28      1.38 273720.39
184.26     33.65    123.36     56.61
04:50:01       196.51 980225.02  12164.08      1.48 261900.97
322.95      1.58    148.96     45.90
05:00:01        33.60 987241.49  11727.46      0.85 263974.46
218.17      5.10    122.89     55.04
05:10:01       178.82 981502.18  12035.80      1.45 263338.30
207.56     30.26    127.70     53.70
05:20:01       161.28 975806.81  12097.08      1.49 261729.22
286.58     11.61    140.77     47.21
05:30:01        20.69 1031886.75  12458.76      0.48 275536.39
142.55      1.72    115.66     80.16
05:40:01       155.15 1040222.10  12654.04      1.52 278109.73
219.58      2.17    133.89     60.38
05:50:02       161.97 1051745.35  13017.94      1.35 280364.86
243.50      0.82    140.48     57.50
06:00:01        11.48 1040054.57  12496.37      0.45 277647.70
193.47      1.71    102.83     52.68
06:10:01        78.58 1047165.77  12873.49      1.08 281039.20
301.84      9.43    123.11     39.55
06:20:01        31.89 1031681.84  12845.93      0.74 275854.99
295.80     18.28    119.52     38.05

06:20:01     pgpgin/s pgpgout/s   fault/s  majflt/s  pgfree/s
pgscank/s pgscand/s pgsteal/s    %vmeff
06:30:01        29.16 954012.01  11468.69      0.68 255034.94
257.05      7.65     93.99     35.51
06:40:01       174.64 948330.61  12026.78      1.52 254053.36
335.23     77.66    161.41     39.09
06:50:02       206.84 948250.20  11888.93      2.09 253588.96
297.95     84.36    118.77     31.07
07:00:01        78.79 907991.32  11184.25      2.01 242885.50
443.94    141.27    178.75     30.54
07:10:02       220.70 967317.05  11910.45      2.03 258254.43
195.45     48.15    104.63     42.95
07:20:01       246.72 907821.00  11519.52      2.05 242241.12
473.27     47.81    172.09     33.03
07:30:01        26.82 912501.70  11180.66      0.53 243061.75
302.41      7.15     92.05     29.74
07:40:01       208.51 908719.26  11516.04      1.57 242513.90
395.00     81.67    139.60     29.29
07:50:02       250.95 878989.78  11274.06      2.25 234792.89
457.96    101.79    183.52     32.79
08:00:01        39.00 906980.45  11138.19      0.76 241717.84
284.69     56.45     66.59     19.52
08:10:01       284.57 860345.15  10982.99      2.23 229377.75
704.29     60.74    162.32     21.22
08:20:02       270.84 894864.88  11444.78      2.32 238843.40
855.66     93.73    182.15     19.19
08:30:01       117.18 881583.22  10800.16      1.35 235006.90
879.26     92.42     95.46      9.82
08:40:01       387.30 848196.38  10971.65      4.08 226516.28
1766.13     81.07    183.09      9.91
08:50:02       510.00 823844.11  10676.30      4.26 219795.20
11058.82    328.04    235.80      2.07
09:00:01       587.25 753398.48   9799.37     15.02 203298.15
2790216.53 232433.56    252.03      0.01
09:10:01       726.68 714463.50   9959.91     19.55 194575.96
3512104.61 203148.26    267.57      0.01
09:20:02      1178.01 698973.34  13755.88     28.46 196646.42
2537620.87 172806.01    406.69      0.02
09:30:01       141.38   5648.11   1969.17      3.42   3386.52
1523.34     74.06     35.22      2.20
09:40:01       207.36      8.62   1151.16      2.07   1229.41
516.08     15.16     52.49      9.88
09:50:01       197.88     35.85   1164.62      2.00   1071.57
455.04     17.15     43.76      9.27
Average:       158.89 988256.17   9707.94      2.27 266132.70
150340.53  10347.54    139.99      0.09

00:00:01    kbmemfree kbmemused  %memused kbbuffers  kbcached
kbcommit   %commit  kbactive   kbinact   kbdirty
00:10:01       398544  64492804     99.39     40904   2276864
17750540      0.59  56426672   3899348     98240
00:20:01       441496  64449852     99.32     40060   2083052
17750804      0.59  56490908   3799812    100308
00:30:01       427920  64463428     99.34     39660   1936852
17771604      0.59  56570748   3729996     71680
00:40:01       435424  64455924     99.33     39620   1789040
17750644      0.59  56649744   3660540    100336
00:50:01       425944  64465404     99.34     39000   1641252
17750668      0.59  56738996   3588252     84644
01:00:01       406188  64485160     99.37     38340   1503840
17800380      0.59  56841136   3509300     98720
01:10:01       419436  64471912     99.35     37556   1334484
17796612      0.59  56917820   3421032     77480
01:20:01       454904  64436444     99.30     37256   1179180
17765700      0.59  56975044   3352512     22496
01:30:01       377404  64513944     99.42     36912   1100256
17726548      0.59  57109180   3297516    102600
01:40:01       376152  64515196     99.42     36700    948500
17769548      0.59  57179960   3231004     84488
01:50:01       381752  64509596     99.41     36724    801320
17771416      0.59  57255664   3160112     77400
02:00:01       376536  64514812     99.42     36556    666152
17849088      0.60  57323040   3105728     50264
02:10:01       395088  64496260     99.39     36072    583516
17754188      0.59  57348064   3065888     44128
02:20:01       374968  64516380     99.42     36268    490412
17854120      0.60  57409004   3024260     40088
02:30:01       364016  64527332     99.44     19548    411300
17839936      0.60  57463636   2985004      4688
02:40:01       443248  64448100     99.32      8564    318156
17897900      0.60  57372164   2992728      5488
02:50:01       352792  64538556     99.46     36000    343600
17912884      0.60  57452744   3005652     29320
03:00:01       363212  64528136     99.44      8984    304220
17857616      0.60  57527524   2919524     16064
03:10:01       368376  64522972     99.43      5876    339784
17972788      0.60  57504872   2942548     29676
03:20:01       375252  64516096     99.42     35908    293344
17987500      0.60  57478156   2957516     32196
03:30:01       385104  64506244     99.41      5752    321424
17865424      0.60  57453188   2979400     24976
03:40:01       365052  64526296     99.44     12292    329148
17915148      0.60  57511448   2940432     17920
03:50:01       366604  64524744     99.44     35664    330620
17962012      0.60  57453372   2992128     21452
04:00:04       365136  64526212     99.44      5720    337220
17933724      0.60  57480432   2970224      1832
04:10:01       400772  64490576     99.38     11492    323164
17954392      0.60  57420320   2986376     26988
04:20:01       395976  64495372     99.39     35824    333564
17961036      0.60  57463112   2955224     33324
04:30:01       443440  64447908     99.32     10612    302404
18052728      0.60  57443820   2929944      2260
04:40:01       396616  64494732     99.39     36304    325844
17988460      0.60  57450692   2963008     33752
04:50:01       390196  64501152     99.40     35948    332368
18006884      0.60  57467744   2948932     28452
05:00:01       425740  64465608     99.34     10640    301788
18018220      0.60  57446200   2942148     21504
05:10:01       394020  64497328     99.39     34668    330452
18042300      0.60  57462580   2952032     25512
05:20:01       392720  64498628     99.39     35868    359960
17997036      0.60  57461848   2957944     24752
05:30:01       412140  64479208     99.36     10952    340188
18032556      0.60  57452104   2944352     31972
05:40:01       397212  64494136     99.39     31560    415388
18055992      0.60  57393964   3019064     35752
05:50:02       385960  64505388     99.41     36384    427728
18048676      0.60  57433620   2987148     26508
06:00:01       389056  64502292     99.40     31820    447560
18109392      0.60  57413784   3005420     27844
06:10:01       390460  64500888     99.40     36180    425804
18055028      0.60  57423380   2993544     26900
06:20:01       348872  64542476     99.46     23412    346220
18180900      0.61  57504260   2962424     54312

06:20:01    kbmemfree kbmemused  %memused kbbuffers  kbcached
kbcommit   %commit  kbactive   kbinact   kbdirty
06:30:01       383212  64508136     99.41     10780    320716
18037288      0.60  57485372   2936344     18724
06:40:01       430536  64460812     99.34     10528    269148
18207168      0.61  57467604   2913220     27528
06:50:02       360936  64530412     99.44     35904    313088
18089936      0.60  57495212   2950468     22564
07:00:01       436180  64455168     99.33      2764    285604
18211532      0.61  57428600   2930812     22964
07:10:02       367076  64524272     99.43     19508    342588
18111152      0.60  57466580   2963400     20176
07:20:01       363380  64527968     99.44     31204    296676
18170900      0.61  57505540   2935904     18576
07:30:01       344552  64546796     99.47      2404    297296
18229132      0.61  57517812   2936092     16224
07:40:01       346232  64545116     99.47      5980    309552
18251668      0.61  57529104   2926776     15660
07:50:02       377344  64514004     99.42      5164    248264
18229748      0.61  57534464   2895440      9816
08:00:01       362056  64529292     99.44      2424    266640
18190456      0.61  57544288   2899040      8792
08:10:01       366340  64525008     99.44      2924    271820
18233488      0.61  57535464   2901500     15792
08:20:02       364604  64526744     99.44      4636    210400
18279788      0.61  57553236   2889244      7712
08:30:01       362364  64528984     99.44      2100    257252
18267668      0.61  57546552   2892036     24424
08:40:01       368580  64522768     99.43      1992    243256
18247696      0.61  57561308   2879664      3404
08:50:02       359044  64532304     99.45      1828    167736
18265192      0.61  57616672   2849180      6492
09:00:01       344344  64547004     99.47      1832    188956
18346860      0.61  57659208   2826504      4692
09:10:01       347712  64543636     99.46      1816    204228
18306036      0.61  57640468   2828728      4180
09:20:02       363428  64527920     99.44      1616    150076
19041536      0.64  57638924   2818804      4048
09:30:01       394796  64496552     99.39      1920    144876
18317028      0.61  57612308   2826476       176
09:40:01       391140  64500208     99.40     35384    109440
18318936      0.61  57577156   2862968       188
09:50:01       369640  64521708     99.43     35228    130556
18312972      0.61  57596304   2858088       184
Average:       386563  64504785     99.40     22128    537358
18036891      0.60  57367511   3047436     32011

Now, with the following settings, our performance problems also
vanished completely:

sysctl -w vm.swappiness=100
sysctl -w vm.dirty_background_ratio=80
sysctl -w vm.dirty_ratio=90

Thank you,
  Andriy

On Fri, Aug 12, 2016 at 9:52 PM, Andriy Tkachuk
<andriy.tkachuk@seagate.com> wrote:
> Hi,
>
> our user-space application uses a large amount of anon pages (a private
> mapping of a large file, bigger than the 64GB of RAM available in the
> system) which are rarely accessed and are supposed to be swapped out.
> Instead, we see that most of these pages are kept in memory while the
> system suffers from a lack of free memory and degraded overall
> performance (especially disk I/O; vm.swappiness=100 does not help).
> kswapd scans millions of pages per second but reclaims only hundreds per second.
> Here are snapshots of some counters at 5-second intervals:
>
> $ egrep 'Cached|nr_.*active_anon|pgsteal_.*_normal|pgscan_kswapd_normal|pgrefill_normal|nr_vmscan_write|nr_swap|pgact'
> proc-*-0616-1605[345]* | sed 's/:/ /' | sort -sk 2,2
> proc-meminfo-0616-160539.txt Cached:           347936 kB
> proc-meminfo-0616-160549.txt Cached:           316316 kB
> proc-meminfo-0616-160559.txt Cached:           322264 kB
> proc-meminfo-0616-160539.txt SwapCached:      2853064 kB
> proc-meminfo-0616-160549.txt SwapCached:      2853168 kB
> proc-meminfo-0616-160559.txt SwapCached:      2853280 kB
> proc-vmstat-0616-160535.txt nr_active_anon 14508616
> proc-vmstat-0616-160545.txt nr_active_anon 14513725
> proc-vmstat-0616-160555.txt nr_active_anon 14515197
> proc-vmstat-0616-160535.txt nr_inactive_anon 747407
> proc-vmstat-0616-160545.txt nr_inactive_anon 744846
> proc-vmstat-0616-160555.txt nr_inactive_anon 744509
> proc-vmstat-0616-160535.txt nr_vmscan_write 5589095
> proc-vmstat-0616-160545.txt nr_vmscan_write 5589097
> proc-vmstat-0616-160555.txt nr_vmscan_write 5589097
> proc-vmstat-0616-160535.txt pgactivate 246016824
> proc-vmstat-0616-160545.txt pgactivate 246033242
> proc-vmstat-0616-160555.txt pgactivate 246042064
> proc-vmstat-0616-160535.txt pgrefill_normal 22763262
> proc-vmstat-0616-160545.txt pgrefill_normal 22768020
> proc-vmstat-0616-160555.txt pgrefill_normal 22768178
> proc-vmstat-0616-160535.txt pgscan_kswapd_normal 111985367420
> proc-vmstat-0616-160545.txt pgscan_kswapd_normal 111996845554
> proc-vmstat-0616-160555.txt pgscan_kswapd_normal 112028276639
> proc-vmstat-0616-160535.txt pgsteal_direct_normal 344064
> proc-vmstat-0616-160545.txt pgsteal_direct_normal 344064
> proc-vmstat-0616-160555.txt pgsteal_direct_normal 344064
> proc-vmstat-0616-160535.txt pgsteal_kswapd_normal 53817848
> proc-vmstat-0616-160545.txt pgsteal_kswapd_normal 53818626
> proc-vmstat-0616-160555.txt pgsteal_kswapd_normal 53818637
>
> The pgrefill_normal and pgactivate counters show that only a few
> hundred pages per second move from the active to the inactive list and
> vice versa - which is comparable with what was reclaimed. So it looks
> like kswapd mostly scans the pages from the inactive list in a kind of
> loop and does not even get a chance to look at the pages on the active
> list (where most of the application's anon pages are located).
>
> The kernel version: linux-3.10.0-229.14.1.el7.
>
> Any ideas? Would it be useful to change inactive_ratio dynamically in
> such cases so that more pages could be moved from the active to the
> inactive list and get a chance to be reclaimed? (Note: when the
> application is restarted, the problem disappears for a while (days)
> until the corresponding number of privately mapped pages are dirtied
> again.)
>
> Thank you,
>    Andriy

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: mm: kswapd struggles reclaiming the pages on 64GB server
  2016-08-22 18:37 ` Andriy Tkachuk
@ 2016-08-22 22:46   ` Andriy Tkachuk
  2016-08-23 15:14     ` Andriy Tkachuk
  0 siblings, 1 reply; 9+ messages in thread
From: Andriy Tkachuk @ 2016-08-22 22:46 UTC (permalink / raw)
  To: linux-kernel; +Cc: Mel Gorman

On Mon, Aug 22, 2016 at 7:37 PM, Andriy Tkachuk
<andriy.tkachuk@seagate.com> wrote:
>
> The following patch resolved the problem:
> ...

Sorry, I was too hasty in sending the good news. As it turns out, the
problem is still there:

06:10:01 PM  pgpgin/s pgpgout/s   fault/s  majflt/s  pgfree/s
pgscank/s pgscand/s pgsteal/s    %vmeff
06:20:01 PM     78.72 1408298.79   3394.73      0.54 384043.36
204.96     10.88    148.06     68.60
06:30:01 PM     51.71 1408936.46   3101.97      0.31 384366.70
211.51     19.30    148.59     64.38
06:40:01 PM     65.29 1366755.43   3506.76      0.65 372628.96
383.94     27.11    154.71     37.64
06:50:01 PM     66.89 1376770.04   3544.59      0.78 375676.38
344.94     12.48    149.84     41.92
07:00:01 PM     15.87 1364029.92   3030.66      0.31 371903.73
307.63     13.32    132.75     41.36
07:10:01 PM    134.24 1370087.74   3397.32      1.06 373435.85
394.16     21.16    166.62     40.12
07:20:01 PM     95.27 1379686.06   3462.59      0.74 376050.98
356.59     29.23    159.00     41.21
07:30:01 PM     12.66 1392306.35   2929.64      0.25 379662.45
395.07     28.66    138.58     32.71
07:40:01 PM    142.06 1368613.00   3486.30      0.98 372912.77
575.74     40.40    174.45     28.31
07:50:02 PM    119.77 1389287.62   3562.60      1.02 378661.56
584.00     29.36    167.92     27.38
08:00:01 PM      7.31 1368140.71   3082.65      0.50 372828.37
711.84     58.08    130.82     16.99
08:10:01 PM    174.74 1381379.24   3529.78      1.65 376567.77
1023.07     46.96    182.60     17.07
08:20:01 PM    205.05 1353845.50   3546.95      1.88 368828.37
1807.24     57.29    183.46      9.84
08:30:01 PM     86.50 1375125.95   3182.56      1.41 374676.32
10590.52    217.62    148.83      1.38
08:40:01 PM    359.38 1345209.72   4185.55      3.45 366630.54
1419665.54  43307.32    220.06      0.02
08:50:01 PM    277.27 1324076.64   3889.98      2.51 360915.41
1779132.87    410.38    200.49      0.01
09:00:02 PM     96.48 1364092.11   3235.53      1.69 372031.87
1973266.99    167.74    163.43      0.01
09:10:01 PM    298.87 1324763.68   3914.11      2.90 360957.82
1772066.29    435.17    204.08      0.01

And I also see that the inactive file list is bigger than the active file list:

# cat /proc/vmstat | egrep
'Cached|nr_.*active_|pgsteal_.*_normal|pgscan_kswapd_normal|pgrefill_normal|nr_vmscan_write|nr_swap|pgact'
nr_inactive_anon 695534
nr_active_anon 14427464
nr_inactive_file 2786
nr_active_file 2698
nr_vmscan_write 1740097
pgactivate 115697891
pgrefill_normal 33345818
pgsteal_kswapd_normal 367908859
pgsteal_direct_normal 681266
pgscan_kswapd_normal 10255454426

So it looks like it is really time to try Minchan's patch (disable the
check that skips anon reclaim while the inactive file list is bigger
than the active file list)...
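
For reference, the change boils down to deleting this block (the one
Minchan quoted earlier, from the 3.10 source) from get_scan_count() in
mm/vmscan.c - untested sketch:

        /*
         * There is enough inactive page cache, do not reclaim
         * anything from the anonymous working set right now.
         */
        if (!inactive_file_is_low(lruvec)) {
                scan_balance = SCAN_FILE;
                goto out;
        }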

  Andriy

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: mm: kswapd struggles reclaiming the pages on 64GB server
  2016-08-22 22:46   ` Andriy Tkachuk
@ 2016-08-23 15:14     ` Andriy Tkachuk
  2016-08-31 13:27       ` Andriy Tkachuk
  0 siblings, 1 reply; 9+ messages in thread
From: Andriy Tkachuk @ 2016-08-23 15:14 UTC (permalink / raw)
  To: linux-kernel

Well, as it turns out, the patch did not affect the problem at all,
since the memory cgroup was enabled (in which case the zone's
inactive_ratio is not used; the ratio is calculated directly in
mem_cgroup_inactive_anon_is_low()). So the patch will be retested with
the memory cgroup disabled.
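
For anyone following along, the memcg path recomputes the ratio from
the cgroup's own LRU sizes, so zone->inactive_ratio is simply never
consulted. Roughly like this in the 3.10-era sources (approximate,
reconstructed from the upstream code; the RHEL kernel may differ in
details):

/* mm/memcontrol.c (3.10-era, approximate) */
int mem_cgroup_inactive_anon_is_low(struct lruvec *lruvec)
{
        unsigned long inactive_ratio;
        unsigned long inactive;
        unsigned long active;
        unsigned long gb;

        inactive = mem_cgroup_get_lru_size(lruvec, LRU_INACTIVE_ANON);
        active = mem_cgroup_get_lru_size(lruvec, LRU_ACTIVE_ANON);

        /* ratio derived from the memcg's anon LRU size, not the zone */
        gb = (inactive + active) >> (30 - PAGE_SHIFT);
        if (gb)
                inactive_ratio = int_sqrt(10 * gb);
        else
                inactive_ratio = 1;

        return inactive * inactive_ratio < active;
}

With roughly 58GB of anon charged to the cgroup this yields essentially
the same large ratio, which would explain why the behaviour looked
identical.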

  Andriy

On Mon, Aug 22, 2016 at 11:46 PM, Andriy Tkachuk
<andriy.tkachuk@seagate.com> wrote:
> On Mon, Aug 22, 2016 at 7:37 PM, Andriy Tkachuk
> <andriy.tkachuk@seagate.com> wrote:
>>
>> The following patch resolved the problem:
>> ...
>
> Sorry, I was too hasty in sending the good news. As it turns out, the
> problem is still there:
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: mm: kswapd struggles reclaiming the pages on 64GB server
  2016-08-23 15:14     ` Andriy Tkachuk
@ 2016-08-31 13:27       ` Andriy Tkachuk
  0 siblings, 0 replies; 9+ messages in thread
From: Andriy Tkachuk @ 2016-08-31 13:27 UTC (permalink / raw)
  To: linux-kernel

Alright - after disabling the memory cgroup, everything works perfectly
with the patch, even with the default vm parameters.

Here are some vmstat results to compare. Now:

# vmstat 60
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 4  0 67606176 375196  38708 1385896    0   74    23 1266751 198073
103648  6  7 86  1  0
 3  0 67647904 394872  38612 1371200    0  695    18 1371067 212143
93917  7  8 85  1  0
 2  0 67648016 375796  38676 1382812    1    2    13 1356271 215123
115987  6  7 85  1  0
 3  0 67657392 378336  38744 1383468    1  157    15 1383591 213694
102457  6  7 86  1  0
 6  0 67659088 367856  38796 1388696    1   28    26 1330238 208377
111469  6  7 86  1  0
 2  0 67701344 407320  38680 1371004    0  704    34 1255911 203308
126458  8  8 82  3  0
 4  0 67711920 402296  38776 1380836    0  176     8 1308525 201451
93053  6  7 86  1  0
 8  0 67721264 376676  38872 1394816    0  156    14 1409726 218269
108127  7  8 85  1  0
18  0 67753872 395568  38896 1397144    0  544    16 1288576 201680
105980  6  7 86  1  0
 2  0 67755544 362960  38992 1411744    0   28    17 1458453 232544
127088  6  7 85  1  0
 4  0 67784056 376684  39088 1410924    0  475    25 1385831 218800
110344  6  7 85  1  0
 2  0 67816104 393108  38800 1384108    1  535    17 1336857 208551
105872  6  7 85  1  0
 7  0 67816104 399492  38820 1387096    0    0    17 1280630 205478
109499  6  7 86  1  0
 1  0 67821648 375284  38908 1397132    1   93    15 1343042 208363
98031  6  7 85  1  0
 1  0 67823512 363828  38924 1402388    0   31    15 1366995 212606
101328  6  7 85  1  0
 5  0 67864264 416720  38784 1374480    1  680    21 1372581 210256
95369  7  8 83  3  0

Swapping works smoothly, more than enough memory is available for
caching, and cpu-wait is about 1%.

Before:

# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 3  2 13755748 334968   2140  63780 6684    0  7644    21 3122 7704  0
 9 83  8  0
 2  2 13760380 333628   2140  62468 4572 7764  4764  9129 3326 8678  0
10 83  7  0
 2  2 13761072 332888   2140  62608 4576 4256  4616  4470 3377 8906  0
10 82  7  0
 2  2 13760812 341532   2148  62644 5388 3532  5996  3996 3451 7521  0
10 83  7  0
 3  3 13757648 335116   2148  62944 6176    0  6480   238 3412 8905  0
10 83  7  0
 2  2 13752936 331908   2148  62336 7488    0  7628   201 3433 7483  0
10 83  7  0
 2  2 13752520 344428   2148  69412 5292 2160 15820  2324 7254 15960
0 11 82  7  0
 3  2 13750856 338056   2148  69864 5576    0  5984    28 3384 8060  0
10 84  6  0
 2  2 13748836 331516   2156  70116 6076    0  6376    44 3683 6941  2
10 82  6  0
 2  2 13750184 335732   2148  70764 3544 2664  4252  2692 3682 8435  3
10 83  4  0
 2  4 13747528 338492   2144  70872 9520 3152  9688  3176 4846 7013  1
10 82  7  0
 3  2 13756580 341752   2144  71060 9020 14740  9148 14764 4167 8024
1 10 80  9  0
 2  2 13749484 336900   2144  71504 6444    0  6916    24 3613 8472  1
10 82  7  0
 2  2 13740560 333148   2152  72480 6932    0  7952    44 3891 6819  1
10 82  7  0
 2  2 13734456 330896   2148  72920 12228 1736 12488  1764 3454 9321
2  9 82  8  0

The system got into classic thrashing from which it never came out.

Now:

# cat /proc/vmstat | egrep
'nr_.*active_|pg(steal|scan|refill).*_normal|nr_vmscan_write|nr_swap|pgact'
nr_inactive_anon 7546598
nr_active_anon 7547226
nr_inactive_file 175973
nr_active_file 179439
nr_vmscan_write 17862257
pgactivate 213529452
pgrefill_normal 50400148
pgsteal_kswapd_normal 55904846
pgsteal_direct_normal 2417827
pgscan_kswapd_normal 76263257
pgscan_direct_normal 3213568

Before:

# cat /proc/vmstat | egrep
'nr_.*active_|pg(steal|scan|refill).*_normal|nr_vmscan_write|nr_swap|pgact'
nr_inactive_anon 695534
nr_active_anon 14427464
nr_inactive_file 2786
nr_active_file 2698
nr_vmscan_write 1740097
pgactivate 115697891
pgrefill_normal 33345818
pgsteal_kswapd_normal 367908859
pgsteal_direct_normal 681266
pgscan_kswapd_normal 10255454426

Here is the patch again for convenience:

--- linux-3.10.0-229.20.1.el7.x86_64.orig/mm/page_alloc.c    2015-09-24 15:47:25.000000000 +0000
+++ linux-3.10.0-229.20.1.el7.x86_64/mm/page_alloc.c    2016-08-15 09:49:46.922240569 +0000
@@ -5592,16 +5592,7 @@
  */
 static void __meminit calculate_zone_inactive_ratio(struct zone *zone)
 {
-       unsigned int gb, ratio;
-
-       /* Zone size in gigabytes */
-       gb = zone->managed_pages >> (30 - PAGE_SHIFT);
-       if (gb)
-               ratio = int_sqrt(10 * gb);
-       else
-               ratio = 1;
-
-       zone->inactive_ratio = ratio;
+       zone->inactive_ratio = 1;
 }

Hope it will help someone facing similar problems.

Regards,
  Andriy

On Tue, Aug 23, 2016 at 4:14 PM, Andriy Tkachuk
<andriy.tkachuk@seagate.com> wrote:
> Well, as it turns out, the patch did not affect the problem at all,
> since the memory cgroup was enabled (in which case the zone's
> inactive_ratio is not used; the ratio is calculated directly in
> mem_cgroup_inactive_anon_is_low()). So the patch will be retested with
> the memory cgroup disabled.
>
>   Andriy
>
> On Mon, Aug 22, 2016 at 11:46 PM, Andriy Tkachuk
> <andriy.tkachuk@seagate.com> wrote:
>> On Mon, Aug 22, 2016 at 7:37 PM, Andriy Tkachuk
>> <andriy.tkachuk@seagate.com> wrote:
>>>
>>> The following patch resolved the problem:
>>> ...
>>
>> Sorry, I was too hasty in sending the good news. As it turns out, the
>> problem is still there:
>>

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2016-08-31 13:27 UTC | newest]

Thread overview: 9+ messages
2016-08-12 20:52 mm: kswapd struggles reclaiming the pages on 64GB server Andriy Tkachuk
2016-08-17 11:43 ` Michal Hocko
2016-08-22 18:16   ` Andriy Tkachuk
     [not found] ` <20160818140513.GA2453@blaptop>
2016-08-18 17:43   ` Andriy Tkachuk
2016-08-22 18:10   ` Andriy Tkachuk
2016-08-22 18:37 ` Andriy Tkachuk
2016-08-22 22:46   ` Andriy Tkachuk
2016-08-23 15:14     ` Andriy Tkachuk
2016-08-31 13:27       ` Andriy Tkachuk
