* mm: kswapd struggles reclaiming the pages on 64GB server
@ 2016-08-12 20:52 Andriy Tkachuk
2016-08-17 11:43 ` Michal Hocko
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: Andriy Tkachuk @ 2016-08-12 20:52 UTC (permalink / raw)
To: linux-kernel; +Cc: Mel Gorman
Hi,
our user-space application uses a large amount of anon pages (a private
mapping of a large file, more than the 64GB RAM available in the system)
which are rarely accessed and are supposed to be swapped out.
Instead, we see that most of these pages are kept in memory while the
system suffers from the lack of free memory and overall poor performance
(especially the disk I/O; vm.swappiness=100 does not help). kswapd
scans millions of pages per second but reclaims only hundreds per second.
Here are the 5 secs interval snapshots of some counters:
$ egrep 'Cached|nr_.*active_anon|pgsteal_.*_normal|pgscan_kswapd_normal|pgrefill_normal|nr_vmscan_write|nr_swap|pgact' proc-*-0616-1605[345]* | sed 's/:/ /' | sort -sk 2,2
proc-meminfo-0616-160539.txt Cached: 347936 kB
proc-meminfo-0616-160549.txt Cached: 316316 kB
proc-meminfo-0616-160559.txt Cached: 322264 kB
proc-meminfo-0616-160539.txt SwapCached: 2853064 kB
proc-meminfo-0616-160549.txt SwapCached: 2853168 kB
proc-meminfo-0616-160559.txt SwapCached: 2853280 kB
proc-vmstat-0616-160535.txt nr_active_anon 14508616
proc-vmstat-0616-160545.txt nr_active_anon 14513725
proc-vmstat-0616-160555.txt nr_active_anon 14515197
proc-vmstat-0616-160535.txt nr_inactive_anon 747407
proc-vmstat-0616-160545.txt nr_inactive_anon 744846
proc-vmstat-0616-160555.txt nr_inactive_anon 744509
proc-vmstat-0616-160535.txt nr_vmscan_write 5589095
proc-vmstat-0616-160545.txt nr_vmscan_write 5589097
proc-vmstat-0616-160555.txt nr_vmscan_write 5589097
proc-vmstat-0616-160535.txt pgactivate 246016824
proc-vmstat-0616-160545.txt pgactivate 246033242
proc-vmstat-0616-160555.txt pgactivate 246042064
proc-vmstat-0616-160535.txt pgrefill_normal 22763262
proc-vmstat-0616-160545.txt pgrefill_normal 22768020
proc-vmstat-0616-160555.txt pgrefill_normal 22768178
proc-vmstat-0616-160535.txt pgscan_kswapd_normal 111985367420
proc-vmstat-0616-160545.txt pgscan_kswapd_normal 111996845554
proc-vmstat-0616-160555.txt pgscan_kswapd_normal 112028276639
proc-vmstat-0616-160535.txt pgsteal_direct_normal 344064
proc-vmstat-0616-160545.txt pgsteal_direct_normal 344064
proc-vmstat-0616-160555.txt pgsteal_direct_normal 344064
proc-vmstat-0616-160535.txt pgsteal_kswapd_normal 53817848
proc-vmstat-0616-160545.txt pgsteal_kswapd_normal 53818626
proc-vmstat-0616-160555.txt pgsteal_kswapd_normal 53818637
The pgrefill_normal and pgactivate counters show that only a few
hundred pages per second move from the active to the inactive list and
vice versa - which is comparable with what was reclaimed. So it looks
like kswapd mostly scans the pages from the inactive list in a kind of
loop and does not even get a chance to look at the pages on the active
list (where most of the application's anon pages are located).
The kernel version: linux-3.10.0-229.14.1.el7.
Any ideas? Would it be useful to change inactive_ratio dynamically in
such cases so that more pages could be moved from the active to the
inactive list and get a chance to be reclaimed? (Note: when the
application is restarted, the problem disappears for a while (days)
until the corresponding number of privately mapped pages are dirtied again.)
Thank you,
Andriy
* Re: mm: kswapd struggles reclaiming the pages on 64GB server
2016-08-12 20:52 mm: kswapd struggles reclaiming the pages on 64GB server Andriy Tkachuk
@ 2016-08-17 11:43 ` Michal Hocko
[not found] ` <20160818140513.GA2453@blaptop>
2016-08-22 18:37 ` Andriy Tkachuk
2 siblings, 0 replies; 11+ messages in thread
From: Michal Hocko @ 2016-08-17 11:43 UTC (permalink / raw)
To: Andriy Tkachuk; +Cc: linux-kernel, Mel Gorman, linux-mm, Johannes Weiner
[CCing linux-mm and Johannes]
On Fri 12-08-16 21:52:20, Andriy Tkachuk wrote:
> Hi,
>
> our user-space application uses a large amount of anon pages (a private
> mapping of a large file, more than the 64GB RAM available in the system)
> which are rarely accessed and are supposed to be swapped out.
> Instead, we see that most of these pages are kept in memory while the
> system suffers from the lack of free memory and overall poor performance
> (especially the disk I/O; vm.swappiness=100 does not help). kswapd
> scans millions of pages per second but reclaims only hundreds per second.
I haven't looked at your numbers deeply but this smells like the long
standing problem/limitation we have. We are trying really hard not to
swap out and rather reclaim the page cache because swap refaults tend
to be more disruptive in many cases. Not all, though, and the
thrashing-like behavior you see is certainly undesirable.
Johannes has been looking into that area recently. Have a look at
http://lkml.kernel.org/r/20160606194836.3624-1-hannes@cmpxchg.org
> Here are the 5 secs interval snapshots of some counters:
>
> $ egrep 'Cached|nr_.*active_anon|pgsteal_.*_normal|pgscan_kswapd_normal|pgrefill_normal|nr_vmscan_write|nr_swap|pgact' proc-*-0616-1605[345]* | sed 's/:/ /' | sort -sk 2,2
> proc-meminfo-0616-160539.txt Cached: 347936 kB
> proc-meminfo-0616-160549.txt Cached: 316316 kB
> proc-meminfo-0616-160559.txt Cached: 322264 kB
> proc-meminfo-0616-160539.txt SwapCached: 2853064 kB
> proc-meminfo-0616-160549.txt SwapCached: 2853168 kB
> proc-meminfo-0616-160559.txt SwapCached: 2853280 kB
> proc-vmstat-0616-160535.txt nr_active_anon 14508616
> proc-vmstat-0616-160545.txt nr_active_anon 14513725
> proc-vmstat-0616-160555.txt nr_active_anon 14515197
> proc-vmstat-0616-160535.txt nr_inactive_anon 747407
> proc-vmstat-0616-160545.txt nr_inactive_anon 744846
> proc-vmstat-0616-160555.txt nr_inactive_anon 744509
> proc-vmstat-0616-160535.txt nr_vmscan_write 5589095
> proc-vmstat-0616-160545.txt nr_vmscan_write 5589097
> proc-vmstat-0616-160555.txt nr_vmscan_write 5589097
> proc-vmstat-0616-160535.txt pgactivate 246016824
> proc-vmstat-0616-160545.txt pgactivate 246033242
> proc-vmstat-0616-160555.txt pgactivate 246042064
> proc-vmstat-0616-160535.txt pgrefill_normal 22763262
> proc-vmstat-0616-160545.txt pgrefill_normal 22768020
> proc-vmstat-0616-160555.txt pgrefill_normal 22768178
> proc-vmstat-0616-160535.txt pgscan_kswapd_normal 111985367420
> proc-vmstat-0616-160545.txt pgscan_kswapd_normal 111996845554
> proc-vmstat-0616-160555.txt pgscan_kswapd_normal 112028276639
> proc-vmstat-0616-160535.txt pgsteal_direct_normal 344064
> proc-vmstat-0616-160545.txt pgsteal_direct_normal 344064
> proc-vmstat-0616-160555.txt pgsteal_direct_normal 344064
> proc-vmstat-0616-160535.txt pgsteal_kswapd_normal 53817848
> proc-vmstat-0616-160545.txt pgsteal_kswapd_normal 53818626
> proc-vmstat-0616-160555.txt pgsteal_kswapd_normal 53818637
>
> The pgrefill_normal and pgactivate counters show that only a few
> hundred pages per second move from the active to the inactive list and
> vice versa - which is comparable with what was reclaimed. So it looks
> like kswapd mostly scans the pages from the inactive list in a kind of
> loop and does not even get a chance to look at the pages on the active
> list (where most of the application's anon pages are located).
>
> The kernel version: linux-3.10.0-229.14.1.el7.
>
> Any ideas? Would it be useful to change inactive_ratio dynamically in
> such cases so that more pages could be moved from the active to the
> inactive list and get a chance to be reclaimed? (Note: when the
> application is restarted, the problem disappears for a while (days)
> until the corresponding number of privately mapped pages are dirtied again.)
>
> Thank you,
> Andriy
--
Michal Hocko
SUSE Labs
* Re: mm: kswapd struggles reclaiming the pages on 64GB server
[not found] ` <20160818140513.GA2453@blaptop>
@ 2016-08-18 17:43 ` Andriy Tkachuk
2016-08-22 18:10 ` Andriy Tkachuk
1 sibling, 0 replies; 11+ messages in thread
From: Andriy Tkachuk @ 2016-08-18 17:43 UTC (permalink / raw)
To: Minchan Kim; +Cc: linux-kernel, Mel Gorman
Hi Minchan,
That's an interesting idea, although the counters contradict the theory:
proc-vmstat-0616-160535.txt nr_active_file 44616
proc-vmstat-0616-160545.txt nr_active_file 43471
proc-vmstat-0616-160555.txt nr_active_file 44477
proc-vmstat-0616-160535.txt nr_inactive_file 55534
proc-vmstat-0616-160545.txt nr_inactive_file 35602
proc-vmstat-0616-160555.txt nr_inactive_file 36108
proc-vmstat-0616-160535.txt pgscan_kswapd_normal 111985367420
proc-vmstat-0616-160545.txt pgscan_kswapd_normal 111996845554
proc-vmstat-0616-160555.txt pgscan_kswapd_normal 112028276639
proc-vmstat-0616-160535.txt pgsteal_kswapd_normal 53817848
proc-vmstat-0616-160545.txt pgsteal_kswapd_normal 53818626
proc-vmstat-0616-160555.txt pgsteal_kswapd_normal 53818637
Anyway, I will give it a try if inactive_ratio=1 (which I'm currently
testing) doesn't help.
Thank you,
Andriy
On Thu, Aug 18, 2016 at 3:05 PM, Minchan Kim <minchan@kernel.org> wrote:
> Currently, the VM never reclaims anonymous pages if the number of pages
> on the inactive file LRU is greater than that on the active file LRU.
> It even makes the user's vm.swappiness tuning void.
>
> Could you rebuild your kernel? I want to see what happens if we
> remove the part below from get_scan_count.
>
> http://lxr.free-electrons.com/source/mm/vmscan.c?v=3.10#L1728
>
> 1727 /*
> 1728 * There is enough inactive page cache, do not reclaim
> 1729 * anything from the anonymous working set right now.
> 1730 */
> 1731 if (!inactive_file_is_low(lruvec)) {
> 1732 scan_balance = SCAN_FILE;
> 1733 goto out;
> 1734 }
>
> Thanks.
* Re: mm: kswapd struggles reclaiming the pages on 64GB server
2016-08-17 11:43 ` Michal Hocko
@ 2016-08-22 18:16 ` Andriy Tkachuk
-1 siblings, 0 replies; 11+ messages in thread
From: Andriy Tkachuk @ 2016-08-22 18:16 UTC (permalink / raw)
To: Michal Hocko; +Cc: linux-kernel, Mel Gorman, linux-mm, Johannes Weiner
Hi Michal.
Thank you for the reply.
It looks like the root cause of the problems we are facing is a bit
different, although the ultimate effect is similar - poor swapping
effectiveness.
As far as I could understand, Johannes tries to fix the balancing
between the anon and file lists. But in my case it looks like the anon
pages which have been idle for a long time and could be swapped out are
all just sitting on the active list and don't move to the inactive one,
so they never get a chance to be scanned and eventually swapped out.
(See the /proc/vmstat samples and explanations in my prev. mail. BTW,
the samples interval is 10 secs there, not 5. My typo.)
It looks like in my case the system load enters a steady mode where all
the scanned pages from the inactive list become referenced very soon.
So kswapd aggressively scans, but mostly the inactive list, where it
can hardly find anything to reclaim. So the inactive list is not
shortened and, as a result, is not refilled from the active one. That's
why the anon pages on the active list do not even get a chance to be
scanned. Note: the zone's inactive_ratio is more than 10 on 64GB RAM
systems, so the inactive list is much smaller than the active one in my case.
Andriy
On Wed, Aug 17, 2016 at 12:43 PM, Michal Hocko <mhocko@kernel.org> wrote:
> [CCing linux-mm and Johannes]
>
>
> I haven't looked at your numbers deeply but this smells like the long
> standing problem/limitation we have. We are trying really hard not to
> swap out and rather reclaim the page cache because swap refaults tend
> to be more disruptive in many cases. Not all, though, and the
> thrashing-like behavior you see is certainly undesirable.
>
> Johannes has been looking into that area recently. Have a look at
> http://lkml.kernel.org/r/20160606194836.3624-1-hannes@cmpxchg.org
>
> --
> Michal Hocko
> SUSE Labs
* Re: mm: kswapd struggles reclaiming the pages on 64GB server
2016-08-12 20:52 mm: kswapd struggles reclaiming the pages on 64GB server Andriy Tkachuk
2016-08-17 11:43 ` Michal Hocko
[not found] ` <20160818140513.GA2453@blaptop>
@ 2016-08-22 18:37 ` Andriy Tkachuk
2016-08-22 22:46 ` Andriy Tkachuk
2 siblings, 1 reply; 11+ messages in thread
From: Andriy Tkachuk @ 2016-08-22 18:37 UTC (permalink / raw)
To: linux-kernel; +Cc: Mel Gorman
The following patch resolved the problem:
--- linux-3.10.0-229.20.1.el7.x86_64.orig/mm/page_alloc.c	2015-09-24 15:47:25.000000000 +0000
+++ linux-3.10.0-229.20.1.el7.x86_64/mm/page_alloc.c	2016-08-15 09:49:46.922240569 +0000
@@ -5592,16 +5592,7 @@
  */
 static void __meminit calculate_zone_inactive_ratio(struct zone *zone)
 {
-	unsigned int gb, ratio;
-
-	/* Zone size in gigabytes */
-	gb = zone->managed_pages >> (30 - PAGE_SHIFT);
-	if (gb)
-		ratio = int_sqrt(10 * gb);
-	else
-		ratio = 1;
-
-	zone->inactive_ratio = ratio;
+	zone->inactive_ratio = 1;
 }
 
 static void __meminit setup_per_zone_inactive_ratio(void)
Now, kswapd works Ok:
# sar -rB -s 12:40:00
Linux 3.10.0-229.20.1.el7.mmir.x86_64 (castor-dev2-1-ssu-1-1.xy01.xyratex.com) 22/08/16 _x86_64_ (20 CPU)
12:40:01  pgpgin/s  pgpgout/s  fault/s  majflt/s  pgfree/s  pgscank/s  pgscand/s  pgsteal/s  %vmeff
12:50:01  12.93  421699.50  1768.24  0.10  115379.86  59.40  0.16  58.61  98.40
13:00:01  15.12  1486930.99  3068.52  0.24  404668.48  145.59  10.86  123.66  79.04
13:10:01  2.93  1491112.20  3468.10  0.12  405929.73  131.39  2.27  132.57  99.18
13:20:01  1.54  1472652.82  3484.32  0.10  400732.43  142.20  0.85  139.14  97.26
13:30:01  3.96  1463064.37  3039.36  0.12  398160.98  147.01  2.37  143.24  95.89
13:40:01  4.10  1446720.90  3451.41  0.12  394217.53  148.88  0.63  145.94  97.61
13:50:01  5.97  1463028.44  3520.28  0.14  399012.98  139.21  0.69  133.79  95.63
14:00:01  13.85  1445805.09  3071.23  0.30  393987.10  131.68  8.85  137.08  97.55
14:10:01  25.07  1421646.44  3738.69  0.39  387701.97  164.25  20.15  166.93  90.53
14:20:01  11.89  1379723.10  4270.80  0.23  375872.58  146.70  14.93  125.28  77.51
14:30:01  8.31  1375180.31  5472.30  0.17  373244.28  134.46  3.91  119.24  86.18
14:40:01  10.11  1334036.49  8850.42  0.15  360701.23  159.57  4.10  154.27  94.25
14:50:01  82.80  1140616.24  12696.72  0.46  306044.83  159.30  3.28  125.51  77.20
15:00:01  19.21  1088806.11  14202.19  0.26  291642.70  161.19  4.74  111.28  67.06
15:10:01  27.08  1046627.43  17154.44  0.34  281435.13  192.43  7.53  102.38  51.20
15:20:01  68.72  1028855.37  14164.93  0.66  275401.73  215.65  4.13  135.70  61.74
15:30:01  16.75  992275.54  12239.61  0.23  265369.64  182.36  10.87  99.25  51.36
15:40:01  150.85  940090.67  12283.44  1.11  251461.55  299.20  23.75  142.78  44.21
15:50:01  132.14  977937.08  12619.63  1.00  261673.06  220.62  64.57  135.61  47.55
16:00:01  17.95  1024651.95  12030.00  0.28  274089.74  184.80  18.48  104.86  51.59
16:10:01  41.26  1023482.82  12726.94  0.38  274037.38  234.26  35.27  109.40  40.59
16:20:01  76.12  1026993.21  12549.89  0.47  274872.77  206.96  19.29  124.67  55.11
16:30:03  17.41  1012199.77  12135.95  0.27  270494.82  161.94  10.27  108.83  63.19
16:40:01  69.82  1016362.58  12766.07  0.64  271505.17  119.43  9.42  112.50  87.31
16:50:01  17.72  1026519.75  12774.39  0.18  274760.91  179.48  20.01  138.00  69.18
17:00:01  6.88  1033279.20  12364.86  0.13  276506.10  126.42  6.44  90.20  67.89
17:10:01  33.26  1018956.92  12867.66  0.49  272069.13  202.53  7.44  115.20  54.87
17:20:01  28.07  1040438.90  13080.03  0.35  278356.95  188.47  1.71  116.80  61.41
17:30:01  13.69  1052387.04  12603.56  0.16  281346.41  149.10  16.91  102.65  61.84
17:40:01  17.55  1223806.22  8258.37  0.30  330923.67  214.27  7.30  146.13  65.95
17:50:01  20.46  1424863.99  3607.95  0.33  388637.98  217.85  17.75  136.05  57.75
18:00:01  9.47  1435175.96  3076.28  0.21  391356.34  185.68  3.27  146.21  77.38
18:10:01  39.88  1408818.72  3579.11  0.53  384292.74  174.22  6.73  148.55  82.09
18:20:01  78.72  1408298.79  3394.73  0.54  384043.36  204.96  10.88  148.06  68.60
18:30:01  51.71  1408936.46  3101.97  0.31  384366.70  211.51  19.30  148.59  64.38
Average:  32.95  1200040.29  8385.19  0.34  324404.13  172.66  11.40  126.54  68.75
12:40:01  kbmemfree  kbmemused  %memused  kbbuffers  kbcached  kbcommit  %commit  kbactive  kbinact  kbdirty
12:50:01  451392  64439956  99.30  47944  2263748  17990440  0.60  56481888  3822316  87220
13:00:01  395572  64495776  99.39  47060  2156628  18012880  0.60  56586092  3776320  100392
13:10:01  375588  64515760  99.42  46372  2015544  18023380  0.60  56666464  3724456  82656
13:20:01  371580  64519768  99.43  42960  1862684  18038108  0.60  56753352  3646544  74712
13:30:01  387664  64503684  99.40  38288  1699400  17997108  0.60  56815292  3579896  100368
13:40:01  407416  64483932  99.37  37756  1530260  18050920  0.60  56885756  3495732  96496
13:50:01  384380  64506968  99.41  37492  1395668  18007024  0.60  56961776  3447744  69308
14:00:01  370912  64520436  99.43  37084  1249500  18042408  0.60  57049020  3374072  85240
14:10:01  430832  64460516  99.34  36476  1051056  18056976  0.60  57099724  3275024  72736
14:20:01  409092  64482256  99.37  36064  930640  18045188  0.60  57138080  3265412  90468
14:30:01  383788  64507560  99.41  36132  815988  18005000  0.60  57241368  3199932  73492
14:40:01  454772  64436576  99.30  35832  620456  18031016  0.60  57300680  3079968  62212
14:50:01  389056  64502292  99.40  35844  570248  18188116  0.61  57347356  3100696  50184
15:00:01  376236  64515112  99.42  35624  471872  18120144  0.60  57433844  3023324  31188
15:10:01  351076  64540272  99.46  36192  484056  18144660  0.61  57441692  3040976  43268
15:20:01  369376  64521972  99.43  35892  396844  18194364  0.61  57479424  2993376  24244
15:30:01  350816  64540532  99.46  16988  403452  18201032  0.61  57462032  3021052  5084
15:40:01  361708  64529640  99.44  35780  395412  18215232  0.61  57460724  3012920  48356
15:50:01  367740  64523608  99.43  35904  432400  18237484  0.61  57475064  2987216  52160
16:00:01  364612  64526736  99.44  30772  435952  18243788  0.61  57445236  3029688  6156
16:10:01  371876  64519472  99.43  36216  476668  18233160  0.61  57443644  3024052  31988
16:20:01  369688  64521660  99.43  35896  455088  18224132  0.61  57437980  3027880  28900
16:30:03  369128  64522220  99.43  36060  452908  18258964  0.61  57419672  3036052  38896
16:40:01  369420  64521928  99.43  36452  521608  18239212  0.61  57447508  3017172  15288
16:50:01  412064  64479284  99.36  36644  536480  18319720  0.61  57414720  3004872  31308
17:00:01  362868  64528480  99.44  36220  514224  18371112  0.61  57454484  3012064  33012
17:10:01  381992  64509356  99.41  36128  507728  18305896  0.61  57445048  3006644  35220
17:20:01  348128  64543220  99.46  36396  527712  18319784  0.61  57457988  2999732  36024
17:30:01  365352  64525996  99.44  36316  519996  18359776  0.61  57444096  3013648  44488
17:40:01  401872  64489476  99.38  36040  506764  18365400  0.61  57417724  3002256  98176
17:50:01  385680  64505668  99.41  35936  539116  18340608  0.61  57440976  3000548  101128
18:00:01  388136  64503212  99.40  35948  530852  18339400  0.61  57416176  3016880  82168
18:10:01  399332  64492016  99.38  35904  542400  18359172  0.61  57400064  3023684  73284
18:20:01  369256  64522092  99.43  35752  544800  18386912  0.61  57439340  3016916  91680
18:30:01  364256  64527092  99.44  30092  448544  18448408  0.62  57481544  2977968  66912
Average:  383219  64508129  99.41  36527  823048  18191912  0.61  57259595  3173629  58983
You can compare with how it was before:
# sar -rB -11 -e 10:00:00
Linux 3.10.0-229.20.1.el7.x86_64 (castor-dev2-1-ssu-1-1.xy01.xyratex.com) 11/08/16 _x86_64_ (20 CPU)
00:00:01  pgpgin/s  pgpgout/s  fault/s  majflt/s  pgfree/s  pgscank/s  pgscand/s  pgsteal/s  %vmeff
00:10:01  16.15  1404641.02  3294.83  0.30  384068.82  232.61  2.51  130.05  55.31
00:20:01  4.74  1386234.27  3264.20  0.17  378183.96  131.42  46.21  150.36  84.65
00:30:01  16.81  1357082.95  2856.56  0.44  370191.72  171.20  7.29  131.70  73.78
00:40:01  10.56  1380877.77  3330.86  0.21  376939.96  137.89  0.13  129.86  94.09
00:50:01  13.36  1389993.97  3514.01  0.24  379609.43  129.31  2.11  130.64  99.41
01:00:01  4.81  1384278.89  2788.69  0.18  378195.35  97.28  34.40  126.75  96.26
01:10:01  19.40  1397710.30  3254.28  0.48  382363.61  159.35  16.09  144.62  82.43
01:20:01  20.13  1373687.69  3373.39  0.40  375604.74  183.20  8.65  142.67  74.37
01:30:01  8.07  1384095.36  2937.47  0.22  377786.50  140.68  2.34  105.38  73.68
01:40:01  29.87  1356288.95  4059.10  0.66  369618.89  180.17  22.32  139.33  68.81
01:50:01  12.11  1360606.12  6005.05  0.30  369811.12  162.28  12.94  133.57  76.23
02:00:01  15.43  1310682.12  8775.43  0.38  354317.43  140.11  2.45  124.60  87.40
02:10:01  160.55  1183661.76  12517.79  1.19  317775.22  245.93  7.94  159.73  62.92
02:20:01  13.92  1115703.40  12939.00  0.34  299746.54  139.83  25.68  111.89  67.60
02:30:01  27.84  1082827.92  12784.33  0.51  289803.97  166.12  12.25  116.07  65.07
02:40:01  107.16  1042108.13  13104.45  1.66  280838.24  536.21  52.80  198.50  33.70
02:50:01  200.74  1013103.67  12483.72  1.91  272091.10  180.10  40.44  117.32  53.20
03:00:01  45.46  951048.89  11588.68  0.70  253985.85  152.91  17.33  107.89  63.37
03:10:01  274.33  914538.01  11580.86  1.84  244733.49  276.00  23.02  157.40  52.64
03:20:01  230.53  946778.38  11850.88  1.68  252944.34  255.13  25.37  153.57  54.75
03:30:01  33.31  951669.66  11559.20  0.49  254101.69  231.74  8.91  105.05  43.65
03:40:01  193.92  948298.99  11803.87  1.27  253376.52  205.85  40.59  135.43  54.96
03:50:01  209.60  944570.03  11719.10  1.68  253621.15  393.70  84.65  155.59  32.53
04:00:04  42.38  963933.85  11575.27  0.72  258323.08  184.07  23.46  109.61  52.82
04:10:01  184.68  935221.76  11612.56  1.55  249601.47  253.38  62.29  160.08  50.71
04:20:01  170.13  957115.71  11844.09  1.47  255755.14  189.18  17.96  132.43  63.93
04:30:01  22.73  962524.12  11491.38  0.49  256996.19  243.93  16.37  121.87  46.82
04:40:01  163.59  1016805.90  12686.28  1.38  273720.39  184.26  33.65  123.36  56.61
04:50:01  196.51  980225.02  12164.08  1.48  261900.97  322.95  1.58  148.96  45.90
05:00:01  33.60  987241.49  11727.46  0.85  263974.46  218.17  5.10  122.89  55.04
05:10:01  178.82  981502.18  12035.80  1.45  263338.30  207.56  30.26  127.70  53.70
05:20:01  161.28  975806.81  12097.08  1.49  261729.22  286.58  11.61  140.77  47.21
05:30:01  20.69  1031886.75  12458.76  0.48  275536.39  142.55  1.72  115.66  80.16
05:40:01  155.15  1040222.10  12654.04  1.52  278109.73  219.58  2.17  133.89  60.38
05:50:02  161.97  1051745.35  13017.94  1.35  280364.86  243.50  0.82  140.48  57.50
06:00:01  11.48  1040054.57  12496.37  0.45  277647.70  193.47  1.71  102.83  52.68
06:10:01  78.58  1047165.77  12873.49  1.08  281039.20  301.84  9.43  123.11  39.55
06:20:01  31.89  1031681.84  12845.93  0.74  275854.99  295.80  18.28  119.52  38.05
06:20:01  pgpgin/s  pgpgout/s  fault/s  majflt/s  pgfree/s  pgscank/s  pgscand/s  pgsteal/s  %vmeff
06:30:01  29.16  954012.01  11468.69  0.68  255034.94  257.05  7.65  93.99  35.51
06:40:01  174.64  948330.61  12026.78  1.52  254053.36  335.23  77.66  161.41  39.09
06:50:02  206.84  948250.20  11888.93  2.09  253588.96  297.95  84.36  118.77  31.07
07:00:01  78.79  907991.32  11184.25  2.01  242885.50  443.94  141.27  178.75  30.54
07:10:02  220.70  967317.05  11910.45  2.03  258254.43  195.45  48.15  104.63  42.95
07:20:01  246.72  907821.00  11519.52  2.05  242241.12  473.27  47.81  172.09  33.03
07:30:01  26.82  912501.70  11180.66  0.53  243061.75  302.41  7.15  92.05  29.74
07:40:01  208.51  908719.26  11516.04  1.57  242513.90  395.00  81.67  139.60  29.29
07:50:02  250.95  878989.78  11274.06  2.25  234792.89  457.96  101.79  183.52  32.79
08:00:01  39.00  906980.45  11138.19  0.76  241717.84  284.69  56.45  66.59  19.52
08:10:01  284.57  860345.15  10982.99  2.23  229377.75  704.29  60.74  162.32  21.22
08:20:02  270.84  894864.88  11444.78  2.32  238843.40  855.66  93.73  182.15  19.19
08:30:01  117.18  881583.22  10800.16  1.35  235006.90  879.26  92.42  95.46  9.82
08:40:01  387.30  848196.38  10971.65  4.08  226516.28  1766.13  81.07  183.09  9.91
08:50:02  510.00  823844.11  10676.30  4.26  219795.20  11058.82  328.04  235.80  2.07
09:00:01  587.25  753398.48  9799.37  15.02  203298.15  2790216.53  232433.56  252.03  0.01
09:10:01  726.68  714463.50  9959.91  19.55  194575.96  3512104.61  203148.26  267.57  0.01
09:20:02  1178.01  698973.34  13755.88  28.46  196646.42  2537620.87  172806.01  406.69  0.02
09:30:01  141.38  5648.11  1969.17  3.42  3386.52  1523.34  74.06  35.22  2.20
09:40:01  207.36  8.62  1151.16  2.07  1229.41  516.08  15.16  52.49  9.88
09:50:01  197.88  35.85  1164.62  2.00  1071.57  455.04  17.15  43.76  9.27
Average:  158.89  988256.17  9707.94  2.27  266132.70  150340.53  10347.54  139.99  0.09
00:00:01 kbmemfree kbmemused %memused kbbuffers kbcached kbcommit %commit kbactive kbinact kbdirty
00:10:01 398544 64492804 99.39 40904 2276864 17750540 0.59 56426672 3899348 98240
00:20:01 441496 64449852 99.32 40060 2083052 17750804 0.59 56490908 3799812 100308
00:30:01 427920 64463428 99.34 39660 1936852 17771604 0.59 56570748 3729996 71680
00:40:01 435424 64455924 99.33 39620 1789040 17750644 0.59 56649744 3660540 100336
00:50:01 425944 64465404 99.34 39000 1641252 17750668 0.59 56738996 3588252 84644
01:00:01 406188 64485160 99.37 38340 1503840 17800380 0.59 56841136 3509300 98720
01:10:01 419436 64471912 99.35 37556 1334484 17796612 0.59 56917820 3421032 77480
01:20:01 454904 64436444 99.30 37256 1179180 17765700 0.59 56975044 3352512 22496
01:30:01 377404 64513944 99.42 36912 1100256 17726548 0.59 57109180 3297516 102600
01:40:01 376152 64515196 99.42 36700 948500 17769548 0.59 57179960 3231004 84488
01:50:01 381752 64509596 99.41 36724 801320 17771416 0.59 57255664 3160112 77400
02:00:01 376536 64514812 99.42 36556 666152 17849088 0.60 57323040 3105728 50264
02:10:01 395088 64496260 99.39 36072 583516 17754188 0.59 57348064 3065888 44128
02:20:01 374968 64516380 99.42 36268 490412 17854120 0.60 57409004 3024260 40088
02:30:01 364016 64527332 99.44 19548 411300 17839936 0.60 57463636 2985004 4688
02:40:01 443248 64448100 99.32 8564 318156 17897900 0.60 57372164 2992728 5488
02:50:01 352792 64538556 99.46 36000 343600 17912884 0.60 57452744 3005652 29320
03:00:01 363212 64528136 99.44 8984 304220 17857616 0.60 57527524 2919524 16064
03:10:01 368376 64522972 99.43 5876 339784 17972788 0.60 57504872 2942548 29676
03:20:01 375252 64516096 99.42 35908 293344 17987500 0.60 57478156 2957516 32196
03:30:01 385104 64506244 99.41 5752 321424 17865424 0.60 57453188 2979400 24976
03:40:01 365052 64526296 99.44 12292 329148 17915148 0.60 57511448 2940432 17920
03:50:01 366604 64524744 99.44 35664 330620 17962012 0.60 57453372 2992128 21452
04:00:04 365136 64526212 99.44 5720 337220 17933724 0.60 57480432 2970224 1832
04:10:01 400772 64490576 99.38 11492 323164 17954392 0.60 57420320 2986376 26988
04:20:01 395976 64495372 99.39 35824 333564 17961036 0.60 57463112 2955224 33324
04:30:01 443440 64447908 99.32 10612 302404 18052728 0.60 57443820 2929944 2260
04:40:01 396616 64494732 99.39 36304 325844 17988460 0.60 57450692 2963008 33752
04:50:01 390196 64501152 99.40 35948 332368 18006884 0.60 57467744 2948932 28452
05:00:01 425740 64465608 99.34 10640 301788 18018220 0.60 57446200 2942148 21504
05:10:01 394020 64497328 99.39 34668 330452 18042300 0.60 57462580 2952032 25512
05:20:01 392720 64498628 99.39 35868 359960 17997036 0.60 57461848 2957944 24752
05:30:01 412140 64479208 99.36 10952 340188 18032556 0.60 57452104 2944352 31972
05:40:01 397212 64494136 99.39 31560 415388 18055992 0.60 57393964 3019064 35752
05:50:02 385960 64505388 99.41 36384 427728 18048676 0.60 57433620 2987148 26508
06:00:01 389056 64502292 99.40 31820 447560 18109392 0.60 57413784 3005420 27844
06:10:01 390460 64500888 99.40 36180 425804 18055028 0.60 57423380 2993544 26900
06:20:01 348872 64542476 99.46 23412 346220 18180900 0.61 57504260 2962424 54312
06:30:01 383212 64508136 99.41 10780 320716 18037288 0.60 57485372 2936344 18724
06:40:01 430536 64460812 99.34 10528 269148 18207168 0.61 57467604 2913220 27528
06:50:02 360936 64530412 99.44 35904 313088 18089936 0.60 57495212 2950468 22564
07:00:01 436180 64455168 99.33 2764 285604 18211532 0.61 57428600 2930812 22964
07:10:02 367076 64524272 99.43 19508 342588 18111152 0.60 57466580 2963400 20176
07:20:01 363380 64527968 99.44 31204 296676 18170900 0.61 57505540 2935904 18576
07:30:01 344552 64546796 99.47 2404 297296 18229132 0.61 57517812 2936092 16224
07:40:01 346232 64545116 99.47 5980 309552 18251668 0.61 57529104 2926776 15660
07:50:02 377344 64514004 99.42 5164 248264 18229748 0.61 57534464 2895440 9816
08:00:01 362056 64529292 99.44 2424 266640 18190456 0.61 57544288 2899040 8792
08:10:01 366340 64525008 99.44 2924 271820 18233488 0.61 57535464 2901500 15792
08:20:02 364604 64526744 99.44 4636 210400 18279788 0.61 57553236 2889244 7712
08:30:01 362364 64528984 99.44 2100 257252 18267668 0.61 57546552 2892036 24424
08:40:01 368580 64522768 99.43 1992 243256 18247696 0.61 57561308 2879664 3404
08:50:02 359044 64532304 99.45 1828 167736 18265192 0.61 57616672 2849180 6492
09:00:01 344344 64547004 99.47 1832 188956 18346860 0.61 57659208 2826504 4692
09:10:01 347712 64543636 99.46 1816 204228 18306036 0.61 57640468 2828728 4180
09:20:02 363428 64527920 99.44 1616 150076 19041536 0.64 57638924 2818804 4048
09:30:01 394796 64496552 99.39 1920 144876 18317028 0.61 57612308 2826476 176
09:40:01 391140 64500208 99.40 35384 109440 18318936 0.61 57577156 2862968 188
09:50:01 369640 64521708 99.43 35228 130556 18312972 0.61 57596304 2858088 184
Average: 386563 64504785 99.40 22128 537358 18036891 0.60 57367511 3047436 32011
Now, with the following settings, our performance problems also
vanished completely:
sysctl -w vm.swappiness=100
sysctl -w vm.dirty_background_ratio=80
sysctl -w vm.dirty_ratio=90
Thank you,
Andriy
On Fri, Aug 12, 2016 at 9:52 PM, Andriy Tkachuk
<andriy.tkachuk@seagate.com> wrote:
> Hi,
>
> our user-space application uses a large amount of anon pages (a private
> mapping of a large file that exceeds the 64GB of RAM available in the
> system) which are rarely accessed and are supposed to be swapped out.
> Instead, we see that most of these pages are kept in memory while the
> system suffers from a lack of free memory and poor overall performance
> (especially disk I/O; vm.swappiness=100 does not help). kswapd
> scans millions of pages per second but reclaims only hundreds per second.
> Here are 5-second-interval snapshots of some counters:
>
> $ egrep 'Cached|nr_.*active_anon|pgsteal_.*_normal|pgscan_kswapd_normal|pgrefill_normal|nr_vmscan_write|nr_swap|pgact' proc-*-0616-1605[345]* | sed 's/:/ /' | sort -sk 2,2
> proc-meminfo-0616-160539.txt Cached: 347936 kB
> proc-meminfo-0616-160549.txt Cached: 316316 kB
> proc-meminfo-0616-160559.txt Cached: 322264 kB
> proc-meminfo-0616-160539.txt SwapCached: 2853064 kB
> proc-meminfo-0616-160549.txt SwapCached: 2853168 kB
> proc-meminfo-0616-160559.txt SwapCached: 2853280 kB
> proc-vmstat-0616-160535.txt nr_active_anon 14508616
> proc-vmstat-0616-160545.txt nr_active_anon 14513725
> proc-vmstat-0616-160555.txt nr_active_anon 14515197
> proc-vmstat-0616-160535.txt nr_inactive_anon 747407
> proc-vmstat-0616-160545.txt nr_inactive_anon 744846
> proc-vmstat-0616-160555.txt nr_inactive_anon 744509
> proc-vmstat-0616-160535.txt nr_vmscan_write 5589095
> proc-vmstat-0616-160545.txt nr_vmscan_write 5589097
> proc-vmstat-0616-160555.txt nr_vmscan_write 5589097
> proc-vmstat-0616-160535.txt pgactivate 246016824
> proc-vmstat-0616-160545.txt pgactivate 246033242
> proc-vmstat-0616-160555.txt pgactivate 246042064
> proc-vmstat-0616-160535.txt pgrefill_normal 22763262
> proc-vmstat-0616-160545.txt pgrefill_normal 22768020
> proc-vmstat-0616-160555.txt pgrefill_normal 22768178
> proc-vmstat-0616-160535.txt pgscan_kswapd_normal 111985367420
> proc-vmstat-0616-160545.txt pgscan_kswapd_normal 111996845554
> proc-vmstat-0616-160555.txt pgscan_kswapd_normal 112028276639
> proc-vmstat-0616-160535.txt pgsteal_direct_normal 344064
> proc-vmstat-0616-160545.txt pgsteal_direct_normal 344064
> proc-vmstat-0616-160555.txt pgsteal_direct_normal 344064
> proc-vmstat-0616-160535.txt pgsteal_kswapd_normal 53817848
> proc-vmstat-0616-160545.txt pgsteal_kswapd_normal 53818626
> proc-vmstat-0616-160555.txt pgsteal_kswapd_normal 53818637
>
> The pgrefill_normal and pgactivate counters show that only a few
> hundred pages per second move between the active and inactive lists,
> which is comparable with what was reclaimed. So it looks like kswapd
> mostly scans the pages from the inactive list in a loop and does
> not even get a chance to look at the pages on the active list
> (where most of the application's anon pages are located).
>
> The kernel version: linux-3.10.0-229.14.1.el7.
>
> Any ideas? Would it be useful to change inactive_ratio dynamically in
> such cases so that more pages could be moved from the active to the
> inactive list and get a chance to be reclaimed? (Note: when the
> application is restarted, the problem disappears for a while (days)
> until the corresponding number of privately mapped pages are dirtied
> again.)
>
> Thank you,
> Andriy
* Re: mm: kswapd struggles reclaiming the pages on 64GB server
2016-08-22 18:37 ` Andriy Tkachuk
@ 2016-08-22 22:46 ` Andriy Tkachuk
2016-08-23 15:14 ` Andriy Tkachuk
0 siblings, 1 reply; 11+ messages in thread
From: Andriy Tkachuk @ 2016-08-22 22:46 UTC (permalink / raw)
To: linux-kernel; +Cc: Mel Gorman
On Mon, Aug 22, 2016 at 7:37 PM, Andriy Tkachuk
<andriy.tkachuk@seagate.com> wrote:
>
> The following patch resolved the problem:
> ...
Sorry, I was too hasty in sending the good news. As it turns out, the
problem is still there:
06:10:01 PM pgpgin/s pgpgout/s fault/s majflt/s pgfree/s pgscank/s pgscand/s pgsteal/s %vmeff
06:20:01 PM 78.72 1408298.79 3394.73 0.54 384043.36 204.96 10.88 148.06 68.60
06:30:01 PM 51.71 1408936.46 3101.97 0.31 384366.70 211.51 19.30 148.59 64.38
06:40:01 PM 65.29 1366755.43 3506.76 0.65 372628.96 383.94 27.11 154.71 37.64
06:50:01 PM 66.89 1376770.04 3544.59 0.78 375676.38 344.94 12.48 149.84 41.92
07:00:01 PM 15.87 1364029.92 3030.66 0.31 371903.73 307.63 13.32 132.75 41.36
07:10:01 PM 134.24 1370087.74 3397.32 1.06 373435.85 394.16 21.16 166.62 40.12
07:20:01 PM 95.27 1379686.06 3462.59 0.74 376050.98 356.59 29.23 159.00 41.21
07:30:01 PM 12.66 1392306.35 2929.64 0.25 379662.45 395.07 28.66 138.58 32.71
07:40:01 PM 142.06 1368613.00 3486.30 0.98 372912.77 575.74 40.40 174.45 28.31
07:50:02 PM 119.77 1389287.62 3562.60 1.02 378661.56 584.00 29.36 167.92 27.38
08:00:01 PM 7.31 1368140.71 3082.65 0.50 372828.37 711.84 58.08 130.82 16.99
08:10:01 PM 174.74 1381379.24 3529.78 1.65 376567.77 1023.07 46.96 182.60 17.07
08:20:01 PM 205.05 1353845.50 3546.95 1.88 368828.37 1807.24 57.29 183.46 9.84
08:30:01 PM 86.50 1375125.95 3182.56 1.41 374676.32 10590.52 217.62 148.83 1.38
08:40:01 PM 359.38 1345209.72 4185.55 3.45 366630.54 1419665.54 43307.32 220.06 0.02
08:50:01 PM 277.27 1324076.64 3889.98 2.51 360915.41 1779132.87 410.38 200.49 0.01
09:00:02 PM 96.48 1364092.11 3235.53 1.69 372031.87 1973266.99 167.74 163.43 0.01
09:10:01 PM 298.87 1324763.68 3914.11 2.90 360957.82 1772066.29 435.17 204.08 0.01
And I also see that the inactive file list is bigger than the active
file list:
# cat /proc/vmstat | egrep 'Cached|nr_.*active_|pgsteal_.*_normal|pgscan_kswapd_normal|pgrefill_normal|nr_vmscan_write|nr_swap|pgact'
nr_inactive_anon 695534
nr_active_anon 14427464
nr_inactive_file 2786
nr_active_file 2698
nr_vmscan_write 1740097
pgactivate 115697891
pgrefill_normal 33345818
pgsteal_kswapd_normal 367908859
pgsteal_direct_normal 681266
pgscan_kswapd_normal 10255454426
So it looks like it is really time to try Minchan's patch (which
disables the check that the inactive file list is bigger than the
active file list)...
Andriy
* Re: mm: kswapd struggles reclaiming the pages on 64GB server
2016-08-22 22:46 ` Andriy Tkachuk
@ 2016-08-23 15:14 ` Andriy Tkachuk
2016-08-31 13:27 ` Andriy Tkachuk
0 siblings, 1 reply; 11+ messages in thread
From: Andriy Tkachuk @ 2016-08-23 15:14 UTC (permalink / raw)
To: linux-kernel
Well, as it turns out, the patch did not affect the problem at all,
since the memory cgroup was enabled (in which case the zone's
inactive_ratio is not used; instead, the ratio is calculated directly
in mem_cgroup_inactive_anon_is_low()). So the patch will be retested
with the memory cgroup off.
Andriy
On Mon, Aug 22, 2016 at 11:46 PM, Andriy Tkachuk
<andriy.tkachuk@seagate.com> wrote:
> On Mon, Aug 22, 2016 at 7:37 PM, Andriy Tkachuk
> <andriy.tkachuk@seagate.com> wrote:
>>
>> The following patch resolved the problem:
>> ...
>
> Sorry, I was too hasty in sending the good news. As it turns out, the
> problem is still there:
>
* Re: mm: kswapd struggles reclaiming the pages on 64GB server
2016-08-23 15:14 ` Andriy Tkachuk
@ 2016-08-31 13:27 ` Andriy Tkachuk
0 siblings, 0 replies; 11+ messages in thread
From: Andriy Tkachuk @ 2016-08-31 13:27 UTC (permalink / raw)
To: linux-kernel
Alright: after disabling the memory cgroup, everything works perfectly
with the patch, even with default vm parameters.
Here are some vmstat results to compare. Now:
# vmstat 60
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
 4 0 67606176 375196 38708 1385896 0 74 23 1266751 198073 103648 6 7 86 1 0
 3 0 67647904 394872 38612 1371200 0 695 18 1371067 212143 93917 7 8 85 1 0
 2 0 67648016 375796 38676 1382812 1 2 13 1356271 215123 115987 6 7 85 1 0
 3 0 67657392 378336 38744 1383468 1 157 15 1383591 213694 102457 6 7 86 1 0
 6 0 67659088 367856 38796 1388696 1 28 26 1330238 208377 111469 6 7 86 1 0
 2 0 67701344 407320 38680 1371004 0 704 34 1255911 203308 126458 8 8 82 3 0
 4 0 67711920 402296 38776 1380836 0 176 8 1308525 201451 93053 6 7 86 1 0
 8 0 67721264 376676 38872 1394816 0 156 14 1409726 218269 108127 7 8 85 1 0
18 0 67753872 395568 38896 1397144 0 544 16 1288576 201680 105980 6 7 86 1 0
 2 0 67755544 362960 38992 1411744 0 28 17 1458453 232544 127088 6 7 85 1 0
 4 0 67784056 376684 39088 1410924 0 475 25 1385831 218800 110344 6 7 85 1 0
 2 0 67816104 393108 38800 1384108 1 535 17 1336857 208551 105872 6 7 85 1 0
 7 0 67816104 399492 38820 1387096 0 0 17 1280630 205478 109499 6 7 86 1 0
 1 0 67821648 375284 38908 1397132 1 93 15 1343042 208363 98031 6 7 85 1 0
 1 0 67823512 363828 38924 1402388 0 31 15 1366995 212606 101328 6 7 85 1 0
 5 0 67864264 416720 38784 1374480 1 680 21 1372581 210256 95369 7 8 83 3 0
Swapping works smoothly, more than enough memory is available for
caching, and I/O wait (the wa column) is about 1.
Before:
# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
 3 2 13755748 334968 2140 63780 6684 0 7644 21 3122 7704 0 9 83 8 0
 2 2 13760380 333628 2140 62468 4572 7764 4764 9129 3326 8678 0 10 83 7 0
 2 2 13761072 332888 2140 62608 4576 4256 4616 4470 3377 8906 0 10 82 7 0
 2 2 13760812 341532 2148 62644 5388 3532 5996 3996 3451 7521 0 10 83 7 0
 3 3 13757648 335116 2148 62944 6176 0 6480 238 3412 8905 0 10 83 7 0
 2 2 13752936 331908 2148 62336 7488 0 7628 201 3433 7483 0 10 83 7 0
 2 2 13752520 344428 2148 69412 5292 2160 15820 2324 7254 15960 0 11 82 7 0
 3 2 13750856 338056 2148 69864 5576 0 5984 28 3384 8060 0 10 84 6 0
 2 2 13748836 331516 2156 70116 6076 0 6376 44 3683 6941 2 10 82 6 0
 2 2 13750184 335732 2148 70764 3544 2664 4252 2692 3682 8435 3 10 83 4 0
 2 4 13747528 338492 2144 70872 9520 3152 9688 3176 4846 7013 1 10 82 7 0
 3 2 13756580 341752 2144 71060 9020 14740 9148 14764 4167 8024 1 10 80 9 0
 2 2 13749484 336900 2144 71504 6444 0 6916 24 3613 8472 1 10 82 7 0
 2 2 13740560 333148 2152 72480 6932 0 7952 44 3891 6819 1 10 82 7 0
 2 2 13734456 330896 2148 72920 12228 1736 12488 1764 3454 9321 2 9 82 8 0
The system got into classic thrashing and never came out of it.
Now:
# cat /proc/vmstat | egrep 'nr_.*active_|pg(steal|scan|refill).*_normal|nr_vmscan_write|nr_swap|pgact'
nr_inactive_anon 7546598
nr_active_anon 7547226
nr_inactive_file 175973
nr_active_file 179439
nr_vmscan_write 17862257
pgactivate 213529452
pgrefill_normal 50400148
pgsteal_kswapd_normal 55904846
pgsteal_direct_normal 2417827
pgscan_kswapd_normal 76263257
pgscan_direct_normal 3213568
Before:
# cat /proc/vmstat | egrep 'nr_.*active_|pg(steal|scan|refill).*_normal|nr_vmscan_write|nr_swap|pgact'
nr_inactive_anon 695534
nr_active_anon 14427464
nr_inactive_file 2786
nr_active_file 2698
nr_vmscan_write 1740097
pgactivate 115697891
pgrefill_normal 33345818
pgsteal_kswapd_normal 367908859
pgsteal_direct_normal 681266
pgscan_kswapd_normal 10255454426
Here is the patch again for convenience:
--- linux-3.10.0-229.20.1.el7.x86_64.orig/mm/page_alloc.c	2015-09-24 15:47:25.000000000 +0000
+++ linux-3.10.0-229.20.1.el7.x86_64/mm/page_alloc.c	2016-08-15 09:49:46.922240569 +0000
@@ -5592,16 +5592,7 @@
*/
static void __meminit calculate_zone_inactive_ratio(struct zone *zone)
{
- unsigned int gb, ratio;
-
- /* Zone size in gigabytes */
- gb = zone->managed_pages >> (30 - PAGE_SHIFT);
- if (gb)
- ratio = int_sqrt(10 * gb);
- else
- ratio = 1;
-
- zone->inactive_ratio = ratio;
+ zone->inactive_ratio = 1;
}
Hope it will help someone facing similar problems.
Regards,
Andriy
On Tue, Aug 23, 2016 at 4:14 PM, Andriy Tkachuk
<andriy.tkachuk@seagate.com> wrote:
> Well, as it turns out, the patch did not affect the problem at all,
> since the memory cgroup was enabled (in which case the zone's
> inactive_ratio is not used; instead, the ratio is calculated directly
> in mem_cgroup_inactive_anon_is_low()). So the patch will be retested
> with the memory cgroup off.
>
> Andriy
>
> On Mon, Aug 22, 2016 at 11:46 PM, Andriy Tkachuk
> <andriy.tkachuk@seagate.com> wrote:
>> On Mon, Aug 22, 2016 at 7:37 PM, Andriy Tkachuk
>> <andriy.tkachuk@seagate.com> wrote:
>>>
>>> The following patch resolved the problem:
>>> ...
>>
>> Sorry, I was too hasty in sending the good news. As it turns out, the
>> problem is still there:
>>
2016-08-12 20:52 mm: kswapd struggles reclaiming the pages on 64GB server Andriy Tkachuk
2016-08-17 11:43 ` Michal Hocko
2016-08-17 11:43 ` Michal Hocko
2016-08-22 18:16 ` Andriy Tkachuk
2016-08-22 18:16 ` Andriy Tkachuk
[not found] ` <20160818140513.GA2453@blaptop>
2016-08-18 17:43 ` Andriy Tkachuk
2016-08-22 18:10 ` Andriy Tkachuk
2016-08-22 18:37 ` Andriy Tkachuk
2016-08-22 22:46 ` Andriy Tkachuk
2016-08-23 15:14 ` Andriy Tkachuk
2016-08-31 13:27 ` Andriy Tkachuk