From: Andriy Tkachuk
Date: Fri, 12 Aug 2016 21:52:20 +0100
Subject: mm: kswapd struggles reclaiming the pages on 64GB server
To: linux-kernel@vger.kernel.org
Cc: Mel Gorman

Hi,

Our user-space application uses a large amount of anon pages (a private
mapping of a large file, bigger than the 64GB of RAM available in the
system) which are rarely accessed and are supposed to be swapped out.
Instead, we see that most of these pages are kept in memory while the
system suffers from a lack of free memory and poor overall performance
(especially disk I/O; vm.swappiness=100 does not help). kswapd scans
millions of pages per second but reclaims only hundreds per second.
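A quick way to watch the scan-vs-reclaim rate is to diff the per-zone
kswapd counters in /proc/vmstat; a rough sketch of what we run (the
5-second interval is just what we use for the snapshots below):

#!/bin/sh
# Print approximate per-second kswapd scan and reclaim rates for the
# Normal zone by diffing /proc/vmstat counters every 5 seconds.
read_ctr() { awk -v c="$1" '$1 == c {print $2}' /proc/vmstat; }

prev_scan=$(read_ctr pgscan_kswapd_normal)
prev_steal=$(read_ctr pgsteal_kswapd_normal)
while sleep 5; do
    scan=$(read_ctr pgscan_kswapd_normal)
    steal=$(read_ctr pgsteal_kswapd_normal)
    echo "scanned/s: $(( (scan - prev_scan) / 5 ))   reclaimed/s: $(( (steal - prev_steal) / 5 ))"
    prev_scan=$scan
    prev_steal=$steal
done

From the counter deltas in the snapshots below, that works out to on the
order of a million pages scanned per second against under a hundred
actually reclaimed.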
Here are 5-second-interval snapshots of some counters:

$ egrep 'Cached|nr_.*active_anon|pgsteal_.*_normal|pgscan_kswapd_normal|pgrefill_normal|nr_vmscan_write|nr_swap|pgact' proc-*-0616-1605[345]* | sed 's/:/ /' | sort -sk 2,2
proc-meminfo-0616-160539.txt Cached: 347936 kB
proc-meminfo-0616-160549.txt Cached: 316316 kB
proc-meminfo-0616-160559.txt Cached: 322264 kB
proc-meminfo-0616-160539.txt SwapCached: 2853064 kB
proc-meminfo-0616-160549.txt SwapCached: 2853168 kB
proc-meminfo-0616-160559.txt SwapCached: 2853280 kB
proc-vmstat-0616-160535.txt nr_active_anon 14508616
proc-vmstat-0616-160545.txt nr_active_anon 14513725
proc-vmstat-0616-160555.txt nr_active_anon 14515197
proc-vmstat-0616-160535.txt nr_inactive_anon 747407
proc-vmstat-0616-160545.txt nr_inactive_anon 744846
proc-vmstat-0616-160555.txt nr_inactive_anon 744509
proc-vmstat-0616-160535.txt nr_vmscan_write 5589095
proc-vmstat-0616-160545.txt nr_vmscan_write 5589097
proc-vmstat-0616-160555.txt nr_vmscan_write 5589097
proc-vmstat-0616-160535.txt pgactivate 246016824
proc-vmstat-0616-160545.txt pgactivate 246033242
proc-vmstat-0616-160555.txt pgactivate 246042064
proc-vmstat-0616-160535.txt pgrefill_normal 22763262
proc-vmstat-0616-160545.txt pgrefill_normal 22768020
proc-vmstat-0616-160555.txt pgrefill_normal 22768178
proc-vmstat-0616-160535.txt pgscan_kswapd_normal 111985367420
proc-vmstat-0616-160545.txt pgscan_kswapd_normal 111996845554
proc-vmstat-0616-160555.txt pgscan_kswapd_normal 112028276639
proc-vmstat-0616-160535.txt pgsteal_direct_normal 344064
proc-vmstat-0616-160545.txt pgsteal_direct_normal 344064
proc-vmstat-0616-160555.txt pgsteal_direct_normal 344064
proc-vmstat-0616-160535.txt pgsteal_kswapd_normal 53817848
proc-vmstat-0616-160545.txt pgsteal_kswapd_normal 53818626
proc-vmstat-0616-160555.txt pgsteal_kswapd_normal 53818637

The pgrefill_normal and pgactivate counters show that only a few hundred
pages per second move between the active and inactive lists, which is
comparable with what was actually reclaimed. So it looks like kswapd
mostly scans the pages on the inactive list over and over in a loop and
never gets a chance to look at the pages on the active list (where most
of the application's anon pages sit).

The kernel version: linux-3.10.0-229.14.1.el7.

Any ideas? Would it be useful to change inactive_ratio dynamically in
such cases, so that more pages could be moved from the active list to
the inactive list and get a chance to be reclaimed? (A back-of-the-envelope
check of that ratio follows below my signature.)

(Note: when the application is restarted, the problem disappears for a
while (days), until a corresponding number of privately mapped pages are
dirtied again.)

Thank you,
  Andriy
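P.S. My (possibly wrong) reading of how this kernel sizes the anon lists:
each zone gets inactive_ratio = int_sqrt(10 * zone_size_in_GB), and the
anon active list is only shrunk when active > inactive * inactive_ratio.
A rough check against the numbers above, guessing a ~60GB Normal zone
(the zone size is an assumption, and so is the formula if I misremember it):

$ awk -v zone_gb=60 -v active=14515197 -v inactive=744509 'BEGIN {
      # inactive_ratio heuristic: int_sqrt(10 * zone size in GB), minimum 1
      ratio = int(sqrt(10 * zone_gb)); if (ratio < 1) ratio = 1;
      printf "inactive_ratio=%d active/inactive=%.1f deactivate=%s\n",
             ratio, active / inactive, (active > inactive * ratio) ? "yes" : "no";
  }'
inactive_ratio=24 active/inactive=19.5 deactivate=no

If that arithmetic is right, the active anon list is never considered too
large here, which would be consistent with pgrefill/pgactivate barely moving.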