From: Andriy Tkachuk
Date: Wed, 31 Aug 2016 14:27:21 +0100
Subject: Re: mm: kswapd struggles reclaiming the pages on 64GB server
To: linux-kernel@vger.kernel.org

Alright - after disabling the memory cgroup, everything works perfectly
with the patch, even with the default vm parameters. Here are some vmstat
results to compare.

Now:

# vmstat 60
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 4  0 67606176 375196 38708 1385896 0 74 23 1266751 198073 103648 6 7 86 1 0
 3  0 67647904 394872 38612 1371200 0 695 18 1371067 212143 93917 7 8 85 1 0
 2  0 67648016 375796 38676 1382812 1 2 13 1356271 215123 115987 6 7 85 1 0
 3  0 67657392 378336 38744 1383468 1 157 15 1383591 213694 102457 6 7 86 1 0
 6  0 67659088 367856 38796 1388696 1 28 26 1330238 208377 111469 6 7 86 1 0
 2  0 67701344 407320 38680 1371004 0 704 34 1255911 203308 126458 8 8 82 3 0
 4  0 67711920 402296 38776 1380836 0 176 8 1308525 201451 93053 6 7 86 1 0
 8  0 67721264 376676 38872 1394816 0 156 14 1409726 218269 108127 7 8 85 1 0
18  0 67753872 395568 38896 1397144 0 544 16 1288576 201680 105980 6 7 86 1 0
 2  0 67755544 362960 38992 1411744 0 28 17 1458453 232544 127088 6 7 85 1 0
 4  0 67784056 376684 39088 1410924 0 475 25 1385831 218800 110344 6 7 85 1 0
 2  0 67816104 393108 38800 1384108 1 535 17 1336857 208551 105872 6 7 85 1 0
 7  0 67816104 399492 38820 1387096 0 0 17 1280630 205478 109499 6 7 86 1 0
 1  0 67821648 375284 38908 1397132 1 93 15 1343042 208363 98031 6 7 85 1 0
 1  0 67823512 363828 38924 1402388 0 31 15 1366995 212606 101328 6 7 85 1 0
 5  0 67864264 416720 38784 1374480 1 680 21 1372581 210256 95369 7 8 83 3 0

Swapping works smoothly, there is more than enough memory left for caching,
and CPU wait stays at about 1%.
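(For anyone who wants to watch the same behaviour on their own box over time,
here is a rough sketch - the 60-second interval and the counter selection are
arbitrary, vmstat itself was simply run as shown above and below:)

while sleep 60; do
    date
    # anon LRU balance plus kswapd reclaim activity, same counters as quoted further down
    egrep 'nr_(in)?active_anon|nr_vmscan_write|pgscan_kswapd' /proc/vmstat
done
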
Before:

# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 3  2 13755748 334968 2140 63780 6684 0 7644 21 3122 7704 0 9 83 8 0
 2  2 13760380 333628 2140 62468 4572 7764 4764 9129 3326 8678 0 10 83 7 0
 2  2 13761072 332888 2140 62608 4576 4256 4616 4470 3377 8906 0 10 82 7 0
 2  2 13760812 341532 2148 62644 5388 3532 5996 3996 3451 7521 0 10 83 7 0
 3  3 13757648 335116 2148 62944 6176 0 6480 238 3412 8905 0 10 83 7 0
 2  2 13752936 331908 2148 62336 7488 0 7628 201 3433 7483 0 10 83 7 0
 2  2 13752520 344428 2148 69412 5292 2160 15820 2324 7254 15960 0 11 82 7 0
 3  2 13750856 338056 2148 69864 5576 0 5984 28 3384 8060 0 10 84 6 0
 2  2 13748836 331516 2156 70116 6076 0 6376 44 3683 6941 2 10 82 6 0
 2  2 13750184 335732 2148 70764 3544 2664 4252 2692 3682 8435 3 10 83 4 0
 2  4 13747528 338492 2144 70872 9520 3152 9688 3176 4846 7013 1 10 82 7 0
 3  2 13756580 341752 2144 71060 9020 14740 9148 14764 4167 8024 1 10 80 9 0
 2  2 13749484 336900 2144 71504 6444 0 6916 24 3613 8472 1 10 82 7 0
 2  2 13740560 333148 2152 72480 6932 0 7952 44 3891 6819 1 10 82 7 0
 2  2 13734456 330896 2148 72920 12228 1736 12488 1764 3454 9321 2 9 82 8 0

The system got into classic thrashing, from which it never recovered.

Now:

# cat /proc/vmstat | egrep 'nr_.*active_|pg(steal|scan|refill).*_normal|nr_vmscan_write|nr_swap|pgact'
nr_inactive_anon 7546598
nr_active_anon 7547226
nr_inactive_file 175973
nr_active_file 179439
nr_vmscan_write 17862257
pgactivate 213529452
pgrefill_normal 50400148
pgsteal_kswapd_normal 55904846
pgsteal_direct_normal 2417827
pgscan_kswapd_normal 76263257
pgscan_direct_normal 3213568

Before:

# cat /proc/vmstat | egrep 'nr_.*active_|pg(steal|scan|refill).*_normal|nr_vmscan_write|nr_swap|pgact'
nr_inactive_anon 695534
nr_active_anon 14427464
nr_inactive_file 2786
nr_active_file 2698
nr_vmscan_write 1740097
pgactivate 115697891
pgrefill_normal 33345818
pgsteal_kswapd_normal 367908859
pgsteal_direct_normal 681266
pgscan_kswapd_normal 10255454426

Here is the patch again for convenience:

--- linux-3.10.0-229.20.1.el7.x86_64.orig/mm/page_alloc.c       2015-09-24 15:47:25.000000000 +0000
+++ linux-3.10.0-229.20.1.el7.x86_64/mm/page_alloc.c    2016-08-15 09:49:46.922240569 +0000
@@ -5592,16 +5592,7 @@
  */
 static void __meminit calculate_zone_inactive_ratio(struct zone *zone)
 {
-        unsigned int gb, ratio;
-
-        /* Zone size in gigabytes */
-        gb = zone->managed_pages >> (30 - PAGE_SHIFT);
-        if (gb)
-                ratio = int_sqrt(10 * gb);
-        else
-                ratio = 1;
-
-        zone->inactive_ratio = ratio;
+        zone->inactive_ratio = 1;
 }

Hope this helps someone facing similar problems.

Regards,
  Andriy

On Tue, Aug 23, 2016 at 4:14 PM, Andriy Tkachuk wrote:
> Well, as it turned out, the patch did not affect the problem at all, since
> the memory cgroup was on (in which case the zone's inactive_ratio is not
> used; the ratio is calculated directly in
> mem_cgroup_inactive_anon_is_low()). So the patch will be retested with
> the memory cgroup off.
>
> Andriy
>
> On Mon, Aug 22, 2016 at 11:46 PM, Andriy Tkachuk wrote:
>> On Mon, Aug 22, 2016 at 7:37 PM, Andriy Tkachuk wrote:
>>>
>>> The following patch resolved the problem:
>>> ...
>>
>> Sorry, I was too hasty in sending the good news. As it turned out, the
>> problem is still there:
>>
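P.S. Just as a back-of-the-envelope check (my own arithmetic, not kernel
code): re-running the formula that the patch removes shows why a box of this
size suffers. A 64GB zone gets inactive_ratio = int_sqrt(10 * 64) = 25, and,
if I read vmscan's inactive_anon_is_low() correctly, the inactive anon list
is then only refilled once it falls below roughly 1/26 of all anon pages -
which is about what the "before" counters above show (nr_inactive_anon is
around 1/22 of all anon there). A few zone sizes for comparison:

for gb in 1 4 16 64; do
    awk -v gb="$gb" 'BEGIN {
        ratio = int(sqrt(10 * gb));   # same arithmetic as int_sqrt(10 * gb) in the stock code
        if (ratio < 1) ratio = 1;
        printf("zone %3d GB -> inactive_ratio %2d (inactive anon target ~1/%d of anon)\n", gb, ratio, ratio + 1);
    }'
done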