From: Andriy Tkachuk
Date: Wed, 31 Aug 2016 14:27:21 +0100
Subject: Re: mm: kswapd struggles reclaiming the pages on 64GB server
To: linux-kernel@vger.kernel.org

Alright - after disabling the memory cgroup, everything works perfectly
with the patch, even with the default vm parameters. Here are some vmstat
results to compare.

Now:

# vmstat 60
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 4  0 67606176 375196 38708 1385896 0 74 23 1266751 198073 103648 6 7 86 1 0
 3  0 67647904 394872 38612 1371200 0 695 18 1371067 212143 93917 7 8 85 1 0
 2  0 67648016 375796 38676 1382812 1 2 13 1356271 215123 115987 6 7 85 1 0
 3  0 67657392 378336 38744 1383468 1 157 15 1383591 213694 102457 6 7 86 1 0
 6  0 67659088 367856 38796 1388696 1 28 26 1330238 208377 111469 6 7 86 1 0
 2  0 67701344 407320 38680 1371004 0 704 34 1255911 203308 126458 8 8 82 3 0
 4  0 67711920 402296 38776 1380836 0 176 8 1308525 201451 93053 6 7 86 1 0
 8  0 67721264 376676 38872 1394816 0 156 14 1409726 218269 108127 7 8 85 1 0
18  0 67753872 395568 38896 1397144 0 544 16 1288576 201680 105980 6 7 86 1 0
 2  0 67755544 362960 38992 1411744 0 28 17 1458453 232544 127088 6 7 85 1 0
 4  0 67784056 376684 39088 1410924 0 475 25 1385831 218800 110344 6 7 85 1 0
 2  0 67816104 393108 38800 1384108 1 535 17 1336857 208551 105872 6 7 85 1 0
 7  0 67816104 399492 38820 1387096 0 0 17 1280630 205478 109499 6 7 86 1 0
 1  0 67821648 375284 38908 1397132 1 93 15 1343042 208363 98031 6 7 85 1 0
 1  0 67823512 363828 38924 1402388 0 31 15 1366995 212606 101328 6 7 85 1 0
 5  0 67864264 416720 38784 1374480 1 680 21 1372581 210256 95369 7 8 83 3 0

Swapping works smoothly, there is more than enough memory left for caching,
and CPU wait stays at about 1%.
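(For anyone who wants to watch the same behaviour on their own box over time,
here is a rough sketch - the 60-second interval and the counter selection are
arbitrary, vmstat itself was simply run as shown above and below:)

while sleep 60; do
    date
    # anon LRU balance plus kswapd reclaim activity, same counters as quoted further down
    egrep 'nr_(in)?active_anon|nr_vmscan_write|pgscan_kswapd' /proc/vmstat
done
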
Before:

# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 3  2 13755748 334968 2140 63780 6684 0 7644 21 3122 7704 0 9 83 8 0
 2  2 13760380 333628 2140 62468 4572 7764 4764 9129 3326 8678 0 10 83 7 0
 2  2 13761072 332888 2140 62608 4576 4256 4616 4470 3377 8906 0 10 82 7 0
 2  2 13760812 341532 2148 62644 5388 3532 5996 3996 3451 7521 0 10 83 7 0
 3  3 13757648 335116 2148 62944 6176 0 6480 238 3412 8905 0 10 83 7 0
 2  2 13752936 331908 2148 62336 7488 0 7628 201 3433 7483 0 10 83 7 0
 2  2 13752520 344428 2148 69412 5292 2160 15820 2324 7254 15960 0 11 82 7 0
 3  2 13750856 338056 2148 69864 5576 0 5984 28 3384 8060 0 10 84 6 0
 2  2 13748836 331516 2156 70116 6076 0 6376 44 3683 6941 2 10 82 6 0
 2  2 13750184 335732 2148 70764 3544 2664 4252 2692 3682 8435 3 10 83 4 0
 2  4 13747528 338492 2144 70872 9520 3152 9688 3176 4846 7013 1 10 82 7 0
 3  2 13756580 341752 2144 71060 9020 14740 9148 14764 4167 8024 1 10 80 9 0
 2  2 13749484 336900 2144 71504 6444 0 6916 24 3613 8472 1 10 82 7 0
 2  2 13740560 333148 2152 72480 6932 0 7952 44 3891 6819 1 10 82 7 0
 2  2 13734456 330896 2148 72920 12228 1736 12488 1764 3454 9321 2 9 82 8 0

The system got into classic thrashing, from which it never recovered.

Now:

# cat /proc/vmstat | egrep 'nr_.*active_|pg(steal|scan|refill).*_normal|nr_vmscan_write|nr_swap|pgact'
nr_inactive_anon 7546598
nr_active_anon 7547226
nr_inactive_file 175973
nr_active_file 179439
nr_vmscan_write 17862257
pgactivate 213529452
pgrefill_normal 50400148
pgsteal_kswapd_normal 55904846
pgsteal_direct_normal 2417827
pgscan_kswapd_normal 76263257
pgscan_direct_normal 3213568

Before:

# cat /proc/vmstat | egrep 'nr_.*active_|pg(steal|scan|refill).*_normal|nr_vmscan_write|nr_swap|pgact'
nr_inactive_anon 695534
nr_active_anon 14427464
nr_inactive_file 2786
nr_active_file 2698
nr_vmscan_write 1740097
pgactivate 115697891
pgrefill_normal 33345818
pgsteal_kswapd_normal 367908859
pgsteal_direct_normal 681266
pgscan_kswapd_normal 10255454426

Here is the patch again for convenience:

--- linux-3.10.0-229.20.1.el7.x86_64.orig/mm/page_alloc.c       2015-09-24 15:47:25.000000000 +0000
+++ linux-3.10.0-229.20.1.el7.x86_64/mm/page_alloc.c    2016-08-15 09:49:46.922240569 +0000
@@ -5592,16 +5592,7 @@
  */
 static void __meminit calculate_zone_inactive_ratio(struct zone *zone)
 {
-        unsigned int gb, ratio;
-
-        /* Zone size in gigabytes */
-        gb = zone->managed_pages >> (30 - PAGE_SHIFT);
-        if (gb)
-                ratio = int_sqrt(10 * gb);
-        else
-                ratio = 1;
-
-        zone->inactive_ratio = ratio;
+        zone->inactive_ratio = 1;
 }

Hope this helps someone facing similar problems.

Regards,
  Andriy

On Tue, Aug 23, 2016 at 4:14 PM, Andriy Tkachuk wrote:
> Well, as it turned out, the patch did not affect the problem at all, since
> the memory cgroup was on (in which case the zone's inactive_ratio is not
> used; the ratio is calculated directly in
> mem_cgroup_inactive_anon_is_low()). So the patch will be retested with
> the memory cgroup off.
>
> Andriy
>
> On Mon, Aug 22, 2016 at 11:46 PM, Andriy Tkachuk wrote:
>> On Mon, Aug 22, 2016 at 7:37 PM, Andriy Tkachuk wrote:
>>>
>>> The following patch resolved the problem:
>>> ...
>>
>> Sorry, I was too hasty in sending the good news. As it turned out, the
>> problem is still there:
>>
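P.S. Just as a back-of-the-envelope check (my own arithmetic, not kernel
code): re-running the formula that the patch removes shows why a box of this
size suffers. A 64GB zone gets inactive_ratio = int_sqrt(10 * 64) = 25, and,
if I read vmscan's inactive_anon_is_low() correctly, the inactive anon list
is then only refilled once it falls below roughly 1/26 of all anon pages -
which is about what the "before" counters above show (nr_inactive_anon is
around 1/22 of all anon there). A few zone sizes for comparison:

for gb in 1 4 16 64; do
    awk -v gb="$gb" 'BEGIN {
        ratio = int(sqrt(10 * gb));   # same arithmetic as int_sqrt(10 * gb) in the stock code
        if (ratio < 1) ratio = 1;
        printf("zone %3d GB -> inactive_ratio %2d (inactive anon target ~1/%d of anon)\n", gb, ratio, ratio + 1);
    }'
done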