Date: Fri, 10 Feb 2017 10:01:02 -0800
From: Shaohua Li
To: Michal Hocko
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Kernel-team@fb.com, danielmicay@gmail.com, minchan@kernel.org, hughd@google.com, hannes@cmpxchg.org, riel@redhat.com, mgorman@techsingularity.net, akpm@linux-foundation.org
Subject: Re: [PATCH V2 7/7] mm: add a separate RSS for MADV_FREE pages
Message-ID: <20170210180101.GF86050@shli-mbp.local>
References: <123396e3b523e8716dfc6fc87a5cea0c124ff29d.1486163864.git.shli@fb.com> <20170210133504.GO10893@dhcp22.suse.cz>
In-Reply-To: <20170210133504.GO10893@dhcp22.suse.cz>
On Fri, Feb 10, 2017 at 02:35:05PM +0100, Michal Hocko wrote:
> On Fri 03-02-17 15:33:23, Shaohua Li wrote:
> > Add a separate RSS for MADV_FREE pages. The pages are charged into
> > MM_ANONPAGES (because they are mapped anon pages) and also charged into
> > MM_LAZYFREEPAGES. /proc/pid/statm will have an extra field to
> > display this RSS, which userspace can use to determine the RSS excluding
> > MADV_FREE pages.
> >
> > The basic idea is to increment the RSS in madvise and decrement it in
> > unmap or page reclaim. There is one limitation. If a page is shared by
> > two processes, madvise only has the mm context of the current process,
> > so it isn't convenient to charge the RSS to both processes. So we don't
> > charge the RSS if the mapcount isn't 1. On the other hand, fork can make
> > a MADV_FREE page shared by two processes. To keep things consistent, we
> > uncharge the RSS from the source mm in fork.
> >
> > A new flag is added to indicate whether a page is accounted in the RSS.
> > We can't use the SwapBacked flag for this because we can't guarantee
> > the page has SwapBacked cleared in madvise. We are reusing the
> > mappedtodisk flag, which should not be set for anon pages.
> >
> > There are a couple of other places where we need to uncharge the RSS:
> > activate_page and mark_page_accessed. activate_page is used by swap,
> > where MADV_FREE pages have already left the lazyfree state before going
> > into swap. mark_page_accessed is mainly used for file pages, but there
> > are several places where it's used for anonymous pages. I fixed gup, but
> > not some gpu drivers and kvm. If those drivers use MADV_FREE, we might
> > have imprecise RSS accounting.
> >
> > Please note, the accounting is never going to be precise. A MADV_FREE
> > page could be written by userspace without notifying the kernel. Such a
> > page can't be reclaimed like other clean lazyfree pages; it isn't a
> > real lazyfree page.
> > But since the kernel isn't aware of this, the page is still accounted
> > as lazyfree, so the accounting could be incorrect.
>
> This is all quite complex and, as you say, imprecise already. From the
> description it is not even clear why we need it at all. Why is
> /proc/<pid>/smaps insufficient? I am also not a fan of a new page flag,
> even though you managed to recycle an existing one, which is a plus.

We have a monitoring app running on the system that checks other apps' RSS
and kills them if their RSS is abnormal. Checking /proc/pid/smaps is too
complicated and slow; I don't think we can go that way. Yes, the accounting
isn't precise, but it should be much better than exporting nothing to
userspace.

Thanks,
Shaohua
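The monitoring use case needs one cheap read per process. Below is a minimal C sketch of such a check, assuming a kernel carrying this patch appends the proposed lazyfree count as an eighth field of /proc/<pid>/statm. The patch only says statm gains "an extra field", so the field position is an assumption, and the names `struct statm_lazyfree`, `parse_statm`, `read_statm`, `effective_rss` and `rss_abnormal` are hypothetical. Mainline statm has seven fields, so on an unpatched kernel the parse fails by design.

```c
#include <stdbool.h>
#include <stdio.h>

/* Assumed layout: the seven standard statm fields (in pages) followed by
 * the extra lazyfree field proposed by this patch. The position of the new
 * field is an assumption; all names here are hypothetical. */
struct statm_lazyfree {
    unsigned long size, resident, shared, text, lib, data, dt;
    unsigned long lazyfree;
};

/* Returns true only if all eight fields were parsed, so a seven-field
 * line from an unpatched kernel is rejected. */
bool parse_statm(const char *line, struct statm_lazyfree *s)
{
    return sscanf(line, "%lu %lu %lu %lu %lu %lu %lu %lu",
                  &s->size, &s->resident, &s->shared, &s->text,
                  &s->lib, &s->data, &s->dt, &s->lazyfree) == 8;
}

/* Reads /proc/<pid>/statm: a single short line per process. */
bool read_statm(int pid, struct statm_lazyfree *s)
{
    char path[64], line[256];
    bool ok;
    FILE *f;

    snprintf(path, sizeof(path), "/proc/%d/statm", pid);
    f = fopen(path, "r");
    if (f == NULL)
        return false;
    ok = fgets(line, sizeof(line), f) != NULL && parse_statm(line, s);
    fclose(f);
    return ok;
}

/* Resident pages minus MADV_FREE pages the kernel may still discard.
 * Clamped at zero because the lazyfree count is known to be imprecise. */
unsigned long effective_rss(const struct statm_lazyfree *s)
{
    return s->lazyfree >= s->resident ? 0 : s->resident - s->lazyfree;
}

/* Hypothetical monitor policy: flag a process whose effective RSS
 * exceeds a per-process limit expressed in pages. */
bool rss_abnormal(const struct statm_lazyfree *s, unsigned long limit_pages)
{
    return effective_rss(s) > limit_pages;
}
```

For example, `parse_statm("50000 30000 2000 100 0 28000 0 12000\n", &s)` succeeds and `effective_rss(&s)` returns 18000 pages, so `rss_abnormal(&s, 20000)` is false. Each check reads one short line per process instead of walking every VMA entry in smaps, which is presumably the cost the monitor is trying to avoid.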
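The charge and uncharge rules in the quoted patch description (charge in madvise only when the mapcount is 1, uncharge on unmap or reclaim, uncharge the source mm on fork, uncharge on mark_page_accessed/activate_page) can be modeled in a few lines. This is a hypothetical userspace model, not kernel code: `struct mm_model`, `struct page_model` and the `model_*` functions are illustrative names, the `lazyfree_accounted` bool stands in for the reused mappedtodisk flag, and passing the owning mm explicitly on the access path is a simplification.

```c
#include <stdbool.h>

/* Hypothetical per-mm counters mirroring the patch description. */
struct mm_model {
    long anon_pages;     /* stands in for MM_ANONPAGES */
    long lazyfree_pages; /* stands in for MM_LAZYFREEPAGES */
};

struct page_model {
    int mapcount;
    bool lazyfree_accounted; /* stands in for the reused mappedtodisk flag */
};

/* madvise(MADV_FREE): charge only when this mm is the sole mapper. */
void model_madv_free(struct mm_model *mm, struct page_model *p)
{
    if (p->mapcount == 1 && !p->lazyfree_accounted) {
        p->lazyfree_accounted = true;
        mm->lazyfree_pages++;
    }
}

/* Common uncharge step: only pages carrying the flag are uncharged. */
static void model_uncharge(struct mm_model *mm, struct page_model *p)
{
    if (p->lazyfree_accounted) {
        p->lazyfree_accounted = false;
        mm->lazyfree_pages--;
    }
}

/* fork: the page becomes shared, so uncharge it from the source mm. */
void model_fork(struct mm_model *src, struct mm_model *dst,
                struct page_model *p)
{
    model_uncharge(src, p);
    p->mapcount++;
    dst->anon_pages++;
}

/* unmap (reclaim ends in an unmap too): drop both counters. */
void model_unmap(struct mm_model *mm, struct page_model *p)
{
    model_uncharge(mm, p);
    p->mapcount--;
    mm->anon_pages--;
}

/* mark_page_accessed / activate_page: the page is no longer lazyfree.
 * Simplification: the owning mm is passed in explicitly here. */
void model_mark_accessed(struct mm_model *mm, struct page_model *p)
{
    model_uncharge(mm, p);
}
```

In this model, `anon_pages - lazyfree_pages` is the "RSS excluding MADV_FREE pages" that the new statm field is meant to let userspace compute. The model also shows why a page dirtied after madvise stays counted as lazyfree: nothing on the userspace write path calls an uncharge.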