From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 23 Aug 2016 19:44:59 +0300
From: Vladimir Davydov
To: Sudeep K N
Cc: Eric Dumazet, Andrew Morton, "David S. Miller", Johannes Weiner,
 Michal Hocko, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, netdev,
 x86@kernel.org, open list, Ingo Molnar, Peter Zijlstra, Sudeep Holla
Subject: Re: [PATCH RESEND 8/8] af_unix: charge buffers to kmemcg
Message-ID: <20160823164459.GD1863@esperanza>
References: <1464094926.5939.48.camel@edumazet-glaptop3.roam.corp.google.com>
 <20160524163606.GB11150@esperanza>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

Hello,

On Tue, Aug 23, 2016 at 02:48:11PM +0100, Sudeep K N wrote:
> On Tue, May 24, 2016 at 5:36 PM, Vladimir Davydov wrote:
> > On Tue, May 24, 2016 at 06:02:06AM -0700, Eric Dumazet wrote:
> >> On Tue, 2016-05-24 at 11:49 +0300, Vladimir Davydov wrote:
> >> > Unix sockets can consume a significant amount of system memory, hence
> >> > they should be accounted to kmemcg.
> >> >
> >> > Since unix socket buffers are always allocated from process context,
> >> > all we need to do to charge them to kmemcg is set __GFP_ACCOUNT in
> >> > sock->sk_allocation mask.
> >>
> >> I have two questions :
> >>
> >> 1) What happens when a buffer, allocated from socket lands in a
> >> different socket , maybe owned by another user/process.
> >>
> >> Who owns it now, in term of kmemcg accounting ?
> >
> > We never move memcg charges. E.g. if two processes from different
> > cgroups are sharing a memory region, each page will be charged to the
> > process which touched it first. Or if two processes are working with the
> > same directory tree, inodes and dentries will be charged to the first
> > user. The same is fair for unix socket buffers - they will be charged to
> > the sender.
> >
> >>
> >> 2) Has performance impact been evaluated ?
> >
> > I ran netperf STREAM_STREAM with default options in a kmemcg on
> > a 4 core x 2 HT box. The results are below:
> >
> > # clients           bandwidth (10^6bits/sec)
> >                     base              patched
> >         1      67643 +-  725      64874 +-  353    - 4.0 %
> >         4     193585 +- 2516     186715 +- 1460    - 3.5 %
> >         8     194820 +-  377     187443 +- 1229    - 3.7 %
> >
> > So the accounting doesn't come for free - it takes ~4% of performance.
> > I believe we could optimize it by using per cpu batching not only on
> > charge, but also on uncharge in memcg core, but that's beyond the scope
> > of this patch set - I'll take a look at this later.
> >
> > Anyway, if performance impact is found to be unacceptable, it is always
> > possible to disable kmem accounting at boot time (cgroup.memory=nokmem)
> > or not use memory cgroups at runtime at all (thanks to jump labels
> > there'll be no overhead even if they are compiled in).
> >
>
> I started seeing almost 10% degradation in the hackbench score with v4.8-rc1
> Bisecting it resulted in this patch, i.e. Commit 3aa9799e1364 ("af_unix: charge
> buffers to kmemcg") in the mainline.
>
> As per the commit log, it seems like that's expected but I was not sure about
> the margin. I also see the hackbench score is more inconsistent after this
> patch, but I may be wrong as that's based on limited observation.
>
> Is this something we can ignore as hackbench is more synthetic compared
> to the gain this patch provides in some real workloads ?

AFAIU hackbench essentially measures the rate of sending data over a unix
socket back and forth between processes running on different cpus, so it
isn't a surprise that the patch resulted in a degradation, as it makes
every skb page allocation/deallocation inc/dec an atomic counter inside
memcg. The more processes/cpus running in the same cgroup are involved in
this test, the more significant the overhead of this atomic counter is
going to be.
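For reference, the change Sudeep bisected to boils down to a single
assignment in unix_create1() in net/unix/af_unix.c. The snippet below is a
sketch reconstructed from the commit description quoted above, not
necessarily the verbatim mainline hunk:

	/*
	 * Sketch of the af_unix change under discussion: tag the socket's
	 * allocation mask with __GFP_ACCOUNT (GFP_KERNEL_ACCOUNT is
	 * GFP_KERNEL | __GFP_ACCOUNT), so buffers allocated on behalf of
	 * this socket - including the pages backing queued skbs - are
	 * charged to the memory cgroup of the allocating (sending) task.
	 * Each charge and uncharge updates an atomic counter in that
	 * memcg, which is the overhead described above.
	 */
	sk->sk_allocation = GFP_KERNEL_ACCOUNT;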
The degradation is not unavoidable - it can be fixed by making kmem
charge/uncharge code use per-cpu batches. The infrastructure for this
already exists in memcontrol.c. If it were not for the legacy
mem_cgroup->kmem counter (which is actually useless and will be dropped
in cgroup v2), the issue would be pretty easy to fix. However, this
legacy counter makes a possible implementation quite messy, so I'd like
to postpone it until cgroup v2 has finally settled down.
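To illustrate what "per-cpu batches" means here: the charge path in
memcontrol.c already keeps a small pre-charged per-cpu reserve (the
"stock"), so most charges touch only per-cpu state and the shared atomic
counter is updated in batches. The sketch below is a generic illustration
of that scheme under assumed names (charge_stock, try_charge_batched), not
the actual memcontrol.c code; the idea referred to above is to apply the
same batching to the uncharge path as well.

	#include <linux/percpu.h>

	#define CHARGE_BATCH	32	/* pages charged to the shared counter at once */

	struct charge_stock {
		struct page_counter	*cached;	/* counter the stock was charged to */
		unsigned int		nr_pages;	/* locally held, pre-charged pages */
	};
	static DEFINE_PER_CPU(struct charge_stock, charge_stock);

	/*
	 * Fast path: consume pages from this cpu's pre-charged stock without
	 * touching the shared atomic counter.  Real code runs this with irqs
	 * disabled; on a miss, a slow path (not shown) charges CHARGE_BATCH
	 * pages to the shared counter in one go and refills the local stock.
	 */
	static bool try_charge_batched(struct page_counter *counter,
				       unsigned int nr_pages)
	{
		struct charge_stock *stock = this_cpu_ptr(&charge_stock);

		if (stock->cached == counter && stock->nr_pages >= nr_pages) {
			stock->nr_pages -= nr_pages;
			return true;
		}
		return false;	/* fall back to the batched slow path */
	}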
Regarding your problem: as a workaround, you can either start your
workload in the root memory cgroup or disable kmem accounting for memory
cgroups altogether (via the cgroup.memory=nokmem boot option). If you
find the issue critical, I don't mind reverting the patch - we can always
re-apply it once per-cpu batches are implemented for kmem charges.

Thanks,
Vladimir
Miller" , Johannes Weiner , Michal Hocko , linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, netdev , x86@kernel.org, open list , Ingo Molnar , Peter Zijlstra , Sudeep Holla Hello, On Tue, Aug 23, 2016 at 02:48:11PM +0100, Sudeep K N wrote: > On Tue, May 24, 2016 at 5:36 PM, Vladimir Davydov > wrote: > > On Tue, May 24, 2016 at 06:02:06AM -0700, Eric Dumazet wrote: > >> On Tue, 2016-05-24 at 11:49 +0300, Vladimir Davydov wrote: > >> > Unix sockets can consume a significant amount of system memory, hence > >> > they should be accounted to kmemcg. > >> > > >> > Since unix socket buffers are always allocated from process context, > >> > all we need to do to charge them to kmemcg is set __GFP_ACCOUNT in > >> > sock->sk_allocation mask. > >> > >> I have two questions : > >> > >> 1) What happens when a buffer, allocated from socket lands in a > >> different socket , maybe owned by another user/process. > >> > >> Who owns it now, in term of kmemcg accounting ? > > > > We never move memcg charges. E.g. if two processes from different > > cgroups are sharing a memory region, each page will be charged to the > > process which touched it first. Or if two processes are working with the > > same directory tree, inodes and dentries will be charged to the first > > user. The same is fair for unix socket buffers - they will be charged to > > the sender. > > > >> > >> 2) Has performance impact been evaluated ? > > > > I ran netperf STREAM_STREAM with default options in a kmemcg on > > a 4 core x 2 HT box. The results are below: > > > > # clients bandwidth (10^6bits/sec) > > base patched > > 1 67643 +- 725 64874 +- 353 - 4.0 % > > 4 193585 +- 2516 186715 +- 1460 - 3.5 % > > 8 194820 +- 377 187443 +- 1229 - 3.7 % > > > > So the accounting doesn't come for free - it takes ~4% of performance. > > I believe we could optimize it by using per cpu batching not only on > > charge, but also on uncharge in memcg core, but that's beyond the scope > > of this patch set - I'll take a look at this later. > > > > Anyway, if performance impact is found to be unacceptable, it is always > > possible to disable kmem accounting at boot time (cgroup.memory=nokmem) > > or not use memory cgroups at runtime at all (thanks to jump labels > > there'll be no overhead even if they are compiled in). > > > > I started seeing almost 10% degradation in the hackbench score with v4.8-rc1 > Bisecting it resulted in this patch, i.e. Commit 3aa9799e1364 ("af_unix: charge > buffers to kmemcg") in the mainline. > > As per the commit log, it seems like that's expected but I was not sure about > the margin. I also see the hackbench score is more inconsistent after this > patch, but I may be wrong as that's based on limited observation. > > Is this something we can ignore as hackbench is more synthetic compared > to the gain this patch provides in some real workloads ? AFAIU hackbench essentially measures the rate of sending data over a unix socket back and forth between processes running on different cpus, so it isn't a surprise that the patch resulted in a degradation, as it makes every skb page allocation/deallocation inc/dec an atomic counter inside memcg. The more processes/cpus running in the same cgroup are involved in this test, the more significant the overhead of this atomic counter is going to be. The degradation is not unavoidable - it can be fixed by making kmem charge/uncharge code use per-cpu batches. The infrastructure for this already exists in memcontrol.c. 
If it were not for the legacy mem_cgroup->kmem counter (which is actually useless and will be dropped in cgroup v2), the issue would be pretty easy to fix. However, this legacy counter makes a possible implementation quite messy, so I'd like to postpone it until cgroup v2 has finally settled down. Regarding your problem. As a workaround you can either start your workload in the root memory cgroup or disable kmem accounting for memory cgroups altogether (via cgroup.memory=nokmem boot option). If you find the issue critical, I don't mind reverting the patch - we can always re-apply it once per-cpu batches are implemented for kmem charges. Thanks, Vladimir -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org