From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 23 Aug 2016 19:44:59 +0300
From: Vladimir Davydov
To: Sudeep K N
Cc: Eric Dumazet, Andrew Morton, "David S. Miller", Johannes Weiner,
 Michal Hocko, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, netdev,
 x86@kernel.org, open list, Ingo Molnar, Peter Zijlstra, Sudeep Holla
Subject: Re: [PATCH RESEND 8/8] af_unix: charge buffers to kmemcg
Message-ID: <20160823164459.GD1863@esperanza>
References: <1464094926.5939.48.camel@edumazet-glaptop3.roam.corp.google.com>
 <20160524163606.GB11150@esperanza>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

Hello,

On Tue, Aug 23, 2016 at 02:48:11PM +0100, Sudeep K N wrote:
> On Tue, May 24, 2016 at 5:36 PM, Vladimir Davydov wrote:
> > On Tue, May 24, 2016 at 06:02:06AM -0700, Eric Dumazet wrote:
> >> On Tue, 2016-05-24 at 11:49 +0300, Vladimir Davydov wrote:
> >> > Unix sockets can consume a significant amount of system memory, hence
> >> > they should be accounted to kmemcg.
> >> >
> >> > Since unix socket buffers are always allocated from process context,
> >> > all we need to do to charge them to kmemcg is set __GFP_ACCOUNT in
> >> > sock->sk_allocation mask.
> >>
> >> I have two questions :
> >>
> >> 1) What happens when a buffer, allocated from socket lands in a
> >> different socket , maybe owned by another user/process.
> >>
> >> Who owns it now, in term of kmemcg accounting ?
> >
> > We never move memcg charges. E.g. if two processes from different
> > cgroups are sharing a memory region, each page will be charged to the
> > process which touched it first. Or if two processes are working with the
> > same directory tree, inodes and dentries will be charged to the first
> > user. The same is fair for unix socket buffers - they will be charged to
> > the sender.
> >
> >>
> >> 2) Has performance impact been evaluated ?
> >
> > I ran netperf STREAM_STREAM with default options in a kmemcg on
> > a 4 core x 2 HT box. The results are below:
> >
> > # clients           bandwidth (10^6bits/sec)
> >                     base              patched
> >         1      67643 +-  725      64874 +-  353    - 4.0 %
> >         4     193585 +- 2516     186715 +- 1460    - 3.5 %
> >         8     194820 +-  377     187443 +- 1229    - 3.7 %
> >
> > So the accounting doesn't come for free - it takes ~4% of performance.
> > I believe we could optimize it by using per cpu batching not only on
> > charge, but also on uncharge in memcg core, but that's beyond the scope
> > of this patch set - I'll take a look at this later.
> >
> > Anyway, if performance impact is found to be unacceptable, it is always
> > possible to disable kmem accounting at boot time (cgroup.memory=nokmem)
> > or not use memory cgroups at runtime at all (thanks to jump labels
> > there'll be no overhead even if they are compiled in).
> >
>
> I started seeing almost 10% degradation in the hackbench score with v4.8-rc1
> Bisecting it resulted in this patch, i.e. Commit 3aa9799e1364 ("af_unix: charge
> buffers to kmemcg") in the mainline.
>
> As per the commit log, it seems like that's expected but I was not sure about
> the margin. I also see the hackbench score is more inconsistent after this
> patch, but I may be wrong as that's based on limited observation.
>
> Is this something we can ignore as hackbench is more synthetic compared
> to the gain this patch provides in some real workloads ?

AFAIU hackbench essentially measures the rate of sending data over a unix
socket back and forth between processes running on different cpus, so it
isn't a surprise that the patch resulted in a degradation, as it makes
every skb page allocation/deallocation inc/dec an atomic counter inside
memcg. The more processes/cpus running in the same cgroup are involved in
this test, the more significant the overhead of this atomic counter is
going to be.
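For reference, the change Sudeep bisected to boils down to a single
assignment in unix_create1() in net/unix/af_unix.c. The snippet below is a
sketch reconstructed from the commit description quoted above, not
necessarily the verbatim mainline hunk:

	/*
	 * Sketch of the af_unix change under discussion: tag the socket's
	 * allocation mask with __GFP_ACCOUNT (GFP_KERNEL_ACCOUNT is
	 * GFP_KERNEL | __GFP_ACCOUNT), so buffers allocated on behalf of
	 * this socket - including the pages backing queued skbs - are
	 * charged to the memory cgroup of the allocating (sending) task.
	 * Each charge and uncharge updates an atomic counter in that
	 * memcg, which is the overhead described above.
	 */
	sk->sk_allocation = GFP_KERNEL_ACCOUNT;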
The degradation is not unavoidable - it can be fixed by making kmem
charge/uncharge code use per-cpu batches. The infrastructure for this
already exists in memcontrol.c. If it were not for the legacy
mem_cgroup->kmem counter (which is actually useless and will be dropped
in cgroup v2), the issue would be pretty easy to fix. However, this
legacy counter makes a possible implementation quite messy, so I'd like
to postpone it until cgroup v2 has finally settled down.
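To illustrate what "per-cpu batches" means here: the charge path in
memcontrol.c already keeps a small pre-charged per-cpu reserve (the
"stock"), so most charges touch only per-cpu state and the shared atomic
counter is updated in batches. The sketch below is a generic illustration
of that scheme under assumed names (charge_stock, try_charge_batched), not
the actual memcontrol.c code; the idea referred to above is to apply the
same batching to the uncharge path as well.

	#include <linux/percpu.h>

	#define CHARGE_BATCH	32	/* pages charged to the shared counter at once */

	struct charge_stock {
		struct page_counter	*cached;	/* counter the stock was charged to */
		unsigned int		nr_pages;	/* locally held, pre-charged pages */
	};
	static DEFINE_PER_CPU(struct charge_stock, charge_stock);

	/*
	 * Fast path: consume pages from this cpu's pre-charged stock without
	 * touching the shared atomic counter.  Real code runs this with irqs
	 * disabled; on a miss, a slow path (not shown) charges CHARGE_BATCH
	 * pages to the shared counter in one go and refills the local stock.
	 */
	static bool try_charge_batched(struct page_counter *counter,
				       unsigned int nr_pages)
	{
		struct charge_stock *stock = this_cpu_ptr(&charge_stock);

		if (stock->cached == counter && stock->nr_pages >= nr_pages) {
			stock->nr_pages -= nr_pages;
			return true;
		}
		return false;	/* fall back to the batched slow path */
	}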
Regarding your problem: as a workaround, you can either start your
workload in the root memory cgroup or disable kmem accounting for memory
cgroups altogether (via the cgroup.memory=nokmem boot option). If you
find the issue critical, I don't mind reverting the patch - we can always
re-apply it once per-cpu batches are implemented for kmem charges.

Thanks,
Vladimir
Miller" , Johannes Weiner , Michal Hocko , linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, netdev , x86@kernel.org, open list , Ingo Molnar , Peter Zijlstra , Sudeep Holla Hello, On Tue, Aug 23, 2016 at 02:48:11PM +0100, Sudeep K N wrote: > On Tue, May 24, 2016 at 5:36 PM, Vladimir Davydov > wrote: > > On Tue, May 24, 2016 at 06:02:06AM -0700, Eric Dumazet wrote: > >> On Tue, 2016-05-24 at 11:49 +0300, Vladimir Davydov wrote: > >> > Unix sockets can consume a significant amount of system memory, hence > >> > they should be accounted to kmemcg. > >> > > >> > Since unix socket buffers are always allocated from process context, > >> > all we need to do to charge them to kmemcg is set __GFP_ACCOUNT in > >> > sock->sk_allocation mask. > >> > >> I have two questions : > >> > >> 1) What happens when a buffer, allocated from socket lands in a > >> different socket , maybe owned by another user/process. > >> > >> Who owns it now, in term of kmemcg accounting ? > > > > We never move memcg charges. E.g. if two processes from different > > cgroups are sharing a memory region, each page will be charged to the > > process which touched it first. Or if two processes are working with the > > same directory tree, inodes and dentries will be charged to the first > > user. The same is fair for unix socket buffers - they will be charged to > > the sender. > > > >> > >> 2) Has performance impact been evaluated ? > > > > I ran netperf STREAM_STREAM with default options in a kmemcg on > > a 4 core x 2 HT box. The results are below: > > > > # clients bandwidth (10^6bits/sec) > > base patched > > 1 67643 +- 725 64874 +- 353 - 4.0 % > > 4 193585 +- 2516 186715 +- 1460 - 3.5 % > > 8 194820 +- 377 187443 +- 1229 - 3.7 % > > > > So the accounting doesn't come for free - it takes ~4% of performance. > > I believe we could optimize it by using per cpu batching not only on > > charge, but also on uncharge in memcg core, but that's beyond the scope > > of this patch set - I'll take a look at this later. > > > > Anyway, if performance impact is found to be unacceptable, it is always > > possible to disable kmem accounting at boot time (cgroup.memory=nokmem) > > or not use memory cgroups at runtime at all (thanks to jump labels > > there'll be no overhead even if they are compiled in). > > > > I started seeing almost 10% degradation in the hackbench score with v4.8-rc1 > Bisecting it resulted in this patch, i.e. Commit 3aa9799e1364 ("af_unix: charge > buffers to kmemcg") in the mainline. > > As per the commit log, it seems like that's expected but I was not sure about > the margin. I also see the hackbench score is more inconsistent after this > patch, but I may be wrong as that's based on limited observation. > > Is this something we can ignore as hackbench is more synthetic compared > to the gain this patch provides in some real workloads ? AFAIU hackbench essentially measures the rate of sending data over a unix socket back and forth between processes running on different cpus, so it isn't a surprise that the patch resulted in a degradation, as it makes every skb page allocation/deallocation inc/dec an atomic counter inside memcg. The more processes/cpus running in the same cgroup are involved in this test, the more significant the overhead of this atomic counter is going to be. The degradation is not unavoidable - it can be fixed by making kmem charge/uncharge code use per-cpu batches. The infrastructure for this already exists in memcontrol.c. 
If it were not for the legacy mem_cgroup->kmem counter (which is actually useless and will be dropped in cgroup v2), the issue would be pretty easy to fix. However, this legacy counter makes a possible implementation quite messy, so I'd like to postpone it until cgroup v2 has finally settled down. Regarding your problem. As a workaround you can either start your workload in the root memory cgroup or disable kmem accounting for memory cgroups altogether (via cgroup.memory=nokmem boot option). If you find the issue critical, I don't mind reverting the patch - we can always re-apply it once per-cpu batches are implemented for kmem charges. Thanks, Vladimir -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org