From mboxrd@z Thu Jan 1 00:00:00 1970
From: Vasily Averin
To: Andrew Morton
Cc: linux-kernel@vger.kernel.org, Roman Gushchin, Christian Brauner,
 Michal Koutný, Serge Hallyn, cgroups@vger.kernel.org, Michal Hocko
Subject: Re: [PATCH v2 1/1] memcg: enable accounting for pids in nested pid namespaces
Date: Wed, 14 Jul 2021 09:31:34 +0300
Message-ID: <21db0c2c-45ea-fded-9633-7b76ab2b1083@virtuozzo.com>
In-Reply-To: <8b6de616-fd1a-02c6-cbdb-976ecdcfa604@virtuozzo.com>
References: <7b777e22-5b0d-7444-343d-92cbfae5f8b4@virtuozzo.com>
 <8b6de616-fd1a-02c6-cbdb-976ecdcfa604@virtuozzo.com>

Dear Andrew,
could you please pick up this patch and add
 Reviewed-by: Shakeel Butt

Thank you,
	Vasily Averin

On 4/24/21 2:54 PM, Vasily Averin wrote:
> Commit 5d097056c9a0 ("kmemcg: account certain kmem allocations to memcg")
> enabled memcg accounting for pids allocated from init_pid_ns.pid_cachep,
> but forgot to adjust the setting for nested pid namespaces.
> As a result, pid memory is not accounted exactly where it is really needed,
> inside memcg-limited containers with their own pid namespaces.
>
> Pid was one the first kernel objects enabled for memcg accounting.
> init_pid_ns.pid_cachep marked by SLAB_ACCOUNT and we can expect that
> any new pids in the system are memcg-accounted.
>
> Though recently I've noticed that it is wrong. nested pid namespaces creates
> own slab caches for pid objects, nested pids have increased size because contain
> id both for all parent and for own pid namespaces. The problem is that these slab
> caches are _NOT_ marked by SLAB_ACCOUNT, as a result any pids allocated in
> nested pid namespaces are not memcg-accounted.
>
> Pid struct in nested pid namespace consumes up to 500 bytes memory,
> 100000 such objects gives us up to ~50Mb unaccounted memory,
> this allow container to exceed assigned memcg limits.
>
> Fixes: 5d097056c9a0 ("kmemcg: account certain kmem allocations to memcg")
> Cc: stable@vger.kernel.org
> Signed-off-by: Vasily Averin
> Reviewed-by: Michal Koutný
> Acked-by: Christian Brauner
> Acked-by: Roman Gushchin
> ---
>  kernel/pid_namespace.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> index 6cd6715..a46a372 100644
> --- a/kernel/pid_namespace.c
> +++ b/kernel/pid_namespace.c
> @@ -51,7 +51,8 @@ static struct kmem_cache *create_pid_cachep(unsigned int level)
>  	mutex_lock(&pid_caches_mutex);
>  	/* Name collision forces to do allocation under mutex. */
>  	if (!*pkc)
> -		*pkc = kmem_cache_create(name, len, 0, SLAB_HWCACHE_ALIGN, 0);
> +		*pkc = kmem_cache_create(name, len, 0,
> +					 SLAB_HWCACHE_ALIGN | SLAB_ACCOUNT, 0);
>  	mutex_unlock(&pid_caches_mutex);
>  	/* current can fail, but someone else can succeed. */
>  	return READ_ONCE(*pkc);
>