From: Vasily Averin
Date: Fri, 24 Jun 2022 13:40:14 +0300
Subject: Re: [PATCH mm v5 0/9] memcg: accounting for objects allocated by mkdir, cgroup
To: Shakeel Butt, Michal Hocko
Cc: kernel@openvz.org, Andrew Morton, LKML, Linux MM, Roman Gushchin, Michal Koutný, Vlastimil Babka, Muchun Song, Cgroups
Message-ID: <0f8146e3-5865-b7e6-6728-5baada375cf2@openvz.org>
References: <4e685057-b07d-745d-fdaa-1a6a5a681060@openvz.org> <0fe836b4-5c0f-0e32-d511-db816d359748@openvz.org>
Content-Type: text/plain; charset=UTF-8

On 6/23/22 19:55, Shakeel Butt wrote:
> On Thu, Jun 23, 2022 at 9:07 AM Michal Hocko wrote:
>>
>> On Thu 23-06-22 18:03:31, Vasily Averin wrote:
>>> Dear Michal,
>>> do you still have any concerns about this patch set?
>>
>> Yes, I do not think we have concluded this to be really necessary. IIRC
>> Roman would like to see lingering cgroups addressed in the not-so-distant
>> future (http://lkml.kernel.org/r/Ypd2DW7id4M3KJJW@carbon) and we already
>> have a limit for the number of cgroups in the tree. So why should we
>> chase after the allocations that correspond to the cgroups and somehow
>> try to cap their number via the memory consumption? This looks like
>> something that will get out of sync eventually, and it also doesn't seem
>> like the best control to me (compared to an explicit limit to prevent
>> runaways).
>
> Let me give a counter argument to that. On a system running multiple
> workloads, how can the admin come up with a sensible limit for the
> number of cgroups? There will definitely be jobs that require a much
> larger number of sub-cgroups. Asking the admins to dynamically tune yet
> another tunable is just asking for more complications. In the end all
> the users would just set it to max.
>
> I would recommend looking at commit ac7b79fd190b ("inotify, memcg:
> account inotify instances to kmemcg"), where there was already a sysctl
> (inotify/max_user_instances) to limit the number of instances but
> there was no sensible way to set that limit on a multi-tenant system.

I've found that MEM_CGROUP_ID_MAX limits memory cgroups only; other cgroup
types have no similar restriction. Yes, we could set some per-container
limit on all cgroups, but to me that looks like a workaround, while proper
memory accounting looks like the real solution.

Btw, could you please explain why memory cgroups have the MEM_CGROUP_ID_MAX
limit? Why is it required at all, and why was it set to USHRT_MAX?
I believe that in the future it may really be reachable: let's set the
per-container cgroup limit to some small number, for example to 512, as
OpenVz does right now. On a real node with 300 containers we can easily get
100*300 = 30000 cgroups and consume ~3GB of memory, without any misuse.
I think that is too much to leave unaccounted.
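
To make that estimate explicit, here is a quick back-of-the-envelope sketch
(the ~100 KB per cgroup is simply the ~3GB / 30000 figure above worked
backwards; it is an assumed average, not a measured value):

/*
 * Rough estimate of currently unaccounted cgroup memory on a
 * multi-container node: ~300 containers, ~100 cgroups each,
 * ~100 KB of kernel objects per cgroup (assumed average).
 */
#include <stdio.h>

int main(void)
{
	const long containers = 300;
	const long cgroups_per_container = 100;	/* well below a 512 limit */
	const long kb_per_cgroup = 100;		/* assumed average */
	long total_cgroups = containers * cgroups_per_container;
	long total_mb = total_cgroups * kb_per_cgroup / 1024;

	printf("%ld cgroups -> ~%ld MB of unaccounted kernel memory\n",
	       total_cgroups, total_mb);
	return 0;
}

And to be explicit about what I mean by "proper memory accounting": the
usual kernel-memory accounting pattern, i.e. charging the objects created
on behalf of the task that runs mkdir in the cgroup tree to that task's
memory cgroup. An illustrative sketch only, not an actual hunk from this
series (the structure name below is a placeholder):

#include <linux/slab.h>

/* placeholder for one of the per-cgroup objects allocated on mkdir */
struct example_per_cgroup_state {
	int whatever;
};

static struct example_per_cgroup_state *example_alloc(void)
{
	/*
	 * GFP_KERNEL_ACCOUNT instead of GFP_KERNEL: the allocation is
	 * charged to the current task's memory cgroup and shows up in
	 * its kernel memory usage.
	 */
	return kzalloc(sizeof(struct example_per_cgroup_state),
		       GFP_KERNEL_ACCOUNT);
}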
Thank you,
	Vasily Averin