From: Vasily Averin
Date: Fri, 24 Jun 2022 13:40:14 +0300
Subject: Re: [PATCH mm v5 0/9] memcg: accounting for objects allocated by mkdir, cgroup
To: Shakeel Butt, Michal Hocko
Cc: kernel@openvz.org, Andrew Morton, LKML, Linux MM, Roman Gushchin, Michal Koutný, Vlastimil Babka, Muchun Song, Cgroups
Message-ID: <0f8146e3-5865-b7e6-6728-5baada375cf2@openvz.org>
References: <4e685057-b07d-745d-fdaa-1a6a5a681060@openvz.org> <0fe836b4-5c0f-0e32-d511-db816d359748@openvz.org>
Content-Type: text/plain; charset=UTF-8

On 6/23/22 19:55, Shakeel Butt wrote:
> On Thu, Jun 23, 2022 at 9:07 AM Michal Hocko wrote:
>>
>> On Thu 23-06-22 18:03:31, Vasily Averin wrote:
>>> Dear Michal,
>>> do you still have any concerns about this patch set?
>>
>> Yes, I do not think we have concluded this to be really necessary. IIRC
>> Roman would like to see lingering cgroups addressed in the not-so-distant
>> future (http://lkml.kernel.org/r/Ypd2DW7id4M3KJJW@carbon) and we already
>> have a limit for the number of cgroups in the tree. So why should we
>> chase after the allocations that correspond to the cgroups and somehow
>> try to cap their number via the memory consumption? This looks like
>> something that will get out of sync eventually, and it also doesn't seem
>> like the best control to me (compared to an explicit limit to prevent
>> runaways).
>
> Let me give a counter argument to that. On a system running multiple
> workloads, how can the admin come up with a sensible limit for the
> number of cgroups? There will definitely be jobs that require a much
> larger number of sub-cgroups. Asking the admins to dynamically tune yet
> another tunable is just asking for more complications. In the end all
> the users would just set it to max.
>
> I would recommend looking at commit ac7b79fd190b ("inotify, memcg:
> account inotify instances to kmemcg"), where there was already a sysctl
> (inotify/max_user_instances) to limit the number of instances but
> there was no sensible way to set that limit on a multi-tenant system.

I've found that MEM_CGROUP_ID_MAX limits memory cgroups only; other cgroup
types have no similar restriction. Yes, we could set some per-container
limit on all cgroups, but to me that looks like a workaround, while proper
memory accounting looks like the real solution.

Btw, could you please explain why memory cgroups have the MEM_CGROUP_ID_MAX
limit? Why is it required at all, and why was it set to USHRT_MAX?
I believe that in the future it may really be reachable: let's set the
per-container cgroup limit to some small number, for example to 512, as
OpenVz does right now. On a real node with 300 containers we can easily get
100*300 = 30000 cgroups and consume ~3GB of memory, without any misuse.
I think that is too much to leave unaccounted.
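
To make that estimate explicit, here is a quick back-of-the-envelope sketch
(the ~100 KB per cgroup is simply the ~3GB / 30000 figure above worked
backwards; it is an assumed average, not a measured value):

/*
 * Rough estimate of currently unaccounted cgroup memory on a
 * multi-container node: ~300 containers, ~100 cgroups each,
 * ~100 KB of kernel objects per cgroup (assumed average).
 */
#include <stdio.h>

int main(void)
{
	const long containers = 300;
	const long cgroups_per_container = 100;	/* well below a 512 limit */
	const long kb_per_cgroup = 100;		/* assumed average */
	long total_cgroups = containers * cgroups_per_container;
	long total_mb = total_cgroups * kb_per_cgroup / 1024;

	printf("%ld cgroups -> ~%ld MB of unaccounted kernel memory\n",
	       total_cgroups, total_mb);
	return 0;
}

And to be explicit about what I mean by "proper memory accounting": the
usual kernel-memory accounting pattern, i.e. charging the objects created
on behalf of the task that runs mkdir in the cgroup tree to that task's
memory cgroup. An illustrative sketch only, not an actual hunk from this
series (the structure name below is a placeholder):

#include <linux/slab.h>

/* placeholder for one of the per-cgroup objects allocated on mkdir */
struct example_per_cgroup_state {
	int whatever;
};

static struct example_per_cgroup_state *example_alloc(void)
{
	/*
	 * GFP_KERNEL_ACCOUNT instead of GFP_KERNEL: the allocation is
	 * charged to the current task's memory cgroup and shows up in
	 * its kernel memory usage.
	 */
	return kzalloc(sizeof(struct example_per_cgroup_state),
		       GFP_KERNEL_ACCOUNT);
}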
Thank you,
	Vasily Averin