Subject: Re: [PATCH mm v2] memcg: notify about global mem_cgroup_id space depletion
Date: Tue, 28 Jun 2022 06:43:02 +0300
From: Vasily Averin
To: Roman Gushchin
Cc: Muchun Song, Shakeel Butt, Michal Koutný, Michal Hocko, kernel@openvz.org,
    LKML, Andrew Morton, Linux Memory Management List, Vlastimil Babka, Cgroups
References: <97bed1fd-f230-c2ea-1cb6-8230825a9a64@openvz.org>
On 6/28/22 04:11, Roman Gushchin wrote:
> On Mon, Jun 27, 2022 at 09:49:18AM +0300, Vasily Averin wrote:
>> On 6/27/22 06:23, Muchun Song wrote:
>>> If the caller can know -ENOSPC is returned by mkdir(), then I
>>> think the user (perhaps systemd) is the best place to throw out the
>>> error message instead of in the kernel log. Right?
>>
>> Such an incident may occur inside the container.
>> OpenVZ nodes can host 300-400 containers, and the host admin cannot
>> monitor guest logs. the dmesg message is necessary to inform the host
>> owner that the global limit has been reached, otherwise he can
>> continue to believe that there are no problems on the node.
>
> Why this is happening? It's hard to believe someone really needs that
> many cgroups. Is this when somebody fails to delete old cgroups?

I have no direct evidence that any node has actually hit this limit;
however, I have seen crash dumps with 30000+ cgroups.

Theoretically, OpenVZ/LXC nodes can host up to several thousand
containers per node. In practice, production nodes with 300-400
containers are common. I assume that each container can easily use
100-200 memory cgroups, which I consider normal consumption
(400 containers x 160 memory cgroups already comes to ~64000 ids).
Therefore, I believe the 64K limit is quite reachable in real life.
The primary goal of my patch is to confirm this theory.

> I wanted to say that it's better to introduce a memcg event, but then
> I realized it's probably not worth the wasted space. Is this a common
> scenario?
>
> I think a better approach will be to add a cgroup event (displayed via
> cgroup.events) about reaching the maximum limit of cgroups. E.g.
> cgroups.events::max_nr_reached. Then you can set cgroup.max.descendants
> to some value below memcg_id space size. It's more work, but IMO it's
> a better way to communicate this event. As a bonus, you can easily
> get an idea which cgroup depletes the limit.

For my goal (i.e. just to confirm that the 64K limit was reached) this
functionality is too complicated. The confirmation matters because it
should push us to increase the global limit.

However, I think your idea is great. In the longer term it would help
both OpenVZ and LXC (and possibly Shakeel) to understand real memcg
usage and set proper limits for containers; a rough userspace sketch
of what I have in mind follows below.

I'm going to prepare such patches; however, I'm not sure I'll have
enough time for this task in the near future.
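For illustration only, here is the kind of userspace helper I have in
mind. This is an untested sketch built on my own assumptions (cgroup v2
mounted at /sys/fs/cgroup, 65535 as the id space size); it is not part
of the patch:

    #!/usr/bin/env python3
    # Rough sketch: estimate how much of the global mem_cgroup_id space
    # is in use and cap cgroup creation below it, along the lines of the
    # cgroup.max.descendants suggestion above.
    import os

    CGROUP_ROOT = "/sys/fs/cgroup"   # assumes the cgroup v2 unified hierarchy
    MEMCG_ID_SPACE = 65535           # the 64K id space discussed in this thread

    def count_cgroups(root: str = CGROUP_ROOT) -> int:
        # Counts every cgroup directory. Only cgroups with the memory
        # controller enabled actually consume a mem_cgroup_id, and offline
        # ("dying") memcgs pin ids without showing up here, so treat this
        # as a rough estimate only.
        total = 0
        for _dirpath, dirnames, _filenames in os.walk(root):
            total += len(dirnames)
        return total

    def cap_descendants(margin: int = 1024) -> None:
        # Set cgroup.max.descendants on the root below the memcg id space
        # size, so cgroup creation fails with a per-cgroup limit error
        # before the global id space is exhausted.
        limit = MEMCG_ID_SPACE - margin
        with open(os.path.join(CGROUP_ROOT, "cgroup.max.descendants"), "w") as f:
            f.write(str(limit))

    if __name__ == "__main__":
        print("cgroup directories currently present:", count_cgroups())
        # cap_descendants()  # needs root privileges; uncomment to apply the cap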
Thank you,
	Vasily Averin