Subject: Re: [PATCH mm v2] memcg: notify about global mem_cgroup_id space depletion
Date: Tue, 28 Jun 2022 06:43:02 +0300
From: Vasily Averin
To: Roman Gushchin
Cc: Muchun Song, Shakeel Butt, Michal Koutný, Michal Hocko, kernel@openvz.org,
    LKML, Andrew Morton, Linux Memory Management List, Vlastimil Babka, Cgroups
References: <97bed1fd-f230-c2ea-1cb6-8230825a9a64@openvz.org>
On 6/28/22 04:11, Roman Gushchin wrote:
> On Mon, Jun 27, 2022 at 09:49:18AM +0300, Vasily Averin wrote:
>> On 6/27/22 06:23, Muchun Song wrote:
>>> If the caller can know -ENOSPC is returned by mkdir(), then I
>>> think the user (perhaps systemd) is the best place to throw out the
>>> error message instead of in the kernel log. Right?
>>
>> Such an incident may occur inside the container.
>> OpenVZ nodes can host 300-400 containers, and the host admin cannot
>> monitor guest logs. the dmesg message is necessary to inform the host
>> owner that the global limit has been reached, otherwise he can
>> continue to believe that there are no problems on the node.
>
> Why this is happening? It's hard to believe someone really needs that
> many cgroups. Is this when somebody fails to delete old cgroups?

I have no direct evidence that any node has actually hit this limit;
however, I have seen crash dumps with 30000+ cgroups.

Theoretically, OpenVZ/LXC nodes can host up to several thousand
containers per node. In practice, production nodes with 300-400
containers are common. I assume that each container can easily use
100-200 memory cgroups, which I consider normal consumption
(400 containers x 160 memory cgroups already comes to ~64000 ids).
Therefore, I believe the 64K limit is quite reachable in real life.
The primary goal of my patch is to confirm this theory.

> I wanted to say that it's better to introduce a memcg event, but then
> I realized it's probably not worth the wasted space. Is this a common
> scenario?
>
> I think a better approach will be to add a cgroup event (displayed via
> cgroup.events) about reaching the maximum limit of cgroups. E.g.
> cgroups.events::max_nr_reached. Then you can set cgroup.max.descendants
> to some value below memcg_id space size. It's more work, but IMO it's
> a better way to communicate this event. As a bonus, you can easily
> get an idea which cgroup depletes the limit.

For my goal (i.e. just to confirm that the 64K limit was reached) this
functionality is too complicated. The confirmation matters because it
should push us to increase the global limit.

However, I think your idea is great. In the longer term it would help
both OpenVZ and LXC (and possibly Shakeel) to understand real memcg
usage and set proper limits for containers; a rough userspace sketch
of what I have in mind follows below.

I'm going to prepare such patches; however, I'm not sure I'll have
enough time for this task in the near future.
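For illustration only, here is the kind of userspace helper I have in
mind. This is an untested sketch built on my own assumptions (cgroup v2
mounted at /sys/fs/cgroup, 65535 as the id space size); it is not part
of the patch:

    #!/usr/bin/env python3
    # Rough sketch: estimate how much of the global mem_cgroup_id space
    # is in use and cap cgroup creation below it, along the lines of the
    # cgroup.max.descendants suggestion above.
    import os

    CGROUP_ROOT = "/sys/fs/cgroup"   # assumes the cgroup v2 unified hierarchy
    MEMCG_ID_SPACE = 65535           # the 64K id space discussed in this thread

    def count_cgroups(root: str = CGROUP_ROOT) -> int:
        # Counts every cgroup directory. Only cgroups with the memory
        # controller enabled actually consume a mem_cgroup_id, and offline
        # ("dying") memcgs pin ids without showing up here, so treat this
        # as a rough estimate only.
        total = 0
        for _dirpath, dirnames, _filenames in os.walk(root):
            total += len(dirnames)
        return total

    def cap_descendants(margin: int = 1024) -> None:
        # Set cgroup.max.descendants on the root below the memcg id space
        # size, so cgroup creation fails with a per-cgroup limit error
        # before the global id space is exhausted.
        limit = MEMCG_ID_SPACE - margin
        with open(os.path.join(CGROUP_ROOT, "cgroup.max.descendants"), "w") as f:
            f.write(str(limit))

    if __name__ == "__main__":
        print("cgroup directories currently present:", count_cgroups())
        # cap_descendants()  # needs root privileges; uncomment to apply the cap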
Thank you,
	Vasily Averin