All of lore.kernel.org
 help / color / mirror / Atom feed
From: Roman Gushchin <guro@fb.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Dennis Zhou <dennis@kernel.org>, Tejun Heo <tj@kernel.org>,
	Christoph Lameter <cl@linux.com>,
	Michal Hocko <mhocko@kernel.org>,
	Shakeel Butt <shakeelb@google.com>, <linux-mm@kvack.org>,
	<kernel-team@fb.com>, <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v3 4/5] mm: memcg: charge memcg percpu memory to the parent cgroup
Date: Tue, 11 Aug 2020 10:06:11 -0700	[thread overview]
Message-ID: <20200811170611.GB1507044@carbon.DHCP.thefacebook.com> (raw)
In-Reply-To: <20200811152737.GB650506@cmpxchg.org>

On Tue, Aug 11, 2020 at 11:27:37AM -0400, Johannes Weiner wrote:
> On Tue, Jun 23, 2020 at 11:45:14AM -0700, Roman Gushchin wrote:
> > Memory cgroups are using large chunks of percpu memory to store vmstat
> > data.  Yet this memory is not accounted at all, so in the case when there
> > are many (dying) cgroups, it's not exactly clear where all the memory is.
> > 
> > Because the size of memory cgroup internal structures can dramatically
> > exceed the size of object or page which is pinning it in the memory, it's
> > not a good idea to simple ignore it.  It actually breaks the isolation
> > between cgroups.
> > 
> > Let's account the consumed percpu memory to the parent cgroup.
> > 
> > Signed-off-by: Roman Gushchin <guro@fb.com>
> > Acked-by: Dennis Zhou <dennis@kernel.org>
> 
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>

Thank you!

> 
> This makes sense, and the accounting is in line with how we track and
> distribute child creation quotas (cgroup.max.descendants and
> cgroup.max.depth) up the cgroup tree.
> 
> I have one minor comment that isn't a dealbreaker for me:
> 
> > @@ -5069,13 +5069,15 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
> >  	if (!pn)
> >  		return 1;
> >  
> > -	pn->lruvec_stat_local = alloc_percpu(struct lruvec_stat);
> > +	pn->lruvec_stat_local = alloc_percpu_gfp(struct lruvec_stat,
> > +						 GFP_KERNEL_ACCOUNT);
> >  	if (!pn->lruvec_stat_local) {
> >  		kfree(pn);
> >  		return 1;
> >  	}
> >  
> > -	pn->lruvec_stat_cpu = alloc_percpu(struct lruvec_stat);
> > +	pn->lruvec_stat_cpu = alloc_percpu_gfp(struct lruvec_stat,
> > +					       GFP_KERNEL_ACCOUNT);
> >  	if (!pn->lruvec_stat_cpu) {
> >  		free_percpu(pn->lruvec_stat_local);
> >  		kfree(pn);
> > @@ -5149,11 +5151,13 @@ static struct mem_cgroup *mem_cgroup_alloc(void)
> >  		goto fail;
> >  	}
> >  
> > -	memcg->vmstats_local = alloc_percpu(struct memcg_vmstats_percpu);
> > +	memcg->vmstats_local = alloc_percpu_gfp(struct memcg_vmstats_percpu,
> > +						GFP_KERNEL_ACCOUNT);
> >  	if (!memcg->vmstats_local)
> >  		goto fail;
> >  
> > -	memcg->vmstats_percpu = alloc_percpu(struct memcg_vmstats_percpu);
> > +	memcg->vmstats_percpu = alloc_percpu_gfp(struct memcg_vmstats_percpu,
> > +						 GFP_KERNEL_ACCOUNT);
> >  	if (!memcg->vmstats_percpu)
> >  		goto fail;
> >  
> > @@ -5202,7 +5206,9 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
> >  	struct mem_cgroup *memcg;
> >  	long error = -ENOMEM;
> >  
> > +	memalloc_use_memcg(parent);
> >  	memcg = mem_cgroup_alloc();
> > +	memalloc_unuse_memcg();
> 
> The disconnect between 1) requesting accounting and 2) which cgroup to
> charge is making me uneasy. It makes mem_cgroup_alloc() a bit of a
> handgrenade, because accounting to the current task is almost
> guaranteed to be wrong if the use_memcg() annotation were to get lost
> in a refactor or not make it to a new caller of the function.
> 
> The saving grace is that mem_cgroup_alloc() is pretty unlikely to be
> used elsewhere. And pretending it's an independent interface would be
> overengineering. But how about the following in mem_cgroup_alloc() and
> alloc_mem_cgroup_per_node_info() to document that caller relationship:
> 
> 	/* We charge the parent cgroup, never the current task */
> 	WARN_ON_ONCE(!current->active_memcg);

I have nothing against.

Andrew, can you, please, squash the following diff into the patch?

Thanks!

--

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 130093bdf74b..e25f2db7e61c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5137,6 +5137,9 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
        if (!pn)
                return 1;
 
+       /* We charge the parent cgroup, never the current task */
+       WARN_ON_ONCE(!current->active_memcg);
+
        pn->lruvec_stat_local = alloc_percpu_gfp(struct lruvec_stat,
                                                 GFP_KERNEL_ACCOUNT);
        if (!pn->lruvec_stat_local) {
@@ -5219,6 +5222,9 @@ static struct mem_cgroup *mem_cgroup_alloc(void)
                goto fail;
        }
 
+       /* We charge the parent cgroup, never the current task */
+       WARN_ON_ONCE(!current->active_memcg);
+
        memcg->vmstats_local = alloc_percpu_gfp(struct memcg_vmstats_percpu,
                                                GFP_KERNEL_ACCOUNT);
        if (!memcg->vmstats_local)

  reply	other threads:[~2020-08-11 17:06 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-23 18:45 [PATCH v3 0/5] mm: memcg accounting of percpu memory Roman Gushchin
2020-06-23 18:45 ` [PATCH v3 1/5] percpu: return number of released bytes from pcpu_free_area() Roman Gushchin
2020-06-24  0:58   ` Shakeel Butt
2020-06-24  0:58     ` Shakeel Butt
2020-06-23 18:45 ` [PATCH v3 2/5] mm: memcg/percpu: account percpu memory to memory cgroups Roman Gushchin
2020-06-24  1:25   ` Shakeel Butt
2020-06-24  1:25     ` Shakeel Butt
2020-06-23 18:45 ` [PATCH v3 3/5] mm: memcg/percpu: per-memcg percpu memory statistics Roman Gushchin
2020-06-24  1:35   ` Shakeel Butt
2020-06-24  1:35     ` Shakeel Butt
2020-08-11 15:05   ` Johannes Weiner
2020-06-23 18:45 ` [PATCH v3 4/5] mm: memcg: charge memcg percpu memory to the parent cgroup Roman Gushchin
2020-06-24  1:40   ` Shakeel Butt
2020-06-24  1:40     ` Shakeel Butt
2020-06-24  1:49     ` Roman Gushchin
2020-07-29 17:10   ` Michal Koutný
2020-08-07  4:16     ` Andrew Morton
2020-08-07  4:37       ` Roman Gushchin
2020-08-10 19:33         ` Roman Gushchin
2020-08-11 14:47         ` Michal Koutný
2020-08-11 16:55           ` Roman Gushchin
2020-08-11 18:32             ` Michal Koutný
2020-08-11 19:32               ` Roman Gushchin
2020-08-12 16:28                 ` Michal Koutný
2020-08-11 15:27   ` Johannes Weiner
2020-08-11 17:06     ` Roman Gushchin [this message]
2020-08-13  9:16       ` Naresh Kamboju
2020-08-13  9:16         ` Naresh Kamboju
2020-08-13  9:16         ` Naresh Kamboju
2020-08-13 23:27         ` Stephen Rothwell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200811170611.GB1507044@carbon.DHCP.thefacebook.com \
    --to=guro@fb.com \
    --cc=akpm@linux-foundation.org \
    --cc=cl@linux.com \
    --cc=dennis@kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=kernel-team@fb.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=shakeelb@google.com \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.