All of lore.kernel.org
 help / color / mirror / Atom feed
From: Johannes Weiner <hannes@cmpxchg.org>
To: Tejun Heo <tj@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@suse.com>, Roman Gushchin <guro@fb.com>,
	Shakeel Butt <shakeelb@google.com>,
	linux-mm@kvack.org, cgroups@vger.kernel.org,
	linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH 5/8] cgroup: rstat: punt root-level optimization to individual controllers
Date: Mon, 8 Feb 2021 15:29:21 -0500	[thread overview]
Message-ID: <YCGfIYTLzcTO+ng8@cmpxchg.org> (raw)
In-Reply-To: <YB4OT61owRaze5/M@mtj.duckdns.org>

On Fri, Feb 05, 2021 at 10:34:39PM -0500, Tejun Heo wrote:
> Hello,
> 
> On Fri, Feb 05, 2021 at 01:28:03PM -0500, Johannes Weiner wrote:
> > Current users of the rstat code can source root-level statistics from
> > the native counters of their respective subsystem, allowing them to
> > forego aggregation at the root level. This optimization is currently
> > implemented inside the generic rstat code, which doesn't track the
> > root cgroup and doesn't invoke the subsystem flush callbacks on it.
> > 
> > However, the memory controller cannot do this optimization, because
> > cgroup1 breaks out memory specifically for the local level, including
> > at the root level. In preparation for the memory controller switching
> > to rstat, move the optimization from rstat core to the controllers.
> > 
> > Afterwards, rstat will always track the root cgroup for changes and
> > invoke the subsystem callbacks on it; and it's up to the subsystem to
> > special-case and skip aggregation of the root cgroup if it can source
> > this information through other, cheaper means.
> > 
> > The extra cost of tracking the root cgroup is negligible: on stat
> > changes, we actually remove a branch that checks for the root. The
> > queueing for a flush touches only per-cpu data, and only the first
> > stat change since a flush requires a (per-cpu) lock.
> > 
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> 
> Generally looks good to me.
> 
> Acked-by: Tejun Heo <tj@kernel.org>

Thanks!

> A couple nits below.
> 
> > diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
> > index 02ce2058c14b..76725e1cad7f 100644
> > --- a/block/blk-cgroup.c
> > +++ b/block/blk-cgroup.c
> > @@ -766,6 +766,10 @@ static void blkcg_rstat_flush(struct cgroup_subsys_state *css, int cpu)
> >  	struct blkcg *blkcg = css_to_blkcg(css);
> >  	struct blkcg_gq *blkg;
> >  
> > +	/* Root-level stats are sourced from system-wide IO stats */
> > +	if (!cgroup_parent(css->cgroup))
> > +		return;
> > +
> >  	rcu_read_lock();
> >  
> >  	hlist_for_each_entry_rcu(blkg, &blkcg->blkg_list, blkcg_node) {
> > @@ -789,6 +793,7 @@ static void blkcg_rstat_flush(struct cgroup_subsys_state *css, int cpu)
> >  		u64_stats_update_end(&blkg->iostat.sync);
> >  
> >  		/* propagate global delta to parent */
> > +		/* XXX: could skip this if parent is root */
> >  		if (parent) {
> >  			u64_stats_update_begin(&parent->iostat.sync);
> >  			blkg_iostat_set(&delta, &blkg->iostat.cur);
> 
> Might as well update this similar to cgroup_base_stat_flush()?

I meant to revisit that, but I'm never 100% confident when it comes to
the interaction and lifetime of css, blkcg and blkg_gq.

IIUC, the blkg_gq->parent linkage always matches the css parent
linkage; it just exists as an optimization for ancestor walks, which
would otherwise have to do radix lookups when going through the css.

So with the cgroup_parent() check at the beginning of the function
making sure we're looking at a non-root group, blkg_gq->parent should
also never be NULL and I can do if (paren->parent) directly, right?

> > @@ -58,8 +53,16 @@ void cgroup_rstat_updated(struct cgroup *cgrp, int cpu)
> >  		if (rstatc->updated_next)
> >  			break;
> >  
> > +		if (!parent) {
> 
> Maybe useful to note that the node is being marked busy but not added to the
> non-existent parent.

Makes sense, I'll add a comment.

> > +			rstatc->updated_next = cgrp;
> > +			break;
> > +		}
> > +
> > +		prstatc = cgroup_rstat_cpu(parent, cpu);
> >  		rstatc->updated_next = prstatc->updated_children;
> >  		prstatc->updated_children = cgrp;
> > +
> > +		cgrp = parent;
> >  	}
> >  
> >  	raw_spin_unlock_irqrestore(cpu_lock, flags);
> ...
> >  static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu)
> >  {
> > -	struct cgroup *parent = cgroup_parent(cgrp);
> >  	struct cgroup_rstat_cpu *rstatc = cgroup_rstat_cpu(cgrp, cpu);
> > +	struct cgroup *parent = cgroup_parent(cgrp);
> 
> Is this chunk intentional?

Yeah, it puts the local variable declarations into reverse christmas
tree ordering to make them a bit easier to read. It's a while-at-it
cleanup, mostly a force of habit. I can drop it if it bothers you.

> >  	struct cgroup_base_stat cur, delta;
> >  	unsigned seq;
> >  
> > +	/* Root-level stats are sourced from system-wide CPU stats */
> > +	if (!parent)
> > +		return;
> > +
> >  	/* fetch the current per-cpu values */
> >  	do {
> >  		seq = __u64_stats_fetch_begin(&rstatc->bsync);
> > @@ -326,8 +336,8 @@ static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu)
> >  	cgroup_base_stat_add(&cgrp->bstat, &delta);
> >  	cgroup_base_stat_add(&rstatc->last_bstat, &delta);
> >  
> > -	/* propagate global delta to parent */
> > -	if (parent) {
> > +	/* propagate global delta to parent (unless that's root) */
> > +	if (cgroup_parent(parent)) {
> 
> Yeah, this makes sense. Can you add a short while-at-it note in the patch
> description?

Will do.

Thanks for the review!

WARNING: multiple messages have this Message-ID (diff)
From: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
To: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Andrew Morton
	<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
	Michal Hocko <mhocko-IBi9RG/b67k@public.gmane.org>,
	Roman Gushchin <guro-b10kYP2dOMg@public.gmane.org>,
	Shakeel Butt <shakeelb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
	cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	kernel-team-b10kYP2dOMg@public.gmane.org
Subject: Re: [PATCH 5/8] cgroup: rstat: punt root-level optimization to individual controllers
Date: Mon, 8 Feb 2021 15:29:21 -0500	[thread overview]
Message-ID: <YCGfIYTLzcTO+ng8@cmpxchg.org> (raw)
In-Reply-To: <YB4OT61owRaze5/M@mtj.duckdns.org>

On Fri, Feb 05, 2021 at 10:34:39PM -0500, Tejun Heo wrote:
> Hello,
> 
> On Fri, Feb 05, 2021 at 01:28:03PM -0500, Johannes Weiner wrote:
> > Current users of the rstat code can source root-level statistics from
> > the native counters of their respective subsystem, allowing them to
> > forego aggregation at the root level. This optimization is currently
> > implemented inside the generic rstat code, which doesn't track the
> > root cgroup and doesn't invoke the subsystem flush callbacks on it.
> > 
> > However, the memory controller cannot do this optimization, because
> > cgroup1 breaks out memory specifically for the local level, including
> > at the root level. In preparation for the memory controller switching
> > to rstat, move the optimization from rstat core to the controllers.
> > 
> > Afterwards, rstat will always track the root cgroup for changes and
> > invoke the subsystem callbacks on it; and it's up to the subsystem to
> > special-case and skip aggregation of the root cgroup if it can source
> > this information through other, cheaper means.
> > 
> > The extra cost of tracking the root cgroup is negligible: on stat
> > changes, we actually remove a branch that checks for the root. The
> > queueing for a flush touches only per-cpu data, and only the first
> > stat change since a flush requires a (per-cpu) lock.
> > 
> > Signed-off-by: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
> 
> Generally looks good to me.
> 
> Acked-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

Thanks!

> A couple nits below.
> 
> > diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
> > index 02ce2058c14b..76725e1cad7f 100644
> > --- a/block/blk-cgroup.c
> > +++ b/block/blk-cgroup.c
> > @@ -766,6 +766,10 @@ static void blkcg_rstat_flush(struct cgroup_subsys_state *css, int cpu)
> >  	struct blkcg *blkcg = css_to_blkcg(css);
> >  	struct blkcg_gq *blkg;
> >  
> > +	/* Root-level stats are sourced from system-wide IO stats */
> > +	if (!cgroup_parent(css->cgroup))
> > +		return;
> > +
> >  	rcu_read_lock();
> >  
> >  	hlist_for_each_entry_rcu(blkg, &blkcg->blkg_list, blkcg_node) {
> > @@ -789,6 +793,7 @@ static void blkcg_rstat_flush(struct cgroup_subsys_state *css, int cpu)
> >  		u64_stats_update_end(&blkg->iostat.sync);
> >  
> >  		/* propagate global delta to parent */
> > +		/* XXX: could skip this if parent is root */
> >  		if (parent) {
> >  			u64_stats_update_begin(&parent->iostat.sync);
> >  			blkg_iostat_set(&delta, &blkg->iostat.cur);
> 
> Might as well update this similar to cgroup_base_stat_flush()?

I meant to revisit that, but I'm never 100% confident when it comes to
the interaction and lifetime of css, blkcg and blkg_gq.

IIUC, the blkg_gq->parent linkage always matches the css parent
linkage; it just exists as an optimization for ancestor walks, which
would otherwise have to do radix lookups when going through the css.

So with the cgroup_parent() check at the beginning of the function
making sure we're looking at a non-root group, blkg_gq->parent should
also never be NULL and I can do if (paren->parent) directly, right?

> > @@ -58,8 +53,16 @@ void cgroup_rstat_updated(struct cgroup *cgrp, int cpu)
> >  		if (rstatc->updated_next)
> >  			break;
> >  
> > +		if (!parent) {
> 
> Maybe useful to note that the node is being marked busy but not added to the
> non-existent parent.

Makes sense, I'll add a comment.

> > +			rstatc->updated_next = cgrp;
> > +			break;
> > +		}
> > +
> > +		prstatc = cgroup_rstat_cpu(parent, cpu);
> >  		rstatc->updated_next = prstatc->updated_children;
> >  		prstatc->updated_children = cgrp;
> > +
> > +		cgrp = parent;
> >  	}
> >  
> >  	raw_spin_unlock_irqrestore(cpu_lock, flags);
> ...
> >  static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu)
> >  {
> > -	struct cgroup *parent = cgroup_parent(cgrp);
> >  	struct cgroup_rstat_cpu *rstatc = cgroup_rstat_cpu(cgrp, cpu);
> > +	struct cgroup *parent = cgroup_parent(cgrp);
> 
> Is this chunk intentional?

Yeah, it puts the local variable declarations into reverse christmas
tree ordering to make them a bit easier to read. It's a while-at-it
cleanup, mostly a force of habit. I can drop it if it bothers you.

> >  	struct cgroup_base_stat cur, delta;
> >  	unsigned seq;
> >  
> > +	/* Root-level stats are sourced from system-wide CPU stats */
> > +	if (!parent)
> > +		return;
> > +
> >  	/* fetch the current per-cpu values */
> >  	do {
> >  		seq = __u64_stats_fetch_begin(&rstatc->bsync);
> > @@ -326,8 +336,8 @@ static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu)
> >  	cgroup_base_stat_add(&cgrp->bstat, &delta);
> >  	cgroup_base_stat_add(&rstatc->last_bstat, &delta);
> >  
> > -	/* propagate global delta to parent */
> > -	if (parent) {
> > +	/* propagate global delta to parent (unless that's root) */
> > +	if (cgroup_parent(parent)) {
> 
> Yeah, this makes sense. Can you add a short while-at-it note in the patch
> description?

Will do.

Thanks for the review!

  reply	other threads:[~2021-02-08 21:28 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-05 18:27 [PATCH 0/8] mm: memcontrol: switch to rstat v2 Johannes Weiner
2021-02-05 18:27 ` Johannes Weiner
2021-02-05 18:27 ` [PATCH 1/8] mm: memcontrol: fix cpuhotplug statistics flushing Johannes Weiner
2021-02-05 18:28 ` [PATCH 2/8] mm: memcontrol: kill mem_cgroup_nodeinfo() Johannes Weiner
2021-02-05 18:28   ` Johannes Weiner
2021-02-05 18:28 ` [PATCH 3/8] mm: memcontrol: privatize memcg_page_state query functions Johannes Weiner
2021-02-05 18:28 ` [PATCH 4/8] cgroup: rstat: support cgroup1 Johannes Weiner
2021-02-05 18:28   ` Johannes Weiner
2021-02-05 22:11   ` Shakeel Butt
2021-02-05 22:11     ` Shakeel Butt
2021-02-05 22:11     ` Shakeel Butt
2021-02-06  3:00   ` Tejun Heo
2021-02-06  3:00     ` Tejun Heo
2021-02-05 18:28 ` [PATCH 5/8] cgroup: rstat: punt root-level optimization to individual controllers Johannes Weiner
2021-02-05 18:28   ` Johannes Weiner
2021-02-06  3:34   ` Tejun Heo
2021-02-06  3:34     ` Tejun Heo
2021-02-08 20:29     ` Johannes Weiner [this message]
2021-02-08 20:29       ` Johannes Weiner
2021-02-08 15:58       ` Tejun Heo
2021-02-08 15:58         ` Tejun Heo
2021-02-05 18:28 ` [PATCH 6/8] mm: memcontrol: switch to rstat Johannes Weiner
2021-02-08  2:19   ` Shakeel Butt
2021-02-08  2:19     ` Shakeel Butt
2021-02-08  2:19     ` Shakeel Butt
2021-02-08 20:40     ` Johannes Weiner
2021-02-08 20:40       ` Johannes Weiner
2021-02-05 18:28 ` [PATCH 7/8] mm: memcontrol: consolidate lruvec stat flushing Johannes Weiner
2021-02-05 18:28   ` Johannes Weiner
2021-02-08  2:28   ` Shakeel Butt
2021-02-08  2:28     ` Shakeel Butt
2021-02-08 20:54     ` Johannes Weiner
2021-02-08 20:54       ` Johannes Weiner
2021-02-08 16:02       ` Tejun Heo
2021-02-08 16:02         ` Tejun Heo
2021-02-08 13:54   ` Michal Hocko
2021-02-08 13:54     ` Michal Hocko
2021-02-05 18:28 ` [PATCH 8/8] kselftests: cgroup: update kmem test for new vmstat implementation Johannes Weiner
2021-02-05 18:28   ` Johannes Weiner
2021-02-09 13:48   ` Shakeel Butt
2021-02-09 13:48     ` Shakeel Butt
2021-02-09 13:48     ` Shakeel Butt
2021-02-06  3:58 ` [PATCH 0/8] mm: memcontrol: switch to rstat v2 Tejun Heo
2021-02-06  3:58   ` Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YCGfIYTLzcTO+ng8@cmpxchg.org \
    --to=hannes@cmpxchg.org \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=guro@fb.com \
    --cc=kernel-team@fb.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=shakeelb@google.com \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.