From: Johannes Weiner <hannes@cmpxchg.org>
To: Tejun Heo <tj@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>, Michal Hocko <mhocko@suse.com>,
	Roman Gushchin <guro@fb.com>, Shakeel Butt <shakeelb@google.com>,
	linux-mm@kvack.org, cgroups@vger.kernel.org,
	linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH 5/8] cgroup: rstat: punt root-level optimization to individual controllers
Date: Mon, 8 Feb 2021 15:29:21 -0500
Message-ID: <YCGfIYTLzcTO+ng8@cmpxchg.org>
In-Reply-To: <YB4OT61owRaze5/M@mtj.duckdns.org>

On Fri, Feb 05, 2021 at 10:34:39PM -0500, Tejun Heo wrote:
> Hello,
>
> On Fri, Feb 05, 2021 at 01:28:03PM -0500, Johannes Weiner wrote:
> > Current users of the rstat code can source root-level statistics from
> > the native counters of their respective subsystem, allowing them to
> > forego aggregation at the root level. This optimization is currently
> > implemented inside the generic rstat code, which doesn't track the
> > root cgroup and doesn't invoke the subsystem flush callbacks on it.
> >
> > However, the memory controller cannot do this optimization, because
> > cgroup1 breaks out memory specifically for the local level, including
> > at the root level. In preparation for the memory controller switching
> > to rstat, move the optimization from rstat core to the controllers.
> >
> > Afterwards, rstat will always track the root cgroup for changes and
> > invoke the subsystem callbacks on it; and it's up to the subsystem to
> > special-case and skip aggregation of the root cgroup if it can source
> > this information through other, cheaper means.
> >
> > The extra cost of tracking the root cgroup is negligible: on stat
> > changes, we actually remove a branch that checks for the root. The
> > queueing for a flush touches only per-cpu data, and only the first
> > stat change since a flush requires a (per-cpu) lock.
> >
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
>
> Generally looks good to me.
>
> Acked-by: Tejun Heo <tj@kernel.org>

Thanks!

> A couple nits below.
>
> > diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
> > index 02ce2058c14b..76725e1cad7f 100644
> > --- a/block/blk-cgroup.c
> > +++ b/block/blk-cgroup.c
> > @@ -766,6 +766,10 @@ static void blkcg_rstat_flush(struct cgroup_subsys_state *css, int cpu)
> >  	struct blkcg *blkcg = css_to_blkcg(css);
> >  	struct blkcg_gq *blkg;
> >
> > +	/* Root-level stats are sourced from system-wide IO stats */
> > +	if (!cgroup_parent(css->cgroup))
> > +		return;
> > +
> >  	rcu_read_lock();
> >
> >  	hlist_for_each_entry_rcu(blkg, &blkcg->blkg_list, blkcg_node) {
> > @@ -789,6 +793,7 @@ static void blkcg_rstat_flush(struct cgroup_subsys_state *css, int cpu)
> >  		u64_stats_update_end(&blkg->iostat.sync);
> >
> >  		/* propagate global delta to parent */
> > +		/* XXX: could skip this if parent is root */
> >  		if (parent) {
> >  			u64_stats_update_begin(&parent->iostat.sync);
> >  			blkg_iostat_set(&delta, &blkg->iostat.cur);
>
> Might as well update this similar to cgroup_base_stat_flush()?

I meant to revisit that, but I'm never 100% confident when it comes to
the interaction and lifetime of css, blkcg and blkcg_gq.

IIUC, the blkcg_gq->parent linkage always matches the css parent
linkage; it just exists as an optimization for ancestor walks, which
would otherwise have to do radix lookups when going through the css.

So with the cgroup_parent() check at the beginning of the function
making sure we're looking at a non-root group, blkcg_gq->parent should
also never be NULL and I can do if (parent->parent) directly, right?

> > @@ -58,8 +53,16 @@ void cgroup_rstat_updated(struct cgroup *cgrp, int cpu)
> >  		if (rstatc->updated_next)
> >  			break;
> >
> > +		if (!parent) {
>
> Maybe useful to note that the node is being marked busy but not added to the
> non-existent parent.

Makes sense, I'll add a comment.

> > +			rstatc->updated_next = cgrp;
> > +			break;
> > +		}
> > +
> > +		prstatc = cgroup_rstat_cpu(parent, cpu);
> >  		rstatc->updated_next = prstatc->updated_children;
> >  		prstatc->updated_children = cgrp;
> > +
> > +		cgrp = parent;
> >  	}
> >
> >  	raw_spin_unlock_irqrestore(cpu_lock, flags);
> ...
> >  static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu)
> >  {
> > -	struct cgroup *parent = cgroup_parent(cgrp);
> >  	struct cgroup_rstat_cpu *rstatc = cgroup_rstat_cpu(cgrp, cpu);
> > +	struct cgroup *parent = cgroup_parent(cgrp);
>
> Is this chunk intentional?

Yeah, it puts the local variable declarations into reverse christmas
tree ordering to make them a bit easier to read.

It's a while-at-it cleanup, mostly a force of habit. I can drop it if
it bothers you.

> >  	struct cgroup_base_stat cur, delta;
> >  	unsigned seq;
> >
> > +	/* Root-level stats are sourced from system-wide CPU stats */
> > +	if (!parent)
> > +		return;
> > +
> >  	/* fetch the current per-cpu values */
> >  	do {
> >  		seq = __u64_stats_fetch_begin(&rstatc->bsync);
> > @@ -326,8 +336,8 @@ static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu)
> >  	cgroup_base_stat_add(&cgrp->bstat, &delta);
> >  	cgroup_base_stat_add(&rstatc->last_bstat, &delta);
> >
> > -	/* propagate global delta to parent */
> > -	if (parent) {
> > +	/* propagate global delta to parent (unless that's root) */
> > +	if (cgroup_parent(parent)) {
>
> Yeah, this makes sense. Can you add a short while-at-it note in the patch
> description?

Will do.

Thanks for the review!