* [PATCH v10 0/3] blk-cgroup: Optimize blkcg_rstat_flush()
@ 2022-11-05  0:58 Waiman Long
  2022-11-05  0:59 ` [PATCH v10 1/3] blk-cgroup: Return -ENOMEM directly in blkcg_css_alloc() error path Waiman Long
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Waiman Long @ 2022-11-05  0:58 UTC (permalink / raw)
  To: Tejun Heo, Jens Axboe
  Cc: cgroups, linux-block, linux-kernel, Ming Lei, Andy Shevchenko,
	Andrew Morton, Michal Koutný,
	Hillf Danton, Waiman Long

 v10:
  - Update patch 3 to rename the rstat function to
    cgroup_rstat_css_cpu_flush().

 v9:
  - Remove patch "llist: Allow optional sentinel node terminated lockless
    list" for now. This will be done as a follow-up patch.
  - Add a new lqueued field to blkg_iostat_set to store the status of
    whether lnode is in a lockless list.
  - Add a new patch 3 to speed up the freeing of blkcg by flushing out
    the rstat lockless lists at blkcg offline time.

 v8:
  - Update the llist patch to make existing llist functions and macros
    work for both NULL and sentinel terminated lockless lists as much
    as possible and leave only the initialization and removal functions
    to have sentinel terminated llist variants.

This patch series improves blkcg_rstat_flush() performance by eliminating
unnecessary blkg enumeration and flush operations for those blkg's and
blkg_iostat_set's that haven't been updated since the last flush.

Waiman Long (3):
  blk-cgroup: Return -ENOMEM directly in blkcg_css_alloc() error path
  blk-cgroup: Optimize blkcg_rstat_flush()
  blk-cgroup: Flush stats at blkgs destruction path

 block/blk-cgroup.c     | 103 +++++++++++++++++++++++++++++++++++------
 block/blk-cgroup.h     |  10 ++++
 include/linux/cgroup.h |   1 +
 kernel/cgroup/rstat.c  |  20 ++++++++
 4 files changed, 119 insertions(+), 15 deletions(-)

-- 
2.31.1



* [PATCH v10 1/3] blk-cgroup: Return -ENOMEM directly in blkcg_css_alloc() error path
  2022-11-05  0:58 [PATCH v10 0/3] blk-cgroup: Optimize blkcg_rstat_flush() Waiman Long
@ 2022-11-05  0:59 ` Waiman Long
  2022-11-05  0:59 ` [PATCH v10 2/3] blk-cgroup: Optimize blkcg_rstat_flush() Waiman Long
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Waiman Long @ 2022-11-05  0:59 UTC (permalink / raw)
  To: Tejun Heo, Jens Axboe
  Cc: cgroups, linux-block, linux-kernel, Ming Lei, Andy Shevchenko,
	Andrew Morton, Michal Koutný,
	Hillf Danton, Waiman Long

For blkcg_css_alloc(), the only error that will be returned is -ENOMEM.
Simplify error handling code by returning this error directly instead
of setting an intermediate "ret" variable.

Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
---
 block/blk-cgroup.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 6a5c849ee061..af8a4d2d1fd1 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1139,7 +1139,6 @@ static struct cgroup_subsys_state *
 blkcg_css_alloc(struct cgroup_subsys_state *parent_css)
 {
 	struct blkcg *blkcg;
-	struct cgroup_subsys_state *ret;
 	int i;
 
 	mutex_lock(&blkcg_pol_mutex);
@@ -1148,10 +1147,8 @@ blkcg_css_alloc(struct cgroup_subsys_state *parent_css)
 		blkcg = &blkcg_root;
 	} else {
 		blkcg = kzalloc(sizeof(*blkcg), GFP_KERNEL);
-		if (!blkcg) {
-			ret = ERR_PTR(-ENOMEM);
+		if (!blkcg)
 			goto unlock;
-		}
 	}
 
 	for (i = 0; i < BLKCG_MAX_POLS ; i++) {
@@ -1168,10 +1165,9 @@ blkcg_css_alloc(struct cgroup_subsys_state *parent_css)
 			continue;
 
 		cpd = pol->cpd_alloc_fn(GFP_KERNEL);
-		if (!cpd) {
-			ret = ERR_PTR(-ENOMEM);
+		if (!cpd)
 			goto free_pd_blkcg;
-		}
+
 		blkcg->cpd[i] = cpd;
 		cpd->blkcg = blkcg;
 		cpd->plid = i;
@@ -1200,7 +1196,7 @@ blkcg_css_alloc(struct cgroup_subsys_state *parent_css)
 		kfree(blkcg);
 unlock:
 	mutex_unlock(&blkcg_pol_mutex);
-	return ret;
+	return ERR_PTR(-ENOMEM);
 }
 
 static int blkcg_css_online(struct cgroup_subsys_state *css)
-- 
2.31.1



* [PATCH v10 2/3] blk-cgroup: Optimize blkcg_rstat_flush()
  2022-11-05  0:58 [PATCH v10 0/3] blk-cgroup: Optimize blkcg_rstat_flush() Waiman Long
  2022-11-05  0:59 ` [PATCH v10 1/3] blk-cgroup: Return -ENOMEM directly in blkcg_css_alloc() error path Waiman Long
@ 2022-11-05  0:59 ` Waiman Long
  2022-11-05  0:59   ` Waiman Long
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Waiman Long @ 2022-11-05  0:59 UTC (permalink / raw)
  To: Tejun Heo, Jens Axboe
  Cc: cgroups, linux-block, linux-kernel, Ming Lei, Andy Shevchenko,
	Andrew Morton, Michal Koutný,
	Hillf Danton, Waiman Long

On a system with many CPUs and block devices, the time spent in
blkcg_rstat_flush() called from cgroup_rstat_flush() can be rather long.
This is especially problematic because interrupts are disabled during
the flush. It was reported that the flush could take seconds to complete
in some extreme cases, leading to hard lockup messages.

Since it is likely that not all the percpu blkg_iostat_set's have been
updated since the last flush, those stale blkg_iostat_set's don't need
to be flushed again. This patch optimizes blkcg_rstat_flush() by keeping
a lockless list of recently updated blkg_iostat_set's in a newly added
percpu blkcg->lhead pointer.

A blkg_iostat_set is added to the lockless list on the update side in
blk_cgroup_bio_start() and removed from the list when it is flushed in
blkcg_rstat_flush(). Due to racing, it is possible that a
blkg_iostat_set on the lockless list has no new IO stats to be flushed,
but that is harmless.

To protect against destruction of the blkg, a percpu reference to the
blkg is acquired when its blkg_iostat_set is put onto the lockless list
and released when it is removed from the list.
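
As a condensed sketch of the two sides of the scheme (based on the diff
below, not a literal excerpt):

  /* Update side, in blk_cgroup_bio_start(): queue the percpu
   * blkg_iostat_set if it isn't on the lockless list yet.
   */
  if (!READ_ONCE(bis->lqueued)) {
          struct llist_head *lhead = this_cpu_ptr(blkcg->lhead);

          llist_add(&bis->lnode, lhead);
          WRITE_ONCE(bis->lqueued, true);
          percpu_ref_get(&bis->blkg->refcnt);     /* pin the blkg */
  }

  /* Flush side, in blkcg_rstat_flush(): consume the percpu list and
   * flush only the entries that were queued on this cpu.
   */
  lnode = llist_del_all(per_cpu_ptr(blkcg->lhead, cpu));
  llist_for_each_entry_safe(bisc, next_bisc, lnode, lnode) {
          WRITE_ONCE(bisc->lqueued, false);
          /* ... propagate bisc->cur into blkg->iostat and its parent ... */
          percpu_ref_put(&bisc->blkg->refcnt);    /* unpin the blkg */
  }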

When booting an instrumented test kernel with this patch on a 2-socket
96-thread system with cgroup v2, 1788 of the 2051 cgroup_rstat_flush()
calls made after bootup returned immediately because the lockless lists
were empty. After an all-CPU kernel build, the ratio became
6295424/6340513, more than 99%.

Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
---
 block/blk-cgroup.c | 76 ++++++++++++++++++++++++++++++++++++++++++----
 block/blk-cgroup.h | 10 ++++++
 2 files changed, 80 insertions(+), 6 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index af8a4d2d1fd1..3e03c0d13253 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -59,6 +59,37 @@ static struct workqueue_struct *blkcg_punt_bio_wq;
 
 #define BLKG_DESTROY_BATCH_SIZE  64
 
+/*
+ * Lockless lists for tracking IO stats update
+ *
+ * New IO stats are stored in the percpu iostat_cpu within blkcg_gq (blkg).
+ * There are multiple blkg's (one for each block device) attached to each
+ * blkcg. The rstat code keeps track of which cpu has IO stats updated,
+ * but it doesn't know which blkg has the updated stats. If there are many
+ * block devices in a system, the cost of iterating all the blkg's to flush
+ * out the IO stats can be high. To reduce such overhead, a set of percpu
+ * lockless lists (lhead) per blkcg are used to track the set of recently
+ * updated iostat_cpu's since the last flush. An iostat_cpu will be put
+ * onto the lockless list on the update side [blk_cgroup_bio_start()] if
+ * not there yet and then removed when being flushed [blkcg_rstat_flush()].
+ * References to blkg are gotten and then put back in the process to
+ * protect against blkg removal.
+ *
+ * Return: 0 if successful or -ENOMEM if allocation fails.
+ */
+static int init_blkcg_llists(struct blkcg *blkcg)
+{
+	int cpu;
+
+	blkcg->lhead = alloc_percpu_gfp(struct llist_head, GFP_KERNEL);
+	if (!blkcg->lhead)
+		return -ENOMEM;
+
+	for_each_possible_cpu(cpu)
+		init_llist_head(per_cpu_ptr(blkcg->lhead, cpu));
+	return 0;
+}
+
 /**
  * blkcg_css - find the current css
  *
@@ -236,8 +267,10 @@ static struct blkcg_gq *blkg_alloc(struct blkcg *blkcg, struct gendisk *disk,
 	blkg->blkcg = blkcg;
 
 	u64_stats_init(&blkg->iostat.sync);
-	for_each_possible_cpu(cpu)
+	for_each_possible_cpu(cpu) {
 		u64_stats_init(&per_cpu_ptr(blkg->iostat_cpu, cpu)->sync);
+		per_cpu_ptr(blkg->iostat_cpu, cpu)->blkg = blkg;
+	}
 
 	for (i = 0; i < BLKCG_MAX_POLS; i++) {
 		struct blkcg_policy *pol = blkcg_policy[i];
@@ -827,7 +860,9 @@ static void blkcg_iostat_update(struct blkcg_gq *blkg, struct blkg_iostat *cur,
 static void blkcg_rstat_flush(struct cgroup_subsys_state *css, int cpu)
 {
 	struct blkcg *blkcg = css_to_blkcg(css);
-	struct blkcg_gq *blkg;
+	struct llist_head *lhead = per_cpu_ptr(blkcg->lhead, cpu);
+	struct llist_node *lnode;
+	struct blkg_iostat_set *bisc, *next_bisc;
 
 	/* Root-level stats are sourced from system-wide IO stats */
 	if (!cgroup_parent(css->cgroup))
@@ -835,12 +870,21 @@ static void blkcg_rstat_flush(struct cgroup_subsys_state *css, int cpu)
 
 	rcu_read_lock();
 
-	hlist_for_each_entry_rcu(blkg, &blkcg->blkg_list, blkcg_node) {
+	lnode = llist_del_all(lhead);
+	if (!lnode)
+		goto out;
+
+	/*
+	 * Iterate only the iostat_cpu's queued in the lockless list.
+	 */
+	llist_for_each_entry_safe(bisc, next_bisc, lnode, lnode) {
+		struct blkcg_gq *blkg = bisc->blkg;
 		struct blkcg_gq *parent = blkg->parent;
-		struct blkg_iostat_set *bisc = per_cpu_ptr(blkg->iostat_cpu, cpu);
 		struct blkg_iostat cur;
 		unsigned int seq;
 
+		WRITE_ONCE(bisc->lqueued, false);
+
 		/* fetch the current per-cpu values */
 		do {
 			seq = u64_stats_fetch_begin(&bisc->sync);
@@ -853,8 +897,10 @@ static void blkcg_rstat_flush(struct cgroup_subsys_state *css, int cpu)
 		if (parent && parent->parent)
 			blkcg_iostat_update(parent, &blkg->iostat.cur,
 					    &blkg->iostat.last);
+		percpu_ref_put(&blkg->refcnt);
 	}
 
+out:
 	rcu_read_unlock();
 }
 
@@ -1132,6 +1178,7 @@ static void blkcg_css_free(struct cgroup_subsys_state *css)
 
 	mutex_unlock(&blkcg_pol_mutex);
 
+	free_percpu(blkcg->lhead);
 	kfree(blkcg);
 }
 
@@ -1151,6 +1198,9 @@ blkcg_css_alloc(struct cgroup_subsys_state *parent_css)
 			goto unlock;
 	}
 
+	if (init_blkcg_llists(blkcg))
+		goto free_blkcg;
+
 	for (i = 0; i < BLKCG_MAX_POLS ; i++) {
 		struct blkcg_policy *pol = blkcg_policy[i];
 		struct blkcg_policy_data *cpd;
@@ -1191,7 +1241,8 @@ blkcg_css_alloc(struct cgroup_subsys_state *parent_css)
 	for (i--; i >= 0; i--)
 		if (blkcg->cpd[i])
 			blkcg_policy[i]->cpd_free_fn(blkcg->cpd[i]);
-
+	free_percpu(blkcg->lhead);
+free_blkcg:
 	if (blkcg != &blkcg_root)
 		kfree(blkcg);
 unlock:
@@ -1939,6 +1990,7 @@ static int blk_cgroup_io_type(struct bio *bio)
 
 void blk_cgroup_bio_start(struct bio *bio)
 {
+	struct blkcg *blkcg = bio->bi_blkg->blkcg;
 	int rwd = blk_cgroup_io_type(bio), cpu;
 	struct blkg_iostat_set *bis;
 	unsigned long flags;
@@ -1957,9 +2009,21 @@ void blk_cgroup_bio_start(struct bio *bio)
 	}
 	bis->cur.ios[rwd]++;
 
+	/*
+	 * If the iostat_cpu isn't in a lockless list, put it into the
+	 * list to indicate that a stat update is pending.
+	 */
+	if (!READ_ONCE(bis->lqueued)) {
+		struct llist_head *lhead = this_cpu_ptr(blkcg->lhead);
+
+		llist_add(&bis->lnode, lhead);
+		WRITE_ONCE(bis->lqueued, true);
+		percpu_ref_get(&bis->blkg->refcnt);
+	}
+
 	u64_stats_update_end_irqrestore(&bis->sync, flags);
 	if (cgroup_subsys_on_dfl(io_cgrp_subsys))
-		cgroup_rstat_updated(bio->bi_blkg->blkcg->css.cgroup, cpu);
+		cgroup_rstat_updated(blkcg->css.cgroup, cpu);
 	put_cpu();
 }
 
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index aa2b286bc825..1e94e404eaa8 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -18,6 +18,7 @@
 #include <linux/cgroup.h>
 #include <linux/kthread.h>
 #include <linux/blk-mq.h>
+#include <linux/llist.h>
 
 struct blkcg_gq;
 struct blkg_policy_data;
@@ -43,6 +44,9 @@ struct blkg_iostat {
 
 struct blkg_iostat_set {
 	struct u64_stats_sync		sync;
+	struct blkcg_gq		       *blkg;
+	struct llist_node		lnode;
+	int				lqueued;	/* queued in llist */
 	struct blkg_iostat		cur;
 	struct blkg_iostat		last;
 };
@@ -97,6 +101,12 @@ struct blkcg {
 	struct blkcg_policy_data	*cpd[BLKCG_MAX_POLS];
 
 	struct list_head		all_blkcgs_node;
+
+	/*
+	 * List of updated percpu blkg_iostat_set's since the last flush.
+	 */
+	struct llist_head __percpu	*lhead;
+
 #ifdef CONFIG_BLK_CGROUP_FC_APPID
 	char                            fc_app_id[FC_APPID_LEN];
 #endif
-- 
2.31.1



* [PATCH v10 3/3] blk-cgroup: Flush stats at blkgs destruction path
  2022-11-05  0:58 [PATCH v10 0/3] blk-cgroup: Optimize blkcg_rstat_flush() Waiman Long
@ 2022-11-05  0:59 ` Waiman Long
  0 siblings, 0 replies; 10+ messages in thread
From: Waiman Long @ 2022-11-05  0:59 UTC (permalink / raw)
  To: Tejun Heo, Jens Axboe
  Cc: cgroups, linux-block, linux-kernel, Ming Lei, Andy Shevchenko,
	Andrew Morton, Michal Koutný,
	Hillf Danton, Waiman Long

As noted by Michal, the blkg_iostat_set's in the lockless list hold
references to blkg's to protect against their removal. Those blkg's in
turn hold references to the blkcg. When a cgroup is being destroyed,
cgroup_rstat_flush() is only called at css_release_work_fn(), which runs
when the blkcg reference count reaches 0. This circular dependency will
prevent the blkcg from being freed until some other event causes
cgroup_rstat_flush() to be called to flush out the pending blkcg stats.

To prevent this delayed blkcg removal, add a new
cgroup_rstat_css_cpu_flush() function to flush stats for a given css and
cpu, and call it in the blkgs destruction path, blkcg_destroy_blkgs(),
whenever there are still pending stats to be flushed. This ensures that
the blkcg reference count can reach 0 as soon as possible.
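
As a condensed sketch, the destruction-path flush added below boils down
to (see the diff for the full context):

  /* In blkcg_destroy_blkgs(): flush any percpu lockless lists that
   * still have queued blkg_iostat_set's, dropping the blkg references
   * that would otherwise keep the blkcg alive.
   */
  for_each_possible_cpu(cpu) {
          struct llist_head *lhead = per_cpu_ptr(blkcg->lhead, cpu);

          if (!llist_empty(lhead))
                  cgroup_rstat_css_cpu_flush(&blkcg->css, cpu);
  }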

Signed-off-by: Waiman Long <longman@redhat.com>
---
 block/blk-cgroup.c     | 15 ++++++++++++++-
 include/linux/cgroup.h |  1 +
 kernel/cgroup/rstat.c  | 20 ++++++++++++++++++++
 3 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 3e03c0d13253..57941d2a8ba3 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1084,10 +1084,12 @@ struct list_head *blkcg_get_cgwb_list(struct cgroup_subsys_state *css)
  */
 static void blkcg_destroy_blkgs(struct blkcg *blkcg)
 {
+	int cpu;
+
 	might_sleep();
 
+	css_get(&blkcg->css);
 	spin_lock_irq(&blkcg->lock);
-
 	while (!hlist_empty(&blkcg->blkg_list)) {
 		struct blkcg_gq *blkg = hlist_entry(blkcg->blkg_list.first,
 						struct blkcg_gq, blkcg_node);
@@ -1110,6 +1112,17 @@ static void blkcg_destroy_blkgs(struct blkcg *blkcg)
 	}
 
 	spin_unlock_irq(&blkcg->lock);
+
+	/*
+	 * Flush all the non-empty percpu lockless lists.
+	 */
+	for_each_possible_cpu(cpu) {
+		struct llist_head *lhead = per_cpu_ptr(blkcg->lhead, cpu);
+
+		if (!llist_empty(lhead))
+			cgroup_rstat_css_cpu_flush(&blkcg->css, cpu);
+	}
+	css_put(&blkcg->css);
 }
 
 /**
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 528bd44b59e2..6c4e66b3fa84 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -766,6 +766,7 @@ void cgroup_rstat_flush(struct cgroup *cgrp);
 void cgroup_rstat_flush_irqsafe(struct cgroup *cgrp);
 void cgroup_rstat_flush_hold(struct cgroup *cgrp);
 void cgroup_rstat_flush_release(void);
+void cgroup_rstat_css_cpu_flush(struct cgroup_subsys_state *css, int cpu);
 
 /*
  * Basic resource stats.
diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index 793ecff29038..910e633869b0 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -281,6 +281,26 @@ void cgroup_rstat_flush_release(void)
 	spin_unlock_irq(&cgroup_rstat_lock);
 }
 
+/**
+ * cgroup_rstat_css_cpu_flush - flush stats for the given css and cpu
+ * @css: target css to be flushed
+ * @cpu: the cpu that holds the stats to be flushed
+ *
+ * A lightweight rstat flush operation for a given css and cpu.
+ * Only the percpu cpu_lock is held for mutual exclusion; the global
+ * cgroup_rstat_lock isn't used.
+ */
+void cgroup_rstat_css_cpu_flush(struct cgroup_subsys_state *css, int cpu)
+{
+	raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu);
+
+	raw_spin_lock_irq(cpu_lock);
+	rcu_read_lock();
+	css->ss->css_rstat_flush(css, cpu);
+	rcu_read_unlock();
+	raw_spin_unlock_irq(cpu_lock);
+}
+
 int cgroup_rstat_init(struct cgroup *cgrp)
 {
 	int cpu;
-- 
2.31.1



* Re: [PATCH v10 3/3] blk-cgroup: Flush stats at blkgs destruction path
  2022-11-05  0:59 ` [PATCH v10 3/3] blk-cgroup: Flush stats at blkgs destruction path Waiman Long
@ 2022-11-14 22:43 ` Tejun Heo
  -1 siblings, 0 replies; 10+ messages in thread
From: Tejun Heo @ 2022-11-14 22:43 UTC (permalink / raw)
  To: Waiman Long
  Cc: Jens Axboe, cgroups, linux-block, linux-kernel, Ming Lei,
	Andy Shevchenko, Andrew Morton, Michal Koutný,
	Hillf Danton

On Fri, Nov 04, 2022 at 08:59:02PM -0400, Waiman Long wrote:
> As noted by Michal, the blkg_iostat_set's in the lockless list hold
> references to blkg's to protect against their removal. Those blkg's in
> turn hold references to the blkcg. When a cgroup is being destroyed,
> cgroup_rstat_flush() is only called at css_release_work_fn(), which runs
> when the blkcg reference count reaches 0. This circular dependency will
> prevent the blkcg from being freed until some other event causes
> cgroup_rstat_flush() to be called to flush out the pending blkcg stats.
> 
> To prevent this delayed blkcg removal, add a new
> cgroup_rstat_css_cpu_flush() function to flush stats for a given css and
> cpu, and call it in the blkgs destruction path, blkcg_destroy_blkgs(),
> whenever there are still pending stats to be flushed. This ensures that
> the blkcg reference count can reach 0 as soon as possible.
> 
> Signed-off-by: Waiman Long <longman@redhat.com>

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun


* Re: [PATCH v10 0/3] blk-cgroup: Optimize blkcg_rstat_flush()
  2022-11-05  0:58 [PATCH v10 0/3] blk-cgroup: Optimize blkcg_rstat_flush() Waiman Long
                   ` (2 preceding siblings ...)
  2022-11-05  0:59   ` Waiman Long
@ 2022-11-16 23:06 ` Waiman Long
  2022-11-16 23:58     ` Jens Axboe
       [not found] ` <166864313668.13217.6182708630212912209.b4-ty@kernel.dk>
  4 siblings, 1 reply; 10+ messages in thread
From: Waiman Long @ 2022-11-16 23:06 UTC (permalink / raw)
  To: Jens Axboe
  Cc: cgroups, linux-block, linux-kernel, Ming Lei, Andy Shevchenko,
	Andrew Morton, Michal Koutný,
	Hillf Danton, Tejun Heo

On 11/4/22 20:58, Waiman Long wrote:
>   v10:
>    - Update patch 3 to rename the rstat function to
>      cgroup_rstat_css_cpu_flush().
>
>   v9:
>    - Remove patch "llist: Allow optional sentinel node terminated lockless
>      list" for now. This will be done as a follow-up patch.
>    - Add a new lqueued field to blkg_iostat_set to store the status of
>      whether lnode is in a lockless list.
>    - Add a new patch 3 to speed up the freeing of blkcg by flushing out
>      the rstat lockless lists at blkcg offline time.
>
>   v8:
>    - Update the llist patch to make existing llist functions and macros
>      work for both NULL and sentinel terminated lockless lists as much
>      as possible and leave only the initialization and removal functions
>      to have sentinel terminated llist variants.
>
> This patch series improves blkcg_rstat_flush() performance by eliminating
> unnecessary blkg enumeration and flush operations for those blkg's and
> blkg_iostat_set's that haven't been updated since the last flush.
>
> Waiman Long (3):
>    blk-cgroup: Return -ENOMEM directly in blkcg_css_alloc() error path
>    blk-cgroup: Optimize blkcg_rstat_flush()
>    blk-cgroup: Flush stats at blkgs destruction path
>
>   block/blk-cgroup.c     | 103 +++++++++++++++++++++++++++++++++++------
>   block/blk-cgroup.h     |  10 ++++
>   include/linux/cgroup.h |   1 +
>   kernel/cgroup/rstat.c  |  20 ++++++++
>   4 files changed, 119 insertions(+), 15 deletions(-)

Jens, do you have any further comments on this patchset? Is it possible
to queue it for the next Linux release?

Cheers,
Longman



* Re: [PATCH v10 0/3] blk-cgroup: Optimize blkcg_rstat_flush()
  2022-11-16 23:06 ` Waiman Long
@ 2022-11-16 23:58 ` Jens Axboe
  0 siblings, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2022-11-16 23:58 UTC (permalink / raw)
  To: Waiman Long
  Cc: cgroups, linux-block, linux-kernel, Ming Lei, Andy Shevchenko,
	Andrew Morton, Michal Koutný,
	Hillf Danton, Tejun Heo

On 11/16/22 4:06 PM, Waiman Long wrote:
> On 11/4/22 20:58, Waiman Long wrote:
>>   v10:
>>    - Update patch 3 to rename the rstat function to
>>      cgroup_rstat_css_cpu_flush().
>>
>>   v9:
>>    - Remove patch "llist: Allow optional sentinel node terminated lockless
>>      list" for now. This will be done as a follow-up patch.
>>    - Add a new lqueued field to blkg_iostat_set to store the status of
>>      whether lnode is in a lockless list.
>>    - Add a new patch 3 to speed up the freeing of blkcg by flushing out
>>      the rstat lockless lists at blkcg offline time.
>>
>>   v8:
>>    - Update the llist patch to make existing llist functions and macros
>>      work for both NULL and sentinel terminated lockless lists as much
>>      as possible and leave only the initialization and removal functions
>>      to have sentinel terminated llist variants.
>>
>> This patch series improves blkcg_rstat_flush() performance by eliminating
>> unnecessary blkg enumeration and flush operations for those blkg's and
>> blkg_iostat_set's that haven't been updated since the last flush.
>>
>> Waiman Long (3):
>>    blk-cgroup: Return -ENOMEM directly in blkcg_css_alloc() error path
>>    blk-cgroup: Optimize blkcg_rstat_flush()
>>    blk-cgroup: Flush stats at blkgs destruction path
>>
>>   block/blk-cgroup.c     | 103 +++++++++++++++++++++++++++++++++++------
>>   block/blk-cgroup.h     |  10 ++++
>>   include/linux/cgroup.h |   1 +
>>   kernel/cgroup/rstat.c  |  20 ++++++++
>>   4 files changed, 119 insertions(+), 15 deletions(-)
> 
> Jens, do you have any further comments on this patchset? Is it possible
> to queue it for the next Linux release?

Looks good to me, I'll queue it up.

-- 
Jens Axboe


* Re: [PATCH v10 0/3] blk-cgroup: Optimize blkcg_rstat_flush()
       [not found] ` <166864313668.13217.6182708630212912209.b4-ty@kernel.dk>
@ 2022-11-17  0:45   ` Waiman Long
  0 siblings, 0 replies; 10+ messages in thread
From: Waiman Long @ 2022-11-17  0:45 UTC (permalink / raw)
  To: Jens Axboe, Tejun Heo
  Cc: Andrew Morton, Andy Shevchenko, Hillf Danton, cgroups,
	linux-block, linux-kernel, Michal Koutný,
	Ming Lei

On 11/16/22 18:58, Jens Axboe wrote:
> On Fri, 4 Nov 2022 20:58:59 -0400, Waiman Long wrote:
>>   v10:
>>    - Update patch 3 to rename the rstat function to
>>      cgroup_rstat_css_cpu_flush().
>>
>>   v9:
>>    - Remove patch "llist: Allow optional sentinel node terminated lockless
>>      list" for now. This will be done as a follow-up patch.
>>    - Add a new lqueued field to blkg_iostat_set to store the status of
>>      whether lnode is in a lockless list.
>>    - Add a new patch 3 to speed up the freeing of blkcg by flushing out
>>      the rstat lockless lists at blkcg offline time.
>>
>> [...]
> Applied, thanks!
>
> [1/3] blk-cgroup: Return -ENOMEM directly in blkcg_css_alloc() error path
>        commit: b5a9adcbd5dc95d34d1f5fc84eff9af6fc60d284
> [2/3] blk-cgroup: Optimize blkcg_rstat_flush()
>        commit: 3b8cc6298724021da845f2f9fd7dd4b6829a6817
> [3/3] blk-cgroup: Flush stats at blkgs destruction path
>        commit: dae590a6c96c799434e0ff8156ef29b88c257e60
>
> Best regards,

Thanks a lot!

Cheers,
Longman


