linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/2] blkcg: rename blkcg->cgwb_refcnt to ->online_pin and always use it
@ 2019-07-24 17:35 Tejun Heo
  2019-07-24 17:37 ` [PATCH RESEND " Tejun Heo
  0 siblings, 1 reply; 5+ messages in thread
From: Tejun Heo @ 2019-07-24 17:35 UTC (permalink / raw)
  Cc: linux-block, linux-kernel, kernel-team

blkcg->cgwb_refcnt is used to delay blkcg offlining so that blkgs
don't get offlined while there are active cgwbs on them.  However, it
ends up making offlining unordered sometimes causing parents to be
offlined before children.

To fix it, we want child blkcgs to pin the parents' online states
turning the refcnt into a more generic online pinning mechanism.

In prepartion,

* blkcg->cgwb_refcnt -> blkcg->online_pin
* blkcg_cgwb_get/put() -> blkcg_pin/unpin_online()
* Take them out of CONFIG_CGROUP_WRITEBACK

Signed-off-by: Tejun Heo <tj@kernel.org>
---
Hello,

The asynchronous blkcg offlining can break offline ordering.  This
doesn't affect any of in-kernel users but it broke an assumption that
the pending io.cost controller was making and is generally nasty.
These two patches fix the offlining ordering by making children pin
parents.

Thanks.

 block/blk-cgroup.c         |    6 +++---
 include/linux/blk-cgroup.h |   39 +++++++++++++--------------------------
 mm/backing-dev.c           |    6 +++---
 3 files changed, 19 insertions(+), 32 deletions(-)

--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1035,8 +1035,8 @@ static void blkcg_css_offline(struct cgr
 	/* this prevents anyone from attaching or migrating to this blkcg */
 	wb_blkcg_offline(blkcg);
 
-	/* put the base cgwb reference allowing step 2 to be triggered */
-	blkcg_cgwb_put(blkcg);
+	/* put the base online pin allowing step 2 to be triggered */
+	blkcg_unpin_online(blkcg);
 }
 
 /**
@@ -1135,11 +1135,11 @@ blkcg_css_alloc(struct cgroup_subsys_sta
 	}
 
 	spin_lock_init(&blkcg->lock);
+	refcount_set(&blkcg->online_pin, 1);
 	INIT_RADIX_TREE(&blkcg->blkg_tree, GFP_NOWAIT | __GFP_NOWARN);
 	INIT_HLIST_HEAD(&blkcg->blkg_list);
 #ifdef CONFIG_CGROUP_WRITEBACK
 	INIT_LIST_HEAD(&blkcg->cgwb_list);
-	refcount_set(&blkcg->cgwb_refcnt, 1);
 #endif
 	list_add_tail(&blkcg->all_blkcgs_node, &all_blkcgs);
 
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -47,6 +47,7 @@ struct blkcg_gq;
 struct blkcg {
 	struct cgroup_subsys_state	css;
 	spinlock_t			lock;
+	refcount_t			online_pin;
 
 	struct radix_tree_root		blkg_tree;
 	struct blkcg_gq	__rcu		*blkg_hint;
@@ -57,7 +58,6 @@ struct blkcg {
 	struct list_head		all_blkcgs_node;
 #ifdef CONFIG_CGROUP_WRITEBACK
 	struct list_head		cgwb_list;
-	refcount_t			cgwb_refcnt;
 #endif
 };
 
@@ -431,47 +431,34 @@ static inline struct blkcg *cpd_to_blkcg
 
 extern void blkcg_destroy_blkgs(struct blkcg *blkcg);
 
-#ifdef CONFIG_CGROUP_WRITEBACK
-
 /**
- * blkcg_cgwb_get - get a reference for blkcg->cgwb_list
+ * blkcg_pin_online - pin online state
  * @blkcg: blkcg of interest
  *
- * This is used to track the number of active wb's related to a blkcg.
+ * While pinned, a blkcg is kept online.  This is primarily used to
+ * impedance-match blkg and cgwb lifetimes so that blkg doesn't go offline
+ * while an associated cgwb is still active.
  */
-static inline void blkcg_cgwb_get(struct blkcg *blkcg)
+static inline void blkcg_pin_online(struct blkcg *blkcg)
 {
-	refcount_inc(&blkcg->cgwb_refcnt);
+	refcount_inc(&blkcg->online_pin);
 }
 
 /**
- * blkcg_cgwb_put - put a reference for @blkcg->cgwb_list
+ * blkcg_unpin_online - unpin online state
  * @blkcg: blkcg of interest
  *
- * This is used to track the number of active wb's related to a blkcg.
- * When this count goes to zero, all active wb has finished so the
+ * This is primarily used to impedance-match blkg and cgwb lifetimes so
+ * that blkg doesn't go offline while an associated cgwb is still active.
+ * When this count goes to zero, all active cgwbs have finished so the
  * blkcg can continue destruction by calling blkcg_destroy_blkgs().
- * This work may occur in cgwb_release_workfn() on the cgwb_release
- * workqueue.
  */
-static inline void blkcg_cgwb_put(struct blkcg *blkcg)
+static inline void blkcg_unpin_online(struct blkcg *blkcg)
 {
-	if (refcount_dec_and_test(&blkcg->cgwb_refcnt))
+	if (refcount_dec_and_test(&blkcg->online_pin))
 		blkcg_destroy_blkgs(blkcg);
 }
 
-#else
-
-static inline void blkcg_cgwb_get(struct blkcg *blkcg) { }
-
-static inline void blkcg_cgwb_put(struct blkcg *blkcg)
-{
-	/* wb isn't being accounted, so trigger destruction right away */
-	blkcg_destroy_blkgs(blkcg);
-}
-
-#endif
-
 /**
  * blkg_path - format cgroup path of blkg
  * @blkg: blkg of interest
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -487,8 +487,8 @@ static void cgwb_release_workfn(struct w
 	css_put(wb->blkcg_css);
 	mutex_unlock(&wb->bdi->cgwb_release_mutex);
 
-	/* triggers blkg destruction if cgwb_refcnt becomes zero */
-	blkcg_cgwb_put(blkcg);
+	/* triggers blkg destruction if no online users left */
+	blkcg_unpin_online(blkcg);
 
 	fprop_local_destroy_percpu(&wb->memcg_completions);
 	percpu_ref_exit(&wb->refcnt);
@@ -588,7 +588,7 @@ static int cgwb_create(struct backing_de
 			list_add_tail_rcu(&wb->bdi_node, &bdi->wb_list);
 			list_add(&wb->memcg_node, memcg_cgwb_list);
 			list_add(&wb->blkcg_node, blkcg_cgwb_list);
-			blkcg_cgwb_get(blkcg);
+			blkcg_pin_online(blkcg);
 			css_get(memcg_css);
 			css_get(blkcg_css);
 		}

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH RESEND 1/2] blkcg: rename blkcg->cgwb_refcnt to ->online_pin and always use it
  2019-07-24 17:35 [PATCH 1/2] blkcg: rename blkcg->cgwb_refcnt to ->online_pin and always use it Tejun Heo
@ 2019-07-24 17:37 ` Tejun Heo
  2019-07-24 17:37   ` [PATCH 2/2] blkcg: don't offline parent blkcg first Tejun Heo
  0 siblings, 1 reply; 5+ messages in thread
From: Tejun Heo @ 2019-07-24 17:37 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-kernel, kernel-team

blkcg->cgwb_refcnt is used to delay blkcg offlining so that blkgs
don't get offlined while there are active cgwbs on them.  However, it
ends up making offlining unordered sometimes causing parents to be
offlined before children.

To fix it, we want child blkcgs to pin the parents' online states
turning the refcnt into a more generic online pinning mechanism.

In prepartion,

* blkcg->cgwb_refcnt -> blkcg->online_pin
* blkcg_cgwb_get/put() -> blkcg_pin/unpin_online()
* Take them out of CONFIG_CGROUP_WRITEBACK

Signed-off-by: Tejun Heo <tj@kernel.org>
---
(Resending cuz somehow I cleared To: line before sending)

Hello,

The asynchronous blkcg offlining can break offline ordering.  This
doesn't affect any of in-kernel users but it broke an assumption that
the pending io.cost controller was making and is generally nasty.
These two patches fix the offlining ordering by making children pin
parents.

Thanks.

 block/blk-cgroup.c         |    6 +++---
 include/linux/blk-cgroup.h |   39 +++++++++++++--------------------------
 mm/backing-dev.c           |    6 +++---
 3 files changed, 19 insertions(+), 32 deletions(-)

--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1035,8 +1035,8 @@ static void blkcg_css_offline(struct cgr
 	/* this prevents anyone from attaching or migrating to this blkcg */
 	wb_blkcg_offline(blkcg);
 
-	/* put the base cgwb reference allowing step 2 to be triggered */
-	blkcg_cgwb_put(blkcg);
+	/* put the base online pin allowing step 2 to be triggered */
+	blkcg_unpin_online(blkcg);
 }
 
 /**
@@ -1135,11 +1135,11 @@ blkcg_css_alloc(struct cgroup_subsys_sta
 	}
 
 	spin_lock_init(&blkcg->lock);
+	refcount_set(&blkcg->online_pin, 1);
 	INIT_RADIX_TREE(&blkcg->blkg_tree, GFP_NOWAIT | __GFP_NOWARN);
 	INIT_HLIST_HEAD(&blkcg->blkg_list);
 #ifdef CONFIG_CGROUP_WRITEBACK
 	INIT_LIST_HEAD(&blkcg->cgwb_list);
-	refcount_set(&blkcg->cgwb_refcnt, 1);
 #endif
 	list_add_tail(&blkcg->all_blkcgs_node, &all_blkcgs);
 
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -47,6 +47,7 @@ struct blkcg_gq;
 struct blkcg {
 	struct cgroup_subsys_state	css;
 	spinlock_t			lock;
+	refcount_t			online_pin;
 
 	struct radix_tree_root		blkg_tree;
 	struct blkcg_gq	__rcu		*blkg_hint;
@@ -57,7 +58,6 @@ struct blkcg {
 	struct list_head		all_blkcgs_node;
 #ifdef CONFIG_CGROUP_WRITEBACK
 	struct list_head		cgwb_list;
-	refcount_t			cgwb_refcnt;
 #endif
 };
 
@@ -431,47 +431,34 @@ static inline struct blkcg *cpd_to_blkcg
 
 extern void blkcg_destroy_blkgs(struct blkcg *blkcg);
 
-#ifdef CONFIG_CGROUP_WRITEBACK
-
 /**
- * blkcg_cgwb_get - get a reference for blkcg->cgwb_list
+ * blkcg_pin_online - pin online state
  * @blkcg: blkcg of interest
  *
- * This is used to track the number of active wb's related to a blkcg.
+ * While pinned, a blkcg is kept online.  This is primarily used to
+ * impedance-match blkg and cgwb lifetimes so that blkg doesn't go offline
+ * while an associated cgwb is still active.
  */
-static inline void blkcg_cgwb_get(struct blkcg *blkcg)
+static inline void blkcg_pin_online(struct blkcg *blkcg)
 {
-	refcount_inc(&blkcg->cgwb_refcnt);
+	refcount_inc(&blkcg->online_pin);
 }
 
 /**
- * blkcg_cgwb_put - put a reference for @blkcg->cgwb_list
+ * blkcg_unpin_online - unpin online state
  * @blkcg: blkcg of interest
  *
- * This is used to track the number of active wb's related to a blkcg.
- * When this count goes to zero, all active wb has finished so the
+ * This is primarily used to impedance-match blkg and cgwb lifetimes so
+ * that blkg doesn't go offline while an associated cgwb is still active.
+ * When this count goes to zero, all active cgwbs have finished so the
  * blkcg can continue destruction by calling blkcg_destroy_blkgs().
- * This work may occur in cgwb_release_workfn() on the cgwb_release
- * workqueue.
  */
-static inline void blkcg_cgwb_put(struct blkcg *blkcg)
+static inline void blkcg_unpin_online(struct blkcg *blkcg)
 {
-	if (refcount_dec_and_test(&blkcg->cgwb_refcnt))
+	if (refcount_dec_and_test(&blkcg->online_pin))
 		blkcg_destroy_blkgs(blkcg);
 }
 
-#else
-
-static inline void blkcg_cgwb_get(struct blkcg *blkcg) { }
-
-static inline void blkcg_cgwb_put(struct blkcg *blkcg)
-{
-	/* wb isn't being accounted, so trigger destruction right away */
-	blkcg_destroy_blkgs(blkcg);
-}
-
-#endif
-
 /**
  * blkg_path - format cgroup path of blkg
  * @blkg: blkg of interest
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -487,8 +487,8 @@ static void cgwb_release_workfn(struct w
 	css_put(wb->blkcg_css);
 	mutex_unlock(&wb->bdi->cgwb_release_mutex);
 
-	/* triggers blkg destruction if cgwb_refcnt becomes zero */
-	blkcg_cgwb_put(blkcg);
+	/* triggers blkg destruction if no online users left */
+	blkcg_unpin_online(blkcg);
 
 	fprop_local_destroy_percpu(&wb->memcg_completions);
 	percpu_ref_exit(&wb->refcnt);
@@ -588,7 +588,7 @@ static int cgwb_create(struct backing_de
 			list_add_tail_rcu(&wb->bdi_node, &bdi->wb_list);
 			list_add(&wb->memcg_node, memcg_cgwb_list);
 			list_add(&wb->blkcg_node, blkcg_cgwb_list);
-			blkcg_cgwb_get(blkcg);
+			blkcg_pin_online(blkcg);
 			css_get(memcg_css);
 			css_get(blkcg_css);
 		}

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 2/2] blkcg: don't offline parent blkcg first
  2019-07-24 17:37 ` [PATCH RESEND " Tejun Heo
@ 2019-07-24 17:37   ` Tejun Heo
  2020-04-01 20:35     ` Tejun Heo
  0 siblings, 1 reply; 5+ messages in thread
From: Tejun Heo @ 2019-07-24 17:37 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-kernel, kernel-team

blkcg->cgwb_refcnt is used to delay blkcg offlining so that blkgs
don't get offlined while there are active cgwbs on them.  However, it
ends up making offlining unordered sometimes causing parents to be
offlined before children.

Let's fix this by making child blkcgs pin the parents' online states.

Note that pin/unpin names are chosen over get/put intentionally
because css uses get/put online for something different.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 block/blk-cgroup.c         |   16 ++++++++++++++++
 include/linux/blk-cgroup.h |    6 +++++-
 2 files changed, 21 insertions(+), 1 deletion(-)

--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1158,6 +1158,21 @@ unlock:
 	return ret;
 }
 
+static int blkcg_css_online(struct cgroup_subsys_state *css)
+{
+	struct blkcg *blkcg = css_to_blkcg(css);
+	struct blkcg *parent = blkcg_parent(blkcg);
+
+	/*
+	 * blkcg_pin_online() is used to delay blkcg offline so that blkgs
+	 * don't go offline while cgwbs are still active on them.  Pin the
+	 * parent so that offline always happens towards the root.
+	 */
+	if (parent)
+		blkcg_pin_online(parent);
+	return 0;
+}
+
 /**
  * blkcg_init_queue - initialize blkcg part of request queue
  * @q: request_queue to initialize
@@ -1300,6 +1315,7 @@ static void blkcg_exit(struct task_struc
 
 struct cgroup_subsys io_cgrp_subsys = {
 	.css_alloc = blkcg_css_alloc,
+	.css_online = blkcg_css_online,
 	.css_offline = blkcg_css_offline,
 	.css_free = blkcg_css_free,
 	.can_attach = blkcg_can_attach,
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -455,8 +455,12 @@ static inline void blkcg_pin_online(stru
  */
 static inline void blkcg_unpin_online(struct blkcg *blkcg)
 {
-	if (refcount_dec_and_test(&blkcg->online_pin))
+	do {
+		if (!refcount_dec_and_test(&blkcg->online_pin))
+			break;
 		blkcg_destroy_blkgs(blkcg);
+		blkcg = blkcg_parent(blkcg);
+	} while (blkcg);
 }
 
 /**

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 2/2] blkcg: don't offline parent blkcg first
  2019-07-24 17:37   ` [PATCH 2/2] blkcg: don't offline parent blkcg first Tejun Heo
@ 2020-04-01 20:35     ` Tejun Heo
  2020-04-01 20:57       ` Jens Axboe
  0 siblings, 1 reply; 5+ messages in thread
From: Tejun Heo @ 2020-04-01 20:35 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-kernel, kernel-team

On Wed, Jul 24, 2019 at 10:37:55AM -0700, Tejun Heo wrote:
> blkcg->cgwb_refcnt is used to delay blkcg offlining so that blkgs
> don't get offlined while there are active cgwbs on them.  However, it
> ends up making offlining unordered sometimes causing parents to be
> offlined before children.
> 
> Let's fix this by making child blkcgs pin the parents' online states.
> 
> Note that pin/unpin names are chosen over get/put intentionally
> because css uses get/put online for something different.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>

Jens, these two patches slipped through the cracks. Can you please take a look
at them?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 2/2] blkcg: don't offline parent blkcg first
  2020-04-01 20:35     ` Tejun Heo
@ 2020-04-01 20:57       ` Jens Axboe
  0 siblings, 0 replies; 5+ messages in thread
From: Jens Axboe @ 2020-04-01 20:57 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-block, linux-kernel, kernel-team

On 4/1/20 2:35 PM, Tejun Heo wrote:
> On Wed, Jul 24, 2019 at 10:37:55AM -0700, Tejun Heo wrote:
>> blkcg->cgwb_refcnt is used to delay blkcg offlining so that blkgs
>> don't get offlined while there are active cgwbs on them.  However, it
>> ends up making offlining unordered sometimes causing parents to be
>> offlined before children.
>>
>> Let's fix this by making child blkcgs pin the parents' online states.
>>
>> Note that pin/unpin names are chosen over get/put intentionally
>> because css uses get/put online for something different.
>>
>> Signed-off-by: Tejun Heo <tj@kernel.org>
> 
> Jens, these two patches slipped through the cracks. Can you please take a look
> at them?

Huh indeed, looks fine to me. I'll add for 5.7, thanks.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-04-01 20:58 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-24 17:35 [PATCH 1/2] blkcg: rename blkcg->cgwb_refcnt to ->online_pin and always use it Tejun Heo
2019-07-24 17:37 ` [PATCH RESEND " Tejun Heo
2019-07-24 17:37   ` [PATCH 2/2] blkcg: don't offline parent blkcg first Tejun Heo
2020-04-01 20:35     ` Tejun Heo
2020-04-01 20:57       ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).