linux-kernel.vger.kernel.org archive mirror
* [PATCHSET] block: per-queue policy activation
From: Tejun Heo @ 2012-04-12 23:29 UTC (permalink / raw)
  To: axboe; +Cc: vgoyal, ctalbott, rni, linux-kernel, cgroups, containers

Hello,

It seems that we're *finally* nearing the end of the long blkcg API
cleanup.  With all the previous updates, blkcg code is now clean &
modular enough to implement per-queue policy activation.

Up to now, blkcg assumed that all policies are active on all queues,
which simply isn't true - the cfq policy is only applicable to queues
which are using cfq as the elevator.  Also, for transition purposes,
the current implementation clears all !root blkgs across elevator
switch and policy [de]registration in a racy manner.

This patchset implements per-queue policy activation.  Static policy
ID is replaced with dynamic registration and each policy should be
activated and deactivated explicitly on each queue.  On activation,
the matching policy data are created on all existing blkgs.  On
deactivation, the matching policy data are removed from all existing
blkgs.  blkg printing skips blkg-policy combinations which are
disabled, and trying to configure a policy on a blkg which doesn't
have the policy enabled returns -EINVAL instead of creating an unused
dummy configuration.
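
To illustrate where the series ends up, here is a rough sketch of how
a policy is expected to drive the new interface (the function and
struct names are the ones introduced in patches 0003 and 0007;
everything prefixed my_ is a made-up placeholder):

  static struct blkio_policy_type my_policy = {
          .ops            = { /* blkio_init_group_fn etc. */ },
          /* no static .plid - assigned by blkio_policy_register() */
          .pdata_size     = sizeof(struct my_group_data),
          .cftypes        = my_cgroup_files,
  };

  static int __init my_module_init(void)
  {
          /* may now fail, e.g. -ENOSPC if all policy slots are taken */
          return blkio_policy_register(&my_policy);
  }

  /* wherever the policy attaches to a queue (elevator init etc.) */
  static int my_attach_queue(struct request_queue *q)
  {
          /* allocates and installs policy data on all of @q's blkgs */
          return blkcg_activate_policy(q, &my_policy);
  }

  static void my_detach_queue(struct request_queue *q)
  {
          /* frees the policy data again; other queues are unaffected */
          blkcg_deactivate_policy(q, &my_policy);
  }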

 0001-blkcg-kill-blkio_list-and-replace-blkio_list_lock-wi.patch
 0002-blkcg-use-pol-instead-of-plid-in-update_root_blkg_pd.patch
 0003-blkcg-remove-static-policy-ID-enums.patch
 0004-blkcg-make-blkg_conf_prep-take-pol-and-return-with-q.patch
 0005-blkcg-make-sure-blkg_lookup-returns-NULL-if-q-is-byp.patch
 0006-blkcg-add-request_queue-root_blkg.patch
 0007-blkcg-implement-per-queue-policy-activation.patch
 0008-blkcg-drop-stuff-unused-after-per-queue-policy-activ.patch

0001-0003 remove static policy IDs.

0004-0006 are prep patches.

0007 implements per-queue policy activation.

0008 drops unnecessary stuff.

This patchset is on top of the current block/for-3.5/core 5bc4afb1ec
"blkcg: drop BLKCG_STAT_{PRIV|POL|OFF} macros" and also available in
the following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git blkcg-pols

diffstat follows.

 block/blk-cgroup.c     |  424 ++++++++++++++++++++++++++++++-------------------
 block/blk-cgroup.h     |   43 ++--
 block/blk-core.c       |    4 
 block/blk-throttle.c   |   78 +++------
 block/cfq-iosched.c    |   71 ++++----
 block/elevator.c       |    2 
 include/linux/blkdev.h |   10 +
 7 files changed, 367 insertions(+), 265 deletions(-)

Thanks.

--
tejun

* [PATCH 1/8] blkcg: kill blkio_list and replace blkio_list_lock with a mutex
From: Tejun Heo @ 2012-04-12 23:29 UTC (permalink / raw)
  To: axboe; +Cc: vgoyal, ctalbott, rni, linux-kernel, cgroups, containers, Tejun Heo

With blkio_policy[], blkio_list is redundant and only gets in the way
of per-queue policy activation.  Remove it.  Also, replace
blkio_list_lock with a mutex, blkcg_pol_mutex, and let it protect the
whole [un]registration process.

This is to prepare for per-queue policy activation and doesn't cause
any functional difference.
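
The pattern this converges on, condensed from the diff below (nothing
here goes beyond what the patch itself adds):

  static struct blkio_policy_type *blkio_policy[BLKIO_NR_POLICIES];
  static DEFINE_MUTEX(blkcg_pol_mutex);   /* protects blkio_policy[] */

  /* readers index the array by plid instead of walking blkio_list,
   * taking blkcg_pol_mutex where they need a stable view */
  struct blkio_policy_type *pol = blkio_policy[plid];

  /* [un]registration updates the slot with the mutex held and may
   * therefore sleep */
  mutex_lock(&blkcg_pol_mutex);
  blkio_policy[blkiop->plid] = blkiop;    /* NULL to unregister */
  mutex_unlock(&blkcg_pol_mutex);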

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
---
 block/blk-cgroup.c |   32 +++++++++++++++++---------------
 block/blk-cgroup.h |    1 -
 2 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 9449c38..af665fe 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -24,9 +24,7 @@
 
 #define MAX_KEY_LEN 100
 
-static DEFINE_SPINLOCK(blkio_list_lock);
-static LIST_HEAD(blkio_list);
-
+static DEFINE_MUTEX(blkcg_pol_mutex);
 static DEFINE_MUTEX(all_q_mutex);
 static LIST_HEAD(all_q_list);
 
@@ -311,8 +309,9 @@ blkiocg_reset_stats(struct cgroup *cgroup, struct cftype *cftype, u64 val)
 	struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgroup);
 	struct blkio_group *blkg;
 	struct hlist_node *n;
+	int i;
 
-	spin_lock(&blkio_list_lock);
+	mutex_lock(&blkcg_pol_mutex);
 	spin_lock_irq(&blkcg->lock);
 
 	/*
@@ -321,15 +320,16 @@ blkiocg_reset_stats(struct cgroup *cgroup, struct cftype *cftype, u64 val)
 	 * anyway.  If you get hit by a race, retry.
 	 */
 	hlist_for_each_entry(blkg, n, &blkcg->blkg_list, blkcg_node) {
-		struct blkio_policy_type *pol;
+		for (i = 0; i < BLKIO_NR_POLICIES; i++) {
+			struct blkio_policy_type *pol = blkio_policy[i];
 
-		list_for_each_entry(pol, &blkio_list, list)
-			if (pol->ops.blkio_reset_group_stats_fn)
+			if (pol && pol->ops.blkio_reset_group_stats_fn)
 				pol->ops.blkio_reset_group_stats_fn(blkg);
+		}
 	}
 
 	spin_unlock_irq(&blkcg->lock);
-	spin_unlock(&blkio_list_lock);
+	mutex_unlock(&blkcg_pol_mutex);
 	return 0;
 }
 
@@ -732,20 +732,21 @@ void blkio_policy_register(struct blkio_policy_type *blkiop)
 {
 	struct request_queue *q;
 
+	mutex_lock(&blkcg_pol_mutex);
+
 	blkcg_bypass_start();
-	spin_lock(&blkio_list_lock);
 
 	BUG_ON(blkio_policy[blkiop->plid]);
 	blkio_policy[blkiop->plid] = blkiop;
-	list_add_tail(&blkiop->list, &blkio_list);
-
-	spin_unlock(&blkio_list_lock);
 	list_for_each_entry(q, &all_q_list, all_q_node)
 		update_root_blkg_pd(q, blkiop->plid);
+
 	blkcg_bypass_end();
 
 	if (blkiop->cftypes)
 		WARN_ON(cgroup_add_cftypes(&blkio_subsys, blkiop->cftypes));
+
+	mutex_unlock(&blkcg_pol_mutex);
 }
 EXPORT_SYMBOL_GPL(blkio_policy_register);
 
@@ -753,19 +754,20 @@ void blkio_policy_unregister(struct blkio_policy_type *blkiop)
 {
 	struct request_queue *q;
 
+	mutex_lock(&blkcg_pol_mutex);
+
 	if (blkiop->cftypes)
 		cgroup_rm_cftypes(&blkio_subsys, blkiop->cftypes);
 
 	blkcg_bypass_start();
-	spin_lock(&blkio_list_lock);
 
 	BUG_ON(blkio_policy[blkiop->plid] != blkiop);
 	blkio_policy[blkiop->plid] = NULL;
-	list_del_init(&blkiop->list);
 
-	spin_unlock(&blkio_list_lock);
 	list_for_each_entry(q, &all_q_list, all_q_node)
 		update_root_blkg_pd(q, blkiop->plid);
 	blkcg_bypass_end();
+
+	mutex_unlock(&blkcg_pol_mutex);
 }
 EXPORT_SYMBOL_GPL(blkio_policy_unregister);
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index ca0ff7c..f2ede76 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -102,7 +102,6 @@ struct blkio_policy_ops {
 };
 
 struct blkio_policy_type {
-	struct list_head list;
 	struct blkio_policy_ops ops;
 	enum blkio_policy_id plid;
 	size_t pdata_size;		/* policy specific private data size */
-- 
1.7.7.3


* [PATCH 2/8] blkcg: use @pol instead of @plid in update_root_blkg_pd() and blkcg_print_blkgs()
From: Tejun Heo @ 2012-04-12 23:29 UTC (permalink / raw)
  To: axboe; +Cc: vgoyal, ctalbott, rni, linux-kernel, cgroups, containers, Tejun Heo

The two functions were taking "enum blkio_policy_id plid".  Make them
take "const struct blkio_policy_type *pol" instead.

This is to prepare for per-queue policy activation and doesn't cause
any functional difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
---
 block/blk-cgroup.c   |   21 +++++++++++----------
 block/blk-cgroup.h   |    7 ++++---
 block/blk-throttle.c |    6 +++---
 block/cfq-iosched.c  |   10 +++++-----
 4 files changed, 23 insertions(+), 21 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index af665fe..b123152 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -226,17 +226,17 @@ static void blkg_destroy(struct blkio_group *blkg)
  * aren't shot down.  This broken and racy implementation is temporary.
  * Eventually, blkg shoot down will be replaced by proper in-place update.
  */
-void update_root_blkg_pd(struct request_queue *q, enum blkio_policy_id plid)
+void update_root_blkg_pd(struct request_queue *q,
+			 const struct blkio_policy_type *pol)
 {
-	struct blkio_policy_type *pol = blkio_policy[plid];
 	struct blkio_group *blkg = blkg_lookup(&blkio_root_cgroup, q);
 	struct blkg_policy_data *pd;
 
 	if (!blkg)
 		return;
 
-	kfree(blkg->pd[plid]);
-	blkg->pd[plid] = NULL;
+	kfree(blkg->pd[pol->plid]);
+	blkg->pd[pol->plid] = NULL;
 
 	if (!pol)
 		return;
@@ -244,7 +244,7 @@ void update_root_blkg_pd(struct request_queue *q, enum blkio_policy_id plid)
 	pd = kzalloc(sizeof(*pd) + pol->pdata_size, GFP_KERNEL);
 	WARN_ON_ONCE(!pd);
 
-	blkg->pd[plid] = pd;
+	blkg->pd[pol->plid] = pd;
 	pd->blkg = blkg;
 	pol->ops.blkio_init_group_fn(blkg);
 }
@@ -360,7 +360,8 @@ static const char *blkg_dev_name(struct blkio_group *blkg)
  */
 void blkcg_print_blkgs(struct seq_file *sf, struct blkio_cgroup *blkcg,
 		       u64 (*prfill)(struct seq_file *, void *, int),
-		       int pol, int data, bool show_total)
+		       const struct blkio_policy_type *pol, int data,
+		       bool show_total)
 {
 	struct blkio_group *blkg;
 	struct hlist_node *n;
@@ -368,8 +369,8 @@ void blkcg_print_blkgs(struct seq_file *sf, struct blkio_cgroup *blkcg,
 
 	spin_lock_irq(&blkcg->lock);
 	hlist_for_each_entry(blkg, n, &blkcg->blkg_list, blkcg_node)
-		if (blkg->pd[pol])
-			total += prfill(sf, blkg->pd[pol]->pdata, data);
+		if (blkg->pd[pol->plid])
+			total += prfill(sf, blkg->pd[pol->plid]->pdata, data);
 	spin_unlock_irq(&blkcg->lock);
 
 	if (show_total)
@@ -739,7 +740,7 @@ void blkio_policy_register(struct blkio_policy_type *blkiop)
 	BUG_ON(blkio_policy[blkiop->plid]);
 	blkio_policy[blkiop->plid] = blkiop;
 	list_for_each_entry(q, &all_q_list, all_q_node)
-		update_root_blkg_pd(q, blkiop->plid);
+		update_root_blkg_pd(q, blkiop);
 
 	blkcg_bypass_end();
 
@@ -765,7 +766,7 @@ void blkio_policy_unregister(struct blkio_policy_type *blkiop)
 	blkio_policy[blkiop->plid] = NULL;
 
 	list_for_each_entry(q, &all_q_list, all_q_node)
-		update_root_blkg_pd(q, blkiop->plid);
+		update_root_blkg_pd(q, blkiop);
 	blkcg_bypass_end();
 
 	mutex_unlock(&blkcg_pol_mutex);
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index f2ede76..49116d3 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -117,11 +117,12 @@ extern void blkio_policy_register(struct blkio_policy_type *);
 extern void blkio_policy_unregister(struct blkio_policy_type *);
 extern void blkg_destroy_all(struct request_queue *q, bool destroy_root);
 extern void update_root_blkg_pd(struct request_queue *q,
-				enum blkio_policy_id plid);
+				const struct blkio_policy_type *pol);
 
 void blkcg_print_blkgs(struct seq_file *sf, struct blkio_cgroup *blkcg,
 		       u64 (*prfill)(struct seq_file *, void *, int),
-		       int pol, int data, bool show_total);
+		       const struct blkio_policy_type *pol, int data,
+		       bool show_total);
 u64 __blkg_prfill_u64(struct seq_file *sf, void *pdata, u64 v);
 u64 __blkg_prfill_rwstat(struct seq_file *sf, void *pdata,
 			 const struct blkg_rwstat *rwstat);
@@ -333,7 +334,7 @@ static inline void blkio_policy_unregister(struct blkio_policy_type *blkiop) { }
 static inline void blkg_destroy_all(struct request_queue *q,
 				    bool destory_root) { }
 static inline void update_root_blkg_pd(struct request_queue *q,
-				       enum blkio_policy_id plid) { }
+				       const struct blkio_policy_type *pol) { }
 
 static inline void *blkg_to_pdata(struct blkio_group *blkg,
 				struct blkio_policy_type *pol) { return NULL; }
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 6024014..07c17c2 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -946,7 +946,7 @@ static int tg_print_cpu_rwstat(struct cgroup *cgrp, struct cftype *cft,
 {
 	struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp);
 
-	blkcg_print_blkgs(sf, blkcg, tg_prfill_cpu_rwstat, BLKIO_POLICY_THROTL,
+	blkcg_print_blkgs(sf, blkcg, tg_prfill_cpu_rwstat, &blkio_policy_throtl,
 			  cft->private, true);
 	return 0;
 }
@@ -973,7 +973,7 @@ static int tg_print_conf_u64(struct cgroup *cgrp, struct cftype *cft,
 			     struct seq_file *sf)
 {
 	blkcg_print_blkgs(sf, cgroup_to_blkio_cgroup(cgrp), tg_prfill_conf_u64,
-			  BLKIO_POLICY_THROTL, cft->private, false);
+			  &blkio_policy_throtl, cft->private, false);
 	return 0;
 }
 
@@ -981,7 +981,7 @@ static int tg_print_conf_uint(struct cgroup *cgrp, struct cftype *cft,
 			      struct seq_file *sf)
 {
 	blkcg_print_blkgs(sf, cgroup_to_blkio_cgroup(cgrp), tg_prfill_conf_uint,
-			  BLKIO_POLICY_THROTL, cft->private, false);
+			  &blkio_policy_throtl, cft->private, false);
 	return 0;
 }
 
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index cff8b5b..7980173 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1381,7 +1381,7 @@ static int cfqg_print_weight_device(struct cgroup *cgrp, struct cftype *cft,
 				    struct seq_file *sf)
 {
 	blkcg_print_blkgs(sf, cgroup_to_blkio_cgroup(cgrp),
-			  cfqg_prfill_weight_device, BLKIO_POLICY_PROP, 0,
+			  cfqg_prfill_weight_device, &blkio_policy_cfq, 0,
 			  false);
 	return 0;
 }
@@ -1446,7 +1446,7 @@ static int cfqg_print_stat(struct cgroup *cgrp, struct cftype *cft,
 {
 	struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp);
 
-	blkcg_print_blkgs(sf, blkcg, blkg_prfill_stat, BLKIO_POLICY_PROP,
+	blkcg_print_blkgs(sf, blkcg, blkg_prfill_stat, &blkio_policy_cfq,
 			  cft->private, false);
 	return 0;
 }
@@ -1456,7 +1456,7 @@ static int cfqg_print_rwstat(struct cgroup *cgrp, struct cftype *cft,
 {
 	struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp);
 
-	blkcg_print_blkgs(sf, blkcg, blkg_prfill_rwstat, BLKIO_POLICY_PROP,
+	blkcg_print_blkgs(sf, blkcg, blkg_prfill_rwstat, &blkio_policy_cfq,
 			  cft->private, true);
 	return 0;
 }
@@ -1483,7 +1483,7 @@ static int cfqg_print_avg_queue_size(struct cgroup *cgrp, struct cftype *cft,
 	struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp);
 
 	blkcg_print_blkgs(sf, blkcg, cfqg_prfill_avg_queue_size,
-			  BLKIO_POLICY_PROP, 0, false);
+			  &blkio_policy_cfq, 0, false);
 	return 0;
 }
 #endif	/* CONFIG_DEBUG_BLK_CGROUP */
@@ -3939,7 +3939,7 @@ static void cfq_exit_queue(struct elevator_queue *e)
 #ifndef CONFIG_CFQ_GROUP_IOSCHED
 	kfree(cfqd->root_group);
 #endif
-	update_root_blkg_pd(q, BLKIO_POLICY_PROP);
+	update_root_blkg_pd(q, &blkio_policy_cfq);
 	kfree(cfqd);
 }
 
-- 
1.7.7.3


* [PATCH 3/8] blkcg: remove static policy ID enums
From: Tejun Heo @ 2012-04-12 23:29 UTC (permalink / raw)
  To: axboe; +Cc: vgoyal, ctalbott, rni, linux-kernel, cgroups, containers, Tejun Heo

Remove BLKIO_POLICY_* enums and let blkio_policy_register() allocate
@pol->plid dynamically on registration.  The maximum number of blkcg
policies which can be registered at the same time is defined by
BLKCG_MAX_POLS constant added to include/linux/blkdev.h.

Note that blkio_policy_register() may now fail.  Policy init functions
are updated accordingly and unnecessary ifdefs are removed from
cfq_init().
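
Since registration can now fail, module init paths gain an unwind
step.  In sketch form (this mirrors the cfq_init() rework below; the
foo_* names are placeholders):

  static int __init foo_init(void)
  {
          int ret;

          ret = blkio_policy_register(&blkio_policy_foo);
          if (ret)                /* e.g. -ENOSPC, no free plid slot */
                  return ret;

          ret = foo_setup_other_stuff();
          if (ret)
                  blkio_policy_unregister(&blkio_policy_foo);
          return ret;
  }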

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
---
 block/blk-cgroup.c     |   59 +++++++++++++++++++++++++++++++++++------------
 block/blk-cgroup.h     |   15 +++---------
 block/blk-throttle.c   |    4 +--
 block/cfq-iosched.c    |   25 +++++++++++---------
 include/linux/blkdev.h |    7 +++++-
 5 files changed, 69 insertions(+), 41 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index b123152..2d4d7d6 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -31,7 +31,7 @@ static LIST_HEAD(all_q_list);
 struct blkio_cgroup blkio_root_cgroup = { .cfq_weight = 2 * CFQ_WEIGHT_DEFAULT };
 EXPORT_SYMBOL_GPL(blkio_root_cgroup);
 
-static struct blkio_policy_type *blkio_policy[BLKIO_NR_POLICIES];
+static struct blkio_policy_type *blkio_policy[BLKCG_MAX_POLS];
 
 struct blkio_cgroup *cgroup_to_blkio_cgroup(struct cgroup *cgroup)
 {
@@ -67,7 +67,7 @@ static void blkg_free(struct blkio_group *blkg)
 	if (!blkg)
 		return;
 
-	for (i = 0; i < BLKIO_NR_POLICIES; i++) {
+	for (i = 0; i < BLKCG_MAX_POLS; i++) {
 		struct blkio_policy_type *pol = blkio_policy[i];
 		struct blkg_policy_data *pd = blkg->pd[i];
 
@@ -107,7 +107,7 @@ static struct blkio_group *blkg_alloc(struct blkio_cgroup *blkcg,
 	blkg->refcnt = 1;
 	cgroup_path(blkcg->css.cgroup, blkg->path, sizeof(blkg->path));
 
-	for (i = 0; i < BLKIO_NR_POLICIES; i++) {
+	for (i = 0; i < BLKCG_MAX_POLS; i++) {
 		struct blkio_policy_type *pol = blkio_policy[i];
 		struct blkg_policy_data *pd;
 
@@ -127,7 +127,7 @@ static struct blkio_group *blkg_alloc(struct blkio_cgroup *blkcg,
 	}
 
 	/* invoke per-policy init */
-	for (i = 0; i < BLKIO_NR_POLICIES; i++) {
+	for (i = 0; i < BLKCG_MAX_POLS; i++) {
 		struct blkio_policy_type *pol = blkio_policy[i];
 
 		if (pol)
@@ -320,7 +320,7 @@ blkiocg_reset_stats(struct cgroup *cgroup, struct cftype *cftype, u64 val)
 	 * anyway.  If you get hit by a race, retry.
 	 */
 	hlist_for_each_entry(blkg, n, &blkcg->blkg_list, blkcg_node) {
-		for (i = 0; i < BLKIO_NR_POLICIES; i++) {
+		for (i = 0; i < BLKCG_MAX_POLS; i++) {
 			struct blkio_policy_type *pol = blkio_policy[i];
 
 			if (pol && pol->ops.blkio_reset_group_stats_fn)
@@ -729,46 +729,75 @@ struct cgroup_subsys blkio_subsys = {
 };
 EXPORT_SYMBOL_GPL(blkio_subsys);
 
-void blkio_policy_register(struct blkio_policy_type *blkiop)
+/**
+ * blkio_policy_register - register a blkcg policy
+ * @blkiop: blkcg policy to register
+ *
+ * Register @blkiop with blkcg core.  Might sleep and @blkiop may be
+ * modified on successful registration.  Returns 0 on success and -errno on
+ * failure.
+ */
+int blkio_policy_register(struct blkio_policy_type *blkiop)
 {
 	struct request_queue *q;
+	int i, ret;
 
 	mutex_lock(&blkcg_pol_mutex);
 
-	blkcg_bypass_start();
+	/* find an empty slot */
+	ret = -ENOSPC;
+	for (i = 0; i < BLKCG_MAX_POLS; i++)
+		if (!blkio_policy[i])
+			break;
+	if (i >= BLKCG_MAX_POLS)
+		goto out_unlock;
 
-	BUG_ON(blkio_policy[blkiop->plid]);
-	blkio_policy[blkiop->plid] = blkiop;
+	/* register and update blkgs */
+	blkiop->plid = i;
+	blkio_policy[i] = blkiop;
+
+	blkcg_bypass_start();
 	list_for_each_entry(q, &all_q_list, all_q_node)
 		update_root_blkg_pd(q, blkiop);
-
 	blkcg_bypass_end();
 
+	/* everything is in place, add intf files for the new policy */
 	if (blkiop->cftypes)
 		WARN_ON(cgroup_add_cftypes(&blkio_subsys, blkiop->cftypes));
-
+	ret = 0;
+out_unlock:
 	mutex_unlock(&blkcg_pol_mutex);
+	return ret;
 }
 EXPORT_SYMBOL_GPL(blkio_policy_register);
 
+/**
+ * blkio_policy_unregister - unregister a blkcg policy
+ * @blkiop: blkcg policy to unregister
+ *
+ * Undo blkio_policy_register(@blkiop).  Might sleep.
+ */
 void blkio_policy_unregister(struct blkio_policy_type *blkiop)
 {
 	struct request_queue *q;
 
 	mutex_lock(&blkcg_pol_mutex);
 
+	if (WARN_ON(blkio_policy[blkiop->plid] != blkiop))
+		goto out_unlock;
+
+	/* kill the intf files first */
 	if (blkiop->cftypes)
 		cgroup_rm_cftypes(&blkio_subsys, blkiop->cftypes);
 
-	blkcg_bypass_start();
-
-	BUG_ON(blkio_policy[blkiop->plid] != blkiop);
+	/* unregister and update blkgs */
 	blkio_policy[blkiop->plid] = NULL;
 
+	blkcg_bypass_start();
 	list_for_each_entry(q, &all_q_list, all_q_node)
 		update_root_blkg_pd(q, blkiop);
 	blkcg_bypass_end();
-
+out_unlock:
 	mutex_unlock(&blkcg_pol_mutex);
 }
 EXPORT_SYMBOL_GPL(blkio_policy_unregister);
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index 49116d3..cacdcaea 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -17,13 +17,6 @@
 #include <linux/u64_stats_sync.h>
 #include <linux/seq_file.h>
 
-enum blkio_policy_id {
-	BLKIO_POLICY_PROP = 0,		/* Proportional Bandwidth division */
-	BLKIO_POLICY_THROTL,		/* Throttling */
-
-	BLKIO_NR_POLICIES,
-};
-
 /* Max limits for throttle policy */
 #define THROTL_IOPS_MAX		UINT_MAX
 
@@ -86,7 +79,7 @@ struct blkio_group {
 	/* reference count */
 	int refcnt;
 
-	struct blkg_policy_data *pd[BLKIO_NR_POLICIES];
+	struct blkg_policy_data *pd[BLKCG_MAX_POLS];
 
 	struct rcu_head rcu_head;
 };
@@ -103,7 +96,7 @@ struct blkio_policy_ops {
 
 struct blkio_policy_type {
 	struct blkio_policy_ops ops;
-	enum blkio_policy_id plid;
+	int plid;
 	size_t pdata_size;		/* policy specific private data size */
 	struct cftype *cftypes;		/* cgroup files for the policy */
 };
@@ -113,7 +106,7 @@ extern void blkcg_drain_queue(struct request_queue *q);
 extern void blkcg_exit_queue(struct request_queue *q);
 
 /* Blkio controller policy registration */
-extern void blkio_policy_register(struct blkio_policy_type *);
+extern int blkio_policy_register(struct blkio_policy_type *);
 extern void blkio_policy_unregister(struct blkio_policy_type *);
 extern void blkg_destroy_all(struct request_queue *q, bool destroy_root);
 extern void update_root_blkg_pd(struct request_queue *q,
@@ -329,7 +322,7 @@ struct blkio_policy_type {
 static inline int blkcg_init_queue(struct request_queue *q) { return 0; }
 static inline void blkcg_drain_queue(struct request_queue *q) { }
 static inline void blkcg_exit_queue(struct request_queue *q) { }
-static inline void blkio_policy_register(struct blkio_policy_type *blkiop) { }
+static inline int blkio_policy_register(struct blkio_policy_type *blkiop) { return 0; }
 static inline void blkio_policy_unregister(struct blkio_policy_type *blkiop) { }
 static inline void blkg_destroy_all(struct request_queue *q,
 				    bool destory_root) { }
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 07c17c2..0dc4645 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -1089,7 +1089,6 @@ static struct blkio_policy_type blkio_policy_throtl = {
 		.blkio_exit_group_fn = throtl_exit_blkio_group,
 		.blkio_reset_group_stats_fn = throtl_reset_group_stats,
 	},
-	.plid = BLKIO_POLICY_THROTL,
 	.pdata_size = sizeof(struct throtl_grp),
 	.cftypes = throtl_files,
 };
@@ -1271,8 +1270,7 @@ static int __init throtl_init(void)
 	if (!kthrotld_workqueue)
 		panic("Failed to create kthrotld\n");
 
-	blkio_policy_register(&blkio_policy_throtl);
-	return 0;
+	return blkio_policy_register(&blkio_policy_throtl);
 }
 
 module_init(throtl_init);
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 7980173..a0fbc51 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -4158,7 +4158,6 @@ static struct blkio_policy_type blkio_policy_cfq = {
 		.blkio_init_group_fn =		cfq_init_blkio_group,
 		.blkio_reset_group_stats_fn =	cfqg_stats_reset,
 	},
-	.plid = BLKIO_POLICY_PROP,
 	.pdata_size = sizeof(struct cfq_group),
 	.cftypes = cfq_blkcg_files,
 };
@@ -4182,27 +4181,31 @@ static int __init cfq_init(void)
 #else
 		cfq_group_idle = 0;
 #endif
+
+	ret = blkio_policy_register(&blkio_policy_cfq);
+	if (ret)
+		return ret;
+
 	cfq_pool = KMEM_CACHE(cfq_queue, 0);
 	if (!cfq_pool)
-		return -ENOMEM;
+		goto err_pol_unreg;
 
 	ret = elv_register(&iosched_cfq);
-	if (ret) {
-		kmem_cache_destroy(cfq_pool);
-		return ret;
-	}
+	if (ret)
+		goto err_free_pool;
 
-#ifdef CONFIG_CFQ_GROUP_IOSCHED
-	blkio_policy_register(&blkio_policy_cfq);
-#endif
 	return 0;
+
+err_free_pool:
+	kmem_cache_destroy(cfq_pool);
+err_pol_unreg:
+	blkio_policy_unregister(&blkio_policy_cfq);
+	return ret;
 }
 
 static void __exit cfq_exit(void)
 {
-#ifdef CONFIG_CFQ_GROUP_IOSCHED
 	blkio_policy_unregister(&blkio_policy_cfq);
-#endif
 	elv_unregister(&iosched_cfq);
 	kmem_cache_destroy(cfq_pool);
 }
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 33f1b29..d2c69f8 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -35,6 +35,12 @@ struct bsg_job;
 #define BLKDEV_MIN_RQ	4
 #define BLKDEV_MAX_RQ	128	/* Default maximum */
 
+/*
+ * Maximum number of blkcg policies allowed to be registered concurrently.
+ * Defined here to simplify include dependency.
+ */
+#define BLKCG_MAX_POLS		2
+
 struct request;
 typedef void (rq_end_io_fn)(struct request *, int);
 
@@ -363,7 +369,6 @@ struct request_queue {
 
 	struct list_head	icq_list;
 #ifdef CONFIG_BLK_CGROUP
-	/* XXX: array size hardcoded to avoid include dependency (temporary) */
 	struct list_head	blkg_list;
 #endif
 
-- 
1.7.7.3


* [PATCH 4/8] blkcg: make blkg_conf_prep() take @pol and return with queue lock held
From: Tejun Heo @ 2012-04-12 23:29 UTC (permalink / raw)
  To: axboe; +Cc: vgoyal, ctalbott, rni, linux-kernel, cgroups, containers, Tejun Heo

Add @pol to blkg_conf_prep() and let it return with queue lock held
(to be released by blkg_conf_finish()).  Note that @pol isn't used
yet.

This is to prepare for per-queue policy activation and doesn't cause
any visible difference.
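
For reference, the calling convention this establishes (a sketch
modeled on the tg_set_conf()/cfqg_set_weight_device() callers touched
below; my_set_conf and blkio_policy_foo are placeholders):

  static int my_set_conf(struct cgroup *cgrp, struct cftype *cft,
                         const char *buf)
  {
          struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp);
          struct blkg_conf_ctx ctx;
          int ret;

          ret = blkg_conf_prep(blkcg, &blkio_policy_foo, buf, &ctx);
          if (ret)
                  return ret;

          /* RCU read lock and ctx.blkg's queue lock are held here */
          /* ... apply ctx.v to ctx.blkg's policy data ... */

          blkg_conf_finish(&ctx);         /* drops both locks */
          return 0;
  }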

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
---
 block/blk-cgroup.c   |   17 ++++++++++-------
 block/blk-cgroup.h   |    3 ++-
 block/blk-throttle.c |    2 +-
 block/cfq-iosched.c  |    2 +-
 4 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 2d4d7d6..f6581a0 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -464,17 +464,19 @@ EXPORT_SYMBOL_GPL(blkg_prfill_rwstat);
 /**
  * blkg_conf_prep - parse and prepare for per-blkg config update
  * @blkcg: target block cgroup
+ * @pol: target policy
  * @input: input string
  * @ctx: blkg_conf_ctx to be filled
  *
  * Parse per-blkg config update from @input and initialize @ctx with the
  * result.  @ctx->blkg points to the blkg to be updated and @ctx->v the new
- * value.  This function returns with RCU read locked and must be paired
- * with blkg_conf_finish().
+ * value.  This function returns with RCU read lock and queue lock held and
+ * must be paired with blkg_conf_finish().
  */
-int blkg_conf_prep(struct blkio_cgroup *blkcg, const char *input,
+int blkg_conf_prep(struct blkio_cgroup *blkcg,
+		   const struct blkio_policy_type *pol, const char *input,
 		   struct blkg_conf_ctx *ctx)
-	__acquires(rcu)
+	__acquires(rcu) __acquires(disk->queue->queue_lock)
 {
 	struct gendisk *disk;
 	struct blkio_group *blkg;
@@ -490,14 +492,14 @@ int blkg_conf_prep(struct blkio_cgroup *blkcg, const char *input,
 		return -EINVAL;
 
 	rcu_read_lock();
-
 	spin_lock_irq(disk->queue->queue_lock);
+
 	blkg = blkg_lookup_create(blkcg, disk->queue, false);
-	spin_unlock_irq(disk->queue->queue_lock);
 
 	if (IS_ERR(blkg)) {
 		ret = PTR_ERR(blkg);
 		rcu_read_unlock();
+		spin_unlock_irq(disk->queue->queue_lock);
 		put_disk(disk);
 		/*
 		 * If queue was bypassing, we should retry.  Do so after a
@@ -527,8 +529,9 @@ EXPORT_SYMBOL_GPL(blkg_conf_prep);
  * with blkg_conf_prep().
  */
 void blkg_conf_finish(struct blkg_conf_ctx *ctx)
-	__releases(rcu)
+	__releases(ctx->disk->queue->queue_lock) __releases(rcu)
 {
+	spin_unlock_irq(ctx->disk->queue->queue_lock);
 	rcu_read_unlock();
 	put_disk(ctx->disk);
 }
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index cacdcaea..44dd11d 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -128,7 +128,8 @@ struct blkg_conf_ctx {
 	u64			v;
 };
 
-int blkg_conf_prep(struct blkio_cgroup *blkcg, const char *input,
+int blkg_conf_prep(struct blkio_cgroup *blkcg,
+		   const struct blkio_policy_type *pol, const char *input,
 		   struct blkg_conf_ctx *ctx);
 void blkg_conf_finish(struct blkg_conf_ctx *ctx);
 
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 0dc4645..6f1bfdf 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -993,7 +993,7 @@ static int tg_set_conf(struct cgroup *cgrp, struct cftype *cft, const char *buf,
 	struct throtl_grp *tg;
 	int ret;
 
-	ret = blkg_conf_prep(blkcg, buf, &ctx);
+	ret = blkg_conf_prep(blkcg, &blkio_policy_throtl, buf, &ctx);
 	if (ret)
 		return ret;
 
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index a0fbc51..761a376 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1401,7 +1401,7 @@ static int cfqg_set_weight_device(struct cgroup *cgrp, struct cftype *cft,
 	struct cfq_group *cfqg;
 	int ret;
 
-	ret = blkg_conf_prep(blkcg, buf, &ctx);
+	ret = blkg_conf_prep(blkcg, &blkio_policy_cfq, buf, &ctx);
 	if (ret)
 		return ret;
 
-- 
1.7.7.3


* [PATCH 5/8] blkcg: make sure blkg_lookup() returns %NULL if @q is bypassing
From: Tejun Heo @ 2012-04-12 23:29 UTC (permalink / raw)
  To: axboe; +Cc: vgoyal, ctalbott, rni, linux-kernel, cgroups, containers, Tejun Heo

Currently, blkg_lookup() doesn't check @q bypass state.  This patch
updates blk_queue_bypass_start() to do synchronize_rcu() before
returning and updates blkg_lookup() to check blk_queue_bypass() and
return %NULL if bypassing.  This ensures blkg_lookup() returns %NULL
if @q is bypassing.

This is to guarantee that nobody is accessing policy data while @q is
bypassing, which is necessary to allow replacing blkio_group->pd[] in
place on policy [de]activation.
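
The resulting synchronization, in sketch form (both halves are in the
diff below):

  /* writer side: quiesce @q before touching its policy data */
  blk_queue_bypass_start(q);      /* drain + synchronize_rcu() */
  /* new lookups now fail and pre-existing RCU readers have finished */

  /* reader side: always under RCU read lock */
  rcu_read_lock();
  blkg = blkg_lookup(blkcg, q);   /* %NULL if @q is bypassing */
  if (blkg) {
          /* blkg and its policy data are safe to use here */
  }
  rcu_read_unlock();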

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
---
 block/blk-cgroup.c |   50 +++++++++++++++++++++++++++++++++-----------------
 block/blk-core.c   |    4 +++-
 2 files changed, 36 insertions(+), 18 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index f6581a0..fa19412 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -137,6 +137,38 @@ static struct blkio_group *blkg_alloc(struct blkio_cgroup *blkcg,
 	return blkg;
 }
 
+static struct blkio_group *__blkg_lookup(struct blkio_cgroup *blkcg,
+					 struct request_queue *q)
+{
+	struct blkio_group *blkg;
+	struct hlist_node *n;
+
+	hlist_for_each_entry_rcu(blkg, n, &blkcg->blkg_list, blkcg_node)
+		if (blkg->q == q)
+			return blkg;
+	return NULL;
+}
+
+/**
+ * blkg_lookup - lookup blkg for the specified blkcg - q pair
+ * @blkcg: blkcg of interest
+ * @q: request_queue of interest
+ *
+ * Lookup blkg for the @blkcg - @q pair.  This function should be called
+ * under RCU read lock and is guaranteed to return %NULL if @q is
+ * bypassing.
+ */
+struct blkio_group *blkg_lookup(struct blkio_cgroup *blkcg,
+				struct request_queue *q)
+{
+	WARN_ON_ONCE(!rcu_read_lock_held());
+
+	if (unlikely(blk_queue_bypass(q)))
+		return NULL;
+	return __blkg_lookup(blkcg, q);
+}
+EXPORT_SYMBOL_GPL(blkg_lookup);
+
 struct blkio_group *blkg_lookup_create(struct blkio_cgroup *blkcg,
 				       struct request_queue *q,
 				       bool for_root)
@@ -150,13 +182,11 @@ struct blkio_group *blkg_lookup_create(struct blkio_cgroup *blkcg,
 	/*
 	 * This could be the first entry point of blkcg implementation and
 	 * we shouldn't allow anything to go through for a bypassing queue.
-	 * The following can be removed if blkg lookup is guaranteed to
-	 * fail on a bypassing queue.
 	 */
 	if (unlikely(blk_queue_bypass(q)) && !for_root)
 		return ERR_PTR(blk_queue_dead(q) ? -EINVAL : -EBUSY);
 
-	blkg = blkg_lookup(blkcg, q);
+	blkg = __blkg_lookup(blkcg, q);
 	if (blkg)
 		return blkg;
 
@@ -185,20 +215,6 @@ out:
 }
 EXPORT_SYMBOL_GPL(blkg_lookup_create);
 
-/* called under rcu_read_lock(). */
-struct blkio_group *blkg_lookup(struct blkio_cgroup *blkcg,
-				struct request_queue *q)
-{
-	struct blkio_group *blkg;
-	struct hlist_node *n;
-
-	hlist_for_each_entry_rcu(blkg, n, &blkcg->blkg_list, blkcg_node)
-		if (blkg->q == q)
-			return blkg;
-	return NULL;
-}
-EXPORT_SYMBOL_GPL(blkg_lookup);
-
 static void blkg_destroy(struct blkio_group *blkg)
 {
 	struct request_queue *q = blkg->q;
diff --git a/block/blk-core.c b/block/blk-core.c
index 991c1d6..8b3d073 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -416,7 +416,8 @@ void blk_drain_queue(struct request_queue *q, bool drain_all)
  * In bypass mode, only the dispatch FIFO queue of @q is used.  This
  * function makes @q enter bypass mode and drains all requests which were
  * throttled or issued before.  On return, it's guaranteed that no request
- * is being throttled or has ELVPRIV set.
+ * is being throttled or has ELVPRIV set and blk_queue_bypass() is %true
+ * inside queue or RCU read lock.
  */
 void blk_queue_bypass_start(struct request_queue *q)
 {
@@ -426,6 +427,7 @@ void blk_queue_bypass_start(struct request_queue *q)
 	spin_unlock_irq(q->queue_lock);
 
 	blk_drain_queue(q, false);
+	synchronize_rcu();
 }
 EXPORT_SYMBOL_GPL(blk_queue_bypass_start);
 
-- 
1.7.7.3


* [PATCH 6/8] blkcg: add request_queue->root_blkg
From: Tejun Heo @ 2012-04-12 23:29 UTC (permalink / raw)
  To: axboe; +Cc: vgoyal, ctalbott, rni, linux-kernel, cgroups, containers, Tejun Heo

With per-queue policy activation, root blkg creation will be moved to
blkcg core.  Add q->root_blkg in preparation.  For blk-throtl, this
replaces throtl_data->root_tg; however, cfq needs to keep
cfqd->root_group for !CONFIG_CFQ_GROUP_IOSCHED.

This is to prepare for per-queue policy activation and doesn't cause
any functional difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
---
 block/blk-throttle.c   |   16 ++++++++++------
 block/cfq-iosched.c    |    4 +++-
 include/linux/blkdev.h |    2 ++
 3 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 6f1bfdf..8c520fa 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -97,7 +97,6 @@ struct throtl_data
 	/* service tree for active throtl groups */
 	struct throtl_rb_root tg_service_tree;
 
-	struct throtl_grp *root_tg;
 	struct request_queue *queue;
 
 	/* Total Number of queued bios on READ and WRITE lists */
@@ -131,6 +130,11 @@ static inline struct blkio_group *tg_to_blkg(struct throtl_grp *tg)
 	return pdata_to_blkg(tg);
 }
 
+static inline struct throtl_grp *td_root_tg(struct throtl_data *td)
+{
+	return blkg_to_tg(td->queue->root_blkg);
+}
+
 enum tg_state_flags {
 	THROTL_TG_FLAG_on_rr = 0,	/* on round-robin busy list */
 };
@@ -261,7 +265,7 @@ throtl_grp *throtl_lookup_tg(struct throtl_data *td, struct blkio_cgroup *blkcg)
 	 * Avoid lookup in this case
 	 */
 	if (blkcg == &blkio_root_cgroup)
-		return td->root_tg;
+		return td_root_tg(td);
 
 	return blkg_to_tg(blkg_lookup(blkcg, td->queue));
 }
@@ -277,7 +281,7 @@ static struct throtl_grp *throtl_lookup_create_tg(struct throtl_data *td,
 	 * Avoid lookup in this case
 	 */
 	if (blkcg == &blkio_root_cgroup) {
-		tg = td->root_tg;
+		tg = td_root_tg(td);
 	} else {
 		struct blkio_group *blkg;
 
@@ -287,7 +291,7 @@ static struct throtl_grp *throtl_lookup_create_tg(struct throtl_data *td,
 		if (!IS_ERR(blkg))
 			tg = blkg_to_tg(blkg);
 		else if (!blk_queue_dead(q))
-			tg = td->root_tg;
+			tg = td_root_tg(td);
 	}
 
 	return tg;
@@ -1245,12 +1249,12 @@ int blk_throtl_init(struct request_queue *q)
 
 	blkg = blkg_lookup_create(&blkio_root_cgroup, q, true);
 	if (!IS_ERR(blkg))
-		td->root_tg = blkg_to_tg(blkg);
+		q->root_blkg = blkg;
 
 	spin_unlock_irq(q->queue_lock);
 	rcu_read_unlock();
 
-	if (!td->root_tg) {
+	if (!q->root_blkg) {
 		kfree(td);
 		return -ENOMEM;
 	}
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 761a376..b70425f 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -3965,8 +3965,10 @@ static int cfq_init_queue(struct request_queue *q)
 	spin_lock_irq(q->queue_lock);
 
 	blkg = blkg_lookup_create(&blkio_root_cgroup, q, true);
-	if (!IS_ERR(blkg))
+	if (!IS_ERR(blkg)) {
+		q->root_blkg = blkg;
 		cfqd->root_group = blkg_to_cfqg(blkg);
+	}
 
 	spin_unlock_irq(q->queue_lock);
 	rcu_read_unlock();
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index d2c69f8..b01c377 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -31,6 +31,7 @@ struct blk_trace;
 struct request;
 struct sg_io_hdr;
 struct bsg_job;
+struct blkio_group;
 
 #define BLKDEV_MIN_RQ	4
 #define BLKDEV_MAX_RQ	128	/* Default maximum */
@@ -369,6 +370,7 @@ struct request_queue {
 
 	struct list_head	icq_list;
 #ifdef CONFIG_BLK_CGROUP
+	struct blkio_group	*root_blkg;
 	struct list_head	blkg_list;
 #endif
 
-- 
1.7.7.3


* [PATCH 7/8] blkcg: implement per-queue policy activation
From: Tejun Heo @ 2012-04-12 23:29 UTC (permalink / raw)
  To: axboe; +Cc: vgoyal, ctalbott, rni, linux-kernel, cgroups, containers, Tejun Heo

All blkcg policies were assumed to be enabled on all request_queues.
Due to various implementation obstacles, during the recent blkcg core
updates, this was temporarily implemented as shooting down all !root
blkgs on elevator switch and policy [de]registration combined with
half-broken in-place root blkg updates.  In addition to being buggy
and racy, this meant losing all blkcg configurations across those
events.

Now that blkcg is cleaned up enough, this patch replaces the temporary
implementation with proper per-queue policy activation.  Each blkcg
policy should call the new blkcg_[de]activate_policy() to enable and
disable the policy on a specific queue.  blkcg_activate_policy()
allocates and installs policy data for the policy for all existing
blkgs.  blkcg_deactivate_policy() does the reverse.  If a policy is
not enabled for a given queue, blkg printing / config functions skip
the respective blkg for the queue.

blkcg_activate_policy() also takes care of root blkg creation, and
cfq_init_queue() and blk_throtl_init() are updated accordingly.

This makes blkcg_bypass_{start|end}() and update_root_blkg_pd()
unnecessary.  Both are dropped.
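
Condensed, the new blkcg_activate_policy() boils down to the following
control flow (a sketch only; counting, allocation bookkeeping, RCU and
error handling are in the full function below):

  blk_queue_bypass_start(q);              /* quiesce all blkg users */

  spin_lock_irq(q->queue_lock);
  /* make sure the root blkg exists and count the existing blkgs */
  q->root_blkg = blkg_lookup_create(&blkio_root_cgroup, q, true);
  spin_unlock_irq(q->queue_lock);

  /* allocate one blkg_policy_data per blkg with GFP_KERNEL, which is
   * why the queue lock had to be dropped */

  spin_lock_irq(q->queue_lock);
  list_for_each_entry(blkg, &q->blkg_list, q_node) {
          spin_lock(&blkg->blkcg->lock);  /* pd changes take both locks */
          blkg->pd[pol->plid] = pd;       /* pd = next preallocated one */
          pd->blkg = blkg;
          pol->ops.blkio_init_group_fn(blkg);
          spin_unlock(&blkg->blkcg->lock);
  }
  __set_bit(pol->plid, q->blkcg_pols);    /* policy now enabled on @q */
  spin_unlock_irq(q->queue_lock);

  blk_queue_bypass_end(q);

blkcg_deactivate_policy() is the mirror image: clear the bit first,
then free each blkg's policy data under the same pair of locks.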

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
---
 block/blk-cgroup.c     |  228 ++++++++++++++++++++++++++++++++----------------
 block/blk-cgroup.h     |   15 +++-
 block/blk-throttle.c   |   52 ++++-------
 block/cfq-iosched.c    |   36 +++-----
 block/elevator.c       |    2 -
 include/linux/blkdev.h |    1 +
 6 files changed, 200 insertions(+), 134 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index fa19412..a92eaf9 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -54,6 +54,17 @@ struct blkio_cgroup *bio_blkio_cgroup(struct bio *bio)
 }
 EXPORT_SYMBOL_GPL(bio_blkio_cgroup);
 
+static bool blkcg_policy_enabled(struct request_queue *q,
+				 const struct blkio_policy_type *pol)
+{
+	return pol && test_bit(pol->plid, q->blkcg_pols);
+}
+
+static size_t blkg_pd_size(const struct blkio_policy_type *pol)
+{
+	return sizeof(struct blkg_policy_data) + pol->pdata_size;
+}
+
 /**
  * blkg_free - free a blkg
  * @blkg: blkg to free
@@ -111,12 +122,11 @@ static struct blkio_group *blkg_alloc(struct blkio_cgroup *blkcg,
 		struct blkio_policy_type *pol = blkio_policy[i];
 		struct blkg_policy_data *pd;
 
-		if (!pol)
+		if (!blkcg_policy_enabled(q, pol))
 			continue;
 
 		/* alloc per-policy data and attach it to blkg */
-		pd = kzalloc_node(sizeof(*pd) + pol->pdata_size, GFP_ATOMIC,
-				  q->node);
+		pd = kzalloc_node(blkg_pd_size(pol), GFP_ATOMIC, q->node);
 		if (!pd) {
 			blkg_free(blkg);
 			return NULL;
@@ -130,7 +140,7 @@ static struct blkio_group *blkg_alloc(struct blkio_cgroup *blkcg,
 	for (i = 0; i < BLKCG_MAX_POLS; i++) {
 		struct blkio_policy_type *pol = blkio_policy[i];
 
-		if (pol)
+		if (blkcg_policy_enabled(blkg->q, pol))
 			pol->ops.blkio_init_group_fn(blkg);
 	}
 
@@ -236,36 +246,6 @@ static void blkg_destroy(struct blkio_group *blkg)
 	blkg_put(blkg);
 }
 
-/*
- * XXX: This updates blkg policy data in-place for root blkg, which is
- * necessary across elevator switch and policy registration as root blkgs
- * aren't shot down.  This broken and racy implementation is temporary.
- * Eventually, blkg shoot down will be replaced by proper in-place update.
- */
-void update_root_blkg_pd(struct request_queue *q,
-			 const struct blkio_policy_type *pol)
-{
-	struct blkio_group *blkg = blkg_lookup(&blkio_root_cgroup, q);
-	struct blkg_policy_data *pd;
-
-	if (!blkg)
-		return;
-
-	kfree(blkg->pd[pol->plid]);
-	blkg->pd[pol->plid] = NULL;
-
-	if (!pol)
-		return;
-
-	pd = kzalloc(sizeof(*pd) + pol->pdata_size, GFP_KERNEL);
-	WARN_ON_ONCE(!pd);
-
-	blkg->pd[pol->plid] = pd;
-	pd->blkg = blkg;
-	pol->ops.blkio_init_group_fn(blkg);
-}
-EXPORT_SYMBOL_GPL(update_root_blkg_pd);
-
 /**
  * blkg_destroy_all - destroy all blkgs associated with a request_queue
  * @q: request_queue of interest
@@ -339,7 +319,8 @@ blkiocg_reset_stats(struct cgroup *cgroup, struct cftype *cftype, u64 val)
 		for (i = 0; i < BLKCG_MAX_POLS; i++) {
 			struct blkio_policy_type *pol = blkio_policy[i];
 
-			if (pol && pol->ops.blkio_reset_group_stats_fn)
+			if (blkcg_policy_enabled(blkg->q, pol) &&
+			    pol->ops.blkio_reset_group_stats_fn)
 				pol->ops.blkio_reset_group_stats_fn(blkg);
 		}
 	}
@@ -385,7 +366,7 @@ void blkcg_print_blkgs(struct seq_file *sf, struct blkio_cgroup *blkcg,
 
 	spin_lock_irq(&blkcg->lock);
 	hlist_for_each_entry(blkg, n, &blkcg->blkg_list, blkcg_node)
-		if (blkg->pd[pol->plid])
+		if (blkcg_policy_enabled(blkg->q, pol))
 			total += prfill(sf, blkg->pd[pol->plid]->pdata, data);
 	spin_unlock_irq(&blkcg->lock);
 
@@ -510,7 +491,10 @@ int blkg_conf_prep(struct blkio_cgroup *blkcg,
 	rcu_read_lock();
 	spin_lock_irq(disk->queue->queue_lock);
 
-	blkg = blkg_lookup_create(blkcg, disk->queue, false);
+	if (blkcg_policy_enabled(disk->queue, pol))
+		blkg = blkg_lookup_create(blkcg, disk->queue, false);
+	else
+		blkg = ERR_PTR(-EINVAL);
 
 	if (IS_ERR(blkg)) {
 		ret = PTR_ERR(blkg);
@@ -712,30 +696,6 @@ static int blkiocg_can_attach(struct cgroup *cgrp, struct cgroup_taskset *tset)
 	return ret;
 }
 
-static void blkcg_bypass_start(void)
-	__acquires(&all_q_mutex)
-{
-	struct request_queue *q;
-
-	mutex_lock(&all_q_mutex);
-
-	list_for_each_entry(q, &all_q_list, all_q_node) {
-		blk_queue_bypass_start(q);
-		blkg_destroy_all(q, false);
-	}
-}
-
-static void blkcg_bypass_end(void)
-	__releases(&all_q_mutex)
-{
-	struct request_queue *q;
-
-	list_for_each_entry(q, &all_q_list, all_q_node)
-		blk_queue_bypass_end(q);
-
-	mutex_unlock(&all_q_mutex);
-}
-
 struct cgroup_subsys blkio_subsys = {
 	.name = "blkio",
 	.create = blkiocg_create,
@@ -749,6 +709,139 @@ struct cgroup_subsys blkio_subsys = {
 EXPORT_SYMBOL_GPL(blkio_subsys);
 
 /**
+ * blkcg_activate_policy - activate a blkcg policy on a request_queue
+ * @q: request_queue of interest
+ * @pol: blkcg policy to activate
+ *
+ * Activate @pol on @q.  Requires %GFP_KERNEL context.  @q goes through
+ * bypass mode to populate its blkgs with policy_data for @pol.
+ *
+ * Activation happens with @q bypassed, so nobody would be accessing blkgs
+ * from IO path.  Update of each blkg is protected by both queue and blkcg
+ * locks so that holding either lock and testing blkcg_policy_enabled() is
+ * always enough for dereferencing policy data.
+ *
+ * The caller is responsible for synchronizing [de]activations and policy
+ * [un]registerations.  Returns 0 on success, -errno on failure.
+ */
+int blkcg_activate_policy(struct request_queue *q,
+			  const struct blkio_policy_type *pol)
+{
+	LIST_HEAD(pds);
+	struct blkio_group *blkg;
+	struct blkg_policy_data *pd, *n;
+	int cnt = 0, ret;
+
+	if (blkcg_policy_enabled(q, pol))
+		return 0;
+
+	blk_queue_bypass_start(q);
+
+	/* make sure the root blkg exists and count the existing blkgs */
+	spin_lock_irq(q->queue_lock);
+
+	rcu_read_lock();
+	blkg = blkg_lookup_create(&blkio_root_cgroup, q, true);
+	rcu_read_unlock();
+
+	if (IS_ERR(blkg)) {
+		ret = PTR_ERR(blkg);
+		goto out_unlock;
+	}
+	q->root_blkg = blkg;
+
+	list_for_each_entry(blkg, &q->blkg_list, q_node)
+		cnt++;
+
+	spin_unlock_irq(q->queue_lock);
+
+	/* allocate policy_data for all existing blkgs */
+	while (cnt--) {
+		pd = kzalloc_node(blkg_pd_size(pol), GFP_KERNEL, q->node);
+		if (!pd) {
+			ret = -ENOMEM;
+			goto out_free;
+		}
+		list_add_tail(&pd->alloc_node, &pds);
+	}
+
+	/*
+	 * Install the allocated pds.  With @q bypassing, no new blkg
+	 * should have been created while the queue lock was dropped.
+	 */
+	spin_lock_irq(q->queue_lock);
+
+	list_for_each_entry(blkg, &q->blkg_list, q_node) {
+		if (WARN_ON(list_empty(&pds))) {
+			/* umm... this shouldn't happen, just abort */
+			ret = -ENOMEM;
+			goto out_unlock;
+		}
+		pd = list_first_entry(&pds, struct blkg_policy_data, alloc_node);
+		list_del_init(&pd->alloc_node);
+
+		/* grab blkcg lock too while installing @pd on @blkg */
+		spin_lock(&blkg->blkcg->lock);
+
+		blkg->pd[pol->plid] = pd;
+		pd->blkg = blkg;
+		pol->ops.blkio_init_group_fn(blkg);
+
+		spin_unlock(&blkg->blkcg->lock);
+	}
+
+	__set_bit(pol->plid, q->blkcg_pols);
+	ret = 0;
+out_unlock:
+	spin_unlock_irq(q->queue_lock);
+out_free:
+	blk_queue_bypass_end(q);
+	list_for_each_entry_safe(pd, n, &pds, alloc_node)
+		kfree(pd);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(blkcg_activate_policy);
+
+/**
+ * blkcg_deactivate_policy - deactivate a blkcg policy on a request_queue
+ * @q: request_queue of interest
+ * @pol: blkcg policy to deactivate
+ *
+ * Deactivate @pol on @q.  Follows the same synchronization rules as
+ * blkcg_activate_policy().
+ */
+void blkcg_deactivate_policy(struct request_queue *q,
+			     const struct blkio_policy_type *pol)
+{
+	struct blkio_group *blkg;
+
+	if (!blkcg_policy_enabled(q, pol))
+		return;
+
+	blk_queue_bypass_start(q);
+	spin_lock_irq(q->queue_lock);
+
+	__clear_bit(pol->plid, q->blkcg_pols);
+
+	list_for_each_entry(blkg, &q->blkg_list, q_node) {
+		/* grab blkcg lock too while removing @pd from @blkg */
+		spin_lock(&blkg->blkcg->lock);
+
+		if (pol->ops.blkio_exit_group_fn)
+			pol->ops.blkio_exit_group_fn(blkg);
+
+		kfree(blkg->pd[pol->plid]);
+		blkg->pd[pol->plid] = NULL;
+
+		spin_unlock(&blkg->blkcg->lock);
+	}
+
+	spin_unlock_irq(q->queue_lock);
+	blk_queue_bypass_end(q);
+}
+EXPORT_SYMBOL_GPL(blkcg_deactivate_policy);
+
+/**
  * blkio_policy_register - register a blkcg policy
  * @blkiop: blkcg policy to register
  *
@@ -758,7 +851,6 @@ EXPORT_SYMBOL_GPL(blkio_subsys);
  */
 int blkio_policy_register(struct blkio_policy_type *blkiop)
 {
-	struct request_queue *q;
 	int i, ret;
 
 	mutex_lock(&blkcg_pol_mutex);
@@ -775,11 +867,6 @@ int blkio_policy_register(struct blkio_policy_type *blkiop)
 	blkiop->plid = i;
 	blkio_policy[i] = blkiop;
 
-	blkcg_bypass_start();
-	list_for_each_entry(q, &all_q_list, all_q_node)
-		update_root_blkg_pd(q, blkiop);
-	blkcg_bypass_end();
-
 	/* everything is in place, add intf files for the new policy */
 	if (blkiop->cftypes)
 		WARN_ON(cgroup_add_cftypes(&blkio_subsys, blkiop->cftypes));
@@ -798,8 +885,6 @@ EXPORT_SYMBOL_GPL(blkio_policy_register);
  */
 void blkio_policy_unregister(struct blkio_policy_type *blkiop)
 {
-	struct request_queue *q;
-
 	mutex_lock(&blkcg_pol_mutex);
 
 	if (WARN_ON(blkio_policy[blkiop->plid] != blkiop))
@@ -811,11 +896,6 @@ void blkio_policy_unregister(struct blkio_policy_type *blkiop)
 
 	/* unregister and update blkgs */
 	blkio_policy[blkiop->plid] = NULL;
-
-	blkcg_bypass_start();
-	list_for_each_entry(q, &all_q_list, all_q_node)
-		update_root_blkg_pd(q, blkiop);
-	blkcg_bypass_end();
 out_unlock:
 	mutex_unlock(&blkcg_pol_mutex);
 }
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index 44dd11d..519dc6b 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -64,6 +64,9 @@ struct blkg_policy_data {
 	/* the blkg this per-policy data belongs to */
 	struct blkio_group *blkg;
 
+	/* used during policy activation */
+	struct list_head alloc_node;
+
 	/* pol->pdata_size bytes of private data used by policy impl */
 	char pdata[] __aligned(__alignof__(unsigned long long));
 };
@@ -108,9 +111,11 @@ extern void blkcg_exit_queue(struct request_queue *q);
 /* Blkio controller policy registration */
 extern int blkio_policy_register(struct blkio_policy_type *);
 extern void blkio_policy_unregister(struct blkio_policy_type *);
+extern int blkcg_activate_policy(struct request_queue *q,
+				 const struct blkio_policy_type *pol);
+extern void blkcg_deactivate_policy(struct request_queue *q,
+				    const struct blkio_policy_type *pol);
 extern void blkg_destroy_all(struct request_queue *q, bool destroy_root);
-extern void update_root_blkg_pd(struct request_queue *q,
-				const struct blkio_policy_type *pol);
 
 void blkcg_print_blkgs(struct seq_file *sf, struct blkio_cgroup *blkcg,
 		       u64 (*prfill)(struct seq_file *, void *, int),
@@ -325,10 +330,12 @@ static inline void blkcg_drain_queue(struct request_queue *q) { }
 static inline void blkcg_exit_queue(struct request_queue *q) { }
 static inline int blkio_policy_register(struct blkio_policy_type *blkiop) { return 0; }
 static inline void blkio_policy_unregister(struct blkio_policy_type *blkiop) { }
+static inline int blkcg_activate_policy(struct request_queue *q,
+					const struct blkio_policy_type *pol) { return 0; }
+static inline void blkcg_deactivate_policy(struct request_queue *q,
+					   const struct blkio_policy_type *pol) { }
 static inline void blkg_destroy_all(struct request_queue *q,
 				    bool destory_root) { }
-static inline void update_root_blkg_pd(struct request_queue *q,
-				       const struct blkio_policy_type *pol) { }
 
 static inline void *blkg_to_pdata(struct blkio_group *blkg,
 				struct blkio_policy_type *pol) { return NULL; }
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 8c520fa..2fc964e 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -995,35 +995,31 @@ static int tg_set_conf(struct cgroup *cgrp, struct cftype *cft, const char *buf,
 	struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp);
 	struct blkg_conf_ctx ctx;
 	struct throtl_grp *tg;
+	struct throtl_data *td;
 	int ret;
 
 	ret = blkg_conf_prep(blkcg, &blkio_policy_throtl, buf, &ctx);
 	if (ret)
 		return ret;
 
-	ret = -EINVAL;
 	tg = blkg_to_tg(ctx.blkg);
-	if (tg) {
-		struct throtl_data *td = ctx.blkg->q->td;
-
-		if (!ctx.v)
-			ctx.v = -1;
+	td = ctx.blkg->q->td;
 
-		if (is_u64)
-			*(u64 *)((void *)tg + cft->private) = ctx.v;
-		else
-			*(unsigned int *)((void *)tg + cft->private) = ctx.v;
+	if (!ctx.v)
+		ctx.v = -1;
 
-		/* XXX: we don't need the following deferred processing */
-		xchg(&tg->limits_changed, true);
-		xchg(&td->limits_changed, true);
-		throtl_schedule_delayed_work(td, 0);
+	if (is_u64)
+		*(u64 *)((void *)tg + cft->private) = ctx.v;
+	else
+		*(unsigned int *)((void *)tg + cft->private) = ctx.v;
 
-		ret = 0;
-	}
+	/* XXX: we don't need the following deferred processing */
+	xchg(&tg->limits_changed, true);
+	xchg(&td->limits_changed, true);
+	throtl_schedule_delayed_work(td, 0);
 
 	blkg_conf_finish(&ctx);
-	return ret;
+	return 0;
 }
 
 static int tg_set_conf_u64(struct cgroup *cgrp, struct cftype *cft,
@@ -1230,7 +1226,7 @@ void blk_throtl_drain(struct request_queue *q)
 int blk_throtl_init(struct request_queue *q)
 {
 	struct throtl_data *td;
-	struct blkio_group *blkg;
+	int ret;
 
 	td = kzalloc_node(sizeof(*td), GFP_KERNEL, q->node);
 	if (!td)
@@ -1243,28 +1239,18 @@ int blk_throtl_init(struct request_queue *q)
 	q->td = td;
 	td->queue = q;
 
-	/* alloc and init root group. */
-	rcu_read_lock();
-	spin_lock_irq(q->queue_lock);
-
-	blkg = blkg_lookup_create(&blkio_root_cgroup, q, true);
-	if (!IS_ERR(blkg))
-		q->root_blkg = blkg;
-
-	spin_unlock_irq(q->queue_lock);
-	rcu_read_unlock();
-
-	if (!q->root_blkg) {
+	/* activate policy */
+	ret = blkcg_activate_policy(q, &blkio_policy_throtl);
+	if (ret)
 		kfree(td);
-		return -ENOMEM;
-	}
-	return 0;
+	return ret;
 }
 
 void blk_throtl_exit(struct request_queue *q)
 {
 	BUG_ON(!q->td);
 	throtl_shutdown_wq(q);
+	blkcg_deactivate_policy(q, &blkio_policy_throtl);
 	kfree(q->td);
 }
 
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index b70425f..947db5f 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1407,8 +1407,7 @@ static int cfqg_set_weight_device(struct cgroup *cgrp, struct cftype *cft,
 
 	ret = -EINVAL;
 	cfqg = blkg_to_cfqg(ctx.blkg);
-	if (cfqg && (!ctx.v || (ctx.v >= CFQ_WEIGHT_MIN &&
-				ctx.v <= CFQ_WEIGHT_MAX))) {
+	if (!ctx.v || (ctx.v >= CFQ_WEIGHT_MIN && ctx.v <= CFQ_WEIGHT_MAX)) {
 		cfqg->dev_weight = ctx.v;
 		cfqg->new_weight = cfqg->dev_weight ?: blkcg->cfq_weight;
 		ret = 0;
@@ -3939,7 +3938,7 @@ static void cfq_exit_queue(struct elevator_queue *e)
 #ifndef CONFIG_CFQ_GROUP_IOSCHED
 	kfree(cfqd->root_group);
 #endif
-	update_root_blkg_pd(q, &blkio_policy_cfq);
+	blkcg_deactivate_policy(q, &blkio_policy_cfq);
 	kfree(cfqd);
 }
 
@@ -3947,7 +3946,7 @@ static int cfq_init_queue(struct request_queue *q)
 {
 	struct cfq_data *cfqd;
 	struct blkio_group *blkg __maybe_unused;
-	int i;
+	int i, ret;
 
 	cfqd = kmalloc_node(sizeof(*cfqd), GFP_KERNEL | __GFP_ZERO, q->node);
 	if (!cfqd)
@@ -3961,28 +3960,19 @@ static int cfq_init_queue(struct request_queue *q)
 
 	/* Init root group and prefer root group over other groups by default */
 #ifdef CONFIG_CFQ_GROUP_IOSCHED
-	rcu_read_lock();
-	spin_lock_irq(q->queue_lock);
-
-	blkg = blkg_lookup_create(&blkio_root_cgroup, q, true);
-	if (!IS_ERR(blkg)) {
-		q->root_blkg = blkg;
-		cfqd->root_group = blkg_to_cfqg(blkg);
-	}
+	ret = blkcg_activate_policy(q, &blkio_policy_cfq);
+	if (ret)
+		goto out_free;
 
-	spin_unlock_irq(q->queue_lock);
-	rcu_read_unlock();
+	cfqd->root_group = blkg_to_cfqg(q->root_blkg);
 #else
 	cfqd->root_group = kzalloc_node(sizeof(*cfqd->root_group),
 					GFP_KERNEL, cfqd->queue->node);
-	if (cfqd->root_group)
-		cfq_init_cfqg_base(cfqd->root_group);
-#endif
-	if (!cfqd->root_group) {
-		kfree(cfqd);
-		return -ENOMEM;
-	}
+	if (!cfqd->root_group)
+		goto out_free;
 
+	cfq_init_cfqg_base(cfqd->root_group);
+#endif
 	cfqd->root_group->weight = 2 * CFQ_WEIGHT_DEFAULT;
 
 	/*
@@ -4032,6 +4022,10 @@ static int cfq_init_queue(struct request_queue *q)
 	 */
 	cfqd->last_delayed_sync = jiffies - HZ;
 	return 0;
+
+out_free:
+	kfree(cfqd);
+	return ret;
 }
 
 /*
diff --git a/block/elevator.c b/block/elevator.c
index be3ab6d..6a55d41 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -896,8 +896,6 @@ static int elevator_switch(struct request_queue *q, struct elevator_type *new_e)
 	ioc_clear_queue(q);
 	spin_unlock_irq(q->queue_lock);
 
-	blkg_destroy_all(q, false);
-
 	/* allocate, init and register new elevator */
 	err = -ENOMEM;
 	q->elevator = elevator_alloc(q, new_e);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index b01c377..68720ab 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -370,6 +370,7 @@ struct request_queue {
 
 	struct list_head	icq_list;
 #ifdef CONFIG_BLK_CGROUP
+	DECLARE_BITMAP		(blkcg_pols, BLKCG_MAX_POLS);
 	struct blkio_group	*root_blkg;
 	struct list_head	blkg_list;
 #endif
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 8/8] blkcg: drop stuff unused after per-queue policy activation update
  2012-04-12 23:29 [PATCHSET] block: per-queue policy activation Tejun Heo
                   ` (6 preceding siblings ...)
  2012-04-12 23:29 ` [PATCH 7/8] blkcg: implement per-queue policy activation Tejun Heo
@ 2012-04-12 23:29 ` Tejun Heo
  7 siblings, 0 replies; 20+ messages in thread
From: Tejun Heo @ 2012-04-12 23:29 UTC (permalink / raw)
  To: axboe; +Cc: vgoyal, ctalbott, rni, linux-kernel, cgroups, containers, Tejun Heo

* all_q_list is unused.  Drop all_q_{mutex|list}.

* @for_root of blkg_lookup_create() is always %false when called from
  outside blk-cgroup.c proper.  Factor out __blkg_lookup_create() so
  that it doesn't check whether @q is bypassing and use the
  underscored version for the @for_root callsite.

* blkg_destroy_all() is used only from blkcg proper and @destroy_root
  is always %true.  Make it static and drop @destroy_root.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
---
 block/blk-cgroup.c   |   61 ++++++++++++++++---------------------------------
 block/blk-cgroup.h   |    6 +----
 block/blk-throttle.c |    2 +-
 block/cfq-iosched.c  |    2 +-
 4 files changed, 23 insertions(+), 48 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index a92eaf9..be72e20 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -25,8 +25,6 @@
 #define MAX_KEY_LEN 100
 
 static DEFINE_MUTEX(blkcg_pol_mutex);
-static DEFINE_MUTEX(all_q_mutex);
-static LIST_HEAD(all_q_list);
 
 struct blkio_cgroup blkio_root_cgroup = { .cfq_weight = 2 * CFQ_WEIGHT_DEFAULT };
 EXPORT_SYMBOL_GPL(blkio_root_cgroup);
@@ -179,9 +177,8 @@ struct blkio_group *blkg_lookup(struct blkio_cgroup *blkcg,
 }
 EXPORT_SYMBOL_GPL(blkg_lookup);
 
-struct blkio_group *blkg_lookup_create(struct blkio_cgroup *blkcg,
-				       struct request_queue *q,
-				       bool for_root)
+static struct blkio_group *__blkg_lookup_create(struct blkio_cgroup *blkcg,
+						struct request_queue *q)
 	__releases(q->queue_lock) __acquires(q->queue_lock)
 {
 	struct blkio_group *blkg;
@@ -189,13 +186,6 @@ struct blkio_group *blkg_lookup_create(struct blkio_cgroup *blkcg,
 	WARN_ON_ONCE(!rcu_read_lock_held());
 	lockdep_assert_held(q->queue_lock);
 
-	/*
-	 * This could be the first entry point of blkcg implementation and
-	 * we shouldn't allow anything to go through for a bypassing queue.
-	 */
-	if (unlikely(blk_queue_bypass(q)) && !for_root)
-		return ERR_PTR(blk_queue_dead(q) ? -EINVAL : -EBUSY);
-
 	blkg = __blkg_lookup(blkcg, q);
 	if (blkg)
 		return blkg;
@@ -223,6 +213,18 @@ struct blkio_group *blkg_lookup_create(struct blkio_cgroup *blkcg,
 out:
 	return blkg;
 }
+
+struct blkio_group *blkg_lookup_create(struct blkio_cgroup *blkcg,
+				       struct request_queue *q)
+{
+	/*
+	 * This could be the first entry point of blkcg implementation and
+	 * we shouldn't allow anything to go through for a bypassing queue.
+	 */
+	if (unlikely(blk_queue_bypass(q)))
+		return ERR_PTR(blk_queue_dead(q) ? -EINVAL : -EBUSY);
+	return __blkg_lookup_create(blkcg, q);
+}
 EXPORT_SYMBOL_GPL(blkg_lookup_create);
 
 static void blkg_destroy(struct blkio_group *blkg)
@@ -249,12 +251,10 @@ static void blkg_destroy(struct blkio_group *blkg)
 /**
  * blkg_destroy_all - destroy all blkgs associated with a request_queue
  * @q: request_queue of interest
- * @destroy_root: whether to destroy root blkg or not
  *
- * Destroy blkgs associated with @q.  If @destroy_root is %true, all are
- * destroyed; otherwise, root blkg is left alone.
+ * Destroy all blkgs associated with @q.
  */
-void blkg_destroy_all(struct request_queue *q, bool destroy_root)
+static void blkg_destroy_all(struct request_queue *q)
 {
 	struct blkio_group *blkg, *n;
 
@@ -263,10 +263,6 @@ void blkg_destroy_all(struct request_queue *q, bool destroy_root)
 	list_for_each_entry_safe(blkg, n, &q->blkg_list, q_node) {
 		struct blkio_cgroup *blkcg = blkg->blkcg;
 
-		/* skip root? */
-		if (!destroy_root && blkg->blkcg == &blkio_root_cgroup)
-			continue;
-
 		spin_lock(&blkcg->lock);
 		blkg_destroy(blkg);
 		spin_unlock(&blkcg->lock);
@@ -274,7 +270,6 @@ void blkg_destroy_all(struct request_queue *q, bool destroy_root)
 
 	spin_unlock_irq(q->queue_lock);
 }
-EXPORT_SYMBOL_GPL(blkg_destroy_all);
 
 static void blkg_rcu_free(struct rcu_head *rcu_head)
 {
@@ -492,7 +487,7 @@ int blkg_conf_prep(struct blkio_cgroup *blkcg,
 	spin_lock_irq(disk->queue->queue_lock);
 
 	if (blkcg_policy_enabled(disk->queue, pol))
-		blkg = blkg_lookup_create(blkcg, disk->queue, false);
+		blkg = blkg_lookup_create(blkcg, disk->queue);
 	else
 		blkg = ERR_PTR(-EINVAL);
 
@@ -625,20 +620,9 @@ done:
  */
 int blkcg_init_queue(struct request_queue *q)
 {
-	int ret;
-
 	might_sleep();
 
-	ret = blk_throtl_init(q);
-	if (ret)
-		return ret;
-
-	mutex_lock(&all_q_mutex);
-	INIT_LIST_HEAD(&q->all_q_node);
-	list_add_tail(&q->all_q_node, &all_q_list);
-	mutex_unlock(&all_q_mutex);
-
-	return 0;
+	return blk_throtl_init(q);
 }
 
 /**
@@ -662,12 +646,7 @@ void blkcg_drain_queue(struct request_queue *q)
  */
 void blkcg_exit_queue(struct request_queue *q)
 {
-	mutex_lock(&all_q_mutex);
-	list_del_init(&q->all_q_node);
-	mutex_unlock(&all_q_mutex);
-
-	blkg_destroy_all(q, true);
-
+	blkg_destroy_all(q);
 	blk_throtl_exit(q);
 }
 
@@ -741,7 +720,7 @@ int blkcg_activate_policy(struct request_queue *q,
 	spin_lock_irq(q->queue_lock);
 
 	rcu_read_lock();
-	blkg = blkg_lookup_create(&blkio_root_cgroup, q, true);
+	blkg = __blkg_lookup_create(&blkio_root_cgroup, q);
 	rcu_read_unlock();
 
 	if (IS_ERR(blkg)) {
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index 519dc6b..1f5ebee 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -115,7 +115,6 @@ extern int blkcg_activate_policy(struct request_queue *q,
 				 const struct blkio_policy_type *pol);
 extern void blkcg_deactivate_policy(struct request_queue *q,
 				    const struct blkio_policy_type *pol);
-extern void blkg_destroy_all(struct request_queue *q, bool destroy_root);
 
 void blkcg_print_blkgs(struct seq_file *sf, struct blkio_cgroup *blkcg,
 		       u64 (*prfill)(struct seq_file *, void *, int),
@@ -334,8 +333,6 @@ static inline int blkcg_activate_policy(struct request_queue *q,
 					const struct blkio_policy_type *pol) { return 0; }
 static inline void blkcg_deactivate_policy(struct request_queue *q,
 					   const struct blkio_policy_type *pol) { }
-static inline void blkg_destroy_all(struct request_queue *q,
-				    bool destory_root) { }
 
 static inline void *blkg_to_pdata(struct blkio_group *blkg,
 				struct blkio_policy_type *pol) { return NULL; }
@@ -354,8 +351,7 @@ extern struct blkio_cgroup *bio_blkio_cgroup(struct bio *bio);
 extern struct blkio_group *blkg_lookup(struct blkio_cgroup *blkcg,
 				       struct request_queue *q);
 struct blkio_group *blkg_lookup_create(struct blkio_cgroup *blkcg,
-				       struct request_queue *q,
-				       bool for_root);
+				       struct request_queue *q);
 #else
 struct cgroup;
 static inline struct blkio_cgroup *
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 2fc964e..e2aaf27 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -285,7 +285,7 @@ static struct throtl_grp *throtl_lookup_create_tg(struct throtl_data *td,
 	} else {
 		struct blkio_group *blkg;
 
-		blkg = blkg_lookup_create(blkcg, q, false);
+		blkg = blkg_lookup_create(blkcg, q);
 
 		/* if %NULL and @q is alive, fall back to root_tg */
 		if (!IS_ERR(blkg))
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 947db5f..79987fb 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1349,7 +1349,7 @@ static struct cfq_group *cfq_lookup_create_cfqg(struct cfq_data *cfqd,
 	} else {
 		struct blkio_group *blkg;
 
-		blkg = blkg_lookup_create(blkcg, q, false);
+		blkg = blkg_lookup_create(blkcg, q);
 		if (!IS_ERR(blkg))
 			cfqg = blkg_to_cfqg(blkg);
 	}
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/8] blkcg: kill blkio_list and replace blkio_list_lock with a mutex
  2012-04-12 23:29 ` [PATCH 1/8] blkcg: kill blkio_list and replace blkio_list_lock with a mutex Tejun Heo
@ 2012-04-13 13:32   ` Vivek Goyal
  2012-04-13 16:55     ` Tejun Heo
  0 siblings, 1 reply; 20+ messages in thread
From: Vivek Goyal @ 2012-04-13 13:32 UTC (permalink / raw)
  To: Tejun Heo; +Cc: axboe, ctalbott, rni, linux-kernel, cgroups, containers

On Thu, Apr 12, 2012 at 04:29:33PM -0700, Tejun Heo wrote:

Hi Tejun,

> @@ -732,20 +732,21 @@ void blkio_policy_register(struct blkio_policy_type *blkiop)
>  {
>  	struct request_queue *q;
>  
> +	mutex_lock(&blkcg_pol_mutex);
> +
>  	blkcg_bypass_start();
> -	spin_lock(&blkio_list_lock);
>  
>  	BUG_ON(blkio_policy[blkiop->plid]);

A minor nit. It might be a good idea to check that blkiop->plid is not
greater than BLKIO_NR_POLICIES in both the registration and unregistration
functions to avoid touching memory beyond the blkio_policy[] array.
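
Something along these lines, perhaps (just a sketch; the rest of the
function is elided, and ">=" is used since plid indexes into
blkio_policy[]):

	void blkio_policy_register(struct blkio_policy_type *blkiop)
	{
		mutex_lock(&blkcg_pol_mutex);

		/* refuse plids that would index past blkio_policy[] */
		BUG_ON(blkiop->plid >= BLKIO_NR_POLICIES);
		BUG_ON(blkio_policy[blkiop->plid]);
		blkio_policy[blkiop->plid] = blkiop;

		/* ... rest of registration unchanged ... */

		mutex_unlock(&blkcg_pol_mutex);
	}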

Thanks
Vivek

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 7/8] blkcg: implement per-queue policy activation
  2012-04-12 23:29 ` [PATCH 7/8] blkcg: implement per-queue policy activation Tejun Heo
@ 2012-04-13 15:21   ` Vivek Goyal
  2012-04-13 15:23     ` Vivek Goyal
  2012-04-13 18:12   ` Vivek Goyal
  1 sibling, 1 reply; 20+ messages in thread
From: Vivek Goyal @ 2012-04-13 15:21 UTC (permalink / raw)
  To: Tejun Heo; +Cc: axboe, ctalbott, rni, linux-kernel, cgroups, containers

On Thu, Apr 12, 2012 at 04:29:39PM -0700, Tejun Heo wrote:

[..]
> @@ -111,12 +122,11 @@ static struct blkio_group *blkg_alloc(struct blkio_cgroup *blkcg,
>  		struct blkio_policy_type *pol = blkio_policy[i];
>  		struct blkg_policy_data *pd;
>  
> -		if (!pol)
> +		if (!blkcg_policy_enabled(q, pol))
>  			continue;
>  
>  		/* alloc per-policy data and attach it to blkg */
> -		pd = kzalloc_node(sizeof(*pd) + pol->pdata_size, GFP_ATOMIC,
> -				  q->node);
> +		pd = kzalloc_node(blkg_pd_size(pol), GFP_ATOMIC, q->node);
>  		if (!pd) {
>  			blkg_free(blkg);
>  			return NULL;
> @@ -130,7 +140,7 @@ static struct blkio_group *blkg_alloc(struct blkio_cgroup *blkcg,
>  	for (i = 0; i < BLKCG_MAX_POLS; i++) {
>  		struct blkio_policy_type *pol = blkio_policy[i];
>  
> -		if (pol)
> +		if (blkcg_policy_enabled(blkg->q, pol))
>  			pol->ops.blkio_init_group_fn(blkg);
>  	}

Maybe pol->ops.blkio_init_group_fn() can be called in the loop above
where you are allocating the group, so you don't end up looping through
policies twice and don't have to call blkcg_policy_enabled() twice.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 7/8] blkcg: implement per-queue policy activation
  2012-04-13 15:21   ` Vivek Goyal
@ 2012-04-13 15:23     ` Vivek Goyal
  0 siblings, 0 replies; 20+ messages in thread
From: Vivek Goyal @ 2012-04-13 15:23 UTC (permalink / raw)
  To: Tejun Heo; +Cc: axboe, ctalbott, rni, linux-kernel, cgroups, containers

On Fri, Apr 13, 2012 at 11:21:22AM -0400, Vivek Goyal wrote:
> On Thu, Apr 12, 2012 at 04:29:39PM -0700, Tejun Heo wrote:
> 
> [..]
> > @@ -111,12 +122,11 @@ static struct blkio_group *blkg_alloc(struct blkio_cgroup *blkcg,
> >  		struct blkio_policy_type *pol = blkio_policy[i];
> >  		struct blkg_policy_data *pd;
> >  
> > -		if (!pol)
> > +		if (!blkcg_policy_enabled(q, pol))
> >  			continue;
> >  
> >  		/* alloc per-policy data and attach it to blkg */
> > -		pd = kzalloc_node(sizeof(*pd) + pol->pdata_size, GFP_ATOMIC,
> > -				  q->node);
> > +		pd = kzalloc_node(blkg_pd_size(pol), GFP_ATOMIC, q->node);
> >  		if (!pd) {
> >  			blkg_free(blkg);
> >  			return NULL;
> > @@ -130,7 +140,7 @@ static struct blkio_group *blkg_alloc(struct blkio_cgroup *blkcg,
> >  	for (i = 0; i < BLKCG_MAX_POLS; i++) {
> >  		struct blkio_policy_type *pol = blkio_policy[i];
> >  
> > -		if (pol)
> > +		if (blkcg_policy_enabled(blkg->q, pol))
> >  			pol->ops.blkio_init_group_fn(blkg);
> >  	}
> 
> May be pol->ops.blkio_init_group_fn() can be called in the loop above
> where you are allocating the group. So you don't end up looping through
> policies twice and don't have to call blkcg_policy_enabled() twice.

Oh... I guess you did it because you did not want to call into individual
policies until all the policy data allocations had succeeded.  Otherwise
you would need to call into the policies again to undo the initializations.

So yes, it is fine as it is.
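
For reference, the two-pass structure in question looks roughly like this
(a condensed sketch of blkg_alloc(); the pd-attachment lines are assumed
rather than copied from the patch):

	/* pass 1: allocate per-policy data for every enabled policy,
	 * bailing out cleanly on the first failure */
	for (i = 0; i < BLKCG_MAX_POLS; i++) {
		struct blkio_policy_type *pol = blkio_policy[i];
		struct blkg_policy_data *pd;

		if (!blkcg_policy_enabled(q, pol))
			continue;

		pd = kzalloc_node(blkg_pd_size(pol), GFP_ATOMIC, q->node);
		if (!pd) {
			blkg_free(blkg);	/* frees everything allocated so far */
			return NULL;
		}
		blkg->pd[i] = pd;
		pd->blkg = blkg;
	}

	/* pass 2: only after every allocation has succeeded, let the enabled
	 * policies initialize their data -- no policy sees a half-built blkg */
	for (i = 0; i < BLKCG_MAX_POLS; i++) {
		struct blkio_policy_type *pol = blkio_policy[i];

		if (blkcg_policy_enabled(blkg->q, pol))
			pol->ops.blkio_init_group_fn(blkg);
	}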

Thanks
Vivek

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5/8] blkcg: make sure blkg_lookup() returns %NULL if @q is bypassing
  2012-04-12 23:29 ` [PATCH 5/8] blkcg: make sure blkg_lookup() returns %NULL if @q is bypassing Tejun Heo
@ 2012-04-13 16:00   ` Vivek Goyal
  2012-04-13 17:03     ` Tejun Heo
  0 siblings, 1 reply; 20+ messages in thread
From: Vivek Goyal @ 2012-04-13 16:00 UTC (permalink / raw)
  To: Tejun Heo; +Cc: axboe, ctalbott, rni, linux-kernel, cgroups, containers

On Thu, Apr 12, 2012 at 04:29:37PM -0700, Tejun Heo wrote:

[..]
>   * In bypass mode, only the dispatch FIFO queue of @q is used.  This
>   * function makes @q enter bypass mode and drains all requests which were
>   * throttled or issued before.  On return, it's guaranteed that no request
> - * is being throttled or has ELVPRIV set.
> + * is being throttled or has ELVPRIV set and blk_queue_bypass() is %true
> + * inside queue or RCU read lock.
>   */
>  void blk_queue_bypass_start(struct request_queue *q)
>  {
> @@ -426,6 +427,7 @@ void blk_queue_bypass_start(struct request_queue *q)
>  	spin_unlock_irq(q->queue_lock);
>  
>  	blk_drain_queue(q, false);
> +	synchronize_rcu();

Hi Tejun,

I guess this synchronize_rcu() needs a comment here to make it clear
what it is meant for. IIUC, you are protecting against policy data (stats
updates) which happen under RCU in the throttling code? You want to make
sure all those updaters are done before you go ahead with
activation/deactivation of a policy.

Well, I am wondering, if CFQ is the policy being activated/deactivated, why
should we try to drain other policies' requests? Can't one continue
to work without draining all the throttled requests? We probably just
need to make sure new groups are not created.

Anyway, this is not a big deal. I was more concerned about other policies
losing configuration while one policy is changing, and that is fixed
with this patchset. It is nice to see all that in-place update_root
handling go away. That was confusing.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/8] blkcg: kill blkio_list and replace blkio_list_lock with a mutex
  2012-04-13 13:32   ` Vivek Goyal
@ 2012-04-13 16:55     ` Tejun Heo
  0 siblings, 0 replies; 20+ messages in thread
From: Tejun Heo @ 2012-04-13 16:55 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: axboe, ctalbott, rni, linux-kernel, cgroups, containers

On Fri, Apr 13, 2012 at 09:32:26AM -0400, Vivek Goyal wrote:
> On Thu, Apr 12, 2012 at 04:29:33PM -0700, Tejun Heo wrote:
> 
> Hi Tejun,
> 
> > @@ -732,20 +732,21 @@ void blkio_policy_register(struct blkio_policy_type *blkiop)
> >  {
> >  	struct request_queue *q;
> >  
> > +	mutex_lock(&blkcg_pol_mutex);
> > +
> >  	blkcg_bypass_start();
> > -	spin_lock(&blkio_list_lock);
> >  
> >  	BUG_ON(blkio_policy[blkiop->plid]);
> 
> A minor nit. It might be a good idea to check that blkiop->plid is not
> greater than BLKIO_NR_POLICIES for both registration and unresitration
> functions to avoid touching memory beyond blkio_policy[] array.

I'm making ->plid dynamically allocated on registration anyway,
so I don't think it really matters.  The updated unregistration
function does BUG_ON(blkio_policy[pol->plid] != pol), which should
catch a stray plid.
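
Roughly something like this on the registration side (a sketch of the
direction, not the final patch; the -ENOSPC convention is an assumption):

	int blkio_policy_register(struct blkio_policy_type *blkiop)
	{
		int i, ret = -ENOSPC;

		mutex_lock(&blkcg_pol_mutex);

		/* hand the first free slot in blkio_policy[] to the policy */
		for (i = 0; i < BLKCG_MAX_POLS; i++)
			if (!blkio_policy[i])
				break;
		if (i < BLKCG_MAX_POLS) {
			blkiop->plid = i;
			blkio_policy[i] = blkiop;
			ret = 0;
		}

		mutex_unlock(&blkcg_pol_mutex);
		return ret;
	}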

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5/8] blkcg: make sure blkg_lookup() returns %NULL if @q is bypassing
  2012-04-13 16:00   ` Vivek Goyal
@ 2012-04-13 17:03     ` Tejun Heo
  2012-04-13 17:23       ` Vivek Goyal
  0 siblings, 1 reply; 20+ messages in thread
From: Tejun Heo @ 2012-04-13 17:03 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: axboe, ctalbott, rni, linux-kernel, cgroups, containers

Hey,

On Fri, Apr 13, 2012 at 12:00:53PM -0400, Vivek Goyal wrote:
> On Thu, Apr 12, 2012 at 04:29:37PM -0700, Tejun Heo wrote:
> 
> [..]
> >   * In bypass mode, only the dispatch FIFO queue of @q is used.  This
> >   * function makes @q enter bypass mode and drains all requests which were
> >   * throttled or issued before.  On return, it's guaranteed that no request
> > - * is being throttled or has ELVPRIV set.
> > + * is being throttled or has ELVPRIV set and blk_queue_bypass() is %true
> > + * inside queue or RCU read lock.
> >   */
> >  void blk_queue_bypass_start(struct request_queue *q)
> >  {
> > @@ -426,6 +427,7 @@ void blk_queue_bypass_start(struct request_queue *q)
> >  	spin_unlock_irq(q->queue_lock);
> >  
> >  	blk_drain_queue(q, false);
> > +	synchronize_rcu();
> 
> I guess this synchronize_rcu() needs some comments here to make it clear
> what it meant for. IIUC, you are protecting against policy data (stats
> update) which happen under rcu in throttling code? You want to make sure
> all these updaters are done before you go ahead with
> activation/deactivation of a policy.
> 
> Well, I am wondering if CFQ is policy being activated/deactivated why
> should we try to drain other policie's requests. Can't one continue
> to work without draining all the throttled requests. We probably just
> need to make sure new groups are not created.

So, I think synchronization rules like this are something which the
core should define.  cfq may not use it, but the sync rules should
still be the same for all policies.  In this case, what the core
provides is "blk_queue_bypass() is guaranteed to be seen as %true
inside an RCU read lock section once this function returns", which in
turn guarantees that an RCU read-lock protected blkg_lookup() fails
once the function returns.  This property makes RCU protected
blkg_lookup() safe against queue bypassing, which is what we want.

Now, all of this is something which happens (and should happen) inside
the blkcg core.  All blk-throtl knows is that if it sticks to the rules
defined by the blkcg core (e.g. blkg_lookup() should be RCU protected),
it's not stepping on anyone's toes, so, no, this definitely belongs to
the block / blkcg core proper.  We can move it from block to blkcg, but I
think it's a nice property to have for queue bypassing, and it's not
like queue bypassing is a hot path.

That said, yeah, I think it can use a bit more documentation.  Will
add more comments.
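
FWIW, on the lookup side the rule boils down to something like this (a
sketch; the WARN_ON_ONCE() is just one way of documenting the RCU
requirement):

	struct blkio_group *blkg_lookup(struct blkio_cgroup *blkcg,
					struct request_queue *q)
	{
		WARN_ON_ONCE(!rcu_read_lock_held());

		/* once blk_queue_bypass_start() has returned, every
		 * RCU-protected lookup is guaranteed to see the bypass
		 * flag and fail here */
		if (unlikely(blk_queue_bypass(q)))
			return NULL;
		return __blkg_lookup(blkcg, q);
	}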

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5/8] blkcg: make sure blkg_lookup() returns %NULL if @q is bypassing
  2012-04-13 17:03     ` Tejun Heo
@ 2012-04-13 17:23       ` Vivek Goyal
  2012-04-13 17:28         ` Tejun Heo
  2012-04-16 22:41         ` Paul E. McKenney
  0 siblings, 2 replies; 20+ messages in thread
From: Vivek Goyal @ 2012-04-13 17:23 UTC (permalink / raw)
  To: Tejun Heo; +Cc: axboe, ctalbott, rni, linux-kernel, cgroups, containers

On Fri, Apr 13, 2012 at 10:03:34AM -0700, Tejun Heo wrote:
> Hey,
> 
> On Fri, Apr 13, 2012 at 12:00:53PM -0400, Vivek Goyal wrote:
> > On Thu, Apr 12, 2012 at 04:29:37PM -0700, Tejun Heo wrote:
> > 
> > [..]
> > >   * In bypass mode, only the dispatch FIFO queue of @q is used.  This
> > >   * function makes @q enter bypass mode and drains all requests which were
> > >   * throttled or issued before.  On return, it's guaranteed that no request
> > > - * is being throttled or has ELVPRIV set.
> > > + * is being throttled or has ELVPRIV set and blk_queue_bypass() is %true
> > > + * inside queue or RCU read lock.
> > >   */
> > >  void blk_queue_bypass_start(struct request_queue *q)
> > >  {
> > > @@ -426,6 +427,7 @@ void blk_queue_bypass_start(struct request_queue *q)
> > >  	spin_unlock_irq(q->queue_lock);
> > >  
> > >  	blk_drain_queue(q, false);
> > > +	synchronize_rcu();
> > 
> > I guess this synchronize_rcu() needs some comments here to make it clear
> > what it meant for. IIUC, you are protecting against policy data (stats
> > update) which happen under rcu in throttling code? You want to make sure
> > all these updaters are done before you go ahead with
> > activation/deactivation of a policy.
> > 
> > Well, I am wondering if CFQ is policy being activated/deactivated why
> > should we try to drain other policie's requests. Can't one continue
> > to work without draining all the throttled requests. We probably just
> > need to make sure new groups are not created.
> 
> So, I think synchronization rules like this are something which the
> core should define.  cfq may not use it but the sync rules should
> still be the same for all policies.  In this case, what the core
> provides is "blk_queue_bypass() is guaranteed to be seen as %true
> inside RCU read lock section once this function returns", which in
> turn will guarantee that RCU read-lock protected blkg_lookup() is
> guaranteed to fail once the function returns.  This property makes RCU
> protected blkg_lookup() safe against queue bypassing, which is what we
> want.

I think synchronize_rcu() has now effectively become part of
cfq_init_queue(), and that will slow down boot.  In the past I had to
remove it.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5/8] blkcg: make sure blkg_lookup() returns %NULL if @q is bypassing
  2012-04-13 17:23       ` Vivek Goyal
@ 2012-04-13 17:28         ` Tejun Heo
  2012-04-16 22:41         ` Paul E. McKenney
  1 sibling, 0 replies; 20+ messages in thread
From: Tejun Heo @ 2012-04-13 17:28 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: axboe, ctalbott, rni, linux-kernel, cgroups, containers

On Fri, Apr 13, 2012 at 01:23:36PM -0400, Vivek Goyal wrote:
> I think now synchronize_rcu() has become part of cfq_init_queue()
> effectively and that will slow down boot. In the past I had to remove
> it.

Eh... crap.  You're right.  Forgot about that.  Maybe we need to
special case initial queue setup.  I'll look into it.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 7/8] blkcg: implement per-queue policy activation
  2012-04-12 23:29 ` [PATCH 7/8] blkcg: implement per-queue policy activation Tejun Heo
  2012-04-13 15:21   ` Vivek Goyal
@ 2012-04-13 18:12   ` Vivek Goyal
  2012-04-13 18:17     ` Tejun Heo
  1 sibling, 1 reply; 20+ messages in thread
From: Vivek Goyal @ 2012-04-13 18:12 UTC (permalink / raw)
  To: Tejun Heo; +Cc: axboe, ctalbott, rni, linux-kernel, cgroups, containers

On Thu, Apr 12, 2012 at 04:29:39PM -0700, Tejun Heo wrote:

[..]
> diff --git a/block/elevator.c b/block/elevator.c
> index be3ab6d..6a55d41 100644
> --- a/block/elevator.c
> +++ b/block/elevator.c
> @@ -896,8 +896,6 @@ static int elevator_switch(struct request_queue *q, struct elevator_type *new_e)
>  	ioc_clear_queue(q);
>  	spin_unlock_irq(q->queue_lock);
>  
> -	blkg_destroy_all(q, false);
> -

So now groups don't get reclaimed until either the cgroup is deleted or
the queue exits.  So if BLK_DEV_THROTTLE=n and cfq is switched out, blkgs
created by the CFQ policy will not be reclaimed (despite the fact that
nobody is using them).

This is not necessarily bad; I just wanted to confirm that it is the
design intent.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 7/8] blkcg: implement per-queue policy activation
  2012-04-13 18:12   ` Vivek Goyal
@ 2012-04-13 18:17     ` Tejun Heo
  0 siblings, 0 replies; 20+ messages in thread
From: Tejun Heo @ 2012-04-13 18:17 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: axboe, ctalbott, rni, linux-kernel, cgroups, containers

Hello,

On Fri, Apr 13, 2012 at 02:12:57PM -0400, Vivek Goyal wrote:
> On Thu, Apr 12, 2012 at 04:29:39PM -0700, Tejun Heo wrote:
> 
> [..]
> > diff --git a/block/elevator.c b/block/elevator.c
> > index be3ab6d..6a55d41 100644
> > --- a/block/elevator.c
> > +++ b/block/elevator.c
> > @@ -896,8 +896,6 @@ static int elevator_switch(struct request_queue *q, struct elevator_type *new_e)
> >  	ioc_clear_queue(q);
> >  	spin_unlock_irq(q->queue_lock);
> >  
> > -	blkg_destroy_all(q, false);
> > -
> 
> So now groups don't reclaimed until either cgroup is deleted or queue
> exits. So if BLK_DEV_THROTTLE=n and cfq is switched out, blkg created
> by CFQ policy will not be reclaimed (despite the fact nobody is using
> them).

It's not difficult to change.  We can just trigger reclaim if no
policy is enabled for the queue.  Hmmm... yeah, that's gonna be like
a five-line change.  I'll add it.
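
Probably little more than this at the tail of blkcg_deactivate_policy()
(a sketch; the exact placement is still to be decided):

	/* the policy is gone from this queue ... */
	__clear_bit(pol->plid, q->blkcg_pols);

	/* ... and if it was the last one, nothing will look at the blkgs
	 * anymore, so reclaim them right away */
	if (bitmap_empty(q->blkcg_pols, BLKCG_MAX_POLS))
		blkg_destroy_all(q);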

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5/8] blkcg: make sure blkg_lookup() returns %NULL if @q is bypassing
  2012-04-13 17:23       ` Vivek Goyal
  2012-04-13 17:28         ` Tejun Heo
@ 2012-04-16 22:41         ` Paul E. McKenney
  1 sibling, 0 replies; 20+ messages in thread
From: Paul E. McKenney @ 2012-04-16 22:41 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Tejun Heo, axboe, ctalbott, rni, linux-kernel, cgroups, containers

On Fri, Apr 13, 2012 at 01:23:36PM -0400, Vivek Goyal wrote:
> On Fri, Apr 13, 2012 at 10:03:34AM -0700, Tejun Heo wrote:
> > Hey,
> > 
> > On Fri, Apr 13, 2012 at 12:00:53PM -0400, Vivek Goyal wrote:
> > > On Thu, Apr 12, 2012 at 04:29:37PM -0700, Tejun Heo wrote:
> > > 
> > > [..]
> > > >   * In bypass mode, only the dispatch FIFO queue of @q is used.  This
> > > >   * function makes @q enter bypass mode and drains all requests which were
> > > >   * throttled or issued before.  On return, it's guaranteed that no request
> > > > - * is being throttled or has ELVPRIV set.
> > > > + * is being throttled or has ELVPRIV set and blk_queue_bypass() is %true
> > > > + * inside queue or RCU read lock.
> > > >   */
> > > >  void blk_queue_bypass_start(struct request_queue *q)
> > > >  {
> > > > @@ -426,6 +427,7 @@ void blk_queue_bypass_start(struct request_queue *q)
> > > >  	spin_unlock_irq(q->queue_lock);
> > > >  
> > > >  	blk_drain_queue(q, false);
> > > > +	synchronize_rcu();
> > > 
> > > I guess this synchronize_rcu() needs some comments here to make it clear
> > > what it meant for. IIUC, you are protecting against policy data (stats
> > > update) which happen under rcu in throttling code? You want to make sure
> > > all these updaters are done before you go ahead with
> > > activation/deactivation of a policy.
> > > 
> > > Well, I am wondering if CFQ is policy being activated/deactivated why
> > > should we try to drain other policie's requests. Can't one continue
> > > to work without draining all the throttled requests. We probably just
> > > need to make sure new groups are not created.
> > 
> > So, I think synchronization rules like this are something which the
> > core should define.  cfq may not use it but the sync rules should
> > still be the same for all policies.  In this case, what the core
> > provides is "blk_queue_bypass() is guaranteed to be seen as %true
> > inside RCU read lock section once this function returns", which in
> > turn will guarantee that RCU read-lock protected blkg_lookup() is
> > guaranteed to fail once the function returns.  This property makes RCU
> > protected blkg_lookup() safe against queue bypassing, which is what we
> > want.
> 
> I think now synchronize_rcu() has become part of cfq_init_queue()
> effectively and that will slow down boot. In the past I had to remove
> it.

One alternative approach is to use synchronize_rcu_expedited().
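
That is, at the end of blk_queue_bypass_start() (a sketch based on the
hunk quoted above):

	blk_drain_queue(q, false);
	/* an expedited grace period finishes much sooner than
	 * synchronize_rcu(), at the cost of disturbing every CPU */
	synchronize_rcu_expedited();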

							Thanx, Paul


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2012-04-16 22:43 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-12 23:29 [PATCHSET] block: per-queue policy activation Tejun Heo
2012-04-12 23:29 ` [PATCH 1/8] blkcg: kill blkio_list and replace blkio_list_lock with a mutex Tejun Heo
2012-04-13 13:32   ` Vivek Goyal
2012-04-13 16:55     ` Tejun Heo
2012-04-12 23:29 ` [PATCH 2/8] blkcg: use @pol instead of @plid in update_root_blkg_pd() and blkcg_print_blkgs() Tejun Heo
2012-04-12 23:29 ` [PATCH 3/8] blkcg: remove static policy ID enums Tejun Heo
2012-04-12 23:29 ` [PATCH 4/8] blkcg: make blkg_conf_prep() take @pol and return with queue lock held Tejun Heo
2012-04-12 23:29 ` [PATCH 5/8] blkcg: make sure blkg_lookup() returns %NULL if @q is bypassing Tejun Heo
2012-04-13 16:00   ` Vivek Goyal
2012-04-13 17:03     ` Tejun Heo
2012-04-13 17:23       ` Vivek Goyal
2012-04-13 17:28         ` Tejun Heo
2012-04-16 22:41         ` Paul E. McKenney
2012-04-12 23:29 ` [PATCH 6/8] blkcg: add request_queue->root_blkg Tejun Heo
2012-04-12 23:29 ` [PATCH 7/8] blkcg: implement per-queue policy activation Tejun Heo
2012-04-13 15:21   ` Vivek Goyal
2012-04-13 15:23     ` Vivek Goyal
2012-04-13 18:12   ` Vivek Goyal
2012-04-13 18:17     ` Tejun Heo
2012-04-12 23:29 ` [PATCH 8/8] blkcg: drop stuff unused after per-queue policy activation update Tejun Heo
