* [PATCHSET] blkcg: kill policy node and blkg->dev
@ 2012-01-19  1:11 Tejun Heo
  2012-01-19  1:11 ` [PATCH 01/12] blkcg: obtaining blkg should be enclosed inside rcu_read_lock() Tejun Heo
                   ` (12 more replies)
  0 siblings, 13 replies; 36+ messages in thread
From: Tejun Heo @ 2012-01-19  1:11 UTC (permalink / raw)
  To: axboe, vgoyal; +Cc: ctalbott, rni, linux-kernel

Hello, guys.

This is the first of three (maybe four) patchsets in a series trying
to clean up the block cgroup API.

<rant>
The current state of the blkcg API is extremely sad.  At first
glance, it looks as though it's built with some modular design in mind
with policy registration and all, but the whole thing is one giant
unholy spaghetti mess which renders the phrase "layer violation"
meaningless - it's unclear whether any layering existed to begin
with.

I've been looking at the code for three weeks now.  I've done a lot of
code cleanups all around the kernel and seen a lot of scary/messy
stuff, but blkcg takes the worst spot without any shred of doubt.  We
really shouldn't have code like this in the kernel.
</rant>

Alright, I just needed to get that out of my system.  We all do a lot
of horrible things.  I know I've done my share and still do, so let's
just concentrate on making it better from here on.

blk-cgroup, blk-throttle and cfq-iosched are pretty tightly tangled
and untangling them requires seemingly unrelated changes and the
addition of some transitional stuff.  Some are in this patchset but
they'll be more prominent in the second patchset.  With that said,
this patchset concentrates on removing blkio_policy_node and
blkio_group->dev.

blkio_policy_node carries configuration per cgroup-request_queue
association separately from the matching blkio_group.  The goal seems
to be allowing configuration of missing devices and retaining
configuration across hot unplug/plug cycles.  Such behavior is very
different from existing conventions, misleading and unnecessary.  This
patchset moves configuration to blkio_group and removes
blkio_policy_node.

blkg->dev removal is much simpler.  blkg->dev is used to print out
the device number when printing out per-device configurations or
statistics.  The device name can be easily determined by following the
associated request_queue from the blkg instead.
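
For example, something along these lines is enough (rough sketch, the
helper name is made up and RCU annotations are ignored; the actual
change is in patch 0012):

  /* sketch: derive the device name from the queue associated with blkg */
  static const char *blkg_dev_name(struct blkio_group *blkg)
  {
          /* the queue has a backing device only while the disk is around */
          if (blkg->q->backing_dev_info.dev)
                  return dev_name(blkg->q->backing_dev_info.dev);
          return NULL;
  }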

This patchset contains the following 12 patches.

  0001-blkcg-obtaining-blkg-should-be-enclosed-inside-rcu_r.patch
  0002-cfq-don-t-register-propio-policy-if-CONFIG_CFQ_GROUP.patch
  0003-elevator-clear-auxiliary-data-earlier-during-elevato.patch
  0004-elevator-make-elevator_init_fn-return-0-errno.patch
  0005-blkcg-update-blkg-get-functions-take-blkio_cgroup-as.patch
  0006-blkcg-use-q-and-plid-instead-of-opaque-void-for-blki.patch
  0007-blkcg-add-blkio_policy-array-and-allow-one-policy-pe.patch
  0008-blkcg-use-the-usual-get-blkg-path-for-root-blkio_gro.patch
  0009-blkcg-factor-out-blkio_group-creation.patch
  0010-blkcg-don-t-allow-or-retain-configuration-of-missing.patch
  0011-blkcg-kill-blkio_policy_node.patch
  0012-blkcg-kill-the-mind-bending-blkg-dev.patch

0001 is an RCU locking fix.  Applying it to mainline as a fix is
probably a good idea.

0002 is misc cfq prep.

0003-0004 update elevator.  Elevator switch behavior is changed once
more to simplify elevator init/exit paths.

0005-0009 prepare blkcg for further changes.  Two policy operations
and a new one-policy-per-policy-ID restriction are added.  All of them
will be removed as the cleanup progresses further.

0010-0011 move blkcg configuration into blkg and drop policy node.

0012 drops blkg->dev.

This patchset is on top of linus#master (4ba3069fea "Merge branch
'for-next' of git://../nab/target-pending") and available in the
following git branch.

  git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git blkcg-kill-pn

Thanks.

 block/blk-cgroup.c       |  635 +++++++++++++++--------------------------------
 block/blk-cgroup.h       |   91 ++----
 block/blk-core.c         |    6 
 block/blk-throttle.c     |  259 +++++--------------
 block/cfq-iosched.c      |  237 +++++------------
 block/cfq.h              |    7 
 block/deadline-iosched.c |    8 
 block/elevator.c         |   90 +++---
 block/noop-iosched.c     |    8 
 include/linux/elevator.h |    2 
 10 files changed, 436 insertions(+), 907 deletions(-)

--
tejun


* [PATCH 01/12] blkcg: obtaining blkg should be enclosed inside rcu_read_lock()
  2012-01-19  1:11 [PATCHSET] blkcg: kill policy node and blkg->dev Tejun Heo
@ 2012-01-19  1:11 ` Tejun Heo
  2012-01-19 10:07   ` Vivek Goyal
  2012-01-19  1:11 ` [PATCH 02/12] cfq: don't register propio policy if !CONFIG_CFQ_GROUP_IOSCHED Tejun Heo
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 36+ messages in thread
From: Tejun Heo @ 2012-01-19  1:11 UTC (permalink / raw)
  To: axboe, vgoyal; +Cc: ctalbott, rni, linux-kernel, Tejun Heo

When looking up or creating blkg's, both blk-throttle and cfq-iosched
drop rcu_read_lock() right after the lookup is complete.  This isn't
safe.  The refcnt isn't incremented at that point and the RCU lock is
the only thing holding the blkg.  It shouldn't be dropped until after
the refcnt is incremented by the caller.

Update throtl_get_tg() and cfq_get_cfqg() such that they are called
and return under rcu_read_lock() and let their callers manage rcu
locking appropriately.
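
IOW, the callers now follow roughly this pattern (illustrative sketch,
not an actual hunk from this patch):

  rcu_read_lock();
  blkcg = task_blkio_cgroup(current);

  tg = throtl_find_tg(td, blkcg);  /* or throtl_get_tg(td), which may block */
  if (tg) {
          /*
           * The RCU read lock is the only thing keeping the blkg alive
           * here - take a reference or finish using it before unlocking.
           */
  }
  rcu_read_unlock();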

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
---
 block/blk-throttle.c |   18 ++++++------------
 block/cfq-iosched.c  |   11 +++++------
 2 files changed, 11 insertions(+), 18 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 5eed6a7..ae53056 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -313,25 +313,23 @@ static struct throtl_grp * throtl_get_tg(struct throtl_data *td)
 	if (unlikely(blk_queue_dead(q)))
 		return NULL;
 
-	rcu_read_lock();
 	blkcg = task_blkio_cgroup(current);
 	tg = throtl_find_tg(td, blkcg);
-	if (tg) {
-		rcu_read_unlock();
+	if (tg)
 		return tg;
-	}
 
 	/*
 	 * Need to allocate a group. Allocation of group also needs allocation
 	 * of per cpu stats which in-turn takes a mutex() and can block. Hence
 	 * we need to drop rcu lock and queue_lock before we call alloc.
 	 */
-	rcu_read_unlock();
 	spin_unlock_irq(q->queue_lock);
+	rcu_read_unlock();
 
 	tg = throtl_alloc_tg(td);
 
 	/* Group allocated and queue is still alive. take the lock */
+	rcu_read_lock();
 	spin_lock_irq(q->queue_lock);
 
 	/* Make sure @q is still alive */
@@ -343,7 +341,6 @@ static struct throtl_grp * throtl_get_tg(struct throtl_data *td)
 	/*
 	 * Initialize the new group. After sleeping, read the blkcg again.
 	 */
-	rcu_read_lock();
 	blkcg = task_blkio_cgroup(current);
 
 	/*
@@ -354,7 +351,6 @@ static struct throtl_grp * throtl_get_tg(struct throtl_data *td)
 
 	if (__tg) {
 		kfree(tg);
-		rcu_read_unlock();
 		return __tg;
 	}
 
@@ -365,7 +361,6 @@ static struct throtl_grp * throtl_get_tg(struct throtl_data *td)
 	}
 
 	throtl_init_add_tg_lists(td, tg, blkcg);
-	rcu_read_unlock();
 	return tg;
 }
 
@@ -1127,7 +1122,6 @@ bool blk_throtl_bio(struct request_queue *q, struct bio *bio)
 	 * basic fields like stats and io rates. If a group has no rules,
 	 * just update the dispatch stats in lockless manner and return.
 	 */
-
 	rcu_read_lock();
 	blkcg = task_blkio_cgroup(current);
 	tg = throtl_find_tg(td, blkcg);
@@ -1137,11 +1131,9 @@ bool blk_throtl_bio(struct request_queue *q, struct bio *bio)
 		if (tg_no_rule_group(tg, rw)) {
 			blkiocg_update_dispatch_stats(&tg->blkg, bio->bi_size,
 					rw, rw_is_sync(bio->bi_rw));
-			rcu_read_unlock();
-			goto out;
+			goto out_unlock_rcu;
 		}
 	}
-	rcu_read_unlock();
 
 	/*
 	 * Either group has not been allocated yet or it is not an unlimited
@@ -1199,6 +1191,8 @@ queue_bio:
 
 out_unlock:
 	spin_unlock_irq(q->queue_lock);
+out_unlock_rcu:
+	rcu_read_unlock();
 out:
 	return throttled;
 }
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index ee55019..9f6219f 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1128,13 +1128,10 @@ static struct cfq_group *cfq_get_cfqg(struct cfq_data *cfqd)
 	struct cfq_group *cfqg = NULL, *__cfqg = NULL;
 	struct request_queue *q = cfqd->queue;
 
-	rcu_read_lock();
 	blkcg = task_blkio_cgroup(current);
 	cfqg = cfq_find_cfqg(cfqd, blkcg);
-	if (cfqg) {
-		rcu_read_unlock();
+	if (cfqg)
 		return cfqg;
-	}
 
 	/*
 	 * Need to allocate a group. Allocation of group also needs allocation
@@ -1164,7 +1161,6 @@ static struct cfq_group *cfq_get_cfqg(struct cfq_data *cfqd)
 
 	if (__cfqg) {
 		kfree(cfqg);
-		rcu_read_unlock();
 		return __cfqg;
 	}
 
@@ -1172,7 +1168,6 @@ static struct cfq_group *cfq_get_cfqg(struct cfq_data *cfqd)
 		cfqg = &cfqd->root_group;
 
 	cfq_init_add_cfqg_lists(cfqd, cfqg, blkcg);
-	rcu_read_unlock();
 	return cfqg;
 }
 
@@ -2860,6 +2855,8 @@ cfq_find_alloc_queue(struct cfq_data *cfqd, bool is_sync,
 	struct cfq_group *cfqg;
 
 retry:
+	rcu_read_lock();
+
 	cfqg = cfq_get_cfqg(cfqd);
 	cic = cfq_cic_lookup(cfqd, ioc);
 	/* cic always exists here */
@@ -2875,6 +2872,7 @@ retry:
 			cfqq = new_cfqq;
 			new_cfqq = NULL;
 		} else if (gfp_mask & __GFP_WAIT) {
+			rcu_read_unlock();
 			spin_unlock_irq(cfqd->queue->queue_lock);
 			new_cfqq = kmem_cache_alloc_node(cfq_pool,
 					gfp_mask | __GFP_ZERO,
@@ -2900,6 +2898,7 @@ retry:
 	if (new_cfqq)
 		kmem_cache_free(cfq_pool, new_cfqq);
 
+	rcu_read_unlock();
 	return cfqq;
 }
 
-- 
1.7.7.3



* [PATCH 02/12] cfq: don't register propio policy if !CONFIG_CFQ_GROUP_IOSCHED
  2012-01-19  1:11 [PATCHSET] blkcg: kill policy node and blkg->dev Tejun Heo
  2012-01-19  1:11 ` [PATCH 01/12] blkcg: obtaining blkg should be enclosed inside rcu_read_lock() Tejun Heo
@ 2012-01-19  1:11 ` Tejun Heo
  2012-01-19 10:11   ` Vivek Goyal
  2012-01-19  1:11 ` [PATCH 03/12] elevator: clear auxiliary data earlier during elevator switch Tejun Heo
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 36+ messages in thread
From: Tejun Heo @ 2012-01-19  1:11 UTC (permalink / raw)
  To: axboe, vgoyal; +Cc: ctalbott, rni, linux-kernel, Tejun Heo

cfq has been registering a zeroed blkio_policy_cfq if
CFQ_GROUP_IOSCHED is disabled.  This fortunately doesn't collide with
blk-throtl as BLKIO_POLICY_PROP is zero, but it is unnecessary and
risky.  Just don't register it if not enabled.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
---
 block/cfq-iosched.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 9f6219f..5a2e670 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -3883,8 +3883,6 @@ static struct blkio_policy_type blkio_policy_cfq = {
 	},
 	.plid = BLKIO_POLICY_PROP,
 };
-#else
-static struct blkio_policy_type blkio_policy_cfq;
 #endif
 
 static int __init cfq_init(void)
@@ -3915,14 +3913,17 @@ static int __init cfq_init(void)
 		return ret;
 	}
 
+#ifdef CONFIG_CFQ_GROUP_IOSCHED
 	blkio_policy_register(&blkio_policy_cfq);
-
+#endif
 	return 0;
 }
 
 static void __exit cfq_exit(void)
 {
+#ifdef CONFIG_CFQ_GROUP_IOSCHED
 	blkio_policy_unregister(&blkio_policy_cfq);
+#endif
 	elv_unregister(&iosched_cfq);
 	kmem_cache_destroy(cfq_pool);
 }
-- 
1.7.7.3



* [PATCH 03/12] elevator: clear auxiliary data earlier during elevator switch
  2012-01-19  1:11 [PATCHSET] blkcg: kill policy node and blkg->dev Tejun Heo
  2012-01-19  1:11 ` [PATCH 01/12] blkcg: obtaining blkg should be enclosed inside rcu_read_lock() Tejun Heo
  2012-01-19  1:11 ` [PATCH 02/12] cfq: don't register propio policy if !CONFIG_CFQ_GROUP_IOSCHED Tejun Heo
@ 2012-01-19  1:11 ` Tejun Heo
  2012-01-19  1:11 ` [PATCH 04/12] elevator: make elevator_init_fn() return 0/-errno Tejun Heo
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Tejun Heo @ 2012-01-19  1:11 UTC (permalink / raw)
  To: axboe, vgoyal; +Cc: ctalbott, rni, linux-kernel, Tejun Heo

Elevator switch tries hard to keep as much context as possible until
the new elevator is ready so that it can revert to the original state
if initializing the new elevator fails for some reason.
Unfortunately, with more auxiliary contexts to manage, this makes the
elevator init and exit paths too complex and fragile.

This patch makes elevator_switch() unregister the current elevator and
flush icq's before it starts initializing the new one.  As we still
keep the old elevator itself, the only difference is that we lose
icq's on the rare occasions of switching failure, which isn't critical
at all.

Note that this makes the explicit elevator parameter to
elevator_init_queue() and __elv_register_queue() unnecessary as they
can always use the current elevator.

This patch enables block cgroup cleanups.
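
The resulting flow looks roughly like this (condensed from the hunk
below, error paths omitted):

  elv_quiesce_start(q);                    /* drain w/ elevator private data */
  if (registered)
          elv_unregister_queue(q);         /* old elevator leaves sysfs */
  ioc_clear_queue(q);                      /* flush icq's, under queue_lock */

  q->elevator = elevator_alloc(q, new_e);  /* bring up the new elevator */
  elevator_init_queue(q);
  if (registered)
          elv_register_queue(q);

  elevator_exit(old);                      /* old context kept until here */
  elv_quiesce_end(q);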

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
---
 block/elevator.c |   88 +++++++++++++++++++++++++++---------------------------
 1 files changed, 44 insertions(+), 44 deletions(-)

diff --git a/block/elevator.c b/block/elevator.c
index 91e18f8..42543e3 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -168,11 +168,10 @@ static struct elevator_type *elevator_get(const char *name)
 	return e;
 }
 
-static int elevator_init_queue(struct request_queue *q,
-			       struct elevator_queue *eq)
+static int elevator_init_queue(struct request_queue *q)
 {
-	eq->elevator_data = eq->type->ops.elevator_init_fn(q);
-	if (eq->elevator_data)
+	q->elevator->elevator_data = q->elevator->type->ops.elevator_init_fn(q);
+	if (q->elevator->elevator_data)
 		return 0;
 	return -ENOMEM;
 }
@@ -235,7 +234,6 @@ static void elevator_release(struct kobject *kobj)
 int elevator_init(struct request_queue *q, char *name)
 {
 	struct elevator_type *e = NULL;
-	struct elevator_queue *eq;
 	int err;
 
 	if (unlikely(q->elevator))
@@ -269,17 +267,16 @@ int elevator_init(struct request_queue *q, char *name)
 		}
 	}
 
-	eq = elevator_alloc(q, e);
-	if (!eq)
+	q->elevator = elevator_alloc(q, e);
+	if (!q->elevator)
 		return -ENOMEM;
 
-	err = elevator_init_queue(q, eq);
+	err = elevator_init_queue(q);
 	if (err) {
-		kobject_put(&eq->kobj);
+		kobject_put(&q->elevator->kobj);
 		return err;
 	}
 
-	q->elevator = eq;
 	return 0;
 }
 EXPORT_SYMBOL(elevator_init);
@@ -848,8 +845,9 @@ static struct kobj_type elv_ktype = {
 	.release	= elevator_release,
 };
 
-int __elv_register_queue(struct request_queue *q, struct elevator_queue *e)
+int elv_register_queue(struct request_queue *q)
 {
+	struct elevator_queue *e = q->elevator;
 	int error;
 
 	error = kobject_add(&e->kobj, &q->kobj, "%s", "iosched");
@@ -867,11 +865,6 @@ int __elv_register_queue(struct request_queue *q, struct elevator_queue *e)
 	}
 	return error;
 }
-
-int elv_register_queue(struct request_queue *q)
-{
-	return __elv_register_queue(q, q->elevator);
-}
 EXPORT_SYMBOL(elv_register_queue);
 
 void elv_unregister_queue(struct request_queue *q)
@@ -954,39 +947,47 @@ EXPORT_SYMBOL_GPL(elv_unregister);
  */
 static int elevator_switch(struct request_queue *q, struct elevator_type *new_e)
 {
-	struct elevator_queue *old_elevator, *e;
+	struct elevator_queue *old = q->elevator;
+	bool registered = old->registered;
 	int err;
 
-	/* allocate new elevator */
-	e = elevator_alloc(q, new_e);
-	if (!e)
-		return -ENOMEM;
-
-	err = elevator_init_queue(q, e);
-	if (err) {
-		kobject_put(&e->kobj);
-		return err;
-	}
-
-	/* turn on BYPASS and drain all requests w/ elevator private data */
+	/*
+	 * Turn on BYPASS and drain all requests w/ elevator private data.
+	 * Block layer doesn't call into a quiesced elevator - all requests
+	 * are directly put on the dispatch list without elevator data
+	 * using INSERT_BACK.  All requests have SOFTBARRIER set and no
+	 * merge happens either.
+	 */
 	elv_quiesce_start(q);
 
-	/* unregister old queue, register new one and kill old elevator */
-	if (q->elevator->registered) {
+	/* unregister and clear all auxiliary data of the old elevator */
+	if (registered)
 		elv_unregister_queue(q);
-		err = __elv_register_queue(q, e);
-		if (err)
-			goto fail_register;
-	}
 
-	/* done, clear io_cq's, switch elevators and turn off BYPASS */
 	spin_lock_irq(q->queue_lock);
 	ioc_clear_queue(q);
-	old_elevator = q->elevator;
-	q->elevator = e;
 	spin_unlock_irq(q->queue_lock);
 
-	elevator_exit(old_elevator);
+	/* allocate, init and register new elevator */
+	err = -ENOMEM;
+	q->elevator = elevator_alloc(q, new_e);
+	if (!q->elevator)
+		goto fail_init;
+
+	err = elevator_init_queue(q);
+	if (err) {
+		kobject_put(&q->elevator->kobj);
+		goto fail_init;
+	}
+
+	if (registered) {
+		err = elv_register_queue(q);
+		if (err)
+			goto fail_register;
+	}
+
+	/* done, kill the old one and finish */
+	elevator_exit(old);
 	elv_quiesce_end(q);
 
 	blk_add_trace_msg(q, "elv switch: %s", e->type->elevator_name);
@@ -994,11 +995,10 @@ static int elevator_switch(struct request_queue *q, struct elevator_type *new_e)
 	return 0;
 
 fail_register:
-	/*
-	 * switch failed, exit the new io scheduler and reattach the old
-	 * one again (along with re-adding the sysfs dir)
-	 */
-	elevator_exit(e);
+	elevator_exit(q->elevator);
+fail_init:
+	/* switch failed, restore and re-register old elevator */
+	q->elevator = old;
 	elv_register_queue(q);
 	elv_quiesce_end(q);
 
-- 
1.7.7.3



* [PATCH 04/12] elevator: make elevator_init_fn() return 0/-errno
  2012-01-19  1:11 [PATCHSET] blkcg: kill policy node and blkg->dev Tejun Heo
                   ` (2 preceding siblings ...)
  2012-01-19  1:11 ` [PATCH 03/12] elevator: clear auxiliary data earlier during elevator switch Tejun Heo
@ 2012-01-19  1:11 ` Tejun Heo
  2012-01-19 11:11   ` Vivek Goyal
  2012-01-19  1:11 ` [PATCH 05/12] blkcg: update blkg get functions take blkio_cgroup as parameter Tejun Heo
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 36+ messages in thread
From: Tejun Heo @ 2012-01-19  1:11 UTC (permalink / raw)
  To: axboe, vgoyal; +Cc: ctalbott, rni, linux-kernel, Tejun Heo

elevator_ops->elevator_init_fn() has a weird return value.  It
returns a void * which the caller should assign to
q->elevator->elevator_data, and a %NULL return denotes init failure.

Update it such that it returns 0/-errno and sets elevator_data
directly as necessary.

This makes the interface more conventional and eases further cleanup.
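
A converted init function ends up looking like the following
(illustrative sketch using a made-up "foo" elevator; the noop and
deadline hunks below are the real conversions):

  static int foo_init_queue(struct request_queue *q)
  {
          struct foo_data *fd;

          fd = kmalloc_node(sizeof(*fd), GFP_KERNEL | __GFP_ZERO, q->node);
          if (!fd)
                  return -ENOMEM;

          /* the init function now sets elevator_data itself */
          q->elevator->elevator_data = fd;
          return 0;
  }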

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
---
 block/cfq-iosched.c      |    9 +++++----
 block/deadline-iosched.c |    8 +++++---
 block/elevator.c         |   12 ++----------
 block/noop-iosched.c     |    8 +++++---
 include/linux/elevator.h |    2 +-
 5 files changed, 18 insertions(+), 21 deletions(-)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 5a2e670..fd39a44 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -3661,7 +3661,7 @@ static void cfq_exit_queue(struct elevator_queue *e)
 	kfree(cfqd);
 }
 
-static void *cfq_init_queue(struct request_queue *q)
+static int cfq_init_queue(struct request_queue *q)
 {
 	struct cfq_data *cfqd;
 	int i, j;
@@ -3670,7 +3670,7 @@ static void *cfq_init_queue(struct request_queue *q)
 
 	cfqd = kmalloc_node(sizeof(*cfqd), GFP_KERNEL | __GFP_ZERO, q->node);
 	if (!cfqd)
-		return NULL;
+		return -ENOMEM;
 
 	/* Init root service tree */
 	cfqd->grp_service_tree = CFQ_RB_ROOT;
@@ -3697,7 +3697,7 @@ static void *cfq_init_queue(struct request_queue *q)
 	if (blkio_alloc_blkg_stats(&cfqg->blkg)) {
 		kfree(cfqg);
 		kfree(cfqd);
-		return NULL;
+		return -ENOMEM;
 	}
 
 	rcu_read_lock();
@@ -3728,6 +3728,7 @@ static void *cfq_init_queue(struct request_queue *q)
 	cfq_link_cfqq_cfqg(&cfqd->oom_cfqq, &cfqd->root_group);
 
 	cfqd->queue = q;
+	q->elevator->elevator_data = cfqd;
 
 	init_timer(&cfqd->idle_slice_timer);
 	cfqd->idle_slice_timer.function = cfq_idle_slice_timer;
@@ -3752,7 +3753,7 @@ static void *cfq_init_queue(struct request_queue *q)
 	 * second, in order to have larger depth for async operations.
 	 */
 	cfqd->last_delayed_sync = jiffies - HZ;
-	return cfqd;
+	return 0;
 }
 
 /*
diff --git a/block/deadline-iosched.c b/block/deadline-iosched.c
index 7bf12d7..599b12e 100644
--- a/block/deadline-iosched.c
+++ b/block/deadline-iosched.c
@@ -337,13 +337,13 @@ static void deadline_exit_queue(struct elevator_queue *e)
 /*
  * initialize elevator private data (deadline_data).
  */
-static void *deadline_init_queue(struct request_queue *q)
+static int deadline_init_queue(struct request_queue *q)
 {
 	struct deadline_data *dd;
 
 	dd = kmalloc_node(sizeof(*dd), GFP_KERNEL | __GFP_ZERO, q->node);
 	if (!dd)
-		return NULL;
+		return -ENOMEM;
 
 	INIT_LIST_HEAD(&dd->fifo_list[READ]);
 	INIT_LIST_HEAD(&dd->fifo_list[WRITE]);
@@ -354,7 +354,9 @@ static void *deadline_init_queue(struct request_queue *q)
 	dd->writes_starved = writes_starved;
 	dd->front_merges = 1;
 	dd->fifo_batch = fifo_batch;
-	return dd;
+
+	q->elevator->elevator_data = dd;
+	return 0;
 }
 
 /*
diff --git a/block/elevator.c b/block/elevator.c
index 42543e3..8dc65b1 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -168,14 +168,6 @@ static struct elevator_type *elevator_get(const char *name)
 	return e;
 }
 
-static int elevator_init_queue(struct request_queue *q)
-{
-	q->elevator->elevator_data = q->elevator->type->ops.elevator_init_fn(q);
-	if (q->elevator->elevator_data)
-		return 0;
-	return -ENOMEM;
-}
-
 static char chosen_elevator[ELV_NAME_MAX];
 
 static int __init elevator_setup(char *str)
@@ -271,7 +263,7 @@ int elevator_init(struct request_queue *q, char *name)
 	if (!q->elevator)
 		return -ENOMEM;
 
-	err = elevator_init_queue(q);
+	err = e->ops.elevator_init_fn(q);
 	if (err) {
 		kobject_put(&q->elevator->kobj);
 		return err;
@@ -974,7 +966,7 @@ static int elevator_switch(struct request_queue *q, struct elevator_type *new_e)
 	if (!q->elevator)
 		goto fail_init;
 
-	err = elevator_init_queue(q);
+	err = new_e->ops.elevator_init_fn(q);
 	if (err) {
 		kobject_put(&q->elevator->kobj);
 		goto fail_init;
diff --git a/block/noop-iosched.c b/block/noop-iosched.c
index 413a0b1..5d1bf70 100644
--- a/block/noop-iosched.c
+++ b/block/noop-iosched.c
@@ -59,15 +59,17 @@ noop_latter_request(struct request_queue *q, struct request *rq)
 	return list_entry(rq->queuelist.next, struct request, queuelist);
 }
 
-static void *noop_init_queue(struct request_queue *q)
+static int noop_init_queue(struct request_queue *q)
 {
 	struct noop_data *nd;
 
 	nd = kmalloc_node(sizeof(*nd), GFP_KERNEL, q->node);
 	if (!nd)
-		return NULL;
+		return -ENOMEM;
+
 	INIT_LIST_HEAD(&nd->queue);
-	return nd;
+	q->elevator->elevator_data = nd;
+	return 0;
 }
 
 static void noop_exit_queue(struct elevator_queue *e)
diff --git a/include/linux/elevator.h b/include/linux/elevator.h
index c24f3d7..d7726c9 100644
--- a/include/linux/elevator.h
+++ b/include/linux/elevator.h
@@ -33,7 +33,7 @@ typedef void (elevator_put_req_fn) (struct request *);
 typedef void (elevator_activate_req_fn) (struct request_queue *, struct request *);
 typedef void (elevator_deactivate_req_fn) (struct request_queue *, struct request *);
 
-typedef void *(elevator_init_fn) (struct request_queue *);
+typedef int (elevator_init_fn) (struct request_queue *);
 typedef void (elevator_exit_fn) (struct elevator_queue *);
 
 struct elevator_ops
-- 
1.7.7.3



* [PATCH 05/12] blkcg: update blkg get functions take blkio_cgroup as parameter
  2012-01-19  1:11 [PATCHSET] blkcg: kill policy node and blkg->dev Tejun Heo
                   ` (3 preceding siblings ...)
  2012-01-19  1:11 ` [PATCH 04/12] elevator: make elevator_init_fn() return 0/-errno Tejun Heo
@ 2012-01-19  1:11 ` Tejun Heo
  2012-01-19 11:21   ` Vivek Goyal
  2012-01-19  1:11 ` [PATCH 06/12] blkcg: use q and plid instead of opaque void * for blkio_group association Tejun Heo
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 36+ messages in thread
From: Tejun Heo @ 2012-01-19  1:11 UTC (permalink / raw)
  To: axboe, vgoyal; +Cc: ctalbott, rni, linux-kernel, Tejun Heo

In both blkg get functions - throtl_get_tg() and cfq_get_cfqg() -
instead of obtaining the blkcg of %current explicitly, let the caller
specify the blkcg to use as a parameter and make both functions hold
on to the blkcg.

This is part of the block cgroup interface cleanup and will help make
the blkcg API more modular.
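
The "hold on to the blkcg" part boils down to the following pattern
around the blocking allocation (sketch condensed from the throtl hunk
below):

  if (!css_tryget(&blkcg->css))            /* pin blkcg before unlocking */
          return NULL;

  spin_unlock_irq(q->queue_lock);
  rcu_read_unlock();

  tg = throtl_alloc_tg(td);                /* may sleep */

  rcu_read_lock();
  spin_lock_irq(q->queue_lock);
  css_put(&blkcg->css);                    /* locks re-held, drop the ref */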

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
---
 block/blk-throttle.c |   16 +++++++---------
 block/cfq-iosched.c  |   20 ++++++++++++--------
 2 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index ae53056..dfd950c 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -303,21 +303,23 @@ throtl_grp *throtl_find_tg(struct throtl_data *td, struct blkio_cgroup *blkcg)
 	return tg;
 }
 
-static struct throtl_grp * throtl_get_tg(struct throtl_data *td)
+static struct throtl_grp *throtl_get_tg(struct throtl_data *td,
+					struct blkio_cgroup *blkcg)
 {
 	struct throtl_grp *tg = NULL, *__tg = NULL;
-	struct blkio_cgroup *blkcg;
 	struct request_queue *q = td->queue;
 
 	/* no throttling for dead queue */
 	if (unlikely(blk_queue_dead(q)))
 		return NULL;
 
-	blkcg = task_blkio_cgroup(current);
 	tg = throtl_find_tg(td, blkcg);
 	if (tg)
 		return tg;
 
+	if (!css_tryget(&blkcg->css))
+		return NULL;
+
 	/*
 	 * Need to allocate a group. Allocation of group also needs allocation
 	 * of per cpu stats which in-turn takes a mutex() and can block. Hence
@@ -331,6 +333,7 @@ static struct throtl_grp * throtl_get_tg(struct throtl_data *td)
 	/* Group allocated and queue is still alive. take the lock */
 	rcu_read_lock();
 	spin_lock_irq(q->queue_lock);
+	css_put(&blkcg->css);
 
 	/* Make sure @q is still alive */
 	if (unlikely(blk_queue_dead(q))) {
@@ -339,11 +342,6 @@ static struct throtl_grp * throtl_get_tg(struct throtl_data *td)
 	}
 
 	/*
-	 * Initialize the new group. After sleeping, read the blkcg again.
-	 */
-	blkcg = task_blkio_cgroup(current);
-
-	/*
 	 * If some other thread already allocated the group while we were
 	 * not holding queue lock, free up the group
 	 */
@@ -1140,7 +1138,7 @@ bool blk_throtl_bio(struct request_queue *q, struct bio *bio)
 	 * IO group
 	 */
 	spin_lock_irq(q->queue_lock);
-	tg = throtl_get_tg(td);
+	tg = throtl_get_tg(td, blkcg);
 	if (unlikely(!tg))
 		goto out_unlock;
 
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index fd39a44..a4c1154 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1122,17 +1122,19 @@ cfq_find_cfqg(struct cfq_data *cfqd, struct blkio_cgroup *blkcg)
  * Search for the cfq group current task belongs to. request_queue lock must
  * be held.
  */
-static struct cfq_group *cfq_get_cfqg(struct cfq_data *cfqd)
+static struct cfq_group *cfq_get_cfqg(struct cfq_data *cfqd,
+				      struct blkio_cgroup *blkcg)
 {
-	struct blkio_cgroup *blkcg;
 	struct cfq_group *cfqg = NULL, *__cfqg = NULL;
 	struct request_queue *q = cfqd->queue;
 
-	blkcg = task_blkio_cgroup(current);
 	cfqg = cfq_find_cfqg(cfqd, blkcg);
 	if (cfqg)
 		return cfqg;
 
+	if (!css_tryget(&blkcg->css))
+		return NULL;
+
 	/*
 	 * Need to allocate a group. Allocation of group also needs allocation
 	 * of per cpu stats which in-turn takes a mutex() and can block. Hence
@@ -1142,16 +1144,14 @@ static struct cfq_group *cfq_get_cfqg(struct cfq_data *cfqd)
 	 * around by the time we return. CFQ queue allocation code does
 	 * the same. It might be racy though.
 	 */
-
 	rcu_read_unlock();
 	spin_unlock_irq(q->queue_lock);
 
 	cfqg = cfq_alloc_cfqg(cfqd);
 
 	spin_lock_irq(q->queue_lock);
-
 	rcu_read_lock();
-	blkcg = task_blkio_cgroup(current);
+	css_put(&blkcg->css);
 
 	/*
 	 * If some other thread already allocated the group while we were
@@ -1261,7 +1261,8 @@ static void cfq_unlink_blkio_group(void *key, struct blkio_group *blkg)
 }
 
 #else /* GROUP_IOSCHED */
-static struct cfq_group *cfq_get_cfqg(struct cfq_data *cfqd)
+static struct cfq_group *cfq_get_cfqg(struct cfq_data *cfqd,
+				      struct blkio_cgroup *blkcg)
 {
 	return &cfqd->root_group;
 }
@@ -2850,6 +2851,7 @@ static struct cfq_queue *
 cfq_find_alloc_queue(struct cfq_data *cfqd, bool is_sync,
 		     struct io_context *ioc, gfp_t gfp_mask)
 {
+	struct blkio_cgroup *blkcg;
 	struct cfq_queue *cfqq, *new_cfqq = NULL;
 	struct cfq_io_cq *cic;
 	struct cfq_group *cfqg;
@@ -2857,7 +2859,9 @@ cfq_find_alloc_queue(struct cfq_data *cfqd, bool is_sync,
 retry:
 	rcu_read_lock();
 
-	cfqg = cfq_get_cfqg(cfqd);
+	blkcg = task_blkio_cgroup(current);
+
+	cfqg = cfq_get_cfqg(cfqd, blkcg);
 	cic = cfq_cic_lookup(cfqd, ioc);
 	/* cic always exists here */
 	cfqq = cic_to_cfqq(cic, is_sync);
-- 
1.7.7.3



* [PATCH 06/12] blkcg: use q and plid instead of opaque void * for blkio_group association
  2012-01-19  1:11 [PATCHSET] blkcg: kill policy node and blkg->dev Tejun Heo
                   ` (4 preceding siblings ...)
  2012-01-19  1:11 ` [PATCH 05/12] blkcg: update blkg get functions take blkio_cgroup as parameter Tejun Heo
@ 2012-01-19  1:11 ` Tejun Heo
  2012-01-19 14:04   ` Vivek Goyal
  2012-01-19  1:11 ` [PATCH 07/12] blkcg: add blkio_policy[] array and allow one policy per policy ID Tejun Heo
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 36+ messages in thread
From: Tejun Heo @ 2012-01-19  1:11 UTC (permalink / raw)
  To: axboe, vgoyal; +Cc: ctalbott, rni, linux-kernel, Tejun Heo

blkio_group is an association between a block cgroup and a queue for
a given policy.  Using an opaque void * for the association makes
things confusing and hinders factoring out of common code.  Use
request_queue * and, if necessary, the policy id instead.

This will help block cgroup API cleanup.
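
The association lookup becomes explicit about what it's keyed on,
e.g. (sketch):

  /* before: keyed by an opaque pointer only the policy understands */
  blkg = blkiocg_lookup_group(blkcg, key);

  /* after: keyed by the request_queue and the policy id */
  blkg = blkiocg_lookup_group(blkcg, q, BLKIO_POLICY_THROTL);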

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
---
 block/blk-cgroup.c   |   32 +++++++++++++++-----------------
 block/blk-cgroup.h   |   23 ++++++++++++-----------
 block/blk-throttle.c |   50 +++++++++++++++++++++++---------------------------
 block/cfq-iosched.c  |   30 ++++++++++++++++--------------
 block/cfq.h          |    7 ++++---
 5 files changed, 70 insertions(+), 72 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index fa8f263..2babed7 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -131,7 +131,7 @@ blkio_update_group_weight(struct blkio_group *blkg, unsigned int weight)
 		if (blkiop->plid != blkg->plid)
 			continue;
 		if (blkiop->ops.blkio_update_group_weight_fn)
-			blkiop->ops.blkio_update_group_weight_fn(blkg->key,
+			blkiop->ops.blkio_update_group_weight_fn(blkg->q,
 							blkg, weight);
 	}
 }
@@ -149,12 +149,12 @@ static inline void blkio_update_group_bps(struct blkio_group *blkg, u64 bps,
 
 		if (fileid == BLKIO_THROTL_read_bps_device
 		    && blkiop->ops.blkio_update_group_read_bps_fn)
-			blkiop->ops.blkio_update_group_read_bps_fn(blkg->key,
+			blkiop->ops.blkio_update_group_read_bps_fn(blkg->q,
 								blkg, bps);
 
 		if (fileid == BLKIO_THROTL_write_bps_device
 		    && blkiop->ops.blkio_update_group_write_bps_fn)
-			blkiop->ops.blkio_update_group_write_bps_fn(blkg->key,
+			blkiop->ops.blkio_update_group_write_bps_fn(blkg->q,
 								blkg, bps);
 	}
 }
@@ -172,12 +172,12 @@ static inline void blkio_update_group_iops(struct blkio_group *blkg,
 
 		if (fileid == BLKIO_THROTL_read_iops_device
 		    && blkiop->ops.blkio_update_group_read_iops_fn)
-			blkiop->ops.blkio_update_group_read_iops_fn(blkg->key,
+			blkiop->ops.blkio_update_group_read_iops_fn(blkg->q,
 								blkg, iops);
 
 		if (fileid == BLKIO_THROTL_write_iops_device
 		    && blkiop->ops.blkio_update_group_write_iops_fn)
-			blkiop->ops.blkio_update_group_write_iops_fn(blkg->key,
+			blkiop->ops.blkio_update_group_write_iops_fn(blkg->q,
 								blkg,iops);
 	}
 }
@@ -480,14 +480,14 @@ int blkio_alloc_blkg_stats(struct blkio_group *blkg)
 EXPORT_SYMBOL_GPL(blkio_alloc_blkg_stats);
 
 void blkiocg_add_blkio_group(struct blkio_cgroup *blkcg,
-		struct blkio_group *blkg, void *key, dev_t dev,
+		struct blkio_group *blkg, struct request_queue *q, dev_t dev,
 		enum blkio_policy_id plid)
 {
 	unsigned long flags;
 
 	spin_lock_irqsave(&blkcg->lock, flags);
 	spin_lock_init(&blkg->stats_lock);
-	rcu_assign_pointer(blkg->key, key);
+	rcu_assign_pointer(blkg->q, q);
 	blkg->blkcg_id = css_id(&blkcg->css);
 	hlist_add_head_rcu(&blkg->blkcg_node, &blkcg->blkg_list);
 	blkg->plid = plid;
@@ -533,18 +533,16 @@ int blkiocg_del_blkio_group(struct blkio_group *blkg)
 EXPORT_SYMBOL_GPL(blkiocg_del_blkio_group);
 
 /* called under rcu_read_lock(). */
-struct blkio_group *blkiocg_lookup_group(struct blkio_cgroup *blkcg, void *key)
+struct blkio_group *blkiocg_lookup_group(struct blkio_cgroup *blkcg,
+					 struct request_queue *q,
+					 enum blkio_policy_id plid)
 {
 	struct blkio_group *blkg;
 	struct hlist_node *n;
-	void *__key;
 
-	hlist_for_each_entry_rcu(blkg, n, &blkcg->blkg_list, blkcg_node) {
-		__key = blkg->key;
-		if (__key == key)
+	hlist_for_each_entry_rcu(blkg, n, &blkcg->blkg_list, blkcg_node)
+		if (blkg->q == q && blkg->plid == plid)
 			return blkg;
-	}
-
 	return NULL;
 }
 EXPORT_SYMBOL_GPL(blkiocg_lookup_group);
@@ -1553,7 +1551,7 @@ static void blkiocg_destroy(struct cgroup_subsys *subsys, struct cgroup *cgroup)
 	struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgroup);
 	unsigned long flags;
 	struct blkio_group *blkg;
-	void *key;
+	struct request_queue *q;
 	struct blkio_policy_type *blkiop;
 	struct blkio_policy_node *pn, *pntmp;
 
@@ -1568,7 +1566,7 @@ static void blkiocg_destroy(struct cgroup_subsys *subsys, struct cgroup *cgroup)
 
 		blkg = hlist_entry(blkcg->blkg_list.first, struct blkio_group,
 					blkcg_node);
-		key = rcu_dereference(blkg->key);
+		q = rcu_dereference(blkg->q);
 		__blkiocg_del_blkio_group(blkg);
 
 		spin_unlock_irqrestore(&blkcg->lock, flags);
@@ -1582,7 +1580,7 @@ static void blkiocg_destroy(struct cgroup_subsys *subsys, struct cgroup *cgroup)
 		list_for_each_entry(blkiop, &blkio_list, list) {
 			if (blkiop->plid != blkg->plid)
 				continue;
-			blkiop->ops.blkio_unlink_group_fn(key, blkg);
+			blkiop->ops.blkio_unlink_group_fn(q, blkg);
 		}
 		spin_unlock(&blkio_list_lock);
 	} while (1);
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index 6f3ace7..599cc65 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -159,8 +159,8 @@ struct blkio_group_stats_cpu {
 };
 
 struct blkio_group {
-	/* An rcu protected unique identifier for the group */
-	void *key;
+	/* Pointer to the associated request_queue, RCU protected */
+	struct request_queue __rcu *q;
 	struct hlist_node blkcg_node;
 	unsigned short blkcg_id;
 	/* Store cgroup path */
@@ -208,17 +208,17 @@ extern unsigned int blkcg_get_read_iops(struct blkio_cgroup *blkcg,
 extern unsigned int blkcg_get_write_iops(struct blkio_cgroup *blkcg,
 				     dev_t dev);
 
-typedef void (blkio_unlink_group_fn) (void *key, struct blkio_group *blkg);
-
-typedef void (blkio_update_group_weight_fn) (void *key,
+typedef void (blkio_unlink_group_fn)(struct request_queue *q,
+			struct blkio_group *blkg);
+typedef void (blkio_update_group_weight_fn)(struct request_queue *q,
 			struct blkio_group *blkg, unsigned int weight);
-typedef void (blkio_update_group_read_bps_fn) (void * key,
+typedef void (blkio_update_group_read_bps_fn)(struct request_queue *q,
 			struct blkio_group *blkg, u64 read_bps);
-typedef void (blkio_update_group_write_bps_fn) (void *key,
+typedef void (blkio_update_group_write_bps_fn)(struct request_queue *q,
 			struct blkio_group *blkg, u64 write_bps);
-typedef void (blkio_update_group_read_iops_fn) (void *key,
+typedef void (blkio_update_group_read_iops_fn)(struct request_queue *q,
 			struct blkio_group *blkg, unsigned int read_iops);
-typedef void (blkio_update_group_write_iops_fn) (void *key,
+typedef void (blkio_update_group_write_iops_fn)(struct request_queue *q,
 			struct blkio_group *blkg, unsigned int write_iops);
 
 struct blkio_policy_ops {
@@ -308,12 +308,13 @@ extern struct blkio_cgroup blkio_root_cgroup;
 extern struct blkio_cgroup *cgroup_to_blkio_cgroup(struct cgroup *cgroup);
 extern struct blkio_cgroup *task_blkio_cgroup(struct task_struct *tsk);
 extern void blkiocg_add_blkio_group(struct blkio_cgroup *blkcg,
-	struct blkio_group *blkg, void *key, dev_t dev,
+	struct blkio_group *blkg, struct request_queue *q, dev_t dev,
 	enum blkio_policy_id plid);
 extern int blkio_alloc_blkg_stats(struct blkio_group *blkg);
 extern int blkiocg_del_blkio_group(struct blkio_group *blkg);
 extern struct blkio_group *blkiocg_lookup_group(struct blkio_cgroup *blkcg,
-						void *key);
+						struct request_queue *q,
+						enum blkio_policy_id plid);
 void blkiocg_update_timeslice_used(struct blkio_group *blkg,
 					unsigned long time,
 					unsigned long unaccounted_time);
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index dfd950c..8314da4 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -252,7 +252,7 @@ static void throtl_init_add_tg_lists(struct throtl_data *td,
 	__throtl_tg_fill_dev_details(td, tg);
 
 	/* Add group onto cgroup list */
-	blkiocg_add_blkio_group(blkcg, &tg->blkg, (void *)td,
+	blkiocg_add_blkio_group(blkcg, &tg->blkg, td->queue,
 				tg->blkg.dev, BLKIO_POLICY_THROTL);
 
 	tg->bps[READ] = blkcg_get_read_bps(blkcg, tg->blkg.dev);
@@ -288,7 +288,6 @@ static struct
 throtl_grp *throtl_find_tg(struct throtl_data *td, struct blkio_cgroup *blkcg)
 {
 	struct throtl_grp *tg = NULL;
-	void *key = td;
 
 	/*
 	 * This is the common case when there are no blkio cgroups.
@@ -297,7 +296,8 @@ throtl_grp *throtl_find_tg(struct throtl_data *td, struct blkio_cgroup *blkcg)
 	if (blkcg == &blkio_root_cgroup)
 		tg = td->root_tg;
 	else
-		tg = tg_of_blkg(blkiocg_lookup_group(blkcg, key));
+		tg = tg_of_blkg(blkiocg_lookup_group(blkcg, td->queue,
+						     BLKIO_POLICY_THROTL));
 
 	__throtl_tg_fill_dev_details(td, tg);
 	return tg;
@@ -1004,22 +1004,22 @@ static void throtl_release_tgs(struct throtl_data *td)
  * no new IO will come in this group. So get rid of this group as soon as
  * any pending IO in the group is finished.
  *
- * This function is called under rcu_read_lock(). key is the rcu protected
- * pointer. That means "key" is a valid throtl_data pointer as long as we are
- * rcu read lock.
+ * This function is called under rcu_read_lock(). @q is the rcu protected
+ * pointer. That means @q is a valid request_queue pointer as long as we
+ * are rcu read lock.
  *
- * "key" was fetched from blkio_group under blkio_cgroup->lock. That means
+ * @q was fetched from blkio_group under blkio_cgroup->lock. That means
  * it should not be NULL as even if queue was going away, cgroup deltion
  * path got to it first.
  */
-void throtl_unlink_blkio_group(void *key, struct blkio_group *blkg)
+void throtl_unlink_blkio_group(struct request_queue *q,
+			       struct blkio_group *blkg)
 {
 	unsigned long flags;
-	struct throtl_data *td = key;
 
-	spin_lock_irqsave(td->queue->queue_lock, flags);
-	throtl_destroy_tg(td, tg_of_blkg(blkg));
-	spin_unlock_irqrestore(td->queue->queue_lock, flags);
+	spin_lock_irqsave(q->queue_lock, flags);
+	throtl_destroy_tg(q->td, tg_of_blkg(blkg));
+	spin_unlock_irqrestore(q->queue_lock, flags);
 }
 
 static void throtl_update_blkio_group_common(struct throtl_data *td,
@@ -1032,52 +1032,48 @@ static void throtl_update_blkio_group_common(struct throtl_data *td,
 }
 
 /*
- * For all update functions, key should be a valid pointer because these
+ * For all update functions, @q should be a valid pointer because these
  * update functions are called under blkcg_lock, that means, blkg is
- * valid and in turn key is valid. queue exit path can not race because
+ * valid and in turn @q is valid. queue exit path can not race because
  * of blkcg_lock
  *
  * Can not take queue lock in update functions as queue lock under blkcg_lock
  * is not allowed. Under other paths we take blkcg_lock under queue_lock.
  */
-static void throtl_update_blkio_group_read_bps(void *key,
+static void throtl_update_blkio_group_read_bps(struct request_queue *q,
 				struct blkio_group *blkg, u64 read_bps)
 {
-	struct throtl_data *td = key;
 	struct throtl_grp *tg = tg_of_blkg(blkg);
 
 	tg->bps[READ] = read_bps;
-	throtl_update_blkio_group_common(td, tg);
+	throtl_update_blkio_group_common(q->td, tg);
 }
 
-static void throtl_update_blkio_group_write_bps(void *key,
+static void throtl_update_blkio_group_write_bps(struct request_queue *q,
 				struct blkio_group *blkg, u64 write_bps)
 {
-	struct throtl_data *td = key;
 	struct throtl_grp *tg = tg_of_blkg(blkg);
 
 	tg->bps[WRITE] = write_bps;
-	throtl_update_blkio_group_common(td, tg);
+	throtl_update_blkio_group_common(q->td, tg);
 }
 
-static void throtl_update_blkio_group_read_iops(void *key,
+static void throtl_update_blkio_group_read_iops(struct request_queue *q,
 			struct blkio_group *blkg, unsigned int read_iops)
 {
-	struct throtl_data *td = key;
 	struct throtl_grp *tg = tg_of_blkg(blkg);
 
 	tg->iops[READ] = read_iops;
-	throtl_update_blkio_group_common(td, tg);
+	throtl_update_blkio_group_common(q->td, tg);
 }
 
-static void throtl_update_blkio_group_write_iops(void *key,
+static void throtl_update_blkio_group_write_iops(struct request_queue *q,
 			struct blkio_group *blkg, unsigned int write_iops)
 {
-	struct throtl_data *td = key;
 	struct throtl_grp *tg = tg_of_blkg(blkg);
 
 	tg->iops[WRITE] = write_iops;
-	throtl_update_blkio_group_common(td, tg);
+	throtl_update_blkio_group_common(q->td, tg);
 }
 
 static void throtl_shutdown_wq(struct request_queue *q)
@@ -1283,7 +1279,7 @@ void blk_throtl_exit(struct request_queue *q)
 	spin_unlock_irq(q->queue_lock);
 
 	/*
-	 * Wait for tg->blkg->key accessors to exit their grace periods.
+	 * Wait for tg->blkg->q accessors to exit their grace periods.
 	 * Do this wait only if there are other undestroyed groups out
 	 * there (other than root group). This can happen if cgroup deletion
 	 * path claimed the responsibility of cleaning up a group before
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index a4c1154..511c0f1 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1020,7 +1020,8 @@ static inline struct cfq_group *cfqg_of_blkg(struct blkio_group *blkg)
 	return NULL;
 }
 
-static void cfq_update_blkio_group_weight(void *key, struct blkio_group *blkg,
+static void cfq_update_blkio_group_weight(struct request_queue *q,
+					  struct blkio_group *blkg,
 					  unsigned int weight)
 {
 	struct cfq_group *cfqg = cfqg_of_blkg(blkg);
@@ -1043,10 +1044,10 @@ static void cfq_init_add_cfqg_lists(struct cfq_data *cfqd,
 	if (bdi->dev) {
 		sscanf(dev_name(bdi->dev), "%u:%u", &major, &minor);
 		cfq_blkiocg_add_blkio_group(blkcg, &cfqg->blkg,
-					(void *)cfqd, MKDEV(major, minor));
+					cfqd->queue, MKDEV(major, minor));
 	} else
 		cfq_blkiocg_add_blkio_group(blkcg, &cfqg->blkg,
-					(void *)cfqd, 0);
+					cfqd->queue, 0);
 
 	cfqd->nr_blkcg_linked_grps++;
 	cfqg->weight = blkcg_get_weight(blkcg, cfqg->blkg.dev);
@@ -1097,7 +1098,6 @@ static struct cfq_group *
 cfq_find_cfqg(struct cfq_data *cfqd, struct blkio_cgroup *blkcg)
 {
 	struct cfq_group *cfqg = NULL;
-	void *key = cfqd;
 	struct backing_dev_info *bdi = &cfqd->queue->backing_dev_info;
 	unsigned int major, minor;
 
@@ -1108,7 +1108,8 @@ cfq_find_cfqg(struct cfq_data *cfqd, struct blkio_cgroup *blkcg)
 	if (blkcg == &blkio_root_cgroup)
 		cfqg = &cfqd->root_group;
 	else
-		cfqg = cfqg_of_blkg(blkiocg_lookup_group(blkcg, key));
+		cfqg = cfqg_of_blkg(blkiocg_lookup_group(blkcg, cfqd->queue,
+							 BLKIO_POLICY_PROP));
 
 	if (cfqg && !cfqg->blkg.dev && bdi->dev && dev_name(bdi->dev)) {
 		sscanf(dev_name(bdi->dev), "%u:%u", &major, &minor);
@@ -1243,21 +1244,22 @@ static void cfq_release_cfq_groups(struct cfq_data *cfqd)
  * any pending IO in the group is finished.
  *
  * This function is called under rcu_read_lock(). key is the rcu protected
- * pointer. That means "key" is a valid cfq_data pointer as long as we are rcu
- * read lock.
+ * pointer. That means @q is a valid request_queue pointer as long as we
+ * are rcu read lock.
  *
- * "key" was fetched from blkio_group under blkio_cgroup->lock. That means
+ * @q was fetched from blkio_group under blkio_cgroup->lock. That means
  * it should not be NULL as even if elevator was exiting, cgroup deltion
  * path got to it first.
  */
-static void cfq_unlink_blkio_group(void *key, struct blkio_group *blkg)
+static void cfq_unlink_blkio_group(struct request_queue *q,
+				   struct blkio_group *blkg)
 {
-	unsigned long  flags;
-	struct cfq_data *cfqd = key;
+	struct cfq_data *cfqd = q->elevator->elevator_data;
+	unsigned long flags;
 
-	spin_lock_irqsave(cfqd->queue->queue_lock, flags);
+	spin_lock_irqsave(q->queue_lock, flags);
 	cfq_destroy_cfqg(cfqd, cfqg_of_blkg(blkg));
-	spin_unlock_irqrestore(cfqd->queue->queue_lock, flags);
+	spin_unlock_irqrestore(q->queue_lock, flags);
 }
 
 #else /* GROUP_IOSCHED */
@@ -3707,7 +3709,7 @@ static int cfq_init_queue(struct request_queue *q)
 	rcu_read_lock();
 
 	cfq_blkiocg_add_blkio_group(&blkio_root_cgroup, &cfqg->blkg,
-					(void *)cfqd, 0);
+				    cfqd->queue, 0);
 	rcu_read_unlock();
 	cfqd->nr_blkcg_linked_grps++;
 
diff --git a/block/cfq.h b/block/cfq.h
index 2a15592..343b78a 100644
--- a/block/cfq.h
+++ b/block/cfq.h
@@ -68,8 +68,9 @@ static inline void cfq_blkiocg_update_completion_stats(struct blkio_group *blkg,
 }
 
 static inline void cfq_blkiocg_add_blkio_group(struct blkio_cgroup *blkcg,
-			struct blkio_group *blkg, void *key, dev_t dev) {
-	blkiocg_add_blkio_group(blkcg, blkg, key, dev, BLKIO_POLICY_PROP);
+		struct blkio_group *blkg, struct request_queue *q, dev_t dev)
+{
+	blkiocg_add_blkio_group(blkcg, blkg, q, dev, BLKIO_POLICY_PROP);
 }
 
 static inline int cfq_blkiocg_del_blkio_group(struct blkio_group *blkg)
@@ -105,7 +106,7 @@ static inline void cfq_blkiocg_update_dispatch_stats(struct blkio_group *blkg,
 static inline void cfq_blkiocg_update_completion_stats(struct blkio_group *blkg, uint64_t start_time, uint64_t io_start_time, bool direction, bool sync) {}
 
 static inline void cfq_blkiocg_add_blkio_group(struct blkio_cgroup *blkcg,
-			struct blkio_group *blkg, void *key, dev_t dev) {}
+		struct blkio_group *blkg, struct request_queue *q, dev_t dev) {}
 static inline int cfq_blkiocg_del_blkio_group(struct blkio_group *blkg)
 {
 	return 0;
-- 
1.7.7.3



* [PATCH 07/12] blkcg: add blkio_policy[] array and allow one policy per policy ID
  2012-01-19  1:11 [PATCHSET] blkcg: kill policy node and blkg->dev Tejun Heo
                   ` (5 preceding siblings ...)
  2012-01-19  1:11 ` [PATCH 06/12] blkcg: use q and plid instead of opaque void * for blkio_group association Tejun Heo
@ 2012-01-19  1:11 ` Tejun Heo
  2012-01-19  1:11 ` [PATCH 08/12] blkcg: use the usual get blkg path for root blkio_group Tejun Heo
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Tejun Heo @ 2012-01-19  1:11 UTC (permalink / raw)
  To: axboe, vgoyal; +Cc: ctalbott, rni, linux-kernel, Tejun Heo

Block cgroup policies are maintained in a linked list and,
theoretically, multiple policies sharing the same policy ID are
allowed.

This patch temporarily restricts registration to one policy per plid
and adds the blkio_policy[] array which indexes registered policy
types by plid.  Both the restriction and the blkio_policy[] array are
transitional and will be removed once the API cleanup is complete.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
---
 block/blk-cgroup.c |   10 ++++++++++
 block/blk-cgroup.h |    2 ++
 2 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 2babed7..940586b 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -28,6 +28,8 @@ static LIST_HEAD(blkio_list);
 struct blkio_cgroup blkio_root_cgroup = { .weight = 2*BLKIO_WEIGHT_DEFAULT };
 EXPORT_SYMBOL_GPL(blkio_root_cgroup);
 
+static struct blkio_policy_type *blkio_policy[BLKIO_NR_POLICIES];
+
 static struct cgroup_subsys_state *blkiocg_create(struct cgroup_subsys *,
 						  struct cgroup *);
 static int blkiocg_can_attach(struct cgroup_subsys *, struct cgroup *,
@@ -1665,7 +1667,11 @@ static void blkiocg_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
 void blkio_policy_register(struct blkio_policy_type *blkiop)
 {
 	spin_lock(&blkio_list_lock);
+
+	BUG_ON(blkio_policy[blkiop->plid]);
+	blkio_policy[blkiop->plid] = blkiop;
 	list_add_tail(&blkiop->list, &blkio_list);
+
 	spin_unlock(&blkio_list_lock);
 }
 EXPORT_SYMBOL_GPL(blkio_policy_register);
@@ -1673,7 +1679,11 @@ EXPORT_SYMBOL_GPL(blkio_policy_register);
 void blkio_policy_unregister(struct blkio_policy_type *blkiop)
 {
 	spin_lock(&blkio_list_lock);
+
+	BUG_ON(blkio_policy[blkiop->plid] != blkiop);
+	blkio_policy[blkiop->plid] = NULL;
 	list_del_init(&blkiop->list);
+
 	spin_unlock(&blkio_list_lock);
 }
 EXPORT_SYMBOL_GPL(blkio_policy_unregister);
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index 599cc65..989d2f6 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -19,6 +19,8 @@
 enum blkio_policy_id {
 	BLKIO_POLICY_PROP = 0,		/* Proportional Bandwidth division */
 	BLKIO_POLICY_THROTL,		/* Throttling */
+
+	BLKIO_NR_POLICIES,
 };
 
 /* Max limits for throttle policy */
-- 
1.7.7.3



* [PATCH 08/12] blkcg: use the usual get blkg path for root blkio_group
  2012-01-19  1:11 [PATCHSET] blkcg: kill policy node and blkg->dev Tejun Heo
                   ` (6 preceding siblings ...)
  2012-01-19  1:11 ` [PATCH 07/12] blkcg: add blkio_policy[] array and allow one policy per policy ID Tejun Heo
@ 2012-01-19  1:11 ` Tejun Heo
  2012-01-19 14:41   ` Vivek Goyal
  2012-01-19  1:11 ` [PATCH 09/12] blkcg: factor out blkio_group creation Tejun Heo
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 36+ messages in thread
From: Tejun Heo @ 2012-01-19  1:11 UTC (permalink / raw)
  To: axboe, vgoyal; +Cc: ctalbott, rni, linux-kernel, Tejun Heo

For the root blkg, blk_throtl_init() was using throtl_alloc_tg()
explicitly and cfq_init_queue() was manually initializing the embedded
cfqd->root_group, adding unnecessarily different code paths to blkg
handling.

Make both use the usual blkio_group get functions - throtl_get_tg()
and cfq_get_cfqg() - for the root blkio_group too.  Note that
blk_throtl_init() callsite is pushed downwards in
blk_alloc_queue_node() so that @q is sufficiently initialized for
throtl_get_tg().

This simplifies root blkg handling noticeably for cfq and will allow
further modularization of blkcg API.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
---
 block/blk-core.c     |    6 ++--
 block/blk-throttle.c |   18 +++++++-------
 block/cfq-iosched.c  |   64 ++++++++++++++++---------------------------------
 3 files changed, 33 insertions(+), 55 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index e6c05a9..e2a4656a 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -498,9 +498,6 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 	if (err)
 		goto fail_id;
 
-	if (blk_throtl_init(q))
-		goto fail_id;
-
 	setup_timer(&q->backing_dev_info.laptop_mode_wb_timer,
 		    laptop_mode_timer_fn, (unsigned long) q);
 	setup_timer(&q->timeout, blk_rq_timed_out_timer, (unsigned long) q);
@@ -522,6 +519,9 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 	 */
 	q->queue_lock = &q->__queue_lock;
 
+	if (blk_throtl_init(q))
+		goto fail_id;
+
 	return q;
 
 fail_id:
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 8314da4..cd3eb6a 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -1229,7 +1229,6 @@ void blk_throtl_drain(struct request_queue *q)
 int blk_throtl_init(struct request_queue *q)
 {
 	struct throtl_data *td;
-	struct throtl_grp *tg;
 
 	td = kzalloc_node(sizeof(*td), GFP_KERNEL, q->node);
 	if (!td)
@@ -1242,19 +1241,20 @@ int blk_throtl_init(struct request_queue *q)
 
 	/* alloc and Init root group. */
 	td->queue = q;
-	tg = throtl_alloc_tg(td);
 
-	if (!tg) {
-		kfree(td);
-		return -ENOMEM;
-	}
+	rcu_read_lock();
+	spin_lock_irq(q->queue_lock);
 
-	td->root_tg = tg;
+	td->root_tg = throtl_get_tg(td, &blkio_root_cgroup);
 
-	rcu_read_lock();
-	throtl_init_add_tg_lists(td, tg, &blkio_root_cgroup);
+	spin_unlock_irq(q->queue_lock);
 	rcu_read_unlock();
 
+	if (!td->root_tg) {
+		kfree(td);
+		return -ENOMEM;
+	}
+
 	/* Attach throtl data to request queue */
 	q->td = td;
 	return 0;
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 511c0f1..4b09ba0 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -229,7 +229,7 @@ struct cfq_data {
 	struct request_queue *queue;
 	/* Root service tree for cfq_groups */
 	struct cfq_rb_root grp_service_tree;
-	struct cfq_group root_group;
+	struct cfq_group *root_group;
 
 	/*
 	 * The priority currently being served
@@ -1106,7 +1106,7 @@ cfq_find_cfqg(struct cfq_data *cfqd, struct blkio_cgroup *blkcg)
 	 * Avoid lookup in this case
 	 */
 	if (blkcg == &blkio_root_cgroup)
-		cfqg = &cfqd->root_group;
+		cfqg = cfqd->root_group;
 	else
 		cfqg = cfqg_of_blkg(blkiocg_lookup_group(blkcg, cfqd->queue,
 							 BLKIO_POLICY_PROP));
@@ -1166,7 +1166,7 @@ static struct cfq_group *cfq_get_cfqg(struct cfq_data *cfqd,
 	}
 
 	if (!cfqg)
-		cfqg = &cfqd->root_group;
+		cfqg = cfqd->root_group;
 
 	cfq_init_add_cfqg_lists(cfqd, cfqg, blkcg);
 	return cfqg;
@@ -1182,7 +1182,7 @@ static void cfq_link_cfqq_cfqg(struct cfq_queue *cfqq, struct cfq_group *cfqg)
 {
 	/* Currently, all async queues are mapped to root group */
 	if (!cfq_cfqq_sync(cfqq))
-		cfqg = &cfqq->cfqd->root_group;
+		cfqg = cfqq->cfqd->root_group;
 
 	cfqq->cfqg = cfqg;
 	/* cfqq reference on cfqg */
@@ -3660,62 +3660,40 @@ static void cfq_exit_queue(struct elevator_queue *e)
 	if (wait)
 		synchronize_rcu();
 
-#ifdef CONFIG_CFQ_GROUP_IOSCHED
-	/* Free up per cpu stats for root group */
-	free_percpu(cfqd->root_group.blkg.stats_cpu);
-#endif
 	kfree(cfqd);
 }
 
 static int cfq_init_queue(struct request_queue *q)
 {
 	struct cfq_data *cfqd;
-	int i, j;
-	struct cfq_group *cfqg;
-	struct cfq_rb_root *st;
+	int i;
 
 	cfqd = kmalloc_node(sizeof(*cfqd), GFP_KERNEL | __GFP_ZERO, q->node);
 	if (!cfqd)
 		return -ENOMEM;
 
+	cfqd->queue = q;
+	q->elevator->elevator_data = cfqd;
+
 	/* Init root service tree */
 	cfqd->grp_service_tree = CFQ_RB_ROOT;
 
-	/* Init root group */
-	cfqg = &cfqd->root_group;
-	for_each_cfqg_st(cfqg, i, j, st)
-		*st = CFQ_RB_ROOT;
-	RB_CLEAR_NODE(&cfqg->rb_node);
+	/* Init root group and prefer root group over other groups by default */
+	rcu_read_lock();
+	spin_lock_irq(q->queue_lock);
 
-	/* Give preference to root group over other groups */
-	cfqg->weight = 2*BLKIO_WEIGHT_DEFAULT;
+	cfqd->root_group = cfq_get_cfqg(cfqd, &blkio_root_cgroup);
 
-#ifdef CONFIG_CFQ_GROUP_IOSCHED
-	/*
-	 * Set root group reference to 2. One reference will be dropped when
-	 * all groups on cfqd->cfqg_list are being deleted during queue exit.
-	 * Other reference will remain there as we don't want to delete this
-	 * group as it is statically allocated and gets destroyed when
-	 * throtl_data goes away.
-	 */
-	cfqg->ref = 2;
+	spin_unlock_irq(q->queue_lock);
+	rcu_read_unlock();
 
-	if (blkio_alloc_blkg_stats(&cfqg->blkg)) {
-		kfree(cfqg);
+	if (!cfqd->root_group) {
 		kfree(cfqd);
 		return -ENOMEM;
 	}
 
-	rcu_read_lock();
-
-	cfq_blkiocg_add_blkio_group(&blkio_root_cgroup, &cfqg->blkg,
-				    cfqd->queue, 0);
-	rcu_read_unlock();
-	cfqd->nr_blkcg_linked_grps++;
+	cfqd->root_group->weight = 2*BLKIO_WEIGHT_DEFAULT;
 
-	/* Add group on cfqd->cfqg_list */
-	hlist_add_head(&cfqg->cfqd_node, &cfqd->cfqg_list);
-#endif
 	/*
 	 * Not strictly needed (since RB_ROOT just clears the node and we
 	 * zeroed cfqd on alloc), but better be safe in case someone decides
@@ -3727,14 +3705,14 @@ static int cfq_init_queue(struct request_queue *q)
 	/*
 	 * Our fallback cfqq if cfq_find_alloc_queue() runs into OOM issues.
 	 * Grab a permanent reference to it, so that the normal code flow
-	 * will not attempt to free it.
+	 * will not attempt to free it.  oom_cfqq is linked to root_group
+	 * but shouldn't hold a reference as it'll never be unlinked.  Lose
+	 * the reference from linking right away.
 	 */
 	cfq_init_cfqq(cfqd, &cfqd->oom_cfqq, 1, 0);
 	cfqd->oom_cfqq.ref++;
-	cfq_link_cfqq_cfqg(&cfqd->oom_cfqq, &cfqd->root_group);
-
-	cfqd->queue = q;
-	q->elevator->elevator_data = cfqd;
+	cfq_link_cfqq_cfqg(&cfqd->oom_cfqq, cfqd->root_group);
+	cfq_put_cfqg(cfqd->root_group);
 
 	init_timer(&cfqd->idle_slice_timer);
 	cfqd->idle_slice_timer.function = cfq_idle_slice_timer;
-- 
1.7.7.3



* [PATCH 09/12] blkcg: factor out blkio_group creation
  2012-01-19  1:11 [PATCHSET] blkcg: kill policy node and blkg->dev Tejun Heo
                   ` (7 preceding siblings ...)
  2012-01-19  1:11 ` [PATCH 08/12] blkcg: use the usual get blkg path for root blkio_group Tejun Heo
@ 2012-01-19  1:11 ` Tejun Heo
  2012-01-19  1:11 ` [PATCH 10/12] blkcg: don't allow or retain configuration of missing devices Tejun Heo
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Tejun Heo @ 2012-01-19  1:11 UTC (permalink / raw)
  To: axboe, vgoyal; +Cc: ctalbott, rni, linux-kernel, Tejun Heo

Currently both blk-throttle and cfq-iosched implement their own
blkio_group creation code in throtl_get_tg() and cfq_get_cfqg().  This
patch factors out the common code.

* New blkio_policy_ops methods blkio_alloc_group_fn() and
  blkio_link_group_fn() are added.  Both are transitional and will be
  removed once the blkg management code is fully moved into
  blk-cgroup.c.

* blkio_alloc_group_fn() allocates the policy-specific blkg, which is
  usually a larger data structure with the blkg as its first entry,
  and initializes it.  Note that initialization of the blkg proper,
  including percpu stats, is the responsibility of blk-cgroup proper.

  Note that default config (weight, bps...) initialization is done
  from this method; otherwise, we end up violating the locking order
  between the blkcg and q locks via the blkcg_get_CONF() functions.

* blkio_link_group_fn() is called under queue_lock and is responsible
  for linking the blkg to the queue.  The blkcg side is handled by
  blk-cgroup proper.

* The common blkg creation function is named blkg_lookup_create() and
  blkiocg_lookup_group() is renamed to blkg_lookup() for consistency.
  The related throtl / cfq functions are similarly [re]named.

* Code to fill cfqg->blkg.dev is open coded in cfq_find_alloc_queue().
  This is transitional and will be removed soon.

This simplifies blkcg policy implementations and enables further
cleanup.
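
For illustration only (not part of the patch), a policy's
lookup-or-create path reduces to roughly the following after this
change - a sketch condensed from the throtl_lookup_create_tg() hunk
below, with the caller holding both the rcu read lock and the queue
lock, which blkg_lookup_create() may temporarily drop for the
still-broken percpu allocation:

  /* caller holds rcu_read_lock() and q->queue_lock */
  blkg = blkg_lookup_create(blkcg, q, BLKIO_POLICY_THROTL);

  if (blkg)
          tg = tg_of_blkg(blkg);  /* policy struct embeds the blkg */
  else if (!blk_queue_dead(q))
          tg = td->root_tg;       /* creation failed, fall back to root */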

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
---
 block/blk-cgroup.c   |  113 ++++++++++++++++++++++++++++-----------
 block/blk-cgroup.h   |   29 +++++-----
 block/blk-throttle.c |  144 ++++++++++++++------------------------------------
 block/cfq-iosched.c  |  135 +++++++++++++----------------------------------
 block/cfq.h          |    8 ---
 5 files changed, 173 insertions(+), 256 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 940586b..9e013bb 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -467,38 +467,89 @@ void blkiocg_update_io_merged_stats(struct blkio_group *blkg, bool direction,
 }
 EXPORT_SYMBOL_GPL(blkiocg_update_io_merged_stats);
 
-/*
- * This function allocates the per cpu stats for blkio_group. Should be called
- * from sleepable context as alloc_per_cpu() requires that.
- */
-int blkio_alloc_blkg_stats(struct blkio_group *blkg)
+struct blkio_group *blkg_lookup_create(struct blkio_cgroup *blkcg,
+				       struct request_queue *q,
+				       enum blkio_policy_id plid)
+	__releases(q->queue_lock) __acquires(q->queue_lock)
 {
-	/* Allocate memory for per cpu stats */
-	blkg->stats_cpu = alloc_percpu(struct blkio_group_stats_cpu);
-	if (!blkg->stats_cpu)
-		return -ENOMEM;
-	return 0;
-}
-EXPORT_SYMBOL_GPL(blkio_alloc_blkg_stats);
+	struct blkio_policy_type *pol = blkio_policy[plid];
+	struct blkio_group *blkg = NULL;
+	struct blkio_group *new_blkg = NULL;
 
-void blkiocg_add_blkio_group(struct blkio_cgroup *blkcg,
-		struct blkio_group *blkg, struct request_queue *q, dev_t dev,
-		enum blkio_policy_id plid)
-{
-	unsigned long flags;
+	WARN_ON_ONCE(!rcu_read_lock_held());
+	lockdep_assert_held(q->queue_lock);
 
-	spin_lock_irqsave(&blkcg->lock, flags);
-	spin_lock_init(&blkg->stats_lock);
-	rcu_assign_pointer(blkg->q, q);
-	blkg->blkcg_id = css_id(&blkcg->css);
+	/*
+	 * This could be the first entry point of blkcg implementation and
+	 * we shouldn't allow anything to go through for a dead queue.  The
+	 * following can be removed if blkg lookup is guaranteed to fail on
+	 * a dead queue.
+	 */
+	if (unlikely(blk_queue_dead(q)))
+		return NULL;
+
+	blkg = blkg_lookup(blkcg, q, plid);
+	if (blkg)
+		return blkg;
+
+	if (!css_tryget(&blkcg->css))
+		return NULL;
+
+	/*
+	 * Allocate and initialize.
+	 *
+	 * FIXME: The following is broken.  Percpu memory allocation
+	 * requires %GFP_KERNEL context and can't be performed from IO
+	 * path.  Allocation here should inherently be atomic and the
+	 * following lock dancing can be removed once the broken percpu
+	 * allocation is fixed.
+	 */
+	spin_unlock_irq(q->queue_lock);
+	rcu_read_unlock();
+
+	new_blkg = pol->ops.blkio_alloc_group_fn(q, blkcg);
+	if (new_blkg) {
+		new_blkg->stats_cpu = alloc_percpu(struct blkio_group_stats_cpu);
+
+		spin_lock_init(&new_blkg->stats_lock);
+		rcu_assign_pointer(new_blkg->q, q);
+		new_blkg->blkcg_id = css_id(&blkcg->css);
+		new_blkg->plid = plid;
+		cgroup_path(blkcg->css.cgroup, new_blkg->path,
+			    sizeof(new_blkg->path));
+	}
+
+	rcu_read_lock();
+	spin_lock_irq(q->queue_lock);
+	css_put(&blkcg->css);
+
+	/* did the device die inbetween? */
+	if (unlikely(blk_queue_dead(q)))
+		goto out;
+
+	/* did someone beat us to it? */
+	blkg = blkg_lookup(blkcg, q, plid);
+	if (unlikely(blkg))
+		goto out;
+
+	/* did alloc fail? */
+	if (unlikely(!new_blkg || !new_blkg->stats_cpu))
+		goto out;
+
+	/* insert */
+	spin_lock(&blkcg->lock);
+	swap(blkg, new_blkg);
 	hlist_add_head_rcu(&blkg->blkcg_node, &blkcg->blkg_list);
-	blkg->plid = plid;
-	spin_unlock_irqrestore(&blkcg->lock, flags);
-	/* Need to take css reference ? */
-	cgroup_path(blkcg->css.cgroup, blkg->path, sizeof(blkg->path));
-	blkg->dev = dev;
+	pol->ops.blkio_link_group_fn(q, blkg);
+	spin_unlock(&blkcg->lock);
+out:
+	if (new_blkg) {
+		free_percpu(new_blkg->stats_cpu);
+		kfree(new_blkg);
+	}
+	return blkg;
 }
-EXPORT_SYMBOL_GPL(blkiocg_add_blkio_group);
+EXPORT_SYMBOL_GPL(blkg_lookup_create);
 
 static void __blkiocg_del_blkio_group(struct blkio_group *blkg)
 {
@@ -535,9 +586,9 @@ int blkiocg_del_blkio_group(struct blkio_group *blkg)
 EXPORT_SYMBOL_GPL(blkiocg_del_blkio_group);
 
 /* called under rcu_read_lock(). */
-struct blkio_group *blkiocg_lookup_group(struct blkio_cgroup *blkcg,
-					 struct request_queue *q,
-					 enum blkio_policy_id plid)
+struct blkio_group *blkg_lookup(struct blkio_cgroup *blkcg,
+				struct request_queue *q,
+				enum blkio_policy_id plid)
 {
 	struct blkio_group *blkg;
 	struct hlist_node *n;
@@ -547,7 +598,7 @@ struct blkio_group *blkiocg_lookup_group(struct blkio_cgroup *blkcg,
 			return blkg;
 	return NULL;
 }
-EXPORT_SYMBOL_GPL(blkiocg_lookup_group);
+EXPORT_SYMBOL_GPL(blkg_lookup);
 
 static void blkio_reset_stats_cpu(struct blkio_group *blkg)
 {
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index 989d2f6..e8cbc52 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -210,6 +210,10 @@ extern unsigned int blkcg_get_read_iops(struct blkio_cgroup *blkcg,
 extern unsigned int blkcg_get_write_iops(struct blkio_cgroup *blkcg,
 				     dev_t dev);
 
+typedef struct blkio_group *(blkio_alloc_group_fn)(struct request_queue *q,
+						   struct blkio_cgroup *blkcg);
+typedef void (blkio_link_group_fn)(struct request_queue *q,
+			struct blkio_group *blkg);
 typedef void (blkio_unlink_group_fn)(struct request_queue *q,
 			struct blkio_group *blkg);
 typedef void (blkio_update_group_weight_fn)(struct request_queue *q,
@@ -224,6 +228,8 @@ typedef void (blkio_update_group_write_iops_fn)(struct request_queue *q,
 			struct blkio_group *blkg, unsigned int write_iops);
 
 struct blkio_policy_ops {
+	blkio_alloc_group_fn *blkio_alloc_group_fn;
+	blkio_link_group_fn *blkio_link_group_fn;
 	blkio_unlink_group_fn *blkio_unlink_group_fn;
 	blkio_update_group_weight_fn *blkio_update_group_weight_fn;
 	blkio_update_group_read_bps_fn *blkio_update_group_read_bps_fn;
@@ -309,14 +315,13 @@ static inline void blkiocg_set_start_empty_time(struct blkio_group *blkg) {}
 extern struct blkio_cgroup blkio_root_cgroup;
 extern struct blkio_cgroup *cgroup_to_blkio_cgroup(struct cgroup *cgroup);
 extern struct blkio_cgroup *task_blkio_cgroup(struct task_struct *tsk);
-extern void blkiocg_add_blkio_group(struct blkio_cgroup *blkcg,
-	struct blkio_group *blkg, struct request_queue *q, dev_t dev,
-	enum blkio_policy_id plid);
-extern int blkio_alloc_blkg_stats(struct blkio_group *blkg);
 extern int blkiocg_del_blkio_group(struct blkio_group *blkg);
-extern struct blkio_group *blkiocg_lookup_group(struct blkio_cgroup *blkcg,
-						struct request_queue *q,
-						enum blkio_policy_id plid);
+extern struct blkio_group *blkg_lookup(struct blkio_cgroup *blkcg,
+				       struct request_queue *q,
+				       enum blkio_policy_id plid);
+struct blkio_group *blkg_lookup_create(struct blkio_cgroup *blkcg,
+				       struct request_queue *q,
+				       enum blkio_policy_id plid);
 void blkiocg_update_timeslice_used(struct blkio_group *blkg,
 					unsigned long time,
 					unsigned long unaccounted_time);
@@ -337,17 +342,11 @@ cgroup_to_blkio_cgroup(struct cgroup *cgroup) { return NULL; }
 static inline struct blkio_cgroup *
 task_blkio_cgroup(struct task_struct *tsk) { return NULL; }
 
-static inline void blkiocg_add_blkio_group(struct blkio_cgroup *blkcg,
-		struct blkio_group *blkg, void *key, dev_t dev,
-		enum blkio_policy_id plid) {}
-
-static inline int blkio_alloc_blkg_stats(struct blkio_group *blkg) { return 0; }
-
 static inline int
 blkiocg_del_blkio_group(struct blkio_group *blkg) { return 0; }
 
-static inline struct blkio_group *
-blkiocg_lookup_group(struct blkio_cgroup *blkcg, void *key) { return NULL; }
+static inline struct blkio_group *blkg_lookup(struct blkio_cgroup *blkcg,
+					      void *key) { return NULL; }
 static inline void blkiocg_update_timeslice_used(struct blkio_group *blkg,
 						unsigned long time,
 						unsigned long unaccounted_time)
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index cd3eb6a..bb5af29 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -181,17 +181,25 @@ static void throtl_put_tg(struct throtl_grp *tg)
 	call_rcu(&tg->rcu_head, throtl_free_tg);
 }
 
-static void throtl_init_group(struct throtl_grp *tg)
+static struct blkio_group *throtl_alloc_blkio_group(struct request_queue *q,
+						    struct blkio_cgroup *blkcg)
 {
+	struct throtl_grp *tg;
+
+	tg = kzalloc_node(sizeof(*tg), GFP_ATOMIC, q->node);
+	if (!tg)
+		return NULL;
+
 	INIT_HLIST_NODE(&tg->tg_node);
 	RB_CLEAR_NODE(&tg->rb_node);
 	bio_list_init(&tg->bio_lists[0]);
 	bio_list_init(&tg->bio_lists[1]);
 	tg->limits_changed = false;
 
-	/* Practically unlimited BW */
-	tg->bps[0] = tg->bps[1] = -1;
-	tg->iops[0] = tg->iops[1] = -1;
+	tg->bps[READ] = blkcg_get_read_bps(blkcg, tg->blkg.dev);
+	tg->bps[WRITE] = blkcg_get_write_bps(blkcg, tg->blkg.dev);
+	tg->iops[READ] = blkcg_get_read_iops(blkcg, tg->blkg.dev);
+	tg->iops[WRITE] = blkcg_get_write_iops(blkcg, tg->blkg.dev);
 
 	/*
 	 * Take the initial reference that will be released on destroy
@@ -200,14 +208,8 @@ static void throtl_init_group(struct throtl_grp *tg)
 	 * exit or cgroup deletion path depending on who is exiting first.
 	 */
 	atomic_set(&tg->ref, 1);
-}
 
-/* Should be called with rcu read lock held (needed for blkcg) */
-static void
-throtl_add_group_to_td_list(struct throtl_data *td, struct throtl_grp *tg)
-{
-	hlist_add_head(&tg->tg_node, &td->tg_list);
-	td->nr_undestroyed_grps++;
+	return &tg->blkg;
 }
 
 static void
@@ -246,46 +248,20 @@ throtl_tg_fill_dev_details(struct throtl_data *td, struct throtl_grp *tg)
 	spin_unlock_irq(td->queue->queue_lock);
 }
 
-static void throtl_init_add_tg_lists(struct throtl_data *td,
-			struct throtl_grp *tg, struct blkio_cgroup *blkcg)
+static void throtl_link_blkio_group(struct request_queue *q,
+				    struct blkio_group *blkg)
 {
-	__throtl_tg_fill_dev_details(td, tg);
-
-	/* Add group onto cgroup list */
-	blkiocg_add_blkio_group(blkcg, &tg->blkg, td->queue,
-				tg->blkg.dev, BLKIO_POLICY_THROTL);
-
-	tg->bps[READ] = blkcg_get_read_bps(blkcg, tg->blkg.dev);
-	tg->bps[WRITE] = blkcg_get_write_bps(blkcg, tg->blkg.dev);
-	tg->iops[READ] = blkcg_get_read_iops(blkcg, tg->blkg.dev);
-	tg->iops[WRITE] = blkcg_get_write_iops(blkcg, tg->blkg.dev);
-
-	throtl_add_group_to_td_list(td, tg);
-}
-
-/* Should be called without queue lock and outside of rcu period */
-static struct throtl_grp *throtl_alloc_tg(struct throtl_data *td)
-{
-	struct throtl_grp *tg = NULL;
-	int ret;
-
-	tg = kzalloc_node(sizeof(*tg), GFP_ATOMIC, td->queue->node);
-	if (!tg)
-		return NULL;
-
-	ret = blkio_alloc_blkg_stats(&tg->blkg);
+	struct throtl_data *td = q->td;
+	struct throtl_grp *tg = tg_of_blkg(blkg);
 
-	if (ret) {
-		kfree(tg);
-		return NULL;
-	}
+	__throtl_tg_fill_dev_details(td, tg);
 
-	throtl_init_group(tg);
-	return tg;
+	hlist_add_head(&tg->tg_node, &td->tg_list);
+	td->nr_undestroyed_grps++;
 }
 
 static struct
-throtl_grp *throtl_find_tg(struct throtl_data *td, struct blkio_cgroup *blkcg)
+throtl_grp *throtl_lookup_tg(struct throtl_data *td, struct blkio_cgroup *blkcg)
 {
 	struct throtl_grp *tg = NULL;
 
@@ -296,69 +272,29 @@ throtl_grp *throtl_find_tg(struct throtl_data *td, struct blkio_cgroup *blkcg)
 	if (blkcg == &blkio_root_cgroup)
 		tg = td->root_tg;
 	else
-		tg = tg_of_blkg(blkiocg_lookup_group(blkcg, td->queue,
-						     BLKIO_POLICY_THROTL));
+		tg = tg_of_blkg(blkg_lookup(blkcg, td->queue,
+					    BLKIO_POLICY_THROTL));
 
 	__throtl_tg_fill_dev_details(td, tg);
 	return tg;
 }
 
-static struct throtl_grp *throtl_get_tg(struct throtl_data *td,
-					struct blkio_cgroup *blkcg)
+static struct throtl_grp *throtl_lookup_create_tg(struct throtl_data *td,
+						  struct blkio_cgroup *blkcg)
 {
-	struct throtl_grp *tg = NULL, *__tg = NULL;
 	struct request_queue *q = td->queue;
+	struct throtl_grp *tg = NULL;
+	struct blkio_group *blkg;
 
-	/* no throttling for dead queue */
-	if (unlikely(blk_queue_dead(q)))
-		return NULL;
-
-	tg = throtl_find_tg(td, blkcg);
-	if (tg)
-		return tg;
-
-	if (!css_tryget(&blkcg->css))
-		return NULL;
-
-	/*
-	 * Need to allocate a group. Allocation of group also needs allocation
-	 * of per cpu stats which in-turn takes a mutex() and can block. Hence
-	 * we need to drop rcu lock and queue_lock before we call alloc.
-	 */
-	spin_unlock_irq(q->queue_lock);
-	rcu_read_unlock();
-
-	tg = throtl_alloc_tg(td);
-
-	/* Group allocated and queue is still alive. take the lock */
-	rcu_read_lock();
-	spin_lock_irq(q->queue_lock);
-	css_put(&blkcg->css);
-
-	/* Make sure @q is still alive */
-	if (unlikely(blk_queue_dead(q))) {
-		kfree(tg);
-		return NULL;
-	}
-
-	/*
-	 * If some other thread already allocated the group while we were
-	 * not holding queue lock, free up the group
-	 */
-	__tg = throtl_find_tg(td, blkcg);
-
-	if (__tg) {
-		kfree(tg);
-		return __tg;
-	}
+	blkg = blkg_lookup_create(blkcg, q, BLKIO_POLICY_THROTL);
 
-	/* Group allocation failed. Account the IO to root group */
-	if (!tg) {
+	/* if %NULL and @q is alive, fall back to root_tg */
+	if (blkg)
+		tg = tg_of_blkg(blkg);
+	else if (!blk_queue_dead(q))
 		tg = td->root_tg;
-		return tg;
-	}
 
-	throtl_init_add_tg_lists(td, tg, blkcg);
+	__throtl_tg_fill_dev_details(td, tg);
 	return tg;
 }
 
@@ -1085,6 +1021,8 @@ static void throtl_shutdown_wq(struct request_queue *q)
 
 static struct blkio_policy_type blkio_policy_throtl = {
 	.ops = {
+		.blkio_alloc_group_fn = throtl_alloc_blkio_group,
+		.blkio_link_group_fn = throtl_link_blkio_group,
 		.blkio_unlink_group_fn = throtl_unlink_blkio_group,
 		.blkio_update_group_read_bps_fn =
 					throtl_update_blkio_group_read_bps,
@@ -1118,7 +1056,7 @@ bool blk_throtl_bio(struct request_queue *q, struct bio *bio)
 	 */
 	rcu_read_lock();
 	blkcg = task_blkio_cgroup(current);
-	tg = throtl_find_tg(td, blkcg);
+	tg = throtl_lookup_tg(td, blkcg);
 	if (tg) {
 		throtl_tg_fill_dev_details(td, tg);
 
@@ -1134,7 +1072,7 @@ bool blk_throtl_bio(struct request_queue *q, struct bio *bio)
 	 * IO group
 	 */
 	spin_lock_irq(q->queue_lock);
-	tg = throtl_get_tg(td, blkcg);
+	tg = throtl_lookup_create_tg(td, blkcg);
 	if (unlikely(!tg))
 		goto out_unlock;
 
@@ -1239,13 +1177,14 @@ int blk_throtl_init(struct request_queue *q)
 	td->limits_changed = false;
 	INIT_DELAYED_WORK(&td->throtl_work, blk_throtl_work);
 
-	/* alloc and Init root group. */
+	q->td = td;
 	td->queue = q;
 
+	/* alloc and init root group. */
 	rcu_read_lock();
 	spin_lock_irq(q->queue_lock);
 
-	td->root_tg = throtl_get_tg(td, &blkio_root_cgroup);
+	td->root_tg = throtl_lookup_create_tg(td, &blkio_root_cgroup);
 
 	spin_unlock_irq(q->queue_lock);
 	rcu_read_unlock();
@@ -1254,9 +1193,6 @@ int blk_throtl_init(struct request_queue *q)
 		kfree(td);
 		return -ENOMEM;
 	}
-
-	/* Attach throtl data to request queue */
-	q->td = td;
 	return 0;
 }
 
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 4b09ba0..da16f05 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1029,10 +1029,12 @@ static void cfq_update_blkio_group_weight(struct request_queue *q,
 	cfqg->needs_update = true;
 }
 
-static void cfq_init_add_cfqg_lists(struct cfq_data *cfqd,
-			struct cfq_group *cfqg, struct blkio_cgroup *blkcg)
+static void cfq_link_blkio_group(struct request_queue *q,
+				 struct blkio_group *blkg)
 {
-	struct backing_dev_info *bdi = &cfqd->queue->backing_dev_info;
+	struct cfq_data *cfqd = q->elevator->elevator_data;
+	struct backing_dev_info *bdi = &q->backing_dev_info;
+	struct cfq_group *cfqg = cfqg_of_blkg(blkg);
 	unsigned int major, minor;
 
 	/*
@@ -1043,31 +1045,23 @@ static void cfq_init_add_cfqg_lists(struct cfq_data *cfqd,
 	 */
 	if (bdi->dev) {
 		sscanf(dev_name(bdi->dev), "%u:%u", &major, &minor);
-		cfq_blkiocg_add_blkio_group(blkcg, &cfqg->blkg,
-					cfqd->queue, MKDEV(major, minor));
-	} else
-		cfq_blkiocg_add_blkio_group(blkcg, &cfqg->blkg,
-					cfqd->queue, 0);
+		blkg->dev = MKDEV(major, minor);
+	}
 
 	cfqd->nr_blkcg_linked_grps++;
-	cfqg->weight = blkcg_get_weight(blkcg, cfqg->blkg.dev);
 
 	/* Add group on cfqd list */
 	hlist_add_head(&cfqg->cfqd_node, &cfqd->cfqg_list);
 }
 
-/*
- * Should be called from sleepable context. No request queue lock as per
- * cpu stats are allocated dynamically and alloc_percpu needs to be called
- * from sleepable context.
- */
-static struct cfq_group * cfq_alloc_cfqg(struct cfq_data *cfqd)
+static struct blkio_group *cfq_alloc_blkio_group(struct request_queue *q,
+						 struct blkio_cgroup *blkcg)
 {
 	struct cfq_group *cfqg = NULL;
-	int i, j, ret;
+	int i, j;
 	struct cfq_rb_root *st;
 
-	cfqg = kzalloc_node(sizeof(*cfqg), GFP_ATOMIC, cfqd->queue->node);
+	cfqg = kzalloc_node(sizeof(*cfqg), GFP_ATOMIC, q->node);
 	if (!cfqg)
 		return NULL;
 
@@ -1076,6 +1070,7 @@ static struct cfq_group * cfq_alloc_cfqg(struct cfq_data *cfqd)
 	RB_CLEAR_NODE(&cfqg->rb_node);
 
 	cfqg->ttime.last_end_request = jiffies;
+	cfqg->weight = blkcg_get_weight(blkcg, cfqg->blkg.dev);
 
 	/*
 	 * Take the initial reference that will be released on destroy
@@ -1085,91 +1080,20 @@ static struct cfq_group * cfq_alloc_cfqg(struct cfq_data *cfqd)
 	 */
 	cfqg->ref = 1;
 
-	ret = blkio_alloc_blkg_stats(&cfqg->blkg);
-	if (ret) {
-		kfree(cfqg);
-		return NULL;
-	}
-
-	return cfqg;
-}
-
-static struct cfq_group *
-cfq_find_cfqg(struct cfq_data *cfqd, struct blkio_cgroup *blkcg)
-{
-	struct cfq_group *cfqg = NULL;
-	struct backing_dev_info *bdi = &cfqd->queue->backing_dev_info;
-	unsigned int major, minor;
-
-	/*
-	 * This is the common case when there are no blkio cgroups.
-	 * Avoid lookup in this case
-	 */
-	if (blkcg == &blkio_root_cgroup)
-		cfqg = cfqd->root_group;
-	else
-		cfqg = cfqg_of_blkg(blkiocg_lookup_group(blkcg, cfqd->queue,
-							 BLKIO_POLICY_PROP));
-
-	if (cfqg && !cfqg->blkg.dev && bdi->dev && dev_name(bdi->dev)) {
-		sscanf(dev_name(bdi->dev), "%u:%u", &major, &minor);
-		cfqg->blkg.dev = MKDEV(major, minor);
-	}
-
-	return cfqg;
+	return &cfqg->blkg;
 }
 
 /*
  * Search for the cfq group current task belongs to. request_queue lock must
  * be held.
  */
-static struct cfq_group *cfq_get_cfqg(struct cfq_data *cfqd,
-				      struct blkio_cgroup *blkcg)
+static struct cfq_group *cfq_lookup_create_cfqg(struct cfq_data *cfqd,
+						struct blkio_cgroup *blkcg)
 {
-	struct cfq_group *cfqg = NULL, *__cfqg = NULL;
-	struct request_queue *q = cfqd->queue;
-
-	cfqg = cfq_find_cfqg(cfqd, blkcg);
-	if (cfqg)
-		return cfqg;
-
-	if (!css_tryget(&blkcg->css))
-		return NULL;
-
-	/*
-	 * Need to allocate a group. Allocation of group also needs allocation
-	 * of per cpu stats which in-turn takes a mutex() and can block. Hence
-	 * we need to drop rcu lock and queue_lock before we call alloc.
-	 *
-	 * Not taking any queue reference here and assuming that queue is
-	 * around by the time we return. CFQ queue allocation code does
-	 * the same. It might be racy though.
-	 */
-	rcu_read_unlock();
-	spin_unlock_irq(q->queue_lock);
-
-	cfqg = cfq_alloc_cfqg(cfqd);
-
-	spin_lock_irq(q->queue_lock);
-	rcu_read_lock();
-	css_put(&blkcg->css);
-
-	/*
-	 * If some other thread already allocated the group while we were
-	 * not holding queue lock, free up the group
-	 */
-	__cfqg = cfq_find_cfqg(cfqd, blkcg);
+	struct blkio_group *blkg;
 
-	if (__cfqg) {
-		kfree(cfqg);
-		return __cfqg;
-	}
-
-	if (!cfqg)
-		cfqg = cfqd->root_group;
-
-	cfq_init_add_cfqg_lists(cfqd, cfqg, blkcg);
-	return cfqg;
+	blkg = blkg_lookup_create(blkcg, cfqd->queue, BLKIO_POLICY_PROP);
+	return cfqg_of_blkg(blkg);
 }
 
 static inline struct cfq_group *cfq_ref_get_cfqg(struct cfq_group *cfqg)
@@ -1263,8 +1187,8 @@ static void cfq_unlink_blkio_group(struct request_queue *q,
 }
 
 #else /* GROUP_IOSCHED */
-static struct cfq_group *cfq_get_cfqg(struct cfq_data *cfqd,
-				      struct blkio_cgroup *blkcg)
+static struct cfq_group *cfq_lookup_create_cfqg(struct cfq_data *cfqd,
+						struct blkio_cgroup *blkcg)
 {
 	return &cfqd->root_group;
 }
@@ -2853,6 +2777,7 @@ static struct cfq_queue *
 cfq_find_alloc_queue(struct cfq_data *cfqd, bool is_sync,
 		     struct io_context *ioc, gfp_t gfp_mask)
 {
+	struct backing_dev_info *bdi = &cfqd->queue->backing_dev_info;
 	struct blkio_cgroup *blkcg;
 	struct cfq_queue *cfqq, *new_cfqq = NULL;
 	struct cfq_io_cq *cic;
@@ -2863,7 +2788,19 @@ retry:
 
 	blkcg = task_blkio_cgroup(current);
 
-	cfqg = cfq_get_cfqg(cfqd, blkcg);
+	/* avoid lookup for the common case where there's no blkio cgroup */
+	if (blkcg == &blkio_root_cgroup)
+		cfqg = cfqd->root_group;
+	else
+		cfqg = cfq_lookup_create_cfqg(cfqd, blkcg);
+
+	if (cfqg && !cfqg->blkg.dev && bdi->dev && dev_name(bdi->dev)) {
+		unsigned int major, minor;
+
+		sscanf(dev_name(bdi->dev), "%u:%u", &major, &minor);
+		cfqg->blkg.dev = MKDEV(major, minor);
+	}
+
 	cic = cfq_cic_lookup(cfqd, ioc);
 	/* cic always exists here */
 	cfqq = cic_to_cfqq(cic, is_sync);
@@ -3682,7 +3619,7 @@ static int cfq_init_queue(struct request_queue *q)
 	rcu_read_lock();
 	spin_lock_irq(q->queue_lock);
 
-	cfqd->root_group = cfq_get_cfqg(cfqd, &blkio_root_cgroup);
+	cfqd->root_group = cfq_lookup_create_cfqg(cfqd, &blkio_root_cgroup);
 
 	spin_unlock_irq(q->queue_lock);
 	rcu_read_unlock();
@@ -3863,6 +3800,8 @@ static struct elevator_type iosched_cfq = {
 #ifdef CONFIG_CFQ_GROUP_IOSCHED
 static struct blkio_policy_type blkio_policy_cfq = {
 	.ops = {
+		.blkio_alloc_group_fn =		cfq_alloc_blkio_group,
+		.blkio_link_group_fn =		cfq_link_blkio_group,
 		.blkio_unlink_group_fn =	cfq_unlink_blkio_group,
 		.blkio_update_group_weight_fn =	cfq_update_blkio_group_weight,
 	},
diff --git a/block/cfq.h b/block/cfq.h
index 343b78a..3987601 100644
--- a/block/cfq.h
+++ b/block/cfq.h
@@ -67,12 +67,6 @@ static inline void cfq_blkiocg_update_completion_stats(struct blkio_group *blkg,
 				direction, sync);
 }
 
-static inline void cfq_blkiocg_add_blkio_group(struct blkio_cgroup *blkcg,
-		struct blkio_group *blkg, struct request_queue *q, dev_t dev)
-{
-	blkiocg_add_blkio_group(blkcg, blkg, q, dev, BLKIO_POLICY_PROP);
-}
-
 static inline int cfq_blkiocg_del_blkio_group(struct blkio_group *blkg)
 {
 	return blkiocg_del_blkio_group(blkg);
@@ -105,8 +99,6 @@ static inline void cfq_blkiocg_update_dispatch_stats(struct blkio_group *blkg,
 				uint64_t bytes, bool direction, bool sync) {}
 static inline void cfq_blkiocg_update_completion_stats(struct blkio_group *blkg, uint64_t start_time, uint64_t io_start_time, bool direction, bool sync) {}
 
-static inline void cfq_blkiocg_add_blkio_group(struct blkio_cgroup *blkcg,
-		struct blkio_group *blkg, struct request_queue *q, dev_t dev) {}
 static inline int cfq_blkiocg_del_blkio_group(struct blkio_group *blkg)
 {
 	return 0;
-- 
1.7.7.3



* [PATCH 10/12] blkcg: don't allow or retain configuration of missing devices
  2012-01-19  1:11 [PATCHSET] blkcg: kill policy node and blkg->dev Tejun Heo
                   ` (8 preceding siblings ...)
  2012-01-19  1:11 ` [PATCH 09/12] blkcg: factor out blkio_group creation Tejun Heo
@ 2012-01-19  1:11 ` Tejun Heo
  2012-01-19  1:11 ` [PATCH 11/12] blkcg: kill blkio_policy_node Tejun Heo
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Tejun Heo @ 2012-01-19  1:11 UTC (permalink / raw)
  To: axboe, vgoyal; +Cc: ctalbott, rni, linux-kernel, Tejun Heo, Kay Sievers

blkcg is very peculiar in that it allows setting and remembering
configurations for non-existent devices by maintaining separate data
structures for configuration.

This behavior is completely out of the usual norms and outright
confusing; furthermore, it uses the dev_t number to match the
configuration to devices, which is unpredictable to begin with and
becomes completely unusable if EXT_DEVT is fully used.

It is wholly unnecessary - we already have a fully functional userland
mechanism for programming devices as they are hotplugged, with full
access to device identification, connection topology and filesystem
information.

Add a new struct blkio_group_conf, which contains all blkcg
configurations, to blkio_group and let blkio_group - which can be
created only if the associated device exists and is removed when the
device goes away - carry all configurations.

Note that, after this patch, all newly created blkg's will always have
the default configuration (unlimited for throttling and blkcg's weight
for propio).

This patch makes blkio_policy_node meaningless but doesn't remove it.
The next patch will.
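
Condensed, the configuration write path after this patch becomes
roughly the following (a sketch of the blkio_policy_parse_and_set()
hunk below; error handling and the non-weight conf fields are elided):

  disk = get_gendisk(dev, &part);
  if ((!disk || part) && temp)            /* setting a value needs the device */
          return -ENODEV;

  rcu_read_lock();
  if (disk && !part) {
          spin_lock_irq(disk->queue->queue_lock);
          blkg = blkg_lookup_create(blkcg, disk->queue, plid);
          spin_unlock_irq(disk->queue->queue_lock);
  }

  if (blkg)
          blkg->conf.weight = temp;       /* or conf.bps[] / conf.iops[] */
  rcu_read_unlock();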

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
---
 block/blk-cgroup.c   |   78 ++++++++++++++++++++++++++++++++++++--------------
 block/blk-cgroup.h   |    9 ++++++
 block/blk-throttle.c |    8 ++--
 block/cfq-iosched.c  |    2 +-
 4 files changed, 70 insertions(+), 27 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 9e013bb..1db8a27 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -822,9 +822,12 @@ static uint64_t blkio_get_stat(struct blkio_group *blkg,
 }
 
 static int blkio_policy_parse_and_set(char *buf,
-	struct blkio_policy_node *newpn, enum blkio_policy_id plid, int fileid)
+				      struct blkio_policy_node *newpn,
+				      enum blkio_policy_id plid, int fileid,
+				      struct blkio_cgroup *blkcg)
 {
 	struct gendisk *disk = NULL;
+	struct blkio_group *blkg = NULL;
 	char *s[4], *p, *major_s = NULL, *minor_s = NULL;
 	unsigned long major, minor;
 	int i = 0, ret = -EINVAL;
@@ -870,12 +873,19 @@ static int blkio_policy_parse_and_set(char *buf,
 		goto out;
 
 	/* For rule removal, do not check for device presence. */
-	if (temp) {
-		disk = get_gendisk(dev, &part);
-		if (!disk || part) {
-			ret = -ENODEV;
-			goto out;
-		}
+	disk = get_gendisk(dev, &part);
+
+	if ((!disk || part) && temp) {
+		ret = -ENODEV;
+		goto out;
+	}
+
+	rcu_read_lock();
+
+	if (disk && !part) {
+		spin_lock_irq(disk->queue->queue_lock);
+		blkg = blkg_lookup_create(blkcg, disk->queue, plid);
+		spin_unlock_irq(disk->queue->queue_lock);
 	}
 
 	newpn->dev = dev;
@@ -884,25 +894,46 @@ static int blkio_policy_parse_and_set(char *buf,
 	case BLKIO_POLICY_PROP:
 		if ((temp < BLKIO_WEIGHT_MIN && temp > 0) ||
 		     temp > BLKIO_WEIGHT_MAX)
-			goto out;
+			goto out_unlock;
 
 		newpn->plid = plid;
 		newpn->fileid = fileid;
 		newpn->val.weight = temp;
+		if (blkg)
+			blkg->conf.weight = temp;
 		break;
 	case BLKIO_POLICY_THROTL:
 		switch(fileid) {
 		case BLKIO_THROTL_read_bps_device:
+			if (blkg)
+				blkg->conf.bps[READ] = temp;
+			newpn->plid = plid;
+			newpn->fileid = fileid;
+			newpn->val.bps = temp;
+			break;
 		case BLKIO_THROTL_write_bps_device:
+			if (blkg)
+				blkg->conf.bps[WRITE] = temp;
 			newpn->plid = plid;
 			newpn->fileid = fileid;
 			newpn->val.bps = temp;
 			break;
 		case BLKIO_THROTL_read_iops_device:
+			if (temp > THROTL_IOPS_MAX)
+				goto out_unlock;
+
+			if (blkg)
+				blkg->conf.iops[READ] = temp;
+			newpn->plid = plid;
+			newpn->fileid = fileid;
+			newpn->val.iops = (unsigned int)temp;
+			break;
 		case BLKIO_THROTL_write_iops_device:
 			if (temp > THROTL_IOPS_MAX)
-				goto out;
+				goto out_unlock;
 
+			if (blkg)
+				blkg->conf.iops[WRITE] = temp;
 			newpn->plid = plid;
 			newpn->fileid = fileid;
 			newpn->val.iops = (unsigned int)temp;
@@ -913,6 +944,8 @@ static int blkio_policy_parse_and_set(char *buf,
 		BUG();
 	}
 	ret = 0;
+out_unlock:
+	rcu_read_unlock();
 out:
 	put_disk(disk);
 	return ret;
@@ -1062,26 +1095,29 @@ static void blkio_update_policy_rule(struct blkio_policy_node *oldpn,
 static void blkio_update_blkg_policy(struct blkio_cgroup *blkcg,
 		struct blkio_group *blkg, struct blkio_policy_node *pn)
 {
-	unsigned int weight, iops;
-	u64 bps;
+	struct blkio_group_conf *conf = &blkg->conf;
 
 	switch(pn->plid) {
 	case BLKIO_POLICY_PROP:
-		weight = pn->val.weight ? pn->val.weight :
-				blkcg->weight;
-		blkio_update_group_weight(blkg, weight);
+		blkio_update_group_weight(blkg, conf->weight ?: blkcg->weight);
 		break;
 	case BLKIO_POLICY_THROTL:
 		switch(pn->fileid) {
 		case BLKIO_THROTL_read_bps_device:
+			blkio_update_group_bps(blkg, conf->bps[READ] ?: -1,
+					       pn->fileid);
+			break;
 		case BLKIO_THROTL_write_bps_device:
-			bps = pn->val.bps ? pn->val.bps : (-1);
-			blkio_update_group_bps(blkg, bps, pn->fileid);
+			blkio_update_group_bps(blkg, conf->bps[WRITE] ?: -1,
+					       pn->fileid);
 			break;
 		case BLKIO_THROTL_read_iops_device:
+			blkio_update_group_iops(blkg, conf->iops[READ] ?: -1,
+						pn->fileid);
+			break;
 		case BLKIO_THROTL_write_iops_device:
-			iops = pn->val.iops ? pn->val.iops : (-1);
-			blkio_update_group_iops(blkg, iops, pn->fileid);
+			blkio_update_group_iops(blkg, conf->iops[WRITE] ?: -1,
+						pn->fileid);
 			break;
 		}
 		break;
@@ -1119,7 +1155,7 @@ static int blkiocg_file_write(struct cgroup *cgrp, struct cftype *cft,
 	int ret = 0;
 	char *buf;
 	struct blkio_policy_node *newpn, *pn;
-	struct blkio_cgroup *blkcg;
+	struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp);
 	int keep_newpn = 0;
 	enum blkio_policy_id plid = BLKIOFILE_POLICY(cft->private);
 	int fileid = BLKIOFILE_ATTR(cft->private);
@@ -1134,12 +1170,10 @@ static int blkiocg_file_write(struct cgroup *cgrp, struct cftype *cft,
 		goto free_buf;
 	}
 
-	ret = blkio_policy_parse_and_set(buf, newpn, plid, fileid);
+	ret = blkio_policy_parse_and_set(buf, newpn, plid, fileid, blkcg);
 	if (ret)
 		goto free_newpn;
 
-	blkcg = cgroup_to_blkio_cgroup(cgrp);
-
 	spin_lock_irq(&blkcg->lock);
 
 	pn = blkio_policy_search_node(blkcg, newpn->dev, plid, fileid);
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index e8cbc52..07bc7d9 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -160,6 +160,12 @@ struct blkio_group_stats_cpu {
 	struct u64_stats_sync syncp;
 };
 
+struct blkio_group_conf {
+	unsigned int weight;
+	unsigned int iops[2];
+	u64 bps[2];
+};
+
 struct blkio_group {
 	/* Pointer to the associated request_queue, RCU protected */
 	struct request_queue __rcu *q;
@@ -172,6 +178,9 @@ struct blkio_group {
 	/* policy which owns this blk group */
 	enum blkio_policy_id plid;
 
+	/* Configuration */
+	struct blkio_group_conf conf;
+
 	/* Need to serialize the stats in the case of reset/update */
 	spinlock_t stats_lock;
 	struct blkio_group_stats stats;
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index bb5af29..880d166 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -196,10 +196,10 @@ static struct blkio_group *throtl_alloc_blkio_group(struct request_queue *q,
 	bio_list_init(&tg->bio_lists[1]);
 	tg->limits_changed = false;
 
-	tg->bps[READ] = blkcg_get_read_bps(blkcg, tg->blkg.dev);
-	tg->bps[WRITE] = blkcg_get_write_bps(blkcg, tg->blkg.dev);
-	tg->iops[READ] = blkcg_get_read_iops(blkcg, tg->blkg.dev);
-	tg->iops[WRITE] = blkcg_get_write_iops(blkcg, tg->blkg.dev);
+	tg->bps[READ] = -1;
+	tg->bps[WRITE] = -1;
+	tg->iops[READ] = -1;
+	tg->iops[WRITE] = -1;
 
 	/*
 	 * Take the initial reference that will be released on destroy
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index da16f05..dbb5cf3 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1070,7 +1070,7 @@ static struct blkio_group *cfq_alloc_blkio_group(struct request_queue *q,
 	RB_CLEAR_NODE(&cfqg->rb_node);
 
 	cfqg->ttime.last_end_request = jiffies;
-	cfqg->weight = blkcg_get_weight(blkcg, cfqg->blkg.dev);
+	cfqg->weight = blkcg->weight;
 
 	/*
 	 * Take the initial reference that will be released on destroy
-- 
1.7.7.3



* [PATCH 11/12] blkcg: kill blkio_policy_node
  2012-01-19  1:11 [PATCHSET] blkcg: kill policy node and blkg->dev Tejun Heo
                   ` (9 preceding siblings ...)
  2012-01-19  1:11 ` [PATCH 10/12] blkcg: don't allow or retain configuration of missing devices Tejun Heo
@ 2012-01-19  1:11 ` Tejun Heo
  2012-01-19 20:32   ` Vivek Goyal
  2012-01-19  1:11 ` [PATCH 12/12] blkcg: kill the mind-bending blkg->dev Tejun Heo
  2012-01-19  1:18 ` [PATCHSET] blkcg: kill policy node and blkg->dev Tejun Heo
  12 siblings, 1 reply; 36+ messages in thread
From: Tejun Heo @ 2012-01-19  1:11 UTC (permalink / raw)
  To: axboe, vgoyal; +Cc: ctalbott, rni, linux-kernel, Tejun Heo

Now that blkcg configuration lives in blkg's, blkio_policy_node is no
longer necessary.  Kill it.

blkio_policy_parse_and_set() now fails if invoked for a missing
device, and the functions that print out configurations are updated to
print from blkg's.

cftype_blkg_same_policy() is dropped along with the other policy node
functions for consistency.  Its one line is open-coded in its only
user - blkio_read_blkg_stats().
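
With the policy node gone, the cftype write handler collapses to
roughly the function below - a sketch of the post-patch state of
blkiocg_file_write() (see the hunk below); the buffer duplication
shown here stands in for the unchanged allocation that isn't part of
the hunk:

  static int blkiocg_file_write(struct cgroup *cgrp, struct cftype *cft,
                                const char *buffer)
  {
          struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp);
          char *buf;
          int ret;

          buf = kstrdup(buffer, GFP_KERNEL);      /* copy of the user string */
          if (!buf)
                  return -ENOMEM;

          ret = blkio_policy_parse_and_set(buf, BLKIOFILE_POLICY(cft->private),
                                           BLKIOFILE_ATTR(cft->private), blkcg);
          kfree(buf);
          return ret;
  }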

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
---
 block/blk-cgroup.c |  430 +++++++---------------------------------------------
 block/blk-cgroup.h |   32 ----
 2 files changed, 58 insertions(+), 404 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 1db8a27..e93610c 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -61,54 +61,6 @@ struct cgroup_subsys blkio_subsys = {
 };
 EXPORT_SYMBOL_GPL(blkio_subsys);
 
-static inline void blkio_policy_insert_node(struct blkio_cgroup *blkcg,
-					    struct blkio_policy_node *pn)
-{
-	list_add(&pn->node, &blkcg->policy_list);
-}
-
-static inline bool cftype_blkg_same_policy(struct cftype *cft,
-			struct blkio_group *blkg)
-{
-	enum blkio_policy_id plid = BLKIOFILE_POLICY(cft->private);
-
-	if (blkg->plid == plid)
-		return 1;
-
-	return 0;
-}
-
-/* Determines if policy node matches cgroup file being accessed */
-static inline bool pn_matches_cftype(struct cftype *cft,
-			struct blkio_policy_node *pn)
-{
-	enum blkio_policy_id plid = BLKIOFILE_POLICY(cft->private);
-	int fileid = BLKIOFILE_ATTR(cft->private);
-
-	return (plid == pn->plid && fileid == pn->fileid);
-}
-
-/* Must be called with blkcg->lock held */
-static inline void blkio_policy_delete_node(struct blkio_policy_node *pn)
-{
-	list_del(&pn->node);
-}
-
-/* Must be called with blkcg->lock held */
-static struct blkio_policy_node *
-blkio_policy_search_node(const struct blkio_cgroup *blkcg, dev_t dev,
-		enum blkio_policy_id plid, int fileid)
-{
-	struct blkio_policy_node *pn;
-
-	list_for_each_entry(pn, &blkcg->policy_list, node) {
-		if (pn->dev == dev && pn->plid == plid && pn->fileid == fileid)
-			return pn;
-	}
-
-	return NULL;
-}
-
 struct blkio_cgroup *cgroup_to_blkio_cgroup(struct cgroup *cgroup)
 {
 	return container_of(cgroup_subsys_state(cgroup, blkio_subsys_id),
@@ -821,10 +773,8 @@ static uint64_t blkio_get_stat(struct blkio_group *blkg,
 	return disk_total;
 }
 
-static int blkio_policy_parse_and_set(char *buf,
-				      struct blkio_policy_node *newpn,
-				      enum blkio_policy_id plid, int fileid,
-				      struct blkio_cgroup *blkcg)
+static int blkio_policy_parse_and_set(char *buf, enum blkio_policy_id plid,
+				      int fileid, struct blkio_cgroup *blkcg)
 {
 	struct gendisk *disk = NULL;
 	struct blkio_group *blkg = NULL;
@@ -872,23 +822,18 @@ static int blkio_policy_parse_and_set(char *buf,
 	if (strict_strtoull(s[1], 10, &temp))
 		goto out;
 
-	/* For rule removal, do not check for device presence. */
 	disk = get_gendisk(dev, &part);
-
-	if ((!disk || part) && temp) {
-		ret = -ENODEV;
+	if (!disk || part)
 		goto out;
-	}
 
 	rcu_read_lock();
 
-	if (disk && !part) {
-		spin_lock_irq(disk->queue->queue_lock);
-		blkg = blkg_lookup_create(blkcg, disk->queue, plid);
-		spin_unlock_irq(disk->queue->queue_lock);
-	}
+	spin_lock_irq(disk->queue->queue_lock);
+	blkg = blkg_lookup_create(blkcg, disk->queue, plid);
+	spin_unlock_irq(disk->queue->queue_lock);
 
-	newpn->dev = dev;
+	if (!blkg)
+		goto out_unlock;
 
 	switch (plid) {
 	case BLKIO_POLICY_PROP:
@@ -896,47 +841,30 @@ static int blkio_policy_parse_and_set(char *buf,
 		     temp > BLKIO_WEIGHT_MAX)
 			goto out_unlock;
 
-		newpn->plid = plid;
-		newpn->fileid = fileid;
-		newpn->val.weight = temp;
-		if (blkg)
-			blkg->conf.weight = temp;
+		blkg->conf.weight = temp;
+		blkio_update_group_weight(blkg, temp ?: blkcg->weight);
 		break;
 	case BLKIO_POLICY_THROTL:
 		switch(fileid) {
 		case BLKIO_THROTL_read_bps_device:
-			if (blkg)
-				blkg->conf.bps[READ] = temp;
-			newpn->plid = plid;
-			newpn->fileid = fileid;
-			newpn->val.bps = temp;
+			blkg->conf.bps[READ] = temp;
+			blkio_update_group_bps(blkg, temp ?: -1, fileid);
 			break;
 		case BLKIO_THROTL_write_bps_device:
-			if (blkg)
-				blkg->conf.bps[WRITE] = temp;
-			newpn->plid = plid;
-			newpn->fileid = fileid;
-			newpn->val.bps = temp;
+			blkg->conf.bps[WRITE] = temp;
+			blkio_update_group_bps(blkg, temp ?: -1, fileid);
 			break;
 		case BLKIO_THROTL_read_iops_device:
 			if (temp > THROTL_IOPS_MAX)
 				goto out_unlock;
-
-			if (blkg)
-				blkg->conf.iops[READ] = temp;
-			newpn->plid = plid;
-			newpn->fileid = fileid;
-			newpn->val.iops = (unsigned int)temp;
+			blkg->conf.iops[READ] = temp;
+			blkio_update_group_iops(blkg, temp ?: -1, fileid);
 			break;
 		case BLKIO_THROTL_write_iops_device:
 			if (temp > THROTL_IOPS_MAX)
 				goto out_unlock;
-
-			if (blkg)
-				blkg->conf.iops[WRITE] = temp;
-			newpn->plid = plid;
-			newpn->fileid = fileid;
-			newpn->val.iops = (unsigned int)temp;
+			blkg->conf.iops[WRITE] = temp;
+			blkio_update_group_iops(blkg, temp ?: -1, fileid);
 			break;
 		}
 		break;
@@ -951,212 +879,12 @@ out:
 	return ret;
 }
 
-unsigned int blkcg_get_weight(struct blkio_cgroup *blkcg,
-			      dev_t dev)
-{
-	struct blkio_policy_node *pn;
-	unsigned long flags;
-	unsigned int weight;
-
-	spin_lock_irqsave(&blkcg->lock, flags);
-
-	pn = blkio_policy_search_node(blkcg, dev, BLKIO_POLICY_PROP,
-				BLKIO_PROP_weight_device);
-	if (pn)
-		weight = pn->val.weight;
-	else
-		weight = blkcg->weight;
-
-	spin_unlock_irqrestore(&blkcg->lock, flags);
-
-	return weight;
-}
-EXPORT_SYMBOL_GPL(blkcg_get_weight);
-
-uint64_t blkcg_get_read_bps(struct blkio_cgroup *blkcg, dev_t dev)
-{
-	struct blkio_policy_node *pn;
-	unsigned long flags;
-	uint64_t bps = -1;
-
-	spin_lock_irqsave(&blkcg->lock, flags);
-	pn = blkio_policy_search_node(blkcg, dev, BLKIO_POLICY_THROTL,
-				BLKIO_THROTL_read_bps_device);
-	if (pn)
-		bps = pn->val.bps;
-	spin_unlock_irqrestore(&blkcg->lock, flags);
-
-	return bps;
-}
-
-uint64_t blkcg_get_write_bps(struct blkio_cgroup *blkcg, dev_t dev)
-{
-	struct blkio_policy_node *pn;
-	unsigned long flags;
-	uint64_t bps = -1;
-
-	spin_lock_irqsave(&blkcg->lock, flags);
-	pn = blkio_policy_search_node(blkcg, dev, BLKIO_POLICY_THROTL,
-				BLKIO_THROTL_write_bps_device);
-	if (pn)
-		bps = pn->val.bps;
-	spin_unlock_irqrestore(&blkcg->lock, flags);
-
-	return bps;
-}
-
-unsigned int blkcg_get_read_iops(struct blkio_cgroup *blkcg, dev_t dev)
-{
-	struct blkio_policy_node *pn;
-	unsigned long flags;
-	unsigned int iops = -1;
-
-	spin_lock_irqsave(&blkcg->lock, flags);
-	pn = blkio_policy_search_node(blkcg, dev, BLKIO_POLICY_THROTL,
-				BLKIO_THROTL_read_iops_device);
-	if (pn)
-		iops = pn->val.iops;
-	spin_unlock_irqrestore(&blkcg->lock, flags);
-
-	return iops;
-}
-
-unsigned int blkcg_get_write_iops(struct blkio_cgroup *blkcg, dev_t dev)
-{
-	struct blkio_policy_node *pn;
-	unsigned long flags;
-	unsigned int iops = -1;
-
-	spin_lock_irqsave(&blkcg->lock, flags);
-	pn = blkio_policy_search_node(blkcg, dev, BLKIO_POLICY_THROTL,
-				BLKIO_THROTL_write_iops_device);
-	if (pn)
-		iops = pn->val.iops;
-	spin_unlock_irqrestore(&blkcg->lock, flags);
-
-	return iops;
-}
-
-/* Checks whether user asked for deleting a policy rule */
-static bool blkio_delete_rule_command(struct blkio_policy_node *pn)
-{
-	switch(pn->plid) {
-	case BLKIO_POLICY_PROP:
-		if (pn->val.weight == 0)
-			return 1;
-		break;
-	case BLKIO_POLICY_THROTL:
-		switch(pn->fileid) {
-		case BLKIO_THROTL_read_bps_device:
-		case BLKIO_THROTL_write_bps_device:
-			if (pn->val.bps == 0)
-				return 1;
-			break;
-		case BLKIO_THROTL_read_iops_device:
-		case BLKIO_THROTL_write_iops_device:
-			if (pn->val.iops == 0)
-				return 1;
-		}
-		break;
-	default:
-		BUG();
-	}
-
-	return 0;
-}
-
-static void blkio_update_policy_rule(struct blkio_policy_node *oldpn,
-					struct blkio_policy_node *newpn)
-{
-	switch(oldpn->plid) {
-	case BLKIO_POLICY_PROP:
-		oldpn->val.weight = newpn->val.weight;
-		break;
-	case BLKIO_POLICY_THROTL:
-		switch(newpn->fileid) {
-		case BLKIO_THROTL_read_bps_device:
-		case BLKIO_THROTL_write_bps_device:
-			oldpn->val.bps = newpn->val.bps;
-			break;
-		case BLKIO_THROTL_read_iops_device:
-		case BLKIO_THROTL_write_iops_device:
-			oldpn->val.iops = newpn->val.iops;
-		}
-		break;
-	default:
-		BUG();
-	}
-}
-
-/*
- * Some rules/values in blkg have changed. Propagate those to respective
- * policies.
- */
-static void blkio_update_blkg_policy(struct blkio_cgroup *blkcg,
-		struct blkio_group *blkg, struct blkio_policy_node *pn)
-{
-	struct blkio_group_conf *conf = &blkg->conf;
-
-	switch(pn->plid) {
-	case BLKIO_POLICY_PROP:
-		blkio_update_group_weight(blkg, conf->weight ?: blkcg->weight);
-		break;
-	case BLKIO_POLICY_THROTL:
-		switch(pn->fileid) {
-		case BLKIO_THROTL_read_bps_device:
-			blkio_update_group_bps(blkg, conf->bps[READ] ?: -1,
-					       pn->fileid);
-			break;
-		case BLKIO_THROTL_write_bps_device:
-			blkio_update_group_bps(blkg, conf->bps[WRITE] ?: -1,
-					       pn->fileid);
-			break;
-		case BLKIO_THROTL_read_iops_device:
-			blkio_update_group_iops(blkg, conf->iops[READ] ?: -1,
-						pn->fileid);
-			break;
-		case BLKIO_THROTL_write_iops_device:
-			blkio_update_group_iops(blkg, conf->iops[WRITE] ?: -1,
-						pn->fileid);
-			break;
-		}
-		break;
-	default:
-		BUG();
-	}
-}
-
-/*
- * A policy node rule has been updated. Propagate this update to all the
- * block groups which might be affected by this update.
- */
-static void blkio_update_policy_node_blkg(struct blkio_cgroup *blkcg,
-				struct blkio_policy_node *pn)
-{
-	struct blkio_group *blkg;
-	struct hlist_node *n;
-
-	spin_lock(&blkio_list_lock);
-	spin_lock_irq(&blkcg->lock);
-
-	hlist_for_each_entry(blkg, n, &blkcg->blkg_list, blkcg_node) {
-		if (pn->dev != blkg->dev || pn->plid != blkg->plid)
-			continue;
-		blkio_update_blkg_policy(blkcg, blkg, pn);
-	}
-
-	spin_unlock_irq(&blkcg->lock);
-	spin_unlock(&blkio_list_lock);
-}
-
 static int blkiocg_file_write(struct cgroup *cgrp, struct cftype *cft,
  				       const char *buffer)
 {
 	int ret = 0;
 	char *buf;
-	struct blkio_policy_node *newpn, *pn;
 	struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp);
-	int keep_newpn = 0;
 	enum blkio_policy_id plid = BLKIOFILE_POLICY(cft->private);
 	int fileid = BLKIOFILE_ATTR(cft->private);
 
@@ -1164,69 +892,42 @@ static int blkiocg_file_write(struct cgroup *cgrp, struct cftype *cft,
 	if (!buf)
 		return -ENOMEM;
 
-	newpn = kzalloc(sizeof(*newpn), GFP_KERNEL);
-	if (!newpn) {
-		ret = -ENOMEM;
-		goto free_buf;
-	}
-
-	ret = blkio_policy_parse_and_set(buf, newpn, plid, fileid, blkcg);
-	if (ret)
-		goto free_newpn;
-
-	spin_lock_irq(&blkcg->lock);
-
-	pn = blkio_policy_search_node(blkcg, newpn->dev, plid, fileid);
-	if (!pn) {
-		if (!blkio_delete_rule_command(newpn)) {
-			blkio_policy_insert_node(blkcg, newpn);
-			keep_newpn = 1;
-		}
-		spin_unlock_irq(&blkcg->lock);
-		goto update_io_group;
-	}
-
-	if (blkio_delete_rule_command(newpn)) {
-		blkio_policy_delete_node(pn);
-		kfree(pn);
-		spin_unlock_irq(&blkcg->lock);
-		goto update_io_group;
-	}
-	spin_unlock_irq(&blkcg->lock);
-
-	blkio_update_policy_rule(pn, newpn);
-
-update_io_group:
-	blkio_update_policy_node_blkg(blkcg, newpn);
-
-free_newpn:
-	if (!keep_newpn)
-		kfree(newpn);
-free_buf:
+	ret = blkio_policy_parse_and_set(buf, plid, fileid, blkcg);
 	kfree(buf);
 	return ret;
 }
 
-static void
-blkio_print_policy_node(struct seq_file *m, struct blkio_policy_node *pn)
+static void blkio_print_group_conf(struct cftype *cft, struct blkio_group *blkg,
+				   struct seq_file *m)
 {
-	switch(pn->plid) {
+	int fileid = BLKIOFILE_ATTR(cft->private);
+	int rw = WRITE;
+
+	switch (blkg->plid) {
 		case BLKIO_POLICY_PROP:
-			if (pn->fileid == BLKIO_PROP_weight_device)
-				seq_printf(m, "%u:%u\t%u\n", MAJOR(pn->dev),
-					MINOR(pn->dev), pn->val.weight);
+			if (blkg->conf.weight)
+				seq_printf(m, "%u:%u\t%u\n", MAJOR(blkg->dev),
+					MINOR(blkg->dev), blkg->conf.weight);
 			break;
 		case BLKIO_POLICY_THROTL:
-			switch(pn->fileid) {
+			switch (fileid) {
 			case BLKIO_THROTL_read_bps_device:
+				rw = READ;
 			case BLKIO_THROTL_write_bps_device:
-				seq_printf(m, "%u:%u\t%llu\n", MAJOR(pn->dev),
-					MINOR(pn->dev), pn->val.bps);
+				if (blkg->conf.bps[rw])
+					seq_printf(m, "%u:%u\t%llu\n",
+						   MAJOR(blkg->dev),
+						   MINOR(blkg->dev),
+						   blkg->conf.bps[rw]);
 				break;
 			case BLKIO_THROTL_read_iops_device:
+				rw = READ;
 			case BLKIO_THROTL_write_iops_device:
-				seq_printf(m, "%u:%u\t%u\n", MAJOR(pn->dev),
-					MINOR(pn->dev), pn->val.iops);
+				if (blkg->conf.iops[rw])
+					seq_printf(m, "%u:%u\t%u\n",
+						   MAJOR(blkg->dev),
+						   MINOR(blkg->dev),
+						   blkg->conf.iops[rw]);
 				break;
 			}
 			break;
@@ -1236,20 +937,17 @@ blkio_print_policy_node(struct seq_file *m, struct blkio_policy_node *pn)
 }
 
 /* cgroup files which read their data from policy nodes end up here */
-static void blkio_read_policy_node_files(struct cftype *cft,
-			struct blkio_cgroup *blkcg, struct seq_file *m)
+static void blkio_read_conf(struct cftype *cft, struct blkio_cgroup *blkcg,
+			    struct seq_file *m)
 {
-	struct blkio_policy_node *pn;
+	struct blkio_group *blkg;
+	struct hlist_node *n;
 
-	if (!list_empty(&blkcg->policy_list)) {
-		spin_lock_irq(&blkcg->lock);
-		list_for_each_entry(pn, &blkcg->policy_list, node) {
-			if (!pn_matches_cftype(cft, pn))
-				continue;
-			blkio_print_policy_node(m, pn);
-		}
-		spin_unlock_irq(&blkcg->lock);
-	}
+	spin_lock_irq(&blkcg->lock);
+	hlist_for_each_entry(blkg, n, &blkcg->blkg_list, blkcg_node)
+		if (BLKIOFILE_POLICY(cft->private) == blkg->plid)
+			blkio_print_group_conf(cft, blkg, m);
+	spin_unlock_irq(&blkcg->lock);
 }
 
 static int blkiocg_file_read(struct cgroup *cgrp, struct cftype *cft,
@@ -1265,7 +963,7 @@ static int blkiocg_file_read(struct cgroup *cgrp, struct cftype *cft,
 	case BLKIO_POLICY_PROP:
 		switch(name) {
 		case BLKIO_PROP_weight_device:
-			blkio_read_policy_node_files(cft, blkcg, m);
+			blkio_read_conf(cft, blkcg, m);
 			return 0;
 		default:
 			BUG();
@@ -1277,7 +975,7 @@ static int blkiocg_file_read(struct cgroup *cgrp, struct cftype *cft,
 		case BLKIO_THROTL_write_bps_device:
 		case BLKIO_THROTL_read_iops_device:
 		case BLKIO_THROTL_write_iops_device:
-			blkio_read_policy_node_files(cft, blkcg, m);
+			blkio_read_conf(cft, blkcg, m);
 			return 0;
 		default:
 			BUG();
@@ -1301,7 +999,7 @@ static int blkio_read_blkg_stats(struct blkio_cgroup *blkcg,
 	rcu_read_lock();
 	hlist_for_each_entry_rcu(blkg, n, &blkcg->blkg_list, blkcg_node) {
 		if (blkg->dev) {
-			if (!cftype_blkg_same_policy(cft, blkg))
+			if (BLKIOFILE_POLICY(cft->private) != blkg->plid)
 				continue;
 			if (pcpu)
 				cgroup_total += blkio_get_stat_cpu(blkg, cb,
@@ -1400,11 +1098,10 @@ static int blkiocg_file_read_map(struct cgroup *cgrp, struct cftype *cft,
 	return 0;
 }
 
-static int blkio_weight_write(struct blkio_cgroup *blkcg, u64 val)
+static int blkio_weight_write(struct blkio_cgroup *blkcg, int plid, u64 val)
 {
 	struct blkio_group *blkg;
 	struct hlist_node *n;
-	struct blkio_policy_node *pn;
 
 	if (val < BLKIO_WEIGHT_MIN || val > BLKIO_WEIGHT_MAX)
 		return -EINVAL;
@@ -1413,14 +1110,10 @@ static int blkio_weight_write(struct blkio_cgroup *blkcg, u64 val)
 	spin_lock_irq(&blkcg->lock);
 	blkcg->weight = (unsigned int)val;
 
-	hlist_for_each_entry(blkg, n, &blkcg->blkg_list, blkcg_node) {
-		pn = blkio_policy_search_node(blkcg, blkg->dev,
-				BLKIO_POLICY_PROP, BLKIO_PROP_weight_device);
-		if (pn)
-			continue;
+	hlist_for_each_entry(blkg, n, &blkcg->blkg_list, blkcg_node)
+		if (blkg->plid == plid && !blkg->conf.weight)
+			blkio_update_group_weight(blkg, blkcg->weight);
 
-		blkio_update_group_weight(blkg, blkcg->weight);
-	}
 	spin_unlock_irq(&blkcg->lock);
 	spin_unlock(&blkio_list_lock);
 	return 0;
@@ -1459,7 +1152,7 @@ blkiocg_file_write_u64(struct cgroup *cgrp, struct cftype *cft, u64 val)
 	case BLKIO_POLICY_PROP:
 		switch(name) {
 		case BLKIO_PROP_weight:
-			return blkio_weight_write(blkcg, val);
+			return blkio_weight_write(blkcg, plid, val);
 		}
 		break;
 	default:
@@ -1640,7 +1333,6 @@ static void blkiocg_destroy(struct cgroup_subsys *subsys, struct cgroup *cgroup)
 	struct blkio_group *blkg;
 	struct request_queue *q;
 	struct blkio_policy_type *blkiop;
-	struct blkio_policy_node *pn, *pntmp;
 
 	rcu_read_lock();
 	do {
@@ -1672,11 +1364,6 @@ static void blkiocg_destroy(struct cgroup_subsys *subsys, struct cgroup *cgroup)
 		spin_unlock(&blkio_list_lock);
 	} while (1);
 
-	list_for_each_entry_safe(pn, pntmp, &blkcg->policy_list, node) {
-		blkio_policy_delete_node(pn);
-		kfree(pn);
-	}
-
 	free_css_id(&blkio_subsys, &blkcg->css);
 	rcu_read_unlock();
 	if (blkcg != &blkio_root_cgroup)
@@ -1703,7 +1390,6 @@ done:
 	spin_lock_init(&blkcg->lock);
 	INIT_HLIST_HEAD(&blkcg->blkg_list);
 
-	INIT_LIST_HEAD(&blkcg->policy_list);
 	return &blkcg->css;
 }
 
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index 07bc7d9..5524971 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -118,7 +118,6 @@ struct blkio_cgroup {
 	unsigned int weight;
 	spinlock_t lock;
 	struct hlist_head blkg_list;
-	struct list_head policy_list; /* list of blkio_policy_node */
 };
 
 struct blkio_group_stats {
@@ -188,37 +187,6 @@ struct blkio_group {
 	struct blkio_group_stats_cpu __percpu *stats_cpu;
 };
 
-struct blkio_policy_node {
-	struct list_head node;
-	dev_t dev;
-	/* This node belongs to max bw policy or porportional weight policy */
-	enum blkio_policy_id plid;
-	/* cgroup file to which this rule belongs to */
-	int fileid;
-
-	union {
-		unsigned int weight;
-		/*
-		 * Rate read/write in terms of bytes per second
-		 * Whether this rate represents read or write is determined
-		 * by file type "fileid".
-		 */
-		u64 bps;
-		unsigned int iops;
-	} val;
-};
-
-extern unsigned int blkcg_get_weight(struct blkio_cgroup *blkcg,
-				     dev_t dev);
-extern uint64_t blkcg_get_read_bps(struct blkio_cgroup *blkcg,
-				     dev_t dev);
-extern uint64_t blkcg_get_write_bps(struct blkio_cgroup *blkcg,
-				     dev_t dev);
-extern unsigned int blkcg_get_read_iops(struct blkio_cgroup *blkcg,
-				     dev_t dev);
-extern unsigned int blkcg_get_write_iops(struct blkio_cgroup *blkcg,
-				     dev_t dev);
-
 typedef struct blkio_group *(blkio_alloc_group_fn)(struct request_queue *q,
 						   struct blkio_cgroup *blkcg);
 typedef void (blkio_link_group_fn)(struct request_queue *q,
-- 
1.7.7.3



* [PATCH 12/12] blkcg: kill the mind-bending blkg->dev
  2012-01-19  1:11 [PATCHSET] blkcg: kill policy node and blkg->dev Tejun Heo
                   ` (10 preceding siblings ...)
  2012-01-19  1:11 ` [PATCH 11/12] blkcg: kill blkio_policy_node Tejun Heo
@ 2012-01-19  1:11 ` Tejun Heo
  2012-01-19 15:49   ` Vivek Goyal
  2012-01-19  1:18 ` [PATCHSET] blkcg: kill policy node and blkg->dev Tejun Heo
  12 siblings, 1 reply; 36+ messages in thread
From: Tejun Heo @ 2012-01-19  1:11 UTC (permalink / raw)
  To: axboe, vgoyal; +Cc: ctalbott, rni, linux-kernel, Tejun Heo

blkg->dev is a dev_t recording the device number of the block device
for the associated request_queue.  It is used to identify the
associated block device when printing out configuration or stats.

This is redundant to begin with.  A blkg is an association between a
cgroup and a request_queue; the request_queue can of course be reached
from the blkg, and synchronization conventions are in place for safe q
dereferencing, so the field shouldn't be necessary in the first place.
Furthermore, it's initialized by sscanf()ing the device name of the
backing_dev_info.  The mind boggles.

Anyways, if a blkg is visible under the rcu lock, we *know* that the
associated request_queue hasn't gone away yet and that its bdi is
registered and alive - a blkg can't be created for a request_queue
which hasn't been fully initialized, and the queue can't go away
before the blkg is removed.

Let the stat and conf read functions get the device name from
blkg->q->backing_dev_info.dev, pass it down to the printing functions,
and remove blkg->dev.
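
Schematically, the readers now derive the name like this (an
illustrative sketch; the exact helper signatures are in the hunks
below):

  /* blkg is visible under rcu_read_lock() / blkcg->lock, so its queue
   * and the queue's bdi are guaranteed to still be around */
  const char *dname = dev_name(blkg->q->backing_dev_info.dev);

  if (blkg->conf.weight)
          seq_printf(m, "%s\t%u\n", dname, blkg->conf.weight);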

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
---
 block/blk-cgroup.c   |   86 ++++++++++++++++++++++++++------------------------
 block/blk-cgroup.h   |    2 -
 block/blk-throttle.c |   51 +----------------------------
 block/cfq-iosched.c  |   21 ------------
 4 files changed, 47 insertions(+), 113 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index e93610c..c34fb6e 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -629,10 +629,10 @@ blkiocg_reset_stats(struct cgroup *cgroup, struct cftype *cftype, u64 val)
 	return 0;
 }
 
-static void blkio_get_key_name(enum stat_sub_type type, dev_t dev, char *str,
-				int chars_left, bool diskname_only)
+static void blkio_get_key_name(enum stat_sub_type type, const char *dname,
+			       char *str, int chars_left, bool diskname_only)
 {
-	snprintf(str, chars_left, "%d:%d", MAJOR(dev), MINOR(dev));
+	snprintf(str, chars_left, "%s", dname);
 	chars_left -= strlen(str);
 	if (chars_left <= 0) {
 		printk(KERN_WARNING
@@ -663,9 +663,9 @@ static void blkio_get_key_name(enum stat_sub_type type, dev_t dev, char *str,
 }
 
 static uint64_t blkio_fill_stat(char *str, int chars_left, uint64_t val,
-				struct cgroup_map_cb *cb, dev_t dev)
+				struct cgroup_map_cb *cb, const char *dname)
 {
-	blkio_get_key_name(0, dev, str, chars_left, true);
+	blkio_get_key_name(0, dname, str, chars_left, true);
 	cb->fill(cb, str, val);
 	return val;
 }
@@ -697,7 +697,8 @@ static uint64_t blkio_read_stat_cpu(struct blkio_group *blkg,
 }
 
 static uint64_t blkio_get_stat_cpu(struct blkio_group *blkg,
-		struct cgroup_map_cb *cb, dev_t dev, enum stat_type_cpu type)
+				   struct cgroup_map_cb *cb, const char *dname,
+				   enum stat_type_cpu type)
 {
 	uint64_t disk_total, val;
 	char key_str[MAX_KEY_LEN];
@@ -705,12 +706,14 @@ static uint64_t blkio_get_stat_cpu(struct blkio_group *blkg,
 
 	if (type == BLKIO_STAT_CPU_SECTORS) {
 		val = blkio_read_stat_cpu(blkg, type, 0);
-		return blkio_fill_stat(key_str, MAX_KEY_LEN - 1, val, cb, dev);
+		return blkio_fill_stat(key_str, MAX_KEY_LEN - 1, val, cb,
+				       dname);
 	}
 
 	for (sub_type = BLKIO_STAT_READ; sub_type < BLKIO_STAT_TOTAL;
 			sub_type++) {
-		blkio_get_key_name(sub_type, dev, key_str, MAX_KEY_LEN, false);
+		blkio_get_key_name(sub_type, dname, key_str, MAX_KEY_LEN,
+				   false);
 		val = blkio_read_stat_cpu(blkg, type, sub_type);
 		cb->fill(cb, key_str, val);
 	}
@@ -718,14 +721,16 @@ static uint64_t blkio_get_stat_cpu(struct blkio_group *blkg,
 	disk_total = blkio_read_stat_cpu(blkg, type, BLKIO_STAT_READ) +
 			blkio_read_stat_cpu(blkg, type, BLKIO_STAT_WRITE);
 
-	blkio_get_key_name(BLKIO_STAT_TOTAL, dev, key_str, MAX_KEY_LEN, false);
+	blkio_get_key_name(BLKIO_STAT_TOTAL, dname, key_str, MAX_KEY_LEN,
+			   false);
 	cb->fill(cb, key_str, disk_total);
 	return disk_total;
 }
 
 /* This should be called with blkg->stats_lock held */
 static uint64_t blkio_get_stat(struct blkio_group *blkg,
-		struct cgroup_map_cb *cb, dev_t dev, enum stat_type type)
+			       struct cgroup_map_cb *cb, const char *dname,
+			       enum stat_type type)
 {
 	uint64_t disk_total;
 	char key_str[MAX_KEY_LEN];
@@ -733,11 +738,11 @@ static uint64_t blkio_get_stat(struct blkio_group *blkg,
 
 	if (type == BLKIO_STAT_TIME)
 		return blkio_fill_stat(key_str, MAX_KEY_LEN - 1,
-					blkg->stats.time, cb, dev);
+					blkg->stats.time, cb, dname);
 #ifdef CONFIG_DEBUG_BLK_CGROUP
 	if (type == BLKIO_STAT_UNACCOUNTED_TIME)
 		return blkio_fill_stat(key_str, MAX_KEY_LEN - 1,
-					blkg->stats.unaccounted_time, cb, dev);
+				       blkg->stats.unaccounted_time, cb, dname);
 	if (type == BLKIO_STAT_AVG_QUEUE_SIZE) {
 		uint64_t sum = blkg->stats.avg_queue_size_sum;
 		uint64_t samples = blkg->stats.avg_queue_size_samples;
@@ -745,30 +750,33 @@ static uint64_t blkio_get_stat(struct blkio_group *blkg,
 			do_div(sum, samples);
 		else
 			sum = 0;
-		return blkio_fill_stat(key_str, MAX_KEY_LEN - 1, sum, cb, dev);
+		return blkio_fill_stat(key_str, MAX_KEY_LEN - 1,
+				       sum, cb, dname);
 	}
 	if (type == BLKIO_STAT_GROUP_WAIT_TIME)
 		return blkio_fill_stat(key_str, MAX_KEY_LEN - 1,
-					blkg->stats.group_wait_time, cb, dev);
+				       blkg->stats.group_wait_time, cb, dname);
 	if (type == BLKIO_STAT_IDLE_TIME)
 		return blkio_fill_stat(key_str, MAX_KEY_LEN - 1,
-					blkg->stats.idle_time, cb, dev);
+				       blkg->stats.idle_time, cb, dname);
 	if (type == BLKIO_STAT_EMPTY_TIME)
 		return blkio_fill_stat(key_str, MAX_KEY_LEN - 1,
-					blkg->stats.empty_time, cb, dev);
+				       blkg->stats.empty_time, cb, dname);
 	if (type == BLKIO_STAT_DEQUEUE)
 		return blkio_fill_stat(key_str, MAX_KEY_LEN - 1,
-					blkg->stats.dequeue, cb, dev);
+				       blkg->stats.dequeue, cb, dname);
 #endif
 
 	for (sub_type = BLKIO_STAT_READ; sub_type < BLKIO_STAT_TOTAL;
 			sub_type++) {
-		blkio_get_key_name(sub_type, dev, key_str, MAX_KEY_LEN, false);
+		blkio_get_key_name(sub_type, dname, key_str, MAX_KEY_LEN,
+				   false);
 		cb->fill(cb, key_str, blkg->stats.stat_arr[type][sub_type]);
 	}
 	disk_total = blkg->stats.stat_arr[type][BLKIO_STAT_READ] +
 			blkg->stats.stat_arr[type][BLKIO_STAT_WRITE];
-	blkio_get_key_name(BLKIO_STAT_TOTAL, dev, key_str, MAX_KEY_LEN, false);
+	blkio_get_key_name(BLKIO_STAT_TOTAL, dname, key_str, MAX_KEY_LEN,
+			   false);
 	cb->fill(cb, key_str, disk_total);
 	return disk_total;
 }
@@ -900,14 +908,15 @@ static int blkiocg_file_write(struct cgroup *cgrp, struct cftype *cft,
 static void blkio_print_group_conf(struct cftype *cft, struct blkio_group *blkg,
 				   struct seq_file *m)
 {
+	const char *dname = dev_name(blkg->q->backing_dev_info.dev);
 	int fileid = BLKIOFILE_ATTR(cft->private);
 	int rw = WRITE;
 
 	switch (blkg->plid) {
 		case BLKIO_POLICY_PROP:
 			if (blkg->conf.weight)
-				seq_printf(m, "%u:%u\t%u\n", MAJOR(blkg->dev),
-					MINOR(blkg->dev), blkg->conf.weight);
+				seq_printf(m, "%s\t%u\n",
+					   dname, blkg->conf.weight);
 			break;
 		case BLKIO_POLICY_THROTL:
 			switch (fileid) {
@@ -915,19 +924,15 @@ static void blkio_print_group_conf(struct cftype *cft, struct blkio_group *blkg,
 				rw = READ;
 			case BLKIO_THROTL_write_bps_device:
 				if (blkg->conf.bps[rw])
-					seq_printf(m, "%u:%u\t%llu\n",
-						   MAJOR(blkg->dev),
-						   MINOR(blkg->dev),
-						   blkg->conf.bps[rw]);
+					seq_printf(m, "%s\t%llu\n",
+						   dname, blkg->conf.bps[rw]);
 				break;
 			case BLKIO_THROTL_read_iops_device:
 				rw = READ;
 			case BLKIO_THROTL_write_iops_device:
 				if (blkg->conf.iops[rw])
-					seq_printf(m, "%u:%u\t%u\n",
-						   MAJOR(blkg->dev),
-						   MINOR(blkg->dev),
-						   blkg->conf.iops[rw]);
+					seq_printf(m, "%s\t%u\n",
+						   dname, blkg->conf.iops[rw]);
 				break;
 			}
 			break;
@@ -998,18 +1003,17 @@ static int blkio_read_blkg_stats(struct blkio_cgroup *blkcg,
 
 	rcu_read_lock();
 	hlist_for_each_entry_rcu(blkg, n, &blkcg->blkg_list, blkcg_node) {
-		if (blkg->dev) {
-			if (BLKIOFILE_POLICY(cft->private) != blkg->plid)
-				continue;
-			if (pcpu)
-				cgroup_total += blkio_get_stat_cpu(blkg, cb,
-						blkg->dev, type);
-			else {
-				spin_lock_irq(&blkg->stats_lock);
-				cgroup_total += blkio_get_stat(blkg, cb,
-						blkg->dev, type);
-				spin_unlock_irq(&blkg->stats_lock);
-			}
+		const char *dname = dev_name(blkg->q->backing_dev_info.dev);
+
+		if (BLKIOFILE_POLICY(cft->private) != blkg->plid)
+			continue;
+		if (pcpu)
+			cgroup_total += blkio_get_stat_cpu(blkg, cb, dname,
+							   type);
+		else {
+			spin_lock_irq(&blkg->stats_lock);
+			cgroup_total += blkio_get_stat(blkg, cb, dname, type);
+			spin_unlock_irq(&blkg->stats_lock);
 		}
 	}
 	if (show_total)
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index 5524971..d96a5ca 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -172,8 +172,6 @@ struct blkio_group {
 	unsigned short blkcg_id;
 	/* Store cgroup path */
 	char path[128];
-	/* The device MKDEV(major, minor), this group has been created for */
-	dev_t dev;
 	/* policy which owns this blk group */
 	enum blkio_policy_id plid;
 
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 880d166..0e78b88 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -212,50 +212,12 @@ static struct blkio_group *throtl_alloc_blkio_group(struct request_queue *q,
 	return &tg->blkg;
 }
 
-static void
-__throtl_tg_fill_dev_details(struct throtl_data *td, struct throtl_grp *tg)
-{
-	struct backing_dev_info *bdi = &td->queue->backing_dev_info;
-	unsigned int major, minor;
-
-	if (!tg || tg->blkg.dev)
-		return;
-
-	/*
-	 * Fill in device details for a group which might not have been
-	 * filled at group creation time as queue was being instantiated
-	 * and driver had not attached a device yet
-	 */
-	if (bdi->dev && dev_name(bdi->dev)) {
-		sscanf(dev_name(bdi->dev), "%u:%u", &major, &minor);
-		tg->blkg.dev = MKDEV(major, minor);
-	}
-}
-
-/*
- * Should be called with without queue lock held. Here queue lock will be
- * taken rarely. It will be taken only once during life time of a group
- * if need be
- */
-static void
-throtl_tg_fill_dev_details(struct throtl_data *td, struct throtl_grp *tg)
-{
-	if (!tg || tg->blkg.dev)
-		return;
-
-	spin_lock_irq(td->queue->queue_lock);
-	__throtl_tg_fill_dev_details(td, tg);
-	spin_unlock_irq(td->queue->queue_lock);
-}
-
 static void throtl_link_blkio_group(struct request_queue *q,
 				    struct blkio_group *blkg)
 {
 	struct throtl_data *td = q->td;
 	struct throtl_grp *tg = tg_of_blkg(blkg);
 
-	__throtl_tg_fill_dev_details(td, tg);
-
 	hlist_add_head(&tg->tg_node, &td->tg_list);
 	td->nr_undestroyed_grps++;
 }
@@ -263,20 +225,14 @@ static void throtl_link_blkio_group(struct request_queue *q,
 static struct
 throtl_grp *throtl_lookup_tg(struct throtl_data *td, struct blkio_cgroup *blkcg)
 {
-	struct throtl_grp *tg = NULL;
-
 	/*
 	 * This is the common case when there are no blkio cgroups.
  	 * Avoid lookup in this case
  	 */
 	if (blkcg == &blkio_root_cgroup)
-		tg = td->root_tg;
-	else
-		tg = tg_of_blkg(blkg_lookup(blkcg, td->queue,
-					    BLKIO_POLICY_THROTL));
+		return td->root_tg;
 
-	__throtl_tg_fill_dev_details(td, tg);
-	return tg;
+	return tg_of_blkg(blkg_lookup(blkcg, td->queue, BLKIO_POLICY_THROTL));
 }
 
 static struct throtl_grp *throtl_lookup_create_tg(struct throtl_data *td,
@@ -294,7 +250,6 @@ static struct throtl_grp *throtl_lookup_create_tg(struct throtl_data *td,
 	else if (!blk_queue_dead(q))
 		tg = td->root_tg;
 
-	__throtl_tg_fill_dev_details(td, tg);
 	return tg;
 }
 
@@ -1058,8 +1013,6 @@ bool blk_throtl_bio(struct request_queue *q, struct bio *bio)
 	blkcg = task_blkio_cgroup(current);
 	tg = throtl_lookup_tg(td, blkcg);
 	if (tg) {
-		throtl_tg_fill_dev_details(td, tg);
-
 		if (tg_no_rule_group(tg, rw)) {
 			blkiocg_update_dispatch_stats(&tg->blkg, bio->bi_size,
 					rw, rw_is_sync(bio->bi_rw));
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index dbb5cf3..a06f1a9 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1033,20 +1033,7 @@ static void cfq_link_blkio_group(struct request_queue *q,
 				 struct blkio_group *blkg)
 {
 	struct cfq_data *cfqd = q->elevator->elevator_data;
-	struct backing_dev_info *bdi = &q->backing_dev_info;
 	struct cfq_group *cfqg = cfqg_of_blkg(blkg);
-	unsigned int major, minor;
-
-	/*
-	 * Add group onto cgroup list. It might happen that bdi->dev is
-	 * not initialized yet. Initialize this new group without major
-	 * and minor info and this info will be filled in once a new thread
-	 * comes for IO.
-	 */
-	if (bdi->dev) {
-		sscanf(dev_name(bdi->dev), "%u:%u", &major, &minor);
-		blkg->dev = MKDEV(major, minor);
-	}
 
 	cfqd->nr_blkcg_linked_grps++;
 
@@ -2777,7 +2764,6 @@ static struct cfq_queue *
 cfq_find_alloc_queue(struct cfq_data *cfqd, bool is_sync,
 		     struct io_context *ioc, gfp_t gfp_mask)
 {
-	struct backing_dev_info *bdi = &cfqd->queue->backing_dev_info;
 	struct blkio_cgroup *blkcg;
 	struct cfq_queue *cfqq, *new_cfqq = NULL;
 	struct cfq_io_cq *cic;
@@ -2794,13 +2780,6 @@ retry:
 	else
 		cfqg = cfq_lookup_create_cfqg(cfqd, blkcg);
 
-	if (cfqg && !cfqg->blkg.dev && bdi->dev && dev_name(bdi->dev)) {
-		unsigned int major, minor;
-
-		sscanf(dev_name(bdi->dev), "%u:%u", &major, &minor);
-		cfqg->blkg.dev = MKDEV(major, minor);
-	}
-
 	cic = cfq_cic_lookup(cfqd, ioc);
 	/* cic always exists here */
 	cfqq = cic_to_cfqq(cic, is_sync);
-- 
1.7.7.3



* Re: [PATCHSET] blkcg: kill policy node and blkg->dev
  2012-01-19  1:11 [PATCHSET] blkcg: kill policy node and blkg->dev Tejun Heo
                   ` (11 preceding siblings ...)
  2012-01-19  1:11 ` [PATCH 12/12] blkcg: kill the mind-bending blkg->dev Tejun Heo
@ 2012-01-19  1:18 ` Tejun Heo
  12 siblings, 0 replies; 36+ messages in thread
From: Tejun Heo @ 2012-01-19  1:18 UTC (permalink / raw)
  To: axboe, vgoyal; +Cc: ctalbott, rni, linux-kernel

Ooh, BTW, the percpu stats removal unfortunately will be near the end
of cleanups.  I tried to pull it at the head of cleanups but couldn't.
Dunno what to do about -stable.  :(

Thanks.

-- 
tejun


* Re: [PATCH 01/12] blkcg: obtaining blkg should be enclosed inside rcu_read_lock()
  2012-01-19  1:11 ` [PATCH 01/12] blkcg: obtaining blkg should be enclosed inside rcu_read_lock() Tejun Heo
@ 2012-01-19 10:07   ` Vivek Goyal
  2012-01-19 15:39     ` Tejun Heo
  0 siblings, 1 reply; 36+ messages in thread
From: Vivek Goyal @ 2012-01-19 10:07 UTC (permalink / raw)
  To: Tejun Heo; +Cc: axboe, ctalbott, rni, linux-kernel

On Wed, Jan 18, 2012 at 05:11:19PM -0800, Tejun Heo wrote:
> When looking up or creating blkg's, both blk-throttle and cfq-iosched
> drops rcu_read_lock() right after lookup is complete.  This isn't
> safe.  Refcnt isn't incremented at that point and rcu lock is the only
> thing holding the blkg.  It shouldn't be dropped until after refcnt is
> incremented by the caller.

Hi Tejun,

throtl_get_tg() and cfq_get_cfqg() are called with queue lock held and
tg and cfqg are protected by queue lock as they can not go away as long
as queue lock is held.

If the group is already created, then caller accesses it under queue
lock and does not pass the pointer around. In case of throttling, caller
can use it under rcu also just to do update of stats (when there are
no rules).

I had used rcu read lock to access blkcg pointer here. That's why when
we are done with accessing blkcg, we drop rcu read lock and return back
to caller with group pointer, which is already holding either a queue
lock or rcu read lock to protect returned group pointer.
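
In other words, the existing callers roughly look like this (simplified
sketch of the current blk-throttle path, not new code):

	spin_lock_irq(q->queue_lock);
	tg = throtl_get_tg(td);		/* takes/drops rcu_read_lock() inside */
	/* tg stays valid under q->queue_lock from here on */
	...
	spin_unlock_irq(q->queue_lock);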

So if we are protecting blkcg using rcu, then it should make sense to
take that lock inside throtl_get_tg() and cfq_get_cfqg() respectively and
it should not be left to the caller?

Thanks
Vivek


* Re: [PATCH 02/12] cfq: don't register propio policy if !CONFIG_CFQ_GROUP_IOSCHED
  2012-01-19  1:11 ` [PATCH 02/12] cfq: don't register propio policy if !CONFIG_CFQ_GROUP_IOSCHED Tejun Heo
@ 2012-01-19 10:11   ` Vivek Goyal
  0 siblings, 0 replies; 36+ messages in thread
From: Vivek Goyal @ 2012-01-19 10:11 UTC (permalink / raw)
  To: Tejun Heo; +Cc: axboe, ctalbott, rni, linux-kernel

On Wed, Jan 18, 2012 at 05:11:20PM -0800, Tejun Heo wrote:
> cfq has been registering zeroed blkio_policy_cfq if CFQ_GROUP_IOSCHED
> is disabled.  This fortunately doesn't collide with blk-throtl as
> BLKIO_POLICY_PROP is zero but is unnecessary and risky.  Just don't
> register it if not enabled.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Vivek Goyal <vgoyal@redhat.com>

Thanks. Looks good to me.

Acked-by: Vivek Goyal <vgoyal@redhat.com>

Vivek



* Re: [PATCH 04/12] elevator: make elevator_init_fn() return 0/-errno
  2012-01-19  1:11 ` [PATCH 04/12] elevator: make elevator_init_fn() return 0/-errno Tejun Heo
@ 2012-01-19 11:11   ` Vivek Goyal
  2012-01-19 15:44     ` Tejun Heo
  0 siblings, 1 reply; 36+ messages in thread
From: Vivek Goyal @ 2012-01-19 11:11 UTC (permalink / raw)
  To: Tejun Heo; +Cc: axboe, ctalbott, rni, linux-kernel

On Wed, Jan 18, 2012 at 05:11:22PM -0800, Tejun Heo wrote:
> elevator_ops->elevator_init_fn() has a weird return value.  It returns
> a void * which the caller should assign to q->elevator->elevator_data
> and %NULL return denotes init failure.
> 
> Update such that it returns integer 0/-errno and sets elevator_data
> directly as necessary.
> 

kmalloc() does the same thing: returning a pointer means success and NULL
means failure.

Also elevator initialization might not necessarily mean that you are
the active elevator. So cfq_init_queue() assuming that I am the active
elevator and setting up q->elevator->elevator_data sounds odd to me.

Personally, I liked the previous interface better. But let me read the rest
of the patches and see how you have made use of it to do more cleanup.

Thanks
Vivek


* Re: [PATCH 05/12] blkcg: update blkg get functions take blkio_cgroup as parameter
  2012-01-19  1:11 ` [PATCH 05/12] blkcg: update blkg get functions take blkio_cgroup as parameter Tejun Heo
@ 2012-01-19 11:21   ` Vivek Goyal
  2012-01-19 15:45     ` Tejun Heo
  0 siblings, 1 reply; 36+ messages in thread
From: Vivek Goyal @ 2012-01-19 11:21 UTC (permalink / raw)
  To: Tejun Heo; +Cc: axboe, ctalbott, rni, linux-kernel

On Wed, Jan 18, 2012 at 05:11:23PM -0800, Tejun Heo wrote:

[..]
> -static struct throtl_grp * throtl_get_tg(struct throtl_data *td)
> +static struct throtl_grp *throtl_get_tg(struct throtl_data *td,
> +					struct blkio_cgroup *blkcg)
>  {
>  	struct throtl_grp *tg = NULL, *__tg = NULL;
> -	struct blkio_cgroup *blkcg;
>  	struct request_queue *q = td->queue;
>  
>  	/* no throttling for dead queue */
>  	if (unlikely(blk_queue_dead(q)))
>  		return NULL;
>  
> -	blkcg = task_blkio_cgroup(current);
>  	tg = throtl_find_tg(td, blkcg);
>  	if (tg)
>  		return tg;
>  
> +	if (!css_tryget(&blkcg->css))
> +		return NULL;
> +

This reference is to make sure that blkcg does not go away while we drop
the rcu lock and do the group allocation?

Thanks
Vivek


* Re: [PATCH 06/12] blkcg: use q and plid instead of opaque void * for blkio_group association
  2012-01-19  1:11 ` [PATCH 06/12] blkcg: use q and plid instead of opaque void * for blkio_group association Tejun Heo
@ 2012-01-19 14:04   ` Vivek Goyal
  2012-01-19 15:55     ` Tejun Heo
  0 siblings, 1 reply; 36+ messages in thread
From: Vivek Goyal @ 2012-01-19 14:04 UTC (permalink / raw)
  To: Tejun Heo; +Cc: axboe, ctalbott, rni, linux-kernel

On Wed, Jan 18, 2012 at 05:11:24PM -0800, Tejun Heo wrote:
> blkio_group is an association between a block cgroup and a queue for a
> given policy.  Using opaque void * for association makes things
> confusing and hinders factoring of common code.  Use request_queue *
> and, if necessary, policy id instead.
> 
> This will help block cgroup API cleanup.

Using void* allowed the client to pass any type of data pointer as the
key.

I think passing cfq_data or throtl_data as key is better than passing
request queue as key.

- During elevator exit, it looks like there will be a small window where
  groups from both old elevator and new elevator will be present in blkcg
  list. Given the fact that there can be only one active elevator at a
  time, during cgroup removal call there is no way to reach a group's
  cfqd. One can only retrieve request queue reliably and can't rely
  on q->elevator->elevator_data.

  So passing cfq_data as key provides more flexibility and allows
  co-existence of two elevators more naturally without information loss.

[..]
> -static void cfq_unlink_blkio_group(void *key, struct blkio_group *blkg)
> +static void cfq_unlink_blkio_group(struct request_queue *q,
> +				   struct blkio_group *blkg)
>  {
> -	unsigned long  flags;
> -	struct cfq_data *cfqd = key;
> +	struct cfq_data *cfqd = q->elevator->elevator_data;
> +	unsigned long flags;
>  
> -	spin_lock_irqsave(cfqd->queue->queue_lock, flags);
> +	spin_lock_irqsave(q->queue_lock, flags);
>  	cfq_destroy_cfqg(cfqd, cfqg_of_blkg(blkg));
> -	spin_unlock_irqrestore(cfqd->queue->queue_lock, flags);
> +	spin_unlock_irqrestore(q->queue_lock, flags);

I think this code will create problem where both old elevator group and
new elevator group is on blkcg list and upon cgroup removal one can not
rely that q->elevator->elevator_data will give us old elevator's cfqd.

Having said that, in practice we might never hit it as elevator init time
we only initialize and connect root group in blkcg list and one can not
delete root cgroup so above function is never called for root group.

But I think it is confusing so it is probably better to register cfq_data
or throtl_data as key instead of request queue.
 
Thanks
Vivek


* Re: [PATCH 08/12] blkcg: use the usual get blkg path for root blkio_group
  2012-01-19  1:11 ` [PATCH 08/12] blkcg: use the usual get blkg path for root blkio_group Tejun Heo
@ 2012-01-19 14:41   ` Vivek Goyal
  2012-01-19 16:17     ` Tejun Heo
  0 siblings, 1 reply; 36+ messages in thread
From: Vivek Goyal @ 2012-01-19 14:41 UTC (permalink / raw)
  To: Tejun Heo; +Cc: axboe, ctalbott, rni, linux-kernel

On Wed, Jan 18, 2012 at 05:11:26PM -0800, Tejun Heo wrote:
> For root blkg, blk_throtl_init() was using throtl_alloc_tg()
> explicitly and cfq_init_queue() was manually initializing embedded
> cfqd->root_group, adding unnecessarily different code paths to blkg
> handling.
> 
> Make both use the usual blkio_group get functions - throtl_get_tg()
> and cfq_get_cfqg() - for the root blkio_group too.  Note that
> blk_throtl_init() callsite is pushed downwards in
> blk_alloc_queue_node() so that @q is sufficiently initialized for
> throtl_get_tg().
> 
> This simplifies root blkg handling noticeably for cfq and will allow
> further modularization of blkcg API.

Does this work if BLK_CGROUP=n?

I am trying to think why I kept allocating root cgroup statically for
CFQ. Maybe the reason was that this group was present even if
CFQ_GROUP_IOSCHED=n. There was other code which assumed a group to be
always present and I did not want to put more #ifdef.

Anyway, if it works well, with BLK_CGROUP=n, then it is fine.

Thanks
Vivek


* Re: [PATCH 01/12] blkcg: obtaining blkg should be enclosed inside rcu_read_lock()
  2012-01-19 10:07   ` Vivek Goyal
@ 2012-01-19 15:39     ` Tejun Heo
  2012-01-19 15:54       ` Vivek Goyal
  0 siblings, 1 reply; 36+ messages in thread
From: Tejun Heo @ 2012-01-19 15:39 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: axboe, ctalbott, rni, linux-kernel

Hello, Vivek.

On Thu, Jan 19, 2012 at 05:07:29AM -0500, Vivek Goyal wrote:
> On Wed, Jan 18, 2012 at 05:11:19PM -0800, Tejun Heo wrote:
> > When looking up or creating blkg's, both blk-throttle and cfq-iosched
> > drops rcu_read_lock() right after lookup is complete.  This isn't
> > safe.  Refcnt isn't incremented at that point and rcu lock is the only
> > thing holding the blkg.  It shouldn't be dropped until after refcnt is
> > incremented by the caller.
> 
> throtl_get_tg() and cfq_get_cfqg() are called with queue lock held and
> tg and cfqg are protected by queue lock as they can not go away as long
> as queue lock is held.

Ah, right.

> I had used rcu read lock to access blkcg pointer here. That's why when
> we are done with accessing blkcg, we drop rcu read lock and return back
> to caller with group pointer, which is already holding either a queue
> lock or rcu read lock to protect returned group pointer.
>
> So if we are protecting blkcg using rcu, then it should make sense to
> take that lock inside throtl_get_tg() and cfq_get_cfqg() respectively and
> it should not be left to the caller?

No, no matter whatever synchronization scheme is in use, the code is
seriously screwed up if it's doing something like,

	lock();
	a = lookup();
	unlock();
	return a;

You should *NEVER* be doing that.  It can't serve any purpose and is
misleading at best.  In this case, the right thing to do is dropping
the rcu read locking and using annotated rcu dereferencing macros to
enforce q locked || rcu read locked.  I'll update the patch.
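
Something along these lines, for example (sketch only; blkg_ptr here is just
a placeholder for whatever RCU-protected pointer is being looked up):

	blkg = rcu_dereference_check(blkg_ptr,
				     rcu_read_lock_held() ||
				     lockdep_is_held(q->queue_lock));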

Thanks.

-- 
tejun


* Re: [PATCH 04/12] elevator: make elevator_init_fn() return 0/-errno
  2012-01-19 11:11   ` Vivek Goyal
@ 2012-01-19 15:44     ` Tejun Heo
  0 siblings, 0 replies; 36+ messages in thread
From: Tejun Heo @ 2012-01-19 15:44 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: axboe, ctalbott, rni, linux-kernel

Hello,

On Thu, Jan 19, 2012 at 06:11:51AM -0500, Vivek Goyal wrote:
> On Wed, Jan 18, 2012 at 05:11:22PM -0800, Tejun Heo wrote:
> > elevator_ops->elevator_init_fn() has a weird return value.  It returns
> > a void * which the caller should assign to q->elevator->elevator_data
> > and %NULL return denotes init failure.
> > 
> > Update such that it returns integer 0/-errno and sets elevator_data
> > directly as necessary.
> 
> kmalloc() does the same thing: returning a pointer means success and NULL
> means failure.

Yes, allocation functions usually do that.  elevator_init_fn() isn't
exactly one.  It could be but not necessarily so.  Look at the
elevator init wrapper in elevator.c.  We shouldn't need something like
that.

> Also elevator initialization might not necessarily mean that you are
> the active elevator. So cfq_init_queue() assuming that I am the active
> elevator and setting up q->elevator->elevator_data sounds odd to me.

That was changed by the elevator switch sequence update earlier in this
series.

We need this change to use common blkg code for root blkg init.
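
For reference, the init path then looks roughly like this (sketch only, to
show the 0/-errno shape; not the exact patch):

	static int cfq_init_queue(struct request_queue *q)
	{
		struct cfq_data *cfqd;

		cfqd = kzalloc_node(sizeof(*cfqd), GFP_KERNEL, q->node);
		if (!cfqd)
			return -ENOMEM;

		q->elevator->elevator_data = cfqd;
		return 0;
	}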

Thanks.

-- 
tejun


* Re: [PATCH 05/12] blkcg: update blkg get functions take blkio_cgroup as parameter
  2012-01-19 11:21   ` Vivek Goyal
@ 2012-01-19 15:45     ` Tejun Heo
  0 siblings, 0 replies; 36+ messages in thread
From: Tejun Heo @ 2012-01-19 15:45 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: axboe, ctalbott, rni, linux-kernel

On Thu, Jan 19, 2012 at 06:21:02AM -0500, Vivek Goyal wrote:
> On Wed, Jan 18, 2012 at 05:11:23PM -0800, Tejun Heo wrote:
> 
> [..]
> > -static struct throtl_grp * throtl_get_tg(struct throtl_data *td)
> > +static struct throtl_grp *throtl_get_tg(struct throtl_data *td,
> > +					struct blkio_cgroup *blkcg)
> >  {
> >  	struct throtl_grp *tg = NULL, *__tg = NULL;
> > -	struct blkio_cgroup *blkcg;
> >  	struct request_queue *q = td->queue;
> >  
> >  	/* no throttling for dead queue */
> >  	if (unlikely(blk_queue_dead(q)))
> >  		return NULL;
> >  
> > -	blkcg = task_blkio_cgroup(current);
> >  	tg = throtl_find_tg(td, blkcg);
> >  	if (tg)
> >  		return tg;
> >  
> > +	if (!css_tryget(&blkcg->css))
> > +		return NULL;
> > +
> 
> This reference is to make sure that blkcg does not go away while we drop
> the rcu lock and do the group allocation?

Yeah, for now, but a later patch will make blkg's hold onto the
associated blkcg and this reference will be dropped on blkg
destruction.
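
IOW, the pairing will eventually be (sketch; blkg->blkcg is the back-pointer
that later patch adds, it doesn't exist yet):

	/* creation: pin the blkcg for the lifetime of the blkg */
	if (!css_tryget(&blkcg->css))
		return NULL;
	blkg->blkcg = blkcg;

	...

	/* destruction: drop the reference taken at creation */
	css_put(&blkg->blkcg->css);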

Thanks.

-- 
tejun


* Re: [PATCH 12/12] blkcg: kill the mind-bending blkg->dev
  2012-01-19  1:11 ` [PATCH 12/12] blkcg: kill the mind-bending blkg->dev Tejun Heo
@ 2012-01-19 15:49   ` Vivek Goyal
  2012-01-19 16:30     ` Tejun Heo
  0 siblings, 1 reply; 36+ messages in thread
From: Vivek Goyal @ 2012-01-19 15:49 UTC (permalink / raw)
  To: Tejun Heo; +Cc: axboe, ctalbott, rni, linux-kernel

On Wed, Jan 18, 2012 at 05:11:30PM -0800, Tejun Heo wrote:
> blkg->dev is dev_t recording the device number of the block device for
> the associated request_queue.  It is used to identify the associated
> block device when printing out configuration or stats.
> 
> This is redundant to begin with.  A blkg is an association between a
> cgroup and a request_queue and it of course is possible to reach
> request_queue from blkg and synchronization conventions are in place
> for safe q dereferencing, so this shouldn't be necessary from the
> beginning.  Furthermore, it's initialized by sscanf()ing the device
> name of backing_dev_info.  The mind boggles.
> 
> Anyways, if blkg is visible under rcu lock, we *know* that the
> associated request_queue hasn't gone away yet and its bdi is
> registered and alive - blkg can't be created for request_queue which
> hasn't been fully initialized and it can't go away before blkg is
> removed.
> 
> Let stat and conf read functions get device name from
> blkg->q->backing_dev_info.dev and pass it down to printing functions
> and remove blkg->dev.

I think one of the reasons for not using blkg->q->backing_dev_info.dev was
that there were still drivers where multiple gendisks were sharing the
request queue.

Thanks
Vivek


* Re: [PATCH 01/12] blkcg: obtaining blkg should be enclosed inside rcu_read_lock()
  2012-01-19 15:39     ` Tejun Heo
@ 2012-01-19 15:54       ` Vivek Goyal
  2012-01-19 15:58         ` Tejun Heo
  0 siblings, 1 reply; 36+ messages in thread
From: Vivek Goyal @ 2012-01-19 15:54 UTC (permalink / raw)
  To: Tejun Heo; +Cc: axboe, ctalbott, rni, linux-kernel

On Thu, Jan 19, 2012 at 07:39:38AM -0800, Tejun Heo wrote:
> Hello, Vivek.
> 
> On Thu, Jan 19, 2012 at 05:07:29AM -0500, Vivek Goyal wrote:
> > On Wed, Jan 18, 2012 at 05:11:19PM -0800, Tejun Heo wrote:
> > > When looking up or creating blkg's, both blk-throttle and cfq-iosched
> > > drops rcu_read_lock() right after lookup is complete.  This isn't
> > > safe.  Refcnt isn't incremented at that point and rcu lock is the only
> > > thing holding the blkg.  It shouldn't be dropped until after refcnt is
> > > incremented by the caller.
> > 
> > throtl_get_tg() and cfq_get_cfqg() are called with queue lock held and
> > tg and cfqg are protected by queue lock as they can not go away as long
> > as queue lock is held.
> 
> Ah, right.
> 
> > I had used rcu read lock to access blkcg pointer here. That's why when
> > we are done with accessing blkcg, we drop rcu read lock and return back
> > to caller with group pointer, which is already holding either a queue
> > lock or rcu read lock to protect returned group pointer.
> >
> > So if we are protecting blkcg using rcu, then it should make sense to
> > take that lock inside throtl_get_tg() and cfq_get_cfqg() respectively and
> > it should not be left to the caller?
> 
> No, no matter whatever synchronization scheme is in use, the code is
> seriously screwed up if it's doing something like,
> 
> 	lock();
> 	a = lookup();
> 	unlock();
> 	return a;
> 
> You should *NEVER* be doing that.

I guess ioc_lookup_icq() is doing something similar. We call it under the
queue lock; it takes the rcu lock inside for the sanity of the radix tree,
then releases the rcu lock and returns the icq.

Thanks
Vivek


* Re: [PATCH 06/12] blkcg: use q and plid instead of opaque void * for blkio_group association
  2012-01-19 14:04   ` Vivek Goyal
@ 2012-01-19 15:55     ` Tejun Heo
  2012-01-19 16:16       ` Vivek Goyal
  0 siblings, 1 reply; 36+ messages in thread
From: Tejun Heo @ 2012-01-19 15:55 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: axboe, ctalbott, rni, linux-kernel

Hello,

On Thu, Jan 19, 2012 at 09:04:42AM -0500, Vivek Goyal wrote:
> On Wed, Jan 18, 2012 at 05:11:24PM -0800, Tejun Heo wrote:
> > blkio_group is an association between a block cgroup and a queue for a
> > given policy.  Using opaque void * for association makes things
> > confusing and hinders factoring of common code.  Use request_queue *
> > and, if necessary, policy id instead.
> > 
> > This will help block cgroup API cleanup.
> 
> Using void* allowed the client to pass any type of data pointer as the
> key.
>
> I think passing cfq_data or throtl_data as key is better than passing
> request queue as key.
> 
> - During elevator exit, it looks like there will be a small window where
>   groups from both old elevator and new elevator will be present in blkcg
>   list. Given the fact that there can be only one active elevator at a
>   time, during cgroup removal call there is no way to reach a group's
>   cfqd. One can only retrieve request queue reliably and can't rely
>   on q->elevator->elevator_data.
> 
>   So passing cfq_data as key provides more flexibility and allows
>   co-existence of two elevators more naturally without information loss.

Yeah, sure, at the cost of making everything opaque to blkcg core and
duplicating a lot of stuff in policies.  The sad part is that the
'flexibility' bought that way isn't even necessary if layering is done
right.  You're asking completely wrong questions and then coming up
with convoluted solutions somehow working around those silly problem
definitions.  Block cgroup core layer should do all blkg management
and policy implementations shouldn't be worrying about dereferencing
some stupid pointer while elevator is being switched.

> [..]
> > -static void cfq_unlink_blkio_group(void *key, struct blkio_group *blkg)
> > +static void cfq_unlink_blkio_group(struct request_queue *q,
> > +				   struct blkio_group *blkg)
> >  {
> > -	unsigned long  flags;
> > -	struct cfq_data *cfqd = key;
> > +	struct cfq_data *cfqd = q->elevator->elevator_data;
> > +	unsigned long flags;
> >  
> > -	spin_lock_irqsave(cfqd->queue->queue_lock, flags);
> > +	spin_lock_irqsave(q->queue_lock, flags);
> >  	cfq_destroy_cfqg(cfqd, cfqg_of_blkg(blkg));
> > -	spin_unlock_irqrestore(cfqd->queue->queue_lock, flags);
> > +	spin_unlock_irqrestore(q->queue_lock, flags);
> 
> I think this code will create problem where both old elevator group and
> new elevator group is on blkcg list and upon cgroup removal one can not
> rely that q->elevator->elevator_data will give us old elevator's cfqd.

Again, if I didn't botch up earlier elevator switch code, it shouldn't.

> But I think it is confusing so it is probably better to register cfq_data
> or throtl_data as key instead of request queue.

I'm not gonna repeat.  I really don't have any nice thing left to say
about it.

Thanks.

-- 
tejun


* Re: [PATCH 01/12] blkcg: obtaining blkg should be enclosed inside rcu_read_lock()
  2012-01-19 15:54       ` Vivek Goyal
@ 2012-01-19 15:58         ` Tejun Heo
  0 siblings, 0 replies; 36+ messages in thread
From: Tejun Heo @ 2012-01-19 15:58 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: axboe, ctalbott, rni, linux-kernel

On Thu, Jan 19, 2012 at 10:54:45AM -0500, Vivek Goyal wrote:
> > No, no matter whatever synchronization scheme is in use, the code is
> > seriously screwed up if it's doing something like,
> > 
> > 	lock();
> > 	a = lookup();
> > 	unlock();
> > 	return a;
> > 
> > You should *NEVER* be doing that.
> 
> I guess ioc_lookup_icq() is doing something similar. We call it under
> queue lock. Take rcu lock inside for sanity of radix tree and then 
> release rcu lock and return icq.

Yeap, it is.  We should be using rcu_dereference_check() there too.

Thanks.

-- 
tejun


* Re: [PATCH 06/12] blkcg: use q and plid instead of opaque void * for blkio_group association
  2012-01-19 15:55     ` Tejun Heo
@ 2012-01-19 16:16       ` Vivek Goyal
  2012-01-19 16:25         ` Tejun Heo
  0 siblings, 1 reply; 36+ messages in thread
From: Vivek Goyal @ 2012-01-19 16:16 UTC (permalink / raw)
  To: Tejun Heo; +Cc: axboe, ctalbott, rni, linux-kernel

On Thu, Jan 19, 2012 at 07:55:45AM -0800, Tejun Heo wrote:

[..]
> > > +static void cfq_unlink_blkio_group(struct request_queue *q,
> > > +				   struct blkio_group *blkg)
> > >  {
> > > -	unsigned long  flags;
> > > -	struct cfq_data *cfqd = key;
> > > +	struct cfq_data *cfqd = q->elevator->elevator_data;
> > > +	unsigned long flags;
> > >  
> > > -	spin_lock_irqsave(cfqd->queue->queue_lock, flags);
> > > +	spin_lock_irqsave(q->queue_lock, flags);
> > >  	cfq_destroy_cfqg(cfqd, cfqg_of_blkg(blkg));
> > > -	spin_unlock_irqrestore(cfqd->queue->queue_lock, flags);
> > > +	spin_unlock_irqrestore(q->queue_lock, flags);
> > 
> > I think this code will create problem where both old elevator group and
> > new elevator group is on blkcg list and upon cgroup removal one can not
> > rely that q->elevator->elevator_data will give us old elevator's cfqd.
> 
> Again, if I didn't botch up earlier elevator switch code, it shouldn't.

I think I am missing something. IIUC, following is new elevator switch
sequence.

1. elv_quiesce_start
2. unregister old elevator
3. ioc_clear_queue
4. allocate new elevator
5. init new elevator
6. exit old elevator

So any groups on the old elevator will be cleaned up in step 6. So till step
5 these groups are still present on the blkcg list. Now assume that between
step 5 and step 6 a cgroup removal takes place and blkcg tries to call into
the elevator to remove that group; will it not be accessing the wrong cfqd
in cfq_destroy_cfqg() (the cfqd of the new elevator instead of the old one)?

What am I missing?

Thanks
Vivek


* Re: [PATCH 08/12] blkcg: use the usual get blkg path for root blkio_group
  2012-01-19 14:41   ` Vivek Goyal
@ 2012-01-19 16:17     ` Tejun Heo
  0 siblings, 0 replies; 36+ messages in thread
From: Tejun Heo @ 2012-01-19 16:17 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: axboe, ctalbott, rni, linux-kernel

Hello,

On Thu, Jan 19, 2012 at 09:41:36AM -0500, Vivek Goyal wrote:
> > This simplifies root blkg handling noticeably for cfq and will allow
> > further modularization of blkcg API.
> 
> Does this work if BLK_CGROUP=n?
> 
> I am trying to think why I kept allocating root cgroup statically for
> CFQ. Maybe the reason was that this group was present even if
> CFQ_GROUP_IOSCHED=n. There was other code which assumed a group to be
> always present and I did not want to put more #ifdef.
> 
> Anyway, if it works well, with BLK_CGROUP=n, then it is fine.

Right, it's currently broken but it shouldn't be too difficult to fix.
Will update the patch.

Thanks.

-- 
tejun


* Re: [PATCH 06/12] blkcg: use q and plid instead of opaque void * for blkio_group association
  2012-01-19 16:16       ` Vivek Goyal
@ 2012-01-19 16:25         ` Tejun Heo
  2012-01-19 16:42           ` Tejun Heo
  0 siblings, 1 reply; 36+ messages in thread
From: Tejun Heo @ 2012-01-19 16:25 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: axboe, ctalbott, rni, linux-kernel

Hello,

On Thu, Jan 19, 2012 at 11:16:42AM -0500, Vivek Goyal wrote:
> 1. elv_quiesce_start
> 2. unregister old elevator
> 3. ioc_clear_queue
> 4. allocate new elevator
> 5. init new elevator
> 6. exit old elevator
> 
> So any groups on the old elevator will be cleaned up in step 6. So till step
> 5 these groups are still present on the blkcg list. Now assume that between
> step 5 and step 6 a cgroup removal takes place and blkcg tries to call into
> the elevator to remove that group; will it not be accessing the wrong cfqd
> in cfq_destroy_cfqg() (the cfqd of the new elevator instead of the old one)?
> 
> What am I missing?

I don't think you're missing anything.  The issue is moot as the
second patchset, which I'll probably post today, unifies blkg's so
that there's a single blkg per cgroup - request_queue pair.

Hmmm... we can add code to shoot down blkg's along with ioc's while
preparing for elevator switch or just note the problem in the commit
message and explain the problem is transitional.  I'll see how complex
shooting down blkg's would be.
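
E.g. something like the following while quiescing for the switch (very rough
sketch; blkg_destroy_all() is a hypothetical helper, nothing like it exists
yet):

	spin_lock_irq(q->queue_lock);
	ioc_clear_queue(q);
	blkg_destroy_all(q);	/* unlink the old elevator's blkgs too */
	spin_unlock_irq(q->queue_lock);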

Thanks.

-- 
tejun


* Re: [PATCH 12/12] blkcg: kill the mind-bending blkg->dev
  2012-01-19 15:49   ` Vivek Goyal
@ 2012-01-19 16:30     ` Tejun Heo
  2012-01-19 16:46       ` Vivek Goyal
  0 siblings, 1 reply; 36+ messages in thread
From: Tejun Heo @ 2012-01-19 16:30 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: axboe, ctalbott, rni, linux-kernel

Hello,

On Thu, Jan 19, 2012 at 10:49:19AM -0500, Vivek Goyal wrote:
> > Let stat and conf read functions get device name from
> > blkg->q->backing_dev_info.dev and pass it down to printing functions
> > and remove blkg->dev.
> 
> I think one of the reasons for not using blkg->q->backing_dev_info.dev was
> that there were still drivers where multiple gendisks were sharing the
> request queue.

Hmmm... is that relevant?  The configurations / stats are per
cgroup-request_queue pair.  How does multiple genhd's sharing a queue
make any difference?  Also, the fill_dev_details functions were using
the same q->backing_dev_info to fill it in, so it's not like it was
using anything different.

Thanks.

-- 
tejun


* Re: [PATCH 06/12] blkcg: use q and plid instead of opaque void * for blkio_group association
  2012-01-19 16:25         ` Tejun Heo
@ 2012-01-19 16:42           ` Tejun Heo
  0 siblings, 0 replies; 36+ messages in thread
From: Tejun Heo @ 2012-01-19 16:42 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: axboe, ctalbott, rni, linux-kernel

On Thu, Jan 19, 2012 at 08:25:44AM -0800, Tejun Heo wrote:
> Hmmm... we can add code to shoot down blkg's along with ioc's while
> preparing for elevator switch or just note the problem in the commit
> message and explain the problem is transitional.  I'll see how complex
> shooting down blkg's would be.

I think we need blkg shootdown on elv switch whether blkg is unified
or not.  I'll update the patch.  Thanks for pointing it out.

-- 
tejun


* Re: [PATCH 12/12] blkcg: kill the mind-bending blkg->dev
  2012-01-19 16:30     ` Tejun Heo
@ 2012-01-19 16:46       ` Vivek Goyal
  0 siblings, 0 replies; 36+ messages in thread
From: Vivek Goyal @ 2012-01-19 16:46 UTC (permalink / raw)
  To: Tejun Heo; +Cc: axboe, ctalbott, rni, linux-kernel

On Thu, Jan 19, 2012 at 08:30:56AM -0800, Tejun Heo wrote:
> Hello,
> 
> On Thu, Jan 19, 2012 at 10:49:19AM -0500, Vivek Goyal wrote:
> > > Let stat and conf read functions get device name from
> > > blkg->q->backing_dev_info.dev and pass it down to printing functions
> > > and remove blkg->dev.
> > 
> > I think one of the reasons for not using blkg->q->backing_dev_info.dev was
> > that there were still drivers where multiple gendisks were sharing the
> > request queue.
> 
> Hmmm... is that relevant?  The configurations / stats are per
> cgroup-request_queue pair.  How does multiple genhd's sharing a queue
> make any difference?  Also, the fill_dev_details functions were using
> the same q->backing_dev_info to fill it in, so it's not like it was
> using anything different.

You are right. Looks like it should not make any difference. Also it looks
like bdi->dev points to the first disk on the queue and ignores later ones.

int bdi_register(struct backing_dev_info *bdi, struct device *parent,
                 const char *fmt, ...)
{
        if (bdi->dev)   /* The driver needs to use separate queues per device */
                return 0;
        ...
}

So if a queue is being shared by multiple disks, then any throttling rule
on the device will practically be treated as a rule on the queue and will be
effective on all the devices cumulatively. So this sounds fine.

Thanks
Vivek


* Re: [PATCH 11/12] blkcg: kill blkio_policy_node
  2012-01-19  1:11 ` [PATCH 11/12] blkcg: kill blkio_policy_node Tejun Heo
@ 2012-01-19 20:32   ` Vivek Goyal
  2012-01-19 22:03     ` Tejun Heo
  0 siblings, 1 reply; 36+ messages in thread
From: Vivek Goyal @ 2012-01-19 20:32 UTC (permalink / raw)
  To: Tejun Heo; +Cc: axboe, ctalbott, rni, linux-kernel

On Wed, Jan 18, 2012 at 05:11:29PM -0800, Tejun Heo wrote:

[..]
> @@ -1413,14 +1110,10 @@ static int blkio_weight_write(struct blkio_cgroup *blkcg, u64 val)
>  	spin_lock_irq(&blkcg->lock);
>  	blkcg->weight = (unsigned int)val;
>  
> -	hlist_for_each_entry(blkg, n, &blkcg->blkg_list, blkcg_node) {
> -		pn = blkio_policy_search_node(blkcg, blkg->dev,
> -				BLKIO_POLICY_PROP, BLKIO_PROP_weight_device);
> -		if (pn)
> -			continue;
> +	hlist_for_each_entry(blkg, n, &blkcg->blkg_list, blkcg_node)
> +		if (blkg->plid == plid && !blkg->conf.weight)
> +			blkio_update_group_weight(blkg, blkcg->weight);

Tejun,

Why do we check for blkg->conf.weight=0 here? Even if a group already has
a weight and the user changes the cgroup weight later, that update should
be propagated to all the groups on all the queues.

Where do we assign the default cgroup weight to a blkg upon creation? Maybe at
group create time, we just need to copy blkcg->weight to blkg.conf.weight.

Thanks
Vivek


* Re: [PATCH 11/12] blkcg: kill blkio_policy_node
  2012-01-19 20:32   ` Vivek Goyal
@ 2012-01-19 22:03     ` Tejun Heo
  2012-01-19 22:30       ` Vivek Goyal
  0 siblings, 1 reply; 36+ messages in thread
From: Tejun Heo @ 2012-01-19 22:03 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: axboe, ctalbott, rni, linux-kernel

Hello,

On Thu, Jan 19, 2012 at 03:32:25PM -0500, Vivek Goyal wrote:
> On Wed, Jan 18, 2012 at 05:11:29PM -0800, Tejun Heo wrote:
> 
> [..]
> > @@ -1413,14 +1110,10 @@ static int blkio_weight_write(struct blkio_cgroup *blkcg, u64 val)
> >  	spin_lock_irq(&blkcg->lock);
> >  	blkcg->weight = (unsigned int)val;
> >  
> > -	hlist_for_each_entry(blkg, n, &blkcg->blkg_list, blkcg_node) {
> > -		pn = blkio_policy_search_node(blkcg, blkg->dev,
> > -				BLKIO_POLICY_PROP, BLKIO_PROP_weight_device);
> > -		if (pn)
> > -			continue;
> > +	hlist_for_each_entry(blkg, n, &blkcg->blkg_list, blkcg_node)
> > +		if (blkg->plid == plid && !blkg->conf.weight)
> > +			blkio_update_group_weight(blkg, blkcg->weight);
> 
> Tejun,
> 
> Why do we check for blkg->conf.weight=0 here? Even if a group already has
> a weight and the user changes the cgroup weight later, that update should
> be propagated to all the groups on all the queues.

Maybe I misread the code but if explicit per-device config exists,
cgroup-wide writes are ignored, no?  The original code searches for
matching pn and if it exists, continues without updating.

> Where do we assign the default cgroup weight to a blkg upon creation? Maybe at
> group create time, we just need to copy blkcg->weight to blkg.conf.weight.

We don't.  blkg->conf.weight is set only on explicit per-device
configuration.  It's gonna be removed eventually after config
definition and handling are moved into policies.
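
So, conceptually, the effective weight is (not actual code, just the rule):

	/* an explicit per-device setting overrides the cgroup-wide weight */
	weight = blkg->conf.weight ?: blkcg->weight;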

Thanks.

-- 
tejun


* Re: [PATCH 11/12] blkcg: kill blkio_policy_node
  2012-01-19 22:03     ` Tejun Heo
@ 2012-01-19 22:30       ` Vivek Goyal
  0 siblings, 0 replies; 36+ messages in thread
From: Vivek Goyal @ 2012-01-19 22:30 UTC (permalink / raw)
  To: Tejun Heo; +Cc: axboe, ctalbott, rni, linux-kernel

On Thu, Jan 19, 2012 at 02:03:06PM -0800, Tejun Heo wrote:

[..]
> > Why do we check for blkg->conf.weight=0 here? Even if a group already has
> > a weight and the user changes the cgroup weight later, that update should
> > be propagated to all the groups on all the queues.
> 
> Maybe I misread the code but if explicit per-device config exists,
> cgroup-wide writes are ignored, no?  The original code searches for
> matching pn and if it exists, continues without updating.
> 
> > Where do we assign the default cgroup weight to a blkg upon creation? Maybe at
> > group create time, we just need to copy blkcg->weight to blkg.conf.weight.
> 
> > We don't.  blkg->conf.weight is set only on explicit per-device
> configuration.  It's gonna be removed eventually after config
> definition and handling are moved into policies.

Ok. So blkg->conf.weight is set only for per-device weights. For groups
using the cgroup weight, I see that cfq inherits it during group creation.
Any updates to the cgroup weight are sent only if a per-device weight is not
set. Previously we used to do this check by looking for the policy node and
now you are checking blkg->conf.weight. That clarifies it. Thanks.

Thanks
Vivek


Thread overview: 36+ messages
2012-01-19  1:11 [PATCHSET] blkcg: kill policy node and blkg->dev Tejun Heo
2012-01-19  1:11 ` [PATCH 01/12] blkcg: obtaining blkg should be enclosed inside rcu_read_lock() Tejun Heo
2012-01-19 10:07   ` Vivek Goyal
2012-01-19 15:39     ` Tejun Heo
2012-01-19 15:54       ` Vivek Goyal
2012-01-19 15:58         ` Tejun Heo
2012-01-19  1:11 ` [PATCH 02/12] cfq: don't register propio policy if !CONFIG_CFQ_GROUP_IOSCHED Tejun Heo
2012-01-19 10:11   ` Vivek Goyal
2012-01-19  1:11 ` [PATCH 03/12] elevator: clear auxiliary data earlier during elevator switch Tejun Heo
2012-01-19  1:11 ` [PATCH 04/12] elevator: make elevator_init_fn() return 0/-errno Tejun Heo
2012-01-19 11:11   ` Vivek Goyal
2012-01-19 15:44     ` Tejun Heo
2012-01-19  1:11 ` [PATCH 05/12] blkcg: update blkg get functions take blkio_cgroup as parameter Tejun Heo
2012-01-19 11:21   ` Vivek Goyal
2012-01-19 15:45     ` Tejun Heo
2012-01-19  1:11 ` [PATCH 06/12] blkcg: use q and plid instead of opaque void * for blkio_group association Tejun Heo
2012-01-19 14:04   ` Vivek Goyal
2012-01-19 15:55     ` Tejun Heo
2012-01-19 16:16       ` Vivek Goyal
2012-01-19 16:25         ` Tejun Heo
2012-01-19 16:42           ` Tejun Heo
2012-01-19  1:11 ` [PATCH 07/12] blkcg: add blkio_policy[] array and allow one policy per policy ID Tejun Heo
2012-01-19  1:11 ` [PATCH 08/12] blkcg: use the usual get blkg path for root blkio_group Tejun Heo
2012-01-19 14:41   ` Vivek Goyal
2012-01-19 16:17     ` Tejun Heo
2012-01-19  1:11 ` [PATCH 09/12] blkcg: factor out blkio_group creation Tejun Heo
2012-01-19  1:11 ` [PATCH 10/12] blkcg: don't allow or retain configuration of missing devices Tejun Heo
2012-01-19  1:11 ` [PATCH 11/12] blkcg: kill blkio_policy_node Tejun Heo
2012-01-19 20:32   ` Vivek Goyal
2012-01-19 22:03     ` Tejun Heo
2012-01-19 22:30       ` Vivek Goyal
2012-01-19  1:11 ` [PATCH 12/12] blkcg: kill the mind-bending blkg->dev Tejun Heo
2012-01-19 15:49   ` Vivek Goyal
2012-01-19 16:30     ` Tejun Heo
2012-01-19 16:46       ` Vivek Goyal
2012-01-19  1:18 ` [PATCHSET] blkcg: kill policy node and blkg->dev Tejun Heo
