linux-kernel.vger.kernel.org archive mirror
* [PATCHSET] block: modularize blkcg config and stat file handling
@ 2012-03-28 22:51 Tejun Heo
  2012-03-28 22:51 ` [PATCH 01/21] blkcg: remove unused @pol and @plid parameters Tejun Heo
                   ` (23 more replies)
  0 siblings, 24 replies; 56+ messages in thread
From: Tejun Heo @ 2012-03-28 22:51 UTC (permalink / raw)
  To: axboe; +Cc: vgoyal, ctalbott, rni, linux-kernel, cgroups, containers

Hello,

The way configuration and statistics are handled in block cgroup is
rather depressing.  Every configuration and stat counter is defined in
blkcg core, and specific policy implementations have to go through
blkcg interfaces to access and manipulate them and to be notified of
changes.

Doing things this way achieves both complexity and inflexibility - the
code paths are unnecessarily convoluted while providing no real
layering or modularity.  A configuration or statistics counter cannot
be added to a policy without modifying multiple places in blkcg core
code.  Wanna implement a new policy?  Good luck.  The implementation
details are sad too.  Stuff goes through needless layers of functions
with hard-coded cases for specific counters and configurations, and
policy-specific fields are thrown into the common area, some shared
and others simply unused.

This patchset is an attempt at bringing some sanity to blkcg config
and stat file handling.  It makes use of the pending dynamic cgroup
file type addition / removal support [1], which will be merged into
cgroup/for-3.5 once 3.4-rc1 is released.

All conf and stat file handling is moved into the policy
implementation the files belong to, and blkcg supplies helpers to ease
file handling in policy implementations without blkcg core requiring
full knowledge of all configurations and statistics counters.
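
Roughly, the end state looks like the sketch below.  The throttle-side
function and table names here are illustrative, the helper names are
the ones introduced later in this series (patches 0004/0009/0010), and
the exact signatures may differ from what the series finally ends up
with - treat this as a sketch, not the final interface.

  /* rough sketch, e.g. in blk-throttle.c */
  static int tg_print_cpu_rwstat(struct cgroup *cgrp, struct cftype *cft,
                                 struct seq_file *sf)
  {
          struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp);

          /* walk @blkcg's blkgs and print the rwstat at @cft->private */
          blkcg_print_blkgs(sf, blkcg, blkg_prfill_cpu_rwstat,
                            BLKIO_POLICY_THROTL, cft->private, true);
          return 0;
  }

  static struct cftype throtl_files[] = {
          {
                  .name = "throttle.io_serviced",
                  .private = offsetof(struct blkio_group_stats_cpu, serviced),
                  .read_seq_string = tg_print_cpu_rwstat,
          },
          { }     /* terminate */
  };

  static struct blkio_policy_type blkio_policy_throtl = {
          /* ... ops, plid, pdata_size ... */
          .cftypes = throtl_files,        /* added by patch 0010 */
  };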

This patchset sheds >500 lines of code while maintaining the same
features, gaining more comments, and ending up with a much more
modular design.  Yeah, it was just wrong.

This patchset contains the following 21 patches.

 0001-blkcg-remove-unused-pol-and-plid-parameters.patch
 0002-blkcg-BLKIO_STAT_CPU_SECTORS-doesn-t-have-subcounter.patch
 0003-blkcg-introduce-blkg_stat-and-blkg_rwstat.patch
 0004-blkcg-restructure-statistics-printing.patch
 0005-blkcg-drop-blkiocg_file_write_u64.patch
 0006-blkcg-restructure-configuration-printing.patch
 0007-blkcg-restructure-blkio_group-configruation-setting.patch
 0008-blkcg-blkg_conf_prep.patch
 0009-blkcg-export-conf-stat-helpers-to-prepare-for-reorga.patch
 0010-blkcg-implement-blkio_policy_type-cftypes.patch
 0011-blkcg-move-conf-stat-file-handling-code-to-policies.patch
 0012-cfq-collapse-cfq.h-into-cfq-iosched.c.patch
 0013-blkcg-move-statistics-update-code-to-policies.patch
 0014-blkcg-cfq-doesn-t-need-per-cpu-dispatch-stats.patch
 0015-blkcg-add-blkio_policy_ops-operations-for-exit-and-s.patch
 0016-blkcg-move-blkio_group_stats-to-cfq-iosched.c.patch
 0017-blkcg-move-blkio_group_stats_cpu-and-friends-to-blk-.patch
 0018-blkcg-move-blkio_group_conf-weight-to-cfq.patch
 0019-blkcg-move-blkio_group_conf-iops-and-bps-to-blk-thro.patch
 0020-blkcg-pass-around-pd-pdata-instead-of-pd-itself-in-p.patch
 0021-blkcg-drop-BLKCG_STAT_-PRIV-POL-OFF-macros.patch

and is on top of

  block/for-3.5/core eb7d8c07f9 "cfq: fix cfqg ref handling..."
+ [1] cgroup-cftypes d954ca6469 "cgroup: implement cgroup_rm_cftypes()"

Note that the cgroup branch is temporary and the merge between the two
branches isn't trivial.  I'll prepare a proper merged branch once the
cgroup/for-3.5 branch is settled.

This patchset is also available in the following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git blkcg-files

diffstat follows.

 block/blk-cgroup.c   | 1343 ++++++---------------------------------------------
 block/blk-cgroup.h   |  414 ++++++---------
 block/blk-throttle.c |  318 ++++++++++--
 block/cfq-iosched.c  |  567 +++++++++++++++++++--
 block/cfq.h          |  118 ----
 5 files changed, 1102 insertions(+), 1658 deletions(-)

--
tejun

[1] http://thread.gmane.org/gmane.linux.kernel.containers/22623


* [PATCH 01/21] blkcg: remove unused @pol and @plid parameters
  2012-03-28 22:51 [PATCHSET] block: modularize blkcg config and stat file handling Tejun Heo
@ 2012-03-28 22:51 ` Tejun Heo
  2012-03-28 22:51 ` [PATCH 02/21] blkcg: BLKIO_STAT_CPU_SECTORS doesn't have subcounters Tejun Heo
                   ` (22 subsequent siblings)
  23 siblings, 0 replies; 56+ messages in thread
From: Tejun Heo @ 2012-03-28 22:51 UTC (permalink / raw)
  To: axboe; +Cc: vgoyal, ctalbott, rni, linux-kernel, cgroups, containers, Tejun Heo

@pol to pdata_to_blkg() and @plid to blkg_lookup_create() are no
longer necessary.  Drop them.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 block/blk-cgroup.c   |    3 +--
 block/blk-cgroup.h   |    8 ++------
 block/blk-throttle.c |    7 +++----
 block/cfq-iosched.c  |    7 +++----
 4 files changed, 9 insertions(+), 16 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 7947e17..d4cf77d 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -568,7 +568,6 @@ static struct blkio_group *blkg_alloc(struct blkio_cgroup *blkcg,
 
 struct blkio_group *blkg_lookup_create(struct blkio_cgroup *blkcg,
 				       struct request_queue *q,
-				       enum blkio_policy_id plid,
 				       bool for_root)
 	__releases(q->queue_lock) __acquires(q->queue_lock)
 {
@@ -1027,7 +1026,7 @@ static int blkio_policy_parse_and_set(char *buf, enum blkio_policy_id plid,
 	rcu_read_lock();
 
 	spin_lock_irq(disk->queue->queue_lock);
-	blkg = blkg_lookup_create(blkcg, disk->queue, plid, false);
+	blkg = blkg_lookup_create(blkcg, disk->queue, false);
 	spin_unlock_irq(disk->queue->queue_lock);
 
 	if (IS_ERR(blkg)) {
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index 1cb8f76..1add3dc 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -266,13 +266,10 @@ static inline void *blkg_to_pdata(struct blkio_group *blkg,
 /**
  * pdata_to_blkg - get blkg associated with policy private data
  * @pdata: policy private data of interest
- * @pol: policy @pdata is for
  *
- * @pdata is policy private data for @pol.  Determine the blkg it's
- * associated with.
+ * @pdata is policy private data.  Determine the blkg it's associated with.
  */
-static inline struct blkio_group *pdata_to_blkg(void *pdata,
-						struct blkio_policy_type *pol)
+static inline struct blkio_group *pdata_to_blkg(void *pdata)
 {
 	if (pdata) {
 		struct blkg_policy_data *pd =
@@ -402,7 +399,6 @@ extern struct blkio_group *blkg_lookup(struct blkio_cgroup *blkcg,
 				       struct request_queue *q);
 struct blkio_group *blkg_lookup_create(struct blkio_cgroup *blkcg,
 				       struct request_queue *q,
-				       enum blkio_policy_id plid,
 				       bool for_root);
 void blkiocg_update_timeslice_used(struct blkio_group *blkg,
 				   struct blkio_policy_type *pol,
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 4ba1418..1cc6c23d 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -107,7 +107,7 @@ static inline struct throtl_grp *blkg_to_tg(struct blkio_group *blkg)
 
 static inline struct blkio_group *tg_to_blkg(struct throtl_grp *tg)
 {
-	return pdata_to_blkg(tg, &blkio_policy_throtl);
+	return pdata_to_blkg(tg);
 }
 
 enum tg_state_flags {
@@ -185,7 +185,7 @@ static struct throtl_grp *throtl_lookup_create_tg(struct throtl_data *td,
 	} else {
 		struct blkio_group *blkg;
 
-		blkg = blkg_lookup_create(blkcg, q, BLKIO_POLICY_THROTL, false);
+		blkg = blkg_lookup_create(blkcg, q, false);
 
 		/* if %NULL and @q is alive, fall back to root_tg */
 		if (!IS_ERR(blkg))
@@ -1033,8 +1033,7 @@ int blk_throtl_init(struct request_queue *q)
 	rcu_read_lock();
 	spin_lock_irq(q->queue_lock);
 
-	blkg = blkg_lookup_create(&blkio_root_cgroup, q, BLKIO_POLICY_THROTL,
-				  true);
+	blkg = blkg_lookup_create(&blkio_root_cgroup, q, true);
 	if (!IS_ERR(blkg))
 		td->root_tg = blkg_to_tg(blkg);
 
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 39c4330..8cca6161 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -374,7 +374,7 @@ static inline struct cfq_group *blkg_to_cfqg(struct blkio_group *blkg)
 
 static inline struct blkio_group *cfqg_to_blkg(struct cfq_group *cfqg)
 {
-	return pdata_to_blkg(cfqg, &blkio_policy_cfq);
+	return pdata_to_blkg(cfqg);
 }
 
 static inline void cfqg_get(struct cfq_group *cfqg)
@@ -1092,7 +1092,7 @@ static struct cfq_group *cfq_lookup_create_cfqg(struct cfq_data *cfqd,
 	} else {
 		struct blkio_group *blkg;
 
-		blkg = blkg_lookup_create(blkcg, q, BLKIO_POLICY_PROP, false);
+		blkg = blkg_lookup_create(blkcg, q, false);
 		if (!IS_ERR(blkg))
 			cfqg = blkg_to_cfqg(blkg);
 	}
@@ -3523,8 +3523,7 @@ static int cfq_init_queue(struct request_queue *q)
 	rcu_read_lock();
 	spin_lock_irq(q->queue_lock);
 
-	blkg = blkg_lookup_create(&blkio_root_cgroup, q, BLKIO_POLICY_PROP,
-				  true);
+	blkg = blkg_lookup_create(&blkio_root_cgroup, q, true);
 	if (!IS_ERR(blkg))
 		cfqd->root_group = blkg_to_cfqg(blkg);
 
-- 
1.7.7.3



* [PATCH 02/21] blkcg: BLKIO_STAT_CPU_SECTORS doesn't have subcounters
  2012-03-28 22:51 [PATCHSET] block: modularize blkcg config and stat file handling Tejun Heo
  2012-03-28 22:51 ` [PATCH 01/21] blkcg: remove unused @pol and @plid parameters Tejun Heo
@ 2012-03-28 22:51 ` Tejun Heo
  2012-03-28 22:51 ` [PATCH 03/21] blkcg: introduce blkg_stat and blkg_rwstat Tejun Heo
                   ` (21 subsequent siblings)
  23 siblings, 0 replies; 56+ messages in thread
From: Tejun Heo @ 2012-03-28 22:51 UTC (permalink / raw)
  To: axboe; +Cc: vgoyal, ctalbott, rni, linux-kernel, cgroups, containers, Tejun Heo

BLKIO_STAT_CPU_SECTORS doesn't need read/write/sync/async subcounters
and is counted by blkio_group_stats_cpu->sectors; however, it still
occupies a slot in blkio_group_stats_cpu->stat_arr_cpu[].

Rearrange enum stat_type_cpu, define BLKIO_STAT_CPU_ARR_NR, and use it
as the stat_arr_cpu[] size so that only SERVICE_BYTES and SERVICED
carry subcounters.
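
The sizing trick mirrors the existing BLKIO_STAT_ARR_NR pattern used
for enum stat_type: multi-valued entries come first and the array size
macro is "last multi-valued entry + 1".  A standalone illustration of
the pattern (plain userspace C, not kernel code; all names are
demo-only):

  #include <stdio.h>

  enum demo_stat_type_cpu {
          /* entries with rw/sync subcounters come first */
          DEMO_STAT_CPU_SERVICE_BYTES,
          DEMO_STAT_CPU_SERVICED,

          /* single-valued entries go below this point */
          DEMO_STAT_CPU_SECTORS,
  };

  /* only the subcounter-carrying entries get array rows */
  #define DEMO_STAT_CPU_ARR_NR    (DEMO_STAT_CPU_SERVICED + 1)

  int main(void)
  {
          /* prints 2: SECTORS no longer wastes a row of subcounters */
          printf("stat_arr_cpu rows: %d\n", DEMO_STAT_CPU_ARR_NR);
          return 0;
  }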

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 block/blk-cgroup.h |    9 ++++++---
 1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index 1add3dc..2060d81 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -58,14 +58,17 @@ enum stat_type {
 
 /* Per cpu stats */
 enum stat_type_cpu {
-	BLKIO_STAT_CPU_SECTORS,
 	/* Total bytes transferred */
 	BLKIO_STAT_CPU_SERVICE_BYTES,
 	/* Total IOs serviced, post merge */
 	BLKIO_STAT_CPU_SERVICED,
-	BLKIO_STAT_CPU_NR
+
+	/* All the single valued stats go below this */
+	BLKIO_STAT_CPU_SECTORS,
 };
 
+#define BLKIO_STAT_CPU_ARR_NR	(BLKIO_STAT_CPU_SERVICED + 1)
+
 enum stat_sub_type {
 	BLKIO_STAT_READ = 0,
 	BLKIO_STAT_WRITE,
@@ -167,7 +170,7 @@ struct blkio_group_stats {
 /* Per cpu blkio group stats */
 struct blkio_group_stats_cpu {
 	uint64_t sectors;
-	uint64_t stat_arr_cpu[BLKIO_STAT_CPU_NR][BLKIO_STAT_TOTAL];
+	uint64_t stat_arr_cpu[BLKIO_STAT_CPU_ARR_NR][BLKIO_STAT_TOTAL];
 	struct u64_stats_sync syncp;
 };
 
-- 
1.7.7.3



* [PATCH 03/21] blkcg: introduce blkg_stat and blkg_rwstat
  2012-03-28 22:51 [PATCHSET] block: modularize blkcg config and stat file handling Tejun Heo
  2012-03-28 22:51 ` [PATCH 01/21] blkcg: remove unused @pol and @plid parameters Tejun Heo
  2012-03-28 22:51 ` [PATCH 02/21] blkcg: BLKIO_STAT_CPU_SECTORS doesn't have subcounters Tejun Heo
@ 2012-03-28 22:51 ` Tejun Heo
  2012-03-28 22:51 ` [PATCH 04/21] blkcg: restructure statistics printing Tejun Heo
                   ` (20 subsequent siblings)
  23 siblings, 0 replies; 56+ messages in thread
From: Tejun Heo @ 2012-03-28 22:51 UTC (permalink / raw)
  To: axboe; +Cc: vgoyal, ctalbott, rni, linux-kernel, cgroups, containers, Tejun Heo

blkcg uses u64_stats_sync to avoid reading torn u64 statistic values
on 32bit archs, and some stat counters have subtypes to distinguish
reads/writes and sync/async IOs.  The stat code paths are confusing
and involve a lot of going back and forth between blkcg core and
specific policy implementations, and synchronization and subtype
handling are open-coded in blkcg core.

This patch introduces struct blkg_stat and blkg_rwstat which, with
accompanying operations, encapsulate stat updating and accessing with
proper synchronization.

blkg_stat is a simple u64 counter with 64bit read-access protection.
blkg_rwstat is the variant with read/write and [a]sync subcounters; it
takes @rw flags to distinguish IO subtypes (%REQ_WRITE and %REQ_SYNC)
and replaces the stat_sub_type indexed arrays.
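
An update site then collapses to something like the following sketch
(the accessors are the ones added below; @stats, @direction, @sync and
@time stand in for the values the existing callers already have):

  int rw = (direction ? REQ_WRITE : 0) | (sync ? REQ_SYNC : 0);

  blkg_rwstat_add(&stats->queued, rw, 1);       /* queue an IO */
  /* ... */
  blkg_rwstat_add(&stats->queued, rw, -1);      /* and dequeue it */

  blkg_stat_add(&stats->time, time);            /* plain u64 counter */

  /* readers don't need update-side synchronization */
  u64 nr_queued = blkg_rwstat_sum(&stats->queued);  /* READ + WRITE */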

All counters in blkio_group_stats and blkio_group_stats_cpu are
replaced with either blkg_stat or blkg_rwstat along with all users.

This does add one u64_stats_sync per counter and increases the number
of stats_sync operations, but they're empty/no-ops on 64bit archs and
blkcg doesn't have too many counters, especially with
DEBUG_BLK_CGROUP off.

While the resulting code isn't necessarily simpler at the moment, this
will enable further cleanup of the blkcg stats code.

- BLKIO_STAT_{READ|WRITE|SYNC|ASYNC|TOTAL} renamed to
  BLKG_RWSTAT_{READ|WRITE|SYNC|ASYNC|TOTAL}.

- blkg_rwstat_add() replaces blkio_add_stat() and
  blkio_check_and_dec_stat().  Note that the BUG_ON() on underflow in
  the latter function no longer exists.  It's *way* better to have
  underflowed stat counters than oopsing.

- blkio_group_stats->dequeue is now a proper u64 stat counter instead
  of ulong.

- reset_stats() is updated to clear each stat counter individually,
  and BLKG_STATS_DEBUG_CLEAR_{START|SIZE} are removed.

- Some functions reconstruct rw flags from direction and sync
  booleans; this reconstruction will be removed by future patches.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 block/blk-cgroup.c |  289 +++++++++++++++++++++++-----------------------------
 block/blk-cgroup.h |  211 ++++++++++++++++++++++++++++++--------
 2 files changed, 293 insertions(+), 207 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index d4cf77d..153a2db 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -132,46 +132,6 @@ static inline void blkio_update_group_iops(struct blkio_group *blkg,
 	}
 }
 
-/*
- * Add to the appropriate stat variable depending on the request type.
- * This should be called with queue_lock held.
- */
-static void blkio_add_stat(uint64_t *stat, uint64_t add, bool direction,
-				bool sync)
-{
-	if (direction)
-		stat[BLKIO_STAT_WRITE] += add;
-	else
-		stat[BLKIO_STAT_READ] += add;
-	if (sync)
-		stat[BLKIO_STAT_SYNC] += add;
-	else
-		stat[BLKIO_STAT_ASYNC] += add;
-}
-
-/*
- * Decrements the appropriate stat variable if non-zero depending on the
- * request type. Panics on value being zero.
- * This should be called with the queue_lock held.
- */
-static void blkio_check_and_dec_stat(uint64_t *stat, bool direction, bool sync)
-{
-	if (direction) {
-		BUG_ON(stat[BLKIO_STAT_WRITE] == 0);
-		stat[BLKIO_STAT_WRITE]--;
-	} else {
-		BUG_ON(stat[BLKIO_STAT_READ] == 0);
-		stat[BLKIO_STAT_READ]--;
-	}
-	if (sync) {
-		BUG_ON(stat[BLKIO_STAT_SYNC] == 0);
-		stat[BLKIO_STAT_SYNC]--;
-	} else {
-		BUG_ON(stat[BLKIO_STAT_ASYNC] == 0);
-		stat[BLKIO_STAT_ASYNC]--;
-	}
-}
-
 #ifdef CONFIG_DEBUG_BLK_CGROUP
 /* This should be called with the queue_lock held. */
 static void blkio_set_start_group_wait_time(struct blkio_group *blkg,
@@ -198,7 +158,8 @@ static void blkio_update_group_wait_time(struct blkio_group_stats *stats)
 
 	now = sched_clock();
 	if (time_after64(now, stats->start_group_wait_time))
-		stats->group_wait_time += now - stats->start_group_wait_time;
+		blkg_stat_add(&stats->group_wait_time,
+			      now - stats->start_group_wait_time);
 	blkio_clear_blkg_waiting(stats);
 }
 
@@ -212,7 +173,8 @@ static void blkio_end_empty_time(struct blkio_group_stats *stats)
 
 	now = sched_clock();
 	if (time_after64(now, stats->start_empty_time))
-		stats->empty_time += now - stats->start_empty_time;
+		blkg_stat_add(&stats->empty_time,
+			      now - stats->start_empty_time);
 	blkio_clear_blkg_empty(stats);
 }
 
@@ -239,11 +201,9 @@ void blkiocg_update_idle_time_stats(struct blkio_group *blkg,
 	if (blkio_blkg_idling(stats)) {
 		unsigned long long now = sched_clock();
 
-		if (time_after64(now, stats->start_idle_time)) {
-			u64_stats_update_begin(&stats->syncp);
-			stats->idle_time += now - stats->start_idle_time;
-			u64_stats_update_end(&stats->syncp);
-		}
+		if (time_after64(now, stats->start_idle_time))
+			blkg_stat_add(&stats->idle_time,
+				      now - stats->start_idle_time);
 		blkio_clear_blkg_idling(stats);
 	}
 }
@@ -256,13 +216,10 @@ void blkiocg_update_avg_queue_size_stats(struct blkio_group *blkg,
 
 	lockdep_assert_held(blkg->q->queue_lock);
 
-	u64_stats_update_begin(&stats->syncp);
-	stats->avg_queue_size_sum +=
-			stats->stat_arr[BLKIO_STAT_QUEUED][BLKIO_STAT_READ] +
-			stats->stat_arr[BLKIO_STAT_QUEUED][BLKIO_STAT_WRITE];
-	stats->avg_queue_size_samples++;
+	blkg_stat_add(&stats->avg_queue_size_sum,
+		      blkg_rwstat_sum(&stats->queued));
+	blkg_stat_add(&stats->avg_queue_size_samples, 1);
 	blkio_update_group_wait_time(stats);
-	u64_stats_update_end(&stats->syncp);
 }
 EXPORT_SYMBOL_GPL(blkiocg_update_avg_queue_size_stats);
 
@@ -273,8 +230,7 @@ void blkiocg_set_start_empty_time(struct blkio_group *blkg,
 
 	lockdep_assert_held(blkg->q->queue_lock);
 
-	if (stats->stat_arr[BLKIO_STAT_QUEUED][BLKIO_STAT_READ] ||
-			stats->stat_arr[BLKIO_STAT_QUEUED][BLKIO_STAT_WRITE])
+	if (blkg_rwstat_sum(&stats->queued))
 		return;
 
 	/*
@@ -298,7 +254,7 @@ void blkiocg_update_dequeue_stats(struct blkio_group *blkg,
 
 	lockdep_assert_held(blkg->q->queue_lock);
 
-	pd->stats.dequeue += dequeue;
+	blkg_stat_add(&pd->stats.dequeue, dequeue);
 }
 EXPORT_SYMBOL_GPL(blkiocg_update_dequeue_stats);
 #else
@@ -314,14 +270,12 @@ void blkiocg_update_io_add_stats(struct blkio_group *blkg,
 				 bool sync)
 {
 	struct blkio_group_stats *stats = &blkg->pd[pol->plid]->stats;
+	int rw = (direction ? REQ_WRITE : 0) | (sync ? REQ_SYNC : 0);
 
 	lockdep_assert_held(blkg->q->queue_lock);
 
-	u64_stats_update_begin(&stats->syncp);
-	blkio_add_stat(stats->stat_arr[BLKIO_STAT_QUEUED], 1, direction, sync);
+	blkg_rwstat_add(&stats->queued, rw, 1);
 	blkio_end_empty_time(stats);
-	u64_stats_update_end(&stats->syncp);
-
 	blkio_set_start_group_wait_time(blkg, pol, curr_blkg);
 }
 EXPORT_SYMBOL_GPL(blkiocg_update_io_add_stats);
@@ -331,13 +285,11 @@ void blkiocg_update_io_remove_stats(struct blkio_group *blkg,
 				    bool direction, bool sync)
 {
 	struct blkio_group_stats *stats = &blkg->pd[pol->plid]->stats;
+	int rw = (direction ? REQ_WRITE : 0) | (sync ? REQ_SYNC : 0);
 
 	lockdep_assert_held(blkg->q->queue_lock);
 
-	u64_stats_update_begin(&stats->syncp);
-	blkio_check_and_dec_stat(stats->stat_arr[BLKIO_STAT_QUEUED], direction,
-				 sync);
-	u64_stats_update_end(&stats->syncp);
+	blkg_rwstat_add(&stats->queued, rw, -1);
 }
 EXPORT_SYMBOL_GPL(blkiocg_update_io_remove_stats);
 
@@ -350,12 +302,10 @@ void blkiocg_update_timeslice_used(struct blkio_group *blkg,
 
 	lockdep_assert_held(blkg->q->queue_lock);
 
-	u64_stats_update_begin(&stats->syncp);
-	stats->time += time;
+	blkg_stat_add(&stats->time, time);
 #ifdef CONFIG_DEBUG_BLK_CGROUP
-	stats->unaccounted_time += unaccounted_time;
+	blkg_stat_add(&stats->unaccounted_time, unaccounted_time);
 #endif
-	u64_stats_update_end(&stats->syncp);
 }
 EXPORT_SYMBOL_GPL(blkiocg_update_timeslice_used);
 
@@ -367,6 +317,7 @@ void blkiocg_update_dispatch_stats(struct blkio_group *blkg,
 				   struct blkio_policy_type *pol,
 				   uint64_t bytes, bool direction, bool sync)
 {
+	int rw = (direction ? REQ_WRITE : 0) | (sync ? REQ_SYNC : 0);
 	struct blkg_policy_data *pd = blkg->pd[pol->plid];
 	struct blkio_group_stats_cpu *stats_cpu;
 	unsigned long flags;
@@ -384,13 +335,10 @@ void blkiocg_update_dispatch_stats(struct blkio_group *blkg,
 
 	stats_cpu = this_cpu_ptr(pd->stats_cpu);
 
-	u64_stats_update_begin(&stats_cpu->syncp);
-	stats_cpu->sectors += bytes >> 9;
-	blkio_add_stat(stats_cpu->stat_arr_cpu[BLKIO_STAT_CPU_SERVICED],
-			1, direction, sync);
-	blkio_add_stat(stats_cpu->stat_arr_cpu[BLKIO_STAT_CPU_SERVICE_BYTES],
-			bytes, direction, sync);
-	u64_stats_update_end(&stats_cpu->syncp);
+	blkg_stat_add(&stats_cpu->sectors, bytes >> 9);
+	blkg_rwstat_add(&stats_cpu->serviced, rw, 1);
+	blkg_rwstat_add(&stats_cpu->service_bytes, rw, bytes);
+
 	local_irq_restore(flags);
 }
 EXPORT_SYMBOL_GPL(blkiocg_update_dispatch_stats);
@@ -403,17 +351,15 @@ void blkiocg_update_completion_stats(struct blkio_group *blkg,
 {
 	struct blkio_group_stats *stats = &blkg->pd[pol->plid]->stats;
 	unsigned long long now = sched_clock();
+	int rw = (direction ? REQ_WRITE : 0) | (sync ? REQ_SYNC : 0);
 
 	lockdep_assert_held(blkg->q->queue_lock);
 
-	u64_stats_update_begin(&stats->syncp);
 	if (time_after64(now, io_start_time))
-		blkio_add_stat(stats->stat_arr[BLKIO_STAT_SERVICE_TIME],
-				now - io_start_time, direction, sync);
+		blkg_rwstat_add(&stats->service_time, rw, now - io_start_time);
 	if (time_after64(io_start_time, start_time))
-		blkio_add_stat(stats->stat_arr[BLKIO_STAT_WAIT_TIME],
-				io_start_time - start_time, direction, sync);
-	u64_stats_update_end(&stats->syncp);
+		blkg_rwstat_add(&stats->wait_time, rw,
+				io_start_time - start_time);
 }
 EXPORT_SYMBOL_GPL(blkiocg_update_completion_stats);
 
@@ -423,12 +369,11 @@ void blkiocg_update_io_merged_stats(struct blkio_group *blkg,
 				    bool direction, bool sync)
 {
 	struct blkio_group_stats *stats = &blkg->pd[pol->plid]->stats;
+	int rw = (direction ? REQ_WRITE : 0) | (sync ? REQ_SYNC : 0);
 
 	lockdep_assert_held(blkg->q->queue_lock);
 
-	u64_stats_update_begin(&stats->syncp);
-	blkio_add_stat(stats->stat_arr[BLKIO_STAT_MERGED], 1, direction, sync);
-	u64_stats_update_end(&stats->syncp);
+	blkg_rwstat_add(&stats->merged, rw, 1);
 }
 EXPORT_SYMBOL_GPL(blkiocg_update_io_merged_stats);
 
@@ -757,8 +702,9 @@ static void blkio_reset_stats_cpu(struct blkio_group *blkg, int plid)
 		struct blkio_group_stats_cpu *sc =
 			per_cpu_ptr(pd->stats_cpu, cpu);
 
-		sc->sectors = 0;
-		memset(sc->stat_arr_cpu, 0, sizeof(sc->stat_arr_cpu));
+		blkg_rwstat_reset(&sc->service_bytes);
+		blkg_rwstat_reset(&sc->serviced);
+		blkg_stat_reset(&sc->sectors);
 	}
 }
 
@@ -768,7 +714,6 @@ blkiocg_reset_stats(struct cgroup *cgroup, struct cftype *cftype, u64 val)
 	struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgroup);
 	struct blkio_group *blkg;
 	struct hlist_node *n;
-	int i;
 
 	spin_lock(&blkio_list_lock);
 	spin_lock_irq(&blkcg->lock);
@@ -786,14 +731,18 @@ blkiocg_reset_stats(struct cgroup *cgroup, struct cftype *cftype, u64 val)
 			struct blkio_group_stats *stats = &pd->stats;
 
 			/* queued stats shouldn't be cleared */
-			for (i = 0; i < ARRAY_SIZE(stats->stat_arr); i++)
-				if (i != BLKIO_STAT_QUEUED)
-					memset(stats->stat_arr[i], 0,
-					       sizeof(stats->stat_arr[i]));
-			stats->time = 0;
+			blkg_rwstat_reset(&stats->merged);
+			blkg_rwstat_reset(&stats->service_time);
+			blkg_rwstat_reset(&stats->wait_time);
+			blkg_stat_reset(&stats->time);
 #ifdef CONFIG_DEBUG_BLK_CGROUP
-			memset((void *)stats + BLKG_STATS_DEBUG_CLEAR_START, 0,
-			       BLKG_STATS_DEBUG_CLEAR_SIZE);
+			blkg_stat_reset(&stats->unaccounted_time);
+			blkg_stat_reset(&stats->avg_queue_size_sum);
+			blkg_stat_reset(&stats->avg_queue_size_samples);
+			blkg_stat_reset(&stats->dequeue);
+			blkg_stat_reset(&stats->group_wait_time);
+			blkg_stat_reset(&stats->idle_time);
+			blkg_stat_reset(&stats->empty_time);
 #endif
 			blkio_reset_stats_cpu(blkg, pol->plid);
 		}
@@ -804,7 +753,7 @@ blkiocg_reset_stats(struct cgroup *cgroup, struct cftype *cftype, u64 val)
 	return 0;
 }
 
-static void blkio_get_key_name(enum stat_sub_type type, const char *dname,
+static void blkio_get_key_name(enum blkg_rwstat_type type, const char *dname,
 			       char *str, int chars_left, bool diskname_only)
 {
 	snprintf(str, chars_left, "%s", dname);
@@ -817,19 +766,19 @@ static void blkio_get_key_name(enum stat_sub_type type, const char *dname,
 	if (diskname_only)
 		return;
 	switch (type) {
-	case BLKIO_STAT_READ:
+	case BLKG_RWSTAT_READ:
 		strlcat(str, " Read", chars_left);
 		break;
-	case BLKIO_STAT_WRITE:
+	case BLKG_RWSTAT_WRITE:
 		strlcat(str, " Write", chars_left);
 		break;
-	case BLKIO_STAT_SYNC:
+	case BLKG_RWSTAT_SYNC:
 		strlcat(str, " Sync", chars_left);
 		break;
-	case BLKIO_STAT_ASYNC:
+	case BLKG_RWSTAT_ASYNC:
 		strlcat(str, " Async", chars_left);
 		break;
-	case BLKIO_STAT_TOTAL:
+	case BLKG_RWSTAT_TOTAL:
 		strlcat(str, " Total", chars_left);
 		break;
 	default:
@@ -838,29 +787,34 @@ static void blkio_get_key_name(enum stat_sub_type type, const char *dname,
 }
 
 static uint64_t blkio_read_stat_cpu(struct blkio_group *blkg, int plid,
-			enum stat_type_cpu type, enum stat_sub_type sub_type)
+				    enum stat_type_cpu type,
+				    enum blkg_rwstat_type sub_type)
 {
 	struct blkg_policy_data *pd = blkg->pd[plid];
+	u64 val = 0;
 	int cpu;
-	struct blkio_group_stats_cpu *stats_cpu;
-	u64 val = 0, tval;
 
 	if (pd->stats_cpu == NULL)
 		return val;
 
 	for_each_possible_cpu(cpu) {
-		unsigned int start;
-		stats_cpu = per_cpu_ptr(pd->stats_cpu, cpu);
-
-		do {
-			start = u64_stats_fetch_begin(&stats_cpu->syncp);
-			if (type == BLKIO_STAT_CPU_SECTORS)
-				tval = stats_cpu->sectors;
-			else
-				tval = stats_cpu->stat_arr_cpu[type][sub_type];
-		} while(u64_stats_fetch_retry(&stats_cpu->syncp, start));
-
-		val += tval;
+		struct blkio_group_stats_cpu *stats_cpu =
+			per_cpu_ptr(pd->stats_cpu, cpu);
+		struct blkg_rwstat rws;
+
+		switch (type) {
+		case BLKIO_STAT_CPU_SECTORS:
+			val += blkg_stat_read(&stats_cpu->sectors);
+			break;
+		case BLKIO_STAT_CPU_SERVICE_BYTES:
+			rws = blkg_rwstat_read(&stats_cpu->service_bytes);
+			val += rws.cnt[sub_type];
+			break;
+		case BLKIO_STAT_CPU_SERVICED:
+			rws = blkg_rwstat_read(&stats_cpu->serviced);
+			val += rws.cnt[sub_type];
+			break;
+		}
 	}
 
 	return val;
@@ -872,7 +826,7 @@ static uint64_t blkio_get_stat_cpu(struct blkio_group *blkg, int plid,
 {
 	uint64_t disk_total, val;
 	char key_str[MAX_KEY_LEN];
-	enum stat_sub_type sub_type;
+	enum blkg_rwstat_type sub_type;
 
 	if (type == BLKIO_STAT_CPU_SECTORS) {
 		val = blkio_read_stat_cpu(blkg, plid, type, 0);
@@ -881,7 +835,7 @@ static uint64_t blkio_get_stat_cpu(struct blkio_group *blkg, int plid,
 		return val;
 	}
 
-	for (sub_type = BLKIO_STAT_READ; sub_type < BLKIO_STAT_TOTAL;
+	for (sub_type = BLKG_RWSTAT_READ; sub_type < BLKG_RWSTAT_NR;
 			sub_type++) {
 		blkio_get_key_name(sub_type, dname, key_str, MAX_KEY_LEN,
 				   false);
@@ -889,10 +843,10 @@ static uint64_t blkio_get_stat_cpu(struct blkio_group *blkg, int plid,
 		cb->fill(cb, key_str, val);
 	}
 
-	disk_total = blkio_read_stat_cpu(blkg, plid, type, BLKIO_STAT_READ) +
-		blkio_read_stat_cpu(blkg, plid, type, BLKIO_STAT_WRITE);
+	disk_total = blkio_read_stat_cpu(blkg, plid, type, BLKG_RWSTAT_READ) +
+		blkio_read_stat_cpu(blkg, plid, type, BLKG_RWSTAT_WRITE);
 
-	blkio_get_key_name(BLKIO_STAT_TOTAL, dname, key_str, MAX_KEY_LEN,
+	blkio_get_key_name(BLKG_RWSTAT_TOTAL, dname, key_str, MAX_KEY_LEN,
 			   false);
 	cb->fill(cb, key_str, disk_total);
 	return disk_total;
@@ -905,65 +859,76 @@ static uint64_t blkio_get_stat(struct blkio_group *blkg, int plid,
 	struct blkio_group_stats *stats = &blkg->pd[plid]->stats;
 	uint64_t v = 0, disk_total = 0;
 	char key_str[MAX_KEY_LEN];
-	unsigned int sync_start;
+	struct blkg_rwstat rws = { };
 	int st;
 
 	if (type >= BLKIO_STAT_ARR_NR) {
-		do {
-			sync_start = u64_stats_fetch_begin(&stats->syncp);
-			switch (type) {
-			case BLKIO_STAT_TIME:
-				v = stats->time;
-				break;
+		switch (type) {
+		case BLKIO_STAT_TIME:
+			v = blkg_stat_read(&stats->time);
+			break;
 #ifdef CONFIG_DEBUG_BLK_CGROUP
-			case BLKIO_STAT_UNACCOUNTED_TIME:
-				v = stats->unaccounted_time;
-				break;
-			case BLKIO_STAT_AVG_QUEUE_SIZE: {
-				uint64_t samples = stats->avg_queue_size_samples;
+		case BLKIO_STAT_UNACCOUNTED_TIME:
+			v = blkg_stat_read(&stats->unaccounted_time);
+			break;
+		case BLKIO_STAT_AVG_QUEUE_SIZE: {
+			uint64_t samples;
 
-				if (samples) {
-					v = stats->avg_queue_size_sum;
-					do_div(v, samples);
-				}
-				break;
+			samples = blkg_stat_read(&stats->avg_queue_size_samples);
+			if (samples) {
+				v = blkg_stat_read(&stats->avg_queue_size_sum);
+				do_div(v, samples);
 			}
-			case BLKIO_STAT_IDLE_TIME:
-				v = stats->idle_time;
-				break;
-			case BLKIO_STAT_EMPTY_TIME:
-				v = stats->empty_time;
-				break;
-			case BLKIO_STAT_DEQUEUE:
-				v = stats->dequeue;
-				break;
-			case BLKIO_STAT_GROUP_WAIT_TIME:
-				v = stats->group_wait_time;
-				break;
+			break;
+		}
+		case BLKIO_STAT_IDLE_TIME:
+			v = blkg_stat_read(&stats->idle_time);
+			break;
+		case BLKIO_STAT_EMPTY_TIME:
+			v = blkg_stat_read(&stats->empty_time);
+			break;
+		case BLKIO_STAT_DEQUEUE:
+			v = blkg_stat_read(&stats->dequeue);
+			break;
+		case BLKIO_STAT_GROUP_WAIT_TIME:
+			v = blkg_stat_read(&stats->group_wait_time);
+			break;
 #endif
-			default:
-				WARN_ON_ONCE(1);
-			}
-		} while (u64_stats_fetch_retry(&stats->syncp, sync_start));
+		default:
+			WARN_ON_ONCE(1);
+		}
 
 		blkio_get_key_name(0, dname, key_str, MAX_KEY_LEN, true);
 		cb->fill(cb, key_str, v);
 		return v;
 	}
 
-	for (st = BLKIO_STAT_READ; st < BLKIO_STAT_TOTAL; st++) {
-		do {
-			sync_start = u64_stats_fetch_begin(&stats->syncp);
-			v = stats->stat_arr[type][st];
-		} while (u64_stats_fetch_retry(&stats->syncp, sync_start));
+	switch (type) {
+	case BLKIO_STAT_MERGED:
+		rws = blkg_rwstat_read(&stats->merged);
+		break;
+	case BLKIO_STAT_SERVICE_TIME:
+		rws = blkg_rwstat_read(&stats->service_time);
+		break;
+	case BLKIO_STAT_WAIT_TIME:
+		rws = blkg_rwstat_read(&stats->wait_time);
+		break;
+	case BLKIO_STAT_QUEUED:
+		rws = blkg_rwstat_read(&stats->queued);
+		break;
+	default:
+		WARN_ON_ONCE(true);
+		break;
+	}
 
+	for (st = BLKG_RWSTAT_READ; st < BLKG_RWSTAT_NR; st++) {
 		blkio_get_key_name(st, dname, key_str, MAX_KEY_LEN, false);
-		cb->fill(cb, key_str, v);
-		if (st == BLKIO_STAT_READ || st == BLKIO_STAT_WRITE)
-			disk_total += v;
+		cb->fill(cb, key_str, rws.cnt[st]);
+		if (st == BLKG_RWSTAT_READ || st == BLKG_RWSTAT_WRITE)
+			disk_total += rws.cnt[st];
 	}
 
-	blkio_get_key_name(BLKIO_STAT_TOTAL, dname, key_str, MAX_KEY_LEN,
+	blkio_get_key_name(BLKG_RWSTAT_TOTAL, dname, key_str, MAX_KEY_LEN,
 			   false);
 	cb->fill(cb, key_str, disk_total);
 	return disk_total;
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index 2060d81..7578df3 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -69,12 +69,14 @@ enum stat_type_cpu {
 
 #define BLKIO_STAT_CPU_ARR_NR	(BLKIO_STAT_CPU_SERVICED + 1)
 
-enum stat_sub_type {
-	BLKIO_STAT_READ = 0,
-	BLKIO_STAT_WRITE,
-	BLKIO_STAT_SYNC,
-	BLKIO_STAT_ASYNC,
-	BLKIO_STAT_TOTAL
+enum blkg_rwstat_type {
+	BLKG_RWSTAT_READ,
+	BLKG_RWSTAT_WRITE,
+	BLKG_RWSTAT_SYNC,
+	BLKG_RWSTAT_ASYNC,
+
+	BLKG_RWSTAT_NR,
+	BLKG_RWSTAT_TOTAL = BLKG_RWSTAT_NR,
 };
 
 /* blkg state flags */
@@ -124,54 +126,58 @@ struct blkio_cgroup {
 	uint64_t id;
 };
 
+struct blkg_stat {
+	struct u64_stats_sync		syncp;
+	uint64_t			cnt;
+};
+
+struct blkg_rwstat {
+	struct u64_stats_sync		syncp;
+	uint64_t			cnt[BLKG_RWSTAT_NR];
+};
+
 struct blkio_group_stats {
-	struct u64_stats_sync syncp;
+	/* number of ios merged */
+	struct blkg_rwstat		merged;
+	/* total time spent on device in ns, may not be accurate w/ queueing */
+	struct blkg_rwstat		service_time;
+	/* total time spent waiting in scheduler queue in ns */
+	struct blkg_rwstat		wait_time;
+	/* number of IOs queued up */
+	struct blkg_rwstat		queued;
 	/* total disk time and nr sectors dispatched by this group */
-	uint64_t time;
-	uint64_t stat_arr[BLKIO_STAT_ARR_NR][BLKIO_STAT_TOTAL];
+	struct blkg_stat		time;
 #ifdef CONFIG_DEBUG_BLK_CGROUP
-	/* Time not charged to this cgroup */
-	uint64_t unaccounted_time;
-
-	/* Sum of number of IOs queued across all samples */
-	uint64_t avg_queue_size_sum;
-	/* Count of samples taken for average */
-	uint64_t avg_queue_size_samples;
-	/* How many times this group has been removed from service tree */
-	unsigned long dequeue;
-
-	/* Total time spent waiting for it to be assigned a timeslice. */
-	uint64_t group_wait_time;
-
-	/* Time spent idling for this blkio_group */
-	uint64_t idle_time;
-	/*
-	 * Total time when we have requests queued and do not contain the
-	 * current active queue.
-	 */
-	uint64_t empty_time;
-
+	/* time not charged to this cgroup */
+	struct blkg_stat		unaccounted_time;
+	/* sum of number of ios queued across all samples */
+	struct blkg_stat		avg_queue_size_sum;
+	/* count of samples taken for average */
+	struct blkg_stat		avg_queue_size_samples;
+	/* how many times this group has been removed from service tree */
+	struct blkg_stat		dequeue;
+	/* total time spent waiting for it to be assigned a timeslice. */
+	struct blkg_stat		group_wait_time;
+	/* time spent idling for this blkio_group */
+	struct blkg_stat		idle_time;
+	/* total time with empty current active q with other requests queued */
+	struct blkg_stat		empty_time;
 	/* fields after this shouldn't be cleared on stat reset */
-	uint64_t start_group_wait_time;
-	uint64_t start_idle_time;
-	uint64_t start_empty_time;
-	uint16_t flags;
+	uint64_t			start_group_wait_time;
+	uint64_t			start_idle_time;
+	uint64_t			start_empty_time;
+	uint16_t			flags;
 #endif
 };
 
-#ifdef CONFIG_DEBUG_BLK_CGROUP
-#define BLKG_STATS_DEBUG_CLEAR_START	\
-	offsetof(struct blkio_group_stats, unaccounted_time)
-#define BLKG_STATS_DEBUG_CLEAR_SIZE	\
-	(offsetof(struct blkio_group_stats, start_group_wait_time) - \
-	 BLKG_STATS_DEBUG_CLEAR_START)
-#endif
-
 /* Per cpu blkio group stats */
 struct blkio_group_stats_cpu {
-	uint64_t sectors;
-	uint64_t stat_arr_cpu[BLKIO_STAT_CPU_ARR_NR][BLKIO_STAT_TOTAL];
-	struct u64_stats_sync syncp;
+	/* total bytes transferred */
+	struct blkg_rwstat		service_bytes;
+	/* total IOs serviced, post merge */
+	struct blkg_rwstat		serviced;
+	/* total sectors transferred */
+	struct blkg_stat		sectors;
 };
 
 struct blkio_group_conf {
@@ -316,6 +322,121 @@ static inline void blkg_put(struct blkio_group *blkg)
 		__blkg_release(blkg);
 }
 
+/**
+ * blkg_stat_add - add a value to a blkg_stat
+ * @stat: target blkg_stat
+ * @val: value to add
+ *
+ * Add @val to @stat.  The caller is responsible for synchronizing calls to
+ * this function.
+ */
+static inline void blkg_stat_add(struct blkg_stat *stat, uint64_t val)
+{
+	u64_stats_update_begin(&stat->syncp);
+	stat->cnt += val;
+	u64_stats_update_end(&stat->syncp);
+}
+
+/**
+ * blkg_stat_read - read the current value of a blkg_stat
+ * @stat: blkg_stat to read
+ *
+ * Read the current value of @stat.  This function can be called without
+ * synchroniztion and takes care of u64 atomicity.
+ */
+static inline uint64_t blkg_stat_read(struct blkg_stat *stat)
+{
+	unsigned int start;
+	uint64_t v;
+
+	do {
+		start = u64_stats_fetch_begin(&stat->syncp);
+		v = stat->cnt;
+	} while (u64_stats_fetch_retry(&stat->syncp, start));
+
+	return v;
+}
+
+/**
+ * blkg_stat_reset - reset a blkg_stat
+ * @stat: blkg_stat to reset
+ */
+static inline void blkg_stat_reset(struct blkg_stat *stat)
+{
+	stat->cnt = 0;
+}
+
+/**
+ * blkg_rwstat_add - add a value to a blkg_rwstat
+ * @rwstat: target blkg_rwstat
+ * @rw: mask of REQ_{WRITE|SYNC}
+ * @val: value to add
+ *
+ * Add @val to @rwstat.  The counters are chosen according to @rw.  The
+ * caller is responsible for synchronizing calls to this function.
+ */
+static inline void blkg_rwstat_add(struct blkg_rwstat *rwstat,
+				   int rw, uint64_t val)
+{
+	u64_stats_update_begin(&rwstat->syncp);
+
+	if (rw & REQ_WRITE)
+		rwstat->cnt[BLKG_RWSTAT_WRITE] += val;
+	else
+		rwstat->cnt[BLKG_RWSTAT_READ] += val;
+	if (rw & REQ_SYNC)
+		rwstat->cnt[BLKG_RWSTAT_SYNC] += val;
+	else
+		rwstat->cnt[BLKG_RWSTAT_ASYNC] += val;
+
+	u64_stats_update_end(&rwstat->syncp);
+}
+
+/**
+ * blkg_rwstat_read - read the current values of a blkg_rwstat
+ * @rwstat: blkg_rwstat to read
+ *
+ * Read the current snapshot of @rwstat and return it as the return value.
+ * This function can be called without synchronization and takes care of
+ * u64 atomicity.
+ */
+static struct blkg_rwstat blkg_rwstat_read(struct blkg_rwstat *rwstat)
+{
+	unsigned int start;
+	struct blkg_rwstat tmp;
+
+	do {
+		start = u64_stats_fetch_begin(&rwstat->syncp);
+		tmp = *rwstat;
+	} while (u64_stats_fetch_retry(&rwstat->syncp, start));
+
+	return tmp;
+}
+
+/**
+ * blkg_rwstat_sum - read the total count of a blkg_rwstat
+ * @rwstat: blkg_rwstat to read
+ *
+ * Return the total count of @rwstat regardless of the IO direction.  This
+ * function can be called without synchronization and takes care of u64
+ * atomicity.
+ */
+static inline uint64_t blkg_rwstat_sum(struct blkg_rwstat *rwstat)
+{
+	struct blkg_rwstat tmp = blkg_rwstat_read(rwstat);
+
+	return tmp.cnt[BLKG_RWSTAT_READ] + tmp.cnt[BLKG_RWSTAT_WRITE];
+}
+
+/**
+ * blkg_rwstat_reset - reset a blkg_rwstat
+ * @rwstat: blkg_rwstat to reset
+ */
+static inline void blkg_rwstat_reset(struct blkg_rwstat *rwstat)
+{
+	memset(rwstat->cnt, 0, sizeof(rwstat->cnt));
+}
+
 #else
 
 struct blkio_group {
-- 
1.7.7.3



* [PATCH 04/21] blkcg: restructure statistics printing
  2012-03-28 22:51 [PATCHSET] block: modularize blkcg config and stat file handling Tejun Heo
                   ` (2 preceding siblings ...)
  2012-03-28 22:51 ` [PATCH 03/21] blkcg: introduce blkg_stat and blkg_rwstat Tejun Heo
@ 2012-03-28 22:51 ` Tejun Heo
  2012-03-28 22:51 ` [PATCH 05/21] blkcg: drop blkiocg_file_write_u64() Tejun Heo
                   ` (19 subsequent siblings)
  23 siblings, 0 replies; 56+ messages in thread
From: Tejun Heo @ 2012-03-28 22:51 UTC (permalink / raw)
  To: axboe; +Cc: vgoyal, ctalbott, rni, linux-kernel, cgroups, containers, Tejun Heo

blkcg stats handling is a mess.  None of the stats has much to do with
blkcg core, yet they are all implemented there.  Code sharing is
achieved by mixing common code with hard-coded cases for each stat
counter.

This patch restructures statistics printing such that

* Common logic lives in helper functions, and the specific print
  functions use those helpers to implement each case.

* Print functions serving multiple counters no longer need hard-coded
  switching on specific counters.

* Printing uses the read_seq_string callback (other methods will be
  phased out).

This change enables further cleanups and relocating stats code to the
policy implementation it belongs to.
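
Concretely, after this patch a stat file boils down to a one-line
print wrapper plus a cftype entry that records the policy and the
counter's offset; the generic walk and formatting live in
blkcg_print_blkgs() and the __blkg_prfill_*() helpers.  This is a
condensed view of what the diff below adds:

  /* the wrapper just unpacks ->private and delegates */
  static int blkcg_print_cpu_rwstat(struct cgroup *cgrp, struct cftype *cft,
                                    struct seq_file *sf)
  {
          struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp);

          blkcg_print_blkgs(sf, blkcg, blkg_prfill_cpu_rwstat,
                            BLKCG_STAT_POL(cft->private),
                            BLKCG_STAT_OFF(cft->private), true);
          return 0;
  }

  /* and the file itself is just policy id + counter offset */
  {
          .name = "io_serviced",
          .private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
                          offsetof(struct blkio_group_stats_cpu, serviced)),
          .read_seq_string = blkcg_print_cpu_rwstat,
  },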

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 block/blk-cgroup.c |  557 ++++++++++++++++++++++------------------------------
 block/blk-cgroup.h |   60 +------
 2 files changed, 243 insertions(+), 374 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 153a2db..f670217 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -753,186 +753,227 @@ blkiocg_reset_stats(struct cgroup *cgroup, struct cftype *cftype, u64 val)
 	return 0;
 }
 
-static void blkio_get_key_name(enum blkg_rwstat_type type, const char *dname,
-			       char *str, int chars_left, bool diskname_only)
-{
-	snprintf(str, chars_left, "%s", dname);
-	chars_left -= strlen(str);
-	if (chars_left <= 0) {
-		printk(KERN_WARNING
-			"Possibly incorrect cgroup stat display format");
-		return;
-	}
-	if (diskname_only)
-		return;
-	switch (type) {
-	case BLKG_RWSTAT_READ:
-		strlcat(str, " Read", chars_left);
-		break;
-	case BLKG_RWSTAT_WRITE:
-		strlcat(str, " Write", chars_left);
-		break;
-	case BLKG_RWSTAT_SYNC:
-		strlcat(str, " Sync", chars_left);
-		break;
-	case BLKG_RWSTAT_ASYNC:
-		strlcat(str, " Async", chars_left);
-		break;
-	case BLKG_RWSTAT_TOTAL:
-		strlcat(str, " Total", chars_left);
-		break;
-	default:
-		strlcat(str, " Invalid", chars_left);
-	}
+static const char *blkg_dev_name(struct blkio_group *blkg)
+{
+	/* some drivers (floppy) instantiate a queue w/o disk registered */
+	if (blkg->q->backing_dev_info.dev)
+		return dev_name(blkg->q->backing_dev_info.dev);
+	return NULL;
 }
 
-static uint64_t blkio_read_stat_cpu(struct blkio_group *blkg, int plid,
-				    enum stat_type_cpu type,
-				    enum blkg_rwstat_type sub_type)
+/**
+ * blkcg_print_blkgs - helper for printing per-blkg data
+ * @sf: seq_file to print to
+ * @blkcg: blkcg of interest
+ * @prfill: fill function to print out a blkg
+ * @pol: policy in question
+ * @data: data to be passed to @prfill
+ * @show_total: to print out sum of prfill return values or not
+ *
+ * This function invokes @prfill on each blkg of @blkcg if pd for the
+ * policy specified by @pol exists.  @prfill is invoked with @sf, the
+ * policy data and @data.  If @show_total is %true, the sum of the return
+ * values from @prfill is printed with "Total" label at the end.
+ *
+ * This is to be used to construct print functions for
+ * cftype->read_seq_string method.
+ */
+static void blkcg_print_blkgs(struct seq_file *sf, struct blkio_cgroup *blkcg,
+			      u64 (*prfill)(struct seq_file *,
+					    struct blkg_policy_data *, int),
+			      int pol, int data, bool show_total)
 {
-	struct blkg_policy_data *pd = blkg->pd[plid];
-	u64 val = 0;
-	int cpu;
+	struct blkio_group *blkg;
+	struct hlist_node *n;
+	u64 total = 0;
 
-	if (pd->stats_cpu == NULL)
-		return val;
+	spin_lock_irq(&blkcg->lock);
+	hlist_for_each_entry(blkg, n, &blkcg->blkg_list, blkcg_node)
+		if (blkg->pd[pol])
+			total += prfill(sf, blkg->pd[pol], data);
+	spin_unlock_irq(&blkcg->lock);
+
+	if (show_total)
+		seq_printf(sf, "Total %llu\n", (unsigned long long)total);
+}
+
+/**
+ * __blkg_prfill_u64 - prfill helper for a single u64 value
+ * @sf: seq_file to print to
+ * @pd: policy data of interest
+ * @v: value to print
+ *
+ * Print @v to @sf for the device assocaited with @pd.
+ */
+static u64 __blkg_prfill_u64(struct seq_file *sf, struct blkg_policy_data *pd,
+			     u64 v)
+{
+	const char *dname = blkg_dev_name(pd->blkg);
+
+	if (!dname)
+		return 0;
+
+	seq_printf(sf, "%s %llu\n", dname, (unsigned long long)v);
+	return v;
+}
+
+/**
+ * __blkg_prfill_rwstat - prfill helper for a blkg_rwstat
+ * @sf: seq_file to print to
+ * @pd: policy data of interest
+ * @rwstat: rwstat to print
+ *
+ * Print @rwstat to @sf for the device assocaited with @pd.
+ */
+static u64 __blkg_prfill_rwstat(struct seq_file *sf,
+				struct blkg_policy_data *pd,
+				const struct blkg_rwstat *rwstat)
+{
+	static const char *rwstr[] = {
+		[BLKG_RWSTAT_READ]	= "Read",
+		[BLKG_RWSTAT_WRITE]	= "Write",
+		[BLKG_RWSTAT_SYNC]	= "Sync",
+		[BLKG_RWSTAT_ASYNC]	= "Async",
+	};
+	const char *dname = blkg_dev_name(pd->blkg);
+	u64 v;
+	int i;
+
+	if (!dname)
+		return 0;
+
+	for (i = 0; i < BLKG_RWSTAT_NR; i++)
+		seq_printf(sf, "%s %s %llu\n", dname, rwstr[i],
+			   (unsigned long long)rwstat->cnt[i]);
+
+	v = rwstat->cnt[BLKG_RWSTAT_READ] + rwstat->cnt[BLKG_RWSTAT_WRITE];
+	seq_printf(sf, "%s Total %llu\n", dname, (unsigned long long)v);
+	return v;
+}
+
+static u64 blkg_prfill_stat(struct seq_file *sf, struct blkg_policy_data *pd,
+			    int off)
+{
+	return __blkg_prfill_u64(sf, pd,
+				 blkg_stat_read((void *)&pd->stats + off));
+}
+
+static u64 blkg_prfill_rwstat(struct seq_file *sf, struct blkg_policy_data *pd,
+			      int off)
+{
+	struct blkg_rwstat rwstat = blkg_rwstat_read((void *)&pd->stats + off);
+
+	return __blkg_prfill_rwstat(sf, pd, &rwstat);
+}
+
+/* print blkg_stat specified by BLKCG_STAT_PRIV() */
+static int blkcg_print_stat(struct cgroup *cgrp, struct cftype *cft,
+			    struct seq_file *sf)
+{
+	struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp);
+
+	blkcg_print_blkgs(sf, blkcg, blkg_prfill_stat,
+			  BLKCG_STAT_POL(cft->private),
+			  BLKCG_STAT_OFF(cft->private), false);
+	return 0;
+}
+
+/* print blkg_rwstat specified by BLKCG_STAT_PRIV() */
+static int blkcg_print_rwstat(struct cgroup *cgrp, struct cftype *cft,
+			      struct seq_file *sf)
+{
+	struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp);
+
+	blkcg_print_blkgs(sf, blkcg, blkg_prfill_rwstat,
+			  BLKCG_STAT_POL(cft->private),
+			  BLKCG_STAT_OFF(cft->private), true);
+	return 0;
+}
+
+static u64 blkg_prfill_cpu_stat(struct seq_file *sf,
+				struct blkg_policy_data *pd, int off)
+{
+	u64 v = 0;
+	int cpu;
 
 	for_each_possible_cpu(cpu) {
-		struct blkio_group_stats_cpu *stats_cpu =
+		struct blkio_group_stats_cpu *sc =
 			per_cpu_ptr(pd->stats_cpu, cpu);
-		struct blkg_rwstat rws;
 
-		switch (type) {
-		case BLKIO_STAT_CPU_SECTORS:
-			val += blkg_stat_read(&stats_cpu->sectors);
-			break;
-		case BLKIO_STAT_CPU_SERVICE_BYTES:
-			rws = blkg_rwstat_read(&stats_cpu->service_bytes);
-			val += rws.cnt[sub_type];
-			break;
-		case BLKIO_STAT_CPU_SERVICED:
-			rws = blkg_rwstat_read(&stats_cpu->serviced);
-			val += rws.cnt[sub_type];
-			break;
-		}
+		v += blkg_stat_read((void *)sc + off);
 	}
 
-	return val;
+	return __blkg_prfill_u64(sf, pd, v);
 }
 
-static uint64_t blkio_get_stat_cpu(struct blkio_group *blkg, int plid,
-				   struct cgroup_map_cb *cb, const char *dname,
-				   enum stat_type_cpu type)
+static u64 blkg_prfill_cpu_rwstat(struct seq_file *sf,
+				  struct blkg_policy_data *pd, int off)
 {
-	uint64_t disk_total, val;
-	char key_str[MAX_KEY_LEN];
-	enum blkg_rwstat_type sub_type;
+	struct blkg_rwstat rwstat = { }, tmp;
+	int i, cpu;
 
-	if (type == BLKIO_STAT_CPU_SECTORS) {
-		val = blkio_read_stat_cpu(blkg, plid, type, 0);
-		blkio_get_key_name(0, dname, key_str, MAX_KEY_LEN, true);
-		cb->fill(cb, key_str, val);
-		return val;
-	}
+	for_each_possible_cpu(cpu) {
+		struct blkio_group_stats_cpu *sc =
+			per_cpu_ptr(pd->stats_cpu, cpu);
 
-	for (sub_type = BLKG_RWSTAT_READ; sub_type < BLKG_RWSTAT_NR;
-			sub_type++) {
-		blkio_get_key_name(sub_type, dname, key_str, MAX_KEY_LEN,
-				   false);
-		val = blkio_read_stat_cpu(blkg, plid, type, sub_type);
-		cb->fill(cb, key_str, val);
+		tmp = blkg_rwstat_read((void *)sc + off);
+		for (i = 0; i < BLKG_RWSTAT_NR; i++)
+			rwstat.cnt[i] += tmp.cnt[i];
 	}
 
-	disk_total = blkio_read_stat_cpu(blkg, plid, type, BLKG_RWSTAT_READ) +
-		blkio_read_stat_cpu(blkg, plid, type, BLKG_RWSTAT_WRITE);
-
-	blkio_get_key_name(BLKG_RWSTAT_TOTAL, dname, key_str, MAX_KEY_LEN,
-			   false);
-	cb->fill(cb, key_str, disk_total);
-	return disk_total;
+	return __blkg_prfill_rwstat(sf, pd, &rwstat);
 }
 
-static uint64_t blkio_get_stat(struct blkio_group *blkg, int plid,
-			       struct cgroup_map_cb *cb, const char *dname,
-			       enum stat_type type)
+/* print per-cpu blkg_stat specified by BLKCG_STAT_PRIV() */
+static int blkcg_print_cpu_stat(struct cgroup *cgrp, struct cftype *cft,
+				struct seq_file *sf)
 {
-	struct blkio_group_stats *stats = &blkg->pd[plid]->stats;
-	uint64_t v = 0, disk_total = 0;
-	char key_str[MAX_KEY_LEN];
-	struct blkg_rwstat rws = { };
-	int st;
+	struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp);
 
-	if (type >= BLKIO_STAT_ARR_NR) {
-		switch (type) {
-		case BLKIO_STAT_TIME:
-			v = blkg_stat_read(&stats->time);
-			break;
-#ifdef CONFIG_DEBUG_BLK_CGROUP
-		case BLKIO_STAT_UNACCOUNTED_TIME:
-			v = blkg_stat_read(&stats->unaccounted_time);
-			break;
-		case BLKIO_STAT_AVG_QUEUE_SIZE: {
-			uint64_t samples;
+	blkcg_print_blkgs(sf, blkcg, blkg_prfill_cpu_stat,
+			  BLKCG_STAT_POL(cft->private),
+			  BLKCG_STAT_OFF(cft->private), false);
+	return 0;
+}
 
-			samples = blkg_stat_read(&stats->avg_queue_size_samples);
-			if (samples) {
-				v = blkg_stat_read(&stats->avg_queue_size_sum);
-				do_div(v, samples);
-			}
-			break;
-		}
-		case BLKIO_STAT_IDLE_TIME:
-			v = blkg_stat_read(&stats->idle_time);
-			break;
-		case BLKIO_STAT_EMPTY_TIME:
-			v = blkg_stat_read(&stats->empty_time);
-			break;
-		case BLKIO_STAT_DEQUEUE:
-			v = blkg_stat_read(&stats->dequeue);
-			break;
-		case BLKIO_STAT_GROUP_WAIT_TIME:
-			v = blkg_stat_read(&stats->group_wait_time);
-			break;
-#endif
-		default:
-			WARN_ON_ONCE(1);
-		}
+/* print per-cpu blkg_rwstat specified by BLKCG_STAT_PRIV() */
+static int blkcg_print_cpu_rwstat(struct cgroup *cgrp, struct cftype *cft,
+				  struct seq_file *sf)
+{
+	struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp);
 
-		blkio_get_key_name(0, dname, key_str, MAX_KEY_LEN, true);
-		cb->fill(cb, key_str, v);
-		return v;
-	}
+	blkcg_print_blkgs(sf, blkcg, blkg_prfill_cpu_rwstat,
+			  BLKCG_STAT_POL(cft->private),
+			  BLKCG_STAT_OFF(cft->private), true);
+	return 0;
+}
 
-	switch (type) {
-	case BLKIO_STAT_MERGED:
-		rws = blkg_rwstat_read(&stats->merged);
-		break;
-	case BLKIO_STAT_SERVICE_TIME:
-		rws = blkg_rwstat_read(&stats->service_time);
-		break;
-	case BLKIO_STAT_WAIT_TIME:
-		rws = blkg_rwstat_read(&stats->wait_time);
-		break;
-	case BLKIO_STAT_QUEUED:
-		rws = blkg_rwstat_read(&stats->queued);
-		break;
-	default:
-		WARN_ON_ONCE(true);
-		break;
-	}
+#ifdef CONFIG_DEBUG_BLK_CGROUP
+static u64 blkg_prfill_avg_queue_size(struct seq_file *sf,
+				      struct blkg_policy_data *pd, int off)
+{
+	u64 samples = blkg_stat_read(&pd->stats.avg_queue_size_samples);
+	u64 v = 0;
 
-	for (st = BLKG_RWSTAT_READ; st < BLKG_RWSTAT_NR; st++) {
-		blkio_get_key_name(st, dname, key_str, MAX_KEY_LEN, false);
-		cb->fill(cb, key_str, rws.cnt[st]);
-		if (st == BLKG_RWSTAT_READ || st == BLKG_RWSTAT_WRITE)
-			disk_total += rws.cnt[st];
+	if (samples) {
+		v = blkg_stat_read(&pd->stats.avg_queue_size_sum);
+		do_div(v, samples);
 	}
+	__blkg_prfill_u64(sf, pd, v);
+	return 0;
+}
+
+/* print avg_queue_size */
+static int blkcg_print_avg_queue_size(struct cgroup *cgrp, struct cftype *cft,
+				      struct seq_file *sf)
+{
+	struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp);
 
-	blkio_get_key_name(BLKG_RWSTAT_TOTAL, dname, key_str, MAX_KEY_LEN,
-			   false);
-	cb->fill(cb, key_str, disk_total);
-	return disk_total;
+	blkcg_print_blkgs(sf, blkcg, blkg_prfill_avg_queue_size,
+			  BLKIO_POLICY_PROP, 0, false);
+	return 0;
 }
+#endif	/* CONFIG_DEBUG_BLK_CGROUP */
 
 static int blkio_policy_parse_and_set(char *buf, enum blkio_policy_id plid,
 				      int fileid, struct blkio_cgroup *blkcg)
@@ -1074,14 +1115,6 @@ static int blkiocg_file_write(struct cgroup *cgrp, struct cftype *cft,
 	return ret;
 }
 
-static const char *blkg_dev_name(struct blkio_group *blkg)
-{
-	/* some drivers (floppy) instantiate a queue w/o disk registered */
-	if (blkg->q->backing_dev_info.dev)
-		return dev_name(blkg->q->backing_dev_info.dev);
-	return NULL;
-}
-
 static void blkio_print_group_conf(struct cftype *cft, struct blkio_group *blkg,
 				   struct seq_file *m)
 {
@@ -1174,116 +1207,6 @@ static int blkiocg_file_read(struct cgroup *cgrp, struct cftype *cft,
 	return 0;
 }
 
-static int blkio_read_blkg_stats(struct blkio_cgroup *blkcg,
-		struct cftype *cft, struct cgroup_map_cb *cb,
-		enum stat_type type, bool show_total, bool pcpu)
-{
-	struct blkio_group *blkg;
-	struct hlist_node *n;
-	uint64_t cgroup_total = 0;
-
-	spin_lock_irq(&blkcg->lock);
-
-	hlist_for_each_entry(blkg, n, &blkcg->blkg_list, blkcg_node) {
-		const char *dname = blkg_dev_name(blkg);
-		int plid = BLKIOFILE_POLICY(cft->private);
-
-		if (!dname)
-			continue;
-		if (pcpu)
-			cgroup_total += blkio_get_stat_cpu(blkg, plid,
-							   cb, dname, type);
-		else
-			cgroup_total += blkio_get_stat(blkg, plid,
-						       cb, dname, type);
-	}
-	if (show_total)
-		cb->fill(cb, "Total", cgroup_total);
-
-	spin_unlock_irq(&blkcg->lock);
-	return 0;
-}
-
-/* All map kind of cgroup file get serviced by this function */
-static int blkiocg_file_read_map(struct cgroup *cgrp, struct cftype *cft,
-				struct cgroup_map_cb *cb)
-{
-	struct blkio_cgroup *blkcg;
-	enum blkio_policy_id plid = BLKIOFILE_POLICY(cft->private);
-	int name = BLKIOFILE_ATTR(cft->private);
-
-	blkcg = cgroup_to_blkio_cgroup(cgrp);
-
-	switch(plid) {
-	case BLKIO_POLICY_PROP:
-		switch(name) {
-		case BLKIO_PROP_time:
-			return blkio_read_blkg_stats(blkcg, cft, cb,
-						BLKIO_STAT_TIME, 0, 0);
-		case BLKIO_PROP_sectors:
-			return blkio_read_blkg_stats(blkcg, cft, cb,
-						BLKIO_STAT_CPU_SECTORS, 0, 1);
-		case BLKIO_PROP_io_service_bytes:
-			return blkio_read_blkg_stats(blkcg, cft, cb,
-					BLKIO_STAT_CPU_SERVICE_BYTES, 1, 1);
-		case BLKIO_PROP_io_serviced:
-			return blkio_read_blkg_stats(blkcg, cft, cb,
-						BLKIO_STAT_CPU_SERVICED, 1, 1);
-		case BLKIO_PROP_io_service_time:
-			return blkio_read_blkg_stats(blkcg, cft, cb,
-						BLKIO_STAT_SERVICE_TIME, 1, 0);
-		case BLKIO_PROP_io_wait_time:
-			return blkio_read_blkg_stats(blkcg, cft, cb,
-						BLKIO_STAT_WAIT_TIME, 1, 0);
-		case BLKIO_PROP_io_merged:
-			return blkio_read_blkg_stats(blkcg, cft, cb,
-						BLKIO_STAT_MERGED, 1, 0);
-		case BLKIO_PROP_io_queued:
-			return blkio_read_blkg_stats(blkcg, cft, cb,
-						BLKIO_STAT_QUEUED, 1, 0);
-#ifdef CONFIG_DEBUG_BLK_CGROUP
-		case BLKIO_PROP_unaccounted_time:
-			return blkio_read_blkg_stats(blkcg, cft, cb,
-					BLKIO_STAT_UNACCOUNTED_TIME, 0, 0);
-		case BLKIO_PROP_dequeue:
-			return blkio_read_blkg_stats(blkcg, cft, cb,
-						BLKIO_STAT_DEQUEUE, 0, 0);
-		case BLKIO_PROP_avg_queue_size:
-			return blkio_read_blkg_stats(blkcg, cft, cb,
-					BLKIO_STAT_AVG_QUEUE_SIZE, 0, 0);
-		case BLKIO_PROP_group_wait_time:
-			return blkio_read_blkg_stats(blkcg, cft, cb,
-					BLKIO_STAT_GROUP_WAIT_TIME, 0, 0);
-		case BLKIO_PROP_idle_time:
-			return blkio_read_blkg_stats(blkcg, cft, cb,
-						BLKIO_STAT_IDLE_TIME, 0, 0);
-		case BLKIO_PROP_empty_time:
-			return blkio_read_blkg_stats(blkcg, cft, cb,
-						BLKIO_STAT_EMPTY_TIME, 0, 0);
-#endif
-		default:
-			BUG();
-		}
-		break;
-	case BLKIO_POLICY_THROTL:
-		switch(name){
-		case BLKIO_THROTL_io_service_bytes:
-			return blkio_read_blkg_stats(blkcg, cft, cb,
-						BLKIO_STAT_CPU_SERVICE_BYTES, 1, 1);
-		case BLKIO_THROTL_io_serviced:
-			return blkio_read_blkg_stats(blkcg, cft, cb,
-						BLKIO_STAT_CPU_SERVICED, 1, 1);
-		default:
-			BUG();
-		}
-		break;
-	default:
-		BUG();
-	}
-
-	return 0;
-}
-
 static int blkio_weight_write(struct blkio_cgroup *blkcg, int plid, u64 val)
 {
 	struct blkio_group *blkg;
@@ -1369,51 +1292,51 @@ struct cftype blkio_files[] = {
 	},
 	{
 		.name = "time",
-		.private = BLKIOFILE_PRIVATE(BLKIO_POLICY_PROP,
-				BLKIO_PROP_time),
-		.read_map = blkiocg_file_read_map,
+		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
+				offsetof(struct blkio_group_stats, time)),
+		.read_seq_string = blkcg_print_stat,
 	},
 	{
 		.name = "sectors",
-		.private = BLKIOFILE_PRIVATE(BLKIO_POLICY_PROP,
-				BLKIO_PROP_sectors),
-		.read_map = blkiocg_file_read_map,
+		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
+				offsetof(struct blkio_group_stats_cpu, sectors)),
+		.read_seq_string = blkcg_print_cpu_stat,
 	},
 	{
 		.name = "io_service_bytes",
-		.private = BLKIOFILE_PRIVATE(BLKIO_POLICY_PROP,
-				BLKIO_PROP_io_service_bytes),
-		.read_map = blkiocg_file_read_map,
+		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
+				offsetof(struct blkio_group_stats_cpu, service_bytes)),
+		.read_seq_string = blkcg_print_cpu_rwstat,
 	},
 	{
 		.name = "io_serviced",
-		.private = BLKIOFILE_PRIVATE(BLKIO_POLICY_PROP,
-				BLKIO_PROP_io_serviced),
-		.read_map = blkiocg_file_read_map,
+		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
+				offsetof(struct blkio_group_stats_cpu, serviced)),
+		.read_seq_string = blkcg_print_cpu_rwstat,
 	},
 	{
 		.name = "io_service_time",
-		.private = BLKIOFILE_PRIVATE(BLKIO_POLICY_PROP,
-				BLKIO_PROP_io_service_time),
-		.read_map = blkiocg_file_read_map,
+		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
+				offsetof(struct blkio_group_stats, service_time)),
+		.read_seq_string = blkcg_print_rwstat,
 	},
 	{
 		.name = "io_wait_time",
-		.private = BLKIOFILE_PRIVATE(BLKIO_POLICY_PROP,
-				BLKIO_PROP_io_wait_time),
-		.read_map = blkiocg_file_read_map,
+		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
+				offsetof(struct blkio_group_stats, wait_time)),
+		.read_seq_string = blkcg_print_rwstat,
 	},
 	{
 		.name = "io_merged",
-		.private = BLKIOFILE_PRIVATE(BLKIO_POLICY_PROP,
-				BLKIO_PROP_io_merged),
-		.read_map = blkiocg_file_read_map,
+		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
+				offsetof(struct blkio_group_stats, merged)),
+		.read_seq_string = blkcg_print_rwstat,
 	},
 	{
 		.name = "io_queued",
-		.private = BLKIOFILE_PRIVATE(BLKIO_POLICY_PROP,
-				BLKIO_PROP_io_queued),
-		.read_map = blkiocg_file_read_map,
+		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
+				offsetof(struct blkio_group_stats, queued)),
+		.read_seq_string = blkcg_print_rwstat,
 	},
 	{
 		.name = "reset_stats",
@@ -1457,54 +1380,52 @@ struct cftype blkio_files[] = {
 	},
 	{
 		.name = "throttle.io_service_bytes",
-		.private = BLKIOFILE_PRIVATE(BLKIO_POLICY_THROTL,
-				BLKIO_THROTL_io_service_bytes),
-		.read_map = blkiocg_file_read_map,
+		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_THROTL,
+				offsetof(struct blkio_group_stats_cpu, service_bytes)),
+		.read_seq_string = blkcg_print_cpu_rwstat,
 	},
 	{
 		.name = "throttle.io_serviced",
-		.private = BLKIOFILE_PRIVATE(BLKIO_POLICY_THROTL,
-				BLKIO_THROTL_io_serviced),
-		.read_map = blkiocg_file_read_map,
+		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_THROTL,
+				offsetof(struct blkio_group_stats_cpu, serviced)),
+		.read_seq_string = blkcg_print_cpu_rwstat,
 	},
 #endif /* CONFIG_BLK_DEV_THROTTLING */
 
 #ifdef CONFIG_DEBUG_BLK_CGROUP
 	{
 		.name = "avg_queue_size",
-		.private = BLKIOFILE_PRIVATE(BLKIO_POLICY_PROP,
-				BLKIO_PROP_avg_queue_size),
-		.read_map = blkiocg_file_read_map,
+		.read_seq_string = blkcg_print_avg_queue_size,
 	},
 	{
 		.name = "group_wait_time",
-		.private = BLKIOFILE_PRIVATE(BLKIO_POLICY_PROP,
-				BLKIO_PROP_group_wait_time),
-		.read_map = blkiocg_file_read_map,
+		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
+				offsetof(struct blkio_group_stats, group_wait_time)),
+		.read_seq_string = blkcg_print_stat,
 	},
 	{
 		.name = "idle_time",
-		.private = BLKIOFILE_PRIVATE(BLKIO_POLICY_PROP,
-				BLKIO_PROP_idle_time),
-		.read_map = blkiocg_file_read_map,
+		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
+				offsetof(struct blkio_group_stats, idle_time)),
+		.read_seq_string = blkcg_print_stat,
 	},
 	{
 		.name = "empty_time",
-		.private = BLKIOFILE_PRIVATE(BLKIO_POLICY_PROP,
-				BLKIO_PROP_empty_time),
-		.read_map = blkiocg_file_read_map,
+		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
+				offsetof(struct blkio_group_stats, empty_time)),
+		.read_seq_string = blkcg_print_stat,
 	},
 	{
 		.name = "dequeue",
-		.private = BLKIOFILE_PRIVATE(BLKIO_POLICY_PROP,
-				BLKIO_PROP_dequeue),
-		.read_map = blkiocg_file_read_map,
+		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
+				offsetof(struct blkio_group_stats, dequeue)),
+		.read_seq_string = blkcg_print_stat,
 	},
 	{
 		.name = "unaccounted_time",
-		.private = BLKIOFILE_PRIVATE(BLKIO_POLICY_PROP,
-				BLKIO_PROP_unaccounted_time),
-		.read_map = blkiocg_file_read_map,
+		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
+				offsetof(struct blkio_group_stats, unaccounted_time)),
+		.read_seq_string = blkcg_print_stat,
 	},
 #endif
 	{ }	/* terminate */
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index 7578df3..7331d79 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -28,46 +28,10 @@ enum blkio_policy_id {
 
 #ifdef CONFIG_BLK_CGROUP
 
-enum stat_type {
-	/* Number of IOs merged */
-	BLKIO_STAT_MERGED,
-	/* Total time spent (in ns) between request dispatch to the driver and
-	 * request completion for IOs doen by this cgroup. This may not be
-	 * accurate when NCQ is turned on. */
-	BLKIO_STAT_SERVICE_TIME,
-	/* Total time spent waiting in scheduler queue in ns */
-	BLKIO_STAT_WAIT_TIME,
-	/* Number of IOs queued up */
-	BLKIO_STAT_QUEUED,
-
-	/* All the single valued stats go below this */
-	BLKIO_STAT_TIME,
-#ifdef CONFIG_DEBUG_BLK_CGROUP
-	/* Time not charged to this cgroup */
-	BLKIO_STAT_UNACCOUNTED_TIME,
-	BLKIO_STAT_AVG_QUEUE_SIZE,
-	BLKIO_STAT_IDLE_TIME,
-	BLKIO_STAT_EMPTY_TIME,
-	BLKIO_STAT_GROUP_WAIT_TIME,
-	BLKIO_STAT_DEQUEUE
-#endif
-};
-
-/* Types lower than this live in stat_arr and have subtypes */
-#define BLKIO_STAT_ARR_NR	(BLKIO_STAT_QUEUED + 1)
-
-/* Per cpu stats */
-enum stat_type_cpu {
-	/* Total bytes transferred */
-	BLKIO_STAT_CPU_SERVICE_BYTES,
-	/* Total IOs serviced, post merge */
-	BLKIO_STAT_CPU_SERVICED,
-
-	/* All the single valued stats go below this */
-	BLKIO_STAT_CPU_SECTORS,
-};
-
-#define BLKIO_STAT_CPU_ARR_NR	(BLKIO_STAT_CPU_SERVICED + 1)
+/* cft->private [un]packing for stat printing */
+#define BLKCG_STAT_PRIV(pol, off)	(((unsigned)(pol) << 16) | (off))
+#define BLKCG_STAT_POL(prv)		((unsigned)(prv) >> 16)
+#define BLKCG_STAT_OFF(prv)		((unsigned)(prv) & 0xffff)
 
 enum blkg_rwstat_type {
 	BLKG_RWSTAT_READ,
@@ -90,20 +54,6 @@ enum blkg_state_flags {
 enum blkcg_file_name_prop {
 	BLKIO_PROP_weight = 1,
 	BLKIO_PROP_weight_device,
-	BLKIO_PROP_io_service_bytes,
-	BLKIO_PROP_io_serviced,
-	BLKIO_PROP_time,
-	BLKIO_PROP_sectors,
-	BLKIO_PROP_unaccounted_time,
-	BLKIO_PROP_io_service_time,
-	BLKIO_PROP_io_wait_time,
-	BLKIO_PROP_io_merged,
-	BLKIO_PROP_io_queued,
-	BLKIO_PROP_avg_queue_size,
-	BLKIO_PROP_group_wait_time,
-	BLKIO_PROP_idle_time,
-	BLKIO_PROP_empty_time,
-	BLKIO_PROP_dequeue,
 };
 
 /* cgroup files owned by throttle policy */
@@ -112,8 +62,6 @@ enum blkcg_file_name_throtl {
 	BLKIO_THROTL_write_bps_device,
 	BLKIO_THROTL_read_iops_device,
 	BLKIO_THROTL_write_iops_device,
-	BLKIO_THROTL_io_service_bytes,
-	BLKIO_THROTL_io_serviced,
 };
 
 struct blkio_cgroup {
-- 
1.7.7.3


* [PATCH 05/21] blkcg: drop blkiocg_file_write_u64()
  2012-03-28 22:51 [PATCHSET] block: modularize blkcg config and stat file handling Tejun Heo
                   ` (3 preceding siblings ...)
  2012-03-28 22:51 ` [PATCH 04/21] blkcg: restructure statistics printing Tejun Heo
@ 2012-03-28 22:51 ` Tejun Heo
  2012-03-28 22:51 ` [PATCH 06/21] blkcg: restructure configuration printing Tejun Heo
                   ` (18 subsequent siblings)
  23 siblings, 0 replies; 56+ messages in thread
From: Tejun Heo @ 2012-03-28 22:51 UTC (permalink / raw)
  To: axboe; +Cc: vgoyal, ctalbott, rni, linux-kernel, cgroups, containers, Tejun Heo

blkiocg_file_write_u64() has a single switch case.  Drop
blkiocg_file_write_u64(), rename blkio_weight_write() to
blkcg_set_weight() and use it directly as the .write_u64 callback.
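
For reference, with a .write_u64 callback the cgroup core parses the
written string into a u64 before calling the handler, so a per-file
setter takes roughly the following shape (illustrative sketch only,
the example_* name is made up; the real blkcg_set_weight() is in the
hunk below):

	/* sketch of a .write_u64 handler; the cgroup core has already
	 * converted the user's write into @val */
	static int example_set_weight(struct cgroup *cgrp,
				      struct cftype *cft, u64 val)
	{
		if (val < BLKIO_WEIGHT_MIN || val > BLKIO_WEIGHT_MAX)
			return -EINVAL;
		/* apply @val to the blkcg behind @cgrp */
		return 0;
	}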

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 block/blk-cgroup.c |   35 +++++++----------------------------
 1 files changed, 7 insertions(+), 28 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index f670217..ae539d3 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1207,8 +1207,9 @@ static int blkiocg_file_read(struct cgroup *cgrp, struct cftype *cft,
 	return 0;
 }
 
-static int blkio_weight_write(struct blkio_cgroup *blkcg, int plid, u64 val)
+static int blkcg_set_weight(struct cgroup *cgrp, struct cftype *cft, u64 val)
 {
+	struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp);
 	struct blkio_group *blkg;
 	struct hlist_node *n;
 
@@ -1220,10 +1221,11 @@ static int blkio_weight_write(struct blkio_cgroup *blkcg, int plid, u64 val)
 	blkcg->weight = (unsigned int)val;
 
 	hlist_for_each_entry(blkg, n, &blkcg->blkg_list, blkcg_node) {
-		struct blkg_policy_data *pd = blkg->pd[plid];
+		struct blkg_policy_data *pd = blkg->pd[BLKIO_POLICY_PROP];
 
-		if (!pd->conf.weight)
-			blkio_update_group_weight(blkg, plid, blkcg->weight);
+		if (pd && !pd->conf.weight)
+			blkio_update_group_weight(blkg, BLKIO_POLICY_PROP,
+						  blkcg->weight);
 	}
 
 	spin_unlock_irq(&blkcg->lock);
@@ -1251,29 +1253,6 @@ static u64 blkiocg_file_read_u64 (struct cgroup *cgrp, struct cftype *cft) {
 	return 0;
 }
 
-static int
-blkiocg_file_write_u64(struct cgroup *cgrp, struct cftype *cft, u64 val)
-{
-	struct blkio_cgroup *blkcg;
-	enum blkio_policy_id plid = BLKIOFILE_POLICY(cft->private);
-	int name = BLKIOFILE_ATTR(cft->private);
-
-	blkcg = cgroup_to_blkio_cgroup(cgrp);
-
-	switch(plid) {
-	case BLKIO_POLICY_PROP:
-		switch(name) {
-		case BLKIO_PROP_weight:
-			return blkio_weight_write(blkcg, plid, val);
-		}
-		break;
-	default:
-		BUG();
-	}
-
-	return 0;
-}
-
 struct cftype blkio_files[] = {
 	{
 		.name = "weight_device",
@@ -1288,7 +1267,7 @@ struct cftype blkio_files[] = {
 		.private = BLKIOFILE_PRIVATE(BLKIO_POLICY_PROP,
 				BLKIO_PROP_weight),
 		.read_u64 = blkiocg_file_read_u64,
-		.write_u64 = blkiocg_file_write_u64,
+		.write_u64 = blkcg_set_weight,
 	},
 	{
 		.name = "time",
-- 
1.7.7.3


* [PATCH 06/21] blkcg: restructure configuration printing
  2012-03-28 22:51 [PATCHSET] block: modularize blkcg config and stat file handling Tejun Heo
                   ` (4 preceding siblings ...)
  2012-03-28 22:51 ` [PATCH 05/21] blkcg: drop blkiocg_file_write_u64() Tejun Heo
@ 2012-03-28 22:51 ` Tejun Heo
  2012-03-28 22:51 ` [PATCH 07/21] blkcg: restructure blkio_group configuration setting Tejun Heo
                   ` (17 subsequent siblings)
  23 siblings, 0 replies; 56+ messages in thread
From: Tejun Heo @ 2012-03-28 22:51 UTC (permalink / raw)
  To: axboe; +Cc: vgoyal, ctalbott, rni, linux-kernel, cgroups, containers, Tejun Heo

Similarly to the previous stat restructuring, this patch restructures
the conf printing code such that:

* Conf printing uses the same helpers as stat.

* The printing function doesn't require hardcoded switching on the
  config being printed.  Note that this isn't complete yet for
  throttle confs.  The next patch will convert the setters for these
  confs and complete the transition.

* Printing uses read_seq_string callback (other methods will be phased
  out).

Note that blkio_group_conf.iops[2] is changed to u64 so that the iops
and bps fields can be manipulated with the same functions.  This is
transitional and will go away later.

After this patch, per-device configurations - weight, bps and iops -
are printed with __blkg_prfill_u64(), which uses a space as the
delimiter instead of a tab.
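
Concretely, the pattern is a per-blkg "prfill" callback driven by
blkcg_print_blkgs().  A minimal sketch (the example_* names are made
up, the helpers are the ones used in this series):

	/* sketch: emit one "MAJ:MIN <value>" line per group, skipping
	 * devices with no value configured */
	static u64 example_prfill_weight(struct seq_file *sf,
					 struct blkg_policy_data *pd, int off)
	{
		if (!pd->conf.weight)
			return 0;
		return __blkg_prfill_u64(sf, pd, pd->conf.weight);
	}

	static int example_print_weight_device(struct cgroup *cgrp,
					       struct cftype *cft,
					       struct seq_file *sf)
	{
		blkcg_print_blkgs(sf, cgroup_to_blkio_cgroup(cgrp),
				  example_prfill_weight, BLKIO_POLICY_PROP,
				  0, false);
		return 0;
	}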

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 block/blk-cgroup.c |  156 ++++++++++++++++++----------------------------------
 block/blk-cgroup.h |    3 +-
 2 files changed, 55 insertions(+), 104 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index ae539d3..5e8a818 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1115,95 +1115,28 @@ static int blkiocg_file_write(struct cgroup *cgrp, struct cftype *cft,
 	return ret;
 }
 
-static void blkio_print_group_conf(struct cftype *cft, struct blkio_group *blkg,
-				   struct seq_file *m)
+/* for propio conf */
+static u64 blkg_prfill_weight_device(struct seq_file *sf,
+				     struct blkg_policy_data *pd, int off)
 {
-	int plid = BLKIOFILE_POLICY(cft->private);
-	int fileid = BLKIOFILE_ATTR(cft->private);
-	struct blkg_policy_data *pd = blkg->pd[plid];
-	const char *dname = blkg_dev_name(blkg);
-	int rw = WRITE;
-
-	if (!dname)
-		return;
-
-	switch (plid) {
-		case BLKIO_POLICY_PROP:
-			if (pd->conf.weight)
-				seq_printf(m, "%s\t%u\n",
-					   dname, pd->conf.weight);
-			break;
-		case BLKIO_POLICY_THROTL:
-			switch (fileid) {
-			case BLKIO_THROTL_read_bps_device:
-				rw = READ;
-			case BLKIO_THROTL_write_bps_device:
-				if (pd->conf.bps[rw])
-					seq_printf(m, "%s\t%llu\n",
-						   dname, pd->conf.bps[rw]);
-				break;
-			case BLKIO_THROTL_read_iops_device:
-				rw = READ;
-			case BLKIO_THROTL_write_iops_device:
-				if (pd->conf.iops[rw])
-					seq_printf(m, "%s\t%u\n",
-						   dname, pd->conf.iops[rw]);
-				break;
-			}
-			break;
-		default:
-			BUG();
-	}
+	if (!pd->conf.weight)
+		return 0;
+	return __blkg_prfill_u64(sf, pd, pd->conf.weight);
 }
 
-/* cgroup files which read their data from policy nodes end up here */
-static void blkio_read_conf(struct cftype *cft, struct blkio_cgroup *blkcg,
-			    struct seq_file *m)
+static int blkcg_print_weight_device(struct cgroup *cgrp, struct cftype *cft,
+				     struct seq_file *sf)
 {
-	struct blkio_group *blkg;
-	struct hlist_node *n;
-
-	spin_lock_irq(&blkcg->lock);
-	hlist_for_each_entry(blkg, n, &blkcg->blkg_list, blkcg_node)
-		blkio_print_group_conf(cft, blkg, m);
-	spin_unlock_irq(&blkcg->lock);
+	blkcg_print_blkgs(sf, cgroup_to_blkio_cgroup(cgrp),
+			  blkg_prfill_weight_device, BLKIO_POLICY_PROP, 0,
+			  false);
+	return 0;
 }
 
-static int blkiocg_file_read(struct cgroup *cgrp, struct cftype *cft,
-				struct seq_file *m)
+static int blkcg_print_weight(struct cgroup *cgrp, struct cftype *cft,
+			      struct seq_file *sf)
 {
-	struct blkio_cgroup *blkcg;
-	enum blkio_policy_id plid = BLKIOFILE_POLICY(cft->private);
-	int name = BLKIOFILE_ATTR(cft->private);
-
-	blkcg = cgroup_to_blkio_cgroup(cgrp);
-
-	switch(plid) {
-	case BLKIO_POLICY_PROP:
-		switch(name) {
-		case BLKIO_PROP_weight_device:
-			blkio_read_conf(cft, blkcg, m);
-			return 0;
-		default:
-			BUG();
-		}
-		break;
-	case BLKIO_POLICY_THROTL:
-		switch(name){
-		case BLKIO_THROTL_read_bps_device:
-		case BLKIO_THROTL_write_bps_device:
-		case BLKIO_THROTL_read_iops_device:
-		case BLKIO_THROTL_write_iops_device:
-			blkio_read_conf(cft, blkcg, m);
-			return 0;
-		default:
-			BUG();
-		}
-		break;
-	default:
-		BUG();
-	}
-
+	seq_printf(sf, "%u\n", cgroup_to_blkio_cgroup(cgrp)->weight);
 	return 0;
 }
 
@@ -1233,40 +1166,59 @@ static int blkcg_set_weight(struct cgroup *cgrp, struct cftype *cft, u64 val)
 	return 0;
 }
 
-static u64 blkiocg_file_read_u64 (struct cgroup *cgrp, struct cftype *cft) {
-	struct blkio_cgroup *blkcg;
-	enum blkio_policy_id plid = BLKIOFILE_POLICY(cft->private);
-	int name = BLKIOFILE_ATTR(cft->private);
+/* for blk-throttle conf */
+#ifdef CONFIG_BLK_DEV_THROTTLING
+static u64 blkg_prfill_conf_u64(struct seq_file *sf,
+				struct blkg_policy_data *pd, int off)
+{
+	u64 v = *(u64 *)((void *)&pd->conf + off);
 
-	blkcg = cgroup_to_blkio_cgroup(cgrp);
+	if (!v)
+		return 0;
+	return __blkg_prfill_u64(sf, pd, v);
+}
 
-	switch(plid) {
-	case BLKIO_POLICY_PROP:
-		switch(name) {
-		case BLKIO_PROP_weight:
-			return (u64)blkcg->weight;
-		}
+static int blkcg_print_conf_u64(struct cgroup *cgrp, struct cftype *cft,
+				struct seq_file *sf)
+{
+	int off;
+
+	switch (BLKIOFILE_ATTR(cft->private)) {
+	case BLKIO_THROTL_read_bps_device:
+		off = offsetof(struct blkio_group_conf, bps[READ]);
+		break;
+	case BLKIO_THROTL_write_bps_device:
+		off = offsetof(struct blkio_group_conf, bps[WRITE]);
+		break;
+	case BLKIO_THROTL_read_iops_device:
+		off = offsetof(struct blkio_group_conf, iops[READ]);
+		break;
+	case BLKIO_THROTL_write_iops_device:
+		off = offsetof(struct blkio_group_conf, iops[WRITE]);
 		break;
 	default:
-		BUG();
+		return -EINVAL;
 	}
+
+	blkcg_print_blkgs(sf, cgroup_to_blkio_cgroup(cgrp),
+			  blkg_prfill_conf_u64, BLKIO_POLICY_THROTL,
+			  off, false);
 	return 0;
 }
+#endif
 
 struct cftype blkio_files[] = {
 	{
 		.name = "weight_device",
 		.private = BLKIOFILE_PRIVATE(BLKIO_POLICY_PROP,
 				BLKIO_PROP_weight_device),
-		.read_seq_string = blkiocg_file_read,
+		.read_seq_string = blkcg_print_weight_device,
 		.write_string = blkiocg_file_write,
 		.max_write_len = 256,
 	},
 	{
 		.name = "weight",
-		.private = BLKIOFILE_PRIVATE(BLKIO_POLICY_PROP,
-				BLKIO_PROP_weight),
-		.read_u64 = blkiocg_file_read_u64,
+		.read_seq_string = blkcg_print_weight,
 		.write_u64 = blkcg_set_weight,
 	},
 	{
@@ -1326,7 +1278,7 @@ struct cftype blkio_files[] = {
 		.name = "throttle.read_bps_device",
 		.private = BLKIOFILE_PRIVATE(BLKIO_POLICY_THROTL,
 				BLKIO_THROTL_read_bps_device),
-		.read_seq_string = blkiocg_file_read,
+		.read_seq_string = blkcg_print_conf_u64,
 		.write_string = blkiocg_file_write,
 		.max_write_len = 256,
 	},
@@ -1335,7 +1287,7 @@ struct cftype blkio_files[] = {
 		.name = "throttle.write_bps_device",
 		.private = BLKIOFILE_PRIVATE(BLKIO_POLICY_THROTL,
 				BLKIO_THROTL_write_bps_device),
-		.read_seq_string = blkiocg_file_read,
+		.read_seq_string = blkcg_print_conf_u64,
 		.write_string = blkiocg_file_write,
 		.max_write_len = 256,
 	},
@@ -1344,7 +1296,7 @@ struct cftype blkio_files[] = {
 		.name = "throttle.read_iops_device",
 		.private = BLKIOFILE_PRIVATE(BLKIO_POLICY_THROTL,
 				BLKIO_THROTL_read_iops_device),
-		.read_seq_string = blkiocg_file_read,
+		.read_seq_string = blkcg_print_conf_u64,
 		.write_string = blkiocg_file_write,
 		.max_write_len = 256,
 	},
@@ -1353,7 +1305,7 @@ struct cftype blkio_files[] = {
 		.name = "throttle.write_iops_device",
 		.private = BLKIOFILE_PRIVATE(BLKIO_POLICY_THROTL,
 				BLKIO_THROTL_write_iops_device),
-		.read_seq_string = blkiocg_file_read,
+		.read_seq_string = blkcg_print_conf_u64,
 		.write_string = blkiocg_file_write,
 		.max_write_len = 256,
 	},
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index 7331d79..b67eefa 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -52,7 +52,6 @@ enum blkg_state_flags {
 
 /* cgroup files owned by proportional weight policy */
 enum blkcg_file_name_prop {
-	BLKIO_PROP_weight = 1,
 	BLKIO_PROP_weight_device,
 };
 
@@ -130,7 +129,7 @@ struct blkio_group_stats_cpu {
 
 struct blkio_group_conf {
 	unsigned int weight;
-	unsigned int iops[2];
+	u64 iops[2];
 	u64 bps[2];
 };
 
-- 
1.7.7.3


* [PATCH 07/21] blkcg: restructure blkio_group configuration setting
  2012-03-28 22:51 [PATCHSET] block: modularize blkcg config and stat file handling Tejun Heo
                   ` (5 preceding siblings ...)
  2012-03-28 22:51 ` [PATCH 06/21] blkcg: restructure configuration printing Tejun Heo
@ 2012-03-28 22:51 ` Tejun Heo
  2012-03-28 22:51 ` [PATCH 08/21] blkcg: blkg_conf_prep() Tejun Heo
                   ` (16 subsequent siblings)
  23 siblings, 0 replies; 56+ messages in thread
From: Tejun Heo @ 2012-03-28 22:51 UTC (permalink / raw)
  To: axboe; +Cc: vgoyal, ctalbott, rni, linux-kernel, cgroups, containers, Tejun Heo

As part of the userland interface restructuring, this patch updates
per-blkio_group configuration setting.  Instead of funneling
everything through a master function with hard-coded cases for each
config file it may handle, the common part is factored into
blkg_conf_prep() and blkg_conf_finish(), and the individual
configuration setters are implemented using these helpers.

While this doesn't result in an immediate LOC reduction, it enables
further cleanups and a more modular implementation.
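
With the helpers, every setter ends up with the same shape.  A
minimal sketch (error handling trimmed, the example_* name is made
up):

	static int example_set_conf(struct cgroup *cgrp, struct cftype *cft,
				    const char *buf)
	{
		struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp);
		struct blkg_conf_ctx ctx;
		int ret;

		/* parse "MAJ:MIN VAL" and look up the target blkg */
		ret = blkg_conf_prep(blkcg, buf, &ctx);
		if (ret)
			return ret;

		/* ctx.blkg and ctx.v are valid here, under rcu_read_lock() */
		/* ... apply ctx.v to this policy's data for ctx.blkg ... */

		blkg_conf_finish(&ctx);	/* drops the RCU lock, puts the disk */
		return 0;
	}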

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 block/blk-cgroup.c |  274 ++++++++++++++++++++++++++++------------------------
 block/blk-cgroup.h |   13 ---
 2 files changed, 147 insertions(+), 140 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 5e8a818..0d4f21e 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -43,12 +43,6 @@ EXPORT_SYMBOL_GPL(blkio_root_cgroup);
 
 static struct blkio_policy_type *blkio_policy[BLKIO_NR_POLICIES];
 
-/* for encoding cft->private value on file */
-#define BLKIOFILE_PRIVATE(x, val)	(((x) << 16) | (val))
-/* What policy owns the file, proportional or throttle */
-#define BLKIOFILE_POLICY(val)		(((val) >> 16) & 0xffff)
-#define BLKIOFILE_ATTR(val)		((val) & 0xffff)
-
 struct blkio_cgroup *cgroup_to_blkio_cgroup(struct cgroup *cgroup)
 {
 	return container_of(cgroup_subsys_state(cgroup, blkio_subsys_id),
@@ -86,7 +80,7 @@ static inline void blkio_update_group_weight(struct blkio_group *blkg,
 }
 
 static inline void blkio_update_group_bps(struct blkio_group *blkg, int plid,
-					  u64 bps, int fileid)
+					  u64 bps, int rw)
 {
 	struct blkio_policy_type *blkiop;
 
@@ -96,21 +90,18 @@ static inline void blkio_update_group_bps(struct blkio_group *blkg, int plid,
 		if (blkiop->plid != plid)
 			continue;
 
-		if (fileid == BLKIO_THROTL_read_bps_device
-		    && blkiop->ops.blkio_update_group_read_bps_fn)
+		if (rw == READ && blkiop->ops.blkio_update_group_read_bps_fn)
 			blkiop->ops.blkio_update_group_read_bps_fn(blkg->q,
 								blkg, bps);
 
-		if (fileid == BLKIO_THROTL_write_bps_device
-		    && blkiop->ops.blkio_update_group_write_bps_fn)
+		if (rw == WRITE && blkiop->ops.blkio_update_group_write_bps_fn)
 			blkiop->ops.blkio_update_group_write_bps_fn(blkg->q,
 								blkg, bps);
 	}
 }
 
-static inline void blkio_update_group_iops(struct blkio_group *blkg,
-					   int plid, unsigned int iops,
-					   int fileid)
+static inline void blkio_update_group_iops(struct blkio_group *blkg, int plid,
+					   u64 iops, int rw)
 {
 	struct blkio_policy_type *blkiop;
 
@@ -120,13 +111,11 @@ static inline void blkio_update_group_iops(struct blkio_group *blkg,
 		if (blkiop->plid != plid)
 			continue;
 
-		if (fileid == BLKIO_THROTL_read_iops_device
-		    && blkiop->ops.blkio_update_group_read_iops_fn)
+		if (rw == READ && blkiop->ops.blkio_update_group_read_iops_fn)
 			blkiop->ops.blkio_update_group_read_iops_fn(blkg->q,
 								blkg, iops);
 
-		if (fileid == BLKIO_THROTL_write_iops_device
-		    && blkiop->ops.blkio_update_group_write_iops_fn)
+		if (rw == WRITE && blkiop->ops.blkio_update_group_write_iops_fn)
 			blkiop->ops.blkio_update_group_write_iops_fn(blkg->q,
 								blkg,iops);
 	}
@@ -975,19 +964,40 @@ static int blkcg_print_avg_queue_size(struct cgroup *cgrp, struct cftype *cft,
 }
 #endif	/* CONFIG_DEBUG_BLK_CGROUP */
 
-static int blkio_policy_parse_and_set(char *buf, enum blkio_policy_id plid,
-				      int fileid, struct blkio_cgroup *blkcg)
+struct blkg_conf_ctx {
+	struct gendisk		*disk;
+	struct blkio_group	*blkg;
+	u64			v;
+};
+
+/**
+ * blkg_conf_prep - parse and prepare for per-blkg config update
+ * @blkcg: target block cgroup
+ * @input: input string
+ * @ctx: blkg_conf_ctx to be filled
+ *
+ * Parse per-blkg config update from @input and initialize @ctx with the
+ * result.  @ctx->blkg points to the blkg to be updated and @ctx->v the new
+ * value.  This function returns with RCU read locked and must be paired
+ * with blkg_conf_finish().
+ */
+static int blkg_conf_prep(struct blkio_cgroup *blkcg, const char *input,
+			  struct blkg_conf_ctx *ctx)
+	__acquires(rcu)
 {
-	struct gendisk *disk = NULL;
-	struct blkio_group *blkg = NULL;
-	struct blkg_policy_data *pd;
-	char *s[4], *p, *major_s = NULL, *minor_s = NULL;
+	struct gendisk *disk;
+	struct blkio_group *blkg;
+	char *buf, *s[4], *p, *major_s, *minor_s;
 	unsigned long major, minor;
 	int i = 0, ret = -EINVAL;
 	int part;
 	dev_t dev;
 	u64 temp;
 
+	buf = kstrdup(input, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
 	memset(s, 0, sizeof(s));
 
 	while ((p = strsep(&buf, " ")) != NULL) {
@@ -1037,82 +1047,42 @@ static int blkio_policy_parse_and_set(char *buf, enum blkio_policy_id plid,
 
 	if (IS_ERR(blkg)) {
 		ret = PTR_ERR(blkg);
-		goto out_unlock;
-	}
-
-	pd = blkg->pd[plid];
-
-	switch (plid) {
-	case BLKIO_POLICY_PROP:
-		if ((temp < BLKIO_WEIGHT_MIN && temp > 0) ||
-		     temp > BLKIO_WEIGHT_MAX)
-			goto out_unlock;
-
-		pd->conf.weight = temp;
-		blkio_update_group_weight(blkg, plid, temp ?: blkcg->weight);
-		break;
-	case BLKIO_POLICY_THROTL:
-		switch(fileid) {
-		case BLKIO_THROTL_read_bps_device:
-			pd->conf.bps[READ] = temp;
-			blkio_update_group_bps(blkg, plid, temp ?: -1, fileid);
-			break;
-		case BLKIO_THROTL_write_bps_device:
-			pd->conf.bps[WRITE] = temp;
-			blkio_update_group_bps(blkg, plid, temp ?: -1, fileid);
-			break;
-		case BLKIO_THROTL_read_iops_device:
-			if (temp > THROTL_IOPS_MAX)
-				goto out_unlock;
-			pd->conf.iops[READ] = temp;
-			blkio_update_group_iops(blkg, plid, temp ?: -1, fileid);
-			break;
-		case BLKIO_THROTL_write_iops_device:
-			if (temp > THROTL_IOPS_MAX)
-				goto out_unlock;
-			pd->conf.iops[WRITE] = temp;
-			blkio_update_group_iops(blkg, plid, temp ?: -1, fileid);
-			break;
+		rcu_read_unlock();
+		put_disk(disk);
+		/*
+		 * If queue was bypassing, we should retry.  Do so after a
+		 * short msleep().  It isn't strictly necessary but queue
+		 * can be bypassing for some time and it's always nice to
+		 * avoid busy looping.
+		 */
+		if (ret == -EBUSY) {
+			msleep(10);
+			ret = restart_syscall();
 		}
-		break;
-	default:
-		BUG();
+		goto out;
 	}
+
+	ctx->disk = disk;
+	ctx->blkg = blkg;
+	ctx->v = temp;
 	ret = 0;
-out_unlock:
-	rcu_read_unlock();
 out:
-	put_disk(disk);
-
-	/*
-	 * If queue was bypassing, we should retry.  Do so after a short
-	 * msleep().  It isn't strictly necessary but queue can be
-	 * bypassing for some time and it's always nice to avoid busy
-	 * looping.
-	 */
-	if (ret == -EBUSY) {
-		msleep(10);
-		return restart_syscall();
-	}
+	kfree(buf);
 	return ret;
 }
 
-static int blkiocg_file_write(struct cgroup *cgrp, struct cftype *cft,
- 				       const char *buffer)
+/**
+ * blkg_conf_finish - finish up per-blkg config update
+ * @ctx: blkg_conf_ctx intiailized by blkg_conf_prep()
+ *
+ * Finish up after per-blkg config update.  This function must be paired
+ * with blkg_conf_prep().
+ */
+static void blkg_conf_finish(struct blkg_conf_ctx *ctx)
+	__releases(rcu)
 {
-	int ret = 0;
-	char *buf;
-	struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp);
-	enum blkio_policy_id plid = BLKIOFILE_POLICY(cft->private);
-	int fileid = BLKIOFILE_ATTR(cft->private);
-
-	buf = kstrdup(buffer, GFP_KERNEL);
-	if (!buf)
-		return -ENOMEM;
-
-	ret = blkio_policy_parse_and_set(buf, plid, fileid, blkcg);
-	kfree(buf);
-	return ret;
+	rcu_read_unlock();
+	put_disk(ctx->disk);
 }
 
 /* for propio conf */
@@ -1140,6 +1110,32 @@ static int blkcg_print_weight(struct cgroup *cgrp, struct cftype *cft,
 	return 0;
 }
 
+static int blkcg_set_weight_device(struct cgroup *cgrp, struct cftype *cft,
+				   const char *buf)
+{
+	struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp);
+	struct blkg_policy_data *pd;
+	struct blkg_conf_ctx ctx;
+	int ret;
+
+	ret = blkg_conf_prep(blkcg, buf, &ctx);
+	if (ret)
+		return ret;
+
+	ret = -EINVAL;
+	pd = ctx.blkg->pd[BLKIO_POLICY_PROP];
+	if (pd && (!ctx.v || (ctx.v >= BLKIO_WEIGHT_MIN &&
+			      ctx.v <= BLKIO_WEIGHT_MAX))) {
+		pd->conf.weight = ctx.v;
+		blkio_update_group_weight(ctx.blkg, BLKIO_POLICY_PROP,
+					  ctx.v ?: blkcg->weight);
+		ret = 0;
+	}
+
+	blkg_conf_finish(&ctx);
+	return ret;
+}
+
 static int blkcg_set_weight(struct cgroup *cgrp, struct cftype *cft, u64 val)
 {
 	struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp);
@@ -1181,39 +1177,67 @@ static u64 blkg_prfill_conf_u64(struct seq_file *sf,
 static int blkcg_print_conf_u64(struct cgroup *cgrp, struct cftype *cft,
 				struct seq_file *sf)
 {
-	int off;
-
-	switch (BLKIOFILE_ATTR(cft->private)) {
-	case BLKIO_THROTL_read_bps_device:
-		off = offsetof(struct blkio_group_conf, bps[READ]);
-		break;
-	case BLKIO_THROTL_write_bps_device:
-		off = offsetof(struct blkio_group_conf, bps[WRITE]);
-		break;
-	case BLKIO_THROTL_read_iops_device:
-		off = offsetof(struct blkio_group_conf, iops[READ]);
-		break;
-	case BLKIO_THROTL_write_iops_device:
-		off = offsetof(struct blkio_group_conf, iops[WRITE]);
-		break;
-	default:
-		return -EINVAL;
-	}
-
 	blkcg_print_blkgs(sf, cgroup_to_blkio_cgroup(cgrp),
 			  blkg_prfill_conf_u64, BLKIO_POLICY_THROTL,
-			  off, false);
+			  cft->private, false);
 	return 0;
 }
+
+static int blkcg_set_conf_u64(struct cgroup *cgrp, struct cftype *cft,
+			      const char *buf, int rw,
+			      void (*update)(struct blkio_group *, int, u64, int))
+{
+	struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp);
+	struct blkg_policy_data *pd;
+	struct blkg_conf_ctx ctx;
+	int ret;
+
+	ret = blkg_conf_prep(blkcg, buf, &ctx);
+	if (ret)
+		return ret;
+
+	ret = -EINVAL;
+	pd = ctx.blkg->pd[BLKIO_POLICY_THROTL];
+	if (pd) {
+		*(u64 *)((void *)&pd->conf + cft->private) = ctx.v;
+		update(ctx.blkg, BLKIO_POLICY_THROTL, ctx.v ?: -1, rw);
+		ret = 0;
+	}
+
+	blkg_conf_finish(&ctx);
+	return ret;
+}
+
+static int blkcg_set_conf_bps_r(struct cgroup *cgrp, struct cftype *cft,
+				const char *buf)
+{
+	return blkcg_set_conf_u64(cgrp, cft, buf, READ, blkio_update_group_bps);
+}
+
+static int blkcg_set_conf_bps_w(struct cgroup *cgrp, struct cftype *cft,
+				const char *buf)
+{
+	return blkcg_set_conf_u64(cgrp, cft, buf, WRITE, blkio_update_group_bps);
+}
+
+static int blkcg_set_conf_iops_r(struct cgroup *cgrp, struct cftype *cft,
+				 const char *buf)
+{
+	return blkcg_set_conf_u64(cgrp, cft, buf, READ, blkio_update_group_iops);
+}
+
+static int blkcg_set_conf_iops_w(struct cgroup *cgrp, struct cftype *cft,
+				 const char *buf)
+{
+	return blkcg_set_conf_u64(cgrp, cft, buf, WRITE, blkio_update_group_iops);
+}
 #endif
 
 struct cftype blkio_files[] = {
 	{
 		.name = "weight_device",
-		.private = BLKIOFILE_PRIVATE(BLKIO_POLICY_PROP,
-				BLKIO_PROP_weight_device),
 		.read_seq_string = blkcg_print_weight_device,
-		.write_string = blkiocg_file_write,
+		.write_string = blkcg_set_weight_device,
 		.max_write_len = 256,
 	},
 	{
@@ -1276,37 +1300,33 @@ struct cftype blkio_files[] = {
 #ifdef CONFIG_BLK_DEV_THROTTLING
 	{
 		.name = "throttle.read_bps_device",
-		.private = BLKIOFILE_PRIVATE(BLKIO_POLICY_THROTL,
-				BLKIO_THROTL_read_bps_device),
+		.private = offsetof(struct blkio_group_conf, bps[READ]),
 		.read_seq_string = blkcg_print_conf_u64,
-		.write_string = blkiocg_file_write,
+		.write_string = blkcg_set_conf_bps_r,
 		.max_write_len = 256,
 	},
 
 	{
 		.name = "throttle.write_bps_device",
-		.private = BLKIOFILE_PRIVATE(BLKIO_POLICY_THROTL,
-				BLKIO_THROTL_write_bps_device),
+		.private = offsetof(struct blkio_group_conf, bps[WRITE]),
 		.read_seq_string = blkcg_print_conf_u64,
-		.write_string = blkiocg_file_write,
+		.write_string = blkcg_set_conf_bps_w,
 		.max_write_len = 256,
 	},
 
 	{
 		.name = "throttle.read_iops_device",
-		.private = BLKIOFILE_PRIVATE(BLKIO_POLICY_THROTL,
-				BLKIO_THROTL_read_iops_device),
+		.private = offsetof(struct blkio_group_conf, iops[READ]),
 		.read_seq_string = blkcg_print_conf_u64,
-		.write_string = blkiocg_file_write,
+		.write_string = blkcg_set_conf_iops_r,
 		.max_write_len = 256,
 	},
 
 	{
 		.name = "throttle.write_iops_device",
-		.private = BLKIOFILE_PRIVATE(BLKIO_POLICY_THROTL,
-				BLKIO_THROTL_write_iops_device),
+		.private = offsetof(struct blkio_group_conf, iops[WRITE]),
 		.read_seq_string = blkcg_print_conf_u64,
-		.write_string = blkiocg_file_write,
+		.write_string = blkcg_set_conf_iops_w,
 		.max_write_len = 256,
 	},
 	{
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index b67eefa..108ffbf 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -50,19 +50,6 @@ enum blkg_state_flags {
 	BLKG_empty,
 };
 
-/* cgroup files owned by proportional weight policy */
-enum blkcg_file_name_prop {
-	BLKIO_PROP_weight_device,
-};
-
-/* cgroup files owned by throttle policy */
-enum blkcg_file_name_throtl {
-	BLKIO_THROTL_read_bps_device,
-	BLKIO_THROTL_write_bps_device,
-	BLKIO_THROTL_read_iops_device,
-	BLKIO_THROTL_write_iops_device,
-};
-
 struct blkio_cgroup {
 	struct cgroup_subsys_state css;
 	unsigned int weight;
-- 
1.7.7.3


* [PATCH 08/21] blkcg: blkg_conf_prep()
  2012-03-28 22:51 [PATCHSET] block: modularize blkcg config and stat file handling Tejun Heo
                   ` (6 preceding siblings ...)
  2012-03-28 22:51 ` [PATCH 07/21] blkcg: restructure blkio_group configuration setting Tejun Heo
@ 2012-03-28 22:51 ` Tejun Heo
  2012-03-28 22:53   ` Tejun Heo
  2012-03-28 22:51 ` [PATCH 09/21] blkcg: export conf/stat helpers to prepare for reorganization Tejun Heo
                   ` (15 subsequent siblings)
  23 siblings, 1 reply; 56+ messages in thread
From: Tejun Heo @ 2012-03-28 22:51 UTC (permalink / raw)
  To: axboe; +Cc: vgoyal, ctalbott, rni, linux-kernel, cgroups, containers, Tejun Heo

blkg_conf_prep() implements "MAJ:MIN VAL" parsing manually, which is
unnecessary.  Just use sscanf("%u:%u %llu").  This might not reject
some malformed input (extra input at the end) but we don't care.
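
For example, a write of "8:16 1048576" yields major 8, minor 16 and a
value of 1048576; anything that doesn't produce all three fields is
rejected with -EINVAL, while trailing junk after the value is the
malformed case that still gets through.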

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 block/blk-cgroup.c |   64 ++++++++-------------------------------------------
 1 files changed, 10 insertions(+), 54 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 0d4f21e..3d933b0 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -987,57 +987,16 @@ static int blkg_conf_prep(struct blkio_cgroup *blkcg, const char *input,
 {
 	struct gendisk *disk;
 	struct blkio_group *blkg;
-	char *buf, *s[4], *p, *major_s, *minor_s;
-	unsigned long major, minor;
-	int i = 0, ret = -EINVAL;
-	int part;
-	dev_t dev;
-	u64 temp;
+	unsigned int major, minor;
+	unsigned long long v;
+	int part, ret;
 
-	buf = kstrdup(input, GFP_KERNEL);
-	if (!buf)
-		return -ENOMEM;
-
-	memset(s, 0, sizeof(s));
-
-	while ((p = strsep(&buf, " ")) != NULL) {
-		if (!*p)
-			continue;
-
-		s[i++] = p;
-
-		/* Prevent from inputing too many things */
-		if (i == 3)
-			break;
-	}
-
-	if (i != 2)
-		goto out;
-
-	p = strsep(&s[0], ":");
-	if (p != NULL)
-		major_s = p;
-	else
-		goto out;
-
-	minor_s = s[0];
-	if (!minor_s)
-		goto out;
-
-	if (strict_strtoul(major_s, 10, &major))
-		goto out;
-
-	if (strict_strtoul(minor_s, 10, &minor))
-		goto out;
-
-	dev = MKDEV(major, minor);
-
-	if (strict_strtoull(s[1], 10, &temp))
-		goto out;
+	if (sscanf(input, "%u:%u %llu", &major, &minor, &v) != 3)
+		return -EINVAL;
 
-	disk = get_gendisk(dev, &part);
+	disk = get_gendisk(MKDEV(major, minor), &part);
 	if (!disk || part)
-		goto out;
+		return -EINVAL;
 
 	rcu_read_lock();
 
@@ -1059,16 +1018,13 @@ static int blkg_conf_prep(struct blkio_cgroup *blkcg, const char *input,
 			msleep(10);
 			ret = restart_syscall();
 		}
-		goto out;
+		return ret;
 	}
 
 	ctx->disk = disk;
 	ctx->blkg = blkg;
-	ctx->v = temp;
-	ret = 0;
-out:
-	kfree(buf);
-	return ret;
+	ctx->v = v;
+	return 0;
 }
 
 /**
-- 
1.7.7.3


* [PATCH 09/21] blkcg: export conf/stat helpers to prepare for reorganization
  2012-03-28 22:51 [PATCHSET] block: modularize blkcg config and stat file handling Tejun Heo
                   ` (7 preceding siblings ...)
  2012-03-28 22:51 ` [PATCH 08/21] blkcg: blkg_conf_prep() Tejun Heo
@ 2012-03-28 22:51 ` Tejun Heo
  2012-03-28 22:51 ` [PATCH 10/21] blkcg: implement blkio_policy_type->cftypes Tejun Heo
                   ` (14 subsequent siblings)
  23 siblings, 0 replies; 56+ messages in thread
From: Tejun Heo @ 2012-03-28 22:51 UTC (permalink / raw)
  To: axboe; +Cc: vgoyal, ctalbott, rni, linux-kernel, cgroups, containers, Tejun Heo

conf/stat handling is about to be moved from blkcg core to the policy
implementations.  Export the conf/stat helpers from blkcg core so that
blk-throttle and cfq-iosched can use them.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 block/blk-cgroup.c |   52 +++++++++++++++++++++++++---------------------------
 block/blk-cgroup.h |   27 +++++++++++++++++++++++++++
 2 files changed, 52 insertions(+), 27 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 3d933b0..df1e197 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -11,7 +11,6 @@
  * 	              Nauman Rafique <nauman@google.com>
  */
 #include <linux/ioprio.h>
-#include <linux/seq_file.h>
 #include <linux/kdev_t.h>
 #include <linux/module.h>
 #include <linux/err.h>
@@ -767,10 +766,9 @@ static const char *blkg_dev_name(struct blkio_group *blkg)
  * This is to be used to construct print functions for
  * cftype->read_seq_string method.
  */
-static void blkcg_print_blkgs(struct seq_file *sf, struct blkio_cgroup *blkcg,
-			      u64 (*prfill)(struct seq_file *,
-					    struct blkg_policy_data *, int),
-			      int pol, int data, bool show_total)
+void blkcg_print_blkgs(struct seq_file *sf, struct blkio_cgroup *blkcg,
+		       u64 (*prfill)(struct seq_file *, struct blkg_policy_data *, int),
+		       int pol, int data, bool show_total)
 {
 	struct blkio_group *blkg;
 	struct hlist_node *n;
@@ -785,6 +783,7 @@ static void blkcg_print_blkgs(struct seq_file *sf, struct blkio_cgroup *blkcg,
 	if (show_total)
 		seq_printf(sf, "Total %llu\n", (unsigned long long)total);
 }
+EXPORT_SYMBOL_GPL(blkcg_print_blkgs);
 
 /**
  * __blkg_prfill_u64 - prfill helper for a single u64 value
@@ -794,8 +793,7 @@ static void blkcg_print_blkgs(struct seq_file *sf, struct blkio_cgroup *blkcg,
  *
  * Print @v to @sf for the device assocaited with @pd.
  */
-static u64 __blkg_prfill_u64(struct seq_file *sf, struct blkg_policy_data *pd,
-			     u64 v)
+u64 __blkg_prfill_u64(struct seq_file *sf, struct blkg_policy_data *pd, u64 v)
 {
 	const char *dname = blkg_dev_name(pd->blkg);
 
@@ -805,6 +803,7 @@ static u64 __blkg_prfill_u64(struct seq_file *sf, struct blkg_policy_data *pd,
 	seq_printf(sf, "%s %llu\n", dname, (unsigned long long)v);
 	return v;
 }
+EXPORT_SYMBOL_GPL(__blkg_prfill_u64);
 
 /**
  * __blkg_prfill_rwstat - prfill helper for a blkg_rwstat
@@ -814,9 +813,8 @@ static u64 __blkg_prfill_u64(struct seq_file *sf, struct blkg_policy_data *pd,
  *
  * Print @rwstat to @sf for the device assocaited with @pd.
  */
-static u64 __blkg_prfill_rwstat(struct seq_file *sf,
-				struct blkg_policy_data *pd,
-				const struct blkg_rwstat *rwstat)
+u64 __blkg_prfill_rwstat(struct seq_file *sf, struct blkg_policy_data *pd,
+			 const struct blkg_rwstat *rwstat)
 {
 	static const char *rwstr[] = {
 		[BLKG_RWSTAT_READ]	= "Read",
@@ -856,8 +854,8 @@ static u64 blkg_prfill_rwstat(struct seq_file *sf, struct blkg_policy_data *pd,
 }
 
 /* print blkg_stat specified by BLKCG_STAT_PRIV() */
-static int blkcg_print_stat(struct cgroup *cgrp, struct cftype *cft,
-			    struct seq_file *sf)
+int blkcg_print_stat(struct cgroup *cgrp, struct cftype *cft,
+		     struct seq_file *sf)
 {
 	struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp);
 
@@ -866,10 +864,11 @@ static int blkcg_print_stat(struct cgroup *cgrp, struct cftype *cft,
 			  BLKCG_STAT_OFF(cft->private), false);
 	return 0;
 }
+EXPORT_SYMBOL_GPL(blkcg_print_stat);
 
 /* print blkg_rwstat specified by BLKCG_STAT_PRIV() */
-static int blkcg_print_rwstat(struct cgroup *cgrp, struct cftype *cft,
-			      struct seq_file *sf)
+int blkcg_print_rwstat(struct cgroup *cgrp, struct cftype *cft,
+		       struct seq_file *sf)
 {
 	struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp);
 
@@ -878,6 +877,7 @@ static int blkcg_print_rwstat(struct cgroup *cgrp, struct cftype *cft,
 			  BLKCG_STAT_OFF(cft->private), true);
 	return 0;
 }
+EXPORT_SYMBOL_GPL(blkcg_print_rwstat);
 
 static u64 blkg_prfill_cpu_stat(struct seq_file *sf,
 				struct blkg_policy_data *pd, int off)
@@ -914,8 +914,8 @@ static u64 blkg_prfill_cpu_rwstat(struct seq_file *sf,
 }
 
 /* print per-cpu blkg_stat specified by BLKCG_STAT_PRIV() */
-static int blkcg_print_cpu_stat(struct cgroup *cgrp, struct cftype *cft,
-				struct seq_file *sf)
+int blkcg_print_cpu_stat(struct cgroup *cgrp, struct cftype *cft,
+			 struct seq_file *sf)
 {
 	struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp);
 
@@ -924,10 +924,11 @@ static int blkcg_print_cpu_stat(struct cgroup *cgrp, struct cftype *cft,
 			  BLKCG_STAT_OFF(cft->private), false);
 	return 0;
 }
+EXPORT_SYMBOL_GPL(blkcg_print_cpu_stat);
 
 /* print per-cpu blkg_rwstat specified by BLKCG_STAT_PRIV() */
-static int blkcg_print_cpu_rwstat(struct cgroup *cgrp, struct cftype *cft,
-				  struct seq_file *sf)
+int blkcg_print_cpu_rwstat(struct cgroup *cgrp, struct cftype *cft,
+			   struct seq_file *sf)
 {
 	struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp);
 
@@ -936,6 +937,7 @@ static int blkcg_print_cpu_rwstat(struct cgroup *cgrp, struct cftype *cft,
 			  BLKCG_STAT_OFF(cft->private), true);
 	return 0;
 }
+EXPORT_SYMBOL_GPL(blkcg_print_cpu_rwstat);
 
 #ifdef CONFIG_DEBUG_BLK_CGROUP
 static u64 blkg_prfill_avg_queue_size(struct seq_file *sf,
@@ -964,12 +966,6 @@ static int blkcg_print_avg_queue_size(struct cgroup *cgrp, struct cftype *cft,
 }
 #endif	/* CONFIG_DEBUG_BLK_CGROUP */
 
-struct blkg_conf_ctx {
-	struct gendisk		*disk;
-	struct blkio_group	*blkg;
-	u64			v;
-};
-
 /**
  * blkg_conf_prep - parse and prepare for per-blkg config update
  * @blkcg: target block cgroup
@@ -981,8 +977,8 @@ struct blkg_conf_ctx {
  * value.  This function returns with RCU read locked and must be paired
  * with blkg_conf_finish().
  */
-static int blkg_conf_prep(struct blkio_cgroup *blkcg, const char *input,
-			  struct blkg_conf_ctx *ctx)
+int blkg_conf_prep(struct blkio_cgroup *blkcg, const char *input,
+		   struct blkg_conf_ctx *ctx)
 	__acquires(rcu)
 {
 	struct gendisk *disk;
@@ -1026,6 +1022,7 @@ static int blkg_conf_prep(struct blkio_cgroup *blkcg, const char *input,
 	ctx->v = v;
 	return 0;
 }
+EXPORT_SYMBOL_GPL(blkg_conf_prep);
 
 /**
  * blkg_conf_finish - finish up per-blkg config update
@@ -1034,12 +1031,13 @@ static int blkg_conf_prep(struct blkio_cgroup *blkcg, const char *input,
  * Finish up after per-blkg config update.  This function must be paired
  * with blkg_conf_prep().
  */
-static void blkg_conf_finish(struct blkg_conf_ctx *ctx)
+void blkg_conf_finish(struct blkg_conf_ctx *ctx)
 	__releases(rcu)
 {
 	rcu_read_unlock();
 	put_disk(ctx->disk);
 }
+EXPORT_SYMBOL_GPL(blkg_conf_finish);
 
 /* for propio conf */
 static u64 blkg_prfill_weight_device(struct seq_file *sf,
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index 108ffbf..361ecfa 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -15,6 +15,7 @@
 
 #include <linux/cgroup.h>
 #include <linux/u64_stats_sync.h>
+#include <linux/seq_file.h>
 
 enum blkio_policy_id {
 	BLKIO_POLICY_PROP = 0,		/* Proportional Bandwidth division */
@@ -193,6 +194,32 @@ extern void blkg_destroy_all(struct request_queue *q, bool destroy_root);
 extern void update_root_blkg_pd(struct request_queue *q,
 				enum blkio_policy_id plid);
 
+void blkcg_print_blkgs(struct seq_file *sf, struct blkio_cgroup *blkcg,
+		       u64 (*prfill)(struct seq_file *, struct blkg_policy_data *, int),
+		       int pol, int data, bool show_total);
+u64 __blkg_prfill_u64(struct seq_file *sf, struct blkg_policy_data *pd, u64 v);
+u64 __blkg_prfill_rwstat(struct seq_file *sf, struct blkg_policy_data *pd,
+			 const struct blkg_rwstat *rwstat);
+int blkcg_print_stat(struct cgroup *cgrp, struct cftype *cft,
+		     struct seq_file *sf);
+int blkcg_print_rwstat(struct cgroup *cgrp, struct cftype *cft,
+		       struct seq_file *sf);
+int blkcg_print_cpu_stat(struct cgroup *cgrp, struct cftype *cft,
+			 struct seq_file *sf);
+int blkcg_print_cpu_rwstat(struct cgroup *cgrp, struct cftype *cft,
+			   struct seq_file *sf);
+
+struct blkg_conf_ctx {
+	struct gendisk		*disk;
+	struct blkio_group	*blkg;
+	u64			v;
+};
+
+int blkg_conf_prep(struct blkio_cgroup *blkcg, const char *input,
+		   struct blkg_conf_ctx *ctx);
+void blkg_conf_finish(struct blkg_conf_ctx *ctx);
+
+
 /**
  * blkg_to_pdata - get policy private data
  * @blkg: blkg of interest
-- 
1.7.7.3


* [PATCH 10/21] blkcg: implement blkio_policy_type->cftypes
  2012-03-28 22:51 [PATCHSET] block: modularize blkcg config and stat file handling Tejun Heo
                   ` (8 preceding siblings ...)
  2012-03-28 22:51 ` [PATCH 09/21] blkcg: export conf/stat helpers to prepare for reorganization Tejun Heo
@ 2012-03-28 22:51 ` Tejun Heo
  2012-03-28 22:51 ` [PATCH 11/21] blkcg: move conf/stat file handling code to policies Tejun Heo
                   ` (13 subsequent siblings)
  23 siblings, 0 replies; 56+ messages in thread
From: Tejun Heo @ 2012-03-28 22:51 UTC (permalink / raw)
  To: axboe; +Cc: vgoyal, ctalbott, rni, linux-kernel, cgroups, containers, Tejun Heo

Add blkiop->cftypes which is added and removed together with the
policy.  This will be used to move conf/stat handling to the policies.
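
A policy can then bundle its cgroup files with its blkio_policy_type
and have them added on blkio_policy_register() and removed on
unregister.  A rough sketch (the example_* names are made up):

	static struct cftype example_policy_files[] = {
		{
			.name = "example.time",
			.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
					offsetof(struct blkio_group_stats, time)),
			.read_seq_string = blkcg_print_stat,
		},
		{ }	/* terminate */
	};

	static struct blkio_policy_type example_policy = {
		.plid		= BLKIO_POLICY_PROP,
		.cftypes	= example_policy_files,
		/* .ops and .pdata_size as before */
	};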

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 block/blk-cgroup.c |    6 ++++++
 block/blk-cgroup.h |    1 +
 2 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index df1e197..2f05056 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1538,6 +1538,9 @@ void blkio_policy_register(struct blkio_policy_type *blkiop)
 	list_for_each_entry(q, &all_q_list, all_q_node)
 		update_root_blkg_pd(q, blkiop->plid);
 	blkcg_bypass_end();
+
+	if (blkiop->cftypes)
+		WARN_ON(cgroup_add_cftypes(&blkio_subsys, blkiop->cftypes));
 }
 EXPORT_SYMBOL_GPL(blkio_policy_register);
 
@@ -1545,6 +1548,9 @@ void blkio_policy_unregister(struct blkio_policy_type *blkiop)
 {
 	struct request_queue *q;
 
+	if (blkiop->cftypes)
+		cgroup_rm_cftypes(&blkio_subsys, blkiop->cftypes);
+
 	blkcg_bypass_start();
 	spin_lock(&blkio_list_lock);
 
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index 361ecfa..fa744d5 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -181,6 +181,7 @@ struct blkio_policy_type {
 	struct blkio_policy_ops ops;
 	enum blkio_policy_id plid;
 	size_t pdata_size;		/* policy specific private data size */
+	struct cftype *cftypes;		/* cgroup files for the policy */
 };
 
 extern int blkcg_init_queue(struct request_queue *q);
-- 
1.7.7.3


* [PATCH 11/21] blkcg: move conf/stat file handling code to policies
  2012-03-28 22:51 [PATCHSET] block: modularize blkcg config and stat file handling Tejun Heo
                   ` (9 preceding siblings ...)
  2012-03-28 22:51 ` [PATCH 10/21] blkcg: implement blkio_policy_type->cftypes Tejun Heo
@ 2012-03-28 22:51 ` Tejun Heo
  2012-03-28 22:51 ` [PATCH 12/21] cfq: collapse cfq.h into cfq-iosched.c Tejun Heo
                   ` (12 subsequent siblings)
  23 siblings, 0 replies; 56+ messages in thread
From: Tejun Heo @ 2012-03-28 22:51 UTC (permalink / raw)
  To: axboe; +Cc: vgoyal, ctalbott, rni, linux-kernel, cgroups, containers, Tejun Heo

blkcg conf/stat handling is convoluted in that details which belong to
specific policy implementations all live in blkcg core, and the
policies then hook into the core layer to access and manipulate confs
and stats.  This sadly achieves both inflexibility (confs/stats can't
be modified without messing with blkcg core) and complexity (all the
call-ins and call-backs).

The previous patches restructured the conf and stat handling code so
that it can be separated out.  This patch relocates the file handling
part.  All conf/stat file handling code which belongs to
BLKIO_POLICY_PROP is moved to cfq-iosched.c and all
BLKIO_POLICY_THROTL code to blk-throttle.c.

The move is verbatim except for the blkio_update_group_{weight|bps|iops}()
callbacks which relay conf changes to the policies.  The configuration
settings are now handled in the policies themselves, so the relaying
isn't necessary.  The conf setting functions are modified to call the
per-policy update functions directly and the relaying mechanism is
dropped.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 block/blk-cgroup.c   |  373 --------------------------------------------------
 block/blk-cgroup.h   |   15 --
 block/blk-throttle.c |  163 ++++++++++++++++++----
 block/cfq-iosched.c  |  202 +++++++++++++++++++++++++++-
 4 files changed, 333 insertions(+), 420 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 2f05056..96b6b5a 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -63,63 +63,6 @@ struct blkio_cgroup *bio_blkio_cgroup(struct bio *bio)
 }
 EXPORT_SYMBOL_GPL(bio_blkio_cgroup);
 
-static inline void blkio_update_group_weight(struct blkio_group *blkg,
-					     int plid, unsigned int weight)
-{
-	struct blkio_policy_type *blkiop;
-
-	list_for_each_entry(blkiop, &blkio_list, list) {
-		/* If this policy does not own the blkg, do not send updates */
-		if (blkiop->plid != plid)
-			continue;
-		if (blkiop->ops.blkio_update_group_weight_fn)
-			blkiop->ops.blkio_update_group_weight_fn(blkg->q,
-							blkg, weight);
-	}
-}
-
-static inline void blkio_update_group_bps(struct blkio_group *blkg, int plid,
-					  u64 bps, int rw)
-{
-	struct blkio_policy_type *blkiop;
-
-	list_for_each_entry(blkiop, &blkio_list, list) {
-
-		/* If this policy does not own the blkg, do not send updates */
-		if (blkiop->plid != plid)
-			continue;
-
-		if (rw == READ && blkiop->ops.blkio_update_group_read_bps_fn)
-			blkiop->ops.blkio_update_group_read_bps_fn(blkg->q,
-								blkg, bps);
-
-		if (rw == WRITE && blkiop->ops.blkio_update_group_write_bps_fn)
-			blkiop->ops.blkio_update_group_write_bps_fn(blkg->q,
-								blkg, bps);
-	}
-}
-
-static inline void blkio_update_group_iops(struct blkio_group *blkg, int plid,
-					   u64 iops, int rw)
-{
-	struct blkio_policy_type *blkiop;
-
-	list_for_each_entry(blkiop, &blkio_list, list) {
-
-		/* If this policy does not own the blkg, do not send updates */
-		if (blkiop->plid != plid)
-			continue;
-
-		if (rw == READ && blkiop->ops.blkio_update_group_read_iops_fn)
-			blkiop->ops.blkio_update_group_read_iops_fn(blkg->q,
-								blkg, iops);
-
-		if (rw == WRITE && blkiop->ops.blkio_update_group_write_iops_fn)
-			blkiop->ops.blkio_update_group_write_iops_fn(blkg->q,
-								blkg,iops);
-	}
-}
-
 #ifdef CONFIG_DEBUG_BLK_CGROUP
 /* This should be called with the queue_lock held. */
 static void blkio_set_start_group_wait_time(struct blkio_group *blkg,
@@ -939,33 +882,6 @@ int blkcg_print_cpu_rwstat(struct cgroup *cgrp, struct cftype *cft,
 }
 EXPORT_SYMBOL_GPL(blkcg_print_cpu_rwstat);
 
-#ifdef CONFIG_DEBUG_BLK_CGROUP
-static u64 blkg_prfill_avg_queue_size(struct seq_file *sf,
-				      struct blkg_policy_data *pd, int off)
-{
-	u64 samples = blkg_stat_read(&pd->stats.avg_queue_size_samples);
-	u64 v = 0;
-
-	if (samples) {
-		v = blkg_stat_read(&pd->stats.avg_queue_size_sum);
-		do_div(v, samples);
-	}
-	__blkg_prfill_u64(sf, pd, v);
-	return 0;
-}
-
-/* print avg_queue_size */
-static int blkcg_print_avg_queue_size(struct cgroup *cgrp, struct cftype *cft,
-				      struct seq_file *sf)
-{
-	struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp);
-
-	blkcg_print_blkgs(sf, blkcg, blkg_prfill_avg_queue_size,
-			  BLKIO_POLICY_PROP, 0, false);
-	return 0;
-}
-#endif	/* CONFIG_DEBUG_BLK_CGROUP */
-
 /**
  * blkg_conf_prep - parse and prepare for per-blkg config update
  * @blkcg: target block cgroup
@@ -1039,300 +955,11 @@ void blkg_conf_finish(struct blkg_conf_ctx *ctx)
 }
 EXPORT_SYMBOL_GPL(blkg_conf_finish);
 
-/* for propio conf */
-static u64 blkg_prfill_weight_device(struct seq_file *sf,
-				     struct blkg_policy_data *pd, int off)
-{
-	if (!pd->conf.weight)
-		return 0;
-	return __blkg_prfill_u64(sf, pd, pd->conf.weight);
-}
-
-static int blkcg_print_weight_device(struct cgroup *cgrp, struct cftype *cft,
-				     struct seq_file *sf)
-{
-	blkcg_print_blkgs(sf, cgroup_to_blkio_cgroup(cgrp),
-			  blkg_prfill_weight_device, BLKIO_POLICY_PROP, 0,
-			  false);
-	return 0;
-}
-
-static int blkcg_print_weight(struct cgroup *cgrp, struct cftype *cft,
-			      struct seq_file *sf)
-{
-	seq_printf(sf, "%u\n", cgroup_to_blkio_cgroup(cgrp)->weight);
-	return 0;
-}
-
-static int blkcg_set_weight_device(struct cgroup *cgrp, struct cftype *cft,
-				   const char *buf)
-{
-	struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp);
-	struct blkg_policy_data *pd;
-	struct blkg_conf_ctx ctx;
-	int ret;
-
-	ret = blkg_conf_prep(blkcg, buf, &ctx);
-	if (ret)
-		return ret;
-
-	ret = -EINVAL;
-	pd = ctx.blkg->pd[BLKIO_POLICY_PROP];
-	if (pd && (!ctx.v || (ctx.v >= BLKIO_WEIGHT_MIN &&
-			      ctx.v <= BLKIO_WEIGHT_MAX))) {
-		pd->conf.weight = ctx.v;
-		blkio_update_group_weight(ctx.blkg, BLKIO_POLICY_PROP,
-					  ctx.v ?: blkcg->weight);
-		ret = 0;
-	}
-
-	blkg_conf_finish(&ctx);
-	return ret;
-}
-
-static int blkcg_set_weight(struct cgroup *cgrp, struct cftype *cft, u64 val)
-{
-	struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp);
-	struct blkio_group *blkg;
-	struct hlist_node *n;
-
-	if (val < BLKIO_WEIGHT_MIN || val > BLKIO_WEIGHT_MAX)
-		return -EINVAL;
-
-	spin_lock(&blkio_list_lock);
-	spin_lock_irq(&blkcg->lock);
-	blkcg->weight = (unsigned int)val;
-
-	hlist_for_each_entry(blkg, n, &blkcg->blkg_list, blkcg_node) {
-		struct blkg_policy_data *pd = blkg->pd[BLKIO_POLICY_PROP];
-
-		if (pd && !pd->conf.weight)
-			blkio_update_group_weight(blkg, BLKIO_POLICY_PROP,
-						  blkcg->weight);
-	}
-
-	spin_unlock_irq(&blkcg->lock);
-	spin_unlock(&blkio_list_lock);
-	return 0;
-}
-
-/* for blk-throttle conf */
-#ifdef CONFIG_BLK_DEV_THROTTLING
-static u64 blkg_prfill_conf_u64(struct seq_file *sf,
-				struct blkg_policy_data *pd, int off)
-{
-	u64 v = *(u64 *)((void *)&pd->conf + off);
-
-	if (!v)
-		return 0;
-	return __blkg_prfill_u64(sf, pd, v);
-}
-
-static int blkcg_print_conf_u64(struct cgroup *cgrp, struct cftype *cft,
-				struct seq_file *sf)
-{
-	blkcg_print_blkgs(sf, cgroup_to_blkio_cgroup(cgrp),
-			  blkg_prfill_conf_u64, BLKIO_POLICY_THROTL,
-			  cft->private, false);
-	return 0;
-}
-
-static int blkcg_set_conf_u64(struct cgroup *cgrp, struct cftype *cft,
-			      const char *buf, int rw,
-			      void (*update)(struct blkio_group *, int, u64, int))
-{
-	struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp);
-	struct blkg_policy_data *pd;
-	struct blkg_conf_ctx ctx;
-	int ret;
-
-	ret = blkg_conf_prep(blkcg, buf, &ctx);
-	if (ret)
-		return ret;
-
-	ret = -EINVAL;
-	pd = ctx.blkg->pd[BLKIO_POLICY_THROTL];
-	if (pd) {
-		*(u64 *)((void *)&pd->conf + cft->private) = ctx.v;
-		update(ctx.blkg, BLKIO_POLICY_THROTL, ctx.v ?: -1, rw);
-		ret = 0;
-	}
-
-	blkg_conf_finish(&ctx);
-	return ret;
-}
-
-static int blkcg_set_conf_bps_r(struct cgroup *cgrp, struct cftype *cft,
-				const char *buf)
-{
-	return blkcg_set_conf_u64(cgrp, cft, buf, READ, blkio_update_group_bps);
-}
-
-static int blkcg_set_conf_bps_w(struct cgroup *cgrp, struct cftype *cft,
-				const char *buf)
-{
-	return blkcg_set_conf_u64(cgrp, cft, buf, WRITE, blkio_update_group_bps);
-}
-
-static int blkcg_set_conf_iops_r(struct cgroup *cgrp, struct cftype *cft,
-				 const char *buf)
-{
-	return blkcg_set_conf_u64(cgrp, cft, buf, READ, blkio_update_group_iops);
-}
-
-static int blkcg_set_conf_iops_w(struct cgroup *cgrp, struct cftype *cft,
-				 const char *buf)
-{
-	return blkcg_set_conf_u64(cgrp, cft, buf, WRITE, blkio_update_group_iops);
-}
-#endif
-
 struct cftype blkio_files[] = {
 	{
-		.name = "weight_device",
-		.read_seq_string = blkcg_print_weight_device,
-		.write_string = blkcg_set_weight_device,
-		.max_write_len = 256,
-	},
-	{
-		.name = "weight",
-		.read_seq_string = blkcg_print_weight,
-		.write_u64 = blkcg_set_weight,
-	},
-	{
-		.name = "time",
-		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
-				offsetof(struct blkio_group_stats, time)),
-		.read_seq_string = blkcg_print_stat,
-	},
-	{
-		.name = "sectors",
-		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
-				offsetof(struct blkio_group_stats_cpu, sectors)),
-		.read_seq_string = blkcg_print_cpu_stat,
-	},
-	{
-		.name = "io_service_bytes",
-		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
-				offsetof(struct blkio_group_stats_cpu, service_bytes)),
-		.read_seq_string = blkcg_print_cpu_rwstat,
-	},
-	{
-		.name = "io_serviced",
-		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
-				offsetof(struct blkio_group_stats_cpu, serviced)),
-		.read_seq_string = blkcg_print_cpu_rwstat,
-	},
-	{
-		.name = "io_service_time",
-		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
-				offsetof(struct blkio_group_stats, service_time)),
-		.read_seq_string = blkcg_print_rwstat,
-	},
-	{
-		.name = "io_wait_time",
-		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
-				offsetof(struct blkio_group_stats, wait_time)),
-		.read_seq_string = blkcg_print_rwstat,
-	},
-	{
-		.name = "io_merged",
-		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
-				offsetof(struct blkio_group_stats, merged)),
-		.read_seq_string = blkcg_print_rwstat,
-	},
-	{
-		.name = "io_queued",
-		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
-				offsetof(struct blkio_group_stats, queued)),
-		.read_seq_string = blkcg_print_rwstat,
-	},
-	{
 		.name = "reset_stats",
 		.write_u64 = blkiocg_reset_stats,
 	},
-#ifdef CONFIG_BLK_DEV_THROTTLING
-	{
-		.name = "throttle.read_bps_device",
-		.private = offsetof(struct blkio_group_conf, bps[READ]),
-		.read_seq_string = blkcg_print_conf_u64,
-		.write_string = blkcg_set_conf_bps_r,
-		.max_write_len = 256,
-	},
-
-	{
-		.name = "throttle.write_bps_device",
-		.private = offsetof(struct blkio_group_conf, bps[WRITE]),
-		.read_seq_string = blkcg_print_conf_u64,
-		.write_string = blkcg_set_conf_bps_w,
-		.max_write_len = 256,
-	},
-
-	{
-		.name = "throttle.read_iops_device",
-		.private = offsetof(struct blkio_group_conf, iops[READ]),
-		.read_seq_string = blkcg_print_conf_u64,
-		.write_string = blkcg_set_conf_iops_r,
-		.max_write_len = 256,
-	},
-
-	{
-		.name = "throttle.write_iops_device",
-		.private = offsetof(struct blkio_group_conf, iops[WRITE]),
-		.read_seq_string = blkcg_print_conf_u64,
-		.write_string = blkcg_set_conf_iops_w,
-		.max_write_len = 256,
-	},
-	{
-		.name = "throttle.io_service_bytes",
-		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_THROTL,
-				offsetof(struct blkio_group_stats_cpu, service_bytes)),
-		.read_seq_string = blkcg_print_cpu_rwstat,
-	},
-	{
-		.name = "throttle.io_serviced",
-		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_THROTL,
-				offsetof(struct blkio_group_stats_cpu, serviced)),
-		.read_seq_string = blkcg_print_cpu_rwstat,
-	},
-#endif /* CONFIG_BLK_DEV_THROTTLING */
-
-#ifdef CONFIG_DEBUG_BLK_CGROUP
-	{
-		.name = "avg_queue_size",
-		.read_seq_string = blkcg_print_avg_queue_size,
-	},
-	{
-		.name = "group_wait_time",
-		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
-				offsetof(struct blkio_group_stats, group_wait_time)),
-		.read_seq_string = blkcg_print_stat,
-	},
-	{
-		.name = "idle_time",
-		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
-				offsetof(struct blkio_group_stats, idle_time)),
-		.read_seq_string = blkcg_print_stat,
-	},
-	{
-		.name = "empty_time",
-		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
-				offsetof(struct blkio_group_stats, empty_time)),
-		.read_seq_string = blkcg_print_stat,
-	},
-	{
-		.name = "dequeue",
-		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
-				offsetof(struct blkio_group_stats, dequeue)),
-		.read_seq_string = blkcg_print_stat,
-	},
-	{
-		.name = "unaccounted_time",
-		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
-				offsetof(struct blkio_group_stats, unaccounted_time)),
-		.read_seq_string = blkcg_print_stat,
-	},
-#endif
 	{ }	/* terminate */
 };
 
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index fa744d5..ba64b28 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -156,24 +156,9 @@ struct blkio_group {
 };
 
 typedef void (blkio_init_group_fn)(struct blkio_group *blkg);
-typedef void (blkio_update_group_weight_fn)(struct request_queue *q,
-			struct blkio_group *blkg, unsigned int weight);
-typedef void (blkio_update_group_read_bps_fn)(struct request_queue *q,
-			struct blkio_group *blkg, u64 read_bps);
-typedef void (blkio_update_group_write_bps_fn)(struct request_queue *q,
-			struct blkio_group *blkg, u64 write_bps);
-typedef void (blkio_update_group_read_iops_fn)(struct request_queue *q,
-			struct blkio_group *blkg, unsigned int read_iops);
-typedef void (blkio_update_group_write_iops_fn)(struct request_queue *q,
-			struct blkio_group *blkg, unsigned int write_iops);
 
 struct blkio_policy_ops {
 	blkio_init_group_fn *blkio_init_group_fn;
-	blkio_update_group_weight_fn *blkio_update_group_weight_fn;
-	blkio_update_group_read_bps_fn *blkio_update_group_read_bps_fn;
-	blkio_update_group_write_bps_fn *blkio_update_group_write_bps_fn;
-	blkio_update_group_read_iops_fn *blkio_update_group_read_iops_fn;
-	blkio_update_group_write_iops_fn *blkio_update_group_write_iops_fn;
 };
 
 struct blkio_policy_type {
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 1cc6c23d..fb6f257 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -804,6 +804,11 @@ throtl_schedule_delayed_work(struct throtl_data *td, unsigned long delay)
 	}
 }
 
+/*
+ * Cannot take the queue lock in the update functions because taking
+ * queue_lock under blkcg_lock is not allowed.  Other paths take
+ * blkcg_lock under queue_lock.
+ */
 static void throtl_update_blkio_group_common(struct throtl_data *td,
 				struct throtl_grp *tg)
 {
@@ -813,51 +818,158 @@ static void throtl_update_blkio_group_common(struct throtl_data *td,
 	throtl_schedule_delayed_work(td, 0);
 }
 
-/*
- * For all update functions, @q should be a valid pointer because these
- * update functions are called under blkcg_lock, that means, blkg is
- * valid and in turn @q is valid. queue exit path can not race because
- * of blkcg_lock
- *
- * Can not take queue lock in update functions as queue lock under blkcg_lock
- * is not allowed. Under other paths we take blkcg_lock under queue_lock.
- */
-static void throtl_update_blkio_group_read_bps(struct request_queue *q,
-				struct blkio_group *blkg, u64 read_bps)
+static u64 blkg_prfill_conf_u64(struct seq_file *sf,
+				struct blkg_policy_data *pd, int off)
+{
+	u64 v = *(u64 *)((void *)&pd->conf + off);
+
+	if (!v)
+		return 0;
+	return __blkg_prfill_u64(sf, pd, v);
+}
+
+static int blkcg_print_conf_u64(struct cgroup *cgrp, struct cftype *cft,
+				struct seq_file *sf)
+{
+	blkcg_print_blkgs(sf, cgroup_to_blkio_cgroup(cgrp),
+			  blkg_prfill_conf_u64, BLKIO_POLICY_THROTL,
+			  cft->private, false);
+	return 0;
+}
+
+static void throtl_update_blkio_group_read_bps(struct blkio_group *blkg,
+					       u64 read_bps)
 {
 	struct throtl_grp *tg = blkg_to_tg(blkg);
 
 	tg->bps[READ] = read_bps;
-	throtl_update_blkio_group_common(q->td, tg);
+	throtl_update_blkio_group_common(blkg->q->td, tg);
 }
 
-static void throtl_update_blkio_group_write_bps(struct request_queue *q,
-				struct blkio_group *blkg, u64 write_bps)
+static void throtl_update_blkio_group_write_bps(struct blkio_group *blkg,
+						u64 write_bps)
 {
 	struct throtl_grp *tg = blkg_to_tg(blkg);
 
 	tg->bps[WRITE] = write_bps;
-	throtl_update_blkio_group_common(q->td, tg);
+	throtl_update_blkio_group_common(blkg->q->td, tg);
 }
 
-static void throtl_update_blkio_group_read_iops(struct request_queue *q,
-			struct blkio_group *blkg, unsigned int read_iops)
+static void throtl_update_blkio_group_read_iops(struct blkio_group *blkg,
+						u64 read_iops)
 {
 	struct throtl_grp *tg = blkg_to_tg(blkg);
 
 	tg->iops[READ] = read_iops;
-	throtl_update_blkio_group_common(q->td, tg);
+	throtl_update_blkio_group_common(blkg->q->td, tg);
 }
 
-static void throtl_update_blkio_group_write_iops(struct request_queue *q,
-			struct blkio_group *blkg, unsigned int write_iops)
+static void throtl_update_blkio_group_write_iops(struct blkio_group *blkg,
+						 u64 write_iops)
 {
 	struct throtl_grp *tg = blkg_to_tg(blkg);
 
 	tg->iops[WRITE] = write_iops;
-	throtl_update_blkio_group_common(q->td, tg);
+	throtl_update_blkio_group_common(blkg->q->td, tg);
+}
+
+static int blkcg_set_conf_u64(struct cgroup *cgrp, struct cftype *cft,
+			      const char *buf,
+			      void (*update)(struct blkio_group *, u64))
+{
+	struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp);
+	struct blkg_policy_data *pd;
+	struct blkg_conf_ctx ctx;
+	int ret;
+
+	ret = blkg_conf_prep(blkcg, buf, &ctx);
+	if (ret)
+		return ret;
+
+	ret = -EINVAL;
+	pd = ctx.blkg->pd[BLKIO_POLICY_THROTL];
+	if (pd) {
+		*(u64 *)((void *)&pd->conf + cft->private) = ctx.v;
+		update(ctx.blkg, ctx.v ?: -1);
+		ret = 0;
+	}
+
+	blkg_conf_finish(&ctx);
+	return ret;
 }
 
+static int blkcg_set_conf_bps_r(struct cgroup *cgrp, struct cftype *cft,
+				const char *buf)
+{
+	return blkcg_set_conf_u64(cgrp, cft, buf,
+				  throtl_update_blkio_group_read_bps);
+}
+
+static int blkcg_set_conf_bps_w(struct cgroup *cgrp, struct cftype *cft,
+				const char *buf)
+{
+	return blkcg_set_conf_u64(cgrp, cft, buf,
+				  throtl_update_blkio_group_write_bps);
+}
+
+static int blkcg_set_conf_iops_r(struct cgroup *cgrp, struct cftype *cft,
+				 const char *buf)
+{
+	return blkcg_set_conf_u64(cgrp, cft, buf,
+				  throtl_update_blkio_group_read_iops);
+}
+
+static int blkcg_set_conf_iops_w(struct cgroup *cgrp, struct cftype *cft,
+				 const char *buf)
+{
+	return blkcg_set_conf_u64(cgrp, cft, buf,
+				  throtl_update_blkio_group_write_iops);
+}
+
+static struct cftype throtl_files[] = {
+	{
+		.name = "throttle.read_bps_device",
+		.private = offsetof(struct blkio_group_conf, bps[READ]),
+		.read_seq_string = blkcg_print_conf_u64,
+		.write_string = blkcg_set_conf_bps_r,
+		.max_write_len = 256,
+	},
+	{
+		.name = "throttle.write_bps_device",
+		.private = offsetof(struct blkio_group_conf, bps[WRITE]),
+		.read_seq_string = blkcg_print_conf_u64,
+		.write_string = blkcg_set_conf_bps_w,
+		.max_write_len = 256,
+	},
+	{
+		.name = "throttle.read_iops_device",
+		.private = offsetof(struct blkio_group_conf, iops[READ]),
+		.read_seq_string = blkcg_print_conf_u64,
+		.write_string = blkcg_set_conf_iops_r,
+		.max_write_len = 256,
+	},
+	{
+		.name = "throttle.write_iops_device",
+		.private = offsetof(struct blkio_group_conf, iops[WRITE]),
+		.read_seq_string = blkcg_print_conf_u64,
+		.write_string = blkcg_set_conf_iops_w,
+		.max_write_len = 256,
+	},
+	{
+		.name = "throttle.io_service_bytes",
+		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_THROTL,
+				offsetof(struct blkio_group_stats_cpu, service_bytes)),
+		.read_seq_string = blkcg_print_cpu_rwstat,
+	},
+	{
+		.name = "throttle.io_serviced",
+		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_THROTL,
+				offsetof(struct blkio_group_stats_cpu, serviced)),
+		.read_seq_string = blkcg_print_cpu_rwstat,
+	},
+	{ }	/* terminate */
+};
+
 static void throtl_shutdown_wq(struct request_queue *q)
 {
 	struct throtl_data *td = q->td;
@@ -868,17 +980,10 @@ static void throtl_shutdown_wq(struct request_queue *q)
 static struct blkio_policy_type blkio_policy_throtl = {
 	.ops = {
 		.blkio_init_group_fn = throtl_init_blkio_group,
-		.blkio_update_group_read_bps_fn =
-					throtl_update_blkio_group_read_bps,
-		.blkio_update_group_write_bps_fn =
-					throtl_update_blkio_group_write_bps,
-		.blkio_update_group_read_iops_fn =
-					throtl_update_blkio_group_read_iops,
-		.blkio_update_group_write_iops_fn =
-					throtl_update_blkio_group_write_iops,
 	},
 	.plid = BLKIO_POLICY_THROTL,
 	.pdata_size = sizeof(struct throtl_grp),
+	.cftypes = throtl_files,
 };
 
 bool blk_throtl_bio(struct request_queue *q, struct bio *bio)
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 8cca6161..119e061 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1058,8 +1058,7 @@ static void cfq_init_cfqg_base(struct cfq_group *cfqg)
 }
 
 #ifdef CONFIG_CFQ_GROUP_IOSCHED
-static void cfq_update_blkio_group_weight(struct request_queue *q,
-					  struct blkio_group *blkg,
+static void cfq_update_blkio_group_weight(struct blkio_group *blkg,
 					  unsigned int weight)
 {
 	struct cfq_group *cfqg = blkg_to_cfqg(blkg);
@@ -1111,6 +1110,203 @@ static void cfq_link_cfqq_cfqg(struct cfq_queue *cfqq, struct cfq_group *cfqg)
 	cfqg_get(cfqg);
 }
 
+static u64 blkg_prfill_weight_device(struct seq_file *sf,
+				     struct blkg_policy_data *pd, int off)
+{
+	if (!pd->conf.weight)
+		return 0;
+	return __blkg_prfill_u64(sf, pd, pd->conf.weight);
+}
+
+static int blkcg_print_weight_device(struct cgroup *cgrp, struct cftype *cft,
+				     struct seq_file *sf)
+{
+	blkcg_print_blkgs(sf, cgroup_to_blkio_cgroup(cgrp),
+			  blkg_prfill_weight_device, BLKIO_POLICY_PROP, 0,
+			  false);
+	return 0;
+}
+
+static int blkcg_print_weight(struct cgroup *cgrp, struct cftype *cft,
+			      struct seq_file *sf)
+{
+	seq_printf(sf, "%u\n", cgroup_to_blkio_cgroup(cgrp)->weight);
+	return 0;
+}
+
+static int blkcg_set_weight_device(struct cgroup *cgrp, struct cftype *cft,
+				   const char *buf)
+{
+	struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp);
+	struct blkg_policy_data *pd;
+	struct blkg_conf_ctx ctx;
+	int ret;
+
+	ret = blkg_conf_prep(blkcg, buf, &ctx);
+	if (ret)
+		return ret;
+
+	ret = -EINVAL;
+	pd = ctx.blkg->pd[BLKIO_POLICY_PROP];
+	if (pd && (!ctx.v || (ctx.v >= BLKIO_WEIGHT_MIN &&
+			      ctx.v <= BLKIO_WEIGHT_MAX))) {
+		pd->conf.weight = ctx.v;
+		cfq_update_blkio_group_weight(ctx.blkg, ctx.v ?: blkcg->weight);
+		ret = 0;
+	}
+
+	blkg_conf_finish(&ctx);
+	return ret;
+}
+
+static int blkcg_set_weight(struct cgroup *cgrp, struct cftype *cft, u64 val)
+{
+	struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp);
+	struct blkio_group *blkg;
+	struct hlist_node *n;
+
+	if (val < BLKIO_WEIGHT_MIN || val > BLKIO_WEIGHT_MAX)
+		return -EINVAL;
+
+	spin_lock_irq(&blkcg->lock);
+	blkcg->weight = (unsigned int)val;
+
+	hlist_for_each_entry(blkg, n, &blkcg->blkg_list, blkcg_node) {
+		struct blkg_policy_data *pd = blkg->pd[BLKIO_POLICY_PROP];
+
+		if (pd && !pd->conf.weight)
+			cfq_update_blkio_group_weight(blkg, blkcg->weight);
+	}
+
+	spin_unlock_irq(&blkcg->lock);
+	return 0;
+}
+
+#ifdef CONFIG_DEBUG_BLK_CGROUP
+static u64 blkg_prfill_avg_queue_size(struct seq_file *sf,
+				      struct blkg_policy_data *pd, int off)
+{
+	u64 samples = blkg_stat_read(&pd->stats.avg_queue_size_samples);
+	u64 v = 0;
+
+	if (samples) {
+		v = blkg_stat_read(&pd->stats.avg_queue_size_sum);
+		do_div(v, samples);
+	}
+	__blkg_prfill_u64(sf, pd, v);
+	return 0;
+}
+
+/* print avg_queue_size */
+static int blkcg_print_avg_queue_size(struct cgroup *cgrp, struct cftype *cft,
+				      struct seq_file *sf)
+{
+	struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp);
+
+	blkcg_print_blkgs(sf, blkcg, blkg_prfill_avg_queue_size,
+			  BLKIO_POLICY_PROP, 0, false);
+	return 0;
+}
+#endif	/* CONFIG_DEBUG_BLK_CGROUP */
+
+static struct cftype cfq_blkcg_files[] = {
+	{
+		.name = "weight_device",
+		.read_seq_string = blkcg_print_weight_device,
+		.write_string = blkcg_set_weight_device,
+		.max_write_len = 256,
+	},
+	{
+		.name = "weight",
+		.read_seq_string = blkcg_print_weight,
+		.write_u64 = blkcg_set_weight,
+	},
+	{
+		.name = "time",
+		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
+				offsetof(struct blkio_group_stats, time)),
+		.read_seq_string = blkcg_print_stat,
+	},
+	{
+		.name = "sectors",
+		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
+				offsetof(struct blkio_group_stats_cpu, sectors)),
+		.read_seq_string = blkcg_print_cpu_stat,
+	},
+	{
+		.name = "io_service_bytes",
+		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
+				offsetof(struct blkio_group_stats_cpu, service_bytes)),
+		.read_seq_string = blkcg_print_cpu_rwstat,
+	},
+	{
+		.name = "io_serviced",
+		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
+				offsetof(struct blkio_group_stats_cpu, serviced)),
+		.read_seq_string = blkcg_print_cpu_rwstat,
+	},
+	{
+		.name = "io_service_time",
+		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
+				offsetof(struct blkio_group_stats, service_time)),
+		.read_seq_string = blkcg_print_rwstat,
+	},
+	{
+		.name = "io_wait_time",
+		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
+				offsetof(struct blkio_group_stats, wait_time)),
+		.read_seq_string = blkcg_print_rwstat,
+	},
+	{
+		.name = "io_merged",
+		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
+				offsetof(struct blkio_group_stats, merged)),
+		.read_seq_string = blkcg_print_rwstat,
+	},
+	{
+		.name = "io_queued",
+		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
+				offsetof(struct blkio_group_stats, queued)),
+		.read_seq_string = blkcg_print_rwstat,
+	},
+#ifdef CONFIG_DEBUG_BLK_CGROUP
+	{
+		.name = "avg_queue_size",
+		.read_seq_string = blkcg_print_avg_queue_size,
+	},
+	{
+		.name = "group_wait_time",
+		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
+				offsetof(struct blkio_group_stats, group_wait_time)),
+		.read_seq_string = blkcg_print_stat,
+	},
+	{
+		.name = "idle_time",
+		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
+				offsetof(struct blkio_group_stats, idle_time)),
+		.read_seq_string = blkcg_print_stat,
+	},
+	{
+		.name = "empty_time",
+		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
+				offsetof(struct blkio_group_stats, empty_time)),
+		.read_seq_string = blkcg_print_stat,
+	},
+	{
+		.name = "dequeue",
+		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
+				offsetof(struct blkio_group_stats, dequeue)),
+		.read_seq_string = blkcg_print_stat,
+	},
+	{
+		.name = "unaccounted_time",
+		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
+				offsetof(struct blkio_group_stats, unaccounted_time)),
+		.read_seq_string = blkcg_print_stat,
+	},
+#endif	/* CONFIG_DEBUG_BLK_CGROUP */
+	{ }	/* terminate */
+};
 #else /* GROUP_IOSCHED */
 static struct cfq_group *cfq_lookup_create_cfqg(struct cfq_data *cfqd,
 						struct blkio_cgroup *blkcg)
@@ -3715,10 +3911,10 @@ static struct elevator_type iosched_cfq = {
 static struct blkio_policy_type blkio_policy_cfq = {
 	.ops = {
 		.blkio_init_group_fn =		cfq_init_blkio_group,
-		.blkio_update_group_weight_fn =	cfq_update_blkio_group_weight,
 	},
 	.plid = BLKIO_POLICY_PROP,
 	.pdata_size = sizeof(struct cfq_group),
+	.cftypes = cfq_blkcg_files,
 };
 #endif
 
-- 
1.7.7.3



* [PATCH 12/21] cfq: collapse cfq.h into cfq-iosched.c
  2012-03-28 22:51 [PATCHSET] block: modularize blkcg config and stat file handling Tejun Heo
                   ` (10 preceding siblings ...)
  2012-03-28 22:51 ` [PATCH 11/21] blkcg: move conf/stat file handling code to policies Tejun Heo
@ 2012-03-28 22:51 ` Tejun Heo
  2012-03-28 22:51 ` [PATCH 13/21] blkcg: move statistics update code to policies Tejun Heo
                   ` (11 subsequent siblings)
  23 siblings, 0 replies; 56+ messages in thread
From: Tejun Heo @ 2012-03-28 22:51 UTC (permalink / raw)
  To: axboe; +Cc: vgoyal, ctalbott, rni, linux-kernel, cgroups, containers, Tejun Heo

block/cfq.h contains some functions which interact with blkcg;
however, they cover only part of that interaction and cfq-iosched.c
already carries quite a few #ifdef CONFIG_CFQ_GROUP_IOSCHED blocks.
With conf/stat handling being moved into the policies themselves,
keeping these relay functions isolated in cfq.h doesn't make much
sense.  Collapse cfq.h into cfq-iosched.c for now.  Let's split blkcg
support out properly later if necessary.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
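As an aside, the relay functions being collapsed are all thin static
inline wrappers of roughly the shape below (this one is copied
verbatim from the hunks that follow); the collapse simply relocates
them under the existing #ifdef CONFIG_CFQ_GROUP_IOSCHED in
cfq-iosched.c without changing their behavior.

static inline void cfq_blkiocg_update_dequeue_stats(struct blkio_group *blkg,
			struct blkio_policy_type *pol, unsigned long dequeue)
{
	blkiocg_update_dequeue_stats(blkg, pol, dequeue);
}
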
 block/cfq-iosched.c |  114 ++++++++++++++++++++++++++++++++++++++++++++++++-
 block/cfq.h         |  118 ---------------------------------------------------
 2 files changed, 113 insertions(+), 119 deletions(-)
 delete mode 100644 block/cfq.h

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 119e061..2e13e9e 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -15,7 +15,6 @@
 #include <linux/ioprio.h>
 #include <linux/blktrace_api.h>
 #include "blk.h"
-#include "cfq.h"
 
 static struct blkio_policy_type blkio_policy_cfq;
 
@@ -367,6 +366,9 @@ CFQ_CFQQ_FNS(wait_busy);
 #undef CFQ_CFQQ_FNS
 
 #ifdef CONFIG_CFQ_GROUP_IOSCHED
+
+#include "blk-cgroup.h"
+
 static inline struct cfq_group *blkg_to_cfqg(struct blkio_group *blkg)
 {
 	return blkg_to_pdata(blkg, &blkio_policy_cfq);
@@ -396,6 +398,82 @@ static inline void cfqg_put(struct cfq_group *cfqg)
 	blk_add_trace_msg((cfqd)->queue, "%s " fmt,			\
 			blkg_path(cfqg_to_blkg((cfqg))), ##args)	\
 
+static inline void cfq_blkiocg_update_io_add_stats(struct blkio_group *blkg,
+			struct blkio_policy_type *pol,
+			struct blkio_group *curr_blkg,
+			bool direction, bool sync)
+{
+	blkiocg_update_io_add_stats(blkg, pol, curr_blkg, direction, sync);
+}
+
+static inline void cfq_blkiocg_update_dequeue_stats(struct blkio_group *blkg,
+			struct blkio_policy_type *pol, unsigned long dequeue)
+{
+	blkiocg_update_dequeue_stats(blkg, pol, dequeue);
+}
+
+static inline void cfq_blkiocg_update_timeslice_used(struct blkio_group *blkg,
+			struct blkio_policy_type *pol, unsigned long time,
+			unsigned long unaccounted_time)
+{
+	blkiocg_update_timeslice_used(blkg, pol, time, unaccounted_time);
+}
+
+static inline void cfq_blkiocg_set_start_empty_time(struct blkio_group *blkg,
+			struct blkio_policy_type *pol)
+{
+	blkiocg_set_start_empty_time(blkg, pol);
+}
+
+static inline void cfq_blkiocg_update_io_remove_stats(struct blkio_group *blkg,
+			struct blkio_policy_type *pol, bool direction,
+			bool sync)
+{
+	blkiocg_update_io_remove_stats(blkg, pol, direction, sync);
+}
+
+static inline void cfq_blkiocg_update_io_merged_stats(struct blkio_group *blkg,
+			struct blkio_policy_type *pol, bool direction,
+			bool sync)
+{
+	blkiocg_update_io_merged_stats(blkg, pol, direction, sync);
+}
+
+static inline void cfq_blkiocg_update_idle_time_stats(struct blkio_group *blkg,
+			struct blkio_policy_type *pol)
+{
+	blkiocg_update_idle_time_stats(blkg, pol);
+}
+
+static inline void
+cfq_blkiocg_update_avg_queue_size_stats(struct blkio_group *blkg,
+			struct blkio_policy_type *pol)
+{
+	blkiocg_update_avg_queue_size_stats(blkg, pol);
+}
+
+static inline void
+cfq_blkiocg_update_set_idle_time_stats(struct blkio_group *blkg,
+			struct blkio_policy_type *pol)
+{
+	blkiocg_update_set_idle_time_stats(blkg, pol);
+}
+
+static inline void cfq_blkiocg_update_dispatch_stats(struct blkio_group *blkg,
+			struct blkio_policy_type *pol, uint64_t bytes,
+			bool direction, bool sync)
+{
+	blkiocg_update_dispatch_stats(blkg, pol, bytes, direction, sync);
+}
+
+static inline void cfq_blkiocg_update_completion_stats(struct blkio_group *blkg,
+			struct blkio_policy_type *pol, uint64_t start_time,
+			uint64_t io_start_time, bool direction, bool sync)
+{
+	blkiocg_update_completion_stats(blkg, pol, start_time, io_start_time,
+					direction, sync);
+}
+
 #else	/* CONFIG_CFQ_GROUP_IOSCHED */
 
 static inline struct cfq_group *blkg_to_cfqg(struct blkio_group *blkg) { return NULL; }
@@ -407,6 +485,40 @@ static inline void cfqg_put(struct cfq_group *cfqg) { }
 	blk_add_trace_msg((cfqd)->queue, "cfq%d " fmt, (cfqq)->pid, ##args)
 #define cfq_log_cfqg(cfqd, cfqg, fmt, args...)		do {} while (0)
 
+static inline void cfq_blkiocg_update_io_add_stats(struct blkio_group *blkg,
+			struct blkio_policy_type *pol,
+			struct blkio_group *curr_blkg, bool direction,
+			bool sync) { }
+static inline void cfq_blkiocg_update_dequeue_stats(struct blkio_group *blkg,
+			struct blkio_policy_type *pol, unsigned long dequeue) { }
+static inline void cfq_blkiocg_update_timeslice_used(struct blkio_group *blkg,
+			struct blkio_policy_type *pol, unsigned long time,
+			unsigned long unaccounted_time) { }
+static inline void cfq_blkiocg_set_start_empty_time(struct blkio_group *blkg,
+			struct blkio_policy_type *pol) { }
+static inline void cfq_blkiocg_update_io_remove_stats(struct blkio_group *blkg,
+			struct blkio_policy_type *pol, bool direction,
+			bool sync) { }
+static inline void cfq_blkiocg_update_io_merged_stats(struct blkio_group *blkg,
+			struct blkio_policy_type *pol, bool direction,
+			bool sync) { }
+static inline void cfq_blkiocg_update_idle_time_stats(struct blkio_group *blkg,
+			struct blkio_policy_type *pol) { }
+static inline void
+cfq_blkiocg_update_avg_queue_size_stats(struct blkio_group *blkg,
+					struct blkio_policy_type *pol) { }
+
+static inline void
+cfq_blkiocg_update_set_idle_time_stats(struct blkio_group *blkg,
+				       struct blkio_policy_type *pol) { }
+
+static inline void cfq_blkiocg_update_dispatch_stats(struct blkio_group *blkg,
+			struct blkio_policy_type *pol, uint64_t bytes,
+			bool direction, bool sync) { }
+static inline void cfq_blkiocg_update_completion_stats(struct blkio_group *blkg,
+			struct blkio_policy_type *pol, uint64_t start_time,
+			uint64_t io_start_time, bool direction, bool sync) { }
+
 #endif	/* CONFIG_CFQ_GROUP_IOSCHED */
 
 #define cfq_log(cfqd, fmt, args...)	\
diff --git a/block/cfq.h b/block/cfq.h
deleted file mode 100644
index c8b15ef..0000000
--- a/block/cfq.h
+++ /dev/null
@@ -1,118 +0,0 @@
-#ifndef _CFQ_H
-#define _CFQ_H
-#include "blk-cgroup.h"
-
-#ifdef CONFIG_CFQ_GROUP_IOSCHED
-static inline void cfq_blkiocg_update_io_add_stats(struct blkio_group *blkg,
-			struct blkio_policy_type *pol,
-			struct blkio_group *curr_blkg,
-			bool direction, bool sync)
-{
-	blkiocg_update_io_add_stats(blkg, pol, curr_blkg, direction, sync);
-}
-
-static inline void cfq_blkiocg_update_dequeue_stats(struct blkio_group *blkg,
-			struct blkio_policy_type *pol, unsigned long dequeue)
-{
-	blkiocg_update_dequeue_stats(blkg, pol, dequeue);
-}
-
-static inline void cfq_blkiocg_update_timeslice_used(struct blkio_group *blkg,
-			struct blkio_policy_type *pol, unsigned long time,
-			unsigned long unaccounted_time)
-{
-	blkiocg_update_timeslice_used(blkg, pol, time, unaccounted_time);
-}
-
-static inline void cfq_blkiocg_set_start_empty_time(struct blkio_group *blkg,
-			struct blkio_policy_type *pol)
-{
-	blkiocg_set_start_empty_time(blkg, pol);
-}
-
-static inline void cfq_blkiocg_update_io_remove_stats(struct blkio_group *blkg,
-			struct blkio_policy_type *pol, bool direction,
-			bool sync)
-{
-	blkiocg_update_io_remove_stats(blkg, pol, direction, sync);
-}
-
-static inline void cfq_blkiocg_update_io_merged_stats(struct blkio_group *blkg,
-			struct blkio_policy_type *pol, bool direction,
-			bool sync)
-{
-	blkiocg_update_io_merged_stats(blkg, pol, direction, sync);
-}
-
-static inline void cfq_blkiocg_update_idle_time_stats(struct blkio_group *blkg,
-			struct blkio_policy_type *pol)
-{
-	blkiocg_update_idle_time_stats(blkg, pol);
-}
-
-static inline void
-cfq_blkiocg_update_avg_queue_size_stats(struct blkio_group *blkg,
-			struct blkio_policy_type *pol)
-{
-	blkiocg_update_avg_queue_size_stats(blkg, pol);
-}
-
-static inline void
-cfq_blkiocg_update_set_idle_time_stats(struct blkio_group *blkg,
-			struct blkio_policy_type *pol)
-{
-	blkiocg_update_set_idle_time_stats(blkg, pol);
-}
-
-static inline void cfq_blkiocg_update_dispatch_stats(struct blkio_group *blkg,
-			struct blkio_policy_type *pol, uint64_t bytes,
-			bool direction, bool sync)
-{
-	blkiocg_update_dispatch_stats(blkg, pol, bytes, direction, sync);
-}
-
-static inline void cfq_blkiocg_update_completion_stats(struct blkio_group *blkg,
-			struct blkio_policy_type *pol, uint64_t start_time,
-			uint64_t io_start_time, bool direction, bool sync)
-{
-	blkiocg_update_completion_stats(blkg, pol, start_time, io_start_time,
-					direction, sync);
-}
-
-#else /* CFQ_GROUP_IOSCHED */
-static inline void cfq_blkiocg_update_io_add_stats(struct blkio_group *blkg,
-			struct blkio_policy_type *pol,
-			struct blkio_group *curr_blkg, bool direction,
-			bool sync) { }
-static inline void cfq_blkiocg_update_dequeue_stats(struct blkio_group *blkg,
-			struct blkio_policy_type *pol, unsigned long dequeue) { }
-static inline void cfq_blkiocg_update_timeslice_used(struct blkio_group *blkg,
-			struct blkio_policy_type *pol, unsigned long time,
-			unsigned long unaccounted_time) { }
-static inline void cfq_blkiocg_set_start_empty_time(struct blkio_group *blkg,
-			struct blkio_policy_type *pol) { }
-static inline void cfq_blkiocg_update_io_remove_stats(struct blkio_group *blkg,
-			struct blkio_policy_type *pol, bool direction,
-			bool sync) { }
-static inline void cfq_blkiocg_update_io_merged_stats(struct blkio_group *blkg,
-			struct blkio_policy_type *pol, bool direction,
-			bool sync) { }
-static inline void cfq_blkiocg_update_idle_time_stats(struct blkio_group *blkg,
-			struct blkio_policy_type *pol) { }
-static inline void
-cfq_blkiocg_update_avg_queue_size_stats(struct blkio_group *blkg,
-					struct blkio_policy_type *pol) { }
-
-static inline void
-cfq_blkiocg_update_set_idle_time_stats(struct blkio_group *blkg,
-				       struct blkio_policy_type *pol) { }
-
-static inline void cfq_blkiocg_update_dispatch_stats(struct blkio_group *blkg,
-			struct blkio_policy_type *pol, uint64_t bytes,
-			bool direction, bool sync) { }
-static inline void cfq_blkiocg_update_completion_stats(struct blkio_group *blkg,
-			struct blkio_policy_type *pol, uint64_t start_time,
-			uint64_t io_start_time, bool direction, bool sync) { }
-
-#endif /* CFQ_GROUP_IOSCHED */
-#endif
-- 
1.7.7.3



* [PATCH 13/21] blkcg: move statistics update code to policies
  2012-03-28 22:51 [PATCHSET] block: modularize blkcg config and stat file handling Tejun Heo
                   ` (11 preceding siblings ...)
  2012-03-28 22:51 ` [PATCH 12/21] cfq: collapse cfq.h into cfq-iosched.c Tejun Heo
@ 2012-03-28 22:51 ` Tejun Heo
  2012-03-28 22:51 ` [PATCH 14/21] blkcg: cfq doesn't need per-cpu dispatch stats Tejun Heo
                   ` (10 subsequent siblings)
  23 siblings, 0 replies; 56+ messages in thread
From: Tejun Heo @ 2012-03-28 22:51 UTC (permalink / raw)
  To: axboe; +Cc: vgoyal, ctalbott, rni, linux-kernel, cgroups, containers, Tejun Heo

As with the conf/stat file handling code, there's no reason for the
stat update code to live in blkcg core with policies calling into it
to update their stats.  The current organization is both inflexible
and complex.

This patch moves the stat update code into the specific policies.  All
blkiocg_update_*_stats() functions which deal with BLKIO_POLICY_PROP
stats are collapsed into their cfq_blkiocg_update_*_stats()
counterparts.  blkiocg_update_dispatch_stats() is used by both
policies and is duplicated as throtl_update_dispatch_stats() and
cfq_blkiocg_update_dispatch_stats().  This will be cleaned up later.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
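To make the shape of the conversion concrete, here is one of the
simpler helpers before and after; both versions are lifted from the
hunks below, nothing here is new code.  Before, cfq relayed into
blkcg core:

static inline void cfq_blkiocg_update_io_merged_stats(struct blkio_group *blkg,
			struct blkio_policy_type *pol, bool direction,
			bool sync)
{
	blkiocg_update_io_merged_stats(blkg, pol, direction, sync);
}

After, the update is done directly in cfq-iosched.c on the policy's
own per-group stats:

static inline void cfq_blkiocg_update_io_merged_stats(struct blkio_group *blkg,
			struct blkio_policy_type *pol, bool direction,
			bool sync)
{
	struct blkio_group_stats *stats = &blkg->pd[pol->plid]->stats;
	int rw = (direction ? REQ_WRITE : 0) | (sync ? REQ_SYNC : 0);

	lockdep_assert_held(blkg->q->queue_lock);

	blkg_rwstat_add(&stats->merged, rw, 1);
}
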
 block/blk-cgroup.c   |  245 -------------------------------------------
 block/blk-cgroup.h   |   94 -----------------
 block/blk-throttle.c |   37 ++++++--
 block/cfq-iosched.c  |  280 +++++++++++++++++++++++++++++++++++++++++---------
 4 files changed, 259 insertions(+), 397 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 96b6b5a..dfa5f2c 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -63,251 +63,6 @@ struct blkio_cgroup *bio_blkio_cgroup(struct bio *bio)
 }
 EXPORT_SYMBOL_GPL(bio_blkio_cgroup);
 
-#ifdef CONFIG_DEBUG_BLK_CGROUP
-/* This should be called with the queue_lock held. */
-static void blkio_set_start_group_wait_time(struct blkio_group *blkg,
-					    struct blkio_policy_type *pol,
-					    struct blkio_group *curr_blkg)
-{
-	struct blkg_policy_data *pd = blkg->pd[pol->plid];
-
-	if (blkio_blkg_waiting(&pd->stats))
-		return;
-	if (blkg == curr_blkg)
-		return;
-	pd->stats.start_group_wait_time = sched_clock();
-	blkio_mark_blkg_waiting(&pd->stats);
-}
-
-/* This should be called with the queue_lock held. */
-static void blkio_update_group_wait_time(struct blkio_group_stats *stats)
-{
-	unsigned long long now;
-
-	if (!blkio_blkg_waiting(stats))
-		return;
-
-	now = sched_clock();
-	if (time_after64(now, stats->start_group_wait_time))
-		blkg_stat_add(&stats->group_wait_time,
-			      now - stats->start_group_wait_time);
-	blkio_clear_blkg_waiting(stats);
-}
-
-/* This should be called with the queue_lock held. */
-static void blkio_end_empty_time(struct blkio_group_stats *stats)
-{
-	unsigned long long now;
-
-	if (!blkio_blkg_empty(stats))
-		return;
-
-	now = sched_clock();
-	if (time_after64(now, stats->start_empty_time))
-		blkg_stat_add(&stats->empty_time,
-			      now - stats->start_empty_time);
-	blkio_clear_blkg_empty(stats);
-}
-
-void blkiocg_update_set_idle_time_stats(struct blkio_group *blkg,
-					struct blkio_policy_type *pol)
-{
-	struct blkio_group_stats *stats = &blkg->pd[pol->plid]->stats;
-
-	lockdep_assert_held(blkg->q->queue_lock);
-	BUG_ON(blkio_blkg_idling(stats));
-
-	stats->start_idle_time = sched_clock();
-	blkio_mark_blkg_idling(stats);
-}
-EXPORT_SYMBOL_GPL(blkiocg_update_set_idle_time_stats);
-
-void blkiocg_update_idle_time_stats(struct blkio_group *blkg,
-				    struct blkio_policy_type *pol)
-{
-	struct blkio_group_stats *stats = &blkg->pd[pol->plid]->stats;
-
-	lockdep_assert_held(blkg->q->queue_lock);
-
-	if (blkio_blkg_idling(stats)) {
-		unsigned long long now = sched_clock();
-
-		if (time_after64(now, stats->start_idle_time))
-			blkg_stat_add(&stats->idle_time,
-				      now - stats->start_idle_time);
-		blkio_clear_blkg_idling(stats);
-	}
-}
-EXPORT_SYMBOL_GPL(blkiocg_update_idle_time_stats);
-
-void blkiocg_update_avg_queue_size_stats(struct blkio_group *blkg,
-					 struct blkio_policy_type *pol)
-{
-	struct blkio_group_stats *stats = &blkg->pd[pol->plid]->stats;
-
-	lockdep_assert_held(blkg->q->queue_lock);
-
-	blkg_stat_add(&stats->avg_queue_size_sum,
-		      blkg_rwstat_sum(&stats->queued));
-	blkg_stat_add(&stats->avg_queue_size_samples, 1);
-	blkio_update_group_wait_time(stats);
-}
-EXPORT_SYMBOL_GPL(blkiocg_update_avg_queue_size_stats);
-
-void blkiocg_set_start_empty_time(struct blkio_group *blkg,
-				  struct blkio_policy_type *pol)
-{
-	struct blkio_group_stats *stats = &blkg->pd[pol->plid]->stats;
-
-	lockdep_assert_held(blkg->q->queue_lock);
-
-	if (blkg_rwstat_sum(&stats->queued))
-		return;
-
-	/*
-	 * group is already marked empty. This can happen if cfqq got new
-	 * request in parent group and moved to this group while being added
-	 * to service tree. Just ignore the event and move on.
-	 */
-	if (blkio_blkg_empty(stats))
-		return;
-
-	stats->start_empty_time = sched_clock();
-	blkio_mark_blkg_empty(stats);
-}
-EXPORT_SYMBOL_GPL(blkiocg_set_start_empty_time);
-
-void blkiocg_update_dequeue_stats(struct blkio_group *blkg,
-				  struct blkio_policy_type *pol,
-				  unsigned long dequeue)
-{
-	struct blkg_policy_data *pd = blkg->pd[pol->plid];
-
-	lockdep_assert_held(blkg->q->queue_lock);
-
-	blkg_stat_add(&pd->stats.dequeue, dequeue);
-}
-EXPORT_SYMBOL_GPL(blkiocg_update_dequeue_stats);
-#else
-static inline void blkio_set_start_group_wait_time(struct blkio_group *blkg,
-					struct blkio_policy_type *pol,
-					struct blkio_group *curr_blkg) { }
-static inline void blkio_end_empty_time(struct blkio_group_stats *stats) { }
-#endif
-
-void blkiocg_update_io_add_stats(struct blkio_group *blkg,
-				 struct blkio_policy_type *pol,
-				 struct blkio_group *curr_blkg, bool direction,
-				 bool sync)
-{
-	struct blkio_group_stats *stats = &blkg->pd[pol->plid]->stats;
-	int rw = (direction ? REQ_WRITE : 0) | (sync ? REQ_SYNC : 0);
-
-	lockdep_assert_held(blkg->q->queue_lock);
-
-	blkg_rwstat_add(&stats->queued, rw, 1);
-	blkio_end_empty_time(stats);
-	blkio_set_start_group_wait_time(blkg, pol, curr_blkg);
-}
-EXPORT_SYMBOL_GPL(blkiocg_update_io_add_stats);
-
-void blkiocg_update_io_remove_stats(struct blkio_group *blkg,
-				    struct blkio_policy_type *pol,
-				    bool direction, bool sync)
-{
-	struct blkio_group_stats *stats = &blkg->pd[pol->plid]->stats;
-	int rw = (direction ? REQ_WRITE : 0) | (sync ? REQ_SYNC : 0);
-
-	lockdep_assert_held(blkg->q->queue_lock);
-
-	blkg_rwstat_add(&stats->queued, rw, -1);
-}
-EXPORT_SYMBOL_GPL(blkiocg_update_io_remove_stats);
-
-void blkiocg_update_timeslice_used(struct blkio_group *blkg,
-				   struct blkio_policy_type *pol,
-				   unsigned long time,
-				   unsigned long unaccounted_time)
-{
-	struct blkio_group_stats *stats = &blkg->pd[pol->plid]->stats;
-
-	lockdep_assert_held(blkg->q->queue_lock);
-
-	blkg_stat_add(&stats->time, time);
-#ifdef CONFIG_DEBUG_BLK_CGROUP
-	blkg_stat_add(&stats->unaccounted_time, unaccounted_time);
-#endif
-}
-EXPORT_SYMBOL_GPL(blkiocg_update_timeslice_used);
-
-/*
- * should be called under rcu read lock or queue lock to make sure blkg pointer
- * is valid.
- */
-void blkiocg_update_dispatch_stats(struct blkio_group *blkg,
-				   struct blkio_policy_type *pol,
-				   uint64_t bytes, bool direction, bool sync)
-{
-	int rw = (direction ? REQ_WRITE : 0) | (sync ? REQ_SYNC : 0);
-	struct blkg_policy_data *pd = blkg->pd[pol->plid];
-	struct blkio_group_stats_cpu *stats_cpu;
-	unsigned long flags;
-
-	/* If per cpu stats are not allocated yet, don't do any accounting. */
-	if (pd->stats_cpu == NULL)
-		return;
-
-	/*
-	 * Disabling interrupts to provide mutual exclusion between two
-	 * writes on same cpu. It probably is not needed for 64bit. Not
-	 * optimizing that case yet.
-	 */
-	local_irq_save(flags);
-
-	stats_cpu = this_cpu_ptr(pd->stats_cpu);
-
-	blkg_stat_add(&stats_cpu->sectors, bytes >> 9);
-	blkg_rwstat_add(&stats_cpu->serviced, rw, 1);
-	blkg_rwstat_add(&stats_cpu->service_bytes, rw, bytes);
-
-	local_irq_restore(flags);
-}
-EXPORT_SYMBOL_GPL(blkiocg_update_dispatch_stats);
-
-void blkiocg_update_completion_stats(struct blkio_group *blkg,
-				     struct blkio_policy_type *pol,
-				     uint64_t start_time,
-				     uint64_t io_start_time, bool direction,
-				     bool sync)
-{
-	struct blkio_group_stats *stats = &blkg->pd[pol->plid]->stats;
-	unsigned long long now = sched_clock();
-	int rw = (direction ? REQ_WRITE : 0) | (sync ? REQ_SYNC : 0);
-
-	lockdep_assert_held(blkg->q->queue_lock);
-
-	if (time_after64(now, io_start_time))
-		blkg_rwstat_add(&stats->service_time, rw, now - io_start_time);
-	if (time_after64(io_start_time, start_time))
-		blkg_rwstat_add(&stats->wait_time, rw,
-				io_start_time - start_time);
-}
-EXPORT_SYMBOL_GPL(blkiocg_update_completion_stats);
-
-/*  Merged stats are per cpu.  */
-void blkiocg_update_io_merged_stats(struct blkio_group *blkg,
-				    struct blkio_policy_type *pol,
-				    bool direction, bool sync)
-{
-	struct blkio_group_stats *stats = &blkg->pd[pol->plid]->stats;
-	int rw = (direction ? REQ_WRITE : 0) | (sync ? REQ_SYNC : 0);
-
-	lockdep_assert_held(blkg->q->queue_lock);
-
-	blkg_rwstat_add(&stats->merged, rw, 1);
-}
-EXPORT_SYMBOL_GPL(blkiocg_update_io_merged_stats);
-
 /*
  * Worker for allocating per cpu stat for blk groups. This is scheduled on
  * the system_nrt_wq once there are some groups on the alloc_list waiting
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index ba64b28..0b0a176 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -44,13 +44,6 @@ enum blkg_rwstat_type {
 	BLKG_RWSTAT_TOTAL = BLKG_RWSTAT_NR,
 };
 
-/* blkg state flags */
-enum blkg_state_flags {
-	BLKG_waiting = 0,
-	BLKG_idling,
-	BLKG_empty,
-};
-
 struct blkio_cgroup {
 	struct cgroup_subsys_state css;
 	unsigned int weight;
@@ -416,52 +409,6 @@ static inline void blkg_put(struct blkio_group *blkg) { }
 #define BLKIO_WEIGHT_MAX	1000
 #define BLKIO_WEIGHT_DEFAULT	500
 
-#ifdef CONFIG_DEBUG_BLK_CGROUP
-void blkiocg_update_avg_queue_size_stats(struct blkio_group *blkg,
-					 struct blkio_policy_type *pol);
-void blkiocg_update_dequeue_stats(struct blkio_group *blkg,
-				  struct blkio_policy_type *pol,
-				  unsigned long dequeue);
-void blkiocg_update_set_idle_time_stats(struct blkio_group *blkg,
-					struct blkio_policy_type *pol);
-void blkiocg_update_idle_time_stats(struct blkio_group *blkg,
-				    struct blkio_policy_type *pol);
-void blkiocg_set_start_empty_time(struct blkio_group *blkg,
-				  struct blkio_policy_type *pol);
-
-#define BLKG_FLAG_FNS(name)						\
-static inline void blkio_mark_blkg_##name(				\
-		struct blkio_group_stats *stats)			\
-{									\
-	stats->flags |= (1 << BLKG_##name);				\
-}									\
-static inline void blkio_clear_blkg_##name(				\
-		struct blkio_group_stats *stats)			\
-{									\
-	stats->flags &= ~(1 << BLKG_##name);				\
-}									\
-static inline int blkio_blkg_##name(struct blkio_group_stats *stats)	\
-{									\
-	return (stats->flags & (1 << BLKG_##name)) != 0;		\
-}									\
-
-BLKG_FLAG_FNS(waiting)
-BLKG_FLAG_FNS(idling)
-BLKG_FLAG_FNS(empty)
-#undef BLKG_FLAG_FNS
-#else
-static inline void blkiocg_update_avg_queue_size_stats(struct blkio_group *blkg,
-			struct blkio_policy_type *pol) { }
-static inline void blkiocg_update_dequeue_stats(struct blkio_group *blkg,
-			struct blkio_policy_type *pol, unsigned long dequeue) { }
-static inline void blkiocg_update_set_idle_time_stats(struct blkio_group *blkg,
-			struct blkio_policy_type *pol) { }
-static inline void blkiocg_update_idle_time_stats(struct blkio_group *blkg,
-			struct blkio_policy_type *pol) { }
-static inline void blkiocg_set_start_empty_time(struct blkio_group *blkg,
-			struct blkio_policy_type *pol) { }
-#endif
-
 #ifdef CONFIG_BLK_CGROUP
 extern struct blkio_cgroup blkio_root_cgroup;
 extern struct blkio_cgroup *cgroup_to_blkio_cgroup(struct cgroup *cgroup);
@@ -471,28 +418,6 @@ extern struct blkio_group *blkg_lookup(struct blkio_cgroup *blkcg,
 struct blkio_group *blkg_lookup_create(struct blkio_cgroup *blkcg,
 				       struct request_queue *q,
 				       bool for_root);
-void blkiocg_update_timeslice_used(struct blkio_group *blkg,
-				   struct blkio_policy_type *pol,
-				   unsigned long time,
-				   unsigned long unaccounted_time);
-void blkiocg_update_dispatch_stats(struct blkio_group *blkg,
-				   struct blkio_policy_type *pol,
-				   uint64_t bytes, bool direction, bool sync);
-void blkiocg_update_completion_stats(struct blkio_group *blkg,
-				     struct blkio_policy_type *pol,
-				     uint64_t start_time,
-				     uint64_t io_start_time, bool direction,
-				     bool sync);
-void blkiocg_update_io_merged_stats(struct blkio_group *blkg,
-				    struct blkio_policy_type *pol,
-				    bool direction, bool sync);
-void blkiocg_update_io_add_stats(struct blkio_group *blkg,
-				 struct blkio_policy_type *pol,
-				 struct blkio_group *curr_blkg, bool direction,
-				 bool sync);
-void blkiocg_update_io_remove_stats(struct blkio_group *blkg,
-				    struct blkio_policy_type *pol,
-				    bool direction, bool sync);
 #else
 struct cgroup;
 static inline struct blkio_cgroup *
@@ -502,24 +427,5 @@ bio_blkio_cgroup(struct bio *bio) { return NULL; }
 
 static inline struct blkio_group *blkg_lookup(struct blkio_cgroup *blkcg,
 					      void *key) { return NULL; }
-static inline void blkiocg_update_timeslice_used(struct blkio_group *blkg,
-			struct blkio_policy_type *pol, unsigned long time,
-			unsigned long unaccounted_time) { }
-static inline void blkiocg_update_dispatch_stats(struct blkio_group *blkg,
-			struct blkio_policy_type *pol, uint64_t bytes,
-			bool direction, bool sync) { }
-static inline void blkiocg_update_completion_stats(struct blkio_group *blkg,
-			struct blkio_policy_type *pol, uint64_t start_time,
-			uint64_t io_start_time, bool direction, bool sync) { }
-static inline void blkiocg_update_io_merged_stats(struct blkio_group *blkg,
-			struct blkio_policy_type *pol, bool direction,
-			bool sync) { }
-static inline void blkiocg_update_io_add_stats(struct blkio_group *blkg,
-			struct blkio_policy_type *pol,
-			struct blkio_group *curr_blkg, bool direction,
-			bool sync) { }
-static inline void blkiocg_update_io_remove_stats(struct blkio_group *blkg,
-			struct blkio_policy_type *pol, bool direction,
-			bool sync) { }
 #endif
 #endif /* _BLK_CGROUP_H */
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index fb6f257..5d647ed 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -562,17 +562,42 @@ static bool tg_may_dispatch(struct throtl_data *td, struct throtl_grp *tg,
 	return 0;
 }
 
+static void throtl_update_dispatch_stats(struct blkio_group *blkg, u64 bytes,
+					 int rw)
+{
+	struct blkg_policy_data *pd = blkg->pd[BLKIO_POLICY_THROTL];
+	struct blkio_group_stats_cpu *stats_cpu;
+	unsigned long flags;
+
+	/* If per cpu stats are not allocated yet, don't do any accounting. */
+	if (pd->stats_cpu == NULL)
+		return;
+
+	/*
+	 * Disabling interrupts to provide mutual exclusion between two
+	 * writes on same cpu. It probably is not needed for 64bit. Not
+	 * optimizing that case yet.
+	 */
+	local_irq_save(flags);
+
+	stats_cpu = this_cpu_ptr(pd->stats_cpu);
+
+	blkg_stat_add(&stats_cpu->sectors, bytes >> 9);
+	blkg_rwstat_add(&stats_cpu->serviced, rw, 1);
+	blkg_rwstat_add(&stats_cpu->service_bytes, rw, bytes);
+
+	local_irq_restore(flags);
+}
+
 static void throtl_charge_bio(struct throtl_grp *tg, struct bio *bio)
 {
 	bool rw = bio_data_dir(bio);
-	bool sync = rw_is_sync(bio->bi_rw);
 
 	/* Charge the bio to the group */
 	tg->bytes_disp[rw] += bio->bi_size;
 	tg->io_disp[rw]++;
 
-	blkiocg_update_dispatch_stats(tg_to_blkg(tg), &blkio_policy_throtl,
-				      bio->bi_size, rw, sync);
+	throtl_update_dispatch_stats(tg_to_blkg(tg), bio->bi_size, bio->bi_rw);
 }
 
 static void throtl_add_bio_tg(struct throtl_data *td, struct throtl_grp *tg,
@@ -1012,10 +1037,8 @@ bool blk_throtl_bio(struct request_queue *q, struct bio *bio)
 	tg = throtl_lookup_tg(td, blkcg);
 	if (tg) {
 		if (tg_no_rule_group(tg, rw)) {
-			blkiocg_update_dispatch_stats(tg_to_blkg(tg),
-						      &blkio_policy_throtl,
-						      bio->bi_size, rw,
-						      rw_is_sync(bio->bi_rw));
+			throtl_update_dispatch_stats(tg_to_blkg(tg),
+						     bio->bi_size, bio->bi_rw);
 			goto out_unlock_rcu;
 		}
 	}
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 2e13e9e..4991380 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -15,6 +15,7 @@
 #include <linux/ioprio.h>
 #include <linux/blktrace_api.h>
 #include "blk.h"
+#include "blk-cgroup.h"
 
 static struct blkio_policy_type blkio_policy_cfq;
 
@@ -365,9 +366,177 @@ CFQ_CFQQ_FNS(deep);
 CFQ_CFQQ_FNS(wait_busy);
 #undef CFQ_CFQQ_FNS
 
-#ifdef CONFIG_CFQ_GROUP_IOSCHED
+#if defined(CONFIG_CFQ_GROUP_IOSCHED) && defined(CONFIG_DEBUG_BLK_CGROUP)
 
-#include "blk-cgroup.h"
+/* blkg state flags */
+enum blkg_state_flags {
+	BLKG_waiting = 0,
+	BLKG_idling,
+	BLKG_empty,
+};
+
+#define BLKG_FLAG_FNS(name)						\
+static inline void blkio_mark_blkg_##name(				\
+		struct blkio_group_stats *stats)			\
+{									\
+	stats->flags |= (1 << BLKG_##name);				\
+}									\
+static inline void blkio_clear_blkg_##name(				\
+		struct blkio_group_stats *stats)			\
+{									\
+	stats->flags &= ~(1 << BLKG_##name);				\
+}									\
+static inline int blkio_blkg_##name(struct blkio_group_stats *stats)	\
+{									\
+	return (stats->flags & (1 << BLKG_##name)) != 0;		\
+}									\
+
+BLKG_FLAG_FNS(waiting)
+BLKG_FLAG_FNS(idling)
+BLKG_FLAG_FNS(empty)
+#undef BLKG_FLAG_FNS
+
+/* This should be called with the queue_lock held. */
+static void blkio_update_group_wait_time(struct blkio_group_stats *stats)
+{
+	unsigned long long now;
+
+	if (!blkio_blkg_waiting(stats))
+		return;
+
+	now = sched_clock();
+	if (time_after64(now, stats->start_group_wait_time))
+		blkg_stat_add(&stats->group_wait_time,
+			      now - stats->start_group_wait_time);
+	blkio_clear_blkg_waiting(stats);
+}
+
+/* This should be called with the queue_lock held. */
+static void blkio_set_start_group_wait_time(struct blkio_group *blkg,
+					    struct blkio_policy_type *pol,
+					    struct blkio_group *curr_blkg)
+{
+	struct blkg_policy_data *pd = blkg->pd[pol->plid];
+
+	if (blkio_blkg_waiting(&pd->stats))
+		return;
+	if (blkg == curr_blkg)
+		return;
+	pd->stats.start_group_wait_time = sched_clock();
+	blkio_mark_blkg_waiting(&pd->stats);
+}
+
+/* This should be called with the queue_lock held. */
+static void blkio_end_empty_time(struct blkio_group_stats *stats)
+{
+	unsigned long long now;
+
+	if (!blkio_blkg_empty(stats))
+		return;
+
+	now = sched_clock();
+	if (time_after64(now, stats->start_empty_time))
+		blkg_stat_add(&stats->empty_time,
+			      now - stats->start_empty_time);
+	blkio_clear_blkg_empty(stats);
+}
+
+static void cfq_blkiocg_update_dequeue_stats(struct blkio_group *blkg,
+					     struct blkio_policy_type *pol,
+					     unsigned long dequeue)
+{
+	struct blkg_policy_data *pd = blkg->pd[pol->plid];
+
+	lockdep_assert_held(blkg->q->queue_lock);
+
+	blkg_stat_add(&pd->stats.dequeue, dequeue);
+}
+
+static void cfq_blkiocg_set_start_empty_time(struct blkio_group *blkg,
+					     struct blkio_policy_type *pol)
+{
+	struct blkio_group_stats *stats = &blkg->pd[pol->plid]->stats;
+
+	lockdep_assert_held(blkg->q->queue_lock);
+
+	if (blkg_rwstat_sum(&stats->queued))
+		return;
+
+	/*
+	 * group is already marked empty. This can happen if cfqq got new
+	 * request in parent group and moved to this group while being added
+	 * to service tree. Just ignore the event and move on.
+	 */
+	if (blkio_blkg_empty(stats))
+		return;
+
+	stats->start_empty_time = sched_clock();
+	blkio_mark_blkg_empty(stats);
+}
+
+static void cfq_blkiocg_update_idle_time_stats(struct blkio_group *blkg,
+					       struct blkio_policy_type *pol)
+{
+	struct blkio_group_stats *stats = &blkg->pd[pol->plid]->stats;
+
+	lockdep_assert_held(blkg->q->queue_lock);
+
+	if (blkio_blkg_idling(stats)) {
+		unsigned long long now = sched_clock();
+
+		if (time_after64(now, stats->start_idle_time))
+			blkg_stat_add(&stats->idle_time,
+				      now - stats->start_idle_time);
+		blkio_clear_blkg_idling(stats);
+	}
+}
+
+static void cfq_blkiocg_update_set_idle_time_stats(struct blkio_group *blkg,
+						   struct blkio_policy_type *pol)
+{
+	struct blkio_group_stats *stats = &blkg->pd[pol->plid]->stats;
+
+	lockdep_assert_held(blkg->q->queue_lock);
+	BUG_ON(blkio_blkg_idling(stats));
+
+	stats->start_idle_time = sched_clock();
+	blkio_mark_blkg_idling(stats);
+}
+
+static void cfq_blkiocg_update_avg_queue_size_stats(struct blkio_group *blkg,
+						    struct blkio_policy_type *pol)
+{
+	struct blkio_group_stats *stats = &blkg->pd[pol->plid]->stats;
+
+	lockdep_assert_held(blkg->q->queue_lock);
+
+	blkg_stat_add(&stats->avg_queue_size_sum,
+		      blkg_rwstat_sum(&stats->queued));
+	blkg_stat_add(&stats->avg_queue_size_samples, 1);
+	blkio_update_group_wait_time(stats);
+}
+
+#else	/* CONFIG_CFQ_GROUP_IOSCHED && CONFIG_DEBUG_BLK_CGROUP */
+
+static void blkio_set_start_group_wait_time(struct blkio_group *blkg,
+					    struct blkio_policy_type *pol,
+					    struct blkio_group *curr_blkg) { }
+static void blkio_end_empty_time(struct blkio_group_stats *stats) { }
+static void cfq_blkiocg_update_dequeue_stats(struct blkio_group *blkg,
+					     struct blkio_policy_type *pol,
+					     unsigned long dequeue) { }
+static void cfq_blkiocg_set_start_empty_time(struct blkio_group *blkg,
+					     struct blkio_policy_type *pol) { }
+static void cfq_blkiocg_update_idle_time_stats(struct blkio_group *blkg,
+					       struct blkio_policy_type *pol) { }
+static void cfq_blkiocg_update_set_idle_time_stats(struct blkio_group *blkg,
+						   struct blkio_policy_type *pol) { }
+static void cfq_blkiocg_update_avg_queue_size_stats(struct blkio_group *blkg,
+						    struct blkio_policy_type *pol) { }
+
+#endif	/* CONFIG_CFQ_GROUP_IOSCHED && CONFIG_DEBUG_BLK_CGROUP */
+
+#ifdef CONFIG_CFQ_GROUP_IOSCHED
 
 static inline struct cfq_group *blkg_to_cfqg(struct blkio_group *blkg)
 {
@@ -403,75 +572,98 @@ static inline void cfq_blkiocg_update_io_add_stats(struct blkio_group *blkg,
 			struct blkio_group *curr_blkg,
 			bool direction, bool sync)
 {
-	blkiocg_update_io_add_stats(blkg, pol, curr_blkg, direction, sync);
-}
+	struct blkio_group_stats *stats = &blkg->pd[pol->plid]->stats;
+	int rw = (direction ? REQ_WRITE : 0) | (sync ? REQ_SYNC : 0);
 
-static inline void cfq_blkiocg_update_dequeue_stats(struct blkio_group *blkg,
-			struct blkio_policy_type *pol, unsigned long dequeue)
-{
-	blkiocg_update_dequeue_stats(blkg, pol, dequeue);
+	lockdep_assert_held(blkg->q->queue_lock);
+
+	blkg_rwstat_add(&stats->queued, rw, 1);
+	blkio_end_empty_time(stats);
+	blkio_set_start_group_wait_time(blkg, pol, curr_blkg);
 }
 
 static inline void cfq_blkiocg_update_timeslice_used(struct blkio_group *blkg,
 			struct blkio_policy_type *pol, unsigned long time,
 			unsigned long unaccounted_time)
 {
-	blkiocg_update_timeslice_used(blkg, pol, time, unaccounted_time);
-}
+	struct blkio_group_stats *stats = &blkg->pd[pol->plid]->stats;
 
-static inline void cfq_blkiocg_set_start_empty_time(struct blkio_group *blkg,
-			struct blkio_policy_type *pol)
-{
-	blkiocg_set_start_empty_time(blkg, pol);
+	lockdep_assert_held(blkg->q->queue_lock);
+
+	blkg_stat_add(&stats->time, time);
+#ifdef CONFIG_DEBUG_BLK_CGROUP
+	blkg_stat_add(&stats->unaccounted_time, unaccounted_time);
+#endif
 }
 
 static inline void cfq_blkiocg_update_io_remove_stats(struct blkio_group *blkg,
 			struct blkio_policy_type *pol, bool direction,
 			bool sync)
 {
-	blkiocg_update_io_remove_stats(blkg, pol, direction, sync);
+	struct blkio_group_stats *stats = &blkg->pd[pol->plid]->stats;
+	int rw = (direction ? REQ_WRITE : 0) | (sync ? REQ_SYNC : 0);
+
+	lockdep_assert_held(blkg->q->queue_lock);
+
+	blkg_rwstat_add(&stats->queued, rw, -1);
 }
 
 static inline void cfq_blkiocg_update_io_merged_stats(struct blkio_group *blkg,
 			struct blkio_policy_type *pol, bool direction,
 			bool sync)
 {
-	blkiocg_update_io_merged_stats(blkg, pol, direction, sync);
-}
+	struct blkio_group_stats *stats = &blkg->pd[pol->plid]->stats;
+	int rw = (direction ? REQ_WRITE : 0) | (sync ? REQ_SYNC : 0);
 
-static inline void cfq_blkiocg_update_idle_time_stats(struct blkio_group *blkg,
-			struct blkio_policy_type *pol)
-{
-	blkiocg_update_idle_time_stats(blkg, pol);
-}
+	lockdep_assert_held(blkg->q->queue_lock);
 
-static inline void
-cfq_blkiocg_update_avg_queue_size_stats(struct blkio_group *blkg,
-			struct blkio_policy_type *pol)
-{
-	blkiocg_update_avg_queue_size_stats(blkg, pol);
-}
-
-static inline void
-cfq_blkiocg_update_set_idle_time_stats(struct blkio_group *blkg,
-			struct blkio_policy_type *pol)
-{
-	blkiocg_update_set_idle_time_stats(blkg, pol);
+	blkg_rwstat_add(&stats->merged, rw, 1);
 }
 
 static inline void cfq_blkiocg_update_dispatch_stats(struct blkio_group *blkg,
 			struct blkio_policy_type *pol, uint64_t bytes,
 			bool direction, bool sync)
 {
-	blkiocg_update_dispatch_stats(blkg, pol, bytes, direction, sync);
+	int rw = (direction ? REQ_WRITE : 0) | (sync ? REQ_SYNC : 0);
+	struct blkg_policy_data *pd = blkg->pd[pol->plid];
+	struct blkio_group_stats_cpu *stats_cpu;
+	unsigned long flags;
+
+	/* If per cpu stats are not allocated yet, don't do any accounting. */
+	if (pd->stats_cpu == NULL)
+		return;
+
+	/*
+	 * Disabling interrupts to provide mutual exclusion between two
+	 * writes on same cpu. It probably is not needed for 64bit. Not
+	 * optimizing that case yet.
+	 */
+	local_irq_save(flags);
+
+	stats_cpu = this_cpu_ptr(pd->stats_cpu);
+
+	blkg_stat_add(&stats_cpu->sectors, bytes >> 9);
+	blkg_rwstat_add(&stats_cpu->serviced, rw, 1);
+	blkg_rwstat_add(&stats_cpu->service_bytes, rw, bytes);
+
+	local_irq_restore(flags);
 }
 
 static inline void cfq_blkiocg_update_completion_stats(struct blkio_group *blkg,
 			struct blkio_policy_type *pol, uint64_t start_time,
 			uint64_t io_start_time, bool direction, bool sync)
 {
-	blkiocg_update_completion_stats(blkg, pol, start_time, io_start_time,
-					direction, sync);
+	struct blkio_group_stats *stats = &blkg->pd[pol->plid]->stats;
+	unsigned long long now = sched_clock();
+	int rw = (direction ? REQ_WRITE : 0) | (sync ? REQ_SYNC : 0);
+
+	lockdep_assert_held(blkg->q->queue_lock);
+
+	if (time_after64(now, io_start_time))
+		blkg_rwstat_add(&stats->service_time, rw, now - io_start_time);
+	if (time_after64(io_start_time, start_time))
+		blkg_rwstat_add(&stats->wait_time, rw,
+				io_start_time - start_time);
 }
 
 #else	/* CONFIG_CFQ_GROUP_IOSCHED */
@@ -489,29 +681,15 @@ static inline void cfq_blkiocg_update_io_add_stats(struct blkio_group *blkg,
 			struct blkio_policy_type *pol,
 			struct blkio_group *curr_blkg, bool direction,
 			bool sync) { }
-static inline void cfq_blkiocg_update_dequeue_stats(struct blkio_group *blkg,
-			struct blkio_policy_type *pol, unsigned long dequeue) { }
 static inline void cfq_blkiocg_update_timeslice_used(struct blkio_group *blkg,
 			struct blkio_policy_type *pol, unsigned long time,
 			unsigned long unaccounted_time) { }
-static inline void cfq_blkiocg_set_start_empty_time(struct blkio_group *blkg,
-			struct blkio_policy_type *pol) { }
 static inline void cfq_blkiocg_update_io_remove_stats(struct blkio_group *blkg,
 			struct blkio_policy_type *pol, bool direction,
 			bool sync) { }
 static inline void cfq_blkiocg_update_io_merged_stats(struct blkio_group *blkg,
 			struct blkio_policy_type *pol, bool direction,
 			bool sync) { }
-static inline void cfq_blkiocg_update_idle_time_stats(struct blkio_group *blkg,
-			struct blkio_policy_type *pol) { }
-static inline void
-cfq_blkiocg_update_avg_queue_size_stats(struct blkio_group *blkg,
-					struct blkio_policy_type *pol) { }
-
-static inline void
-cfq_blkiocg_update_set_idle_time_stats(struct blkio_group *blkg,
-				       struct blkio_policy_type *pol) { }
-
 static inline void cfq_blkiocg_update_dispatch_stats(struct blkio_group *blkg,
 			struct blkio_policy_type *pol, uint64_t bytes,
 			bool direction, bool sync) { }
-- 
1.7.7.3



* [PATCH 14/21] blkcg: cfq doesn't need per-cpu dispatch stats
  2012-03-28 22:51 [PATCHSET] block: modularize blkcg config and stat file handling Tejun Heo
                   ` (12 preceding siblings ...)
  2012-03-28 22:51 ` [PATCH 13/21] blkcg: move statistics update code to policies Tejun Heo
@ 2012-03-28 22:51 ` Tejun Heo
  2012-03-28 22:51 ` [PATCH 15/21] blkcg: add blkio_policy_ops operations for exit and stat reset Tejun Heo
                   ` (9 subsequent siblings)
  23 siblings, 0 replies; 56+ messages in thread
From: Tejun Heo @ 2012-03-28 22:51 UTC (permalink / raw)
  To: axboe; +Cc: vgoyal, ctalbott, rni, linux-kernel, cgroups, containers, Tejun Heo

blkio_group_stats_cpu is used to count dispatch stats using per-cpu
counters.  This is used by both blk-throtl and cfq-iosched but the
sharing is rather silly.

* cfq-iosched doesn't need per-cpu dispatch stats.  cfq always updates
  those stats while holding queue_lock.

* blk-throtl needs per-cpu dispatch stats but only service_bytes and
  serviced.  It doesn't make use of sectors.

This patch makes cfq add and use global stats for service_bytes,
serviced and sectors, removes the per-cpu sectors counter and moves
the per-cpu stat printing code to blk-throttle.c.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
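The per-cpu printing that moves to blk-throttle.c presumably keeps its
current form, a sum of per-cpu snapshots, roughly as below (copied
from the reader being removed from blk-cgroup.c in this patch; only
its location changes):

static u64 blkg_prfill_cpu_rwstat(struct seq_file *sf,
				  struct blkg_policy_data *pd, int off)
{
	struct blkg_rwstat rwstat = { }, tmp;
	int i, cpu;

	for_each_possible_cpu(cpu) {
		struct blkio_group_stats_cpu *sc =
			per_cpu_ptr(pd->stats_cpu, cpu);

		tmp = blkg_rwstat_read((void *)sc + off);
		for (i = 0; i < BLKG_RWSTAT_NR; i++)
			rwstat.cnt[i] += tmp.cnt[i];
	}

	return __blkg_prfill_rwstat(sf, pd, &rwstat);
}
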
 block/blk-cgroup.c   |   63 +------------------------------------------------
 block/blk-cgroup.h   |   12 ++++----
 block/blk-throttle.c |   31 +++++++++++++++++++++++-
 block/cfq-iosched.c  |   37 ++++++++---------------------
 4 files changed, 48 insertions(+), 95 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index dfa5f2c..16158e5 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -390,7 +390,6 @@ static void blkio_reset_stats_cpu(struct blkio_group *blkg, int plid)
 
 		blkg_rwstat_reset(&sc->service_bytes);
 		blkg_rwstat_reset(&sc->serviced);
-		blkg_stat_reset(&sc->sectors);
 	}
 }
 
@@ -417,6 +416,8 @@ blkiocg_reset_stats(struct cgroup *cgroup, struct cftype *cftype, u64 val)
 			struct blkio_group_stats *stats = &pd->stats;
 
 			/* queued stats shouldn't be cleared */
+			blkg_rwstat_reset(&stats->service_bytes);
+			blkg_rwstat_reset(&stats->serviced);
 			blkg_rwstat_reset(&stats->merged);
 			blkg_rwstat_reset(&stats->service_time);
 			blkg_rwstat_reset(&stats->wait_time);
@@ -577,66 +578,6 @@ int blkcg_print_rwstat(struct cgroup *cgrp, struct cftype *cft,
 }
 EXPORT_SYMBOL_GPL(blkcg_print_rwstat);
 
-static u64 blkg_prfill_cpu_stat(struct seq_file *sf,
-				struct blkg_policy_data *pd, int off)
-{
-	u64 v = 0;
-	int cpu;
-
-	for_each_possible_cpu(cpu) {
-		struct blkio_group_stats_cpu *sc =
-			per_cpu_ptr(pd->stats_cpu, cpu);
-
-		v += blkg_stat_read((void *)sc + off);
-	}
-
-	return __blkg_prfill_u64(sf, pd, v);
-}
-
-static u64 blkg_prfill_cpu_rwstat(struct seq_file *sf,
-				  struct blkg_policy_data *pd, int off)
-{
-	struct blkg_rwstat rwstat = { }, tmp;
-	int i, cpu;
-
-	for_each_possible_cpu(cpu) {
-		struct blkio_group_stats_cpu *sc =
-			per_cpu_ptr(pd->stats_cpu, cpu);
-
-		tmp = blkg_rwstat_read((void *)sc + off);
-		for (i = 0; i < BLKG_RWSTAT_NR; i++)
-			rwstat.cnt[i] += tmp.cnt[i];
-	}
-
-	return __blkg_prfill_rwstat(sf, pd, &rwstat);
-}
-
-/* print per-cpu blkg_stat specified by BLKCG_STAT_PRIV() */
-int blkcg_print_cpu_stat(struct cgroup *cgrp, struct cftype *cft,
-			 struct seq_file *sf)
-{
-	struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp);
-
-	blkcg_print_blkgs(sf, blkcg, blkg_prfill_cpu_stat,
-			  BLKCG_STAT_POL(cft->private),
-			  BLKCG_STAT_OFF(cft->private), false);
-	return 0;
-}
-EXPORT_SYMBOL_GPL(blkcg_print_cpu_stat);
-
-/* print per-cpu blkg_rwstat specified by BLKCG_STAT_PRIV() */
-int blkcg_print_cpu_rwstat(struct cgroup *cgrp, struct cftype *cft,
-			   struct seq_file *sf)
-{
-	struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp);
-
-	blkcg_print_blkgs(sf, blkcg, blkg_prfill_cpu_rwstat,
-			  BLKCG_STAT_POL(cft->private),
-			  BLKCG_STAT_OFF(cft->private), true);
-	return 0;
-}
-EXPORT_SYMBOL_GPL(blkcg_print_cpu_rwstat);
-
 /**
  * blkg_conf_prep - parse and prepare for per-blkg config update
  * @blkcg: target block cgroup
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index 0b0a176..c82de47 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -65,6 +65,10 @@ struct blkg_rwstat {
 };
 
 struct blkio_group_stats {
+	/* total bytes transferred */
+	struct blkg_rwstat		service_bytes;
+	/* total IOs serviced, post merge */
+	struct blkg_rwstat		serviced;
 	/* number of ios merged */
 	struct blkg_rwstat		merged;
 	/* total time spent on device in ns, may not be accurate w/ queueing */
@@ -73,6 +77,8 @@ struct blkio_group_stats {
 	struct blkg_rwstat		wait_time;
 	/* number of IOs queued up */
 	struct blkg_rwstat		queued;
+	/* total sectors transferred */
+	struct blkg_stat		sectors;
 	/* total disk time and nr sectors dispatched by this group */
 	struct blkg_stat		time;
 #ifdef CONFIG_DEBUG_BLK_CGROUP
@@ -104,8 +110,6 @@ struct blkio_group_stats_cpu {
 	struct blkg_rwstat		service_bytes;
 	/* total IOs serviced, post merge */
 	struct blkg_rwstat		serviced;
-	/* total sectors transferred */
-	struct blkg_stat		sectors;
 };
 
 struct blkio_group_conf {
@@ -183,10 +187,6 @@ int blkcg_print_stat(struct cgroup *cgrp, struct cftype *cft,
 		     struct seq_file *sf);
 int blkcg_print_rwstat(struct cgroup *cgrp, struct cftype *cft,
 		       struct seq_file *sf);
-int blkcg_print_cpu_stat(struct cgroup *cgrp, struct cftype *cft,
-			 struct seq_file *sf);
-int blkcg_print_cpu_rwstat(struct cgroup *cgrp, struct cftype *cft,
-			   struct seq_file *sf);
 
 struct blkg_conf_ctx {
 	struct gendisk		*disk;
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 5d647ed..cb259bc 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -582,7 +582,6 @@ static void throtl_update_dispatch_stats(struct blkio_group *blkg, u64 bytes,
 
 	stats_cpu = this_cpu_ptr(pd->stats_cpu);
 
-	blkg_stat_add(&stats_cpu->sectors, bytes >> 9);
 	blkg_rwstat_add(&stats_cpu->serviced, rw, 1);
 	blkg_rwstat_add(&stats_cpu->service_bytes, rw, bytes);
 
@@ -843,6 +842,36 @@ static void throtl_update_blkio_group_common(struct throtl_data *td,
 	throtl_schedule_delayed_work(td, 0);
 }
 
+static u64 blkg_prfill_cpu_rwstat(struct seq_file *sf,
+				  struct blkg_policy_data *pd, int off)
+{
+	struct blkg_rwstat rwstat = { }, tmp;
+	int i, cpu;
+
+	for_each_possible_cpu(cpu) {
+		struct blkio_group_stats_cpu *sc =
+			per_cpu_ptr(pd->stats_cpu, cpu);
+
+		tmp = blkg_rwstat_read((void *)sc + off);
+		for (i = 0; i < BLKG_RWSTAT_NR; i++)
+			rwstat.cnt[i] += tmp.cnt[i];
+	}
+
+	return __blkg_prfill_rwstat(sf, pd, &rwstat);
+}
+
+/* print per-cpu blkg_rwstat specified by BLKCG_STAT_PRIV() */
+static int blkcg_print_cpu_rwstat(struct cgroup *cgrp, struct cftype *cft,
+				  struct seq_file *sf)
+{
+	struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp);
+
+	blkcg_print_blkgs(sf, blkcg, blkg_prfill_cpu_rwstat,
+			  BLKCG_STAT_POL(cft->private),
+			  BLKCG_STAT_OFF(cft->private), true);
+	return 0;
+}
+
 static u64 blkg_prfill_conf_u64(struct seq_file *sf,
 				struct blkg_policy_data *pd, int off)
 {
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 4991380..effd894 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -624,29 +624,12 @@ static inline void cfq_blkiocg_update_dispatch_stats(struct blkio_group *blkg,
 			struct blkio_policy_type *pol, uint64_t bytes,
 			bool direction, bool sync)
 {
+	struct blkio_group_stats *stats = &blkg->pd[pol->plid]->stats;
 	int rw = (direction ? REQ_WRITE : 0) | (sync ? REQ_SYNC : 0);
-	struct blkg_policy_data *pd = blkg->pd[pol->plid];
-	struct blkio_group_stats_cpu *stats_cpu;
-	unsigned long flags;
 
-	/* If per cpu stats are not allocated yet, don't do any accounting. */
-	if (pd->stats_cpu == NULL)
-		return;
-
-	/*
-	 * Disabling interrupts to provide mutual exclusion between two
-	 * writes on same cpu. It probably is not needed for 64bit. Not
-	 * optimizing that case yet.
-	 */
-	local_irq_save(flags);
-
-	stats_cpu = this_cpu_ptr(pd->stats_cpu);
-
-	blkg_stat_add(&stats_cpu->sectors, bytes >> 9);
-	blkg_rwstat_add(&stats_cpu->serviced, rw, 1);
-	blkg_rwstat_add(&stats_cpu->service_bytes, rw, bytes);
-
-	local_irq_restore(flags);
+	blkg_stat_add(&stats->sectors, bytes >> 9);
+	blkg_rwstat_add(&stats->serviced, rw, 1);
+	blkg_rwstat_add(&stats->service_bytes, rw, bytes);
 }
 
 static inline void cfq_blkiocg_update_completion_stats(struct blkio_group *blkg,
@@ -1520,20 +1503,20 @@ static struct cftype cfq_blkcg_files[] = {
 	{
 		.name = "sectors",
 		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
-				offsetof(struct blkio_group_stats_cpu, sectors)),
-		.read_seq_string = blkcg_print_cpu_stat,
+				offsetof(struct blkio_group_stats, sectors)),
+		.read_seq_string = blkcg_print_stat,
 	},
 	{
 		.name = "io_service_bytes",
 		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
-				offsetof(struct blkio_group_stats_cpu, service_bytes)),
-		.read_seq_string = blkcg_print_cpu_rwstat,
+				offsetof(struct blkio_group_stats, service_bytes)),
+		.read_seq_string = blkcg_print_rwstat,
 	},
 	{
 		.name = "io_serviced",
 		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
-				offsetof(struct blkio_group_stats_cpu, serviced)),
-		.read_seq_string = blkcg_print_cpu_rwstat,
+				offsetof(struct blkio_group_stats, serviced)),
+		.read_seq_string = blkcg_print_rwstat,
 	},
 	{
 		.name = "io_service_time",
-- 
1.7.7.3


* [PATCH 15/21] blkcg: add blkio_policy_ops operations for exit and stat reset
  2012-03-28 22:51 [PATCHSET] block: modularize blkcg config and stat file handling Tejun Heo
                   ` (13 preceding siblings ...)
  2012-03-28 22:51 ` [PATCH 14/21] blkcg: cfq doesn't need per-cpu dispatch stats Tejun Heo
@ 2012-03-28 22:51 ` Tejun Heo
  2012-03-28 22:51 ` [PATCH 16/21] blkcg: move blkio_group_stats to cfq-iosched.c Tejun Heo
                   ` (8 subsequent siblings)
  23 siblings, 0 replies; 56+ messages in thread
From: Tejun Heo @ 2012-03-28 22:51 UTC (permalink / raw)
  To: axboe; +Cc: vgoyal, ctalbott, rni, linux-kernel, cgroups, containers, Tejun Heo

Add blkio_policy_ops->blkio_exit_group_fn() and
->blkio_reset_group_stats_fn().  These will be used to further
modularize blkcg policy implementation.
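
For illustration, a policy hooks these up through its
blkio_policy_type.  Below is a minimal sketch with placeholder foo_*
names (foo_init_blkio_group() and struct foo_grp stand in for the
policy's existing init function and per-group data); cfq and
blk-throttle do exactly this in the later patches:

  static void foo_exit_blkio_group(struct blkio_group *blkg)
  {
          /* tear down whatever the init callback set up for this group */
  }

  static void foo_reset_group_stats(struct blkio_group *blkg)
  {
          /* clear this policy's own counters for the group */
  }

  static struct blkio_policy_type blkio_policy_foo = {
          .ops = {
                  .blkio_init_group_fn            = foo_init_blkio_group,
                  .blkio_exit_group_fn            = foo_exit_blkio_group,
                  .blkio_reset_group_stats_fn     = foo_reset_group_stats,
          },
          .plid           = BLKIO_POLICY_FOO,     /* placeholder id */
          .pdata_size     = sizeof(struct foo_grp),
  };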

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 block/blk-cgroup.c |   16 ++++++++++++----
 block/blk-cgroup.h |    4 ++++
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 16158e5..e4848ee 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -131,12 +131,17 @@ static void blkg_free(struct blkio_group *blkg)
 		return;
 
 	for (i = 0; i < BLKIO_NR_POLICIES; i++) {
+		struct blkio_policy_type *pol = blkio_policy[i];
 		struct blkg_policy_data *pd = blkg->pd[i];
 
-		if (pd) {
-			free_percpu(pd->stats_cpu);
-			kfree(pd);
-		}
+		if (!pd)
+			continue;
+
+		if (pol && pol->ops.blkio_exit_group_fn)
+			pol->ops.blkio_exit_group_fn(blkg);
+
+		free_percpu(pd->stats_cpu);
+		kfree(pd);
 	}
 
 	kfree(blkg);
@@ -432,6 +437,9 @@ blkiocg_reset_stats(struct cgroup *cgroup, struct cftype *cftype, u64 val)
 			blkg_stat_reset(&stats->empty_time);
 #endif
 			blkio_reset_stats_cpu(blkg, pol->plid);
+
+			if (pol->ops.blkio_reset_group_stats_fn)
+				pol->ops.blkio_reset_group_stats_fn(blkg);
 		}
 	}
 
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index c82de47..d0ee649 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -153,9 +153,13 @@ struct blkio_group {
 };
 
 typedef void (blkio_init_group_fn)(struct blkio_group *blkg);
+typedef void (blkio_exit_group_fn)(struct blkio_group *blkg);
+typedef void (blkio_reset_group_stats_fn)(struct blkio_group *blkg);
 
 struct blkio_policy_ops {
 	blkio_init_group_fn *blkio_init_group_fn;
+	blkio_exit_group_fn *blkio_exit_group_fn;
+	blkio_reset_group_stats_fn *blkio_reset_group_stats_fn;
 };
 
 struct blkio_policy_type {
-- 
1.7.7.3


* [PATCH 16/21] blkcg: move blkio_group_stats to cfq-iosched.c
  2012-03-28 22:51 [PATCHSET] block: modularize blkcg config and stat file handling Tejun Heo
                   ` (14 preceding siblings ...)
  2012-03-28 22:51 ` [PATCH 15/21] blkcg: add blkio_policy_ops operations for exit and stat reset Tejun Heo
@ 2012-03-28 22:51 ` Tejun Heo
  2012-03-28 22:51 ` [PATCH 17/21] blkcg: move blkio_group_stats_cpu and friends to blk-throttle.c Tejun Heo
                   ` (7 subsequent siblings)
  23 siblings, 0 replies; 56+ messages in thread
From: Tejun Heo @ 2012-03-28 22:51 UTC (permalink / raw)
  To: axboe; +Cc: vgoyal, ctalbott, rni, linux-kernel, cgroups, containers, Tejun Heo

blkio_group_stats contains only fields used by cfq and has no reason
to be defined in blkcg core.

* Move blkio_group_stats to cfq-iosched.c and rename it to cfqg_stats.

* blkg_policy_data->stats is replaced with cfq_group->stats.
  blkg_prfill_[rw]stat() are updated to take the offset against
  pd->pdata instead (see the short sketch after this list).

* All related macros / functions are renamed so that they carry a
  cfqg_ prefix, and the unnecessary @pol arguments are dropped.

* All stat functions now take cfq_group * instead of blkio_group *.

* The lockdep assertion on the queue lock is dropped.  The elevator
  runs under the queue lock by default, so there isn't much to be
  gained by asserting it at the stat-function level.

* cfqg_stats_reset() implemented for blkio_reset_group_stats_fn method
  so that cfqg->stats can be reset.
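
Concretely, the offset plumbing referenced above reduces to the two
pieces below (both taken from the diff that follows): the cftype entry
records where the counter lives inside cfq_group, and the generic
prfill helper applies that offset against pd->pdata.

  /* cfq-iosched.c: cftype entry encodes the counter's offset */
  {
          .name = "time",
          .private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
                          offsetof(struct cfq_group, stats.time)),
          .read_seq_string = blkcg_print_stat,
  },

  /* blk-cgroup.c: the offset is applied to the policy's pdata */
  static u64 blkg_prfill_stat(struct seq_file *sf,
                              struct blkg_policy_data *pd, int off)
  {
          return __blkg_prfill_u64(sf, pd,
                                   blkg_stat_read((void *)pd->pdata + off));
  }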

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 block/blk-cgroup.c  |   23 +---
 block/blk-cgroup.h  |   41 -----
 block/cfq-iosched.c |  407 ++++++++++++++++++++++++---------------------------
 3 files changed, 193 insertions(+), 278 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index e4848ee..d7c7f17 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -417,25 +417,6 @@ blkiocg_reset_stats(struct cgroup *cgroup, struct cftype *cftype, u64 val)
 		struct blkio_policy_type *pol;
 
 		list_for_each_entry(pol, &blkio_list, list) {
-			struct blkg_policy_data *pd = blkg->pd[pol->plid];
-			struct blkio_group_stats *stats = &pd->stats;
-
-			/* queued stats shouldn't be cleared */
-			blkg_rwstat_reset(&stats->service_bytes);
-			blkg_rwstat_reset(&stats->serviced);
-			blkg_rwstat_reset(&stats->merged);
-			blkg_rwstat_reset(&stats->service_time);
-			blkg_rwstat_reset(&stats->wait_time);
-			blkg_stat_reset(&stats->time);
-#ifdef CONFIG_DEBUG_BLK_CGROUP
-			blkg_stat_reset(&stats->unaccounted_time);
-			blkg_stat_reset(&stats->avg_queue_size_sum);
-			blkg_stat_reset(&stats->avg_queue_size_samples);
-			blkg_stat_reset(&stats->dequeue);
-			blkg_stat_reset(&stats->group_wait_time);
-			blkg_stat_reset(&stats->idle_time);
-			blkg_stat_reset(&stats->empty_time);
-#endif
 			blkio_reset_stats_cpu(blkg, pol->plid);
 
 			if (pol->ops.blkio_reset_group_stats_fn)
@@ -549,13 +530,13 @@ static u64 blkg_prfill_stat(struct seq_file *sf, struct blkg_policy_data *pd,
 			    int off)
 {
 	return __blkg_prfill_u64(sf, pd,
-				 blkg_stat_read((void *)&pd->stats + off));
+				 blkg_stat_read((void *)pd->pdata + off));
 }
 
 static u64 blkg_prfill_rwstat(struct seq_file *sf, struct blkg_policy_data *pd,
 			      int off)
 {
-	struct blkg_rwstat rwstat = blkg_rwstat_read((void *)&pd->stats + off);
+	struct blkg_rwstat rwstat = blkg_rwstat_read((void *)pd->pdata + off);
 
 	return __blkg_prfill_rwstat(sf, pd, &rwstat);
 }
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index d0ee649..791570394 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -64,46 +64,6 @@ struct blkg_rwstat {
 	uint64_t			cnt[BLKG_RWSTAT_NR];
 };
 
-struct blkio_group_stats {
-	/* total bytes transferred */
-	struct blkg_rwstat		service_bytes;
-	/* total IOs serviced, post merge */
-	struct blkg_rwstat		serviced;
-	/* number of ios merged */
-	struct blkg_rwstat		merged;
-	/* total time spent on device in ns, may not be accurate w/ queueing */
-	struct blkg_rwstat		service_time;
-	/* total time spent waiting in scheduler queue in ns */
-	struct blkg_rwstat		wait_time;
-	/* number of IOs queued up */
-	struct blkg_rwstat		queued;
-	/* total sectors transferred */
-	struct blkg_stat		sectors;
-	/* total disk time and nr sectors dispatched by this group */
-	struct blkg_stat		time;
-#ifdef CONFIG_DEBUG_BLK_CGROUP
-	/* time not charged to this cgroup */
-	struct blkg_stat		unaccounted_time;
-	/* sum of number of ios queued across all samples */
-	struct blkg_stat		avg_queue_size_sum;
-	/* count of samples taken for average */
-	struct blkg_stat		avg_queue_size_samples;
-	/* how many times this group has been removed from service tree */
-	struct blkg_stat		dequeue;
-	/* total time spent waiting for it to be assigned a timeslice. */
-	struct blkg_stat		group_wait_time;
-	/* time spent idling for this blkio_group */
-	struct blkg_stat		idle_time;
-	/* total time with empty current active q with other requests queued */
-	struct blkg_stat		empty_time;
-	/* fields after this shouldn't be cleared on stat reset */
-	uint64_t			start_group_wait_time;
-	uint64_t			start_idle_time;
-	uint64_t			start_empty_time;
-	uint16_t			flags;
-#endif
-};
-
 /* Per cpu blkio group stats */
 struct blkio_group_stats_cpu {
 	/* total bytes transferred */
@@ -126,7 +86,6 @@ struct blkg_policy_data {
 	/* Configuration */
 	struct blkio_group_conf conf;
 
-	struct blkio_group_stats stats;
 	/* Per cpu stats pointer */
 	struct blkio_group_stats_cpu __percpu *stats_cpu;
 
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index effd894..a1f37df 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -173,6 +173,48 @@ enum wl_type_t {
 	SYNC_WORKLOAD = 2
 };
 
+struct cfqg_stats {
+#ifdef CONFIG_CFQ_GROUP_IOSCHED
+	/* total bytes transferred */
+	struct blkg_rwstat		service_bytes;
+	/* total IOs serviced, post merge */
+	struct blkg_rwstat		serviced;
+	/* number of ios merged */
+	struct blkg_rwstat		merged;
+	/* total time spent on device in ns, may not be accurate w/ queueing */
+	struct blkg_rwstat		service_time;
+	/* total time spent waiting in scheduler queue in ns */
+	struct blkg_rwstat		wait_time;
+	/* number of IOs queued up */
+	struct blkg_rwstat		queued;
+	/* total sectors transferred */
+	struct blkg_stat		sectors;
+	/* total disk time and nr sectors dispatched by this group */
+	struct blkg_stat		time;
+#ifdef CONFIG_DEBUG_BLK_CGROUP
+	/* time not charged to this cgroup */
+	struct blkg_stat		unaccounted_time;
+	/* sum of number of ios queued across all samples */
+	struct blkg_stat		avg_queue_size_sum;
+	/* count of samples taken for average */
+	struct blkg_stat		avg_queue_size_samples;
+	/* how many times this group has been removed from service tree */
+	struct blkg_stat		dequeue;
+	/* total time spent waiting for it to be assigned a timeslice. */
+	struct blkg_stat		group_wait_time;
+	/* time spent idling for this blkio_group */
+	struct blkg_stat		idle_time;
+	/* total time with empty current active q with other requests queued */
+	struct blkg_stat		empty_time;
+	/* fields after this shouldn't be cleared on stat reset */
+	uint64_t			start_group_wait_time;
+	uint64_t			start_idle_time;
+	uint64_t			start_empty_time;
+	uint16_t			flags;
+#endif	/* CONFIG_DEBUG_BLK_CGROUP */
+#endif	/* CONFIG_CFQ_GROUP_IOSCHED */
+};
+
 /* This is per cgroup per device grouping structure */
 struct cfq_group {
 	/* group service_tree member */
@@ -212,6 +254,7 @@ struct cfq_group {
 	/* number of requests that are on the dispatch list or inside driver */
 	int dispatched;
 	struct cfq_ttime ttime;
+	struct cfqg_stats stats;
 };
 
 struct cfq_io_cq {
@@ -368,96 +411,84 @@ CFQ_CFQQ_FNS(wait_busy);
 
 #if defined(CONFIG_CFQ_GROUP_IOSCHED) && defined(CONFIG_DEBUG_BLK_CGROUP)
 
-/* blkg state flags */
-enum blkg_state_flags {
-	BLKG_waiting = 0,
-	BLKG_idling,
-	BLKG_empty,
+/* cfqg stats flags */
+enum cfqg_stats_flags {
+	CFQG_stats_waiting = 0,
+	CFQG_stats_idling,
+	CFQG_stats_empty,
 };
 
-#define BLKG_FLAG_FNS(name)						\
-static inline void blkio_mark_blkg_##name(				\
-		struct blkio_group_stats *stats)			\
+#define CFQG_FLAG_FNS(name)						\
+static inline void cfqg_stats_mark_##name(struct cfqg_stats *stats)	\
 {									\
-	stats->flags |= (1 << BLKG_##name);				\
+	stats->flags |= (1 << CFQG_stats_##name);			\
 }									\
-static inline void blkio_clear_blkg_##name(				\
-		struct blkio_group_stats *stats)			\
+static inline void cfqg_stats_clear_##name(struct cfqg_stats *stats)	\
 {									\
-	stats->flags &= ~(1 << BLKG_##name);				\
+	stats->flags &= ~(1 << CFQG_stats_##name);			\
 }									\
-static inline int blkio_blkg_##name(struct blkio_group_stats *stats)	\
+static inline int cfqg_stats_##name(struct cfqg_stats *stats)		\
 {									\
-	return (stats->flags & (1 << BLKG_##name)) != 0;		\
+	return (stats->flags & (1 << CFQG_stats_##name)) != 0;		\
 }									\
 
-BLKG_FLAG_FNS(waiting)
-BLKG_FLAG_FNS(idling)
-BLKG_FLAG_FNS(empty)
-#undef BLKG_FLAG_FNS
+CFQG_FLAG_FNS(waiting)
+CFQG_FLAG_FNS(idling)
+CFQG_FLAG_FNS(empty)
+#undef CFQG_FLAG_FNS
 
 /* This should be called with the queue_lock held. */
-static void blkio_update_group_wait_time(struct blkio_group_stats *stats)
+static void cfqg_stats_update_group_wait_time(struct cfqg_stats *stats)
 {
 	unsigned long long now;
 
-	if (!blkio_blkg_waiting(stats))
+	if (!cfqg_stats_waiting(stats))
 		return;
 
 	now = sched_clock();
 	if (time_after64(now, stats->start_group_wait_time))
 		blkg_stat_add(&stats->group_wait_time,
 			      now - stats->start_group_wait_time);
-	blkio_clear_blkg_waiting(stats);
+	cfqg_stats_clear_waiting(stats);
 }
 
 /* This should be called with the queue_lock held. */
-static void blkio_set_start_group_wait_time(struct blkio_group *blkg,
-					    struct blkio_policy_type *pol,
-					    struct blkio_group *curr_blkg)
+static void cfqg_stats_set_start_group_wait_time(struct cfq_group *cfqg,
+						 struct cfq_group *curr_cfqg)
 {
-	struct blkg_policy_data *pd = blkg->pd[pol->plid];
+	struct cfqg_stats *stats = &cfqg->stats;
 
-	if (blkio_blkg_waiting(&pd->stats))
+	if (cfqg_stats_waiting(stats))
 		return;
-	if (blkg == curr_blkg)
+	if (cfqg == curr_cfqg)
 		return;
-	pd->stats.start_group_wait_time = sched_clock();
-	blkio_mark_blkg_waiting(&pd->stats);
+	stats->start_group_wait_time = sched_clock();
+	cfqg_stats_mark_waiting(stats);
 }
 
 /* This should be called with the queue_lock held. */
-static void blkio_end_empty_time(struct blkio_group_stats *stats)
+static void cfqg_stats_end_empty_time(struct cfqg_stats *stats)
 {
 	unsigned long long now;
 
-	if (!blkio_blkg_empty(stats))
+	if (!cfqg_stats_empty(stats))
 		return;
 
 	now = sched_clock();
 	if (time_after64(now, stats->start_empty_time))
 		blkg_stat_add(&stats->empty_time,
 			      now - stats->start_empty_time);
-	blkio_clear_blkg_empty(stats);
+	cfqg_stats_clear_empty(stats);
 }
 
-static void cfq_blkiocg_update_dequeue_stats(struct blkio_group *blkg,
-					     struct blkio_policy_type *pol,
-					     unsigned long dequeue)
+static void cfqg_stats_update_dequeue(struct cfq_group *cfqg)
 {
-	struct blkg_policy_data *pd = blkg->pd[pol->plid];
-
-	lockdep_assert_held(blkg->q->queue_lock);
-
-	blkg_stat_add(&pd->stats.dequeue, dequeue);
+	blkg_stat_add(&cfqg->stats.dequeue, 1);
 }
 
-static void cfq_blkiocg_set_start_empty_time(struct blkio_group *blkg,
-					     struct blkio_policy_type *pol)
+static void cfqg_stats_set_start_empty_time(struct cfq_group *cfqg)
 {
-	struct blkio_group_stats *stats = &blkg->pd[pol->plid]->stats;
-
-	lockdep_assert_held(blkg->q->queue_lock);
+	struct cfqg_stats *stats = &cfqg->stats;
 
 	if (blkg_rwstat_sum(&stats->queued))
 		return;
@@ -467,72 +498,57 @@ static void cfq_blkiocg_set_start_empty_time(struct blkio_group *blkg,
 	 * request in parent group and moved to this group while being added
 	 * to service tree. Just ignore the event and move on.
 	 */
-	if (blkio_blkg_empty(stats))
+	if (cfqg_stats_empty(stats))
 		return;
 
 	stats->start_empty_time = sched_clock();
-	blkio_mark_blkg_empty(stats);
+	cfqg_stats_mark_empty(stats);
 }
 
-static void cfq_blkiocg_update_idle_time_stats(struct blkio_group *blkg,
-					       struct blkio_policy_type *pol)
+static void cfqg_stats_update_idle_time(struct cfq_group *cfqg)
 {
-	struct blkio_group_stats *stats = &blkg->pd[pol->plid]->stats;
+	struct cfqg_stats *stats = &cfqg->stats;
 
-	lockdep_assert_held(blkg->q->queue_lock);
-
-	if (blkio_blkg_idling(stats)) {
+	if (cfqg_stats_idling(stats)) {
 		unsigned long long now = sched_clock();
 
 		if (time_after64(now, stats->start_idle_time))
 			blkg_stat_add(&stats->idle_time,
 				      now - stats->start_idle_time);
-		blkio_clear_blkg_idling(stats);
+		cfqg_stats_clear_idling(stats);
 	}
 }
 
-static void cfq_blkiocg_update_set_idle_time_stats(struct blkio_group *blkg,
-						   struct blkio_policy_type *pol)
+static void cfqg_stats_set_start_idle_time(struct cfq_group *cfqg)
 {
-	struct blkio_group_stats *stats = &blkg->pd[pol->plid]->stats;
+	struct cfqg_stats *stats = &cfqg->stats;
 
-	lockdep_assert_held(blkg->q->queue_lock);
-	BUG_ON(blkio_blkg_idling(stats));
+	BUG_ON(cfqg_stats_idling(stats));
 
 	stats->start_idle_time = sched_clock();
-	blkio_mark_blkg_idling(stats);
+	cfqg_stats_mark_idling(stats);
 }
 
-static void cfq_blkiocg_update_avg_queue_size_stats(struct blkio_group *blkg,
-						    struct blkio_policy_type *pol)
+static void cfqg_stats_update_avg_queue_size(struct cfq_group *cfqg)
 {
-	struct blkio_group_stats *stats = &blkg->pd[pol->plid]->stats;
-
-	lockdep_assert_held(blkg->q->queue_lock);
+	struct cfqg_stats *stats = &cfqg->stats;
 
 	blkg_stat_add(&stats->avg_queue_size_sum,
 		      blkg_rwstat_sum(&stats->queued));
 	blkg_stat_add(&stats->avg_queue_size_samples, 1);
-	blkio_update_group_wait_time(stats);
+	cfqg_stats_update_group_wait_time(stats);
 }
 
 #else	/* CONFIG_CFQ_GROUP_IOSCHED && CONFIG_DEBUG_BLK_CGROUP */
 
-static void blkio_set_start_group_wait_time(struct blkio_group *blkg,
-					    struct blkio_policy_type *pol,
-					    struct blkio_group *curr_blkg) { }
-static void blkio_end_empty_time(struct blkio_group_stats *stats) { }
-static void cfq_blkiocg_update_dequeue_stats(struct blkio_group *blkg,
-					     struct blkio_policy_type *pol,
-					     unsigned long dequeue) { }
-static void cfq_blkiocg_set_start_empty_time(struct blkio_group *blkg,
-					     struct blkio_policy_type *pol) { }
-static void cfq_blkiocg_update_idle_time_stats(struct blkio_group *blkg,
-					       struct blkio_policy_type *pol) { }
-static void cfq_blkiocg_update_set_idle_time_stats(struct blkio_group *blkg,
-						   struct blkio_policy_type *pol) { }
-static void cfq_blkiocg_update_avg_queue_size_stats(struct blkio_group *blkg,
-						    struct blkio_policy_type *pol) { }
+static void cfqg_stats_set_start_group_wait_time(struct cfq_group *cfqg,
+						 struct cfq_group *curr_cfqg) { }
+static void cfqg_stats_end_empty_time(struct cfqg_stats *stats) { }
+static void cfqg_stats_update_dequeue(struct cfq_group *cfqg) { }
+static void cfqg_stats_set_start_empty_time(struct cfq_group *cfqg) { }
+static void cfqg_stats_update_idle_time(struct cfq_group *cfqg) { }
+static void cfqg_stats_set_start_idle_time(struct cfq_group *cfqg) { }
+static void cfqg_stats_update_avg_queue_size(struct cfq_group *cfqg) { }
 
 #endif	/* CONFIG_CFQ_GROUP_IOSCHED && CONFIG_DEBUG_BLK_CGROUP */
 
@@ -567,80 +583,46 @@ static inline void cfqg_put(struct cfq_group *cfqg)
 	blk_add_trace_msg((cfqd)->queue, "%s " fmt,			\
 			blkg_path(cfqg_to_blkg((cfqg))), ##args)	\
 
-static inline void cfq_blkiocg_update_io_add_stats(struct blkio_group *blkg,
-			struct blkio_policy_type *pol,
-			struct blkio_group *curr_blkg,
-			bool direction, bool sync)
+static inline void cfqg_stats_update_io_add(struct cfq_group *cfqg,
+					    struct cfq_group *curr_cfqg, int rw)
 {
-	struct blkio_group_stats *stats = &blkg->pd[pol->plid]->stats;
-	int rw = (direction ? REQ_WRITE : 0) | (sync ? REQ_SYNC : 0);
-
-	lockdep_assert_held(blkg->q->queue_lock);
-
-	blkg_rwstat_add(&stats->queued, rw, 1);
-	blkio_end_empty_time(stats);
-	blkio_set_start_group_wait_time(blkg, pol, curr_blkg);
+	blkg_rwstat_add(&cfqg->stats.queued, rw, 1);
+	cfqg_stats_end_empty_time(&cfqg->stats);
+	cfqg_stats_set_start_group_wait_time(cfqg, curr_cfqg);
 }
 
-static inline void cfq_blkiocg_update_timeslice_used(struct blkio_group *blkg,
-			struct blkio_policy_type *pol, unsigned long time,
-			unsigned long unaccounted_time)
+static inline void cfqg_stats_update_timeslice_used(struct cfq_group *cfqg,
+			unsigned long time, unsigned long unaccounted_time)
 {
-	struct blkio_group_stats *stats = &blkg->pd[pol->plid]->stats;
-
-	lockdep_assert_held(blkg->q->queue_lock);
-
-	blkg_stat_add(&stats->time, time);
+	blkg_stat_add(&cfqg->stats.time, time);
 #ifdef CONFIG_DEBUG_BLK_CGROUP
-	blkg_stat_add(&stats->unaccounted_time, unaccounted_time);
+	blkg_stat_add(&cfqg->stats.unaccounted_time, unaccounted_time);
 #endif
 }
 
-static inline void cfq_blkiocg_update_io_remove_stats(struct blkio_group *blkg,
-			struct blkio_policy_type *pol, bool direction,
-			bool sync)
+static inline void cfqg_stats_update_io_remove(struct cfq_group *cfqg, int rw)
 {
-	struct blkio_group_stats *stats = &blkg->pd[pol->plid]->stats;
-	int rw = (direction ? REQ_WRITE : 0) | (sync ? REQ_SYNC : 0);
-
-	lockdep_assert_held(blkg->q->queue_lock);
-
-	blkg_rwstat_add(&stats->queued, rw, -1);
+	blkg_rwstat_add(&cfqg->stats.queued, rw, -1);
 }
 
-static inline void cfq_blkiocg_update_io_merged_stats(struct blkio_group *blkg,
-			struct blkio_policy_type *pol, bool direction,
-			bool sync)
+static inline void cfqg_stats_update_io_merged(struct cfq_group *cfqg, int rw)
 {
-	struct blkio_group_stats *stats = &blkg->pd[pol->plid]->stats;
-	int rw = (direction ? REQ_WRITE : 0) | (sync ? REQ_SYNC : 0);
-
-	lockdep_assert_held(blkg->q->queue_lock);
-
-	blkg_rwstat_add(&stats->merged, rw, 1);
+	blkg_rwstat_add(&cfqg->stats.merged, rw, 1);
 }
 
-static inline void cfq_blkiocg_update_dispatch_stats(struct blkio_group *blkg,
-			struct blkio_policy_type *pol, uint64_t bytes,
-			bool direction, bool sync)
+static inline void cfqg_stats_update_dispatch(struct cfq_group *cfqg,
+					      uint64_t bytes, int rw)
 {
-	struct blkio_group_stats *stats = &blkg->pd[pol->plid]->stats;
-	int rw = (direction ? REQ_WRITE : 0) | (sync ? REQ_SYNC : 0);
-
-	blkg_stat_add(&stats->sectors, bytes >> 9);
-	blkg_rwstat_add(&stats->serviced, rw, 1);
-	blkg_rwstat_add(&stats->service_bytes, rw, bytes);
+	blkg_stat_add(&cfqg->stats.sectors, bytes >> 9);
+	blkg_rwstat_add(&cfqg->stats.serviced, rw, 1);
+	blkg_rwstat_add(&cfqg->stats.service_bytes, rw, bytes);
 }
 
-static inline void cfq_blkiocg_update_completion_stats(struct blkio_group *blkg,
-			struct blkio_policy_type *pol, uint64_t start_time,
-			uint64_t io_start_time, bool direction, bool sync)
+static inline void cfqg_stats_update_completion(struct cfq_group *cfqg,
+			uint64_t start_time, uint64_t io_start_time, int rw)
 {
-	struct blkio_group_stats *stats = &blkg->pd[pol->plid]->stats;
+	struct cfqg_stats *stats = &cfqg->stats;
 	unsigned long long now = sched_clock();
-	int rw = (direction ? REQ_WRITE : 0) | (sync ? REQ_SYNC : 0);
-
-	lockdep_assert_held(blkg->q->queue_lock);
 
 	if (time_after64(now, io_start_time))
 		blkg_rwstat_add(&stats->service_time, rw, now - io_start_time);
@@ -649,6 +631,29 @@ static inline void cfq_blkiocg_update_completion_stats(struct blkio_group *blkg,
 				io_start_time - start_time);
 }
 
+static void cfqg_stats_reset(struct blkio_group *blkg)
+{
+	struct cfq_group *cfqg = blkg_to_cfqg(blkg);
+	struct cfqg_stats *stats = &cfqg->stats;
+
+	/* queued stats shouldn't be cleared */
+	blkg_rwstat_reset(&stats->service_bytes);
+	blkg_rwstat_reset(&stats->serviced);
+	blkg_rwstat_reset(&stats->merged);
+	blkg_rwstat_reset(&stats->service_time);
+	blkg_rwstat_reset(&stats->wait_time);
+	blkg_stat_reset(&stats->time);
+#ifdef CONFIG_DEBUG_BLK_CGROUP
+	blkg_stat_reset(&stats->unaccounted_time);
+	blkg_stat_reset(&stats->avg_queue_size_sum);
+	blkg_stat_reset(&stats->avg_queue_size_samples);
+	blkg_stat_reset(&stats->dequeue);
+	blkg_stat_reset(&stats->group_wait_time);
+	blkg_stat_reset(&stats->idle_time);
+	blkg_stat_reset(&stats->empty_time);
+#endif
+}
+
 #else	/* CONFIG_CFQ_GROUP_IOSCHED */
 
 static inline struct cfq_group *blkg_to_cfqg(struct blkio_group *blkg) { return NULL; }
@@ -660,25 +665,16 @@ static inline void cfqg_put(struct cfq_group *cfqg) { }
 	blk_add_trace_msg((cfqd)->queue, "cfq%d " fmt, (cfqq)->pid, ##args)
 #define cfq_log_cfqg(cfqd, cfqg, fmt, args...)		do {} while (0)
 
-static inline void cfq_blkiocg_update_io_add_stats(struct blkio_group *blkg,
-			struct blkio_policy_type *pol,
-			struct blkio_group *curr_blkg, bool direction,
-			bool sync) { }
-static inline void cfq_blkiocg_update_timeslice_used(struct blkio_group *blkg,
-			struct blkio_policy_type *pol, unsigned long time,
-			unsigned long unaccounted_time) { }
-static inline void cfq_blkiocg_update_io_remove_stats(struct blkio_group *blkg,
-			struct blkio_policy_type *pol, bool direction,
-			bool sync) { }
-static inline void cfq_blkiocg_update_io_merged_stats(struct blkio_group *blkg,
-			struct blkio_policy_type *pol, bool direction,
-			bool sync) { }
-static inline void cfq_blkiocg_update_dispatch_stats(struct blkio_group *blkg,
-			struct blkio_policy_type *pol, uint64_t bytes,
-			bool direction, bool sync) { }
-static inline void cfq_blkiocg_update_completion_stats(struct blkio_group *blkg,
-			struct blkio_policy_type *pol, uint64_t start_time,
-			uint64_t io_start_time, bool direction, bool sync) { }
+static inline void cfqg_stats_update_io_add(struct cfq_group *cfqg,
+			struct cfq_group *curr_cfqg, int rw) { }
+static inline void cfqg_stats_update_timeslice_used(struct cfq_group *cfqg,
+			unsigned long time, unsigned long unaccounted_time) { }
+static inline void cfqg_stats_update_io_remove(struct cfq_group *cfqg, int rw) { }
+static inline void cfqg_stats_update_io_merged(struct cfq_group *cfqg, int rw) { }
+static inline void cfqg_stats_update_dispatch(struct cfq_group *cfqg,
+					      uint64_t bytes, int rw) { }
+static inline void cfqg_stats_update_completion(struct cfq_group *cfqg,
+			uint64_t start_time, uint64_t io_start_time, int rw) { }
 
 #endif	/* CONFIG_CFQ_GROUP_IOSCHED */
 
@@ -1233,8 +1229,7 @@ cfq_group_notify_queue_del(struct cfq_data *cfqd, struct cfq_group *cfqg)
 	cfq_log_cfqg(cfqd, cfqg, "del_from_rr group");
 	cfq_group_service_tree_del(st, cfqg);
 	cfqg->saved_workload_slice = 0;
-	cfq_blkiocg_update_dequeue_stats(cfqg_to_blkg(cfqg),
-					 &blkio_policy_cfq, 1);
+	cfqg_stats_update_dequeue(cfqg);
 }
 
 static inline unsigned int cfq_cfqq_slice_usage(struct cfq_queue *cfqq,
@@ -1306,9 +1301,8 @@ static void cfq_group_served(struct cfq_data *cfqd, struct cfq_group *cfqg,
 		     "sl_used=%u disp=%u charge=%u iops=%u sect=%lu",
 		     used_sl, cfqq->slice_dispatch, charge,
 		     iops_mode(cfqd), cfqq->nr_sectors);
-	cfq_blkiocg_update_timeslice_used(cfqg_to_blkg(cfqg), &blkio_policy_cfq,
-					  used_sl, unaccounted_sl);
-	cfq_blkiocg_set_start_empty_time(cfqg_to_blkg(cfqg), &blkio_policy_cfq);
+	cfqg_stats_update_timeslice_used(cfqg, used_sl, unaccounted_sl);
+	cfqg_stats_set_start_empty_time(cfqg);
 }
 
 /**
@@ -1456,14 +1450,15 @@ static int blkcg_set_weight(struct cgroup *cgrp, struct cftype *cft, u64 val)
 }
 
 #ifdef CONFIG_DEBUG_BLK_CGROUP
-static u64 blkg_prfill_avg_queue_size(struct seq_file *sf,
+static u64 cfqg_prfill_avg_queue_size(struct seq_file *sf,
 				      struct blkg_policy_data *pd, int off)
 {
-	u64 samples = blkg_stat_read(&pd->stats.avg_queue_size_samples);
+	struct cfq_group *cfqg = (void *)pd->pdata;
+	u64 samples = blkg_stat_read(&cfqg->stats.avg_queue_size_samples);
 	u64 v = 0;
 
 	if (samples) {
-		v = blkg_stat_read(&pd->stats.avg_queue_size_sum);
+		v = blkg_stat_read(&cfqg->stats.avg_queue_size_sum);
 		do_div(v, samples);
 	}
 	__blkg_prfill_u64(sf, pd, v);
@@ -1471,12 +1466,12 @@ static u64 blkg_prfill_avg_queue_size(struct seq_file *sf,
 }
 
 /* print avg_queue_size */
-static int blkcg_print_avg_queue_size(struct cgroup *cgrp, struct cftype *cft,
-				      struct seq_file *sf)
+static int cfqg_print_avg_queue_size(struct cgroup *cgrp, struct cftype *cft,
+				     struct seq_file *sf)
 {
 	struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp);
 
-	blkcg_print_blkgs(sf, blkcg, blkg_prfill_avg_queue_size,
+	blkcg_print_blkgs(sf, blkcg, cfqg_prfill_avg_queue_size,
 			  BLKIO_POLICY_PROP, 0, false);
 	return 0;
 }
@@ -1497,84 +1492,84 @@ static struct cftype cfq_blkcg_files[] = {
 	{
 		.name = "time",
 		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
-				offsetof(struct blkio_group_stats, time)),
+				offsetof(struct cfq_group, stats.time)),
 		.read_seq_string = blkcg_print_stat,
 	},
 	{
 		.name = "sectors",
 		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
-				offsetof(struct blkio_group_stats, sectors)),
+				offsetof(struct cfq_group, stats.sectors)),
 		.read_seq_string = blkcg_print_stat,
 	},
 	{
 		.name = "io_service_bytes",
 		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
-				offsetof(struct blkio_group_stats, service_bytes)),
+				offsetof(struct cfq_group, stats.service_bytes)),
 		.read_seq_string = blkcg_print_rwstat,
 	},
 	{
 		.name = "io_serviced",
 		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
-				offsetof(struct blkio_group_stats, serviced)),
+				offsetof(struct cfq_group, stats.serviced)),
 		.read_seq_string = blkcg_print_rwstat,
 	},
 	{
 		.name = "io_service_time",
 		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
-				offsetof(struct blkio_group_stats, service_time)),
+				offsetof(struct cfq_group, stats.service_time)),
 		.read_seq_string = blkcg_print_rwstat,
 	},
 	{
 		.name = "io_wait_time",
 		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
-				offsetof(struct blkio_group_stats, wait_time)),
+				offsetof(struct cfq_group, stats.wait_time)),
 		.read_seq_string = blkcg_print_rwstat,
 	},
 	{
 		.name = "io_merged",
 		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
-				offsetof(struct blkio_group_stats, merged)),
+				offsetof(struct cfq_group, stats.merged)),
 		.read_seq_string = blkcg_print_rwstat,
 	},
 	{
 		.name = "io_queued",
 		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
-				offsetof(struct blkio_group_stats, queued)),
+				offsetof(struct cfq_group, stats.queued)),
 		.read_seq_string = blkcg_print_rwstat,
 	},
 #ifdef CONFIG_DEBUG_BLK_CGROUP
 	{
 		.name = "avg_queue_size",
-		.read_seq_string = blkcg_print_avg_queue_size,
+		.read_seq_string = cfqg_print_avg_queue_size,
 	},
 	{
 		.name = "group_wait_time",
 		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
-				offsetof(struct blkio_group_stats, group_wait_time)),
+				offsetof(struct cfq_group, stats.group_wait_time)),
 		.read_seq_string = blkcg_print_stat,
 	},
 	{
 		.name = "idle_time",
 		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
-				offsetof(struct blkio_group_stats, idle_time)),
+				offsetof(struct cfq_group, stats.idle_time)),
 		.read_seq_string = blkcg_print_stat,
 	},
 	{
 		.name = "empty_time",
 		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
-				offsetof(struct blkio_group_stats, empty_time)),
+				offsetof(struct cfq_group, stats.empty_time)),
 		.read_seq_string = blkcg_print_stat,
 	},
 	{
 		.name = "dequeue",
 		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
-				offsetof(struct blkio_group_stats, dequeue)),
+				offsetof(struct cfq_group, stats.dequeue)),
 		.read_seq_string = blkcg_print_stat,
 	},
 	{
 		.name = "unaccounted_time",
 		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
-				offsetof(struct blkio_group_stats, unaccounted_time)),
+				offsetof(struct cfq_group, stats.unaccounted_time)),
 		.read_seq_string = blkcg_print_stat,
 	},
 #endif	/* CONFIG_DEBUG_BLK_CGROUP */
@@ -1858,14 +1853,10 @@ static void cfq_reposition_rq_rb(struct cfq_queue *cfqq, struct request *rq)
 {
 	elv_rb_del(&cfqq->sort_list, rq);
 	cfqq->queued[rq_is_sync(rq)]--;
-	cfq_blkiocg_update_io_remove_stats(cfqg_to_blkg(RQ_CFQG(rq)),
-					   &blkio_policy_cfq, rq_data_dir(rq),
-					   rq_is_sync(rq));
+	cfqg_stats_update_io_remove(RQ_CFQG(rq), rq->cmd_flags);
 	cfq_add_rq_rb(rq);
-	cfq_blkiocg_update_io_add_stats(cfqg_to_blkg(RQ_CFQG(rq)),
-					&blkio_policy_cfq,
-					cfqg_to_blkg(cfqq->cfqd->serving_group),
-					rq_data_dir(rq), rq_is_sync(rq));
+	cfqg_stats_update_io_add(RQ_CFQG(rq), cfqq->cfqd->serving_group,
+				 rq->cmd_flags);
 }
 
 static struct request *
@@ -1921,9 +1912,7 @@ static void cfq_remove_request(struct request *rq)
 	cfq_del_rq_rb(rq);
 
 	cfqq->cfqd->rq_queued--;
-	cfq_blkiocg_update_io_remove_stats(cfqg_to_blkg(RQ_CFQG(rq)),
-					   &blkio_policy_cfq, rq_data_dir(rq),
-					   rq_is_sync(rq));
+	cfqg_stats_update_io_remove(RQ_CFQG(rq), rq->cmd_flags);
 	if (rq->cmd_flags & REQ_PRIO) {
 		WARN_ON(!cfqq->prio_pending);
 		cfqq->prio_pending--;
@@ -1958,9 +1947,7 @@ static void cfq_merged_request(struct request_queue *q, struct request *req,
 static void cfq_bio_merged(struct request_queue *q, struct request *req,
 				struct bio *bio)
 {
-	cfq_blkiocg_update_io_merged_stats(cfqg_to_blkg(RQ_CFQG(req)),
-					   &blkio_policy_cfq, bio_data_dir(bio),
-					   cfq_bio_sync(bio));
+	cfqg_stats_update_io_merged(RQ_CFQG(req), bio->bi_rw);
 }
 
 static void
@@ -1982,9 +1969,7 @@ cfq_merged_requests(struct request_queue *q, struct request *rq,
 	if (cfqq->next_rq == next)
 		cfqq->next_rq = rq;
 	cfq_remove_request(next);
-	cfq_blkiocg_update_io_merged_stats(cfqg_to_blkg(RQ_CFQG(rq)),
-					   &blkio_policy_cfq, rq_data_dir(next),
-					   rq_is_sync(next));
+	cfqg_stats_update_io_merged(RQ_CFQG(rq), next->cmd_flags);
 
 	cfqq = RQ_CFQQ(next);
 	/*
@@ -2025,8 +2010,7 @@ static int cfq_allow_merge(struct request_queue *q, struct request *rq,
 static inline void cfq_del_timer(struct cfq_data *cfqd, struct cfq_queue *cfqq)
 {
 	del_timer(&cfqd->idle_slice_timer);
-	cfq_blkiocg_update_idle_time_stats(cfqg_to_blkg(cfqq->cfqg),
-					   &blkio_policy_cfq);
+	cfqg_stats_update_idle_time(cfqq->cfqg);
 }
 
 static void __cfq_set_active_queue(struct cfq_data *cfqd,
@@ -2035,8 +2019,7 @@ static void __cfq_set_active_queue(struct cfq_data *cfqd,
 	if (cfqq) {
 		cfq_log_cfqq(cfqd, cfqq, "set_active wl_prio:%d wl_type:%d",
 				cfqd->serving_prio, cfqd->serving_type);
-		cfq_blkiocg_update_avg_queue_size_stats(cfqg_to_blkg(cfqq->cfqg),
-							&blkio_policy_cfq);
+		cfqg_stats_update_avg_queue_size(cfqq->cfqg);
 		cfqq->slice_start = 0;
 		cfqq->dispatch_start = jiffies;
 		cfqq->allocated_slice = 0;
@@ -2384,8 +2367,7 @@ static void cfq_arm_slice_timer(struct cfq_data *cfqd)
 		sl = cfqd->cfq_slice_idle;
 
 	mod_timer(&cfqd->idle_slice_timer, jiffies + sl);
-	cfq_blkiocg_update_set_idle_time_stats(cfqg_to_blkg(cfqq->cfqg),
-					       &blkio_policy_cfq);
+	cfqg_stats_set_start_idle_time(cfqq->cfqg);
 	cfq_log_cfqq(cfqd, cfqq, "arm_idle: %lu group_idle: %d", sl,
 			group_idle ? 1 : 0);
 }
@@ -2408,9 +2390,7 @@ static void cfq_dispatch_insert(struct request_queue *q, struct request *rq)
 
 	cfqd->rq_in_flight[cfq_cfqq_sync(cfqq)]++;
 	cfqq->nr_sectors += blk_rq_sectors(rq);
-	cfq_blkiocg_update_dispatch_stats(cfqg_to_blkg(cfqq->cfqg),
-					  &blkio_policy_cfq, blk_rq_bytes(rq),
-					  rq_data_dir(rq), rq_is_sync(rq));
+	cfqg_stats_update_dispatch(cfqq->cfqg, blk_rq_bytes(rq), rq->cmd_flags);
 }
 
 /*
@@ -3513,9 +3493,7 @@ cfq_rq_enqueued(struct cfq_data *cfqd, struct cfq_queue *cfqq,
 				cfq_clear_cfqq_wait_request(cfqq);
 				__blk_run_queue(cfqd->queue);
 			} else {
-				cfq_blkiocg_update_idle_time_stats(
-						cfqg_to_blkg(cfqq->cfqg),
-						&blkio_policy_cfq);
+				cfqg_stats_update_idle_time(cfqq->cfqg);
 				cfq_mark_cfqq_must_dispatch(cfqq);
 			}
 		}
@@ -3542,10 +3520,8 @@ static void cfq_insert_request(struct request_queue *q, struct request *rq)
 	rq_set_fifo_time(rq, jiffies + cfqd->cfq_fifo_expire[rq_is_sync(rq)]);
 	list_add_tail(&rq->queuelist, &cfqq->fifo);
 	cfq_add_rq_rb(rq);
-	cfq_blkiocg_update_io_add_stats(cfqg_to_blkg(RQ_CFQG(rq)),
-					&blkio_policy_cfq,
-					cfqg_to_blkg(cfqd->serving_group),
-					rq_data_dir(rq), rq_is_sync(rq));
+	cfqg_stats_update_io_add(RQ_CFQG(rq), cfqd->serving_group,
+				 rq->cmd_flags);
 	cfq_rq_enqueued(cfqd, cfqq, rq);
 }
 
@@ -3641,10 +3617,8 @@ static void cfq_completed_request(struct request_queue *q, struct request *rq)
 	cfqd->rq_in_driver--;
 	cfqq->dispatched--;
 	(RQ_CFQG(rq))->dispatched--;
-	cfq_blkiocg_update_completion_stats(cfqg_to_blkg(cfqq->cfqg),
-			&blkio_policy_cfq, rq_start_time_ns(rq),
-			rq_io_start_time_ns(rq), rq_data_dir(rq),
-			rq_is_sync(rq));
+	cfqg_stats_update_completion(cfqq->cfqg, rq_start_time_ns(rq),
+				     rq_io_start_time_ns(rq), rq->cmd_flags);
 
 	cfqd->rq_in_flight[cfq_cfqq_sync(cfqq)]--;
 
@@ -4184,6 +4158,7 @@ static struct elevator_type iosched_cfq = {
 static struct blkio_policy_type blkio_policy_cfq = {
 	.ops = {
 		.blkio_init_group_fn =		cfq_init_blkio_group,
+		.blkio_reset_group_stats_fn =	cfqg_stats_reset,
 	},
 	.plid = BLKIO_POLICY_PROP,
 	.pdata_size = sizeof(struct cfq_group),
-- 
1.7.7.3


* [PATCH 17/21] blkcg: move blkio_group_stats_cpu and friends to blk-throttle.c
  2012-03-28 22:51 [PATCHSET] block: modularize blkcg config and stat file handling Tejun Heo
                   ` (15 preceding siblings ...)
  2012-03-28 22:51 ` [PATCH 16/21] blkcg: move blkio_group_stats to cfq-iosched.c Tejun Heo
@ 2012-03-28 22:51 ` Tejun Heo
  2012-03-28 22:51 ` [PATCH 18/21] blkcg: move blkio_group_conf->weight to cfq Tejun Heo
                   ` (6 subsequent siblings)
  23 siblings, 0 replies; 56+ messages in thread
From: Tejun Heo @ 2012-03-28 22:51 UTC (permalink / raw)
  To: axboe; +Cc: vgoyal, ctalbott, rni, linux-kernel, cgroups, containers, Tejun Heo

blkio_group_stats_cpu is used only by blk-throtl and has no reason to
be defined in blkcg core.

* Move blkio_group_stats_cpu to blk-throttle.c and rename it to
  tg_stats_cpu.

* blkg_policy_data->stats_cpu is replaced with throtl_grp->stats_cpu.
  The prfill functions are updated accordingly.

* All related macros / functions are renamed so that they carry a
  tg_ prefix, and the unnecessary @pol arguments are dropped.

* The per-cpu stats allocation code is also moved from blk-cgroup.c to
  blk-throttle.c and simplified to deal only with BLKIO_POLICY_THROTL
  (sketched after this list).  Freeing of the per-cpu stats is handled
  by the exit method, throtl_exit_blkio_group().

* throtl_reset_group_stats() implemented for
  blkio_reset_group_stats_fn method so that tg->stats_cpu can be
  reset.
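
The allocation deferral mentioned above boils down to the following
(both fragments are taken from the diff that follows); the update path
simply skips accounting while tg->stats_cpu is still NULL:

  /* throtl_init_blkio_group(): can't alloc_percpu() from the IO path,
   * so queue the group and kick the worker */
  spin_lock(&tg_stats_alloc_lock);
  list_add(&tg->stats_alloc_node, &tg_stats_alloc_list);
  queue_delayed_work(system_nrt_wq, &tg_stats_alloc_work, 0);
  spin_unlock(&tg_stats_alloc_lock);

  /* tg_stats_alloc_fn(): the worker hands over a pre-allocated chunk */
  swap(tg->stats_cpu, stats_cpu);
  list_del_init(&tg->stats_alloc_node);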

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 block/blk-cgroup.c   |   98 +--------------------------------------
 block/blk-cgroup.h   |   13 -----
 block/blk-throttle.c |  128 ++++++++++++++++++++++++++++++++++++++++++++------
 3 files changed, 114 insertions(+), 125 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index d7c7f17..0b4b765 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -30,13 +30,6 @@ static LIST_HEAD(blkio_list);
 static DEFINE_MUTEX(all_q_mutex);
 static LIST_HEAD(all_q_list);
 
-/* List of groups pending per cpu stats allocation */
-static DEFINE_SPINLOCK(alloc_list_lock);
-static LIST_HEAD(alloc_list);
-
-static void blkio_stat_alloc_fn(struct work_struct *);
-static DECLARE_DELAYED_WORK(blkio_stat_alloc_work, blkio_stat_alloc_fn);
-
 struct blkio_cgroup blkio_root_cgroup = { .weight = 2*BLKIO_WEIGHT_DEFAULT };
 EXPORT_SYMBOL_GPL(blkio_root_cgroup);
 
@@ -63,60 +56,6 @@ struct blkio_cgroup *bio_blkio_cgroup(struct bio *bio)
 }
 EXPORT_SYMBOL_GPL(bio_blkio_cgroup);
 
-/*
- * Worker for allocating per cpu stat for blk groups. This is scheduled on
- * the system_nrt_wq once there are some groups on the alloc_list waiting
- * for allocation.
- */
-static void blkio_stat_alloc_fn(struct work_struct *work)
-{
-	static void *pcpu_stats[BLKIO_NR_POLICIES];
-	struct delayed_work *dwork = to_delayed_work(work);
-	struct blkio_group *blkg;
-	int i;
-	bool empty = false;
-
-alloc_stats:
-	for (i = 0; i < BLKIO_NR_POLICIES; i++) {
-		if (pcpu_stats[i] != NULL)
-			continue;
-
-		pcpu_stats[i] = alloc_percpu(struct blkio_group_stats_cpu);
-
-		/* Allocation failed. Try again after some time. */
-		if (pcpu_stats[i] == NULL) {
-			queue_delayed_work(system_nrt_wq, dwork,
-						msecs_to_jiffies(10));
-			return;
-		}
-	}
-
-	spin_lock_irq(&blkio_list_lock);
-	spin_lock(&alloc_list_lock);
-
-	/* cgroup got deleted or queue exited. */
-	if (!list_empty(&alloc_list)) {
-		blkg = list_first_entry(&alloc_list, struct blkio_group,
-						alloc_node);
-		for (i = 0; i < BLKIO_NR_POLICIES; i++) {
-			struct blkg_policy_data *pd = blkg->pd[i];
-
-			if (blkio_policy[i] && pd && !pd->stats_cpu)
-				swap(pd->stats_cpu, pcpu_stats[i]);
-		}
-
-		list_del_init(&blkg->alloc_node);
-	}
-
-	empty = list_empty(&alloc_list);
-
-	spin_unlock(&alloc_list_lock);
-	spin_unlock_irq(&blkio_list_lock);
-
-	if (!empty)
-		goto alloc_stats;
-}
-
 /**
  * blkg_free - free a blkg
  * @blkg: blkg to free
@@ -140,7 +79,6 @@ static void blkg_free(struct blkio_group *blkg)
 		if (pol && pol->ops.blkio_exit_group_fn)
 			pol->ops.blkio_exit_group_fn(blkg);
 
-		free_percpu(pd->stats_cpu);
 		kfree(pd);
 	}
 
@@ -167,7 +105,6 @@ static struct blkio_group *blkg_alloc(struct blkio_cgroup *blkcg,
 
 	blkg->q = q;
 	INIT_LIST_HEAD(&blkg->q_node);
-	INIT_LIST_HEAD(&blkg->alloc_node);
 	blkg->blkcg = blkcg;
 	blkg->refcnt = 1;
 	cgroup_path(blkcg->css.cgroup, blkg->path, sizeof(blkg->path));
@@ -245,12 +182,6 @@ struct blkio_group *blkg_lookup_create(struct blkio_cgroup *blkcg,
 	hlist_add_head_rcu(&blkg->blkcg_node, &blkcg->blkg_list);
 	list_add(&blkg->q_node, &q->blkg_list);
 	spin_unlock(&blkcg->lock);
-
-	spin_lock(&alloc_list_lock);
-	list_add(&blkg->alloc_node, &alloc_list);
-	/* Queue per cpu stat allocation from worker thread. */
-	queue_delayed_work(system_nrt_wq, &blkio_stat_alloc_work, 0);
-	spin_unlock(&alloc_list_lock);
 out:
 	return blkg;
 }
@@ -284,10 +215,6 @@ static void blkg_destroy(struct blkio_group *blkg)
 	list_del_init(&blkg->q_node);
 	hlist_del_init_rcu(&blkg->blkcg_node);
 
-	spin_lock(&alloc_list_lock);
-	list_del_init(&blkg->alloc_node);
-	spin_unlock(&alloc_list_lock);
-
 	/*
 	 * Put the reference taken at the time of creation so that when all
 	 * queues are gone, group can be destroyed.
@@ -319,9 +246,6 @@ void update_root_blkg_pd(struct request_queue *q, enum blkio_policy_id plid)
 	pd = kzalloc(sizeof(*pd) + pol->pdata_size, GFP_KERNEL);
 	WARN_ON_ONCE(!pd);
 
-	pd->stats_cpu = alloc_percpu(struct blkio_group_stats_cpu);
-	WARN_ON_ONCE(!pd->stats_cpu);
-
 	blkg->pd[plid] = pd;
 	pd->blkg = blkg;
 	pol->ops.blkio_init_group_fn(blkg);
@@ -381,23 +305,6 @@ void __blkg_release(struct blkio_group *blkg)
 }
 EXPORT_SYMBOL_GPL(__blkg_release);
 
-static void blkio_reset_stats_cpu(struct blkio_group *blkg, int plid)
-{
-	struct blkg_policy_data *pd = blkg->pd[plid];
-	int cpu;
-
-	if (pd->stats_cpu == NULL)
-		return;
-
-	for_each_possible_cpu(cpu) {
-		struct blkio_group_stats_cpu *sc =
-			per_cpu_ptr(pd->stats_cpu, cpu);
-
-		blkg_rwstat_reset(&sc->service_bytes);
-		blkg_rwstat_reset(&sc->serviced);
-	}
-}
-
 static int
 blkiocg_reset_stats(struct cgroup *cgroup, struct cftype *cftype, u64 val)
 {
@@ -416,12 +323,9 @@ blkiocg_reset_stats(struct cgroup *cgroup, struct cftype *cftype, u64 val)
 	hlist_for_each_entry(blkg, n, &blkcg->blkg_list, blkcg_node) {
 		struct blkio_policy_type *pol;
 
-		list_for_each_entry(pol, &blkio_list, list) {
-			blkio_reset_stats_cpu(blkg, pol->plid);
-
+		list_for_each_entry(pol, &blkio_list, list)
 			if (pol->ops.blkio_reset_group_stats_fn)
 				pol->ops.blkio_reset_group_stats_fn(blkg);
-		}
 	}
 
 	spin_unlock_irq(&blkcg->lock);
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index 791570394..e368dd00 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -64,14 +64,6 @@ struct blkg_rwstat {
 	uint64_t			cnt[BLKG_RWSTAT_NR];
 };
 
-/* Per cpu blkio group stats */
-struct blkio_group_stats_cpu {
-	/* total bytes transferred */
-	struct blkg_rwstat		service_bytes;
-	/* total IOs serviced, post merge */
-	struct blkg_rwstat		serviced;
-};
-
 struct blkio_group_conf {
 	unsigned int weight;
 	u64 iops[2];
@@ -86,9 +78,6 @@ struct blkg_policy_data {
 	/* Configuration */
 	struct blkio_group_conf conf;
 
-	/* Per cpu stats pointer */
-	struct blkio_group_stats_cpu __percpu *stats_cpu;
-
 	/* pol->pdata_size bytes of private data used by policy impl */
 	char pdata[] __aligned(__alignof__(unsigned long long));
 };
@@ -106,8 +95,6 @@ struct blkio_group {
 
 	struct blkg_policy_data *pd[BLKIO_NR_POLICIES];
 
-	/* List of blkg waiting for per cpu stats memory to be allocated */
-	struct list_head alloc_node;
 	struct rcu_head rcu_head;
 };
 
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index cb259bc..27f7960 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -40,6 +40,14 @@ struct throtl_rb_root {
 
 #define rb_entry_tg(node)	rb_entry((node), struct throtl_grp, rb_node)
 
+/* Per-cpu group stats */
+struct tg_stats_cpu {
+	/* total bytes transferred */
+	struct blkg_rwstat		service_bytes;
+	/* total IOs serviced, post merge */
+	struct blkg_rwstat		serviced;
+};
+
 struct throtl_grp {
 	/* active throtl group service_tree member */
 	struct rb_node rb_node;
@@ -76,6 +84,12 @@ struct throtl_grp {
 
 	/* Some throttle limits got updated for the group */
 	int limits_changed;
+
+	/* Per cpu stats pointer */
+	struct tg_stats_cpu __percpu *stats_cpu;
+
+	/* List of tgs waiting for per cpu stats memory to be allocated */
+	struct list_head stats_alloc_node;
 };
 
 struct throtl_data
@@ -100,6 +114,13 @@ struct throtl_data
 	int limits_changed;
 };
 
+/* list and work item to allocate percpu group stats */
+static DEFINE_SPINLOCK(tg_stats_alloc_lock);
+static LIST_HEAD(tg_stats_alloc_list);
+
+static void tg_stats_alloc_fn(struct work_struct *);
+static DECLARE_DELAYED_WORK(tg_stats_alloc_work, tg_stats_alloc_fn);
+
 static inline struct throtl_grp *blkg_to_tg(struct blkio_group *blkg)
 {
 	return blkg_to_pdata(blkg, &blkio_policy_throtl);
@@ -142,6 +163,44 @@ static inline unsigned int total_nr_queued(struct throtl_data *td)
 	return td->nr_queued[0] + td->nr_queued[1];
 }
 
+/*
+ * Worker for allocating per cpu stat for tgs. This is scheduled on the
+ * system_nrt_wq once there are some groups on the alloc_list waiting for
+ * allocation.
+ */
+static void tg_stats_alloc_fn(struct work_struct *work)
+{
+	static struct tg_stats_cpu *stats_cpu;	/* this fn is non-reentrant */
+	struct delayed_work *dwork = to_delayed_work(work);
+	bool empty = false;
+
+alloc_stats:
+	if (!stats_cpu) {
+		stats_cpu = alloc_percpu(struct tg_stats_cpu);
+		if (!stats_cpu) {
+			/* allocation failed, try again after some time */
+			queue_delayed_work(system_nrt_wq, dwork,
+					   msecs_to_jiffies(10));
+			return;
+		}
+	}
+
+	spin_lock_irq(&tg_stats_alloc_lock);
+
+	if (!list_empty(&tg_stats_alloc_list)) {
+		struct throtl_grp *tg = list_first_entry(&tg_stats_alloc_list,
+							 struct throtl_grp,
+							 stats_alloc_node);
+		swap(tg->stats_cpu, stats_cpu);
+		list_del_init(&tg->stats_alloc_node);
+	}
+
+	empty = list_empty(&tg_stats_alloc_list);
+	spin_unlock_irq(&tg_stats_alloc_lock);
+	if (!empty)
+		goto alloc_stats;
+}
+
 static void throtl_init_blkio_group(struct blkio_group *blkg)
 {
 	struct throtl_grp *tg = blkg_to_tg(blkg);
@@ -155,6 +214,43 @@ static void throtl_init_blkio_group(struct blkio_group *blkg)
 	tg->bps[WRITE] = -1;
 	tg->iops[READ] = -1;
 	tg->iops[WRITE] = -1;
+
+	/*
+	 * Ugh... We need to perform per-cpu allocation for tg->stats_cpu
+	 * but percpu allocator can't be called from IO path.  Queue tg on
+	 * tg_stats_alloc_list and allocate from work item.
+	 */
+	spin_lock(&tg_stats_alloc_lock);
+	list_add(&tg->stats_alloc_node, &tg_stats_alloc_list);
+	queue_delayed_work(system_nrt_wq, &tg_stats_alloc_work, 0);
+	spin_unlock(&tg_stats_alloc_lock);
+}
+
+static void throtl_exit_blkio_group(struct blkio_group *blkg)
+{
+	struct throtl_grp *tg = blkg_to_tg(blkg);
+
+	spin_lock(&tg_stats_alloc_lock);
+	list_del_init(&tg->stats_alloc_node);
+	spin_unlock(&tg_stats_alloc_lock);
+
+	free_percpu(tg->stats_cpu);
+}
+
+static void throtl_reset_group_stats(struct blkio_group *blkg)
+{
+	struct throtl_grp *tg = blkg_to_tg(blkg);
+	int cpu;
+
+	if (tg->stats_cpu == NULL)
+		return;
+
+	for_each_possible_cpu(cpu) {
+		struct tg_stats_cpu *sc = per_cpu_ptr(tg->stats_cpu, cpu);
+
+		blkg_rwstat_reset(&sc->service_bytes);
+		blkg_rwstat_reset(&sc->serviced);
+	}
 }
 
 static struct
@@ -565,12 +661,12 @@ static bool tg_may_dispatch(struct throtl_data *td, struct throtl_grp *tg,
 static void throtl_update_dispatch_stats(struct blkio_group *blkg, u64 bytes,
 					 int rw)
 {
-	struct blkg_policy_data *pd = blkg->pd[BLKIO_POLICY_THROTL];
-	struct blkio_group_stats_cpu *stats_cpu;
+	struct throtl_grp *tg = blkg_to_tg(blkg);
+	struct tg_stats_cpu *stats_cpu;
 	unsigned long flags;
 
 	/* If per cpu stats are not allocated yet, don't do any accounting. */
-	if (pd->stats_cpu == NULL)
+	if (tg->stats_cpu == NULL)
 		return;
 
 	/*
@@ -580,7 +676,7 @@ static void throtl_update_dispatch_stats(struct blkio_group *blkg, u64 bytes,
 	 */
 	local_irq_save(flags);
 
-	stats_cpu = this_cpu_ptr(pd->stats_cpu);
+	stats_cpu = this_cpu_ptr(tg->stats_cpu);
 
 	blkg_rwstat_add(&stats_cpu->serviced, rw, 1);
 	blkg_rwstat_add(&stats_cpu->service_bytes, rw, bytes);
@@ -842,15 +938,15 @@ static void throtl_update_blkio_group_common(struct throtl_data *td,
 	throtl_schedule_delayed_work(td, 0);
 }
 
-static u64 blkg_prfill_cpu_rwstat(struct seq_file *sf,
-				  struct blkg_policy_data *pd, int off)
+static u64 tg_prfill_cpu_rwstat(struct seq_file *sf,
+				struct blkg_policy_data *pd, int off)
 {
+	struct throtl_grp *tg = (void *)pd->pdata;
 	struct blkg_rwstat rwstat = { }, tmp;
 	int i, cpu;
 
 	for_each_possible_cpu(cpu) {
-		struct blkio_group_stats_cpu *sc =
-			per_cpu_ptr(pd->stats_cpu, cpu);
+		struct tg_stats_cpu *sc = per_cpu_ptr(tg->stats_cpu, cpu);
 
 		tmp = blkg_rwstat_read((void *)sc + off);
 		for (i = 0; i < BLKG_RWSTAT_NR; i++)
@@ -861,12 +957,12 @@ static u64 blkg_prfill_cpu_rwstat(struct seq_file *sf,
 }
 
 /* print per-cpu blkg_rwstat specified by BLKCG_STAT_PRIV() */
-static int blkcg_print_cpu_rwstat(struct cgroup *cgrp, struct cftype *cft,
-				  struct seq_file *sf)
+static int tg_print_cpu_rwstat(struct cgroup *cgrp, struct cftype *cft,
+			       struct seq_file *sf)
 {
 	struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp);
 
-	blkcg_print_blkgs(sf, blkcg, blkg_prfill_cpu_rwstat,
+	blkcg_print_blkgs(sf, blkcg, tg_prfill_cpu_rwstat,
 			  BLKCG_STAT_POL(cft->private),
 			  BLKCG_STAT_OFF(cft->private), true);
 	return 0;
@@ -1012,14 +1108,14 @@ static struct cftype throtl_files[] = {
 	{
 		.name = "throttle.io_service_bytes",
 		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_THROTL,
-				offsetof(struct blkio_group_stats_cpu, service_bytes)),
-		.read_seq_string = blkcg_print_cpu_rwstat,
+				offsetof(struct tg_stats_cpu, service_bytes)),
+		.read_seq_string = tg_print_cpu_rwstat,
 	},
 	{
 		.name = "throttle.io_serviced",
 		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_THROTL,
-				offsetof(struct blkio_group_stats_cpu, serviced)),
-		.read_seq_string = blkcg_print_cpu_rwstat,
+				offsetof(struct tg_stats_cpu, serviced)),
+		.read_seq_string = tg_print_cpu_rwstat,
 	},
 	{ }	/* terminate */
 };
@@ -1034,6 +1130,8 @@ static void throtl_shutdown_wq(struct request_queue *q)
 static struct blkio_policy_type blkio_policy_throtl = {
 	.ops = {
 		.blkio_init_group_fn = throtl_init_blkio_group,
+		.blkio_exit_group_fn = throtl_exit_blkio_group,
+		.blkio_reset_group_stats_fn = throtl_reset_group_stats,
 	},
 	.plid = BLKIO_POLICY_THROTL,
 	.pdata_size = sizeof(struct throtl_grp),
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 18/21] blkcg: move blkio_group_conf->weight to cfq
  2012-03-28 22:51 [PATCHSET] block: modularize blkcg config and stat file handling Tejun Heo
                   ` (16 preceding siblings ...)
  2012-03-28 22:51 ` [PATCH 17/21] blkcg: move blkio_group_stats_cpu and friends to blk-throttle.c Tejun Heo
@ 2012-03-28 22:51 ` Tejun Heo
  2012-04-01 21:09   ` Vivek Goyal
  2012-04-02 21:39   ` Tao Ma
  2012-03-28 22:51 ` [PATCH 19/21] blkcg: move blkio_group_conf->iops and ->bps to blk-throttle Tejun Heo
                   ` (5 subsequent siblings)
  23 siblings, 2 replies; 56+ messages in thread
From: Tejun Heo @ 2012-03-28 22:51 UTC (permalink / raw)
  To: axboe; +Cc: vgoyal, ctalbott, rni, linux-kernel, cgroups, containers, Tejun Heo

blkio_group_conf->weight is owned by cfq and has no reason to be
defined in blkcg core.  Replace it with cfq_group->dev_weight and let
conf setting functions directly set it.  If dev_weight is zero, the
cfqg doesn't have device specific weight configured.

Also, rename BLKIO_WEIGHT_* constants to CFQ_WEIGHT_* and rename
blkio_cgroup->weight to blkio_cgroup->cfq_weight.  We eventually want
per-policy storage in blkio_cgroup but just mark the ownership of the
field for now.
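
For illustration only (not part of the patch), a minimal userspace
sketch of the fallback semantics described above; the two structs are
cut-down stand-ins for the real blkcg and cfq ones:

#include <stdio.h>

/* illustrative stand-ins, not the kernel definitions */
struct blkio_cgroup { unsigned int cfq_weight; };
struct cfq_group { unsigned int dev_weight; };

/* dev_weight == 0 means "no per-device override", fall back to blkcg */
static unsigned int effective_weight(const struct cfq_group *cfqg,
				     const struct blkio_cgroup *blkcg)
{
	return cfqg->dev_weight ? cfqg->dev_weight : blkcg->cfq_weight;
}

int main(void)
{
	struct blkio_cgroup blkcg = { .cfq_weight = 500 };
	struct cfq_group overridden = { .dev_weight = 200 };
	struct cfq_group plain = { .dev_weight = 0 };

	printf("%u %u\n", effective_weight(&overridden, &blkcg),
	       effective_weight(&plain, &blkcg));	/* prints "200 500" */
	return 0;
}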

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 block/blk-cgroup.c  |    4 +-
 block/blk-cgroup.h  |   14 +++++----
 block/cfq-iosched.c |   77 +++++++++++++++++++++++---------------------------
 3 files changed, 45 insertions(+), 50 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 0b4b765..7688aef 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -30,7 +30,7 @@ static LIST_HEAD(blkio_list);
 static DEFINE_MUTEX(all_q_mutex);
 static LIST_HEAD(all_q_list);
 
-struct blkio_cgroup blkio_root_cgroup = { .weight = 2*BLKIO_WEIGHT_DEFAULT };
+struct blkio_cgroup blkio_root_cgroup = { .cfq_weight = 2 * CFQ_WEIGHT_DEFAULT };
 EXPORT_SYMBOL_GPL(blkio_root_cgroup);
 
 static struct blkio_policy_type *blkio_policy[BLKIO_NR_POLICIES];
@@ -611,7 +611,7 @@ static struct cgroup_subsys_state *blkiocg_create(struct cgroup *cgroup)
 	if (!blkcg)
 		return ERR_PTR(-ENOMEM);
 
-	blkcg->weight = BLKIO_WEIGHT_DEFAULT;
+	blkcg->cfq_weight = CFQ_WEIGHT_DEFAULT;
 	blkcg->id = atomic64_inc_return(&id_seq); /* root is 0, start from 1 */
 done:
 	spin_lock_init(&blkcg->lock);
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index e368dd00..386db29 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -29,6 +29,11 @@ enum blkio_policy_id {
 
 #ifdef CONFIG_BLK_CGROUP
 
+/* CFQ specific, out here for blkcg->cfq_weight */
+#define CFQ_WEIGHT_MIN		10
+#define CFQ_WEIGHT_MAX		1000
+#define CFQ_WEIGHT_DEFAULT	500
+
 /* cft->private [un]packing for stat printing */
 #define BLKCG_STAT_PRIV(pol, off)	(((unsigned)(pol) << 16) | (off))
 #define BLKCG_STAT_POL(prv)		((unsigned)(prv) >> 16)
@@ -46,12 +51,14 @@ enum blkg_rwstat_type {
 
 struct blkio_cgroup {
 	struct cgroup_subsys_state css;
-	unsigned int weight;
 	spinlock_t lock;
 	struct hlist_head blkg_list;
 
 	/* for policies to test whether associated blkcg has changed */
 	uint64_t id;
+
+	/* TODO: per-policy storage in blkio_cgroup */
+	unsigned int cfq_weight;	/* belongs to cfq */
 };
 
 struct blkg_stat {
@@ -65,7 +72,6 @@ struct blkg_rwstat {
 };
 
 struct blkio_group_conf {
-	unsigned int weight;
 	u64 iops[2];
 	u64 bps[2];
 };
@@ -355,10 +361,6 @@ static inline void blkg_put(struct blkio_group *blkg) { }
 
 #endif
 
-#define BLKIO_WEIGHT_MIN	10
-#define BLKIO_WEIGHT_MAX	1000
-#define BLKIO_WEIGHT_DEFAULT	500
-
 #ifdef CONFIG_BLK_CGROUP
 extern struct blkio_cgroup blkio_root_cgroup;
 extern struct blkio_cgroup *cgroup_to_blkio_cgroup(struct cgroup *cgroup);
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index a1f37df..adab10d 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -224,7 +224,7 @@ struct cfq_group {
 	u64 vdisktime;
 	unsigned int weight;
 	unsigned int new_weight;
-	bool needs_update;
+	unsigned int dev_weight;
 
 	/* number of cfqq currently on this group */
 	int nr_cfqq;
@@ -838,7 +838,7 @@ static inline u64 cfq_scale_slice(unsigned long delta, struct cfq_group *cfqg)
 {
 	u64 d = delta << CFQ_SERVICE_SHIFT;
 
-	d = d * BLKIO_WEIGHT_DEFAULT;
+	d = d * CFQ_WEIGHT_DEFAULT;
 	do_div(d, cfqg->weight);
 	return d;
 }
@@ -1165,9 +1165,9 @@ static void
 cfq_update_group_weight(struct cfq_group *cfqg)
 {
 	BUG_ON(!RB_EMPTY_NODE(&cfqg->rb_node));
-	if (cfqg->needs_update) {
+	if (cfqg->new_weight) {
 		cfqg->weight = cfqg->new_weight;
-		cfqg->needs_update = false;
+		cfqg->new_weight = 0;
 	}
 }
 
@@ -1325,21 +1325,12 @@ static void cfq_init_cfqg_base(struct cfq_group *cfqg)
 }
 
 #ifdef CONFIG_CFQ_GROUP_IOSCHED
-static void cfq_update_blkio_group_weight(struct blkio_group *blkg,
-					  unsigned int weight)
-{
-	struct cfq_group *cfqg = blkg_to_cfqg(blkg);
-
-	cfqg->new_weight = weight;
-	cfqg->needs_update = true;
-}
-
 static void cfq_init_blkio_group(struct blkio_group *blkg)
 {
 	struct cfq_group *cfqg = blkg_to_cfqg(blkg);
 
 	cfq_init_cfqg_base(cfqg);
-	cfqg->weight = blkg->blkcg->weight;
+	cfqg->weight = blkg->blkcg->cfq_weight;
 }
 
 /*
@@ -1377,36 +1368,38 @@ static void cfq_link_cfqq_cfqg(struct cfq_queue *cfqq, struct cfq_group *cfqg)
 	cfqg_get(cfqg);
 }
 
-static u64 blkg_prfill_weight_device(struct seq_file *sf,
+static u64 cfqg_prfill_weight_device(struct seq_file *sf,
 				     struct blkg_policy_data *pd, int off)
 {
-	if (!pd->conf.weight)
+	struct cfq_group *cfqg = (void *)pd->pdata;
+
+	if (!cfqg->dev_weight)
 		return 0;
-	return __blkg_prfill_u64(sf, pd, pd->conf.weight);
+	return __blkg_prfill_u64(sf, pd, cfqg->dev_weight);
 }
 
-static int blkcg_print_weight_device(struct cgroup *cgrp, struct cftype *cft,
-				     struct seq_file *sf)
+static int cfqg_print_weight_device(struct cgroup *cgrp, struct cftype *cft,
+				    struct seq_file *sf)
 {
 	blkcg_print_blkgs(sf, cgroup_to_blkio_cgroup(cgrp),
-			  blkg_prfill_weight_device, BLKIO_POLICY_PROP, 0,
+			  cfqg_prfill_weight_device, BLKIO_POLICY_PROP, 0,
 			  false);
 	return 0;
 }
 
-static int blkcg_print_weight(struct cgroup *cgrp, struct cftype *cft,
-			      struct seq_file *sf)
+static int cfq_print_weight(struct cgroup *cgrp, struct cftype *cft,
+			    struct seq_file *sf)
 {
-	seq_printf(sf, "%u\n", cgroup_to_blkio_cgroup(cgrp)->weight);
+	seq_printf(sf, "%u\n", cgroup_to_blkio_cgroup(cgrp)->cfq_weight);
 	return 0;
 }
 
-static int blkcg_set_weight_device(struct cgroup *cgrp, struct cftype *cft,
-				   const char *buf)
+static int cfqg_set_weight_device(struct cgroup *cgrp, struct cftype *cft,
+				  const char *buf)
 {
 	struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp);
-	struct blkg_policy_data *pd;
 	struct blkg_conf_ctx ctx;
+	struct cfq_group *cfqg;
 	int ret;
 
 	ret = blkg_conf_prep(blkcg, buf, &ctx);
@@ -1414,11 +1407,11 @@ static int blkcg_set_weight_device(struct cgroup *cgrp, struct cftype *cft,
 		return ret;
 
 	ret = -EINVAL;
-	pd = ctx.blkg->pd[BLKIO_POLICY_PROP];
-	if (pd && (!ctx.v || (ctx.v >= BLKIO_WEIGHT_MIN &&
-			      ctx.v <= BLKIO_WEIGHT_MAX))) {
-		pd->conf.weight = ctx.v;
-		cfq_update_blkio_group_weight(ctx.blkg, ctx.v ?: blkcg->weight);
+	cfqg = blkg_to_cfqg(ctx.blkg);
+	if (cfqg && (!ctx.v || (ctx.v >= CFQ_WEIGHT_MIN &&
+				ctx.v <= CFQ_WEIGHT_MAX))) {
+		cfqg->dev_weight = ctx.v;
+		cfqg->new_weight = cfqg->dev_weight ?: blkcg->cfq_weight;
 		ret = 0;
 	}
 
@@ -1426,23 +1419,23 @@ static int blkcg_set_weight_device(struct cgroup *cgrp, struct cftype *cft,
 	return ret;
 }
 
-static int blkcg_set_weight(struct cgroup *cgrp, struct cftype *cft, u64 val)
+static int cfq_set_weight(struct cgroup *cgrp, struct cftype *cft, u64 val)
 {
 	struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp);
 	struct blkio_group *blkg;
 	struct hlist_node *n;
 
-	if (val < BLKIO_WEIGHT_MIN || val > BLKIO_WEIGHT_MAX)
+	if (val < CFQ_WEIGHT_MIN || val > CFQ_WEIGHT_MAX)
 		return -EINVAL;
 
 	spin_lock_irq(&blkcg->lock);
-	blkcg->weight = (unsigned int)val;
+	blkcg->cfq_weight = (unsigned int)val;
 
 	hlist_for_each_entry(blkg, n, &blkcg->blkg_list, blkcg_node) {
-		struct blkg_policy_data *pd = blkg->pd[BLKIO_POLICY_PROP];
+		struct cfq_group *cfqg = blkg_to_cfqg(blkg);
 
-		if (pd && !pd->conf.weight)
-			cfq_update_blkio_group_weight(blkg, blkcg->weight);
+		if (cfqg && !cfqg->dev_weight)
+			cfqg->new_weight = blkcg->cfq_weight;
 	}
 
 	spin_unlock_irq(&blkcg->lock);
@@ -1480,14 +1473,14 @@ static int cfqg_print_avg_queue_size(struct cgroup *cgrp, struct cftype *cft,
 static struct cftype cfq_blkcg_files[] = {
 	{
 		.name = "weight_device",
-		.read_seq_string = blkcg_print_weight_device,
-		.write_string = blkcg_set_weight_device,
+		.read_seq_string = cfqg_print_weight_device,
+		.write_string = cfqg_set_weight_device,
 		.max_write_len = 256,
 	},
 	{
 		.name = "weight",
-		.read_seq_string = blkcg_print_weight,
-		.write_u64 = blkcg_set_weight,
+		.read_seq_string = cfq_print_weight,
+		.write_u64 = cfq_set_weight,
 	},
 	{
 		.name = "time",
@@ -3983,7 +3976,7 @@ static int cfq_init_queue(struct request_queue *q)
 		return -ENOMEM;
 	}
 
-	cfqd->root_group->weight = 2*BLKIO_WEIGHT_DEFAULT;
+	cfqd->root_group->weight = 2 * CFQ_WEIGHT_DEFAULT;
 
 	/*
 	 * Not strictly needed (since RB_ROOT just clears the node and we
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 19/21] blkcg: move blkio_group_conf->iops and ->bps to blk-throttle
  2012-03-28 22:51 [PATCHSET] block: modularize blkcg config and stat file handling Tejun Heo
                   ` (17 preceding siblings ...)
  2012-03-28 22:51 ` [PATCH 18/21] blkcg: move blkio_group_conf->weight to cfq Tejun Heo
@ 2012-03-28 22:51 ` Tejun Heo
  2012-03-28 22:51 ` [PATCH 20/21] blkcg: pass around pd->pdata instead of pd itself in prfill functions Tejun Heo
                   ` (4 subsequent siblings)
  23 siblings, 0 replies; 56+ messages in thread
From: Tejun Heo @ 2012-03-28 22:51 UTC (permalink / raw)
  To: axboe; +Cc: vgoyal, ctalbott, rni, linux-kernel, cgroups, containers, Tejun Heo

blkio_group_conf->iops and ->bps are owned by blk-throttle and have no
reason to be defined in blkcg core.  Drop them and let the conf setting
functions directly manipulate throtl_grp->bps[] and ->iops[].

This makes blkio_group_conf empty.  Drop it.
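
As an aside, here is a standalone sketch of the offsetof() scheme the
new tg_set_conf() relies on: the cftype's ->private carries the byte
offset of the target field and the setter writes through a void
pointer.  The struct and helpers below are stand-ins, not the real
throtl_grp code:

#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

struct tg_like {
	uint64_t bps[2];	/* [0] ~ READ, [1] ~ WRITE */
	unsigned int iops[2];
};

/* what tg_set_conf() does with cft->private: write at a stored offset */
static void set_u64_at(void *base, size_t off, uint64_t v)
{
	*(uint64_t *)((char *)base + off) = v;
}

static void set_uint_at(void *base, size_t off, unsigned int v)
{
	*(unsigned int *)((char *)base + off) = v;
}

int main(void)
{
	struct tg_like tg = { { (uint64_t)-1, (uint64_t)-1 },
			      { (unsigned int)-1, (unsigned int)-1 } };

	set_u64_at(&tg, offsetof(struct tg_like, bps[1]), 1024 * 1024);
	set_uint_at(&tg, offsetof(struct tg_like, iops[0]), 100);
	printf("%llu %u\n", (unsigned long long)tg.bps[1], tg.iops[0]);
	return 0;
}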

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 block/blk-cgroup.h   |    8 ---
 block/blk-throttle.c |  153 +++++++++++++++++++-------------------------------
 2 files changed, 58 insertions(+), 103 deletions(-)

diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index 386db29..a77ab1a 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -71,19 +71,11 @@ struct blkg_rwstat {
 	uint64_t			cnt[BLKG_RWSTAT_NR];
 };
 
-struct blkio_group_conf {
-	u64 iops[2];
-	u64 bps[2];
-};
-
 /* per-blkg per-policy data */
 struct blkg_policy_data {
 	/* the blkg this per-policy data belongs to */
 	struct blkio_group *blkg;
 
-	/* Configuration */
-	struct blkio_group_conf conf;
-
 	/* pol->pdata_size bytes of private data used by policy impl */
 	char pdata[] __aligned(__alignof__(unsigned long long));
 };
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 27f7960..004964b 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -924,20 +924,6 @@ throtl_schedule_delayed_work(struct throtl_data *td, unsigned long delay)
 	}
 }
 
-/*
- * Can not take queue lock in update functions as queue lock under
- * blkcg_lock is not allowed. Under other paths we take blkcg_lock under
- * queue_lock.
- */
-static void throtl_update_blkio_group_common(struct throtl_data *td,
-				struct throtl_grp *tg)
-{
-	xchg(&tg->limits_changed, true);
-	xchg(&td->limits_changed, true);
-	/* Schedule a work now to process the limit change */
-	throtl_schedule_delayed_work(td, 0);
-}
-
 static u64 tg_prfill_cpu_rwstat(struct seq_file *sf,
 				struct blkg_policy_data *pd, int off)
 {
@@ -968,68 +954,48 @@ static int tg_print_cpu_rwstat(struct cgroup *cgrp, struct cftype *cft,
 	return 0;
 }
 
-static u64 blkg_prfill_conf_u64(struct seq_file *sf,
-				struct blkg_policy_data *pd, int off)
+static u64 tg_prfill_conf_u64(struct seq_file *sf, struct blkg_policy_data *pd,
+			      int off)
 {
-	u64 v = *(u64 *)((void *)&pd->conf + off);
+	u64 v = *(u64 *)((void *)pd->pdata + off);
 
-	if (!v)
+	if (v == -1)
 		return 0;
 	return __blkg_prfill_u64(sf, pd, v);
 }
 
-static int blkcg_print_conf_u64(struct cgroup *cgrp, struct cftype *cft,
-				struct seq_file *sf)
+static u64 tg_prfill_conf_uint(struct seq_file *sf, struct blkg_policy_data *pd,
+			       int off)
 {
-	blkcg_print_blkgs(sf, cgroup_to_blkio_cgroup(cgrp),
-			  blkg_prfill_conf_u64, BLKIO_POLICY_THROTL,
-			  cft->private, false);
-	return 0;
-}
+	unsigned int v = *(unsigned int *)((void *)pd->pdata + off);
 
-static void throtl_update_blkio_group_read_bps(struct blkio_group *blkg,
-					       u64 read_bps)
-{
-	struct throtl_grp *tg = blkg_to_tg(blkg);
-
-	tg->bps[READ] = read_bps;
-	throtl_update_blkio_group_common(blkg->q->td, tg);
-}
-
-static void throtl_update_blkio_group_write_bps(struct blkio_group *blkg,
-						u64 write_bps)
-{
-	struct throtl_grp *tg = blkg_to_tg(blkg);
-
-	tg->bps[WRITE] = write_bps;
-	throtl_update_blkio_group_common(blkg->q->td, tg);
+	if (v == -1)
+		return 0;
+	return __blkg_prfill_u64(sf, pd, v);
 }
 
-static void throtl_update_blkio_group_read_iops(struct blkio_group *blkg,
-						u64 read_iops)
+static int tg_print_conf_u64(struct cgroup *cgrp, struct cftype *cft,
+			     struct seq_file *sf)
 {
-	struct throtl_grp *tg = blkg_to_tg(blkg);
-
-	tg->iops[READ] = read_iops;
-	throtl_update_blkio_group_common(blkg->q->td, tg);
+	blkcg_print_blkgs(sf, cgroup_to_blkio_cgroup(cgrp), tg_prfill_conf_u64,
+			  BLKIO_POLICY_THROTL, cft->private, false);
+	return 0;
 }
 
-static void throtl_update_blkio_group_write_iops(struct blkio_group *blkg,
-						 u64 write_iops)
+static int tg_print_conf_uint(struct cgroup *cgrp, struct cftype *cft,
+			      struct seq_file *sf)
 {
-	struct throtl_grp *tg = blkg_to_tg(blkg);
-
-	tg->iops[WRITE] = write_iops;
-	throtl_update_blkio_group_common(blkg->q->td, tg);
+	blkcg_print_blkgs(sf, cgroup_to_blkio_cgroup(cgrp), tg_prfill_conf_uint,
+			  BLKIO_POLICY_THROTL, cft->private, false);
+	return 0;
 }
 
-static int blkcg_set_conf_u64(struct cgroup *cgrp, struct cftype *cft,
-			      const char *buf,
-			      void (*update)(struct blkio_group *, u64))
+static int tg_set_conf(struct cgroup *cgrp, struct cftype *cft, const char *buf,
+		       bool is_u64)
 {
 	struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp);
-	struct blkg_policy_data *pd;
 	struct blkg_conf_ctx ctx;
+	struct throtl_grp *tg;
 	int ret;
 
 	ret = blkg_conf_prep(blkcg, buf, &ctx);
@@ -1037,10 +1003,23 @@ static int blkcg_set_conf_u64(struct cgroup *cgrp, struct cftype *cft,
 		return ret;
 
 	ret = -EINVAL;
-	pd = ctx.blkg->pd[BLKIO_POLICY_THROTL];
-	if (pd) {
-		*(u64 *)((void *)&pd->conf + cft->private) = ctx.v;
-		update(ctx.blkg, ctx.v ?: -1);
+	tg = blkg_to_tg(ctx.blkg);
+	if (tg) {
+		struct throtl_data *td = ctx.blkg->q->td;
+
+		if (!ctx.v)
+			ctx.v = -1;
+
+		if (is_u64)
+			*(u64 *)((void *)tg + cft->private) = ctx.v;
+		else
+			*(unsigned int *)((void *)tg + cft->private) = ctx.v;
+
+		/* XXX: we don't need the following deferred processing */
+		xchg(&tg->limits_changed, true);
+		xchg(&td->limits_changed, true);
+		throtl_schedule_delayed_work(td, 0);
+
 		ret = 0;
 	}
 
@@ -1048,61 +1027,45 @@ static int blkcg_set_conf_u64(struct cgroup *cgrp, struct cftype *cft,
 	return ret;
 }
 
-static int blkcg_set_conf_bps_r(struct cgroup *cgrp, struct cftype *cft,
-				const char *buf)
-{
-	return blkcg_set_conf_u64(cgrp, cft, buf,
-				  throtl_update_blkio_group_read_bps);
-}
-
-static int blkcg_set_conf_bps_w(struct cgroup *cgrp, struct cftype *cft,
-				const char *buf)
-{
-	return blkcg_set_conf_u64(cgrp, cft, buf,
-				  throtl_update_blkio_group_write_bps);
-}
-
-static int blkcg_set_conf_iops_r(struct cgroup *cgrp, struct cftype *cft,
-				 const char *buf)
+static int tg_set_conf_u64(struct cgroup *cgrp, struct cftype *cft,
+			   const char *buf)
 {
-	return blkcg_set_conf_u64(cgrp, cft, buf,
-				  throtl_update_blkio_group_read_iops);
+	return tg_set_conf(cgrp, cft, buf, true);
 }
 
-static int blkcg_set_conf_iops_w(struct cgroup *cgrp, struct cftype *cft,
-				 const char *buf)
+static int tg_set_conf_uint(struct cgroup *cgrp, struct cftype *cft,
+			    const char *buf)
 {
-	return blkcg_set_conf_u64(cgrp, cft, buf,
-				  throtl_update_blkio_group_write_iops);
+	return tg_set_conf(cgrp, cft, buf, false);
 }
 
 static struct cftype throtl_files[] = {
 	{
 		.name = "throttle.read_bps_device",
-		.private = offsetof(struct blkio_group_conf, bps[READ]),
-		.read_seq_string = blkcg_print_conf_u64,
-		.write_string = blkcg_set_conf_bps_r,
+		.private = offsetof(struct throtl_grp, bps[READ]),
+		.read_seq_string = tg_print_conf_u64,
+		.write_string = tg_set_conf_u64,
 		.max_write_len = 256,
 	},
 	{
 		.name = "throttle.write_bps_device",
-		.private = offsetof(struct blkio_group_conf, bps[WRITE]),
-		.read_seq_string = blkcg_print_conf_u64,
-		.write_string = blkcg_set_conf_bps_w,
+		.private = offsetof(struct throtl_grp, bps[WRITE]),
+		.read_seq_string = tg_print_conf_u64,
+		.write_string = tg_set_conf_u64,
 		.max_write_len = 256,
 	},
 	{
 		.name = "throttle.read_iops_device",
-		.private = offsetof(struct blkio_group_conf, iops[READ]),
-		.read_seq_string = blkcg_print_conf_u64,
-		.write_string = blkcg_set_conf_iops_r,
+		.private = offsetof(struct throtl_grp, iops[READ]),
+		.read_seq_string = tg_print_conf_uint,
+		.write_string = tg_set_conf_uint,
 		.max_write_len = 256,
 	},
 	{
 		.name = "throttle.write_iops_device",
-		.private = offsetof(struct blkio_group_conf, iops[WRITE]),
-		.read_seq_string = blkcg_print_conf_u64,
-		.write_string = blkcg_set_conf_iops_w,
+		.private = offsetof(struct throtl_grp, iops[WRITE]),
+		.read_seq_string = tg_print_conf_uint,
+		.write_string = tg_set_conf_uint,
 		.max_write_len = 256,
 	},
 	{
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 20/21] blkcg: pass around pd->pdata instead of pd itself in prfill functions
  2012-03-28 22:51 [PATCHSET] block: modularize blkcg config and stat file handling Tejun Heo
                   ` (18 preceding siblings ...)
  2012-03-28 22:51 ` [PATCH 19/21] blkcg: move blkio_group_conf->iops and ->bps to blk-throttle Tejun Heo
@ 2012-03-28 22:51 ` Tejun Heo
  2012-03-28 22:51 ` [PATCH 21/21] blkcg: drop BLKCG_STAT_{PRIV|POL|OFF} macros Tejun Heo
                   ` (3 subsequent siblings)
  23 siblings, 0 replies; 56+ messages in thread
From: Tejun Heo @ 2012-03-28 22:51 UTC (permalink / raw)
  To: axboe; +Cc: vgoyal, ctalbott, rni, linux-kernel, cgroups, containers, Tejun Heo

Now that all conf and stat fields have been moved into the
policy-specific blkio_policy_data->pdata areas, there's no reason to use
blkio_policy_data itself in the prfill functions.  Pass around @pd->pdata
instead of @pd.
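
A quick standalone sketch of why the bare pdata pointer is enough:
pdata[] sits at a known offset inside the per-policy data, so the
owning structure (and from it the blkg) can always be recovered, which
is what pdata_to_blkg() does in the kernel.  The types and the
simplified container_of() below are illustrative stand-ins:

#include <stdio.h>
#include <stddef.h>
#include <stdlib.h>

/* simplified userspace version of the kernel's container_of() */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct blkg_like { int id; };

struct pd_like {
	struct blkg_like *blkg;
	char pdata[];		/* policy-private area handed to prfill */
};

static struct blkg_like *pdata_to_blkg_like(void *pdata)
{
	return container_of(pdata, struct pd_like, pdata)->blkg;
}

int main(void)
{
	struct blkg_like blkg = { .id = 42 };
	struct pd_like *pd = calloc(1, sizeof(*pd) + 64);

	if (!pd)
		return 1;
	pd->blkg = &blkg;
	printf("%d\n", pdata_to_blkg_like(pd->pdata)->id);	/* prints 42 */
	free(pd);
	return 0;
}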

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 block/blk-cgroup.c   |   33 +++++++++++++++------------------
 block/blk-cgroup.h   |    6 +++---
 block/blk-throttle.c |   21 +++++++++------------
 block/cfq-iosched.c  |   14 ++++++--------
 4 files changed, 33 insertions(+), 41 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 7688aef..8f678d7 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -359,7 +359,7 @@ static const char *blkg_dev_name(struct blkio_group *blkg)
  * cftype->read_seq_string method.
  */
 void blkcg_print_blkgs(struct seq_file *sf, struct blkio_cgroup *blkcg,
-		       u64 (*prfill)(struct seq_file *, struct blkg_policy_data *, int),
+		       u64 (*prfill)(struct seq_file *, void *, int),
 		       int pol, int data, bool show_total)
 {
 	struct blkio_group *blkg;
@@ -369,7 +369,7 @@ void blkcg_print_blkgs(struct seq_file *sf, struct blkio_cgroup *blkcg,
 	spin_lock_irq(&blkcg->lock);
 	hlist_for_each_entry(blkg, n, &blkcg->blkg_list, blkcg_node)
 		if (blkg->pd[pol])
-			total += prfill(sf, blkg->pd[pol], data);
+			total += prfill(sf, blkg->pd[pol]->pdata, data);
 	spin_unlock_irq(&blkcg->lock);
 
 	if (show_total)
@@ -380,14 +380,14 @@ EXPORT_SYMBOL_GPL(blkcg_print_blkgs);
 /**
  * __blkg_prfill_u64 - prfill helper for a single u64 value
  * @sf: seq_file to print to
- * @pd: policy data of interest
+ * @pdata: policy private data of interest
  * @v: value to print
  *
- * Print @v to @sf for the device assocaited with @pd.
+ * Print @v to @sf for the device assocaited with @pdata.
  */
-u64 __blkg_prfill_u64(struct seq_file *sf, struct blkg_policy_data *pd, u64 v)
+u64 __blkg_prfill_u64(struct seq_file *sf, void *pdata, u64 v)
 {
-	const char *dname = blkg_dev_name(pd->blkg);
+	const char *dname = blkg_dev_name(pdata_to_blkg(pdata));
 
 	if (!dname)
 		return 0;
@@ -400,12 +400,12 @@ EXPORT_SYMBOL_GPL(__blkg_prfill_u64);
 /**
  * __blkg_prfill_rwstat - prfill helper for a blkg_rwstat
  * @sf: seq_file to print to
- * @pd: policy data of interest
+ * @pdata: policy private data of interest
  * @rwstat: rwstat to print
  *
- * Print @rwstat to @sf for the device assocaited with @pd.
+ * Print @rwstat to @sf for the device assocaited with @pdata.
  */
-u64 __blkg_prfill_rwstat(struct seq_file *sf, struct blkg_policy_data *pd,
+u64 __blkg_prfill_rwstat(struct seq_file *sf, void *pdata,
 			 const struct blkg_rwstat *rwstat)
 {
 	static const char *rwstr[] = {
@@ -414,7 +414,7 @@ u64 __blkg_prfill_rwstat(struct seq_file *sf, struct blkg_policy_data *pd,
 		[BLKG_RWSTAT_SYNC]	= "Sync",
 		[BLKG_RWSTAT_ASYNC]	= "Async",
 	};
-	const char *dname = blkg_dev_name(pd->blkg);
+	const char *dname = blkg_dev_name(pdata_to_blkg(pdata));
 	u64 v;
 	int i;
 
@@ -430,19 +430,16 @@ u64 __blkg_prfill_rwstat(struct seq_file *sf, struct blkg_policy_data *pd,
 	return v;
 }
 
-static u64 blkg_prfill_stat(struct seq_file *sf, struct blkg_policy_data *pd,
-			    int off)
+static u64 blkg_prfill_stat(struct seq_file *sf, void *pdata, int off)
 {
-	return __blkg_prfill_u64(sf, pd,
-				 blkg_stat_read((void *)pd->pdata + off));
+	return __blkg_prfill_u64(sf, pdata, blkg_stat_read(pdata + off));
 }
 
-static u64 blkg_prfill_rwstat(struct seq_file *sf, struct blkg_policy_data *pd,
-			      int off)
+static u64 blkg_prfill_rwstat(struct seq_file *sf, void *pdata, int off)
 {
-	struct blkg_rwstat rwstat = blkg_rwstat_read((void *)pd->pdata + off);
+	struct blkg_rwstat rwstat = blkg_rwstat_read(pdata + off);
 
-	return __blkg_prfill_rwstat(sf, pd, &rwstat);
+	return __blkg_prfill_rwstat(sf, pdata, &rwstat);
 }
 
 /* print blkg_stat specified by BLKCG_STAT_PRIV() */
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index a77ab1a..c930895 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -126,10 +126,10 @@ extern void update_root_blkg_pd(struct request_queue *q,
 				enum blkio_policy_id plid);
 
 void blkcg_print_blkgs(struct seq_file *sf, struct blkio_cgroup *blkcg,
-		       u64 (*prfill)(struct seq_file *, struct blkg_policy_data *, int),
+		       u64 (*prfill)(struct seq_file *, void *, int),
 		       int pol, int data, bool show_total);
-u64 __blkg_prfill_u64(struct seq_file *sf, struct blkg_policy_data *pd, u64 v);
-u64 __blkg_prfill_rwstat(struct seq_file *sf, struct blkg_policy_data *pd,
+u64 __blkg_prfill_u64(struct seq_file *sf, void *pdata, u64 v);
+u64 __blkg_prfill_rwstat(struct seq_file *sf, void *pdata,
 			 const struct blkg_rwstat *rwstat);
 int blkcg_print_stat(struct cgroup *cgrp, struct cftype *cft,
 		     struct seq_file *sf);
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 004964b..bd6dbfe 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -924,10 +924,9 @@ throtl_schedule_delayed_work(struct throtl_data *td, unsigned long delay)
 	}
 }
 
-static u64 tg_prfill_cpu_rwstat(struct seq_file *sf,
-				struct blkg_policy_data *pd, int off)
+static u64 tg_prfill_cpu_rwstat(struct seq_file *sf, void *pdata, int off)
 {
-	struct throtl_grp *tg = (void *)pd->pdata;
+	struct throtl_grp *tg = pdata;
 	struct blkg_rwstat rwstat = { }, tmp;
 	int i, cpu;
 
@@ -939,7 +938,7 @@ static u64 tg_prfill_cpu_rwstat(struct seq_file *sf,
 			rwstat.cnt[i] += tmp.cnt[i];
 	}
 
-	return __blkg_prfill_rwstat(sf, pd, &rwstat);
+	return __blkg_prfill_rwstat(sf, pdata, &rwstat);
 }
 
 /* print per-cpu blkg_rwstat specified by BLKCG_STAT_PRIV() */
@@ -954,24 +953,22 @@ static int tg_print_cpu_rwstat(struct cgroup *cgrp, struct cftype *cft,
 	return 0;
 }
 
-static u64 tg_prfill_conf_u64(struct seq_file *sf, struct blkg_policy_data *pd,
-			      int off)
+static u64 tg_prfill_conf_u64(struct seq_file *sf, void *pdata, int off)
 {
-	u64 v = *(u64 *)((void *)pd->pdata + off);
+	u64 v = *(u64 *)(pdata + off);
 
 	if (v == -1)
 		return 0;
-	return __blkg_prfill_u64(sf, pd, v);
+	return __blkg_prfill_u64(sf, pdata, v);
 }
 
-static u64 tg_prfill_conf_uint(struct seq_file *sf, struct blkg_policy_data *pd,
-			       int off)
+static u64 tg_prfill_conf_uint(struct seq_file *sf, void *pdata, int off)
 {
-	unsigned int v = *(unsigned int *)((void *)pd->pdata + off);
+	unsigned int v = *(unsigned int *)(pdata + off);
 
 	if (v == -1)
 		return 0;
-	return __blkg_prfill_u64(sf, pd, v);
+	return __blkg_prfill_u64(sf, pdata, v);
 }
 
 static int tg_print_conf_u64(struct cgroup *cgrp, struct cftype *cft,
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index adab10d..fd505f7 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1368,14 +1368,13 @@ static void cfq_link_cfqq_cfqg(struct cfq_queue *cfqq, struct cfq_group *cfqg)
 	cfqg_get(cfqg);
 }
 
-static u64 cfqg_prfill_weight_device(struct seq_file *sf,
-				     struct blkg_policy_data *pd, int off)
+static u64 cfqg_prfill_weight_device(struct seq_file *sf, void *pdata, int off)
 {
-	struct cfq_group *cfqg = (void *)pd->pdata;
+	struct cfq_group *cfqg = pdata;
 
 	if (!cfqg->dev_weight)
 		return 0;
-	return __blkg_prfill_u64(sf, pd, cfqg->dev_weight);
+	return __blkg_prfill_u64(sf, pdata, cfqg->dev_weight);
 }
 
 static int cfqg_print_weight_device(struct cgroup *cgrp, struct cftype *cft,
@@ -1443,10 +1442,9 @@ static int cfq_set_weight(struct cgroup *cgrp, struct cftype *cft, u64 val)
 }
 
 #ifdef CONFIG_DEBUG_BLK_CGROUP
-static u64 cfqg_prfill_avg_queue_size(struct seq_file *sf,
-				      struct blkg_policy_data *pd, int off)
+static u64 cfqg_prfill_avg_queue_size(struct seq_file *sf, void *pdata, int off)
 {
-	struct cfq_group *cfqg = (void *)pd->pdata;
+	struct cfq_group *cfqg = pdata;
 	u64 samples = blkg_stat_read(&cfqg->stats.avg_queue_size_samples);
 	u64 v = 0;
 
@@ -1454,7 +1452,7 @@ static u64 cfqg_prfill_avg_queue_size(struct seq_file *sf,
 		v = blkg_stat_read(&cfqg->stats.avg_queue_size_sum);
 		do_div(v, samples);
 	}
-	__blkg_prfill_u64(sf, pd, v);
+	__blkg_prfill_u64(sf, pdata, v);
 	return 0;
 }
 
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 21/21] blkcg: drop BLKCG_STAT_{PRIV|POL|OFF} macros
  2012-03-28 22:51 [PATCHSET] block: modularize blkcg config and stat file handling Tejun Heo
                   ` (19 preceding siblings ...)
  2012-03-28 22:51 ` [PATCH 20/21] blkcg: pass around pd->pdata instead of pd itself in prfill functions Tejun Heo
@ 2012-03-28 22:51 ` Tejun Heo
  2012-03-29  8:18 ` [PATCHSET] block: modularize blkcg config and stat file handling Jens Axboe
                   ` (2 subsequent siblings)
  23 siblings, 0 replies; 56+ messages in thread
From: Tejun Heo @ 2012-03-28 22:51 UTC (permalink / raw)
  To: axboe; +Cc: vgoyal, ctalbott, rni, linux-kernel, cgroups, containers, Tejun Heo

Now that all stat handling code lives in policy implementations,
there's no need to encode the policy ID in cft->private.

* Export blkg_prfill_[rw]stat() from blkcg, remove
  blkcg_print_[rw]stat(), and implement cfqg_print_[rw]stat() which
  hard-code BLKIO_POLICY_PROP.

* Use cft->private for the offset of the target field directly and
  drop BLKCG_STAT_{PRIV|POL|OFF}() (a short sketch of the arithmetic
  being dropped follows below).
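
For reference, a small standalone sketch of the arithmetic being
dropped; the packing macros are copied from the old header while the
structs and the policy id value are made up for the example:

#include <stdio.h>
#include <stddef.h>

/* the old cft->private packing removed by this patch */
#define BLKCG_STAT_PRIV(pol, off)	(((unsigned)(pol) << 16) | (off))
#define BLKCG_STAT_POL(prv)		((unsigned)(prv) >> 16)
#define BLKCG_STAT_OFF(prv)		((unsigned)(prv) & 0xffff)

struct stats_like { unsigned long long time, sectors; };
struct cfqg_like { int pad; struct stats_like stats; };

int main(void)
{
	unsigned int off = offsetof(struct cfqg_like, stats.sectors);
	unsigned int packed = BLKCG_STAT_PRIV(1 /* some policy id */, off);

	/* old scheme: a shared helper had to unpack both halves */
	printf("pol=%u off=%u\n", BLKCG_STAT_POL(packed),
	       BLKCG_STAT_OFF(packed));
	/* new scheme: the policy is implied by the per-policy print
	 * function, so ->private can carry the offset as-is */
	printf("off=%u\n", off);
	return 0;
}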

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 block/blk-cgroup.c   |   48 ++++++++++++----------------
 block/blk-cgroup.h   |   11 +-----
 block/blk-throttle.c |   12 ++-----
 block/cfq-iosched.c  |   85 +++++++++++++++++++++++++++-----------------------
 4 files changed, 72 insertions(+), 84 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 8f678d7..f762333 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -430,43 +430,35 @@ u64 __blkg_prfill_rwstat(struct seq_file *sf, void *pdata,
 	return v;
 }
 
-static u64 blkg_prfill_stat(struct seq_file *sf, void *pdata, int off)
+/**
+ * blkg_prfill_stat - prfill callback for blkg_stat
+ * @sf: seq_file to print to
+ * @pdata: policy private data of interest
+ * @off: offset to the blkg_stat in @pdata
+ *
+ * prfill callback for printing a blkg_stat.
+ */
+u64 blkg_prfill_stat(struct seq_file *sf, void *pdata, int off)
 {
 	return __blkg_prfill_u64(sf, pdata, blkg_stat_read(pdata + off));
 }
+EXPORT_SYMBOL_GPL(blkg_prfill_stat);
 
-static u64 blkg_prfill_rwstat(struct seq_file *sf, void *pdata, int off)
+/**
+ * blkg_prfill_rwstat - prfill callback for blkg_rwstat
+ * @sf: seq_file to print to
+ * @pdata: policy private data of interest
+ * @off: offset to the blkg_rwstat in @pdata
+ *
+ * prfill callback for printing a blkg_rwstat.
+ */
+u64 blkg_prfill_rwstat(struct seq_file *sf, void *pdata, int off)
 {
 	struct blkg_rwstat rwstat = blkg_rwstat_read(pdata + off);
 
 	return __blkg_prfill_rwstat(sf, pdata, &rwstat);
 }
-
-/* print blkg_stat specified by BLKCG_STAT_PRIV() */
-int blkcg_print_stat(struct cgroup *cgrp, struct cftype *cft,
-		     struct seq_file *sf)
-{
-	struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp);
-
-	blkcg_print_blkgs(sf, blkcg, blkg_prfill_stat,
-			  BLKCG_STAT_POL(cft->private),
-			  BLKCG_STAT_OFF(cft->private), false);
-	return 0;
-}
-EXPORT_SYMBOL_GPL(blkcg_print_stat);
-
-/* print blkg_rwstat specified by BLKCG_STAT_PRIV() */
-int blkcg_print_rwstat(struct cgroup *cgrp, struct cftype *cft,
-		       struct seq_file *sf)
-{
-	struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp);
-
-	blkcg_print_blkgs(sf, blkcg, blkg_prfill_rwstat,
-			  BLKCG_STAT_POL(cft->private),
-			  BLKCG_STAT_OFF(cft->private), true);
-	return 0;
-}
-EXPORT_SYMBOL_GPL(blkcg_print_rwstat);
+EXPORT_SYMBOL_GPL(blkg_prfill_rwstat);
 
 /**
  * blkg_conf_prep - parse and prepare for per-blkg config update
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index c930895..ca0ff7c 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -34,11 +34,6 @@ enum blkio_policy_id {
 #define CFQ_WEIGHT_MAX		1000
 #define CFQ_WEIGHT_DEFAULT	500
 
-/* cft->private [un]packing for stat printing */
-#define BLKCG_STAT_PRIV(pol, off)	(((unsigned)(pol) << 16) | (off))
-#define BLKCG_STAT_POL(prv)		((unsigned)(prv) >> 16)
-#define BLKCG_STAT_OFF(prv)		((unsigned)(prv) & 0xffff)
-
 enum blkg_rwstat_type {
 	BLKG_RWSTAT_READ,
 	BLKG_RWSTAT_WRITE,
@@ -131,10 +126,8 @@ void blkcg_print_blkgs(struct seq_file *sf, struct blkio_cgroup *blkcg,
 u64 __blkg_prfill_u64(struct seq_file *sf, void *pdata, u64 v);
 u64 __blkg_prfill_rwstat(struct seq_file *sf, void *pdata,
 			 const struct blkg_rwstat *rwstat);
-int blkcg_print_stat(struct cgroup *cgrp, struct cftype *cft,
-		     struct seq_file *sf);
-int blkcg_print_rwstat(struct cgroup *cgrp, struct cftype *cft,
-		       struct seq_file *sf);
+u64 blkg_prfill_stat(struct seq_file *sf, void *pdata, int off);
+u64 blkg_prfill_rwstat(struct seq_file *sf, void *pdata, int off);
 
 struct blkg_conf_ctx {
 	struct gendisk		*disk;
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index bd6dbfe..6024014 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -941,15 +941,13 @@ static u64 tg_prfill_cpu_rwstat(struct seq_file *sf, void *pdata, int off)
 	return __blkg_prfill_rwstat(sf, pdata, &rwstat);
 }
 
-/* print per-cpu blkg_rwstat specified by BLKCG_STAT_PRIV() */
 static int tg_print_cpu_rwstat(struct cgroup *cgrp, struct cftype *cft,
 			       struct seq_file *sf)
 {
 	struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp);
 
-	blkcg_print_blkgs(sf, blkcg, tg_prfill_cpu_rwstat,
-			  BLKCG_STAT_POL(cft->private),
-			  BLKCG_STAT_OFF(cft->private), true);
+	blkcg_print_blkgs(sf, blkcg, tg_prfill_cpu_rwstat, BLKIO_POLICY_THROTL,
+			  cft->private, true);
 	return 0;
 }
 
@@ -1067,14 +1065,12 @@ static struct cftype throtl_files[] = {
 	},
 	{
 		.name = "throttle.io_service_bytes",
-		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_THROTL,
-				offsetof(struct tg_stats_cpu, service_bytes)),
+		.private = offsetof(struct tg_stats_cpu, service_bytes),
 		.read_seq_string = tg_print_cpu_rwstat,
 	},
 	{
 		.name = "throttle.io_serviced",
-		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_THROTL,
-				offsetof(struct tg_stats_cpu, serviced)),
+		.private = offsetof(struct tg_stats_cpu, serviced),
 		.read_seq_string = tg_print_cpu_rwstat,
 	},
 	{ }	/* terminate */
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index fd505f7..cff8b5b 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1441,6 +1441,26 @@ static int cfq_set_weight(struct cgroup *cgrp, struct cftype *cft, u64 val)
 	return 0;
 }
 
+static int cfqg_print_stat(struct cgroup *cgrp, struct cftype *cft,
+			   struct seq_file *sf)
+{
+	struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp);
+
+	blkcg_print_blkgs(sf, blkcg, blkg_prfill_stat, BLKIO_POLICY_PROP,
+			  cft->private, false);
+	return 0;
+}
+
+static int cfqg_print_rwstat(struct cgroup *cgrp, struct cftype *cft,
+			     struct seq_file *sf)
+{
+	struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp);
+
+	blkcg_print_blkgs(sf, blkcg, blkg_prfill_rwstat, BLKIO_POLICY_PROP,
+			  cft->private, true);
+	return 0;
+}
+
 #ifdef CONFIG_DEBUG_BLK_CGROUP
 static u64 cfqg_prfill_avg_queue_size(struct seq_file *sf, void *pdata, int off)
 {
@@ -1482,51 +1502,43 @@ static struct cftype cfq_blkcg_files[] = {
 	},
 	{
 		.name = "time",
-		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
-				offsetof(struct cfq_group, stats.time)),
-		.read_seq_string = blkcg_print_stat,
+		.private = offsetof(struct cfq_group, stats.time),
+		.read_seq_string = cfqg_print_stat,
 	},
 	{
 		.name = "sectors",
-		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
-				offsetof(struct cfq_group, stats.sectors)),
-		.read_seq_string = blkcg_print_stat,
+		.private = offsetof(struct cfq_group, stats.sectors),
+		.read_seq_string = cfqg_print_stat,
 	},
 	{
 		.name = "io_service_bytes",
-		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
-				offsetof(struct cfq_group, stats.service_bytes)),
-		.read_seq_string = blkcg_print_rwstat,
+		.private = offsetof(struct cfq_group, stats.service_bytes),
+		.read_seq_string = cfqg_print_rwstat,
 	},
 	{
 		.name = "io_serviced",
-		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
-				offsetof(struct cfq_group, stats.serviced)),
-		.read_seq_string = blkcg_print_rwstat,
+		.private = offsetof(struct cfq_group, stats.serviced),
+		.read_seq_string = cfqg_print_rwstat,
 	},
 	{
 		.name = "io_service_time",
-		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
-				offsetof(struct cfq_group, stats.service_time)),
-		.read_seq_string = blkcg_print_rwstat,
+		.private = offsetof(struct cfq_group, stats.service_time),
+		.read_seq_string = cfqg_print_rwstat,
 	},
 	{
 		.name = "io_wait_time",
-		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
-				offsetof(struct cfq_group, stats.wait_time)),
-		.read_seq_string = blkcg_print_rwstat,
+		.private = offsetof(struct cfq_group, stats.wait_time),
+		.read_seq_string = cfqg_print_rwstat,
 	},
 	{
 		.name = "io_merged",
-		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
-				offsetof(struct cfq_group, stats.merged)),
-		.read_seq_string = blkcg_print_rwstat,
+		.private = offsetof(struct cfq_group, stats.merged),
+		.read_seq_string = cfqg_print_rwstat,
 	},
 	{
 		.name = "io_queued",
-		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
-				offsetof(struct cfq_group, stats.queued)),
-		.read_seq_string = blkcg_print_rwstat,
+		.private = offsetof(struct cfq_group, stats.queued),
+		.read_seq_string = cfqg_print_rwstat,
 	},
 #ifdef CONFIG_DEBUG_BLK_CGROUP
 	{
@@ -1535,33 +1547,28 @@ static struct cftype cfq_blkcg_files[] = {
 	},
 	{
 		.name = "group_wait_time",
-		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
-				offsetof(struct cfq_group, stats.group_wait_time)),
-		.read_seq_string = blkcg_print_stat,
+		.private = offsetof(struct cfq_group, stats.group_wait_time),
+		.read_seq_string = cfqg_print_stat,
 	},
 	{
 		.name = "idle_time",
-		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
-				offsetof(struct cfq_group, stats.idle_time)),
-		.read_seq_string = blkcg_print_stat,
+		.private = offsetof(struct cfq_group, stats.idle_time),
+		.read_seq_string = cfqg_print_stat,
 	},
 	{
 		.name = "empty_time",
-		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
-				offsetof(struct cfq_group, stats.empty_time)),
-		.read_seq_string = blkcg_print_stat,
+		.private = offsetof(struct cfq_group, stats.empty_time),
+		.read_seq_string = cfqg_print_stat,
 	},
 	{
 		.name = "dequeue",
-		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
-				offsetof(struct cfq_group, stats.dequeue)),
-		.read_seq_string = blkcg_print_stat,
+		.private = offsetof(struct cfq_group, stats.dequeue),
+		.read_seq_string = cfqg_print_stat,
 	},
 	{
 		.name = "unaccounted_time",
-		.private = BLKCG_STAT_PRIV(BLKIO_POLICY_PROP,
-				offsetof(struct cfq_group, stats.unaccounted_time)),
-		.read_seq_string = blkcg_print_stat,
+		.private = offsetof(struct cfq_group, stats.unaccounted_time),
+		.read_seq_string = cfqg_print_stat,
 	},
 #endif	/* CONFIG_DEBUG_BLK_CGROUP */
 	{ }	/* terminate */
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [PATCH 08/21] blkcg: blkg_conf_prep()
  2012-03-28 22:51 ` [PATCH 08/21] blkcg: blkg_conf_prep() Tejun Heo
@ 2012-03-28 22:53   ` Tejun Heo
  0 siblings, 0 replies; 56+ messages in thread
From: Tejun Heo @ 2012-03-28 22:53 UTC (permalink / raw)
  To: axboe; +Cc: vgoyal, ctalbott, rni, linux-kernel, cgroups, containers

Oops, $SUBJ should have been "simplify blkg_conf_prep()".

-- 
tejun

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCHSET] block: modularize blkcg config and stat file handling
  2012-03-28 22:51 [PATCHSET] block: modularize blkcg config and stat file handling Tejun Heo
                   ` (20 preceding siblings ...)
  2012-03-28 22:51 ` [PATCH 21/21] blkcg: drop BLKCG_STAT_{PRIV|POL|OFF} macros Tejun Heo
@ 2012-03-29  8:18 ` Jens Axboe
  2012-04-02 20:02   ` Tejun Heo
  2012-04-01 19:38 ` Vivek Goyal
  2012-04-01 21:42 ` Tejun Heo
  23 siblings, 1 reply; 56+ messages in thread
From: Jens Axboe @ 2012-03-29  8:18 UTC (permalink / raw)
  To: Tejun Heo; +Cc: vgoyal, ctalbott, rni, linux-kernel, cgroups, containers

On 03/29/2012 12:51 AM, Tejun Heo wrote:
> and is on top of
> 
>   block/for-3.5/core eb7d8c07f9 "cfq: fix cfqg ref handling..."
> + [1] cgroup-cftypes d954ca6469 "cgroup: implement cgroup_rm_cftypes()"
> 
> Note that the cgroup branch is temporary and the merge between the two
> branches aren't trivial.  I'll prepare a proper merged branch once the
> cgroup/for-3.5 branch is settled.

The diffstat is very tasty... I'll leave this one out until the cgroup
merge mess is settled, though.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCHSET] block: modularize blkcg config and stat file handling
  2012-03-28 22:51 [PATCHSET] block: modularize blkcg config and stat file handling Tejun Heo
                   ` (21 preceding siblings ...)
  2012-03-29  8:18 ` [PATCHSET] block: modularize blkcg config and stat file handling Jens Axboe
@ 2012-04-01 19:38 ` Vivek Goyal
  2012-04-01 21:42 ` Tejun Heo
  23 siblings, 0 replies; 56+ messages in thread
From: Vivek Goyal @ 2012-04-01 19:38 UTC (permalink / raw)
  To: Tejun Heo; +Cc: axboe, ctalbott, rni, linux-kernel, cgroups, containers

On Wed, Mar 28, 2012 at 03:51:10PM -0700, Tejun Heo wrote:

[..]
> This patchset is an attempt at bringing some sanity to blkcg config
> and stat file handling.  It makes use of the pending dynamic cgroup
> file type addition / removal support [1], which will be merged into
> cgroup/for-3.5 once 3.4-rc1 is released.

Thanks Tejun. This looks like a nice cleanup, especially the stat part.
It was very messy in the blkcg core code. Now at least it is clear who owns
what file and who needs per-cpu stats etc. And new policies should easily
be able to define their own cgroup files.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 18/21] blkcg: move blkio_group_conf->weight to cfq
  2012-03-28 22:51 ` [PATCH 18/21] blkcg: move blkio_group_conf->weight to cfq Tejun Heo
@ 2012-04-01 21:09   ` Vivek Goyal
  2012-04-01 21:22     ` Tejun Heo
  2012-04-02 21:39   ` Tao Ma
  1 sibling, 1 reply; 56+ messages in thread
From: Vivek Goyal @ 2012-04-01 21:09 UTC (permalink / raw)
  To: Tejun Heo
  Cc: axboe, ctalbott, rni, linux-kernel, cgroups, containers, Fengguang Wu

On Wed, Mar 28, 2012 at 03:51:28PM -0700, Tejun Heo wrote:
> blkio_group_conf->weight is owned by cfq and has no reason to be
> defined in blkcg core.  Replace it with cfq_group->dev_weight and let
> conf setting functions directly set it.  If dev_weight is zero, the
> cfqg doesn't have device specific weight configured.
> 
> Also, rename BLKIO_WEIGHT_* constants to CFQ_WEIGHT_* and rename
> blkio_cgroup->weight to blkio_cgroup->cfq_weight.  We eventually want
> per-policy storage in blkio_cgroup but just mark the ownership of the
> field for now.

Hi Tejun,

blkio_cgroup->weight can be thought of in a more generic manner, that
is, as a system-wide cgroup weight, and more than one policy should be
allowed to make use of it. It's a different matter that currently only
CFQ makes use of it.

For example, Fengguang posted RFC patches that try to make use of
blkcg->weight to differentiate buffered write bandwidth between cgroups.

https://lkml.org/lkml/2012/3/28/275

Thanks
Vivek

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 18/21] blkcg: move blkio_group_conf->weight to cfq
  2012-04-01 21:09   ` Vivek Goyal
@ 2012-04-01 21:22     ` Tejun Heo
  0 siblings, 0 replies; 56+ messages in thread
From: Tejun Heo @ 2012-04-01 21:22 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: axboe, ctalbott, rni, linux-kernel, cgroups, containers, Fengguang Wu

Hello, Vivek.

On Sun, Apr 01, 2012 at 05:09:56PM -0400, Vivek Goyal wrote:
> blkio_cgroup->weight can be thought of in a more generic manner, that
> is, as a system-wide cgroup weight, and more than one policy should be
> allowed to make use of it. It's a different matter that currently only
> CFQ makes use of it.
> 
> For example, Fengguang posted RFC patches that try to make use of
> blkcg->weight to differentiate buffered write bandwidth between cgroups.
> 
> https://lkml.org/lkml/2012/3/28/275

I don't think that's a good idea.  It makes it fuzzy which knob
controls what.  If we're gonna have a single set of controls followed by
all controllers, fine, but I really don't think we should be mixing
different layers of configurations.  I mean, what about blkcg.read_bps
then?  Let's just give this one to cfq.  If someone else wants weight,
let it use its own weight config.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCHSET] block: modularize blkcg config and stat file handling
  2012-03-28 22:51 [PATCHSET] block: modularize blkcg config and stat file handling Tejun Heo
                   ` (22 preceding siblings ...)
  2012-04-01 19:38 ` Vivek Goyal
@ 2012-04-01 21:42 ` Tejun Heo
  23 siblings, 0 replies; 56+ messages in thread
From: Tejun Heo @ 2012-04-01 21:42 UTC (permalink / raw)
  To: axboe; +Cc: vgoyal, ctalbott, rni, linux-kernel, cgroups, containers

On Wed, Mar 28, 2012 at 03:51:10PM -0700, Tejun Heo wrote:
> and is on top of
> 
>   block/for-3.5/core eb7d8c07f9 "cfq: fix cfqg ref handling..."
> + [1] cgroup-cftypes d954ca6469 "cgroup: implement cgroup_rm_cftypes()"
> 
> Note that the cgroup branch is temporary and the merge between the two
> branches aren't trivial.  I'll prepare a proper merged branch once the
> cgroup/for-3.5 branch is settled.
> 
> This patchset is also available in the following git branch.
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git blkcg-files

The above branch has been updated on top of block/for-3.5/core +
cgroup/for-3.5.  Ready to be pulled.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCHSET] block: modularize blkcg config and stat file handling
  2012-03-29  8:18 ` [PATCHSET] block: modularize blkcg config and stat file handling Jens Axboe
@ 2012-04-02 20:02   ` Tejun Heo
  2012-04-02 21:51     ` Jens Axboe
  0 siblings, 1 reply; 56+ messages in thread
From: Tejun Heo @ 2012-04-02 20:02 UTC (permalink / raw)
  To: Jens Axboe; +Cc: vgoyal, ctalbott, rni, linux-kernel, cgroups, containers

Hey, Jens.

On Thu, Mar 29, 2012 at 10:18:53AM +0200, Jens Axboe wrote:
> The diffstat is very tasty... I'll leave this one out until the cgroup
> merge mess is settled, though.

The following branch contains this patchset on top of the current
block/for-3.5/core 959d851caa "Merge branch 'for-3.5' of ../cgroup
into block/for-3.5/core-merged".

  git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git blkcg-files

Other than the $SUBJ update on the eighth patch from "blkcg:
blkg_conf_prep()" to "blkcg: simplify blkg_conf_prep()", the patchset
can be applied as-is on top of the current block/for-3.5/core.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 18/21] blkcg: move blkio_group_conf->weight to cfq
  2012-03-28 22:51 ` [PATCH 18/21] blkcg: move blkio_group_conf->weight to cfq Tejun Heo
  2012-04-01 21:09   ` Vivek Goyal
@ 2012-04-02 21:39   ` Tao Ma
  2012-04-02 21:49     ` Tejun Heo
  1 sibling, 1 reply; 56+ messages in thread
From: Tao Ma @ 2012-04-02 21:39 UTC (permalink / raw)
  To: Tejun Heo; +Cc: axboe, vgoyal, ctalbott, rni, linux-kernel, cgroups, containers

Hi Tejun,
On 03/29/2012 06:51 AM, Tejun Heo wrote:
> blkio_group_conf->weight is owned by cfq and has no reason to be
> defined in blkcg core.  Replace it with cfq_group->dev_weight and let
> conf setting functions directly set it.  If dev_weight is zero, the
> cfqg doesn't have device specific weight configured.
> 
> Also, rename BLKIO_WEIGHT_* constants to CFQ_WEIGHT_* and rename
> blkio_cgroup->weight to blkio_cgroup->cfq_weight.  We eventually want
> per-policy storage in blkio_cgroup but just mark the ownership of the
> field for now.
I guess blkio->weight is a generic way of abstracting the weight between
different block cgroups. Yes, currently, only cfq uses it, but I am
trying to improve Shaohua's original fiops scheduler and add cgroup
support to it. So please leave it there so that future schedulers (other
than the fiops scheduler) can use the framework.

Thanks
Tao



* Re: [PATCH 18/21] blkcg: move blkio_group_conf->weight to cfq
  2012-04-02 21:39   ` Tao Ma
@ 2012-04-02 21:49     ` Tejun Heo
  2012-04-02 22:03       ` Tao Ma
  0 siblings, 1 reply; 56+ messages in thread
From: Tejun Heo @ 2012-04-02 21:49 UTC (permalink / raw)
  To: Tao Ma; +Cc: axboe, vgoyal, ctalbott, rni, linux-kernel, cgroups, containers

Hello,

On Tue, Apr 03, 2012 at 05:39:23AM +0800, Tao Ma wrote:
> I guess blkio->weight is a generic way of abstracting the weight between
> different block cgroups.

It isn't and can't be.  There's nothing generic about it across
different policies and it's not even clear what that means.  If the
user chooses to combine iops limits with cfq weights, what the hell is
"generic" about that weight?

> Yes, currently, only cfq uses it, but I am
> trying to improve Shaohua's original fiops scheduler and add cgroup
> support to it. So please leave it there so that a future scheduler (even one
> other than the fiops scheduler) can use the framework.

So, if you want to implement a new blkcg policy, add the config
parameters and export the stats the policy wants *yourself*.  Not
having clear separation between policies and generic stuff was what
led us to this yucky mess and there's no way we're going back there.
So, NO.
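
As a rough sketch only (the fiops_* names below are hypothetical and used
purely for illustration, they are not part of this patchset), a policy-owned
config file ends up looking roughly like:

static int fiops_print_weight(struct cgroup *cgrp, struct cftype *cft,
			      struct seq_file *sf)
{
	/* hypothetical glue returning the policy's own per-cgroup data */
	seq_printf(sf, "%u\n", cgroup_to_fiops_cgroup(cgrp)->weight);
	return 0;
}

static struct cftype fiops_blkcg_files[] = {
	{
		/* shows up as blkio.fiops.weight in the cgroup directory */
		.name = "fiops.weight",
		.read_seq_string = fiops_print_weight,
		.write_u64 = fiops_set_weight,	/* handler omitted here */
	},
	{ }	/* terminate */
};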

Thanks.

-- 
tejun


* Re: [PATCHSET] block: modularize blkcg config and stat file handling
  2012-04-02 20:02   ` Tejun Heo
@ 2012-04-02 21:51     ` Jens Axboe
  2012-04-02 22:33       ` Tejun Heo
  0 siblings, 1 reply; 56+ messages in thread
From: Jens Axboe @ 2012-04-02 21:51 UTC (permalink / raw)
  To: Tejun Heo; +Cc: vgoyal, ctalbott, rni, linux-kernel, cgroups, containers

On 2012-04-02 13:02, Tejun Heo wrote:
> Hey, Jens.
> 
> On Thu, Mar 29, 2012 at 10:18:53AM +0200, Jens Axboe wrote:
>> The diffstat is very tasty... I'll leave this one out until the cgroup
>> merge mess is settled, though.
> 
> The following branch contains this patchset on top of the current
> block/for-3.5/core 959d851caa "Merge branch 'for-3.5' of ../cgroup
> into block/for-3.5/core-merged".
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git blkcg-files
> 
> Other than $SUBJ update on the eighth patch from "blkcg:
> blkg_conf_prep()" to "blkcg: simplify blkg_conf_prep()", the patchset
> can be applied as-is on top of the current block/for-3.5/core.

Alright, pulled in as well. I'm glad we pushed this to 3.5 :-)

-- 
Jens Axboe



* Re: [PATCH 18/21] blkcg: move blkio_group_conf->weight to cfq
  2012-04-02 21:49     ` Tejun Heo
@ 2012-04-02 22:03       ` Tao Ma
  2012-04-02 22:17         ` Tejun Heo
  0 siblings, 1 reply; 56+ messages in thread
From: Tao Ma @ 2012-04-02 22:03 UTC (permalink / raw)
  To: Tejun Heo; +Cc: axboe, vgoyal, ctalbott, rni, linux-kernel, cgroups, containers

On 04/03/2012 05:49 AM, Tejun Heo wrote:
> Hello,
> 
> On Tue, Apr 03, 2012 at 05:39:23AM +0800, Tao Ma wrote:
>> I guess blkio->weight is a generic way of abstracting the weight between
>> different block cgroups.
> 
> It isn't and can't be.  There's nothing generic about it across
> different policies and it's not even clear what that means.  If the
> user chooses to combine iops limits with cfq weights, what the hell is
> "generic" about that weight?
> 
>> Yes, currently, only cfq uses it, but I am
>> trying to improve Shaohua's original fiops scheduler and add cgroup
>> support to it. So please leave it there so that a future scheduler (even one
>> other than the fiops scheduler) can use the framework.
> 
> So, if you want to implement a new blkcg policy, add the config
> parameters and export the stats the policy wants *yourself*.  Not
> having clear separation between policies and generic stuff was what
> led us to this yucky mess and there's no way we're going back there.
> So, NO.
Currently weight is just used to calculate the time slice of different
cfq groups, right? So why can't it be used to indicate other weights? So
say, if we just want to use iops to indicate the difference between
different cgroups (100 weight vs 200 weight), one process will send
100 ios while the other will send 200 ios, just for example. We will need
a new iops_weight to be exported, in your opinion?

Thanks
Tao


* Re: [PATCH 18/21] blkcg: move blkio_group_conf->weight to cfq
  2012-04-02 22:03       ` Tao Ma
@ 2012-04-02 22:17         ` Tejun Heo
  2012-04-02 22:20           ` Tao Ma
  0 siblings, 1 reply; 56+ messages in thread
From: Tejun Heo @ 2012-04-02 22:17 UTC (permalink / raw)
  To: Tao Ma; +Cc: axboe, vgoyal, ctalbott, rni, linux-kernel, cgroups, containers

On Tue, Apr 03, 2012 at 06:03:03AM +0800, Tao Ma wrote:
> Currently weight is just used to calculate the time slice of different
> cfq groups, right? So why can't it be used to indicate other weights? So
> say, if we just want to use iops to indicate the difference between
> different cgroups (100 weight vs 200 weight), one process will send
> 100 ios while the other will send 200 ios, just for example.

Because it's configuring stuff which is completely unrelated.  Let's
say you added a new elevator w/ iops based proportional IO which
shares blkio.weight configuration with cfq but nothing else and in
turn your new thing would probably need some other config parameters
which don't make much sense for cfq, right?

Now, let's say there's a system which has two hard drives and sda is
using cfq and sdb is using your new elevator and you're trying to
configure cgroup blkio limits.  Now, you have blkio.weight which
applies to both elevators and other configurations which aren't and
from the looks of it there's no way to tell which configuration
controls what.

It also makes the configuration implementation hairier.  We'll need
callbacks from blkcg core layer to all policies to notify changes to
per-cgroup configuration and from there policies would have to decide
whether it has overriding per-cgroup-device configuration.  It's not
even clear we even want per-cgroup configuration.  blk-throttle only
has per-cgroup-device configuration after all.

So, again, no.  blkcg.weight isn't and won't be generic.

> We will need a new iops_weight to be exported, in your opinion?

Yeah, just add config and stat files prefixed with the name of the new
blkcg policy.
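
Hooking such a table up is then just a matter of the policy registering it.
A sketch from memory, with hypothetical fiops names, assuming the new
per-policy ->cftypes hook; don't take the exact member names as gospel:

static struct blkio_policy_type blkio_policy_fiops = {
	.ops = {
		.blkio_init_group_fn	= fiops_init_blkio_group,
	},
	.plid		= BLKIO_POLICY_FIOPS,	/* hypothetical new policy id */
	.pdata_size	= sizeof(struct fiops_group),
	.cftypes	= fiops_blkcg_files,	/* the policy's own conf/stat files */
};

/* at init time */
blkio_policy_register(&blkio_policy_fiops);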

Thanks.

-- 
tejun


* Re: [PATCH 18/21] blkcg: move blkio_group_conf->weight to cfq
  2012-04-02 22:17         ` Tejun Heo
@ 2012-04-02 22:20           ` Tao Ma
  2012-04-02 22:25             ` Vivek Goyal
  0 siblings, 1 reply; 56+ messages in thread
From: Tao Ma @ 2012-04-02 22:20 UTC (permalink / raw)
  To: Tejun Heo; +Cc: axboe, vgoyal, ctalbott, rni, linux-kernel, cgroups, containers

On 04/03/2012 06:17 AM, Tejun Heo wrote:
> On Tue, Apr 03, 2012 at 06:03:03AM +0800, Tao Ma wrote:
>> Currently weight is just used to calculate the time slice of different
>> cfq groups, right? So why can't it be used to indicate other weights? So
>> say, if we just want to use iops to indicate the difference between
>> different cgroups (100 weight vs 200 weight), one process will send
>> 100 ios while the other will send 200 ios, just for example.
> 
> Because it's configuring stuff which is completely unrelated.  Let's
> say you added a new elevator w/ iops based proportional IO which
> shares blkio.weight configuration with cfq but nothing else and in
> turn your new thing would probably need some other config parameters
> which don't make much sense for cfq, right?
> 
> Now, let's say there's a system which has two hard drives and sda is
> using cfq and sdb is using your new elevator and you're trying to
> configure cgroup blkio limits.  Now, you have blkio.weight which
> applies to both elevators and other configurations which aren't and
> from the looks of it there's no way to tell which configuration
> controls what.
> 
> It also makes the configuration implementation hairier.  We'll need
> callbacks from blkcg core layer to all policies to notify changes to
> per-cgroup configuration and from there policies would have to decide
> whether it has overriding per-cgroup-device configuration.  It's not
> even clear we even want per-cgroup configuration.  blk-throttle only
> has per-cgroup-device configuration after all.
Fair enough.
> 
> So, again, no.  blkcg.weight isn't and won't be generic.
> 
>> We will need a new iops_weight to be exported, in your opinion?
> 
> Yeah, just add config and stat files prefixed with the name of the new
> blkcg policy.
OK, I will add a new config file for it.

Thanks
Tao


* Re: [PATCH 18/21] blkcg: move blkio_group_conf->weight to cfq
  2012-04-02 22:20           ` Tao Ma
@ 2012-04-02 22:25             ` Vivek Goyal
  2012-04-02 22:28               ` Tejun Heo
  2012-04-02 22:41               ` Tao Ma
  0 siblings, 2 replies; 56+ messages in thread
From: Vivek Goyal @ 2012-04-02 22:25 UTC (permalink / raw)
  To: Tao Ma; +Cc: Tejun Heo, axboe, ctalbott, rni, linux-kernel, cgroups, containers

On Tue, Apr 03, 2012 at 06:20:10AM +0800, Tao Ma wrote:

[..]
> > Yeah, just add config and stat files prefixed with the name of the new
> > blkcg policy.
> OK, I will add a new config file for it.

Only if CFQ could be modified to add one iops mode, flippable through a
sysfs tunable, things will be much simpler. You will not have to add a
new IO scheduler, no new configuration/stat files in blkcg (which is
already crowded now).

I don't think anybody has shown with code why CFQ can't be modified
to support iops mode.

Thanks
Vivek


* Re: [PATCH 18/21] blkcg: move blkio_group_conf->weight to cfq
  2012-04-02 22:25             ` Vivek Goyal
@ 2012-04-02 22:28               ` Tejun Heo
  2012-04-02 22:41               ` Tao Ma
  1 sibling, 0 replies; 56+ messages in thread
From: Tejun Heo @ 2012-04-02 22:28 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Tao Ma, axboe, ctalbott, rni, linux-kernel, cgroups, containers

On Mon, Apr 02, 2012 at 06:25:04PM -0400, Vivek Goyal wrote:
> On Tue, Apr 03, 2012 at 06:20:10AM +0800, Tao Ma wrote:
> 
> [..]
> > > Yeah, just add config and stat files prefixed with the name of the new
> > > blkcg policy.
> > OK, I will add a new config file for it.
> 
> Only if CFQ could be modified to add one iops mode, flippable through a
> sysfs tunable, things will be much simpler. You will not have to add a
> new IO scheduler, no new configuration/stat files in blkcg (which is
> already crowded now).
> 
> I don't think anybody has shown the code that why CFQ can't be modified
> to support iops mode.

I haven't looked at the code so it's just an impression, but if we're
talking about a completely different scheduling policy - cfq is about
slicing disk service time and IIUC the new thing being talked about is
using iops as the scheduling unit, probably for devices where seeking
isn't extremely expensive - then it might not make much sense to mix them
together.

Thanks.

-- 
tejun


* Re: [PATCHSET] block: modularize blkcg config and stat file handling
  2012-04-02 21:51     ` Jens Axboe
@ 2012-04-02 22:33       ` Tejun Heo
  0 siblings, 0 replies; 56+ messages in thread
From: Tejun Heo @ 2012-04-02 22:33 UTC (permalink / raw)
  To: Jens Axboe; +Cc: vgoyal, ctalbott, rni, linux-kernel, cgroups, containers

On Mon, Apr 02, 2012 at 02:51:14PM -0700, Jens Axboe wrote:
> > Other than $SUBJ update on the eighth patch from "blkcg:
> > blkg_conf_prep()" to "blkcg: simplify blkg_conf_prep()", the patchset
> > can be applied as-is on top of the current block/for-3.5/core.
> 
> Alright, pulled in as well. I'm glad we pushed this to 3.5 :-)

Yeah, I'm fairly sure if we had shoved this through 3.4, we would be
having another round of f**k you's from Linus.

I'll probably send one or two more rounds of patches to clean up
in-place blkg updates and make active policy selection per
request_queue, but I'm glad to finally have the end in sight.

Thanks.

-- 
tejun


* Re: [PATCH 18/21] blkcg: move blkio_group_conf->weight to cfq
  2012-04-02 22:25             ` Vivek Goyal
  2012-04-02 22:28               ` Tejun Heo
@ 2012-04-02 22:41               ` Tao Ma
  2012-04-03 15:37                 ` IOPS based scheduler (Was: Re: [PATCH 18/21] blkcg: move blkio_group_conf->weight to cfq) Vivek Goyal
  1 sibling, 1 reply; 56+ messages in thread
From: Tao Ma @ 2012-04-02 22:41 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Tejun Heo, axboe, ctalbott, rni, linux-kernel, cgroups, containers

On 04/03/2012 06:25 AM, Vivek Goyal wrote:
> On Tue, Apr 03, 2012 at 06:20:10AM +0800, Tao Ma wrote:
> 
> [..]
>>> Yeah, just add config and stat files prefixed with the name of the new
>>> blkcg policy.
>> OK, I will add a new config file for it.
> 
> Only if CFQ could be modified to add one iops mode, flippable through a
> sysfs tunable, things will be much simpler. You will not have to add a
> new IO scheduler, no new configuration/stat files in blkcg (which is
> already crowded now).
> 
> I don't think anybody has shown the code that why CFQ can't be modified
> to support iops mode.
Yes, I have thought of it, but it seems to me that time slice is deeply
involved within cfq (even current cfq's iops mode uses the time slice
in its calculation). So I don't think it is feasible for me to change it. And
cfq works perfectly well for sas/sata environments and the code is quite
stable; more code and a more complicated algorithm do mean more bugs. So
I guess a new iops based scheduler is easy and not intrusive for the
user (since he can choose whether to use it or not).

Thanks
Tao


* IOPS based scheduler (Was: Re: [PATCH 18/21] blkcg: move blkio_group_conf->weight to cfq)
  2012-04-02 22:41               ` Tao Ma
@ 2012-04-03 15:37                 ` Vivek Goyal
  2012-04-03 16:36                   ` Tao Ma
  0 siblings, 1 reply; 56+ messages in thread
From: Vivek Goyal @ 2012-04-03 15:37 UTC (permalink / raw)
  To: Tao Ma; +Cc: Tejun Heo, axboe, ctalbott, rni, linux-kernel, cgroups, containers

On Tue, Apr 03, 2012 at 06:41:37AM +0800, Tao Ma wrote:
> On 04/03/2012 06:25 AM, Vivek Goyal wrote:
> > On Tue, Apr 03, 2012 at 06:20:10AM +0800, Tao Ma wrote:
> > 
> > [..]
> >>> Yeah, just add config and stat files prefixed with the name of the new
> >>> blkcg policy.
> >> OK, I will add a new config file for it.
> > 
> > Only if CFQ could be modified to add one iops mode, flippable through a
> > sysfs tunable, things will be much simpler. You will not have to add a
> > new IO scheduler, no new configuration/stat files in blkcg (which is
> > already crowded now).
> > 
> > I don't think anybody has shown the code that why CFQ can't be modified
> > to support iops mode.
> Yes, I have thought of it, but it seems to me that time slice is deeply
> involved within cfq (even current cfq's iops mode uses the time slice
> in its calculation). So I don't think it is feasible for me to change it. And
> cfq works perfectly well for sas/sata environments and the code is quite
> stable; more code and a more complicated algorithm do mean more bugs. So
> I guess a new iops based scheduler is easy and not intrusive for the
> user (since he can choose whether to use it or not).

Ok, let me take one step back.

- What's the goal of an iops based scheduler? What kind of workload and
  storage is it going to help?

- Can't we just set the slice_idle=0 and "quantum" to some high value
  say "64" or "128" and achieve similar results to iops based scheduler?

In theory, the above will cut down on idling and try to provide fairness in
terms of time. I thought fairness in terms of time is most fair. The
most common problem is that the measurement of time is not attributable to
an individual queue on NCQ hardware. I guess that throws time measurement
out of the window until and unless we have a better algorithm to measure
time in an NCQ environment.

I guess then we can just replace time with the number of requests dispatched
from a process queue. Allow it to dispatch requests for some time and
then schedule it out, put it back on the service tree and charge it
according to its weight.
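
As a sketch of that idea (made-up names, not actual cfq code): charge each
queue by the number of requests it dispatched, scaled by its weight, and
reinsert it into the service tree at the resulting virtual time:

/* sketch only: weight-scaled charging by request count (names made up) */
static void charge_and_requeue(struct service_tree *st, struct sched_queue *q,
			       unsigned int nr_dispatched)
{
	u64 charge = (u64)nr_dispatched * WEIGHT_DEFAULT;

	/* higher weight => smaller vtime advance => scheduled again sooner */
	do_div(charge, q->weight);
	q->vtime += charge;

	/* the queue with the smallest vtime is picked next */
	st_requeue(st, q);
}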

This all works only if we have the right workload: workloads which are
not doing dependent reads and can keep the disk busy continuously. If
there is think time involved and we do not idle, the process will lose its
share and the whole scheme of trying to differentiate between processes will
become ineffective.

So if you have come up with a better algorithm which can keep track of iops
without idling and still provide service differentiation for common
workloads, it will be interesting.

Thanks
Vivek


* Re: IOPS based scheduler (Was: Re: [PATCH 18/21] blkcg: move blkio_group_conf->weight to cfq)
  2012-04-03 15:37                 ` IOPS based scheduler (Was: Re: [PATCH 18/21] blkcg: move blkio_group_conf->weight to cfq) Vivek Goyal
@ 2012-04-03 16:36                   ` Tao Ma
  2012-04-03 16:50                     ` Vivek Goyal
  0 siblings, 1 reply; 56+ messages in thread
From: Tao Ma @ 2012-04-03 16:36 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Tejun Heo, axboe, ctalbott, rni, linux-kernel, cgroups,
	containers, Shaohua Li

add Shaohua to the cc list,
On 04/03/2012 11:37 PM, Vivek Goyal wrote:
> On Tue, Apr 03, 2012 at 06:41:37AM +0800, Tao Ma wrote:
>> On 04/03/2012 06:25 AM, Vivek Goyal wrote:
>>> On Tue, Apr 03, 2012 at 06:20:10AM +0800, Tao Ma wrote:
>>>
>>> [..]
>>>>> Yeah, just add config and stat files prefixed with the name of the new
>>>>> blkcg policy.
>>>> OK, I will add a new config file for it.
>>>
>>> Only if CFQ could be modified to add one iops mode, flippable through a
>>> sysfs tunable, things will be much simpler. You will not have to add a
>>> new IO scheduler, no new configuration/stat files in blkcg (which is
>>> already crowded now).
>>>
>>> I don't think anybody has shown the code that why CFQ can't be modified
>>> to support iops mode.
>> Yes, I have thought of it, but it seems to me that time slice is deeply
>> involved within cfq (even current cfq's iops mode uses the time slice
>> in its calculation). So I don't think it is feasible for me to change it. And
>> cfq works perfectly well for sas/sata environments and the code is quite
>> stable; more code and a more complicated algorithm do mean more bugs. So
>> I guess a new iops based scheduler is easy and not intrusive for the
>> user (since he can choose whether to use it or not).
> 
> Ok, let me take one step back.
> 
> - What's the goal of an iops based scheduler? What kind of workload and
>   storage is it going to help?
> 
> - Can't we just set the slice_idle=0 and "quantum" to some high value
>   say "64" or "128" and achieve similar results to iops based scheduler?
Yes, I should say cfq with slice_idle = 0 works well in most cases. But
when it comes to blkcg with ssd, it is really a disaster. You know, cfq
has to choose between different cgroups, so even if you choose 1ms as
the service time for each cgroup (actually in my test, only >2ms can work
reliably), the latency for some requests (which have been sent by the
user but not yet submitted to the driver) is really too much for the
application. I don't think there is a way to resolve it in cfq.

> 
> In theory, above will cut down on idling and try to provide fairness in
> terms of time. I thought fairness in terms of time is most fair. The
> most common problem is measurement of time is not attributable to
> individual queue in an NCQ hardware. I guess that throws time measurement
> of out the window until and unless we have a better algorithm to measure
> time in NCQ environment.
> 
> I guess then we can just replace time with number of requests dispatched
> from a process queue. Allow it to dispatch requests for some time and
> then schedule it out and put it back on service tree and charge it 
> according to its weight.
As I have said, in this case, the minimal time (1ms) multiplied by the group
number is too much for an ssd.

If we can use iops based scheduler, we can use iops_weight for different
cgroups and switch cgroup according to this number. So all the
applications can have a moderate response time which can be estimated.

btw, I have talked with Shaohua at LSF and we reached a consensus that I
will continue his work and try to add cgroup support to it.

Thanks
Tao
> 
> This all works only if we have right workload. The workloads which are
> not doing dependent reads and can keep the disk busy continuously. If
> there is think time involved, and we do not idle, process will lose its
> share and whole scheme of trying to differentiate between processes will
> become ineffective.
> 
> So if you have come up with a better algorithm which can keep track of iops
> without idling and still provide service differentiation for common 
> workloads, it will be interesting. 
> 
> Thanks
> Vivek



* Re: IOPS based scheduler (Was: Re: [PATCH 18/21] blkcg: move blkio_group_conf->weight to cfq)
  2012-04-03 16:36                   ` Tao Ma
@ 2012-04-03 16:50                     ` Vivek Goyal
  2012-04-03 17:26                       ` Tao Ma
  0 siblings, 1 reply; 56+ messages in thread
From: Vivek Goyal @ 2012-04-03 16:50 UTC (permalink / raw)
  To: Tao Ma
  Cc: Tejun Heo, axboe, ctalbott, rni, linux-kernel, cgroups,
	containers, Shaohua Li

On Wed, Apr 04, 2012 at 12:36:24AM +0800, Tao Ma wrote:

[..]
> > - Can't we just set the slice_idle=0 and "quantum" to some high value
> >   say "64" or "128" and achieve similar results to iops based scheduler?
> yes, I should say cfq with slice_idle = 0 works well in most cases. But
> if it comes to blkcg with ssd, it is really a disaster. You know, cfq
> has to choose between different cgroups, so even if you choose 1ms as
> the service time for each cgroup(actually in my test, only >2ms can work
> reliably). the latency for some requests(which have been sent by the
> user while not submitting to the driver) is really too much for the
> application. I don't think there is a way to resolve it in cfq.

Ok, so now you are saying that CFQ as such is not a problem but blkcg
logic in CFQ is an issue.

What's the issue there? I think the issue there also is group idling.
If you set group_idle=0, that idling will be cut down and switching
between groups will be fast. That's a different thing that in the
process you will most likely lose service differentiation also for
most of the workloads.

> 
> > 
> > In theory, above will cut down on idling and try to provide fairness in
> > terms of time. I thought fairness in terms of time is most fair. The
> > most common problem is measurement of time is not attributable to
> > individual queue in an NCQ hardware. I guess that throws time measurement
> > of out the window until and unless we have a better algorithm to measure
> > time in NCQ environment.
> > 
> > I guess then we can just replace time with number of requests dispatched
> > from a process queue. Allow it to dispatch requests for some time and
> > then schedule it out and put it back on service tree and charge it 
> > according to its weight.
> As I have said, in this case, the minimal time(1ms) multiple the group
> number is too much for a ssd.
> 
> If we can use iops based scheduler, we can use iops_weight for different
> cgroups and switch cgroup according to this number. So all the
> applications can have a moderate response time which can be estimated.

How are iops_weight and switching different from CFQ group scheduling logic?
I think Shaohua was talking about using similar logic. What would you do
fundamentally different so that without idling you will get service 
differentiation? 

If you explain your logic in detail, it will help.

BTW, in your last mail you mentioned that in iops_mode() we make use of time.
That's not the case. In iops_mode() we charge the group based on the number of
requests dispatched (slice_dispatch records the number of requests dispatched
from the queue in that slice). So to me, counting the number of requests
instead of time will effectively switch CFQ to an iops based scheduler, won't
it?

> 
> btw, I have talked with Shaohua at LSF and we reached a consensus that I
> will continue his work and try to add cgroup support to it.

That's fine, you can continue the work. But first explaining the problem
clearly and how you are going to fix it will help, instead of just saying
"CFQ has a problem and we will fix it by bringing in a new scheduler".

Thanks
Vivek


* Re: IOPS based scheduler (Was: Re: [PATCH 18/21] blkcg: move blkio_group_conf->weight to cfq)
  2012-04-03 16:50                     ` Vivek Goyal
@ 2012-04-03 17:26                       ` Tao Ma
  2012-04-04 12:35                         ` Shaohua Li
  2012-04-04 13:31                         ` Vivek Goyal
  0 siblings, 2 replies; 56+ messages in thread
From: Tao Ma @ 2012-04-03 17:26 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Tejun Heo, axboe, ctalbott, rni, linux-kernel, cgroups,
	containers, Shaohua Li

On 04/04/2012 12:50 AM, Vivek Goyal wrote:
> On Wed, Apr 04, 2012 at 12:36:24AM +0800, Tao Ma wrote:
> 
> [..]
>>> - Can't we just set the slice_idle=0 and "quantum" to some high value
>>>   say "64" or "128" and achieve similar results to iops based scheduler?
>> yes, I should say cfq with slice_idle = 0 works well in most cases. But
>> if it comes to blkcg with ssd, it is really a disaster. You know, cfq
>> has to choose between different cgroups, so even if you choose 1ms as
>> the service time for each cgroup(actually in my test, only >2ms can work
>> reliably). the latency for some requests(which have been sent by the
>> user while not submitting to the driver) is really too much for the
>> application. I don't think there is a way to resolve it in cfq.
> 
> Ok, so now you are saying that CFQ as such is not a problem but blkcg
> logic in CFQ is an issue.
> 
> What's the issue there? I think the issue there also is group idling.
> If you set group_idle=0, that idling will be cut down and switching
> between groups will be fast. That's a different thing that in the
> process you will most likely lose service differentiation also for
> most of the workloads.
No, group_idle=0 doesn't help. We don't have a problem with idling, the
disk is busy for all the tasks, we just want it to be proportional and
time endurable.
> 
>>
>>>
>>> In theory, above will cut down on idling and try to provide fairness in
>>> terms of time. I thought fairness in terms of time is most fair. The
>>> most common problem is measurement of time is not attributable to
>>> individual queue in an NCQ hardware. I guess that throws time measurement
>>> of out the window until and unless we have a better algorithm to measure
>>> time in NCQ environment.
>>>
>>> I guess then we can just replace time with number of requests dispatched
>>> from a process queue. Allow it to dispatch requests for some time and
>>> then schedule it out and put it back on service tree and charge it 
>>> according to its weight.
>> As I have said, in this case, the minimal time(1ms) multiple the group
>> number is too much for a ssd.
>>
>> If we can use iops based scheduler, we can use iops_weight for different
>> cgroups and switch cgroup according to this number. So all the
>> applications can have a moderate response time which can be estimated.
> 
> How iops_weight and switching different than CFQ group scheduling logic?
> I think shaohua was talking of using similar logic. What would you do
> fundamentally different so that without idling you will get service 
> differentiation?
I am thinking of differentiating groups with iops, so if there
are 3 groups (with weights 100, 200 and 300) we can let them submit 1 io,
2 ios and 3 ios in a round-robin way. With an intel ssd, every io can be
finished within 100us. So the maximum latency for one io is about 600us,
still less than 1ms. But with cfq, if all the cgroups are busy, we have
to switch between these groups in ms, which means the maximum latency will
be 6ms. It is terrible for some applications since they use ssds now.
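
(Just to make that arithmetic concrete, a toy sketch with made-up names,
nothing more:)

/* toy sketch: per-round dispatch quotas derived from weights (made up) */
struct grp { unsigned int weight; unsigned int quota; };

static unsigned int worst_case_latency_us(struct grp *g, int nr,
					  unsigned int min_weight,
					  unsigned int per_io_us)
{
	unsigned int round_ios = 0;
	int i;

	for (i = 0; i < nr; i++) {
		g[i].quota = g[i].weight / min_weight;	/* 100/200/300 -> 1/2/3 */
		round_ios += g[i].quota;
	}
	/* one full round is the worst case: 6 ios * ~100us = ~600us above */
	return round_ios * per_io_us;
}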

> 
> If you explain your logic in detail, it will help.
> 
> BTW, in last mail you mentioned that in iops_mode() we make use of time.
> That's not the case. in iops_mode() we charge group based on number of
> requests dispatched. (slice_dispatch records number of requests dispatched
> from the queue in that slice). So to me counting number of requests
> instead of time will effectively switch CFQ to iops based scheduler, isn't
> it?
Yes, iops_mode in cfq is calculated in iops, but it is switched according
to the time slice, right? So it can't resolve the problem I mentioned above.

Thanks
Tao
> Thanks
> Vivek



* Re: IOPS based scheduler (Was: Re: [PATCH 18/21] blkcg: move blkio_group_conf->weight to cfq)
  2012-04-03 17:26                       ` Tao Ma
@ 2012-04-04 12:35                         ` Shaohua Li
  2012-04-04 13:37                           ` Vivek Goyal
  2012-04-04 13:31                         ` Vivek Goyal
  1 sibling, 1 reply; 56+ messages in thread
From: Shaohua Li @ 2012-04-04 12:35 UTC (permalink / raw)
  To: Tao Ma
  Cc: Vivek Goyal, Tejun Heo, axboe, ctalbott, rni, linux-kernel,
	cgroups, containers

2012/4/3 Tao Ma <tm@tao.ma>:
> On 04/04/2012 12:50 AM, Vivek Goyal wrote:
>> On Wed, Apr 04, 2012 at 12:36:24AM +0800, Tao Ma wrote:
>>
>> [..]
>>>> - Can't we just set the slice_idle=0 and "quantum" to some high value
>>>>   say "64" or "128" and achieve similar results to iops based scheduler?
>>> yes, I should say cfq with slice_idle = 0 works well in most cases. But
>>> if it comes to blkcg with ssd, it is really a disaster. You know, cfq
>>> has to choose between different cgroups, so even if you choose 1ms as
>>> the service time for each cgroup(actually in my test, only >2ms can work
>>> reliably). the latency for some requests(which have been sent by the
>>> user while not submitting to the driver) is really too much for the
>>> application. I don't think there is a way to resolve it in cfq.
>>
>> Ok, so now you are saying that CFQ as such is not a problem but blkcg
>> logic in CFQ is an issue.
>>
>> What's the issue there? I think the issue there also is group idling.
>> If you set group_idle=0, that idling will be cut down and switching
>> between groups will be fast. That's a different thing that in the
>> process you will most likely lose service differentiation also for
>> most of the workloads.
> No, group_idle=0 doesn't help. We don't have problem with idling, the
> disk is busy for all the tasks, we just want it to be proportional and
> time endurable.
>>
>>>
>>>>
>>>> In theory, above will cut down on idling and try to provide fairness in
>>>> terms of time. I thought fairness in terms of time is most fair. The
>>>> most common problem is measurement of time is not attributable to
>>>> individual queue in an NCQ hardware. I guess that throws time measurement
>>>> of out the window until and unless we have a better algorithm to measure
>>>> time in NCQ environment.
>>>>
>>>> I guess then we can just replace time with number of requests dispatched
>>>> from a process queue. Allow it to dispatch requests for some time and
>>>> then schedule it out and put it back on service tree and charge it
>>>> according to its weight.
>>> As I have said, in this case, the minimal time(1ms) multiple the group
>>> number is too much for a ssd.
>>>
>>> If we can use iops based scheduler, we can use iops_weight for different
>>> cgroups and switch cgroup according to this number. So all the
>>> applications can have a moderate response time which can be estimated.
>>
>> How iops_weight and switching different than CFQ group scheduling logic?
>> I think shaohua was talking of using similar logic. What would you do
>> fundamentally different so that without idling you will get service
>> differentiation?
> I am thinking of differentiate different groups with iops, so if there
> are 3 groups(the weight are 100, 200, 300) we can let them submit 1 io,
> 2 io and 3 io in a round-robin way. With a intel ssd, every io can be
> finished within 100us. So the maximum latency for one io is about 600us,
> still less than 1ms. But with cfq, if all the cgroups are busy, we have
> to switch between these group in ms which means the maximum latency will
> be 6ms. It is terrible for some applications since they use ssds now.
Yes, with iops based scheduling, we do queue switching for every request.
Doing the same thing between groups is quite straightforward. The only issue
I found is this will introduce more process context switches; this isn't
a big issue for io bound applications, but it depends. It cuts latency a lot,
which I guess is more important for web 2.0 applications.


* Re: IOPS based scheduler (Was: Re: [PATCH 18/21] blkcg: move blkio_group_conf->weight to cfq)
  2012-04-03 17:26                       ` Tao Ma
  2012-04-04 12:35                         ` Shaohua Li
@ 2012-04-04 13:31                         ` Vivek Goyal
  1 sibling, 0 replies; 56+ messages in thread
From: Vivek Goyal @ 2012-04-04 13:31 UTC (permalink / raw)
  To: Tao Ma
  Cc: Tejun Heo, axboe, ctalbott, rni, linux-kernel, cgroups,
	containers, Shaohua Li

On Wed, Apr 04, 2012 at 01:26:06AM +0800, Tao Ma wrote:
> On 04/04/2012 12:50 AM, Vivek Goyal wrote:
> > On Wed, Apr 04, 2012 at 12:36:24AM +0800, Tao Ma wrote:
> > 
> > [..]
> >>> - Can't we just set the slice_idle=0 and "quantum" to some high value
> >>>   say "64" or "128" and achieve similar results to iops based scheduler?
> >> yes, I should say cfq with slice_idle = 0 works well in most cases. But
> >> if it comes to blkcg with ssd, it is really a disaster. You know, cfq
> >> has to choose between different cgroups, so even if you choose 1ms as
> >> the service time for each cgroup(actually in my test, only >2ms can work
> >> reliably). the latency for some requests(which have been sent by the
> >> user while not submitting to the driver) is really too much for the
> >> application. I don't think there is a way to resolve it in cfq.
> > 
> > Ok, so now you are saying that CFQ as such is not a problem but blkcg
> > logic in CFQ is an issue.
> > 
> > What's the issue there? I think the issue there also is group idling.
> > If you set group_idle=0, that idling will be cut down and switching
> > between groups will be fast. That's a different thing that in the
> > process you will most likely lose service differentiation also for
> > most of the workloads.
> No, group_idle=0 doesn't help. We don't have problem with idling, the
> disk is busy for all the tasks, we just want it to be proportional and
> time endurable.

I am not sure what time "endurable" means here. So if group idling
is not a problem, then what is the problem? I am still failing to
understand what the problem is.

[..]
> > How iops_weight and switching different than CFQ group scheduling logic?
> > I think shaohua was talking of using similar logic. What would you do
> > fundamentally different so that without idling you will get service 
> > differentiation?
> I am thinking of differentiate different groups with iops, so if there
> are 3 groups(the weight are 100, 200, 300) we can let them submit 1 io,
> 2 io and 3 io in a round-robin way. With a intel ssd, every io can be
> finished within 100us. So the maximum latency for one io is about 600us,
> still less than 1ms. But with cfq, if all the cgroups are busy, we have
> to switch between these group in ms which means the maximum latency will
> be 6ms. It is terrible for some applications since they use ssds now.

You can always do faster switching in CFQ. With idling disabled, you can
always expire a queue after dispatching a few requests. You don't have to
wait for 1ms. I am not sure why you are assuming that the minimum time
a queue/group has to dispatch for is 1ms.

We already have the notion of not dispatching too many IOs from async
queues (cfq_prio_to_maxrq()). Something similar can be quickly written
for iops_mode(). Just define a quantum of requests to be dispatched (say
10), then expire a queue after that and charge the queue/group for those
10 requests. Based on its weight, it will automatically go to the right
position in the tree and you should get iops based scheduling.

> > 
> > If you explain your logic in detail, it will help.
> > 
> > BTW, in last mail you mentioned that in iops_mode() we make use of time.
> > That's not the case. in iops_mode() we charge group based on number of
> > requests dispatched. (slice_dispatch records number of requests dispatched
> > from the queue in that slice). So to me counting number of requests
> > instead of time will effectively switch CFQ to iops based scheduler, isn't
> > it?
> yes, iops_mode in cfq is calculated iops, but it is switched according
> to the time slice, right? So it can't resolve the problem I mentioned above.

What do you mean by "switched according to the time slice"?

We currently have separate scheduling trees for queues and groups. Currently
iops mode works only for groups. We might still allocate a time slice
to a queue, but with idling disabled we will expire it much earlier, because
most workloads don't keep the queue busy long enough. If your workload
keeps the queue busy long enough (say for a few ms), then we can introduce
logic in the queue expiry path to expire the queue after dispatching a few
requests in iops mode so that queues don't get extended time slices.

Thanks
Vivek


* Re: IOPS based scheduler (Was: Re: [PATCH 18/21] blkcg: move blkio_group_conf->weight to cfq)
  2012-04-04 12:35                         ` Shaohua Li
@ 2012-04-04 13:37                           ` Vivek Goyal
  2012-04-04 14:52                             ` Shaohua Li
  2012-04-04 16:45                             ` Tao Ma
  0 siblings, 2 replies; 56+ messages in thread
From: Vivek Goyal @ 2012-04-04 13:37 UTC (permalink / raw)
  To: Shaohua Li
  Cc: Tao Ma, Tejun Heo, axboe, ctalbott, rni, linux-kernel, cgroups,
	containers

On Wed, Apr 04, 2012 at 05:35:49AM -0700, Shaohua Li wrote:

[..]
> >> How iops_weight and switching different than CFQ group scheduling logic?
> >> I think shaohua was talking of using similar logic. What would you do
> >> fundamentally different so that without idling you will get service
> >> differentiation?
> > I am thinking of differentiate different groups with iops, so if there
> > are 3 groups(the weight are 100, 200, 300) we can let them submit 1 io,
> > 2 io and 3 io in a round-robin way. With a intel ssd, every io can be
> > finished within 100us. So the maximum latency for one io is about 600us,
> > still less than 1ms. But with cfq, if all the cgroups are busy, we have
> > to switch between these group in ms which means the maximum latency will
> > be 6ms. It is terrible for some applications since they use ssds now.
> Yes, with iops based scheduling, we do queue switching for every request.
> Doing the same thing between groups is quite straightforward. The only issue
> I found is this will introduce more process context switch, this isn't
> a big issue
> for io bound application, but depends. It cuts latency a lot, which I
> guess is more
> important for web 2.0 application.

In iops_mode(), expire each cfqq after dispatching 1 or a bunch of requests
and you should get the same behavior (with slice_idle=0 and group_idle=0).
So why write a new scheduler?

The only thing is that with the above, the current code will provide iops
fairness only for groups. We should be able to tweak queue scheduling to
support iops fairness also.

Anyway, we will end up doing that at some point. Supporting two
scheduling algorithms for queues and groups is not sustainable. There are
already calls to make CFQ hierarchical, and in that case both queues and
groups need to be on a single service tree, which means they need to follow
the same algorithm for scheduling.

Thanks
Vivek


* Re: IOPS based scheduler (Was: Re: [PATCH 18/21] blkcg: move blkio_group_conf->weight to cfq)
  2012-04-04 13:37                           ` Vivek Goyal
@ 2012-04-04 14:52                             ` Shaohua Li
  2012-04-04 15:10                               ` Vivek Goyal
  2012-04-04 16:45                             ` Tao Ma
  1 sibling, 1 reply; 56+ messages in thread
From: Shaohua Li @ 2012-04-04 14:52 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Tao Ma, Tejun Heo, axboe, ctalbott, rni, linux-kernel, cgroups,
	containers

2012/4/4 Vivek Goyal <vgoyal@redhat.com>:
> On Wed, Apr 04, 2012 at 05:35:49AM -0700, Shaohua Li wrote:
>
> [..]
>> >> How iops_weight and switching different than CFQ group scheduling logic?
>> >> I think shaohua was talking of using similar logic. What would you do
>> >> fundamentally different so that without idling you will get service
>> >> differentiation?
>> > I am thinking of differentiate different groups with iops, so if there
>> > are 3 groups(the weight are 100, 200, 300) we can let them submit 1 io,
>> > 2 io and 3 io in a round-robin way. With a intel ssd, every io can be
>> > finished within 100us. So the maximum latency for one io is about 600us,
>> > still less than 1ms. But with cfq, if all the cgroups are busy, we have
>> > to switch between these group in ms which means the maximum latency will
>> > be 6ms. It is terrible for some applications since they use ssds now.
>> Yes, with iops based scheduling, we do queue switching for every request.
>> Doing the same thing between groups is quite straightforward. The only issue
>> I found is this will introduce more process context switch, this isn't
>> a big issue
>> for io bound application, but depends. It cuts latency a lot, which I
>> guess is more
>> important for web 2.0 application.
>
> In iops_mode(), expire each cfqq after dispatch of 1 or bunch of requests
> and you should get the same behavior (with slice_idle=0 and group_idle=0).
> So why write a new scheduler.
>
> Only thing is that with above, current code will provide iops fairness only
> for groups. We should be able to tweak queue scheduling to support iops
> fairness also.
Agreed, we can tweak cfq to make it support iops fairness because the two
are conceptually the same. The problem is whether this becomes a mess. CFQ is
quite complicated already. In iops mode, a lot of code isn't required, like
idling, queue merging, thinktime/seek detection and so on, as the scheduler
will only be for ssd. With the recent iocontext cleanup, the iops scheduler
code is actually quite short.


* Re: IOPS based scheduler (Was: Re: [PATCH 18/21] blkcg: move blkio_group_conf->weight to cfq)
  2012-04-04 14:52                             ` Shaohua Li
@ 2012-04-04 15:10                               ` Vivek Goyal
  2012-04-04 16:06                                 ` Tao Ma
  0 siblings, 1 reply; 56+ messages in thread
From: Vivek Goyal @ 2012-04-04 15:10 UTC (permalink / raw)
  To: Shaohua Li
  Cc: Tao Ma, Tejun Heo, axboe, ctalbott, rni, linux-kernel, cgroups,
	containers

On Wed, Apr 04, 2012 at 07:52:24AM -0700, Shaohua Li wrote:

[..]
> Agreed, we can tweak cfq to make it support iops fairness because the two
> are conceptually the same. The problem is if this is a mess. CFQ is quite
> complicated already. In iops mode, a lot of code isn't required, like idle,
> queue merging, thinktime/seek detection and so on, as the scheduler
> will be only for ssd. With recent iocontext cleanup, the iops scheduler
> code is quite short actually.

Ok, this is somewhat a better reason to have a separate scheduler. I guess
we need to look at the actual iops code, and that can help decide whether
to keep it as a separate scheduler.

One question is still unanswered though. What real workload benefits
from it? If you are not doing idling in an iops based scheduler, I doubt
you are going to see much service differentiation on fast SSDs. For
service differentiation, IO queues have to be continuously backlogged and
the total IOPS needed by applications need to be more than what the disk can
offer. It becomes very hard to produce continuously backlogged queues
because real applications tend to read some data, process the data and then
generate more IO.

Thanks
Vivek


* Re: IOPS based scheduler (Was: Re: [PATCH 18/21] blkcg: move blkio_group_conf->weight to cfq)
  2012-04-04 15:10                               ` Vivek Goyal
@ 2012-04-04 16:06                                 ` Tao Ma
  0 siblings, 0 replies; 56+ messages in thread
From: Tao Ma @ 2012-04-04 16:06 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Shaohua Li, Tejun Heo, axboe, ctalbott, rni, linux-kernel,
	cgroups, containers

On 04/04/2012 11:10 PM, Vivek Goyal wrote:
> On Wed, Apr 04, 2012 at 07:52:24AM -0700, Shaohua Li wrote:
> 
> [..]
>> Agreed, we can tweak cfq to make it support iops fairness because the two
>> are conceptually the same. The problem is if this is a mess. CFQ is quite
>> complicated already. In iops mode, a lot of code isn't required, like idle,
>> queue merging, thinktime/seek detection and so on, as the scheduler
>> will be only for ssd. With recent iocontext cleanup, the iops scheduler
>> code is quite short actually.
> 
> Ok, this is somewhat a better reason to have a separate scheduler. I guess
> we need to look at the actual iops code and that can help decide whether
> to keep it as a separate scheduler.
Yes, actually I am afraid of making any *big* changes to cfq since it is
stable and complicated. It would be terrible for our customers, who use
sata and sas most of the time. So the iops based scheduler would only be used
for ssds.
> 
> One question is still unanswered though. What real workload benefits
> from it? If you are not doing idling in iops based scheduler, I doubt
> you are going to see much service differentiation on fast SSDs. As for
> service differentiation IO queues have to be continuously backlogged and
> total IOPS needed by applications need to be more than what disk can
> offer. It becomes very hard to produce continuously backlogged queues
> because real applications tend to read some data, process data and then
> generate more IO. 
OK, I guess I can describe our workload somehow. Yes, for very fast
SSDs, it would not help since the io depth is too high and we can't fill
in enough requests. But some not-that-fast SSDs (say intel's x25m
series) can only do tens of thousands of iops, and it would help us
if we can have this type of ssd work proportionally. Yes, in most cases
the ssd will be idle, but we do have times when the disk is very busy
and we need the proportional iops to fit our customers' needs.

Thanks
Tao


* Re: IOPS based scheduler (Was: Re: [PATCH 18/21] blkcg: move blkio_group_conf->weight to cfq)
  2012-04-04 13:37                           ` Vivek Goyal
  2012-04-04 14:52                             ` Shaohua Li
@ 2012-04-04 16:45                             ` Tao Ma
  2012-04-04 16:50                               ` Vivek Goyal
  1 sibling, 1 reply; 56+ messages in thread
From: Tao Ma @ 2012-04-04 16:45 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Shaohua Li, Tejun Heo, axboe, ctalbott, rni, linux-kernel,
	cgroups, containers

On 04/04/2012 09:37 PM, Vivek Goyal wrote:
> On Wed, Apr 04, 2012 at 05:35:49AM -0700, Shaohua Li wrote:
> 
> [..]
>>>> How iops_weight and switching different than CFQ group scheduling logic?
>>>> I think shaohua was talking of using similar logic. What would you do
>>>> fundamentally different so that without idling you will get service
>>>> differentiation?
>>> I am thinking of differentiate different groups with iops, so if there
>>> are 3 groups(the weight are 100, 200, 300) we can let them submit 1 io,
>>> 2 io and 3 io in a round-robin way. With a intel ssd, every io can be
>>> finished within 100us. So the maximum latency for one io is about 600us,
>>> still less than 1ms. But with cfq, if all the cgroups are busy, we have
>>> to switch between these group in ms which means the maximum latency will
>>> be 6ms. It is terrible for some applications since they use ssds now.
>> Yes, with iops based scheduling, we do queue switching for every request.
>> Doing the same thing between groups is quite straightforward. The only issue
>> I found is this will introduce more process context switch, this isn't
>> a big issue
>> for io bound application, but depends. It cuts latency a lot, which I
>> guess is more
>> important for web 2.0 application.
> 
> In iops_mode(), expire each cfqq after dispatch of 1 or bunch of requests
> and you should get the same behavior (with slice_idle=0 and group_idle=0).
> So why write a new scheduler.
Really? How could we configure cfq to work like this? Or do you mean we can
change the code for it?
> 
> Only thing is that with above, current code will provide iops fairness only
> for groups. We should be able to tweak queue scheduling to support iops
> fairness also.
OK, as I have said in another e-mail, my other concern is the
complexity. It will make cfq much too complicated. I just checked the
source code of shaohua's original patch; the fiops scheduler is only ~700
lines, so with cgroup support added it would be ~1000 lines I guess.
Currently cfq-iosched.c is around ~4000 lines even after Tejun's cleanup
of io context...

Thanks
Tao
> 
> Anyway, we will end up doing that at some point of time. Supporting two
> scheduling algorithms for queue and groups is not sustainable. There are
> already calls to make CFQ hierarchical and in that case both queue and
> groups need to be on a single service tree and that means need to follow
> same algorithm for scheduling.
> 
> Thanks
> Vivek



* Re: IOPS based scheduler (Was: Re: [PATCH 18/21] blkcg: move blkio_group_conf->weight to cfq)
  2012-04-04 16:45                             ` Tao Ma
@ 2012-04-04 16:50                               ` Vivek Goyal
  2012-04-04 17:17                                 ` Vivek Goyal
  2012-04-04 17:18                                 ` Tao Ma
  0 siblings, 2 replies; 56+ messages in thread
From: Vivek Goyal @ 2012-04-04 16:50 UTC (permalink / raw)
  To: Tao Ma
  Cc: Shaohua Li, Tejun Heo, axboe, ctalbott, rni, linux-kernel,
	cgroups, containers

On Thu, Apr 05, 2012 at 12:45:05AM +0800, Tao Ma wrote:

[..]
> > In iops_mode(), expire each cfqq after dispatch of 1 or bunch of requests
> > and you should get the same behavior (with slice_idle=0 and group_idle=0).
> > So why write a new scheduler.
> really? How could we config cfq to work like this? Or you mean we can
> change the code for it?

You can just put in a few lines of code to expire the queue after 1-2 requests
are dispatched from the queue. Then run your workload with slice_idle=0
and group_idle=0 and see what happens.

I don't even know what your workload is. 

> > 
> > Only thing is that with above, current code will provide iops fairness only
> > for groups. We should be able to tweak queue scheduling to support iops
> > fairness also.
> OK, as I have said in another e-mail another my concern is the
> complexity. It will make cfq too much complicated. I just checked the
> source code of shaohua's original patch, fiops scheduler is only ~700
> lines, so with cgroup support added it would be ~1000 lines I guess.
> Currently cfq-iosched.c is around ~4000 lines even after Tejun's cleanup
> of io context...

I think a large chunk of that iops scheduler code will be borrowed from
CFQ code. All the cgroup logic, queue creation logic, group scheduling
logic etc. And that's the reason I was still exploring the possibility 
of having a common code base.

Thanks
Vivek


* Re: IOPS based scheduler (Was: Re: [PATCH 18/21] blkcg: move blkio_group_conf->weight to cfq)
  2012-04-04 16:50                               ` Vivek Goyal
@ 2012-04-04 17:17                                 ` Vivek Goyal
  2012-04-04 17:18                                 ` Tao Ma
  1 sibling, 0 replies; 56+ messages in thread
From: Vivek Goyal @ 2012-04-04 17:17 UTC (permalink / raw)
  To: Tao Ma
  Cc: Shaohua Li, Tejun Heo, axboe, ctalbott, rni, linux-kernel,
	cgroups, containers

On Wed, Apr 04, 2012 at 12:50:48PM -0400, Vivek Goyal wrote:
> On Thu, Apr 05, 2012 at 12:45:05AM +0800, Tao Ma wrote:
> 
> [..]
> > > In iops_mode(), expire each cfqq after dispatch of 1 or bunch of requests
> > > and you should get the same behavior (with slice_idle=0 and group_idle=0).
> > > So why write a new scheduler.
> > really? How could we config cfq to work like this? Or you mean we can
> > change the code for it?
> 
> You can just put a few lines of code to expire queue after 1-2 requests
> dispatched from the queue. Than run your workload with slice_idle=0
> and group_idle=0 and see what happens.

Can you apply the following patch and test your workload with slice_idle=0,
group_idle=0 and quantum=64/128?

I expect that fast queue and group switching will take place. Even if your
workload is creating continuously backlogged queues, we will still
expire the queue after dispatching 5 requests from the queue and requeue
it.

I also expect that you should see service differentiation at the *group level*
(and not at the cfqq level), if your workload is creating continuously
backlogged groups. Otherwise it will effectively become round robin
scheduling.

If possible, send me a small trace (5 seconds) of your workload,
and that can help me understand a little better what is going on.

Thanks
Vivek


---
 block/cfq-iosched.c |    5 +++++
 1 file changed, 5 insertions(+)

Index: linux-2.6/block/cfq-iosched.c
===================================================================
--- linux-2.6.orig/block/cfq-iosched.c	2012-04-03 23:18:33.000000000 -0400
+++ linux-2.6/block/cfq-iosched.c	2012-04-05 00:02:07.517806185 -0400
@@ -655,8 +655,13 @@ cfq_set_prio_slice(struct cfq_data *cfqd
  */
 static inline bool cfq_slice_used(struct cfq_queue *cfqq)
 {
+	/* In iops mode, we really are not looking for time measurement */
+	if (iops_mode(cfqq->cfqd) && cfqq->slice_dispatch > 5)
+		return true;
+
 	if (cfq_cfqq_slice_new(cfqq))
 		return false;
+
 	if (time_before(jiffies, cfqq->slice_end))
 		return false;
 


* Re: IOPS based scheduler (Was: Re: [PATCH 18/21] blkcg: move blkio_group_conf->weight to cfq)
  2012-04-04 16:50                               ` Vivek Goyal
  2012-04-04 17:17                                 ` Vivek Goyal
@ 2012-04-04 17:18                                 ` Tao Ma
  2012-04-04 17:27                                   ` Vivek Goyal
  2012-04-04 18:22                                   ` Vivek Goyal
  1 sibling, 2 replies; 56+ messages in thread
From: Tao Ma @ 2012-04-04 17:18 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Shaohua Li, Tejun Heo, axboe, ctalbott, rni, linux-kernel,
	cgroups, containers

On 04/05/2012 12:50 AM, Vivek Goyal wrote:
> On Thu, Apr 05, 2012 at 12:45:05AM +0800, Tao Ma wrote:
> 
> [..]
>>> In iops_mode(), expire each cfqq after dispatch of 1 or bunch of requests
>>> and you should get the same behavior (with slice_idle=0 and group_idle=0).
>>> So why write a new scheduler.
>> really? How could we config cfq to work like this? Or you mean we can
>> change the code for it?
> 
> You can just put in a few lines of code to expire the queue after 1-2
> requests have been dispatched from it. Then run your workload with
> slice_idle=0 and group_idle=0 and see what happens.
Oh, yes, I can do this to see whether it helps the latency, but it is
hacking, and it doesn't work with the cgroup proportional weighting...
> 
> I don't even know what your workload is. 
Sorry, I am not allowed to say more about it.
> 
>>>
>>> The only thing is that, with the above, the current code will provide
>>> iops fairness only for groups. We should be able to tweak the queue
>>> scheduling to support iops fairness also.
>> OK, as I have said in another e-mail, my other concern is the
>> complexity. It will make cfq much too complicated. I just checked the
>> source code of Shaohua's original patch: the fiops scheduler is only ~700
>> lines, so with cgroup support added it would be ~1000 lines, I guess.
>> Currently cfq-iosched.c is around ~4000 lines even after Tejun's cleanup
>> of the io context code...
> 
> I think a large chunk of that iops scheduler code will be borrowed from
> the CFQ code: all the cgroup logic, queue creation logic, group scheduling
> logic, etc. And that's the reason I was still exploring the possibility
> of having a common code base.
Yeah, actually I was thinking of abstracting out some generic logic, but
it seems quite hard. Maybe we can try to unify the code later?

Thanks
Tao

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: IOPS based scheduler (Was: Re: [PATCH 18/21] blkcg: move blkio_group_conf->weight to cfq)
  2012-04-04 17:18                                 ` Tao Ma
@ 2012-04-04 17:27                                   ` Vivek Goyal
  2012-04-04 18:22                                   ` Vivek Goyal
  1 sibling, 0 replies; 56+ messages in thread
From: Vivek Goyal @ 2012-04-04 17:27 UTC (permalink / raw)
  To: Tao Ma
  Cc: Shaohua Li, Tejun Heo, axboe, ctalbott, rni, linux-kernel,
	cgroups, containers

On Thu, Apr 05, 2012 at 01:18:05AM +0800, Tao Ma wrote:

[..]
> > You can just put in a few lines of code to expire the queue after 1-2
> > requests have been dispatched from it. Then run your workload with
> > slice_idle=0 and group_idle=0 and see what happens.
> Oh, yes, I can do this to see whether it helps the latency, but it is
> hacking, and it doesn't work with the cgroup proportional weighting...

"Hacking"?. I think effectively that's what effectively iops scheduler
should be doing to achieve faster switching.

Also, if your workload is keeping the groups continuously busy, you should
get proportional behavior at the group level.

Do try the patch I sent you in a separate mail with your workload.

[..]
> > I think a large chunk of that iops scheduler code will be borrowed from
> > the CFQ code: all the cgroup logic, queue creation logic, group scheduling
> > logic, etc. And that's the reason I was still exploring the possibility
> > of having a common code base.
> Yeah, actually I was thinking of abstracting out some generic logic, but
> it seems quite hard. Maybe we can try to unify the code later?

Once you write and merge a new scheduler, that code unification is never
going to happen. The two will happily part ways even though a lot of the
code/logic is common.

Once hierarchical support comes to CFQ, the same hierarchical cgroup
support will need to be written for this new scheduler as well.


Thanks
Vivek

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: IOPS based scheduler (Was: Re: [PATCH 18/21] blkcg: move blkio_group_conf->weight to cfq)
  2012-04-04 17:18                                 ` Tao Ma
  2012-04-04 17:27                                   ` Vivek Goyal
@ 2012-04-04 18:22                                   ` Vivek Goyal
  2012-04-04 18:36                                     ` Tao Ma
  1 sibling, 1 reply; 56+ messages in thread
From: Vivek Goyal @ 2012-04-04 18:22 UTC (permalink / raw)
  To: Tao Ma
  Cc: Shaohua Li, Tejun Heo, axboe, ctalbott, rni, linux-kernel,
	cgroups, containers

On Thu, Apr 05, 2012 at 01:18:05AM +0800, Tao Ma wrote:

[..]
> > I think a large chunk of that iops scheduler code will be borrowed from
> > the CFQ code: all the cgroup logic, queue creation logic, group scheduling
> > logic, etc. And that's the reason I was still exploring the possibility
> > of having a common code base.
> Yeah, actually I was thinking of abstracting out some generic logic, but
> it seems quite hard. Maybe we can try to unify the code later?

I think if we change the cfqq scheduling logic to something similar to the
group scheduling logic, it will help a lot.

- The current virtual-time-based group scheduling logic does not care
  whether you are operating in time mode or iops mode. Switching the cfqq
  logic to a similar scheme will help us move to iops mode quickly.

- Keeping track of vtime will let us get rid of all the residual time
  logic. If some queue was preempted and did not use its full slice, we
  will automatically charge it less and advance its vtime by a smaller
  amount.

- Using the same scheduling logic for both will enable smoother
  integration of the cfqq and group logic once we support hierarchical
  cgroups.

- It will also enable easier integration of the iops related logic.

So I am in favor of cleaning up the CFQ code and changing it to deal with
both time as well as iops. Seriously, implementing time or iops accounting
is not hard. It is the rest of the logic (the service trees, the groups)
that contributes the bulk of the code, and I am really not convinced that
an iops scheduler is going to be different enough to need a new io
scheduler.
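
To make that concrete: the group side already charges entities in whichever
unit the mode dictates and advances a weight-scaled virtual time. The
following is a simplified sketch of that charging step, loosely modelled on
cfq_group_served(); charge_entity() is just a name for the sketch, and the
details are approximate rather than verbatim from the tree.

/*
 * Sketch: charge a group for what its queue consumed, in time or in
 * requests depending on the mode, and advance its virtual time.
 */
static void charge_entity(struct cfq_data *cfqd, struct cfq_group *cfqg,
			  struct cfq_queue *cfqq, unsigned int used_sl)
{
	/* Default: charge the wall-clock time the queue actually used */
	unsigned int charge = used_sl;

	/* In iops mode, charge dispatched requests instead of time */
	if (iops_mode(cfqd))
		charge = cfqq->slice_dispatch;

	/*
	 * Advance virtual time by the weight-scaled charge; a lower
	 * weight makes vdisktime grow faster, so the entity is picked
	 * less often.  The same accounting works whether "charge" is
	 * time or a request count.
	 */
	cfqg->vdisktime += cfq_scale_slice(charge, cfqg);
}

Once cfqqs are scheduled by the same kind of vtime, the time vs. iops
question reduces to what gets charged, rather than requiring a separate
scheduler.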

Thanks
Vivek

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: IOPS based scheduler (Was: Re: [PATCH 18/21] blkcg: move blkio_group_conf->weight to cfq)
  2012-04-04 18:22                                   ` Vivek Goyal
@ 2012-04-04 18:36                                     ` Tao Ma
  0 siblings, 0 replies; 56+ messages in thread
From: Tao Ma @ 2012-04-04 18:36 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Shaohua Li, Tejun Heo, axboe, ctalbott, rni, linux-kernel,
	cgroups, containers

On 04/05/2012 02:22 AM, Vivek Goyal wrote:
> On Thu, Apr 05, 2012 at 01:18:05AM +0800, Tao Ma wrote:
> 
> [..]
>>> I think a large chunk of that iops scheduler code will be borrowed from
>>> the CFQ code: all the cgroup logic, queue creation logic, group scheduling
>>> logic, etc. And that's the reason I was still exploring the possibility
>>> of having a common code base.
>> Yeah, actually I was thinking of abstracting out some generic logic, but
>> it seems quite hard. Maybe we can try to unify the code later?
> 
> I think if we change the cfqq scheduling logic to something similar to the
> group scheduling logic, it will help a lot.
> 
> - The current virtual-time-based group scheduling logic does not care
>   whether you are operating in time mode or iops mode. Switching the cfqq
>   logic to a similar scheme will help us move to iops mode quickly.
> 
> - Keeping track of vtime will let us get rid of all the residual time
>   logic. If some queue was preempted and did not use its full slice, we
>   will automatically charge it less and advance its vtime by a smaller
>   amount.
> 
> - Using the same scheduling logic for both will enable smoother
>   integration of the cfqq and group logic once we support hierarchical
>   cgroups.
> 
> - It will also enable easier integration of the iops related logic.
Maybe. I will check all of these after my travel, and I will also try your
patch at that time.
> 
> So I am in favor of cleaning up the CFQ code and changing it to deal with
> both time as well as iops. Seriously, implementing time or iops accounting
> is not hard. It is the rest of the logic (the service trees, the groups)
> that contributes the bulk of the code, and I am really not convinced that
> an iops scheduler is going to be different enough to need a new io
> scheduler.
Sure, I also want to get my problem properly resolved while keeping
things simple.

Thanks
Tao

^ permalink raw reply	[flat|nested] 56+ messages in thread

end of thread, other threads:[~2012-04-04 18:36 UTC | newest]

Thread overview: 56+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-03-28 22:51 [PATCHSET] block: modularize blkcg config and stat file handling Tejun Heo
2012-03-28 22:51 ` [PATCH 01/21] blkcg: remove unused @pol and @plid parameters Tejun Heo
2012-03-28 22:51 ` [PATCH 02/21] blkcg: BLKIO_STAT_CPU_SECTORS doesn't have subcounters Tejun Heo
2012-03-28 22:51 ` [PATCH 03/21] blkcg: introduce blkg_stat and blkg_rwstat Tejun Heo
2012-03-28 22:51 ` [PATCH 04/21] blkcg: restructure statistics printing Tejun Heo
2012-03-28 22:51 ` [PATCH 05/21] blkcg: drop blkiocg_file_write_u64() Tejun Heo
2012-03-28 22:51 ` [PATCH 06/21] blkcg: restructure configuration printing Tejun Heo
2012-03-28 22:51 ` [PATCH 07/21] blkcg: restructure blkio_group configruation setting Tejun Heo
2012-03-28 22:51 ` [PATCH 08/21] blkcg: blkg_conf_prep() Tejun Heo
2012-03-28 22:53   ` Tejun Heo
2012-03-28 22:51 ` [PATCH 09/21] blkcg: export conf/stat helpers to prepare for reorganization Tejun Heo
2012-03-28 22:51 ` [PATCH 10/21] blkcg: implement blkio_policy_type->cftypes Tejun Heo
2012-03-28 22:51 ` [PATCH 11/21] blkcg: move conf/stat file handling code to policies Tejun Heo
2012-03-28 22:51 ` [PATCH 12/21] cfq: collapse cfq.h into cfq-iosched.c Tejun Heo
2012-03-28 22:51 ` [PATCH 13/21] blkcg: move statistics update code to policies Tejun Heo
2012-03-28 22:51 ` [PATCH 14/21] blkcg: cfq doesn't need per-cpu dispatch stats Tejun Heo
2012-03-28 22:51 ` [PATCH 15/21] blkcg: add blkio_policy_ops operations for exit and stat reset Tejun Heo
2012-03-28 22:51 ` [PATCH 16/21] blkcg: move blkio_group_stats to cfq-iosched.c Tejun Heo
2012-03-28 22:51 ` [PATCH 17/21] blkcg: move blkio_group_stats_cpu and friends to blk-throttle.c Tejun Heo
2012-03-28 22:51 ` [PATCH 18/21] blkcg: move blkio_group_conf->weight to cfq Tejun Heo
2012-04-01 21:09   ` Vivek Goyal
2012-04-01 21:22     ` Tejun Heo
2012-04-02 21:39   ` Tao Ma
2012-04-02 21:49     ` Tejun Heo
2012-04-02 22:03       ` Tao Ma
2012-04-02 22:17         ` Tejun Heo
2012-04-02 22:20           ` Tao Ma
2012-04-02 22:25             ` Vivek Goyal
2012-04-02 22:28               ` Tejun Heo
2012-04-02 22:41               ` Tao Ma
2012-04-03 15:37                 ` IOPS based scheduler (Was: Re: [PATCH 18/21] blkcg: move blkio_group_conf->weight to cfq) Vivek Goyal
2012-04-03 16:36                   ` Tao Ma
2012-04-03 16:50                     ` Vivek Goyal
2012-04-03 17:26                       ` Tao Ma
2012-04-04 12:35                         ` Shaohua Li
2012-04-04 13:37                           ` Vivek Goyal
2012-04-04 14:52                             ` Shaohua Li
2012-04-04 15:10                               ` Vivek Goyal
2012-04-04 16:06                                 ` Tao Ma
2012-04-04 16:45                             ` Tao Ma
2012-04-04 16:50                               ` Vivek Goyal
2012-04-04 17:17                                 ` Vivek Goyal
2012-04-04 17:18                                 ` Tao Ma
2012-04-04 17:27                                   ` Vivek Goyal
2012-04-04 18:22                                   ` Vivek Goyal
2012-04-04 18:36                                     ` Tao Ma
2012-04-04 13:31                         ` Vivek Goyal
2012-03-28 22:51 ` [PATCH 19/21] blkcg: move blkio_group_conf->iops and ->bps to blk-throttle Tejun Heo
2012-03-28 22:51 ` [PATCH 20/21] blkcg: pass around pd->pdata instead of pd itself in prfill functions Tejun Heo
2012-03-28 22:51 ` [PATCH 21/21] blkcg: drop BLKCG_STAT_{PRIV|POL|OFF} macros Tejun Heo
2012-03-29  8:18 ` [PATCHSET] block: modularize blkcg config and stat file handling Jens Axboe
2012-04-02 20:02   ` Tejun Heo
2012-04-02 21:51     ` Jens Axboe
2012-04-02 22:33       ` Tejun Heo
2012-04-01 19:38 ` Vivek Goyal
2012-04-01 21:42 ` Tejun Heo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).