* [PATCH RFC] blkcg: prepare blkcg knobs for default hierarchy
@ 2014-04-13  1:32 ` Tejun Heo
  0 siblings, 0 replies; 22+ messages in thread
From: Tejun Heo @ 2014-04-13  1:32 UTC (permalink / raw)
  To: Vivek Goyal, Jens Axboe; +Cc: linux-kernel, Li Zefan, cgroups

Hello,

Unified hierarchy has finally been posted.

  http://thread.gmane.org/gmane.linux.kernel.containers/27601

It took a lot longer than I originally anticipated, and over the course
of it quite a few aspects of the initial design have changed, hopefully
for the better.  One of the areas which has seen major redirection is
internal tasks competing against child cgroups.  The original plan was
implementing an implicit leaf node for each cgroup so that internal
tasks compete as an implicit child against other children.  cfq was
chosen as the first one to try the strategy and the result was
leaf_weight[_device] knobs.  This, unfortunately, doesn't seem to be
the right direction.

The strategy complicates the controller interface and implementation,
and ends up adding an extra layer of nesting at leaves.  Also,
initially, it was thought that internal tasks are problematic only for
weight based controllers; however, it turns out it also is a problem
for absolute limit based ones like memory, which means that we'd need
to complicate more controllers.

As the problem is something fundamental in most resource controllers,
the implemented unified hierarchy now enforces a structural constraint
which prevents competition between internal tasks and child cgroups from
the get-go, making the leaf_weight[_device] mechanism unnecessary.

In addition, the blkio interface is generally a bit messy and all cfq
knobs are missing the "cfq." prefix.  As unified hierarchy involves
noticeable changes in usage, this patch takes the chance and makes
blkio present a more consistent and concise interface on the unified
hierarchy.

I'm currently preparing a doc describing the design and rationales of
unified hierarchy and most of the above information will be present in
the documentation.

Thanks.

------ 8< ------
Some blkcg knobs don't make sense on unified hierarchy.  This patch
makes the following changes for unified hierarchy.

* reset_stats never made much sense outside development and
  debugging.  Omit it.

* As unified hierarchy excludes internal tasks competing against child
  cgroups, cfq's leaf_weight[_device] knobs are no longer necessary.
  Omit it.

* For cfq, weight[_device] behave inherently differently from the same
  knobs on !root cgroups as root cgroup is treated as parallel to
  immediate children.  Let's omit it for now.

* cfq's time stats reports the total time slice consumed but its unit
  is in jiffies and it's not reported per-device like all other stats.
  Omit it for now.  If slice time is actually necessary, let's add it
  in the form consistent with other stats.

* cfq's io_queued reporting is different from all other stats in that
  it's a snapshot value rather than accumulated value and is mostly
  meaningful only for debugging.  Omit it.

* As unified hierarchy doesn't allow internal tasks, non-recursive
  stats are largely irrelevant.  Omit them.

* cfq adds a number of stats knobs if CONFIG_DEBUG_BLK_CGROUP.  If
  these are actually for debugging, they shouldn't show up by default
  (ie. should be gated by kernel param or something).  If userland
  already developed dependency on them, we need to take them out of
  CONFIG_DEBUG_BLK_CGROUP.  Skip them for now.  We can add back the
  relevant ones later.

* Prefix the remaining cfq knobs with "cfq." and make the names
  consistent.

  weight_device			-> cfq.weight_device
  weight			-> cfq.weight
  sectors_recursive		-> cfq.sectors
  io_service_bytes_recursive	-> cfq.bytes
  io_serviced_recursive		-> cfq.serviced
  io_merged_recursive		-> cfq.merged
  io_service_time_recursive	-> cfq.io_time
  io_wait_time_recursive	-> cfq.wait_time

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
---
 block/blk-cgroup.c  |    1 
 block/cfq-iosched.c |   89 +++++++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 86 insertions(+), 4 deletions(-)

--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -761,6 +761,7 @@ EXPORT_SYMBOL_GPL(blkg_conf_finish);
 struct cftype blkcg_files[] = {
 	{
 		.name = "reset_stats",
+		.flags = CFTYPE_INSANE,
 		.write_u64 = blkcg_reset_stats,
 	},
 	{ }	/* terminate */
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1835,13 +1835,13 @@ static struct cftype cfq_blkcg_files[] =
 	/* on root, weight is mapped to leaf_weight */
 	{
 		.name = "weight_device",
-		.flags = CFTYPE_ONLY_ON_ROOT,
+		.flags = CFTYPE_INSANE | CFTYPE_ONLY_ON_ROOT,
 		.seq_show = cfqg_print_leaf_weight_device,
 		.write_string = cfqg_set_leaf_weight_device,
 	},
 	{
 		.name = "weight",
-		.flags = CFTYPE_ONLY_ON_ROOT,
+		.flags = CFTYPE_INSANE | CFTYPE_ONLY_ON_ROOT,
 		.seq_show = cfq_print_leaf_weight,
 		.write_u64 = cfq_set_leaf_weight,
 	},
@@ -1849,24 +1849,26 @@ static struct cftype cfq_blkcg_files[] =
 	/* no such mapping necessary for !roots */
 	{
 		.name = "weight_device",
-		.flags = CFTYPE_NOT_ON_ROOT,
+		.flags = CFTYPE_INSANE | CFTYPE_NOT_ON_ROOT,
 		.seq_show = cfqg_print_weight_device,
 		.write_string = cfqg_set_weight_device,
 	},
 	{
 		.name = "weight",
-		.flags = CFTYPE_NOT_ON_ROOT,
+		.flags = CFTYPE_INSANE | CFTYPE_NOT_ON_ROOT,
 		.seq_show = cfq_print_weight,
 		.write_u64 = cfq_set_weight,
 	},
 
 	{
 		.name = "leaf_weight_device",
+		.flags = CFTYPE_INSANE,
 		.seq_show = cfqg_print_leaf_weight_device,
 		.write_string = cfqg_set_leaf_weight_device,
 	},
 	{
 		.name = "leaf_weight",
+		.flags = CFTYPE_INSANE,
 		.seq_show = cfq_print_leaf_weight,
 		.write_u64 = cfq_set_leaf_weight,
 	},
@@ -1874,41 +1876,49 @@ static struct cftype cfq_blkcg_files[] =
 	/* statistics, covers only the tasks in the cfqg */
 	{
 		.name = "time",
+		.flags = CFTYPE_INSANE,
 		.private = offsetof(struct cfq_group, stats.time),
 		.seq_show = cfqg_print_stat,
 	},
 	{
 		.name = "sectors",
+		.flags = CFTYPE_INSANE,
 		.private = offsetof(struct cfq_group, stats.sectors),
 		.seq_show = cfqg_print_stat,
 	},
 	{
 		.name = "io_service_bytes",
+		.flags = CFTYPE_INSANE,
 		.private = offsetof(struct cfq_group, stats.service_bytes),
 		.seq_show = cfqg_print_rwstat,
 	},
 	{
 		.name = "io_serviced",
+		.flags = CFTYPE_INSANE,
 		.private = offsetof(struct cfq_group, stats.serviced),
 		.seq_show = cfqg_print_rwstat,
 	},
 	{
 		.name = "io_service_time",
+		.flags = CFTYPE_INSANE,
 		.private = offsetof(struct cfq_group, stats.service_time),
 		.seq_show = cfqg_print_rwstat,
 	},
 	{
 		.name = "io_wait_time",
+		.flags = CFTYPE_INSANE,
 		.private = offsetof(struct cfq_group, stats.wait_time),
 		.seq_show = cfqg_print_rwstat,
 	},
 	{
 		.name = "io_merged",
+		.flags = CFTYPE_INSANE,
 		.private = offsetof(struct cfq_group, stats.merged),
 		.seq_show = cfqg_print_rwstat,
 	},
 	{
 		.name = "io_queued",
+		.flags = CFTYPE_INSANE,
 		.private = offsetof(struct cfq_group, stats.queued),
 		.seq_show = cfqg_print_rwstat,
 	},
@@ -1916,75 +1926,146 @@ static struct cftype cfq_blkcg_files[] =
 	/* the same statictics which cover the cfqg and its descendants */
 	{
 		.name = "time_recursive",
+		.flags = CFTYPE_INSANE,
 		.private = offsetof(struct cfq_group, stats.time),
 		.seq_show = cfqg_print_stat_recursive,
 	},
 	{
 		.name = "sectors_recursive",
+		.flags = CFTYPE_INSANE,
 		.private = offsetof(struct cfq_group, stats.sectors),
 		.seq_show = cfqg_print_stat_recursive,
 	},
 	{
 		.name = "io_service_bytes_recursive",
+		.flags = CFTYPE_INSANE,
 		.private = offsetof(struct cfq_group, stats.service_bytes),
 		.seq_show = cfqg_print_rwstat_recursive,
 	},
 	{
 		.name = "io_serviced_recursive",
+		.flags = CFTYPE_INSANE,
 		.private = offsetof(struct cfq_group, stats.serviced),
 		.seq_show = cfqg_print_rwstat_recursive,
 	},
 	{
 		.name = "io_service_time_recursive",
+		.flags = CFTYPE_INSANE,
 		.private = offsetof(struct cfq_group, stats.service_time),
 		.seq_show = cfqg_print_rwstat_recursive,
 	},
 	{
 		.name = "io_wait_time_recursive",
+		.flags = CFTYPE_INSANE,
 		.private = offsetof(struct cfq_group, stats.wait_time),
 		.seq_show = cfqg_print_rwstat_recursive,
 	},
 	{
 		.name = "io_merged_recursive",
+		.flags = CFTYPE_INSANE,
 		.private = offsetof(struct cfq_group, stats.merged),
 		.seq_show = cfqg_print_rwstat_recursive,
 	},
 	{
 		.name = "io_queued_recursive",
+		.flags = CFTYPE_INSANE,
 		.private = offsetof(struct cfq_group, stats.queued),
 		.seq_show = cfqg_print_rwstat_recursive,
 	},
 #ifdef CONFIG_DEBUG_BLK_CGROUP
 	{
 		.name = "avg_queue_size",
+		.flags = CFTYPE_INSANE,
 		.seq_show = cfqg_print_avg_queue_size,
 	},
 	{
 		.name = "group_wait_time",
+		.flags = CFTYPE_INSANE,
 		.private = offsetof(struct cfq_group, stats.group_wait_time),
 		.seq_show = cfqg_print_stat,
 	},
 	{
 		.name = "idle_time",
+		.flags = CFTYPE_INSANE,
 		.private = offsetof(struct cfq_group, stats.idle_time),
 		.seq_show = cfqg_print_stat,
 	},
 	{
 		.name = "empty_time",
+		.flags = CFTYPE_INSANE,
 		.private = offsetof(struct cfq_group, stats.empty_time),
 		.seq_show = cfqg_print_stat,
 	},
 	{
 		.name = "dequeue",
+		.flags = CFTYPE_INSANE,
 		.private = offsetof(struct cfq_group, stats.dequeue),
 		.seq_show = cfqg_print_stat,
 	},
 	{
 		.name = "unaccounted_time",
+		.flags = CFTYPE_INSANE,
 		.private = offsetof(struct cfq_group, stats.unaccounted_time),
 		.seq_show = cfqg_print_stat,
 	},
 #endif	/* CONFIG_DEBUG_BLK_CGROUP */
+
+	/* files for default hierarchy, properly prefixed with cfq */
+	{
+		.name = "cfq.weight_device",
+		.flags = CFTYPE_ONLY_ON_DFL | CFTYPE_NOT_ON_ROOT,
+		.seq_show = cfqg_print_weight_device,
+		.write_string = cfqg_set_weight_device,
+	},
+	{
+		.name = "cfq.weight",
+		.flags = CFTYPE_ONLY_ON_DFL | CFTYPE_NOT_ON_ROOT,
+		.seq_show = cfq_print_weight,
+		.write_u64 = cfq_set_weight,
+	},
+
+	/*
+	 * All stats are recursive and fewer are visible.  Please do not
+	 * add stats which aren't strictly necessary or expose internal
+	 * details.
+	 */
+	{
+		.name = "cfq.sectors",
+		.flags = CFTYPE_ONLY_ON_DFL,
+		.private = offsetof(struct cfq_group, stats.sectors),
+		.seq_show = cfqg_print_stat_recursive,
+	},
+	{
+		.name = "cfq.bytes",
+		.flags = CFTYPE_ONLY_ON_DFL,
+		.private = offsetof(struct cfq_group, stats.service_bytes),
+		.seq_show = cfqg_print_rwstat_recursive,
+	},
+	{
+		.name = "cfq.serviced",
+		.flags = CFTYPE_ONLY_ON_DFL,
+		.private = offsetof(struct cfq_group, stats.serviced),
+		.seq_show = cfqg_print_rwstat_recursive,
+	},
+	{
+		.name = "cfq.merged",
+		.flags = CFTYPE_ONLY_ON_DFL,
+		.private = offsetof(struct cfq_group, stats.merged),
+		.seq_show = cfqg_print_rwstat_recursive,
+	},
+	{
+		.name = "cfq.io_time",
+		.flags = CFTYPE_ONLY_ON_DFL,
+		.private = offsetof(struct cfq_group, stats.service_time),
+		.seq_show = cfqg_print_rwstat_recursive,
+	},
+	{
+		.name = "cfq.wait_time",
+		.flags = CFTYPE_ONLY_ON_DFL,
+		.private = offsetof(struct cfq_group, stats.wait_time),
+		.seq_show = cfqg_print_rwstat_recursive,
+	},
+
 	{ }	/* terminate */
 };
 #else /* GROUP_IOSCHED */
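
For reviewers who haven't followed the cgroup core side of this: the
flags added above are what do the actual gating.  Cgroup core walks each
cftype array and skips entries whose flags don't match the hierarchy the
files are being created on.  The following is only a paraphrase of that
per-file decision, not the actual cgroup core code, and the helper names
may differ in the unified hierarchy series:

        /*
         * Sketch: should the control file described by @cft be created
         * in @cgrp?
         */
        static bool cft_visible(struct cftype *cft, struct cgroup *cgrp)
        {
                /* CFTYPE_INSANE knobs disappear once sane behavior is used */
                if ((cft->flags & CFTYPE_INSANE) && cgroup_sane_behavior(cgrp))
                        return false;
                /* "cfq."-prefixed knobs exist only on the default hierarchy */
                if ((cft->flags & CFTYPE_ONLY_ON_DFL) && !cgroup_on_dfl(cgrp))
                        return false;
                /* e.g. cfq.weight is not created in the root cgroup */
                if ((cft->flags & CFTYPE_NOT_ON_ROOT) && !cgrp->parent)
                        return false;
                if ((cft->flags & CFTYPE_ONLY_ON_ROOT) && cgrp->parent)
                        return false;
                return true;
        }

The net effect is that a legacy blkio hierarchy keeps all of the
existing knobs while the default hierarchy only sees the cfq.-prefixed
subset declared at the end of cfq_blkcg_files[].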

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH RFC] blkcg: prepare blkcg knobs for default hierarchy
  2014-04-13  1:32 ` Tejun Heo
  (?)
@ 2014-04-14 18:08 ` Vivek Goyal
  2014-04-14 19:32   ` Tejun Heo
  -1 siblings, 1 reply; 22+ messages in thread
From: Vivek Goyal @ 2014-04-14 18:08 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Jens Axboe, linux-kernel, Li Zefan, cgroups

On Sat, Apr 12, 2014 at 09:32:09PM -0400, Tejun Heo wrote:
> Hello,
> 
> Unified hierarchy has finally been posted.
> 
>   http://thread.gmane.org/gmane.linux.kernel.containers/27601
> 
> It took a lot longer than I originally anticipated, and over the course
> of it quite a few aspects of the initial design have changed, hopefully
> for the better.  One of the areas which has seen major redirection is
> internal tasks competing against child cgroups.  The original plan was
> implementing an implicit leaf node for each cgroup so that internal
> tasks compete as an implicit child against other children.  cfq was
> chosen as the first one to try the strategy and the result was
> leaf_weight[_device] knobs.  This, unfortunately, doesn't seem to be
> the right direction.
> 
> The strategy complicates the controller interface and implementation,
> and ends up adding an extra layer of nesting at leaves.  Also,
> initially, it was thought that internal tasks are problematic only for
> weight based controllers; however, it turns out it also is a problem
> for absolute limit based ones like memory, which means that we'd need
> to complicate more controllers.
> 
> As the problem is something fundamental in most resource controllers,
> the implemented unified hierarchy now enforces a structural constraint
> which prevents competition between internal tasks and child cgroups from
> the get-go, making the leaf_weight[_device] mechanism unnecessary.
> 
> In addition, the blkio interface is generally a bit messy and all cfq
> knobs are missing the "cfq." prefix.  As unified hierarchy involves
> noticeable changes in usage, this patch takes the chance and makes
> blkio present a more consistent and concise interface on the unified
> hierarchy.
> 
> I'm currently preparing a doc describing the design and rationales of
> unified hierarchy and most of the above information will be present in
> the documentation.
> 
> Thanks.

Hi Tejun,

Can you please also update Documentation/block/cfq-iosched.txt and
Documentation/cgroup/blkio-controller.txt to reflect these new
changes.

So now we have three modes?

- Original multi hierarchy mode
- Multi hierarchy with sane flag
	- Sane flag modifies behavior of throttling.
- Unified hierarchy
	- Changes tunables.
> 
> ------ 8< ------
> Some blkcg knobs don't make sense on unified hierarchy.  This patch
> makes the following changes for unified hierarchy.

Looks like some of the changes are related to the sane flag and others
are related to unified hierarchy. Will it make sense to break it down
into two patches?

> 
> * reset_stats never made much sense outside development and
>   debugging.  Omit it.
> 

Agreed. Let's get rid of it.

> * As unified hierarchy excludes internal tasks competing against child
>   cgroups, cfq's leaf_weight[_device] knobs are no longer necessary.
>   Omit it.

Agreed. All this leaf_weight logic was a bit of magic. Happy to see it go.

> 
> * For cfq, weight[_device] behave inherently differently from the same
>   knobs on !root cgroups as root cgroup is treated as parallel to
>   immediate children.  Let's omit it for now.

So now the root cgroup will not have any cfq weight[_device] knobs but
non-root cgroups will have one? What's the harm in providing one for
root too, for the sake of uniformity? It is a different matter that root
does not have a competitor, so the value in that knob does not matter.

> 
> * cfq's time stats reports the total time slice consumed but its unit
>   is in jiffies and it's not reported per-device like all other stats.
>   Omit it for now.  If slice time is actually necessary, let's add it
>   in the form consistent with other stats.

To me this knob is also for debugging purposes. CFQ divides disk time
proportionally. And if one wants to see how disk time is being divided
among cgroups, this knob helps. This is more to verify whether our
algorithm is working fine or not and if disk time is being divided
in proportion to weights or not.

But again, I am not particular about this knob. Sounds more like
debugging stuff.

> 
> * cfq's io_queued reporting is different from all other stats in that
>   it's a snapshot value rather than accumulated value and is mostly
>   meaningful only for debugging.  Omit it.

io_queued shows the number of IOs currently queued in the cgroup. So this
was never supposed to be an accumulated value. An accumulated value would
not make sense here, I think.

Google folks wanted all these knobs. And they said that they have a lot
more knobs which they carry internally.

Personally I think io_queued might be a useful knob because it can
show current backlog on a particular cgroup and some tool might want
to dynamically monitor the queue lengths of various cgroups and adjust
weights dynamically.

So this one might be worth retaining.
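
Just to illustrate the kind of tool I mean - a userspace sketch only,
with a made-up threshold and policy, and assuming the usual rwstat
output layout where the last line is "Total <n>":

        #include <stdio.h>

        /* sum of currently queued IOs for one cgroup directory, -1 on error */
        static long read_io_queued(const char *cgdir)
        {
                char path[512], line[128];
                long total = -1;
                FILE *f;

                snprintf(path, sizeof(path), "%s/blkio.io_queued", cgdir);
                f = fopen(path, "r");
                if (!f)
                        return -1;
                while (fgets(line, sizeof(line), f))
                        sscanf(line, "Total %ld", &total);
                fclose(f);
                return total;
        }

        static int set_weight(const char *cgdir, int weight)
        {
                char path[512];
                FILE *f;

                snprintf(path, sizeof(path), "%s/blkio.weight", cgdir);
                f = fopen(path, "w");
                if (!f)
                        return -1;
                fprintf(f, "%d\n", weight);
                return fclose(f);
        }

        int main(int argc, char **argv)
        {
                const char *cgdir = argc > 1 ? argv[1] : ".";
                long queued = read_io_queued(cgdir);

                /* toy policy: bump the weight when the backlog gets deep */
                if (queued > 100)
                        return set_weight(cgdir, 800);
                return 0;
        }

Nothing fancy, but it shows why a snapshot value can be useful to a
monitoring daemon and not just to a kernel developer.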

> 
> * As unified hierarchy doesn't allow internal tasks, non-recursive
>   stats are largely irrelevant.  Omit them.

Makes sense. So this is more of a unified hierarchy related cleanup.

> 
> * cfq adds a number of stats knobs if CONFIG_DEBUG_BLK_CGROUP.  If
>   these are actually for debugging, they shouldn't show up by default
>   (ie. should be gated by kernel param or something).  If userland
>   already developed dependency on them, we need to take them out of
>   CONFIG_DEBUG_BLK_CGROUP.  Skip them for now.  We can add back the
>   relevant ones later.

I am fine with this. Again Google folks introduced those. Not sure
if they still use those. Less stats are better.

> 
> * Prefix the remaining cfq knobs with "cfq." and make the names
>   consistent.

Looks good. This makes it consistent with blkio.throttle.* knobs.

> 
>   weight_device			-> cfq.weight_device
>   weight			-> cfq.weight
>   sectors_recursive		-> cfq.sectors
>   io_service_bytes_recursive	-> cfq.bytes

throttling has a corresponding knob, throttle.io_service_bytes. So either
we can change the name of the throttling knob to throttle.bytes or
keep this one as it is (cfq.io_service_bytes) for the sake of
consistency.

>   io_serviced_recursive		-> cfq.serviced

throttling layer uses throttle.io_serviced.

Given the fact that the throttling layer counts bios and CFQ counts requests,
maybe we can keep throttle.bio_serviced and cfq.rq_serviced as names.

Or maybe just throttle.bios and cfq.requests respectively.

>   io_merged_recursive		-> cfq.merged
>   io_service_time_recursive	-> cfq.io_time
>   io_wait_time_recursive	-> cfq.wait_time
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Vivek Goyal <vgoyal@redhat.com>
> Cc: Jens Axboe <axboe@kernel.dk>
> ---
>  block/blk-cgroup.c  |    1 
>  block/cfq-iosched.c |   89 +++++++++++++++++++++++++++++++++++++++++++++++++---
>  2 files changed, 86 insertions(+), 4 deletions(-)
> 
> --- a/block/blk-cgroup.c
> +++ b/block/blk-cgroup.c
> @@ -761,6 +761,7 @@ EXPORT_SYMBOL_GPL(blkg_conf_finish);
>  struct cftype blkcg_files[] = {
>  	{
>  		.name = "reset_stats",
> +		.flags = CFTYPE_INSANE,
>  		.write_u64 = blkcg_reset_stats,

So this knob will not be user visible with unified hierarchy as well as with
sane behavior?

So this looks like a cleanup independent of unified hierarchy.

[..]
> +	/* files for default hierarchy, properly prefixed with cfq */
> +	{
> +		.name = "cfq.weight_device",
> +		.flags = CFTYPE_ONLY_ON_DFL | CFTYPE_NOT_ON_ROOT,
> +		.seq_show = cfqg_print_weight_device,
> +		.write_string = cfqg_set_weight_device,
> +	},
> +	{
> +		.name = "cfq.weight",
> +		.flags = CFTYPE_ONLY_ON_DFL | CFTYPE_NOT_ON_ROOT,
> +		.seq_show = cfq_print_weight,
> +		.write_u64 = cfq_set_weight,
> +	},
> +

What's the harm in providing these knobs for root too?

Thanks
Vivek

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH RFC] blkcg: prepare blkcg knobs for default hierarchy
  2014-04-14 18:08 ` Vivek Goyal
@ 2014-04-14 19:32   ` Tejun Heo
  2014-04-15 13:53       ` Vivek Goyal
  0 siblings, 1 reply; 22+ messages in thread
From: Tejun Heo @ 2014-04-14 19:32 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: Jens Axboe, linux-kernel, Li Zefan, cgroups

Hello,

On Mon, Apr 14, 2014 at 02:08:24PM -0400, Vivek Goyal wrote:
> Can you please also update Documentation/block/cfq-iosched.txt and
> Documentation/cgroup/blkio-controller.txt to reflect these new
> changes.

Sure thing.

> So now we have three modes?
> 
> - Original multi hierarchy mode
> - Multi hierarchy with sane flag
> 	- Sane flag modifies behavior of throttling.
> - Unified hierarchy
> 	- Changes tunables.

No, we don't.  __DEVEL__sane_behavior is a tool to implement unified
hierarchy.  Once unified hierarchy lands, sane_behavior will be folded
into unified hierarchy.

> Looks like some of the changes are related to the sane flag and others
> are related to unified hierarchy. Will it make sense to break it down
> into two patches?

So, they're one and the same.

> > * For cfq, weight[_device] behave inherently differently from the same
> >   knobs on !root cgroups as root cgroup is treated as parallel to
> >   immediate children.  Let's omit it for now.
> 
> So now the root cgroup will not have any cfq weight[_device] knobs but
> non-root cgroups will have one? What's the harm in providing one for
> root too, for the sake of uniformity? It is a different matter that root
> does not have a competitor, so the value in that knob does not matter.

Please see at the end of the mail.

> > * cfq's time stats reports the total time slice consumed but its unit
> >   is in jiffies and it's not reported per-device like all other stats.
> >   Omit it for now.  If slice time is actually necessary, let's add it
> >   in the form consistent with other stats.
> 
> To me this knob is also for debugging purposes. CFQ divides disk time
> proportionally. And if one wants to see how disk time is being divided
> among cgroups, this knob helps. This is more to verify whether our
> algorithm is working fine or not and if disk time is being divided
> in proportion to weights or not.
> 
> But again, I am not particular about this knob. Sounds more like
> debugging stuff.

If it is a debug knob, let's please require it to be enabled
explicitly - ie. require a boot param which *clearly* indicates that
this is a debug option; otherwise, we end up leaking internal details
w/o actually intending or designing for it and userland is likely to
grow dependency on it.
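
To be concrete, the kind of gating I mean is something along these
lines - purely illustrative, the parameter name and the separate cftype
array are made up, not something that exists today:

        /* keep the debug-only stat files in their own cftype array */
        static struct cftype cfq_blkcg_debug_files[] = {
                {
                        .name = "avg_queue_size",
                        .seq_show = cfqg_print_avg_queue_size,
                },
                /* ... group_wait_time, idle_time, empty_time, ... */
                { }     /* terminate */
        };

        /* off by default, switched on only from the kernel command line */
        static bool cfq_debug_stats __read_mostly;

        static int __init cfq_debug_stats_setup(char *str)
        {
                cfq_debug_stats = true;
                return 1;
        }
        __setup("cfq_debug_stats", cfq_debug_stats_setup);

and then register cfq_blkcg_debug_files only when cfq_debug_stats is
set, so the knobs never show up unless someone explicitly asked for
debug output on boot.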

> Google folks wanted all these knobs. And they said that they have a lot
> more knobs which they carry internally.

AFAIK, google folks forked cfq a couple years ago, I'm not sure how
much this will affect them.

> Personally I think io_queued might be a useful knob because it can
> show current backlog on a particular cgroup and some tool might want
> to dynamically monitor the queue lengths of various cgroups and adjust
> weights dynamically.
> 
> So this one might be worth retaining.

Can we do that once we know for sure that this is actually necessary?
I'm quite skeptical that this sort of snapshot stats are actually
useful except for debugging.

> > * cfq adds a number of stats knobs if CONFIG_DEBUG_BLK_CGROUP.  If
> >   these are actually for debugging, they shouldn't show up by default
> >   (ie. should be gated by kernel param or something).  If userland
> >   already developed dependency on them, we need to take them out of
> >   CONFIG_DEBUG_BLK_CGROUP.  Skip them for now.  We can add back the
> >   relevant ones later.
> 
> > I am fine with this. Again Google folks introduced those. Not sure
> if they still use those. Less stats are better.

The same thing with above.  If these are really for debugging, let's
please hide them better.

> >   weight_device			-> cfq.weight_device
> >   weight			-> cfq.weight
> >   sectors_recursive		-> cfq.sectors
> >   io_service_bytes_recursive	-> cfq.bytes
> 
> > throttling has a corresponding knob, throttle.io_service_bytes. So either
> > we can change the name of the throttling knob to throttle.bytes or
> > keep this one as it is (cfq.io_service_bytes) for the sake of
> > consistency.

Yeah, thought about keeping them consistent but would that really
matter?  Having internal consistency in cfq seems more important to
me.

> >   io_serviced_recursive		-> cfq.serviced
> 
> throttling layer uses throttle.io_serviced.

Ditto.

> Given the fact that the throttling layer counts bios and CFQ counts requests,
> maybe we can keep throttle.bio_serviced and cfq.rq_serviced as names.
> 
> Or maybe just throttle.bios and cfq.requests respectively.

I'm not sure whether we want to rename throttle knobs.  Renaming cfq
knobs makes sense as they need to be renamed for cfq. prefix anyway
but throttle knobs are already mostly fine, so wouldn't it make sense
to leave them as they are?  It makes things a bit inconsistent between
cfq and throttle but do we really care?

> > @@ -761,6 +761,7 @@ EXPORT_SYMBOL_GPL(blkg_conf_finish);
> >  struct cftype blkcg_files[] = {
> >  	{
> >  		.name = "reset_stats",
> > +		.flags = CFTYPE_INSANE,
> >  		.write_u64 = blkcg_reset_stats,
> 
> So this knob will not be user visible with unified hierarchy as well as with
> sane behavior?
> 
> So this looks like a cleanup independent of unified hierarchy.

Again, sane_behavior is a subset of unified hierarchy and will be
absorbed into unified hierarchy.  No need to make the distinction.

> > +	/* files for default hierarchy, properly prefixed with cfq */
> > +	{
> > +		.name = "cfq.weight_device",
> > +		.flags = CFTYPE_ONLY_ON_DFL | CFTYPE_NOT_ON_ROOT,
> > +		.seq_show = cfqg_print_weight_device,
> > +		.write_string = cfqg_set_weight_device,
> > +	},
> > +	{
> > +		.name = "cfq.weight",
> > +		.flags = CFTYPE_ONLY_ON_DFL | CFTYPE_NOT_ON_ROOT,
> > +		.seq_show = cfq_print_weight,
> > +		.write_u64 = cfq_set_weight,
> > +	},
> > +
> 
> What's the harm in providing these knobs for root too?

Because I'm not sure about the benefits and it makes cfq behave
differently from other controllers.  root cgroup behavior probably has
to be special per controller so there's nothing fundamentally wrong
with that.  I just am not sure whether this actually adds something
significantly beneficial and it's kinda silly to commit to things when
not really necessary.  Adding things later is easy, removing is not,
so...

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH RFC] blkcg: prepare blkcg knobs for default hierarchy
@ 2014-04-15 13:53       ` Vivek Goyal
  0 siblings, 0 replies; 22+ messages in thread
From: Vivek Goyal @ 2014-04-15 13:53 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Jens Axboe, linux-kernel, Li Zefan, cgroups

On Mon, Apr 14, 2014 at 03:32:14PM -0400, Tejun Heo wrote:

[..]
> > So now we have three modes?
> > 
> > - Original multi hierarchy mode
> > - Multi hierarchy with sane flag
> > 	- Sane flag modifies behavior of throttling.
> > - Unified hierarchy
> > 	- Changes tunables.
> 
> No, we don't.  __DEVEL__sane_behavior is a tool to implement unified
> hierarchy.  Once unified hierarchy lands, sane_behavior will be folded
> into unified hierarchy.

Ok, got it. So unified hierarchy will co-exist with other hierarchies
in the system.

I somehow had the impression that one would either have the unified
hierarchy or multiple hierarchies, but couldn't have both.

But it looks like you are targeting that one can have multiple
hierarchies. One of those will be the unified hierarchy with the new
constraints. Other hierarchies can be old-type hierarchies. Do I
understand it right?

> 
> > Looks like some of the changes are related to the sane flag and others
> > are related to unified hierarchy. Will it make sense to break it down
> > into two patches?
> 
> So, they're one and the same.
> 
> > > * For cfq, weight[_device] behave inherently differently from the same
> > >   knobs on !root cgroups as root cgroup is treated as parallel to
> > >   immediate children.  Let's omit it for now.
> > 
> > So now the root cgroup will not have any cfq weight[_device] knobs but
> > non-root cgroups will have one? What's the harm in providing one for
> > root too, for the sake of uniformity? It is a different matter that root
> > does not have a competitor, so the value in that knob does not matter.
> 
> Please see at the end of the mail.
> 
> > > * cfq's time stats reports the total time slice consumed but its unit
> > >   is in jiffies and it's not reported per-device like all other stats.
> > >   Omit it for now.  If slice time is actually necessary, let's add it
> > >   in the form consistent with other stats.
> > 
> > To me this knob is also for debugging purposes. CFQ divides disk time
> > proportionally. And if one wants to see how disk time is being divided
> > among cgroups, this knob helps. This is more to verify whether our
> > algorithm is working fine or not and if disk time is being divided
> > in proportion to weights or not.
> > 
> > But again, I am not particular about this knob. Sounds more like
> > debugging stuff.
> 
> If it is a debug knob, let's please require it to be enabled
> explicitly - ie. require a boot param which *clearly* indicates that
> this is a debug option; otherwise, we end up leaking internal details
> w/o actually intending or designing for it and userland is likely to
> grow dependency on it.

So you want all the debug knobs to be gated by the DEBUG_BLK_CGROUP option
as well as by a kernel command line parameter? But I thought that in general
people are not supposed to set CONFIG_DEBUG_BLK_CGROUP=y. It brings in
unnecessary overhead. Only those who are doing development and trying to
debug things would enable the above config option.

But anyway, I am fine with hiding these files behind a command line
parameter too.

> 
> > Google folks wanted all these knobs. And they said that they have a lot
> > more knobs which they carry internally.
> 
> AFAIK, google folks forked cfq a couple years ago, I'm not sure how
> much this will affect them.
> 
> > Personally I think io_queued might be a useful knob because it can
> > show current backlog on a particular cgroup and some tool might want
> > to dynamically monitor the queue lengths of various cgroups and adjust
> > weights dynamically.
> > 
> > So this one might be worth retaining.
> 
> Can we do that once we know for sure that this is actually necessary?
> I'm quite skeptical that this sort of snapshot stats are actually
> useful except for debugging.

Ok, that's fine. We can introduce it once somebody really needs it.

> 
> > > * cfq adds a number of stats knobs if CONFIG_DEBUG_BLK_CGROUP.  If
> > >   these are actually for debugging, they shouldn't show up by default
> > >   (ie. should be gated by kernel param or something).  If userland
> > >   already developed dependency on them, we need to take them out of
> > >   CONFIG_DEBUG_BLK_CGROUP.  Skip them for now.  We can add back the
> > >   relevant ones later.
> > 
> > I am fine with this. Again Google folks introduced those. Not sure
> > if they still use those. Less stats are better.
> 
> The same thing with above.  If these are really for debugging, let's
> please hide them better.
> 
> > >   weight_device			-> cfq.weight_device
> > >   weight			-> cfq.weight
> > >   sectors_recursive		-> cfq.sectors
> > >   io_service_bytes_recursive	-> cfq.bytes
> > 
> > throttling has a corresponding knob, throttle.io_service_bytes. So either
> > we can change the name of the throttling knob to throttle.bytes or
> > keep this one as it is (cfq.io_service_bytes) for the sake of
> > consistency.
> 
> Yeah, thought about keeping them consistent but would that really
> matter?  Having internal consistency in cfq seems more important to
> me.

I think from a user's perspective it really matters. If I see two knobs
in a cgroup, say throttle.io_service_bytes and cfq.io_service_bytes, it
is much more intuitive.

So I feel that we should aim for having consistent names in CFQ and
throttling. If one understands one knob at one layer, it is intuitive
to figure out what same knob means at other layer.

And, thankfully, we have only two stats at throttling layer.

blkio.throttle.io_serviced
blkio.throttle.io_service_bytes

Though the names are a little longer, they look fine to me. They are
intuitive. So I can't see a strong reason to change those names
at the CFQ layer.

blkio.cfq.io_serviced
blkio.cfq.io_service_bytes

> 
> > >   io_serviced_recursive		-> cfq.serviced
> > 
> > throttling layer uses throttle.io_serviced.
> 
> Ditto.
> 
> > Given the fact that the throttling layer counts bios and CFQ counts requests,
> > maybe we can keep throttle.bio_serviced and cfq.rq_serviced as names.
> > 
> > Or maybe just throttle.bios and cfq.requests respectively.
> 
> I'm not sure whether we want to rename throttle knobs.  Renaming cfq
> knobs makes sense as they need to be renamed for cfq. prefix anyway
> but throttle knobs are already mostly fine, so wouldn't it make sense
> to leave them as they are?  It makes things a bit inconsistent between
> cfq and throttle but do we really care?

Instead of renaming the throttling knobs, I would prefer to keep the new
CFQ knob names consistent with the throttling names.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH RFC] blkcg: prepare blkcg knobs for default hierarchy
@ 2014-04-15 14:06         ` Tejun Heo
  0 siblings, 0 replies; 22+ messages in thread
From: Tejun Heo @ 2014-04-15 14:06 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: Jens Axboe, linux-kernel, Li Zefan, cgroups

Hello,

On Tue, Apr 15, 2014 at 09:53:59AM -0400, Vivek Goyal wrote:
> But it looks like you are targeting that one can have multiple
> hierarchies. One of those will be the unified hierarchy with the new
> constraints. Other hierarchies can be old-type hierarchies. Do I
> understand it right?

Yeap, for backward compatibility.

> So you want all the debug knobs to be gated by the DEBUG_BLK_CGROUP option
> as well as by a kernel command line parameter? But I thought that in general
> people are not supposed to set CONFIG_DEBUG_BLK_CGROUP=y. It brings in
> unnecessary overhead. Only those who are doing development and trying to
> debug things would enable the above config option.
> 
> But anyway, I am fine with hiding these files behind a command line
> parameter too.

It's easy to enable the config option and there's nothing clearly
indicating that those knobs expose internal details and are volatile in
nature.  Once userland develops a dependency on them, we get locked into
it whether we intended to or not.  Debug features should be very clearly
marked as such.

> > Yeah, thought about keeping them consistent but would that really
> > matter?  Having internal consistency in cfq seems more important to
> > me.
> 
> I think from a user's perspective it really matters. If I see two knobs
> in a cgroup, say throttle.io_service_bytes and cfq.io_service_bytes, it
> is much more intuitive.
>
> So I feel that we should aim for having consistent names in CFQ and
> throttling. If one understands one knob at one layer, it is intuitive
> to figure out what same knob means at other layer.
> 
> And, thankfully, we have only two stats at throttling layer.
> 
> blkio.throttle.io_serviced
> blkio.throttle.io_service_bytes
> 
> Though the names are a little longer, they look fine to me. They are
> intuitive. So I can't see a strong reason to change those names
> at the CFQ layer.
> 
> blkio.cfq.io_serviced
> blkio.cfq.io_service_bytes

But then do we name other stat knobs similarly too?

 blkio.cfq.io_service_sectors
 blkio.cfq.io_service_bytes
 blkio.cfq.io_serviced
 blkio.cfq.io_merged

I don't know.  The names look outright stupid to me.  If we don't do
the above, then we have internal inconsistencies among cfq knob names,
which gotta be worse than the cfq / throttle inconsistency.  It's not a
perfect situation no matter what we do.  As long as each knob is
clearly documented, I don't think these inconsistencies are a big deal,
so let's just clean up the cfq names as we need to add the prefix anyway.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH RFC] blkcg: prepare blkcg knobs for default hierarchy
  2014-04-15 14:06         ` Tejun Heo
  (?)
@ 2014-04-15 14:18         ` Vivek Goyal
  2014-04-23 17:01             ` Tejun Heo
  -1 siblings, 1 reply; 22+ messages in thread
From: Vivek Goyal @ 2014-04-15 14:18 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Jens Axboe, linux-kernel, Li Zefan, cgroups

On Tue, Apr 15, 2014 at 10:06:50AM -0400, Tejun Heo wrote:

[..]
> But then do we name other stat knobs similarly too?
> 
>  blkio.cfq.io_service_sectors
>  blkio.cfq.io_service_bytes
>  blkio.cfq.io_serviced
>  blkio.cfq.io_merged
> 
> I don't know.  The names look outright stupid to me.  If we don't do
> the above, then we have internal inconsistencies among cfq knob names,
> which has got to be worse than the cfq / throttle inconsistency.  It's
> not a perfect situation no matter what we do.  As long as each knob is
> clearly documented, I don't think these inconsistencies are a big deal,
> so let's just clean up the cfq names, as we need to add the prefix
> anyway.

Ok, that's fine.  Let us just document the knobs well so that people
can find which knob provides what information, and make the cfq names
better at the expense of being inconsistent with the throttling
layer's names.

Thanks
Vivek

* Re: [PATCH RFC] blkcg: prepare blkcg knobs for default hierarchy
@ 2014-04-23 17:01             ` Tejun Heo
  0 siblings, 0 replies; 22+ messages in thread
From: Tejun Heo @ 2014-04-23 17:01 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: Jens Axboe, linux-kernel, Li Zefan, cgroups

On Tue, Apr 15, 2014 at 10:18:26AM -0400, Vivek Goyal wrote:
> Ok, that's fine.  Let us just document the knobs well so that people
> can find which knob provides what information, and make the cfq names
> better at the expense of being inconsistent with the throttling
> layer's names.

I've been thinking about it more.  Why do we even have separate stats
for common things like bytes transferred?  It doesn't serve any
purpose to do double accounting and reporting on everything, does it?
Shouldn't we just have a single set of common stats for things like
requests / bytes serviced?

Thanks.

-- 
tejun

* Re: [PATCH RFC] blkcg: prepare blkcg knobs for default hierarchy
@ 2014-04-23 17:17               ` Vivek Goyal
  0 siblings, 0 replies; 22+ messages in thread
From: Vivek Goyal @ 2014-04-23 17:17 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Jens Axboe, linux-kernel, Li Zefan, cgroups

On Wed, Apr 23, 2014 at 01:01:41PM -0400, Tejun Heo wrote:
> On Tue, Apr 15, 2014 at 10:18:26AM -0400, Vivek Goyal wrote:
> > Ok, that's fine.  Let us just document the knobs well so that people
> > can find which knob provides what information, and make the cfq names
> > better at the expense of being inconsistent with the throttling
> > layer's names.
> 
> I've been thinking about it more.  Why do we even have separate stats
> for common things like bytes transferred?  It doesn't serve any
> purpose to do double accounting and reporting on everything, does it?
> Shouldn't we just have a single set of common stats for things like
> requests / bytes serviced?

I think we should just require two: one measuring the rate in terms
of IOPS and the other measuring the rate in terms of [kMG]B/sec.

So exporting both sectors and bytes seems redundant.  Maybe exporting
bytes and dropping the sector interface will do.  Is the sector size
exported somewhere so that user space can easily convert bytes to
sectors if needed?

The number of bios/requests will also need to be exported so that one
can come up with IOPS.
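
For example, the consuming side could be as simple as something like
the sketch below (illustrative only; the cgroup v1 mount point, the
group name "test", the one second sampling interval and the trailing
"Total <n>" summary line of the stat files are assumptions, not part
of the proposed interface):

#include <stdio.h>
#include <unistd.h>

/* return the aggregate "Total" value of a blkio stat file, 0 on error */
static unsigned long long read_total(const char *path)
{
        char line[256];
        unsigned long long val = 0, total = 0;
        FILE *f = fopen(path, "r");

        if (!f)
                return 0;
        while (fgets(line, sizeof(line), f))
                if (sscanf(line, "Total %llu", &val) == 1)
                        total = val;
        fclose(f);
        return total;
}

int main(void)
{
        const char *bytes_f =
                "/sys/fs/cgroup/blkio/test/blkio.throttle.io_service_bytes";
        const char *ios_f =
                "/sys/fs/cgroup/blkio/test/blkio.throttle.io_serviced";
        unsigned long long b0 = read_total(bytes_f), io0 = read_total(ios_f);

        sleep(1);

        unsigned long long b1 = read_total(bytes_f), io1 = read_total(ios_f);

        /* block layer sectors are fixed at 512 bytes, so bytes suffice */
        printf("%llu B/s (%llu sectors/s), %llu IOPS\n",
               b1 - b0, (b1 - b0) / 512, io1 - io0);
        return 0;
}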

Thanks
Vivek

* Re: [PATCH RFC] blkcg: prepare blkcg knobs for default hierarchy
  2014-04-23 17:17               ` Vivek Goyal
  (?)
@ 2014-04-23 18:52               ` Tejun Heo
  2014-04-23 18:58                   ` Vivek Goyal
  -1 siblings, 1 reply; 22+ messages in thread
From: Tejun Heo @ 2014-04-23 18:52 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: Jens Axboe, linux-kernel, Li Zefan, cgroups

Hello,

On Wed, Apr 23, 2014 at 01:17:20PM -0400, Vivek Goyal wrote:
> I think we should just require two: one measuring the rate in terms
> of IOPS and the other measuring the rate in terms of [kMG]B/sec.

I meant between cfq and blk-throttle.  Why do we have separate stats
for them to present ultimately the same numbers?

-- 
tejun

* Re: [PATCH RFC] blkcg: prepare blkcg knobs for default hierarchy
@ 2014-04-23 18:58                   ` Vivek Goyal
  0 siblings, 0 replies; 22+ messages in thread
From: Vivek Goyal @ 2014-04-23 18:58 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Jens Axboe, linux-kernel, Li Zefan, cgroups

On Wed, Apr 23, 2014 at 02:52:31PM -0400, Tejun Heo wrote:
> Hello,
> 
> On Wed, Apr 23, 2014 at 01:17:20PM -0400, Vivek Goyal wrote:
> > I think we should just require two: one measuring the rate in terms
> > of IOPS and the other measuring the rate in terms of [kMG]B/sec.
> 
> I meant between cfq and blk-throttle.  Why do we have separate stats
> for them to present ultimately the same numbers?

Oh, sorry, I had misunderstood your question.

- The number of IOs serviced will be different at the throttling layer
  and the CFQ layer, as throttling accounts for IO in terms of bios
  while CFQ accounts in terms of the number of requests.

- CFQ might not be operational on a device while throttling is, and
  one still needs the bytes stats.

- In a custom kernel, throttling might not be enabled while CFQ is,
  and one still needs the stats.

So I think we do require duplication of some stats across throttling
and CFQ, don't we?

Thanks
Vivek

* Re: [PATCH RFC] blkcg: prepare blkcg knobs for default hierarchy
  2014-04-23 18:58                   ` Vivek Goyal
  (?)
@ 2014-04-23 19:00                   ` Tejun Heo
  2014-04-23 19:21                       ` Vivek Goyal
  -1 siblings, 1 reply; 22+ messages in thread
From: Tejun Heo @ 2014-04-23 19:00 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: Jens Axboe, linux-kernel, Li Zefan, cgroups

Hello,

On Wed, Apr 23, 2014 at 02:58:35PM -0400, Vivek Goyal wrote:
> Oh, sorry, I had misunderstood your question.
> 
> - The number of IOs serviced will be different at the throttling layer
>   and the CFQ layer, as throttling accounts for IO in terms of bios
>   while CFQ accounts in terms of the number of requests.

But shouldn't it be possible to determine that from the merged count?
Or we can just expose both bio and request counts at the block core
layer.  The point is that there's nothing controller-specific about
these stats.
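
Roughly, I mean a relation like the following (an illustrative
approximation only; it ignores requeues and assumes each merge folds
one bio into an existing request):

/* e.g. 100 submitted bios with 30 merges end up as roughly 70 requests */
static unsigned long long approx_requests(unsigned long long bios_submitted,
                                          unsigned long long bios_merged)
{
        return bios_submitted - bios_merged;
}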

> - CFQ might not be operational on a device while throttling is, and
>   one still needs the bytes stats.

Yeah, so expose both from the blkcg core as blkio.* stats.

> - In a custom kernel, throttling might not be enabled while CFQ is,
>   and one still needs the stats.

Ditto.

Thanks.

-- 
tejun

* Re: [PATCH RFC] blkcg: prepare blkcg knobs for default hierarchy
@ 2014-04-23 19:21                       ` Vivek Goyal
  0 siblings, 0 replies; 22+ messages in thread
From: Vivek Goyal @ 2014-04-23 19:21 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Jens Axboe, linux-kernel, Li Zefan, cgroups

On Wed, Apr 23, 2014 at 03:00:43PM -0400, Tejun Heo wrote:
> Hello,
> 
> On Wed, Apr 23, 2014 at 02:58:35PM -0400, Vivek Goyal wrote:
> > Oh, sorry, I had misunderstood your question.
> > 
> > - The number of IOs serviced will be different at the throttling
> >   layer and the CFQ layer, as throttling accounts for IO in terms
> >   of bios while CFQ accounts in terms of the number of requests.
> 
> But shouldn't it be possible to determine that from the merged count?
> Or we can just expose both bio and request counts at the block core
> layer.  The point is that there's nothing controller-specific about
> these stats.

In general this idea makes sense.  Exporting both request and bio
counts will solve the problem of IO accounting.  That should also
allow us to get rid of blkio.io_merged.

What about sync/async differentiation?  The throttling layer seems to
flag a request as sync only if bio->bi_rw has REQ_SYNC set, while CFQ
seems to consider a request sync if the bio is either a read or
bio->bi_rw has the REQ_SYNC flag set.
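
To put the two readings side by side (a sketch only; the flag values
are simplified stand-ins and the conditions are just my reading of the
two code paths):

#include <stdbool.h>

#define REQ_WRITE (1u << 0)     /* stand-ins, not the real bit layout */
#define REQ_SYNC  (1u << 4)

/* throttling stats path: only the explicit SYNC flag counts as sync */
static bool throtl_bio_sync(unsigned int bi_rw)
{
        return bi_rw & REQ_SYNC;
}

/* CFQ's reading: any read is sync, a write only when REQ_SYNC is set */
static bool cfq_bio_sync(unsigned int bi_rw)
{
        return !(bi_rw & REQ_WRITE) || (bi_rw & REQ_SYNC);
}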

So we need to make this definition uniform.  Or, I am wondering, do we
really need to export sync/async data at all?  (Again, it was put in
by the Google folks.)  How useful is this info really?

Thanks
Vivek

* Re: [PATCH RFC] blkcg: prepare blkcg knobs for default hierarchy
  2014-04-23 19:21                       ` Vivek Goyal
  (?)
@ 2014-04-23 19:27                       ` Tejun Heo
  -1 siblings, 0 replies; 22+ messages in thread
From: Tejun Heo @ 2014-04-23 19:27 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: Jens Axboe, linux-kernel, Li Zefan, cgroups

Hello,

On Wed, Apr 23, 2014 at 03:21:09PM -0400, Vivek Goyal wrote:
> In general this idea makes sense.  Exporting both request and bio
> counts will solve the problem of IO accounting.  That should also
> allow us to get rid of blkio.io_merged.

Yeah, that'd make more sense, I think.  IO submitted vs. actually
executed after merging.  Pretty clear definition.

> What about sync/async differentiation?  The throttling layer seems to
> flag a request as sync only if bio->bi_rw has REQ_SYNC set, while CFQ
> seems to consider a request sync if the bio is either a read or
> bio->bi_rw has the REQ_SYNC flag set.

Heh, I think we'd need to unify those no matter what.  The subtle
difference is extremely confusing.

> So we need to make this definition uniform.  Or, I am wondering, do we
> really need to export sync/async data at all?  (Again, it was put in
> by the Google folks.)  How useful is this info really?

Hmmmm... yeah, maybe that'd be the best way to go about it.

Thanks.

-- 
tejun

* Re: [PATCH RFC] blkcg: prepare blkcg knobs for default hierarchy
  2014-04-23 19:21                       ` Vivek Goyal
  (?)
  (?)
@ 2014-05-23 17:39                       ` Tejun Heo
  2014-05-27 12:49                         ` Vivek Goyal
  -1 siblings, 1 reply; 22+ messages in thread
From: Tejun Heo @ 2014-05-23 17:39 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: Jens Axboe, linux-kernel, Li Zefan, cgroups

Hello, Vivek.

On Wed, Apr 23, 2014 at 03:21:09PM -0400, Vivek Goyal wrote:
> What about sync/async differentiation?  The throttling layer seems to
> flag a request as sync only if bio->bi_rw has REQ_SYNC set, while CFQ
> seems to consider a request sync if the bio is either a read or
> bio->bi_rw has the REQ_SYNC flag set.

Working on this again, AFAICS, both treat REQ_SYNC the same way as far
as stats are concerned.  If SYNC is set, it's sync; otherwise, it's
accounted as async whether read or write.

Thanks.

-- 
tejun

* Re: [PATCH RFC] blkcg: prepare blkcg knobs for default hierarchy
  2014-05-23 17:39                       ` Tejun Heo
@ 2014-05-27 12:49                         ` Vivek Goyal
  0 siblings, 0 replies; 22+ messages in thread
From: Vivek Goyal @ 2014-05-27 12:49 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Jens Axboe, linux-kernel, Li Zefan, cgroups

On Fri, May 23, 2014 at 01:39:57PM -0400, Tejun Heo wrote:
> Hello, Vivek.
> 
> On Wed, Apr 23, 2014 at 03:21:09PM -0400, Vivek Goyal wrote:
> > What about sync/async differentiation?  The throttling layer seems
> > to flag a request as sync only if bio->bi_rw has REQ_SYNC set, while
> > CFQ seems to consider a request sync if the bio is either a read or
> > bio->bi_rw has the REQ_SYNC flag set.
> 
> Working on this again, AFAICS, both treat REQ_SYNC the same way as far
> as stats are concerned.  If SYNC is set, it's sync; otherwise, it's
> accounted as async whether read or write.

Ok, that seems to be the case.

static inline void blkg_rwstat_add(struct blkg_rwstat *rwstat,
                                   int rw, uint64_t val)
{
        u64_stats_update_begin(&rwstat->syncp);

        if (rw & REQ_SYNC)
                rwstat->cnt[BLKG_RWSTAT_SYNC] += val;
        else
                rwstat->cnt[BLKG_RWSTAT_ASYNC] += val;

        u64_stats_update_end(&rwstat->syncp);
}


So "sync" will not represent a policy-specific interpretation of sync,
but will simply be based on the sync flag on the request.

I guess that is fine.  So far nobody seems to be complaining.

Thanks
Vivek
