linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH, v3 0/2] Timer slack cgroup subsystem
@ 2011-02-02 20:47 Kirill A. Shutsemov
  2011-02-02 20:47 ` [PATCH, v3 1/2] cgroups: export cgroup_iter_{start,next,end} Kirill A. Shutsemov
  2011-02-02 20:47 ` [PATCH, v3 2/2] cgroups: introduce timer slack subsystem Kirill A. Shutsemov
  0 siblings, 2 replies; 22+ messages in thread
From: Kirill A. Shutsemov @ 2011-02-02 20:47 UTC (permalink / raw)
  To: Paul Menage, Li Zefan
  Cc: containers, jacob.jun.pan, Arjan van de Ven, linux-kernel,
	Matt Helsley, Kirill A. Shutemov

From: Kirill A. Shutemov <kirill@shutemov.name>

Changelog:
v3: 
 - rework interface
 - s/EXPORT_SYMBOL/EXPORT_SYMBOL_GPL/
v2:
 - fixed with CONFIG_CGROUP_TIMER_SLACK=y
v1:
 - initial revision

Kirill A. Shutemov (2):
  cgroups: export cgroup_iter_{start,next,end}
  cgroups: introduce timer slack subsystem

 include/linux/cgroup_subsys.h |    6 +
 include/linux/init_task.h     |    4 +-
 init/Kconfig                  |   10 ++
 kernel/Makefile               |    1 +
 kernel/cgroup.c               |    3 +
 kernel/cgroup_timer_slack.c   |  242 +++++++++++++++++++++++++++++++++++++++++
 kernel/sys.c                  |   24 +++-
 7 files changed, 283 insertions(+), 7 deletions(-)
 create mode 100644 kernel/cgroup_timer_slack.c

-- 
1.7.3.5


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH, v3 1/2] cgroups: export cgroup_iter_{start,next,end}
  2011-02-02 20:47 [PATCH, v3 0/2] Timer slack cgroup subsystem Kirill A. Shutsemov
@ 2011-02-02 20:47 ` Kirill A. Shutsemov
  2011-02-02 21:21   ` Paul Menage
  2011-02-02 20:47 ` [PATCH, v3 2/2] cgroups: introduce timer slack subsystem Kirill A. Shutsemov
  1 sibling, 1 reply; 22+ messages in thread
From: Kirill A. Shutsemov @ 2011-02-02 20:47 UTC (permalink / raw)
  To: Paul Menage, Li Zefan
  Cc: containers, jacob.jun.pan, Arjan van de Ven, linux-kernel,
	Matt Helsley, Kirill A. Shutemov

From: Kirill A. Shutemov <kirill@shutemov.name>

Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
---
 kernel/cgroup.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index b24d702..8234daa 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2443,6 +2443,7 @@ void cgroup_iter_start(struct cgroup *cgrp, struct cgroup_iter *it)
 	it->cg_link = &cgrp->css_sets;
 	cgroup_advance_iter(cgrp, it);
 }
+EXPORT_SYMBOL_GPL(cgroup_iter_start);
 
 struct task_struct *cgroup_iter_next(struct cgroup *cgrp,
 					struct cgroup_iter *it)
@@ -2467,11 +2468,13 @@ struct task_struct *cgroup_iter_next(struct cgroup *cgrp,
 	}
 	return res;
 }
+EXPORT_SYMBOL_GPL(cgroup_iter_next);
 
 void cgroup_iter_end(struct cgroup *cgrp, struct cgroup_iter *it)
 {
 	read_unlock(&css_set_lock);
 }
+EXPORT_SYMBOL_GPL(cgroup_iter_end);
 
 static inline int started_after_time(struct task_struct *t1,
 				     struct timespec *time,
-- 
1.7.3.5


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH, v3 2/2] cgroups: introduce timer slack subsystem
  2011-02-02 20:47 [PATCH, v3 0/2] Timer slack cgroup subsystem Kirill A. Shutsemov
  2011-02-02 20:47 ` [PATCH, v3 1/2] cgroups: export cgroup_iter_{start,next,end} Kirill A. Shutsemov
@ 2011-02-02 20:47 ` Kirill A. Shutsemov
  2011-02-02 22:56   ` jacob pan
                     ` (2 more replies)
  1 sibling, 3 replies; 22+ messages in thread
From: Kirill A. Shutsemov @ 2011-02-02 20:47 UTC (permalink / raw)
  To: Paul Menage, Li Zefan
  Cc: containers, jacob.jun.pan, Arjan van de Ven, linux-kernel,
	Matt Helsley, Kirill A. Shutemov

From: Kirill A. Shutemov <kirill@shutemov.name>

Provides a way of tasks grouping by timer slack value. Introduces per
cgroup max and min timer slack value. When a task attaches to a cgroup,
its timer slack value adjusts (if needed) to fit min-max range.

It also provides a way to set timer slack value for all tasks in the
cgroup at once.

This functionality is useful in mobile devices where certain background
apps are attached to a cgroup and minimum wakeups are desired.

Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Idea-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 include/linux/cgroup_subsys.h |    6 +
 include/linux/init_task.h     |    4 +-
 init/Kconfig                  |   10 ++
 kernel/Makefile               |    1 +
 kernel/cgroup_timer_slack.c   |  242 +++++++++++++++++++++++++++++++++++++++++
 kernel/sys.c                  |   24 +++-
 6 files changed, 280 insertions(+), 7 deletions(-)
 create mode 100644 kernel/cgroup_timer_slack.c

diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
index ccefff0..e399228 100644
--- a/include/linux/cgroup_subsys.h
+++ b/include/linux/cgroup_subsys.h
@@ -66,3 +66,9 @@ SUBSYS(blkio)
 #endif
 
 /* */
+
+#ifdef CONFIG_CGROUP_TIMER_SLACK
+SUBSYS(timer_slack)
+#endif
+
+/* */
diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index caa151f..48eca8f 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -124,6 +124,8 @@ extern struct cred init_cred;
 # define INIT_PERF_EVENTS(tsk)
 #endif
 
+#define TIMER_SLACK_NS_DEFAULT 50000
+
 /*
  *  INIT_TASK is used to set up the first task table, touch at
  * your own risk!. Base=0, limit=0x1fffff (=2MB)
@@ -177,7 +179,7 @@ extern struct cred init_cred;
 	.cpu_timers	= INIT_CPU_TIMERS(tsk.cpu_timers),		\
 	.fs_excl	= ATOMIC_INIT(0),				\
 	.pi_lock	= __RAW_SPIN_LOCK_UNLOCKED(tsk.pi_lock),	\
-	.timer_slack_ns = 50000, /* 50 usec default slack */		\
+	.timer_slack_ns = TIMER_SLACK_NS_DEFAULT,			\
 	.pids = {							\
 		[PIDTYPE_PID]  = INIT_PID_LINK(PIDTYPE_PID),		\
 		[PIDTYPE_PGID] = INIT_PID_LINK(PIDTYPE_PGID),		\
diff --git a/init/Kconfig b/init/Kconfig
index be788c0..f21b4ce 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -596,6 +596,16 @@ config CGROUP_FREEZER
 	  Provides a way to freeze and unfreeze all tasks in a
 	  cgroup.
 
+config CGROUP_TIMER_SLACK
+	tristate "Timer slack cgroup subsystem"
+	help
+	  Provides a way of tasks grouping by timer slack value.
+	  Introduces per cgroup timer slack value which will override
+	  the default timer slack value once a task is attached to a
+	  cgroup.
+	  It's useful in mobile devices where certain background apps
+	  are attached to a cgroup and minimum wakeups are desired.
+
 config CGROUP_DEVICE
 	bool "Device controller for cgroups"
 	help
diff --git a/kernel/Makefile b/kernel/Makefile
index 353d3fe..0b60239 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -61,6 +61,7 @@ obj-$(CONFIG_BACKTRACE_SELF_TEST) += backtracetest.o
 obj-$(CONFIG_COMPAT) += compat.o
 obj-$(CONFIG_CGROUPS) += cgroup.o
 obj-$(CONFIG_CGROUP_FREEZER) += cgroup_freezer.o
+obj-$(CONFIG_CGROUP_TIMER_SLACK) += cgroup_timer_slack.o
 obj-$(CONFIG_CPUSETS) += cpuset.o
 obj-$(CONFIG_CGROUP_NS) += ns_cgroup.o
 obj-$(CONFIG_UTS_NS) += utsname.o
diff --git a/kernel/cgroup_timer_slack.c b/kernel/cgroup_timer_slack.c
new file mode 100644
index 0000000..a343a50
--- /dev/null
+++ b/kernel/cgroup_timer_slack.c
@@ -0,0 +1,242 @@
+/*
+ * cgroup_timer_slack.c - control group timer slack subsystem
+ *
+ * Copyright Nokia Corparation, 2011
+ * Author: Kirill A. Shutemov
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+#include <linux/cgroup.h>
+#include <linux/init_task.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+
+struct cgroup_subsys timer_slack_subsys;
+struct timer_slack_cgroup {
+	struct cgroup_subsys_state css;
+	unsigned long min_slack_ns;
+	unsigned long max_slack_ns;
+};
+
+enum {
+	TIMER_SLACK_MIN,
+	TIMER_SLACK_MAX,
+};
+
+int dummy_timer_slack_check(struct task_struct *task, unsigned long slack_ns);
+extern int (*timer_slack_check)(struct task_struct *task,
+		unsigned long slack_ns);
+
+static struct timer_slack_cgroup *cgroup_to_tslack_cgroup(struct cgroup *cgroup)
+{
+	struct cgroup_subsys_state *css;
+
+	css = cgroup_subsys_state(cgroup, timer_slack_subsys.subsys_id);
+	return container_of(css, struct timer_slack_cgroup, css);
+}
+
+static int cgroup_timer_slack_check(struct task_struct *task,
+		unsigned long slack_ns)
+{
+	struct cgroup_subsys_state *css;
+	struct timer_slack_cgroup *tslack_cgroup;
+
+	css = task_subsys_state(task, timer_slack_subsys.subsys_id);
+	tslack_cgroup = container_of(css, struct timer_slack_cgroup, css);
+
+	if (slack_ns < tslack_cgroup->min_slack_ns)
+		return -EPERM;
+	if (slack_ns > tslack_cgroup->max_slack_ns)
+		return -EPERM;
+	return 0;
+}
+
+static struct cgroup_subsys_state *
+tslack_cgroup_create(struct cgroup_subsys *subsys, struct cgroup *cgroup)
+{
+	struct timer_slack_cgroup *tslack_cgroup;
+
+	tslack_cgroup = kmalloc(sizeof(*tslack_cgroup), GFP_KERNEL);
+	if (!tslack_cgroup)
+		return ERR_PTR(-ENOMEM);
+
+	if (cgroup->parent) {
+		struct timer_slack_cgroup *parent;
+		parent = cgroup_to_tslack_cgroup(cgroup->parent);
+		tslack_cgroup->min_slack_ns = parent->min_slack_ns;
+		tslack_cgroup->max_slack_ns = parent->max_slack_ns;
+	} else {
+		tslack_cgroup->min_slack_ns = 0UL;
+		tslack_cgroup->max_slack_ns = ULONG_MAX;
+	}
+
+	return &tslack_cgroup->css;
+}
+
+static void tslack_cgroup_destroy(struct cgroup_subsys *subsys,
+		struct cgroup *cgroup)
+{
+	kfree(cgroup_to_tslack_cgroup(cgroup));
+}
+
+/*
+ * Adjust ->timer_slack_ns and ->default_max_slack_ns of the task to fit
+ * limits of the cgroup.
+ */
+static void tslack_adjust_task(struct timer_slack_cgroup *tslack_cgroup,
+		struct task_struct *tsk)
+{
+	if (tslack_cgroup->min_slack_ns > tsk->timer_slack_ns)
+		tsk->timer_slack_ns = tslack_cgroup->min_slack_ns;
+	else if (tslack_cgroup->max_slack_ns < tsk->timer_slack_ns)
+		tsk->timer_slack_ns = tslack_cgroup->max_slack_ns;
+
+	if (tslack_cgroup->min_slack_ns > tsk->default_timer_slack_ns)
+		tsk->default_timer_slack_ns = tslack_cgroup->min_slack_ns;
+	else if (tslack_cgroup->max_slack_ns < tsk->default_timer_slack_ns)
+		tsk->default_timer_slack_ns = tslack_cgroup->max_slack_ns;
+}
+
+static void tslack_cgroup_attach(struct cgroup_subsys *subsys,
+		struct cgroup *cgroup, struct cgroup *prev,
+		struct task_struct *tsk, bool threadgroup)
+{
+	tslack_adjust_task(cgroup_to_tslack_cgroup(cgroup), tsk);
+}
+
+static int tslack_write_set_slack_ns(struct cgroup *cgroup, struct cftype *cft,
+		u64 val)
+{
+	struct timer_slack_cgroup *tslack_cgroup;
+	struct cgroup_iter it;
+	struct task_struct *task;
+
+	tslack_cgroup = cgroup_to_tslack_cgroup(cgroup);
+	if (!val || val < tslack_cgroup->min_slack_ns ||
+			val > tslack_cgroup->max_slack_ns )
+		return -EINVAL;
+
+	/* Change timer slack value for all tasks in the cgroup */
+	cgroup_iter_start(cgroup, &it);
+	while ((task = cgroup_iter_next(cgroup, &it)))
+		task->timer_slack_ns = val;
+	cgroup_iter_end(cgroup, &it);
+
+	return 0;
+}
+
+static u64 tslack_read_range(struct cgroup *cgroup, struct cftype *cft)
+{
+	struct timer_slack_cgroup *tslack_cgroup;
+
+	tslack_cgroup = cgroup_to_tslack_cgroup(cgroup);
+	switch (cft->private) {
+	case TIMER_SLACK_MIN:
+		return tslack_cgroup->min_slack_ns;
+	case TIMER_SLACK_MAX:
+		return tslack_cgroup->max_slack_ns;
+	default:
+		BUG();
+	};
+}
+
+static int tslack_write_range(struct cgroup *cgroup, struct cftype *cft,
+		u64 val)
+{
+	struct timer_slack_cgroup *tslack_cgroup;
+	struct cgroup_iter it;
+	struct task_struct *task;
+
+	if (!val)
+		return -EINVAL;
+
+	tslack_cgroup = cgroup_to_tslack_cgroup(cgroup);
+	switch (cft->private) {
+	case TIMER_SLACK_MIN:
+		if (val > tslack_cgroup->max_slack_ns)
+			return -EINVAL;
+		tslack_cgroup->min_slack_ns = val;
+		break;
+	case TIMER_SLACK_MAX:
+		if (val < tslack_cgroup->min_slack_ns)
+			return -EINVAL;
+		tslack_cgroup->max_slack_ns = val;
+		break;
+	default:
+		BUG();
+	}
+
+	/*
+	 * Adjust timer slack value for all tasks in the cgroup to fit
+	 * min-max range.
+	 */
+	cgroup_iter_start(cgroup, &it);
+	while ((task = cgroup_iter_next(cgroup, &it)))
+		tslack_adjust_task(tslack_cgroup, task);
+	cgroup_iter_end(cgroup, &it);
+
+	return 0;
+}
+
+static struct cftype files[] = {
+	{
+		.name = "set_slack_ns",
+		.write_u64 = tslack_write_set_slack_ns,
+	},
+	{
+		.name = "min_slack_ns",
+		.private = TIMER_SLACK_MIN,
+		.read_u64 = tslack_read_range,
+		.write_u64 = tslack_write_range,
+	},
+	{
+		.name = "max_slack_ns",
+		.private = TIMER_SLACK_MAX,
+		.read_u64 = tslack_read_range,
+		.write_u64 = tslack_write_range,
+	},
+};
+
+static int tslack_cgroup_populate(struct cgroup_subsys *subsys,
+		struct cgroup *cgroup)
+{
+	return cgroup_add_files(cgroup, subsys, files, ARRAY_SIZE(files));
+}
+
+struct cgroup_subsys timer_slack_subsys = {
+	.name = "timer_slack",
+	.module = THIS_MODULE,
+#ifdef CONFIG_CGROUP_TIMER_SLACK
+	.subsys_id = timer_slack_subsys_id,
+#endif
+	.create = tslack_cgroup_create,
+	.destroy = tslack_cgroup_destroy,
+	.attach = tslack_cgroup_attach,
+	.populate = tslack_cgroup_populate,
+};
+
+static int __init init_cgroup_timer_slack(void)
+{
+	BUG_ON(timer_slack_check != dummy_timer_slack_check);
+	timer_slack_check = cgroup_timer_slack_check;
+	return cgroup_load_subsys(&timer_slack_subsys);
+}
+
+static void __exit exit_cgroup_timer_slack(void)
+{
+	BUG_ON(timer_slack_check != cgroup_timer_slack_check);
+	timer_slack_check = dummy_timer_slack_check;
+	cgroup_unload_subsys(&timer_slack_subsys);
+}
+
+module_init(init_cgroup_timer_slack);
+module_exit(exit_cgroup_timer_slack);
+MODULE_LICENSE("GPL");
diff --git a/kernel/sys.c b/kernel/sys.c
index 18da702..57322a7 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -118,6 +118,16 @@ EXPORT_SYMBOL(cad_pid);
 
 void (*pm_power_off_prepare)(void);
 
+int dummy_timer_slack_check(struct task_struct *task, unsigned long slack_ns)
+{
+	return 0;
+}
+EXPORT_SYMBOL_GPL(dummy_timer_slack_check);
+
+int (*timer_slack_check)(struct task_struct *task, unsigned long slack_ns) =
+	dummy_timer_slack_check;
+EXPORT_SYMBOL_GPL(timer_slack_check);
+
 /*
  * set the priority of a task
  * - the caller must hold the RCU read lock
@@ -1694,12 +1704,14 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 			error = current->timer_slack_ns;
 			break;
 		case PR_SET_TIMERSLACK:
-			if (arg2 <= 0)
-				current->timer_slack_ns =
-					current->default_timer_slack_ns;
-			else
-				current->timer_slack_ns = arg2;
-			error = 0;
+			if (arg2 <= 0) {
+				me->timer_slack_ns = me->default_timer_slack_ns;
+				break;
+			}
+
+			error = timer_slack_check(me, arg2);
+			if (!error)
+				me->timer_slack_ns = arg2;
 			break;
 		case PR_MCE_KILL:
 			if (arg4 | arg5)
-- 
1.7.3.5


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH, v3 1/2] cgroups: export cgroup_iter_{start,next,end}
  2011-02-02 20:47 ` [PATCH, v3 1/2] cgroups: export cgroup_iter_{start,next,end} Kirill A. Shutsemov
@ 2011-02-02 21:21   ` Paul Menage
  0 siblings, 0 replies; 22+ messages in thread
From: Paul Menage @ 2011-02-02 21:21 UTC (permalink / raw)
  To: Kirill A. Shutsemov
  Cc: Li Zefan, containers, jacob.jun.pan, Arjan van de Ven,
	linux-kernel, Matt Helsley

On Wed, Feb 2, 2011 at 12:47 PM, Kirill A. Shutsemov
<kirill@shutemov.name> wrote:
> From: Kirill A. Shutemov <kirill@shutemov.name>
>
> Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>

Acked-by: Paul Menage <menage@google.com>

> ---
>  kernel/cgroup.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index b24d702..8234daa 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -2443,6 +2443,7 @@ void cgroup_iter_start(struct cgroup *cgrp, struct cgroup_iter *it)
>        it->cg_link = &cgrp->css_sets;
>        cgroup_advance_iter(cgrp, it);
>  }
> +EXPORT_SYMBOL_GPL(cgroup_iter_start);
>
>  struct task_struct *cgroup_iter_next(struct cgroup *cgrp,
>                                        struct cgroup_iter *it)
> @@ -2467,11 +2468,13 @@ struct task_struct *cgroup_iter_next(struct cgroup *cgrp,
>        }
>        return res;
>  }
> +EXPORT_SYMBOL_GPL(cgroup_iter_next);
>
>  void cgroup_iter_end(struct cgroup *cgrp, struct cgroup_iter *it)
>  {
>        read_unlock(&css_set_lock);
>  }
> +EXPORT_SYMBOL_GPL(cgroup_iter_end);
>
>  static inline int started_after_time(struct task_struct *t1,
>                                     struct timespec *time,
> --
> 1.7.3.5
>
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH, v3 2/2] cgroups: introduce timer slack subsystem
  2011-02-02 20:47 ` [PATCH, v3 2/2] cgroups: introduce timer slack subsystem Kirill A. Shutsemov
@ 2011-02-02 22:56   ` jacob pan
  2011-02-03  9:22     ` Kirill A. Shutemov
  2011-02-02 23:23   ` Paul Menage
  2011-02-03  5:46   ` Matt Helsley
  2 siblings, 1 reply; 22+ messages in thread
From: jacob pan @ 2011-02-02 22:56 UTC (permalink / raw)
  To: Kirill A. Shutsemov
  Cc: Paul Menage, Li Zefan, containers, Arjan van de Ven,
	linux-kernel, Matt Helsley

On Wed,  2 Feb 2011 22:47:36 +0200
"Kirill A. Shutsemov" <kirill@shutemov.name> wrote:

> From: Kirill A. Shutemov <kirill@shutemov.name>
> 
> Provides a way of tasks grouping by timer slack value. Introduces per
> cgroup max and min timer slack value. When a task attaches to a
> cgroup, its timer slack value adjusts (if needed) to fit min-max
> range.
> 
> It also provides a way to set timer slack value for all tasks in the
> cgroup at once.
> 
> This functionality is useful in mobile devices where certain
> background apps are attached to a cgroup and minimum wakeups are
> desired.
> 
> Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
> Idea-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
>  include/linux/cgroup_subsys.h |    6 +
>  include/linux/init_task.h     |    4 +-
>  init/Kconfig                  |   10 ++
>  kernel/Makefile               |    1 +
>  kernel/cgroup_timer_slack.c   |  242
> +++++++++++++++++++++++++++++++++++++++++

> +
> +static int cgroup_timer_slack_check(struct task_struct *task,
> +		unsigned long slack_ns)
> +{
> +	struct cgroup_subsys_state *css;
> +	struct timer_slack_cgroup *tslack_cgroup;
> +

use rcu_read_lock()/unlock?
lockdep complains unprotected rcu dereference check.

> +	css = task_subsys_state(task, timer_slack_subsys.subsys_id);
> +	tslack_cgroup = container_of(css, struct timer_slack_cgroup,
> css); +
> +	if (slack_ns < tslack_cgroup->min_slack_ns)
> +		return -EPERM;
> +	if (slack_ns > tslack_cgroup->max_slack_ns)
> +		return -EPERM;
> +	return 0;
> +}
> +

> +
> +static struct cftype files[] = {
> +	{
> +		.name = "set_slack_ns",
> +		.write_u64 = tslack_write_set_slack_ns,
> +	},
should we also allow reading of the current slack_ns?


> +	{
> +		.name = "min_slack_ns",
> +		.private = TIMER_SLACK_MIN,
> +		.read_u64 = tslack_read_range,
> +		.write_u64 = tslack_write_range,
> +	},
> +	{
> +		.name = "max_slack_ns",
> +		.private = TIMER_SLACK_MAX,
> +		.read_u64 = tslack_read_range,
> +		.write_u64 = tslack_write_range,
> +	},
> +};
> +

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH, v3 2/2] cgroups: introduce timer slack subsystem
  2011-02-02 20:47 ` [PATCH, v3 2/2] cgroups: introduce timer slack subsystem Kirill A. Shutsemov
  2011-02-02 22:56   ` jacob pan
@ 2011-02-02 23:23   ` Paul Menage
  2011-02-03  5:48     ` Matt Helsley
  2011-02-03  9:28     ` Kirill A. Shutemov
  2011-02-03  5:46   ` Matt Helsley
  2 siblings, 2 replies; 22+ messages in thread
From: Paul Menage @ 2011-02-02 23:23 UTC (permalink / raw)
  To: Kirill A. Shutsemov
  Cc: Li Zefan, containers, jacob.jun.pan, Arjan van de Ven,
	linux-kernel, Matt Helsley

On Wed, Feb 2, 2011 at 12:47 PM, Kirill A. Shutsemov
<kirill@shutemov.name> wrote:
> From: Kirill A. Shutemov <kirill@shutemov.name>
>
> Provides a way of tasks grouping by timer slack value. Introduces per
> cgroup max and min timer slack value. When a task attaches to a cgroup,
> its timer slack value adjusts (if needed) to fit min-max range.
>
> It also provides a way to set timer slack value for all tasks in the
> cgroup at once.
>
> This functionality is useful in mobile devices where certain background
> apps are attached to a cgroup and minimum wakeups are desired.

If you really want to be able to make this modular, I'd be inclined to
make the check_timer_slack hook just default to NULL, rather than
introducing dummy_timer_slack_check()


> +
> +static int tslack_write_range(struct cgroup *cgroup, struct cftype *cft,
> +               u64 val)
> +{
> +       struct timer_slack_cgroup *tslack_cgroup;
> +       struct cgroup_iter it;
> +       struct task_struct *task;
> +
> +       if (!val)
> +               return -EINVAL;
> +
> +       tslack_cgroup = cgroup_to_tslack_cgroup(cgroup);
> +       switch (cft->private) {
> +       case TIMER_SLACK_MIN:
> +               if (val > tslack_cgroup->max_slack_ns)
> +                       return -EINVAL;
> +               tslack_cgroup->min_slack_ns = val;
> +               break;
> +       case TIMER_SLACK_MAX:
> +               if (val < tslack_cgroup->min_slack_ns)
> +                       return -EINVAL;
> +               tslack_cgroup->max_slack_ns = val;
> +               break;
> +       default:
> +               BUG();
> +       }
> +

Don't we want to keep the min/max applied hierarchically as well? i.e.
a child can't set its min/max outside the range of its parents?

> +
> +static int __init init_cgroup_timer_slack(void)
> +{
> +       BUG_ON(timer_slack_check != dummy_timer_slack_check);

Better to make this just fail the initialization if someone else has
already claimed the hook, rather than crashing.

Paul

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH, v3 2/2] cgroups: introduce timer slack subsystem
  2011-02-02 20:47 ` [PATCH, v3 2/2] cgroups: introduce timer slack subsystem Kirill A. Shutsemov
  2011-02-02 22:56   ` jacob pan
  2011-02-02 23:23   ` Paul Menage
@ 2011-02-03  5:46   ` Matt Helsley
  2011-02-03  9:41     ` Kirill A. Shutemov
  2 siblings, 1 reply; 22+ messages in thread
From: Matt Helsley @ 2011-02-03  5:46 UTC (permalink / raw)
  To: Kirill A. Shutsemov
  Cc: Paul Menage, Li Zefan, containers, jacob.jun.pan,
	Arjan van de Ven, linux-kernel, Matt Helsley

On Wed, Feb 02, 2011 at 10:47:36PM +0200, Kirill A. Shutsemov wrote:
> From: Kirill A. Shutemov <kirill@shutemov.name>
> 
> Provides a way of tasks grouping by timer slack value. Introduces per
> cgroup max and min timer slack value. When a task attaches to a cgroup,
> its timer slack value adjusts (if needed) to fit min-max range.
> 
> It also provides a way to set timer slack value for all tasks in the
> cgroup at once.
> 
> This functionality is useful in mobile devices where certain background
> apps are attached to a cgroup and minimum wakeups are desired.
> 
> Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
> Idea-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
>  include/linux/cgroup_subsys.h |    6 +
>  include/linux/init_task.h     |    4 +-
>  init/Kconfig                  |   10 ++
>  kernel/Makefile               |    1 +
>  kernel/cgroup_timer_slack.c   |  242 +++++++++++++++++++++++++++++++++++++++++
>  kernel/sys.c                  |   24 +++-
>  6 files changed, 280 insertions(+), 7 deletions(-)
>  create mode 100644 kernel/cgroup_timer_slack.c
> 
> diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
> index ccefff0..e399228 100644
> --- a/include/linux/cgroup_subsys.h
> +++ b/include/linux/cgroup_subsys.h
> @@ -66,3 +66,9 @@ SUBSYS(blkio)
>  #endif
> 
>  /* */
> +
> +#ifdef CONFIG_CGROUP_TIMER_SLACK
> +SUBSYS(timer_slack)
> +#endif
> +
> +/* */
> diff --git a/include/linux/init_task.h b/include/linux/init_task.h
> index caa151f..48eca8f 100644
> --- a/include/linux/init_task.h
> +++ b/include/linux/init_task.h
> @@ -124,6 +124,8 @@ extern struct cred init_cred;
>  # define INIT_PERF_EVENTS(tsk)
>  #endif
> 
> +#define TIMER_SLACK_NS_DEFAULT 50000
> +
>  /*
>   *  INIT_TASK is used to set up the first task table, touch at
>   * your own risk!. Base=0, limit=0x1fffff (=2MB)
> @@ -177,7 +179,7 @@ extern struct cred init_cred;
>  	.cpu_timers	= INIT_CPU_TIMERS(tsk.cpu_timers),		\
>  	.fs_excl	= ATOMIC_INIT(0),				\
>  	.pi_lock	= __RAW_SPIN_LOCK_UNLOCKED(tsk.pi_lock),	\
> -	.timer_slack_ns = 50000, /* 50 usec default slack */		\
> +	.timer_slack_ns = TIMER_SLACK_NS_DEFAULT,			\
>  	.pids = {							\
>  		[PIDTYPE_PID]  = INIT_PID_LINK(PIDTYPE_PID),		\
>  		[PIDTYPE_PGID] = INIT_PID_LINK(PIDTYPE_PGID),		\
> diff --git a/init/Kconfig b/init/Kconfig
> index be788c0..f21b4ce 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -596,6 +596,16 @@ config CGROUP_FREEZER
>  	  Provides a way to freeze and unfreeze all tasks in a
>  	  cgroup.
> 
> +config CGROUP_TIMER_SLACK
> +	tristate "Timer slack cgroup subsystem"
> +	help
> +	  Provides a way of tasks grouping by timer slack value.
> +	  Introduces per cgroup timer slack value which will override
> +	  the default timer slack value once a task is attached to a
> +	  cgroup.
> +	  It's useful in mobile devices where certain background apps
> +	  are attached to a cgroup and minimum wakeups are desired.

Again -- I don't think "minimum" is the right choice of words here.

> +
>  config CGROUP_DEVICE
>  	bool "Device controller for cgroups"
>  	help
> diff --git a/kernel/Makefile b/kernel/Makefile
> index 353d3fe..0b60239 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -61,6 +61,7 @@ obj-$(CONFIG_BACKTRACE_SELF_TEST) += backtracetest.o
>  obj-$(CONFIG_COMPAT) += compat.o
>  obj-$(CONFIG_CGROUPS) += cgroup.o
>  obj-$(CONFIG_CGROUP_FREEZER) += cgroup_freezer.o
> +obj-$(CONFIG_CGROUP_TIMER_SLACK) += cgroup_timer_slack.o
>  obj-$(CONFIG_CPUSETS) += cpuset.o
>  obj-$(CONFIG_CGROUP_NS) += ns_cgroup.o
>  obj-$(CONFIG_UTS_NS) += utsname.o
> diff --git a/kernel/cgroup_timer_slack.c b/kernel/cgroup_timer_slack.c
> new file mode 100644
> index 0000000..a343a50
> --- /dev/null
> +++ b/kernel/cgroup_timer_slack.c
> @@ -0,0 +1,242 @@
> +/*
> + * cgroup_timer_slack.c - control group timer slack subsystem
> + *
> + * Copyright Nokia Corparation, 2011
> + * Author: Kirill A. Shutemov
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + */
> +#include <linux/cgroup.h>
> +#include <linux/init_task.h>
> +#include <linux/module.h>
> +#include <linux/slab.h>
> +
> +struct cgroup_subsys timer_slack_subsys;
> +struct timer_slack_cgroup {
> +	struct cgroup_subsys_state css;
> +	unsigned long min_slack_ns;
> +	unsigned long max_slack_ns;
> +};
> +
> +enum {
> +	TIMER_SLACK_MIN,
> +	TIMER_SLACK_MAX,
> +};
> +
> +int dummy_timer_slack_check(struct task_struct *task, unsigned long slack_ns);

The above should be "extern" shouldn't it? Also, I think there's a way
to eliminate it (more below).

> +extern int (*timer_slack_check)(struct task_struct *task,
> +		unsigned long slack_ns);
> +
> +static struct timer_slack_cgroup *cgroup_to_tslack_cgroup(struct cgroup *cgroup)
> +{
> +	struct cgroup_subsys_state *css;
> +
> +	css = cgroup_subsys_state(cgroup, timer_slack_subsys.subsys_id);
> +	return container_of(css, struct timer_slack_cgroup, css);
> +}
> +
> +static int cgroup_timer_slack_check(struct task_struct *task,
> +		unsigned long slack_ns)

nit: rename to cgroup_timer_slack_allowed() ?

> +{
> +	struct cgroup_subsys_state *css;
> +	struct timer_slack_cgroup *tslack_cgroup;
> +
> +	css = task_subsys_state(task, timer_slack_subsys.subsys_id);
> +	tslack_cgroup = container_of(css, struct timer_slack_cgroup, css);
> +
> +	if (slack_ns < tslack_cgroup->min_slack_ns)
> +		return -EPERM;
> +	if (slack_ns > tslack_cgroup->max_slack_ns)
> +		return -EPERM;
> +	return 0;
> +}
> +
> +static struct cgroup_subsys_state *
> +tslack_cgroup_create(struct cgroup_subsys *subsys, struct cgroup *cgroup)
> +{
> +	struct timer_slack_cgroup *tslack_cgroup;
> +
> +	tslack_cgroup = kmalloc(sizeof(*tslack_cgroup), GFP_KERNEL);
> +	if (!tslack_cgroup)
> +		return ERR_PTR(-ENOMEM);
> +
> +	if (cgroup->parent) {
> +		struct timer_slack_cgroup *parent;
> +		parent = cgroup_to_tslack_cgroup(cgroup->parent);
> +		tslack_cgroup->min_slack_ns = parent->min_slack_ns;
> +		tslack_cgroup->max_slack_ns = parent->max_slack_ns;
> +	} else {
> +		tslack_cgroup->min_slack_ns = 0UL;
> +		tslack_cgroup->max_slack_ns = ULONG_MAX;
> +	}
> +
> +	return &tslack_cgroup->css;
> +}
> +
> +static void tslack_cgroup_destroy(struct cgroup_subsys *subsys,
> +		struct cgroup *cgroup)
> +{
> +	kfree(cgroup_to_tslack_cgroup(cgroup));
> +}
> +
> +/*
> + * Adjust ->timer_slack_ns and ->default_max_slack_ns of the task to fit
> + * limits of the cgroup.
> + */
> +static void tslack_adjust_task(struct timer_slack_cgroup *tslack_cgroup,
> +		struct task_struct *tsk)
> +{
> +	if (tslack_cgroup->min_slack_ns > tsk->timer_slack_ns)
> +		tsk->timer_slack_ns = tslack_cgroup->min_slack_ns;
> +	else if (tslack_cgroup->max_slack_ns < tsk->timer_slack_ns)
> +		tsk->timer_slack_ns = tslack_cgroup->max_slack_ns;
> +
> +	if (tslack_cgroup->min_slack_ns > tsk->default_timer_slack_ns)
> +		tsk->default_timer_slack_ns = tslack_cgroup->min_slack_ns;
> +	else if (tslack_cgroup->max_slack_ns < tsk->default_timer_slack_ns)
> +		tsk->default_timer_slack_ns = tslack_cgroup->max_slack_ns;
> +}
> +
> +static void tslack_cgroup_attach(struct cgroup_subsys *subsys,
> +		struct cgroup *cgroup, struct cgroup *prev,
> +		struct task_struct *tsk, bool threadgroup)
> +{
> +	tslack_adjust_task(cgroup_to_tslack_cgroup(cgroup), tsk);
> +}
> +
> +static int tslack_write_set_slack_ns(struct cgroup *cgroup, struct cftype *cft,
> +		u64 val)
> +{
> +	struct timer_slack_cgroup *tslack_cgroup;
> +	struct cgroup_iter it;
> +	struct task_struct *task;
> +
> +	tslack_cgroup = cgroup_to_tslack_cgroup(cgroup);
> +	if (!val || val < tslack_cgroup->min_slack_ns ||

Why is a val of 0 disallowed? I know having slack is good, but for
an administrator or tool that doesn't care about number of wakeups
and cares more about wringing out performance a slack of
0 seems acceptable. Is this just here to be consistent with the
values passed in via prctl?

> +			val > tslack_cgroup->max_slack_ns )
> +		return -EINVAL;

Shouldn't it be EPERM and not EINVAL?

The write(2) man page says: "Other errors may occur, depending on the
object connected to fd." So I think EPERM is fine and more descriptive.

> +
> +	/* Change timer slack value for all tasks in the cgroup */
> +	cgroup_iter_start(cgroup, &it);
> +	while ((task = cgroup_iter_next(cgroup, &it)))
> +		task->timer_slack_ns = val;
> +	cgroup_iter_end(cgroup, &it);
> +
> +	return 0;
> +}
> +
> +static u64 tslack_read_range(struct cgroup *cgroup, struct cftype *cft)
> +{
> +	struct timer_slack_cgroup *tslack_cgroup;
> +
> +	tslack_cgroup = cgroup_to_tslack_cgroup(cgroup);
> +	switch (cft->private) {
> +	case TIMER_SLACK_MIN:
> +		return tslack_cgroup->min_slack_ns;
> +	case TIMER_SLACK_MAX:
> +		return tslack_cgroup->max_slack_ns;
> +	default:
> +		BUG();
> +	};
> +}
> +
> +static int tslack_write_range(struct cgroup *cgroup, struct cftype *cft,
> +		u64 val)
> +{
> +	struct timer_slack_cgroup *tslack_cgroup;
> +	struct cgroup_iter it;
> +	struct task_struct *task;
> +
> +	if (!val)
> +		return -EINVAL;

Same question re: 0 slack...

> +
> +	tslack_cgroup = cgroup_to_tslack_cgroup(cgroup);
> +	switch (cft->private) {
> +	case TIMER_SLACK_MIN:
> +		if (val > tslack_cgroup->max_slack_ns)
> +			return -EINVAL;
> +		tslack_cgroup->min_slack_ns = val;
> +		break;
> +	case TIMER_SLACK_MAX:
> +		if (val < tslack_cgroup->min_slack_ns)
> +			return -EINVAL;
> +		tslack_cgroup->max_slack_ns = val;
> +		break;
> +	default:
> +		BUG();
> +	}
> +
> +	/*
> +	 * Adjust timer slack value for all tasks in the cgroup to fit
> +	 * min-max range.
> +	 */
> +	cgroup_iter_start(cgroup, &it);
> +	while ((task = cgroup_iter_next(cgroup, &it)))
> +		tslack_adjust_task(tslack_cgroup, task);
> +	cgroup_iter_end(cgroup, &it);
> +
> +	return 0;
> +}
> +
> +static struct cftype files[] = {
> +	{
> +		.name = "set_slack_ns",
> +		.write_u64 = tslack_write_set_slack_ns,
> +	},
> +	{
> +		.name = "min_slack_ns",
> +		.private = TIMER_SLACK_MIN,
> +		.read_u64 = tslack_read_range,
> +		.write_u64 = tslack_write_range,
> +	},
> +	{
> +		.name = "max_slack_ns",
> +		.private = TIMER_SLACK_MAX,
> +		.read_u64 = tslack_read_range,
> +		.write_u64 = tslack_write_range,
> +	},
> +};

Looks good, thanks!

> +
> +static int tslack_cgroup_populate(struct cgroup_subsys *subsys,
> +		struct cgroup *cgroup)
> +{
> +	return cgroup_add_files(cgroup, subsys, files, ARRAY_SIZE(files));
> +}
> +
> +struct cgroup_subsys timer_slack_subsys = {
> +	.name = "timer_slack",
> +	.module = THIS_MODULE,
> +#ifdef CONFIG_CGROUP_TIMER_SLACK
> +	.subsys_id = timer_slack_subsys_id,
> +#endif

Uhh... why is the #ifdef necessary? Won't it always be true if we're
compiling this file?

> +	.create = tslack_cgroup_create,
> +	.destroy = tslack_cgroup_destroy,
> +	.attach = tslack_cgroup_attach,
> +	.populate = tslack_cgroup_populate,
> +};
> +
> +static int __init init_cgroup_timer_slack(void)
> +{
> +	BUG_ON(timer_slack_check != dummy_timer_slack_check);

You could just save the old timer_slack_check value into a static pointer
and then restore it in exit_cgroup_timer_slack. That would eliminate the
need for one of the two EXPORTs below but you wouldn't be able to do this
BUG_ON check. I think that'd be a slight improvement.

> +	timer_slack_check = cgroup_timer_slack_check;
> +	return cgroup_load_subsys(&timer_slack_subsys);

Shouldn't you load the subsystem first then set the timer_slack_check
function pointer once you know that the load has succeeded?

> +}
> +
> +static void __exit exit_cgroup_timer_slack(void)
> +{
> +	BUG_ON(timer_slack_check != cgroup_timer_slack_check);
> +	timer_slack_check = dummy_timer_slack_check;
> +	cgroup_unload_subsys(&timer_slack_subsys);
> +}
> +
> +module_init(init_cgroup_timer_slack);
> +module_exit(exit_cgroup_timer_slack);
> +MODULE_LICENSE("GPL");
> diff --git a/kernel/sys.c b/kernel/sys.c
> index 18da702..57322a7 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -118,6 +118,16 @@ EXPORT_SYMBOL(cad_pid);
> 
>  void (*pm_power_off_prepare)(void);
> 
> +int dummy_timer_slack_check(struct task_struct *task, unsigned long slack_ns)
> +{
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(dummy_timer_slack_check);

As described previously, I don't think EXPORT of dummy_timer_slack_check is
necessary.

> +
> +int (*timer_slack_check)(struct task_struct *task, unsigned long slack_ns) =
> +	dummy_timer_slack_check;
> +EXPORT_SYMBOL_GPL(timer_slack_check);
> +
>  /*
>   * set the priority of a task
>   * - the caller must hold the RCU read lock
> @@ -1694,12 +1704,14 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
>  			error = current->timer_slack_ns;
>  			break;
>  		case PR_SET_TIMERSLACK:
> -			if (arg2 <= 0)
> -				current->timer_slack_ns =
> -					current->default_timer_slack_ns;
> -			else
> -				current->timer_slack_ns = arg2;
> -			error = 0;
> +			if (arg2 <= 0) {
> +				me->timer_slack_ns = me->default_timer_slack_ns;
> +				break;
> +			}
> +
> +			error = timer_slack_check(me, arg2);
> +			if (!error)
> +				me->timer_slack_ns = arg2;
>  			break;
>  		case PR_MCE_KILL:
>  			if (arg4 | arg5)
> -- 
> 1.7.3.5
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH, v3 2/2] cgroups: introduce timer slack subsystem
  2011-02-02 23:23   ` Paul Menage
@ 2011-02-03  5:48     ` Matt Helsley
  2011-02-03  9:28     ` Kirill A. Shutemov
  1 sibling, 0 replies; 22+ messages in thread
From: Matt Helsley @ 2011-02-03  5:48 UTC (permalink / raw)
  To: Paul Menage
  Cc: Kirill A. Shutsemov, Li Zefan, containers, jacob.jun.pan,
	Arjan van de Ven, linux-kernel, Matt Helsley

On Wed, Feb 02, 2011 at 03:23:15PM -0800, Paul Menage wrote:
> On Wed, Feb 2, 2011 at 12:47 PM, Kirill A. Shutsemov
<kirill@shutemov.name> wrote:
> > +static int tslack_write_range(struct cgroup *cgroup, struct cftype *cft,
> > +               u64 val)
> > +{
> > +       struct timer_slack_cgroup *tslack_cgroup;
> > +       struct cgroup_iter it;
> > +       struct task_struct *task;
> > +
> > +       if (!val)
> > +               return -EINVAL;
> > +
> > +       tslack_cgroup = cgroup_to_tslack_cgroup(cgroup);
> > +       switch (cft->private) {
> > +       case TIMER_SLACK_MIN:
> > +               if (val > tslack_cgroup->max_slack_ns)
> > +                       return -EINVAL;
> > +               tslack_cgroup->min_slack_ns = val;
> > +               break;
> > +       case TIMER_SLACK_MAX:
> > +               if (val < tslack_cgroup->min_slack_ns)
> > +                       return -EINVAL;
> > +               tslack_cgroup->max_slack_ns = val;
> > +               break;
> > +       default:
> > +               BUG();
> > +       }
> > +
> 
> Don't we want to keep the min/max applied hierarchically as well? i.e.
> a child can't set its min/max outside the range of its parents?

That was my expectation as well.

Cheers,
	-Matt Helsley

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH, v3 2/2] cgroups: introduce timer slack subsystem
  2011-02-02 22:56   ` jacob pan
@ 2011-02-03  9:22     ` Kirill A. Shutemov
  2011-02-03 17:51       ` Jacob Pan
  0 siblings, 1 reply; 22+ messages in thread
From: Kirill A. Shutemov @ 2011-02-03  9:22 UTC (permalink / raw)
  To: jacob pan
  Cc: Paul Menage, Li Zefan, containers, Arjan van de Ven,
	linux-kernel, Matt Helsley, Paul E. McKenney

On Wed, Feb 02, 2011 at 02:56:05PM -0800, jacob pan wrote:
> On Wed,  2 Feb 2011 22:47:36 +0200
> "Kirill A. Shutsemov" <kirill@shutemov.name> wrote:
> 
> > From: Kirill A. Shutemov <kirill@shutemov.name>
> > 
> > Provides a way of tasks grouping by timer slack value. Introduces per
> > cgroup max and min timer slack value. When a task attaches to a
> > cgroup, its timer slack value adjusts (if needed) to fit min-max
> > range.
> > 
> > It also provides a way to set timer slack value for all tasks in the
> > cgroup at once.
> > 
> > This functionality is useful in mobile devices where certain
> > background apps are attached to a cgroup and minimum wakeups are
> > desired.
> > 
> > Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
> > Idea-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > ---
> >  include/linux/cgroup_subsys.h |    6 +
> >  include/linux/init_task.h     |    4 +-
> >  init/Kconfig                  |   10 ++
> >  kernel/Makefile               |    1 +
> >  kernel/cgroup_timer_slack.c   |  242
> > +++++++++++++++++++++++++++++++++++++++++
> 
> > +
> > +static int cgroup_timer_slack_check(struct task_struct *task,
> > +		unsigned long slack_ns)
> > +{
> > +	struct cgroup_subsys_state *css;
> > +	struct timer_slack_cgroup *tslack_cgroup;
> > +
> 
> use rcu_read_lock()/unlock?
> lockdep complains unprotected rcu dereference check.

Hmm.. I'm not sure, but it looks like lockdep false positive similar to
cgroup_freezer's (see commit 8b46f88084).
Paul, am I right?

> > +	css = task_subsys_state(task, timer_slack_subsys.subsys_id);
> > +	tslack_cgroup = container_of(css, struct timer_slack_cgroup,
> > css); +
> > +	if (slack_ns < tslack_cgroup->min_slack_ns)
> > +		return -EPERM;
> > +	if (slack_ns > tslack_cgroup->max_slack_ns)
> > +		return -EPERM;
> > +	return 0;
> > +}
> > +
> 
> > +
> > +static struct cftype files[] = {
> > +	{
> > +		.name = "set_slack_ns",
> > +		.write_u64 = tslack_write_set_slack_ns,
> > +	},
> should we also allow reading of the current slack_ns?

There is no 'current slack_ns' for a cgroup since any process free to
change it with prctl().

> > +	{
> > +		.name = "min_slack_ns",
> > +		.private = TIMER_SLACK_MIN,
> > +		.read_u64 = tslack_read_range,
> > +		.write_u64 = tslack_write_range,
> > +	},
> > +	{
> > +		.name = "max_slack_ns",
> > +		.private = TIMER_SLACK_MAX,
> > +		.read_u64 = tslack_read_range,
> > +		.write_u64 = tslack_write_range,
> > +	},
> > +};
> > +

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH, v3 2/2] cgroups: introduce timer slack subsystem
  2011-02-02 23:23   ` Paul Menage
  2011-02-03  5:48     ` Matt Helsley
@ 2011-02-03  9:28     ` Kirill A. Shutemov
  1 sibling, 0 replies; 22+ messages in thread
From: Kirill A. Shutemov @ 2011-02-03  9:28 UTC (permalink / raw)
  To: Paul Menage
  Cc: Li Zefan, containers, jacob.jun.pan, Arjan van de Ven,
	linux-kernel, Matt Helsley

On Wed, Feb 02, 2011 at 03:23:15PM -0800, Paul Menage wrote:
> On Wed, Feb 2, 2011 at 12:47 PM, Kirill A. Shutsemov
> <kirill@shutemov.name> wrote:
> > From: Kirill A. Shutemov <kirill@shutemov.name>
> >
> > Provides a way of tasks grouping by timer slack value. Introduces per
> > cgroup max and min timer slack value. When a task attaches to a cgroup,
> > its timer slack value adjusts (if needed) to fit min-max range.
> >
> > It also provides a way to set timer slack value for all tasks in the
> > cgroup at once.
> >
> > This functionality is useful in mobile devices where certain background
> > apps are attached to a cgroup and minimum wakeups are desired.
> 
> If you really want to be able to make this modular,

Why not?

> I'd be inclined to
> make the check_timer_slack hook just default to NULL, rather than
> introducing dummy_timer_slack_check()

Good point.

> > +
> > +static int tslack_write_range(struct cgroup *cgroup, struct cftype *cft,
> > +               u64 val)
> > +{
> > +       struct timer_slack_cgroup *tslack_cgroup;
> > +       struct cgroup_iter it;
> > +       struct task_struct *task;
> > +
> > +       if (!val)
> > +               return -EINVAL;
> > +
> > +       tslack_cgroup = cgroup_to_tslack_cgroup(cgroup);
> > +       switch (cft->private) {
> > +       case TIMER_SLACK_MIN:
> > +               if (val > tslack_cgroup->max_slack_ns)
> > +                       return -EINVAL;
> > +               tslack_cgroup->min_slack_ns = val;
> > +               break;
> > +       case TIMER_SLACK_MAX:
> > +               if (val < tslack_cgroup->min_slack_ns)
> > +                       return -EINVAL;
> > +               tslack_cgroup->max_slack_ns = val;
> > +               break;
> > +       default:
> > +               BUG();
> > +       }
> > +
> 
> Don't we want to keep the min/max applied hierarchically as well? i.e.
> a child can't set its min/max outside the range of its parents?

Ok, I'll implement it.

> > +
> > +static int __init init_cgroup_timer_slack(void)
> > +{
> > +       BUG_ON(timer_slack_check != dummy_timer_slack_check);
> 
> Better to make this just fail the initialization if someone else has
> already claimed the hook, rather than crashing.

Ok.

Thanks, for reviewing.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH, v3 2/2] cgroups: introduce timer slack subsystem
  2011-02-03  5:46   ` Matt Helsley
@ 2011-02-03  9:41     ` Kirill A. Shutemov
  2011-02-06  2:49       ` Matt Helsley
  0 siblings, 1 reply; 22+ messages in thread
From: Kirill A. Shutemov @ 2011-02-03  9:41 UTC (permalink / raw)
  To: Matt Helsley
  Cc: Paul Menage, Li Zefan, containers, jacob.jun.pan,
	Arjan van de Ven, linux-kernel

On Wed, Feb 02, 2011 at 09:46:16PM -0800, Matt Helsley wrote:
> On Wed, Feb 02, 2011 at 10:47:36PM +0200, Kirill A. Shutsemov wrote:
> > From: Kirill A. Shutemov <kirill@shutemov.name>
> > 
> > Provides a way of tasks grouping by timer slack value. Introduces per
> > cgroup max and min timer slack value. When a task attaches to a cgroup,
> > its timer slack value adjusts (if needed) to fit min-max range.
> > 
> > It also provides a way to set timer slack value for all tasks in the
> > cgroup at once.
> > 
> > This functionality is useful in mobile devices where certain background
> > apps are attached to a cgroup and minimum wakeups are desired.
> > 
> > Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
> > Idea-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > ---
> >  include/linux/cgroup_subsys.h |    6 +
> >  include/linux/init_task.h     |    4 +-
> >  init/Kconfig                  |   10 ++
> >  kernel/Makefile               |    1 +
> >  kernel/cgroup_timer_slack.c   |  242 +++++++++++++++++++++++++++++++++++++++++
> >  kernel/sys.c                  |   24 +++-
> >  6 files changed, 280 insertions(+), 7 deletions(-)
> >  create mode 100644 kernel/cgroup_timer_slack.c
> > 
> > diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
> > index ccefff0..e399228 100644
> > --- a/include/linux/cgroup_subsys.h
> > +++ b/include/linux/cgroup_subsys.h
> > @@ -66,3 +66,9 @@ SUBSYS(blkio)
> >  #endif
> > 
> >  /* */
> > +
> > +#ifdef CONFIG_CGROUP_TIMER_SLACK
> > +SUBSYS(timer_slack)
> > +#endif
> > +
> > +/* */
> > diff --git a/include/linux/init_task.h b/include/linux/init_task.h
> > index caa151f..48eca8f 100644
> > --- a/include/linux/init_task.h
> > +++ b/include/linux/init_task.h
> > @@ -124,6 +124,8 @@ extern struct cred init_cred;
> >  # define INIT_PERF_EVENTS(tsk)
> >  #endif
> > 
> > +#define TIMER_SLACK_NS_DEFAULT 50000
> > +
> >  /*
> >   *  INIT_TASK is used to set up the first task table, touch at
> >   * your own risk!. Base=0, limit=0x1fffff (=2MB)
> > @@ -177,7 +179,7 @@ extern struct cred init_cred;
> >  	.cpu_timers	= INIT_CPU_TIMERS(tsk.cpu_timers),		\
> >  	.fs_excl	= ATOMIC_INIT(0),				\
> >  	.pi_lock	= __RAW_SPIN_LOCK_UNLOCKED(tsk.pi_lock),	\
> > -	.timer_slack_ns = 50000, /* 50 usec default slack */		\
> > +	.timer_slack_ns = TIMER_SLACK_NS_DEFAULT,			\
> >  	.pids = {							\
> >  		[PIDTYPE_PID]  = INIT_PID_LINK(PIDTYPE_PID),		\
> >  		[PIDTYPE_PGID] = INIT_PID_LINK(PIDTYPE_PGID),		\
> > diff --git a/init/Kconfig b/init/Kconfig
> > index be788c0..f21b4ce 100644
> > --- a/init/Kconfig
> > +++ b/init/Kconfig
> > @@ -596,6 +596,16 @@ config CGROUP_FREEZER
> >  	  Provides a way to freeze and unfreeze all tasks in a
> >  	  cgroup.
> > 
> > +config CGROUP_TIMER_SLACK
> > +	tristate "Timer slack cgroup subsystem"
> > +	help
> > +	  Provides a way of tasks grouping by timer slack value.
> > +	  Introduces per cgroup timer slack value which will override
> > +	  the default timer slack value once a task is attached to a
> > +	  cgroup.
> > +	  It's useful in mobile devices where certain background apps
> > +	  are attached to a cgroup and minimum wakeups are desired.
> 
> Again -- I don't think "minimum" is the right choice of words here.

Ok, I'll fix it.

> > +
> >  config CGROUP_DEVICE
> >  	bool "Device controller for cgroups"
> >  	help
> > diff --git a/kernel/Makefile b/kernel/Makefile
> > index 353d3fe..0b60239 100644
> > --- a/kernel/Makefile
> > +++ b/kernel/Makefile
> > @@ -61,6 +61,7 @@ obj-$(CONFIG_BACKTRACE_SELF_TEST) += backtracetest.o
> >  obj-$(CONFIG_COMPAT) += compat.o
> >  obj-$(CONFIG_CGROUPS) += cgroup.o
> >  obj-$(CONFIG_CGROUP_FREEZER) += cgroup_freezer.o
> > +obj-$(CONFIG_CGROUP_TIMER_SLACK) += cgroup_timer_slack.o
> >  obj-$(CONFIG_CPUSETS) += cpuset.o
> >  obj-$(CONFIG_CGROUP_NS) += ns_cgroup.o
> >  obj-$(CONFIG_UTS_NS) += utsname.o
> > diff --git a/kernel/cgroup_timer_slack.c b/kernel/cgroup_timer_slack.c
> > new file mode 100644
> > index 0000000..a343a50
> > --- /dev/null
> > +++ b/kernel/cgroup_timer_slack.c
> > @@ -0,0 +1,242 @@
> > +/*
> > + * cgroup_timer_slack.c - control group timer slack subsystem
> > + *
> > + * Copyright Nokia Corparation, 2011
> > + * Author: Kirill A. Shutemov
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License as published by
> > + * the Free Software Foundation; either version 2 of the License, or
> > + * (at your option) any later version.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + */
> > +#include <linux/cgroup.h>
> > +#include <linux/init_task.h>
> > +#include <linux/module.h>
> > +#include <linux/slab.h>
> > +
> > +struct cgroup_subsys timer_slack_subsys;
> > +struct timer_slack_cgroup {
> > +	struct cgroup_subsys_state css;
> > +	unsigned long min_slack_ns;
> > +	unsigned long max_slack_ns;
> > +};
> > +
> > +enum {
> > +	TIMER_SLACK_MIN,
> > +	TIMER_SLACK_MAX,
> > +};
> > +
> > +int dummy_timer_slack_check(struct task_struct *task, unsigned long slack_ns);
> 
> The above should be "extern" shouldn't it? Also, I think there's a way
> to eliminate it (more below).
> 
> > +extern int (*timer_slack_check)(struct task_struct *task,
> > +		unsigned long slack_ns);
> > +
> > +static struct timer_slack_cgroup *cgroup_to_tslack_cgroup(struct cgroup *cgroup)
> > +{
> > +	struct cgroup_subsys_state *css;
> > +
> > +	css = cgroup_subsys_state(cgroup, timer_slack_subsys.subsys_id);
> > +	return container_of(css, struct timer_slack_cgroup, css);
> > +}
> > +
> > +static int cgroup_timer_slack_check(struct task_struct *task,
> > +		unsigned long slack_ns)
> 
> nit: rename to cgroup_timer_slack_allowed() ?

cgroup_is_timer_slack_allowed() ?

> > +{
> > +	struct cgroup_subsys_state *css;
> > +	struct timer_slack_cgroup *tslack_cgroup;
> > +
> > +	css = task_subsys_state(task, timer_slack_subsys.subsys_id);
> > +	tslack_cgroup = container_of(css, struct timer_slack_cgroup, css);
> > +
> > +	if (slack_ns < tslack_cgroup->min_slack_ns)
> > +		return -EPERM;
> > +	if (slack_ns > tslack_cgroup->max_slack_ns)
> > +		return -EPERM;
> > +	return 0;
> > +}
> > +
> > +static struct cgroup_subsys_state *
> > +tslack_cgroup_create(struct cgroup_subsys *subsys, struct cgroup *cgroup)
> > +{
> > +	struct timer_slack_cgroup *tslack_cgroup;
> > +
> > +	tslack_cgroup = kmalloc(sizeof(*tslack_cgroup), GFP_KERNEL);
> > +	if (!tslack_cgroup)
> > +		return ERR_PTR(-ENOMEM);
> > +
> > +	if (cgroup->parent) {
> > +		struct timer_slack_cgroup *parent;
> > +		parent = cgroup_to_tslack_cgroup(cgroup->parent);
> > +		tslack_cgroup->min_slack_ns = parent->min_slack_ns;
> > +		tslack_cgroup->max_slack_ns = parent->max_slack_ns;
> > +	} else {
> > +		tslack_cgroup->min_slack_ns = 0UL;
> > +		tslack_cgroup->max_slack_ns = ULONG_MAX;
> > +	}
> > +
> > +	return &tslack_cgroup->css;
> > +}
> > +
> > +static void tslack_cgroup_destroy(struct cgroup_subsys *subsys,
> > +		struct cgroup *cgroup)
> > +{
> > +	kfree(cgroup_to_tslack_cgroup(cgroup));
> > +}
> > +
> > +/*
> > + * Adjust ->timer_slack_ns and ->default_max_slack_ns of the task to fit
> > + * limits of the cgroup.
> > + */
> > +static void tslack_adjust_task(struct timer_slack_cgroup *tslack_cgroup,
> > +		struct task_struct *tsk)
> > +{
> > +	if (tslack_cgroup->min_slack_ns > tsk->timer_slack_ns)
> > +		tsk->timer_slack_ns = tslack_cgroup->min_slack_ns;
> > +	else if (tslack_cgroup->max_slack_ns < tsk->timer_slack_ns)
> > +		tsk->timer_slack_ns = tslack_cgroup->max_slack_ns;
> > +
> > +	if (tslack_cgroup->min_slack_ns > tsk->default_timer_slack_ns)
> > +		tsk->default_timer_slack_ns = tslack_cgroup->min_slack_ns;
> > +	else if (tslack_cgroup->max_slack_ns < tsk->default_timer_slack_ns)
> > +		tsk->default_timer_slack_ns = tslack_cgroup->max_slack_ns;
> > +}
> > +
> > +static void tslack_cgroup_attach(struct cgroup_subsys *subsys,
> > +		struct cgroup *cgroup, struct cgroup *prev,
> > +		struct task_struct *tsk, bool threadgroup)
> > +{
> > +	tslack_adjust_task(cgroup_to_tslack_cgroup(cgroup), tsk);
> > +}
> > +
> > +static int tslack_write_set_slack_ns(struct cgroup *cgroup, struct cftype *cft,
> > +		u64 val)
> > +{
> > +	struct timer_slack_cgroup *tslack_cgroup;
> > +	struct cgroup_iter it;
> > +	struct task_struct *task;
> > +
> > +	tslack_cgroup = cgroup_to_tslack_cgroup(cgroup);
> > +	if (!val || val < tslack_cgroup->min_slack_ns ||
> 
> Why is a val of 0 disallowed? I know having slack is good, but for
> an administrator or tool that doesn't care about number of wakeups
> and cares more about wringing out performance a slack of
> 0 seems acceptable. Is this just here to be consistent with the
> values passed in via prctl?

Yes, it's to consistent with the prctl(). I don't think that it's good
idea to allow to set timer_slack outside of range prctl() allows. It may
lead to interface abuse.

> > +			val > tslack_cgroup->max_slack_ns )
> > +		return -EINVAL;
> 
> Shouldn't it be EPERM and not EINVAL?
> 
> The write(2) man page says: "Other errors may occur, depending on the
> object connected to fd." So I think EPERM is fine and more descriptive.

What do you think about -EINVAL for (val == 0) and -EPERM for rest?

> > +
> > +	/* Change timer slack value for all tasks in the cgroup */
> > +	cgroup_iter_start(cgroup, &it);
> > +	while ((task = cgroup_iter_next(cgroup, &it)))
> > +		task->timer_slack_ns = val;
> > +	cgroup_iter_end(cgroup, &it);
> > +
> > +	return 0;
> > +}
> > +
> > +static u64 tslack_read_range(struct cgroup *cgroup, struct cftype *cft)
> > +{
> > +	struct timer_slack_cgroup *tslack_cgroup;
> > +
> > +	tslack_cgroup = cgroup_to_tslack_cgroup(cgroup);
> > +	switch (cft->private) {
> > +	case TIMER_SLACK_MIN:
> > +		return tslack_cgroup->min_slack_ns;
> > +	case TIMER_SLACK_MAX:
> > +		return tslack_cgroup->max_slack_ns;
> > +	default:
> > +		BUG();
> > +	};
> > +}
> > +
> > +static int tslack_write_range(struct cgroup *cgroup, struct cftype *cft,
> > +		u64 val)
> > +{
> > +	struct timer_slack_cgroup *tslack_cgroup;
> > +	struct cgroup_iter it;
> > +	struct task_struct *task;
> > +
> > +	if (!val)
> > +		return -EINVAL;
> 
> Same question re: 0 slack...
> 
> > +
> > +	tslack_cgroup = cgroup_to_tslack_cgroup(cgroup);
> > +	switch (cft->private) {
> > +	case TIMER_SLACK_MIN:
> > +		if (val > tslack_cgroup->max_slack_ns)
> > +			return -EINVAL;
> > +		tslack_cgroup->min_slack_ns = val;
> > +		break;
> > +	case TIMER_SLACK_MAX:
> > +		if (val < tslack_cgroup->min_slack_ns)
> > +			return -EINVAL;
> > +		tslack_cgroup->max_slack_ns = val;
> > +		break;
> > +	default:
> > +		BUG();
> > +	}
> > +
> > +	/*
> > +	 * Adjust timer slack value for all tasks in the cgroup to fit
> > +	 * min-max range.
> > +	 */
> > +	cgroup_iter_start(cgroup, &it);
> > +	while ((task = cgroup_iter_next(cgroup, &it)))
> > +		tslack_adjust_task(tslack_cgroup, task);
> > +	cgroup_iter_end(cgroup, &it);
> > +
> > +	return 0;
> > +}
> > +
> > +static struct cftype files[] = {
> > +	{
> > +		.name = "set_slack_ns",
> > +		.write_u64 = tslack_write_set_slack_ns,
> > +	},
> > +	{
> > +		.name = "min_slack_ns",
> > +		.private = TIMER_SLACK_MIN,
> > +		.read_u64 = tslack_read_range,
> > +		.write_u64 = tslack_write_range,
> > +	},
> > +	{
> > +		.name = "max_slack_ns",
> > +		.private = TIMER_SLACK_MAX,
> > +		.read_u64 = tslack_read_range,
> > +		.write_u64 = tslack_write_range,
> > +	},
> > +};
> 
> Looks good, thanks!
> 
> > +
> > +static int tslack_cgroup_populate(struct cgroup_subsys *subsys,
> > +		struct cgroup *cgroup)
> > +{
> > +	return cgroup_add_files(cgroup, subsys, files, ARRAY_SIZE(files));
> > +}
> > +
> > +struct cgroup_subsys timer_slack_subsys = {
> > +	.name = "timer_slack",
> > +	.module = THIS_MODULE,
> > +#ifdef CONFIG_CGROUP_TIMER_SLACK
> > +	.subsys_id = timer_slack_subsys_id,
> > +#endif
> 
> Uhh... why is the #ifdef necessary? Won't it always be true if we're
> compiling this file?

Try to compile it as module. ;)

> > +	.create = tslack_cgroup_create,
> > +	.destroy = tslack_cgroup_destroy,
> > +	.attach = tslack_cgroup_attach,
> > +	.populate = tslack_cgroup_populate,
> > +};
> > +
> > +static int __init init_cgroup_timer_slack(void)
> > +{
> > +	BUG_ON(timer_slack_check != dummy_timer_slack_check);
> 
> You could just save the old timer_slack_check value into a static pointer
> and then restore it in exit_cgroup_timer_slack. That would eliminate the
> need for one of the two EXPORTs below but you wouldn't be able to do this
> BUG_ON check. I think that'd be a slight improvement.
> 
> > +	timer_slack_check = cgroup_timer_slack_check;
> > +	return cgroup_load_subsys(&timer_slack_subsys);
> 
> Shouldn't you load the subsystem first then set the timer_slack_check
> function pointer once you know that the load has succeeded?

I'll rework this part.

Thanks for reviewing.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH, v3 2/2] cgroups: introduce timer slack subsystem
  2011-02-03  9:22     ` Kirill A. Shutemov
@ 2011-02-03 17:51       ` Jacob Pan
  2011-02-03 18:12         ` Paul Menage
  0 siblings, 1 reply; 22+ messages in thread
From: Jacob Pan @ 2011-02-03 17:51 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Paul Menage, Li Zefan, containers, Arjan van de Ven,
	linux-kernel, Matt Helsley, Paul E. McKenney

On Thu, 3 Feb 2011 11:22:29 +0200
"Kirill A. Shutemov" <kirill@shutemov.name> wrote:

> On Wed, Feb 02, 2011 at 02:56:05PM -0800, jacob pan wrote:
> > On Wed,  2 Feb 2011 22:47:36 +0200
> > "Kirill A. Shutsemov" <kirill@shutemov.name> wrote:
> > 
> > > From: Kirill A. Shutemov <kirill@shutemov.name>
> > > 
> > > Provides a way of tasks grouping by timer slack value. Introduces
> > > per cgroup max and min timer slack value. When a task attaches to
> > > a cgroup, its timer slack value adjusts (if needed) to fit min-max
> > > range.
> > > 
> > > It also provides a way to set timer slack value for all tasks in
> > > the cgroup at once.
> > > 
> > > This functionality is useful in mobile devices where certain
> > > background apps are attached to a cgroup and minimum wakeups are
> > > desired.
> > > 
> > > Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
> > > Idea-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > > ---
> > >  include/linux/cgroup_subsys.h |    6 +
> > >  include/linux/init_task.h     |    4 +-
> > >  init/Kconfig                  |   10 ++
> > >  kernel/Makefile               |    1 +
> > >  kernel/cgroup_timer_slack.c   |  242
> > > +++++++++++++++++++++++++++++++++++++++++
> > 


> > 
> > > +
> > > +static struct cftype files[] = {
> > > +	{
> > > +		.name = "set_slack_ns",
> > > +		.write_u64 = tslack_write_set_slack_ns,
> > > +	},
> > should we also allow reading of the current slack_ns?
> 
> There is no 'current slack_ns' for a cgroup since any process free to
> change it with prctl().
> 

I think there is still a need for current slack_ns. e.g. if i created a
cgroup_1 then attach task_A and task_B to it such that their
individual timer_slack got adjusted based on the limit in the
cgroup. Now I set cgroup_1 timer_slack to be ts_1, then timer_slack
for both task_A and task_B are set to ts_1.

If i attach another task_C to cgroup_1, timer_slack for task_C will be
adjusted based on min/max setting of cgroup_1, which can be different
than ts_1. User has to manually set cgroup time_slack again to make them
identical.

I think this logic defeats the purpose of having timer_slack
subsystem in the first place. IMHO, the original intention was to have
grouping effect of tasks in the cgroup.

So my suggestion is to keep a per cgroup current timer_slack value,
which can be default to the system default at 50us. Like Arjan
suggested, we can enforce the timer_slack value in the timer code when
it is used. This way we can solve another problem where when a task is
detached from the cgroup, it would be desirable to restore its original
slack value.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH, v3 2/2] cgroups: introduce timer slack subsystem
  2011-02-03 17:51       ` Jacob Pan
@ 2011-02-03 18:12         ` Paul Menage
  2011-02-03 19:57           ` Jacob Pan
  0 siblings, 1 reply; 22+ messages in thread
From: Paul Menage @ 2011-02-03 18:12 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Kirill A. Shutemov, Li Zefan, containers, Arjan van de Ven,
	linux-kernel, Matt Helsley, Paul E. McKenney

On Thu, Feb 3, 2011 at 9:51 AM, Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:
>
> I think this logic defeats the purpose of having timer_slack
> subsystem in the first place. IMHO, the original intention was to have
> grouping effect of tasks in the cgroup.

You can get the semantics you want by just setting min_slack_ns = max_slack_ns.

Paul

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH, v3 2/2] cgroups: introduce timer slack subsystem
  2011-02-03 18:12         ` Paul Menage
@ 2011-02-03 19:57           ` Jacob Pan
  2011-02-04 13:34             ` Kirill A. Shutemov
  0 siblings, 1 reply; 22+ messages in thread
From: Jacob Pan @ 2011-02-03 19:57 UTC (permalink / raw)
  To: Paul Menage
  Cc: Kirill A. Shutemov, Li Zefan, containers, Arjan van de Ven,
	linux-kernel, Matt Helsley, Paul E. McKenney

On Thu, 3 Feb 2011 10:12:51 -0800
Paul Menage <menage@google.com> wrote:

> On Thu, Feb 3, 2011 at 9:51 AM, Jacob Pan
> <jacob.jun.pan@linux.intel.com> wrote:
> >
> > I think this logic defeats the purpose of having timer_slack
> > subsystem in the first place. IMHO, the original intention was to
> > have grouping effect of tasks in the cgroup.
> 
> You can get the semantics you want by just setting min_slack_ns =
> max_slack_ns.
> 
true. it will just make set fail when min = max. it is awkward and
counter intuitive when you want to change the group timer_slack. you
will have to move both min and max to clamp the value, where set
function can not be used.

In addition, when a parent changes min = max, I don't see the current
code enforce new settings on the children. Am i missing something?

In my use case, i want to put some apps into a managed group where
relaxed slack value is used, but when time comes to move the app out of
that cgroup, we would like to resore the original timer slack. I see
having a current value per cgroup can be useful if we let timer code
pick whether to use task slack value or the cgroup slack value.
Or we have to cache the old value per task

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH, v3 2/2] cgroups: introduce timer slack subsystem
  2011-02-03 19:57           ` Jacob Pan
@ 2011-02-04 13:34             ` Kirill A. Shutemov
  2011-02-04 17:27               ` Jacob Pan
  0 siblings, 1 reply; 22+ messages in thread
From: Kirill A. Shutemov @ 2011-02-04 13:34 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Paul Menage, Li Zefan, containers, Arjan van de Ven,
	linux-kernel, Matt Helsley, Paul E. McKenney

On Thu, Feb 03, 2011 at 11:57:43AM -0800, Jacob Pan wrote:
> On Thu, 3 Feb 2011 10:12:51 -0800
> Paul Menage <menage@google.com> wrote:
> 
> > On Thu, Feb 3, 2011 at 9:51 AM, Jacob Pan
> > <jacob.jun.pan@linux.intel.com> wrote:
> > >
> > > I think this logic defeats the purpose of having timer_slack
> > > subsystem in the first place. IMHO, the original intention was to
> > > have grouping effect of tasks in the cgroup.
> > 
> > You can get the semantics you want by just setting min_slack_ns =
> > max_slack_ns.
> > 
> true. it will just make set fail when min = max. it is awkward and
> counter intuitive when you want to change the group timer_slack. you
> will have to move both min and max to clamp the value, where set
> function can not be used.

Interface is very similar to /sys/devices/system/cpu/cpuX/cpufreq.
I think it's sane. If you want some extention, you can do it with
userspace helper.

> In addition, when a parent changes min = max, I don't see the current
> code enforce new settings on the children. Am i missing something?

I've missed it. I'll fix.

> In my use case, i want to put some apps into a managed group where
> relaxed slack value is used, but when time comes to move the app out of
> that cgroup, we would like to resore the original timer slack. I see
> having a current value per cgroup can be useful if we let timer code
> pick whether to use task slack value or the cgroup slack value.
> Or we have to cache the old value per task

What's mean "original timer slack" if you are free to move a task
between a lot of cgroups and process itself free to change it anytime?

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH, v3 2/2] cgroups: introduce timer slack subsystem
  2011-02-04 13:34             ` Kirill A. Shutemov
@ 2011-02-04 17:27               ` Jacob Pan
  2011-02-07  0:33                 ` Matt Helsley
  2011-02-07 11:06                 ` Kirill A. Shutemov
  0 siblings, 2 replies; 22+ messages in thread
From: Jacob Pan @ 2011-02-04 17:27 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Paul Menage, Li Zefan, containers, Arjan van de Ven,
	linux-kernel, Matt Helsley, Paul E. McKenney

On Fri, 4 Feb 2011 15:34:39 +0200
"Kirill A. Shutemov" <kirill@shutemov.name> wrote:

> On Thu, Feb 03, 2011 at 11:57:43AM -0800, Jacob Pan wrote:
> > On Thu, 3 Feb 2011 10:12:51 -0800
> > Paul Menage <menage@google.com> wrote:
> > 
> > > On Thu, Feb 3, 2011 at 9:51 AM, Jacob Pan
> > > <jacob.jun.pan@linux.intel.com> wrote:
> > > >
> > > > I think this logic defeats the purpose of having timer_slack
> > > > subsystem in the first place. IMHO, the original intention was
> > > > to have grouping effect of tasks in the cgroup.
> > > 
> > > You can get the semantics you want by just setting min_slack_ns =
> > > max_slack_ns.
> > > 
> > true. it will just make set fail when min = max. it is awkward and
> > counter intuitive when you want to change the group timer_slack. you
> > will have to move both min and max to clamp the value, where set
> > function can not be used.
> 
> Interface is very similar to /sys/devices/system/cpu/cpuX/cpufreq.
> I think it's sane. If you want some extention, you can do it with
> userspace helper.
> 
I don't disagree the current interface is usable. Just less intuitive.
The situation is different for cpufreq, where you
don't the situation of adding new entries to be adjusted in the
existing max-min range.

> > In addition, when a parent changes min = max, I don't see the
> > current code enforce new settings on the children. Am i missing
> > something?
> 
> I've missed it. I'll fix.
> 
> > In my use case, i want to put some apps into a managed group where
> > relaxed slack value is used, but when time comes to move the app
> > out of that cgroup, we would like to resore the original timer
> > slack. I see having a current value per cgroup can be useful if we
> > let timer code pick whether to use task slack value or the cgroup
> > slack value. Or we have to cache the old value per task
> 
> What's mean "original timer slack" if you are free to move a task
> between a lot of cgroups and process itself free to change it anytime?
> 

I need to manage tasks by a management software instead of letting the
task change timer_slack by itself. The goal is to make management
transparent and no modifications to the existing apps. Therefore, it is
desirable to automatically enforce timer_slack when the apps are in the
cgroup while automatically restore it when it is no longer under cgroup
management.
So the "original timer slack" can be the default 50us or whatever value
chosen by the task itself. But the app itself should not care or even be
aware of which cgroup it is in.

So here are two optoins i can think of
1. add a new variable called cg_timer_slack_ns to struct task_struct{}
cg_timer_slack_ns will be set by cgroup timer_slack subsystem, then we
can retain the original per task value in timer_slack_ns.
timer code will pick max(cg_timer_slack_ns, timer_slack_ns) if
cg_timer_slack_ns is set.

2. leave task_struct unchanged, add a current_timer_slack to the
cgroup. timer_slack cgroup does not modify per task timer_slack_ns.
similar to option #1, let timer code pick the timer_slack to use based
on whether the task is in timer_slack cgroup.

Any thoughts?

Thanks,
Jacob

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH, v3 2/2] cgroups: introduce timer slack subsystem
  2011-02-03  9:41     ` Kirill A. Shutemov
@ 2011-02-06  2:49       ` Matt Helsley
  2011-02-07  9:48         ` Kirill A. Shutemov
  0 siblings, 1 reply; 22+ messages in thread
From: Matt Helsley @ 2011-02-06  2:49 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Matt Helsley, Paul Menage, Li Zefan, containers, jacob.jun.pan,
	Arjan van de Ven, linux-kernel

On Thu, Feb 03, 2011 at 11:41:38AM +0200, Kirill A. Shutemov wrote:
> On Wed, Feb 02, 2011 at 09:46:16PM -0800, Matt Helsley wrote:
> > On Wed, Feb 02, 2011 at 10:47:36PM +0200, Kirill A. Shutsemov wrote:
> > > From: Kirill A. Shutemov <kirill@shutemov.name>

<snip>

> > > diff --git a/kernel/cgroup_timer_slack.c b/kernel/cgroup_timer_slack.c
> > > new file mode 100644
> > > index 0000000..a343a50
> > > --- /dev/null
> > > +++ b/kernel/cgroup_timer_slack.c

<snip>

> > > +static int tslack_write_set_slack_ns(struct cgroup *cgroup, struct cftype *cft,
> > > +		u64 val)
> > > +{
> > > +	struct timer_slack_cgroup *tslack_cgroup;
> > > +	struct cgroup_iter it;
> > > +	struct task_struct *task;
> > > +
> > > +	tslack_cgroup = cgroup_to_tslack_cgroup(cgroup);
> > > +	if (!val || val < tslack_cgroup->min_slack_ns ||
> > 
> > Why is a val of 0 disallowed? I know having slack is good, but for
> > an administrator or tool that doesn't care about number of wakeups
> > and cares more about wringing out performance a slack of
> > 0 seems acceptable. Is this just here to be consistent with the
> > values passed in via prctl?
> 
> Yes, it's to consistent with the prctl(). I don't think that it's good
> idea to allow to set timer_slack outside of range prctl() allows. It may
> lead to interface abuse.

Hmm, I was just thinking that 0 timer slack might be useful. But I
suppose you could just as easily set it to 1 and nobody would notice.

> > > +			val > tslack_cgroup->max_slack_ns )
> > > +		return -EINVAL;
> > 
> > Shouldn't it be EPERM and not EINVAL?
> > 
> > The write(2) man page says: "Other errors may occur, depending on the
> > object connected to fd." So I think EPERM is fine and more descriptive.
> 
> What do you think about -EINVAL for (val == 0) and -EPERM for rest?

OK, that makes sense to me given both of our points above.

Cheers,
	-Matt Helsley

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH, v3 2/2] cgroups: introduce timer slack subsystem
  2011-02-04 17:27               ` Jacob Pan
@ 2011-02-07  0:33                 ` Matt Helsley
  2011-02-07 11:06                 ` Kirill A. Shutemov
  1 sibling, 0 replies; 22+ messages in thread
From: Matt Helsley @ 2011-02-07  0:33 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Kirill A. Shutemov, Paul Menage, Li Zefan, containers,
	Arjan van de Ven, linux-kernel, Matt Helsley, Paul E. McKenney

On Fri, Feb 04, 2011 at 09:27:55AM -0800, Jacob Pan wrote:
> On Fri, 4 Feb 2011 15:34:39 +0200
> "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> 
> > On Thu, Feb 03, 2011 at 11:57:43AM -0800, Jacob Pan wrote:
> > > On Thu, 3 Feb 2011 10:12:51 -0800
> > > Paul Menage <menage@google.com> wrote:
> > > 
> > > > On Thu, Feb 3, 2011 at 9:51 AM, Jacob Pan
> > > > <jacob.jun.pan@linux.intel.com> wrote:
> > > > >
> > > > > I think this logic defeats the purpose of having timer_slack
> > > > > subsystem in the first place. IMHO, the original intention was
> > > > > to have grouping effect of tasks in the cgroup.
> > > > 
> > > > You can get the semantics you want by just setting min_slack_ns =
> > > > max_slack_ns.
> > > > 
> > > true. it will just make set fail when min = max. it is awkward and
> > > counter intuitive when you want to change the group timer_slack. you
> > > will have to move both min and max to clamp the value, where set
> > > function can not be used.
> > 
> > Interface is very similar to /sys/devices/system/cpu/cpuX/cpufreq.
> > I think it's sane. If you want some extention, you can do it with
> > userspace helper.
> > 
> I don't disagree the current interface is usable. Just less intuitive.
> The situation is different for cpufreq, where you
> don't the situation of adding new entries to be adjusted in the
> existing max-min range.

We can probably eliminate the upper end of the range for now -- I don't
see any practical reason for it and we can always add it later if I'm
being too short-sighted. :)

We could also probably eliminate the "set" interface and rely solely on
prctl() for that. The only thing needed is a "minimum" timer slack
applied for the cgroup. We could apply that minimum like the current
code does _or_ via the same method as option #2 which you presented below.

>
> 
> > > In addition, when a parent changes min = max, I don't see the
> > > current code enforce new settings on the children. Am i missing
> > > something?
> > 
> > I've missed it. I'll fix.

Sounds good. Allowing child cgroups also allow for nesting of containers or
imposing limits to timer slack on users and all of their containers.

> > 
> > > In my use case, i want to put some apps into a managed group where
> > > relaxed slack value is used, but when time comes to move the app
> > > out of that cgroup, we would like to resore the original timer
> > > slack. I see having a current value per cgroup can be useful if we
> > > let timer code pick whether to use task slack value or the cgroup
> > > slack value. Or we have to cache the old value per task
> > 
> > What's mean "original timer slack" if you are free to move a task
> > between a lot of cgroups and process itself free to change it anytime?
> > 
> 
> I need to manage tasks by a management software instead of letting the
> task change timer_slack by itself. The goal is to make management
> transparent and no modifications to the existing apps. Therefore, it is

Until those apps learn that there's a subsystem they can manipulate
too ;).

> desirable to automatically enforce timer_slack when the apps are in the
> cgroup while automatically restore it when it is no longer under cgroup
> management.
>
> So the "original timer slack" can be the default 50us or whatever value
> chosen by the task itself. But the app itself should not care or even be
> aware of which cgroup it is in.

I don't think anyone is arguing it should.

> 
> So here are two optoins i can think of
> 1. add a new variable called cg_timer_slack_ns to struct task_struct{}
> cg_timer_slack_ns will be set by cgroup timer_slack subsystem, then we
> can retain the original per task value in timer_slack_ns.
> timer code will pick max(cg_timer_slack_ns, timer_slack_ns) if
> cg_timer_slack_ns is set.
> 
> 2. leave task_struct unchanged, add a current_timer_slack to the
> cgroup. timer_slack cgroup does not modify per task timer_slack_ns.
> similar to option #1, let timer code pick the timer_slack to use based
> on whether the task is in timer_slack cgroup.

I really like #2 and strongly dislike #1 because cgroup subsystem
information should stay out of the task struct as much as possible.

Cheers,
	-Matt Helsley

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH, v3 2/2] cgroups: introduce timer slack subsystem
  2011-02-06  2:49       ` Matt Helsley
@ 2011-02-07  9:48         ` Kirill A. Shutemov
  0 siblings, 0 replies; 22+ messages in thread
From: Kirill A. Shutemov @ 2011-02-07  9:48 UTC (permalink / raw)
  To: Matt Helsley
  Cc: Paul Menage, Li Zefan, containers, jacob.jun.pan,
	Arjan van de Ven, linux-kernel

On Sat, Feb 05, 2011 at 06:49:51PM -0800, Matt Helsley wrote:
> On Thu, Feb 03, 2011 at 11:41:38AM +0200, Kirill A. Shutemov wrote:
> > On Wed, Feb 02, 2011 at 09:46:16PM -0800, Matt Helsley wrote:
> > > On Wed, Feb 02, 2011 at 10:47:36PM +0200, Kirill A. Shutsemov wrote:
> > > > From: Kirill A. Shutemov <kirill@shutemov.name>
> 
> <snip>
> 
> > > > diff --git a/kernel/cgroup_timer_slack.c b/kernel/cgroup_timer_slack.c
> > > > new file mode 100644
> > > > index 0000000..a343a50
> > > > --- /dev/null
> > > > +++ b/kernel/cgroup_timer_slack.c
> 
> <snip>
> 
> > > > +static int tslack_write_set_slack_ns(struct cgroup *cgroup, struct cftype *cft,
> > > > +		u64 val)
> > > > +{
> > > > +	struct timer_slack_cgroup *tslack_cgroup;
> > > > +	struct cgroup_iter it;
> > > > +	struct task_struct *task;
> > > > +
> > > > +	tslack_cgroup = cgroup_to_tslack_cgroup(cgroup);
> > > > +	if (!val || val < tslack_cgroup->min_slack_ns ||
> > > 
> > > Why is a val of 0 disallowed? I know having slack is good, but for
> > > an administrator or tool that doesn't care about number of wakeups
> > > and cares more about wringing out performance a slack of
> > > 0 seems acceptable. Is this just here to be consistent with the
> > > values passed in via prctl?
> > 
> > Yes, it's to consistent with the prctl(). I don't think that it's good
> > idea to allow to set timer_slack outside of range prctl() allows. It may
> > lead to interface abuse.
> 
> Hmm, I was just thinking that 0 timer slack might be useful. But I
> suppose you could just as easily set it to 1 and nobody would notice.

I've rechecked once again. it lookes cleaner to allow 0 as timer slack
value.
I allowed it in version 4 of the patchset.


-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH, v3 2/2] cgroups: introduce timer slack subsystem
  2011-02-04 17:27               ` Jacob Pan
  2011-02-07  0:33                 ` Matt Helsley
@ 2011-02-07 11:06                 ` Kirill A. Shutemov
  2011-02-07 17:20                   ` Jacob Pan
  1 sibling, 1 reply; 22+ messages in thread
From: Kirill A. Shutemov @ 2011-02-07 11:06 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Paul Menage, Li Zefan, containers, Arjan van de Ven,
	linux-kernel, Matt Helsley, Paul E. McKenney

On Fri, Feb 04, 2011 at 09:27:55AM -0800, Jacob Pan wrote:
> On Fri, 4 Feb 2011 15:34:39 +0200
> "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> > What's mean "original timer slack" if you are free to move a task
> > between a lot of cgroups and process itself free to change it anytime?
> > 
> 
> I need to manage tasks by a management software instead of letting the
> task change timer_slack by itself. The goal is to make management
> transparent and no modifications to the existing apps. Therefore, it is
> desirable to automatically enforce timer_slack when the apps are in the
> cgroup while automatically restore it when it is no longer under cgroup
> management.

Tasks are always under cgroup management. Root cgroup is still cgroup.

> So the "original timer slack" can be the default 50us or whatever value
> chosen by the task itself. But the app itself should not care or even be
> aware of which cgroup it is in.
> 
> So here are two optoins i can think of
> 1. add a new variable called cg_timer_slack_ns to struct task_struct{}
> cg_timer_slack_ns will be set by cgroup timer_slack subsystem, then we
> can retain the original per task value in timer_slack_ns.
> timer code will pick max(cg_timer_slack_ns, timer_slack_ns) if
> cg_timer_slack_ns is set.
> 
> 2. leave task_struct unchanged, add a current_timer_slack to the
> cgroup. timer_slack cgroup does not modify per task timer_slack_ns.
> similar to option #1, let timer code pick the timer_slack to use based
> on whether the task is in timer_slack cgroup.
> 
> Any thoughts?

I think it's over-engineering.

What about configuration like this:

 root_cgroup
 |--timer_slack.min_slack_ns = 0
 |--timer_slack.max_slack_ns = ULONG_MAX
 |--50us
 |  |--timer_slack.min_slack_ns = 50000
 |  |--timer_slack.max_slack_ns = 50000
 |--500us
    |--timer_slack.min_slack_ns = 500000
    |--timer_slack_max_slack_ns = ULONG_MAX

If you want a task allow to drive its timer_slack, just leave it in
root_cgroup.

It you want to drive timer_slack of a task, move it between 50us and 500um
based on your policy.
 
-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH, v3 2/2] cgroups: introduce timer slack subsystem
  2011-02-07 11:06                 ` Kirill A. Shutemov
@ 2011-02-07 17:20                   ` Jacob Pan
  2011-02-07 18:32                     ` Kirill A. Shutemov
  0 siblings, 1 reply; 22+ messages in thread
From: Jacob Pan @ 2011-02-07 17:20 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Paul Menage, Li Zefan, containers, Arjan van de Ven,
	linux-kernel, Matt Helsley, Paul E. McKenney

On Mon, 7 Feb 2011 13:06:03 +0200
"Kirill A. Shutemov" <kirill@shutemov.name> wrote:

> On Fri, Feb 04, 2011 at 09:27:55AM -0800, Jacob Pan wrote:
> > On Fri, 4 Feb 2011 15:34:39 +0200
> > "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> > > What's mean "original timer slack" if you are free to move a task
> > > between a lot of cgroups and process itself free to change it
> > > anytime?
> > > 
> > 
> > I need to manage tasks by a management software instead of letting
> > the task change timer_slack by itself. The goal is to make
> > management transparent and no modifications to the existing apps.
> > Therefore, it is desirable to automatically enforce timer_slack
> > when the apps are in the cgroup while automatically restore it when
> > it is no longer under cgroup management.
> 
> Tasks are always under cgroup management. Root cgroup is still cgroup.
> 
> > So the "original timer slack" can be the default 50us or whatever
> > value chosen by the task itself. But the app itself should not care
> > or even be aware of which cgroup it is in.
> > 
> > So here are two optoins i can think of
> > 1. add a new variable called cg_timer_slack_ns to struct
> > task_struct{} cg_timer_slack_ns will be set by cgroup timer_slack
> > subsystem, then we can retain the original per task value in
> > timer_slack_ns. timer code will pick max(cg_timer_slack_ns,
> > timer_slack_ns) if cg_timer_slack_ns is set.
> > 
> > 2. leave task_struct unchanged, add a current_timer_slack to the
> > cgroup. timer_slack cgroup does not modify per task timer_slack_ns.
> > similar to option #1, let timer code pick the timer_slack to use
> > based on whether the task is in timer_slack cgroup.
> > 
> > Any thoughts?
> 
> I think it's over-engineering.
> 
We do have a real use case that cannot be solved by the current logic,
I only want to find a solution.
> What about configuration like this:
> 
>  root_cgroup
>  |--timer_slack.min_slack_ns = 0
>  |--timer_slack.max_slack_ns = ULONG_MAX
>  |--50us
>  |  |--timer_slack.min_slack_ns = 50000
>  |  |--timer_slack.max_slack_ns = 50000
>  |--500us
>     |--timer_slack.min_slack_ns = 500000
>     |--timer_slack_max_slack_ns = ULONG_MAX
> 
> If you want a task allow to drive its timer_slack, just leave it in
> root_cgroup.
> 
> It you want to drive timer_slack of a task, move it between 50us and
> 500um based on your policy.
>  
I surely agree with the dummy root configuration. But when the
management software moves task among 50us or 500us cgroups, or back to
dummy root, the timer slack cannot be automatically changed.

consider the following scenarios.
1. task_A has timer_slack = ts1 = 3us when it is running in the
foreground by window manager
2. then it is moved to 50us cgroup because it is no longer in the
foreground, so now ts1 = 50us. 
3. After a while, the system is running in the low power state, so
task_A is moved to 500us cgroup, ts1 = 500us. Then the user switch the
device into normal running state and put task_A in foreground again.
4. Management software then moves task_A from 500us cgroup to dummy
root, but it will not be able to restore the 3us timer slack as needed
by task_A. Task can surely drive its timer_slack but it breaks the
cgroup-transparency desired by the management scheme.

The key is that tasks being managed are not aware of cgroup
involvement. The management software is the one that moves tasks
around, but we don't want the management software keeps track of very
timer slacks value of every tasks. 

Thanks,
Jacob

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH, v3 2/2] cgroups: introduce timer slack subsystem
  2011-02-07 17:20                   ` Jacob Pan
@ 2011-02-07 18:32                     ` Kirill A. Shutemov
  0 siblings, 0 replies; 22+ messages in thread
From: Kirill A. Shutemov @ 2011-02-07 18:32 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Paul Menage, Li Zefan, containers, Arjan van de Ven,
	linux-kernel, Matt Helsley, Paul E. McKenney

On Mon, Feb 07, 2011 at 09:20:40AM -0800, Jacob Pan wrote:
> On Mon, 7 Feb 2011 13:06:03 +0200
> "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> 
> > On Fri, Feb 04, 2011 at 09:27:55AM -0800, Jacob Pan wrote:
> > > On Fri, 4 Feb 2011 15:34:39 +0200
> > > "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> > > > What's mean "original timer slack" if you are free to move a task
> > > > between a lot of cgroups and process itself free to change it
> > > > anytime?
> > > > 
> > > 
> > > I need to manage tasks by a management software instead of letting
> > > the task change timer_slack by itself. The goal is to make
> > > management transparent and no modifications to the existing apps.
> > > Therefore, it is desirable to automatically enforce timer_slack
> > > when the apps are in the cgroup while automatically restore it when
> > > it is no longer under cgroup management.
> > 
> > Tasks are always under cgroup management. Root cgroup is still cgroup.
> > 
> > > So the "original timer slack" can be the default 50us or whatever
> > > value chosen by the task itself. But the app itself should not care
> > > or even be aware of which cgroup it is in.
> > > 
> > > So here are two optoins i can think of
> > > 1. add a new variable called cg_timer_slack_ns to struct
> > > task_struct{} cg_timer_slack_ns will be set by cgroup timer_slack
> > > subsystem, then we can retain the original per task value in
> > > timer_slack_ns. timer code will pick max(cg_timer_slack_ns,
> > > timer_slack_ns) if cg_timer_slack_ns is set.
> > > 
> > > 2. leave task_struct unchanged, add a current_timer_slack to the
> > > cgroup. timer_slack cgroup does not modify per task timer_slack_ns.
> > > similar to option #1, let timer code pick the timer_slack to use
> > > based on whether the task is in timer_slack cgroup.
> > > 
> > > Any thoughts?
> > 
> > I think it's over-engineering.
> > 
> We do have a real use case that cannot be solved by the current logic,
> I only want to find a solution.
> > What about configuration like this:
> > 
> >  root_cgroup
> >  |--timer_slack.min_slack_ns = 0
> >  |--timer_slack.max_slack_ns = ULONG_MAX
> >  |--50us
> >  |  |--timer_slack.min_slack_ns = 50000
> >  |  |--timer_slack.max_slack_ns = 50000
> >  |--500us
> >     |--timer_slack.min_slack_ns = 500000
> >     |--timer_slack_max_slack_ns = ULONG_MAX
> > 
> > If you want a task allow to drive its timer_slack, just leave it in
> > root_cgroup.
> > 
> > It you want to drive timer_slack of a task, move it between 50us and
> > 500um based on your policy.
> >  
> I surely agree with the dummy root configuration. But when the
> management software moves task among 50us or 500us cgroups, or back to
> dummy root, the timer slack cannot be automatically changed.
> 
> consider the following scenarios.
> 1. task_A has timer_slack = ts1 = 3us when it is running in the
> foreground by window manager
> 2. then it is moved to 50us cgroup because it is no longer in the
> foreground, so now ts1 = 50us. 
> 3. After a while, the system is running in the low power state, so
> task_A is moved to 500us cgroup, ts1 = 500us. Then the user switch the
> device into normal running state and put task_A in foreground again.
> 4. Management software then moves task_A from 500us cgroup to dummy
> root, but it will not be able to restore the 3us timer slack as needed
> by task_A. Task can surely drive its timer_slack but it breaks the
> cgroup-transparency desired by the management scheme.

Can you use 3us cgroup instead of moving to root cgroup? :)

I understand your use-case, but I can't provide sane interface to
implement this.

> The key is that tasks being managed are not aware of cgroup
> involvement. The management software is the one that moves tasks
> around, but we don't want the management software keeps track of very
> timer slacks value of every tasks. 
> 
> Thanks,
> Jacob

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2011-02-07 18:32 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-02-02 20:47 [PATCH, v3 0/2] Timer slack cgroup subsystem Kirill A. Shutsemov
2011-02-02 20:47 ` [PATCH, v3 1/2] cgroups: export cgroup_iter_{start,next,end} Kirill A. Shutsemov
2011-02-02 21:21   ` Paul Menage
2011-02-02 20:47 ` [PATCH, v3 2/2] cgroups: introduce timer slack subsystem Kirill A. Shutsemov
2011-02-02 22:56   ` jacob pan
2011-02-03  9:22     ` Kirill A. Shutemov
2011-02-03 17:51       ` Jacob Pan
2011-02-03 18:12         ` Paul Menage
2011-02-03 19:57           ` Jacob Pan
2011-02-04 13:34             ` Kirill A. Shutemov
2011-02-04 17:27               ` Jacob Pan
2011-02-07  0:33                 ` Matt Helsley
2011-02-07 11:06                 ` Kirill A. Shutemov
2011-02-07 17:20                   ` Jacob Pan
2011-02-07 18:32                     ` Kirill A. Shutemov
2011-02-02 23:23   ` Paul Menage
2011-02-03  5:48     ` Matt Helsley
2011-02-03  9:28     ` Kirill A. Shutemov
2011-02-03  5:46   ` Matt Helsley
2011-02-03  9:41     ` Kirill A. Shutemov
2011-02-06  2:49       ` Matt Helsley
2011-02-07  9:48         ` Kirill A. Shutemov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).