linux-kernel.vger.kernel.org archive mirror
* [PATCHSET RFC] cpu,cpuacct: make cpu serve cpuacct files and deprecate cpuacct
@ 2012-09-19 22:43 Tejun Heo
  2012-09-19 22:43 ` [PATCH 1/3] cgroup: implement CFTYPE_NO_PREFIX Tejun Heo
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Tejun Heo @ 2012-09-19 22:43 UTC (permalink / raw)
  To: linux-kernel, containers, cgroups, lizefan
  Cc: peterz, glommer, mhocko, kay.sievers, mzxreary, davej, ben, pjt

Hello,

This is an attempt at allowing quick deprecation and removal of
cpuacct.  The patchset makes cpu serve the same cpuacct.* files and
updates cgroup core such that it ignores cpuacct if requested to be
co-mounted with cpu whether CONFIG_CGROUP_CPUACCT is enabled or not.

This de-couples cpuacct deprecation on kernel side from userland
transition as long as userland is co-mounting cpu and cpuacct.  In
this series, cpuacct implementation is simply copied into cpu, the
goal being to provide base for proper optimization (most likely
periodic collection of stats across the hierarchy).

I didn't try to maintain everything the same.  /proc/cgroups and
/proc/PID/cgroup may mis-represent or miss cpuacct entry.  Faking it
completely is doable too but I'd like to keep it as simple as
possible.

* If cpuacct is not used, nothing changes of course.

* If cpuacct is requested to be co-mounted with cpu, cpuacct is not
  used.  It doesn't matter whether CONFIG_CGROUP_CPUACCT is enabled or
  not.  cpu will provide cpuacct.* statistics.

* If cpuacct is requested to be mounted separately and cpuacct is
  enabled, cgroup whines before mounting it.  If cpuacct is disabled,
  it fails.

I think the resulting behavior isn't too crazy and it should allow us
to remove anything cpuacct related fairly soon.

Longer term, I'd prefer to be able to drop the compatibility hack from
cgroup core too although I don't think that's too urgent.  I think
userland can either create symlinks for compatibility -
e.g. /sys/fs/cgroup/cpuacct and /sys/fs/cgroup/cpu,cpuacct both
pointing to /sys/fs/cgroup/cpu - whether cpuacct exists or not, or
just ignore anything cpuacct related.

This series is based on for-3.7-hierarchy 8c7f6edbda0 ("cgroup: mark
subsystems with broken hierarchy...") and also available in the
following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-deprecate-cpuacct

 include/linux/cgroup.h   |    1 
 init/Kconfig             |   11 ++
 kernel/cgroup.c          |   57 ++++++++++++-
 kernel/sched/core.c      |  196 ++++++++++++++++++++++++++++++++++++++++++++++-
 kernel/sched/fair.c      |    1 
 kernel/sched/rt.c        |    1 
 kernel/sched/sched.h     |    7 +
 kernel/sched/stop_task.c |    1 
 8 files changed, 269 insertions(+), 6 deletions(-)

Thanks.

--
tejun


* [PATCH 1/3] cgroup: implement CFTYPE_NO_PREFIX
  2012-09-19 22:43 [PATCHSET RFC] cpu,cpuacct: make cpu serve cpuacct files and deprecate cpuacct Tejun Heo
@ 2012-09-19 22:43 ` Tejun Heo
  2012-09-19 22:43 ` [PATCH 2/3] cgroup, sched: let cpu serve the same files as cpuacct Tejun Heo
  2012-09-19 22:43 ` [PATCH 3/3] cgroup, sched: deprecate cpuacct Tejun Heo
  2 siblings, 0 replies; 6+ messages in thread
From: Tejun Heo @ 2012-09-19 22:43 UTC (permalink / raw)
  To: linux-kernel, containers, cgroups, lizefan
  Cc: peterz, glommer, mhocko, kay.sievers, mzxreary, davej, ben, pjt,
	Tejun Heo

When cgroup files are created, cgroup core automatically prepends the
name of the subsystem as prefix.  This patch adds CFTYPE_NO_PREFIX
which disables the automatic prefix.

This will be used to deprecate cpuacct which will make cpu create and
serve the cpuacct files.
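
A rough userspace sketch of the resulting naming rule follows.  It is
not kernel code: build_cgroup_file_name() and NO_PREFIX_FLAG are
hypothetical names that only mirror the check this patch adds to
cgroup_add_file(), and snprintf() is used in place of the kernel's
strcpy()/strcat() sequence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define NO_PREFIX_FLAG	(1U << 2)	/* mirrors CFTYPE_NO_PREFIX */

/*
 * Build the visible file name into buf.  The "<subsys>." prefix is
 * added only when a subsystem is given, the file does not carry
 * NO_PREFIX_FLAG, and the hierarchy was not mounted with noprefix.
 * Returns the snprintf()-style length of the name.
 */
int build_cgroup_file_name(char *buf, size_t len, const char *subsys,
			   const char *file, unsigned int flags,
			   bool root_noprefix)
{
	assert(buf != NULL && file != NULL);

	if (subsys && !(flags & NO_PREFIX_FLAG) && !root_noprefix)
		return snprintf(buf, len, "%s.%s", subsys, file);

	return snprintf(buf, len, "%s", file);
}
```

With subsys "cpu", a file named "shares" and no flags yields
"cpu.shares", while a file named "cpuacct.usage" carrying the flag
keeps its name as is instead of becoming "cpu.cpuacct.usage".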

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Glauber Costa <glommer@parallels.com>
---
 include/linux/cgroup.h |    1 +
 kernel/cgroup.c        |    3 ++-
 2 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 68e8df7..7d6a298 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -283,6 +283,7 @@ struct cgroup_map_cb {
 /* cftype->flags */
 #define CFTYPE_ONLY_ON_ROOT	(1U << 0)	/* only create on root cg */
 #define CFTYPE_NOT_ON_ROOT	(1U << 1)	/* don't create onp root cg */
+#define CFTYPE_NO_PREFIX	(1U << 2)	/* skip subsys prefix */
 
 #define MAX_CFTYPE_NAME		64
 
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index b7d9606..08edb52 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2687,7 +2687,8 @@ static int cgroup_add_file(struct cgroup *cgrp, struct cgroup_subsys *subsys,
 	if ((cft->flags & CFTYPE_ONLY_ON_ROOT) && cgrp->parent)
 		return 0;
 
-	if (subsys && !test_bit(ROOT_NOPREFIX, &cgrp->root->flags)) {
+	if (subsys && !(cft->flags & CFTYPE_NO_PREFIX) &&
+	    !test_bit(ROOT_NOPREFIX, &cgrp->root->flags)) {
 		strcpy(name, subsys->name);
 		strcat(name, ".");
 	}
-- 
1.7.7.3



* [PATCH 2/3] cgroup, sched: let cpu serve the same files as cpuacct
  2012-09-19 22:43 [PATCHSET RFC] cpu,cpuacct: make cpu serve cpuacct files and deprecate cpuacct Tejun Heo
  2012-09-19 22:43 ` [PATCH 1/3] cgroup: implement CFTYPE_NO_PREFIX Tejun Heo
@ 2012-09-19 22:43 ` Tejun Heo
  2012-09-20  8:05   ` Glauber Costa
  2012-09-19 22:43 ` [PATCH 3/3] cgroup, sched: deprecate cpuacct Tejun Heo
  2 siblings, 1 reply; 6+ messages in thread
From: Tejun Heo @ 2012-09-19 22:43 UTC (permalink / raw)
  To: linux-kernel, containers, cgroups, lizefan
  Cc: peterz, glommer, mhocko, kay.sievers, mzxreary, davej, ben, pjt,
	Tejun Heo

cpuacct being on a separate hierarchy is one of the main cgroup
related complaints from scheduler side and the consensus seems to be

* Allowing cpuacct to be a separate controller was a mistake.  In
  general multiple controllers on the same type of resource should be
  avoided, especially accounting-only ones.

* Statistics provided by cpuacct are useful and should instead be
  served by cpu.

This patch makes cpu maintain and serve all cpuacct.* files and makes
cgroup core ignore cpuacct if it's co-mounted with cpu.  This is a
step in deprecating cpuacct.  The next patch will allow disabling or
dropping cpuacct without affecting userland too much.

Note that this creates some discrepancies in /proc/cgroups and
/proc/PID/cgroup.  The co-mounted cpuacct won't be reflected correctly
there.  cpuacct will eventually be removed completely, probably except
for the statistics filenames, and I'd like to keep the amount of
compatibility hackery to a minimum.

The cpu statistics implementation isn't optimized in any way.  It's
mostly verbatim copy from cpuacct.  The goal is allowing quick
disabling and removal of CONFIG_CGROUP_CPUACCT and creating a base on
top of which cpu can implement proper optimization.
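
The unoptimized accounting scheme can be pictured with a small
userspace model.  This is a sketch under simplifying assumptions, not
the kernel code: struct group_model, NR_CPUS_MODEL and the two helpers
are hypothetical, a plain array stands in for the per-cpu allocation,
and RCU and rq->lock are left out entirely.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_MODEL	4	/* hypothetical fixed CPU count */

struct group_model {
	struct group_model *parent;		/* NULL for the root group */
	uint64_t cpuusage[NR_CPUS_MODEL];	/* stands in for the percpu u64 */
};

/*
 * Charge cputime on the given CPU to grp and every ancestor up to and
 * including the root, as task_group_charge() does in this patch.
 */
void group_charge(struct group_model *grp, int cpu, uint64_t cputime)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);

	for (; grp; grp = grp->parent)
		grp->cpuusage[cpu] += cputime;
}

/* Sum the per-CPU counters, as the cpuacct.usage read path does. */
uint64_t group_usage_total(const struct group_model *grp)
{
	uint64_t total = 0;

	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		total += grp->cpuusage[cpu];

	return total;
}
```

Every charge walks the full ancestor chain, so the cost grows with
hierarchy depth.  That walk is what a later optimization, such as
periodic collection of stats across the hierarchy, would aim to avoid.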

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Lennart Poettering <mzxreary@0pointer.de>
Cc: Dave Jones <davej@redhat.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Paul Turner <pjt@google.com>
---
 kernel/cgroup.c          |   13 +++
 kernel/sched/core.c      |  194 +++++++++++++++++++++++++++++++++++++++++++++-
 kernel/sched/fair.c      |    1 +
 kernel/sched/rt.c        |    1 +
 kernel/sched/sched.h     |    7 ++
 kernel/sched/stop_task.c |    1 +
 6 files changed, 215 insertions(+), 2 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 08edb52..01c11e3 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1265,6 +1265,19 @@ static int parse_cgroupfs_options(char *data, struct cgroup_sb_opts *opts)
 	/* Consistency checks */
 
 	/*
+	 * cpuacct is deprecated and cpu will serve the same stat files.
+	 * If co-mount with cpu is requested, ignore cpuacct.  Note that
+	 * this creates some discrepancies in /proc/cgroups and
+	 * /proc/PID/cgroup.
+	 *
+	 * https://lkml.org/lkml/2012/9/13/542
+	 */
+#if IS_ENABLED(CONFIG_CGROUP_SCHED) && IS_ENABLED(CONFIG_CGROUP_CPUACCT)
+	if ((opts->subsys_bits & (1 << cpu_cgroup_subsys_id)) &&
+	    (opts->subsys_bits & (1 << cpuacct_subsys_id)))
+		opts->subsys_bits &= ~(1 << cpuacct_subsys_id);
+#endif
+	/*
 	 * Option noprefix was introduced just for backward compatibility
 	 * with the old cpuset, so we allow noprefix only if mounting just
 	 * the cpuset subsystem.
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index fbf1fd0..9648671 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2817,8 +2817,10 @@ struct cpuacct root_cpuacct;
 static inline void task_group_account_field(struct task_struct *p, int index,
 					    u64 tmp)
 {
+#ifdef CONFIG_CGROUP_SCHED
+	struct task_group *tg;
+#endif
 #ifdef CONFIG_CGROUP_CPUACCT
-	struct kernel_cpustat *kcpustat;
 	struct cpuacct *ca;
 #endif
 	/*
@@ -2829,6 +2831,20 @@ static inline void task_group_account_field(struct task_struct *p, int index,
 	 */
 	__get_cpu_var(kernel_cpustat).cpustat[index] += tmp;
 
+#ifdef CONFIG_CGROUP_SCHED
+	rcu_read_lock();
+	tg = container_of(task_subsys_state(p, cpu_cgroup_subsys_id),
+			  struct task_group, css);
+
+	while (tg && (tg != &root_task_group)) {
+		struct kernel_cpustat *kcpustat = this_cpu_ptr(tg->cpustat);
+
+		kcpustat->cpustat[index] += tmp;
+		tg = tg->parent;
+	}
+	rcu_read_unlock();
+#endif
+
 #ifdef CONFIG_CGROUP_CPUACCT
 	if (unlikely(!cpuacct_subsys.active))
 		return;
@@ -2836,7 +2852,8 @@ static inline void task_group_account_field(struct task_struct *p, int index,
 	rcu_read_lock();
 	ca = task_ca(p);
 	while (ca && (ca != &root_cpuacct)) {
-		kcpustat = this_cpu_ptr(ca->cpustat);
+		struct kernel_cpustat *kcpustat = this_cpu_ptr(ca->cpustat);
+
 		kcpustat->cpustat[index] += tmp;
 		ca = parent_ca(ca);
 	}
@@ -7253,6 +7270,7 @@ int in_sched_functions(unsigned long addr)
 #ifdef CONFIG_CGROUP_SCHED
 struct task_group root_task_group;
 LIST_HEAD(task_groups);
+static DEFINE_PER_CPU(u64, root_tg_cpuusage);
 #endif
 
 DECLARE_PER_CPU(cpumask_var_t, load_balance_tmpmask);
@@ -7311,6 +7329,8 @@ void __init sched_init(void)
 #endif /* CONFIG_RT_GROUP_SCHED */
 
 #ifdef CONFIG_CGROUP_SCHED
+	root_task_group.cpustat = &kernel_cpustat;
+	root_task_group.cpuusage = &root_tg_cpuusage;
 	list_add(&root_task_group.list, &task_groups);
 	INIT_LIST_HEAD(&root_task_group.children);
 	INIT_LIST_HEAD(&root_task_group.siblings);
@@ -7594,6 +7614,8 @@ static void free_sched_group(struct task_group *tg)
 	free_fair_sched_group(tg);
 	free_rt_sched_group(tg);
 	autogroup_free(tg);
+	free_percpu(tg->cpuusage);
+	free_percpu(tg->cpustat);
 	kfree(tg);
 }
 
@@ -7607,6 +7629,11 @@ struct task_group *sched_create_group(struct task_group *parent)
 	if (!tg)
 		return ERR_PTR(-ENOMEM);
 
+	tg->cpuusage = alloc_percpu(u64);
+	tg->cpustat = alloc_percpu(struct kernel_cpustat);
+	if (!tg->cpuusage || !tg->cpustat)
+		goto err;
+
 	if (!alloc_fair_sched_group(tg, parent))
 		goto err;
 
@@ -7698,6 +7725,24 @@ void sched_move_task(struct task_struct *tsk)
 
 	task_rq_unlock(rq, tsk, &flags);
 }
+
+void task_group_charge(struct task_struct *tsk, u64 cputime)
+{
+	struct task_group *tg;
+	int cpu = task_cpu(tsk);
+
+	rcu_read_lock();
+
+	tg = container_of(task_subsys_state(tsk, cpu_cgroup_subsys_id),
+			  struct task_group, css);
+
+	for (; tg; tg = tg->parent) {
+		u64 *cpuusage = per_cpu_ptr(tg->cpuusage, cpu);
+		*cpuusage += cputime;
+	}
+
+	rcu_read_unlock();
+}
 #endif /* CONFIG_CGROUP_SCHED */
 
 #if defined(CONFIG_RT_GROUP_SCHED) || defined(CONFIG_CFS_BANDWIDTH)
@@ -8054,6 +8099,134 @@ cpu_cgroup_exit(struct cgroup *cgrp, struct cgroup *old_cgrp,
 	sched_move_task(task);
 }
 
+static u64 task_group_cpuusage_read(struct task_group *tg, int cpu)
+{
+	u64 *cpuusage = per_cpu_ptr(tg->cpuusage, cpu);
+	u64 data;
+
+#ifndef CONFIG_64BIT
+	/*
+	 * Take rq->lock to make 64-bit read safe on 32-bit platforms.
+	 */
+	raw_spin_lock_irq(&cpu_rq(cpu)->lock);
+	data = *cpuusage;
+	raw_spin_unlock_irq(&cpu_rq(cpu)->lock);
+#else
+	data = *cpuusage;
+#endif
+
+	return data;
+}
+
+static void task_group_cpuusage_write(struct task_group *tg, int cpu, u64 val)
+{
+	u64 *cpuusage = per_cpu_ptr(tg->cpuusage, cpu);
+
+#ifndef CONFIG_64BIT
+	/*
+	 * Take rq->lock to make 64-bit write safe on 32-bit platforms.
+	 */
+	raw_spin_lock_irq(&cpu_rq(cpu)->lock);
+	*cpuusage = val;
+	raw_spin_unlock_irq(&cpu_rq(cpu)->lock);
+#else
+	*cpuusage = val;
+#endif
+}
+
+/* return total cpu usage (in nanoseconds) of a group */
+static u64 cpucg_cpuusage_read(struct cgroup *cgrp, struct cftype *cft)
+{
+	struct task_group *tg;
+	u64 totalcpuusage = 0;
+	int i;
+
+	tg = container_of(cgroup_subsys_state(cgrp, cpu_cgroup_subsys_id),
+			  struct task_group, css);
+
+	for_each_present_cpu(i)
+		totalcpuusage += task_group_cpuusage_read(tg, i);
+
+	return totalcpuusage;
+}
+
+static int cpucg_cpuusage_write(struct cgroup *cgrp, struct cftype *cftype,
+				u64 reset)
+{
+	struct task_group *tg;
+	int err = 0;
+	int i;
+
+	tg = container_of(cgroup_subsys_state(cgrp, cpu_cgroup_subsys_id),
+			  struct task_group, css);
+
+	if (reset) {
+		err = -EINVAL;
+		goto out;
+	}
+
+	for_each_present_cpu(i)
+		task_group_cpuusage_write(tg, i, 0);
+
+out:
+	return err;
+}
+
+static int cpucg_percpu_seq_read(struct cgroup *cgrp, struct cftype *cft,
+				 struct seq_file *m)
+{
+	struct task_group *tg;
+	u64 percpu;
+	int i;
+
+	tg = container_of(cgroup_subsys_state(cgrp, cpu_cgroup_subsys_id),
+			  struct task_group, css);
+
+	for_each_present_cpu(i) {
+		percpu = task_group_cpuusage_read(tg, i);
+		seq_printf(m, "%llu ", (unsigned long long) percpu);
+	}
+	seq_printf(m, "\n");
+	return 0;
+}
+
+static const char *cpucg_stat_desc[] = {
+	[CPUACCT_STAT_USER] = "user",
+	[CPUACCT_STAT_SYSTEM] = "system",
+};
+
+static int cpucg_stats_show(struct cgroup *cgrp, struct cftype *cft,
+			    struct cgroup_map_cb *cb)
+{
+	struct task_group *tg;
+	int cpu;
+	s64 val = 0;
+
+	tg = container_of(cgroup_subsys_state(cgrp, cpu_cgroup_subsys_id),
+			  struct task_group, css);
+
+	for_each_online_cpu(cpu) {
+		struct kernel_cpustat *kcpustat = per_cpu_ptr(tg->cpustat, cpu);
+		val += kcpustat->cpustat[CPUTIME_USER];
+		val += kcpustat->cpustat[CPUTIME_NICE];
+	}
+	val = cputime64_to_clock_t(val);
+	cb->fill(cb, cpucg_stat_desc[CPUACCT_STAT_USER], val);
+
+	val = 0;
+	for_each_online_cpu(cpu) {
+		struct kernel_cpustat *kcpustat = per_cpu_ptr(tg->cpustat, cpu);
+		val += kcpustat->cpustat[CPUTIME_SYSTEM];
+		val += kcpustat->cpustat[CPUTIME_IRQ];
+		val += kcpustat->cpustat[CPUTIME_SOFTIRQ];
+	}
+
+	val = cputime64_to_clock_t(val);
+	cb->fill(cb, cpucg_stat_desc[CPUACCT_STAT_SYSTEM], val);
+
+	return 0;
+}
+
 #ifdef CONFIG_FAIR_GROUP_SCHED
 static int cpu_shares_write_u64(struct cgroup *cgrp, struct cftype *cftype,
 				u64 shareval)
@@ -8360,6 +8533,23 @@ static struct cftype cpu_files[] = {
 		.write_u64 = cpu_rt_period_write_uint,
 	},
 #endif
+	/* cpuacct.* which used to be served by a separate cpuacct controller */
+	{
+		.name = "cpuacct.usage",
+		.flags = CFTYPE_NO_PREFIX,
+		.read_u64 = cpucg_cpuusage_read,
+		.write_u64 = cpucg_cpuusage_write,
+	},
+	{
+		.name = "cpuacct.usage_percpu",
+		.flags = CFTYPE_NO_PREFIX,
+		.read_seq_string = cpucg_percpu_seq_read,
+	},
+	{
+		.name = "cpuacct.stat",
+		.flags = CFTYPE_NO_PREFIX,
+		.read_map = cpucg_stats_show,
+	},
 	{ }	/* terminate */
 };
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c219bf8..8935c3a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -706,6 +706,7 @@ static void update_curr(struct cfs_rq *cfs_rq)
 		struct task_struct *curtask = task_of(curr);
 
 		trace_sched_stat_runtime(curtask, delta_exec, curr->vruntime);
+		task_group_charge(curtask, delta_exec);
 		cpuacct_charge(curtask, delta_exec);
 		account_group_exec_runtime(curtask, delta_exec);
 	}
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 944cb68..53f7d3d 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -934,6 +934,7 @@ static void update_curr_rt(struct rq *rq)
 	account_group_exec_runtime(curr, delta_exec);
 
 	curr->se.exec_start = rq->clock_task;
+	task_group_charge(curr, delta_exec);
 	cpuacct_charge(curr, delta_exec);
 
 	sched_rt_avg_update(rq, delta_exec);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index f6714d0..bfa115e 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -104,6 +104,10 @@ struct cfs_bandwidth {
 struct task_group {
 	struct cgroup_subsys_state css;
 
+	/* statistics */
+	u64 __percpu *cpuusage;
+	struct kernel_cpustat __percpu *cpustat;
+
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	/* schedulable entities of this group on each cpu */
 	struct sched_entity **se;
@@ -575,6 +579,8 @@ static inline void set_task_rq(struct task_struct *p, unsigned int cpu)
 #endif
 }
 
+extern void task_group_charge(struct task_struct *tsk, u64 cputime);
+
 #else /* CONFIG_CGROUP_SCHED */
 
 static inline void set_task_rq(struct task_struct *p, unsigned int cpu) { }
@@ -582,6 +588,7 @@ static inline struct task_group *task_group(struct task_struct *p)
 {
 	return NULL;
 }
+static inline void task_group_charge(struct task_struct *tsk, u64 cputime) { }
 
 #endif /* CONFIG_CGROUP_SCHED */
 
diff --git a/kernel/sched/stop_task.c b/kernel/sched/stop_task.c
index da5eb5b..e323790 100644
--- a/kernel/sched/stop_task.c
+++ b/kernel/sched/stop_task.c
@@ -68,6 +68,7 @@ static void put_prev_task_stop(struct rq *rq, struct task_struct *prev)
 	account_group_exec_runtime(curr, delta_exec);
 
 	curr->se.exec_start = rq->clock_task;
+	task_group_charge(curr, delta_exec);
 	cpuacct_charge(curr, delta_exec);
 }
 
-- 
1.7.7.3



* [PATCH 3/3] cgroup, sched: deprecate cpuacct
  2012-09-19 22:43 [PATCHSET RFC] cpu,cpuacct: make cpu serve cpuacct files and deprecate cpuacct Tejun Heo
  2012-09-19 22:43 ` [PATCH 1/3] cgroup: implement CFTYPE_NO_PREFIX Tejun Heo
  2012-09-19 22:43 ` [PATCH 2/3] cgroup, sched: let cpu serve the same files as cpuacct Tejun Heo
@ 2012-09-19 22:43 ` Tejun Heo
  2 siblings, 0 replies; 6+ messages in thread
From: Tejun Heo @ 2012-09-19 22:43 UTC (permalink / raw)
  To: linux-kernel, containers, cgroups, lizefan
  Cc: peterz, glommer, mhocko, kay.sievers, mzxreary, davej, ben, pjt,
	Tejun Heo

Now that cpu serves the same files as cpuacct and using cpuacct
separately from cpu is deprecated, we can deprecate cpuacct.  To avoid
disturbing userland which has been co-mounting cpu and cpuacct,
implement some hackery in cgroup core so that cpuacct co-mounting
still works even if cpuacct is disabled.

The goal of this patch is to accelerate disabling and removal of
cpuacct by decoupling kernel-side deprecation from userland changes.
Userland is recommended to do the following.

* If /proc/cgroups lists cpuacct, always co-mount it with cpu under
  e.g. /sys/fs/cgroup/cpu.

* Optionally create symlinks for compatibility -
  e.g. /sys/fs/cgroup/cpuacct and /sys/fs/cgroup/cpu,cpuacct both
  pointing to /sys/fs/cgroup/cpu - whether cpuacct exists or not.

This compatibility hack will eventually go away.
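
For the first recommendation, userland needs to know whether
/proc/cgroups lists cpuacct.  A minimal sketch of such a check is
below.  It assumes the four-column layout printed by
proc_cgroupstats_show() and works on a buffer already read from the
file; cgroups_hierarchy_of() is a hypothetical helper name, not an
existing API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the hierarchy ID listed for subsystem `name` in a
 * /proc/cgroups-formatted buffer, or -1 if the subsystem is absent.
 */
int cgroups_hierarchy_of(const char *text, const char *name)
{
	assert(text != NULL && name != NULL);

	while (*text) {
		char subsys[64];
		int hierarchy, num_cgroups, enabled;

		/* Skip the '#'-prefixed header; match the name exactly. */
		if (*text != '#' &&
		    sscanf(text, "%63s %d %d %d", subsys, &hierarchy,
			   &num_cgroups, &enabled) == 4 &&
		    strcmp(subsys, name) == 0)
			return hierarchy;

		/* Advance to the start of the next line, if any. */
		text = strchr(text, '\n');
		if (!text)
			break;
		text++;
	}

	return -1;
}
```

A non-negative result for "cpuacct" means it should be co-mounted with
cpu.  With this patch and CONFIG_CGROUP_CPUACCT disabled, the faked
entry reports the same hierarchy ID as cpu.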

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Lennart Poettering <mzxreary@0pointer.de>
Cc: Dave Jones <davej@redhat.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Paul Turner <pjt@google.com>
---
 init/Kconfig        |   11 ++++++++++-
 kernel/cgroup.c     |   41 +++++++++++++++++++++++++++++++++++++++--
 kernel/sched/core.c |    2 ++
 3 files changed, 51 insertions(+), 3 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index af6c7f8..a13cf8f 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -675,11 +675,20 @@ config PROC_PID_CPUSET
 	default y
 
 config CGROUP_CPUACCT
-	bool "Simple CPU accounting cgroup subsystem"
+	bool "DEPRECATED: Simple CPU accounting cgroup subsystem"
+	default n
 	help
 	  Provides a simple Resource Controller for monitoring the
 	  total CPU consumed by the tasks in a cgroup.
 
+	  This cgroup subsystem is deprecated.  The CPU cgroup
+	  subsystem serves the same accounting files and "cpuacct"
+	  mount option is ignored if specified with "cpu".  As long as
+	  userland co-mounts cpu and cpuacct, disabling this
+	  controller should be mostly unnoticeable - one notable
+	  difference is that /proc/PID/cgroup won't list cpuacct
+	  anymore.
+
 config RESOURCE_COUNTERS
 	bool "Resource counters"
 	help
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 01c11e3..a577f6c 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1157,6 +1157,7 @@ static int parse_cgroupfs_options(char *data, struct cgroup_sb_opts *opts)
 	unsigned long mask = (unsigned long)-1;
 	int i;
 	bool module_pin_failed = false;
+	bool cpuacct_requested = false;
 
 	BUG_ON(!mutex_is_locked(&cgroup_mutex));
 
@@ -1242,8 +1243,13 @@ static int parse_cgroupfs_options(char *data, struct cgroup_sb_opts *opts)
 
 			break;
 		}
-		if (i == CGROUP_SUBSYS_COUNT)
+		/* handle deprecated cpuacct specially, see below */
+		if (!strcmp(token, "cpuacct")) {
+			cpuacct_requested = true;
+			one_ss = true;
+		} else if (i == CGROUP_SUBSYS_COUNT) {
 			return -ENOENT;
+		}
 	}
 
 	/*
@@ -1270,8 +1276,25 @@ static int parse_cgroupfs_options(char *data, struct cgroup_sb_opts *opts)
 	 * this creates some discrepancies in /proc/cgroups and
 	 * /proc/PID/cgroup.
 	 *
+	 * Accept and ignore "cpuacct" option if comounted with "cpu" even
+	 * when cpuacct itself is disabled to allow quick disabling and
+	 * removal of cpuacct.  This will be removed eventually.
+	 *
 	 * https://lkml.org/lkml/2012/9/13/542
 	 */
+	if (cpuacct_requested) {
+		bool comounted = false;
+
+#if IS_ENABLED(CONFIG_CGROUP_SCHED)
+		comounted = opts->subsys_bits & (1 << cpu_cgroup_subsys_id);
+#endif
+		if (!comounted) {
+			pr_warning("cgroup: mounting cpuacct separately from cpu is deprecated\n");
+#if !IS_ENABLED(CONFIG_CGROUP_CPUACCT)
+			return -EINVAL;
+#endif
+		}
+	}
 #if IS_ENABLED(CONFIG_CGROUP_SCHED) && IS_ENABLED(CONFIG_CGROUP_CPUACCT)
 	if ((opts->subsys_bits & (1 << cpu_cgroup_subsys_id)) &&
 	    (opts->subsys_bits & (1 << cpuacct_subsys_id)))
@@ -4678,6 +4701,7 @@ const struct file_operations proc_cgroup_operations = {
 /* Display information about each subsystem and each hierarchy */
 static int proc_cgroupstats_show(struct seq_file *m, void *v)
 {
+	struct cgroup_subsys *ss;
 	int i;
 
 	seq_puts(m, "#subsys_name\thierarchy\tnum_cgroups\tenabled\n");
@@ -4688,7 +4712,7 @@ static int proc_cgroupstats_show(struct seq_file *m, void *v)
 	 */
 	mutex_lock(&cgroup_mutex);
 	for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
-		struct cgroup_subsys *ss = subsys[i];
+		ss = subsys[i];
 		if (ss == NULL)
 			continue;
 		seq_printf(m, "%s\t%d\t%d\t%d\n",
@@ -4696,6 +4720,19 @@ static int proc_cgroupstats_show(struct seq_file *m, void *v)
 			   ss->root->number_of_cgroups, !ss->disabled);
 	}
 	mutex_unlock(&cgroup_mutex);
+
+	/*
+	 * Fake /proc/cgroups entry for cpuacct to trick userland into
+	 * cpu,cpuacct comounts.  This is to allow quick disabling and
+	 * removal of cpuacct and will be removed eventually.
+	 */
+#if IS_ENABLED(CONFIG_CGROUP_SCHED) && !IS_ENABLED(CONFIG_CGROUP_CPUACCT)
+	ss = subsys[cpu_cgroup_subsys_id];
+	if (ss) {
+		seq_printf(m, "cpuacct\t%d\t%d\t%d\n", ss->root->hierarchy_id,
+			   ss->root->number_of_cgroups, !ss->disabled);
+	}
+#endif
 	return 0;
 }
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9648671..10176a3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8569,6 +8569,8 @@ struct cgroup_subsys cpu_cgroup_subsys = {
 
 #ifdef CONFIG_CGROUP_CPUACCT
 
+#warning CONFIG_CGROUP_CPUACCT is deprecated, read the Kconfig help message
+
 /*
  * CPU accounting code for task groups.
  *
-- 
1.7.7.3



* Re: [PATCH 2/3] cgroup, sched: let cpu serve the same files as cpuacct
  2012-09-19 22:43 ` [PATCH 2/3] cgroup, sched: let cpu serve the same files as cpuacct Tejun Heo
@ 2012-09-20  8:05   ` Glauber Costa
  2012-09-20 18:00     ` Tejun Heo
  0 siblings, 1 reply; 6+ messages in thread
From: Glauber Costa @ 2012-09-20  8:05 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-kernel, containers, cgroups, lizefan, peterz, mhocko,
	kay.sievers, mzxreary, davej, ben, pjt

On 09/20/2012 02:43 AM, Tejun Heo wrote:
> +
> +void task_group_charge(struct task_struct *tsk, u64 cputime)
> +{
> +	struct task_group *tg;
> +	int cpu = task_cpu(tsk);
> +
> +	rcu_read_lock();
> +
> +	tg = container_of(task_subsys_state(tsk, cpu_cgroup_subsys_id),
> +			  struct task_group, css);
> +
> +	for (; tg; tg = tg->parent) {
> +		u64 *cpuusage = per_cpu_ptr(tg->cpuusage, cpu);
> +		*cpuusage += cputime;
> +	}
> +
> +	rcu_read_unlock();
> +}
>  #endif /* CONFIG_CGROUP_SCHED */

The whole point of this merge is that this is not needed.
This information is already available from exec_clock for fair tasks.
for rt tasks, we have no exec clock, but do have a hierarchy walk a bit
below the current cpuacct charge, that can be used for that purpose.



* Re: [PATCH 2/3] cgroup, sched: let cpu serve the same files as cpuacct
  2012-09-20  8:05   ` Glauber Costa
@ 2012-09-20 18:00     ` Tejun Heo
  0 siblings, 0 replies; 6+ messages in thread
From: Tejun Heo @ 2012-09-20 18:00 UTC (permalink / raw)
  To: Glauber Costa
  Cc: linux-kernel, containers, cgroups, lizefan, peterz, mhocko,
	kay.sievers, mzxreary, davej, ben, pjt

Hello, Glauber.

On Thu, Sep 20, 2012 at 12:05:10PM +0400, Glauber Costa wrote:
> The whole point of this merge is that this is not needed.

Yeah, the whole point of this series is enabling that and other
optimizations.

> This information is already available from exec_clock for fair tasks.
> for rt tasks, we have no exec clock, but do have a hierarchy walk a bit
> below the current cpuacct charge, that can be used for that purpose.

Yeah yeah, sure, now you can build that on top of this patchset
without worrying about cpuacct, which can also, hopefully, be removed
fairly soon.

Thanks.

-- 
tejun

