All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2] sched/cgroup: move sched_online_group() back into css_online()
@ 2017-01-26  9:41 Konstantin Khlebnikov
  2017-01-26  9:41 ` [PATCH 2/2] kernfs: define name and path to "(null)" for NULL pointer kernfs nodes Konstantin Khlebnikov
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Konstantin Khlebnikov @ 2017-01-26  9:41 UTC (permalink / raw)
  To: Tejun Heo, Peter Zijlstra, Ingo Molnar, linux-kernel; +Cc: stable

Commit 2f5177f0fd7e ("sched/cgroup: Fix/cleanup cgroup teardown/init") moved
sched_online_group() from css_online() to css_alloc(). It exposes half-baked
task group into global lists before initializing generic cgroup stuff.

LTP testcase (third in cgroup_regression_test) written for testing
similar race in kernels 2.6.26-2.6.28 easily triggers this oops:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: kernfs_path_from_node_locked+0x260/0x320
CPU: 1 PID: 30346 Comm: cat Not tainted 4.10.0-rc5-test #4
Call Trace:
? kernfs_path_from_node+0x4f/0x60
kernfs_path_from_node+0x3e/0x60
print_rt_rq+0x44/0x2b0
print_rt_stats+0x7a/0xd0
print_cpu+0x2fc/0xe80
? __might_sleep+0x4a/0x80
sched_debug_show+0x17/0x30
seq_read+0xf2/0x3b0
proc_reg_read+0x42/0x70
__vfs_read+0x28/0x130
? security_file_permission+0x9b/0xc0
? rw_verify_area+0x4e/0xb0
vfs_read+0xa5/0x170
SyS_read+0x46/0xa0
entry_SYSCALL_64_fastpath+0x1e/0xad

Here task group already linked into global RCU-protected list task_groups
but pointer css->cgroup is still NULL.

This patch reverts this chunk and moves online back to css_online().

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Fixes: 2f5177f0fd7e ("sched/cgroup: Fix/cleanup cgroup teardown/init")
Cc: <stable@vger.kernel.org> # 4.6+
---
 kernel/sched/core.c |   13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c56fb57f2991..a4020bec7a87 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8398,11 +8398,19 @@ cpu_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 	if (IS_ERR(tg))
 		return ERR_PTR(-ENOMEM);
 
-	sched_online_group(tg, parent);
-
 	return &tg->css;
 }
 
+static int cpu_cgroup_css_online(struct cgroup_subsys_state *css)
+{
+	struct task_group *tg = css_tg(css);
+	struct task_group *parent = css_tg(css->parent);
+
+	if (parent)
+		sched_online_group(tg, parent);
+	return 0;
+}
+
 static void cpu_cgroup_css_released(struct cgroup_subsys_state *css)
 {
 	struct task_group *tg = css_tg(css);
@@ -8805,6 +8813,7 @@ static struct cftype cpu_files[] = {
 
 struct cgroup_subsys cpu_cgrp_subsys = {
 	.css_alloc	= cpu_cgroup_css_alloc,
+	.css_online	= cpu_cgroup_css_online,
 	.css_released	= cpu_cgroup_css_released,
 	.css_free	= cpu_cgroup_css_free,
 	.fork		= cpu_cgroup_fork,

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 2/2] kernfs: define name and path to "(null)" for NULL pointer kernfs nodes
  2017-01-26  9:41 [PATCH 1/2] sched/cgroup: move sched_online_group() back into css_online() Konstantin Khlebnikov
@ 2017-01-26  9:41 ` Konstantin Khlebnikov
  2017-01-26 19:37   ` Tejun Heo
  2017-01-26 10:17 ` [PATCH 1/2] sched/cgroup: move sched_online_group() back into css_online() Peter Zijlstra
  2017-02-08 11:27 ` [PATCH RESEND/add comment] " Konstantin Khlebnikov
  2 siblings, 1 reply; 8+ messages in thread
From: Konstantin Khlebnikov @ 2017-01-26  9:41 UTC (permalink / raw)
  To: Tejun Heo, Peter Zijlstra, Ingo Molnar, linux-kernel; +Cc: stable

The same LTP testcase triggers NULL deref css->cgroup->kn in the same place.

Commit a5bca2152036 ("cgroup: factor out cgroup_create() out of
cgroup_mkdir()") reordered cgroup construction. Now css created and
set online twice: first in cgroup_create(), second in cgroup_mkdir().
First happens before creation of kernfs node.

It seems safer and cleaner to handle NULL pointer right in kernfs than
spreading this magic across all other code. Now all functions that
prints kernfs path and name prints "(null)" for NULL pointer.

Also it might be a good idea to reorder cgroup construction to make these
NULL pointers unreachable: online css only after complete initialization.
After that NULL checks in kernfs could be surrounded with WARN_ON().

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Fixes: a5bca2152036 ("cgroup: factor out cgroup_create() out of cgroup_mkdir()")
Cc: <stable@vger.kernel.org> # 4.6+
---
 fs/kernfs/dir.c               |    6 ++++++
 include/trace/events/cgroup.h |   20 ++++++--------------
 2 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index cf4c636ff4da..2ba728d134b2 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -41,6 +41,9 @@ static bool kernfs_lockdep(struct kernfs_node *kn)
 
 static int kernfs_name_locked(struct kernfs_node *kn, char *buf, size_t buflen)
 {
+	if (!kn)
+		return strlcpy(buf, "(null)", buflen);
+
 	return strlcpy(buf, kn->parent ? kn->name : "/", buflen);
 }
 
@@ -123,6 +126,9 @@ static int kernfs_path_from_node_locked(struct kernfs_node *kn_to,
 	size_t depth_from, depth_to, len = 0;
 	int i, j;
 
+	if (!kn_to)
+		return strlcpy(buf, "(null)", buflen);
+
 	if (!kn_from)
 		kn_from = kernfs_root(kn_to)->kn;
 
diff --git a/include/trace/events/cgroup.h b/include/trace/events/cgroup.h
index ab68640a18d0..c226f50e88fa 100644
--- a/include/trace/events/cgroup.h
+++ b/include/trace/events/cgroup.h
@@ -61,19 +61,15 @@ DECLARE_EVENT_CLASS(cgroup,
 		__field(	int,		id			)
 		__field(	int,		level			)
 		__dynamic_array(char,		path,
-				cgrp->kn ? cgroup_path(cgrp, NULL, 0) + 1
-					 : strlen("(null)"))
+				cgroup_path(cgrp, NULL, 0) + 1)
 	),
 
 	TP_fast_assign(
 		__entry->root = cgrp->root->hierarchy_id;
 		__entry->id = cgrp->id;
 		__entry->level = cgrp->level;
-		if (cgrp->kn)
-			cgroup_path(cgrp, __get_dynamic_array(path),
-				    __get_dynamic_array_len(path));
-		else
-			__assign_str(path, "(null)");
+		cgroup_path(cgrp, __get_dynamic_array(path),
+				  __get_dynamic_array_len(path));
 	),
 
 	TP_printk("root=%d id=%d level=%d path=%s",
@@ -119,8 +115,7 @@ DECLARE_EVENT_CLASS(cgroup_migrate,
 		__field(	int,		dst_id			)
 		__field(	int,		dst_level		)
 		__dynamic_array(char,		dst_path,
-				dst_cgrp->kn ? cgroup_path(dst_cgrp, NULL, 0) + 1
-					     : strlen("(null)"))
+				cgroup_path(dst_cgrp, NULL, 0) + 1)
 		__field(	int,		pid			)
 		__string(	comm,		task->comm		)
 	),
@@ -129,11 +124,8 @@ DECLARE_EVENT_CLASS(cgroup_migrate,
 		__entry->dst_root = dst_cgrp->root->hierarchy_id;
 		__entry->dst_id = dst_cgrp->id;
 		__entry->dst_level = dst_cgrp->level;
-		if (dst_cgrp->kn)
-			cgroup_path(dst_cgrp, __get_dynamic_array(dst_path),
-				    __get_dynamic_array_len(dst_path));
-		else
-			__assign_str(dst_path, "(null)");
+		cgroup_path(dst_cgrp, __get_dynamic_array(dst_path),
+				      __get_dynamic_array_len(dst_path));
 		__entry->pid = task->pid;
 		__assign_str(comm, task->comm);
 	),

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] sched/cgroup: move sched_online_group() back into css_online()
  2017-01-26  9:41 [PATCH 1/2] sched/cgroup: move sched_online_group() back into css_online() Konstantin Khlebnikov
  2017-01-26  9:41 ` [PATCH 2/2] kernfs: define name and path to "(null)" for NULL pointer kernfs nodes Konstantin Khlebnikov
@ 2017-01-26 10:17 ` Peter Zijlstra
  2017-01-26 10:27   ` Konstantin Khlebnikov
  2017-02-08 11:27 ` [PATCH RESEND/add comment] " Konstantin Khlebnikov
  2 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2017-01-26 10:17 UTC (permalink / raw)
  To: Konstantin Khlebnikov; +Cc: Tejun Heo, Ingo Molnar, linux-kernel, stable

On Thu, Jan 26, 2017 at 12:41:41PM +0300, Konstantin Khlebnikov wrote:
> Commit 2f5177f0fd7e ("sched/cgroup: Fix/cleanup cgroup teardown/init") moved
> sched_online_group() from css_online() to css_alloc(). It exposes half-baked
> task group into global lists before initializing generic cgroup stuff.
> 
> LTP testcase (third in cgroup_regression_test) written for testing
> similar race in kernels 2.6.26-2.6.28 easily triggers this oops:
> 

So nobody's run LTP against the kernel for almost a year?

> 
> Here task group already linked into global RCU-protected list task_groups
> but pointer css->cgroup is still NULL.
> 
> This patch reverts this chunk and moves online back to css_online().

Maybe put a comment with it that explains why this is needed?

Something along the lines of this perhaps?

/*
 * Don't expose the cgroup until initialization of it is complete in the
 * cgroup core. Otherwise things like cgroup_path() will return NULL
 * pointers and the like.
 */

> +static int cpu_cgroup_css_online(struct cgroup_subsys_state *css)
> +{
> +	struct task_group *tg = css_tg(css);
> +	struct task_group *parent = css_tg(css->parent);
> +
> +	if (parent)
> +		sched_online_group(tg, parent);
> +	return 0;
> +}

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] sched/cgroup: move sched_online_group() back into css_online()
  2017-01-26 10:17 ` [PATCH 1/2] sched/cgroup: move sched_online_group() back into css_online() Peter Zijlstra
@ 2017-01-26 10:27   ` Konstantin Khlebnikov
  0 siblings, 0 replies; 8+ messages in thread
From: Konstantin Khlebnikov @ 2017-01-26 10:27 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Tejun Heo, Ingo Molnar, linux-kernel, stable



On 26.01.2017 13:17, Peter Zijlstra wrote:
> On Thu, Jan 26, 2017 at 12:41:41PM +0300, Konstantin Khlebnikov wrote:
>> Commit 2f5177f0fd7e ("sched/cgroup: Fix/cleanup cgroup teardown/init") moved
>> sched_online_group() from css_online() to css_alloc(). It exposes half-baked
>> task group into global lists before initializing generic cgroup stuff.
>>
>> LTP testcase (third in cgroup_regression_test) written for testing
>> similar race in kernels 2.6.26-2.6.28 easily triggers this oops:
>>
>
> So nobody's run LTP against the kernel for almost a year?

Yep. Nobody runs LTP =)

CONFIG_RT_GROUP_SCHED must be y (which almost impossible to use IRL, I have some patches for it out of tree)

Also systemd by default binds cgroups cpu and cpuacct together - this breaks testcase.

>
>>
>> Here task group already linked into global RCU-protected list task_groups
>> but pointer css->cgroup is still NULL.
>>
>> This patch reverts this chunk and moves online back to css_online().
>
> Maybe put a comment with it that explains why this is needed?
>
> Something along the lines of this perhaps?
>

Actually online is called before complete initialization. See second patch. =)
I don't know what to do with this. cgroups are messy as always.

> /*
>  * Don't expose the cgroup until initialization of it is complete in the
>  * cgroup core. Otherwise things like cgroup_path() will return NULL
>  * pointers and the like.
>  */
>
>> +static int cpu_cgroup_css_online(struct cgroup_subsys_state *css)
>> +{
>> +	struct task_group *tg = css_tg(css);
>> +	struct task_group *parent = css_tg(css->parent);
>> +
>> +	if (parent)
>> +		sched_online_group(tg, parent);
>> +	return 0;
>> +}

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] kernfs: define name and path to "(null)" for NULL pointer kernfs nodes
  2017-01-26  9:41 ` [PATCH 2/2] kernfs: define name and path to "(null)" for NULL pointer kernfs nodes Konstantin Khlebnikov
@ 2017-01-26 19:37   ` Tejun Heo
  2017-01-26 21:50     ` [PATCH cgroup/for-4.10-fixes] cgroup: don't online subsystems before cgroup_name/path() are operational Tejun Heo
  0 siblings, 1 reply; 8+ messages in thread
From: Tejun Heo @ 2017-01-26 19:37 UTC (permalink / raw)
  To: Konstantin Khlebnikov; +Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, stable

Hello,

On Thu, Jan 26, 2017 at 12:41:48PM +0300, Konstantin Khlebnikov wrote:
> The same LTP testcase triggers NULL deref css->cgroup->kn in the same place.
> 
> Commit a5bca2152036 ("cgroup: factor out cgroup_create() out of
> cgroup_mkdir()") reordered cgroup construction. Now css created and
> set online twice: first in cgroup_create(), second in cgroup_mkdir().
> First happens before creation of kernfs node.

It's isn't set online twice.  It's onlined and then made visible
later.  The problem here is that cgroup_name/path() depend on a kernfs
node being associated with it and there's a gap between onlining and
kernfs_node association.  It's fine in cgroup proper but
cgroup_name/path() are used from other places too, so it is
problematic.

> It seems safer and cleaner to handle NULL pointer right in kernfs than
> spreading this magic across all other code. Now all functions that
> prints kernfs path and name prints "(null)" for NULL pointer.

I'm not against this patch but yeah this doesn't address the root
cause here.  Can you please send the patch to Greg?  Please feel free
to add my Acked-by.

> Also it might be a good idea to reorder cgroup construction to make these
> NULL pointers unreachable: online css only after complete initialization.
> After that NULL checks in kernfs could be surrounded with WARN_ON().

Oh yeah, definitely.  The names should be available before online.
Will look into it.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH cgroup/for-4.10-fixes] cgroup: don't online subsystems before cgroup_name/path() are operational
  2017-01-26 19:37   ` Tejun Heo
@ 2017-01-26 21:50     ` Tejun Heo
  0 siblings, 0 replies; 8+ messages in thread
From: Tejun Heo @ 2017-01-26 21:50 UTC (permalink / raw)
  To: Li Zefan, Johannes Weiner, cgroups
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, stable,
	Konstantin Khlebnikov, kernel-team

>From 07cd12945551b63ecb1a349d50a6d69d1d6feb4a Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj@kernel.org>
Date: Thu, 26 Jan 2017 16:47:28 -0500

While refactoring cgroup creation, a5bca2152036 ("cgroup: factor out
cgroup_create() out of cgroup_mkdir()") incorrectly onlined subsystems
before the new cgroup is associated with it kernfs_node.  This is fine
for cgroup proper but cgroup_name/path() depend on the associated
kernfs_node and if a subsystem makes the new cgroup_subsys_state
visible, which they're allowed to after onlining, it can lead to NULL
dereference.

The current code performs cgroup creation and subsystem onlining in
cgroup_create() and cgroup_mkdir() makes the cgroup and subsystems
visible afterwards.  There's no reason to online the subsystems early
and we can simply drop cgroup_apply_control_enable() call from
cgroup_create() so that the subsystems are onlined and made visible at
the same time.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Fixes: a5bca2152036 ("cgroup: factor out cgroup_create() out of cgroup_mkdir()")
Cc: stable@vger.kernel.org # v4.6+
---
Applied to cgroup/for-4.10-fixes.

Thanks.

 kernel/cgroup.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 2ee9ec3..688dd02 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -5221,6 +5221,11 @@ static struct cgroup_subsys_state *css_create(struct cgroup *cgrp,
 	return ERR_PTR(err);
 }
 
+/*
+ * The returned cgroup is fully initialized including its control mask, but
+ * it isn't associated with its kernfs_node and doesn't have the control
+ * mask applied.
+ */
 static struct cgroup *cgroup_create(struct cgroup *parent)
 {
 	struct cgroup_root *root = parent->root;
@@ -5288,11 +5293,6 @@ static struct cgroup *cgroup_create(struct cgroup *parent)
 
 	cgroup_propagate_control(cgrp);
 
-	/* @cgrp doesn't have dir yet so the following will only create csses */
-	ret = cgroup_apply_control_enable(cgrp);
-	if (ret)
-		goto out_destroy;
-
 	return cgrp;
 
 out_cancel_ref:
@@ -5300,9 +5300,6 @@ static struct cgroup *cgroup_create(struct cgroup *parent)
 out_free_cgrp:
 	kfree(cgrp);
 	return ERR_PTR(ret);
-out_destroy:
-	cgroup_destroy_locked(cgrp);
-	return ERR_PTR(ret);
 }
 
 static int cgroup_mkdir(struct kernfs_node *parent_kn, const char *name,
-- 
2.9.3

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH RESEND/add comment] sched/cgroup: move sched_online_group() back into css_online()
  2017-01-26  9:41 [PATCH 1/2] sched/cgroup: move sched_online_group() back into css_online() Konstantin Khlebnikov
  2017-01-26  9:41 ` [PATCH 2/2] kernfs: define name and path to "(null)" for NULL pointer kernfs nodes Konstantin Khlebnikov
  2017-01-26 10:17 ` [PATCH 1/2] sched/cgroup: move sched_online_group() back into css_online() Peter Zijlstra
@ 2017-02-08 11:27 ` Konstantin Khlebnikov
  2017-02-24  9:19   ` [tip:sched/urgent] sched/cgroup: Move sched_online_group() back into css_online() to fix crash tip-bot for Konstantin Khlebnikov
  2 siblings, 1 reply; 8+ messages in thread
From: Konstantin Khlebnikov @ 2017-02-08 11:27 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, linux-kernel; +Cc: Tejun Heo

Commit 2f5177f0fd7e ("sched/cgroup: Fix/cleanup cgroup teardown/init") moved
sched_online_group() from css_online() to css_alloc(). It exposes half-baked
task group into global lists before initializing generic cgroup stuff.

LTP testcase (third in cgroup_regression_test) written for testing
similar race in kernels 2.6.26-2.6.28 easily triggers this oops:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: kernfs_path_from_node_locked+0x260/0x320
CPU: 1 PID: 30346 Comm: cat Not tainted 4.10.0-rc5-test #4
Call Trace:
? kernfs_path_from_node+0x4f/0x60
kernfs_path_from_node+0x3e/0x60
print_rt_rq+0x44/0x2b0
print_rt_stats+0x7a/0xd0
print_cpu+0x2fc/0xe80
? __might_sleep+0x4a/0x80
sched_debug_show+0x17/0x30
seq_read+0xf2/0x3b0
proc_reg_read+0x42/0x70
__vfs_read+0x28/0x130
? security_file_permission+0x9b/0xc0
? rw_verify_area+0x4e/0xb0
vfs_read+0xa5/0x170
SyS_read+0x46/0xa0
entry_SYSCALL_64_fastpath+0x1e/0xad

Here task group already linked into global RCU-protected list task_groups
but pointer css->cgroup is still NULL.

This patch reverts this chunk and moves online back to css_online().

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Fixes: 2f5177f0fd7e ("sched/cgroup: Fix/cleanup cgroup teardown/init")
Cc: <stable@vger.kernel.org> # 4.6+
---
 kernel/sched/core.c |   14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c56fb57f2991..5bfd9cc2aa29 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8398,11 +8398,20 @@ cpu_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 	if (IS_ERR(tg))
 		return ERR_PTR(-ENOMEM);
 
-	sched_online_group(tg, parent);
-
 	return &tg->css;
 }
 
+/* Expose task group only after completing cgroup initialization */
+static int cpu_cgroup_css_online(struct cgroup_subsys_state *css)
+{
+	struct task_group *tg = css_tg(css);
+	struct task_group *parent = css_tg(css->parent);
+
+	if (parent)
+		sched_online_group(tg, parent);
+	return 0;
+}
+
 static void cpu_cgroup_css_released(struct cgroup_subsys_state *css)
 {
 	struct task_group *tg = css_tg(css);
@@ -8805,6 +8814,7 @@ static struct cftype cpu_files[] = {
 
 struct cgroup_subsys cpu_cgrp_subsys = {
 	.css_alloc	= cpu_cgroup_css_alloc,
+	.css_online	= cpu_cgroup_css_online,
 	.css_released	= cpu_cgroup_css_released,
 	.css_free	= cpu_cgroup_css_free,
 	.fork		= cpu_cgroup_fork,

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [tip:sched/urgent] sched/cgroup: Move sched_online_group() back into css_online() to fix crash
  2017-02-08 11:27 ` [PATCH RESEND/add comment] " Konstantin Khlebnikov
@ 2017-02-24  9:19   ` tip-bot for Konstantin Khlebnikov
  0 siblings, 0 replies; 8+ messages in thread
From: tip-bot for Konstantin Khlebnikov @ 2017-02-24  9:19 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: khlebnikov, torvalds, tglx, peterz, hpa, tj, linux-kernel, mingo

Commit-ID:  96b777452d8881480fd5be50112f791c17db4b6b
Gitweb:     http://git.kernel.org/tip/96b777452d8881480fd5be50112f791c17db4b6b
Author:     Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
AuthorDate: Wed, 8 Feb 2017 14:27:27 +0300
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 24 Feb 2017 09:25:28 +0100

sched/cgroup: Move sched_online_group() back into css_online() to fix crash

Commit:

  2f5177f0fd7e ("sched/cgroup: Fix/cleanup cgroup teardown/init")

.. moved sched_online_group() from css_online() to css_alloc().
It exposes half-baked task group into global lists before initializing
generic cgroup stuff.

LTP testcase (third in cgroup_regression_test) written for testing
similar race in kernels 2.6.26-2.6.28 easily triggers this oops:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
  IP: kernfs_path_from_node_locked+0x260/0x320
  CPU: 1 PID: 30346 Comm: cat Not tainted 4.10.0-rc5-test #4
  Call Trace:
  ? kernfs_path_from_node+0x4f/0x60
  kernfs_path_from_node+0x3e/0x60
  print_rt_rq+0x44/0x2b0
  print_rt_stats+0x7a/0xd0
  print_cpu+0x2fc/0xe80
  ? __might_sleep+0x4a/0x80
  sched_debug_show+0x17/0x30
  seq_read+0xf2/0x3b0
  proc_reg_read+0x42/0x70
  __vfs_read+0x28/0x130
  ? security_file_permission+0x9b/0xc0
  ? rw_verify_area+0x4e/0xb0
  vfs_read+0xa5/0x170
  SyS_read+0x46/0xa0
  entry_SYSCALL_64_fastpath+0x1e/0xad

Here the task group is already linked into the global RCU-protected 'task_groups'
list, but the css->cgroup pointer is still NULL.

This patch reverts this chunk and moves online back to css_online().

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 2f5177f0fd7e ("sched/cgroup: Fix/cleanup cgroup teardown/init")
Link: http://lkml.kernel.org/r/148655324740.424917.5302984537258726349.stgit@buzz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index cc1e3e0..e01bd80 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6811,11 +6811,20 @@ cpu_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 	if (IS_ERR(tg))
 		return ERR_PTR(-ENOMEM);
 
-	sched_online_group(tg, parent);
-
 	return &tg->css;
 }
 
+/* Expose task group only after completing cgroup initialization */
+static int cpu_cgroup_css_online(struct cgroup_subsys_state *css)
+{
+	struct task_group *tg = css_tg(css);
+	struct task_group *parent = css_tg(css->parent);
+
+	if (parent)
+		sched_online_group(tg, parent);
+	return 0;
+}
+
 static void cpu_cgroup_css_released(struct cgroup_subsys_state *css)
 {
 	struct task_group *tg = css_tg(css);
@@ -7221,6 +7230,7 @@ static struct cftype cpu_files[] = {
 
 struct cgroup_subsys cpu_cgrp_subsys = {
 	.css_alloc	= cpu_cgroup_css_alloc,
+	.css_online	= cpu_cgroup_css_online,
 	.css_released	= cpu_cgroup_css_released,
 	.css_free	= cpu_cgroup_css_free,
 	.fork		= cpu_cgroup_fork,

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2017-02-24  9:19 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-26  9:41 [PATCH 1/2] sched/cgroup: move sched_online_group() back into css_online() Konstantin Khlebnikov
2017-01-26  9:41 ` [PATCH 2/2] kernfs: define name and path to "(null)" for NULL pointer kernfs nodes Konstantin Khlebnikov
2017-01-26 19:37   ` Tejun Heo
2017-01-26 21:50     ` [PATCH cgroup/for-4.10-fixes] cgroup: don't online subsystems before cgroup_name/path() are operational Tejun Heo
2017-01-26 10:17 ` [PATCH 1/2] sched/cgroup: move sched_online_group() back into css_online() Peter Zijlstra
2017-01-26 10:27   ` Konstantin Khlebnikov
2017-02-08 11:27 ` [PATCH RESEND/add comment] " Konstantin Khlebnikov
2017-02-24  9:19   ` [tip:sched/urgent] sched/cgroup: Move sched_online_group() back into css_online() to fix crash tip-bot for Konstantin Khlebnikov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.