* [PATCH][BUG] sched/fair: child->se.parent,cfs_rq might point to invalid ones
@ 2013-09-10 9:16 ` Daisuke Nishimura
0 siblings, 0 replies; 3+ messages in thread
From: Daisuke Nishimura @ 2013-09-10 9:16 UTC (permalink / raw)
To: 'Ingo Molnar', 'Peter Zijlstra'
Cc: 'LKML', 'cgroups'
There is a small race between copy_process() and cgroup_attach_task()
where child->se.parent,cfs_rq point to invalid(old) ones.
parent doing fork() | someone moving the parent to another cgroup
-------------------------------+---------------------------------------------
copy_process()
+ dup_task_struct()
-> parent->se is copied to child->se.
se.parent,cfs_rq of them point to old ones.
cgroup_attach_task()
+ cgroup_task_migrate()
-> parent->cgroup is updated.
+ cpu_cgroup_attach()
+ sched_move_task()
+ task_move_group_fair()
+- set_task_rq()
-> se.parent,cfs_rq of parent
are updated.
+ cgroup_fork()
-> parent->cgroup is copied to child->cgroup. (*1)
+ sched_fork()
+ task_fork_fair()
-> se.parent,cfs_rq of child are accessed
while they point to old ones. (*2)
In the worst case, this bug can lead to "use-after-free" and cause panic,
because it's new cgroup's refcount that is incremented at (*1),
so the old cgroup(and related data) can be freed before (*2).
In fact, a panic caused by this bug was originally caught in RHEL6.4.
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff81051e3e>] sched_slice+0x6e/0xa0
PGD 11c7a3067 PUD 11c35f067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/kernel/mm/ksm/run
CPU 0
Modules linked in: nfs lockd fscache auth_rpcgss nfs_acl autofs4 sunrpc ipv6 vhost_net macvtap macvlan tun uinput microcode virtio_balloon 8139too 8139cp mii snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc virtio_net i2c_piix4 i2c_core ext4 mbcache jbd2 virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
Pid: 5485, comm: fork_exit Not tainted 2.6.32-358.6.1.el6.x86_64 #1 Red Hat KVM
RIP: 0010:[<ffffffff81051e3e>] [<ffffffff81051e3e>] sched_slice+0x6e/0xa0
RSP: 0018:ffff88011ab37d30 EFLAGS: 00010046
RAX: 0000000008562577 RBX: ffff880117abf800 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000400 RDI: 00000000002160ec
RBP: ffff88011ab37d50 R08: 0000000000000401 R09: 0000000000000000
R10: ffff880108346278 R11: 0000000000000000 R12: ffff88011ab37d30
R13: ffff880117e5aad8 R14: ffff880119d75a00 R15: ffff88011c1820b8
FS: 00007f5577a0b700(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 00000001185ef000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process fork_exit (pid: 5485, threadinfo ffff88011ab36000, task ffff88011c182080)
Stack:
0000000000000400 00000000003ff004 0000000004216342 ffff880117e5aad8
<d> ffff88011ab37d70 ffffffff81051f25 ffff880117e5aaa0 ffff880028216700
<d> ffff88011ab37dc0 ffffffff81056a3a 0000000000000286 000000008103b8ac
Call Trace:
[<ffffffff81051f25>] place_entity+0x75/0xa0
[<ffffffff81056a3a>] task_fork_fair+0xaa/0x160
[<ffffffff81063c0b>] sched_fork+0x6b/0x140
[<ffffffff8106c3c2>] copy_process+0x5b2/0x1450
[<ffffffff81063b49>] ? wake_up_new_task+0xd9/0x130
[<ffffffff8106d2f4>] do_fork+0x94/0x460
[<ffffffff81072a9e>] ? sys_wait4+0xae/0x100
[<ffffffff81009598>] sys_clone+0x28/0x30
[<ffffffff8100b393>] stub_clone+0x13/0x20
[<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
Code: 85 c9 48 8b 93 78 01 00 00 74 20 48 8b 33 48 89 c7 e8 07 ff ff ff 48 8b 9b 70 01 00 00 48 85 db 75 db 48 83 c4 10 5b 41 5c c9 c3 <48> 8b 0a 48 89 4d e0 48 8b 52 08 48 89 55 e8 48 8b 13 48 c7 45
RIP [<ffffffff81051e3e>] sched_slice+0x6e/0xa0
RSP <ffff88011ab37d30>
CR2: 0000000000000000
Cc: <stable@vger.kernel.org>
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
---
kernel/sched/fair.c | 14 +++++++++-----
1 files changed, 9 insertions(+), 5 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 68f1609..31cbc15 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5818,11 +5818,15 @@ static void task_fork_fair(struct task_struct *p)
cfs_rq = task_cfs_rq(current);
curr = cfs_rq->curr;
- if (unlikely(task_cpu(p) != this_cpu)) {
- rcu_read_lock();
- __set_task_cpu(p, this_cpu);
- rcu_read_unlock();
- }
+ /*
+ * Not only the cpu but also the task_group of the parent might have
+ * been changed after parent->se.parent,cfs_rq were copied to
+ * child->se.parent,cfs_rq. So call __set_task_cpu() to make those
+ * of child point to valid ones.
+ */
+ rcu_read_lock();
+ __set_task_cpu(p, this_cpu);
+ rcu_read_unlock();
update_curr(cfs_rq);
--
1.7.1
^ permalink raw reply related [flat|nested] 3+ messages in thread
* [PATCH][BUG] sched/fair: child->se.parent,cfs_rq might point to invalid ones
@ 2013-09-10 9:16 ` Daisuke Nishimura
0 siblings, 0 replies; 3+ messages in thread
From: Daisuke Nishimura @ 2013-09-10 9:16 UTC (permalink / raw)
To: 'Ingo Molnar', 'Peter Zijlstra'
Cc: 'LKML', 'cgroups'
There is a small race between copy_process() and cgroup_attach_task()
where child->se.parent,cfs_rq point to invalid(old) ones.
parent doing fork() | someone moving the parent to another cgroup
-------------------------------+---------------------------------------------
copy_process()
+ dup_task_struct()
-> parent->se is copied to child->se.
se.parent,cfs_rq of them point to old ones.
cgroup_attach_task()
+ cgroup_task_migrate()
-> parent->cgroup is updated.
+ cpu_cgroup_attach()
+ sched_move_task()
+ task_move_group_fair()
+- set_task_rq()
-> se.parent,cfs_rq of parent
are updated.
+ cgroup_fork()
-> parent->cgroup is copied to child->cgroup. (*1)
+ sched_fork()
+ task_fork_fair()
-> se.parent,cfs_rq of child are accessed
while they point to old ones. (*2)
In the worst case, this bug can lead to "use-after-free" and cause panic,
because it's new cgroup's refcount that is incremented at (*1),
so the old cgroup(and related data) can be freed before (*2).
In fact, a panic caused by this bug was originally caught in RHEL6.4.
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff81051e3e>] sched_slice+0x6e/0xa0
PGD 11c7a3067 PUD 11c35f067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/kernel/mm/ksm/run
CPU 0
Modules linked in: nfs lockd fscache auth_rpcgss nfs_acl autofs4 sunrpc ipv6 vhost_net macvtap macvlan tun uinput microcode virtio_balloon 8139too 8139cp mii snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc virtio_net i2c_piix4 i2c_core ext4 mbcache jbd2 virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
Pid: 5485, comm: fork_exit Not tainted 2.6.32-358.6.1.el6.x86_64 #1 Red Hat KVM
RIP: 0010:[<ffffffff81051e3e>] [<ffffffff81051e3e>] sched_slice+0x6e/0xa0
RSP: 0018:ffff88011ab37d30 EFLAGS: 00010046
RAX: 0000000008562577 RBX: ffff880117abf800 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000400 RDI: 00000000002160ec
RBP: ffff88011ab37d50 R08: 0000000000000401 R09: 0000000000000000
R10: ffff880108346278 R11: 0000000000000000 R12: ffff88011ab37d30
R13: ffff880117e5aad8 R14: ffff880119d75a00 R15: ffff88011c1820b8
FS: 00007f5577a0b700(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 00000001185ef000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process fork_exit (pid: 5485, threadinfo ffff88011ab36000, task ffff88011c182080)
Stack:
0000000000000400 00000000003ff004 0000000004216342 ffff880117e5aad8
<d> ffff88011ab37d70 ffffffff81051f25 ffff880117e5aaa0 ffff880028216700
<d> ffff88011ab37dc0 ffffffff81056a3a 0000000000000286 000000008103b8ac
Call Trace:
[<ffffffff81051f25>] place_entity+0x75/0xa0
[<ffffffff81056a3a>] task_fork_fair+0xaa/0x160
[<ffffffff81063c0b>] sched_fork+0x6b/0x140
[<ffffffff8106c3c2>] copy_process+0x5b2/0x1450
[<ffffffff81063b49>] ? wake_up_new_task+0xd9/0x130
[<ffffffff8106d2f4>] do_fork+0x94/0x460
[<ffffffff81072a9e>] ? sys_wait4+0xae/0x100
[<ffffffff81009598>] sys_clone+0x28/0x30
[<ffffffff8100b393>] stub_clone+0x13/0x20
[<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
Code: 85 c9 48 8b 93 78 01 00 00 74 20 48 8b 33 48 89 c7 e8 07 ff ff ff 48 8b 9b 70 01 00 00 48 85 db 75 db 48 83 c4 10 5b 41 5c c9 c3 <48> 8b 0a 48 89 4d e0 48 8b 52 08 48 89 55 e8 48 8b 13 48 c7 45
RIP [<ffffffff81051e3e>] sched_slice+0x6e/0xa0
RSP <ffff88011ab37d30>
CR2: 0000000000000000
Cc: <stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Signed-off-by: Daisuke Nishimura <nishimura-YQH0OdQVrdy45+QrQBaojngSJqDPrsil@public.gmane.org>
---
kernel/sched/fair.c | 14 +++++++++-----
1 files changed, 9 insertions(+), 5 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 68f1609..31cbc15 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5818,11 +5818,15 @@ static void task_fork_fair(struct task_struct *p)
cfs_rq = task_cfs_rq(current);
curr = cfs_rq->curr;
- if (unlikely(task_cpu(p) != this_cpu)) {
- rcu_read_lock();
- __set_task_cpu(p, this_cpu);
- rcu_read_unlock();
- }
+ /*
+ * Not only the cpu but also the task_group of the parent might have
+ * been changed after parent->se.parent,cfs_rq were copied to
+ * child->se.parent,cfs_rq. So call __set_task_cpu() to make those
+ * of child point to valid ones.
+ */
+ rcu_read_lock();
+ __set_task_cpu(p, this_cpu);
+ rcu_read_unlock();
update_curr(cfs_rq);
--
1.7.1
^ permalink raw reply related [flat|nested] 3+ messages in thread
* [tip:sched/core] sched/fair: Fix small race where child-> se.parent,cfs_rq might point to invalid ones
2013-09-10 9:16 ` Daisuke Nishimura
(?)
@ 2013-09-12 18:04 ` tip-bot for Daisuke Nishimura
-1 siblings, 0 replies; 3+ messages in thread
From: tip-bot for Daisuke Nishimura @ 2013-09-12 18:04 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, hpa, mingo, peterz, stable, tglx, nishimura
Commit-ID: 6c9a27f5da9609fca46cb2b183724531b48f71ad
Gitweb: http://git.kernel.org/tip/6c9a27f5da9609fca46cb2b183724531b48f71ad
Author: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
AuthorDate: Tue, 10 Sep 2013 18:16:36 +0900
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 12 Sep 2013 19:14:14 +0200
sched/fair: Fix small race where child->se.parent,cfs_rq might point to invalid ones
There is a small race between copy_process() and cgroup_attach_task()
where child->se.parent,cfs_rq points to invalid (old) ones.
parent doing fork() | someone moving the parent to another cgroup
-------------------------------+---------------------------------------------
copy_process()
+ dup_task_struct()
-> parent->se is copied to child->se.
se.parent,cfs_rq of them point to old ones.
cgroup_attach_task()
+ cgroup_task_migrate()
-> parent->cgroup is updated.
+ cpu_cgroup_attach()
+ sched_move_task()
+ task_move_group_fair()
+- set_task_rq()
-> se.parent,cfs_rq of parent
are updated.
+ cgroup_fork()
-> parent->cgroup is copied to child->cgroup. (*1)
+ sched_fork()
+ task_fork_fair()
-> se.parent,cfs_rq of child are accessed
while they point to old ones. (*2)
In the worst case, this bug can lead to "use-after-free" and cause a panic,
because it's new cgroup's refcount that is incremented at (*1),
so the old cgroup(and related data) can be freed before (*2).
In fact, a panic caused by this bug was originally caught in RHEL6.4.
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff81051e3e>] sched_slice+0x6e/0xa0
[...]
Call Trace:
[<ffffffff81051f25>] place_entity+0x75/0xa0
[<ffffffff81056a3a>] task_fork_fair+0xaa/0x160
[<ffffffff81063c0b>] sched_fork+0x6b/0x140
[<ffffffff8106c3c2>] copy_process+0x5b2/0x1450
[<ffffffff81063b49>] ? wake_up_new_task+0xd9/0x130
[<ffffffff8106d2f4>] do_fork+0x94/0x460
[<ffffffff81072a9e>] ? sys_wait4+0xae/0x100
[<ffffffff81009598>] sys_clone+0x28/0x30
[<ffffffff8100b393>] stub_clone+0x13/0x20
[<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/039601ceae06$733d3130$59b79390$@mxp.nes.nec.co.jp
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
kernel/sched/fair.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9b3fe1c..11cd136 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5928,11 +5928,15 @@ static void task_fork_fair(struct task_struct *p)
cfs_rq = task_cfs_rq(current);
curr = cfs_rq->curr;
- if (unlikely(task_cpu(p) != this_cpu)) {
- rcu_read_lock();
- __set_task_cpu(p, this_cpu);
- rcu_read_unlock();
- }
+ /*
+ * Not only the cpu but also the task_group of the parent might have
+ * been changed after parent->se.parent,cfs_rq were copied to
+ * child->se.parent,cfs_rq. So call __set_task_cpu() to make those
+ * of child point to valid ones.
+ */
+ rcu_read_lock();
+ __set_task_cpu(p, this_cpu);
+ rcu_read_unlock();
update_curr(cfs_rq);
^ permalink raw reply related [flat|nested] 3+ messages in thread
end of thread, other threads:[~2013-09-12 18:09 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-09-10 9:16 [PATCH][BUG] sched/fair: child->se.parent,cfs_rq might point to invalid ones Daisuke Nishimura
2013-09-10 9:16 ` Daisuke Nishimura
2013-09-12 18:04 ` [tip:sched/core] sched/fair: Fix small race where child-> se.parent,cfs_rq " tip-bot for Daisuke Nishimura
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.