All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH][BUG] sched/fair: child->se.parent,cfs_rq might point to invalid ones
@ 2013-09-10  9:16 ` Daisuke Nishimura
  0 siblings, 0 replies; 3+ messages in thread
From: Daisuke Nishimura @ 2013-09-10  9:16 UTC (permalink / raw)
  To: 'Ingo Molnar', 'Peter Zijlstra'
  Cc: 'LKML', 'cgroups'

There is a small race  between copy_process() and cgroup_attach_task()
where child->se.parent,cfs_rq point to invalid(old) ones.

        parent doing fork()      | someone moving the parent to another cgroup
  -------------------------------+---------------------------------------------
    copy_process()
      + dup_task_struct()
        -> parent->se is copied to child->se.
           se.parent,cfs_rq of them point to old ones.

                                     cgroup_attach_task()
                                       + cgroup_task_migrate()
                                         -> parent->cgroup is updated.
                                       + cpu_cgroup_attach()
                                         + sched_move_task()
                                           + task_move_group_fair()
                                             +- set_task_rq()
                                                -> se.parent,cfs_rq of parent
                                                   are updated.

      + cgroup_fork()
        -> parent->cgroup is copied to child->cgroup. (*1)
      + sched_fork()
        + task_fork_fair()
          -> se.parent,cfs_rq of child are accessed
             while they point to old ones. (*2)

In the worst case, this bug can lead to "use-after-free" and cause panic,
because it's new cgroup's refcount that is incremented at (*1),
so the old cgroup(and related data) can be freed before (*2).
In fact, a panic caused by this bug was originally caught in RHEL6.4.

    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [<ffffffff81051e3e>] sched_slice+0x6e/0xa0
    PGD 11c7a3067 PUD 11c35f067 PMD 0
    Oops: 0000 [#1] SMP
    last sysfs file: /sys/kernel/mm/ksm/run
    CPU 0
    Modules linked in: nfs lockd fscache auth_rpcgss nfs_acl autofs4 sunrpc ipv6 vhost_net macvtap macvlan tun uinput microcode virtio_balloon 8139too 8139cp mii snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc virtio_net i2c_piix4 i2c_core ext4 mbcache jbd2 virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]

    Pid: 5485, comm: fork_exit Not tainted 2.6.32-358.6.1.el6.x86_64 #1 Red Hat KVM
    RIP: 0010:[<ffffffff81051e3e>]  [<ffffffff81051e3e>] sched_slice+0x6e/0xa0
    RSP: 0018:ffff88011ab37d30  EFLAGS: 00010046
    RAX: 0000000008562577 RBX: ffff880117abf800 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000000400 RDI: 00000000002160ec
    RBP: ffff88011ab37d50 R08: 0000000000000401 R09: 0000000000000000
    R10: ffff880108346278 R11: 0000000000000000 R12: ffff88011ab37d30
    R13: ffff880117e5aad8 R14: ffff880119d75a00 R15: ffff88011c1820b8
    FS:  00007f5577a0b700(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 0000000000000000 CR3: 00000001185ef000 CR4: 00000000000006f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process fork_exit (pid: 5485, threadinfo ffff88011ab36000, task ffff88011c182080)
    Stack:
     0000000000000400 00000000003ff004 0000000004216342 ffff880117e5aad8
    <d> ffff88011ab37d70 ffffffff81051f25 ffff880117e5aaa0 ffff880028216700
    <d> ffff88011ab37dc0 ffffffff81056a3a 0000000000000286 000000008103b8ac
    Call Trace:
     [<ffffffff81051f25>] place_entity+0x75/0xa0
     [<ffffffff81056a3a>] task_fork_fair+0xaa/0x160
     [<ffffffff81063c0b>] sched_fork+0x6b/0x140
     [<ffffffff8106c3c2>] copy_process+0x5b2/0x1450
     [<ffffffff81063b49>] ? wake_up_new_task+0xd9/0x130
     [<ffffffff8106d2f4>] do_fork+0x94/0x460
     [<ffffffff81072a9e>] ? sys_wait4+0xae/0x100
     [<ffffffff81009598>] sys_clone+0x28/0x30
     [<ffffffff8100b393>] stub_clone+0x13/0x20
     [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
    Code: 85 c9 48 8b 93 78 01 00 00 74 20 48 8b 33 48 89 c7 e8 07 ff ff ff 48 8b 9b 70 01 00 00 48 85 db 75 db 48 83 c4 10 5b 41 5c c9 c3 <48> 8b 0a 48 89 4d e0 48 8b 52 08 48 89 55 e8 48 8b 13 48 c7 45
    RIP  [<ffffffff81051e3e>] sched_slice+0x6e/0xa0
     RSP <ffff88011ab37d30>
    CR2: 0000000000000000

Cc: <stable@vger.kernel.org>
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
---
 kernel/sched/fair.c |   14 +++++++++-----
 1 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 68f1609..31cbc15 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5818,11 +5818,15 @@ static void task_fork_fair(struct task_struct *p)
 	cfs_rq = task_cfs_rq(current);
 	curr = cfs_rq->curr;
 
-	if (unlikely(task_cpu(p) != this_cpu)) {
-		rcu_read_lock();
-		__set_task_cpu(p, this_cpu);
-		rcu_read_unlock();
-	}
+	/*
+	 * Not only the cpu but also the task_group of the parent might have
+	 * been changed after parent->se.parent,cfs_rq were copied to
+	 * child->se.parent,cfs_rq. So call __set_task_cpu() to make those
+	 * of child point to valid ones.
+	 */
+	rcu_read_lock();
+	__set_task_cpu(p, this_cpu);
+	rcu_read_unlock();
 
 	update_curr(cfs_rq);
 
-- 
1.7.1



^ permalink raw reply related	[flat|nested] 3+ messages in thread

* [PATCH][BUG] sched/fair: child->se.parent,cfs_rq might point to invalid ones
@ 2013-09-10  9:16 ` Daisuke Nishimura
  0 siblings, 0 replies; 3+ messages in thread
From: Daisuke Nishimura @ 2013-09-10  9:16 UTC (permalink / raw)
  To: 'Ingo Molnar', 'Peter Zijlstra'
  Cc: 'LKML', 'cgroups'

There is a small race  between copy_process() and cgroup_attach_task()
where child->se.parent,cfs_rq point to invalid(old) ones.

        parent doing fork()      | someone moving the parent to another cgroup
  -------------------------------+---------------------------------------------
    copy_process()
      + dup_task_struct()
        -> parent->se is copied to child->se.
           se.parent,cfs_rq of them point to old ones.

                                     cgroup_attach_task()
                                       + cgroup_task_migrate()
                                         -> parent->cgroup is updated.
                                       + cpu_cgroup_attach()
                                         + sched_move_task()
                                           + task_move_group_fair()
                                             +- set_task_rq()
                                                -> se.parent,cfs_rq of parent
                                                   are updated.

      + cgroup_fork()
        -> parent->cgroup is copied to child->cgroup. (*1)
      + sched_fork()
        + task_fork_fair()
          -> se.parent,cfs_rq of child are accessed
             while they point to old ones. (*2)

In the worst case, this bug can lead to "use-after-free" and cause panic,
because it's new cgroup's refcount that is incremented at (*1),
so the old cgroup(and related data) can be freed before (*2).
In fact, a panic caused by this bug was originally caught in RHEL6.4.

    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [<ffffffff81051e3e>] sched_slice+0x6e/0xa0
    PGD 11c7a3067 PUD 11c35f067 PMD 0
    Oops: 0000 [#1] SMP
    last sysfs file: /sys/kernel/mm/ksm/run
    CPU 0
    Modules linked in: nfs lockd fscache auth_rpcgss nfs_acl autofs4 sunrpc ipv6 vhost_net macvtap macvlan tun uinput microcode virtio_balloon 8139too 8139cp mii snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc virtio_net i2c_piix4 i2c_core ext4 mbcache jbd2 virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]

    Pid: 5485, comm: fork_exit Not tainted 2.6.32-358.6.1.el6.x86_64 #1 Red Hat KVM
    RIP: 0010:[<ffffffff81051e3e>]  [<ffffffff81051e3e>] sched_slice+0x6e/0xa0
    RSP: 0018:ffff88011ab37d30  EFLAGS: 00010046
    RAX: 0000000008562577 RBX: ffff880117abf800 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000000400 RDI: 00000000002160ec
    RBP: ffff88011ab37d50 R08: 0000000000000401 R09: 0000000000000000
    R10: ffff880108346278 R11: 0000000000000000 R12: ffff88011ab37d30
    R13: ffff880117e5aad8 R14: ffff880119d75a00 R15: ffff88011c1820b8
    FS:  00007f5577a0b700(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 0000000000000000 CR3: 00000001185ef000 CR4: 00000000000006f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process fork_exit (pid: 5485, threadinfo ffff88011ab36000, task ffff88011c182080)
    Stack:
     0000000000000400 00000000003ff004 0000000004216342 ffff880117e5aad8
    <d> ffff88011ab37d70 ffffffff81051f25 ffff880117e5aaa0 ffff880028216700
    <d> ffff88011ab37dc0 ffffffff81056a3a 0000000000000286 000000008103b8ac
    Call Trace:
     [<ffffffff81051f25>] place_entity+0x75/0xa0
     [<ffffffff81056a3a>] task_fork_fair+0xaa/0x160
     [<ffffffff81063c0b>] sched_fork+0x6b/0x140
     [<ffffffff8106c3c2>] copy_process+0x5b2/0x1450
     [<ffffffff81063b49>] ? wake_up_new_task+0xd9/0x130
     [<ffffffff8106d2f4>] do_fork+0x94/0x460
     [<ffffffff81072a9e>] ? sys_wait4+0xae/0x100
     [<ffffffff81009598>] sys_clone+0x28/0x30
     [<ffffffff8100b393>] stub_clone+0x13/0x20
     [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
    Code: 85 c9 48 8b 93 78 01 00 00 74 20 48 8b 33 48 89 c7 e8 07 ff ff ff 48 8b 9b 70 01 00 00 48 85 db 75 db 48 83 c4 10 5b 41 5c c9 c3 <48> 8b 0a 48 89 4d e0 48 8b 52 08 48 89 55 e8 48 8b 13 48 c7 45
    RIP  [<ffffffff81051e3e>] sched_slice+0x6e/0xa0
     RSP <ffff88011ab37d30>
    CR2: 0000000000000000

Cc: <stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Signed-off-by: Daisuke Nishimura <nishimura-YQH0OdQVrdy45+QrQBaojngSJqDPrsil@public.gmane.org>
---
 kernel/sched/fair.c |   14 +++++++++-----
 1 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 68f1609..31cbc15 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5818,11 +5818,15 @@ static void task_fork_fair(struct task_struct *p)
 	cfs_rq = task_cfs_rq(current);
 	curr = cfs_rq->curr;
 
-	if (unlikely(task_cpu(p) != this_cpu)) {
-		rcu_read_lock();
-		__set_task_cpu(p, this_cpu);
-		rcu_read_unlock();
-	}
+	/*
+	 * Not only the cpu but also the task_group of the parent might have
+	 * been changed after parent->se.parent,cfs_rq were copied to
+	 * child->se.parent,cfs_rq. So call __set_task_cpu() to make those
+	 * of child point to valid ones.
+	 */
+	rcu_read_lock();
+	__set_task_cpu(p, this_cpu);
+	rcu_read_unlock();
 
 	update_curr(cfs_rq);
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* [tip:sched/core] sched/fair: Fix small race where child-> se.parent,cfs_rq might point to invalid ones
  2013-09-10  9:16 ` Daisuke Nishimura
  (?)
@ 2013-09-12 18:04 ` tip-bot for Daisuke Nishimura
  -1 siblings, 0 replies; 3+ messages in thread
From: tip-bot for Daisuke Nishimura @ 2013-09-12 18:04 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, peterz, stable, tglx, nishimura

Commit-ID:  6c9a27f5da9609fca46cb2b183724531b48f71ad
Gitweb:     http://git.kernel.org/tip/6c9a27f5da9609fca46cb2b183724531b48f71ad
Author:     Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
AuthorDate: Tue, 10 Sep 2013 18:16:36 +0900
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 12 Sep 2013 19:14:14 +0200

sched/fair: Fix small race where child->se.parent,cfs_rq might point to invalid ones

There is a small race between copy_process() and cgroup_attach_task()
where child->se.parent,cfs_rq points to invalid (old) ones.

        parent doing fork()      | someone moving the parent to another cgroup
  -------------------------------+---------------------------------------------
    copy_process()
      + dup_task_struct()
        -> parent->se is copied to child->se.
           se.parent,cfs_rq of them point to old ones.

                                     cgroup_attach_task()
                                       + cgroup_task_migrate()
                                         -> parent->cgroup is updated.
                                       + cpu_cgroup_attach()
                                         + sched_move_task()
                                           + task_move_group_fair()
                                             +- set_task_rq()
                                                -> se.parent,cfs_rq of parent
                                                   are updated.

      + cgroup_fork()
        -> parent->cgroup is copied to child->cgroup. (*1)
      + sched_fork()
        + task_fork_fair()
          -> se.parent,cfs_rq of child are accessed
             while they point to old ones. (*2)

In the worst case, this bug can lead to "use-after-free" and cause a panic,
because it's new cgroup's refcount that is incremented at (*1),
so the old cgroup(and related data) can be freed before (*2).

In fact, a panic caused by this bug was originally caught in RHEL6.4.

    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [<ffffffff81051e3e>] sched_slice+0x6e/0xa0
    [...]
    Call Trace:
     [<ffffffff81051f25>] place_entity+0x75/0xa0
     [<ffffffff81056a3a>] task_fork_fair+0xaa/0x160
     [<ffffffff81063c0b>] sched_fork+0x6b/0x140
     [<ffffffff8106c3c2>] copy_process+0x5b2/0x1450
     [<ffffffff81063b49>] ? wake_up_new_task+0xd9/0x130
     [<ffffffff8106d2f4>] do_fork+0x94/0x460
     [<ffffffff81072a9e>] ? sys_wait4+0xae/0x100
     [<ffffffff81009598>] sys_clone+0x28/0x30
     [<ffffffff8100b393>] stub_clone+0x13/0x20
     [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b

Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/039601ceae06$733d3130$59b79390$@mxp.nes.nec.co.jp
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/fair.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9b3fe1c..11cd136 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5928,11 +5928,15 @@ static void task_fork_fair(struct task_struct *p)
 	cfs_rq = task_cfs_rq(current);
 	curr = cfs_rq->curr;
 
-	if (unlikely(task_cpu(p) != this_cpu)) {
-		rcu_read_lock();
-		__set_task_cpu(p, this_cpu);
-		rcu_read_unlock();
-	}
+	/*
+	 * Not only the cpu but also the task_group of the parent might have
+	 * been changed after parent->se.parent,cfs_rq were copied to
+	 * child->se.parent,cfs_rq. So call __set_task_cpu() to make those
+	 * of child point to valid ones.
+	 */
+	rcu_read_lock();
+	__set_task_cpu(p, this_cpu);
+	rcu_read_unlock();
 
 	update_curr(cfs_rq);
 

^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2013-09-12 18:09 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-09-10  9:16 [PATCH][BUG] sched/fair: child->se.parent,cfs_rq might point to invalid ones Daisuke Nishimura
2013-09-10  9:16 ` Daisuke Nishimura
2013-09-12 18:04 ` [tip:sched/core] sched/fair: Fix small race where child-> se.parent,cfs_rq " tip-bot for Daisuke Nishimura

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.