linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Kirill Tkhai <tkhai@yandex.ru>
To: Oleg Nesterov <oleg@redhat.com>
Cc: Kirill Tkhai <ktkhai@parallels.com>,
	Peter Zijlstra <peterz@infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@redhat.com>,
	Vladimir Davydov <vdavydov@parallels.com>
Subject: Re: [PATCH] sched/numa: fix unsafe get_task_struct() in task_numa_assign()
Date: Sun, 19 Oct 2014 03:13:31 +0400	[thread overview]
Message-ID: <33631413674011@web7o.yandex.ru> (raw)
In-Reply-To: <20141018205614.GA15934@redhat.com>

19.10.2014, 00:59, "Oleg Nesterov" <oleg@redhat.com>:
>  On 10/18, Kirill Tkhai wrote:
>>   18.10.2014, 01:40, "Oleg Nesterov" <oleg@redhat.com>:
>>>   ...
>>>   The
>>>   task_struct itself can't go away,
>>>   ...
>>>   --- a/kernel/sched/fair.c
>>>   +++ b/kernel/sched/fair.c
>>>   @@ -1158,7 +1158,13 @@ static void task_numa_compare(struct task_numa_env *env,
>>>
>>>            rcu_read_lock();
>>>            cur = ACCESS_ONCE(dst_rq->curr);
>>>   - if (cur->pid == 0) /* idle */
>>>   + /*
>>>   + * No need to move the exiting task, and this ensures that ->curr
>>>   + * wasn't reaped and thus get_task_struct() in task_numa_assign()
>>>   + * is safe; note that rcu_read_lock() can't protect from the final
>>>   + * put_task_struct() after the last schedule().
>>>   + */
>>>   + if (is_idle_task(cur) || (cur->flags & PF_EXITING))
>>>                    cur = NULL;
>>>
>>>            /*
>>   Oleg, I've looked once again, and now it's not good for me.
>  Ah. Thanks a lot Kirill for correcting me!
>
>  I was looking at this rcu_read_lock() and I didn't even try to think
>  what it can actually protect. Nothing.

<snip>

>  No, I don't think this can work. Let's look at the current code:
>
>          rcu_read_lock();
>          cur = ACCESS_ONCE(dst_rq->curr);
>          if (cur->pid == 0) /* idle */
>
>  And any dereference, even reading ->pid is not safe. This memory can be
>  freed, unmapped, reused, etc.
>
>  Looks like, task_numa_compare() needs to take dst_rq->lock and get the
>  refernce first.

Yeah, detection of idle is not save. If we reorder the checks almost all
problems will be gone. All except unmapping. JFI, is it possible with
such kernel structures as task_struct? I.e. do mem caches use high memory
in their bosoms?
Thanks!

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b78280c..114ec33 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1165,7 +1165,30 @@ static void task_numa_compare(struct task_numa_env *env,
 
 	rcu_read_lock();
 	cur = ACCESS_ONCE(dst_rq->curr);
-	if (cur->pid == 0) /* idle */
+	/*
+	 * No need to move the exiting task, and this ensures that ->curr
+	 * wasn't reaped and thus get_task_struct() in task_numa_assign()
+	 * is safe; note that rcu_read_lock() can't protect from the final
+	 * put_task_struct() after the last schedule().
+	 */
+	if (cur->flags & PF_EXITING)
+		cur = NULL;
+	smp_rmb(); /* Pairs with dst_rq->lock unlocking which implies smp_wmb */
+	/*
+	 * Check once again to be sure curr is still on dst_rq. Three situations
+	 * are possible here:
+	 * 1)cur has gone and been freed, and dst_rq->curr is pointing on other
+	 *   memory. In this case the check will fail;
+	 * 2)cur is pointing to a new task, which is using the memory of just
+	 *   freed cur (and it is new dst_rq->curr). It's OK, because we've
+	 *   locked RCU before the new task has been even created
+	 *   (so delayed_put_task_struct() hasn't been called);
+	 * 3)we've taken not exiting task (likely case). No need to worry.
+	 */
+	if (cur != ACCESS_ONCE(dst_rq->curr))
+		cur = NULL;
+
+	if (is_idle_task(cur))
 		cur = NULL;
 
 	/*


>  Or, perhaps, we need to change the rules to ensure that any "task_struct *"
>  pointer is rcu-safe. Perhaps we have more similar problems... I'd like to
>  avoid this if possible.

RT tree has:

https://git.kernel.org/cgit/linux/kernel/git/paulg/3.10-rt-patches.git/tree/patches/sched-delay-put-task.patch

But other problem was decided there..

>  Hmm. I'll try to think more.
>
>  Thanks!

Kirill

  reply	other threads:[~2014-10-18 23:13 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-10-15 12:31 [PATCH RFC] sched: Revert delayed_put_task_struct() and fix use after free Kirill Tkhai
2014-10-15 15:06 ` Oleg Nesterov
2014-10-15 19:40   ` Oleg Nesterov
2014-10-15 21:46     ` Kirill Tkhai
2014-10-15 22:02       ` Kirill Tkhai
2014-10-16  7:59       ` Peter Zijlstra
2014-10-16  8:16         ` Kirill Tkhai
2014-10-16  9:43           ` Peter Zijlstra
2014-10-16  9:50             ` Kirill Tkhai
2014-10-16  9:51               ` Kirill Tkhai
2014-10-16 10:04                 ` Kirill Tkhai
2014-10-17 21:34       ` Oleg Nesterov
2014-10-16  7:56     ` Peter Zijlstra
2014-10-16  8:01   ` Peter Zijlstra
2014-10-16 22:05     ` Oleg Nesterov
2014-10-17 21:36 ` [PATCH] sched/numa: fix unsafe get_task_struct() in task_numa_assign() Oleg Nesterov
2014-10-18  8:15   ` Kirill Tkhai
2014-10-18  8:33     ` Kirill Tkhai
2014-10-18 19:36       ` Peter Zijlstra
2014-10-18 21:18         ` Oleg Nesterov
2014-10-19  8:20         ` Kirill Tkhai
2014-10-18 20:56     ` Oleg Nesterov
2014-10-18 23:13       ` Kirill Tkhai [this message]
2014-10-19 19:24         ` Oleg Nesterov
2014-10-19 19:37           ` Oleg Nesterov
2014-10-19 19:43             ` Oleg Nesterov
2014-10-20  9:03               ` Kirill Tkhai
2014-10-20  9:13             ` Peter Zijlstra
2014-10-20 10:36               ` Kirill Tkhai
2014-10-20  9:00           ` Kirill Tkhai
2014-10-19 21:38         ` Peter Zijlstra
2014-10-20  8:56           ` Kirill Tkhai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=33631413674011@web7o.yandex.ru \
    --to=tkhai@yandex.ru \
    --cc=ktkhai@parallels.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=vdavydov@parallels.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).