linux-kernel.vger.kernel.org archive mirror
From: Oleg Nesterov <oleg@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Kautuk Consul <consul.kautuk@gmail.com>,
	Ingo Molnar <mingo@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@suse.cz>,
	David Rientjes <rientjes@google.com>,
	Ionut Alexa <ionut.m.alexa@gmail.com>,
	Guillaume Morin <guillaume@morinfr.org>,
	linux-kernel@vger.kernel.org, Kirill Tkhai <tkhai@yandex.ru>
Subject: Re: [PATCH 1/1] do_exit(): Solve possibility of BUG() due to race with try_to_wake_up()
Date: Mon, 1 Sep 2014 19:58:51 +0200
Message-ID: <20140901175851.GA15210@redhat.com>
In-Reply-To: <20140901153935.GQ27892@worktop.ger.corp.intel.com>

On 09/01, Peter Zijlstra wrote:
>
> On Mon, Aug 25, 2014 at 05:57:38PM +0200, Oleg Nesterov wrote:
> > Peter, do you remember another problem with TASK_DEAD we discussed recently?
> > (prev_state == TASK_DEAD detection in finish_task_switch() still looks racy).
>
> Uhm, right. That was somewhere on the todo list :-)
>
> > I am starting to think that perhaps we need something like below, what do
> > you all think?
>
> I'm thinking you lost the hunk that adds rq::dead :-), more comments
> below.

And "goto deactivate" should be moved down, after "switch_count"
initialization.

> > +	if (unlikely(rq->dead))
> > +		goto deactivate;
> > +
>
> Yeah, it would be best to not have to do that; ideally we would be able
> to maybe do both; set rq->dead and current->state == TASK_DEAD.

To avoid spin_unlock_wait() in do_exit(). But on second thought this
can't work, please see below.

> > --- x/kernel/exit.c
> > +++ x/kernel/exit.c
> > @@ -815,25 +815,8 @@ void do_exit(long code)
> >  		__this_cpu_add(dirty_throttle_leaks, tsk->nr_dirtied);
> >  	exit_rcu();
> >
> > -	/*
> > -	 * The setting of TASK_RUNNING by try_to_wake_up() may be delayed
> > -	 * when the following two conditions become true.
> > -	 *   - There is race condition of mmap_sem (It is acquired by
> > -	 *     exit_mm()), and
> > -	 *   - SMI occurs before setting TASK_RUNINNG.
> > -	 *     (or hypervisor of virtual machine switches to other guest)
> > -	 *  As a result, we may become TASK_RUNNING after becoming TASK_DEAD
> > -	 *
> > -	 * To avoid it, we have to wait for releasing tsk->pi_lock which
> > -	 * is held by try_to_wake_up()
> > -	 */
> > -	smp_mb();
> > -	raw_spin_unlock_wait(&tsk->pi_lock);
> > -
> > -	/* causes final put_task_struct in finish_task_switch(). */
> > -	tsk->state = TASK_DEAD;
> >  	tsk->flags |= PF_NOFREEZE;	/* tell freezer to ignore us */
> > -	schedule();
> > +	exit_schedule();
> >  	BUG();
> >  	/* Avoid "noreturn function does return".  */
> >  	for (;;)
>
> Yes, something like this might work fine..

Not really :/ Yes, rq->dead (or just "bool prev_dead") should obviously
solve the problem with ttwu() after the last schedule(). But only in a
sense that the dying task won't be activated.

However, the very fact that another CPU can look at this task_struct
means that we still need spin_unlock_wait(). If nothing else to ensure
that try_to_wake_up()->spin_unlock(pi_lock) won't write into the memory
we are going to free.

So I think the comment in do_exit() should be updated too, and smp_mb()
should be moved under raw_spin_unlock_wait() but ...

But. If I am right, doesn't this mean that we have even more problems with
postmortem wakeups??? Why can't ttwu() _start_ after spin_unlock_wait()?

Oleg.
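
For readers following the thread, the race described in the comment that the
patch removes can be replayed as a small single-threaded model. This is only
a sketch under stated assumptions: it is userspace C, not kernel code, and
every model_* name below is hypothetical. The wakeup is split into two halves
so that the delay (SMI, hypervisor switch) can be placed between them, and the
wait on pi_lock is modelled as "the exit path may not set the dead state while
the lock is held".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model states; they only loosely mirror the kernel's task states. */
enum model_state { MODEL_UNINTERRUPTIBLE, MODEL_RUNNING, MODEL_DEAD };

struct model_task {
	enum model_state state;
	bool pi_lock_held;	/* stands in for tsk->pi_lock */
};

/* First half of a wakeup: take the lock and check whether the task sleeps. */
static bool model_ttwu_begin(struct model_task *t)
{
	t->pi_lock_held = true;
	if (t->state != MODEL_UNINTERRUPTIBLE) {
		t->pi_lock_held = false;	/* nothing to wake, drop the lock */
		return false;
	}
	return true;	/* lock stays held until model_ttwu_finish() */
}

/* Second half of a wakeup, possibly delayed: mark running, drop the lock. */
static void model_ttwu_finish(struct model_task *t)
{
	assert(t->pi_lock_held);
	t->state = MODEL_RUNNING;
	t->pi_lock_held = false;
}

/*
 * Exit path. With wait_for_pi_lock set it refuses to proceed while a wakeup
 * holds the lock, and the caller has to retry later. Returns true once the
 * dead state has been set.
 */
static bool model_exit_set_dead(struct model_task *t, bool wait_for_pi_lock)
{
	if (wait_for_pi_lock && t->pi_lock_held)
		return false;
	t->state = MODEL_DEAD;
	return true;
}

/* Replays one fixed interleaving and returns the task's final state. */
enum model_state model_replay(bool wait_for_pi_lock)
{
	struct model_task t = { MODEL_UNINTERRUPTIBLE, false };

	/* The waker sees a sleeping task, then gets delayed. */
	bool waking = model_ttwu_begin(&t);

	/* Meanwhile the exiting task reaches the end of its exit path. */
	bool dead = model_exit_set_dead(&t, wait_for_pi_lock);

	/* The delayed wakeup finally completes. */
	if (waking)
		model_ttwu_finish(&t);

	/* If the exit path had to wait, it retries now that the lock is free. */
	if (!dead)
		model_exit_set_dead(&t, wait_for_pi_lock);

	return t.state;
}
```

Calling model_replay(false) ends in MODEL_RUNNING, i.e. the task becomes
runnable after it was marked dead; model_replay(true) ends in MODEL_DEAD. The
model covers only a wakeup that already holds the lock when the exit path
arrives; a wakeup that starts after the wait has finished is outside it.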

