From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1757018Ab2AXRZf (ORCPT <rfc822;w@1wt.eu>);
	Tue, 24 Jan 2012 12:25:35 -0500
Received: from fgwmail5.fujitsu.co.jp ([192.51.44.35]:59106 "EHLO
	fgwmail5.fujitsu.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1756999Ab2AXRZd (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Tue, 24 Jan 2012 12:25:33 -0500
Message-ID: <4F1EE992.5000901@jp.fujitsu.com>
Date: Tue, 24 Jan 2012 12:25:38 -0500
From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:9.0) Gecko/20111222 Thunderbird/9.0.1
MIME-Version: 1.0
To: peterz@infradead.org
CC: oleg@redhat.com, mingo@elte.hu, y-goto@jp.fujitsu.com, tglx@linutronix.de,
        kamezawa.hiroyu@jp.fujitsu.com, linux-kernel@vger.kernel.org
Subject: Re: [BUG] TASK_DEAD task is able to be woken up in special condition
References: <20120116205140.6120.E1E9C6FF@jp.fujitsu.com>  <1326721082.2442.234.camel@twins>  <20120117174031.3118.E1E9C6FF@jp.fujitsu.com>  <20120117090605.GD7612@elte.hu> <20120117151242.GA13290@redhat.com>  <20120118094219.GE5842@elte.hu>  <20120118142005.GB10105@redhat.com>  <1327400349.2614.10.camel@laptop> <1327402527.2614.17.camel@laptop>
In-Reply-To: <1327402527.2614.17.camel@laptop>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 1/24/2012 5:55 AM, Peter Zijlstra wrote:
> On Tue, 2012-01-24 at 11:19 +0100, Peter Zijlstra wrote:
>> On Wed, 2012-01-18 at 15:20 +0100, Oleg Nesterov wrote:
>>> do_exit() is different because it can not handle the spurious wakeup.
>>> Well, may be we can? we can simply do
>>>
>>>                 for (;;) {
>>>                         tsk->state = TASK_DEAD;
>>>                         schedule();
>>>                 }
>>>
>>> __schedule() can't race with ttwu() once it takes rq->lock. If the
>>> exiting task is deactivated, finish_task_switch() will see EXIT_DEAD.
>>
>> TASK_DEAD, right?
>>
>>> Unless I missed something, the only problem is preempt_disable(),
>>> but schedule_debug() checks ->exit_state.
>>>
>>> OTOH, if we fix this race then probably schedule_debug() should
>>> check state == EXIT_DEAD instead. 
>>
>> Hmm, interesting. On the up side that removes the need for that inf loop
>> after BUG, down side is of course that we loose the BUG itself too. Now
>> I'm not too sure we actually care about that, a task spinning at 100% in
>> x state should be fairly obvious borkage and its not like we hit this
>> thing very often.
> 
> Something like so, right? schedule_debug() already tests
> prev->exit_state so it should DTRT afaict.
> 
> Also, while going over this again, I think Yasunori-San's patch isn't
> sufficient, note how the p->state = TASK_RUNNING in ttwu_do_wakeup() can
> happen outside of p->pi_lock when the task gets queued on a remote cpu.
> 
> ---
>  kernel/exit.c |   17 +++++++++++------
>  1 files changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/exit.c b/kernel/exit.c
> index 294b170..ccd4f84 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -1039,13 +1039,18 @@ void do_exit(long code)
>  		__this_cpu_add(dirty_throttle_leaks, tsk->nr_dirtied);
>  	exit_rcu();
>  	/* causes final put_task_struct in finish_task_switch(). */
> -	tsk->state = TASK_DEAD;
>  	tsk->flags |= PF_NOFREEZE;	/* tell freezer to ignore us */
> -	schedule();
> -	BUG();
> -	/* Avoid "noreturn function does return".  */
> -	for (;;)
> -		cpu_relax();	/* For when BUG is null */
> +	for (;;) {
> +		/*
> +		 * A spurious wakeup, eg. generated by rwsem when down()'s call
> +		 * to schedule() doesn't happen but the wakeup from the
> +		 * previous owner's up() did, can stomp on our ->state.
> +		 *
> +		 * This loop also avoids "noreturn functions does return"
> +		 */
> +		tsk->state = TASK_DEAD;
> +		schedule();
> +	}
>  }

This looks ok to me.