From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755983Ab2AXKzd (ORCPT ); Tue, 24 Jan 2012 05:55:33 -0500 Received: from merlin.infradead.org ([205.233.59.134]:44935 "EHLO merlin.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755677Ab2AXKzb (ORCPT ); Tue, 24 Jan 2012 05:55:31 -0500 Subject: Re: [BUG] TASK_DEAD task is able to be woken up in special condition From: Peter Zijlstra To: Oleg Nesterov Cc: Ingo Molnar , Yasunori Goto , Thomas Gleixner , Hiroyuki KAMEZAWA , Motohiro Kosaki , Linux Kernel ML In-Reply-To: <1327400349.2614.10.camel@laptop> References: <20120116205140.6120.E1E9C6FF@jp.fujitsu.com> <1326721082.2442.234.camel@twins> <20120117174031.3118.E1E9C6FF@jp.fujitsu.com> <20120117090605.GD7612@elte.hu> <20120117151242.GA13290@redhat.com> <20120118094219.GE5842@elte.hu> <20120118142005.GB10105@redhat.com> <1327400349.2614.10.camel@laptop> Content-Type: text/plain; charset="UTF-8" Date: Tue, 24 Jan 2012 11:55:27 +0100 Message-ID: <1327402527.2614.17.camel@laptop> Mime-Version: 1.0 X-Mailer: Evolution 2.32.2 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, 2012-01-24 at 11:19 +0100, Peter Zijlstra wrote: > On Wed, 2012-01-18 at 15:20 +0100, Oleg Nesterov wrote: > > do_exit() is different because it can not handle the spurious wakeup. > > Well, may be we can? we can simply do > > > > for (;;) { > > tsk->state = TASK_DEAD; > > schedule(); > > } > > > > __schedule() can't race with ttwu() once it takes rq->lock. If the > > exiting task is deactivated, finish_task_switch() will see EXIT_DEAD. > > TASK_DEAD, right? > > > Unless I missed something, the only problem is preempt_disable(), > > but schedule_debug() checks ->exit_state. > > > > OTOH, if we fix this race then probably schedule_debug() should > > check state == EXIT_DEAD instead. > > Hmm, interesting. On the up side that removes the need for that inf loop > after BUG, down side is of course that we loose the BUG itself too. Now > I'm not too sure we actually care about that, a task spinning at 100% in > x state should be fairly obvious borkage and its not like we hit this > thing very often. Something like so, right? schedule_debug() already tests prev->exit_state so it should DTRT afaict. Also, while going over this again, I think Yasunori-San's patch isn't sufficient, note how the p->state = TASK_RUNNING in ttwu_do_wakeup() can happen outside of p->pi_lock when the task gets queued on a remote cpu. --- kernel/exit.c | 17 +++++++++++------ 1 files changed, 11 insertions(+), 6 deletions(-) diff --git a/kernel/exit.c b/kernel/exit.c index 294b170..ccd4f84 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -1039,13 +1039,18 @@ void do_exit(long code) __this_cpu_add(dirty_throttle_leaks, tsk->nr_dirtied); exit_rcu(); /* causes final put_task_struct in finish_task_switch(). */ - tsk->state = TASK_DEAD; tsk->flags |= PF_NOFREEZE; /* tell freezer to ignore us */ - schedule(); - BUG(); - /* Avoid "noreturn function does return". */ - for (;;) - cpu_relax(); /* For when BUG is null */ + for (;;) { + /* + * A spurious wakeup, eg. generated by rwsem when down()'s call + * to schedule() doesn't happen but the wakeup from the + * previous owner's up() did, can stomp on our ->state. + * + * This loop also avoids "noreturn functions does return" + */ + tsk->state = TASK_DEAD; + schedule(); + } } EXPORT_SYMBOL_GPL(do_exit);